Supplementary Information for

Fengkai Zhang and Zhongming Zhao

"The influence of neighboring nucleotide composition on single nucleotide polymorphisms (SNPs) in the mouse genome and its comparison with human SNPs"


Introduction

Supplementary materials for the paper "The influence of neighboring nucleotide composition on single nucleotide polymorphisms (SNPs) in the mouse genome and its comparison with human SNPs" are provided on this page. They include the Perl programs and instructions for running them, the original dbSNP SNP data (fasta format), the reformatted data, and some result files. We ran our Perl programs on Windows XP with Perl 5.8. The final tables and figures were made with Microsoft Excel.


Reformat fasta data

  • Task:

Reformat the original SNP data for further analysis.

  • Source data:

Data used in this study were downloaded from the NCBI dbSNP database (ftp://ftp.ncbi.nih.gov/snp/). Because the database is updated approximately once every two months, older builds may be difficult to obtain. The original data we used (Build 119) can be found here (human119_rs_fasta and mouse119_rs_fasta). All data files should be uncompressed and placed in a single folder before running our Perl programs, for example: /fasta.

  • Programs:

For mouse SNP data:

Program 1: snp_reformat_v4_bat.pl

Program 2: snp_reformat_v3.pl

For human SNP data:

Program 1: snp_reformat_300_bat.pl

Program 2: snp_reformat_300.pl

  • Notes for programs:

Program 1 is a batch program that calls program 2. Two arguments are passed to program 1: input_path and output_path. For example, to reformat the mouse data:

perl -w snp_reformat_v4_bat.pl \fasta \out

Each reformatted file keeps the name of its input file, but the file extension changes from the original ".fas" to ".out" (mouse) or ".300" (human).
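
The batch pattern described in these notes can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the helper name reformatted_name, the example file name rs_ch1.fas, and the assumption that program 2 accepts an input file and an output file as its two arguments are all hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Hypothetical helper: build the output path for one input file, keeping the
# base name and replacing the ".fas" extension with $new_ext
# (".out" for mouse, ".300" for human).
sub reformatted_name {
    my ($input_file, $output_dir, $new_ext) = @_;
    my ($base) = fileparse($input_file, qr/\.fas/);
    return File::Spec->catfile($output_dir, $base . $new_ext);
}

# Assumed usage: perl sketch.pl input_path output_path
if (@ARGV == 2) {
    my ($input_path, $output_path) = @ARGV;
    opendir(my $dh, $input_path) or die "Cannot open $input_path: $!";
    my @fasta_files = sort grep { /\.fas$/ } readdir($dh);
    closedir($dh);

    for my $file (@fasta_files) {
        my $in  = File::Spec->catfile($input_path, $file);
        my $out = reformatted_name($file, $output_path, '.out');
        # Assumption: the per-file program takes an input file and an output file.
        system($^X, '-w', 'snp_reformat_v3.pl', $in, $out) == 0
            or warn "Reformatting failed for $in\n";
    }
}
```

Under these assumptions, an input file named rs_ch1.fas in the input folder would be written as rs_ch1.out in the output folder for mouse data, or as rs_ch1.300 if '.300' were passed for human data.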


Neighboring nucleotide composition

  • Source data:

Reformatted data generated above.

  • Programs:

For mouse SNP data:

Program 3: snp_snp_analysis_all_1more_pos.pl

Program 4: snp_snp_analysis_all_1more_pos_bat_ver2.pl

Program 5: snp_snp_analysis_all_1more_pos_combine_ver2.pl

For human SNP data:

Program 3: snp_snp_analysis_all_1more_pos_300.pl

Program 4: snp_snp_analysis_all_1more_pos_300_bat.pl

Program 5: snp_snp_analysis_all_1more_pos_300_combine.pl

  • Notes for programs:

Program 4 automatically calls program 3, which analyzes one file (i.e., one chromosome) at a time. For example, to run program 4 on the mouse data:

perl -w snp_snp_analysis_all_1more_pos_bat_ver2.pl \out \more

Program 5 joins the per-chromosome results generated by program 4. The final result file is 'all_rs_chs.more' for both mouse and human. For example, to run program 5 on the mouse data:

perl -w snp_snp_analysis_all_1more_pos_combine_ver2.pl \more \combine
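
The joining step can be illustrated with a short sketch. It assumes, purely for illustration, that the per-chromosome results are files ending in ".more" and that joining them means simple concatenation in sorted file-name order; the actual program 5 may merge the results differently. The helper name combine_results is hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: append every per-chromosome ".more" file found in
# $input_dir to $combined_file and return the number of files joined.
# Assumption: joining is plain concatenation in sorted file-name order.
sub combine_results {
    my ($input_dir, $combined_file) = @_;
    opendir(my $dh, $input_dir) or die "Cannot open $input_dir: $!";
    # Skip a combined file left over from an earlier run in the same folder.
    my @parts = sort grep { /\.more$/ && $_ ne 'all_rs_chs.more' } readdir($dh);
    closedir($dh);

    open(my $out, '>', $combined_file) or die "Cannot write $combined_file: $!";
    for my $part (@parts) {
        my $path = File::Spec->catfile($input_dir, $part);
        open(my $in, '<', $path) or die "Cannot read $path: $!";
        print {$out} $_ while <$in>;
        close($in);
    }
    close($out) or die "Cannot close $combined_file: $!";
    return scalar @parts;
}

# Assumed usage: perl sketch.pl input_path output_path
if (@ARGV == 2) {
    my ($input_path, $output_path) = @ARGV;
    my $target = File::Spec->catfile($output_path, 'all_rs_chs.more');
    my $count  = combine_results($input_path, $target);
    print "Combined $count files into $target\n";
}
```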


Annotation


Supplementary tables and figure

Supplementary table 1: Proportion of neighboring nucleotides

Supplementary table 2: Proportion bias of neighboring nucleotides

Supplementary fig. 1: Linear correlation between the G+C content difference from the mouse genome average and the proportion of bias for nucleotide C at the -1 site (A) and G at the +1 site (B) observed on each chromosome.

 


Contact

If you have any questions, please contact:

Zhongming Zhao

Email: zzhao@vcu.edu
Phone: 804-828-8129

Fengkai Zhang

Email: fzhang@vcu.edu
Phone: 804-828-9710

 

Last updated: June 25, 2004