Supplementary Information for "The influence of neighboring nucleotide composition on single nucleotide polymorphisms (SNPs) in the mouse genome and its comparison with human SNPs"
Fengkai Zhang and Zhongming Zhao
Introduction
Supplementary materials for the paper "The influence of neighboring nucleotide composition on single nucleotide polymorphisms (SNPs) in the mouse genome and its comparison with human SNPs" are provided on this page, including Perl programs and instructions, the original dbSNP SNP data (FASTA format), reformatted data, and some result files. We ran our Perl programs under Windows XP with Perl 5.8. The final tables and figures were made with Microsoft Excel.
Reformat FASTA data
Reformat the original SNP data for further analysis.
Data used in this study were downloaded from the NCBI dbSNP database (ftp://ftp.ncbi.nih.gov/snp/). Because the database is updated approximately every two months, older builds may be difficult to obtain. The original data we used (Build 119) are available on this page (human119_rs_fasta and mouse119_rs_fasta). To run our Perl programs, uncompress all data files and put them in one folder, for example \fasta.
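The dbSNP files are in FASTA format: each record is a header line starting with ">" followed by one or more sequence lines. As a rough illustration of how such a file can be read before reformatting, here is a minimal Python sketch (the original programs are in Perl, and this is not their code). The function name read_fasta is hypothetical, and the sketch does not attempt to interpret any dbSNP-specific header fields.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple

def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream (hypothetical helper)."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []      # drop the leading '>'
        elif header is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the last record

# Toy input, not real dbSNP data.
example = io.StringIO(">rs1 toy record\nACGT\nACGN\n\n>rs2\nTTGA\n")
records = list(read_fasta(example))
print(records)  # [('rs1 toy record', 'ACGTACGN'), ('rs2', 'TTGA')]
```

To read a real file, pass an open file handle instead of the StringIO object, for example `read_fasta(open(path))`.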
For mouse SNP data:
Program 1: snp_reformat_v4_bat.pl
Program 2: snp_reformat_v3.pl
For human SNP data:
Program 1: snp_reformat_300_bat.pl
Program 2: snp_reformat_300.pl
Program 1 calls program 2. Program 1 takes two arguments: input_path and output_path. For example, to reformat the mouse data:
perl -w snp_reformat_v4_bat.pl \fasta \out
The reformatted files keep the names of the input files, but the extension changes from ".fas" to ".out" (mouse) or ".300" (human).
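Program 1 calls program 2, takes an input folder and an output folder, and writes each result under the input file name with a new extension. The Python sketch below shows one way such a batch driver could be written; it is not the authors' Perl code. It assumes every input file ends in ".fas" and that the per-file step can be expressed as a function from input text to output text. The names reformat_all and reformat_file are hypothetical, and the identity function in the demo only stands in for the real reformatting, which is not described on this page.

```python
import tempfile
from pathlib import Path
from typing import Callable, List

def reformat_all(input_path: str, output_path: str, new_suffix: str,
                 reformat_file: Callable[[str], str]) -> List[str]:
    """Apply reformat_file to every *.fas file in input_path (hypothetical driver).

    Each output keeps the input file name; only the extension changes,
    e.g. '.fas' -> '.out' (mouse) or '.fas' -> '.300' (human).
    """
    out_dir = Path(output_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for fas in sorted(Path(input_path).glob("*.fas")):
        target = out_dir / fas.with_suffix(new_suffix).name
        target.write_text(reformat_file(fas.read_text()))
        written.append(target.name)
    return written

# Demo on a temporary folder; the identity function is a placeholder for program 2.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "fasta")
    src.mkdir()
    (src / "ch1.fas").write_text(">rs1\nACGT\n")
    names = reformat_all(str(src), str(Path(tmp, "out")), ".out", lambda text: text)
print(names)  # ['ch1.out']
```

Keeping the per-file step as a separate function mirrors the split between program 1 (batch) and program 2 (single file), so either part can be replaced without touching the other.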
Neighboring nucleotide composition
Uses the reformatted data generated above.
For mouse SNP data:
Program 3: snp_snp_analysis_all_1more_pos.pl
Program 4: snp_snp_analysis_all_1more_pos_bat_ver2.pl
Program 5: snp_snp_analysis_all_1more_pos_combine_ver2.pl
For human SNP data:
Program 3: snp_snp_analysis_all_1more_pos_300.pl
Program 4: snp_snp_analysis_all_1more_pos_300_bat.pl
Program 5: snp_snp_analysis_all_1more_pos_300_combine.pl
Program 4 automatically calls program 3, which analyzes one file (i.e., one chromosome) at a time. For example, to run program 4 on the mouse data:
perl -w snp_snp_analysis_all_1more_pos_bat_ver2.pl \out \more
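Program 3 tabulates the nucleotides around the SNPs in one chromosome file. The layout of the reformatted files is not documented on this page, so the Python sketch below assumes each SNP is given as a pair of flanking sequences: the 5' flank ending immediately before the SNP and the 3' flank starting immediately after it. It counts A, C, G and T at sites -k to -1 and +1 to +k and converts the counts to proportions, skipping ambiguous bases such as N. The function names and the max_offset parameter are hypothetical, and this is not the authors' Perl code.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

BASES = "ACGT"

def neighbor_counts(flanks: Iterable[Tuple[str, str]], max_offset: int = 1) -> Dict[int, Counter]:
    """Count A/C/G/T at sites -max_offset..-1 and +1..+max_offset around each SNP.

    Each item is (five_prime, three_prime): the flanks ending immediately
    before and starting immediately after the SNP (an assumed input layout).
    Matching is case-insensitive; non-ACGT characters such as N are skipped.
    """
    counts = {k: Counter() for k in range(-max_offset, max_offset + 1) if k != 0}
    for five, three in flanks:
        five, three = five.upper(), three.upper()
        for k in range(1, max_offset + 1):
            if len(five) >= k and five[-k] in BASES:
                counts[-k][five[-k]] += 1       # site -k
            if len(three) >= k and three[k - 1] in BASES:
                counts[k][three[k - 1]] += 1    # site +k
    return counts

def to_proportions(counts: Dict[int, Counter]) -> Dict[int, Dict[str, float]]:
    """Convert per-site counts to proportions of A, C, G and T."""
    result = {}
    for site, c in counts.items():
        total = sum(c.values())
        result[site] = {b: (c[b] / total if total else 0.0) for b in BASES}
    return result

# Toy flanks, not real SNP data.
flanks = [("aac", "gtt"), ("TTC", "GAA"), ("GGA", "CAT"), ("CCN", "AGG")]
props = to_proportions(neighbor_counts(flanks, max_offset=1))
print(props[1])  # {'A': 0.25, 'C': 0.25, 'G': 0.5, 'T': 0.0}
```

In the toy data the -1 site of the fourth SNP is N, so only three SNPs contribute there and C accounts for two of them.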
Program 5 combines the per-chromosome results generated by program 4 (mouse and human). The final result file is 'all_rs_chs.more' (mouse and human). For example, to run program 5 on the mouse data:
perl -w snp_snp_analysis_all_1more_pos_combine_ver2.pl \more \combine
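Program 5 merges the per-chromosome results into one genome-wide file. One reasonable way to do this, sketched below in Python under the assumption that each chromosome's result is available as per-site nucleotide counts, is to add the raw counts across chromosomes and only then convert them to proportions. Averaging the per-chromosome proportions instead would give chromosomes with few SNPs the same weight as chromosomes with many. This is an illustrative design choice, not a description of what program 5 actually does, and combine_chromosomes is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable

BASES = "ACGT"

def combine_chromosomes(per_chrom: Iterable[Dict[int, Counter]]) -> Dict[int, Dict[str, float]]:
    """Pool per-site A/C/G/T counts over chromosomes, then convert to proportions."""
    pooled: Dict[int, Counter] = {}
    for chrom_counts in per_chrom:
        for site, c in chrom_counts.items():
            pooled.setdefault(site, Counter()).update(c)  # add raw counts
    result = {}
    for site in sorted(pooled):
        c = pooled[site]
        total = sum(c.values())
        result[site] = {b: (c[b] / total if total else 0.0) for b in BASES}
    return result

# Toy counts for the -1 site on two chromosomes (not real data).
chr1 = {-1: Counter({"C": 3, "A": 1})}
chr2 = {-1: Counter({"C": 1, "G": 3})}
combined = combine_chromosomes([chr1, chr2])
print(combined[-1])  # {'A': 0.125, 'C': 0.5, 'G': 0.375, 'T': 0.0}
```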
Annotation

Supplementary tables and figure
Supplementary table 1: Proportion of neighboring nucleotides
Supplementary table 2: Proportion bias of neighboring nucleotides
Supplementary fig. 1: Linear correlation between the G+C content difference from the mouse genome average and the proportion bias for nucleotide C at the -1 site (A) and G at the +1 site (B), observed on each chromosome.
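Supplementary fig. 1 relates two per-chromosome quantities by linear correlation. As an illustration of that computation only, the Python sketch below fits an ordinary least-squares line y = slope * x + intercept and computes Pearson's correlation coefficient r. It assumes the per-chromosome G+C differences (x) and proportion-bias values (y) have already been computed; the sample numbers are made up and are not the paper's results. The published figure was produced in Microsoft Excel, and linear_fit is a hypothetical name.

```python
from math import sqrt
from typing import Sequence, Tuple

def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary for r to be defined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

# Made-up values for illustration only (one point per hypothetical chromosome).
gc_diff = [-1.0, 0.0, 1.0, 2.0]
bias = [1.0, 3.0, 5.0, 7.0]
print(linear_fit(gc_diff, bias))  # (2.0, 3.0, 1.0)
```

The toy points lie exactly on a line, so r is 1.0; real per-chromosome values would scatter around the fitted line and give |r| below 1.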
Contact
If you have any questions, please contact:
Zhongming Zhao
Email: zzhao@vcu.edu Phone: 804-828-8129
Fengkai Zhang
Email: fzhang@vcu.edu Phone: 804-828-9710