After this practical you will:
Try out some of the scripts from the lecture. Some of these are in the directory /chalmers/users/kemp/MVE360/lecture6/.
Write a Perl program that predicts whether a short stretch of genomic
sequence comes from a CpG island by summing the log-likelihood ratios of
transition probabilities between every pair of consecutive nucleotides
in the sequence.
Test your program with the sequences in files
/chalmers/users/kemp/MVE360/practical6/test_seq1 and
/chalmers/users/kemp/MVE360/practical6/test_seq2.
Your program should give the values -12.866 and 49.275 for these data files.
For your convenience, the file /chalmers/users/kemp/MVE360/practical6/cpg_islands.pl defines an associative array containing log-likelihood ratios of transition probabilities for dinucleotides in putative human CpG islands, and has code for reading a FASTA-format sequence into a string variable.
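As a starting point, the summation could be sketched as follows. This is only a minimal sketch: the hash %toy_ratio and the subroutine name log_odds_score are made up for illustration, and the four values in the hash are placeholders, not the ratios from cpg_islands.pl. Dinucleotides missing from the toy table are scored as 0 here, which you would not need with the full 16-entry table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# HYPOTHETICAL values for illustration only. The real log-likelihood
# ratios are in the associative array defined in cpg_islands.pl.
my %toy_ratio = (
    CG => 1.5,
    GC => 0.5,
    AT => -0.75,
    TA => -1.0,
);

# Sum the log-likelihood ratios over every pair of consecutive
# nucleotides in $seq. $ratio is a reference to a hash keyed by
# dinucleotide (e.g. "CG").
sub log_odds_score {
    my ($seq, $ratio) = @_;
    $seq = uc $seq;                       # accept lower-case sequence
    my $score = 0;
    for my $i (0 .. length($seq) - 2) {   # a sequence of n bases has n-1 pairs
        my $pair = substr($seq, $i, 2);
        $score += $ratio->{$pair} // 0;   # assumption: unknown pairs score 0
    }
    return $score;
}

printf "%.3f\n", log_odds_score("CGCG", \%toy_ratio);   # CG + GC + CG
printf "%.3f\n", log_odds_score("ATAT", \%toy_ratio);   # AT + TA + AT
```

With the toy table, a positive total suggests the CpG island model fits better and a negative total suggests the background model fits better; the same reading applies to the values expected for the two test files.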
The human alpha globin gene cluster located on chromosome 16 contains five putative CpG islands (NCBI entry). Download the sequence of this region in FASTA format, and use the program written in part (a) to predict the locations of CpG islands.
Plot a graph showing the log-odds score for a sliding window (you can use the gnuplot program for this).
unix> ./cpg_islands.pl file.fasta > outfile
unix> gnuplot
gnuplot> plot "outfile" with lines
gnuplot> exit
unix>
Experiment with different window sizes (e.g. 500 nucleotides).
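One possible way to produce the sliding-window scores is sketched below. It assumes a window of w nucleotides contains w-1 consecutive pairs, and it updates a running sum as the window moves rather than rescoring each window from scratch. The names window_scores and %toy_ratio are hypothetical, and the hash values are again placeholders rather than the ratios from cpg_islands.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# HYPOTHETICAL values for illustration only; use the table from
# cpg_islands.pl in your own program.
my %toy_ratio = (CG => 1.5, GC => 0.5, AT => -0.75, TA => -1.0);

# Return a list of [start, score] pairs, one per window of $window
# nucleotides. start is the 1-based position of the window's first base.
sub window_scores {
    my ($seq, $ratio, $window) = @_;
    $seq = uc $seq;

    # Score of the dinucleotide starting at each position.
    my @pair;
    for my $i (0 .. length($seq) - 2) {
        push @pair, $ratio->{ substr($seq, $i, 2) } // 0;   # assumption: unknown pairs score 0
    }

    my @out;
    my $npairs = $window - 1;             # pairs per window
    return @out if $npairs < 1 || $npairs > @pair;

    # Score the first window in full.
    my $score = 0;
    $score += $pair[$_] for 0 .. $npairs - 1;
    push @out, [1, $score];

    # Slide by one base: add the pair entering, subtract the pair leaving.
    for my $start (1 .. @pair - $npairs) {
        $score += $pair[$start + $npairs - 1] - $pair[$start - 1];
        push @out, [$start + 1, $score];
    }
    return @out;
}

# One "position score" line per window, a layout gnuplot can plot directly.
printf "%d %.3f\n", @$_ for window_scores("ATATCGCG", \%toy_ratio, 4);
```

The running sum keeps the work proportional to the sequence length whatever the window size, which matters for a genomic region scored with windows of several hundred nucleotides. Over a very long sequence the repeated additions and subtractions can accumulate small floating-point rounding differences compared with rescoring each window.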
Compare the results obtained using your program with the annotations in the NCBI data file (search for "CpG" within the NCBI entry).
Start on Practical 7.
Demonstrate your solutions to exercise 3.
Ensure that your names are included in a comment in your program.