After this practical you will:
Try out some of the scripts from the lecture. Some of these are in the directory /chalmers/users/kemp/MVE360/lecture6/.
Write a Perl program that predicts whether a short stretch of genomic
sequence comes from a CpG island by summing the log likelihood ratios of
transition probabilities between every pair of consecutive nucleotides
in the sequence.
Test your program with the sequences in files
/chalmers/users/kemp/MVE360/practical5/test_seq1 and
/chalmers/users/kemp/MVE360/practical5/test_seq2.
Your program should give the values -12.866 and 49.275 for these data files.
For your convenience, the file /chalmers/users/kemp/UMF018/practical3/cpg_islands.pl defines an associative array containing log likelihood ratios of transition probabilities for dinucleotides in putative human CpG islands. It also has code for reading a FASTA format sequence into a string variable.
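The summation described above can be sketched as follows. This is a minimal illustration, not the reference solution: the hash `%toy_ratio` and its four values are made-up placeholders (the real table covers all 16 dinucleotides and is defined in cpg_islands.pl), and the sub name `log_odds_score` is hypothetical. The sketch also assumes that pairs missing from the table, such as those containing N, should contribute nothing to the score.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# PLACEHOLDER values for illustration only. A real table needs all 16
# dinucleotides, e.g. the associative array defined in cpg_islands.pl.
my %toy_ratio = (
    CG => 1.0,
    GC => 0.5,
    AT => -0.5,
    TA => -1.0,
);

# Sum the log likelihood ratios over every pair of consecutive
# nucleotides in $seq. $ratio is a reference to the lookup hash.
sub log_odds_score {
    my ($seq, $ratio) = @_;
    $seq = uc $seq;
    my $score = 0;
    for my $i (0 .. length($seq) - 2) {
        my $pair = substr($seq, $i, 2);
        # Assumption: pairs absent from the table are skipped.
        $score += $ratio->{$pair} if exists $ratio->{$pair};
    }
    return $score;
}

# A sequence of n nucleotides has n-1 overlapping pairs: CG, GC, CG here.
printf "%.3f\n", log_odds_score('CGCG', \%toy_ratio);
```

A positive total suggests the sequence looks more like a CpG island than like background sequence, and a negative total suggests the opposite.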
The human alpha globin gene cluster located on chromosome 16 contains five putative CpG islands (NCBI entry). Download the sequence of this region in FASTA format, and use the program written in part (a) to predict the locations of CpG islands.
Plot a graph showing the log-odds score for a sliding window (use the gnuplot program for this).
unix> ./cpg_islands.pl file.fasta > outfile
unix> gnuplot
gnuplot> plot "outfile" with lines
gnuplot> exit
unix>
Experiment with different window sizes (e.g. 500 nucleotides).
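One possible way to produce the sliding-window scores is sketched below. It writes one line per window, giving the window start position and the score, which gnuplot can plot directly. The names `window_scores` and `%toy_ratio` are hypothetical, the table values are placeholders rather than the real ratios, and reporting the window start (rather than its centre) is an arbitrary choice made for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# PLACEHOLDER values for illustration only; use the full 16-entry table
# from cpg_islands.pl in a real run.
my %toy_ratio = (CG => 1.0, GC => 0.5, AT => -0.5, TA => -1.0);

# Return a list of [start, score] pairs, one for every window of
# $window nucleotides. Positions are 1-based.
sub window_scores {
    my ($seq, $ratio, $window) = @_;
    $seq = uc $seq;

    # Score of each consecutive pair; unknown pairs count as 0 (assumption).
    my @pair = map {
        my $p = substr($seq, $_, 2);
        exists $ratio->{$p} ? $ratio->{$p} : 0;
    } 0 .. length($seq) - 2;

    my @out;
    my $npairs = $window - 1;    # a window of n nucleotides has n-1 pairs
    return @out if $npairs < 1 || $npairs > @pair;

    # Score the first window in full, then slide: add the pair that
    # enters on the right and subtract the pair that leaves on the left.
    my $score = 0;
    $score += $pair[$_] for 0 .. $npairs - 1;
    push @out, [1, $score];
    for my $start (1 .. @pair - $npairs) {
        $score += $pair[$start + $npairs - 1] - $pair[$start - 1];
        push @out, [$start + 1, $score];
    }
    return @out;
}

# Two columns per line: window start and log-odds score.
print join(" ", @$_), "\n" for window_scores('ATCGCGTA', \%toy_ratio, 4);
```

Updating the running sum avoids rescoring each window from scratch, which matters once the window is several hundred nucleotides wide.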
Compare the results obtained using your program with the annotations in the NCBI data file (search for "CpG" within the NCBI entry).
In answering this question you can reuse your solution to Question 3 in Practical 3.
Write a Perl program that reads a UniProt file (Swiss-Prot format) and writes out the sequences of the alpha helices. There should be one line of output for each alpha helix in the protein. The accession code of the UniProt entry (e.g. P00784) should be given as an argument on the command line, and your Perl program should retrieve the UniProt entry from the ExPASy web site using the lynx program.
unix> ./uniprot_helices.pl P00784
Write a Perl program that generalises your solution to part (a) by taking the name of the feature type of interest as the second command line argument, e.g.
unix> ./uniprot_features.pl P00784 HELIX
unix> ./uniprot_features.pl Q9NS75 TRANSMEM
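The feature-extraction step can be sketched as below, leaving out the retrieval with lynx. The sub name `feature_sequences` and the entry in `$toy_entry` are invented for illustration and are not real UniProt data. The sketch assumes that feature lines start with `FT`, followed by three spaces and the feature key, and that the two positions are separated either by spaces or by `..`. Positions with uncertainty markers such as `<` or `?` are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the subsequence covered by each feature of type $type in a
# Swiss-Prot format entry supplied as one string.
sub feature_sequences {
    my ($entry, $type) = @_;
    my @ranges;
    my $seq    = '';
    my $in_seq = 0;
    for my $line (split /\n/, $entry) {
        if ($line =~ /^FT {3}(\S+)\s+(\d+)(?:\.\.|\s+)(\d+)/) {
            # Feature line: remember start and end if the key matches.
            push @ranges, [$2, $3] if $1 eq $type;
        }
        elsif ($line =~ /^SQ /)  { $in_seq = 1; }    # sequence block starts
        elsif ($line =~ m{^//})  { $in_seq = 0; }    # end of entry
        elsif ($in_seq) {
            # Sequence lines contain spaces between blocks of residues.
            (my $chunk = $line) =~ s/\s+//g;
            $seq .= $chunk;
        }
    }
    # Feature positions are 1-based and inclusive; substr is 0-based.
    return map { substr($seq, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @ranges;
}

# MADE-UP entry for illustration only, not a real UniProt record.
my $toy_entry = <<'END';
ID   TOY_ENTRY   Reviewed;   12 AA.
AC   X00000;
FT   HELIX         3      6
FT   STRAND        8..10
FT   HELIX         9..12
SQ   SEQUENCE   12 AA;
     ACDEFGHIKL MN
//
END

# One line of output per feature of the requested type.
print "$_\n" for feature_sequences($toy_entry, 'HELIX');
```

In a complete program the feature type would come from the second command line argument rather than being fixed in the call, and the entry text would come from the retrieved UniProt record.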
Demonstrate your solutions to exercise 3.
Ensure that your names are included in a comment in your program.