Bioinformatics (2015/2016)

Practical 6

Perl (5)

Aims

Objectives

After this practical you will:

Exercises

  1. Try out some of the scripts from the lecture. Some of these are in directory /chalmers/users/kemp/MVE360/lecture6/.

  2. Write a Perl program that predicts whether a short stretch of genomic sequence comes from a CpG island by summing the log likelyhood ratios of transition probabilities between every pair of consecutive nucleotides in the sequence. Test your program with the sequences in files /chalmers/users/kemp/MVE360/practical6/test_seq1 and /chalmers/users/kemp/MVE360/practical6/test_seq2.
    Your program should give the values -12.866 and 49.275 for these data files.

    For your convenience, the file /chalmers/users/kemp/MVE360/practical6/cpg_islands.pl defines an associative array containing log likelyhood ratios of transition probabilities for dinucleotides in putative human CpG islands, and has code for reading a FASTA format sequence into a string variable.

    1. Modify the Perl program written in Question 2 so that it can be used to predict CpG islands in long genomic sequences by summing the log likelyhood ratios of transition probabilities between every pair of consecutive nucleotides in a sliding window. For each window position, print out the position and the score, so that the scores can be plotted on a graph using the gnuplot program.
    2. The human alpha globin gene cluster located on chromosome 16 contains five putative CpG islands (NCBI entry). Download the sequence of this region in FASTA format, and use the program written in part (a) to predict the locations of CpG islands.

      Plot a graph showing the log-odds score for a sliding window (you can use the gnuplot program for this).

      unix> ./cpg_islands.pl file.fasta > outfile
      unix> gnuplot
      gnuplot> plot "outfile" with lines
      gnuplot> exit
      unix>
      

      Experiment with different window sizes (e.g. 500 nucleotides).

      Compare the results obtained using your program with the annotations in the NCBI data file (search for "CpG" within the NCBI entry).

  3. Start on Practical 7.

What to demonstrate or hand in

Demonstrate your solutions to exercise 3.

Ensure that your names are included in a comment in your program.


Last Modified: 1 March 2016 by Graham Kemp