Bioinformatics (2014/2015)

Practical 3

Perl (3)

Aims

Objectives

After this practical you will:

Exercises

  1. Try out some of the scripts from the lectures. Some of these are in directory /chalmers/users/kemp/MVE360/lecture4/.

  2. Write a Perl program that reads a UniProt file and writes out the sequences of the alpha helices. There should be one line of output for each alpha helix in the protein. The name of the UniProt file should be given as an argument on the command line. The positions of alpha helices are indicated in a UniProt file on lines that begin with "FT   HELIX".

    Test your program with UniProt files P00784.uniprot (papaya proteinase I) and P00785.uniprot (kiwi fruit actinidin). These files are in directory /chalmers/users/kemp/MVE360/practical1/.

  3. Study the program /chalmers/users/kemp/MVE360/lecture4/translate.pl

    Generalise this program so that it prints the translations of a DNA sequence in all six possible reading frames (sequence as read in, in three phases; reverse complement, in three phases).

  4. Modify the program reverse_complement.pl so that it can print the reverse complement of DNA sequences that contain nucleotide ambiguity codes ("Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences", Tables 1 and 2).

  5. Modify the program embl_orf.pl so that it prints out the translated sequence of the longest open reading frame. The output should use one-letter amino-acid residue codes, and the output should have 10 characters per line.

  6. Write a Perl program that reads a nucleotide sequence from an EMBL databank file, and finds the longest subsequence whose reverse complement is also present in the sequence.
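As a starting point for exercises 3 to 5, the sketch below shows three small building blocks: a reverse complement that accepts the ambiguity codes, the six reading frames of a sequence, and output wrapped at 10 characters per line. It is only one possible approach, not the layout used in the lecture scripts. The subroutine names (reverse_complement, six_frames, wrap) and the test sequence are made up for illustration, and translating each frame into amino acids is left to the code in translate.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse complement that also handles nucleotide ambiguity codes.
# Complement pairs: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H.
# S, W and N are their own complements, so they are left unchanged.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTRYKMBVDHacgtrykmbvdh/TGCAYRMKVBHDtgcayrmkvbhd/;
    return $rc;
}

# The six reading frames: three phases of the sequence as read in,
# followed by three phases of its reverse complement.
# Each frame is trimmed so that it contains only whole codons.
sub six_frames {
    my ($dna) = @_;
    my @frames;
    for my $strand ($dna, reverse_complement($dna)) {
        for my $offset (0 .. 2) {
            my $frame = $offset < length($strand) ? substr($strand, $offset) : '';
            $frame = substr($frame, 0, length($frame) - length($frame) % 3);
            push @frames, $frame;
        }
    }
    return @frames;
}

# Break a sequence into lines of at most $width characters.
sub wrap {
    my ($seq, $width) = @_;
    my @lines = $seq =~ /(.{1,$width})/g;
    return join "\n", @lines;
}

# Hypothetical test sequence containing the ambiguity codes R and N.
my $dna = 'ATGRCCNTTAG';
print reverse_complement($dna), "\n";    # prints CTAANGGYCAT
print wrap($_, 10), "\n" for six_frames($dna);
```

Using tr for the complement keeps the whole table of pairs in one place, so it is easy to check against Tables 1 and 2 of the nomenclature document.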

What to demonstrate or hand in

Demonstrate your solutions to exercises 2 and 3. Check your solutions against this sample output.

Ensure that your names are included in a comment in your program.


Last Modified: 13 February 2015 by Graham Kemp