Sequence bioinformatics (2010/2011) | Graham Kemp's classes

Practical: Perl 1

Aims

Objectives

After this practical you will:

Exercises

  1. Try out some of the scripts from the lectures. Some of these are in directories /chalmers/users/kemp/UMF018/lecture1/ and /chalmers/users/kemp/UMF018/lecture2/.

  2. This question refers to the program countdown.pl from the first lecture.

    a. Copy the program countdown.pl and modify it so that it counts down from a number entered by the user in response to a prompt, e.g.

      unix> ./countdown_a.pl
      Type in a number: 3
      3...
      2...
      1...
      BOOM!
      unix>
      
    b. Copy your solution for part (a) and modify it so that it uses a "for" loop instead of a "while" loop.

    c. Copy your solution for part (b) and modify it so that it counts down from a number given as a command line argument.

      unix> ./countdown_c.pl 3
      3...
      2...
      1...
      BOOM!
      unix>
      
  3. Write a Perl program that reads a Swiss-Prot file, stores the sequence as a string in the variable '$sequence', and then writes out the sequence of the protein with one line per residue. Each line of output should contain the residue's one-letter code followed by the residue's position in the sequence, counting from 1. The name of the Swiss-Prot file should be given as an argument on the command line.

    Test your program with Swiss-Prot files PAPA_CARPA (papaya proteinase I) and ACTN_ACTCH (kiwi fruit actinidin). These files are in directory /chalmers/users/kemp/UMF018/practical1/.

    unix> ./sp_sequence.pl PAPA_CARPA
    M 1
    A 2
    M 3
    ... etc.
    unix>
    
  4. Write a Perl program that reads a Swiss-Prot file and writes out the sequences of the alpha helices. There should be one line of output for each alpha helix in the protein. The name of the Swiss-Prot file should be given as an argument on the command line.

  5. Some Perl programs from Introduction to Bioinformatics by Arthur M. Lesk are in directory /chalmers/users/kemp/UMF018/lesk/.

    The program 'dotplot.pl' reads two sequences given at the end of the program file and writes out "raw" PostScript showing a dotplot comparing the sequences. Run this program, redirecting standard output to a file, and then view the resulting PostScript document (e.g. using the program gv).

  6. Write a Perl program that takes the names of two Swiss-Prot files on the command line and reads the sequences from these two files (you should be able to reuse some code from Question 3). Your program should then print pairs of integers to standard output, one pair per line, with the two integers separated by a space. Each pair gives a position in the first sequence and a position in the second sequence that hold the same amino acid residue: if the first sequence has an "A" at position i and the second sequence has an "A" at position j, then the pair "i j" should be written. For example, if the first sequence is "MGLPKSFVSM" and the second is "MAMIPSISKL" then (assuming, for simplicity, that the first position in each sequence is position "0") your program should produce the following output:

    0 0
    0 2
    2 9
    3 4
    4 8
    5 5
    5 7
    8 5
    8 7
    9 0
    9 2
    

    Test your program with Swiss-Prot files PAPA_CARPA (papaya proteinase I) and ACTN_ACTCH (kiwi fruit actinidin). These files are in directory /chalmers/users/kemp/UMF018/practical1/.

    If you write the output of your program to a file, you can then view a dotplot using the program /chalmers/users/kemp/UMF018/practical1/dotplot.tcl. This is a Tcl/Tk program: Tcl is another scripting language, and Tk is a graphical toolkit that extends Tcl with commands for building user interfaces. To run this program, type its name and give the name of your file containing pairs of numbers as a command line argument.
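
As an illustration of the comparison described in Question 6, the sketch below shows one possible approach; it is not a model answer. It reads each sequence from the SQ section of a Swiss-Prot entry and prints every pair of matching positions, counting from 0. The subroutine names read_sequence and matching_pairs are made up for this example, and the two entries are toy in-memory records wrapping the example sequences from Question 6 rather than real Swiss-Prot files. It assumes the usual flat-file layout, in which the residue lines follow the SQ line and the entry ends with a // line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the sequence from the SQ section of a Swiss-Prot entry.
# Takes an open filehandle and returns the sequence as a single string.
sub read_sequence {
    my ($fh) = @_;
    my $sequence    = '';
    my $in_sequence = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^SQ/) {        # header line that precedes the residues
            $in_sequence = 1;
        }
        elsif ($line =~ m{^//}) {    # end-of-entry marker
            last;
        }
        elsif ($in_sequence) {
            $line =~ s/\s+//g;       # drop spaces between blocks and the newline
            $sequence .= $line;
        }
    }
    return $sequence;
}

# Return an "i j" string for every pair of positions (counted from 0)
# at which the two sequences have the same residue.
sub matching_pairs {
    my ($first, $second) = @_;
    my @pairs;
    for my $i (0 .. length($first) - 1) {
        for my $j (0 .. length($second) - 1) {
            if (substr($first, $i, 1) eq substr($second, $j, 1)) {
                push @pairs, "$i $j";
            }
        }
    }
    return @pairs;
}

# Toy entries (assumption: NOT real Swiss-Prot records) that wrap the two
# example sequences from Question 6 in a minimal SQ section.
my $entry1 = "SQ   SEQUENCE   10 AA;\n     MGLPKSFVSM\n//\n";
my $entry2 = "SQ   SEQUENCE   10 AA;\n     MAMIPSISKL\n//\n";

# Open the strings as in-memory files so that read_sequence sees a filehandle.
open(my $fh1, '<', \$entry1) or die "Cannot open first entry: $!";
open(my $fh2, '<', \$entry2) or die "Cannot open second entry: $!";
my $seq1 = read_sequence($fh1);
my $seq2 = read_sequence($fh2);
close($fh1);
close($fh2);

print "$_\n" for matching_pairs($seq1, $seq2);
```

Run as written, this prints the eleven pairs listed in Question 6, from "0 0" to "9 2". To work on real files, the in-memory strings would be replaced by files opened from the names in @ARGV.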

Work to be handed in

Either

  1. Show your solution to Question 4 to me during the practical session on Monday 1 November 2010, or
  2. If this is not possible, you should print the program that is your solution to Question 4 and the output that is produced when you run the program with file PAPA_CARPA as input. There is an envelope marked "Sequence Bioinformatics" in the tray outside my office (room 6475, EDIT building). Put your solution into this envelope no later than 17:00 on Monday 8 November 2010.

Ensure that your names are included in a comment in your program.


Last Modified: 26 October 2010 by Graham Kemp