After this practical you will:
Try out some of the scripts from the lecture. Some of these are in the directory /chalmers/users/kemp/UMF018/lecture3/.
Modify the program reverse_complement.pl so that it can print the reverse complement of DNA sequences that contain nucleotide ambiguity codes ("Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences", Tables 1 and 2).
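One possible approach is to extend the transliteration table so that each ambiguity code maps to the code for the complementary set of bases. The sketch below is not the course's reverse_complement.pl; the subroutine name reverse_complement_iupac is made up for illustration. It assumes the standard IUPAC complements: R/Y, K/M, B/V and D/H swap, while S, W and N are their own complements.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string that may contain IUPAC ambiguity codes.
# Hypothetical helper, written for illustration only.
sub reverse_complement_iupac {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;    # reverse the string first
    # Complement each symbol; upper and lower case are handled separately.
    $rc =~ tr/ACGTRYKMBDHVSWNacgtrykmbdhvswn/TGCAYRMKVHDBSWNtgcayrmkvhdbswn/;
    return $rc;
}

print reverse_complement_iupac("ATGCRYN"), "\n";    # prints NRYGCAT
```

Characters outside the table (for example gap symbols) pass through unchanged, which may or may not be what you want.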
Study the program /chalmers/users/kemp/UMF018/lesk/translate.pl
Generalise this program so that it prints the translations of a DNA sequence in all six possible reading frames (sequence as read in, in three phases; reverse complement, in three phases).
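The six reading frames can be sketched as three offsets on the sequence as read plus three offsets on its reverse complement. The code below is an independent illustration and does not reproduce translate.pl: the names translate_frame and six_frames are hypothetical, the standard genetic code is assumed, incomplete trailing codons are dropped, and codons containing anything other than A, C, G or T are shown as X.

```perl
use strict;
use warnings;

# Standard genetic code, with codons enumerated in base order T, C, A, G.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $idx = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($aas, $idx++, 1);
        }
    }
}

# Translate $dna starting at the 0-based position $offset.
sub translate_frame {
    my ($dna, $offset) = @_;
    my $protein = '';
    for (my $pos = $offset; $pos + 3 <= length($dna); $pos += 3) {
        my $codon = uc substr($dna, $pos, 3);
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

# Return a list of [label, translation] pairs for all six frames.
sub six_frames {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    my @frames;
    push @frames, [ "+" . ($_ + 1), translate_frame($dna, $_) ] for 0 .. 2;
    push @frames, [ "-" . ($_ + 1), translate_frame($rc,  $_) ] for 0 .. 2;
    return @frames;
}

for my $frame (six_frames("ATGGCCTAA")) {
    print "$frame->[0]\t$frame->[1]\n";
}
```

For the input ATGGCCTAA the forward frames translate to MA*, WP and GL, and the reverse-complement frames (TTAGGCCAT) to LGH, *A and RP.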
This question is based on the lecture on RNA bioinformatics.
Write a Perl program that reads a DNA sequence from a file, and finds whether it contains a subsequence that matches the following PatScan pattern:
p1=3...3 YYAGG p2=3...3 GNRA ~p2 AGCAG ~p1
Your program should print the matching subsequence. Test your program with the sequence in the file /chalmers/users/kemp/UMF018/practical2/dna
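Because every element of this pattern has a fixed length, a match is always 26 bases long (3 + 5 + 3 + 4 + 3 + 5 + 3), so one way to search is to slide a 26-base window along the sequence. The sketch below makes several assumptions: ~p1 and ~p2 are taken to mean exact Watson-Crick reverse complements with no mismatches, only the first match is reported, and the subroutine names and the test sequence are invented for illustration (the contents of the practical's dna file are not used here).

```perl
use strict;
use warnings;

# Reverse complement of an unambiguous DNA string (upper case).
sub revcomp {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

# Return the first 26-base window matching the pattern, or nothing if none does.
sub find_patscan_match {
    my ($dna) = @_;
    $dna = uc $dna;
    my $len = 26;
    for my $start (0 .. length($dna) - $len) {
        my $window = substr($dna, $start, $len);
        # Y = C or T, N = any base, R = A or G.
        next unless $window =~
            /^([ACGT]{3})[CT]{2}AGG([ACGT]{3})G[ACGT][AG]A([ACGT]{3})AGCAG([ACGT]{3})$/;
        my ($p1, $p2, $rc_p2, $rc_p1) = ($1, $2, $3, $4);
        # The two stems must be reverse complements of p2 and p1.
        return $window if $rc_p2 eq revcomp($p2) && $rc_p1 eq revcomp($p1);
    }
    return;
}

# Made-up test sequence: flank + p1 YYAGG p2 GNRA ~p2 AGCAG ~p1 + flank.
my $example = "TTTT" . "ACG" . "CTAGG" . "GGA" . "GAAA" . "TCC" . "AGCAG" . "CGT" . "AAAA";
my $hit = find_patscan_match($example);
print defined $hit ? "$hit\n" : "no match\n";
```

Checking the reverse complements in ordinary code, rather than inside the regular expression, keeps the pattern readable; a regular expression alone cannot express "reverse complement of an earlier group" with plain backreferences.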
This question is based on the lecture on RNA bioinformatics.
Write a Perl program that reads a set of aligned RNA sequences from a file specified on the command line and finds positions that are covarying to maintain Watson-Crick complementarity. You can assume that there is one sequence per line, and that the sequences contain only the characters a, c, g and u (no white space, gaps or ambiguity codes).
You can test your program with the multiple alignment files 'ma1' and 'ma2' in the directory /chalmers/users/kemp/UMF018/practical2.
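As a starting point, covariation can be sketched as a test over every pair of alignment columns. The definition used below is an assumption, not the one from the lecture: a column pair is reported if every sequence has a Watson-Crick pair (a-u, u-a, g-c or c-g) at those two columns and at least two different pairs occur. The subroutine name and the three-sequence alignment are invented for illustration; positions are reported 1-based.

```perl
use strict;
use warnings;

my %watson_crick = map { $_ => 1 } qw(au ua gc cg);

# Return [i, j] (1-based, i < j) for every covarying column pair.
sub covarying_pairs {
    my @seqs = @_;
    my $len = length $seqs[0];
    my @pairs;
    for my $i (0 .. $len - 2) {
        for my $j ($i + 1 .. $len - 1) {
            my %seen;
            my $all_paired = 1;
            for my $seq (@seqs) {
                my $pair = substr($seq, $i, 1) . substr($seq, $j, 1);
                if (!$watson_crick{$pair}) { $all_paired = 0; last; }
                $seen{$pair}++;
            }
            # Require complementarity everywhere AND more than one kind of pair.
            push @pairs, [ $i + 1, $j + 1 ] if $all_paired && keys(%seen) > 1;
        }
    }
    return @pairs;
}

# Made-up alignment: columns 1 and 5 pair as g-c, c-g and u-a.
my @alignment = qw(gaaac caaag uaaaa);
print "$_->[0] $_->[1]\n" for covarying_pairs(@alignment);    # prints 1 5
```

To read the alignment from a file named on the command line, the sequences could be collected with a loop such as `while (my $line = <>) { chomp $line; push @alignment, $line; }`.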
Write a Perl program that reads a nucleotide sequence from an EMBL databank file, and finds the longest subsequence whose reverse complement is also present in the sequence.
Modify the program embl_orf.pl so that it prints out the translated sequence of the longest open reading frame. The output should use one-letter amino-acid residue codes, and the output should have 10 characters per line.
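The 10-characters-per-line output format can be produced independently of how the open reading frame is found. The following is a small sketch with a hypothetical subroutine name and a made-up protein string; it does not use embl_orf.pl.

```perl
use strict;
use warnings;

# Break $seq into lines of at most $width characters (default 10).
sub wrap_sequence {
    my ($seq, $width) = @_;
    $width ||= 10;
    my @lines = $seq =~ /(.{1,$width})/g;    # greedy chunks of up to $width characters
    return join("\n", @lines) . "\n";
}

print wrap_sequence("MKTAYIAKQRQISFVKSHFSRQ");
# prints:
# MKTAYIAKQR
# QISFVKSHFS
# RQ
```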
Study the program /chalmers/users/kemp/UMF018/lesk/assemble.pl
Try running this program with different input strings.
Try this program with the following input fragments:
rs International Mas
onal Mas
ernational Masters Prog
me in Bio
Bioinformatics
Chalmers Interna
rs Programme in Bio
Now try the program with the same fragments in a different order:
Chalmers Interna
rs International Mas
onal Mas
ernational Masters Prog
rs Programme in Bio
me in Bio
Bioinformatics
Can you explain the difference in the program's output? Try to modify the program so that it assembles fragments correctly regardless of the order in which they appear in the input stream.
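One order-independent strategy is a greedy overlap merge: discard fragments that are wholly contained in another fragment, then repeatedly join the two fragments with the longest suffix-prefix overlap. The sketch below is written from scratch and makes no claim about how assemble.pl works; the subroutine names are hypothetical, overlaps must be exact, and ties between equally long overlaps are broken by input position, so it is not guaranteed to find the best assembly for every input.

```perl
use strict;
use warnings;

# Length of the longest suffix of $left that is also a prefix of $right.
sub overlap {
    my ($left, $right) = @_;
    my $max = length($left) < length($right) ? length($left) : length($right);
    for (my $n = $max; $n > 0; $n--) {
        return $n if substr($left, -$n) eq substr($right, 0, $n);
    }
    return 0;
}

# Greedy assembly; returns one string per contig that could not be joined further.
sub assemble {
    my @all = @_;
    # Drop fragments contained in another one (keep the first of exact duplicates).
    my @frags;
    for my $i (0 .. $#all) {
        my $contained = grep {
            $_ != $i && index($all[$_], $all[$i]) >= 0
                && ($all[$_] ne $all[$i] || $_ < $i)
        } 0 .. $#all;
        push @frags, $all[$i] unless $contained;
    }
    while (@frags > 1) {
        my ($best, $bi, $bj) = (0, -1, -1);
        for my $i (0 .. $#frags) {
            for my $j (0 .. $#frags) {
                next if $i == $j;
                my $n = overlap($frags[$i], $frags[$j]);
                ($best, $bi, $bj) = ($n, $i, $j) if $n > $best;
            }
        }
        last if $best == 0;    # nothing overlaps any more
        my $merged = $frags[$bi] . substr($frags[$bj], $best);
        @frags = ((map { $frags[$_] } grep { $_ != $bi && $_ != $bj } 0 .. $#frags), $merged);
    }
    return @frags;
}

my @order1 = ("rs International Mas", "onal Mas", "ernational Masters Prog",
              "me in Bio", "Bioinformatics", "Chalmers Interna", "rs Programme in Bio");
my @order2 = ("Chalmers Interna", "rs International Mas", "onal Mas",
              "ernational Masters Prog", "rs Programme in Bio", "me in Bio", "Bioinformatics");
print join("|", assemble(@order1)), "\n";
print join("|", assemble(@order2)), "\n";
```

With the fragments split as shown in @order1 and @order2, both orderings assemble to the single string "Chalmers International Masters Programme in Bioinformatics".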
Either:
Ensure that your names are included in a comment in your program.