After this practical you will:
Copy the example programs from directory /chalmers/users/kemp/MVE360/practical2.
The program dotplot.pl is incomplete. This program should print a letter at (row i, column j) if the character at position i in the first sequence matches the character at position j in the second sequence. That is, it should produce the following output for the given values of $seq1 and $seq2:
D                D
 O O     O  OO  O
  R     R
 O O     O  OO  O
    T         T
     H         H
      Y
     H         H
 O O     O  OO  O
D                D
                  G
                   K
                    I
                     N
Complete program dotplot.pl so that it produces the desired output for any strings $seq1 and $seq2.
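As a rough illustration of the row and column rule described above, the following sketch builds one output string per row. The sub name dotplot_rows and the short test sequences are assumptions made for this example; the skeleton in dotplot.pl may be organised differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of dotplot.pl): returns one string per row.
# Row i holds the letter of $seq1 at position i wherever it matches $seq2.
sub dotplot_rows {
    my ($seq1, $seq2) = @_;
    my @rows;
    for my $i (0 .. length($seq1) - 1) {
        my $c1  = substr($seq1, $i, 1);
        my $row = '';
        for my $j (0 .. length($seq2) - 1) {
            # Matching letter on a match, otherwise a blank to keep columns aligned.
            $row .= ($c1 eq substr($seq2, $j, 1)) ? $c1 : ' ';
        }
        push @rows, $row;
    }
    return @rows;
}

print "$_\n" for dotplot_rows('ACGT', 'AGGT');
```

Printing a space for every non-match is what keeps each letter in the column of its position in the second sequence.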
Modify the program dotplot.pl so that letters are only printed if the characters at positions i+1 and j+1 also match. Observe how this reduces the noise in the dotplot.
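One possible way to express this filter is sketched below. The sub name filtered_dotplot_rows is hypothetical, and the sketch assumes that a cell in the last row or last column, where position i+1 or j+1 does not exist, is left blank.

```perl
use strict;
use warnings;

# Hypothetical helper: a letter is kept only if positions (i, j) and
# (i+1, j+1) both match. Assumption: cells in the last row or column,
# which have no diagonal successor, are left blank.
sub filtered_dotplot_rows {
    my ($seq1, $seq2) = @_;
    my ($n, $m) = (length $seq1, length $seq2);
    my @rows;
    for my $i (0 .. $n - 1) {
        my $row = '';
        for my $j (0 .. $m - 1) {
            my $both = $i + 1 < $n
                    && $j + 1 < $m
                    && substr($seq1, $i, 1)     eq substr($seq2, $j, 1)
                    && substr($seq1, $i + 1, 1) eq substr($seq2, $j + 1, 1);
            $row .= $both ? substr($seq1, $i, 1) : ' ';
        }
        push @rows, $row;
    }
    return @rows;
}

print "[$_]\n" for filtered_dotplot_rows('ABCA', 'ABA');
```

Isolated single-letter matches disappear under this rule, while runs of two or more matches along a diagonal keep all but their final letter.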
Modify the program global_alignment.pl so that an extra line of output is printed between the two aligned sequences, indicating exact matches with the character "|", e.g.
AT-CGAT
|| || |
ATACG-T
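The match line can be derived from the two aligned strings once the traceback has produced them. The sketch below assumes the aligned sequences are available as two equal-length strings that use "-" for gaps; the sub name match_line is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the middle line of the alignment display.
# Assumes both aligned strings have the same length and use '-' for gaps.
sub match_line {
    my ($aligned1, $aligned2) = @_;
    my $line = '';
    for my $k (0 .. length($aligned1) - 1) {
        my $x = substr($aligned1, $k, 1);
        my $y = substr($aligned2, $k, 1);
        # A column counts as a match only if both characters are equal and not gaps.
        $line .= ($x eq $y && $x ne '-') ? '|' : ' ';
    }
    return $line;
}

my ($top, $bottom) = ('AT-CGAT', 'ATACG-T');
print "$top\n", match_line($top, $bottom), "\n$bottom\n";
```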
Modify the program global_alignment.pl so that the percent identity between the two sequences is written out. Add a comment to your program explaining how you have decided to calculate the percent identity.
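Percent identity has no single agreed definition, which is why the choice needs documenting. The sketch below shows one option: identical columns divided by the total number of alignment columns, gap columns included. That denominator is an assumption made for this example (the length of the shorter sequence is another common choice), and the sub name percent_identity is hypothetical.

```perl
use strict;
use warnings;

# Percent identity = identical columns / total alignment columns * 100.
# Assumption: the denominator is the alignment length including gap columns.
sub percent_identity {
    my ($aligned1, $aligned2) = @_;
    my $columns = length $aligned1;
    return 0 if $columns == 0;          # avoid division by zero
    my $identical = 0;
    for my $k (0 .. $columns - 1) {
        my $x = substr($aligned1, $k, 1);
        $identical++ if $x ne '-' && $x eq substr($aligned2, $k, 1);
    }
    return 100 * $identical / $columns;
}

printf "%.1f%%\n", percent_identity('AT-CGAT', 'ATACG-T');   # prints 71.4%
```

For the example alignment, 5 of the 7 columns are identical, giving 71.4%.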
Copy the program global_alignment.pl to the file local_alignment.pl. Modify this program so that it implements the Smith-Waterman algorithm for finding an optimal local alignment. Test your program with the sequences "PAWHEAE" and "HDAGAWGHEQ".
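The main changes relative to global alignment are that cell scores never drop below zero, the first row and column are initialised to zero, and the traceback starts at the highest-scoring cell and stops when it reaches a zero. The following is a minimal sketch of those ideas, not a replacement for the course program: the scoring values (match 1, mismatch -1, linear gap -1) and the sub name local_align are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Assumed scoring scheme; the values used in global_alignment.pl may differ.
my ($MATCH, $MISMATCH, $GAP) = (1, -1, -1);

# Returns (best score, aligned piece of $s1, aligned piece of $s2).
sub local_align {
    my ($s1, $s2) = @_;
    my ($n, $m) = (length $s1, length $s2);
    my @H;
    $H[$_][0] = 0 for 0 .. $n;      # local alignment: borders are zero
    $H[0][$_] = 0 for 0 .. $m;
    my ($best, $bi, $bj) = (0, 0, 0);
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $s = substr($s1, $i - 1, 1) eq substr($s2, $j - 1, 1) ? $MATCH : $MISMATCH;
            my $v = 0;              # scores are never allowed below zero
            for my $c ($H[$i-1][$j-1] + $s, $H[$i-1][$j] + $GAP, $H[$i][$j-1] + $GAP) {
                $v = $c if $c > $v;
            }
            $H[$i][$j] = $v;
            ($best, $bi, $bj) = ($v, $i, $j) if $v > $best;
        }
    }
    # Trace back from the best cell until a zero score is reached.
    my ($a1, $a2, $i, $j) = ('', '', $bi, $bj);
    while ($i > 0 && $j > 0 && $H[$i][$j] > 0) {
        my $s = substr($s1, $i - 1, 1) eq substr($s2, $j - 1, 1) ? $MATCH : $MISMATCH;
        if ($H[$i][$j] == $H[$i-1][$j-1] + $s) {
            $a1 = substr($s1, $i - 1, 1) . $a1;
            $a2 = substr($s2, $j - 1, 1) . $a2;
            $i--; $j--;
        } elsif ($H[$i][$j] == $H[$i-1][$j] + $GAP) {
            $a1 = substr($s1, $i - 1, 1) . $a1;
            $a2 = '-' . $a2;
            $i--;
        } else {
            $a1 = '-' . $a1;
            $a2 = substr($s2, $j - 1, 1) . $a2;
            $j--;
        }
    }
    return ($best, $a1, $a2);
}

my ($score, $top, $bottom) = local_align('PAWHEAE', 'HDAGAWGHEQ');
print "score $score\n$top\n$bottom\n";
```

When several cells share the best score, this sketch keeps the first one found and reports a single alignment; other tie-breaking rules are equally valid.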
File substitution_matrix.pl contains a piece of Perl code that initialises an associative array with values for a simple substitution matrix for aligning a pair of DNA sequences:
%substitution_matrix = (
    "AA" =>  2, "AC" => -1, "AG" => -1, "AT" => -1,
    "CA" => -1, "CC" =>  2, "CG" => -1, "CT" => -1,
    "GA" => -1, "GC" => -1, "GG" =>  2, "GT" => -1,
    "TA" => -1, "TC" => -1, "TG" => -1, "TT" =>  2,
);
Copy the program global_alignment.pl to the file dna.pl. Add the code for initialising the scoring matrix associative array to this file, and use score values from this associative array instead of the variables $MATCH and $MISMATCH when calculating diagonal scores.
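The lookup itself amounts to concatenating the two characters into a two-letter key. The sketch below shows that step in isolation; the sub name diagonal_score is hypothetical, and how it slots into the diagonal-score calculation depends on the structure of global_alignment.pl.

```perl
use strict;
use warnings;

my %substitution_matrix = (
    "AA" =>  2, "AC" => -1, "AG" => -1, "AT" => -1,
    "CA" => -1, "CC" =>  2, "CG" => -1, "CT" => -1,
    "GA" => -1, "GC" => -1, "GG" =>  2, "GT" => -1,
    "TA" => -1, "TC" => -1, "TG" => -1, "TT" =>  2,
);

# Hypothetical helper standing in for the $MATCH / $MISMATCH test:
# the pair of characters is looked up as a two-letter hash key.
sub diagonal_score {
    my ($char1, $char2) = @_;
    my $key = $char1 . $char2;
    die "No score for pair '$key'\n" unless exists $substitution_matrix{$key};
    return $substitution_matrix{$key};
}

print diagonal_score('A', 'A'), " ", diagonal_score('G', 'T'), "\n";   # prints "2 -1"
```

Failing loudly on an unknown pair, as assumed here, makes stray characters such as "N" or lower-case letters easier to spot than a silent undefined score.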
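Modify the substitution matrix to reflect that transitions (purine to purine or pyrimidine to pyrimidine substitutions, i.e. A/G and C/T) are more common than transversions (substitutions between a purine and a pyrimidine), for example by penalising transitions less heavily than transversions.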
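Rather than editing sixteen entries by hand, the matrix can be generated from the purine and pyrimidine classes. In the sketch below the scores 2, -1 and -2 for match, transition and transversion are placeholder assumptions, not recommended values, and build_dna_matrix is a hypothetical name.

```perl
use strict;
use warnings;

# A and G are purines; C and T are pyrimidines.
my %base_class = (A => 'purine', G => 'purine', C => 'pyrimidine', T => 'pyrimidine');

# Hypothetical helper: builds a 16-entry matrix from three scores.
sub build_dna_matrix {
    my ($match, $transition, $transversion) = @_;
    my %matrix;
    for my $x (qw(A C G T)) {
        for my $y (qw(A C G T)) {
            $matrix{"$x$y"} =
                  $x eq $y                           ? $match
                : $base_class{$x} eq $base_class{$y} ? $transition     # same class
                :                                      $transversion;  # different class
        }
    }
    return %matrix;
}

# Assumed example scores: transitions penalised less than transversions.
my %substitution_matrix = build_dna_matrix(2, -1, -2);
print "AG => $substitution_matrix{AG}, AC => $substitution_matrix{AC}\n";
```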
Modify the program global_alignment.pl so that it counts the total number of optimal alignments for the two sequences. Test your program with the sequences "ATTA" and "ATTTTA".
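One way to count optimal alignments without enumerating them is to keep a second matrix alongside the score matrix: each cell stores how many optimal paths reach it, summed over every predecessor that attains the cell's best score. The sketch below assumes match 1, mismatch -1 and a linear gap penalty of -1; the count can change under a different scoring scheme, and the sub name is hypothetical.

```perl
use strict;
use warnings;

# Assumed scoring scheme; the values used in global_alignment.pl may differ.
my ($MATCH, $MISMATCH, $GAP) = (1, -1, -1);

# Returns (optimal global score, number of distinct optimal alignments).
sub count_optimal_alignments {
    my ($s1, $s2) = @_;
    my ($n, $m) = (length $s1, length $s2);
    my (@S, @C);                                   # scores and path counts
    for my $i (0 .. $n) { $S[$i][0] = $i * $GAP; $C[$i][0] = 1; }
    for my $j (0 .. $m) { $S[0][$j] = $j * $GAP; $C[0][$j] = 1; }
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $s = substr($s1, $i - 1, 1) eq substr($s2, $j - 1, 1) ? $MATCH : $MISMATCH;
            # Each candidate is [score via this move, paths reaching the predecessor].
            my @cand = (
                [ $S[$i-1][$j-1] + $s,   $C[$i-1][$j-1] ],
                [ $S[$i-1][$j]   + $GAP, $C[$i-1][$j]   ],
                [ $S[$i][$j-1]   + $GAP, $C[$i][$j-1]   ],
            );
            my $best = $cand[0][0];
            for my $c (@cand) { $best = $c->[0] if $c->[0] > $best; }
            my $count = 0;
            for my $c (@cand) { $count += $c->[1] if $c->[0] == $best; }
            $S[$i][$j] = $best;
            $C[$i][$j] = $count;
        }
    }
    return ($S[$n][$m], $C[$n][$m]);
}

my ($score, $count) = count_optimal_alignments('ATTA', 'ATTTTA');
print "score $score, $count optimal alignments\n";   # score 2, 6 optimal alignments
```

Under these assumed scores the six alignments correspond to the six ways of choosing which two of the four T's in "ATTTTA" pair with the two T's in "ATTA".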
Copy the program global_alignment.pl to the file levenshtein.pl. Modify this program so that it calculates the Levenshtein distance (edit distance) between the two sequences.
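The Levenshtein distance uses the same dynamic programming table as global alignment, but with costs instead of scores: 0 for a match, 1 for a substitution, insertion or deletion, and a minimum taken instead of a maximum. A minimal standalone sketch follows, with the hypothetical sub name levenshtein; it is not derived from global_alignment.pl.

```perl
use strict;
use warnings;

# Levenshtein distance: minimum number of single-character insertions,
# deletions and substitutions needed to turn $s1 into $s2.
sub levenshtein {
    my ($s1, $s2) = @_;
    my ($n, $m) = (length $s1, length $s2);
    my @D;
    $D[$_][0] = $_ for 0 .. $n;     # delete every character of the prefix
    $D[0][$_] = $_ for 0 .. $m;     # insert every character of the prefix
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $cost = substr($s1, $i - 1, 1) eq substr($s2, $j - 1, 1) ? 0 : 1;
            my $min  = $D[$i-1][$j-1] + $cost;                  # match or substitution
            $min = $D[$i-1][$j] + 1 if $D[$i-1][$j] + 1 < $min; # deletion
            $min = $D[$i][$j-1] + 1 if $D[$i][$j-1] + 1 < $min; # insertion
            $D[$i][$j] = $min;
        }
    }
    return $D[$n][$m];
}

print levenshtein('kitten', 'sitting'), "\n";   # prints 3
```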
Demonstrate your versions of these programs with suitable test data:
Ensure that your names are included in a comment in your program.
Some hints are available if you need them.