Bioinformatics (2011/2012)

Practical 1

Pairwise sequence alignment

Aims

• To reinforce the basic concepts of pairwise sequence alignment described in the lectures.
• To give practice in C or Java programming.

Objectives

After this practical you will:

• understand how filtering can reduce the noise in a dotplot;
• understand how a dynamic programming algorithm finds an optimal pairwise sequence alignment;
• understand the difference between global and local alignment algorithms;
• be familiar with C or Java programs for pairwise sequence alignment.

Exercises

1. Copy the example programs from the directory /chalmers/users/kemp/MVE360/practical1.
Compile and run these programs.

2. The program dotplot.c (or Dotplot.java) prints a letter at (row i, column j) if the character at position i in the first sequence matches the character at position j in the second sequence.
Modify the program dotplot.c (or Dotplot.java) so that letters are only printed if the characters at positions i+1 and j+1 also match. Observe how this reduces the noise in the dotplot.
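To illustrate the filtering idea, here is a minimal Java sketch. It is not taken from the course's Dotplot.java: the class name, the method name and the decision to return the plot as an array of strings are all assumptions made for this example. It also assumes that a cell in the last row or last column, which has no (i+1, j+1) neighbour, is left blank.

```java
// Sketch only: names are hypothetical and do not come from Dotplot.java.
class FilteredDotplotSketch {

    // Returns one text row per position in seq1. A cell holds the matching
    // letter only when positions (i, j) and (i+1, j+1) both match.
    static String[] filteredDotplot(String seq1, String seq2) {
        String[] rows = new String[seq1.length()];
        for (int i = 0; i < seq1.length(); i++) {
            StringBuilder row = new StringBuilder();
            for (int j = 0; j < seq2.length(); j++) {
                boolean here = seq1.charAt(i) == seq2.charAt(j);
                // Check the bounds first: the last row and the last column
                // have no (i+1, j+1) neighbour (assumed to count as no match).
                boolean next = i + 1 < seq1.length()
                        && j + 1 < seq2.length()
                        && seq1.charAt(i + 1) == seq2.charAt(j + 1);
                row.append(here && next ? seq1.charAt(i) : ' ');
            }
            rows[i] = row.toString();
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : filteredDotplot("GATTACA", "GATCACA")) {
            System.out.println(row);
        }
    }
}
```

Isolated single-letter matches disappear under this rule, while diagonal runs of two or more matches survive (minus their final letter).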

3. Modify the program global_alignment.c (or GlobalAlignment.java) so that it prints out the values in the trace matrix.
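The way the trace matrix is stored in the course's programs is not described here, so the following Java sketch only shows one possible representation. The scores (match +1, mismatch -1, gap -2), the trace codes 'D', 'U' and 'L', the tie-breaking order and all names are assumptions made for this example.

```java
// Sketch only: scores, trace codes and names are hypothetical.
class TraceMatrixSketch {
    static final int MATCH = 1;
    static final int MISMATCH = -1;
    static final int GAP = -2;

    // Fills a global alignment score matrix and records, for every cell,
    // which neighbour gave the best score:
    // 'D' = diagonal, 'U' = up, 'L' = left, '.' = origin.
    static char[][] traceMatrix(String s, String t) {
        int n = s.length();
        int m = t.length();
        int[][] score = new int[n + 1][m + 1];
        char[][] trace = new char[n + 1][m + 1];
        trace[0][0] = '.';
        for (int i = 1; i <= n; i++) {
            score[i][0] = i * GAP;
            trace[i][0] = 'U';
        }
        for (int j = 1; j <= m; j++) {
            score[0][j] = j * GAP;
            trace[0][j] = 'L';
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int sub = s.charAt(i - 1) == t.charAt(j - 1) ? MATCH : MISMATCH;
                int diag = score[i - 1][j - 1] + sub;
                int up = score[i - 1][j] + GAP;
                int left = score[i][j - 1] + GAP;
                // Ties are broken in the order diagonal, up, left.
                if (diag >= up && diag >= left) {
                    score[i][j] = diag;
                    trace[i][j] = 'D';
                } else if (up >= left) {
                    score[i][j] = up;
                    trace[i][j] = 'U';
                } else {
                    score[i][j] = left;
                    trace[i][j] = 'L';
                }
            }
        }
        return trace;
    }

    // Formats the matrix with one row per line and a space between cells.
    static String format(char[][] trace) {
        StringBuilder out = new StringBuilder();
        for (char[] row : trace) {
            for (int j = 0; j < row.length; j++) {
                if (j > 0) {
                    out.append(' ');
                }
                out.append(row[j]);
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(traceMatrix("ATCGAT", "ATACGT")));
    }
}
```

Recording only one direction per cell is enough to recover a single optimal alignment. Counting or listing all optimal alignments would require remembering every direction that ties for the best score.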

4. Modify the program global_alignment.c (or GlobalAlignment.java) so that an extra line of output is printed between the two aligned sequences, indicating exact matches with the character "|", e.g.

```
AT-CGAT
|| || |
ATACG-T
```

5. Modify the program global_alignment.c (or GlobalAlignment.java) so that the percent identity between the two sequences is written out.
Add a comment to your program explaining how you have decided to calculate the percent identity.
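The match line and the percent identity can both be derived from the two aligned strings once the traceback has produced them. The Java sketch below assumes that the aligned sequences are available as two strings of equal length with '-' for gaps; the class and method names are hypothetical. It uses the number of alignment columns (gaps included) as the denominator, which is only one of several possible conventions.

```java
// Sketch only: names are hypothetical; inputs are two already-aligned strings.
class AlignmentSummarySketch {

    // Builds the middle line: '|' where both rows hold the same residue,
    // a space for mismatches and for columns containing a gap.
    static String matchLine(String top, String bottom) {
        checkSameLength(top, bottom);
        StringBuilder line = new StringBuilder();
        for (int k = 0; k < top.length(); k++) {
            char c = top.charAt(k);
            line.append(c != '-' && c == bottom.charAt(k) ? '|' : ' ');
        }
        return line.toString();
    }

    // Percent identity = 100 * identical columns / alignment length.
    // Assumption: gap columns count towards the length.
    static double percentIdentity(String top, String bottom) {
        checkSameLength(top, bottom);
        if (top.isEmpty()) {
            return 0.0;
        }
        int identical = 0;
        for (int k = 0; k < top.length(); k++) {
            char c = top.charAt(k);
            if (c != '-' && c == bottom.charAt(k)) {
                identical++;
            }
        }
        return 100.0 * identical / top.length();
    }

    private static void checkSameLength(String top, String bottom) {
        if (top.length() != bottom.length()) {
            throw new IllegalArgumentException("aligned rows differ in length");
        }
    }

    public static void main(String[] args) {
        String top = "AT-CGAT";
        String bottom = "ATACG-T";
        System.out.println(top);
        System.out.println(matchLine(top, bottom));
        System.out.println(bottom);
        System.out.printf("Percent identity: %.1f%n", percentIdentity(top, bottom));
    }
}
```

For the example alignment, five of the seven columns are identical, so this convention gives 5/7, about 71.4%. Dividing by the length of the shorter sequence or by the number of ungapped columns would give different values, which is why the choice should be stated in a comment.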

6. Copy the program global_alignment.c (or GlobalAlignment.java) to the file local_alignment.c (or LocalAlignment.java). Modify this program so that it implements the Smith-Waterman algorithm for finding an optimal local alignment.
Test your program with the sequences "PAWHEAE" and "HDAGAWGHEQ".
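The changes that turn a global alignment into a local one can be seen in the following minimal Java sketch of the Smith-Waterman recurrence. It is written from scratch and is not based on the course's GlobalAlignment.java. The scoring scheme (match +2, mismatch -1, linear gap penalty -2) and all names are assumptions; for protein sequences such as the test pair above, a substitution matrix would normally replace the simple match/mismatch scores.

```java
// Sketch only: scoring scheme and names are hypothetical.
class LocalAlignmentSketch {
    static final int MATCH = 2;
    static final int MISMATCH = -1;
    static final int GAP = -2;

    static final class Result {
        final int score;
        final String top;
        final String bottom;

        Result(int score, String top, String bottom) {
            this.score = score;
            this.top = top;
            this.bottom = bottom;
        }
    }

    static Result align(String s, String t) {
        int n = s.length();
        int m = t.length();
        // Difference 1: row 0 and column 0 stay at 0 (no leading gap penalties).
        int[][] h = new int[n + 1][m + 1];
        int best = 0;
        int bestI = 0;
        int bestJ = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int sub = s.charAt(i - 1) == t.charAt(j - 1) ? MATCH : MISMATCH;
                // Difference 2: a cell never drops below 0.
                int v = Math.max(0, h[i - 1][j - 1] + sub);
                v = Math.max(v, h[i - 1][j] + GAP);
                v = Math.max(v, h[i][j - 1] + GAP);
                h[i][j] = v;
                // Difference 3: remember the highest-scoring cell anywhere.
                if (v > best) {
                    best = v;
                    bestI = i;
                    bestJ = j;
                }
            }
        }
        // Difference 4: trace back from the best cell until a 0 is reached.
        StringBuilder top = new StringBuilder();
        StringBuilder bottom = new StringBuilder();
        int i = bestI;
        int j = bestJ;
        while (i > 0 && j > 0 && h[i][j] > 0) {
            int sub = s.charAt(i - 1) == t.charAt(j - 1) ? MATCH : MISMATCH;
            if (h[i][j] == h[i - 1][j - 1] + sub) {
                top.append(s.charAt(i - 1));
                bottom.append(t.charAt(j - 1));
                i--;
                j--;
            } else if (h[i][j] == h[i - 1][j] + GAP) {
                top.append(s.charAt(i - 1));
                bottom.append('-');
                i--;
            } else {
                top.append('-');
                bottom.append(t.charAt(j - 1));
                j--;
            }
        }
        return new Result(best, top.reverse().toString(), bottom.reverse().toString());
    }

    public static void main(String[] args) {
        Result r = align("PAWHEAE", "HDAGAWGHEQ");
        System.out.println(r.top);
        System.out.println(r.bottom);
        System.out.println("Score: " + r.score);
    }
}
```

The sketch recomputes the move during traceback instead of storing a trace matrix, which keeps it short. A version derived from the global alignment program would more naturally reuse that program's trace matrix and add a "stop" entry for cells that were reset to 0.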

Extra questions, if you have time

7. Modify the program global_alignment.c (or GlobalAlignment.java) so that it counts the total number of optimal alignments for the two sequences.
Test your program with the sequences "ATTA" and "ATTTTA".

8. Copy the program global_alignment.c (or GlobalAlignment.java) to the file levenshtein.c (or Levenshtein.java). Modify this program so that it calculates the Levenshtein distance (edit distance) between the two sequences.

9. Five different ways of calculating the percent identity are mentioned in the lecture handout.
Modify the program global_alignment.c (or GlobalAlignment.java) so that all of these are calculated.

10. Modify the program global_alignment.c (or GlobalAlignment.java) so that it writes out all of the optimal alignments for the two sequences.
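For the Levenshtein distance question above, it may help to see how closely the edit distance recurrence mirrors global alignment: scores are replaced by costs and the maximum by a minimum. The Java sketch below is a standalone illustration with hypothetical names, not a modification of the course's program. It assumes the usual unit costs of 1 for each insertion, deletion and substitution.

```java
// Sketch only: names are hypothetical; unit edit costs are assumed.
class LevenshteinSketch {

    static int distance(String s, String t) {
        int n = s.length();
        int m = t.length();
        int[][] d = new int[n + 1][m + 1];
        // Turning a prefix into the empty string costs one deletion per character.
        for (int i = 0; i <= n; i++) {
            d[i][0] = i;
        }
        // Building a prefix from the empty string costs one insertion per character.
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                int substitute = d[i - 1][j - 1] + cost;
                int delete = d[i - 1][j] + 1;
                int insert = d[i][j - 1] + 1;
                // Global alignment takes a maximum of scores here;
                // edit distance takes a minimum of costs.
                d[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("ATTA", "ATTTTA")); // two insertions are needed
    }
}
```

No traceback is required because only the distance is wanted, so the trace matrix of the global alignment program can be dropped in this variant.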

What to demonstrate or hand in

Demonstrate your versions of these programs with suitable test data:

• dotplot.c (or Dotplot.java) with the modification described in exercise 2;
• global_alignment.c (or GlobalAlignment.java) with the modifications described in exercises 3, 4 and 5;
• local_alignment.c (or LocalAlignment.java) as described in exercise 6.

Hints

Some hints are available if you need them.