Computational methods in bioinformatics (2017-2018)

# Practical 1

## Pairwise sequence alignment

### Aims

• To reinforce the basic concepts of pairwise sequence alignment described in the lectures.

### Objectives

After this practical you will be able to:

• implement a dynamic programming algorithm to find optimal pairwise sequence alignments;
• describe difference between global and local alignment algorithms;
• calculate the Hamming distance and Levenshtein distance between a pair of sequences.

### Exercises

Some programs that can be used as a starting point are available online. The task descriptions below refer to the C program "global_alignment.c", but if you want to use Java you are welcome to use the program "GlobalAlignment.java" instead, or if you want to use Perl you are welcome to use the program "global_alignment.pl" instead, or you can write and use an equivalent program in your preferred programming language.

1. Modify the program global_alignment.c (or equivalent) so that an extra line of output is printed between the two aligned sequence, indicating exact matches with the character "|", e.g.

```    AT-CGAT
|| || |
ATACG-T
```
2. Modify the program global_alignment.c (or equivalent) so that the percent identity between the two sequences is written out.
Add a comment to your program explaining how you have decided to calculate the percent identity.

3. Modify the program global_alignment.c (or equivalent) to output the Hamming distance between the aligned sequence strings.

4. Copy the program global_alignment.c (or equivalent) to the file local_alignment.c (or equivalent). Modify this program so that it implements the Smith-Waterman algorithm for finding an optimal local alignment.
Test your program with the sequences "PAWHEAE" and "HDAGAWGHEQ". Check your results with the online program.

5. Copy the program global_alignment.c (or equivalent) to the file levenshtein.c (or equivalent). Modify this program so that it calculates the Levenshtein distance (the edit distance) between the two sequenceis.

6. Modify the program global_alignment.c (or equivalent) so that it counts the total number of optimal alignments for the two sequences.
Test your program with the sequences "ATTA" and "ATTTTA".

7. Modify the program global_alignment.c (or equivalent) so that it writes out all of the optimal alignments for the two sequences.

### What to submit

You should submit your solutions via the Fire system new Fire system.

Everyone should do questions 1, 2, 3 and 4. Those aiming for a higher grade should also attempt questions 5, 6 and 7.

Test your programs with suitable test data. Upload the following to the Fire system:

• global_alignment.c (or equivalent)

• local_alignment.c (or equivalent)

• levenshtein.c (or equivalent)

• A (plain text or PDF) file that shows the result of running your programs.