Computational methods in bioinformatics (2016-2017)

# Practical 1

## Pairwise sequence alignment

### Aims

• To reinforce the basic concepts of pairwise sequence alignment described in the lectures.
• To give practice in C or Java programming.

### Objectives

After this practical you will be able to:

• implement a dynamic programming algorithm to find optimal pairwise sequence alignments;
• describe difference between global and local alignment algorithms;
• calculate the Hamming distance and Levenshtein distance between a pair of sequences.

### Exercises

Copy the example programs from directory /chalmers/users/kemp/TDA507/practical1 (also available online).
Compile and run these programs.

1. Modify the program global_alignment.c (or GlobalAlignment.java) so that an extra line of output is printed between the two aligned sequence, indicating exact matches with the character "|", e.g.

AT-CGAT
|| || |
ATACG-T

2. Modify the program global_alignment.c (or GlobalAlignment.java) so that the percent identity between the two sequences is written out.
Add a comment to your program explaining how you have decided to calculate the percent identity.

3. Modify the program global_alignment.c (or GlobalAlignment.java) to output the Hamming distance between the aligned sequence strings.

4. Copy the program global_alignment.c (or GlobalAlignment.java) to the file local_alignment.c (or LocalAlignment.java). Modify this program so that it implements the Smith-Waterman algorithm for finding an optimal local alignment.
Test your program with the sequences "PAWHEAE" and "HDAGAWGHEQ". Check your results with the online program.

5. Copy the program global_alignment.c (or GlobalAlignment.java) to the file levenshtein.c (or Levenshtein.java). Modify this program so that it calculates the Levenshtein distance (the edit distance) between the two sequenceis.

6. Modify the program global_alignment.c (or GlobalAlignment.java) so that it counts the total number of optimal alignments for the two sequences.
Test your program with the sequences "ATTA" and "ATTTTA".

7. Modify the program global_alignment.c (or GlobalAlignment.java) so that it writes out all of the optimal alignments for the two sequences.

### What to submit

You should submit your solutions via the Fire system.

Everyone should do questions 1, 2, 3 and 4. Those aiming for a higher grade should also attempt questions 5, 6 and 7.

Test your programs with suitable test data. Upload the following to the Fire system:

• global_alignment.c (or GlobalAlignment.java)

• local_alignment.c (or LocalAlignment.java)

• levenshtein.c (or Levenshtein.java)

• A (plain text or PDF) file that shows the result of compiling and running your programs.