Bioinformatics (2014/2015)

# Practical 2

## Hints

1.

2.

3. There are several ways that this test can be implemented.
One way to do this is to nest another if statement inside the one that is already there.
Alternatively, you can form a conjunction of tests using the logical AND operator, &&.
But can you find a simpler way to modify the code to perform this new test?

4. Look at the code for printing the alignment near the end of the program. An additional line of output has to be produced. For each position in the alignment, test whether the aligned characters are the same. If they are, print "|". If they are not, print a space.
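As an illustration of the idea only, the following Python sketch builds the extra line for two already-aligned rows. The function name `match_line` and the use of "-" as the gap character are assumptions made for this example; the practical's own program will use its own variables and language.

```python
def match_line(aligned1: str, aligned2: str) -> str:
    """Return '|' under each column where the aligned characters agree, ' ' elsewhere."""
    # Assumes both rows have the same length and gaps are written as '-'.
    return "".join("|" if a == b else " " for a, b in zip(aligned1, aligned2))


row1 = "ATT-A"
row2 = "ATTTA"
print(row1)
print(match_line(row1, row2))
print(row2)
```

The match line is printed between the two sequence rows so that each "|" sits directly under and over the characters it refers to.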

5. You should declare a variable that will be used to count the number of matches. Initialise this variable to zero. Modify the code that you wrote to answer exercise 4 so that this variable is incremented each time a match is found. Look at the lecture handout for more information on calculating the percent identity.
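A minimal Python sketch of the counting step is given below. It assumes that percent identity is the number of matching columns divided by the alignment length; the lecture handout may use a different denominator, in which case only the last line changes. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned1: str, aligned2: str) -> float:
    """Percentage of alignment columns in which both rows hold the same character."""
    if len(aligned1) != len(aligned2) or not aligned1:
        raise ValueError("aligned rows must be non-empty and of equal length")
    matches = 0  # counter initialised to zero
    for a, b in zip(aligned1, aligned2):
        if a == b:
            matches += 1  # incremented once per matching column
    # Assumption: the denominator is the full alignment length, gaps included.
    return 100.0 * matches / len(aligned1)


print(percent_identity("ATT-A", "ATTTA"))
```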

6. Refer to the description of the local pairwise alignment algorithm in the lecture handout.

The first row and column of the score matrix should be initialised to zero.

When filling the score matrix, use a score of zero in place of any negative value; in that case, set the corresponding trace matrix value to STOP.

When we trace back, we should start at a cell with the largest score in the score matrix. This can be done by noting the largest score found so far while we are filling the matrix, and the position where it was found. For example, declare a new variable (e.g. \$maxScore) to record the largest score found so far, and two further variables (e.g. \$maxI and \$maxJ) to record the row and column where this largest score was found.
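The three changes described for local alignment can be sketched in Python as follows. This is an illustration under stated assumptions, not the practical's program: the scoring values (match +1, mismatch -1, gap -1), the trace codes, and all function and variable names are hypothetical, and a best score of exactly zero is also treated as STOP here.

```python
STOP, DIAG, UP, LEFT = 0, 1, 2, 3  # hypothetical trace codes


def fill_local(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Fill score and trace matrices for local alignment; track the largest score."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    # First row and column stay at zero, with STOP in the trace matrix.
    score = [[0] * cols for _ in range(rows)]
    trace = [[STOP] * cols for _ in range(rows)]
    max_score, max_i, max_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if seq1[i - 1] == seq2[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            if best <= 0:
                score[i][j], trace[i][j] = 0, STOP  # clamp to zero and mark STOP
            else:
                score[i][j] = best
                trace[i][j] = DIAG if best == diag else UP if best == up else LEFT
            if score[i][j] > max_score:  # remember where the largest score was found
                max_score, max_i, max_j = score[i][j], i, j
    return trace, max_score, max_i, max_j


def traceback_local(seq1, seq2, trace, i, j):
    """Walk back from cell (i, j) until a STOP entry is reached."""
    row1, row2 = [], []
    while trace[i][j] != STOP:
        if trace[i][j] == DIAG:
            row1.append(seq1[i - 1]); row2.append(seq2[j - 1]); i -= 1; j -= 1
        elif trace[i][j] == UP:
            row1.append(seq1[i - 1]); row2.append("-"); i -= 1
        else:
            row1.append("-"); row2.append(seq2[j - 1]); j -= 1
    return "".join(reversed(row1)), "".join(reversed(row2))


trace, best, i, j = fill_local("TTACGTT", "GGACGGG")
print(best, traceback_local("TTACGTT", "GGACGGG", trace, i, j))
```

Using a strict `>` when updating the maximum keeps the first cell that reaches the largest score; if several cells tie, other equally good local alignments exist.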

7. Use the on-line Global Alignment program with the sequences "ATTA" and "ATTTTA", and count the number of optimal global alignments by hand. You might find it useful to print out the output from that program, and to draw the optimal alignment paths on the score matrix.

Declare another two-dimensional matrix (e.g. optimal_paths[][]), and use this to record the total number of optimal ways of reaching each cell. The entries in the first row and column of this matrix should be initialised to 1. When filling the score matrix, it might be convenient to store the three possible scores for a cell in three separate variables. Calculate the number of optimal ways of reaching the present cell based on the number of ways of reaching the preceding cells, and the direction(s) from which the present cell can be reached with the optimal score.
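The bookkeeping described above might look like the following Python sketch. The scoring values (match +1, mismatch -1, gap -1) and all names are assumptions chosen for illustration; the counts obtained depend on the scoring scheme, so use the scheme from the practical when checking a hand count.

```python
def count_optimal_global_alignments(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Return (optimal global score, number of optimal paths to the final cell)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    # One way to reach every cell in the first row and column (a run of gaps).
    paths = [[1] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            # Keep the three candidate scores in separate variables.
            diag = score[i - 1][j - 1] + (match if seq1[i - 1] == seq2[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            score[i][j] = best
            # Sum the path counts of every predecessor that achieves the best score.
            total = 0
            if diag == best:
                total += paths[i - 1][j - 1]
            if up == best:
                total += paths[i - 1][j]
            if left == best:
                total += paths[i][j - 1]
            paths[i][j] = total
    return score[-1][-1], paths[-1][-1]


print(count_optimal_global_alignments("AT", "AAT"))
```

Comparing the three scores with `==` works here because the scores are integers; with fractional scores a tolerance would be needed.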

8. Read the description of Levenshtein distance, and think about how this relates to match and mismatch scores and the gap penalty in the dynamic programming algorithm for global alignment.
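For reference while making that comparison, here is a short Python sketch of the standard dynamic programming calculation of Levenshtein distance (unit cost for each insertion, deletion and substitution). The function and variable names are chosen for this example only.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    rows, cols = len(s) + 1, len(t) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters of s
    for j in range(cols):
        dist[0][j] = j  # insert all j characters of t
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = dist[i - 1][j - 1] + (0 if s[i - 1] == t[j - 1] else 1)
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[-1][-1]


print(levenshtein("kitten", "sitting"))
```

Note which operation is applied at each cell (a minimum rather than a maximum) and what each of the three candidate values costs.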