Bioinformatics (2015/2016)

Lecture GK-6

More on Perl and sequence alignment

Aims

To present more examples of Perl programs.
To describe methods for constructing multiple sequence alignments.
To introduce suffix tries, suffix trees, suffix arrays, the Burrows-Wheeler Matrix (BWM), the Burrows-Wheeler Transform (BWT), and the relationships between these.

Objectives

After this lecture you will be able to:

use Perl together with other Unix programs, piping the output of other programs into your Perl program and the output of your Perl program into other programs;
describe and apply the sum of pairs method for scoring multiple alignments;
discuss how multi-dimensional dynamic programming can yield an optimal multiple sequence alignment;
describe the Feng-Doolittle algorithm for progressive multiple sequence alignment;
draw a suffix tree for a given string;
find the suffix array for a given string;
describe how the Burrows-Wheeler Matrix and Burrows-Wheeler Transform are derived;
describe how the Burrows-Wheeler Matrix and the initial string can be reconstructed from the Burrows-Wheeler Transform.
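
As an illustration of the sum of pairs method named in the objectives above, here is a minimal sketch in Python. The scoring values (match = 1, mismatch = -1, gap = -2, gap against gap = 0) and the function names pair_score and sum_of_pairs are assumptions chosen for this example, not values taken from the lecture; a real application would normally use a substitution matrix.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned symbols (values are illustrative assumptions)."""
    if a == "-" and b == "-":
        return 0  # a gap aligned with a gap contributes nothing
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, **scores):
    """Add up the pairwise scores for every column of a multiple alignment."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        # Every unordered pair of symbols in the column is scored once.
        for a, b in combinations(column, 2):
            total += pair_score(a, b, **scores)
    return total


alignment = ["ACG-T",
             "AC-GT",
             "ACGGT"]
print(sum_of_pairs(alignment))  # prints 3
```

With these assumed scores, the three fully matching columns contribute 3 each and the two columns containing a gap contribute -3 each, giving a total of 3.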

Supplementary Material

The lecture handout, featuring some of the lecture slides, is available on-line (one per page, four per page).

The Feng-Doolittle algorithm for progressive multiple sequence alignment is described in:

Feng, D.F. and Doolittle, R.F. (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol., 25, 351-360 (Journal web site)

Wikipedia entry for Suffix tree

Example showing suffix array, the Burrows-Wheeler Matrix, the Burrows-Wheeler Transform, and reconstruction of the Burrows-Wheeler Matrix and the initial string from the Burrows-Wheeler Transform.
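
The relationships between the suffix array, the Burrows-Wheeler Matrix and the Burrows-Wheeler Transform can also be sketched in a few lines of Python. This is a naive illustration rather than an efficient implementation: it assumes the string ends with a unique terminator "$" that sorts before every other character, and the function names are chosen for this example only.

```python
def suffix_array(text):
    """Start positions of the suffixes of text, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwm(text):
    """Burrows-Wheeler Matrix: all rotations of text, sorted."""
    return sorted(text[i:] + text[:i] for i in range(len(text)))


def bwt(text):
    """Burrows-Wheeler Transform: the last column of the matrix."""
    return "".join(row[-1] for row in bwm(text))


def inverse_bwt(last_column):
    """Rebuild the matrix and the initial string from the transform.

    Repeatedly prepend the transform as a new first column and re-sort;
    after len(last_column) rounds the rows are the sorted rotations.
    """
    rows = [""] * len(last_column)
    for _ in range(len(last_column)):
        rows = sorted(c + row for c, row in zip(last_column, rows))
    # The initial string is the rotation that ends with the terminator.
    original = next(row for row in rows if row.endswith("$"))
    return rows, original


text = "banana$"
sa = suffix_array(text)
print(sa)          # [6, 5, 3, 1, 0, 4, 2]
print(bwt(text))   # annb$aa

# Each BWT character is the one that precedes the corresponding suffix
# (index -1 wraps round to the terminator when the suffix starts at 0).
print("".join(text[i - 1] for i in sa) == bwt(text))  # True

matrix, original = inverse_bwt(bwt(text))
print(original)    # banana$
```

Sorting whole suffixes and rotations keeps the sketch short but takes far more time and memory than the methods used in practice for long sequences.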

Last Modified: 24 February 2016 by Graham Kemp