Bioinformatics (2015/2016)
Lecture GK-6
More on Perl and sequence alignment
Aims
- To present more examples of Perl programs.
- To describe methods for constructing multiple sequence alignments.
- To introduce suffix tries, suffix trees, suffix arrays,
  the Burrows-Wheeler Matrix (BWM), the Burrows-Wheeler Transform (BWT),
  and the relationships between these.
Objectives
After this lecture you will be able to:
- use Perl together with other Unix programs, passing input to your
  Perl program from other programs and output from your Perl program
  to other programs;
- describe and apply the sum of pairs method for scoring multiple alignments;
- discuss how multi-dimensional dynamic programming can yield an
  optimal multiple sequence alignment;
- describe the Feng-Doolittle algorithm for progressive multiple
  sequence alignment;
- draw a suffix tree for a given string;
- find the suffix array for a given string;
- describe how the Burrows-Wheeler Matrix and Burrows-Wheeler Transform
  are derived;
- describe how the Burrows-Wheeler Matrix and the initial string can be
  reconstructed from the Burrows-Wheeler Transform.
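As an illustration of the sum of pairs method named in the objectives, the following Python sketch scores a small multiple alignment by adding up a pairwise score for every pair of rows in every column. The scoring values (MATCH, MISMATCH, GAP), the choice to score a gap aligned with a gap as zero, and the example alignment are assumptions made for illustration only; they are not taken from the lecture, which may use a substitution matrix and a different gap penalty.

```python
from itertools import combinations

# Hypothetical scoring parameters, chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one pair of aligned symbols.

    Assumption: a gap aligned with a gap scores 0.
    """
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment):
    """Sum pair_score over every pair of rows in every column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


alignment = ["ACGT", "A-GT", "ACGA"]
print(sum_of_pairs(alignment))  # prints 2
```

With these assumed values the four columns contribute 3, -3, 3 and -1, giving a total of 2.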
Supplementary Material
The lecture handout, featuring some of the lecture slides, is
available on-line (one per page, four per page).
The Feng-Doolittle algorithm for progressive multiple sequence alignment is
described in:
- Feng, D.F. and Doolittle, R.F. (1987)
  Progressive sequence alignment as a prerequisite to correct phylogenetic trees.
  J. Mol. Evol., 25, 351-60 (Journal web site)
Wikipedia entry for Suffix tree
Example showing suffix array, the Burrows-Wheeler Matrix, the
Burrows-Wheeler Transform, and reconstruction of the Burrows-Wheeler Matrix
and the initial string from the Burrows-Wheeler Transform.
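A small instance of these constructions can also be sketched in Python. The sketch below assumes the string ends with a terminator "$" that occurs nowhere else and sorts before every other character, and it uses plain sorting for clarity rather than an efficient construction. The function names and the example string "banana$" are chosen for illustration and are not necessarily those used in the lecture.

```python
def suffix_array(text):
    """Start positions of the suffixes of text, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwm(text):
    """Burrows-Wheeler Matrix: all rotations of text, sorted."""
    return sorted(text[i:] + text[:i] for i in range(len(text)))


def bwt(text):
    """Burrows-Wheeler Transform: the last column of the matrix."""
    return "".join(row[-1] for row in bwm(text))


def inverse_bwt(last):
    """Rebuild the matrix and the initial string from the transform.

    Each pass prepends the transform as a new first column and re-sorts;
    after len(last) passes the rows are the sorted rotations.
    """
    rows = [""] * len(last)
    for _ in range(len(last)):
        rows = sorted(c + row for c, row in zip(last, rows))
    # The initial string is the rotation that ends with the terminator.
    original = next(row for row in rows if row.endswith("$"))
    return rows, original


text = "banana$"
sa = suffix_array(text)
print(sa)         # prints [6, 5, 3, 1, 0, 4, 2]
print(bwt(text))  # prints annb$aa

# Relationship between the two: each BWT character is the one that
# precedes the corresponding suffix (wrapping round for suffix 0).
print("".join(text[i - 1] for i in sa) == bwt(text))  # prints True

matrix, original = inverse_bwt(bwt(text))
print(original)   # prints banana$
```

Because the terminator sorts first, sorting the rotations and sorting the suffixes give the same order, which is why the suffix array indexes the rows of the matrix.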
Last Modified: 24 February 2016 by Graham Kemp