Structural Bioinformatics (2011/2012)
Structural Bioinformatics
- an applied algorithms course
"Structural bioinformatics" (TDA506) is an applied algorithms course that is
relevant for computing science students, particularly those in the Algorithms
track of the CSALL programme.
Understanding the algorithms that are used in addressing important problems
in structural bioinformatics is an important part of the course.
Several Chalmers computing science students with an interest in algorithms
have chosen to do Masters projects in this area.
The following list gives examples of some of the tasks that are performed in
structural bioinformatics, and the algorithms that can be used in addressing
these. This gives a flavour of the algorithmic content of the course, and
why it is an appropriate course for students in the Algorithms track.
Domain assignment
-
a graph partitioning problem, minimum cut.
-
solutions featured include heuristic Kernighan-Lin algorithm,
and an algorithm that uses the fact that, in this application, the
nodes in the graph are connected in sequence (along a protein chain).
-
also introduce Voronoi diagrams, used here in calculating the weights
on the edges between nodes in the graph.
Macromolecular structure determination by X-ray crystallography
-
includes 3D skeletonisation algorithm, based on image processing method.
Macromolecular structure determination by NMR spectroscopy
-
clustering (with ensembles of conformations).
-
identifying cores (rigid substructures) across an ensemble of structures.
Structural alignment
-
Finding correspondences between two subsets of points in 3D that
minimises the root mean square deviation.
-
dynamic programming is used in one approach.
-
sub-matrix matching, with branch-and-bound algorithm used in another.
-
fast heuristic method using geometric hashing.
-
least squares optimisation.
Comparative modelling
-
constraint logic programming approach to side-chain modelling.
-
maximum likelihood.
-
statistical mechanics.
-
methods for efficient clash detection (finding overlapping spheres).
Molecular mechanics
-
Monte Carlo optimisation, using Metropolis criterion.
-
simulated annealing.
Fold recognition
-
algorithms based on profiles and pairwise potentials.
-
again, dynamic programming is mentioned (needed to deal with protein
chains of different length).
Protein folding
-
"Rosetta" algorithm, which uses simulated annealing with Bayesian
scoring functions.
-
another conformational search method is described; this one has
similarities to the CKY chart parsing algorithm in computational
linguistics.
-
in addition to folding "real" protein chains, a simplified toy problem
is presented, in which one searches for the self-avoiding walk on a 2D
(or 3D) lattice that maximises a scoring function.
Protein docking
-
includes method based on maximising overlap of radial density functions.
-
methods for finding (local) "knobs and holes" on molecular surfaces.
-
combinatorial docking algorithm (for large multimeric proteins).
-
fitting data sets with different resolutions; includes 3D versions of
smoothing and edge detection methods from image processing.
Membrane proteins
-
3D packing problem with symmetry constraints.
-
algorithms for identifying and characterising channels in proteins.
-
HMM representation of membrane protein folds.
Near the end of the course, a colleague from a local computational drug
design company gives some lectures. These include:
-
application of genetic algorithms (in modelling carbohydrate structures)
-
algorithms for conformational search in drug design
-
search methods for virtual screening.
At the end of this course, students with a computing background would be
well prepared for further research or employment in, for example,
computational drug design. Several previous Masters students have done
Masters projects in this area, and some have gone on to PhD positions
in this area.
In any case, students will have seen how knowledge of algorithms and
data structures can be applied to a wide range of problems in an application
area (structural biology).
Last Modified: 28 September 2011
by Graham Kemp