Structural Bioinformatics

- an applied algorithms course

"Structural bioinformatics" (TDA506) is an applied algorithms course that is relevant for computing science students, particularly those in the Algorithms track of the CSALL programme. Understanding the algorithms that are used in addressing important problems in structural bioinformatics is an important part of the course. Several Chalmers computing science students with an interest in algorithms have chosen to do Masters projects in this area.

The following list gives examples of some of the tasks that are performed in structural bioinformatics, and the algorithms that can be used in addressing these. This gives a flavour of the algorithmic content of the course, and why it is an appropriate course for students in the Algorithms track.

Domain assignment

a graph partitioning problem, minimum cut.
solutions featured include heuristic Kernighan-Lin algorithm, and an algorithm that uses the fact that, in this application, the nodes in the graph are connected in sequence (along a protein chain).
also introduce Voronoi diagrams, used here in calculating the weights on the edges between nodes in the graph.

Macromolecular structure determination by X-ray crystallography

includes 3D skeletonisation algorithm, based on image processing method.

Macromolecular structure determination by NMR spectroscopy

clustering (with ensembles of conformations).
identifying cores (rigid substructures) across an ensemble of structures.

Structural alignment

Finding correspondences between two subsets of points in 3D that minimises the root mean square deviation.
dynamic programming is used in one approach.
sub-matrix matching, with branch-and-bound algorithm used in another.
fast heuristic method using geometric hashing.
least squares optimisation.

Comparative modelling

constraint logic programming approach to side-chain modelling.
maximum likelihood.
statistical mechanics.
methods for efficient clash detection (finding overlapping spheres).

Molecular mechanics

Monte Carlo optimisation, using Metropolis criterion.
simulated annealing.

Fold recognition

algorithms based on profiles and pairwise potentials.
again, dynamic programming is mentioned (needed to deal with protein chains of different length).

Protein folding

"Rosetta" algorithm, which uses simulated annealing with Bayesian scoring functions.
another conformational search method is described; this one has similarities to the CKY chart parsing algorithm in computational linguistics.
in addition to folding "real" protein chains, a simplified toy problem is presented, in which one searches for the self-avoiding walk on a 2D (or 3D) lattice that maximises a scoring function.

Protein docking

includes method based on maximising overlap of radial density functions.
methods for finding (local) "knobs and holes" on molecular surfaces.
combinatorial docking algorithm (for large multimeric proteins).
fitting data sets with different resolutions; includes 3D versions of smoothing and edge detection methods from image processing.

Membrane proteins

3D packing problem with symmetry constraints.
algorithms for identifying and characterising channels in proteins.
HMM representation of membrane protein folds.

Near the end of the course, a colleague from a local computational drug design company gives some lectures. These include:

application of genetic algorithms (in modelling carbohydrate structures)
algorithms for conformational search in drug design
search methods for virtual screening.

At the end of this course, students with a computing background would be well prepared for further research or employment in, for example, computational drug design. Several previous Masters students have done Masters projects in this area, and some have gone on to PhD positions in this area. In any case, students will have seen how knowledge of algorithms and data structures can be applied to a wide range of problems in an application area (structural biology).

Last Modified: 28 September 2011 by Graham Kemp