Graham Kemp's project pages

Master's thesis project suggestion

Learning from data to build better protein models

Background

Knowing a protein's three-dimensional structure can give insights into the molecular basis for its biological function, and improve our understanding of disease processes. Because experimental methods for determining protein structures are time-consuming and expensive, there is a demand for computational methods that can help predict a protein's structure.

Models of protein structures can be built using a dynamic programming algorithm that constructs longer fragments from pairs of shorter ones [1].

In our current implementation we build models starting with very small "building blocks". We believe that better models could be built if we started with longer fragments taken from a large collection of known protein structures.

Zipping and assembly data structure

Figure 1: Graphical representation of how a long fragment is constructed from shorter ones using a dynamic programming approach. A long fragment spanning positions 1-68 is modelled by combining a fragment spanning positions 1-63 and a fragment spanning positions 64-68. Similarly, a fragment spanning positions 7-12 is modelled by combining a fragment spanning positions 7-9 and a fragment spanning positions 10-12, and so on down to the smallest building blocks.
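The span-combining idea in Figure 1 can be illustrated with a minimal sketch in Python. This is not the implementation described in [1]: it keeps a single best score per span rather than a set of candidate structures, and the names build_span_table, decompose, leaf_score, join_score and max_leaf are hypothetical, introduced here only for illustration. The scoring functions are placeholders for whatever measure of model quality is actually used.

```python
def build_span_table(n, leaf_score, join_score, max_leaf=3):
    """Fill a table over all spans i..j (1-based, inclusive) of a chain of length n.

    best[(i, j)] = (score, split), where split is the position k at which the
    span is cut into i..k and k+1..j, or None if the span is used directly as
    a building block. Lower scores are treated as better (an assumption).
    """
    best = {}
    # Shorter spans are solved first so longer spans can reuse them.
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            candidates = []
            # Short spans may be taken directly as building blocks.
            if length <= max_leaf:
                candidates.append((leaf_score(i, j), None))
            # Otherwise (or additionally) try every way of joining two sub-spans.
            for k in range(i, j):
                left_score = best[(i, k)][0]
                right_score = best[(k + 1, j)][0]
                total = left_score + right_score + join_score(i, k, j)
                candidates.append((total, k))
            best[(i, j)] = min(candidates, key=lambda c: c[0])
    return best


def decompose(best, i, j):
    """Trace back which building blocks make up the span i..j."""
    _, k = best[(i, j)]
    if k is None:
        return [(i, j)]
    return decompose(best, i, k) + decompose(best, k + 1, j)


# Toy example: every building block costs 1.0 and joining is free, so the
# table prefers the decomposition with the fewest blocks.
best = build_span_table(
    6,
    leaf_score=lambda i, j: 1.0,
    join_score=lambda i, k, j: 0.0,
)
pieces = decompose(best, 1, 6)
print(best[(1, 6)][0], pieces)  # 2.0 [(1, 3), (4, 6)]
```

Raising max_leaf in this sketch corresponds loosely to the idea of starting from longer fragments: more spans can be used directly instead of being assembled from smaller pieces.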

Project description

The aim of this project is to make intelligent use of the vast amount of structural information that is available today in the Protein Data Bank, and thus build better-quality models.

Our hope is that large-scale data analysis will uncover patterns that can guide the selection of longer molecular fragments for use when building protein models.

References

[1] Wånggren, M., Billeter, M. and Kemp, G.J.L. (2016) Computational protein modelling based on limited sets of constraints. In Proceedings of the 12th International Workshop on Constraint-Based Methods for Bioinformatics, pp. 99-113.

Special prerequisites

The course "Computational methods in bioinformatics" (Chalmers: TDA507, GU: DIT741) is recommended, but not required.

Suggestion author

This project is suggested by Graham Kemp.
Last Modified: 28 October 2020 by Graham Kemp