Knowing a protein's three-dimensional structure can give insights into the molecular basis for its biological function and improve our understanding of disease processes. Because experimental methods for determining protein structures are time-consuming and expensive, there is a demand for computational methods that can help predict a protein's structure.
Models of protein structures can be built using a dynamic programming algorithm that constructs longer fragments from pairs of shorter ones [1].
In our current implementation we build models starting with very small "building blocks". We believe that better models could be built if we started with longer fragments taken from a large collection of known protein structures.
Figure 1: Graphical presentation of how a long fragment is constructed from shorter ones using a dynamic programming approach. A long fragment spanning positions 1-68 is modelled by combining a fragment spanning positions 1-63 and a fragment spanning positions 64-68. A fragment spanning positions 7-12 is modelled by combining a fragment spanning positions 7-9 and a fragment spanning positions 10-12. And so on.
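The idea in Figure 1 can be illustrated with a minimal sketch of interval-style dynamic programming. This is not the implementation from [1]: the functions build_spans and leaves, the cost functions base_cost and join_cost, and the parameter max_base_len are all hypothetical names introduced here for illustration, and a real system would score fragments using structural constraints rather than the toy integer costs shown.

```python
def build_spans(n, base_cost, join_cost, max_base_len=3):
    """Fill a table best[(i, j)] = (cost, split) for every span i..j (1-indexed, inclusive).

    split is None when the span is used directly as a building block,
    otherwise it is the position k such that i..k and k+1..j were combined.
    base_cost and join_cost are hypothetical scoring functions (assumptions).
    """
    best = {}
    for length in range(1, n + 1):            # shorter spans are solved first
        for i in range(1, n - length + 2):
            j = i + length - 1
            candidates = []
            if length <= max_base_len:        # short spans may be building blocks
                candidates.append((base_cost(i, j), None))
            for k in range(i, j):             # try every way to split the span in two
                left_cost = best[(i, k)][0]
                right_cost = best[(k + 1, j)][0]
                candidates.append((left_cost + right_cost + join_cost(i, k, j), k))
            best[(i, j)] = min(candidates, key=lambda c: c[0])
    return best


def leaves(best, i, j):
    """Return the building blocks (as spans) used to construct span i..j."""
    _, k = best[(i, j)]
    if k is None:
        return [(i, j)]
    return leaves(best, i, k) + leaves(best, k + 1, j)


if __name__ == "__main__":
    # Toy costs (assumed): every building block costs 10, every join costs 1.
    table = build_spans(6, lambda i, j: 10, lambda i, k, j: 1)
    print(table[(1, 6)], leaves(table, 1, 6))
```

In this sketch, allowing longer building blocks simply means raising max_base_len or seeding the table with costs for fragments taken from known structures, which is the kind of extension the project has in mind.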
The aim of this project is to make intelligent use of the vast amount of structural information that is available today in the Protein Data Bank, and thus build higher-quality models.
Our hope is that large-scale data analysis will uncover patterns that can guide the selection of longer molecular fragments for use when building protein models.
[1] Wånggren, M., Billeter, M. and Kemp, G.J.L. (2016) Computational protein modelling based on limited sets of constraints. In Proceedings of the 12th International Workshop on Constraint-Based Methods for Bioinformatics, pp. 99-113.
The course "Computational methods in bioinformatics" (Chalmers: TDA507, GU: DIT741) is recommended, but not required.