Structural Bioinformatics (2011/2012)

Practical GK-4

Protein domains

Aims

Objectives

After this practical you will:

Exercises

  1. Protein Data Bank entry 2CSN contains the structure of Casein Kinase-1 from Schizosaccharomyces pombe.

    1. Study the contact map for this protein kinase. Identify structural domains and contacts between elements of secondary structure.

    2. Look at the structure of 2CSN using RasMol and relate features identified in the distance map to the three-dimensional structure. You might find it useful to look at a backbone trace, and to colour segments of the main chain.

  2. Write a C program that can read a Protein Data Bank file and generate a distance map.

    A simple approach is to make use of the "dotplot.tcl" program in directory /chalmers/users/kemp/TDA506/practical4. This program reads a file with pairs of numbers (one pair of numbers per line), and plots a point for each pair of numbers.

    You can write a program that reads the coordinates of the "CA" atoms in a PDB file and then, for each pair of CA atoms (the CA atom in residue i and the CA atom in residue j), writes out the pair of numbers i and j if the distance between the two atoms is less than a threshold.

    I recommend that you use the program atom_array.c from Practical 2 as a starting point. Copy this file to your own filespace, give it a suitable name (e.g. make_distance_map.c), and modify the appropriate lines in a Makefile in the same directory to refer to the new program. Modify the function read_data() so that atom records are stored in the atom array only if the atom name is " CA ". After all CA atoms have been read into the atom array, find all pairs whose separation is less than a threshold distance (e.g. 7 Å).

    Run this program and redirect the output to a file, e.g.

    ./make_distance_map 2CSN.pdb > 2CSN.pairs
    

    The program /chalmers/users/kemp/TDA506/practical4/dotplot.tcl can then be used to plot the points as a distance map. To run this program, type its name and give the name of your file containing pairs of numbers as a command line argument, e.g.

    /chalmers/users/kemp/TDA506/practical4/dotplot.tcl 2CSN.pairs
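
    The core of exercise 2 can be sketched as follows. This is a minimal illustration and not the contents of atom_array.c: the names CaAtom, parse_ca_line(), read_ca_atoms() and write_close_pairs() are hypothetical, and the sketch assumes fixed-column PDB ATOM records, keeps only the first alternate location (blank or 'A'), and assumes that dotplot.tcl accepts two whitespace-separated integers per line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type: one entry per CA atom. */
typedef struct {
    int resnum;      /* residue sequence number (PDB columns 23-26) */
    double x, y, z;  /* coordinates in Angstroms (PDB columns 31-54) */
} CaAtom;

/* Copy `width` characters starting at index `start` and convert to double. */
static double field_to_double(const char *line, int start, int width)
{
    char buf[16];
    memcpy(buf, line + start, (size_t)width);
    buf[width] = '\0';
    return atof(buf);
}

/* Return 1 and fill *atom if `line` is an ATOM record whose name is " CA ". */
int parse_ca_line(const char *line, CaAtom *atom)
{
    if (strlen(line) < 54) return 0;                    /* too short for x, y, z */
    if (strncmp(line, "ATOM  ", 6) != 0) return 0;      /* record name */
    if (strncmp(line + 12, " CA ", 4) != 0) return 0;   /* atom name, columns 13-16 */
    if (line[16] != ' ' && line[16] != 'A') return 0;   /* assumption: skip other altLocs */
    atom->resnum = (int)field_to_double(line, 22, 4);
    atom->x = field_to_double(line, 30, 8);
    atom->y = field_to_double(line, 38, 8);
    atom->z = field_to_double(line, 46, 8);
    return 1;
}

/* Read CA atoms from an open PDB file; return the number stored. */
int read_ca_atoms(FILE *fp, CaAtom *atoms, int max_atoms)
{
    char line[256];
    int n = 0;
    while (n < max_atoms && fgets(line, sizeof line, fp) != NULL) {
        if (parse_ca_line(line, &atoms[n])) n++;
    }
    return n;
}

/* Write "i j" for every ordered pair of CA atoms closer than `threshold`.
   Both (i, j) and (j, i) are written, so the plot is symmetric, and the
   diagonal (i, i) is included because its distance is zero.
   Returns the number of pairs written. */
int write_close_pairs(FILE *out, const CaAtom *atoms, int n, double threshold)
{
    int i, j, count = 0;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            double dx = atoms[i].x - atoms[j].x;
            double dy = atoms[i].y - atoms[j].y;
            double dz = atoms[i].z - atoms[j].z;
            /* Compare squared distances to avoid calling sqrt(). */
            if (dx * dx + dy * dy + dz * dz < threshold * threshold) {
                fprintf(out, "%d %d\n", atoms[i].resnum, atoms[j].resnum);
                count++;
            }
        }
    }
    return count;
}
```

    A main() function would open the file named in argv[1], call read_ca_atoms() on it, and then call write_close_pairs(stdout, atoms, n, 7.0), so that the output can be redirected to a file as in the example command above.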
    
  3. (For those who want more programming practice.)
    Write a program that can read a Protein Data Bank file (assume that it contains only one chain, like Protein Data Bank entry 1CDH) and identify the residue at which the chain is most clearly partitioned into two parts (domains). Your program should use the simple scoring function that is used in the DOMAK program: (intA/extAB)*(intB/extAB)
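
    One way to evaluate this score for every possible cut point is sketched below. It rests on assumptions that are not stated in this handout: intA and intB are taken to be the numbers of residue-residue contacts within parts A and B, extAB the number of contacts between the two parts, and two residues are counted as in contact when their CA atoms are closer than a threshold. The names Point, in_contact() and best_split(), and the min_size parameter (which stops very small parts from being proposed as domains), are hypothetical.

```c
#include <stddef.h>

/* Hypothetical type: CA coordinates of one residue, in chain order. */
typedef struct { double x, y, z; } Point;

/* Assumption: two residues are in contact if their CA atoms are closer
   than `threshold` Angstroms. */
static int in_contact(const Point *p, const Point *q, double threshold)
{
    double dx = p->x - q->x, dy = p->y - q->y, dz = p->z - q->z;
    return dx * dx + dy * dy + dz * dz < threshold * threshold;
}

/* Try every single cut of the chain into A = residues 0..k and
   B = residues k+1..n-1, and return the k that maximises
   (intA/extAB)*(intB/extAB). Returns -1 if no cut is acceptable.
   If best_score is not NULL, the best score is stored there. */
int best_split(const Point *ca, int n, double threshold, int min_size,
               double *best_score)
{
    long int_a = 0, int_b = 0, ext_ab = 0;
    int i, j, k, best_k = -1;
    double best = -1.0;

    /* Start with every residue in part B: all contacts are internal to B. */
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (in_contact(&ca[i], &ca[j], threshold)) int_b++;

    /* Move residue k from B to A and update the three counts. */
    for (k = 0; k < n - 1; k++) {
        long with_a = 0, with_b = 0;
        double score;

        for (j = 0; j < k; j++)
            if (in_contact(&ca[k], &ca[j], threshold)) with_a++;
        for (j = k + 1; j < n; j++)
            if (in_contact(&ca[k], &ca[j], threshold)) with_b++;

        int_a += with_a;  ext_ab -= with_a;  /* A-k contacts become internal to A */
        int_b -= with_b;  ext_ab += with_b;  /* k-B contacts become external */

        /* Skip parts that are too small, and cuts with no A-B contacts
           (the score would require a division by zero). */
        if (k + 1 < min_size || n - (k + 1) < min_size || ext_ab == 0)
            continue;

        score = ((double)int_a / ext_ab) * ((double)int_b / ext_ab);
        if (score > best) {
            best = score;
            best_k = k;
        }
    }
    if (best_score != NULL) *best_score = best;
    return best_k;
}
```

    The counts are updated incrementally as the cut point moves along the chain, so the whole scan takes time proportional to the square of the number of residues rather than its cube. The returned index is a 0-based position in the array of CA atoms, so it still has to be mapped back to the residue number read from the PDB file.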


Last Modified: 24 January 2012 by Graham Kemp