Practical 4 b): Introduction to protein structure prediction

Bert de Groot

Secondary structure prediction of alpha-dendrotoxin
Tertiary structure prediction
Homology modeling


During part a) of the practical the emphasis was on protein sequence retrieval and analysis. We will now turn towards protein structure and focus on what can be deduced about a protein's structure from its sequence. Specifically, we will predict the structure of a small protein based on its sequence similarity to another protein of known structure.

We are going to predict the structure of the alpha-dendrotoxin from the green mamba snake. This is the toxin contained in the venom of the green mamba that endangers the prey after a bite.

First, we will extract the toxin sequence from the UniProt database. Search for "alpha-dendrotoxin" after selecting "UniProtKB". Click on the required sequence (it should be the first one listed in the UniProtKB database: IVBI_DENAN (P00980)). At the top of the page, click on Format, right-click 'Text', and save the Swiss-Prot entry to a local file called "venom.swissprot". Also save the sequence in FASTA format (top right) to a file such as venom.fasta, for later use.
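To check that the FASTA file was saved correctly, the sequence can be read back with a few lines of Python. The following is a minimal sketch, not part of the original practical: the function name read_fasta and the toy record are made up for illustration, and the toy sequence is not the toxin sequence.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith(">"):
            header = line[1:]  # header line without the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


# Hypothetical record, used only to demonstrate the parser.
example = ">toy_record made-up header\nACDEFGHIK\nLMNPQRSTVWY\n"

for name, sequence in read_fasta(example).items():
    print(name, len(sequence))
```

For the file saved above, the same function could be called as read_fasta(open("venom.fasta").read()), assuming that file name was used.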

Secondary structure prediction of alpha-dendrotoxin

As discussed in the lectures, a protein's sequence (primary structure) can be used as a basis for predicting its secondary structure. Such methods rely on the fact that different amino acids, and combinations of amino acids, have different preferences for different types of secondary structure. Alanine, for example, is often found in alpha helices, whereas prolines are known to destabilise helices. Automated procedures exist that have optimised prediction algorithms against a databank of proteins with known structures.

One such prediction program is available as an online server: the JPRED secondary structure prediction server. Paste the venom sequence (one-letter code) into the main window and do not forget to tick the checkbox under "Skip searching PDB before prediction" before hitting the 'Run' button. The server may take some time to complete, after which the prediction is presented. View the results in HTML format. You'll notice that the JPRED server first carried out a multiple sequence alignment before presenting the secondary structure prediction.

Question: Why do you think this is?

The prediction is presented near the bottom of the window, in the line starting with "Jnet". A dash (-) stands for unstructured (i.e. neither helix nor sheet), E stands for extended, or sheet, and H stands for helix. As you can see, the server predicts the protein to start from the N-terminus with an unstructured loop, followed by two beta strands and a short helix.
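Reading segment boundaries off a long prediction line by eye is error-prone, so it can help to convert the line into a list of segments. The sketch below assumes the prediction has been copied as a plain string of '-', 'E' and 'H' characters; the function name segments and the toy string are hypothetical, and the toy string is not the actual JPRED output for the toxin.

```python
from itertools import groupby

# Meaning of the symbols as described in the text above.
LABELS = {"-": "coil", "E": "strand", "H": "helix"}


def segments(prediction):
    """Return (label, first, last) tuples with 1-based, inclusive positions."""
    result = []
    position = 1
    for symbol, run in groupby(prediction):
        length = len(list(run))
        label = LABELS.get(symbol, symbol)  # unknown symbols are kept as-is
        result.append((label, position, position + length - 1))
        position += length
    return result


# Made-up prediction string, for illustration only.
toy_prediction = "-----EEEEE---EEEEE----HHHHHH--"

for label, first, last in segments(toy_prediction):
    print(f"{label:6s} {first:3d} - {last:3d}")
```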

Tertiary structure prediction

Now that we have the sequence of our protein of interest, we need a suitable template: the structure of a homologous protein on which we can base a model of the venom structure. For this, we visit the Protein Data Bank. The protein we're going to use as a template is the bovine (cow) pancreatic trypsin inhibitor. In the search field, search for "trypsin inhibitor bovine". Among the search results select "4PTI" (or search for it directly), and from the main 4PTI window, select "Download/Display File". On the Download File menu, select the corresponding "PDB" format and no compression. You should be prompted for a location in which to save the file "4PTI.pdb".

Have a look at the structure by starting rasmol. In a unix window, on the command prompt, type:

rasmol 4PTI.pdb 

Or, if rasmol doesn't work well, try pymol instead:

pymol 4PTI.pdb 

Please note that the commands in the gray boxes can be easily transferred to the command prompt with copy-and-paste (select text by dragging the mouse over it with the left mouse button pressed, and paste by pressing the middle mouse button).

We now see a so-called wireframe representation of the protein structure: atoms (with different colors for the different chemical elements: grey for carbon; red for oxygen and blue for nitrogen) are not shown directly, but the bonds between atoms are shown as lines. Under "display", also try other representations such as "sticks", "spacefill", "ball & stick" and "cartoons". Note that the structure starts with a long, unstructured loop, followed by a beta-hairpin (a two-stranded beta-sheet) and ends with a short alpha-helix. Exit rasmol under "file" -> "exit".
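The viewer builds these representations from the ATOM records in the PDB file, which store atom names, residue names, residue numbers and coordinates in fixed columns. As a rough illustration, the sketch below pulls the C-alpha atoms out of PDB-formatted text and derives the one-letter sequence. The helper names (toy_atom, ca_atoms, sequence_from_pdb) and the toy coordinates are made up for this example, and complications such as alternate locations or multiple chains are ignored.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def toy_atom(serial, name, residue, number, x, y, z):
    """Hypothetical helper: format one fixed-column ATOM record (chain A)."""
    return "ATOM  %5d %-4s %3s A%4d    %8.3f%8.3f%8.3f" % (
        serial, name, residue, number, x, y, z)


def ca_atoms(pdb_text):
    """Yield (residue_name, residue_number, (x, y, z)) per C-alpha atom."""
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, 18-20 the residue name,
        # 23-26 the residue number, 31-54 the x, y, z coordinates.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield line[17:20], int(line[22:26]), xyz


def sequence_from_pdb(pdb_text):
    """One-letter sequence from the C-alpha atoms; unknown residues give 'X'."""
    return "".join(THREE_TO_ONE.get(res, "X") for res, _, _ in ca_atoms(pdb_text))


# Made-up coordinates, for illustration only.
toy_pdb = "\n".join([
    toy_atom(1, " N  ", "ARG", 1, 0.0, 0.0, 0.0),
    toy_atom(2, " CA ", "ARG", 1, 1.0, 2.0, 3.0),
    toy_atom(3, " CA ", "PRO", 2, 4.0, 5.0, 6.0),
    toy_atom(4, " CA ", "ASP", 3, 7.0, 8.0, 9.0),
])

print(sequence_from_pdb(toy_pdb))
```

With the downloaded template, sequence_from_pdb(open("4PTI.pdb").read()) should give a quick view of the residues present in the file.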

Homology modeling

Now we have everything we need to predict a tertiary structure of the alpha-dendrotoxin from the green mamba snake: its sequence and a structure of a homologous template. For building the model, we use the "WHAT IF" molecular modeling package. To start WHAT IF, type:


If this gives an error, please try to change to your home directory ("cd") and try again.

On the WHAT IF prompt, load the template structure with:

getmol 4PTI.pdb

and press enter if WHAT IF asks for a name. We first need to align the venom sequence that we retrieved earlier with the sequence of the bovine trypsin inhibitor structure that we've just loaded. For this, we need the protein sequence corresponding to the loaded structure. WHAT IF can extract it:


For residue range, type


and as output file name take:


Now enter the sequence menu:


First load the sequence that corresponds to the template structure:

getseq bpti.pir

As format, choose


Now load the sequence of the green mamba venom:

getseq venom.swissprot

and choose the Swissprot format (3). We now have both sequences loaded and can perform the alignment:


For the first sequence, choose


and for the second


and choose default values for the gap-open penalty and the gap-elongation penalty. We see that the percentage of sequence identity is only 37%. So our task is now to predict a protein structure from a template whose sequence differs at almost two-thirds of its positions! First, write out the aligned sequences for later use:


And now the second sequence.


And now it is time to build the actual model:

%getpir template.pir
%getpir model.pir
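The sequence identity reported during the alignment step can be reproduced by hand from two aligned sequences. The sketch below is only an illustration under stated assumptions: the function name percent_identity is made up, the toy sequences are not the real template and venom sequences, and identity is counted over columns in which both sequences have a residue. Programs differ in the denominator they use, so WHAT IF may report a slightly different number for the same alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Made-up aligned sequences, for illustration only.
toy_template = "ACDEFG-HIK"
toy_target = "ACDQFGWHLK"

print(round(percent_identity(toy_template, toy_target), 1))
```

In the toy case, 7 of the 9 gap-free columns match.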

Since we chose to use the "Slow but good" version of the structure prediction module, WHAT IF will take a moment to complete. As soon as the WHAT IF prompt returns, write out the model structure with:


and exit WHAT IF


View the structure with

rasmol model.pdb

We will now validate our model structure using a protein structure validation server. This server compares the structure to a database of known structures and checks whether the geometry (bond lengths and angles), atom contacts etc. are comparable to those of other protein structures. For this we will use the MolProbity server. Visit the main page and start. Upload the model (model.pdb) using the browse button and enter the main page. After the calculation is finished, press "Continue".

Now we can analyse the results and view e.g. the main Ramachandran plot (click "Analyse geometry without all-atom contacts" and after that "Run programs"). Look at the Ramachandran plot, in either kinemage or PDF format. The Ramachandran plot depicts the backbone torsion angles, plus contour lines marking the most favoured regions (as found for other proteins). As can be seen, all residues are located in the favoured regions, so there are no outliers to worry about. Also check some of the other options and look for possible anomalies in the model structure.

Note that such tools can be extremely useful for identifying possible errors in model structures (or in experimentally determined structures), but that the real test for our model structure is the comparison to its x-ray structure. Therefore, we will now download the true structure from the Protein Data Bank. The entry is called 1DTX.pdb. Retrieve it from the server as we did before and download it to your local account. View the structure with

rasmol 1DTX.pdb
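The backbone torsion angles shown in the Ramachandran plot during validation are dihedral angles over four consecutive backbone atoms: phi is defined by C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). As a sketch of what a validation server computes for every residue, the function below returns the dihedral angle for four points. The function and helper names and the test points are made up for illustration; this is not MolProbity's own code.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to 180) around the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    length = math.sqrt(_dot(b2, b2))
    unit_b2 = (b2[0] / length, b2[1] / length, b2[2] / length)
    x = _dot(n1, n2)
    y = _dot(unit_b2, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))


# Made-up points: the last atom is rotated a quarter turn around the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1)))
```

The two toy calls give angles of equal size and opposite sign, which is how the plot distinguishes, for example, left-handed from right-handed helical regions.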

Question: How good was the secondary structure prediction?

Comparing the model structure with the x-ray structure is easiest with the two structures superimposed, so that we can see atom by atom where the main differences are located. This can be done with the program gmx confrms. This program needs an additional index file, the generation of which would go beyond the scope of this course. It can be obtained here (right click and "Save link as..."; if the file is just shown in the browser, copy&paste the contents to a file called index.ndx). Run gmx confrms with the following options:

gmx confrms -f1 1DTX.pdb -f2 model.pdb -n1 index.ndx -o fit_whatif_xray.pdb 

When prompted, select "4" to use the protein backbone for fitting. gmx confrms prints that the overall deviation between the two structures (measured over all atoms in the protein backbone, so excluding the side chain atoms) is about 0.1 nm.
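The deviation reported here is a root-mean-square deviation (RMSD) over the selected atoms. The sketch below shows only the final RMSD step for two coordinate sets that are assumed to be already superimposed and listed in the same atom order; it leaves out the least-squares fit that gmx confrms performs first. The function name rmsd and the toy coordinates are made up. PDB files store coordinates in ångström, so the result is divided by 10 to obtain nm.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates in angstrom: each atom is displaced by exactly 1 angstrom.
toy_xray = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
toy_model = [(1.0, 0.0, 0.0), (1.5, 1.0, 0.0), (3.0, 0.0, 1.0)]

deviation_angstrom = rmsd(toy_xray, toy_model)
print(deviation_angstrom / 10.0, "nm")
```

Because the deviations are squared before averaging, a few badly placed atoms contribute more to the RMSD than many small differences.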

Question: Does that mean that our model is good or is that really a large deviation?

gmx confrms has written a PDB file with both structures superimposed:

rasmol fit_whatif_xray.pdb

To concentrate only on the protein, and remove the ions from the rasmol view, type in the rasmol command line:

restrict protein


color chain

The true structure is colored blue, our model structure red. As can be seen, the two structures are rather similar. The backbone in particular is well predicted by the model. Some side chains, however, show larger deviations.

Now, for comparison, we are going to build a model using an internet server, the SWISS-MODEL server. Note that this server requires a working E-mail address. Put your E-mail address in the specified field, provide a name and title, and paste the sequence of the snake venom in the sequence window (or use the SWISS-PROT access code: P00980). Before hitting the "Submit" button, scroll down, and next to "Use a specific template", specify the structure of bovine pancreatic trypsin inhibitor with "4pti" chain "A" as template. Now submit the request by hitting the "Submit" button.

Depending on the load of the server, it may take a couple of minutes for the model to finish. You may receive multiple E-mails reporting the status of your request. The last E-mail should contain the coordinates in PDB format as an attachment, but you can also download them immediately from the results page. If you do not receive this E-mail within a couple of minutes, you may retrieve the coordinates here and an index file here. Assuming your model is called "swissmodel.pdb", superimpose the coordinates onto the correct structure:

gmx confrms -f1 1DTX.pdb  -f2 swissmodel.pdb -n1 index_swiss.ndx -o fit_swiss_xray.pdb

And select "16" and "4" when prompted for a group. View the result with:

rasmol fit_swiss_xray.pdb

Also compare this model with the model generated by WHAT IF. Download the index file here, and then run:

gmx confrms -f1 model.pdb -f2 swissmodel.pdb -n1 index_whatif.ndx  -o fit_swiss_whatif.pdb

Select "11" and "4" when prompted for a group. View the result with:

rasmol fit_swiss_whatif.pdb

Question: How similar/different are the two models? Which of the two models would you prefer, and why?
