During part a) of the practical the emphasis was on protein sequence
retrieval and analysis. We will now slowly turn towards protein
structure and focus on what can be deduced about a protein's structure
from its sequence. Specifically, we will predict the structure of
a small protein based on its sequence similarity to another protein
with known structure.
We are going to predict the structure of the alpha-dendrotoxin from
the green mamba snake. This is the toxin contained in the venom of the
green mamba that endangers the prey after a bite.
First, we will extract the toxin sequence from the UniProt
database. Search for "alpha-dendrotoxin" after selecting "UniProtKB". Click on the required sequence
(it should be the first one listed in the UniProtKB database: IVBI_DENAN (P00980)), and at the
top of the page, click on Format, right-click 'Text', and save the Swiss-Prot file to a local file
called "venom.swissprot". Also save the sequence in FASTA format
(top right) in a file such as venom.fasta, for later use.
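If you want to check the saved FASTA file from a script, the following minimal sketch reads the first record of FASTA-formatted text. The function name read_fasta and the example record are hypothetical; the example sequence is a placeholder, not the toxin sequence.

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # stop at the start of a second record
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(chunks)


# Placeholder record for illustration only; this is NOT the real toxin sequence.
example = ">sp|XXXXXX|EXAMPLE hypothetical record\nACDEFGHIKL\nMNPQRSTVWY\n"
header, sequence = read_fasta(example)
print(header)
print(sequence, len(sequence))
```

For the file saved above, the same function could be called as read_fasta(open("venom.fasta").read()).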
As discussed in the lectures, a protein's sequence (primary structure)
can be used as a basis for a prediction of its secondary
structure. The principle of such methods is based on the fact that
different amino acids and amino acid combinations have different
preferences for different types of secondary structure. Alanine, for
example, is often found in alpha helices, whereas proline is known to
destabilise helices. Automated procedures exist whose prediction
algorithms have been optimised against a databank of proteins with known
structures. One such prediction program is available as an online server:
the JPRED secondary structure prediction server. Paste the venom
sequence (one-letter code) into the main window and do not forget to
tick the checkbox under "Skip searching PDB before prediction" before
hitting the 'Run' button. The server may take some time to complete,
after which the prediction is presented. View the results in HTML format.
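The idea that residues have different preferences for secondary structure types can be illustrated with a toy sliding-window predictor. This is only a sketch and not the method used by JPRED: the propensity values are approximate numbers in the style of the classic Chou-Fasman tables, only a few residues are listed, and the window size, threshold, and test peptide are arbitrary assumptions.

```python
# Approximate helix propensities in the style of Chou and Fasman (assumed values,
# for illustration only). Values above 1 favour helix, values below 1 disfavour it.
# Residues not listed here are treated as neutral (1.0).
HELIX_PROPENSITY = {
    "E": 1.51, "A": 1.42, "L": 1.21, "K": 1.16,
    "S": 0.77, "N": 0.67, "P": 0.57, "G": 0.57,
}


def helix_scores(sequence, window=5):
    """Average helix propensity in a sliding window centred on each residue."""
    half = window // 2
    scores = []
    for i in range(len(sequence)):
        start = max(0, i - half)
        end = min(len(sequence), i + half + 1)
        values = [HELIX_PROPENSITY.get(aa, 1.0) for aa in sequence[start:end]]
        scores.append(sum(values) / len(values))
    return scores


def predict_helix(sequence, window=5, threshold=1.05):
    """Label a residue 'H' if its window average exceeds the threshold, else '-'."""
    scores = helix_scores(sequence, window)
    return "".join("H" if s > threshold else "-" for s in scores)


# Hypothetical peptide: an Ala/Glu-rich stretch followed by a Pro/Gly-rich stretch.
peptide = "AEAALKEAAPGPGSPG"
print(peptide)
print(predict_helix(peptide))
```

The alanine- and glutamate-rich half is labelled as helix and the proline- and glycine-rich half is not, which mirrors the preferences described above. Real predictors use far more information than single-residue averages.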
You'll notice that the JPRED server first carried out a multiple
sequence alignment before presenting the secondary structure
prediction. Why do you think this is?
The prediction is presented near the bottom of the window, in the line starting with "Jnet". A dash (-) stands for unstructured (i.e. neither helix nor sheet), E stands for extended, or sheet, and H stands for helix. As you can see, the server predicts the protein to start from the N-terminus with an unstructured loop, followed by two beta strands and a short helix.
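To turn a prediction line like this into a list of segments, the runs of identical symbols can be grouped. The sketch below assumes the three-symbol alphabet described above (H, E and -); the prediction string is hypothetical and is not the actual JPRED output for the toxin.

```python
from itertools import groupby

LABELS = {"H": "helix", "E": "strand", "-": "coil"}


def segments(prediction):
    """Return (label, start, end) tuples with 1-based, inclusive residue numbers."""
    result = []
    position = 1
    for state, run in groupby(prediction):
        length = len(list(run))
        result.append((LABELS.get(state, state), position, position + length - 1))
        position += length
    return result


# Hypothetical prediction string for illustration only.
jnet = "------EEEEE---EEEEEE----HHHHHH--"
for label, start, end in segments(jnet):
    print(f"{label:6s} {start:3d}-{end:3d}")
```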
Now that we have the sequence of our protein of interest, we need a
suitable template structure of a homologous protein on the basis of
which we can build a model of the venom structure. For this, we visit
the Protein Data Bank. The
protein we're going to use as a template is the bovine (cow)
pancreatic trypsin inhibitor. In the
search field, search for "trypsin inhibitor bovine". Among the search
results select "4PTI" (or search for it directly), and from the main 4PTI window, select
"Download/Display File". On the Download File menu,
select the corresponding "PDB" format and no compression.
You should be prompted for a location in which
to save the file "4PTI.pdb".
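The download can also be scripted. The sketch below is a minimal example using the Python standard library; the URL pattern is an assumption about the file server of the RCSB Protein Data Bank and may change, and the function names are hypothetical.

```python
import urllib.request


def pdb_url(pdb_id):
    """Build the download URL for an entry (the URL pattern is an assumption)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id, filename=None):
    """Fetch the entry over the network and save it; returns the file name used."""
    filename = filename or f"{pdb_id.upper()}.pdb"
    urllib.request.urlretrieve(pdb_url(pdb_id), filename)
    return filename


# Only the URL is printed here; nothing is downloaded until download_pdb is called.
print(pdb_url("4pti"))
```

Calling download_pdb("4pti") would then save the template as 4PTI.pdb in the current directory, provided the URL pattern is still valid.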
Have a look at the structure by starting rasmol. In a Unix window, at the command prompt, type:
Or, if rasmol doesn't work well, try pymol instead:
At the WHAT IF prompt, load the template structure with:
and choose default values for the gap-open penalty and the gap-elongation penalty. We see that the sequence identity is only 37%. So our task is now to predict a protein structure from a template whose sequence differs at almost two-thirds of the positions! First, write out the aligned sequences for later use:
makseq 1 template.pir 1
And now the second sequence.
makseq 2 model.pir 1
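The sequence identity quoted above can be recomputed from any pair of aligned sequences. The following sketch counts identical residues over the columns in which both sequences have a residue. This normalisation is an assumption: programs differ in whether they divide by the alignment length, the shorter sequence, or the number of aligned pairs, so the result need not match the WHAT IF number exactly. The aligned strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared


# Hypothetical aligned sequences for illustration only.
template = "ACDEFGHIK-L"
model = "ACDQFGH-KWL"
print(f"{percent_identity(template, model):.1f}% identical")
```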
%getpir template.pir
%getpir model.pir
%bldpir 1 2 all y
Since we chose to use the "Slow but good" version of the structure prediction module, WHAT IF will take a moment to complete. As soon as the WHAT IF prompt returns, write out the model structure with:
%makmol 4PTI.pdb model.pdb 0 all 0
We will now validate our model structure using a protein structure validation server. Such a server compares the structure to a database of known structures and checks whether the geometry (bond lengths and angles), atom contacts, etc. are comparable to those of other protein structures. For this we will use the MolProbity server. Visit the main page and start. Upload the model (model.pdb) using the browse button and enter the main page. After the calculation is finished, press "Continue". Now we can analyse the results and view, e.g., the main Ramachandran plot (click "Analyse geometry without all-atom contacts" and after that "Run programs"). Look at the Ramachandran plot, in either kinemage or PDF format. The Ramachandran plot depicts the backbone torsion angles together with contour lines marking the most favoured regions (as found for other proteins). As can be seen, all residues are located in the favoured regions, so there are no outliers to worry about. Also check some of the other options and look for possible anomalies in the model structure.
Note that such tools can be extremely useful for identifying possible errors in model structures (or in experimentally determined structures), but the real test for our model structure is the comparison to its X-ray structure. Therefore, we will now download the true structure from the Protein Data Bank. The entry is called 1DTX.pdb. Retrieve it from the server as we did before and download it to your local account. View the structure with
Comparing the model structure with the X-ray structure is easiest with the two structures superimposed, so that we can see, atom by atom, where the main differences are located. This can be done with the program gmx confrms. This program needs an additional index file, the generation of which is beyond the scope of this course; it can be obtained here (right click and "Save link as..." if the file is just shown in the browser; otherwise copy and paste the contents into a file called index.ndx). Run gmx confrms with the following options:
gmx confrms -f1 1DTX.pdb -f2 model.pdb -n1 index.ndx -o fit_whatif_xray.pdb
Question: Does that mean that our model is good, or is that really a large deviation?
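To get a feeling for what such a deviation measures, the root mean square deviation (RMSD) can be computed by hand. The sketch below is not a replacement for gmx confrms: it performs no least-squares fit and assumes two already superimposed structures with the same number of C-alpha atoms in the same order. The helper atom_line and the coordinates are hypothetical and only serve to build a small test case in PDB column format.

```python
import math


def ca_coordinates(pdb_text):
    """Extract C-alpha coordinates from the ATOM records of PDB-formatted text."""
    coords = []
    for line in pdb_text.splitlines():
        # Atom name is in columns 13-16; x, y, z are in columns 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def rmsd(coords_a, coords_b):
    """RMSD between two equally long coordinate lists (no fitting is done)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        total += sum((p - q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(coords_a))


def atom_line(serial, name, x, y, z):
    """Hypothetical helper: format one ATOM record for the toy example below."""
    return f"ATOM  {serial:5d} {name:^4s} ALA A{serial:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"


# Two made-up structures; the second is shifted by 1 Angstrom along z.
first = "\n".join([atom_line(1, "N", 5.0, 5.0, 5.0),
                   atom_line(2, "CA", 0.0, 0.0, 0.0),
                   atom_line(3, "CA", 1.0, 0.0, 0.0)])
second = "\n".join([atom_line(1, "N", 5.0, 5.0, 6.0),
                    atom_line(2, "CA", 0.0, 0.0, 1.0),
                    atom_line(3, "CA", 1.0, 0.0, 1.0)])
print(rmsd(ca_coordinates(first), ca_coordinates(second)))
```

Restricting the comparison to C-alpha atoms is a common choice because it keeps the comparison independent of side-chain differences between the two sequences.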
gmx confrms has written a PDB file with both structures superimposed:
Now, for comparison, we are going to build a model using an internet server, the SWISS-MODEL server. Note that this server requires a working e-mail address. Put your e-mail address in the specified field, provide a name and title, and paste the sequence of the snake venom into the sequence window (or use the SWISS-PROT access code: P00980). Before hitting the "Submit" button, scroll down and, next to "Use a specific template", specify the structure of bovine pancreatic trypsin inhibitor with "4pti" chain "A" as the template. Now submit the request by hitting the "Submit" button.
Depending on the load of the server, it may take a couple of minutes for the model to finish. You may receive multiple e-mails reporting the status of your request. The last e-mail should contain the coordinates in PDB format as an attachment, but you can also download them immediately from the results page. If you do not receive this e-mail within a couple of minutes, you may retrieve the coordinates here and an index file here. Assuming your model is called "swissmodel.pdb", superimpose the coordinates onto the correct structure:
gmx confrms -f1 1DTX.pdb -f2 swissmodel.pdb -n1 index_swiss.ndx -o fit_swiss_xray.pdb
gmx confrms -f1 model.pdb -f2 swissmodel.pdb -n1 index_whatif.ndx -o fit_swiss_whatif.pdb