There are two main techniques for solving protein structures: x-ray crystallography and Nuclear Magnetic Resonance (NMR). As can be seen from the current PDB holdings, more than 77,000 protein structures have been solved so far and are available from the Protein Data Bank. About 12% of these are NMR structures; the rest are x-ray crystallographic structures. The PDB holdings list also shows that the number of solved structures grows ever faster, due to advances in structure determination techniques. Nevertheless, the number of known protein sequences is almost an order of magnitude larger (currently about 533,000, as available from the UniProtKB/Swiss-Prot database).
Experimentally solved protein structures are stored at the Protein Data Bank, from which individual protein structures can be retrieved as so-called PDB files. Before we turn to structure determination itself, let us have a closer look at a typical PDB file, to see what can be learned about the background of the structure (experimental conditions etc.) and the structural quality (resolution, coordinate uncertainty). We will focus on PDB entry 1DWR (an x-ray structure of myoglobin complexed with carbon monoxide), as it can be downloaded from the Protein Data Bank.
PDB file format
The initial lines of a PDB entry contain information on:
HEADER OXYGEN TRANSPORT 11-DEC-99 1DWR
TITLE MYOGLOBIN (HORSE HEART) WILD-TYPE COMPLEXED WITH CO
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: MYOGLOBIN;
COMPND 3 CHAIN: A
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: EQUUS CABALLUS;
SOURCE 3 ORGANISM_COMMON: HORSE;
SOURCE 4 ORGANISM_TAXID: 9796;
SOURCE 5 ORGAN: HEART
KEYWDS OXYGEN TRANSPORT, RESPIRATORY PROTEIN
EXPDTA X-RAY DIFFRACTION
AUTHOR 2 I.SCHLICHTING
REVDAT 3 24-FEB-09 1DWR 1 VERSN
REVDAT 2 29-APR-05 1DWR 1 REMARK HET HETNAM FORMUL
REVDAT 2 2 HETATM
REVDAT 1 03-MAR-00 1DWR 0
JRNL AUTH K.CHU,J.VOJTECHOVSKY,B.H.MCMAHON,R.M.SWEET,
JRNL AUTH 2 J.BERENDZEN,I.SCHLICHTING
JRNL TITL CRYSTAL STRUCTURE OF A NEW LIGAND BINDING
JRNL TITL 2 INTERMEDIATE IN WILDTYPE CARBONMONOXY MYOGLOBIN
JRNL REF NATURE V. 403 921 2000
JRNL REFN ISSN 0028-0836
JRNL PMID 10706294
JRNL DOI 10.1038/35002641
REMARK 2 RESOLUTION. 1.45 ANGSTROMS.
REMARK 3 REFINEMENT.
REMARK 3 PROGRAM : X-PLOR 3.851
REMARK 3 AUTHORS : BRUNGER
REMARK 3 DATA USED IN REFINEMENT.
REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.45
REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 20
REMARK 3 DATA CUTOFF (SIGMA(F)) : 0.0
REMARK 3 DATA CUTOFF HIGH (ABS(F)) : NULL
REMARK 3 DATA CUTOFF LOW (ABS(F)) : NULL
REMARK 3 COMPLETENESS (WORKING+TEST) (%) : 96.1
REMARK 3 NUMBER OF REFLECTIONS : 23794
REMARK 3 FIT TO DATA USED IN REFINEMENT.
REMARK 3 CROSS-VALIDATION METHOD : THROUGHOUT
REMARK 3 FREE R VALUE TEST SET SELECTION : RANDOM
REMARK 3 R VALUE (WORKING SET) : 0.211
REMARK 3 FREE R VALUE : 0.255
REMARK 3 FREE R VALUE TEST SET SIZE (%) : 5.0
REMARK 3 FREE R VALUE TEST SET COUNT : NULL
REMARK 3 ESTIMATED ERROR OF FREE R VALUE : NULL
REMARK 3 RMS DEVIATIONS FROM IDEAL VALUES.
REMARK 3 BOND LENGTHS (A) : 0.013
REMARK 3 BOND ANGLES (DEGREES) : 1.93
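The quality indicators in this header (resolution, R value, free R value) can also be read out with a few lines of code. The following Python sketch is only an illustration and not part of the original exercise: the helper name read_quality and the regular expressions are our own assumptions, written loosely enough to accept the variable spacing found in real PDB files.

```python
import re

# Patterns for a few REMARK records; whitespace is matched loosely because
# real PDB files pad these records with varying numbers of spaces.
PATTERNS = {
    "resolution": r"REMARK\s+2\s+RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS",
    "r_work": r"REMARK\s+3\s+R VALUE\s+\(WORKING SET\)\s*:\s*([\d.]+)",
    "r_free": r"REMARK\s+3\s+FREE R VALUE\s*:\s*([\d.]+)",
}


def read_quality(lines):
    """Collect resolution, R and free R from PDB header lines (hypothetical helper)."""
    found = {}
    for line in lines:
        for key, pattern in PATTERNS.items():
            match = re.match(pattern, line)
            if match and key not in found:
                found[key] = float(match.group(1))
    return found


# Header lines as listed above for entry 1DWR.
header = [
    "REMARK 2 RESOLUTION. 1.45 ANGSTROMS.",
    "REMARK 3 R VALUE (WORKING SET) : 0.211",
    "REMARK 3 FREE R VALUE : 0.255",
    "REMARK 3 FREE R VALUE TEST SET SIZE (%) : 5.0",
]
quality = read_quality(header)
print(quality)
# The gap between free R and working R is a common sanity check on a refinement.
print("R_free - R_work =", round(quality["r_free"] - quality["r_work"], 3))
```

To apply this to a downloaded file, the list of header lines could be replaced by the lines read from that file.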
Following the introductory material, some specific information regarding the protein and its crystalline form is provided, including:
SEQRES 1 A 153 GLY LEU SER ASP GLY GLU TRP GLN GLN VAL LEU ASN VAL
SEQRES 2 A 153 TRP GLY LYS VAL GLU ALA ASP ILE ALA GLY HIS GLY GLN
SEQRES 3 A 153 GLU VAL LEU ILE ARG LEU PHE THR GLY HIS PRO GLU THR
[..]
FORMUL 6 HOH *132(H2 O1)
HELIX 1 1 SER A 3 ASP A 20 1 18
HELIX 2 2 ASP A 20 HIS A 36 1 17
HELIX 3 3 HIS A 36 GLU A 41 1 6
[..]
CRYST1 63.600 28.800 35.600 90.00 106.50 90.00 P 1 21 1 2
Then come the actual atom coordinates (or structure), whose listing
takes up most of the average PDB file.
Each listing begins with "ATOM" and is followed by:
ATOM 1 N GLY A 1 -2.316 16.963 14.230 1.00 20.85 N
ATOM 2 CA GLY A 1 -2.992 16.384 15.439 1.00 19.00 C
ATOM 3 C GLY A 1 -2.137 15.253 16.013 1.00 18.18 C
ATOM 4 O GLY A 1 -2.132 14.129 15.477 1.00 18.59 O
ATOM 5 N LEU A 2 -1.387 15.560 17.074 1.00 15.98 N
ATOM 6 CA LEU A 2 -0.561 14.582 17.790 1.00 13.52 C
ATOM 7 C LEU A 2 -1.065 14.383 19.213 1.00 12.59 C
This goes on for a while, until the end of the peptide chain, which is marked by the "TER" line. If there are any other molecules that co-crystallized with the protein (such as solvent molecules or ligands) they are listed as "hetero-atoms" near the end of the file.
HETATM 1203 FE HEM A 154 14.347 28.659 5.074 1.00 8.13 FE
HETATM 1204 CHA HEM A 154 15.659 31.898 5.315 1.00 5.90 C
HETATM 1205 CHB HEM A 154 13.490 28.753 8.433 1.00 6.10 C
HETATM 1206 CHC HEM A 154 13.145 25.505 4.903 1.00 5.24 C
HETATM 1207 CHD HEM A 154 15.262 28.616 1.824 1.00 6.55 C
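In an actual PDB file, ATOM and HETATM records follow a fixed column layout (the spacing is collapsed in the listings shown here), so they are best read by column position rather than by splitting on spaces. The following is a minimal Python sketch under that assumption; parse_atom_line is a hypothetical helper name, and the sample line is rebuilt field by field from the first atom of 1DWR so that the column widths are explicit.

```python
def parse_atom_line(line):
    """Parse one ATOM/HETATM record using the fixed PDB column layout."""
    return {
        "record": line[0:6].strip(),       # columns 1-6
        "serial": int(line[6:11]),         # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                 # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),   # columns 55-60
        "b_factor": float(line[60:66]),    # columns 61-66
        "element": line[76:78].strip(),    # columns 77-78
    }


# First atom of 1DWR, assembled from its fields (adjacent literals are joined).
SAMPLE = (
    "ATOM  " "    1" " " " N  " " " "GLY" " " "A" "   1" "    "
    "  -2.316" "  16.963" "  14.230" "  1.00" " 20.85" "          " " N"
)

atom = parse_atom_line(SAMPLE)
print(atom["name"], atom["res_name"], atom["res_seq"], atom["xyz"], atom["b_factor"])
```

The last two numeric fields are the occupancy and the temperature factor, which is discussed further below.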
The main technique for determining protein structures is x-ray
crystallography. Since the first protein structure (myoglobin) was solved with this
technique by John Kendrew and Max Perutz in the late fifties, tens of
thousands of others have followed. As can be appreciated from the picture on the right,
which shows John Kendrew with the structural model of myoglobin, at
that time the determination of a structure the size of a protein,
without the aid of a computer, was a formidable task.
It is important to note that in both x-ray crystallography and NMR,
protein structures are not measured directly in the
experiment. Rather, a set of data is collected (a diffraction pattern
or an NMR spectrum), from which a model of the protein structure
is derived. To appreciate the difference between data and structure,
we'll now look at two different structures of the same protein, and
the corresponding x-ray crystallographic data. For this, we will
concentrate on the bacterial light-driven proton pump bacteriorhodopsin (bR); see
here for more background information on bR.
First download two bR structures from the Protein Data Bank, with PDB entries 1BRR and 1QHJ. Save both PDB files to your local account (see the last lecture if you forgot how to download from the PDB). View the structures with rasmol:
cartoons
color structure
select ret and *a
cpk
By looking at the structures, which of the two would you prefer in terms of coordinate accuracy?
Remember, so far we only looked at the coordinates, which represent a model
that was optimized against the actually measured data. So, let
us now have a look at the data. In x-ray crystallography, data are
collected by measuring a diffraction pattern that is obtained from
x-rays scattered by a protein crystal. As mentioned in the lecture,
this diffraction pattern by itself does not suffice to determine the complete structure,
since only the amplitudes of the diffracted waves are collected, not
their phases. In x-ray crystallography, however, a number of
tricks are available to recover the phases (e.g. isomorphous replacement, molecular replacement),
but we will not go into detail here. What is important to
remember is that eventually an atomic electron density map is obtained.
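Why the missing phases matter can be illustrated with a toy calculation. The Python sketch below is our own illustration, not part of the lecture material: it uses a made-up one-dimensional "electron density" on eight grid points and a plain discrete Fourier transform as a stand-in for the diffraction experiment. Transforming back with amplitudes and phases recovers the density, whereas using the amplitudes alone (all phases set to zero) does not.

```python
import cmath


def dft(density):
    """Discrete Fourier transform of a 1D density (plain Python, O(n^2))."""
    n = len(density)
    return [
        sum(density[x] * cmath.exp(-2j * cmath.pi * h * x / n) for x in range(n))
        for h in range(n)
    ]


def idft(coeffs):
    """Inverse transform; returns the real part of the reconstructed density."""
    n = len(coeffs)
    return [
        (sum(coeffs[h] * cmath.exp(2j * cmath.pi * h * x / n) for h in range(n)) / n).real
        for x in range(n)
    ]


# Hypothetical 1D density with two "atoms" of different weight.
rho = [0.0, 5.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]

F = dft(rho)                                   # complex structure factors
with_phases = idft(F)                          # amplitudes and phases
amplitudes_only = idft([abs(f) for f in F])    # phases discarded (set to zero)

print("original:       ", rho)
print("with phases:    ", [round(v, 3) for v in with_phases])
print("amplitudes only:", [round(v, 3) for v in amplitudes_only])
```

In a real experiment only the amplitudes are measured, which is why the phasing methods mentioned above are needed before a density map can be computed.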
Why do primarily the electrons of a molecular sample contribute to the
diffraction of x-rays? answer.
Visit the Uppsala electron density server
to view the electron density map of 1BRR. Enter the PDB code (1BRR), and
wait for the summary page to load. Several plots with information on
this structure are available. Feel free to browse around to check the
meaning of the individual plots. At the bottom of the summary, the
electron density viewer can be activated. Select the "Astex viewer" and click "Go". After a while, a
Java applet should appear with the electron density and the model
structure visible. In the lower part of the window you'll see the sequence of
the protein. Use the mouse to scroll to around position 80 until you see the
sequence fragment "WARYA", and click on the "Y". You should now see the six-membered aromatic ring of the tyrosine.
Do you find the electron density for the aromatic
ring convincing? Now shift the focus to residue S35, which is numbered 33 in
the sequence (the "S" in the sequence "SDPDA"). How is
the fit between the model and the data here? To see the retinal, the
light sensor in the center of the protein, go to the very end of the sequence.
Now repeat the procedure for entry 1QHJ. How is the fit for residue
Y83 (numbered 79 here)? And for S35 (numbered 31 here)? And the retinal?
Based on the data and on the model structures, would you say there is
a large impact of the resolution of the data on the accuracy of the structural model?
What ranges of resolution do you think correspond to low,
medium and high resolution structures? Which typical structural
features do you expect to be resolved in each case?
The highest resolution x-ray crystallographic structures have a
resolution of approx. 0.8 Angstrom or even somewhat better. To see an example
of such a dataset, look at the density for structure 2B97. Note that you can zoom into the
map by holding "Shift" on the keyboard and moving the mouse with the left
button pressed. Do you recognize the difference in appearance?
Although the resolution of this structure is rather high, at 0.75 Angstrom, the
hydrogen atoms (e.g. on the side chains) are still difficult to see. Why is that?
A measure of the coordinate uncertainty of the individual atoms due to thermal motion in the crystal is given by
the temperature factor (or B-factor).
Low B-factors (< 30) correspond to well-defined parts of the structure, whereas high B-factors (> 80) may indicate highly disordered parts of the structure or even misinterpreted parts of the model.
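To get a feeling for these numbers, a B-factor can be converted into an approximate positional spread. The sketch below assumes the standard isotropic relation B = 8 * pi^2 * <u^2>, where <u^2> is the mean-square atomic displacement; this relation and the helper name rms_displacement are additions for illustration and do not appear in the text above. The first two B-factors are taken from the 1DWR listings (the heme iron and the first backbone nitrogen), the last two are the thresholds just mentioned.

```python
import math


def rms_displacement(b_factor):
    """Root-mean-square displacement (Angstrom) for an isotropic B-factor
    (Angstrom^2), assuming B = 8 * pi^2 * <u^2>."""
    return math.sqrt(b_factor / (8 * math.pi ** 2))


for b in (8.13, 20.85, 30.0, 80.0):
    print(f"B = {b:5.2f}  ->  rms displacement of about {rms_displacement(b):.2f} Angstrom")
```

Under this assumption, a B-factor of 30 corresponds to a spread of roughly 0.6 Angstrom and a B-factor of 80 to roughly 1 Angstrom.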
Question: How do the temperature factors of a crystallographic
structure in principle compare to the flexibilities of a protein in an MD simulation? answer.
The other main technique for determining protein structures is NMR. In
contrast to x-ray crystallography, no crystals are required for an NMR
experiment. Rather, the structure of the protein is determined in
solution. It therefore has the advantage that the protein can be
studied in its native environment. On the other hand, the resolution
of an NMR structure is usually lower and there is a size limitation of
a few hundred amino acids for structure determination using NMR.
It would go beyond the scope of this course to explain the
NMR experiment in detail. We will therefore only briefly touch on the
experimental setup and then focus on the structure building and
refinement step based on the obtained data.
The NMR signal is recorded as a nuclear magnetic resonance spectrum of predominantly the
hydrogen atoms after the sample, placed in a strong magnetic field, has been subjected to a (number of)
radio-frequency pulse(s). Mainly hydrogen atoms give rise to the
signal, because of the magnetic spin properties of the hydrogen
nucleus (a proton). The most abundant naturally occurring isotopes of the other
elements that are found in proteins, carbon (12C) and oxygen (16O),
have a zero nuclear magnetic moment. Nitrogen (14N) does
have a non-zero magnetic moment, but usually cannot be used in NMR,
for reasons that go beyond the scope of this course.
These elements, therefore, can only
be utilized in NMR experiments when replaced by a specific
isotope, like 13C or 15N. The most structurally
relevant information is usually obtained from a so-called NOESY
experiment (Nuclear Overhauser Enhancement SpectroscopY).
The Nuclear Overhauser Effect or Nuclear Overhauser Enhancement is the
change (enhancement) of the signal intensity from a given nucleus as a
result of exciting or saturating the resonance frequency of another
nucleus. Since this effect is distance-dependent, it can be used to
derive the distance between an interacting pair of protons. In
practice, protons less than about 6 Angstrom apart can be identified this way.
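How an intensity translates into a distance can be sketched in a few lines. The example below assumes the commonly used approximation that the NOE intensity falls off with the sixth power of the distance (I proportional to r^-6), calibrated against a proton pair of known distance. The approximation, the function name noe_distance and all numbers are illustrative assumptions, not values from the data file used in this exercise.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate a proton-proton distance from a NOE intensity, assuming
    I ~ r^-6 and a reference pair with known distance and intensity."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical calibration: a reference pair at 2.5 Angstrom gives intensity 100.
for intensity in (100.0, 10.0, 1.0):
    distance = noe_distance(intensity, ref_intensity=100.0, ref_distance=2.5)
    print(f"intensity {intensity:6.1f}  ->  about {distance:.2f} Angstrom")
```

Because of the steep sixth-power dependence, the signal drops quickly with distance, which is consistent with only short proton-proton distances being detectable. In practice the intensities are usually translated into distance bounds rather than exact distances.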
Now we will calculate a model of the structure of a small protein, the B1 domain of protein G, from the proton-proton distance information obtained from a NOESY experiment. Download the data file containing the distance information here. You can have a look at the file (with the program "more" or "less", or a browser or editor of your choice) to assure yourself that there are indeed only distance bounds listed in this file. Additionally, we need an initial guess of the structure. Since we don't know the structure yet, we have to start from an unstructured peptide chain, which can be obtained here. Have a look at the structure with:
Note that, in contrast to x-ray crystallography, where a
single structure is presented, an NMR structure is
usually represented by an ensemble of structures that all fulfill the
experimental restraints. This reflects the fact that the NMR experiment probes
an ensemble of protein molecules in solution.
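Whether a given model agrees with a set of distance bounds can be checked directly. The following Python sketch is a simplified illustration and not how CNS evaluates restraints: the function name violations, the atom labels, the coordinates and the bounds are all hypothetical, and each restraint is reduced to a lower and an upper distance bound between two atoms.

```python
import math


def violations(coords, restraints, tolerance=0.0):
    """Return the restraints (atom_i, atom_j, distance) whose distance in
    `coords` lies outside the given lower/upper bounds."""
    violated = []
    for i, j, lower, upper in restraints:
        d = math.dist(coords[i], coords[j])
        if d < lower - tolerance or d > upper + tolerance:
            violated.append((i, j, round(d, 2)))
    return violated


# Hypothetical proton positions (Angstrom) and distance bounds, for illustration only.
coords = {"H1": (0.0, 0.0, 0.0), "H2": (3.0, 0.0, 0.0), "H3": (0.0, 7.0, 0.0)}
restraints = [("H1", "H2", 1.8, 5.0), ("H1", "H3", 1.8, 5.0)]

print(violations(coords, restraints))
```

Here the pair H1/H3 is 7 Angstrom apart and therefore exceeds its upper bound of 5 Angstrom, while H1/H2 lies within its bounds.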
To start CNS, the library libg2c is needed. After downloading libg2c.tgz, create a folder for the library files and unpack the archive into it:
mkdir libg2c
tar -C libg2c -xvf libg2c.tgz
cd libg2c
Now return to the directory where the protein structure, topology and CNS input files are located and start CNS:
/usr/global/cns/intel-i686-linux_g77/bin/cns < anneal.inp
When the CNS structure calculation has finished, switch back to that window and type:
cat anneala_*.pdb > anneal.pdb
Which parts of the structure are well-defined, and which parts show more variation?
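Besides visual inspection, the spread within an ensemble can be quantified. The sketch below is one possible approach and not part of the original exercise: for every atom it averages, over all other atoms, the standard deviation of the interatomic distance across the models. Because only distances within each model are used, the models do not need to be superimposed first. The function name distance_variability and the three-atom toy ensemble are hypothetical.

```python
import math
from statistics import pstdev


def distance_variability(models):
    """Per-atom variability score for an ensemble.

    `models` is a list of models, each a list of (x, y, z) tuples with the
    same atom order. For atom i the score is the mean, over all other atoms j,
    of the standard deviation of the i-j distance across the models.
    """
    n_atoms = len(models[0])
    scores = []
    for i in range(n_atoms):
        spreads = [
            pstdev([math.dist(model[i], model[j]) for model in models])
            for j in range(n_atoms)
            if j != i
        ]
        scores.append(sum(spreads) / len(spreads))
    return scores


# Toy ensemble of three models: atoms 0 and 1 stay put, atom 2 moves around.
models = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.5, 1.5, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0)],
]
scores = distance_variability(models)
print([round(s, 3) for s in scores])
```

Atoms with a low score keep the same distances to the rest of the molecule in all models (well-defined), whereas a high score points to a part of the structure that differs between models.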
There is also an x-ray structure of the B1 domain of protein
G, available under the PDB code 1PGB. Download it from the Protein Data Bank and compare it to
the NMR structure you just calculated.
Question: What are the main differences between the NMR and x-ray structures of the B1 domain of protein G? hint
What limitations do you think NMR and x-ray crystallography have, respectively?
Question: How do you expect these settings to change the results?