Practical 4 a): Introduction to protein sequence retrieval and analysis

Bert de Groot

Contents
Introduction
Multiple sequence alignment of insulin
Phylogenetic analysis of hemoglobin
Optional exercises
Further references

Introduction

The last lecture gave an introduction to protein sequences (or primary structures), and we learnt what information can be extracted at the sequence level. In short, these include:

Today, we will focus on the last two aspects, as structural analyses will follow later. First we will analyse the peptide hormone insulin.

Multiple sequence alignment of insulin

As you may know, insulin is essential for normal metabolism, as it stimulates glucose uptake after a meal. Malfunction of insulin leads to diabetes, which is characterized by decreased glucose tolerance resulting from a relative deficiency of insulin (or, alternatively, a lack of sensitivity to insulin on the receptor side).

For a sequence analysis of insulin, we obviously first need its sequence. For this, we visit the SWISS-PROT database, which can be accessed via the ExPASy Proteomics Server.

The SWISS-PROT entry of human insulin opens with general information: basic entry information, the name and origin of the protein, literature references connected with this sequence, and some comments on its function. This section is followed by a number of cross-references to other databases concerning insulin, for example the Protein Data Bank (PDB), where protein structures are stored. As we can see, a lot of structural information on insulin is available.

Question: Using BLAST, the selected sequences have already been aligned to assess their similarity to our target sequence. Why do we need to do another alignment?

We will now download and open an alignment viewer:

wget http://www.jalview.org/getdown/release/jalview-all-2.11.1.3-j1.8.jar
java -jar jalview-all-2.11.1.3-j1.8.jar

Close all windows within Jalview. Load the insulin result via File > Input Alignment > From File and select the alignment file. To focus on conserved residues, activate a colouring scheme under "Colour", e.g. BLOSUM62 Score, and tick the "By Conservation" setting in the same menu. You can keep the default conservation threshold. This way, residues that are highly conserved are highlighted according to the selected threshold.

Question: Which are the most conserved residues? Why might these residues be conserved? All cysteine (C) residues seem highly conserved. What might be the reason?

On the right, a picture of the insulin structure is shown, with the A chain in yellow and the B chain in magenta. As can be seen, there are two "bridges" connecting the A and the B chain, formed by cysteine (C) residues on both chains. This is an important structural feature of insulin that strongly stabilises the structure, so it is easy to understand why these C residues are among the most strongly conserved residues in the hormone. As is known from other structural studies, residues interacting with the insulin receptor include the N-terminus of the A-chain (G-I-V-E), the C-terminus of the A-chain (Y-C-N), and the C-terminus of the B-chain (G-F-F-Y), so for these residues there is also a clear reason for their conservation. For the other conserved residues the reason is less clear, although mutating them has been shown to alter activity, which indicates a functional role.
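The idea of "conserved columns" can be made concrete with a small sketch. The Python below is an illustration only: it scores each alignment column by the fraction of sequences sharing the most common residue, which is a simpler measure than the one an alignment viewer may use. The first row is the human insulin A chain; the three variants are made up for this example and are not database entries.

```python
from collections import Counter

# Toy alignment (no gaps, equal length). Only "human" is a real sequence
# (the insulin A chain); the variants are invented for illustration.
alignment = {
    "human":     "GIVEQCCTSICSLYQLENYCN",
    "variant_1": "GIVEQCCASVCSLYQLENYCN",
    "variant_2": "GIVEQCCHNTCSLYQLENYCN",
    "variant_3": "GIVDQCCTSICSLYQLQNYCN",
}

def column_conservation(seqs):
    """Return, per column, the most common residue and its frequency."""
    result = []
    for column in zip(*seqs):
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result

scores = column_conservation(list(alignment.values()))

# Columns (1-based) in which every sequence carries the same residue.
fully_conserved = [
    (i + 1, res) for i, (res, frac) in enumerate(scores) if frac == 1.0
]
cysteines = [pos for pos, res in fully_conserved if res == "C"]

print("Fully conserved columns:", fully_conserved)
print("Conserved cysteine positions:", cysteines)
```

In this toy data the cysteine columns stay fully conserved while some neighbouring positions vary, which mirrors the pattern to look for in the real alignment.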


Phylogenetic analysis of hemoglobin

Another application of multiple sequence analyses is the derivation of evolutionary information, in particular the analysis of common ancestors among different species, and their grouping (also known as taxonomy) based on sequence similarity. This analysis is known as phylogenetic analysis, and trees representing the sequence relationships are known as phylogenetic trees.
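Many tree-building methods start from pairwise distances between aligned sequences. As a minimal sketch, assuming the sequences are already aligned, the Python below computes the p-distance (the fraction of differing positions) for every pair. The sequences and species names are hypothetical toy data, and p-distance is only one of several possible distance measures.

```python
from itertools import combinations

# Hypothetical, pre-aligned toy sequences (equal length, '-' marks a gap).
aligned = {
    "species_A": "MALSPADKTN",
    "species_B": "MALSAADKTN",
    "species_C": "MALSGEDKSN",
    "species_D": "MSLTGE-KSN",
}

def p_distance(a, b):
    """Fraction of differing positions, ignoring columns with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(seqs):
    """Return a symmetric dict mapping (name, name) -> p-distance."""
    names = list(seqs)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = p_distance(seqs[a], seqs[b])
    return dist

dist = distance_matrix(aligned)
for a, b in combinations(aligned, 2):
    print(f"{a} vs {b}: {dist[(a, b)]:.2f}")
```

Sequences with small pairwise distances end up close together in the resulting tree.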

In this course we will generate two phylogenetic trees and compare them, to see whether the mutational pattern in one protein (and the associated phylogenetic tree) is similar to that of the other. For this we will take the alpha and beta chains of hemoglobin. Hemoglobin is the universal oxygen transporter in nature. It takes up oxygen in the lungs (or gills for fish) and transports it via the blood, in red blood cells, to the brain, muscles, or any other destination in the body where oxygen is required. In fact, blood is red because of hemoglobin: it contains iron, which in that particular state is coloured red, not unlike rust. Although part of the same protein, the alpha and beta chains have evolved independently, and hence two separate phylogenetic trees can be constructed.
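To illustrate what "comparing two trees" can mean, here is a rough sketch that builds a tree from a distance table with UPGMA (a simple clustering method, not necessarily the one used by the tools in this practical) and compares the resulting topologies. The distance values for the "alpha" and "beta" tables are invented for illustration, and branch lengths are omitted for brevity.

```python
from itertools import combinations

def upgma(names, table):
    """Cluster taxa by UPGMA and return the topology as a Newick string.

    table maps (name, name) tuples to distances, one entry per pair.
    """
    d = {frozenset(pair): value for pair, value in table.items()}
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sorted(sizes), 2),
                   key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        # Distance to the merged cluster is the size-weighted average.
        for other in sizes:
            if other in (a, b):
                continue
            d[frozenset((merged, other))] = (
                sizes[a] * d[frozenset((a, other))]
                + sizes[b] * d[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"

names = ["human", "chimp", "mouse", "chicken"]

# Invented distances for two proteins from the same four species.
alpha = {("human", "chimp"): 0.01, ("human", "mouse"): 0.15,
         ("chimp", "mouse"): 0.16, ("human", "chicken"): 0.30,
         ("chimp", "chicken"): 0.31, ("mouse", "chicken"): 0.32}
beta = {("human", "chimp"): 0.02, ("human", "mouse"): 0.20,
        ("chimp", "mouse"): 0.19, ("human", "chicken"): 0.31,
        ("chimp", "chicken"): 0.30, ("mouse", "chicken"): 0.33}

tree_alpha = upgma(names, alpha)
tree_beta = upgma(names, beta)
print(tree_alpha)
print(tree_beta)
print("Same topology:", tree_alpha == tree_beta)
```

Comparing Newick strings only works here because the clustering order is made deterministic by sorting the labels; real tree comparison tools use dedicated topology metrics.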

For the sequence retrieval, we follow the same procedure as we have done above for insulin, first for the human hemoglobin alpha chain (search for "human hemoglobin alpha"), and then for the hemoglobin beta chain.


Questions:


(Due to limited time, please go ahead with practical 4b and only come back to these optional exercises if time allows.)

Optional exercises

Protein Domains

Protein function typically results from modular structural features called domains. Due to evolutionary shuffling, the same domain can be conserved across different proteins that share functional aspects.
Domains can be used to classify proteins into families that display structural and sequence similarities within conserved regions of the protein. Pfam is an online database that contains a large set of multiple sequence alignments (MSAs) of protein domains.

Pfam consists of two parts:

  • Pfam A, with well-characterized protein MSAs of known function
  • Pfam B, with all other MSAs that are not part of Pfam A

Using the link provided to the Pfam website, search for the keyword Q12809 using the "Jump to" option. Try to answer the following questions:

Questions:

Reproducibility

If time allows, build a phylogenetic tree of a very different protein (like a ribosomal elongation factor or F1-ATPase) and compare the result to that of hemoglobin.

Further references
