Computer Exercise in HMMs

Practical use of the HMMER software package

Log in on the mdstud system. Open a connection to bio.lundberg.gu.se.

HMMER 2.1 is installed on the bio server. Online documentation for HMMER 2.1 can be found at http://hmmer.wustl.edu/.


A new "Lab5" directory is in your home directory (at lundberg).

  1. Building profile HMMs

    In this exercise, you will study a group of sequences belonging to the Rel-homology family. They all have a similar 3D structure and similar function.

    1. Go to Pfam (http://www.sanger.ac.uk/Software/Pfam/) and search for "rel homology". When you have found the page describing the rel homology domain, copy the full alignment in MSF format and save it to your account. Note that the rel domain includes four subdomains. What are they? (Include the answer in your report.)

    2. Use the program hmmbuild to build an HMM from the alignment rel_homology.msf that you just copied to your home directory.
      The syntax of hmmbuild is:
      hmmbuild [-options] <hmmfile output> <alignment file>
      Use the default options.

    3. Search the Protein Databank (PDB) in /d/ncbi/pdbaa with your profile HMM, using the program hmmsearch (default options).
      The syntax of this command is:
      hmmsearch [-options] <hmmfile> <databasefile>
      It is best to redirect the output from hmmsearch to a file by typing something like:
      bio> hmmsearch hmmfile.hmm name_of_database > outputfile
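
Steps 2 and 3 can also be scripted. The following is a minimal Python sketch, not part of the HMMER package: it only assembles the two command lines given above and, if asked, runs them with the output of hmmsearch redirected to a file. The helper names and the file name rel_homology.hmm are assumptions chosen for illustration.

```python
import subprocess

def build_and_search_commands(alignment, hmm_file, database):
    """Return the argument lists for hmmbuild and hmmsearch (default options)."""
    build = ["hmmbuild", hmm_file, alignment]    # hmmbuild <hmmfile output> <alignment file>
    search = ["hmmsearch", hmm_file, database]   # hmmsearch <hmmfile> <databasefile>
    return build, search

def run_build_and_search(alignment, hmm_file, database, output_file):
    """Run both steps; requires the HMMER programs to be on the PATH."""
    build, search = build_and_search_commands(alignment, hmm_file, database)
    subprocess.run(build, check=True)
    # Same effect as "hmmsearch ... > outputfile" at the shell prompt.
    with open(output_file, "w") as out:
        subprocess.run(search, check=True, stdout=out)

# Dry run: print the commands without executing anything.
for command in build_and_search_commands(
        "rel_homology.msf", "rel_homology.hmm", "/d/ncbi/pdbaa"):
    print(" ".join(command))
```

Calling run_build_and_search with an output file name of your choice performs the actual build and search on the bio server.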

    4. Calibrate the profile HMM. Calibration makes subsequent searches more sensitive. The syntax for the command (hmmcalibrate) is:
      hmmcalibrate [-options] <hmmfile>
      When the model has been calibrated using the default options, do a new search in the PDB (/d/ncbi/pdbaa). Compare the results with those from the earlier search. Can you detect any differences?
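
Some background on why calibration can change the results: as described in the HMMER 2 user's guide, hmmcalibrate scores the model against a large number of random sequences and fits an extreme value distribution (EVD) to those scores; the fitted parameters are then used when E-values are computed. The sketch below only illustrates the EVD formula itself. The values of mu, lambda and the database size are made up for the example and do not come from any real model.

```python
import math

def evd_pvalue(score, mu, lam):
    """P(S >= score) under an extreme value distribution with location mu, scale lam."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-math.exp(-lam * (score - mu)))

def evalue(score, mu, lam, n_sequences):
    """Expected number of random hits scoring at least `score` in the database."""
    return n_sequences * evd_pvalue(score, mu, lam)

# Illustrative (made-up) parameters, not from a real calibrated model.
MU, LAM, N = -50.0, 0.25, 10000
for s in (-40.0, 0.0, 40.0):
    print(f"score {s:6.1f}  E-value {evalue(s, MU, LAM, N):.3g}")
```

With any such parameters, a higher score gives a smaller E-value; what calibration changes is how a given score is translated into an E-value for your particular model.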

  2. Searching an HMM library

      A second use of HMMER is to look for known domains in a query sequence by searching a single sequence against a library of HMMs (in contrast to the previous section, in which we searched a single HMM against a sequence database). To do this, you need a library of profile HMMs. In this case, we will construct the database ourselves. Larger databases, such as the PFAM database, are available for download.

      HMM databases are simply concatenated single HMM files. You can build them either by invoking the -A "append" option of hmmbuild, or by concatenating HMM files you've already built.
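
Because a library is nothing more than single HMM files placed one after another, the concatenation itself can be sketched in a few lines of Python. This is only an illustration with the same effect as the cat command; the function name is an assumption, and the commented call assumes the three profiles have already been built with hmmbuild.

```python
import shutil

def concatenate_hmm_files(hmm_paths, library_path):
    """Write the files in hmm_paths, in order, into one library file."""
    with open(library_path, "wb") as library:
        for path in hmm_paths:
            with open(path, "rb") as single_hmm:
                shutil.copyfileobj(single_hmm, library)
    return library_path

# Example call (file names as used later in this exercise):
# concatenate_hmm_files(["rrm.hmm", "fn3.hmm", "pkinase.hmm"], "myhmms")
```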

      Download the seed alignments in MSF format for the following three PFAM entries (find them by searching PFAM, as in the previous exercise):

      PF00041 (fn3 domain)

      PF00076 (rrm domain)

      PF00069 (pkinase domain)

      Don't be confused by the files already in your Lab5 directory: the "rrm.slx", "fn3.slx" and "pkinase.slx" files do not work (a last-minute discovery)!

    1. Use either of the two alternatives below:

      The first alternative is to build three profiles with hmmbuild, just as in the previous section, and then fuse them into a new file, for example myhmms, with cat. If you called the profiles rrm.hmm, fn3.hmm and pkinase.hmm:
      bio> cat rrm.hmm fn3.hmm pkinase.hmm > myhmms
      Calibrate the fused file just as before with hmmcalibrate.

      The second alternative is to use hmmbuild three times with the append option -A:
      bio> hmmbuild -A myhmms rrm.msf
      bio> hmmbuild -A myhmms fn3.msf
      bio> hmmbuild -A myhmms pkinase.msf
      Then, calibrate myhmms with hmmcalibrate.

      Note that hmmcalibrate can be run on HMM databases as well as single HMMs.

    2. Now that you have a small HMM database called myhmms, let us use it to analyze the Drosophila Sevenless sequence, 7LES_DROME (in Lab5/):
      bio> hmmpfam myhmms 7LES_DROME
      Does the sequence seem to belong to any of the protein families of the library?

  3. Compare HMMER with traditional methods

      Try to find out whether the results from HMMER are better than those from other methods in the case of the Rel-homology proteins. As a starting point, imagine that you do not know the homologs of your protein sequence, Swissprot id o96458. Download this sequence in FASTA format from http://www.expasy.ch. This is actually one of the proteins in the alignment file of the rel domain from pfam (so we do in fact know homologs of it, i.e. the other proteins in the alignment).

    1. Do an ordinary BLAST search (command line or on the web) with o96458 against the PDB.

      See Exercise 1 for syntax of the BLAST (blastall) program!

    2. Use the same sequence to search the PDB with psi-BLAST (command line or on the web).

      See Exercise 3 for the syntax of the psi-BLAST program (i.e. blastpgp). Set the maximum number of rounds to 6 with the option "-j 6". Psi-BLAST also uses profiles (but not HMMs) and searches a database iteratively until nothing new is found (i.e. until convergence is reached) or the maximum number of rounds has been run.
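
The idea of iterating until convergence can be sketched as a simple loop. This is a toy illustration, not how blastpgp is implemented: search_with is a hypothetical stand-in for "build a profile from the current hits and search the database with it", and the toy database is a made-up table of which sequence finds which.

```python
def iterate_until_convergence(query, search_with, max_rounds=6):
    """Repeat the search until a round adds no new hits or max_rounds is reached."""
    hits = {query}
    for round_number in range(1, max_rounds + 1):
        new_hits = search_with(hits) - hits
        if not new_hits:
            return hits, round_number, True    # converged: nothing new found
        hits |= new_hits
    return hits, max_rounds, False             # stopped by the round limit

# Made-up toy data: each sequence "finds" the neighbours listed for it.
NEIGHBOURS = {"q": {"a"}, "a": {"b"}, "b": {"c"}, "c": set()}

def toy_search(current_hits):
    found = set(current_hits)
    for name in current_hits:
        found |= NEIGHBOURS[name]
    return found

hits, rounds, converged = iterate_until_convergence("q", toy_search)
print(sorted(hits), rounds, converged)
```

In the toy data each round reaches one more distant sequence, which is the reason an iterative search can find homologs that a single search with the query alone would miss.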

    3. Compare the results you got from HMMER with the ones from BLAST and psi-BLAST. Can you see any differences? Does any of the methods seem to be more effective at finding Rel homologs?
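
One way to organize the comparison is to collect the identifiers of the hits you accept from each program and compare them as sets. A minimal sketch follows; the identifiers are placeholders, not real search results, and extracting the identifiers from the program output is left to you.

```python
def compare_hit_sets(hits_by_method):
    """Return the hits shared by all methods and the hits unique to each method."""
    all_sets = [set(hits) for hits in hits_by_method.values()]
    shared = set.intersection(*all_sets) if all_sets else set()
    unique = {}
    for method, hits in hits_by_method.items():
        others = set()
        for other_method, other_hits in hits_by_method.items():
            if other_method != method:
                others |= set(other_hits)
        unique[method] = set(hits) - others
    return shared, unique

# Placeholder identifiers only; replace them with the hits from your own searches.
example = {
    "BLAST": {"seqA", "seqB", "seqX"},
    "PSI-BLAST": {"seqA", "seqB", "seqC"},
    "HMMER": {"seqA", "seqC", "seqD"},
}
shared, unique = compare_hit_sets(example)
print("found by all:", sorted(shared))
for method, hits in unique.items():
    print(f"only {method}:", sorted(hits))
```

Remember to apply a comparable significance cutoff to all three programs before building the sets, otherwise the comparison says more about the cutoffs than about the methods.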

Report

Write a summary of what you have done in this exercise and the conclusions you have drawn, and hand it in no later than next Friday!


Profile HMMs Summary

The basic strengths of profile HMMs are (even if you did not find any evidence of them whatsoever...):

  1. a model can be built quickly from any multiple alignment of interest
  2. the model can be used to search a database and/or parse sequences for the presence of similar domains
  3. profile HMMs can be used to maintain alignments of huge numbers of sequences, starting from carefully constructed "seed" alignments of a representative set of sequences.
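
To make the first point concrete, here is a deliberately simplified sketch of turning a multiple alignment into a position-specific model. It is not a profile HMM and not what hmmbuild does: it has no insert or delete states, and the add-one pseudocount and uniform background are simplifying assumptions. The four-sequence alignment is invented for the example.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def build_profile(alignment, pseudocount=1.0):
    """Per-column log-odds scores (in bits) against a uniform background."""
    length = len(alignment[0])
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in range(length):
        # Gap characters and other non-standard symbols are simply skipped.
        residues = [seq[column] for seq in alignment if seq[column] in ALPHABET]
        total = len(residues) + pseudocount * len(ALPHABET)
        scores = {}
        for aa in ALPHABET:
            freq = (residues.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        profile.append(scores)
    return profile

def score_sequence(profile, sequence):
    """Sum of column scores for an ungapped sequence of the same length."""
    return sum(column[aa] for column, aa in zip(profile, sequence))

toy_alignment = ["ACDW", "ACEW", "ACDW", "GCDW"]   # invented example
profile = build_profile(toy_alignment)
print(score_sequence(profile, "ACDW") > score_sequence(profile, "WWWW"))
```

A sequence that matches the conserved columns scores higher than one that does not, which is the basic property a database search relies on.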

References

Profile hidden Markov models.
S.R. Eddy. Bioinformatics 14:755-763, 1998. A review of the profile HMM literature from 1996-1998.

Multiple-alignment and -sequence searches.
S.R. Eddy. Trends Guide to Bioinformatics, pp. 15-18, 1998. A brief practical example of using multiple alignment and profile search software to detect a remote informative similarity to a "novel" sequence with no informative BLAST hits.

Hidden Markov Models
S.R. Eddy. Current Opinion in Structural Biology, 6:361-365, 1996. Review of the use of HMMs for sequence and structure profiles.

Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids.
R. Durbin, S. Eddy, A. Krogh, G. Mitchison. Cambridge University Press, 1998. 350 pages.
Almost everything (and possibly more) that you ever wanted to know about hidden Markov models and other probabilistic modeling approaches in biosequence analysis.