Computer Exercise in HMMs

Practical use of the HMMER software package

HMMER 2.1 is installed on the bio-server. Online documentation for HMMER 2.1 can be found at http://hmmer.wustl.edu/.

A new "Lab5" directory has been placed in your home directory (on lundberg).

Building profile HMMs

In this exercise, you will study a group of sequences belonging to the Rel-homology family. They all have a similar 3D structure and similar function.
1. Go to Pfam (http://www.sanger.ac.uk/Software/Pfam/) and search for "rel homology". When you have found the page describing the Rel homology domain, copy the full alignment in MSF format and save it to your account as rel_homology.msf. Note that the Rel domain includes four subdomains. What are they (include the answer in your report)?
2. Use the program hmmbuild to build an HMM from the alignment rel_homology.msf that you just copied to your home directory.
  The syntax of hmmbuild is:
  hmmbuild [-options] <hmmfile output> <alignment file>
  Use the default options.
3. Search the Protein Databank (PDB) in /d/ncbi/pdbaa with your profile HMM, using the program hmmsearch (default options).
  The syntax of this command is:
  hmmsearch [-options] <hmmfile> <databasefile>
  It is best to redirect the output from hmmsearch to a file by typing something like:
  bio> hmmsearch hmmfile.hmm name_of_database > outputfile
4. Calibrate the profile HMM. This is performed in order to get more sensitive searches. The syntax for the command (hmmcalibrate) is:
  hmmcalibrate [-options] <hmmfile>
  Once the model has been calibrated using the default options, run a new search against the PDB (/d/ncbi/pdbaa). Compare the results with those from step 3. Can you detect any differences?
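
Steps 2 to 4 can be strung together in one small shell script. The following is only a sketch under some assumptions: the names rel_homology.hmm, search_uncalibrated.out and search_calibrated.out are made up for illustration, and run_step is a hypothetical helper that prints each command and executes it only if the program is found on the PATH.

```shell
#!/bin/sh
# Sketch of steps 2-4: build, search, calibrate, search again.
# Assumed names (not given in the exercise): rel_homology.hmm and the two .out files.
ALN=rel_homology.msf    # alignment saved from Pfam in step 1
HMM=rel_homology.hmm    # assumed name for the hmmbuild output
DB=/d/ncbi/pdbaa        # PDB sequences, as given in step 3

# Hypothetical helper: echo the command to stderr, then run it only if the
# program exists on this machine, so the script is harmless elsewhere.
run_step() {
    echo "+ $*" >&2
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "  (skipped: $1 not found on PATH)" >&2
    fi
}

run_step hmmbuild "$HMM" "$ALN"                              # step 2
run_step hmmsearch "$HMM" "$DB" > search_uncalibrated.out    # step 3
run_step hmmcalibrate "$HMM"                                 # step 4: calibrate
run_step hmmsearch "$HMM" "$DB" > search_calibrated.out      # step 4: search again

# List the lines that differ between the two searches.
diff search_uncalibrated.out search_calibrated.out || true
```

Keeping both output files side by side makes the comparison asked for in step 4 easier than rerunning the searches by hand.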

Searching an HMM library

1. Build a small HMM library, using either of the two alternatives below:
  
  The first alternative is to first build three profiles with hmmbuild, just as in the previous section. You should then be able to fuse them into a new file, for example myhmms, with cat (here the profiles are assumed to be called rrm.hmm, fn3.hmm and pkinase.hmm):
  bio> cat rrm.hmm fn3.hmm pkinase.hmm > myhmms
  Calibrate the fused file just as before with hmmcalibrate.
  
  The second alternative is to use hmmbuild three times with the append option -A:
  bio> hmmbuild -A myhmms rrm.msf
  bio> hmmbuild -A myhmms fn3.msf
  bio> hmmbuild -A myhmms pkinase.msf
  Then, calibrate myhmms with hmmcalibrate.
  
  Note that hmmcalibrate can be run on HMM databases as well as single HMMs.
2. Now that you have a small HMM database called myhmms, let us use it to analyze the Drosophila Sevenless sequence, 7LES_DROME (in Lab5/):
  bio> hmmpfam myhmms 7LES_DROME
  Does the sequence seem to belong to any of the protein families of the library?
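
The second alternative and the hmmpfam search can be sketched as a short script. This is an illustration, not a required procedure: the function build_library, the HMMBUILD variable (which only exists so that the commands can be previewed, for example with HMMBUILD=echo) and the output file name 7LES_DROME.hmmpfam.out are assumptions introduced here.

```shell
#!/bin/sh
# Sketch: build the library with hmmbuild -A, calibrate it, then run hmmpfam.
LIB=myhmms

# Append one model per alignment to $LIB. HMMBUILD defaults to the real
# program; overriding it (e.g. HMMBUILD=echo) just prints the arguments.
build_library() {
    for family in rrm fn3 pkinase; do
        "${HMMBUILD:-hmmbuild}" -A "$LIB" "$family.msf" || return 1
    done
}

# Run the real commands only when HMMER is installed on this machine.
if command -v hmmbuild >/dev/null 2>&1; then
    build_library &&
        hmmcalibrate "$LIB" &&
        hmmpfam "$LIB" 7LES_DROME > 7LES_DROME.hmmpfam.out
else
    echo "HMMER not found on PATH; nothing was run" >&2
fi
```

Because -A appends, running the script twice would add the three models a second time, so remove myhmms before starting over.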

Compare HMMER with traditional methods

1. Run an ordinary BLAST search (command line or on the web) with o96458 against the PDB.
  See Exercise 1 for syntax of the BLAST (blastall) program!
2. Use the same sequence to search the PDB with PSI-BLAST (command line or on the web).
  See Exercise 3 for the syntax of the PSI-BLAST program (i.e. blastpgp). Set the number of rounds to 6 with the option "-j 6". PSI-BLAST also uses profiles (but not HMMs) to iteratively search a database until nothing new is found (i.e. until convergence is reached).
3. Compare the results you got from HMMER with those from BLAST and PSI-BLAST. Can you see any differences? Does any of the methods seem to be more effective at finding Rel homologs?
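
One way to make the comparison in step 3 concrete is to compare lists of hit identifiers. The sketch below assumes that you have yourself written the identifiers of the hits you consider significant into plain text files, one identifier per line; it does not parse HMMER or BLAST output. The function name compare_hits and any file names you pass to it are hypothetical.

```shell
#!/bin/sh
# Sketch: compare which database entries two search methods reported.
# Input: two text files with one hit identifier per line (prepared by hand).

# compare_hits FILE_A FILE_B
# Prints three labelled sections: hits only in A, only in B, and in both.
compare_hits() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || return 1
    # comm requires sorted input; -u also drops repeated identifiers.
    sort -u "$1" > "$a_sorted"
    sort -u "$2" > "$b_sorted"
    echo "only in $1:"; comm -23 "$a_sorted" "$b_sorted"
    echo "only in $2:"; comm -13 "$a_sorted" "$b_sorted"
    echo "in both:";    comm -12 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}
```

With two such files, for instance hmmer_hits.txt and psiblast_hits.txt, a call like compare_hits hmmer_hits.txt psiblast_hits.txt shows at a glance which homologs only one of the methods found.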

Report

Write a summary of what you have done in this exercise and the conclusions you have drawn, and hand it in no later than next Friday!

Profile HMMs: Summary

The basic strengths of profile HMMs are (even if you did not find any evidence of them whatsoever...):

A model can be quickly built from any multiple alignment of interest.
The model can be used to search a database and/or parse sequences for the presence of similar domains.
Profile HMMs can be used to maintain alignments of huge numbers of sequences, starting from carefully constructed "seed" alignments of a representative set of sequences.

References

Profile hidden Markov models.

S.R. Eddy. Bioinformatics 14:755-763, 1998. A review of the profile HMM literature from 1996-1998.

Multiple-alignment and -sequence searches.

S.R. Eddy. Trends Guide to Bioinformatics, pp. 15-18, 1998. A brief practical example of using multiple alignment and profile search software to detect a remote informative similarity to a "novel" sequence with no informative BLAST hits.

Hidden Markov Models

S.R. Eddy. Current Opinion in Structural Biology, 6:361-365, 1996. Review of the use of HMMs for sequence and structure profiles.

Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids.

R. Durbin, S. Eddy, A. Krogh, G. Mitchison. Cambridge University Press, 1998. 350 pages.
Almost everything (and possibly more) that you ever wanted to know about hidden Markov models and other probabilistic modeling approaches in biosequence analysis.
