Tree families database The Sanger Institute Beijing Genomics Institute


TreeFam (Tree families database) is a database of phylogenetic trees of animal genes. It aims to develop a curated resource that gives reliable information about ortholog and paralog assignments and about the evolutionary history of gene families.

TreeFam defines a gene family as a group of genes that evolved after the speciation of single-metazoan animals. It also tries to include genes from outgroup species such as yeast (S. cerevisiae and S. pombe) and plant (A. thaliana) to reveal these distant members.

TreeFam is also an ortholog database. Unlike ortholog databases based on pairwise alignments, TreeFam infers orthologs from gene trees. It fits each gene tree into the universal species tree and identifies historical duplication, speciation, and loss events. TreeFam uses this information to evaluate tree building, guide manual curation, and infer complex ortholog and paralog relations.
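The idea of fitting a gene tree into a species tree can be illustrated with a minimal sketch of the textbook lowest-common-ancestor (LCA) reconciliation. This is not TreeFam's own implementation: the toy species tree, the gene names, and all function names below are hypothetical, and loss events are not inferred here. Each gene-tree node is mapped to the LCA of its children's species; a node that maps to the same species node as one of its children is called a duplication, otherwise a speciation. Genes separated by a speciation node are then reported as orthologs, and genes separated by a duplication node as paralogs.

```python
# Toy species tree as a child -> parent map (hypothetical example data).
SPECIES_PARENT = {
    "human": "mammals", "mouse": "mammals",
    "mammals": "vertebrates", "zebrafish": "vertebrates",
    "vertebrates": None,
}

def ancestors(species):
    """Path from a species-tree node up to the root, inclusive."""
    path = []
    while species is not None:
        path.append(species)
        species = SPECIES_PARENT[species]
    return path

def lca(a, b):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same species tree")

def reconcile(node, events):
    """Map a gene-tree node onto the species tree and label internal nodes.

    A leaf is a (gene_name, species) tuple; an internal node is a
    two-element list [left, right]. Appends one
    (kind, mapped_species, left_genes, right_genes) record per internal
    node to `events` and returns (mapped_species, genes_below_node).
    """
    if isinstance(node, tuple):
        gene, species = node
        return species, [gene]
    left, right = node
    left_sp, left_genes = reconcile(left, events)
    right_sp, right_genes = reconcile(right, events)
    mapped = lca(left_sp, right_sp)
    # Duplication if the node maps to the same place as one of its children.
    kind = "duplication" if mapped in (left_sp, right_sp) else "speciation"
    events.append((kind, mapped, left_genes, right_genes))
    return mapped, left_genes + right_genes

def classify_pairs(events):
    """Split gene pairs into orthologs and paralogs by the event separating them."""
    orthologs, paralogs = set(), set()
    for kind, _, left_genes, right_genes in events:
        target = orthologs if kind == "speciation" else paralogs
        for a in left_genes:
            for b in right_genes:
                target.add(frozenset((a, b)))
    return orthologs, paralogs

# Hypothetical gene tree: a duplication in the mammalian ancestor,
# with a single zebrafish gene as the outgroup.
GENE_TREE = [
    [[("A1_hs", "human"), ("A1_mm", "mouse")],
     [("A2_hs", "human"), ("A2_mm", "mouse")]],
    ("A_dr", "zebrafish"),
]

EVENTS = []
ROOT_SPECIES, ALL_GENES = reconcile(GENE_TREE, EVENTS)
ORTHOLOGS, PARALOGS = classify_pairs(EVENTS)
```

In this toy case the two mammalian subtrees both map to "mammals", so their parent is labelled a duplication, and A1_hs and A2_hs come out as paralogs while A1_hs and A1_mm come out as orthologs.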

The basic elements of TreeFam are gene families, which fall into two types: TreeFam-A and TreeFam-B families. TreeFam-B families are created automatically and may contain errors when phylogenies are complex. TreeFam-A families are manually curated from TreeFam-B families; family names and node names are assigned during curation. The ultimate goal of TreeFam is to present a curated resource for all the families.


TreeFam 4.0 Released - 2007.03.07

TreeFam 4.0 is released today. In this release, we introduce clustering-based families (TF5 families) to give more complete coverage of all annotated genes. Previously, building automated TreeFam families always started from the original PhIGs clusters. However, as the number of fully sequenced species grows rapidly and gene annotations become more accurate over the years, sticking to the old clusters made TreeFam miss many genes. To increase coverage, we decided to redo the clustering for each new release. The resulting clusters become TF5 families. Consequently, each TreeFam gene is classified in two ways: by the conventional competitive method used in TreeFam-2 and by the new clustering method. Searching for one gene therefore usually returns two results, one for each classification method.

Gene sets were updated as usual. TreeFam-4 is mainly based on Ensembl v41. Four species were added in this release: Ciona savignyi, Gasterosteus aculeatus, Oryzias latipes and Aedes aegypti. Apis mellifera genes have been dropped because Ensembl no longer provides the annotations. Gene sets of all the other species were also updated in October 2006.

TreeSoft Project Launched - 2006.11.01

The TreeSoft project has been registered. TreeSoft is a collection of software for building, displaying, and manipulating phylogenetic trees. It is also the code base for software developed for TreeFam (Tree families database). In addition, TreeSoft provides brief introductions and links to other software, databases, and web services for phylogenetic trees. TreeSoft is an open source project and provides downloads and documentation for most of the source code developed for TreeFam.

HGNC Links to TreeFam - 2006.10.24

The HUGO Gene Nomenclature Committee (HGNC) has started to provide cross-reference links to TreeFam. These links are available on both gene pages and HCOP (HGNC Comparison of Orthology Predictions) pages. Examples are provided here and also here.

Search for External Accessions - 2006.10.22

The search page has been updated to support searching by external accessions from GenBank, UniProt, PDB, and even Pfam, GO, and so on. The cross-reference table was imported from Ensembl. Although earlier versions also supported this function, the new one is more flexible now that the Xref table has become part of the TreeFam MySQL database.

Links to TreeFam pages by cross-reference have been updated accordingly. TreeFam family pages can now be linked to in a new way, for example:

For a complete list of dbid values, please refer to this page. The detailed parameters dbid and spec should be supplied whenever possible. One xref, especially an integer accession, may exist in several databases; in that case, only one result is shown.

TreeFam 3.0 Released - 2006.06.26

It has been over half a year since the last release. Although TreeFam 3.0 looks much like TreeFam 2.0, we do bring a number of new features that may interest you. During this period, we stabilized the automatic pipeline, which will make it possible to update TreeFam more swiftly. We also bring back the ortholog table that was missing from 2.0. Compared with the old ortholog table of TreeFam 1.0, the new version is more complete and, thanks to more sophisticated algorithms, much more accurate. Other notable new features or improvements are:

Link to TreeFam Pages - 2006.06.02

Now various TreeFam pages can be accessed by providing TreeFam gene identifiers or external gene accessions that are stored by other databases such as HGNC, MGI, GenBank, etc. The following are some examples. Details are provided here.

TreeFam 2.0 Released - 2005.12.30

TreeFam 2.0 comes as a new year's present. Several essential improvements were developed in this new release: pipelines rewritten, bugs fixed, more species added, new features introduced, and web pages updated accordingly. Notable improvements are:

  • Data Sets:
  • Pipelines:
    • Competitive method. In TreeFam 2.0, each sequence is assigned to exactly one family, the one that gives the sequence the highest HMMER score. Overlapping families, which were the main problem with TreeFam 1.0, no longer cause trouble.
    • Clean tree. A clean tree is built by merging several trees together, including the Phyml-AA-WAG, Phyml-NT-HKY, NJ-dS and NJ-dN trees. Our preliminary tests suggest this is the most accurate automatic tree-building method that we have tried.
  • Web Pages:
    • Alignment View was added to the family page. Pfam domains and splicing sites are visualized in a mapped picture.
    • A sidebar was introduced. The look and feel was improved.
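The competitive method listed under Pipelines above can be sketched as follows. This is a minimal illustration, not TreeFam's pipeline code: the function name, the input layout (sequence, family, score triples, as might be parsed from HMMER output), and the example scores are all assumptions. Each sequence is placed in the one family whose profile gives it the highest score, so no sequence ends up in two families; ties are resolved here by keeping the first hit seen, which is an arbitrary choice of this sketch.

```python
from collections import defaultdict

def assign_competitively(hits):
    """Assign each sequence to its single best-scoring family.

    `hits` is an iterable of (sequence_id, family_id, score) triples.
    Returns a dict mapping family_id -> list of assigned sequence_ids.
    """
    best = {}  # sequence_id -> (family_id, score) of the best hit so far
    for seq, fam, score in hits:
        if seq not in best or score > best[seq][1]:
            best[seq] = (fam, score)

    families = defaultdict(list)
    for seq, (fam, _) in best.items():
        families[fam].append(seq)
    return dict(families)

# Hypothetical scores: geneB matches both families but scores higher in famY.
HITS = [
    ("geneA", "famX", 310.5),
    ("geneB", "famX", 120.0),
    ("geneB", "famY", 245.2),
    ("geneC", "famY", 198.7),
]
ASSIGNMENT = assign_competitively(HITS)
```

With these made-up scores, geneB appears only in famY even though it also matches famX, which is how overlapping families are avoided.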
At present, TreeFam 2.0 has not been completely finalized. Because we hope users can try the new features after reading our paper published today, we have brought v2.0 out in a hurry. We apologize for the inconvenience and will update the remaining parts in the next few days. In the meantime, the older release v1.x is still temporarily available, hosted by the Institute of Human Genetics of Aarhus University.
Last Modified Mon Mar 5 16:53:34 2007