New Databases and Tools from EMBL-EBI and NCBI

This article was featured in Library Notes #55 (August 2009).

BioCatalogue.org

EMBL-EBI and the University of Manchester have collaborated to launch a "major new e-science resource": BioCatalogue.org. The site is a curated listing of web-based services, databases and tools, meant to give researchers a one-stop place to look for bioinformatics resources.


BioCatalogue is still in its early stages and currently offers just over 1000 listings of web-based resources, with the promise of many more to be added. The listings are maintained by "expert curators", although it's hard to find out on the site who those curators are. That the catalogue is curated suggests that, unlike many other listings of bioinformatics resources, the entries will be checked periodically for currency and functionality.

The site looks a little like Facebook in its design and user interface, but that seems to be the point:  resources are tagged with keywords and users can submit suggestions for services and add comments.  User annotation creates a set of listings that are partly managed by the community of users of the site.  Here you can see the first result in a search for "multiple alignment" on BioCatalogue.

[Screenshot: first BioCatalogue search result for "multiple alignment"]

It will be interesting to see how people use BioCatalogue. For now, the search seems to work well, but the catalogue is missing some major resources: many of the NCBI bioinformatics tools and databases do not appear in searches. Some of the resource names look a little strange, too. The site seems to tack the word "Service" onto the end of a listing's name unless the submitter removes it. For a searchable listing of bioinformatics resources, I still prefer the OBRC (Online Bioinformatics Resources Collection) from the University of Pittsburgh, which is itself listed in BioCatalogue.

 

COBALT Multiple Alignment Tool

Following on from the example above: What tool do you use for multiple sequence alignment? Many people will probably answer "Clustal", and the graphical interface for Clustal at EMBL-EBI certainly makes such work easy. EMBL-EBI also offers many other multiple alignment tools, such as T-Coffee, MUSCLE, MAFFT and Kalign.

More recently, NCBI has entered the multiple alignment arena with COBALT (Constraint-based Alignment Tool).

COBALT was first introduced in a 2007 paper by Papadopoulos and Agarwala. It aligns sequences using progressive pairwise constraints based on information from the Conserved Domain Database (CDD), the PROSITE protein motif database and sequence similarity. According to the authors, COBALT performs as well as or better than other multiple alignment algorithms. Results are returned in a typical multiple alignment text format, and a phylogenetic tree can be constructed and viewed.

[Screenshot: COBALT phylogenetic tree view]

The nice thing about COBALT is that it allows users to edit the sequences by hand after seeing the results and resubmit the edited sequences for re-alignment. Users may also view the phylogenetic tree in four different styles, and the tree data is available in Nexus format for download and submission to other phylogenetic software. COBALT does have some limitations: users may only enter integers for gap costs, so values such as 0.5 cannot be used for gap extension penalties, and the alignment data is not returned as a downloadable output file.

Clustal and COBALT generally return very similar alignments for the same input, but some differences can be found, especially in the numbers and sizes of the gaps each algorithm inserts (even when similar parameters are used on each platform). Users are always cautioned to inspect their multiple alignment results carefully and to set the algorithm parameters to achieve the alignment that best fits their data. EBI maintains a nice tutorial on the use of Clustal in multiple sequence alignment that offers guidance on how to select parameters for your alignment.
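If you want to check gap differences between two alignments of the same sequences more systematically than by eye, a few lines of Python will do it. The following is only a minimal sketch: it assumes you have already copied each tool's aligned sequences into a dictionary keyed by sequence name, with "-" as the gap character. The function names and the toy sequences are made up for illustration and do not come from Clustal or COBALT output.

```python
import re
from typing import Dict, List, Tuple

# One or more consecutive gap characters counts as a single gap "run".
GAP_RUN = re.compile(r"-+")


def gap_runs(aligned: str) -> List[int]:
    """Return the length of every gap run in one aligned sequence."""
    return [len(match.group()) for match in GAP_RUN.finditer(aligned)]


def gap_summary(alignment: Dict[str, str]) -> Dict[str, List[int]]:
    """Map each sequence name to the list of its gap-run lengths."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return {name: gap_runs(seq) for name, seq in alignment.items()}


def compare_gaps(
    first: Dict[str, str], second: Dict[str, str]
) -> Dict[str, Tuple[List[int], List[int]]]:
    """Return only the sequences whose gap runs differ between alignments."""
    summary_a, summary_b = gap_summary(first), gap_summary(second)
    shared = summary_a.keys() & summary_b.keys()
    return {
        name: (summary_a[name], summary_b[name])
        for name in shared
        if summary_a[name] != summary_b[name]
    }


# Hypothetical toy alignments of the same two sequences (not real tool output).
alignment_one = {"seq1": "MKT--LLVA", "seq2": "MKTAALLVA"}
alignment_two = {"seq1": "MKT-L-LVA", "seq2": "MKTAALLVA"}

for name, (runs_one, runs_two) in compare_gaps(alignment_one, alignment_two).items():
    print(name, "gap runs:", runs_one, "vs", runs_two)
```

In this toy case both alignments insert two gap positions into seq1, but one places them as a single gap of length two and the other as two gaps of length one. That is exactly the kind of difference worth a closer look before trusting either result.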

Pamela Shaw
Biosciences Librarian
E-mail me by using the "E-mail Pam" link on my liaison page.

The Biosciences & Bioinformatics Blog highlights new tools and news items of interest to the biosciences research community at Northwestern University.

Comments
Comment about the BioCatalogue article:
BioCatalogue is a catalogue of bioinformatics web services (http://en.wikipedia.org/wiki/Web_service). So the BioCatalogue lists web services bioinformaticians use to access bioinformatics databases and tools. For example, many of the NCBI bioinformatics tools and databases can be accessed through the NCBI eutils web services, which are listed in the BioCatalogue: http://www.biocatalogue.org/search?q=eutils.
My conclusion is that if you are looking for bio databases, BioCatalogue is the wrong place, but if you are looking for bio web services to access bio databases and resources, then BioCatalogue is the place to start.
# Posted By Franck | 10/2/09 4:26 AM
Franck,
You are correct, and I stand corrected. At the time I first looked at BioCatalogue, I didn't realize it was intended to be a provider only of Web services. So for that reason, my searches for databases did not return what I expected.

The BioCatalogue is incredibly useful in what it is: there are very few sites that feature Web services and machine-to-machine tools that are useful for molecular biology research.

Thanks for the comment and correction. -Pamela
# Posted By Pamela Shaw | 10/2/09 11:52 AM