Over the years, gene sequences have been obtained from genome sequencing projects, allelic variation studies, environmental sampling and a multitude of individual studies. Sequence data for many genomes remains available from websites dedicated to the individual genome projects. Initially, repositories existed only for the major sequencing projects, and data from smaller projects was available on request. During this early period of the biological information age, it became apparent that a more centralized solution would be required.
Obtaining sequence data directly from genome repositories provides the most recent gene predictions and functional annotations, but it is time consuming when performing comparative analyses. To reduce search time for researchers, a number of handy central repositories have been established, the most prominent being NCBI's Non-Redundant (NR) database, UniProt, PIR, and PDB. These central repositories have the advantage of letting users search all known sequences in one place, but they are often a version or two behind the most curated genomes and consequently have slightly inferior gene boundary predictions and annotations. They also often suffer from redundancy.
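When you pool sequences from several central repositories yourself, the simplest form of redundancy is the same sequence appearing under several headers. The sketch below collapses exact duplicates only, keeping every header that pointed at the sequence. It is a minimal illustration under that assumption, not how NCBI builds NR; near-identical sequences (a single substitution, a truncated gene model) would need a proper clustering tool, which this does not attempt. The function names are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)  # sequence may be wrapped over many lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def collapse_identical(records):
    """Group records with identical sequences (case-insensitive).

    Returns a list of (headers, sequence) in first-seen order, where
    headers lists every original header sharing that exact sequence.
    """
    groups = {}  # dicts keep insertion order in Python 3.7+
    for header, seq in records:
        groups.setdefault(seq.upper(), []).append(header)
    return [(headers, seq) for seq, headers in groups.items()]


if __name__ == "__main__":
    fasta = ">a\nMKT\nAA\n>b\nmktaa\n>c\nGGG\n"
    for headers, seq in collapse_identical(parse_fasta(fasta)):
        print(headers, seq)
```

Keying on the upper-cased sequence is a deliberate choice so that soft-masked (lower-case) copies still count as duplicates; drop the `.upper()` if case carries meaning in your data.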
Each of these databases can be downloaded or accessed through a web portal. A video example demonstrates how a sequence can be obtained from NCBI's NR and then used to identify homologous structures in PDB. A separate example shows how to obtain all sequences from the human genome.
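The same two steps from the video (fetch a sequence, then search it against PDB) can also be scripted. The sketch below assumes NCBI's E-utilities `efetch.fcgi` endpoint for retrieval and the BLAST URL API (`Blast.cgi` with `CMD=Put`) for submitting a `blastp` search against the `pdb` database. Only the request-building functions are exercised offline; the two network functions need internet access, and the helper names are hypothetical. The BLAST submission returns a page containing a request ID (RID) that must then be polled with `CMD=Get`, which this sketch leaves out.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def efetch_url(accession, db="protein"):
    """Build an E-utilities efetch URL returning one record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession, db="protein", timeout=30):
    """Download the FASTA record for an accession (needs network access)."""
    with urlopen(efetch_url(accession, db), timeout=timeout) as response:
        return response.read().decode("utf-8")


def blast_put_body(sequence, database="pdb", program="blastp"):
    """Form-encoded body for a BLAST URL API search submission (CMD=Put)."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": sequence}
    return urlencode(params).encode("ascii")


def submit_blast(sequence, database="pdb", program="blastp", timeout=60):
    """POST the search; the returned page embeds an RID to poll later."""
    data = blast_put_body(sequence, database, program)
    with urlopen(BLAST_URL, data=data, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Usage would look like `submit_blast(fetch_fasta("SOME_ACCESSION"))`, where the accession is a placeholder for whatever sequence you pulled from NR. NCBI asks scripted users to keep request rates low, so pause between calls in any loop.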
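Local BLAST databases can also be assembled from downloaded sequence sets such as these, so that searches run on your own machine rather than through a web portal.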
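A minimal sketch of assembling and querying a local BLAST database, assuming the NCBI BLAST+ command-line tools (`makeblastdb`, `blastp`) are installed and on your PATH. The Python only builds the argument lists and runs them; the wrapper names are hypothetical. `-parse_seqids` assumes NCBI-style identifiers in the FASTA headers, so it is optional here; turn it off for arbitrary headers.

```python
import shutil
import subprocess


def makeblastdb_command(fasta_path, db_name, dbtype="prot", parse_seqids=True):
    """Argument list for BLAST+ makeblastdb (dbtype is 'prot' or 'nucl')."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    cmd = ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]
    if parse_seqids:
        cmd.append("-parse_seqids")  # index NCBI-style IDs for later lookup
    return cmd


def blastp_command(query_path, db_name, out_path):
    """Argument list for a blastp search with tabular output (-outfmt 6)."""
    return ["blastp", "-query", query_path, "-db", db_name,
            "-out", out_path, "-outfmt", "6"]


def run(cmd):
    """Run a BLAST+ command, failing early if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(cmd[0] + " not found on PATH; install BLAST+")
    subprocess.run(cmd, check=True)
```

Typical use: `run(makeblastdb_command("pdb_seqres.fasta", "pdb_local"))` once, then `run(blastp_command("query.fasta", "pdb_local", "hits.tsv"))` for each query. The file names are placeholders.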
Introduction
Hello and welcome to Pragmatic Bioinformatics, a source of practical tool reviews, handy code, and general tips for budding and seasoned computational biologists alike.
Saturday, February 2, 2008
