Genomic and Genetic Databases
The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.
EMBL-EBI provides freely available data from life science experiments, performs basic research in computational biology and offers an extensive user training programme, supporting researchers in academia and industry.
DDBJ Center collects nucleotide sequence data as a member of INSDC (International Nucleotide Sequence Database Collaboration) and provides freely available nucleotide sequence data and a supercomputer system to support research activities in life science.
The International Nucleotide Sequence Database Collaboration (INSDC) is a long-standing foundational initiative that operates between DDBJ, EMBL-EBI and NCBI. INSDC covers the spectrum of data from raw reads, through alignments and assemblies, to functional annotation, enriched with contextual information relating to samples and experimental configurations.
NCBI Reference Sequence Database. A comprehensive, integrated, non-redundant, well-annotated set of reference sequences including genomic, transcript, and protein.
A repository for high-quality gene models produced by the manual annotation of vertebrate genomes.
The Consensus CDS (CCDS) project is a collaborative effort to identify a core set of human and mouse protein-coding regions that are consistently annotated and of high quality. The long-term goal is to support convergence towards a standard set of gene annotations.
The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.
UCSC Genome Browser contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides portals to ENCODE data at UCSC (2003 to 2012) and to the Neanderthal project.
The Map Viewer provides a wide variety of genome mapping and sequencing data.
The National Human Genome Research Institute (NHGRI) launched a public research consortium named ENCODE, the Encyclopedia Of DNA Elements, in September 2003, to carry out a project to identify all functional elements in the human genome sequence.
The 1000 Genomes Project ran between 2008 and 2015, creating the largest public catalogue of human variation and genotype data. Since the project ended, the Data Coordination Centre at EMBL-EBI has received continued funding from the Wellcome Trust to maintain and expand the resource.
The Rat Genome Database (RGD) was established in 1999 and is the premier site for genetic, genomic, phenotype, and disease data generated from rat research. In addition, it provides easy access to corresponding human and mouse data for cross-species comparisons. RGD’s comprehensive data and innovative software tools make it a valuable resource for researchers worldwide.
MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.
ZFIN serves as the zebrafish model organism database. The long-term goals for ZFIN are a) to be the community database resource for the laboratory use of zebrafish, b) to develop and support integrated zebrafish genetic, genomic and developmental information, c) to maintain the definitive reference data sets of zebrafish research information, d) to link this information extensively to corresponding data in other model organism and human databases, e) to facilitate the use of zebrafish as a model for human biology and f) to serve the needs of the research community.
A Database of Drosophila Genes & Genomes
VectorBase is an NIAID Bioinformatics Resource Center dedicated to providing data to the scientific community for Invertebrate Vectors of Human Pathogens.
WormBase is an international consortium of biologists and computer scientists dedicated to providing the research community with accurate, current, accessible information concerning the genetics, genomics and biology of C. elegans and related nematodes. Founded in 2000, the WormBase Consortium is led by Paul Sternberg of Caltech, Paul Kersey of the EBI, Matt Berriman of the Wellcome Trust Sanger Institute, and Lincoln Stein of the Ontario Institute for Cancer Research.
Gramene is a curated, open-source, integrated data resource for comparative functional genomics in crops and model plant species.
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.
The Saccharomyces Genome Database (SGD) provides comprehensive integrated biological information for the budding yeast Saccharomyces cerevisiae along with search and analysis tools to explore these data, enabling the discovery of functional relationships between sequence and gene products in fungi and higher organisms.
The mission of the Integrated Microbial Genomes (IMG) system is to support the annotation, analysis and distribution of microbial genome and metagenome datasets sequenced at DOE’s Joint Genome Institute (JGI).
PortEco is a next-generation resource for knowledge and data about the biology of Escherichia coli K-12 group strains (these are laboratory strains and are not pathogenic), its bacteriophages, plasmids, and mobile genetic elements. PortEco is being developed by a national consortium of both laboratory biologists and computational biologists, and is funded by a grant from the U.S. National Institutes of Health.
The Generic Model Organism Database project is a collection of open source software tools for managing, visualising, storing, and disseminating genetic and genomic data.
The International HapMap Project is a partnership of scientists and funding agencies from Canada, China, Japan, Nigeria, the United Kingdom and the United States to develop a public resource that will help researchers find genes associated with human disease and response to pharmaceuticals.
This resource is powered by the Protein Data Bank archive, which holds information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease.
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
GEO is a public functional genomics data repository supporting MIAME-compliant data submissions. Array- and sequence-based data are accepted. Tools are provided to help users query and download experiments and curated gene expression profiles.
ArrayExpress Archive of Functional Genomics Data stores data from high-throughput functional genomics experiments, and provides these data for reuse to the research community.
The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans.