Repositories

NCBI (GenBank)

The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.

EMBL-EBI

EMBL-EBI provides freely available data from life science experiments, performs basic research in computational biology and offers an extensive user training programme, supporting researchers in academia and industry.

DDBJ

DDBJ Center collects nucleotide sequence data as a member of INSDC (International Nucleotide Sequence Database Collaboration) and provides freely available nucleotide sequence data and a supercomputer system to support research activities in the life sciences.

INSDC

The International Nucleotide Sequence Database Collaboration (INSDC) is a long-standing foundational initiative that operates between DDBJ, EMBL-EBI and NCBI. INSDC covers the spectrum of data from raw reads, through alignments and assemblies, to functional annotation, enriched with contextual information relating to samples and experimental configurations.

RefSeq

NCBI Reference Sequence Database. A comprehensive, integrated, non-redundant, well-annotated set of reference sequences including genomic, transcript, and protein.

VEGA

A repository for high-quality gene models produced by the manual annotation of vertebrate genomes.

CCDS

The Consensus CDS (CCDS) project is a collaborative effort to identify a core set of human and mouse protein-coding regions that are consistently annotated and of high quality. The long-term goal is to support convergence towards a standard set of gene annotations.

 

Genome Browsers

Ensembl

The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.

UCSC Genome Browser

UCSC Genome Browser contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides portals to ENCODE data at UCSC (2003 to 2012) and to the Neanderthal project.

NCBI MapViewer

The Map Viewer provides a wide variety of genome mapping and sequencing data.

ENCODE

The National Human Genome Research Institute (NHGRI) launched a public research consortium named ENCODE, the Encyclopedia Of DNA Elements, in September 2003, to carry out a project to identify all functional elements in the human genome sequence.

1000 Genomes

The 1000 Genomes Project ran between 2008 and 2015, creating the largest public catalogue of human variation and genotype data. As the project ended, the Data Coordination Centre at EMBL-EBI has received continued funding from the Wellcome Trust to maintain and expand the resource.

 

Species and Taxa Specific Databases

Rat Genome Database

The Rat Genome Database (RGD) was established in 1999 and is the premier site for genetic, genomic, phenotype, and disease data generated from rat research. In addition, it provides easy access to corresponding human and mouse data for cross-species comparisons. RGD’s comprehensive data and innovative software tools make it a valuable resource for researchers worldwide.

Mouse Genome Informatics

MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.

ZFIN, Zebrafish Model Organism Database

ZFIN serves as the zebrafish model organism database. The long term goals for ZFIN are a) to be the community database resource for the laboratory use of zebrafish, b) to develop and support integrated zebrafish genetic, genomic and developmental information, c) to maintain the definitive reference data sets of zebrafish research information, d) to link this information extensively to corresponding data in other model organism and human databases, e) to facilitate the use of zebrafish as a model for human biology and f) to serve the needs of the research community.

FlyBase, Drosophila and other species

A Database of Drosophila Genes & Genomes

VectorBase, invertebrate vectors of human disease

VectorBase is an NIAID Bioinformatics Resource Center dedicated to providing data to the scientific community for Invertebrate Vectors of Human Pathogens.

WormBase, C. elegans and related nematodes

WormBase is an international consortium of biologists and computer scientists dedicated to providing the research community with accurate, current, accessible information concerning the genetics, genomics and biology of C. elegans and related nematodes. Founded in 2000, the WormBase Consortium is led by Paul Sternberg of Caltech, Paul Kersey of the EBI, Matt Berriman of the Wellcome Trust Sanger Institute, and Lincoln Stein of the Ontario Institute for Cancer Research.

Gramene, crop grasses and other plants

Gramene is a curated, open-source, integrated data resource for comparative functional genomics in crops and model plant species.

TAIR, Arabidopsis

The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.

SGD, Saccharomyces Genome Database

The Saccharomyces Genome Database (SGD) provides comprehensive integrated biological information for the budding yeast Saccharomyces cerevisiae along with search and analysis tools to explore these data, enabling the discovery of functional relationships between sequence and gene products in fungi and higher organisms.

IMG, Integrated Microbial Genomes

The mission of the Integrated Microbial Genomes (IMG) system is to support the annotation, analysis and distribution of microbial genome and metagenome datasets sequenced at DOE’s Joint Genome Institute (JGI).

EcoliHub, community and database for E. coli

PortEco is a next-generation resource for knowledge and data about the biology of Escherichia coli K-12 group strains (these are laboratory strains and are not pathogenic), its bacteriophages, plasmids, and mobile genetic elements. PortEco is being developed by a national consortium of both laboratory biologists and computational biologists, and is funded by a grant from the U.S. National Institutes of Health.

VBRC, Viral Bioinformatics Resource Center

GMOD

The Generic Model Organism Database project is a collection of open source software tools for managing, visualising, storing, and disseminating genetic and genomic data.

HapMap

The International HapMap Project is a partnership of scientists and funding agencies from Canada, China, Japan, Nigeria, the United Kingdom and the United States to develop a public resource that will help researchers find genes associated with human disease and response to pharmaceuticals.

 

Subject-specific Databases

PDB, Protein 3D structure Database

This resource is powered by the Protein Data Bank archive: information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease.

Pfam, protein families

The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).

GEO, Gene Expression Omnibus

GEO is a public functional genomics data repository supporting MIAME-compliant data submissions. Array- and sequence-based data are accepted. Tools are provided to help users query and download experiments and curated gene expression profiles.

ArrayExpress, repository for transcriptomics data

ArrayExpress Archive of Functional Genomics Data stores data from high-throughput functional genomics experiments, and provides these data for reuse to the research community.

dbGaP

The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans.