Arabidopsis Information Portal (AIP)
"The Arabidopsis Information Portal is an open-access online community resource for Arabidopsis research. AIP enables biologists to navigate from the Arabidopsis thaliana Col-0 reference genome sequence to its associated annotation including gene structure, gene expression, protein function, and interaction networks. AIP was funded in 2013 and came online in 2014. AIP already offers a single interface through which to access a wide range of Arabidopsis information. AIP will grow through contributions of other labs in the form of modules: data, computation, and visualization tools."
Bio-Analytic Resource - the BAR
"...Web-based tools for working with functional genomics and other data. Most are designed with the plant (mainly Arabidopsis) researcher in mind, but a couple of them can be useful to the wider research community, e.g. Mouse eFP Browser and Human eFP Browser or BlastDigester."
Bordwell pKa Table (Acidity in DMSO)
This free resource provides pKa values and references for acidity in DMSO. Scrolling down the left-hand side, you will also find links to pKa lists for water, gas-phase pKa data on the NIST website, and a page on lithium reagents.
Data.gov
"Data.gov increases the ability of the public to easily find, download, and use datasets that are generated and held by the Federal Government. Data.gov provides descriptions of the Federal datasets (metadata), information about how to access the datasets, and tools that leverage government datasets. The data catalogs will continue to grow as datasets are added. Federal, Executive Branch data are included in the first version of Data.gov."
Dryad
"Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences. Dryad enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies."
Ensembl
"The Ensembl genome annotation system, developed jointly by the EBI and the Wellcome Trust Sanger Institute, has been used for the annotation, analysis and display of vertebrate genomes since 2000. Since 2009, the Ensembl site has been complemented by the creation of five new sites, for bacteria, protists, fungi, plants and invertebrate metazoa, enabling users to use a single collection of (interactive and programmatic) interfaces for accessing and comparing genome-scale data from species of scientific interest from across the taxonomy."
Figshare
"Figshare allows researchers to publish all of their research outputs in seconds in an easily citable, sharable and discoverable manner. All file formats can be published, including videos and datasets that are often demoted to the supplemental materials section in current publishing models. By opening up the peer review process, researchers can easily publish null results, avoiding the file drawer effect and helping to make scientific research more efficient. Figshare uses creative commons licensing to allow frictionless sharing of research data whilst allowing users to maintain their ownership."
"Figshare gives users unlimited public space and 1GB of private storage space for free."
"Figshare is based in London and is supported by Digital Science. Digital Science’s relationship with Figshare represents the first of its kind in the company’s history: a community-based, open science project that will retain its autonomy whilst receiving support from the
GenBank by NIH
"GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2008 Jan;36(Database issue):D25-30). There are approximately 106,533,156,756 bases in 108,431,692 sequence records in the traditional GenBank divisions and 148,165,117,763 bases in 48,443,067 sequence records in the WGS division as of August 2009."
Tip: Data submission methods and instructions are located halfway down the page.
Global Burden of Disease (GBD)
Purpose: "To provide policymakers, researchers, donors, and other decision-makers with the most timely and up-to-date picture of population health to inform critical decisions, the Global Burden of Disease (GBD) will produce annual updates to its estimates. The first update, GBD 2013, uses and expands upon the infrastructure of methodology, datasets, and tools that were presented in GBD 2010, and presents estimates of all-cause mortality, deaths by cause, years of life lost, years lived with disability, and disability-adjusted life years by country, age, and sex."
Gramene: A comparative resource for plants
"Gramene is a curated, open-source, integrated data resource for comparative functional genomics in crops and model plant species. Our goal is to facilitate the study of cross-species comparisons using information generated from projects supported by public funds. Gramene currently hosts annotated whole genomes in over two dozen plant species and partial assemblies for almost a dozen wild rice species in the Ensembl browser, genetic and physical maps with genes, ESTs and QTLs locations, genetic diversity data sets, structure-function analysis of proteins, plant pathways databases (BioCyc and Plant Reactome platforms), and descriptions of phenotypic traits and mutations."
ICPSR (Inter-University Consortium for Political and Social Research)
"An international consortium of about 700 academic institutions and research organizations, ICPSR provides leadership and training in data access, curation, and methods of analysis for the social science research community.
ICPSR maintains a data archive of more than 500,000 files of research in the social sciences. It hosts 16 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields."
Institute for Quantitative Social Science at Harvard University Dataverse
"Access the world's largest collection of social science research data here by searching across or browsing through one of the virtual data archives (called 'dataverses') listed below. You may also create a dataverse of your own, backed up in perpetuity by the Henry A. Murray Archive, which may easily be customized to appear as if it is on your web page, but in fact is served by the IQSS Dataverse Network."
If you use this open-access (OA) reference work, check the citation information and consult the cited source to verify any values you plan to use, especially if safety is an issue. The creators warn against using unverified information in ways that, if wrong, could endanger people's health and safety. "Explore thousands of values [for physical properties of elements] with citations spanning more than 100 quantities. Use our calculators to convert values to your unit."
NCBI (National Center for Biotechnology Information)
"The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information."
NIST (National Institute of Standards and Technology) Standard Reference Data
"NIST produces the Nation’s Standard Reference Data (SRD). These data are assessed by experts and are trustworthy such that people can use the data with confidence and base significant decisions on the data. NIST provides 49 free SRD databases and 41 fee-based SRD databases."
Open Tree of Life
"Open Tree of Life aims to construct a comprehensive, dynamic and digitally-available tree of life by synthesizing published phylogenetic trees along with taxonomic data. The project is a collaborative effort between 11 PIs across 10 institutions. Funding is from NSF AVAToL #1208809.
Browse the tree and leave feedback: Click on nodes to move through the tree, and click on nodes or edges to see more information about taxonomies and phylogenies that contain / support that node. If you have feedback about the relationships that you see, use the 'Add Comment' button.
Contribute data: You can contribute to the synthetic tree by uploading trees through our curation interface. These can be trees from TreeBASE or trees uploaded from your computer. There will be a delay before uploaded trees appear in the synthetic tree. Up to summer 2015, the release cycle has been months between new versions of the synthetic tree, but this should shorten in the future."
OrgRef
"OrgRef is an open dataset which is free for anyone to use. It extracts structured information about organizations from open resources like Wikipedia. It was created with publishers in mind, and aims to cover the most important academic and research organizations worldwide."
Salk Institute Genomic Analysis Laboratory (SIGnAL)
Plant-, human-, and mouse-related tools and data; most focus on Arabidopsis.
Science.gov
"Science.gov searches over 42 databases and over 2,000 selected websites from 14 federal agencies, offering 200 million pages of authoritative U.S. government science information, including research and development results."
ToxBank
*To use ToxBank data, you must register (registration is free).
"ToxBank provides a dedicated data warehouse for toxicity data management and modelling, a 'gold standards' compound database and repository of selected test compounds, and a reference resource for cells, cell lines and tissues of relevance for in vitro systemic toxicity research. ToxBank supports the data management and analysis activities carried out across the Alternative Testing Strategies SEURAT-1 program."
Chemical Safety Library via the Pistoia Alliance
"The Pistoia Alliance Chemical Safety Library (CSL) provides unique crowdsourced data content containing hazardous reactions that can be used to alert scientists to potentially dangerous experiments. This new platform supports laboratory safety by providing tools to search for submitted hazardous reaction information by CAS Registry Number, chemical name, SMILES, InChI, and more; contribute new hazardous reaction incident information to the library; and download aggregated information to integrate with internal workflows and knowledge bases."
Open Data Formats & Programs
R
"R is an open-source statistical environment that can be used for not only statistics, but also for data acquisition, data manipulation, modeling, among other uses...Part of the reason behind R’s explosive growth is the ease with which the ever-growing userbase can add new functionality, a fact evidenced by 3,000+ currently available R packages."
The Open Microscopy Environment
"The Open Microscopy Environment (OME) is a multi-site collaborative effort among academic laboratories and a number of commercial entities that produces open tools to support data management for biological light microscopy. Designed to interact with existing commercial software, all OME formats and software are free, and all OME source code is available under the GNU General Public License or through commercial license from Glencoe Software."
Virtual Embryo Projects
EMAP (EMA Anatomy Atlas of Mouse Development and the EMAGE Gene Expression Database)
"The EMA Anatomy Atlas of Mouse Development uses embryological mouse models to provide a digital atlas of mouse development. It is based on the definitive books of mouse embryonic development by Theiler (1989) and Kaufman (1992) yet extends these studies by creating a series of three dimensional computer models of mouse embryos at successive stages of development with defined anatomical domains linked by a stage-by-stage ontology of anatomical names."
"EMAGE is a database of in situ gene expression data in the mouse embryo and an accompanying suite of tools to search and analyse the data."
Virtual Human Embryo Project
"...The Virtual Human Embryo (VHE), a 14,250-page, illustrated atlas of human embryology, which presents all 23 Carnegie Stages of development during the 8-week embryonic period."