
PubChem

PubChem is a freely accessible database of small organic molecules and their activities in biological assays. It was created by the NIH in 2004 and is maintained by the National Center for Biotechnology Information (NCBI), a component of the National Library of Medicine. PubChem is a critical part of the Molecular Libraries Initiative. The database connects chemical information with biomedical research and clinical information, organizing facts from numerous databases into a unified whole.

PubChem consists of three dynamically growing primary databases:

  • PubChem Compound: contains pure and characterized chemical compounds.
  • PubChem Substance: contains mixtures, extracts, complexes, and uncharacterized substances.
  • PubChem BioAssay: contains results from high-throughput screening programs, comprising several million values.
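All three databases can also be queried programmatically through NCBI's PUG REST web service. The sketch below only builds request URLs; it assumes the PUG REST path layout `<domain>/<namespace>/<identifier>/<operation>/<output>`, and the helper name `pug_rest_url` is hypothetical. Check the PUG REST documentation for the operations each domain supports before relying on a particular one.

```python
# Base URL of PubChem's PUG REST web service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Each primary database maps to a PUG REST domain and the identifier
# namespace used to address its records.
NAMESPACES = {
    "compound": "cid",   # PubChem Compound, addressed by compound ID (CID)
    "substance": "sid",  # PubChem Substance, addressed by substance ID (SID)
    "assay": "aid",      # PubChem BioAssay, addressed by assay ID (AID)
}


def pug_rest_url(domain: str, identifier: int, operation: str, output: str = "JSON") -> str:
    """Build a PUG REST URL (hypothetical helper; performs no network access)."""
    if domain not in NAMESPACES:
        raise ValueError(f"unknown PubChem domain: {domain!r}")
    parts = [PUG_REST_BASE, domain, NAMESPACES[domain], str(identifier), operation, output]
    return "/".join(parts)
```

For example, `pug_rest_url("compound", 2244, "property/MolecularFormula")` yields `https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula/JSON`, which can then be fetched with `urllib.request.urlopen` and parsed with `json.load`.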

The integration of these databases makes PubChem a critical new tool that will speed the development of new treatments for America's most important health problems. It brings information about the biological activities of chemical substances to biomedical researchers on a broad scale.

The NCATS NIH Chemical Genomics Center (NCGC) makes assay data available in PubChem.

PubChem Data Guideline

The quantitative high-throughput screening (qHTS) data in PubChem are preliminary. For this reason, and because compound quantities are limited, PubChem does not supply probe compounds to investigators other than those who originally submitted the assay.

The data presented in PubChem from NCGC are primary qHTS data. Each sample is tested as a titration series to provide a concentration-response output. Although the results accurately describe the effect of the sample on the assay end point, the "actives" are not necessarily acting through the intended target (i.e., they may be artifactual positives). These primary data are nonetheless provided to allow analysis by cheminformatic algorithms, to guide the selection of compounds for subsequent chemistry optimization, and to populate the "chemical genomics" database of compound-activity profiles. The value of this database should increase as additional assays and compounds are added.
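A concentration-response series of this kind is commonly summarized with the four-parameter Hill (logistic) equation. The form below is a standard textbook model, offered as a sketch rather than as the exact model NCGC uses: $c$ is the sample concentration, $y_0$ and $y_\infty$ are the responses at zero and saturating concentration, $h$ is the Hill slope, and $\mathrm{AC}_{50}$ is the concentration giving a half-maximal response.

```latex
y(c) = y_0 + \frac{y_\infty - y_0}{1 + \left( \mathrm{AC}_{50} / c \right)^{h}},
\qquad
y(\mathrm{AC}_{50}) = y_0 + \frac{y_\infty - y_0}{2}.
```

Setting $c = \mathrm{AC}_{50}$ makes the denominator equal to 2, which is why $\mathrm{AC}_{50}$ marks the midpoint of the curve; the same expression describes activation ($y_\infty > y_0$) and inhibition ($y_\infty < y_0$).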

In interpreting and using qHTS data, investigators should remember the following:

  • The quantity of each tested sample is very limited, so neither NCGC nor the Molecular Libraries Screening Centers Network (MLSCN) repository can supply screening samples on request. Some samples are commercially available and inexpensive and can be purchased directly from vendors. Compounds about which more is known, designated as "probes" by MLSCN, will be marked as such in PubChem, and MLSCN will make arrangements for their broader availability to investigators.
  • The effect of a sample on the assay described in PubChem may reflect artifacts arising from the sample's physical or spectroscopic properties, such as interference with the assay through aggregation in aqueous buffer or through absorbance of the emitted fluorescence used for signal detection. Flags indicating each library sample's propensity for such interference will be added to the data set as they are determined.
  • Quality control information is not necessarily current. Results are reported for "samples" rather than "compounds," because the term "compound" implies a single chemical entity. Liquid chromatography-mass spectrometry analysis and verification of activity will subsequently be performed for a subset of the actives, and these data will be entered into PubChem as they are generated.
  • The IC50/EC50 values (referred to by NCGC as AC50s) determined from the normalized titration-response data (n = 1) are estimates. Curve-fitting artifacts can occur because of the high-throughput nature of the analysis. A flag indicating whether a curve fit has been verified will be updated over time. In addition, the primary data are available for independent interpretation.
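To illustrate why single-replicate AC50 values are only estimates, the sketch below normalizes raw signals against plate controls and then locates the half-maximal crossing by linear interpolation on a log-concentration axis. This interpolation is a deliberately simple stand-in for full curve fitting, not NCGC's method; the control convention, the 20% minimum-efficacy cutoff, and the function names are all hypothetical.

```python
import math
from typing import Optional, Sequence


def percent_activity(signal: float, neutral: float, full_effect: float) -> float:
    """Normalize a raw well signal to percent activity.

    Hypothetical convention: 0% is the neutral (vehicle) control mean and
    100% is the mean of a control giving the full assay effect. This works
    for activators and inhibitors alike, so larger always means stronger.
    """
    return 100.0 * (signal - neutral) / (full_effect - neutral)


def estimate_ac50(
    concs: Sequence[float],
    activity: Sequence[float],
    min_efficacy: float = 20.0,  # hypothetical activity cutoff, in percent
) -> Optional[float]:
    """Estimate AC50 from one titration series (n = 1).

    Assumes the normalized baseline is near 0%, so the half-maximal level is
    half of the largest observed activity. Interpolates log10 concentration
    between the first adjacent pair of points (sorted by concentration) that
    brackets that level.
    """
    points = sorted(zip(concs, activity))
    top = max(a for _, a in points)
    if top < min_efficacy:
        return None  # treated as inactive: no meaningful AC50
    half = top / 2.0
    for (c1, a1), (c2, a2) in zip(points, points[1:]):
        if a1 < half <= a2:
            frac = (half - a1) / (a2 - a1)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None  # e.g. already above half-maximal at the lowest concentration
```

Because a single replicate sets both the top of the curve and the crossing point, one noisy well can move the estimate substantially. A full fit (for example, to a four-parameter Hill model) uses every point, but it still inherits the n = 1 limitation.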
