Depositor supplied data: PubChem Substance Database NCBI-generated records: PubChem Compound Database
Reporting errors (in substance records or compound records)
PubChem Substance Database 
repository | data fields | substance identifier (SID) | no curation | corresponding compound | substance classification (ontologies) | depositor categories
-
Repository: The PubChem Substance database is a repository of depositor supplied data. The same chemical may be represented in many PubChem substance records, and those records vary in their information content, reflecting the amount and types of information provided by their depositors. (View a list of data sources.)
-
Data fields: A separate file describes the procedure for "PubChem Substance Deposition using SD File Format" and describes the types of data that may be present in a depositor's record. The only required data field is the depositor's unique external registry ID, and the many other allowable data fields that are described reflect the wide range of information that can appear in an individual substance record. We encourage depositors to provide, at a minimum, a chemical graph and one or more names (synonyms) for the molecule.
-
Substance Identifier (SID) - An NCBI unique identifier, called the substance identifier (SID), is assigned by PubChem to each unique external registry ID provided by a PubChem data depositor. The SID is an integer that identifies the depositor's record within the PubChem Substance database, and will never be reused for another substance record.
A depositor may "revoke" (or otherwise deprecate) a PubChem SID at any time for any reason. However, the link to the revoked SID remains valid in perpetuity: the record shows a message stating that the depositor deprecated the SID, and the archived information is still available. In addition, the PubChem CIDs that the old version of the SID pointed to at the time it was versioned or deprecated will also remain available.
Note: Although identifiers are unique within a PubChem database, the same integer can be used as an identifier in two or more different databases. For example, "2244" is a valid identifier in both the PubChem Substance and PubChem Compound databases, where:
SID: 2244 is the PubChem Substance database record for cytidylate, and
CID: 2244 is the PubChem Compound database record for aspirin.
-
No curation: PubChem doesn't have curators and never changes/edits substance records. They remain as supplied by our depositors.
-
Corresponding compound:
Although PubChem does not curate submitted records, it does generate a corresponding compound record for every unique chemical structure submitted to the Substance database, using the NCBI automated data processing procedures described below, in order to provide a non-redundant view of the molecules in PubChem. The depositor-supplied substance record is then linked to the corresponding NCBI-generated compound record, and vice versa. If a depositor submits a substance that has a chemical structure already represented in an existing PubChem Compound record, a reciprocal link between the new substance record and the existing compound record is created, and the compound record is updated to reflect the newly available substance. (See illustrated example of substance records and corresponding compound record.)
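The linking described above can be pictured as grouping substance records by structure. The following Python sketch is illustrative only: `structure_key` stands in for whatever standardized structure representation PubChem's processing produces, and the function name and the CID values it assigns are hypothetical.

```python
from collections import defaultdict


def link_substances_to_compounds(substances):
    """Group substance records by structure, one compound per unique structure.

    `substances` is an iterable of (sid, structure_key) pairs. The
    structure_key is an assumed stand-in for a standardized structure;
    the CIDs assigned here are illustrative, not real PubChem CIDs.
    """
    cid_by_structure = {}
    cid_by_sid = {}
    sids_by_cid = defaultdict(list)
    next_cid = 1
    for sid, structure_key in substances:
        if structure_key not in cid_by_structure:
            # First time this structure is seen: create a compound record.
            cid_by_structure[structure_key] = next_cid
            next_cid += 1
        cid = cid_by_structure[structure_key]
        # Reciprocal links: substance -> compound and compound -> substances.
        cid_by_sid[sid] = cid
        sids_by_cid[cid].append(sid)
    return cid_by_sid, dict(sids_by_cid)
```

A substance whose structure is already known reuses the existing compound, which mirrors the "reciprocal link" case in the paragraph above.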
-
Substance classification (ontologies): Some depositors of PubChem Substance records, such as ChEBI and KEGG, maintain ontologies of terms that describe the substances they deposit into PubChem. Substance records from those depositors display applicable terms from their ontologies in the "Classification: Ontologies" section of the PubChem Substance summary page. (If those substance records are associated with corresponding PubChem Compound records (because they contain the same chemical structure, including same connectivity, isotopes, and stereochemistry), then the classification information from the substance records is also added to the "Classification: Ontologies" section of the corresponding PubChem Compound record. The PubChem Classification Browser can be used to browse/retrieve substances and compounds using a variety of classification ontologies.)
-
Depositor Categories:
Each lab or organization that deposits data into PubChem falls into one of the categories below. The category indicates the type of information you can expect to find for a molecule in that depositor's PubChem substance records or on the depositor's site.
Each depositor's category is displayed on the list of data sources web page. The categories are also displayed in the "Classification" section of the corresponding PubChem Compound records, under the subheader "Substance Categorization Classification." That section of a PubChem Compound record allows you to quickly find the corresponding PubChem Substance records that are likely to contain a given type of information, such as Chemical Reactions.
Depositor Category | Meaning
Biological Properties | Depositor provides information about the biological properties of a substance or compound.
Chemical Reactions | Depositor provides information about the reactivity, synthesis, or known reactions of a substance or compound.
Imaging Agents | Depositor provides information about the contrast agent or imaging agent used in, for example, MRI.
Journal Publishers | Depositor is a journal publisher and has articles published about a substance or compound.
Metabolic Pathways | Depositor provides information on the metabolic pathways involving a substance or compound.
Molecular Libraries Screening Center Network | Depositor is part of the NIH Molecular Libraries Screening Center Network (MLSCN).
NIH Substance Repository | Depositor is an NIH Molecular Libraries Small Molecule Repository servicing the MLSCN.
Physical Properties | Depositor provides information about the experimental physical properties of a substance or compound.
Protein 3D Structures | Depositor provides information about the experimental 3-D structure of a substance or compound. (Most of the molecule records that fall into this depositor category are derived from Molecular Modeling Database records, which generally contain the 3-D structures of biomolecules, such as proteins, that may be bound to the substance or compound.)
Substance Vendors | Depositor is a seller of a substance or compound.
Theoretical Properties | Depositor provides information about the theoretical properties of a substance or compound.
Toxicology | Depositor provides information about the toxicological properties of a substance or compound.
PubChem Compound Database 
validate & standardize chemical structures | identify unique chemical structures (compound identifier (CID), compound descriptors) | identify SAME compounds | gather and validate chemical name synonyms | compute 3D conformers | compute chemical and physical properties | identify SIMILAR molecules | add biomedical annotations | create links to related information
Gather and validate chemical name synonyms

The next step in PubChem data processing, after the identification of unique chemical structures and same compounds, is to gather and validate all of the synonyms that have been used for those molecules.
Various data depositors might use different terms to refer to the same chemical structure.
An individual PubChem Substance record shows only the synonyms that were provided by the depositor of that record. Different sets of synonyms might be provided by different submitters for the same molecule. The complete set of the synonyms that have been provided by all depositors of a particular chemical structure is referred to as the "unfiltered" list of synonyms. (For example, see the various synonyms provided by submitters of individual PubChem Substance records for ibuprofen (shown in the "Identification" section of each record), or view the total, unfiltered list of synonyms that depositors have used for ibuprofen.)
The corresponding PubChem Compound record shows a "filtered" list of synonyms, derived from all PubChem Substance records containing the same structure, that have been found to consistently refer to that specific chemical structure. (For example, see the filtered list of synonyms in the PubChem Compound record for ibuprofen (CID 3672), in the "Identification" section of that record.)
The "filtered" list of synonyms is created in the following way:
- All depositor-supplied synonyms for a given chemical structure are gathered from the corresponding PubChem Substance records to create the complete, "unfiltered" list of synonyms for that molecule.
- PubChem data processing identifies the subset of depositor-supplied synonyms that have been used consistently and only for the given chemical structure and its isotopes, stereoisomers, and tautomers. (Any synonym that has been used for two or more different chemical structures in the PubChem Substance database is filtered out of the list.) The resulting subset of consistent depositor-supplied synonyms represents the "filtered" list, which appears in the PubChem Compound record for the given chemical structure.
- Each synonym is given a score that determines the order in which the synonyms are shown. The score takes into account the frequency, readability, and consistency of each synonym:
- Frequency - the number of times a synonym is provided by depositors for a particular chemical structure. Most commonly used synonym(s) show first.
- Readability - The readability score is determined by the size of the synonym, the count of non-alphabetic characters, and capitalization, etc., so easily readable synonyms (e.g., ibuprofen) are shown before chemical names (e.g., 2-[4-(2-methylpropyl)phenyl]propanoic acid).
- Consistency - The PubChem Substance records assigned to a synonym need to be consistent at any of the following levels (high to low): exact same structure, same stereo form, same connectivity, same parent structure, same parent stereo form, or same parent connectivity.
- Equation - The equation used to determine the score of a synonym (the "clean synonym weight") is:
59 * log((8- "synonym consistency level") * "synonym readability score" * "synonym frequency")
- A MeSH tree icon appears beside any synonym that has an exact match to a term in the National Library of Medicine's Medical Subject Heading (MeSH) database.
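The filtering and scoring steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not PubChem's implementation: `structure_key` is assumed to already group a structure with its isotopes, stereoisomers, and tautomers; the readability score is taken as a given input because its exact computation is not specified here; the numeric encoding of the consistency level is not given either; and the base of the logarithm is not stated, so the natural log is assumed. All function names are hypothetical.

```python
import math
from collections import defaultdict


def filter_synonyms(deposited):
    """Return {structure_key: {synonym: frequency}} for consistent synonyms.

    `deposited` is an iterable of (structure_key, synonym) pairs, one pair
    per depositor use of that synonym.
    """
    structures_by_synonym = defaultdict(set)
    counts = defaultdict(int)
    for structure_key, synonym in deposited:
        structures_by_synonym[synonym].add(structure_key)
        counts[(structure_key, synonym)] += 1

    filtered = defaultdict(dict)
    for (structure_key, synonym), frequency in counts.items():
        # Drop any synonym that has been used for two or more structures.
        if len(structures_by_synonym[synonym]) == 1:
            filtered[structure_key][synonym] = frequency
    return dict(filtered)


def clean_synonym_weight(consistency_level, readability, frequency):
    """59 * log((8 - consistency level) * readability * frequency).

    Natural log is an assumption; the product must be positive.
    """
    return 59 * math.log((8 - consistency_level) * readability * frequency)


def rank_synonyms(scored):
    """Order synonyms by descending weight.

    `scored` maps synonym -> (consistency_level, readability, frequency).
    """
    return sorted(scored, key=lambda s: clean_synonym_weight(*scored[s]),
                  reverse=True)
```

Because the weight is a logarithm of a product, doubling a synonym's frequency adds a fixed amount to its score rather than doubling it.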
Additional notes about synonyms:
Some PubChem Compound records do not show a list of depositor-supplied synonyms (e.g., CID 444098). This usually means there are no depositor-supplied synonyms that are consistently used only for this chemical structure. If desired, you can see the synonyms that data depositors have used for this structure, using either approach below:
- retrieve the PubChem Substance records that have the "same structure" and view the synonyms that individual depositors used for the chemical structure.
OR
- view the complete "unfiltered" list of synonyms found in all of the PubChem Substance records that have the "same structure" by inserting the CID of interest into the following URL format:
http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?q=nama&cid=_____&namedisopt=Unfiltered
For example:
http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?q=nama&cid=444098&namedisopt=Unfiltered
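If you need this URL for several CIDs, it can be built programmatically. The short Python sketch below simply fills the CID into the URL format shown above; the function name is hypothetical, and the sketch does not check whether the address is still served.

```python
from urllib.parse import urlencode

BASE_URL = "http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi"


def unfiltered_synonyms_url(cid):
    """Build the unfiltered-synonym URL for a CID, following the format above."""
    query = urlencode({"q": "nama", "cid": int(cid), "namedisopt": "Unfiltered"})
    return f"{BASE_URL}?{query}"


print(unfiltered_synonyms_url(444098))
```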
Note that some of the synonyms in the "unfiltered" list have also been used by various depositors for *other* chemical structures in the PubChem Substance database. That is why those synonyms are removed from the "filtered" list that appears in a PubChem Compound record.
The molecule name shown at the top of a PubChem summary page is selected from the list of synonyms in the record. The molecule name at the top of a PubChem Substance summary page is generally the synonym that was listed first by the data depositor in their submitted record. The molecule name at the top of a PubChem Compound summary page is generally the highest scoring term from the filtered list of depositor-supplied synonyms. A synonym that matches a MeSH term, however, is given priority in both cases.
-
Compute 3D conformers

PubChem generates a theoretical 3D description of each compound in the PubChem Compound database that meets all of the following criteria:
- is not too large (<= 50 non-hydrogen atoms).
- is not too flexible (<= 15 rotatable bonds).
- consists of only organic elements (H, C, N, O, F, P, S, Cl, Br, and I).
- has only a single covalent unit (i.e., not a salt or a mixture).
- contains only atom types recognized by the MMFF94s force field.
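The criteria above amount to a simple eligibility check. The Python sketch below is a hedged illustration: the function and parameter names are hypothetical, the counts are assumed to be computed elsewhere by a cheminformatics toolkit, and MMFF94s atom typing is represented only as a boolean supplied by the caller.

```python
ORGANIC_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}


def eligible_for_3d(heavy_atoms, rotatable_bonds, elements, covalent_units,
                    mmff94s_typed=True):
    """Return True if a compound satisfies every criterion listed above.

    heavy_atoms     -- number of non-hydrogen atoms
    rotatable_bonds -- number of rotatable bonds
    elements        -- iterable of element symbols present in the compound
    covalent_units  -- number of covalently bonded units (1 = not a salt/mixture)
    mmff94s_typed   -- assumed flag: all atoms have MMFF94s atom types
    """
    return bool(
        heavy_atoms <= 50
        and rotatable_bonds <= 15
        and set(elements) <= ORGANIC_ELEMENTS
        and covalent_units == 1
        and mmff94s_typed
    )
```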
Details about the generation of conformers are provided in:
PubChem3D Release Notes
AND
Bolton EE, Kim S, Bryant SH. PubChem3D: Conformer generation. J Cheminform. 2011 Jan 27;3(1):4. doi:10.1186/1758-2946-3-4. [PubMed PMID: 21272340] [Free Full Text on J Cheminform]
(Part of the PubChem3D thematic series with the (BMC) Journal of Cheminformatics: http://www.jcheminf.com/series/pubchem3d)
The chemical and physical properties of the conformer are computed in a separate step.
In addition, representative conformers for all PubChem compounds are compared to each other, in order to identify molecules that have a similar 3D shape ("similar conformers"), regardless of whether their 2D chemical structures are similar. Separate sections of this document describe the method for identifying similar conformers and provide a tip on how to find compounds that are similar in 3D but not 2D structure.
Additional details about PubChem3D are available from:
Identify Similar molecules (structure clustering for compounds/substances)

After the chemical structures are standardized, and after 3D conformers are generated for each molecule, the PubChem data processing procedure identifies similar molecules in two different ways:
- chemical analogs ("similar compounds"), using the Tanimoto score calculated from the 2D structure fingerprint
- 3D similarity ("similar conformers"), based on shape/feature and pharmacophore complementariness.
You can use the "similar compounds" and "similar conformers" links on a PubChem summary page to retrieve molecules that have 2D or 3D similarity, respectively, to a compound of interest. (A separate part of this document provides a tip on how to find compounds that are similar in 3D but not 2D structure to a compound of interest.)
More details about the methods used to calculate each type of similarity are below:
Chemical analogs: "Similar Compounds"

After chemical structures are validated and standardized for the PubChem Compound database, molecules that have a similar chemical structure are identified using the method described below. The "similar compounds" link on a PubChem Compound summary page will retrieve all compounds that have a Tanimoto similarity score of 90% or higher. If you want to find compounds with different scores, you can visit the PubChem structure search page.
Similarity links are pre-computed in PubChem from a dictionary-based fingerprint, at a threshold of 90%, using the Tanimoto score equation:
Tanimoto = AB / ( A + B - AB )
Where:
Tanimoto is the Tanimoto score, a fraction between 0 and 1.
AB is the count of bits set after a bitwise AND of fingerprints A and B
A is the count of bits set in fingerprint A
B is the count of bits set in fingerprint B
Each similarity link is equivalent to a chemical structure similarity search of the PubChem Compound database yielding all chemical structures with a Tanimoto score that is 90% or above.
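As a worked illustration of the equation, the Python sketch below computes the Tanimoto score for two fingerprints stored as integers, where each set bit is one fingerprint bit. The 4-bit values in the example are toy inputs, not real PubChem fingerprints, and returning 0.0 when both fingerprints are empty is an assumed convention.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto = AB / (A + B - AB) for fingerprints stored as integers."""
    a = bin(fp_a).count("1")          # bits set in fingerprint A
    b = bin(fp_b).count("1")          # bits set in fingerprint B
    ab = bin(fp_a & fp_b).count("1")  # bits set in both (bitwise AND)
    union = a + b - ab
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return ab / union


# A = 0b1101 (3 bits), B = 0b1011 (3 bits), A & B = 0b1001 (2 bits)
print(tanimoto(0b1101, 0b1011))  # 2 / (3 + 3 - 2) = 0.5
```

With these inputs the score is 0.5, well below the 90% threshold used for "similar compounds" links.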
In addition to the Tanimoto equation above, PubChem uses a "boost" scheme that assigns a similarity score of:
104% to structures with identical stereo, isotope, and connectivity.
103% to structures with identical connectivity and either stereo or isotope.
102% to structures with identical connectivity.
101% to structures that are tautomers of the query.
The cases of "boosted" scores greater than 101% correspond to cases that originally would have had a score of 100% similarity. However, in the case where tautomers get an artificial score of 101%, their natural score could be much lower, sometimes as low as 60%, especially for small compounds where the tautomeric system is a large part of the structure.
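The boost scheme can be read as a small decision rule. The Python sketch below is one possible reading, not PubChem's code: the function and flag names are hypothetical, the flags are assumed to be determined elsewhere, and checking identical connectivity before the tautomer case is an assumption about precedence.

```python
def boosted_similarity(tanimoto_percent, same_connectivity=False,
                       same_stereo=False, same_isotope=False,
                       is_tautomer=False):
    """Return the similarity score in percent after applying the boost scheme."""
    if same_connectivity:
        if same_stereo and same_isotope:
            return 104.0  # identical stereo, isotope, and connectivity
        if same_stereo or same_isotope:
            return 103.0  # identical connectivity and either stereo or isotope
        return 102.0      # identical connectivity only
    if is_tautomer:
        return 101.0      # tautomer of the query, whatever its natural score
    return tanimoto_percent
```

For example, a tautomer whose natural Tanimoto score is 60% is reported at 101%, which is the situation the paragraph above cautions about.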
There are 881 substructure-keys (skeys) in each fingerprint. Each bit in the fingerprint represents the presence (or absence) of a particular chemical substructure (e.g., a carboxylic acid) or a particular count of the same. These skeys are similar in nature to the well-known MDL MACCS skeys fingerprints.
3D similarity by shape/feature and pharmacophore complementariness: "Similar Conformers"

After 3D conformers are generated for the molecules in the PubChem Compound database, representative conformers are compared to each other in order to identify molecules that have a similar 3D shape ("similar conformers"), regardless of whether their 2D chemical structures are similar.
The method for identifying similar conformers is described in:
Bolton EE, Kim S, Bryant SH. PubChem3D: Similar conformers. J Cheminform. 2011 May 9;3(1):13. doi:10.1186/1758-2946-3-13. [PubMed PMID: 21554721]
[Free Full Text on J Cheminform]
and
Kim S, Bolton EE, Bryant SH. PubChem3D: Biologically relevant 3-D similarity. J Cheminform. 2011 Jul 22;3(1):26. doi:10.1186/1758-2946-3-26. [PubMed PMID: 21781288] [Free Full Text on J Cheminform]
Both articles are part of the PubChem3D thematic series with the (BMC) Journal of Cheminformatics: http://www.jcheminf.com/series/pubchem3d.
As explained by Kim et al.:
- PubChem3D uses two 3-D similarity measures: shape-Tanimoto (ST) and color-Tanimoto (CT).
- The ST score is a measure of shape similarity.
- The CT score quantifies the similarity of 3-D orientation of functional groups used to define pharmacophores (henceforth referred to simply as "features") between conformers by checking the overlap of fictitious "color" atoms used to represent the six functional group types: hydrogen-bond donors, hydrogen-bond acceptors, cation, anion, hydrophobes, and rings.
- The PubChem "Similar Conformers" 3-D neighboring requires ST^{ST-opt} ≥ 0.8 and CT^{ST-opt} ≥ 0.5 (the ST and CT scores at the shape-optimized superposition) for two molecules to become neighbors of each other.
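The neighboring criterion can be expressed as a two-threshold test. The Python sketch below reads the two values, 0.8 for shape and 0.5 for color, as minimum scores, which is an interpretation of the statement above; the function name is hypothetical, and the ST and CT scores are assumed to be computed elsewhere from the shape-optimized superposition.

```python
ST_THRESHOLD = 0.8  # minimum shape-Tanimoto (assumed to be a lower bound)
CT_THRESHOLD = 0.5  # minimum color-Tanimoto (assumed to be a lower bound)


def are_conformer_neighbors(st_score, ct_score):
    """Return True if both similarity scores reach their thresholds."""
    return st_score >= ST_THRESHOLD and ct_score >= CT_THRESHOLD
```

Both conditions must hold: a pair with a very similar shape but poorly matching features (for example ST = 0.95, CT = 0.3) would not be neighbors under this reading.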
The "similar conformers" link on a PubChem Compound summary page will therefore retrieve all compounds that have 3D similarity, as identified using this approach. If you want to view the similar conformers in a particular sort order (e.g., by shape then feature similarity, or by feature similarity then shape), or if you want to see only a subset of similar conformers that have specific properties, you can visit the PubChem structure search page and use the "3D Conformer" tab.
A separate section of this document provides a tip on how to find compounds that are similar in 3D but not 2D structure.
Associate biomedical annotations from various resources with a compound

Annotation sources: MeSH | Other Classification Ontologies | DailyMed | HSDB | ChemIDplus | DrugBank | Structure | BioSystems | PubChem BioAssay
A wide range of information may exist for a compound, in the literature and in external databases, beyond the information that has been provided in individual PubChem Substance records. To facilitate access to that information, the PubChem data processing procedures use the methods described below to associate chemical structures with external data sources, and to insert the information and links in the corresponding PubChem Compound record. The source of any information inserted in this way is noted in the lower right hand corner of the grey-bordered box that surrounds an information block. (Note that the PubChem Classification Browser can be used to browse/retrieve substances and compounds using a variety of classification ontologies, including MeSH and other ontologies noted below).
Medical Subject Headings (MeSH)

The National Library of Medicine (NLM)'s Medical Subject Headings (MeSH) is a controlled vocabulary thesaurus of medical terms that is arranged in both an alphabetic and a hierarchical structure. It is used for indexing literature from thousands of the world's leading biomedical journals for the MEDLINE®/PubMed® database, and for cataloging medical books, documents, and audiovisual materials, in order to facilitate retrieval of medical information at various levels of specificity.
Because MeSH terms provide a portal to a wealth of medical information about the compounds represented in PubChem, the following method is used to associate PubChem Compound records (or more technically, their CIDs) with corresponding MeSH terms:
- Start with the filtered list of synonyms in a PubChem Compound record:
- Synonyms that have been used consistently and only for the given chemical structure and its isotopes, stereoisomers, and tautomers are retained.
- Any synonym that has been used for two or more different chemical structures in the PubChem Substance database is filtered out of the list.
- Identify the subset of filtered synonyms that match a MeSH term.
- Identify the synonym (and therefore MeSH term) from that subset that has been most frequently used by depositors of PubChem Substance records containing the "same structure."
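The selection steps above can be sketched in Python. This is an illustration under assumptions rather than the actual procedure: the function name is hypothetical, the filtered synonyms are supplied with their depositor frequencies, matching is done on lowercased strings (the text only says "match"), and ties are broken by whichever synonym is encountered first.

```python
def assign_mesh_term(filtered_synonyms, mesh_terms):
    """Pick the MeSH term matched by the most frequently used filtered synonym.

    filtered_synonyms -- dict mapping synonym -> number of depositor uses
    mesh_terms        -- iterable of MeSH terms
    Returns the matching MeSH term, or None if no synonym matches.
    """
    mesh_lookup = {term.lower(): term for term in mesh_terms}
    # Keep only the filtered synonyms that match a MeSH term.
    matches = [(frequency, synonym)
               for synonym, frequency in filtered_synonyms.items()
               if synonym.lower() in mesh_lookup]
    if not matches:
        return None
    # Choose the matching synonym used most often by depositors.
    _, best = max(matches, key=lambda match: match[0])
    return mesh_lookup[best.lower()]
```

Note that a synonym with a higher frequency but no MeSH match is ignored; only the subset that matches MeSH competes.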
Once a MeSH term is assigned to a PubChem Compound record (i.e., to a CID), the following additional information is linked to the PubChem Compound record:
- all PubMed records tagged with that MeSH term are linked to the CID and accessible from the "Literature: NLM Curated PubMed Citations" section of the PubChem Compound record.
Note: The "Literature: Depositor Provided PubMed Citations" section of the PubChem Compound record is a concatenated list of all PubMed records that have been cited by the depositors of all PubChem Substance records that contain the same chemical structure as the compound (including same connectivity, isotopes and stereochemistry).
The "NLM Curated PubMed Citations" and "Depositor Provided PubMed Citations" might have some -- though not necessarily complete -- overlap. Each set of PubMed records might contain items that are not present in the other set.
- Some MeSH headings have pharmacological actions associated with them. For example, the MeSH heading "aspirin" is associated with the MeSH term "cyclooxygenase inhibitors." If a MeSH term has been associated with a CID, and that MeSH term has one or more pharmacological actions associated with it, all of the associated pharmacological actions are inserted into the "Pharmacology" section of the PubChem Compound record.
If a block of information on a PubChem Substance/Compound summary page was derived from MeSH, the information source (MeSH) is noted in the lower right hand corner of the grey-bordered box that surrounds it.
Other Classification Ontologies

Some depositors of PubChem Substance records, such as ChEBI and KEGG, maintain ontologies of terms that describe the substances they deposit into PubChem. If substance records from those depositors are associated with corresponding PubChem Compound records (because they contain the same chemical structure, including same connectivity, isotopes, and stereochemistry), then the classification information from the substance records is added to the "Classification: Ontologies" section of the PubChem Compound record. The PubChem Classification Browser can be used to browse/retrieve substances and compounds using a variety of classification ontologies.
DailyMed

The National Library of Medicine (NLM)'s DailyMed resource provides high quality information about marketed drugs, including FDA labels (package inserts).
Medication information from DailyMed is displayed in a PubChem record by identifying connections between DailyMed -> MeSH -> PubChem Compound records. Specifically:
- If the drug name in a DailyMed record has an exact match to a MeSH term, or to any of the MeSH term's synonyms, then a connection is made between the DailyMed record and the MeSH term.
If the drug name in a DailyMed record does not match any MeSH term, then each name in the drug's list of active ingredients is compared to MeSH. If an exact match is found, then a connection is made between the DailyMed record and the MeSH term.
- If that MeSH term has been annotated on any PubChem Compound records, then a connection is made between the PubChem Compound record and the DailyMed record.
- If one or more DailyMed records map to the same MeSH term, then links to all of those DailyMed records will appear in the "Use and Manufacturing: Medication Information" section of the PubChem Compound record(s) that have been annotated with that MeSH term.
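The DailyMed -> MeSH -> PubChem Compound chain above can be sketched as two lookups. The Python code below is a simplified illustration: all names and data shapes are hypothetical, `mesh_index` is an assumed lookup from a lowercased MeSH term or synonym to its MeSH term, and when the drug name does not match, the sketch stops at the first active ingredient that matches.

```python
def dailymed_to_mesh(drug_name, active_ingredients, mesh_index):
    """Return the MeSH term for a DailyMed record, or None if nothing matches."""
    term = mesh_index.get(drug_name.lower())
    if term is not None:
        return term
    # Fall back to the names in the list of active ingredients.
    for ingredient in active_ingredients:
        term = mesh_index.get(ingredient.lower())
        if term is not None:
            return term
    return None


def link_dailymed_to_compounds(records, mesh_index, cids_by_mesh):
    """Map each CID to the DailyMed records that share its MeSH term.

    records      -- iterable of (record_id, drug_name, active_ingredients)
    cids_by_mesh -- dict mapping MeSH term -> list of annotated CIDs
    """
    links = {}
    for record_id, drug_name, ingredients in records:
        term = dailymed_to_mesh(drug_name, ingredients, mesh_index)
        for cid in cids_by_mesh.get(term, []):
            links.setdefault(cid, []).append(record_id)
    return links
```

Several DailyMed records that resolve to the same MeSH term all end up attached to the same compound record(s), as the last step in the list describes.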
If a block of information on a PubChem Substance/Compound summary page was derived from DailyMed, the information source (DailyMed) is noted in the lower right hand corner of the grey-bordered box that surrounds it.
Hazardous Substances Data Bank (HSDB)

The National Library of Medicine (NLM)'s Hazardous Substances Data Bank (HSDB) is a comprehensive, peer-reviewed toxicology database covering about 5,000 chemicals. Information from HSDB is displayed in a PubChem record if there is a match between the molecule names in the HSDB and PubChem records. The name matching is done using the following method:
- The synonyms present in an HSDB record are filtered in a similar way to the synonyms shown in a PubChem Compound record:
- Synonyms that have been used consistently and only for the given chemical structure and its isotopes, stereoisomers, and tautomers are retained.
- Any synonym that has been used for two or more different chemical structures in the PubChem Substance database is filtered out of the list.
- If there is a match between any filtered synonyms in the PubChem Compound and HSDB records:
- A link between the records is created.
- Information from the HSDB record is then displayed in the PubChem Compound record as biological annotations.
- The annotations from HSDB can include, for example:
- Methods of Manufacturing
- Formulations/Preparations
- Therapeutic Uses
- Mechanism of Action
- Toxicity Summary
- Reactivities and Incompatibilities
- Decomposition
- Environmental Fate
- Bioconcentration
- OSHA Standards
- Threshold Limit Values
- and more...
(See the section of this document on "categories of information, as available, for a molecule" to see additional types of information imported from HSDB into PubChem.)
- If a block of information on a PubChem Substance/Compound summary page was derived from HSDB, the information source (HSDB) is noted in the lower right hand corner of the grey-bordered box that surrounds it.
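The name-matching step described in this section can be pictured as a set intersection between two filtered synonym lists. The Python sketch below is illustrative only: the function names and data shapes are hypothetical, both synonym sets are assumed to have been filtered already, and comparing names after trimming whitespace and lowercasing is an assumption, since the text does not say how names are normalized.

```python
def normalize(names):
    """Lowercase and trim names so that trivially different spellings compare equal."""
    return {name.strip().lower() for name in names}


def link_hsdb_records(compound_synonyms, hsdb_synonyms):
    """Return sorted (cid, hsdb_id) pairs that share at least one synonym.

    compound_synonyms -- dict mapping CID -> set of filtered synonyms
    hsdb_synonyms     -- dict mapping HSDB record ID -> set of filtered synonyms
    """
    links = []
    for cid, cid_names in compound_synonyms.items():
        for hsdb_id, hsdb_names in hsdb_synonyms.items():
            if normalize(cid_names) & normalize(hsdb_names):
                links.append((cid, hsdb_id))
    return sorted(links)
```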
ChemIDplus

The National Library of Medicine (NLM)'s Chemical Identification Plus Database (ChemIDplus) resource is an online dictionary of chemicals, including names, synonyms, and chemical structures.
ChemIDplus is a depositor of records into the PubChem Substance database. If a substance record from ChemIDplus is associated with a corresponding PubChem Compound record (because they contain the same chemical structure, including same connectivity, isotopes and stereochemistry), then the safety and toxicology information from the ChemIDplus record is added to the PubChem Compound record.
If a block of information on a PubChem Substance/Compound summary page was derived from ChemIDplus, the information source (ChemIDplus) is noted in the lower right hand corner of the grey-bordered box that surrounds it.
DrugBank

The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information.
DrugBank is a depositor of records into the PubChem Substance database. If a substance record from DrugBank is associated with a corresponding PubChem Compound record (because they contain the same chemical structure, including same connectivity, isotopes and stereochemistry), then the interaction information from the DrugBank record is added to the "Biomolecular Interactions and Pathways" section of the PubChem Compound record.
If a block of information on a PubChem Substance/Compound summary page was derived from DrugBank, the information source (DrugBank) is noted in the lower right hand corner of the grey-bordered box that surrounds it.
Structure

The NCBI's Structure database, also known as the Molecular Modeling Database (MMDB), is a depositor of records into the PubChem Substance database. It contains experimentally resolved 3D structures of proteins, RNA, and DNA, derived from the Protein Data Bank (PDB), with value-added features such as explicit chemical graphs, interactive views of the biomolecule's biologically active form ("biological unit"), interactions schematics that depict the contacts among the molecular components, as well as links to similar 3D structures, similar sequences, information about chemicals bound to the structures, literature, and more.
Many of the experimentally resolved 3D structures include biomolecules such as protein, DNA, or RNA bound to small molecules. In such cases, the small molecule data are extracted from the 3D structure record and deposited into the PubChem Substance database, with a link back to the original structure record from which they came.
If a substance record from the Structure database is associated with a corresponding PubChem Compound record (because they contain the same chemical structure, including same connectivity, isotopes and stereochemistry), then the "Biomolecular Interactions and Pathways" section of the PubChem Compound record will contain a thumbnail image and link to the original 3D structure record.
If a block of information on a PubChem Substance/Compound summary page was derived from the Structure database, the information source (Structure) is noted in the lower right hand corner of the grey-bordered box that surrounds it.
BioSystems

A biosystem is a group of molecules that interact in a biological system, such as a pathway, complex, or disease. The NCBI's BioSystems database provides integrated access to biological systems and their component genes, proteins, and small molecules, as well as literature describing those biosystems and other related data throughout Entrez.
If a small molecule component of a biosystem is also present in a PubChem Substance and/or PubChem Compound record, then a link is made between the PubChem record and the biosystem.
Specifically, the BioSystems data processing procedure includes the following steps to identify associations between biosystems and PubChem records:
- BioSystem records from source databases are parsed for small molecule identification numbers, including PubChem Compound IDs (CIDs), PubChem Substance IDs (SIDs), and external registry names such as local identifiers assigned to a substance by the source database. The types of BioSystem<->PubChem links that are made depend upon the type of identifiers that were found:
- If SIDs are present in the source record, links are established to the corresponding PubChem Substance records and to associated CIDs in PubChem Compound.
- If CIDs are present in the source record, links to the corresponding PubChem Compound records are made (however, the links are not extended to associated PubChem Substances).
- If external registry names are present, those identifiers are mapped to the corresponding SIDs and links are made to those records in PubChem Substance as well as to associated CIDs in PubChem Compound.
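The three identifier cases above differ in which links they produce. The Python sketch below makes that explicit; the function name, parameter names, and lookup tables are hypothetical stand-ins for data the BioSystems processing is assumed to have available.

```python
def biosystem_links(sids=(), cids=(), registry_names=(),
                    sid_by_registry_name=None, cids_by_sid=None):
    """Return (substance_links, compound_links) for one biosystem record."""
    sid_by_registry_name = sid_by_registry_name or {}
    cids_by_sid = cids_by_sid or {}

    # SIDs found in the source record link directly to PubChem Substance.
    substance_links = set(sids)
    # External registry names are first mapped to their SIDs.
    for name in registry_names:
        if name in sid_by_registry_name:
            substance_links.add(sid_by_registry_name[name])

    # CIDs found in the source record link to PubChem Compound only;
    # they are not extended to associated substances.
    compound_links = set(cids)
    # Every linked SID also contributes its associated CIDs.
    for sid in substance_links:
        compound_links.update(cids_by_sid.get(sid, ()))
    return substance_links, compound_links
```

The asymmetry is deliberate in this reading: links flow from substances to compounds, but a CID listed on its own never pulls in substance links.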
If a block of information on a PubChem Substance/Compound summary page was derived from the BioSystems database, the information source (BioSystems) is noted in the lower right hand corner of the grey-bordered box that surrounds it.
PubChem BioAssay

The PubChem BioAssay database contains bioactivity screens of chemical substances described in PubChem Substance, providing information such as the following for a given chemical:
- bioactivity outcomes (active, inactive, inconclusive, unspecified)
- molecular targets (proteins and/or genes)
- bioactivity data (IC50, EC50, Potency, Ki, etc.)
PubChem BioAssay database also provides a description of each bioassay, which may include the experiment's rationale/purpose, the relationship between the assay target and a biological process or disease state, as well as assay protocols specific to that screening procedure. (As an example, see the description of bioassay AID 1575, "Summary assay for the identification of compounds that inhibit NOD1.") These descriptions are searchable directly in the BioAssay database.
The "Biological Test Results" section of a PubChem Substance/Compound summary page contains an excerpt of the bioactivity data available for the chemical, and you can follow the link for "BioActivity Summary: This Compound" to see the rest of the data in the PubChem BioAssay database itself. (As an example of what you will see when you follow that link, see the PubChem BioAssay data summary for CID 3672, Ibuprofen.)
The association between a PubChem Substance or Compound record and BioAssay data is made in the following way:
Depositors of bioassays submit their data to two PubChem databases: (1) they submit biological activity test results into the PubChem BioAssay database, and (2) they submit the descriptions of the substances that were tested into the PubChem Substance database. A direct link is then made between each BioAssay record and its corresponding Substance records.
In addition, if a substance record is associated with a corresponding PubChem Compound record (because they contain the same chemical structure, including same connectivity, isotopes and stereochemistry), then the links to the BioAssay data are also added to the "Biological Test Results" section of the PubChem Compound record.
Therefore:
- A PubChem Substance record will link only to the bioassays that are associated directly with that particular substance (i.e., with that particular SID).
- A PubChem Compound record will link to all of the bioassays that tested any PubChem Substance (i.e., any SID) containing the same chemical structure (including same connectivity, isotopes and stereochemistry) as the PubChem Compound.
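The difference between substance-level and compound-level bioassay links can be illustrated with a small Python sketch. It is a simplified model, not PubChem's implementation: the function name `build_assay_links`, the `(AID, SID)` pair input, and the `sid_to_cid` table are hypothetical, and the sketch assumes each SID maps to at most one CID on the basis of an identical structure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_assay_links(
    tested: Iterable[Tuple[int, int]],
    sid_to_cid: Dict[int, int],
) -> Tuple[Dict[int, Set[int]], Dict[int, Set[int]]]:
    """Build bioassay links for substances and compounds.

    tested     -- (AID, SID) pairs: each bioassay and a substance it tested
    sid_to_cid -- hypothetical table mapping an SID to the CID that has the
                  same structure (SIDs with no matching compound are absent)
    """
    aids_by_sid: Dict[int, Set[int]] = defaultdict(set)
    aids_by_cid: Dict[int, Set[int]] = defaultdict(set)

    for aid, sid in tested:
        # A substance links only to the assays that tested that exact SID.
        aids_by_sid[sid].add(aid)

        # A compound accumulates the assays of every SID sharing its structure.
        cid = sid_to_cid.get(sid)
        if cid is not None:
            aids_by_cid[cid].add(aid)

    return dict(aids_by_sid), dict(aids_by_cid)
```

For instance, if two substances deposited by different sources share one structure and were each tested in a different bioassay, each substance entry holds a single assay while the shared compound entry holds both.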
If a block of information on a PubChem Substance/Compound summary page was derived from the BioAssay database, the information source (BioAssay) is noted in the lower right-hand corner of the grey-bordered box that surrounds it.
Note: BioAssay data -- classification of gene/protein targets -- The PubChem Substance/Compound summary page displays an overview of the available bioassay information. The details about individual experiments are available on the corresponding PubChem BioAssay summary pages. Those pages also display the Gene Ontology (GO) classification of the gene/protein target(s) that were tested by the bioassay. The GO terms are associated with each gene/protein in an automated way as part of the NCBI BioSystems database data processing procedures, using the method described in the BioSystems help document. All GO terms that apply to the gene(s)/protein(s) tested by the bioassay are shown in the GO hierarchy, including: (1) biological processes, (2) cellular components, and (3) molecular functions. Clicking on any GO term in the hierarchy will retrieve all bioassays that have tested one or more proteins associated with that term. As an example, see the GO terms for the protein target that was tested by the glucocorticoid receptor (GR) redistribution assay (AID 450).
The PubChem Classification Browser can be used to browse/retrieve substances and compounds using a variety of classification ontologies, including GO.

Reporting errors
in substance records | in compound records
If you notice an error in a PubChem record, we appreciate your feedback. The place to which you can send a report depends upon whether you are viewing a substance or compound:
Substance records: PubChem does not have curators and never changes or edits substance records. They remain as supplied by our depositors, just as with GenBank records. Therefore, please send error reports directly to the depositor. Contact information for each depositor is accessible from the PubChem Substance Data Source page. Once the depositor corrects the error, PubChem will apply the correction at the next update.
Compound records: If you notice any errors in PubChem compound records, such as in properties or descriptors, please send a report to the NCBI help desk: info@ncbi.nlm.nih.gov. The PubChem staff will then look into the error and make a correction at the next update.