
PubChem PUG Help


1. Introduction
2. Interacting with PUG
3. PUG Tasks
    3.1. PubChem Substance/Compound Download Tasks
    3.2. Query Tasks
        3.2.1. Chemical Structure Query Tasks
        3.2.2. PubChem BioAssay Data Query and Download
    3.3. PubChem Standardization Tasks
4. PUG, NCBI eUtils, and Entrez History
5. PUG and SOAP
6. FAQs
 

1. Introduction.

The PubChem Power User Gateway (PUG) provides programmatic access to PubChem services.  The basic design principle is straightforward.  A single CGI (pug.cgi, referred to hereafter simply as PUG) is the central gateway to multiple PubChem functions.  PUG takes no URL arguments; all communication with PUG is through XML.  To perform any request, you formulate the input in XML and send it to PUG via HTTP POST.  The CGI interprets the incoming request, initiates the appropriate action, and returns the results, also in XML format.  (This document assumes a basic familiarity with XML tags and data structures.  To learn more about XML, visit: http://en.wikipedia.org/wiki/XML)

PubChem services are queued.  As such, a submitted task will (usually) complete sometime after PUG responds to the initial request.  The initial PUG response contains the request ID of your task.  This request ID must be used for further communication with PUG concerning your submitted task.  When PUG is interrogated about an outstanding request using the request ID, PUG will return either the results of your task, if completed, or the status of your task.

Each PubChem service enabled for use with PUG is documented separately.  This service-by-service documentation details the input, output, and options.  All XML used by PUG is specified in the document type definition (DTD), which may be found at:

http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd

or in the equivalent XML Schema definition at:

http://pubchem.ncbi.nlm.nih.gov/pug/pug.xsd

We strongly recommend using an XML parser/generator tool to read and write the XML data, rather than composing XML manually. 

PubChem PUG-enabled services have the ability to save and open valid PUG requests designed for that service.  You can use this feature to learn how to compose valid PUG XML requests and to verify that your PUG XML request does what is intended.  Examples of such services are provided in this document.

Additional documentation on PubChem and its services may be found at http://pubchem.ncbi.nlm.nih.gov and via help links throughout PubChem's web site.  If you cannot find what you need there, further requests for information or help may be sent to the NCBI help desk at info@ncbi.nlm.nih.gov.

 

2. Interacting with PUG.

All communication to PUG is via XML sent to the CGI at the URL:

http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi

The primary data container used in all transactions is <PCT-Data>, the top-level container for any PUG input or output.  ("PCT" stands for PubChem Tools, a data specification that is shared by both PUG and internal PubChem applications.  See the introduction in this document for more information.)  This <PCT-Data> object may contain either a <PCT-InputData> or a <PCT-OutputData> object.  Users of PUG will always send <PCT-Data> containing <PCT-InputData>, and always receive <PCT-Data> containing <PCT-OutputData>.
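
As a minimal sketch of the transport step, the following Python (standard library only) wraps a PUG XML document in an HTTP POST aimed at the URL above.  The helper names build_pug_request and post_to_pug, the "text/xml" content type, and the 60 second timeout are assumptions made for illustration; they are not part of PUG itself.

```python
import urllib.request

PUG_URL = "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"  # URL given in this section


def build_pug_request(xml_text, url=PUG_URL):
    """Return an HTTP POST request whose body is the given PUG XML document."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        # Assumed header: this document does not name a required content type.
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


def post_to_pug(xml_text, url=PUG_URL, timeout=60):
    """Send the XML to PUG and return the XML reply as text (needs network access)."""
    request = build_pug_request(xml_text, url)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return reply.read().decode("utf-8")
```

Only post_to_pug touches the network; build_pug_request can be inspected offline to confirm what would be sent.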

After a new task is submitted to PUG, your request is queued rather than executed immediately.  PUG therefore returns an XML message containing a request ID to be used for further actions on your request.  In the example PUG XML reply below, the message says that the request was successfully submitted and that the request ID is "402936103567975582".  It is then up to you to poll PUG periodically, using the request ID, until your task is complete.  When your task is completed, PUG will return the result; otherwise it will simply return a status message.  See examples in other sections of this document on how to properly poll PUG.

Example of a PUG reply to a newly submitted request:
<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_waiting>
          <PCT-Waiting>
            <PCT-Waiting_reqid>402936103567975582</PCT-Waiting_reqid>
          </PCT-Waiting>
        </PCT-OutputData_output_waiting>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>
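
One way to pull the request ID out of a reply like the one above is sketched here with Python's standard XML parser, in line with the recommendation to use an XML tool rather than string handling.  The function name extract_request_id is hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_request_id(reply_xml):
    """Return the request ID from a PUG waiting reply, or None if there is none."""
    root = ET.fromstring(reply_xml)
    node = root.find(".//PCT-Waiting_reqid")
    if node is None or node.text is None:
        return None  # the reply is not a waiting message
    return node.text.strip()
```

The ID is kept as a string, so it can be copied into a later request exactly as PUG issued it.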

The <PCT-InputData> object is a choice between request types.  Tasks specific to various PubChem services are contained by <PCT-InputData> and are described in different sections of this document.  Primary to the use of PUG is the <PCT-InputData> input type used to perform request management, <PCT-Request>.  Request management enables you to enquire about the status of or to cancel a previous PUG request.  For example, to cancel a PUG request with request ID "402936103567975582", the PUG XML input message will look like this:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_request>
        <PCT-Request>
          <PCT-Request_reqid>402936103567975582</PCT-Request_reqid>
          <PCT-Request_type value="cancel"/>
        </PCT-Request>
      </PCT-InputData_request>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>
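
The request management message above can also be generated rather than typed by hand.  The sketch below builds the same element structure with Python's xml.etree.ElementTree; the function name build_request_message is hypothetical, and only the two request types shown in this document ("status" and "cancel") are accepted.

```python
import xml.etree.ElementTree as ET


def build_request_message(reqid, request_type="status"):
    """Build a <PCT-Request> message for an existing PUG request ID."""
    if request_type not in ("status", "cancel"):
        raise ValueError("request_type must be 'status' or 'cancel'")
    data = ET.Element("PCT-Data")
    node = data
    # Nest the containers in the same order as the example above.
    for tag in ("PCT-Data_input", "PCT-InputData",
                "PCT-InputData_request", "PCT-Request"):
        node = ET.SubElement(node, tag)
    ET.SubElement(node, "PCT-Request_reqid").text = str(reqid)
    ET.SubElement(node, "PCT-Request_type", value=request_type)
    return ET.tostring(data, encoding="unicode")
```

Calling build_request_message("402936103567975582", "cancel") yields a document equivalent to the cancel example, without the indentation.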

The <PCT-OutputData> object contained in the output from PUG will always include a status message in a <PCT-Status-Message>, which consists of an enumerated status in <PCT-Status> and an optional message string.  When a new task is queued by PUG, the <PCT-OutputData> returned to you will (likely) contain a <PCT-Waiting> which contains your request ID.  If the request finishes quickly, the initially returned <PCT-OutputData> object will actually contain the appropriate result of your task specific to the requested service.  Similarly, when polling PUG using your request ID, the <PCT-OutputData> object will contain either your task result or a status message.
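
To make the shape of a reply concrete, here is a small Python sketch that reads the status, the optional message string, and the request ID (if the reply is a waiting message) from any <PCT-OutputData>.  The function name summarize_reply and the dictionary keys are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def summarize_reply(reply_xml):
    """Return the status, optional message, and waiting request ID of a PUG reply."""
    root = ET.fromstring(reply_xml)
    status = root.find(".//PCT-Status")
    message = root.find(".//PCT-Status-Message_message")
    reqid = root.find(".//PCT-Waiting_reqid")
    return {
        "status": status.get("value") if status is not None else None,
        # Collapse line breaks and runs of spaces inside the message text.
        "message": (" ".join(message.text.split())
                    if message is not None and message.text else None),
        "waiting_reqid": (reqid.text.strip()
                          if reqid is not None and reqid.text else None),
    }
```

A reply whose "waiting_reqid" is None is no longer waiting: it carries either a task result or an error status.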

 

3. PUG Tasks.

PubChem services currently enabled for use by PUG include PubChem Download (http://pubchem.ncbi.nlm.nih.gov/pc_fetch/pc_fetch.cgi), PubChem Compound Structure Search (http://pubchem.ncbi.nlm.nih.gov/search/search.cgi), PubChem BioAssay Data Download (http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi), and PubChem Structure Standardization (http://pubchem.ncbi.nlm.nih.gov/standardize/standardize.cgi).  Each PUG service has its own expected input and provided output.  The sections below detail how to use each service with PUG.

 

3.1. PubChem Substance/Compound Download Tasks.

This service allows you to download sets of PubChem records - substances or compounds - using PUG's <PCT-Download> sub-object.  You will need to specify which records to download, using a <PCT-QueryUids> object, the desired output format (ASN.1, XML, or SDF), and, optionally, the desired compression method (gzip or bzip2).  The options available through PUG are equivalent to those for the interactive PubChem Download service.

The <PCT-QueryUids> object enables you to specify an explicit list of record IDs, or to provide an existing Entrez history key (see eUtils documentation: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html) from either the PubChem Compound ("pccompound") or the PubChem Substance ("pcsubstance") Entrez databases.  Currently there is an upper limit of 250,000 structures per download request; if you find this limit too restrictive for your purposes, please consider using the PubChem FTP site which contains all available PubChem contents:

ftp://ftp.ncbi.nlm.nih.gov/pubchem/

When your download request is successfully completed, the returned <PCT-OutputData> object will hold a <PCT-Download-URL> containing the URL you may use to download your results.  Again, please note that the result of a download task is a URL, not the record data itself (which may be quite large).  To obtain the data requested, you must use the provided URL.

Example:   You want to download CID 1 and CID 99 – being uids 1 and 99 in the "pccompound" Entrez database – in SDF format with gzip compression.

The typical flow of information is as follows.  First, the initial input XML is sent to PUG via HTTP POST.  Note the input data container with the download request and uid and format options:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_download>
        <PCT-Download>
          <PCT-Download_uids>
            <PCT-QueryUids>
              <PCT-QueryUids_ids>
                <PCT-ID-List>
                  <PCT-ID-List_db>pccompound</PCT-ID-List_db>
                  <PCT-ID-List_uids>
                    <PCT-ID-List_uids_E>1</PCT-ID-List_uids_E>
                    <PCT-ID-List_uids_E>99</PCT-ID-List_uids_E>
                  </PCT-ID-List_uids>
                </PCT-ID-List>
              </PCT-QueryUids_ids>
            </PCT-QueryUids>
          </PCT-Download_uids>
          <PCT-Download_format value="sdf"/>
          <PCT-Download_compression value="gzip"/>
        </PCT-Download>
      </PCT-InputData_download>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>
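
A download request like the one above can be generated for any list of record IDs.  The following Python sketch mirrors the element structure of the example; the function name build_download_request and its parameters are hypothetical.  Only the attribute values "sdf" and "gzip" appear in this document, so any other format or compression value should be checked against the DTD before use.

```python
import xml.etree.ElementTree as ET


def build_download_request(uids, db="pccompound", fmt="sdf", compression="gzip"):
    """Build a <PCT-Download> request for an explicit list of record IDs."""
    data = ET.Element("PCT-Data")
    download = data
    for tag in ("PCT-Data_input", "PCT-InputData",
                "PCT-InputData_download", "PCT-Download"):
        download = ET.SubElement(download, tag)
    id_list = download
    for tag in ("PCT-Download_uids", "PCT-QueryUids",
                "PCT-QueryUids_ids", "PCT-ID-List"):
        id_list = ET.SubElement(id_list, tag)
    ET.SubElement(id_list, "PCT-ID-List_db").text = db
    uid_container = ET.SubElement(id_list, "PCT-ID-List_uids")
    for uid in uids:
        # int() rejects anything that is not a whole-number identifier.
        ET.SubElement(uid_container, "PCT-ID-List_uids_E").text = str(int(uid))
    ET.SubElement(download, "PCT-Download_format", value=fmt)
    if compression is not None:  # compression is optional
        ET.SubElement(download, "PCT-Download_compression", value=compression)
    return ET.tostring(data, encoding="unicode")
```

For instance, build_download_request([1, 99]) produces a document equivalent to the example above.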

If the request is small and finishes very quickly, you may get a final URL right away (see further below). But usually PUG will respond initially with a waiting message and a request ID (<PCT-Waiting_reqid>) such as:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_waiting>
          <PCT-Waiting>
            <PCT-Waiting_reqid>402936103567975582</PCT-Waiting_reqid>
          </PCT-Waiting>
        </PCT-OutputData_output_waiting>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

You would then parse out this request ID ("402936103567975582" in this case) and use it to "poll" PUG for the status of the request, composing an XML message like:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_request>
        <PCT-Request>
          <PCT-Request_reqid>402936103567975582</PCT-Request_reqid>
          <PCT-Request_type value="status"/>
        </PCT-Request>
      </PCT-InputData_request>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

Note that here the request type "status" is used; there is also the request type "cancel" that you may use to cancel a running job.
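
The polling step can be sketched as a simple loop in Python.  Here send is a hypothetical callable that POSTs an XML string to PUG and returns the reply text; the 5 second interval and the limit of 120 polls are arbitrary assumptions, since this document does not specify a polling interval.

```python
import time
import xml.etree.ElementTree as ET

STATUS_TEMPLATE = (
    "<PCT-Data><PCT-Data_input><PCT-InputData><PCT-InputData_request>"
    "<PCT-Request>"
    "<PCT-Request_reqid>{reqid}</PCT-Request_reqid>"
    '<PCT-Request_type value="status"/>'
    "</PCT-Request>"
    "</PCT-InputData_request></PCT-InputData></PCT-Data_input></PCT-Data>"
)


def poll_until_done(reqid, send, interval=5.0, max_polls=120, sleep=time.sleep):
    """Send status requests until the reply no longer contains <PCT-Waiting>."""
    for _ in range(max_polls):
        reply = send(STATUS_TEMPLATE.format(reqid=reqid))
        if ET.fromstring(reply).find(".//PCT-Waiting") is None:
            return reply  # final result or error message
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"request {reqid} still waiting after {max_polls} polls")
```

Passing send and sleep as arguments keeps the loop testable without network access; the reply that is returned still needs its status checked, because an error message also ends the waiting state.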

If the request is still running, you will get back another waiting message as above, and then you would poll again after some reasonable interval. If the request is finished, you will get a final result message like:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_download-url>
          <PCT-Download-URL>
            <PCT-Download-URL_url>
              ftp://ftp-private.ncbi.nlm.nih.gov/pubchem/.fetch/1064385222466625960.sdf.gz
            </PCT-Download-URL_url>
          </PCT-Download-URL>
        </PCT-OutputData_output_download-url>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

You would parse out the URL from the <PCT-Download-URL_url> tag, and then use a tool of your choice to connect to that URL to retrieve the actual requested data.
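
A minimal Python sketch of that parsing step follows.  The function name extract_download_url is hypothetical.  Note that in the example reply the URL is surrounded by line breaks and indentation inside the tag, so the sketch strips whitespace before returning it.

```python
import xml.etree.ElementTree as ET


def extract_download_url(reply_xml):
    """Return the download URL from a finished download reply, or None."""
    node = ET.fromstring(reply_xml).find(".//PCT-Download-URL_url")
    if node is None or not node.text:
        return None  # still waiting, or the reply reports an error
    return node.text.strip()
```

The returned string can then be handed to whatever FTP or HTTP client you prefer for the actual transfer.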

 

3.2. Query Tasks.

The <PCT-Query> object may be used to perform queries against PubChem data that are not possible using Entrez.  The <PCT-Query> object consists of a series of queries and a database to query against.  One must be careful when formulating queries to select compatible database and query types, as outlined in the documentation of each query task type.  For example, you will not want to perform a chemical similarity search on a list of bioassay identifiers (AIDs), since chemical searches may only be performed on compound identifiers (CIDs).

The <PCT-Query> object can perform multiple queries in a single request.  Multiple queries in a single task are combined with a logical "AND", which is to say that the resulting list of identifiers will satisfy all queries requested.  Unless otherwise noted in the task documentation, it is sometimes best to perform multiple search tasks individually rather than in a single task.
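
The "AND" behavior amounts to a set intersection of the individual hit lists.  The short Python sketch below only illustrates that semantics on the client side, for example when you have run the searches individually and want the same combined answer; it is not a description of how PUG computes results internally.  The function name and the identifiers in the usage note are made up.

```python
def combine_query_hits(hit_lists):
    """Return the identifiers present in every hit list (logical AND), sorted."""
    hit_sets = [set(hits) for hits in hit_lists]
    if not hit_sets:
        return []  # no queries, no hits
    return sorted(set.intersection(*hit_sets))
```

With the made-up lists [1, 2, 3, 4], [2, 4, 6], and [4, 2, 9], only identifiers 2 and 4 satisfy all three queries.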

 

3.2.1. Chemical Structure Query Tasks.

To perform PubChem Compound structure searches using PUG, you will need to make a request using a <PCT-Query> object.  Chemical structure search tasks use the query objects <PCT-QueryCompoundCS> and <PCT-QueryCompoundEL>.  You may submit a structure search by mixing and matching more than one of these two query types in a series.  Furthermore, only the PubChem Compound ("pccompound") Entrez database may be specified in <PCT-QueryUids>, when performing chemical structure queries.

The <PCT-QueryCompoundCS> and <PCT-QueryCompoundEL> objects can encode many different types of chemical structure searches.  To help you understand how to encode a structure search, please consider using the PubChem Structure Search web site.  It has the ability to translate your structure search query into the XML necessary for use with PUG, and can be very helpful to demonstrate how to encode complex queries.  The PubChem Structure Search system is located at the URL:

http://pubchem.ncbi.nlm.nih.gov/search/

Please note that the output result of a chemical structure search is an Entrez history key (see eUtils documentation).  To obtain the list of compounds matching your query, you must use eUtils; more information on eUtils is below. There is currently a limit of two million compound identifiers returned by the structure search (through either PUG or the interactive web site).

 

Example:   You wish to perform a chemical similarity search of CID 2244 at a Tanimoto similarity value of 80% with at most 300 results returned.

The initial HTTP POST to PUG to initiate the search would contain XML like:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_css>
                <PCT-QueryCompoundCS>
                  <PCT-QueryCompoundCS_query>
                    <PCT-QueryCompoundCS_query_data>2244</PCT-QueryCompoundCS_query_data>
                  </PCT-QueryCompoundCS_query>
                  <PCT-QueryCompoundCS_type>
                    <PCT-QueryCompoundCS_type_similar>
                      <PCT-CSSimilarity>
                        <PCT-CSSimilarity_threshold>80</PCT-CSSimilarity_threshold>
                      </PCT-CSSimilarity>
                    </PCT-QueryCompoundCS_type_similar>
                  </PCT-QueryCompoundCS_type>
                  <PCT-QueryCompoundCS_results>300</PCT-QueryCompoundCS_results>
                </PCT-QueryCompoundCS>
              </PCT-QueryType_css>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>
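
The same similarity query can be generated for other compounds and thresholds.  This Python sketch reproduces the element structure of the example above; the function name build_similarity_query and its parameter names are hypothetical, and only the CID form of the query data (as opposed to, say, a SMILES string) is covered.

```python
import xml.etree.ElementTree as ET


def build_similarity_query(cid, threshold=80, max_results=300):
    """Build a <PCT-QueryCompoundCS> similarity search request for one CID."""
    data = ET.Element("PCT-Data")
    cs = data
    for tag in ("PCT-Data_input", "PCT-InputData", "PCT-InputData_query",
                "PCT-Query", "PCT-Query_type", "PCT-QueryType",
                "PCT-QueryType_css", "PCT-QueryCompoundCS"):
        cs = ET.SubElement(cs, tag)
    # The query structure, given here as a compound identifier.
    query = ET.SubElement(cs, "PCT-QueryCompoundCS_query")
    ET.SubElement(query, "PCT-QueryCompoundCS_query_data").text = str(int(cid))
    # The search type: similarity at the given Tanimoto threshold (percent).
    node = cs
    for tag in ("PCT-QueryCompoundCS_type", "PCT-QueryCompoundCS_type_similar",
                "PCT-CSSimilarity"):
        node = ET.SubElement(node, tag)
    ET.SubElement(node, "PCT-CSSimilarity_threshold").text = str(int(threshold))
    # Upper bound on the number of hits returned.
    ET.SubElement(cs, "PCT-QueryCompoundCS_results").text = str(int(max_results))
    return ET.tostring(data, encoding="unicode")
```

Calling build_similarity_query(2244) yields a document equivalent to the example above, without the indentation.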

If the request is processed and started successfully, PUG would respond with a waiting message and request id, for example:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_waiting>
          <PCT-Waiting>
            <PCT-Waiting_reqid>271473836860076709</PCT-Waiting_reqid>
            <PCT-Waiting_message>Structure search job was submitted</PCT-Waiting_message>
          </PCT-Waiting>
        </PCT-OutputData_output_waiting>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

You would then use this request ID, "271473836860076709" in this case, to "poll" PUG for the status of the request:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_request>
        <PCT-Request>
          <PCT-Request_reqid>271473836860076709</PCT-Request_reqid>
          <PCT-Request_type value="status"/>
        </PCT-Request>
      </PCT-InputData_request>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

If the search is still running, you would get another waiting message as above, and you would then need to poll again after a reasonable interval.  If the search task is completed, PUG would give an Entrez history key for the resulting CID (compound identifier) list:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
          <PCT-Status-Message_message>Your search has already been completed.</PCT-Status-Message_message>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_entrez>
          <PCT-Entrez>
            <PCT-Entrez_db>pccompound</PCT-Entrez_db>
            <PCT-Entrez_query-key>1</PCT-Entrez_query-key>
            <PCT-Entrez_webenv>
	         0Hm9YDD1X4wor4nONvSWx9vkKmEqFXTiq84JO47pgxmSw_
	         cIuDBVcG46Yr@2B5C47D162BBD720_0008SID
	    </PCT-Entrez_webenv>
          </PCT-Entrez>
        </PCT-OutputData_output_entrez>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

More information on using Entrez history (and eUtils) to retrieve hit lists is below.
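
As a sketch of how the history key might be read from the completed search reply, the Python below returns the database name, query key, and WebEnv string.  The function name extract_entrez_key is hypothetical.  It removes all whitespace inside each value, on the assumption that any spaces or line breaks within the <PCT-Entrez_webenv> element are formatting and not part of the key.

```python
import xml.etree.ElementTree as ET


def extract_entrez_key(reply_xml):
    """Return db, query_key, and webenv from a reply holding an Entrez history key."""
    entrez = ET.fromstring(reply_xml).find(".//PCT-Entrez")
    if entrez is None:
        return None  # no history key in this reply

    def text_of(tag):
        node = entrez.find(tag)
        if node is None or not node.text:
            return None
        # Assumption: whitespace inside the value is layout only, so drop it.
        return "".join(node.text.split())

    return {
        "db": text_of("PCT-Entrez_db"),
        "query_key": text_of("PCT-Entrez_query-key"),
        "webenv": text_of("PCT-Entrez_webenv"),
    }
```

The three values returned are the pieces that identify a stored result set when retrieving it through Entrez history.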

If for some reason your initial query cannot be properly interpreted, PUG would respond with an error message with some indication of the problem encountered:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="data-error"/>
          </PCT-Status-Message_status>
          <PCT-Status-Message_message>Programatic
            Error:Non-decodeable query specified.
	    Input a valid SMILE/SMARTS or a CID.</PCT-Status-Message_message>
        </PCT-Status-Message>
      </PCT-OutputData_status>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>
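
A client will usually want to turn such a reply into an exception.  The Python sketch below does so under a stated assumption: any status value ending in "error" (such as the "data-error" shown above), or a reply with no status at all, is treated as a failure.  The complete list of status values is defined in the DTD and is not reproduced here; PugError and check_reply are hypothetical names.

```python
import xml.etree.ElementTree as ET


class PugError(Exception):
    """Raised when a PUG reply carries an error status (hypothetical helper)."""


def check_reply(reply_xml):
    """Return the status value of a reply, raising PugError for error statuses."""
    root = ET.fromstring(reply_xml)
    status_node = root.find(".//PCT-Status")
    status = status_node.get("value") if status_node is not None else None
    message_node = root.find(".//PCT-Status-Message_message")
    message = ""
    if message_node is not None and message_node.text:
        # The message may be wrapped over several lines; join it back up.
        message = " ".join(message_node.text.split())
    # Assumption: statuses ending in "error" signal failure.
    if status is None or status.endswith("error"):
        raise PugError(f"{status}: {message}")
    return status
```

Non-error statuses such as "success" and "running" are returned unchanged, so the caller can decide whether to keep polling.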

If you wish to cancel a queued or running request, you would send to PUG:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_request>
        <PCT-Request>
          <PCT-Request_reqid>271473836860076709</PCT-Request_reqid>
          <PCT-Request_type value="cancel"/>
        </PCT-Request>
      </PCT-InputData_request>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

And when PUG cancels your task, you would get back:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="running"/>
          </PCT-Status-Message_status>
          <PCT-Status-Message_message>Your search will be stopped, please wait...
	  </PCT-Status-Message_message>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_waiting>
          <PCT-Waiting>
            <PCT-Waiting_reqid>271473836860076709</PCT-Waiting_reqid>
            <PCT-Waiting_message>Your search will be stopped, please wait...
	    </PCT-Waiting_message>
          </PCT-Waiting>
        </PCT-OutputData_output_waiting>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

3.2.2. PubChem BioAssay Data Query and Download.

To perform PubChem BioAssay Data Download using PUG, you will need to generate an input XML file. This XML file has information about: what to download [being data for either particular PubChem BioAssay identifiers (AIDs), particular AIDs and PubChem Compound identifiers (CIDs), or particular AIDs and PubChem Substance identifiers (SIDs)]; the download format (XML, ASN.1, or CSV); and the output dataset type (complete or concise). Eight examples demonstrating the use of the BioAssay download tool are shown below.

Example 1:     You wish to download the complete BioAssay Data Table for AIDs 523 and 820 in CSV format.  The initial HTTP POST to PUG to initiate the download would contain XML as shown below.

The bioassay data may contain multiple screening results for the same chemical structure (represented as CID) which is deposited under several substance depositions (represented as SID). The "substance view" allows one to download the individual results for each sample of the chemical structure. It is recommended to download screening data from a single assay with this option. The "compound view" allows grouping the test results based on chemical structures. This option is particularly useful for downloading screening data from multiple assays.

This example shows the data in the substance view. To show the data in the compound view, you can replace '<PCT-Assay-FocusOption_group-results-by value="substance">4</PCT-Assay-FocusOption_group-results-by>' with '<PCT-Assay-FocusOption_group-results-by value="compound">0</PCT-Assay-FocusOption_group-results-by>'.

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_bas>
                <PCT-QueryAssayData>
                  <PCT-QueryAssayData_output value="csv">4</PCT-QueryAssayData_output>
                  <PCT-QueryAssayData_aids>
                    <PCT-QueryUids>
                      <PCT-QueryUids_ids>
                        <PCT-ID-List>
                          <PCT-ID-List_db>pcassay</PCT-ID-List_db>
                          <PCT-ID-List_uids>
                            <PCT-ID-List_uids_E>523</PCT-ID-List_uids_E>
                            <PCT-ID-List_uids_E>820</PCT-ID-List_uids_E>
                          </PCT-ID-List_uids>
                        </PCT-ID-List>
                      </PCT-QueryUids_ids>
                    </PCT-QueryUids>
                  </PCT-QueryAssayData_aids>
                  <PCT-QueryAssayData_dataset value="complete">0</PCT-QueryAssayData_dataset>
                  <PCT-QueryAssayData_focus>
                    <PCT-Assay-FocusOption>
                      <PCT-Assay-FocusOption_group-results-by value="substance">4</PCT-Assay-FocusOption_group-results-by>
                    </PCT-Assay-FocusOption>
                  </PCT-QueryAssayData_focus>
                </PCT-QueryAssayData>
              </PCT-QueryType_bas>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>
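
The Example 1 request can be produced programmatically, which makes switching between the substance view and the compound view a one-argument change.  The Python sketch below is an illustration only: the function name build_assay_download is hypothetical, and the numeric codes paired with each value attribute are copied from the examples in this document, not from the full DTD.

```python
import xml.etree.ElementTree as ET

# Value/code pairs as they appear in the examples of this document.
OUTPUT_CODES = {"assay-xml": 1, "assay-text-asn": 2, "csv": 4}
DATASET_CODES = {"complete": 0, "concise": 1}
GROUP_BY_CODES = {"compound": 0, "substance": 4}


def _enum(parent, tag, name, codes):
    """Add an enumerated element such as <tag value="csv">4</tag>."""
    node = ET.SubElement(parent, tag, value=name)
    node.text = str(codes[name])  # KeyError for a value not listed above
    return node


def build_assay_download(aids, output="csv", dataset="complete",
                         group_by="substance"):
    """Build a <PCT-QueryAssayData> download request for a list of AIDs."""
    data = ET.Element("PCT-Data")
    assay = data
    for tag in ("PCT-Data_input", "PCT-InputData", "PCT-InputData_query",
                "PCT-Query", "PCT-Query_type", "PCT-QueryType",
                "PCT-QueryType_bas", "PCT-QueryAssayData"):
        assay = ET.SubElement(assay, tag)
    _enum(assay, "PCT-QueryAssayData_output", output, OUTPUT_CODES)
    id_list = assay
    for tag in ("PCT-QueryAssayData_aids", "PCT-QueryUids",
                "PCT-QueryUids_ids", "PCT-ID-List"):
        id_list = ET.SubElement(id_list, tag)
    ET.SubElement(id_list, "PCT-ID-List_db").text = "pcassay"
    uid_container = ET.SubElement(id_list, "PCT-ID-List_uids")
    for aid in aids:
        ET.SubElement(uid_container, "PCT-ID-List_uids_E").text = str(int(aid))
    _enum(assay, "PCT-QueryAssayData_dataset", dataset, DATASET_CODES)
    focus = ET.SubElement(assay, "PCT-QueryAssayData_focus")
    option = ET.SubElement(focus, "PCT-Assay-FocusOption")
    _enum(option, "PCT-Assay-FocusOption_group-results-by", group_by,
          GROUP_BY_CODES)
    return ET.tostring(data, encoding="unicode")
```

Calling build_assay_download([523, 820]) corresponds to Example 1 as shown, and passing group_by="compound" gives the compound view.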

Example 2: You wish to download the concise BioAssay Data Table for AIDs 523 and 820 but just for the CIDs 3243128 and 3240114 in PubChem XML format. The initial HTTP POST to PUG to initiate the download would contain XML like:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_bas>
                <PCT-QueryAssayData>
                  <PCT-QueryAssayData_output value="assay-xml">1</PCT-QueryAssayData_output>
                  <PCT-QueryAssayData_aids>
                    <PCT-QueryUids>
                      <PCT-QueryUids_ids>
                        <PCT-ID-List>
                          <PCT-ID-List_db>pcassay</PCT-ID-List_db>
                          <PCT-ID-List_uids>
                            <PCT-ID-List_uids_E>523</PCT-ID-List_uids_E>
                            <PCT-ID-List_uids_E>820</PCT-ID-List_uids_E>
                          </PCT-ID-List_uids>
                        </PCT-ID-List>
                      </PCT-QueryUids_ids>
                    </PCT-QueryUids>
                  </PCT-QueryAssayData_aids>
                  <PCT-QueryAssayData_scids>
                    <PCT-QueryUids>
                      <PCT-QueryUids_ids>
                        <PCT-ID-List>
                          <PCT-ID-List_db>pccompound</PCT-ID-List_db>
                          <PCT-ID-List_uids>
                            <PCT-ID-List_uids_E>3243128</PCT-ID-List_uids_E>
                            <PCT-ID-List_uids_E>3240114</PCT-ID-List_uids_E>
                          </PCT-ID-List_uids>
                        </PCT-ID-List>
                      </PCT-QueryUids_ids>
                    </PCT-QueryUids>
                  </PCT-QueryAssayData_scids>
                  <PCT-QueryAssayData_dataset value="concise">1</PCT-QueryAssayData_dataset>
                </PCT-QueryAssayData>
              </PCT-QueryType_bas>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

Example 3: You wish to download the concise BioAssay Data Table for AIDs 523 and 820 but only for the SIDs 16952359 and 16952361 in PubChem ASN.1 format. The initial HTTP POST to PUG to initiate the download would contain XML like:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_bas>
                <PCT-QueryAssayData>
                  <PCT-QueryAssayData_output value="assay-text-asn">2</PCT-QueryAssayData_output>
                  <PCT-QueryAssayData_aids>
                    <PCT-QueryUids>
                      <PCT-QueryUids_ids>
                        <PCT-ID-List>
                          <PCT-ID-List_db>pcassay</PCT-ID-List_db>
                          <PCT-ID-List_uids>
                            <PCT-ID-List_uids_E>523</PCT-ID-List_uids_E>
                            <PCT-ID-List_uids_E>820</PCT-ID-List_uids_E>
                          </PCT-ID-List_uids>
                        </PCT-ID-List>
                      </PCT-QueryUids_ids>
                    </PCT-QueryUids>
                  </PCT-QueryAssayData_aids>
                  <PCT-QueryAssayData_scids>
                    <PCT-QueryUids>
                      <PCT-QueryUids_ids>
                        <PCT-ID-List>
                          <PCT-ID-List_db>pcsubstance</PCT-ID-List_db>
                          <PCT-ID-List_uids>
                            <PCT-ID-List_uids_E>16952359</PCT-ID-List_uids_E>
                            <PCT-ID-List_uids_E>16952361</PCT-ID-List_uids_E>
                          </PCT-ID-List_uids>
                        </PCT-ID-List>
                      </PCT-QueryUids_ids>
                    </PCT-QueryUids>
                  </PCT-QueryAssayData_scids>
                  <PCT-QueryAssayData_dataset value="concise">1</PCT-QueryAssayData_dataset>
                </PCT-QueryAssayData>
              </PCT-QueryType_bas>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

Example 4: You wish to download the complete BioAssay Data Table for AID 523 but only for those substances with "IC-50" value (the readout provided through TID 2) between 1 and 10 uM in CSV format. The initial HTTP POST to PUG to initiate the download would contain XML like:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_bas>
                <PCT-QueryAssayData>
                  <PCT-QueryAssayData_output value="csv">4</PCT-QueryAssayData_output>
                  <PCT-QueryAssayData_aids>
                    <PCT-QueryUids>
                      <PCT-QueryUids_ids>
                        <PCT-ID-List>
                          <PCT-ID-List_db>pcassay</PCT-ID-List_db>
                          <PCT-ID-List_uids>
                            <PCT-ID-List_uids_E>523</PCT-ID-List_uids_E>
                          </PCT-ID-List_uids>
                        </PCT-ID-List>
                      </PCT-QueryUids_ids>
                    </PCT-QueryUids>
                  </PCT-QueryAssayData_aids>
                  <PCT-QueryAssayData_dataset value="complete">0</PCT-QueryAssayData_dataset>
                  <PCT-QueryAssayData_readouts>
                    <PCT-Assay-Readout>
                      <PCT-Assay-Readout_aid>523</PCT-Assay-Readout_aid>
                      <PCT-Assay-Readout_query>
                        <PCT-Assay-Readout-Query>
                          <PCT-Assay-Readout-Query_fquery>
                            <PCT-Assay-Readout-Float-Query>
                              <PCT-Assay-Readout-Float-Query_tid>2</PCT-Assay-Readout-Float-Query_tid>
                              <PCT-Assay-Readout-Float-Query_lower>1</PCT-Assay-Readout-Float-Query_lower>
                              <PCT-Assay-Readout-Float-Query_upper>10</PCT-Assay-Readout-Float-Query_upper>
                            </PCT-Assay-Readout-Float-Query>
                          </PCT-Assay-Readout-Query_fquery>
                        </PCT-Assay-Readout-Query>
                      </PCT-Assay-Readout_query>
                    </PCT-Assay-Readout>
                  </PCT-QueryAssayData_readouts>
                </PCT-QueryAssayData>
              </PCT-QueryType_bas>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>
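
The readout filter in Example 4 is the only part that differs from a plain download, so it can be useful to build it separately.  The Python sketch below creates just the <PCT-Assay-Readout> element for a numeric range on one TID; the function name build_readout_range is hypothetical, and the returned element is meant to be appended under <PCT-QueryAssayData_readouts> of a request built elsewhere.

```python
import xml.etree.ElementTree as ET


def build_readout_range(aid, tid, lower, upper):
    """Build a <PCT-Assay-Readout> element filtering one TID to a numeric range."""
    readout = ET.Element("PCT-Assay-Readout")
    ET.SubElement(readout, "PCT-Assay-Readout_aid").text = str(int(aid))
    node = readout
    for tag in ("PCT-Assay-Readout_query", "PCT-Assay-Readout-Query",
                "PCT-Assay-Readout-Query_fquery",
                "PCT-Assay-Readout-Float-Query"):
        node = ET.SubElement(node, tag)
    ET.SubElement(node, "PCT-Assay-Readout-Float-Query_tid").text = str(int(tid))
    # Bounds are written as given, e.g. 1 and 10 as in Example 4.
    ET.SubElement(node, "PCT-Assay-Readout-Float-Query_lower").text = str(lower)
    ET.SubElement(node, "PCT-Assay-Readout-Float-Query_upper").text = str(upper)
    return readout
```

Calling build_readout_range(523, 2, 1, 10) gives an element equivalent to the readout block in Example 4.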

Example 5: You wish to determine the TID number of "IC-50" for AID 523. The initial HTTP POST to PUG to initiate the query would contain XML like:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_asd>
                <PCT-QueryAssayDescription>
                  <PCT-QueryAssayDescription_aid>523</PCT-QueryAssayDescription_aid>
                  <PCT-QueryAssayDescription_get-columns value="true"/>
                </PCT-QueryAssayDescription>
              </PCT-QueryType_asd>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

Example 6: You wish to download the complete BioAssay Data Table of AID 523 with "Activity Outcome" as "Active" and "Activity Score" value between 10 and 40 in CSV format. The initial HTTP POST to PUG to initiate the download would contain XML like:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_bas>
                <PCT-QueryAssayData>
                  <PCT-QueryAssayData_output value="csv">4</PCT-QueryAssayData_output>
                  <PCT-QueryAssayData_aids>
                    <PCT-QueryUids>
                      <PCT-QueryUids_ids>
                        <PCT-ID-List>
                          <PCT-ID-List_db>pcassay</PCT-ID-List_db>
                          <PCT-ID-List_uids>
                            <PCT-ID-List_uids_E>523</PCT-ID-List_uids_E>
                          </PCT-ID-List_uids>
                        </PCT-ID-List>
                      </PCT-QueryUids_ids>
                    </PCT-QueryUids>
                  </PCT-QueryAssayData_aids>
                  <PCT-QueryAssayData_dataset value="complete">0</PCT-QueryAssayData_dataset>
                  <PCT-QueryAssayData_readouts>
                    <PCT-Assay-Readout>
                      <PCT-Assay-Readout_aid>523</PCT-Assay-Readout_aid>
                      <PCT-Assay-Readout_query>
                        <PCT-Assay-Readout-Query>
                          <PCT-Assay-Readout-Query_outcome>
                            <PCT-Assay-Activity-Outcome-Query value="active">2</PCT-Assay-Activity-Outcome-Query>
                          </PCT-Assay-Readout-Query_outcome>
                        </PCT-Assay-Readout-Query>
                        <PCT-Assay-Readout-Query>
                          <PCT-Assay-Readout-Query_score>
                            <PCT-Assay-Activity-Score-Query>
                              <PCT-Assay-Activity-Score-Query_lower>10</PCT-Assay-Activity-Score-Query_lower>
                              <PCT-Assay-Activity-Score-Query_upper>40</PCT-Assay-Activity-Score-Query_upper>
                            </PCT-Assay-Activity-Score-Query>
                          </PCT-Assay-Readout-Query_score>
                        </PCT-Assay-Readout-Query>
                      </PCT-Assay-Readout_query>
                    </PCT-Assay-Readout>
                  </PCT-QueryAssayData_readouts>
                </PCT-QueryAssayData>
              </PCT-QueryType_bas>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>
Example 7: You wish to download the complete BioAssay Data Table of AID 523 with "IC-50" value (the readout provided through TID 2) between 1 and 10 micromolar in CSV format. You want to obtain just two bioassay (TID) columns (TID 1: IC50 Qualifier and TID 2: IC50). The initial HTTP POST to PUG to initiate the download would contain XML like:
<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_bas>
                <PCT-QueryAssayData>
                  <PCT-QueryAssayData_output value="csv">4</PCT-QueryAssayData_output>
                  <PCT-QueryAssayData_aids>
                    <PCT-QueryUids>
                      <PCT-QueryUids_ids>
                        <PCT-ID-List>
                          <PCT-ID-List_db>pcassay</PCT-ID-List_db>
                          <PCT-ID-List_uids>
                            <PCT-ID-List_uids_E>523</PCT-ID-List_uids_E>
                          </PCT-ID-List_uids>
                        </PCT-ID-List>
                      </PCT-QueryUids_ids>
                    </PCT-QueryUids>
                  </PCT-QueryAssayData_aids>
                  <PCT-QueryAssayData_dataset value="complete">0</PCT-QueryAssayData_dataset>
                  <PCT-QueryAssayData_readouts>
                    <PCT-Assay-Readout>
                      <PCT-Assay-Readout_aid>523</PCT-Assay-Readout_aid>
                      <PCT-Assay-Readout_retrieve>
                        <PCT-Assay-Readout_retrieve_E>2</PCT-Assay-Readout_retrieve_E>
                        <PCT-Assay-Readout_retrieve_E>1</PCT-Assay-Readout_retrieve_E>
                      </PCT-Assay-Readout_retrieve>
                      <PCT-Assay-Readout_query>
                        <PCT-Assay-Readout-Query>
                          <PCT-Assay-Readout-Query_fquery>
                            <PCT-Assay-Readout-Float-Query>
                              <PCT-Assay-Readout-Float-Query_tid>2</PCT-Assay-Readout-Float-Query_tid>
                              <PCT-Assay-Readout-Float-Query_lower>1</PCT-Assay-Readout-Float-Query_lower>
                              <PCT-Assay-Readout-Float-Query_upper>10</PCT-Assay-Readout-Float-Query_upper>
                            </PCT-Assay-Readout-Float-Query>
                          </PCT-Assay-Readout-Query_fquery>
                        </PCT-Assay-Readout-Query>
                      </PCT-Assay-Readout_query>
                    </PCT-Assay-Readout>
                  </PCT-QueryAssayData_readouts>
                </PCT-QueryAssayData>
              </PCT-QueryType_bas>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>
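The readout filter in Example 7 can also be assembled programmatically. The following is a minimal sketch, not an official client: the helper name build_readout_filter is hypothetical, and only element names that appear in the example above are used. It builds the <PCT-Assay-Readout> fragment that would be placed inside <PCT-QueryAssayData_readouts>.

```python
import xml.etree.ElementTree as ET


def build_readout_filter(aid, tid, lower, upper, retrieve_tids=()):
    """Build a <PCT-Assay-Readout> element with one float range query.

    Hypothetical helper; element names are copied from Example 7.
    """
    readout = ET.Element("PCT-Assay-Readout")
    ET.SubElement(readout, "PCT-Assay-Readout_aid").text = str(aid)

    # Optional list of TID columns to return.
    if retrieve_tids:
        retrieve = ET.SubElement(readout, "PCT-Assay-Readout_retrieve")
        for column in retrieve_tids:
            ET.SubElement(retrieve, "PCT-Assay-Readout_retrieve_E").text = str(column)

    # Range restriction on a single TID readout.
    query = ET.SubElement(readout, "PCT-Assay-Readout_query")
    wrapper = ET.SubElement(query, "PCT-Assay-Readout-Query")
    fquery = ET.SubElement(wrapper, "PCT-Assay-Readout-Query_fquery")
    flt = ET.SubElement(fquery, "PCT-Assay-Readout-Float-Query")
    ET.SubElement(flt, "PCT-Assay-Readout-Float-Query_tid").text = str(tid)
    ET.SubElement(flt, "PCT-Assay-Readout-Float-Query_lower").text = f"{lower:g}"
    ET.SubElement(flt, "PCT-Assay-Readout-Float-Query_upper").text = f"{upper:g}"
    return readout


fragment = build_readout_filter(523, tid=2, lower=1, upper=10, retrieve_tids=(2, 1))
xml_fragment = ET.tostring(fragment, encoding="unicode")
```

The returned element can be appended to a larger request tree before the whole message is serialized.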
Example 8: You wish to download the BioActivity Summary Table for CIDs 3243128 and 3240114 in AIDs 523 and 820, where the numbers of active and inactive compounds in each assay are listed. The initial HTTP POST to PUG to initiate the download would contain XML like:
<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_qas>
                <PCT-QueryActivitySummary>
                  <PCT-QueryActivitySummary_output value="summary-table">0</PCT-QueryActivitySummary_output>
                  <PCT-QueryActivitySummary_type value="assay-central">0</PCT-QueryActivitySummary_type>
                  <PCT-QueryActivitySummary_aids>
                    <PCT-QueryUids>
                      <PCT-QueryUids_ids>
                        <PCT-ID-List>
                          <PCT-ID-List_db>pcassay</PCT-ID-List_db>
                          <PCT-ID-List_uids>
                            <PCT-ID-List_uids_E>523</PCT-ID-List_uids_E>
                            <PCT-ID-List_uids_E>820</PCT-ID-List_uids_E>
                          </PCT-ID-List_uids>
                        </PCT-ID-List>
                      </PCT-QueryUids_ids>
                    </PCT-QueryUids>
                  </PCT-QueryActivitySummary_aids>
                  <PCT-QueryActivitySummary_scids>
                    <PCT-QueryUids>
                      <PCT-QueryUids_ids>
                        <PCT-ID-List>
                          <PCT-ID-List_db>pccompound</PCT-ID-List_db>
                          <PCT-ID-List_uids>
                            <PCT-ID-List_uids_E>3243128</PCT-ID-List_uids_E>
                            <PCT-ID-List_uids_E>3240114</PCT-ID-List_uids_E>
                          </PCT-ID-List_uids>
                        </PCT-ID-List>
                      </PCT-QueryUids_ids>
                    </PCT-QueryUids>
                  </PCT-QueryActivitySummary_scids>
                </PCT-QueryActivitySummary>
              </PCT-QueryType_qas>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>
If one of the above requests is processed and started successfully, PUG would respond with a waiting message and request id, for example:
<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="running"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_waiting>
          <PCT-Waiting>
            <PCT-Waiting_reqid>336289408724569820</PCT-Waiting_reqid>
          </PCT-Waiting>
        </PCT-OutputData_output_waiting>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>
You would then use this request ID, "336289408724569820" in this case, to "poll" PUG for the status of the request:
<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_request>
        <PCT-Request>
          <PCT-Request_reqid>336289408724569820</PCT-Request_reqid>
          <PCT-Request_type value="status"/>
        </PCT-Request>
      </PCT-InputData_request>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>
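The polling message above differs from request to request only in the request ID and the request type. As a minimal sketch (the function name build_request_message is an assumption, not part of PUG), it could be generated like this:

```python
import xml.etree.ElementTree as ET


def build_request_message(reqid, request_type="status"):
    """Return a PUG <PCT-Request> message for an existing request ID.

    Only the two request types shown in this document are accepted.
    """
    if request_type not in ("status", "cancel"):
        raise ValueError("request_type must be 'status' or 'cancel'")
    root = ET.Element("PCT-Data")
    node = root
    for tag in ("PCT-Data_input", "PCT-InputData",
                "PCT-InputData_request", "PCT-Request"):
        node = ET.SubElement(node, tag)
    ET.SubElement(node, "PCT-Request_reqid").text = str(reqid)
    ET.SubElement(node, "PCT-Request_type", {"value": request_type})
    return ET.tostring(root, encoding="unicode")


status_message = build_request_message("336289408724569820")
```

Passing request_type="cancel" produces the cancel message instead.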
If the download is still running, you would get another waiting message as above, and you would then need to poll again after a reasonable interval. If the download task is completed, PUG would return either XML containing the assay description (Example 5) or an FTP URL (the remaining examples) where the requested data is available:
<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_download-url>
          <PCT-Download-URL>
            <PCT-Download-URL_url>
            ftp://ftp-private.ncbi.nlm.nih.gov/pubchem/.fetch/336289408724569820.csv.gz
            </PCT-Download-URL_url>
          </PCT-Download-URL>
        </PCT-OutputData_output_download-url>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>
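Both the waiting message and the final download message can be handled by one small parser. This is a sketch under the assumption that a response contains at most one <PCT-Status>, one <PCT-Waiting_reqid> and one <PCT-Download-URL_url>, as in the examples here; summarize_response is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def _stripped_text(root, tag):
    """Return the whitespace-stripped text of the first matching tag, or None."""
    value = root.findtext(f".//{tag}")
    return value.strip() if value is not None else None


def summarize_response(xml_text):
    """Extract status, request ID and download URL from a PUG response."""
    root = ET.fromstring(xml_text)
    status = root.find(".//PCT-Status")
    return {
        "status": status.get("value") if status is not None else None,
        "reqid": _stripped_text(root, "PCT-Waiting_reqid"),
        "url": _stripped_text(root, "PCT-Download-URL_url"),
    }
```

A result with a non-empty "reqid" means the task is still pending; a non-empty "url" means the data is ready to fetch. Note that the URL text is stripped, because the example response wraps it in whitespace.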
If for some reason your initial query cannot be properly interpreted, PUG would respond with an error message with some indication of the problem encountered:
<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="server-error"/>
          </PCT-Status-Message_status>
          <PCT-Status-Message_message>
         Status: server-error. Contact info@ncbi.nlm.nih.gov for assistance.
         Please include your request ID if possible.
         </PCT-Status-Message_message>
        </PCT-Status-Message>
      </PCT-OutputData_status>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>
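A client would typically turn such a response into an exception. The sketch below is illustrative only: PugError and raise_for_status are hypothetical names, and the default set of non-error status values contains just the two that appear in this document ("success" and "running"); consult the DTD for the full list and extend ok_statuses accordingly.

```python
import xml.etree.ElementTree as ET


class PugError(RuntimeError):
    """Hypothetical exception for a PUG response that reports a failure."""


def raise_for_status(xml_text, ok_statuses=("success", "running")):
    """Return the status value, or raise PugError if it is not in ok_statuses."""
    root = ET.fromstring(xml_text)
    node = root.find(".//PCT-Status")
    status = node.get("value") if node is not None else None
    if status in ok_statuses:
        return status
    # Collapse the line breaks and indentation inside the message text.
    raw = root.findtext(".//PCT-Status-Message_message") or ""
    raise PugError(f"{status}: {' '.join(raw.split())}")
```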
If you wish to cancel a queued or running request, you would send to PUG:
<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_request>
        <PCT-Request>
          <PCT-Request_reqid>336289408724569820</PCT-Request_reqid>
          <PCT-Request_type value="cancel"/>
        </PCT-Request>
      </PCT-InputData_request>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>
And when PUG cancels your task, you would get back:
<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="running"/>
          </PCT-Status-Message_status>
          <PCT-Status-Message_message>Request cancelled</PCT-Status-Message_message>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_waiting>
          <PCT-Waiting>
            <PCT-Waiting_reqid>336289408724569820</PCT-Waiting_reqid>
          </PCT-Waiting>
        </PCT-OutputData_output_waiting>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

3.3. PubChem Standardization Tasks.

The PubChem Standardization service allows you to standardize the representation of a chemical structure using a <PCT-Standardize> sub-object.  PubChem uses a normalization procedure on all PubChem substance records to remove variation due to different representations of functional groups, tautomeric or resonance forms, etc., to create the PubChem Compound database, which contains the unique chemical structures in the PubChem Substance database.  This procedure verifies and validates that a chemical structure is reasonable (to a certain degree) through examination of the atoms and their valence and involves a valence-bond canonicalization procedure for tautomer invariance.  The input to structure standardization is a chemical structure and the output is either a failure message or a chemical structure.  To use this service, you will need to specify an input structure and its format.  You also need to specify the output format you desire.  This service operates on only a single structure at a time.

Example:   You would like to standardize the representation of guanine, with both the input and the output in SMILES format.

The typical flow of information is as follows.  First, the initial input XML is sent to PUG via HTTP POST.  Note the input data container with the standardize request, the input structure and its format, and the output format:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_standardize>
        <PCT-Standardize>
          <PCT-Standardize_structure>
            <PCT-Structure>
              <PCT-Structure_structure>
                <PCT-Structure_structure_string>C1=NC2=C(N1)C(=O)N=C(N2)N
		</PCT-Structure_structure_string>
              </PCT-Structure_structure>
              <PCT-Structure_format>
                <PCT-StructureFormat value="smiles"/>
              </PCT-Structure_format>
            </PCT-Structure>
          </PCT-Standardize_structure>
          <PCT-Standardize_oformat>
            <PCT-StructureFormat value="smiles"/>
          </PCT-Standardize_oformat>
        </PCT-Standardize>
      </PCT-InputData_standardize>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

If the request is small and finishes very quickly, you may get the final result right away (see further below). But usually PUG will respond initially with a waiting message and a request ID (<PCT-Waiting_reqid>) such as:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_waiting>
          <PCT-Waiting>
            <PCT-Waiting_reqid>402936103567975582</PCT-Waiting_reqid>
          </PCT-Waiting>
        </PCT-OutputData_output_waiting>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

You would then parse out this request ID, "402936103567975582" in this case, and use it to "poll" PUG on the status of the request, composing an XML message like:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_request>
        <PCT-Request>
          <PCT-Request_reqid>402936103567975582</PCT-Request_reqid>
          <PCT-Request_type value="status"/>
        </PCT-Request>
      </PCT-InputData_request>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

Note that here the request type "status" is used; there is also the request type "cancel" that you may use to cancel a running job.

If the request is still running, you will get back another waiting message as above, and then you would poll again after some reasonable interval. If the request is finished, you will get a final result message like:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_structure>
          <PCT-Structure>
            <PCT-Structure_structure>
              <PCT-Structure_structure_string>C1=NC2=C(N1)C(=O)N=C(N2)N
              </PCT-Structure_structure_string>
            </PCT-Structure_structure>
            <PCT-Structure_format>
              <PCT-StructureFormat value="smiles"/>
            </PCT-Structure_format>
          </PCT-Structure>
        </PCT-OutputData_output_structure>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

You would parse out the output from the <PCT-Structure_structure_string> tag to retrieve the standardized structure.
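
In the result message shown above, the structure text sits in <PCT-Structure_structure_string> and its format in the value attribute of <PCT-StructureFormat>. A minimal sketch of that parsing step follows; the function name extract_structure is an assumption.

```python
import xml.etree.ElementTree as ET


def extract_structure(xml_text):
    """Return (structure_string, format) from a standardization result.

    Returns None if the response carries no output structure, for
    example because it is still a waiting message.
    """
    root = ET.fromstring(xml_text)
    structure = root.find(".//PCT-OutputData_output_structure/PCT-Structure")
    if structure is None:
        return None
    text = structure.findtext(
        "PCT-Structure_structure/PCT-Structure_structure_string") or ""
    fmt = structure.find("PCT-Structure_format/PCT-StructureFormat")
    # The example wraps the string across lines, so strip the whitespace.
    return text.strip(), (fmt.get("value") if fmt is not None else None)
```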

 

4. PUG, NCBI eUtils, and Entrez History.

NCBI's Entrez integrates the scientific literature, DNA and protein sequence databases, 3D protein structure and protein domain data, population study datasets, expression data, assemblies of complete genomes, taxonomic information, and PubChem Compound, Substance, and BioAssay databases (among others) into a tightly interlinked system.  It is a retrieval system designed for searching its linked databases.  Entrez history provides a record of the searches performed during a search session.  PubChem communicates with Entrez history through Entrez Programming Utilities (eUtils) to enhance data analysis.

NCBI's eUtils are used extensively by PubChem services.  Results from queries are often provided in the form of an Entrez history, which represents a list of database specific identifiers within the Entrez search system.  These identifiers are, for example, your PubChem CIDs (compound identifiers).  This allows you, the user, to interact with other Entrez databases and to perform hit list management tasks using eUtils, e.g., to logically combine the results of different queries using AND, OR, or NOT operations.  PubChem services typically accept an Entrez history as a means to provide a subset of identifiers as input, so that your query operates only on a subset of a PubChem database's contents.  Use of Entrez history can help you avoid sending and receiving (potentially) very large lists of identifiers.  To learn more about eUtils, please visit the URL:

http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

Histories in Entrez are database specific. Each time an Entrez search is executed, the search terms, the time the search was executed, and the search results are numbered consecutively and saved automatically in Entrez history for that database. The history can be recalled at any time during a search session, but histories are lost after 8 hours of inactivity. There is also a running limit of 100 searches (across all databases) saved in any given session.

PUG is integrated with Entrez in that it may use Entrez history keys (also known as "webenv" keys) as both input and output, depending on the task.  For example, structure search via PUG may return an Entrez history, and the resulting hit list can be retrieved as a list of CIDs using Entrez's eFetch utility.  PUG can also take a history key as input, if you wanted to download the records resulting from either a prior structure search or a programmatic Entrez search via the eSearch utility.  (See the eUtils documentation for more details.)

Entrez histories are referred to programmatically by the trio of a database name, a WebEnv string, and a query key number. You can see this in the example structure search above.  The part of PUG's response that contains this information is the <PCT-Entrez> tag:

<PCT-Entrez>
  <PCT-Entrez_db>pccompound</PCT-Entrez_db>
  <PCT-Entrez_query-key>1</PCT-Entrez_query-key>
  <PCT-Entrez_webenv>
      0Hm9YDD1X4wor4nONvSWx9vkKmEqFXTiq84JO47pgxmSw_
      cIuDBVcG46Yr@2B5C47D162BBD720_0008SID
  </PCT-Entrez_webenv>
</PCT-Entrez>
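
The trio can be pulled out of a PUG response with a few lines of code. The sketch below uses hypothetical names (EntrezHistory, parse_entrez) and assumes that the line break inside <PCT-Entrez_webenv> above is display wrapping only, so all whitespace inside the WebEnv string is removed.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class EntrezHistory(NamedTuple):
    db: str
    query_key: int
    webenv: str


def parse_entrez(xml_text) -> Optional[EntrezHistory]:
    """Extract database, query key and WebEnv from a <PCT-Entrez> element."""
    root = ET.fromstring(xml_text)
    node = root if root.tag == "PCT-Entrez" else root.find(".//PCT-Entrez")
    if node is None:
        return None
    # Assumption: WebEnv strings never contain meaningful whitespace.
    webenv = "".join((node.findtext("PCT-Entrez_webenv") or "").split())
    return EntrezHistory(
        db=(node.findtext("PCT-Entrez_db") or "").strip(),
        query_key=int(node.findtext("PCT-Entrez_query-key")),
        webenv=webenv,
    )
```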

This Entrez history information may be used in a variety of ways. If you want to view these hits on a regular web page, you can direct a browser to a URL as follows, which shows the results in HTML in the usual Entrez docsum format:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Select+from+History&
WebEnvRq=1&db=pccompound&query_key=1&
WebEnv=0Hm9YDD1X4wor4nONvSWx9vkKmEqFXTiq84JO47pgxmSw_cIuDBVcG46Yr@2B5C47D162BBD720_0008SID

On the other hand, if you are writing an application and want to retrieve the hit list directly via HTTP, you can use eFetch with the same information, which can return the list in XML (with its own DTD/XSD that is not related to PUG's), for example:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?
retmode=xml&rettype=uilist&
WebEnvRq=1&db=pccompound&query_key=1&
WebEnv=0Hm9YDD1X4wor4nONvSWx9vkKmEqFXTiq84JO47pgxmSw_cIuDBVcG46Yr@2B5C47D162BBD720_0008SID
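
When composing such a URL in code, it is safer to let a library encode the parameters than to concatenate strings by hand. The following sketch uses only the parameter names from the example URL above; the function name efetch_uilist_url is an assumption.

```python
from urllib.parse import urlencode

EFETCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_uilist_url(db, query_key, webenv):
    """Build an eFetch URL that returns the hit list as an XML list of IDs."""
    params = {
        "retmode": "xml",
        "rettype": "uilist",
        "WebEnvRq": "1",
        "db": db,
        "query_key": str(query_key),
        "WebEnv": webenv,
    }
    # urlencode percent-escapes reserved characters such as "@".
    return EFETCH_URL + "?" + urlencode(params)
```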

Finally, if you want to download the compounds from this search in, e.g., SDF format with gzip compression, you would send PUG a request with the <PCT-Entrez> information instead of an explicit CID list. From here, the download process would continue as in the example above.

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_download>
        <PCT-Download>
          <PCT-Download_uids>
            <PCT-QueryUids>
              <PCT-QueryUids_entrez>
                <PCT-Entrez>
                  <PCT-Entrez_db>pccompound</PCT-Entrez_db>
                  <PCT-Entrez_query-key>1</PCT-Entrez_query-key>
                  <PCT-Entrez_webenv>
                   0Hm9YDD1X4wor4nONvSWx9vkKmEqFXTiq84JO47pgxmSw_
                   cIuDBVcG46Yr@2B5C47D162BBD720_0008SID
                  </PCT-Entrez_webenv>
                </PCT-Entrez>
              </PCT-QueryUids_entrez>
            </PCT-QueryUids>
          </PCT-Download_uids>
          <PCT-Download_format value="sdf"/>
          <PCT-Download_compression value="gzip"/>
        </PCT-Download>
      </PCT-InputData_download>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

PUG and eUtils together make possible a wide variety of powerful programmatic data analysis tools for PubChem and other Entrez databases.


5. PUG and SOAP.

There is a SOAP wrapper for PUG.  Documentation and a WSDL for it can be found at:

http://pubchem.ncbi.nlm.nih.gov/pug_soap

The interface includes much of PUG's functionality, but with simplified functions that are accessible from GUI workflow applications and SOAP-aware programming languages.


6. FAQs.

Some simple workflows help to illustrate the use of PUG. All PUG messages must be sent via an HTTP POST to the URL:

http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi
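
As a minimal sketch of that POST using only the Python standard library: the function names make_request and post_to_pug are hypothetical, and the Content-Type header value is an assumption, since this document does not specify one.

```python
import urllib.request

PUG_URL = "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"


def make_request(xml_text, url=PUG_URL):
    """Wrap a PUG XML message in an HTTP POST request object."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "text/xml"},  # assumed header value
    )


def post_to_pug(xml_text, url=PUG_URL, timeout=60):
    """Send the message and return PUG's XML reply as text."""
    with urllib.request.urlopen(make_request(xml_text, url), timeout=timeout) as reply:
        return reply.read().decode("utf-8")
```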

Scenario 1. I would like to retrieve the SMILES (in gzip compressed format) for a list of PubChem Compound CIDs: 1, 2, and 3. Compose your PUG message:


<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_download>
        <PCT-Download>
          <PCT-Download_uids>
            <PCT-QueryUids>
              <PCT-QueryUids_ids>
                <PCT-ID-List>
                  <PCT-ID-List_db>pccompound</PCT-ID-List_db>
                  <PCT-ID-List_uids>
                    <PCT-ID-List_uids_E>1</PCT-ID-List_uids_E>
                    <PCT-ID-List_uids_E>2</PCT-ID-List_uids_E>
                    <PCT-ID-List_uids_E>3</PCT-ID-List_uids_E>
                  </PCT-ID-List_uids>
                </PCT-ID-List>
              </PCT-QueryUids_ids>
            </PCT-QueryUids>
          </PCT-Download_uids>
          <PCT-Download_format value="smiles"/>
          <PCT-Download_compression value="gzip"/>
        </PCT-Download>
      </PCT-InputData_download>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>
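
For longer CID lists, writing this message by hand becomes tedious. The sketch below generates the same structure from a list of identifiers; build_download_message is a hypothetical helper, and the format and compression values are passed through unchecked, so they must be values that PUG accepts.

```python
import xml.etree.ElementTree as ET


def build_download_message(db, uids, fmt, compression="gzip"):
    """Build a PUG download request for an explicit list of identifiers."""
    root = ET.Element("PCT-Data")
    download = root
    for tag in ("PCT-Data_input", "PCT-InputData",
                "PCT-InputData_download", "PCT-Download"):
        download = ET.SubElement(download, tag)

    id_list = download
    for tag in ("PCT-Download_uids", "PCT-QueryUids",
                "PCT-QueryUids_ids", "PCT-ID-List"):
        id_list = ET.SubElement(id_list, tag)
    ET.SubElement(id_list, "PCT-ID-List_db").text = db
    uid_node = ET.SubElement(id_list, "PCT-ID-List_uids")
    for uid in uids:
        # int() rejects anything that is not a whole-number identifier.
        ET.SubElement(uid_node, "PCT-ID-List_uids_E").text = str(int(uid))

    ET.SubElement(download, "PCT-Download_format", {"value": fmt})
    ET.SubElement(download, "PCT-Download_compression", {"value": compression})
    return ET.tostring(root, encoding="unicode")


message = build_download_message("pccompound", [1, 2, 3], "smiles")
```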
PUG will send you back a response, for example:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_waiting>
          <PCT-Waiting>
            <PCT-Waiting_reqid>402936103567975582</PCT-Waiting_reqid>
          </PCT-Waiting>
        </PCT-OutputData_output_waiting>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

This response contains a request ID, here "402936103567975582". You will use this request ID to query PUG on the status of your request by composing and sending another PUG message, as follows:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_request>
        <PCT-Request>
          <PCT-Request_reqid>402936103567975582</PCT-Request_reqid>
          <PCT-Request_type value="status"/>
        </PCT-Request>
      </PCT-InputData_request>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

If your request is still being processed, you will receive a response message as before. When your request completes, PUG will return an XML message like:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_download-url>
          <PCT-Download-URL>
            <PCT-Download-URL_url>
    ftp://ftp-private.ncbi.nlm.nih.gov/pubchem/.fetch/656213441898678492.txt.gz
            </PCT-Download-URL_url>
          </PCT-Download-URL>
        </PCT-OutputData_output_download-url>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

The PUG response gives you a URL where you may retrieve your results:
ftp://ftp-private.ncbi.nlm.nih.gov/pubchem/.fetch/656213441898678492.txt.gz
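
The submit, poll, and retrieve cycle of this scenario can be sketched as a small loop. Everything here is illustrative: poll_until_done is a hypothetical name, post stands for any caller-supplied function that sends an XML string to PUG via HTTP POST and returns the reply text, and the default interval and poll limit are arbitrary choices rather than PUG recommendations.

```python
import time
import xml.etree.ElementTree as ET

STATUS_TEMPLATE = (
    "<PCT-Data><PCT-Data_input><PCT-InputData><PCT-InputData_request>"
    "<PCT-Request><PCT-Request_reqid>{reqid}</PCT-Request_reqid>"
    '<PCT-Request_type value="status"/></PCT-Request>'
    "</PCT-InputData_request></PCT-InputData></PCT-Data_input></PCT-Data>"
)


def poll_until_done(post, reqid, interval=5.0, max_polls=60, sleep=time.sleep):
    """Poll PUG until the reply no longer contains a waiting request ID.

    The final reply is returned unparsed; it may be a result or an
    error message, so the caller should still inspect its status.
    """
    for _ in range(max_polls):
        reply = post(STATUS_TEMPLATE.format(reqid=reqid))
        if ET.fromstring(reply).find(".//PCT-Waiting_reqid") is None:
            return reply
        sleep(interval)
    raise TimeoutError(f"request {reqid} still pending after {max_polls} polls")
```

Injecting post and sleep keeps the loop independent of any particular HTTP library and makes it easy to exercise without network access.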

Scenario 2a. I have a SMILES of L-tyrosine and I would like to get back an SDF file containing the PubChem Compound record(s) exactly matching this structure (with the same stereo and isotopes).

The workflow for this scenario is very similar to Scenario 1, where you submit a request and then poll PUG with the returned request ID until the task completes. Borrowing the workflow from the first scenario, the PUG message you HTTP POST would be:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_css>
                <PCT-QueryCompoundCS>
                  <PCT-QueryCompoundCS_query>
                    <PCT-QueryCompoundCS_query_data>C1=CC(=CC=C1C[C@@H](C(=O)O)N)O
                    </PCT-QueryCompoundCS_query_data>
                  </PCT-QueryCompoundCS_query>
                  <PCT-QueryCompoundCS_type>
                    <PCT-QueryCompoundCS_type_identical>
                      <PCT-CSIdentity value="same-stereo-isotope">5</PCT-CSIdentity>
                    </PCT-QueryCompoundCS_type_identical>
                  </PCT-QueryCompoundCS_type>
                  <PCT-QueryCompoundCS_results>2000000</PCT-QueryCompoundCS_results>
                </PCT-QueryCompoundCS>
              </PCT-QueryType_css>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

The eventual result of the search will look something like:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
          <PCT-Status-Message_message>Your search has completed successfully!
          </PCT-Status-Message_message>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_entrez>
          <PCT-Entrez>
            <PCT-Entrez_db>pccompound</PCT-Entrez_db>
            <PCT-Entrez_query-key>3</PCT-Entrez_query-key>
            <PCT-Entrez_webenv>
            0hJ--NzxiSKFxJzc4SnMb5PvxBP8HKJvZ-2s-XE19WBZSHG0xIO_k_xPrU@1FBE6DE17A397ED0_0011SID
            </PCT-Entrez_webenv>
          </PCT-Entrez>
        </PCT-OutputData_output_entrez>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

When the query is complete, the result is an Entrez query key and webenv. The query key identifies the query result and the webenv provides your session identifier. You use this Entrez query key and webenv as your source of CIDs to compose another PUG message to download the gzip-compressed SDF file of the query hits:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_download>
        <PCT-Download>
          <PCT-Download_uids>
            <PCT-QueryUids>
              <PCT-QueryUids_entrez>
                <PCT-Entrez>
                  <PCT-Entrez_db>pccompound</PCT-Entrez_db>
                  <PCT-Entrez_query-key>3</PCT-Entrez_query-key>
                  <PCT-Entrez_webenv>
                  0hJ--NzxiSKFxJzc4SnMb5PvxBP8HKJvZ-2s-XE19WBZSHG0xIO_k_xPrU@1FBE6DE17A397ED0_0011SID
                  </PCT-Entrez_webenv>
                </PCT-Entrez>
              </PCT-QueryUids_entrez>
            </PCT-QueryUids>
          </PCT-Download_uids>
          <PCT-Download_format value="sdf"/>
          <PCT-Download_compression value="gzip"/>
        </PCT-Download>
      </PCT-InputData_download>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

The final result, as in the first scenario, will contain a URL to the results containing CID 6057:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_download-url>
          <PCT-Download-URL>
            <PCT-Download-URL_url>
            ftp://ftp-private.ncbi.nlm.nih.gov/pubchem/.fetch/693081357064045880.sdf.gz
            </PCT-Download-URL_url>
          </PCT-Download-URL>
        </PCT-OutputData_output_download-url>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

Scenario 2b. I have a SMILES of L-tyrosine and I would like to get back an SDF file containing the PubChem Compound records containing the same isotopes (but stereo can be different).
This scenario is identical to Scenario 2a, except that the initial PUG message would look like:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_css>
                <PCT-QueryCompoundCS>
                  <PCT-QueryCompoundCS_query>
                    <PCT-QueryCompoundCS_query_data>
                    C1=CC(=CC=C1C[C@@H](C(=O)O)N)O</PCT-QueryCompoundCS_query_data>
                  </PCT-QueryCompoundCS_query>
                  <PCT-QueryCompoundCS_type>
                    <PCT-QueryCompoundCS_type_identical>
                      <PCT-CSIdentity value="same-isotope">4</PCT-CSIdentity>
                    </PCT-QueryCompoundCS_type_identical>
                  </PCT-QueryCompoundCS_type>
                  <PCT-QueryCompoundCS_results>2000000</PCT-QueryCompoundCS_results>
                </PCT-QueryCompoundCS>
              </PCT-QueryType_css>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

The final result would contain three records, being CIDs 6057, 1153, and 71098.

Scenario 2c. I have a SMILES of L-tyrosine and I would like to get back an SDF file containing the PubChem Compound records that have a similarity of at least 95%.

This scenario is identical to Scenario 2a, except that the initial PUG message would look like:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_css>
                <PCT-QueryCompoundCS>
                  <PCT-QueryCompoundCS_query>
                    <PCT-QueryCompoundCS_query_data>
                    C1=CC(=CC=C1C[C@@H](C(=O)O)N)O</PCT-QueryCompoundCS_query_data>
                  </PCT-QueryCompoundCS_query>
                  <PCT-QueryCompoundCS_type>
                    <PCT-QueryCompoundCS_type_similar>
                      <PCT-CSSimilarity>
                        <PCT-CSSimilarity_threshold>95</PCT-CSSimilarity_threshold>
                      </PCT-CSSimilarity>
                    </PCT-QueryCompoundCS_type_similar>
                  </PCT-QueryCompoundCS_type>
                  <PCT-QueryCompoundCS_results>2000000</PCT-QueryCompoundCS_results>
                </PCT-QueryCompoundCS>
              </PCT-QueryType_css>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

The final result would contain (currently) the SDF records for 191 CIDs.
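
Scenarios 2a through 2c differ only in the <PCT-QueryCompoundCS_type> portion of the message, so one helper can generate all three. This is a sketch, not an official client: build_structure_query is a hypothetical name, and IDENTITY_CODES lists only the two identity levels that appear in this document, with the numeric codes copied from the examples.

```python
import xml.etree.ElementTree as ET

# Only the identity levels shown in Scenarios 2a and 2b.
IDENTITY_CODES = {"same-stereo-isotope": 5, "same-isotope": 4}


def build_structure_query(smiles, identity=None, similarity=None,
                          max_results=2000000):
    """Build an identity or similarity search message for a SMILES string."""
    if (identity is None) == (similarity is None):
        raise ValueError("give exactly one of identity or similarity")
    root = ET.Element("PCT-Data")
    node = root
    for tag in ("PCT-Data_input", "PCT-InputData", "PCT-InputData_query",
                "PCT-Query", "PCT-Query_type", "PCT-QueryType",
                "PCT-QueryType_css", "PCT-QueryCompoundCS"):
        node = ET.SubElement(node, tag)

    query = ET.SubElement(node, "PCT-QueryCompoundCS_query")
    ET.SubElement(query, "PCT-QueryCompoundCS_query_data").text = smiles

    qtype = ET.SubElement(node, "PCT-QueryCompoundCS_type")
    if identity is not None:
        holder = ET.SubElement(qtype, "PCT-QueryCompoundCS_type_identical")
        level = ET.SubElement(holder, "PCT-CSIdentity", {"value": identity})
        level.text = str(IDENTITY_CODES[identity])
    else:
        holder = ET.SubElement(qtype, "PCT-QueryCompoundCS_type_similar")
        sim = ET.SubElement(holder, "PCT-CSSimilarity")
        ET.SubElement(sim, "PCT-CSSimilarity_threshold").text = str(int(similarity))

    ET.SubElement(node, "PCT-QueryCompoundCS_results").text = str(max_results)
    return ET.tostring(root, encoding="unicode")


tyrosine = "C1=CC(=CC=C1C[C@@H](C(=O)O)N)O"
exact_query = build_structure_query(tyrosine, identity="same-stereo-isotope")
similar_query = build_structure_query(tyrosine, similarity=95)
```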

Scenario 2d. How do I query using a molecular formula C2H7NO and get back an SDF file containing the PubChem Compound records matching exactly?
This scenario is identical to Scenario 2a, except that the initial PUG message would look like:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_css>
                <PCT-QueryCompoundCS>
                  <PCT-QueryCompoundCS_query>
                    <PCT-QueryCompoundCS_query_data>C2H7NO</PCT-QueryCompoundCS_query_data>
                  </PCT-QueryCompoundCS_query>
                  <PCT-QueryCompoundCS_type>
                    <PCT-QueryCompoundCS_type_formula>
                      <PCT-CSMolFormula></PCT-CSMolFormula>
                    </PCT-QueryCompoundCS_type_formula>
                  </PCT-QueryCompoundCS_type>
                  <PCT-QueryCompoundCS_results>2000000</PCT-QueryCompoundCS_results>
                </PCT-QueryCompoundCS>
              </PCT-QueryType_css>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

The final result would contain (currently) the SDF records for 20 CIDs.
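
The messages in Scenarios 2c and 2d differ only in the query data and in the element nested under PCT-QueryCompoundCS_type, so they can be generated rather than written by hand. The following is a minimal Python sketch under that assumption. The helper name build_cs_query is hypothetical; the element names are copied from the messages above, and the sketch only builds the XML (it does not POST it to PUG). Note that ElementTree writes the empty PCT-CSMolFormula element in self-closing form, which is equivalent XML.

```python
import xml.etree.ElementTree as ET

# Wrapper elements, outermost first, copied from the messages above.
WRAPPERS = [
    "PCT-Data", "PCT-Data_input", "PCT-InputData", "PCT-InputData_query",
    "PCT-Query", "PCT-Query_type", "PCT-QueryType", "PCT-QueryType_css",
    "PCT-QueryCompoundCS",
]


def build_cs_query(query_data, similarity_threshold=None, max_results=2000000):
    """Build a structure-query message (hypothetical helper, not part of PUG).

    If similarity_threshold is given, a similarity search is requested
    (Scenario 2c); otherwise a molecular formula search is requested
    (Scenario 2d).
    """
    root = ET.Element(WRAPPERS[0])
    node = root
    for tag in WRAPPERS[1:]:
        node = ET.SubElement(node, tag)

    # The query itself: a SMILES string or a molecular formula.
    query = ET.SubElement(node, "PCT-QueryCompoundCS_query")
    ET.SubElement(query, "PCT-QueryCompoundCS_query_data").text = query_data

    # The search type is the only structural difference between 2c and 2d.
    cs_type = ET.SubElement(node, "PCT-QueryCompoundCS_type")
    if similarity_threshold is not None:
        similar = ET.SubElement(cs_type, "PCT-QueryCompoundCS_type_similar")
        sim = ET.SubElement(similar, "PCT-CSSimilarity")
        threshold = ET.SubElement(sim, "PCT-CSSimilarity_threshold")
        threshold.text = str(similarity_threshold)
    else:
        formula = ET.SubElement(cs_type, "PCT-QueryCompoundCS_type_formula")
        ET.SubElement(formula, "PCT-CSMolFormula")

    ET.SubElement(node, "PCT-QueryCompoundCS_results").text = str(max_results)
    return ET.tostring(root, encoding="unicode")
```

For example, build_cs_query("C2H7NO") would produce the Scenario 2d message, and build_cs_query("C1=CC(=CC=C1C[C@@H](C(=O)O)N)O", similarity_threshold=95) the Scenario 2c message.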

Scenario 3. How do I retrieve the SDF file of all PubChem Compound records within the mass range 100.00 to 100.01 atomic mass units?

Unlike the first two scenarios, you will query Entrez (not PUG) initially to generate the list of CIDs. To perform the Entrez query, using "eSearch", use the URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&usehistory=y&retmax=0&term=100:100.01[exactmass]

Please note that the "retmax=0" argument in the URL above prevents the actual result list of PubChem Compound CIDs from being returned. The "usehistory=y" argument creates an Entrez history item, required for the next step in this scenario. If a CID list is desired instead, simply omit both arguments, e.g.:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&term=100:100.01[exactmass]

The XML return message from eSearch, using the former URL above (rather than the latter), will look like:

<eSearchResult>
        <Count>81</Count>
        <RetMax>0</RetMax>
        <RetStart>0</RetStart>
        <QueryKey>26</QueryKey>
        <WebEnv>
 0BPLhFE_YfmLOCUMsO7FDRuhXLxgPqzfs-aB_O2nILEnCpSEb-AIRQzeQ0LTaQNNlpK8XkxiDcX71it@46C3203C79E0BE30_0000SID
        </WebEnv>
        <IdList>
        </IdList>
        <TranslationSet>
        </TranslationSet>
        <TranslationStack>
                <TermSet>
                        <Term>0000100.000000[ExactMass]</Term>
                        <Field>ExactMass</Field>
                        <Count>-1</Count>
                        <Explode>Y</Explode>
                </TermSet>
                <TermSet>
                        <Term>0000100.010000[ExactMass]</Term>
                        <Field>ExactMass</Field>
                        <Count>-1</Count>
                        <Explode>Y</Explode>
                </TermSet>
                <OP>RANGE</OP>
        </TranslationStack>
        <QueryTranslation>0000100.000000[ExactMass] : 0000100.010000[ExactMass]</QueryTranslation>
</eSearchResult>

When the query is complete, the result is an Entrez query key and webenv. The query key identifies the query result and the webenv identifies your session. Use this query key and webenv as the source of CIDs when composing another PUG message to download the gzip-compressed SDF file of the query hits:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_download>
        <PCT-Download>
          <PCT-Download_uids>
            <PCT-QueryUids>
              <PCT-QueryUids_entrez>
                <PCT-Entrez>
                  <PCT-Entrez_db>pccompound</PCT-Entrez_db>
                  <PCT-Entrez_query-key>26</PCT-Entrez_query-key>
                  <PCT-Entrez_webenv>
 0BPLhFE_YfmLOCUMsO7FDRuhXLxgPqzfs-aB_O2nILEnCpSEb-AIRQzeQ0LTaQNNlpK8XkxiDcX71it@46C3203C79E0BE30_0000SID
                  </PCT-Entrez_webenv>
                </PCT-Entrez>
              </PCT-QueryUids_entrez>
            </PCT-QueryUids>
          </PCT-Download_uids>
          <PCT-Download_format value="sdf"/>
          <PCT-Download_compression value="gzip"/>
        </PCT-Download>
      </PCT-InputData_download>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>
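
The hand-off from eSearch to PUG can be scripted. The following is a minimal Python sketch, assuming the eSearch return message has already been fetched as a string; the helper names history_from_esearch and build_download_message are hypothetical, and the element names are copied from the two messages above. Stripping the whitespace around the WebEnv value is an assumption based on how it appears in the listing.

```python
import xml.etree.ElementTree as ET


def history_from_esearch(esearch_xml):
    """Return (query_key, webenv) from an eSearchResult document."""
    root = ET.fromstring(esearch_xml)
    query_key = root.findtext("QueryKey")
    webenv = root.findtext("WebEnv")
    if query_key is None or webenv is None:
        raise ValueError("no Entrez history item found; was usehistory=y set?")
    # The listing above shows whitespace around the WebEnv value, so strip it.
    return query_key.strip(), webenv.strip()


def build_download_message(query_key, webenv, db="pccompound",
                           fmt="sdf", compression="gzip"):
    """Build the PCT-Download message (hypothetical helper, not part of PUG)."""
    root = ET.Element("PCT-Data")
    node = root
    for tag in ("PCT-Data_input", "PCT-InputData",
                "PCT-InputData_download", "PCT-Download"):
        node = ET.SubElement(node, tag)
    download = node

    # The Entrez history item serves as the source of CIDs.
    node = ET.SubElement(download, "PCT-Download_uids")
    for tag in ("PCT-QueryUids", "PCT-QueryUids_entrez", "PCT-Entrez"):
        node = ET.SubElement(node, tag)
    ET.SubElement(node, "PCT-Entrez_db").text = db
    ET.SubElement(node, "PCT-Entrez_query-key").text = query_key
    ET.SubElement(node, "PCT-Entrez_webenv").text = webenv

    # Output format and compression are carried as "value" attributes.
    ET.SubElement(download, "PCT-Download_format", value=fmt)
    ET.SubElement(download, "PCT-Download_compression", value=compression)
    return ET.tostring(root, encoding="unicode")
```

The string returned by build_download_message would then be sent to PUG by HTTP POST, as with any other PUG message.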

The final result, as in the first scenario, is a URL to the results:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_download-url>
          <PCT-Download-URL>
            <PCT-Download-URL_url>
            ftp://ftp-private.ncbi.nlm.nih.gov/pubchem/.fetch/816930703564580480.sdf.gz
            </PCT-Download-URL_url>
          </PCT-Download-URL>
        </PCT-OutputData_output_download-url>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

In this example, 81 hits are (currently) returned.
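
A client will usually want to pull the status and the download URL out of this final message. The following is a minimal Python sketch, assuming the response is available as a string and has the shape shown above; the helper name parse_pug_response is hypothetical. Responses for tasks that are still queued are not handled here.

```python
import xml.etree.ElementTree as ET


def parse_pug_response(response_xml):
    """Return (status, download_url) from a PUG output message.

    Either value is None when the corresponding element is absent.
    """
    root = ET.fromstring(response_xml)

    # The status is carried in the "value" attribute of PCT-Status.
    status_el = root.find(".//PCT-Status")
    status = status_el.get("value") if status_el is not None else None

    # The URL text is surrounded by whitespace in the listing, so strip it.
    url_text = root.findtext(".//PCT-Download-URL_url")
    url = url_text.strip() if url_text else None
    return status, url
```

Applied to the message above, this would return the status "success" together with the ftp URL of the gzip-compressed SDF file.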

Scenario 4. How do I get back the PubMed abstracts for PubChem Compound CID 2244 (aspirin)?
Similar to Scenario 3, you will query Entrez (not PUG) to get a list of PubMed abstracts linked to a PubChem Compound. To perform the Entrez query, using "eLink", use the URL:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pccompound&id=2244&db=pubmed

Given that a list of abstracts may be long, it is a good idea to create an Entrez history item, e.g., using the URL:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pccompound&id=2244&db=pubmed&cmd=neighbor_history

The XML return message from eLink, using the latter URL above (rather than the former), will look like:

<eLinkResult>
<LinkSet>
  	<DbFrom>pccompound</DbFrom>
  	<IdList>
  		<Id>2244</Id>
  	</IdList>
  	<LinkSetDbHistory>
  		<DbTo>pubmed</DbTo>
  		<LinkName>pccompound_pubmed</LinkName>
  		<QueryKey>5</QueryKey>
  	</LinkSetDbHistory>
  	<LinkSetDbHistory>
  		<DbTo>pubmed</DbTo>
  		<LinkName>pccompound_pubmed_mesh</LinkName>
  		<QueryKey>6</QueryKey>
  	</LinkSetDbHistory>
    <WebEnv>
0FFDpPFhSw2nzieR6fvaLMqXLnx9GKbcev9IJqI3EXp8pEzTEj38McL0EHmseTEpHXZkgacEGym2qtB@264F60B37ADDEBF0_0143SID
    </WebEnv>
  	</LinkSet>
  </eLinkResult>

In the message above, two Entrez history items were created: one corresponding to the depositor-provided links (pccompound_pubmed) and one corresponding to links derived through linkage of the MeSH ontology with depositor-provided synonyms (pccompound_pubmed_mesh). To retrieve the abstracts, one may provide PubMed ids to "eFetch", either one at a time or all at once.
To retrieve a single abstract, e.g., for PubMed id 12767473, one would formulate a URL like:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=12767473&retmode=xml&rettype=abstract

To retrieve abstracts for a list of PubMed ids contained in an Entrez history, e.g., for the eLink query above:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&webenv=0FFDpPFhSw2nzieR6fvaLMqXLnx9GKbcev9IJqI3EXp8pEzTEj38McL0EHmseTEpHXZkgacEGym2qtB@264F60B37ADDEBF0_0143SID&query_key=5&retmode=xml&rettype=abstract

The output of the above URLs is omitted for brevity.
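
The eLink-to-eFetch step can also be scripted. The following is a minimal Python sketch, assuming the eLink return message has already been fetched as a string; the helper names history_from_elink and efetch_url are hypothetical, and the URL arguments are the ones used in the eFetch URLs above. The sketch only composes the URL; it does not fetch it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def history_from_elink(elink_xml):
    """Return (webenv, {link_name: query_key}) from an eLinkResult document."""
    root = ET.fromstring(elink_xml)
    # The WebEnv value is surrounded by whitespace in the listing, so strip it.
    webenv = root.findtext(".//WebEnv", default="").strip()
    # One history item is created per link name.
    keys = {
        item.findtext("LinkName"): item.findtext("QueryKey")
        for item in root.iter("LinkSetDbHistory")
    }
    return webenv, keys


def efetch_url(webenv, query_key):
    """Compose an eFetch URL for the abstracts held in an Entrez history item."""
    params = {
        "db": "pubmed",
        "webenv": webenv,
        "query_key": query_key,
        "retmode": "xml",
        "rettype": "abstract",
    }
    # Keep "@" unescaped so the URL matches the form shown above.
    return EFETCH + "?" + urlencode(params, safe="@")
```

For the eLink message above, passing the webenv and the query key stored under "pccompound_pubmed" to efetch_url would reproduce the eFetch URL shown for query_key=5.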


Document Version History.

V1.3.0 - 2008June13 - Added new section on the PubChem BioAssay Query/Download service.  Minor cleanup of the previous documentation. Corrected PUG SOAP URL.

V1.2.0 - 2008May20 - Added new section for FAQs, providing usage scenarios.  Updated the PUG SOAP documentation.

V1.1.0 - 2007Jan11 - Added new section on the PubChem Standardization service.  Minor cleanup of and additions to the previous documentation.

V1.0.0 - 2007May10 - Initial release.


