
Diversity Set III Information

The NCI Diversity Set III was derived from the almost 140,000 compounds available for distribution from the DTP repository. Only compounds with at least 250 mg of material available were considered, so that a large number of copies could be made and enough material would remain to supply refill requests.

We also wanted to ensure that the computer representation of each chemical structure was reasonable. With the help of the PubChem team, we compared the connection table encoded in the old SANSS format with the connection table generated by processing the structure picture and writing it out as an MDL molfile. Only compounds whose two connection tables were identical were considered. Furthermore, we checked the molecular formula generated from the structure against the molecular formula independently entered in our database, and used only compounds where the formulae matched.

The more than 80,000 compounds meeting these criteria were then reduced to the final set using the programs Chem-X (Oxford Molecular Group) and Catalyst (Accelrys, Inc.). Both Chem-X and Catalyst use defined pharmacophoric centers (i.e., hydrogen bond acceptor, hydrogen bond donor, positive charge, aromatic, hydrophobic, acid, base) and defined distance intervals to create a finite set of three-dimensional, 3-point pharmacophores, resulting in over 1,000,000 possible pharmacophores for the Diversity Set III selection. The selection protocol considers each molecule, all its pharmacophores, and each of its conformational isomers. During the generation of the diversity set, the pharmacophores for any candidate compound are compared to the set of all pharmacophores found in structures already accepted into the set. If the candidate structure has more than 5 pharmacophores that are not already in that set, it is added to the set.

An additional objective with the NCI Diversity Set III was to create a diverse set of compounds that were amenable to forming structure-based hypotheses. Thus, molecules that were relatively rigid, with 5 or fewer rotatable bonds, that had a tendency to be planar, that had 1 or fewer chiral centers, and that had pharmacologically desirable features (i.e., did not contain obvious leaving groups, weakly bonded heteroatoms, organometallics, polycyclic aromatic hydrocarbons, etc.) were given priority in the final selection. This resulted in a set of 3046 compounds. This set was sent to the Molecular Libraries Small Molecule Repository, where the compounds were checked for purity by LC/Mass Spec. Only compounds with a purity of 90% or better by this method were accepted. This resulted in a final set of 1597 compounds.
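The pharmacophore comparison step described above can be illustrated with a small sketch. This is not the Chem-X or Catalyst implementation; it only assumes that each candidate arrives with a precomputed set of pharmacophore keys (pooled over its conformational isomers) and that candidates are processed in a fixed order. The function name `select_diverse`, the parameter `min_new_exclusive`, and the integer keys in the example are hypothetical.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def select_diverse(
    candidates: Iterable[Tuple[str, Set[Hashable]]],
    min_new_exclusive: int = 5,
) -> Tuple[List[str], Set[Hashable]]:
    """Greedy pass over candidates (assumed order: as supplied).

    A candidate is accepted only if it has MORE than `min_new_exclusive`
    pharmacophores that no previously accepted compound has shown.
    """
    seen: Set[Hashable] = set()   # pharmacophores of all accepted compounds
    accepted: List[str] = []
    for compound_id, pharmacophores in candidates:
        new_keys = set(pharmacophores) - seen
        if len(new_keys) > min_new_exclusive:
            accepted.append(compound_id)
            seen |= new_keys
    return accepted, seen


# Toy data: integers stand in for 3-point pharmacophore keys (hypothetical).
toy_candidates = [
    ("A", set(range(1, 9))),    # 8 new keys, accepted
    ("B", set(range(5, 13))),   # only 4 new keys (9..12), rejected
    ("C", set(range(7, 15))),   # 6 new keys (9..14), accepted
]
accepted_ids, covered = select_diverse(toy_candidates)
print(accepted_ids, len(covered))
```

Because acceptance depends on what has already been accepted, the result of a pass like this depends on the order in which candidates are considered; the page does not state what order was used.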

Diversity Set Data

Structural Data

These compounds are also in the Molecular Libraries Small Molecule Repository (MLSMR) and are shipped to the screening centers that are part of the Molecular Libraries Program. To see data in PubChem on testing done in the Molecular Libraries Program, you need to search by the PubChem substance ID (SID) for the MLSMR data deposition. The following file lists each NSC number with its equivalent PubChem SID. For further information or help, please contact Dan Zaharevitz.
download csv file with identifiers
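As a rough illustration of using the identifier file, the sketch below loads an NSC-to-SID lookup table with Python's standard `csv` module. The column names `NSC` and `SID` and the sample values are assumptions made for the example; check the header row of the actual file and adjust the arguments accordingly.

```python
import csv
import io
from typing import Dict, TextIO


def load_nsc_to_sid(
    handle: TextIO,
    nsc_column: str = "NSC",   # assumed header name, not confirmed by the page
    sid_column: str = "SID",   # assumed header name, not confirmed by the page
) -> Dict[str, str]:
    """Build a dict mapping NSC number to PubChem SID from a CSV with a header row."""
    mapping: Dict[str, str] = {}
    for row in csv.DictReader(handle):
        mapping[row[nsc_column].strip()] = row[sid_column].strip()
    return mapping


# Placeholder rows for demonstration only; these are not real identifiers.
sample = io.StringIO("NSC,SID\n1001,2002\n1003,2004\n")
nsc_to_sid = load_nsc_to_sid(sample)
print(nsc_to_sid["1001"])
```

With the real file, the same function can be called on `open(path, newline="")`, and the resulting SID is the value to search for in PubChem.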

Procedures for Requesting Samples

Updated 09/06/11