Diversity Set III Information
The NCI Diversity Set III was derived from the almost 140,000
compounds available for distribution from the DTP repository. Only
compounds having at least 250 mg of material available were
considered. This was done to allow a large number of copies to be
made and to ensure adequate amounts to supply refill requests. We
also wanted to ensure that the computer representation of the
chemical structure was reasonable. With the help of the PubChem
team, we compared the connection table encoded by the old SANSS
format to the connection table generated by processing the structure
picture and output as an MDL molfile. Only compounds that showed
identical connection tables were considered. Furthermore, we compared
the molecular formula generated from the structure with the molecular
formula independently entered in our database and used only
compounds where the formulae matched. The more than 80,000 compounds
meeting these criteria were then reduced to the final set using the
programs Chem-X (Oxford Molecular Group) and Catalyst (Accelrys,
Inc.). Both Chem-X and Catalyst use defined pharmacophoric centers
(i.e., hydrogen bond acceptor, hydrogen bond donor, positive charge,
aromatic, hydrophobic, acid, base) and defined distance intervals to
create a finite set of three-dimensional, 3-point pharmacophores,
resulting in over 1,000,000 possible pharmacophores for the
Diversity Set III selection. The selection protocol considers each
molecule, all its pharmacophores and each of its conformational
isomers. During the generation of the diversity set, the
pharmacophores for any candidate compound are compared to the set of
all pharmacophores found in structures already accepted into the
set. If the current structure has more than 5 new pharmacophores, it
is added to the set. An additional objective with the NCI Diversity
Set III was to create a diverse set of compounds that were amenable
to forming structure-based hypotheses. Thus, molecules that were
relatively rigid, with 5 or fewer rotatable bonds, tended to be
planar, had 1 or fewer chiral centers, and had pharmacologically
desirable features (i.e., did not contain obvious leaving groups,
weakly bonded heteroatoms, organometallics, polycyclic aromatic
hydrocarbons, etc.) were given priority in the final selection. This
resulted in a set of 3046 compounds. This set was sent to the
Molecular Libraries Small Molecule Repository, where the compounds
were checked for purity via LC/Mass Spec. Only compounds with a
purity of 90% or better by this method were accepted. This resulted
in a final set of 1597 compounds.
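The acceptance rule described above (add a compound only if it contributes more than 5 pharmacophores not already present in the set) can be sketched as a simple greedy loop. This is a minimal illustration, not the Chem-X or Catalyst implementation: the function names, the representation of a pharmacophore as a hashable key, and the pooling of conformers by set union are all assumptions made for the example.

```python
from typing import FrozenSet, Hashable, Iterable, List, Set, Tuple

# Assumption: a pharmacophore is any hashable key, e.g. a tuple of three
# center types plus three binned distances. The actual keys used by
# Chem-X and Catalyst are not described in the text.
Pharmacophore = Hashable


def compound_pharmacophores(
    conformers: Iterable[Set[Pharmacophore]],
) -> FrozenSet[Pharmacophore]:
    """Pool the pharmacophores found in every conformer of one compound."""
    merged: Set[Pharmacophore] = set()
    for conformer in conformers:
        merged |= conformer
    return frozenset(merged)


def select_diverse(
    candidates: Iterable[Tuple[str, FrozenSet[Pharmacophore]]],
    threshold: int = 5,
) -> List[str]:
    """Greedy pass: accept a compound if it adds more than `threshold`
    pharmacophores not yet covered by previously accepted compounds."""
    accepted: List[str] = []
    covered: Set[Pharmacophore] = set()
    for compound_id, pharmacophores in candidates:
        new = pharmacophores - covered
        if len(new) > threshold:
            accepted.append(compound_id)
            covered |= pharmacophores
    return accepted
```

In a loop of this kind the result depends on the order in which candidates are presented, so a priority ordering (for example, rigid compounds first) would have to be applied before the pass. How the original programs ordered candidates is not stated here.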
Diversity Set Data
Structural Data
These compounds are also in the Molecular Libraries Small Molecule
Repository (MLSMR) and are shipped to the screening centers that
are part of the Molecular Libraries Program.
To see data in PubChem on testing done in the Molecular Libraries
Program, search by the PubChem substance ID (SID) for the MLSMR data
deposition. The following file lists each NSC number with its
equivalent PubChem SID. For further information or help, please
contact Dan Zaharevitz.
Download: CSV file with identifiers
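Once the identifier file is downloaded, looking up the PubChem SID for a given NSC number takes only a few lines. The sketch below assumes the CSV has a header row with columns named "NSC" and "SID"; the actual column names are not given here, so adjust the two parameters to match the file. The function name is hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def load_nsc_to_sid(
    handle: TextIO,
    nsc_column: str = "NSC",  # assumed header name
    sid_column: str = "SID",  # assumed header name
) -> Dict[str, str]:
    """Build an NSC -> PubChem SID lookup from an open CSV file."""
    mapping: Dict[str, str] = {}
    for row in csv.DictReader(handle):
        nsc = (row.get(nsc_column) or "").strip()
        sid = (row.get(sid_column) or "").strip()
        if nsc and sid:  # skip rows missing either identifier
            mapping[nsc] = sid
    return mapping


# Placeholder rows for illustration only; these are not real identifiers.
sample = io.StringIO("NSC,SID\n0001,9000001\n0002,9000002\n")
lookup = load_nsc_to_sid(sample)
```

With a real file, pass `open(path, newline="")` in place of the in-memory sample, then use the returned SID to search PubChem for the MLSMR deposition.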
Procedures for Requesting Samples