An Open Letter To The Mouse Genetics Community
From
Francis S. Collins, M.D., Ph.D.
National Human Genome Research Institute, NIH
e-mail fc23a@nih.gov
On Behalf of The Public Mouse Sequencing Project
The public mouse sequencing effort is robust and proceeding rapidly, but from
queries we have been receiving recently there seems to be considerable confusion
about the state of the effort and the location of the data. We clearly have
not been effective enough in our attempts to communicate these plans and achievements
to the wider mouse community. I would like, therefore, to let you know the current
status of our effort.
The goals for our program were designed to develop the
critical resources that were defined by the community:
- The ultimate goals are a robust physical map and a high quality, finished
sequence of the mouse (C57BL/6J strain).
- In addition, useful intermediate products will be produced as
we proceed, but in a way that will not slow the completion of
the map and sequence at all.
The Washington University, Whitehead and Sanger Centers, as well as others, are aggressively working toward these goals. Let me briefly summarize the current results and upcoming products:
- BAC Map. The BAC physical map of the mouse is in very good shape
and is nearly complete. Over 300,000 BAC clones coming approximately equally
from the RPCI23 (female) and RPCI24 (male) libraries have been fingerprinted
and assembled into fewer than 4300 contigs. The fingerprint data and an initial
assembly into 6500 contigs are available at http://www.bcgsc.bc.ca/projects/mouse_mapping/.
The more recent assembly will be posted by early August at the Wash U web
site (http://genome.wustl.edu/gsc/mouse). A "User's Guide" about how to use
these resources is currently being written and will be posted on the Wash
U site as soon as it is completed. The Wash U group is also attempting
to link the BAC map to both human syntenic regions and the mouse RH map,
so that it will be possible to look up clones for particular regions.
- Initial WGS Reads. An initial set of whole genome shotgun data, comprising
over 17 million reads or about 2.5-3X coverage, has been generated. The
data are available from the Trace Archive at the NCBI (http://www.ncbi.nlm.nih.gov/Traces/trace.cgi)
and the Ensembl Trace Server (http://trace.ensembl.org). The mouse reads have
been compared to the human genome and homologous reads have been laid out
along the human draft sequence. They can be viewed at http://genome.cse.ucsc.edu
and http://www.ensembl.org.
The mouse reads are immediately useful for both human and mouse genetics.
For example, for mouse positional cloning, one can take a mouse gene in a
region, look up syntenic regions of human on the UCSC or Ensembl web sites,
and find all of the mouse reads for the conserved segments. You can also retrieve
the mate pairs from the trace archive to improve coverage. The mate pairs
are a nice distance apart for PCR recovery of the segment. Starting with a
mouse cDNA, one can search the trace repository for matches and again retrieve
the mate pairs. The hits also give a good idea of related family members in
the mouse genome. Again, use of the read pair information can let you recover
a suitable region for generating the segments needed to construct the vectors
for mouse knockouts. There are already several examples of mouse genes that
have been positionally cloned using the available public information.
- Assembly of Initial WGS Reads. The 2.5-3X coverage in WGS reads has
been subjected to an initial assembly using assembly software developed in
the public domain. This assembly is currently being assessed and will be posted
on the Whitehead web site by the end of August. Based on preliminary analysis,
the contigs are still relatively small (several kb) and not yet anchored.
However, they will be aligned against the human genome and will be useful for
experimental work as described above. We will also be matching the contigs to the BAC
end sequences, and thereby expect to anchor the BAC contigs to the mouse RH
and genetic maps.
- Additional WGS Reads. Based on community feedback on the utility
of the initial WGS reads, we will expand the data set to 5-6X coverage, using
double-ended plasmid reads. We expect these data to be generated by December
2001. The larger data set will be assembled to give larger contigs and much
larger scaffolds. This assembly should suffice for most positional cloning
and knockout experiments.
- BAC-based Sequencing of the Genome. We will then take a tiling path
of BAC clones across the genome (currently being developed from the BAC physical
map) and subject the entire set to standard shotgun sequencing; the results
will be combined with the WGS reads. The merged data set will provide the
ordered, anchored full shotgun coverage of the mouse sequence that will be
the substrate for the completion of the finished, high quality mouse sequence
that is the ultimate goal. We anticipate that deep shotgun coverage (leaving
relatively few gaps) will be achieved in about 2 years, with the exact time
frame depending on the precise amount of funding available. (The sequencing
capacity is in place, inasmuch as the deep shotgun phase for the human is
essentially done.) We will then turn to the finishing of the sequence and
the remaining gaps.
- Sequencing of Individual BACs. NHGRI continues to provide the opportunity
for investigators to obtain priority sequencing of individual BACs (or small
contigs) of particularly high biological interest (http://www.nih.gov/science/models/bacsequencing/).
In fact, this program is under-subscribed and has adequate capacity to accept
many additional requests.
I hope that this information is useful to you, and I would appreciate advice from any members of this listserv on how to make it more widely known in the mammalian genetics community. Please feel free to distribute this summary yourselves as you see fit.
To ensure better communication and good responsiveness to the community, we will shortly establish a web site containing information and pointers for the mouse community about the available resources, which we will update regularly.
Finally, let me re-emphasize that the HGP is deeply committed
to generating these data as rapidly as possible and to serving the mouse community
as well as possible.