U.S. National Library of Medicine
MEDLINE®/PubMed® Baseline Repository

We are happy to announce that the problems that limited query results to just a PMID list have been fixed. You can now also create datasets of up to 100,000 citations in either ASCII MEDLINE or XML MEDLINE format. If your query produces more than 100,000 PMIDs, only the first 100,000 are used to create the dataset. We apologize for any inconvenience this limitation caused. [ Updated May 29, 2012 ]

Please Note

The records included in the MEDLINE/PubMed Baseline databases represent a static view of the data at the time each baseline database was created.

To access the MeSH files under the MeSH MBR Files link, you must enter into an online Memorandum of Understanding for use of the MeSH Vocabulary data.

To access the MBR Query Tool, you must be a recognized licensee of NLM Data. NLM leases MEDLINE/PubMed to U.S. and non-U.S. individuals or organizations and to its formally recognized International MEDLARS Centers. We update the authorization information daily at 1800 Eastern Standard Time.

We monitor and provide support for this site from 0800 to 1700 (Eastern Standard Time) Monday to Friday, excluding holidays. If you experience any problems while using the MBR web site, or would like additional information about leasing MEDLINE/PubMed data, send an e-mail to NLMdatadistrib@nlm.nih.gov.

Researchers have asked for access to MEDLINE citations as they stood at a given moment in time, without the MeSH vocabulary updates and other revisions that occur during the year. The MEDLINE/PubMed Baseline Repository was set up to provide this capability. We have stored the end-of-year baseline of the MEDLINE/PubMed database for each year starting in 2002, along with a selection of the associated MeSH Vocabulary data files.

Baseline   Created                       Number of Citations
2002       Approx. November 21, 2001     11,299,108
2003       November 1-4, 2002            11,847,524
2004       November 14-18, 2003          12,421,396
2005       November 20, 2004             14,792,864
2006       November 18 & 19, 2005        15,433,668
2007       November 17 & 18, 2006        16,120,074
2008       November 16 & 17, 2007        16,880,015
2009       November 21 & 22, 2008        17,764,826
2010       November 20, 2009             18,502,916
2011       November 19, 2010             19,569,568
2012       November 18, 2011             20,494,848
2013       November 15 & 16, 2012        21,508,439
List of Available Baselines

The baselines are normally generated toward the middle of November each year and contain all completed citations in MEDLINE as of that date. The baselines represent MEDLINE after the year-end processing has been completed, which means that the records have been revised with the upcoming year's new MeSH vocabulary terms. The 2002 through 2013 MEDLINE/PubMed Baselines are currently available. The baselines are named to reflect this year-end processing. For example, the 2002 MEDLINE/PubMed Baseline contains all completed citations from the mid-1960s until the date the baseline was created in late November 2001; because the year-end processing assigned the appropriate 2002 MeSH vocabulary terms, it is the baseline for the 2002 year.

The baselines also contain citations that are not MEDLINE citations. All of the baselines we have stored (2002 on) contain "Out-of-scope" citations, which were renamed to "PubMed-not-MEDLINE" starting with the 2004 MEDLINE/PubMed Baseline. The PubMed-not-MEDLINE status refers to citations that reside in PubMed from journals included in MEDLINE and have undergone quality review but are not assigned MeSH headings because the cited item is not in scope for MEDLINE either by topic or by date of publication. Citations in the Out-of-scope or PubMed-not-MEDLINE status make up a very small percentage of the total number of citations contained in the baselines (for example, 0.51% or 75,271 records in the 2005 baseline and 1.8% or 323,919 records in the 2009 baseline).
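
As a quick check, the percentages quoted above can be reproduced from the citation totals in the List of Available Baselines. The snippet below is only an illustrative sketch; the dictionary layout and the function name are ours and are not part of any MBR file or tool.

```python
# Counts copied from the baseline table and the paragraph above.
# The structure and names here are illustrative assumptions only.
baselines = {
    2005: {"total": 14_792_864, "not_medline": 75_271},
    2009: {"total": 17_764_826, "not_medline": 323_919},
}


def share_percent(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`."""
    return 100.0 * part / total


# Percentage of Out-of-scope / PubMed-not-MEDLINE records per baseline year.
shares = {
    year: share_percent(counts["not_medline"], counts["total"])
    for year, counts in baselines.items()
}

for year, pct in shares.items():
    print(f"{year}: {pct:.2f}% of baseline citations")
```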

Starting with the 2005 MEDLINE/PubMed Baseline, OLDMEDLINE citations are also included in the baselines. The OLDMEDLINE citations make up approximately 11% of the total number of baseline citations. The OLDMEDLINE citations are from international biomedical journals covering the fields of medicine, preclinical sciences, and allied health sciences. The citations were originally printed in hardcopy indexes published prior to 1966. For additional information, please refer to the following URL: http://www.nlm.nih.gov/databases/databases_oldmedline.html.

In the 2005 baseline, the subject indexing from the OLDMEDLINE citations was stored solely in the "Other Term" (or "OT") tagged fields and not in the MeSH Terms (or "MH") tagged fields. This means that searching the 2005 baseline from our MBR Query Tool via the MH field does not retrieve any OLDMEDLINE citations. The only way to include OLDMEDLINE records from the 2005 baseline is to run a timeframe query without specifying any field-specific search criteria. Beginning with the 2006 baseline, Other Terms are being mapped to current MeSH Terms, so searching via the MH field may retrieve some OLDMEDLINE records, but not necessarily the complete set.

Starting with the 2007 MEDLINE/PubMed Baseline, on records where all the OLDMEDLINE terms are converted to MeSH Headings, the citation status changes to MEDLINE. You need to rely on the <CitationSubset> element in the XML files and the "SB" field in the MEDLINE ASCII files to determine if a citation is in the OLDMEDLINE subset. For example,

PMID: 14771459
In the XML file: <CitationSubset>OM</CitationSubset>
In the MEDLINE ASCII file: SB - OM
OLDMEDLINE Determination Example
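
The check described above can be sketched in Python. This is a minimal illustration, not an NLM tool: the function names, the `<MedlineCitation>` wrapper used in the sample, and the tolerant spacing in the "SB" pattern are assumptions; only the `<CitationSubset>` element, the "SB" field, and the "OM" value come from the example above.

```python
import re
import xml.etree.ElementTree as ET

OLDMEDLINE_CODE = "OM"  # subset value shown in the example above

# Matches an "SB" line in MEDLINE ASCII display format. Spacing around
# the dash is matched loosely (assumption) so "SB - OM" and "SB  - OM"
# are both accepted.
SB_LINE = re.compile(r"^SB\s*-\s*(\S+)\s*$")


def xml_is_oldmedline(citation_xml: str) -> bool:
    """True if any <CitationSubset> in the citation has the value OM."""
    root = ET.fromstring(citation_xml)
    return any(
        (element.text or "").strip() == OLDMEDLINE_CODE
        for element in root.iter("CitationSubset")
    )


def ascii_is_oldmedline(record_text: str) -> bool:
    """True if any SB line of the ASCII record has the value OM."""
    for line in record_text.splitlines():
        match = SB_LINE.match(line)
        if match and match.group(1) == OLDMEDLINE_CODE:
            return True
    return False


# Hypothetical, heavily trimmed records for PMID 14771459.
sample_xml = """<MedlineCitation>
  <PMID>14771459</PMID>
  <CitationSubset>OM</CitationSubset>
</MedlineCitation>"""
sample_ascii = "PMID- 14771459\nSB  - OM\n"

print(xml_is_oldmedline(sample_xml))
print(ascii_is_oldmedline(sample_ascii))
```

Because a record can carry more than one subset value, both functions look at every occurrence of the element or field instead of stopping at the first one.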

We provide the following resources for each of the baselines for research purposes. Please note that background information on some of these resources is available from our MBR Reference Material page.

Resource: MBR Query Tool Database
Description: Baseline databases 2002 forward available for searching. Includes tables with MH, SH, MH/SH combination, Chemicals, and PMID data; results can also be limited or filtered by Date Created, Date Completed, Date Last Revised, Publication Year, and Status.
Restrictions: License Required
Where to Find: MBR_Query_Tool

Resource: XML Formatted Citations
Description: XML version of baseline citations. This is the format used to export the MEDLINE/PubMed Baseline citations.
Restrictions: License Required
Where to Find: MBR_Query_Tool

Resource: MEDLINE ASCII Display Formatted Citations
Description: Each XML citation translated to the MEDLINE ASCII display format used in PubMed.
Restrictions: License Required
Where to Find: MBR_Query_Tool

Resource: DTD Files
Description: We save a copy of the relevant DTD (Document Type Definition) files each year for working with the Baseline XML files.
Restrictions: No Restrictions
Where to Find: MBR_Files

Resource: Frequency Count Files
Description: Basic frequency counts for the entire MEDLINE/PubMed Baseline, sorted into alphabetical and numerical order, for the following MEDLINE fields. For all fields but the NM field, we also provide a sort and count of their occurrences as starred (Index Medicus) items.
     a. MH (MeSH Headings)
     b. SH (MeSH Subheadings)
     c. MH/SH combinations
     d. NM (Chemicals)
Restrictions: No Restrictions
Where to Find: MBR_Files

Resource: Raw Data Files
Description: Files containing the raw data similar to what was used to create our MBR Query Tool Database for this Baseline year. A README file describes the various files available and their layouts.
Restrictions: No Restrictions
Where to Find: MBR_Files

Resource: Histogram/Summary Files
Description: File showing the number of MH terms assigned to each of the various MeSH Tree top-level and top-level + 1 categories during the latest year, to see how the assignment of terms varies from year to year.
File showing the number of MH terms assigned to each of the UMLS Semantic Type Groupings categories during the latest year, to see how the assignment of terms varies from year to year from a different perspective.
Restrictions: No Restrictions
Where to Find: MBR_Files

Resource: Related MeSH Files
Description: We save a copy of the MeSH Vocabulary data files for each year and a copy of their associated DTD (Document Type Definition) files for working with the Baseline XML files.
Restrictions: Memorandum of Understanding required
Where to Find: MBR_Files

Resource: UMLS Semantic Groups File
Description: We have saved a copy of the Semantic Groups file. The Semantic Groups are a coarse-grained set of semantic type groupings designed to reduce the complexity in the UMLS Metathesaurus. The 15 semantic groups provide a partition of the UMLS Metathesaurus for 99.5% of the concepts.
Restrictions: No Restrictions
Where to Find: MBR_Files

Resource: Unique Words from MEDLINE Baseline
Description: We use a very simplified idea of a word: we throw away anything that is all numbers, throw away anything with non-ASCII characters, and break at anything that is not alphanumeric. The "words" files contain single words and bigram words. The bigram words are built with a sliding window over the last "valid" word and the current word, so you get something like "last current", where we simply added a space. We also ignore a short (313-word) list of stop words, so they are not included in the various lists. Each of the "words" files also contains a frequency count for each item. Please note that we only look at the Title and Abstract fields to generate our list of words; we have ignored the MeSH Heading fields.
Restrictions: No Restrictions
Where to Find: MBR_Files
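
The word and bigram rules described for the Unique Words files can be sketched as follows. This is a rough illustration under stated assumptions, not the code used to build the files: the stop-word set is a small placeholder (the real 313-word list is not reproduced here), lowercasing is our choice, and we assume that a whitespace-delimited chunk containing any non-ASCII character is discarded whole before breaking at non-alphanumeric characters.

```python
import re
from collections import Counter

# Placeholder stop words; NOT the actual 313-word list used for the files.
STOP_WORDS = {"the", "of", "and", "in", "a"}


def tokens(text):
    """Yield candidate words from Title/Abstract text."""
    for chunk in text.split():
        # Assumption: drop any chunk that contains non-ASCII characters.
        if not chunk.isascii():
            continue
        # Break at anything that is not alphanumeric.
        for piece in re.split(r"[^A-Za-z0-9]+", chunk):
            if piece:
                yield piece.lower()  # lowercasing is an assumption


def count_words(text, stop_words=STOP_WORDS):
    """Return (single-word counts, bigram counts) for one piece of text."""
    singles, bigrams = Counter(), Counter()
    last_valid = None
    for token in tokens(text):
        # Skip all-number tokens and stop words; they are not "valid" words.
        if token.isdigit() or token in stop_words:
            continue
        singles[token] += 1
        # Bigram = last valid word + space + current word.
        if last_valid is not None:
            bigrams[f"{last_valid} {token}"] += 1
        last_valid = token
    return singles, bigrams


singles, bigrams = count_words("Treatment of asthma in 2012: asthma treatment")
print(singles.most_common())
print(bigrams.most_common())
```

In this sketch a skipped stop word or number does not reset the sliding window, so "Treatment of asthma" yields the bigram "treatment asthma". That matches the "last valid word" wording above, but the exact behavior of the original files may differ.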

Last Modified: January 29, 2013