MSH WSD Test Collection
- 2009AB UMLS
- 203 Ambiguous Words
- 37,888 Ambiguity Cases
- 37,090 MEDLINE Citations
- 2010 MEDLINE Baseline
- Requires UMLS Terminology Services (UTS) account
This test collection was constructed with a method that automatically extracts instances of ambiguous terms from MEDLINE without manual curation. The method uses the UMLS Metathesaurus and the manual MeSH® indexing of MEDLINE as resources. The resulting data set contains both biomedical terms and abbreviations.
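The automatic labeling idea can be illustrated with a deliberately simplified sketch. It is not the procedure actually used to build MSH WSD, and it rests on several assumptions. First, each candidate sense (a UMLS CUI) of an ambiguous term is tied to a single MeSH heading. Second, a citation is labeled with a sense only when the term occurs in its text and exactly one sense's heading appears among its MeSH index terms. The `Citation` record, the `label_citations` function and the placeholder identifiers `CUI_1`/`CUI_2` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Citation:
    """Hypothetical minimal MEDLINE record: identifier, text and MeSH headings."""
    pmid: str
    text: str
    mesh_headings: set[str]


def label_citations(term, sense_to_heading, citations):
    """Return (pmid, sense) pairs for citations that can be labeled unambiguously.

    Simplifying assumption: each sense maps to one MeSH heading, and a citation
    is kept only when exactly one sense's heading was assigned by the indexers.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    labeled = []
    for cit in citations:
        # Skip citations that never mention the ambiguous term as a whole word.
        if not pattern.search(cit.text):
            continue
        # Senses whose associated MeSH heading is among the citation's index terms.
        matches = [cui for cui, heading in sense_to_heading.items()
                   if heading in cit.mesh_headings]
        # Citations supporting several senses (or none) are discarded.
        if len(matches) == 1:
            labeled.append((cit.pmid, matches[0]))
    return labeled


# Usage with made-up citations; CUI_1 and CUI_2 are placeholders, not real CUIs.
senses = {"CUI_1": "Common Cold", "CUI_2": "Cold Temperature"}
docs = [
    Citation("1", "Rhinovirus causes the common cold.", {"Common Cold", "Rhinovirus"}),
    Citation("2", "Exposure to cold reduced survival.", {"Cold Temperature"}),
    Citation("3", "Cold and heat stress.", {"Common Cold", "Cold Temperature"}),
    Citation("4", "Unrelated abstract.", {"Common Cold"}),
]
print(label_citations("cold", senses, docs))  # [('1', 'CUI_1'), ('2', 'CUI_2')]
```

Requiring exactly one matching heading trades recall for precision. Citations whose indexing supports several senses are simply dropped rather than guessed.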
Original WSD Test Collection
- 1999 UMLS
- 50 Ambiguous Words
- 5,000 Ambiguity Cases
- 5,000 MEDLINE Citations
- 1998 MEDLINE Baseline
- Requires UMLS Terminology Services (UTS) account
This test collection was constructed from citations in the 1998 MEDLINE Baseline, with the ambiguities resolved by hand. Evaluators were asked to examine instances of an ambiguous word and determine the intended sense by selecting the Metathesaurus concept (if any) that best represents that meaning.
In June 2010, Bridget T. McInnes and Mark Stevenson developed a data set linking the WSD ambiguity choices to 2007AB UMLS Concept Unique Identifiers (CUIs). This data set can be accessed via our Collaborations Page.
Dr. Ted Pedersen developed nlm2sval2, a small utility package that converts the WSD Test Collection into the Senseval-2 lexical sample format. It can also be accessed via our Collaborations Page.
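Some readers may be unfamiliar with the target format. The sketch below writes already-labeled instances as Senseval-2 lexical-sample-style XML: `corpus` contains `lexelt`, which contains `instance` elements, each with an `answer` and a `context` in which the target word is wrapped in `head`. It is not nlm2sval2 and does not read the NLM files. The input tuple layout, the `to_senseval2` function, the instance IDs and the `sense_1` label are hypothetical assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def to_senseval2(word, instances):
    """Build a Senseval-2 lexical-sample-style XML string for one ambiguous word.

    `instances` is a hypothetical list of (instance_id, sense_id, left, target, right)
    tuples, where `target` is the occurrence of the ambiguous word in context.
    """
    corpus = ET.Element("corpus", lang="english")
    lexelt = ET.SubElement(corpus, "lexelt", item=word)
    for inst_id, sense_id, left, target, right in instances:
        inst = ET.SubElement(lexelt, "instance", id=inst_id)
        # Gold-standard sense assigned to this instance.
        ET.SubElement(inst, "answer", instance=inst_id, senseid=sense_id)
        context = ET.SubElement(inst, "context")
        context.text = left
        # The ambiguous word goes inside <head>; the text after it becomes the tail.
        head = ET.SubElement(context, "head")
        head.text = target
        head.tail = right
    return ET.tostring(corpus, encoding="unicode")


xml_text = to_senseval2(
    "cold", [("cold.1", "sense_1", "Rhinovirus causes the common ", "cold", ".")]
)
print(xml_text)
```

Using `ElementTree` rather than string concatenation keeps special characters in abstracts (such as `<` or `&`) correctly escaped.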
In May 2004, we created a version of this test collection that uses the PubMed Identifier (PMID) instead of the earlier MEDLINE Unique Identifier (MEDLINE UI).
Direct link to PMID Original WSD Test Collection (Restricted).