LHNCBC: Document Abstract
Year: 2008
LHNCBC-2008-011
Naive Bayes Classifier for Extracting Bibliographic Information From Biomedical Online Articles
Kim J, Le DX, Thoma GR
Proc. of the 2008 International Conference on Data Mining. Las Vegas, Nevada, USA. July 2008;II:373-8
A Naive Bayes classifier has been developed to extract grant numbers, a key piece of bibliographic information, from online, HTML-formatted biomedical articles for the National Library of Medicine's MEDLINE database. Grant numbers identify research support from funding organizations and are part of MEDLINE citations. To train and test the classifier, 47,362 sentences were collected from articles cited in the MEDLINE database, and 4,721 words were identified as suitable features for classification. Experimental results are evaluated using three measures (Precision, Recall, and F-Measure), all of which exceed 98.05%.
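As an illustration of the general technique only, the following is a minimal sketch of a word-feature Naive Bayes sentence classifier together with the three evaluation measures named in the abstract. It is not the authors' implementation. The multinomial model, the Laplace smoothing constant, the whitespace tokenizer, the class and function names, and the toy sentences (including the placeholder grant identifier) are all assumptions made for this example; the abstract does not specify them.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase whitespace tokens stand in for the word features.
    return sentence.lower().split()


class NaiveBayesSketch:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant, an assumed value

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, sentence):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(sentence):
                if w in self.vocab:  # words unseen in training are skipped
                    num = self.word_counts[c][w] + self.alpha
                    score += math.log(num / (self.totals[c] + self.alpha * v))
            if score > best_score:
                best, best_score = c, score
        return best


def precision_recall_f(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy training data, invented for this sketch (1 = sentence mentions a grant).
TRAIN = [
    ("this work was supported by nih grant ab-000000", 1),
    ("supported by grant number from the national science foundation", 1),
    ("funded by nih grant", 1),
    ("the patients were treated with the drug", 0),
    ("results were analyzed with a t test", 0),
    ("cells were cultured in medium", 0),
]

model = NaiveBayesSketch().fit([s for s, _ in TRAIN], [y for _, y in TRAIN])

if __name__ == "__main__":
    print(model.predict("this study was supported by a grant"))
    print(precision_recall_f([1, 1, 0, 0], [1, 0, 0, 0]))
```

In this sketch, F-Measure is the harmonic mean of Precision and Recall, which is the usual definition; the abstract does not state which weighting the paper uses.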