LHNCBC: Document Abstract

Year: 2003
LHNCBC-2003-057
Categorization of Sentence Types in Medical Abstracts
McKnight L, Srinivasan P
AMIA Annu Symp Proc. 2003 Nov;:440-444
This study evaluated the use of machine learning techniques to classify sentence type. A total of 7,253 structured abstracts and 204 unstructured abstracts of Randomized Controlled Trials (RCTs) from MEDLINE were parsed into sentences, and each sentence was labeled as one of four types (Introduction, Method, Result, or Conclusion). Support Vector Machine (SVM) and Linear Classifier models were generated and evaluated on cross-validated data. Treating sentences as a simple "bag of words", the SVM model had an average ROC area of 0.92. Adding a feature for relative sentence location improved performance markedly for some models and increased the overall average ROC area to 0.95. Linear classifier performance was significantly worse than SVM performance on all datasets. Using the SVM model trained on structured abstracts to predict sentence types in unstructured abstracts yielded performance similar to that of models trained on unstructured abstracts for 3 of the 4 types. We conclude that classification of sentence type seems feasible within the domain of RCTs. Identification of sentence types may help provide context to end users or to other text summarization techniques.
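The pipeline the abstract describes (bag-of-words sentence features, an optional relative-location feature, one SVM per sentence type, and ROC area as the metric) can be sketched as follows. This is a minimal, hypothetical Python sketch, not the authors' implementation. The helper names, the naive sentence splitter, the tokenizer, the location encoding `index / (total - 1)`, and the hyperparameters `lam` and `epochs` are all assumptions. The abstract does not state the kernel or software used, so the SVM here is a swapped-in linear SVM trained with the Pegasos sub-gradient method in a one-vs-rest setup.

```python
import random
import re
from collections import Counter

# The four sentence types named in the abstract.
LABELS = ["Introduction", "Method", "Result", "Conclusion"]


def split_sentences(abstract):
    """Naive splitter (assumption): break after '.', '?' or '!' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", abstract) if s.strip()]


def featurize(sentence, index, total, use_location=True):
    """Bag-of-words token counts, a constant bias feature, and optional relative location."""
    feats = Counter(re.findall(r"[a-z0-9]+", sentence.lower()))
    feats["__bias__"] = 1.0  # lets the linear SVM learn an offset; cannot collide with tokens
    if use_location:
        # Hypothetical encoding: 0.0 for the first sentence, 1.0 for the last.
        feats["__rel_loc__"] = index / max(total - 1, 1)
    return feats


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pegasos(examples, targets, lam=0.01, epochs=20, seed=0):
    """Binary linear SVM (hinge loss + L2) via Pegasos sub-gradient steps; targets are +1/-1."""
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(examples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = examples[i], targets[i]
            violated = y * dot(w, x) < 1  # margin checked with the current weights
            for k in w:  # shrink step from the L2 regularizer
                w[k] *= 1.0 - eta * lam
            if violated:  # hinge-loss sub-gradient step
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def train_one_vs_rest(examples, labels, **kwargs):
    """One binary SVM per sentence type (that type vs. all others)."""
    return {c: train_pegasos(examples, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in LABELS}


def predict(models, x):
    """Pick the sentence type whose SVM gives the highest score."""
    return max(models, key=lambda c: dot(models[c], x))


def roc_area(scores, positives):
    """ROC area: chance a random positive outscores a random negative (ties count 1/2).
    Requires at least one positive and one negative example."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy abstract with one sentence of each type, in LABELS order.
    text = ("Hypertension is common. We randomized 200 patients. "
            "Blood pressure fell by 10 mmHg. The drug appears effective.")
    sentences = split_sentences(text)
    feats = [featurize(s, i, len(sentences)) for i, s in enumerate(sentences)]
    models = train_one_vs_rest(feats, LABELS)
    print([predict(models, x) for x in feats])
```

To compare the "bag of words" setting with the location-augmented one, build features with `use_location=False` and `use_location=True` respectively. Per-type ROC area would then come from scoring held-out sentences with `dot(models[label], x)` and passing those scores to `roc_area`, with `positives` marking sentences of that type. The study's cross-validation folds are not reproduced here.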

Lister Hill National Center for Biomedical Communications
U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894
National Institutes of Health, Department of Health & Human Services
Site last updated: 17 September 2012