Abstract

Classifying documents, such as research grant applications, into categories of interest is fundamental to the analysis of research portfolios. In this poster, we present a classification system consisting of an ensemble of four to eight member classifiers and an aggregation classifier that collects and weighs the votes of the ensemble members. We demonstrate that the ensemble outperforms its individual members when tested on a large 2006 corpus of Abstracts and Specific Aims sections from awarded grant applications. In particular, the operating characteristics of the ensemble improve significantly on those of one of its members, the 2006 Research, Condition, and Disease Categorization (RCDC) category fingerprint. Similar results have been observed on more recent datasets from NCI. We discuss the potential applicability of such an ensemble as a decision-support tool in the context of a comprehensive portfolio reporting solution for NIH.
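
The voting scheme described above can be illustrated with a minimal sketch. The abstract does not say how the aggregation classifier is built or how its weights are chosen, so the code below assumes the simplest possible form: a fixed weighted average of member votes compared against a threshold. The names `keyword_member`, `weighted_vote`, and `classify`, the keyword-based members, and all weights are hypothetical and are not taken from the system presented in the poster.

```python
from typing import Callable, Sequence

# Hypothetical interface: a member classifier maps document text to a vote in [0, 1].
Member = Callable[[str], float]


def keyword_member(keyword: str) -> Member:
    """Toy member (an assumption for illustration): votes 1.0 if the keyword occurs."""
    return lambda text: 1.0 if keyword in text.lower() else 0.0


def weighted_vote(text: str, members: Sequence[Member],
                  weights: Sequence[float]) -> float:
    """Return the weight-normalized average of the member votes."""
    if len(members) != len(weights):
        raise ValueError("need exactly one weight per member")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * m(text) for m, w in zip(members, weights)) / total


def classify(text: str, members: Sequence[Member],
             weights: Sequence[float], threshold: float = 0.5) -> bool:
    """Assign the category when the aggregated vote reaches the threshold."""
    return weighted_vote(text, members, weights) >= threshold


# Four toy members, matching the lower end of the "four to eight" range.
members = [keyword_member(k) for k in ("tumor", "cancer", "oncology", "chemotherapy")]
weights = [2.0, 1.0, 1.0, 1.0]  # assumed values; the first member is trusted most

in_category = classify("A study of tumor growth under chemotherapy", members, weights)
out_of_category = classify("A study of soil erosion", members, weights)
```

In this sketch the first document collects votes from two members with a combined weight of 3 out of 5 and is assigned to the category, while the second collects no votes. A learned aggregation classifier would replace the fixed weights and threshold with parameters fitted to labeled data.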
