Grants and Funding: Extramural Programs (EP)

NLM Informatics Training Conference 2010

University of Colorado Anschutz Medical Campus

June 15-16, 2010

Agenda and Abstracts of Presentations and Posters


Tuesday, June 15, 2010

7:45 - 8:00 Welcome (Dr. Richard D. Krugman, Vice Chancellor for Health Affairs, Dr. Lawrence Hunter, Director - Computational Bioscience Program)
   
8:00 - 8:20 Opening Remarks (Dr. Donald A.B. Lindberg NLM Director)
   
8:20 - 8:30 Introduction of Training Directors and Trainees; Program Update (Dr. Valerie Florance)
   
8:30 - 9:45 Plenary Paper Session # 1 - Health Care Informatics, Moderator: Stephen Downs, Regenstrief, IUPUI
 
   
9:45 - 10:30 Poster Break - Day 1 Group: Posters
  Voted Best Poster, Day 1: A Molecular Dynamics Study of the AQP0-CaM Interaction, Daniel Clemens
   
10:30 - 11:30 Parallel Paper Session A
   
  Session A1 - Health Care, Moderator: Amar Das, Stanford University
 
   
  Session A2 - Bioinformatics, Moderator: Pierre Baldi, University of California - Irvine
 
   
11:30 - 12:30 Executive Session of Training Directors (Session Chair: Dr. Donald A.B. Lindberg)
   
12:45 - 2:00 Plenary Paper Session # 2 Clinical/Translational, Moderator: Peter Tarczy-Hornoch, University of Washington
 
   
2:00 - 3:00 Open Mic Session
 
  • Session X1 - Health Care & Public Health Informatics, Moderator: Cindy Gadd, Vanderbilt University
  • Session X2 - Bioinformatics & Translational Informatics, Moderator: Perry Miller, Yale University
   
3:00 - 3:30 Poster Break - Day 1 Group: Posters
   
3:30 - 4:30 Parallel Paper Session B
   
  Session B1 - Health Care Informatics, Moderator: George Hripcsak, Columbia University
 
   
  Session B2 - Bioinformatics, Moderator: Bill Caldwell, University of Missouri
 
   

Wednesday, June 16, 2010

8:15 - 9:30 Plenary Paper Session #3 Bioinformatics, Moderator: George Phillips, University of Wisconsin
 
   
9:30 - 10:15 Poster Break - Day 2 Group Posters
  Voted Best Poster, Day 2: Co-factor of LIM Transcriptional Regulation in Mammary Gland Development, Michael L Salmans
   
10:15 - 11:30 University of Colorado, Anschutz Medical Campus Showcase
   
11:30 - 12:30 On Women's Careers in Science (de Saint Just); Women in Science handout (Word file)
   
12:30 - 1:30 Parallel Paper Session C
   
  Network Session C1 Public Health Informatics, Moderator: Harold Lehman, Johns Hopkins University
 
   
  Network Session C2 Knowledge Tools, Moderator: Alexa McCray, Harvard University
 
   
  Network Session C3 Biological Models, Moderator: Alex Bui, UCLA
 
   
1:30 - 2:00 Poster Break - Day 2 Group Posters
   
2:00 - 3:15 Plenary Paper Session #4: Systems - Tools & Techniques, Moderator: Bill Hersh, OHSU
 
   
3:15 - 3:30 Closing Session and Poster Awards, Dr. Larry Hunter and Dr. Valerie Florance


PRESENTATION ABSTRACTS

Supporting Practice-Based Learning and Improvement with Population-Based Data

Authors:
Leigh A Baumgart, Ellen J Bass, and Jason A Lyman, University of Virginia, Charlottesville

Abstract:
The idea of self-assessment, often referred to as Practice-Based Learning and Improvement (PBLI), has gained increased attention since 1999, when the Accreditation Council for Graduate Medical Education (ACGME) specified PBLI as one of the core learning requirements for residency programs in the United States. Specifically, residents must demonstrate the ability to evaluate their care of patients. However, it is unclear how best to support the development of such self-assessment skills. Exploration of population-based data is one method to enable health care providers to identify deficiencies in overall practice behavior that can motivate quality improvement initiatives. At the University of Virginia, we are developing a decision support tool to integrate and present population-based patient data to providers to support development of PBLI skills and drive quality improvement in health care. The Systems and Practice Analysis for Resident Competencies (SPARC) tool presents population-based reports and enables physicians to investigate their practice behaviors by looking at both non-clinical characteristics (such as race and income level) and clinical characteristics (such as those related to preventative medicine and disease management). By enabling users to separate their direct impact on clinical outcomes from non-clinical factors outside their control, we may enhance the self-assessment process.



Learning-Style Tailored Information Prescriptions for Emergency Medicine Patients

Authors:
Taneya Y Koonce, Nunzia B Giuse, Vanderbilt University

Abstract:
Hypertension poses a significant public health burden in the United States despite being a modifiable risk factor for cardiovascular health outcomes. High blood pressure affects 1 of every 3 adults in the U.S., and those groups at highest risk are also disproportionately represented in national emergency department (ED) utilization estimates. This imbalance, combined with the fact that many patients do not understand information provided to them in the ED setting, provides an opportunity for further study of best hypertension information dissemination practices that may promote self-management behaviors. This study sought to determine whether learning-style tailored health education materials (information prescriptions) are effective in increasing hypertension knowledge in emergency room patients. In a randomized controlled trial, hypertensive emergency medicine patients were allocated to receive either standard of care discharge instructions or discharge instructions in combination with an information prescription individualized to each patient's specific learning style preferences. Two weeks after the visit, changes in hypertension knowledge were assessed via survey. The results of this study will contribute to establishing an infrastructure framework for developing customized information prescriptions that can be broadly adapted for use in varied care settings and with varied health conditions.



A Quantitative Analysis of Adverse Events Across Structured Product Labels

Authors:
Jon D Duke, Jeff Friedlin, Regenstrief Institute

Abstract:
Product labels are a primary source of drug safety information for physicians. Research suggests, however, that the effectiveness of labels in communicating adverse drug events (ADEs) may be diminished by issues of information overload. Despite this recognition, no study to date has explored patterns of adverse event labeling in a comprehensive and quantitative fashion. We conducted a natural language processing-based analysis of 5,602 Structured Product Labels (SPLs) to assess the volume and location of adverse events. In total, we extracted 534,125 events and mapped them to 3,667 terms from the Medical Dictionary for Regulatory Activities (MedDRA). The number of unique adverse events per label ranged from zero to 525, with a mean of 70 and a median of 49. More commonly prescribed drugs had higher volumes of ADEs, with an average of 107 events per label. The highest numbers of adverse events were seen in psychiatric, neurologic, and rheumatologic drugs. The most common events were nausea, vomiting, and headache, appearing on over 65% of labels. More serious but clinically rare ADEs such as angioedema and Stevens-Johnson syndrome were also prevalent, appearing on 29% and 24% of labels, respectively. Section analysis revealed black box warnings in 21% of labels.



Local Alignment Tool for Clinical Histories (LATCH): An Efficient Search Strategy for Finding Similar Patterns of Care

Author:
Wei-Nchih Lee, Stanford University

Abstract:
Data collected through electronic health records is increasingly used for clinical research purposes. Common research tasks like identifying treatment cohorts based on similar treatment histories, assessing adherence to protocol-based care, or determining clinical 'best practices' can be difficult given complex sets of treatment choices and the longitudinal nature of patient care. To address this challenge, we have developed a temporal sequence alignment strategy, called Local Alignment Tool for Clinical Histories (LATCH). LATCH uses an ontology-based scoring schema to measure semantic similarity between two treatment histories. The algorithm relies on a user-defined threshold heuristic to reduce the search space, and has a polynomial running time. We have tested and validated LATCH by searching for patients in the Stanford HIV Database whose care is similar to guideline-recommended HIV care. Our approach can be used to cluster patients based on similar patterns of care and to predict outcomes of care for similar treatment histories.
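
As an illustration of the general idea of locally aligning two treatment histories with a semantic scoring function, the following minimal sketch uses a generic Smith-Waterman-style recurrence. It is not the LATCH algorithm itself: the drug-to-class table, the similarity values, the gap penalty, and the function names are all hypothetical, and the threshold heuristic mentioned in the abstract is not modeled.

```python
from typing import Callable, Sequence

# Hypothetical toy "ontology": each drug maps to a drug class.
DRUG_CLASS = {
    "AZT": "NRTI", "3TC": "NRTI",
    "EFV": "NNRTI", "NVP": "NNRTI",
    "LPV": "PI",
}


def class_similarity(x: str, y: str) -> float:
    """Assumed scoring: identical drug > same class > unrelated."""
    if x == y:
        return 2.0
    if x in DRUG_CLASS and DRUG_CLASS[x] == DRUG_CLASS.get(y):
        return 1.0
    return -1.0


def local_alignment_score(
    a: Sequence[str],
    b: Sequence[str],
    sim: Callable[[str, str], float],
    gap: float = -1.0,
) -> float:
    """Best local alignment score between two sequences (Smith-Waterman style)."""
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for x in a:
        curr = [0.0]
        for j, y in enumerate(b, start=1):
            score = max(
                0.0,                      # start a new local alignment
                prev[j - 1] + sim(x, y),  # align x with y
                prev[j] + gap,            # skip x
                curr[j - 1] + gap,        # skip y
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


guideline = ["AZT", "3TC", "EFV"]
patient = ["LPV", "AZT", "3TC", "NVP"]
score = local_alignment_score(guideline, patient, class_similarity)
```

The dynamic program fills a table of size len(a) x len(b), which is one way a polynomial running time can arise for this kind of search.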



Network-based Prediction of Drug-drug Interactions

Authors:
Aurel Cami, Ben Reis, Harvard University

Abstract:
Adverse events (AEs) associated with drug-drug interactions can cause significant morbidity and mortality. Several drugs have been withdrawn from the market due to interaction-related AEs. Existing methods for detecting drug interactions in the post-marketing phase are hindered by the extremely large combinatoric search space of possible (drug, drug, AE) triplets, and false positives are also a major concern. We propose a novel approach to predicting adverse drug interactions based on the Exponential Random Graph (ERG) family of network models. The proposed model leverages three types of information: (i) structural biases of the network formed by the known drug-drug interactions; (ii) interaction biases associated with intrinsic drug attributes; and (iii) interaction biases associated with various drug taxonomies. We evaluate this ERG model using a cross-validation approach and find that it achieves a ROC area above 0.80. We also find that mixing of structural, attribute-based and taxonomy-based information leads to better predictive performance than the performance obtained when only structural or only attribute- and taxonomy-based variables are included in the model. These results indicate that network-based approaches may be a useful tool for predicting new drug-drug interactions and point the way to new avenues of drug safety research.



Increasing Physician Adherence to Established Guidelines for Imaging of Low Back Pain

Authors:
Esteban F Gershanik, Louise Schneider, Ramin Khorasani, Harvard University

Abstract:
Purpose: Integration of a Clinical Decision Support Tool (CDST) into a computerized physician order entry (CPOE) system can increase adherence to evidence-based guidelines when consequences are invoked for inappropriate orders of Lumbar-Spine Magnetic Resonance Imaging (LS-MRI) for low back pain. Methods: A primary care provider (PCP) group at an academic, tertiary center had LS-MRI CDST integrated into their CPOE with consequences based on established guidelines by the American College of Physicians and American Pain Society (ACP-APS). We compared physicians' adherence to guidelines by analyzing randomized chart reviews, LS-MRI orders, overall MRI orders, and number of low back pain visits pre- and post-intervention. Results: Randomized chart reviews showed PCP adherence rates of 39% pre-intervention and 91% post-intervention. LS-MRI for low back pain per 1000 low back pain visits decreased from pre- to post-intervention by 14.2%. Lastly, LS-MRI orders decreased compared to overall MRI orders. Statistical analysis showed a significant decrease in LS-MRI orders with CDST integrated into CPOE, adjusting for time and overall MRI orders (p < .001). Conclusion: CDST with consequences for ordering LS-MRI for low back pain via CPOE in outpatient PCP practices improves adherence to established guidelines and decreases unnecessary, expensive imaging for low back pain.



Development of a Simulation Based Intervention for Veterans with Diabetes

Authors:
Bryan Gibson (1,2), Michael J Lincoln (1,2,3), Matthew Samore (1,2), Nancy Staggers (1), Charlene Weir (1,2)
(1) Salt Lake City George E. Wahlen VA Medical Center, Salt Lake City, UT, and the Department of Veterans Affairs Chief Health Informatics Office; (2) Department of Biomedical Informatics, University of Utah, Salt Lake City, UT; (3) Department of Internal Medicine, University of Utah, Salt Lake City, UT

Abstract:
Self-management is a complex task for patients with type 2 diabetes. Simulation is a powerful mechanism for increasing knowledge and motivation for such tasks. This project uses simulation in two ways: 1) we are developing simulated Continuous Glucose Monitoring (CGM) curves, and 2) we are evaluating the impact of system-supported mental simulations on motivation to engage in physical activity.

The underlying assumption is that simulated CGM curves will convey key concepts, even in subjects with low numeracy. In particular, we want subjects to understand the diurnal variation in blood glucose, the progression of the disease over time, and the effects of physical activity on both acute and chronic glycemia.

To develop our initial simulated CGM curves we have used linear interpolation between the mean curves of groups of individuals with a known A1c to derive an approximate curve for an individual. We are in the process of collecting more CGM data with which to develop regression models in order to more closely approximate curves for individuals. Interface design has begun with qualitative work on understandability and usability. Future work will recruit veterans to use the system while providing their data to improve the predictive equations used in the simulation.
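
The interpolation step described above might look roughly like the following sketch. It assumes two group mean curves sampled at the same time points, each tagged with the group's A1c; the function name, the sample glucose values, and the A1c values are hypothetical and are not taken from the project.

```python
from typing import List, Sequence


def interpolate_cgm_curve(
    a1c: float,
    low_a1c: float,
    low_curve: Sequence[float],
    high_a1c: float,
    high_curve: Sequence[float],
) -> List[float]:
    """Linearly interpolate between two group mean CGM curves by A1c.

    Both curves must be sampled at the same time points, and the target
    A1c must lie between the two group A1c values.
    """
    if not low_a1c < high_a1c:
        raise ValueError("low_a1c must be smaller than high_a1c")
    if not low_a1c <= a1c <= high_a1c:
        raise ValueError("a1c must lie between the two group values")
    if len(low_curve) != len(high_curve):
        raise ValueError("curves must have the same number of samples")
    weight = (a1c - low_a1c) / (high_a1c - low_a1c)
    return [(1 - weight) * lo + weight * hi for lo, hi in zip(low_curve, high_curve)]


# Hypothetical mean glucose values (mg/dL) at three times of day.
mean_curve_a1c_7 = [100.0, 140.0, 120.0]
mean_curve_a1c_8 = [140.0, 200.0, 160.0]
approx_curve = interpolate_cgm_curve(7.5, 7.0, mean_curve_a1c_7, 8.0, mean_curve_a1c_8)
```

A regression model fitted to more CGM data, as the abstract anticipates, would replace this two-curve blend with per-individual predictions.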



Many Low-barrier Local Motions Cooperate to Produce the Adenylate Kinase Conformational Transition

Authors:
Michael D Daily, George N Phillips, Jr, Qiang Cui, University of Wisconsin-Madison

Abstract:
Conformational transitions are functionally important in many proteins. In the enzyme adenylate kinase (AK), two small domains (LID and NMP) close over the larger CORE domain; the reverse (opening) motion limits the rate of catalytic turnover. Previous experiments and computations have revealed that functionally important dynamics are broadly distributed in AK, especially in the LID and NMP domains and near interdomain hinges. Here, using a double-well Gō simulation of E. coli AK, we elaborate on this distributed mechanism by simultaneously characterizing the contributions of rigid-body (Cartesian), backbone dihedral, and contact motions to the transition state (TS) structure and eofuncti. In Cartesian space, relative to the displacement between the open (O) and closed (C) conformations, LID and NMP each close approximately two-thirds toward CORE in the TS, substantially reducing rigid-body motional entropy. In backbone dihedral space, we find that, as expected, backbone fluctuations are reduced in the O to C transition in parts of all three domains. Among these "quenching" residues, LID and NMP show a mix of rigidified and flexible residues in the TS, while most quenching residues in CORE are rigidified in the TS. Closed state contacts within the three domains are more probable in the TS than those at the interfaces, mostly because of a greater O-C distance difference for the interface contacts, and the CORE-LID/NMP interfaces remain sufficiently open to accommodate the ligand. From these results, we predict mutations that will perturb the opening and/or closing transition rates by changing the entropy of important dihedrals and/or the enthalpy of important contacts. Furthermore, our results support a mixed allosteric mechanism where ligand binding follows the TS, but the TS retains substantial backbone dihedral entropy. Finally, the analytical approach and the insights derived from this work may inform the rational design of allosteric properties in proteins.



Detection of Breakpoints of Structural Variation in 454 Sequencing Data

Authors:
Caleb F Davis, David A Wheeler, Richard A Gibbs, Ching C Lau, Baylor College of Medicine

Abstract:
Structural variants (SVs), such as deletions, duplications, inversions, and translocations, are implicated in a wide range of human diseases and are used as diagnostic and prognostic markers in lymphomas, leukemias, and sarcomas. Next generation sequencing technologies such as 454 sequencing instruments provide potentially rich sources of information in which to detect SVs, but existing informatics tools are not designed for this purpose. We therefore built a custom set of tools in Perl and SQL to identify SVs using 454 sequence reads from the 7.5X coverage data set reported for James Watson's genome. Test regions to characterize were identified using copy number variants (CNVs) from previously segmented aCGH data. Individual sequencing reads known to localize to these CNV regions were realigned to the reference genome, and those reads failing to align over their entire lengths were analyzed for patterns indicative of rearrangement. Our results show that massively parallel 454 sequencing data contains evidence of structural variation. Furthermore, our approach successfully identifies SVs, potentially without the rearrangement size limits inherent to library-based methods. We aim to generalize this method to the detection of translocation breakpoints in cancer genomes sequenced using targeted DNA capture methods followed by deep sequencing on 454 sequencing instruments.


Voted Best Paper, Day 1
Discovery and Validation of Novel Drug Indications Using Public Gene Expression Data


Authors:
Marina Sirota, Joel T Dudley, Mohan Shenoy, Reetesh Pai, Annie P Chiang, Alex A Morgan, Pankaj J Pasricha, Atul J Butte, Stanford University

Abstract:
Drug repositioning, the application of established drug compounds to novel therapeutic indications, offers several advantages over traditional drug development, including reduced development costs and shorter paths to approval. Recent drug repositioning efforts employ high-throughput experimental approaches to assess a compound's potential therapeutic qualities. Here we present a systematic computational method to predict novel therapeutic indications based on comprehensive testing of molecular profiles in drug-disease pairs. We integrated gene expression measurements across 100 diseases and gene expression measurements on 164 drug compounds to predict novel drug indications. We tested our top prediction for Crohn's disease (CD) and ulcerative colitis (UC) using the rat TNBS model of inflammatory bowel disease (IBD), and successfully demonstrated the predicted efficacy of the anti-convulsant topiramate in attenuating inflammation and macroscopic damage. We also demonstrate the ability to recover many known drug and disease relationships using computationally derived therapeutic potentials. In this work, we present the concept of computational mining of public gene expression databases as a novel approach to drug discovery. Applying this approach to new disease and drug gene expression measurements in the future has the potential to identify novel applications in numerous diseases for many other established drugs.



Pharmacovigilance using Natural Language Processing, Statistics, and Information Theory

Authors:
Xiaoyan Wang, Herbert Chase, Marianthi Markatou, George Hripcsak, Carol Friedman, Columbia University

Abstract:
Objective: The objective of this work is to develop methods to detect novel adverse drug events (ADEs) using electronic health records (EHRs), natural language processing and association statistics, and to improve the precision of ADE detection by reducing confounding through: a) selecting information, and b) applying mutual information (MI) and data processing inequality (DPI). Design: Narrative reports were collected from the I at NYPH. MedLEE and co-occurrence statistics with adjusted volume tests were used to detect ADEs. MI and DPI were implemented to reduce confounding information. Seven drugs were selected to evaluate the system. Results: A total of 132 potential ADEs were found to be associated with the seven drugs. Qualitative evaluation using historic rollback design suggested that novel ADEs could be detected using the system. Overall recall and precision for known ADEs were 0.75 and 0.16, respectively. A further evaluation indicated that the precision of detecting known ADEs can be improved to 0.31 using selective information and to 0.77 using MI and DPI. Conclusion: This study provides methods for the development of active, high-throughput and prospective systems which could potentially unveil drug safety profiles. Our results demonstrate that the methods are feasible, although there are a number of challenges to address.
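
To make the MI and DPI step more concrete, here is a small sketch of how mutual information can be computed from a 2x2 co-occurrence table and how a data processing inequality check can flag an association as likely indirect. The abstract does not specify the exact procedure, so the table layout, the function names, and the pruning rule (drop the weakest edge of a fully connected triplet) are assumptions for illustration only.

```python
import math


def mutual_information(n11: int, n10: int, n01: int, n00: int) -> float:
    """Mutual information (bits) between two binary variables.

    n11: reports mentioning both concepts, n10: only the first,
    n01: only the second, n00: neither.
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("empty contingency table")
    cells = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    px = {1: (n11 + n10) / total, 0: (n01 + n00) / total}
    py = {1: (n11 + n01) / total, 0: (n10 + n00) / total}
    mi = 0.0
    for (x, y), count in cells.items():
        if count == 0:
            continue  # 0 * log(0) is taken as 0
        pxy = count / total
        mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi


def is_indirect(mi_xz: float, mi_xy: float, mi_yz: float) -> bool:
    """DPI check: if X -> Y -> Z is the true chain, I(X;Z) <= min(I(X;Y), I(Y;Z)).

    Assumed rule: treat X-Z as indirect when it is the weakest of the three.
    """
    return mi_xz < min(mi_xy, mi_yz)


independent = mutual_information(25, 25, 25, 25)
perfectly_linked = mutual_information(50, 0, 0, 50)
```

For example, if a drug is strongly associated with the disease it treats, and that disease is strongly associated with a symptom, a weaker drug-symptom association would be flagged as indirect rather than reported as a candidate ADE.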



High-throughput Sequencing of the Melanoma Epigenome for Biomarker Discovery

Authors:
Jill C Rubinstein, Adam Raefski, Ruth Halaban, Michael Krauthammer, Yale University

Abstract:
Epigenetic aberrations such as hyper- and hypomethylation of CpG dinucleotides have long been observed in melanoma. Microarray-based technologies have previously been used to assess the genome-wide methylation status of promoters. Recently, the application of sequencing technologies in epigenetics has allowed mapping of CpG methylation across the entire sequenced genome. Using the Solexa platform, we perform sequencing of melanoma DNA enriched for methylated regions through a methyl-binding domain pull-down assay. We also sequence total mRNA from the same cell cultures. Our analysis focuses on the identification of novel epigenetic biomarkers in melanoma, as well as the correlation of CpG methylation in promoters, gene bodies, and flanking regulatory regions to gene expression levels.

Using the Literature-Based Discovery Paradigm to Investigate Testosterone and Sleep

Authors:
Christopher M Miller, Marcelo Fiszman, Thomas C. Rindflesch, National Library of Medicine

Abstract:
Sleep disorders often show gender differences, e.g., obstructive sleep apnea occurs more commonly in men, while women are more frequently affected by insomnia. The reasons for these disparities are incompletely understood. Hormones have been suggested, although no clear mechanism has been demonstrated. We exploit the literature-based discovery (LBD) paradigm, enhanced with semantic predications, to elucidate this mechanism, concentrating on testosterone. In LBD closed discovery, a connection between X and Z concepts is assumed and an intermediary concept Y is sought. SemRep was used to extract semantic triples from PubMed citations, which were then visualized as a conceptual graph using Semantic MEDLINE. The PubMed query "testosterone AND sleep" returned 198 citations and 1150 semantic predications. Many of these suggested an X-Z link between testosterone and sleep. In seeking Y as a mechanism connecting the two, we concentrated on substance involvement predications (INHIBITS, STIMULATES, INTERACTS_WITH) and found many which established an inhibitory relationship between testosterone and cortisol. Based on the well-known mechanistic connection between cortisol and sleep, we propose the hypothalamic-pituitary-adrenal axis, and specifically its components, corticotropin-releasing hormone and cortisol, as a crucial mechanistic link connecting testosterone to sleep. This serves as a hypothesis to be pursued in a sleep research laboratory.
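
The closed discovery pattern described above (fix X and Z, look for intermediaries Y) can be sketched over a list of subject-predicate-object triples as follows. The triples below are made-up examples in the style of semantic predications, not actual SemRep output, and the function name and the choice to filter predicates only on the X side are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def closed_discovery(
    triples: Iterable[Triple],
    x: str,
    z: str,
    x_predicates: Optional[Set[str]] = None,
) -> Dict[str, Tuple[List[Triple], List[Triple]]]:
    """Find concepts Y that are linked to both X and Z.

    Returns a mapping Y -> (triples linking X and Y, triples linking Y and Z).
    If x_predicates is given, only those predicates count on the X side.
    """
    via_x: Dict[str, List[Triple]] = defaultdict(list)
    via_z: Dict[str, List[Triple]] = defaultdict(list)
    for subj, pred, obj in triples:
        ends = {subj, obj}
        if subj == obj:
            continue
        if x in ends and z not in ends:
            if x_predicates is None or pred in x_predicates:
                other = obj if subj == x else subj
                via_x[other].append((subj, pred, obj))
        elif z in ends and x not in ends:
            other = obj if subj == z else subj
            via_z[other].append((subj, pred, obj))
    return {y: (via_x[y], via_z[y]) for y in via_x if y in via_z}


# Hypothetical predications for illustration only.
TRIPLES: List[Triple] = [
    ("Testosterone", "INHIBITS", "Cortisol"),
    ("Cortisol", "AFFECTS", "Sleep"),
    ("Testosterone", "STIMULATES", "Muscle Growth"),
    ("Caffeine", "DISRUPTS", "Sleep"),
    ("Testosterone", "AFFECTS", "Sleep"),
]
SUBSTANCE_PREDICATES = {"INHIBITS", "STIMULATES", "INTERACTS_WITH"}
candidates = closed_discovery(TRIPLES, "Testosterone", "Sleep", SUBSTANCE_PREDICATES)
```

In this toy input only Cortisol is connected to both endpoints, so it is the single candidate intermediary; direct X-Z predications are ignored because they do not propose a mechanism.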



Boosting Power for Neuroimaging Genetics in a DTI Study of 288 Individuals

Author:
Neda Jahanshad, University of California, Los Angeles

Abstract:
Imaging genetics is a rapidly growing field, yet the effects of single-nucleotide polymorphisms (SNPs) on the white matter structure and integrity of the brain are still largely unknown. Localization of SNPs associated with individual differences in white matter integrity and connectivity using diffusion tensor imaging (DTI) may shed light on the molecular mechanisms of individual differences in cognition and risk for disease. To study genomic associations, large population studies are needed; in addition to population heterogeneity, anisotropy values from DTI may be corrupted by local partial volume effects (due to fiber mixing and crossing). Here we set out to alleviate the effect of intra-voxel heterogeneity, fiber mixing and non-homology of registered tracts to better localize genetic variants that influence fiber integrity. We aimed to improve the power to detect effects of the common SNP in the brain-derived neurotrophic factor (BDNF) gene, which plays an essential role in axonal and dendritic growth, synaptic structure, neurotransmitter release, and long-term potentiation-associated learning. DTI images were acquired from 288 genotyped healthy young adult twins and their siblings. Fractional anisotropy (FA) maps for each individual were elastically registered to a common study-specific high FA template to optimally register white matter tracts. The JHU-DTI-MNI atlas was also registered to our target for accurate region of interest (ROI) analysis. Anisotropy estimates from a single-tensor model can be incorrectly reduced due to fiber mixing and partial volume effects. To adjust for this effect, a voxel-wise map of local signal-to-noise ratio (SNR) in the FA value was obtained from a subset of the population and used to weight the FA maps. Borderline significant values of association were found with BDNF in certain ROIs, including the forceps major and the left inferior fronto-occipital fasciculus (IFOF). These were re-evaluated with our weighted maps. Results indicate a higher association of BDNF with IFOF, showing a general trend for higher anisotropy for val/val carriers of the gene (67.7% of the total) versus those with a met66 variant in an allele (32.3%).



Modeling Advance Directive Decisions to Inform Shared Decision Making at the Point of Care

Authors:
Negin Hajizadeh, Yale University; Kristina Crothers, University of Washington; R Scott Braithwaite, New York University

Abstract:
Discussions about advance directives are often delayed in chronic obstructive pulmonary disease (COPD). One important barrier to discussing a "Do-Not-Intubate" (DNI) directive is uncertainty about whether its harms would exceed its benefits given particular patient preferences and settings. To inform shared decision making between clinicians and their patients, we constructed a decision analytic model to ask when individualized harms of endorsing "DNI" would exceed individualized benefits. Our outcome measure was quality-adjusted life years (QALYs), the analysis time horizon was infinite, and the decision epoch was one year. Probabilities, utilities, and life expectancies were based on published estimates or expert opinion. We considered different patient preferences regarding permanent institutionalization, one of the main potential harms from not endorsing "DNI", and different complication rates.
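
A decision analytic comparison of this kind boils down to computing expected QALYs for each choice and seeing how the preferred choice changes with patient preferences. The sketch below is a deliberately simplified one-step decision tree with entirely hypothetical probabilities, life expectancies, and utilities; it is not the authors' model and the names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Outcome:
    probability: float  # chance of this branch
    life_years: float   # remaining life expectancy in this branch
    utility: float      # quality weight (1.0 = full health, 0.0 = death)


def expected_qalys(outcomes: List[Outcome]) -> float:
    """Probability-weighted sum of life_years * utility over all branches."""
    total_p = sum(o.probability for o in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(o.probability * o.life_years * o.utility for o in outcomes)


def compare(utility_institutionalized: float) -> Tuple[float, float]:
    """Return (QALYs with DNI, QALYs without DNI) for hypothetical inputs."""
    with_dni = [
        Outcome(0.5, 0.0, 0.0),  # dies without intubation
        Outcome(0.5, 2.0, 0.7),  # recovers and returns home
    ]
    without_dni = [
        Outcome(0.3, 0.0, 0.0),                        # dies despite intubation
        Outcome(0.4, 2.0, utility_institutionalized),  # survives, institutionalized
        Outcome(0.3, 2.0, 0.7),                        # survives, returns home
    ]
    return expected_qalys(with_dni), expected_qalys(without_dni)


mild_aversion = compare(0.5)    # institutionalization judged tolerable
strong_aversion = compare(0.1)  # institutionalization judged nearly as bad as death
```

With these made-up inputs, lowering the utility assigned to permanent institutionalization is enough to flip the preferred option toward DNI, which mirrors the kind of preference sensitivity the abstract describes.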

Our model suggests that for patients with severe COPD, endorsing a DNI advance directive improved QALYs in select circumstances, most notably when patients had strong preferences against permanent institutionalization, or if outcomes, such as complication rates, were poor. We are currently evolving our model into a Markov model that will incorporate the risk of multiple events over time. Future work will translate our findings into a decision aid to be used for shared decision making about advance directives.



Exploring the Barriers and Facilitators to Patient Portal Use

Authors:
Shilo Anders, William Gregg, Jim Jirjis, and Matthew B Weinger, Vanderbilt University

Abstract:
Recent studies suggest that patients who are actively involved in their care have improved outcomes. A web-based patient portal is one way to increase patient involvement, but multiple barriers may limit widespread patient acceptance or use. Previous studies have assessed patients' intention to use such a portal, the impact of the portal on patient-physician relationships, and privacy issues. This study explores the barriers and facilitators to using a patient portal with a diverse group of portal users and non-users.

The results of focus groups and individual user interviews will help to increase our understanding of the most common barriers to portal use in different patient populations and uncover additional incentives (e.g. functionality) to attract usage. Use issues range from website- (e.g., usability and functionality) and platform-specific (e.g., web versus kiosk) issues to extrinsic environmental factors (e.g., no home computer or internet access). Preliminary results with existing portal users suggest that website-specific barriers include lack of desired functionality (especially tailored patient-specific information) as well as more mundane issues like forgetting the password. These results will form the basis of future controlled studies investigating new functionality and novel ways in which to overcome barriers to patient use.



Detecting Structural Variation in Natural Populations Via Paired-End Sequencing

Authors:
Julie M Cridland and Kevin R Thornton, University of California, Irvine

Abstract:
Structural variation has recently been associated with a number of complex traits, including several human diseases. However, little is currently known about the evolutionary dynamics of structural variants segregating within natural populations. New technology, such as Illumina's high-throughput, paired-end sequencing platform, makes it possible to study this variation by quickly and accurately detecting structural variants, even with relatively low coverage. This technique has several advantages over previous microarray experiments to detect variation, such as the ability to perform this type of experiment for any species of interest and the direct observation of sequence that is involved in structural variation.

We performed paired-end sequencing on multiple isofemale lines from population samples of two Drosophila species, Drosophila melanogaster and Drosophila yakuba. Many genes, such as Cyp6g1, which is known to be under positive selection, and Or22a, Gr28a and Gr28b, which are known to be polymorphic for copy number, were detected with this method. Hotspots of duplication were also identified throughout the genomes. We have also found evidence of natural selection acting on duplications and that duplications segregating within populations may be deleterious.



Assessing the Influence of Genomic Polymorphisms on Quantitative Proteomics

Authors:
Suzanne Fei, Phillip Wilmarth, Shannon McWeeney, Robert Hitzemann, Larry David, Oregon Health and Science University

Abstract:
Personalized medicine will someday use personal genetic information in combination with transcriptomic, proteomic, and metabolomic profiles to predict the best course of action for a given patient. Before this can occur, we need to understand how genetic differences influence these profiles, both biologically and technologically. Our group recently quantified the impact of Single Nucleotide Polymorphisms (SNPs) on hybridization-based mRNA expression techniques using two commonly used mouse strains (Nature Methods, September 2007). Sixteen percent of the array was affected, which led to a false positive rate of 22% and a false negative rate of 12%. This project performed a corresponding analysis using quantitative shotgun proteomics. Samples from both mouse strains were searched against the reference protein database and a strain-specific database that was generated using SNPs from the Sanger Mouse Genomes Project. Twenty percent of the proteins in the Ensembl database were altered due to SNPs, but ultimately this has had little impact on the preliminary quantitative results, leading to an estimated false positive rate of 1.8% and false negative rate of 0.6%. We conclude that protein-based expression techniques are much more robust to underlying genomic sequence variation than mRNA hybridization techniques.




A Pathway Based Correlation Method for Identifying Perturbation in Follicular Lymphoma from Microarray Gene Expression Data

Authors:
Allison N Tegge, Gerald Arthur, Lynda Bennett, Charles Caldwell, Dong Xu, Jianlin Cheng, University of Missouri, Columbia

Abstract:
Identifying the molecular cause of a disease can help lead to more accurate disease diagnosis and better treatments. Previous methods identified important pathways by selecting those enriched with differentially expressed genes; however, these methods cannot account for small changes in gene expression across the whole pathway. In order to overcome the problem, we use pair-wise pathway-level gene expression correlations between samples to identify molecular pathways that are perturbed in a disease state.

For this study, human pathway data was obtained from the KEGG database of known, annotated pathways. Pathway-specific expression profiles for each sample were created using the microarray expression values from the genes involved in that pathway. We compared the pair-wise correlation of each given gene pair between follicular lymphoma and normal tonsil cell samples for each pathway. We then ranked the pathways according to the significance of the gene expression perturbation between follicular lymphoma and normal tonsil samples, as signified by the shifts of correlation values between the two states.

The top ten pathways ranked as most perturbed include several amino acid degradation, signal transduction, and cancer-related pathways, such as the p53 signaling pathway. This method has strong statistical power for ranking pathways based on gene expression perturbation and could inform future studies of additional diseases by helping to identify the pathways likely to be involved in the diseased phenotype.
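
As a rough illustration of the correlation-shift idea described in this abstract, the Python sketch below computes, for a single pathway, the Pearson correlation of every gene pair within each condition and then averages the absolute change between conditions. The function names, the toy expression values, and the use of a mean absolute difference as the perturbation score are assumptions made for illustration; the abstract does not state the actual statistic or significance test used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat constant profiles as uncorrelated
    return cov / sqrt(vx * vy)


def pathway_shift(disease, normal):
    """Mean absolute change in gene-gene correlation between two conditions.

    `disease` and `normal` map gene name -> list of expression values
    (one value per sample) for the genes of one pathway.
    """
    genes = sorted(set(disease) & set(normal))
    shifts = [
        abs(pearson(disease[g1], disease[g2]) - pearson(normal[g1], normal[g2]))
        for g1, g2 in combinations(genes, 2)
    ]
    return sum(shifts) / len(shifts) if shifts else 0.0


# Toy data (hypothetical values, not taken from the study).
normal = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
disease = {"A": [1, 2, 3, 4], "B": [8, 6, 4, 2], "C": [4, 3, 2, 1]}
score = pathway_shift(disease, normal)
```

Pathways could then be ranked by such a score; a permutation test over sample labels would be one possible way to attach significance, though that is again an assumption rather than the authors' stated procedure.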




Voted Best Paper, Day 2
Genome-wide Maps of Candidate Regulatory Motif Sites


Authors:
Kenny Daily, Paul Rigor, Sholeh Forouzan, Yimeng Dou, Xiaohui Xie and Pierre Baldi, University of California, Irvine

Abstract:
For any given genome, only a small fraction of the regulatory elements have been characterized, and there is great interest in applying computational techniques to systematically discover these elements. Such efforts have been hindered by the size of non-coding DNA regions and the statistical variability and complex spatial organizations of mammalian regulatory elements. A central challenge of biology is to map and understand the role of the 98% of the human genome that is noncoding. MotifMap aims to provide a comprehensive map of potential regulatory elements in genomes in an unbiased manner. Each motif found has a variety of associated scores that can aid filtering to find true, novel binding sites. One of these scores is a novel measure of conservation that takes into account not only the strength of the binding site in the species being searched, but also the strength of binding sites in related species. The MotifMap pipeline, originally available for the human genome, was applied to the genomes of eight other species. More species can be added in a nearly fully automated fashion, and each species is updated when novel transcription factor binding matrices become available through updates to existing databases or literature searches.




An Abductive Approach to Explaining Genome-Wide Experimental Observations

Authors:
Deborah Chasman, Brandi Gancarz, Paul Ahlquist, and Mark Craven, University of Wisconsin—Madison

Abstract:
We present a method to explain the results of a genome-wide mutant assay. While several mutants may show a significant effect, positive or negative, on the phenotype measured, we do not expect that each one is directly responsible for the change. Instead, we expect that many are upstream of a smaller set of genes or small molecules that are directly involved. Each of these interfaces with the phenotype would serve to explain a handful of significant mutants. Using first-order logic and abductive inference, we posit explanations for the experimental observations in terms of known intracellular interactions. An explanation consists of a subnetwork of genes and small molecules, and a conjecture about which entity in the subnetwork is most proximal to the phenotype. In this way, an explanation suggests a common cause for a set of significant observations. We apply our method to observations from experiments in which yeast knockout strains were assayed for the effects of host gene deletion on Brome Mosaic Virus replication (Gancarz and Ahlquist, unpublished; Kushner et al., 2003). We have been able to construct explanations that link a statistically surprising number of observations to the same interface.




Selective Constraint on the Organization of the Human Metabolic Network

Authors:
Corey M Hudson, Gavin C Conant, University of Missouri, Columbia

Abstract:
We seek to understand the evolution of the human metabolic network, a catalog of all enzymes in the human genome and the reactions they catalyze. We have asked whether a gene's position in this network affects the degree to which natural selection prunes variation in that gene. We thus estimated the selective constraint for (nearly) every human enzyme by analyzing nonsynonymous nucleotide changes across the orthologs of these enzymes from 8 mammal species. To improve the quality of the basic network, we pruned non-informative connections between reactions by maximizing the network's modularity. Using this optimized network, we compared the relative centrality of each reaction to each enzyme's selective constraint. There is a statistically significant negative correlation between each reaction's rate of evolution and its centrality, implying that more central reactions are less tolerant of nonsynonymous substitutions. Since metabolic genes are not completely isolated from the rest of the genome, this study also compares the distribution of rates of sequence evolution in metabolic genes with that of the genome at large. Identifying genes that are both highly conserved and central in the metabolic network offers a way to find plausible candidates in the search for genes associated with human inherited metabolic disorders.




Identifying Genetic Interactions from Genome-Wide Data Using Bayesian Networks

Authors:
Xia Jiang, PhD1, M. Michael Barmada PhD2, Shyam Visweswaran, MD, PhD1
1Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
2Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA

Abstract:
Gene-gene interactions likely play an important role in susceptibility to common diseases like hypertension and Alzheimer's disease. In particular, common diseases are believed to be caused by genetic variants at multiple loci in such a way that the total effect of the multiple loci cannot be determined by analyzing the effect of each locus individually. This phenomenon is called epistasis. It contrasts with Mendelian inheritance, in which diseases are caused by variation at a single locus.

To study the underlying genetic variants of common diseases, genome-wide association studies involving several hundreds of thousands of SNPs are increasingly being used. Typically, the data from such genome-wide studies are analyzed with single-locus methods; however, single-locus methods are unlikely to detect epistasis. Furthermore, efficiently analyzing epistasis using high-dimensional genome-wide data remains a crucial challenge.

We develop and evaluate a method based on Bayesian networks and the minimum description length principle for detecting epistatic relationships. We compare its power to detect gene-gene interactions and its computational efficiency with those of a widely used combinatorial method called multifactor dimensionality reduction (MDR). We find that our method outperforms MDR. In addition, we apply the method to over 300,000 SNPs obtained from a genome-wide association study involving late onset Alzheimer's disease (LOAD). We substantiate previous results indicating that the GAB2 gene is associated with LOAD.




Questioning the Ubiquity of Neofunctionalization

Authors:
Todd A Gibson and Debra S Goldberg, University of Colorado, Denver

Abstract:
Neofunctionalization (the gain of new function by gene duplicates) has been proposed to be ubiquitous in several studies of protein interaction network evolution. We study three phenomena that call this conclusion into question. First, we show the importance of analyzing duplication events which have shaped the extant empirical interactions not as isolated events, but rather in concert with concurrent and subsequent duplication events. Second, the bulk of empirical data upon which protein interaction network research is based is generated from Y2H and AP-MS assays. We note that both of these assays are biased against reporting self-interacting proteins, leaving a dearth of homomeric interactions in empirical data sets. Finally, we show that the self-interactions largely overlooked by these assays are integral to the high clustering observed in the empirical data.

In addition to showing how each of these phenomena has affected empirical networks and their evolution, we examine the impact of these phenomena on the network literature. The identification of subtle but critical methodological errors in well-respected papers is both disturbing and thought-provoking, and it calls into question the prevalence of neofunctionalization in protein interaction networks.




Design and Evaluation of a Widget-based 'Web 2.0' Electronic Health Record

Authors:
Yalini Senathirajah, David Kaufman, Soumitra Sengupta, Suzanne Bakken, Columbia University

Abstract:
Objectives
Current EHRs frequently neither reflect users' domain knowledge nor meet public health requirements for rapid configurability to meet emerging needs. Our research explores the usability, usefulness, cognitive effects, efficiency, and task-technology fit of a 'web 2.0' interface, MedWISE, created to enhance fit between user needs, public health requirements, and EHRs. MedWISE allows clinician users to select, configure, and share information, displays, and tools via simple interfaces, without programmers.

Methods
Evaluation included focus groups, analysis of user-created materials, laboratory testing using clinical scenarios, and experimental configuration for H1N1 risk assessment.

Results
11/12 users experienced/predicted time savings and decreased errors, citing core features as promoting easier mental, workflow, data review, diagnostic and documentation processes. They thoughtfully used new functions in unforeseen ways, streamlining common processes or fitting them to personal or specialty preferences, and passed a test for determining whether users would miss clues to other diagnoses. Rapid configuration for H1N1-related tasks was successful.

Conclusion
Next-generation EHRs with this approach may have advantages: greater suitability to user needs, inclusion of multiple information sources, interoperability, agile reconfiguration for emerging situations, capture of user tacit knowledge, workflow/HCI improvements, and greater acceptance. MedWISE is also a platform for cognition and HCI research.




Geocoding Patient Addresses to Identify CA-MRSA Cases and Transmission

Authors:
David C Shepherd, Jeff S Wilson, Abel N Kho, Regenstrief Institute

Abstract:
Geocoding can be used to identify risk factors and patterns of infectious disease transmission. To help identify a cohort of community-associated methicillin-resistant S. aureus (CA-MRSA) cases, we geocoded patient addresses from the electronic health record and compared them to publicly available institutional addresses. We successfully geocoded 79% (11,870) of addresses directly from the electronic patient record and matched them to 545 institutional addresses. There were 291 cases with only a facility name in the address field that we linked to a corresponding facility address. After manual editing and linking lone facility names to addresses, the geocoding rate improved to 84% (12,549) and matched 1,379 (9.2%) patients to institutional addresses. Patients whose addresses match an institution (healthcare facility, prison, nursing home) are unlikely cases of CA-MRSA and were excluded from our cohort. Geocoding can help distinguish between community-associated and healthcare-associated MRSA.

Further, because our data included culture dates, we observed cases where multiple patients in close geographic and temporal proximity were diagnosed with MRSA. These cases may represent direct transmission of MRSA. Geocoded patient addresses and microbiology data can be used to study transmission patterns and identify clusters of MRSA infection in the community.




Lessons Learned: Meeting the Needs of Clinical Researchers in Rural Kenya

Authors:
Alicia F Guidry, Neil F Abernethy, Judd L Walson, University of Washington

Abstract:
While the information needs of clinical researchers in resource-limited settings resemble those of their counterparts in the United States, information systems in these settings present unique challenges. Among the most important of these are the implementation and maintenance of open-source systems and accounting for limited internet connectivity. We first review approaches for managing data in an asynchronous environment. In the specific context of clinical studies in Kenya focused on HIV/AIDS and co-infections with malaria and helminths, we describe one solution to unreliable internet connectivity in the exchange of clinical data using the open-source Google Gears platform. The Gears module will be used to reduce the disruption of information transfer tasks by unpredictable network connections. Online/offline file synchronization will allow users in remote sites to enter data irrespective of a network connection. If no internet connection is detected when the application sends data to or requests data from the server, that information is stored locally in a Gears database and later uploaded when a connection is detected. Here, we review the problems encountered in developing this system and maintaining concurrent systems that informed this architecture. We believe this approach will be useful in helping others meet the challenges presented by resource-limited research settings.
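
The store-and-forward behavior described in this abstract can be sketched as follows. This is a minimal illustration in Python, with the standard `sqlite3` module standing in for the Gears local database; it is not the authors' implementation and does not use the Gears API. The class name `OfflineQueue`, the `pending` table, and the `send` callback (assumed to raise `ConnectionError` when offline) are all hypothetical.

```python
import json
import sqlite3


class OfflineQueue:
    """Store records locally when offline; upload them once a connection returns."""

    def __init__(self, send, db_path=":memory:"):
        self.send = send  # hypothetical callable; raises ConnectionError when offline
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending (id INTEGER PRIMARY KEY, payload TEXT)"
        )

    def submit(self, record):
        """Upload any backlog, then the new record; queue it locally on failure."""
        try:
            self.flush()
            self.send(record)
            return True
        except ConnectionError:
            self.db.execute(
                "INSERT INTO pending (payload) VALUES (?)", (json.dumps(record),)
            )
            self.db.commit()
            return False

    def flush(self):
        """Upload queued records in insertion order, deleting each after success."""
        rows = self.db.execute("SELECT id, payload FROM pending ORDER BY id").fetchall()
        for row_id, payload in rows:
            self.send(json.loads(payload))  # raises ConnectionError if still offline
            self.db.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            self.db.commit()

    def pending_count(self):
        """Number of records waiting to be uploaded."""
        return self.db.execute("SELECT COUNT(*) FROM pending").fetchone()[0]
```

Flushing the backlog before sending a new record keeps uploads in the order they were entered, which is one simple design choice; a production system would also need conflict handling and retry scheduling, which this sketch omits.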




Investigator Name Recognition from Medical Journal Articles: A Comparative Study of SVM and Structural SVM

Authors:
Xiaoli Zhang, Jie Zou, Daniel X Le, George Thoma, National Library of Medicine

Abstract:
MEDLINE®, the flagship database of the U.S. National Library of Medicine, is a critical source of information for biomedical research and clinical medicine. Automated extraction of bibliographic information from journal articles is key to the affordable creation and maintenance of this citation database. Beginning with journal issues published in 2008, the personal names of individuals who are not listed as authors but are members of corporate organizations must be included in a new "Investigator Names" field in MEDLINE. Since the number of such names is often large, several score or more, their manual entry is time-consuming, tedious, and error-prone. The automated extraction of investigator names is a problem in Named Entity Recognition (NER), but differs from typical NER due to the absence of normal English grammar in the text containing the names and the particular requirement to identify first and last names of each investigator by MEDLINE conventions. We seek to automate this task through two machine learning approaches: Support Vector Machine (SVM) and structural SVM, both of which show good performance at the word and chunk levels. In contrast to traditional SVM, structural SVM attempts to learn a sequence by using contextual label features in addition to observational features. It outperforms SVM at the initial learning stage without using contextual observation features. However, with the addition of the contextual observation features from neighboring tokens, SVM performance improves to match or slightly exceed that of the structural SVM.




Leveraging Existing Knowledge to Improve Biomedical Language Understanding

Author:
Kevin M Livingston, University of Colorado, Denver

Abstract:
Large-scale language understanding of biomedical text is necessary in order to produce and extend the knowledge bases required to support next-generation systems for genome-scale analysis of systems biology data. In addition to supporting the biologist, background knowledge plays an important role in language understanding. Most language understanding work is unidirectional in interacting with underlying knowledge: information is extracted from text and then stored in a knowledge base. However, it is possible to leverage existing knowledge at language understanding time. Direct Memory Access Parsing (DMAP) is an approach to NLP that integrates its representations of text incrementally with existing knowledge as it reads; this is in contrast to systems that save memory integration for a final step.
This work presents an implementation of DMAP, OpenDMAP, an architecture designed to explore and evaluate the role existing knowledge can play in assisting the language understanding process. This allows experiments that have, for example, evaluated the role Gene Ontology (GO) annotations associated with specific proteins can play in recognizing and understanding "activation" events (specifically receptor and enzyme activation) in GeneRIFs. Leveraging this background knowledge, with minimal additional changes to the system, improved precision by 20%, while only reducing recall by 6%.




Towards Automating the Initial Screening Phase of a Systematic Review

Authors:
Tanja Bekhuis1 and Dina Demner-Fushman2
1Center for Dental Informatics, School of Dental Medicine; Department of Biomedical Informatics, School of Medicine, University of Pittsburgh
2Communications Engineering Branch, Lister Hill National Center for Biomedical Communications, US National Library of Medicine, NIH

Abstract:
Systematic review authors synthesize research to guide clinicians in their practice of evidence-based medicine. Teammates independently identify provisionally eligible studies by reading the same set of hundreds and sometimes thousands of citations during an initial screening phase. We investigated whether supervised machine learning methods can potentially reduce reviewer workload. We also extended earlier research by including nonrandomized studies. To build training and test sets, we used a subset of annotated citations (n=400) from a search conducted for an in-progress Cochrane systematic review (SR) of treatments for ameloblastomas, which are rare odontogenic tumors. We extracted features from titles, abstracts, and metadata (TIABS+MD), then trained, optimized, and compared several classifiers with respect to mean performance based on stratified 10-fold cross-validations. We also validated the best configurations on the held-out test set. In the training condition, the evolutionary support vector machine (EvoSVM) with an Epanechnikov or radial kernel is the best classifier: mean recall=100%; mean precision=48% and 41%, respectively. In the test condition, EvoSVM performance degrades: mean recall=77%, mean precision ranges from 26% to 37%. Because near-perfect recall is essential in this context, we conclude that supervised machine learning methods may be useful for reducing workload under certain conditions. Ongoing work addresses the limitations of the initial study by enlarging the data set; evaluating the contribution of metadata (MD) beyond titles and abstracts (TIABS); and comparing other feature-classifier combinations. Training and test sets for the follow-up study are based on a 2:1 split of the full set of annotated citations (N=1816) from the Cochrane SR; 115 citations (6%) point to provisionally eligible studies. Features will be extracted from TIABS; TIABS+MD; and UMLS® MetaThesaurus® concepts + MD. 
Classifiers include EvoSVM, the weightily averaged one-dependence estimator, and complement Naive Bayes. The outcomes are mean recall, precision, macro-averaged F-measure with β=3, and reduction in workload. This work will inform the design of future research in support of systematic reviewers.
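
For readers unfamiliar with the F-measure with β=3 named among the outcomes, the short Python sketch below shows the standard formula, F_β = (1 + β²)·P·R / (β²·P + R), which weights recall more heavily than precision when β > 1. The function name is an assumption for illustration, and the sketch computes a single-class value only; macro-averaging would average this quantity over classes.

```python
def f_beta(precision, recall, beta=3.0):
    """F-measure that treats recall as beta times as important as precision."""
    if precision == 0 and recall == 0:
        return 0.0  # convention: avoid division by zero when both are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# With beta=3 a recall-heavy classifier scores far above a precision-heavy one.
high_recall = f_beta(precision=0.48, recall=1.00)
high_precision = f_beta(precision=1.00, recall=0.48)
```

Choosing β=3 matches the screening setting described above, where missing an eligible study is more costly than reading an extra citation.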




Clinical and Molecular Models of Glioblastoma Multiforme Survival

Authors:
Stephen R Piccolo, Lewis J Frey, University of Utah

Abstract:
Glioblastoma multiforme (GBM) is an advanced form of brain cancer that results in a median survival of one year, a short duration relative to most other cancers. However, GBM survival times vary substantially, suggesting that individual patient and tumor characteristics influence tumor aggressiveness, responses to treatments, and ultimately survival. Accurately predicting which patients will survive for a relatively short time could help clinicians identify patients less likely to respond to standard treatments, suggest mechanisms driving disease severity, and guide patients in making decisions at the time of diagnosis. Several clinical/demographic and molecular-level factors have been associated with length of GBM survival; however, the prognostic value of individual factors is limited, partly because multiple factors can have cumulative and interacting effects. To address this complexity, multivariate prediction methods have been proposed, yet their value for predicting GBM survival has not been explored extensively. Using data from The Cancer Genome Atlas, we evaluated the potential to construct multivariate prediction models for GBM, based on clinical/demographic data and three categories of tumor-molecular data: DNA somatic mutations, DNA methylation states, and mRNA expression levels. Using survival statistics, we observed that shorter-term survivors could be distinguished from longer-term survivors at significant levels, suggesting potential clinical relevance.




Linear Frequency Estimation for Biomedical Signals

Author:
Jonathan Woodbridge, University of California, Los Angeles

Abstract:
Linear frequency estimation (LFE) is a technique for data reduction of frequency-based signals. LFE converts a signal to the frequency domain by utilizing the Fourier transform and estimates both the real and imaginary parts with a series of vectors much smaller than the original signal size. The estimation is accomplished by selecting optimal points from the frequency domain and interpolating data between these points with a first order approximation. The difficulty of such a problem lies in determining which points are most significant. LFE is unique in the fact that it is generic to a wide variety of frequency-based signals such as electromyography (EMG), voice, and electrocardiography (ECG). The only requirement is that spectral coefficients are spatially correlated. Unlike other compression algorithms, LFE outputs a useful representation of the signal. Such an output allows us to perform comparisons and clustering much faster than a non-reduced signal. These properties yield applicability to data compression, indexing, search, and pattern recognition.
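
The general shape of the technique described in this abstract (transform to the frequency domain, keep a small set of points, interpolate linearly between them) can be sketched in Python as below. This is only an illustration under stated assumptions: the abstract does not say how the "optimal points" are chosen, so the sketch simply keeps every `step`-th coefficient, and all function names are hypothetical. A naive DFT is used so the example needs only the standard library.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (for illustration only)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def reduce_spectrum(spectrum, step):
    """Keep every `step`-th coefficient plus the last one, as (index, value) pairs.

    Assumption: uniform spacing stands in for the unspecified point selection.
    """
    idx = list(range(0, len(spectrum), step))
    if idx[-1] != len(spectrum) - 1:
        idx.append(len(spectrum) - 1)
    return [(i, spectrum[i]) for i in idx]


def interpolate(points, n):
    """Rebuild n coefficients by first-order interpolation between kept points."""
    out = [0j] * n
    for (i0, v0), (i1, v1) in zip(points, points[1:]):
        for i in range(i0, i1 + 1):
            w = (i - i0) / (i1 - i0)
            out[i] = v0 + w * (v1 - v0)  # real and imaginary parts together
    for i, v in points:
        out[i] = v  # kept points are reproduced exactly
    return out
```

The reduced representation is the list of kept (index, value) pairs; comparing or clustering signals on those pairs rather than on the full signal is what would give the speed-up the abstract describes.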




Modeling the Impact of Receptor Morphology on Neural Touch Information

Authors:
Daine R Lesniak, Gregory J Gerling, Ellen A Lumpkin, University of Virginia, Charlottesville

Abstract:
The sense of touch is vital to human health and independent living, yet our knowledge of touch is insufficient for its recreation through artificial neural feedback in upper limb prostheses. For this, the generation of the neural signals that carry touch information to the brain must be better understood. For example, it is not yet known how differing configurations of the slowly adapting type I (SAI) mechanoreceptor's branching end organ affect its signaling of edge and curvature information, nor is it understood how interactions within this branching structure give rise to the distinctive SAI response. To address these gaps, an SAI model is developed with reconfigurable receptor morphology. The model combines finite element analysis of skin, fitted functions of receptor transduction, and leaky integrate-and-fire models of neural dynamics. Model responses are compared to experimentally recorded SAI responses, and the effects of receptor morphology and internal interactions are examined. In addition to facilitating artificial neural feedback, this work may impact the design of sensors and human-machine interfaces in medical environments such as surgical robots and training simulators.
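
For context on the last of the three model components named above, a generic leaky integrate-and-fire neuron can be sketched in a few lines of Python. This is the textbook form, tau · dV/dt = -(V - V_rest) + R · I with a threshold and reset, integrated by the Euler method; every parameter value and name here is an illustrative assumption and none is taken from the SAI model in the abstract.

```python
def simulate_lif(current, dt=0.1, tau=10.0, resistance=1.0,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Return spike times (ms) for an input current sampled every dt ms."""
    v = v_rest
    spikes = []
    for step, i_in in enumerate(current):
        # Euler step of tau * dV/dt = -(V - v_rest) + R * I
        v += dt * (-(v - v_rest) + resistance * i_in) / tau
        if v >= v_thresh:
            spikes.append(step * dt)  # record the spike, then reset the membrane
            v = v_reset
    return spikes


# Constant inputs over 100 ms (hypothetical units): stronger drive, more spikes.
weak = simulate_lif([0.5] * 1000)
medium = simulate_lif([2.0] * 1000)
strong = simulate_lif([4.0] * 1000)
```

In a model of the kind described, the input current would come from the transduction functions applied to the finite element skin output rather than from a constant.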




Adaptive Information Retrieval: Automated Harvesting of Salient Articles

Authors:
T Elizabeth Workman, John F Hurdle, University of Utah

Abstract:
Semantic MEDLINE, a natural language processing application which processes bibliographic data, is implemented in four separate phases: information retrieval, semantic predication generation by SemRep, summarization of SemRep output, and visualization of its findings. The critical step of summarization is dependent on manually coded schemas which dictate specific arguments to be gleaned from SemRep output. Manual schema coding is expensive, time-consuming, and requires an expert. Our research shows that if a user expresses a concept of interest and a point of view (e.g. treatment), statistical techniques can dynamically predict the salient arguments in SemRep output, thus eliminating the need for multiple, manually coded schemas. To identify salient SemRep data, we developed a statistical technique to analyze semantic predications in the curation concept of bladder cancer from the point of view of genetic etiology. This technique outperformed a manual schema in harvesting salient predications. Implementation of this statistical approach in data summarization could provide adaptive information retrieval, freeing users to express their precise information needs.




Probabilistic Multiattribute Anonymization for Biomedical Databases

Author:
J Mark Ettinger, University of Texas Health Science Center at Houston

Abstract:
Large biomedical databases potentially contain vast amounts of undiscovered knowledge. Sharing data among a wide array of researchers is vital to unlocking these new scientific discoveries. However, it is also important to protect the privacy of those whose medical records are stored in these databases. This may be accomplished by anonymizing the data sufficiently to prevent reidentification of individuals through the use of identifying characteristics contained in the database.

We present a Bayesian approach to anonymizing databases. We discuss maximum entropy methods for deriving a prior distribution, define the formal, probabilistic inference for assessing privacy, and introduce an anonymization scheme analogous to substitution ciphers in cryptography. Throughout we emphasize the viewpoint of probabilistic databases as a unifying theme in anonymization technologies.




Themes and Trends in IT and Informatics from 60 Cancer Centers

Author:
Paul A Fearn, University of Washington

Abstract:
The NIH is investing considerable funding in Clinical and Translational Science Awards (CTSAs) and NCI-designated cancer centers. Informatics and IT efforts at cancer centers are often expensive and significantly impact organizations, yet we know little about the context, efforts and innovations at other centers. From 2008 to 2009, the author visited 61 cancer centers around the US and met with 394 people to explore issues and trends in informatics and IT.

The data collected from these meetings was organized using a qualitative analysis tool, and a number of distinct patterns are evident. This paper will review findings regarding electronic medical record and clinical data repository implementations, selection and implementation of clinical research systems, struggles with biospecimen information management, curation and technical support of research databases, the persistence of Access and Excel databases, the strong rise of caTissue, REDCap and i2b2, patterns in database and web development platforms, caBIG issues, and social/organizational issues.

There are outstanding examples of informatics skills, experience, methods and tools across CTSA and cancer center sites. To make the best use of resources for informatics and IT, centers should look systematically for opportunities to collaborate, cross-pollinate ideas, adopt solutions from other groups, and align their efforts with broader trends.




The Impact of Organizational and IT Factors on Clinical Decision Support Adoption

Authors:
Nareesa Mohammed-Rajput1, Elizabeth Yano2, Bradley Doebbeling1
1Richard Roudebush VA Medical Center, Indianapolis, Indiana
2VA Greater Los Angeles Healthcare System, Los Angeles, California

Abstract:
Our objective was to determine the environmental and organizational factors associated with clinical decision support (CDS) implementation and adoption, and to determine what clinical practice and IT infrastructure and strategy factors are associated with CDS development, implementation and adoption.

A survey of VA Chiefs of Staff and Ambulatory Care Directors was performed in 2007 and 2008 (~85% participation). Four major variables will be analyzed: extent of CDS implementation, CDS development methods used, CDS use to promote clinical practice guidelines (CPGs), and CPRS template use to promote CPG adherence.

To model the extent of CDS implementation, we will use linear regression. To model both the CDS and CPRS templates used to promote CPGs, we will use ordinal logistic regression. To model the type of CDS development methods used, we will use generalized logistic regression. Once the multiple predictor model is fit, statistical assumptions underlying the model will be validated.

The rationale for this study is to inform understanding of effective approaches and metrics for future implementation of CDS tools. This strategy will help identify and remove barriers to CDS implementation and therefore maximize the effectiveness of the CDS.




Health Information Technology Systems Profoundly Impact Users: A Case Study in a Dental School

Authors:
Heather K Hill, Denice C L Stewart, Joan S Ash, Oregon Health & Science University

Abstract:
The purpose of this study is to increase our understanding of the impact of Health Information Technology Systems (HITS) on dental school users when the systems are integrated into chairside patient care. We used qualitative research methods, including interviews, focus groups and observations, to capture the experiences of HITS users at a single institution. Users included administrators, clinical faculty, predoctoral students, support staff and residents. The data were analyzed using a grounded theory approach and nine themes emerged: 1) HITS benefits were disproportionate among users, 2) Communicating about the HITS was challenging, 3) Users experienced a range of strong emotions, 4) The instructor persona diminished, 5) There were shifts in the school's power structure, 6) Allocation of end-users' time shifted, 7) The training and support needs of end-users were significant, 8) Perceived lack of HITS usability made documentation cumbersome for clinicians, and 9) Clinicians' workflow was disrupted. HITS integration into patient care impacts the work of all system users, especially end-users. The themes highlight areas of potential concern for implementers and users in integrating a HITS into patient care.




DAY 1 POSTER ABSTRACTS

Chronic Illness Management Models and HIT Reform: A Systematic Review of the Literature

Authors:

Rhonda Renee Archie, Suzanne Austin Boren, University of Missouri, Columbia

Abstract:
The objective of this paper was to provide an overview of available evidence and evaluate the chronic disease management (CDM) model, chronic care model (CC), and patient-centered primary care collaborative (PCPCC) model principles and their various components as they relate to health information technology (HIT). The authors identified reports of the chronic illness disease management process through systematic electronic database searches. The eligibility criteria were 1) chronic illness intervention model, and 2) HIT usage. Among the eligible articles, both the CDM and CC models showed potential benefits after implementation. Outcomes related to HIT usage were identified. The American Recovery and Reinvestment Act funding has the potential to fund evidence-based clinical guidelines and the development of community-based prevention and wellness programs that address chronic disease rates. The initial evidence suggests that a PCPCC model based on physician-led interdisciplinary team care can play a significant role in the future.



Reconstruction of Signaling Transduction Networks from Mass Cytometry Data

Authors:

Robert V Bruggner, Michael Linderman, Karen Sachs, Nikesh Kotecha, Garry Nolan, Stanford University

Abstract:
Aberrant intracellular signaling plays a key role in numerous lethal diseases related to cellular malfunctions (e.g. cancer). Accordingly, an understanding of cell signaling cascades provides crucial insight into disease mechanism and can play a critical role in patient diagnosis and treatment. To facilitate high-throughput analysis of signaling components, instrumentation technologies such as flow cytometry have emerged that enable high-throughput, simultaneous measurement of intra- and extracellular molecules of a single cell. Prior work in Bayesian network inference demonstrates the ability to automatically reconstruct signaling cascades from flow cytometry data. We expand on this work and present here a hardware-accelerated Markov Chain Monte Carlo (MCMC) implementation to learn signaling cascades from single cell data. Additionally, we present the results of utilizing this pipeline to reconstruct T-Cell receptor signaling cascades in Jurkat cells and discuss implementation issues such as discretization of cytometry data, order and graph sampling methods, and execution time in CPU and GPU based architectures. As the number of simultaneous, single-cell measurements continues to increase, automated approaches such as the one described here will play a crucial role in describing aberrant signaling and provide key insights into disease mechanism and potential disease-causing populations.



A Successful Model for Organizational Engagement in Bridging to CPOE

Authors:

Kevin C Chang1, Sheraz F Noormohammad2, Alan D Snell3, JM McCoy1
1Indiana University School of Medicine and Regenstrief Institute, Inc., 2Zanett Inc., and 3St. Vincent Health, Indianapolis, IN

Abstract:
The American Recovery and Reinvestment Act of 2009 included a section of stimulus funds devoted to promoting the use of Electronic Health Records (EHRs), under the HITECH Act. Though "meaningful use" criteria continue to be refined, it is known that one requirement will include the capability for EHR systems to utilize Computerized Physician Order Entry (CPOE). While studies have looked at variations to approaching CPOE implementation, the majority of institutions in the United States do not have CPOE functionality. A potential reason for this is lack of organizational engagement and allocation of appropriate resources. St. Vincent Health, a large multi-hospital health care institution in Indiana, and a ministry of its national parent organization Ascension Health, has successfully designed and executed a unique model of organizational engagement allowing them to bridge their hospitals to CPOE effectively and efficiently. This paper will describe the efforts by St. Vincent Health in reaching these goals including the creation of standardized order sets as an intermediary to CPOE. This timely demonstration may have potential usefulness to organizations undergoing a similar process in designing their own model of organizational engagement and trying to meet "meaningful use" criteria by the proposed deadline of 2011.



Voted Best Poster, Day 1
A Molecular Dynamics Study of the AQP0-CaM Interaction

Authors:

Daniel M Clemens, J Alfredo Freites, Katalin Kalman, Doug J Tobias, and James E Hall, University of California, Irvine

Abstract:
Aquaporin 0 (AQP0, previously MIP) is a water transport channel comprising ~60% of protein in lens cell membranes. The water permeability of AQP0 is regulated by intracellular calcium levels through calmodulin (CaM) binding. Altered sensitivity to calcium levels in response to phosphorylation of the C-terminal α-helix has also been observed. The purpose of this study is to elucidate the molecular mechanisms by which AQP0 is regulated by CaM. Specifically, we use atomistic molecular dynamics simulations to explore how various states of phosphorylation influence the interaction between the C-terminal domains of AQP0 in the native (tetrameric, membrane embedded) form with CaM. Our simulations show that CaM can interact with the C-terminus of AQP0 stably in more than one conformation, indicating that there may be various conformational states of the AQP0-CaM complex resulting from different physiological conditions. Furthermore, our simulations of the AQP0-CaM complex confirm previous hypotheses that AQP0 is regulated by calcium through the occlusion of the transmembrane pores of the AQP0 tetramer by CaM. Additionally, our simulations give insight into the molecular mechanisms behind the altered calcium sensitivity of AQP0 in various states of C-terminal helix phosphorylation.



Interdisciplinary Taxonomy and Nursing Documentation for Patient-Centered Handoff

Authors:

Sarah Collins, David Vawdrey, Department of Biomedical Informatics, Columbia University

Abstract:
Aim: To identify the overlap between nursing and physician handoff processes and to derive knowledge from electronic nursing documentation to improve handoff communication and predict inpatient complications.

Methods: 1) Content analysis of published nurses' and physicians' handoff taxonomies; 2) Analysis of nursing documentation for 213 cardiac arrest patients from 2008-2009 entered 0-48 hours prior to cardiac arrest.

Results: Four nursing and 10 physician taxonomies were analyzed. Twenty-six interdisciplinary, 16 nursing-specific, and 12 physician-specific taxonomy elements were identified. Preliminary analysis of electronic nursing documentation indicated that 63.6% of patients' structured assessment flowsheets were contextualized by optional narrative 'comment' fields entered by nurses. 58.5% of comments discussed a related intervention (e.g., "Levophed increased to 13 when MAP [blood pressure] decreased after starting CVVH [dialysis]"), and 14.0% indicated that the nurse notified a physician. Narrative notes frequently discussed abnormal findings.

Conclusion: An interdisciplinary handoff taxonomy and narrative documentation that contextualizes structured EHR data may be useful to standardize and highlight data to support patient-centered handoff. Further analysis of the patterns and content of nursing documentation, including natural language processing, may be helpful for predicting patient complications.



Effects of an Integrated Cognitive Support Tool on Provider Performance

Authors:

James L Hellewell, Charlene R Weir, Jonathan R Nebeker, Salt Lake City George E. Whalen VA Medical Center, Salt Lake City UT, and University of Utah Departments of Biomedical Informatics and Internal Medicine, Salt Lake City UT

Abstract:
BACKGROUND: Many studies have shown benefits of clinical decision support systems (CDSS). However, some CDSS have failed to improve various outcomes of interest. The factors that make one CDSS better than another are not well understood. Some researchers have recommended more emphasis be placed on understanding underlying cognitive principles, such as situation models and mental representations of the goals of care.

We propose to test the effect of displaying clinical information organized around goals of care, one function of a new interface design called the Integrated Medication Manager (IMM). Unlike a traditional interface, IMM displays goal interactions graphically as relationships between problems, medications and measured observations. We expect our intervention to decrease the cognitive effort required to review clinical information at the point of care. Accordingly, we anticipate its benefits will be most apparent under conditions of high cognitive load (e.g. in the presence of interruptions).

METHODS: We will use a 2 between (+/- goal display) by 2 within (+/- interruptions) randomized design. A convenience sample of 24 primary care providers will be recruited to participate in this study. Each provider's performance (i.e. accuracy and recall) will be evaluated using 4 pre-designed clinical vignettes.



Identifying Adverse Drug Events Using Multi-Relational Subgroup Discovery

Authors:

Larry A Hendrix, David Page, University of Wisconsin-Madison

Abstract:
The U.S. Food and Drug Administration (FDA) has taken the lead in developing and implementing safety standards for pharmaceutical drugs. Product approval is granted when the FDA determines in the premarket phase that a product's benefits outweigh the risks associated with its use by a target population. Although the preapproval process is rigorous and involves randomized, controlled clinical trials, previously unanticipated adverse drug events (ADEs) are often observed when the drug is released on the market to a larger, more diverse population. Therefore there is a need for continued post-marketing surveillance of drugs to identify previously unanticipated ADEs. We cast this task as a machine learning problem where our goal is to predict and/or detect ADEs using patient electronic medical records (EMRs). We provide an initial evaluation of this approach based on experiments with synthetic data provided by the Observational Medical Outcomes Partnership (OMOP).



The Crimson Asthma Project

Authors:

Blanca E Himes, Ross Lazarus, Lynn Bry, Isaac S Kohane, Scott T Weiss, Harvard University

Abstract:
Electronic medical records have become a rich and effective source to identify large numbers of subjects for research studies, including genomic studies of complex diseases. The Crimson Asthma Project (CAP) consists of over 10,000 Partners Healthcare patients who were selected for genomics studies of asthma and related phenotypes. Specifically, we identified patients based on extracted de-identified electronic medical record data using resources from the i2b2 project, a National Center for Biomedical Computing, and we have obtained DNA for these subjects via the Crimson Project, which collects discarded clinical samples from routine healthcare visits. In a pilot study, we have genotyped 220 asthma cases, selected on the basis of ICD-9 codes and medication history, and 853 controls at variants in 39 genes that have been found to be associated with asthma in at least two previous independent populations. Several such associations replicated in CAP at a marginally significant level (i.e. p<0.05), including the association of variants near ORMDL3 on chromosome 17q21, the region most consistently associated with asthma. In addition to providing further evidence of association of some genes to asthma, we show that the CAP is a cost-effective and efficient way to gather a large population for asthma genomics studies.



Methods and Modular Architecture for Automated Hypoglycemia Prevention

Authors:

Colleen S Hughes, Stephen D Patek, Marc Breton, Boris Kovatchev, University of Virginia, Charlottesville

Abstract:
The most promising contemporary treatment of diabetes is closed-loop control - a class of emerging "artificial pancreas" systems that use computerized algorithms to control blood glucose (BG) levels in diabetes. Continuous Glucose Monitors (CGM) serve as sources of real-time information for both automated closed-loop control and advice to patients allowing them to avoid hypo- and hyperglycemic events. Insulin is delivered by programmable insulin pumps. In our research, we have specified and are evaluating a modular architecture for the implementation of automated devices for the closed-loop control of type 1 diabetes. The modular architecture features separate interacting components responsible for continuous safety supervision (e.g. prevention of hypoglycemia), real-time control of insulin delivery, and tailoring of the control to the metabolic and behavioral aspects of individuals. In the process of this research, we are examining the role that CGM can play in algorithms that automatically attenuate insulin pump basal rates, using a quantitative assessment of the risk of hypoglycemia. We have shown that with the addition of insulin pump data and the use of "rescue" carbohydrates when hypoglycemia is otherwise imminent, we can significantly reduce its incidence. We are currently evaluating the efficacy of our methods via extensive computer simulation and in human clinical trials with a hardware/software platform that complies with the modular architecture above. Our results suggest that new methods for hypoglycemia prevention will prove useful in conventional pump therapy, as well as in future open- and closed-loop control systems.



Using Sequence Algorithms to Investigate the Performance of Annotation Matching Services

Authors:

Kevin McDade, Uma Chandran, Alex Lisovich, Madhavi Ganapathiraju, Roger Day, University of Pittsburgh

Abstract:
Investigators who wish to integrate gene expression microarray and MS/MS proteomics data rely on annotation retrieval systems to find cognate annotations that represent each mRNA and its translated protein. Integromics analysis, which depends on proper annotation matching, can be a powerful tool in cancer biomarker studies and systems biology applications. A number of integrated experiments have reported discordance between mRNA and protein levels of what should be the same gene product. There are many documented biological and non-biological reasons for this discordance. A 'back to the basics' approach of using the sequence data as the gold standard may be warranted to investigate the contribution of sequence mismatch to the discordance observed in integrative studies. Using a merged data query from three annotation services, the relationship of the probe set target sequences of the Affymetrix U133 2.0 plus array and each cognate protein is evaluated using sequence analysis algorithms. The comparison metrics are obtained through the usage of the BLASTX, TBLASTN, and n-gram algorithms. This study evaluates the agreement and disagreement between the annotation service mapping output and three mapping outputs generated from the sequence algorithm methods.



Exploration of a SOAP model for Emergency Department Reports

Authors:

Danielle Mowery, Jan Wiebe, Pam Jordan, Wendy Chapman, University of Pittsburgh

Abstract:
For many applications in biomedicine, we rely on information extraction to retrieve data from free-text that is relevant for a particular end goal. Understanding whether a condition is mentioned as a subjective (S) patient symptom, an objective (O) test result, a possible diagnostic assessment (A) or hypothetical event in a plan (P) is important for tasks that require highly accurate data retrieval. We are currently investigating whether the SOAP model traditionally used to structure progress notes can be applied to emergency department reports.

In a pilot annotation study of 10 reports (734 sentences), we found the SOAP model can be annotated with high agreement (kappa 81.9%) and most sentences can be classified using the SOAP framework (95%). We developed SOAP classifiers using supervised (support vector machines or SVM) and unsupervised (bootstrapping using conditional probability) methods trained on n-grams. SVM shows respectable f-measure for more prevalent classes (S: 95.6%; O: 94.5%) and clear places for improvement in less prevalent classes (A: 67.7%; P: 78.0%). The unsupervised classifier's precision ranged from excellent (S: 95.2%; O: 95.7%) to good (A: 65.1%; P: 69.7%). Currently, we are developing a larger training and test set and are incorporating syntactic and semantic features to improve performance.
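Illustrative aside, not part of the authors' abstract: the abstract reports inter-annotator agreement as a kappa value but does not say which variant was used. The sketch below assumes Cohen's kappa for two annotators, and the function name and the S/O/A/P label lists are hypothetical examples, not study data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical sentence labels from two annotators (S, O, A, P classes).
annotator_1 = ["S", "S", "O", "O", "A", "P"]
annotator_2 = ["S", "S", "O", "A", "A", "P"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

In this toy example the annotators agree on five of six sentences, but kappa is lower than the raw agreement because some of that agreement would be expected by chance.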



Optimization of MRI Processing Pipelines in Neuroscience

Authors:

Nolan Nichols, Jim Brinkley, University of Washington

Abstract:
The application of computable anatomical knowledge resources to arbitrarily organize annotated information in neuroinformatics databases has the potential to rapidly accelerate scientific discovery. Similar to "in-silico" experiments using bioinformatics databases in systems biology, the annotated information produced by image processing pipelines in neuroimaging research provides additional perspective when reorganized and reanalyzed. The need to accelerate the analyses from image-processing pipeline results will require neuroinformatics databases to organize data in a manner amenable to rapid analysis similar to the bioinformatics databases of systems biology. Brain image-processing pipelines automatically annotate values derived from segmented brain regions and fiber bundles/tracts using terms that are being actively incorporated into the FMA. However, web accessible, fully annotated brain-imaging databases are not widely available like those of the genetic and proteomic databases.

We propose a semantic web application to generate views into annotated multi-modal brain-imaging databases. Previous work demonstrated how views were used to integrate fMRI activation data with anatomical knowledge for specific queries. We will extend this work to allow saving multimodal brain imaging data views, which can then be queried as an information source. Creating views that render brain-imaging datasets more manageable and ready for automated meta-analysis will further accelerate the rate at which scientific discoveries are made.



Predicting 30-day Hospital Readmission for Heart Failure Inpatients

Authors:

Julia O'Rourke, Kamal Jethwani, Aurel Cami, Alice Watson, Adrian Zai, Harvard University

Abstract:
Heart Failure (HF) affects an estimated 4.9 million Americans and accounts for approximately one million hospitalizations with an estimated cost to healthcare of $34.8 billion in 2008. Up to half of all patients with heart failure are readmitted to the hospital within 3 to 6 months of discharge. Identifying patients at high risk of readmission is critical for optimizing the distribution of limited resources. We identified a number of behavioral characteristics potentially associated with readmission and extracted them from the note field in the EMR. These variables included: 1) lives alone; 2) non-compliance; 3) refusal; 4) declined care; 5) anxiety; 6) depression and 7) confusion. We then developed a Logistic Regression model to predict 30-day hospital readmission risk using data from 744 patients with a HF admission at Massachusetts General Hospital between 10/1/2007 and 9/30/2008. Starting with 30 potential variables our best predictive model (c-statistic of 0.75) included the following: 1) payor type; 2) whether the word "refuse" appeared in the notes; 3) number of previous admissions with heart failure; and 4) total number of hematocrit tests performed in the last year. This model highlights the value of certain behavioral predictors and demonstrated greater predictive ability than previously published models.
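Illustrative aside, not part of the authors' abstract: the c-statistic quoted for the model can be read as the probability that a randomly chosen readmitted patient received a higher predicted risk than a randomly chosen patient who was not readmitted. The sketch below computes it by direct pair counting; the function name and the example risks and outcomes are hypothetical, not study data.

```python
def c_statistic(risks, outcomes):
    """Concordance (c) statistic for a binary outcome.

    Counts every (event, non-event) pair; a pair is concordant when the
    event case has the higher predicted risk. Tied risks count as half.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for event_risk in events:
        for non_event_risk in non_events:
            if event_risk > non_event_risk:
                concordant += 1.0
            elif event_risk == non_event_risk:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks and observed outcomes (1 = readmitted in 30 days).
example_risks = [0.9, 0.7, 0.4, 0.3, 0.2]
example_outcomes = [1, 0, 1, 0, 0]
print(round(c_statistic(example_risks, example_outcomes), 3))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, which is the scale on which the reported 0.75 should be read.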



Development of a Hyper-graph Methodology for Analyzing Transcription Factor Networks

Authors:

Sudhir Perincheri, Emad Ramadan, Lyndsay Harris, David Tuck, Yale University

Abstract:
Gene expression profiling has proved useful in delineating molecular subtypes of breast cancer and in the development of various prognostic signatures. We are developing an analytical pipeline that uses hypergraph networks to characterize the transcriptional regulators of these subtypes and signatures. This method denotes the transcription factors in the network as a set of vertices and their gene targets as hyperedges of a hypergraph. A hyperedge differs from an edge of a graph in that it can be a subset of two or more vertices. A network of genes and the candidate transcription factors regulating them is built using this method. Our model then uses a vertex cover problem approach to predict the most important transcription factors regulating the network. We can also use this approach to find transcription factor combinations regulating different functional subsets of genes in the network.

We are optimizing our algorithms using gene expression datasets generated by experiments investigating estrogen receptor mediated transcriptional events. We intend to use this approach on breast cancer data sets and in ongoing clinical trials to generate hypotheses about transcription factors regulating the molecular subtypes of breast cancer and the determinants of therapeutic response to drugs used in our clinical trials.
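Illustrative aside, not part of the authors' abstract: with transcription factors as vertices and each gene's candidate regulators as one hyperedge, a vertex cover is a set of transcription factors that includes at least one regulator of every gene. The abstract does not say how the cover is computed, so the sketch below assumes a simple greedy heuristic, which does not guarantee a minimum cover. The function name and the gene and transcription factor names are hypothetical.

```python
def greedy_tf_cover(gene_regulators):
    """Greedy vertex cover on a hypergraph.

    gene_regulators maps each gene to the set of candidate transcription
    factors (TFs) regulating it, i.e. one hyperedge per gene. Returns a list
    of TFs such that every gene with at least one candidate has a chosen one.
    """
    uncovered = {gene for gene, tfs in gene_regulators.items() if tfs}
    chosen = []
    while uncovered:
        # Count how many still-uncovered genes each TF would cover.
        counts = {}
        for gene in uncovered:
            for tf in gene_regulators[gene]:
                counts[tf] = counts.get(tf, 0) + 1
        # Pick the TF covering the most genes; break ties alphabetically.
        best = min(counts, key=lambda tf: (-counts[tf], tf))
        chosen.append(best)
        uncovered = {g for g in uncovered if best not in gene_regulators[g]}
    return chosen


# Hypothetical network: four genes and their candidate regulators.
example_network = {
    "geneA": {"TF1", "TF2"},
    "geneB": {"TF2", "TF3"},
    "geneC": {"TF2"},
    "geneD": {"TF3", "TF4"},
}
print(greedy_tf_cover(example_network))
```

In this toy network the first pick is the factor shared by three genes, and one further factor covers the remaining gene.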



Maximum Entropy Machine Learning for Characterizing the Content of the Literature on Transcription Factors

Authors:

Terry H Shen, Michael F Ochs, Harold P Lehmann, Johns Hopkins University

Abstract:
Transcription factors (TFs) play an important role in both gene expression and regulation through their binding to cis-acting elements in the promoter. There are currently many databases that include manually curated information on transcription factors, but provision of such information is time-consuming and not up to date. Furthermore, current transcription factor databases do not capture information on the context and effects of TFs. Our proposed system, TFExtract, uses a text mining method for the identification of TFs and their contextual information that current manual approaches lack. For this study, we propose to use the maximum entropy machine learning approach in order to automatically extract sets of transcription factor relationships inherent in the data. Information retrieval is first conducted in order to identify a representative corpus of literature related to TFs. The system will use a 100-article training set. Training will be done by having four experts in the area of transcription factors manually annotate the training set. The system will be tested upon a 100-article testing set. We hypothesize that the TFExtract system will yield a relatively high F-score as compared to comparable systems.



Prediction of Transcription Factor Binding Sites Using Both Sequence and Expression Information from Multiple Species

Authors:

Elizabeth Siewert and Katerina Kechris, University of Colorado, Denver

Abstract:
Genome-wide detection of transcription factor binding sites (TFBSs) is difficult because they are short, degenerate sequences buried in unknown locations within regulatory regions of a gene. Earlier prediction methods incorporated genome-wide expression data and promoter sequences into a linear-model framework, regressing measures of expression onto counts of putative TFBSs in promoters for a single species. More recently, it has been shown that including genomic sequence data from multiple species improves the predictive ability of this regression model. We describe two extensions of this single-species, linear regression method using a multivariate modeling framework. The resulting algorithms extend the search space to both sequence and expression information from all available genes across multiple species. We also constrain our multivariate models to account for the phylogenetic relationships among the multiple species. We show that the multiple-species method results in an improvement in the prediction of TFBS over the single-species method using several evaluation criteria. In summary, utilizing both sequence and expression data across species appears to be beneficial for genome-wide prediction of TFBSs.



Impact of Electronic Prescribing Intervention on Generic Drug Prescribing

Authors:

Shane P Stenner, Qingxia Chen, Kevin B Johnson, Vanderbilt University

Abstract:
Generic prescribing rates remain low in the U.S., despite decades of policy work and recommendations from national organizations. We sought to evaluate the effectiveness of a human-computer interface intervention to increase electronic (e-) prescribing of generic medications.

We reviewed retrospective e-prescribing data from >2,000 prescribers (attending physicians, house staff, and advanced practice nurses) using prescribing logs from July 1, 2005 through September 30, 2008. We applied an interrupted time-series design to assess the rate of generic prescribing before and after adding decision support to suggest generic alternatives. To account for unmeasured covariates, we compared generic prescribing rates using e-prescribing with a random sample of hand-generated prescriptions completed before, immediately post-intervention, and in the final month of the study.

Proportion of generic medication prescriptions increased from 32.0% to 54.1% following intervention (22.1% increase, 95% CI: 21.8%-22.3%), with no diminution in magnitude of improvement post-intervention. There was a larger change in generic prescribing rates among authorized prescribers (24.3%) than nurses (18.4%; adjusted ORs=1.36, 95% CI [1.33-1.40]). Two years post-intervention, proportion of generic prescribing remained significantly higher for e-prescriptions (58.5%; 95% CI [58.1%, 58.9%]) than for all other prescriptions (37.4%; 95% CI [34.9%, 39.9%]); p-value < 0.0001. Changes to an e-prescribing user interface were associated with dramatic and sustained improvements in the rate of outpatient generic prescribing across all specialties.



Reverse Non-Formulary Inter Facility Consults as Quality Improvement Project

Authors:

Yuriy Y Vinokur MD, W Paul Nichol MD, Kenric Hammond MD, Veterans Affairs Puget Sound Health Care System, Seattle, WA

Abstract:
Smaller VA hospitals depend on Inter-facility Consults (IFC) to refer patients to tertiary care facilities for services not available at smaller sites. Frequently the referral process requires advance completion of studies plus additional follow-up care, neither of which may be available at the referring site. EMR capabilities permit innovative solutions, but effective implementation requires considering external constraints, such as workflow and business rules. Fourteen providers from 4 referral sites in VISN 20 completed a 60-question email survey of practice patterns and issues involved in the IFC process. We evaluated major obstacles to satisfactory consult completion.

Primary care providers had difficulty completing nephrology and neurology consultation pre-requisites because recommended lab tests were not available routinely. Primary care providers also reported difficulty obtaining local approval of remotely recommended non-formulary medication, causing delays in care and provider discomfort in adjusting doses of unfamiliar specialist-recommended medications.

Historically, smaller VA facilities have used IFC to obtain specialized services available at tertiary care facilities within their VA service network. To resolve identified problems, we suggest creating a "reverse non-formulary medication consult" enabling specialists to request authorization of restricted drugs at the remote site. We plan a follow-up survey of providers to prioritize requisite tests and medications.



Evolutionary Importance Continuity Improves Protein Functional Site Discovery

Authors:

Angela D Wilkins, Rhonald Lua, Serkan Erdin, R Matthew Ward, and Olivier Lichtarge, Baylor College of Medicine

Abstract:
Protein functional sites control most biological processes and are important targets for drug design and protein engineering. To characterize them, the Evolutionary Trace (ET) ranks the relative importance of residues according to their evolutionary variations. Top-ranked residues cluster spatially into evolutionary hotspots that predict functional sites in structures. Various functions that measure the physical continuity of ET ranks among neighboring residues in the structure or sequence, i.e. clustering, are shown to inform sequence selection and to improve functional site resolution. First, in 110 proteins, the overlap between top-ranked residues and actual functional sites rose by 8% in significance. Then, on a structural proteomic scale, optimized ET led to better 3D structure-function motifs and to enzyme function prediction by the Evolutionary Trace Annotation method with better sensitivity (40 - 53%) and positive predictive value (93 - 94%), suggesting that the similarity of evolutionary importance among neighboring residues in the sequence and in the structure is a universal feature of protein evolution. In practice, this yields a tool for optimizing sequence selections for comparative analysis and for better predictions of functional site and function. This should prove useful for the efficient mutational redesign of protein function and for pharmaceutical targeting.



A Usability Evaluation of Cerner First Net at the University of Utah Hospital

Authors:

Neelam Zafar, Bruce Bray, John Hurdle, Nancy Staggers, University of Utah

Abstract:
The American Recovery and Reinvestment Act (ARRA) of 2009 has made available billions of dollars to adopt and "meaningfully use" certified electronic health records (EHRs). The usability of EHR systems is recognized as critical for successful adoption and meaningful use but little systematic evidence has been gathered on the usability of EHRs in practice. The University of Utah hospital implemented a new Emergency Department (ED) Computerized physician order entry (CPOE) module; its usability had not yet been formally evaluated. The goal of this study was to perform an expert usability evaluation on the ED CPOE module at the University Hospital. Four usability experts formally trained in usability evaluation conducted the evaluation. The preliminary findings of this study reveal major violations of the heuristics titled: Match with the Real World, Consistency, Minimalist (extraneous information is a distraction) and Document (Help & Documentation). We conclude from the study that Heuristic Evaluation methodologies can be a cost-effective and efficient approach to discovering usability problems. In this study we not only discovered the flaws in the design of the ED CPOE user interface but also verified the important role that an interface design can play in end user adoption of critical systems.



DAY 2 POSTER ABSTRACTS

Linguistic Summarization of Human Activity

Author:

Derek Anderson, University of Missouri, Columbia

Abstract:
Linguistic summarization is central to the reliable succinct modeling and inference of human concepts regarding activity. It is also asserted that the inherent and unavoidable uncertainty in this domain is both linguistic and fuzzy. Advantages of the proposed work include the generation of human interpretable confidences, improved rejection of unknown activity, information reduction, complexity management, and the detection of adverse events. Specifically, a vision-based hierarchical soft-computing linguistic summarization framework is proposed. First, images are summarized through the identification of the human and a three-dimensional object called voxel person is built. Next, approximate reasoning is used to linguistically summarize the state of the human at each moment, i.e. image, using features extracted from voxel person. Subsequently, temporal linguistic summarizations are produced from the state membership time series. State summaries are used to infer activity, which are also linguistically summarized and can be subsequently used in a similar hierarchical fashion to recognize additional more specific types of higher-level activity. A demonstration is provided for the goal of elderly activity recognition. The system parameters are designed under the supervision of nurses. The results are compared to probabilistic graphical models for three data sets consisting of student and nurse trained and supervised stunt actor activities.



Effects of Search Result Translation on User Performance

Authors:

Steven Bedrick, Bill Hersh, Oregon Health & Science University

Abstract:
The overwhelming majority of today's medical and scientific literature is published in the English language. This represents a significant challenge to clinicians and researchers in parts of the world where English is not a native language. If these individuals wish to make use of the latest medical information, they must do so while reading in a foreign language: a cognitively difficult task, at best. Machine translation (MT) and cross-language information retrieval (CLIR) have the potential to assist non-native English speakers to access English-language medical information. For example, a literature search system could potentially automatically translate its results into one or more languages, or provide its users with other forms of linguistic support. This poster describes a user study conducted with Latin American physicians. The aim of the study was to investigate whether automatic result translation affects user performance at completing a document-selection task. In addition to examining the effects of automatic result translation, our study explores the influences of other user factors (such as English proficiency) on performance.



Real Time Surveillance of Influenza/Pneumonia Using Grid Computing and Death Records

Authors:

Kailah Davis1, Catherine Staes1, Ronald Price1, Carol Friedman2, Julio C Facelli1
1University of Utah
2Columbia University

Abstract:
Surveillance of deaths due to influenza and pneumonia using death records has the potential to be a relatively inexpensive, real-time approach to tracking and detecting influenza and respiratory illness outbreaks. At present, however, no such system exists because of delays in processing death records: coded national vital statistics data are not available for 2-3 years. An ongoing system tracks influenza and pneumonia deaths in 122 US cities, but it is partially manual and inflexible. This poster presents the rationale for designing a real-time surveillance system, based on mortality data, that uses grid and natural language processing tools to address the current barrier that coded death certificate data are not available for several months. To develop a public health tool that delivers timely surveillance of influenza and pneumonia using death certificates from the Utah Department of Health, we integrated grid data services, a grid version of the natural language processing tool MedLEE, and an analytical tool to monitor levels of mortality. This example demonstrates how local, state, and national authorities can automate their Influenza and Pneumonia Surveillance System and expand the number of reporting cities.



Privacy Preserving Medical Record Integration with Fault Tolerance

Authors:

Elizabeth Durham, Vanderbilt University, Yuan Xue, Vanderbilt University, Murat Kantarcioglu, University of Texas at Dallas, Brad Malin, Vanderbilt University

Abstract:
Medical data sharing is important for both patient care and research. Repositories, such as health information exchanges and the database of Genotypes and Phenotypes, pool information from multiple, disparate institutions. This sharing of patient data must uphold patient privacy. In fact, deidentifying patient data before sharing for research is a federal requirement.

When integrating data, identifying records that refer to the same individual is essential. This prevents both underpowered associations, due to failure to integrate disparate pieces of an individual's information, and false associations based on duplicate records in the data. Existing privacy-preserving record linkage techniques are based upon encrypting and comparing patient information. However, this approach is not fault tolerant, in that it takes away the ability to determine similarity amongst features.

This work evaluates a privacy-preserving, fault-tolerant record linkage method, based on encryption and Bloom filters, in the medical domain and compares this approach to two existing methods. A set of 1,000 records containing demographic information of patients at Vanderbilt is linked to corrupted versions of these records. The privacy-preserving, fault-tolerant method achieves higher recall and precision than previous methods and demonstrates that highly accurate privacy-preserving integration of medical records is possible and computationally feasible.
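
As an illustration of why Bloom filter encodings tolerate typographical errors where exact encrypted matching does not, the following minimal sketch encodes the character bigrams of a field with keyed hashes and compares two encodings with the Dice coefficient. It is not the authors' implementation: the key, filter size, number of hash functions, padding scheme, and example names are all assumptions chosen for illustration.

```python
import hashlib
import hmac


def bigrams(value):
    """Split a padded, lower-cased string into overlapping 2-character tokens."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, key, size=1000, num_hashes=20):
    """Return the set of bit positions switched on for `value`.

    Each bigram is hashed `num_hashes` times with a keyed hash (HMAC-SHA256),
    so only parties holding `key` can build comparable filters.
    """
    bits = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            msg = f"{k}:{gram}".encode("utf-8")
            digest = hmac.new(key, msg, hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice(bits_a, bits_b):
    """Dice coefficient of two filters: 1.0 for identical, near 0 for unrelated."""
    if not bits_a and not bits_b:
        return 1.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


key = b"shared-secret"                    # hypothetical key agreed by both sites
clean = bloom_encode("katherine", key)
typo = bloom_encode("kathrine", key)      # corrupted copy of the same name
other = bloom_encode("robert", key)       # a different person

print(dice(clean, typo), dice(clean, other))
```

Because the corrupted name shares most of its bigrams with the original, its filter overlaps heavily with the clean one, while an unrelated name overlaps only slightly. A linkage step would then accept pairs whose similarity exceeds a chosen threshold.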



Prospective Large-scale Evaluation of Rapid Mortality Monitoring in Mali

Author:

Olga H Joos, Johns Hopkins Bloomberg School of Public Health

Abstract:
Vital registration systems are a key source of public health information. Unfortunately, in many low- and middle-income countries, vital registration systems suffer from incomplete data entry and poor management, resulting in incorrect population statistics. This affects program planning at the national and international level.

The Institute of International Programs at the Johns Hopkins Bloomberg School of Public Health is researching the viability of developing a system to capture mortality more efficiently and effectively. Although the rapid mortality monitoring project currently focuses on capturing mortality of children under five by increasing the scope of work of community health workers, it could potentially be applied to capture all deaths as a parallel vital registration system. Considering the challenges currently faced by low- and middle-income countries in strengthening public health information systems, this pilot project could potentially function as an effective system until large-scale reform can be accomplished.

The poster will present current vital registration limitations in Mali and the proposed formative research to be implemented in the summer of 2010. It will highlight the objectives, strategies, and tools of the formative research chosen to best develop an effective rapid mortality monitoring system.



Designing an Interdisciplinary Enterprise Patient Problem Repository Service

Authors:

Richard Loomis, Howard Goldberg, Blackford Middleton, Harvard University

Abstract:
The benefits of maintaining an up-to-date patient problem list are widely recognized, though the method of incorporating this functionality into the electronic medical record is inconsistent. In addition to facilitating clinical documentation and decision making, a structured problem list offers further benefits, including electronic clinical decision support, alerts and reminders, and billing automation. We are building a patient problem repository service that will ultimately be utilized by numerous clinical applications throughout the Partners Healthcare Enterprise. A multidisciplinary team of clinicians, analysts, software developers, and informaticists determined the workflow and functional requirements. Design sessions utilized several modeling techniques, including flow diagrams, screen mockups, and interactive prototypes.

Over the course of several months, the team arrived at several common themes. First, there was agreement among clinicians that certain problems are relevant to specific providers, necessitating the ability to filter problems. Second, in order to facilitate problem list maintenance, problem documentation should be seamlessly integrated with other routine clinical documentation. Third, the importance of a robust underlying problem terminology was highlighted, minimizing unstructured, free-text problems in the repository. By assembling a multidisciplinary team to collaborate on the problem repository design, we were able to gather functional requirements for a wide range of clinical settings.



The Role of Syntrophic Bacteria in Methanogenic Metabolism in the Human Gut

Authors:

Catherine Lozupone, University of Colorado, Elizabeth Hansen, Washington University, Jeff Gordon, Washington University, and Rob Knight, University of Colorado

Abstract:
The human gut hosts a complex community of microorganisms that can harvest nutrients/energy from otherwise undigestible components of our diets (e.g. complex plant polysaccharides). The efficiency of microbial metabolism is influenced by syntrophy, where one microbe produces compounds that the other requires for growth or removes compounds that inhibit the progress of metabolic reactions. Here I explore the extent to which co-occurrence analysis can identify important microbial interactions in the human gut, using the methanogenic archaeon Methanobrevibacter smithii as a model.

Methanogenic archaea drive bacterial fermentation by preventing the accumulation of metabolic products such as hydrogen, and they closely associate with specific syntrophic bacteria in other anoxic environments. A preliminary analysis using quantitative PCR for M. smithii and more than 1 million bacterial 16S rRNA sequences from the stool of 183 individuals has identified 24 bacterial phylotypes that are positively associated with M. smithii, but whether this co-occurrence was driven by syntrophy or by similar environmental preferences is hard to predict. To determine whether the underlying cause of co-occurrence patterns can be inferred from genome sequences, I have developed metabolic reconstruction techniques that can identify known examples of syntrophy and metabolic niche convergence.



Measuring Crosstalk Ratios in Biological Pathways

Authors:

Mary F McGuire1, M. Sriram Iyengar1, David W Mercer2
1University of Texas Health Science Center, Houston
2University of Nebraska Medical Center, Omaha

Abstract:
Crosstalk relates to how biological pathways determine functional specificity, how ubiquitous messengers transmit specific information, and how redundant messages crosslink within the system while undesired signals are minimized. Objective, quantitative measures of crosstalk could yield useful insights into signaling mechanisms. Here we propose such a metric applied to disease progression in trauma/critical care.

Method
Biological pathways can be represented as directed graphs, which become computable when converted to matrices. In matrix terms, the minimum number of molecular interactions (edges) required for a molecular function is the same as the rank, i.e. the number of linearly independent rows in the incidence matrix constructed from the edges. A metric of crosstalk is then defined as XTALK = 1 - (rank / number of edge rows).
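
The definition above can be sketched in a few lines of Python. This is an illustrative reading rather than the authors' code: it assumes one incidence-matrix row per directed edge (-1 at the source node, +1 at the target node), computes the rank by exact Gaussian elimination, and uses made-up node names.

```python
from fractions import Fraction


def incidence_matrix(edges):
    """Build one row per directed edge: -1 at the source, +1 at the target."""
    nodes = sorted({node for edge in edges for node in edge})
    column = {node: j for j, node in enumerate(nodes)}
    rows = []
    for source, target in edges:
        row = [Fraction(0)] * len(nodes)
        row[column[source]] -= 1
        row[column[target]] += 1
        rows.append(row)
    return rows


def matrix_rank(rows):
    """Number of linearly independent rows, via exact Gaussian elimination."""
    rows = [row[:] for row in rows]
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for c in range(n_cols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            if rows[i][c] != 0:
                factor = rows[i][c] / rows[rank][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def xtalk(edges):
    """XTALK = 1 - (rank / number of edge rows)."""
    if not edges:
        raise ValueError("at least one edge is required")
    rows = incidence_matrix(edges)
    return 1 - Fraction(matrix_rank(rows), len(rows))


# Hypothetical pathways: a simple chain, and the same chain with a shortcut edge.
chain = [("A", "B"), ("B", "C")]
with_shortcut = chain + [("A", "C")]
print(xtalk(chain), xtalk(with_shortcut))
```

Under these assumptions a pathway with no redundant edges scores 0, and each edge that is linearly dependent on the others raises the ratio toward 1.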

Application
XTALK was calculated and compared across time, outcome (multiple organ failure or not), and interaction type in biological pathways evoked by cytokine signaling in trauma. Preliminary results with temporal differences have shown that the XTALK measure provides quantitative crosstalk ratios that have implications for the timing of therapeutic interventions that target specific molecular interaction types found in disease progression.



Conditional Distributions in Multivariate Single-cell Data Describe Cell Decision

Authors:

Rachel Melamed, Dana Pe'er, Columbia University

Abstract:
Mammalian immune cells integrate environmental stimuli with intercellular communications (cytokines, receptors) and intracellular protein concentrations (signaling proteins, transcription factors) to make complex decisions about their fates. Using eight-parameter flow cytometry snapshots of naive T cells undergoing stimulation to differentiate, we analyze a high dimensional set of distributions representing protein dependencies, cell populations, and stimulus response. We find the dependencies in the distributions of protein levels that control this decision making. Rather than taking the Bayesian network reconstruction approach to understanding this causality, we attempt a nonparametric approach to finding proteins that can predict noise levels of other proteins, or dependencies in other pairs of proteins. We use the level of each putative causal protein to break up the whole distribution of data into slices, and quantify aspects of the distributions of all other proteins within each slice. If a protein is causal, its levels will change the distributions in these slices. This approach will not only help us identify protein dependencies that regulate the cell state, it will also help us represent this causality in terms of the conditional probabilistic distributions that best represent this type of data. We will use what we learn to improve our Bayesian network learning.
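
The slicing step described above can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes equal-count slices and uses the mean and standard deviation as the quantified "aspects" of each slice, and the protein names and synthetic data are hypothetical stand-ins for flow cytometry measurements.

```python
import random
import statistics


def slice_summaries(cells, driver, target, n_slices=4):
    """Sort cells by the driver protein, cut them into equal-count slices,
    and summarize the target protein (mean, standard deviation) per slice."""
    ordered = sorted(cells, key=lambda cell: cell[driver])
    size = len(ordered) // n_slices
    summaries = []
    for s in range(n_slices):
        start = s * size
        end = (s + 1) * size if s < n_slices - 1 else len(ordered)
        values = [cell[target] for cell in ordered[start:end]]
        summaries.append((statistics.mean(values), statistics.pstdev(values)))
    return summaries


# Synthetic cells: proteinB depends on proteinA, proteinC is independent of it.
rng = random.Random(0)
cells = []
for _ in range(2000):
    a = rng.random()
    cells.append({
        "proteinA": a,
        "proteinB": 2 * a + rng.gauss(0, 0.1),
        "proteinC": rng.gauss(0, 1),
    })

for target in ("proteinB", "proteinC"):
    print(target, slice_summaries(cells, "proteinA", target))
```

In this toy data the slice means of proteinB shift steadily with proteinA, while those of proteinC stay roughly flat, which is the kind of contrast the slicing approach looks for.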



Assessing the Quality of Residents' Night Float Sign-out

Authors:

Sharon Meth, Ellen Bass, Thomas Perez, Margaret Plews-Ogan, University of Virginia, Charlottesville

Abstract:
This study helps to characterize the utility of verbal sign-out in order to inform resident training and the development of electronic medical record tools to support sign-out. Observation of night float sign-outs and surveys were used to assess sign-out quality. Observations showed that when the resident providing sign-out to the night float resident did not care for the patient during the day (which occurred because of a sequential sign-out structure) there was significantly less information, different information, and incorrect information contained in the sign-out. Survey results showed that while 96% of night float residents rated their understanding of the patients' acuity highly (4 or 5), only 81% rated that the verbal sign-out was of high quality and only 68% rated the quality of the written sign-out as high. Night float residents reported that 41% of unanticipated events should have been anticipated at sign-out. The majority of unanticipated overnight events were either cardiac or pulmonary in nature. Other significant unanticipated events included transfers to the intensive care unit, cardiopulmonary arrests, and two deaths. Regardless of whether the event should have been anticipated, only 65% of the time did residents rate that the sign-out was helpful with clinical decision making.



Predicting the Impact of Clinical and Analytical Validity on Genetic Test Results

Authors:

Casey L Overby, Peter Tarczy-Hornoch, University of Washington

Abstract:
Several genetic tests exploit associations between SNPs and disease. As next generation sequencing technology comes down the pipeline, we will need to evaluate the validity of testing for thousands of genes simultaneously. Our aim is to better understand the proportion of these tests that lead to incorrect diagnosis and screening results. In our research, we consider the 25-mutation panel test used to detect SNPs associated with Cystic Fibrosis (CF). Our approach takes both the clinical and analytical validity of the test into consideration to predict the proportion of a population that (i) will have missed mutations leading to an incorrect disease classification, and (ii) will be incorrectly diagnosed as having the disease. Our predictions are based on clinical validity data reported in GeneReviews, an online publication of expert-authored disease reviews. We also use data from the American College of Medical Genetics and College of American Pathologists external proficiency testing, currently the best available data on analytical validity. We hope to apply this approach as a generalized scheme for predicting the proportion of test results that lead to incorrect disease classification. Subsequently, this information may be useful for evaluating and determining the appropriate use of a particular genetic test in clinical practice.
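
One simple way to combine the two kinds of validity is sketched below. The model is an assumption made for illustration, not the authors' scheme: it treats clinical sensitivity (the fraction of affected individuals whose mutations are on the panel) and analytical sensitivity (the chance the assay detects a mutation that is present) as independent, and the numeric inputs are placeholders rather than values from GeneReviews or proficiency testing.

```python
def misclassified_fractions(prevalence, clinical_sensitivity,
                            analytical_sensitivity, analytical_specificity):
    """Return (missed, false_positive) as fractions of the whole population.

    missed:         affected individuals the test fails to flag
    false_positive: unaffected individuals the test wrongly flags
    """
    detected = clinical_sensitivity * analytical_sensitivity
    missed = prevalence * (1 - detected)
    false_positive = (1 - prevalence) * (1 - analytical_specificity)
    return missed, false_positive


# Placeholder inputs for illustration only.
missed, false_positive = misclassified_fractions(
    prevalence=0.01,
    clinical_sensitivity=0.90,
    analytical_sensitivity=0.98,
    analytical_specificity=0.99,
)
print(f"missed: {missed:.5f}, false positives: {false_positive:.5f}")
```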



Extracting Structural Connectivity Networks in Diffusion Tensor MRI

Authors:

Gary D Pack, Andrew L Alexander, Vikas Singh, Charles R Dyer, University of Wisconsin-Madison

Abstract:
One of the fundamental questions of neuroscience is how functionally related but structurally separated regions of a living brain interact. Specifically, how are the processing centers of the brain, groups of neuronal cell bodies called gray matter, connected by bundles of myelinated axons referred to as white matter? Our focus is on elucidating the organization and structure of these white matter bundles using information derived from Diffusion MRI. Diffusion MRI allows the in vivo characterization of the local organization of white matter microstructure throughout the volume of the brain. We construct Semiparametric Support Vector models that extract white matter volumetric pathways containing the bundles of neurons that make up the connections between designated gray matter regions. Current techniques primarily rely on strictly local information to propagate streamlines (tracts) along 1D trajectories through a diffusion tensor field, making them highly susceptible to noise. The Semiparametric Support Vector model is generated using information from the entire 3D volume of the pathway being extracted. The result is increased robustness and reproducibility over the current state of the art.



Development of a Computer-Based Cue System for Motivational Interviewing Counseling

Authors:

Lisa M Quintiliani, Amy Rubin, Jessica A Whiteley, Robert H Friedman, Boston University, and University of Massachusetts Boston

Abstract:
Motivational interviewing [MI] is an effective counseling method for changing modifiable risk behaviors (e.g., smoking) by understanding individuals' specific motivations. However, individually tailored telephone-delivered counseling by humans makes MI susceptible to inconsistent delivery across counselors and clients, which can ultimately reduce a study's internal and external validity. To improve consistency of MI delivery, this project's purpose is to develop and test a Web-based computer system [CUES] to guide non-professional peer counselors in their MI counseling. We will use task analysis to construct CUES's six functions: (1) schedule calls, (2) assess and respond to diet and physical activity behaviors, (3) collect social-contextual information, (4) assess and respond to theoretical constructs (e.g., confidence), (5) set goals, and (6) generate 'take-home' messages. For well-defined functions (i.e., 1, 2, 4), we will construct programmatic representations of dicyclic graph workflows to guide counselors; for functions without defined logic paths (i.e., 3, 5, 6), we will construct a heuristic-driven, contextually sensitive computer assistant to reinforce main points. In addition to guiding counseling, CUES will facilitate counselor training and periodic quality assurance. CUES will be tested using qualitative and quantitative methods. If CUES is acceptable and promotes adherence to MI principles, it would have widespread applicability for face-to-face counseling delivered by clinicians.



A Role for Information Science in Improving Patient Outreach Systems

Authors:

Zeshan A Rajput1,2, Lucky Gunasekara3, Christopher M Murphy4, Joseph Koech4, and Daniel Ochieng4
1Indiana University, Indianapolis
2Regenstrief Institute, Indianapolis IN
3Stanford University, Palo Alto CA
4Moi University, Eldoret Kenya

Abstract:
Delivering health care is a challenge in resource constrained environments, in part due to increasing patient burdens and poor telecommunication infrastructure. AMPATH is the largest healthcare delivery system for patients with HIV in Africa, and its patient outreach department collects patient demographic information and contacts patients who miss a scheduled visit. In 50% of cases, this requires travel to the patient's home. The department cannot accommodate the growing patient population. We describe a study in progress to identify potential systems to increase efficiency and scalability.

We conducted semi-structured and unstructured interviews to document the current work flow. We derived common themes and highlighted specific information needs. Next we will hold a focus group on these themes and discuss possible solutions to improve patient outreach. To date we have identified two possible solutions:

1) Develop a messaging system to remind patients of upcoming appointments and reduce financial and manpower costs related to visiting patient homes, and 2) Develop an outreach module in the medical record system to facilitate data entry, utilize existing patient data, and automate report generation. By developing solutions in freely available, open source methods we hope to benefit others facing similar constraints.



Automated Extraction for Determining Profession of Health Record Documentation

Authors:

Ruth Reeves, Fern FitzHenry, Michael E Matheny, Theodore Speroff, Steven H Brown, Department of Veterans Affairs Tennessee Valley Healthcare System, Nashville, TN, Vanderbilt University Medical Center, Nashville, TN

Abstract:
One of the key issues in healthcare information management is finding a reliable way to assess the type of information and the informational value of asserted statements. One wants to know "Who said it?" Clinicians, healthcare quality managers, billing coders, and researchers attend routinely to the author's profession when searching, interpreting, and abstracting information from clinical record documentation. This crucial datum does not occur in a standardized metadata form in Electronic Health Records. From Veterans Affairs electronic health records, we developed automated algorithms to map from document data to one of a set of 15 profession types in use at the VA (Provider Taxonomy of the National Uniform Claim Committee). To achieve this mapping, we combined three data elements of EHRs: free-text, highly variant electronic signature lines; LOINC-developed National Standard document titles; and locally developed document titles. The sensitivity and specificity of algorithm performance on 250 training documents were 0.98 and 0.99, respectively, compared to human review. Our study demonstrates the feasibility of determining author profession from the combination of document titles and electronic signature lines. Future research includes refining the algorithms to detect specialty and level of training information and applying them to a larger data set for validation.



NUDGES: Using a Choice Architecture to Analyze Clinical Decision-Making

Authors:

Joshua E Richardson, Joan S Ash, Oregon Health & Science University

Abstract:
Behavioral economists question the "rational actor theory," which assumes that humans linearly collect information, weigh actions and their benefits using that information, decide, and then act. They instead point out that humans draw on two cognitive "types" when making decisions: Type 1 supports rapid, automatic decision-making; Type 2 supports slow, deliberative decision-making (1). To promote integrated Type 1 and Type 2 decision-making practices, Professors Richard Thaler and Cass Sunstein published a six-facet framework using the acronym NUDGES: N: iNcentives; U: Understand mappings; D: Defaults; G: Give feedback; E: Expect errors; and S: Structure complex choices (2). The facets comprise a framework termed "choice architecture." We applied the choice architecture framework to reframe our analysis of physicians' decision-making and goal attainment. Analyzing clinical decision support as a choice architecture provides a fresh approach to understanding and designing clinical tools meant to improve clinical practice.

A researcher collected data from interviews and observations of community-based physicians who use clinical decision support in their everyday practices. A snowball method was used to recruit physicians from different specialties: family practitioners, internists, pediatricians, and OB/GYNs. The physicians were members of three community health care organizations across areas of Oregon; each organization supplied a different electronic health record platform.

References:
1. Slovic P, Finucane ML, Peters E, MacGregor DG. Rational actors or rational fools: implications of the affect heuristic for behavioral economics [Internet]. In: Behavioral Economics and Neoclassical Economics: Continuity or Discontinuity. Great Barrington, MA: 2002. Available from: http://linkinghub.elsevier.com/retrieve/pii/S1053535702001749

2. Thaler RH, Sunstein CR. Nudge: Improving decisions about health, wealth, and happiness. New Haven and London: Yale University Press; 2008.



Caenorhabditis elegans Pseudogene Identification

Authors:

Rebecca Robilotto1, Paul Harrison2, Mark Gerstein1
1Yale University
2McGill University

Abstract:
Pseudogenes are copies of genes that no longer produce a functioning protein. They can be classified as either duplicated or processed. Duplicated pseudogenes arise from the duplication of a functional gene that acquires mutations leading to a disablement. A processed pseudogene occurs when an mRNA is reverse-transcribed and inserted randomly in the chromosomal DNA. Pseudogenes can be detected by looking for disruptions in the coding sequence, which can be caused by frameshifts or nonsense mutations. For an accurate annotation of the model organism Caenorhabditis elegans, three different pseudogene identification methods ranging from manual annotation to an automated pipeline were used to create a consensus list of pseudogenes. These pseudogenes can be used to further investigate mutation rates and can lead to questions about the evolution of genes. They can be compared with transcription factor binding sites, along with ncRNA to determine whether they may have adopted a regulatory role in the genome.



Voted Best Poster, Day 2
Co-factor of LIM Transcriptional Regulation in Mammary Gland Development

Authors:

Michael L Salmans, Padhraic Smyth, Bogi Andersen, University of California, Irvine

Abstract:
Co-factor of LIM (Clim) proteins recruit LIM domain transcription factors to mediate their interaction with DNA targets. Expression of the Clim2 isoform in the mammary gland is essential, as evidenced by transgenic mice expressing a dominant-negative Clim molecule (DN-Clim) under the epithelial-specific control of the K14 promoter. Mammary glands in DN-Clim mice exhibit decreased branching morphogenesis and terminal end bud (TEB) size, delayed ductal elongation, and depletion of the stem cell population. Preliminary gene expression profiling of whole mammary glands from eight-week-old wild type and DN-Clim mice suggests Clims regulate pathways required for stem cell maintenance, TEB formation, and branching morphogenesis. To further investigate the transcriptional role of Clims in the mouse mammary gland, we have isolated TEB and ductal tissue by laser capture microscopy for time-course gene expression profiling of several key stages during pubertal development. Computational analyses will reveal the gene networks that operate during normal mammary gland development throughout puberty and those specifically under transcriptional regulation by Clims. Chromatin immunoprecipitation followed by DNA sequence analysis (ChIP-Seq) will reveal the sequence motifs associated with Clim2 binding and provide insights into the various transcription factors under control of Clim2.



The WANDA B System: A Wireless Health Platform

Author:

Diane Suh, University of California, Los Angeles

Abstract:
Heart failure is a leading cause of death in the United States, with around 5 million Americans currently suffering from congestive heart failure. The WANDA B wireless health technology leverages sensor technology and wireless communication to monitor heart failure patient activity and to provide tailored guidance. Patients who have cardiovascular system disorders can measure their weight, blood pressure, activity levels, and other vital signs in a real-time automated fashion. The system was developed in conjunction with the UCLA Nursing School and the UCLA Wireless Health Institute (WHI) for use on actual patients. WANDA B is presently being used with real patients in a clinical trial; initial results of its usage and performance are presented here.



Quantitative comparison of flow cytometry samples using Earth Mover's Distance

Authors:

Noah Zimmerman, Leonore Herzenberg, Wayne Moore, Guenther Walther, Stanford University

Abstract:
Increasing interest in high-throughput flow cytometry has spawned a demand for new methods of analyzing data from high-dimensional multi-sample experiments. A host of algorithms have been proposed in the literature for automatically gating flow cytometry data, one of the biggest bottlenecks in data analysis. For a typical high-dimensional flow assay, automated algorithms may identify hundreds of unique cell populations in a single sample. In order to make meaningful comparisons of the resulting cell populations across many samples, we must determine a correspondence of populations between samples. We introduce the application of the Earth Mover's Distance (EMD), a metric for measuring the difference between two multivariate distributions, to (i) quantitatively compare flow cytometry samples and (ii) determine population correspondence between two samples.
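
To make the idea concrete, the sketch below computes the EMD for the simplest case only: two one-dimensional histograms over the same equal-width bins, where the distance reduces to the summed absolute difference of the cumulative distributions. The abstract concerns multivariate distributions, which need a general transport solver; the one-dimensional restriction, the nearest-distance matching rule, and the population names are assumptions made for illustration.

```python
from itertools import accumulate


def emd_1d(hist_a, hist_b, bin_width=1.0):
    """Earth Mover's Distance between two 1-D histograms on identical bins."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must share the same bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must contain events")
    # Normalize counts so samples with different event totals are comparable.
    p_a = [count / total_a for count in hist_a]
    p_b = [count / total_b for count in hist_b]
    # The running sum of differences is the mass carried across each bin edge.
    carried = accumulate(a - b for a, b in zip(p_a, p_b))
    return bin_width * sum(abs(mass) for mass in carried)


def match_populations(pops_a, pops_b):
    """For each population in sample A, pick the closest population in sample B."""
    return {
        name_a: min(pops_b, key=lambda name_b: emd_1d(hist_a, pops_b[name_b]))
        for name_a, hist_a in pops_a.items()
    }


# Hypothetical gated populations, each a histogram of one fluorescence channel.
sample_a = {"a1": [5, 1, 0, 0], "a2": [0, 0, 1, 5]}
sample_b = {"b1": [0, 0, 2, 4], "b2": [4, 2, 0, 0]}
print(match_populations(sample_a, sample_b))
```

This greedy rule can assign two populations in one sample to the same population in the other; a one-to-one correspondence would need an assignment step on top of the pairwise distances.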
