Journal: Journal of biomedical informatics


Risk sharing arrangements between hospitals and payers, together with penalties imposed by the Centers for Medicare and Medicaid Services (CMS), are driving an interest in decreasing early readmissions. There are a number of published risk models predicting 30-day readmissions for particular patient populations; however, they often exhibit poor predictive performance and would be unsuitable for use in a clinical setting. In this work we describe and compare several predictive models, some of which have never been applied to this task and which outperform the regression methods that are typically applied in the healthcare literature. In addition, we apply methods from deep learning to the five conditions CMS is using to penalize hospitals, and offer a simple framework for determining which conditions are most cost effective to target.

Concepts: Scientific method, Regression analysis, Medicare, Health insurance, Hospital, Prediction, Centers for Medicare and Medicaid Services, Medicaid


Automatic monitoring of Adverse Drug Reactions (ADRs), defined as adverse patient outcomes caused by medications, is a challenging research problem that is currently receiving significant attention from the medical informatics community. In recent years, user-posted data on social media, primarily due to its sheer volume, has become a useful resource for ADR monitoring. Research using social media data has progressed using various data sources and techniques, making it difficult to compare distinct systems and their performances. In this paper, we perform a methodical review to characterize the different approaches to ADR detection/extraction from social media, and their applicability to pharmacovigilance. In addition, we present a potential systematic pathway to ADR monitoring from social media.

Concepts: Pharmacology, Medicine, Performance, Adverse drug reaction, Pharmacy, Pharmacovigilance, American Depositary Receipt, Medical informatics


The Systematised Nomenclature of Medicine Clinical Terms (SNOMED CT) has been designated as the recommended clinical reference terminology for use in clinical information systems around the world and is reported to be used in over 50 countries. However, there are still few implementation details. This study examined the implementation of SNOMED CT in terms of design, use and maintenance issues involved in 13 healthcare organisations across eight countries through a series of interviews with 14 individuals. While a great deal of effort has been spent on developing and refining SNOMED CT, there is still much work ahead to bring SNOMED CT into routine clinical use.

Concepts: Health care, Medicine, World, Design, Systematized Nomenclature of Medicine, SNOMED CT, Clinical Data Interchange Standards Consortium


Cancer is a malignant disease that has caused millions of human deaths. Its study has a long history of well over a hundred years. There have been an enormous number of publications on cancer research. This integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The immense body and rapid growth of biomedical text on cancer has led to the appearance of a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining on cancer research is computationally automatic and high-throughput in nature. However, it is error-prone due to the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining and examine some frequently used algorithms, tools, and data sets, as well as assessing how much these algorithms have been utilized. We then discuss the current state-of-the-art text mining applications in cancer research and we also provide some resources for cancer text mining. With the development of systems biology, researchers tend to understand complex biomedical systems from a systems biology viewpoint. Thus, the full utilization of text mining to facilitate cancer systems biology research is fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of the workflow. We hope that this review can (i) provide a useful overview of the current work of this field; (ii) help researchers to choose text mining tools and datasets; and (iii) highlight how to apply text mining to assist cancer systems biology research.

Concepts: Bioinformatics, Cancer, Oncology, Data mining, Malignancy, Research and development, Natural language processing, Text mining


Due to an enormous number of scientific publications that cannot be handled manually, there is a rising interest in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. Most previous studies have primarily focused on the design and performance improvement of either named entity recognition or relation extraction. In this paper, we present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. Starting with the Stanford CoreNLP, we developed the system to cope with multiple types of entities and relations. The system also has fairly good performance in terms of accuracy as well as the ability to configure text-processing components. We demonstrate its competitive performance by evaluating it on many corpora and find that it surpasses existing systems with average F-measures of 85% for entity extraction and 81% for relation extraction.

Concepts: Scientific method, Natural language processing, Named entity recognition, Message Understanding Conference, Temporal expressions
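
The dictionary-based extraction and F-measure evaluation mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not PKDE4J's implementation or API: the toy dictionary, the example sentence, the gold annotations, and the function names `extract_entities` and `f_measure` are all hypothetical, and the score shown is the balanced F-measure (F1) over exact-match spans.

```python
import re

# Hypothetical toy dictionary mapping surface forms to entity types.
DICTIONARY = {
    "aspirin": "DRUG",
    "headache": "DISEASE",
}

def extract_entities(text, dictionary):
    """Return a set of (start, end, type) spans for dictionary terms in text."""
    spans = set()
    for term, entity_type in dictionary.items():
        # Whole-word, case-insensitive match of each dictionary term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.add((match.start(), match.end(), entity_type))
    return spans

def f_measure(predicted, gold):
    """Balanced F-measure (F1) over exact-match spans."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Illustrative sentence and hand-made gold spans (assumptions, not real data).
text = "Aspirin relieves headache and migraine."
gold = {(0, 7, "DRUG"), (17, 25, "DISEASE"), (30, 38, "DISEASE")}

predicted = extract_entities(text, DICTIONARY)
score = f_measure(predicted, gold)
print(f"F-measure: {score:.2f}")
```

In this toy case the dictionary lacks "migraine", so precision is perfect while recall is not, which is the typical failure mode of a purely dictionary-based extractor.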


When medical data have been successfully recorded or exchanged between systems, there appears a need to present the data consistently to ensure that they are clearly understood and interpreted. A standards-based user interface can provide interoperability on the visual level.

Concepts: Computing terminology


Patient classification systems (PCSs) are commonly used in nursing units to assess how many nursing care hours are needed to care for patients. These systems then provide staffing and nurse-patient assignment recommendations for a given patient census based on these acuity scores. Our hypothesis is that such systems do not accurately capture workload and we conducted an experiment to test this hypothesis. Specifically, we conducted a survey study to capture nurses' perception of workload in an inpatient unit. 45 nurses from an oncology and surgery unit completed the survey and rated the impact of patient acuity indicators on their perceived workload using a six-point Likert scale. From these ratings we can calculate a workload score for an individual nurse given a set of patient acuity indicators. The approach offers optimization models (prescriptive analytics), which use patient acuity indicators from a commercial PCS as well as a survey-based nurse workload score. The models assign patients to nurses by distributing acuity scores from the PCS and survey-based perceived workload in a balanced way. Numerical results suggest that the proposed nurse-patient assignment models achieve a balanced assignment and lower overall survey-based perceived workload compared to the assignment based solely on acuity scores from the PCS. This results in an improvement of perceived workload that is upwards of five percent.

Concepts: Patient, Psychometrics, Nursing, Nurse, Likert scale, Nurses, Florence Nightingale, Conducting
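
The idea of distributing workload scores across nurses in a balanced way, as described in the abstract above, can be sketched in Python. This is a simple greedy heuristic (heaviest patient first, assigned to the currently least-loaded nurse), swapped in for illustration; it is not the optimization model from the study. The patient identifiers, the scores, and the function name `assign_patients` are hypothetical.

```python
import heapq

def assign_patients(patient_scores, n_nurses):
    """Greedy balancing: give the heaviest remaining patient to the
    nurse with the lowest current total workload."""
    # Min-heap of (current_load, nurse_index).
    heap = [(0.0, nurse) for nurse in range(n_nurses)]
    heapq.heapify(heap)
    assignment = {nurse: [] for nurse in range(n_nurses)}

    # Sort by descending score; ties broken by patient id for determinism.
    ordered = sorted(patient_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    for patient, score in ordered:
        load, nurse = heapq.heappop(heap)
        assignment[nurse].append(patient)
        heapq.heappush(heap, (load + score, nurse))

    loads = {
        nurse: sum(patient_scores[p] for p in patients)
        for nurse, patients in assignment.items()
    }
    return assignment, loads

# Hypothetical per-patient workload scores (e.g. a blend of PCS acuity
# and survey-based perceived workload; the blend itself is an assumption).
scores = {"p1": 5.0, "p2": 4.0, "p3": 3.0, "p4": 3.0, "p5": 1.0}
assignment, loads = assign_patients(scores, n_nurses=2)
print(assignment)
print(loads)
```

A greedy pass like this gives no optimality guarantee; an exact formulation would typically use integer programming, at the cost of a solver dependency.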


Software tools play a critical role in the development and maintenance of biomedical ontologies. One important task that is difficult without software tools is ontology quality assurance. In previous work, we have introduced different kinds of abstraction networks to provide a theoretical foundation for ontology quality assurance tools. Abstraction networks summarize the structure and content of ontologies. One kind of abstraction network that we have used repeatedly to support ontology quality assurance is the partial-area taxonomy. It summarizes structurally and semantically similar concepts within an ontology. However, the use of partial-area taxonomies was ad hoc and not generalizable. In this paper, we describe the Ontology Abstraction Framework (OAF), a unified framework and software system for deriving, visualizing, and exploring partial-area taxonomy abstraction networks. The OAF includes support for various ontology representations (e.g., OWL and SNOMED CT’s relational format). A Protégé plugin for deriving “live partial-area taxonomies” is demonstrated.

Concepts: Taxonomy, Quality assurance, Ontology, Concept, Category of being, Software architecture, WordNet, Folksonomy


Though the genetic etiology of autism is complex, our understanding can be improved by identifying genes and gene-gene interactions that contribute to the development of specific autism subtypes. Identifying such gene groupings will allow individuals to be diagnosed and treated according to their precise characteristics. To this end, we developed a method to associate gene combinations with groups with shared autism traits, targeting genetic elements that distinguish patient populations with opposing phenotypes. Our computational method prioritizes genetic variants for genome-wide association, then utilizes Frequent Pattern Mining to highlight potential interactions between variants. We introduce a novel genotype assessment metric, the Unique Inherited Combination support, which accounts for inheritance patterns observed in the nuclear family while estimating the impact of genetic variation on phenotype manifestation at the individual level. High-contrast variant combinations are tested for significant subgroup associations. We apply this method by contrasting autism subgroups defined by severe or mild manifestations of a phenotype. Significant associations connected 286 genes to the subgroups, including 193 novel autism candidates. 71 pairs of genes have joint associations with subgroups, presenting opportunities to investigate interacting functions. This study analyzed 12 autism subgroups, but our informatics method can explore other meaningful divisions of autism patients, and can further be applied to reveal precise genetic associations within other phenotypically heterogeneous disorders, such as Alzheimer’s disease.

Concepts: DNA, Gene, Genetics, Genotype, Evolution, Biology, Phenotype, Wilhelm Johannsen
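
The frequent pattern mining step mentioned in the abstract above can be illustrated with a small Python sketch that counts how often pairs of variants co-occur across individuals. It uses the ordinary support measure (fraction of individuals carrying both variants), not the Unique Inherited Combination support introduced in the study, whose definition is not given here. The variant labels, the data, and the function name `frequent_pairs` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return variant pairs whose support (fraction of individuals
    carrying both variants) is at least min_support."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {
        pair: count / n
        for pair, count in counts.items()
        if count / n >= min_support
    }

# Hypothetical variant sets, one per individual (illustrative only).
individuals = [
    {"varA", "varB", "varC"},
    {"varA", "varB"},
    {"varB", "varC"},
    {"varA", "varB", "varD"},
]

pairs = frequent_pairs(individuals, min_support=0.5)
print(pairs)
```

Enumerating all pairs is quadratic in the number of variants per individual, so a real pipeline would first prioritize variants, as the abstract describes, or use an algorithm such as Apriori or FP-growth.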


Time motion studies were first described in the early 20th century in industrial engineering, referring to a quantitative data collection method where an external observer captured detailed data on the duration and movements required to accomplish a specific task, coupled with an analysis focused on improving efficiency. Since then, they have been broadly adopted by biomedical researchers and have become a focus of attention due to the current interest in clinical workflow related factors. However, attempts to aggregate results from these studies have been difficult, resulting from a significant variability in the implementation and reporting of methods. While efforts have been made to standardize the reporting of such data and findings, a lack of common understanding on what “time motion studies” are remains, which not only hinders reviews, but could also partially explain the methodological variability in the domain literature (duration of the observations, number of tasks, multitasking, training rigor and reliability assessments) caused by an attempt to cluster dissimilar sub-techniques. A crucial milestone towards the standardization and validation of time motion studies corresponds to a common understanding, accompanied by a proper recognition of the distinct techniques it encompasses. Towards this goal, we conducted a review of the literature aiming at identifying what is being referred to as “time motion studies”. We provide a detailed description of the distinct methods used in articles referenced or classified as “time motion studies”, and conclude that currently it is used not only to define the original technique, but also to describe a broad spectrum of studies whose only common factor is the capture and/or analysis of the duration of one or more events.
To maintain alignment with the existing broad scope of the term, we propose a disambiguation approach by preserving the expanded conception, while recommending the use of a specific qualifier “continuous observation time motion studies” to refer to variations of the original method (the use of an external observer recording data continuously). In addition, we present a more granular naming for sub-techniques within continuous observation time motion studies, expecting to reduce the methodological variability within each sub-technique and facilitate future results aggregation.

Concepts: Time, Scientific method, Data collection, Methodology, Quantitative research, Standardization, Continuity, Reference