Concept: Data analysis
BACKGROUND: Severe eczema in young children is associated with an increased risk of developing asthma and rhino-conjunctivitis. In the general population, however, most cases of eczema are mild to moderate. In an unselected cohort, we studied the risk of current asthma and the co-existence of allergy-related diseases at 6 years of age among children with and without eczema at 2 years of age. METHODS: Questionnaires assessing various environmental exposures and health variables were administered at 2 years of age. An identical health questionnaire was completed at 6 years of age. The clinical investigation of a random subsample ascertained eczema diagnoses, and missing data were handled by multiple imputation analyses. RESULTS: The estimate for the association between eczema at 2 years and current asthma at 6 years was OR=1.80 (95% CI 1.10-2.96). Four of ten children with eczema at 6 years had the onset of eczema after the age of 2 years, but the co-existence of different allergy-related diseases at 6 years was higher among those with the onset of eczema before 2 years of age. CONCLUSIONS: Although most cases of eczema in the general population were mild to moderate, early eczema was associated with an increased risk of developing childhood asthma. These findings support the hypothesis of an atopic march in the general population. TRIAL REGISTRATION: The Prevention of Allergy among Children in Trondheim study has been identified as ISRCTN28090297 in the international Current Controlled Trials database.
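As a rough illustration of how an odds ratio and its 95% confidence interval are obtained from a 2x2 table, here is a minimal sketch. It uses the standard unadjusted Wald interval on the log odds ratio. The study above worked with multiply imputed data, so this is not the authors' procedure, and the function name `odds_ratio_ci` and any counts passed to it are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    All four counts must be positive (hypothetical inputs, not study data).
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An interval that excludes 1, as in the reported 1.10-2.96, indicates an association that is statistically significant at the 5% level.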
There is controversy on the proposed benefits of publishing mortality rates for individual surgeons. In some procedures, analysis at the level of an individual surgeon may lack statistical power. The aim was to determine the likelihood that variation in surgeon performance will be detected using published outcome data.
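The power problem described above can be made concrete with a small calculation. The sketch below assumes a one-sided exact binomial test of a single surgeon's mortality rate against a benchmark. That test, the function names, and all rates and case counts are illustrative assumptions, not the method or data of the study.

```python
from math import comb

def binom_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def power_exact_binomial(n, p0, p1, alpha=0.05):
    """Power to detect a true mortality rate p1 against a benchmark p0.

    The test rejects when the number of deaths among n cases reaches the
    smallest count k whose tail probability under p0 is at most alpha.
    """
    for k in range(n + 2):
        if binom_upper_tail(n, p0, k) <= alpha:
            # Probability of reaching the rejection region when p1 is true.
            return binom_upper_tail(n, p1, k)
```

Calling `power_exact_binomial(50, 0.05, 0.10)` and `power_exact_binomial(500, 0.05, 0.10)` shows how strongly the chance of detecting a doubled mortality rate depends on the number of cases a surgeon performs.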
BACKGROUND: Experimental datasets are becoming larger and increasingly complex, spanning different data domains, thereby expanding the requirements for respective tool support for their analysis. Networks provide a basis for the integration, analysis and visualization of multi-omics experimental datasets. RESULTS: Here we present VANTED (version 2), a framework for systems biology applications, which comprises a comprehensive set of seven main tasks. These range from network reconstruction, data visualization, integration of various data types, network simulation to data exploration combined with a manifold support of systems biology standards for visualization and data exchange. The offered set of functionalities is instantiated by combining several tasks in order to enable users to view and explore a comprehensive dataset from different perspectives. We describe the system as well as an exemplary workflow. CONCLUSIONS: VANTED is a stand-alone framework which supports scientists during the data analysis and interpretation phase. It is available as a Java open source tool from http://www.vanted.org.
BACKGROUND: Treatment burden can be defined as the self-care practices that patients with chronic illness must perform to respond to the requirements of their healthcare providers, as well as the impact that these practices have on patient functioning and well-being. Increasing levels of treatment burden may lead to suboptimal adherence and negative outcomes. Systematic review of the qualitative literature is a useful method for exploring the patient experience of care, in this case the experience of treatment burden. There is no consensus on methods for qualitative systematic review. This paper describes the methodology used for qualitative systematic reviews of the treatment burdens identified in three different common chronic conditions, using stroke as our exemplar. METHODS: Qualitative studies in peer-reviewed journals seeking to understand the patient experience of stroke management were sought. Searches were limited to English-language papers published from 2000 onwards. An exhaustive search strategy was employed, consisting of a scoping search, database searches (Scopus, CINAHL, Embase, Medline & PsycINFO) and reference, footnote and citation searching. Papers were screened, data extracted, quality appraised and analysed by two individuals, with a third party for disagreements. Data analysis was carried out using a coding framework underpinned by Normalization Process Theory (NPT). RESULTS: A total of 4364 papers were identified, of which 54 were included in the review. Of these, 51 (94%) were retrieved from our database search. Methodological issues included: creating an appropriate search strategy; investigating a topic not previously conceptualised; sorting through irrelevant data within papers; the quality appraisal of qualitative research; and the use of NPT as a novel method of data analysis, shown to be a useful method for the purposes of this review.
CONCLUSION: The creation of our search strategy may be of particular interest to other researchers carrying out synthesis of qualitative studies. Importantly, the successful use of NPT to inform a coding frame for data analysis involving qualitative data that describes processes relating to self management highlights the potential of a new method for analyses of qualitative data within systematic reviews.
Fusarium head blight is one of the most important and most common diseases of winter wheat. To better understand this disease and to assess the correlations between different factors, 30 cultivars of this cereal were evaluated over a two-year period. Fusarium head blight resistance was evaluated and the concentration of trichothecene mycotoxins was analysed. Grain samples originated from plants inoculated with Fusarium culmorum and from plants naturally infected with Fusarium species. The genetic distance between the tested cultivars was determined and data were analysed using multivariate data analysis methods. Genetic dissimilarity of wheat cultivars ranged between 0.06 and 0.78. Cluster analysis of genetic distance grouped them into three distinct groups. Wheat cultivars differed in resistance to spike and kernel infection and in resistance to spread of Fusarium within a spike (type II). In grain samples from inoculated plots, only B trichothecenes (deoxynivalenol, 3-acetyldeoxynivalenol and nivalenol) produced by F. culmorum were present. In control samples, trichothecenes of groups A (HT-2 toxin, T-2 toxin, T-2 tetraol, T-2 triol, scirpentriol, diacetoxyscirpenol) and B were detected. On the basis of the Fusarium head blight assessment and the analysis of trichothecene concentration in the grain, relationships between morphological characters, Fusarium head blight resistance and mycotoxins in grain of wheat cultivars were examined. The results were used to create distance matrices between cultivars for trichothecene concentration in inoculated and naturally infected grain as well as for FHB resistance. Correlations between genetic distance and the resistance/mycotoxin profiles were calculated using the Mantel test. A highly significant correlation between genetic distance and mycotoxin distance was found for the samples inoculated with Fusarium culmorum.
Significant but weak relationships were found between the genetic distance matrix and the matrices for FHB resistance or trichothecene concentration in naturally infected grain.
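The Mantel test mentioned above correlates two distance matrices and assesses significance by permutation. The following is a minimal sketch of a simple one-sided Mantel test using Pearson correlation. The function names, the permutation count and the seed are illustrative choices, and the abstract does not say which variant or software the authors used.

```python
import random
from math import sqrt

def upper_triangle(matrix, order):
    """Upper-triangular entries after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel_test(d1, d2, n_perm=999, seed=0):
    """Correlation of two symmetric distance matrices and a permutation p-value.

    Rows and columns of d2 are permuted together, which keeps each
    permuted matrix a valid distance matrix over relabelled objects.
    """
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return r_obs, (hits + 1) / (n_perm + 1)
```

Permuting whole rows and columns, rather than individual entries, is what distinguishes the Mantel test from an ordinary correlation test: the entries of a distance matrix are not independent observations.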
Hierarchical classification (HC) stratifies and classifies data from broad classes into more specific classes. Unlike commonly used data classification strategies, this enables the probabilistic prediction of unknown classes at different levels, minimizing the burden of incomplete databases. Despite these advantages, its translational application in biomedical sciences has been limited. We describe and demonstrate the implementation of an HC approach for “omics-driven” classification of 15 bacterial species at various taxonomic levels achieving 90-100% accuracy, and 9 cancer types into morphological types and 35 subtypes with 99% and 76% accuracy, respectively. Unknown bacterial species were probabilistically assigned with 100% accuracy to their respective genus or family using mass spectra (n = 284). Cancer types were predicted by mRNA data (n = 1960) for most subtypes with 95-100% accuracy. This has high relevance in clinical practice where complete datasets are difficult to compile with the continuous evolution of diseases and emergence of new strains, yet prediction of unknown classes, such as bacterial species, at upper hierarchy levels may be sufficient to initiate antimicrobial therapy. The algorithms presented here can be directly translated into clinical use with any quantitative data, and have broad application potential, from unlabeled sample identification, to hierarchical feature selection, and discovery of new taxonomic variants.
- Proceedings of the National Academy of Sciences of the United States of America
- Published about 3 years ago
Many PhD programs incorporate boot camps and summer bridge programs to accelerate the development of doctoral students' research skills and acculturation into their respective disciplines. These brief, high-intensity experiences span no more than several weeks and are typically designed to expose graduate students to data analysis techniques, to develop scientific writing skills, and to better embed incoming students into the scholarly community. However, there is no previous study that directly measures the outcomes of PhD students who participate in such programs and compares them to the outcomes of students who did not participate. Likewise, no previous study has used a longitudinal design to assess these outcomes over time. Here we show that participation in such programs is not associated with detectable benefits related to skill development, socialization into the academic community, or scholarly productivity for students in our sample. Analyzing data from 294 PhD students in the life sciences from 53 US institutions, we found no statistically significant differences in outcomes between participants and nonparticipants across 115 variables. These results stand in contrast to prior studies presenting boot camps as effective interventions based on participant satisfaction and perceived value. Many universities and government agencies (e.g., National Institutes of Health and National Science Foundation) invest substantial resources in boot camp and summer bridge activities in the hopes of better supporting scientific workforce development. Our findings do not reveal any measurable benefits to students, indicating that an allocation of limited resources to alternative strategies with stronger empirical foundations warrants consideration.
Sample sizes must be ascertained in qualitative studies as in quantitative studies, but not by the same means. The prevailing concept for sample size in qualitative studies is “saturation.” Saturation is closely tied to a specific methodology, and the term is inconsistently applied. We propose the concept “information power” to guide adequate sample size for qualitative studies. Information power indicates that the more information the sample holds, relevant for the actual study, the fewer participants are needed. We suggest that the size of a sample with sufficient information power depends on (a) the aim of the study, (b) sample specificity, (c) use of established theory, (d) quality of dialogue, and (e) analysis strategy. We present a model where these elements of information and their relevant dimensions are related to information power. Application of this model in the planning and during data collection of a qualitative study is discussed.
Cluster analysis is aimed at classifying elements into categories on the basis of their similarity. Its applications range from astronomy to bioinformatics, bibliometrics, and pattern recognition. We propose an approach based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. This idea forms the basis of a clustering procedure in which the number of clusters arises intuitively, outliers are automatically spotted and excluded from the analysis, and clusters are recognized regardless of their shape and of the dimensionality of the space in which they are embedded. We demonstrate the power of the algorithm on several test cases.
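The two quantities described above, a local density for each point and its distance to the nearest point of higher density, can be sketched as follows. This is a simplified illustration, not the published implementation: the cutoff distance `d_c`, the hard-threshold density and the tie handling (every point with no denser neighbour gets its largest distance) are assumptions made here.

```python
from math import dist

def density_peaks(points, d_c):
    """Return (rho, delta) for each point.

    rho[i]   : number of other points closer than the cutoff d_c
    delta[i] : distance to the nearest point with strictly higher rho,
               or the largest distance from i if no such point exists
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        to_denser = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(to_denser) if to_denser else max(d[i]))
    return rho, delta

def candidate_centers(points, d_c, k):
    """Indices of the k points with the largest rho * delta product."""
    rho, delta = density_peaks(points, d_c)
    ranked = sorted(range(len(points)), key=lambda i: rho[i] * delta[i], reverse=True)
    return ranked[:k]
```

Points that combine high `rho` with high `delta` are candidate cluster centers, while points with low `rho` and high `delta` are isolated and can be treated as outliers. Ranking by the product `rho * delta` is one simple way to pick centers when the number of clusters `k` is already chosen.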
Human neuroscience research faces several challenges with regard to reproducibility. While scientists are generally aware that data sharing is important, it is not always clear how to share data in a manner that allows other labs to understand and reproduce published findings. Here we report a new open source tool, AFQ-Browser, that builds an interactive website as a companion to a diffusion MRI study. Because AFQ-Browser is portable (it runs in any web browser), it can facilitate transparency and data sharing. Moreover, by leveraging new web-visualization technologies to create linked views between different dimensions of the dataset (anatomy, diffusion metrics, subject metadata), AFQ-Browser facilitates exploratory data analysis, fueling new discoveries based on previously published datasets. In an era where Big Data is playing an increasingly prominent role in scientific discovery, so will browser-based tools for exploring high-dimensional datasets, communicating scientific discoveries, aggregating data across labs, and publishing data alongside manuscripts.