SciCombinator

Discover the most talked about and latest scientific content & concepts.

Concept: Curve fitting

28

Electrocardiogram (ECG) based biometric matching suffers from high misclassification error when the data have a low sampling frequency. This can make identity authentication unreliable and vulnerable in high-security applications. In this paper, quality enhancement techniques for ECG data with low sampling frequency are proposed for person identification, based on piecewise cubic Hermite interpolation (PCHIP) and piecewise cubic spline interpolation (SPLINE). A total of 70 ECG recordings from 4 different public ECG databases with 2 different sampling frequencies were used for development and performance comparison. An analytical method was used for feature extraction. The ECG recordings were segmented into two parts: the enrolment and recognition datasets. Three biometric matching methods, namely Cross Correlation (CC), Percent Root-Mean-Square Deviation (PRD) and Wavelet Distance Measurement (WDM), were used for performance evaluation before and after applying interpolation techniques. The results suggest that biometric matching with interpolated ECG data achieved, on average, a higher matching percentage of up to 4% for CC, 3% for PRD and 94% for WDM. These results are compared with the existing method when using ECG recordings with lower sampling frequency. Moreover, increasing the sample size from 56 to 70 subjects improves the results of the experiment by 4% for CC, 14.6% for PRD and 0.3% for WDM. Furthermore, the higher classification accuracy of up to 99.1% for PCHIP and 99.2% for SPLINE with interpolated ECG data, compared with up to 97.2% for ECG data without interpolation, supports the study's claim that applying interpolation techniques enhances the quality of the ECG data.
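To make the PCHIP idea concrete, here is a minimal pure-Python sketch of upsampling a uniformly sampled signal with a shape-preserving cubic Hermite interpolant. It is not the authors' code: the function names (`pchip_slopes`, `hermite_eval`, `upsample`) and the sampling rates in the usage note are illustrative assumptions, the interior slopes use the weighted harmonic mean commonly associated with PCHIP, and the end slopes are simplified to one-sided secants.

```python
from bisect import bisect_right


def pchip_slopes(x, y):
    """Node slopes for a shape-preserving cubic Hermite interpolant.

    Interior slopes are a weighted harmonic mean of the neighbouring secant
    slopes and are set to zero at local extrema or flat segments.
    End slopes are plain one-sided secants (a simplifying assumption).
    """
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    d = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]
    m = [0.0] * n
    m[0], m[-1] = d[0], d[-1]
    for i in range(1, n - 1):
        if d[i - 1] * d[i] > 0:  # same sign: no local extremum at node i
            w1 = 2 * h[i] + h[i - 1]
            w2 = h[i] + 2 * h[i - 1]
            m[i] = (w1 + w2) / (w1 / d[i - 1] + w2 / d[i])
    return m


def hermite_eval(x, y, m, xq):
    """Evaluate the cubic Hermite interpolant at a single query point xq."""
    i = min(max(bisect_right(x, xq) - 1, 0), len(x) - 2)
    h = x[i + 1] - x[i]
    t = (xq - x[i]) / h
    h00 = (1 + 2 * t) * (1 - t) ** 2
    h10 = t * (1 - t) ** 2
    h01 = t * t * (3 - 2 * t)
    h11 = t * t * (t - 1)
    return h00 * y[i] + h10 * h * m[i] + h01 * y[i + 1] + h11 * h * m[i + 1]


def upsample(y, fs_in, fs_out):
    """Resample uniformly sampled values y from fs_in Hz to fs_out Hz."""
    x = [i / fs_in for i in range(len(y))]
    m = pchip_slopes(x, y)
    n_out = int((len(y) - 1) * fs_out / fs_in) + 1
    return [hermite_eval(x, y, m, k / fs_out) for k in range(n_out)]
```

For example, `upsample(samples, 128, 256)` doubles a hypothetical 128 Hz trace to 256 Hz. Unlike an ordinary cubic spline, this interpolant does not overshoot on step-like segments, which is the usual reason for comparing the two methods.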

Concepts: Aliasing, Curve fitting, Spline interpolation, Interpolation, Polynomial interpolation, Spline, Cubic Hermite spline, Hermite interpolation

10

We analyze the evolution of the risk of cycling in Seville before and after the implementation of a network of segregated cycle tracks in the city. Specifically, we study the evolution of the risk for cyclists of being involved in a collision with a motor vehicle, using data reported by the traffic police over the period 2000-2013, i.e. seven years before and after the network was built. A sudden drop in this risk was observed after the implementation of the network of bikeways. We study, through a multilinear regression analysis, the evolution of the risk by means of explanatory variables representing changes in the built environment, specifically the length of the bikeways and a stepwise jump variable taking the values 0/1 before/after the network was implemented. We found that this last variable has a high value as an explanatory variable, even higher than the length of the network, thus suggesting that networking the bikeways has a substantial effect on cycling safety by itself and beyond the mere increase in the length of the bikeways. We also analyze safety in numbers through a non-linear regression analysis. Our results fully agree qualitatively and quantitatively with the results previously reported by Jacobsen (2003), thus providing an independent confirmation of Jacobsen’s results. Finally, the mutual causal relationships between the increase in safety, the increase in the number of cyclists and the presence of the network of bikeways are discussed.
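A "safety in numbers" relationship is often written as a power law, collisions = a * cyclists^b, with b below 1 meaning that risk per cyclist falls as cycling grows. The sketch below fits such a curve by ordinary least squares on log-transformed data. This is a stand-in for the non-linear regression mentioned in the abstract, not the authors' procedure, and the function name `fit_power_law` and any data passed to it are assumptions.

```python
import math
from statistics import mean


def fit_power_law(cyclists, collisions):
    """Fit collisions = a * cyclists**b by least squares on the log-log scale.

    Returns (a, b). All inputs must be strictly positive.
    """
    lx = [math.log(c) for c in cyclists]
    ly = [math.log(k) for k in collisions]
    mx, my = mean(lx), mean(ly)
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum(
        (u - mx) ** 2 for u in lx
    )
    a = math.exp(my - b * mx)
    return a, b
```

Fitting on the log scale weights relative rather than absolute errors, so the estimates can differ somewhat from a direct non-linear least-squares fit on the original scale.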

Concepts: Scientific method, Regression analysis, Linear regression, The Network, Computer network, Stepwise regression, Automobile, Curve fitting

5

The use of the least squares method to calculate the best-fitting line through a two-dimensional scatter plot typically requires the user to assume that one of the variables depends on the other. However, in many cases the relationship between the two variables is more complex, and it is not valid to say that one variable is independent and the other is dependent. When analysing such data researchers should consider plotting the three regression lines that can be calculated for any two-dimensional scatter plot.
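The abstract does not name the three lines. A common reading, assumed here, is the regression of y on x, the regression of x on y, and the orthogonal (total least squares) line that treats both variables symmetrically. The sketch below computes all three as slope and intercept pairs in the same y-versus-x coordinates; the function name and dictionary keys are illustrative.

```python
import math
from statistics import mean


def three_regression_lines(xs, ys):
    """Return {name: (slope, intercept)} for three least-squares lines.

    All lines are expressed as y = slope * x + intercept and pass through
    the centroid of the data. Requires a non-zero covariance between xs and ys.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("covariance is zero; the lines are not all defined")

    slopes = {
        # minimises vertical distances (x treated as error-free)
        "y_on_x": sxy / sxx,
        # minimises horizontal distances (y treated as error-free)
        "x_on_y": syy / sxy,
        # minimises perpendicular distances (symmetric in x and y)
        "orthogonal": (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2))
        / (2 * sxy),
    }
    return {name: (s, my - s * mx) for name, s in slopes.items()}
```

For positively correlated data the orthogonal slope lies between the other two, and all three coincide only when the points are exactly collinear.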

Concepts: Regression analysis, Vector space, Least squares, Ordinary least squares, Linear least squares, Carl Friedrich Gauss, Curve fitting, Scatter plot

4

Appropriate large-scale citizen-science data present important new opportunities for biodiversity modelling, due in part to the wide spatial coverage of information. Recently proposed occupancy modelling approaches naturally incorporate random effects in order to account for annual variation in the composition of sites surveyed. In turn this leads to Bayesian analysis and model fitting, which are typically extremely time consuming. Motivated by presence-only records of occurrence from the UK Butterflies for the New Millennium database, we present an alternative approach, in which site variation is described in a standard way through logistic regression on relevant environmental covariates. This allows efficient occupancy model-fitting using classical inference, which is easily achieved using standard computers. This is especially important when models need to be fitted each year, typically for many different species, as with British butterflies for example. Using both real and simulated data we demonstrate that the two approaches, with and without random effects, can result in similar conclusions regarding trends. There are many advantages to classical model-fitting, including the ability to compare a range of alternative models, identify appropriate covariates and assess model fit, using standard tools of maximum likelihood. In addition, modelling in terms of covariates provides opportunities for understanding the ecological processes that are in operation. We show that there is even greater potential; the classical approach allows us to construct regional indices simply, which indicate how changes in occupancy typically vary over a species' range. In addition we are also able to construct dynamic occupancy maps, which provide a novel, modern tool for examining temporal changes in species distribution. These new developments may be applied to a wide range of taxa, and are valuable at a time of climate change. They also have the potential to motivate citizen scientists.

Concepts: Time, Regression analysis, Ecology, Maximum likelihood, Model, Bayesian inference, Likelihood function, Curve fitting

2

The six-minute walk test (6MWT) is commonly used to quantify exercise capacity in patients with several cardio-pulmonary diseases. Oxygen uptake (V̇O2) kinetics during 6MWT typically follow 3 distinct phases (rest, exercise, recovery) that can be modeled by nonlinear regression. Simultaneous modeling of multiple kinetics requires nonlinear mixed models methodology. To the best of our knowledge, no such curve-fitting approach has been used to analyze multiple V̇O2 kinetics in both research and clinical practice so far.

Concepts: Regression analysis, Medicine, Clinical trial, Pneumonia, Chronic obstructive pulmonary disease, Model, Greatest hits, Curve fitting

2

BACKGROUND: Co-expression measures are often used to define networks among genes. Mutual information (MI) is often used as a generalized correlation measure. It is not clear how much MI adds beyond standard (robust) correlation measures or regression model based association measures. Further, it is important to assess what transformations of these and other co-expression measures lead to biologically meaningful modules (clusters of genes). RESULTS: We provide a comprehensive comparison between mutual information and several correlation measures in 8 empirical data sets and in simulations. We also study different approaches for transforming an adjacency matrix, e.g. using the topological overlap measure. Overall, we confirm close relationships between MI and correlation in all data sets which reflects the fact that most gene pairs satisfy linear or monotonic relationships. We discuss rare situations when the two measures disagree. We also compare correlation and MI based approaches when it comes to defining co-expression network modules. We show that a robust measure of correlation (the biweight midcorrelation transformed via the topological overlap transformation) leads to modules that are superior to MI based modules and maximal information coefficient (MIC) based modules in terms of gene ontology enrichment. We present a function that relates correlation to mutual information which can be used to approximate the mutual information from the corresponding correlation coefficient. We propose the use of polynomial or spline regression models as an alternative to MI for capturing non-linear relationships between quantitative variables. CONCLUSIONS: The biweight midcorrelation outperforms MI in terms of elucidating gene pairwise relationships. Coupled with the topological overlap matrix transformation, it often leads to more significantly enriched co-expression modules. Spline and polynomial networks form attractive alternatives to MI in case of non-linear relationships. Our results indicate that MI networks can safely be replaced by correlation networks when it comes to measuring co-expression relationships in stationary data.
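For readers unfamiliar with the biweight midcorrelation, the following pure-Python sketch follows its standard textbook definition: values are centred on the median, scaled by 9 times the median absolute deviation (MAD), and down-weighted with Tukey's biweight so that extreme points contribute nothing. It is an illustration under those assumptions, not the paper's implementation, and the names `bicor` and `_biweight_terms` are hypothetical.

```python
import math
from statistics import median


def _biweight_terms(values):
    """Median-centred values multiplied by their Tukey biweight weights."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; biweight weights are undefined")
    terms = []
    for v in values:
        u = (v - med) / (9 * mad)
        # points further than 9 MADs from the median get weight zero
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        terms.append((v - med) * w)
    return terms


def bicor(x, y):
    """Biweight midcorrelation of two equal-length sequences."""
    a, b = _biweight_terms(x), _biweight_terms(y)
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return num / den
```

With ten evenly spaced values in `x` and a copy in `y` whose last entry is replaced by a gross outlier, `bicor(x, y)` stays high because the outlier receives zero weight, whereas the Pearson correlation is pulled down substantially.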

Concepts: Regression analysis, Linear regression, Biology, Correlation and dependence, Real number, Matrices, Curve fitting, Interpolation

1

The need for assay characterization is ubiquitous in quantitative mass spectrometry-based proteomics. Among many assay characteristics, the limit of blank (LOB) and limit of detection (LOD) are two particularly useful figures of merit. LOB and LOD are determined by repeatedly quantifying the observed intensities of peptides in samples with known peptide concentrations, and deriving an intensity versus concentration response curve. Most commonly, a weighted linear or logistic curve is fit to the intensity-concentration response, and LOB and LOD are estimated from the fit. Here we argue that these methods inaccurately characterize assays where observed intensities level off at low concentrations, which is a common situation in multiplexed systems. This manuscript illustrates the deficiencies of these methods, and proposes an alternative approach based on non-linear regression that overcomes these inaccuracies. We evaluated the performance of the proposed method using computer simulations, and using eleven experimental datasets acquired in Data-Independent Acquisition (DIA), Parallel Reaction Monitoring (PRM), and Selected Reaction Monitoring (SRM) mode. When the intensity levels off at low concentrations, the non-linear model shifts the estimates of LOB/LOD upwards, in some datasets by 20-40%. In the absence of such leveling off at low concentrations, the estimates of LOB/LOD obtained with non-linear statistical modeling were identical to those of weighted linear regression. We implemented the non-linear regression approach in the open-source R-based software MSstats, and advocate its general use for characterization of mass spectrometry-based assays.

Concepts: Regression analysis, Linear regression, Statistics, Concentration, Curve fitting, Nonlinear system, Nonlinear regression

1

There is some debate regarding the association of coffee consumption with hypertension risk. We performed a meta-analysis, including a dose-response analysis, to derive a more quantitatively precise estimate of this association. PubMed and Embase were searched for cohort studies published up to 18 July 2017. Fixed-effects generalized least-squares regression models were used to assess the quantitative association between coffee consumption and hypertension risk across studies. A restricted cubic spline was used to model the dose-response association. We identified eight articles (10 studies) investigating the risk of hypertension with the level of coffee consumption, including 243,869 individuals and 58,094 incident cases of hypertension. We found no evidence of a nonlinear dose-response association of coffee consumption and hypertension (P nonlinearity = 0.243). The risk of hypertension was reduced by 2% (relative risk (RR) = 0.98, 95% confidence interval (CI) 0.98-0.99) with each one cup/day increment of coffee consumption. With the linear cubic spline model, the RRs of hypertension risk were 0.97 (95% CI 0.95-0.99), 0.95 (95% CI 0.91-0.99), 0.92 (95% CI 0.87-0.98), and 0.90 (95% CI 0.83-0.97) for 2, 4, 6, and 8 cups/day, respectively, compared with individuals with no coffee intake. This meta-analysis provides quantitative evidence that consumption of coffee was inversely associated with the risk of hypertension in a dose-response manner.
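A restricted cubic spline is cubic between its knots and constrained to be linear beyond the outermost knots. The sketch below builds the unnormalised truncated-power basis usually attributed to Harrell. The knot positions in the usage note are hypothetical, since the abstract does not report them, and the function name `rcs_basis` is an assumption.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis row for a single value x.

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k sorted knots (k >= 3).
    Each s_j is zero below the first knot and linear above the last knot.
    """
    k = len(knots)
    if k < 3:
        raise ValueError("at least three knots are required")
    t_prev, t_last = knots[-2], knots[-1]
    span = t_last - t_prev

    def cube_plus(v):
        # truncated cubic: (v)_+^3
        return max(v, 0.0) ** 3

    row = [x]
    for t_j in knots[:-2]:
        row.append(
            cube_plus(x - t_j)
            - cube_plus(x - t_prev) * (t_last - t_j) / span
            + cube_plus(x - t_last) * (t_prev - t_j) / span
        )
    return row
```

For instance, `rcs_basis(4, [1, 3, 6])` gives the two regressors for an intake of 4 cups/day under hypothetical knots at 1, 3 and 6 cups/day. A test of nonlinearity then amounts to testing whether the coefficients on the spline terms after the first entry are zero.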

Concepts: Regression analysis, Cohort study, Research methods, Epidemiology, Systematic review, Relative risk, Evaluation methods, Curve fitting

1

Statistical models for assessing risk of type 2 diabetes are usually additive with linear terms that use non-nationally representative data. The objective of this study was to use nationally representative data on diabetes risk factors and spline regression models to determine the ability of models with nonlinear and interaction terms to assess the risk of type 2 diabetes.

Concepts: Regression analysis, Statistics, Diabetes mellitus type 2, Curve fitting, Interpolation, Spline

1

The new diagnostic threshold of hemoglobin A(1c) was based on evidence from cross-sectional studies, and no longitudinal study supports its validity. To examine whether hemoglobin A(1c) of 6.5% or higher defines a threshold for elevated risk of incident retinopathy, we analyzed longitudinal data of 19,897 Japanese adults who underwent a health checkup in 2006 and were followed up 3 years later. We used logistic regression models and restricted cubic spline models to examine the relationship between baseline hemoglobin A(1c) levels and the prevalence and the 3-year incidence of retinopathy. The restricted cubic spline model indicated a possible threshold for the risk of incident retinopathy at hemoglobin A(1c) levels of 6.0-7.0%. Logistic regression analysis found that individuals with hemoglobin A(1c) levels of 6.5-6.9% were at significantly higher risk of developing retinopathy at 3 years compared with those with hemoglobin A(1c) levels of 5.0-5.4% (adjusted odds ratio, 2.35 [95% CI 1.08-5.11]). Those with hemoglobin A(1c) levels between 5.5 and 6.4% exhibited no evidence of elevated risks. We did not observe a threshold in the analysis of prevalent retinopathy. Our longitudinal results support the validity of the new hemoglobin A(1c) threshold of 6.5% or higher for diagnosing diabetes.
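A reported odds ratio and confidence interval can be sanity-checked on the log scale. The sketch below assumes a Wald-type 95% interval, which the abstract does not state explicitly, and recovers the implied standard error, z statistic and two-sided p-value. The function name `wald_summary` is hypothetical.

```python
import math


def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover log-scale quantities from a reported OR and Wald-type CI."""
    log_or = math.log(odds_ratio)
    # a Wald interval is symmetric around log(OR) with half-width z_crit * se
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    midpoint = (math.log(ci_low) + math.log(ci_high)) / 2
    return {"log_or": log_or, "se": se, "z": z, "p": p, "ci_midpoint": midpoint}
```

Calling `wald_summary(2.35, 1.08, 5.11)` with the figures quoted above shows that the midpoint of the log interval nearly coincides with log(2.35), which is what a symmetric Wald interval would produce, and that the implied p-value falls below 0.05, in line with the abstract's description of the risk as significantly higher.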

Concepts: Regression analysis, Logistic regression, Longitudinal study, Epidemiology, Medical statistics, Cross-sectional study, Curve fitting, Interpolation