Concept: Familywise error rate
- Proceedings of the National Academy of Sciences of the United States of America
The most widely used task functional magnetic resonance imaging (fMRI) analyses use parametric statistical methods that depend on a variety of assumptions. In this work, we use real resting-state data and a total of 3 million random task group analyses to compute empirical familywise error rates for the fMRI software packages SPM, FSL, and AFNI, as well as a nonparametric permutation method. For a nominal familywise error rate of 5%, the parametric statistical methods are shown to be conservative for voxelwise inference and invalid for clusterwise inference. Our results suggest that the principal cause of the invalid cluster inferences is spatial autocorrelation functions that do not follow the assumed Gaussian shape. By comparison, the nonparametric permutation test is found to produce nominal results for voxelwise as well as clusterwise inference. These findings underscore the need to validate the statistical methods being used in the field of neuroimaging.
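The permutation benchmark in this comparison controls the familywise error rate by thresholding every voxel against the null distribution of the maximal statistic across voxels. A minimal pure-Python sketch of that max-statistic idea (the toy data shapes, function names, and choice of a Welch t statistic are illustrative assumptions, not the paper's actual pipeline):

```python
import random
import statistics

def t_stat(x, y):
    """Welch two-sample t statistic."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (mx - my) / se if se > 0 else 0.0

def max_t_null(group_a, group_b, n_perm=1000, seed=0):
    """Null distribution of the maximum |t| across voxels, built by
    randomly relabeling subjects (each subject is a list of voxel values)."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a, n_vox = len(group_a), len(pooled[0])
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        ga, gb = pooled[:n_a], pooled[n_a:]
        null_max.append(max(abs(t_stat([s[v] for s in ga], [s[v] for s in gb]))
                            for v in range(n_vox)))
    return sorted(null_max)

def fwer_threshold(null_max, alpha=0.05):
    """(1 - alpha) quantile of the max-statistic null: any voxel exceeding
    it is declared significant with familywise error controlled at alpha."""
    return null_max[int((1 - alpha) * len(null_max))]
```

Because any individual voxel can exceed this single cutoff only when the maximum does, the probability of even one false positive across voxels is bounded by alpha, without any Gaussian-shape assumption on the spatial autocorrelation.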
Growing interest in personalised medicine and targeted therapies is leading to an increase in the importance of subgroup analyses. If it is planned to view treatment comparisons in both a predefined subgroup and the full population as co-primary analyses, it is important that the statistical analysis controls the familywise type I error rate. Spiessens and Debois (Contemp. Clin. Trials, 2010, 31, 647-656) recently proposed an approach specific for this setting, which incorporates an assumption about the correlation based on the known sizes of the different groups, and showed that this is more powerful than generic multiple comparisons procedures such as the Bonferroni correction. If recruitment is slow relative to the length of time taken to observe the outcome, it may be efficient to conduct an interim analysis. In this paper, we propose a new method for an adaptive clinical trial with co-primary analyses in a predefined subgroup and the full population based on the conditional error function principle. The methodology is generic in that we assume test statistics can be taken to be normally distributed rather than making any specific distributional assumptions about individual patient data. In a simulation study, we demonstrate that the new method is more powerful than previously suggested analysis strategies. Furthermore, we show how the method can be extended to situations when the selection is not based on the final but on an early outcome. We use a case study in a targeted therapy in oncology to illustrate the use of the proposed methodology with non-normal outcomes.
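When the subgroup is nested in the full population, the correlation between the two test statistics is determined by the subgroup's share of the sample (approximately the square root of the subgroup fraction under equal variances), which is why a correlation-aware critical value beats the generic Bonferroni bound. A Monte-Carlo sketch under the global null (the subgroup fraction tau is a made-up example value, and this illustrates only Bonferroni's conservatism under correlation, not the conditional error function method itself):

```python
import math
import random
import statistics

def fwer_two_tests(crit, rho, n_sim=50_000, seed=1):
    """Monte-Carlo familywise error rate under the global null for two
    one-sided tests whose z statistics have correlation rho."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z_sub = g1
        z_full = rho * g1 + math.sqrt(1 - rho ** 2) * g2
        hits += z_sub > crit or z_full > crit
    return hits / n_sim

tau = 0.5                    # hypothetical subgroup fraction of the sample
rho = math.sqrt(tau)         # correlation implied by the nested subgroup
bonf = statistics.NormalDist().inv_cdf(1 - 0.05 / 2)  # Bonferroni critical value
```

With positively correlated statistics the simulated familywise rate at the Bonferroni cutoff falls below the nominal 5%, so a smaller critical value (and hence more power) suffices once the known correlation is exploited.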
In comparing multiple treatments, two error rates that have been studied extensively are the familywise error rate and the false discovery rate, and different methods are used to control each. Yet it is rare to find studies that compare the same methods on both of these rates, and also on the per-family error rate, the expected number of false rejections. Although the per-family error rate and the familywise error rate are similar in most applications when the latter is controlled at a conventional low level (e.g., .05), the two measures can diverge considerably with methods that control the false discovery rate at that same level. Furthermore, we consider both rejections of true hypotheses (Type I errors) and rejections of false hypotheses where the observed outcomes are in the incorrect direction (Type III errors). We point out that power estimates based on the number of correct rejections do not consider the pattern of those rejections, which is important in interpreting the total outcome. The present study introduces measures of interpretability based on the pattern of separation of treatments into nonoverlapping sets and compares methods on these measures. In general, range-based (configural) methods are more likely to obtain interpretable patterns based on treatment separation than individual p-value-based methods. Recommendations for practice based on these results are given; although the article is complex, these recommendations can be understood without detailed perusal of the supporting material.
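The divergence between the familywise and per-family error rates is easy to see by simulation: with unadjusted per-comparison testing of m true nulls, the familywise rate is 1 - (1 - alpha)^m, while the per-family rate (the expected count of false rejections) is m * alpha. A small sketch with illustrative parameter values:

```python
import random

def error_rates(m=10, alpha=0.05, n_sim=20_000, seed=2):
    """Test m independent true nulls at level alpha each (no adjustment);
    return the familywise error rate and the per-family error rate."""
    rng = random.Random(seed)
    any_false = total_false = 0
    for _ in range(n_sim):
        # under a true null, each uniform p-value falls below alpha
        # with probability alpha
        rejections = sum(rng.random() < alpha for _ in range(m))
        any_false += rejections > 0   # at least one false rejection
        total_false += rejections     # count of false rejections
    return any_false / n_sim, total_false / n_sim
```

For m = 10 and alpha = .05 the familywise rate lands near 1 - .95^10 ≈ .40 while the per-family rate lands near .50; the per-family rate can exceed 1 for large m, whereas the familywise rate is capped at 1, which is exactly the divergence the abstract describes.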
RNA-sequencing (RNA-seq) has become a popular approach to transcriptome studies in recent years. Such experiments are still relatively costly, so RNA-seq studies often employ only a small number of replicates. Power analysis and sample size calculation are therefore challenging in the context of differential expression analysis with RNA-seq data. One challenge is that there are no closed-form formulae for the power of the tests commonly used in differential expression analysis. In addition, the false discovery rate (FDR), rather than the family-wise type I error rate, is controlled for multiple testing in RNA-seq data analysis. So far, very few proposals exist for sample size calculation in RNA-seq experiments.
With the rise of both the number and the complexity of traits of interest, control of the false discovery rate (FDR) in genetic association studies has become an increasingly appealing and accepted target for multiple comparison adjustment. While a number of robust FDR controlling strategies exist, the nature of this error rate is intimately tied to the precise way in which discoveries are counted, and the performance of FDR controlling procedures is satisfactory only if there is a one-to-one correspondence between what scientists describe as unique discoveries and the number of rejected hypotheses. The presence of linkage disequilibrium between markers in genome-wide association studies (GWAS) often leads researchers to consider the signal associated to multiple neighboring SNPs as indicating the existence of a single genomic locus with possible influence on the phenotype. This a posteriori aggregation of rejected hypotheses results in inflation of the relevant FDR. We propose a novel approach to FDR control that is based on pre-screening to identify the level of resolution of distinct hypotheses. We show how FDR controlling strategies can be adapted to account for this initial selection both with theoretical results and simulations that mimic the dependence structure to be expected in GWAS. We demonstrate that our approach is versatile and useful when the data are analyzed using both tests based on single markers and multiple regression. We provide an R package that allows practitioners to apply our procedure on standard GWAS format data, and illustrate its performance on lipid traits in the NFBC66 cohort study.
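For reference, the baseline procedure that FDR-controlling strategies such as the one described here build on is the Benjamini-Hochberg step-up. A plain Python version of that generic procedure (this is the standard method, not the authors' resolution-aware, pre-screening variant):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up: find the largest rank k such that
    p_(k) <= k * q / m, then reject the k smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    rejected = set(order[:k])
    return [i in rejected for i in range(m)]
```

The abstract's point is that this guarantee is stated per rejected hypothesis: if several rejected SNPs in linkage disequilibrium are later counted as one locus, the effective FDR over loci can exceed q, which is what the proposed pre-screening step corrects for.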
Many psychologists do not realize that exploratory use of the popular multiway analysis of variance harbors a multiple-comparison problem. In the case of two factors, three separate null hypotheses are subject to test (i.e., two main effects and one interaction). Consequently, if all null hypotheses are true and the three tests are independent, the probability of at least one Type I error is 14% rather than 5%. We explain the multiple-comparison problem and demonstrate that researchers almost never correct for it. To mitigate the problem, we describe four remedies: the omnibus F test, control of the familywise error rate, control of the false discovery rate, and preregistration of the hypotheses.
The multi-arm multi-stage (MAMS) design described by Royston et al. [Stat Med. 2003;22(14):2239-56 and Trials. 2011;12:81] can accelerate treatment evaluation by comparing multiple treatments with a control in a single trial and stopping recruitment to arms not showing sufficient promise during the course of the study. To increase efficiency further, interim assessments can be based on an intermediate outcome (I) that is observed earlier than the definitive outcome (D) of the study. Two measures of type I error rate are often of interest in a MAMS trial. Pairwise type I error rate (PWER) is the probability of recommending an ineffective treatment at the end of the study regardless of other experimental arms in the trial. Familywise type I error rate (FWER) is the probability of recommending at least one ineffective treatment and is often of greater interest in a study with more than one experimental arm.
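The gap between PWER and FWER comes from the multiplicity of experimental arms: a false recommendation of any one of them counts against the FWER. A single-stage Monte-Carlo sketch of that gap (a plain z test with a shared control group, not the MAMS design with interim looks; arm counts and the one-sided 2.5% level are illustrative assumptions):

```python
import random
import statistics

CRIT = statistics.NormalDist().inv_cdf(0.975)  # one-sided 2.5% critical value

def arm_error_rates(k_arms=3, n_sim=50_000, seed=3):
    """k ineffective experimental arms, each compared with the same control
    via a z statistic.  PWER tracks arm 1 alone; FWER tracks a false
    recommendation of any arm."""
    rng = random.Random(seed)
    pairwise = family = 0
    for _ in range(n_sim):
        control = rng.gauss(0, 1)
        # the shared control induces correlation 0.5 between arm statistics
        zs = [(rng.gauss(0, 1) - control) / 2 ** 0.5 for _ in range(k_arms)]
        pairwise += zs[0] > CRIT
        family += any(z > CRIT for z in zs)
    return pairwise / n_sim, family / n_sim
```

The simulated PWER stays near 2.5% while the FWER grows with the number of arms (though less than k-fold, because the shared control correlates the comparisons), which is why FWER is usually the quantity of primary interest in multi-arm trials.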
The consensus approach to genome-wide association studies (GWAS) has been to assign equal prior probability of association to all sequence variants tested. However, some sequence variants, such as loss-of-function and missense variants, are more likely than others to affect protein function and are therefore more likely to be causative. Using data from whole-genome sequencing of 2,636 Icelanders and the association results for 96 quantitative and 123 binary phenotypes, we estimated the enrichment of association signals by sequence annotation. We propose a weighted Bonferroni adjustment that controls for the family-wise error rate (FWER), using as weights the enrichment of sequence annotations among association signals. We show that this weighted adjustment increases the power to detect association over the standard Bonferroni correction. We use the enrichment of associations by sequence annotation we have estimated in Iceland to derive significance thresholds for other populations with different numbers and combinations of sequence variants.
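The adjustment described here is the standard weighted Bonferroni scheme: each hypothesis gets the threshold alpha * w_i / sum(w), so the union bound still caps the FWER at alpha for any nonnegative weights, while enriched annotation classes (e.g., loss-of-function variants) receive larger weights and hence looser thresholds. A minimal sketch with made-up weights and p-values:

```python
def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Reject H_i when p_i <= alpha * w_i / sum(weights).
    Controls the FWER at alpha by the union bound, since the per-test
    levels sum to alpha."""
    total = sum(weights)
    return [p <= alpha * w / total for p, w in zip(pvals, weights)]
```

With two hypotheses at p = .02 each, weights of 9 and 1 give thresholds of .045 and .005, so only the up-weighted hypothesis is rejected; the unweighted Bonferroni threshold of .025 would have rejected both. Power is gained when the weights track where true signals actually concentrate, as the annotation-enrichment estimates are intended to ensure.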
Analysis of functional magnetic resonance imaging (fMRI) data typically involves over one hundred thousand independent statistical tests; therefore, it is necessary to correct for multiple comparisons to control familywise error. Eklund, Nichols, and Knutsson (2016, Proceedings of the National Academy of Sciences of the United States of America, 113, 7900-7905) used resting-state fMRI data to evaluate commonly employed methods to correct for multiple comparisons and reported unacceptable rates of familywise error. Eklund et al.’s analysis was based on the assumption that resting-state fMRI data reflect null data; however, their “null data” actually reflected default network activity that inflated familywise error. As such, Eklund et al.’s results provide no basis to question the validity of the tens of thousands of published fMRI studies that have corrected for multiple comparisons or the commonly employed methods to correct for multiple comparisons.
The identification of connexel-wise associations, which involves examining functional connectivities between pairwise voxels across the whole brain, is both statistically and computationally challenging. Although such a connexel-wise methodology has recently been adopted by brain-wide association studies (BWAS) to identify connectivity changes in several mental disorders, such as schizophrenia, autism and depression, multiple-correction and power analysis methods designed specifically for connexel-wise analysis are still lacking. Therefore, we herein report the development of a rigorous statistical framework for connexel-wise significance testing based on Gaussian random field theory. It includes controlling the family-wise error rate (FWER) of multiple hypothesis tests using topological inference methods, and calculating power and sample size for a connexel-wise study. Our theoretical framework can control the false-positive rate accurately, as validated empirically using two resting-state fMRI datasets. Compared with Bonferroni correction and the false discovery rate (FDR), it can reduce the false-positive rate and increase statistical power by appropriately utilizing the spatial information of fMRI data. Importantly, our method bypasses the need for nonparametric permutation to correct for multiple comparisons and can thus efficiently tackle large datasets with high-resolution fMRI images. The utility of our method is shown in a case-control study: our approach can identify altered functional connectivities in a major depressive disorder dataset, whereas existing methods fail. A software package is available at https://github.com/weikanggong/BWAS.