Concept: Computer science
Women comprise a minority of the Science, Technology, Engineering, Mathematics, and Medicine (STEMM) workforce. Quantifying the gender gap may identify fields that will not reach parity without intervention, reveal underappreciated biases, and inform benchmarks for gender balance among conference speakers, editors, and hiring committees. Using the PubMed and arXiv databases, we estimated the gender of 36 million authors from >100 countries publishing in >6000 journals, covering most STEMM disciplines over the last 15 years, and made a web app allowing easy access to the data (https://lukeholman.github.io/genderGap/). Despite recent progress, the gender gap appears likely to persist for generations, particularly in surgery, computer science, physics, and maths. The gap is especially large in authorship positions associated with seniority, and prestigious journals have fewer women authors. Additionally, we estimate that men are invited by journals to submit papers at approximately double the rate of women. Wealthy countries, notably Japan, Germany, and Switzerland, had fewer women authors than poorer ones. We conclude that the STEMM gender gap will not close without further reforms in education, mentoring, and academic publishing.
Recently, we proposed that Brainets, i.e. networks formed by multiple animal brains, cooperating and exchanging information in real time through direct brain-to-brain interfaces, could provide the core of a new type of computing device: an organic computer. Here, we describe the first experimental demonstration of such a Brainet, built by interconnecting four adult rat brains. Brainets worked by concurrently recording the extracellular electrical activity generated by populations of cortical neurons distributed across multiple rats chronically implanted with multi-electrode arrays. Cortical neuronal activity was recorded and analyzed in real time, and then delivered to the somatosensory cortices of other animals that participated in the Brainet using intracortical microstimulation (ICMS). Using this approach, different Brainet architectures solved a number of useful computational problems, such as discrete classification, image processing, storage and retrieval of tactile information, and even weather forecasting. Brainets consistently performed at the same or higher levels than single rats in these tasks. Based on these findings, we propose that Brainets could be used to investigate animal social behaviors as well as a test bed for exploring the properties and potential applications of organic computers.
We describe a set of best practices for scientific software development, based on research and experience, that will improve scientists' productivity and the reliability of their software.
Computers are now essential in all branches of science, but most researchers are never taught the equivalent of basic lab skills for research computing. As a result, data can get lost, analyses can take much longer than necessary, and researchers are limited in how effectively they can work with software and data. Computing workflows need to follow the same practices as lab projects and notebooks, with organized data, documented steps, and the project structured for reproducibility, but researchers new to computing often don’t know where to start. This paper presents a set of good computing practices that every researcher can adopt, regardless of their current level of computational skill. These practices, which encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts, are drawn from a wide variety of published sources, from our daily lives, and from our work with volunteer organizations that have delivered workshops to over 11,000 people since 2010.
Driven by advances in materials and computer science, researchers are attempting to design systems where the computer and material are one and the same entity. Using theoretical and computational modeling, we design a hybrid material system that can autonomously transduce chemical, mechanical, and electrical energy to perform a computational task in a self-organized manner, without the need for external electrical power sources. Each unit in this system integrates a self-oscillating gel, which undergoes the Belousov-Zhabotinsky (BZ) reaction, with an overlaying piezoelectric (PZ) cantilever. The chemomechanical oscillations of the BZ gels deflect the PZ layer, which consequently generates a voltage across the material. When these BZ-PZ units are connected in series by electrical wires, the oscillations of these units become synchronized across the network, where the mode of synchronization depends on the polarity of the PZ. We show that the network of coupled, synchronizing BZ-PZ oscillators can perform pattern recognition. The “stored” patterns are sets of polarities of the individual BZ-PZ units, and the “input” patterns are coded through the initial phase of the oscillations imposed on these units. The results of the modeling show that the input pattern closest to the stored pattern exhibits the fastest convergence time to stable synchronization behavior. In this way, networks of coupled BZ-PZ oscillators achieve pattern recognition. Further, we show that the convergence time to stable synchronization provides a robust measure of the degree of match between the input and stored patterns. Through these studies, we establish experimentally realizable design rules for creating “materials that compute.”
In the 1940s, the first generation of modern computers used vacuum tube oscillators as their principal components; however, with the development of the transistor, such oscillator-based computers quickly became obsolete. As the demand for faster and lower-power computers continues, transistors are themselves approaching their theoretical limit, and emerging technologies must eventually supersede them. With the development of optical oscillators and Josephson junction technology, we are again presented with the possibility of using oscillators as the basic components of computers, and it is possible that the next generation of computers will be composed almost entirely of oscillatory devices. Here, we demonstrate how coupled threshold oscillators may be used to perform binary logic in a manner entirely consistent with modern computer architectures. We describe a variety of computational circuitry and demonstrate working oscillator models of both computation and memory.
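The abstract above does not give the oscillator equations, so the following is only a rough sketch of the underlying idea that thresholding alone is enough for binary logic. It uses static threshold gates (a unit outputs 1 when a weighted sum of its inputs reaches a threshold) and ignores all oscillator dynamics. The function names, weights, and thresholds are illustrative assumptions, not taken from the paper.

```python
def threshold_gate(inputs, weights, threshold):
    """Output 1 when the weighted input sum reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


# Basic gates expressed as single threshold units (weights are assumptions).
def and_(a, b):
    return threshold_gate((a, b), (1, 1), 2)


def or_(a, b):
    return threshold_gate((a, b), (1, 1), 1)


def nand(a, b):
    return threshold_gate((a, b), (-1, -1), -1)


def xor(a, b):
    # XOR is not a single threshold unit; compose it from two layers.
    return and_(or_(a, b), nand(a, b))


def half_adder(a, b):
    """Return (sum bit, carry bit) for two input bits."""
    return xor(a, b), and_(a, b)


def full_adder(a, b, carry_in):
    """Return (sum bit, carry out) for two input bits plus a carry."""
    partial, carry1 = half_adder(a, b)
    total, carry2 = half_adder(partial, carry_in)
    return total, or_(carry1, carry2)


def add(x, y, width=4):
    """Ripple-carry addition of two non-negative integers below 2**width."""
    carry, result = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result | (carry << width)
```

Calling `add(5, 6)` chains four full adders built only from threshold units. An oscillator-based circuit would additionally need to encode bits in phase or amplitude, which this sketch leaves out.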
BACKGROUND: For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. RESULTS: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. CONCLUSION: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
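The map and reduce structure described above can be sketched in a few lines of plain Python. This is a single-process illustration, not the Hadoop implementation: the map step emits candidate peptides whose mass falls within a tolerance of the spectrum's precursor mass, and the reduce step keeps the best-scoring hit per spectrum. The score here is a simple count of shared peaks standing in for the K-score, and the field names, tolerance, and toy data are all made-up assumptions.

```python
from collections import defaultdict

# Toy data with invented masses and peak positions (assumptions, not real values).
TOY_DATABASE = [
    {"sequence": "AAAK", "mass": 359.2, "peaks": {72, 143, 214}},
    {"sequence": "AAKA", "mass": 359.2, "peaks": {72, 143, 271}},
    {"sequence": "GGGR", "mass": 345.2, "peaks": {58, 115, 172}},
]
TOY_SPECTRA = [
    {"id": "s1", "precursor_mass": 359.3, "peaks": {72, 143, 214, 300}},
    {"id": "s2", "precursor_mass": 345.0, "peaks": {58, 172}},
]


def map_spectrum(spectrum, database, tolerance=0.5):
    """Map step: emit (spectrum id, (peptide, score)) for each mass-matched peptide."""
    for peptide in database:
        if abs(peptide["mass"] - spectrum["precursor_mass"]) <= tolerance:
            # Placeholder score: number of shared peaks (NOT the K-score).
            score = len(spectrum["peaks"] & peptide["peaks"])
            yield spectrum["id"], (peptide["sequence"], score)


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_best(hits):
    """Reduce step: keep the highest-scoring peptide for one spectrum."""
    return max(hits, key=lambda hit: hit[1])


def search(spectra, database, tolerance=0.5):
    """Run map, shuffle, and reduce in-process over all spectra."""
    pairs = (pair for s in spectra for pair in map_spectrum(s, database, tolerance))
    return {sid: reduce_best(hits) for sid, hits in shuffle(pairs).items()}
```

Because each spectrum is mapped independently, the map calls can be spread over as many workers as are available, which is the property the abstract relies on for scaling with cluster size.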
We have developed Cake, a bioinformatics software pipeline that integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants with higher sensitivity and accuracy than any one algorithm alone. Cake can be run on a high-performance computer cluster or used as a standalone application.
Background: Genome-wide association studies have become very popular in identifying genetic contributions to phenotypes. Millions of SNPs are being tested for their association with diseases and traits using linear or logistic regression models. This conceptually simple strategy encounters the following computational issues: a large number of tests and very large genotype files (many gigabytes) that cannot be directly loaded into the software memory. One of the solutions applied on a grand scale is cluster computing involving large-scale resources. We show how to speed up the computations using matrix operations in pure R code. Results: We improve speed: computation time is reduced from 6 hours to 10-15 minutes. Our approach can handle an essentially unlimited number of covariates efficiently, using projections. Data files in GWAS are vast, and reading them into computer memory becomes an important issue. However, much improvement can be made if the data is structured beforehand in a way that allows easy access to blocks of SNPs. We propose several solutions based on the R packages ff and ncdf. We adapted the semi-parallel computations for logistic regression. We show that in a typical GWAS setting, where SNP effects are very small, we do not lose any precision and our computations are a few hundred times faster than standard procedures. Conclusions: We provide very fast algorithms for GWAS written in pure R code. We also show how to rearrange SNP data for fast access.
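The projection idea mentioned above can be illustrated for the linear-regression case. The sketch below is written in Python with NumPy rather than the authors' R code, and it is one plausible reading of "matrix operations with projections", not their implementation: the covariates are projected out of the phenotype and of every SNP column once, after which each SNP's slope, standard error, and t statistic come from column-wise dot products (this relies on the Frisch-Waugh-Lovell theorem). The function names and the simulated data are assumptions for illustration.

```python
import numpy as np


def semi_parallel_linreg(y, covariates, snps):
    """Fit y ~ covariates + snp_j for every SNP column j without a per-SNP loop.

    y: (n,) phenotype; covariates: (n, k) including an intercept column;
    snps: (n, m) genotype matrix. Returns per-SNP (beta, se, t) arrays.
    """
    n, k = covariates.shape
    q, _ = np.linalg.qr(covariates)          # orthonormal basis of covariate space
    y_res = y - q @ (q.T @ y)                # phenotype with covariates projected out
    s_res = snps - q @ (q.T @ snps)          # all SNPs projected in one matrix product
    sxx = np.einsum("ij,ij->j", s_res, s_res)  # column-wise sums of squares
    sxy = s_res.T @ y_res
    beta = sxy / sxx
    rss = y_res @ y_res - beta * sxy         # residual sum of squares per SNP
    dof = n - k - 1                          # k covariates plus one SNP coefficient
    se = np.sqrt(rss / (dof * sxx))
    return beta, se, beta / se


def simulate(seed=0, n=200, m=50):
    """Simulated toy data (an assumption): the first SNP has a true effect of 0.2."""
    rng = np.random.default_rng(seed)
    covariates = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    snps = rng.integers(0, 3, size=(n, m)).astype(float)
    y = covariates @ np.array([1.0, 0.5, -0.3]) + 0.2 * snps[:, 0] + rng.normal(size=n)
    return y, covariates, snps
```

The slopes agree with fitting each SNP separately by ordinary least squares, but the cost is dominated by two matrix products instead of one model fit per SNP. The logistic-regression variant described in the abstract is not covered by this sketch.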
Spike pattern classification is a key topic in machine learning, computational neuroscience, and electronic device design. Here, we offer a new supervised learning rule based on Support Vector Machines (SVM) to determine the synaptic weights of a leaky integrate-and-fire (LIF) neuron model for spike pattern classification. We compare classification performance between this algorithm and other methods sharing the same conceptual framework. We consider the effect of postsynaptic potential (PSP) kernel dynamics on pattern separability, and we propose an extension of the method to decrease computational load. The algorithm performs well in generalization tasks. We show that the peak value of spike pattern separability depends on a relation between PSP dynamics and spike pattern duration, and we propose a particular kernel that is well-suited for fast computations and electronic implementations.