Concept: Programming language


Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a universal positivity bias, (ii) the estimated emotional content of words is consistent between languages under translation, and (iii) this positivity bias is strongly independent of frequency of word use. Alongside these general regularities, we describe interlanguage variations in the emotional spectrum of languages that allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
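
As a rough illustration of how per-word evaluations could drive a text-level measurement, the sketch below computes a frequency-weighted mean of word scores. The abstract does not specify the instrument's formula, so the averaging rule, the function name `average_word_score`, and the toy scores are all assumptions made for illustration.

```python
from collections import Counter


def average_word_score(text, word_scores):
    """Frequency-weighted mean score of the scored words in `text`.

    Words missing from `word_scores` are skipped. Returns None when no
    word in the text has a score.
    """
    counts = Counter(text.lower().split())
    scored = {word: n for word, n in counts.items() if word in word_scores}
    total = sum(scored.values())
    if total == 0:
        return None
    return sum(word_scores[word] * n for word, n in scored.items()) / total


# Hypothetical scores on a 1-9 scale, invented for this example only.
toy_scores = {"love": 8.0, "rain": 5.0, "war": 2.0}
print(average_word_score("love love war", toy_scores))
```

Because the score is a plain weighted mean, it can be updated incrementally as new text arrives, which is one way a real-time measurement could be supported.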

Concepts: Programming language, Cognition, Reason, Mathematics, Translation, Root, Word, Language


The process of documentation in electronic health records (EHRs) is known to be time-consuming, inefficient, and cumbersome. The use of dictation coupled with manual transcription has become an increasingly common practice. In recent years, natural language processing (NLP)-enabled data capture has become a viable alternative for data entry. It enables the clinician to maintain control of the process and potentially reduce the documentation burden. The question remains how this NLP-enabled workflow will impact EHR usability and whether it can meet the structured data and other EHR requirements while enhancing the user’s experience.

Concepts: Knowledge, C, Output, Programming language, Natural language, Linguistics, Electronic health record


Crowdsourcing linguistic phenomena with smartphone applications is relatively new. In linguistics, apps have predominantly been developed to create pronunciation dictionaries, to train acoustic models, and to archive endangered languages. This paper presents the first account of how apps can be used to collect data suitable for documenting language change: we created an app, Dialäkt Äpp (DÄ), which predicts users' dialects. For 16 linguistic variables, users select a dialectal variant from a drop-down menu. DÄ then geographically locates the user’s dialect by suggesting a list of communes where dialect variants most similar to their choices are used. Underlying this prediction are 16 maps from the historical Linguistic Atlas of German-speaking Switzerland, which documents the linguistic situation around 1950. Where users disagree with the prediction, they can indicate what they consider to be their dialect’s location. With this information, the 16 variables can be assessed for language change. Thanks to the playfulness of its functionality, DÄ has reached many users; our linguistic analyses are based on data from nearly 60,000 speakers. Results reveal a relative stability for phonetic variables, while lexical and morphological variables seem more prone to change. Crowdsourcing large amounts of dialect data with smartphone apps has the potential to complement existing data collection techniques and to provide evidence that traditional methods cannot, with normal resources, hope to gather. Nonetheless, it is important to emphasize a range of methodological caveats, including sparse knowledge of users' linguistic backgrounds (users only indicate age, sex) and users' self-declaration of their dialect. These are discussed and evaluated in detail here. Findings remain intriguing nevertheless: as a means of quality control, we report that traditional dialectological methods have revealed trends similar to those found by the app. 
This underlines the validity of the crowdsourcing method. We are presently extending DÄ architecture to other languages.
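
The localisation step described above can be sketched as a ranking of localities by agreement with the user's selections. The abstract does not state which similarity measure DÄ uses, so the plain match count below is an assumption, and the function name, locality names, and variant labels are hypothetical.

```python
def rank_localities(user_choices, atlas):
    """Rank localities by how many variables match the user's variants.

    user_choices: {variable: variant} as selected by the user
    atlas: {locality: {variable: variant}} taken from the atlas maps
    Returns (locality, match_count) pairs, best match first; ties are
    broken alphabetically by locality name.
    """
    def match_count(locality_variants):
        return sum(
            1
            for variable, variant in user_choices.items()
            if locality_variants.get(variable) == variant
        )

    ranked = [(name, match_count(variants)) for name, variants in atlas.items()]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Hypothetical two-variable atlas; the real app uses 16 variables.
toy_atlas = {
    "A": {"v1": "x", "v2": "y"},
    "B": {"v1": "x", "v2": "z"},
    "C": {"v1": "q", "v2": "z"},
}
print(rank_localities({"v1": "x", "v2": "y"}, toy_atlas))
```

A user who rejects the top-ranked locality and names another one supplies exactly the kind of mismatch that can then be examined, variable by variable, for language change.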

Concepts: Programming language, Semiotics, English language, German language, Historical linguistics, Dialect, Linguistics, Language


SUMMARY: InterMine is an open-source data warehouse system that facilitates the building of databases with complex data integration requirements and a need for a fast, customisable query facility. Using InterMine, large biological databases can be created from a range of heterogeneous data sources, and the extensible data model allows for easy integration of new data types. The analysis tools include a flexible query builder, genomic region search, and a library of “widgets” performing various statistical analyses. The results can be exported in many commonly used formats. InterMine is a fully extensible framework where developers can add new tools and functionality. Additionally, there is a comprehensive set of web services, for which client libraries are provided in five commonly used programming languages. AVAILABILITY: Freely available from under the LGPL license. CONTACT: SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Concepts: Data, Model organism, Type system, Bioinformatics, Programming language, Biological data, Statistics, Data management


Displaying chemical structures in LaTeX documents currently requires either hand-coding of the structures using one of several LaTeX packages, or the inclusion of finished graphics files produced with an external drawing program. There is currently no software tool available to render the large number of structures available in molfile or SMILES format to LaTeX source code. We here present mol2chemfig, a Python program that provides this capability. Its output is written in the syntax defined by the chemfig TeX package, which allows for the flexible and concise description of chemical structures and reaction mechanisms. The program is freely available both through a web interface and for local installation on the user’s computer. The code and accompanying documentation can be found at

Concepts: Computer software, Programmer, Free software, Programming language, Java, Latex, Source code, Computer program


BACKGROUND: A molecule editor, i.e. a program facilitating graphical input and interactive editing of molecules, is an indispensable part of every cheminformatics or molecular processing system. Today, when a web browser has become the universal scientific user interface, a tool to edit molecules directly within the web browser is essential. One of the most popular tools for molecular structure input on the web is the JME applet. Since its release nearly 15 years ago, however, the web environment has changed and Java applets are facing increasing implementation hurdles due to their maintenance and support requirements, as well as security issues. This prompted us to update the JME editor and port it to a modern Internet programming language, JavaScript. SUMMARY: The actual molecule editing Java code of the JME editor was translated into JavaScript with the help of the Google Web Toolkit compiler and a custom library that emulates a subset of the GUI features of the Java runtime environment. In this process, the editor was enhanced with additional functionality, including a substituent menu, copy/paste, drag and drop, undo/redo capabilities, and integrated help. In addition to desktop computers, the editor supports molecule editing on touch devices, including iPhone, iPad, and Android phones and tablets. In analogy to JME, the new editor is named JSME. This new molecule editor is compact, easy to use, and easy to incorporate into web pages. CONCLUSIONS: A free molecule editor written in JavaScript was developed and is released under the terms of a permissive BSD license. The editor is compatible with JME and has practically the same user interface and web application programming interface. The JSME editor is available for download from the project web page

Concepts: HTML, Web server, Programming language, Web page, Google, World Wide Web, Web browser, Java


BACKGROUND: Although programming in a type-safe and referentially transparent style offers several advantages over working with mutable data structures and side effects, this style of programming has not seen much use in chemistry-related software. Since functional programming languages were designed with referential transparency in mind, these languages offer a lot of support when writing immutable data structures and side-effect-free code. We therefore started implementing our own toolkit based on the above programming paradigms in a modern, versatile programming language. RESULTS: We present our initial results with functional programming in chemistry by first describing an immutable data structure for molecular graphs together with a couple of simple algorithms to calculate basic molecular properties before writing a complete SMILES parser in accordance with the OpenSMILES specification. Along the way we show how to deal with input validation, error handling, bulk operations, and parallelization in a purely functional way. At the end we also analyze and improve our algorithms and data structures in terms of performance and compare them to existing toolkits, both object-oriented and purely functional. All code was written in Scala, a modern multi-paradigm programming language with strong support for functional programming and a highly sophisticated type system. CONCLUSIONS: We have successfully made the first important steps towards a purely functional chemistry toolkit. The data structures and algorithms presented in this article perform well while at the same time they can be safely used in parallelized applications, such as computer-aided drug design experiments, without further adjustments. This stands in contrast to existing object-oriented toolkits where thread safety of data structures and algorithms is a deliberate design decision that can be hard to implement. Finally, the level of type safety achieved by Scala greatly increased the reliability of our code as well as the productivity of the programmers involved in this project.
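
To make the idea of an immutable molecular graph concrete, here is a minimal sketch. The authors' toolkit is written in Scala; this example uses Python instead, and the class `Molecule`, its fields, and its methods are hypothetical names that are not taken from the toolkit. Every "modifying" operation returns a new value and leaves the original untouched, which is the property that makes such values safe to share between threads.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Molecule:
    """Immutable molecular graph (hypothetical, for illustration only)."""

    atoms: tuple = ()               # element symbols, indexed by position
    bonds: frozenset = frozenset()  # each bond is a frozenset of two indices

    def add_atom(self, symbol):
        # Returns a new Molecule; `self` is left unchanged.
        return replace(self, atoms=self.atoms + (symbol,))

    def add_bond(self, i, j):
        n = len(self.atoms)
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise ValueError("invalid atom indices")
        return replace(self, bonds=self.bonds | {frozenset((i, j))})

    def degree(self, i):
        # A basic property: the number of bonds that atom i takes part in.
        return sum(1 for bond in self.bonds if i in bond)


empty = Molecule()
carbon_monoxide = empty.add_atom("C").add_atom("O").add_bond(0, 1)
print(carbon_monoxide.atoms, carbon_monoxide.degree(0))
```

Attempting to assign to a field of a frozen dataclass raises an error, so accidental mutation is caught at run time; the Scala type system described in the abstract can rule out more of these mistakes at compile time.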

Concepts: Haskell, C Sharp, Referential transparency, Type system, Purely functional, Programming paradigm, Functional programming, Programming language


The concept of reachable workspace is closely tied to upper limb joint range of motion and functional capability. Currently, no practical and cost-effective methods are available in clinical and research settings to provide arm-function evaluation using an individual’s three-dimensional (3D) reachable workspace. A method to intuitively display and effectively analyze reachable workspace would not only complement traditional upper limb functional assessments, but also provide an innovative approach to quantify and monitor upper limb function.

Concepts: Programming language, Limbs, Lambda calculus, Limb, Upper limb


We present a web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST Server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA whilst minimising client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor (VEP) tool permitting large-scale programmatic variant analysis independent of any specific programming language. Availability: The Ensembl REST API can be accessed at and source code is freely available under an Apache 2.0 license from

Concepts: Compiler, C, Language, Programmer, Source code, Java, Computer program, Programming language


Despite the rapid global movement towards electronic health records, clinical letters written in unstructured natural languages are still the preferred form of inter-practitioner communication about patients. These letters, when archived over a long period of time, provide invaluable longitudinal clinical details on individual patients and patient populations. In this paper we present three unsupervised approaches, sequential pattern mining (PrefixSpan); frequency linguistic based C-Value; and keyphrase extraction from co-occurrence graphs (TextRank), to automatically extract single and multi-word medical terms without domain-specific knowledge. Because each of the three approaches focuses on different aspects of the language feature space, we propose a genetic algorithm to learn the best parameters for linearly integrating the three extractors for optimal performance against domain expert annotations. Around 30,000 clinical letters sent over the past decade from ophthalmology specialists to general practitioners at an eye clinic are anonymised as the corpus to evaluate the effectiveness of the ensemble against individual extractors. With minimal annotation, the ensemble achieves an average F-measure of 65.65 % when considering only complex medical terms, and an F-measure of 72.47 % if we take single word terms (i.e. unigrams) into consideration, markedly better than the three term extraction techniques when used alone.
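
The linear integration and its evaluation can be sketched as follows. This is a minimal illustration under stated assumptions: the weights are supplied by hand (the genetic algorithm that learns them is not implemented here), the function names are hypothetical, and the F-measure is taken to be the balanced F1 score, which the abstract does not spell out.

```python
def ensemble_scores(extractor_scores, weights):
    """Weighted sum of per-extractor term scores.

    extractor_scores: list of {term: score} dicts, one per extractor
    weights: one weight per extractor, in the same order
    """
    combined = {}
    for scores, weight in zip(extractor_scores, weights):
        for term, score in scores.items():
            combined[term] = combined.get(term, 0.0) + weight * score
    return combined


def f_measure(predicted, gold):
    """Balanced F-measure (F1) of predicted terms against gold annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical scores from two extractors, invented for this example.
scores = ensemble_scores([{"a": 1.0}, {"a": 0.5, "b": 1.0}], [0.5, 0.5])
selected = {term for term, score in scores.items() if score >= 0.6}
print(scores, selected, f_measure(selected, {"a", "c"}))
```

A weight-learning procedure such as the genetic algorithm mentioned above would search over `weights` (and any selection threshold) to maximise `f_measure` on the annotated terms.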

Concepts: Physician, Latin, Medical terms, Genetic algorithm, Language, Programming language, Time, Linguistics