Concept: Graph theory
Currently, most paired link based scaffolding algorithms intrinsically mask the sequences between two linked contigs and bypass their direct link information embedded in the original de Bruijn assembly graph. Such disadvantage substantially complicates the scaffolding process and leads to the inability of resolving repetitive contig assembly. Here we present a novel algorithm, inGAP-sf, for effectively generating high-quality and continuous scaffolds. inGAP-sf achieves this by using a new strategy based on the combination of direct link and paired link graphs, in which direct link is used to increase graph connectivity and to decrease graph complexity and paired link is employed to supervise the traversing process on the direct link graph. Such advantage greatly facilitates the assembly of short-repeat enriched regions. Moreover, a new comprehensive decision model is developed to eliminate the noise routes accompanying with the introduced direct link. Through extensive evaluations on both simulated and real datasets, we demonstrated that inGAP-sf outperforms most of the genome scaffolding algorithms by generating more accurate and continuous assembly, especially for short repetitive regions.
Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.
Weight-based stigma compromises the social networks of overweight children. To date, research on the position of overweight children in their peer network has focused only on friendship relations, and not on negative relationship dimensions. This study examined how overweight was associated with relations of friendship and dislike (antipathies) in the peer group. Exponential random graph models (ERGM) were used to examine friendship and antipathy relations among overweight children and their classmates, using a sub-sample from the TRacking Adolescents' Individual Lives Survey (N = 504, M age 11.4). Findings showed that overweight children were less likely to receive friendship nominations, and were more likely to receive dislike nominations. Overweight children were also more likely than their non-overweight peers to nominate classmates that they disliked. Together, the results indicate that positive and negative peer relations are impacted by children’s weight status, and are relevant to addressing the social marginalization of overweight children.
Biology presents many examples of planar distribution and structural networks having dense sets of closed loops. An archetype of this form of network organization is the vasculature of dicotyledonous leaves, which showcases a hierarchically-nested architecture containing closed loops at many different levels. Although a number of approaches have been proposed to measure aspects of the structure of such networks, a robust metric to quantify their hierarchical organization is still lacking. We present an algorithmic framework, the hierarchical loop decomposition, that allows mapping loopy networks to binary trees, preserving in the connectivity of the trees the architecture of the original graph. We apply this framework to investigate computer generated graphs, such as artificial models and optimal distribution networks, as well as natural graphs extracted from digitized images of dicotyledonous leaves and vasculature of rat cerebral neocortex. We calculate various metrics based on the asymmetry, the cumulative size distribution and the Strahler bifurcation ratios of the corresponding trees and discuss the relationship of these quantities to the architectural organization of the original graphs. This algorithmic framework decouples the geometric information (exact location of edges and nodes) from the metric topology (connectivity and edge weight) and it ultimately allows us to perform a quantitative statistical comparison between predictions of theoretical models and naturally occurring loopy graphs.
We introduce a new method for detecting communities of arbitrary size in an undirected weighted network. Our approach is based on tracing the path of closest-friendship between nodes in the network using the recently proposed Generalized Erds Numbers. This method does not require the choice of any arbitrary parameters or null models, and does not suffer from a system-size resolution limit. Our closest-friend community detection is able to accurately reconstruct the true network structure for a large number of real world and artificial benchmarks, and can be adapted to study the multi-level structure of hierarchical communities as well. We also use the closeness between nodes to develop a degree of robustness for each node, which can assess how robustly that node is assigned to its community. To test the efficacy of these methods, we deploy them on a variety of well known benchmarks, a hierarchal structured artificial benchmark with a known community and robustness structure, as well as real-world networks of coauthorships between the faculty at a major university and the network of citations of articles published in Physical Review. In all cases, microcommunities, hierarchy of the communities, and variable node robustness are all observed, providing insights into the structure of the network.
Network motifs are small connected sub-graphs that have recently gathered much attention to discover structural behaviors of large and complex networks. Finding motifs with any size is one of the most important problems in complex and large networks. It needs fast and reliable algorithms and tools for achieving this purpose. CytoKavosh is one of the best choices for finding motifs with any given size in any complex network. It relies on a fast algorithm, Kavosh, which makes it faster than other existing tools. Kavosh algorithm applies some well known algorithmic features and includes tricky aspects, which make it an efficient algorithm in this field. CytoKavosh is a Cytoscape plug-in which supports us in finding motifs of given size in a network that is formerly loaded into the Cytoscape work-space (directed or undirected). High performance of CytoKavosh is achieved by dynamically linking highly optimized functions of Kavosh’s C++ to the Cytoscape Java program, which makes this plug-in suitable for analyzing large biological networks. Some significant attributes of CytoKavosh is efficiency in time usage and memory and having no limitation related to the implementation in motif size. CytoKavosh is implemented in a visual environment Cytoscape that is convenient for the users to interact and create visual options to analyze the structural behavior of a network. This plug-in can work on any given network and is very simple to use and generates graphical results of discovered motifs with any required details. There is no specific Cytoscape plug-in, specific for finding the network motifs, based on original concept. So, we have introduced for the first time, CytoKavosh as the first plug-in, and we hope that this plug-in can be improved to cover other options to make it the best motif-analyzing tool.
The maximum clique enumeration (MCE) problem asks that we identify all maximum cliques in a finite, simple graph. MCE is closely related to two other well-known and widely-studied problems: the maximum clique optimization problem, which asks us to determine the size of a largest clique, and the maximal clique enumeration problem, which asks that we compile a listing of all maximal cliques. Naturally, these three problems are NP-hard, given that they subsume the classic version of the NP-complete clique decision problem. MCE can be solved in principle with standard enumeration methods due to Bron, Kerbosch, Kose and others. Unfortunately, these techniques are ill-suited to graphs encountered in our applications. We must solve MCE on instances deeply seeded in data mining and computational biology, where high-throughput data capture often creates graphs of extreme size and density. MCE can also be solved in principle using more modern algorithms based in part on vertex cover and the theory of fixed-parameter tractability (FPT). While FPT is an improvement, these algorithms too can fail to scale sufficiently well as the sizes and densities of our datasets grow.
[Background] Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. [Results] To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. [Conclusion] The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.
In this paper, we study the problem of constructing perfect phylogenies for three-state characters. Our work builds on two recent results. The first result states that for three-state characters, the local condition of examining all subsets of three characters is sufficient to determine the global property of admitting a perfect phylogeny. The second result applies tools from minimal triangulation theory to the partition intersection graph to determine if a perfect phylogeny exists. Despite the wealth of combinatorial tools and algorithms stemming from the chordal graph and minimal triangulation literature, it is unclear how to use such approaches to efficiently construct a perfect phylogeny for three-state characters when the data admits one. We utilize structural properties of both the partition intersection graph and the original data in order to achieve a competitive time bound.
A number of centrality measures are available to determine the relative importance of a node in a complex network, and betweenness is prominent among them. However, the existing centrality measures are not adequate in network percolation scenarios (such as during infection transmission in a social network of individuals, spreading of computer viruses on computer networks, or transmission of disease over a network of towns) because they do not account for the changing percolation states of individual nodes. We propose a new measure, percolation centrality, that quantifies relative impact of nodes based on their topological connectivity, as well as their percolation states. The measure can be extended to include random walk based definitions, and its computational complexity is shown to be of the same order as that of betweenness centrality. We demonstrate the usage of percolation centrality by applying it to a canonical network as well as simulated and real world scale-free and random networks.