Communities are fundamental entities for the characterization of the structure of real networks. The standard approach to the identification of communities in networks is based on the optimization of a quality function known as modularity. Although modularity has been at the center of an intense research activity and many methods for its maximization have been proposed, not much is yet known about the necessary conditions that communities need to satisfy in order to be detectable with modularity maximization methods. Here, we develop a simple theory to establish these conditions, and we successfully apply it to various classes of network models. Our main result is that heterogeneity in the degree distribution helps modularity to correctly recover the community structure of a network and that, in the realistic case of scale-free networks with degree exponent γ<2.5, modularity is always able to detect the presence of communities.
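The quality function discussed in the abstract above is the standard Newman-Girvan modularity, Q = (1/2m) Σ_ij [A_ij - k_i k_j / (2m)] δ(c_i, c_j). A minimal sketch of its computation on a toy graph (the two-triangle example is illustrative, not from the paper):

```python
# Minimal sketch: Newman-Girvan modularity of a partition,
# Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j).
def modularity(adj, communities):
    """adj: symmetric 0/1 adjacency matrix (list of lists);
    communities: community label for each node."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # 2m = sum of all degrees
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += adj[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m

# Toy example: two triangles (nodes 0-2 and 3-5) joined by one edge.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
print(modularity(adj, [0, 0, 0, 1, 1, 1]))  # natural split: Q = 5/14
print(modularity(adj, [0, 0, 0, 0, 0, 0]))  # trivial partition: Q = 0
```

Maximizing Q over partitions is the detection method the abstract refers to; the sketch only evaluates Q for a fixed partition.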
We analyze the citation distributions of all papers published in Physical Review journals between 1985 and 2009. The average number of citations received by papers published in a given year and in a given field is computed. Large variations are found, showing that it is not fair to compare citation numbers across fields and years. However, when a rescaling procedure by the average is used, it is possible to compare articles impartially across years and fields. We make the rescaling factors available for use by the readers. We also show that rescaling citation numbers by the number of publication authors has strong effects and should therefore be taken into account when assessing the bibliometric performance of researchers.
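The rescaling procedure described in the abstract above amounts to dividing each paper's citation count by the average count of its (field, year) class. A hedged sketch (the class labels and counts below are illustrative, not from the data set):

```python
from collections import defaultdict

# Sketch of rescaling by the class average: each paper's citation
# count is divided by the mean count c0 of its (field, year) class,
# making articles from different fields and years comparable.
def rescale(citations, field_year):
    """citations: raw counts; field_year: class label per paper."""
    totals, counts = defaultdict(float), defaultdict(int)
    for c, fy in zip(citations, field_year):
        totals[fy] += c
        counts[fy] += 1
    avg = {fy: totals[fy] / counts[fy] for fy in totals}
    return [c / avg[fy] for c, fy in zip(citations, field_year)]

raw = [10, 30, 20, 2, 6, 4]
cls = ["bio-2005"] * 3 + ["math-2005"] * 3
# Both classes map onto the same relative scale after rescaling.
print(rescale(raw, cls))
```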
Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a great need for multi-purpose techniques, able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks while accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure for partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method performs comparably to the best existing algorithms on artificial benchmark graphs. Several applications to real networks are shown as well. OSLOM is implemented in freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks.
We considered all matches played by professional tennis players between 1968 and 2010, and, on the basis of this data set, constructed a directed and weighted network of contacts. The resulting graph showed complex features, typical of many real networked systems studied in the literature. We developed a diffusion algorithm and applied it to the tennis contact network in order to rank professional players. Jimmy Connors was identified as the best player in the history of tennis according to our ranking procedure. We performed a complete analysis by determining the best players on specific playing surfaces as well as the best ones in each of the years covered by the data set. The results of our technique were compared to those of two other well established methods. In general, we observed that our ranking method performed better: it had a higher predictive power and did not require the arbitrary introduction of external criteria for the correct assessment of the quality of players. The present work provides novel evidence of the utility of tools and methods of network theory in real applications.
Many systems in nature, society, and technology can be described as networks, where the vertices are the system's elements, and edges between vertices indicate the interactions between the corresponding elements. Edges may be weighted if the interaction strength is measurable. However, the full network information is often redundant because tools and techniques from network analysis do not work or become very inefficient if the network is too dense, and some weights may just reflect measurement errors and need to be discarded. Moreover, since weight distributions in many complex weighted networks are broad, most of the weight is concentrated among a small fraction of all edges. It is then crucial to properly detect relevant edges. Simple thresholding would leave only the largest weights, disrupting the multiscale structure of the system, which is at the basis of the structure of complex networks and ought to be kept. In this paper we propose a weight-filtering technique based on a global null model [Global Statistical Significance (GloSS) filter], keeping both the weight distribution and the full topological structure of the network. The method correctly quantifies the statistical significance of weights assigned independently to the edges from a given distribution. Applications to real networks reveal that the GloSS filter is indeed able to identify relevant connections between vertices.
Communities are clusters of nodes with a higher than average density of internal connections. Their detection is of great relevance to better understand the structure and hierarchies present in a network. Modularity has become a standard tool in the area of community detection, providing at the same time a way to evaluate partitions and, by maximizing it, a method to find communities. In this work, we study the modularity from a combinatorial point of view. Our analysis (like the modularity definition itself) relies on the use of the configuration model, a technique that, given a graph, produces a series of randomized copies keeping the degree sequence invariant. We develop an approach that enumerates the null model partitions and can be used to calculate the probability distribution function of the modularity. Our theory allows for a deep inquiry of several interesting features characterizing modularity such as its resolution limit and the statistics of the partitions that maximize it. Additionally, the study of the probability of extremes of the modularity in the random graph partitions opens the way for a definition of the statistical significance of network partitions.
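The configuration model mentioned in the abstract above randomizes a graph while preserving every node's degree; a common way to sketch it is stub matching, where each node contributes as many edge "stubs" as its degree and the shuffled stubs are re-paired (a simplified illustration; self-loops and multi-edges are not removed here):

```python
import random

# Sketch of the configuration model via stub matching: randomize a
# graph while keeping the degree sequence fixed by shuffling edge
# stubs and pairing consecutive ones.
def configuration_model(degrees, seed=0):
    """degrees: degree per node (sum must be even) -> list of edges."""
    rng = random.Random(seed)
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    # Pair consecutive stubs; self-loops and multi-edges may occur.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

edges = configuration_model([2, 2, 3, 3, 1, 1])
print(edges)  # a randomized copy with the same degree sequence
```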
Percolation is one of the most studied processes in statistical physics. A recent paper by Achlioptas et al. [Science 323, 1453 (2009)] showed that the percolation transition, which is usually continuous, becomes discontinuous ("explosive") if links are added to the system according to special cooperative rules (Achlioptas processes). In this paper, we present a detailed numerical analysis of Achlioptas processes with the product rule on various systems, including lattices, random networks à la Erdős-Rényi, and scale-free networks. In all cases, we recover the explosive transition of Achlioptas et al. However, the explosive percolation transition is of a hybrid kind: despite the discontinuity of the order parameter at the threshold, one observes traces of analytical behavior, such as power-law distributions of cluster sizes. In particular, for scale-free networks with degree exponent lambda<3, all relevant percolation variables display power-law scaling, just as in continuous second-order phase transitions.
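The product rule studied in the abstract above can be sketched in a few lines: at each step two candidate links are drawn at random, and only the one minimizing the product of the sizes of the clusters it would join is actually added. A minimal, hedged simulation (the system size, link counts, and the convention for intra-cluster links are illustrative choices, not the paper's exact protocol):

```python
import random

# Sketch of an Achlioptas process with the product rule on n nodes,
# using union-find to track cluster sizes.
def achlioptas_product_rule(n, links, seed=0):
    rng = random.Random(seed)
    parent, size = list(range(n)), [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def product(u, v):
        # Product of the sizes of the clusters the link would merge
        # (convention: an internal link counts its own cluster twice).
        return size[find(u)] * size[find(v)]

    for _ in range(links):
        e1 = (rng.randrange(n), rng.randrange(n))
        e2 = (rng.randrange(n), rng.randrange(n))
        u, v = min(e1, e2, key=lambda e: product(*e))  # keep the "smaller" link
        ru, rv = find(u), find(v)
        if ru != rv:  # union by size
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
    return max(size[find(i)] for i in range(n))

# Below the explosive threshold (~0.888n links for the product rule)
# the largest cluster stays tiny; well above it, it spans the system.
print(achlioptas_product_rule(2000, 800))
print(achlioptas_product_rule(2000, 4000))
```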
Nodes in real-world networks are usually organized in local modules. These groups, called communities, are intuitively defined as subgraphs with a larger density of internal connections than of external links. In this work, we define a measure aimed at quantifying the statistical significance of single communities. Extreme and order statistics are used to predict the statistics associated with individual clusters in random graphs. These distributions allow us to define the significance of a community as the probability that a generic clustering algorithm finds such a group in a random graph. The method is successfully applied to real-world networks for the evaluation of the significance of their communities.
The recent abundance of digital data is enabling the implementation of graph-based ranking algorithms that provide system-level analysis for ranking publications and authors. Here, we take advantage of the entire Physical Review publication archive (1893-2006) to construct author networks where weighted edges, as measured from suitably normalized citation counts, define a proxy for the mechanism of scientific credit transfer. On this network, we define a ranking method based on a diffusion algorithm that mimics the spreading of scientific credits on the network. We compare the results obtained with our algorithm to those obtained by local measures such as the citation count, and provide a statistical analysis of the assignment of major career awards in the area of physics. A website where the algorithm is made available to perform customized rank analysis can be found at the address http://www.physauthorsrank.org.
We study scale-free networks constructed via a cooperative Achlioptas growth process. Links between nodes are introduced in order to produce a scale-free graph with given exponent lambda for the degree distribution, but the choice of each new link depends on the mass of the clusters that this link will merge. Networks constructed via this biased procedure show a percolation transition which strongly differs from the one observed in standard percolation, where links are introduced just randomly. The different growth process leads to a phase transition with a nonvanishing percolation threshold already for lambda > lambda_c ≈ 2.2. More interestingly, the transition is continuous when lambda<3 but becomes discontinuous when lambda>3. This may have important consequences both for the structure of networks and for the dynamics of processes taking place on them.
The recent information technology revolution has enabled the analysis and processing of large-scale data sets describing human activities. The main source of data is the web, where humans typically spend a relevant part of their day. Here, we study three large data sets containing information about the web activities of humans in different contexts. We study in detail interevent- and waiting-time statistics. In both cases, the number of subsequent operations separated by tau units of time decays as a power law as tau increases. We use nonparametric statistical tests to estimate how reliably global distributions describe the activity patterns of single users. Global interevent-time probability distributions are not representative of the behavior of single users: the shape of a single user's interevent distribution is strongly influenced by the total number of operations performed by that user, and the distribution of the total number of operations across users is heterogeneous. A universal behavior can nevertheless be recovered by suppressing the intrinsic dependence of the global probability distribution on user activity, which is achieved by simply dividing the interevent times by their average values. In contrast, waiting-time probability distributions appear to be independent of user activity, and global probability distributions represent well the replying activity patterns of single users.
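The collapse described in the abstract above is obtained by dividing each user's interevent times by that user's average. A minimal sketch (the two users and their times are illustrative):

```python
from statistics import mean

# Sketch of the rescaling that collapses per-user interevent-time
# distributions: divide each user's interevent times by their mean.
def rescale_interevent(times_by_user):
    """times_by_user: {user: [interevent times]} -> rescaled times."""
    rescaled = {}
    for user, taus in times_by_user.items():
        avg = mean(taus)
        rescaled[user] = [t / avg for t in taus]
    return rescaled

data = {"heavy": [1, 2, 3], "light": [100, 200, 300]}
# After rescaling, users with very different activity levels
# collapse onto the same scale.
print(rescale_interevent(data))
```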
Relative indicators are commonly used to remove biases due to different citation practices in various scientific fields. Here we extend our recent investigation of the viability of relative indicators for comparing article impact across disciplines. We consider citation distributions for papers published in 14 of the 172 disciplines categorized by the Journal Citation Reports. The distribution of the number of citations received by publications in a certain discipline, divided by the average number for the discipline, is a universal function. Based on it, we compute the relative number of citations needed to be among the q percent most-cited publications in a discipline. The effect of finite samples is also discussed. The average number of citations is shown to be strongly correlated with the impact factor, but fluctuations are quite large. A similar universal distribution is found (with exceptions) when citation distributions restricted to papers published in a single journal are considered.
Complex networks have acquired a great popularity in recent years, since the graph representation of many natural, social, and technological systems is often very helpful to characterize and model their phenomenology. Additionally, the mathematical tools of statistical physics have proven to be particularly suitable for studying and understanding complex networks. Nevertheless, an important obstacle to this theoretical approach is still represented by the difficulty of drawing parallels between network science and more traditional areas of statistical physics. In this paper, we explore the relation between complex networks and a well known topic of statistical physics: renormalization. A general method to analyze renormalization flows of complex networks is introduced. The method can be applied to study any suitable renormalization transformation. Finite-size scaling can be performed on computer-generated networks in order to classify them in universality classes. We also present applications of the method on real networks.
Many studies demonstrate that there is still a significant gender bias, especially at higher career levels, in many areas including science, technology, engineering, and mathematics (STEM). We investigated field-dependent, gender-specific effects of the selective pressures individuals experience as they pursue a career in academia within seven STEM disciplines. We built a unique database that comprises 437,787 publications authored by 4,292 faculty members at top United States research universities. Our analyses reveal that gender differences in publication rate and impact are discipline-specific. Our results also support two hypotheses. First, the widely-reported lower publication rates of female faculty are correlated with the amount of research resources typically needed in the discipline considered, and thus may be explained by the lower level of institutional support historically received by females. Second, in disciplines where pursuing an academic position incurs greater career risk, female faculty tend to have a greater fraction of higher impact publications than males. Our findings have significant, field-specific, policy implications for achieving diversity at the faculty level within the STEM disciplines.
Comments are special types of publications whose aim is to correct or criticize previously published papers. For this reason, comments are believed to make commented papers less worthy or trustworthy in the eyes of the scientific community, and thus predestined to have low scientific impact. Here, we show that this belief is not supported by empirical evidence. We consider thirteen major publication outlets in science, and perform systematic comparisons between the citations accumulated by commented and non-commented articles. We find that (i) commented papers are, on average, much more cited than non-commented papers, and (ii) commented papers are more likely to be among the most cited papers of a journal. Since comments are published soon after the papers they criticize, comments should be viewed as early indicators of the future impact of those papers.
Recent analysis of empirical data [Radicchi, Baronchelli, and Amaral, PLoS ONE 7, e029910 (2012)] showed that humans adopt Lévy-flight strategies when exploring the bid space in online auctions. A game theoretical model proved that the observed Lévy exponents are nearly optimal, being close to the exponent value that guarantees the maximal economical return to players. Here, we rationalize these findings by adopting an evolutionary perspective. We show that a simple evolutionary process is able to account for the empirical measurements with the only assumption that the reproductive fitness of the players is proportional to their search ability. Contrary to previous modeling, our approach describes the emergence of the observed exponent without resorting to any strong assumptions on the initial searching strategies. Our results generalize earlier research, and open novel questions in cognitive, behavioral, and evolutionary sciences.
Inspired by the Games held in ancient Greece, the modern Olympics represent the world's largest pageant of athletic skill and competitive spirit. Performances of athletes at the Olympic Games have mirrored, since 1896, human potential in sports, and thus provide an optimal source of information for studying the evolution of sport achievements and predicting the limits that athletes can reach. Unfortunately, the models introduced so far for the description of athlete performances at the Olympics are either sophisticated or unrealistic and, more importantly, do not provide a unified theory for sport performances. Here, we address this issue by showing that relative performance improvements of medal winners at the Olympics are normally distributed, implying that the evolution of performance values can be described, in good approximation, as an exponential approach to an a priori unknown limiting performance value. This law holds for all specialties in athletics (including running, jumping, and throwing) and in swimming. We present a self-consistent method, based on normality hypothesis testing, able to predict limiting performance values in all specialties. We further quantify the most likely years in which athletes will breach challenging performance walls in running, jumping, throwing, and swimming events, as well as the probability that new world records will be established at the next edition of the Olympic Games.
The large amount of information contained in bibliographic databases has recently boosted the use of citations, and other indicators based on citation numbers, as tools for the quantitative assessment of scientific research. Citation counts are often interpreted as proxies for the scientific influence of papers, journals, scholars, and institutions. However, a rigorous and scientifically grounded methodology for a correct use of citation counts is still missing. In particular, cross-disciplinary comparisons in terms of raw citation counts systematically favor scientific disciplines with higher citation and publication rates. Here we perform an exhaustive study of the citation patterns of millions of papers, and derive a simple transformation of citation counts able to suppress the disproportionate citation counts among scientific domains. We find that the transformation is well described by a power-law function, and that the parameter values of the transformation are typical features of each scientific discipline. Universal properties of citation patterns descend therefore from the fact that citation distributions for papers in a specific field are all part of the same family of univariate distributions.
Information technology has revolutionized the traditional structure of markets. The removal of geographical and time constraints has fostered the growth of online auction markets, which now include millions of economic agents worldwide and annual transaction volumes in the billions of dollars. Here, we analyze bid histories of a little-studied type of online auctions: lowest unique bid auctions. Similarly to what has been reported for foraging animals searching for scarce food, we find that agents adopt Lévy flight search strategies in their exploration of "bid space". The Lévy regime, which is characterized by a power-law decaying probability distribution of step lengths, holds over nearly three orders of magnitude. We develop a quantitative model for lowest unique bid online auctions that reveals that agents use nearly optimal bidding strategies. However, agents participating in these auctions do not optimize their financial gain. Indeed, as long as there are many auction participants, a rational profit optimizing agent would choose not to participate in these auction markets.
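The power-law step-length distribution that characterizes the Lévy regime in the abstract above can be sampled with a standard inverse-transform trick: for p(l) ~ l^(-alpha) with l >= l_min, draw u uniform in [0, 1) and set l = l_min * (1 - u)^(-1/(alpha - 1)). A hedged sketch (the exponent and cutoff are illustrative parameters, not the fitted values from the paper):

```python
import random

# Sketch: sampling step lengths with a power-law tail
# p(l) ~ l**(-alpha) for l >= l_min, via inverse-transform sampling.
def levy_step(alpha, l_min=1.0, rng=random):
    u = rng.random()  # uniform in [0, 1)
    return l_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

rng = random.Random(42)
steps = [levy_step(2.0, rng=rng) for _ in range(10000)]
# Heavy tail: a handful of very large steps dominate the sample,
# the hallmark of a Levy flight.
print(min(steps), max(steps))
```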