Most complex social, technological, and biological networks have a significant community structure. Community structure should therefore be considered a universal property of complex networks, together with their much explored small-world and scale-free properties. Despite the large interest in characterizing the community structure of real networks, little attention has been devoted to the detection of universal mechanisms able to spontaneously generate networks with communities. Triadic closure is a natural mechanism for making new connections, especially in social networks. Here we show that models of network growth based on simple triadic closure naturally lead to the emergence of community structure, together with fat-tailed distributions of node degree and high clustering coefficients. Communities emerge from the initial stochastic heterogeneity in the concentration of links, followed by a cycle of growth and fragmentation. Communities are more pronounced the sparser the graph, and disappear for high values of link density and randomness in the attachment procedure. By introducing a fitness-based link attractivity for the nodes, we find a phase transition where communities disappear for high heterogeneity of the fitness distribution, while a different mesoscopic organization of the nodes emerges, with groups of nodes shared between just a few superhubs, which attract most of the links of the system.
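The triadic-closure growth idea can be illustrated with a toy simulation (the probability `p_triad`, the two-links-per-node rule, and all parameter values are illustrative assumptions, not the paper's exact model): each new node attaches to a random "anchor" node and, most of the time, to one of the anchor's neighbors, closing a triangle.

```python
import random

def triadic_closure_graph(n, p_triad=0.8, seed=42):
    """Grow a graph from a single edge: each new node first attaches to a
    randomly chosen existing node (the "anchor"), then, with probability
    p_triad, to a random neighbor of the anchor (closing a triangle);
    otherwise to another random node."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    for new in range(2, n):
        anchor = rng.choice(list(adj))
        targets = {anchor}
        if rng.random() < p_triad:
            targets.add(rng.choice(sorted(adj[anchor])))  # triadic closure
        else:
            targets.add(rng.choice([v for v in adj if v != anchor]))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
    return adj

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = sorted(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))

g = triadic_closure_graph(300)
avg_clustering = sum(local_clustering(g, v) for v in g) / len(g)
```

Even this minimal version produces the high clustering mentioned in the abstract, since most new nodes are born inside a triangle.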
Reputation is an important social construct in science, which enables informed quality assessments of both publications and careers of scientists in the absence of complete systemic information. However, the relation between reputation and career growth of an individual remains poorly understood, despite the recent proliferation of quantitative research evaluation methods. Here, we develop an original framework for measuring how a publication's citation rate Δc depends on the reputation of its central author i, in addition to its net citation count c. To estimate the strength of the reputation effect, we perform a longitudinal analysis on the careers of 450 highly cited scientists, using the total citations Ci of each scientist as his/her reputation measure. We find a citation crossover c×, which distinguishes the strength of the reputation effect. For publications with c < c×, the author's reputation is found to dominate the annual citation rate. Hence, a new publication may gain a significant early advantage corresponding to roughly a 66% increase in the citation rate for each tenfold increase in Ci. However, the reputation effect becomes negligible for highly cited publications, meaning that, for c ≫ c×, the citation rate measures scientific impact more transparently. In addition, we have developed a stochastic reputation model, which is found to reproduce numerous statistical observations for real careers, thus providing insight into the microscopic mechanisms underlying cumulative advantage in science.
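A 66% increase per tenfold increase in reputation corresponds to a rate scaling roughly as a power of the author's total citations, since 10**0.22 ≈ 1.66. The sketch below is a hypothetical functional form built only from the numbers in the abstract (the crossover value, the exponent, and the shape of the curve are illustrative assumptions, not the paper's fit):

```python
def citation_rate(c, C_i, c_cross=40.0, beta=0.22):
    """Toy citation-rate curve: below the crossover c_cross, the rate is
    boosted by the author's total citations C_i as C_i**beta, so each
    tenfold increase in C_i multiplies the rate by 10**beta ~ 1.66
    (a ~66% increase); above the crossover, the paper's own citation
    count c dominates and reputation drops out."""
    if c < c_cross:
        return (C_i ** beta) * (1.0 + c)
    return 1.0 + c

# Tenfold reputation increase below the crossover: ~66% higher rate.
boost = citation_rate(10, 10_000) / citation_rate(10, 1_000)
# Above the crossover, reputation no longer matters.
same = citation_rate(100, 10 ** 6) == citation_rate(100, 1)
```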
Most algorithms for detecting communities in networks typically work without any information on the cluster structure to be found, as one generally has no a priori knowledge of it. Not surprisingly, knowing some features of the unknown partition could help its identification, improving the performance of the method. Here we show that, if the number of clusters were known beforehand, standard methods, like modularity optimization, would considerably gain in accuracy, mitigating the severe resolution bias that undermines the reliability of the results of the original unconstrained version. The number of clusters can be inferred from the spectra of the recently introduced nonbacktracking and flow matrices, even in benchmark graphs with realistic community structure. The limit of such a two-step procedure is the overhead of the computation of the spectra.
The impact factor (IF) of scientific journals has acquired a major role in the evaluation of the output of scholars, departments and whole institutions. Typically, papers appearing in journals with large values of the IF receive a high weight in such evaluations. However, at the end of the day one is interested in assessing the impact of individuals, rather than papers. Here we introduce the Author Impact Factor (AIF), which is the extension of the IF to authors. The AIF of an author A in year t is the average number of citations given by papers published in year t to papers published by A in a period of Δt years before year t. Due to its intrinsic dynamic character, the AIF is capable of capturing trends and variations of the impact of the scientific output of scholars in time, unlike the h-index, which is a growing measure that takes into account the whole career path.
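The AIF definition translates directly into a short computation. The data layout below (a list of an author's papers, each with its publication year and the years of its citing papers) is an illustrative assumption:

```python
def author_impact_factor(papers, t, delta_t=2):
    """AIF for year t: citations received in year t by the author's
    papers published in the delta_t years before t, divided by the
    number of such papers.  `papers` is a list of dicts with keys
    'year' (publication year) and 'citing_years' (publication years
    of the papers citing it)."""
    window = [p for p in papers if t - delta_t <= p['year'] < t]
    if not window:
        return 0.0
    cites_in_t = sum(p['citing_years'].count(t) for p in window)
    return cites_in_t / len(window)

papers = [
    {'year': 2010, 'citing_years': [2011, 2012, 2012]},
    {'year': 2011, 'citing_years': [2012, 2013]},
    {'year': 2008, 'citing_years': [2012]},  # outside the window for t=2012
]
aif_2012 = author_impact_factor(papers, t=2012, delta_t=2)  # (2 + 1) / 2
```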
We propose an extension to the map of Bentley et al. by incorporating an aspect of underlying network structure that is likely relevant for many modes of collective behavior. This dimension, which captures a feature of network community structure, is known both from theory and from experiments to be relevant for decision-making processes.
Correctly assessing a scientist's past research impact and potential for future impact is key in recruitment decisions and other evaluation processes. While a candidate's future impact is the main concern in these decisions, most measures only quantify the impact of previous work. Recently, it has been argued that linear regression models are capable of predicting a scientist's future impact. By applying that future impact model to 762 careers drawn from three disciplines (physics, biology, and mathematics), we identify a number of subtle, but critical, flaws in current models. Specifically, cumulative non-decreasing measures like the h-index contain intrinsic autocorrelation, resulting in significant overestimation of their "predictive power". Moreover, the predictive power of these models depends heavily upon a scientist's career age, producing the least accurate estimates for young researchers. Our results place in doubt the suitability of such models, and indicate that further investigation is required before they can be used in recruiting decisions.
Election data represent a precious source of information to study human behavior at a large scale. It has been claimed that, in proportional elections with open lists, the number of votes received by a candidate, rescaled by the average performance of all competitors in the same party list, has the same distribution regardless of the country and the year of the election. Here we provide the first thorough assessment of this claim. We analyzed election datasets of 15 countries with proportional systems. We confirm that a class of nations with similar election rules fulfills the universality claim. Discrepancies from this trend in other countries with open-list elections are always associated with peculiar differences in the election rules, which matter more than differences between countries and historical periods. Our analysis shows that the role of parties in the electoral performance of candidates is crucial: alternative scalings that do not take party affiliations into account lead to poor results.
Citation distributions are crucial for the analysis and modeling of the activity of scientists. We investigated bibliometric data of papers published in journals of the American Physical Society, searching for the type of function which best describes the observed citation distributions. We used the goodness of fit with Kolmogorov-Smirnov statistics for three classes of functions: log-normal, simple power law and shifted power law. The shifted power law turns out to be the most reliable hypothesis for all citation networks we derived, which correspond to different time spans. We find that citation dynamics is characterized by bursts, usually occurring within a few years after the publication of a paper, and that burst sizes span several orders of magnitude. We also investigated the microscopic mechanisms for the evolution of citation networks, by proposing a linear preferential attachment with time-dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and accounts for the presence of citation bursts as well.
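The growth mechanism named in the abstract, linear preferential attachment with a time-dependent initial attractiveness, can be sketched as follows. The exponential decay of the attractiveness, the number of references per paper, and all parameter values are illustrative assumptions, not the paper's calibrated model:

```python
import random

def grow_citation_network(n_papers, refs_per_paper=3, a0=5.0,
                          decay=0.95, seed=1):
    """Toy growth model: paper t picks its references among earlier
    papers i with probability proportional to c_i + A(t - i), a linear
    preferential attachment whose initial attractiveness A decays with
    paper age (here A(tau) = a0 * decay**tau).  Repeated references are
    allowed for simplicity.  Returns the final citation counts."""
    rng = random.Random(seed)
    citations = [0]
    for t in range(1, n_papers):
        weights = [citations[i] + a0 * decay ** (t - i) for i in range(t)]
        total = sum(weights)
        for _ in range(min(refs_per_paper, t)):
            r = rng.uniform(0.0, total)
            acc = 0.0
            for i, w in enumerate(weights):
                acc += w
                if acc >= r:
                    citations[i] += 1
                    break
        citations.append(0)
    return citations

cites = grow_citation_network(500)
```

Because the attractiveness term fades while the preferential term grows, early attention compounds: a handful of papers end up with citation counts far above the rest, a fat-tailed outcome in the spirit of the abstract.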
Modularity maximization is the most popular technique for the detection of community structure in graphs. It has been claimed that the resolution limit of the method can be removed by introducing modified versions of the measure with tunable resolution parameters. We show that multiresolution modularity suffers from two opposite, coexisting problems: the tendency to merge small subgraphs, which dominates when the resolution is low, and the tendency to split large subgraphs, which dominates when the resolution is high. In benchmark networks with heterogeneous distributions of cluster sizes, the simultaneous elimination of both biases is not possible and multiresolution modularity is not able to recover the planted community structure for any value of the resolution parameter, not even when it is pronounced and easily detectable by other methods. This holds for other multiresolution techniques as well and is likely to be a general problem of methods based on global optimization.
Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a great need for multipurpose techniques, able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks while accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure for partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method performs comparably to the best existing algorithms on artificial benchmark graphs. Several applications to real networks are shown as well. OSLOM is implemented in freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks.
Many systems in nature, society, and technology can be described as networks, where the vertices are the system's elements and edges between vertices indicate the interactions between the corresponding elements. Edges may be weighted if the interaction strength is measurable. However, the full network information is often redundant, because tools and techniques from network analysis do not work or become very inefficient if the network is too dense, and some weights may just reflect measurement errors and need to be discarded. Moreover, since weight distributions in many complex weighted networks are broad, most of the weight is concentrated among a small fraction of all edges. It is then crucial to properly detect relevant edges. Simple thresholding would leave only the largest weights, disrupting the multiscale structure of the system, which is at the basis of the structure of complex networks and ought to be kept. In this paper we propose a weight-filtering technique based on a global null model [the Global Statistical Significance (GloSS) filter], which keeps both the weight distribution and the full topological structure of the network. The method correctly quantifies the statistical significance of weights assigned independently to the edges from a given distribution. Applications to real networks reveal that the GloSS filter is indeed able to identify relevant connections between vertices.
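The global-null idea behind such a filter can be illustrated in a few lines: score each edge weight against the empirical weight distribution of the whole network rather than against a node-local null. This is a simplified sketch of the concept only; the actual GloSS filter also preserves the network topology in its null model:

```python
import bisect

def global_weight_pvalues(weights):
    """Simplified global-null weight filter: the p-value of an edge
    weight w is the probability of drawing a weight >= w from the
    empirical weight distribution of the whole network.  Small p-values
    flag edges whose weight is unlikely under the global null."""
    n = len(weights)
    sorted_w = sorted(weights)
    # number of weights >= w, found by binary search on the sorted list
    return [(n - bisect.bisect_left(sorted_w, w)) / n for w in weights]

pvals = global_weight_pvalues([1, 1, 2, 5, 100])
# the heaviest edge gets the smallest p-value, i.e. is most significant
```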
Nobel Prizes are commonly seen to be among the most prestigious achievements of our times. Based on mining several million citations, we quantitatively analyze the processes driving paradigm shifts in science. We find that groundbreaking discoveries of Nobel Prize Laureates and other famous scientists are not only acknowledged by many citations of their landmark papers. Surprisingly, they also boost the citation rates of their previous publications. Given that innovations must outcompete the rich-get-richer effect for scientific citations, it turns out that they can make their way only through citation cascades. A quantitative analysis reveals how and why they happen. Science appears to behave like a self-organized critical system, in which citation cascades of all sizes occur, from continuous scientific progress all the way up to scientific revolutions, which change the way we see our world. Measuring the "boosting effect" of landmark papers, our analysis reveals how new ideas and new players can make their way and finally triumph in a world dominated by established paradigms. The underlying "boost factor" is also useful to discover scientific breakthroughs and talents much earlier than through classical citation analysis, which by now has become a widespread method to measure scientific excellence, influencing scientific careers and the distribution of research funds. Our findings reveal patterns of collective social behavior, which are also interesting from an attention economics perspective. Understanding the origin of scientific authority may therefore ultimately help to explain how social influence comes about and why the value of goods depends so strongly on the attention they attract.
Community structure is one of the key properties of complex networks and plays a crucial role in their topology and function. While an impressive amount of work has been done on the issue of community detection, very little attention has so far been devoted to the investigation of communities in real networks.
Online popularity has an enormous impact on opinions, culture, policy, and profits. We provide a quantitative, large-scale, temporal analysis of the dynamics of online content popularity in two massive model systems: Wikipedia and an entire country's Web space. We find that the dynamics of popularity are characterized by bursts, displaying characteristic features of critical systems such as fat-tailed distributions of magnitude and interevent time. We propose a minimal model combining the classic preferential popularity increase mechanism with the occurrence of random popularity shifts due to exogenous factors. The model recovers the critical features observed in the empirical analysis of the systems analyzed here, highlighting the key factors needed in the description of popularity dynamics.
Percolation is one of the most studied processes in statistical physics. A recent paper by Achlioptas et al. [Science 323, 1453 (2009)] showed that the percolation transition, which is usually continuous, becomes discontinuous ("explosive") if links are added to the system according to special cooperative rules (Achlioptas processes). In this paper, we present a detailed numerical analysis of Achlioptas processes with the product rule on various systems, including lattices, random networks à la Erdős-Rényi, and scale-free networks. In all cases, we recover the explosive transition of Achlioptas et al. However, the explosive percolation transition is of a hybrid kind: despite the discontinuity of the order parameter at the threshold, one observes traces of analytical behavior, such as power-law distributions of cluster sizes. In particular, for scale-free networks with degree exponent lambda<3, all relevant percolation variables display power-law scaling, just as in continuous second-order phase transitions.
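The product rule of the Achlioptas process is simple enough to sketch directly: at each step two candidate edges are drawn at random, and the one minimizing the product of the sizes of the clusters it would join is kept, which delays the emergence of the giant cluster and sharpens the transition. A minimal sketch on a random graph, with a standard union-find structure:

```python
import random

class DisjointSet:
    """Union-find with union by size and path halving."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def product_rule_percolation(n, seed=0):
    """Achlioptas process with the product rule on n nodes: at each step
    two random candidate edges are drawn and the one minimizing the
    product of the sizes of the clusters it joins is kept.  Returns the
    fraction of nodes in the largest cluster after each added edge."""
    rng = random.Random(seed)
    ds = DisjointSet(n)
    largest, fractions = 1, []
    for _ in range(n):
        candidates = [(rng.randrange(n), rng.randrange(n)) for _ in range(2)]
        def product(e):
            return ds.size[ds.find(e[0])] * ds.size[ds.find(e[1])]
        a, b = min(candidates, key=product)
        ds.union(a, b)
        largest = max(largest, ds.size[ds.find(a)])
        fractions.append(largest / n)
    return fractions

frac = product_rule_percolation(2000)
```

Plotting `frac` against the edge density shows the hallmark of the process: the largest-cluster fraction stays tiny well past the ordinary percolation threshold and then jumps abruptly.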
Uncovering the community structure exhibited by real networks is a crucial step toward an understanding of complex systems that goes beyond the local organization of their constituents. Many algorithms have been proposed so far, but none of them has been subjected to strict tests to evaluate their performance. Most of the sporadic tests performed so far involved small networks with known community structure and/or artificial graphs with a simplified structure, which is very uncommon in real systems. Here we test several methods against a recently introduced class of benchmark graphs, with heterogeneous distributions of degree and community size. The methods are also tested against the benchmark by Girvan and Newman [Proc. Natl. Acad. Sci. U.S.A. 99, 7821 (2002)] and on random graphs. As a result of our analysis, three recent algorithms introduced by Rosvall and Bergstrom [Proc. Natl. Acad. Sci. U.S.A. 104, 7327 (2007); Proc. Natl. Acad. Sci. U.S.A. 105, 1118 (2008)], Blondel et al. [J. Stat. Mech.: Theory Exp. (2008), P10008], and Ronhovde and Nussinov [Phys. Rev. E 80, 016109 (2009)] have an excellent performance, with the additional advantage of low computational complexity, which enables one to analyze large systems.
Recently, the abundance of digital data has enabled the implementation of graph-based ranking algorithms that provide a system-level analysis for ranking publications and authors. Here, we take advantage of the entire Physical Review publication archive (1893-2006) to construct author networks where weighted edges, as measured from suitably normalized citation counts, define a proxy for the mechanism of scientific credit transfer. On this network, we define a ranking method based on a diffusion algorithm that mimics the spreading of scientific credits on the network. We compare the results obtained with our algorithm to those obtained by local measures, such as the citation count, and provide a statistical analysis of the assignment of major career awards in the area of physics. A website where the algorithm is made available to perform customized rank analysis can be found at the address http://www.physauthorsrank.org.
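The credit-diffusion idea can be sketched with a PageRank-style iteration on a weighted author network. This is an illustrative stand-in, not the paper's algorithm: the damping scheme, parameter values, and data layout are assumptions:

```python
def diffusion_rank(credit, damping=0.85, iters=100):
    """Toy credit-diffusion ranking: credit[i][j] is the (e.g.
    citation-derived) credit that author i transfers to author j.
    Scores diffuse along the weighted edges, PageRank-style, until
    they stabilize; authors receiving credit from well-ranked authors
    rank higher than raw citation counts alone would suggest."""
    nodes = sorted(set(credit) | {j for out in credit.values() for j in out})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        # authors with no outgoing credit redistribute uniformly
        dangling = sum(rank[v] for v in nodes if v not in credit)
        for v in nodes:
            new[v] += damping * dangling / n
        for i, out in credit.items():
            total = sum(out.values())
            for j, w in out.items():
                new[j] += damping * rank[i] * w / total
        rank = new
    return rank

ranks = diffusion_rank({'A': {'B': 2.0}, 'B': {'C': 1.0}, 'C': {'B': 1.0}})
# B ranks highest: it receives credit from both A and C
```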
We study scale-free networks constructed via a cooperative Achlioptas growth process. Links between nodes are introduced in order to produce a scale-free graph with given exponent lambda for the degree distribution, but the choice of each new link depends on the mass of the clusters that this link will merge. Networks constructed via this biased procedure show a percolation transition which strongly differs from the one observed in standard percolation, where links are introduced just randomly. The different growth process leads to a phase transition with a nonvanishing percolation threshold already for lambda > lambda_c ≈ 2.2. More interestingly, the transition is continuous when lambda < 3 but becomes discontinuous when lambda > 3. This may have important consequences both for the structure of networks and for the dynamics of processes taking place on them.
We study the coevolution of a generalized Glauber dynamics for Ising spins with tunable threshold and of the graph topology where the dynamics takes place. This simple coevolution dynamics generates a rich phase diagram in the space of the two parameters of the model, the threshold and the rewiring probability. The diagram displays phase transitions of different types: spin ordering, percolation, and connectedness. At variance with traditional coevolution models, in which all spins of each connected component of the graph have equal value in the stationary state, we find that, for suitable choices of the parameters, the system may converge to a state in which spins of opposite sign coexist in the same component organized in compact clusters of like-signed spins. Mean field calculations enable one to estimate some features of the phase diagram.
Many complex networks display a mesoscopic structure with groups of nodes sharing many links with the other nodes in their group and comparatively few with nodes of different groups. This feature is known as community structure and encodes precious information about the organization and the function of the nodes. Many algorithms have been proposed, but it is not yet clear how they should be tested. Recently we have proposed a general class of undirected and unweighted benchmark graphs, with heterogeneous distributions of node degree and community size. Increasing attention has recently been devoted to developing algorithms able to take the direction and the weight of the links into account, which requires suitable benchmark graphs for testing. In this paper we extend the basic ideas behind our previous benchmark to generate directed and weighted networks with built-in community structure. We also consider the possibility that nodes belong to multiple communities, a feature occurring in real systems, such as social networks. As a practical application, we show how modularity optimization performs on our benchmark.
Complex networks have acquired great popularity in recent years, since the graph representation of many natural, social, and technological systems is often very helpful to characterize and model their phenomenology. Additionally, the mathematical tools of statistical physics have proven to be particularly suitable for studying and understanding complex networks. Nevertheless, an important obstacle to this theoretical approach remains the difficulty of drawing parallels between network science and more traditional aspects of statistical physics. In this paper, we explore the relation between complex networks and a well-known topic of statistical physics: renormalization. A general method to analyze renormalization flows of complex networks is introduced. The method can be applied to study any suitable renormalization transformation. Finite-size scaling can be performed on computer-generated networks in order to classify them into universality classes. We also present applications of the method to real networks.
Modern information and communication technologies, especially the Internet, have diminished the role of spatial distances and territorial boundaries in the access to and transmissibility of information. This has enabled closer collaboration among scientists and the internationalization of science. Nevertheless, geography remains an important factor affecting the dynamics of science. Here we present a systematic analysis of citation and collaboration networks between cities and countries, by assigning papers to the geographic locations of their authors' affiliations. The citation flows as well as the collaboration strengths between cities decrease with the distance between them and follow gravity laws. In addition, the total research impact of a country grows linearly with the amount of national funding for research and development. However, the average impact reveals a peculiar threshold effect: the scientific output of a country may reach an impact larger than the world average only if the country invests more than about 100,000 USD per researcher annually.
The community structure of complex networks reveals both their organization and hidden relationships among their constituents. Most community detection methods currently available are not deterministic, and their results typically depend on the specific random seeds, initial conditions and tie-break rules adopted for their execution. Consensus clustering is used in data analysis to generate stable results out of a set of partitions delivered by stochastic methods. Here we show that consensus clustering can be combined with any existing method in a self-consistent way, enhancing considerably both the stability and the accuracy of the resulting partitions. This framework is also particularly suitable to monitor the evolution of community structure in temporal networks. An application of consensus clustering to a large citation network of physics papers demonstrates its capability to keep track of the birth, death and diversification of topics.
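The core of consensus clustering is the consensus (co-classification) matrix: how often each pair of nodes ends up in the same cluster across the partitions delivered by a stochastic method. A minimal one-pass sketch (the full procedure iterates, reclustering the consensus matrix until it becomes block diagonal; the threshold value here is an illustrative choice):

```python
def consensus_matrix(partitions):
    """Entry (i, j) is the fraction of input partitions in which nodes
    i and j share a cluster.  Each partition maps node -> label."""
    nodes = sorted(partitions[0])
    D = {(i, j): 0.0 for i in nodes for j in nodes}
    for part in partitions:
        for i in nodes:
            for j in nodes:
                if part[i] == part[j]:
                    D[(i, j)] += 1.0 / len(partitions)
    return D

def consensus_clusters(partitions, tau=0.5):
    """One-shot consensus clustering: threshold the consensus matrix at
    tau and return the connected components of the surviving graph."""
    D = consensus_matrix(partitions)
    nodes = sorted(partitions[0])
    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(u for u in nodes
                         if u not in component and D[(v, u)] > tau)
        seen |= component
        clusters.append(component)
    return clusters

parts = [{'a': 0, 'b': 0, 'c': 1, 'd': 1},
         {'a': 0, 'b': 0, 'c': 1, 'd': 1},
         {'a': 0, 'b': 1, 'c': 1, 'd': 1}]
cl = consensus_clusters(parts)  # the odd assignment of 'b' is voted away
```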