Bacterial genomes and large-scale computer software projects both consist of a large number of components (genes or software packages) connected via a network of mutual dependencies. Components can be easily added or removed from individual systems, and their use frequencies vary over many orders of magnitude. We study this frequency distribution in genomes of ?500 bacterial species and in over 2 million Linux computers and find that in both cases it is described by the same scale-free power-law distribution with an additional peak near the tail of the distribution corresponding to nearly universal components. We argue that the existence of a power law distribution of frequencies of components is a general property of any modular system with a multilayered dependency network. We demonstrate that the frequency of a component is positively correlated with its dependency degree given by the total number of upstream components whose operation directly or indirectly depends on the selected component. The observed frequency/dependency degree distributions are reproduced in a simple mathematically tractable model introduced and analyzed in this study.
In prokaryotic genomes the number of transcriptional regulators is known to be proportional to the square of the total number of protein-coding genes. A toolbox model of evolution was recently proposed to explain this empirical scaling for metabolic enzymes and their regulators. According to its rules, the metabolic network of an organism evolves by horizontal transfer of pathways from other species. These pathways are part of a larger "universal" network formed by the union of all species-specific networks. It remained to be understood, however, how the topological properties of this universal network influence the scaling law of functional content of genomes in the toolbox model. Here we answer this question by first analyzing the scaling properties of the toolbox model on arbitrary tree-like universal networks. We prove that critical branching topology, in which the average number of upstream neighbors of a node is equal to one, is both necessary and sufficient for quadratic scaling. We further generalize the rules of the model to incorporate reactions with multiple substrates/products as well as branched and cyclic metabolic pathways. To achieve its metabolic tasks, the new model employs evolutionary optimized pathways with minimal number of reactions. Numerical simulations of this realistic model on the universal network of all reactions in the KEGG database produced approximately quadratic scaling between the number of regulated pathways and the size of the metabolic network. To quantify the geometrical structure of individual pathways, we investigated the relationship between their number of reactions, byproducts, intermediate, and feedback metabolites. Our results validate and explain the ubiquitous appearance of the quadratic scaling for a broad spectrum of topologies of underlying universal metabolic networks. They also demonstrate why, in spite of "small-world" topology, real-life metabolic networks are characterized by a broad distribution of pathway lengths and sizes of metabolic regulons in regulatory networks.
It has been reported that the number of transcription factors encoded in prokaryotic genomes scales approximately quadratically with their total number of genes. We propose a conceptual explanation of this finding and illustrate it using a simple model in which metabolic and regulatory networks of prokaryotes are shaped by horizontal gene transfer of coregulated metabolic pathways. Adapting to a new environmental condition monitored by a new transcription factor (e.g., learning to use another nutrient) involves both acquiring new enzymes and reusing some of the enzymes already encoded in the genome. As the repertoire of enzymes of an organism (its toolbox) grows larger, it can reuse its enzyme tools more often and thus needs to get fewer new ones to master each new task. From this observation, it logically follows that the number of functional tasks and their regulators increases faster than linearly with the total number of genes encoding enzymes. Genomes can also shrink, e.g., because of a loss of a nutrient from the environment, followed by deletion of its regulator and all enzymes that become redundant. We propose several simple models of network evolution elaborating on this toolbox argument and reproducing the empirically observed quadratic scaling. The distribution of lengths of pathway branches in our model agrees with that of the real-life metabolic network of Escherichia coli. Thus, our model provides a qualitative explanation for broad distributions of regulon sizes in prokaryotes.
Related JoVE Video
Journal of Visualized Experiments
What is Visualize?
JoVE Visualize is a tool created to match the last 5 years of PubMed publications to methods in JoVE's video library.
How does it work?
We use abstracts found on PubMed and match them to JoVE videos to create a list of 10 to 30 related methods videos.
Video X seems to be unrelated to Abstract Y...
In developing our video relationships, we compare around 5 million PubMed articles to our library of over 4,500 methods videos. In some cases the language used in the PubMed abstracts makes matching that content to a JoVE video difficult. In other cases, there happens not to be any content in our video library that is relevant to the topic of a given abstract. In these cases, our algorithms are trying their best to display videos with relevant content, which can sometimes result in matched videos with only a slight relation.