Self-Organizing Maps (SOMs) are readily available bioinformatics methods for clustering and visualizing high-dimensional data, provided that the biological information is first transformed into fixed-size, metric-based vectors. To increase the usefulness of SOM-based approaches for the analysis of genomic sequence data, novel representation methods are required that automatically and bijectively transform aligned nucleotide sequences into numeric vectors, dealing with both nucleotide ambiguity and gaps derived from sequence alignment.
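One way to picture such a representation is the sketch below. It is not the authors' method: it assumes a hypothetical 4-bit presence encoding over (A, C, G, T), in which each IUPAC ambiguity code marks the bases it allows and the alignment gap maps to all zeros. Because the 16 symbols map to 16 distinct bit patterns, the transformation can be inverted. RNA symbols such as U are not handled.

```python
# Hypothetical 4-bit presence encoding over (A, C, G, T).
# Illustrative assumption only; not the representation proposed in the abstract.
BASES = "ACGT"

# Standard IUPAC nucleotide codes and the bases each one allows; "-" is a gap.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "-": "",
}


def encode_symbol(symbol):
    """Return a 4-tuple with 1 for every base the symbol allows."""
    allowed = IUPAC[symbol.upper()]
    return tuple(1 if base in allowed else 0 for base in BASES)


# Inverse lookup: each of the 16 bit patterns belongs to exactly one symbol.
DECODE = {encode_symbol(symbol): symbol for symbol in IUPAC}


def encode_sequence(sequence):
    """Map an aligned sequence of length L to a flat vector of length 4 * L."""
    vector = []
    for symbol in sequence:
        vector.extend(encode_symbol(symbol))
    return vector


def decode_vector(vector):
    """Recover the (upper-case) aligned sequence from its encoded vector."""
    if len(vector) % 4 != 0:
        raise ValueError("vector length must be a multiple of 4")
    return "".join(
        DECODE[tuple(vector[i:i + 4])] for i in range(0, len(vector), 4)
    )
```

Aligned sequences of equal length yield vectors of equal length, which is the fixed-size input a SOM expects. Whether Euclidean distance on these vectors is a biologically sensible metric is a separate question that this sketch does not address.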
The prediction of links among variables from a given dataset is a task referred to as network inference or reverse engineering. It is an open problem in bioinformatics and systems biology, as well as in other areas of science. Information theory, which uses concepts such as mutual information, provides a rigorous framework for addressing it. While a number of information-theoretic methods are already available, most of them focus on a particular type of problem, introducing assumptions that limit their generality. Furthermore, many of these methods lack a publicly available implementation. Here we present MIDER, a method for inferring network structures with information-theoretic concepts. It consists of two steps: first, it provides a representation of the network in which the distance among nodes indicates their statistical closeness. Second, it refines the prediction of the existing links to distinguish between direct and indirect interactions and to assign directionality. The method accepts as input time-series data related to some quantitative features of the network nodes (such as concentrations, if the nodes are chemical species). It takes into account time delays between variables, and allows choosing among several definitions and normalizations of mutual information. It is general purpose: it may be applied to any type of network, cellular or otherwise. A Matlab implementation including source code and data is freely available (http://www.iim.csic.es/~gingproc/mider.html). The performance of MIDER has been evaluated on seven different benchmark problems that cover the main types of cellular networks, including metabolic, gene regulatory, and signaling. Comparisons with state-of-the-art information-theoretic methods have demonstrated the competitive performance of MIDER, as well as its versatility. Its use does not demand any a priori knowledge from the user; the default settings and the adaptive nature of the method provide good results for a wide range of problems without requiring tuning.
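The idea of scoring a link by mutual information while allowing for a time delay can be sketched as follows. This is not the MIDER implementation (which is in Matlab) and it does not reproduce its estimators or normalizations: it assumes a simple equal-width histogram (plug-in) estimate, and the function names and the `bins` parameter are illustrative.

```python
import math
from collections import Counter


def discretize(values, bins):
    """Assign each value to one of `bins` equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=4):
    """Plug-in estimate of I(X; Y) in bits from two equally long series."""
    if len(x) != len(y) or not x:
        raise ValueError("series must be non-empty and of equal length")
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(bx)
    count_x, count_y = Counter(bx), Counter(by)
    count_xy = Counter(zip(bx, by))
    mi = 0.0
    for (i, j), c in count_xy.items():
        # p(i, j) * log2( p(i, j) / (p(i) * p(j)) )
        mi += (c / n) * math.log2(c * n / (count_x[i] * count_y[j]))
    return mi


def lagged_mi(x, y, lag, bins=4):
    """Mutual information between x(t) and y(t + lag), for lag >= 0."""
    if lag < 0 or lag >= len(x):
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    return mutual_information(x[:len(x) - lag], y[lag:], bins)


def best_lag(x, y, max_lag, bins=4):
    """Delay in 0..max_lag at which x is most informative about y."""
    return max(range(max_lag + 1), key=lambda lag: lagged_mi(x, y, lag, bins))
```

Histogram estimates are biased upward for short series, so in practice the number of bins has to be chosen with the series length in mind.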
Because of its high mutation rates, large population sizes, and rapid replication, Human Immunodeficiency Virus type 1 (HIV-1) exhibits complex evolutionary strategies. For the analysis of evolutionary processes, the graphical representation of fitness landscapes provides a significant advantage. The experimental determination of viral fitness remains, in general, difficult, and consequently most published fitness landscapes have been artificial, theoretical or estimated. Self-Organizing Maps (SOMs) are a class of Artificial Neural Network (ANN) for the generation of topologically ordered maps. Here, three-dimensional (3D) data-driven fitness landscapes, derived from a collection of sequences from HIV-1 viruses after "in vitro" passages and labelled with the corresponding experimental fitness values, were created by SOM. These maps were used for the visualization and study of the evolutionary process of HIV-1 "in vitro" fitness recovery, by directly relating fitness values with viral sequences. In addition to the representation of the sequence space search carried out by the viruses, these landscapes could also be applied to the analysis of related variants such as members of viral quasispecies. SOM maps permit the visualization of the complex evolutionary pathways in HIV-1 fitness recovery. SOM fitness landscapes have enormous potential for the study of evolution in related viruses, whether from "in vitro" work or from "in vivo" clinical studies of human, animal or plant viral infections.
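The general idea of a SOM-derived fitness landscape can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes sequences have already been turned into numeric vectors, uses a small rectangular grid with a Gaussian neighbourhood and linearly decaying learning rate, and takes the landscape height at each node to be the mean fitness of the samples mapped to it. All names and default parameters are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows) for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1 - frac)                    # decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)    # shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-d2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * influence * (x[i] - w[i])
            step += 1
    return weights


def fitness_landscape(weights, data, fitness):
    """Mean fitness of the samples mapped to each occupied node."""
    sums, counts = {}, {}
    for x, f in zip(data, fitness):
        node = best_matching_unit(weights, x)
        sums[node] = sums.get(node, 0.0) + f
        counts[node] = counts.get(node, 0) + 1
    return {node: sums[node] / counts[node] for node in sums}
```

Plotting the returned heights over the grid coordinates gives a 3D surface of the kind described above; nodes with no mapped samples are simply absent from the result and would need interpolation.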
The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.
We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualization of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available.
A new approach for parameter estimation in chemical kinetics has been recently proposed (Ross et al. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 12777). It makes use of an optimization criterion based on a Generalized Fisher Equation (GFE). Its utility has been demonstrated with two reaction mechanisms, the chlorite-iodide and Oregonator, which are computationally stiff systems. In this Article, the performance of the GFE-based algorithm is compared to that obtained from minimization of the squared distances between the observed and predicted concentrations obtained by solving the corresponding initial value problem (we call this latter approach "traditional" for simplicity). Comparison of the proposed GFE-based optimization method with the "traditional" one has revealed differences in their performance. These differences can be seen as a trade-off between speed (which favors GFE) and accuracy (which favors the traditional method). The chlorite-iodide and Oregonator systems are again chosen as case studies. An identifiability analysis is performed for both of them, followed by an optimal experimental design based on the Fisher Information Matrix (FIM). This makes it possible to identify and overcome most of the previously encountered identifiability issues, improving the estimation accuracy. With the new data, obtained from optimally designed experiments, it is now possible to estimate effectively more parameters than with the previous data. This result, which holds for both GFE-based and traditional methods, stresses the importance of an appropriate experimental design. Finally, a new hybrid method that combines advantages from the GFE and traditional approaches is presented.
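The "traditional" approach can be illustrated on a toy problem. The sketch below does not use the chlorite-iodide or Oregonator mechanisms: it assumes a single first-order decay, dA/dt = -kA, integrates the initial value problem with a fixed-step Runge-Kutta scheme, and finds the rate coefficient by golden-section search on the sum of squared residuals. The model, the search bounds, and all function names are assumptions made for illustration.

```python
import math


def simulate(k, a0, times, substeps=20):
    """Integrate dA/dt = -k * A with RK4; return A at each time in `times`.

    `times` must be ascending and non-negative; integration starts at t = 0.
    """
    def rate(a):
        return -k * a

    concentrations, a, t = [], a0, 0.0
    for t_next in times:
        h = (t_next - t) / substeps
        for _ in range(substeps):
            k1 = rate(a)
            k2 = rate(a + 0.5 * h * k1)
            k3 = rate(a + 0.5 * h * k2)
            k4 = rate(a + h * k3)
            a += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        concentrations.append(a)
        t = t_next
    return concentrations


def sum_squared_error(k, a0, times, observed):
    """Squared distance between observed and predicted concentrations."""
    predicted = simulate(k, a0, times)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_rate_coefficient(a0, times, observed, lo=0.0, hi=5.0, tol=1e-6):
    """Golden-section search for k in [lo, hi]; assumes a unimodal objective."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sum_squared_error(c, a0, times, observed) < sum_squared_error(
            d, a0, times, observed
        ):
            b = d
        else:
            a = c
    return (a + b) / 2
```

Every evaluation of the objective requires a full integration of the model, which is where the cost of this approach comes from; for stiff systems a fixed-step explicit integrator like the one above would not be adequate and an implicit solver would be needed.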
Tools-4-Metatool (T4M) is a suite of web tools, implemented in Perl, which analyses, parses, and manipulates files related to Metatool. Its main goal is to assist the work with Metatool. T4M has two major sets of tools: Analysis and Compare. Analysis visualizes the results of Metatool (convex basis, elementary flux modes, and enzyme subsets) and facilitates the study of metabolic networks. It is composed of five tools: MDigraph, MetaMatrix, CBGraph, EMGraph, and SortEM. Compare was developed to compare Metatool results from different networks. This set consists of Compara and ComparaSub, which compare network subsets and provide outputs in different formats, and ComparaEM, which searches for identical elementary modes in two metabolic networks. The T4M suite also includes one script that generates Metatool input: CBasis2Metatool, based on a Metatool output file that is filtered by a list of convex basis metabolites. Finally, the utility CheckMIn checks the consistency of the Metatool input file. T4M is available at http://solea.quim.ucm.es/t4m.
A generalized Fisher equation (GFE) relates the time derivative of the average of the intrinsic rate of growth to its variance. The GFE is an exact mathematical result that has been widely used in population dynamics and genetics, where it originated. Here we demonstrate that the GFE can also be useful in other fields, specifically in chemistry, with models of two chemical reaction systems for which the mechanisms and rate coefficients correspond reasonably well to experiments. A bad fit of the GFE can be a sign of high levels of measurement noise; for low or moderate levels of noise, fulfillment of the GFE is not degraded. Hence, the GFE presents a noise threshold that may be used to test the validity of experimental measurements without requiring any additional information. In a different approach, information about the system (model) is included in the calculations. In that case, the discrepancy with the GFE can be used as an optimization criterion for the determination of rate coefficients in a given reaction mechanism.
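The relation between the mean growth rate and its variance can be checked numerically in the simplest setting. The sketch below assumes species that grow exponentially with constant intrinsic rates r_i, so that the weighted average ⟨r⟩ = Σ p_i r_i, with p_i = x_i / Σ x_j, satisfies d⟨r⟩/dt = Var(r). Time-varying rates, which add a further term to the relation, are not covered, and the function names are illustrative rather than taken from the paper.

```python
import math


def abundance_fractions(x0, rates, t):
    """Fractions p_i(t) for x_i(t) = x_i(0) * exp(r_i * t)."""
    x = [xi * math.exp(r * t) for xi, r in zip(x0, rates)]
    total = sum(x)
    return [xi / total for xi in x]


def mean_rate(x0, rates, t):
    """Abundance-weighted average of the intrinsic growth rates."""
    p = abundance_fractions(x0, rates, t)
    return sum(pi * r for pi, r in zip(p, rates))


def variance_rate(x0, rates, t):
    """Abundance-weighted variance of the intrinsic growth rates."""
    p = abundance_fractions(x0, rates, t)
    mean = sum(pi * r for pi, r in zip(p, rates))
    return sum(pi * (r - mean) ** 2 for pi, r in zip(p, rates))


def fisher_residual(x0, rates, t, h=1e-4):
    """d<r>/dt (central difference) minus Var(r); near zero if the relation holds."""
    derivative = (
        mean_rate(x0, rates, t + h) - mean_rate(x0, rates, t - h)
    ) / (2 * h)
    return derivative - variance_rate(x0, rates, t)
```

With measured rather than simulated abundances, the same residual would be computed from noisy data, and its size is the kind of discrepancy the abstract proposes as a validity test or as an optimization criterion.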
High spatial resolution images have been increasingly used for urban land use/cover classification, but the high spectral variation within the same land cover, the spectral confusion among different land covers, and the shadow problem often lead to poor classification performance based on the traditional per-pixel spectral-based classification methods. This paper explores approaches to improve urban land cover classification with Quickbird imagery. Traditional per-pixel spectral-based supervised classification, incorporation of textural images and multispectral images, spectral-spatial classifier, and segmentation-based classification are examined in a relatively new developing urban landscape, Lucas do Rio Verde in Mato Grosso State, Brazil. This research shows that use of spatial information during the image classification procedure, either through the integrated use of textural and spectral images or through the use of segmentation-based classification method, can significantly improve land cover classification performance.
A living organism must not only organize itself from within; it must also maintain its organization in the face of changes in its environment and degradation of its components. We show here that a simple (M,R)-system consisting of three interlocking catalytic cycles, with every catalyst produced by the system itself, can both establish a non-trivial steady state and maintain this despite continuous loss of the catalysts by irreversible degradation. As long as at least one catalyst is present at a sufficient concentration in the initial state, the others can be produced and maintained. The system shows bistability, because if the amount of catalyst in the initial state is insufficient to reach the non-trivial steady state the system collapses to a trivial steady state in which all fluxes are zero. It is also robust, because if one catalyst is catastrophically lost when the system is in steady state it can recreate the same state. There are three elementary flux modes, but none of them is an enzyme-maintaining mode, the entire network being necessary to maintain the two catalysts.
We introduce systematic approaches to chemical kinetics based on the use of phase-phase (log-log) representations of the rate equations. For slow processes, we obtain a corrected form of the mass-action law, where the concentrations are replaced by kinetic activities. For fast reactions, delay expressions are derived. The phase-phase expansion is, in general, applicable to kinetic and transport processes. A mechanism is introduced for the occurrence of a generalized mass-action law as a result of self-similar recycling. We show that our self-similar recycling model applied to prothrombin assays reproduces the empirical equations for the International Normalized Ratio calibration (INR), as well as the Watala, Golanski, and Kardas relation (WGK) for the dependence of the INR on the concentrations of coagulation factors. Conversely, the experimental calibration equation for the INR, combined with the experimental WGK relation, without the use of theoretical models, leads to a generalized mass-action type kinetic law.
Determining the regulation of metabolic networks at genome scale is a hard task. It has been hypothesized that biochemical pathways and metabolic networks might have undergone an evolutionary process of optimization with respect to several criteria over time. In this contribution, a multi-criteria approach has been used to optimize parameters for the allosteric regulation of enzymes in a model of a metabolic substrate-cycle. This has been carried out by calculating the Pareto set of optimal solutions according to two objectives: the proper direction of flux in a metabolic cycle and the energetic cost of applying the set of parameters. Different Pareto fronts have been calculated for eight different "environments" (specific time courses of end product concentrations). For each resulting front the so-called knee point is identified, which can be considered a preferred trade-off solution. Interestingly, the optimal control parameters corresponding to each of these points also lead to optimal behaviour in all the other environments. By calculating the average of the different parameter sets for the knee solutions more frequently found, a final and optimal consensus set of parameters can be obtained, which is an indication of the existence of a universal regulation mechanism for this system. The implications of such a universal regulatory switch are discussed in the framework of large metabolic networks.
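Extracting a Pareto front and a knee point from a set of candidate solutions can be sketched as follows. The abstract does not say how the knee was identified, so the rule used here is an assumption: the knee is taken as the front point farthest from the straight line joining the two extreme solutions, a common heuristic. Both objectives are assumed to be expressed as quantities to minimize, and the function names are hypothetical.

```python
import math


def pareto_front(points):
    """Non-dominated (objective_1, objective_2) tuples, both minimized."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def knee_point(front):
    """Front point with the largest distance to the chord between the extremes.

    `front` must be sorted by the first objective, as returned by pareto_front.
    """
    if len(front) < 3:
        return front[0]
    (x1, y1), (x2, y2) = front[0], front[-1]
    chord_length = math.hypot(x2 - x1, y2 - y1)

    def distance_to_chord(p):
        cross = (x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)
        return abs(cross) / chord_length

    return max(front, key=distance_to_chord)
```

Because the distance depends on the units of each objective, the two objectives would normally be rescaled to a comparable range before the knee is located.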
A fundamental landmark in the emergence and maintenance of the first proto-biological systems must have been the formation of a "closed" metabolic organization, and this paper describes a stochastic analysis of a simple model of a system that is closed to efficient causation. Although it shows an absorbing barrier corresponding to the trivial solution that implies collapse and extinction, for certain values of the kinetic parameters it can also show a "coexistence state" in which there are non-null populations of its intermediates, which corresponds approximately to a non-trivial deterministic stable steady state. Depending on the initial conditions, fluctuations can drive the system either to the self-maintaining regime or to extinction, with different probabilities. Different lines of equal probability have been obtained and compared with the deterministic results, and the average time for reaching these states (characteristic time) has been estimated. The system shows strong dependence on volume size, and there is a critical volume below which it collapses very rapidly. The characteristic time is also affected by the volume, with faster responses for lower system volumes. All these results are discussed in the context of the origin of living organization.