In ultra-high dimensional data analysis, it is extremely challenging to identify important interaction effects, and a top concern in practice is computational feasibility. For a data set with n observations and p predictors, the augmented design matrix including all linear and order-2 terms is of size n × (p² + 3p)/2. When p is large, say in the thousands or more, the number of interactions is enormous and beyond the capacity of standard machines and software tools for storage and analysis. In theory, interaction selection consistency is hard to achieve in high dimensional settings. Interaction effects have heavier tails and more complex covariance structures than main effects in a random design, making theoretical analysis difficult. In this article, we propose to tackle these issues by forward-selection based procedures called iFOR, which identify interaction effects in a greedy forward fashion while maintaining the natural hierarchical model structure. Two algorithms, iFORT and iFORM, are studied. Computationally, the iFOR procedures are designed to be simple and fast to implement. No complex optimization tools are needed, since only OLS-type calculations are involved; the iFOR algorithms avoid storing and manipulating the whole augmented matrix, so the memory and CPU requirements are minimal; the computational complexity is linear in p for sparse models, hence feasible for p ≫ n. Theoretically, we prove that they possess the sure screening property for ultra-high dimensional settings. Numerical examples are used to demonstrate their finite sample performance.
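As a rough illustration of the scale involved, and of what maintaining a hierarchical structure can mean for the candidate set, the sketch below counts the columns of the full augmented design and lists the terms a forward step might consider under a strong-heredity rule (an order-2 term is eligible only when both parent main effects are already selected). The function names and the exact heredity rule are assumptions made for illustration; this is not the published iFORT or iFORM algorithm.

```python
from itertools import combinations_with_replacement


def augmented_columns(p):
    """Number of linear plus order-2 terms for p predictors: (p^2 + 3p) / 2."""
    # p main effects + p squared terms + p(p-1)/2 cross terms
    return (p * p + 3 * p) // 2


def hierarchical_candidates(p, selected_main):
    """Terms eligible at the next forward step under an assumed strong-heredity rule.

    selected_main is a set of predictor indices already in the model.
    Interactions already in the model are not tracked in this sketch.
    """
    chosen = sorted(selected_main)
    # Main effects not yet selected are always eligible.
    main = [(j,) for j in range(p) if j not in selected_main]
    # Order-2 terms (including squares) are eligible only among selected parents.
    order2 = list(combinations_with_replacement(chosen, 2))
    return main + order2
```

With p = 1000 and two main effects selected, the candidate list has 1,001 entries, whereas the full augmented design would have 501,500 columns; a restriction of this kind is one way the full matrix never needs to be stored.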
Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for the structure estimation in the context of Reproducing Kernel Hilbert Spaces. The proposed approach takes advantage of the functional principal components which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application.
The support vector machine (SVM) is a popular learning method for binary classification. Standard SVMs treat all the data points equally, but in some practical problems it is more natural to assign different weights to observations from different classes. This leads to a broader class of learning, the so-called weighted SVMs (WSVMs), and one of their important applications is to estimate class probabilities besides learning the classification boundary. There are two parameters associated with the WSVM optimization problem: one is the regularization parameter and the other is the weight parameter. In this paper we first establish that the WSVM solutions are jointly piecewise-linear with respect to both the regularization and weight parameter. We then develop a state-of-the-art algorithm that can compute the entire trajectory of the WSVM solutions for every pair of the regularization parameter and the weight parameter, at a feasible computational cost. The derived two-dimensional solution surface provides theoretical insight on the behavior of the WSVM solutions. Numerically, the algorithm can greatly facilitate the implementation of the WSVM and automate the selection process of the optimal regularization parameter. We illustrate the new algorithm on various examples.
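The link between the weight parameter and class probabilities can be sketched as follows. Under a common convention (assumed here, not stated in the abstract), positive observations receive weight 1 − π and negative observations weight π, so that the weighted classifier at a point x targets sign(p(x) − π), where p(x) is the conditional probability of the positive class. Scanning π over a grid and locating where the predicted label flips then brackets p(x). The function name and the grid are hypothetical, and the decisions would come from fitted WSVMs rather than being supplied by hand.

```python
def bracket_probability(weights, decisions):
    """Estimate p(x) from predicted labels at one point x over a grid of weights.

    weights:   increasing grid of pi values in (0, 1)
    decisions: predicted labels (+1 or -1) at x, one per weight
    """
    # Largest pi at which the point is still classified as positive.
    lower = max((w for w, d in zip(weights, decisions) if d > 0), default=0.0)
    # Smallest pi at which the point is classified as negative.
    upper = min((w for w, d in zip(weights, decisions) if d < 0), default=1.0)
    # Midpoint of the bracketing interval.
    return 0.5 * (lower + upper)
```

Because such an estimate needs the fitted classifier at many values of the weight parameter, an algorithm that delivers the whole solution surface at once makes this kind of scan far cheaper than refitting for each π.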
Mantle cell lymphoma (MCL) demonstrates cytologic features that overlap with those of other types of B-cell non-Hodgkin lymphomas (B-cell NHLs) containing small to medium-sized cells. The accurate diagnosis of MCL is important because MCL has relatively more aggressive biologic behavior and thus requires specific treatment regimens. Fine-needle aspiration (FNA) is used for diagnosing or staging lymphoma, often with the help of immunophenotyping by flow cytometry. However, the cellularity of an FNA sample may not be high enough for flow cytometry, leading to diagnostic difficulty. SOX11 immunostaining is helpful in the diagnosis of MCL in histologic sections. However, to the authors' knowledge, its diagnostic value for FNA samples has not been studied to date.
Rapid advances in high-throughput genomic technology have enabled biology to enter the era of 'Big Data' (large datasets). The plant science community not only needs to build its own Big-Data-compatible parallel computing and data management infrastructures, but also to seek novel analytical paradigms to extract information from the overwhelming amounts of data. Machine learning offers promising computational and analytical solutions for the integrative analysis of large, heterogeneous and unstructured datasets on the Big-Data scale, and is gradually gaining popularity in biology. This review introduces the basic concepts and procedures of machine-learning applications and envisages how machine learning could interface with Big Data technology to facilitate basic research and biotechnology in the plant sciences.
Diacetyl (DA), a component of artificial butter flavoring, has been linked to the development of bronchiolitis obliterans (BO), a disease of airway epithelial injury and airway fibrosis. The epidermal growth factor receptor ligand, amphiregulin (AREG), has been implicated in other types of epithelial injury and lung fibrosis. We investigated the effects of DA directly on the pulmonary epithelium, and we hypothesized that DA exposure would result in epithelial cell shedding of AREG. Consistent with this hypothesis, we demonstrate that DA increases AREG by the pulmonary epithelial cell line NCI-H292 and by multiple independent primary human airway epithelial donors grown under physiologically relevant conditions at the air-liquid interface. Furthermore, we demonstrate that AREG shedding occurs through a TNF-α-converting enzyme (TACE)-dependent mechanism via inhibition of TACE activity in epithelial cells using the small molecule inhibitor, TNF-α protease inhibitor-1, as well as TACE-specific small inhibitor RNA. Finally, we demonstrate supportive in vivo results showing increased AREG transcript and protein levels in the lungs of rodents with DA-induced BO. In summary, our novel in vitro and in vivo observations suggest that further study of AREG is warranted in the pathogenesis of DA-induced BO.
Model selection and estimation are crucial parts of econometrics. This paper introduces a new technique that can simultaneously estimate and select the model in the generalized method of moments (GMM) context. The GMM is particularly powerful for analyzing complex data sets such as longitudinal and panel data, and it has wide applications in econometrics. This paper extends the least squares based adaptive elastic net estimator of Zou and Zhang (2009) to nonlinear equation systems with endogenous variables. The extension is not trivial and involves a new proof technique because the estimators lack closed-form solutions. Compared to the Bridge-GMM of Caner (2009), we allow the number of parameters to diverge to infinity and allow collinearity among a large number of variables; the redundant parameters are also set to zero via a data-dependent technique. This method has the oracle property, meaning that we can estimate nonzero parameters with their standard limit while the redundant parameters are dropped from the equations simultaneously. Numerical examples are used to illustrate the performance of the new method.
Baseline human papillomavirus (HPV) prevalence and type distribution were evaluated in young Chinese women enrolled in a clinical trial of an HPV vaccine (ClinicalTrials.gov registration NCT00779766). Cervical specimens and blood samples were collected at baseline from women aged 18-25 years (n = 6,051) from four sites across Jiangsu province. Cervical specimens were tested for HPV DNA by SPF10 PCR-DEIA-LiPA25 version 1, and HPV-16/18 type-specific polymerase chain reaction. Anti-HPV-16 and anti-HPV-18 antibody titres were quantified by enzyme-linked immunosorbent assay. At baseline, 15.3% of women were DNA positive for any of 14 HPV high-risk (hr) types (HPV-16/18/31/33/35/39/45/51/52/56/58/59/66/68). The most commonly detected hrHPV types in cervical specimens were HPV-52 (4.0%) and HPV-16 (3.7%). High-risk HPV DNA-positivity increased with severity of cytological abnormalities: 39.3% in atypical squamous cells of undetermined significance, 85.0% in low-grade squamous intraepithelial lesions and 97.8% in high-grade squamous intraepithelial lesions (HSIL). The hrHPV types most frequently detected in HSIL were HPV-16 (63.0%), HPV-18 (17.4%), HPV-52 (17.4%), HPV-58 (15.2%) and HPV-33 (15.2%). The hrHPV types most frequently detected in cervical intraepithelial neoplasia 2+ were HPV-16 (66.1%), HPV-33 (16.1%), HPV-52 (16.1%), HPV-58 (14.5%) and HPV-51 (11.3%). Multiple hrHPV infections were reported for 24.4% of hrHPV DNA positive women. Regardless of baseline HPV DNA status, 30.5% and 16.0% of subjects were initially seropositive for anti-HPV-16 and anti-HPV-18, respectively. In conclusion, the high baseline seropositivity rate and intermediate prevalence of cervical hrHPV types in Chinese women aged 18-25 years underline the importance of early HPV vaccination in this population.
This phase II/III, double-blind, randomized trial assessed the efficacy, immunogenicity and safety of the human papillomavirus (HPV)-16/18 AS04-adjuvanted vaccine in young Chinese women (ClinicalTrials.gov registration NCT00779766). Women aged 18-25 years from Jiangsu province were randomized (1:1) to receive HPV vaccine (n = 3,026) or Al(OH)3 control (n = 3,025) at months 0, 1 and 6. The primary objective was vaccine efficacy (VE) against HPV-16/18 associated 6-month persistent infection (PI) and/or cervical intraepithelial neoplasia (CIN) 1+. Secondary objectives were VE against virological and clinical endpoints associated with HPV-16/18 and with high-risk HPV types, immunogenicity and safety. Mean follow-up for the according-to-protocol cohort for efficacy (ATP-E) was approximately 15 months after the third dose. In the ATP-E (vaccine = 2,889; control = 2,894), for initially HPV DNA negative and seronegative subjects, HPV-16/18 related VE (95% CI) was 94.2% (62.7, 99.9) against 6-month PI and/or CIN1+ and 93.8% (60.2, 99.9) against cytological abnormalities. VE against HPV-16/18 associated CIN1+ and CIN2+ was 100% (-50.4, 100) and 100% (-140.2, 100), respectively (no cases in the vaccine group and 4 CIN1+ and 3 CIN2+ cases in the control group). At Month 7, at least 99.7% of initially seronegative vaccine recipients had seroconverted for HPV-16/18; geometric mean antibody titres (95% CI) were 6,996 (6,212 to 7,880) EU/mL for anti-HPV-16 and 3,309 (2,942 to 3,723) EU/mL for anti-HPV-18. Safety outcomes between groups were generally similar. The HPV-16/18 AS04-adjuvanted vaccine is effective, immunogenic and has a clinically acceptable safety profile in young Chinese women. Prophylactic HPV vaccination has the potential to substantially reduce the burden of cervical cancer in China.
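The 100% point estimates with very wide intervals follow from the usual definition of vaccine efficacy as one minus a relative risk: with zero cases in the vaccine arm the point estimate is 100% whatever the denominators, while a handful of control cases leaves the interval poorly constrained. A minimal sketch of the point estimate is given below; it assumes simple attack rates (the trial may have used person-time rates and a different interval method), and the function name and the denominators in the usage note are hypothetical.

```python
def vaccine_efficacy(cases_vaccine, n_vaccine, cases_control, n_control):
    """Point estimate VE = 1 - (attack rate in vaccine arm / attack rate in control arm)."""
    if cases_control == 0:
        # The relative risk has a zero denominator in this case.
        raise ValueError("VE is undefined when the control arm has no cases")
    rate_vaccine = cases_vaccine / n_vaccine
    rate_control = cases_control / n_control
    return 1.0 - rate_vaccine / rate_control
```

For example, 0 cases versus 4 cases in two arms of 1,000 subjects each (illustrative denominators only) gives a VE of 1.0, i.e. 100%, and the same would hold for any arm sizes.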
In high-dimensional data analysis, it is of primary interest to reduce the data dimensionality without loss of information. Sufficient dimension reduction (SDR) arises in this context, and many successful SDR methods have been developed since the introduction of sliced inverse regression (SIR) [Li (1991) Journal of the American Statistical Association 86, 316-327]. Despite their fast progress, though, most existing methods target regression problems with a continuous response. For binary classification problems, SIR suffers from the limitation of estimating at most one direction, since only two slices are available. In this article, we develop a new and flexible probability-enhanced SDR method for binary classification problems by using the weighted support vector machine (WSVM). The key idea is to slice the data based on conditional class probabilities of observations rather than their binary responses. We first show that the central subspace based on the conditional class probability is the same as that based on the binary response. This important result justifies the proposed slicing scheme from a theoretical perspective and assures no information loss. In practice, the true conditional class probability is generally not available, and the problem of probability estimation can be challenging for data with large-dimensional inputs. We observe that, in order to implement the new slicing scheme, one does not need exact probability values and the only required information is the relative order of probability values. Motivated by this fact, our new SDR procedure bypasses the probability estimation step and employs the WSVM to directly estimate the order of probability values, based on which the slicing is performed. The performance of the proposed probability-enhanced SDR scheme is evaluated by both simulated and real data examples.
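The observation that only the relative order of the probabilities matters can be made concrete with a small slicing routine. The sketch below assigns observations to slices of near-equal size using ranks alone; the scores stand in for whatever ordering the WSVM step would produce, and the function name and equal-size slicing rule are assumptions for illustration.

```python
def slice_by_order(scores, n_slices):
    """Assign each observation to one of n_slices groups using only the ranks of scores."""
    n = len(scores)
    # Indices of observations sorted from smallest to largest score.
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        # Map rank 0..n-1 onto slice 0..n_slices-1 in near-equal blocks.
        labels[i] = rank * n_slices // n
    return labels
```

Any strictly increasing transformation of the scores yields the same slices, which is why an estimate of the ordering is enough and exact probability values are not needed. With more than two slices available, a slicing-based method is no longer limited to a single direction.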
Although innate immunity is increasingly recognized to contribute to lung allograft rejection, the significance of endogenous innate ligands, such as hyaluronan (HA) fragments, in clinical or experimental lung transplantation is uncertain.
Inherited mtDNA diseases transmit maternally and cause severe phenotypes. Currently, there are no effective therapies or genetic screens for these diseases; however, nuclear genome transfer between patients' and healthy eggs to replace mutant mtDNAs holds promise. Considering that a polar body contains few mitochondria and shares the same genomic material as an oocyte, we perform polar body transfer to prevent the transmission of mtDNA variants. We compare the effects of different types of germline genome transfer, including spindle-chromosome transfer, pronuclear transfer, and first and second polar body transfer, in mice. Reconstructed embryos support normal fertilization and produce live offspring. Importantly, genetic analysis confirms that the F1 generation from polar body transfer possesses minimal donor mtDNA carryover compared to the F1 generation from other procedures. Moreover, the mtDNA genotype remains stable in F2 progeny after polar body transfer. Our preclinical model demonstrates that polar body transfer has great potential to prevent inherited mtDNA diseases.
5'-deoxy-5'-methylthioadenosine (MTA) is an endogenous compound produced through the metabolism of polyamines. The therapeutic potential of MTA has been assayed mainly in liver diseases and, more recently, in animal models of multiple sclerosis. The aim of this study was to determine the neuroprotective effect of this molecule in vitro and to assess whether MTA can cross the blood-brain barrier (BBB), in order to also analyze its potential neuroprotective efficacy in vivo.
Direct reprogramming of human fibroblasts into functional neurons in vitro by defined factors provides an invaluable resource for regenerative medicine. However, clinical applications must consider the risk of immune rejection; thus, patient-specific induced neuronal cells (iNCs) may serve as an ideal source for autologous cell replacement. In this study, we report a robust process for generating functional neuronal cells from the patient's scalp by lentiviral gene delivery of Ascl1, Myt1l, and Sox2. These three-factor iNCs are similar to human neuronal cells in morphology, surface antigens, gene expression, and electrophysiological characteristics. Our findings might provide a source of patient-specific functional neurons for cell therapy.
Despite the immense breakthroughs of induced pluripotent stem cells (iPSCs), clinical application of iPSCs and their derivatives remains hampered by a lack of definitive in vivo studies. Here, we attempted to track iPSCs-derived neural stem cells (NSCs) in the rodent and primate central nervous system (CNS) and explore their therapeutic viability for stem cell replacement in traumatic brain injury (TBI) rats and monkeys with spinal cord injury (SCI). Superparamagnetic iron oxide (SPIO) particles were used to label iPSCs-derived NSCs in vitro. Labeled NSCs were implanted into TBI rats and SCI monkeys 1 week after injury, and then imaged using a gradient reflection echo (GRE) sequence on a 3.0T magnetic resonance imaging (MRI) scanner. MRI analysis was performed at 1, 7, 14, 21, and 30 days following cell transplantation. Pronounced hypointense signals were initially detected at the cell injection sites in rats and monkeys and were later found to extend progressively to the lesion regions, demonstrating that iPSCs-derived NSCs could migrate to the lesion area from the primary sites. The therapeutic efficacy of iPSCs-derived NSCs was examined concomitantly through functional recovery tests of the animals. In this study, we tracked iPSCs-derived NSCs migration in the CNS of TBI rats and SCI monkeys in vivo for the first time. Functional recovery tests showed obvious motor function improvement in transplanted animals. These data provide the necessary foundation for future clinical application of iPSCs for CNS injury.
Statistical procedures for variable selection have become integral elements in any analysis. Successful procedures are characterized by high predictive accuracy, yielding interpretable models while retaining computational efficiency. Penalized methods that perform coefficient shrinkage have been shown to be successful in many cases. Models with correlated predictors are particularly challenging to tackle. We propose a penalization procedure that performs variable selection while clustering groups of predictors automatically. The oracle properties of this procedure, including consistency in group identification, are also studied. The proposed method compares favorably with existing selection approaches in both prediction accuracy and model discovery, while retaining its computational efficiency. Supplemental materials are available online.
We report the findings of three randomized, double-blind, placebo-controlled Phase I studies undertaken to support licensure of the liquid formulation of the human G1P rotavirus (RV) vaccine (RIX4414; GlaxoSmithKline Biologicals SA) in China. Healthy adults aged 18-45 y (n=48) and children aged 2-6 y (n=50) received a single dose of the human RV vaccine or placebo. Healthy infants (n=50) aged 6-16 weeks at the time of first vaccination received two oral doses of the human RV vaccine or placebo according to a 0, 1 mo schedule. In infants, blood samples were collected prior to vaccination and one month post-dose 2 to assess anti-RV IgA antibody concentrations using ELISA. Stool samples were collected from all infants on the day of each vaccination, at 7 and 15 d after each vaccination and one month post-dose 2. Stool samples were analyzed by ELISA for detection of RV antigen to assess RV antigen excretion. The reactogenicity profile of the human RV vaccine was found to be comparable to that of placebo in all age groups studied. The anti-RV IgA antibody seroconversion rate in infants after two vaccine doses was 86.7% (95% CI: 59.5-98.3). Vaccine take in infants who received the liquid human RV vaccine was 86.7% (95% CI: 59.5-98.3). A Phase III efficacy study of the human RV vaccine in the infant population in China has now been completed (ROTA-075/NCT01171963).
Glycogen, the largest cytosolic macromolecule, is soluble because of intricate construction generating perfect hydrophilic-surfaced spheres. Little is known about neuronal glycogen function and metabolism, though progress is accruing through the neurodegenerative epilepsy Lafora disease (LD) proteins laforin and malin. Neurons in LD exhibit Lafora bodies (LBs), large accumulations of malconstructed insoluble glycogen (polyglucosans). We demonstrated that the laforin-malin complex reduces LBs and protects neuronal cells against endoplasmic reticulum stress-induced apoptosis. We now show that stress induces polyglucosan formation in normal neurons in culture and in the brain. This is mediated by increased glucose-6-phosphate allosterically hyperactivating muscle glycogen synthase (GS1) and is followed by activation of the glycogen digesting enzyme glycogen phosphorylase. In the absence of laforin, stress-induced polyglucosans are undigested and accumulate into massive LBs, and in laforin-deficient mice, stress drastically accelerates LB accumulation and LD. The mechanism through which laforin-malin mediates polyglucosan degradation remains unclear but involves GS1 dephosphorylation by laforin. Our work uncovers the presence of rapid polyglucosan metabolism as part of the normal physiology of neuroprotection. We propose that deficiency in the degradative phase of this metabolism, leading to LB accumulation and resultant seizure predisposition and neurodegeneration, underlies LD.
Neuronal channelopathies cause brain disorders, including epilepsy, migraine, and ataxia. Despite the development of mouse models, pathophysiological mechanisms for these disorders remain uncertain. One particularly devastating channelopathy is Dravet syndrome (DS), a severe childhood epilepsy typically caused by de novo dominant mutations in the SCN1A gene encoding the voltage-gated sodium channel Na(v) 1.1. Heterologous expression of mutant channels suggests loss of function, raising the quandary of how loss of sodium channels underlying action potentials produces hyperexcitability. Mouse model studies suggest that decreased Na(v) 1.1 function in interneurons causes disinhibition. We aim to determine how mutant SCN1A affects human neurons using the induced pluripotent stem cell (iPSC) method to generate patient-specific neurons.
Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high dimensional low sample size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity.
Margin-based classifiers have been popular in both machine learning and statistics for classification problems. Among numerous classifiers, some are hard classifiers while some are soft ones. Soft classifiers explicitly estimate the class conditional probabilities and then perform classification based on estimated probabilities. In contrast, hard classifiers directly target the classification decision boundary without producing probability estimates. These two types of classifiers are based on different philosophies and each has its own merits. In this paper, we propose a novel family of large-margin classifiers, namely large-margin unified machines (LUMs), which covers a broad range of margin-based classifiers including both hard and soft ones. By offering a natural bridge from soft to hard classification, the LUM provides a unified algorithm to fit various classifiers and hence a convenient platform to compare hard and soft classification. Both theoretical consistency and numerical performance of LUMs are explored. Our numerical study sheds some light on the choice between hard and soft classifiers in various classification problems.
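To make the idea of a bridge between soft and hard classification concrete, the sketch below evaluates a two-parameter margin loss of the form commonly associated with the LUM family: linear like the hinge loss below a knot at c/(1 + c), with a smoothly decaying tail above it. The abstract does not state the loss, so the exact formula, the parameter names a and c, and the function name are assumptions for illustration and should be checked against the paper.

```python
def lum_loss(u, a=1.0, c=1.0):
    """LUM-type loss at functional margin u = y * f(x); assumes a > 0 and c >= 0."""
    knot = c / (1.0 + c)
    if u < knot:
        # Hinge-like linear piece for small or negative margins.
        return 1.0 - u
    # Polynomially decaying tail; value and slope match the linear piece at the knot.
    return (1.0 / (1.0 + c)) * (a / ((1.0 + c) * u - c + a)) ** a
```

In this form, letting c grow pushes the knot toward 1 and the tail toward zero, so the loss approaches the hinge loss of a hard classifier, while small c keeps a strictly decreasing tail everywhere, which is the kind of loss from which probability information can be recovered.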
Being able to effectively explore the visual world is of fundamental importance, and it has been suggested that the straight-ahead gaze position within the egocentric reference frame ("primary position") might play a special role in this context. In the present study we employed human electroencephalography (EEG) to examine neural activity related to the spatial guidance of saccadic eye movements. Moreover, we sought to investigate whether such activity would be modulated by the spatial relation of saccade direction to the primary gaze position (recentering saccades). Participants executed endogenously cued saccades between five equidistant locations along the horizontal meridian. This design allowed for the comparison of isoamplitude saccades from the same starting position that were oriented either toward the primary position (centripetal) or further away from it (centrifugal). By back-averaging time-locked to the saccade onset on each trial, we identified a parietally distributed, negative-polarity EEG deflection contralateral to the direction of the upcoming saccade. Importantly, this contralateral presaccadic negativity, which appeared to reflect the location-specific attentional guidance of the eye movement, was attenuated for recentering saccades relative to isoamplitude centrifugal saccades. This differential electrophysiological signature was paralleled by faster saccadic reaction times and was substantially more apparent when time-locking the data to the onset of the saccade rather than to the onset of the cue, suggesting a tight temporal association with saccade initiation. The diminished level of this presaccadic component for recentering saccades may reflect the preferential coding of the straight-ahead gaze position, in which both the eye-centered and head-centered reference frames are perfectly aligned and from which the visual world can be effectively explored.
Partially linear models provide a useful class of tools for modeling complex data by naturally incorporating a combination of linear and nonlinear effects within one framework. One key question in partially linear models is the choice of model structure, that is, how to decide which covariates are linear and which are nonlinear. This is a fundamental, yet largely unsolved problem for partially linear models. In practice, one often assumes that the model structure is given or known and then makes estimation and inference based on that structure. Alternatively, there are two methods in common use for tackling the problem: hypothesis testing and visual screening based on the marginal fits. Both methods are quite useful in practice but have their drawbacks. First, it is difficult to construct a powerful procedure for testing multiple hypotheses of linear against nonlinear fits. Second, the screening procedure based on the scatterplots of individual covariate fits may provide an educated guess on the regression function form, but the procedure is ad hoc and lacks theoretical justification. In this article, we propose a new approach to structure selection for partially linear models, called the LAND (Linear And Nonlinear Discoverer). The procedure is developed in an elegant mathematical framework and possesses desired theoretical and computational properties. Under certain regularity conditions, we show that the LAND estimator is able to identify the underlying true model structure correctly and at the same time estimate the multivariate regression function consistently. The convergence rate of the new estimator is established as well. We further propose an iterative algorithm to implement the procedure and illustrate its performance by simulated and real examples. Supplementary materials for this article are available online.
In decision-making on optimal treatment strategies, it is of great importance to identify variables that are involved in the decision rule, i.e. those interacting with the treatment. Effective variable selection helps to improve the prediction accuracy and enhance the interpretability of the decision rule. We propose a new penalized regression framework which can simultaneously estimate the optimal treatment strategy and identify important variables. The advantages of the new approach include: (i) it does not require the estimation of the baseline mean function of the response, which greatly improves the robustness of the estimator; (ii) the convenient loss-based framework makes it easier to adopt shrinkage methods for variable selection, which greatly facilitates implementation and statistical inference for the estimator. The new procedure can be easily implemented by existing state-of-the-art software packages like LARS. Theoretical properties of the new estimator are studied. Its empirical performance is evaluated using simulation studies and further illustrated with an application to an AIDS clinical trial.
Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting.
Efficient memory formation relies on the establishment of functional hippocampal circuits. It has been proposed that synaptic connections are refined by neural activity to form functional brain circuitry. However, it is not known whether and how hippocampal connections are refined by neural activity in vivo. Using a mouse genetic system in which restricted populations of neurons in the hippocampal circuit are inactivated, we show that inactive axons are eliminated after they develop through a competition with active axons. Remarkably, in the dentate gyrus, which undergoes neurogenesis throughout life, axon refinement is achieved by a competition between mature and young neurons. These results demonstrate that activity-dependent competition plays multiple roles in the establishment of functional memory circuits in vivo.
Bronchiolitis obliterans (BO) is a fibrotic lung disease that occurs in a variety of clinical settings, including toxin exposures, autoimmunity and lung or bone marrow transplant. Despite its increasing clinical importance, little is known regarding the underlying disease mechanisms due to a lack of adequate small animal BO models. Recent epidemiological studies have implicated exposure to diacetyl (DA), a volatile component of artificial butter flavoring, as a cause of BO in otherwise healthy factory workers. Our overall hypothesis is that DA induces severe epithelial injury and aberrant repair that leads to the development of BO. Therefore, the objectives of this study were 1) to determine if DA, delivered by intratracheal instillation (ITI), would lead to the development of BO in rats and 2) to characterize epithelial regeneration and matrix repair after ITI of DA.
While Distance Weighted Discrimination (DWD) is an appealing approach to classification in high dimensions, it was designed for balanced datasets. In the case of unequal costs, biased sampling, or unbalanced data, there are major improvements available, using appropriately weighted versions of DWD (wDWD). A major contribution of this paper is the development of optimal weighting schemes for various nonstandard classification problems. In addition, we discuss several alternative criteria and propose an adaptive weighting scheme (awDWD) and demonstrate its advantages over nonadaptive weighting schemes under some situations. The second major contribution is a theoretical study of weighted DWD. Both high-dimensional low sample-size asymptotics and Fisher consistency of DWD are studied. The performance of weighted DWD is evaluated using simulated examples and two real data examples. The theoretical results are also confirmed by simulations.
Classical statistical approaches for multiclass probability estimation are typically based on regression techniques such as multiple logistic regression, or density estimation approaches such as linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). These methods often make certain assumptions on the form of probability functions or on the underlying distributions of subclasses. In this article, we develop a model-free procedure to estimate multiclass probabilities based on large-margin classifiers. In particular, the new estimation scheme is implemented by solving a series of weighted large-margin classifiers and then systematically extracting the probability information from these multiple classification rules. A main advantage of the proposed probability estimation technique is that it does not impose any strong parametric assumption on the underlying distribution and can be applied to a wide range of large-margin classification methods. A general computational algorithm is developed for class probability estimation. Furthermore, we establish asymptotic consistency of the probability estimates. Both simulated and real data examples are presented to illustrate the competitive performance of the new approach and to compare it with several other existing methods.
We study a general class of partially linear transformation models, which extend linear transformation models by incorporating nonlinear covariate effects in survival data analysis. A new martingale-based estimating equation approach, consisting of both global and kernel-weighted local estimating equations, is developed for estimating the parametric and nonparametric covariate effects in a unified manner. We show that with a proper choice of the kernel bandwidth parameter, one can obtain consistent and asymptotically normal parameter estimates for the linear effects. Asymptotic properties of the estimated nonlinear effects are established as well. We further suggest a simple resampling method to estimate the asymptotic variance of the linear estimates and show its effectiveness. To facilitate the implementation of the new procedure, an iterative algorithm is developed. Numerical examples are given to illustrate the finite-sample performance of the procedure.
Censored median regression has proved useful for analyzing survival data in complicated situations, for example, when the variance is heteroscedastic or the data contain outliers. In this paper, we study sparse estimation for censored median regression models, which is an important problem for high dimensional survival data analysis. In particular, a new procedure is proposed that minimizes an inverse-censoring-probability weighted least absolute deviation loss subject to the adaptive LASSO penalty, resulting in a sparse and robust median estimator. We show that, with a proper choice of the tuning parameter, the procedure can identify the underlying sparse model consistently and has desired large-sample properties including root-n consistency and asymptotic normality. The procedure also enjoys great advantages in computation, since its entire solution path can be obtained efficiently. Furthermore, we propose a resampling method to estimate the variance of the estimator. The performance of the procedure is illustrated by extensive simulations and two real data applications, including one microarray gene expression survival data set.
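As a rough illustration of the kind of criterion this abstract describes, the sketch below computes inverse-censoring-probability weights and evaluates a weighted least absolute deviation loss with an adaptive LASSO penalty. It is not the authors' implementation. The function names, the use of a Kaplan-Meier estimate for the censoring distribution, the log-time response, and the requirement that the initial estimate `beta_init` has no zero entries are all assumptions made for illustration.

```python
def censoring_survival_left_limit(times, events):
    """Kaplan-Meier estimate of the censoring survival function G, evaluated
    just before each observed time. events[i] is 1 for an observed event and
    0 for a censored observation. Ties between events and censorings at the
    same time are handled naively (assumption)."""
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    g_left = [0.0] * n
    surv = 1.0
    i = 0
    while i < n:
        t = times[order[i]]
        j = i
        while j < n and times[order[j]] == t:
            j += 1
        at_risk = n - i
        n_censored = sum(1 for k in range(i, j) if events[order[k]] == 0)
        for k in range(i, j):
            g_left[order[k]] = surv  # value of G just before time t
        surv *= 1.0 - n_censored / at_risk
        i = j
    return g_left


def ipcw_weights(times, events):
    """Weight delta_i / G(Y_i-) for events, 0 for censored observations."""
    g_left = censoring_survival_left_limit(times, events)
    return [1.0 / g if e == 1 else 0.0 for e, g in zip(events, g_left)]


def penalized_weighted_lad(beta, X, log_times, weights, lam, beta_init):
    """Weighted LAD loss plus an adaptive LASSO penalty (hypothetical form).
    beta_init is an initial estimate with nonzero entries (assumption)."""
    n = len(log_times)
    loss = 0.0
    for row, y, w in zip(X, log_times, weights):
        fitted = sum(b * x for b, x in zip(beta, row))
        loss += w * abs(y - fitted)
    penalty = lam * sum(abs(b) / abs(b0) for b, b0 in zip(beta, beta_init))
    return loss / n + penalty
```

For four subjects with observed times 1, 2, 3, 4 where only the second is censored, `ipcw_weights` gives weights of 1, 0, 1.5, and 1.5: the censored subject's weight is redistributed to the later events. A real fit would minimize `penalized_weighted_lad` over `beta`, for instance by linear programming, which this sketch does not attempt.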
Semiparametric linear transformation models have received much attention due to their high flexibility in modeling survival data. A useful estimating equation procedure was recently proposed by Chen et al. (2002) for linear transformation models to jointly estimate parametric and nonparametric terms. They showed that this procedure can yield a consistent and robust estimator. However, the problem of variable selection for linear transformation models is less studied, partially because a convenient loss function is not readily available in this context. In this paper, we propose a simple yet powerful approach to achieve both sparse and consistent estimation for linear transformation models. The main idea is to derive a profiled score from the estimating equation of Chen et al. (2002), construct a loss function based on the profiled score and its variance, and then minimize the loss subject to some shrinkage penalty. Under regularity conditions, we have shown that the resulting estimator is consistent for both model estimation and variable selection. Furthermore, the estimated parametric terms are asymptotically normal and can achieve higher efficiency than that yielded by the estimating equations. For computation, we suggest a one-step approximation algorithm which can take advantage of the LARS and build the entire solution path efficiently. Performance of the new procedure is illustrated through numerous simulations and real examples, including one microarray data set.
Dentate granule cell (DGC) neurogenesis persists throughout life in the hippocampal dentate gyrus. In rodent temporal lobe epilepsy models, status epilepticus (SE) stimulates neurogenesis, but many newborn DGCs integrate aberrantly and are hyperexcitable, whereas others may integrate normally and restore inhibition. The overall influence of altered neurogenesis on epileptogenesis is therefore unclear. To better understand the role DGC neurogenesis plays in seizure-induced plasticity, we injected retroviral (RV) reporters to label dividing DGC progenitors at specific times before or after SE, or used x-irradiation to suppress neurogenesis. RV injections 7 weeks before SE to mark DGCs that had matured by the time of SE labeled cells with normal placement and morphology 4 weeks after SE. RV injections 2 or 4 weeks before seizure induction to label cells still developing during SE revealed normally located DGCs exhibiting hilar basal dendrites and mossy fiber sprouting (MFS) when observed 4 weeks after SE. Cells labeled by injecting RV after SE displayed hilar basal dendrites and ectopic migration, but not sprouting, at 28 d after SE; when examined 10 weeks after SE, however, these cells showed robust MFS. Eliminating cohorts of newborn DGCs by focal brain irradiation at specific times before or after SE decreased MFS or hilar ectopic DGCs, supporting the RV labeling results. These findings indicate that developing DGCs exhibit maturation-dependent vulnerability to SE, indicating that abnormal DGC plasticity derives exclusively from aberrantly developing DGCs. Treatments that restore normal DGC development after epileptogenic insults may therefore ameliorate epileptogenic network dysfunction and associated morbidities.
Forebrain neurogenesis persists throughout life in the rodent subventricular zone (SVZ) and hippocampal dentate gyrus (DG). Several strategies have been employed to eliminate adult neurogenesis and thereby determine whether depleting adult-born neurons disrupts specific brain functions, but some approaches do not specifically target neural progenitors. We have developed a transgenic mouse line to reversibly ablate adult neural stem cells and suppress neurogenesis. The nestin-tk mouse expresses herpes simplex virus thymidine kinase (tk) under the control of the nestin 2nd intronic enhancer, which drives expression in neural progenitors. Administration of ganciclovir (GCV) kills actively dividing cells expressing this transgene. We found that peripheral GCV administration suppressed SVZ-olfactory bulb and DG neurogenesis within 2 weeks but caused systemic toxicity. Intracerebroventricular GCV infusion for 28 days nearly completely depleted proliferating cells and immature neurons in both the SVZ and DG without systemic toxicity. Reversibility of the effects after prolonged GCV infusion was slow and partial. Neurogenesis did not recover 2 weeks after cessation of GCV administration, but showed limited recovery 6 weeks after GCV that differed between the SVZ and DG. Suppression of neurogenesis did not inhibit antidepressant responsiveness of mice in the tail suspension test. These findings indicate that SVZ and DG neural stem cells differ in their capacity for repopulation, and that adult-born neurons are not required for antidepressant responses in a common behavioral test of antidepressant efficacy. The nestin-tk mouse should be useful for studying how reversible depletion of adult neurogenesis influences neurophysiology, other behaviors, and neural progenitor dynamics.
We propose a double-penalized likelihood approach for simultaneous model selection and estimation in semiparametric mixed models for longitudinal data. Two types of penalties are jointly imposed on the ordinary log-likelihood: the roughness penalty on the nonparametric baseline function and a nonconcave shrinkage penalty on linear coefficients to achieve model sparsity. Compared to existing estimating-equation-based approaches, our procedure provides valid inference for data that are missing at random, and will be more efficient if the specified model is correct. Another advantage of the new procedure is its easy computation for both regression components and variance parameters. We show that the double-penalized problem can be conveniently reformulated into a linear mixed model framework, so that existing software can be directly used to implement our method. For the purpose of model inference, we derive both frequentist and Bayesian variance estimation for estimated parametric and nonparametric components. Simulation is used to evaluate and compare the performance of our method to the existing ones. We then apply the new method to a real data set from a lactation study.
We propose and study a unified procedure for variable selection in partially linear models. A new type of double-penalized least squares is formulated, using the smoothing spline to estimate the nonparametric part and applying a shrinkage penalty on parametric components to achieve model parsimony. Theoretically we show that, with proper choices of the smoothing and regularization parameters, the proposed procedure can be as efficient as the oracle estimator (Fan and Li, 2001). We also study the asymptotic properties of the estimator when the number of parametric effects diverges with the sample size. Frequentist and Bayesian estimates of the covariance and confidence intervals are derived for the estimators. One great advantage of this procedure is its linear mixed model (LMM) representation, which greatly facilitates its implementation by using standard statistical software. Furthermore, the LMM framework enables one to treat the smoothing parameter as a variance component and hence conveniently estimate it together with other regression coefficients. Extensive numerical studies are conducted to demonstrate the effective performance of the proposed procedure.
We consider the problem of model selection and estimation in situations where the number of parameters diverges with the sample size. When the dimension is high, an ideal method should have the oracle property (Fan and Li, 2001; Fan and Peng, 2004) which ensures the optimal large sample performance. Furthermore, the high-dimensionality often induces the collinearity problem which should be properly handled by the ideal method. Many existing variable selection methods fail to achieve both goals simultaneously. In this paper, we propose the adaptive Elastic-Net that combines the strengths of the quadratic regularization and the adaptively weighted lasso shrinkage. Under weak regularity conditions, we establish the oracle property of the adaptive Elastic-Net. We show by simulations that the adaptive Elastic-Net deals with the collinearity problem better than the other oracle-like methods, thus enjoying much improved finite sample performance.
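As a sketch only, a criterion that combines quadratic regularization with an adaptively weighted lasso penalty can be written as below. The symbols (tuning parameters \lambda_1 and \lambda_2, exponent \gamma, and an initial estimate \tilde\beta used to build the weights) are assumptions introduced for illustration, and the published estimator may include additional rescaling that is not shown here.

```latex
\hat{\beta}
  = \arg\min_{\beta}
    \Big\{ \lVert y - X\beta \rVert_2^2
         + \lambda_2 \lVert \beta \rVert_2^2
         + \lambda_1 \sum_{j=1}^{p} \hat{w}_j \lvert \beta_j \rvert \Big\},
\qquad
\hat{w}_j = \lvert \tilde{\beta}_j \rvert^{-\gamma}, \quad \gamma > 0 .
```

The quadratic term is the part that stabilizes the fit under collinearity, while the weights \hat{w}_j penalize coefficients with small initial estimates more heavily, which is the usual route to oracle-type selection behavior.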
Traumatic optic nerve injury and glaucoma are among the leading causes of incurable vision loss across the world. What is worse, neither pharmacological nor surgical interventions are significantly effective in reversing or halting the progression of vision loss. Advances in cell biology offer some hope for the victims of optic nerve damage and subsequent partial or complete visual loss. The axons of retinal ganglion cells (RGCs) travel through the optic nerve and carry all visual signals to the brain. After injury, RGC axons usually fail to regrow and the cells die, leading to irreversible loss of vision. Various kinds of cells and factors possess the ability to support the process of axon regeneration for RGCs. This article summarizes the latest advances in RGC regeneration.
The selection of random effects in linear mixed models is an important yet challenging problem in practice. We propose a robust and unified framework for automatically selecting random effects and estimating covariance components in linear mixed models. A moment-based loss function is first constructed for estimating the covariance matrix of random effects. Two types of shrinkage penalties, a hard thresholding operator and a new sandwich-type soft-thresholding penalty, are then imposed for sparse estimation and random effects selection. Compared with existing approaches, the new procedure does not require any distributional assumption on the random effects and error terms. We establish the asymptotic properties of the resulting estimator in terms of its consistency in both random effects selection and variance component estimation. Optimization strategies are suggested to tackle the computational challenges involved in estimating the sparse variance-covariance matrix. Furthermore, we extend the procedure to incorporate the selection of fixed effects as well. Numerical results show promising performance of the new approach in selecting both random and fixed effects and, consequently, improving the efficiency of estimating model parameters. Finally, we apply the approach to a data set from the Amsterdam Growth and Health study.
In the analysis of longitudinal data, it is not uncommon that observation times of repeated measurements are subject-specific and correlated with underlying longitudinal outcomes. Taking account of the dependence between observation times and longitudinal outcomes is critical in these situations to assure the validity of statistical inference. In this article, we propose a flexible joint model for longitudinal data analysis in the presence of informative observation times. In particular, the new procedure considers the shared random-effect model and assumes a time-varying coefficient for the latent variable, allowing a flexible way of modeling longitudinal outcomes while adjusting their association with observation times. Estimating equations are developed for parameter estimation. We show that the resulting estimators are consistent and asymptotically normal, with a variance-covariance matrix that has a closed form and can be consistently estimated by the usual plug-in method. One additional advantage of the procedure is that it provides a unified framework to test whether the effect of the latent variable is zero, constant, or time-varying. Simulation studies show that the proposed approach is appropriate for practical use. An application to a bladder cancer data set is also given to illustrate the methodology.
Adult hippocampal neurogenesis is thought to be essential for learning and memory, and has been implicated in the pathogenesis of several disorders. Although recent studies have identified key factors regulating neuroprogenitor proliferation in the adult hippocampus, the mechanisms that control the migration and integration of adult-born neurons into circuits are largely unknown. Reelin is an extracellular matrix protein that is vital for neuronal development. Activation of the Reelin cascade leads to phosphorylation of Disabled-1, an adaptor protein required for Reelin signaling. Here we used transgenic mouse and retroviral reporters along with Reelin signaling gain-of-function and loss-of-function studies to show that the Reelin pathway regulates migration and dendritic development of adult-generated hippocampal neurons. Whereas overexpression of Reelin accelerated dendritic maturation, inactivation of the Reelin signaling pathway specifically in adult neuroprogenitor cells resulted in aberrant migration, decreased dendrite development, formation of ectopic dendrites in the hilus, and the establishment of aberrant circuits. Our findings support a cell-autonomous and critical role for the Reelin pathway in regulating dendritic development and the integration of adult-generated granule cells and point to this pathway as a key regulator of adult neurogenesis. Moreover, our data reveal a novel role of the Reelin cascade in adult brain function with potential implications for the pathogenesis of several neurological and psychiatric disorders.
Extensive baseline covariate information is routinely collected on participants in randomized clinical trials, and it is well recognized that a proper covariate-adjusted analysis can improve the efficiency of inference on the treatment effect. However, such covariate adjustment has engendered considerable controversy, as post hoc selection of covariates may involve subjectivity and may lead to biased inference, whereas prior specification of the adjustment may exclude important variables from consideration. Accordingly, how to select covariates objectively to gain maximal efficiency is of broad interest. We propose and study the use of modern variable selection methods for this purpose in the context of a semiparametric framework, under which variable selection in modeling the relationship between outcome and covariates is separated from estimation of the treatment effect, circumventing the potential for selection bias associated with standard analysis of covariance methods. We demonstrate that such objective variable selection techniques combined with this framework can identify key variables and lead to unbiased and efficient inference on the treatment effect. A critical issue in finite samples is validity of estimators of uncertainty, such as standard errors and confidence intervals for the treatment effect. We propose an approach to estimation of sampling variation of estimated treatment effect and show its superior performance relative to that of existing methods.
ATBF1 is a large nuclear protein that contains multiple zinc-finger motifs and four homeodomains. In mammals, ATBF1 regulates differentiation, and its mutation and/or downregulation is involved in tumorigenesis in several organs. To gain more insight into the physiological functions of ATBF1, we generated and validated a conditional allele of mouse Atbf1 in which exons 7 and 8 were flanked by loxP sites (Atbf1(flox)). Germline deletion of a single Atbf1 allele was achieved by breeding to EIIa-cre transgenic mice, and Atbf1 heterozygous mice displayed reduced body weight, preweaning mortality, increased cell proliferation, and attenuated cytokeratin 18 expression, indicating haploinsufficiency of Atbf1. Floxed Atbf1 mice will help us understand such biological processes as neuronal differentiation and tumorigenesis.
Syringocystadenocarcinoma papilliferum (SCACP) is an exceedingly rare cutaneous adnexal neoplasm, which is typically located in the head and neck and the perianal area. Very few cases have been reported in the literature. Here, we report a case of SCACP with evident transition to squamous differentiation. A 75-year-old white woman presented with a 1-year history of a solitary tender nodule in the left upper arm. Physical examination revealed a single, 1.5 × 1.1-cm, erythematous ulcerated nodule on a background of a red patch. Biopsy showed an adnexal carcinoma connected to the epidermis and composed of cystic papillary projections admixed with solid basaloid areas with marked cytologic atypia. The basaloid tumor cells appeared to blend with the squamous component that demonstrated ductal formation, which was highlighted by carcinoembryonic antigen. Tumor cells were reactive for both cytokeratins 5/6 and 7. This case represents SCACP arising from syringocystadenoma papilliferum in the upper arm, with distinct transition to areas of squamous differentiation.