Mixed-effects models are flexible and useful tools for analyzing data with a hierarchical stochastic structure in forestry and could also be used to significantly improve the performance of forest growth models. Here, a protocol is presented that synthesizes information relating to linear mixed-effects models.
Here, we developed an individual-tree model of 5-year basal area increments based on a dataset including 21898 Picea asperata trees from 779 sample plots located in Xinjiang Province, northwest China. To prevent high correlations among observations from the same sampling unit, we developed the model using a linear mixed-effects approach with random plot effect to account for stochastic variability. Various tree- and stand-level variables, such as indices for tree size, competition, and site condition, were included as fixed effects to explain the residual variability. In addition, heteroscedasticity and autocorrelation were described by introducing variance functions and autocorrelation structures. The optimal linear mixed-effects model was determined by several fit statistics: Akaike’s information criterion, Bayesian information criterion, logarithm likelihood, and a likelihood ratio test. The results indicated that significant variables of individual-tree basal area increment were the inverse transformation of diameter at breast height, the basal area of trees larger than the subject tree, the number of trees per hectare, and elevation. Furthermore, errors in variance structure were most successfully modeled by the exponential function, and the autocorrelation was significantly corrected by first-order autoregressive structure (AR(1)). The performance of the linear mixed-effects model was significantly improved relative to the model using ordinary least squares regression.
Compared with even-aged monoculture, uneven-aged mixed-species forest management with multiple objectives has received increased attention recently1,2,3. Prediction of different management alternatives is necessary for formulating robust forest management strategies, especially for complex uneven-aged mixed-species forest4. Forest growth and yield models have been used extensively to forecast tree or stand development and harvest under various management schemes5,6,7. Forest growth and yield models are classified into individual-tree models, size-class models, and whole-stand growth models6,7,8. Unfortunately, size-class models and whole-stand models are not appropriate for uneven-aged mixed-species forests, which require a more detailed description to support the forest management decision-making process. For this reason, individual-tree growth and yield models have received increased attention throughout the last few decades because of their ability to make predictions for forest stands with a variety of species compositions, structures, and management strategies9,10,11.
Ordinary least squares (OLS) regression is the most commonly used method for the development of individual-tree growth models12,13,14,15. The datasets for individual-tree growth models collected repeatedly over a fixed length of time on the same sampling unit (i.e., sample plot or tree) have a hierarchical stochastic structure, with a lack of independence and high spatial and temporal correlation among observations10,16. The hierarchical stochastic structure violates the fundamental assumptions of OLS regression: namely independent residuals and normally distributed data with equal variances. Therefore, the use of OLS regression inevitably produces biased estimates of the standard error of parameter estimates for these data13,14.
Mixed-effects models provide a powerful tool for analyzing data with complex structures, such as repeated measures data, longitudinal data, and multi-level data. Mixed-effects models consist of both fixed components, common to the complete population, and random components, which is specific to each sampling level. In addition, mixed-effects models take into account heteroscedasticity and autocorrelation in space and time by defining non-diagonal variance-covariance structure matrices17,18,19. For this reason, mixed-effects models have been extensively used in forestry, such as in diameter-height models20,21, crown models22,23, self-thinning models24,25, and growth models26,27.
Here, the main objective was to develop an individual-tree basal area increment model using a linear mixed-effects approach. We hope that the mixed-effects approach could be broadly applied.
1. Data preparation
Variables | Fitting data | Validation data | |||||||
Min | Max | Mean | S.D. | Min | Max | Mean | S.D. | ||
DBH1 (cm) | 5 | 124.8 | 19.9 | 13.2 | 5 | 101.5 | 19.5 | 13.4 | |
QMD (cm) | 6.7 | 82.3 | 22.5 | 8.5 | 9.2 | 73.3 | 21.8 | 9.2 | |
ID (cm) | 0.1 | 14.4 | 1.1 | 1 | 0.1 | 16.9 | 1 | 1.1 | |
BAL (m3) | 0 | 5.2 | 1.7 | 0.9 | 0 | 5.4 | 1.7 | 1 | |
NT (trees/ha) | 14.9 | 3642 | 1072 | 673.7 | 14.9 | 3418 | 1205 | 829.3 | |
BA (m2/ha) | 0.1 | 77.5 | 34.2 | 13.9 | 0.1 | 80.6 | 34.5 | 15.3 | |
EL (m) | 2 | 3302 | 2189 | 340.3 | 1441 | 3380 | 2256 | 308.3 |
Table 1. Summary statistics for fitting and validation data. DBH1: initial diameter at breast height at 1.3 m (DBH), DBH2: DBH measured after 5 years of growth, QMD: quadratic mean diameter, ID: diameter increment for 5 years (DBH2 – DBH1), BAL: the basal area of trees larger than the subject tree (the subject tree: the tree which was calculated the competition indices), NT: the number of trees per hectare, BA: basal area per hectare, EL: elevation, S.D.: standard deviation.
2. Basic model development
3. Linear mixed-effects model development with the package “nlme” in R software
4. Bias correction
5. Model prediction and evaluation
The basic basal area increment model for P. asperata was expressed as Equation (7). The parameter estimates, their corresponding standard errors, and the lack-of-fit statistics are shown in Table 2. The residual plot is shown in Figure 1. Pronounced heteroscedasticity of the residuals was observed.
(7)
Estimate | Standard error | t-test | P-value | VIF | |
int | 2.41 | 2.26E-02 | 106.78 | <2e-16 | – |
1/DBH1 | -5.84 | 7.57E-02 | -77.19 | <2e-16 | 1.12 |
BAL | -0.0954 | 3.34E-03 | -28.54 | <2e-16 | 1.08 |
NT | -0.000158 | 4.74E-06 | -33.31 | <2e-16 | 1.12 |
EL | -0.00011 | 9.07E-06 | -12.13 | <2e-16 | 1.05 |
AIC = 16789 | |||||
BIC = 16836 | |||||
Loglik = -8389 |
Table 2. Basic model results. The estimated parameters, their corresponding standard errors, and the lack-of-fit statistics derived from Equation (7). VIF: variance inflation factor, AIC: Akaike’s information criterion, BIC: Bayesian information criterion, and Loglik: logarithm likelihood.
Figure 1. Residual plot derived from Equation (7). The residuals have a clear trend, i.e., pronounced heteroscedasticity of the residuals was observed. Please click here to view a larger version of this figure.
There were 31 possible combinations of random-effects parameters for Equation (7). After fitting, 30 combinations reached convergence (Table 3). Among these 30 combinations, Model 30 of Equation (8) was selected since it yielded the lowest AIC (9083), the lowest BIC (9207), the largest LogLik (-4525), and the LRT was significantly different when compared with the other models.
(8)
where β1 – β5 are the fixed effects parameters and b1 – b4 are the random-effects parameters.
Model | Random parameters | AIC | BIC | LogLik | LRT | P-value | ||||
int | 1/DBH1 | BAL | NT | EL | ||||||
1 | ▲ | 10175 | 10230 | -5081 | ||||||
2 | ▲ | 11630 | 11684 | -5808 | ||||||
3 | ▲ | 11772 | 11826 | -5879 | ||||||
4 | ▲ | 10556 | 10611 | -5271 | ||||||
5 | ▲ | 10259 | 10313 | -5123 | ||||||
6 | ▲ | ▲ | 9268 | 9338 | -4625 | 911.1 | <.0001 | |||
(1 vs 6) | ||||||||||
7 | ▲ | ▲ | 9411 | 9481 | -4697 | |||||
8 | ▲ | ▲ | 10179 | 10249 | -5081 | |||||
9 | ▲ | ▲ | 10179 | 10249 | -5080 | |||||
10 | ▲ | ▲ | 10829 | 10899 | -5406 | |||||
11 | ▲ | ▲ | 9532 | 9601 | -4757 | |||||
12 | ▲ | ▲ | 9335 | 9405 | -4659 | |||||
13 | ▲ | ▲ | 9803 | 9873 | -4892 | |||||
14 | ▲ | ▲ | 9465 | 9535 | -4723 | |||||
15 | ▲ | ▲ | 10200 | 10270 | -5091 | |||||
16 | ▲ | ▲ | ▲ | Nonconvergence | ||||||
17 | ▲ | ▲ | ▲ | 9271 | 9364 | -4624 | ||||
18 | ▲ | ▲ | ▲ | 9274 | 9367 | -4625 | ||||
19 | ▲ | ▲ | ▲ | 9417 | 9510 | -4696 | ||||
20 | ▲ | ▲ | ▲ | 9417 | 9510 | -4697 | ||||
21 | ▲ | ▲ | ▲ | 10184 | 10277 | -5080 | ||||
22 | ▲ | ▲ | ▲ | 9332 | 9425 | -4654 | ||||
23 | ▲ | ▲ | ▲ | 9132 | 9225 | -4554 | 142.7 | <.0001 | ||
(23 vs 6) | ||||||||||
24 | ▲ | ▲ | ▲ | 9293 | 9386 | -4634 | ||||
25 | ▲ | ▲ | ▲ | 9443 | 9536 | -4709 | ||||
26 | ▲ | ▲ | ▲ | ▲ | 9083 | 9207 | -4525 | |||
27 | ▲ | ▲ | ▲ | ▲ | 9086 | 9210 | -4527 | |||
28 | ▲ | ▲ | ▲ | ▲ | 9280 | 9404 | -4624 | |||
29 | ▲ | ▲ | ▲ | ▲ | 9425 | 9549 | -4696 | |||
30 | ▲ | ▲ | ▲ | ▲ | 9083 | 9207 | -4525 | 56.8 | <.0001 | |
(30 vs 23) | ||||||||||
31 | ▲ | ▲ | ▲ | ▲ | ▲ | 9091 | 9254 | -4525 |
Table 3. Evaluation indices of each linear mixed-effects model. ▲: random-effects parameter was selected for fitting; LRT: likelihood ratio test.
The linear mixed-effects models with variance functions and correlation structures are shown in Table 4. According to the AIC, BIC, Loglik, and LRT, the exponential function and AR(1) were selected as the best variance function and autocorrelation structure, respectively.
Model | Variance function | Correlation structure | AIC | BIC | LogLik | LRT | P-value |
30 | No | Independent | 9083 | 9207 | -4525 | ||
30.1 | ConstPower | Independent | 9075 | 9215 | -4520 | 11.8a | 0.0028 |
30.2 | Power | Independent | 9073 | 9205 | -4520 | 11.7a | 6.00E-04 |
30.3 | Exponent | Independent | 9073 | 9204 | -4519 | 12.3a | 5.00E-04 |
30.3.1 | Exponent | CS | Nonconvergence | ||||
30.3.2 | Exponent | AR(1) | 9050 | 9189 | -4507 | 24.9b | <.0001 |
30.3.3 | Exponent | ARMA(1,1) | Nonconvergence |
Table 4. Comparisons of the linear mixed-effects individual-tree basal area increment models performance with different variance functions and different correlation structures. CS: compound symmetry structure, AR(1): a first-order autoregressive structure, ARMA(1,1): a combination of first-order autoregressive and moving average structures; a Likelihood ratio was calculated for Model 30; b Likelihood ratio was calculated for Model 30.3.
The final linear mixed-effects individual-tree basal area increment model was proposed using the REML method [Equation (9)]. The estimated fixed parameters, their corresponding standard errors, and the lack-of-fit statistics are shown in Table 5. The residual plot of the final model is shown in Figure 2. A significant improvement was observed in the residuals.
(9)
where
(10)
Estimate | Standard error | t-Test | P-value | |
int | 2.8086 | 7.99E-02 | 35.14 | <0.01 |
1/DBH1 | -6.2402 | 1.56E-01 | -40.01 | <0.01 |
BAL | -0.1324 | 8.07E-03 | -16.41 | <0.01 |
NT | -0.0001 | 2.26E-05 | -4.921 | <0.01 |
EL | -0.0003 | 3.32E-05 | -7.86 | <0.01 |
AIC = 9105 | ||||
BIC = 9244 | ||||
Loglik = -4535 |
Table 5. Mixed-effects model results. The estimated fixed parameters, their corresponding standard errors, and the lack-of-fit statistics derived from Equation (9).
Figure 2. Residual plot derived from Equation (9). Compared with Figure 1, a significant improvement was observed in the residuals. Please click here to view a larger version of this figure.
Table 6 listed the three prediction statistics of Equation (7) and Equation (9). Compared with the basic model, the performance of the linear mixed-effects model was significantly improved.
Model | Bias | RMSE | R2 |
Basic model | 0.297 | 0.377 | 0.479 |
Mixed-effects model | 0.221 | 0.286 | 0.699 |
Table 6. Evaluation indices of the basic model and the linear mixed-effects model. A significant improvement was observed from the three prediction statistics.
A crucial issue for the development of mixed-effects models is to determine which parameters can be treated as random effects and which should be considered fixed effects34,35. Two methods have been proposed. The most common approach is to treat all parameters as random effects and then have the best model selected by AIC, BIC, Loglik, and LRT. This was the method employed by our study35. An alternative is to fit basal area increment models for every sample plot with OLS regression. Parameters that have high variability and less overlap in confidence intervals across the sample plots among these models can be regarded as random ones17.
To account for heteroscedasticity and autocorrelation, three variance functions and three autocorrelation structures were introduced. Consistent with the results of Calama and Montero17 and Uzoh and Oliver27, the exponential function and AR(1) were determined to be the optimal variance function and autocorrelation structure, respectively.
There are two most commonly used methods in statistical software programs to estimate mixed-effects models: ML and REML17. ML is more flexible because models that differ in either their fixed effects or their random effects can be directly compared. However, the estimator for the variance obtained by ML is biased because ML does not account for the fact that the intercept and slope are estimated as well (as opposed to being known for certain). REML can provide superior ML estimates. Therefore, when the model comparisons were completed, the REML method was used for final model fitting17,18,36.
In this study, we found that the individual-tree basal area increment model for P. asperata using a linear mixed-effects approach represented a significant improvement over the basic model using OLS regression. Mixed-effects models provide an efficient tool for modeling data with hierarchical stochastic structure, making it widely applicable in fields such as agriculture, biology, economics, manufacturing, and geophysics.
The authors have nothing to disclose.
This research was funded by the Fundamental Research Funds for the Central Universities, grant number 2019GJZL04. We thank Professor Weisheng Zeng at the Academy of Forest Inventory and Planning, National Forestry and Grassland Administration, China for providing access to data.