A Reproducible Survey-Scoring And C5.0 Decision-Tree Workflow For Classifying Self-Reported Higher-Order Thinking In Generative AI-Supported Learning

Xueyan Zhao; Pu Song; Mengmeng Zhong; Shiya Zhu

doi:10.3791/71447

Method Article

June 5th, 2026

Xueyan Zhao¹ , Pu Song¹ , Mengmeng Zhong² , Shiya Zhu³

¹School of Educational Science, Yili Normal University, ²Vocational Education Research Institute, Liuzhou Polytechnic University, ³School of Education, Liupanshui Normal University

Summary

This protocol presents a reproducible workflow for collecting self-report survey data, scoring learner-related variables, dichotomizing higher-order thinking scores, and applying a C5.0 decision tree to demonstrate interpretable learner classification in generative artificial intelligence-supported academic learning.

Abstract

Generative artificial intelligence is increasingly used in higher education, but researchers still need transparent procedures for collecting, scoring, and organizing learner-level survey data in this rapidly changing context. This article presents a reproducible survey-scoring and C5.0 decision-tree workflow for classifying self-reported higher-order thinking among college students who have used generative artificial intelligence tools for academic learning. The protocol covers participant recruitment, questionnaire administration, response-quality screening, composite-score calculation, binary coding, training-testing partitioning, C5.0 tree construction, pruning, model-output export, and interpretation of node-based classification paths. The workflow is demonstrated using a single-university survey dataset of 776 undergraduate students collected in China from March 7 to March 15, 2023. Higher-order thinking is operationalized as a self-reported questionnaire score rather than as directly observed cognitive performance. In the demonstration dataset, the pruned tree retained eight learner-related variables: generative artificial intelligence anxiety, trust in generative artificial intelligence, problematic smartphone use, academic procrastination, academic performance, parental upbringing, negative emotions, and attitudes toward generative artificial intelligence. The retained model achieved 89.52% accuracy in the training subset and 86.21% accuracy in the testing subset. However, class-specific evaluation showed uneven performance, with substantially weaker recall for the Low-HOT class than for the High-HOT class. Therefore, the tree should be interpreted as an auxiliary, interpretable workflow demonstration rather than as a validated screening tool. This protocol may support researchers who require a documented and repeatable procedure for survey-based learner profiling in generative artificial intelligence-supported learning environments.

Introduction

Higher-order thinking is a central concern in higher education because university learning is expected to move beyond the recall of factual knowledge toward analysis, evaluation, reflection, problem solving, and knowledge creation¹. Later revisions of Bloom’s taxonomy further clarified that advanced learning involves not only remembering and understanding, but also applying, analyzing, evaluating, and creating knowledge in increasingly complex contexts². This distinction is important for the present protocol because higher-order thinking is not treated as a simple achievement score, but as a learner-level construct that requires careful operationalization before it can be analyzed. In educational research, higher-order thinking has also been linked to critical reasoning, metacognitive monitoring, and the transfer of learning across problem contexts³. For technology-supported learning, this issue becomes especially important because students often need to connect concepts, evaluate generated information, and make decisions during problem-solving activities rather than simply receive instructional content⁴.

The rapid spread of generative artificial intelligence has made the measurement of higher-order thinking more complicated. Generative artificial intelligence tools can produce text, code, images, explanations, summaries, and other forms of academic support from patterns learned from large-scale datasets⁵. Large language models, including ChatGPT, became especially visible in higher education during the early period of public adoption in late 2022 and early 2023⁶. In academic settings, students may use these tools for information search, concept explanation, outline development, idea generation, assignment revision, translation support, and other course-related activities. These uses may reduce routine workload, but they may also change how students allocate attention, judge information quality, and regulate their own cognitive effort. For this reason, research on generative artificial intelligence-supported learning requires procedures that clearly document how learner-level variables are collected, scored, coded, and interpreted.

Previous work on digital and blended learning has shown that technology can support higher-order thinking when students remain actively engaged with learning tasks rather than using the environment passively⁷. Mobile and technology-enhanced learning studies have similarly suggested that peer interaction, learning perception, and active engagement are relevant to students’ higher-order thinking tendencies⁸. Data mining approaches have also been used to explore how online learning behaviors are associated with higher-order thinking skills, showing the value of organizing educational data into interpretable learner patterns⁹. A substantial part of technology use research has been informed by the Technology Acceptance Model, which explains use behavior through perceived usefulness and perceived ease of use¹⁰. However, generative artificial intelligence-supported learning differs from earlier digital learning contexts because students interact with systems that generate open-ended responses, shape confidence, and may either stimulate or reduce critical engagement. Therefore, a protocol for this topic should not only record whether students accept the technology, but also document how emotional, behavioral, academic, and contextual indicators are prepared for analysis.

Several learner-related variables are especially relevant in this setting. Trust in artificial intelligence may influence whether students rely on generated output, question it, or integrate it into academic work with caution¹¹. Anxiety toward technology or artificial intelligence may also matter, although it should not be interpreted in a single direction without empirical caution¹². Problematic smartphone use is another relevant behavioral indicator because fragmented attention and device-related distraction may interfere with sustained cognitive engagement during study¹³. More general learning research further suggests that self-regulated learning strategies are associated with academic achievement in online and technology-supported environments¹⁴. Inquiry-based and cooperative learning research also indicates that meaningful learning depends on how students participate in learning tasks, organize evidence, and construct explanations¹⁵. These considerations support the use of a multi-variable survey workflow rather than a narrow measure of generative artificial intelligence exposure alone.

Cognitive Load Theory provides an additional rationale for documenting emotional and behavioral indicators in the same workflow. Because working memory is limited, excessive or poorly organized information can interfere with learning and problem solving¹⁶. In generative artificial intelligence-supported learning, this issue becomes visible when students encounter abundant generated output, inconsistent explanations, irrelevant suggestions, or competing sources of information. Emotional activity has also been shown to relate to cognitive load during multimedia learning, suggesting that affective states may shape how learners process information in technology-rich environments¹⁷. At the same time, current discussions of artificial general intelligence and future education have emphasized that artificial intelligence may reshape learning practices faster than traditional instructional systems can adapt¹⁸. These conditions make it necessary to use a method that records not only technology attitudes, but also emotional condition, self-regulation, academic background, and contextual variables. In the present workflow, these variables are treated as self-report survey indicators, not as directly observed behavioral traces or causal mechanisms.

A methodological difficulty is that learner profiles are rarely formed by one variable alone. Educational survey data often contain layered combinations of emotional, behavioral, academic, contextual, and technology-related indicators. Conventional regression models are useful for estimating net associations, but they are less intuitive when the goal is to display conditional branching, subgroup differentiation, and rule-based classification. Decision tree analysis provides a complementary approach because it recursively partitions cases into interpretable branches and makes the classification sequence visible through node structures¹⁹. Compared with more complex machine learning models, a single decision tree is easier to inspect, explain, and export as part of a reproducible workflow. Compared with standard linear modeling, it can show how combinations of learner indicators form practical classification paths in educational data²⁰. However, a single tree is also sensitive to class imbalance, partitioning decisions, pruning settings, and sample characteristics. For that reason, the decision tree in this protocol is used as an auxiliary interpretable classification step, not as a universal prediction engine or a causal model²¹.

This article presents a reproducible survey-scoring and C5.0 decision-tree workflow for classifying self-reported higher-order thinking in generative artificial intelligence-supported academic learning. The workflow is demonstrated with a single-university survey dataset of 776 undergraduate students collected in China from March 7 to March 15, 2023. The outcome is a self-reported higher-order thinking questionnaire score that is dichotomized for auxiliary classification; it is not a direct measure of demonstrated cognitive performance. The generative artificial intelligence exposure in this demonstration refers to students' self-reported use of generative artificial intelligence tools for course-related academic activities during the preceding four weeks, including information search, concept explanation, idea generation, outline drafting, and assignment revision. Tool use was not verified through platform logs, prompt histories, or real-time activity traces.

The purpose of the protocol is not to claim that the retained tree generalizes to all institutions, cohorts, or current generative artificial intelligence environments. This caution is necessary because the demonstration data were collected at one university during a short early 2023 window, when student exposure to generative artificial intelligence was still developing quickly. Instead, the contribution of the article lies in making the full procedure repeatable: defining the survey context, administering the questionnaire, screening responses, scoring constructs, documenting reliability, applying a fixed coding rule, partitioning the dataset, training and pruning a C5.0 tree, exporting model outputs, and interpreting node-based classification paths with attention to class-specific performance. This method-focused workflow may help educational researchers, classroom-oriented investigators, and instructional analysts reproduce or adapt a transparent approach to survey-based learner profiling in generative artificial intelligence-supported learning.

Protocol

This study involving human participants was reviewed and approved by the Ethics Committee of the School of Educational Science, Yili Normal University (Reference No. LZEDU2023003). The study was conducted from March 7 to March 15, 2023. The committee determined that the study qualified for ethical exemption because it involved only anonymous, voluntary questionnaire procedures without any physical, psychological, or social harm to participants. All participants provided electronic informed consent before completing the questionnaire.

Before analysis, replace respondent records with non-reversible case numbers and remove all direct identifiers. Do not retain names, student identification numbers, telephone numbers, account names, class identifiers, Internet Protocol addresses, device identifiers, or raw platform account markers in the analytic file. Use duplicate control information only for response screening, then remove it before locking the analytic dataset.

1. Establish the survey setting and sampling procedure

Define the study setting and eligibility criteria
1. Conduct the survey at Liuzhou Polytechnic University, Guangxi Zhuang Autonomous Region, China, from March 7 to March 15, 2023.
2. Define the target population as currently enrolled undergraduate students at the participating university.
3. Include students who report using at least one generative artificial intelligence tool for course-related academic learning during the preceding 4 weeks.
4. Define eligible academic use as information search, concept explanation, idea generation, outline drafting, assignment revision, translation support, code-related assistance, or other course-related tasks.
5. Exclude students who report only entertainment use, casual chatting, non-study experimentation, or other non-course-related use.
6. Exclude respondents who are not current students, do not report academic use of generative artificial intelligence, decline electronic informed consent, or submit invalid questionnaires according to the screening rules below.
7. Record generative artificial intelligence exposure through self-report items covering tool category or tool name, approximate use frequency during the preceding 4 weeks, and academic tasks supported by the tool. Do not treat these exposure data as verified platform use records.
Construct the sampling frame and recruit participants
1. Obtain the institutional undergraduate sampling frame from the participating university. The frame covers more than 15,000 students across 46 majors.
2. Apply stratified random sampling by major category and year level to draw the invitation list, so that students from different disciplinary areas and study stages are represented in the invited sample.
3. Invite 800 selected students using a standardized recruitment notice.
4. Distribute the questionnaire quick response code from the online questionnaire platform only to invited students during non-instructional periods, such as class breaks or student-advising sessions.
5. Record the full recruitment and screening flow. In this workflow, 800 students were invited and 792 questionnaires were returned. Sixteen questionnaires were excluded: 4 for ineligible generative artificial intelligence use, 3 for duplicate submission, 5 for completion time below the minimum threshold, and 4 for invalid straight-line response patterns. The final analytic sample contained 776 valid questionnaires.
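
The recruitment and screening flow can be checked arithmetically before it is entered in the screening log. The following is a minimal Python sketch, not part of the original SPSS-based workflow; the variable names are illustrative, and the counts are those stated in the step above.

```python
# Recruitment and screening counts as stated in the protocol text.
invited = 800
returned = 792
exclusions = {
    "ineligible generative AI use": 4,
    "duplicate submission": 3,
    "completion time below minimum threshold": 5,
    "invalid straight-line response pattern": 4,
}

# The exclusion reasons should add up to the reported total,
# and returned minus excluded should equal the analytic sample.
total_excluded = sum(exclusions.values())
final_sample = returned - total_excluded
return_rate = returned / invited
valid_rate = final_sample / returned

print(f"Excluded questionnaires: {total_excluded}")
print(f"Final analytic sample: {final_sample}")
print(f"Return rate: {return_rate:.2%}")
print(f"Valid-response rate: {valid_rate:.2%}")
```

Running a check of this kind whenever the screening log changes helps keep the reported flow internally consistent.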

2. Configure questionnaire administration and data privacy procedures

Configure the online questionnaire
1. Build and distribute the questionnaire using the online questionnaire platform (also known as Questionnaire Star).
2. Enable quick response code distribution, mandatory scale-item responses, submission time export, duplicate control checking, and spreadsheet export.
3. Lock the respondent-facing questionnaire before collecting the first valid response. Keep item wording, item order, response anchors, instructions, scale labels, consent wording, and access route unchanged throughout the survey period.
4. Set all scale items as required items. Do not use post hoc imputation for scale responses.
5. Enable one-submission control through the least identifiable platform option available. Use account-level, device-level, or platform-generated markers only for duplicate screening.
Standardize completion conditions
1. Ask respondents to complete the questionnaire individually, in one sitting, and without peer discussion.
2. Instruct respondents to answer according to their actual academic use of generative artificial intelligence during the preceding 4 weeks.
3. Ask respondents to complete the questionnaire in a quiet setting and avoid device switching, extended pauses, or repeated access after submission.
4. Set the expected completion time window at 5–30 min. Use completion time only as a response quality indicator.

3. Screen responses and lock the analytic dataset

Export the raw-response file
1. Export the raw questionnaire file from the online questionnaire platform after the survey window closes.
2. Preserve the raw export before exclusion, scoring, coding, or model construction.
3. Assign non-identifiable case numbers to all returned records before screening. In this workflow, assign case numbers to the 792 returned questionnaires.
Apply response quality exclusion rules
1. Exclude records without electronic informed consent.
2. Exclude records that do not meet current student status or academic eligibility.
3. Exclude records with missing non-demographic scale responses after export. In this workflow, mandatory response settings prevented missing scale responses.
4. Use an average completion-time threshold of 2 s per closed-ended item to identify records that are unlikely to reflect item-level reading and response processing. Exclude records below this threshold.
5. Inspect records completed in less than 5 min or more than 30 min as potential low-quality or interrupted responses. Retain records outside this time window only when duplicate checking and response-pattern inspection do not indicate invalid completion.
6. Use straight-line responding as an additional response-quality indicator. Exclude records in which the same response option is selected for at least 90% of Likert-scale items, as this pattern suggests limited engagement with item content.
7. Exclude duplicate submissions based on platform duplicate markers and retain the earliest complete valid record.
8. Preserve the respondent-facing questionnaire as Supplementary File 1.
9. Preserve the screening log as Supplementary File 2. The screening log should report invited students, returned questionnaires, excluded questionnaires, exclusion reasons, and final retained cases.
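
The time-based and straight-line screening rules above can be expressed as small, testable functions. This is a hedged sketch in Python rather than the platform or SPSS procedure used in the demonstration; the function names, the example record, and the item count of 100 are hypothetical, while the thresholds (2 s per closed-ended item, 90% identical responses, 5–30 min inspection window) come from the steps above.

```python
from collections import Counter


def is_speeder(completion_seconds, n_closed_items, min_seconds_per_item=2.0):
    """True when the average time per closed-ended item is below the threshold."""
    return completion_seconds / n_closed_items < min_seconds_per_item


def is_straight_liner(likert_responses, max_share=0.90):
    """True when one response option covers at least max_share of Likert items."""
    most_common_count = Counter(likert_responses).most_common(1)[0][1]
    return most_common_count / len(likert_responses) >= max_share


def needs_time_inspection(completion_seconds, low_minutes=5, high_minutes=30):
    """True when total completion time falls outside the expected window."""
    minutes = completion_seconds / 60
    return minutes < low_minutes or minutes > high_minutes


# Hypothetical record: 100 closed-ended items answered in 150 s,
# with the same option chosen on 95 of 100 Likert items.
example_responses = [3] * 95 + [4] * 5
print(is_speeder(150, 100))                  # average 1.5 s per item
print(is_straight_liner(example_responses))  # 95% identical responses
print(needs_time_inspection(150))            # 2.5 min, outside 5-30 min
```

Records flagged only by `needs_time_inspection` are candidates for manual review rather than automatic exclusion, in line with step 5.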
Create the locked analytic dataset
1. Retain the 776 valid questionnaires after screening. Lock the respondent-facing questionnaire and the analytic dataset before scoring to ensure that item wording, eligibility criteria, exclusion decisions, and variable definitions are not changed after data collection.
2. Remove direct identifiers, raw timestamps, account-level markers, device-level traces, and duplicate control markers before analysis. Use this de-identification step to protect participant anonymity and prevent the analytic dataset from being linked back to individual students.
3. Save the de-identified 776 case dataset as the locked analytic dataset.
4. Use the locked analytic dataset as the only source for scoring, coding, and classification.
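
The de-identification step can be sketched as follows. This Python example is an illustration under stated assumptions, not the procedure used in the demonstration: the field names in `IDENTIFIER_FIELDS` and the `case_id` format are hypothetical, and shuffling records before numbering is an added assumption so that case numbers do not encode submission order.

```python
import random

# Hypothetical names for fields that must not reach the analytic file.
IDENTIFIER_FIELDS = {
    "name", "student_id", "phone", "account", "class_id",
    "ip_address", "device_id", "submit_time", "duplicate_marker",
}


def deidentify(records, seed=None):
    """Return copies of records with identifiers dropped and case numbers added.

    No mapping between original records and case numbers is kept,
    so the numbering cannot be reversed from the output alone.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cleaned = []
    for number, record in enumerate(shuffled, start=1):
        kept = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
        kept["case_id"] = f"C{number:04d}"
        cleaned.append(kept)
    return cleaned


raw = [
    {"name": "A", "ip_address": "x", "q1": 4, "q2": 2},
    {"name": "B", "ip_address": "y", "q1": 3, "q2": 5},
]
locked = deidentify(raw)
print(locked)
```

Because the function returns new dictionaries, the raw export stays untouched and can be preserved separately, as required in the raw-response export steps.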

4. Measure learner-related variables

Organize the questionnaire
1. Divide the questionnaire into four sections: informed consent, demographic and academic background, generative artificial intelligence exposure, and learner-related scales.
2. Record gender, age, major, year level, and self-reported academic performance.
3. Record generative artificial intelligence exposure by tool category, approximate use frequency, and supported academic task.
4. Measure self-reported higher-order thinking, generative artificial intelligence anxiety, trust in generative artificial intelligence, negative emotions, attitudes toward generative artificial intelligence, problematic smartphone use, academic procrastination, and parental upbringing.
5. Treat all scale-based variables as self-report survey indicators, not direct behavioral observations or directly observed cognitive performance.
Define retained measures
1. Measure self-reported higher-order thinking with a retained 23-item instrument covering problem solving, critical thinking, teamwork, communication, and innovation²².
2. Measure generative artificial intelligence anxiety with an 11-item adapted technology anxiety scale²³.
3. Measure trust in generative artificial intelligence with a 12-item adapted trust in automation scale²⁴.
4. Measure negative emotions with an adapted Depression Anxiety Stress Scales 21-item (DASS-21) set used as a general negative emotion indicator²⁵. Do not interpret this score as a clinical diagnosis.
5. Measure attitudes toward generative artificial intelligence with an 18-item adapted technology-attitude scale²⁶.
6. Measure problematic smartphone use with the Smartphone Addiction Scale Short Version adapted for the student survey context²⁷.
7. Measure academic procrastination with a 16-item academic procrastination scale²⁸.
8. Measure academic performance with a 5-point self-report item: 1 = far below average, 2 = below average, 3 = average, 4 = above average, and 5 = excellent.
9. Measure parental upbringing with a 5-point self-report contextual item: 1 = very unsupportive, 2 = unsupportive, 3 = neutral or mixed, 4 = supportive, and 5 = highly supportive.
Specify response anchors and adaptation procedures
1. Use a 1–5 Likert-type response format for retained scale variables.
2. Use the general anchors: 1 = strongly disagree, 2 = disagree, 3 = neutral or uncertain, 4 = agree, and 5 = strongly agree.
3. Administer the questionnaire in Chinese to ensure that respondents complete all scale items in the instructional and language context used at the participating university.
4. Adapt source items through forward translation, expert review, wording adjustment for generative artificial intelligence-supported learning, and pilot readability checking to preserve the intended construct meaning while improving contextual clarity for Chinese undergraduate respondents.
5. Use expert review to check whether the adapted items remain consistent with the original constructs, response anchors, scoring direction, and study context. Revise only wording that may cause ambiguity, context mismatch, or misunderstanding in the generative artificial intelligence-supported learning setting.
6. Include item wording, response anchors, item order, scoring direction, and final variable labels in Supplementary File 1.

5. Score measures and verify measurement quality

Prepare the scored working file
1. Open the locked analytic dataset in IBM SPSS Statistics 26.0.
2. Preserve one respondent per row and one item or variable per column.
3. Check missing values, response ranges, item labels, and variable names.
4. Save a screened item-level dataset before calculating composite scores.
Compute composite scores
1. Calculate one respondent-level mean composite score for each multi-item construct.
2. Reverse-score only items identified as reverse-keyed in the scoring key before composite score calculation.
3. Do not reverse-score the DASS-21-derived negative emotion items in this workflow because all retained items are scored in the same direction, with higher values indicating stronger negative emotion experience.
4. Do not compute composite scores from partially transformed scales.
5. Save a continuous score working file after composite score calculation.
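
The composite-score rule can be illustrated outside SPSS with a short Python sketch. The item names and the example responses below are hypothetical; the 1–5 response range and the mean-composite rule follow the steps above. On a 1–5 scale, reverse scoring maps a value v to 6 − v.

```python
def reverse_score(value, scale_min=1, scale_max=5):
    """Reverse a Likert response, e.g. 1 -> 5 and 4 -> 2 on a 1-5 scale."""
    return scale_min + scale_max - value


def composite_mean(responses, reverse_keyed=()):
    """Mean composite score for one respondent on one multi-item construct.

    responses: dict mapping item name to a 1-5 response.
    reverse_keyed: item names that the scoring key marks as reverse-keyed.
    """
    scored = [
        reverse_score(value) if item in reverse_keyed else value
        for item, value in responses.items()
    ]
    return sum(scored) / len(scored)


# Hypothetical three-item construct in which q2 is reverse-keyed.
example = {"q1": 4, "q2": 2, "q3": 5}
score = composite_mean(example, reverse_keyed={"q2"})
print(round(score, 2))
```

Applying reversal inside the scoring function, rather than overwriting the item-level file, keeps the screened item-level dataset unchanged and avoids computing composites from partially transformed scales.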
Verify internal measurement quality
1. Calculate Cronbach’s alpha for each retained multi-item scale in IBM SPSS Statistics 26.0. Use Cronbach’s alpha as an internal-consistency index that indicates whether items within the same scale show sufficient coherence for composite-score calculation.
2. Inspect corrected item-total correlations and alpha-if-item-deleted values for each retained scale. Use corrected item-total correlations to check whether each item is sufficiently aligned with the total score of its own scale after excluding that item from the total calculation.
3. Use alpha-if-item-deleted values to identify items that may reduce internal consistency if retained. Do not remove items automatically on statistical grounds alone; retain or remove an item only after checking its construct meaning, scoring direction, and consistency with the adapted questionnaire.
4. Conduct Kaiser-Meyer-Olkin and Bartlett’s test checks for the adapted scale set.
5. Report key Cronbach’s alpha values in the main text.
6. Provide the complete scoring and measurement record as Supplementary Table 1. Include item count, response range, response anchors, scoring direction, reverse-keyed items, composite-score rule, Cronbach’s alpha, corrected item-total correlation checking result, alpha-if-item-deleted checking result, and final variable label.
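
For readers who want to cross-check the SPSS reliability output, Cronbach's alpha and corrected item-total correlations can be computed from the item-level data. The sketch below uses the standard formula alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores) and only the Python standard library; the toy responses are invented for illustration and are not study data.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for item responses (one respondent per row)."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )
    return numerator / denominator


def corrected_item_total(rows, j):
    """Correlation of item j with the sum of the remaining items."""
    item = [row[j] for row in rows]
    rest = [sum(row) - row[j] for row in rows]
    return pearson(item, rest)


# Invented toy data: five respondents, three items on a 1-5 scale.
toy_rows = [[4, 5, 4], [2, 2, 3], [3, 3, 3], [5, 4, 5], [1, 2, 1]]
alpha = cronbach_alpha(toy_rows)
item_total = [corrected_item_total(toy_rows, j) for j in range(3)]
print(round(alpha, 3), [round(r, 3) for r in item_total])
```

The sketch assumes complete responses, which matches the mandatory-response setting in this workflow. It does not cover the Kaiser-Meyer-Olkin or Bartlett checks.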

6. Convert continuous scores into auxiliary classification inputs

Apply binary coding
1. Use the continuous score working file as the source file.
2. Recode self-reported higher-order thinking as the binary target variable.
3. Code scores ≤3 as 0 and scores >3 as 1.
4. Apply the same threshold to all retained predictors after composite score calculation.
5. Use the threshold of 3 as the midpoint of the 1–5 response scale to create a transparent and reproducible coding rule.
6. Perform dichotomization only to create binary inputs for the C5.0 decision-tree demonstration.
7. Check the class counts after dichotomization.
8. Preserve the continuous score working file.
Create the coded analysis file
1. Generate a new binary-coded analysis file.
2. Recode generative artificial intelligence anxiety, trust in generative artificial intelligence, problematic smartphone use, academic procrastination, academic performance, parental upbringing, negative emotions, and attitudes toward generative artificial intelligence using the same threshold.
3. Verify class counts for each binary variable.
4. Report the coding rule and class counts in Table 1.
5. Preserve the binary-coded analysis file.
6. Provide the coded analysis file structure as Supplementary File 3.
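
The midpoint coding rule is simple enough to state as a single function. The following Python sketch mirrors the rule in this section (scores ≤3 coded 0, scores >3 coded 1); the function name and the example scores are hypothetical.

```python
from collections import Counter


def dichotomize(score, threshold=3.0):
    """Code a 1-5 composite score as 1 if above the midpoint, else 0."""
    return 1 if score > threshold else 0


# Hypothetical composite scores for six respondents.
scores = [2.4, 3.0, 3.1, 4.6, 1.0, 3.0]
codes = [dichotomize(s) for s in scores]
class_counts = Counter(codes)

print(codes)
print(dict(class_counts))
```

A composite score of exactly 3.0 is coded 0 under this rule. Tabulating class counts immediately after recoding makes it easy to confirm the values reported in Table 1.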

7. Build the auxiliary C5.0 decision tree classifier

Configure the modeling environment
1. Open IBM SPSS Modeler 18.4 on a Windows 10 64-bit operating system.
2. Import the binary-coded analysis file.
3. Assign binary self-reported higher-order thinking as the target field.
4. Assign all other retained binary variables as input fields.
5. Verify field names, value labels, and target/input roles.
6. Record the software name, version, operating system, C5.0 implementation, project file name, and analysis date in Supplementary Table 2.
Partition the dataset
1. Partition the 776 cases into a 70:30 training-testing split.
2. Use the fixed random seed 20230307.
3. Apply stratified partitioning by the binary higher-order thinking target.
4. Assign 544 cases to the training subset and 232 cases to the testing subset.
5. Verify low-HOT and high-HOT counts in the full dataset, training subset, and testing subset.
6. Preserve the partition record in Supplementary Table 2.
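
A stratified 70:30 partition can be sketched in Python as shown below. This is an illustration only: the Python random number generator will not reproduce the SPSS Modeler partition even with the same seed, and rounding the per-class training size to the nearest integer is an assumption of this sketch. The class counts (118 Low-HOT, 658 High-HOT) are those reported for the full dataset.

```python
import random


def stratified_split(labels, train_fraction=0.70, seed=20230307):
    """Split case indices into training and testing sets within each class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))  # assumed rounding rule
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


# Binary target with the reported full-dataset class counts.
labels = [0] * 118 + [1] * 658
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))
print(sum(labels[i] == 0 for i in train_idx), sum(labels[i] == 0 for i in test_idx))
```

Under this rounding assumption, the subset sizes agree with the 544 and 232 cases stated in the partition steps. The per-class counts printed by the sketch are illustrative and should be replaced by the counts exported from the actual partition.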
Train and prune the C5.0 tree
1. Train one C5.0 classifier on the training subset.
2. Use gain ratio as the split selection rule²⁹.
3. Define entropy as:
  Entropy(D) = −Σ_i p_i log₂(p_i)
  where p_i is the proportion of cases in class i within dataset D.
4. Define gain ratio as:
  Gain ratio(D, C) = [Entropy(D) − Entropy(D|C)] / SplitInfo(D, C)
  where C is the candidate splitting variable.
5. Enable pruning to reduce overly specific branches that may fit training-subset noise.
6. Use the following C5.0 settings: pruning = enabled; boosting = disabled; number of trials = 1; misclassification costs = not applied; class weights = not applied; minimum records per child branch = 2; global pruning = enabled; subtree raising = enabled; confidence factor = 0.25; stopping rules = IBM SPSS Modeler C5.0 default settings unless otherwise stated.
7. Do not apply synthetic oversampling, class weighting, or cost-sensitive learning in the primary workflow.
8. Export the initial tree and the retained pruned tree.
9. Save the final node structure, terminal node summaries, variable importance output, classification rules, and model settings.
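
The entropy and gain ratio definitions in this section can be made concrete with a short Python sketch. It assumes the usual definition of SplitInfo(D, C) as the entropy of the partition that C induces on D, and of Entropy(D|C) as the size-weighted mean entropy of the resulting subsets; the protocol does not spell out either term. The sketch illustrates the split criterion only and is not the IBM SPSS Modeler C5.0 implementation. The toy labels are invented.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(values)
    return sum((c / n) * log2(n / c) for c in Counter(values).values())


def gain_ratio(labels, split_values):
    """Gain ratio of a candidate splitting variable for the given labels."""
    n = len(labels)
    groups = {}
    for label, value in zip(labels, split_values):
        groups.setdefault(value, []).append(label)

    # Entropy(D|C): entropy within each branch, weighted by branch size.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    # SplitInfo(D, C): entropy of the branch sizes themselves (assumed definition).
    split_info = entropy(split_values)
    if split_info == 0:
        return 0.0  # a variable with a single value cannot split the data
    return (entropy(labels) - conditional) / split_info


# Invented toy example: one predictor separates the classes, one does not.
target = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]
print(gain_ratio(target, informative))
print(gain_ratio(target, uninformative))
```

With binary-coded predictors, as in this workflow, each candidate split has at most two branches, so SplitInfo is at most 1 bit.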
Document class imbalance
1. Calculate the low-HOT and high-HOT distribution before model training.
2. Record the no-information rate as the majority class proportion in the full dataset and testing subset.
3. Interpret the tree as an auxiliary interpretable classification output rather than as an optimized screening model.
4. For future screening-oriented applications, consider class weights, cost-sensitive learning, resampling, repeated train-test partitions, k-fold cross-validation, or threshold adjustment.

8. Evaluate and preserve the classification output

Export performance outputs
1. Apply the retained pruned tree to the training subset.
2. Export the training confusion matrix, accuracy, class-specific recall, class-specific precision, and terminal node output.
3. Apply the same tree to the testing subset.
4. Export the testing confusion matrix, accuracy, class-specific recall, class-specific precision, and terminal node output.
5. Calculate recall as TP/(TP + FN).
6. Calculate precision as TP/(TP + FP).
7. Calculate overall accuracy as correctly classified cases divided by total cases.
8. Calculate the no-information rate and compare testing accuracy with the majority class baseline.
9. Report class-specific metrics together with overall accuracy.
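
The metric definitions above can be computed directly from a 2 × 2 confusion matrix. In the Python sketch below, High-HOT is treated as the positive class, which is an assumption of this example, and the confusion-matrix counts are hypothetical values chosen to show why class-specific metrics matter; they are not results from the demonstration dataset.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, class-specific recall and precision, and no-information rate.

    tp, fn: actual positive cases predicted positive / negative.
    fp, tn: actual negative cases predicted positive / negative.
    Assumes every row and column of the matrix has at least one case.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall_positive": tp / (tp + fn),
        "precision_positive": tp / (tp + fp),
        "recall_negative": tn / (tn + fp),
        "precision_negative": tn / (tn + fn),
        # Accuracy obtained by always predicting the majority class.
        "no_information_rate": max(tp + fn, tn + fp) / total,
    }


# Hypothetical counts for an imbalanced test set (100 positive, 20 negative).
metrics = classification_metrics(tp=90, fn=10, fp=12, tn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this invented example, overall accuracy falls slightly below the no-information rate while recall for the minority class is low, which is the kind of pattern that overall accuracy alone would hide.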
Export interpretability outputs
1. Export the variable-importance table with numeric values.
2. Export the final tree diagram with node sample size, predicted class, class distribution, and class probability.
3. If the full tree diagram is visually crowded, provide a simplified main figure and place the complete node table in Supplementary File 4.
4. Export terminal-node summaries including rule path, node sample size, predicted class, low-HOT count, high-HOT count, and classification confidence.
5. Describe one or two representative classification pathways in the Representative Results.
Preserve the reproducibility package
1. Preserve the raw-response file, de-identification record, screening log, screened item-level dataset, continuous score working file, binary-coded analysis file, sampling flow table, scoring key, reliability table, C5.0 project file, initial tree file, pruned tree file, variable importance output, terminal node table, confusion matrices, and exported model summaries.
2. Provide Supplementary Table 1 as the scoring and measurement record.
3. Provide Supplementary Table 2 as the model setting and output record.
4. Provide Supplementary File 1 as the respondent-facing questionnaire.
5. Provide Supplementary File 2 as the sample screening log.
6. Provide Supplementary File 3 as the coded analysis file structure.
7. Provide Supplementary File 4 as the terminal-node and variable-importance output.
8. Preserve the workflow path as: raw questionnaire export > de-identification and screening log > screened item-level dataset > continuous score working file > binary coded analysis file > training/testing partition > initial C5.0 tree > pruned C5.0 tree > exported classification outputs > supplementary reproducibility package.

Results

Descriptive statistics of retained variables

Table 2 reports the descriptive statistics of the retained self-report survey variables in the continuous-score dataset before binary recoding. The mean self-reported higher-order thinking score was 3.71 (SD = 0.68), which was above the fixed midpoint threshold of 3 on the 1–5 response scale. Trust in generative artificial intelligence also exceeded this threshold, with a mean of 3.54 (SD = 0.60), while academic performance was close to the threshold, with a mean of 3.07 (SD = 1.12). The remaining variables showed mean scores of 3.48 for generative artificial intelligence anxiety, 2.72 for problematic smartphone use, 2.59 for academic procrastination, 2.99 for parental upbringing, 2.12 for negative emotions, and 3.26 for attitudes toward generative artificial intelligence, with standard deviations of 0.75, 0.74, 0.64, 0.43, 0.75, and 0.57, respectively.

These descriptive statistics indicate that the retained variables did not have identical distributions before binary recoding. Self-reported higher-order thinking, trust in generative artificial intelligence, generative artificial intelligence anxiety, academic performance, and attitudes toward generative artificial intelligence were at or above the midpoint threshold, whereas problematic smartphone use, academic procrastination, parental upbringing, and negative emotions were below or close to the threshold. This distributional pattern is important because the same midpoint coding rule was later applied to all retained variables.

Binary coding and class distribution

Table 1 summarizes the binary coding rule and class counts used in the auxiliary classification workflow. Scores less than or equal to 3 were coded as 0, and scores greater than 3 were coded as 1. After this coding rule was applied to the self-reported higher-order thinking score, 118 students were classified as Low-HOT and 658 students were classified as High-HOT. Thus, the full dataset was imbalanced, with Low-HOT representing 15.21% of cases and High-HOT representing 84.79% of cases.

This imbalance provides important context for interpreting the decision tree output. Because the majority class accounted for 84.79% of the full dataset, overall accuracy alone could overstate practical classification performance. The retained tree was therefore evaluated with class-specific recall and precision in addition to overall accuracy.
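The midpoint coding rule and the resulting class percentages can be expressed as a short Python sketch. The protocol performs this recoding in IBM SPSS Statistics, so the code below is only an illustration. The function names and the five example scores are hypothetical, while the counts 118 and 658 are taken from Table 1.

```python
from collections import Counter

THRESHOLD = 3  # fixed midpoint of the 1-5 response scale


def binary_code(score, threshold=THRESHOLD):
    """Return 0 for scores <= threshold and 1 for scores > threshold."""
    return 1 if score > threshold else 0


def class_distribution(codes):
    """Return {code: (count, percentage)} for the binary codes 0 and 1."""
    counts = Counter(codes)
    n = len(codes)
    return {code: (counts[code], 100 * counts[code] / n) for code in (0, 1)}


# Hypothetical composite scores; a score of exactly 3 is coded 0.
scores = [2.4, 3.0, 3.2, 4.6, 3.8]
codes = [binary_code(s) for s in scores]
print(codes)  # [0, 0, 1, 1, 1]

# Rebuild the target-variable percentages from the Table 1 counts.
target = class_distribution([0] * 118 + [1] * 658)
for code, (count, pct) in target.items():
    print(f"code {code}: n = {count}, {pct:.2f}%")
```

Because the comparison is strict, respondents whose composite score equals the midpoint fall into the low category, which matches the rule stated in Table 1.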

Representative decision tree output

In the auxiliary C5.0 classification step, eight respondent-level variables were retained in the pruned tree used to classify binary self-reported higher-order thinking. In descending order of model-reported importance, these variables were generative artificial intelligence anxiety, trust in generative artificial intelligence, problematic smartphone use, academic procrastination, academic performance, parental upbringing, negative emotions, and attitudes toward generative artificial intelligence. Figure 1 presents the retained pruned tree generated from the training subset.

The tree should be interpreted as an auxiliary organization of learner profile patterns rather than as a causal model. The position of a variable in the tree indicates its role in the retained classification structure under the specified coding rule, partitioning procedure, and C5.0 settings. It does not show that the variable directly causes higher or lower higher-order thinking.

To improve interpretability, the complete terminal-node output is provided in Supplementary File 4. This file reports the rule path, node sample size, predicted class, Low-HOT count, High-HOT count, and classification confidence for each terminal node. The node-level output is especially important for interpreting branches that may appear counter-intuitive in the simplified tree diagram.

Classification performance of the retained model

Table 3 presents the confusion matrices for the training and testing subsets. In the training subset (n = 544), the retained model correctly classified 37 Low-HOT cases and 450 High-HOT cases; 48 Low-HOT cases were classified as High-HOT, and 9 High-HOT cases were classified as Low-HOT. In the testing subset (n = 232), the model correctly classified 10 Low-HOT cases and 190 High-HOT cases; 23 Low-HOT cases were classified as High-HOT, and 9 High-HOT cases were classified as Low-HOT.

As shown in Table 4, the training accuracy was 89.52% (487/544), and the testing accuracy was 86.21% (200/232). However, the no-information rate in the testing subset was 85.78% because 199 of the 232 testing cases belonged to the High-HOT class. The testing accuracy therefore exceeded the majority-class baseline by only one correctly classified case (200 versus 199 of 232), or 0.43 percentage points.

Table 5 reports the corrected class-specific recall and precision values in the testing subset. Low-HOT recall was 30.30% (10/33), and Low-HOT precision was 52.63% (10/19). High-HOT recall was 95.48% (190/199), and High-HOT precision was 89.20% (190/213). This pattern shows that the retained classifier recovered the High-HOT class much more effectively than the Low-HOT class. The weak Low-HOT recall means that most students in the Low-HOT class were not identified by the retained tree. For this reason, the model should not be interpreted as a validated screening tool for detecting students with lower self-reported higher-order thinking. Its main value in this article is to demonstrate a reproducible and interpretable survey-based classification workflow.

Figure 1: Auxiliary C5.0 decision tree classification of self-reported higher-order thinking. Retained pruned C5.0 decision tree generated from the training subset to classify binary self-reported higher-order thinking. The model used generative artificial intelligence anxiety, trust in generative artificial intelligence, problematic smartphone use, academic procrastination, academic performance, parental upbringing, negative emotions, and attitudes toward generative artificial intelligence as input variables. The tree was trained in IBM SPSS Modeler 18.4 using the fixed 70:30 stratified split. Node-level class distribution and prediction information are shown in the figure or provided in Supplementary File 4.

Variable	Coding	Count	Percentage
Self-reported higher-order thinking	0 = Low-HOT	118	15.21%
Self-reported higher-order thinking	1 = High-HOT	658	84.79%
Generative artificial intelligence anxiety	0 = Low	252	32.47%
Generative artificial intelligence anxiety	1 = High	524	67.53%
Trust in generative artificial intelligence	0 = Low	217	27.96%
Trust in generative artificial intelligence	1 = High	559	72.04%
Problematic smartphone use	0 = Low	591	76.16%
Problematic smartphone use	1 = High	185	23.84%
Academic procrastination	0 = Low	629	81.06%
Academic procrastination	1 = High	147	18.94%
Academic performance	0 = Low	478	61.60%
Academic performance	1 = High	298	38.40%
Parental upbringing	0 = Low	483	62.24%
Parental upbringing	1 = High	293	37.76%
Negative emotions	0 = Low	694	89.43%
Negative emotions	1 = High	82	10.57%
Attitudes toward generative artificial intelligence	0 = Low	341	43.94%
Attitudes toward generative artificial intelligence	1 = High	435	56.06%

Table 1: Binary coding rule and class distribution. Binary coding scheme and class counts for the target and input variables used in the auxiliary classification workflow. Scores ≤3 on the 1–5 response scale were coded as 0, and scores >3 were coded as 1. For the target variable, 0 indicates Low-HOT and 1 indicates High-HOT. For input variables, 0 indicates a low or neutral-or-lower level, and 1 indicates a high or agreement-level response.

Variable	Response range	Mean	SD	Binary coding threshold
Self-reported higher-order thinking	1–5	3.71	0.68	3
Generative artificial intelligence anxiety	1–5	3.48	0.75	3
Trust in generative artificial intelligence	1–5	3.54	0.60	3
Problematic smartphone use	1–5	2.72	0.74	3
Academic procrastination	1–5	2.59	0.64	3
Academic performance	1–5	3.07	1.12	3
Parental upbringing	1–5	2.99	0.43	3
Negative emotions	1–5	2.12	0.75	3
Attitudes toward generative artificial intelligence	1–5	3.26	0.57	3

Table 2: Descriptive statistics before binary recoding. Descriptive statistics of the retained self-report survey variables in the continuous score dataset. Means, standard deviations, response ranges, and the fixed midpoint threshold are reported before binary recoding. The response range column indicates that retained variables were analyzed on a 1–5 scale before application of the midpoint coding threshold.

Dataset	Actual class	Predicted Low-HOT	Predicted High-HOT	Total actual cases
Training subset	Low-HOT	37	48	85
Training subset	High-HOT	9	450	459
Testing subset	Low-HOT	10	23	33
Testing subset	High-HOT	9	190	199

Table 3: Confusion matrices for the retained C5.0 classifier. Confusion matrices for the retained pruned C5.0 tree in the training and testing subsets. The table compares actual and predicted class membership for Low-HOT and High-HOT cases. The training subset contained 544 cases, and the testing subset contained 232 cases.

Dataset	Result / metric	Count	Percentage
Training subset	Correct classifications	487	89.52%
Training subset	Incorrect classifications	57	10.48%
Training subset	Total cases	544	100.00%
Training subset	No-information rate	459 / 544	84.38%
Testing subset	Correct classifications	200	86.21%
Testing subset	Incorrect classifications	32	13.79%
Testing subset	Total cases	232	100.00%
Testing subset	No-information rate	199 / 232	85.78%

Table 4: Overall accuracy and majority class baseline. Overall accuracy of the retained C5.0 tree in the training and testing subsets. Correct classifications, incorrect classifications, total cases, and no-information rate are reported to contextualize accuracy under class imbalance. The no-information rate is the majority-class proportion in each subset. It is reported because the target variable was imbalanced, with High-HOT representing the majority class.

Class	True positives	False negatives	False positives	Recall	Precision
Low-HOT	10	23	9	30.30%	52.63%
High-HOT	190	9	23	95.48%	89.20%

Table 5: Corrected class-specific recall and precision in the testing subset. Corrected recall and precision values for Low-HOT and High-HOT cases in the testing subset. Recall was calculated as TP/(TP + FN), and precision was calculated as TP/(TP + FP). The corrected values show that the retained classifier recovered the High-HOT class much more effectively than the Low-HOT class in the testing subset.

Supplementary Table 1: Scoring and measurement record. Scoring and measurement record for the retained self-report survey variables. The table includes item count, response range, response anchors, scoring direction, reverse-keyed items, composite-score rule, Cronbach’s alpha, item-total checking result, and final variable label. Composite scores were calculated at the respondent level before binary recoding. The DASS-21-derived negative-emotion items were not reverse-scored in this workflow because all retained items were scored in the same direction, with higher values indicating stronger negative-emotion experience. Cronbach’s alpha was reported for retained multi-item scales only.

Supplementary Table 2: C5.0 model settings and exported outputs. Model setting and output record for the auxiliary C5.0 decision-tree workflow. The table includes the target variable, input variables, coding threshold, split rule, random seed, software version, pruning settings, class-weighting setting, misclassification-cost setting, and exported model outputs. All settings were fixed before model evaluation and preserved unchanged across the auxiliary classification workflow.

Supplementary File 1: Respondent-facing questionnaire. Full questionnaire used for survey administration, including consent text, background items, generative artificial intelligence exposure items, scale item wording, response anchors, item order, scoring direction, and final variable labels.

Supplementary File 2: Sample-screening log. Screening log documenting the transition from invited students to the final analytic sample, including returned questionnaires, exclusion categories, and retained valid cases.

Supplementary File 3: Binary-coded analysis-file structure. Coding structure for the binary analysis file used in C5.0 classification. The file documents source variables, binary variable labels, coding rules, value labels, class counts, and class percentages.

Supplementary File 4: Terminal-node and variable-importance output. Terminal node and variable importance output from the retained pruned C5.0 tree, including rule paths, node sample sizes, predicted classes, class distributions, class probabilities, and model-reported importance values.

Discussion

This protocol presents a reproducible survey-scoring and C5.0 decision-tree workflow for organizing self-report learner data in generative artificial intelligence-supported academic learning. Its main contribution is methodological rather than predictive. The workflow specifies how to define the survey context, screen responses, score multi-item constructs, apply a fixed midpoint coding rule, partition the dataset, train and prune a C5.0 tree, export node-level outputs, and evaluate class-specific performance.

C5.0 was selected because it produces an interpretable rule-based tree, supports gain-ratio-based splitting and pruning, and allows researchers to inspect how combinations of learner-related variables lead to terminal classification paths. Compared with less transparent classification approaches, this structure fits the purpose of a methods article that emphasizes reproducibility, auditability, and learner-profile interpretation rather than maximum predictive optimization²⁹. This distinction is important in generative artificial intelligence education research, where student learning is shaped not only by tool access but also by cognitive engagement, technology-related perception, and academic self-regulation. The protocol therefore preserves the full path from questionnaire administration to model output interpretation, rather than treating the final tree as the sole research product.

Several steps are critical for successful implementation. The first is the definition of generative artificial intelligence exposure before questionnaire distribution. In this protocol, exposure refers to self-reported course-related use during the preceding 4 weeks, including information search, concept explanation, idea generation, outline drafting, assignment revision, translation support, and code-related assistance. Because students may use artificial intelligence tools for very different learning tasks, tool category, use frequency, and supported academic activity must be recorded before learner-profile outputs are interpreted. These exposure data remain self-reported and should not be treated as verified platform logs.

A second critical step is the construction and preservation of the scoring record. The workflow uses multiple adapted self-report measures, including self-reported higher-order thinking, generative artificial intelligence anxiety, trust in generative artificial intelligence, negative emotions, attitudes toward generative artificial intelligence, problematic smartphone use, academic procrastination, academic performance, and parental upbringing. The adapted scales must be administered with stable item wording, response anchors, scoring direction, and variable labels. This is especially important because higher-order thinking in this protocol is a questionnaire-based score, not a directly observed measure of cognitive performance. Prior work on technology-supported higher-order thinking shows that learner-related constructs can be useful only when their measurement basis is clearly specified³⁰.

The third critical step is the binary coding rule. This workflow applies a fixed threshold of 3 on the 1–5 response scale, coding scores ≤3 as 0 and scores >3 as 1. The rule is transparent and easy to reproduce, but it compresses continuous variation into two categories and may amplify imbalance when most respondents cluster above or below the midpoint. In this dataset, the higher-order thinking target was imbalanced: 15.21% of cases were classified as Low-HOT and 84.79% as High-HOT. Because educational data mining workflows often use classification to make learner patterns visible³¹, class distribution and baseline performance must be reported alongside accuracy.

The retained C5.0 tree produced training accuracy of 89.52% and testing accuracy of 86.21%. However, the testing no-information rate was 85.78%, because the High-HOT class dominated the testing subset. The accuracy therefore only modestly exceeded the majority class baseline. More importantly, the testing subset showed a strong asymmetry in class-specific performance: Low-HOT recall was 30.30%, whereas High-HOT recall was 95.48%. This means that the retained tree recovered the High-HOT class effectively but missed most Low-HOT cases. This uneven class-specific performance also reinforces the need to interpret learner-related variables cautiously, because higher-order thinking in technology-enhanced learning environments may be shaped by multiple learner factors rather than by a single classification branch³². In line with broader cautions about applying machine learning models in education³³, the retained tree should be interpreted as an auxiliary classification output, not as a validated screening tool for identifying students with lower self-reported higher-order thinking.

The variable importance results also require cautious interpretation. Generative artificial intelligence anxiety was ranked as the most important variable in the retained tree, followed by trust in generative artificial intelligence. This does not mean that anxiety causes higher-order thinking. A decision tree identifies splitting variables that improve classification under a specific coding rule, data partition, and pruning setting. In this context, generative artificial intelligence anxiety may reflect concern about correctness, dependence, authorship, or acceptable use, but it may also function simply as a discriminator separating different respondent profiles. Since artificial intelligence anxiety and acceptance can coexist in technology use contexts³⁴, the distinction between facilitating concern and debilitating anxiety should be presented as a possible interpretation rather than a confirmed mechanism.

The same caution applies to problematic smartphone use, academic procrastination, negative emotions, academic performance, and parental upbringing. These variables were retained by the tree, but their positions in the classification structure do not establish causal pathways. Problematic smartphone use and procrastination may indicate fragmented attention and weaker self-regulation³⁵, both of which are relevant to complex academic work. Negative emotions may signal affective strain during learning, but the retained score is a general self-report indicator rather than a clinical measure³⁶. Academic performance and parental upbringing are more distal background variables. Their retention may reflect broader academic adjustment or self-regulatory context, but they should not be overinterpreted as technology-specific predictors. Cognitive Load Theory helps explain why attention, emotion, and task regulation may matter in generative artificial intelligence-supported learning³⁷, since poorly managed information and distraction can interfere with meaningful processing.

Several troubleshooting points should be considered when applying this workflow. If the Low-HOT class is small, researchers should report the no-information rate and class-specific recall and precision before making claims about model usefulness. If the tree repeatedly predicts the majority class, alternative model settings such as class weights, cost-sensitive learning, resampling, repeated train-test partitions, or k-fold cross-validation should be considered. If a variable appears multiple times in a tree path, researchers should confirm whether the model used binary or continuous inputs and verify that coding was performed only after composite score calculation. Because a single decision tree can be unstable under resampling³⁸, especially when the target class is imbalanced, simplified figures should be supplemented with complete terminal node tables when node sizes or classification confidence are uneven.
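Repeated stratified partitions, one of the alternatives mentioned above, can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions. It is not the IBM SPSS Modeler partition procedure and will not reproduce the exact subsets in Table 3. The function name, the seed values, and the rounding rule for the per-class test size are all hypothetical choices.

```python
import random


def stratified_split(labels, test_fraction=0.30, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Assumed rule: round each class's test size to the nearest integer.
        n_test = round(len(shuffled) * test_fraction)
        test_idx.extend(shuffled[:n_test])
        train_idx.extend(shuffled[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Target labels rebuilt from the Table 1 counts: 118 Low-HOT (0), 658 High-HOT (1).
labels = [0] * 118 + [1] * 658

for seed in range(5):  # five repeated partitions with different seeds
    train, test = stratified_split(labels, seed=seed)
    low_in_test = sum(1 for i in test if labels[i] == 0)
    print(f"seed {seed}: train = {len(train)}, test = {len(test)}, "
          f"Low-HOT in test = {low_in_test}")
```

Fitting and evaluating the tree on each partition, then summarizing the spread of Low-HOT recall across partitions, would show how sensitive the retained structure is to a single split.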

This protocol has several limitations. First, the demonstration data were collected from one university in China between March 7 and March 15, 2023. Student familiarity with generative artificial intelligence tools has likely changed since that early adoption period, especially as tool availability, institutional policies, and classroom practices have developed. The single-institution setting also limits cross-cultural and institutional generalizability. Students’ use of generative artificial intelligence, attitudes toward technology, academic self-regulation, and interpretation of survey items may differ across countries, institutional types, disciplinary cultures, language contexts, and local artificial intelligence policies. For this reason, although the workflow remains repeatable, the retained learner-profile pattern should not be assumed to represent current student populations or other universities, education systems, or cultural settings without local validation and, ideally, multisite replication. Second, the study used a cross-sectional design, so the retained tree reflects classification patterns at one time point rather than changes over time. Third, all focal measures were self-reported and may be affected by recall bias, response style, and social desirability. Future applications could strengthen validity by combining survey data with prompt histories, learning management system traces, writing revision records, or task-based performance measures when ethical approval and privacy safeguards allow such linkage³⁹.

Fourth, the workflow uses dichotomization for transparency and reproducibility, but this choice reduces information from the original continuous scores. Alternative implementations may compare midpoint coding with median splits, tertile coding, ordinal models, or continuous-input tree settings. Fifth, the primary C5.0 workflow did not apply oversampling, class weighting, or misclassification costs. This decision preserved the original class distribution for demonstration, but it also contributed to weak Low-HOT sensitivity. Researchers who intend to use a similar workflow for intervention targeting should treat imbalance-aware modeling as a required extension rather than an optional add-on. Sixth, adapted instruments require local validation beyond Cronbach’s alpha⁴⁰. Item-total checks, response range verification, translation documentation, and supplementary scoring records are needed to support reproducibility in a new sample.

Despite these limitations, the workflow has practical applications. It can be used as a classroom-oriented research procedure for documenting learner profiles in generative artificial intelligence-supported learning, provided that users avoid treating the retained tree as a universal predictive model. It can also support multisite studies by giving researchers a fixed structure for questionnaire administration, scoring, coding, partitioning, and model output reporting. In intervention studies, the same workflow could be used before and after instruction to examine whether learner profile distributions change, although causal claims would require an appropriate longitudinal or experimental design. As educational research increasingly combines survey, trace, and model-based evidence in artificial intelligence-supported learning environments⁴¹, the value of this protocol lies in making each analytic decision auditable, repeatable, and open to revision.

Disclosures

The authors declare that they have no competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research was supported by the 2024 Autonomous Region Educational Reform Project "Research on the Sharing Mechanism of High-Quality Ideological and Political Education Resources in Curricula Between Eastern and Western Universities" (Project No.: XJGXJGZH2024011).

Materials

List of materials used in this article
Name	Company	Catalog Number	Comments
IBM SPSS Modeler	IBM Corp.	Version 18.4	Used for 70:30 stratified data partitioning, C5.0 decision-tree construction, pruning, classification evaluation, and tree-output export.
IBM SPSS Statistics	IBM Corp.	Version 26.0	Used for data checking, composite-score calculation, reliability analysis, frequency checking, and binary recoding.
Internet-connected respondent device	Respondent-owned device	Smartphone, tablet, or laptop with browser access	Used by respondents to complete the online questionnaire. No study-specific device was provided.
Microsoft Excel	Microsoft Corp.	Microsoft 365 or Excel 2019 or later	Used for screening-log review, coding verification, table formatting, and exported-output checking.
QR-code distribution function	Wenjuanxing / Questionnaire Star	Built-in questionnaire-link QR-code function	Used to provide standardized questionnaire access to invited students during the survey window.
Quiet survey completion setting	Participating university	Non-instructional classroom break or student-advising setting	Used to reduce response-context variation during one-sitting questionnaire completion.
Secure data-storage directory	Authors / participating institution	Access-controlled institutional storage	Used to preserve raw export, screened dataset, scored dataset, coded dataset, model files, logs, and exported outputs with version control.
Wenjuanxing / Questionnaire Star online survey platform	Changsha Ranxing Information Technology Co., Ltd.	Web-based platform; access through official Wenjuanxing website	Used for questionnaire construction, QR-code distribution, mandatory-response settings, duplicate-control checking, timestamp export, and spreadsheet export.
Windows operating system	Microsoft Corp.	Windows 10, 64-bit	Operating environment for IBM SPSS Statistics 26.0 and IBM SPSS Modeler 18.4.
