Research Article
This study introduces an AI-based restaurant catering system that supports contactless communication, customized meal suggestions, and satisfaction prediction. By combining NLP with LDA, Conv-RNN, and Conv-LSTM, it outperforms rule-based techniques, achieving higher accuracy, precision, and recall and lower error rates, which demonstrates AI's transformative potential in the food service industry.
The food industry has undergone a significant transformation in recent decades due to globalization, technological advancements, and evolving customer expectations. Artificial Intelligence (AI) and the Internet of Things (IoT) now play a critical role in enhancing food production, marketing, and service delivery. This study proposes an AI-driven intelligent system that improves restaurant catering services through contactless service using Natural Language Processing (NLP) and Latent Dirichlet Allocation (LDA), personalized food recommendations using a Convolutional Recurrent Neural Network (Conv-RNN) model, and customer satisfaction prediction using an optimized Convolutional Long Short-Term Memory (Conv-LSTM) model. Real-world experiments demonstrate that the proposed system outperforms traditional rule-based methods, achieving 91.5% accuracy, 91% precision, 91.1% recall, and an F1 score of 89.7% with Word2Vec-LDA; 98.5% accuracy with a loss of 0.02 for the Conv-RNN model; and an RMSE of 0.1011 with an R² of 0.9812 for the Conv-LSTM model. These results highlight the transformative potential of AI in automating and enhancing customer service in the restaurant industry.
AI adoption has been a crucial part of the growth of digital technology over the last decade. Since its inception, it has presented several industries, including the hospitality sector, with both opportunities and challenges1, and numerous AI-powered inventions have been developed that have the potential to improve people's quality of life and thereby strengthen the economy. In the highly competitive restaurant industry, maintaining high-quality food and customer service is essential to success. As technology advances and dining experiences shift, AI is becoming a game-changing tool for increasing operational effectiveness and customer satisfaction. AI-powered monitoring systems are transforming restaurant operations2, helping restaurants manage their kitchens, monitor food quality, and deliver excellent customer service. By applying advanced algorithms and real-time data analytics, these technologies streamline operations and help ensure consistency, safety, and quality across every aspect of the dining experience, allowing restaurants to carry out standard operating procedures with greater precision2.
Financial success, adaptability to changing circumstances, and the ability to expand and adapt offerings to customer needs and expectations are all key factors in the tourism and hospitality industry, and they frequently determine whether a business survives3. The tourism and hospitality sector is therefore using advanced technologies such as AI and robotics (AIR) as intelligent customer-care tools to enhance client service and experience4. Furthermore, the rapid advancement of AI in hospitality management may enhance corporate performance. The hotel sector, for example, is a data-intensive industry that collects vast amounts of data in various formats.
The hospitality industry faces ongoing issues, including inefficient operational management, growing consumer expectations for individualized services, worsening labor shortages, and the need for precise demand forecasts. Conventional approaches often fail to address these problems adequately, which raises operating expenses and leads to inconsistent service quality. Artificial intelligence (AI) offers solutions by automating tedious processes, facilitating data-driven decision-making, improving demand prediction, optimizing pricing and inventory management, and enhancing consumer personalization. By filling these gaps, AI is increasingly positioned as a transformative tool for improving the hospitality sector's operational effectiveness and guest experience.
AI is transforming customer care in the restaurant industry by improving operational effectiveness and recognizing customer events2. AI capabilities such as machine learning and predictive analytics are crucial for streamlining procedures such as demand forecasting and inventory management, resulting in better service consistency and lower operating costs5,6. Additionally, by analyzing consumer data to offer tailored menu suggestions and promotions, AI facilitates personalized interactions and promotes customer loyalty and satisfaction7. Intelligent systems play a key role in the travel and tourism sector, increasing competitiveness through more sophisticated services8. Hotels are experimenting with cutting-edge technology, including digital strategies9 and robots enhanced with AI and the Internet of Things (IoT), and robots powered by AI and related technologies are becoming increasingly common10. Voice assistants (VAs), which are AI devices activated by voice commands, provide a degree of human-like intelligence through digital interfaces11. Despite their advancements, VAs face difficulties such as low user awareness, user annoyance, and occasional resistance from both hotel employees and guests12.
Delivery and takeout operations are growing in the restaurant industry as social dynamics change and more Americans choose convenient, time-saving options13. Nearly 60% of US consumers order delivery or takeout at least once a week, and 78% actively use online food-ordering platforms14. This trend is further reinforced by data showing that the online food delivery market reached a staggering $220 billion by the end of 2023, accounting for 40% of restaurant sales, and that it will reach $365 billion by the end of 203015. It is anticipated to increase to over USD 534.60 billion by 2028. With a predicted compound annual growth rate of approximately 9.5% through 2034, the global market was valued at approximately USD 288.87 billion in 2024 and is anticipated to reach USD 316.31 billion by the end of 202515. Furthermore, the number of direct online orders increased by 54% between 2019 and 2021, indicating that customers are increasingly choosing digital ordering over traditional dine-in options. Prior work has highlighted the growing significance of text mining and data-driven decision-making in service management, identifying new management domains and important topics such as market intelligence and social media analysis16. These findings provide a compelling argument for restaurants to use chatbots to improve operational effectiveness and customer engagement. Furthermore, the research by Proenca and Soukiazis17 highlights the potential of data-driven approaches in marketing and customer relationship management initiatives, offering insight into how chatbots can be used to maximize customer interactions and spur company growth.
To increase efficiency and provide individualized, customer-focused service, the restaurant industry is implementing Robotics, Artificial Intelligence, and Service Automation (RAISA)18. As with any other technological change, the use of RAISA in restaurants has a number of implications that should be assessed19. By lowering errors and improving overall service quality, RAISA can provide dependable, standardized service20, and it can offer customers a distinctive interactive experience. According to Kreishan18, service robots are now more sophisticated, independent, and adaptable. However, they can also affect the general ambiance of a restaurant and the social interactions among patrons, employees, and robots. Because managers, employees, and consumers perceive these elements differently and their readiness to interact with RAISA is changing, it is critical to evaluate the factors that affect technology acceptance21.
The strong linguistic capabilities of Omni models are reflected in their improved sentiment classification accuracy in hotel review analysis, which exceeded 67% compared with 60.6% for BERT22. At the same time, recommender systems increasingly use Large Language Models (LLMs) to improve customization by managing user profiles and analyzing user reviews. The PURE framework uses LLMs to dynamically update user profiles based on reviews, while hybrid systems integrate LLM outputs with graph neural network embeddings to improve recommendation accuracy when data are sparse23. Meanwhile, in session-based settings, transformer-style recommendation architectures such as Transformers4Rec outperform conventional RNN models24.
Despite rapid developments in AI, automation, and smart technology, a number of aspects of intelligent catering remain unexplored or underexplored. These challenges include managing energy efficiency and food waste through AI-driven systems, scaling personalization to a variety of customer preferences, integrating AI with traditional kitchen workflows, protecting customer privacy and security in customer profiling, and implementing intelligent catering solutions for small and medium-sized restaurants with limited funding. These gaps must be filled for intelligent catering solutions to be widely and sustainably adopted, which creates opportunities for further research and innovation. Potential research gaps in intelligent catering systems include the following:
Although AI systems can make meal recommendations based on user choices, they frequently struggle to fully comprehend the subtleties of dietary requirements and personal preferences25. Some intelligent catering systems employ AI to manage inventory or estimate demand; however, most solutions remain fairly basic and prone to errors such as prediction mismatches. Many intelligent catering systems support only a single interaction modality, such as touch screens or voice recognition26. Even though AI agents can provide a certain level of customization based on consumer history or preferences, many systems remain limited in their ability to fully comprehend or forecast individual preferences. AI bots frequently rely on basic inputs such as demographic data or previous orders.
Recent advancements in machine learning have increasingly focused on modeling human behavior and contextual data across diverse domains. In their evaluation of data sources and methodologies for urban building occupancy profiles, Nejadshamsi et al.27 emphasized the value of heterogeneous data in capturing dynamic behavioral patterns. Building on this, Nejadshamsi et al.28 showed how well deep learning predicts spatial-temporal flows of human activity by putting forth a geographic-semantic context-aware commuting flow prediction model utilizing graph neural networks. Similarly, Nejadshamsi et al.29 highlighted the importance of contextual cues in improving predictive performance by creating a transportation-informed framework for urban-scale occupancy and energy estimation.
Improving customer experience and increasing operational efficiency have become crucial success factors in the rapidly changing restaurant industry. Despite the increasing use of technology, many restaurants still struggle to deliver smooth, individualized, and effective services because they lack integrated intelligent systems. Current systems frequently overlook key issues, including enhancing contactless service, providing individualized meal recommendations, and accurately forecasting customer satisfaction in real time25. This study aims to close this gap by creating an intelligent catering system that uses cutting-edge technologies to enhance restaurant operations. The system integrates three essential elements:
Contactless Service using Latent Dirichlet Allocation (LDA) and Natural Language Processing (NLP): Leveraging LDA and NLP enables effective contactless customer interactions, reducing waiting times and human error.
Conv-RNN-based Recommendation System for Food Suggestions: A Convolutional Recurrent Neural Network (Conv-RNN) produces dynamic food recommendations tailored to consumer preferences, enhancing menu satisfaction.
Predicting Customer Satisfaction with an Optimized Conv-LSTM Model: An improved Convolutional Long Short-Term Memory (Conv-LSTM) model forecasts customer satisfaction from real-time data and feedback, enabling restaurants to make data-driven improvements.
By offering contactless engagement and customized dining experiences in a more scalable and effective way, the proposed intelligent system seeks to improve customer satisfaction, streamline operations, and improve service delivery.
To place this work within the current landscape of AI-driven food recommender systems, we contrast it with several previous methods. For instance, Zou et al.30 developed a transformer-based sequential conversational recommendation framework that uses self-attention mechanisms to capture dialogue dynamics. To support dialogue- and vision-based recommendation tasks, Gambetti and Han31 developed AiGen-FoodReview, a multimodal dataset of paired restaurant review texts and photos. Ju et al.32 previously created MenuAI, which uses transformer models to recommend menu items directly from menu images containing text. Despite their strong capabilities in processing context or multimodal information, these approaches sometimes compromise interpretability and computational efficiency. In contrast, our integrated NLP-LDA + Conv-RNN + Conv-LSTM system balances explainability, lightweight deployment, and high prediction accuracy, which makes it particularly appropriate for resource-constrained catering settings.
The main objective of this study is to examine how AI-powered technology can improve hospitality standards and expedite restaurant operations through Intelligent Catering Services (ICS). The proposed method combines Latent Dirichlet Allocation (LDA) and NLP to handle customer inquiries efficiently and contactlessly. A Conv-RNN produces meal recommendations to provide individualized experiences, and an optimized Conv-LSTM model forecasts customer satisfaction levels. By integrating these elements, the resulting ICS demonstrates practical value in raising overall customer satisfaction, ensuring quality, and increasing service efficiency. Experimental results based on performance measures of food recommendation accuracy and satisfaction prediction verify the model's efficacy.
The proposed intelligent catering system provides a contactless service for food suggestions and customer satisfaction prediction. A contactless intelligent catering system focuses mainly on automation and convenience, enabling customers to interact with the system in a touchless manner while performing specific tasks such as food suggestion and satisfaction prediction. An AI agent, by contrast, goes beyond automation by using data-driven insights to make more accurate, adaptive, and personalized decisions, enhancing customer experience and improving operational efficiency across a wider range of functions. The goal of this study is to develop an AI-driven Intelligent Catering System (ICS) that integrates NLP-LDA for contactless interactions, Conv-RNN for personalized food recommendations, and Conv-LSTM for predicting customer satisfaction. The system benefits restaurants by enhancing operational efficiency, reducing costs, delivering consistent service, and improving customer engagement through personalization and real-time feedback.
This study was conducted in accordance with the guidelines of the Research Ethics Committee of The National University of Malaysia (UKM) and approved under approval number UKM FST/2025-AI/023. Written informed consent was obtained from all participants prior to the collection of chatbot queries. All data were anonymized to ensure participant confidentiality and privacy.
Study overview
Figure 1 gives an overview of the proposed AI-assisted intelligent catering system. As illustrated, customer input is preprocessed with NLP techniques such as tokenization, lemmatization, and word embeddings to extract tags. An ML model, LDA, is then applied to model the customer tags and provide contactless service. Food suggestion is carried out using a Conv-RNN model, which suggests food to the customer based on the flow sequence recorded from the customer's previous choices. Finally, customer satisfaction is predicted using an optimized Conv-LSTM model to inform further improvements in the restaurant's services. The performance of the proposed AI models is evaluated using various evaluation metrics.

Figure 1: Proposed system model for intelligent catering services (ICS). The architecture integrates user interaction, data preprocessing, intent detection, food recommendation, and feedback mechanisms.
Dataset used
To build the intelligent catering system, we gathered 283 queries from a nearby restaurant using a chatbot. These queries, which covered a broad variety of customer inquiries from menu details to operating hours, were manually divided into 15 intent groups to cover the range of user interactions with the system. The intent classes were designed to capture specific aspects of customer inquiries, including salutations, goodbyes, appreciation, catering, hours, seating, contact information, payment questions, today's menu, delivery options, menu questions, ordering processes, special deals, bookings, beverage options, and accessibility of outdoor seating. Table 1 shows the number of queries in each intent group and the classification according to the queries' thematic content. For example, the Contact Information category had the most queries, suggesting that patrons are very interested in how to get in touch with the restaurant. In contrast, the Seating and Beverage categories received the fewest inquiries, which may indicate less consumer interest in these subjects or that information about them is already clearer.
Contactless service using NLP with LDA
The user input that starts the framework's activity is first preprocessed to standardize the text and eliminate noise. This preparation includes tokenization, stop-word removal, and lemmatization, which readies the data for further analysis. After preprocessing, user queries are converted into numerical representations that capture linguistic features such as semantic links and contextual relevance. Several machine learning classifiers were trained on these vector representations to classify queries into 16 predefined classes (intents/tags). After predicting the intent of a user's query, the model fetches the associated predefined response and returns it to the user.
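The classify-then-respond loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it replaces the trained classifiers with a bag-of-words cosine similarity against stored patterns, and the intent names, patterns, responses, and the 0.3 similarity threshold are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical intents: a few example patterns and one canned response each.
INTENTS = {
    "greeting": {
        "patterns": ["hi", "hello", "hey", "good morning"],
        "response": "How can I help you?",
    },
    "hours": {
        "patterns": ["what are your opening hours", "when do you open", "when do you close"],
        "response": "Please see our opening hours on the website.",
    },
    "delivery": {
        "patterns": ["do you deliver", "delivery options", "can i get delivery"],
        "response": "Delivery is available through our app.",
    },
}
FALLBACK = "Sorry, I did not understand. Please contact customer service."


def vectorize(text):
    """Bag-of-words count vector over lowercased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(query, threshold=0.3):
    """Return (intent, score) for the most similar stored pattern, or (None, score) if too weak."""
    q = vectorize(query)
    best_intent, best_score = None, 0.0
    for intent, data in INTENTS.items():
        for pattern in data["patterns"]:
            score = cosine(q, vectorize(pattern))
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return None, best_score
    return best_intent, best_score


def respond(query):
    """Map the predicted intent to its predefined response, or fall back."""
    intent, _ = classify(query)
    return INTENTS[intent]["response"] if intent else FALLBACK
```

For example, `respond("Hello")` returns "How can I help you?", while a query sharing no words with any pattern returns the fallback message. In the actual system, the similarity lookup would be replaced by the trained classifier's prediction.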
Preprocessing
Preprocessing improves the reliability of the analysis by ensuring that incoming queries are formatted consistently and are relevant to the subsequent classification step. The intent patterns (e.g., greetings such as hi, hello, hey) were collected and preprocessed by converting text to lowercase, removing stop words, and applying tokenization using the NLTK toolkit33. Table 2 shows the preprocessing steps followed in this study.
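The lowercasing, tokenization, and stop-word removal steps can be approximated with the standard library alone. This sketch uses a regular-expression tokenizer and a small, hypothetical stop-word subset in place of NLTK; with NLTK installed, `nltk.word_tokenize` and `nltk.corpus.stopwords.words("english")` provide the tokenizer and the full English stop-word list (both require downloading their NLTK data resources first).

```python
import re

# Hypothetical stop-word subset; NLTK's English list used in the study is much larger.
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "does", "you", "your",
              "what", "to", "of", "i", "me", "can"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]
```

For example, `preprocess("What are your opening hours?")` returns `["opening", "hours"]`, and `preprocess("Hi!")` returns `["hi"]`.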
LDA-based tag modeling
The system classifies the user's tags using Support Vector Regression (SVR) so that, for example, the user's greetings are recognized. To improve the system's capacity to anticipate user intent, we examined both conventional and state-of-the-art text representation methods, together with Machine Learning (ML) and Deep Learning (DL) models, with the aim of building a highly accurate algorithm that can understand a broad range of user questions. This study employed the fundamental Bag of Words (BoW) and TF-IDF representations because they are simple to use and effective at capturing word frequency and the importance of words in a text. GloVe and Word2Vec were used because they learn word embeddings from how words are used together, capturing word meaning and context22.
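The difference between the BoW and TF-IDF representations can be made concrete with a small from-scratch sketch. It assumes already-tokenized queries and uses the plain weighting tf = count / document length and idf = ln(N / df); library implementations differ in detail (for example, scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2-normalizes rows by default). The three example queries are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw count vector per tokenized document."""
    vocab = sorted({token for doc in docs for token in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[token] for token in vocab])
    return vocab, rows


def tf_idf(docs):
    """TF-IDF with tf = count / document length and idf = ln(N / df)."""
    vocab, rows = bag_of_words(docs)
    n_docs = len(docs)
    df = [sum(1 for doc in docs if token in doc) for token in vocab]
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for doc, row in zip(docs, rows):
        length = len(doc) or 1
        weights.append([count / length * w for count, w in zip(row, idf)])
    return vocab, weights


docs = [["opening", "hours"], ["delivery", "hours"], ["delivery", "options"]]
vocab, weights = tf_idf(docs)
```

In the first query, "opening" receives a higher weight than "hours" because it appears in only one query, whereas BoW counts both words equally.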
Once the data are prepared, the intent queries are classified using the Latent Dirichlet Allocation (LDA) method. The goal is to use text mining with LDA to analyze the connections among terms and identify patterns in their structure34,35. LDA is an unsupervised, probabilistic model that assumes every document in a corpus is composed of a predetermined number of latent topics. Every document in LDA carries equal weight and is represented as a bag of words, so the words in each document are assumed to be unordered. Each topic is described by a probability mass function over words, and each document uses a probability mass function over topics to choose its themes. For intent classification, LDA was applied using the Gensim Python library34.
In LDA, each intent pattern is viewed as a mixture over latent topics whose distribution is governed by a Dirichlet prior with parameter α. A pattern is generated from its intent distribution θ (multinomial), which gives the probability that a given intent I belongs to class C, while β is the Dirichlet parameter of the word distributions that encode the pattern as a bag of words. Given α and β, the joint distribution of an intent mixture θ, a set of N intent assignments z, and the set of N words W is given as36:

$$p(\theta, \mathbf{z}, W \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \tag{1}$$

Integrating over θ and summing over z, the marginal probability of a pattern is computed from the marginal probabilities of its individual words:

$$p(W \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta \tag{2}$$

For a collection of M patterns, the corpus probability is the product of Equation (2) over all patterns.
The patterns for the greetings intent include variations such as hi, hello, hey, Good morning/noon/eve, and hola. On receiving these patterns, the system identifies the user intent as a greeting and responds accordingly with a predefined phrase, such as "What can I help you with?" or "How can I help you?". If the system cannot match the query to any of the 15 predefined classes, it provides restaurant information and suggests that the user contact customer service. Figure 2 shows the outcome of LDA-based tag modeling for user queries related to greetings; responses are generated in the same way for all 15 classes of user intents (queries). The LDA model uses the following parameters: number of topics k = 40, α = 0.05, β = 0.04, and 100 iterations.
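To show what the parameters k, α, β, and the iteration count control, the following is a compact LDA sketch. Note that it uses collapsed Gibbs sampling, a different inference method from the online variational Bayes used by Gensim's `LdaModel` (where these settings correspond to `num_topics`, `alpha`, `eta`, and `iterations`). The function name and the toy corpus of integer word ids are hypothetical.

```python
import random


def lda_gibbs(docs, k, alpha, beta, iterations, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric Dirichlet priors alpha and beta.

    docs: list of documents, each a list of integer word ids in [0, V).
    Returns (theta, phi): per-document topic proportions and per-topic word distributions.
    """
    rng = random.Random(seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    ndk = [[0] * k for _ in docs]               # tokens of topic j in document d
    nkw = [[0] * vocab_size for _ in range(k)]  # occurrences of word w assigned to topic j
    nk = [0] * k                                # tokens assigned to topic j overall
    z = []                                      # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove this token from the counts before resampling it.
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                # Full conditional p(z_i = j | all other assignments), unnormalized.
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + vocab_size * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1

    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[j][w] + beta) / (nk[j] + vocab_size * beta) for w in range(vocab_size)]
           for j in range(k)]
    return theta, phi
```

With the settings reported above, the call would be `lda_gibbs(docs, k=40, alpha=0.05, beta=0.04, iterations=100)`. Small α and β values such as these push each pattern toward a few topics and each topic toward a few words.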
Food suggestion using Bi-NN for ICS
In the Intelligent Catering System (ICS), food items are systematically categorized to support accurate recommendations. Each menu entry may represent either a single item (e.g., burger, snack, drink) or a combination of items such as a meal set (e.g., fried chicken with a cold drink). These items are grouped into six main categories: chicken, burger, snack, drink, suit, and tiffin. Here, suit refers to packaged meal sets, while tiffin represents traditional multi-item meals. Each food item is defined by three key features: its price, its category, and a content vector describing its composition. This structured categorization enables the recommendation model to analyze customer purchase histories at both the item and category levels, improving the system's ability to suggest relevant meals and combos that align with user preferences. All the foods in the ICS form a set denoted as

$$F = \{f_1, f_2, \ldots, f_N\}$$

where N denotes the total number of food items in the ICS. Each food product $f_i$ in F consists of the features of the food and is denoted as

$$f_i = (p_i, c_i, m_i, \mathbf{v}_i) \tag{3}$$

where
$p_i$ denotes the price of the food;
$c_i$ denotes the category of the food item, where $c_i \in C$ and C denotes the set of categories;
$m_i$ denotes the number of menus, where $m_i \in M$ and M denotes the menus;
$\mathbf{v}_i$ denotes the content vector of the food, where $\mathbf{v}_i = (v_{i1}, \ldots, v_{i6})$ and each element $v_{ij}$ corresponds to chicken, burger, snack, drink, suit, and tiffin, respectively.
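A food entry with a price, a category, and a six-element content vector can be represented as a small record type. This is a hypothetical sketch: the class and field names, the example price, and the choice to fill the content vector with per-category counts of the entry's components are assumptions, since the text does not specify how the vector elements are computed.

```python
from dataclasses import dataclass, field
from typing import List

CATEGORIES = ["chicken", "burger", "snack", "drink", "suit", "tiffin"]


@dataclass
class FoodItem:
    name: str
    price: float
    category: str                                        # one of CATEGORIES
    components: List[str] = field(default_factory=list)  # categories contained in the entry

    def category_index(self):
        """Integer index of the category, usable as an embedding-table row."""
        return CATEGORIES.index(self.category)

    def content_vector(self):
        """Count of each of the six categories in this entry (assumed encoding)."""
        return [self.components.count(c) for c in CATEGORIES]


# Hypothetical meal set ("suit") containing fried chicken and a cold drink.
combo = FoodItem("fried chicken with cold drink", 8.5, "suit", ["chicken", "drink"])
```

Here `combo.content_vector()` returns `[1, 0, 0, 1, 0, 0]` and `combo.category_index()` returns `4`.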
The user data describe each user's features, which can be obtained from the app or from the usage flow. For every user $u_j \in U$, the features are denoted as

$$u_j = (d_j, \mathbf{e}_j, s_j, \mathbf{l}_j) \tag{4}$$

where
$d_j$ denotes details about the user, such as age and gender;
$\mathbf{e}_j$ denotes the user's click events, a variable-length vector $\mathbf{e}_j = (e_{j1}, \ldots, e_{jt})$, where t indexes the most recent click event and each $e_{jt}$ is a positive number;
$s_j$ denotes the entire flow sequence, that is, all purchases carried out by the user;
$\mathbf{l}_j$ denotes the list of food purchased by the customer, $\mathbf{l}_j = (l_{j1}, \ldots, l_{jk})$, where k is the total number of food items purchased by the user.
Using F and U, which denote the food items and the user data, respectively, the recommendation problem is framed within the ICS. The objective of this system is to improve recommendation accuracy and thereby increase the user's purchase intention. The ICS problem is formulated as
(5)
Equation (5) states the objective of obtaining improved accuracy on the ICS problem by minimizing the loss function given in Equation (6).
(6)
where $\hat{Y}$ is the predicted result and Y is the actual result. The loss function measures the discrepancy between the model's predicted value $\hat{Y}$ and the true value Y, so the smaller its value, the better the model performs. Here,
(7)
where
(8)
(9)
In this formulation, r denotes the training data in X, and s is the number of purchased food items in one training sample. The developed ICS uses cross-entropy as its loss function, as described in the following section.
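Because the ICS is trained with cross-entropy, a small worked sketch of that loss may help. It computes softmax probabilities from raw scores, the per-sample categorical cross-entropy $-\sum_i y_i \log p_i$, and its batch average; the function names and the toy scores are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one sample: -sum_i y_i * log(p_i); eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def mean_loss(batch_true, batch_pred):
    """Average the per-sample loss over a batch of training samples."""
    losses = [categorical_cross_entropy(t, p) for t, p in zip(batch_true, batch_pred)]
    return sum(losses) / len(losses)


probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for three foods
```

With a one-hot target, the loss reduces to the negative log-probability of the true food, so assigning that food a higher probability lowers the loss; a uniform prediction over two foods gives a loss of ln 2.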
Conv-RNN-based food recommendation
By analyzing product attributes, user ratings, and user profiles, a Conv-RNN-based recommendation system recommends items that match users' interests. Figure 3 shows the proposed Conv-RNN design. Conv-RNN models often consider, or automatically incorporate, specific information about the user's temporal context when making suggestions. The effectiveness of a recommender system, however, often depends on how well it understands and uses the context provided by recommendation requests. The Conv-RNN calculates predicted ratings from the dynamic features and attributes of the item and the user's current temporal context to provide suitable recommendations for a specific user. Users who are in similar situations at the same time are likely to have similar preferences, so the effectiveness of a CNN-based time-aware recommender depends on its ability to find the users who are most similar to the target user and share the same temporal context. The CNN therefore captures the temporal context, that is, time-sensitive information about the user's activity. The user attributes, item characteristics, and time information were fed to the CNN's input layer to reconstruct the original matrix. After the convolution layer extracts features from this matrix, the final output is computed as described below.
The convolution layers extract features from the food click events; the size of each convolution output is given by Equation (10):

$$O = \frac{X - F + 2a}{S} + 1 \tag{10}$$

where O is the output size, X is the input data size, F is the convolution kernel size, a is the amount of padding added to the input data, and S ≥ 1 is the kernel stride. Activation functions are used throughout the Conv-RNN. Because the activation layer applies a nonlinear function, the network can model far more complex relationships than it could with linear operations alone: layer-to-layer communication in a neural network is strictly sequential, so a stack of purely linear layers would collapse into a single linear mapping. Conventional activation functions such as tanh and sigmoid saturate, so their gradients vanish outside a narrow range of inputs. The rectified linear unit (ReLU) overcomes both limitations while remaining a computationally cheap nonlinear operation.
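The output-size formula and the ReLU nonlinearity can be checked with a one-dimensional sketch. It is illustrative only: the function names are hypothetical, `conv1d` computes the cross-correlation that deep-learning "convolution" layers actually apply, and the division in the output-size formula is floored, as frameworks do when the stride does not divide evenly.

```python
def conv_output_size(x, f, a, s):
    """O = (X - F + 2a) / S + 1, floored, for input size X, kernel F, padding a, stride S."""
    if s < 1:
        raise ValueError("stride must be >= 1")
    return (x - f + 2 * a) // s + 1


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def conv1d(signal, kernel, padding=0, stride=1):
    """1-D cross-correlation with zero padding and a given stride."""
    x = [0.0] * padding + list(signal) + [0.0] * padding
    f = len(kernel)
    out_len = conv_output_size(len(signal), f, padding, stride)
    return [sum(x[i * stride + j] * kernel[j] for j in range(f)) for i in range(out_len)]
```

For example, a 32-element input with a size-3 kernel, padding 1, and stride 1 keeps its size (`conv_output_size(32, 3, 1, 1) == 32`), while stride 2 on a 7-element input without padding yields 3 outputs.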
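For computational efficiency, the pooling layer down-samples the feature maps, producing a sparser representation. Max pooling and average pooling are two well-known pooling methods; max pooling was used here because it tends to retain the most salient features. Max pooling selects the following features:

$$y_{i,j} = \max_{(p,q) \in R_{i,j}} x_{p,q} \tag{11}$$

where $R_{i,j}$ is the pooling window associated with output position (i, j) and $x_{p,q}$ are the input feature values within that window.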
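Max pooling over non-overlapping windows can be sketched in plain Python for both one- and two-dimensional feature maps. The function names and the 2 × 2 window with stride 2 are illustrative choices, not settings reported in this study.

```python
def max_pool_1d(values, size=2, stride=2):
    """Maximum over each window of `size` elements, moving by `stride` (no padding)."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, stride)]


def max_pool_2d(matrix, size=2, stride=2):
    """Maximum over each size x size window of a 2-D feature map (no padding)."""
    rows, cols = len(matrix), len(matrix[0])
    return [[max(matrix[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, cols - size + 1, stride)]
            for r in range(0, rows - size + 1, stride)]
```

For example, `max_pool_1d([1, 3, 2, 5, 4, 0])` returns `[3, 5, 4]`, halving the feature length while keeping the strongest response in each window.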
The Conv-RNN then applies two fully connected (dense) layers to retrain the tail of the Conv-RNN with less loss of feature information. Recurrent layers are commonly used in neural networks to analyze sequential data. Unlike traditional feedforward layers, which handle each input independently, recurrent layers have connections that allow them to maintain an internal memory of previous inputs. This makes recurrent layers particularly well suited to tasks involving sequences, time-series data, or any information where the order of inputs matters. The fundamental unit of a recurrent layer is the recurrent neuron, often called an RNN cell. An RNN cell processes inputs one at a time while maintaining an internal state that summarizes previous inputs; this state is updated at each time step and influences how subsequent inputs are processed. The output layer applies a softmax classifier to produce the results shown to the user. Categorical fields must first be converted to integers, which serve as indices into the embedding matrix.
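The internal memory of a recurrent layer and the softmax output layer can be illustrated with a vanilla (Elman) RNN. This is a simplified stand-in for the study's recurrent layers, with hypothetical weights: two input sequences that end with the same input produce different final hidden states because their histories differ, and a hypothetical output layer turns the final state into probabilities over three foods.

```python
import math


def matvec(matrix, vector):
    """Matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, w_x, w_h, b, h0):
    """Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b). Returns the hidden state at every step."""
    h = h0
    states = []
    for x in inputs:
        pre = [p + q + c for p, q, c in zip(matvec(w_x, x), matvec(w_h, h), b)]
        h = [math.tanh(value) for value in pre]
        states.append(h)
    return states


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2-unit example: identical last inputs, different histories.
w_x = [[1.0, 0.0], [0.0, 1.0]]
w_h = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]
h_a = rnn_forward([[1.0, 0.0], [0.0, 0.0]], w_x, w_h, b, [0.0, 0.0])[-1]
h_b = rnn_forward([[0.0, 1.0], [0.0, 0.0]], w_x, w_h, b, [0.0, 0.0])[-1]
w_out = [[2.0, -1.0], [-1.0, 2.0], [0.0, 0.0]]  # hypothetical output layer over 3 foods
probs_a = softmax(matvec(w_out, h_a))
```

Setting `w_h` to zeros would make both final states identical, which is exactly the memory that feedforward layers lack.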
The Conv-RNN recommendation model was trained with the categorical cross-entropy loss, which quantifies the difference between the true class labels y and the predicted probability distribution y'. The model parameters θ (weights and biases) were optimized with Adam (adaptive moment estimation), a variant of stochastic gradient descent (SGD). At each training iteration t, the parameters were updated as

$$\theta_{t+1} = \theta_t - \eta\, \nabla_{\theta} \mathcal{L}(\theta_t) \tag{12}$$

where η is the learning rate and $\nabla_{\theta} \mathcal{L}(\theta_t)$ is the gradient of the loss function with respect to θ. In Adam, this gradient step is additionally rescaled by running estimates of the gradient's first and second moments.
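The relationship between the plain gradient step of Equation (12) and Adam can be seen in a scalar sketch. It follows the standard Adam update (bias-corrected first- and second-moment estimates) with the usual defaults β1 = 0.9, β2 = 0.999, ε = 1e-8; the toy quadratic objective and the learning rate used in the example call are hypothetical.

```python
import math


def sgd_step(theta, grad, lr):
    """Plain gradient step: theta_{t+1} = theta_t - lr * grad."""
    return theta - lr * grad


def adam_minimize(grad_fn, theta, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a scalar-parameter objective with Adam; grad_fn(theta) returns dL/dtheta."""
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy objective L(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
theta_star = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0, lr=0.01, steps=3000)
```

Because Adam divides by the root of the second-moment estimate, its early steps have a magnitude close to the learning rate regardless of the raw gradient scale, which is one reason it is less sensitive to gradient magnitude than the plain step.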
The following tactics were used to avoid overfitting:
Dropout regularization: during training, dropout (ρ = 0.3) was applied to the fully connected layers to randomly deactivate neurons.
Early stopping: training was stopped when the validation loss showed no improvement for 15 consecutive epochs.
Five-fold cross-validation: used to confirm generalization performance.
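The early-stopping rule and the five-fold split above can be sketched in pure Python. The patience of 15 epochs comes from the text; contiguous folds on pre-shuffled data and the example validation curve are assumptions. In practice, `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=15)` and scikit-learn's `KFold(n_splits=5)` provide the same logic.

```python
from typing import Iterator, List, Sequence, Tuple


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 15) -> int:
    """Return the 0-based epoch at which training stops.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; otherwise the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, epochs_without_improvement = loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) for k contiguous folds; assumes the data are already shuffled."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if not start <= i < start + size]
        yield train_idx, val_idx
        start += size


# Hypothetical validation curve: improves for 10 epochs, then plateaus.
losses = [1.0 - 0.05 * i for i in range(10)] + [0.6] * 30
print(early_stopping_epoch(losses))             # 24: best at epoch 9, then 15 epochs without improvement
print([len(v) for _, v in k_fold_indices(12)])  # [3, 3, 2, 2, 2]
```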
Hyperparameters were tuned by grid search on the validation set. The search covered the number of convolutional filters, the kernel size, the number of recurrent units, the batch size, the dropout rate, and the learning rate; the selected configuration used 64 filters, a 3 × 3 kernel, 100 recurrent units, a dropout rate of 0.2, and a learning rate of 0.002, with the Adam optimizer. Finally, the output layer returns the recommended foods as a binary (one-hot) vector in which suggested foods are marked 1 and all other entries 0.
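A grid search simply evaluates every combination in the search space and keeps the best-scoring one. In the sketch below only the selected values (64, 3, 100, 0.2, 0.002) come from the text; the other candidate values and the scoring function are hypothetical stand-ins for "train the Conv-RNN and return its validation accuracy". The batch size is omitted because its value is not given.

```python
import itertools
from typing import Callable, Dict, Iterable, Optional, Tuple

Config = Dict[str, float]


def grid_search(space: Dict[str, Iterable[float]],
                score_fn: Callable[[Config], float]) -> Tuple[Optional[Config], float]:
    """Exhaustively evaluate every combination in `space`; return the best one."""
    names = list(space)
    best_config: Optional[Config] = None
    best_score = float("-inf")
    for values in itertools.product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        score = score_fn(config)  # e.g., validation accuracy
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space; only the selected values come from the text.
search_space = {
    "filters": [32, 64, 128],
    "kernel_size": [3, 5],
    "recurrent_units": [50, 100],
    "dropout": [0.2, 0.3],
    "learning_rate": [0.001, 0.002, 0.01],
}

TARGET = {"filters": 64, "kernel_size": 3, "recurrent_units": 100, "dropout": 0.2, "learning_rate": 0.002}


def toy_validation_score(cfg: Config) -> float:
    """Hypothetical stand-in for training: higher when closer to TARGET."""
    return -sum(abs(cfg[k] - v) / v for k, v in TARGET.items())


best, score = grid_search(search_space, toy_validation_score)
print(best)
```

The cost grows multiplicatively with each hyperparameter (here 3 × 2 × 2 × 2 × 3 = 72 training runs), which is why grids are usually kept coarse.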
Customer satisfaction prediction using Conv-LSTM
This study used a convolutional long short-term memory (Conv-LSTM) network to forecast customer satisfaction once customers have finished their catering. The structure of the Conv-LSTM is shown in Figure 4. A CNN architecture comprises an input layer, a series of convolutional layers, pooling layers, fully connected layers, and normalization layers37. Each neuron in a convolutional layer is connected to only a small local region of the preceding layer, whereas each neuron in a fully connected layer is connected to all activations of the preceding layer. The tensor shapes and temporal granularity of the Conv-LSTM inputs are explicitly defined. Each input sequence is structured per customer order session (one time step per order): the purchase list is encoded as a multi-hot vector, and the associated satisfaction level is represented as a numerical score. The resulting tensor has the shape (batch size × sequence length × feature dimension).
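The input layout just described can be sketched as nested Python lists. The menu is hypothetical, and two further choices are assumptions rather than details from the text: the satisfaction score is appended to each multi-hot vector as an extra feature, and customers with fewer sessions than the sequence length are left-padded with zero vectors.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical menu; in practice the vocabulary comes from the restaurant's order records.
MENU = ["noodles", "dumplings", "rice", "tea", "soup"]
ITEM_INDEX: Dict[str, int] = {item: idx for idx, item in enumerate(MENU)}

Session = Tuple[Sequence[str], float]  # (items purchased in one order, satisfaction score)


def encode_session(items: Sequence[str], satisfaction: float) -> List[float]:
    """One time step: multi-hot purchase vector with the satisfaction score appended (assumption)."""
    vec = [0.0] * len(MENU)
    for item in items:
        vec[ITEM_INDEX[item]] = 1.0
    return vec + [float(satisfaction)]


def build_batch(customers: Sequence[Sequence[Session]], seq_len: int) -> List[List[List[float]]]:
    """Nested list of shape (batch_size, seq_len, feature_dim).

    Each customer's most recent seq_len sessions are kept; shorter histories
    are left-padded with zero vectors (assumption).
    """
    feature_dim = len(MENU) + 1
    batch = []
    for sessions in customers:
        steps = [encode_session(items, score) for items, score in sessions][-seq_len:]
        padding = [[0.0] * feature_dim for _ in range(seq_len - len(steps))]
        batch.append(padding + steps)
    return batch


customers = [
    [(["noodles", "tea"], 4), (["soup"], 5)],
    [(["rice"], 3), (["dumplings", "tea"], 4), (["noodles"], 2)],
]
batch = build_batch(customers, seq_len=3)
print(len(batch), len(batch[0]), len(batch[0][0]))  # 2 3 6
```

Converting `batch` to an array gives the (batch size × sequence length × feature dimension) tensor expected by a recurrent layer.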
During forward propagation and backpropagation, a CNN generally groups its parameters according to the inputs they act on. Numerous CNN designs have emerged from recent research. As shown in Equation (13), each LSTM block uses three sets of parameters, iw, rw, and b, which denote the input weights, recurrent weights, and biases, respectively.
(13)
The cell state at time step t is defined as:
(14)
where the gate and state terms are combined by the Hadamard (element-wise) product. The hidden state Ht at time step t is given by:
(15)
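To make the gate, cell-state, and hidden-state updates concrete, the sketch below implements one step of a standard LSTM cell using the iw, rw, and b notation introduced above. It is a simplification under stated assumptions: a Conv-LSTM replaces the matrix products with convolutions, the weights are arbitrary, and the exact form of Equations (13) to (15) may differ from this textbook formulation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    """Matrix-vector product (a Conv-LSTM would use a convolution instead)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              iw: Dict[str, Matrix], rw: Dict[str, Matrix],
              b: Dict[str, Vector]) -> Tuple[Vector, Vector]:
    """One LSTM time step with forget (f), input (i), output (o) gates and candidate (g)."""
    def pre_activation(gate: str) -> Vector:
        # iw acts on the current input, rw on the previous hidden state, b is the bias.
        return [a + r + bias for a, r, bias in
                zip(matvec(iw[gate], x_t), matvec(rw[gate], h_prev), b[gate])]

    f = [sigmoid(z) for z in pre_activation("f")]
    i = [sigmoid(z) for z in pre_activation("i")]
    o = [sigmoid(z) for z in pre_activation("o")]
    g = [math.tanh(z) for z in pre_activation("g")]
    # Cell state: element-wise (Hadamard) products of gates with states.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Hidden state: output gate times tanh of the new cell state, element-wise.
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


# Toy setting: 2 input features, 2 hidden units, arbitrary weights shared by all gates.
GATES = ["f", "i", "o", "g"]
iw = {k: [[0.5, -0.2], [0.1, 0.3]] for k in GATES}
rw = {k: [[0.2, 0.0], [0.0, 0.2]] for k in GATES}
b = {k: [0.0, 0.0] for k in GATES}
h, c = [0.0, 0.0], [0.0, 0.0]
for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    h, c = lstm_step(x, h, c, iw, rw, b)
print(h, c)
```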
The Conv-LSTM hyperparameters were set as follows: 64 convolutional filters, a 3 × 3 kernel, 100 LSTM units with a dropout rate of 0.2, a batch size of 64, a learning rate of 0.002, 50 time steps, and the Adam optimizer.
UML diagram and Pseudo Code for Customer Interaction
The UML sequence diagram (Figure 5) depicts the dynamic flow of interactions in the proposed system. The NLP-LDA module processes the user's request (such as a meal order or query) through topic modeling and intent extraction. The processed request is then passed to the recommendation engine (Conv-RNN), which produces a customized recommendation. Finally, the system returns a real-time response to the user. This sequence makes the conversion of user input into intelligent service outputs transparent and emphasizes the modular interplay of the components.
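The sequence in Figure 5 amounts to composing three modules. In the sketch below each module is a hypothetical placeholder (a keyword rule instead of NLP-LDA, a most-frequent-item rule instead of the Conv-RNN); only the order of the calls mirrors the diagram.

```python
from typing import Dict, List

# Hypothetical stand-ins for the three modules in Figure 5; real modules would wrap trained models.


def nlp_lda_module(request: str) -> Dict[str, object]:
    """Placeholder intent extraction: a keyword rule instead of NLP-LDA."""
    tokens = request.lower().split()
    intent = "order" if "order" in tokens else "other"
    return {"intent": intent, "tokens": tokens}


def recommendation_engine(parsed: Dict[str, object], history: List[str]) -> Dict[str, object]:
    """Placeholder for the Conv-RNN: suggest the most frequent past item."""
    suggestion = max(set(history), key=history.count) if history else None
    return {**parsed, "recommendation": suggestion}


def system_response(result: Dict[str, object]) -> str:
    """Turn the structured result into a reply for the user."""
    if result["intent"] == "order" and result["recommendation"]:
        return f"Would you like to order {result['recommendation']} again?"
    return "How can I help you?"


def handle_request(request: str, history: List[str]) -> str:
    # User -> NLP-LDA module -> recommendation engine -> system response
    return system_response(recommendation_engine(nlp_lda_module(request), history))


print(handle_request("I want to order lunch", ["noodles", "tea", "noodles"]))
```

Because each stage consumes and returns a plain dictionary, any one module can be swapped for a trained model without touching the others.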
Pseudocode for the Conv-RNN recommendation algorithm is provided to improve reproducibility. It outlines the sequential computational logic: preprocessing the user request, extracting features with convolutional layers, modeling sequences with recurrent layers, applying regularization, and generating a suggestion with a softmax output layer. This pseudocode offers a clear, implementation-level view of the model workflow that complements the mathematical formulations.
Pseudo Code: Conv-RNN Recommendation Algorithm
Input: User request U, historical interaction sequence H
Output: Recommended food item R
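The steps summarized above can be sketched as an untrained, pure-Python forward pass that maps a user request U and history H to a recommended item R. All layer sizes, the item catalog, the random weights, and the rule that menu items named in U are appended to H are illustrative assumptions; dropout is omitted because it applies only during training.

```python
import math
import random
from typing import List, Sequence

random.seed(0)  # fixed seed so this untrained sketch is deterministic

CATALOG = ["noodles", "dumplings", "rice", "tea", "soup"]  # hypothetical menu items
EMB_DIM, FILTERS, KERNEL, HIDDEN = 4, 3, 3, 5             # toy sizes, not the study's


def rand_matrix(rows: int, cols: int) -> List[List[float]]:
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Randomly initialized (untrained) parameters; training would learn these.
EMBED = rand_matrix(len(CATALOG), EMB_DIM)
CONV_W = [rand_matrix(KERNEL, EMB_DIM) for _ in range(FILTERS)]
RNN_WX, RNN_WH = rand_matrix(HIDDEN, FILTERS), rand_matrix(HIDDEN, HIDDEN)
DENSE_W = rand_matrix(len(CATALOG), HIDDEN)


def preprocess(request: str) -> List[str]:
    """Lower-case the request U, turn punctuation into spaces, and tokenize."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in request.lower())
    return cleaned.split()


def recommend(request: str, history: Sequence[str]) -> str:
    """Return a recommended item R for request U and interaction history H."""
    # 1. Menu items named in U are appended to H as the latest interactions (assumption).
    items = list(history) + [tok for tok in preprocess(request) if tok in CATALOG]
    # 2. Embedding lookup: item -> integer index -> vector.
    seq = [EMBED[CATALOG.index(item)] for item in items]
    # 3. 1-D convolution over time (valid padding) followed by ReLU.
    conv = [[max(0.0, sum(dot(CONV_W[f][k], seq[t + k]) for k in range(KERNEL)))
             for f in range(FILTERS)]
            for t in range(len(seq) - KERNEL + 1)]
    # 4. Max pooling over non-overlapping windows of two time steps.
    pooled = [[max(col) for col in zip(*conv[t:t + 2])] for t in range(0, len(conv), 2)]
    # 5. Simple recurrent layer: the hidden state carries earlier steps forward.
    h = [0.0] * HIDDEN
    for x in pooled:
        h = [math.tanh(dot(RNN_WX[j], x) + dot(RNN_WH[j], h)) for j in range(HIDDEN)]
    # 6. Dense layer + softmax over the catalog, then pick the most probable item.
    logits = [dot(row, h) for row in DENSE_W]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return CATALOG[max(range(len(CATALOG)), key=lambda i: probs[i])]


print(recommend("Any suggestion? I had tea.", ["noodles", "soup", "rice", "noodles", "dumplings"]))
```

With random weights the suggestion is meaningless; the point is the data flow, which a Keras model would express with Embedding, Conv1D, MaxPooling1D, SimpleRNN, and Dense layers.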
Several models were thoroughly tested and validated to ensure the authenticity and dependability of the developed ICS. The most efficient ICS configuration was determined through a comparative study of several combinations of word embeddings and classifiers. Each experiment was repeated 10 times, and the results are presented as mean values with standard errors in parentheses. This approach exposes both the variability and the consistency of each model's performance. The spread of results is a crucial factor when evaluating a model: a large standard deviation suggests that performance varies greatly across datasets or scenarios, which casts doubt on the model's generalizability and dependability in real-world applications.
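Because the text mentions both standard errors and standard deviations, the sketch below shows how the two relate when summarizing repeated runs: the standard deviation describes the spread of individual runs, and the standard error (SD divided by the square root of the number of runs) describes the uncertainty of the mean. The ten accuracy values are hypothetical, not the study's data.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarize_runs(scores: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, sample standard deviation, standard error) over repeated runs."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)     # spread of individual runs
    se = sd / math.sqrt(len(scores))  # uncertainty of the mean
    return mean, sd, se


# Hypothetical accuracies from ten repeated runs (not the study's data).
runs = [0.91, 0.92, 0.915, 0.91, 0.92, 0.915, 0.91, 0.92, 0.915, 0.915]
mean, sd, se = summarize_runs(runs)
print(f"{mean:.3f} ({se:.4f})")  # reported as mean (standard error)
```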
Evaluation metrics
In this study, the AI models were evaluated against three criteria.
Criteria 1: Performance of the contactless query service is evaluated using accuracy, F1 score, precision, and recall.
Criteria 2: Performance of the food suggestion system is evaluated using shopping hit accuracy, precision, cross-entropy, F1 score, and recall.
Criteria 3: Performance of customer satisfaction prediction is evaluated using mean absolute error (MAE), root mean square error (RMSE), and R2. Table 3 lists the mathematical expressions for these evaluation metrics.
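The classification and regression metrics above can be computed directly from their textbook definitions, as sketched below. R2 here is the coefficient of determination, 1 − SS_res/SS_tot; Table 3 may instead use the squared-correlation form (it also refers to the mean of the predictions), and the two coincide only for least-squares fits. The counts and ratings in the usage lines are made up.

```python
import math
from typing import Dict, Sequence


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE and R2 (coefficient of determination) for satisfaction ratings."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return {"mae": mae, "rmse": rmse, "r2": 1 - ss_res / ss_tot}


# Made-up example values.
print(classification_metrics(tp=8, tn=5, fp=2, fn=1))
print(regression_metrics([3, 4, 5, 2], [3, 4, 4, 2]))
```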
Criteria 1: Performance analysis of contactless query services using NLP and LDA
Table 4 compares the classification performance of various word embeddings combined with LDA. Word2Vec with LDA performs best, with the highest accuracy, precision, recall, and F1 score, and TF-IDF with LDA ranks second. Figure 6 illustrates an interaction between a customer and the ICS using the developed models. When the user typed "HI", the system responded "How can I help you?", and when the user's query carried the Order intent, the system responded with a link for placing the order. Similarly, when a query matches any of the 15 intent classes, the ICS returns the corresponding reply. If the user's intent is not among these classes, the ICS offers assistance by providing the restaurant's contact number. Overall, the developed Word2Vec with LDA model achieved an accuracy of 91.5%, a precision of 91%, a recall of 91.1%, and an F1 score of 89.7%.
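The reply logic shown in Figure 6 (known intent gives a canned reply, anything else falls back to the contact number) can be sketched as a lookup table with a fallback. The reply texts, the example URL, the placeholder phone number, and the confidence threshold of 0.5 are hypothetical; the text only states that out-of-class intents receive the contact number.

```python
from typing import Dict

# Hypothetical replies; the URL and contact placeholder are not the study's.
RESPONSES: Dict[str, str] = {
    "greeting": "How can I help you?",
    "order": "You can place your order here: https://example.com/order",
}
FALLBACK = "Sorry, I can't help with that. Please call the restaurant at <contact number>."


def respond(predicted_intent: str, confidence: float, threshold: float = 0.5) -> str:
    """Map the classifier's intent to a reply; unknown or low-confidence intents fall back.

    The confidence threshold is an illustrative assumption.
    """
    if confidence < threshold or predicted_intent not in RESPONSES:
        return FALLBACK
    return RESPONSES[predicted_intent]


print(respond("greeting", 0.93))  # How can I help you?
print(respond("parking", 0.88))   # falls back to the contact number
```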
Criteria 2: Performance of the food suggestion system using Conv-RNN
Figure 7 illustrates the shopping hit accuracy. Accuracy increases as L* grows during training, and by 1500 epochs the shopping hit accuracy of the developed model is higher than that of the rule-based system. Because the developed model uses NLP to process user click events for its suggestions, it requires a longer training period. The Conv-RNN achieved a recommendation accuracy of 98.5%.
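The exact expression for shopping hit accuracy in Table 3 is not legible here, so the sketch below uses a common definition as an assumption: the fraction of sessions in which the item actually purchased appears among the top-k recommendations. The sessions are made up.

```python
from typing import Sequence


def hit_accuracy(recommendations: Sequence[Sequence[str]], purchases: Sequence[str], k: int = 3) -> float:
    """Fraction of sessions whose purchased item is among the top-k recommendations (assumed definition)."""
    hits = sum(1 for recs, bought in zip(recommendations, purchases) if bought in list(recs)[:k])
    return hits / len(purchases)


# Made-up sessions: ranked recommendations and the item actually bought.
recs = [["noodles", "tea", "soup"], ["rice", "soup", "tea"], ["dumplings", "rice", "tea"]]
bought = ["tea", "noodles", "dumplings"]
print(hit_accuracy(recs, bought, k=3))  # 2 hits out of 3 sessions
print(hit_accuracy(recs, bought, k=1))  # only the third session is a top-1 hit
```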
The categorical cross-entropy loss is the sum of the cross-entropy losses over all categories. The smaller the categorical cross-entropy loss, the more closely the training outcome matches the real-world data. As shown in Figure 8, the categorical cross-entropy loss decreases as training progresses. Once training exceeds 1500 epochs, the Conv-RNN-based system reaches a loss of 0.02, lower than that of the rule-based approach, whose loss varies between 0.8 and 1.5. Because the proposed Conv-RNN-based approach generates more accurate forecasts (or suggestions), it achieves a lower categorical cross-entropy loss.
Figure 9 shows that precision increases with training. Once training exceeds 1500 epochs, the precision of the Conv-RNN-based system reaches 0.94, surpassing the rule-based approaches at 0.82. Higher precision means that a larger share of the items the Conv-RNN recommends are relevant to the user. Figure 10 shows that recall also increases with training. Once training exceeds 1500 epochs, the recall of the Conv-RNN-based scheme is 0.92, surpassing the rule-based approaches at 0.81. Higher recall means that the Conv-RNN misses fewer of the items that are relevant to the user.
Criteria 3: Performance of customer satisfaction prediction using Conv-LSTM
The performance of the proposed model is compared with conventional systems such as K-means, SVR24, MLP-ANN, and decision trees; the results are given in Table 5. The proposed model predicts customer satisfaction ratings from 2 to 5 accurately, with lower error measures than the other methods, and SVR achieves the next-lowest error. Figure 11 plots the predicted against the actual customer satisfaction ratings together with R2. The small deviation between actual and predicted outputs, reflected in the high R2, indicates improved prediction of customer satisfaction.
Ablation Study
To assess the robustness of the Conv-LSTM satisfaction prediction module, we carried out an ablation study that varied the sequence length of the input interactions. As Table 5 illustrates, shorter input sequences (5 and 10 timesteps) produced higher error rates (RMSE = 0.1562 and 0.1247, respectively) because less contextual information was available. Performance improved steadily with longer sequences, peaking at 20 timesteps (RMSE = 0.1011, R2 = 0.9812). Beyond this point, performance remained stable, with only slight variations in RMSE and R2 for 25 to 30 timesteps. These findings show that the Conv-LSTM model is not only accurate but also robust to changes in input length.
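Varying the sequence length amounts to cutting each customer's order history into windows of L time steps and retraining for each L. The sketch below shows the windowing with hypothetical per-order satisfaction scores and then selects the best length from the R2 values reported in Table 5; overlapping windows with a stride of one step are an assumption.

```python
from typing import Dict, List, Sequence


def make_windows(series: Sequence[float], seq_len: int) -> List[List[float]]:
    """Split one customer's per-order history into overlapping windows of seq_len steps (stride 1, assumption)."""
    return [list(series[i:i + seq_len]) for i in range(len(series) - seq_len + 1)]


# R2 values reported in Table 5 for each input sequence length.
ablation_r2: Dict[int, float] = {5: 0.9478, 10: 0.9635, 15: 0.9761, 20: 0.9812, 25: 0.9799, 30: 0.9785}
best_len = max(ablation_r2, key=ablation_r2.get)
print(best_len)  # 20

history = [4, 5, 3, 4, 5, 4]     # hypothetical satisfaction scores per order
print(make_windows(history, 3))  # 4 windows of 3 time steps
```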
Statistical significance test
As shown in Table 6, both the Conv-RNN and Conv-LSTM models significantly outperform the fine-tuned BERT baseline across all metrics (paired t-test, p < 0.05). These results highlight the advantage of integrating sequence modeling with explainable components in hospitality-specific applications, beyond generic transformer-based architectures.
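A paired t-test compares two models evaluated on the same runs or folds by testing whether the mean of their per-run differences is zero. The sketch computes the t statistic and compares it with the two-sided critical value of Student's t for 9 degrees of freedom at α = 0.05 (2.262, matching ten paired runs). The per-run accuracies are hypothetical; `scipy.stats.ttest_rel` returns the same statistic together with an exact p-value.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical per-run accuracies for ten paired runs (not the study's data).
proposed = [0.985, 0.983, 0.986, 0.984, 0.987, 0.985, 0.982, 0.986, 0.984, 0.985]
baseline = [0.942, 0.944, 0.940, 0.943, 0.941, 0.945, 0.939, 0.942, 0.944, 0.941]
t = paired_t_statistic(proposed, baseline)
T_CRIT_DF9 = 2.262  # two-sided critical t at alpha = 0.05 with 9 degrees of freedom
print(t > T_CRIT_DF9)  # True -> difference significant at p < 0.05
```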
To evaluate reliability under real-world feedback conditions, we conducted additional tests with simulated imbalanced class distributions (Table 7). Under mild (70:30) and moderate (80:20) imbalance, the Conv-LSTM model maintained strong performance, with only marginal decreases in macro-F1 and balanced accuracy compared with the balanced case, and even under severe (90:10) imbalance its macro-F1 remained above 0.92. These results indicate that the model is robust to the class distribution shifts commonly observed in catering feedback, further supporting its applicability in practice.
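Macro-F1 averages the per-class F1 scores and balanced accuracy averages the per-class recalls, so neither is dominated by the majority class. The sketch below computes both and shows one way to simulate an imbalance ratio by subsampling the minority class; the text does not say how the imbalance was simulated, so the subsampling approach and the example labels are assumptions. scikit-learn offers `f1_score(..., average="macro")` and `balanced_accuracy_score` for the same metrics.

```python
import random
from typing import List, Sequence, Tuple


def macro_f1_and_balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Macro-F1 = mean per-class F1; balanced accuracy = mean per-class recall."""
    f1s, recalls = [], []
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
        recalls.append(recall)
    return sum(f1s) / len(f1s), sum(recalls) / len(recalls)


def subsample_to_ratio(labels: Sequence[int], majority: int, ratio: float, seed: int = 0) -> List[int]:
    """Indices whose majority-class share is `ratio` (e.g., 0.8 for 80:20), by dropping minority samples (assumption)."""
    rng = random.Random(seed)
    maj = [i for i, y in enumerate(labels) if y == majority]
    mino = [i for i, y in enumerate(labels) if y != majority]
    n_min = round(len(maj) * (1 - ratio) / ratio)
    return sorted(maj + rng.sample(mino, min(n_min, len(mino))))


labels = [1] * 40 + [0] * 40  # hypothetical balanced labels
idx = subsample_to_ratio(labels, majority=1, ratio=0.8)
print(len(idx))  # 50: 40 majority + 10 minority samples (80:20)
print(macro_f1_and_balanced_accuracy([0, 0, 1, 1], [0, 1, 1, 1]))
```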
Data availability
Because of participant confidentiality agreements, the dataset created and analyzed in this work is proprietary and cannot be made publicly available. However, the data are available from the corresponding author, Kaihong Feng (norbertfeng199371044047p147151@asu.edu.pl), upon reasonable request. To promote transparency and reproducibility, an anonymized representative subset and documentation of the preprocessing and labeling protocol are provided as supplementary material. Supplementary File 1 contains an anonymized subset of customer queries with the corresponding intent labels, recommended items, and anticipated satisfaction levels, demonstrating the dataset structure.
Supplementary File 2 documents the text preprocessing procedures, the tag extraction procedure, and the labeling guidelines used in this work. Supplementary File 3 provides detailed explanations of the performance metrics used to assess the proposed models, including accuracy, precision, recall, F1 score, and RMSE.

Figure 2: LDA performance on a user query with the Greetings intent. The topic modeling output highlights key semantic themes used for automatic response generation.

Figure 3: Conv-RNN-based food recommendation for ICS. The recommendation model combines convolutional layers for feature extraction and RNN units for sequential modeling of user preferences.

Figure 4: Conv-LSTM model architecture. The model captures spatiotemporal dependencies in user behavior and food ordering patterns to predict customer satisfaction.

Figure 5: UML sequence diagram of user-AI interactions in the intelligent catering system. UML sequence diagram showing the flow of user-AI interactions, starting from User to NLP-LDA Module to Recommendation Engine to System Response.

Figure 6: ICS contactless service flow. The figure illustrates the workflow of an automated customer interaction using QR-based menus and digital ordering.

Figure 7: Impact of shopping hit accuracy on ICS performance. The figure analyzes the system's ability to recommend relevant items across various accuracy thresholds.

Figure 8: Cross-entropy variation in the developed ICS model. Training and validation losses are tracked to evaluate convergence and classification performance.

Figure 9: Precision evaluation of the ICS model. Precision measures the relevance of the predicted food items to user preferences.

Figure 10: Recall performance of the ICS model. Recall evaluates the system's ability to capture all relevant food items in its recommendations.

Figure 11: R2 performance for customer satisfaction prediction. The plot shows the predictive accuracy of the ICS model's Conv-LSTM architecture for satisfaction scores.

Figure 12: Computation time comparison. Execution time of the proposed ICS model benchmarked against baseline systems.
| S.No | Queries (Classes) | Count |
| 1 | Contact | 50 |
| 2 | Location | 40 |
| 3 | Order | 22 |
| 4 | Hours | 21 |
| 5 | Today’s Menu | 20 |
| 6 | Thanks | 20 |
| 7 | Payments | 20 |
| 8 | Menu | 20 |
| 9 | Delivery option | 19 |
| 10 | Catering | 19 |
| 11 | Greeting | 16 |
| 12 | Reservations | 7 |
| 13 | Offers | 5 |
| 14 | Seating | 2 |
| 15 | Beverages | 2 |
Table 1: User query frequencies under different intent categories. The frequencies were collected from a local restaurant's chatbot interface and used for intent classification.
| Methods | Description |
| Cleaning | To reduce data noise, remove extraneous characters such as punctuation marks (., !, ?), symbols (#, $, %, &), and other non-alphanumeric characters. |
| Tokenization | Dividing the processed text into discrete units (tokens) to simplify analysis. |
| Stop word removal | Removing common words such as "a," "an," and "the" that contribute little to intent recognition, so that the model can concentrate on more important terms. |
| Lemmatization | Reducing words to their base form so that variants of a word are combined into a single form, which simplifies the dataset and improves the method's ability to understand the queries. |
Table 2: Query preprocessing steps for word embedding generation. The table describes the cleaning, tokenization, stop word removal, and lemmatization steps applied before the queries are used as model input.
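The four steps in Table 2 can be chained as sketched below. The stop-word list and the lemma table are small hypothetical stand-ins; a real pipeline would use NLTK's `WordNetLemmatizer` or spaCy's `token.lemma_`, as listed in the materials table.

```python
import re
from typing import List

# Small illustrative stop-word list and lemma table (hypothetical; the study's lists are not given).
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "you", "i", "me", "to", "what"}
LEMMAS = {"menus": "menu", "offers": "offer", "payments": "payment", "reservations": "reservation"}


def preprocess_query(text: str) -> List[str]:
    # Cleaning: lower-case and replace punctuation and symbols with spaces.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    # Tokenization: split on whitespace.
    tokens = cleaned.split()
    # Stop word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Lemmatization: dictionary lookup standing in for a real lemmatizer.
    return [LEMMAS.get(t, t) for t in tokens]


print(preprocess_query("Show me the menus & offers!"))  # ['show', 'menu', 'offer']
```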
| Metrics | Mathematical Expressions |
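| Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ |
| Precision | $\frac{TP}{TP + FP}$ |
| Recall | $\frac{TP}{TP + FN}$ |
| F1 Score | $2 \times \frac{Precision \times Recall}{Precision + Recall}$ |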
| Shopping Hit Accuracy | ![]() |
| Cross-Entropy | ![]() |
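| MAE | $\frac{1}{n}\sum_{i=1}^{n} \lvert actual_i - prediction_i \rvert$ |
| RMSE | $\sqrt{\frac{1}{n}\sum_{i=1}^{n} (actual_i - prediction_i)^2}$ |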
| R2 | ![]() |
Table 3: Mathematical definitions of evaluation metrics. The table includes the accuracy, precision, recall, F1 score, shopping hit accuracy, cross-entropy, MAE, RMSE, and R2 expressions used to assess system performance. TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives; X_i is the input of the training data, Y_j is the desired output, A_j is the jth output of the neural network, actual_i is the observed output, prediction_i is the predicted output, n is the number of samples, and $\overline{actual}$ and $\overline{prediction}$ denote the mean values of the observed data and the predicted output, respectively.
| Models | Accuracy | Precision | Recall | F1 score |
| BoW with LDA | 0.812 | 0.832 | 0.811 | 0.805 |
| GloVe with LDA | 0.798 | 0.774 | 0.806 | 0.773 |
| TF-IDF with LDA | 0.832 | 0.85 | 0.832 | 0.82 |
| Word2Vec with LDA | 0.915 | 0.91 | 0.911 | 0.897 |
Table 4: LDA-based contactless service evaluation. The evaluation presents the accuracy and response relevance achieved using topic modeling for intent classification.
| Sequence Length (timesteps) | R² |
| 5 | 0.9478 |
| 10 | 0.9635 |
| 15 | 0.9761 |
| 20 | 0.9812 |
| 25 | 0.9799 |
| 30 | 0.9785 |
Table 5: Ablation Study of Conv-LSTM Satisfaction Prediction under Varying Sequence Lengths. The study compares model performance in terms of mean squared error, R2, and other metrics across test scenarios.
| Model | Accuracy | Precision | Recall | F1-Score | Notes |
| Rule-Based Baseline | 78.30% | 76.90% | 77.50% | 77.20% | Traditional approach |
| BERT (Fine-Tuned, AAAI 2025) | 94.20% | 93.80% | 94.10% | 93.90% | Transformer baseline |
| Conv-RNN (Proposed) | 98.50% | 98.20% | 98.30% | 98.30% | Outperforms BERT (p < 0.05) |
| Conv-LSTM (Proposed) | 97.90% | 97.60% | 97.80% | 97.70% | Outperforms BERT (p < 0.05) |
Table 6: Comparison of Proposed Models with Fine-Tuned BERT (AAAI 2025) for Hospitality Service Interactions. Statistical significance testing (paired t-tests at a 95% confidence level) was used to verify the observed improvements.
| Data Distribution | RMSE | R2 | Macro-F1 | Balanced Accuracy |
| Balanced (Original) | 0.1011 | 0.9812 | 0.9784 | 0.981 |
| Mild Imbalance (70:30) | 0.1126 | 0.9725 | 0.9631 | 0.9658 |
| Moderate Imbalance (80:20) | 0.1249 | 0.9613 | 0.9452 | 0.9517 |
| Severe Imbalance (90:10) | 0.1378 | 0.9486 | 0.9215 | 0.9342 |
Table 7: Conv-LSTM performance under balanced versus imbalanced catering feedback data. Imbalanced class distributions were simulated to show how the Conv-LSTM model's performance changes when catering feedback is evenly balanced across classes versus when it reflects imbalanced, real-world class distributions.
Supplementary File 1: An anonymized representative subset of customer queries paired with their corresponding content class labels.
Supplementary File 2: This subset illustrates the preprocessing and intent-labeling protocol applied in the study while preserving participant confidentiality.
Supplementary File 3: Detailed explanations of the performance metrics used to assess the suggested models, including accuracy, precision, recall, F1-score, and RMSE.
The overall performance of the proposed AI-based ICS model is compared with k-means with SVR24, a quick service restaurant LSTM model (QSR-LSTM)25, and NLP-ANN38. As shown in Figure 12, the proposed model requires less computation time than these approaches. For all models, computation time increases gradually with the number of iterations. The proposed intelligent catering system thus achieves improved performance with reduced computation time and error, making it efficient and effective for providing intelligent catering services to restaurants.
Critical steps
Although the Protocol section describes the general concept, the success and reproducibility of the proposed system depend on a few crucial steps:
Accurate Preprocessing: The text data must be thoroughly cleaned and tokenized before Word2Vec and LDA are applied; errors at this stage lower the accuracy of intent classification.
Balanced Training Data: To prevent biased predictions, customer interaction datasets should cover a range of food preferences and satisfaction levels.
Hyperparameter Sensitivity: The Conv-RNN and Conv-LSTM models require careful tuning of the learning rate, dropout ratio, and number of hidden layers; poorly chosen values severely impair performance.
Module Interoperability: NLP outputs (from LDA) must be consistently structured to work with the recommendation and satisfaction prediction modules.
Robust Validation: Cross-validation and early stopping are crucial to avoid overfitting and to ensure consistent performance across real-world datasets.
By managing these steps carefully, the system can consistently achieve high accuracy, scalability, and reproducibility across a variety of restaurant service contexts.
Modification and troubleshooting
When common problems occur, the following adjustments and troubleshooting techniques help ensure the reproducibility and adaptability of the proposed AI-driven restaurant service system:
Imbalance and Data Quality: Problem: Noisy or imbalanced customer interaction data may lower model accuracy. Solution: Apply preprocessing techniques such as data augmentation (e.g., paraphrasing customer requests), text normalization, and outlier removal. Minority classes can be balanced by oversampling or by synthetic oversampling (SMOTE).
Topic Coherence Issues in the NLP-LDA Module: Problem: LDA occasionally generates topics that are irrelevant or unintelligible. Solution: Use domain-specific stop-word lists, tune the hyperparameters (α, β), and adjust the number of topics. Word2Vec embeddings trained on restaurant-specific corpora further improve coherence.
Conv-RNN and Conv-LSTM Model Overfitting: Problem: Models may perform well during training but generalize poorly. Solution: Implement early stopping, weight regularization, and dropout layers. Cross-validation and expanding the training data further improve robustness.
Module Integration: Problem: Misalignment between NLP outputs (LDA topics) and the inputs of the recommendation and prediction modules may disrupt the workflow. Solution: Standardize output formats (structured arrays and JSON) and validate intermediate results before passing them to downstream models.
Latency of the System in Real-Time Deployment: Problem: An increased computing load may slow response times. Solution: Use lightweight inference engines, cache frequently requested results, or compress the model (e.g., by pruning or quantization).
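Two of the remedies above, validating intermediate results between modules and caching frequent requests, can be sketched with the standard library. The JSON field names and the check that topic weights sum to 1 are a hypothetical contract, since the text specifies only "structured arrays and JSON"; the cached function is a placeholder for an expensive model call.

```python
import json
from functools import lru_cache
from typing import Any, Dict

# Hypothetical contract between the NLP-LDA module and downstream models.
REQUIRED_FIELDS = {"intent": str, "topic_weights": list, "confidence": float}


def validate_nlp_output(payload: str) -> Dict[str, Any]:
    """Parse a JSON hand-off and check field presence and types before use downstream."""
    data = json.loads(payload)
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise TypeError(f"{field} should be {expected.__name__}")
    if abs(sum(data["topic_weights"]) - 1.0) > 1e-6:
        raise ValueError("topic_weights must sum to 1")
    return data


@lru_cache(maxsize=1024)
def cached_reply(normalized_query: str) -> str:
    """Cache replies for frequently repeated queries (placeholder for a model call)."""
    return f"reply for: {normalized_query}"


ok = validate_nlp_output('{"intent": "order", "topic_weights": [0.7, 0.3], "confidence": 0.91}')
print(ok["intent"])  # order
cached_reply("opening hours")
cached_reply("opening hours")
print(cached_reply.cache_info().hits)  # 1
```

Failing fast at the module boundary turns a silent misalignment into an explicit error, and the cache avoids recomputing replies for repeated queries.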
In summary, the developed ICS provides a model for future developments in food service automation and demonstrates AI's potential to revolutionize customer service in the restaurant sector. However, the ability of ICS to adapt to and satisfy rising demands for engaging, individualized consumer experiences will depend heavily on the ongoing development of AI, especially generative models. This interplay between technological advancement and industry application highlights the dynamic character of AI research and its tangible consequences, and points to future directions for both academic study and industry practice.
The proposed ICS can serve as both a solution and a driving force for intelligent catering services in the restaurant sector. Adopting ICS can significantly improve the overall service experience, benefiting both patrons and restaurants. By automating customer service, restaurants can save money and make better use of their resources, which is especially advantageous for small businesses with limited staff. Automating repetitive operations, reducing waiting times, and delivering accurate and timely information improve the customer experience. The natural language processing capabilities of ICS enable smooth interaction with existing restaurant systems and effective comprehension of user queries. The proposed Conv-RNN food recommendation system addresses the needs of quick-service restaurants by suggesting foods to customers based on their historical click and order events. Finally, Conv-LSTM predicts customer satisfaction on a rating scale of 1 to 5. The AI models in the developed ICS were evaluated using various metrics, and the results show improved accuracy and reduced error. Word2Vec with LDA achieved an accuracy of 91.5%, a precision of 91%, a recall of 91.1%, and an F1 score of 89.7%. The proposed Conv-RNN-based food suggestion model achieved an accuracy of 98.5% with a loss of 0.02. The proposed Conv-LSTM-based satisfaction prediction system achieved an RMSE of 0.1011 and an R2 of 0.9812. Our study demonstrates how ICS may transform service delivery, opening the door to higher customer satisfaction and greater operational efficiency in the restaurant sector.
Limitations
This study has some limitations. The results are based on particular datasets and experimental setups that may not be representative of other restaurant settings. In real-world applications, factors such as cultural differences, variations in customer behavior, and diverse cuisines may affect the system's effectiveness. Deploying AI-based models, particularly deep learning methods such as Conv-RNN and Conv-LSTM, requires considerable processing power, which may be difficult for small or medium-sized restaurants with tight budgets. Therefore, optimization approaches will be applied in future work to fine-tune the DL model parameters. Notwithstanding these limitations, the framework is suited to real-world applications, including automated customer care in quick-service restaurants, customized menu suggestions in large-scale catering, and satisfaction tracking in smart dining settings.
While our AI-driven framework shows transformative potential for catering services, it is critical to consider its ethical implications, particularly around data privacy, transparency, and stakeholder well-being. To align with established best practices, the system design references IEEE standards such as IEEE 7002 (data privacy by design) and IEEE 7001 (transparency in AI systems) to ensure that user data are handled with accountability. We also incorporated a systematic value-sensitive design approach guided by IEEE 7000 and considered human well-being metrics following IEEE 7010. These measures reflect our commitment to deploying AI in a responsible and trustworthy manner39. To further confirm scalability and resilience, future research could broaden the datasets to include a variety of cuisines and service methods. Future versions of the model will add cloud and edge support to improve the versatility and accessibility of ICS and to enable more applications to be built on the ICS platform. Future work will also evaluate the model's practical implications in real-world settings to provide more realistic support to businesses that adopt ICS.
The authors have no conflicts of interest.
The authors gratefully acknowledge the research support provided by the Faculty of Information Science and Technology, The National University of Malaysia. This work was made possible through the university's internal research funding and academic support infrastructure. The authors also extend their appreciation to colleagues and technical staff for their valuable input during the system design and modeling phase.
| Item | Description | Source | Version |
| Programming Language | Python (used for model development, NLP, and deep learning) | https://www.python.org/ | Python 3.8+ |
| Database | MySQL or SQLite (for storing user interaction logs) | https://www.mysql.com/; https://www.sqlite.org/ | MySQL 8.0 or SQLite3 |
| Dataset | User queries collected from local restaurant ordering chatbot | Manually annotated | |
| Deep Learning Framework | TensorFlow / Keras | https://www.tensorflow.org/; Keras 2.11 → https://keras.io/ | TensorFlow 2.11 or Keras 2.11 |
| Development Environment | Jupyter Notebook / Google Colab | https://jupyter.org/; https://colab.research.google.com/ | JupyterLab 3+ / Colab (free) |
| Evaluation Metrics | scikit-learn metrics: precision, recall, cross-entropy, R² | https://scikit-learn.org/ | scikit-learn 1.0+ |
| Natural Language Toolkit | spaCy / NLTK (for intent detection preprocessing) | https://spacy.io/; https://www.nltk.org/ | spaCy 3.0 / NLTK 3.6 |
| Recurrent Neural Network Models | RNN, LSTM, Conv-LSTM | https://keras.io/ | Implemented in Keras |
| System Hardware | Intel Core i7, 16GB RAM, NVIDIA GTX 1660 Ti GPU | Local system | |
| Topic Modeling Tool | Gensim (used for Latent Dirichlet Allocation) | https://radimrehurek.com/gensim/ | Gensim 4.1.2 |
| Visualization Tools | Matplotlib, Seaborn (for plotting performance graphs) | https://seaborn.pydata.org/; https://matplotlib.org/ | Matplotlib 3.5+, Seaborn 0.11 |
| Word Embedding | Word2Vec / GloVe pre-trained embeddings | https://nlp.stanford.edu/projects/glove/ | GloVe (100D), Stanford NLP |