December 23rd, 2025
We present an open-source virtual agent platform for conducting real-time motivational interviews, combining state-of-the-art language and diffusion models to adapt to users' behavior and profile. Utilizing the Greta 2.0 platform, it supports various topics, including nutrition and sport-focused interventions, and offers a flexible, validated tool for enhancing digital therapeutic interactions.
Our research investigates multimodal interaction, with a focus on adapting the verbal and nonverbal behavior of a virtual agent, such as its facial expressions, during interaction. Recent advances in machine learning, particularly large language models, enable more natural and flexible human-computer interaction. To begin, prepare a Windows computer for the experiment.
Install Java SE Development Kit 8 from the official distribution source. Then install the Visual C++ Redistributable for Visual Studio 2013. Request a CereProc license through the online application portal.
Install OpenFace from the provided repository. Obtain a webcam for video capture. Next, download the Greta software release from the specified repository.
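Before compiling, it can help to confirm that the Java installation found on the machine is version 8, as the prerequisite above requires. The following is a minimal sketch, not part of the Greta tooling: the function name `java_major_version` is hypothetical, and it only parses the text that `java -version` prints.

```python
import re


def java_major_version(version_output: str) -> int:
    """Return the major Java version found in `java -version` output.

    Hypothetical helper for illustration; not part of Greta.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    second = match.group(2)
    # Legacy numbering: "1.8.0_202" denotes Java 8.
    if first == 1 and second is not None:
        return int(second)
    return first
```

Note that `java -version` writes to standard error, so the text can be captured with `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr` and passed to the function.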
Compile the software using NetBeans following the provided instructions. Obtain a Mistral application programming interface key from the online console. Create the file, then paste the application programming interface key into the created file.
Now, obtain a Deepgram application programming interface key from the online console. Create the file path and paste the application programming interface key into the created file. For MoDiff model import, first download the MoDiff model weights from the provided source.
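The two key files described above can also be created from a script rather than by hand. This is a sketch under stated assumptions: the helper name `save_api_key` is hypothetical, the target path must be taken from the Greta instructions (it is not given here), and the sketch assumes the file should contain only the key text with no trailing newline.

```python
from pathlib import Path


def save_api_key(key: str, path: Path) -> Path:
    """Write an API key to a text file, creating parent folders if needed.

    Hypothetical helper; the path expected by Greta is an assumption
    that must be supplied by the caller.
    """
    key = key.strip()
    if not key:
        raise ValueError("API key is empty")
    path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: the file holds only the key, with no trailing newline.
    path.write_text(key, encoding="utf-8")
    return path
```

The function would be called once for the Mistral key and once for the Deepgram key, each with its own destination path.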
Place the downloaded file into the MoDiff data folder. Next, launch the Modular JAR file. Click File, Open, Greta, Advanced, and choose 20250128 Greta Expe Lucie_full.xml.
Ensure that the configuration displayed matches the expected setup. Now start the OpenFace offline ZeroMQ program. In the Record tab, uncheck all options except broadcast with ZeroMQ.
In the File tab, click Open Webcam. Authorize live feature extraction using the available webcam. Select the webcam to use.
Wait until the webcam is loaded and live feature extraction starts automatically. Next, in the OpenFace2 Output Stream Reader, click Connect. Wait for the available feature list to populate.
Click Select All and then Set to validate the selected features. Under MoDiff, click Launch and wait for the connection confirmation message. Then press Connect to activate the connection between MoDiff and the data receiver.
Under Filter, click Perform to allow execution of the generated data. In the MoDiff window, click Enable. Wait 90 seconds for the model to stabilize.
To launch ASR, in the Deepgram window, press Enable. Then select the condition: RL to use dream, or baseline to use a plain large language model. In the MI Counselor RL window, click Enable.
Wait 30 seconds for the model to start. Place a large monitor on the table with a chair positioned in front. Position a webcam on top of the monitor facing the chair.
Place a directional microphone on the table in front of the chair. Have the participant complete the Decisional Balance Scale, or DBS, questionnaire using a five-point Likert scale. Then ask the participant to speak into the microphone.
To begin, identify the participant profile and evaluate the DBS score. Classify the participant as resistant, hesitant, or open to change based on the score. In the MI Counselor RL window, select the discussion theme chosen by the participant.
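The classification step can be sketched as a simple threshold rule. The cut-off values are not stated in this protocol, so they are left as parameters here; the function name `classify_profile` and the assumption that a higher DBS score means more openness to change are both illustrative, not the authors' scoring procedure.

```python
def classify_profile(dbs_score: float, resistant_below: float, open_from: float) -> str:
    """Map a DBS score to one of the three participant profiles.

    Assumptions (not from the protocol): higher scores indicate more
    openness to change, and both thresholds are chosen by the experimenter.
    """
    if resistant_below > open_from:
        raise ValueError("resistant_below must not exceed open_from")
    if dbs_score < resistant_below:
        return "resistant"
    if dbs_score >= open_from:
        return "open to change"
    return "hesitant"
```

For example, with hypothetical thresholds of 2.0 and 4.0, a score of 3.0 falls between them and is labeled hesitant.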
Click Start to allow the agent to initiate the discussion with the participant. Monitor the answer section of the MI Counselor window for harmful output. In the Deepgram window, click Listen after the agent finishes its turn.
If speech is not recognized, enter the participant's speech into the request field and click Send. Wait for the participant's answer. Ask the participant to complete the post-intervention questionnaires.
Then debrief the participant using the prepared speech. Users interacting with the agent with adaptive facial expressions expressed more positive sentiment in their disclosure than users in the other behavior conditions. No significant differences were observed across the three expression conditions in subjective questionnaire scores assessing attitude, social rapport, and motivational interviewing quality.
The mismatched facial expression condition, in which interpersonal contingency is no longer respected, was generally rated lower across subjective measures than both the inexpressive condition, in which the agent shows no expression, and the adaptive condition, in which the agent's expressions are driven by our model. Perceived attitude had a strong direct effect on perceived interviewing quality. The relationship between perceived attitude and motivational interviewing quality was partially mediated by social rapport, which accounted for 43% of the effect.
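A figure such as "43% of the effect" is commonly read as a proportion mediated: the indirect effect divided by the total effect. The sketch below shows that arithmetic under the standard product-of-coefficients decomposition; whether the study used exactly this estimator is an assumption, and the coefficients in the usage note are made up for illustration, not values from the study.

```python
def proportion_mediated(a: float, b: float, direct: float) -> float:
    """Share of the total effect that passes through the mediator.

    a:      effect of the predictor on the mediator (attitude -> rapport)
    b:      effect of the mediator on the outcome (rapport -> quality)
    direct: effect of the predictor on the outcome with the mediator held fixed
    """
    indirect = a * b
    total = indirect + direct
    if total == 0:
        raise ValueError("total effect is zero; proportion is undefined")
    return indirect / total
```

With illustrative coefficients a = 0.6, b = 0.5, and a direct effect of 0.4, the indirect effect is 0.3 and the total effect is 0.7, so about 43% of the effect is mediated.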
Participants interacting with the adaptive dream dialogue manager reported significantly higher rapport than those interacting with the plain large language model baseline. The dream dialogue manager showed better adaptation to different participant profiles than the baseline model in motivation scores measured by the Decisional Balance Scale. Both the baseline and dream conditions produced client evaluation of motivational interviewing scores above the therapeutic threshold.
Our findings show that adapting verbal and nonverbal behavior significantly improved the effectiveness of the agent and the perception of socially interactive agents. Our platform enables the evaluation of other adaptive components, such as gesture generation, and the exploration of other dialogue topics. Future research will focus on long-term interaction and continuous adaptation across multiple dialogue sessions.
This study presents an open-source virtual agent platform designed for real-time motivational interviews, utilizing advanced language and diffusion models to adapt to user behavior. The platform, Greta 2.0, supports various intervention topics, including nutrition and sports, enhancing digital therapeutic interactions.