Evaluating Usability Aspects of a Mixed Reality Solution for Immersive Analytics in Industry 4.0 Scenarios

Burkhard Hoppenstedt; Thomas Probst; Manfred Reichert; Winfried Schlee; Klaus Kammerer; Myra Spiliopoulou; Johannes Schobel; Michael Winter; Anna Felnhofer; Oswald D. Kothgassner; Rüdiger Pryss

doi:10.3791/61349

Engineering

Published: October 6, 2020 doi: 10.3791/61349

Burkhard Hoppenstedt¹, Thomas Probst², Manfred Reichert¹, Winfried Schlee³, Klaus Kammerer¹, Myra Spiliopoulou⁴, Johannes Schobel¹, Michael Winter¹, Anna Felnhofer⁵, Oswald D. Kothgassner⁶, Rüdiger Pryss⁷

¹Institute of Databases and Information Systems, Ulm University, ²Department for Psychotherapy and Biopsychosocial Health, Danube University Krems, ³Department of Psychiatry and Psychotherapy, University of Regensburg, ⁴Faculty of Computer Science, Otto von Guericke University Magdeburg, ⁵Department of Pediatrics and Adolescent Medicine, Medical University of Vienna, ⁶Department of Child and Adolescent Psychiatry, Medical University of Vienna, ⁷Institute of Clinical Epidemiology and Biometry, University of Würzburg

Summary

This protocol describes the technical setting of a mixed reality application developed for immersive analytics. On this basis, it presents the measures that were used in a study to gain insights into usability aspects of the developed technical solution.

Abstract

In medicine and industry, the analysis of high-dimensional data sets is increasingly required. However, available technical solutions are often complex to use. Therefore, new approaches like immersive analytics are welcome. Immersive analytics promises to let various user groups experience high-dimensional data sets in a convenient manner. Technically, virtual-reality devices are used to enable immersive analytics. In Industry 4.0, for example, the identification of outliers or anomalies in high-dimensional data sets is a typical goal of immersive analytics. In this context, two important questions should be addressed for any technical solution developed for immersive analytics: First, is the technical solution helpful or not? Second, is the bodily experience of the technical solution positive or negative? The first question aims at the general feasibility of a technical solution, while the second one aims at the wearing comfort. Studies and protocols that systematically address these questions are still rare. In this work, a study protocol is presented that mainly investigates the usability of immersive analytics in Industry 4.0 scenarios. Specifically, the protocol is based on four pillars. First, it categorizes users based on previous experiences. Second, tasks are presented that can be used to evaluate the feasibility of the technical solution. Third, measures are presented that quantify the learning effect of a user. Fourth, a questionnaire evaluates the stress level when performing tasks. Based on these pillars, a technical setting was implemented that uses mixed reality smartglasses to apply the study protocol. The results of the conducted study show the applicability of the protocol on the one hand and the feasibility of immersive analytics in Industry 4.0 scenarios on the other. The presented protocol includes a discussion of discovered limitations.

Introduction

Virtual-reality solutions (VR solutions) are increasingly important in different fields. Often, VR solutions (including Virtual Reality, Mixed Reality, and Augmented Reality) are meant to ease the accomplishment of many daily tasks and procedures. For example, in the automotive domain, the configuration procedure of a car can be supported by the use of Virtual Reality¹ (VR). Researchers and practitioners have investigated and developed a multitude of approaches and solutions in this context. However, studies that investigate usability aspects are still rare. In general, these aspects should be considered in the light of two major questions. First, it must be evaluated whether a VR solution actually outperforms an approach that does not make use of VR techniques. Second, as VR solutions mainly rely on heavy and complex hardware devices, parameters like the wearing comfort and mental effort should be investigated more in-depth. In addition, the mentioned aspects should always be investigated with respect to the application field in question. Although many extant approaches see the need to investigate these questions², few studies exist that have presented results.

A currently important research topic in the field of VR is immersive analytics. It is derived from the research field of visual analytics, which tries to include human perception in analytics tasks. This process is also well-known as visual data mining⁴. Immersive analytics includes topics from the fields of data visualization, visual analytics, virtual reality, computer graphics, and human-computer interaction⁵. Recent advances in head-mounted displays (HMD) have led to improved possibilities for exploring data in an immersive way. Along these trends, new challenges and research questions emerge, like the development of new interaction systems, the need to investigate user fatigue, or the development of sophisticated 3D visualizations⁶. In a previous publication⁶, important principles of immersive analytics are discussed. In the light of big data, methods like immersive analytics are increasingly needed to enable a better analysis of complex data pools. Only a few studies exist that investigate usability aspects of immersive analytics solutions. Furthermore, the domain or field in question should also be considered in such studies. In this work, an immersive analytics prototype was developed and, based on it, a protocol that investigates the developed solution for Industry 4.0 scenarios. The protocol thereby exploits the experience method², which is based on subjective, performance, and physiological aspects. In the protocol at hand, the subjective aspects are measured through the perceived stress of the study users. Performance, in turn, is measured through the time required and the errors made to accomplish the analysis tasks. Finally, a skin conductance sensor measured physiological parameters. The first two measures will be presented in this work, while the measured skin conductance requires further efforts to be evaluated.

The presented study involves several research fields, particularly neuroscience aspects and information systems. Interestingly, considerations on neuroscience aspects of information systems have recently garnered the attention of several research groups⁷,⁸, showing the demand to explore the use of IT systems also from a cognitive viewpoint. Another field that is relevant for this work is the investigation of human factors of information systems⁹,¹⁰,¹¹. In the field of human-computer interaction, instruments exist to investigate the usability of a solution. Note that the System Usability Scale is mainly used in this context¹². Thinking Aloud Protocols¹³ are another widely used study technique to learn more about the use of information systems. Although many approaches exist to measure usability aspects of information systems, and some of them were presented long ago¹⁴, questions still emerge that require new measures or study methods to be investigated. Therefore, research in this field is very active¹²,¹⁵,¹⁶.

In the following, the reasons are discussed why two prevalently used methods were not considered in the current work. First, the System Usability Scale was not used. The scale is based on ten questions¹⁷, and its use can be found in several other VR studies¹⁸ as well. As this study mainly aims at the measurement of stress¹⁹, a stress-related questionnaire was more appropriate. Second, no Thinking Aloud Protocol²⁰ was used. Although this protocol type has shown its usefulness in general¹³, it was not used here, as the stress level of study users might increase solely because the think aloud session must be accomplished in parallel to the use of a heavy and complex VR device. Although these two techniques were not used, results of other recent studies have been incorporated in the study at hand. For example, in previous works²¹,²², the authors distinguish between novices and experts in their studies. Based on the successful outcome of these studies, the protocol at hand adopts this separation of study users. The stress measurement, in turn, is based on ideas from the following works¹⁵,¹⁹,²¹,²².

To conduct the study, a suitable Industry 4.0 scenario first had to be found for accomplishing analytical tasks. Inspired by another work of the authors²³, two scenarios (i.e., the analysis tasks) were identified: (1) Detection of Outliers, and (2) Recognition of Clusters. Both scenarios are challenging and highly relevant in the context of the maintenance of high-throughput production machines. Based on this decision, six major considerations have driven the study protocol presented in this work:

The solution developed for the study will be technically based on mixed reality smartglasses (see Table of Materials) and will be developed as a mixed reality application.
A suitable test must be developed, which is able to distinguish novices from advanced users.
Performance measures should consider time and errors.
A desktop application must be developed, which can be compared to the immersive analytics solution.
A measure must be applied to evaluate the perceived stress level.
In addition to the latter point, features shall be developed to mitigate the stress level while a user accomplishes the procedure of the two mentioned analysis tasks (i.e., (1) Detection of Outliers, and (2) Recognition of Clusters).

Based on the six mentioned points, the study protocol incorporates the following procedure. Outlier detection and cluster recognition analysis tasks have to be accomplished in an immersive way using mixed reality smartglasses (see Table of Materials). Therefore, a new application was developed. Spatial sounds shall ease the performing of analysis tasks without increasing the mental effort. A voice feature shall ease the navigation in the application developed for the mixed reality smartglasses (see Table of Materials). A mental rotation test shall be the basis to distinguish novices from advanced users. The stress level is measured based on a questionnaire. Performance, in turn, is evaluated based on (1) the time a user requires for the analysis tasks and (2) the errors made by a user in the analysis tasks. The performance with the mixed reality smartglasses is compared with the accomplishment of the same tasks in a newly developed and comparable 2D desktop application. In addition, a skin conductance device is used to measure the skin conductance level as a possible indicator for stress. The results of this measurement are subject to further analysis and will not be discussed in this work. The authors revealed in another study with the same device that additional considerations are required²⁴.

Based on this protocol, the following five research questions (RQs) are addressed:

RQ1: Do spatial imagination abilities of the participants affect the performance of tasks significantly?
RQ2: Is there a significant change of task performance over time?
RQ3: Is there a significant change of task performance when using spatial sounds in the immersive analytics solution?
RQ4: Is the developed immersive analytics solution perceived as stressful by the users?
RQ5: Do users perform better when using an immersive analytics solution compared to a 2D approach?

Figure 1 summarizes the presented protocol with respect to two scales. It shows the developed and used measures and their novelty with respect to the level of interaction. As the interaction level constitutes an important aspect when developing features for a VR setting, Figure 1 is meant to show the novelty of the entire protocol developed in this work. Although the evaluation of the aspects within the two used scales is subjective, their overall evaluation is based on the current related work and the following major considerations. One important principle is the use of abstractions of an environment to which the user has become attuned, for a natural interaction. With respect to the protocol at hand, the visualization of point clouds seems to be intuitive for users, and the recognition of patterns in such clouds has been recognized as a manageable task in general. Another important principle is to overlay affordances. The spatial sounds used in the protocol at hand are an example, as they correlate with the proximity of a searched object. A further recommended principle is to tune the representations in such a way that most information is located in the intermediate zone, which is most important for human perception. This principle was not adopted here, in order to encourage users to find the best spot by themselves and to orientate themselves in a data visualization space that is too large to be shown at once.

In the presented approach, no further considerations of the characteristics of the 3D data to be shown were made. For example, if a dimension is assumed to be temporal, scatterplots could have been shown. The authors consider this kind of visualization generally interesting in the context of Industry 4.0. However, the focus had to be kept on a reasonably small set of visualizations. Moreover, a previous publication already focused on the collaborative analysis of data. In this work, this research question was excluded due to the complexity of the other issues addressed in this study. In the setup presented here, the user is able to explore the immersive space by walking around. Other approaches offer controllers to explore the virtual space; one such study focuses on usability by using the System Usability Scale (SUS). Another previous publication conducted a study for economic experts, but with VR headsets. Most importantly, that study criticizes the limited field of view of other devices, like the mixed reality smartglasses used in this work (see Table of Materials). Its findings show that beginners in the field of VR were able to use the analytic tool efficiently. This matches the experiences of this study, although in this work beginners were not classified by their VR or gaming experience.

In contrast to most VR solutions, mixed reality is not fixed to a position, as it allows the real environment to be tracked. Some VR approaches mention the use of special chairs for a 360° experience to free users from their desktops. Related work also indicates that perception issues influence the performance of immersive analytics; for example, through the use of shadows. For the study at hand, this is not feasible, as the used mixed reality smartglasses (see Table of Materials) are not able to display shadows. A workaround could be a virtual floor, but such a setup was out of the scope of this study. A survey study in the field of immersive analytics identified 3D scatterplots as one of the most common representations of multi-dimensional data. Altogether, the aspects shown in Figure 1 cannot currently be found compiled into a protocol that investigates usability aspects of immersive analytics for Industry 4.0 scenarios.

Protocol

All materials and methods were approved by the Ethics Committee of Ulm University, and were carried out in accordance with the approved guidelines. All participants gave their written informed consent.

1. Establish Appropriate Study Environment

NOTE: The study was conducted in a controlled environment to cope with the complex hardware setting. The used mixed reality smartglasses (see Table of Materials) and the laptop for the 2D application were explained to the study participants.

1. Check the technical solution before each participant and set it to default mode. Prepare the questionnaires and place them next to the participant.
2. Let participants solve tasks from the use cases outlier detection and cluster recognition in one session (the average session time was 43 min).
3. Start the study by welcoming the participants and introducing the goal of the study, as well as the overall procedure.
4. Ask participants who use the skin conductance measurement device (see Table of Materials) to adhere to a short resting phase in order to obtain a baseline measurement. Only half of the participants used this device.
5. Have all participants fill out the State-Trait Anxiety Inventory (STAI) questionnaire³¹ prior to the start of the experiment.
  1. Next, have participants perform the mental rotation test (see Figure 4; this test evaluates the spatial imagination abilities), which is the basis to distinguish high from low performers (high performers are advanced users, while low performers are novices), followed by the spatial sound test to measure the spatial hearing abilities of a participant.
    NOTE: A median split of the test scores in the mental rotation test³² was used to distinguish low from high performers.
6. Randomly separate participants into two groups; one group starts with the task on outlier detection and the other with cluster recognition, each continuing with the other use case afterwards. For the cluster recognition task, half of the participants started with the used mixed reality smartglasses (see Table of Materials) and then used the 2D application, while the other half started with the 2D application and then used the mixed reality smartglasses (see Table of Materials). For the outlier detection task, randomly select one group that receives sound support, while the other part of the group receives no sound support.
7. To conclude the session, have participants answer the State-Trait Anxiety Inventory (STAI) questionnaire³¹ again, as well as the self-developed questionnaire and a demographic questionnaire.
8. After the session, store the generated data, which was automatically recorded by each developed application, on the laptop's storage.
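
The random group separation described in the steps above can be sketched as follows. This is only a minimal illustration under stated assumptions and not the tooling used in the study: the function name `assign_conditions`, the dictionary keys, the fixed seed, and the choice of a balanced (blocked) assignment are all hypothetical. The protocol itself only states that participants were separated randomly and that half of them used the skin conductance device.

```python
import random
from itertools import product


def assign_conditions(participant_ids, seed=0):
    """Return {participant_id: conditions} with balanced order factors.

    Hypothetical helper: names and the blocked design are assumptions.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # random order decides who lands in which cell

    # Two order factors: which task comes first, and which device comes
    # first in the cluster recognition task (2 x 2 = 4 cells).
    cells = list(product(("outlier_first", "cluster_first"),
                         ("mixed_reality_first", "desktop_first")))

    plan = {}
    for index, pid in enumerate(ids):
        task_order, device_order = cells[index % len(cells)]
        plan[pid] = {"task_order": task_order, "device_order": device_order}

    # Independently draw half of the participants for skin conductance.
    wearers = set(rng.sample(ids, len(ids) // 2))
    for pid in ids:
        plan[pid]["skin_conductance"] = pid in wearers
    return plan


plan = assign_conditions(range(1, 61), seed=42)
```

A blocked assignment was chosen for the sketch because it guarantees equally sized halves for each order factor; a simple coin flip per participant would also be random but could produce unequal groups.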

2. Study Protocol for Participants

1. Prepare the experiment (see Figure 2 for the room of the experiment) for each participant. Present the desktop PC and the used mixed reality smartglasses, and hand out the questionnaires.
2. Inform the participants that the experiment will take 40 to 50 minutes, and that half of them start after the pretests (see Points 3-6 of Study Protocol) with the outlier detection test (see Point 7 of Study Protocol), followed by the cluster recognition test (see Point 8 of Study Protocol), while the others accomplish these two tests in reverse order (i.e., Point 8 of the Study Protocol before Point 7).
3. Decide randomly whether a skin conductance measurement is done. If so, prepare the skin conductance measurement device³³ and ask the participant to put on the device. Request a short resting phase from participants to obtain a baseline measurement for their stress level.
4. Request participants to fill out the State-Trait Anxiety Inventory (STAI) questionnaire³¹ and inform them that it measures the current perceived stress before the experiment.
5. Conduct a mental rotation test.
  1. Inform participants that their mental rotation capabilities are evaluated and usher them in front of a desktop computer. Inform participants about the test procedure. Explain that they have to identify similar objects that have different positions in a simulated 3D space.
  2. Inform participants that only two of the five shown objects are similar and that they will have 2 minutes for the entire test. Inform participants that seven tasks can be accomplished within the given 2 minutes and tell them that performance measures are recorded for each accomplished task.
6. Evaluate spatial sound abilities.
  1. Inform participants that their spatial sound abilities are evaluated and usher them in front of a desktop computer. Inform participants about the test procedure. Explain to participants that six sound samples, each played for 13 seconds, must be detected.
  2. Inform participants that they have to detect the direction (analogously to the four compass directions) from which the sound is coming.
7. Evaluate outlier detection skills.
  1. Request participants to put on the mixed reality smartglasses. Explain to them that outliers must be found within the world created for the mixed reality smartglasses.
  2. Further inform them that an outlier is a red-marked point, while all other points are white-marked. Then explain to them that they must direct their gaze to the red-colored point to detect it.
  3. Further inform the participants that, in addition to visual help, environmental sounds support them in finding outliers. Tell the participants that they have to accomplish 8 outlier tasks, meaning that the red-colored point has to be found 8 times within the virtual world. For each participant, 4 tasks are sound-supported, while 4 tasks are sound-unsupported. For each participant, it is randomly selected whether the first task is sound-supported or not. Then, depending on the first task, sound support alternates from task to task.
  4. Tell participants which information will be recorded: the required time for each task, the length of walking, and how their final position relates to their starting position. Finally, tell participants that the red-marked point changes to green once it is detected (see Figure 3).
8. Evaluate cluster recognition skills.
  1. Randomly decide for the participant whether to start with the mixed reality smartglasses or to usher the participant to a desktop computer. In the following, only the procedure for the mixed reality setting is described. If a participant starts with the desktop computer, the procedure is the same in reversed order, except for the voice commands, which are only provided when using the mixed reality solution.
  2. For participants using mixed reality: Request participants to put on the mixed reality smartglasses. Inform participants how to find clusters within the world created with the used mixed reality smartglasses. Emphasize to the participants that they have to distinguish between overlapping clusters by moving around them.
  3. For participants using mixed reality: Explain to participants that they can navigate in the virtual world and around the clusters using voice commands. Finally, tell participants that they have to detect six clusters.
  4. For participants using mixed reality: Request participants to remove the used mixed reality smartglasses. Usher participants to a desktop computer and tell them to use the software shown on the screen of the desktop computer. Inform them that the same type of clusters as shown in the used mixed reality smartglasses has to be detected using the software on the desktop computer (see Figure 7 and Figure 8).
9. Request participants to fill out three questionnaires, namely the State-Trait Anxiety Inventory (STAI) questionnaire³¹, a self-developed questionnaire to gather subjective feedback, and a demographic questionnaire to gather information about them.
10. Request participants to remove the skin conductance measurement device³³ if they were requested to put it on at the beginning.
11. Release participants from the experiment and thank them for their participation.
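
The alternation of sound support across the eight outlier tasks, as described in the outlier detection step above, can be sketched as follows. The function name `sound_schedule` and its parameters are hypothetical; the sketch only assumes what the protocol states, namely a random choice for the first task and a strict alternation afterwards.

```python
import random


def sound_schedule(n_tasks=8, rng=None):
    """Return a list of booleans, True meaning the task is sound-supported.

    Hypothetical helper: the first task is chosen at random, then sound
    support alternates from task to task.
    """
    rng = rng or random.Random()
    first_with_sound = rng.random() < 0.5
    return [first_with_sound if index % 2 == 0 else not first_with_sound
            for index in range(n_tasks)]


# Example with a fixed seed so that the schedule is reproducible.
schedule = sound_schedule(rng=random.Random(7))
```

With eight tasks, the alternation always yields four sound-supported and four sound-unsupported tasks, regardless of the random start.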

Representative Results

Setting up Measures for the Experiment
For the outlier detection task, the following performance measures were defined: time, path, and angle. See Figure 6 for the measurements.

Time was recorded until a red-marked point (i.e., the outlier) was found. This performance measure indicates how long a participant needed to find the red-marked point. Time is denoted as the variable "time" (in milliseconds) in the results.

While participants tried to find the red-marked point, their walking path length was determined. The basis of this calculation was that the used mixed reality smartglasses (see Table of Materials) collect the current position as a 3D vector relative to the starting position at a frame rate of 60 frames per second. Based on this, the length of the path a participant had walked could be calculated. This performance measure indicates whether participants walked a lot or not. Path is denoted as PathLength in the results. Based on the PathLength, three more performance measures were derived: PathMean, PathVariance, and BoundingBox. PathMean denotes the average speed of participants in meters per frame, PathVariance the erraticness of a movement, and BoundingBox denotes whether participants had intensively used their bounding box. The latter is determined based on the maximum and minimum positions of all movements (i.e., participants who often changed their walking position revealed higher BoundingBox values).
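
The path-based measures can be illustrated with a small sketch. It is not the implementation used in the study; in particular, the function name `path_measures` is hypothetical, PathVariance is assumed here to be the variance of the per-frame step lengths, and BoundingBox is assumed to be summarized as the diagonal of the axis-aligned box spanned by the minimum and maximum positions, since the exact formulas are not given in the text.

```python
import math
from statistics import fmean, pvariance


def path_measures(positions):
    """Compute path measures from per-frame 3D positions (in meters).

    positions: list of (x, y, z) tuples relative to the starting position,
    one sample per frame; at least two samples are required.
    """
    # Distance walked between each pair of consecutive frames.
    steps = [math.dist(a, b) for a, b in zip(positions, positions[1:])]
    # Minimum and maximum coordinate per axis.
    mins = [min(axis) for axis in zip(*positions)]
    maxs = [max(axis) for axis in zip(*positions)]
    return {
        "PathLength": sum(steps),          # total meters walked
        "PathMean": fmean(steps),          # meters per frame
        "PathVariance": pvariance(steps),  # assumption: variance of steps
        "BoundingBox": math.dist(mins, maxs),  # assumption: box diagonal
    }


# Toy track with four frames (hypothetical data).
track = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.3, 0.0, 0.4), (0.3, 0.0, 0.4)]
measures = path_measures(track)
```

For the toy track, the per-frame steps are 0.3 m, 0.4 m, and 0 m, so PathLength is 0.7 m and the bounding box diagonal is 0.5 m.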

The last measure is denoted as AngleMean and is derived from the angle. The angle denotes the rotation between the current position and the starting position of a participant, recorded at a frame rate of 60 frames per second. Based on this, the average rotation speed in degrees per frame was calculated. Derived from this value, the erraticness of the rotation was calculated using the variance, which is denoted as AngleVariance.
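
Analogously, the angle-based measures can be sketched as follows. The sketch assumes that the recorded angle is a single value in degrees per frame (relative to the starting orientation, without wrap-around) and that the rotation speed is the absolute change between consecutive frames; both are assumptions, and the function name `angle_measures` is hypothetical.

```python
from statistics import fmean, pvariance


def angle_measures(angles_deg):
    """Compute rotation measures from per-frame angles in degrees.

    angles_deg: one angle per frame relative to the starting orientation;
    at least two samples are required.
    """
    # Absolute rotation between consecutive frames (degrees per frame).
    changes = [abs(b - a) for a, b in zip(angles_deg, angles_deg[1:])]
    return {
        "AngleMean": fmean(changes),          # average rotation speed
        "AngleVariance": pvariance(changes),  # erraticness of the rotation
    }


# Toy recording with five frames (hypothetical data).
rotation = angle_measures([0.0, 2.0, 6.0, 6.0, 4.0])
```

For the toy recording, the per-frame changes are 2, 4, 0, and 2 degrees, which yields an AngleMean of 2 degrees per frame and an AngleVariance of 2.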

To summarize the purposes of the calculated path and angle values, the path indicates whether users walk much or not. If they are not walking much, it might indicate a lack of orientation. The angle, in turn, should indicate whether participants make quick or sudden head movements. If they repeatedly make sudden head movements, this might again indicate a lack of orientation.

For the cluster recognition task, the following performance measures were defined: time and errors. Time was recorded until the point in time at which participants reported how many clusters they had detected. This performance measure indicates how long participants needed to find clusters. Time is denoted as Time (in milliseconds). Errors are identified in the sense of a binary decision (true/false): either the number of reported clusters was correct (true) or not correct (false). Errors are denoted as errors.

The state version of the State-Trait Anxiety Inventory (STAI) questionnaire³¹ was used to measure the state anxiety, a construct similar to state stress. The questionnaire comprises 20 items and was handed out before the study started, as well as afterwards, to evaluate the changes in the state anxiety. For the evaluation of this questionnaire, all positive attributes were flipped (e.g., an answer '4' becomes a '1'), and all answers were summed up to a final STAI score. The skin conductance was measured for 30 randomly selected participants by using the skin conductance measurement device (see Table of Materials)³³.
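
The scoring rule can be sketched as follows. The sketch assumes a four-point answer scale (1 to 4), as suggested by the flipping example, so that a flipped answer equals 5 minus the original answer. The set of positively worded items in `POSITIVE_ITEMS` is an assumption that should be checked against the manual of the questionnaire version in use; the function name is hypothetical.

```python
# Assumption: item numbers of the positively worded (anxiety-absent) items.
# Verify against the manual of the questionnaire version actually used.
POSITIVE_ITEMS = frozenset({1, 2, 5, 8, 10, 11, 15, 16, 19, 20})


def stai_state_score(answers, positive_items=POSITIVE_ITEMS):
    """Sum 20 answers (1..4) after flipping the positively worded items.

    answers: dict mapping item number (1..20) to the chosen answer (1..4).
    """
    if sorted(answers) != list(range(1, 21)):
        raise ValueError("expected answers for items 1..20")
    total = 0
    for item, answer in answers.items():
        if answer not in (1, 2, 3, 4):
            raise ValueError(f"item {item}: answer must be 1..4")
        # Flip positive items: 4 becomes 1, 3 becomes 2, and so on.
        total += 5 - answer if item in positive_items else answer
    return total


# A respondent choosing '1' on every item: ten flipped items count 4 each.
all_ones = stai_state_score({item: 1 for item in range(1, 21)})  # 50
```

Under these assumptions, the final score ranges from 20 to 80.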

After the two task types had been accomplished, a self-developed questionnaire was handed out at the end of the study to ask for the participants' feedback. The questionnaire is shown in Table 1. Furthermore, a demographic questionnaire asked about gender, age, and education of all participants.

Overall Study Procedure and Study Information
The overall study procedure is illustrated in Figure 9. Sixty participants joined the study. The participants were mostly recruited at Ulm University and at software companies from Ulm. The participating students were mainly from the fields of computer science, psychology, and physics. Ten were female and 50 were male.

Based on the mental rotation pretest, 31 were categorized as low performers, while 29 were categorized as high performers. Specifically, 7 females and 24 males were categorized as low performers, while 3 females and 26 males were categorized as high performers. For the statistical evaluations, 3 software tools were used (see Table of Materials).

Frequencies, percentages, means, and standard deviations were calculated as descriptive statistics. Low and high performers were compared in baseline demographic variables using Fisher's exact tests and t-tests for independent samples. For RQ1-RQ5, linear multilevel models with full maximum likelihood estimation were performed. Two levels were included, where level one represents the repeated assessments (either in outlier detection or cluster recognition) and level two the participants. The performance measures (except errors) were the dependent variables in these models. In RQ1, Fisher's exact tests were also used for the error probabilities. In RQ3, the time performance with spatial sounds versus without sounds was investigated (sound vs. no-sound was included as a predictor in the models). The STAI scores were evaluated using t-tests for dependent samples for RQ4. In RQ5, the effect of the 2D application versus the used mixed reality smartglasses (see Table of Materials) was investigated, using McNemar's test for the error probability. All statistical tests were performed two-tailed; the significance level was set to P < .05.
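
To make the two tests for the error probabilities more tangible, the following sketch implements a two-sided Fisher's exact test and an exact (binomial) McNemar test using only the Python standard library. It is not the software used in the study (see Table of Materials). The example table uses the error counts reported for the 3D approach (8 errors for low and 2 errors for high performers) together with the group sizes of 31 and 29, under the assumption that each participant contributes one binary error outcome, which the text does not state explicitly. The discordant pair counts for McNemar's test are purely hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = probability(a)
    # Sum all tables that are at most as likely as the observed one.
    total = sum(p for p in map(probability, range(lowest, highest + 1))
                if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def mcnemar_exact(b, c):
    """Exact two-sided McNemar test from the two discordant pair counts."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Rows: low / high performers; columns: error / no error (3D approach).
p_errors_3d = fisher_exact_two_sided(8, 23, 2, 27)

# Hypothetical discordant pairs: error only in 2D (1), error only in MR (5).
p_paired = mcnemar_exact(1, 5)  # 0.21875
```

Under the stated assumption, the Fisher p-value for the 3D error table stays above the significance level of .05, which is in line with the reported absence of a significant difference between low and high performers.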

The skin conductance results have not been analyzed and are subject to future work. Importantly, the authors revealed in another study with the same device that additional considerations are required²⁴.

For the mental rotation test, the differences in the test results between participants were used to distinguish low from high performers. For the spatial ability test, all participants showed good scores and were therefore all categorized as high performers with respect to their spatial abilities.
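
The median split that underlies this categorization can be sketched as follows. The function name `median_split` and the example scores are hypothetical, and the handling of scores exactly at the median (assigned to the low performers here) is an assumption, as the text does not specify it.

```python
from statistics import median


def median_split(scores):
    """Label each participant as 'low' or 'high' performer.

    scores: dict mapping participant id to the mental rotation test score.
    Assumption: scores equal to the median are labeled 'low'.
    """
    cut = median(scores.values())
    return {pid: ("high" if score > cut else "low")
            for pid, score in scores.items()}


# Hypothetical scores (number of correctly solved rotation tasks).
groups = median_split({"p1": 3, "p2": 5, "p3": 5, "p4": 7})
```

In this toy example the median is 5, so only the participant with a score of 7 is labeled as a high performer.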

First, important characteristics of the participants are summarized: Low and high performers in mental rotation showed no differences in their baseline variables (gender, age, and education). Descriptively, the low performers had a higher percentage of female participants than the high performers, and the high performers were younger than the low performers. Table 2 summarizes the characteristics of the participants.

Regarding results for RQ1, for the cluster recognition task, low and high performers did not differ significantly for the 2D application (4 errors for low and 2 errors for high performers) and the 3D approach (8 errors for low and 2 errors for high performers). For the outlier detection task, high performers were significantly faster than low performers. In addition, high performers required a shorter walking distance to solve the tasks. For the outlier detection task, Table 3 summarizes the detailed results.

Regarding results for RQ2, significant results emerged only for the outlier detection task. The BoundingBox, the PathLength, the PathVariance, the PathMean, the AngleVariance, and the AngleMean increased significantly from task to task (see Table 4). The recorded time, in turn, did not change significantly from task to task using the mixed reality smartglasses (see Table of Materials).

Regarding results for RQ3, based on the spatial sounds, the participants were able to solve the tasks in the outlier detection case quicker than without using spatial sounds (see Table 5).

Regarding results for RQ4, at the pre-assessment, the average STAI state score was M = 44.58 (SD = 4.67). At post-assessment, it was M = 45.72 (SD = 4.43). This change did not attain statistical significance (p = .175). Descriptive statistics of the answers in the self-developed questionnaire are presented in Figure 10.
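
For orientation, the t-test for dependent samples that was used for this comparison has the following general form. The sketch assumes that all n = 60 participants completed both assessments; the standard deviation of the individual pre-post differences, s_d, is not reported here, so the test statistic cannot be recomputed from the values above alone.

```latex
% d_i: post-assessment minus pre-assessment STAI score of participant i
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i = 45.72 - 44.58 = 1.14,
\qquad
t = \frac{\bar{d}}{s_d / \sqrt{n}},
\qquad
\mathrm{df} = n - 1 = 59 .
```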

Regarding results for RQ5, the mixed reality smartglasses (see Table of Materials) yielded significantly faster cluster recognition times than the desktop computer (see Table 6). However, the speed advantage when using the mixed reality smartglasses (see Table of Materials) was rather small (i.e., in the range of milliseconds).

Finally, note that the data of this study can be found at³⁶.

Figure 1: Investigated Aspects on the scale Interaction versus Novelty. The figure shows the used measures and their novelty with respect to the interaction level.

Figure 2: Pictures of the study room. Two pictures of the study room are presented.

Figure 3: Detected Outlier. The screenshot shows a detected outlier.

Figure 4: Example of the mental rotation test. The screenshot shows the 3D objects participants were confronted with; i.e., two out of five objects in different positions with the same object structure had to be detected. This figure has been modified based on this work³⁵.

Figure 5: Setting for the Spatial Ability Test. In (A), the audio configuration for the task Back is shown, while, in (B), the schematic user interface of the test is shown. This figure has been modified based on this work³⁵.

Figure 6: Illustration of the Setting for the Task Outlier’s Detection. Three major aspects are shown. First, the outliers are illustrated. Second, performance measures are shown. Third, it is shown how the sound support was calculated. This figure has been modified based on this work³⁵.

Figure 7: Illustration of the Setting for the Task Cluster Recognition. Scenarios A-C give a better impression: participants had to change their gaze to identify clusters correctly. This figure has been modified based on this work³⁵.

Figure 8: Illustration of the Setting for the Task Cluster Recognition in Matlab. The figure illustrates clusters provided in Matlab, which was the basis for the 2D desktop application.

Figure 9: Overall Study Procedure at a Glance. This figure presents the steps participants had to accomplish, in their chronological order. This figure has been modified based on this work³⁵.

Figure 10: Results of the self-developed questionnaire (see Table 1). The results are shown using box plots. This figure has been modified based on this work³⁵.

#Question	Question	Target	Scale	Meaning
1	As how stressful did you experience wearing the glasses?	Wearing	1-10	10 means high, 1 means low
2	How stressful was the outlier’s task?	Outliers	1-10	10 means high, 1 means low
3	As how stressful did you experience the spatial sounds?	Sound	1-10	10 means high, 1 means low
4	How stressful was the task finding clusters in Mixed Reality?	Cluster MR	1-10	10 means high, 1 means low
5	How stressful was the task finding clusters in the desktop approach?	Cluster DT	1-10	10 means high, 1 means low
6	How stressful was the usage of the voice commands?	Voice	1-10	10 means high, 1 means low
7	Did you feel supported by the spatial sounds?	Sound	1-10	10 means high, 1 means low

Table 1: Self-developed questionnaire for user feedback. It comprises 7 questions. For each question, participants had to select a value on a scale from 1 to 10, whereby 1 means a low value (i.e., bad feedback) and 10 a high value (i.e., very good feedback).

Variable	Low performer (n=31)	High performer (n=29)	P Value
Gender, n(%)
Female	7 (23%)	3 (10%)
Male	24 (77%)	26 (90%)	.302 (a)
Age Category, n(%)
<25	1 (3%)	5 (17%)
25-35	27 (87%)	21 (72%)
36-45	0 (0%)	2 (7%)
46-55	1 (3%)	0 (0%)
>55	2 (6%)	1 (3%)	.099 (a)
Highest Education, n(%)
High School	3 (10%)	5 (17%)
Bachelor	7 (23%)	6 (21%)
Master	21 (68%)	18 (62%)	.692 (a)
Mental Rotation Test, Mean (SD)
Correct Answers	3.03 (1.40)	5.31 (0.76)	.001 (b)
Wrong Answers	2.19 (1.47)	1.21 (0.56)	.000 (b)
Spatial Hearing Test, Mean (SD) (c)
Correct Answers	4.39 (1.09)	4.31 (1.00)	.467 (b)
Wrong Answers	1.61 (1.09)	1.69 (1.00)	.940 (b)
(a) Fisher’s Exact Test
(b) Two-sample t-test
(c) SD = Standard Deviation

Table 2: Participant sample description and comparison between low and high performers in baseline variables. The table shows data to the three demographic questions on gender, age, and education. In addition, the results of the two pretests are presented.
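As an illustration of footnote (a), the gender comparison in Table 2 can be recomputed from the reported counts. The sketch below is a plain Python version of a two-sided Fisher's exact test for a 2x2 table, written for this example; it is not the SPSS procedure used in the study, and the function name `fisher_exact_two_sided` is hypothetical. It sums the probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    # Unnormalized hypergeometric weight of a table with k in the top-left cell.
    def weight(k):
        return comb(row1, k) * comb(row2, col1 - k)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum the weights of all tables that are no more likely than the observed one.
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)


# Gender counts from Table 2: rows = low / high performers, columns = female / male.
p_gender = fisher_exact_two_sided(7, 24, 3, 26)
print(f"p = {p_gender:.3f}")  # p = 0.302
```

The result agrees with the P value of .302 listed for gender. The age and education comparisons involve more than two categories and would need the generalized (Fisher-Freeman-Halton) form of the test, which this sketch does not cover.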

Variable	Estimate	SE (a)	Result
BoundingBox for low performer across tasks	2.224	.438	t(60.00) = 5.08; p<.001
Alteration of BoundingBox for high performer across tasks	+.131	.630	t(60.00) = .21; p=.836
Time for low performer across tasks	20.919	1.045	t(60.00) = 20.02; p<.001
Alteration of Time for high performer across tasks	-3.863	1.503	t(60.00) = -2.57; p=.013
Pathlength for low performer across tasks	5.637	.613	t(60.00) = 9.19; p<.001
Alteration of Pathlength for high performer across tasks	-1.624	.882	t(60.00) = -1.84; p=.071
PathVariance for low performer across tasks	4.3E-4	4.7E-5	t(65.15) = 9.25; p<.001
Alteration of PathVariance for high performer across tasks	+4.3E-6	6.7E-5	t(65.15) = .063; p=.950
PathMean for low performer across tasks	.0047	5.3E-4	t(60.00) = 8.697; p<.001
Alteration of PathMean for high performer across tasks	+3.8E-5	7.7E-4	t(60.00) = .05; p=.960
AngleVariance for low performer across tasks	.0012	7.3E-5	t(85.70) = 16.15; p<.001
Alteration of AngleVariance for high performer across tasks	-2.7E-5	1.0E-4	t(85.70) = -.26; p=.796
AngleMean for low performer across tasks	.015	.001	t(60.00) = 14.27; p<.001
Alteration of AngleMean for high performer across tasks	-3.0E-4	1.5E-3	t(60.00) = -.20; p=.842
(a) SE = Standard Error

Table 3: Results of the Multilevel Models for RQ1 (Outlier Detection Using the Smartglasses). The table shows statistical results of RQ1 for the outlier’s detection task (for all performance measures).

Variable	Estimate	SE (a)	Result
BoundingBox at first task	.984	.392	t(138.12) = 2.51; p=.013
Alteration of BoundingBox from task to task	+.373	.067	t(420.00) = 5.59; p<.001
Time at first task	19.431	1.283	t(302.08) = 15.11; p<.001
Alteration of Time from task to task	-.108	.286	t(420.00) = -.37; p=.709
Pathlength at first task	3.903	.646	t(214.81) = 6.05; p<.001
Alteration of Pathlength from task to task	+.271	.131	t(420.00) = 2.06; p=.040
PathVariance at first task	3.1E-4	3.7E-5	t(117.77) = 8.43; p<.001
Alteration of PathVariance from task to task	+3.5E-5	4.5E-6	t(455.00) = 7.90; p<.001
PathMean at first task	.0033	4.2E-4	t(88.98) = 7.66; p<.001
Alteration of PathMean from task to task	+4.1E-4	5.2E-5	t(420.00) = 7.81; p<.001
AngleVariance at first task	.001	5.7E-5	t(129.86) = 17.92; p<.001
Alteration of AngleVariance from task to task	+4.1E-5	6.5E-6	t(541.75) = 6.34; p<.001
AngleMean at first task	.0127	8.1E-4	t(82.17) = 15.52; p<.001
Alteration of AngleMean from task to task	+6.1E-4	9.0E-5	t(420.00) = 6.86; p<.001
(a) SE = Standard Error

Table 4: Results of the Multilevel Models for RQ2 (Outlier Detection Using the Smartglasses). The table shows statistical results of RQ2 for the outlier’s detection task (for all performance measures).
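Table 4 reports, for each measure, an estimate at the first task and an alteration from task to task. One way to read these two numbers is as the intercept and slope of a line over the task index. The Python sketch below is not the multilevel model used in the study; it swaps in a simpler two-stage approximation that fits an ordinary least-squares line per participant and then averages the coefficients. It assumes that the first task is coded as index 0, and the function names and example data are hypothetical.

```python
from statistics import mean


def fit_line(values):
    """Least-squares intercept and slope of values over task index 0, 1, 2, ..."""
    n = len(values)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    # With the first task coded as 0, the intercept is the fitted first-task value.
    return y_bar - slope * x_bar, slope


def two_stage_summary(per_participant):
    """Average the per-participant intercepts and slopes (not a multilevel fit)."""
    fits = [fit_line(v) for v in per_participant.values()]
    return mean(f[0] for f in fits), mean(f[1] for f in fits)


# Hypothetical path lengths of two participants over three consecutive tasks.
example = {"p1": [10.0, 12.0, 14.0], "p2": [20.0, 19.0, 18.0]}
first_task, per_task_change = two_stage_summary(example)
```

A multilevel model additionally estimates how much participants vary around these averages and weights them accordingly, so its estimates and standard errors generally differ from this simple average.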

Variable	Estimate	SE (a)	Result
BoundingBox without sound across tasks	2.459	.352	t(93.26) = 6.98; p<.001
Alteration of BoundingBox with sound across tasks	-.344	.316	t(420.00) = -1.09; p=.277
Time without sound across tasks	20.550	1.030	t(161.17) = 19.94; p<.001
Alteration of time with sound across tasks	-2.996	1.319	t(420.00) = -2.27; p=.024
Pathlength without sound across tasks	5.193	.545	t(121.81) = 9.54; p<.001
Alteration of Pathlength with sound across tasks	-.682	.604	t(420.00) = -1.13; p=.260
PathVariance without sound across tasks	.0004	3.5E-5	t(79.74) = 12.110; p<.001
Alteration of PathVariance with sound across tasks	+1.3E-5	2.2E-5	t(429.20) = .592; p=.554
PathMean without sound across tasks	.005	4.0E-4	t(73.66) = 11.35; p<.001
Alteration of PathMean with sound across tasks	+1.4E-4	2.5E-4	t(420.00) = .56; p=.575
AngleVariance without sound across tasks	.0012	5.4E-5	t(101.32) = 21.00; p<.001
Alteration of AngleVariance with sound across tasks	+3.3E-5	3.1E-5	t(648.56) = 1.07; p=.284
AngleMean without sound across tasks	.0145	7.8E-4	t(70.17) = 18.51; p<.001
Alteration of AngleMean with sound across tasks	+6.0E-4	4.3E-4	t(420.00) = 1.39; p=.166
(a) SE = Standard Error

Table 5: Results of the Multilevel Models for RQ3 (Outlier Detection Using the Smartglasses). The table shows statistical results of RQ3 for the outlier’s detection task (for all performance measures).

Variable	Estimate	SE (a)	Result
Time with desktop across tasks	10.536	.228	t(156.43) = 46.120; p<.001
Alteration of time with HoloLens across tasks	-.631	.286	t(660.00) = -2.206; p=.028
(a) SE = Standard Error

Table 6: Results of the Multilevel Models for RQ5 (Cluster Recognition Using the Smartglasses). The table shows statistical results of RQ5 for the cluster recognition task (for all performance measures).
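The Result column of the multilevel model tables can be related to the Estimate and SE columns: for a fixed effect, the t statistic is commonly the estimate divided by its standard error, and the values in Table 6 are consistent with this reading. The short Python check below uses the smartglasses row of Table 6; the helper name `wald_t` is hypothetical.

```python
def wald_t(estimate, standard_error):
    """t statistic of a fixed effect: estimate divided by its standard error."""
    return estimate / standard_error


# Values copied from Table 6 (alteration of time with the smartglasses).
t_value = wald_t(-0.631, 0.286)
print(round(t_value, 3))  # -2.206
```

Small deviations in other rows are to be expected, because the published estimates and standard errors are rounded before the division.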


Discussion

Regarding the developed mixed reality smartglasses (see Table of Materials) application, two aspects were particularly beneficial. On the one hand, the use of spatial sounds for the outlier detection task was perceived positively (see the results of RQ3). On the other hand, the use of voice commands was also perceived positively (see Figure 10).

Regarding the study participants, although the number of recruited participants was rather small for an empirical study, it is competitive compared to many other works. Nevertheless, a larger-scale study based on the presented protocol is planned. As the protocol proved feasible for 60 participants, a larger number of participants is not expected to reveal further challenges. It was discussed that the selection of participants could be broader (in the sense of the fields the participants come from) and that the number of baseline variables used to distinguish between high and low performers could be higher. However, if these aspects are extended, the protocol itself does not have to be changed profoundly.

In general, the revealed limitations do not affect the conduct of a study based on the protocol shown in this work; they only affect the recruitment and the questions used in the demographic questionnaire. Nevertheless, one limitation of this study is important: the overall time required for one participant to finish the experiment is high. On the other hand, as the participants did not complain about the wearing comfort or that the test device burdened them too much, the time needed to conduct the overall protocol for one participant can be considered acceptable. Finally, in a future experiment, several aspects have to be added to the protocol. In particular, the outlier detection task should also be evaluated in the 2D desktop application. Furthermore, other hardware devices similar to the used mixed reality smartglasses (see Table of Materials) must also be evaluated. However, the protocol seems to be beneficial in a broader sense.

The following major insights were gained for the presented protocol. First, it showed its feasibility for evaluating immersive analytics for a mixed-reality solution. Specifically, the used mixed reality smartglasses (see Table of Materials) proved feasible for evaluating immersive analytics in a mixed-reality application for Industry 4.0 scenarios. Second, the comparison of the developed mixed reality smartglasses (see Table of Materials) application with a 2D desktop application was helpful to investigate whether the mixed-reality solution can outperform an application that does not make use of VR techniques. Third, the measurement of physiological parameters or vital signs should always be considered in such experiments. In this work, stress was measured using a questionnaire and a skin conductance device. Although the latter worked properly from a technical perspective, the authors found in another study with the same device that additional considerations are required²⁴. Fourth, the spatial ability test and the separation of high and low performers were advantageous. In summary, although the presented protocol seems complex at first glance (see Figure 9), it proved useful both technically and with respect to the obtained results.

As the detection of outliers and the recognition of clusters are typical tasks in the evaluation of many high-dimensional data sets in Industry 4.0 scenarios, their use in an empirical study is representative of this field of research. The protocol showed that these scenarios can be well integrated into a usability study on immersive analytics. Therefore, the used setting can be recommended for other studies in this context.

As the presented study showed that a mixed-reality solution based on the utilized smartglasses (see Table of Materials) is useful for investigating immersive analytics in Industry 4.0 scenarios, the protocol might be used for other usability studies in the given context as well.


Disclosures

The authors have nothing to disclose.

Acknowledgments

The authors have nothing to acknowledge.

Materials

Name	Company	Catalog Number	Comments
edaMove	movisens
HoloLens	Microsoft
Matlab R2017a	MathWorks
RPY2	GNU General Public License v2 or later (GPLv2+)		https://pypi.org/project/rpy2/
SPSS 25.0	IBM


References

1. Korinth, M., Sommer-Dittrich, T., Reichert, M., Pryss, R. Design and Evaluation of a Virtual Reality-Based Car Configuration Concept. Science and Information Conference. Springer, Cham. 169-189 (2019).
2. Whalen, T. E., Noël, S., Stewart, J. Measuring the human side of virtual reality. IEEE International Symposium on Virtual Environments, Human-Computer Interfaces and Measurement Systems, 2003. IEEE. 8-12 (2003).
3. Martens, M. A., et al. It feels real: physiological responses to a stressful virtual reality environment and its impact on working memory. Journal of Psychopharmacology. 33 (10), 1264-1273 (2019).
4. Keim, D. A. Information visualization and visual data mining. IEEE Transactions on Visualization and Computer Graphics. 8 (1), 1-8 (2002).
5. Dwyer, T., et al. Immersive analytics: An introduction. Immersive analytics. Springer, Cham. 1-23 (2018).
6. Moloney, J., Spehar, B., Globa, A., Wang, R. The affordance of virtual reality to enable the sensory representation of multi-dimensional data for immersive analytics: from experience to insight. Journal of Big Data. 5 (1), 53 (2018).
7. Davis, F. D., Riedl, R., Vom Brocke, J., Léger, P. M., Randolph, A. B. Information Systems and Neuroscience. Springer. (2018).
8. Huckins, J. F., et al. Fusing mobile phone sensing and brain imaging to assess depression in college students. Frontiers in Neuroscience. 13, 248 (2019).
9. Preece, J., et al. Human-computer interaction. Addison-Wesley Longman Ltd. (1994).
10. Card, S. K. The psychology of human-computer interaction. CRC Press. (2018).
11. Pelayo, S., Senathirajah, Y. Human factors and sociotechnical issues. Yearbook of Medical Informatics. 28 (01), 078-080 (2019).
12. Bangor, A., Kortum, P., Miller, J. Determining what individual SUS scores mean: adding an adjective rating scale. Journal of Usability Studies. 4 (3), 114-123 (2009).
13. Krahmer, E., Ummelen, N. Thinking about thinking aloud: A comparison of two verbal protocols for usability testing. IEEE Transactions on Professional Communication. 47 (2), 105-117 (2004).
14. Hornbæk, K. Current practice in measuring usability: Challenges to usability studies and research. International Journal of Human-Computer Studies. 64 (2), 79-102 (2006).
15. Peppa, V., Lysikatos, S., Metaxas, G. Human-Computer interaction and usability testing: Application adoption on B2C websites. Global Journal of Engineering Education. 14 (1), 112-118 (2012).
16. Alwashmi, M. F., Hawboldt, J., Davis, E., Fetters, M. D. The iterative convergent design for mobile health usability testing: mixed-methods approach. JMIR mHealth and uHealth. 7 (4), 11656 (2019).
17. System Usability Scale (SUS). Assistant Secretary for Public Affairs. Available from: https://www.hhs.gov/about/agencies/aspa/how-to-and-tools/methods/system-usability-scale.html (2013).
18. Fang, Y. M., Lin, C. The Usability Testing of VR Interface for Tourism Apps. Applied Sciences. 9 (16), 3215 (2019).
19. Pryss, R., et al. Exploring the Time Trend of Stress Levels While Using the Crowdsensing Mobile Health Platform, TrackYourStress, and the Influence of Perceived Stress Reactivity: Ecological Momentary Assessment Pilot Study. JMIR mHealth and uHealth. 7 (10), 13978 (2019).
20. Zugal, S., et al. Investigating expressiveness and understandability of hierarchy in declarative business process models. Software & Systems Modeling. 14 (3), 1081-1103 (2015).
21. Schobel, J., et al. Learnability of a configurator empowering end users to create mobile data collection instruments: usability study. JMIR mHealth and uHealth. 6 (6), 148 (2018).
22. Schobel, J., Probst, T., Reichert, M., Schickler, M., Pryss, R. Enabling Sophisticated Lifecycle Support for Mobile Healthcare Data Collection Applications. IEEE Access. 7, 61204-61217 (2019).
23. Hoppenstedt, B., et al. Dimensionality Reduction and Subspace Clustering in Mixed Reality for Condition Monitoring of High-Dimensional Production Data. Sensors. 19 (18), 3903 (2019).
24. Winter, M., Pryss, R., Probst, T., Reichert, M. Towards the Applicability of Measuring the Electrodermal Activity in the Context of Process Model Comprehension: Feasibility Study. Sensors. 20, 4561 (2020).
25. Butscher, S., Hubenschmid, S., Müller, J., Fuchs, J., Reiterer, H. Clusters, trends, and outliers: How immersive technologies can facilitate the collaborative analysis of multidimensional data. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 1-12 (2018).
26. Wagner Filho, J. A., Rey, M. F., Freitas, C. M. D. S., Nedel, L. Immersive analytics of dimensionally-reduced data scatterplots. 2nd Workshop on Immersive Analytics. (2017).
27. Batch, A., et al. There is no spoon: Evaluating performance, space use, and presence with expert domain users in immersive analytics. IEEE Transactions on Visualization and Computer Graphics. 26 (1), 536-546 (2019).
28. Cliquet, G., Perreira, M., Picarougne, F., Prié, Y., Vigier, T. Towards hmd-based immersive analytics. HAL. Available from: https://hal.archives-ouvertes.fr/hal-01631306 (2017).
29. Luboschik, M., Berger, P., Staadt, O. On spatial perception issues in augmented reality based immersive analytics. Proceedings of the 2016 ACM Companion on Interactive Surfaces and Spaces. 47-53 (2016).
30. Fonnet, A., Prié, Y. Survey of Immersive Analytics. IEEE Transactions on Visualization and Computer Graphics. (2019).
31. Spielberger, C. D., Gorsuch, R. L., Lushene, R. E. STAI Manual for the State-Trait Anxiety Inventory (self-evaluation questionnaire). Consulting Psychologist. 22, Palo Alto, CA, USA. 1-24 (1970).
32. Vandenberg, S. G., Kuse, A. R. Mental rotations, a group test of three-dimensional spatial visualization. Perceptual Motor Skills. 47 (2), 599-604 (1978).
33. Härtel, S., Gnam, J. P., Löffler, S., Bös, K. Estimation of energy expenditure using accelerometers and activity-based energy models-Validation of a new device. European Review of Aging and Physical Activity. 8 (2), 109-114 (2011).
34. Gautier, L. RPY2: A Simple and Efficient Access to R from Python. Available from: https://sourceforge.net/projects/rpy/ (2020).
35. Hoppenstedt, B., et al. Applicability of immersive analytics in mixed reality: Usability study. IEEE Access. 7, 71921-71932 (2019).
36. Hoppenstedt, B. Applicability of Immersive Analytics in Mixed Reality: Usability Study. IEEE Dataport. (2019).
