This protocol delineates the technical setting of a mixed reality application developed for immersive analytics. Based on this setting, measures are presented that were used in a study to gain insights into the usability of the developed technical solution.
In medicine and industry, the analysis of high-dimensional data sets is increasingly required. However, available technical solutions are often complex to use. Therefore, new approaches like immersive analytics are welcome. Immersive analytics promises a convenient way to experience high-dimensional data sets for various user groups and data sets. Technically, virtual-reality devices are used to enable immersive analytics. In Industry 4.0, for example, the identification of outliers or anomalies in high-dimensional data sets is a pursued goal of immersive analytics. In this context, two important questions should be addressed for any developed technical solution on immersive analytics: First, is the technical solution helpful or not? Second, is the bodily experience of the technical solution positive or negative? The first question aims at the general feasibility of a technical solution, while the second one aims at the wearing comfort. Extant studies and protocols that systematically address these questions are still rare. In this work, a study protocol is presented that mainly investigates the usability of immersive analytics in Industry 4.0 scenarios. Specifically, the protocol is based on four pillars. First, it categorizes users based on previous experiences. Second, tasks are presented that can be used to evaluate the feasibility of the technical solution. Third, measures are presented that quantify the learning effect of a user. Fourth, a questionnaire evaluates the stress level when performing tasks. Based on these pillars, a technical setting was implemented that uses mixed reality smartglasses to apply the study protocol. The results of the conducted study show the applicability of the protocol on the one hand and the feasibility of immersive analytics in Industry 4.0 scenarios on the other. The presented protocol includes a discussion of discovered limitations.
Virtual-reality solutions (VR solutions) are increasingly important in different fields. Often, VR solutions (covering Virtual Reality, Mixed Reality, and Augmented Reality) are meant to ease the accomplishment of many daily tasks and procedures. For example, in the automotive domain, the configuration of a car can be supported by the use of Virtual Reality1 (VR). Researchers and practitioners have investigated and developed a multitude of approaches and solutions in this context. However, studies that investigate usability aspects are still rare. In general, these aspects should be considered in the light of two major questions. First, it must be evaluated whether a VR solution actually outperforms an approach that does not make use of VR techniques. Second, as VR solutions mainly rely on heavy and complex hardware devices, parameters like wearing comfort and mental effort should be investigated more in-depth. In addition, the mentioned aspects should always be investigated with respect to the application field in question. Although many extant approaches see the need to investigate these questions2, few studies have presented results.
A currently important research topic in the field of VR is immersive analytics. It is derived from the research field of visual analytics, which tries to include human perception in analytics tasks. This process is also well-known as visual data mining4. Immersive analytics includes topics from the fields of data visualization, visual analytics, virtual reality, computer graphics, and human-computer interaction5. Recent advances in head-mounted displays (HMDs) have led to improved possibilities for exploring data in an immersive way. Along these trends, new challenges and research questions emerge, like the development of new interaction systems, the need to investigate user fatigue, or the development of sophisticated 3D visualizations6. In a previous publication6, important principles of immersive analytics are discussed. In the light of big data, methods like immersive analytics are increasingly needed to enable a better analysis of complex data pools. Only a few studies exist that investigate usability aspects of immersive analytics solutions. Furthermore, the domain or field in question should also be considered in such studies. In this work, an immersive analytics prototype was developed and, based on that, a protocol that investigates the developed solution for Industry 4.0 scenarios. The protocol thereby exploits the experience method2, which is based on subjective, performance, and physiological aspects. In the protocol at hand, the subjective aspects are measured through the perceived stress of the study users. Performance, in turn, is measured through the time required and the errors made while accomplishing analysis tasks. Finally, a skin conductance sensor measured physiological parameters. The first two measures are presented in this work, while the measured skin conductance requires further efforts to be evaluated.
The presented study involves several research fields, particularly including neuroscience aspects and information systems. Interestingly, considerations on neuroscience aspects of information systems have recently garnered the attention of several research groups7,8, showing the demand to explore the use of IT systems also from a cognitive viewpoint. Another field relevant to this work is the investigation of human factors of information systems9,10,11. In the field of human-computer interaction, instruments exist to investigate the usability of a solution; the System Usability Scale is mainly used in this context12. Thinking Aloud Protocols13 are another widely used study technique to learn more about the use of information systems. Although many approaches exist to measure usability aspects of information systems, and some of them were presented long ago14, questions still emerge that require new measures or study methods to be investigated. Therefore, research in this field is very active12,15,16.
In the following, the reasons are discussed why two prevalently used methods were not considered in the current work. First, the System Usability Scale was not used. The scale is based on ten questions17, and its use can be found in several other VR studies18 as well. As this study mainly aims at the measurement of stress19, a stress-related questionnaire was more appropriate. Second, no Thinking Aloud Protocol20 was used. Although this protocol type has shown its usefulness in general13, it was not used here, as the stress level of study users might increase only due to the fact that the think-aloud session must be accomplished in parallel with the use of a heavy and complex VR device. Although these two techniques were not used, results of other recent studies have been incorporated into the study at hand. For example, in previous works21,22, the authors distinguish between novices and experts in their studies. Based on the successful outcome of these studies, the protocol at hand utilizes this separation of study users. The stress measurement, in turn, is based on ideas of the following works15,19,21,22.
First, for conducting the study, a suitable Industry 4.0 scenario had to be found for accomplishing analytical tasks. Inspired by another work of the authors23, two scenarios (i.e., the analysis tasks) were identified: (1) detection of outliers and (2) recognition of clusters. Both scenarios are challenging and highly relevant in the context of the maintenance of high-throughput production machines. Based on this decision, six major considerations have driven the study protocol presented in this work:
Based on the six mentioned points, the study protocol incorporates the following procedure. Outlier detection and cluster recognition analysis tasks have to be accomplished in an immersive way using mixed reality smartglasses (see Table of Materials); for this purpose, a new application was developed. Spatial sounds shall ease the performance of the analysis tasks without increasing the mental effort. A voice feature shall ease the navigation in the developed application for the mixed reality smartglasses (see Table of Materials). A mental rotation test shall be the basis for distinguishing novices from advanced users. The stress level is measured based on a questionnaire. Performance, in turn, is evaluated based on (1) the time a user requires for the analysis tasks and (2) the errors a user makes in the analysis tasks. The performance with the mixed reality smartglasses is compared with the accomplishment of the same tasks in a newly developed, comparable 2D desktop application. In addition, a skin conductance device is used to measure the skin conductance level as a possible indicator of stress. Results of this measurement are subject to further analysis and will not be discussed in this work; the authors revealed in another study with the same device that additional considerations are required24.
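The protocol does not prescribe a specific implementation for the spatial sound support; it only requires that the sound correlates with the proximity of the searched object (see also Figure 6). The following minimal sketch illustrates one plausible distance-to-gain mapping; the function name, the linear mapping, and the 5 m cutoff are illustrative assumptions, not the implementation used in the study.

```python
import numpy as np

def sound_gain(user_pos, outlier_pos, max_dist=5.0):
    """Map the user-outlier distance to a playback gain in [0, 1].

    The closer the user gets to the red-marked outlier, the louder the
    spatial sound. The linear mapping and the 5 m cutoff are assumptions
    for illustration only.
    """
    dist = np.linalg.norm(np.asarray(user_pos) - np.asarray(outlier_pos))
    return float(np.clip(1.0 - dist / max_dist, 0.0, 1.0))

# Example: a user standing 1.5 m away from the outlier hears the sound
# at 70% of the maximum gain.
print(sound_gain([0.0, 1.6, 0.0], [1.5, 1.6, 0.0]))  # 0.7
```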
Based on this protocol, the following five research questions (RQs) are addressed:
RQ1: Do spatial imagination abilities of the participants affect the performance of tasks significantly?
RQ2: Is there a significant change of task performance over time?
RQ3: Is there a significant change of task performance when using spatial sounds in the immersive analytics solution?
RQ4: Is the developed immersive analytics solution perceived as stressful by the users?
RQ5: Do users perform better when using an immersive analytics solution compared to a 2D approach?
Figure 1 summarizes the presented protocol with respect to two scales: it shows the developed and used measures and their novelty with respect to the level of interaction. As the interaction level constitutes an important aspect when developing features for a VR setting, Figure 1 shall illustrate the novelty of the entire protocol developed in this work. Although the evaluation of the aspects within the two used scales is subjective, their overall assessment is based on the current related work and the following major considerations:

One important principle is the use of abstractions of an environment for a natural interaction, to which the user has become attuned. With respect to the protocol at hand, the visualization of point clouds seems to be intuitive for users, and the recognition of patterns in such clouds has been recognized as a manageable task in general. Another important principle is to overlay affordances; the use of spatial sounds in the protocol at hand is an example, as they correlate with the proximity of a searched object. The cited authors recommend tuning the representations so that most information is located in the intermediate zone, which is most important for human perception. This principle was not included here in order to encourage users to find the best spot by themselves, as well as to orient themselves in a data visualization space that is too large to be shown at once. In the presented approach, no further considerations of the characteristics of the 3D data to be shown were made. For example, if a dimension is assumed to be temporal, scatterplots could have been shown. The authors consider this kind of visualization generally interesting in the context of Industry 4.0; however, the focus had to be kept on a reasonably small set of visualizations. Moreover, a previous publication already focused on the collaborative analysis of data; in this work, this research question was excluded due to the complexity of the other issues addressed in this study.

In the setup presented here, the user is able to explore the immersive space by walking around, whereas other approaches offer controllers to explore the virtual space; one such study set its focus on usability by using the System Usability Scale (SUS). Another previous publication conducted a study with economic experts, but with VR headsets. Most importantly, that study complains about the limited field of view of devices like the mixed reality smartglasses used in this work (see Table of Materials). Its findings show that beginners in the field of VR were able to use the analytic tool efficiently, which matches the experiences of this study, although in this work beginners were not classified by VR or gaming experience. In contrast to most VR solutions, mixed reality is not fixed to a position, as it allows tracking the real environment. Some VR approaches mention the use of special chairs for a 360° experience to free the user from the desktop. Other authors indicate that perception issues influence the performance of immersive analytics, for example, through the use of shadows. For the study at hand, this is not feasible, as the used mixed reality smartglasses (see Table of Materials) are not able to display shadows. A workaround could be a virtual floor, but such a setup was out of the scope of this study.
A survey study in the field of immersive analytics identified 3D scatterplots as one of the most common representations of multi-dimensional data. Altogether, the aspects shown in Figure 1 have not previously been compiled into a protocol that investigates usability aspects of immersive analytics for Industry 4.0 scenarios.
All materials and methods were approved by the Ethics Committee of Ulm University, and were carried out in accordance with the approved guidelines. All participants gave their written informed consent.
1. Establish Appropriate Study Environment
NOTE: The study was conducted in a controlled environment to cope with the complex hardware setting. The used mixed reality smartglasses (see Table of Materials) and the laptop for the 2D application were explained to the study participants.
2. Study Protocol for Participants
Setting up Measures for the Experiment
For the outlier detection task, the following performance measures were defined: time, path, and angle. See Figure 6 for the measurements.
Time was recorded until a red-marked point (i.e., the outlier) was found. This performance measure indicates how long a participant needed to find the red-marked point. Time is denoted as the variable "time" (in milliseconds) in the results.
While participants tried to find the red-marked point, the length of their walking path was determined. The basis of this calculation is that the used mixed reality smartglasses (see Table of Materials) record the current position as a 3D vector relative to the starting position at a frame rate of 60 frames per second. Based on this, the length of the path a participant had walked could be calculated. This performance measure indicates whether participants walked a lot or not. Path is denoted as PathLength in the results. Based on the PathLength, three more performance measures were derived: PathMean, PathVariance, and BoundingBox. PathMean denotes the average speed of participants in meters per frame, PathVariance the erraticness of a movement, and BoundingBox whether participants intensively used their bounding box. The latter is determined based on the maximum and minimum positions of all movements (i.e., participants who often changed their walking position revealed higher BoundingBox values).
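As a minimal sketch, the path measures can be derived from the logged 60 Hz position samples as follows. The function name and the aggregation of the bounding box into a single volume are assumptions; the protocol only states that the BoundingBox is determined from the maximum and minimum positions.

```python
import numpy as np

def path_measures(positions):
    """Derive PathLength, PathMean, PathVariance, and BoundingBox.

    `positions` is an (n, 3) array of positions relative to the starting
    point, sampled at 60 frames per second.
    """
    pos = np.asarray(positions, dtype=float)
    steps = np.linalg.norm(np.diff(pos, axis=0), axis=1)  # distance per frame
    path_length = steps.sum()       # PathLength: total walked distance (m)
    path_mean = steps.mean()        # PathMean: average speed (m per frame)
    path_variance = steps.var()     # PathVariance: erraticness of movement
    extents = pos.max(axis=0) - pos.min(axis=0)
    bounding_box = extents.prod()   # BoundingBox: volume of the used space
    return path_length, path_mean, path_variance, bounding_box
```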
The last measured value is the angle. AngleMean denotes a value derived from the rotation between the current orientation and the starting orientation of a participant, sampled at a frame rate of 60 frames per second; based on this, the average rotation speed in degrees per frame was calculated. Derived from this value, the erraticness of the rotation was calculated as the variance, denoted as AngleVariance.
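Analogously, a minimal sketch for the angle measures, assuming the rotation relative to the starting orientation is logged in degrees at 60 frames per second:

```python
import numpy as np

def angle_measures(angles_deg):
    """Derive AngleMean and AngleVariance from per-frame rotations.

    `angles_deg` holds the rotation (in degrees) between the current and
    the starting orientation, sampled at 60 frames per second.
    """
    angles = np.asarray(angles_deg, dtype=float)
    steps = np.abs(np.diff(angles))   # rotation change per frame
    angle_mean = steps.mean()         # AngleMean: rotation speed (deg/frame)
    angle_variance = steps.var()      # AngleVariance: erraticness of rotation
    return angle_mean, angle_variance
```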
To summarize the purposes of the calculated path and angle values: the path indicates whether users walk much or not; if they are not walking much, this might indicate a lack of orientation. The angle, in turn, indicates whether participants make quick or sudden head movements; if they make sudden head movements multiple times, this might again indicate a lack of orientation.
For the cluster recognition task, the following performance measures were defined: time and errors. Time was recorded until the point in time at which participants reported how many clusters they had detected. This performance measure indicates how long participants needed to find the clusters. Time is denoted as Time (in milliseconds). Errors are identified in the sense of a binary decision (true/false): either the number of reported clusters was correct (true) or not (false). Errors are denoted as Errors.
The state version of the State-Trait Anxiety Inventory (STAI) questionnaire31 was used to measure state anxiety, a construct similar to state stress. The questionnaire comprises 20 items and was handed out before the study started, as well as afterwards, to evaluate the changes in state anxiety. For the evaluation of this questionnaire, all positively worded items were flipped (e.g., an answer of '4' becomes a '1'), and all answers were summed up to a final STAI score. The skin conductance was measured for 30 randomly selected participants using the skin conductance measurement device (see Table of Materials)33.
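The STAI scoring rule described above can be sketched as follows. The set of reverse-scored (positively worded) items below is taken from the standard STAI form Y-1 and should be verified against the questionnaire version actually used.

```python
# Positively worded items of the STAI state scale (form Y-1); an
# assumption to be checked against the used questionnaire version.
REVERSED_ITEMS = {1, 2, 5, 8, 10, 11, 15, 16, 19, 20}

def stai_score(answers):
    """Compute the STAI state score.

    `answers` maps the item number (1-20) to the response (1-4).
    """
    total = 0
    for item, value in answers.items():
        # Flip positive items: an answer of 4 becomes 1, 3 becomes 2, etc.
        total += (5 - value) if item in REVERSED_ITEMS else value
    return total
```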
After the two task types had been accomplished, a self-developed questionnaire was handed out at the end of the study to ask for the participants' feedback. The questionnaire is shown in Table 1. Furthermore, a demographic questionnaire asked about the gender, age, and education of all participants.
Overall Study Procedure and Study Information
The overall study procedure is illustrated in Figure 9. 60 participants joined the study. They were mostly recruited at Ulm University and software companies from Ulm. The participating students came mainly from the fields of computer science, psychology, and physics. Ten participants were female and 50 were male.
Based on the mental rotation pretest, 31 participants were categorized as low performers, while 29 were categorized as high performers. Specifically, 7 females and 24 males were categorized as low performers, while 3 females and 26 males were categorized as high performers. For the statistical evaluations, three software tools were used (see Table of Materials).
Frequencies, percentages, means, and standard deviations were calculated as descriptive statistics. Low and high performers were compared on baseline demographic variables using Fisher's exact tests and t-tests for independent samples. For RQ1-RQ5, linear multilevel models with full maximum likelihood estimation were computed. Two levels were included: level one represents the repeated assessments (either in outlier detection or cluster recognition), and level two the participants. The performance measures (except errors) were the dependent variables in these models. For RQ1, Fisher's exact tests were additionally used for the error probabilities. For RQ3, performance in time with spatial sounds versus no sounds was investigated (sound vs. no sound was included as a predictor in the models). The STAI scores were evaluated using t-tests for dependent samples for RQ4. For RQ5, the effect of the 2D application versus the used mixed reality smartglasses (see Table of Materials) was investigated, using McNemar's test for the error probability. All statistical tests were performed two-tailed; the significance level was set to P < .05.
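As an illustration, such a two-level model can be reproduced, for example, in Python with statsmodels (the study itself used the tools listed in the Table of Materials). The file and column names below are hypothetical; the sketch assumes a long-format table with one row per repeated assessment.

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import ttest_rel

# Hypothetical long-format data: columns participant, task_no,
# group (low/high performer), and time (ms).
df = pd.read_csv("outlier_task_long.csv")

# Level 1: repeated assessments; level 2: participants (random intercept).
# reml=False requests full maximum likelihood estimation.
model = smf.mixedlm("time ~ task_no + group", df, groups=df["participant"])
result = model.fit(reml=False)
print(result.summary())

# RQ4: dependent-samples t-test on the pre/post STAI scores (two-tailed),
# with `pre` and `post` being per-participant score arrays.
# t, p = ttest_rel(pre, post)
```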
The skin conductance results have not been analyzed and are subject to future work. Importantly, the authors revealed in another study with the same device that additional considerations are required24.
For the mental rotation test, the differences in the test results between participants were used to distinguish low from high performers. For the spatial ability test, all participants showed good scores and were therefore all categorized as high performers with respect to their spatial abilities.
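The protocol does not specify the exact cutoff used for the mental rotation categorization; a median split on the number of correct answers, as sketched below, is one plausible reading that yields a roughly even split.

```python
import numpy as np

def split_performers(correct_answers):
    """Return a boolean array where True marks a high performer.

    A median split is an assumption; the protocol only states that the
    differences in the mental rotation results were used.
    """
    scores = np.asarray(correct_answers, dtype=float)
    return scores > np.median(scores)
```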
First, important characteristics of the participants are summarized: low and high performers in mental rotation showed no significant differences in their baseline variables (gender, age, and education). Descriptively, the low performers had a higher percentage of female participants than the high performers, and the high performers were younger than the low performers. Table 2 summarizes the characteristics of the participants.
Regarding the results for RQ1: for the cluster recognition task, low and high performers did not differ significantly, neither for the 2D application (4 errors for low and 2 errors for high performers) nor for the 3D approach (8 errors for low and 2 errors for high performers). For the outlier detection task, high performers were significantly faster than low performers and required a shorter walking distance to solve the tasks. Table 3 summarizes the detailed results for the outlier detection task.
Regarding the results for RQ2: significant results emerged only for the outlier detection task. The BoundingBox, PathLength, PathVariance, PathMean, AngleVariance, and AngleMean increased significantly from task to task (see Table 4). The recorded time, in turn, did not change significantly from task to task using the mixed reality smartglasses (see Table of Materials).
Regarding the results for RQ3: with spatial sounds, participants were able to solve the outlier detection tasks more quickly than without spatial sounds (see Table 5).
Regarding the results for RQ4: at the pre-assessment, the average STAI score was M = 44.58 (SD = 4.67); at the post-assessment, it was M = 45.72 (SD = 4.43). This change did not attain statistical significance (p = .175). Descriptive statistics of the answers to the self-developed questionnaire are presented in Figure 10.
Regarding the results for RQ5: the mixed reality smartglasses (see Table of Materials) showed significantly faster cluster recognition times than the desktop computer (see Table 6). However, the speed advantage of the mixed reality smartglasses was rather small (i.e., in the range of milliseconds).
Finally, note that the data of this study can be found at36.
Figure 1: Investigated aspects on the scale Interaction versus Novelty. The figure shows the used measures and their novelty with respect to the interaction level.
Figure 2: Pictures of the study room. Two pictures of the study room are presented.
Figure 3: Detected outlier. The screenshot shows a detected outlier.
Figure 4: Example of the mental rotation test. The screenshot shows the 3D objects participants were confronted with; i.e., two out of five objects in different positions with the same object structure had to be detected. This figure has been modified from a previous work35.
Figure 5: Setting for the spatial ability test. In (A), the audio configuration for the task Back is shown, while, in (B), the schematic user interface of the test is shown. This figure has been modified from a previous work35.
Figure 6: Illustration of the setting for the outlier detection task. Three major aspects are shown: first, the outliers; second, the performance measures; third, the way the sound support was calculated. This figure has been modified from a previous work35.
Figure 7: Illustration of the setting for the cluster recognition task. Scenarios A-C give a better impression; participants had to change their gaze to identify clusters correctly. This figure has been modified from a previous work35.
Figure 8: Illustration of the setting for the cluster recognition task in Matlab. The figure illustrates clusters provided in Matlab, which was the basis for the 2D desktop application.
Figure 9: Overall study procedure at a glance. This figure presents the steps participants had to accomplish, in chronological order. This figure has been modified from a previous work35.
Figure 10: Results of the self-developed questionnaire (see Table 1). The results are shown using box plots. This figure has been modified from a previous work35.
# | Question | Target | Scale | Meaning |
1 | As how stressful did you experience wearing the glasses? | Wearing | 1-10 | 10 means high, 1 means low |
2 | How stressful was the outlier’s task? | Outliers | 1-10 | 10 means high, 1 means low |
3 | As how stressful did you experience the spatial sounds? | Sound | 1-10 | 10 means high, 1 means low |
4 | How stressful was the task finding clusters in Mixed Reality? | Cluster MR | 1-10 | 10 means high, 1 means low |
5 | How stressful was the task finding clusters in the desktop approach? | Cluster DT | 1-10 | 10 means high, 1 means low |
6 | How stressful was the usage of the voice commands? | Voice | 1-10 | 10 means high, 1 means low |
7 | Did you feel supported by the spatial sounds? | Sound | 1-10 | 10 means high, 1 means low |
Table 1: Self-developed questionnaire for user feedback. It comprises seven questions. For each question, participants had to choose a value on a scale from 1 to 10, where 1 means a low value and 10 a high value.
Variable | Low performer (n=31) | High performer (n=29) | P Value |
Gender, n (%) | | | |
Female | 7 (23%) | 3 (10%) | |
Male | 24 (77%) | 26 (90%) | .302 (a) |
Age Category, n (%) | | | |
<25 | 1 (3%) | 5 (17%) | |
25-35 | 27 (87%) | 21 (72%) | |
36-45 | 0 (0%) | 2 (7%) | |
46-55 | 1 (3%) | 0 (0%) | |
>55 | 2 (6%) | 1 (3%) | .099 (a) |
Highest Education, n (%) | | | |
High School | 3 (10%) | 5 (17%) | |
Bachelor | 7 (23%) | 6 (21%) | |
Master | 21 (68%) | 18 (62%) | .692 (a) |
Mental Rotation Test, Mean (SD) | | | |
Correct Answers | 3.03 (1.40) | 5.31 (0.76) | .001 (b) |
Wrong Answers | 2.19 (1.47) | 1.21 (0.56) | .000 (b) |
Spatial Hearing Test, Mean (SD) (c) | | | |
Correct Answers | 4.39 (1.09) | 4.31 (1.00) | .467 (b) |
Wrong Answers | 1.61 (1.09) | 1.69 (1.00) | .940 (b) |
(a) Fisher's Exact Test; (b) Two-sample t-test; (c) SD = Standard Deviation |
Table 2: Participant sample description and comparison between low and high performers in baseline variables. The table shows data to the three demographic questions on gender, age, and education. In addition, the results of the two pretests are presented.
Variable | Estimate | SE (a) | Result |
BoundingBox for low performer across tasks | 2,224 | .438 | t(60.00) = 5.08; p<.001 |
Alteration of BoundingBox for high performer across tasks | +.131 | .630 | t(60.00) = .21; p=.836 |
Time for low performer across tasks | 20,919 | 1,045 | t(60.00) = 20.02; p<.001 |
Alteration of Time for high performer across tasks | -3,863 | 1,503 | t(60.00) = -2.57; p=.013 |
Pathlength for low performer across tasks | 5,637 | .613 | t(60.00) = 9.19; p<.001 |
Alteration of Pathlength for high performer across tasks | -1,624 | .882 | t(60.00) = -1.84; p=.071 |
PathVariance for low performer across tasks | 4.3E-4 | 4.7E-5 | t(65.15) = 9.25; p<.001 |
Alteration of PathVariance for high performer across tasks | +4.3E-6 | 6.7E-5 | t(65.15) = .063; p=.950 |
PathMean for low performer across tasks | .0047 | 5.3E-4 | t(60.00) = 8.697; p<.001 |
Alteration of PathMean for high performer across tasks | +3.8E-5 | 7.7E-4 | t(60.00) = .05; p=.960 |
AngleVariance for low performer across tasks | .0012 | 7.3E-5 | t(85.70) = 16.15; p<.001 |
Alteration of AngleVariance for high performer across tasks | -2.7E-5 | 1.0E-4 | t(85.70) = -.26; p=.796 |
AngleMean for low performer across tasks | .015 | .001 | t(60.00) = 14.27; p<.001 |
Alteration of AngleMean for high performer across tasks | -3.0E-4 | 1.5E-3 | t(60.00) = -.20; p=.842 |
(a) SE = Standard Error |
Table 3: Results of the Multilevel Models for RQ1 (Outlier Detection Using the Smartglasses). The table shows the statistical results of RQ1 for the outlier detection task (for all performance measures).
Variable | Estimate | SE (a) | Result |
BoundingBox at first task | .984 | .392 | t(138.12) = 2.51; p=.013 |
Alteration of BoundingBox from task to task | +.373 | .067 | t(420.00) = 5.59; p<.001 |
Time at first task | 19,431 | 1,283 | t(302.08) = 15.11; p<.001 |
Alteration of Time from task to task | -.108 | .286 | t(420.00) = -.37; p=.709 |
Pathlength at first task | 3,903 | .646 | t(214.81) = 6.05; p<.001 |
Alteration of Pathlength from task to task | +.271 | .131 | t(420.00) = 2.06; p=.040 |
PathVariance at first task | 3.1E-4 | 3.7E-5 | t(117.77) = 8.43; p<.001 |
Alteration of PathVariance from task to task | +3.5E-5 | 4.5E-6 | t(455.00) = 7.90; p<.001 |
PathMean at first task | .0033 | 4.2E-4 | t(88.98) = 7.66; p<.001 |
Alteration of PathMean from task to task | +4.1E-4 | 5.2E-5 | t(420.00) = 7.81; p<.001 |
AngleVariance at first task | .001 | 5.7E-5 | t(129.86) = 17.92; p<.001 |
Alteration of AngleVariance from task to task | +4.1E-5 | 6.5E-6 | t(541.75) = 6.34; p<.001 |
AngleMean at first task | .0127 | 8.1E-4 | t(82.17) = 15.52; p<.001 |
Alteration of AngleMean from task to task | +6.1E-4 | 9.0E-5 | t(420.00) = 6.86; p<.001 |
(a) SE = Standard Error |
Table 4: Results of the Multilevel Models for RQ2 (Outlier Detection Using the Smartglasses). The table shows the statistical results of RQ2 for the outlier detection task (for all performance measures).
Variable | Estimate | SE (a) | Result |
BoundingBox without sound across tasks | 2,459 | .352 | t(93.26) = 6.98; p<.001 |
Alteration of BoundingBox with sound across tasks | -.344 | .316 | t(420.00) = -1.09; p=.277 |
Time without sound across tasks | 20,550 | 1,030 | t(161.17) = 19.94; p<.001 |
Alteration of time with sound across tasks | -2,996 | 1,319 | t(420.00) = -2.27; p=.024 |
Pathlength without sound across tasks | 5,193 | .545 | t(121.81) = 9.54; p<.001 |
Alteration of Pathlength with sound across tasks | -.682 | .604 | t(420.00) = -1.13; p=.260 |
PathVariance without sound across tasks | .0004 | 3.5E-5 | t(79.74) = 12.110; p<.001 |
Alteration of PathVariance with sound across tasks | +1.3E-5 | 2.2E-5 | t(429.20) = .592; p=.554 |
PathMean without sound across tasks | .005 | 4.0E-4 | t(73.66) = 11.35; p<.001 |
Alteration of PathMean with sound across tasks | +1.4E-4 | 2.5E-4 | t(420.00) = .56; p=.575 |
AngleVariance without sound across tasks | .0012 | 5.4E-5 | t(101.32) = 21.00; p<.001 |
Alteration of AngleVariance with sound across tasks | +3.3E-5 | 3.1E-5 | t(648.56) = 1.07; p=.284 |
AngleMean without sound across tasks | .0145 | 7.8E-4 | t(70.17) = 18.51; p<.001 |
Alteration of AngleMean with sound across tasks | +6.0E-4 | 4.3E-4 | t(420.00) = 1.39; p=.166 |
(a) SE = Standard Error |
Table 5: Results of the Multilevel Models for RQ3 (Outlier Detection Using the Smartglasses). The table shows the statistical results of RQ3 for the outlier detection task (for all performance measures).
Variable | Estimate | SE (a) | Result |
Time with desktop across tasks | 10,536 | .228 | t(156.43) = 46.120; p<.001 |
Alteration of time with HoloLens across tasks | -.631 | .286 | t(660.00) = -2.206; p=.028 |
(a) SE = Standard Error |
Table 6: Results of the Multilevel Models for RQ5 (Cluster Recognition Using the Smartglasses). The table shows the statistical results of RQ5 for the cluster recognition task (performance measure: time).
Regarding the developed mixed reality smartglasses application (see Table of Materials), two aspects were particularly beneficial. On the one hand, the use of spatial sounds for the outlier detection task was perceived positively (see the results of RQ3). On the other hand, the use of voice commands was also perceived positively (see Figure 10).
Regarding the study participants, although the number of recruited participants was rather small for an empirical study, the number is competitive compared to many other works. Nevertheless, a larger-scale study is planned based on the shown protocol. As the protocol showed its feasibility for 60 participants, more participants are expected to reveal no further challenges. It was discussed that the selection of participants could be broader (in the sense of the fields the participants come from) and that the number of baseline variables to distinguish between high and low performers could be higher. On the other hand, if these aspects are changed to higher numbers, the protocol itself does not have to be changed profoundly.
In general, the revealed limitations do not affect the conduct of a study based on the protocol shown in this work; they only affect the recruitment and the questions used for the demographic questionnaire. However, one limitation of this study is nevertheless important: the overall time required to finish the experiment for one participant is high. On the other hand, as the participants complained neither about the wearing comfort nor that the test device burdened them too much, the time for conducting the overall protocol for one participant can be considered acceptable. Finally, in a future experiment, several aspects have to be added to the protocol. In particular, the outlier detection task should also be evaluated in the 2D desktop application. Furthermore, hardware devices other than the used mixed reality smartglasses (see Table of Materials) must also be evaluated. However, the protocol seems to be beneficial in a broader sense.
The following major insights were gained from the presented protocol. First, it showed its feasibility for evaluating immersive analytics in a mixed-reality solution. Specifically, the used mixed reality smartglasses (see Table of Materials) revealed their feasibility for evaluating immersive analytics in a mixed-reality application for Industry 4.0 scenarios. Second, the comparison of the developed mixed reality smartglasses application with a 2D desktop application was helpful to investigate whether the mixed-reality solution can outperform an application that does not make use of VR techniques. Third, the measurement of physiological parameters or vital signs should always be considered in such experiments. In this work, stress was measured using a questionnaire and a skin conductance device. Although the latter worked properly from a technical point of view, the authors revealed in another study with the same device that additional considerations are required24. Fourth, the spatial ability test and the separation of high and low performers were advantageous. In summary, although the presented protocol seems complex at first glance (see Figure 9), it showed its usefulness both technically and with respect to the obtained results.
As the detection of outliers and the recognition of clusters are typical tasks in the evaluation of many high-dimensional data sets in Industry 4.0 scenarios, their use in an empirical study is representative of this field of research. The protocol showed that these scenarios can be well integrated into a usability study on immersive analytics. Therefore, the used setting can be recommended for other studies in this context.
As the outcome of the shown study demonstrated that a mixed-reality solution based on the utilized smartglasses (see Table of Materials) is useful for investigating immersive analytics for Industry 4.0 scenarios, the protocol might be used for other usability studies in the given context as well.
The authors have nothing to disclose.
The authors have nothing to acknowledge.
edaMove | movisens | ||
HoloLens | Microsoft | ||
Matlab R2017a | MathWorks | ||
RPY2 | GNU General Public License v2 or later (GPLv2+) | https://pypi.org/project/rpy2/ |
SPSS 25.0 | IBM |