Body movements
Our results revealed differences in head and hand movements depending on the user perspective and the three exercise types. These differences are of particular importance for VR-based rehabilitation, where the patient’s condition constrains the range of motion that the patient can perform.
The user perspective had the main effect on head movements: the total distance traveled by the head over 20 reaching attempts was significantly longer in the first-person perspective (M = 3.52, SD = 1.30) than in the third-person perspective (M = 2.41, SD = 0.70). Compared to the third-person perspective, in which the avatar and targets appear in front of the user, the first-person perspective required the subjects to explore a wider area of screen space, which implicitly required the use of peripheral vision. From the perspective of VR-based rehabilitation, the first-person perspective may be beneficial when the patient requires broader spatial exploration, as in the rehabilitation of spatial neglect [11].
For upper-extremity movements, the first-person perspective yielded longer trajectories in the higher-cognitive-load exercises (i.e., flipped). In contrast, upper-extremity movements in the normal and trail exercise settings did not differ significantly between the two user perspectives, revealing no effect on trajectory distance from the visual cues (the trails and the shortest path shown between the hand and the target).
User performance
Our results indicate that the user perspective alone is not the main factor influencing user performance; rather, it is the combination of user perspective and cognitive load. As expected, we found that the flipped exercises (higher cognitive load) required more time to complete (Fig. 3) in both user perspectives and had lower success rates (Fig. 6) compared to the normal and trail exercises. Moreover, for this higher-cognitive-load exercise, subjects took more time to complete the task and had lower task success rates in the first-person perspective than in the third-person perspective. For normal exercises, there were no differences between user perspectives. For trail exercises, subjects took longer to complete the task in the third-person perspective than in the first-person perspective (the opposite of the flipped exercises) and had similar success rates between the two perspectives. Similar to Salamin [6], user perspective alone did not influence completion time, and similar to Covaci et al. [7], user perspective alone did not influence task success.
We believe that the opposite completion-time results between the higher-cognitive-load task and the trail task stem from two factors: movement feedback and environmental feedback. In the third-person perspective, the subjects could instantly observe wrong actions in the avatar and make corrections accordingly, whereas in the first-person perspective the restricted field of view did not afford this immediate feedback. This movement feedback was particularly useful for the higher-cognitive-load task, the most unfamiliar to the subjects. As for environmental feedback, in the trail exercise the displayed segment connecting the closest hand to the target was more visible in the first-person perspective and thus caught the subject’s attention faster. These visual cues may also help redirect people’s attention, especially for patients with spatial neglect. In contrast to Salamin’s preference for the first-person perspective in precision tasks [5], we believe that third-person feedback is preferable for precision tasks with a higher cognitive load that require unfamiliar movements and are performed in an environment with field-of-view constraints. This recommendation may not apply to other cognitive tasks, such as memory or attention tasks, unless they are mixed with unfamiliar movements in a dual-task setup.
Level of engagement
In this study, we found that the user perspective had no significant effect on the degree of user-perceived engagement. This finding is consistent with the conflicting results reported by Denisova and Cairns [12] (first-person perspective more immersive) and by Salamin [5] (third-person perspective more engaging). We believe that the low degree of task difficulty (for a healthy subject) and the static foot positioning required by the rehabilitation application may have influenced the subjects’ sense of engagement. As suggested by Faria et al. [13], exercises requiring more cognitive effort and attention may be more engaging. On the other hand, the study of Schuurink and Toet [14] suggests that the level of stimulation induced by a virtual reality environment is independent of the choice between first- and third-person perspectives; our results support Schuurink and Toet’s conclusion in the context of tasks with relatively low cognitive effort.
We looked into the self-perceived level of engagement because of its potential to improve a patient’s experience with rehabilitation tasks. To measure it, we used the definition of engagement from the gaming literature, while implementing our experiments in an immersive VR environment. We note that the term ‘immersion’ is overloaded between gaming research and VR research. Gaming research operates under the definition given by Jennet et al. [4], who describe the different levels of engagement as immersion, an experience felt by gamers. In the gaming literature, immersion is therefore used interchangeably with engagement and involvement [15]. Jennet et al. further created the IEQ questionnaire to assess immersion (defined as engagement), based on components that influence the gamer experience. In contrast, VR research uses the term immersion to describe a property of the technology [16] that influences the level of presence — the “illusion of being there, notwithstanding that you know for sure that you are not” [17]. Each of the two definitions of immersion is equally influential in its field of origin. Because we seek to measure game engagement, our questionnaire followed the Jennet et al. [4] definition. This definition acknowledges the influence of presence on engagement and suggests that presence appears only at the deeper levels of engagement.
Depth perception errors
Based on the questionnaire feedback, in the third-person perspective a few subjects experienced depth misperceptions (with no further impact) for objects placed to the sides and close to the frontal plane. Two of the subjects reported that in the third-person view the displayed scene seemed to be rendered in 2D; another subject mistakenly thought that the objects were behind the avatar. The lack of geometrical visual cues in the spherical targets may have contributed to these isolated depth perception issues, even though we consistently provided shadows as a cue. For example, Powell et al. [18] suggest using complex geometries (more complex than our simple sphere geometry) to provide additional visual cues for reaching tasks. Nevertheless, none of these reported misperceptions appear to have resulted in lower user performance than in the first-person view.
Conversely, no subjects reported depth-perception errors in the first-person perspective, despite lower user performance results under specific settings. A recent study [19] in CAVE-like environments suggested that targets near the screen — which hence appear large to the user (as in the first-person perspective) — are easier to interpret than objects far into the scene (as in the third-person perspective). Results from the Bruder et al. study [19] further report up to 50% misinterpretations of distance for objects that are farther away. We believe their findings help explain the difference in our user reports of depth-perception errors.
Assumptions and limitations
Concerning assumptions and limitations, our study used a specific infrastructure that is currently not readily accessible to the public due to cost, technical support, and space requirements. We note, however, that the CAVE2 merely serves as a vehicle for testing our hypothesis, since it allows both first-person and third-person large-screen feedback display; the CAVE2’s many other capabilities were not used in this study. We believe these findings would transfer to any other immersive system that allows both first-person and third-person large-screen display. Given the strong need in domains like rehabilitation, such systems are within the reach of rehabilitation clinics. Our findings may not be directly transferable to other platforms, for example, head-mounted displays. However, immersive large environments have specific advantages over head-mounted displays (e.g., non-intrusive equipment) that make such environments relevant to rehabilitation clinics. Last but not least, as suggested by Levin et al. [20], further evaluation of the performance, quality, and surrogate aspects of motor behavior is needed to analyze the fidelity of VR environment tasks to physical environment tasks. We note, nevertheless, the deliberate simplicity of the exercise set we designed and tested in this work, in collaboration with a domain expert.
For our study design, even though we randomized the starting user perspective per subject, the order of the exercises with different target sizes was not randomized, which could introduce confounding effects. Moreover, our study considers engagement from the perspective of a self-reported “experience that produces a lack of awareness of time and the real world” [4]. We do not consider other aspects of the virtual reality experience, such as embodiment (the sensation of being inside a body) [21].
The success score median of 20 (out of 20) across all exercises and both perspectives indicates that our game had an easy level of difficulty for healthy subjects. We corroborated this with the results of the user survey: on a 1-to-5 Likert scale from easy to difficult, subjects scored the first-person perspective as 2.04 ± 0.92 and the third-person perspective as 2.21 ± 0.92. These results are not surprising, as the arm-reaching exercises were designed for physical therapy patients but tested on healthy subjects; rehabilitation patients would find the exercises more difficult. Furthermore, we allowed the healthy subjects eight seconds to target each object, when in fact they only needed 2.3 ± 1.1 seconds on average. These numerical results may not transfer to a population of stroke subjects. In addition, even though we developed our application in collaboration with rehabilitation experts, further studies with the target population are required to evaluate the application and the transferability of our results. Finally, while using smaller target sizes did not increase the difficulty, it is possible that objects smaller than the ones we used could make the game more challenging.