Augmented Reality and Robotics: A Survey and Taxonomy for AR-enhanced Human-Robot Interaction and Robotic Interfaces

This paper contributes a taxonomy of augmented reality and robotics based on a survey of 460 research papers. Augmented and mixed reality (AR/MR) have emerged as a new way to enhance human-robot interaction (HRI) and robotic interfaces (e.g., actuated and shape-changing interfaces). Recently, an increasing number of studies in HCI, HRI, and robotics have demonstrated how AR enables better interactions between people and robots. However, this research often remains focused on individual explorations, and key design strategies and research questions are rarely analyzed systematically. In this paper, we synthesize and categorize this research field along the following dimensions: 1) approaches to augmenting reality; 2) characteristics of robots; 3) purposes and benefits; 4) classification of presented information; 5) design components and strategies for visual augmentation; 6) interaction techniques and modalities; 7) application domains; and 8) evaluation strategies. We formulate key challenges and opportunities to guide and inform future research in AR and robotics.


INTRODUCTION
As robots become more ubiquitous, designing the best possible interaction between people and robots is becoming increasingly important. Traditionally, interaction with robots relies on the robot's internal physical or visual feedback capabilities, such as the robot's movements [106,280,426,490], gestural motion [81,186,321], gaze outputs [22,28,202,307], physical transformation [170], or visual feedback through lights [34,60,403,427] or small displays [129,172,446]. However, such modalities have several key limitations. For example, the robot's form factor cannot be easily modified on demand, so it is often difficult to provide expressive physical feedback that goes beyond the robot's internal capabilities [450]. While visual feedback such as lights or displays can be more flexible, the expression of such visual outputs is still bound to the fixed physical design of the robot. For example, it can be challenging to present expressive information on a small display of fixed size, which cannot show data or information associated with the physical space outside the screen. Augmented reality (AR) interfaces promise to address these challenges, as AR enables us to design expressive visual feedback without many of the constraints of physical reality. In addition, AR can present visual feedback in one's line of sight, tightly coupled with the physical interaction space, which reduces the user's cognitive load of switching context and attention between the robot and an external display. Recent advances in AR have opened up exciting new opportunities for human-robot interaction research, and over the last decades, an increasing number of works have started investigating how AR can be integrated into robotics to augment robots' inherent visual and physical output capabilities.
However, often these research projects are individual explorations, and key design strategies, common practices, and open research questions in AR and robotics research are rarely analyzed systematically, especially from an interaction design perspective. With the recent proliferation of this research field, we see a need to synthesize the existing works to facilitate further advances in both HCI and robotics communities.
In this paper, we review a corpus of 460 papers to synthesize a taxonomy for AR and robotics research. In particular, we synthesized the research field into the following design space dimensions (with a brief visual summary in Figure 1): 1) approaches to augmenting reality for HRI; 2) characteristics of augmented robots; 3) purposes and benefits of the use of AR; 4) classification of presented information; 5) design components and strategies for visual augmentation; 6) interaction techniques and modalities; 7) application domains; and 8) evaluation strategies. Our goal is to provide a common ground and understanding for researchers in the field, which includes both AR-enhanced human-robot interaction [151] and robotic user interface [37,218] research (such as actuated tangible [349] and shape-changing interfaces [16,87,359]). We envision this paper can help researchers situate their work within the large design space and explore novel interfaces for AR-enhanced human-robot interaction (AR-HRI). Furthermore, our taxonomy and detailed design space dimensions (together with the comprehensive index linking to related work) can help readers more rapidly find practical AR-HRI techniques, which they can then use, iterate on, and evolve into their own future designs. Finally, we formulate open research questions, challenges, and opportunities to guide and stimulate the research communities of HCI, HRI, and robotics.

SCOPE, CONTRIBUTIONS, AND METHODOLOGY
2.1 Scope and Definitions
The topic covered by this paper is "robotic systems that utilize AR for interaction". In this section, we describe this scope in more detail and clarify what is included and what is not.
2.1.1 Human-Robot Interaction and Robotic Interfaces. "Robotic systems" can take different forms, from traditional industrial robots to self-driving cars or actuated user interfaces. In this paper, we do not limit the scope of robots and include any type of robotic or actuated system that is designed to interact with people. More specifically, our paper also covers robotic interface [37,218] research. Here, robotic interfaces refer to interfaces that use robots and/or actuated systems as a medium for human-computer interaction. This includes actuated tangible interfaces [349], adaptive environments [154,399], swarm user interfaces [235], and shape-changing interfaces [16,87,359].
2.1.2 AR vs. VR. Among HRI and robotic interface research, we specifically focus on AR, not VR. In the robotics literature, VR has been used for many different purposes, such as interactive simulation [173,276,277] or haptic environments [291,421,449]. However, our focus is on visual augmentation in the real world to enhance real robots in physical space, thus we specifically investigate systems that use AR for robotics.

2.1.3 What is AR.
The definition of AR can vary based on context [405]. For example, Azuma defines AR as "systems that have the following three characteristics: 1) combines real and virtual, 2) interactive in real time, 3) registered in 3D" [32]. Milgram and Kishino also describe this with the reality-virtuality continuum [293]. More broadly, Bimber and Raskar [41] also discuss spatial augmented reality (SAR) as one of the categories of AR. In this paper, we take a broader view of AR and include any system that augments physical objects or surrounding environments in the real world, regardless of the technology used.

2.2 Contributions
Augmented reality in the field of robotics has been the scope of other related review papers (e.g., [100,155,281,356]) that our taxonomy expands upon. Most of these earlier papers reviewed key application use cases in the research field. For example, Makhataeva and Varol surveyed example applications of AR for robotics in a 5-year timeframe [281], and Qian et al. reviewed AR applications for robotic surgery in particular [356]. From the HRI perspective, Green et al. provide a literature review of collaborative HRI [155], which focuses in particular on collaboration by means of AR technologies. More recently, virtual, augmented, and mixed reality for human-robot interaction (VAM-HRI) has also been the topic of workshops [466].
First, we present a taxonomy with a novel set of design space dimensions, providing a holistic view based on the different dimensions unifying the design space, with a focus on interaction and visual augmentation design perspectives. Second, our paper also systematically covers a broader scope of HCI and HRI literature, including robotic, actuated, and shape-changing user interfaces. This field is increasingly popular in HCI [16,87,349,359], but not well explored in terms of its combination with AR. By incorporating this research, our paper provides a more comprehensive view to position and design novel AR/MR interactions for robotic systems. Third, we also discuss open research questions and opportunities that facilitate further research in this field. We believe that our taxonomy, with its design classifications and their insights, and the articulation of open research questions, will be an invaluable tool for providing a common ground and understanding when designing AR/MR interfaces for HRI. This will help researchers identify or explore novel interactions. Finally, we also compiled a large corpus of research literature using our taxonomy as an interactive website (https://ilab.ucalgary.ca/ar-and-robotics/), which can provide a more content-rich, up-to-date, and extensible literature review. Inspired by similar attempts in personal fabrication [5,36], data physicalization [1,192], and material-based shape-changing interactions [6,353], our website, along with this paper, could provide similar benefits to the broader community of both researchers and practitioners.

2.3.1 Dataset and Inclusion Criteria.
To collect a representative set of AR and robotics papers, we conducted a systematic search in the ACM Digital Library, IEEE Xplore, MDPI, Springer, and Elsevier. Our search terms include the combination of "augmented reality" AND "robot" in the title and/or author keywords since 2000. We also searched for synonyms of each keyword, such as "mixed reality", "AR", and "MR" for augmented reality, and "robotic", "actuated", and "shape-changing" for robot. This gave us a total of 925 papers after removing duplicates. Then, four authors individually looked at each paper to exclude out-of-scope papers, which, for example, only focus on AR-based tracking but not on visual augmentation, or were concept or position papers, etc. After this process, we obtained 396 papers in total. To complement this keyword search, we also identified an additional 64 relevant papers by leveraging the authors' expertise in HCI, HRI, and robotic interfaces. By merging these papers, we finally selected a corpus of 460 papers for our literature review.
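The keyword search described above can be sketched as a small script. This is our own simplification for illustration: the query-string format and the title-based de-duplication are assumptions, not the authors' actual tooling.

```python
from itertools import product

# Synonym sets from the inclusion criteria described above.
AR_TERMS = ["augmented reality", "mixed reality", "AR", "MR"]
ROBOT_TERMS = ["robot", "robotic", "actuated", "shape-changing"]

def build_queries():
    """All AND-combinations of AR and robot synonyms (16 queries total)."""
    return [f'"{ar}" AND "{robot}"' for ar, robot in product(AR_TERMS, ROBOT_TERMS)]

def deduplicate(records):
    """Drop duplicate papers by normalized title (hypothetical record format:
    a dict with a 'title' key)."""
    seen, unique = set(), []
    for rec in records:
        # Normalize: lowercase, keep only alphanumerics.
        key = "".join(ch for ch in rec["title"].lower() if ch.isalnum())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Running the 16 queries per database and de-duplicating the merged results corresponds to the 925-paper starting set described above.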
While our systematic compilation of this corpus provides an in-depth view into the research space, this set cannot be a complete or exhaustive list of the domain. The boundaries and scope of our corpus may not be clear cut, and as with any selection of papers, many papers sat right at the boundaries of our inclusion/exclusion criteria. Nevertheless, our focus was on the development of a taxonomy, and this corpus serves as a representative subset of the most relevant papers. We aim to address this inherent limitation of any taxonomy by making our coding and dataset open-source, available for others to iterate and expand upon.
2.3.2 Analysis and Synthesis. The dataset was analyzed through a multi-step process. One of the authors conducted open coding on a small subset of our sample to identify a first approximation of the dimensions and categories within the design space. Next, all authors reflected upon this initial design space classification to discuss the consistency and comprehensiveness of the categorization, after which categories were merged, expanded, or removed. Next, three other co-authors performed systematic coding with individual tagging to categorize the complete dataset. Finally, we reflected upon the individual tagging to resolve discrepancies and obtain the final coding results.
In the following sections, we present our results and findings of this classification by using color-coded text and figures. We provide a list of key citations directly within the figures, with the goal of facilitating lookup of relevant papers within each dimension and all of the corresponding sub-categories. Furthermore, in the appendix of this paper we include several tables with a complete compilation of all citations and counts of the papers in our corpus that fall within each of the categories and sub-categories of the design space, which we hope will help researchers to more easily find relevant papers (e.g., finding all papers that use AR for "improving safety" [76,158,197,198,209,262,285,302,372,443,450,479,481], that "augment surroundings" of robots, or that provide visual feedback of "paths and trajectories").

APPROACHES TO AUGMENTING REALITY IN ROBOTICS
In this section, we discuss the different approaches to augmenting reality in robotics (Figure 3). To classify how reality is augmented, we propose to categorize along two dimensions: First, we categorize approaches based on the placement of the augmented reality hardware (i.e., where the optical path is overridden with digital information). For our purpose, we adopt and extend Bimber and Raskar's [41] classification in the context of robotics research. Here, we propose three different locations: 1) on-body, 2) on-environment, and 3) on-robot. Second, we classify based on the target location of visual augmentation, i.e., what is augmented. We can categorize this as 1) augmenting robots or 2) augmenting surroundings. Given these two dimensions, we can map the existing works into the design space (Figure 3 Right). Walker et al. [450] include augmenting user interfaces (UIs) as another category. Since the research that has been done in this area can be roughly considered augmenting the environment, we decided not to include it as a separate category.
Approach-1. Augment Robots: AR is used to augment robots themselves by overlaying or anchoring additional information on top of the robots (Figure 3 Top).
-On-Body: The first category augments robots through on-body AR devices. These can be either 1) head-mounted displays (HMDs) [197,372,450] or 2) mobile AR interfaces [76,209]. For example, VRoom [197,198] augments a telepresence robot's appearance by overlaying a remote user. Similarly, Young et al. [481] demonstrated adding an animated face onto a Roomba robot to show expressive emotion on mobile AR devices.
-On-Environment: The second category augments robots with devices embedded in the surrounding environment. Technologies often used with this approach include 1) environment-attached projectors [21] or 2) see-through displays [243]. For example, Drone-SAR [96] shows how we can augment the drone itself with projection mapping. Showing overlaid information on top of robotic interfaces also falls into this category. Similarly, shape-shifting walls [431] or handheld shape-changing interfaces [258,374] are also directly augmented with overlaid animations or information.
-On-Robot: In the third category, the robots augment their own appearance, which is unique to AR and robotics research, compared to Bimber and Raskar's taxonomy [41]. For example, Furhat [14] animates a face with a back-projected robot head, so that the robot can augment its own face without an external AR device. The common technologies used are robot-attached projectors [418,436], which augment the robot by using its own body as a screen. Alternatively, robot-attached displays can also fall into this category [445,475].
Approach-2. Augment Surroundings: Alternatively, AR is also used to augment the surroundings of the robots. Here, the surroundings include 1) the surrounding mid-air 3D space, 2) surrounding physical objects, or 3) the surrounding physical environment, such as walls, floors, or ceilings (Figure 3 Bottom).
-On-Body: Similarly, this category augments robots' surroundings through 1) HMDs [372,450], 2) mobile AR devices [76], or 3) handheld projectors [177]. One benefit of HMDs or mobile AR devices is an expressive rendering capability enabled by leveraging 3D graphics and spatial scene understanding. For example, Drone Augmented Human Vision [114] uses HMD-based AR to change the appearance of the wall for remote control of drones. RoMA [341] uses an HMD for overlaying interactive 3D models on a robotic 3D printer.
-On-Environment: In contrast to HMDs or handheld devices, the on-environment approach allows much easier ways to share AR experiences with co-located users. Augmentation can be done through 1) projection mapping [425] or 2) surface displays [163]. For example, Touch and Toys [163] leverages a large surface display to show additional information in the surroundings of robots. Andersen et al. [21] investigate the use of projection mapping to highlight or augment surrounding objects to communicate the robot's intentions. While this approach allows content to be shared among multiple people, its drawback is a fixed location due to the required installed equipment, which may limit flexibility and mobility for outdoor scenarios.
-On-Robot: In this category, the robots themselves augment the surrounding environment. We identified that the common approach is to utilize a robot-attached projector to augment the surrounding physical environment [210,382]. For example, Kasetani et al. [210] attach a projector to a mobile robot to make a self-propelled projector for ubiquitous displays. Moreover, DisplayDrone [382] projects an image onto surrounding walls for on-demand displays. The main benefit of this approach is that the user does not require any on-body or environment-instrumented devices, thus enabling mobile, flexible, and deployable experiences for different situations.
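The two dimensions above (placement of the AR hardware and target of augmentation) form a 3x2 design space. A minimal sketch encoding it as a data structure follows; the example systems and their cell assignments are our reading of the text in this section, not an official mapping.

```python
from enum import Enum

class Placement(Enum):
    """Where the AR hardware sits (extending Bimber and Raskar's classification)."""
    ON_BODY = "on-body"
    ON_ENVIRONMENT = "on-environment"
    ON_ROBOT = "on-robot"

class Target(Enum):
    """What is visually augmented."""
    ROBOT = "augment robots"
    SURROUNDINGS = "augment surroundings"

# Example systems mapped into the 3x2 design space (taken from the text above).
EXAMPLES = {
    "VRoom":        (Placement.ON_BODY,        Target.ROBOT),
    "Drone-SAR":    (Placement.ON_ENVIRONMENT, Target.ROBOT),
    "Furhat":       (Placement.ON_ROBOT,       Target.ROBOT),
    "RoMA":         (Placement.ON_BODY,        Target.SURROUNDINGS),
    "DisplayDrone": (Placement.ON_ROBOT,       Target.SURROUNDINGS),
}

def cell(placement, target):
    """All example systems in one cell of the design space."""
    return [name for name, (p, t) in EXAMPLES.items() if p == placement and t == target]
```

Such an encoding makes it straightforward to query the taxonomy, e.g., `cell(Placement.ON_ROBOT, Target.SURROUNDINGS)` lists systems where the robot projects onto its environment.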

CHARACTERISTICS OF AUGMENTED ROBOTS
Next, we classify research projects based on the characteristics of augmented robots. Possible design space dimensions span 1) the form factor of robots, 2) the relationship between users and robots, 3) the size and scale of robots, and 4) the proximity of interactions (Figure 4).
Dimension-2. Relationship: Research also explores different people-to-robot relationships. In the most common case, one person interacts with a single robot (1:1), but existing research also explores situations where one person interacts with multiple robots (1:m). AR for swarm robots falls into this category [176,235,328,425]. On the other hand, collaborative robots require multiple people to interact with a single robotic interface (n:1) [430] or a swarm of robots (n:m) [328].
Dimension-3. Scale: Augmented robots come in different sizes, along a spectrum from small to large: from a small handheld scale, which can be grasped with a single hand [425], to a tabletop scale, which fits onto a table [257], to a body scale, about the same size as a human body, like industrial robotic arms [27,285]. Large-scale robots are also possible, such as vehicles [2,7,320] or even building construction robots.
Dimension-4. Proximity: Proximity refers to the distance between the user and robots when interaction happens. Interactions can vary across the dimension of proximity, from near to far, along a spectrum between 1) co-located and 2) remote. The proximity of the robots can influence whether the robots are directly touchable [227,316] or situated at a distance [96]. It can also affect how to augment reality, based on whether the robots are visible to the user [171] or out of sight for remote interaction [114].

PURPOSES AND BENEFITS OF VISUAL AUGMENTATION
Visual augmentation has many benefits for effective human-robot interaction. In this section, we categorize the purposes for which visual augmentation is used in robotics research. At a higher level, purposes and benefits can be largely categorized as 1) programming and control, and 2) understanding, interpretation, and communication (Figure 5).
Purpose-1. Facilitate Programming: First, AR interfaces provide powerful assistance to facilitate programming robots [33]. One way to facilitate programming is to simulate programmed behaviors [162], which has been explored since the early 1990s [32,219,294]. For example, GhostAR [51] shows the trajectory of robots to help the user see how the robots will behave. Such visual simulation helps the user program robots in industry applications [358] or home automation [263]. Another aspect of programming assistance is to directly map programs onto the real world. Robot programming often involves interaction with real-world objects, and going back and forth between physical and virtual worlds is tedious and time-consuming. AR interfaces allow the user to directly indicate objects or locations in the physical world. For example, Gong et al. [150] utilize projection-based AR to support the programming of grasping tasks.
Purpose-2. Support Real-time Control and Navigation: Similar to the previous category, AR interfaces facilitate the control, navigation, and teleoperation of robots. In contrast to programming behaviors, this category focuses on the real-time operation of the robot, either remote or co-located. For example, exTouch [209] and PinpointFly [76] allow the user to interactively control robots with visual feedback on a touch screen. AR interfaces also support showing additional information or parameters related to navigation and control. For example, a world in miniature of the physical world [4] or a real-time camera view [171] is used to support remote navigation of drones.
Purpose-4. Communicate Intent: AR interfaces can also help communicate the robot's intention to the user through spatial information. For example, Walker et al. show that AR representations can better communicate a drone's intent, through experiments using three different designs [450]. Similarly, Rosen et al. reveal that AR visualization can better present a robotic arm's intent through spatial trajectories, compared to traditional interfaces [372]. AR interfaces can also be used to indicate the state of robot manipulation, such as indicating a warning or completion of the task [21], or to communicate intent to passersby or pedestrians for wheelchairs [458] or self-driving cars [320].
Purpose-5. Increase Expressiveness: Finally, AR can also be used to augment the robot's expression [3]. For example, Groechel et al. [158] use an AR view to provide virtual arms to a social robot (e.g., the Kuri robot) to enhance its social expressions when communicating with users. Examples include adding facial expressions [481], overlaying remote users [198,401], and adding interactive content [96] onto robots. AR is also a helpful medium to increase the expressiveness of shape-changing interfaces [258]. For example, Sublimate [243] and inFORM [127] use a see-through display or projection mapping, respectively, to provide a virtual surface on a shape display.

CLASSIFICATION OF PRESENTED INFORMATION
This section summarizes the types of information presented in AR interfaces. The categories we identified include 1) the robot's internal information, 2) external information about the environment, 3) plan and activity, and 4) supplemental content (Figure 6).
Information-1. Robot's Internal Information: The first category is the robot's internal information. This can include 1) the robot's internal status, 2) the robot's software and hardware condition, and 3) the robot's internal functionality and capability. Examples include the robot's emotional state for social interaction [158,481], a warning sign when the user's program is wrong [262], the drone's current information such as altitude, flight mode, flight status, and dilution of precision [15,332], and the robot's reachable region to indicate safe and dangerous zones [282]. Showing the robot's hardware components is also included in this category. For example, showing or highlighting physical parts of the robot for maintenance [285,302] also falls into this category.

Information-2. External Information about the Environment:
Another category is external information about the environment. This includes 1) sensor data from internal or external sensors, 2) camera or video feeds, 3) information about external objects, and 4) a depth map or 3D-reconstructed scene of the environment. Examples include camera feeds for remote drone operations [171], a world in miniature of the environment [4], sensor stream data of the environment [15], visualization of obstacles [263], a local cost map for a search task [305], a 3D reconstructed view of the environment [114,332], a warning sign projected onto an object that indicates the robot's intention [21], visual feedback about the localization of the robot [468], and the position and label of objects for grasping tasks [153]. Such embedded external information improves situation awareness and comprehension of the task, especially for real-time control and navigation.
Information-3. Plan and Activity: The previous two categories focus on current information, but plan and activity relate to future information about the robot's behavior. This includes 1) a plan of the robot's motion and behavior, 2) simulation results of the programmed behavior, 3) visualization of a target and goal, and 4) progress of the current task. Examples include the future trajectory of a drone [450], the direction of mobile robots or vehicles [188,320], a highlight of the object the robot is about to grasp [33], the location of the robot's target position [482], and a simulation of a programmed robotic arm's motion and behavior [372]. This type of information helps the user better understand and anticipate the robot's behavior and intention.
Information-4. Supplemental Content: Finally, AR is also used to show supplemental content for expressive interaction, such as interactive content on robots or background images for their surroundings. Examples include a holographic remote user for remote collaboration and telepresence [197,401], a visual scene for games and entertainment [346,369], overlaid animations or visual content for shape-changing interfaces [243,258], a menu of available actions [27,58], and color coding or backgrounds to aid dynamic data physicalization [127,425].

DESIGN COMPONENTS AND STRATEGIES FOR VISUAL AUGMENTATION
Different from the previous section, which discusses what to show in AR, this section focuses on how to show AR content. To this end, we classify common design practices across the existing visual augmentation examples. At a higher level, we identified the following design strategies and components: 1) UIs and widgets, 2) spatial references and visualizations, and 3) embedded visual effects (Figure 7).
Design-1. UIs and Widgets: UIs and widgets are a common design practice in AR for robotics to help the user see, understand, and interact with information related to robots (Figure 7 Top).
-Menus: The menu is often used in mixed reality interfaces for human-robot interaction [140,326,416]. The menu helps the user to see and select the available options [325]. The user can also control or communicate with robots through a menu and gestural interaction [58].
-Information Panels: Information panels show the robot's internal or external status as floating windows [443] with either textual or visual representations. Textual information can be effective for presenting precise information such as the current altitude of a drone [15] or a measured length [96]. More complex visual information can also be shown, such as a network graph of the current task and program [262].
-Labels and Annotations: Labels and annotations are used to show information about, or annotate, physical objects [96].
-Controls and Handles: Controls and handles are another user interface example. They allow the user to control robots through a virtual handle [169]. AR can also show control values surrounding the robot [340].
-Monitors and Displays: Monitors and displays help the user to situate themselves in the remote environment [445]. Camera monitors allow the user to better navigate the drone for inspection or aerial photography tasks [171]. The camera feed can also be combined with real-time 3D reconstruction [332]. In contrast, monitors and displays are also used to display spatially registered content in the surrounding environment [382] or on top of the robot [443].
Design-2. Spatial References and Visualizations: Spatial references and visualizations are techniques used to overlay data spatially. Similar to embedded visualizations [463], this design can directly embed data on top of corresponding physical referents. The representation can range from simple graphical elements, such as points (0D), paths (1D), or areas (2D/3D), to more complex visualizations like color maps (Figure 7 Middle).
-Points and Locations: Points are used to visualize a specific location in AR. These points can highlight a landmark [136], a target location [482], or a waypoint [451] associated with geospatial information. Additionally, points can be used as control or anchor points to manipulate virtual objects or boundaries [325].
-Paths and Trajectories: Similarly, paths and trajectories are another common approach to represent spatial references as lines [358,407,450]. For example, paths are commonly used to visualize expected behaviors for real-time or programmed control [75,76,494]. By combining them with interactive points, the user can modify these paths by adding, editing, or deleting waypoints [451].
-Areas and Boundaries: Areas and boundaries are used to highlight specific regions of the physical environment. They can visualize a virtual bounding box for safety purposes [68,182] or highlight a region to show the robot's intent [21,105]. Alternatively, areas and boundaries can also visualize a group of objects or robots [191]. Some research projects also demonstrate the use of interactive sketching for specifying boundaries in home automation [191,263].
-Other Visualizations: Spatial visualizations can also take more complex and expressive forms. For example, a spatial color/heat map visualization can indicate safe and dangerous zones in the workspace, based on the robot's reachable areas [282]. Alternatively, a force map visualizes a field of force to provide visual affordances for the robot's control [176,177,212].
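Anchoring such spatial references (points, paths, areas) in the user's view typically amounts to projecting 3D world coordinates into the 2D image of the HMD, mobile device, or projector. The sketch below is a generic pinhole-camera projection in NumPy; the camera parameters and waypoints are invented for illustration and are not drawn from any surveyed system.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project Nx3 world points into pixel coordinates with a pinhole camera.

    K: 3x3 intrinsics, R: 3x3 world-to-camera rotation, t: 3-vector translation.
    Returns an Nx2 array of (u, v) pixel positions.
    """
    pts_cam = (R @ points_world.T).T + t  # world frame -> camera frame
    uvw = (K @ pts_cam.T).T               # camera frame -> image plane
    return uvw[:, :2] / uvw[:, 2:3]       # perspective divide

# Toy example: three waypoints 0.5 m to the side, 1-3 m in front of the camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
path = np.array([[0.5, 0.0, z] for z in (1.0, 2.0, 3.0)])
pixels = project_points(path, K, np.eye(3), np.zeros(3))
# Farther waypoints converge toward the principal point (320, 240),
# producing the familiar receding-path overlay.
```

Drawing line segments between consecutive projected waypoints yields the path overlay described above; areas and boundaries project the same way, polygon vertex by polygon vertex.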
Design-3. Embedded Visual Effects: Embedded visual effects refer to graphical content directly embedded in the real world. In contrast to spatial visualizations, embedded visual effects do not need to encode data. Common embedded visual effects are 1) anthropomorphic effects, 2) virtual replicas, and 3) texture mapping of physical objects (Figure 7 Bottom).
-Anthropomorphic Effects: Anthropomorphic effects are visual augmentations that render human-inspired graphics. Such designs can add an interactive effect of 1) a robot's body [3], such as arms [158] and eyes [450], 2) faces and facial expressions [14,180,481], 3) a human avatar [198,401,472], or 4) character animation [23,473], on top of the robots. For example, they can augment the robot's face with animated facial expressions using realistic images [14] or cartoon-like animation [481], which can improve the social expression of the robots [158] and invite richer interaction [23,473]. In addition to augmenting a robot's body, they can also show the image of a real person to facilitate remote communication [197,401,436,472].
-Virtual Replica and Ghost Effects: A virtual replica is a 3D rendering of robots, objects, or external environments. Combined with spatial references, a virtual replica is helpful to visualize simulated behaviors [76,169,372,451,494]. By rendering multiple virtual replicas, the system can also show a ghost effect with a series of semi-transparent replicas [51,372]. In addition, a replica of external objects or environments is also used to facilitate co-located programming [25,33] or real-time navigation in hidden spaces [114]. Also, a miniaturized replica of the environment (i.e., the world in miniature) helps drone navigation [4].
-Texture Mapping Effects based on Shape: Finally, texture mapping overlays interactive content onto physical objects to increase expressiveness. This technique is often used to enhance shape-changing interfaces and displays [127,175,308,374], such as overlaying terrain [244,245], landscape [116], animated game elements [258,431], colored texture [308], or NURBS (Non-Uniform Rational Basis Spline) surface effects [243]. Texture effects can also augment the surrounding background of the robot. For example, by overlaying the background texture onto the surrounding walls or surfaces, AR can contextualize the robots with the background of an immersive educational game [369,370], a visual map [340,425,431], or a solar system [328].
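The ghost effect described above can be sketched as sampling replica poses along a planned trajectory and fading their transparency toward the goal (a minimal sketch; function names and alpha values are hypothetical, not from any surveyed system):

```python
def ghost_poses(waypoints, n_ghosts, min_alpha=0.2, max_alpha=0.8):
    """Sample n_ghosts evenly spaced poses along a waypoint list and
    assign each a transparency that fades toward the goal, producing
    the 'ghost trail' of semi-transparent virtual replicas."""
    if n_ghosts < 2:
        raise ValueError("need at least two ghosts")
    ghosts = []
    for i in range(n_ghosts):
        t = i / (n_ghosts - 1)                 # 0.0 at start, 1.0 at goal
        idx = round(t * (len(waypoints) - 1))  # nearest waypoint index
        alpha = max_alpha - t * (max_alpha - min_alpha)
        ghosts.append((waypoints[idx], alpha))
    return ghosts

path = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
for pose, alpha in ghost_poses(path, 3):
    print(pose, round(alpha, 2))
```

Each (pose, alpha) pair would then be rendered as one semi-transparent copy of the robot model along its intended path.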

INTERACTIONS
Dimension-1. Level of Interactivity: In this section, we survey the interactions in AR and robotics research. The first dimension is the level of interactivity (Figure 8).

Level of Interactivity
-Explicit and Indirect Manipulation: Indirect manipulation is user input through remote manipulation without any physical contact. The interaction can take place through pointing at objects [325], selecting and drawing [191], or explicitly determining actions with body motion (e.g., changing a setting in a virtual menu [58]).
-Explicit and Direct Physical Manipulation: Finally, this category involves the user's direct touch input with their hands or bodies. The user can physically interact with the robots through embodied body interaction [369]. Several interaction techniques utilize the deformation of objects or robots [243], grasping and manipulation [163], or physical demonstration [270].
Dimension-2. Interaction Modalities: Next, we synthesize categories based on the interaction modalities (Figure 9).
-Tangible: The user can interact with robots by changing their shape or physically deforming an object [243,258], by grasping and moving tangible objects [163,340], or by grasping and manipulating the robots themselves [328].
-Touch: Touch interactions often involve the touch screen of a mobile phone, tablet, or other interactive surface. The user can interact with robots by dragging or drawing on a tablet [76,209,212], touching and pointing at the target position [163], and manipulating virtual menus on a smartphone [53]. Touch interaction is particularly useful when precise input is required for controlling [153,169] or programming the robot's motion [136,407].
-Pointer and Controller: Pointers and controllers allow the user to manipulate robots through spatial interaction or device actions. Since controllers provide tactile feedback, they reduce the effort required to manipulate robots [171]. While many controller inputs are explicit interactions [191,467], the user can also implicitly communicate with robots, such as by designing a 3D virtual object with the pointer [341].
-Spatial Gesture: Spatial gestures are a common interaction modality for HMD-based interfaces [27,33,58,59,114,262,320,326]. With such gestures, users can manipulate virtual waypoints [325,358] or operate robots with a virtual menu [58]. Spatial gestures are also used to implicitly manipulate swarm robots through remote interaction [401].
-Gaze: Gaze is often used to accompany spatial gestures [27,114,262,300,325,358,482], such as when performing menu selection [27]. However, some works use gaze itself to control the robot by pointing at a location in 3D space [28].
-Voice: Some research leverages voice input to execute commands for robot operation [27,182,197,354], especially in co-located settings.
-Proximity: Finally, proximity is used as an implicit form of interaction to communicate with robots [14,182,305]. For example, an AR trajectory can be updated to show the robot's intent when a passerby approaches the robot [458]. Also, a shape-shifting wall can change the content shown on the robot based on the user's behavior and position [431].
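A proximity-triggered intent display like the one described for passersby reduces to a simple distance check (a hedged sketch; the function name, threshold, and return structure are our own hypothetical choices):

```python
def intent_display(robot_pos, person_pos, trajectory, near=3.0):
    """Decide what the AR layer should render: when a passerby comes
    within `near` meters of the robot, reveal its planned trajectory;
    otherwise show nothing to avoid visual clutter."""
    dx = person_pos[0] - robot_pos[0]
    dy = person_pos[1] - robot_pos[1]
    dist = (dx * dx + dy * dy) ** 0.5
    if dist <= near:
        return {"show_trajectory": True, "path": trajectory}
    return {"show_trajectory": False, "path": []}

plan = [(0, 0), (1, 1), (2, 2)]
print(intent_display((0, 0), (2, 2), plan))  # passerby nearby: show path
print(intent_display((0, 0), (9, 9), plan))  # far away: hide the overlay
```

Real systems would run this check every frame against tracked person positions and smooth the transition to avoid flicker at the threshold boundary.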

APPLICATION DOMAINS
We identified a range of different application domains in AR and robotics. Figure 10 summarizes each category and lists the related papers. We classified the existing works into the following high-level application-type clusters: 1) domestic and everyday use, 2) industry applications, 3) entertainment, 4) education and training, 5) social interaction, 6) design and creative tasks, 7) medical and health, 8) telepresence and remote collaboration, 9) mobility and transportation, 10) search and rescue, 11) robots for workspaces, and 12) data physicalization.

EVALUATION STRATEGIES
In this section, we report our analysis of evaluation strategies for augmented reality and robotics. The main categories we identified follow the classification by Ledo et al. [236]: 1) evaluation through demonstration, 2) technical evaluations, and 3) user evaluations. The goal of this section is to help researchers find the best technique to evaluate their systems when designing AR for robotic systems.
Evaluation-1. Evaluation through Demonstration: Evaluation through demonstration is a technique to show how well a system could potentially work in certain situations. The most common approaches in our corpus include showing example applications [84,127,142,209,245,325,366,382,418,476] and proof-of-concept demonstrations of a system [24,55,117,162,199,221,270,305,375,479]. Other common approaches include demonstrating a system through a workshop [244,246,476], demonstrating the idea to a focus group [9,367,472,492], carrying out case studies [105,189,324], and presenting a conceptual idea [200,367,374,472,492].
Evaluation-2. Technical Evaluation: Technical evaluation measures how well a system performs based on internal technical metrics. The most common approaches are measuring latency [48,51,59,69,318,495], tracking accuracy [15,58,59,64,486], and success rate [262,314]. We also found works that evaluate their system's performance by comparison with other systems, for example, comparing tracking algorithms against alternative approaches [38,59,63,132,406].
Evaluation-3. User Evaluation: User evaluation measures the effectiveness of a system through user studies, and many different approaches are used to assess user performance when interacting with a system. For example, the NASA-TLX questionnaire is a very popular instrument [15,63,67,69,358], used mostly in industry-related applications. Other approaches include quantitative [38,171,184] and qualitative [21,138,165] lab studies, interviews [64,103,443], and questionnaires [48,59,495]. Some systems combine user evaluation techniques with demonstrations [111,382] or technical evaluations [15,406]. In observational studies [58,369,382], researchers can also gather user feedback through observation [135,448]. Finally, some works conduct expert interviews [10,116,340] to obtain specific feedback from an expert's perspective.
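For reference, the NASA-TLX score mentioned above is typically computed either as the unweighted mean of the six 0-100 subscale ratings (Raw TLX) or as a mean weighted by the outcomes of 15 pairwise comparisons; a minimal sketch (the example ratings and weights are hypothetical):

```python
SUBSCALES = ["mental", "physical", "temporal",
             "performance", "effort", "frustration"]

def raw_tlx(ratings):
    """Raw TLX: unweighted mean of the six 0-100 subscale ratings."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)

def weighted_tlx(ratings, weights):
    """Weighted TLX: each weight counts how often a subscale was chosen
    across the 15 pairwise comparisons, so the weights sum to 15."""
    assert sum(weights.values()) == 15
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15

r = {"mental": 70, "physical": 20, "temporal": 55,
     "performance": 40, "effort": 60, "frustration": 30}
w = {"mental": 5, "physical": 1, "temporal": 3,
     "performance": 2, "effort": 3, "frustration": 1}
print(raw_tlx(r))       # unweighted workload, ~45.83
print(weighted_tlx(r, w))
```

Many HRI studies report the raw variant only, since the pairwise weighting roughly doubles the questionnaire's administration time.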

DISCUSSION AND FINDINGS
Based on the analysis of our taxonomy, Figure 11 shows a summary of the number of papers for each dimension. In this section, we discuss common strategies and gaps across characteristics of selected dimensions.
Robot -Proximity Category: In terms of proximity, co-located with distance is the preferred configuration in AR-HRI systems (251 papers). This means that the current trend for AR-HRI systems is to have users co-located with the robot but without making any physical contact with it. This also suggests that AR devices provide a promising way to interact with robots without needing direct contact, such as programming robotic manipulation through AR [326]. In contrast, we observed less use of virtual control handles, possibly implying that AR is not yet commonly used for providing direct control to robots.
Interactions -Level of Interactivity Category: For the interaction-level category, we observed that explicit and indirect input is the most common approach within AR-HRI systems (295 papers). This means that user input through AR must go through some form of input mapping to accurately interact with the robot. This is an area that should be further explored, which we discuss in Section 11 -Immersive Authoring and Prototyping Environments for AR-HRI. However, while AR may not be the most popular approach for controlling a robot's movement, as mentioned above, it is still an effective medium for providing other sorts of input to robots, such as path trajectories [358].

Interactions -Modality Category:
In the Interaction Modality category, pointers and controllers (136 papers) and spatial gestures (116 papers) are most commonly used. Spatial gestures, for example, are used in applications such as robot gaming [273]. Furthermore, touch (66 papers) and tangibles (68 papers) are also common interaction modalities, indicating that these traditional forms of modality are seen as effective options for AR-HRI systems (for example, in applications such as medical robots [459] and collaborative robots [493]). It is promising to see how many AR-HRI systems are using tangible modalities to provide shape-changing elements [243] and control [340] to robots. Gaze and voice input are less common across the papers in our corpus, similar to proximity-based input, pointing to interesting opportunities for future work to explore these modalities in the AR-HRI context.

FUTURE OPPORTUNITIES
Finally, we formulate open research questions, challenges, and opportunities for AR and robotics research. For each opportunity, we also discuss potential research directions, providing sketches and relevant sections or references as a source of inspiration. We hope this section will guide, inspire, and stimulate the future of AR-enhanced Human-Robot Interaction (AR-HRI) research.

Making AR-HRI Practical and Ubiquitous
Opportunity-1. Making AR-HRI Practical and Ubiquitous: -Technological and Practical Challenges: While AR-HRI holds great promise, many technological and practical challenges lie ahead. For example, the accurate, realistic superimposition or occlusion of virtual elements is still very challenging due to noisy real-time tracking. Improvements in display and tracking technologies would broaden the range of practical applications, especially when more precise alignment is needed, such as in robotic-assisted surgery or medical applications (Section 9.7). Moreover, error-reliable system design is also important for practical applications. AR-HRI is used to improve safety for human co-workers (Section 5.2); however, if the AR system fails in such a safety-critical situation, users might be at risk (e.g., device malfunctions, content misalignment, critical objects obscured by inappropriate content overlap). It is important to increase the reliability of AR systems from both system-design and user-interaction perspectives (e.g., To what extent should users rely on AR systems in case the system fails? How can we avoid visual clutter or the occlusion of critical information in a dangerous area?). These technical and practical challenges should be addressed before AR devices can become common in everyday life.
-Deployment and Evaluation In-the-Wild: Related to the above, most prior AR-HRI research has been conducted under controlled laboratory conditions. It is still questionable whether these systems and findings can be directly applied to real-world situations. For example, outdoor scenarios like search-and-rescue or building construction (Section 9) may impose very different technical requirements than indoor scenarios (e.g., Is projection mapping visible enough outdoors? Can outside-in tracking sufficiently cover the area that needs to be tracked?). On the other hand, current HMD devices still have many usability and technical limitations, such as display resolution, visual comfort, battery life, weight, limited field of view, and latency. To appropriately design a practical system for real-world applications, it is important to design based on the user's needs through user-centered design, conducting a repeated cycle of interviews, prototyping, and evaluation. In particular, researchers need to carefully consider different approaches or technological choices (Section 3) to meet the user's needs. Deployment and evaluation in the wild will allow us to develop a better understanding of which designs or techniques work, and which do not, in real-world situations.

Designing and Exploring New AR-HRI

Opportunity-2. Designing and Exploring New AR-HRI:
-Re-imagining Robot Design without Physical Constraints: With AR-HRI, we have a unique opportunity to re-imagine robot design without the constraints of physical reality. For example, we have covered interesting attempts in prior work, like making non-humanoid robots humanoid [180,198,481] or making robots visually animated [3,14,158] (Section 7.3), either through an HMD [197] or projection [14] (Section 3). However, this is just the tip of the iceberg. For example, what if robots looked like fictional characters [57,208] or behaved like Disney character animations [214,434,444]? We believe there is still a huge untapped design opportunity for augmented virtual skins of robots that fully leverage unlimited visual expression. In addition, there is also a rich design space of dynamic appearance change that leverages visual illusion [259,260], such as making robots disappear [299,369], change color [152,453], or transform their shape [170,392,424,425] with the power of AR. By increasing the expressiveness of robots (Section 5.5), this could improve user engagement and enable interesting applications (e.g., drones that have facial expressions [173] or a human body/face [148] for remote telepresence [197]). We argue that there are still many opportunities for such unconventional robot designs with expressive visual augmentation. We invite and encourage researchers to re-imagine such possibilities for the upcoming AR/MR era.
-Immersive Authoring and Prototyping Environments for AR-HRI: Prototyping functional AR-HRI systems is still very hard, given the high barrier of required software and hardware skills. Moreover, developing such systems is quite time-consuming: people need to continuously move back and forth between the computer screen and the real world, which hinders rapid design exploration and evaluation. To address this, we need better authoring and prototyping tools that allow even non-programmers to design and prototype, broadening the AR-HRI research community. For example, what if users could design and prototype interactions through direct manipulation within AR, rather than coding on a computer screen (a metaphor would be, for example, Figma for app development or Adobe Character Animator for animation)? In such tools, users should also be able to design without low-level robot programming, such as actuation control, sensor access, and networking. Such AR authoring tools have been explored in the HCI context [40,247,311,455] but remain relatively unexplored in the domain of AR-HRI, except for a few examples [51,422]. We envision that future intuitive authoring tools will spur further design exploration of AR-HRI systems (Section 7) by democratizing the opportunity to the broader community. Opportunity-3. AR-HRI for Better Decision-Making: -Real-time Embedded Data Visualization for AR-HRI: AR interfaces promise to support operators' complex decision-making (Section 5.2) by aggregating and visualizing various data sources, such as internal, external, or goal-related information (Section 6.1-6.3).

Currently, such visualizations are mostly limited to simple spatial references of user-defined data points (Section 7.2), but there is still huge potential to connect data visualization to HRI [428] in the context of AR-HRI. For example, what if AR interfaces could directly embed real-time data into the real world, rather than on a computer screen? We could even combine real-time visualizations with a world-in-miniature [95] to facilitate navigation in a large area, such as drone navigation for search-and-rescue. We can take inspiration from existing immersive data analysis [70,113] or real-time embedded data visualization research [423,462,463] to better design such data-centric interfaces for AR-HRI. We encourage researchers to start thinking about how these emerging data visualization practices can be applied to AR-HRI systems in the future.
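At its core, the world-in-miniature coupling mentioned above is a uniform scale-and-translate mapping between world and miniature coordinates (a hypothetical sketch; the function names and the 1:100 scale are our own, not from any surveyed system):

```python
def world_to_wim(p, world_origin, wim_origin, scale):
    """Map a real-world point into the miniature model by translating
    to the world origin, scaling down, and re-anchoring at the WIM."""
    return tuple(wo + (pi - o) * scale
                 for pi, o, wo in zip(p, world_origin, wim_origin))

def wim_to_world(p, world_origin, wim_origin, scale):
    """Inverse mapping: a point tapped on the miniature becomes a
    real-world goal, e.g., a navigation target for a drone."""
    return tuple(o + (pi - wo) / scale
                 for pi, o, wo in zip(p, world_origin, wim_origin))

# 100m x 100m site shrunk to a 1m tabletop model (scale 1:100)
drone = (40.0, 0.0, 20.0)
mini = world_to_wim(drone, (0, 0, 0), (0.0, 0.8, 0.0), 0.01)
print(mini)  # where the drone marker appears on the tabletop model
```

The same pair of mappings lets live telemetry flow into the miniature while taps on the miniature flow back out as world-space commands.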
-Explainable and Explorable Robotics through AR-HRI: As robots become more intelligent and autonomous, it becomes increasingly important to make the robot's decision-making process visible and interpretable. This is often called explainable AI in the context of machine learning and AI research, but it is becoming equally relevant to robotics: AR can embed visualizations of the robot's reasoning directly in the physical world, for example, showing why a robot chooses a particular route when navigating in a crowded place. More importantly, these visualizations can also be explorable: users can interactively explore how the robot's decision would change when the physical world changes (e.g., directly manipulating physical obstacles to see how the robot's optimal path updates). Such interfaces could help programmers, operators, or co-workers understand the robot's behavior more easily and interactively. Future research should connect explainable robotics with AR to better visualize the robot's decision-making process embedded in the real world.
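The explorable behavior described above, where moving a physical obstacle updates the robot's overlaid path, can be illustrated with a minimal grid replanner (a toy sketch using breadth-first search; not drawn from any surveyed system):

```python
from collections import deque

def shortest_path(start, goal, obstacles, size=5):
    """Breadth-first search on a small grid; returns the cell path the
    AR layer would draw, or None if the goal is unreachable."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk predecessors back to start
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size \
                    and nxt not in obstacles and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None

# The user drags a physical obstacle; the overlaid path re-plans around it.
before = shortest_path((0, 0), (4, 0), obstacles=set())
after = shortest_path((0, 0), (4, 0), obstacles={(2, 0)})
print(len(before) - 1, len(after) - 1)  # the detour adds two steps
```

In an explorable AR interface, each re-plan would redraw the projected path in real time as the tracked obstacle moves.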

Novel Interaction Design Enabled by AR-HRI

Opportunity-4. Novel Interaction Design enabled by AR-HRI:
-Natural Input Interactions with AR-HRI Devices: With the proliferation of HMD devices, it is now possible to use expressive inputs more casually and ubiquitously, including gesture, gaze, head, voice, and proximity-based interaction (Section 8.2). In contrast to environment-installed tracking, HMD-based hand- and gaze-tracking could enable more natural interactions without the constraint of location. For example, with hand-tracking capabilities, we can now implement expressive gesture interactions, such as finger-snapping, hand-waving, hand-pointing, and mid-air drawing for swarm drone control in entertainment, search and rescue, firefighting, or agricultural foraging [17,217]. In addition, the combination of multiple modalities, such as voice, gaze, and gesture, is also an interesting direction. For example, when the user says "Can you bring this to there?", it is usually difficult to resolve the ambiguity (e.g., "this" or "there"), but the combination of gaze and gesture makes it much easier to resolve these references in context. AR-based visual feedback could also help users clarify their intentions. The user could even casually register or program such new inputs on demand through end-user robot programming (Section 5.1). Exploring new interactions enabled by AR-HRI systems is an exciting opportunity.
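Resolving a deictic reference such as "this" with gaze or pointing can be sketched as picking the candidate object closest to the input ray (a hedged sketch; all names and object positions here are hypothetical):

```python
def resolve_deictic(origin, direction, candidates):
    """Pick the candidate whose position lies closest to the gaze or
    pointing ray, resolving ambiguous references like "this"."""
    norm = sum(d * d for d in direction) ** 0.5
    d = [c / norm for c in direction]  # normalize the ray direction

    def ray_distance(p):
        v = [pi - oi for pi, oi in zip(p, origin)]
        # clamp t >= 0 so objects behind the user are penalized
        t = max(0.0, sum(vi * di for vi, di in zip(v, d)))
        closest = [oi + t * di for oi, di in zip(origin, d)]
        return sum((pi - ci) ** 2 for pi, ci in zip(p, closest)) ** 0.5

    return min(candidates, key=lambda item: ray_distance(item[1]))

objects = [("mug", (1.0, 0.0, 2.0)),
           ("book", (0.5, 0.3, 1.0)),
           ("cup", (-1.0, 0.0, 3.0))]
# User looks straight ahead along +z while saying "bring this"
print(resolve_deictic((0, 0, 0), (0, 0, 1), objects))
```

A multimodal system would fuse this geometric score with speech timing, e.g., sampling the gaze ray at the moment "this" is uttered.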
-Further Blending the Virtual and Physical Worlds: As robots weave themselves into the fabric of our everyday environment, the term "robots" no longer refers only to traditional humanoid or industrial robots but covers a variety of forms (Section 2.1 and Section 4.1), from self-driving cars [7] to robotic furniture [421,478], wearable robots [99], haptic devices [449], shape-changing displays [127], and actuated interfaces [331]. These ubiquitous robots will actuate our physical world to make it more dynamic and reconfigurable. By leveraging both AR and this physical reconfigurability, we envision further blending the virtual and physical worlds with a seamless coupling between pixels and atoms. Currently, AR is only used to visually augment the appearance of the physical world. However, what if AR could also "physically" affect the real world? For example, what if a virtual user pushed a physical wall and it moved synchronously? What if virtual wind could wave a physical cloth or flag? What if a virtual explosion could produce a shock wave that collapses physical boxes? Such virtual-physical interactions would make AR more immersive through the power of visual illusion, and could also have practical applications in entertainment, remote collaboration, and education. Previously, such ideas were only partially explored [23,401], but we believe a rich design space remains to be further explored.
For future work, we should further seek to blend virtual and physical worlds by leveraging both visually (AR) and physically (robotic reconfiguration) programmable environments.