Author: King’s College London
Societal acceptance of eXtended Reality (XR) systems will intrinsically depend upon the security of the interaction between the user and the system, which encompasses aspects such as privacy and trustworthiness. Establishing security thus necessitates treating the XR system as a socio-technical entity, wherein technology and human users engage in the exchange of messages and data. Both technology and users contribute to the overall security of the system, but they also have the potential to introduce vulnerabilities through unexpected or mutated behaviour. For instance, an XR system may misinterpret human actions due to limitations in its algorithms or its understanding of human behaviour. Conversely, users may deviate from the expected communication or interaction norms, triggering unintended responses or unpredictable system behaviour, thus disrupting the immersive experience and unknowingly compromising the system's security.
Security developers and analysts have so far focused on XR systems primarily as technical systems, constructed upon software processes, digital communication protocols, cryptographic algorithms, and so forth. They concentrate on addressing the complexity of the system they are developing or analyzing, often neglecting to consider the human user as an integral part of the system's security. In essence, they overlook human factors and their impact on security. There exists an intricate interplay between the technical aspects and the social dynamics, such as user interaction processes and behaviours, but state-of-the-art approaches are not adequately equipped to consider human behavioural or cognitive aspects in relation to the technical security of XR systems, as they typically focus on modelling basic communication systems.
To sum up, addressing security concerns of XR systems from a socio-technical lens, rather than a purely technical one, remains terra incognita, with no recognised methodologies or comprehensive toolset. Thus, formal and automated methods and tools need to be extended, or new ones developed from scratch, to tackle the challenges in designing secure content-sharing for XR systems and their interaction with humans who can misunderstand or misbehave. The Explainable Security (XSec) paradigm, which extends Explainable AI (XAI), can be leveraged to explain security decisions and to secure the explanations themselves, thereby contributing to the overall trustworthiness of the system. Moreover, since the composition of secure system components might still yield an insecure system, existing methods and tools must scale to verify that this composition indeed yields a secure XR system.
The SERMAS project aims to contribute by carrying out research and development in all of these directions.
- What is the SERMAS Toolkit?
The toolkit is at the core of the SERMAS modular architecture and offers a set of features developers can use when creating new applications. The toolkit setup includes the allocation of resources like computing, GPUs, storage, and database systems.
XR developers can log in to the Toolkit Developer Console, create an agent, select modules (e.g. integrated dialogue capabilities, detection mechanisms for people and objects), and customise everything to be applied in different contexts.
- What is the expected impact of the SERMAS toolkit?
The toolkit will serve as a platform that supports innovators in developing socially acceptable XR systems by simplifying the design, development, deployment, and management phases.
- How will the SERMAS Toolkit provide the integration point for the SERMAS modules?
The toolkit offers an API to interact with the available modules. Users will be able to register, create their applications, and benefit from the modules developed within SERMAS and the open call(s). We plan to provide a CLI (command-line interface) to ease interaction with the APIs, along with easy-to-use documentation for the SERMAS Toolkit.
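As an illustration only: the actual SERMAS API endpoints, payload fields, and module names are not specified here, so everything in the following Python sketch (the base URL, the `/applications` path, the field names) is a hypothetical assumption about how registering an application through a toolkit API of this kind might look.

```python
import json

# Hypothetical sketch: the base URL, endpoint path, payload fields, and
# module names below are illustrative assumptions, not the actual SERMAS API.
API_BASE = "https://sermas.example.org/api"  # placeholder base URL

def build_app_registration(name: str, modules: list[str]) -> dict:
    """Assemble the JSON body for registering a new XR application."""
    return {
        "name": name,
        "modules": modules,  # e.g. dialogue, people/object detection
        "platform": "sermas-toolkit",
    }

def registration_request(name: str, modules: list[str]) -> tuple[str, bytes]:
    """Return the (URL, body) pair a client would POST to register an app."""
    url = f"{API_BASE}/applications"
    body = json.dumps(build_app_registration(name, modules)).encode("utf-8")
    return url, body

url, body = registration_request("museum-guide", ["dialogue", "detection"])
print(url)
```

A CLI such as the one planned for the toolkit would typically wrap exactly this kind of request construction behind a single command.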
- What are the benefits of the SERMAS Toolkit for developing more socially acceptable systems?
The toolkit is an integrated set of features that allows the creation of agents (virtual or robotic) able to show behaviours and reactions that users accept as if the agent were a real person. In other words, the agents should be able to automatically detect the presence of a person, perform gestures such as greetings, and initiate face-to-face communication to establish a friendly interaction. By inferring the emotional state of the user, the agents can adapt their responses and behaviours accordingly, providing a more personalised and empathetic experience.
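The interaction flow described above (detect a person, greet, then adapt replies to the inferred emotional state) can be sketched as a minimal state machine. The state names, emotion labels, and canned responses below are illustrative assumptions, not the toolkit's actual implementation.

```python
# Illustrative sketch of the detect -> greet -> adapt loop described above.
# States, emotion labels, and responses are assumptions for illustration only.

RESPONSES = {
    "happy": "Great to see you! How can I help today?",
    "confused": "No rush - let me walk you through it step by step.",
    "neutral": "How can I help you?",
}

class Agent:
    def __init__(self):
        self.state = "idle"

    def on_person_detected(self) -> str:
        """Detecting a person triggers a greeting gesture."""
        self.state = "greeting"
        return "wave + 'Hello!'"

    def respond(self, emotion: str) -> str:
        """Adapt the reply to the user's inferred emotional state."""
        self.state = "interacting"
        return RESPONSES.get(emotion, RESPONSES["neutral"])

agent = Agent()
print(agent.on_person_detected())  # greeting gesture
print(agent.respond("confused"))   # empathetic, adapted reply
```

In a real system the `emotion` argument would come from a perception module (e.g. facial expression or voice analysis) rather than being passed in directly.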
Authors: Luca Capra, Marico Conci, Spindox Labs
As you know, SERMAS launched its first open call to XR innovators (researchers, startups, SMEs) to design and develop innovative solutions addressing challenges proposed by our consortium, based on the needs identified when working on the pilots and developing the XR Agent and the SERMAS Toolkit.
The third parties will fully leverage the potential benefits of SERMAS results to foster XR technology adoption, and we will offer an innovative, collaborative environment with specialised infrastructure, technology, and knowledge.
Innovators could apply to one of six challenges to develop high-value and impactful components, content, and frameworks in AR/VR/XR. In response to our first open call, we received 37 applications from 46 entities across 15 different countries in Europe.
From this pool, the SERMAS team carefully evaluated and selected five sub-projects that are dedicated to developing and executing innovative solutions aimed at addressing the challenges outlined by our consortium.
Follow the journey of our innovators and stay tuned for further developments.
From November 28th to December 1st, the SERMAS team was at the heart of innovation at the Immersive Tech Week 2023 in Rotterdam. This international gathering served as a convergence point for developers in the extended reality (XR) field, fostering collaboration, and highlighting projects and technologies that will shape the future of immersive technology.
One of the standout moments during the event was SERMAS's active participation in the F6S Innovation session titled "Connecting Founders to Horizon Europe Funding Opportunities." This session explained the role played by European funds in driving XR innovation to new heights and facilitating collaborative ventures. Besides showcasing SERMAS's objectives and opportunities, the session also showed vast opportunities for XR enthusiasts to access funding and support from Horizon Europe.
SERMAS shared the spotlight with other XR projects, each contributing to the immersive technology landscape in unique ways: VOX Reality, XR2Learn, XR4ED, and CORTEX2. This collective showcase demonstrated the diversity and depth of XR applications, illustrating how this technology reshapes industries and experiences across the board.
At the vibrant heart of the event, booths 18 and 19 served as the dynamic playground for all participating projects to share their technologies and captivating demos. Here, SERMAS, joined by partners from DW Innovation and Spindox Labs, shared the preliminary outcomes of our project, shedding light on the latest developments in our pilots and presenting the innovative SERMAS Toolkit. These booths became the meeting point where attendees were not only introduced to our current endeavours but also gained insight into the trajectory we envision for the future of the SERMAS project.
Summing up, the Immersive Tech Week 2023 event was not merely a showcase of technology; it was an exciting glimpse into the future of XR. We extend our thanks to the entire VRDays team for organising this event and providing a space for innovation and collaboration.
Let’s see what the future holds for next year in XR and the SERMAS project. Don’t forget to join us on this journey as we continue to push the boundaries to make XR systems more socially acceptable.
See you next year!
Author: DW Innovation
The most excited we've seen the teenage brother of a consortium member this year was when the makers of Baldur's Gate, a famous role-playing game (RPG), made the following announcement: henceforth, players can change their avatars during the game. With the new update, Patch 3, the studio explained, it'll be possible to customize characters even after exiting the respective editor. That's a novel thing in the world of avatar-based games, a feature enabled by the progress made in the field of generative AI. This technology now greatly simplifies the process of building virtual environments and their characters, thus enhancing the range of creative choices.
Avatars, AI, and the metaverse
So what is an avatar? Simply put, it's a virtual (maybe slightly fictitious) representation of yourself, capable of performing actions (e.g. engaging in conversations) in a digital space. It can also function as a bot modelled on a human counterpart. Or as an entirely virtual entity/agent without any blueprint in the physical world. Depending on the use case, generative AI can help design and change appearance, voices, movements, as well as thoughts and dialogues of an avatar in what is often referred to as the metaverse now. The concept of the avatar is actually decades old (think: Ultima, Second Life, World of Warcraft etc.), but it's only now – in the 2020s – that avatars are becoming an integral part of all kinds of digital services and entertainment. How did we get to this point?
3D, XR, the pandemic, and "the way of the future"
A very brief explanation could go like this: Advancements in computer graphics and streaming had already enabled a shift from 2D to 3D in terms of avatars and agents and catapulted video calls into the mainstream in the late 2010s. And then, in 2020, the pandemic hit. People were isolated. And people quickly realized that social interaction is an essential and fundamental need. So a lot of companies started pumping a lot of money into extended reality (XR) tech and collaborative online spaces. In this context, popular gaming platforms like Roblox also invested heavily into the expansion of avatar tech (and there didn't seem to be any problems in funding all this).
Last year, an AI avatar of Elvis Presley thrilled the audience of "America's Got Talent". Created by a singer and a technologist working together, it gave a convincing performance of the song "Hound Dog". In London's West End, the ABBA show "Voyage", which features all four members of the Swedish band in avatar form, has passed the one-million-spectator mark. ABBA's Björn Ulvaeus says that avatars are "the way of the future". And Tom Hanks is convinced that he will keep acting beyond the grave, in the form of an AI-driven avatar.
Metaverse platform Ready Player Me (named after the 2011 VR sci-fi bestseller Ready Player One) recently launched its generative AI avatar creator. And similar tools are popping up everywhere. Another one is Magic Avatars, a service that creates realistic 3D avatars from a single photo. There's also Synthesia, billed as one of the most advanced AI avatar video makers on the market right now, with more than 120 languages and accents available. The NBA created new AI tech that allows fans to have their very own avatar replace a regular player in a game, and Microsoft recently introduced an avatar generator for MS Teams. These avatars are the harbingers of Microsoft Mesh, the Microsoft metaverse that's not available yet. With their avatar, users are supposed to take part in XR meetings and work collaboratively, without having to turn on their cameras. Microsoft advertises that employees can appear in any form they choose and that feels most comfortable, which is intended to promote diversity and inclusion.
Inclusive avatars
Speaking of which, avatars are also used as sign language interpreters now. Israeli startup CODA aims to make video watching a lot easier for the deaf and hearing-impaired community. To this end, the company enlists AI-driven avatars able to translate spoken language into sign language almost instantaneously. That's next level because up until recently, similar companies like signer.ai (India) or Signapse (UK) only offered to translate written text into sign language.
What's next?
In a nutshell, there are all kinds of (more or less) sophisticated avatars for all kinds of purposes now, but one big challenge remains: Creating an avatar / 3D character / virtual agent (call it what you like) which is portable, sustainable, versatile – and can be used in multiple environments. A solution like this would be very appealing to users facing a complex, rapidly growing digital/immersive space. Hopefully, SERMAS can make a positive contribution here.
In the meantime, let's all enjoy playing a little with 3D authoring tools, generative AI and immersive RPG worlds.
Authors: Valeria Villani, Beatrice Capelli, Lorenzo Sabattini, UNIMORE, 2023.
The growing spread of robots for service and industrial purposes calls for versatile, intuitive and portable interaction approaches. In particular, in industrial environments, operators should be able to interact with robots in a fast, effective, and possibly effortless manner. To this end, reality enhancement techniques have been used to achieve efficient management and simplify interactions, in particular in manufacturing and logistics processes.
Building upon this, in this paper we propose a system based on mixed reality that allows a ubiquitous interface for heterogeneous robotic systems in dynamic scenarios, where users are involved in different tasks and need to interact with different robots. By means of mixed reality, users can interact with a robot through manipulation of its virtual replica, which is always colocated with the user and is extracted when interaction is needed.
The system has been tested in a simulated intralogistics setting, where different robots are present and require sporadic intervention by human operators, who are involved in other tasks. In our setting we consider the presence of drones and automated guided vehicles (AGVs) with different levels of autonomy, calling for different user interventions.
The proposed approach has been validated in virtual reality, considering quantitative and qualitative assessment of performance and users' feedback.
Link to the publication: https://arxiv.org/abs/2307.05280
Published at the IEEE International Conference on Systems, Man, and Cybernetics 2023.
Authors: Mohsen Mesgar, Thy Thy Tran, Goran Glavaš, and Iryna Gurevych. 2023
In task-oriented dialogue (ToD) new intents emerge on a regular basis, with a handful of available utterances at best. This renders effective Few-Shot Intent Classification (FSIC) a central challenge for modular ToD systems. Recent FSIC methods appear to be similar: they use pre-trained language models (PLMs) to encode utterances and predominantly resort to nearest-neighbor-based inference. However, they also differ in major components: they start from different PLMs, use different encoding architectures and utterance similarity functions, and adopt different training regimes.
The coupling of these vital components together with the lack of informative ablations prevents the identification of factors that drive the (reported) FSIC performance. We propose a unified framework to evaluate these components along the following key dimensions: (1) Encoding architectures: cross-encoder vs. bi-encoder; (2) Similarity function: parameterized (i.e., trainable) vs. non-parameterized; (3) Training regimes: episodic meta-learning vs. conventional (i.e., non-episodic) training. Our experimental results on seven FSIC benchmarks reveal three new important findings. First, the unexplored combination of cross-encoder architecture and episodic meta-learning consistently yields the best FSIC performance. Second, episodic training substantially outperforms its non-episodic counterpart.
Finally, we show that splitting episodes into support and query sets has a limited and inconsistent effect on performance. Our findings show the importance of ablations and fair comparisons in FSIC. We publicly release our code and data.
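For readers unfamiliar with the nearest-neighbor inference the paper analyses, a minimal bi-encoder-style sketch is shown below. Random vectors stand in for the PLM utterance embeddings, and a non-parameterized cosine similarity is used; this is an illustration of the general technique, not the authors' exact setup.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Non-parameterized similarity between query and support embeddings."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def nearest_neighbor_intent(query_emb, support_embs, support_labels):
    """Predict the intent label of the most similar support utterance."""
    sims = cosine_sim(query_emb[None, :], support_embs)[0]
    return support_labels[int(np.argmax(sims))]

rng = np.random.default_rng(0)
# Stand-ins for PLM embeddings of a few labelled support utterances.
support = rng.normal(size=(4, 8))
labels = ["book_flight", "book_flight", "cancel", "cancel"]
query = support[2] + 0.01 * rng.normal(size=8)  # close to a "cancel" example
print(nearest_neighbor_intent(query, support, labels))  # -> "cancel"
```

In the cross-encoder variant the paper favours, query and support utterances would instead be encoded jointly and scored together, rather than compared via precomputed embeddings as here.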
Read the publication here.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics.
1. Nine months into the project, how do you think SERMAS is shaping its work to make a difference in the XR sphere?
The project can hopefully make XR more human-centered, more trustworthy, and thus more useful and enjoyable.
2. What is the SERMAS vision of XR?
I'm just going to spell out the acronym: socially-acceptable extended reality models and systems. That sums it up nicely.
3. In relation to the project outcomes, what is the main result that SERMAS will provide?
It's still a bit too early to predict final results, but we should aim for a modular, sustainable toolkit that allows users to create relatable XR agents which can then be put to work across different platforms and scenarios.
4. Reflecting on the past six months and gazing into the future, which trend stands out prominently and will shape the landscape of XR?
It's hard to pick one. Avatars and all kinds of virtual identities will probably play an important role. Then there's the big hype around generative AI, which will most likely lead to some sort of generative XR, i.e. we'll see more stunning images and other machine learning "magic" in augmented and virtual reality experiences. Last, but not least, there's a good chance that the introduction of the Apple Vision Pro headset will kick off a new software development boom, which will then lead to the creation of spatial computing experiences that are more sophisticated and more popular.
5. Which sector do you believe is making the most extensive use of XR technology? Additionally, looking ahead to the future, which sector do you anticipate will adopt XR technology?
I guess a lot of people in manufacturing/engineering and medicine/healthcare are using XR, especially when it comes to augmented reality tutorials and simulations. XR is also leaving its mark in education and entertainment. As for new sectors: I don't think we're going to see any surprises. People will try and upscale XR in existing fields of application, though. Think secondary schools and university labs.
6. Describe the project in one word.
Promising.
Is there a shift in media organizations regarding the use of immersive tech?
Author: The DW Innovation Team
After roughly ten years of XR in international journalism, it has become pretty hard to compile a comprehensive list of interesting and influential experiments. From New York to Cologne and from Helsinki to Lyons (to name just a few hotspots where colleagues of ours used to work on projects), lots of newsrooms and R&D departments have worked with immersive tech. They took us to food banks in California and to landing sites on Mars. They put us in prison cells and on refugee boats in the Mediterranean. They showed antique artefacts, pro athletes, and the rainforest. They used gaming engines, iOS and Android SDKs, XR-enhanced browsers, cardboard contraptions, tethered headsets, stand-alone headsets, and smartphones. However, no matter how their story was produced and what it focused on, it was exactly that: a story. The better part of the XR decade was apparently dedicated to immersive reporting/journalism.
Even though it's hard to say at this point if we're looking at a real trend, it seems that the last couple of years saw at least a slight shift: away from XR as a storytelling device – and towards XR as an infrastructure technology. What may have started with a couple of immersive, avatar-driven meetings during peak COVID gradually moved to other areas of media operations, first and foremost: media production planning and media training.
Two recent R&D projects in DW's portfolio seem to be proof of this. The first one is XR4DRAMA (2020-2023), which heavily relied on AR, VR, and 3D models to improve situational awareness for reporters and filmmakers. The second one is actually SERMAS, which uses XR tech and avatars to improve and scale up security training for journalists. At the same time, DW is still using XR storytelling tools (e.g. Fader in MediaVerse) and looking into new kinds of immersive reporting (e.g. the Snap Spectacles in a new lab project). However, the major part of the innovation team's time and budget is currently spent on the aforementioned infrastructure projects.
The XR playground has clearly been expanded, and more changes are underway.