Authors: Luca Capra, Marico Conci, Spindox Labs

  1. What is the SERMAS Toolkit? 

The toolkit is at the core of the SERMAS modular architecture and offers a set of features developers can use when creating new applications. The toolkit setup includes the allocation of resources like computing, GPUs, storage, and database systems. 

XR developers can log in to the Toolkit Developer Console, create an agent, select modules (e.g. integrated dialogue capabilities, detection mechanisms for people and objects), and customise everything to fit the context in which it will be applied.

  2. What is the expected impact of the SERMAS Toolkit? 

The toolkit will serve as a platform that supports innovators in developing socially acceptable XR systems by simplifying the design, development, deployment, and management phases. 

  3. How will the SERMAS Toolkit provide the integration point for supporting the SERMAS modules? 

The toolkit offers an API to interact with the available modules. Users will be able to register, create their applications, and benefit from the modules developed within SERMAS and the open call(s). We plan to provide a CLI (command line interface) to ease interaction with the APIs, along with easy-to-use documentation for the SERMAS Toolkit. 

  4. What are the benefits of the SERMAS Toolkit for developing more socially acceptable systems? 

The toolkit is an integrated set of features for creating agents (virtual or robotic) that show behaviours and reactions users accept as they would from a real person. In other words, the agents should be able to automatically detect the presence of a person, perform gestures such as greetings, and start a face-to-face conversation to establish a friendly interaction. By inferring the emotional state of the user, the agents can adapt their responses and behaviours accordingly, providing a more personalised and empathetic experience. 

Authors: Mohsen Mesgar, Thy Thy Tran, Goran Glavaš, and Iryna Gurevych (2023)

In task-oriented dialogue (ToD), new intents emerge on a regular basis, with a handful of available utterances at best. This renders effective Few-Shot Intent Classification (FSIC) a central challenge for modular ToD systems. Recent FSIC methods appear to be similar: they use pre-trained language models (PLMs) to encode utterances and predominantly resort to nearest-neighbor-based inference. However, they also differ in major components: they start from different PLMs, use different encoding architectures and utterance similarity functions, and adopt different training regimes.

The coupling of these vital components, together with the lack of informative ablations, prevents the identification of factors that drive the (reported) FSIC performance. We propose a unified framework to evaluate these components along the following key dimensions: (1) encoding architectures: cross-encoder vs. bi-encoder; (2) similarity function: parameterized (i.e., trainable) vs. non-parameterized; (3) training regimes: episodic meta-learning vs. conventional (i.e., non-episodic) training. Our experimental results on seven FSIC benchmarks reveal three new important findings. First, the unexplored combination of cross-encoder architecture and episodic meta-learning consistently yields the best FSIC performance. Second, episodic training substantially outperforms its non-episodic counterpart.

Finally, we show that splitting episodes into support and query sets has a limited and inconsistent effect on performance. Our findings show the importance of ablations and fair comparisons in FSIC. We publicly release our code and data.
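To make these dimensions concrete, the sketch below contrasts the two encoding architectures in a few lines of Python. It is our illustration, not the authors' released code: it assumes a generic BERT-like PLM loaded through the Hugging Face transformers library, and the mean pooling, the cosine similarity (as the non-parameterized option), and the linear scoring head (as the parameterized option) are illustrative choices.

```python
# Minimal sketch (not the authors' released implementation) contrasting the two
# encoding architectures, assuming a generic BERT-like PLM from transformers.
import torch
from transformers import AutoModel, AutoTokenizer

PLM_NAME = "bert-base-uncased"  # assumption: any BERT-like checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(PLM_NAME)
encoder = AutoModel.from_pretrained(PLM_NAME)


def bi_encoder_score(query: str, support: str) -> float:
    """Bi-encoder: encode the two utterances independently, then compare the
    embeddings with a non-parameterized similarity function (cosine)."""
    def embed(text: str) -> torch.Tensor:
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, dim)
        return hidden.mean(dim=1)                         # mean pooling over tokens
    return torch.cosine_similarity(embed(query), embed(support)).item()


class CrossEncoderScorer(torch.nn.Module):
    """Cross-encoder: encode the utterance pair jointly and let a trainable head
    (a parameterized similarity function) produce the matching score."""

    def __init__(self, plm: torch.nn.Module) -> None:
        super().__init__()
        self.plm = plm
        self.head = torch.nn.Linear(plm.config.hidden_size, 1)

    def forward(self, query: str, support: str) -> torch.Tensor:
        inputs = tokenizer(query, support, return_tensors="pt", truncation=True)
        pooled = self.plm(**inputs).last_hidden_state[:, 0]  # [CLS] representation
        return self.head(pooled).squeeze(-1)
```

In both cases, nearest-neighbour inference would assign the query the intent label of its best-scoring support utterance; episodic meta-learning would additionally sample small support/query episodes at each training step, which the sketch omits.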

Read the publication here.

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics.

Is there a shift in media organizations regarding the use of immersive tech?

Author: The DW Innovation Team

After roughly ten years of XR in international journalism, it has become pretty hard to compile a comprehensive list of interesting and influential experiments. From New York to Cologne and from Helsinki to Lyons (to name just a few hotspots where colleagues of ours used to work on projects), lots of newsrooms and R&D departments have worked with immersive tech. They took us to food banks in California and to landing sites on Mars. They put us in prison cells and on refugee boats in the Mediterranean. They showed antique artefacts, pro athletes, and the rainforest. They used gaming engines, iOS and Android SDKs, XR-enhanced browsers, cardboard contraptions, tethered headsets, stand-alone headsets, and smartphones. However, no matter how their story was produced and what it focused on, it was exactly that: a story. The better part of the XR decade was apparently dedicated to immersive reporting/journalism.

Even though it's hard to say at this point if we're looking at a real trend, it seems that the last couple of years have seen at least a slight shift: away from XR as a storytelling device and towards XR as an infrastructure technology. What may have started with a couple of immersive, avatar-driven meetings during peak COVID gradually moved to other areas of media operations, first and foremost media production planning and media training.

Two recent R&D projects in DW's portfolio seem to be proof of this. The first one is XR4DRAMA (2020-2023), which heavily relied on AR, VR, and 3D models to improve situational awareness for reporters and filmmakers. The second one is actually SERMAS, which uses XR tech and avatars to improve and scale up security training for journalists. At the same time, DW is still using XR storytelling tools (e.g. Fader in MediaVerse) and looking into new kinds of immersive reporting (e.g. the Snap Spectacles in a new lab project). However, the major part of the innovation team's time and budget is currently spent on the aforementioned infrastructure projects.

The XR playground has clearly been expanded, and more changes are underway.


The world of technology has made vast strides in the last decade, with innovative new ideas and advances in fields such as Extended Reality (XR), Virtual Reality (VR), Augmented Reality (AR) and robotics. However, while there are some examples of robotics being used in everyday life - such as automatic vacuum cleaners or lawnmowers - they are still not widely used. One reason for this limited acceptance, we feel, is that these systems lack the ability to interact in a natural and intuitive way. 

This ability to interact is particularly relevant in public spaces such as offices, museums, and other institutions, where XR systems and robots must engage naturally with untrained users, which requires a level of trust and familiarity that can only come from careful design and implementation. 

Another major obstacle preventing widespread adoption is the issue of trust. Many people are understandably concerned about technology spying on them or stealing their personal data. It is therefore essential that we build trust in these technologies from the ground up, ensuring that they are designed to be secure and trustworthy. 

Finally, the architecture of these systems is also important. Modular development is key to ensuring that these technologies can be customized for different situations, while reusing the same underlying system helps people feel more comfortable and familiar with them. 

In conclusion, while XR, VR, AR and robotics have enormous potential, there are still many obstacles to their widespread use in everyday life. Natural interaction, trust and careful design are all essential if we are to make these technologies more accessible and useful for everyone. At SERMAS, we are focusing on these topics with the primary objective of humanizing these systems and making them more socially acceptable to users, thus demystifying their problematic aspects.

Author: Spindox Labs

Extended Reality (XR) is supported by a multitude of platforms, ranging from mobile devices and desktops to virtual reality (VR) and augmented reality (AR) headsets. However, this diversity of platforms poses a significant challenge for developers. They must navigate multiple development environments, languages, and tools to create XR applications that work seamlessly across all platforms. Fortunately, multiplatform XR development tools have emerged to ease this burden. 

Multiplatform XR development tools provide developers with a common framework for building XR applications across multiple platforms. These tools enable developers to write code once and deploy it across various devices, reducing development time and costs while ensuring consistent user experiences across devices. Some of the most popular multiplatform XR development tools include Unity, Unreal Engine, and Vuforia. 

Unity is a popular development platform for building 2D, 3D, and XR applications. Unity allows developers to create content for various platforms, including iOS, Android, Windows, and macOS, as well as VR and AR devices. With Unity, developers can write code in C# and deploy it across multiple devices, making it a popular choice for XR development. 

Unreal Engine is another widely used multiplatform development tool. Developed by Epic Games, Unreal Engine is a powerful engine for building high-quality, immersive XR experiences. It supports a variety of platforms, including mobile devices, desktops, consoles, and VR and AR devices. Unreal Engine also provides developers with a visual scripting language, making it an accessible option for those who prefer a more visual approach to development. 

Vuforia is a popular AR development tool that allows developers to create AR applications for various platforms, including iOS, Android, and HoloLens. Vuforia provides developers with tools for image recognition, tracking, and rendering, enabling them to create AR experiences that integrate seamlessly with the physical world. 

While these multiplatform XR development tools have simplified development across various devices, they still suffer from fragmentation. Each platform has its own APIs (Application Programming Interfaces), rendering engines, and hardware requirements, making it challenging to create applications that work seamlessly across all devices. 

Source: https://www.khronos.org/openxr 

OpenXR, developed by the Khronos Group (a consortium of leading technology companies including Microsoft, Oculus, and Epic Games), is an open, royalty-free standard for XR application development. It aims to simplify XR development by providing a single, unified API that abstracts the differences between XR platforms and devices. This allows developers to write code once and deploy it across multiple platforms, reducing development time and costs while ensuring consistent user experiences across devices. 
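To illustrate what such an abstraction layer buys developers, the toy sketch below shows the general pattern of writing application code once against a common interface, with interchangeable, platform-specific backends behind it. It is not the OpenXR API itself (which Khronos specifies as a C API); all class and method names here are hypothetical and chosen purely for illustration.

```python
# Toy illustration of a unified XR abstraction layer (hypothetical names, not OpenXR).
from abc import ABC, abstractmethod


class XRRuntime(ABC):
    """Common interface the application codes against, regardless of device."""

    @abstractmethod
    def create_session(self) -> None: ...

    @abstractmethod
    def render_frame(self, scene: str) -> None: ...


class HeadsetRuntime(XRRuntime):
    """Hypothetical backend standing in for a tethered or stand-alone headset."""

    def create_session(self) -> None:
        print("headset: session created")

    def render_frame(self, scene: str) -> None:
        print(f"headset: rendering '{scene}' in stereo")


class MobileARRuntime(XRRuntime):
    """Hypothetical backend standing in for a phone-based AR runtime."""

    def create_session(self) -> None:
        print("mobile AR: session created")

    def render_frame(self, scene: str) -> None:
        print(f"mobile AR: compositing '{scene}' over the camera feed")


def run_app(runtime: XRRuntime) -> None:
    """The application never changes: it only talks to the common interface."""
    runtime.create_session()
    runtime.render_frame("museum tour")


if __name__ == "__main__":
    for backend in (HeadsetRuntime(), MobileARRuntime()):
        run_app(backend)
```

The application function stays the same; only the backend passed to it changes, which is the essence of the "write once, deploy across devices" promise described above.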

In conclusion, multiplatform XR development tools have revolutionized the XR development landscape by enabling developers to create content for various devices. However, these tools still suffer from fragmentation, making it challenging to create applications that work seamlessly across all platforms. The OpenXR standard addresses this fragmentation by providing a common foundation for XR application development, so that applications built on it can run across a wide range of devices. As XR continues to evolve, OpenXR will play a critical role in simplifying development and accelerating the growth of XR applications. OpenXR 1.0 was released to the public by the Khronos Group at SIGGRAPH 2019.

Funded by the European Union
Copyright by SERMAS Consortium 2023. All Rights Reserved.