Our second open call has wrapped up, and while we await the results, we wanted to share some insights. SERMAS OC2 DEMONSTRATE aimed to push the boundaries by funding innovative AR/VR/XR sub-projects. This round invited fresh ideas from new industries and domains beyond those explored in the SERMAS pilots.


We encouraged applicants to take the cutting-edge XR technologies from the SERMAS Toolkit and reimagine them within their chosen fields, aligning with the vision of their proposed projects. The focus was on applying these innovations to one or more key areas: training, services, and guidance.

[Charts: OC2 DEMONSTRATE application statistics by country, sector, domain, and number of applications]

Here’s a look at the results: more than 100 applications were initiated in our pipeline (a big thank you to all the innovators who applied!), and 81 entities ultimately submitted applications. Greece took the lead with 13 applications, followed closely by Italy with 10, and Portugal rounded out the top three with 7.

In terms of domains, healthcare came out on top, representing 31% of submissions, while manufacturing, education, and cultural heritage were tied at 19% each. As for the sectors, training and services were neck and neck, attracting 39% and 36% of applications, respectively.

We’re in the final stages of selecting our winners—stay tuned!

Can you briefly explain what your project is all about? What’s unique about it?

The PRINIA project addresses the need to protect individual privacy and personal data in XR facial recognition. To achieve this, PRINIA is developing an autonomous module that implements facial recognition in XR while ensuring the protection of individual privacy and compliance with relevant EU legislation and regulations. The module builds on a validated research framework and a three-step innovation process: i) privacy rules and compliance management, ii) facial recognition with privacy enhancement, and iii) integration into privacy-preserving XR facial recognition. The resulting autonomous module supports various scenarios within XR environments, including individual identification, group categorization, and headset user identification, ensuring that users can be identified and categorized without compromising privacy or other sensitive data when using immersive technologies.

What’s the biggest milestone your startup(s) have achieved with your project so far, and what has surprised you most on this journey?

The biggest milestone achieved by the PRINIA project so far is completing the conceptual design of the PRINIA module, including the management of privacy rules and compliance and the implementation of XR facial recognition with privacy enhancement techniques. This milestone marks a step forward in designing a system that meets the goals of the PRINIA and SERMAS projects by demonstrating a path toward a privacy-preserving facial recognition module for XR environments. What surprised us most on this journey is that although the EU has described several steps for privacy preservation in detail and interest in privacy-preserving techniques is growing, challenges such as scalability and accuracy still arise in practice. In PRINIA, we are now focusing on characterising these challenges and refining differential privacy techniques to overcome them.
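To make that accuracy trade-off concrete, here is a minimal, purely illustrative sketch of the general differential-privacy idea the team mentions: calibrated Laplace noise is added to a face embedding before matching, so raw biometric data is never used in the clear. The function names, the 128-dimensional embeddings, and the epsilon value are all hypothetical assumptions for illustration, not PRINIA's actual implementation.

```python
# Illustrative only: generic differential privacy for face embeddings.
# NOT PRINIA's implementation; all names and values are hypothetical.
import numpy as np

def privatize_embedding(embedding: np.ndarray, epsilon: float, sensitivity: float = 1.0) -> np.ndarray:
    """Add Laplace noise calibrated to the privacy budget epsilon."""
    scale = sensitivity / epsilon  # smaller epsilon -> more noise -> stronger privacy
    return embedding + np.random.laplace(loc=0.0, scale=scale, size=embedding.shape)

def identify(query: np.ndarray, gallery: dict, threshold: float = 0.7):
    """Match a (privatized) query embedding against enrolled identities by cosine similarity."""
    best_id, best_sim = None, threshold
    for identity, ref in gallery.items():
        sim = float(query @ ref) / (np.linalg.norm(query) * np.linalg.norm(ref))
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id

# Enroll two users, then identify a noise-protected query.
rng = np.random.default_rng(0)
gallery = {"alice": rng.normal(size=128), "bob": rng.normal(size=128)}
query = privatize_embedding(gallery["alice"], epsilon=5.0)
print(identify(query, gallery))  # likely "alice"; lowering epsilon degrades accuracy
```

The epsilon knob captures exactly the tension described above: a tighter privacy budget injects more noise and strengthens protection, but pushes identification accuracy down, which is why refining such techniques is non-trivial.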

How did you measure success?

To measure the success of PRINIA, we use SMART objectives and KPIs, divided into three main categories: promotion and dissemination, technical excellence, and user experience. Regarding promotion and dissemination, we presented PRINIA in the "Shaping The Future" workshop at the CHI conference, which aimed to co-develop principles for policy recommendations for responsible innovation in virtual worlds. We also publish and share newsletters periodically. Regarding technical excellence, we have tested the identification accuracy for HMD user identification while applying privacy enhancements, achieving a score of over 90%. Regarding user experience, we plan to perform evaluation studies in the coming months to measure user acceptance, workload, and convenience in the PRINIA scenarios.

What are your goals over the next three and six months?

Over the following months, the goal is to complete the MVPs and integrate them with the SERMAS ecosystem. During the next three months, we will focus on identifying and verifying individuals in an XR environment. For example, imagine trainees who undergo a facial recognition process for authentication during virtual journalism training; once authenticated, they are granted access to the training environment. In the three months after that, we will focus on categorizing individuals into groups based on common characteristics, such as emotional states. For example, PRINIA will be able to categorize customers based on their facial attributes (e.g., satisfied, anxious, or neutral customers) so that a personalized customer experience can be provided at the customer reception kiosk.

How has SERMAS helped you during the past few months?

SERMAS has helped us in multiple ways during the past few months. We collaborate closely with our mentors (King's College London), who support and guide us in making the right decisions and adjustments to better implement the PRINIA project and achieve the SERMAS objectives. We hold regular meetings to share insights and track progress. Moreover, we receive feedback and insights from sprint review meetings with the other stakeholders and partners of the SERMAS consortium. Finally, SERMAS supports us technically and financially in implementing the solutions described in the PRINIA project.

The PRINIA project was presented by George E. Raptis, Designer and Researcher at Human Opsis, at the Shaping the Future Workshop.

Companies: Human Opsis and Algolysis.

Can you briefly explain what your project is all about? What’s unique about it?

3Dify is a user-friendly, open-source web application that empowers end-users to effortlessly convert 2D images into realistic 3D avatars, with no technical skills in 3D modeling required. 3Dify makes the most of AI, aiming to minimize the manual work of extracting facial features and converting them into a MakeHuman representation. However, AI does not replace the user: people can still manually refine facial features, select hairstyles, fine-tune colours and body proportions, and even add emotional expressions, making 3Dify a highly versatile tool for avatar authoring. This active role of end-users sets a new paradigm, allowing individuals to create semi-realistic avatars while leveraging Unity for high-fidelity rendering, real-time shadow generation, advanced customization, dynamic animations, and emotional feedback.

What’s the biggest milestone your startup(s) have achieved with your project so far, and what has surprised you most on this journey?

The main goal of 3Dify is to provide an open-source tool that allows non-experts to create a fully animated, ready-to-use, semi-realistic 3D avatar.

How did you measure success?

The avatar generation combines custom algorithms and AI-based processes that accurately extract dozens of facial features from a single image to create a semi-realistic avatar in less than one minute. KPIs: number of mapped face features > 25; response time between submission of the input and the downloaded output < 20 seconds.
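As a rough illustration of how such KPIs can be verified automatically, the harness below counts mapped features and times the end-to-end run. This is a hypothetical sketch: extract_face_features and generate_avatar are placeholder stubs standing in for 3Dify's actual pipeline, not its real API.

```python
# Hypothetical KPI harness; the two stubs stand in for 3Dify's real pipeline.
import time

def extract_face_features(image_path: str) -> dict:
    """Placeholder for the AI feature-extraction step (returns mapped features)."""
    return {f"feature_{i}": 0.0 for i in range(30)}

def generate_avatar(features: dict) -> bytes:
    """Placeholder for the MakeHuman/Unity avatar-generation step."""
    return b"<3d-asset>"

def check_kpis(image_path: str) -> None:
    start = time.perf_counter()
    features = extract_face_features(image_path)
    generate_avatar(features)
    elapsed = time.perf_counter() - start
    assert len(features) > 25, f"KPI failed: only {len(features)} features mapped"
    assert elapsed < 20.0, f"KPI failed: took {elapsed:.1f} s (limit: 20 s)"
    print(f"OK: {len(features)} features mapped in {elapsed:.2f} s")

check_kpis("portrait.jpg")
```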

What are your goals over the next three and six months?

For the second half of the project, we plan to enhance the web application by improving avatar customization and adding a preview feature for animations that display different emotional states.

How has SERMAS helped you during the past few months?

SERMAS has been instrumental in advancing our project. The development of 3Dify has helped us take a step forward in our research on Artificial Intelligence and avatar generation, which in turn will improve our academic output in the field. The SERMAS team's feedback and insights have been essential in improving the functionality and usability of our application.


Company: Centro Regionale Information Communication Technology

Can you briefly explain what your project is all about? What’s unique about it?

The LANGSWITCH project aims to develop socially acceptable ASR (Automatic Speech Recognition) systems suitable for use in environments and agents such as those of the SERMAS project. These ASR systems must work in real time, perform well in noisy environments, distinguish between different speakers, respect the privacy of the user, and work in a variety of languages with very different characteristics (widely spoken, less-resourced, etc.).

What’s the biggest milestone your startup(s) have achieved with your project so far, and what has surprised you most on this journey?

So far, we have worked with English, Spanish, and Basque. For English, we have outperformed Whisper, a high-quality, state-of-the-art system, under very demanding noise conditions. For Spanish and Basque, we have improved on Whisper's results both in general and in noisy conditions.

How did you measure success?

To measure the performance of our system, we use the standard WER (Word Error Rate) indicator, the percentage of words that are not correctly transcribed. We measure it in different noise conditions: no noise at all, and several SNRs (signal-to-noise ratios) ranging from 10 dB (quite high ambient noise) to 0 dB (noise at the same volume as the speech). For English, we aimed for a WER of around 5% in noisy conditions. We obtained 5.13%, 5.65%, and 7.21% in the 10, 8, and 5 dB conditions, respectively, by fine-tuning the Whisper large model, improving on its results by one to two points. The results obtained with the Whisper small model and our fine-tuned version of it are worse, but the improvement over the base Whisper small model is greater, from two to six points. For the other languages, the results we obtain are similar, but the improvements over the base Whisper models are larger because those models do not perform as well for these languages.
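For readers unfamiliar with the metric: WER is the word-level edit distance between the reference and the hypothesis transcript, normalised by the reference length, i.e. WER = (S + D + I) / N, where S, D, and I count substituted, deleted, and inserted words and N is the number of reference words. A from-scratch sketch of the computation (not LANGSWITCH's evaluation code) looks like this:

```python
# Standard word error rate via Levenshtein edit distance over words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = min edits to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution (or match)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words ≈ 0.33
```

A WER of 5% therefore means roughly one word in twenty is transcribed incorrectly.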

What are your goals over the next three and six months?

Over the next six months, we plan to develop ASR systems that perform similarly in noisy environments in two more languages, French and Italian, as well as a speaker identification system that respects user privacy.

How has SERMAS helped you during the past few months?

The mentoring provided by SERMAS has been very helpful, as they have provided us with detailed use cases of ASR systems in real-world scenarios, thus pointing out the direction and specifics of our developments.

Company: ORAI NLP Teknologiak (Elhuyar Fundazioa)

Can you briefly explain what your project is all about? What’s unique about it?

3DforXR is about generating 3D assets, i.e. photorealistic 3D models, in a fast, easy, and straightforward way, processing them, and then using them inside eXtended Reality applications and experiences. By combining 3D Computer Vision and Artificial Intelligence, 3DforXR will support three different modalities: 1) multi-view 3D reconstruction, where one can upload multiple overlapping images of an object and automatically obtain an exact 3D copy of it; 2) single-image 3D prediction, where a single front-facing image of an object is enough to generate a 3D model that resembles reality as closely as possible; and 3) 3D from text, where one provides just a textual description of the 3D model they need for an XR application and the 3DforXR tool generates a corresponding 3D asset. Although several software solutions for 3D reconstruction exist on the market, 3DforXR's unique point is that it offers a single entry point (a web application, by the end of the project) that combines different approaches, so one can generate 3D models optimized and ready to be used in XR applications from different inputs, whether images or just text.

What’s the biggest milestone your startup(s) have achieved with your project so far, and what has surprised you most on this journey?

We are currently very close to completing the second release of the 3DforXR project, which corresponds to the deployment of the enhanced versions of the two modalities for 3D asset generation from multiple or single images. We are excited by the progress achieved so far and were positively surprised by the improvement of the results in the demanding single-image 3D prediction module. A library of processing tools to modify both the geometry and the appearance of the derived 3D models is also ready to be shared with SERMAS partners, and we can't wait for their valuable feedback on the second release of our tools.

How did you measure success?

To measure the success of the two developed modalities, we performed quantitative and qualitative evaluations of their results, estimating two KPIs for the accuracy of the generated 3D models. For the multi-view 3D reconstruction approach, we measured depth estimation accuracy, which was higher than 85%. To estimate this, we ran our software on synthetic data, i.e. images synthesized through computer graphics from ground-truth 3D models, and compared the estimated depth map for each image against the ground-truth one. For the single-image prediction module, a similar approach was followed: instead of depth maps, we compared the predicted 3D models against a publicly available dataset. An F-score of 80% was achieved for the successful examples.
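For context, the F-score commonly reported for 3D reconstruction is the harmonic mean of precision (the fraction of predicted points lying within a distance tau of the ground-truth surface) and recall (the fraction of ground-truth points within tau of the prediction). The sketch below shows the generic metric with an arbitrary threshold; it is an illustration under those assumptions, not Up2metric's exact evaluation protocol.

```python
# Generic F-score at distance threshold tau for point clouds; illustrative only.
import numpy as np
from scipy.spatial import cKDTree

def f_score(predicted: np.ndarray, ground_truth: np.ndarray, tau: float = 0.01) -> float:
    """predicted, ground_truth: (N, 3) arrays of 3D points; tau is in model units."""
    d_pred, _ = cKDTree(ground_truth).query(predicted)  # nearest GT point per prediction
    d_gt, _ = cKDTree(predicted).query(ground_truth)    # nearest prediction per GT point
    precision = float(np.mean(d_pred < tau))  # predictions close to the surface
    recall = float(np.mean(d_gt < tau))       # surface covered by predictions
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

# Toy check: a lightly perturbed copy of the ground truth scores near 1.0.
gt = np.random.default_rng(1).uniform(size=(1000, 3))
pred = gt + np.random.default_rng(2).normal(scale=0.002, size=gt.shape)
print(f_score(pred, gt))
```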

What are your goals over the next three and six months?

Our main goal for the next three months is to develop the third modality of 3DforXR: the generation of 3D assets from textual descriptions. Looking further ahead, over the next six months we plan to integrate all the 3DforXR technology into the SERMAS toolkit, offer it to end users via a single web application with an intuitive user interface, and proceed with actions for the communication, dissemination, and exploitation of the project outcomes. We also aim to prepare a publication sharing our technological progress with the scientific XR and Computer Vision community during the last trimester of the project.

How has SERMAS helped you during the past few months?

Besides the obvious help of providing valuable funding to develop 3DforXR, the mentoring meetings helped us verify that we are on the right track, while feedback from the evaluation of the first release allowed our team to focus on what the SERMAS consortium considers most important, in order to maximize the impact of our solution.

Company: Up2metric

Can you briefly explain what your project is all about? What’s unique about it?

ALIVE aims to develop Empathic Virtual Assistants (EVAs) for customer support use cases. EVAs are capable of recognising, interpreting, and responding to human emotions by applying state-of-the-art Deep Learning (DL) algorithms for vision-, text-, and voice-based emotion recognition. The outputs of the models are aggregated into a single state that is fed in real time both into the dialogue state of a Large Language Model (LLM) for empathic text generation and into the state machine of the Avatar, which adapts its state accordingly and modifies each answer and interaction (e.g., facial expressions) for maximum personalization to the user's needs.
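A minimal sketch of what such a fusion step could look like is shown below: per-modality emotion probability distributions are combined by a weighted average into a single state, which is then surfaced to the LLM through the prompt. The five emotion labels, the weights, and the function names are hypothetical assumptions, not ALIVE's actual models or state representation.

```python
# Illustrative multimodal fusion; labels, weights, and names are hypothetical.
import numpy as np

EMOTIONS = ["happy", "sad", "angry", "anxious", "neutral"]  # placeholder set of five states

def fuse(modality_probs: dict, weights: dict) -> str:
    """Weighted average of per-modality distributions; returns the dominant emotion."""
    total = sum(weights[m] for m in modality_probs)
    fused = sum(weights[m] * p for m, p in modality_probs.items()) / total
    return EMOTIONS[int(np.argmax(fused))]

# Example frame: vision and voice available, text missing.
probs = {
    "vision": np.array([0.1, 0.1, 0.1, 0.6, 0.1]),
    "voice":  np.array([0.2, 0.1, 0.1, 0.4, 0.2]),
}
state = fuse(probs, weights={"vision": 0.6, "voice": 0.4})
prompt = f"The user currently appears {state}. Respond with an empathic, reassuring tone."
print(prompt)
```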

What’s the biggest milestone your startup(s) have achieved with your project so far, and what has surprised you most on this journey?

Our biggest milestone is the development of a pipeline that takes audio, text, and images as input, recognizes the user's emotion, and provides it as input to an LLM. Our project has received a lot of interest from the research community and industry, including from stakeholders with diverse backgrounds (e.g., neuroscientists and philosophers).

How did you measure success?

First, we developed five basic emotional states recognizable by the user. Second, we use at least three modalities as input to perceive the user's presence and interaction, and we integrate at least two of them as aggregators of the final emotional state.

What are your goals over the next three and six months?

Provide the generated empathetic text to an Avatar, which will adapt to it using empathetically relevant facial expressions. Prepare a small-scale pilot for validation purposes. Disseminate and exploit the project's outcomes.

How has SERMAS helped you during the past few months?

Apart from funding, we receive fruitful feedback and guidance from our mentor (Viktor Schmuck) every month.

Companies: Thingenious and IGODI

Author: Viktor Schmuck, Research Associate in Robotics at King's College London

On the 21st of May 2024, the European Council formally adopted the EU Artificial Intelligence (AI) Act. But what does it cover, and how does it impact solutions such as the Socially-acceptable Extended Reality Models And Systems (SERMAS) project, which deals with biometric identification and emotion recognition?

According to the official website of the EU Artificial Intelligence Act, the purpose of the regulation is to promote the adoption of human-centric AI solutions that people can trust, ensuring that such products are developed with the health, safety, and fundamental rights of people in mind. To that end, the EU AI Act outlines, among other things, a set of rules, prohibitions, and requirements for AI systems and their operators.

When analysing the main result of the SERMAS project, the SERMAS Toolkit, from the EU AI Act's point of view, we need to verify not only whether the designed solution complies with the regulations laid down, but also whether the outcomes of the project, such as exploitable results (e.g., a software product, or solutions deployed "in the wild"), will comply with them.

Parts of the Toolkit are being developed in research institutions, such as SUPSI, TUDa, and KCL, which places it under the exclusion criteria for scientific research and development according to Article 2(6). In addition, since the AI systems and models are not yet placed on the market or put into service, the AI Act regulations are not yet applicable to the Toolkit under Article 2(8) either. As a general rule of thumb, if a solution is either developed solely as a scientific research activity or not yet placed on the market, it does not need to go through the administrative process and risk assessment outlined by the Act.

That being said, if a solution involving AI is planned to be put on the market or provided as a service to those who wish to deploy it, even if it is open source, it is better to design its components with the regulations in mind and to prepare the documentation required for the legal release of the software. This article outlines some of these aspects using the SERMAS Toolkit as an example. Still, it is important to emphasize that AI systems need to be evaluated individually (and, e.g., registered) against the regulations of the Act.

First, we should look at the terminology relevant to the SERMAS Toolkit, all of which is outlined in Article 3 of the regulations. To begin with, the SERMAS Toolkit is considered an "AI system" since it is a "machine-based system that is designed to operate with varying levels of autonomy and that may exhibit adaptiveness after deployment, and that, for ... implicit objectives, infers, from the input it receives" (Article 3(1)). Part of the SERMAS Toolkit is a virtual agent whose facial expressions and behaviour are adjusted based on how it perceives users, and which is capable of user identification. However, the system is not fully autonomous, as it also has predefined knowledge and behaviour that is not governed by AI models; it is therefore considered an AI system with varying levels of autonomy. Speaking of user identification, the Toolkit also deals with two capabilities regulated by the Act: biometric identification and emotion recognition.

While Article 6 outlines that systems performing biometric identification and emotion recognition count as high-risk AI systems, the SERMAS Toolkit is considered high-risk only because of its emotion recognition: Annex III states that "biometric identification systems ... shall not include AI systems intended to be used for biometric verification the sole purpose of which is to confirm that a specific natural person is the person he or she claims to be" (Annex III(1a)), and verification is the Toolkit's only use of biometric identification. However, Article 6(2) and Annex III also specify that an AI system is considered high-risk if it is "intended to be used for emotion recognition" (Annex III(1c)) to any extent. While an AI system performing emotion recognition could be prohibited under the EU AI Act, the SERMAS Toolkit, like any other system, can be deployed if a review body finds that it does not negatively affect its users (e.g., cause harm or discriminate against them). Highlighting the factors relevant to the SERMAS Toolkit, each AI system is evaluated based on the "intended purpose of the AI system", the extent of its use, the amount of data it processes, and the extent to which it acts autonomously (Article 7(2a-d)). Moreover, the review body evaluates the potential for harm or adverse impact on people using the system, and the possibility of a person overriding a "decision or recommendation that may lead to potential harm" (Article 7(2g)). Finally, "the magnitude and likelihood of benefit of the deployment of the AI system for individuals, groups, or society at large, including possible improvements in product safety" (Article 7(2j)) is also taken into account.

But how does the SERMAS Toolkit fare against the above criteria? To begin with, it processes only the minimum amount of data absolutely necessary and is undergoing multiple reviews during its conception to ensure the secure management of any identifiable personal data. Moreover, it uses user identification only to enhance a building's security in an access-control scenario. Finally, it uses emotion recognition solely to provide and enhance the user experience by adapting the behaviour of a virtual agent. This tailored experience does not change the output of the agent, only its body language and facial expressions, which can only take the shape of neutral-to-positive nonverbal features. As such, the agent does not and cannot discriminate against people.

So, while the Toolkit is not a prohibited solution after all, it is still considered a high-risk system, which means that upon deployment, or when provided to deployers, certain regulations outlined by the Act need to be complied with. For instance, a risk-management system needs to be implemented and periodically revised, with a focus on the system's biometric identification and emotion recognition sub-systems (Article 9(1-2)). Since these sub-systems are model-based classification or identification methods, the regulations outlined by Article 10(1-6) for the training, validation, and testing of the underlying models and datasets should be followed (e.g., datasets used for training the models should be ethically collected, with as few errors as possible, and without bias). Moreover, thorough technical documentation is expected to be published and kept up to date according to Article 11(1), and the system should come with appropriate logging of interactions (Article 12(3)) and oversight tools (Article 14(1-2)), especially during identification processes. Lastly, when the Toolkit is put on the market, which is part of the exploitable results of the project, it needs to undergo assessment and be registered in a database together with its accompanying documentation.

In conclusion, the EU AI Act means that AI systems dealing with personal data, especially where emotion recognition or identification affects how an autonomous system operates, need to comply with the laid-out regulations. Solutions put into service or on the market may be classified as high-risk and would be prohibited by default. However, a thorough assessment procedure, compliance with additional security requirements, logging, and documentation may mean that an AI system can be cleared for deployment after all. As for systems built for non-professional or scientific research purposes, such as the still-under-development SERMAS Toolkit, they are encouraged to voluntarily comply with the requirements outlined by the Act (Recital 109), and in the long run they will benefit from being developed with those requirements in mind.

The EU AI Act can be explored in detail here, and to get a quick assessment of whether, and to what extent, an AI system that will be put on the market or into service is affected by it, the same website also provides this compliance checker tool.

(*hits play*)

Do you know this song? Some of you probably do, others maybe not. Some are probably searching for it online at this very moment, and maybe most of you just don’t care about the type of music we listen to while we work. In any case, the message is the same: meeting you is all we want to do!

If you are an innovator, device manufacturer, or technology provider and integrator in AI/ML/VR/XR, connect with us, because our OC2 DEMONSTRATE call is running and “hey, we’ve been trying to meet you”. 🎶

Why should you meet us, you ask?

SERMAS offers an innovative, collaborative environment with specialised infrastructure, technology, and knowledge, and the chance to be funded with up to EUR 150 000 per sub-project (lump sum per consortium).

Our OC2 DEMONSTRATE sub-projects must provide solutions as autonomous applications built on the SERMAS API and using the SERMAS Toolkit for web-based digital avatars and/or the SERMAS ROS2 Proxy for robotics applications.

We cover nine domains/industries: education, energy, tourism, cultural heritage, health, manufacturing, banking & insurance, retail, and marketing & advertising. These domains should be addressed within one or more of the overarching fields of application: training, services, and guidance.

Applicant consortia are encouraged to take the XR technologies from the SERMAS Toolkit and adapt them to the domain of their choice. It is possible to select more than one model and/or tool to apply to the proposed agent and facilitate the demonstration of the pilot.

When should we do it?

Now is the time. Read the guidelines for applicants and apply by 26 June 2024, 17:00 CEST.

From the floor of Immersive Tech Week in Rotterdam, we interviewed partners from SERMAS, XR2Learn, VOXReality, CORTEX2, and XR4ED about their predictions for the future of XR, project expectations for 2024, and the benefits of XR technologies.

Dive into this insightful discussion with Axel Primavesi, Ioannis Chatzigiannakis, Jordane RICHTER, Charles Gosme, Olga Chatzifoti, Fotis Liarokapis, Alain Pagani, and Moonisa Ahsan. Thank you for your valuable insights!

Watch the full interview here.

As you know, SERMAS launched its first open call to XR innovators (researchers, startups, SMEs) to design and develop innovative solutions addressing challenges proposed by our consortium, based on the needs identified while working on the pilots and developing the XR Agent and the SERMAS Toolkit.

The third parties will fully leverage the potential of SERMAS results to foster XR technology adoption, and we will offer an innovative, collaborative environment with specialised infrastructure, technology, and knowledge.

Innovators could apply to one of six challenges to develop high-value, impactful components, content, and frameworks in AR/VR/XR. In response to our first open call, we received 37 applications from 46 entities across 15 different European countries.

From this pool, the SERMAS team carefully evaluated and selected five sub-projects dedicated to developing and executing innovative solutions that address the challenges outlined by our consortium.

Follow the journey of our innovators and stay tuned for further developments.
