Can you briefly explain what your project is all about? What’s unique about it?
The PRINIA project addresses the need to protect individual privacy and personal data in XR facial recognition. To achieve this, PRINIA develops an autonomous module that implements facial recognition in XR while ensuring the protection of individual privacy and compliance with relevant EU legislation and regulations. It does so by adopting a validated research framework and a three-step innovation process: i) privacy rules and compliance management, ii) facial recognition supporting privacy enhancement, and iii) integration for privacy-preserving XR facial recognition. The resulting autonomous module supports various scenarios within XR environments, including individual identification, group categorization, and headset user identification, ensuring that users can be identified and categorized without compromising privacy or other sensitive data when using immersive technologies.
What’s the biggest milestone with your project your startup(s) have achieved so far, and what has surprised you most on this journey?
The biggest milestone achieved by the PRINIA project so far is completing the conceptual design of the PRINIA module, including the management of privacy rules and compliance and the implementation of XR facial recognition with privacy enhancement techniques. This milestone marks a step forward in designing a system that meets the goals of the PRINIA and SERMAS projects by demonstrating a path toward a privacy-preserving facial recognition module for XR environments. What surprised us most on this journey is that although the EU has described several steps for privacy preservation in detail and there is growing interest in privacy-preserving techniques, challenges such as scalability and accuracy still arise. In PRINIA, we now focus on specifying these challenges and refining differential privacy techniques to overcome them.
How did you measure success?
To measure the success of PRINIA, we use SMART objectives and KPIs, divided into three main categories: promotion and dissemination, technical excellence, and user experience. Regarding promotion and dissemination, we presented PRINIA in the "Shaping The Future" workshop at the esteemed CHI conference, which aimed to co-develop principles for policy recommendations for responsible innovation in virtual worlds. We also publish and share newsletters periodically. Regarding technical excellence, we have tested the identification accuracy for HMD user identification while applying privacy enhancements, achieving a score of over 90%. Regarding user experience, we plan to perform evaluation studies in the following months, measuring user acceptance, workload, and convenience across the PRINIA scenarios.
What are your goals over the next three and six months?
Over the following months, the goal is to complete the MVPs and integrate them with the SERMAS ecosystem. In particular, during the next three months, we will focus on identifying and verifying individuals in an XR environment. For example, imagine trainees who undergo a facial recognition process for authentication during virtual journalism training; once authenticated, they are granted access to the training environment. In the three months after that, we will focus on categorizing individuals into groups based on common characteristics, such as emotional states. For example, PRINIA will be able to categorize customers based on their facial attributes (e.g., satisfied, anxious, or neutral customers) so that a personalized customer experience can be provided at the customer reception kiosk.
How has SERMAS helped you during the past few months?
SERMAS has helped us in multiple ways during the past few months. We collaborate well with our mentors (King's College London), who support and guide us in making appropriate decisions and adjustments to implement the PRINIA project better and achieve the SERMAS objectives. We have regular meetings to share insights and keep track of the projects. Moreover, we receive feedback and insights from sprint review meetings with the rest of the stakeholders and partners of the SERMAS consortium. Finally, SERMAS supports us technically and financially to implement the solutions described in the PRINIA project.
Companies: Human Opsis and Algolysis.
Can you briefly explain what your project is all about? What’s unique about it?
3Dify is a user-friendly, open-source web application that empowers end-users to effortlessly convert 2D images into realistic 3D avatars, requiring no technical skills in 3D modeling. 3Dify makes the best of AI, aiming to minimize the manual craft of extracting facial features and converting them into a MakeHuman representation. However, users are not replaced by AI: they can still manually refine facial features, select hairstyles, fine-tune colours and body proportions, and even add emotional expressions, making the tool highly versatile for avatar authoring. This active role of end-users sets a new paradigm, allowing individuals to create semi-realistic avatars while leveraging Unity for high-fidelity rendering, real-time shadow generation, advanced customization, dynamic animations, and emotional feedback.
What’s the biggest milestone with your project your startup(s) have achieved so far, and what has surprised you most on this journey?
The main goal of 3Dify is to provide an open-source tool that allows non-experts to create a fully animated, ready-to-use, semi-realistic 3D avatar.
How did you measure success?
The avatar generation combines custom algorithms and AI-based processes that accurately extract dozens of facial features from a single image to create a semi-realistic avatar in less than one minute. KPIs: number of mapped facial features > 25; response time between submission of the input and download of the output < 20 seconds.
What are your goals over the next three and six months?
For the second half of the project, we plan to enhance the web application by improving avatar customization and adding a preview feature for animations that display different emotional states.
How has SERMAS helped you during the past few months?
SERMAS has been instrumental in advancing our project. The development of 3Dify has helped us take a step forward in our research on Artificial Intelligence and avatar generation, which in turn will improve our academic output in the field. The consortium's feedback and insights have been essential in improving the functionality and usability of our application.
Company: Centro Regionale Information Communication Technology
Can you briefly explain what your project is all about? What’s unique about it?
The LANGSWITCH project aims to develop socially acceptable ASR (Automatic Speech Recognition) systems suitable for use in environments and agents such as those of the SERMAS project. These ASR systems must work in real time, perform well in noisy environments, distinguish between different speakers, respect the privacy of the user, and work in a variety of languages with very different characteristics (large, less-resourced, and so on).
What’s the biggest milestone with your project your startup(s) have achieved so far, and what has surprised you most on this journey?
So far, we have worked with English, Spanish, and Basque. For English, we have outperformed a high-quality, state-of-the-art system, Whisper, in very demanding noise conditions. For Spanish and Basque, we have improved on Whisper's results both in general and in noisy conditions.
How did you measure success?
To measure the performance of our system, we use the standard WER (Word Error Rate) indicator: the percentage of words that are not correctly transcribed. We measure it under different noise conditions: no noise at all, and various signal-to-noise ratios (SNRs), ranging from 10 dB (quite high ambient noise) to 0 dB (noise at the same volume as the speech). For English, we aimed for a WER of around 5% in noisy conditions. We obtained 5.13%, 5.65% and 7.21% in the 10, 8 and 5 dB conditions respectively by fine-tuning the Whisper large model, improving its results by one to two points. The results obtained with the Whisper small model, and with our system fine-tuned from it, are worse, but the improvement over the base Whisper small model is greater, from two to six points. For the other languages, the results we obtain are similar, but the improvements over the base Whisper models are greater, because the base models do not perform as well for those languages.
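For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The following is a minimal illustrative sketch, not ORAI's actual evaluation code:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / N,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # deleting every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j          # inserting every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[-1][-1] / len(ref)

# One dropped word out of six reference words: WER = 1/6 ≈ 16.7%
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A WER of 5.13% in the 10 dB condition thus means roughly one word in twenty was transcribed incorrectly despite high ambient noise.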
What are your goals over the next three and six months?
Over the next six months, we plan to develop ASR systems that perform similarly in noisy environments in two more languages, French and Italian, as well as a speaker identification system that respects user privacy.
How has SERMAS helped you during the past few months?
The mentoring provided by SERMAS has been very helpful, as they have provided us with detailed use cases of ASR systems in real-world scenarios, thus pointing out the direction and specifics of our developments.
Company: ORAI NLP Teknologiak (Elhuyar Fundazioa)
Can you briefly explain what your project is all about? What’s unique about it?
3DforXR is about generating 3D assets, i.e. photorealistic 3D models, in a fast, easy, and straightforward way, processing them, and then using them inside eXtended Reality applications and experiences. By combining 3D Computer Vision and Artificial Intelligence, 3DforXR will support three different modalities: 1) multi-view 3D reconstruction, where one uploads multiple overlapping images of an object and automatically obtains an exact 3D copy of it; 2) single-image 3D prediction, where a single front-facing image of an object is enough to generate a 3D model that resembles reality as closely as possible; and 3) 3D from text, where one provides just a textual description of the 3D model needed for an XR application and the 3DforXR tool generates a corresponding 3D asset. Although several software solutions for 3D reconstruction exist on the market, 3DforXR's unique point is that it offers a single entry point, a web application by the end of the project, that combines these different approaches, letting users generate 3D models optimized and ready to be used in XR applications from different inputs, whether images or just text.
What’s the biggest milestone with your project your startup(s) have achieved so far, and what has surprised you most on this journey?
We are currently very close to completing the second release of the 3DforXR project, which corresponds to the deployment of enhanced versions of the two modalities for 3D asset generation from multiple or single images. We are excited by the progress achieved so far and were pleasantly surprised by the improvement of the results in the demanding single-image 3D prediction module. A library with processing tools to modify both the geometry and the appearance of the derived 3D models is also ready to be shared with SERMAS partners, and we can't wait for their valuable feedback on the second release of our tools.
How did you measure success?
To measure the success of the two developed modalities, we performed quantitative and qualitative evaluations of their results, estimating two KPIs for the accuracy of the generated 3D models. For the multi-view 3D reconstruction approach, we measured depth estimation accuracy, which was higher than 85%. To estimate this, we ran our software on synthetic data, i.e. images synthesized through computer graphics from ground-truth 3D models, and compared the estimated depth maps for each image against the ground-truth ones. For the single-image prediction module a similar approach was followed, where instead of depth maps we compared the predicted 3D models against a publicly available dataset. An F-score of 80% was achieved for the successful examples.
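For context, the F-score commonly used to compare a predicted 3D model against a ground-truth one is the harmonic mean of precision (the fraction of predicted points lying within a distance threshold of the ground truth) and recall (the converse). Below is a minimal sketch over point sets using a brute-force nearest-neighbour search; it is illustrative only, not up2metric's actual evaluation code, and the threshold value is an arbitrary example:

```python
import math

def fscore(pred, gt, tau):
    """F-score at distance threshold tau between two 3D point sets:
    harmonic mean of precision and recall."""
    def frac_within(src, dst):
        # Brute-force nearest neighbour; fine for small illustrative sets.
        return sum(1 for p in src
                   if min(math.dist(p, q) for q in dst) <= tau) / len(src)
    precision = frac_within(pred, gt)   # predicted points matched by gt
    recall = frac_within(gt, pred)      # gt points covered by prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pred = [(0, 0, 0), (1, 0, 0)]
gt = [(0, 0, 0), (1, 0, 0), (2, 2, 2)]  # one ground-truth point is missed
print(fscore(pred, gt, tau=0.01))  # precision 1.0, recall 2/3 -> F = 0.8
```

Real evaluations use dense point samples from the mesh surfaces and a spatial index instead of the brute-force loop, but the metric itself is the same.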
What are your goals over the next three and six months?
Our main goal for the next three months is to develop the third modality of 3DforXR: the generation of 3D assets from textual descriptions. Looking further ahead, over the next six months we look forward to integrating all the 3DforXR technology into the SERMAS toolkit, offering it to end users via a single web application with an intuitive user interface, and proceeding with actions for the communication, dissemination, and exploitation of the project outcomes. Preparing a publication to share our technological progress with the scientific XR and Computer Vision community is also among our goals for the last trimester of the project.
How has SERMAS helped you during the past few months?
Besides the obvious help of providing valuable funding to develop 3DforXR, mentoring meetings helped us verify that we are on the right track, while the feedback from the evaluation of the first Release allowed our team to focus on what is considered more important by the SERMAS consortium in order to maximize the impact of our solution.
Company: Up2metric
Can you briefly explain what your project is all about? What’s unique about it?
ALIVE aims to develop Empathic Virtual Assistants (EVAs) for customer support use cases. EVAs are capable of recognising, interpreting and responding to human emotions by applying state-of-the-art Deep Learning (DL) algorithms for vision-, text- and voice-based emotion recognition. The outputs of the models are aggregated into a single state that is fed in real time into the dialogue state of a Large Language Model (LLM) for empathic text generation, and into the state machine of the Avatar, which adapts its state accordingly and tailors each answer and interaction (e.g., facial expressions) to the user's needs.
What’s the biggest milestone with your project your startup(s) have achieved so far, and what has surprised you most on this journey?
Our biggest milestone is the development of a pipeline that takes audio, text and images as input, recognizes the user's emotion, and provides it as input to an LLM. Our project has received a lot of interest from the research community and industry, even from stakeholders with diverse backgrounds (e.g., neuroscientists and philosophers).
How did you measure success?
First, we developed five basic emotional states that the system can recognize in the user. Second, we use at least three modalities as input to perceive the user's presence and interaction, and integrate at least two of them when aggregating the final emotional state.
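ALIVE's exact aggregation scheme is not described here; one common way to combine per-modality outputs into a single emotional state is simple late fusion, averaging each modality's probability distribution over the emotion classes. A minimal sketch, with a purely illustrative emotion set and made-up model outputs:

```python
# Illustrative label set; ALIVE's actual five states are not specified here.
EMOTIONS = ["happy", "sad", "angry", "surprised", "neutral"]

def fuse(modality_probs):
    """Late fusion: average each modality's probability distribution over
    the emotion classes, then pick the dominant class."""
    n = len(modality_probs)
    fused = [sum(p[i] for p in modality_probs) / n
             for i in range(len(EMOTIONS))]
    label = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
    return label, fused

# Hypothetical per-modality outputs (vision and voice lean toward "happy").
vision = [0.6, 0.1, 0.1, 0.1, 0.1]
voice = [0.3, 0.2, 0.1, 0.2, 0.2]
label, fused = fuse([vision, voice])
print(label)  # happy
```

Weighted averaging, or gating on per-modality confidence, are natural refinements of the same idea.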
What are your goals over the next three and six months?
Provide the generated empathetic text to an Avatar, which will adapt to it using empathetically relevant facial expressions. Prepare a small-scale pilot for validation purposes. Disseminate and exploit the project's outcomes.
How has SERMAS helped you during the past few months?
Apart from funding, we receive fruitful feedback and guidance from our mentor (Viktor Schmuck) every month.
Companies: Thingenious and IGODI
1. Nine months into the project, how do you think SERMAS is shaping its work to make a difference in the XR sphere?
The project can hopefully make XR more human-centered, more trustworthy, and thus more useful and enjoyable.
2. What is the SERMAS vision of XR?
I'm just going to spell out the acronym: socially-acceptable extended reality models and systems. That sums it up nicely.
3. In relation to the project outcomes, what is the main result that SERMAS will provide?
It's still a bit too early to predict final results, but we should aim for a modular, sustainable toolkit that allows users to create relatable XR agents which can then be put to work across different platforms and scenarios.
4. Reflecting on the past six months and gazing into the future, which trend stands out prominently and will shape the landscape of XR?
It's hard to pick one. Avatars and all kinds of virtual identities will probably play an important role. Then there's the big hype around generative AI, which will most likely lead to some sort of generative XR, i.e. we'll see more stunning images and other machine learning "magic" in augmented and virtual reality experiences. Last, but not least, there's a good chance that the introduction of the Apple Vision Pro headset will kick off a new software development boom, which will then lead to spatial computing experiences that are more sophisticated and more popular.
5. Which sector do you believe is making the most extensive use of XR technology? Additionally, looking ahead to the future, which sector do you anticipate will adopt XR technology?
I guess a lot of people in manufacturing/engineering and medicine/healthcare are using XR, especially when it comes to augmented reality tutorials and simulations. XR is also leaving its mark in education and entertainment. As for new sectors: I don't think we're going to see any surprises. People will try and upscale XR in existing fields of application, though. Think secondary schools and university labs.
6. Describe the project in one word.
Promising.
1. How do you think the SERMAS project can make a difference in the XR sphere?
The big ambition of SERMAS is to realize socially acceptable XR agents and to elevate the interaction between human users and AI-based technology to the next level. With a special focus on the robotics domain, we believe the activities developed in the project will push the boundaries of human-robot interaction by exploiting the integration of verbal and nonverbal communication and benefiting from synergies with other research domains such as psychology.
2. What is the SERMAS vision of XR?
The XR agent is intended in a very broad sense, ranging from a virtual to a physical one. This gives the project a deeper interest, since it offers the opportunity to analyze interaction from a very wide point of view.
3. What is the main result SERMAS will deliver?
We aim at implementing socially accepted behaviours for artificial agents such as robots, so that a larger pool of users will have the occasion to interact with these machines, whose ubiquity has now reached its highest point.
4. What do you think is the most important trend in the XR field for 2023?
We believe in the ambition of SERMAS to develop interaction modalities that involve any user, including people who are not used to handling high-tech tools. This aspect is, on one side, very interesting and stimulating from the research point of view, since it provides researchers with novel challenges. On the other side, it makes this kind of technology available to a wider range of users, including non-technicians such as elderly people, children, or patients.
5. Which sector is using the most XR at the moment?
The industrial sector is surely the domain where XR has been most investigated as a possible auxiliary and complementary technology alongside classical tools, as demonstrated by the vast literature in this field; see for example [1][2][3].
6. Describe the project in one word.
Stimulating.
[1] Tariq Masood and Johannes Egger, "Augmented reality in support of Industry 4.0—Implementation challenges and success factors," Robotics and Computer-Integrated Manufacturing, Volume 58, 2019, Pages 181-195. https://www.sciencedirect.com/science/article/abs/pii/S0736584518304101
[2] Luís Fernando de Souza Cardoso, Flávia Cristina Martins Queiroz Mariano, and Ezequiel Roberto Zorzal, "A survey of industrial augmented reality," Computers & Industrial Engineering, Volume 139, 2020. https://www.sciencedirect.com/science/article/abs/pii/S036083521930628X
[3] Eleonora Bottani and Giuseppe Vignali, "Augmented reality technology in the manufacturing industry: A review of the last decade," IISE Transactions, 51:3, 2019, Pages 284-310. https://www.tandfonline.com/doi/full/10.1080/24725854.2018.1493244
1. How do you think the SERMAS project can make a difference in the XR sphere?
With a focus on social acceptance, SERMAS provides a novel framework for building agents that can interact with humans reliably to enhance the XR experience.
2. What is the SERMAS vision of XR?
We see the opportunity opened by XR and AI technologies, which will revolutionize our daily lives as smartphones have done. XR technologies can be applied in most sectors to support us humans, from public services to personalized healthcare and education. XR technologies have been around for quite a while now; however, in practice, they are rarely used. As the potential use cases are rather limitless, how to include them in our daily lives "the right way" is still an open question. One important factor is the lack of user acceptability, and this is the starting point for SERMAS.
3. What is the main result SERMAS will deliver?
At TUDa, we will deploy frameworks for the design, development, deployment and management of methods for language-based interactions between humans and the SERMAS XR agent.
4. What do you think is the most important trend in the XR field for 2023?
Improving the user experience with XR technologies.
5. Which sector is using the most XR at the moment?
Education (virtual training), as well as industry and manufacturing.
6. Describe the project in one word.
Innovation
1. How do you think the SERMAS project can make a difference in the XR sphere?
SERMAS will provide evidence of the importance of human factors in technology. Thanks to XR, people have access to many services, but they are still looking for a more natural interaction with virtual systems.
2. What is the SERMAS vision of XR?
SERMAS's vision of XR is driven by a socio-technical approach. This means that XR is considered a technology that will expand people's potential. This will be possible both by designing easy-to-use technology and by addressing important principles such as diversity, ethics, and security.
3. What is the main result SERMAS will deliver?
A person being able to interact with a non-human agent as they would with another person.
4. What do you think is the most important trend in the XR field for 2023?
Given the fact that many sectors are moving quickly in this direction, there will be an increase in the usage of XR in most sectors. In my opinion: gaming, entertainment, retail and healthcare.
5. Which sector is using the most XR at the moment?
Sectors focused on social interactions and entertainment.
6. Describe the project in one word.
SERMAS is SERMAS! (when the interaction with an agent will not create any social concern, it will simply be SERMAS!) 😊
1. How do you think the SERMAS project can make a difference in the XR sphere?
What really sets SERMAS apart, in my view, is the joint multidisciplinary effort of researchers of different disciplines and sectors working on the theory and practice of XR systems. Although such multidisciplinarity is certainly not uncommon in the XR sphere, SERMAS brings an unprecedented effort to achieve not just the development of new models and systems, but also the guarantee that these models and systems will be efficient, secure, trustworthy, and thus, ultimately, socially acceptable.
2. What is the SERMAS vision of XR?
SERMAS' vision is that XR will really enter the life of the citizen only if it is efficient, secure, trustworthy, and thus, ultimately, socially acceptable. Functionality alone won't be enough, nor will the hype surrounding social media. It will need to be accessible, and attractive, to "normal" users, including those who are technology averse or usually unable or unwilling to engage with technology.
3. What is the main result SERMAS will deliver?
A novel approach to designing, developing and analysing XR agents. XR agents developed with the SERMAS approach will not just be attractive to developers and users, but also trailblazing, as they will pave the way for the adoption of a new generation of XR.
4. What do you think is the most important trend in the XR field for 2023?
An advanced, secure, trustworthy socially acceptable user experience.
5. Which sector is using the most XR at the moment?
The arts, but other sectors are gaining ground rapidly, basically, all those that provide services of some form. Oh, and social media, of course.
6. Describe the project in one word.
Trailblazing.