The detection of mutual gaze in the context of human-robot interaction is crucial for understanding a human partner’s behaviour. Monitoring users’ gaze from a long distance enables the prediction of their intentions and allows the robot to be proactive. Nonetheless, current implementations struggle, or cannot operate at all, in scenarios where detection from long distances is required. In this work, we propose a ROS2 software pipeline that detects mutual gaze at distances of up to 5 m. The code relies on robust off-the-shelf perception algorithms.
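As a rough illustration of how such a pipeline could be wired up, the sketch below shows a minimal ROS2 node in Python that subscribes to a camera stream and publishes a mutual-gaze flag. The topic names and the `estimate_mutual_gaze` stub are assumptions for illustration only, not the actual interface of the released code.

```python
# Minimal sketch of a ROS2 mutual-gaze node (rclpy). Topic names and the
# estimate_mutual_gaze() stub are illustrative assumptions, not the paper's API.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import Bool


def estimate_mutual_gaze(image_msg: Image) -> bool:
    """Stub: replace with an off-the-shelf face/gaze estimator
    (a real pipeline would first convert the message, e.g. via cv_bridge)."""
    return False


class MutualGazeDetector(Node):
    def __init__(self):
        super().__init__('mutual_gaze_detector')
        # Camera stream in, boolean mutual-gaze flag out.
        self.create_subscription(Image, '/camera/image_raw', self.on_image, 10)
        self.gaze_pub = self.create_publisher(Bool, '/mutual_gaze', 10)

    def on_image(self, msg: Image) -> None:
        out = Bool()
        out.data = estimate_mutual_gaze(msg)
        self.gaze_pub.publish(out)


def main():
    rclpy.init()
    rclpy.spin(MutualGazeDetector())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```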
You can read the whole scientific paper here.
As this year’s challenge addressed gesture generation in a dyadic context rather than a monadic one, our aim was to investigate how the previous state-of-the-art approach can be improved to better generate both speaker and listener behaviours. The presented solution investigates how taking into account the conversational role of the target agent at training and inference time influences the overall social appropriateness of the resulting gesture generation system. Our system is evaluated qualitatively on three factors: human likeness, appropriateness for agent speech, and appropriateness for interlocutor speech. Our results suggest that having separate models for listener and speaker behaviours is promising, especially for generating better listener behaviour. However, the underlying model structures for speaker and listener behaviour should differ, building on previous state-of-the-art monadic and dyadic solutions.
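To make the role-conditioning idea concrete, the sketch below shows one way inference could dispatch between separate speaker and listener models. The function and model names are hypothetical placeholders, not our actual implementation; both "models" are stubs standing in for trained networks.

```python
# Sketch of role-conditioned gesture generation: a separate model per
# conversational role. Names and feature shapes are illustrative assumptions.
from typing import List


def speaker_model(own_speech: List[float], partner_speech: List[float]) -> List[float]:
    # Speaker behaviour is driven mainly by the agent's own speech.
    return [0.0] * len(own_speech)  # stub motion sequence


def listener_model(partner_speech: List[float], own_speech: List[float]) -> List[float]:
    # Listener behaviour depends mostly on the interlocutor's speech.
    return [0.0] * len(partner_speech)  # stub motion sequence


def generate_gestures(own_speech: List[float],
                      partner_speech: List[float],
                      role: str) -> List[float]:
    """Dispatch to a role-specific model at inference time."""
    if role == 'speaker':
        return speaker_model(own_speech, partner_speech)
    if role == 'listener':
        return listener_model(partner_speech, own_speech)
    raise ValueError(f'unknown conversational role: {role!r}')
```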
Read the full paper here.
Authors: Valeria Villani, Beatrice Capelli, Lorenzo Sabattini, UNIMORE, 2023.
The growing spread of robots for service and industrial purposes calls for versatile, intuitive and portable interaction approaches. In particular, in industrial environments, operators should be able to interact with robots in a fast, effective, and possibly effortless manner. To this end, reality enhancement techniques have been used to achieve efficient management and simplify interactions, in particular in manufacturing and logistics processes.
Building upon this, in this paper we propose a system based on mixed reality that provides a ubiquitous interface for heterogeneous robotic systems in dynamic scenarios, where users are involved in different tasks and need to interact with different robots. By means of mixed reality, users can interact with a robot by manipulating its virtual replica, which is always co-located with the user and can be extracted whenever interaction is needed.
The system has been tested in a simulated intralogistics setting, where different robots are present and require sporadic intervention by human operators who are involved in other tasks. In our setting we consider the presence of drones and automated guided vehicles (AGVs) with different levels of autonomy, calling for different kinds of user intervention.
The proposed approach has been validated in virtual reality, considering quantitative and qualitative assessments of performance and users' feedback. A simplified sketch of the interaction principle follows below.
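As a simplified illustration of that principle, the sketch below maps a user's manipulation of a co-located virtual replica to a navigation goal for the corresponding physical robot. All class and method names here are hypothetical and do not reflect the system's actual codebase.

```python
# Sketch of the virtual-replica idea: the user moves a co-located replica in
# mixed reality, and the displacement becomes a goal for the real robot.
# Class and method names are hypothetical, not the paper's API.
from dataclasses import dataclass


@dataclass
class Pose:
    x: float
    y: float
    theta: float  # heading in radians


class VirtualReplica:
    """Mirrors one robot; manipulating it produces a goal for the real robot."""

    def __init__(self, robot_id: str, pose: Pose):
        self.robot_id = robot_id
        self.pose = pose  # replica pose, rendered next to the user

    def manipulate(self, new_pose: Pose) -> dict:
        # The user drags the replica to a new pose; translate that gesture
        # into a navigation goal for the physical robot (e.g. an AGV or drone).
        self.pose = new_pose
        return {'robot': self.robot_id, 'goal': new_pose}


# Usage: sporadic intervention on one robot among many.
replica = VirtualReplica('agv_1', Pose(0.0, 0.0, 0.0))
command = replica.manipulate(Pose(2.5, 1.0, 1.57))
print(command)  # forwarded to the robot's navigation stack
```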
Link to the publication: https://arxiv.org/abs/2307.05280
Published at the IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2023.
Author: The TUDa Team
With groundbreaking dialogue applications such as ChatGPT, a question is increasingly being asked: will dialogue agents make humans redundant? This is a question we need to ask ourselves, and one that will certainly become more relevant in the coming years. As such, it is worth pointing out the obvious weaknesses of current large-scale language models, especially when assessing whether they could make human professionals redundant.
When it comes to the potential replacement of human programmers, the evidence is sobering. Large language models are not yet ready to replace programmers, nor will they be in the near future. GitHub Copilot, for instance, has shown that while such models can provide good code suggestions, the code can still contain errors that need to be corrected by a human programmer. While progress is being made in code generation, and larger models seem to have better code generation capabilities than smaller ones, we are still far from the point where long, complex programs can be generated from scratch. For high-stakes programming scenarios, such as critical infrastructure design, the transition to AI programmers would be even slower.
In writing-intensive fields such as journalism, there may be an opportunity to reduce the human workload by having language models generate templates for articles. However, their feasibility for factual reporting will depend on the trustworthiness they gain in the future. For now, these models are known to hallucinate, and the problem is far from solved; until it is, language models will not be able to replace a human journalist.
At the same time, large language models could greatly boost the productivity of human professionals. An example is the use of language models in creative writing, such as providing suggestions or ideas to a human writer. Creativity often benefits from brainstorming, which a language assistant could help with. But ultimately, creative fields are driven by the natural human desire for artistic expression, so there will always be a human involved, no matter how skilled the models become.