Because this year’s challenge addressed gesture generation in a dyadic rather than a monadic context, we aimed to investigate how the previous state-of-the-art approach could be adapted to generate both speaker and listener behaviours. Our solution examines how accounting for the conversational role of the target agent at training and inference time influences the overall social appropriateness of the resulting gesture generation system. The system is evaluated qualitatively on three factors: human likeness, appropriateness for agent speech, and appropriateness for interlocutor speech. Our results suggest that separate models for listener and speaker behaviours have potential, especially for generating better listener behaviour. However, the underlying model structures for speaker and listener behaviour should differ, building on previous state-of-the-art monadic and dyadic solutions.