Sequence-to-Sequence Predictive Models: From Prosody to Communicative Gestures
Abstract
Communicative gestures and speech prosody are tightly linked. Our aim is to predict when gestures are performed, based on prosody. We develop a model based on a sequence-to-sequence (seq2seq) recurrent neural network with an attention mechanism. The model is trained on a corpus of natural dyadic interaction in which the speech prosody and the gestures have been annotated. Because the output of the model is a sequence, we use a sequence comparison technique to evaluate the model's performance. We find that the model can predict certain gesture classes. In our experiment, we also replace some input features with random values to determine which prosody features are pertinent, and find that F0 is among them. Lastly, we train the model on one speaker and test it on the other speaker to assess whether the model generalises; we find that a model trained on one speaker also works for the other speaker in the same conversation.
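The abstract does not name the sequence comparison technique used for evaluation. One common choice for comparing a predicted gesture-label sequence against a reference sequence is edit (Levenshtein) distance; the sketch below is an illustrative assumption, not the paper's actual metric, and the gesture labels (`"beat"`, `"iconic"`, `"none"`) are hypothetical.

```python
def edit_distance(pred, ref):
    """Levenshtein distance between two gesture-label sequences:
    the minimum number of insertions, deletions, and substitutions
    needed to turn `pred` into `ref`."""
    m, n = len(pred), len(ref)
    # dp[i][j] = distance between pred[:i] and ref[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of pred[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of ref[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[m][n]

# One substitution ("none" -> "iconic" at position 2) separates the sequences.
pred = ["beat", "none", "iconic", "none"]
ref = ["beat", "iconic", "iconic", "none"]
print(edit_distance(pred, ref))  # -> 1
```

A distance of 0 means the predicted gesture sequence exactly matches the annotation; normalising by the reference length gives a per-sequence error rate comparable across utterances of different lengths.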