Humanising Text-to-Speech Through Emotional Expression in Online Courses

November 24, 2019

This paper outlines an innovative approach to evaluating the emotional content of three online courses. It applies prosody detection, an affective computing technique, to two different text-to-speech (TTS) voices, alongside human raters who judged the emotional content of the text. The work aims to establish how much the emotional delivery of online educational resources can vary when a synthetic voice automatically converts text into audio. Preliminary results from this pilot research suggest that about one in every three sentences (35%) in a Massive Open Online Course (MOOC) contained emotional text, and that two existing assistive technology voices had poor emotional alignment when reading this text. Relative to the emotional content of the text they were reading, which was most frequently neutral, the synthetic voices were more likely to sound overly negative. We also analysed a synthetic voice whose emotional expression we configured to align with the course text, which showed promising improvements.