
Take Note: testing the intelligibility of synthesized speech

APR 18, 2025
Automated platforms may help inform enhancements in Text-To-Speech technology

Speech synthesis technology has seen major advancements since 1980, when several Text-To-Speech (TTS) systems first became publicly available. Improvements in acoustics, linguistics, and signal processing, along with the advent of artificial intelligence, have enabled many now-common TTS systems that can synthesize human voices intelligibly, even in noisy conditions. Automatic Speech Recognition (ASR) systems have also improved greatly over the years. Some ASR systems now differentiate speech from noise better than human listeners do.

Yang et al. evaluated the intelligibility of both human and synthesized voices using speech recordings of four human talkers (two female, two male) and twelve synthesized voices (six female, six male). The synthesized speech was generated by three commercial TTS platforms: Amazon Polly, Microsoft Azure Text-To-Speech and Google Text-To-Speech.

The researchers then transcribed those speech recordings in noisy conditions using five ASR platforms and found that two of the systems tested recognized 10% more words than human listeners did.

“With the help of modern ASR systems, we can start thinking about further improving speech synthesis technology,” said author Ye Yang. “For example, we may use ASR to screen highly intelligible speech materials and use these materials to develop a speech synthesis system that produces highly intelligible speech. Such a system is beneficial for hearing loss listeners, new language learners, or noisy scenarios.”

The results also showed a high correlation between ASR and human recognition results and highlighted areas for advancement.
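To make that kind of comparison concrete, the Python sketch below scores a transcript by the fraction of reference words it contains and then computes a Pearson correlation between two sets of scores. It is an illustration only, not the authors' method: the function names `words_correct` and `pearson`, the order-insensitive word matching, and the example scores are all assumptions introduced here.

```python
from collections import Counter
from math import sqrt


def words_correct(reference: str, transcript: str) -> float:
    """Fraction of reference words that also appear in the transcript.

    Assumption: words are matched case-insensitively and without regard
    to order, which is a simplification of real intelligibility scoring.
    """
    ref = Counter(reference.lower().split())
    hyp = Counter(transcript.lower().split())
    total = sum(ref.values())
    if total == 0:
        raise ValueError("reference must contain at least one word")
    matched = sum(min(count, hyp[word]) for word, count in ref.items())
    return matched / total


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical per-condition scores, invented for illustration only.
    asr_scores = [0.95, 0.80, 0.55, 0.30]
    human_scores = [0.90, 0.75, 0.50, 0.20]
    print(f"correlation: {pearson(asr_scores, human_scores):.3f}")
```

A coefficient near 1 would indicate that conditions an ASR system finds hard are also hard for human listeners, which is the property that would let ASR scores stand in as an intelligibility predictor.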

“We may improve speech enhancement, or noise reduction, systems by utilizing the intelligibility prediction power of an ASR system,” said Yang. “There are a lot of potential uses of ASR systems waiting to be explored.”

Source: “Evaluating synthesized speech intelligibility in noise,” by Ye Yang, Dathan Nguyen, Katherine Chen, and Fan-Gang Zeng, JASA Express Letters (2025). The article can be accessed at https://doi.org/10.1121/10.0036397.
