Comparing Listening Gains from AI Text-to-Speech vs. Real Human Voice Recordings in Uzbek EFL

Baxramova Malika

Authors

Baxramova Malika, Urgench State Pedagogical Institute

Keywords:

Artificial intelligence, text-to-speech, listening comprehension, EFL learners, Uzbek education

Abstract

This study compares the effectiveness of AI-based text-to-speech (TTS) technology and traditional human voice recordings in fostering listening comprehension among Uzbek EFL learners. With the growing use of AI in education, TTS tools offer a promising alternative to human recordings, particularly in resource-limited contexts. Seventy-two intermediate-level secondary school students in Tashkent participated in a five-week intervention in which one group listened to AI-generated audio while the other listened to native-speaker voice recordings based on identical scripts. Both groups completed comprehension quizzes and summary tasks to measure improvement.

Findings indicate that while both approaches led to significant listening gains, students exposed to human voices performed slightly better in interpreting emotional tone, stress, and implicit meaning. Meanwhile, the AI TTS group demonstrated more consistent progress in recognizing vocabulary and understanding explicit content. Learners appreciated the clarity, predictability, and slower pacing of the TTS voices, which lowered anxiety and improved focus. Despite some limitations in expressiveness and naturalness, AI TTS tools proved to be an effective supplemental resource, especially in classrooms where access to native-speaker recordings is limited. The study concludes that a hybrid approach combining TTS and human voice recordings could offer the most balanced and accessible strategy for EFL listening instruction in Uzbekistan.
