Sprache · Stimme · Gehör 2023; 47(03): 145-150
DOI: 10.1055/a-2089-5778
Focus Topic

Artificial Intelligence for the Analysis of Pathologic Speech
Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer

Abstract

Speech can carry a variety of diagnostically relevant cues. This review article shows how artificial intelligence methods, in particular machine learning and speech processing, can be applied to speech signals to assess intelligibility, to automate standardized tests, and to determine medical scales and diagnoses. It concludes with a critical review of acoustic features across a variety of pathologies, which gives reason to believe that these markers do indeed contain diagnostically relevant information.



Publication History

Article published online:
05 September 2023

© 2023. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 