Sains Malaysiana 48(12)(2019): 2737–2747

http://dx.doi.org/10.17576/jsm-2019-4812-15

Automatic Speech Intelligibility Detection for Speakers with Speech Impairments: The Identification of Significant Speech Features

(Pengesanan Kecerdasan Pertuturan Automatik untuk Penutur dengan Ketaksempurnaan Pertuturan: Pengenalpastian Ciri Pertuturan Penting)
            
           
             
           FADHILAH ROSDI1*, MUMTAZ BEGUM MUSTAFA2, SITI SALWAH SALIM2 & NOR AZAN MAT ZIN1
            
           
             
1Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 46300 UKM Bangi, Selangor Darul Ehsan, Malaysia

2Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia
            
           
             
Received: 17 October 2018 / Accepted: 2 October 2019
            
           
             
           ABSTRACT
            
The selection of relevant features is important for discriminating speech in a detection-based ASR system, and it contributes to the improved performance of the detector. In the context of speech impairments, speech errors can be discriminated from regular speech by adopting appropriate speech features with high discriminative ability between the impaired and control groups. However, the identification of suitable discriminative speech features for error detection in impaired speech has not been well investigated in the literature. The characteristics of impaired speech differ greatly from those of regular speech, making existing speech features less effective in recognizing impaired speech. To address this gap, the speech features of impaired speech based on prosody, pronunciation and voice quality are analyzed to identify the significant speech features related to intelligibility deficits. In this research, we investigate the relations of speech impairments due to cerebral palsy and hearing impairment with prosody, pronunciation, and voice quality. We then identify the relationship between these speech features and speech intelligibility classification, and determine the significant speech features for improving the discriminative ability of an automatic speech intelligibility detection system. The findings show that prosody, pronunciation and voice quality features are statistically significant for improving the detection of impaired speech. Voice quality is identified as the best feature group, with the greatest discriminative power in detecting the speech intelligibility of impaired speech.
  
           Keywords: Automatic speech intelligibility detection; speech
            detection; speech features; speech impairments
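
To make the feature-analysis procedure summarized above concrete, the following is a minimal, hypothetical Python sketch, not the authors' actual pipeline: it assumes per-utterance feature matrices for three feature groups (prosody, pronunciation, voice quality) have already been extracted, uses random placeholder data in their place, tests each feature for a group difference between impaired and control utterances with a Mann-Whitney U test, and compares each group's overall discriminative power with a cross-validated SVM AUC. The group sizes, feature dimensions, and variable names are illustrative assumptions.

```python
# Hypothetical sketch: per-feature significance testing and group-level
# comparison of feature sets. Data are random placeholders; in practice each
# matrix would hold acoustic measurements from impaired and control speech.
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_impaired, n_control = 60, 60                     # assumed utterance counts
y = np.array([1] * n_impaired + [0] * n_control)   # 1 = impaired, 0 = control

# Assumed feature-group dimensions (illustrative only).
feature_groups = {
    "prosody":       rng.normal(size=(n_impaired + n_control, 4)),
    "pronunciation": rng.normal(size=(n_impaired + n_control, 6)),
    "voice quality": rng.normal(size=(n_impaired + n_control, 3)),
}

for name, X in feature_groups.items():
    # Per-feature significance: Mann-Whitney U test, impaired vs. control.
    p_values = [
        mannwhitneyu(X[y == 1, j], X[y == 0, j], alternative="two-sided").pvalue
        for j in range(X.shape[1])
    ]
    # Group-level discriminative power: cross-validated AUC of a simple SVM.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:13s}  min p = {min(p_values):.3f}  mean AUC = {auc:.3f}")
```

In the setting of this study, the placeholder matrices would be replaced by measurements such as pitch and duration statistics for prosody and, for example, jitter and shimmer for voice quality, extracted from the impaired and control recordings.
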
            
           
             
           ABSTRAK
            
           Pemilihan ciri yang relevan untuk membezakan pertuturan dalam
            sistem ASR berasaskan pengesanan adalah penting kerana
            menyumbang kepada peningkatan prestasi pengesan. Dalam konteks ketaksempurnaan
            pertuturan, kesalahan pertuturan boleh didiskriminasi daripada pertuturan biasa
            dengan menggunakan ciri pertuturan diskriminatif yang bersesuaian dengan
            keupayaan diskriminatif yang tinggi antara kumpulan terjejas dan kumpulan
            kawalan. Walau bagaimanapun, pengenalpastian ciri pertuturan diskriminatif yang
            sesuai untuk pengesanan ralat dalam pertuturan yang terjejas tidak dikaji
dengan baik dalam kajian kepustakaan. Ciri pertuturan yang terjejas adalah
            sangat berbeza daripada pertuturan biasa, dengan itu, menjadikan ciri
            pertuturan sedia ada kurang berkesan dalam mengenal pasti pertuturan yang
            terjejas. Untuk mengatasi jurang ini, ciri pertuturan ketaksempurnaan
            pertuturan berdasarkan prosodi, sebutan dan kualiti suara dianalisis untuk
            mengenal pasti ciri pertuturan penting yang berkaitan dengan defisit
            kecerdasan. Dalam penyelidikan ini, kami mengkaji hubungan antara kecacatan
            pertuturan akibat lumpuh otak dan kecacatan pendengaran dengan prosodi, sebutan
            dan kualiti suara. Seterusnya, kami mengenal pasti hubungan ciri pertuturan
            dengan pengelasan kecerdasan pertuturan dan ciri pertuturan yang penting dalam
            meningkatkan keupayaan diskriminatif sistem pengesanan kecerdasan pertuturan
secara automatik. Hasil menunjukkan bahawa ciri prosodi, sebutan dan kualiti suara
            adalah ciri pertuturan yang signifikan secara statistik untuk meningkatkan
            keupayaan pengesanan pertuturan yang terjejas. Kualiti suara dikenal pasti
            sebagai ciri pertuturan terbaik dengan kuasa yang lebih diskriminatif dalam
            mengesan kecerdasan pertuturan yang terjejas.
  
           Kata kunci: Ciri pertuturan; ketaksempurnaan pertuturan;
            pengesanan kecerdasan pertuturan automatik; pengesanan pertuturan
            
           
             
*Corresponding author; email: fadhilah.rosdi@ukm.edu.my