Sains Malaysiana 48(12)(2019): 2737–2747

http://dx.doi.org/10.17576/jsm-2019-4812-15

Automatic Speech Intelligibility Detection for Speakers with Speech Impairments: The Identification of Significant Speech Features

(Pengesanan Kecerdasan Pertuturan Automatik untuk Penutur dengan Ketaksempurnaan Pertuturan: Pengenalpastian Ciri Pertuturan Penting)
            
           
             
           FADHILAH ROSDI1*, MUMTAZ BEGUM MUSTAFA2, SITI SALWAH SALIM2 & NOR AZAN MAT ZIN1
            
           
             
1Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 46300 UKM Bangi, Selangor Darul Ehsan, Malaysia

2Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia
            
           
             
Received: 17 October 2018 / Accepted: 2 October 2019
            
           
             
           ABSTRACT
            
The selection of relevant features is important for discriminating speech in a detection-based ASR system, and it contributes to the improved performance of the detector. In the context of speech impairments, speech errors can be discriminated from regular speech by adopting appropriate speech features with high discriminative ability between the impaired and control groups. However, the identification of suitable discriminative speech features for error detection in impaired speech has not been well investigated in the literature. The characteristics of impaired speech differ greatly from those of regular speech, making existing speech features less effective in recognizing impaired speech. To address this gap, the speech features of impaired speech based on prosody, pronunciation and voice quality are analyzed to identify the significant speech features related to intelligibility deficits. In this research, we investigate the relations of speech impairments due to cerebral palsy and hearing impairment with prosody, pronunciation, and voice quality. We then identify the relationship between these speech features and speech intelligibility classification, and determine the significant speech features for improving the discriminative ability of an automatic speech intelligibility detection system. The findings show that prosody, pronunciation and voice quality features are statistically significant for improving the detection of impaired speech. Voice quality is identified as the best feature group, with the greatest discriminative power in detecting the speech intelligibility of impaired speech.
  
           Keywords: Automatic speech intelligibility detection; speech
            detection; speech features; speech impairments
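
To make the feature-analysis procedure summarized above concrete, the following is a minimal, hypothetical Python sketch, not the authors' actual pipeline: it assumes per-utterance feature matrices for three feature groups (prosody, pronunciation, voice quality) have already been extracted, uses random placeholder data in their place, tests each feature for a group difference between impaired and control utterances with a Mann-Whitney U test, and compares each group's overall discriminative power with a cross-validated SVM AUC. The group sizes, feature dimensions, and variable names are illustrative assumptions.

```python
# Hypothetical sketch: per-feature significance testing and group-level
# comparison of feature sets. Data are random placeholders; in practice each
# matrix would hold acoustic measurements from impaired and control speech.
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_impaired, n_control = 60, 60                     # assumed utterance counts
y = np.array([1] * n_impaired + [0] * n_control)   # 1 = impaired, 0 = control

# Assumed feature-group dimensions (illustrative only).
feature_groups = {
    "prosody":       rng.normal(size=(n_impaired + n_control, 4)),
    "pronunciation": rng.normal(size=(n_impaired + n_control, 6)),
    "voice quality": rng.normal(size=(n_impaired + n_control, 3)),
}

for name, X in feature_groups.items():
    # Per-feature significance: Mann-Whitney U test, impaired vs. control.
    p_values = [
        mannwhitneyu(X[y == 1, j], X[y == 0, j], alternative="two-sided").pvalue
        for j in range(X.shape[1])
    ]
    # Group-level discriminative power: cross-validated AUC of a simple SVM.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:13s}  min p = {min(p_values):.3f}  mean AUC = {auc:.3f}")
```

In the setting of this study, the placeholder matrices would be replaced by measurements such as pitch and duration statistics for prosody and, for example, jitter and shimmer for voice quality, extracted from the impaired and control recordings.
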
            
           
             
           ABSTRAK
            
           Pemilihan ciri yang relevan untuk membezakan pertuturan dalam
            sistem ASR berasaskan pengesanan adalah penting kerana
            menyumbang kepada peningkatan prestasi pengesan. Dalam konteks ketaksempurnaan
            pertuturan, kesalahan pertuturan boleh didiskriminasi daripada pertuturan biasa
            dengan menggunakan ciri pertuturan diskriminatif yang bersesuaian dengan
            keupayaan diskriminatif yang tinggi antara kumpulan terjejas dan kumpulan
            kawalan. Walau bagaimanapun, pengenalpastian ciri pertuturan diskriminatif yang
            sesuai untuk pengesanan ralat dalam pertuturan yang terjejas tidak dikaji
dengan baik dalam kajian kepustakaan. Ciri pertuturan yang terjejas adalah
            sangat berbeza daripada pertuturan biasa, dengan itu, menjadikan ciri
            pertuturan sedia ada kurang berkesan dalam mengenal pasti pertuturan yang
            terjejas. Untuk mengatasi jurang ini, ciri pertuturan ketaksempurnaan
            pertuturan berdasarkan prosodi, sebutan dan kualiti suara dianalisis untuk
            mengenal pasti ciri pertuturan penting yang berkaitan dengan defisit
            kecerdasan. Dalam penyelidikan ini, kami mengkaji hubungan antara kecacatan
            pertuturan akibat lumpuh otak dan kecacatan pendengaran dengan prosodi, sebutan
            dan kualiti suara. Seterusnya, kami mengenal pasti hubungan ciri pertuturan
            dengan pengelasan kecerdasan pertuturan dan ciri pertuturan yang penting dalam
            meningkatkan keupayaan diskriminatif sistem pengesanan kecerdasan pertuturan
secara automatik. Hasil menunjukkan bahawa ciri prosodi, sebutan dan kualiti suara
            adalah ciri pertuturan yang signifikan secara statistik untuk meningkatkan
            keupayaan pengesanan pertuturan yang terjejas. Kualiti suara dikenal pasti
            sebagai ciri pertuturan terbaik dengan kuasa yang lebih diskriminatif dalam
            mengesan kecerdasan pertuturan yang terjejas.
  
           Kata kunci: Ciri pertuturan; ketaksempurnaan pertuturan;
            pengesanan kecerdasan pertuturan automatik; pengesanan pertuturan
            
           
             
*Corresponding author; email: fadhilah.rosdi@ukm.edu.my