Advances in Multimedia, Volume 2018, 2018-08-16
Mobile Phone-Based Audio Announcement Detection and Recognition for People with Hearing Impairment
Research Article
Yong Ruan 1,2, Yueliang Qian 1, Xiangdong Wang 1
DOI:10.1155/2018/8786308
Received 2018-05-04, accepted for publication 2018-08-05, published 2018-08-05
Abstract

Automatic audio announcement systems are widely used in public places such as transportation vehicles and facilities, hospitals, and banks. However, people with hearing impairment cannot use these systems, which causes great inconvenience in their daily lives. This paper proposes a smartphone-based approach to audio announcement detection and recognition for hearing-impaired people and presents a mobile phone application (app) built on it, taking the bank as the main application scenario. With the app, users sign up for alerts on their numbers, and the system then begins to detect audio announcements using the smartphone's microphone. For each detected announcement, the speech within it is recognized and the text is displayed on the phone's screen. When the number entered by the user is announced, the phone alerts the user by vibration. For audio announcement detection, a method based on audio segment classification and postprocessing is proposed, which uses an SVM classifier trained on audio announcements and environmental noise collected in banks. For announcement speech recognition, an ASR engine is developed using a GMM-HMM-based acoustic model and a finite state transducer (FST) based grammar. The acoustic model is trained on announcement speech collected in banks, and the grammar is defined by hand according to the patterns used by the automatic audio announcement systems. Experimental results show that character error rates (CERs) of around 5% can be achieved on the announcement speech, which demonstrates the feasibility of the proposed method and system.
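The abstract reports results as character error rate (CER). As a minimal sketch of how such a figure is commonly computed (the character-level edit distance between the reference and recognized text, divided by the reference length), the following Python may help. The function names and the sample announcement strings are illustrative assumptions, not taken from the paper, and the authors' exact scoring procedure is not stated in the abstract.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical announcement text; one digit is misrecognized.
reference = "请105号到3号窗口"
hypothesis = "请105号到8号窗口"
print(cer(reference, hypothesis))  # 1 substitution over 10 characters
```

Under this definition, a CER of around 5% corresponds to roughly one character error per twenty reference characters.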

License

Copyright © 2018 Yong Ruan et al.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Corresponding author

Xiangdong Wang, Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China (cas.cn). Email: xdwang@ict.ac.cn

Recommended citation

Yong Ruan, Yueliang Qian, Xiangdong Wang. Mobile Phone-Based Audio Announcement Detection and Recognition for People with Hearing Impairment. Advances in Multimedia, Vol. 2018 (2018).

