Table of Contents
Chapter 1: Introduction to Speaker Recognition Systems
1-1- Introduction
1-2- Operating Stages of Speaker Recognition Systems
1-2-1- Acoustic Segmenter
1-2-2- Speech/Non-Speech Detection
1-2-3- Speaker Gender Detection
1-2-4- Speaker Change Detection
1-3- Speaker Segmentation and Clustering Methods
1-3-1- Distance-Based Methods
1-3-2- Model-Based Methods
1-3-3- Hybrid (Combined) Methods
1-4- Clustering
1-5- Summary
Chapter 2: Discriminating Speech from Non-Speech Regions
2-1- Introduction
2-2- Structure of the Speech/Non-Speech Detection Module
2-2-1- Preprocessing
2-2-2- Feature Extraction
2-2-2-1- Energy
2-2-2-2- Zero-Crossing Rate
2-2-2-3- Feature Extraction Using Mel-Frequency Cepstral Coefficients (MFCC)
2-2-2-4- LPC Coefficients
2-2-2-5- Entropy
2-2-2-6- Periodicity Measure
2-2-2-7- Sub-band Information
2-2-2-8- Other Parameters
2-2-3- Threshold Computation
2-2-4- VAD Decisions
2-2-4-1- Decision Making Based on Hidden Markov Models
2-2-4-2- Decision Making Based on Neural Networks
2-2-5- Correction of VAD Results
2-3- Block Diagrams of Several Standard VADs
2-3-1- The ETSI AMR Standard
2-3-2- The GSM Algorithm
2-4- Summary
Chapter 3: Speaker Change Detection
3-1- Introduction
3-2- Speaker Segmentation
3-2-1- Distance-Based Segmentation
3-2-2- Model-Based Segmentation
3-2-3- Hybrid Segmentation
3-3- Comparison of Segmentation Methods
3-4- Common Speaker Detection Methods
3-4-1- Bayesian Information Criterion (BIC)
3-4-1-2- Segmentation Using a Statistical Speaker Model
3-4-2- Combining the T2 Statistic and BIC
3-4-2-1- Higher Speed and Efficiency in T2-BIC Segmentation
3-4-3- Generalized Likelihood Ratio (GLR) Distance
3-4-4- KL2 Distance
3-4-5- Speaker Change Detection Using DSD
3-4-6- Cross-BIC (XBIC)
3-4-7- Gaussian Mixture Model Likelihood (GMM-L)
3-5- Summary
Chapter 4: Classification Methods
4-1- Introduction
4-2- Components of a Clustering System
4-3- Clustering Methods
4-3-1- Hierarchical Clustering Methods
4-3-1-1- Bottom-Up (Agglomerative) Clustering Techniques
4-3-1-2- Top-Down (Divisive) Clustering Techniques
4-3-2- Partitional Clustering Methods
4-4- Common Clustering Methods in Speaker Clustering Systems
4-5- Support Vector Machine Classifiers
4-5-1- Linear Support Vector Machine Classifier
4-5-1-1- Classification of Separable Classes
4-5-1-2- Classification of Non-Separable Classes
4-5-1-3- Multi-Class Classification with Support Vector Machines
4-5-2- Nonlinear Support Vector Machines
4-6- Summary
Chapter 5: Implementation and Observations of the Proposed Hybrid System
5-1- Introduction
5-2- Structure of the Implemented System
5-3- Database
5-4- Feature Extraction
5-5- Evaluation Metric for Speaker Recognition Systems
5-6- Experimental Results
5-6-1- Effect of Applying VAD to the Speech Signal
5-6-2- Effect of VAD Window Length on System Accuracy
5-6-3- Effect of BIC Window Length on Segmentation Results
5-6-4- Segmentation Accuracy on Two Types of Data Using MFCC
5-6-5- Effect of Changing the Feature Vector on Segmentation Accuracy
5-6-6- Comparison of Segmentation Results Using Different Feature Vectors
5-6-7- Effect of Speaker Gender on Correct Detection of Segment Boundaries
5-6-8- Clustering Accuracy Using a Support Vector Machine (SVM) with MFCC Feature Vectors
5-6-9- Clustering Accuracy of the Support Vector Machine Using root-MFCC Feature Vectors
5-6-10- Effect of the SVM Kernel Function Type on Clustering Accuracy
5-7- Summary
Chapter 6: Conclusions and Suggestions
6-1- Conclusions and Summary of Results
6-2- Suggestions
References