Speech Activity Detection and its Evaluation in Speaker Diarization System

  • Sukhvinder Kaur
  • J. S. Sohal, Director, LCET, Ludhiana-141113, Punjab
Keywords: Speaker Diarization System; Artificial Neural Network; Gaussian Mixture Model; ROC; DET


In speaker diarization, speech/voice activity detection (SAD) is performed to separate speech, non-speech and silent frames. The zero crossing rate (ZCR) and root mean square (RMS) value of the frames of audio clips have been used to select training data for the silent, speech and non-speech models. The trained models are used by two classifiers, a Gaussian mixture model (GMM) and an artificial neural network (ANN), to classify the speech and non-speech frames of an audio clip. The results of the ANN and GMM classifiers are compared using the receiver operating characteristic (ROC) curve and the detection error tradeoff (DET) graph. It is concluded that ANN-based SAD performs comparatively better than GMM-based SAD.
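The training-data selection step can be illustrated with a minimal Python sketch that computes the per-frame zero crossing rate and RMS value and gives each frame a provisional silent, speech or non-speech label. The decision rule and the thresholds `RMS_SILENCE` and `ZCR_NONSPEECH` are hypothetical: the abstract does not state how the two measures are combined or which values were used, so this is an assumption for illustration, not the authors' procedure.

```python
import math
from typing import List, Sequence

# Hypothetical thresholds for illustration only; the abstract gives no values.
RMS_SILENCE = 0.01    # frames quieter than this are treated as silent
ZCR_NONSPEECH = 0.35  # loud frames crossing zero more often than this are treated as non-speech


def split_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut a signal into frames of frame_len samples spaced hop samples apart.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [list(signal[i:i + frame_len]) for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive).

    The frame must contain at least two samples.
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def root_mean_square(frame: Sequence[float]) -> float:
    """Square root of the mean of the squared samples (a frame-energy measure)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def provisional_label(frame: Sequence[float]) -> str:
    """Hypothetical rule: low RMS -> silent; otherwise high ZCR -> non-speech, else speech."""
    if root_mean_square(frame) < RMS_SILENCE:
        return "silent"
    if zero_crossing_rate(frame) > ZCR_NONSPEECH:
        return "nonspeech"
    return "speech"


if __name__ == "__main__":
    rate = 16000  # assumed sampling rate in Hz
    # A 200 Hz tone: loud and slowly varying, so the rule labels its frames as speech.
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate // 10)]
    for frame in split_frames(tone, frame_len=400, hop=160):
        print(provisional_label(frame))
```

Only frames labelled with high confidence by such a rule would typically be kept as training data, since the same frames then serve as ground truth for the statistical models.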
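A GMM-based frame classifier can be sketched by fitting one Gaussian mixture per class with the expectation-maximization (EM) algorithm and labelling each frame with the class whose mixture gives it the highest log-likelihood. This is a minimal pure-Python sketch under stated assumptions: diagonal covariances, two components per class, the variance floor and the use of (ZCR, RMS) pairs as feature vectors are illustrative choices, not details taken from the paper.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


class DiagonalGMM:
    """Gaussian mixture model with diagonal covariances, fitted by EM."""

    def __init__(self, n_components: int = 2, n_iter: int = 30,
                 var_floor: float = 1e-4, seed: int = 0) -> None:
        self.k = n_components
        self.n_iter = n_iter
        self.var_floor = var_floor  # keeps variances from collapsing to zero
        self.rng = random.Random(seed)

    @staticmethod
    def _log_gauss(x: Vector, mean: Vector, var: Vector) -> float:
        # Log density of a diagonal Gaussian: a sum of independent 1-D log densities.
        return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                   for xi, m, v in zip(x, mean, var))

    def _joint_log_probs(self, x: Vector) -> List[float]:
        # log(weight_c) + log N(x | mean_c, var_c) for every component c.
        return [math.log(w) + self._log_gauss(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def score(self, x: Vector) -> float:
        """Log-likelihood of one feature vector, computed with log-sum-exp."""
        lp = self._joint_log_probs(x)
        top = max(lp)
        return top + math.log(sum(math.exp(v - top) for v in lp))

    def fit(self, data: Sequence[Vector]) -> "DiagonalGMM":
        n, d = len(data), len(data[0])
        # Initialise means at randomly chosen points and variances at the global variance.
        self.means = [list(p) for p in self.rng.sample(list(data), self.k)]
        mu = [sum(x[j] for x in data) / n for j in range(d)]
        var = [max(sum((x[j] - mu[j]) ** 2 for x in data) / n, self.var_floor) for j in range(d)]
        self.vars = [list(var) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each point.
            resp = []
            for x in data:
                lp = self._joint_log_probs(x)
                top = max(lp)
                p = [math.exp(v - top) for v in lp]
                total = sum(p)
                resp.append([pi / total for pi in p])
            # M-step: re-estimate weights, means and variances from the responsibilities.
            for c in range(self.k):
                nc = sum(r[c] for r in resp) + 1e-12
                self.weights[c] = nc / n
                self.means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
                                 for j in range(d)]
                self.vars[c] = [max(sum(r[c] * (x[j] - self.means[c][j]) ** 2
                                        for r, x in zip(resp, data)) / nc, self.var_floor)
                                for j in range(d)]
        return self


def train_gmm_classifier(training: Dict[str, Sequence[Vector]],
                         n_components: int = 2) -> Dict[str, DiagonalGMM]:
    """Fit one mixture per class label (e.g. 'speech', 'nonspeech', 'silent')."""
    return {label: DiagonalGMM(n_components).fit(vectors) for label, vectors in training.items()}


def classify_frame(models: Dict[str, DiagonalGMM], x: Vector) -> str:
    """Return the label whose mixture assigns x the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(x))
```

Because each class has its own mixture, the per-class log-likelihood (or the difference between the speech and non-speech log-likelihoods) can also serve as a continuous per-frame score for threshold-based evaluation.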
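An ANN-based speech/non-speech classifier can be sketched as a small multilayer perceptron whose sigmoid output estimates the probability that a frame is speech. The architecture here (one hidden layer of four tanh units, stochastic gradient descent on binary cross-entropy, learning rate 0.1) is a hypothetical choice for illustration; the abstract does not describe the network the authors used. Inputs are assumed to be standardized feature vectors.

```python
import math
import random
from typing import Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    """One hidden layer of tanh units and a sigmoid output giving P(speech | features)."""

    def __init__(self, n_inputs: int, n_hidden: int = 4, lr: float = 0.1, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr
        self.rng = rng

    def _forward(self, x: Sequence[float]):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = _sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict_proba(self, x: Sequence[float]) -> float:
        """Estimated probability that the frame with features x is speech."""
        return self._forward(x)[1]

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[int],
            epochs: int = 100) -> "TinyMLP":
        """Stochastic gradient descent on binary cross-entropy; ys are 1 (speech) or 0 (non-speech)."""
        order = list(range(len(xs)))
        for _ in range(epochs):
            self.rng.shuffle(order)
            for i in order:
                x, y = xs[i], ys[i]
                hidden, p = self._forward(x)
                d_out = p - y  # gradient of cross-entropy w.r.t. the output pre-activation
                for j, h in enumerate(hidden):
                    d_hid = d_out * self.w2[j] * (1.0 - h * h)  # backpropagate before updating w2
                    self.w2[j] -= self.lr * d_out * h
                    self.b1[j] -= self.lr * d_hid
                    self.w1[j] = [w - self.lr * d_hid * xi for w, xi in zip(self.w1[j], x)]
                self.b2 -= self.lr * d_out
        return self
```

Thresholding `predict_proba` at 0.5 gives a hard speech/non-speech decision, while the raw probability can be kept as the per-frame score when the classifier is compared against the GMM.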
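The ROC and DET comparison can be illustrated by computing ROC points and their DET-scale transform from per-frame classifier scores. In this sketch speech is the positive class, a frame is declared speech when its score is at least the threshold, and the DET axes use the normal-deviate (probit) scale conventionally used for DET plots; `statistics.NormalDist` requires Python 3.8 or later. The function names are illustrative, and the area-under-curve summary is a convenience added here, not a measure stated in the abstract.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """(false-alarm rate, hit rate) pairs, sweeping the threshold over every distinct score.

    labels are 1 for speech frames and 0 for non-speech frames; both classes must be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called speech
    for t in sorted(set(scores), reverse=True):
        hits = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        false_alarms = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((false_alarms / negatives, hits / positives))
    return points


def area_under_curve(points: Sequence[Point]) -> float:
    """Trapezoidal area under an ROC curve whose points are ordered by false-alarm rate."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def det_points(points: Sequence[Point], eps: float = 1e-6) -> List[Point]:
    """Map ROC points to DET coordinates: probit(false-alarm rate), probit(miss rate).

    Rates of exactly 0 or 1 are clipped to [eps, 1 - eps] because the probit is infinite there.
    """
    probit = NormalDist().inv_cdf

    def clip(p: float) -> float:
        return min(max(p, eps), 1.0 - eps)

    return [(probit(clip(fa)), probit(clip(1.0 - hit))) for fa, hit in points]
```

Comparing the GMM- and ANN-based detectors then amounts to computing these curves from each classifier's per-frame scores on the same labelled clips: the curve lying closer to the top-left corner of the ROC plot, or to the bottom-left corner of the DET plot, belongs to the detector with fewer errors.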

Author Biography

Sukhvinder Kaur

Ph.D. Research Scholar, I.K. Gujral PTU, Jalandhar, Kapurthala-144601


How to Cite
Kaur, S., & Sohal, J. S. (2017). Speech Activity Detection and its Evaluation in Speaker Diarization System. International Journal of Computers & Technology, 16(1), 7567-7572. https://doi.org/10.24297/ijct.v16i1.5893