Expanding the Functionality of Speech Corpora for Broader Applications

Michal HALON and Andrzej PACUT

NASK National Research Institute, Warsaw, Poland

Abstract

Numerous speech corpora have been developed for a variety of purposes. Although some are designed around specific topics, they can often be adapted for broader applications, which is especially valuable when suitable datasets for a given domain or language are scarce. This paper introduces a method for adapting existing speech corpora for use in biometric recognition and personalized speech synthesis. Adaptation requires verifying the accuracy of key metadata in the processed dataset, such as gender labeling and speaker attribution. To accomplish this, we propose a method that analyzes the distributions of biometric verification scores computed for the voice samples. Potential inaccuracies are flagged for subsequent listening analysis by a human expert. The method was tested on the Clarin-PL Polish speech corpus using Phonexia software, resulting in improved biometric recognition metrics, including the Equal Error Rate (EER) and the False Acceptance and False Rejection Rates (FAR/FRR). Our findings demonstrate that this approach can significantly enhance the reliability and applicability of speech corpora in extended applications. By reducing the need for extensive manual verification, the proposed method facilitates broader use of existing speech corpora for advanced biometric and speech synthesis tasks.

Keywords: Speech corpus, voice biometrics, speech synthesis.
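
As an illustration of the kind of analysis the abstract describes, the following is a minimal sketch in Python, not the authors' implementation and not the Phonexia API. It assumes that verification scores are already available as plain numbers where a higher score means "more likely the same speaker". The function names, the toy scores, and the rule of flagging a sample when its mean same-speaker score falls below a threshold are all assumptions made for this example; the abstract does not state the actual flagging criterion.

```python
from statistics import mean


def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) at a threshold; a trial is accepted if score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold
    and keeping the one where FAR and FRR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]  # (approximate EER, threshold)


def flag_suspect_samples(scores_by_sample, threshold):
    """Flag samples whose mean score against other samples attributed to the
    same speaker is below the threshold (hypothetical rule for this sketch)."""
    return sorted(
        sample_id
        for sample_id, scores in scores_by_sample.items()
        if mean(scores) < threshold
    )


# Toy scores, invented purely for illustration.
genuine_scores = [0.9, 0.8, 0.85, 0.3]    # same-speaker trials
impostor_scores = [0.1, 0.2, 0.4, 0.15]   # different-speaker trials
eer, eer_threshold = equal_error_rate(genuine_scores, impostor_scores)

same_speaker_scores = {
    "sample_a": [0.9, 0.8],
    "sample_b": [0.2, 0.3],   # scores poorly against its claimed speaker
    "sample_c": [0.85, 0.7],
}
suspects = flag_suspect_samples(same_speaker_scores, eer_threshold)
print(eer, eer_threshold, suspects)
```

In this sketch only the flagged samples would be passed to a human listener, which mirrors the abstract's goal of limiting manual verification to likely labeling errors.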