Doctoral Degrees (Electrical and Electronic Engineering)
Browsing Doctoral Degrees (Electrical and Electronic Engineering) by Author "Agenbag, Wiehan"
- Automatic sub-word unit discovery and pronunciation lexicon induction for automatic speech recognition with application to under-resourced languages
(Stellenbosch : Stellenbosch University, 2020-04)
Agenbag, Wiehan; Niesler, T. R.; Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering.
ENGLISH ABSTRACT: Automatic speech recognition is an increasingly important mode of human-computer interaction. However, its implementation requires a sub-word unit inventory to be designed and an associated pronunciation lexicon to be crafted, a process that requires linguistic expertise. This step represents a significant bottleneck for most of the world’s under-resourced languages, for which such resources are not available. We address this challenge by developing techniques to automate both the discovery of sub-word units and the induction of corresponding pronunciation lexica. Our first attempts at sub-word unit discovery made use of a shift- and scale-invariant convolutional sparse coding and dictionary learning framework. After initial investigations showed that this model exhibits significant temporal overlap between units, the sparse codes were constrained to prohibit overlap, and the sparse coding basis functions were further globally optimised using a metaheuristic search procedure. The result was a unit inventory with a strong correspondence with reference phonemes, but highly variable associated transcriptions. To reduce transcription variability, two lattice-constrained Viterbi training strategies were developed. These involved jointly training either a bigram sub-word unit language model or a unique pronunciation model for each word type along with the unit inventory. Taking this direction made it necessary to abandon sparse coding in favour of a more conventional HMM-GMM approach.
However, the resulting strategies yielded inventories with a higher degree of correspondence with reference phonemes, and led to more consistent transcriptions. The strategies were further refined by introducing a novel sub-word unit discovery approach based on self-organising HMM-GMM states that incorporate orthographic knowledge during sub-word unit discovery. Furthermore, a more sophisticated pronunciation modelling approach and a two-stage pruning process were introduced. We demonstrate that the proposed methods are able to discover sub-word units and associated lexicons that perform as well as expert systems in terms of automatic speech recognition performance for Acholi, and close to this level for Ugandan English. The worst performing language among those evaluated was Luganda, which has a highly agglutinating vocabulary that was observed to make automatic lexicon induction challenging. As a final step, we addressed this by introducing a data-driven morphological segmentation step that is applied before performing lexicon induction. This is demonstrated to close the gap with the expert lexicon for Luganda. The techniques developed in this thesis demonstrate that it is possible to develop an automatic speech recognition system in an under-resourced setting using an automatically induced lexicon without sacrificing performance, even in the case of a highly agglutinating language.