Incorporating dynamic trends in HMM states
 
Articulatory feature based speech units
 
Co-articulation modeling through data smoothing methods
 
Understanding variations in speech data
 
New Classification Methods for Speech Recognition
We propose, implement, and evaluate a class of non-stationary-state
hidden Markov models (HMMs) in which each state is associated with a
distinct polynomial regression function of time plus white Gaussian
noise. The model represents the transitional acoustic trajectories of
speech in a parametric manner, and includes the standard
stationary-state HMM as a special, degenerate case. We develop an
efficient dynamic programming technique which includes the state
sojourn time as an optimization variable, in conjunction with a
state-dependent orthogonal polynomial regression method, for
estimating the model parameters. Experiments on fitting models to
speech data and on limited-vocabulary speech recognition demonstrate
consistent superiority of these new non-stationary-state HMMs over the
traditional stationary-state HMMs.
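As a rough illustration of the idea, the sketch below fits a polynomial trend to the frames assigned to one state and scores them under white Gaussian noise. It is a minimal sketch under stated assumptions, not the estimation procedure of the paper: the function name `fit_trend_state` is hypothetical, the feature is one-dimensional, the segment boundaries are taken as given (no dynamic programming over sojourn times), and a plain power basis stands in for the orthogonal polynomials mentioned above.

```python
import numpy as np

def fit_trend_state(frames, order):
    """Least-squares polynomial trend for the frames of one state.

    Returns (coeffs, noise_var, log_lik); coeffs are lowest order first.
    Assumption: a plain power basis, not an orthogonal polynomial basis.
    """
    frames = np.asarray(frames, dtype=float)
    t = np.arange(len(frames), dtype=float)          # time since the state was entered
    design = np.vander(t, order + 1, increasing=True)  # columns 1, t, t^2, ...
    coeffs, *_ = np.linalg.lstsq(design, frames, rcond=None)
    resid = frames - design @ coeffs
    noise_var = max(float(np.mean(resid ** 2)), 1e-8)  # ML variance, floored
    log_lik = (-0.5 * len(frames) * np.log(2.0 * np.pi * noise_var)
               - 0.5 * float(np.sum(resid ** 2)) / noise_var)
    return coeffs, noise_var, log_lik

# Made-up segment with a rising trend.
segment = [1.0, 1.4, 2.1, 2.4, 3.0, 3.6]
const_fit = fit_trend_state(segment, order=0)   # stationary state: just the mean
linear_fit = fit_trend_state(segment, order=1)  # state with a linear trend
print(const_fit[0], linear_fit[0])
print(const_fit[2], linear_fit[2])
```

With `order=0` the fitted coefficient is simply the segment mean, which corresponds to the stationary-state special case.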
Paper:
Postscript (621Kb)
IEEE Transactions on Speech and Audio Processing,
2:4 (1994),
pp. 507-520.
Paper:
Postscript (249Kb)
A new method is developed to estimate the trajectories of the spectral
center of gravity using robust statistical models with penalized
weighted spline smoothers. Most of the existing methods for tracking
speech formant trajectories are based on dynamic programming algorithms
with certain continuity constraints on the formant frequencies. The
objective functions (or loss functions) in these approaches are usually
ad hoc and have very complex expressions that are difficult to
optimize. Also, many existing methods rely on the accuracy of the LPC
spectral peaks and are not very robust against possible missing or
spurious peaks. Instead of using the peaks of the LPC spectral
functions, we propose a new approach to the estimation of the
``centers of gravity'' in the spectrogram using mixture models of
spline smoothers.
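The following sketch shows, in a much simplified form, what tracking a center of gravity with a penalized weighted smoother can look like. Several things here are assumptions rather than the method of the paper: a single track over the whole spectrum replaces the mixture model, a discrete second-difference penalty stands in for the spline smoother, frame energy is used as the weight, and the names `center_of_gravity` and `penalized_smooth` are hypothetical.

```python
import numpy as np

def center_of_gravity(spectrogram, freqs):
    """Per-frame spectral centroid; spectrogram has shape (frames, bins)."""
    power = np.asarray(spectrogram, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    total = power.sum(axis=1)
    cog = (power * freqs).sum(axis=1) / np.maximum(total, 1e-12)
    return cog, total

def penalized_smooth(y, weights, lam):
    """Minimize sum_t w_t (y_t - z_t)^2 + lam * sum_t (second difference of z)^2."""
    y = np.asarray(y, dtype=float)
    w = np.diag(np.asarray(weights, dtype=float))
    d2 = np.diff(np.eye(len(y)), n=2, axis=0)   # (n-2, n) second-difference operator
    return np.linalg.solve(w + lam * d2.T @ d2, w @ y)

# Synthetic spectrogram: one spectral bump gliding upward, plus a noise floor.
freqs = np.linspace(0.0, 4000.0, 64)
true_track = np.linspace(800.0, 1600.0, 30)
rng = np.random.default_rng(0)
spec = np.exp(-0.5 * ((freqs[None, :] - true_track[:, None]) / 150.0) ** 2)
spec += 0.01 * rng.random(spec.shape)

cog, energy = center_of_gravity(spec, freqs)
smooth_track = penalized_smooth(cog, energy, lam=50.0)  # low-energy frames count less
```

Larger values of `lam` give a stiffer trajectory; because the penalty acts on second differences, a perfectly linear track passes through unchanged.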
Paper:
Postscript (203Kb)
We applied a hierarchically structured Analysis of
Variance (ANOVA) method to analyze, in a quantitative manner, the
contributions of various identifiable factors to the overall acoustic
variability exhibited in fluent speech data of TIMIT processed in the
form of Mel-Frequency Cepstral Coefficients. The results of the
analysis show that the greatest acoustic variability in TIMIT data is
explained by the difference among distinct phonetic labels in TIMIT,
followed by the phonetic context difference given a fixed phonetic
label. The variability among sequential sub-segments within each
TIMIT-defined phonetic segment is found to be significantly greater
than the gender, dialect region, and speaker factors.
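To make the nested decomposition concrete, here is a minimal sketch that splits the total sum of squares of one feature into a share explained by the phonetic label, a share explained by context within label, and a residual. The function name, the two-level nesting, and the toy numbers are assumptions for illustration; the analysis described above involves more factors and real TIMIT data.

```python
import numpy as np

def nested_variance_shares(values, labels, contexts):
    """Shares of total sum of squares: label, context within label, residual."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    contexts = np.asarray(contexts)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    ss_label = 0.0
    ss_context = 0.0
    for lab in np.unique(labels):
        in_lab = labels == lab
        lab_mean = values[in_lab].mean()
        ss_label += in_lab.sum() * (lab_mean - grand_mean) ** 2
        for ctx in np.unique(contexts[in_lab]):
            cell = in_lab & (contexts == ctx)
            ss_context += cell.sum() * (values[cell].mean() - lab_mean) ** 2
    ss_resid = ss_total - ss_label - ss_context
    return {
        "label": ss_label / ss_total,
        "context_within_label": ss_context / ss_total,
        "residual": ss_resid / ss_total,
    }

# Made-up values of a single cepstral coefficient (not TIMIT measurements).
values = [5.0, 5.2, 5.6, 5.8, 1.0, 1.1, 1.5, 1.4]
labels = ["aa", "aa", "aa", "aa", "s", "s", "s", "s"]
contexts = ["b_", "b_", "d_", "d_", "t_", "t_", "k_", "k_"]
print(nested_variance_shares(values, labels, contexts))
```

The three shares sum to one, so they can be read directly as fractions of the total variability attributed to each level.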
Paper:
Postscript (100Kb)
A novel method for classifying frames of speech waveforms
into a given set of phoneme classes is proposed. The method involves
combining an approximation to multiple smoothing spline logistic
regression (known as the ``Support Vector Machine'' in the machine
learning literature) with hidden Markov models (HMMs). The method is
compared with the standard technique in the speech recognition
literature, that of HMMs with Gaussian mixture models. Both models
were trained and tested using data drawn from the publicly available
TIMIT database. Our results show that the two types of models are
competitive for this data, but have very different structures. Such
differences can be used to improve recognition rates by combining the
two types of classifiers.
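One common way to couple a frame-level classifier with an HMM is to treat the classifier's per-frame log-scores as emission scores and decode with the Viterbi algorithm. The sketch below illustrates only that coupling; whether the paper combines the two models in exactly this way is an assumption, and the scores and transition probabilities are invented for the example.

```python
import numpy as np

def viterbi(frame_log_scores, log_trans, log_init):
    """Best state path given per-frame log-scores of shape (T, K)."""
    scores = np.asarray(frame_log_scores, dtype=float)
    n_frames, n_states = scores.shape
    delta = log_init + scores[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        cand = delta[:, None] + log_trans   # cand[i, j]: best path ending in i, then i -> j
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + scores[t]
    path = [int(delta.argmax())]
    for t in range(n_frames - 1, 0, -1):    # follow back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical per-frame class scores from some frame classifier (two classes).
frame_scores = np.log(np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6],
                                [0.9, 0.1], [0.8, 0.2]]))
log_trans = np.log(np.array([[0.95, 0.05], [0.05, 0.95]]))  # states tend to persist
log_init = np.log(np.array([0.5, 0.5]))

print(frame_scores.argmax(axis=1).tolist())  # frame-by-frame decisions
print(viterbi(frame_scores, log_trans, log_init))
```

The transition structure lets the decoder override an isolated frame whose score disagrees with its neighbours, which a purely frame-by-frame decision cannot do.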
Paper:
PDF (83Kb)
Incorporating dynamic trends in HMM states
The standard method of hidden Markov modeling (HMM) has been widely used
for speech recognition since the 1970s. A hidden Markov model contains the
mathematical structure of a (hidden) Markov chain with each state
associated with a distinct independent and identically distributed
(IID) or a stationary random process. The model is used as a type of
data generator for speech signals and approximates the nearly
continuously varying speech signals in a piece-wise constant manner.
Such an approximation would be a reasonably good one when each state
is intended to represent only a short portion of sonorant sounds.
However, since the acoustic patterns of continuously spoken speech
sounds are almost never stationary in nature, it would be desirable to
improve on this rather poor piece-wise constant approximation in general.
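In symbols, the contrast can be written as follows. The notation is introduced here for illustration only and is an assumption: $o_t$ is the observation at frame $t$, $\tau_i$ is the frame at which state $i$ is entered, $b_i(p)$ are state-dependent polynomial coefficients, and $r_t$ is white Gaussian noise with state-dependent variance $\sigma_i^2$.

```latex
\begin{align*}
\text{stationary state:} \quad & o_t = \mu_i + r_t, \\
\text{state with a trend:} \quad & o_t = \sum_{p=0}^{P} b_i(p)\,(t - \tau_i)^p + r_t,
\qquad r_t \sim \mathcal{N}(0, \sigma_i^2).
\end{align*}
```

Setting $P = 0$ and $b_i(0) = \mu_i$ recovers the piece-wise constant case.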
Hidden Markov Models with Non-stationary States
Don X. Sun
Li Deng (U. of Waterloo)
Speech recognition using hidden Markov models with
polynomial regression functions as non-stationary states
Li Deng (U. of Waterloo)
M. Aksmanovic
Don X. Sun
C.F.J. Wu
Articulatory Feature-Based Hidden Markov Models
We have been developing a
feature-based general statistical framework for automatic speech
recognition via novel designs of minimal or atomic units of speech,
aiming at a parsimonious scheme to share the inter-word and
inter-phone speech data and at a unified way to account for the
context-dependent behaviors in speech. The basic design philosophy
has been motivated by the theory of distinctive features (Chomsky and
Halle, 1968; Stevens, 1986) and by a new form of phonology which
argues for the use of multi-dimensional articulatory structures (Browman
and Goldstein, 1992). In this paper, we present the feature-based
recognizer developed most recently, which is capable of operating on
all classes of English sounds. We provide detailed descriptions of
the design considerations for the recognizer and of key aspects of the
design process. This process, which we call lexicon ``compilation'',
consists of three elements:
A standard
phonetic classification task from the TIMIT database is used as a
test-bed to evaluate the performance of the recognizer. The
experimental results provide preliminary evidence for the
effectiveness of our feature-based approach to speech recognition.
A Statistical Framework for Automatic Speech Recognition
Using the Atomic Units Constructed From Overlapping Articulatory Features
L. Deng and Don X. Sun
Journal of the Acoustical Society of America,
95:5 (May 1994),
pp. 2702-2719.
Estimation of Spectral Trajectories
Don X. Sun
Analysis of Acoustic-Phonetic Variations in Speech
Don X. Sun
Li Deng (U. of Waterloo)
A Support Vector/Hidden Markov Model Approach to Phoneme
Recognition
Steven E. Golowich
Don X. Sun
Last modified: 2000/11/02 21:14:27
dxsun@research.bell-labs.com