This section gives some hints on how to read a spectrogram. For a more detailed explanation, please refer to Rietveld and van Heuven (p. 144).
An oscillogram only allows us to identify broad phonetic classes,
whereas a spectrogram provides enough information to determine the
individual phones. The phones can be deduced (with a good chance of success) from information about:
- the frequency values of the formants
- the energy pattern throughout the spectrogram
- the formant transitions (from vowel to consonants and from consonants to vowels)
Except for the formant transitions (see further), this information can
also be found in a spectrum, but a spectrogram additionally shows how it
changes over time.
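The relation between a spectrum and a spectrogram can be sketched in a few lines: a spectrogram is a sequence of short-time spectra, one per analysis frame. The following is a minimal, dependency-free sketch, not the method used by any particular analysis program. The frame length, hop size, window choice and the two-tone test signal are all assumptions made for illustration.

```python
import cmath
import math


def hann(n):
    """Hann window of length n, used to taper each analysis frame."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Naive DFT magnitude for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def spectrogram(samples, frame_len=64, hop=32):
    """One magnitude spectrum per frame: rows are time, columns are frequency."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        frames.append(magnitude_spectrum(frame))
    return frames


# Hypothetical test signal: 500 Hz for the first half, 2000 Hz for the second.
sample_rate = 8000
signal = [math.sin(2 * math.pi * (500 if t < 512 else 2000) * t / sample_rate)
          for t in range(1024)]
spec = spectrogram(signal)
bin_hz = sample_rate / 64  # width of one frequency bin with 64-sample frames
peak_hz = [row.index(max(row)) * bin_hz for row in spec]
print(peak_hz[0], peak_hz[-1])  # strongest frequency in the first and last frame
```

A single spectrum of the whole signal would show both tones at once; only the frame-by-frame view reveals that one follows the other.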
When reading a spectrogram, we already know that (see also
Making a spectrogram of a speech signal
and Formants in a spectrogram):
- the blacker regions are frequency regions with a high degree of energy,
- the black band at the bottom of the spectrogram displays the F0,
which is not always easy to distinguish from the F1 (especially in
closed vowels, which have a low F1),
- vowels are characterised by a clear formant pattern (and
distinguished in terms of the first 5 formants), displayed as black bands,
- consonants (especially voiceless ones) do not have a clear
formant pattern, but the distribution of energy throughout the spectrum
helps us identify them,
- the vertical stripes at (more or less regular) time intervals in the
spectrogram correspond to the glottal pulses (cycles of opening and
closing of the vocal folds).
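Since each vertical stripe corresponds to one glottal pulse, the spacing between stripes gives a rough F0 estimate. The sketch below assumes the stripe positions have already been read off the time axis by hand; the function name and the example times are hypothetical.

```python
def f0_from_pulses(pulse_times_s):
    """Estimate F0 (Hz) as the inverse of the mean interval between pulses."""
    if len(pulse_times_s) < 2:
        raise ValueError("need at least two pulses")
    intervals = [b - a for a, b in zip(pulse_times_s, pulse_times_s[1:])]
    mean_period = sum(intervals) / len(intervals)
    return 1.0 / mean_period


# Hypothetical stripe positions read off a spectrogram, in seconds (8 ms apart).
stripes = [0.100, 0.108, 0.116, 0.124, 0.132]
print(round(f0_from_pulses(stripes)))
```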
Vowels are distinguished by their formant values, especially F1, F2 (see the vowel triangle) and F3.
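As a rough illustration of how F1 and F2 separate vowels, a measured (F1, F2) pair can be compared with reference points at the corners of the vowel triangle. The reference values below are illustrative round numbers chosen for this sketch, not measurements from any corpus or speaker, and the plain Euclidean distance is likewise an assumption.

```python
import math

# Illustrative round numbers only (Hz), not measurements from any corpus.
REFERENCE_VOWELS = {
    "i": (300, 2300),   # low F1, high F2
    "a": (750, 1300),   # high F1
    "u": (300, 800),    # low F1, low F2
}


def nearest_vowel(f1, f2, reference=REFERENCE_VOWELS):
    """Return the reference vowel closest to (f1, f2) in the F1/F2 plane."""
    return min(reference, key=lambda v: math.dist((f1, f2), reference[v]))


print(nearest_vowel(320, 2100))  # lies near the [i] corner
print(nearest_vowel(700, 1200))  # lies near the [a] corner
```

Real formant values vary with speaker and context, so a lookup of this kind is only a starting point for reading a spectrogram.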
Diphthongs (such as in the Dutch word bij)
are made up of 2 different vowel sounds. However, in diphthongs these
vowels are not pronounced as they would be in a steady-state
vowel context: the formant values move gradually from the values of
one vowel to those of the other.
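The gradual movement of the formants can be pictured as a glide between two vowel targets. The sketch below uses straight-line interpolation, which is a simplifying assumption (real formant tracks are not linear), and the start and end values are hypothetical.

```python
def glide(start, end, steps):
    """Linearly interpolate (F1, F2) from start to end over `steps` frames."""
    if steps < 2:
        raise ValueError("need at least two frames")
    track = []
    for i in range(steps):
        w = i / (steps - 1)  # 0.0 at the first frame, 1.0 at the last
        track.append(tuple(round(s + w * (e - s)) for s, e in zip(start, end)))
    return track


# Hypothetical targets in Hz for an open-to-close glide; illustrative only.
for f1, f2 in glide((700, 1500), (350, 2200), steps=5):
    print(f1, f2)
```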
In a VCV context (C
being a voiced consonant), formants are characterised by transitions.
Transitions are fast changes in the formants' central frequency while passing from vowel to consonant (end transitions)
or from consonant to vowel (begin transitions).
The F2 value towards which the formant points in a transition is often called the locus.
F1 transitions go from a higher value in vowels to a lower one in
consonants (or vice versa), and are thus not useful for determining the
place of articulation. F1 is in fact related to the degree
of jaw opening: open vowels have a higher F1 than closed ones.
Instead, the F2 contour provides us with information about the place of
articulation. The F2 values are negatively correlated with the length
of the front cavity: if it gets shorter (for instance by
articulating a [d] through an alveolar obstruction) the F2 gets
higher, while in a [b] (longer vocal tract) this value is lower.
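The inverse relation between cavity length and resonance frequency can be illustrated with an idealised tube that is closed at one end and open at the other. This quarter-wavelength model is a textbook simplification assumed here for illustration, not a model taken from this text; the speed of sound is approximate and the cavity lengths are hypothetical.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm air


def quarter_wave_resonance(length_cm):
    """Lowest resonance (Hz) of a tube closed at one end, open at the other."""
    if length_cm <= 0:
        raise ValueError("length must be positive")
    return SPEED_OF_SOUND_CM_S / (4.0 * length_cm)


for length in (2.0, 4.0, 8.0):  # hypothetical cavity lengths in cm
    print(length, round(quarter_wave_resonance(length)))
```

Halving the cavity length doubles the resonance frequency, which is the qualitative pattern described above: a shorter cavity gives a higher value.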
Voiceless plosives and fricatives are not
characterised by formants (even in an intervocalic context), but it is
still possible to identify their place of articulation by looking at
which frequency regions display a high concentration of energy. The
same can be done with voiced plosives and fricatives that are not in an
intervocalic context (and thus show no clear formants).
We already know that the fricative [s] and the plosive [t] (but
also the voiced [z] and [d]) are characterised by high-frequency energy.
During the articulation of these phones, the vocal tract is divided
into 2 smaller cavities: the one behind the tongue-teeth obstruction
and the one between this obstruction and the lips (where the sound is
finally radiated). The front cavity (between tongue-teeth and lips) is
very small, which concentrates the energy in the higher frequency regions.
In velar consonants (such as the plosives [k] and [g], respectively
voiceless and voiced, but also in fricatives) this front cavity is much
bigger, which concentrates the energy in the lower frequency regions.
In the plosives [p], [b] and in the fricatives [f], [v], no front cavity
is present at all, as these are labial (or labio-dental) phones. In
the spectrum of these phones, the energy is spread evenly across all frequencies.
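The three energy patterns described above (mostly high, mostly low, evenly spread) can be turned into a simple decision rule. This is only a sketch: the two-band split, the function name and the `flat_ratio` threshold are hypothetical choices, and the energies are assumed to have been measured beforehand.

```python
def guess_obstruent_class(low_energy, high_energy, flat_ratio=1.5):
    """Guess an obstruent class from the energy in a low and a high band.

    flat_ratio is a hypothetical threshold: if neither band exceeds the
    other by this factor, the spectrum is treated as evenly spread.
    """
    if low_energy <= 0 or high_energy <= 0:
        raise ValueError("energies must be positive")
    if high_energy >= flat_ratio * low_energy:
        return "s/t-like"   # very small front cavity, energy in high frequencies
    if low_energy >= flat_ratio * high_energy:
        return "k/g-like"   # bigger front cavity, energy in lower frequencies
    return "p/f-like"       # no front cavity, energy spread evenly


print(guess_obstruent_class(low_energy=1.0, high_energy=6.0))
print(guess_obstruent_class(low_energy=5.0, high_energy=1.0))
print(guess_obstruent_class(low_energy=2.0, high_energy=2.2))
```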
Nasals and laterals are voiced and have a formant structure resembling that of vowels.
Apart from the 2-cavity configuration of the vocal tract (which we have
just seen), additional cavities are involved in the production of these
phones: respectively, the nasal one and the lateral ones (on the
tongue's side). These cavities have their own resonances, which
interfere with the oral ones as antiresonances.
In a spectrogram it is possible to distinguish between [n] and [m]
by means of the F2 value (similarly to the distinction between [d] and
[b]): they have, respectively, a shorter and a longer articulation tract,
which leads to a higher and a lower F2.