
"Reading" a spectrogram

This section gives some hints on how to read a spectrogram. For a more detailed explanation, please refer to Rietveld and van Heuven (p. 144).

While an oscillogram only allows us to identify broad phonetic classes, a spectrogram provides enough information to determine the individual phones. The phones can be deduced (with a good chance of success) from the information about
- the frequency values of the formants
- the energy pattern throughout the spectrogram
- the formant transitions (from vowel to consonants and from consonants to vowels)
Except for the formant transitions (see below), this information can also be found in a spectrum, but a spectrogram additionally shows how it develops over time.

When reading a spectrogram, we already know the following (see also Making a spectrogram of a speech signal and Formants in a spectrogram):
- the blacker regions are frequency regions with a high degree of energy,
- the black band at the bottom of the spectrogram displays the F0, which is not always easy to distinguish from the F1 (especially in close vowels, which have a low F1),
- vowels are characterised by a clear formant pattern (and distinguished in terms of the first 5 formants), displayed as black bands,
- consonants (especially voiceless ones) do not have a clear formant pattern, but the distribution of energy throughout the spectrum helps us identify them,
- the vertical stripes at (more or less regular) time intervals in the spectrogram correspond to the glottal pulses (cycles of opening and closing of the vocal folds).
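
Since each vertical stripe marks one glottal pulse, the spacing between stripes gives a rough estimate of the F0: the F0 is the inverse of the time between two successive pulses. The sketch below illustrates this under simple assumptions. The function name `f0_from_pulses` and the pulse times are hypothetical and only serve as an illustration; they are not taken from Praat or from a real recording.

```python
from statistics import median

def f0_from_pulses(pulse_times_s):
    """Estimate F0 (Hz) as the inverse of the median interval between pulses."""
    if len(pulse_times_s) < 2:
        raise ValueError("need at least two pulse times")
    # Time between each pair of successive stripes (glottal pulses).
    intervals = [later - earlier
                 for earlier, later in zip(pulse_times_s, pulse_times_s[1:])]
    # The median is less sensitive to a single misread stripe than the mean.
    return 1.0 / median(intervals)

# Hypothetical stripe positions read off a spectrogram, in seconds.
pulse_times = [0.100, 0.108, 0.116, 0.124, 0.132]
f0 = f0_from_pulses(pulse_times)
print(f"Estimated F0: {f0:.0f} Hz")
```

With stripes about 8 ms apart, as in this made-up example, the estimate is about 125 Hz.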

Vowels are distinguished by their formant values, especially F1 and F2 (see the vowel triangle) and, to a lesser extent, F3.
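
As an illustration of how F1 and F2 values can be used to guess a vowel, the sketch below picks the reference vowel whose formants are closest to a measured pair. The reference values are rough, illustrative ballpark figures for the three corner vowels of the vowel triangle (assumptions, not measurements from this tutorial), and the names `REFERENCE_FORMANTS_HZ` and `nearest_vowel` are hypothetical. Distances are computed on frequency ratios so that F1 and F2 contribute comparably; this is one possible choice, not a standard prescribed by the source.

```python
import math

# Rough illustrative (F1, F2) values in Hz; real values vary per speaker.
REFERENCE_FORMANTS_HZ = {
    "i": (280.0, 2250.0),  # close front vowel: low F1, high F2
    "a": (750.0, 1300.0),  # open vowel: high F1
    "u": (300.0, 850.0),   # close back vowel: low F1, low F2
}

def nearest_vowel(f1_hz, f2_hz, references=REFERENCE_FORMANTS_HZ):
    """Return the reference vowel closest to the measured F1/F2 pair."""
    def distance(reference):
        ref_f1, ref_f2 = reference
        # Compare log ratios so a 10% deviation counts the same in F1 and F2.
        return math.hypot(math.log(f1_hz / ref_f1), math.log(f2_hz / ref_f2))
    return min(references, key=lambda vowel: distance(references[vowel]))

# Hypothetical formant values read off a spectrogram.
guess = nearest_vowel(320.0, 2100.0)
print(f"Closest reference vowel: [{guess}]")
```

A real classifier would need speaker normalisation and more reference vowels; this only shows the principle of reading a vowel from its position in the F1/F2 plane.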

Diphthongs (such as in the Dutch words bij, bui and bouw) are made up of 2 different vowel sounds. However, in diphthongs these vowels are not pronounced as they would be in a steady-state vowel context: the formant values move gradually from those of the first vowel to those of the second.

In a VCV context (C being a voiced consonant), formants are characterised by transitions. Transitions are fast changes in the central frequency of a formant while passing from vowel to consonant (end transitions) or from consonant to vowel (begin transitions). The F2 value towards which the formant points in a transition is often called the locus.
F1 transitions go from a higher value in vowels to a lower one in consonants (or vice versa), and are thus not useful for determining the place of articulation. As a matter of fact, F1 is related to the degree of jaw opening: open vowels have a higher F1 than close ones (see the vowel triangle).
The F2 contour, instead, provides information about the place of articulation. The F2 values are negatively correlated with the length of the front cavity: if the cavity gets shorter (for instance when a [d] is articulated with an alveolar obstruction), the F2 gets higher, while in a [b] (longer vocal tract) this value is lower.

Voiceless consonants are not characterised by formants (even in an intervocalic context), but it is still possible to identify their place of articulation by looking at which frequency regions display a high concentration of energy. The same can be done with voiced plosives and fricatives that are not in an intervocalic context (so that no clear formants are visible).
We already know that the fricative [s] and the plosive [t] (but also the voiced [z] and [d]) are characterised by high-frequency energy. During the articulation of these phones, the vocal tract is divided into 2 smaller cavities: the one behind the tongue-teeth obstruction and the one between this obstruction and the lips (where the sound is finally radiated). The front cavity (between tongue-teeth and lips) is very small, which concentrates the energy in the higher frequency regions.
In velar consonants (such as the plosives [k] and [g], respectively voiceless and voiced, but also in fricatives) this front cavity is much bigger, which concentrates the energy in the lower frequency regions.
In the plosives [p], [b] and in the fricatives [f], [v], no front cavity is present at all, as these are labial (or labio-dental) phones. In the spectrum of these phones, the energy is spread roughly equally over all frequency regions.
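
The relation between the size of the front cavity and the frequency region of the energy can be made concrete with a strongly simplified model. The sketch below treats the front cavity as a uniform tube that is closed at the obstruction and open at the lips, whose lowest resonance is c / (4L). This quarter-wavelength model is an assumption brought in for illustration, not something stated in this tutorial, and the cavity lengths and the function name `quarter_wave_resonance_hz` are hypothetical.

```python
SPEED_OF_SOUND_M_S = 350.0  # approximate value for warm, humid air

def quarter_wave_resonance_hz(cavity_length_m):
    """Lowest resonance of a tube closed at one end and open at the other."""
    if cavity_length_m <= 0:
        raise ValueError("cavity length must be positive")
    return SPEED_OF_SOUND_M_S / (4.0 * cavity_length_m)

# Hypothetical front-cavity lengths, in centimetres.
examples = [
    ("short front cavity, as in [s] or [t]", 2.0),
    ("long front cavity, as in [k]", 6.0),
]
for label, length_cm in examples:
    resonance = quarter_wave_resonance_hz(length_cm / 100.0)
    print(f"{label}: {length_cm:.0f} cm -> about {resonance:.0f} Hz")
```

Under these assumptions the short cavity resonates around 4.4 kHz and the long one around 1.5 kHz, which matches the qualitative pattern described above: the smaller the front cavity, the higher the frequency region in which the energy is concentrated.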

Nasals and laterals are voiced and have a formant structure resembling that of vowels. Apart from the 2-cavity configuration of the vocal tract (which we have just seen), additional cavities are involved in the production of these phones: respectively, the nasal cavity and the lateral ones (along the sides of the tongue). These cavities have their own resonances, which interfere with the oral ones as antiresonances.
In a spectrogram it is possible to distinguish between [n] and [m] by exploiting the F2 value (similarly to the distinction between [d] and [b]): they have, respectively, a shorter and a longer articulation tract, which leads to a higher and a lower F2.
