The Effect of the Hypopharyngeal and Supra-Glottic Shapes on the Singing Voice
Hiroshi Imagawa, Ken-Ichi Sakakibara, Niro Tayama, Seiji Niimi
2003
Abstract
The
timbre of the singing voice is strongly affected by the shapes of the laryngeal
tube and hypopharynx. We propose a physical model of the vocal tract and
larynx, including the vocal and ventricular folds, for synthesis. We study the
effect of the shapes of the laryngeal tube and hypopharynx on the synthesized
voice using the proposed model.
We synthesized normal phonation, operatic singing, and
throat singing voices by changing the hypopharyngeal and supra-glottic shapes
and evaluated the acoustical effects of the various shapes of the hypopharynx
and laryngeal tube. The results show that all the shapes of the hypopharynx,
larynx tube, piriform fossa, and laryngeal ventricle play an important role in
determining voice quality in singing.
1. Introduction
The voice quality in singing is determined by both the laryngeal voice and the
vocal tract shape. In most cases, the laryngeal voice is characterized by the
vibratory pattern of the vocal folds. These vibratory patterns vary depending
on the vocal registers (whistle, falsetto, modal, and vocal fry) and singing
style (belting, opera, and so on). In throat singing (Khöömei, Khöömij, Kai,
etc.), the ventricular folds (also referred to as the false vocal folds)
vibrate as well and their vibration is essential for the special timbres of the
drone and kargyraa voices [3, 8, 11, 12].
The vocal tract shape also contributes to other
aspects of voice quality besides phonetics. In particular, the contributions of
the lower vocal tract (the larynx tube, piriform fossa, and hypopharynx) seem
to be important in generating the special voice quality in singing. Previous
fiberscope observations have shown differences in the lower vocal tract shape
in various styles of singing. In operatic singing, the hypopharynx and piriform
fossa are relatively wide and the larynx tube is relatively narrow. Matching the
volume ratio of the larynx tube to the hypopharynx produces “the singer’s
formant” [13]. The belting voice is characterized by a narrow hypopharynx and
piriform fossa and a narrow larynx tube [14]. Japanese Min-yoh is characterized
by a narrow larynx tube and ventricular fold constriction [7].
In this paper, we investigate the production
mechanisms of various timbres in singing voices using a physical model that
allows the vibration of the ventricular folds. The model is realized by
acoustical coupling of the two-by-two-mass model and vocal tract with the
piriform fossa and infraglottic cavity. We use the model to study the acoustic
effect of the shape of the larynx tube (the laryngeal ventricle, ventricular
fold, and upper part) and hypopharynx on the voice quality.
2. Synthesis Model
The mechanism of the physical model for the synthesis of singing voices is depicted
in Fig. 1. The laryngeal part is described by the 2×2-mass model, which
represents the ventricular folds in a self-oscillating model as well as the
vocal folds. The 2×2-mass model was obtained by improving the two-mass model
[2, 4, 10].
The vocal tract is represented as a lossless
transmission line. The length of each section is set to 0.4 cm. The ventricular folds and laryngeal ventricle
are also assumed to be parts of the vocal tract. In the case that the
ventricular folds vibrate, the area of the ventricular-fold section varies.
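As a rough illustration of what a lossless transmission-line representation implies, the following Python sketch chains one two-port (ABCD) matrix per cylindrical section and locates the resonances of the resulting tube. It is not the synthesis model itself: the source, the piriform fossa branch, the subglottal system, and lip radiation are all omitted, the lips are treated as an ideal open end, and the sound speed, air density, and function names are assumptions introduced only for this example.

```python
import numpy as np

SOUND_SPEED = 3.5e4    # cm/s, assumed value (not given in the text)
AIR_DENSITY = 1.14e-3  # g/cm^3, assumed value (not given in the text)

def chain_d_term(freq_hz, areas_cm2, lengths_cm):
    """Return the D term of the glottis-to-lips chain (ABCD) matrix."""
    k = 2.0 * np.pi * freq_hz / SOUND_SPEED
    chain = np.eye(2, dtype=complex)
    for area, length in zip(areas_cm2, lengths_cm):
        z = AIR_DENSITY * SOUND_SPEED / area  # characteristic impedance
        section = np.array([
            [np.cos(k * length), 1j * z * np.sin(k * length)],
            [1j * np.sin(k * length) / z, np.cos(k * length)],
        ])
        chain = chain @ section
    # For a lossless chain the D term is real up to rounding error.
    return chain[1, 1].real

def resonances(areas_cm2, lengths_cm, f_max=5000.0, step=5.0):
    """Find frequencies where D crosses zero (ideal open end at the lips)."""
    freqs = np.arange(step, f_max, step)
    d = np.array([chain_d_term(f, areas_cm2, lengths_cm) for f in freqs])
    crossings = np.where(np.sign(d[:-1]) != np.sign(d[1:]))[0]
    # Linear interpolation between the two grid points around each crossing.
    return [freqs[i] - d[i] * step / (d[i + 1] - d[i]) for i in crossings]

# Hypothetical test case: a uniform tube of 43 sections, each 0.4 cm long.
uniform = resonances([3.14] * 43, [0.4] * 43)
print([round(f) for f in uniform])
```

With an ideal open end the volume-velocity transfer from glottis to lips is 1/D, so the zeros of D mark the resonances. For the uniform tube they should fall at odd multiples of c/(4L), which gives a quick sanity check before non-uniform area functions are tried.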
The effects of the nasal cavity and of changes in lung pressure
are neglected for simplicity. Fig. 2 shows the
equivalent circuit of our proposed model.
2.1. Subglottic region
The cross-sectional areas of the subglottal system are determined based on the
anatomical data reported in [5]. We roughly approximate the subglottic region
by 66 cylindrical sections, each 0.4 cm long, and calculate the acoustic
characteristics by using the equivalent circuit in Fig. 2. Let y cm be the
distance from the glottis. Then, the area A (in cm²) is determined as
follows: A=2.5 if
2.2. Vocal folds
The
vocal folds are represented as the two-mass model proposed in [4]. The F0 of
vocal-fold oscillation is controlled by changing the tension parameter Q [4].
For the initial setting of physical parameters, we use all of the normal values
described in [4] except the rest
glottal areas
2.3. Laryngeal ventricle
The laryngeal ventricle is represented as a cylindrical section (A1, l1) with
l1 = 0.4 cm. The area A1 is set depending on the phonation type, with 1.5 cm²
as the normal value. Note that even if the ventricular folds strongly constrict
and come into contact, as in throat singing, the space of the laryngeal
ventricle is still observed [9].
2.4. Ventricular folds
The
ventricular folds contain few muscle fibres and, unlike the vocal folds, their
physical properties essentially do not change. Therefore, it is meaningless to
define a tension parameter for the ventricular folds. Hence, some other
parameterisation is necessary.
It is a physiological fact that the ventricular folds
are adducted by the action of certain laryngeal muscles, such as the
cricoepiglottic muscle and thyroepiglottic muscle [6], but it is unclear
whether their physiological properties, such as mass and stiffness, are changed
or not by the adduction.
We take into account the changing shapes of the ventricular
folds and introduce an adduction parameter Q’ for the ventricular folds, which
is one possible parameterization for the stiffness, mass, and the false glottal
area at rest [10]. We set the initial values of the parameters for the
ventricular folds and laryngeal ventricles as
The two different laryngeal voices in throat singing,
drone and kargyraa, are generated by coupling of the vocal and ventricular
vibrations. In the drone voice, the ventricular folds vibrate with the same
period as the vocal folds; in the kargyraa voice, the ventricular folds
vibrate with a period that is an integer multiple (usually double or triple)
of the vocal-fold period. The constriction of the ventricular folds at rest is
strong in both cases, although it is relatively loose in the case of kargyraa.
Therefore, by changing the area between the ventricular folds at rest, the
vibratory patterns of normal, drone, and kargyraa phonations can be simulated
by using the 2×2-mass model [10].
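The period relation between the two pairs of folds can be illustrated with a toy example that is independent of the physical model. The Python sketch below replaces each pair of folds by an idealized pulse train (an assumption made only for illustration, with hypothetical function names) and finds the shortest shift at which the combined signal repeats.

```python
import numpy as np

def pulse_train(period_samples, n_samples):
    """Idealized excitation: one unit pulse every `period_samples` samples."""
    x = np.zeros(n_samples)
    x[::period_samples] = 1.0
    return x

def combined_period(vocal_period, multiple, n_periods=12):
    """Smallest shift (in samples) at which the summed pulse trains repeat.

    `multiple` is the ratio of the ventricular-fold period to the
    vocal-fold period: 1 for drone-like, 2 or 3 for kargyraa-like cases.
    """
    ventricular_period = vocal_period * multiple
    n = ventricular_period * n_periods
    signal = pulse_train(vocal_period, n) + 0.5 * pulse_train(ventricular_period, n)
    for shift in range(1, n // 2 + 1):
        if np.array_equal(signal[shift:], signal[:-shift]):
            return shift
    return None

for multiple in (1, 2, 3):
    print(multiple, combined_period(100, multiple))
```

In this simplified picture the combined signal repeats with the ventricular-fold period, so a period ratio of two or three lowers the fundamental frequency of the combined source by the same factor.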
2.5. Piriform fossa
The
bilateral piriform fossa is assumed to be symmetric and, thereby, implemented
as one cavity. From the result of the preliminary experiment, the acoustic
characteristics are not significantly different between one-cavity and
two-cavity cases.
The piriform fossa is implemented as follows. First,
according to the MRI data in [1], we assume each piriform fossa to be a cone
whose depth is 2 cm and volume is 1.5 cm³ for each side. We represent
the piriform fossa by five cylindrical sections
of
the cylinder portion above the arytenoid apex plane according to [1], and use
the value 0.75 for the end correction coefficient. Finally, if these
adjustments require us to extend the number of sections, we add necessary
sections such that
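One possible way to turn the cone described above into five cylindrical sections is sketched below in Python. The volume-preserving rule, the apex-first ordering, and the function name are assumptions made for this example. The adjustments based on [1] and the end correction are not reproduced, and the stated volume is taken as 1.5 cm³.

```python
def cone_to_cylinders(depth_cm=2.0, volume_cm3=1.5, n_sections=5):
    """Approximate a cone by equal-length cylinders of equal slice volume.

    Sections are ordered from the apex (closed bottom) to the base (opening).
    """
    base_area = 3.0 * volume_cm3 / depth_cm  # from V = A_base * h / 3
    dx = depth_cm / n_sections
    areas = []
    for i in range(n_sections):
        x1, x2 = i * dx, (i + 1) * dx
        # Volume of the cone slice between x1 and x2, measured from the apex.
        slice_volume = base_area * (x2**3 - x1**3) / (3.0 * depth_cm**2)
        areas.append(slice_volume / dx)
    return dx, areas

section_length, section_areas = cone_to_cylinders()
print(section_length, [round(a, 2) for a in section_areas])
```

With a depth of 2 cm and five sections, each cylinder is 0.4 cm long, which matches the section length used for the rest of the model, and the cylinder volumes sum to the cone volume by construction.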
2.6. Vocal tract
The vocal tract is represented as a transmission line of n cylindrical hard-walled sections (An, ln) with cross-sectional
area An and length ln. We assume that (A1, l1) is the region of the laryngeal
ventricle with l1 = 0.4 cm and that (A2, l2), (A3, l3) are the spaces between
the ventricular folds with l2 = l3 = 0.4 cm. The ventricular folds are able to
vibrate. We set n = 43. Hence, the length of the vocal tract is 17.2 cm. For k = 3,…,43 each section has
length lk = 0.4 cm and variable area Ak.
3. Acoustic Measurement Using the Synthesis Model
We set the length of each part of the vocal tract as shown in Fig. 3. The length
of the larynx tube is 2.4 cm, which includes the ventricular-fold sections
with a length of 0.8 cm and the laryngeal ventricle section with a length of 0.4 cm.
We also set the length of the hypopharynx to 2 cm. We set Pl = 5 cm H2O by default
and attached the piriform fossa as shown in Fig. 2. The default areas were set
to 1.5 cm² for the laryngeal ventricle, 0.5 cm² for the
ventricular folds and the remaining larynx tube sections, and 3.14 cm² for
the hypopharynx and the upper vocal tract.
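The default configuration can be written down as a simple area function, which also makes the length bookkeeping easy to check. The Python sketch below assumes the section indexing used elsewhere in the paper (A1 for the laryngeal ventricle, A2 to A6 for the rest of the larynx tube, and the hypopharynx starting at A7). The constant and function names are hypothetical.

```python
SECTION_LENGTH_CM = 0.4
N_SECTIONS = 43

def default_areas():
    """Default cross-sectional areas A1..A43 in cm^2 (list index 0 is A1)."""
    areas = [3.14] * N_SECTIONS  # hypopharynx and upper vocal tract
    areas[0] = 1.5               # A1: laryngeal ventricle
    for k in range(1, 6):        # A2..A6: ventricular folds and rest of larynx tube
        areas[k] = 0.5
    return areas

areas = default_areas()
total_length = N_SECTIONS * SECTION_LENGTH_CM
larynx_tube_length = 6 * SECTION_LENGTH_CM   # A1..A6
hypopharynx_length = 5 * SECTION_LENGTH_CM   # A7..A11
print(total_length, larynx_tube_length, hypopharynx_length)
```

Six sections of 0.4 cm give the 2.4 cm larynx tube, five sections give the 2 cm hypopharynx, and all 43 sections together give the 17.2 cm vocal tract.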
3.1. Effect of the larynx tube
We changed the areas of the larynx tube A2,…, A6, excluding the laryngeal ventricle
section. The spectral envelopes of the synthesized sounds are shown in Fig. 4
(LPC analysis, p=24). With decreasing larynx tube area, F2, F3, and F4 move lower.
If the area of the larynx tube is greater than 0.05 cm², the ventricular
folds do not come into contact. In addition, F5 moves slightly lower and gradually
disappears, and the power in the range above 4000 Hz decreases.
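For readers unfamiliar with LPC spectral envelopes, a minimal Python sketch of the autocorrelation method is given below. The paper states only that LPC analysis with p=24 was used, so the choice of the autocorrelation method, the lack of windowing and pre-emphasis, and all function names are assumptions. The toy signal is the impulse response of a known two-pole filter rather than a synthesized voice.

```python
import numpy as np

def lpc_coefficients(signal, order):
    """Autocorrelation-method LPC: returns [1, a1, ..., a_order]."""
    x = np.asarray(signal, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    toeplitz = np.array([[r[abs(i - j)] for j in range(order)]
                         for i in range(order)])
    a = np.linalg.solve(toeplitz, -r[1:])  # normal (Yule-Walker) equations
    return np.concatenate(([1.0], a))

def envelope_db(coeffs, sample_rate, n_points=512):
    """Spectral envelope 1/|A(e^{jw})| in dB, up to an overall gain."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_points)
    omega = 2.0 * np.pi * freqs / sample_rate
    response = np.exp(-1j * np.outer(omega, np.arange(len(coeffs)))) @ coeffs
    return freqs, -20.0 * np.log10(np.abs(response))

# Toy input: impulse response of a two-pole resonator near 1000 Hz.
sample_rate = 16000
pole_radius, pole_freq = 0.95, 1000.0
a1 = -2.0 * pole_radius * np.cos(2.0 * np.pi * pole_freq / sample_rate)
a2 = pole_radius ** 2
h = np.zeros(2000)
h[0] = 1.0
h[1] = -a1 * h[0]
for n in range(2, len(h)):
    h[n] = -a1 * h[n - 1] - a2 * h[n - 2]

coeffs = lpc_coefficients(h, order=2)
freqs, env = envelope_db(coeffs, sample_rate)
peak_hz = freqs[np.argmax(env)]
print(coeffs, peak_hz)
```

For a voice signal one would call `lpc_coefficients(frame, order=24)` on a windowed frame. The peaks of the resulting envelope are then read as formants, which is how shifts of F2 to F5 such as those described above can be tracked.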
3.2. Effect of the hypopharynx
Fig. 5 shows the spectral envelopes of the synthesized sounds when we changed the
hypopharynx area. They show that F3 and F5 move close to F4. In some
range, A = 3 or 6 cm² in Fig. 5,
the formant cluster of F3, F4, F5 is observed around 3000 Hz.
3.3. Effect of the laryngeal ventricle
Fig. 6 shows the results when the laryngeal ventricle area was changed. As
the area increases, F3 and F5 move lower and are pushed close to F2, and F2 is
sharpened. The effect of a zero in the range from 4000 to 5000 Hz also becomes
larger.
3.4. Effect of the piriform fossa
The spectral envelopes of the synthesized sounds are shown in Fig. 7 for changes in
piriform fossa volume. As the volume is increased, the effect of a zero around
4500 Hz becomes larger. As pointed out in [15], increasing the volume of the
piriform fossa repels the formants, i.e. F1, F2, F3, and F4 are pushed lower.
4. Singing Voice Synthesis
Based on the results of the acoustic measurements of the effects of varying the shape
of the larynx tube, hypopharynx, laryngeal ventricle, and piriform fossa, we
chose the nominal settings of the parameters for normal phonation, operatic
singing, and throat singing
(Table 1). In the case of operatic singing, we assume that the lowest four sections
in the hypopharynx have constant areas A7 = A8 = 5.0 cm² and A9 = A10 =
5.0 cm² to maintain the large volume ratio of the hypopharynx to the
larynx tube. In throat singing, by controlling the adduction parameter Q’, we
synthesized both drone and kargyraa phonations. We set the initial value of
A2 and A3 to 0.04 cm², with Q’ = 1 for drone and Q’ = 0.55 for kargyraa. No significant
differences were observed between the spectral envelopes of the two phonations.
The spectral envelopes are shown in Fig. 8; for throat singing, the envelope of
the drone voice is shown. The
synthesized operatic singing voice has the singer’s formant around 3000 Hz. In
the synthesized throat singing voice, F2 is sharp and the power in the range
from 4000 to 6000 Hz is relatively low. The sharpness of F2 is the effect
of the laryngeal ventricle resonance and contributes to the generation of the
whistle-like tone.
5. Conclusions
We studied the acoustic effect of the shape of the larynx tube and hypopharynx.
The acoustic characteristics of the synthesized singing voices agree well
with known results for operatic and throat singing. Our results show
that the dimensions of the laryngeal ventricle, larynx tube, hypopharynx, and
piriform fossa play an important role in determining voice quality in singing.
For operatic singing, a relatively narrow larynx tube
and a wide hypopharynx act to create the singer’s formant, as reported in previous
studies [13, 15]. The piriform fossa also contributes to the cluster of F3 and F4.
For throat singing, strong constriction of the ventricular folds sharpens F2,
and vibration of the ventricular folds produces the special laryngeal voices
as well.
Most previous studies assumed that the larynx tube has
uniform area. However, our results reveal that the laryngeal ventricle volume
and the ventricular fold constriction also affect the acoustic characteristics
of the singing voices.
In this study, we did not consider the mechanical
interaction of each space in the lower vocal tract. However, the physiological
mechanism determining the shape of the lower vocal tract is very important for
finding appropriate parameters to synthesize various voice qualities. In
particular, laryngeal height is very important in determining the configuration
of the lower vocal tract. The activation of the thyropharyngeal muscle,
cricopharyngeal muscle, and palatopharyngeal muscle is also important for
regulating the shape of the hypopharynx, and the intrinsic laryngeal muscles are
important for constricting the larynx tube [6]. Further investigations by using
EMG and MRI are needed in order to understand their physiology and, thereby,
control the parameters of the synthesis model.
Acknowledgments
We
would like to thank Seiji Adachi, Kiyoshi Honda, Emi Z. Murano, and Sotaro
Sekimoto for their helpful discussions.
6. References
[1]
J. Dang and K. Honda. Acoustic characteristics of the piriform fossa in models
and humans. J. Acoust. Soc. Am., 101(1):456–465, 1997.
[2]
J. L. Flanagan, K. Ishizaka, and K. L. Shipley. Synthesis of speech from a
dynamical model of the vocal cords and vocal tract.
[3]
L. Fuks, B. Hammarberg, and J. Sundberg. A self-sustained vocal ventricular
phonation mode: acoustical, aerodynamic and glottographic evidences. KTH
TMH-QPSR, 3/1998:49–59, 1998.
[4]
K. Ishizaka and J. L. Flanagan. Synthesis of voiced sounds from a two-mass
model of the vocal cords.
[5]
K. Ishizaka, M. Matsudaira, and T. Kaneko. Input acoustic-impedance measurement
of the subglottal system. J. Acoust. Soc. Am., 60(1):190–197, 1976.
[6]
M. Kimura, K.-I. Sakakibara, H. Imagawa, R. Chan, S. Niimi, and N. Tayama.
Histological investigation of the supra-glottal structures in human for
understanding abnormal phonation. J. Acoust. Soc. Am., 112:2446, 2002.
[7]
N. Kobayashi, Y. Tohkura, S. Tenpaku, and S. Niimi. Acoustic and physiological
characteristics of traditional singing in
[8]
T. C. Levin and M. E. Edgerton. The throat singers of Tuva. Scientific
[9]
V. T. Maslov. Functional peculiarities of the larynx during the vocal formation
in Tuva two-voice singing. Vestn. Otorinolaringol., Mar.–Apr.:58–61, 1979. in
Russian.
[10]
K.-I. Sakakibara, H. Imagawa, S. Niimi, and
[11]
K.-I. Sakakibara, T. Konishi, H. Imagawa, E. Z. Murano, K. Kondo, M. Kumada,
and S. Niimi. Observation of the laryngeal movements for throat singing —
vibration of two pairs of the folds in human larynx. Acoust. Soc. Am. World
Wide Press Room, 144th meeting of the ASA, 2002.
http://www.acoustics.org/press/.
[12]
K.-I. Sakakibara, T. Konishi, K. Kondo, E. Z. Murano, M. Kumada, H. Imagawa,
and S. Niimi. Vocal fold and false vocal fold vibrations and synthesis of
khöömei. In Proc. ICMC 2001, pages 135–138. ICMA, 2001.
[13]
J. Sundberg. The science of the singing voice. Northern Illinois Univ. Pr.,
1989.
[14]
J. Sundberg, P. Gramming, and J. Lovetri. Comparisons of pharynx, source,
formant and pressure characteristics in Operatic and musical theatre singing.
J. Voice, 7(4):301–310, 1993.
[15]