The Effect of the Hypopharyngeal and Supra-Glottic Shapes on the Singing Voice

Hiroshi Imagawa, Ken-Ichi Sakakibara, Niro Tayama, Seiji Niimi, 2003


The timbre of the singing voice is strongly affected by the shapes of the laryngeal tube and hypopharynx. We propose a physical model of the vocal tract and larynx, including the vocal and ventricular folds, for synthesis. We study the effect of the shapes of the laryngeal tube and hypopharynx on the synthesized voice using the proposed model.

We synthesized normal phonation, operatic singing, and throat singing voices by changing the hypopharyngeal and supra-glottic shapes and evaluated the acoustical effects of the various shapes of the hypopharynx and laryngeal tube. The results show that all the shapes of the hypopharynx, larynx tube, piriform fossa, and laryngeal ventricle play an important role in determining voice quality in singing.


1. Introduction

The voice quality in singing is determined by both the laryngeal voice and the vocal tract shape. In most cases, the laryngeal voice is characterized by the vibratory pattern of the vocal folds. These vibratory patterns vary depending on the vocal register (whistle, falsetto, modal, and vocal fry) and singing style (belting, opera, and so on). In throat singing (Khöömei, Khöömij, Kai, etc.), the ventricular folds (also referred to as the false vocal folds) vibrate as well, and their vibration is essential for the special timbres of the drone and kargyraa voices [3, 8, 11, 12].

The vocal tract shape also contributes to aspects of voice quality other than phonetics. In particular, the contributions of the lower vocal tract (the larynx tube, piriform fossa, and hypopharynx) seem to be important in generating the special voice quality in singing. Previous fiberscope observations have shown differences in the lower vocal tract shape in various styles of singing. In operatic singing, the hypopharynx and piriform fossa are relatively wide, and the larynx tube is relatively narrow. Matching the volume ratio of the larynx tube to the hypopharynx produces “the singer’s formant” [13]. The belting voice is characterized by a narrow hypopharynx and piriform fossa and a narrow larynx tube [14]. Japanese Min-yoh is characterized by a narrow larynx tube and ventricular fold constriction [7].

In this paper, we investigate the production mechanisms of various timbres in singing voices using a physical model that allows the vibration of the ventricular folds. The model is realized by acoustical coupling of the two-by-two-mass model and vocal tract with the piriform fossa and infraglottic cavity. We use the model to study the acoustic effect of the shape of the larynx tube (the laryngeal ventricle, ventricular fold, and upper part) and hypopharynx on the voice quality.


2. Synthesis Model

The mechanism of the physical model for the synthesis of singing voices is depicted in Fig. 1. The laryngeal part is described by the 2×2-mass model, which represents the ventricular folds, as well as the vocal folds, as a self-oscillating model. The 2×2-mass model was obtained by extending the two-mass model [2, 4, 10].

The vocal tract is represented as a lossless transmission line. The length of each section is set to 0.4 cm. The ventricular folds and laryngeal ventricle are also assumed to be parts of the vocal tract. When the ventricular folds vibrate, the area of the ventricular-fold section varies.

The effects of the nasal cavity and of changes in lung pressure are neglected for simplicity. Fig. 2 shows the equivalent circuit of our proposed model.


2.1. Subglottic region

The cross-sectional areas of the subglottal system are determined based on the anatomical data reported in [5]. We roughly approximate the subglottic region by 66 cylindrical sections, each 0.4 cm long, and calculate the acoustic characteristics by using the equivalent circuit in Fig. 2. Let y cm be the distance from the glottis. Then, the area A (in cm²) is determined as follows: A=2.5 if



2.2. Vocal folds

The vocal folds are represented as the two-mass model proposed in [4]. The F0 of vocal-fold oscillation is controlled by changing the tension parameter Q [4]. For the initial setting of physical parameters, we use all of the normal values described in [4] except the rest glottal areas
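As a rough illustration of how a single tension parameter can control F0, the sketch below scales the masses and spring constants of a two-mass model by Q, so that the uncoupled natural frequency of each mass scales linearly with Q. The numeric defaults are values commonly quoted for the model in [4] and are treated as assumptions here. The names `TwoMassParams`, `apply_tension`, and `natural_frequency_hz` are hypothetical, and the coupling spring, thickness, and damping are left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoMassParams:
    """Masses (g) and spring constants (dyn/cm); assumed typical values."""
    m1: float = 0.125    # lower mass
    m2: float = 0.025    # upper mass
    k1: float = 80000.0  # lower spring
    k2: float = 8000.0   # upper spring


def apply_tension(params: TwoMassParams, q: float) -> TwoMassParams:
    """Divide the masses by Q and multiply the stiffnesses by Q (assumed scaling)."""
    if q <= 0:
        raise ValueError("Q must be positive")
    return TwoMassParams(
        m1=params.m1 / q,
        m2=params.m2 / q,
        k1=params.k1 * q,
        k2=params.k2 * q,
    )


def natural_frequency_hz(mass_g: float, stiffness_dyn_cm: float) -> float:
    """Natural frequency of an uncoupled, undamped mass-spring pair."""
    return math.sqrt(stiffness_dyn_cm / mass_g) / (2.0 * math.pi)


base = TwoMassParams()
tense = apply_tension(base, 2.0)
f_base = natural_frequency_hz(base.m1, base.k1)
f_tense = natural_frequency_hz(tense.m1, tense.k1)  # twice f_base under this scaling
```

The self-oscillation frequency of the full model also depends on the aerodynamic coupling, so this only indicates the trend of F0 with Q.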



2.3. Laryngeal ventricle

The laryngeal ventricle is represented as a cylindrical section (A1, l1) with l1 = 0.4 cm. The area A1 is set depending on the phonation type, with 1.5 cm² as the normal value. Note that even if the ventricular folds strongly constrict and make contact, as in throat singing, the space of the laryngeal ventricle is still observed [9].


2.4. Ventricular folds

The ventricular folds contain few muscle fibres and, unlike the vocal folds, their physical properties essentially do not change. Therefore, it is meaningless to define a tension parameter for the ventricular folds. Hence, some other parameterisation is necessary.

It is a physiological fact that the ventricular folds are adducted by the action of certain laryngeal muscles, such as the cricoepiglottic muscle and thyroepiglottic muscle [6], but it is unclear whether their physiological properties, such as mass and stiffness, are changed or not by the adduction.

We take into account the changing shapes of the ventricular folds and introduce an adduction parameter Q’ for the ventricular folds, which is one possible parameterization for the stiffness, mass, and the false glottal area at rest [10]. We set the initial values of the parameters for the ventricular folds and laryngeal ventricles as


The two different laryngeal voices in throat singing, drone and kargyraa, are generated by coupling of the vocal and ventricular fold vibrations. In the drone voice, the ventricular folds vibrate with the same period as the vocal folds; in the kargyraa voice, the ventricular folds vibrate with a period that is an integer multiple (usually double or triple) of the vocal fold period. The constriction of the ventricular folds at rest is strong in both cases, but it is relatively loose in the case of kargyraa. Therefore, by changing the area between the ventricular folds at rest, the vibratory patterns of normal, drone, and kargyraa phonations can be simulated by using the 2×2-mass model [10].
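The period relation between the two pairs of folds can be stated as simple arithmetic. The sketch below is only an illustration of that relation, not part of the synthesis model; the function names and the example F0 of 140 Hz are hypothetical.

```python
def ventricular_fold_frequency(vocal_f0_hz: float, period_ratio: int) -> float:
    """Frequency of the ventricular folds when their period is
    `period_ratio` times the vocal fold period."""
    if period_ratio < 1:
        raise ValueError("period_ratio must be a positive integer")
    return vocal_f0_hz / period_ratio


def phonation_label(period_ratio: int) -> str:
    """Same period: drone. Integer multiple of two or more: kargyraa."""
    if period_ratio < 1:
        raise ValueError("period_ratio must be a positive integer")
    return "drone" if period_ratio == 1 else "kargyraa"


vocal_f0_hz = 140.0  # hypothetical vocal fold F0
for ratio in (1, 2, 3):
    print(phonation_label(ratio), ventricular_fold_frequency(vocal_f0_hz, ratio))
```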


2.5. Piriform fossa

The bilateral piriform fossa is assumed to be symmetric and is therefore implemented as one cavity. A preliminary experiment showed that the acoustic characteristics do not differ significantly between the one-cavity and two-cavity cases.

The piriform fossa is implemented as follows. First, according to the MRI data in [1], we assume each piriform fossa to be a cone whose depth is 2 cm and volume is 1.5 cm³ for each side. We represent the piriform fossa by five cylindrical sections

of the cylinder portion above the arytenoid apex plane according to [1], and use the value 0.75 for the end correction coefficient. Finally, if these adjustments require us to extend the number of sections, we add necessary sections such that
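The first step of this procedure, approximating a cone by five cylindrical sections, can be sketched as follows. This is a minimal sketch under stated assumptions: the volume of 1.5 is taken to be in cm³, each cylinder is given the mean area of its slice of the cone so that the total volume is preserved, and the later adjustments described above (the portion above the arytenoid apex plane and the end correction) are not reproduced. The function name `cone_section_areas` is hypothetical.

```python
def cone_section_areas(depth_cm=2.0, volume_cm3=1.5, n_sections=5):
    """Approximate a cone by equal-length cylinders, ordered from apex to opening.

    Returns (areas_cm2, section_length_cm). Each cylinder has the same
    volume as the slice of the cone it replaces.
    """
    base_area = 3.0 * volume_cm3 / depth_cm  # from V = A_base * h / 3
    dx = depth_cm / n_sections
    areas = []
    for i in range(n_sections):
        x0, x1 = i * dx, (i + 1) * dx
        # Integral of A_base * (x / h)^2 over [x0, x1]
        slice_volume = base_area * (x1 ** 3 - x0 ** 3) / (3.0 * depth_cm ** 2)
        areas.append(slice_volume / dx)
    return areas, dx


areas_cm2, section_length_cm = cone_section_areas()
total_volume_cm3 = sum(areas_cm2) * section_length_cm
```

With the default arguments each section is 0.4 cm long, matching the section length used for the rest of the tract.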


2.6. Vocal tract

The vocal tract is represented as a transmission line of n cylindrical hard-walled sections (An, ln) with cross-sectional area An and length ln. We assume that (A1, l1) is the region of the laryngeal ventricle with l1 = 0.4 cm and that (A2, l2) and (A3, l3) are the spaces between the ventricular folds with l2 = l3 = 0.4 cm. The ventricular folds are able to vibrate. We set n = 43. Hence, the length of the vocal tract is 17.2 cm. For k = 3,…,43, each section has length lk = 0.4 cm and variable area Ak.


3. Acoustic Measurement Using the Synthesis Model

We set the length of each part of the vocal tract as shown in Fig. 3. The length of the larynx tube is 2.4 cm, which includes the ventricular fold sections with a length of 0.8 cm and the laryngeal ventricle section with a length of 0.4 cm. We also set the length of the hypopharynx to 2 cm. We set Pl = 5 cm H2O by default and attached the piriform fossa as shown in Fig. 2. The default areas were set to 1.5 cm² for the laryngeal ventricle, 0.5 cm² for the ventricular folds and the remaining larynx tube sections, and 3.14 cm² for the hypopharynx and the upper vocal tract.
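To give a feel for how such an area function maps to formants, the sketch below chains lossless cylindrical sections in the frequency domain and locates the resonances of the resulting tube. It is a simplified stand-in, not the synthesis model of this paper: it assumes 43 sections of 0.4 cm each, an ideal open end at the lips, a speed of sound of 35000 cm/s, and no piriform fossa, subglottal system, or vibrating folds. All function names are hypothetical.

```python
import numpy as np

SOUND_SPEED_CM_S = 35000.0  # assumed speed of sound


def default_area_function():
    """Default areas (cm^2) and lengths (cm), glottis side first."""
    areas = np.full(43, 3.14)  # hypopharynx and upper vocal tract
    areas[0] = 1.5             # laryngeal ventricle
    areas[1:6] = 0.5           # ventricular folds and rest of the larynx tube
    lengths = np.full(43, 0.4)
    return areas, lengths


def chain_d(areas, lengths, freqs_hz):
    """D element of the overall chain matrix of the concatenated tubes.

    Each section has the matrix [[cos, j*z*sin], [j*sin/z, cos]]. For a
    lossless line the product keeps that form, so only the real
    coefficients of the lower row are tracked. The constant rho*c cancels
    in D, so z is taken as 1 / area.
    """
    k = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) / SOUND_SPEED_CM_S
    c_term = np.zeros_like(k)
    d_term = np.ones_like(k)
    for area, length in zip(areas, lengths):
        z = 1.0 / area
        cs, sn = np.cos(k * length), np.sin(k * length)
        c_term, d_term = c_term * cs + d_term * sn / z, d_term * cs - c_term * z * sn
    return d_term


def formant_frequencies(areas, lengths, fmax_hz=5000.0, step_hz=5.0):
    """Resonances are the zeros of D, since U_lips / U_glottis = 1 / D."""
    freqs = np.arange(step_hz, fmax_hz, step_hz)
    d = chain_d(areas, lengths, freqs)
    crossings = np.where(d[:-1] * d[1:] < 0.0)[0]
    # Linear interpolation between the two samples around each sign change
    return [float(freqs[i] - d[i] * step_hz / (d[i + 1] - d[i])) for i in crossings]


areas, lengths = default_area_function()
default_formants = formant_frequencies(areas, lengths)
```

For a uniform tube the same routine reduces to the quarter-wavelength resonances of a 17.2 cm tube, which gives a quick sanity check before the larynx tube areas are narrowed.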




3.1. Effect of the larynx tube

We changed the areas of the larynx tube A2,…, A6, excluding the laryngeal ventricle section. The spectral envelopes of the synthesized sounds are shown in Fig. 4 (LPC analysis, p=24). With decreasing larynx tube area, F2, F3, and F4 move lower. In addition, F5 moves slightly lower and gradually disappears, and the power in the range above 4000 Hz decreases. If the area of the larynx tube is greater than 0.05 cm², the ventricular folds do not make contact.
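For readers who want to reproduce this kind of envelope, the following is a minimal sketch of an autocorrelation-method LPC analysis using the Levinson-Durbin recursion. Only the order p = 24 comes from the text; the Hamming window, the FFT size, and the function names are assumptions, and the analysis settings actually used for the figures are not specified here.

```python
import numpy as np


def lpc_coefficients(signal, order):
    """Return (a, err) with A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    x = np.asarray(signal, dtype=float)
    x = x * np.hamming(len(x))
    # Autocorrelation at lags 0..order
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a, err


def lpc_envelope_db(signal, sample_rate_hz, order=24, n_fft=2048):
    """Spectral envelope in dB from the all-pole model sqrt(err) / |A|."""
    a, err = lpc_coefficients(signal, order)
    spectrum = np.fft.rfft(a, n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate_hz)
    env_db = 10.0 * np.log10(max(err, 1e-300)) - 20.0 * np.log10(np.abs(spectrum))
    return freqs, env_db
```

Peaks of `env_db` then serve as formant estimates, which is how statements such as "F2, F3, and F4 move lower" can be checked numerically on synthesized output.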








3.2. Effect of the hypopharynx

Fig. 5 shows the spectral envelopes of the synthesized sounds when we changed the hypopharynx area. They show that F3 and F5 move close to F4. For some areas (A = 3 or 6 cm² in Fig. 5), a formant cluster of F3, F4, and F5 is observed around 3000 Hz.



3.3. Effect of the laryngeal ventricle

Fig. 6 shows the results when the laryngeal ventricle area was changed. As the area increases, F3 and F5 move lower and are pushed toward F2, and F2 is sharpened. The effect of a zero in the range from 4000 to 5000 Hz also becomes larger.









3.4. Effect of the piriform fossa

The spectral envelopes of the synthesized sounds are shown in Fig. 7 for changes in piriform fossa volume. As the volume is increased, the effect of a zero around 4500 Hz becomes larger. As pointed out in [15], increasing the volume of the piriform fossa repels the formants, i.e. F1, F2, F3, and F4 are pushed lower.
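A back-of-the-envelope check on the location of this zero: if the piriform fossa is idealized as a uniform side branch closed at its far end (an assumption; the model above uses a cone with an end correction), the first zero falls at the quarter-wavelength frequency of the branch. Taking the depth L = 2 cm from Section 2.5 and assuming c ≈ 35000 cm/s:

```latex
f_z \approx \frac{c}{4L}
    = \frac{35000\ \mathrm{cm/s}}{4 \times 2\ \mathrm{cm}}
    \approx 4375\ \mathrm{Hz}
```

This rough estimate is of the same order as the zero observed around 4500 Hz.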


4. Singing Voice Synthesis

Based on the results of the acoustic measurements of the effects of varying the shape of the larynx tube, hypopharynx, laryngeal ventricle, and piriform fossa, we chose the nominal settings of the parameters for normal phonation, operatic singing, and throat singing (Table 1). In the case of operatic singing, we assume that the lowest four sections in the hypopharynx have constant areas A7 = A8 = 5.0 cm² and A9 = A10 = 5.0 cm² to maintain the large volume ratio of the hypopharynx to the larynx tube. In throat singing, by controlling the adduction parameter Q’, we synthesized both drone and kargyraa phonations. We set the initial value of A2 and A3 to 0.04 cm², with Q’ = 1 for drone and Q’ = 0.55 for kargyraa. No significant differences were observed between the spectral envelopes of the two phonations. In Fig. 8, the spectral envelope of the drone voice is shown.

The spectral envelopes are shown in Fig. 8. The synthesized operatic singing voice has the singer’s formant around 3000 Hz. In the synthesized throat singing voice, F2 is sharp and the power in the range from 4000 to 6000 Hz is relatively low. The sharpness of F2 is an effect of the laryngeal ventricle resonance and contributes to the generation of the whistle-like tone.



5. Conclusions

We studied the acoustic effect of the shape of the larynx tube and hypopharynx. The acoustic characteristics of the synthesized singing voices agree well with known results for operatic and throat singing. Our results show that the dimensions of the laryngeal ventricle, larynx tube, hypopharynx, and piriform fossa play an important role in determining voice quality in singing.

For operatic singing, a relatively narrow larynx tube and a wide hypopharynx act together to create the singer’s formant, as reported in previous studies [13, 15]. The piriform fossa also contributes to the cluster of F3 and F4. For throat singing, strong constriction of the ventricular folds sharpens F2, and vibration of the ventricular folds produces the special laryngeal voices.

Most previous studies assumed that the larynx tube has uniform area. However, our results reveal that the laryngeal ventricle volume and the ventricular fold constriction also affect the acoustic characteristics of the singing voices.

In this study, we did not consider the mechanical interaction between the spaces in the lower vocal tract. However, the physiological mechanism determining the shape of the lower vocal tract is very important for finding appropriate parameters to synthesize various voice qualities. In particular, laryngeal height is very important in determining the configuration of the lower vocal tract. The activation of the thyropharyngeal, cricopharyngeal, and palatopharyngeal muscles is also important for regulating the shape of the hypopharynx, and the intrinsic laryngeal muscles are important for constricting the larynx tube [6]. Further investigations using EMG and MRI are needed in order to understand their physiology and, thereby, control the parameters of the synthesis model.



We would like to thank Seiji Adachi, Kiyoshi Honda, Emi Z. Murano, and Sotaro Sekimoto for their helpful discussions.


6. References

[1] J. Dang and K. Honda. Acoustic characteristics of the piriform fossa in models and humans. J. Acoust. Soc. Am., 101(1):456–465, 1997.

[2] J. L. Flanagan, K. Ishizaka, and K. L. Shipley. Synthesis of speech from a dynamical model of the vocal cords and vocal tract. Bell System Tech. J., 54:485–506, 1975.

[3] L. Fuks, B. Hammarberg, and J. Sundberg. A self-sustained vocal ventricular phonation mode: acoustical, aerodynamic and glottographic evidences. KTH TMH-QPSR, 3/1998:49–59, 1998.

[4] K. Ishizaka and J. L. Flanagan. Synthesis of voiced sounds from a two-mass model of the vocal cords. Bell System Tech. J., 51(6):1233–1268, 1972.

[5] K. Ishizaka, M. Matsudaira, and T. Kaneko. Input acoustic-impedance measurement of the subglottal system. J. Acoust. Soc. Am., 60(1):190–197, 1976.

[6] M. Kimura, K.-I. Sakakibara, H. Imagawa, R. Chan, S. Niimi, and N. Tayama. Histological investigation of the supra-glottal structures in human for understanding abnormal phonation. J. Acoust. Soc. Am., 112:2446, 2002.

[7] N. Kobayashi, Y. Tohkura, S. Tenpaku, and S. Niimi. Acoustic and physiological characteristics of traditional singing in Japan. Tech. Rep. IEICE, SP89-147:39–45, 1990.

[8] T. C. Levin and M. E. Edgerton. The throat singers of Tuva. Scientific American, Sep-1999:80–87, 1999.

[9] V. T. Maslov. Functional peculiarities of the larynx during the vocal formation in Tuva two-voice singing. Vestn. Otorinolaringol., Mar.–Apr.:58–61, 1979. In Russian.

[10] K.-I. Sakakibara, H. Imagawa, S. Niimi, and N. Osaka. Synthesis of the laryngeal source of throat singing using a 2×2-mass model. In Proc. ICMC 2002, pages 5–8, 2002.

[11] K.-I. Sakakibara, T. Konishi, H. Imagawa, E. Z. Murano, K. Kondo, M. Kumada, and S. Niimi. Observation of the laryngeal movements for throat singing — vibration of two pairs of the folds in human larynx. Acoust. Soc. Am. World Wide Press Room, 144th meeting of the ASA, 2002.

[12] K.-I. Sakakibara, T. Konishi, K. Kondo, E. Z. Murano, M. Kumada, H. Imagawa, and S. Niimi. Vocal fold and false vocal fold vibrations and synthesis of khöömei. In Proc. ICMC 2001, pages 135–138. ICMA, 2001.

[13] J. Sundberg. The science of the singing voice. Northern Illinois Univ. Pr., 1989.

[14] J. Sundberg, P. Gramming, and J. Lovetri. Comparisons of pharynx, source, formant and pressure characteristics in operatic and musical theatre singing. J. Voice, 7(4):301–310, 1993.

[15] I. R. Titze and B. H. Story. Acoustic interactions of the voice source with the lower vocal tract. J. Acoust. Soc. Am., 101(4):2234–2243, 1997.

