The Laryngeal Flow Model for Pressed-Type Singing Voices

Ken-Ichi Sakakibara, Hiroshi Imagawa, Seiji Niimi, Naotoshi Osaka 2006

 

Abstract

Asian traditional pressed-type singing voices differ from the European traditional singing voice in both timbre and voice production mechanism. In throat singing, both the ventricular folds and the true vocal folds vibrate, generating a distinctive laryngeal voice. In some other pressed-type singing voices, such as Japanese Min-yoh, the ventricular folds only approximate but do not vibrate.

We propose a new laryngeal flow model that incorporates the effects of ventricular fold vibration and laryngeal ventricle resonance. The model combines a known glottal airflow model (the R-model), the laryngeal ventricle resonance (a Helmholtz resonator), and modulation by the ventricular fold vibration. We also demonstrate the relation between the model parameters and voice quality. The results show that the proposed model is effective for synthesizing pressed-type singing voices.

 

1. Introduction

Non-interactive parametric glottal models assume that there is no interaction between the glottal source and the vocal tract [2], and describe the glottal source by mathematical equations. Such models are very effective in speech synthesis and coding and have therefore been used in many studies. The R-model [8] and the LF-model [3] have become reference models of this type.

All of these models assume that the laryngeal voice source is determined by the vocal fold vibratory pattern, and they control voice quality by changing vocal fold vibratory parameters such as the open quotient (OQ), speed quotient (SQ), closing quotient (CQ), and amplitude quotient (AQ) [1]. In throat singing, however, vibration of the ventricular folds (VTFs, also referred to as the false vocal folds) and strong constriction of the supraglottic structure are observed [11]. In some Asian traditional pressed-type singing, such as Japanese Min-yoh, constriction of the supraglottic structure is also observed, though the VTFs do not vibrate [5]. Therefore, to synthesize various styles of singing voices, the effects of the VTF vibration and of the laryngeal ventricle resonance must be considered in addition to the vocal fold vibration.

In this paper, we propose a new laryngeal model based on glottal flow, laryngeal ventricle resonance, and modulation by the VTF vibration. Used as the laryngeal source for source-filter synthesis, the proposed model can control various timbres of singing voices.

 

2. The Laryngeal Flow Model With Ventricular-Fold Vibratory Modulation

2.1. VTF-modulation model

VTF vibration is observed in various types of phonation. In throat singing, both drone and kargyraa voice phonations are always accompanied by VTF vibrations as well as vocal fold (VF) vibrations. In the drone voice, the VTFs vibrate with the same period as the VFs, and in the kargyraa voice, the VTFs vibrate with a period that is an integer multiple (usually double or triple) of that of the VFs [4, 6, 11]. The results of a simulation using a 2×2-mass model suggest the possible vibratory patterns of the VFs and VTFs [9, 11].

Here, we use “laryngeal flow (source)” to mean the airflow through the VTF slit, and “glottal airflow (source)” to mean the airflow through the slit of the VFs. The laryngeal flows of drone and kargyraa for two different singers are shown in Fig. 1. These flows were obtained from recorded sounds using inverse-filter analysis. We marked five poles on the spectrum in the range from 0 to 5 kHz, constructed the inverse filter, and manually adjusted it to make the result smooth. By combining the results of high-speed images, EGG waveforms, and these inverse-filtered laryngeal sources, we concluded that, in throat singing, the VTF vibration is indispensable for the generation of the laryngeal flow. Therefore, modelling the laryngeal flow in throat singing requires a new laryngeal model that includes the effect of VTF vibration.

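The VTF-modulation model ũ(t) is simply defined as follows:

$$\tilde{u}(t) = M(t)\,u(t), \qquad (1)$$

where u(t) is the glottal airflow and M(t) is the VTF-modulation function.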

A block diagram of the model is shown in Fig. 2. In this paper, we choose a simple R-model [8] for the glottal flow. The R-model is described as follows:

 

 

 

 

 

where α is the amplitude, Tp the opening time, Tn the closing time, and T0 the period. All of these variables are in R>0. The open quotient (OQ) is written as (Tp + Tn) / T0.
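Rosenberg's paper [8] considers several pulse shapes, and the text here does not fix which one is meant. As a minimal sketch, assuming the trigonometric variant (a raised-cosine opening phase followed by a quarter-cosine closing phase; this choice is an assumption, not taken from the source), the pulse and its open quotient can be computed as follows. The function names are illustrative, and the timing values are those used later in Section 3.1.

```python
import math

def rosenberg_pulse(t, alpha, Tp, Tn, T0):
    """Assumed trigonometric Rosenberg-type glottal pulse, extended periodically."""
    t = t % T0
    if t <= Tp:
        # Opening phase: flow rises smoothly from 0 to alpha.
        return 0.5 * alpha * (1.0 - math.cos(math.pi * t / Tp))
    if t <= Tp + Tn:
        # Closing phase: flow falls from alpha back to 0.
        return alpha * math.cos(0.5 * math.pi * (t - Tp) / Tn)
    # Closed phase: no flow until the next period begins.
    return 0.0

def open_quotient(Tp, Tn, T0):
    """OQ = (Tp + Tn) / T0."""
    return (Tp + Tn) / T0

T0 = 8e-3                      # period in seconds
Tp, Tn = 0.42 * T0, 0.18 * T0  # opening and closing times
print("OQ =", round(open_quotient(Tp, Tn, T0), 3))
print("peak flow =", rosenberg_pulse(Tp, 1.0, Tp, Tn, T0))
```

Other Rosenberg pulse shapes (for example the polynomial ones) can be substituted without changing the definition of OQ.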

The vibratory patterns of the VTFs observed in high-speed images do not appear to be exactly sinusoidal [7, 10, 11]. Here, however, we define the VTF-modulation function M(t) as the false glottal area function A’g(t) multiplied by a constant M:

$$M(t) = M\,A'_g(t), \qquad (3)$$

and we define A’g(t) as a sine function:

$$A'_g(t) = A'_{g0} + \alpha' \sin(\omega' t + \theta'), \qquad (4)$$

where α’ represents the amplitude of the VTF vibration, A’g0 the area between the VTFs at rest, ω’ the frequency of the VTF vibration, and θ’ the phase difference of the VTF vibration from the VF vibration. All of these are in R>0. Physiological observations and the simulation using the 2×2-mass model suggest that the periods of the VF and VTF vibrations satisfy 2π/ω’ = nT0, where n ∈ Z>0.
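The definitions above can be sketched in a few lines. This is a minimal illustration, assuming the sinusoidal area function A'g(t) = A'g0 + α' sin(ω't + θ') and the period relation 2π/ω' = nT0; the function names, the choice M = 1, and the sample parameter values are illustrative assumptions.

```python
import math

def vtf_angular_frequency(T0, n=1):
    """Return omega' such that the VTF period 2*pi/omega' equals n*T0."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 2.0 * math.pi / (n * T0)

def false_glottal_area(t, area_rest, amp, omega_vtf, theta):
    """Sinusoidal false glottal area A'_g(t) around its rest value."""
    return area_rest + amp * math.sin(omega_vtf * t + theta)

def vtf_modulation(t, M, area_rest, amp, omega_vtf, theta):
    """VTF-modulation function: constant M times the false glottal area."""
    return M * false_glottal_area(t, area_rest, amp, omega_vtf, theta)

T0 = 8e-3
omega_drone = vtf_angular_frequency(T0, n=1)     # same period as the VFs
omega_kargyraa = vtf_angular_frequency(T0, n=2)  # double the VF period
print(omega_drone / omega_kargyraa)
```

Keeping α' smaller than A'g0 keeps the modelled area, and hence the modulation, strictly positive.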

2.2. VTF-modulation and LVT-resonance model

The laryngeal ventricle is the space between the VFs and VTFs. When the VTFs are strongly constricted, the effect of this small space on the laryngeal voice apparently cannot be ignored. The physical model simulation suggests that some acoustic effects occur around 2000 Hz [9, 11]. The inverse-filtered laryngeal voices of throat singing have some ripples (Fig. 1), which largely agree with the physical model simulation results [7]. Hence, an appropriate model that includes laryngeal ventricle resonance is required. Fig. 3 shows spectra of the drone voices of two different singers.

 

 

 

 

 

 

A block diagram of our proposed model (VTF-modulation and LVT-resonance model) is shown in Fig. 4.

The model was obtained as follows: the glottal airflow is convolved with the time-variant laryngeal ventricle resonator, which depends on the VTF vibration, and is then modulated by the vibration of the VTFs.

 

 

We denote the laryngeal ventricle resonator by h[t](z). Then, the laryngeal voice with the laryngeal ventricle resonance and VTF modulation is described as:

$$\tilde{u}(t) = M(t)\,\bigl(h[t] * u\bigr)(t), \qquad (5)$$

where * denotes convolution.

We realize h[t](z) as a time-varying one-pole filter. We calculate the resonance frequency of the laryngeal ventricle, i.e. the frequency of the pole of h[t], by means of a Helmholtz resonator. Let Fv(t) be the resonance frequency, d’ the thickness of the VTFs, and Vv the volume of the laryngeal ventricle. Then,

$$F_v(t) = \frac{c}{2\pi}\sqrt{\frac{A'_g(t)}{d'\,V_v}}, \qquad (6)$$

where c is the sound velocity, 3.53 × 10^4 cm/s. To permit flexible control, we define the bandwidth of the resonance as the bandwidth of a Helmholtz resonator multiplied by a variable K, which changes depending on the phonation type.

 

 

The resistance Rv (t), inductance Lv (t), conductance G, and capacitance C satisfy the following equations.

 

 

 

 

 

 

where ω := 2π/T0 is the frequency of the VF vibration, and dv is the thickness of the laryngeal ventricle. The constants are set as follows: the density of air ρ = 1.14 × 10⁻³ g/cm³; the viscosity μ = 1.86 × 10⁻⁴ dyn·s/cm²; the adiabatic gas constant η = 1.4; and the specific heat ξ = 0.24 cal/(g·degree).
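The text specifies a time-varying one-pole filter whose pole frequency follows Fv(t), but it gives no difference equation. A minimal sketch, assuming a complex one-pole recursion with pole radius exp(−πB/fs) and pole angle 2πF/fs, with the real part taken as output, might look like this. The recursion, the sampling rate `fs`, the function name, and the sample frequency and bandwidth values are all assumptions made for illustration.

```python
import cmath
import math

def one_pole_resonator(x, freqs_hz, bandwidths_hz, fs):
    """Filter x with a complex one-pole whose pole is updated every sample.

    freqs_hz and bandwidths_hz give the resonance frequency and bandwidth
    to use at each sample; the output is the real part of the filter state.
    """
    y = []
    state = 0j
    for xn, f, b in zip(x, freqs_hz, bandwidths_hz):
        radius = math.exp(-math.pi * b / fs)             # bandwidth sets the decay
        pole = cmath.rect(radius, 2.0 * math.pi * f / fs)  # frequency sets the angle
        state = xn + pole * state
        y.append(state.real)
    return y

# Impulse response with a fixed resonance, as a sanity check.
fs = 44100.0
n_samples = 64
impulse = [1.0] + [0.0] * (n_samples - 1)
ripple = one_pole_resonator(impulse, [2000.0] * n_samples, [200.0] * n_samples, fs)
print(ripple[:4])
```

In the full model the per-sample frequency list would follow the resonance frequency of the ventricle, and the bandwidth list would be K times the Helmholtz bandwidth.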

 

3. Acoustical Characteristics

3.1. VTF-modulation model

We study the effect of the phase difference between the VF and VTF vibrations. In the VTF-modulation model of Section 2.1, we fix A’g0 = 0.5 max u(t) and α’ = 0.35 max u(t). For u(t), we set T0 = 8 ms, Tp/T0 = 0.42, and Tn/T0 = 0.18, and hence OQ = 0.6. With these settings, the spectral tilt of u(t) is close to 12 dB/octave. We also set ω’ = ω, i.e. we study a laryngeal flow like that of the drone voice of throat singing. The laryngeal flows for various θ’ are shown in Fig. 5.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

The EGG waveforms and high-speed images of the same subjects as in Fig. 1 suggest that the phase delay of the VTF vibration relative to the VF vibration is around π/4, i.e. θ’ = −π/4 [11]. This value is also supported by its frequent appearance in the physical model simulation [9]. In Fig. 5, the laryngeal flow for this θ’ shows characteristics similar to those of the drone in Fig. 1, and the opening duration is shorter than the closing duration.
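The phase sweep of this section can be explored numerically. The following is a minimal sketch under several assumptions that are not taken from the source: a trigonometric Rosenberg-type pulse normalized so that max u(t) = 1, a multiplicative modulation with M = 1, and a sinusoidal false glottal area. It uses the timing and amplitude settings given above and reports where the modulated flow peaks for a few values of θ'.

```python
import math

T0 = 8e-3
TP, TN = 0.42 * T0, 0.18 * T0

def glottal_flow(t):
    """Assumed trigonometric Rosenberg-type pulse with unit amplitude."""
    t %= T0
    if t <= TP:
        return 0.5 * (1.0 - math.cos(math.pi * t / TP))
    if t <= TP + TN:
        return math.cos(0.5 * math.pi * (t - TP) / TN)
    return 0.0

def laryngeal_flow(t, theta, area_rest=0.5, amp=0.35, n=1):
    """Glottal flow multiplied by the sinusoidal VTF modulation (M = 1)."""
    omega_vtf = 2.0 * math.pi / (n * T0)
    return glottal_flow(t) * (area_rest + amp * math.sin(omega_vtf * t + theta))

def peak_time(theta, steps=8000):
    """Time within one period at which the modulated flow is largest."""
    times = [k * T0 / steps for k in range(steps)]
    return max(times, key=lambda t: laryngeal_flow(t, theta))

for theta in (0.0, -math.pi / 4, -math.pi / 2):
    print(f"theta' = {theta:+.3f} rad, flow peaks at {peak_time(theta) / T0:.3f} T0")
```

Delaying the VTF vibration moves the flow maximum later in the cycle, which changes the balance between the opening and closing durations of the laryngeal flow.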

 

Fig. 6 shows the spectral envelopes of the laryngeal flows for different θ’. Among the values of θ’ in Fig. 6, the spectral tilt is the largest when θ’ = −π/4 (−16 dB/octave) and the smallest when θ’ = −π/2 (−12 dB/octave).

 

3.2. VTF-modulation and LVT-resonance model

We set A’g0 = 0.10 cm² and α’ = 0.05 cm². We normalize u(t) by multiplying it by a positive real value and regard u(t) as the glottal area function, setting the maximal glottal area max u(t) to 0.2 cm². We set the thickness of the VTFs d’ to 1.0 cm, the cross-sectional area of the laryngeal ventricle to 1.5 cm², the depth of the laryngeal ventricle to 0.5 cm, and K = 20 in Eq. (7). We used these values to calculate h[t](z). The other values are the same as above. The laryngeal flows with LVT resonance for various θ’ are shown in Fig. 7.

In all cases, ripples are observed after the closure of the glottis. Fig. 8 shows spectra of two flows. The effect of the VTF resonance is observed around 2000 Hz. This feature is observed in all the synthesized sources.
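As a rough check on the 2000 Hz figure, the Helmholtz relation can be evaluated with the values of this section. This is a minimal sketch under stated assumptions: the standard Helmholtz formula F = (c/2π)·sqrt(A/(d·V)), a sound velocity of 3.53 × 10^4 cm/s, areas in cm², and a ventricle volume taken as cross-sectional area times depth. The function name is illustrative.

```python
import math

C_SOUND = 3.53e4  # sound velocity in cm/s (assumed value)

def helmholtz_frequency(neck_area, neck_length, volume, c=C_SOUND):
    """Resonance frequency in Hz of a Helmholtz resonator (CGS units)."""
    return c / (2.0 * math.pi) * math.sqrt(neck_area / (neck_length * volume))

d_vtf = 1.0                   # thickness of the VTFs in cm
ventricle_volume = 1.5 * 0.5  # assumed: cross-sectional area times depth, cm^3

# False glottal area at its minimum, rest, and maximum (0.10 +/- 0.05 cm^2).
for area in (0.05, 0.10, 0.15):
    freq = helmholtz_frequency(area, d_vtf, ventricle_volume)
    print(f"A'g = {area:.2f} cm^2 -> Fv = {freq:.0f} Hz")
```

Under these assumptions the rest area of 0.10 cm² gives a resonance of roughly 2.05 kHz, and the resonance moves up and down with the false glottal area during each VTF cycle.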

 

 

 

 

 

 

 

 

 

 

 



3.3. False glottal area at rest

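Eq. (6) implies that the resonance frequency grows with the square root of the false glottal area, so the resonance frequency is pushed higher when A’g0 is increased and lower when it is decreased.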

Fig. 9 shows the spectra for different A’g0. Other conditions are the same as above.

 

 

 

 

 

 

 

 

 

 

 

 

 

3.4. Modulation amplitude

We synthesized laryngeal flows by changing the amplitude of VTF vibrations α’. No significant trends are observed in the behaviours of the synthesized flows.

 

 

 

 

 

 

 

 

 

 

3.5. Laryngeal source for kargyraa

If ω’ = ω/2, i.e. n = 2 in the period relation 2π/ω’ = nT0, then ũ(t) has double the period of u(t) and shows behaviour similar to kargyraa phonation. Because of the shape of u(t), the laryngeal flow reaches 0 in the middle of each period. However, the inverse-filtered kargyraa voice maintains flow throughout each period. Incomplete closure of the VFs is also observed in the physical model simulation [9]. To obtain a similar laryngeal flow shape, the second u(t) pulse must start before Tn + Tp, or u(t) needs a sufficiently large OQ.
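The period doubling described here can be checked numerically. The sketch below is illustrative only: it assumes a multiplicative sinusoidal VTF modulation, and it uses a placeholder half-sine pulse (`toy_pulse`, open for 60% of the cycle) in place of the R-model, since only the periodicity matters for this check. All names and parameter values are assumptions.

```python
import math

def modulated_flow(glottal_flow, T0, n, area_rest, amp, theta):
    """Return a flow function whose VTF modulation period is n * T0."""
    omega_vtf = 2.0 * math.pi / (n * T0)
    def flow(t):
        return glottal_flow(t) * (area_rest + amp * math.sin(omega_vtf * t + theta))
    return flow

def is_periodic(flow, period, span, steps=400, tol=1e-9):
    """Check flow(t) == flow(t + period) on a grid covering [0, span)."""
    return all(
        abs(flow(k * span / steps) - flow(k * span / steps + period)) <= tol
        for k in range(steps)
    )

T0 = 8e-3

def toy_pulse(t):
    """Placeholder glottal pulse: half-sine while open, zero while closed."""
    phase = (t % T0) / T0
    return math.sin(math.pi * phase / 0.6) if phase < 0.6 else 0.0

kargyraa = modulated_flow(toy_pulse, T0, n=2, area_rest=0.5, amp=0.35, theta=-math.pi / 4)
print("repeats every T0: ", is_periodic(kargyraa, T0, 2 * T0))
print("repeats every 2*T0:", is_periodic(kargyraa, 2 * T0, 2 * T0))
```

Because the placeholder pulse has a closed phase, this flow still drops to zero between the two glottal pulses of each double period; lengthening the open phase is what would remove that gap.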

 

 

 

 

 

 

 

4. Conclusions

A new laryngeal flow model was proposed. We studied the acoustic characteristics of the model by changing its parameters. To obtain the laryngeal voice shape of the drone voice, VTF modulation is indispensable. In addition, to obtain ripples after the closure of the vocal folds, laryngeal ventricle resonance is effective. These results show that the proposed model is effective for synthesizing pressed-type singing voices, such as throat singing. Parameter fitting by analysis-by-synthesis and perceptual evaluation will be addressed in future work. In addition, an effective inverse filtering method for cases in which the source has poles and the filter has zeros must be studied.

 

Acknowledgments

We thank Seiji Adachi, Parham Mokhtari, Yoshinao Shiraki, Niro Tayama, and Masahiko Todoriki for their helpful discussions.

 

5. References

[1] P. Alku, T. Bäckström, and E. Vilkman. Normalized amplitude quotient for parametrization of the glottal flow. J. Acoust. Soc. Am., 112(2):701–710, 2002.

[2] K. E. Cummings and M. A. Clements. Glottal models for digital speech processing: A historical survey and new results. Digital Signal Processing, 5:21–42, 1995.

[3] G. Fant, J. Liljencrants, and Q.-A. Lin. A four-parameter model of glottal flow. KTH STL QPSR, pages 1–14, 1985.

[4] L. Fuks, B. Hammarberg, and J. Sundberg. A self-sustained vocal-ventricular phonation mode: acoustical, aerodynamic and glottographic evidences. KTH TMH-QPSR, 3/1998:49–59, 1998.

[5] N. Kobayashi, Y. Tohkura, S. Tenpaku, and S. Niimi. Acoustic and physiological characteristics of traditional singing in Japan. Tech. Rep. IEICE, SP89-147:39–45, 1990.

[6] T. C. Levin and M. E. Edgerton. The throat singers of Tuva. Scientific American, Sep-1999:80–87, 1999.

[7] P.-Å. Lindestad, M. Södersten, B. Merker, and S. Granqvist. Voice source characteristics in Mongolian "throat singing" studied with high-speed imaging technique, acoustic spectra, and inverse filtering. J. Voice, 15(1):78–85, 2001.

[8] A. E. Rosenberg. Effect of glottal pulse shape on the quality of natural vowels. J. Acoust. Soc. Am., 49(2):583–590, 1970.

[9] K.-I. Sakakibara, H. Imagawa, S. Niimi, and N. Osaka. Synthesis of the laryngeal source of throat singing using a 2×2-mass model. In Proc. ICMC 2002, pages 5–8, 2002.

[10] K.-I. Sakakibara, T. Konishi, H. Imagawa, E. Z. Murano, K. Kondo, M. Kumada, and S. Niimi. Observation of the laryngeal movements for throat singing — vibration of two pairs of the folds in human larynx. Acoust. Soc. Am. World Wide Press Room, 144th meeting of the ASA, 2002. http://www.acoustics.org/press/.

[11] K.-I. Sakakibara, T. Konishi, K. Kondo, E. Z. Murano, M. Kumada, H. Imagawa, and S. Niimi. Vocal fold and false vocal fold vibrations and synthesis of khöömei. In Proc. ICMC 2001, pages 135–138. ICMA, 2001.

 
