Perception of Overtone Singing: Chen-Gia Tsai
Overtone-singing voices differ from normal voices in having a sharp formant Fk (k denotes Khöömei), which elicits the melody pitch fk = nf0, the nth harmonic of the fundamental f0. For normal voices, the bandwidths of the formants are always so large that the formants contribute only to the perception of timbre. For overtone-singing voices, the sharp formant Fk can also contribute to the perception of pitch.
A pitch model based on autocorrelation analysis predicts that the strength of fk increases as the bandwidth of Fk decreases. Fig. 1 compares the spectra and autocorrelation functions of three synthesized single-formant vowels with the same fundamental frequency f0 = 150 Hz and the same formant frequency 9f0. In the autocorrelation functions, the height of the peak at lag 1/(9f0), which represents the pitch strength of 9f0, increases as the formant bandwidth decreases. Fig. 1 suggests that the pitch of fk becomes audible once the strongest harmonic exceeds the adjacent harmonics by 10 dB.
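The trend described above can be reproduced with a minimal sketch. It is not the author's pitch model: it assumes a Lorentzian resonance shape, 30 harmonics, and three illustrative bandwidths (50, 150 and 450 Hz), none of which come from the text. Only f0 = 150 Hz and the formant at 9f0 are taken from the source. The normalized autocorrelation of a harmonic complex is computed directly from the harmonic powers as a cosine-weighted sum (the Wiener-Khinchin relation), so no waveform needs to be synthesized.

```python
import math

F0 = 150.0             # fundamental frequency in Hz (from the text)
FORMANT_HARMONIC = 9   # formant centred on 9 * f0 (from the text)
N_HARMONICS = 30       # assumption: number of harmonics in the vowel


def harmonic_powers(bandwidth_hz):
    """Power of harmonics 1..N under an assumed Lorentzian resonance.

    bandwidth_hz is the -3 dB bandwidth of the single formant.
    """
    half_bw = bandwidth_hz / 2.0
    return [
        1.0 / (1.0 + (((n - FORMANT_HARMONIC) * F0) / half_bw) ** 2)
        for n in range(1, N_HARMONICS + 1)
    ]


def normalized_autocorrelation(powers, lag_s):
    """Autocorrelation of a harmonic complex at lag_s, normalized to 1 at lag 0."""
    total = sum(powers)
    weighted = sum(
        p * math.cos(2.0 * math.pi * n * F0 * lag_s)
        for n, p in enumerate(powers, start=1)
    )
    return weighted / total


def pitch_strength(bandwidth_hz):
    """Height of the autocorrelation peak at lag 1/(9 * f0)."""
    lag = 1.0 / (FORMANT_HARMONIC * F0)
    return normalized_autocorrelation(harmonic_powers(bandwidth_hz), lag)


def peak_to_neighbour_db(bandwidth_hz):
    """Level of the strongest harmonic above its stronger neighbour, in dB."""
    p = harmonic_powers(bandwidth_hz)
    k = FORMANT_HARMONIC - 1  # zero-based index of the 9th harmonic
    return 10.0 * math.log10(p[k] / max(p[k - 1], p[k + 1]))


# Illustrative bandwidths (assumed values, not those used for Fig. 1).
for bw in (50.0, 150.0, 450.0):
    print(f"bandwidth {bw:5.0f} Hz: "
          f"peak height {pitch_strength(bw):.2f}, "
          f"strongest harmonic +{peak_to_neighbour_db(bw):.1f} dB")
```

Under these assumptions, the peak at lag 1/(9f0) grows as the bandwidth shrinks, and of the three cases only the narrowest one places the strongest harmonic more than 10 dB above its neighbours.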
Figure 1: Spectra (left) and autocorrelation functions (right) of three single-formant vowels.
Stream segregation
Besides the bandwidth of Fk, the musical context also plays a role in the perception of fk. During a performance of overtone-singing, the low pitch of f0 is always held constant. When fk moves up and down over this fixed f0, the pitch sensation of f0 may be suppressed by the preceding, unchanged f0, and listeners become indifferent to it. Conversely, if f0 and fk change simultaneously, listeners tend to hear the pitch contour of f0, while the stream of fk may be more difficult to trace.
The multi-pitch effect in overtone-singing highlights a limitation of auditory scene analysis, according to which components radiated by the same object should be grouped and perceived as a single entity. In the quasi-periodic voices of overtone-singing, stream segregation occurs through the pitch-based segregation/grouping mechanism, even though all components come from one source. This may explain why overtone-singing always sounds extraordinary when we first hear it.
Perception of rapid fluctuations
Tuvans employ a range of vocalizations to imitate natural sounds. Such singing voices (e.g., Ezengileer and Borbannadir) are characterized by rapid spectral fluctuations, evoking the sensation of rhythm, timbre vibrato or trill.