Original Research and Acoustical Analysis in connection
with the Xöömij Style of Biphonic Singing, by Tran Quang Hai and Denis Guillou

Tran Quang Hai, Centre National de la Recherche Scientifique, Paris, 1980. Denis Guillou, Conservatoire National des Arts et Métiers, Paris.
The present article is limited in its scope to our own original research and to acoustical analysis of biphonic singing. This is preceded by a summary of the various terms proposed by different researchers. The first half of the article, concerning xöömij technique, was written by Tran Quang Hai. Denis Guillou has written the second half, concerning acoustical analysis.

Until the present time it has not been possible to confirm that the centre of biphonic singing within Turco-Mongol culture is in fact Mongolia. Biphonic singing is also employed by neighbouring peoples such as the Tuvins (Touvins), Oirats, Khakass, Gorno-Altais and Baschkirs; it is called kai by the Altais and uzliau by the Baschkirs, and the Tuvins possess four different styles called sygyt, borbannadyr, ezengileer and kargyraa. A considerable amount of research is at present being carried out throughout the world into this vocal phenomenon, particularly as it is practised in Mongolia.

Research can be carried out in various ways: by means of observation of native performers after one or more visits to the country concerned, or by means of practical instrumental or vocal studies aimed at a better understanding of the musical structure employed by the population being studied. My own research does not belong to either of these two categories, since I have never been to Mongolia and I have never learned the xöömij style of biphonic singing from a Mongolian teacher. What I shall describe in this article is the result of my own experience, which will enable anybody to produce two simultaneous sounds similar to Mongolian biphonic singing.


Simultaneous two-part singing by a single person is known in the Mongol language as xöömij (literally “pharynx”). The manner in which the Mongol word is transcribed is by no means uniform: homi, ho-mi (Vargyas 1968), khomi, khöömii (Bosson 1964: 11), xomej, chöömej (Aksenov 1964), chöömij (Vietze 1969: 15-16; Walcott 1974), xöömij (Hamayon 1973). French researchers have used other terms to describe this particular vocal technique, such as chant biphonique or diphonique (Leipp 1971, Tran Quang Hai 1974), voix guimbarde, voix dédoublée (Helffer 1973, Hamayon 1973), and chant diphonique solo (Marcel-Dubois 1979). Several terms exist in English, such as split-tone singing, throat singing and overtone singing, and in German, zweistimmiger Sologesang.

For convenience I have employed in this article the term biphonic singing to describe a style of singing realized by a single person producing simultaneously a continuous drone and another sound at a higher pitch, issuing from a series of partials or harmonics and resembling the sound of the flute.

Origin of My Research.

In 1971 I had my first contact with Mongolian music, in the form of recordings made in Mongolia between 1967 and 1970 by Mrs. Roberte Hamayon, researcher at the Centre National de la Recherche Scientifique. After listening in particular to a tape on which were recorded three pieces in the biphonic singing style, I was struck by the extraordinary and unique nature of this vocal technique.

For several months I carried out bibliographical research into articles concerned with this style of singing with the aim of obtaining information on the practice of biphonic singing, but received little satisfaction. Explanations of a merely theoretical and sometimes ambiguous nature did nothing so much as to create and increase the confusion with which my research was surrounded. In spite of my complete ignorance of the training methods for biphonic singing practised by the Mongols, the Tuvins and other peoples, I was not in the least discouraged by the negative results at the beginning of my studies after even several months of effort.

Working Conditions.

The xöömij refers to the simultaneous production of two sounds, one similar to the fundamental produced on the Jew’s harp (produced at the back of the throat), and the other resulting from a modification of the buccal cavity without moving the lips which remain only slightly open; positioning the lips as for a rear vowel results in a low sound, whereas front vowel positioning produces a high sound (Hamayon 1973), a technique similar to that used by the Tuvins (Aksenov 1964). The cheeks are tightened to such a degree that the singer breaks out into a sweat. It is the position of the tongue which determines the melody. Anybody who possesses this technique is able to copy any tune (Hamayon 1973).

I worked entirely alone, groping my way through the dark for two years and listening frequently to the recordings made by Hamayon, which are stored in the sound archives of the ethnomusicology department of the Musée de l’Homme. My efforts were, however, to no avail. Despite my efforts and knowledge of Jew’s harp technique, the initial work was both difficult and discouraging. I also tried to whistle while producing a low sound as a drone. However, checking on a sonagraph showed that this was not similar to the xöömij technique. At the end of 1972 I got to the stage where I was able to produce a very weak harmonic tone which, when recorded on tape, showed that I was still a long way from my goal.

Then, one day in November 1973, in order to calm my nerves in the appalling traffic congestion of Paris, I happened to make my vocal cords vibrate in the pharynx with my mouth half open while reciting the alphabet. When I arrived at the letter L and the tip of my tongue was about to touch the top of my mouth, I suddenly heard a pure harmonic tone, clear and powerful. I repeated the operation several times and each time I obtained the same result. I then tried to modify the position of the tongue in relation to the roof of the mouth while maintaining the low fundamental. A series of partials resonated in disorder inside my ears. At the beginning I obtained the harmonics of a perfect chord. Slowly but surely, after a week of intensive work, by changing the fundamental tone upwards or downwards, I had managed to discover all by myself a vocal Jew’s harp technique or biphonic singing style which appeared to be similar to that used by the Mongols and the Tuvins.

Basic Techniques.

After two months of research and numerous experiments of all kinds I was able to establish some of the basic rules for the realization of what I call biphonic singing.

1) Half open the mouth.
2) Emit a natural sound on the letter A without forcing the voice and remaining in the middle part of the vocal range (between F and A below middle C for men, and between F and A above middle C for women).
3) Intensify the vocal production while vibrating the vocal cords.
4) Force out the breath and hold it for as long as possible.
5) Produce the letter L. Maintain the position with the tip of the tongue touching the roof of the mouth.
6) Intensify the tonal volume while trying to keep the tongue stuck firmly against the palate in order to divide the mouth into two cavities, one at the back and one at the front, so that the air column increases in volume through the mouth and the nose.
7) Slowly pronounce the sounds represented by the phonetic signs “i” and “u” while varying the position of the lips.
8) Modify the buccal cavity by changing the position of the tongue inside the mouth without interrupting or changing the height of the fundamental already amplified by the vibration of the vocal cords.
9) In this way it is possible to obtain both the drone and the partials or harmonics, either in ascending or descending order according to the desire of the singer.

For beginners the harmonics of the perfect chord (C, E, G, C) are easy to obtain. However, a considerable amount of hard work is necessary, especially to obtain a pentatonic anhemitonic scale. Every person has his favourite note which permits him to produce a large range of partials. This favourite fundamental tone varies according to the tonal quality of the singer’s voice and his windpipe. It often happens that two people using the same fundamental tone do not necessarily obtain the same series of partials.
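The relationship between a drone and the notes available above it can be sketched numerically. The following is a minimal illustration, assuming a drone on C2, equal temperament and A4 = 440 Hz (none of which is specified in the text); the function names and the choice of harmonic numbers are ours. It maps selected harmonics of the drone to the nearest note name.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
A4_HZ = 440.0  # assumed tuning reference, not given in the article


def nearest_note(freq_hz):
    """Return (note name, deviation in cents) of the nearest equal-tempered note."""
    midi_exact = 69 + 12 * math.log2(freq_hz / A4_HZ)
    midi = round(midi_exact)
    cents = 100 * (midi_exact - midi)
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1), cents


def harmonic_note_names(drone_hz, harmonic_numbers):
    """Nearest note name for each selected harmonic of the drone."""
    return [nearest_note(n * drone_hz)[0] for n in harmonic_numbers]


# Drone assumed on C2 (MIDI note 36), purely for illustration.
C2_HZ = A4_HZ * 2 ** ((36 - 69) / 12)

# Harmonics 4, 5, 6 and 8 fall on the notes of the perfect chord.
perfect_chord = harmonic_note_names(C2_HZ, [4, 5, 6, 8])

# Harmonics 6, 8, 9, 10 and 12 give notes without semitone steps.
pentatonic = harmonic_note_names(C2_HZ, [6, 8, 9, 10, 12])

if __name__ == "__main__":
    print(perfect_chord)
    print(pentatonic)
```

The harmonics are not exactly in tune with equal temperament (the fifth harmonic, for example, lies a little below the tempered E), so the note names are approximations only.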

Regular practice and the application of the basic techniques which I have just described above permitted me to acquire a range of between an eleventh and a thirteenth according to the choice of the drone. Biphonic singing can also be practised by women and children, and several successful experiments have been carried out in this connection.

Other experiments which I have been carrying out recently indicate that it is possible to obtain two simultaneous sounds in two other ways. In the first method, the tongue may be either flat or slightly curved without at any stage actually touching the roof of the mouth, and only the mouth and the lips move. Through such variation of the buccal cavity, which this time forms a single cavity, it is possible to hear the partials faintly.

In the second method the basic technique described above is used. However instead of keeping the mouth half open it is kept almost completely shut with the lips pulled back and very tight. To make the partials audible, the position of the lips is varied at the same time as that of the tongue. The partials are very clear and distinctive, but the technique is rather exhausting and it is not possible to sing for a long time using it.

To the northwest of Mongolia, in the borderland area between Mongolia and Siberia, live the Tuvins, a people of Turkic origin numbering one hundred thousand. The Tuvins possess not only the biphonic singing style used by the Mongols, but four other different styles within this genre, called sygyt, ezengileer, kargyraa and borbannadyr. Table 1 will facilitate comparison between these four styles.

Biphonic singing is also practised by a number of ethnic groups in the republics of the Soviet Union bordering on Mongolia.

The late John Levy made a recording in Rajasthan in 1967 on which can be heard an example of biphonic singing similar to that practised by the Mongols and the Tuvins (1). The virtuoso performer in the recording imitates the double flute called the satara (an instrument producing simultaneously a drone and a melody) or the Jew’s harp with his voice. However, this may well be an exceptional example in that no mention is ever made of biphonic singing techniques in the musical traditions of Rajasthan or elsewhere in India.

Tibetan monks, particularly those in the monasteries of Gyume and Gyuto (2), make use of a technique using two simultaneous voices, although this technique is far less developed than that used by the Mongols and the Tuvins. The low register of the drone makes it impossible to produce harmonics as clear and resonant as those emitted by the Mongols and the Tuvins, and furthermore the production of harmonics is not the aim of Tibetan Buddhist chant.

In Western contemporary music, groups of singers have also succeeded in emitting two voices at the same time, and vocal pieces have been created in the context of avant-garde music (3) and, in recent years, of electronic music (4).

An X-ray film was made for the first time in 1974 at the Centre Médico-chirurgical of the Porte de Choisy in Paris at the request of Professor S. Borel-Maisonny, speech therapist, and of Professor Emile Leipp, acoustician. This film, which was made with the cooperation of the present author, made it possible to examine closely the internal functioning and placement of the tongue during biphonic singing, and was thus of great interest. Thanks to this film the author has improved his biphonic singing technique, as a result of which he has been able to decrease the volume of the drone and increase that of the harmonics.

Acoustical Analysis: Introduction.

The present study is concerned with biphonic singing, its understanding and interpretation, and does not constitute a complete and definitive piece of research. In fact the discovery of certain phenomena permits us only to imagine what might be the reality, this being particularly true in relation to the mechanism involved in the production of biphonic singing. Thus it will be necessary to carry out further research in the following areas: psycho-acoustics, particularly the perception of pitch, and phonatory acoustics.

Biphonic singing differs from so-called natural singing on account of its sonority as well as of course the vocal technique involved. As its name indicates it consists of two sounds. On the basis of simple aural observation, it is possible to distinguish a first sound whose pitch is constant and which we shall call the drone and a second sound which takes the form of a melody which the singer can produce at will. It is basically possible for anybody to produce this biphonic sonority but to make the second voice dominate and to trace a melody with it depends upon the talent of the artist.

Firstly, we shall examine the concept of pitch perception in terms of acoustics and psycho-acoustics. Secondly we shall try to define biphonic singing, to differentiate it from other vocal techniques and to specify its scope. It will then be worthwhile to formulate several hypotheses concerning the mechanism whereby this style of singing is produced and finally to present a few examples of such a technique.

Pitch Perception.

It is first of all necessary to comprehend exactly what is meant by the pitch of sounds or tonality. This concept presents a considerable amount of ambiguity and does not correspond to the simple principle of the measurement of the frequencies produced. The pitch of sounds is related more to psycho-acoustics than to physics.
Our own proposals are based partially on the recent discoveries of certain researchers, and partially on observations which we have made ourselves with the help of a sonagraph machine.
Fig. 1 Sonagram representation of three types of sound
a) Harmonic spectrum: the harmonics are whole multiples of the fundamental.
b) Partials spectrum: the components are no longer whole multiples of the fundamental.
c) Formant spectrum: two harmonics are intense and constitute a formant in the harmonic spectrum.
The sonagraph makes it possible for us to obtain the image of the sound which we wish to study. A single sheet of paper gives information concerning time and frequency and, through the thickness of the line traced, information concerning intensity.

The classical manuals on acoustics tell us that the pitch of harmonic sounds, that is, sounds with, for example, a fundamental of frequency F and a series of harmonics F1, F2, F3, … that are multiples of F, is determined by the frequency of the fundamental F. This is not entirely correct, in that it is possible to suppress this fundamental electronically without thereby changing the subjective pitch of the sound actually perceived. If this theory were correct, an electro-acoustic chain that did not reproduce the lowest sound would change the pitch of the sounds. This is evidently not the case, since the tonal quality changes but not the pitch. Certain researchers have proposed a theory which would appear to be more coherent: the pitch of sounds is determined by the separation of the harmonic lines, that is, the difference in frequency between two adjacent harmonic lines. What, in this case, is the pitch of sounds whose spectra consist of “partials” (components that are not whole multiples of the fundamental)? Here the individual perceives an average of the separation of the lines in the zone which interests him. This in fact corresponds with the differences in perception which may be observed from one individual to another (Fig. 1).
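The line-separation idea can be illustrated with a few lines of arithmetic. This is only a sketch of the rule as stated above, not a model of hearing; the frequencies and the function name are hypothetical values chosen for the example.

```python
def line_spacing_hz(line_frequencies_hz):
    """Average separation between adjacent spectral lines, in Hz."""
    lines = sorted(line_frequencies_hz)
    gaps = [upper - lower for lower, upper in zip(lines, lines[1:])]
    return sum(gaps) / len(gaps)


F0_HZ = 110.0  # hypothetical fundamental

# Complete harmonic spectrum: lines at 1x, 2x, ... 8x the fundamental.
full_spectrum = [n * F0_HZ for n in range(1, 9)]

# Same spectrum with the fundamental suppressed (as by a chain that
# does not reproduce the lowest sound).
without_fundamental = full_spectrum[1:]

# Hypothetical spectrum of partials that are not whole multiples.
partials = [220.0, 335.0, 445.0, 560.0]

if __name__ == "__main__":
    print(line_spacing_hz(full_spectrum))
    print(line_spacing_hz(without_fundamental))
    print(line_spacing_hz(partials))
```

Removing the lowest line leaves the spacing, and so the pitch predicted by this rule, unchanged, while the inharmonic spectrum yields only an average spacing.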
Formant spectrum: the accentuation in intensity of a group of harmonics constitutes a formant and is thus a zone of frequencies in which there is a large amount of energy. Taking this formant into consideration, a second concept of the perception of pitch comes to light. It has in effect been established that the position of the formant in the sonic spectrum results in the perception of a new pitch. In this case it is no longer a matter of the separation of the harmonic lines in the formant zone but of the position of the formant in the spectrum. This theory should be qualified however, since conditions also have to be considered.

Experiment: Tran Quang Hai sang two C’s an octave apart, making his voice carry as if he were addressing a large audience. We observed, using a sonagram, that the maximum energy was situated in the zone to which the human ear is most sensitive (3 to 4 kHz) and that the formant was situated between 2 and 4 kHz. We then recorded two C’s an octave apart in the same tonality, but this time he used his voice as if addressing a small audience, and we observed the disappearance of this formant (Fig. 2-a, 2-b).
In this case the disappearance of the formant does not change the pitch of the sounds. We then rapidly observed that the perception of pitch through the position of the formant was only possible if the formant was very sharp, that is, if the sonic energy was concentrated on only two or three harmonics. Thus if the energy density of the formant is large and the formant is narrow, the formant gives information concerning the pitch as well as the overall tonality of the sonic item. Through this expedient we arrive at the biphonic vocal technique.

Figure 2
Comparison between Biphonic Technique and Classical Technique.

It may be said that biphonic singing consists as its name indicates, of the production of two sounds, one a drone which is low and constant, and the other at a higher pitch consisting of a formant which displaces itself in the spectrum in order to produce a certain melody. The concept of pitch given by the second voice is moreover somewhat ambiguous. The Western ear may need a certain amount of training before becoming accustomed to the sound quality.

Evidence concerning the drone is relatively easy to obtain thanks to the sonagram: it can be seen clearly and is also very clear on an auditory level. The device in Fig. 3 also makes it possible to see a pure amplitude frequency of a constant nature.
Figure 3
Fig. 4 Normal singing and biphonic singing
a) Sonagraph representation of normal singing. An octave passage is equivalent to a doubling of the gap between the harmonic lines and to a drone of double frequency. (The first bar represents the base line of the sonagram, and the drone is represented by the second bar.)
b) Sonagraph representation of biphonic singing. An octave passage is represented by a displacement of the formant. The harmonic lines of the formant are displaced in a zone in which the frequency is doubled.
After having examined the fundamental tone we compared two spectra, one of biphonic singing and the other of the so-called classical singing style, the two being produced by the same singer. The sonagrams of these two types of singing are shown in Fig. 4. Classical singing is characterized by a doubling of the separation of the harmonic lines when an octave is exceeded (a). Biphonic singing is characterized on the other hand by the fact that the separation of the lines remains constant (this was foreseeable since the drone is constant), and that the formant is displaced by an octave (b). In fact it is easy to measure the distance between the lines for each sound. In this case, the perception of the melody in biphonic singing works through the expedient of the displacement of the formant in the sonic spectrum.
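The contrast between the two sonagrams can be restated numerically. In this minimal sketch the drone of 130 Hz and the formant positions are hypothetical figures, not measurements from Fig. 4, and the helper names are ours.

```python
def harmonic_lines_hz(fundamental_hz, count=12):
    """Frequencies of the first `count` harmonic lines."""
    return [n * fundamental_hz for n in range(1, count + 1)]


def selected_harmonic(drone_hz, formant_centre_hz):
    """Number of the harmonic lying closest to the formant centre."""
    return max(1, round(formant_centre_hz / drone_hz))


# Classical singing: rising an octave doubles the fundamental,
# so the gap between adjacent lines doubles as well.
classical_low = harmonic_lines_hz(130.0)
classical_high = harmonic_lines_hz(260.0)
gap_low = classical_low[1] - classical_low[0]
gap_high = classical_high[1] - classical_high[0]

# Biphonic singing: the drone (and the line spacing) stays fixed,
# while the formant moves up an octave and picks out another harmonic.
DRONE_HZ = 130.0
melody_low = selected_harmonic(DRONE_HZ, 520.0)
melody_high = selected_harmonic(DRONE_HZ, 1040.0)

if __name__ == "__main__":
    print(gap_low, gap_high)
    print(melody_low, melody_high)
```

With these figures the octave in the second voice corresponds to a move from the fourth to the eighth harmonic of an unchanged drone.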

It should be stressed that this is only really possible if the formant is sharp, and this is obviously so in the case of biphonic singing. The sonic energy is divided principally between the drone and the second voice, which consists of two or at the most three harmonics.

It has sometimes been stated that it is possible to produce a third voice. Using the sonagram we have in actual fact established that this third voice exists (see sonagrams of Tuvin techniques), but it is impossible to state that it can be controlled. In our opinion this additional voice results more from the personality of the performer than from any particular technique.

As a result of our work we have been able to establish a parallel between biphonic singing and the technique of the Jew’s harp. As in the case of biphonic singing the Jew’s harp produces several different voices, the drone, the main melody and a counter melody. We may consider this third voice as a counter melody which may be produced on a conscious level but can presumably not be controlled. As far as possibility of variation is concerned, biphonic singing is the same as normal singing except in connection with pitch range.
The time of execution is evidently a function of the thoracic cage of the singer and thus of breathing, since the intensity is related to the output of air. Possibility of variation with regard to intensity is on the other hand relatively restricted and the level of the harmonics is connected to the level of the drone. The singer has to try and retain a suitable drone and produce the harmonics as strongly as possible. We have already observed that the clearer the harmonics the more the formant is narrow and intense. We are able furthermore to observe connections between intensity, time and clarity. Possibility of variation in relation to tone quality may pass without comment, since the resulting sound is in the majority of cases formed from a drone and one or two harmonics. The most interesting question is that of pitch range.
It is generally accepted that, for a sensible tonality (in consideration of the performer and of the piece to be performed), a singer may modulate or choose between harmonics 5 and 13. This is true but should be stated more precisely. The range is a function of the tonality. If the tonality is on C2, the range represents nine harmonics, from the fifth to the thirteenth, this involving a range of a major thirteenth. If the tonality is raised, for example to C3, the choice is made between six harmonics, numbers 3 to 8 (see Table 2), representing an interval of a seventh. The following remarks should be made in this context. Firstly, the pitch range of biphonic singing is more restricted than that of normal singing. Secondly, the singer theoretically selects the tonality which he wishes between C2 and C3. In practice however, he instinctively produces a compromise between the clarity of the second voice and the pitch range of his singing, since the choice of the tonality is also a function of the musical piece to be performed. Thus if the tonality is raised, for example to C3, the choice of harmonics is restricted but the second voice is very clear. In the case of a tonality on C2 the second voice is more indistinct while the pitch range is at a maximum.
The clarity of the sounds can be explained by the fact that in the first case, the singer is only able to select a single harmonic, whereas in the second case, he may select almost two (see Fig. 5). As far as pitch range is concerned, it is known that the movement of the buccal resonators is independent of the tonality of the sounds produced by the vocal cords or, put another way, the singer always selects harmonics in the same zone of the spectrum, whether the harmonic lines are widely or narrowly spaced.
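The two cases can be compared in a short calculation. This is a sketch only: the drone frequencies assume equal temperament with A4 = 440 Hz, and the formant bandwidth of 100 Hz is a hypothetical figure introduced for illustration, not a value from the study.

```python
def harmonic_zone_hz(drone_hz, first_harmonic, last_harmonic):
    """Lowest and highest frequency reached by the selectable harmonics."""
    return first_harmonic * drone_hz, last_harmonic * drone_hz


def harmonics_per_band(drone_hz, band_hz):
    """How many harmonic lines fit, on average, inside a band of given width."""
    return band_hz / drone_hz


C2_HZ = 65.41    # drone on C2 (assumed tuning)
C3_HZ = 130.81   # drone on C3 (assumed tuning)
FORMANT_BAND_HZ = 100.0  # hypothetical formant bandwidth

zone_c2 = harmonic_zone_hz(C2_HZ, 5, 13)  # harmonics 5 to 13
zone_c3 = harmonic_zone_hz(C3_HZ, 3, 8)   # harmonics 3 to 8

lines_in_formant_c2 = harmonics_per_band(C2_HZ, FORMANT_BAND_HZ)
lines_in_formant_c3 = harmonics_per_band(C3_HZ, FORMANT_BAND_HZ)

if __name__ == "__main__":
    print(zone_c2, zone_c3)
    print(lines_in_formant_c2, lines_in_formant_c3)
```

Under these assumptions both sets of harmonics occupy largely the same region of the spectrum, from a few hundred hertz to about one kilohertz, and a formant of fixed width covers fewer than one line on the C3 drone but between one and two lines on the C2 drone.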

It results from all this that the singer chooses the tonality instinctively in order to have the maximum range and clarity. For Tran Quang Hai, the best compromise exists between C2 and A2. He can thus obtain a range of between an octave and a thirteenth.
Mechanism for the Production of Biphonic Singing.

It is always very difficult to know what is taking place inside a machine when we are placed outside it and can only watch it in operation. This is the case with the phonatory mechanism. The following remarks are only approximate and of a schematic nature and should not be assumed to be the final word on the subject. In dealing by analogy with the phonatory system we can get an idea of the mechanisms but surely not a complete explanation. Fig. 6 is a representation of the phonatory system which can be compared with Fig. 7, showing an excitation system producing harmonic sounds and a series of resonating systems amplifying certain parts of this spectrum.

A resonator is a cavity equipped with a neck, capable of resonating in a certain range of frequencies. The excitation system, i.e., the pharynx and the vocal cords, emits a harmonic spectrum consisting of the frequencies F1, F2, F3, F4, …, which is passed to a set of resonators which select certain frequencies and amplify them. The choice of these frequencies evidently depends upon the ability of the singer. This is the case when a singer projects his voice within a large hall, in that he instinctively adapts his resonators in order to produce the maximum energy within the area in which the ear is sensitive.

It should be noted that the amplified frequencies are a function of the volume of the cavity, the section of the opening and the length of the neck constituting the opening:
Through this principle it is possible to see already the action of the size of the buccal cavity, of the opening of the mouth, and of the position of the lips during singing.
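The relation itself does not appear in this copy of the text. As a minimal sketch, we assume that the standard formula for a Helmholtz resonator, f = (c / 2π) · sqrt(S / (V · L)), is the kind of dependence meant, with V the volume of the cavity, S the section of the opening, L the length of the neck and c the speed of sound. The dimensions below are hypothetical, and the mouth is of course not an ideal resonator of this type.

```python
import math


def helmholtz_frequency_hz(volume_m3, neck_area_m2, neck_length_m,
                           sound_speed_m_s=343.0):
    """Resonance frequency of an ideal Helmholtz resonator (no end correction)."""
    return (sound_speed_m_s / (2 * math.pi)
            * math.sqrt(neck_area_m2 / (volume_m3 * neck_length_m)))


# Hypothetical cavity: 50 cm^3 volume, 1 cm^2 opening, 1 cm neck.
base_hz = helmholtz_frequency_hz(5e-5, 1e-4, 0.01)

# Opening twice as wide: the resonance rises.
wider_opening_hz = helmholtz_frequency_hz(5e-5, 2e-4, 0.01)

# Cavity twice as large: the resonance falls.
larger_cavity_hz = helmholtz_frequency_hz(1e-4, 1e-4, 0.01)

if __name__ == "__main__":
    print(base_hz, wider_opening_hz, larger_cavity_hz)
```

In this idealized model a wider opening raises the resonance and a larger cavity lowers it, which is the qualitative behaviour attributed to the buccal cavity, the mouth opening and the lips.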
However, this does not tell us anything about biphonic singing. In practice we need two voices. The first, the drone, is given to us simply by virtue of the fact that its production is intense, and that in any case, it does not undergo filtering by the resonators. Its intensity, higher than that of the harmonics, permits it to survive on account of buccal and nasal diffusion. We have observed that as the nasal cavity was closed, so the drone diminished in intensity. This occurs for two reasons: firstly, a source of diffusion through the nose is closed, and secondly, by closing the nose the flow of air is reduced, as is the sonic intensity produced at the level of the vocal cords.

The possession of several cavities is of prime importance. In practice, we have established that only coupling between several cavities has enabled us to have a sharp formant such as is required by biphonic singing.

For the purposes of this research we initially carried out investigations into the principle of resonators in order to determine the influence of the fundamental parameters. It was observed that the tonality of the sound rises if the mouth is opened wider. In order to investigate the formation of a sharp formant, we carried out the following experiment. Tran Quang Hai produced two kinds of biphonic singing, one with the tongue at rest, i.e., not dividing the mouth into two cavities, and the other with the mouth divided into two cavities. The observation which we made is as follows (an observation which could have been foreseen on the basis of the theory of coupled resonators). In the first case the sounds were not clear: the drone could be heard distinctly but the second voice was difficult to hear. There was no clear distinction between the two voices, and, furthermore, the melody was indistinct. The corresponding sonagrams bore this out: with a single buccal cavity the energy of the formant is dispersed over three or four harmonics and so the sense of a second voice is very much on the weak side. On the other hand, when the tongue divides the mouth into two cavities, the formant reappears in a sharp and intense manner. In other words, the harmonic sounds produced by the vocal cords are filtered and amplified in a rough manner with a single buccal cavity and the biphonic effect disappears. Biphonic singing thus necessitates a network of very selective resonators which filters only the harmonics required by the singer.
Fig. 8 shows the frequency responses of the resonators, both simple and coupled. In the case of a tight coupling between the two cavities, these produce a single and very sharp resonance. If the coupling is loose, the formant has less intensity and the sonic energy is spread out in the spectrum. If the cavities are transformed into a single cavity, the pointed curve becomes even rounder, and one ends up with the first example, a very blurred type of biphonic singing (tongue at rest). The conclusion can be drawn that the mouth, along with the position of the tongue, plays the major role, and it can be compared roughly to a sharply tuned filter which changes its place in the spectrum with the sole aim of selecting the harmonics of interest.
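The effect of a sharper or rounder resonance on the second voice can be sketched with a simple filter model. This is an illustration under stated assumptions and not the coupled-resonator theory itself: each situation is reduced to a single resonance curve whose sharpness is set by a quality factor Q, and the drone, the formant centre, the Q values and the threshold are all hypothetical.

```python
import math


def resonance_gain(freq_hz, centre_hz, q):
    """Relative amplitude response of a simple resonance near its centre."""
    detune = 2 * q * (freq_hz - centre_hz) / centre_hz
    return 1 / math.sqrt(1 + detune ** 2)


def strong_harmonics(drone_hz, centre_hz, q, count=20, threshold=0.5):
    """Harmonic numbers passed at no less than `threshold` of the peak gain."""
    return [n for n in range(1, count + 1)
            if resonance_gain(n * drone_hz, centre_hz, q) >= threshold]


DRONE_HZ = 100.0     # hypothetical drone
FORMANT_HZ = 1000.0  # hypothetical formant centre

# Rounded curve (stand-in for a single buccal cavity).
single_cavity = strong_harmonics(DRONE_HZ, FORMANT_HZ, q=5)

# Sharp curve (stand-in for two tightly coupled cavities).
coupled_cavities = strong_harmonics(DRONE_HZ, FORMANT_HZ, q=20)

if __name__ == "__main__":
    print(single_cavity)
    print(coupled_cavities)
```

With these figures the rounded curve lets through three neighbouring harmonics at comparable strength, whereas the sharp curve isolates a single one, which corresponds to the blurred and the clear second voice described above.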

We should like to express our gratitude and sincere thanks to Research Team 165 of the Centre National de la Recherche Scientifique, directed by Mr. Gilbert Rouget, who allowed us access to valuable documents concerning biphonic singing stored in the sound archives of his department. Our thanks go also to Professor Claudie Marcel-Dubois, Head of the Department of Ethnomusicology at the Musée National des Arts et Traditions Populaires, who gave us a great deal of help and encouragement. We should like also to thank Professor Emile Leipp, Dr. Michèle Castellengo and Professor Solange Borel-Maisonny, who made it possible for us to examine the internal functioning of biphonic singing by means of the production of a radiographic film.
 (Translated from French by Robin THOMPSON).

1. This tape is preserved in the Ethnomusicology Department of the Musée de l’Homme, Paris. Archive number BM 78 2, 1.
2. See the record “The Music of Tibet,” recorded by Peter Crossley-Holland, Anthology Records (30133) AST 4005, New York, 1970.
3. See the record “The Tail of the Tiger,” Ananda 2.
4. An example is the electronic music composition entitled “Ve nguon” (Return to the Source), composed by Nguyen Van Tuong, with Tran Quang Hai as soloist. The first performance was given in France in 1975. The third movement (25 minutes) uses biphonic singing.

Figure 9 Sonograms of Xöömij
a) Xöömij from record “Vocal Music of Mongolia” Side B track 1
b) Xöömij from record “Chants Mongols et Bouriates” Side B track 3
Figure 10 Sonograms of Tuvin Biphonic singing
a) Borbannadyr
b) Kargyraa