The Art and Science of Music Acoustics – From the Humble Flute to the Mighty Didgeridoo (Featuring Tarzan…)

A photograph featuring an Aboriginal didgeridoo player sitting on the beach with his instrument.

Fundamentals of Music Acoustics

Any signal that may be represented as an amplitude varying over time has a corresponding frequency spectrum.  This applies to concepts (and natural phenomena) that most human beings encounter daily without giving them a second thought, such as visible light (and colour perception), radio/TV channels and wireless communications.  It applies even to the regular rotation of the Earth.  Even to the sound of music…

When these physical phenomena are represented in the shape of a frequency spectrum, certain physical descriptions of their internal processes become much simpler.  A lot of the time, the frequency spectrum clearly shows harmonics, visible as distinct spikes or lines at particular frequencies, that provide insight into the mechanisms that generate the entire signal.

Music acoustics, the science behind music, is in fact the branch of acoustics concerned with researching and describing the physics of music.  For example…  How can individual sounds mesh together and create music?  How do musical instruments work?  What is in a human voice, in terms of the physics of speech and singing?  How can a computer analyse a melody?  What is the clinical use of music in music therapy?


A diagram showing a harmonic series of waves, from frequency = a to frequency = 5a, from wavelength = x to wavelength = x/5.

What is that Sound?

Acoustics is the general area of science that deals with the study of mechanical waves in gases, liquids, and solids, including oscillation, vibration, sound, ultrasound and infrasound.

The perception of sound is a subjective experience.

Air pressure variations against the ear drum, and their subsequent physical and neurological processing by the ear and the brain, are interpreted differently by each individual.

Most sounds that people do recognise as ‘musical’ are dominated by periodic or regular vibrations rather than non-periodic ones.  In other words, musical sounds typically have a definite pitch.

Sounds with definite pitch have harmonic frequency spectra or close to harmonic spectra.  The transmission of these pressure variations through air is via a sound wave.

Take the simplest example: a sine wave, which is considered to be the most basic model of a sound waveform.  It causes the air pressure to increase and decrease regularly, and is heard as a very pure tone.


Perfect Pitch

Pure tones can be produced by tuning forks or whistling.  The rate at which the air pressure oscillates is the frequency f of the tone, measured in oscillations (or cycles) per second.  The unit is the hertz (Hz), equal to one cycle per second.  Frequency is the primary determinant of the perceived pitch.
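As a rough illustration of what "cycles per second" means, here is a minimal Python sketch that samples a pure tone.  The 440 Hz frequency, the 8000 Hz sampling rate and the function name `pure_tone` are assumptions chosen for the example, not values from the text above.

```python
import math

def pure_tone(frequency_hz, duration_s, sample_rate_hz=8000, amplitude=1.0):
    """Return samples of amplitude * sin(2*pi*f*t) taken at regular intervals."""
    n_samples = int(round(duration_s * sample_rate_hz))
    return [
        amplitude * math.sin(2.0 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]

# A 440 Hz tone (assumed example value) completes 440 cycles every second,
# so a single cycle lasts 1/440 of a second.
samples = pure_tone(440.0, duration_s=0.01)
period_s = 1.0 / 440.0
print(len(samples), "samples; one cycle lasts", round(period_s * 1000, 3), "ms")
```

Raising `frequency_hz` packs more cycles into the same duration, which the ear hears as a higher pitch.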

Pitch is a perceptual property that allows the ordering of sounds on a frequency-related scale.  Pitches are qualitatively compared as being “higher” or “lower” in the sense associated with musical melodies.  Normally, this requires sound whose frequency is clear and stable enough to distinguish it from noise.

Pitch is a major auditory attribute of musical tones, along with the duration, the loudness, and the timbre.  Pitch can be quantified as a frequency, but it is not a purely objective physical property.  Indeed, it is known as a subjective ‘psycho-acoustical’ attribute of sound.  The study of pitch and pitch perception has been a central problem in the domain of psycho-acoustics, instrumental in forming and testing theories of sound representation, processing, and perception in the auditory system.



It is worth noting that the frequency of musical instruments can change with altitude due to changes in air pressure.


Tarzan’s Sound Spectrum

The majority of sounds are a complex mix of vibrations.

Ever heard of Tarzan?  His jungle call had no clear pitch.  

A sound spectrum displays the different frequencies present in a sound.  Depending on where you are reading this, you will hear a variety of sounds in your surrounding environment.  Can you hear the wind outside, the rain or the rumble of traffic?  Or perhaps there is music in the background, in which case there is a mixture of high notes and low notes, and some sounds (drum beats and cymbal crashes) with no clear pitch.

The MGM studio that made the first Tarzan movies with Johnny Weissmuller famously claimed to have enhanced the yell in post-production.  Reportedly, they added and mixed a second track of Weissmuller’s voice, somewhat amplified; a hyena howl, played backwards; a note sung by a female soprano, with the speed varied to produce a fluttery sound; the growl of a dog; the raspy note of a violin’s G-string being bowed; and finally… the bleat of a camel!!



Sound spectra are produced using a microphone to measure the sound pressure over a certain time interval, an analogue-to-digital converter to convert it to a series of numbers (corresponding to the microphone voltage) as a function of time, and a computer to perform a discrete Fourier transform calculation upon these numbers.
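The last step of that chain can be sketched in a few lines of Python.  This is only an illustration: the "microphone voltage" is a made-up signal (a 50 Hz tone plus a weaker 150 Hz tone), the 1000 Hz sampling rate is an assumption, and the transform is a slow, naive discrete Fourier transform rather than the fast algorithms real analysers use.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = 0j
        for i, x in enumerate(samples):
            total += x * cmath.exp(-2j * math.pi * k * i / n)
        magnitudes.append(abs(total) / n)
    return magnitudes

sample_rate_hz = 1000   # assumed sampling rate of the converter
n_samples = 200         # 0.2 s of signal, so frequency bins are 5 Hz apart
# Stand-in for the digitised microphone voltage (hypothetical test signal).
signal = [
    math.sin(2 * math.pi * 50 * i / sample_rate_hz)
    + 0.5 * math.sin(2 * math.pi * 150 * i / sample_rate_hz)
    for i in range(n_samples)
]

spectrum = dft_magnitudes(signal)
bin_width_hz = sample_rate_hz / n_samples
# Pick the two strongest bins and convert their indices to frequencies.
peaks = sorted(range(len(spectrum)), key=lambda k: spectrum[k], reverse=True)[:2]
peak_frequencies_hz = sorted(k * bin_width_hz for k in peaks)
print(peak_frequencies_hz)
```

The two strongest bins fall at the two frequencies that were mixed into the test signal, which is exactly the kind of "spike" a sound spectrum shows.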

Broadly speaking, a sound spectrum is a graphic representation of a sound sample, in terms of the amount of vibration produced at each individual frequency.  Usually, the data is presented as a graph of power (or pressure) as a function of frequency.

The power is usually measured in decibels, while the frequency is measured in vibrations per second (or hertz, Hz), or in thousands of vibrations per second (or kilohertz, kHz).


Two graphs showing the graphical representation of Tarzan's sonogram, in terms of pressure, frequency and "loudness".
Tarzan’s Sonogram: A wave form representation of the sound, and a spectrogram of the frequencies in the yell

Check out Tarzan’s sonogram:

If the loudness increases, the size (or amplitude) of the spectral components gets larger.

If the pitch rises, the frequencies of all of the components increase.

If the sound changes, but loudness and pitch remain the same, then the timbre changes.  Timbre is the sum of all the qualities that differ between two sounds which have the same pitch and loudness.

Singing “ah” and “eeh” at the same pitch and loudness will result in a significant difference between the two spectra.

Quantifying Sound: How Many Decibels?

The decibel (dB) is used to measure sound level, but it is also widely used in electronics, signals and communication.  The decibel is a logarithmic unit used to describe a ratio.  This ratio may involve power, sound pressure, voltage, intensity or several other variables.

The phon and the sone are additional units related to loudness.

For example, let us assume that we have two loudspeakers, the first playing a sound with power P1, and another playing a louder version of the identical sound with power P2, with everything else being kept the same.

The difference in decibels between the two speakers is defined as

10 \log_{10} (\frac {P_2} {P_1}) dB,

where the logarithm is to the base 10.

If the second loudspeaker produces twice as much power as the first speaker, the difference in dB is then

10 \log_{10} (\frac {P_2} {P_1}) = 10 \log_{10} (2) ≈ 3 dB.

If the second speaker had 10 times the power of the first one, the difference in dB would be

10 \log_{10} (\frac {P_2} {P_1}) = 10 \log_{10} (10) = 10 dB.

If the second one had a million times the power of the first, the difference in dB would be

10 \log_{10} (\frac {P_2} {P_1}) = 10 \log_{10} (10^6) = 60 dB.
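The three loudspeaker examples can be checked with a short Python sketch.  The function name `level_difference_db` is just a label chosen for this illustration.

```python
import math

def level_difference_db(power_2, power_1):
    """Difference in decibels between two powers: 10 * log10(P2 / P1)."""
    return 10.0 * math.log10(power_2 / power_1)

# Power ratios from the loudspeaker examples: double, ten times, a million times.
for ratio in (2, 10, 10**6):
    print(f"P2/P1 = {ratio:>7}: {level_difference_db(ratio, 1):.1f} dB")
```

Because the scale is logarithmic, multiplying the power by a fixed factor always adds the same number of decibels, whatever the starting level.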


Pipes and Harmonics: Open and Closed Waves

Diagrams quantifying Open and Closed Waves. Source: UNSW

The flute is a nearly cylindrical instrument, open to the outside air at both ends.  The player leaves the embouchure hole open to the air, and blows across it.

The clarinet, on the other hand, is a roughly cylindrical instrument, open to the outside air at the bell, but closed by the mouthpiece, reed and the player’s mouth and lips at the other end.

The two instruments have roughly the same length.  (The bore of the clarinet is a little narrower than that of the flute, but this difference is irrelevant here.)

The left diagram shows the type of sound waves occurring in the case of a flute.  On top, the pressure curve (red) is only half a cycle of a sine wave.  Hence the longest sine wave fitting into the open pipe is twice as long as the pipe.  A flute is about L = 0.6 m long, so the longest wavelength it can produce is about 2 L = 1.2 m.  The longest wave also gives the lowest note, so let’s try calculating its frequency – that is, the speed of sound divided by the wavelength.

The Speed of Sound

Sound travels at about

c_s = 340 m s^-1.

The sound frequency is given by

f_s = \frac {c_s}{\lambda},

where λ is the wavelength.

This gives a frequency f = \frac {c_s}{2 L} ≈ 280 s^{-1} = 280 Hz.

Given the approximations, this is close to the frequency of middle C, the lowest note on a flute.  We can also fit in waves that are equal to the length of the flute (half the fundamental wavelength, therefore twice the frequency of the fundamental), 2/3 the length of the flute (one third the fundamental wavelength, so 3 times the frequency of the fundamental), 1/2 of the length of the flute (one quarter the wavelength, so 4 times the frequency of the fundamental).

This set of frequencies is the complete harmonic series.
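The flute arithmetic can be reproduced with a minimal Python sketch.  It assumes an ideal cylindrical pipe open at both ends, with the 0.6 m length and 340 m/s speed of sound used in the text; the function name is only illustrative.

```python
SPEED_OF_SOUND_M_S = 340.0  # value used in the text

def open_pipe_harmonics(length_m, count=4, speed_m_s=SPEED_OF_SOUND_M_S):
    """Frequencies n * c / (2 L) for n = 1..count of an ideal open-open pipe."""
    fundamental_hz = speed_m_s / (2.0 * length_m)
    return [n * fundamental_hz for n in range(1, count + 1)]

# Fundamental and the next three harmonics of a 0.6 m open pipe.
flute_harmonics_hz = open_pipe_harmonics(0.6)
print([round(f) for f in flute_harmonics_hz])
```

Every whole-number multiple of the fundamental appears in the list, which is what makes the series complete.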

In the top right diagram, the blue curve is only a quarter of a cycle of a sine wave.  Hence the longest sine wave that fits into the closed pipe is four times as long as the pipe.  Therefore, a clarinet can produce a wavelength that is about four times as long as the instrument, which is about 4 L = 2.4 m.

This gives a frequency of f = \frac {c_s}{\lambda} = \frac {c_s}{4 L} ≈ 140 s^{-1} = 140 Hz, one octave lower than the flute.  Now, the lowest note on a clarinet is either the D or the C# below middle C.  Again, given the approximations, this works out.  We can fit in a wave if the length of the pipe is 3/4 of the wavelength, i.e. if the wavelength is 1/3 that of the fundamental and the frequency is 3 times that of the fundamental.  But we cannot fit in a wave with 1/2 or 1/4 of the fundamental wavelength (twice or four times the frequency).  So the second register of the clarinet is a musical twelfth (an octave plus a fifth) above the first.
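The same kind of sketch works for the clarinet, again assuming an ideal cylindrical pipe (closed at one end this time), a length of 0.6 m and a speed of sound of 340 m/s.  The helper `interval_in_semitones` is an addition for this example: it converts a frequency ratio into equal-tempered semitones, where a twelfth is 19 semitones.

```python
import math

SPEED_OF_SOUND_M_S = 340.0  # value used in the text

def closed_pipe_resonances(length_m, count=3, speed_m_s=SPEED_OF_SOUND_M_S):
    """Frequencies (2k - 1) * c / (4 L), k = 1..count: odd multiples only."""
    fundamental_hz = speed_m_s / (4.0 * length_m)
    return [(2 * k - 1) * fundamental_hz for k in range(1, count + 1)]

def interval_in_semitones(ratio):
    """Size of a frequency ratio in equal-tempered semitones."""
    return 12.0 * math.log2(ratio)

clarinet_hz = closed_pipe_resonances(0.6)
# Jump from the first resonance to the second (a frequency ratio of 3).
register_jump = interval_in_semitones(clarinet_hz[1] / clarinet_hz[0])
print([round(f) for f in clarinet_hz], f"{register_jump:.2f} semitones")
```

The missing even multiples are why the jump between registers is a factor of three in frequency, rather than the factor of two (one octave) found on the flute.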


Try your Fingers at the Virtual Flute

A Leo Cullum cartoon depicting the Pied Piper of Hamelin luring the rats out of the town, thinking to himself: "Four years at Juilliard for this?" Source: New Yorker

According to the physics website of UNSW (University of New South Wales, Australia), there are approximately 40,000 different fingerings for the Boehm flute.  Results are based on theoretical calculations, using a theory that agrees well with experiment for the hundred or so acoustic impedance spectra measured at UNSW.

All flutes can vary.  Slide and cork positions vary.  Key adjustments and embouchures vary.  Temperature varies.  Already, we saw that the frequency of musical instruments varies with altitude due to simple changes in air pressure.

Theory only makes approximations.
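To get a feel for how much one of these factors matters, here is a rough Python sketch of the effect of temperature alone.  It relies on assumptions that are not in the text above: the common ideal-gas approximation for the speed of sound in dry air, and a pipe whose resonances simply scale with that speed.

```python
import math

def speed_of_sound_m_s(temperature_c):
    """Ideal-gas approximation for dry air: c = 331.3 * sqrt(1 + T / 273.15)."""
    return 331.3 * math.sqrt(1.0 + temperature_c / 273.15)

def pitch_shift_cents(temp_from_c, temp_to_c):
    """Shift of a fixed-length pipe's resonances, in cents (100 cents = 1 semitone)."""
    ratio = speed_of_sound_m_s(temp_to_c) / speed_of_sound_m_s(temp_from_c)
    return 1200.0 * math.log2(ratio)

# Example: the air in the instrument warms from 20 °C to 30 °C.
shift = pitch_shift_cents(20.0, 30.0)
print(f"20 °C -> 30 °C: about {shift:.0f} cents sharper")
```

A shift of this size is a noticeable fraction of a semitone, which is one reason players tune only after warming up the instrument.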

More details are available from UNSW’s extensive site on Flute Music Acoustics and Clarinet Music Acoustics… and much more.

Play The Virtual Flute

And there’s even more fun stuff here.  Prepare for the amazing…


Clarinet Robot Wins International Competition!

This clarinet-playing robot was built by a NICTA − UNSW team for the Artemis Orchestra Competition in 2008.  The contest rules require that embedded device robots, having a mass of less than 20 kg, play unmodified musical instruments.  The clarinet robot was the international competition winner.

However, the aim is not to replace human musicians.  So, why a clarinet robot?   The design and development of such a robot provides:

  • an interesting challenge to understand and implement a few of the complicated things that humans do while performing music,
  • an interesting way for the music acoustics lab of the NICTA-UNSW team to see how well we understand clarinet playing.

The clarinet robot has been the subject of experimentation to analyse how the pitch, loudness, timbre and transients produced by the clarinet depend on the fingering, mouth pressure, lip force and damping, bite position, reed hardness and mouth geometry.

Look at results and sound files.


As for the Didgeridoo…

The didgeridoo (or didjeridu), known as the yidaki (or yiraki) in the language of the Yolngu, one of the peoples of Northern Australia, where the instrument originated, is a deceptively simple sound instrument.  It is a wooden tube, about 1.2 m to 1.5 m in length, hollowed out by termites in the thin trunk of a eucalyptus tree, with a ring of beeswax around the mouthpiece for sealing and the player’s comfort.  Taken altogether, the didgeridoo seems a very unusual tribal music instrument.  Although it plays only one note, the didgeridoo is capable of a spectacular range of different sounds, and the rhythmic variation of these sounds is its chief musical interest.

The instrument is closed at one end by the player’s lips and face.  The difference between closed and open pipes has been explained above by comparing both using wave diagrams. However, we considered models of ideal cylindrical pipes.  Most didgeridoos are flared, and they may have complicated surface geometries.

What makes the sound of the didgeridoo so unique and varied is the powerful interaction between

– sound waves in the instrument,
– sound waves in the didgeridoo player’s vocal tract,
– motions of the player’s lips,
– air flow between the player’s lips.


A schematic diagram explaining didgeridoo playing from lungs to the player's lips and out of the instrument.


In brass instruments, the vocal tract has only a minor effect.

In the case of didgeridoo-playing, it is essential.


The didgeridoo player’s lips produce a sound wave that travels into the instrument, but it also travels back in the other direction, into the vocal tract.  In normal speech, the vocal tract is a resonator that assists the radiation of some frequency bands, but not others.  Its resonances are what permit the production of different speech sounds.  (See voice music acoustics for an introduction.)

For a didgeridoo player, the vocal tract is working backwards: it still has resonances, but the vibration is coming from the lips, rather than from the vocal folds.  Whether in speech, singing, or didgeridoo-playing, the frequencies at which the vocal tract resonates are determined by the shape of the tract, especially by the position and shape of the tongue.

The sonogram of a didgeridoo. Computer Animated Didge Sound Design (CADSD) Source:

And that gives a range of sounds like that…

A didgeridoo-playing emoticon.

