Professional Sound - Indepth

Creating Binaural Audio Content with an Object-Based Workflow

This column originally appeared in the Sound Advice section of the October 2020 issue of Professional Sound magazine.

By Hugo Larin

The concept of spatial or binaural audio can sound like something quite geeky and complicated, so let’s demystify the subject here and now. In traditional stereo mixing, audio engineers send audio signals to two channels, left and right, which we all know as a stereo master bus. This is called channel-based mixing. The problem is that stereo does not reflect the reality of how we hear.

Nowadays, we are becoming increasingly familiar with the concept of object-based mixing, which simply means that each source (object) is positioned in a virtual environment (a virtual room) and accompanied by its spatial parameters. Object-based audio (OBA) represents a breakthrough in live production, with next-generation codecs enabling the mixer to represent the soundfield (the scene) as an immersive image instead of just two channels.

The second piece of this puzzle is to take this immersive sound image (all the audio objects) and render it to the desired format for playback. A binaural rendering gives us what we need to deliver the audio over headphones, while a channel-based rendering gives us what we need to deliver the audio through loudspeakers.

Object-based mixing is far from new; it’s been used in movie productions for many years. The multi-channel audio experience you hear in a cinema is usually composed of multiple audio objects that have been positioned and moved within a virtual environment by a mixing engineer. Unlike binaural, this type of multi-channel rendering is designed for a multi-speaker system using various panning techniques. The speaker arrangements in a movie theatre, or in your home entertainment system, are just various channel-based diffusion system formats. Think Atmos, Auro 3D, DTS, IMAX, and all the other common surround sound formats.

Moving to object-based mixing for live sound engineers is quite simple in its essence. Individual audio tracks that were previously balanced (panned) between two stereo channels are now being declared as objects and defined by their position. This workflow makes these mixes completely agnostic of the rendering type or the format arrangement.
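To make the idea of a format-agnostic mix concrete, here is a minimal Python sketch in which each object carries only a position and a gain, and the speaker gains are worked out at render time for whichever layout is chosen. The names (AudioObject, LAYOUTS, render_gains), the speaker angles, and the simple cosine-weighted panning law are assumptions made for illustration; a production renderer would use a more refined panning technique.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """A source described by its position, not by a channel assignment."""
    name: str
    azimuth_deg: float  # assumed convention: 0 = front, positive = left
    gain: float = 1.0


# Hypothetical playback layouts: one azimuth (in degrees) per loudspeaker.
LAYOUTS = {
    "stereo": [30.0, -30.0],
    "quad": [45.0, -45.0, 135.0, -135.0],
}


def render_gains(obj, layout_name):
    """Return one gain per speaker for the given object and layout.

    Simplified law (an assumption for this sketch): each speaker is
    weighted by how closely it faces the object, then the set of gains
    is normalized so the total power equals obj.gain squared.
    """
    speakers = LAYOUTS[layout_name]
    weights = [
        (1.0 + math.cos(math.radians(obj.azimuth_deg - az))) / 2.0
        for az in speakers
    ]
    power = sum(w * w for w in weights)
    if power == 0.0:
        # Degenerate case: spread the object equally over all speakers.
        weights = [1.0] * len(speakers)
        power = float(len(speakers))
    scale = obj.gain / math.sqrt(power)
    return [w * scale for w in weights]


# The same two objects rendered to two different layouts.
vocal = AudioObject("vocal", azimuth_deg=0.0)
guitar = AudioObject("guitar", azimuth_deg=45.0)
for layout in LAYOUTS:
    for obj in (vocal, guitar):
        gains = render_gains(obj, layout)
        print(layout, obj.name, [round(g, 3) for g in gains])
```

The object definitions never change between layouts; only the rendering step does, which is the property that makes the mix portable.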

A mix is now based around a sound image that can be manipulated with a real-time renderer or exported as a multichannel audio file with a standardized metadata model such as the Audio Definition Model (ADM). From a portability and deliverability perspective, these exports are ideal, as they aren’t limited to a specific speaker arrangement or channel count, and can be rendered in the desired format. For engineers, moving to object-based mixing is truly a game changer, opening the door to any format or stream type for a mix.

Binaural audio differs from stereo in that it is a synthesis that virtualizes every object, and delivers the mix over headphones in two conventional audio channels.
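As a rough illustration of that synthesis, the Python sketch below filters each object with a left and right impulse response chosen for its direction, then sums everything into two channels. The impulse responses here are a toy stand-in built only from an arrival-time difference (Woodworth's approximation) and an assumed level difference; they are not measured filters, and the function names, head radius, and level law are all assumptions for the example.

```python
import math

SAMPLE_RATE = 48000
HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # metres per second
IR_LENGTH = 64           # samples per toy impulse response


def toy_hrir_pair(azimuth_deg):
    """Return (left, right) impulse responses for one direction.

    Toy model only: the ear nearer the source gets the sound at once
    and at full level, the far ear gets it later and quieter.
    Azimuth is clamped to +/-90 degrees; positive means left.
    """
    az = math.radians(max(-90.0, min(90.0, azimuth_deg)))
    # Woodworth approximation of the interaural time difference.
    itd = HEAD_RADIUS_M / SPEED_OF_SOUND * (abs(az) + math.sin(abs(az)))
    delay = round(itd * SAMPLE_RATE)
    far_gain = 1.0 - 0.5 * abs(math.sin(az))  # assumed level difference
    near = [0.0] * IR_LENGTH
    far = [0.0] * IR_LENGTH
    near[0] = 1.0
    far[delay] = far_gain
    return (near, far) if az >= 0 else (far, near)


def convolve(signal, ir):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, h in enumerate(ir):
            if h != 0.0:
                out[i + j] += s * h
    return out


def render_binaural(objects):
    """Mix (mono_samples, azimuth_deg) pairs down to (left, right)."""
    n = max(len(sig) for sig, _ in objects) + IR_LENGTH - 1
    left = [0.0] * n
    right = [0.0] * n
    for sig, az in objects:
        hl, hr = toy_hrir_pair(az)
        for k, v in enumerate(convolve(sig, hl)):
            left[k] += v
        for k, v in enumerate(convolve(sig, hr)):
            right[k] += v
    return left, right


# A single click placed hard left reaches the right ear later and quieter.
click = [1.0] + [0.0] * 9
left, right = render_binaural([(click, 90.0)])
print("left peak:", max(left), "right peak:", max(right))
```

Swapping toy_hrir_pair for a lookup into a measured filter set would leave the rest of the rendering loop unchanged.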

Let’s look at some of the challenges in delivering immersive binaural content.

Picture this: If we stick a microphone in each of your ears and record what you hear, then play it back for another individual, will it sound the same to them as it did to you? The answer is yes, to some extent. You see, for each of us, our body (ears, upper torso, etc.) plays an important role in how we perceive sound. So, while another listener will get some sense of localization, it’s like making them hear with your ears.

As sound strikes the listener, a number of factors influence how the sound is perceived, including the size and shape of the head and ears, ear canal, nasal cavities, and more. Together, these effects are what we refer to as a head-related transfer function (HRTF), which is applied as a filter. The most commonly used tool for measuring a generic HRTF is what we refer to as a generic dummy head. Humans can adapt to and compensate for a generic HRTF filter that isn’t the perfect signature for their own hearing.

When streaming and recording material, many will rely on generic HRTFs such as KEMAR or the Neumann KU 100, which is popular in 360/VR pipelines. Ideally, for a mixing engineer wanting to work in binaural audio, the ultimate way of getting the most truthful and reliable monitoring experience, with a far more natural sense of space and direction, is by having their own individual HRTF. Unfortunately, a personalized HRTF is prohibitively expensive for most engineers.

That said, services do exist for creating your own personal HRTF, your aural ID, such as Genelec Aural ID. Starting with a video of your head and shoulder region from your mobile phone camera, the aural ID process builds an accurately-scaled 3D model of your head and upper torso dimensions, and from this delivers your personal HRTF file. A simple import of your personal HRTF file in your binaural renderer or monitoring tool gives you a sonic reproduction adapted specifically to you.

Headphones break the link to the natural mechanisms we have acquired over our lifetime, making it harder to localize sounds, since sounds from headphones seem to reside ‘inside’ our heads rather than all around us. Your unique personal HRTF restores that link by capturing how your head, external ears, and upper body shape and colour the audio you hear.

Hugo Larin is the director of business development at FLUX:: Immersive, as well as the principal and head of business development at LS Media. He can be reached at hugolarin@lsmediapro.ca.
