
Ambiophonics Theory

Understanding and Installing an Ambiophonic System
by Les Leventhal, University of Manitoba and Ralph Glasgal, Ambiophonics Institute



Recordings of music and film soundtracks contain cues used by the ear/brain to localize sound. Home or studio reproduction using conventional stereo, 5.1, 7.1, or 10.2 distorts these cues and creates false ones. The result is localization distortion, which degrades horizontal and depth imaging of direct and ambient sound, degrades clarity of instruments, colors the sound, and greatly reduces size and depth of the sonic stage. Localization distortion can be reduced to very low levels by a technology called Ambiophonics. Ambiophonics, at its simplest, consists of crosstalk-cancelled playback by two closely-spaced, front speakers. The result is that one can now hear at home what the recording microphones hear—and what the microphones hear is greatly improved horizontal and depth localization; solid, clear, three-dimensional imaging; less colored sound; improved clarity and tonality; improved transient response; and a sonic stage that is very deep and very wide—150-180 degrees—compared to the 60-degree wide stage of the stereo equilateral triangle. Ambiophonics does not artificially increase the width and depth of the stage. Instead, it reduces localization distortion to such low levels that one can hear the width and depth that was actually recorded on the disc. Details are discussed for setting up an Ambiophonic system with 2, 4, or 6 speakers.

First there was mono and then there was stereo.  Using just one speaker, monophonic reproduction sounds like all instruments are located at the speaker.  Using two widely spaced speakers, stereophonic reproduction sounds like different instruments have different locations, with the locations stretching from one speaker to the other.  Having different locations for different instruments is so highly valued that stereophonic reproduction (now almost 80 years old) and offshoots such as 5.1 and 7.1 have become standard in the home reproduction of music and movies.  But stereo and its offshoots do far from a perfect job of localizing sound, and their imperfections limit the quality, the believability, and the realism of the reproduction.  A new technology called Ambiophonics fixes most of the problems with stereophonic reproduction.  Ambiophonics is based on almost a century of psychoacoustics research on how the ear/brain localizes sound.  This historical research, combined with current research, tells us how conventional stereo destroys good sound localization and how to fix the problem.  Ambiophonics fixes most of stereo’s problems by using crosstalk reduction with two closely-spaced front speakers separated by only 20 to 30 degrees.  Yet the sonic stage created can be 150-180 degrees wide!  Just how this can happen is a wonderful tale of how laboratory findings can produce unexpected and beautiful results.


Part 1: Problems with Stereo Reproduction and How to Fix Them

Sound Localization

To understand how stereo distorts instrument localization and how Ambiophonics corrects it, we need to understand first how the human ear/brain localizes sound.  The ear/brain uses three primary cues to localize sound:  interaural loudness differences (ILD), interaural time differences (ITD), and changes to the higher frequencies of the sound by the pinna, the curly shell surrounding the ear canal (pinna localization cues).

Interaural Loudness Differences (ILD). With live music, if a violinist is playing a violin in front of you, the loudness at both ears is about equal.  If the violinist is standing on your right, the violin sound in your right ear will be louder than in your left.  Such loudness differences at the two ears from the same sound source are a cue for the sound’s location.  ILD cues work well only for signals with energy between 90 Hz and 1,000 Hz.

Interaural Time Differences (ITD). If the live violinist is not playing directly in front of you but to your right, the violin sound will reach your right ear a little earlier than your left ear.  The reason is that the violin is just a bit closer to your right ear than your left.  Such a time difference between sound arrivals at the two ears from the same source is a cue for the sound’s location.  Like ILD cues, ITD cues work really well only for signals with energy between 90 Hz and 1,000 Hz.

Pinna Localization Cues. The frequency response of a live violin consists of a complex pattern of frequency peaks and valleys.  Before the violin sound enters your ear canal, it bounces around the curls, cavities, and folds of your pinna (ear shell).  Frequency components over about 1,000 Hz interact with these structures and the pattern of peaks and valleys changes enormously.  Moreover, the sound from a live violin located at your left will bounce differently around the left-ear pinna than will the sound from the same live violin located directly in front of you.  (Actually, if the violin is located at just the correct spot on your left, it will have a direct shot at your ear canal and pinna shadows and resonances become less important.)  So the frequency response of the live violin measured at the entrance to your left ear canal changes with the violin’s location.  The brain uses these changes in response patterns as location-finding clues.  Very small horizontal changes in the violin’s location can produce changes so great in the pattern of peaks and valleys that one might view the pinna as an exquisitely sensitive direction finder that converts minute changes in the direction of incoming sound to overwhelming changes in the frequency response pattern.  Even a person with only one functioning ear has some ability to identify the location of most natural sounds.  The pinna locates both transient sounds, like clicks, and continuous sounds.

You have two pinnae.  For a given violin location, the pinnae create quite different response patterns.  The brain interprets each single ear pattern and possibly the difference between the patterns as a location cue.  When the violin is off center, the difference between the patterns will be very large indeed.  Move the violin a little to the left or right and the patterns and thus the differences between them can change greatly. This location detector is so sensitive that subjects can detect a change as small as one degree in the horizontal location of impulsive clicks or speech sibilants.

If a music reproduction system is to produce the correct pinna cues when playing a violin recording, then in theory the speaker reproducing the violin must be located where the violin is supposed to be.  If the live violin is on your left, the speaker reproducing the violin must be at that same angle on your left.  If the violin is supposed to be directly in front of you, as is the case with most soloists, the speaker reproducing the violin must be directly in front of you.  If the speaker reproducing the violin is not where the violin is meant to be, then the pinna cues produced by the speaker will be incorrect for the violin’s location.  As we shall see below, getting the pinna localization cues correct is a nasty problem for any music reproduction system.

How Stereo (and 5.1, 7.1, etc.) Messes Things Up

Conventional stereo (and its offshoots 5.1, 7.1, etc.) creates an illusion, akin to an optical illusion, that does a fair job of localizing sound.  But it does not do an excellent job.  Consider a typical home stereo system with the speakers and the listener forming an equilateral triangle—that is, the speakers are separated by a 60-degree angle as viewed by the listener.  This stereo system does several things that prevent lifelike sound localization. More complex systems such as 5.1, 7.1, etc. have the same problems.  The problems are acoustic crosstalk, comb filter effects, incorrect pinna cues, incorrect ILD and ITD cues, and inconsistent localization cues.

Acoustic Crosstalk. You are listening to a live violinist playing directly in front of you. Both ears hear the violin.  The sound at your left ear is similar to but not exactly the same as the sound at your right ear.  There are many reasons for the slight sound differences but they are not important now.  What is important is that the live violin has produced two versions, two presentations, of the violin sound – one at your left ear and one at your right ear.  This is OK because your ear/brain has spent all its life learning to fuse two sound presentations such as this into one image—so what you perceive now is a single, live violin.  Now consider a typical stereo recording of the same violinist.  The recording is engineered so that the violinist will appear to be located directly in front of you, halfway between the two speakers.  To accomplish this, the two channels of the recording will have similar loudness and will arrive at your ears at about the same time.  The problem is that your left ear hears both speakers and your right ear hears both speakers—and the four sound presentations are not exactly alike in level, arrival time, or frequency response.  Your ear/brain now has four versions to fuse into a single violin.  If your left ear heard only the left speaker and your right ear heard only the right speaker, then your ear/brain would be back in the familiar territory in which it must fuse just two presentations into a single image.  The trouble is that your right ear hears the left speaker and your left ear hears the right speaker.  This is called acoustic crosstalk, where each ear hears the speaker on the opposite side. Your ear/brain did not evolve to deal with four presentations of the same sound source.

Crosstalk produces incorrect head shadows for center images, reducing the lifelikeness of the image.  Head shadow refers to the reduction of mid and high frequencies as sound travels around and over the head to the far ear.  When listening to a live instrument directly in front of you, sound travels to the ears with only a small impact from the intervening fleshy part of the face, that is, with only a small head shadow.  In contrast, when stereo speakers play similar signals to produce an image directly in front of you—a center image—head shadow is large because sound from a speaker must travel around much of the head to reach the far ear.  The result for center images is a tonal balance with less mid and high frequency energy reaching the ears than in real life.  One can eliminate side-speaker head shadow by eliminating side-speaker crosstalk (difficult) or by moving the speakers close together (easy).  When speakers are close together, side-speaker head shadow cannot occur.  Moreover, it is easy to cancel crosstalk when speakers are close together.  When this is done, the sonic stage spreads out well beyond the confines of the close speakers—and this wide stage will have lifelike center images.

Music systems using 5.1 and 7.1 formats have even worse crosstalk than conventional stereo because they use three front speakers:  left, right, and center.  If a recording is mastered so that all three front speakers produce the sound of the same solo violin, your left ear will hear three speakers and your right ear will hear three speakers, creating a total of six different presentations of the same violin plus lots of excess bass for the larger instruments.  Pity the unfortunate ear/brain that must deal with this chaos.  Music systems that use three front speakers are as fundamentally flawed as the original two-speaker stereo triangle.  In 5.1 movies, however, the center speaker is usually mono and mostly dialog so that crosstalk does not occur.

Comb Filter Effects. When two identical or correlated broadband signals are separated by just an instant or two in time, the signals add together producing a single signal with a changed frequency response.  The new response can differ substantially from either of the original signals.  The change in response depends on several factors, one being the delay between the signals.  With the right delay, somewhere around .1ms to 1.0ms for audio, the resulting signal, viewed on a scope, displays peaks alternating with deep nulls—resembling a comb.  These are called comb filter effects, or combing.  Unfortunately, acoustic crosstalk from stereo speakers produces comb filter effects.  Consider a conventional stereo system playing a soloist located directly in front.  Both speakers produce similar signals.  The left ear hears the left speaker and, about .22ms later, hears the right speaker.  Comb filter effects result at the left ear.  This is an actual change in the frequency response of the signal from the left speaker and it is caused by acoustic crosstalk.  The same thing happens at the right ear when it hears the right speaker and, .22ms later, hears the left speaker.  Now consider a 5.1 or 7.1 system having three front speakers.  The three speakers are at slightly different distances to the left ear.  Suppose the three speakers are all reproducing a soloist at stage center.  The left ear first hears the left speaker, then an instant later hears the center speaker, then an instant later hears the right speaker.  The same sort of thing happens at the right ear.  Combing chaos!  While the combing is not audible as a change in frequency balance, it does signal the brain that the sound source is not real.  The music sounds canned, without depth or presence, or perhaps very slightly grainy or fuzzy.  This is why, in 5.1 mastering, the center speaker is almost always mono dialog or a mono spot mic’d soloist.
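
The combing described above can be illustrated numerically.  The following Python sketch is an illustration only, not a measurement:  it treats the sound at one ear as the near-speaker signal plus a single delayed copy from the far speaker, using the .22ms delay quoted above.  The far-ear level of 0.5 (about 6 dB down) and the function names are assumptions made for the example.

```python
import math

def comb_gain_db(freq_hz, delay_s, crosstalk_gain=0.5):
    """Level in dB of a signal summed with one delayed, scaled copy of itself."""
    phase = 2.0 * math.pi * freq_hz * delay_s
    real = 1.0 + crosstalk_gain * math.cos(phase)
    imag = -crosstalk_gain * math.sin(phase)
    magnitude = math.hypot(real, imag)
    return 20.0 * math.log10(max(magnitude, 1e-12))

def null_frequencies(delay_s, max_hz=20000.0):
    """Frequencies at which the delayed copy arrives an odd number of half cycles late."""
    nulls = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= max_hz:
        nulls.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return nulls

DELAY_S = 0.22e-3  # crosstalk delay quoted in the text

nulls = null_frequencies(DELAY_S)
print("nulls (Hz):", [round(f) for f in nulls])
for f in (500.0, 1000.0, nulls[0], 1.0 / DELAY_S):
    print(f"{f:8.0f} Hz -> {comb_gain_db(f, DELAY_S):+.1f} dB")
```

Under these assumptions the first null falls a little above 2 kHz, and further nulls repeat at odd multiples of that frequency, which is the comb shape seen on a scope.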

Incorrect Pinna Cues. Again consider the conventional stereo system arranged in an equilateral triangle.  The system is reproducing a soloist at stage center.  Although the soloist is located at stage center, the speakers are located at stage left and stage right and are therefore producing pinna cues appropriate only to sound sources located near the sides of the stage.  These are the wrong pinna cues for the soloist at stage center.  They are incorrect cues, false cues.  In order for the pinna cues to be correct, the center-stage soloist should be reproduced by a center speaker.  However, when an instrument located 30 degrees left of center is being reproduced by the left speaker, the pinna cues are consistent with that location of the sound source.  Hence, with conventional equilateral stereo, pinna cues are correct for sound sources exactly 30 degrees off center but incorrect for sound sources near the center of the stage (or greater than 30 degrees off center).  This is unfortunate since soloists are usually recorded near the center of the stage and the largest portion of musical instruments is usually recorded closer to the center of the stage than to the sides.  In 5.1 and 7.1 systems, if the recording is engineered so that the left speaker reproduces left sound sources, the center speaker reproduces center sound sources, and the right speaker reproduces right sound sources, then the pinna cues would be correct.  But acoustic music recorded with microphones (as opposed to computer-generated sound) for 5.1 and 7.1 cannot be segmented this way effectively.  Movies are often made this compartmentalized way—resulting in essentially 3- or 5-channel mono except when background music is present.

Incorrect ILD and ITD Cues.  When conventional stereo tries to reproduce a center image, one would think that the ILD (interaural loudness difference) and ITD (interaural time difference) cues from the speakers would be correct.  After all, the speakers need only produce equal loudness and equal time delays—ILD and ITD values of zero.  Stereo speakers can easily produce ILD and ITD values of zero.  The problem is that stereo speakers also produce crosstalk and crosstalk creates additional, unwanted ILD and ITD cues that are incorrect for a center image.  Consider a singer recorded at stage center.  The sound from the left speaker arrives at the left ear with the same loudness and time delay as does the sound arriving at the right ear from the right speaker.  The ILD and ITD values are both zero, which are correct for a center image.  So far, so good.  Unfortunately, the sound of the right speaker arriving at the right ear is followed about .22ms later by the crosstalk sound of the right speaker arriving at the left ear—creating an ITD cue of .22ms.  The same sort of thing is happening at the other ear, creating a reverse ITD cue of .22ms.  (One can also think of these as bogus high level early reflections impinging on each ear.)  The ITD cues are actually +.22ms and -.22ms, meaning that the left speaker sound precedes the right by .22ms for one cue and that the right speaker sound precedes the left by .22ms for the other cue.  Crosstalk is creating ITD cues of +.22ms and -.22ms and if you want to create a center image you do not want to be creating incorrect ITD cues of ±.22ms that accompany the correct ITD cue of zero.

A sound at the side in real life produces an ITD of about .70ms.  The maximum ITD that the stereo triangle can produce is about .25ms.
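
These figures can be checked roughly with Woodworth’s spherical-head approximation, ITD = (a/c)(θ + sin θ), a standard textbook formula.  The sketch below assumes a head radius of 8.75 cm and a speed of sound of 343 m/s; neither value comes from this article, and real heads differ.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s, assumed

def itd_ms(azimuth_deg):
    """Woodworth spherical-head ITD in milliseconds, for a source 0 to 90 degrees off center."""
    theta = math.radians(azimuth_deg)
    return 1000.0 * HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))

for angle in (10, 15, 30, 90):
    print(f"{angle:3d} degrees off center -> ITD of about {itd_ms(angle):.2f} ms")
```

With these assumed values the formula gives roughly a quarter of a millisecond at 30 degrees and roughly two thirds of a millisecond at 90 degrees, in line with the figures above.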

Crosstalk from equilateral stereo creates false ILD/ITD cues for all stage locations except 30 degrees off center, right where a speaker is physically located.  At this location, equilateral stereo produces correct ILD and ITD cues.  To make the sound source appear 30 degrees off center, essentially only one speaker produces sound.  This amounts to mono reproduction, with the speaker located where the sound source is supposed to be.  At 30 degrees off center, all localization cues (ILD, ITD, and pinna cues) are correct for stereo.  For any other stage location, stereo produces ILD/ITD chaos and incorrect pinna cues.  One audible effect of incorrect ILD/ITD cues is to reduce the width of the stereo stage.  The 60-degree stage width you hear with stereo is narrower than the stage width actually captured by the microphones.  Hence, if we could eliminate the crosstalk, the reproduced stage would widen considerably.

When recording microphones hear an extreme side source beyond the 30-degree angle (for example, at 90 degrees), they record the source with ITD and ILD cues appropriate to the extreme angle.  But side images in equilateral stereo cannot be localized beyond the 30-degree angle.  Without complex computer processing, the recorded ITD/ILD cues for extreme side sources are not deliverable, for two reasons.  First, the 30-degree speaker location limits the maximum ITD measured at the ears to about .22ms, compared to an ITD of about .70ms produced by a live source at 90 degrees.  Second, the 30-degree speaker location limits the maximum ILD measured at the ears to a value smaller than that produced by a live source at 90 degrees.  In addition, pinna cues at the 30-degree speaker location are incorrect for an extreme side source.  Hence, extreme side sources get folded inward and get lumped together at the 30-degree position where the speaker is located.

Inconsistent Localization Cues. Sound localization depends on ILD cues, ITD cues, and pinna cues.  There are two kinds of pinna cues:  (1) cues based on the response of a single pinna by itself to a sound event and (2) cues based on the responses of both pinnae.  Everyday experience shows that sound localization is better when both pinnae are used but it is not yet clear how the brain makes use of the two pinna responses.  Nevertheless, if a sound reproduction system is to provide excellent sound localization, it must reproduce all the localization cues and the cues must provide consistent information about the direction of a sound source.  The reproduced sonic image will seem less realistic if some cues say that the source is up front and other cues say that the source is at your side.  Yet this is exactly what stereo and its offshoots do.  We have seen that stereo provides incorrect pinna cues for a center-stage instrument because the sound is actually coming from side speakers.  Stereo speakers that are 30 degrees off center provide correct pinna, ILD, and ITD cues only for instruments that are exactly 30 degrees off center.  For a center-stage instrument, stereo speakers provide corrupted ILD/ITD cues (because of crosstalk) and incorrect pinna cues (because speakers are located at the sides).  All of the localization cues have problems, and they are not even consistent with each other.  Your ear/brain did not evolve to deal with this jumble of inconsistent localization cues.  This inconsistency degrades the clarity and realism of all instrument locations except at 30 degrees.  Some listeners have difficulty detecting stable central phantom images; the images jump left or right.  If 5.1 or 7.1 recordings were mastered so that the center speaker alone reproduced center-stage instruments, and side speakers alone reproduced side sounds, then all localization cues would be consistent and correct.  But this would amount to 3-channel mono and acoustic music does not usually benefit from being mastered this way.

How Crosstalk Reduction with Closely-Spaced Front Speakers Fixes Most Problems

Crosstalk Reduction. Electronic circuits that reduce crosstalk in stereo speakers have been available for decades.  Over the years, they have grown in the precision and sophistication of their design, resulting in more complete crosstalk reduction and fewer unpleasant side effects.  Ambiophonics employs a laboratory-grade crosstalk reducer called RACE (Recursive Ambiophonic Crosstalk Eliminator).  In addition to PC versions, RACE has recently become commercially available in certain products of TacT Audio, such as the TacT Ambiophonics digital processor, TacT 2.2 XP, and TacT TCS.  All crosstalk reduction circuits are based on the same principle:  Sounds from the right speaker are cancelled at the left ear by a carefully timed, 180-degree out-of-phase cancellation signal launched by the left speaker.  Sounds from the left speaker are cancelled at the right ear by a carefully timed, 180-degree out-of-phase cancellation signal launched by the right speaker.  If crosstalk cancellation is successful, then the left ear hears only the left speaker and the right ear hears only the right speaker.  The cancellation has usually been done at a very broad range of middle frequencies, the size of the range being adjustable in the more sophisticated cancellation circuits such as RACE.  The timing of the cancellation signals is quite precise and assumes that the speakers are equidistant from the listener.  The four presentations of a sound source that result from conventional stereo are now reduced to the two presentations we experience when listening to live music.  Crosstalk reduction flattens the frequency response of both the center and side stage.  Stereo crosstalk produces ILD/ITD chaos for any location except at the speakers.  This narrows the stage considerably compared to the width heard by the recording microphones.  Crosstalk reduction eliminates ILD/ITD errors and restores the wide stage heard by the microphones—even when the speakers are moved close together.  
Although a live instrument at your side produces an ITD of about .70ms, the maximum ITD produced by the stereo triangle is about .25ms.  With crosstalk reduction, even closely-spaced speakers—about 20-30 degrees apart—can deliver essentially whatever ITD has been captured by the recording microphones.  This can even be milliseconds for spaced omnis, but hopefully will not exceed the normal binaural hearing maximum of about .7 milliseconds.

Closely-Spaced Front Speakers. RACE crosstalk reduction works best with front speakers spread roughly 20-30 degrees apart.  The front location of the speakers minimizes the generation of false head-shadows—for example, when side speakers produce a center image—and allows the speakers to produce correct pinna cues for instruments in the center third of the stage.  Moreover, closely-spaced front speakers—together with crosstalk cancellation—produce correct ILD and ITD cues for the entire stage.  In contrast, stereo-triangle speakers produce correct ILD and ITD cues just at the 30-degree points where the speakers are located.  Thus, close speaker spacing and frontal location produce correct ILD, ITD, and pinna cues for the most important sound location:  the central third of the stage, where soloists and most of the instruments are usually located.  Put differently, for closely-spaced front speakers, sound localization cues are consistent (and correct) for the central third of the stage.  For conventional stereo, however, localization cues are inconsistent except at the 30-degree locations, where they are consistent (and correct).

As discussed above, equilateral stereo produces incorrect head shadows for a center image—a serious problem.  One essentially eliminates the problem by moving the speakers close together.  Head shadows are small for close speakers since the sound moves across only a small part of the face on its way to the far ear.  Unfortunately, the small head shadow produced by close speakers is incorrect for side sources—for example, if you listen to a live violin on your extreme right, your left ear will hear greatly diminished mid and high frequencies.  But it is better for head shadow to be correct for center images than side images.  As we shall see, there are ways to provide the normal head shadow for side images without affecting the quality of the center stage.

For side sources, closely-spaced front speakers driven by RACE produce essentially correct ILD/ITD cues.  This is true even for extreme side sources more than 30-degrees off center.  In contrast, equilateral stereo, because of crosstalk and head shadow problems, creates a muddle of ILD/ITD cues (and incorrect pinna cues) for extreme side sources.  While some recording engineers try to compensate, the fact that the overwhelming majority of discs image widely when reproduced Ambiophonically suggests that this defect is not easily ameliorated.

Although RACE-driven closely-spaced front speakers correctly deliver recorded ILD and ITD cues for the entire stage, pinna cues will be incorrect for side sources.  In addition, as in stereo, head shadows may be missing for side images when side sources are recorded on the disc.  Fortunately, sources at the extreme sides are less common in music than central sources.  Moreover, since side sources have a direct shot at the ear canal, the brain depends less on pinna cues for side sources than on ILD/ITD cues.  Getting pinna cues correct for the entire stage is a nasty problem.  But closely-spaced front speakers do a far better job than do stereo-triangle speakers.

One can supplement closely-spaced front speakers with side speakers to produce correct pinna cues and correct head shadows for sources at the extreme sides.  As a result, sources at the left and right 90-degree points will appear routinely when called for by the recording.  Installing side speakers will be discussed in Part 2.

Stereo crosstalk produces comb filter effects.  With equilateral stereo, combing can start below around 1,000 Hz.  As the speakers are moved closer together, the start of the combing moves up in frequency.  When speakers are as close as Ambiophonics specifies, combing occurs at such high frequencies that it is either inaudible or virtually inaudible.
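
A rough calculation shows why closer spacing pushes combing upward in frequency.  The sketch below assumes a simple straight-line model in which the extra path to the far ear is the ear spacing times the sine of the speaker’s angle off center; the 15 cm ear spacing and the 343 m/s speed of sound are assumed values, and diffraction around the head is ignored.  The first comb notch is taken to be the frequency at which the crosstalk arrives half a cycle late.

```python
import math

EAR_SPACING_M = 0.15     # assumed distance between the ears
SPEED_OF_SOUND = 343.0   # m/s, assumed

def crosstalk_delay_s(half_angle_deg):
    """Extra travel time to the far ear for a speaker half_angle_deg off center (must be > 0)."""
    extra_path_m = EAR_SPACING_M * math.sin(math.radians(half_angle_deg))
    return extra_path_m / SPEED_OF_SOUND

def first_notch_hz(half_angle_deg):
    """Frequency at which the crosstalk is delayed by exactly half a cycle."""
    return 1.0 / (2.0 * crosstalk_delay_s(half_angle_deg))

# Half-angles of 30, 15, and 10 degrees correspond to speakers 60, 30, and 20 degrees apart.
for half_angle in (30, 15, 10):
    delay_ms = 1000.0 * crosstalk_delay_s(half_angle)
    print(f"speakers {2 * half_angle:2d} degrees apart: "
          f"delay {delay_ms:.3f} ms, first notch near {first_notch_hz(half_angle):.0f} Hz")
```

In this simplified model the first notch sits near 2.3 kHz for the equilateral triangle and moves above 6 kHz when the speakers are only 20 degrees apart.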

Close-spacing of front speakers is the real innovation of Ambiophonics over previous crosstalk-reduction technologies.  Stereo’s speakers produce head shadows which, like fingerprints, vary greatly across people.  Hence, effective crosstalk elimination is just not possible for side speakers.  (Recall, head shadow consists of mid and high frequency losses as sound travels around the head to the far ear.  To cancel a signal at the far ear, the cancellation signal must be programmed with the same frequency response as the signal it is meant to cancel.  This is not possible since head shadows vary greatly.)  Since effective crosstalk elimination is not possible for side speakers, side-speaker head shadows will always interfere with center images.  If one moves speakers close together to eliminate side-speaker head shadows, it is still desirable to eliminate crosstalk, that is, to make the left ear hear only the left speaker and the right ear hear only the right speaker.  Fortunately, head shadow from a front speaker is so slight that it can be ignored by the crosstalk cancellation software.  The software need consider only the delay and attenuation as sound goes from a front speaker to the far ear—and this is quite doable.  Indeed, with RACE crosstalk reduction, the user adjusts delay and attenuation until the widest stage is heard.  Crosstalk cancellation is then maximum for that user.  Thus, close-spacing of front speakers makes effective crosstalk cancellation possible.  Moreover, close spacing eliminates side-speaker head shadows, satisfies the pinna for the center stage, gets ITD and ILD cues correct for the entire stage, greatly reduces or eliminates audible combing, and avoids stereo’s unconvincing center imaging.  And now that the speakers are close together, you absolutely need crosstalk cancellation or the stage will be 20 degrees wide, if that!
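
The idea of cancelling crosstalk with only a delay and an attenuation can be sketched in a few lines of Python.  This is a toy illustration of a recursive canceller, not the RACE software itself:  the feedback structure, the function names, and the parameter values (a 3-sample delay and an attenuation of 0.8) are assumptions chosen for the example, and the restriction of cancellation to a band of middle frequencies is omitted.  Each speaker feed is the input for that side minus a delayed, attenuated copy of the other speaker’s feed, so every cancellation signal is itself cancelled at the opposite ear.

```python
def recursive_crosstalk_cancel(left_in, right_in, delay_samples, attenuation):
    """Return (left, right) speaker feeds with recursive cross-fed cancellation signals."""
    if delay_samples < 1:
        raise ValueError("delay_samples must be at least 1")
    n = len(left_in)
    left_out = [0.0] * n
    right_out = [0.0] * n
    for i in range(n):
        j = i - delay_samples
        prev_left = left_out[j] if j >= 0 else 0.0
        prev_right = right_out[j] if j >= 0 else 0.0
        # Subtract the delayed, attenuated feed of the opposite speaker.
        left_out[i] = left_in[i] - attenuation * prev_right
        right_out[i] = right_in[i] - attenuation * prev_left
    return left_out, right_out

def at_ears(left_spk, right_spk, delay_samples, attenuation):
    """Toy acoustic model (an assumption): each ear hears its own speaker
    plus a delayed, attenuated copy of the opposite speaker."""
    left_ear, right_ear = [], []
    for i in range(len(left_spk)):
        j = i - delay_samples
        cross_from_right = right_spk[j] if j >= 0 else 0.0
        cross_from_left = left_spk[j] if j >= 0 else 0.0
        left_ear.append(left_spk[i] + attenuation * cross_from_right)
        right_ear.append(right_spk[i] + attenuation * cross_from_left)
    return left_ear, right_ear

# A single click in the left channel only; values are hypothetical.
left_in = [1.0] + [0.0] * 15
right_in = [0.0] * 16
l_spk, r_spk = recursive_crosstalk_cancel(left_in, right_in, 3, 0.8)
l_ear, r_ear = at_ears(l_spk, r_spk, 3, 0.8)
print("largest signal at right ear:", max(abs(x) for x in r_ear))
print("left ear, first samples:", l_ear[:4])
```

In the toy ear model, which applies the same delay and attenuation to the sound crossing the head, a click sent to the left input reaches the left ear only and the right ear stays silent.  In a real room the match is only approximate, which is why the user tunes delay and attenuation by ear.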

An additional benefit of closely-spaced speakers is that they can be far from the side walls of the listening room.  If a speaker is too close to a side wall, the delay between the direct sound from the speaker and the first reflection off the side wall that hits the listener may be short enough to produce comb filter effects.  Moving speakers away from the walls can reduce combing.  Reducing combing from side wall reflections and increasing the delay of side wall reflections can improve imaging and reduce coloration of the sound—although side-wall reflections have proven to be less harmful to Ambiophonics than to conventional stereo.

One might view Ambiophonics this way:  Conventional stereo and its offshoots 5.1 and 7.1 suffer from acoustically-produced localization distortion.  Ambiophonics is designed to greatly reduce this distortion.  Ambiophonics lets you hear the music not only as it was actually recorded but also as it would have sounded if you were at the main microphone location during the performance.  You can enjoy the same wide-angle perspective as the main microphones or a first-row center concert goer.

Audible Benefits of Ambiophonics

Ambiophonics produces two kinds of improvement in the sound:  spatial improvements and clarity/tonality improvements.  The spatial improvements include the creation of a wider, deeper stage with better horizontal and depth imaging.  The spatial performance is well understood.  One can measure ILD and ITD cues recorded by studio microphones and mathematically predict where the listener will hear the image.  One can also predict the change in image location when the ILD and ITD cues are distorted by crosstalk in conventional stereo.  (See, for example, Glasgal’s Tonmeister Symposium paper, 2005.)  Hence, it is well understood how crosstalk cancellation increases stage size and improves imaging.  Perhaps more important than its spatial performance are Ambiophonics’ clarity/tonality improvements.  Even when one listens to a single instrument, it will have better clarity and richer tonality after crosstalk cancellation.  Instruments will sound more real compared to conventional stereo.  Crosstalk cancellation reduces the four unnatural sound presentations that stereo delivers to the ears to the two that we hear with live sound.

Clarity/tonality. Using audiophile terminology, if one compares Ambiophonics to conventional stereo on the reproduction of a single instrument at stage center, Ambiophonics has better transient response, less coloration, more detail, and greater clarity.  The instrument will also have greater tonal richness—a greater number of different tonal colors.  The instrument will just sound more lifelike.  Note that RACE crosstalk cancellation itself does not process or affect transient response, detail, clarity, or tonal richness.  All RACE does is remove crosstalk.  This means that the musical signal recorded on the disc originally had the superior transient response, detail, and tonal richness but the crosstalk from conventional stereo covered and masked these virtues.  Remove the crosstalk and you will hear the music as you never heard it before.  Play a selection with 3 or 4 instruments.  After RACE crosstalk cancellation, you will hear more air and silence between the instruments.  This produces greater clarity and less congestion when the instruments play together.  The instruments will also sound more different from each other in tonal color.  With stereo, the instruments sound like they came from the same tonal pot—all sharing a certain tonal coloration.   There is much higher fidelity in music stored on ordinary LPs, CDs, and DVDs than is recoverable using conventional stereo, 5.1, 7.1, or 10.x.  Compared to these, only Ambiophonics provides just two presentations of the same musical instrument, one presentation for each ear—just like in real life.

Spatial Improvements. Ambiophonics produces a significantly wider and deeper stage than conventional stereo.  The Ambiophonic stage extends way to the left of the left speaker, way to the right of the right speaker, and is very deep.  The stage is typically 150-180 degrees wide.  To anyone accustomed to conventional stereo, which limits the stage to the 60-degree angle between the speakers, Ambiophonics seems like magic.  Ambiophonics does not artificially increase the width and depth of the stage.  Instead, it reduces localization distortion to very low levels so that you can hear the width and depth of stage that was actually recorded.  There is far more localization data stored on ordinary LPs, CDs, and DVDs than is recoverable using stereo, 5.1, 7.1, or 10.x reproduction.

A 150-180 degree stage is very wide.  This does not mean that the instruments stretch across a 150-180 degree arc.  The musical group can occupy a much smaller space but the reverberant field will stretch to 150-180 degrees even though there are no surround speakers.  This means that musical groups play their music in a much larger reverberant field, just like in a concert hall.  A concert hall, however, creates a 360-degree reverberant field.  Ambiophonics now offers RACE for four-speaker (4.x) systems—two speakers up front and two speakers in the rear—that will reproduce a 360-degree reverberant field.  A RACE four-speaker system will also create the 360-degree direct sound field needed for movies.  TacT Audio has incorporated the four-speaker Ambiophonic software in new products—such as the TacT Ambiophonics room correction preamplifier—and similar systems can be configured for PCs using the software available free on the Ambiophonics web site.  The 4.x methodology, with an option to add side speakers (6.x), is described in Part 2 below.

Imaging. Ambiophonics pays so much attention to getting the spatial localization cues correct and consistent that it is no surprise that the imaging is better than in conventional stereo.  Horizontal imaging is more precise and so is depth imaging.  Depth imaging refers to the realistic depiction of some instruments as far away and others as close.  If depth imaging is precise, you can hear when one instrument is just in front of another.  This is sometimes called layered depth.  Depth imaging in conventional stereo is typically poor.  A single instrument may have a reasonably precise horizontal location but the depth location is smeared.  As a result, an instrument’s image can sound one foot wide but appear to extend vaguely from 10 to 14 feet ahead of you.  The image is shaped something like a needle pointing at you, not a point in space.  The excellent depth imaging in Ambiophonics can reduce the 10- to 14-foot needle to almost a point.  Perhaps because of the better horizontal imaging, but especially because of the better depth imaging, listeners often report that a single instrument sounds better defined, more 3-dimensional, and more palpably real in Ambiophonics than in stereo.

Even with excellent speakers, closely matched in frequency response, room acoustics may affect one speaker differently than the other.  Hence, measured at the listening spot, the frequency response of one speaker may differ greatly from the other.  This will degrade and smear imaging.  But if the speakers are located close together, as they are in Ambiophonics, room acoustics are likely to affect both speakers identically.  This benefits imaging.  Owners of TacT room correction devices that include RACE crosstalk reduction software have an additional benefit.  The room correction function does work individually on each speaker, but their proximity helps make their frequency responses highly correlated at the listening position.

Imaging Empty Space. Consider a string quartet recorded in a hall.  Sound reflections from room surfaces fill the hall with ambience.  So the space between and around the instruments is not empty but filled with reverberation.  Like the direct sound of an instrument, reverberation is a sonic event that can be imaged poorly if localization cues are incorrect.  We just do not think of reverberation as a sonic event to be imaged because it is so diffuse and unfocused.  Nevertheless, the 60-degree wide reverberant field of equilateral stereo expands to 150-180 degrees with Ambiophonics.  Moreover, some recordings played back in stereo have a reverberation problem in which a soloist is surrounded by a halo of reverberance that is audibly denser and louder than the rest of the reverberant field.  Ambiophonics distributes this halo and the field becomes more homogeneous.  This suggests that localization distortion can audibly degrade both the sound of instruments and the reverberant field surrounding them.  Ambiophonics’ reproduction of vivid, three-dimensional instruments may result not only from the reduced localization distortion of the direct sound of the instruments but also from the reduced localization distortion of the frontal reverberant field.  Both are now being delivered as they were heard by the recording microphones.

Conclusions. The reduction of localization distortion by Ambiophonics is such a profound change in sound reproduction that it can be easily heard on the most modest audio equipment.  Indeed, if you use RACE crosstalk reduction on music reproduced by the 1-inch speakers built into your laptop computer, you will hear the Ambiophonic stage.  So large are the spatial and clarity/tonality changes produced by Ambiophonics that anyone accustomed to conventional stereo should allow the ear/brain several days to accommodate to Ambiophonic sound before evaluating it.

Ambiophonics can produce a “you-are-there” large ensemble experience rather than the cramped sense of “they are here” often delivered via the stereo triangle.  It transports you to the recording site whereas conventional stereo seems to transport the instruments to your listening room.  Listen to Ambiophonics long enough for your ear/brain to accommodate to the larger stage, improved imaging, greater clarity, and improved tonality.  Listen long enough to get used to the sound.  Then press the button that returns the sound to conventional stereo.  The shock of the change will be like a slap in the face.

Recording engineers should use Ambiophonics with their studio monitoring speakers if they want to hear what the microphones hear.  Like home speakers, studio speakers used in a stereo triangle produce so much localization distortion that an engineer cannot hear when a microphone placement makes a piano sound 70 feet wide on a system without localization distortion.

We are fortunate to have a legacy of a half century of stereo recordings—a cultural treasure.  Stored on those CDs and LPs are a wealth of localization information and a level of clarity/tonality that we cannot hear until we use playback systems having very low levels of crosstalk.  Converting one’s system to Ambiophonics will provide the opportunity to listen to old friends with new ears.

Crosstalk-Reduction Circuits/Software

When a crosstalk-reduction device launches a cancellation signal from the left speaker to cancel the right-speaker signal at the left ear, the cancellation signal is unfortunately also heard by the right ear.  Early crosstalk reducers, such as the Carver/Sunfire Hologram and Lexicon’s Panorama, were content to cancel the right-speaker signal at the left ear and to ignore the fact that the cancellation signal was heard by the right ear.  Current laboratory-grade crosstalk-reduction devices such as RACE are designed to cancel the cancellation signal arriving at the right ear with a second cancellation signal launched by the right speaker.  That second signal is in turn heard by the left ear and must be cancelled there.  And so on.  A crosstalk reducer designed to cancel all the cancellation signals is called recursive.  RACE is recursive.  Each cancellation signal is cancelled by a signal that is about 2.5 dB softer when launched by the speaker.  In RACE, this value is adjustable.  The cancellation of cancellation signals could go on forever.  But the amplitude of the cancellation signal decreases at each step, and RACE terminates the process when the digital sample value reaches zero.  In concert halls it is usual to refer to reverberation time, which is the time it takes for the reverberation to fall 60 dB after a musical note is terminated.  A sound 60 dB down is almost inaudible.  Humans can easily detect differences in reverberation time, and the same applies to crosstalk cancellation: the cancellation must continue until the cancellation signals are no longer of psychoacoustic significance.
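The recursion described above can be sketched as a pair of cross-coupled feedback paths, in which each output channel is its input minus a delayed, attenuated copy of the opposite output channel.  This is a minimal illustration of the idea and not the RACE implementation.  The function names, the 3-sample delay, and the absence of any band-limiting are assumptions made for the example; only the 2.5 dB attenuation and the 60 dB figure come from the text.

```python
import math


def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def recursive_cancel(left, right, atten_db=2.5, delay=3):
    """Cross-feed each output, inverted and attenuated, into the opposite channel.

    left, right: equal-length lists of float samples.
    atten_db:    how much softer each cancellation pass is (2.5 dB per the text).
    delay:       cross-feed delay in samples (hypothetical value for illustration).
    """
    g = db_to_gain(-atten_db)
    n = len(left)
    out_l = [0.0] * n
    out_r = [0.0] * n
    for i in range(n):
        # Previous outputs of each channel, 'delay' samples ago (silence before start).
        past_l = out_l[i - delay] if i >= delay else 0.0
        past_r = out_r[i - delay] if i >= delay else 0.0
        # Because the feedback uses outputs, every cancellation signal is itself cancelled.
        out_l[i] = left[i] - g * past_r
        out_r[i] = right[i] - g * past_l
    return out_l, out_r


def passes_until_below(threshold_db, step_db=2.5):
    """Number of cancellation passes before the level has dropped by threshold_db."""
    return math.ceil(threshold_db / step_db)


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 11
    silence = [0.0] * 12
    out_l, out_r = recursive_cancel(impulse, silence)
    print("left :", [round(x, 3) for x in out_l])
    print("right:", [round(x, 3) for x in out_r])
    print("passes to fall 60 dB:", passes_until_below(60.0))
```

Feeding a single impulse into the left channel shows the chain: the right output emits an inverted pulse 2.5 dB down, the left output then emits a pulse 5 dB down, and so on, alternating between channels.  If the 60 dB figure borrowed from reverberation time is taken as the stopping point, a 2.5 dB step needs 24 passes.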

The rest of this tutorial is available at the Ambiophonics Institute.