CC BY-NC-ND 4.0 · Semin Hear 2021; 42(03): 206-223
DOI: 10.1055/s-0041-1735176
Review Article

Binaural Signal Processing in Hearing Aids

Peter Derleth, Eleftheria Georganti, Matthias Latzel, Gilles Courtois, Markus Hofbauer, Juliane Raether, Volker Kuehnel
R&D, Sonova AG, Staefa, Switzerland
 

Abstract

For many years, clinicians have understood the advantages of listening with two ears compared with one. In addition to improved speech intelligibility in quiet, noisy, and reverberant environments, binaural versus monaural listening improves perceived sound quality and decreases the effort listeners must expend to understand a target voice of interest or to monitor a multitude of potential target voices. For most individuals with bilateral hearing impairment, the body of evidence collected across decades of research has also found that the provision of two compared with one hearing aid yields significant benefit for the user. This article briefly summarizes the major advantages of binaural compared with monaural hearing, followed by a detailed description of the related technological advances in modern hearing aids. Aspects related to the communication and exchange of data between the left and right hearing aids are discussed together with typical algorithmic approaches implemented in modern hearing aids.



Binaural signal processing in hearing aids relates to the binaural hearing capabilities of normal-hearing listeners and the individually altered monaural and binaural hearing capabilities of listeners with a degraded sense of hearing.

In general, information from all human senses is used synergistically to allow the biological system to interact with the world. With its sensitivity to acoustic information arriving from any location around the listener, the sense of hearing is especially helpful for guiding the sense of vision and, with it, body posture toward a direction that potentially deserves more attention. The sense of hearing is “always on,” thereby allowing it to create a basic mental representation of the physical world in relation to the human body even when a person sleeps or when visual information is unavailable. This representation consists of characteristic information about the surrounding space and the location of acoustic objects. The capability of the senses, in combination with the abstraction power of the brain, to create and maintain a reliable mental map of the ever-changing environment in relation to the position of the human body is essential for a person to reach a certain level of “peace of mind” or to focus attention on a single acoustic object. People with hearing loss might fall short of creating such a reliable map under all circumstances. Therefore, the challenge for an ideal, technically assisted binaural hearing rehabilitation is twofold: (1) provide acoustic cues to the impaired auditory system in such a way that the individual mental spatial map can be created and maintained and (2) allow attention to focus on single acoustic sources if needed. Achieving both perceptual goals at the same time is not always possible, depending on the degree of hearing loss and the acoustic coupling of the hearing aid to the ear. Modern technical systems offer optimized solutions either for selective listening or for integrative listening (see the article by Jespersen et al in this issue for more information about a system that attempts to address this dilemma).

For many years,[1] clinicians have understood the advantage of listening with two ears compared with one. In addition to improved speech intelligibility in quiet, noisy, and reverberant environments, binaural versus monaural listening improves perceived sound quality and decreases the effort listeners must expend to understand a target voice of interest or to monitor a multitude of potential target voices. For most individuals with bilateral hearing loss, the body of evidence collected across decades of research has also found that the provision of two compared with one hearing aid yields significant benefit for the listener. This article provides a brief summary of the major advantages of binaural compared with monaural hearing. The summary is followed by a more detailed description of the already achieved—and the possible future—advantages of exchanging full-audio information between the left and right hearing aids compared with the technically less-demanding exchange of synchronization information. Wherever possible in this article, special attention is paid to audiological validity. First, does the technological solution improve performance on a group-average level or even on an individual level? Second, how much individualization is necessary to exploit the full rehabilitation potential that is technically built into the hearing aids?

Binaural Hearing

The human auditory system exploits information from both ears to analyze the spatial characteristics of complex acoustic and reverberant environments (e.g., the number, distance, direction, and orientation of acoustic sources; and the amount of reverberation). Listening with two ears (binaural hearing) compared with one (monaural hearing) has several benefits, which arise from a combination of monaural and binaural auditory cues that facilitate speech intelligibility in noise. Although research continues to more fully understand why listening with two ears compared with one ear is advantageous for most listeners, much has already been learned. Some of the known benefits are discussed next.

Binaural Comparisons

Listening can be enhanced by comparing differences between the two ears for the same sound. Some differences arise because a sound originating from a particular location in space will arrive earlier at one ear (ipsilateral ear) relative to the other (contralateral ear) and will be louder at the ear closer to the sound source relative to the far ear. The brain integrates information received from each ear and then translates the differences into a unified perception of a single sound arriving from a specific region in space.[2] These differences in the two ears are:

  • Difference in the time of arrival (interaural time difference, ITD).

  • Difference in the intensity (interaural level difference, ILD).

The primary acoustic cues, ITD and ILD, are termed “binaural cues,” and the brain's ability to integrate information that it receives from the two ears is termed “binaural hearing.”[3] An example of ITD and ILD is shown in [Fig. 1]. ITD values increase with increasing lateral displacement from the sagittal plane, that is, to the left or right. Maximal ITDs occur when a sound arrives from 90 degrees to the left or right. Similarly, a sound arriving in the horizontal plane from 90 degrees to the left reaches the right ear only after being attenuated by the head (“head-shadow effect”). This shadowing effect results in an ILD of the sound between the two ears.

Figure 1 Interaural differences of time and level impinging on an ideal spherical head from a distant source. An interaural time delay (interaural time difference [ITD]) is produced because it takes longer for the signal to reach the more distant ear (left ear in this case): left side of figure. An interaural level difference (ILD) is produced because the head blocks some of the energy that would have reached the far (left) ear, especially at higher frequencies: right side of figure. (Figure adapted from Grothe et al.[10])

Assuming far-field conditions, the wavefronts reach the ears as plane waves, first at the ipsilateral ear and then, delayed by the ITD, at the contralateral ear. However, the ITD is an unambiguous cue only when the wavelength of the sound is larger than the curved path between the ipsilateral and contralateral ears, that is, for frequencies below 2 kHz for an average-size head.[4] At higher frequencies, a phase ambiguity occurs, which prevents the auditory system from adequately resolving the ITDs (see [Fig. 2]). At these frequencies, however, the wavelengths are shorter than the head dimensions, so diffraction by the head produces ILDs. In contrast, at low frequencies, the head has no effect on the sound intensity because the wavelengths are longer than the head dimensions (see [Fig. 3], left). The right panel of [Fig. 3] shows the frequency dependency of the ILD for sound arriving from the right side of a KEMAR (Knowles Electronics Manikin for Acoustic Research; contralateral, 90 degrees) minus the left side (ipsilateral, 270 degrees); above 1 kHz in particular, there is a significant attenuation of 10 to 20 dB.
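The dependence of the ITD on source direction, and the frequency at which the wavelength shrinks to the interaural path difference, can be sketched with Woodworth's classic spherical-head approximation, ITD = (r/c)(θ + sin θ). This is a minimal illustration and not a model used in the article: the head radius (8.75 cm), the speed of sound (343 m/s), and the function names are assumptions. With these values, the maximal ITD at 90 degrees comes out at roughly 0.66 ms, and the corresponding limiting frequency at about 1.5 kHz, which is of the same order as the limit quoted above.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed (air at about 20 degrees C)
HEAD_RADIUS = 0.0875     # m, assumed average adult head radius


def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """ITD in seconds for a distant source and a rigid spherical head.

    azimuth_deg is measured from straight ahead (0) toward one side (90).
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


def ambiguity_frequency(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Frequency (Hz) whose wavelength equals the interaural path difference.

    Only defined for a nonzero azimuth (the path difference is zero at 0 degrees).
    """
    path_difference = c * woodworth_itd(azimuth_deg, radius, c)
    return c / path_difference


if __name__ == "__main__":
    for az in (0, 30, 60, 90):
        print(f"{az:3d} deg: ITD = {woodworth_itd(az) * 1e6:6.1f} us")
    print(f"limit at 90 deg: {ambiguity_frequency(90):.0f} Hz")
```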

Figure 2 Interaural time differences (ITDs) in the low frequencies (red) can be used for localization, but ITDs for higher frequencies might be ambiguous (blue). This figure is similar to [Fig. 1]; however, the frequency dependency for ITDs can be seen.

Figure 3 Left panel: Low-frequency sound waves are not affected by the presence of the head (red), while high frequencies (blue) are attenuated by the head shadow. The red curves correspond to the curves with the solid line shown in [Fig. 1] (left) and the blue curves correspond to the curves with the dashed line in [Fig. 1] (right). Right panel: Frequency dependency of the interaural level difference for sound arriving from the right side (contralateral, 90 degrees) minus the left side (ipsilateral, 270 degrees) recorded on KEMAR. Especially at frequencies above 1 kHz there is a significant attenuation of 10 to 20 dB.

In 1907, Lord Rayleigh[5] proposed the duplex theory of hearing whereby the auditory system uses the ITDs below approximately 2 kHz and the ILDs above. This theory is well accepted but incomplete, as it is now known that the auditory system also exploits ITDs in the high frequencies by resolving the interaural envelope difference (IED), which is the temporal variation of the ILDs in the high-frequency region.[5] At low frequencies, the neurons in the auditory system can follow the sound waveform and therefore detect ITDs very well. In contrast, at high frequencies, ITDs can be detected only in the envelope of the signals but not in the fine structure (which is no longer resolved by the neurons) of the transmitted signals. For low-frequency pure tones and noise, human psychophysical experiments show that ITDs as low as 10 to 20 μs can be resolved, which corresponds to an angular accuracy of 1 degree for sound arriving from the front. How well the auditory system can use these highly precise binaural cues to support speech understanding can be measured by the binaural masking level difference (BMLD) or binaural intelligibility level difference (BILD) method.[6] [7] [8]
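The link between an ITD threshold in microseconds and an angular accuracy in degrees can be made concrete with the same spherical-head approximation, ITD = (r/c)(θ + sin θ), whose slope for a frontal source is 2r/c per radian. The sketch below is illustrative only; the head radius, speed of sound, and function names are assumptions. Under these assumptions, 10 to 20 μs corresponds to roughly 1 to 2 degrees, in line with the order of magnitude stated above.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed
HEAD_RADIUS = 0.0875     # m, assumed average adult head radius


def itd_slope_at_front(radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Change in ITD (seconds) per degree of azimuth for a source straight ahead.

    The derivative of (r/c)(theta + sin(theta)) at theta = 0 is 2r/c per radian.
    """
    return (2.0 * radius / c) * math.radians(1.0)


def angle_for_itd(itd_seconds, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Azimuth change (degrees) near the front that produces the given ITD."""
    return itd_seconds / itd_slope_at_front(radius, c)


if __name__ == "__main__":
    for itd_us in (10, 20):
        print(f"{itd_us} us -> {angle_for_itd(itd_us * 1e-6):.2f} degrees")
```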

It should be noted that the auditory system also exploits ITDs and ILDs to estimate the distance of acoustic sources in the vicinity of the listeners (distances <1 m).[9] At greater distances, ITDs and ILDs become distance-independent, and other cues are used, namely, the amount of reverberation and high-frequency attenuation.

The healthy auditory system seamlessly uses redundant ITD and ILD information, for example, when the low- or high-frequency parts of speech are masked by noise. For normal-hearing listeners, a low-frequency noise masking the ITD cues of the target signal can be easily compensated by the high-frequency ILD/IED cues. However, hearing-impaired listeners cannot usually make as much use of this redundancy. The susceptibility to masking depends on the acoustic details of the listening environment and the listener's residual hearing capabilities (e.g., the audibility of the target signal's high-frequency components in the contralateral ear or an asymmetry in thresholds or frequency selectivity between the two ears).

In most hearing aid fittings, an individual user's residual hearing capabilities are unknown, so the fittings are based on group averages, which assume a gradual decline in binaural hearing capabilities with increasing hearing threshold. Most often, hearing aid selection and adjustment is a compromise between comfort (especially, own voice quality) and achievable audibility in the mid- to high-frequency region. Different manufacturers have developed specific solutions for binaural target gain settings that manage this compromise, even for asymmetric hearing losses.

Restoring audibility for softer signal parts independently (i.e., employing wide dynamic range compression) in the two ears will, on average, lead to smaller ILDs reaching the eardrums of the user compared with the original ILDs. Reducing the ILDs might not seem optimal for spatial perception, but providing audibility to both ears is essential for the binaural system to utilize the ITD and ILD information. For example, ensuring audibility allows the user to exploit the high-frequency IED information even if the absolute ILD information is altered. In addition, it is known that within weeks, the auditory system can re-learn ITD and ILD cues that have been altered by different pinnae modifications.[11] The position of the hearing aid microphones is another source of altered ITD and ILD cues. Binaural cues are altered the most when the microphone is positioned behind the ear. Binaural cues are altered to a lesser extent when the microphone is positioned in the ear. Due to these microphone location effects, binaural cues must be re-learned during the acclimatization phase to hearing aids.
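The reduction of ILDs by independent wide dynamic range compression can be illustrated with a static input/output curve. The following is a minimal sketch, not a fitting rule: the compression threshold (50 dB), ratio (2:1), linear gain (20 dB), and function names are hypothetical values chosen only for the arithmetic. Above the threshold, a 10 dB input ILD shrinks to 5 dB at the output; below it, the curve is linear and the ILD is unchanged.

```python
def wdrc_output_level(input_db, threshold_db=50.0, ratio=2.0, gain_db=20.0):
    """Static output level (dB) of a simple single-band compressor.

    Below the threshold the gain is linear; above it, each extra input dB
    yields only 1/ratio dB at the output.
    """
    if input_db <= threshold_db:
        return input_db + gain_db
    return threshold_db + gain_db + (input_db - threshold_db) / ratio


def ild_after_independent_wdrc(near_ear_db, far_ear_db, **compressor_settings):
    """Output ILD (dB) when each ear is compressed without knowledge of the other."""
    near_out = wdrc_output_level(near_ear_db, **compressor_settings)
    far_out = wdrc_output_level(far_ear_db, **compressor_settings)
    return near_out - far_out


if __name__ == "__main__":
    # 70 dB at the near ear and 60 dB at the far ear: input ILD of 10 dB.
    print(ild_after_independent_wdrc(70.0, 60.0))
```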



Binaural Directivity

Typically, sound traveling from the free field to the eardrum must first pass the human torso, head, and pinnae, which together induce a series of direction-dependent acoustic transformations that help listeners localize auditory objects. These transformations, referred to as “head-related transfer functions (HRTFs),” considerably benefit listeners by allowing them to focus attention on the target of interest.[12] The exact nature of HRTFs is highly individual because they depend on the exact shape of the pinnae, the position of the pinnae on the head, asymmetries of the head, and relative changes between head and shoulder/torso position.

Hearing loss per se does not affect a listener's HRTFs because they relate to the listener's anatomical characteristics. However, when a hearing-impaired listener uses hearing aids, the perceived HRTFs will change as a function of the microphones' position. For example, in the case of in-the-ear (ITE) hearing aids, the microphones are placed at the ear canal entrance; therefore, the perceived HRTFs are more similar to the original ones. In contrast, for fully occluded fittings and with a single microphone placed behind the user's ear, the perceived HRTFs are very different from the original individual HRTFs. Consequently, manufacturers offer a function to approximate the average directionality of a pinna when hearing aids are fit with the microphones placed behind the ear. Usually, this is accomplished by applying a weak type of beamformer for frequencies above 1.5 kHz (see the article by Jespersen et al in this issue for more information about this type of processing). This can better preserve front/back localization cues for the user[13] and improve initial user acceptance.



Head Movements

The accuracy of sound localization improves when listeners move their heads while the sound is being presented. Wallach[14] suggested that dynamic ITDs and ILDs associated with head movement should resolve confusion regarding the front/back hemifield of a sound source (see [Fig. 4]). Macpherson and Kerr[15] have also shown that dynamic ITDs, rather than ILDs, provide a strong cue for front/back hemifield detection. Other studies have also confirmed the role of head movement in resolving front/back confusion.[16] The fact that head movement can also improve localization in elevation has been confirmed by several subsequent studies.[17] [18] [19] One of these studies[18] showed that dynamic ITDs could compensate for the disruption of monaural spectral cues when tubes are inserted into the ear canals. Similarly, head movement has been reported to improve elevation localization when ear molds disrupt monaural spectral cues.[19] Together, these results suggest that dynamic ITDs associated with head movement can help resolve ambiguities and improve localization performance. They also show the importance of ITDs, especially with respect to head movements, for resolving localization confusions, which is something to be considered when designing algorithms for hearing aids.

Figure 4 Dynamic interaural time differences and interaural level differences associated with movement of the head help resolve confusion regarding the front/back hemifield of a sound source.
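Why a head turn disambiguates front from back can be sketched with the simplest ITD model, the sine law ITD = (d/c) sin θ for two ears a distance d apart. This is an illustrative assumption, not the model used in the cited studies; the ear distance, speed of sound, and function names are hypothetical. A source at 30 degrees (front) and one at 150 degrees (back) produce the same static ITD, but when the head turns 10 degrees toward the source side, the ITD of the front source decreases while that of the back source increases.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed
EAR_DISTANCE = 0.175     # m, assumed distance between the two ears


def sine_law_itd(azimuth_deg, d=EAR_DISTANCE, c=SPEED_OF_SOUND):
    """ITD in seconds; azimuth is clockwise from straight ahead.

    Positive values mean the right ear leads.
    """
    return (d / c) * math.sin(math.radians(azimuth_deg))


def itd_after_head_turn(source_azimuth_deg, head_turn_deg):
    """ITD after the head turns clockwise (to the right) by head_turn_deg.

    Turning the head changes the source azimuth relative to the head.
    """
    return sine_law_itd(source_azimuth_deg - head_turn_deg)


if __name__ == "__main__":
    for label, azimuth in (("front", 30.0), ("back", 150.0)):
        before = sine_law_itd(azimuth) * 1e6
        after = itd_after_head_turn(azimuth, 10.0) * 1e6
        print(f"{label}: {before:.0f} us -> {after:.0f} us")
```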


The “Better-Ear” Effect

When target and masker signals are produced from different locations, there is a resulting signal-to-noise ratio (SNR) advantage at one ear relative to the other. This SNR advantage is based on the fact that the head acts as an acoustic barrier that produces a level difference between the ears (i.e., head diffraction effects). As mentioned earlier, this is more prominent at higher frequencies, typically above 2 kHz, because low-frequency wavelengths are substantially larger than the head dimensions and therefore do not “see” an obstacle. This allows the listener to focus resources on the ear with the better SNR, regardless of the target or masker's location, thus improving speech intelligibility in noise.[20] [21] This auditory phenomenon is called “better-ear glimpsing,” and it helps speech intelligibility in noise by utilizing ILDs.[22] The benefit provided by better-ear glimpsing is limited in hearing-impaired listeners by reduced audibility at high frequencies. In the same study,[22] it was shown that artificially enhancing ILDs at low and mid frequencies can help hearing-impaired listeners understand speech in noise, but the achieved benefit is smaller than in normal-hearing listeners.
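The better-ear advantage amounts to simple level arithmetic, sketched below for a single high-frequency band. The 10 dB head-shadow attenuation, the source levels, and the function names are hypothetical values for illustration; real head-shadow values depend on frequency and direction. With the target on the left and an equally loud masker on the right, the left ear ends up with an SNR of +10 dB and the right ear with -10 dB.

```python
def ear_levels_db(source_db, source_side, head_shadow_db):
    """Return (left, right) levels of one source; the far ear is attenuated."""
    if source_side == "left":
        return source_db, source_db - head_shadow_db
    if source_side == "right":
        return source_db - head_shadow_db, source_db
    raise ValueError("source_side must be 'left' or 'right'")


def better_ear_snr(target_db, target_side, masker_db, masker_side,
                   head_shadow_db=10.0):
    """Return (ear, snr_db) for the ear with the higher SNR."""
    target_left, target_right = ear_levels_db(target_db, target_side, head_shadow_db)
    masker_left, masker_right = ear_levels_db(masker_db, masker_side, head_shadow_db)
    snr_left = target_left - masker_left
    snr_right = target_right - masker_right
    if snr_left >= snr_right:
        return "left", snr_left
    return "right", snr_right


if __name__ == "__main__":
    print(better_ear_snr(65.0, "left", 65.0, "right"))
```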



Binaural Redundancy

Binaural redundancy is the advantage obtained when identical information about the signal is received in both ears. When listening with one instead of two ears, there is only one opportunity for the auditory system to capture the available information in a signal. Binaural redundancy describes a process whereby the brain has two “looks” at each sound.[23] This process is particularly relevant for listeners with asymmetrical hearing losses because the auditory cues available in a signal may be more readily accessible for one ear than for the other. Another aspect of binaural redundancy is that normal-hearing listeners experience an increased perception of loudness (a diotic sound is ∼1.5 times as loud as the same sound presented monaurally at the same level), a phenomenon known as binaural loudness summation.[24] Binaural loudness summation can be advantageous for the perception of sounds whose level is close to the hearing threshold. Hawkins et al[25] showed that hearing-impaired listeners demonstrate binaural summation to the same extent as normal-hearing listeners. This finding suggests that a listener with bilaterally symmetrical sensorineural hearing loss may benefit from binaural loudness summation. Thus, binaural loudness summation is accounted for in generic hearing aid prescriptions, like NAL-NL2 and DSL, and in proprietary fitting rules from different manufacturers.
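To get a feeling for what a loudness factor of about 1.5 means in level terms, one can use the sone-scale rule of thumb that a doubling of loudness corresponds to about 10 phon (valid for normal hearing at moderate levels). The sketch below applies this rule; it is an approximation under that stated assumption, not the way any particular prescription accounts for binaural summation, and the function name is hypothetical. A factor of 1.5 then corresponds to a level change of a little under 6 dB.

```python
import math


def level_equivalent_db(loudness_ratio):
    """Level change (in phon, dB at 1 kHz) equivalent to a loudness ratio.

    Uses the sone-scale rule of thumb: loudness doubles for every 10 phon.
    """
    if loudness_ratio <= 0:
        raise ValueError("loudness_ratio must be positive")
    return 10.0 * math.log2(loudness_ratio)


if __name__ == "__main__":
    print(f"{level_equivalent_db(1.5):.2f} dB")
```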



Binaural Signal Processing in Hearing Aids

As discussed in the previous section, it is evident that hearing aids should preserve—and potentially enhance—the benefits associated with two-eared listening. This section will discuss aspects related to bilaterally fitted hearing aids and binaural signal processing methods.

Recent studies have shown several advantages of bilateral compared with unilateral hearing aid fitting, such as better speech intelligibility in quiet[26] and in noisy listening environments,[27] and better performance on both objective[28] and subjective[29] indices of sound localization. Given the potential benefits of using two hearing aids compared with one, hearing-impaired individuals strongly prefer bilateral rather than unilateral hearing aid fittings.[30]

Binaural hearing implies a synergistic exchange of information from the left and right ears in the central auditory system. While bilateral fittings, which entail independent processing in the left and right hearing aids, may promote activation of binaural hearing mechanisms, they do not guarantee it. Similarly, to achieve binaural hearing aid fittings, some data exchange is required between the left and right hearing aids. The rate and amount of data exchange impact the power consumption of the hearing aids and possibly the hearing aid size due to the need for a bigger battery and/or wireless antenna. Therefore, binaural processing might not be available in all form factors (e.g., completely-in-the-canal hearing aids). Current technology for wireless binaural data exchange uses either near-field magnetic induction (NFMI) or 2.4 GHz wireless technology. Both approaches are robust and reliable. The NFMI approach can be optimized for low power consumption, but it is restricted in transmission bandwidth and increases the design complexity and size of the hearing aids. In contrast, 2.4 GHz technology offers more bandwidth and reduces design complexity because it can integrate with standard Bluetooth wireless transmission protocols using a single antenna. In the case of full-audio exchange, the overall system delay is usually increased (depending on the details of the chosen wireless technology), which is known to degrade sound quality[31] and may be detrimental for localization in the vertical dimension.[32] Therefore, the activation of binaural signal processing should be strategically adjusted to the targeted perceptual benefits for the individual hearing aid user, which will depend on their auditory needs and residual hearing capabilities. The chosen acoustic coupling and the listening environment also play a significant role in the achievable real-world benefit. 
The following section will provide an overview of binaural signal processing methods that are relevant to modern hearing aids.

True Binaural Processing Versus Binaural Synchronization

When referring to binaural processing in hearing aids, two approaches can be followed.

  1. Monaural processing with binaural synchronization ([Fig. 5], left)

Figure 5 (Left) Monaural processing with binaural synchronization. (Right) True binaural processing. P: parameters of left (L) and right (R) algorithm. Dashed lines show wireless data exchange. Thicker lines indicate full-bandwidth audio data, while the thinner lines indicate parameter exchange only.

Each hearing aid processes the audio signals received from its own microphones (solid lines), then exchanges information (i.e., parameter data) with the other hearing aid (dashed lines) to synchronize filter parameters or program settings, for example. Depending on the rate and amount of data exchange between the hearing aids, binaural synchronization can offer substantial advantages for the user. The most basic synchronization is “event-triggered.” For example, changing the volume control on one hearing aid can simultaneously change the volume control on the other hearing aid. Likewise, information about the classification of the listening environment (see the article by Hayes in this issue for more information about this topic) can be exchanged between the hearing aids. Using this information, the joint operation of the two hearing aids can be optimized to maximize the user's benefit. For example, in an asymmetric listening environment with loud machine noise on one side and a conversation partner on the opposite side, the appropriate program settings in both hearing aids can be selected to match the user's residual hearing capabilities and their current physical activity.

With higher investments in the data exchange rate, data volume, and power consumption, a higher degree of synchronized binaural system behavior can be achieved. Whether the high-rate synchronization of a specific signal-processing algorithm (e.g., noise canceller, beam-former, gain model, and limiting system) is beneficial for a user will depend on the details of its implementation and parametrization. For example, while increasing the effectiveness of a noise-canceling algorithm might improve the SNR, this will likely degrade binaural ITD/ILD cues if the algorithm is implemented independently in the two hearing aids. Therefore, high-rate synchronization becomes essential to achieve an overall benefit for the user. The volume and rate of data exchange needed to preserve ITD/ILD information in any given moment (e.g., during head movements) is high and tends to reduce the noise-canceling algorithm's effectiveness. Hence, an over-emphasis on synchronization might reduce the SNR improvement and user benefit that would otherwise be obtained with independent processing. Therefore, in any given listening environment, a user's need for binaural cue preservation must be balanced with their need for SNR improvement to optimize overall user benefit. These specific needs will depend on the residual hearing capabilities and the physical activity of the user.
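The balance between independent and synchronized noise reduction can be sketched with a single link parameter that blends each ear's own gain with a common gain. This is a hypothetical scheme for illustration, not a description of any manufacturer's algorithm: the use of the mean as the common gain, the parameter name link, and the example gains are all assumptions. With link = 0, the two ears apply different gains and the ILD shifts by their difference; with link = 1, both ears apply the same gain, the ILD is untouched, but the noisier ear receives less attenuation than it would on its own.

```python
def linked_gains_db(gain_left_db, gain_right_db, link=1.0):
    """Blend each ear's own noise-reduction gain with a common (mean) gain.

    link = 0.0 gives fully independent gains, link = 1.0 identical gains.
    """
    if not 0.0 <= link <= 1.0:
        raise ValueError("link must be between 0 and 1")
    common = 0.5 * (gain_left_db + gain_right_db)
    left = (1.0 - link) * gain_left_db + link * common
    right = (1.0 - link) * gain_right_db + link * common
    return left, right


def ild_change_db(gain_left_db, gain_right_db, link):
    """Shift of the ILD (left minus right, dB) introduced by the applied gains."""
    left, right = linked_gains_db(gain_left_db, gain_right_db, link)
    return left - right


if __name__ == "__main__":
    # Noisy left side: the left aid wants -12 dB, the right aid only -2 dB.
    for link in (0.0, 0.5, 1.0):
        print(link, linked_gains_db(-12.0, -2.0, link), ild_change_db(-12.0, -2.0, link))
```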

  2. “True” binaural processing ([Fig. 5], right)

It is meaningful to distinguish between “binaural processing” and the earlier-described “binaural synchronization” because these two technologies offer different degrees of freedom to select the optimal signal processing strategy for the individual user in a specific listening environment. Neither term is formally defined, and the two are often used interchangeably in the hearing aid community, so the reader must decide on the appropriateness of the solution for the intended use. For the remainder of this article, “binaural processing” is used to indicate that each hearing aid uses the information provided by at least one full-bandwidth audio signal from the contralateral side. This maintains all of the processing options and benefits described earlier for binaurally synchronized systems. It also allows for signal processing algorithms that operate on the microphone signals from both sides, which, as described later, can lead to the theoretical maximum SNR benefit.

As indicated earlier, the listening environment (e.g., the spatial distribution of the sources, the amount of reverberation), the users' residual hearing capabilities, and the acoustic coupling will affect how the user prioritizes the following desired outcomes: (1) improvement in SNR, (2) preservation of binaural cues, and (3) reduction of the reverberant parts of the signal. In open acoustic couplings (typically used for mild to moderate hearing loss), a binaural processing algorithm's maximum benefit cannot be realized because the low-frequency signals processed by the hearing aid are masked by the direct sound going through the open ear canal. On the other hand, the ITD cues that are important for localization and source separation are preserved by the direct sound and are not affected by the hearing aid signal processing.



Beamforming

This section outlines the binaural and monaural multichannel (i.e., multi-microphone) processing method known as “beamforming.” Beamforming uses spatial information gathered from the microphone arrays of the hearing aids to significantly increase the sensitivity to one direction and reduce the sensitivity to all other directions, thereby forming a virtual “beam.”[33] Modern hearing aids are capable of pointing the region of highest sensitivity not only toward the front but also toward the sides and back of the user. Furthermore, the region of reduced sensitivity can be adaptively changed over time to maximally suppress a single noise source arriving from a particular direction. However, for most listening environments, the conversation partner is in the front hemisphere, and nonattended sources are in the back hemisphere. Therefore, a nonadapting (fixed) beam directed toward the front is typically selected in the hearing aids to allow the user to focus on the source of interest by pointing their head in the preferred direction. To allow the user to follow sources in the periphery of the visual field, it is desirable to preserve the natural sensitivity toward the sides. However, noise and other sources that are mainly in the back hemisphere should be attenuated to improve the SNR of the attended speaker in the front hemisphere.

For ease of discussion, this section only considers listening environments where the target is in front of the user. In general, beamforming algorithms provide significant SNR and intelligibility improvements, but as mentioned, the nature of the spatial processing also runs the risk of distorting spatial (binaural) cues. This is especially true if the beam is steered to the sides or if the binaural beamforming is at its maximum. Hence, optimizing the tradeoff between maximal SNR benefit and binaural cue preservation (for natural spatial perception) is essential when parameterizing the beamforming algorithms in hearing aids. This tradeoff is also dependent on individual user preferences and auditory processing capabilities; therefore, the user's adjustment of this tradeoff can be advantageous and can be offered by a mobile phone app.

Monaural Beamforming

When monaural beamforming is applied in a hearing aid, the processing of the signals obtained from the front and back microphone is independent of the opposite hearing aid. Typically, the target directivity pattern in this case is a front cardioid or hyper-cardioid, as can be seen in the polar pattern measured on KEMAR in [Fig. 6i]. With respect to spatial perception, these directivity patterns introduce only minor binaural cue distortions. To imitate the natural characteristics of head acoustics, the “head shadow” effect is preserved, which shifts the peak hearing sensitivity 30 degrees away from the midline. Also, ITDs are mostly preserved. The largest source of spatial distortion is introduced by the microphone positions on the behind-the-ear and receiver-in-the-canal hearing aids because they lead to a complete loss of the user's own pinna acoustic filtering properties. In summary, monaural beamformers (with fixed directionality) can improve SNR and speech intelligibility while preserving most binaural cues. In contrast, binaural cues are distorted by algorithms that adaptively optimize the beam pattern in each hearing aid, independent of the other, to maximally increase the SNR.
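The cardioid and hyper-cardioid target patterns mentioned above belong to the family of first-order directional patterns, R(θ) = α + (1 − α) cos θ, with α = 0.5 for the cardioid and α = 0.25 for the hyper-cardioid. The sketch below evaluates these idealized free-field patterns and their directivity index in a spherically diffuse noise field; it ignores the head, so it does not reproduce the KEMAR measurements in [Fig. 6], and the function names are assumptions. The resulting directivity indices are about 4.8 dB (cardioid) and 6.0 dB (hyper-cardioid).

```python
import math


def first_order_response(angle_deg, alpha):
    """Magnitude of alpha + (1 - alpha) * cos(angle), equal to 1 at 0 degrees."""
    return abs(alpha + (1.0 - alpha) * math.cos(math.radians(angle_deg)))


def directivity_index_db(alpha):
    """Directivity index (dB) relative to an omnidirectional microphone.

    The squared response averaged over the sphere has the closed form
    alpha**2 + (1 - alpha)**2 / 3 for first-order patterns.
    """
    beta = 1.0 - alpha
    mean_power = alpha ** 2 + beta ** 2 / 3.0
    return -10.0 * math.log10(mean_power)


if __name__ == "__main__":
    for name, alpha in (("cardioid", 0.5), ("hyper-cardioid", 0.25)):
        rear = first_order_response(180.0, alpha)
        print(f"{name}: DI = {directivity_index_db(alpha):.2f} dB, rear response = {rear:.2f}")
```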

Figure 6 Measured polar patterns on KEMAR of the (i) monaural, (ii) hybrid, and (iii) maximum binaural beamforming. The fading between monaural (i) and maximum binaural directivity (iii) adjusts the tradeoff between signal-to-noise ratio benefit and binaural cue preservation (natural spatial perception).


Binaural Beamforming

In the case of binaural beamforming, the processing combines the signals from up to four microphones from both sides using a full-audio exchange via a wireless link. By employing the “binaural microphone array” consisting of the four microphones (two on each of the left and right hearing aids), a combination of small and large microphone distances becomes available. This allows for more selective beam shapes and steering than monaural beamforming, resulting in an even higher SNR improvement. When binaural beamforming is applied, binaural cues are typically affected, and a complete loss of binaural cues can occur in the most extreme cases (see [Fig. 6iii]) where the maximal possible SNR improvement for diffuse noise is also achieved. In these extreme cases, binaural redundancy is maximized at the cost of spatial perception. The tradeoff between SNR improvement and binaural cue preservation can be controlled by configuring the binaural beamformer parameters accordingly. In addition, there are some design methods for binaural beamforming[34] that maximize the achievable SNR improvement with constraints chosen to partially preserve binaural cues. In particular, maintaining the original ITDs in the low frequencies is an efficient way to preserve localization abilities because ITD is dominant over ILD in case of contradictory binaural cues.[35] [36] The low-frequency ITDs can be maintained by using a vented mold and/or by applying the binaural beamformer only in the mid and high frequencies (see the article by Jespersen et al[33] in this issue for more information about this method).



Hybrid Beamforming: Optimizing Individual Tradeoffs in Hearing Aids

The “hybrid beamforming” approach allows for a continuous fading between pure monaural and binaural directivity patterns simply by appropriate parametrization (see [Fig. 6ii]). Thus, the tradeoff between SNR improvement and binaural cue preservation can be adjusted for the individual user needs and preferences. The directivity and the respective tradeoff can be adjusted differently per frequency band, for example, toward better binaural cue preservation at the lower frequencies, which are most relevant to preserve ITDs. The desirable point of tradeoff depends on the following:

  • The current listening environment.

  • The acoustic coupling, the openness of which affects the dominance of direct versus aided sound.

  • The listening intention of the user (e.g., focusing on a single communication target; focusing on a group of conversation partners; comfortably listening to the conversations of others without strong personal involvement; being aware but mainly avoiding listening effort and potential fatigue).

  • The severity of hearing loss and remaining binaural processing capabilities of the user's auditory system.

  • The personal preferences of the user.

In all hearing aids, a reasonable default setting for this tradeoff is chosen. In Phonak hearing aids, for example, the “StereoZoom” (see [Fig. 6ii]) is activated by default only in the “Speech In Loud Noise” program when the overall level is relatively high and a maximum SNR improvement is desired. Nevertheless, it is best for the user or the clinician to have more control over this tradeoff. Smartphone apps or sophisticated remote control units allow users to modify the beamforming parameters to their individual needs whenever they experience a challenging listening environment.
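
The per-band fading described in this section can be pictured as a weighted mix of a monaural and a binaural beamformer output in each frequency band. The sketch below is only a minimal illustration under that assumption: the article does not disclose how commercial devices implement the fading, and the function name, the band center frequencies, and the weights are all hypothetical.

```python
def hybrid_mix(mono_bands, binaural_bands, alphas):
    """Blend per-band monaural and binaural beamformer outputs.

    alpha = 0 keeps the monaural output (better cue preservation),
    alpha = 1 keeps the binaural output (higher SNR improvement).
    """
    if not (len(mono_bands) == len(binaural_bands) == len(alphas)):
        raise ValueError("all inputs need one entry per frequency band")
    mixed = []
    for m, b, a in zip(mono_bands, binaural_bands, alphas):
        a = min(max(a, 0.0), 1.0)  # clamp to the valid fading range
        mixed.append((1.0 - a) * m + a * b)
    return mixed

# Hypothetical band center frequencies in Hz. The weights keep the low bands
# monaural (to favor ITD preservation) and fade toward binaural above 1 kHz.
band_hz = [250, 500, 1000, 2000, 4000]
alphas = [0.0 if f < 1000 else 0.8 for f in band_hz]
print(hybrid_mix([1.0] * 5, [3.0] * 5, alphas))
```

A user-facing control, such as the smartphone apps mentioned above, could then be thought of as scaling these weights up or down for the current situation.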



Studies on Binaural Beamforming Processing Benefits

This section gives an overview of results obtained from recent studies on the benefits of binaural processing with respect to beamforming. The BILD is typically used to quantify binaural processing benefit.[7] It measures the improvement (or lack thereof) in speech-in-noise reception due to binaural processing. As described by Neher et al,[37] the BILD was measured in a headphone experiment using virtual acoustics while giving each listener precise control over stimulus audibility and stimulus presentation (binaural or monaural).
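
For orientation, the BILD can be computed as the difference between the SRT obtained with monaural presentation and the SRT obtained with binaural presentation. The sketch below assumes this common sign convention, in which a positive value indicates a binaural benefit. The numbers are made up for illustration and are not data from the cited studies.

```python
def bild_db(srt_monaural_db, srt_binaural_db):
    """Binaural intelligibility level difference in dB.

    Positive values mean the binaural presentation gave the lower (better) SRT.
    """
    return srt_monaural_db - srt_binaural_db

# Illustrative SRTs in dB SNR (assumed values, not study data).
print(bild_db(-4.0, -7.5))
```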

In the study by Neher et al,[37] hearing-impaired listeners differed remarkably in their benefit from directional processing (or beamforming) algorithms in relation to their BILD. The same virtual acoustics technique combined with a simulation of a linked pair of completely occluding behind-the-ear hearing aids (master hearing aids[38]) was used. Five processing schemes with a different tradeoff between SNR improvement and binaural cue preservation were investigated. [Fig. 7] shows corresponding polar patterns (of the left hearing aid using the free-field head-related impulse responses[39]):

  • “pinna”: modest degree of directivity above approximately 1 kHz, binaural cue preservation across the entire frequency range (dichotic). This processing mimics the natural directivity of an average pinna and serves as the reference condition.

  • “beamfull”: designed to maximize the SNR improvement, no binaural cue preservation across the entire frequency range (diotic).

  • “beam > 0.8k”: corresponds to a dichotic signal (“pinna”) below 0.8 kHz and a diotic signal (“beamfull”) above 0.8 kHz.

  • “beam < 2k”: corresponds to a dichotic signal (“pinna”) above 2 kHz and a diotic signal (“beamfull”) below 2 kHz.

  • “beambetter”: corresponds to “beamfull” but only the better ear is aided (as for biCROS devices).

Figure 7 Polar patterns of the pinna (left ear), “beamfull” (both ears), “beam > 0.8k” (left ear), and “beam < 2k” (left ear) settings calculated in octave bands with center frequencies of 125, 250, 500, 1,000, 2,000, 4,000, and 8,000 Hz (see colors) using the free-field head-related impulse responses.[39] The azimuth is in degrees and the gain is in decibels. (Figure taken from Neher et al[37]).

The performance of the five processing schemes was investigated in three acoustic scenes. The target speaker was always in front (0 degrees) and was combined with one of two masker configurations: two maskers positioned laterally at ± 60 degrees of azimuth, or spatially diffuse noise typical of a moderately busy cafeteria (“cafnois”). For the lateral configuration, two types of maskers (informational and energetic) were used, which yields the three scenes:

  • “olsa60”: to represent a scenario with energetic and informational masking, the ± 60-degree maskers consisted of competing sentences. The masking sentences were taken from the same matrix-speech test corpus (German OLSA) as the target sentences.

  • “ists60”: to represent a scenario with energetic masking only, the ± 60-degree maskers consisted of ISTS noise created from the international matrix-sentence test corpus with similarity to multitalker speech babble.

The results of the speech recognition threshold (SRT) measurement as a function of the BILD for the different hearing aid beamforming approaches are shown in [Fig. 8]. A significant interaction between BILD, processing scheme, and acoustic scene was found. There was a clear influence of the BILD on SRT with bilateral directional processing. In acoustic scenes with lateral maskers (“olsa60” and “ists60”), listeners with BILDs greater than approximately 2 dB benefited more from the availability of low-frequency binaural cues (“pinna” and “beam > 0.8k”) than from low-frequency SNR improvement (“beamfull,” “beam < 2k,” “beambetter”). For users with smaller BILDs, the opposite was true. Informational content in the masker (“olsa60” vs. “ists60”) did not affect these findings. In spatially diffuse noise (“cafnois”), processing schemes that maximized the SNR improvement provided the greatest benefit, regardless of BILD status.

Figure 8 Scatter plots of the binaural intelligibility level difference (BILD) and speech recognition threshold (SRT) measures for the symmetric group. Left: olsa60 scenario; Middle: ists60 scenario; Right: cafnois scenario. Least-square regression lines corresponding to the pinna (long-dashed black line, unfilled black diamonds), beamfull (short-dashed red line, unfilled red circles), beam > 0.8k (double green line, filled green diamonds), beam < 2k (solid purple line, filled purple circles), and beambetter (dotted yellow line, filled yellow triangles) settings are also shown. (Figure taken from Neher et al[37]).

In 2020, another study[40] was conducted to investigate the extent to which the results of Neher et al[37] could be transferred to directional processing strategies used in commercially available hearing aids. Thereby, real acoustical coupling, real listening environments, and the possibility of head movements were relevant. Apart from speech reception in noise, this study investigated how the binaural cue preservation associated with different directional processing schemes affected spatial awareness. In this study, five beamforming settings with different values of directivity index (DI) were tested:

  • Real ear sound (RES): a commercially available beamformer setting simulating the pinna effect[41] of the outer ear with a small degree of directivity (mean DI: −1.0 dB > 1 kHz).

  • UltraZoom (UZ): a commercially available monaural beamformer setting[41] providing SNR improvement over the whole frequency range (mean DI: 2.3 dB).

  • StereoZoom (SZ): a commercially available binaural beamformer setting[41] providing SNR improvement at frequencies < 2 kHz (mean DI: 4.7 dB; see also [Fig. 6ii]).

  • StereoZoom-inv (SZ-inv): an experimental beamformer setting based on SZ that provides SNR improvement > 800 Hz (mean DI: 4.2 dB).

  • FullBeam (FB): an experimental beamformer setting based on SZ that provides SNR improvement over the whole frequency range (mean DI: 4.9 dB).
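
As background for the DI values quoted in this list, the directivity index expresses how much more sensitive a pattern is toward the front than toward sound arriving equally from all directions. The sketch below computes it for an idealized, rotationally symmetric free-field pattern by numerical integration. This is a textbook simplification: the study values are frequency-averaged and obtained with devices worn on a head, so they are not directly comparable with these free-field figures, and the function name is an assumption.

```python
import math

def directivity_index_db(pattern, steps=20000):
    """DI in dB of an axisymmetric pattern D(theta) in a spherically diffuse field."""
    # Directivity factor Q = 2 * |D(0)|^2 / integral_0^pi |D(theta)|^2 sin(theta) dtheta,
    # with the integral evaluated by the midpoint rule.
    h = math.pi / steps
    integral = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        integral += abs(pattern(theta)) ** 2 * math.sin(theta) * h
    q = 2.0 * abs(pattern(0.0)) ** 2 / integral
    return 10.0 * math.log10(q)

def omni(theta):
    return 1.0

def cardioid(theta):
    return 0.5 + 0.5 * math.cos(theta)

print(round(directivity_index_db(omni), 2), round(directivity_index_db(cardioid), 2))
```

For the ideal cardioid the directivity factor is 3, which corresponds to a DI of about 4.8 dB, while an omnidirectional pattern has a DI of 0 dB.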

For the polar patterns, the reader is referred to the publication.[40] The results of this study for the SRT as a function of the BILD and beamforming settings are shown in [Fig. 9].

Figure 9 Scatterplot of the binaural intelligibility level difference (BILD) and speech recognition threshold (SRT) data. Left panel: diffuse interferer; right panel: lateral interferer. Least-square regression lines corresponding to real ear sound (black solid line, filled black circles), UltraZoom (dashed black line, unfilled black circles), StereoZoom (dotted black line, unfilled black squares), StereoZoom-inv (solid gray line, unfilled gray circles), and FullBeam (dotted gray line, filled gray circles). SRT is signal-to-noise ratio (SNR) in decibel (dB). (Figure taken from Latzel et al[40]).

The analysis revealed that speech intelligibility in noise depends on binaural hearing abilities, masker scenario, and beamformer conditions. Users with poor binaural hearing abilities (lower BILDs) had worse speech perception in noise (higher SRTs) than users with good binaural hearing abilities. An interaction effect between the masker scenario and beamformer was demonstrated, but there was no interaction effect between binaural hearing abilities and the beamformer condition. A post hoc analysis revealed that the commercially available beamformer SZ (dotted black line, unfilled black squares) outperformed all other beamformers, independent of the masking scenario and binaural hearing abilities. This shows that an approach like the SZ, which is a good compromise between preserving and ignoring binaural cues, provides the best speech intelligibility regardless of the user's binaural hearing abilities and the noise scenario.

This study could only partly replicate the results of Neher et al,[37] a study that included the same participants. This may be due to differences in the setup of the study, such as allowing for real head movements, real listening environments, and real acoustic couplings that preserved low-frequency ITD cues via the direct sound passing through the vent. Additionally, the algorithms differed slightly from those in the earlier study,[37] as the systems used here were already fine-tuned to be effective under real-world conditions. The study's findings provide a basis for setting beamformer parameters in commercial hearing aids to match the noise situation and the user's binaural hearing abilities. Thus, speech intelligibility and spatial awareness perception should both be considered when adjusting the beamformer.



Dereverberation

Dereverberation algorithms are used to reduce the undesired effects of reverberation. In modern hearing aids, both multichannel (also binaural) processing and single-channel algorithms are employed.

Some degree of dereverberation can be achieved by multichannel beamforming algorithms (as described earlier), as attenuating some sound sources from nontarget directions also attenuates some of the distinct room reflections. Depending on the optimization criteria (e.g., SNR optimized for a diffuse field), especially for the binaural beamformer, the priority can be set on reducing the (late) reverberant signal parts.

Many hearing aids have single-channel reverberation canceling algorithms, in cascade with the beamformer, that detect and attenuate the reverberation tails in each frequency band. In general, the most effective dereverberation algorithms are multichannel (and binaural) algorithms. Some of them are still at the research stage and not yet available for implementation in hearing aids. An overview of specific binaural dereverberation algorithms can be found in the study by Tsilfidis et al.[42] In the dereverberation algorithm proposed by Westermann et al,[43] the short-term interaural coherence (IC) is computed between the left and right hearing aid microphone signals. A nonlinear sigmoid mapping function maps the computed IC to gain values. The mapping function parameters are updated in real time based on the time-frequency behavior of the IC. Other approaches include the work of Braun et al and Marquardt et al,[44] [45] who considered a time-varying diffuse sound field and used a spherical model of the head to determine the optimal Wiener filter for dereverberation in binaural hearing aids (see the article by Andersen et al in this issue for a description of Wiener filters). In Schwartz et al,[46] reverberation was reduced by introducing a recursive expectation-maximization algorithm. They theoretically demonstrated that a dereverberation algorithm could be developed to give a user direct control over the tradeoff between dereverberation performance and binaural cue preservation. In the study by Delcroix et al,[47] a linear prediction-based algorithm is described, which achieves a high degree of dereverberation performance but at the cost of additional signal delay.
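
The coherence-based idea can be sketched for a single frequency bin as follows: the short-term IC is estimated from recursively smoothed auto- and cross-power spectra of the left and right signals and is then passed through a sigmoid to obtain a gain. This is a simplified illustration, not the algorithm of Westermann et al: the mapping parameters are fixed here rather than adapted in real time, and all parameter names and values are assumptions.

```python
import math

def coherence_gains(left_frames, right_frames, smooth=0.8, slope=10.0,
                    midpoint=0.6, min_gain=0.1):
    """Per-frame gains for one frequency bin from short-term interaural coherence.

    left_frames and right_frames hold the complex STFT values of that bin over time.
    """
    p_ll = p_rr = 1e-12   # recursively smoothed auto-power spectra
    p_lr = 0j             # recursively smoothed cross-power spectrum
    gains = []
    for l, r in zip(left_frames, right_frames):
        p_ll = smooth * p_ll + (1.0 - smooth) * abs(l) ** 2
        p_rr = smooth * p_rr + (1.0 - smooth) * abs(r) ** 2
        p_lr = smooth * p_lr + (1.0 - smooth) * l * r.conjugate()
        ic = abs(p_lr) / math.sqrt(p_ll * p_rr)  # between 0 and 1
        # Sigmoid mapping: coherent (direct) sound gets a gain near 1,
        # incoherent (diffuse, reverberant) sound is pushed toward min_gain.
        sig = 1.0 / (1.0 + math.exp(-slope * (ic - midpoint)))
        gains.append(min_gain + (1.0 - min_gain) * sig)
    return gains

# Identical left and right frames are fully coherent; a phase that rotates by
# 90 degrees per frame is used here as a crude stand-in for incoherent sound.
coherent = coherence_gains([1 + 0j] * 200, [1 + 0j] * 200)
rotating = coherence_gains([1 + 0j] * 200, [1j ** k for k in range(200)])
print(round(coherent[-1], 2), round(rotating[-1], 2))
```

The same gain would typically be applied to both ears in a given bin, so that the attenuation itself does not introduce an additional level difference between the two sides.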



Denoising

Noise reduction can be achieved with multichannel beamforming algorithms in hearing aids by exploiting the spatial separation between the target and noise sources (see the article by Andersen et al in this issue for a more thorough description of this process). However, modern hearing aids typically have single-channel noise reduction algorithms that estimate the short-term SNR in frequency bands and attenuate the bands with lower SNR.
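
A single-channel scheme of this kind can be sketched as a per-band gain rule driven by an SNR estimate. The example below uses a Wiener-style gain with a limit on the maximum attenuation. It assumes that a noise power estimate per band is already available (for instance from tracking signal minima), and the gain rule, names, and the 10 dB limit are illustrative choices rather than a description of any particular product.

```python
import math

def band_gains_db(band_power, noise_power, max_atten_db=10.0):
    """Wiener-style gain in dB per band from a short-term SNR estimate."""
    gains = []
    for p, n in zip(band_power, noise_power):
        snr = max(p - n, 0.0) / max(n, 1e-12)   # estimated linear SNR
        g = snr / (1.0 + snr)                   # Wiener gain in [0, 1)
        g_db = 20.0 * math.log10(max(g, 1e-6))
        gains.append(max(g_db, -max_atten_db))  # limit the attenuation
    return gains

# Band 1 is speech dominated (power well above the noise estimate),
# band 2 contains noise only and receives the maximum attenuation.
print(band_gains_db([11.0, 1.0], [1.0, 1.0]))
```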

A binaural approach to SNR improvement is proposed by Weile and Littau[48] and is already implemented in certain commercial hearing aids. In listening environments where one of the two hearing aids is exposed to noisier conditions than the other, as detected by SNR estimation on both sides, a higher gain is applied to the hearing aid with the better SNR. Despite the resulting distortion of the original ILD cue, this approach may be appreciated most by individuals who prioritize noise reduction. As depicted in [Fig. 10], the principle behind this method requires an exchange of the estimated SNR between both hearing aids.

Figure 10 Principle of a better ear-based noise reduction system. In this example, the estimated signal-to-noise ratio (SNR) is higher in the left hearing aid; therefore, more gain is applied to the signal on this hearing aid compared with the right hearing aid.

The method described above uses the SNR estimates in both hearing aids, exchanges these data between the two hearing aids, and then jointly adjusts the gain in both hearing aids. That is, it belongs to the binaural synchronization category described earlier. However, a similar overall effect can be achieved by two hearing aids operating independently of each other. For example, less gain for noise-dominated mixtures and more gain for speech-dominated mixtures can be achieved if the respective sound classes (programs) in each hearing aid allow for a mixed-mode operation (i.e., a seamless blend between parameters that have been optimized for speech-in-noise or comfort-in-noise).
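
The better-ear principle of Fig. 10 can be sketched as a gain offset derived from the difference between the two exchanged SNR estimates. The rule below is a hypothetical illustration: the cited system is not described at this level of detail, and the scaling factor, the limit, and the symmetric split of the offset are assumptions.

```python
def better_ear_gains_db(snr_left_db, snr_right_db, db_per_db=0.5, max_offset_db=6.0):
    """Return (left, right) gain offsets in dB favoring the ear with the better SNR."""
    # The difference is known on both sides once the SNR estimates are exchanged.
    diff = snr_left_db - snr_right_db
    offset = max(-max_offset_db, min(max_offset_db, db_per_db * diff))
    # Split the offset symmetrically so the summed gain change is zero.
    return (offset / 2.0, -offset / 2.0)

# Left SNR estimate of 8 dB versus 2 dB on the right: the left side gets more gain.
print(better_ear_gains_db(8.0, 2.0))
```

Because the offsets are applied with opposite signs on the two sides, the output ILD changes by the full offset, which is the cue distortion mentioned above.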



Amplification

To preserve the original input ILD at the output of their hearing aids, certain manufacturers perform a binaural exchange of the amplification parameters. In the study by Weile and Littau,[48] a system is described that ensures the outputs of both hearing aids preserve the original loudness difference that would be perceived at the input. It does this via continuous communication of sound level data between the left and right hearing aids. This processing can be performed in sub-bands, as shown in [Fig. 11]. This approach was shown to reduce the rate of front/back confusions for a broadband sound by 10.5% among a group of 12 listeners with moderate-to-severe hearing impairment.[49] However, no improvement was found in the left/right localization dimension.

Figure 11 Principle of a binaural amplification that preserves the original interaural level difference (ILD) in sub-bands in a pair of hearing aids.

Another approach presented by Groth[50] attempts to counteract the ILD distortion that occurs when the hearing aid that is exposed to the more intense sound level processes its signals with a stronger compression ratio than the hearing aid on the opposite side. To avoid such a distortion, the gain in the hearing aid with the less intense input signal is reduced to restore the original ILD. This method is said to emulate the auditory efferent pathway's inhibitory effects, whereby sound in one ear inhibits the outer hair cell activity in the opposite ear.[51] [52] In the article by Groth,[50] the results of a localization experiment conducted under laboratory conditions with 10 participants are reported. The aforementioned processing reduced the average localization error in the horizontal plane by approximately 5 degrees. Further investigation of the performance details is needed to understand the extent to which the short-term ILD cues can be preserved by the synchronized operation of the two hearing aids. In addition, more information is needed about what users require to seamlessly integrate these cues in the perceptual domain (preserved physical ILD cues or loudness-based ILD restoration). One direction to achieve a better/adjustable compromise between these extremes lies in more sophisticated monaural and binaural amplification schemes (e.g., optimized gain settings to preserve ILD cues during the parts of the signal that carry the primary information about the sound's location, as with the precedence effect).
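
The effect of independent versus synchronized compression on the ILD can be illustrated with a simple static compressor model. In the sketch below, linking means that both sides apply the gain computed from the higher of the two exchanged input levels, so the gain on the side with the less intense input is reduced and the input ILD is retained. Neither cited system is described at this level of detail, so the compressor parameters and the linking rule are assumptions.

```python
def wdrc_gain_db(level_db, threshold_db=50.0, ratio=2.0, linear_gain_db=20.0):
    """Static gain of a simple compressor: linear below threshold, compressive above."""
    if level_db <= threshold_db:
        return linear_gain_db
    return linear_gain_db - (level_db - threshold_db) * (1.0 - 1.0 / ratio)

def output_ild_db(level_left_db, level_right_db, linked):
    """Output ILD (left minus right) for independent or linked compression."""
    if linked:
        # Both sides use the gain computed from the higher of the exchanged levels.
        g_left = g_right = wdrc_gain_db(max(level_left_db, level_right_db))
    else:
        g_left = wdrc_gain_db(level_left_db)
        g_right = wdrc_gain_db(level_right_db)
    return (level_left_db + g_left) - (level_right_db + g_right)

# A 10 dB input ILD (70 dB SPL left, 60 dB SPL right) with a 2:1 compressor.
print(output_ild_db(70.0, 60.0, linked=False), output_ild_db(70.0, 60.0, linked=True))
```

With independent 2:1 compression the 10 dB input ILD is halved at the output, whereas the linked variant passes it through unchanged. The same logic could be applied per sub-band, as suggested by Fig. 11.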



Conclusions and Outlook

Binaural signal processing in hearing aids is a fascinating technological opportunity and has found several applications with proven audiological benefit for hearing-impaired listeners. The solution portfolio ranges from relatively easy-to-implement considerations, such as binaural instead of monaural fittings, to “event-driven” synchronization of the operating modes in the two hearing aids, to high-rate data exchange for synchronization of short-term gain values (ILD preservation or restoration) and full-band audio signal exchange of one or more microphone signals.

Scientific evidence for the beneficial effects of binaural signal processing is often provided for specific behavioral tasks in controlled environments (laboratories) where clear benefits have been shown for speech reception threshold, front/back confusion, and localization accuracy. Evaluation of real-world performance in daily wearing conditions may reveal additional aspects that are more challenging to characterize with numerical values or laboratory environments: listening effort, naturalness, seamless integration of the available (multimodal) information into one percept, etc. These aspects can play a critical role in identifying a user's preference for a particular solution in daily wearing situations. Individualized binaural processing strategies, matched to the individual residual binaural hearing capabilities, were shown to have clear benefits under laboratory conditions, but similar settings did not necessarily result in a preference in daily wearing conditions.

Collecting data about the listening environment, including how often and for how long the user is exposed to situations where binaural signal processing can be beneficial, will allow better fine-tuning and individualization of hearing aids to the user's needs (see the article by Hayes in this issue for a description of a hearing aid feature that collects this sort of data and for a study that reports on the real-world listening environments of a large group of users). However, this also requires a more in-depth and extensive conversation with the hearing aid user about their real-world experience and potential shortcomings, which will not always be possible. With today's increasing mobile support capabilities, a fruitful alternative approach might be to leave the situation-specific adjustment of the degree of binaural processing to the user. For example, it is conceivable that the machine-learning approach for user-selected gain adjustments described in the article by Balling et al in this issue could be applied to binaural processing.

For future algorithmic solutions that enhance or separate one or more speech signals from other sound sources (e.g., deep neural network-based solutions), the demand for high-rate synchronization or full-audio exchange between the hearing aids will increase. However, decisions made to achieve these goals will have to balance the current tradeoff between noise-canceling effectiveness and binaural cue distortion.



Conflict of Interest

None declared.

Acknowledgments

The authors would like to thank Stina Wargert and Stefan Launer for their contributions to this publication.


Address for correspondence

Peter Derleth, Ph.D., R&D
Sonova AG, Laubisrutistrasse 28
8712 Staefa
Switzerland   

Publication History

Publication Date:
24 September 2021 (online)

© 2021. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA


Figure 1 Interaural differences of time and level impinging on an ideal spherical head from a distant source. An interaural time delay (interaural time difference [ITD]) is produced because it takes longer for the signal to reach the more distant ear (left ear in this case): left side of figure. An interaural level difference (ILD) is produced because the head blocks some of the energy that would have reached the far (left) ear, especially at higher frequencies: right side of figure. (Figure adapted from Grothe et al.[10])
Figure 2 Interaural time differences (ITDs) in the low frequencies (red) can be used for localization, but ITDs for higher frequencies might be ambiguous (blue). This figure is similar to [Fig. 1]; however, the frequency dependency for ITDs can be seen.
Figure 3 Left panel: Low-frequency sound waves are not affected by the presence of the head (red), while high frequencies (blue) are attenuated by the head shadow. The red curves correspond to the curves with the solid line shown in [Fig. 1] (left) and the blue curves correspond to the curves with the dashed line in [Fig. 1] (right). Right panel: Frequency dependency of the interaural level difference for sound arriving from the right side (contralateral, 90 degrees) minus the left side (ipsilateral, 270 degrees) recorded on KEMAR. Especially at frequencies above 1 kHz there is a significant attenuation of 10 to 20 dB.
Figure 4 Dynamic interaural time differences and interaural level differences associated with movement of the head help resolve confusion regarding the front/back hemifield of a sound source.
Figure 5 (Left) Monaural processing with binaural synchronization. (Right) True binaural processing. P: parameters of left (L) and right (R) algorithm. Dashed lines show wireless data exchange. Thicker lines indicate full-bandwidth audio data, while the thinner lines indicate parameter exchange only.