Vignette 1: Introduction to locaR


Introduction to the locaR package.

Sound localization is a method for pinpointing the location of a sound source in three-dimensional space. It helps answer the question, where did that sound originate from? Answering this question can be valuable in the fields of ecology and behaviour, as knowing an animal’s location can provide important insights into various aspects of their biology. Sound localization has many applications outside of ecology (e.g. gunshot localization); the functions in this package should also work for these other applications, but have only been tested for localizing birds.

The locaR package implements the modified steered response power method of Cobos et al. (2011) to estimate a source location. This localization method relies on the different arrival times of sounds at different locations to estimate the source’s location in three-dimensional space. It is intended to analyze multiple time-synchronized recordings made with an array of widely spaced microphones. (Side note: other types of sound localization methods focus on estimating the direction towards an incoming sound using microphones in very close proximity, i.e. a few centimeters. This package does not do direction-of-arrival localization). Besides the actual localization, this package includes various functions to help manage and organize localization data, to help with detecting sounds of interest in a localization array, and to validate localization estimates.

To localize sound sources with this package, there are 3 basic data requirements:

  1. Recordings must be synchronized.

The synchronization of recordings is fundamental to accurate location estimates, since the localization algorithm estimates source locations based on the time-difference-of-arrival of sounds at different microphones. It is typically desirable to have microphones synchronized within 1 ms of each other. If microphones are not synchronized, the true time-difference-of-arrival cannot be estimated accurately.

  1. Microphone locations must be known (more accurate microphone locations will translate into more accurate source localization).

If microphone locations are not known, the time-difference-of-arrival cannot be accurately translated into source location estimates. Methods for estimating microphone locations include using a tape measure to measure (relative) microphone placement; taking many GPS points and averaging them; and using a survey-grade GPS. The latter option is the best one, as it can estimate microphone locations with an accuracy of a few cm.

  1. Microphones must be placed within earshot of each other.

Of course, if a sound is only audible at one or two microphones, there will not be sufficient information in the time-difference-of-arrival estimates to estimate the source location. Ideally, a signal should reach at least four microphones for localization in the x-y plane, or five microphones for 3D localization. This requirement determines appropriate inter-microphone distance, but there is no universal rule. For example, if localizing gunshots or wolf howls, which transmit over long distances, microphones can be spaced much farther apart than if localizing bird songs, which only transmit short distances. For songbird communities, I have found it best to space microphones by 35-40 meters. However, even within songbirds, different species’ songs transmit very different distances.

Achieving the above data collection requirements can be challenging in practice. At the present time, for example, most commercially available recording units are not capable of producing synchronized recordings. Current models (as of 2022) that are capable of doing this are the Wildlife Acoustics SM3 (with GPS attachment), the Wildlife Acoustics SM4TS (with GPS attachment), and the Frontier Labs BAR-LT. In the future, this list will surely grow. For example, there are currently plans to incorporate GPS synchronization into Audiomoth units. As technology develops, localization should become easier and more accessible.

Developing an intuition for sound localization.

I often say that sound localization is an art as much as a science. The reason for this is that the most accurate source localization estimates are achieved with careful attention to detail and some human involvement. Placing blind trust in the localization algorithm, without any human involvement, can sometimes lead to incorrect location estimates. On the other hand, by developing an intuition for localization, erroneous estimates can be identified and either removed or improved. The locaR package includes some tools for validating localization outputs, so users can decide how much effort to invest in validating results.

Validating results requires human involvement, but can increase data quality; accepting results without validation may increase error, but could dramatically increase data set sizes via increased automation. Regardless of which approach is preferred, it is strongly recommended that sound localization practitioners develop an intuition for localization so they can anticipate when localization is likely to succeed and when or why it may fail.

How localization works.

Given a sound source originating within an array of synchronized microphones, localization algorithms use cross-correlation to estimate the time-difference-of-arrival of the sound at each microphone. Cross-correlation simply involves sliding two signals past one another (along the time dimension) and assessing how similar they are to one another at each time step. When they are more similar to one another, the cross-correlation function reaches a higher value. When cross-correlating the same sound arriving at two different microphones, the peak of the cross-correlation function reveals the amount of delay from one signal to another.

If we have \(k\) microphones, we will be able to calculate \(k*(k-1)/2\) different cross-correlation functions. These cross-correlation functions give an idea of the relative time delay of the signal arriving at each microphone. Note that we can only ever estimate the relative delay of a sound at pairs of microphones, where the nearest microphone to the sound source has delay = 0. Once the cross-correlation functions have been calculated, and incorporating the speed of sound (which is known within a small margin of error), the algorithm can estimate the source location.

The following animation further illustrates the concept: