Vignette 5: Guide to locaR data structures


This vignette is intended to give users a better idea regarding the particulars of data organization in locaR, especially when using the localizeMultiple() function.

As described in the other vignettes, the localizeMultiple() function requires data to be organized in a specific way: recording sessions, or “surveys”, must be set up with a given folder structure. The folder structure is as follows:

So, for a given project, you might have a folder on your computer pertaining to that project, and within it several different survey folders, each corresponding to a different date and time (if you have different surveys occurring at the same date and time, e.g. at different sites, put those in separate folders). As described in the “Detecting sound sources” vignette, there are several files that are read by locaR for localization. The ones that warrant further description are the coordinates, channels, adjustments, detections, and settings files.

Each of these files is a .csv file. The csv format was chosen so that each file can be easily edited in other programs (e.g. Microsoft Excel).

The coordinates file.

The coordinates file is simply a spreadsheet of spatial coordinates for all microphones. Personally, I just have one big master spreadsheet containing the coordinates for all localization projects, with one row per microphone location. I have found that this works best, because I can always be confident that the master spreadsheet contains the most accurate versions of the coordinates (sometimes we have one set of coordinates taken with a handheld GPS, then another more accurate set taken by a survey-grade GPS). Having a master set of coordinates simplifies things, and also means that I only need one coordinates file for all localization projects.

Here’s what a coordinates file looks like:

head(read.csv(system.file('extdata', "Vignette_Coordinates.csv", package = 'locaR')))
#>   Station Zone  Easting Northing Elevation
#> 1    Ex-1   12 371296.0  5934638  738.0433
#> 2    Ex-2   12 371335.9  5934642  738.5402
#> 3    Ex-3   12 371373.9  5934642  738.5402
#> 4    Ex-4   12 371297.5  5934599  731.2959
#> 5    Ex-5   12 371339.0  5934599  733.7477
#> 6    Ex-6   12 371377.5  5934601  737.7347

There are five columns. Station is the unique name for each microphone location. Zone is the UTM zone, which is not used by any aspect of locaR. Easting, Northing, and Elevation are the x-, y-, and z-coordinates, respectively, of each microphone location. There can be other columns (e.g. a “Comments” column, or “Latitude” and “Longitude” columns with decimal degrees), but these will be ignored.

It is imperative that the Easting, Northing, and Elevation coordinates are measured in meters.
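The requirements above can be checked before running a localization. The following is a minimal sketch, not part of locaR: `check_coordinates()` is a hypothetical helper, and the data frame simply re-types the first three rows of the example file shown above.

```r
# Hypothetical helper (not a locaR function): basic sanity checks on a
# coordinates table before it is used for localization.
check_coordinates <- function(coords) {
  required <- c("Station", "Easting", "Northing", "Elevation")
  missing_cols <- setdiff(required, names(coords))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (anyDuplicated(coords$Station) > 0) {
    stop("Station names must be unique")
  }
  numeric_cols <- c("Easting", "Northing", "Elevation")
  if (!all(vapply(coords[numeric_cols], is.numeric, logical(1)))) {
    stop("Easting, Northing and Elevation must be numeric (meters)")
  }
  invisible(TRUE)
}

# First three rows of the example coordinates file shown above.
coords <- data.frame(
  Station   = c("Ex-1", "Ex-2", "Ex-3"),
  Zone      = 12,
  Easting   = c(371296.0, 371335.9, 371373.9),
  Northing  = c(5934638, 5934642, 5934642),
  Elevation = c(738.0433, 738.5402, 738.5402)
)

check_coordinates(coords)

# Pairwise distances between microphones. If the coordinates are in meters,
# these should look like plausible microphone spacings.
mic_distances <- dist(coords[, c("Easting", "Northing", "Elevation")])
print(round(mic_distances, 1))
```

Looking at the pairwise distances is a quick way to catch coordinates that were accidentally entered in decimal degrees or feet, since the spacings would then be implausibly small or large.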

The channels file.

The channels file is constructed once per survey, and specifies which microphone to select from each recording unit.

head(read.csv(system.file('extdata', "Vignette_Channels.csv", package = 'locaR')))
#>   Station Channel
#> 1    Ex-1       1
#> 2    Ex-2       1
#> 3    Ex-3       1
#> 4    Ex-4       1
#> 5    Ex-5       1
#> 6    Ex-6       1

It is a very simple, two-column spreadsheet, in which the first column is named Station, and the second is named Channel. The Station column contains location names that must match names in the coordinates file. The Channel column contains 1’s or 2’s specifying the channel to use for localization. Channel 1 is the left channel, Channel 2 is the right channel.

If you are working with stereo data and want to select the right channel for some units and the left for others, specify this in the channels file by editing the desired rows. If you are working with mono data, the channels file is irrelevant. In that case, simply leave it empty (a blank version is created by the setupSurvey() function). That is what was done in the example provided in the “Intro to localizeMultiple()” vignette.
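As an illustration of the rules above, here is a small sketch that builds a channels table, switches one unit to the right channel, and checks it against the station names. The edit to Ex-3 is hypothetical, and the checks are my own suggestion rather than something locaR is documented to perform.

```r
# Station names from the example coordinates file shown earlier.
coord_stations <- paste0("Ex-", 1:6)

# A channels table like the example above: left channel (1) everywhere.
channels <- data.frame(
  Station = paste0("Ex-", 1:6),
  Channel = rep(1, 6)
)

# Hypothetical edit: use the right channel (2) for one unit only.
channels$Channel[channels$Station == "Ex-3"] <- 2

# Stations listed in the channels table but absent from the coordinates.
unknown_stations <- setdiff(channels$Station, coord_stations)

# Stations whose channel value is something other than 1 or 2.
invalid_channels <- channels$Station[!channels$Channel %in% c(1, 2)]

if (length(unknown_stations) > 0 || length(invalid_channels) > 0) {
  stop("Channels table has unknown stations or invalid channel values")
}
print(channels)
```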

The adjustments file.

Hopefully you will not need to create and use an adjustments file. It is only needed if you discover that the file names do not accurately reflect the actual start times of the recordings. I have had this happen with Wildlife Acoustics recordings on occasion. In such cases, the file name suggests a certain start time, but the real start time was one second later. My inspections tend to reveal that the file was otherwise synchronized; it was just named incorrectly. Fortunately, the error tends to be exactly one second, so correcting for that one-second difference brings the file back into synchrony and makes it usable. It’s a peculiar error that Wildlife Acoustics employees told me is related to a poorly calibrated GPS. Because I ran into this error frequently enough, I added functionality to locaR to deal with it. This seemed preferable to identifying and renaming files, since renaming could cause issues (for example, if I have backups of those same files, the names will no longer match).

Although adjustment files are not used in any of the vignettes, here is an example taken from one of my own projects:

head(read.csv(system.file('extdata', "Vignette_Adjustments.csv", package = 'locaR')))
#>                               Filename Difference
#> 1 TDLO-001-261_0+1_20200612$090000.wav          1
#> 2 TDLO-001-260_0+1_20200612$090000.wav          1
#> 3 TDLO-001-262_0+1_20200612$090000.wav          1
#> 4 TDLO-001-263_0+1_20200612$090000.wav          1
#> 5 TDLO-001-190_0+1_20200612$090000.wav          1

The first column, Filename, gives the original file name. The second column, Difference, indicates the amount, in seconds, by which the file name was incorrect. Positive numbers indicate that the actual start time occurred after the start time indicated by the file name, and negative numbers indicate that it occurred before. In the example data, all file names were exactly 1 second off: all of these files actually started at 9:00:01, but their file names indicate they started at 9:00:00. When an adjustments file is provided to the setupSurvey() function and other subsequent functions, locaR will automatically add the appropriate amount of white noise to the beginning of the recording (in this case 1 second) to bring the files into alignment prior to localization/visualization.

Again, it’s a peculiar problem, and hopefully not one you ever need to deal with!
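To make the meaning of the Difference column concrete, the sketch below recovers the actual start time from two of the example file names. It assumes the `YYYYMMDD$HHMMSS` stamp at the end of the file name, as in the rows shown above. The parsing is my own illustration and not how locaR reads the file internally.

```r
# Two rows from the example adjustments file shown above.
adjustments <- data.frame(
  Filename = c("TDLO-001-261_0+1_20200612$090000.wav",
               "TDLO-001-260_0+1_20200612$090000.wav"),
  Difference = c(1, 1),
  stringsAsFactors = FALSE
)

# Extract the date and time stamp (assumed naming pattern: ..._YYYYMMDD$HHMMSS.wav).
stamp <- sub(".*_([0-9]{8})\\$([0-9]{6})\\.wav$", "\\1 \\2", adjustments$Filename)

# Start time according to the file name.
nominal_start <- as.POSIXct(stamp, format = "%Y%m%d %H%M%S", tz = "UTC")

# A positive Difference means the recording really started later than named.
actual_start <- nominal_start + adjustments$Difference

data.frame(
  Filename = adjustments$Filename,
  Nominal  = format(nominal_start, "%H:%M:%S"),
  Actual   = format(actual_start, "%H:%M:%S")
)
```

For these rows the nominal start is 09:00:00 and the actual start is 09:00:01, matching the description above.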

The detections file.

The detections file is where information about each sound of interest is entered.

head(read.csv(system.file('extdata', "Vignette_Detections_20200617_090000.csv", package = 'locaR')))
#>   Station1 Station2 Station3 Station4 Station5 Station6 From  To F_Low F_High
#> 1     Ex-8     Ex-5     Ex-6     Ex-9     Ex-4       NA  0.8 1.1  2000   6500
#> 2     Ex-8     Ex-5     Ex-6     Ex-9     Ex-4       NA  1.9 2.2  2000   6500
#> 3     Ex-8     Ex-5     Ex-6     Ex-9     Ex-4       NA  2.8 3.1  2000   6500
#> 4     Ex-8     Ex-5     Ex-6     Ex-9     Ex-4       NA  4.2 4.5  2000   6500
#> 5     Ex-8     Ex-5     Ex-6     Ex-9     Ex-4       NA  5.0 5.3  2000   6500
#> 6     Ex-8     Ex-5     Ex-6     Ex-9     Ex-4       NA  6.1 6.4  2000   6500
#>   Species Individual Comments
#> 1    REVI          1         
#> 2    REVI          1         
#> 3    REVI          1         
#> 4    REVI          1         
#> 5    REVI          1         
#> 6    REVI          1

Various pieces of information specific to each particular sound are entered. The first six columns (Station1 to Station6) contain the Station (i.e. location) names of the microphones to use for that sound. These must match names provided in the coordinates file. If a column contains NA (or a blank), it will be ignored. Currently, locaR is designed to use up to 6 microphones for localization; adding more is unlikely to boost accuracy, and comes at a computational cost.

The From and To columns contain the start and end times of the sound of interest, in seconds relative to the beginning of the recording session. F_Low and F_High contain the lower and upper frequency limits of the sound of interest, in Hz (2000 and 6500 in the example above).

Those are the only columns that are actually used for localization. The Species, Individual and Comments columns are just for record-keeping purposes. My preferred approach is to write down the species, the individual of that species (numbered starting at 1), and any comments about that sound (e.g. if it is overlapped, if it might be outside the array, etc.).

If a row has no information in any columns except the Comments column, that row will be ignored. This can be useful for record-keeping when a sound source is outside the array.
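The rules in this section can be expressed as a few checks on a detections table. In the sketch below, the first two rows come from the example above, the third (comment-only) row is made up for illustration, and the station list `Ex-1` to `Ex-9` is an assumption about the full coordinates file. None of this is locaR code; it only mirrors the rules described in the text.

```r
# Rows 1-2 are from the example detections file; row 3 is a hypothetical
# comment-only row of the kind described above.
detections <- data.frame(
  Station1 = c("Ex-8", "Ex-8", NA),
  Station2 = c("Ex-5", "Ex-5", NA),
  Station3 = c("Ex-6", "Ex-6", NA),
  Station4 = c("Ex-9", "Ex-9", NA),
  Station5 = c("Ex-4", "Ex-4", NA),
  Station6 = NA,
  From     = c(0.8, 1.9, NA),
  To       = c(1.1, 2.2, NA),
  F_Low    = c(2000, 2000, NA),
  F_High   = c(6500, 6500, NA),
  Species  = c("REVI", "REVI", NA),
  Individual = c(1, 1, NA),
  Comments = c("", "", "Distant song, outside array"),
  stringsAsFactors = FALSE
)

# Assumed station names in the coordinates file (only Ex-1 to Ex-6 are shown above).
known_stations <- paste0("Ex-", 1:9)

# TRUE where a cell is NA or blank.
blank <- function(x) is.na(x) | trimws(as.character(x)) == ""

# Rows with nothing filled in except Comments are skipped.
data_cols <- setdiff(names(detections), "Comments")
comment_only <- Reduce(`&`, lapply(detections[data_cols], blank))
valid <- detections[!comment_only, ]

# Number of microphones named in each remaining row (NA or blank cells ignored).
station_cols <- paste0("Station", 1:6)
n_mics <- Reduce(`+`, lapply(valid[station_cols], function(x) !blank(x)))

# Station names used in detections but missing from the coordinates.
used <- unlist(valid[station_cols], use.names = FALSE)
used <- unique(used[!blank(used)])
unknown <- setdiff(used, known_stations)

# Time and frequency limits should be ordered.
stopifnot(length(unknown) == 0,
          all(valid$From < valid$To),
          all(valid$F_Low < valid$F_High))
```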

The settings file.

The settings file is the file that brings everything together in one place. It contains file paths to point towards the relevant data structures described above, as well as other relevant survey-specific information such as the temperature, assumed speed of sound, etc. An example:

read.csv(system.file('extdata', "Ex_20200617_090000_Settings.csv", package = 'locaR'), stringsAsFactors = FALSE)
#>            Setting                                  Value
#> 1   DetectionsFile Ex_20200617_090000_Run1_Detections.csv
#> 2  CoordinatesFile               Vignette_Coordinates.csv
#> 3   SiteWavsFolder                                       
#> 4  AdjustmentsFile                                       
#> 5     ChannelsFile        Ex_20200617_090000_Channels.csv
#> 6             Date                               20200617
#> 7             Time                                  90000
#> 8            tempC                                     15
#> 9       soundSpeed                                       
#> 10    SurveyLength                                      7
#> 11          Margin                                     10
#> 12            Zmin                                     -1
#> 13            Zmax                                     20
#> 14      Resolution                                      1
#> 15          Buffer                                    0.2

The first column contains the name of the setting, and the second column contains the value for that setting. The second column can be manually edited as desired, and this will affect the subsequent localization results.
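Because the settings file is a plain two-column table, looking up or editing a value in R is straightforward. The sketch below re-types the example settings shown above. `get_setting()` is a hypothetical helper, not a locaR function, and the change to Resolution is only an example edit.

```r
# The example settings, typed in as a two-column table of character values.
settings <- data.frame(
  Setting = c("DetectionsFile", "CoordinatesFile", "SiteWavsFolder",
              "AdjustmentsFile", "ChannelsFile", "Date", "Time", "tempC",
              "soundSpeed", "SurveyLength", "Margin", "Zmin", "Zmax",
              "Resolution", "Buffer"),
  Value = c("Ex_20200617_090000_Run1_Detections.csv", "Vignette_Coordinates.csv",
            "", "", "Ex_20200617_090000_Channels.csv", "20200617", "90000",
            "15", "", "7", "10", "-1", "20", "1", "0.2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return one setting's value, or NA if it was left blank.
get_setting <- function(settings, name) {
  value <- settings$Value[settings$Setting == name]
  if (length(value) != 1) stop("Setting not found exactly once: ", name)
  if (is.na(value) || value == "") NA_character_ else value
}

temp_c      <- as.numeric(get_setting(settings, "tempC"))
sound_speed <- as.numeric(get_setting(settings, "soundSpeed"))  # NA: left blank

# Example edit: coarsen the search grid from 1 m to 2 m cells.
settings$Value[settings$Setting == "Resolution"] <- "2"
resolution <- as.numeric(get_setting(settings, "Resolution"))
```

Keeping every value as text until it is needed avoids surprises such as the Time value losing its leading zero in a different way than expected when the column is read as numbers.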

The first five rows point towards the other files and folders. Date is an integer with 8 digits in the format YYYYMMDD. Time is a time with either 5 or 6 digits, either HHMMSS or HMMSS.

tempC is the temperature in degrees Celsius. soundSpeed is the speed of sound in meters per second. If soundSpeed is not defined, tempC will be used to define the speed of sound in air. If soundSpeed is defined, tempC is ignored.

SurveyLength is the length of the survey in seconds (recordings can be longer than the desired period of time to be surveyed, e.g. if you want to survey the first minute of a recording session). Margin is the amount of space around the outside of the array to search for sound sources, in meters. Zmin and Zmax are the amount, in meters, to search below the lowest microphone and above the highest microphone, respectively. Resolution is the size of each grid cell, in meters along each side, in the search grid.

Buffer is the amount of time, in seconds, to extract around each detection. This could be important when localizing very short sounds, because transmission delays could be longer than the duration of the sound, leading the sound to be missed on some microphones if the sound is not buffered.
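To give a feel for how these settings interact, here is a rough sketch using the example values. The speed-of-sound formula is a common textbook approximation and is an assumption on my part; locaR may compute it differently. The grid is likewise only an illustration of what Margin and Resolution mean along one axis, using the six example microphone positions shown earlier.

```r
# Example settings values (Time is stored without its leading zero).
date_setting <- 20200617L
time_setting <- 90000L
temp_c       <- 15
margin       <- 10
resolution   <- 1
buffer       <- 0.2

# Pad Time to six digits so that HMMSS and HHMMSS parse the same way.
start_time <- as.POSIXct(
  paste(date_setting, sprintf("%06d", time_setting)),
  format = "%Y%m%d %H%M%S", tz = "UTC"
)

# Assumed approximation for the speed of sound in air (m/s) from temperature.
sound_speed <- 331.45 * sqrt(1 + temp_c / 273.15)

# Eastings and Northings of the six example microphones shown earlier.
easting  <- c(371296.0, 371335.9, 371373.9, 371297.5, 371339.0, 371377.5)
northing <- c(5934638, 5934642, 5934642, 5934599, 5934599, 5934601)

# Grid cell positions along the x-axis: the array extent plus Margin on each side.
x_grid <- seq(min(easting) - margin, max(easting) + margin, by = resolution)

# Largest horizontal microphone spacing, and the travel time across it.
# Comparing this delay with Buffer shows why short sounds need buffering.
max_spacing <- max(dist(cbind(easting, northing)))
max_delay   <- max_spacing / sound_speed

c(sound_speed = sound_speed, max_delay = max_delay, buffer = buffer)
```

Halving Resolution roughly doubles the number of grid positions along each axis, so finer grids become costly quickly in three dimensions.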