Thumbnail image: Marky Mutchler
[Sarah Wagner] All right. Well, welcome to today’s webinar from the Cornell Lab of Ornithology. We will be discussing how to identify birds even when you can’t see them with a special focus today on the new Merlin Bird ID app feature Sound ID.
Before we get started, I want to recognize the lands on which Cornell University is located by reading the Land Acknowledgment Statement. Cornell University is located on the traditional homelands of the Gayogoho:no, the Cayuga Nation. The Gayogoho:no are members of the Haudenosaunee Confederacy, an alliance of six sovereign nations with a historic and contemporary presence on this land.
The Confederacy precedes the establishment of Cornell University, New York state, and the United States of America. We acknowledge the painful history of the Gayogoho:no dispossession and honor the ongoing connection of the Gayogoho:no people, past and present, to these lands and waters.
All right. The Cornell Lab of Ornithology is home to a community of researchers– community of researchers and supporters from around the world who appreciate birds and the integral roles they play in our ecosystems. Our mission is to advance leading-edge research, education, and citizen science that helps to solve pressing conservation issues and challenges.
This work, including today’s webinar, is funded primarily by people like you who choose to become members. If you enjoy today’s webinar, I hope you’ll consider becoming a member too by visiting birds.cornell.edu.
My name is Sarah Wagner. I’m on the visitor center team at the Cornell Lab. And I’ll be facilitating today’s conversation with three experts who helped develop the app. First, we have Drew Weber, the Merlin Project leader. Hi, Drew.
[Drew Weber] Hi, everyone. Excited to be here.
[Sarah Wagner] With us today is Grant Van Horn, a research engineer with Macaulay Library. Hey, Grant.
[Grant Van Horn] Hey, guys.
[Sarah Wagner] And our final panelist is Jessie Barry, the Program Manager for Macaulay Library.
[Jessie Barry] Hey. Greetings from Sapsucker Woods.
[Sarah Wagner] Pretty special. Many of us are still at home. [Laughs] Thank you all so much for taking the time out of your very busy schedules to talk about this new feature. This is very exciting.
Before we dive in, I have a few quick announce– hopefully quick announcements to make about how this is going to go today. Many of you are probably well familiar with Zoom at this point in time, but I’ll just go over a few things.
For our Zoom audience, live captioning is provided. Select the Closed Captioning button at the bottom of the Zoom window and turn them either on or off. I’ll ask our panelists a few questions just to get us started today, but then we will also want to hear questions from our live audience.
For those of you on Zoom, click on the Q&A button located at the bottom of your screen and type your questions into that window. We’ll be answering some questions verbally, and for others, we’ll be typing as quickly as we can our answers in there, which you’ll be able to see in the answered columns. You can always go to that column to see all of the things that have already been answered.
Zoom also has a chat window, and we’ll be using the Zoom chat for technical support and to share information and resources with you there. We have colleagues who are behind the scenes, responding to the Zoom Q&A and chat. So thank you, support crew.
We’re also streaming live to Facebook. If you’re watching on the Cornell Lab of Ornithology or eBird Facebook pages, you can send your questions to the comments there. But please be aware that we’ve seen a lot of spam attempts already today in the Facebook comments, so don’t click on anything that is not from the Lab of O. And with that being said, let’s dive in and get started.
So first, I just want to hear a little bit more about our panelists. Drew, Grant, and Jessie, would you each please introduce yourselves a little bit further? What’s your background and your role at the Lab? And how does your role connect to bird sounds? So we’ll start with Drew, and then we’ll go to Grant and then Jessie.
[Drew Weber] Yeah, so I’m Drew. I’m the coordinator for the Merlin Project. I’ve been birding ever since I was a kid. And honestly, it’s kind of a dream to be working at the Cornell Lab on projects like these. I’ve been working on apps and websites that use eBird data for probably over a decade now.
And for the past five years, I’ve been working on the Merlin team, helping coordinate the development of new features. Some of my favorites are Photo ID and being able to save birds directly to your life list for Merlin. And then also working on the global expansion of Merlin.
So we have packs that cover over 8,000 species in the world. So nearly everywhere you want to travel, you can download a bird pack and learn the birds of that area. I’m just really excited about being able to create apps that help people learn about birds and increase people’s appreciation of birds, and totally excited to be here talking to you about Sound ID today.
[Grant Van Horn] Yeah, OK. So I’m Grant Van Horn. I’m a machine learning researcher here at the Lab. So my background is in computer vision. I got started with birds actually really through Jessie, another panelist with us today, working on Merlin Photo ID. So I worked on that project really closely with Jessie and just got super passionate about kind of all things birds.
Along the way, I also have worked with the iNaturalist team. So if folks out there have used iNaturalist or Seek, I’ve also helped build a lot of the computer vision pieces that power those apps. Yeah, I’m basically just really passionate about taking sort of the latest and greatest from machine learning and applying it to the natural world.
[Sarah Wagner] Cool.
[Jessie Barry] Yeah. Hey, I’m Jessie Barry. And like Drew, I found my passion for birds when I was a kid. By the time I was 15, I was just set on creating a career that involved birds and connecting people to nature. So I was really fortunate to land at the University of Washington for my undergraduate years, studying ecology and evolutionary biology.
And through some field work opportunities, I had the chance to go down to Peru and borrow sound recording equipment from Macaulay Library. So there I was, in the heart of the tropics, trying to learn these birds that I couldn’t identify, hadn’t seen them for the first time, out there with a sound recorder. And that’s when I got hooked into the Lab and all these projects.
So 13 years ago, I joined the team at the Macaulay Library, and have just had an amazing opportunity to work with such a talented team. And really had the chance to start working on Merlin before we even had a name for the project. And we’ve had a lot of fun creating these tools for the birding community that are really working to advance bird conservation.
And this connection between engineers like Grant and other computer scientists and statisticians is really where I think this magic happens where we can collaborate to build amazing tools for conservation. So I’m just thrilled to be here at the Lab, and grateful for the wonderful team that I get to work with.
[Sarah Wagner] Fantastic. Thanks, everyone. I’m going to ask you again, Jessie– [Laughs] you have to speak back to back. But before we get into the details of Bird Sound ID, I want to go way back to some basics. So Jessie, can you start off with a brief summary of a very huge topic? As far as we know, why do birds sing?
[Jessie Barry] Yeah. That is a pretty loaded question. But basically, birds are singing to communicate with each other. And they give those songs to attract mates, to establish their territories. And songs are typically given by males, but we’re learning more recently that a lot of female birds also sing. You can also find really cool duets out there.
And in addition to songs, birds give a whole suite of different calls. And those calls have different contexts. So short vocalizations might just be saying something simply like, hey, I’m over here. Those contact calls. There’s also aggressive calls that might be given if an intruder comes towards a nest.
And there’s also alarm calls, which is a really cool suite of vocalizations that the birds are giving to alert the community to danger. So if there’s a predator nearby, you might hear a special alarm call just for that situation.
And these vocalizations are also tied to seasonal events. So in the US and Canada, we’re going to hear more birdsong during the breeding season, because that’s when birds are just really active, finding the mates, and setting up those territories. But as they get busier feeding the chicks, raising young, that’s going to be a point where the song starts to taper off.
And then you get to a low point in the year, kind of like where we are right now where many birds are molting. So they’re replacing their feathers. All birds have to do that once a year. And when they’re molting, they’re going to be quiet. They’re going to be in a spot where they’re less susceptible to be found by a predator.
So at this time of year, you can still hear some dawn chorus, that really early morning activity when the woods is just lighting up. And it’s so exciting. You can still catch that briefly in the morning at this time of year. And that is a magical moment, but also overwhelming to just figure out what is happening at that point.
[Sarah Wagner] Yes, the overwhelm. We will talk about that a bit as we go along. Drew, kind of another basic question, but key to what we’ll talk about today is, why is it helpful to be able to identify a species by their vocalizations?
[Drew Weber] Yeah, that’s a really great question. When I’m out birding, I’m constantly relying on my ears to let me know what is around, even before I see it. Some birds are really easy to see and some tend to hide out high in a canopy or deep in a marsh. And so being able to identify birds both by sight and by song allows you to have a more complete picture of what birds are actually around you.
It also means that you can focus your attention on finding birds that you don’t recognize. So potentially something that’s new for your trip list or new for your life list even. And so it helps you kind of target things that you haven’t heard before.
And it’s also a really easy and fun way to pay attention to the birds when you might be doing some other activity. So maybe you’re gardening or sitting outside at a wedding or something like that. You could totally be appreciating the birds that are around you, even without binoculars or being able to see the bird.
[Sarah Wagner] Right. It’s a new layer of information for your experience. Wonderful. It’s always helpful to sort of revisit those fundamentals before we dig in a bit more. How does birding by ear enhance your own birding, Jessie? What are some tips and tricks that you use?
[Jessie Barry] Yeah. So you know, I think I got started just learning the really common birds nearby. And what’s so fun is as you start to explore, like, OK, American Robin is in the backyard, Red-winged Blackbirds. And American Robin has maybe a dozen different vocalizations. And so it can be a really great starting point to just start to learn the birds in your neighborhood really well and get that framework down.
And then if you can figure out different families of birds, whether it’s by sight or sound, that’s really key, because you’re basically trying to build your own internal framework for narrowing down that list of possible species. So you’ve got 800 species in North America and you’re trying to get to that one species that you’re hearing at that moment.
So trying to break it down from, hey, all right. I got it’s a songbird, but is it a wren? Is it a warbler? Is it a sparrow? And that point where you kind of get to, OK. I think it’s a sparrow, but I’m not quite sure which one. That’s where the fun and the hard work and the fun work comes in, right?
So if I’m hearing something, I’m like, all right. I know that’s a sparrow, and perhaps I’ve dialed in to it’s a clear whistle. And I can even hear those words from this White-throated Sparrow: “Oh, sweet, Canada, Canada, Canada.” And so one of those tricks can be finding those mnemonics or those words that work for you or whatever it is to just help you remember the species.
And we know that mnemonics, there’s a very creative community out there. And we’d love to hear, dropping in the chat, what are some of those phrases that you might use to help remember birds nearby? Because that’s one tool. And we’re of course now thrilled that Merlin is available to support you in the field, but it’s also really important to learn that framework and have Merlin there as your coach.
[Sarah Wagner] Yeah. These are some really good ones in the chat. The Ovenbird: “teacher, teacher, teacher.” It can sort of overwhelm you if you don’t know that one and there are quite a few in your wooded area. What other ones do we have? Let’s see. Who knows the one for Warbling Vireos? [Laughs] Of course, there’s the “chickadee, dee, dee, dee.” Warbling Vireo is– who wants to say that other than me? [Laughs]
[Drew Weber] That’s all you, Sarah. Go for it.
[Sarah Wagner] All me? I don’t know if I can remember, Drew. Maybe you have to do it. [Laughs]
[Drew Weber] I’ll try to remember which is the right one.
[Sarah Wagner] It’s kind of a traumatic one [Laughs] because they do eat a lot of insects. So “I’ll see it and I’ll seize it and I’ll squeeze it till it squirts.” But yeah, “Phoebe,” that’s a good one. “Chickadee, dee, dee.” This person says that Blue Jays say “dead ant,” which I’ve never heard before. But whatever works for you is the take-home with mnemonics, I think. All right. Let’s see.
So now we’re going to dive in a little bit to thinking about the wall of sound that you can be overwhelmed with that we talked about a little bit earlier. So many of you might be familiar with that overwhelm that accompanies hearing birds singing at the same time. And it becomes really powerful to be able to untangle all of those different vocalizations. It can just be this cacophony of sound. So to try to parse that apart and untangle that can be really powerful. So Jessie is going to share an example soundscape with us.
[Jessie Barry] Yeah, because there’s something magical that happens when we get to experience a dawn chorus. And as Sarah was saying, there’s that challenge of figuring out, what are we listening to? And you might wish that you had an expert with you.
And that’s really what we’ve tried to do with Merlin, is bake in all the expertise of sound recorders across the world, experienced birders. And with machine learning, we can now kind of put that into the palm of your hand as if that expert is right nearby. So let’s listen to this chorus together.
[Sarah Wagner] Very exciting. So a little teaser of the Sound ID app for those of you who haven’t seen it yet or maybe you haven’t practiced it on a soundscape yet. So now we’re going to ask Drew to do a demo for us on how to go about using the Sound ID feature.
[Drew Weber] Great. Yeah. So maybe that was a fun refresher on what some of those birds sound like. Maybe you recognize them all, or maybe they were all new to you. But definitely the goal with Sound ID is to be able to give you the clues in the field while you’re listening to help identify these birds.
So I’m going to go through a couple of the main screens and just talk about them a little bit. It’s been super helpful to have all of your questions come in. So hopefully, I can address a lot of them right now. But keep the questions coming and we’ll see what else we want to address.
So when you start the Sound ID process, kind of the first thing that you’re seeing when you start it is the spectrogram rolling across the screen, right? So trying to animate this here. Just an idea of what the spectrogram is looking like, this is a visual representation of the birdsong. So if you’re not familiar with a spectrogram, the horizontal axis is time and the vertical axis is pitch, and the darkest areas are kind of the loudest part of the birdsong. And so it’s very– a visual representation of those actual signals that are coming from the bird.
And you can kind of read this like sheet music if you’re a singer or have been in band or anything like that. Higher pitches are higher. And yeah, it’s a great way to really understand what you’re hearing. And so on this screen, you’re basically seeing that live representation of the birds.
And then as the Sound ID starts recognizing or identifying some of these birds, these best matches will start showing up on the screen. And you can see here that Wood Thrush and Rose-breasted Grosbeak are both kind of lit up in yellow. And so what the app is doing is every time that it detects one of these species or it’s suggesting one of these species, it’s lighting up again. And so if you’re recording for 20 or 30 seconds and the Wood Thrush sings three times, you will see that light up each of those times that you have that live help of what is singing right now.
And that’s particularly helpful when you have a bunch of birds singing all at once and you’re trying to pick apart, which bird is the Wood Thrush? Which bird is the Rose-breasted Grosbeak? And how can I tell that apart from the American Robin?
So a couple other pieces of data that show up on the screen, you’ll see that there’s a checkmark, this blue checkmark for birds that you’ve observed or already are on your life list. And so those are species that you’ve added to your life list through Merlin or eBird. Rose-breasted Grosbeak is flagged as uncommon. And this is, again, data coming from eBird. So it’s birds that are infrequently observed in your area.
And then there’s also a rare icon that will show up sometimes. And that’s birds that are even less common than the uncommon birds. And so these are a really good indication that you might want to dive in more and really explore what they sound like and where they occur, and that sort of thing.
So when you stop your recording, you’ll actually be able to start exploring these best matches. You can tap on each bird to jump automatically to the spot in the recording where it was identified. So that’s really useful if you’re like, ah, man. It says there was a Black-throated Blue Warbler. I’m not quite sure. Where in the recording was that? Then you can tap on it and go directly to that spot.
You can also click these arrows on the right side of the screen, and that will show you all the reference audio for those species. And so if you’re thinking through that scenario, if you’re not quite sure whether what Merlin said was right, you can play your song, play the song that you recorded, and play the reference audio to see how well those match and get a better idea for the variation of the bird you’re trying to identify.
Each of the species generally has a couple different recordings, so it’s good to explore a couple versions of the songs. And some species are highly variable in the songs that they sing, and others kind of sing the exact same thing every time. So that’s important to explore as well.
Sometimes you’re interested in just the particular song that you recorded, and so you want to focus your identification on just a small segment of the recording. And you can do this in Merlin by long-pressing on the screen and dragging to select a few seconds. And then Merlin will give you an answer– give you suggestions just for that selection.
So that can be really useful. If there are a lot of birds singing and you’re particularly interested in one specific song that you recorded, you can get answers for just that rather than seeing all the best matches for the entire recording.
So you can do a couple additional things with your recording besides just playing it and exploring the results. You can rename the file. So that’s just particularly helpful for finding it again, or if you export it.
You can change the location. So this is most useful for recordings imported into Merlin. When you start a new recording, Merlin tries to figure out where in the world you are so that it can give you the best results. But if you import it, it doesn’t have that information.
And then also, if you’re using Merlin offline, you can later match your location so it gives you the best suggestions. Sound ID does work totally offline. The model– the feature is running completely on your phone without internet connection, so it can provide those best matches.
However, to filter and target them for your location, it does need internet connection. So if you’re using Merlin without internet, you can always go back to– when you go back home or have internet again, you can assign the location, and it’ll customize the results for where you were birding. You can also change the date. And from that menu, you can also delete the recording.
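The location step Drew describes can be pictured as a filter applied on top of the model's raw suggestions. The sketch below is only an illustration: the function name `filter_by_region`, the scores, and the regional species set are hypothetical, and the transcript does not say how Merlin actually combines eBird data with the model output.

```python
def filter_by_region(suggestions, likely_species):
    """Keep only suggestions for species expected at the chosen location.

    suggestions: list of (species_name, score) pairs from a sound model.
    likely_species: set of species names expected for the location and date
    (hypothetical stand-in for an eBird-derived list).
    """
    return [(name, score) for name, score in suggestions if name in likely_species]


# Hypothetical raw model output before any location is assigned
suggestions = [("Wood Thrush", 0.91), ("Varied Thrush", 0.55), ("American Robin", 0.48)]

# Hypothetical list of species likely in the birder's region
likely_in_region = {"Wood Thrush", "American Robin", "Rose-breasted Grosbeak"}

filtered = filter_by_region(suggestions, likely_in_region)
print(filtered)  # [('Wood Thrush', 0.91), ('American Robin', 0.48)]
```

Because the filter runs after the model, a recording made offline can be re-filtered later once a location is assigned, which matches the workflow described above.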
And so maybe you made a really cool recording or it’s a bird you couldn’t recognize, or you just want to share it. There’s a couple different things you can do. You can basically, in both iOS and Android, there’s a Share icon. It’s different in Android, but it’s in the same location. You tap on that, and it will give you a variety of different options. So you can send a text message or export it to your File Manager app on your phone. Or if you have any of these cloud sharing apps, you can also use it there.
And so it’s super easy to kind of get the file off of your phone. And possibly, if you want to get it on your computer to edit it and then upload it to your eBird checklist, that’s also an easy possibility as well. All right. I think that was all the main pieces.
[Sarah Wagner] Drew, we’re getting a lot of questions on Facebook and Zoom about how to even get to this step. So–
[Drew Weber] OK.
[Sarah Wagner] If we could just like– maybe like an intro to where to find it within the app?
[Drew Weber] Sure.
[Sarah Wagner] Might need to do–
[Drew Weber] So I don’t have a screenshot for that. But when you first download Merlin– I should have started from the very beginning. Apologies for that. So when you first download Merlin or if you get the update, you’ll basically have four options on your home screen. The first is Start Bird ID. And that’s where you can go through a series of questions to figure out what bird you’re seeing.
The second one is Photo ID. And it’ll say Get Photo ID, which you’ll download. You tap that to start the process of getting the model that will identify birds from a photo. And then the third option is Get Sound ID. And so you’ll tap that, and it’ll walk you through downloading the machine learning model that does all the magic behind the scenes.
And so once that’s downloaded, you touch Sound ID, and then there’ll be a button that says Record or a Record button. And so any time you want to identify a bird, you can just go into Merlin, tap Sound ID, and start that recording. And you’ll jump straight to a screen that looks like this. So you have that live spectrogram.
[Sarah Wagner] Great. Beautiful. That’s super helpful. Hopefully that cleared up a lot of that confusion.
[Drew Weber] [Laughs]
[Sarah Wagner] So Drew, you mentioned the magic machine learning behind the scenes here. So now we’re going to ask Grant to explain the machine learning piece as well as he can to us of the Sound ID feature.
[Grant Van Horn] I know. Yeah. I’ll do my best. It’s not too difficult. Share this. All right. So yeah, how does this thing work? We’ll use this Chipping Sparrow as an example to walk through the Sound ID pipeline. So imagine this guy is vocalizing. This blue squiggle is the waveform representation of his song.
This waveform is what your phone captures when you’re recording using the microphone. So the microphone is responsible for converting sound waves into a digital signal. In this example, the blue squiggle is three seconds of audio and is converted into a digital signal that has 66,150 numbers. And they look like this. They’re just kind of a bunch of seemingly random numbers that happen to be between negative 1 and 1.
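To make those numbers concrete, here is a minimal sketch that builds a synthetic three-second waveform. The sample rate of 22,050 samples per second is an assumption inferred from 66,150 numbers over three seconds; it is not stated in the talk. The pure tone stands in for a real recording.

```python
import math

SAMPLE_RATE = 22_050  # assumed: 66,150 samples / 3 seconds
DURATION_S = 3


def synth_waveform(freq_hz=4000.0, sample_rate=SAMPLE_RATE, duration_s=DURATION_S):
    """Return a pure tone as a list of floats, each between -1 and 1."""
    n_samples = sample_rate * duration_s
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]


waveform = synth_waveform()
print(len(waveform))   # 66150
print(waveform[:4])    # the first few "seemingly random" numbers
```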
So the job of the Merlin Sound ID machine learning model is to somehow take these numbers and identify them as a vocalization from a specific species. In this case, hopefully the model will correctly say Chipping Sparrow and not one of these other species.
So we can dive a little deeper into the process of Sound ID. Starting from the audio waveform, we’re going to do a trick to make the problem easier. And this trick is to convert these 66,000 numbers into a picture. So this is a spectrogram representation of the audio waveform, and it’s essentially a picture of a sound. And this is what Drew was showing on the Merlin interface. So it’s useful for both machine and for humans.
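One standard way to turn a waveform into a spectrogram is a short-time Fourier transform: slice the audio into short overlapping frames and measure how much energy each frame has at each frequency. The sketch below uses only the Python standard library and a slow, direct Fourier sum for clarity. The frame size, hop, and test tone are illustrative assumptions, not Merlin's actual settings.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Short-time Fourier transform magnitudes.

    Returns a list of frames (the time axis); each frame is a list of
    magnitudes for frequency bins 0..frame_size // 2 (the pitch axis).
    """
    # Hann window reduces leakage caused by cutting the signal into frames
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Illustrative input: a 1,000 Hz tone sampled at 8,000 Hz
SAMPLE_RATE = 8000
samples = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(1024)]

spec = spectrogram(samples)
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
loudest_hz = loudest_bin * SAMPLE_RATE / 64
print(len(spec), len(spec[0]))  # 31 33
print(loudest_hz)               # 1000.0
```

Each row of `spec` is one time slice and each column is one pitch band, so drawing the magnitudes as pixel brightness gives the kind of picture shown in the app.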
And so why does this make the problem easier? Well, once we have a picture, we can use techniques similar to those that power the Photo ID model in Merlin, which can help you recognize over 8,000 bird species in photographs. And those techniques have been steadily refined over the last five or so years.
So our Sound ID model is based on computer vision techniques that analyze the spectrogram and then make a prediction from that. In this case, predicting Chipping Sparrow.
Now, Sound ID brings its own challenges that are different than Photo ID, with one of them being the challenge of predicting the presence of multiple species at once. So imagine all three of these guys are vocalizing. We capture a waveform which gets converted to a spectrogram, which is then analyzed by our Sound ID model. And now it needs to predict the presence of multiple species as opposed to just one.
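A common way to let a model report several species at once is to score each species independently and keep every score above a cutoff, rather than forcing a single winner. The sketch below illustrates that general idea; the raw scores, the 0.5 threshold, and the function names are assumptions, and the talk does not describe how Merlin's model does this internally.

```python
import math


def sigmoid(x):
    """Squash a raw score into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_species(raw_scores, threshold=0.5):
    """Score every species on its own and return all that pass the threshold."""
    probabilities = {name: sigmoid(z) for name, z in raw_scores.items()}
    return sorted(name for name, p in probabilities.items() if p >= threshold)


# Hypothetical raw model outputs for one spectrogram
raw_scores = {"Chipping Sparrow": 2.1, "Wood Thrush": 0.7, "American Robin": -1.5}

present = predict_species(raw_scores)
print(present)  # ['Chipping Sparrow', 'Wood Thrush']
```

Because each species is scored separately, the result can contain zero, one, or several species for the same stretch of audio.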
So up to this point, I basically described how Sound ID works on your phone, but that’s irrelevant if we can’t teach the model what birds sound like in the first place. And that teaching process relies on a high-quality, diverse training and evaluation data set.
For the initial launch of Sound ID, we focused on birds from the United States and Canada and constructed a data set of over 50,000 audio recordings uploaded by over 5,000 recordists to the Macaulay Library. For those listening that have uploaded audio recordings to Macaulay Library either through your eBird checklist or otherwise, I just want to say thank you. These recordings are crucial for the success of Merlin, and the project wouldn’t have been possible without them.
For every one of those recordings in our data set, a human expert annotates the precise location of each species vocalizing. This is what we’re seeing in this screenshot with all these boxes. And then it’s this data that actually becomes the teaching material for the Sound ID model, which ultimately gets deployed to your phone. And that in a nutshell is how Sound ID works.
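As a rough sketch of how box annotations could become teaching material, the code below turns time-stamped annotations into one label row per three-second window, with a 1 for each species heard in that window. The tuple layout, the window length, and the use of time only (ignoring the frequency extent of each box) are assumptions for illustration, not the team's actual pipeline.

```python
def window_labels(annotations, species, clip_len_s, window_s=3.0):
    """Build one multi-hot label row per fixed-length window.

    annotations: list of (start_s, end_s, species_name) boxes.
    species: ordered list of species names; defines the label columns.
    """
    n_windows = int(clip_len_s // window_s)
    labels = [[0] * len(species) for _ in range(n_windows)]
    for start, end, name in annotations:
        col = species.index(name)
        for w in range(n_windows):
            w_start, w_end = w * window_s, (w + 1) * window_s
            if start < w_end and end > w_start:  # box overlaps this window
                labels[w][col] = 1
    return labels


species = ["Chipping Sparrow", "Wood Thrush"]
# Hypothetical expert annotations for a six-second clip
annotations = [(0.5, 2.0, "Chipping Sparrow"), (2.5, 4.0, "Wood Thrush")]

labels = window_labels(annotations, species, clip_len_s=6.0)
print(labels)  # [[1, 1], [0, 1]]
```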
[Sarah Wagner] Very cool. Thank you so much, Grant. Great way of explaining it. We did have a few people asking for some publications on this material so they can dig deeper, so we’ve told them to go to your web page. [Laughs]
OK. So let’s talk about a bit more of the applied side. So how do we use Sound ID as another tool in our birding toolkit? So Jessie, do you want to answer that one?
[Jessie Barry] Sure. So you know, I mean, we really think of Merlin as a guide. It’s not the final answer, but it’s a way to get you closer to the list of possible species. So when you’re out in the field, it’s really important to have Merlin there with you, getting those hints.
But if you hear something you don’t know and you’re curious, it’s also great to just go track that bird down and see if you can get a look for yourself, because those birds that I remember most clearly, I really spent time trying to go find them and discover, what are they singing? Why are they there? And that’s what really helps things sink in through time.
And it is pretty fun when you get to that spot in Merlin where all those results come up, and you’re like, whoa. Blackburnian Warbler. I have never seen this before. This is incredible. And you’ve got a chance to peer up into the tops of probably some conifers when it’s singing and see that Blackburnian for the first time. It’s awesome, right?
And that’s another point where you can tap this is my bird and save that species that you just saw to your eBird life list. So we’ve got a really cool connection here between Merlin that’s helping people identify birds.
And eBird is gathering information from birders across the world and collecting that data. And we’re able to use that data that eBirders contribute and help everyone identify birds through Merlin. And as Grant mentioned, this is a really important system where the sound recordists who are uploading the material to eBird are making this possible so that we can build these tools.
[Sarah Wagner] Absolutely. Yeah, it’s always good to see a bird singing, to commit it to memory. But spectrograms can work in a similar way, that hearing it and seeing it at the same time. So great. Great comment.
[Jessie Barry] Spectrograms are really huge, right? That visual representation can solidify it in your brain, and it’s also, as Grant said, that’s how the computers are figuring it out too. So they’re really becoming our best tool to learn songs.
[Sarah Wagner] Right. A question I’m seeing a bit of is, Drew, is Sound ID sufficient for reporting a bird in eBird? If you don’t see it and you don’t know the song yourself, should you report it if it’s rare or infrequent?
[Drew Weber] That’s a good question. So for folks who aren’t already aware of what eBird is, eBird is a project run by Cornell Lab of Ornithology. It’s one of the largest biodiversity-related science projects with over a billion bird observations.
Bird watchers all over the world submit their sightings of complete checklists. So all the birds they see when they’re out on their birding excursion, along with their photos and audio recordings. And like we’ve mentioned, these sightings power Merlin’s ability to give you likely birds for your area as well as build these tools of Sound ID, Photo ID. So all that’s coming through that system.
So we really like to think of Merlin as your birding buddy. It’s there offering you clues, helping push you in the right direction. But it doesn’t do all of the thinking for you, right? It’s definitely there to help guide you to the right decision or the right identification. And the idea is that we have all the tools there to help verify what you heard or what you saw.
And really, with Sound ID, as you’re learning, it’s best to kind of follow up and confirm things visually. If you’ve never heard that song before or it’s maybe something that doesn’t quite match like you would expect, following up and actually seeing this thing like Jessie said, definitely the greatest way that I learned, the easiest way for me to learn when I was starting with my Sound ID journey.
It’s really important to be cautious. But as you get more comfortable identifying birds by sound, your repertoire of what birds you can identify and your kind of knowledge of the variation that each bird has will increase. Over time, you’ll get more confident, being able to take a bird that you just heard and add it to your list. But generally, it’s good to be very cautious as you’re starting.
[Sarah Wagner] Yeah. Speaking of caution and sort of the variability in birdsong and regional differences, Grant, you’ve been working really closely with this topic. What are some challenges for the Sound ID model?
[Grant Van Horn] Yeah. Let’s see. So yeah, so I can run through a few of these challenges that have come up. OK. So one challenge is just birds that sound very similar, right? With Red-eyed Vireo and Philadelphia Vireo being a really tough pair.
So even though we convert audio to spectrograms, that doesn’t necessarily make the task easy for similar-sounding species. I’ll just show some example spectrograms for these two birds. And you guys can try to find a visual pattern that really helps to distinguish them. I find these spectrograms very similar, and we’re still refining the Sound ID model to handle this situation appropriately, either by preferably making the correct suggestion or communicating back to the user that Merlin is not sure between the two.
Sort of another interesting challenge is mimics. So here, we see a spectrogram with a Blue Jay vocalization on the left and a Broad-winged Hawk on the right. In this case, the Blue Jay was imitating a hawk with its vocalization. And this phenomenon is actually captured in this plot, which attempts to visualize Merlin’s skill at distinguishing Blue Jay vocalizations from Broad-winged vocalizations.
So each dot in this plot represents a spectrogram with blue dots representing Blue Jays and red dots representing Broad-wings. We do see two distinct clusters here, meaning that Merlin has successfully learned to distinguish these species. But we also see some Broad-winged spectrograms incorrectly grouped with the Blue Jays. And these occur either because there’s actually both species vocalizing in that clip or because of mimic situations like you see at the top.
And we can actually generate a similar figure for all of the United States and Canada birds that Merlin currently knows about. So again, each dot in this plot is a spectrogram. And the cluster that you’re seeing represents Merlin’s ability to group different bird species together.
So the Golden-winged Warbler cluster is over here, shown in red. The Whimbrel cluster is down here. The Northern Bobwhite cluster is up here. And then here we have Northern Mockingbird. And we can see kind of another challenge for the Merlin Sound ID project, which is the task of learning diverse repertoires for species that have a lot to say, so to speak. So we can see the Northern Mockingbird examples mostly spread out in this plot as opposed to being grouped in a tight cluster.
We see a similar phenomenon with the Brown Thrasher with no noticeable tight cluster, as well as for Carolina Wren. So these plots don’t necessarily tell the whole story behind the performance of a single species, but they do provide insights into how the system is behaving as a whole.
If we’re really digging into the performance of Merlin, our team relies heavily on precision-recall curves. So recall, shown here on the x-axis, captures the frequency at which we are reporting species suggestions to the user, with a value of 1 meaning that we’re reporting suggestions for every bird sound that we hear.
Precision, shown on the y-axis, captures the accuracy of our suggestions, with a value of 1 meaning that we’re reporting completely accurate suggestions. So the best place on a precision-recall plot is this upper right-hand corner, which means that we’re providing accurate suggestions for all bird sounds.
So my goal as a researcher is to push this curve to the upper right-hand corner, knowing that there’s a potential trade-off in how responsive and engaging the app is to different bird sounds and how accurate those suggestions are. So this challenge of improving Merlin’s performance is tackled through additional R&D, collecting more recordings, and annotating those recordings to grow our data set and increase its coverage and diversity.
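To make the precision-recall trade-off described above concrete, here is a minimal sketch in Python. It is not the Merlin team's evaluation code. It assumes each candidate suggestion has a confidence score and a label saying whether that species was really present, and it uses the standard definitions: precision is the share of reported suggestions that are correct, and recall is the share of truly present species that get reported. The function name and the example numbers are hypothetical.

```python
def precision_recall_points(scores, labels):
    """Return (threshold, precision, recall) for each distinct score threshold.

    scores: confidence for each candidate suggestion (hypothetical values).
    labels: True if that species was actually present in the clip.
    """
    total_present = sum(labels)
    points = []
    # Lowering the threshold reports more suggestions: recall can only rise,
    # while precision may fall as weaker guesses are let through.
    for threshold in sorted(set(scores), reverse=True):
        reported = [label for score, label in zip(scores, labels) if score >= threshold]
        correct = sum(reported)
        precision = correct / len(reported)
        recall = correct / total_present if total_present else 0.0
        points.append((threshold, precision, recall))
    return points


# Hypothetical example: four candidate suggestions, three of them correct.
example_points = precision_recall_points(
    scores=[0.9, 0.8, 0.6, 0.4],
    labels=[True, True, False, True],
)
for threshold, precision, recall in example_points:
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Each point is one position on the curve. A strict threshold sits toward the upper left (accurate but quiet), and a loose threshold moves right (responsive but less accurate), which is the trade-off between responsiveness and accuracy mentioned above.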
[Sarah Wagner] Fantastic. Thank you so much. I would say one really important component of helping this to run smoothly is getting good recordings, right? So maybe Jessie and Grant can describe some recording best practices. What are some of the ways to get out there and get the absolute best recording that you can?
[Jessie Barry] Yeah. So when you’re out in the field, it can be great to pull out your smartphone, or if you’re even really engaged and you end up buying a microphone and a separate audio recorder, that’s a fantastic setup. But the general techniques apply to whatever kind of equipment you’re using.
And you want to be generally quiet, perhaps not near people who are too chatty. Also avoiding shuffling your feet. It can be surprising just how noisy we are, just being out in the field. So getting into that quiet mode, trying to find a spot where there’s not a lot of external noise. If you can avoid being right next to a highway or something like that, that’s going to really help your recording.
And then finally, getting close to the bird. You want to approach the bird to that point where you’re not going to disturb it and it’s not going to take off and you miss your recording anyway. So being sensitive to how close is wise. And the closer you are, you will get a stronger signal, and that’s going to boost the quality of your recording. And those are some basic tips. Yep.
[Sarah Wagner] Yeah. Great. Grant, do you want to show us another trick?
[Grant Van Horn] Yeah. So another really cool thing you folks can do is grab an external mic. So an external mic can be plugged into the app, and the app should just work– plugged into your phone, and the app should work just fine. And this is actually one of the best ways to improve the performance of Sound ID.
So here’s an example set up where we’ve plugged in a small Rode mic to an iPhone. And then so here’s a photo of my dad using it in our backyard. So an external mic does a few things to improve Sound ID.
So first, it’s likely to be a better mic than the built-in microphones on your phone, which are optimized for different use cases than just recording bird sounds. And so this could allow you to capture much higher quality recordings just right out of the gate. And so these higher quality recordings are easier for Merlin to classify correctly.
Secondly, an external mic like this Rode one is directional, allowing you to point the mic in the direction that the bird is singing, and potentially reduce the amount of distracting background noise. Again, making it easier for Merlin to help you identify the bird. So if you want a really simple way to just sort of increase the performance of Sound ID, it’s grab one of these mics and plug it in and go outside and explore.
[Sarah Wagner] Very cool. Thank you, Grant. OK. We have some time left, so let’s start taking some of all of these questions we’re getting from our live audience.
So here’s one I see coming up a bit. It looks like people often get the message, Merlin could not download likely birds. Results might not be accurate. So when that comes up on their screen, they’re asking, should they just ignore it? Is there a remedy? What should they do?
[Drew Weber] Yeah. So basically, that means that either Merlin wasn’t able to get your location in the time that it was expecting to or you’re offline and Merlin needs that connection to the eBird database to determine the likely species in your area. So the app can function completely offline as far as suggesting best matches. So it’ll provide its guess without that knowledge of where you are actually in the world.
And so it’s important to look at those best matches with that in mind, right? So there may be some best matches that just don’t occur in your area. So you can dive in and look at the map, just to make sure, and the description to better understand whether this bird can occur in your area.
But really, what you want to do is when you get back to your internet, you can use the Edit Location option from the playback screen and set your location. And then Merlin will download that list of likely birds, and then you have a much more customized list of best matches that occur in your area.
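The role of the likely-birds list can be sketched roughly as follows. This is an illustration only, not how the app is actually implemented. It assumes the best matches arrive as (species, confidence) pairs and that the likely-species list is a plain set of names, or None when the phone is offline. All names here are hypothetical.

```python
def split_by_likely_species(best_matches, likely_species):
    """Separate matches expected in the area from ones to double-check.

    best_matches: list of (species, confidence) pairs from the sound model.
    likely_species: set of species expected at this location, or None when
        the list could not be downloaded (for example, when offline).
    """
    if likely_species is None:
        # No location knowledge: every match needs a manual range check.
        return [], list(best_matches)
    expected = [m for m in best_matches if m[0] in likely_species]
    unexpected = [m for m in best_matches if m[0] not in likely_species]
    return expected, unexpected


# Hypothetical matches and a hypothetical local list.
matches = [("Green Heron", 0.82), ("Black Scoter", 0.41)]
local_list = {"Green Heron", "Blue Jay"}

expected, unexpected = split_by_likely_species(matches, local_list)
print("Expected here:", expected)
print("Check the range map:", unexpected)
```

With the list available, an out-of-range guess is easy to flag. Without it, every match lands in the second group, which is why the offline warning suggests checking the map and description.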
[Sarah Wagner] Right. That makes sense. It looks like maybe Drew answered this in the chat, but I think it’s a really, really fun one to think about. What regional accents– what about regional accents? So how do you guys account for– how do you account for that in your sampling, in your design?
[Drew Weber] Grant, do you want to take that one?
[Grant Van Horn] Yeah. Yeah, I can try to tackle this one. So yeah. So this kind of falls under the umbrella of generalization in machine learning, which is one of the hardest topics in the field, and that’s just getting a model to work. Maybe we train it in Ithaca, New York, and we want it to work well elsewhere in the country. And that’s just a really tough problem.
One of our best remedies at it right now is actually tackling it from a data perspective. And this is why those recordings from our community are so important is we actually just try– when we sample audio recordings for a particular species, we try to sample broadly and cover their full range so that we get that variation in the data itself. And it gives Merlin the best chance to kind of just learn those representations out of the gate.
And then all this is a continual cycle. So we train Merlin. We evaluate it. We see where it’s not doing well, and we try to remedy that either through additional R&D on the model side or on data augmentation by getting more recordings, getting more annotations.
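The idea of sampling broadly across a species' range can be sketched as a capped, per-region draw. The transcript only says that recordings are sampled to cover the full range. The region field, the per-region cap, and the function name below are assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_across_range(recordings, per_region, seed=0):
    """Pick up to `per_region` recordings from every region a species occurs in.

    recordings: list of dicts with a hypothetical "region" key.
    per_region: cap per region, so well-recorded areas do not dominate.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_region = defaultdict(list)
    for recording in recordings:
        by_region[recording["region"]].append(recording)

    sample = []
    for region in sorted(by_region):
        group = by_region[region]
        sample.extend(rng.sample(group, min(per_region, len(group))))
    return sample


# Hypothetical catalog: many recordings from New York, few from elsewhere.
catalog = (
    [{"id": f"ny-{i}", "region": "New York"} for i in range(5)]
    + [{"id": "ca-0", "region": "California"}]
    + [{"id": f"tx-{i}", "region": "Texas"} for i in range(2)]
)
training_sample = sample_across_range(catalog, per_region=2)
print([recording["id"] for recording in training_sample])
```

Capping each region keeps a heavily recorded area from crowding out regional variants, which is one simple way to get the variation into the data itself.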
[Sarah Wagner] All right. Thank you for that. I think I have– let’s see. Anybody see any other really good ones?
[Drew Weber] Yeah, I saw some questions about whether the recordings that people make become part of the training data for future versions of Merlin. And right now, there’s no kind of direct connection to send these files back to Cornell, so everything’s happening on your phone and staying on your phone. Yeah, there’s no pipeline of data being piped back to the Cornell Lab.
If you’re interested in contributing your recordings, you can upload them through ebird.org to your list, either the list you saved with Merlin or the one you saved with eBird if you use eBird. And then those become available for future versions of the model.
[Sarah Wagner] Fantastic. So there is a way. [Laughs] Here’s another one that maybe is a good fit for Grant. With roughly 50,000 recordings total and 800 species in North America, that means roughly 50 recordings per species. How few recordings are required to obtain an accurate model for a specific species?
[Grant Van Horn] That’s a good question. It’s really variable. Yeah. So some of these birds that just kind of have small repertoire, they’re really consistent with their vocalizations, you can get away with– yeah, quite a low data sample.
And then the other end of the spectrum is stuff like the thrasher, mockingbirds. These things that are really diverse with their repertoires and they’re also trying to mimic other species. Yeah, 50 is probably not even enough. We want to just keep growing that data set over time and just keep showing Merlin these examples.
But yeah, a lot of it is– so one important distinction with the Sound ID project is we are both building a training data set to teach Merlin as well as sort of another testing data set that we use to quiz Merlin to see, how well is it doing? And so part of our development cycle is really monitoring the performance of birds and making sure that that test set has a good coverage of the repertoire. And Merlin needs to be performing well enough on that test set before we’re comfortable kind of releasing it to the community and letting folks see results for it. So yeah, it’s an iterative process.
[Sarah Wagner] Great. Thank you. It sounds like this one has come up quite a bit too. How many birds can Merlin accurately identify in a three-second span? During spring migration, the overlap of songs seems to be super difficult.
[Drew Weber] That is a super challenging problem when there’s a dawn chorus with a ton of birds singing. I’m not sure if we’ve done a specific test to see how many birds Merlin can identify at once, but we did set it up so that it can identify multiple species kind of singing at the same time. And so something we can constantly improve as we have more data. But Grant, I don’t know if you have any idea for kind of the max overlap that we might be able to pull out good signal from.
[Grant Van Horn] Yeah. There’s no enforced limitation on the number of birds it could predict in any given three seconds. Yeah, it’s more based off of its confidence. So if it’s confident on 10 species in a three-second interval, we’ll certainly show those results. But yeah, as soon as you start getting a lot of birds overlapping, it just becomes really difficult with a single kind of microphone setup.
Humans are a little different because we have two ears, so we can start doing sort of directional and source separation-type stuff. Merlin’s not doing any of that currently, and so it’s a little bit handicapped in kind of its representation of the world. But yeah, that’s the stuff that we’ll continue to work on and iterate on and see what we can do.
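The point that there is no fixed limit on species per window, only a confidence requirement, can be illustrated with a small sketch. It assumes the model produces an independent confidence per species for each three-second window. The 0.5 cutoff and all names are hypothetical, not values from the app.

```python
def confident_species(window_scores, threshold=0.5):
    """Return every species whose confidence clears the threshold.

    window_scores: dict mapping species name to confidence for one
        three-second window (hypothetical values).
    There is no cap on the number of species returned; only the
    threshold decides what is shown.
    """
    confident = [
        (species, score)
        for species, score in window_scores.items()
        if score >= threshold
    ]
    # Show the most confident species first.
    return sorted(confident, key=lambda pair: pair[1], reverse=True)


# Hypothetical dawn-chorus window with overlapping singers.
window = {
    "American Robin": 0.93,
    "Red-eyed Vireo": 0.71,
    "Philadelphia Vireo": 0.22,
    "Northern Cardinal": 0.64,
}
print(confident_species(window))
```

Because each species is judged on its own score, three overlapping singers can all be reported from the same window, while a weak score such as the vireo lookalike here is simply left out.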
[Sarah Wagner] I don’t know that this is an answer– I mean, it depends on the kind of device you’re using to make your recording, right? But when using Sound ID and making a recording, how should you orient your phone? [Laughs]
[Drew Weber] Yeah, so when you’re just using your own phone, those microphones are not very directional. And so probably pointing the microphone that’s at the bottom of the screen will help a little bit, but it’s not something that we’ve seen a strong correlation with. It seems like however you hold your phone, it doesn’t make much of a difference. However, if you do have a mic like a shotgun mic like Grant was showing, then it really helps to point it directly at the bird you’re interested in.
[Grant Van Horn] Yeah. And then just to throw on another thing there, one thing we have noticed is if you have a case on your phone, you might experiment with taking your phone out of the case and just seeing if that improves the types of recordings that you’re capturing and the responsiveness of Merlin to the environment you’re in.
[Sarah Wagner] If you’re in a safe place [Laughs] to take it–
[Grant Van Horn] Exactly. Don’t drop your phone after you take it out of your case.
[Sarah Wagner] Yeah. That would be me. Let’s see. This person says Merlin Bird ID identified a sound of a Green Heron in my yard, yet I live about a mile from the closest lake. Is that possible?
[Jessie Barry] Yeah, definitely. So Green Herons are pretty cool in that they nest in trees which can sometimes be a little distance inland from the nearest pond. So you will see them moving around, commuting back to deliver fish to those young. And they can be surprisingly far away from water.
[Sarah Wagner] Yeah. I was at a birthday party far away from the lake or water last weekend, and I got very distracted by a Green Heron flying over.
[Jessie Barry] Yeah. And you’ve also got to watch out for Merlin from time to time, because sometimes it is off base. So beware of the occasional rogue things. Like if a Black Scoter turns up and you’re like, I’ve never seen a Black Scoter here ever in my life, it’s probably Merlin’s mistake. If it’s a Trumpeter Swan, you’re like, they just don’t live here.
Particularly some of those waterfowl, they have weird vocalizations. There’s not a lot of data yet. If the phone happens to be offline, that’s where we kind of run into challenges where Merlin can lead us astray. So be ready for that.
[Sarah Wagner] Yeah. You’ve got to be diligent in your data collection still. Let’s see. Oh, this is an interesting one. Can you upload a sound file from Merlin to ID, or do you have to be–
[Drew Weber] Yeah, on your phone if you used your voice recording app to record a birdsong, you can actually share it from that app directly to Merlin. So if there’s a Share option in that app, you’ll see Merlin as an option to open it up. And you can also directly from Merlin use the Import feature to pull things in from your file system. So you can import songs that you’ve recorded elsewhere to verify what it is.
You can also just play it on your computer while you record with Merlin. It’s not as good, but generally it’s still going to provide you the answer you’re looking for.
[Sarah Wagner] Great. I’m seeing another question that is helpful, and I think there’s a good answer. How can we let you know if we are sure that Merlin has made a mistake?
[Drew Weber] Yeah. So right now, the easiest way to kind of let us know that the best– none of the best matches match the bird you’re looking for is at the bottom of the list of options, there is a Report No Matches button. And so that sends us some data so that we can kind of dig into that in the future.
Further down the road, we hope to more tightly integrate how you can upload recordings directly from Merlin to the Macaulay Library. And so when that’s available, that’ll be another way for you to submit some of these recordings that didn’t get the match you were looking for.
If you’re confident on what the bird is, it would be fantastic for you to upload it to an eBird checklist, because that is potentially data that we do not currently have in Macaulay Library. It’s either a unique song or a variation and can help us expand how well Merlin works.
[Sarah Wagner] Absolutely.
[Grant Van Horn] I’ll second that one. If you can find good examples that we’re making mistakes on, yeah, I would love to have that data.
[Sarah Wagner] Yeah. Give us those data. OK. We’re almost out of time, but I see two really good ones I want to ask. Has Merlin helped any direct research or protection of birds and their habitats? So that’s a more general Merlin question, I guess.
[Jessie Barry] Yeah. I mean, absolutely. And I think we’re just at the starting point of what might be possible in tools like this that can really help monitor species in a landscape. And the Merlin algorithms certainly apply to some of those challenges.
And I think we’re also still hearing these stories from the community where Merlin is helpful. And I think part of the magic of this whole system is that it’s a community-driven project. The data that is powering Merlin is contributed by eBirders. Those sound recordings that are coming in and archived in the Macaulay Library are making that possible.
And the organization, the Lab is committed to having Merlin and eBird be free resources for the community. So we’re not putting the barriers in place to use the data for research and conservation. We want the data to be accessible and supporting those projects.
And this whole notion that the data is available, the software is there is possible because of the supporters from the Cornell Lab of Ornithology. And we’re just so grateful to our members who are really making this all possible, because those conservation impact stories are there, and they’re coming. And thank you all for being part of that.
[Sarah Wagner] Fantastic. OK. For our final question, can you tell us what’s next for Merlin Sound ID? A lot of people are asking about other regions and what sort of expansion we can expect.
[Drew Weber] Yeah. So I touched on this a little bit. We plan to allow you to basically record a bird and then add it to your observation so you can start collecting the birds you’ve recorded.
We’re also planning to expand to new parts of the world. Merlin as a whole supports over 8,000 species. Photo ID supports all of those species. And we hope to get eventually there for audio as well. Audio’s a lot harder, but we’re working on expanding with looking at Europe and South America and various parts of the world coming online in the next year.
Also, a big push to further expand what we support in North America. So right now for US and Canada, we have 458 species, but there’s still some additional species we want to put some focus on, make sure they’re working really well before we add them to the model. And so we’re actively working on that right now.
[Sarah Wagner] Very, very exciting. Thank you to our live audience for all of these fantastic questions. This is one of the largest groups we’ve ever had. So there’s a lot of really good enthusiasm out there, and that’s really hopeful and fun, and we’re really grateful to all of you. Drew and Jessie and Grant, thank you so much for taking the time out of your busy schedules to join us today and to share all of this fantastic information about this new Sound ID feature.
So just a few announcements for sort of last-minute stuff that I’ll do today and tomorrow for our participants. I’ll be emailing our Zoom attendees tomorrow with the recorded webinar and all of the resources that we’ve talked about. So any time someone asks a question and we give out a resource, we’ll include that in this email.
This webinar is part of a series. We’ve been spotlighting programs and research from around the Cornell Lab to help you get a better idea of what we do at the Lab, our research, and the issues that birds face around the world.
If you enjoyed today’s program, please consider becoming a Cornell Lab member. And that’s it for today. Thank you so much to Grant and Jessie and Drew, and to our fantastic audience.
[Grant Van Horn] Yeah. No, thank you, Sarah. This was great. It was really fun.
[Sarah Wagner] Thanks so much.
[Jessie Barry] Thanks for joining us. Take care.
[Sarah Wagner] Take care.
End of transcript
Have you ever been mystified when hearing a bird you can’t see? Our Merlin Bird ID app now features amazing Sound ID—join our experts to discover how to use this powerful new tool. During this free webinar, the Merlin team will share how citizen science and machine learning combined to create Sound ID. They’ll also provide practical advice for how to bird by ear. Come join the conversation and learn how Merlin can help you better recognize birds by sound.