Hello again everyone! It’s Yifan here with the songbird project. Like my other colleagues, I attended the 4th of July parade in Ann Arbor, which was very fun. I made a very rugged cardinal helmet that looks more like a rooster hat, but I guess a rooster also counts as a kind of bird, so that turned out just fine.
Anyways, since the last blog post, I have shifted my work emphasis to the user interface. After some discussions with my supervisors, we’ve decided to change the scheme a little. Instead of using machine learning to detect onsets in a recording, we are going to build an interface that lets users select an appropriate volume threshold for pre-processing. Then we will use our machine learning classifier to classify the interesting clips in more detail.
Why threshold based on volume, one might ask? Well, volume is the most straightforward property of sound. During Tech Trek, a kid asked me a very interesting question: when you are detecting birds in a long recording, how do you know the train sound you ruled out as noise isn’t a bird that just sounds like a train? Although that particular case is easy for a human to judge, we should still give users the freedom to keep what they want from the raw data. Hence, I’ve developed a simple mechanism that lets every user decide what they want and what they don’t want before classifying.
This figure is a quick visual representation of a 15-minute field recording after being processed by the mechanism I was talking about. As you can see, there is a red line in the first plot. That is the threshold for the user to define: anything louder than the line is marked as “activity”; anything quieter is marked as “inactivity.” The second plot shows activity over time. However, a single activity, like a bird call, might contain long silent periods between calls. In order not to count those as multiple activities, we have a parameter called the “inactivity window,” which is the minimum silent time required between two activities for them to count as separate.
In the figure above, the inactivity window is set to 0.5 seconds, which is very small. That is why you can see so many separate spikes in the activity plot. Below is the plot of the same data, but with an inactivity window of 5 seconds.
Because the inactivity window is larger now, smaller activities are merged into longer continuous ones. This can also be customized by users. After this preprocessing step, we chop up the long recording based on the activities and run the smaller clips through the pre-trained classifier.
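For the curious, here is a minimal sketch of the thresholding-plus-merging idea in Python. The function name, parameters, and toy signal are all my own illustration, not the project’s actual code: it marks samples louder than a threshold as activity, then merges activities separated by less silence than the inactivity window.

```python
import numpy as np

def detect_activities(samples, rate, threshold, inactivity_window):
    """Return (start, end) times in seconds where |amplitude| exceeds
    `threshold`, merging activities separated by less than
    `inactivity_window` seconds of silence."""
    active = np.abs(samples) > threshold      # per-sample activity mask
    idx = np.flatnonzero(active)              # indices of loud samples
    if idx.size == 0:
        return []
    gap = int(inactivity_window * rate)       # max silent gap, in samples
    # Split wherever consecutive loud samples are farther apart than the window
    splits = np.flatnonzero(np.diff(idx) > gap) + 1
    segments = np.split(idx, splits)
    return [(float(seg[0] / rate), float((seg[-1] + 1) / rate))
            for seg in segments]

# Toy example: a 5-second, 1 kHz "recording" with two loud bursts 2 s apart
rate = 1000
samples = np.zeros(5 * rate)
samples[1000:1200] = 0.8    # burst at t = 1.0-1.2 s
samples[3200:3400] = 0.9    # burst at t = 3.2-3.4 s

print(detect_activities(samples, rate, 0.5, inactivity_window=0.5))
# small window: the two bursts stay separate
print(detect_activities(samples, rate, 0.5, inactivity_window=5.0))
# large window: the bursts merge into one long activity
```

With the 0.5-second window the two bursts are reported as separate activities; raising the window to 5 seconds swallows the 2-second gap between them and reports one continuous activity, just like the two plots above.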
Unfortunately, my laptop completely gave up on me a couple of days ago, and I had to send it in for repair. I would love to show more data and graphs in this blog post, but I’m afraid I have to postpone that to my last post. Anyways, I wish the best for my laptop (as well as the data on it), and see you next time!
Hey everybody, it’s your favourite Neurorobot project once again, back with more exciting updates! I went to my first knitting lesson this week at a lovely local cafe called Literati, and attended the Ann Arbor Fourth of July parade dressed as a giant eyeball with keyboards on my arms (I meant to dress as “computer vision,” but I think it ended up looking more like a strange Halloween costume).
Oh wait… did you want updates on the Neurorobot itself? Unfortunately, it’s been more snags and surprises than significant progress; one of the major hurdles we have yet to overcome is the video transmission itself. (I did, however, put huge googly eyes on it.)
The video from the Neurorobot has to first be captured and transmitted by the bot itself, then sent flying through the air as radio waves, received by my computer, reassembled into video, loaded into program memory, and processed; only then can I finally give the bot a command to do something. Every part of this process incurs a delay, some small, some big, but the end result so far is about 0.85 seconds.
(A demo of how I measure delay: the difference between the stopwatch in the bot’s recording and the one running live on my computer)
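The stopwatch trick measures the total delay; to find out where the time goes, you can also wrap each stage of the pipeline in a timer. The sketch below is purely illustrative — the stage names are made up and the `time.sleep` calls are placeholders standing in for the real capture/decode/process work:

```python
import time

def timed_stage(label, fn, *args):
    """Run one pipeline stage and report how long it took."""
    t0 = time.monotonic()
    result = fn(*args)
    dt = time.monotonic() - t0
    print(f"{label}: {dt * 1000:.1f} ms")
    return result, dt

# Stand-ins for the real stages; each sleep is a placeholder delay.
def receive_frame():  time.sleep(0.05)   # radio waves -> receiver
def decode_frame():   time.sleep(0.02)   # raw bytes -> video frame
def process_frame():  time.sleep(0.03)   # computer vision processing

total = 0.0
for label, stage in [("receive", receive_frame),
                     ("decode", decode_frame),
                     ("process", process_frame)]:
    _, dt = timed_stage(label, stage)
    total += dt
print(f"end-to-end: {total * 1000:.1f} ms")
```

Per-stage numbers like these make it much easier to see which link in the chain is worth attacking first.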
Unfortunately, human perception is a finicky subject. In designing websites and applications, it has typically been found that anything up to 100 ms of delay is perceived as “instantaneous,” meaning the user won’t send you angry emails about how slow a button is to click. At 0.85 seconds, however, even if you show the robot a cup or a shoe and tell it to follow it, the object may very well leave its view before the bot has had a chance to react. This makes it hard for the user to see the connection between showing the object and the bot moving toward it, leading them to question whether it’s actually doing anything at all.
Unfortunately, the protocol the Wi-Fi module on our robot uses to transmit video to the laptop isn’t easy to figure out, but we’ve made sizable progress. We’ve gotten the transmission delay down to 0.28 seconds, although the code that achieves this is three different applications “duct-taped” together, so there’s still room for improvement.
I hope to have much bigger updates for my next blog post, but for now here’s a video demo of my newest mug-tracking software.
Hey guys, it’s Yifan again. There has been a lot of progress since my first blog post. As promised, I finished a functional prototype using all the legacy hardware and code left to me, and put the device in the woods to record for the first time. The results, to my pleasant surprise, are very impressive: the device recorded 12 hours of data and wrote it all to the SD card.
(Device in the field)
(Device field location)
The quality of the microphone is quite decent; although it cannot record birds far away, it records a bird on the same tree very clearly. Here are some hand-selected bird song clips from the 12-hour recording. Enjoy the sounds of nature!
If you listened to the recordings, hopefully you could tell that one of them is not a bird song (it’s a train whistle). Differentiating between bird calls and noise is an easy task for us humans, but it can be a challenge for a computer. So why do we need programs to help us identify these sounds? Take a look at this recording image.
The entire recording is 12 hours long; hand-selecting all the bird songs from it could take hours. If we want the device to record continuously for a week, we humans simply do not have the patience to go through hundreds of hours of data. Computers, on the other hand, love analyzing long recordings, and they can be trained to do it very well.
Luckily, pyAudioAnalysis, the library our team uses to classify different bird species, also has functionality to segment a long recording using a Hidden Markov Model (HMM). In the near future, I will hopefully be able to use this method to segment all the recordings from our device.
Using the HMM involves the following steps. First, you generate annotation files for known recordings to serve as training data. The annotation looks like this:
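As a rough illustration of what such an annotation contains — as far as I can tell, pyAudioAnalysis expects plain-text “segments” files with one `start,end,label` row per segment, times in seconds; the specific times and labels below are made up — here is a tiny parser for that shape of file:

```python
# Made-up example of a comma-separated annotation ("segments") file:
# one "start,end,label" row per segment, times in seconds.
ANNOTATION = """\
0.0,4.2,bird
4.2,9.8,silence
9.8,13.1,train
13.1,20.0,bird
"""

def parse_segments(text):
    """Parse annotation rows into (start, end, label) tuples."""
    segments = []
    for line in text.strip().splitlines():
        start, end, label = line.split(",")
        segments.append((float(start), float(end), label))
    return segments

for start, end, label in parse_segments(ANNOTATION):
    print(f"{start:6.1f}s -> {end:6.1f}s  {label}")
```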
The model is trained on these labeled segments and their class names. Once trained, it can be used to classify an unfamiliar recording: based on similarity, the model generates segmentation marks.
In the most recent test run, I trained the model on a single file and tested its accuracy on a very simple recording. Here is a graph of the segmentation results.
Looks like it’s working, right? Well, not really. Although this particular segmentation is pretty accurate, others are less satisfactory. That said, I only used one training input, which can definitely be improved upon. Another possible improvement is to train the model with only two classes: bird and not-bird. Since we’ll still use the classifier to differentiate between bird species afterwards, the segmentation model only needs to tell birds from non-birds. We will see how that goes. Wish me luck!
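One cheap way to try the two-class idea is to collapse the existing annotation labels before retraining. The sketch below is only an illustration — the species names and the `BIRD_LABELS` set are hypothetical, not the project’s real class list:

```python
# Hypothetical multi-class segments from existing annotations:
# (start_seconds, end_seconds, label)
segments = [
    (0.0, 4.2, "cardinal"),
    (4.2, 9.8, "silence"),
    (9.8, 13.1, "train"),
    (13.1, 20.0, "robin"),
]

BIRD_LABELS = {"cardinal", "robin"}   # assumed set of bird classes

def collapse_to_binary(segments):
    """Map every segment label to just 'bird' or 'not_bird'."""
    return [(start, end, "bird" if label in BIRD_LABELS else "not_bird")
            for start, end, label in segments]

for seg in collapse_to_binary(segments):
    print(seg)
```

The segmenter then only has to learn the bird/not-bird boundary, and the fine-grained species question is left to the downstream classifier.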