Hello friends, this is Yifan again. As the end of the summer draws near, my summer research is also coming to a conclusion. The work I did over the summer was very different from what I expected. Since this is a wrap-up post for an ongoing project, let us first go through what exactly I did this summer.
The above is the product flow graph for our MDP project. All the blue blocks and paths are what I worked on this summer. In previous posts I wrote about progress and accomplishments on everything except the bird detection algorithm.
In my second blog post, I talked about using a single HMM (hidden Markov model) to differentiate between a bird and a non-bird. One problem was that HMM classification takes a long time. Running HMM classification on a 30-minute recording takes about 2 minutes. Since we need to analyze recordings much longer than that, we need to pre-process the recording and ditch the less interesting parts. This way, we only put the interesting parts of the recording into the HMM classifier.
This figure is the runtime profile of running the HMM on a full 30-minute recording. The classification took about 2 minutes. After splitting out the interesting parts of the recording, we only run classification on these short clips, which reduces the runtime by a very large factor (see figure below).
One thing you might have noticed in these two graphs is that the runtime for wav_parse is also extremely long. Since there is almost no way to get around parsing the wav file itself, the time consumed here will always be a bottleneck for our algorithm. Instead of writing a better parsing function, I did the mature thing and blamed it all on Python’s inherent performance issues. Jokes aside, I think eventually someone will need to deal with this problem, but I think optimization can wait for now.
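For whoever picks this up later, one direction worth trying is to read the whole file in a single call and convert it in bulk, so that no per-sample work happens in a Python-level loop. The following is only a sketch and not our actual wav_parse: it uses Python’s standard wave and array modules, assumes 16-bit mono PCM input, and read_wav_mono16 is a made-up name for illustration.

```python
import array
import sys
import wave


def read_wav_mono16(path):
    """Read a 16-bit mono PCM WAV file with one bulk read.

    Hypothetical helper (not the project's wav_parse).
    Returns (sample_rate, samples), where samples is an array of signed ints.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())  # one read for the whole file

    samples = array.array("h")  # "h" = signed 16-bit integers
    samples.frombytes(raw)      # bulk conversion, no per-sample loop
    if sys.byteorder == "big":
        samples.byteswap()      # WAV sample data is little-endian
    return sample_rate, samples
```

I have not measured this against our recordings, so treat it as a starting point rather than a fix.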
This figure is the raw classification output using a model trained on 5 samples of a matching bird call. If the model thinks a window in the recording matches the model, it marks that window as 0, otherwise 1. Basically this mess tells us that of these 23 clips, only clips 9 and 10 do not contain the bird used to train the model.
One might ask, why don’t you have a plot or graph for this result? Don’t yell at me yet, I have my reasons… I literally have more than a hundred clips from one 30-minute recording. It’s easier for me to quickly go through the results if they are clustered together in a text file.
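For anyone who would rather not read the raw 0/1 output by eye, a small summary could be sketched like this. It is an illustration only: the function names are made up, and the min_fraction cutoff for deciding that a clip “contains the bird” is my assumption here, not something our classifier defines.

```python
def match_fraction(window_labels):
    """Fraction of windows labelled 0 (0 = matches the model) in one clip."""
    if not window_labels:
        return 0.0
    return window_labels.count(0) / len(window_labels)


def clips_without_match(clips, min_fraction=0.5):
    """Return 1-based indices of clips whose match fraction is below min_fraction.

    `clips` is a list of per-clip label lists, each label being 0 or 1.
    `min_fraction` is an assumed cutoff chosen for illustration.
    """
    return [
        index
        for index, labels in enumerate(clips, start=1)
        if match_fraction(labels) < min_fraction
    ]
```

Calling clips_without_match on the per-clip label lists would give the clip numbers to look at first.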
Although my mentor Stanislav and I had decided to try out an HMM for bird detection, the results aren’t very promising. There is the possibility that an HMM is not a very good choice for this purpose after all, which means I might need to do more research to find a better solution for bird detection. Luckily, since songbird is an ongoing project, I will get my full team back again in September. Over this summer, I believe I have made some valuable contributions to this project, and hopefully they can help us achieve our initial goals and plans for this product.
This summer has been a wonderful time for me. I would like to thank all my mentors and fellows for their help along the way; it really meant a lot to me. Looking into the future, I definitely believe this project has more potential than just classifying birds, but for now I am ready to enjoy the rest of the summer so that I can work harder when I come back to Ann Arbor in the fall.
It feels like just last week that I landed here in Michigan, but it’s almost time for me to go back home. Though my work with the project here is wrapping up, there’s still so much to be done, but I’m confident that I’m leaving behind a good framework.
These past few weeks have been so eventful! I was home sick for a week with the flu, which sucked, but thankfully I could just take my work home with me. The Ann Arbor Art Fair (pictured above) was incredible; half of downtown was covered in tents full of beautiful creations from all around the country. Completely unrelated, half my paycheck now seems to be missing.
If you were following my last blog post, you may remember that one of the biggest hurdles I had to overcome was video delay between the bot and my laptop; big delay meant bad reaction speed meant sad bot.
Through changing libraries, multithreading, and extensive testing, we’ve now gotten the delay to just under 0.4 seconds, less than half of what it was with the old versions of the code!! This may not seem too exciting, but it means the bot now reacts to things much more smoothly and calmly than before, as demonstrated in this video of it hunting down a ferocious bag of chips:
Another new accomplishment is neurons! Spiking neurons to be specific, nothing like your regular generic brand neural net neurons. These ones are much prettier (using the Izhikevich model) and react to stimuli very organically, making graphs that resemble what you’d see on your lab oscilloscope:
More importantly, other than just looking good, these neuron models also behave really well. As an example, here are just two neurons (listen for the different clicking tones) connected to the motors of the Neurorobot, one activated and one inhibited by the ultrasonic distance sensor:
With just these two neurons, and fewer than twenty lines of code, the Neurorobot can already display cool exploration behaviours, avoiding getting stuck on walls and objects the best it can. Neurons are powerful; that’s why creatures like the roundworm can survive with just over 300 of them: it doesn’t take a lot to do a lot.
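If you’re curious what an Izhikevich neuron looks like in code, here is a minimal sketch of a single one. It is not the Neurorobot’s actual code: the function name, the Euler time step, and the constant input current are my choices for illustration, and the a, b, c, d values are the standard “regular spiking” parameters from Izhikevich’s model.

```python
def simulate_izhikevich(current, steps=1000, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron driven by a constant input current.

    Illustrative sketch: uses simple Euler steps of `dt` milliseconds.
    Returns the list of times (ms) at which the neuron spiked.
    """
    v = c        # membrane potential (mV), started at the reset value
    u = b * v    # recovery variable
    spike_times = []
    for step in range(steps):
        # Izhikevich model: v' = 0.04 v^2 + 5 v + 140 - u + I, u' = a (b v - u)
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
        if v >= 30.0:        # spike: record it, then reset v and bump u
            spike_times.append(step * dt)
            v = c
            u += d
    return spike_times
```

With no input the neuron stays quiet, and with a steady input such as current=10.0 it fires repeatedly. A sensor reading scaled into `current` is roughly how a stimulus would drive a neuron like this.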
Here’s another example, in which position neurons that are more sensitive to left and right areas of the image space are given the task of finding something big and orange:
Notice how when the cup is to the left of the bot’s camera, the blue neuron spikes; whereas when it drifts to the right, the green and red neurons start spiking.
There’s still much optimization that can be done to make the Neurorobot think and react faster. Eventually the entire camera code is going to be rewritten from scratch, and there will be better visualisation of what’s going on under the hood. Lights, speakers, microphone, and a good user interface are all coming soon to a Backyard Brains near you!
Christopher Harris’s Neurorobot prototypes already have a drag-and-drop interface for putting together neurons:
The real goal of this project isn’t to have students write a thousand lines of code for the robot to do something interesting, but for them to learn and explore the wonderful behaviour of neurons just by playing around with them: changing parameters, connecting their synapses to different components, and seeing if they can get the bot to wiggle when it sees a shoe or beep and back away from anything that looks like a vase. And, in keeping with Backyard Brains’ tenets, seeing what they will discover.
Hello again everyone! It’s Yifan here with the songbird project. Like my other colleagues, I also attended the 4th of July parade in Ann Arbor, which was very fun. I made a very rugged cardinal helmet which looks like a rooster hat, but I guess a rooster also counts as a kind of bird, so that turned out just fine.
Anyways, since the last blog post, I have shifted my work emphasis to the user interface. After some discussions with my supervisors, we’ve made the decision to change the scheme a little. Instead of using machine learning to detect onsets in a recording, we are going to make an interface that allows users to select an appropriate volume threshold to do the pre-processing. Then, we will use our machine learning classifier to classify these interesting clips in more detail.
Why thresholding based on volume, one might ask? Well, volume is the most straightforward property of sound for us. During tech trek, a kid asked me a very interesting question: when you are detecting birds in a long recording, how do you know the train sound you ruled out as noise isn’t a bird that just sounds like a train? Although this one should be quite obvious, we should still give users the freedom to keep what they want in the raw data. Hence, I’ve developed a simple mechanism that allows every user to decide what they want and what they don’t want before classifying.
This figure is a quick visual representation of a 15-minute field recording after being processed by the mechanism I was talking about. As you can see, in the first plot there is a red line. That is the threshold, which the user defines. Anything louder than this line is marked as “activity”; anything quieter is marked as “inactivity.” The second plot shows the activity over time. However, an activity, like a bird call, might have long periods of silence between calls. In order not to count those as multiple activities, we have a parameter called “inactivity window,” which is basically the amount of silence required between two activities for them to be counted as separate activities.
In the above figure, the inactivity window is set to 0.5 seconds, which is very small. That is why you can see so many separate spikes in the activity plot. Below is the plot of the same data, but with an inactivity window of 5 seconds.
Because the inactivity window is larger now, smaller activities are now merged into longer continuous activities. This can also be customized by users. After this preprocessing procedure, we will chop up the long recording based on activities, and run smaller clips through the pre-trained classifier.
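To make the threshold and inactivity window idea concrete, here is a minimal sketch of how the activities could be found. It is not the code behind the plots: find_activities and its parameter names are made up for illustration, and it assumes the volume has already been computed as one number per sample (or per frame) at a known rate.

```python
def find_activities(volume, sample_rate, threshold, inactivity_window):
    """Return (start_s, end_s) intervals where the volume exceeds the threshold.

    Illustrative sketch. Loud stretches separated by less than
    `inactivity_window` seconds of quiet are merged into one activity.
    """
    activities = []
    start = None       # index where the current activity began
    last_loud = None   # index of the most recent loud sample
    max_gap = inactivity_window * sample_rate  # window expressed in samples

    for i, level in enumerate(volume):
        if level > threshold:
            if start is None:
                start = i
            elif i - last_loud - 1 >= max_gap:
                # The quiet gap was long enough: close the previous activity.
                activities.append((start / sample_rate,
                                   (last_loud + 1) / sample_rate))
                start = i
            last_loud = i

    if start is not None:
        activities.append((start / sample_rate, (last_loud + 1) / sample_rate))
    return activities
```

Each returned interval could then be used to cut one clip out of the long recording before handing it to the classifier. A larger inactivity_window gives fewer, longer clips, which is the same effect as in the two plots.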
Unfortunately my laptop completely gave up on me a couple of days ago, and I had to send it in for repair. I would love to show more data and graphs in this blog post, but I’m afraid I have to postpone that to my last post. Anyways, I wish the best for my laptop (as well as the data in it), and see you next time!