Insects are crucial members of many ecosystems, and so the recent reports of insect population declines are a vital global topic. Yet there is still much uncertainty about insect biodiversity and population changes, because monitoring data are scarce and cover only a very limited number of species and geographic regions. Automatic monitoring has a key role to play, including automatic recognition from sounds (and from images). Naturalis is collaborating on this with Dutch and international partners.
Automatic recognition of insect sounds could help us understand changing biodiversity trends around the world, but insect sounds are challenging to recognize even for deep learning. Can you find a way to perform high-quality classification of insect sounds, to support biodiversity monitoring? We have created a dataset with the sounds of more than 400 insect species (Orthoptera & Cicadidae) and trained a standard classifier (based on work by a team at Capgemini). But there are many ways you could improve on the machine learning, since AI for insect sounds is still a very open topic with many unexplored avenues.
Links to more background information:
- https://arxiv.org/abs/2503.15074v1
- https://arxiv.org/abs/2112.06725v1
- https://zenodo.org/records/8252141
- https://www.capgemini.com/in-en/news/inside-stories/catching-the-ai-bug/
- https://github.com/danstowell/insect_classifier_GDSC23_insecteffnet/tree/useIS459
Minimum-viable:
- A classifier that can take a number of audio files and generate a set of classifications for them. Each classification refers to a file with a start & end time, and has the scientific name of an insect and a probability.
- We require you to thoroughly document the building and training of your classifier, so that future projects can use your work. The classifier would preferably be described using the model cards format.
- We strongly prefer your classifier to be made available to the world as an open model with open source code, using for example the Apache 2.0 or MIT license.
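To make the first requirement concrete, here is a minimal Python sketch of what one classification record could look like, and of how per-window model outputs might be turned into such records. It is only an illustration under stated assumptions: the names `Classification` and `windows_to_classifications`, the field names, the 5-second window, and the 0.5 threshold are all hypothetical, and the two species names are just examples. It is not the ARISE JSON format.

```python
from dataclasses import dataclass, asdict
from typing import List, Sequence


@dataclass
class Classification:
    """One prediction for one segment of one audio file (hypothetical layout)."""
    filename: str          # audio file the segment comes from
    start_s: float         # segment start time, in seconds
    end_s: float           # segment end time, in seconds
    scientific_name: str   # predicted species
    probability: float     # model score in [0, 1]


def windows_to_classifications(
    filename: str,
    window_probs: Sequence[Sequence[float]],
    class_names: Sequence[str],
    window_s: float = 5.0,   # assumed window length
    hop_s: float = 5.0,      # assumed hop between window starts
    threshold: float = 0.5,  # assumed reporting threshold
) -> List[Classification]:
    """Convert per-window class probabilities into classification records.

    window_probs[i][j] is the probability of class_names[j] in window i.
    Every (window, class) pair at or above the threshold becomes one record.
    """
    results: List[Classification] = []
    for i, probs in enumerate(window_probs):
        start = i * hop_s
        for name, p in zip(class_names, probs):
            if p >= threshold:
                results.append(
                    Classification(filename, start, start + window_s, name, float(p))
                )
    return results


if __name__ == "__main__":
    # Made-up probabilities for two windows and two example species.
    records = windows_to_classifications(
        "rec.wav",
        [[0.9, 0.1], [0.2, 0.7]],
        ["Gryllus campestris", "Cicada orni"],
    )
    for record in records:
        print(asdict(record))
```

Keeping the record as a plain dataclass means it can be serialised with `asdict` and mapped onto whatever output format is required later.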
Fully completed:
- We of course hope that you will be able to build a classifier that generates better results than the current model... :-)
- For the integration with the ARISE digital species identification system, the results from your classifier should use the JSON format we have designed for this purpose.
- For quality assurance, it would be good to create automated tests. For security, it would be good to check your code and any dependencies for security vulnerabilities.
- At Naturalis, we strive to minimize the environmental impact of our research, so making your classifier efficient in GPU/CPU and memory use is something we want to strongly encourage.
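As a starting point for the efficiency goal, the following Python sketch measures wall-clock time and peak memory for a single call, using only the standard library. The helper name `profile_call` is hypothetical. Note the limitation: `tracemalloc` only sees memory allocated through Python's allocator, so it does not capture GPU memory, and it may miss some native allocations made by deep learning libraries. Those would need framework-specific tools.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def profile_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float, int]:
    """Run fn once and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - t0
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    # Stand-in workload; replace with a call to your own inference function.
    result, seconds, peak_bytes = profile_call(lambda n: sum(list(range(n))), 100_000)
    print(f"time: {seconds:.4f} s, peak memory: {peak_bytes / 1e6:.2f} MB")
```

Running such a measurement on a fixed set of audio files before and after each change gives a simple way to track whether the classifier is becoming more or less efficient.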