Throughout March, the participants of the "Bremen Big Data Challenge 2023" (BBDC) combed through a huge data set from the Alfred Wegener Institute in order to make precise predictions about the state of the sea. For the first time, scientific staff also took part in a separate track alongside the students. The organizational team of the Cognitive Systems Lab is planning even more exciting innovations for the future.
The availability of data, and big data in particular, has never been as large and far-reaching as it is today. "Big data" refers to data sets that are difficult to analyze using conventional data processing methods due to their structure, size or complexity.
The aim of every BBDC is to solve such a tricky task from the field of big data. For this purpose, a data set is made available that contains information collected in the past. The challenge is to analyze the primary data from the real world and use the knowledge from the past to predict future information as accurately as possible. The result with the most accurate prediction wins.
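As a rough illustration of what "most accurate prediction" can mean in practice, the sketch below ranks two hypothetical submissions by root-mean-square error (RMSE). The article does not say which metric the BBDC uses, so the choice of RMSE, the team names and all numbers are assumptions made purely for illustration.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root-mean-square error between two equally long lists of numbers."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(actual))


# Made-up ground truth and submissions; not BBDC data.
actual = [10.0, 12.0, 14.0]
submissions = {
    "team_a": [10.0, 12.0, 17.0],  # two exact hits, one large miss
    "team_b": [11.0, 13.0, 15.0],  # consistently off by one
}

# Lower RMSE means a more accurate prediction under this assumed metric.
scores = {team: rmse(pred, actual) for team, pred in submissions.items()}
winner = min(scores, key=scores.get)
print(scores, winner)
```

Because RMSE squares each error, one large miss costs more than several small ones, which is why the consistently close submission ranks first here.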
When analyzing data as part of the competition, BBDC participants rely primarily on machine learning methods, alongside inferential statistics. Combining the two approaches makes it possible to uncover hidden knowledge in the data and to use the initially unstructured wealth of information to answer questions and make predictions.
Task: Predicting the temperature and salinity of the sea
While data sets from industry were analyzed in the first years of the BBDC, in recent years the data has come directly from the University of Bremen, e.g. from projects of the high-profile area Minds, Media, Machines or from the Collaborative Research Center EASE. This year, for the first time, the BBDC used data from a non-university, non-profit organization: the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research.
The Alfred Wegener Institute provided a data set of sea samples taken in the vicinity of Helgoland over a period of 54 years. The data set contained marine samples from 1962 to 2009, with the exception of 2004, and comprised nine variables, including the temperature and salinity of the sea. The task of the BBDC participants was to predict the missing values for the year 2004 as well as for the years 2011-2012 (in the Student Track) and 2011-2015 (in the Professional Track).
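A very simple baseline for a task of this kind is to fill a missing year with the long-term average for each calendar month. The sketch below shows that idea on a few invented temperature readings. It is not the approach of any BBDC team, the function names are hypothetical, and the values are not taken from the Alfred Wegener Institute data set.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_climatology(samples):
    """Average each calendar month over all years present in `samples`.

    `samples` is a list of (date, value) pairs; returns {month: mean value}.
    """
    by_month = defaultdict(list)
    for day, value in samples:
        by_month[day.month].append(value)
    return {month: mean(values) for month, values in by_month.items()}


def predict(days, climatology):
    """Predict a value for each requested date from its month's average."""
    return [climatology[day.month] for day in days]


# Invented temperatures in degrees Celsius; 2004 is deliberately absent.
history = [
    (date(2002, 1, 15), 4.0), (date(2002, 7, 15), 16.0),
    (date(2003, 1, 15), 5.0), (date(2003, 7, 15), 17.0),
    (date(2005, 1, 15), 6.0), (date(2005, 7, 15), 18.0),
]

climatology = monthly_climatology(history)
gap_2004 = [date(2004, 1, 20), date(2004, 7, 20)]
predictions = predict(gap_2004, climatology)
print(predictions)  # [5.0, 17.0]
```

A baseline like this captures the seasonal cycle but ignores year-to-year variation and long-term trends, which is where additional data sources and machine learning models can improve on it.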
As in previous years, the participants developed creative solutions to problems that arose from the given data. For example, the winning team included data from the German Weather Service in their analyses, which played a decisive role in the team's victory in both the Student and Professional Track categories.
The Professional Track was introduced this year to give academic staff at the university and employees of the sponsoring institutions the opportunity to take part in the BBDC. In addition to the long-standing sponsoring partners Neuland - Büro für Informatik and Sparkasse Bremen, Just Add AI (JAAI) was among the sponsors for the second time this year. It is not uncommon for the winners to find employment with the sponsors or the University of Bremen following the BBDC.
The BBDC is also becoming increasingly popular in other respects: among the 160 registrations this year, southern Germany was represented for the first time, by a team from Darmstadt University of Applied Sciences. The University of Paderborn, which has close research links with the University of Bremen, also took part in the BBDC for the first time this year, both through students among the participants and through Prof. Dr. Axel Ngonga, head of the Data Science (Dice) working group at the University of Paderborn, who gave an exciting keynote speech on "Learning with multiple representations" following the award ceremony.
Trend towards sustainability and "Explainable AI"
When selecting the data set to be analyzed as part of the BBDC, the working group ensures that no models are yet available to solve the central question. At the same time, however, the task must be solvable within the given time frame. When selecting the data sets, the BBDC organizers, Marvin Borsdorf and Yale Hartmann, take care to use real data that is as tangible as possible. In this way, the participants can develop a feeling for the uncertainties, complications and necessary trade-offs involved in dealing with data in the real world.
In addition, a sustainability criterion that rewards energy-efficient code is to be incorporated into the evaluation of solutions in the future, since clever data modeling can save a lot of energy. An interpretability criterion in line with the objectives of the "Explainable AI" approach would also be desirable, although it is currently still difficult to operationalize.
Introduction at schools planned
A concrete goal for the coming year is to introduce the BBDC at schools in Bremen. With its focus on creative and active learning through tasks and teamwork, the format is ideal for high school students: it offers them many opportunities to learn about big data and machine learning, going into depth only if they are interested and need to.
In many ways, the BBDC is growing with the developments and requirements of big data. We are delighted to be able to share our fascination for data and its analysis with an ever-growing audience at the BBDC and are looking forward to the challenges of the future.
---------------------------------------------
We would like to thank the Alfred Wegener Institute for providing its research data for this year's BBDC. Our sincere thanks also go to the BBDC sponsors: Just Add AI, Neuland - Büro für Informatik and Sparkasse Bremen.