Throughout March, participants in the “Bremen Big Data Challenge 2023” (BBDC) crunched through a large dataset from the Alfred Wegener Institute to make accurate predictions about the state of the ocean. For the first time, scientific staff also competed in a separate track. Looking ahead, the Cognitive Systems Lab’s organizational team is planning more exciting innovations for the BBDC.
Data, and Big Data in particular, has never been as abundant or as widely available as it is today. “Big Data” refers to data sets that are difficult to analyze using traditional data processing methods due to their structure, size, or complexity.
The goal in each BBDC is to solve such a tricky Big Data task. To do this, a dataset is provided that contains information collected in the past. The challenge is to analyze this real-world primary data and use the knowledge from the past to predict future information as accurately as possible. The submission with the most accurate predictions wins.
To analyze the data in the competition, the BBDC participants rely not only on inferential statistics but also on machine learning. Combining the two approaches makes it possible to uncover hidden knowledge in the data and to use the initially unstructured abundance of information to answer questions and make predictions.
The task: predicting the temperature and salinity of the ocean
In the first years of the BBDC, the data sets came from industry; in recent years, the data came directly from the University of Bremen, e.g. from projects of the research network Minds, Media and Machines or from the Collaborative Research Center EASE. With the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, this year’s competition drew for the first time on data from a non-university, non-profit organization.
The Alfred Wegener Institute provided a dataset of ocean samples taken in the vicinity of the North Sea island of Helgoland over a period of 54 years. The dataset covered the years 1962 to 2009, with the exception of 2004, and included nine variables, such as ocean temperature and salinity. The BBDC participants’ task was to predict the missing values for the year 2004 and for the years 2011-12 (in the Student Track) or 2011-2015 (in the Professional Track).
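To make the task more concrete, here is a minimal sketch of how a held-out year might be predicted with a simple day-of-year average. It is not the method used by any BBDC team: the temperature values are synthetic, the function names are hypothetical, and the root mean squared error is only an assumed stand-in for the competition's scoring, which is not described here.

```python
import math
from collections import defaultdict
from statistics import mean

HELD_OUT_YEAR = 2004  # the year missing from the training period


def synthetic_temperature(year: int, day: int) -> float:
    """Hypothetical stand-in for a measurement: seasonal cycle plus slow trend."""
    seasonal = 7.0 * math.sin(2 * math.pi * (day - 120) / 365)
    trend = 0.02 * (year - 1962)
    return 10.0 + seasonal + trend


# Toy record for 1962 to 2009 without the held-out year (synthetic, not AWI data).
records = [
    (year, day, synthetic_temperature(year, day))
    for year in range(1962, 2010)
    if year != HELD_OUT_YEAR
    for day in range(1, 366)
]


def fit_climatology(rows):
    """Average all available years for each day of the year."""
    by_day = defaultdict(list)
    for _, day, value in rows:
        by_day[day].append(value)
    return {day: mean(values) for day, values in by_day.items()}


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(mean((p - a) ** 2 for p, a in zip(predicted, actual)))


climatology = fit_climatology(records)
days = list(range(1, 366))
predicted = [climatology[d] for d in days]
actual = [synthetic_temperature(HELD_OUT_YEAR, d) for d in days]
baseline_rmse = rmse(predicted, actual)
print(f"RMSE of the day-of-year baseline for {HELD_OUT_YEAR}: {baseline_rmse:.3f}")
```

A baseline like this ignores long-term trends and the other variables in the dataset, which is exactly where statistical and machine learning models can be expected to improve on it.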
The Professional Track was newly introduced this year to also give employees of the university and employees of sponsoring institutions the opportunity to participate in the BBDC. In addition to the long-standing sponsoring partners Neuland – Büro für Informatik and Sparkasse Bremen, Just Add AI (JAAI) was among the sponsors for the second time this year. Quite often, the winners find employment with the sponsors or the University of Bremen after the BBDC.
In other respects, too, the BBDC is reaching ever wider circles: among the 160 registrations, southern Germany was represented for the first time this year with a team from Darmstadt University of Applied Sciences. Another first was the University of Paderborn, which has close research ties with the University of Bremen. It was represented both by students among the participants and by Prof. Dr. Axel Ngonga, head of the Data Science (Dice) working group at the University of Paderborn, who gave an exciting keynote address on “Learning with multiple representations” following the awards ceremony.
Development towards sustainability and “Explainable AI”
When selecting the data set to be analyzed in the BBDC, the CSL group takes care to ensure that no models are yet available that solve the central question. At the same time, the task must be solvable within the given time frame. The organizers of the BBDC, Marvin Borsdorf and Yale Hartmann, also make sure to use real data that is as tangible as possible. In this way, participants can develop a sense of the uncertainties, complications, and necessary trade-offs involved in dealing with data in the real world.
In addition, a sustainability criterion that rewards energy-efficient code is to be included in the evaluation of solutions in the future, since a lot of energy can be saved through clever data modeling. In the long term, it would also be desirable to introduce an interpretability criterion in line with the goals of the “Explainable AI” approach, although this is currently still difficult to operationalize.
Introduction at schools planned
A concrete goal for the coming year is to introduce the BBDC to schools in Bremen. With its focus on creative and active learning through tasks and teamwork, the format is ideally suited for high school students, giving them plenty of opportunities to learn about big data and machine learning, without having to go into depth unless they are interested.
We see the BBDC in many ways growing with the evolution and demands of big data. We are excited to share our fascination with data and its analysis with an ever-growing audience as part of the BBDC, and we look forward to the future challenges.
We would like to thank the Alfred Wegener Institute for making their research data available for this year’s BBDC. Our thanks also go to the BBDC sponsors: Just Add AI, Neuland – Büro für Informatik, and Sparkasse Bremen.