Ocean Hypercapnia Data Challenge
Our goal: We are seeking collaborators to help us accelerate our understanding of how and when ocean carbon dioxide reaches levels that are dangerous for fish, mammals and all marine ecosystems. This phenomenon, known as ‘Ocean hypercapnia’, has potentially large implications for future fisheries, coral reef ecosystems and the hundreds of millions of people who are sustained by the ocean. But with limited global ocean data-sets available, we are opening up our work and looking for passionate scientists with innovative predictive approaches to beat our approach and accelerate this science.
The Challenge: Our data-based approach and discovery has just been published in Nature and we are challenging anyone to beat it by downloading the global ocean data-set, employing their own numerical approach and sharing their final predictions openly. Whether you’re a student, data scientist, researcher or organization, there are two different prizes on offer.
The Peer Choice Award: Openness accelerates innovation and reproducibility so we are awarding a $500 ‘Peer Choice Award’. If you have a numerical approach or idea on how we can better predict ocean carbon dioxide levels, then simply write a short synopsis about it or upload a video (you don’t need to submit predictions for this award). The winner of the ‘Peer Choice Award’ will be determined by an open vote of Thinkable members.
The Challenge Award: The overall award will go to the individual or team who beats our approach by the largest margin (judged by Residual Standard Error, RSE). The winners of this award will receive $3000 in funding and have the option to become a co-author on a follow-up paper in Nature’s Scientific Data journal.
We are seeking to work with talented scientists in any field who can come up with a better numerical approach to predict the two oceanic state variables that control carbon dioxide levels in the ocean: Dissolved Inorganic Carbon (DIC) and Alkalinity (ALK).
Rewards & Prizes
We are excited to collaborate with the eventual winners to accelerate our collective knowledge about the potential threat of ocean hypercapnia. There are two different awards for participants.
1. The Peer Choice Award ($500)
Openness accelerates innovation and reproducibility. If you have a numerical approach or idea on how we can better predict ocean carbon dioxide levels using the existing global data-set, then simply write a short synopsis or upload a short video on your approach. The winner of the ‘Peer Choice Award’ will be determined by an open vote of Thinkable members.
2. The Challenge Award ($3000 & co-authorship)
This will be awarded to the individual or team who beats our predictions for DIC and ALK by the largest margin (quantified by RSE). The winners of this award will have the option to become co-authors on the follow-up publication to our Nature paper and on any future papers that use their winning approach.
1. The Peer Choice Award
Those wishing to submit an idea or technique for the ‘Peer Choice Award’ don’t have to submit predictions but must submit their proposed approach to be considered in an open vote. The ‘Peer Choice Award’ will go to the individual or team whose proposed numerical approach gains the most votes from our researcher members.
2. The Challenge Award
We tested our SOMLO approach using the 3 disclosed test datasets below, obtaining a final RSE of 11.4 umol/kg for DIC and 7.9 umol/kg for ALK. The award will go to the individual or team who achieves the greatest improvement on our RSE values for both DIC and ALK. The minimum improvement required to be named overall challenge winner is 2 umol/kg for DIC and 1 umol/kg for ALK. Final entries will be evaluated on the combined RSE of predictions for test_dataset_1, 2 and 3 for both DIC and ALK. The overall winner will then need to share their scripts privately with us for final verification before the award is made.
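As a rough way to check an entry against these thresholds before submitting, the sketch below computes an RSE and compares it with the baseline values quoted above. The exact RSE formula used for judging is not stated on this page, so the definition here (square root of the sum of squared residuals divided by n minus an optional number of fitted parameters) is an assumption, and the function names are hypothetical.

```python
import math

# Baseline RSE values quoted above for the organisers' approach (umol/kg).
BASELINE_RSE = {"DIC": 11.4, "ALK": 7.9}
# Minimum improvement quoted above for the Challenge Award (umol/kg).
MIN_IMPROVEMENT = {"DIC": 2.0, "ALK": 1.0}


def residual_standard_error(observed, predicted, n_params=0):
    """ASSUMED definition: sqrt(sum of squared residuals / (n - n_params))."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    ssr = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ssr / dof)


def meets_minimum_improvement(variable, rse):
    """True if `rse` improves on the baseline by at least the stated minimum."""
    return BASELINE_RSE[variable] - rse >= MIN_IMPROVEMENT[variable]
```

An entry would need `meets_minimum_improvement` to hold for both "DIC" and "ALK", using observed and predicted values pooled across the three test datasets.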
1. We do not require individuals or teams to have any formal qualifications or oceanography background to participate in this challenge. It is open to anyone.
2. Entrants are allowed to use any combination of predictor variables listed in the training data-sets (excluding DIC and/or ALK). They can also bring in other variables not listed if they choose (e.g. satellite-derived chlorophyll, n-vector, etc.).
3. Please provide a paragraph summary of your approach with your final submission.
Global Ocean Data-sets
We have split up the world’s surface ocean database of DIC & ALK (~30,000 measurements) into three independent training datasets that contain coinciding predictor variables such as latitude, temperature, salinity, nutrients and oxygen.
Each dataset includes the following variables:
Latitude (deg North)
Longitude (deg East)
data_number (for assessing which set of variables each entry used)
MLD (Mixed Layer Depth in metres)
Temperature (degrees Celsius)
DIC_input (Dissolved Inorganic Carbon in umol/kg)
TA_input (Alkalinity in umol/kg)
There are three independent test data-sets for which to predict DIC and ALK with your approach, using the equivalent training data-sets above. Each represents ~10% of the global data-set and is not included in the training data.
How to submit?
After you have predicted DIC and ALK for each of the three testing data-sets above using your numerical approach, combine your final predictions into one csv file for DIC and one for ALK, and include them via a link (Dropbox, Google Drive, etc.) within your submission.
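One possible way to assemble such a file is sketched below: predictions from the three test sets are concatenated into a single CSV text, once for DIC and once for ALK. The column names and the one-decimal formatting are hypothetical choices for illustration, not a format required by the organisers.

```python
import csv
import io


def combine_predictions(per_dataset_predictions):
    """Concatenate predictions from several test sets into one CSV string.

    `per_dataset_predictions` maps a test-set label to a list of predicted
    values in umol/kg. Column names are hypothetical, not a required format.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["test_dataset", "row", "prediction_umol_kg"])
    for label in sorted(per_dataset_predictions):
        values = per_dataset_predictions[label]
        for row, value in enumerate(values, start=1):
            writer.writerow([label, row, f"{value:.1f}"])
    return buffer.getvalue()
```

Calling this once with the DIC predictions and once with the ALK predictions, then writing each returned string to its own file, gives the two csv files to link from the submission.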
Here are our final numbers as an example submission using a dropbox link:
What was our approach to predict DIC and ALK?
Our data analysis was performed in R and combined a neural network clustering algorithm with a principal-component regression. For DIC predictions, the optimal parameter set was temperature, salinity, phosphate and oxygen, while for ALK, our parameter set was salinity, oxygen, phosphate and silicate. Click here to watch a brief summary of our approach and to download the open-access paper that details our approach.
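For entrants who want a starting point, here is a minimal Python sketch of the general "cluster, then regress locally" idea. It is not the organisers' R code: plain k-means with a deterministic farthest-point initialisation stands in for the neural network clustering step, and all function names and defaults are hypothetical. Within each cluster it fits a principal-component regression on standardised predictors.

```python
import numpy as np


def assign(Z, centers):
    """Index of the nearest center for every row of Z."""
    dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def kmeans(Z, k, n_iter=50):
    """Plain k-means (a stand-in for the clustering step), farthest-point init."""
    centers = [Z[0]]
    while len(centers) < k:
        stacked = np.array(centers)
        d = ((Z[:, None, :] - stacked[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(Z[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(Z, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    # Drop any cluster that ended up empty so every center has a model.
    return centers[np.unique(assign(Z, centers))]


def fit_pcr(Z, y, n_components):
    """Principal-component regression: regress y on the leading PC scores."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc = Z - z_mean
    _, _, vt = np.linalg.svd(Zc, full_matrices=False)
    V = vt[:n_components].T                      # loadings, one column per PC
    gamma, *_ = np.linalg.lstsq(Zc @ V, y - y_mean, rcond=None)
    return z_mean, y_mean, V @ gamma             # coefficients in Z space


def fit_local_models(X, y, k=2, n_components=2):
    """Standardise predictors, cluster them, and fit one PCR per cluster."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0                            # guard constant predictors
    Z = (X - mu) / sd
    centers = kmeans(Z, k)
    labels = assign(Z, centers)
    models = [fit_pcr(Z[labels == j], y[labels == j], n_components)
              for j in range(len(centers))]
    return mu, sd, centers, models


def predict_local(model, X):
    """Route each row to its nearest cluster and apply that cluster's PCR."""
    mu, sd, centers, models = model
    Z = (np.asarray(X, dtype=float) - mu) / sd
    labels = assign(Z, centers)
    out = np.empty(len(Z))
    for j, (z_mean, y_mean, coef) in enumerate(models):
        mask = labels == j
        out[mask] = y_mean + (Z[mask] - z_mean) @ coef
    return out
```

In practice the columns of `X` would be the chosen predictors (for example temperature, salinity, phosphate and oxygen for DIC), and `k` and `n_components` would be tuned on held-out training data.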
How does the leaderboard work?
Thinkable allows any scientist or team to host a competition with judging and/or member voting to award an unlimited set of prizes. The Thinkable leaderboard is only active during the voting/judging period, when votes are automatically tallied either through the membership or through the organisers’ invited judges.
What is DIC?
DIC stands for the Dissolved Inorganic Carbon concentration of seawater and can be thought of as the total carbon dioxide concentration of the ocean. Three forms of inorganic carbon make up DIC in the ocean (see figure below): dissolved carbon dioxide (the form that exchanges with atmospheric CO2 and the one we worry about), carbonate ion (CO3, required for calcium carbonate production) and bicarbonate ion (HCO3, which makes up about 90% of the DIC pool).
What is Alk?
Alk refers to the alkalinity concentration of seawater and can be thought of as the ability of a parcel of water to buffer an acid. It’s defined by adding up all of the ionic properties of seawater (e.g. carbonate, bicarbonate, boron, hydrogen, silicic acid, etc.). Alkalinity remains constant when CO2 enters the ocean from the atmosphere since it’s a charge balance. However, Alkalinity is a critical state variable that, along with DIC, defines the level of carbon dioxide in the ocean.
How do you calculate CO2 from DIC and ALK?
Carbon dioxide concentration can be calculated from seawater temperature, salinity, DIC and ALK. Scott Denning’s research team at Colorado State University has developed an online calculator to play with (CLICK HERE). Simply change the DIC and ALK concentrations to see how pCO2 concentrations, Revelle factors and pH change too.
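To give a feel for how the two variables combine, the sketch below uses a common textbook simplification rather than a full carbonate-system solver. It assumes alkalinity is carried only by bicarbonate and carbonate (so borate and other minor contributors are ignored) and that dissolved CO2 is a negligible share of DIC. Under those assumptions carbonate is roughly ALK minus DIC, bicarbonate is roughly 2 x DIC minus ALK, and dissolved CO2 follows from the two dissociation constants K1 and K2. The default pK values are illustrative round numbers for warm surface seawater; in reality they depend on temperature and salinity and should be taken from a proper reference. The function name is hypothetical.

```python
def approx_dissolved_co2(dic, alk, pk1=5.85, pk2=8.97):
    """Crude estimate of dissolved CO2 (umol/kg) from DIC and ALK (umol/kg).

    ASSUMPTIONS: alkalinity ~ [HCO3] + 2[CO3] (borate etc. ignored) and
    DIC ~ [HCO3] + [CO3]. pk1 and pk2 are illustrative defaults only; the
    real constants vary with temperature and salinity.
    """
    carbonate = alk - dic            # approximate [CO3]
    bicarbonate = 2.0 * dic - alk    # approximate [HCO3]
    if carbonate <= 0 or bicarbonate <= 0:
        raise ValueError("approximation needs DIC < ALK < 2 * DIC")
    k2_over_k1 = 10.0 ** (pk1 - pk2)
    # From K1 = [H][HCO3]/[CO2] and K2 = [H][CO3]/[HCO3]:
    # [CO2] = (K2 / K1) * [HCO3]^2 / [CO3]
    return k2_over_k1 * bicarbonate ** 2 / carbonate
```

Even this rough form shows the sensitivity that makes DIC and ALK worth predicting well: raising DIC at fixed ALK shrinks the carbonate term in the denominator, so dissolved CO2 rises much faster than DIC itself. For real work, use the online calculator mentioned above or a full carbonate-system package.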
Where has the data come from?
From the 1990s until today, a number of government-funded measurement programs have given scientists an unprecedented snapshot of ocean physics, chemistry and biology. Thousands of oceanographers have invested decades of effort into sampling and measuring a range of ocean properties through coordinated multi-national programs. Below is a graphic illustrating the global data used for this challenge. Please see more at the Carbon Dioxide Information Analysis Center (CDIAC) and the Global Ocean Data Analysis Project.
We would like to thank all those who contributed to the collection, measurement and open disclosure of this valuable and important data-set. In particular, we thank Bob Key from Princeton University and Alex Kozyr from the Oak Ridge National Laboratory for their leadership and dedication over decades in performing quality assurance and management of these bottle carbon data-sets.
Who to contact with questions?
Tristan Sasse at [email protected] or click ‘contact organisers’
Legals & Quorum
For the data challenge, there is no quorum required. For the peer choice award, however, a minimum of 5 ideas will be needed before the competition and reward are activated.