NE Big Data Innovation Hub: Forecasting Salinity in Rivers during Storm Events



Every winter, vast amounts of road salt are spread on streets across the Northeast. Road salt is important for our safety, but ice melt and rainfall wash it into river systems, where it damages ecosystems. Often, the negative impacts of high salt concentrations become visible only after extended periods of time. For example, when Hurricane Sandy flooded New York City with salt water, the disastrous effects on trees became visible three years later.

So far, the transport of salt through our waterways is poorly understood by biogeochemists and hydrologists. This project takes a data science approach to forecasting the salt concentration in rivers across New Hampshire. The purpose is to analyze what-if scenarios regarding salinity at particular river sites, in order to estimate the impact of changing weather patterns, such as rain-on-snow, drought, or intense rainfall, and of different road treatment events. For example, if a dry period in winter is followed by multiple severe storms, would we observe a sudden spike in salinity? What if frequent rain-after-snow events continuously wash smaller amounts into the environment; would the salt accumulate near roads or wash down the river? How do sudden changes in expected weather events affect the river systems’ ability to buffer salinity? Answering these questions allows us to quantify the resilience of different riverine ecosystems with respect to salt.

We will analyze riverine data collected at multiple river sites at 15-minute intervals over the course of five years. This project’s objective is to develop models that predict how salinity develops over time, given a series of temperatures and flow rates. By varying the input flow rates and temperatures, we can query the model to predict how salinity is expected to change with weather patterns. Here we are particularly interested in storm events and other abnormal weather patterns.
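The idea of querying a model with varied inputs can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `what_if`, the parameters `flow_scale` and `temperature_shift`, and the placeholder `toy_model` are all assumptions made for this example. Any callable that maps flow and temperature series to a salinity series could take the place of the toy model.

```python
import numpy as np

def what_if(model, flow, temperature, flow_scale=1.0, temperature_shift=0.0):
    """Compare model output on observed inputs and on a perturbed scenario.

    `model` is any callable taking (flow, temperature) arrays and returning
    a salinity array. The perturbation parameters are hypothetical knobs.
    """
    flow = np.asarray(flow, dtype=float)
    temperature = np.asarray(temperature, dtype=float)
    baseline = model(flow, temperature)
    scenario = model(flow * flow_scale, temperature + temperature_shift)
    return baseline, scenario, scenario - baseline

def toy_model(flow, temperature):
    """Placeholder only: salinity falls as flow rises (simple dilution)."""
    return 100.0 / flow + 0.0 * temperature

# Hypothetical scenario: what if flow doubled at every time step?
baseline, scenario, change = what_if(
    toy_model, flow=[1.0, 2.0, 4.0], temperature=[0.0, 1.0, 2.0], flow_scale=2.0
)
```

A trained forecasting model would replace `toy_model`; the comparison logic stays the same.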

In earlier work, we evaluated both traditional and deep learning time series models on the task of salinity forecasting. Prediction performance is quantified by the root mean squared error (RMSE) between predicted and actual salinity. The vast majority of models accurately predict the near future, but for long horizons deep neural GRU models perform best (3.8 RMSE). However, the accuracy of salinity prediction drops drastically during storm events: the best model for storm events, a neural CNN, reaches 14.5 RMSE, a nearly four-fold increase in error over the 3.8 RMSE above. Since we are particularly interested in modelling storm events accurately, this performance loss is not acceptable for our purposes.
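For reference, RMSE is the square root of the mean squared difference between predicted and observed values. The sketch below computes it overall and separately for storm and non-storm time steps, which is how a gap like the one reported above could be measured. The helper names and the boolean `storm_mask` input are assumptions for this example; how storm periods were actually labeled in the earlier work is not stated here.

```python
import numpy as np

def rmse(predicted, observed):
    """Root mean squared error between two equal-length series."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def rmse_by_regime(predicted, observed, storm_mask):
    """RMSE computed separately for storm and non-storm time steps.

    `storm_mask` is a hypothetical boolean array, True where a time step
    belongs to a storm event. Both regimes must contain at least one step.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    storm_mask = np.asarray(storm_mask, dtype=bool)
    return {
        "normal": rmse(predicted[~storm_mask], observed[~storm_mask]),
        "storm": rmse(predicted[storm_mask], observed[storm_mask]),
    }
```

Reporting the two regimes separately keeps rare storm periods from being averaged away by the much longer stretches of normal conditions.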

While we focus on salt concentration in rivers, our algorithms are designed to generalize to other solutes of scientific interest, such as dissolved organic carbon (DOC) or nitrogen and phosphorus. While here we focus on aqueous sensors that are submerged in the river, our algorithms transfer to sensor data collected in soil and air.

Project Goals

Early work assumed a constant power-law relationship between flow rate and salinity, ignoring temporal processes. This leads to inaccurate models, especially for smaller rivers and streams. More recently, temporal patterns have been studied, such as clockwise and counterclockwise relationships between flow rate and concentration that depend on the solute and the properties of the measurement site. However, these approaches still rely on simplifying assumptions and cannot model the complex processes observed by biogeochemists.
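A constant power-law relationship of the kind mentioned above is commonly written as C = a * Q^b, where Q is the flow rate and C the concentration. The symbols a, b, Q, and C and the function names below are assumptions for this illustration, since no formula is given here. The sketch fits a and b by ordinary least squares in log-log space, which makes the limitation visible: the fitted curve depends only on the current flow rate and has no notion of time or history.

```python
import numpy as np

def fit_power_law(flow, concentration):
    """Fit concentration = a * flow**b by least squares in log-log space.

    Both inputs must be strictly positive. Returns (a, b).
    """
    log_q = np.log(np.asarray(flow, dtype=float))
    log_c = np.log(np.asarray(concentration, dtype=float))
    # polyfit returns coefficients from highest degree down: slope, intercept.
    b, log_a = np.polyfit(log_q, log_c, deg=1)
    return float(np.exp(log_a)), float(b)

def predict_power_law(flow, a, b):
    """Concentration predicted from flow alone, with no temporal context."""
    return a * np.asarray(flow, dtype=float) ** b
```

Because the prediction is a function of flow alone, the same flow rate on the rising and falling limb of a storm yields the same predicted concentration, which is exactly what the clockwise and counterclockwise patterns contradict.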

In our early work, we studied how accurately traditional time series and deep learning methods model salinity over time. Even models that predict accurately over long time ranges exhibit drastic performance losses during storm events (best case 3.8 RMSE during normal conditions versus 14.5 RMSE during storm events). While the relative rarity of storm events in the dataset is a problem, the main issue is that storm events trigger complex biogeochemical processes. These lead to rapid fluctuations in the solute concentration, which are difficult to model accurately with general-purpose time series methods.

In this project, we explore a novel approach that simultaneously (1) discerns storm events from normal weather events, (2) models the intra-storm solute depletion process, and (3) incorporates meta information such as weather and topology into one overarching model.

We develop a multi-component system that is subsequently integrated into one deep neural network. A storm event segmentation component identifies storm periods based on information such as temporal flow rate, temperature, seasonality, and regional weather data. Given segmented storm events, we develop a normalizing projection function that detects characteristic points during the storm event, such as onset, first peak, last peak, and wane. The projection function is optimized to best capture salinity at any point during the storm. A time-series component (an extension of the GRU model) is designed to model typical temporal patterns of salinity fluctuations, such as depletion of the solute source. The final overarching model incorporates information across storm periods, projected storm normalization, and time series components to forecast salinity over time. Our emphasis is on models that do not require a long history for successful predictions, but instead can forecast the distant future under hypothetical storm event scenarios, which allows us to study “what-if” scenarios. Finally, we extend the model to incorporate data across multiple measurement sites along the river system, including geospatial and topological information.
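To make the first two components more concrete, the sketch below shows a deliberately simple stand-in for each. It is not the learned system described above: storm periods are found with a fixed flow threshold rather than a trained segmentation component, and the projection uses a single flow peak rather than distinguishing first and last peaks. The function names, the `threshold` parameter, and the [-1, 1] scale are all assumptions for this illustration.

```python
import numpy as np

def segment_storms(flow, threshold):
    """Return (start, end) index pairs, end exclusive, where flow > threshold."""
    above = np.asarray(flow, dtype=float) > threshold
    # Pad with False so events touching either end of the series are closed.
    padded = np.concatenate(([False], above, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)
    return list(zip(starts.tolist(), ends.tolist()))

def normalized_storm_time(flow, start, end):
    """Project one storm onto [-1, 1]: -1 at onset, 0 at peak flow, 1 at wane."""
    segment = np.asarray(flow[start:end], dtype=float)
    peak = int(np.argmax(segment))
    last = len(segment) - 1
    position = np.zeros(len(segment))
    if peak > 0:
        # Rising limb: evenly spaced from -1 up to, but excluding, 0.
        position[:peak] = np.linspace(-1.0, 0.0, peak, endpoint=False)
    if last > peak:
        # Falling limb: evenly spaced from just after 0 up to 1.
        position[peak + 1:] = np.linspace(0.0, 1.0, last - peak + 1)[1:]
    return position

flow = [1.0, 1.0, 5.0, 7.0, 6.0, 1.0, 1.0, 4.0, 1.0]
storms = segment_storms(flow, threshold=3.0)
first_storm_position = normalized_storm_time(flow, *storms[0])
```

Projecting storms of different lengths onto a common scale lets a downstream model compare salinity at matching phases of different storms, which is the motivation for a normalizing projection.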

While end-to-end training of deep neural networks is generally believed to be capable of modeling such complex processes, there are many questions on how to effectively incorporate the domain knowledge of environmental scientists. While five years of cross-site 15-minute interval data provides sufficient training data, the data is only available for a limited number of sites. Hence our models need to be engineered to robustly learn from limited training data, with a possible fall-back on a pipeline of multiple components.

While we develop these techniques for the concrete problem of salinity prediction during storm events, many environmental scientists require similar data science methods.

Broader Impacts

One long-term goal is to foster an interdisciplinary network of academics, data scientists, citizen scientists, and local groups that are interested in the data-driven analysis of how our actions affect our ecological environment and vice versa.

Additionally, we aim to build a community of practice around the concrete topic of road salt in river systems. Many citizens across rural and suburban New England care about protecting their environment. While road salt is necessary to ensure our safety, many citizens are concerned about its effects on the health of streams in their neighborhood. However, few citizens have the means to measure and forecast the effect of road salt on their local streams. By developing salinity prediction models that are able to generalize to other locations along the river, we provide publicly accessible data and hope to inform the public discourse on the topic with our data science methods.

One of my spare-time projects is to build the low-cost water sensor “Riffle”, which allows citizens to measure the salinity of streams on their property. While this citizen-science project is orthogonal to this seed-fund project, the hope is that such citizen-operated sensor networks offer complementary crowd-sourced data for the prediction of salinity in local rivers and streams. We want to empower citizens to measure and track the water quality in their community, and provide them with our data analysis tools. We want to encourage citizens to use data science to their benefit.

The University of New Hampshire caters to many first-in-the-family students, for whom a strong connection to the environment is an important motivating factor. In this project, students work on abstract data science that has tangible, concrete benefits for their rural community. Especially for student groups with historically high drop-out rates, we believe that such abstract yet directly useful projects help to retain students in the computer science program.


This material is based upon work supported by a seed grant from the Northeast Big Data Innovation Hub.


Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsor.

Point of contact: Laura Dietz

This page was last updated January 2021.