Machine Learning Models for Chlorophyll-a Forecasting in a Freshwater Lake: Case Study of Lake Taihu

Guojin Sun, Weitang Zhu, Xiaoyan Qian, Chunlei Wei, Pengfei Xie, Yao Shi, Xiaoyong Cao, Yi He

Research output: Contribution to journal › Article › peer-review

Abstract

Cyanobacterial harmful blooms (Cyano-HABs) have become a critical environmental issue worldwide, threatening freshwater ecosystems by degrading water quality and posing risks to human and aquatic life. Chlorophyll-a (Chl-a), a key biomarker of bloom intensity, offers crucial insight into algal bloom dynamics. However, predicting Chl-a concentrations remains challenging because of the complex interactions among environmental factors. This study uses machine learning (ML) models to predict Chl-a concentrations in Lake Taihu, China, a large eutrophic lake that is representative of the many freshwater lakes suffering from Cyano-HABs. The research draws on nine water quality parameters (water temperature, pH, dissolved oxygen, turbidity, electrical conductivity, permanganate index, ammonia nitrogen, total phosphorus, and total nitrogen) to develop an ensemble ML model using XGBoost, which is known for its ability to handle nonlinear relationships and integrate multiple variables. The XGBoost model achieved an R² of 0.78 and an RMSE of 8.97 mg/m³ on the test set, outperforming linear regression, decision trees, multi-layer perceptrons, support vector regression, and random forests. Feature importance analysis identified electrical conductivity, turbidity, and water temperature as the most significant predictors of Chl-a levels. The study further supports model interpretation through Pearson correlation analysis, which quantifies the linear relationships between Chl-a concentrations and the other water quality factors. Additionally, we employed principal component analysis (PCA), mutual information, Spearman rank correlation coefficients, and SHAP analysis to assess feature importance and aid interpretation of the ML models.
The model’s robustness was tested across multiple monitoring sites in Lake Taihu, demonstrating its potential for broader application in other eutrophic lakes facing similar environmental challenges. By providing a reliable tool for forecasting Chl-a concentrations, this research contributes to the development of early warning systems that can help mitigate the impacts of Cyano-HABs, aiding in more effective water resource management.
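
For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how RMSE, R², and the Pearson correlation coefficient are conventionally computed. It is not the authors' code: the function names and the toy values (`observed`, `predicted`, `temperature`) are hypothetical and chosen only for illustration, and the study's own data and XGBoost pipeline are not reproduced here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (e.g. mg/m³)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical toy values, NOT data from the study.
observed = [10.0, 20.0, 30.0, 40.0]     # illustrative Chl-a measurements
predicted = [12.0, 18.0, 33.0, 37.0]    # illustrative model outputs
temperature = [1.0, 2.0, 3.0, 4.0]      # illustrative predictor values

print("RMSE:", rmse(observed, predicted))
print("R2:", r2_score(observed, predicted))
print("Pearson r (temperature vs. Chl-a):", pearson_r(temperature, observed))
```

RMSE is reported in the units of the target variable, while R² is dimensionless, which is why the abstract quotes the two together: one conveys the typical size of the error and the other the share of variance explained.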

Original language: English
Article number: 1219
Journal: Water (Switzerland)
Volume: 17
Issue number: 8
State: Published - Apr 2025
Externally published: Yes

Keywords

  • chlorophyll-a
  • Lake Taihu
  • machine learning
  • XGBoost
