Benchmarking data-driven rainfall-runoff modeling across 54 catchments in the Yellow River Basin: Overfitting, calibration length, dry frequency

Jin Jin; Yanning Zhang; Zhen Hao; Runliang Xia; Wushuang Yang; Hanlin Yin; Xiuwei Zhang

doi:10.1016/j.ejrh.2022.101119

School of Computer Science

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

Study region: Yellow River Basin, China. Study focus: The rainfall-runoff modeling performance of Data-Driven Models (DDM) in the Yellow River Basin (YRB) at a large scale is unclear such that the DDM research in the YRB lacks essential reference which can be critical for model development research. Understanding the advantages and disadvantages of DDMs by comparing them to Process Based Models (PBM) helps model selection in practice, especially when benchmarking is performed at large scales. We benchmarked three DDMs, namely SVM, LSTM and CNN-LSTM, and a PBM, namely the Xinanjiang (XAJ) model, across 54 basins of the YRB. Factors affecting DDM performance are identified and the sensitivity of PBM to these factors is also discussed. New hydrological insights for the region: DDM performs the best in the upper reaches, the worst in the middle reaches. PBM demonstrates a wider applicability when calibration data is limited, whereas DDM generally outperforms PBM for areas where data limitation is not a problem. The most important catchment attribute affecting PBM and DDM is a high frequency of dry days (<1 mm d⁻¹). However, DDM is more vulnerable to this factor. In addition, DDM performance depends heavily on the introduced lagged streamflow when data is insufficient. We conclude that the rainfall-runoff modeling relationship in catchments with high drought frequencies is more complex, resulting in DDM requiring more data, but PBM is less affected by these factors, indicating that PBM has better applicability in the case of limited data.
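The two quantities the abstract highlights, the frequency of dry days (<1 mm d⁻¹) and the lagged streamflow supplied as a model input, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the plain-list inputs, and the choice of pairing each day's precipitation with the preceding flows are assumptions made only for illustration. The 1 mm d⁻¹ threshold is the only value taken from the abstract.

```python
from typing import List, Sequence, Tuple

# Dry-day threshold stated in the abstract (precipitation below 1 mm per day).
DRY_THRESHOLD_MM_PER_DAY = 1.0


def dry_day_frequency(precip_mm: Sequence[float],
                      threshold: float = DRY_THRESHOLD_MM_PER_DAY) -> float:
    """Return the fraction of days whose precipitation is below the threshold."""
    if len(precip_mm) == 0:
        raise ValueError("precipitation series is empty")
    dry_days = sum(1 for p in precip_mm if p < threshold)
    return dry_days / len(precip_mm)


def lagged_samples(precip_mm: Sequence[float],
                   streamflow: Sequence[float],
                   n_lags: int = 1) -> List[Tuple[List[float], float]]:
    """Build (features, target) pairs for a data-driven model.

    Hypothetical feature layout: the precipitation on day t followed by the
    streamflow on the n_lags preceding days; the target is streamflow on day t.
    """
    if len(precip_mm) != len(streamflow):
        raise ValueError("series must have the same length")
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    samples = []
    for t in range(n_lags, len(streamflow)):
        features = [precip_mm[t]] + list(streamflow[t - n_lags:t])
        samples.append((features, streamflow[t]))
    return samples
```

For a four-day series with two days under 1 mm, `dry_day_frequency` returns 0.5. A catchment with a high value has few wet days to learn the rainfall-runoff response from, which is consistent with the abstract's remark that such catchments require more calibration data.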

Original language	English
Article number	101119
Journal	Journal of Hydrology: Regional Studies
Volume	42
DOIs	https://doi.org/10.1016/j.ejrh.2022.101119
State	Published - Aug 2022

Keywords

Benchmarking
CNN-LSTM
LSTM
Rainfall-runoff modeling
XAJ
Yellow River Basin

Cite this

@article{7fa099bb9a074d318a68e4f00e4e2df4,

title = "Benchmarking data-driven rainfall-runoff modeling across 54 catchments in the Yellow River Basin: Overfitting, calibration length, dry frequency",

abstract = "Study region: Yellow River Basin, China. Study focus: The rainfall-runoff modeling performance of Data-Driven Models (DDM) in the Yellow River Basin (YRB) at a large scale is unclear such that the DDM research in the YRB lacks essential reference which can be critical for model development research. Understanding the advantages and disadvantages of DDMs by comparing them to Process Based Models (PBM) helps model selection in practice, especially when benchmarking is performed at large scales. We benchmarked three DDMs, namely SVM, LSTM and CNN-LSTM, and a PBM, namely the Xinanjiang (XAJ) model, across 54 basins of the YRB. Factors affecting DDM performance are identified and the sensitivity of PBM to these factors is also discussed. New hydrological insights for the region: DDM performs the best in the upper reaches, the worst in the middle reaches. PBM demonstrates a wider applicability when calibration data is limited, whereas DDM generally outperforms PBM for areas where data limitation is not a problem. The most important catchment attribute affecting PBM and DDM is a high frequency of dry days (<1 mm d$^{-1}$). However, DDM is more vulnerable to this factor. In addition, DDM performance depends heavily on the introduced lagged streamflow when data is insufficient. We conclude that the rainfall-runoff modeling relationship in catchments with high drought frequencies is more complex, resulting in DDM requiring more data, but PBM is less affected by these factors, indicating that PBM has better applicability in the case of limited data.",

keywords = "Benchmarking, CNN-LSTM, LSTM, Rainfall-runoff modeling, XAJ, Yellow River Basin",

author = "Jin Jin and Yanning Zhang and Zhen Hao and Runliang Xia and Wushuang Yang and Hanlin Yin and Xiuwei Zhang",

note = "Publisher Copyright: {\textcopyright} 2022 The Authors",

year = "2022",

month = aug,

doi = "10.1016/j.ejrh.2022.101119",

language = "English",

volume = "42",

journal = "Journal of Hydrology: Regional Studies",

issn = "2214-5818",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Benchmarking data-driven rainfall-runoff modeling across 54 catchments in the Yellow River Basin

T2 - Overfitting, calibration length, dry frequency

AU - Jin, Jin

AU - Zhang, Yanning

AU - Hao, Zhen

AU - Xia, Runliang

AU - Yang, Wushuang

AU - Yin, Hanlin

AU - Zhang, Xiuwei

PY - 2022/8

Y1 - 2022/8

AB - Study region: Yellow River Basin, China. Study focus: The rainfall-runoff modeling performance of Data-Driven Models (DDM) in the Yellow River Basin (YRB) at a large scale is unclear such that the DDM research in the YRB lacks essential reference which can be critical for model development research. Understanding the advantages and disadvantages of DDMs by comparing them to Process Based Models (PBM) helps model selection in practice, especially when benchmarking is performed at large scales. We benchmarked three DDMs, namely SVM, LSTM and CNN-LSTM, and a PBM, namely the Xinanjiang (XAJ) model, across 54 basins of the YRB. Factors affecting DDM performance are identified and the sensitivity of PBM to these factors is also discussed. New hydrological insights for the region: DDM performs the best in the upper reaches, the worst in the middle reaches. PBM demonstrates a wider applicability when calibration data is limited, whereas DDM generally outperforms PBM for areas where data limitation is not a problem. The most important catchment attribute affecting PBM and DDM is a high frequency of dry days (<1 mm d⁻¹). However, DDM is more vulnerable to this factor. In addition, DDM performance depends heavily on the introduced lagged streamflow when data is insufficient. We conclude that the rainfall-runoff modeling relationship in catchments with high drought frequencies is more complex, resulting in DDM requiring more data, but PBM is less affected by these factors, indicating that PBM has better applicability in the case of limited data.

KW - Benchmarking

KW - CNN-LSTM

KW - LSTM

KW - Rainfall-runoff modeling

KW - XAJ

KW - Yellow River Basin

UR - http://www.scopus.com/inward/record.url?scp=85132854144&partnerID=8YFLogxK

U2 - 10.1016/j.ejrh.2022.101119

DO - 10.1016/j.ejrh.2022.101119

M3 - Article

AN - SCOPUS:85132854144

SN - 2214-5818

VL - 42

JO - Journal of Hydrology: Regional Studies

JF - Journal of Hydrology: Regional Studies

M1 - 101119

ER -
