Use of Random Forest classifier for rainfall and rain rate retrieval
Understanding the hydrologic and energy cycles and enhancing our meteorological and hydrological monitoring capabilities, e.g. to predict (flash) floods, manage water resources and make agricultural decisions at farm scale, requires high spatial resolution rainfall information, including its distribution and quantity. However, rainfall’s intricate characteristics, such as its high spatiotemporal variability, hinder accurate spatial rainfall retrieval with prevailing techniques.
Traditional systems such as rain gauges provide point estimates that are accurate but have low spatial representativeness, whereas satellite-based retrievals, with their extensive coverage, may be suitable for filling the spatial rainfall information gap. Notably, retrieval from geostationary satellites, e.g. the Meteosat Second Generation (MSG), has received significant attention in the past decade because of its high measurement resolution and broad spectral range, which make it capable of observing sudden and intense rainfall with thunderstorms from convective systems. Most MSG-based retrieval techniques either use the data to infer optical and microphysical cloud-top properties, e.g. cloud-top optical thickness, or relate the satellite's spectral characteristics to cloud-top properties and rainfall. Common to all these techniques is a parametric approach that uses parametric tests and conceptual models to relate the cloud-top property to rainfall. Such techniques are straightforward to apply, require few input variables and can directly map conceptual knowledge of the rain generation process onto the retrieval, using the satellite data as proxies. However, the non-linear and complex relation between cloud-top properties and the rainfall process (within the cloud column) may be beyond their skill.
The Random Forest (RF) is an ensemble machine learning algorithm for classification and regression that assumes that a whole set of trees can make more accurate predictions than a single tree or network. It relies on data-driven analysis to explore the relationship between variables and has strong capabilities in dealing with non-linear and complex relations between them, which makes it suitable for rainfall estimation and for overcoming the apparent limitations of the parametric techniques.
Therefore, the objective of this study is to evaluate the usefulness of the RF algorithm, trained with in-situ (gauge) rainfall intensities, for estimating high spatial resolution rainfall from cloud-top properties of the MSG satellite, and to compare the results with a number of other satellite-derived rainfall estimates.
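The ensemble idea behind the RF can be illustrated with a minimal sketch. It is not the configuration used in this study: the two predictors (cloud-top temperature and cloud optical thickness), the synthetic rain-rate function and all function names and parameter values are assumptions made up for illustration, and no real MSG or gauge data are involved. Each tree is grown on a bootstrap sample and considers only a random subset of predictors at each split, and the forest prediction is the average over all trees.

```python
import random

def avg(values):
    return sum(values) / len(values)

def sse(values):
    # Sum of squared deviations from the mean (split quality measure).
    m = avg(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, depth, n_try, rng):
    """Grow a regression tree; each split looks at n_try random features."""
    if depth == 0 or len(rows) < 4 or min(targets) == max(targets):
        return avg(targets)  # leaf: mean rain rate of the samples here
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        for t in sorted({r[f] for r in rows})[1:]:
            left = [i for i, r in enumerate(rows) if r[f] < t]
            right = [i for i, r in enumerate(rows) if r[f] >= t]
            score = (sse([targets[i] for i in left])
                     + sse([targets[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # the sampled features were constant: no split possible
        return avg(targets)
    _, f, t, left, right = best
    lo, hi = (build_tree([rows[i] for i in side], [targets[i] for i in side],
                         depth - 1, n_try, rng) for side in (left, right))
    return (f, t, lo, hi)

def predict_tree(node, row):
    # Internal nodes are tuples (feature, threshold, left, right).
    while isinstance(node, tuple):
        f, t, lo, hi = node
        node = lo if row[f] < t else hi
    return node

def fit_forest(rows, targets, n_trees=20, depth=4, n_try=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx],
                                 depth, n_try, rng))
    return forest

def predict_forest(forest, row):
    # Ensemble prediction: average of the individual tree predictions.
    return avg([predict_tree(tree, row) for tree in forest])

def synthetic_rain_rate(ctt, cot):
    # Invented non-linear relation (assumption, not a physical model):
    # rain only from cold (ctt < 240 K), optically thick (cot > 20) cloud.
    if ctt < 240.0 and cot > 20.0:
        return 0.2 * (240.0 - ctt) * (cot / 60.0)
    return 0.0

# Synthetic "pixels": (cloud-top temperature in K, cloud optical thickness).
data_rng = random.Random(42)
X = [(data_rng.uniform(200.0, 290.0), data_rng.uniform(0.0, 60.0))
     for _ in range(120)]
y = [synthetic_rain_rate(ctt, cot) for ctt, cot in X]

forest = fit_forest(X, y)
print(predict_forest(forest, (210.0, 50.0)))  # cold, thick cloud
print(predict_forest(forest, (280.0, 5.0)))   # warm, thin cloud
```

In practice an established implementation (for example a library random forest regressor) trained on collocated gauge and satellite observations would be used; the sketch only shows why averaging many decorrelated trees can capture a thresholded, non-linear relation without a prescribed parametric form.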