Integrating climate and satellite remote sensing data for predicting county-level wheat yield in China using machine learning methods

Early and reliable crop yield prediction on a large scale is imperative for making in-season crop management decisions as well as for ensuring global food security. The integrated use of climate and remote sensing data for predicting yield at regional and national scales has been investigated in various parts of the world. However, such attempts at national-scale yield prediction, particularly across the different planting zones in China, have rarely been reported. To address this gap, this study explored the potential of nine climate variables, three remote sensing-derived metrics, and three machine learning methods (random forest, support vector machine, and least absolute shrinkage and selection operator) for predicting wheat yield, based on data acquired during 2002–2010 from 1582 counties across China’s three wheat planting zones. Our results revealed large spatial differences in yield prediction performance.
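As an illustration of the model comparison described above, the following is a minimal sketch that fits the three named method families to a shared feature table and compares held-out R². It assumes scikit-learn, and everything specific in it is hypothetical: the data are synthetic, the column layout (nine climate columns followed by three remote sensing columns) is only a stand-in, and the hyperparameters are arbitrary rather than those used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for county-year samples (NOT the study's data):
# columns 0-8 play the role of climate variables, columns 9-11 of
# remote sensing metrics.
n_samples, n_climate, n_rs = 600, 9, 3
X = rng.normal(size=(n_samples, n_climate + n_rs))

# Hypothetical yield signal: mostly linear, with mild nonlinear terms and noise.
y = (
    2.0 * X[:, 0]
    - 1.5 * X[:, 3]
    + np.tanh(X[:, 9])
    + 0.5 * X[:, 10] * X[:, 1]
    + rng.normal(scale=0.5, size=n_samples)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative hyperparameters only. SVR and LASSO are scale-sensitive,
# so both are wrapped in a standardizing pipeline.
models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.01)),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, r2 in scores.items():
    print(f"{name}: R2 = {r2:.2f}")
```

A random split is used here for brevity; with real county-year data, splitting by year or by county would likely give a more honest estimate of predictive skill, since neighboring counties and consecutive years are correlated.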

The best performance (R² = 0.79 and R² = 0.66) was achieved for the northern winter wheat and northern spring wheat planting zones, respectively. Water-related climatic variables outperformed temperature-related variables, giving the best individual predictive performance (R² = 0.67). Solar-induced chlorophyll fluorescence performed better (R² = 0.60) for predicting crop yield than NDVI and EVI. Over the whole growing season, climate data provided additional information for yield prediction relative to remote sensing data. In the winter wheat planting zones, the additional contribution of climate data to yield prediction decreased from sowing to maturity, whereas the contribution of remote sensing data showed the opposite trend.
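One way to make the idea of an "additional contribution" concrete is as the gain in held-out R² when one group of predictors is added to a model that already uses the other group. The sketch below shows that calculation with ordinary least squares in NumPy. It is an assumption about how such a contribution could be quantified, not the study's stated procedure, and the data, coefficients, and helper names (`r_squared`, `fit_predict_ols`) are hypothetical.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_predict_ols(X_train, y_train, X_test):
    """Fit least squares with an intercept and predict on X_test."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


rng = np.random.default_rng(1)
n = 400

# Synthetic predictor groups (hypothetical, not the study's variables).
climate = rng.normal(size=(n, 2))
remote = rng.normal(size=(n, 2))

# Hypothetical yield that depends on one variable from each group.
y = 1.0 * climate[:, 0] + 0.8 * remote[:, 0] + rng.normal(scale=0.5, size=n)

train, test = slice(0, 300), slice(300, n)

# Baseline: remote sensing predictors only.
r2_rs = r_squared(
    y[test], fit_predict_ols(remote[train], y[train], remote[test])
)

# Combined: remote sensing plus climate predictors.
both = np.hstack([remote, climate])
r2_both = r_squared(
    y[test], fit_predict_ols(both[train], y[train], both[test])
)

# Additional contribution of climate data, measured as the R2 gain.
gain = r2_both - r2_rs
print(f"R2 (remote sensing only): {r2_rs:.2f}")
print(f"R2 (remote sensing + climate): {r2_both:.2f}")
print(f"R2 gain from adding climate: {gain:.2f}")
```

Repeating this calculation with predictors restricted to successive growth stages would trace how the gain changes from sowing to maturity.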

In most cases, the support vector machine outperformed the other models, and predictions for the winter wheat planting zones were more accurate than those for the spring wheat planting zone. Our study demonstrates the effectiveness of integrating climate and remote sensing data for accurate county-level yield prediction in China. Such simple and scalable machine learning methods merit further work by agricultural researchers and advisors.