Improving Crop Yield Prediction With Genotype-by-Environment Interaction Model
Summary
The Problem
The scientific community has not yet reached a consensus on the best way to include genomic data, together with information on how a plant would interact with environmental conditions, in a machine learning model for crop yield prediction. Genomic breeding, a process of screening thousands of candidates for field trials based on DNA alone, can save the time and resources needed to develop a new plant variety, such as one that grows better in drought conditions. Combining environmental and genomics data, also known as “enviromics,” is becoming more common as more environmental data from testing centers becomes available.
The Work
Igor Fernandes, a statistics and analytics master’s student at the University of Arkansas, worked with his adviser, Sam Fernandes, an Assistant Professor of Agricultural Statistics and Quantitative Genetics with the Arkansas Agricultural Experiment Station, and Caio Vieira, an Assistant Professor of Soybean Breeding for the experiment station. Together they developed a new model that uses a machine learning algorithm to combine environmental data with genomics data to predict crop yield.
The experiment used the same data on corn plots from the Genomes to Fields Initiative that Igor Fernandes used in an international competition, but the researchers varied the model inputs: genetic data alone, environmental data alone, or both combined in either an “additive” or a “multiplicative” manner.
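To make the distinction concrete, here is a minimal Python sketch of one common way to read “additive” versus “multiplicative” inputs. It is an illustration under stated assumptions, not the researchers' actual pipeline: the function names, the toy matrices G (genetic features per plot) and E (environmental features per plot), and the choice of a row-wise product to represent the interaction are all hypothetical.

```python
import numpy as np


def additive_features(G, E):
    """Place genetic and environmental features side by side (assumed 'additive' input)."""
    return np.hstack([G, E])


def multiplicative_features(G, E):
    """Multiply every genetic feature by every environmental feature, row by row
    (assumed 'multiplicative' input; one possible encoding of an interaction)."""
    n = G.shape[0]
    # For each plot i, build the outer product of G[i] and E[i], then flatten it.
    return np.einsum("ij,ik->ijk", G, E).reshape(n, -1)


# Toy data, invented for illustration: 4 plots, 5 genetic and 3 environmental features.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(4, 5)).astype(float)  # e.g. marker scores coded 0/1/2
E = rng.normal(size=(4, 3))                        # e.g. standardized weather covariates

X_add = additive_features(G, E)
X_mul = multiplicative_features(G, E)
print(X_add.shape, X_mul.shape)
```

In this sketch the additive input has 5 + 3 = 8 columns while the multiplicative input has 5 × 3 = 15. With realistic numbers of genetic and environmental features, that gap grows quickly, which is one plausible reason a multiplicative input could cost more computing time.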
The Results
Combining environmental and genetic data in the more straightforward “additive” manner gave better prediction accuracy than combining them in the more complicated “multiplicative” manner. The simpler model also took less computing time, and its mean prediction accuracy improved by 7 percent over the established model. The results were validated in three scenarios typically encountered in plant breeding.
Their study was published in the journal Theoretical and Applied Genetics under the title “Using machine learning to combine genetic and environmental data for maize grain yield predictions across multi-environment trials.”
The Value
The researchers say the results are promising, especially given the increasing availability of environmental data and the interest in combining environmental features and genetic data for prediction purposes. Their immediate goal is to apply the new model to improve the screening of genotypes for field trials, which would speed up the process of developing new, higher-performing crop varieties to feed a growing global population.
Read the Research
Using machine learning to combine genetic and environmental data for maize grain yield predictions across multi-environment trials
Theoretical and Applied Genetics
Volume 137, Issue 8 (2024)
https://doi.org/10.1007/s00122-024-04687-w
Supported in part by
Resources provided by the Arkansas High-Performance Computing Center, which is funded through multiple National Science Foundation grants and the Arkansas Economic Development Commission. Kaio O. G. Dias was supported by the Minas Gerais State Agency for Research and Development.
About the Researchers
Samuel B. Fernandes
Assistant Professor of Agricultural Statistics and Quantitative Genetics
Ph.D., Genetics and Plant Breeding, Universidade Federal de Lavras, Brazil
M.S., Genetics and Plant Breeding, Universidade Federal de Lavras, Brazil
B.S., Agronomy, Universidade de Brasília
Igor Fernandes
Graduate Student, Statistics and Analytics
M.S., Statistics and Analytics, University of Arkansas
B.S., Statistics, Universidade Federal de Goiás
Other Collaborators
Co-authors of the research include Caio Vieira, Assistant Professor of Soybean Breeding, Division of Agriculture, and Kaio O. G. Dias, Assistant Professor in the Department of General Biology at the Federal University of Viçosa in Brazil.