Author: Paulino Pérez-Rodríguez

Replication Data for: Multi-trait Bayesian decision for parental selection

Jose Crossa, Fernando Henrique Toledo, Paulino Pérez-Rodríguez (2020)

The files included in this study contain the data used with three promising multivariate loss functions, Kullback-Leibler (KL), the Energy Score, and the Multivariate Asymmetric Loss (MALF), to select the best-performing parents for the next breeding cycle in two extensive real wheat data sets.
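As an illustration of one of the loss functions named above, the following is a minimal sketch of the standard sample-based Energy Score estimator, ES = mean ||X_i - y|| - 0.5 * mean ||X_i - X_j||, where X_i are draws from a predictive distribution over several traits and y is a target vector. The function name, the inputs, and the use of the score as shown are assumptions for illustration; this record does not describe how the authors applied the score to rank parents.

```python
import math


def energy_score(samples, target):
    """Sample-based Energy Score (lower is better).

    samples: list of trait vectors drawn from a predictive distribution
             (hypothetical input, e.g. posterior draws for one candidate).
    target:  trait vector the draws are scored against (hypothetical).
    """
    m = len(samples)
    # First term: average Euclidean distance from each draw to the target.
    to_target = sum(math.dist(s, target) for s in samples) / m
    # Second term: average pairwise distance between draws (spread).
    spread = sum(math.dist(a, b) for a in samples for b in samples) / (m * m)
    return to_target - 0.5 * spread


# Toy usage with two traits: draws concentrated on the target score 0.
draws = [[1.0, 2.0], [1.0, 2.0]]
score_on_target = energy_score(draws, [1.0, 2.0])
score_off_target = energy_score(draws, [4.0, 6.0])
```

The second term rewards predictive spread only to the extent that it is justified, so a distribution cannot lower its score simply by being wide.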

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: Genomic Prediction with Genotype by Environment Interaction Analysis for Kernel Zinc Concentration in Tropical Maize Germplasm

Edna Mageto, Jose Crossa, Paulino Pérez-Rodríguez, Thanda Dhliwayo, Natalia Palacios Rojas, Xuecai Zhang (2020)

The Zinc association mapping (ZAM) panel is a set of 923 elite inbred lines from the International Maize and Wheat Improvement Center (CIMMYT) biofortification breeding program. The panel represents wide genetic diversity for kernel Zn and comprises several lines with tolerance/resistance to an array of abiotic and biotic stresses commonly affecting maize production in the tropics, improved nitrogen use efficiency, and grain nutritional quality. The files ZAM panel_923_LINES_GENO and Zinc association mapping (ZAM) panel_phenotypic data contain the GBS and phenotypic data for zinc (Zn) from this population. From the ZAM panel, four inbred lines (two with high Zn and two with low Zn) were selected and used to form the bi-parental populations, namely DH population1 and DH population2. The genotypic and phenotypic data for these populations are in DH populations1&2_255_LINES_GENO, DH population1_phenotypic data, and DH population2_phenotypic data.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: Approximate kernels for large data sets in genome-based prediction

Osval Antonio Montesinos-Lopez, Johannes Martini, Paulino Pérez-Rodríguez, Jose Crossa (2020)

The rapid development of molecular markers and sequencing technologies has made it possible to use genomic selection (GS) and genomic prediction (GP) in animal and plant breeding. However, computational difficulties arise when the number of observations is large. The five datasets provided here were used to support a comparative analysis of two genomic-enabled prediction models: the full genomic method for a single environment (FGSE) and the approximate kernel method for a single environment (APSE). The data were also used to compare the full genomic method with genotype × environment interaction (FGGE) to the approximate kernel method with genotype × environment interaction (APGE). The results of the analyses are described in the related publication.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Supplemental data for hybrid wheat prediction using genomic, pedigree and environmental covariables interaction models

Bhoja Basnet, Jose Crossa, Paulino Pérez-Rodríguez, Ravi Singh, Fatima Camarillo-Castillo (2018)

Genomic prediction of hybrids unobserved in field evaluations is crucial. In this study, we used genomic G×E models for hybrid prediction, where similarity between lines was assessed by pedigree and molecular markers, and similarity between environments was accounted for by environmental covariables.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: Joint use of genome, pedigree and their interaction with environment for predicting the performance of wheat lines in new environments

Osval Antonio Montesinos-Lopez, Philomin Juliana, Ravi Singh, Jesse Poland, Paulino Pérez-Rodríguez, Jose Crossa, Diego Jarquin (2019)

In this study, we evaluated genome-based prediction using 35,403 wheat lines from the Global Wheat Breeding Program of the International Maize and Wheat Improvement Center (CIMMYT). We implemented eight statistical models that included genome-wide molecular marker and pedigree information in two different validation schemes. All models included main effects, and some also considered interactions between the different types of covariates via Hadamard products of similarity structures. When only main effects were fitted, the pedigree models always gave better results than the genome-based models for predicting new lines in observed environments. However, for all traits, the highest predictive abilities were obtained when interactions between pedigree, markers and environments were included. When new lines were predicted in unobserved environments, the marker main-effects model was the best in almost all trait/year combinations. These results provide strong evidence that the different sources of genetic information (molecular markers and pedigree) are not equally useful at different stages of the breeding pipeline, and can be employed differentially to improve the design of future breeding programs.
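The interaction terms mentioned above are built from Hadamard (element-wise) products of similarity matrices. The sketch below shows the general idea under stated assumptions: a line-by-line similarity matrix and an environment-by-environment similarity matrix are each expanded to observation level and then multiplied entry by entry. All names and the toy values are hypothetical; the eight models of the study are not reproduced here.

```python
def expand(similarity, level_of_obs):
    """Expand a level-by-level similarity matrix to observation level.

    level_of_obs[k] is the index of the line (or environment) that
    observation k belongs to.
    """
    return [[similarity[i][j] for j in level_of_obs] for i in level_of_obs]


def hadamard(a, b):
    """Element-wise product of two matrices of equal shape."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical toy inputs: 2 lines evaluated in 2 environments.
G = [[1.0, 0.5], [0.5, 1.0]]  # similarity between lines (markers or pedigree)
E = [[1.0, 0.0], [0.0, 1.0]]  # similarity between environments
line_of_obs = [0, 1, 0, 1]
env_of_obs = [0, 0, 1, 1]

K_line = expand(G, line_of_obs)
K_env = expand(E, env_of_obs)
K_interaction = hadamard(K_line, K_env)  # covariance structure of the interaction term
```

With this toy environment matrix, observations from different environments get zero interaction covariance, while observations within the same environment inherit the similarity between their lines.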

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Deep kernel and deep learning for genomic-based prediction

Jose Crossa, Paulino Pérez-Rodríguez, Juan Burgueño, Ravi Singh, Philomin Juliana, Osval Antonio Montesinos-Lopez, Jaime Cuevas (2019)

Deep learning (DL) is a promising method in the context of genomic prediction for selecting individuals early in time without measuring their phenotypes. In this paper, we compare the genome-based prediction performance of the DL method, the deep kernel (arc-cosine kernel, AK) method, the Gaussian kernel (GK) method and the conventional kernel method (Genomic Best Linear Unbiased Predictor, GBLUP, GB). We used two real wheat data sets to benchmark these methods. We found that the GK and deep kernel AK methods outperformed the DL and the conventional GB methods. Although the gain in prediction performance of AK and GK was not very large, they have the advantage that no tuning parameters are required. Furthermore, although AK and GK had similar genomic-based performance, the deep kernel AK is easier to implement than the GK. For this reason, our results suggest that AK is an alternative to DL models with the advantage that no tuning process is required.
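To make the arc-cosine kernel concrete, here is a minimal sketch of the degree-1 arc-cosine kernel with its layer recursion, k(x, y) = (1/pi) ||x|| ||y|| (sin t + (pi - t) cos t), where t is the angle between x and y. The function name, the toy marker matrix, and the choice of the number of layers are assumptions for illustration; how the study selected the number of layers is not stated in this record. Rows of X are assumed to be non-zero vectors.

```python
import math


def arc_cosine_kernel(X, layers=1):
    """Degree-1 arc-cosine kernel between the rows of X.

    X:      list of marker vectors, one per line (hypothetical input).
    layers: number of recursive applications, emulating hidden layers.
    """
    n = len(X)
    # Start from the plain inner products between marker vectors.
    K = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    for _ in range(layers):
        diag = [K[i][i] for i in range(n)]
        updated = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                norm = math.sqrt(diag[i] * diag[j])
                # Clamp to [-1, 1] to guard against rounding error before acos.
                cos_t = max(-1.0, min(1.0, K[i][j] / norm))
                t = math.acos(cos_t)
                updated[i][j] = norm / math.pi * (math.sin(t) + (math.pi - t) * cos_t)
        K = updated
    return K


# Toy usage: two orthogonal marker vectors.
K_ak = arc_cosine_kernel([[1.0, 0.0], [0.0, 1.0]], layers=1)
```

Apart from the number of layers, the kernel has no bandwidth-like parameter to tune.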

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Deep kernel of genomic and near infrared predictions in multi-environment breeding trials

Jaime Cuevas, Osval Antonio Montesinos-Lopez, Philomin Juliana, Paulino Pérez-Rodríguez, Juan Burgueño, Carlos Guzman, Jose Crossa (2019)

In genomic prediction, deep learning artificial neural networks are part of the machine learning methods that incorporate parametric, non-parametric and semi-parametric statistical models. Kernel methods are seen as more flexible and easier to interpret than neural networks. Kernel methods used in genomic prediction comprise the linear genomic best linear unbiased predictor (GBLUP) kernel (GB) and the Gaussian kernel (GK). These kernels have been used with two statistical models: single-environment and genomic × environment (GE) models. Recently, near infrared spectroscopy (NIR) has been used as a phenotyping method for predicting unobserved line performance in plant breeding trials. In this study, we used a non-linear arc-cosine kernel (AK) that emulates a deep learning artificial neural network. We compared the prediction accuracy of AK with that of the GB and GK kernel methods in four genomic data sets, one of which also includes pedigree (ABLUP) and NIR (NBLUP) information. Results show that, for all four data sets, the AK and GK kernels gave higher prediction accuracy than the linear GB kernel for single-environment as well as GE multi-environment models. In addition, AK gave similar or slightly higher prediction accuracy than the GK kernel.
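For comparison with the arc-cosine kernel, the two baseline kernels named above can be sketched as follows. The linear kernel is taken here as XX'/p and the Gaussian kernel as exp(-h d²/p), where d² is the squared Euclidean distance between marker vectors and p the number of markers. The scaling by p, the bandwidth parameter h, and all names are assumptions for illustration; the exact scaling used in the study is not given in this record.

```python
import math


def linear_kernel(X):
    """GBLUP-style linear kernel: inner products scaled by the number of markers.

    X is assumed to hold centered (or standardized) marker codes.
    """
    p = len(X[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / p for xj in X] for xi in X]


def gaussian_kernel(X, bandwidth=1.0):
    """Gaussian kernel on squared Euclidean distances scaled by marker count.

    bandwidth is a hypothetical tuning parameter (h in the text above).
    """
    p = len(X[0])

    def sq_dist(xi, xj):
        return sum((a - b) ** 2 for a, b in zip(xi, xj))

    return [[math.exp(-bandwidth * sq_dist(xi, xj) / p) for xj in X] for xi in X]


# Toy usage with two lines and two centered markers.
markers = [[1.0, -1.0], [-1.0, 1.0]]
K_gb = linear_kernel(markers)
K_gk = gaussian_kernel(markers, bandwidth=1.0)
```

The linear kernel can take negative values for dissimilar lines, whereas the Gaussian kernel maps every pair to a similarity in (0, 1].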

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY