Secure Inference on Homomorphically Encrypted Genotype Data with Encrypted Linear Models

Authors

  • Meng Zou*, School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
  • Guangyang Zhang, School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
  • Fan Zhang, Tencent, Beijing 100193, China
  • Guoping Liu*, School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China

Keywords:

homomorphic encryption, genotype to phenotype, CKKS, iDASH-2022

Abstract

Background: Accurate models are crucial for estimating phenotypes from high-throughput genomic data. Because genetic and phenotypic data are sensitive, the models must also be secure to protect private information. Constructing a model that is both accurate and secure is therefore important for the secure inference of phenotypes.

Methods: We propose a secure inference protocol on homomorphically encrypted genotype data with encrypted linear models. First, the genotype data are scaled by feature importance obtained from XGBoost or AdaBoost, and linear models are then trained in plaintext to predict the phenotypes. Second, the model parameters and the test data are encrypted with the CKKS scheme for secure inference. Third, the phenotypes are predicted under CKKS homomorphic encryption. Finally, the client decrypts the encrypted predictions and computes 1-NRMSE or AUC to evaluate the models.
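The core server-side operation in a protocol of this kind is an inner product between a packed feature vector and a packed weight vector. The sketch below illustrates the usual rotate-and-sum pattern for slot-packed ciphertexts, simulated on plain Python lists. It is not the authors' implementation and performs no encryption: the function names, the zero-padding to a power of two, and the example numbers are all assumptions made for illustration.

```python
from typing import List


def rotate(slots: List[float], k: int) -> List[float]:
    """Cyclic left rotation by k slots (plaintext stand-in for a CKKS rotation)."""
    k %= len(slots)
    return slots[k:] + slots[:k]


def slot_mul(a: List[float], b: List[float]) -> List[float]:
    """Slot-wise product (stand-in for a ciphertext-ciphertext multiplication)."""
    return [x * y for x, y in zip(a, b)]


def slot_add(a: List[float], b: List[float]) -> List[float]:
    """Slot-wise sum (stand-in for a ciphertext addition)."""
    return [x + y for x, y in zip(a, b)]


def pad_pow2(v: List[float]) -> List[float]:
    """Zero-pad to the next power of two so rotate-and-sum covers every slot."""
    n = 1 << (len(v) - 1).bit_length()
    return list(v) + [0.0] * (n - len(v))


def packed_dot(x: List[float], w: List[float]) -> List[float]:
    """One slot-wise multiply followed by log2(n) rotate-and-add steps.

    Afterwards every slot holds the full inner product.
    """
    acc = slot_mul(x, w)
    step = len(acc) // 2
    while step >= 1:
        acc = slot_add(acc, rotate(acc, step))
        step //= 2
    return acc


def predict(genotype, importance, weights, bias):
    """Hypothetical pipeline: scale by importance, then evaluate w.x + b."""
    # Step 1 (plaintext, client side): scale features by importance.
    scaled = [g * s for g, s in zip(genotype, importance)]
    # Steps 2-3 (would run on ciphertexts): packed inner product plus bias.
    total = packed_dot(pad_pow2(scaled), pad_pow2(list(weights)))
    # Step 4 (client side after decryption): read any slot.
    return total[0] + bias


# Made-up example values, not data from the paper.
y_hat = predict([0, 1, 2, 1, 0], [0.5, 1.0, 0.25, 2.0, 1.0],
                [0.2, -0.1, 0.4, 0.3, 0.0], 1.5)
```

With a real CKKS library, each list above would be a ciphertext, the decrypted result would carry a small approximation error, and the number of rotations (log2 of the slot count) is what mainly drives the inference time.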

Results: Five phenotypes of 3000 samples with 20390 variants were used to validate the secure inference protocol. On the test data, the protocol achieves a 1-NRMSE of 0.9548, 0.9639, and 0.9673 for the three continuous phenotypes and an AUC of 0.9943 and 0.99290 for the two categorical phenotypes. The protocol is also robust across 100 rounds of random sampling. On an encrypted test set of 198 samples, it achieves an average accuracy of 0.9725, and the overall inference takes only 4.32 s. These results placed the protocol first in Track 2 of the iDASH-2022 challenge.
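For readers unfamiliar with the two evaluation metrics, the following is a minimal sketch of how they can be computed on decrypted predictions. The abstract does not state how the RMSE is normalized; dividing by the range of the true values is an assumption here, and other conventions (such as dividing by the mean) give different numbers. The function names and sample values are illustrative only.

```python
import math


def one_minus_nrmse(y_true, y_pred):
    """1 - RMSE / (max(y_true) - min(y_true)).

    Assumption: range normalization; the source does not specify it.
    """
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return 1.0 - rmse / (max(y_true) - min(y_true))


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Fraction of (positive, negative) pairs where the positive sample
    receives the higher score; ties count as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up values for illustration, not results from the paper.
score_continuous = one_minus_nrmse([0.0, 10.0], [1.0, 9.0])
score_binary = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a test set of a few hundred samples; a rank-based formula would be preferable for larger sets.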

Conclusion: We propose an accurate and secure protocol for predicting phenotypes from genotypes; it produces hundreds of predictions for all phenotypes within seconds.

Published

2024-08-31

Section

Original Articles

How to Cite

Meng Zou, Guangyang Zhang, Fan Zhang, and Guoping Liu. 2024. “Secure Inference on Homomorphically Encrypted Genotype Data with Encrypted Linear Models”. Human Biology 94 (4): 774-80. https://www.humbiol.org/Home/article/view/146.
