Calculate Polygenic Risk Score on UKB
- Use the dx command-line tool to log in to your UKB RAP account:
dx login
- Run the following shell script. It launches the swiss-army-knife app on UKB RAP to extract genotyping data for a selected set of SNPs:
# pull genotyping data from UKB using UKB RAP
sh 01-pull-snps-imp37.sh
- Description of some of the files:
- Code in 01-pull-snps-imp37.sh:
# shell command lines
# note: ${project} is not set in this script; set it to your RAP project (name or ID) before running
imp_file_dir="/Bulk/Imputation/UKB imputation from genotype"
data_field="ukb22828"
rsidlist="multiDis_SNP.txt"
genetic_data_dir="/Genetic_data/"

for i in $(seq 1 1 22); do
    # command run inside each swiss-army-knife job; the output name matches the files used by cat-bgen below
    run_snps="bgenix -g ${data_field}_c${i}_b0_v3.bgen -incl-rsids ${rsidlist} > ${data_field}_chr_${i}.bgen"
    dx run swiss-army-knife -iin="${imp_file_dir}/${data_field}_c${i}_b0_v3.bgen" \
        -iin="${imp_file_dir}/${data_field}_c${i}_b0_v3.sample" \
        -iin="${imp_file_dir}/${data_field}_c${i}_b0_v3.bgen.bgi" \
        -iin="${genetic_data_dir}${rsidlist}" \
        -icmd="${run_snps}" --tag="SelectSNPs" --instance-type "mem2_ssd2_v2_x16" \
        --destination="${project}:${genetic_data_dir}" --brief --yes
done
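Before submitting 22 cloud jobs, it can help to print the command each job would execute. The loop below is a minimal dry-run sketch: it reuses the script's variables, makes no dx call, and only writes the bgenix command strings to a hypothetical file, dry_run_cmds.txt. The output name ${data_field}_chr_${i}.bgen is an assumption chosen to match the file names that the local cat-bgen step expects.

```shell
# Dry run: list the bgenix command each swiss-army-knife job would execute.
# Nothing is submitted to UKB RAP.
data_field="ukb22828"
rsidlist="multiDis_SNP.txt"

: > dry_run_cmds.txt   # create or truncate the (hypothetical) output file
for i in $(seq 1 22); do
    run_snps="bgenix -g ${data_field}_c${i}_b0_v3.bgen -incl-rsids ${rsidlist} > ${data_field}_chr_${i}.bgen"
    printf '%s\n' "$run_snps" >> dry_run_cmds.txt
done
cat dry_run_cmds.txt
```

If every line looks right, the real script can be run unchanged.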
- Part of multiDis_SNP.txt (the rsIDs of the SNPs for multiple diseases, separated by spaces):
rs1060743 rs1532278 rs3865444 rs6656401 rs7232
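When the rsID list is assembled from several diseases, duplicates and stray tokens can creep in. The sketch below uses only POSIX tools to split a list on any whitespace, drop duplicates, report tokens that do not look like rs followed by digits, and write the result back as one space-separated line (the layout shown above). The file names are hypothetical, and the toy input is the five rsIDs above with one duplicate added for illustration.

```shell
# Toy input: the five rsIDs shown above over two lines, with one duplicate added.
printf 'rs1060743 rs1532278 rs3865444\nrs6656401 rs7232 rs1532278\n' > multiDis_SNP_raw.txt

# Split on spaces, tabs and newlines (squeezing runs), drop empty lines, de-duplicate.
tr -s ' \t\n' '\n' < multiDis_SNP_raw.txt | sed '/^$/d' | sort -u > rsids_sorted.tmp

# Report tokens that are not of the form rs<digits> on stderr.
grep -v -E '^rs[0-9]+$' rsids_sorted.tmp >&2 || true

# Keep the well-formed rsIDs and write them as a single space-separated line.
grep -E '^rs[0-9]+$' rsids_sorted.tmp | paste -s -d ' ' - > multiDis_SNP_clean.txt
rm -f rsids_sorted.tmp
cat multiDis_SNP_clean.txt
```

Note that sort -u changes the order of the IDs; that is harmless for bgenix -incl-rsids, which only uses the list as a filter.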
- codes in
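- The following analyses are run on a local machine (bgenix, cat-bgen and plink2 must be installed, and the per-chromosome .bgen files and the .sample file must first be downloaded from the RAP)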
- Combine the per-chromosome .bgen files into a single file:
# combine the per-chromosome bgen files (chromosomes 1-22 in numeric order)
cat-bgen -g ukb22828_chr_1.bgen -g ukb22828_chr_2.bgen -g ukb22828_chr_3.bgen -g ukb22828_chr_4.bgen \
    -g ukb22828_chr_5.bgen -g ukb22828_chr_6.bgen -g ukb22828_chr_7.bgen -g ukb22828_chr_8.bgen \
    -g ukb22828_chr_9.bgen -g ukb22828_chr_10.bgen -g ukb22828_chr_11.bgen -g ukb22828_chr_12.bgen \
    -g ukb22828_chr_13.bgen -g ukb22828_chr_14.bgen -g ukb22828_chr_15.bgen -g ukb22828_chr_16.bgen \
    -g ukb22828_chr_17.bgen -g ukb22828_chr_18.bgen -g ukb22828_chr_19.bgen -g ukb22828_chr_20.bgen \
    -g ukb22828_chr_21.bgen -g ukb22828_chr_22.bgen \
    -og initial_chr.bgen -clobber
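Typing 22 -g options by hand makes it easy to skip, repeat or reorder a chromosome. As a sketch, the same argument list can be built with a loop in POSIX sh. Here the command is only echoed to a hypothetical file, cat_bgen_cmd.txt, for inspection rather than executed.

```shell
# Build "-g ukb22828_chr_<i>.bgen" for i = 1..22 in numeric order.
set --
for i in $(seq 1 22); do
    set -- "$@" -g "ukb22828_chr_${i}.bgen"
done

# Print the full command for checking. To run it for real, call:
#   cat-bgen "$@" -og initial_chr.bgen -clobber
echo cat-bgen "$@" -og initial_chr.bgen -clobber > cat_bgen_cmd.txt
cat cat_bgen_cmd.txt
```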
- Write the index file (.bgen.bgi) for the combined file:
# write index file .bgen.bgi
bgenix -g initial_chr.bgen -index -clobber
- Extract the SNPs for a single disease from the combined file (the code above extracted the SNPs of multiple diseases at once):
# extract SNPs
disease_name="J10_INFLUPNEU"
bgenix -g initial_chr.bgen -incl-rsids snps_${disease_name}.txt > ${disease_name}.bgen
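How snps_${disease_name}.txt is produced is not shown here. One plausible approach, assuming the rsIDs to extract are exactly those in the disease's score file (first column, after the header line rsid effect_allele Beta), is to take that column and join it with spaces, the same layout as multiDis_SNP.txt. The disease name "toy" and the three score-file rows are for illustration only, so no real file is overwritten.

```shell
disease_name="toy"   # hypothetical; in this README it would be e.g. J10_INFLUPNEU
cat > "${disease_name}_scorefile.txt" <<'EOF'
rsid effect_allele Beta
rs146800061 G 0.362028
rs13088725 T 0.0946831
rs11726368 T 0.133672
EOF

# Skip the header, ignore blank lines, keep each rsID once (first-occurrence order),
# and join the IDs with spaces.
awk 'NR > 1 && NF > 0 && !seen[$1]++ { print $1 }' "${disease_name}_scorefile.txt" \
    | paste -s -d ' ' - > "snps_${disease_name}.txt"
cat "snps_${disease_name}.txt"

# Next step (needs bgenix and initial_chr.bgen, so not run here):
# bgenix -g initial_chr.bgen -incl-rsids snps_${disease_name}.txt > ${disease_name}.bgen
```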
- Write the index file ${disease_name}.bgen.bgi:
# write index file for disease_name .bgen.bgi
bgenix -g ${disease_name}.bgen -index -clobber
- Convert to PLINK 2 format (.pgen/.pvar/.psam), keeping variants with a minor allele frequency of at least 0.01:
# convert to PLINK 2 format
plink2 --bgen ${disease_name}.bgen ref-first \
    --sample ukb22828_c1_b0_v3.sample \
    --freq --maf 0.01 \
    --make-pgen --sort-vars \
    --out ${disease_name}
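Because --maf 0.01 removes rare variants, some rsIDs in the score file may be missing from ${disease_name}.pvar, and plink2 --score skips variants it cannot find. A quick way to list them, sketched below with awk, is to compare the score file's first column with the ID column (third column) of the .pvar, ignoring lines that start with # (such as the #CHROM header). The toy .pvar and score file contents are made up, and the file names are hypothetical.

```shell
# Toy .pvar (tab-separated, #CHROM header) and toy score file; values are made up.
printf '#CHROM\tPOS\tID\tREF\tALT\n1\t1000\trs111\tA\tG\n2\t2000\trs222\tC\tT\n' > toy.pvar
printf 'rsid effect_allele Beta\nrs111 G 0.10\nrs222 T 0.20\nrs333 C 0.30\n' > toy_check_scorefile.txt

# Pass 1 (toy.pvar): remember the ID column of every line not starting with #.
# Pass 2 (score file): print rsIDs, skipping its header, that were not seen in the .pvar.
awk 'FNR == NR { if ($0 !~ /^#/) in_pvar[$3] = 1; next }
     FNR > 1 && !($1 in in_pvar) { print $1 }' toy.pvar toy_check_scorefile.txt > missing_rsids.txt

echo "score-file variants not found in the .pvar:"
cat missing_rsids.txt
```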
- Calculate the PRS:
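# calculate the PRS; inputs: the ${disease_name} pfile set (.pgen/.pvar/.psam) and ${disease_name}_scorefile.txt
# 'header' skips the "rsid effect_allele Beta" line; results are written to ${disease_name}_prsout.sscore
plink2 --pfile ${disease_name} \
    --score ${disease_name}_scorefile.txt header no-mean-imputation list-variants \
        cols=maybefid,nallele,denom,dosagesum,scoreavgs,scoresums \
    --out ${disease_name}_prsout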
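For intuition, with no missing genotypes the score is a weighted sum: SCORE1_SUM adds up Beta times the dosage of the effect allele (0 to 2) over the scored variants, and SCORE1_AVG divides that sum by DENOM, which here equals ALLELE_CT (2 alleles per scored variant). This matches the results shown in this README, e.g. 2.25976 / 54 = 0.0418474. The awk sketch below reproduces that arithmetic on made-up inputs; it is an illustration of the formula, not a replacement for plink2, and the long-format dosage file (sample, rsid, dosage) is a hypothetical layout chosen for simplicity.

```shell
# Toy weights (same layout as the score file) and toy effect-allele dosages.
cat > toy_weights.txt <<'EOF'
rsid effect_allele Beta
rsA G 0.5
rsB T 0.2
rsC C 0.1
EOF
cat > toy_dosages.txt <<'EOF'
S1 rsA 2
S1 rsB 1
S1 rsC 0
S2 rsA 0
S2 rsB 2
S2 rsC 2
EOF

# Pass 1: load Beta per rsID (skipping the header). Pass 2: for each sample,
# accumulate Beta * dosage and count scored variants; dosages are assumed to
# already refer to the effect allele, so effect_allele is not re-checked here.
awk 'FNR == NR { if (FNR > 1) beta[$1] = $3; next }
     ($2 in beta) { sum[$1] += beta[$2] * $3; n[$1]++ }
     END { for (s in sum) printf "%s %d %.4f %.4f\n", s, 2 * n[s], sum[s], sum[s] / (2 * n[s]) }' \
    toy_weights.txt toy_dosages.txt | sort > toy_prs.txt

echo "IID ALLELE_CT SCORE_SUM SCORE_AVG"
cat toy_prs.txt
```

Missing genotypes are deliberately left out; how plink2 handles them depends on the no-mean-imputation modifier.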
- Part of ${disease_name}_scorefile.txt (effect allele and effect size, Beta, for each SNP, taken from the disease's GWAS summary statistics):
rsid effect_allele Beta
rs146800061 G 0.362028
rs13088725 T 0.0946831
rs11726368 T 0.133672
rs117066711 C 0.388579
rs9507345 G 0.17304
rs2248248 G 0.14556
rs140194168 C 0.367733
rs28655344 C 0.14604
rs28627461 A 0.0763166
rs34772569 C 0.243625
rs2837113 G 0.0779496
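A malformed row in the score file (wrong number of columns, an allele code other than A/C/G/T, a non-numeric Beta) is easy to miss. The awk sketch below flags such rows before scoring. It assumes the three-column layout above with a single header line and checks only single-nucleotide alleles, which is an assumption: indel alleles would be flagged and need a looser rule. The toy input is made up, with the last two rows deliberately bad.

```shell
cat > toy_check.txt <<'EOF'
rsid effect_allele Beta
rs146800061 G 0.362028
rs13088725 T 0.0946831
rsBAD1 X 0.1
rsBAD2 A abc
EOF

# Skip the header, then report the first problem found on each row, with its line number.
awk 'NR == 1 { next }
     NF != 3 { print NR ": expected 3 columns"; next }
     $2 !~ /^[ACGT]$/ { print NR ": effect allele " $2 " is not A/C/G/T"; next }
     $3 !~ /^-?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/ { print NR ": Beta " $3 " is not numeric" }' \
    toy_check.txt > score_file_problems.txt

cat score_file_problems.txt
```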
- Part of the results (one row per participant):
eid IID ALLELE_CT DENOM NAMED_ALLELE_DOSAGE_SUM SCORE1_AVG SCORE1_SUM
3274335 3274335 54 54 29.886 0.0418474 2.25976
4217650 4217650 54 54 28.941 0.0435369 2.35099
3332919 3332919 54 54 32.314 0.0430303 2.32363
2602284 2602284 54 54 31.773 0.0430288 2.32356
5418540 5418540 54 54 32.965 0.0450697 2.43377
2806429 2806429 54 54 31.004 0.0415628 2.24439
5922336 5922336 54 54 28.98 0.041512 2.24165
4299370 4299370 54 54 32.745 0.0441643 2.38487
5381487 5381487 54 54 31.012 0.0411954 2.22455
1811259 1811259 54 54 31.969 0.042009 2.26848
4164188 4164188 54 54 35.588 0.0456951 2.46753
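A quick sanity check on an output table is that SCORE1_AVG equals SCORE1_SUM / DENOM for every row. The awk sketch below locates the columns by header name and applies that check, with a small tolerance because the values shown have about six significant digits. Its input is three rows copied from the results above; the file names are hypothetical.

```shell
cat > prs_sample.txt <<'EOF'
eid IID ALLELE_CT DENOM NAMED_ALLELE_DOSAGE_SUM SCORE1_AVG SCORE1_SUM
3274335 3274335 54 54 29.886 0.0418474 2.25976
4217650 4217650 54 54 28.941 0.0435369 2.35099
4164188 4164188 54 54 35.588 0.0456951 2.46753
EOF

# Map header names to column numbers, then compare SUM / DENOM with AVG row by row.
awk 'NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
     {
       expected = $(col["SCORE1_SUM"]) / $(col["DENOM"])
       diff = expected - $(col["SCORE1_AVG"])
       if (diff < 0) diff = -diff
       status = (diff < 1e-6) ? "ok" : "mismatch"
       print $(col["IID"]), status
     }' prs_sample.txt > avg_check.txt

cat avg_check.txt
```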
- Example polygenic risk score files
- This example file is for type 2 diabetes, as used in PMID: 31232720
- This example file follows the format of PMID: 35251129
- A second example file, for pancreatic cancer, is based on the file from the PGS Catalog (PGS Catalog Pancreatic cancer PRS)
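PGS Catalog scoring files begin with #-prefixed metadata lines, followed by a tab-separated header that typically includes rsID, effect_allele and effect_weight columns. The column sets vary between files, so check the header of the file you download. The sketch below assumes those three columns exist and converts such a file into the rsid effect_allele Beta layout used above. Rows without an rsID are dropped, which is an assumption worth revisiting for position-only files. The toy input and file names are hypothetical. Downloaded files are usually gzipped, so for a real file pipe it in with gzip -dc first.

```shell
# Toy PGS Catalog-style file: metadata lines, a tab-separated header, three rows (made up).
printf '%s\n' '#toy_metadata=1' '#toy_metadata=2' > toy_pgs.txt
printf 'rsID\tchr_name\tchr_position\teffect_allele\tother_allele\teffect_weight\n' >> toy_pgs.txt
printf 'rs101\t1\t1000\tA\tG\t0.05\n' >> toy_pgs.txt
printf 'rs202\t2\t2000\tT\tC\t-0.12\n' >> toy_pgs.txt
printf '\t3\t3000\tC\tA\t0.30\n' >> toy_pgs.txt   # no rsID: dropped below

# Skip metadata, read column positions from the header, print the three needed columns.
awk 'BEGIN { FS = "\t" }
     /^#/ { next }
     !have_header {
         for (i = 1; i <= NF; i++) col[$i] = i
         have_header = 1
         print "rsid", "effect_allele", "Beta"
         next
     }
     $(col["rsID"]) != "" { print $(col["rsID"]), $(col["effect_allele"]), $(col["effect_weight"]) }' \
    toy_pgs.txt > toy_pgs_scorefile.txt

cat toy_pgs_scorefile.txt
```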