HATK (HLA Analysis Tool-Kit) is a collection of tools and modules for HLA fine-mapping analysis, which identifies the HLA allele or amino acid position of an HLA gene that drives a disease. HLA fine-mapping is an indispensable analysis in studies of autoimmune diseases.
In a GWAS (Genome-Wide Association Study) and its fine-mapping analysis, researchers can obtain candidate causal variants for the target disease. However, association tests on variants in the HLA (Human Leukocyte Antigen) region, chromosome 6p21, often give unreliable results because this region is extremely polymorphic. Consequently, a conventional association test based on an SNP array panel may produce inaccurate signals in the HLA region.
On the other hand, IPD-IMGT/HLA, a specialist database, provides the official and most detailed information on the HLA region. Updated four times a year, it maintains all HLA allele information and names alleles according to the nomenclature defined by the 'WHO Nomenclature Committee for Factors of the HLA System'. It also provides (1) the amino acid sequence and (2) the DNA sequence of each HLA allele. Using these data requires the exact HLA alleles of each patient, which traditionally meant employing expensive HLA typing technologies. However, thanks to recent developments in HLA imputation and inference, researchers can now obtain HLA allele information for hundreds to thousands of patients and avoid the cost of HLA typing services.
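As a rough illustration of the nomenclature mentioned above, an allele name such as `HLA-A*02:01:01:01` is a gene name followed by colon-separated fields, optionally ending in a single expression suffix letter. The sketch below splits a name into its parts and truncates it to a chosen number of fields. The helpers `parse_hla_allele` and `truncate_fields` are hypothetical names for this example only and are not part of HATK.

```python
import re

# Matches names like "HLA-A*02:01:01:01" or "DRB1*15:01"; the "HLA-" prefix is
# optional and one suffix letter (e.g. "N") may follow the last field.
_ALLELE_RE = re.compile(
    r"^(?:HLA-)?(?P<gene>[A-Z0-9]+)\*(?P<fields>\d+(?::\d+){0,3})(?P<suffix>[A-Z]?)$"
)


def parse_hla_allele(name):
    """Split an allele name into (gene, list of fields, suffix)."""
    match = _ALLELE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised HLA allele name: {name!r}")
    fields = match.group("fields").split(":")
    return match.group("gene"), fields, match.group("suffix")


def truncate_fields(name, n_fields=2):
    """Return the allele name reduced to at most n_fields fields (suffix dropped)."""
    gene, fields, _suffix = parse_hla_allele(name)
    return f"{gene}*{':'.join(fields[:n_fields])}"


if __name__ == "__main__":
    print(truncate_fields("HLA-A*02:01:01:01"))           # A*02:01
    print(truncate_fields("DRB1*15:01:01", n_fields=1))   # DRB1*15
```

This only demonstrates the shape of the names; it does not validate alleles against the IPD-IMGT/HLA database.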
Ultimately, HATK aims to perform an association test targeted at the HLA region. Based on patients' HLA types and the corresponding amino acid and DNA sequence information distributed by the IPD-IMGT/HLA database, HATK builds a marker panel that includes not only typical intergenic variant markers (i.e. SNPs) but also variants of the HLA region. HATK also provides an additional association test method so that researchers can analyze signals arising at amino acid sequence positions.
First, prepare an OS X (Mac) or Linux operating system. HATK currently doesn't support Windows. HATK has been confirmed to work on the following operating systems.
- Linux:
  - Ubuntu 19.04 (Disco Dingo)
  - Ubuntu 18.04.3 LTS (Bionic Beaver)
  - CentOS_7
  - Linux Mint 19.2 Cinnamon (Tina)
- OS X:
  - Catalina (with Bash, NOT Zsh)
  - Mojave
If you use OS X Catalina, make sure your default shell is 'Bash($)', not 'Zsh(%)'. To change the default shell to Bash, please refer to this blog (https://www.howtogeek.com/444596/how-to-change-the-default-shell-to-bash-in-macos-catalina/).
Then, download this project into a directory on your OS X or Linux system. We assume that the 'git' command is already installed on your system.
$ git clone https://github.com/WansonChoi/HATK.git
$ cd HATK
We strongly recommend using the latest version of 'Anaconda(or Miniconda)' to set up HATK.
- Install Anaconda or Miniconda.
  - Anaconda: (https://www.anaconda.com/)
  - Miniconda: (https://docs.conda.io/en/latest/miniconda.html)
- Create a new independent Python virtual environment with the given YML file.
  Use the 'HATK_LINUX.yml' or 'HATK_OSX.yml' file in the project folder, depending on your operating system, to create a new Python virtual environment.
$ conda env create -f HATK_OSX.yml ## OS X(Mac)
$ conda env create -f HATK_LINUX.yml ## Linux
The above command creates a new Python virtual environment named 'HATK', which contains the required Python packages, R and R libraries, and is independent of your original Python system. For a more detailed explanation of how Anaconda manages Python virtual environments, please check this reference (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#create-env-file-manually).
If the new virtual environment has been successfully installed, activate it.
$ conda activate HATK
HATK runs in this virtual environment.
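To confirm from Python that the 'HATK' environment is the active one, the sketch below reads the `CONDA_DEFAULT_ENV` variable that conda sets on activation. The function names `active_conda_env` and `in_hatk_env` are hypothetical helpers written for this example and are not part of HATK.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if unset."""
    environ = os.environ if environ is None else environ
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV")


def in_hatk_env(environ=None, expected="HATK"):
    """True if the active conda environment has the expected name."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if in_hatk_env():
        print("HATK environment is active; Python at", sys.executable)
    else:
        print("Active conda environment:", active_conda_env())
```

If the check reports a different environment, run 'conda activate HATK' again before starting HATK.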
(Tip) Type 'conda activate base' on your command line if you want to go back to your original Python system setting. (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#deactivating-an-environment)
(Tip) Type 'conda env remove -n HATK' on your command line if you want to permanently remove the virtual environment created for HATK from your Anaconda. (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#removing-an-environment)
$ python3 HATK.py \
--variants example/wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18 \
--hped example/wtccc_filtered_58C_RA.hatk.300+300.hped \
--2field \
--pheno example/wtccc_filtered_58C_RA.hatk.300+300.phe \
--pheno-name RA \
--out MyHLAStudy/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18 \
--imgt 3320 \
--hg 18 \
--imgt-dir example/IMGTHLA3320 \
--multiprocess 2
This command runs (1) IMGT2Seq, (2) NomenCleaner, (3) bMarkerGenerator, (4) HLA_Analyzer (association test, logistic regression), (5) Manhattan Plot and (6) Heatmap Plot, which are the minimal components for HLA fine-mapping analysis.
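The example command above can also be assembled from a Python script, which is convenient when running several datasets. The following is a minimal sketch that uses only the flags shown in the example command; `build_hatk_command` and `run_hatk` are hypothetical helper names, and no assumption is made about which flags HATK treats as optional, so every value is passed explicitly.

```python
import shlex
import subprocess


def build_hatk_command(variants, hped, pheno, pheno_name, out,
                       imgt, hg, imgt_dir, multiprocess, two_field=True):
    """Return the HATK.py call from the example as an argument list."""
    cmd = ["python3", "HATK.py", "--variants", variants, "--hped", hped]
    if two_field:
        cmd.append("--2field")
    cmd += [
        "--pheno", pheno,
        "--pheno-name", pheno_name,
        "--out", out,
        "--imgt", str(imgt),
        "--hg", str(hg),
        "--imgt-dir", imgt_dir,
        "--multiprocess", str(multiprocess),
    ]
    return cmd


def run_hatk(cmd):
    """Run the command; call this from the HATK project folder."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    prefix = "example/wtccc_filtered_58C_RA.hatk.300+300"
    cmd = build_hatk_command(
        variants=prefix + ".chr6.hg18",
        hped=prefix + ".hped",
        pheno=prefix + ".phe",
        pheno_name="RA",
        out="MyHLAStudy/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18",
        imgt=3320,
        hg=18,
        imgt_dir="example/IMGTHLA3320",
        multiprocess=2,
    )
    # Print the command for inspection; pass it to run_hatk(cmd) to execute.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain characters such as '+'.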
Each module of HATK can also be run separately. README files for each module are provided in the 'docs/' folder. They include more detailed explanations and usage examples for each module.
Check which Human Genome version, e.g. hg18, hg19 or hg38, is used in your study. HATK takes no responsibility for results when Human Genome versions are mixed, for example, using SNP array data based on 'hg19' while passing '18' to the '--hg' argument.
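One way to catch such a mismatch early is a small pre-flight check before calling HATK. The sketch below is a heuristic only: it assumes the variants file prefix carries a build label such as 'hg18', as the example data does, and it cannot detect a file whose name is itself wrong. The names `KNOWN_BUILDS`, `build_label_in_name` and `check_hg_argument` are hypothetical, and the set of builds is taken from the versions named above.

```python
import os
import re

# Builds named in this README; extend if your study uses another one.
KNOWN_BUILDS = {"18", "19", "38"}


def build_label_in_name(path):
    """Return the digits of an 'hgNN' label in the file name, or None."""
    match = re.search(r"hg(\d+)", os.path.basename(path))
    return match.group(1) if match else None


def check_hg_argument(variants_prefix, hg):
    """Raise ValueError on an unknown or conflicting build.

    Returns True if the file name confirms the build and False if the
    file name carries no build label (nothing could be checked).
    """
    hg = str(hg)
    if hg not in KNOWN_BUILDS:
        raise ValueError(f"unexpected genome build: {hg}")
    label = build_label_in_name(variants_prefix)
    if label is not None and label != hg:
        raise ValueError(
            f"file name suggests hg{label} but '--hg {hg}' was requested"
        )
    return label is not None


if __name__ == "__main__":
    prefix = "example/wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18"
    print(check_hg_argument(prefix, 18))  # True
```

A False return value means the build still has to be verified by hand, for example from the array manifest used to generate the data.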
HATK: HLA analysis toolkit - Wanson Choi, Yang Luo, Soumya Raychaudhuri, Buhm Han (https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa684/5879278)
The HATK Software Code is freely available for non-commercial academic research use. If you would like to obtain a license to the Code for commercial use, please contact Wanson Choi (WC) at [email protected] and Buhm Han (BH) at [email protected]. WE (WC and BH) MAKE NO REPRESENTATIONS OR WARRANTIES WHATSOEVER, EITHER EXPRESS OR IMPLIED, WITH RESPECT TO THE CODE PROVIDED HERE UNDER. IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE WITH RESPECT TO CODE ARE EXPRESSLY DISCLAIMED. THE CODE IS FURNISHED "AS IS" AND "WITH ALL FAULTS" AND DOWNLOADING OR USING THE CODE IS UNDERTAKEN AT YOUR OWN RISK. TO THE FULLEST EXTENT ALLOWED BY APPLICABLE LAW, IN NO EVENT SHALL WE BE LIABLE, WHETHER IN CONTRACT, TORT, WARRANTY, OR UNDER ANY STATUTE OR ON ANY OTHER BASIS FOR SPECIAL, INCIDENTAL, INDIRECT, PUNITIVE, MULTIPLE OR CONSEQUENTIAL DAMAGES SUSTAINED BY YOU OR ANY OTHER PERSON OR ENTITY ON ACCOUNT OF USE OR POSSESSION OF THE CODE, WHETHER OR NOT FORESEEABLE AND WHETHER OR NOT WE HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES, INCLUDING WITHOUT LIMITATION DAMAGES ARISING FROM OR RELATED TO LOSS OF USE, LOSS OF DATA, DOWNTIME, OR FOR LOSS OF REVENUE, PROFITS, GOODWILL, BUSINESS OR OTHER FINANCIAL LOSS.