HaploGrep2 is a free tool for human mitochondrial DNA (mtDNA) haplogroup classification that also performs advanced quality control on the uploaded mtDNA data.
We currently support the upload of VCF files, HSD files and FASTA files (beta). VCF files can be gzip-compressed (.vcf.gz) or uncompressed (.vcf); .hsd and .txt files are interpreted as HaploGrep's tab-delimited file format; FASTA files can be provided as .fasta, .fa or .zip.
- 1. Data sensitivity
- 2. Prepare your data:
- 2.1 Sanger based
- 2.2 NGS
- 2.3 MicroArray
- 3. Haplogroup classification and User Interaction
- 4. Contact
1. Data sensitivity
2.1 Prepare your Sanger based data
If your data is derived from a Sanger-based sequencing study, it is in one of the following formats: FASTA, Sequencher or SeqScape. Data in Sequencher or SeqScape format can be imported into eCOMPAGT as text files and exported directly from there as HSD files. For FASTA files we currently recommend mtDNAprofiler, which can export HSD files directly from the imported FASTA files. To do so, go to http://mtprofiler.yonsei.ac.kr:8080/ and upload the files in the mtDNA nomenclature tab (50 allowed at once). Once the files are processed, click on "d. mtSNP profile: Profile in a new window", check all samples and download them in tab-delimited format.
If your data is in a different format, you can also convert it to the file format directly supported by HaploGrep, a tab-delimited file with the following columns:
|SampleID||Range||Haplogroup (blank, ? or precalculated)||Polymorphisms (preferably separated by tabs, but spaces are also allowed)|
|HG00096||1-16569||?||263G 309.1C 315.1C 750G 1438G 4769G 8592A 8860G 10394T 15326G 15340G 16519C|
|HG00099||1-16569||H1ae||263G 315.1C 750G 1438G 3010A 4769G 6620C 8860G 15326G 15553A 16519C|
|HG00353||1-16569||||93G 263G 315.1C 523d 524d 750G 1438G 3010A 3796G 4769G 8860G 15326G 16189C 16356C 16362C 16519C|
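The column layout above can be sketched in a few lines of Python. This is a minimal illustration, not HaploGrep's own code: the function names `format_hsd_line` and `parse_hsd_line` are hypothetical, a blank haplogroup is written as `?`, and no header line is produced, since the table above does not say whether one is expected.

```python
def format_hsd_line(sample_id, sample_range, polymorphisms, haplogroup="?"):
    """Build one tab-delimited line: SampleID, Range, Haplogroup, Polymorphisms."""
    # A blank or missing haplogroup is written as "?" (one of the allowed values).
    fields = [sample_id, sample_range, haplogroup or "?", *polymorphisms]
    return "\t".join(fields)


def parse_hsd_line(line):
    """Split one line into (sample_id, range, haplogroup, list of polymorphisms)."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) < 3:
        raise ValueError("expected at least SampleID, Range and Haplogroup columns")
    sample_id, sample_range, haplogroup = parts[0], parts[1], parts[2]
    # Polymorphisms may be separated by tabs or by spaces, so normalise both.
    polymorphisms = " ".join(parts[3:]).split()
    return sample_id, sample_range, haplogroup, polymorphisms


if __name__ == "__main__":
    line = format_hsd_line("HG00099", "1-16569", ["263G", "315.1C", "750G"], "H1ae")
    print(line)
    print(parse_hsd_line(line))
```

Joining the polymorphisms with tabs follows the preferred separator named in the column header, while the parser accepts both tabs and spaces.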
The polymorphisms must be annotated as follows:
|Transition||Position, base optional||263 or 263G|
|Transversion||Position and base required||12633A or 15452A|
|Heteroplasmy||Position and IUPAC code||152Y or 263R|
|Deletion||Range or position, followed by d or del||8281-8289d or 523d 524d or 523del 524del|
|Insertion||Position, .1 and base(s), or one increment per base||573.1C, 573.XC, 524.1AC, 524.1A, 524.2C|
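As a rough check before upload, the notation rules in this table can be expressed as regular expressions. The sketch below is an assumption-laden illustration rather than HaploGrep's parser: the function name `classify_polymorphism` and the category labels are hypothetical, matching is case-sensitive, and transitions and transversions are both reported as `substitution` because telling them apart requires the rCRS reference base.

```python
import re

# Order matters: deletions are tested first so that a trailing "d" or "del"
# is never mistaken for another kind of token.
_PATTERNS = [
    ("deletion_range", re.compile(r"^\d+-\d+(?:d|del)$")),    # 8281-8289d
    ("deletion", re.compile(r"^\d+(?:d|del)$")),              # 523d, 523del
    ("insertion", re.compile(r"^\d+\.(?:\d+|X)[ACGT]+$")),    # 573.1C, 573.XC, 524.1AC
    ("heteroplasmy", re.compile(r"^\d+[RYSWKMBDHVN]$")),      # 152Y, 263R (IUPAC codes)
    ("substitution", re.compile(r"^\d+[ACGT]?$")),            # 263, 263G, 12633A
]


def classify_polymorphism(token):
    """Return the notation category of a token, or None if it matches no rule."""
    for kind, pattern in _PATTERNS:
        if pattern.match(token):
            return kind
    return None


if __name__ == "__main__":
    for token in ["263", "12633A", "152Y", "8281-8289d", "523del", "524.1AC", "A263G"]:
        print(token, classify_polymorphism(token))
```

Tokens that return `None` are worth inspecting by hand before the file is uploaded.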
2.2 Prepare your NGS data
If you have raw data from an NGS study, you can directly use mtDNA-Server, which accepts both FASTQ and BAM files and also performs a haplogroup classification based on HaploGrep. Input files in VCF/VCF.GZ format or in the HSD format of the first HaploGrep version (both aligned to the rCRS) are accepted. We currently also allow the import of FASTA/ZIP files; this is, however, in beta state and the results have to be handled carefully. If your data comes from a whole exome or whole genome sequencing project, you can extract the mitochondrial reads to limit the amount of uploaded data. You can use samtools to do so:
samtools view -b NA18539.mapped.ILLUMINA.bwa.CHB.low_coverage.20101123.bam MT: -o NA18539MT.bam
Nevertheless, mtDNA-Server will also extract the mtDNA region automatically from the uploaded data.
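For several BAM files, the samtools call shown above can be wrapped in a small Python helper. This is a sketch under stated assumptions: samtools is installed and on the PATH, each BAM file is coordinate-sorted and indexed, and the mitochondrial contig is named `MT` (other references may call it `chrM`). The function names `build_extract_command` and `extract_mt_reads` are hypothetical.

```python
import subprocess


def build_extract_command(bam_in, bam_out, region="MT"):
    """Return the samtools argument list that writes reads from `region` to `bam_out`."""
    # -b selects BAM output, -o names the output file; the trailing region
    # restricts the output to reads overlapping that contig.
    return ["samtools", "view", "-b", "-o", bam_out, bam_in, region]


def extract_mt_reads(bam_in, bam_out, region="MT"):
    """Run samtools; raises CalledProcessError if samtools exits with an error."""
    subprocess.run(build_extract_command(bam_in, bam_out, region), check=True)


if __name__ == "__main__":
    # Only prints the command; call extract_mt_reads(...) to actually run it.
    print(" ".join(build_extract_command("sample.bam", "sample.MT.bam")))
```

Keeping the command construction separate from its execution makes it easy to inspect or log the exact call before any file is written.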
2.3 Prepare your MicroArray data
If your data is already in a VCF file, no conversion is needed. Remember that HaploGrep works with the rCRS. Otherwise, a variety of tools can convert data to VCF. For data in PLINK file formats, use one of these tools to convert to VCF:
3. Haplogroup classification
After data upload, the haplogroup classification starts automatically in the new version. The upload is currently limited to 3,000 samples as part of a fair use policy. The classification is based on the Kulczynski distance presented in the HaploGrep paper. After data upload, HaploGrep's user interface looks as follows, where red circles indicate new features in HaploGrep 2:
- 1. File upload: VCF, VCF.GZ, TXT or HSD and FASTA/ZIP are accepted
- 2. Load 120 test samples, described in the previous HaploGrep paper
- 3. Data Export, provides several new data formats:
- The HaploGrep result file, as HSD.
- The extended HaploGrep result file, with remaining and not found polymorphisms, as well as amino acid change annotation for remaining polymorphisms as text files.
- Graphical Phylogenetic Tree, opened in Browser as pdf, or as download (svg, pdf or png).
- VCF file, with preview in Browser
- Multiple Alignment Format opened in Browser
- Fasta files
- Fluxus Network.exe files
- 4. Apply best hits refers to two situations: the data was opened with predefined haplogroups or the user changed the Phylotree version resulting in different results. The new haplogroup is listed in brackets and has to be confirmed with this button.
- 5. Check for Recombination: this feature assumes that the data was generated from PCR products and that the user knows which fragments were used. It is not applicable to whole genome sequencing, exome sequencing or MicroArray data. Note: Please allow popups in your browser to view the results.
- 6. Check for Phantom mutations: this new feature provides a list of variants that are very rare according to previously published studies (see Soares et al.), requiring at least two samples. This check can also highlight issues with mapping tools and problems with the nomenclature.
- 7. Check for Haplogroup discordance: this report lists samples based on reclassification with 4 different metrics. If a sample is present in this list, a manual inspection of the sample should be considered. Either expected polymorphisms are missing, or ambiguous haplogroups indicate a need for refining the phylogeny.
- 8. Change version of Phylotree: use this drop-down list to switch to older Phylotree versions. This can be of interest for replicating and validating old experiments.
- 9. Result Table, with a different state per sample depending on the resulting quality. The new columns Warnings and Errors are introduced, and the table can be sorted by them.
- 10. Change Haplogroup of Sample: HaploGrep 2 reports the 50 best haplogroups per sample. The user can change the haplogroup by selecting a different one and clicking this button.
- 11. Figure Legend: provides information on the color code used (see the figure).
- 12. Polymorphisms expected in the selected haplogroup, indicated by yes (green) and no (red)
- 13. Remaining polymorphisms, not defined in the haplogroup by Phylotree. Hotspots are ignored (green), as are unknown mutations never observed in Phylotree. Remaining polymorphisms are annotated and amino acid changes are reported.
- 14. New tabs are available: Lineage is the graphical representation of the classification result, and the new Errors and Warnings tab reports all possible issues with the selected sample.
- 15. Alternatively, this panel shows the errors and warnings.
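Returning to the classification step described at the start of this section, the following sketch illustrates the generic Kulczynski measure on two sets of polymorphisms. It is not HaploGrep's implementation: the function names `kulczynski` and `rank_haplogroups`, the optional per-polymorphism weights (defaulting to 1.0) and the toy haplogroup definitions are assumptions made for illustration only. In this form a higher value means a better match.

```python
def kulczynski(sample, expected, weights=None):
    """Generic (optionally weighted) Kulczynski measure between two sets."""
    weights = weights or {}

    def total(items):
        # Polymorphisms without an explicit weight count as 1.0.
        return sum(weights.get(p, 1.0) for p in items)

    shared = total(sample & expected)
    w_sample, w_expected = total(sample), total(expected)
    if w_sample == 0 or w_expected == 0:
        return 0.0
    # Mean of the shared fraction seen from the sample and from the haplogroup.
    return 0.5 * (shared / w_sample + shared / w_expected)


def rank_haplogroups(sample, candidates, weights=None):
    """Sort candidate haplogroups by decreasing score for one sample."""
    scored = [(kulczynski(sample, polys, weights), name) for name, polys in candidates.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    sample = {"263G", "750G", "1438G", "3010A"}
    # Toy definitions for illustration only, not taken from Phylotree.
    candidates = {
        "toyA": {"263G", "750G", "3010A", "16519C"},
        "toyB": {"263G", "73G"},
    }
    print(rank_haplogroups(sample, candidates))
```

Averaging the two fractions penalises both polymorphisms that a haplogroup expects but the sample lacks and polymorphisms in the sample that the haplogroup does not explain.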