HaploGrep2 currently accepts only a single file per upload, so uploading many individual VCF files one by one can be cumbersome. As a workaround, you can merge the individual VCF files into one large VCF file. There are several ways to do so:
a) GATKs –
b) VCFlib –
c) VCFtools –
Example usage (Samtools tabix required)
Hope this helps!
We just updated HaploGrep2 with the following minor changes:
- The export of the extended report (Export / Haplogroup Extended (txt)) now includes the “Found_Polys” column – see the description below.
- The report of potential phantom mutations was corrected, so that positions whose bases match the rCRS reference are no longer listed.
- The report of possible recombinations is now based on the Hamming distance instead of the Kulczynski distance.
- Six haplogroups were wrongly labeled H2 instead of H:
|previously until 2.0.3
|correct in 2.1.0
The new column Found_Polys lists the polymorphisms used for the haplogroup classification:
||Not_Found_Polys
||Remaining_Polys
||Input_Sample
||16024-16569; 1-576;
||[L0f, L0f2, L0f2a]
||73G 146C 152C 182T! 185A 189G 195C! 207A 247A …
||73G 199C 315.1C 16325C 16354T
||73G 146C 152C 185A 189G 199C 207A 247A 263G …
The columns in detail:
SampleID – the identifier of the sample
Range – Sequenced / Genotyped positions on the mitochondrial genome
Haplogroup – resulting Haplogroup
Cluster – if first hit is ambiguous, the result of the cluster is listed in this column
Overall_Rank – the haplogrouping score (from 0.5 to 1), where 0.5 indicates that no SNPs were found and 1 is a perfect match. The score is now always written with “.” as the decimal separator.
Not_Found_Polys – false negatives: mutations expected in this haplogroup but not found
Found_Polys – true positives: mutations found for the resulting haplogroup. Backmutations are considered as well and are indicated by ! (see 182T! or 195C! in sample Africa01)
Remaining_Polys – variants not used for this haplogroup classification. They can indicate a) a possibly new haplogroup, b) possible sample admixture, or c) phantom mutations (false positives). Listed here are hotspot mutations, local private mutations (found in at least one other haplogroup), global private mutations (unknown in the current phylogeny), heteroplasmic mutations, and reference-identical positions (the latter is often the case for microarray-based data).
AAC_Remaining – the remaining variants from the previous column are checked and marked if they are involved in an amino acid change.
Input_Sample – the profile used for the classification
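To make the ! convention concrete, here is a minimal Python sketch that splits a Found_Polys string into regular hits and backmutations. The function name and the returned structure are illustrative assumptions and not part of HaploGrep; the only thing taken from the report is the space-separated layout and the trailing ! marker.

```python
def split_found_polys(found_polys):
    """Split a space-separated Found_Polys string into (regular, backmutations)."""
    regular, back = [], []
    for token in found_polys.split():
        if token in ("…", "..."):
            # Skip truncation markers such as the one shown in the example row.
            continue
        if token.endswith("!"):
            # A trailing "!" marks a backmutation; store it without the marker.
            back.append(token.rstrip("!"))
        else:
            regular.append(token)
    return regular, back


regular, back = split_found_polys("73G 146C 152C 182T! 185A 189G 195C! 207A 247A")
print(back)  # ['182T', '195C']
```

The same function can be applied to the Not_Found_Polys and Remaining_Polys columns, since they use the same space-separated layout.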
We are happy to announce the publication of the HaploGrep2 paper in this year’s Web Server Issue (2016) of Nucleic Acids Research. For data generated with massively parallel sequencing devices in the form of FASTQ or BAM files, the mtDNA-Server paper, also published in this year’s issue, gives further details.
With some delay, we have finally updated the stand-alone version of HaploGrep to version 2. It includes the latest Phylotree 17 (Forensic Science International: Genetics Supplement Series, December 2015), which refines the human phylogeny even further. This version has no file-upload limit such as the one currently applied on the web service (file size of 5 MB and max. 3,000 samples, though you can use compressed files in zip format there). We also provide a command-line version of HaploGrep 2, which makes it straightforward to integrate it directly into your workflows or pipelines. Alternatively, you can use the REST API.
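If you are unsure whether a VCF file still fits the web service limits mentioned above, a quick local check can help. The following Python sketch is only an illustration: the function names are made up, “5 MB” is interpreted as 5 MiB (an assumption), and samples are counted from the #CHROM header line of an uncompressed VCF.

```python
import os

MAX_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" read as 5 MiB
MAX_SAMPLES = 3000           # sample limit stated for the web service


def count_vcf_samples(path):
    """Count the sample columns on the #CHROM header line of a plain-text VCF."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # With genotypes present, the first nine columns
                # (CHROM through FORMAT) are fixed; the rest are samples.
                return max(len(columns) - 9, 0)
    raise ValueError("no #CHROM header line found")


def fits_web_limits(path):
    """Return True if the file is within both the size and the sample limit."""
    return (os.path.getsize(path) <= MAX_BYTES
            and count_vcf_samples(path) <= MAX_SAMPLES)
```

Files that exceed either limit are candidates for the stand-alone or command-line version instead.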
Here’s the direct link to the Download Page – enjoy – and don’t hesitate to contact us in case of questions, suggestions, or any kind of problems.
And here’s the evolution of HaploGrep’s sessions per month from Google analytics, with the release of the Phylotree Versions:
Here’s an overview of the new export options HaploGrep 2 offers when you click the small arrow next to Export:
Missing a specific export format? Feel free to contact us!
After exactly 2 years, Phylotree, the “database” behind HaploGrep, got updated by Mannis van Oven. Here’s the accompanying publication on Phylotree 17. The mtDNA tree now has 5,437 haplogroups, a growth of over 13% compared to the previous version. Find out how Phylotree 17 differs for your dataset by using the updated HaploGrep 2 version with the latest mtDNA tree, build 17.
The 1000 Genomes Consortium recently released the Phase 3 mitochondrial DNA data of over 2,500 samples for download, alongside their papers in Nature, stating:
Mitochondrial chromosome variants are now available for the Phase 3 individuals from our FTP site
Since the current version of HaploGrep 2 can read the (unzipped) VCF file, all 2,500 samples can be analyzed instantly.
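Because the VCF has to be unzipped first, a downloaded .vcf.gz file needs one extra step. Here is a minimal Python sketch using only the standard library; the function name is illustrative and the file names in the usage note are placeholders, not the actual names on the FTP site.

```python
import gzip
import shutil


def gunzip_vcf(src_path, dst_path):
    """Decompress a gzip-compressed VCF into a plain-text VCF file."""
    # Stream the data in chunks so large files do not have to fit in memory.
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

For example, `gunzip_vcf("chrMT.vcf.gz", "chrMT.vcf")` would write the plain-text file that can then be loaded into HaploGrep 2.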
Over the last years, HaploGrep became the de facto standard for automatic haplogroup classification (~18,000 users, cited over 140 times, about 120 local installations) and is also used in several commercial systems and research pipelines. Quite some work was done underneath the surface of HaploGrep, especially to improve our haplogroup classification performance and to keep up with the latest requirements. After almost a year in beta (see entry from Sept 2014), we think it’s finally time to replace the initial version of HaploGrep with the new and improved version, HaploGrep 2. We hope you like the new version and would appreciate any kind of feedback!
These are the major improvements:
- Improved classification algorithm resulting in a 20x speed-up!
- HaploGrep now includes a rule-based engine. The two new columns “warnings” (W) and “errors” (E) show abnormalities in the input file detected by the new engine. We very much appreciate the input and suggestions from Hans Bandelt and Antonio Salas!
- New Import Formats (VCF + FASTA) supported
- Updated to the latest security standards on server side. So we are finally back on Firefox!
- Apply different ranking algorithms (e.g. Jaccard, Hamming Distance) besides our default ranking algorithm, the Kulczynski distance. These new ranking algorithms will be introduced one by one and are therefore currently disabled.
- Provide HaploGrep also as a command line version (included in mtDNA-Server)
- Direct support of VCF files through the Htsjdk library.
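To give a feeling for how the ranking measures named above differ, here is a small Python sketch of the textbook, unweighted set versions of the Jaccard index, the Hamming distance and the Kulczynski measure. This is an illustration under stated assumptions: the function names and the toy profiles are made up, and HaploGrep's actual ranking and its score range are not reproduced here.

```python
def jaccard(sample, expected):
    """Jaccard index: shared polymorphisms divided by all polymorphisms seen."""
    sample, expected = set(sample), set(expected)
    union = sample | expected
    return len(sample & expected) / len(union) if union else 1.0


def hamming(sample, expected):
    """Hamming distance on sets: number of polymorphisms present in only one set."""
    return len(set(sample) ^ set(expected))


def kulczynski(sample, expected):
    """Kulczynski measure: mean of the shared fraction seen from each side."""
    sample, expected = set(sample), set(expected)
    if not sample or not expected:
        return 0.0
    shared = len(sample & expected)
    return 0.5 * (shared / len(sample) + shared / len(expected))


# Toy profiles (hypothetical): three shared polymorphisms, one unique to each side.
sample = {"73G", "146C", "152C", "199C"}
expected = {"73G", "146C", "152C", "182T"}
print(jaccard(sample, expected), hamming(sample, expected), kulczynski(sample, expected))
# 0.6 2 0.75
```

Note that Jaccard and Kulczynski are similarities (higher is better), while the Hamming distance is a dissimilarity (lower is better).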
New export formats supported:
Points we (currently) removed from the beta:
- Removed direct support of heteroplasmic sites (Y,R)
- How to use the REST-API.
- Fasta Import is available (open a *.fasta file!) but still in beta.
Here’s the updated version: HaploGrep2
HaploGrep 2.0 Beta allows the export of a multiple alignment FASTA format, which makes the generation of phylogenetic trees straightforward. Besides its own phylogenetic tree directly based on Phylotree (see previous blog entry), we present here the basic steps to generate phylogenetic trees from multiple alignment FASTA files by using MrBayes, Neighbor Joining or Maximum Likelihood. For this purpose we recommend Ugene, which is a very powerful toolset, and not only for next-generation sequencing projects. The following steps show how simple this process can be.
Within the new HaploGrep 2.0 Beta version some new export formats are now supported. One of the most powerful is the export of a phylogenetic tree, representing the current profiles loaded into HaploGrep. This feature generates almost publication-ready phylogenetic trees. We used them with almost no modification (some color-highlighting in Inkscape) – please see .