Combine multiple VCF files

Since HaploGrep 2 currently accepts only a single file per upload, uploading VCF files one at a time can be cumbersome. As a workaround, you can merge the individual VCF files into one large VCF file. There are several ways to do so:

a) GATK – CombineVariants
b) VCFlib – vcfcombine
c) VCFtools – vcf-merge
Example usage (Samtools tabix required)
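
As a rough sketch of option c), the small Python helper below only assembles the shell commands for compressing, indexing and merging; it does not run them. The file names and the helper name build_merge_commands are made up for illustration, and the sketch assumes that bgzip, tabix and vcf-merge are installed and on your PATH.

```python
import shlex


def build_merge_commands(vcf_paths, merged_name="merged.vcf"):
    """Return the shell commands that compress, index and merge the given VCF files.

    Hypothetical helper for illustration; it builds command strings only.
    """
    commands = []
    compressed = []
    for path in vcf_paths:
        gz_path = f"{path}.gz"
        # vcf-merge expects bgzip-compressed, tabix-indexed input files.
        # "bgzip -c" writes to stdout, so the original file is kept.
        commands.append(f"bgzip -c {shlex.quote(path)} > {shlex.quote(gz_path)}")
        commands.append(f"tabix -p vcf {shlex.quote(gz_path)}")
        compressed.append(shlex.quote(gz_path))
    # vcf-merge writes the merged, uncompressed VCF to stdout.
    commands.append(f"vcf-merge {' '.join(compressed)} > {shlex.quote(merged_name)}")
    return commands


if __name__ == "__main__":
    for command in build_merge_commands(["sample1.vcf", "sample2.vcf"]):
        print(command)
```

Running the printed commands one after another in a shell should leave you with a single uncompressed merged.vcf that can then be uploaded.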

Hope this helps!

New HaploGrep Exports in Detail

Here’s an overview of the new export options that HaploGrep 2 offers when you click the small arrow next to Export:

[Image: haplogrep_export]

Missing a specific export format? Feel free to contact us!

1000G Phase 3 mtDNA data available

[Image: nature_1000G_phase3]

The 1000 Genomes Project Consortium recently released the Phase 3 mitochondrial DNA data of over 2,500 samples for download, alongside their papers in Nature, stating:

Mitochondrial chromosome variants are now available for the Phase 3 individuals from our FTP site

Since the current version of HaploGrep 2 can read the (unzipped) VCF file directly, all 2,500 samples can be analyzed instantly.

HaploGrep 2.0 is ready!

Over the last few years, HaploGrep has become the de facto standard for automatic haplogroup classification (~18,000 users, cited over 140 times, about 120 local installations) and is also used in several commercial systems and research pipelines. Quite a lot of work has been done beneath the surface of HaploGrep, especially to improve our haplogroup classification performance and to keep up with the latest requirements. After almost a year in beta (see the entry from Sept 2014), we think it’s finally time to replace the initial version of HaploGrep with the new and improved version, HaploGrep 2. We hope you like the new version and would appreciate any kind of feedback!

These are the major improvements:

  • Improved classification algorithm, resulting in a 20x speed-up!
  • HaploGrep now includes a rule-based engine. The two new columns “warnings” (W) and “errors” (E) show abnormalities in the input file detected by the new engine. We very much appreciate the input and suggestions from Hans Bandelt and Antonio Salas!
  • New import formats (VCF + FASTA) supported
  • Updated to the latest security standards on the server side, so we are finally back on Firefox!
  • Different ranking algorithms (e.g. Jaccard, Hamming distance) can be applied besides our default ranking algorithm, the Kulczynski distance. These new ranking algorithms will be introduced one by one and are therefore currently disabled.
  • HaploGrep is also provided as a command line version (included in mtDNA-Server)
  • Direct support of VCF files through the Htsjdk library.
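
To give a feeling for how the ranking measures mentioned above differ, here is a minimal, unweighted sketch in Python that treats a sample and a haplogroup as plain sets of polymorphisms. It is not HaploGrep’s own implementation; the function names and the example polymorphisms are made up for illustration.

```python
def jaccard_index(sample, haplogroup):
    """Shared polymorphisms divided by all polymorphisms seen in either set."""
    union = sample | haplogroup
    return len(sample & haplogroup) / len(union) if union else 1.0


def hamming_distance(sample, haplogroup):
    """Number of polymorphisms present in exactly one of the two sets."""
    return len(sample ^ haplogroup)


def kulczynski_measure(sample, haplogroup):
    """Mean of the shared fraction of the sample and of the haplogroup."""
    if not sample or not haplogroup:
        return 0.0
    shared = len(sample & haplogroup)
    return 0.5 * (shared / len(sample) + shared / len(haplogroup))


if __name__ == "__main__":
    # Hypothetical polymorphism sets, for illustration only.
    sample = {"73G", "263G", "750G", "1438G"}
    haplogroup = {"73G", "263G", "750G", "4769G"}
    print(jaccard_index(sample, haplogroup))
    print(hamming_distance(sample, haplogroup))
    print(kulczynski_measure(sample, haplogroup))
```

Note that Jaccard and Kulczynski are similarities (higher is better), whereas the Hamming distance counts mismatches (lower is better), so a ranking built on it would sort in the opposite direction.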

New export formats supported:

Points we (currently) removed from the beta:

  • Direct support of heteroplasmic sites (Y, R)
  • How to use the REST API.
  • FASTA import is available (open a *.fasta file!) but still in beta.

Here’s the updated version: HaploGrep2

Importing VCF files into HaploGrep 2.0

[Image: vcf_panes]

With the establishment of NGS devices and the resulting data flood, new file formats such as FASTQ, SAM, BAM and VCF have become de facto standards in the bioinformatics data world.
The VCF file, which contains the variants, has come up especially often in recent user requests. There are some Python scripts available that convert a VCF file to a HaploGrep hsd file, as well as a publication with a tool dedicated to this topic. To simplify your life, we decided to implement VCF file import directly in HaploGrep 2. You can give it a try with the 1000 Genomes mtDNA VCF file from Phase 1:

1000G Phase 1 VCF File
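
For readers curious what such a VCF-to-hsd conversion roughly involves, here is a minimal Python sketch. It is neither HaploGrep’s importer nor one of the scripts mentioned above. It assumes a tab-delimited hsd line of sample ID, range, haplogroup and polymorphisms, uses "?" as a placeholder haplogroup, expects GT to be the first FORMAT field, and handles single-nucleotide substitutions only (indels are skipped, and any non-reference allele in the genotype is reported). The function name vcf_to_hsd and the example data are made up.

```python
def vcf_to_hsd(vcf_text, sample_range="1-16569"):
    """Convert VCF text into hsd-style lines, one per sample (SNPs only)."""
    samples = []
    variants = {}
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and empty lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            variants = {name: [] for name in samples}
            continue
        pos, ref, alts = fields[1], fields[3], fields[4].split(",")
        for name, call in zip(samples, fields[9:]):
            genotype = call.split(":")[0]  # assumes GT is the first FORMAT field
            for allele in genotype.replace("|", "/").split("/"):
                if not allele.isdigit() or allele == "0":
                    continue  # missing call or reference allele
                alt = alts[int(allele) - 1]
                if len(ref) == 1 and len(alt) == 1 and alt in "ACGT":
                    entry = f"{pos}{alt}"
                    if entry not in variants[name]:
                        variants[name].append(entry)
    return ["\t".join([name, sample_range, "?"] + variants[name]) for name in samples]


if __name__ == "__main__":
    # Tiny made-up VCF with two samples, for illustration only.
    example = "\n".join([
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
        "MT\t73\t.\tA\tG\t.\tPASS\t.\tGT\t1\t0",
        "MT\t263\t.\tA\tG\t.\tPASS\t.\tGT\t1\t1",
    ])
    for hsd_line in vcf_to_hsd(example):
        print(hsd_line)
```

With the built-in VCF import you do not need any of this; the sketch is only meant to show why a direct import saves a conversion step.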