
HaploGrep 2 Stand-alone Version

With some delay, we have finally updated the stand-alone version of HaploGrep to the latest version 2. It includes the latest Phylotree 17 (Forensic Science International: Genetics Supplement Series, December 2015), which refines the human mtDNA phylogeny even further. Unlike the web service, the stand-alone version has no file-upload limit (the web service currently accepts files of up to 5 MB and at most 3,000 samples, although you can upload compressed files in zip format). We also provide a command-line version of HaploGrep 2, which makes it straightforward to integrate into your workflows or pipelines. Alternatively, you can use the REST API for this.

Here’s the direct link to the Download Page – enjoy – and don’t hesitate to contact us in case of questions, suggestions, or any kind of problems.

And here’s the evolution of HaploGrep’s sessions per month from Google Analytics, with the releases of the Phylotree versions marked:

haplogrep_phylotree

 

New HaploGrep Exports in Detail

Here’s an overview of the new export options that HaploGrep 2 offers when you click the small arrow next to Export:

haplogrep_export

Missing a specific export format? Feel free to contact us!

Phylotree 17 available

phylotree17

After exactly 2 years, Phylotree, the “database” behind HaploGrep, has been updated by Mannis van Oven. Here’s the accompanying publication on Phylotree 17. The mtDNA tree now has 5,437 haplogroups, a growth of over 13% compared to the previous version. Find out how Phylotree 17 differs for your dataset by using the updated HaploGrep 2 version with the latest mtDNA tree, Build 17.

 

 

 

HaploGrep’s RestAPI now available!

Hi all,
Due to many requests, we are happy to announce that HaploGrep’s REST API is now open to the public! This allows everyone to determine haplogroups in a very convenient way. The following snippets show how a simple call works (a) from the UNIX command line using curl and (b) from Java. If you have an example running in another language, please let us know and we will add it here.

Happy haplogrouping & happy new year!
Sebastian

1) Unix Command Line: This call uploads the file myfile.vcf and returns a JSON String including the tags “id” and “haplogroup”:

curl -i -X POST -H "Content-Type: multipart/form-data" -F "importfile=@myfile.vcf" https://haplogrep.uibk.ac.at/haplogrep-ws

2) Here you can see exactly the same call in Java (using the Restlet and org.json libraries):

import org.restlet.resource.*;
import org.restlet.ext.html.*;
import org.restlet.data.MediaType;
import org.restlet.data.Status;
import org.restlet.representation.*;
import org.json.*;

import java.io.File;
import java.io.IOException;

public class SampleClient {
    public static void main(String[] args) throws IOException {
        // Change the location of the input file here
        File file = new File("/home/seb/samplefile.hsd");

        // POST the file as multipart/form-data in the "importfile" field
        ClientResource cr = new ClientResource("https://haplogrep.uibk.ac.at/haplogrep-ws");
        final FormDataSet fds = new FormDataSet();
        fds.setMultipart(true);
        final FormData fileRep = new FormData("importfile", new FileRepresentation(file, MediaType.APPLICATION_ALL));
        fds.getEntries().add(fileRep);
        cr.post(fds);

        // Response: the HTTP status, then a JSON array with one object per sample
        Status status = cr.getResponse().getStatus();
        System.out.println("Status: " + status);
        JSONArray jsonArray = new JSONArray(cr.getResponse().getEntityAsText());
        for (int i = 0; i < jsonArray.length(); i++) {
            JSONObject object = jsonArray.getJSONObject(i);
            String id = object.getString("id");
            String hg = object.getString("haplogroup");
            System.out.println(id + "\t" + hg);
        }
    }
}
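
For readers who would rather not add Restlet as a dependency, the same upload can probably be done with the HTTP client built into Java 11 and later. The following is only a sketch under assumptions: the class name JdkHaplogrepClient and the helper multipartBody are made up for illustration, the part is sent as application/octet-stream, and the endpoint URL and the "importfile" field name are taken from the curl call above. It prints the raw JSON instead of parsing it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical client name; not part of HaploGrep.
class JdkHaplogrepClient {

    // Builds a multipart/form-data body that contains a single file part.
    static byte[] multipartBody(String boundary, String field, String fileName, byte[] content) {
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + field
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n";
        String tail = "\r\n--" + boundary + "--\r\n";
        byte[] h = head.getBytes(StandardCharsets.UTF_8);
        byte[] t = tail.getBytes(StandardCharsets.UTF_8);
        byte[] body = new byte[h.length + content.length + t.length];
        System.arraycopy(h, 0, body, 0, h.length);
        System.arraycopy(content, 0, body, h.length, content.length);
        System.arraycopy(t, 0, body, h.length + content.length, t.length);
        return body;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: java JdkHaplogrepClient.java <file.hsd|file.vcf>");
            return;
        }
        Path file = Path.of(args[0]);
        String boundary = "----haplogrep" + UUID.randomUUID();
        byte[] body = multipartBody(boundary, "importfile",
                file.getFileName().toString(), Files.readAllBytes(file));

        // Endpoint as given in the curl example; the boundary must match the body.
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("https://haplogrep.uibk.ac.at/haplogrep-ws"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("Status: " + response.statusCode());
        System.out.println(response.body()); // JSON with "id" and "haplogroup" per sample
    }
}
```

With Java 11 or later this can be started directly as a single source file, e.g. java JdkHaplogrepClient.java myfile.vcf.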

1000G Phase 3 mtDNA data available

nature_1000G_phase3

The 1000 Genomes Project Consortium recently released the Phase 3 mitochondrial DNA data of over 2,500 samples for download, alongside their papers in Nature, stating:

Mitochondrial chromosome variants are now available for the Phase 3 individuals from our FTP site

Since the current version of HaploGrep 2 can read the (unzipped) VCF file directly, all 2,500 samples can be analyzed instantly.

HaploGrep 2.0 is ready!

Over the last years, HaploGrep has become the de facto standard for automatic haplogroup classification (~18,000 users, cited over 140 times, about 120 local installations) and is also used in several commercial systems and research pipelines. Quite some work has been done beneath the surface of HaploGrep, especially to improve our haplogroup classification performance and to keep up with the latest requirements. After almost a year in beta (see entry from Sept 2014), we think it’s finally time to replace the initial version of HaploGrep with the new and improved version, HaploGrep 2. We hope you like the new version and would appreciate any kind of feedback!

These are the major improvements:

  • Improved classification algorithm, resulting in a 20x speed-up!
  • HaploGrep now includes a rule-based engine. The two new columns “warnings” (W) and “errors” (E) show abnormalities in the input file detected by the new engine. We very much appreciate the input and suggestions from Hans Bandelt and Antonio Salas!
  • New import formats (VCF + FASTA) supported
  • Updated to the latest security standards on the server side. So we are finally back on Firefox!
  • Different ranking algorithms (e.g. Jaccard, Hamming distance) can be applied besides our default ranking algorithm, the Kulczynski distance. These new ranking algorithms will be introduced one by one and are therefore currently disabled.
  • HaploGrep is also provided as a command-line version (included in mtDNA-Server)
  • Direct support of VCF files through the Htsjdk library.
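
To give a feeling for how the ranking measures named in the list differ, here is a small sketch that compares a sample’s variants with the variants expected for one candidate haplogroup. It uses the plain, unweighted textbook definitions on sets of variant labels. This is an assumption for illustration only and not HaploGrep’s actual implementation, which may differ (for example by weighting variants). The class name RankingSketch and both variant lists are made up.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; not HaploGrep code.
class RankingSketch {

    // Number of variants present in both sets.
    static int shared(Set<String> a, Set<String> b) {
        Set<String> s = new HashSet<>(a);
        s.retainAll(b);
        return s.size();
    }

    // Jaccard index: |A and B| / |A or B| (1.0 = identical sets).
    static double jaccard(Set<String> a, Set<String> b) {
        int inter = shared(a, b);
        int union = a.size() + b.size() - inter;
        return union == 0 ? 1.0 : (double) inter / union;
    }

    // Hamming distance on presence/absence: variants found in exactly one set.
    static int hamming(Set<String> a, Set<String> b) {
        return a.size() + b.size() - 2 * shared(a, b);
    }

    // Kulczynski measure: mean of the two overlap fractions (1.0 = identical sets).
    static double kulczynski(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        int inter = shared(a, b);
        return 0.5 * ((double) inter / a.size() + (double) inter / b.size());
    }

    public static void main(String[] args) {
        // Made-up variant lists, chosen only to have a partial overlap.
        Set<String> sample = Set.of("73G", "263G", "750G", "1438G", "16519C");
        Set<String> haplogroup = Set.of("73G", "263G", "750G", "3010A");

        System.out.println("Jaccard:    " + jaccard(sample, haplogroup));
        System.out.println("Hamming:    " + hamming(sample, haplogroup));
        System.out.println("Kulczynski: " + kulczynski(sample, haplogroup));
    }
}
```

Note that Jaccard and Kulczynski are similarities (higher is better), whereas Hamming is a distance (lower is better), so a ranking built on them has to sort in opposite directions.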

New export formats supported:

Points we (currently) removed from the beta:

  • Direct support of heteroplasmic sites (Y, R)
  • Documentation on how to use the REST API
  • FASTA import is available (open a *.fasta file!) but still in beta.

Here’s the updated version: HaploGrep2

HaploGrep Export Formats for Phylogenetic trees

tree_view1

HaploGrep 2.0 Beta can export a multiple-alignment FASTA file, which makes generating phylogenetic trees straightforward. Besides its own phylogenetic tree based directly on Phylotree (see previous blog entry), we present here the basic steps to generate phylogenetic trees from multiple-alignment FASTA files using MrBayes, Neighbor Joining or Maximum Likelihood. For this purpose we recommend Ugene, a very powerful toolset, and not only for next-generation sequencing projects. The following steps show how simple this process can be: (more…)

NGS, HaploGrep and Hadoop MapReduce

Many of our HaploGrep users are interested in analysing data within automated pipelines. To support users with a scalable and standardized service, we are happy to present our newest project, called mtDNA-Server.
mtDNA-Server provides a free service for the complete workflow of NGS mtDNA projects. It includes the alignment of FASTQ SE/PE data based on BWA MEM (using JBWA), sorting of data, creation of BAM files, heteroplasmy detection, contamination identification, haplogroup assignment (using HaploGrep, of course) and graphical report creation based on R.
All steps are parallelized using Hadoop MapReduce. As a result, we are able to analyse 800 1000G Phase 1 samples (27 GB) on our 30-core test cluster within 30 minutes. To simplify the execution of MapReduce jobs and provide users with an intelligent workflow system, we use our MapReduce framework Cloudgene. Cloudgene controls the complete workflow and provides an intelligent queuing system in the background. mtDNA-Server will be available as a download in the near future.

For now, we are still in beta and would very much appreciate any kind of feedback!
Here’s the link:
mtDNA-Server

New Feature: Tree-based visualisation of haplogroups

phylogenizer_burma

The new HaploGrep 2.0 Beta version supports several new export formats. One of the most powerful is the export of a phylogenetic tree representing the profiles currently loaded into HaploGrep. This feature generates almost publication-ready phylogenetic trees. We used them with almost no modification (some color-highlighting in Inkscape) – please see [1].


HaploGrep 2.0 – Beta

We are happy to announce the HaploGrep 2.0 Beta version. We want YOU to give it a try and help us improve the software. We also want to say thanks for all the fruitful discussions, especially with our collaborators Antonio Salas and Hans Bandelt.

All new features will be introduced one by one in the next couple of weeks.

What you will notice in this first release:

  • The GUI didn’t change a lot :-)
  • The “Find HaploGroups” button is not needed anymore
  • A new tab (“Errors and Warnings”) shows problematic sequences based on a newly developed internal rule system
  • Annotation of amino acid changes for remaining mtSNPs
  • Import of VCF files (introduced below)
  • FASTA export
  • New server architecture based on REST

Feature-Highlights that will be released in the next weeks:

  • Support of heteroplasmic sites (Y, R)
  • New distance metrics for haplogroup classification
  • Phylogenetic representation based on the rCRS tree for all samples
  • FASTA import
  • Additional quality checks
  • How to use the REST API

We are looking forward to your feedback and your replies,

Hansi, Lukas, Sebastian