rCRS vs. RSRS vs. HG19 (Yoruba)

Working with human mitochondrial DNA is not always straightforward, not even when it comes to the basic question: “Which reference sequence do you use?” There should be just one reference, which would make the question obsolete; unfortunately, that is not the case.

Since the first sequencing of the human mitochondrial genome by Anderson et al. in 1981, its length has been defined as 16,569 base pairs, and the sequence became known as the Cambridge Reference Sequence (CRS). When errors in this first sequence were corrected by Andrews et al. in 1999 (GenBank NC_012920.1), the new revised Cambridge Reference Sequence (rCRS) kept the same length: although position 3107 turned out to be a deletion, it was retained by introducing an N. So 3107N is effectively a deletion, kept so that positions on the CRS and the rCRS remain directly comparable.
In many respects, choosing a sequence from a European haplogroup (H2a2a1) as the reference is not ideal. Therefore, in 2012 Behar et al. proposed a new, hypothetical reference sequence: the so-called Reconstructed Sapiens Reference Sequence (RSRS). It preserves the historical numbering of the genome annotation, but instead of starting from a leaf sequence of the phylogenetic tree, as the rCRS does, it is rooted at the “mitochondrial Eve”. The two-base insertion at 523-524 is represented as NN instead of AC, so the RSRS has three N positions (523N, 524N, 3107N). Mannis van Oven, the father of PhyloTree, compiled a table of the differences between the rCRS and the RSRS.

However, we did not switch to the RSRS, since we agree with Bandelt et al. that an additional reference sequence causes confusion. But there is yet another mtDNA reference sequence around that you should be aware of: the one present in GRCh37/UCSC hg19 and the older NCBI Build 36/UCSC hg18.

When working with microarray or NGS data, you will often find a different reference sequence for the mitochondrial genome. Up to and including hg19, a Yoruba sequence (African haplogroup L3e2b1a1) with a length of 16,571 bp was used. The UCSC Genome Browser points this out as well:

from: http://genome-euro.ucsc.edu/cgi-bin/hgGateway?db=hg19

Note on chrM
Since the release of the UCSC hg19 assembly, the Homo sapiens mitochondrion sequence (represented as “chrM” in the Genome Browser) has been replaced in GenBank with the record NC_012920. We have not replaced the original sequence, NC_001807, in the hg19 Genome Browser. We plan to use the Revised Cambridge Reference Sequence (rCRS) in the next human assembly release.

GenBank gives the following information:

[Image: GenBank record of the Yoruba sequence]

So if your NGS project uses this 16,571 bp reference sequence, you should either remap your reads to the rCRS (if you need help with this, don’t hesitate to contact us) or correct the positions of your called variants (e.g. in the chrM VCF file) as follows:

< 315: positions are the same in both reference sequences
>= 315 and < 3107: subtract 2 from the hg19 position (due to the two insertions 309.1C and 315.1C in the Yoruba sequence)
>= 3107 and < 16193: subtract 1 from the hg19 position (the Yoruba sequence lacks position 3107, which the rCRS keeps as 3107N)
>= 16193: subtract 2 from the hg19 position (due to the additional insertion 16193.1C)
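The rules above translate directly into code. The following is a minimal Python sketch, assuming the input is a VCF whose chrM records use the 16,571 bp Yoruba coordinates of hg18/hg19; the function names `hg19_to_rcrs` and `convert_vcf_line` are illustrative and not part of any existing tool. It implements the thresholds exactly as stated, so positions immediately next to the insertions and to the 3107 gap may come out off by one.

```python
# Minimal sketch of the position rules above. Assumption: input positions are
# chrM coordinates on the 16,571 bp Yoruba sequence used by UCSC hg18/hg19.
# The function names are illustrative, not part of any existing tool.


def hg19_to_rcrs(pos: int) -> int:
    """Convert an hg19 (Yoruba) chrM position to an rCRS position."""
    if pos < 315:
        return pos      # identical in both reference sequences
    if pos < 3107:
        return pos - 2  # two Yoruba insertions (309.1C, 315.1C)
    if pos < 16193:
        return pos - 1  # Yoruba lacks 3107 (kept as 3107N in the rCRS)
    return pos - 2      # additional Yoruba insertion 16193.1C


def convert_vcf_line(line: str, chrom: str = "chrM") -> str:
    """Shift the POS column of a VCF data line on `chrom`.

    Header lines and records on other contigs are returned unchanged;
    REF/ALT alleles are not touched.
    """
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if fields[0] == chrom:
        fields[1] = str(hg19_to_rcrs(int(fields[1])))
    return "\t".join(fields) + newline
```

Applied line by line to a chrM VCF (for example `for line in src: dst.write(convert_vcf_line(line))`), this shifts only the POS column. The `##contig` header length and the REF/ALT alleles are left as they are, so sites where the Yoruba and rCRS bases themselves differ (see the difference list below) still need separate handling.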

For example, the following positions are converted like this:

HG18/HG19 position → rCRS position (rule applied)
263 → 263 (< 315: leave position)
752 → 750 (>= 315 and < 3107: subtract 2)
8861 → 8860 (>= 3107 and < 16193: subtract 1)
16521 → 16519 (>= 16193: subtract 2)

Differences Yoruba sequence (L3e2b1a1) to rCRS (H2a2a1):

73, 150, 195, 263, 309.1C, 315.1C, 408A, 750, 1438, 2352, 2483, 2706, 3107del, 4769, 5580, 7028, 8701, 8860, 9377, 9540, 10398, 10819, 10873, 11017, 11719, 11722, 12705, 12850, 14212, 14580, 14766, 14905, 15301, 15326, 15932, 16172, 16183C, 16189, 16193.1C, 16223, 16320, 16519
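The conversion thresholds follow from the four length-changing entries in this list: the insertions 309.1C, 315.1C and 16193.1C and the deletion 3107del (16,569 + 3 − 1 = 16,571 bp). As a sketch, assuming each insertion sits exactly after the rCRS position named in its notation (inside a poly-C run this placement is a convention), one can walk the rCRS and record the hg19 position of every base. `build_hg19_to_rcrs` is a hypothetical helper name.

```python
from typing import Dict, Optional

# Length-changing differences of the Yoruba sequence relative to the rCRS,
# taken from the list above. Assumption: each insertion sits directly after
# the rCRS position named in its notation (309.1C -> after 309, and so on).
INSERTED_AFTER = {309, 315, 16193}  # 309.1C, 315.1C, 16193.1C
DELETED = {3107}                    # 3107del (kept as 3107N in the rCRS)
RCRS_LENGTH = 16569


def build_hg19_to_rcrs() -> Dict[int, Optional[int]]:
    """Map every hg19 (Yoruba) chrM position to its rCRS position.

    Inserted Yoruba bases have no rCRS counterpart and map to None.
    """
    mapping: Dict[int, Optional[int]] = {}
    hg19_pos = 0
    for rcrs_pos in range(1, RCRS_LENGTH + 1):
        if rcrs_pos not in DELETED:
            hg19_pos += 1               # this rCRS base also exists in hg19
            mapping[hg19_pos] = rcrs_pos
        if rcrs_pos in INSERTED_AFTER:
            hg19_pos += 1               # extra Yoruba base, no rCRS position
            mapping[hg19_pos] = None
    return mapping


HG19_TO_RCRS = build_hg19_to_rcrs()
```

Under this assumption the walk reproduces the example conversions (263, 752, 8861 and 16521), maps the inserted hg19 bases 310, 317 and 16195 to no rCRS position at all, and differs by one from the simplified thresholds only at hg19 positions 311-316, 3107-3108 and 16193-16194.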

Differences RSRS (mt-MRCA) to rCRS

73, 146, 152, 195, 247, 263, 523N, 524N, 750, 769, 825A, 1018, 1438, 2706, 2758, 2885, 3594, 4104, 4312, 4769, 7028, 7146, 7256, 7521, 8468, 8655, 8701, 8860, 9540, 10398, 10664, 10688, 10810, 10873, 10915, 11719, 11914, 12705, 13105, 13276, 13506, 13650, 14766, 15326, 16129, 16187, 16189, 16223, 16230, 16278, 16311, 16519

HaploGrep currently works with rCRS-based profiles only. The new version will also accept RSRS- and Yoruba-based profiles, but will convert them to the rCRS.
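To illustrate what converting an RSRS-based profile to the rCRS involves, here is a sketch using plain position sets. It is an illustration under strong assumptions, not HaploGrep's implementation: every site is treated as biallelic, so a sample position flips between "different" and "same" wherever the RSRS itself differs from the rCRS; alleles, indels, back mutations and the 523N/524N positions are ignored. `rsrs_to_rcrs` is a hypothetical name.

```python
from typing import Iterable, List

# Positions at which the RSRS differs from the rCRS, copied from the list
# above. 523N and 524N are left out: they encode the two-base insertion at
# 523-524 rather than a substitution, so the positional logic does not apply.
RSRS_VS_RCRS = frozenset({
    73, 146, 152, 195, 247, 263, 750, 769, 825, 1018, 1438, 2706, 2758,
    2885, 3594, 4104, 4312, 4769, 7028, 7146, 7256, 7521, 8468, 8655, 8701,
    8860, 9540, 10398, 10664, 10688, 10810, 10873, 10915, 11719, 11914,
    12705, 13105, 13276, 13506, 13650, 14766, 15326, 16129, 16187, 16189,
    16223, 16230, 16278, 16311, 16519,
})


def rsrs_to_rcrs(positions_vs_rsrs: Iterable[int]) -> List[int]:
    """Return the sorted substitution positions relative to the rCRS.

    Assumes biallelic sites: the symmetric difference keeps positions that
    differ from exactly one of (sample vs RSRS, RSRS vs rCRS).
    """
    return sorted(set(positions_vs_rsrs) ^ RSRS_VS_RCRS)
```

An empty profile (a sequence identical to the RSRS) comes out as all 50 listed positions relative to the rCRS, and a profile containing exactly those 50 positions comes out empty, i.e. identical to the rCRS. Yoruba-based profiles would additionally need the coordinate shift described above.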


hansi


6 thoughts on “rCRS vs. RSRS vs. HG19 (Yoruba)”

  1. Okay, I am confused. I am a descendant of an African ancestor with mtDNA haplogroup L3e2b. Should African Americans use the Yoruba sequence, or just compare their differences with both the Yoruba sequence and the CRS (H2a2a1)?

    My differences:
    16172C, 16183C, 16189C, 16223T (L3), 16320T (mid-Holocene expansion), and 16519C.

    1. Hi Gaby!

      Everything is fine with your result: the data is given relative to the rCRS. If it were relative to the Yoruba reference, you wouldn’t have any differences, since the Yoruba reference belongs to haplogroup L3e2b1a1 and carries the same mutations in the HVS-I part of the D-loop as you do.

      As long as everyone uses the same reference sequence to compare their own sequences, it doesn’t matter which one it is. From an evolutionary point of view, the rCRS might not be the best choice for a reference, but as long as all sequences (whether African American, Asian, European…) are compared to it, everything is fine. The problems start when different reference sequences are used.

      best,
      Hansi

  2. “HaploGrep works with rCRS only – in the new version we will accept RSRS and Yoruba based profiles – however converting them to the rCRS.”

    So how exactly should one proceed with RSRS-based haplotype data if converting the haplotypes manually is not feasible?

    Regards,

  3. I am still confused. I am doing mtDNA sequencing analysis for African samples. I thought I should use the Yoruba sequence as the reference, but then should I change the positions manually? Or should I use the rCRS as the reference genome?

  4. I am looking for someone, a professional who knows what they are doing, to analyze mtDNA results from a test my mom took several years ago.

    Do you know anyone who would be interested in doing this? And what would be the cost?
