Haplogroups by country

List of Y-chromosome haplogroups in populations of the world

  1. ^Van Oven M, Van Geystelen A, Kayser M, Decorte R, Larmuseau HD (2014). "Seeing the wood for the trees: a minimal reference phylogeny for the human Y chromosome". Human Mutation. 35 (2): 187–91. doi:10.1002/humu.22468. PMID 24166809.
  2. ^International Society of Genetic Genealogy (ISOGG; 2015), Y-DNA Haplogroup Tree 2015. (Access date: 1 February 2015.)
  3. ^Haplogroup A0-T is also known as A-L1085 (and previously as A0'1'2'3'4).
  4. ^Haplogroup A1 is also known as A1'2'3'4.
  5. ^Haplogroup LT (L298/P326) is also known as Haplogroup K1.
  6. ^Between 2002 and 2008, Haplogroup T-M184 was known as "Haplogroup K2". That name has since been re-assigned to K-M526, the sibling of Haplogroup LT.
  7. ^Haplogroup K2a (M2308) and its primary subclade K-M2313 were separated from Haplogroup NO (F549) in 2016. (This followed the publication of: Poznik GD, Xue Y, Mendez FL, et al. (2016). "Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences". Nature Genetics. 48 (6): 593–9. doi:10.1038/ng.3559. PMC 4884158. PMID 27111036.) In the past, other haplogroups, including NO (M214) and K2e, had also been identified with the name "K2a".
  8. ^ Haplogroup K2b (M1221/P331/PF5911) is also known as Haplogroup MPS.
  9. ^ Haplogroup K2e (K-M147) was previously known as "Haplogroup X" and "K2a" (but is a sibling subclade of the present K2a).
  10. ^K-M2313*, which as yet has no phylogenetic name, has been documented in two living individuals, who have ethnic ties to India and South East Asia. In addition, K-Y28299, which appears to be a primary branch of K-M2313, has been found in three living individuals from India. See: Poznik op. cit.; YFull YTree v5.08, 2017, "K-M2335", and; PhyloTree, 2017, "Details of the Y-SNP markers included in the minimal Y tree" (Access date of these pages: 9 December 2017)
  11. ^ Haplogroup K2b1 (P397/P399) is also known as Haplogroup MS, but has a broader and more complex internal structure.
  12. ^ Haplogroup P (P295) is also known as K2b2.
  13. ^ Haplogroup S, as of 2017, is also known as K2b1a. (Previously the name Haplogroup S was assigned to K2b1a4.)
  14. ^ Haplogroup M, as of 2017, is also known as K2b1b. (Previously the name Haplogroup M was assigned to K2b1d.)
Source: https://en.wikipedia.org/wiki/List_of_Y-chromosome_haplogroups_in_populations_of_the_world

Y-DNA haplogroups by ethnic group

The various ethnolinguistic groups found in the Caucasus, Central Asia, Europe, the Middle East, North Africa and/or South Asia demonstrate differing rates of particular Y-DNA haplogroups.

In the table below, the first two columns identify ethnolinguistic groups. Subsequent columns represent the sample size (n) of the study or studies cited, and the percentage of each haplogroup found in that particular sample.

(Data from studies conducted before 2004 may be inaccurate or a broad estimate, due to obsolete haplogroup naming systems – e.g. the former Haplogroup 2 included members of the relatively unrelated haplogroups known later as Haplogroup G and macrohaplogroup IJ [which comprises haplogroups I and J].)

Population | Language family[1] | n[2] | R1b[3] | n | R1a | n | I | n | E1b1b | n | E1b1a | n | J | n | G | n | N | n | T | n | L | n | H
AlbaniansIE (Albanian) 10623.6

AlbaniansIE (Albanian) 5123.6[6][7]519.8
Kosovo Albanians (Pristina) IE (Albanian) 11421.10
Albanians (Tirana) IE (Albanian) 3018.3[8]308.3[8]3011.7[8]3028.3[8]300.0[8]3020.0
AlbaniansIE (Albanian) 5518.2
Albanians (North Macedonia) IE (Albanian) 6418.8
Albanians + Albanians (North Macedonia) IE (Albanian) 55+
AbaziniansNorthwest Caucasian 140[10]1414[10]140[10]140[10]140[10]147[10][11]1429[10]
AbkhazNorthwest Caucasian 128[10]1233.0[10]1233.3[10]120[10]120[10]1225[10][11]120[10]
AltaiansTurkic, Siberia 1429.2[12]
Altaians (Northern) Turkic, Siberian 506.0[13]5038.0500.0500.0502.0500.05010.0
Altaians (Southern) Turkic, Siberia 961.0[13]9653.1962.1961.0964.29611.5
AmbalakararDravidian (Southern) 290.0[14]2913.8[14]290.0[14]290.0[14]290.0[14]296.9[14]293.4[14]290.0[14]290.0[14]2920.7[14]
AmharaAfro-Asiatic (Semitic) 480.0[15]480.0[15]480.0[15]4845.8[15]4833.3[15]480.0[15]480.0[15]484.2[15]480.0[15]
AndalusiansIE (Romance) 2965.5[6]290.0[6]1033.9[4]769.2[5]931.1[5]290.0[6]290.0[6]296.9[6]293.4[6]
AndisNortheast Caucasian 496.1[16]492.0[16]4926.5[16]492.0[16]4955.1[16]496.1[16]490.0[16]492.0[16]490.0[16]
Arabs (Algeria) Afro-Asiatic (Semitic) 3513.0[17]350.0[17]3250[5]3535[17]
Arabs (Algeria - Oran) Afro-Asiatic (Semitic) 10210.8[18]1021[18]10250.9[18]10212.8[18]10227.4[18]
Arabs (Bedouin) Afro-Asiatic (Semitic) 320.0[19]329.4[19]326.3[19]3218.7[19]3265.6[19]320.0[19]
Arabs EgyptAfro-Asiatic (Semitic) 1474.1[20]1472.7[20]1470.7[20]14736.7[20]1472.8[20]14732.0[20]1478.8[20]1470.0[20]1478.2[20]
Arabs (Israel) Afro-Asiatic (Semitic) 1438.4[19]1431.4[19]1436.3[19]14320.3[19]14355.2[19]1430.0[19]
Arabs (Morocco) Afro-Asiatic (Semitic) 443.8[7]440.0[7]440.0[7]4985.5[5]492.4[5]
Arabs (Oman) Afro-Asiatic (Semitic) 1211.7[20]1219.1[20]1210.0[20]12115.7[20]1217.4[20]12147.9[20]1211.7[20]1218.3[20]1210.8[20]
Arabs (Qatar) Afro-Asiatic (Semitic) 721.4[21]726.9[21]720.0[21]725.6[21]722.8[21]7266.7[21]722.8[21]720.0[21]720.0[21]722.8[21]
Arabs (Saudi Arabia) Afro-Asiatic (Semitic) 1571.9[22]1575.1[22]1570.0[22]1577.6[22]1577.6[22]15740.0[22]1573.2[22]1570.0[22]1575.1[22]1571.9[22]
Arabs (UAE) Afro-Asiatic (Semitic) 1644.3[21]1647.3[21]16411.6[21]1645.5[21]16445.1[21]1644.3[21]1640.0[21]1644.9[21]1643.0[21]
Arabs (Yemen) Afro-Asiatic (Semitic) 620.0[21]620.0[21]620.0[21]6212.9[21]623.2[21]6282.3[21]621.6[21]620.0[21]620.0[21]620.0[21]
Arabs (Syria) Afro-Asiatic (Semitic) 2015.0[6]2010.0[6]205.0[6]2010.0[6]2030.0[6]200.0[6]200.0[6]200.0[6]200.0[6]
Arabs (Lebanon) Afro-Asiatic (Semitic) 316.4[6]319.7[6]313.2[6]3119.8[6]3151.2[6]313.2[6]310.0[6]310.0[6]313.2[6]
Arabs (Sudan) Afro-Asiatic (Semitic) 10215.7[23]1023.9[23]10216.7[23]10247.1[23]
Arabs (Tunisia) Afro-Asiatic (Semitic) 1486.8[17]1480.0[17]1480.0[17]14854.5[17]1481.4[17]14830.6[17]1480.0[17]1480.7[17]1480.0[17]
Arabs (Libya) Afro-Asiatic (Semitic) 633[24]631.5[24]631.5[24]6352.0[24]630.0[24]6324.0[24]638.0[24]635.0[24]631.5[24]
ArmeniansIE (Armenian) 8924.7[25]895.6[25]1005.0[10]893.4[25]8929.2[25]10011.0[10]893.4[25]
ArmeniansIE (Armenian) 73432.4[26]7345.3[26]7345.4[26]7341.6[26]
Aromanians (Dukasi, Albania) IE (Romance) 392.56
Aromanians (Andon Poci, Albania) IE (Romance) 1936.84
Aromanians (Kruševo, North Macedonia) IE (Romance) 4327.91
Aromanians (Štip, North Macedonia) IE (Romance) 6523.08
Aromanians (Romania) IE (Romance) 4223.81
Aromanians (Balkan) IE (Romance) 39+
Ashkenazi JewsAfro-Asiatic (Semitic) 7912.7[19]7922.8%
Ashkenazi JewsAfro-Asiatic (Semitic) 4424.1[27]44219.7%
AustriansIE (Germanic, West) 2193221914
AvarsNortheast Caucasian 422.4[16]422.4[16]420.0[16]427.1[16]4271.4%
AzerbaijanisTurkic, Oghuz 7211.1[28]726.9[28]974.1[29]
Azerbaijanis (West Azerbaijan Province) Turkic, Oghuz 6317.5[30]6319[30]6327.2[30]
BagvalalNortheast Caucasian 2867.9[16]283.6[16]287.1[16]280.0[16]2821.4[16]280.0[16]280.0[16]280.0[16]280.0[16]
BalkarianTurkic, Kipchak 392.6[5]1625.0[5]
BalkariansTurkic, Kipchak 3813.2[9]3813.2[9]382.6[9]382.6[9]380.0[9]3823.7[9]3828.9[9]380.0[9]380.0[9]385.3[9]
BalochIE (Iranian, NW) 258.0[14]2528.0[14]250.0[14]258.0[14]250.0[14]2516.0[14]250.0[14]250.0[14]250.0[14]2524.0[14]
Bashkirs (Perm) Turkic, Kipchak 4386.05
Basque (France, Spain) Basque (Basque) 6788.06
German BavariansIE (Germanic, West) 8050.0
BelgiansIE (Germanic/Romance) 9263.04
BelarusiansIE (Slavic, East) 410.0[25]4139.0[25]14719.0[4]4110.0[25]412.4[25]
BelarusiansIE (Slavic, East) 684.4[32]6845.6[32]6825.0[32]684.4[32]681.5[32]688.8[32]
BelarusiansIE (Slavic, East) 3064.2[33]30651.0[33]3064.6[33]3063.3[33]3069.5[33]
BearnaisIE (Romance) 267.7[4]433.7[5]263.8[5]
BejaAfro-Asiatic (Cushitic) 424.76
Berbers (Marrakesh) Afro-Asiatic (Berber) 29 92.9[29]
Berbers (Moyen Atlas) Afro-Asiatic (Berber) 6987.1[29]
Berbers (Mozabite) Afro-Asiatic (Berber) 2080.0[29]
Berbers (Morocco) Afro-Asiatic (Berber) 6488.2[5]640[5]1030[5]
Berbers (north Morocco) Afro-Asiatic (Berber) 434379.5[34]
Berbers (north-central Morocco) Afro-Asiatic (Berber) 6388.8[5]630[5]
Berbers (central Morocco) Afro-Asiatic (Berber) 18718789.8[34]
Berbers (southern Morocco) Afro-Asiatic (Berber) 4089[5]400[5]
Berbers (southern Morocco) Afro-Asiatic (Berber) 656598.5[34]
Berbers (northern Tunisia) Afro-Asiatic (Berber) 32100[35]
Berbers (southern Tunisia) Afro-Asiatic (Berber) 27100[35]
Borgu (Sudan) Nilo-Saharan (Maban) 2611.5[23]260.0[23]260.0[23]2653.8[23]260.0[23]260.0[23]260.0[23]260.0[23]260.0[23]260.0[23]
Bosnians (Zenica) IE (Slavic, South) 691.45
Brahmins (Konkanastha)IE (Indo-Aryan) 250.0[14]2548.0[14]250.0[14]250.0[14]250.0[14]2516.0[14]250.0[14]250.0[14]250.0[14]254.0[14]
BrahuiDravidian (Northern) 11039.1[36]1100.0[36]1102.7[36]11028.2[36]1100.9[36]1107.3[36]
BrahuiDravidian (Northern) 250.0[14]2524.0[14]250.0[14]250.0[14]250.0[14]2528.0[14]2516.0[14]250.0[14]250.0[14]258.0[14]
BritishIE (Germanic, West) 3268.8
BulgariansIE (Slavic, South) 11.1[38]6.4[38]30.2[38]20.6[38]17.5[38]0.8[38]
BulgariansIE (Slavic, South) 11.0[39]17.3[39]27.5[39]20.5[39]18.1[39]0.8[39]0.8[39]
BulgariansIE (Slavic, South) 14[40]16[40]34[40]21[40]9[40]2[40]1[40]2[40]
BurushoBurushaski (isolate) 971.0[41]9727.8[41]970.0[41]970.0[41]978.2[41]971.0[41]970.0[41]970.0[41]9716.5[41]
CatalansIE (Romance) 2479.2
Cantabrians (Pasiegos) IE (Romance) 5642.9[29]
Chamalins Northeast Caucasian 270.0[16]277.4[16]270.0[16]270.0[16]2770.4[16]2718.5[16]270.0[16]270.0[16]273.7[16]
ChechensNortheast Caucasian 3301.8[43]3303.9[43]3300.0[43]3300.0[43]33077.6[43]3305.5[43]3300.0[43]3307.0[43]
ChuvashesTurkic, Oghur 793.8[44]7931.6[44]7911.3[44]790[44]790[44]7924.2[44]790[44]7927.8[44]790[44]790[44]
Copts (Sudan) Afro-Asiatic (Ancient Egyptian) 3315.2[23]3321.2[23]3345.5[23]
Croats (mainland) IE (Slavic, South) 10815.74
Croats (mainland) IE (Slavic, South) 18938.1
CypriotsIE (Greek) 459.0[25]452.0[25]4527.0[25]
CzechsIE (Slavic, West) 25734.2[46]25718.3[46]2575.8[46]2574.7[46]2575.1[46]2571.6[46]
Czechs and SlovaksIE (Slavic, West) 4535.6[6]4526.7[6]19813.6[4]452.2[6]
DanesIE (Germanic, North) 1241.7[37]1216.7[37]19438.7[4]352.9[29]
DarginsNortheast Caucasian 682.9[16]680.0[16]6894.1[16]682.9[16]680.0[16]680.0[16]680.0[16]
DarginsNortheast Caucasian 264.0[10]260.0[10]2658.0[10]264.0[10]264.0[10]264.0[10]260.0[10]260.0[10]260.0[10]
DolgansTurkic, Siberia 671.5[44]6716.4[44]671.5[44]6734.3[44]
Druze (Arabs) Afro-Asiatic (Semitic) 2814.3[29]
DutchIE (Germanic, West) 2770.4[6]273.7[6]3026.7[4]848.0[25]340[5]
EgyptiansAfro-Asiatic (Semitic) 925.4[47]920.0[47]921.1[47]9243.5[47]923.3[47]9222.8[47]922.2[47]920.0[47]927.6[47]920.0[47]
Egyptians (North) Afro-Asiatic (Semitic) 439.3[48]432.3[48]430.0[48]4353.5[48]4418.2[17]437.0[48]432.3[48]430.0[48]
Egyptians (South) Afro-Asiatic (Semitic) 2913.8[48]290.0[48]293.4[48]2931.0[48]2924.1[17]2917.2[48]2910.3[48]290.0[48]
English (Central) IE (Germanic, West) 21561.9[49]2153.3[49]
EstoniansUralic (Finnic) 2079.0[25]11837.3[50]21018.6[4]2073.0[25]2071.0[25]20740.6[25]
FinnsUralic (Finnic) 572.0[25]5710.5[25]572.0[25]5763.2[25]
FinnsUralic (Finnic) 380.0[44]387.9[44]3828.9[44]3863.2[44]
FrenchIE (Romance) 2352.2[6]230[6]2317.4[4]408.0[25]
FrisiansIE (Germanic, West) 9456.0[51]947.0[51]9429.0[51]942.0[51]946.0[51]
Frieslanders (Netherlands) IE (Germanic, West) 9455.3[49]947.4[49]9434.0[49]942.1[49]941.4[49]
FurNilo-Saharan (Fur) 320.0[23]320.0[23]320.0[23]3259.4[23]320.0[23]326.3[23]320.0[23]320.0[23]320.0[23]320.0[23]
Gagauz (Kongaz) Turkic, Oghuz 4810.4[52]4812.5[52]4831.3[52]4816.7[52]488.3[52]4810.4[52]484.2[52]486.3[52]
Gagauz (Etulia) Turkic, Oghuz 4114.6[52]4126.8[52]4124.4[52]419.8[52]417.3[52]4117.1[52]410.0[52]410.0[52]
Germans (West) IE (German) 4847.9[6][37]488.1[6][37]1637.5[4]166.2[6]
Germans (East) IE (German)
Germans (Berlin) IE (German) 10323.3[53]10322.3[53]10332[53]1039.7[53]1031.3[53]
GeorgiansKartvelian 6314.3[6]637.9[6]630.0[6]642.0[25]6336.5[6]6330.1[6]631.6[6]631.6[6]
GeorgiansKartvelian 669.1[9]6610.6[9]661.5[9]663.0[9]660.0[9]6636.4[9]6631.8[9]660.0[9]661.5[9]661.5[9]
GreeksIE (Greek) 9219.6[9]9216.3[9]929.8[9]9221.8[9]9222.9[9]923.3[9]924.3[9]921.1[9]
GreeksIE (Greek) 7711.7
Greeks IE (Greek) 11822.8[6][37]1188.3[6][37]26113.8[4]8423.8[5]926.5[5]
Greeks IE (Greek) 17113.5
Greeks (Crete) IE (Greek) 19317.0
Greeks (Crete) IE (Greek) 171+
Greeks (Peloponnese) IE (Greek) 3647.44
Greeks (Thrace) IE (Greek) 4112.2[8]4122.0[8]4119.5[8]4119.5[8]410.0[8]4119.5[8]414.9[8]
Greeks (North) IE (Greek) 9614.6[48]
Greeks (South) IE (Greek) 4619.6[48]
Greeks (North) + (South) IE (Greek) 96+
Hausa (Sudan) Afro-Asiatic (Chadic) 3240.6[23]320.0[23]320.0[23]323.1[23]3212.5[23]320.0[23]320.0[23]320.0[23]320.0[23]320.0[23]
Herzegovinians (Mostar, Široki Brijeg) IE (Slavic, South) 1413.55
HungariansUralic (Ugric) 4518[6]11330[44]16217[4]538[5]499[5]1033[55]1031.0
IcelandersIE (Germanic, North) 18141.44
IngushNortheast Caucasian 1430.0[43]1433.5[43]1430.7[43]1430.0[43]14391.6[43]1431.4[43]1432.8[43]
Iranians (North Iran) IE (Iranian, West) 3315.2[56]336.1[56]330.0[56]330.0[56]330.0[56]3333.3[56]3315.2[56]336.1[56]330.0[56]333.0[56]
Iranians (South Iran) IE (Iranian, West) 1176.0[56]11716.2[56]1170.0[56]1175.1[56]1171.7[56]11735.0[56]11712.8[56]1170.9[56]1173.4[56]1176.0[56]
IrishIE (Celtic) 22281.53
ItaliansIE (Romance) 5062.0
Italians (Calabria) IE (Romance) 3732.4[6]1485.4[4]8016.3[7]571.8[5]
Italians (Apulia) IE (Romance) 782.6[4]8613.9[5]8631.4[5]
Italians (North-central) IE (Romance) 5062.0[6]3900.5[4]21210.4[7]5226.9[5]
Italians (South) IE (Romance) 6820.0[48]683.0[48]686.0[48]6826.0[48]6815.0[48]683.0[48]680.0[48]
Italians (Sicily) IE (Romance) 518.85527.3[5]4223.8[5]
Italians (East Sicily) IE (Romance) 8720.0[48]872.3[48]875.0[48]8729.0[48]875.0[48]875.0[48]870.0[48]
Italians (West Sicily) IE (Romance) 12527.0[48]1252.4[48]12511.0[48]12519.0[48]12513.0[48]1253.0[48]1250.0[48]
IyengarDravidian (Southern) 300.0[14]3030.0[14]300.0[14]300.0[14]300.0[14]3020.0[14]3013.3[14]300.0[14]300.0[14]3016.7[14]
IyerDravidian (Southern) 290.0[14]2927.6[14]290.0[14]290.0[14]290.0[14]2917.2[14]2910.3[14]290.0
Source: https://en.wikipedia.org/wiki/Y-DNA_haplogroups_by_ethnic_group

Genetic differences between Chibcha and Non-Chibcha speaking tribes based on mitochondrial DNA (mtDNA) haplogroups from 21 Amerindian tribes from Colombia

Human and Medical Genetics • Genet. Mol. Biol. 36 (2) • 2013 • https://doi.org/10.1590/S1415-47572013005000011

We analyzed the frequency of four mitochondrial DNA haplogroups in 424 individuals from 21 Colombian Amerindian tribes. Our results showed a high degree of mtDNA diversity and genetic heterogeneity. Frequencies of mtDNA haplogroups A and C were high in the majority of populations studied. The distribution of these four mtDNA haplogroups from Amerindian populations was different in the northern region of the country compared to those in the south. Haplogroup A was more frequently found among Amerindian tribes in northern Colombia, while haplogroup D was more frequent among tribes in the south. Haplogroups A, C and D have clinal tendencies in Colombia and South America in general. Populations belonging to the Chibcha linguistic family of Colombia and other countries nearby showed a strong genetic differentiation from the other populations tested, thus corroborating previous findings. Genetically, the Ingano, Paez and Guambiano populations are more closely related to other groups of south eastern Colombia, as also inferred from other genetic markers and from archeological data. Strong evidence for a correspondence between geographical and linguistic classification was found, and this is consistent with evidence that gene flow and the exchange of customs and knowledge and language elements between groups is facilitated by close proximity.

Keywords: mitochondrial DNA; Amerindian; Colombia; Chibcha; genetic relationships

Solangy Usme-Romero (I); Milena Alonso (I); Helena Hernandez-Cuervo (I); Emilio J. Yunis (II); Juan J. Yunis (I, II, III)

(I) Grupo de Identificación Humana e Inmunogenética, Facultad de Medicina, Universidad Nacional de Colombia, Bogotá, D.C., Colombia

(II) Instituto de Genética, Servicios Médicos Yunis Turbay y Cia., Bogotá, D.C., Colombia

(III) Departamento de Patología, Facultad de Medicina e Instituto de Genética, Universidad Nacional de Colombia, Bogotá, D.C., Colombia

Introduction

Studies of genetic variation among human populations are of great value for understanding genetic structure, migration routes and possible genetic relationships among different continental populations, and mitochondrial DNA (mtDNA) analysis has frequently been put to such use in American populations (Schurr et al., 1990; Torroni et al., 1992, 1993a,b, 1994; Horai et al., 1993; Bailliet et al., 1994; Merriwether et al., 1994; Santos et al., 1994a,b; Bianchi et al., 1995; Lorenz and Smith, 1996; Merriwether and Ferrell, 1996; Bonatto and Salzano, 1997; Bisso-Machado et al., 2012). Despite its maternal inheritance (Giles et al., 1980), the mitochondrial genome is extremely useful for determining genetic histories because of its rapid rate of mutation (Brown et al., 1979) and lack of recombination and repair mechanisms. Most mtDNA polymorphisms are single nucleotide substitutions, but insertions and deletions have also been described (Brown et al., 1980; Cann and Wilson, 1983; Cann et al., 1984; Wallace et al., 1985; Horai et al., 1993; Torroni et al., 1992, 1993a,b, 1994; Howell and Smejkal, 2000). By revealing the geographic distribution of mitochondrial haplogroups, such studies have helped to clarify the migration patterns of human populations throughout history and across all continents (Fernandez-Dominguez, 2005).

Previous studies based on mtDNA analysis in Native American populations revealed the presence of four distinct haplogroups, called A, B, C and D. Haplogroup A is characterized by the gain of a HaeIII restriction site at position 663, haplogroup B by the 9-bp COII/tRNALys intergenic deletion, and haplogroup C by the loss of a HincII site at position 13259. Haplogroup D is characterized by the loss of an AluI restriction site at position 5176 and a gain of a HincII site at position 13259 (Torroni et al., 1992, 1993a,b). A fifth haplogroup, X, has been characterized in some populations, predominantly in North America (Eshleman et al., 2003), but is absent in South America (Dornelles et al., 2005).
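The marker logic described above can be summarized as a small decision rule. The following Python sketch is illustrative only. The `RflpResult` fields and the `assign_haplogroup` function are hypothetical names, and each marker is reduced to a present/absent call. Haplogroup D is read as the AluI 5176 site being absent while the HincII 13259 site is present. Individuals matching none of the four patterns are labelled "E", following the convention this study uses for non-A-D lineages. Conflicting patterns are flagged rather than resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RflpResult:
    """Present/absent calls for one individual (hypothetical record layout)."""
    haeiii_663_site: bool    # HaeIII site at position 663 (gained in haplogroup A)
    coii_trna_9bp_del: bool  # 9-bp COII/tRNALys intergenic deletion (haplogroup B)
    hincii_13259_site: bool  # HincII site at position 13259 (lost in haplogroup C)
    alui_5176_site: bool     # AluI site at position 5176 (lost in haplogroup D)


def assign_haplogroup(r: RflpResult) -> str:
    """Return A, B, C or D when exactly one founder pattern matches,
    "E" when none match, and an "ambiguous:" label when several match."""
    matches = []
    if r.haeiii_663_site:
        matches.append("A")
    if r.coii_trna_9bp_del:
        matches.append("B")
    if not r.hincii_13259_site:
        matches.append("C")
    if not r.alui_5176_site and r.hincii_13259_site:
        matches.append("D")
    if not matches:
        return "E"  # none of the four founder patterns
    if len(matches) == 1:
        return matches[0]
    return "ambiguous:" + "+".join(matches)


if __name__ == "__main__":
    sample = RflpResult(haeiii_663_site=True, coii_trna_9bp_del=False,
                        hincii_13259_site=True, alui_5176_site=True)
    print(assign_haplogroup(sample))  # prints "A"
```

Returning an explicit "ambiguous" label keeps unexpected marker combinations visible for re-typing instead of silently forcing them into a founder haplogroup.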

Colombia has great cultural and genetic diversity. Its indigenous population is distributed among 89 different ethnic groups, which are estimated to represent 1.83% of the total population (Arango and Sánchez, 2006). Based on the theory that the peopling of the Americas occurred by migration from northeast Asia across the Bering Strait, followed by migration through Central America to South America (Turner, 1984; Greenberg et al., 1986; Dillehay and Meltzer, 1991), present-day Colombian territory, at the northern tip of South America, became an obligatory passage for people migrating to the Southern Cone.

In Colombia, several mtDNA studies of indigenous communities have been carried out (Mesa et al., 2000; Keyeux et al., 2002; Rodas et al., 2002; Torres et al., 2006; Melton et al., 2007; Rondon et al., 2007). In this study, we analyzed 424 individuals from 21 Amerindian populations to determine genetic structure and relationships among them based on geographical and historical information, as well as linguistic and genetic relationships with other tribes of the Americas.

Subjects and Methods


We analyzed 424 blood samples from individuals unrelated by maternal lineage from 21 Amerindian tribes of Colombia (Table 1). Blood samples were collected between 1989 and 1992 after proper informed consent had been obtained. Informed consent included approval of each tribal Chief or Governor. The linguistic affiliation of each tribe is shown in Table 1. No Ge-Pano Carib speaking tribes were included in this study (Table 1).

DNA extraction and mtDNA haplogroup analysis

DNA was extracted using the salting out method (Gustincich et al., 1991) with the DNA Wizard Genomic DNA Extraction Kit (Promega Corporation, Madison WI), following the manufacturer's recommendations.

Four regions of the human mtDNA representing mtDNA haplogroups A, B, C and D were PCR amplified with the use of primers that were described elsewhere (Parra et al., 1998).

Each amplification reaction consisted of 2.5 µL of DNA, 1.25 µL of each set of primers (10 nmol/µL), 2.0 µL of dNTPs (10 mM) and 0.125 µL of Taq DNA polymerase (Promega Corporation, Madison WI). The reaction mixture also contained 1.5 µL of MgCl2 (25 mM) for haplogroup A and 2.0 µL of MgCl2 (25 mM) for the other haplogroups, in a final volume of 25 µL.
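The final concentrations in the 25 µL reaction follow from the dilution rule C2 = C1 × V1 / V2. The sketch below assumes that the stated stock concentrations apply to the whole stock solution (for dNTPs, total dNTPs). It ignores components not listed in the text, such as buffer and water. The function name is hypothetical.

```python
def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """Dilution rule C2 = C1 * V1 / V2; the result is in the stock's units."""
    return stock * added_ul / total_ul


TOTAL_UL = 25.0  # final reaction volume stated in the text

# MgCl2 stock is 25 mM: 1.5 uL for haplogroup A, 2.0 uL for B, C and D.
mgcl2_mM = {
    "A": final_concentration(25.0, 1.5, TOTAL_UL),
    "B, C, D": final_concentration(25.0, 2.0, TOTAL_UL),
}
# dNTP stock is 10 mM; 2.0 uL per reaction.
dntp_mM = final_concentration(10.0, 2.0, TOTAL_UL)

if __name__ == "__main__":
    for group, conc in mgcl2_mM.items():
        print(f"MgCl2 final ({group}): {conc:.2f} mM")
    print(f"dNTP final: {dntp_mM:.2f} mM")
```

Under these assumptions, the haplogroup A reactions contain 1.5 mM MgCl2, the other reactions contain 2.0 mM, and all reactions contain 0.8 mM dNTPs.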

Amplification conditions consisted of an initial denaturation at 94 ºC for 5 min, followed by 34 cycles of denaturation at 94 ºC for 30 s, annealing at 50 ºC for 30 s (haplogroups B and D) or at 55 ºC for 30 s (haplogroups A and C) and extension at 72 ºC for 30 s, and a final extension step at 72 ºC for 5 min. The amplification products were evaluated by electrophoresis in a 2% agarose Nusieve/Seakem gel that was stained with ethidium bromide and photographed under UV light. Aliquots (15 µL) of the amplified products for haplogroups A, C and D were digested with the corresponding restriction enzymes for 3 h at 37 ºC, while haplogroup B was analyzed by electrophoresis only. The digestion products were separated by electrophoresis in a 3% Nusieve/Seakem gel and processed as described above.
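The cycling program can also be written down as data. This makes the annealing temperature for each haplogroup explicit and lets the total hold time be checked. The representation below is hypothetical: the `Step` class and the helper names are not part of the original protocol. It counts only hold times and ignores ramping between temperatures, so a real run takes longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    seconds: int


N_CYCLES = 34
# Annealing temperature per haplogroup assay, as given in the text.
ANNEAL_C = {"A": 55.0, "B": 50.0, "C": 55.0, "D": 50.0}


def program(haplogroup: str) -> tuple:
    """Return (initial step, steps of one cycle, final step) for one assay."""
    initial = Step("initial denaturation", 94.0, 300)
    cycle = [
        Step("denaturation", 94.0, 30),
        Step("annealing", ANNEAL_C[haplogroup], 30),
        Step("extension", 72.0, 30),
    ]
    final = Step("final extension", 72.0, 300)
    return initial, cycle, final


def hold_time_seconds(haplogroup: str) -> int:
    """Sum of hold times only; ramp time between temperatures is ignored."""
    initial, cycle, final = program(haplogroup)
    return initial.seconds + N_CYCLES * sum(s.seconds for s in cycle) + final.seconds


if __name__ == "__main__":
    for hg in "ABCD":
        print(hg, hold_time_seconds(hg) // 60, "min of hold time")
```

Every assay has the same step durations, giving 300 + 34 × 90 + 300 = 3660 s (61 min) of hold time. Only the annealing temperature differs between assays.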

The haplogroup frequency of each population was estimated by direct counting (Table 1, Figure 1). Genetic diversity was estimated as $h = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$, where $n$ is the sample size and $p_i$ is the estimated frequency of haplogroup $i$ (Nei, 1978). Genetic distances were estimated as pairwise FST values computed from mtDNA haplogroup frequencies with the aid of Arlequin software (Excoffier et al., 2005). Frequency data for mtDNA haplogroups of Amerindian populations of South and Central America used in the analysis were obtained from the literature (see Table S1; Ginther et al., 1993; Horai et al., 1993; Torroni et al., 1993a, 1994; Bailliet et al., 1994; Santos et al., 1994a; Bianchi et al., 1995; Kolman et al., 1995; Merriwether et al., 1995, 1997; Easton et al., 1996; Lalueza-Fox, 1996; Ward et al., 1996; Bonatto and Salzano, 1997; Kolman and Bermingham, 1997; Lalueza et al., 1997; Dipierri et al., 1998; Rickards et al., 1999; Mesa et al., 2000; Moraga et al., 2000; Bert et al., 2001; Demarchi et al., 2001; Lobato-da-Silva et al., 2001; Rothhammer et al., 2001; Keyeux et al., 2002; Williams et al., 2002; Briceño et al., 2003; Fuselli et al., 2003; Garcia-Bour et al., 2004; Lewis et al., 2004; Sandoval et al., 2004; Dornelles et al., 2005; Cabana et al., 2006; Torres et al., 2006; Marrero et al., 2007; Melton et al., 2007; Barreto et al., 2008). The results are presented in Figures 2 and 3.
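Direct counting and Nei's (1978) unbiased diversity index can be sketched in a few lines of Python. The function names are hypothetical, and the input is assumed to be one haplogroup label per individual. This is a minimal illustration of the estimator, not the software the authors used.

```python
from collections import Counter
from typing import Dict, Iterable


def haplogroup_frequencies(calls: Iterable[str]) -> Dict[str, float]:
    """Estimate haplogroup frequencies by direct counting."""
    counts = Counter(calls)
    n = sum(counts.values())
    return {hg: c / n for hg, c in counts.items()}


def nei_diversity(calls: Iterable[str]) -> float:
    """Unbiased haplogroup diversity h = n/(n-1) * (1 - sum of p_i squared)."""
    calls = list(calls)
    n = len(calls)
    if n < 2:
        raise ValueError("diversity needs at least two individuals")
    freqs = haplogroup_frequencies(calls)
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs.values()))


if __name__ == "__main__":
    # Hypothetical sample: 21 individuals, 20 of haplogroup A and 1 of haplogroup C.
    sample = ["A"] * 20 + ["C"]
    print(round(nei_diversity(sample), 4))  # prints 0.0952
```

For this hypothetical sample, h = 2/21 ≈ 0.0952. A sample dominated by a single haplogroup yields a value close to zero, whereas four equally frequent haplogroups among four individuals give h = 1.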

In addition, we calculated the degree of genetic differentiation among subpopulations (GST) based on the genetic diversity of the total population. An AMOVA analysis using Arlequin (Excoffier et al., 2005) was carried out with linguistic classification or geographical location as grouping criteria. In the first analysis, we evaluated the linguistic classification of each tribe and whether differences could be attributed to belonging or not belonging to the Chibcha-speaking family. In the second analysis, we grouped tribes by geographic location: tribes located in the north, tribes located in the Orinoquian/Amazonian region (east), and tribes located in the Pacific region (west). We also conducted a comparison to determine whether the Andes mountain range was a factor in genetic differentiation (Table 2).
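Nei's GST compares the mean within-population diversity (HS) with the diversity of the pooled frequencies (HT): GST = (HT - HS) / HT. The sketch below is a simplification that rests on several assumptions. It weights populations equally and uses the uncorrected diversity 1 - Σp². It is not the AMOVA-based estimator implemented in Arlequin. The function names are hypothetical.

```python
from typing import Dict, List


def gene_diversity(freqs: Dict[str, float]) -> float:
    """Uncorrected diversity 1 - sum(p_i^2) for one set of haplogroup frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_gst(subpops: List[Dict[str, float]]) -> float:
    """Nei's GST = (HT - HS) / HT, with every subpopulation weighted equally."""
    k = len(subpops)
    hs = sum(gene_diversity(f) for f in subpops) / k  # mean within-population diversity
    haplogroups = set().union(*subpops)               # every haplogroup label observed
    pooled = {hg: sum(f.get(hg, 0.0) for f in subpops) / k for hg in haplogroups}
    ht = gene_diversity(pooled)                       # diversity of the pooled frequencies
    return (ht - hs) / ht if ht > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical frequencies for two subpopulations.
    north = {"A": 0.8, "B": 0.1, "C": 0.1}
    south = {"A": 0.2, "C": 0.5, "D": 0.3}
    print(f"GST = {nei_gst([north, south]):.3f}")
```

GST is 0 when all subpopulations share the same frequencies. It is 1 when each subpopulation is fixed for a different haplogroup.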

Finally, we compared the genetic (FST values), geographical (distance in km, calculated with the AMIGLOBE program; Collard, 2006) and linguistic (based on Ruhlen's classification; Ruhlen, 1987) distance matrices to assess possible relationships among these three variables. This was done with the aid of Arlequin v3.1 software (Excoffier et al., 2005), using the Mantel test with 100,000 permutations (Figure 3).

Results

Mitochondrial DNA haplogroup frequencies for 424 individuals belonging to 21 Amerindian tribes of Colombia are shown in Table 1 and Figure 1. Haplogroup A was the most frequent, with an average frequency of 31% (131/424 individuals), followed by haplogroup C with 30.4% (129/424), haplogroup B with 22.4% (95/424) and haplogroup D with 13.4% (57/424). The 12 of 424 individuals (2.8%) who did not carry any of the four founder mtDNA haplogroups were listed as haplogroup E. At least two of the four founder haplogroups were present in each of the 21 populations studied, and the frequency of individual haplogroups within populations ranged from 2.1% to 95.2%.
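The reported percentages can be checked directly from the counts given above. The short Python check below uses only those counts, with E standing for individuals who carry none of the four founder haplogroups.

```python
# Counts reported for the 424 individuals (E = none of the four founder haplogroups).
counts = {"A": 131, "B": 95, "C": 129, "D": 57, "E": 12}

total = sum(counts.values())
percent = {hg: round(100 * c / total, 1) for hg, c in counts.items()}

if __name__ == "__main__":
    print("total individuals:", total)
    for hg, pct in sorted(percent.items(), key=lambda kv: -kv[1]):
        print(f"haplogroup {hg}: {pct}%")
```

The five classes add up to the full sample of 424. The value 131/424 is 30.9%, which the text rounds to 31%.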

Genetic diversity index values are shown in Table 1. The lowest genetic diversity was found in the Chimila tribe (h = 0.0952), while the highest was found in the Piapoco (h = 0.8929). The average diversity index for all populations studied was h = 0.7447 (n = 424).

Figure 2 shows the mtDNA haplogroup frequency distribution for four geographical regions: Caribbean (northern region), Amazonian (southern region), Pacific (western region) and Orinoquian (eastern region). We also included data from other studies (Torroni et al., 1994; Kolman and Bermingham, 1997; Merriwether et al., 1997; Mesa et al., 2000; Keyeux et al., 2002; Briceño et al., 2003; Torres et al., 2006; Melton et al., 2007; Barreto et al., 2008) in this analysis. There was a marked clinal pattern in mtDNA haplogroup distribution among Amerindian tribes of Colombia. Haplogroup A reached 50% in the northern region of Colombia, decreasing to 20% in the southern region, while haplogroup C was less frequent in the north and most frequent in the south. The pattern for haplogroup D was similar: it was almost absent in northern Colombia and reached its highest frequency (25%) in the south. Haplogroup B was more frequent in the west, declining towards the east and south.

We constructed a UPGMA tree based on FST genetic distances that includes other Amerindian populations from Central and South America (Figure 3). One cluster included the Kogui, Arhuaco and Chimila tribes of Colombia and the Teribe, Guaymi and Guatuso Chibcha-speaking tribes of Central America, which are all characterized by high frequencies of haplogroup A. An exception was the Arsario tribe, in which none of the individuals tested in this Chibcha-speaking tribe carried haplogroup A. The remaining Colombian tribes clustered together with other Amerindian tribes of South America that do not belong to the Chibcha linguistic family. The Guambiano, Paez and Ingano tribes were grouped within this cluster, reflecting their relationships to these non-Chibcha Amerindian populations.
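UPGMA repeatedly merges the two closest clusters. It then defines the distance from the new cluster to every other cluster as the size-weighted average of the merged clusters' distances. The Python sketch below applies that procedure to a pairwise distance table, such as pairwise FST values. The function name and input format are hypothetical, and ties are broken arbitrarily. This is not the software used to build the tree described here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def upgma(labels: List[str], dist: Dict[FrozenSet[str], float]) -> str:
    """Build a UPGMA tree and return it as a Newick string with branch lengths."""
    size = {lab: 1 for lab in labels}       # cluster -> number of leaves
    height = {lab: 0.0 for lab in labels}   # cluster -> height of its root node
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(labels, 2)}
    while len(size) > 1:
        closest = min(d, key=d.get)         # pair of clusters at minimum distance
        a, b = sorted(closest)
        h = d[closest] / 2                  # ultrametric height of the new node
        new = f"({a}:{h - height[a]:.4f},{b}:{h - height[b]:.4f})"
        n_new = size[a] + size[b]
        for c in size:
            if c not in (a, b):
                # Size-weighted average distance from the merged cluster to c.
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / n_new
        for pair in [p for p in d if a in p or b in p]:
            del d[pair]
        del size[a], size[b]
        size[new], height[new] = n_new, h
    (root,) = size
    return root + ";"


if __name__ == "__main__":
    # Hypothetical pairwise distances among four populations.
    pops = ["P1", "P2", "P3", "P4"]
    fst = {
        frozenset(("P1", "P2")): 0.05, frozenset(("P1", "P3")): 0.30,
        frozenset(("P1", "P4")): 0.32, frozenset(("P2", "P3")): 0.28,
        frozenset(("P2", "P4")): 0.31, frozenset(("P3", "P4")): 0.10,
    }
    print(upgma(pops, fst))
```

Any Newick viewer can read the output string. Because UPGMA assumes a constant rate of change along all branches, every leaf ends up at the same distance from the root.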

We performed a non-metric multidimensional scaling analysis based on the mtDNA haplogroups identified (Figure 4). Here we included the results for other Amerindian populations (Table S1), as well as populations of African descent from Colombia (Nuqui, Guangui and Providencia) described by Rodas et al. (2002), and African populations as an outgroup (Chen et al., 2000) (Figure 4). Most of the Amerindian tribes cluster together because of the heterogeneous presence of the four mtDNA haplogroups among them. However, the Chibcha-speaking tribes tend to cluster much more closely together because of the high frequency of haplogroup A and the low frequencies of haplogroups C and D. The African-descent populations from Colombia are located between the Amerindian populations and the African population used as an outgroup. This is due to the admixture process that resulted in the presence of some of the four mtDNA haplogroups among the Colombian African-descent populations.

The AMOVA analysis based on linguistic affiliation was used to test for differences between tribes that do and do not belong to the Chibcha linguistic family (Table 2). The Guambiano and Paez tribes were not included, since their languages have not yet been classified. The results showed that 69% of the variation was found within populations and 21% was due to whether or not a tribe belonged to the Chibcha linguistic family (p < 0.001). Another AMOVA analysis based on the geographical location of Colombian Amerindian tribes detected no significant differences when the tribes were grouped according to the side of the Andes mountain range on which they were located. Significant differences were found among tribes residing in the northern part of Colombia (most of the Chibcha-speaking tribes analyzed here) compared to those of the Pacific region and the Orinoquian/Amazonian region (p = 0.013), but not when the Andes were treated as a separating barrier (p = 0.150) (Table 2).

Finally, the Mantel test was used to evaluate possible relationships among genetic, linguistic and geographical distances. There was a strong correlation between linguistic and geographic distances and a weaker correlation between genetic and geographic distances. There was no correlation between genetic and linguistic distances (Table 3).

Discussion

Building on previous work (Keyeux et al., 2002), this study provides additional information on mtDNA haplogroup distribution in several Colombian Amerindian populations. Haplogroup A was the most frequent, with an average frequency of 31% (131/424 individuals), followed by haplogroup C with 30.4% (129/424), haplogroup B with 22.4% (95/424) and haplogroup D with 13.4% (57/424).

Previous studies of Colombian Amerindian populations have shown high frequencies of haplogroups A and C and lower frequencies for haplogroup D (Keyeux et al., 2002; Torres et al., 2006; Melton et al., 2007; Rondon et al., 2007). Our results are in agreement with those reports. However, the 13.4% average haplogroup D frequency we found was higher than that previously published for Colombian Amerindian populations of 6.6% by Keyeux et al. (2002) and 9.95% by Torres et al. (2006). These differences could be attributed to the fact that these three studies chose different populations to study, or may even be due to differences within groups of the same population. For example, Keyeux et al. (2002) found no haplogroup D in the Paez tribe, whereas we found this haplogroup in 33% of the Paez individuals. Similar situations occurred in the cases of the other haplogroups. For instance, in our study, the Arsario tribe did not carry haplogroup A (but only 8 individuals were tested), while 68% of the individuals of the Arsario tribe tested by Melton et al. (2007) were reported to carry haplogroup A. These results indicate an even greater genetic heterogeneity within the same populations than has been described before.

Only 12 of the 424 individuals (2.8%) carried none of the four founder mtDNA haplogroups. These individuals may carry unrecognized founder lineages (Bailliet et al., 1994), reflect recent admixture (Torroni et al., 1993a) or carry a reversal of a haplogroup-defining mutation. The second possibility could apply to the Wayuu, Arsario and Paez tribes, in which admixture has been documented with blood groups and HLA class II genes (Yunis et al., 1994, 2001). The third possibility, a lineage termed the haplogroup C revertant, is common in populations of the Colombian Orinoquian and Amazonian basin (Torres et al., 2006). This may be the case for the Piapoco tribe in our study, which showed a 25% frequency of non-A-D haplogroups; Torres et al. (2006) had previously found a high frequency (59%) of the revertant C haplogroup in this tribe. The same scenario is possible for the Piartapuyo (12.5%), Tuyuca (16%) and Guanana (10%), Amerindian tribes that live close together in the northern Amazonian region of Colombia. These tribes show low genetic admixture based on Y-STR haplotypes (Campo D and Yunis JJ, unpublished data) and HLA class II genes (unpublished data).

The Amerindian tribes considered in this study showed a high degree of genetic heterogeneity (Table 2) and diversity (similar to or greater than that of populations throughout South America), as described previously (Santos et al., 1994a; Batista et al., 1995; Kolman et al., 1995; Ward et al., 1996; Bonatto and Salzano, 1997; Mesa et al., 2000; Keyeux et al., 2002).

Genetic diversity values were higher among the Tucano-Equatorial-speaking tribes (0.60 to 0.80), while the Chibchan-speaking groups showed lower values (0.09 to 0.50). These results are consistent with those reported for Chibcha-speaking tribes from Central and South America, including Colombia (Torroni et al., 1994; Kolman et al., 1995; Keyeux et al., 2002). The higher diversity in Amazonian populations may result from gene flow between these populations, as has been shown for other genetic markers such as the Y chromosome (Mesa et al., 2000). Alternatively, it could result from fission, fragmentation and founder effects (Cavalli-Sforza et al., 1992). The population with the lowest genetic diversity (and the highest frequency of haplogroup A) was the Chimila (h = 0.0952). Low diversity in this population has also been reported by others using different genetic markers (Keyeux et al., 2002) and is probably due to inbreeding (unpublished data).
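The text does not name the estimator behind the h values. The sketch below assumes they are Nei's (1978) unbiased gene diversity (Nei 1978 appears in the reference list), computed from haplogroup frequencies in a sample of size n. The samples are hypothetical and are not the Chimila data. They show how a sample dominated by one haplogroup produces a very low value: 20 of 21 individuals sharing a haplogroup gives h = 2/21 ≈ 0.0952, while an even spread over four haplogroups gives a high value.

```python
from collections import Counter


def gene_diversity(labels):
    """Nei's (1978) unbiased gene diversity for haploid data:
    h = n / (n - 1) * (1 - sum of squared haplogroup frequencies)."""
    n = len(labels)
    if n < 2:
        raise ValueError("at least two individuals are needed")
    freqs = [count / n for count in Counter(labels).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


# Hypothetical samples, for illustration only.
dominated = ["A"] * 20 + ["B"]  # one haplogroup at 20/21
even = list("ABCD" * 5)         # four haplogroups, five carriers each
print(round(gene_diversity(dominated), 4))  # 0.0952
print(round(gene_diversity(even), 4))       # 0.7895
```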

The high genetic diversity found in our study and in others suggests that bottleneck events are unlikely to have occurred during the early Amerindian settlement of South America. However, Amerindian populations of northern Colombia that belong to the Chibcha linguistic family clearly differ from non-Chibcha-speaking tribes, as previously described with nuclear genetic markers (Yunis et al., 1994, 2001). Previous studies have shown that Amerindian populations of northern Colombia are close to Central American tribes and North American Amerindian populations (Stone and Stoneking, 1993; Lorenz and Smith, 1996; O'Rourke et al., 2000; Keyeux et al., 2002; Melton et al., 2007). Our results further support the view that Chibcha-speaking tribes in Central and South America differentiated genetically from non-Chibcha-speaking tribes before entering South America.

There were marked clinal patterns in mtDNA haplogroup distribution among the Amerindian tribes of Colombia. Populations were grouped by geographical location (northern-Caribbean, southern-Amazonian, western-Pacific and eastern-Orinoquian; see Figure 2). Under this grouping, haplogroup A frequency was high in the north (50%) but decreased to 20% in the south. Haplogroup C frequency was lower in the north and highest in the south. Similarly, haplogroup D was almost absent in the north but reached its highest frequency in the south (25%). Haplogroup B was more frequent in the west, with decreasing frequencies towards the east and south. These clinal patterns are similar to those described earlier (Torroni et al., 1994; Lalueza-Fox, 1996; Lorenz and Smith, 1996; Lalueza et al., 1997; Keyeux et al., 2002; Bisso-Machado et al., 2012).

A UPGMA tree was built from the tribes analyzed in this study together with several Central and South American Amerindian populations described elsewhere. It showed a cluster of Chibcha-speaking tribes (Chimila, Arhuaco, Kogui, Teribe and Guaymi) that is genetically distant from the other Amerindian tribes analyzed. The second cluster includes the remaining tribes, among them the Guambiano and Paez. The results for these two tribes, whose languages are currently unclassified, further support a genetic relationship with the Tucano-Equatorial or Andean linguistic families rather than with the Chibcha family, in which they had previously been classified. Similar results have been obtained with HLA genes (Yunis et al., 2001). Some authors have postulated that the Paez originated in the Amazonian region and migrated northeast to their present location before the Spanish conquest (Arboleda, 1993). Recently, archeological findings in the western Amazonian region of Colombia have provided further support for an Amazonian ancestral origin of the Guambiano and Paez tribes.
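UPGMA is named here, but the distance matrix behind the tree is not given in the text. The following is a minimal sketch of the algorithm itself. It repeatedly merges the closest pair of clusters and sets the new cluster's distance to every other cluster to the leaf-weighted mean of the old distances. The population names P1 to P5 and the distances are hypothetical, and branch lengths are omitted to keep the sketch short.

```python
def upgma(dist, labels):
    """UPGMA (unweighted pair group method with arithmetic mean) clustering.

    dist is a symmetric square matrix (list of lists) of pairwise distances and
    labels names its rows. Returns the tree topology as nested tuples.
    """
    # Active clusters: id -> (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)

    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(d, key=d.get)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted mean, i.e. the average over all pairs of leaves.
        for k in clusters:
            d_ak = d[(min(a, k), max(a, k))]
            d_bk = d[(min(b, k), max(b, k))]
            d[(k, next_id)] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        # Forget every distance that involves one of the merged clusters.
        d = {key: v for key, v in d.items() if a not in key and b not in key}
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1

    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical distances: P1-P3 are close to each other, as are P4 and P5.
names = ["P1", "P2", "P3", "P4", "P5"]
dist = [
    [0.0, 0.1, 0.2, 0.8, 0.8],
    [0.1, 0.0, 0.2, 0.8, 0.8],
    [0.2, 0.2, 0.0, 0.8, 0.8],
    [0.8, 0.8, 0.8, 0.0, 0.1],
    [0.8, 0.8, 0.8, 0.1, 0.0],
]
print(upgma(dist, names))  # (('P4', 'P5'), ('P3', ('P1', 'P2')))
```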

The AMOVA showed that a significant share of the variation (21%, p < 0.001) was explained by whether or not a tribe belonged to the Chibcha linguistic family (Table 2). On the other hand, our results do not support the hypothesis that the Andes mountain range acted as a differentiating factor for the Amerindian tribes studied.

The strong genetic differentiation between Chibcha- and non-Chibcha-speaking tribes is likely due to the high frequency of haplogroup A among the Chibcha-speaking populations. Similar results were obtained in the past using the major histocompatibility complex and other genetic markers (Yunis et al., 1994, 2001).

The correlation analysis between the geographical, linguistic and genetic data (Table 3) showed the highest correlation for the linguistic-geographic pair, followed by the genetic-geographic pair. These results can be explained by the fact that many populations belonging to the same linguistic family are also geographically close, so it is difficult to infer a linguistic-genetic relationship from mtDNA haplogroups alone. Closely related Amerindian tribes are also geographically close, which facilitates gene flow and the exchange of customs, knowledge and languages. Both geographic and linguistic factors are associated with genetic differentiation in the Amerindian populations analyzed in Colombia. As has been found for other Amerindian tribes, these three parameters have evolved together in a historically and strongly correlated fashion.


We would like to thank all the Colombian Amerindian communities that kindly contributed by providing samples for this study. This research was financed in part by grants from Colciencias to EJY and by the Universidad Nacional de Colombia to JJY.

Internet Resources

Supplementary Material

The following online material is available for this article:

Table S1 - mtDNA haplogroup frequencies of Colombian and South American Amerindian tribes.

This material is available as part of the online article from http://www.scielo.br/gmb.

Received: August 11, 2012; Accepted: December 3, 2012.

Associate Editor: Francisco Mauro Salzano

License information: This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

  • Arango R and Sánchez E (2006) Los Pueblos Indígenas de Colombia en el Umbral del Nuevo Milenio. Tercer Mundo, Bogotá, 426 pp.

  • Arboleda JE (1993) Inganos, Paeces y Coconucos: Notas para la Etnohistoria. Incora, Popayán, 218 pp.

  • Bailliet G, Rothhammer F, Carnese F, Bravi C and Bianchi N (1994) Founder mitochondrial haplotypes in American populations. Am J Hum Genet 54:27-33.

  • Barreto G, Osorio JC, Peña AV, Garcés HA and Rondón F (2008) Diversidad genética en poblaciones humanas de dos regiones colombianas. Colombia Médica 39:52-60.

  • Batista O, Kolman CJ and Bermingham E (1995) Mitochondrial DNA diversity in the Kuna Amerinds of Panama. Hum Mol Genet 4:921-929.

  • Bert F, Corella A, Gene M, Perez-Perez A and Turbon D (2001) Major mitochondrial DNA haplotype heterogeneity in highland and lowland Amerindian populations from Bolivia. Hum Biol 73:1-16.

  • Bianchi N, Bailliet G and Bravi C (1995) Peopling of the Americas as inferred through the analysis of mtDNA. Braz J Genet 18:661-668.

  • Bisso-Machado R, Bortolini MC and Salzano FM (2012) Uniparental genetic markers in South Amerindians. Genet Mol Biol 35:365-387.

  • Bonatto SL and Salzano FM (1997) Diversity and age of the four major mtDNA haplogroups, and their implications for the peopling of the New World. Am J Hum Genet 61:1413-1423.

  • Briceño I, Gómez A, Lozano PAU, Mitchell RJ and Papiha S (2003) Mitochondrial variation in Colombia: Study of matrilineal lineages among amerindian tribes. XIX International Congress of Genetics Proceedings, Melbourne.

  • Brown WM, George Jr M and Wilson AC (1979) Rapid evolution of animal mitochondrial DNA. Proc Natl Acad Sci USA 76:1967-1971.

  • Brown W, George M and Wilson A (1980) Polymorphism in mitochondrial DNA of humans as revealed by restriction endonuclease analysis. Proc Natl Acad Sci USA 77:3605-3609.

  • Cabana GS, Merriwether DA, Hunley K and Demarchi DA (2006) Is the genetic structure of Gran Chaco populations unique? Interregional perspectives on Native South American mitochondrial DNA variation. Am J Phys Anthropol 131:108-119.

  • Cann RL and Wilson AC (1983) Length mutations in human mitochondrial DNA. Genetics 104:699-711.

  • Cann RL, Brown WM and Wilson AC (1984) Polymorphic sites and the mechanism of evolution in human mitochondrial DNA. Genetics 106:479-499.

  • Cavalli-Sforza LL, Minch E and Mountain JL (1992) Coevolution of genes and languages revisited. Proc Natl Acad Sci USA 89:5620-5624.

  • Chen YS, Olckers A, Schurr TG, Kogelnik AM, Huoponen K and Wallace DC (2000) mtDNA variation in the South African Kung and Khwe and their genetic relationships to other African populations. Am J Hum Genet 66:1362-1383.

  • Demarchi DA, Panzetta-Dutari GM, Motran CC, Lopez de Basualdo MA and Marcellino AJ (2001) Mitochondrial DNA haplogroups in Amerindian populations from the Gran Chaco. Am J Phys Anthropol 115:199-203.

  • Dillehay TD and Meltzer DJ (1991) The First Americans. CRC Press, Boca Raton, 310 pp.

  • Dipierri JE, Alfaro E, Martinez-Marignac VL, Bailliet G, Bravi CM, Cejas S and Bianchi NO (1998) Paternal directional mating in two Amerindian subpopulations located at different altitudes in northwestern Argentina. Hum Biol 70:1001-1010.

  • Dornelles CL, Bonatto SL, De Freitas LB and Salzano FM (2005) Is haplogroup X present in extant South American Indians? Am J Phys Anthropol 127:439-448.

  • Easton RD, Merriwether DA, Crews DE and Ferrell RE (1996) mtDNA variation in the Yanomami: Evidence for additional New World founding lineages. Am J Hum Genet 59:213-225.

  • Eshleman JA, Malhi RS and Smith DG (2003) Mitochondrial DNA Studies of Native Americans: Conceptions and misconceptions of the population prehistory of the Americas. Evol Anthropol 12:7-18.

  • Excoffier L, Laval G and Schneider S (2005) Arlequin ver. 3.1: An integrated software package for population genetics data analysis. Evol Bioinform Online 1:47-50.

  • Fernandez-Dominguez E (2005) Polimorfismos de DNA Mitocondrial en Poblaciones Antiguas de la Cuenca Mediterránea. Universidad de Barcelona, Barcelona, 670 pp.

  • Fuselli S, Tarazona-Santos E, Dupanloup I, Soto A, Luiselli D and Pettener D (2003) Mitochondrial DNA diversity in South America and the genetic history of Andean highlanders. Mol Biol Evol 20:1682-1691.

  • Garcia-Bour J, Perez-Perez A, Alvarez S, Fernandez E, Lopez-Parra AM, Arroyo-Pardo E and Turbon D (2004) Early population differentiation in extinct aborigines from Tierra del Fuego-Patagonia: Ancient mtDNA sequences and Y-chromosome STR characterization. Am J Phys Anthropol 123:361-370.

  • Giles RE, Blanc H, Cann HM and Wallace DC (1980) Maternal inheritance of human mitochondrial DNA. Proc Natl Acad Sci USA 77:6715-6719.

  • Ginther C, Corach D, Penacino GA, Rey JA, Carnese FR, Hutz MH, Anderson A, Just J, Salzano FM and King MC (1993) Genetic variation among the Mapuche Indians from the Patagonian region of Argentina: Mitochondrial DNA sequence variation and allele frequencies of several nuclear genes. EXS 67:211-219.

  • Greenberg J, Turner CG and Zegura SL (1986) The settlement of the Americas: A comparison of the linguistic, dental and genetic evidence. Curr Anthropol 4:477-497.

  • Gustincich S, Manfioletti G, Del Sal G, Schneider C and Carninci P (1991) A fast method for high-quality genomic DNA extraction from whole human blood. Biotechniques 11:298-300, 302.

  • Horai S, Kondo R, Nakagawa-Hattori Y, Hayashi S, Sonoda S and Tajima K (1993) Peopling of the Americas, founded by four major lineages of mitochondrial DNA. Mol Biol Evol 10:23-47.

  • Howell N and Smejkal CB (2000) Persistent heteroplasmy of a mutation in the human mtDNA control region: Hypermutation as an apparent consequence of simple-repeat expansion/contraction. Am J Hum Genet 66:1589-1598.

  • Keyeux G, Rodas C, Gelvez N and Carter D (2002) Possible migration routes into South America deduced from mitochondrial DNA studies in Colombian Amerindian populations. Hum Biol 74:211-233.

  • Kolman CJ, Bermingham E, Cooke R, Ward RH, Arias TD and Guionneau-Sinclair F (1995) Reduced mtDNA diversity in the Ngobe Amerinds of Panama. Genetics 140:275-283.

  • Kolman CJ and Bermingham E (1997) Mitochondrial and nuclear DNA diversity in the Choco and Chibcha Amerinds of Panama. Genetics 147:1289-1302.

  • Lalueza-Fox C (1996) Mitochondrial DNA haplogroups in four tribes from Tierra del Fuego-Patagonia: Inferences about the peopling of the Americas. Hum Biol 68:855-871.

  • Lalueza C, Perez-Perez A, Prats E, Cornudella L and Turbon D (1997) Lack of founding Amerindian mitochondrial DNA lineages in extinct aborigines from Tierra del Fuego-Patagonia. Hum Mol Genet 6:41-46.

  • Lewis CM, Tito RY, Lizarraga B and Stone AC (2004) Land, language, and loci: mtDNA in Native Americans and the genetic history of Peru. Am J Phys Anthropol 127:351-360.

  • Lobato-da-Silva DF, Ribeiro-dos-Santos AKC and Santos SEB (2001) Diversidade genética de populações humanas na Amazônia. In: Guimarães Vieira IC, Cardoso da Silva JM, Oren DC and D'Ineao MA (eds) Diversidade Humana e Cultural na Amazônia. Museu Paraense Emilio Goeldi, Belém, pp 167-193.

  • Lorenz JG and Smith DG (1996) Distribution of four founding mtDNA haplogroups among native North Americans. Am J Phys Anthropol 101:307-323.

  • Marrero AR, Silva-Junior WA, Bravi CM, Hutz MH, Petzl-Erler ML, Ruiz-Linares A, Salzano FM and Bortolini MC (2007) Demographic and evolutionary trajectories of the Guarani and Kaingang natives of Brazil. Am J Phys Anthropol 132:301-310.

  • Melton PE, Briceño I, Gómez A, Devor EJ, Bernal JE and Crawford MH (2007) Biological relationship between Central and South American Chibchan speaking populations: Evidence from mtDNA. Am J Phys Anthropol 133:753-770.

  • Merriwether DA and Ferrell RE (1996) The four founding lineages hypothesis for the New World. A critical reevaluation. Mol Phylogenet Evol 5:241-246.

  • Merriwether DA, Rothhammer F and Ferrell RE (1994) Genetic variation in the New World: Ancient teeth, bone and tissue as sources of DNA. Experientia 50:592-601.

  • Merriwether DA, Rothhammer F and Ferrell RE (1995) Distribution of the four founding lineage haplotypes in Native Americans suggests a single wave of migration for the New World. Am J Phys Anthropol 98:411-430.

  • Merriwether DA, Reed DM and Ferrell RE (1997) Ancient and contemporary mitochondrial DNA variation in the Maya. In: Whittington SL and Reed DM (eds) Bones of the Maya: Studies of Ancient Skeletons. Smithsonian Institution Press, Washington, DC, pp 208-217.

  • Mesa NR, Mondragon MC, Soto ID, Parra MV, Duque C, Ortiz-Barrientos D, Garcia LF, Velez ID, Bravo ML, Munera JG, et al. (2000) Autosomal, mtDNA, and Y-chromosome diversity in Amerinds: Pre- and post-Columbian patterns of gene flow in South America. Am J Hum Genet 67:1277-1286.

  • Moraga ML, Rocco P, Miquel JF, Nervi F, Llop E, Chakraborty R, Rothhammer F and Carvallo P (2000) Mitochondrial DNA polymorphisms in Chilean aboriginal populations: Implications for the peopling of the southern cone of the continent. Am J Phys Anthropol 113:19-29.

  • Nei M (1978) Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 89:583-590.

  • O'Rourke DH, Hayes MG and Carlyle SW (2000) Spatial and temporal stability of mtDNA haplogroup frequencies in native North America. Hum Biol 72:15-34.

  • Parra EJ, Marcini A, Akey J, Martinson J, Batzer MA, Cooper R, Forrester T, Allison DB, Deka R, Ferrell RE, et al. (1998) Estimating African American admixture proportions by use of population-specific alleles. Am J Hum Genet 63:1839-1851.

  • Rickards O, Martinez-Labarga C, Lum JK, De Stefano GF and Cann RL (1999) mtDNA history of the Cayapa Amerinds of Ecuador: Detection of additional founding lineages for the Native American populations. Am J Hum Genet 65:519-530.

  • Rodas C, Gelvez N and Keyeux G (2002) Mitochondrial DNA Studies show asymmetrical Amerindian admixture in Afro-Colombian and Mestizo populations. Hum Biol 75:13-30.

  • Rondon F, Braga Y, Cardenas H and Barreto G (2007) Análisis de la diversidad y el grado de estructura genética presente en poblaciones humanas colombianas a partir del uso de marcadores RFLPs de mtDNA. Rev Asoc Colomb Cienc Biol 19:94-103.

  • Rothhammer F, Llop E, Carvallo P and Moraga M (2001) Origin and evolutionary relationships of native Andean populations. High Alt Med Biol 2:227-233.

  • Ruhlen M (1987) A Guide to the World's Languages. Stanford University Press, Stanford, 469 pp.

  • Sandoval J, Fujita R, Delgado B, Rivas L, Bonilla B and Nugent D (2004) Variants of mtDNA among islanders of the lake Titicaca: Highest frequency of haplotype B1 and evidence of founder effect. Rev Peru Biol 11:161-168.

  • Santos M, Ward RH and Barrantes R (1994a) mtDNA variation in the Chibcha Amerindian Huetar from Costa Rica. Hum Biol 66:963-977.

  • Santos M, Ward RH and Barrantes R (1994b) D-Loop mtDNA deletion as a unique marker of Chibchan Amerindians. Am J Hum Genet 55:413-414.

  • Schurr TG, Ballinger SW, Gan YY, Hodge JA, Merriwether DA, Lawrence DN, Knowler WC, Weiss KM and Wallace DC (1990) Amerindian mitochondrial DNAs have rare Asian mutations at high frequencies, suggesting they derived from four primary maternal lineages. Am J Hum Genet 46:613-623.

  • Stone A and Stoneking M (1993) Ancient DNA from a pre-Columbian Amerindian population. Am J Phys Anthropol 92:463-471.

  • Torres MM, Bravi CM, Bortolini MC, Duque C, Callegari-Jacques S, Ortiz D, Bedoya G, Groot de Restrepo H and Ruiz-Linares A (2006) A revertant of the major founder Native American haplogroup C common in populations from northern South America. Am J Hum Biol 18:59-65.

  • Torroni A, Schurr TG, Yang CC, Szathmary EJ, Williams RC, Schanfield MS, Troup GA, Knowler WC, Lawrence DN, Weiss KM, et al. (1992) Native American mitochondrial DNA analysis indicates that the Amerind and the Nadene populations were founded by two independent migrations. Genetics 130:153-162.

  • Torroni A, Schurr TG, Cabell MF, Brown MD, Neel JV, Larsen M, Smith DG, Vullo CM and Wallace DC (1993a) Asian affinities and continental radiation of the four founding Native American mtDNAs. Am J Hum Genet 53:563-590.

  • Torroni A, Sukernik RI, Schurr TG, Starikovskaya YB, Cabell MF, Crawford MH, Comuzzie AG and Wallace DC (1993b) mtDNA variation of aboriginal Siberians reveals distinct genetic affinities with Native Americans. Am J Hum Genet 53:591-608.

  • Torroni A, Chen YS, Semino O, Santachiara-Benerecetti AS, Scott CR, Lott MT, Winter M and Wallace DC (1994) mtDNA and Y-chromosome polymorphisms in four Native American populations from southern Mexico. Am J Hum Genet 54:303-318.

  • Turner C (1984) Advances in the dental search for Native American origins. Acta Anthropogenetica 8:23-78.

  • Wallace DC, Garrison K and Knowler WC (1985) Dramatic founder effects in Amerindian mitochondrial DNAs. Am J Phys Anthropol 68:149-155.

  • Ward RH, Salzano FM, Bonatto SL, Hutz MH, Coimbra CE and Santos RV (1996) Mitochondrial DNA polymorphism in three Brazilian Indian tribes. Am J Hum Biol 8:317-323.

  • Williams SR, Chagnon NA and Spielman RS (2002) Nuclear and mitochondrial genetic variation in the Yanomamo: A test case for ancient DNA studies of prehistoric populations. Am J Phys Anthropol 117:246-259.

  • Yunis JJ, Ossa H, Salazar M, Delgado MB, Deulofeut R, de la Hoz A, Bing DH, Ramos O and Yunis EJ (1994) Major histocompatibility complex class II alleles and haplotypes and blood groups of four Amerindian tribes of northern Colombia. Hum Immunol 41:248-258.

  • Yunis JJ, Yunis EJ and Yunis E (2001) Genetic relationship of the Guambino, Paez, and Ingano Amerindians of southwest Colombia using major histocompatibility complex class II haplotypes and blood groups. Hum Immunol 62:970-978.

  • Collard O (2006) AMIGLOBE. Amiglobe is a world atlas and database with information about every country in the world, http://www.downloadatoz.com/home-education_directory/amiglobe-2006/

  • Publication in this collection
    05 Mar 2013

Sociedade Brasileira de Genética Rua Cap. Adelmio Norberto da Silva, 736, 14025-670 Ribeirão Preto SP Brazil, Tel.: (55 16) 3911-4130 / Fax.: (55 16) 3621-3552 - Ribeirão Preto - SP - Brazil



Source: https://www.scielo.br/j/gmb/a/BR7pSWMdwPByjBZr6m5Wpmf/?lang=en
European paternal Y-DNA haplogroups distribution by country

The genetic history of Europe since the Upper Paleolithic is inseparable from that of broader Western Eurasia. By about fifty thousand years ago, a basal West Eurasian lineage had emerged from the undifferentiated “non-African” population of about seventy thousand years ago.

Between 40 and 26 thousand years BP, European early modern human lineages were still part of a large Western Eurasian “meta-population” linked to Central and Western Asian populations. The division into genetically separate sub-populations in Western Eurasia is a consequence of increased selection pressure and founder effects during the Last Glacial Maximum.

After the Last Glacial Maximum (about 21 thousand years ago), a lineage called the West European Hunter-Gatherers emerged from the Solutrean refugium during the European Mesolithic (about 12.7 thousand years ago). All tested Mesolithic West European Hunter-Gatherer Y chromosomes from Luxembourg and Motala (Sweden) belong to haplogroup I. Haplogroup I is the primary candidate for Europe’s indigenous Y haplogroup and is today the most prevalent Y haplogroup in most of Scandinavia.

These Western hunter-gatherer societies were considerably displaced during the Neolithic Revolution by the arrival of Early European Farmer lineages derived from Mesolithic populations of West Asia (Anatolia and the Caucasus).

Hunter-gatherers and farmers

In the European Bronze Age, there were again extensive population replacements in parts of Europe, this time by the arrival of populations carrying Ancient North Eurasian ancestry from the Pontic–Caspian steppes. These Bronze Age population replacements are connected archaeologically with the Beaker culture and linguistically with the Indo-European expansion.

As a consequence of these population shifts from the Mesolithic to the Bronze Age, contemporary European populations differ in their proportions of West European Hunter-Gatherer, Early European Farmer, and Ancient North Eurasian ancestry.

According to Sciencemag, admixture rates varied geographically. In the late Neolithic, the West European Hunter-Gatherer ancestry of farmers was about 10 percent in Hungary, almost 25 percent in Germany, and as high as 50 percent in Iberia. The contribution of Early European Farmer ancestry is greatest in Mediterranean Europe and decreases towards northeastern Europe, where Ancient North Eurasian ancestry is stronger. Sardinians have almost entirely Early European Farmer ancestry.

Distribution of European Y-chromosome DNA (Y-DNA) haplogroups by country and region in percentage

A haplogroup is a group of similar haplotypes that share a common ancestor carrying the same single-nucleotide polymorphism (SNP) mutation. A haplotype, in turn, is a combination of alleles at different chromosome regions that are closely linked and tend to be inherited together.

As a haplogroup consists of similar haplotypes, it is usually possible to predict a haplogroup from haplotypes (a haplotype is a group of genes in an organism that are inherited together from a single parent).

Haplogroups pertain to a single line of descent, usually dating back thousands of years. An individual's haplogroup membership therefore depends on only a relatively small proportion of that individual's genetic material.

Distribution of European Y-chromosome DNA (Y-DNA) haplogroups by country in percentage

A simplified map will look like this.

Map with predominant Haplogroups

The map below shows countries by percentage of similarity to Y-DNA average of Europeans.

Countries by percentage of similarity to Y-DNA average of Europeans

Residents of Austria have the highest percentage of similarity to the Y-DNA average of Europeans (89.4%).


Dominant Y-DNA haplogroups in Europe and Middle East
Source: https://vividmaps.com/dominant-y-dna-haplogroups-in-europe/


Worldwide human mitochondrial haplogroup distribution from urban sewage


Community-level genetic information can be essential for directing health measures and studying demographic tendencies, but it is subject to considerable ethical and legal challenges. These concerns become less pronounced when analyzing urban sewage samples, which are ab ovo anonymous by their pooled nature. We were able to detect traces of human mitochondrial DNA (mtDNA) in urban sewage samples and to estimate the distribution of human mtDNA haplogroups. An expectation maximization approach was used to determine mtDNA haplogroup mixture proportions for samples collected at each geographic location. Our results show reasonable agreement with both previous studies of ancient evolution or migration and current US census data, and they are readily reproducible and highly robust. Our approach presents a promising alternative for sample collection in studies of the ethnic and genetic composition of populations or of diseases associated with different mtDNA haplogroups and genotypes.
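The abstract says only that an expectation maximization approach was used to estimate haplogroup mixture proportions. As a hedged illustration of that general technique, and not the authors' pipeline, the sketch assumes each read already has a likelihood under each candidate haplogroup. It then estimates the mixture weights by EM: the E-step computes each read's posterior membership, and the M-step averages those memberships. The function name `em_mixture_proportions`, the iteration limit, the tolerance and the example reads are all hypothetical.

```python
def em_mixture_proportions(likelihoods, max_iter=500, tol=1e-10):
    """Estimate haplogroup mixture proportions by expectation maximization (sketch).

    likelihoods[r][h] is P(read r | haplogroup h). Every read must have a
    non-zero likelihood under at least one haplogroup.
    Returns a list of proportions, one per haplogroup, summing to 1.
    """
    n_reads = len(likelihoods)
    n_hap = len(likelihoods[0])
    props = [1.0 / n_hap] * n_hap  # start from a uniform mixture
    for _ in range(max_iter):
        expected = [0.0] * n_hap
        for row in likelihoods:
            # E-step: posterior probability that this read came from each haplogroup.
            weighted = [p * lik for p, lik in zip(props, row)]
            z = sum(weighted)
            for h, w in enumerate(weighted):
                expected[h] += w / z
        # M-step: new proportions are the average posterior membership.
        new_props = [e / n_reads for e in expected]
        converged = max(abs(a - b) for a, b in zip(props, new_props)) < tol
        props = new_props
        if converged:
            break
    return props


# Hypothetical reads: 30 match only haplogroup 0, 10 match only haplogroup 1,
# and 20 carry no informative variant (equally likely under both).
reads = [[1.0, 0.0]] * 30 + [[0.0, 1.0]] * 10 + [[1.0, 1.0]] * 20
print([round(p, 3) for p in em_mixture_proportions(reads)])  # [0.75, 0.25]
```

Uninformative reads are shared out in proportion to the current estimates, so they do not shift the maximum-likelihood answer away from the 30:10 split given by the informative reads.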


Owing to the advances in DNA sequencing over the last two decades, obtaining the genetic code of every single person has become the hypothetical answer to many health-, demography-, forensics- and even history-related questions. The dubious legal, economic and ethical repercussions of this vision, however, render this approach presently unattainable. Before individual-level genome sequencing becomes feasible, an easier target may be community-level pooled sequencing.

Health-related efforts would benefit greatly from available genetic distribution data for local communities. Many breakthroughs have already been made in the discovery and study of diseases using personalized sequencing, in the hope of enabling earlier and more accurate diagnosis, individualized intervention, guiding prevention strategies and monitoring the effects of treatments [1]. The effects of pandemics can also differ with genetic background, as certain gene variants may confer enhanced susceptibility or resistance to viral diseases [2,3,4]. These genetic determinants may be shared by larger, phylogenetically related subpopulations. Multiple studies have demonstrated the immense advantages of population-based screening for risk factors in symptomless individuals [5,6]. Data collection, however, is problematic: to realize prevention programs tailored to smaller communities, the distribution of genetic variants in the local population must first be established. In principle, this would require collecting genetic information from as many individuals as possible. That raises many ethical and legal concerns, as well as the practical challenges of sample collection and analysis. With the increasing amount of health-related information available, it is becoming progressively more difficult to ensure confidentiality, especially because third-party access to the data is in many cases insufficiently controlled [7]. The problem is made worse by the fact that personal genomic data are highly sensitive: they contain information not only about the person taking the genetic test but also about a broader group of people genetically related to that individual [8]. People can be discouraged from getting tested for certain diseases by fear of genetic discrimination by employers or insurance agencies based on the results [9].

On a closely related subject, data on ethnicity can be essential for studying demographic tendencies, employment practices and opportunities, income distributions, educational levels, migration patterns and trends, family composition and structure, social support networks, the health conditions of a population, and optimal treatment and preventive measures [10,11,12]. Data on ethnicity are collected using a wide range of methodologies and often rely on self-reporting, which makes standardization difficult [10,13]. Collecting data on ethnicity is also sensitive, and the decision to collect and disseminate information on the ethnic or national groups of a population has to be based on a number of considerations and national circumstances [10,13].

Additionally, from a forensic viewpoint, a comprehensive database of the genetic distribution of populations worldwide would be highly beneficial. It has been shown [14] that individual contributors can be detected in highly complex mixtures of human DNA collected from common surfaces, even when only an extremely small portion of the mixture belongs to the person of interest. The statistical method used for such analysis depends heavily on the availability of allele frequencies for an appropriate reference population with ancestral components similar to those of the investigated mixture. Furthermore, DNA degradation in the accessible evidence sometimes limits genetic identification to a restricted part of the human genome. In such cases, the local genetic distribution of the population could be used as an informative prior to fine-tune probabilistic models that determine the probability of a DNA match.

One approach would be to obtain informed consent from many individuals, sequence their genomes one by one and then pool the data. As an alternative, samples collected from wastewater plants can serve as a pooled sample that may contain the same relevant information. From a surveillance point of view, urban sewage is attractive because it combines material from a large and mostly healthy population that would otherwise not be feasible to monitor. In addition, analysis of ab ovo pooled samples does not require informed consent, thus limiting ethical concerns [15], including those related to studying human DNA sequences [16]. The nature of sample collection itself eliminates the need for further anonymization of the data, as it provides an inextricably anonymized mixture of genomic information about many individuals simultaneously.

In the COMPARE Global Sewage Surveillance Project, we initiated a global collection of urban sewage in 2016 to determine the occurrence of antimicrobial resistance genes and infectious disease agents in the healthy human population using metagenomic sequencing17. Metagenomic sequencing of urban sewage allows not only the identification of disease-causing agents, such as bacteria and viruses, but also the recovery of much additional information that was not part of the original scope of the study. In the initial analyses, we observed that on average 0.2% of all reads could be assigned to humans17. This relatively small amount of human DNA is insufficient for detailed profiling of genotype distributions across populations, but restricting the investigation to the mitochondrion can lead to meaningful results. It has previously been shown that mitochondrial DNA is suitable for distinguishing human, bovine, porcine and ovine sources of fecal contamination in surface water samples18. Human mitochondrial DNA (mtDNA) is a short (16,569 base pairs, bp) circular molecule present in multiple copies in a single human cell, which makes it easier to detect even in samples with a low human DNA content. The human mitochondrial genome is inherited only from the mother (although recent results indicate that this is not always the case19), and its inheritance has also been shown to be clonal, so mtDNA is transmitted from mother to offspring without germline recombination20. These properties make mtDNA variants suitable for tracking evolutionary patterns. The leaves of the human mitochondrial phylogenetic tree are the mitochondrial haplotypes, which can be assigned to mitochondrial haplogroups (the major branching points of the tree) based on their similarities.

Human mitochondrial DNA haplogroups and their distributions have been extensively investigated across different nations and geographical regions21,22,23,24,25,26,27,28,29, predominantly to uncover population origins and genetic structure. The accumulated information about the mtDNA haplogroup composition of ancient and present-day communities has transformed genetic ancestry testing from an abstruse academic pursuit into a popular and common practice among the public. However, companies providing direct-to-consumer (DTC) ancestry tests have been criticised30 for supplying misleading information to their customers, which can deeply affect their personal identities. One of the cited shortcomings is the limited number of reference samples in the databases on which inference of geographic ancestry is based. This aspect could be greatly improved by global, up-to-date monitoring of the mitochondrial DNA composition of populations at different geographic locations.

In this pilot study, we aimed to identify traces of human mitochondrial DNA in the global sewage dataset and determine the local mtDNA haplogroup composition of the sewage catchment area.


Coverage of the human mtDNA in the samples

Urban sewage was collected globally from 79 sample locations and sequenced on an Illumina HiSeq platform, yielding an average of 120 million reads per sample (range: 8 to 398 million). For details, see Hendriksen et al.17. The average coverage of the human mtDNA varied greatly among samples (Supplementary Figs 1 and 2b); altogether, 44 samples reached the threshold of a mean coverage of 10 or higher, and only these samples were analysed further.

To ensure that the satisfactory average coverage along the human mtDNA does not arise from false alignments of homologous non-human DNA segments to short regions (which would produce a high mean but very uneven coverage), the pooled coverage of all 44 analysed samples is plotted in Fig. 1a. The coverage of the mtDNA is fairly even in the investigated samples: the fluctuations do not exceed the known variation of next-generation sequencing data31. This also holds for the coverage of single samples, albeit with a much lower mean value (Supplementary Fig. 1). This is, however, not the case for other organisms, where apart from a few local peaks in coverage the rest of the mtDNA remains uncovered. Fig. 1b demonstrates this with the example of the rat mtDNA, where the coverage peaks appear almost exclusively in regions that are homologous to the human mtDNA. For each of the 44 samples, Supplementary Fig. 2a shows the number of reads aligned uniquely to the human mtDNA, uniquely to the rat mtDNA, and to both. In samples with a mean coverage of 10 or more, the exclusively human reads outnumber the reads unique to the rat mtDNA by a factor of 40 on average. We therefore conclude that the identified reads are indeed human and not the result of misalignment. No reads could be aligned to the human mtDNA in our three negative control samples (extraction kit controls, data not shown), so human DNA contamination during the analysis process can be excluded.
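The comparison of reads aligned uniquely to the human and the rat mtDNA can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the names of mapped reads have already been extracted from the two alignments (for instance with `samtools view -F 4`, which skips unmapped reads), and the helper names `classify_reads` and `human_to_rat_ratio` are hypothetical.

```python
def classify_reads(human_hits, rat_hits):
    """Split mapped read names into human-only, rat-only and shared sets."""
    human, rat = set(human_hits), set(rat_hits)
    return {
        "human_only": human - rat,  # aligned to the human mtDNA only
        "rat_only": rat - human,    # aligned to the rat mtDNA only
        "both": human & rat,        # aligned to both references
    }


def human_to_rat_ratio(human_hits, rat_hits):
    """Ratio of exclusively human reads to exclusively rat reads in one sample."""
    groups = classify_reads(human_hits, rat_hits)
    n_rat = len(groups["rat_only"])
    # Avoid division by zero when no read maps to the rat mtDNA alone.
    return float("inf") if n_rat == 0 else len(groups["human_only"]) / n_rat


# Toy read names, not data from the study.
print(human_to_rat_ratio(["r1", "r2", "r3"], ["r3", "r4"]))  # 2.0
```

Computing this ratio per sample and averaging over the samples that pass the coverage threshold gives a summary of the kind quoted in the text.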

Coverage of the human and rat mtDNA in the investigated samples. (a) Pooled coverage along the length of the human mtDNA for samples with an average coverage of 10 or higher. The approximately uniform distribution indicates that the chance of misaligned non-mtDNA reads is minimal. (b) Pooled coverage of the rat mitochondrion for the 44 analysed samples (blue line). Light red vertical lines indicate genomic regions that are homologous to the human reference mtDNA. Homology was defined on a binary scale for each genomic position: scanning the rat mtDNA with a sliding window, a position was deemed homologous if it lies within a window whose sequence is also found (exactly) in the human reference mtDNA. (The window size was chosen to be 19, as this is the default minimal seed length of the alignment algorithm.)

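The binary homology track of Fig. 1b can be approximated with a k-mer lookup. The sketch below is one possible reading of the sliding-window definition, assuming exact matches on the forward strand only and treating the circular mtDNA as linear (both simplifications); `homology_mask` is a hypothetical name and k defaults to the seed length of 19 mentioned in the caption.

```python
def homology_mask(query, reference, k=19):
    """Flag query positions covered by a k-mer that also occurs exactly in reference.

    Only the forward strand is scanned and the circular genome is treated as
    linear; both are simplifying assumptions of this sketch.
    """
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    mask = [False] * len(query)
    for start in range(len(query) - k + 1):
        if query[start:start + k] in ref_kmers:
            # Every position inside a shared window counts as homologous.
            mask[start:start + k] = [True] * k
    return mask


# Toy sequences with k=4 so the effect is visible.
print(homology_mask("AAAACCCC", "TTAAAAGG", k=4))
# [True, True, True, True, False, False, False, False]
```

For the real data, `query` would be the rat mtDNA (NC_001665.2) and `reference` the rCRS (NC_012920.1); a hash set keeps the lookup linear in genome length.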

Unsupervised clustering with principal component analysis and t-distributed stochastic neighbor embedding

To confirm that the amount of human mtDNA found in our samples is sufficient for meaningful scientific conclusions to be drawn, we first tested whether the samples could be separated by unsupervised clustering algorithms according to their origin. Previous efforts32 have shown that principal component analysis (PCA) on the human mitochondrial genome can efficiently distinguish individuals based on their mitochondrial haplogroups.

We performed PCA on the 44 samples with an average coverage of 10 or higher as a general exploration of sample features (see Methods for details). As human mitochondrial DNA haplogroups are distributed non-uniformly across geographical areas25, it is reasonable to expect that traces of human mitochondrial DNA found in sewage samples would also differ between samples collected on different continents. To test this hypothesis, we projected the data from each sample onto the subspace spanned by the first two principal components (percentage of variance explained: 99.87% and 0.03%, respectively) (Fig. 2). Samples originating from different continents are distinguished by different colors. As the figure shows, sewage samples from both Africa and Asia are remarkably well separated from the rest of the samples. Samples from Europe and America tend to mix somewhat, in line with our intuition and with previous evidence32,33 that these continents have highly diverse populations due to migration. However, outliers can also be observed for both Europe and North America. This is most likely because our PCA characterizes each sample by a single consensus sequence, even though sewage samples contain mtDNA sequences from a mixture of a large population. Using only the most frequent base at each genomic position (a common practice for consensus sequence generation) artificially creates mixed mtDNA sequences fused together from different mtDNA haplotypes. An additional difference from the method described by Biffi et al.32 is that, while they used only 64 tagging SNPs, our analysis incorporated the whole mitochondrial genome into the PCA.
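A toy example makes the consensus problem concrete. The sketch below assumes, purely for illustration, that a sample is a weighted mixture of three short, made-up haplotypes; a per-position majority vote then returns a sequence whose first half comes from one haplotype and whose second half comes from the other two, so it matches none of the contributors. The function name is hypothetical.

```python
from collections import defaultdict


def weighted_majority_consensus(haplotypes, weights):
    """Pick, at every position, the base with the largest total weight."""
    consensus = []
    for pos in range(len(haplotypes[0])):
        support = defaultdict(float)
        for seq, weight in zip(haplotypes, weights):
            support[seq[pos]] += weight
        consensus.append(max(support, key=support.get))
    return "".join(consensus)


# Hypothetical 4-bp "haplotypes" mixed at 45%, 30% and 25%.
haps = ["GGAA", "AAGG", "TTGG"]
cons = weighted_majority_consensus(haps, [0.45, 0.30, 0.25])
print(cons, cons in haps)  # GGGG False
```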

Principal component analysis of the 44 samples with an average coverage of 10 or higher. Samples originating from different continents are marked with dots of different colors. PCA was carried out on the whole mitochondrial genome, using the most frequent base at each genomic position with one-hot encoding. (The analysed matrix for the 44 samples had a shape of (44, 4·16,569).)


As a basic check of the validity of these results, we also performed t-distributed Stochastic Neighbor Embedding (t-SNE)34 on the samples, a technique that uses a different algorithmic concept for dimensionality reduction. We were able to reproduce the clusters described above with this method as well (results not shown).

Phylogenetic analysis

A slightly different approach, fairly commonly used in human mtDNA analysis pipelines35,36, is the construction of phylogenetic trees, which aims to uncover evolutionary relationships among samples. Given that the human mitochondrial phylogenetic tree has been extensively studied29,35,36, and that human mtDNA haplogroups are defined as its major branch points, the inhomogeneous distribution of haplogroups among geographical regions suggests that samples collected from the same area should form their own clades on a phylogenetic tree.

To explore the evolutionary relationships between samples, we constructed a phylogenetic tree of the 44 samples with an average coverage of 10 or higher, based on their consensus sequences (Supplementary Fig. 3). This method suffers from the same limitation as PCA, because each sample is represented by a single consensus sequence, which makes the results somewhat unreliable. Different tree construction methods yielded slightly different trees; nevertheless, the main conclusions remained the same. Samples originating from Africa formed a fairly distinct clade, while samples from Europe and America were somewhat blended together. However, the robustness of the trees (see Methods for details) was very low, as expected given how the consensus sequences were generated.

Human mtDNA haplogroup composition of samples

The above analyses showed that the human mtDNA content of our samples was sufficient to recover the expected tendencies, but the results were strongly biased by the pooled nature of the samples. To overcome the problem of representing a sample by a single consensus sequence, we analysed the samples separately and decomposed the aligned reads into differently weighted contributions of mtDNA haplotypes with an expectation maximization approach37 (see Methods for details). The results are plotted on a map as pie charts (Fig. 3). Color codes were chosen to match those in Fig. 2 of Rishishwar et al.25 to allow easy visual comparison. Our results agree well with the published results of Rishishwar et al.25, indicating that the mtDNA haplogroup composition of a given area can be accurately determined from trace human mtDNA detected in sewage samples.

Mitochondrial DNA haplogroup composition of the 44 sewage samples with an average coverage of 10 or higher. (a) Mitochondrial DNA haplogroup composition of samples plotted at the site of the wastewater collection. Circle colors and colors of the pie charts correspond to specific haplogroups, while colors of the underscores indicate the four broad biogeographic ancestry categories. (b) Mitochondrial DNA haplogroup composition of samples using only the four broad biogeographic ancestry categories.


Comparison with available data

For a more direct comparison with published human mtDNA haplogroup compositions of different cities, we plotted the mtDNA haplogroup pie charts of the sewage samples alongside available results of various studies in Supplementary Figs 4–7. (The full list of data sources used can be found at the end of the Supplementary Information.) In Supplementary Fig. 7, the mtDNA haplogroup distributions of cities in the United States of America (US) were reduced to the proportions of mtDNA haplogroups belonging to the four broad biogeographic ancestry categories indicated by the underscores in Fig. 3. Results derived from US census data can be compared directly with the inner pie charts of the sewage samples.

In general, the mtDNA haplogroup composition of the urban sewage samples agrees surprisingly well with results obtained previously by careful sampling of specific populations in other studies. As anticipated, some differences do occur, given the very different natures of the sample collection. Studies focusing on the evolutionary history of a particular population tend to single out very specific groups of individuals, whereas collecting wastewater at a given location yields a mixed sampling of all kinds of human mitochondrial DNA. As with any statistical result, the number of samples used by other studies and the relatively low coverage of the human mtDNA in the sewage samples can also contribute to the observed differences. A further source of uncertainty is that the precision of the software37 used for mtDNA haplogroup decomposition decreases as the number of mitochondrial haplotypes to be recovered from the mixture increases. For three haplotypes, even the trace contributor (present at 5%) is correctly detected in most cases, and only sometimes is it mistakenly identified as a closely related haplotype. Because our analysis focused on mitochondrial haplogroups rather than specific haplotypes, errors in the expectation maximization process are partly compensated. However, the possibility of increased ambiguity in complex mixtures should not be overlooked. It should also be noted that using only the broad continental ancestry groups for US cities is admittedly a compromise, necessitated by the lack of mtDNA haplogroup distribution data specific to these cities. Although this classification is widely used in both the scientific literature38 and commercial ancestry testing39, many studies in recent years40,41,42,43,44 have shown that continental-ancestry proportions often vary greatly among individuals sharing the same mtDNA haplogroup. Nevertheless, the basic trends of the mtDNA haplogroup distributions are consistently recovered even from the trace amounts of human mtDNA found in our samples.


To obtain a general idea of the accuracy and reproducibility of our results, samples from four different wastewater treatment plants in the same city (El Paso) are plotted side by side in Supplementary Fig. 7, and samples from two different sites in Zambia (Kitwe and Lusaka) in Supplementary Fig. 6. All of these samples were treated as unrelated throughout the analysis pipeline. On visual inspection of the pie charts, the results are strikingly similar for all samples collected at the same site or at geographically close sites. Since mtDNA haplogroups H and V are very close to each other on the phylogenetic tree of human mitochondrial haplogroups29, the El Paso pie charts become even more alike when these two haplogroups are considered together.
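The comparison above was made by eye; one way to put a number on it would be a distance between compositions, computed before and after merging closely related haplogroups. The sketch below uses total variation distance, a choice made here for illustration rather than a method from the study; the function names and the two compositions are invented, not values from the study.

```python
def merge_haplogroups(composition, to_merge, merged_name):
    """Sum the fractions of the haplogroups in `to_merge` under one label."""
    merged = {}
    for haplogroup, fraction in composition.items():
        key = merged_name if haplogroup in to_merge else haplogroup
        merged[key] = merged.get(key, 0.0) + fraction
    return merged


def total_variation(p, q):
    """Half the L1 distance between two compositions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Made-up compositions for two plants in one city, not values from the study.
plant_a = {"H": 0.3, "V": 0.1, "L3": 0.6}
plant_b = {"H": 0.1, "V": 0.3, "L3": 0.6}
print(round(total_variation(plant_a, plant_b), 3))  # 0.2
print(round(total_variation(merge_haplogroups(plant_a, {"H", "V"}, "HV"),
                            merge_haplogroups(plant_b, {"H", "V"}, "HV")), 3))  # 0.0
```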

This suggests that reconstructing local human mtDNA haplogroup frequencies from sewage samples with the proposed pipeline produces results that are not only in line with previously published data but also highly robust and reproducible.


Many previous studies of human populations have focused on identifying the original native populations or on ancient evolution and migration, thus excluding as far as possible subpopulations such as minorities, temporary foreign workers, immigrants or tourists, who in some places may outnumber the residents. Data describing the actual population composition are, however, important for studying demographic tendencies, health and related socioeconomic trends45, and would be a valuable asset for various institutions and organizations, allowing greater efficiency in the provision of services and support and in improving preventive interventions11. On the other hand, ethnic monitoring is a politically sensitive issue that usually evokes resistance, and many feel that collecting such data might itself be discriminatory. The handling of such sensitive data by governments and other organizations raises further questions, ranging from fears of racial discrimination against ethnic groups to data security. Collecting detailed information on the genetics of a population may provide additional benefits for public health policy makers, but such attempts may face even stronger resistance.

In this study, we provide evidence that short-read sequencing of urban sewage can robustly determine the local composition of human populations with respect to their mtDNA haplogroups in a sewage catchment area, in reasonable agreement with the available data. Although the main focus of our analysis was the identification of mtDNA haplogroups, the same method, with minor modifications to the sampling process, might also make it feasible to recover genotype distributions. This opens up a great opportunity for future studies of the ethnic and genetic composition of populations, given that these analyses are non-invasive, require no informed consent, do not suffer from the limitations of self-reporting and, by their nature, provide a well-mixed sampling of the local population.

Many studies indicate that different mitochondrial DNA haplogroups are variously associated with medical conditions and genetic diseases. These include coronary artery disease, diabetic retinopathy, early-onset Alzheimer’s disease, frontotemporal lobar degeneration, AIDS progression, breast, prostate and renal cancer and many more46,47,48,49,50. Many of these established associations are already applied in clinical practice as either biomarkers or aids for patient stratification51. These findings suggest that a comprehensive investigation of the mtDNA haplogroup composition of populations at different geographical locations could serve as a helpful guide for disease control by allowing for region-specific prevention strategies and increasing awareness of medical conditions more likely to occur in the local society. Our results demonstrate that the sequencing of urban sewage followed by a subsequent analysis using our proposed pipeline could not only make such a project feasible, but also produce reliable and accurate results.

Population-level collection of complete human genome sequences could be an even richer source for genetics-related health monitoring. As the DNA purification method used for our samples was optimized for the isolation of bacterial DNA52, our results suggest that further optimization to target human DNA, and the resulting increase in sequencing depth, would provide an even more detailed view of population genetics. Many genetic diseases are associated with specific single nucleotide variants or insertions/deletions that have been shown to occur non-uniformly across populations53,54,55. Thus, by analysing the frequencies of these mutations in sewage samples, purposeful steps could be taken to ensure locally effective screening and prevention. Mapping the genetic landscape of different populations with city-scale or even higher resolution would also help to resolve ancestry-related questions and aid forensic efforts. Similarly, population genetic studies are commonly based on genetic data from a large number of individuals without any explicit need for personal identification, so retrieving whole genome sequences from urban sewage would be ideal for this purpose.
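Estimating the frequency of a variant in a pooled sample could, in the simplest case, start from read counts at the variant site. The sketch below is a hypothetical illustration of that principle, not part of the study's pipeline: it treats every read as an independent draw from the pooled DNA (ignoring sequencing errors and unequal contributions of individuals) and reports a Wilson score interval; the function name and the counts are invented.

```python
import math


def pooled_allele_frequency(alt_reads, depth, z=1.96):
    """Point estimate and Wilson score interval for an allele frequency in a pool.

    Treats every read as an independent draw from the pooled DNA, which ignores
    sequencing errors and unequal contributions of individuals (assumptions).
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    p = alt_reads / depth
    denom = 1 + z ** 2 / depth
    centre = (p + z ** 2 / (2 * depth)) / denom
    half_width = z * math.sqrt(p * (1 - p) / depth + z ** 2 / (4 * depth ** 2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical site: 25 of 100 reads carry the alternative allele.
estimate, low, high = pooled_allele_frequency(25, 100)
print(f"{estimate:.2f} ({low:.3f}-{high:.3f})")
```

The interval width shrinks roughly with the square root of the depth, which is one way to see why higher human sequencing depth matters for such analyses.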

Emerging technologies such as blockchains56 promise complete anonymity even for genomic data, but it may take decades for such systems to earn the trust of the general population. Thus, in contrast to individual-level whole genome analysis, sequencing population-level DNA mixtures from sewage may provide a viable path. Even though it has been shown14 that the presence or absence of a single individual can be established even at trace levels in a pooled mixture of DNA, this requires the complete genome of the person of interest to be available before the analysis. Pooled community sequencing therefore does not add privacy risk for individuals whose genome is otherwise unknown. The fact that sequencing sewage samples requires no active participation from the community makes the technique even more appealing.

Our results also highlight the future possibility of monitoring demographic effects in the population (such as global migration or the segregation of local communities) over time, as wastewater collection requires neither lengthy preparation nor high investment and can therefore be repeated as required.

It should be emphasized that the data analysed for this study were collected to study the global distribution and abundance of antimicrobial resistance genes, not human genetics. Potential future studies of human populations based on sewage should therefore take into account the specific features of the sewage catchment area, among others its exact geographic location within the individual cities and countries, to ensure that wastewater collection achieves a representative sampling of the local population. Our results do, however, show the potential of analysing urban sewage not only for antimicrobial resistance and infectious disease agents but also for human populations, in one and the same analysis.


Sample acquisition

Urban sewage was collected globally from 79 sample locations in 74 cities and 60 countries, covering seven geographical regions17. DNA was extracted from the sewage pellets according to an optimized protocol using the QIAamp Fast DNA Stool Mini Kit with twice the input material and initial bead beating57. DNA from all samples was mechanically sheared to a target fragment size of 300 bp by ultrasonication (Covaris E220evolution). Libraries were prepared with the NEXTflex PCR-free Library Preparation Kit (Bioo Scientific) using the Bioo NEXTflex-96 adapter set (Bioo Scientific). In batches of roughly 60 samples, the libraries were multiplexed and sequenced on the HiSeq 3000 platform (Illumina) with 2 × 150-bp paired-end sequencing per flow cell, yielding a mean of 120 million reads (range: 8 to 398 million) per sample.

Identification of human reads

Short reads were aligned with the BWA-MEM58 algorithm, using default settings, to the human mitochondrial revised Cambridge Reference Sequence59 (NCBI ID: NC_012920.1). To lower the risk of misinterpreting the results, and to verify that the reads mapped to the human mtDNA were indeed derived from humans, we also aligned the reads to the mitochondrial sequences of several vertebrate species (Bos taurus, Sus scrofa, Danio rerio, Canis lupus familiaris, Gallus gallus and Ovis aries; data not shown), including the Norway rat (Rattus norvegicus; NCBI ID: NC_001665.2) (see Fig. 1b).

Alignments were checked for PCR duplicates with samtools60; none were detected, so duplicate removal was unnecessary. Sewage samples collected at different times at the same treatment plant were pooled during analysis to obtain higher coverage. Samples from different treatment plants in the same general area were not pooled; where several of them reached the required coverage, this allowed the comparison of samples collected in the same city (El Paso) or near each other (Kitwe and Lusaka).

Only the 44 samples with an average coverage of at least 10 on the human mitochondrion were considered for further analysis.
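Pooling samples from the same plant and applying the coverage threshold can be sketched as follows. This assumes per-position depths have been loaded into lists (for instance from `samtools depth -a`, which also reports zero-coverage positions); the sample-to-plant mapping and all function names are hypothetical, and whether the authors counted zero-depth positions in the mean is not stated, so the sketch simply divides by the full mtDNA length.

```python
from collections import defaultdict

MT_LENGTH = 16569        # length of the rCRS, NC_012920.1
MIN_MEAN_COVERAGE = 10   # threshold used in the text


def pool_depths(depths_by_sample, plant_of):
    """Sum per-position depths of all samples taken at the same treatment plant."""
    pooled = defaultdict(lambda: [0] * MT_LENGTH)
    for sample, depth in depths_by_sample.items():
        if len(depth) != MT_LENGTH:
            raise ValueError(f"{sample}: expected {MT_LENGTH} positions")
        total = pooled[plant_of[sample]]
        for i, d in enumerate(depth):
            total[i] += d
    return dict(pooled)


def mean_coverage(depth):
    """Mean depth over the whole mtDNA, counting zero-depth positions."""
    return sum(depth) / MT_LENGTH


def passing_plants(pooled):
    """Plants whose pooled mean coverage reaches the threshold, sorted by name."""
    return sorted(p for p, depth in pooled.items() if mean_coverage(depth) >= MIN_MEAN_COVERAGE)


# Toy depths: two samplings of plant "A" and one of plant "B".
depths = {"a1": [6] * MT_LENGTH, "a2": [5] * MT_LENGTH, "b1": [9] * MT_LENGTH}
plants = {"a1": "A", "a2": "A", "b1": "B"}
print(passing_plants(pool_depths(depths, plants)))  # ['A']
```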

Principal component analysis

After short-read alignment, principal component analysis (PCA) was carried out on the most frequent base (supported by the majority of the aligned reads) at each genomic position of the mtDNA, obtained with the samtools mpileup command60. The data for the 44 analysed samples were encoded in a matrix of shape (44, 4·16,569) using one-hot encoding: for each sample and genomic position, the most frequent base was assigned a value of 1 and the other three bases a value of 0. PCA was performed with the scikit-learn61 Python package.
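The encoding and projection described above can be sketched with NumPy. scikit-learn's `PCA` centres the data and uses a singular value decomposition, so this dependency-light sketch should give the same projection up to the sign of each component. It assumes the consensus sequences are already available as equal-length strings (the mpileup step is not shown); the function names and toy sequences are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def one_hot_consensus(consensus_seqs):
    """Encode equal-length consensus sequences as an (n, 4 * length) 0/1 matrix."""
    n, length = len(consensus_seqs), len(consensus_seqs[0])
    X = np.zeros((n, 4 * length))
    for i, seq in enumerate(consensus_seqs):
        for pos, base in enumerate(seq.upper()):
            j = BASES.find(base)
            if j >= 0:  # N or other symbols leave all four columns at 0
                X[i, 4 * pos + j] = 1.0
    return X


def pca_scores(X, n_components=2):
    """Project centred data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    total = np.sum(S ** 2)
    if total == 0:
        raise ValueError("all samples are identical")
    scores = U[:, :n_components] * S[:n_components]
    explained = S[:n_components] ** 2 / total  # fraction of variance per component
    return scores, explained


X = one_hot_consensus(["ACGT", "ACGA", "TCGA"])
scores, explained = pca_scores(X)
print(X.shape, scores.shape)  # (3, 16) (3, 2)
```

For the study's data, the matrix would have 44 rows and 4 × 16,569 = 66,276 columns.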

t-distributed stochastic neighbor embedding

Using the one-hot encoded matrix described above, t-distributed Stochastic Neighbor Embedding (t-SNE)34 was also run on the data with the scikit-learn61 Python module. As suggested by the scikit-learn documentation, the dimensionality was first reduced by keeping the top 50 principal components of the original 4·16,569-dimensional space, and t-SNE was performed in a subsequent step.
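A two-step PCA and t-SNE pipeline of this kind might look as follows with scikit-learn. This is a sketch, not the authors' script: the random 0/1 matrix stands in for the one-hot data, and the function name and default parameters are assumptions. Note that PCA cannot return more components than there are samples, so with 44 samples the sketch caps the "top 50" at 44.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def pca_then_tsne(X, n_pca=50, perplexity=30.0, random_state=0):
    """Reduce X with PCA, then embed the reduced data in 2-D with t-SNE."""
    n_samples, n_features = X.shape
    # PCA cannot return more components than min(n_samples, n_features).
    n_components = min(n_pca, n_samples, n_features)
    reduced = PCA(n_components=n_components).fit_transform(X)
    # scikit-learn requires the perplexity to be smaller than the number of samples.
    tsne = TSNE(n_components=2, perplexity=min(perplexity, n_samples - 1),
                init="pca", random_state=random_state)
    return tsne.fit_transform(reduced)


# Random 0/1 matrix standing in for the one-hot encoded consensus sequences.
rng = np.random.default_rng(0)
X = (rng.random((20, 120)) > 0.5).astype(float)
print(pca_then_tsne(X).shape)  # (20, 2)
```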

Phylogenetic tree construction

To obtain a general idea of how the samples might relate to each other, we performed a simple phylogenetic analysis. First, consensus sequences were generated for each sample with bcftools and vcfutils. These are more refined than the method described above of simply choosing the most common base at each genomic position, in that when multiple bases are found at a given position, an ambiguous value such as “pyrimidine” or “purine” can also be assigned to the site. These sequences were then aligned with the ClustalW multiple sequence alignment algorithm as made available through Biopython62. Phylogenetic trees were constructed with two different methods (neighbor joining63 and maximum parsimony64) using the Phylo module of Biopython. The robustness of the trees was assessed, for each clade separately, as the proportion of 1000 bootstrap trees that agreed with the topology of the original tree.
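Two ingredients of this step can be sketched in plain Python: an ambiguity-aware consensus using IUPAC nucleotide codes (R for purine, Y for pyrimidine and so on), and per-clade bootstrap support. This is not a reimplementation of the bcftools/vcfutils behaviour; the `min_fraction` threshold and all function names are hypothetical. The bootstrap trees (1000 in the text) are assumed to be given as sets of clades; Biopython's `Bio.Phylo.Consensus` module offers helpers such as `bootstrap_trees` and `get_support` for the full workflow.

```python
# IUPAC nucleotide codes for every non-empty subset of {A, C, G, T}.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def ambiguity_consensus(base_counts, min_fraction=0.2):
    """Build a consensus from per-position base counts using IUPAC ambiguity codes.

    Every base supported by at least `min_fraction` of the reads at a position is
    kept; `min_fraction` is a hypothetical threshold, not a value from the study.
    """
    out = []
    for counts in base_counts:
        total = sum(counts.values())
        if total == 0:
            out.append("N")  # no reads: unknown base
            continue
        kept = frozenset(b for b, c in counts.items() if c / total >= min_fraction)
        out.append(IUPAC.get(kept, "N"))
    return "".join(out)


def clade_support(reference_clades, bootstrap_trees):
    """Fraction of bootstrap trees that contain each clade of the reference tree.

    Clades are frozensets of leaf names; each bootstrap tree is a set of such clades.
    """
    n = len(bootstrap_trees)
    return {clade: sum(clade in tree for tree in bootstrap_trees) / n
            for clade in reference_clades}


print(ambiguity_consensus([{"A": 10}, {"C": 5, "T": 5}, {"A": 6, "G": 4}, {"A": 19, "G": 1}]))
# AYRA
```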

Reconstruction of contributions from different mtDNA haplogroups

Given that the human mtDNA sequences present in sewage samples are likely to be diverse mixtures from the general population living in a specific area, it is of great interest to decompose the available reads into differently weighted contributions of various mtDNA haplogroups. To achieve this, we used a computational tool called mixemt37. The algorithm uses the database provided by PhyloTree.org (Phylotree Build 17), which describes the defining mutations of over 5000 mtDNA haplotypes. After the short reads of the investigated samples have been aligned to the reference sequence of the human mitochondrion, the pipeline assigns to each pair of read and mtDNA haplotype a value describing how consistent the variants in the read are with that haplotype, while accounting for sequencing errors. It initializes the haplotype proportions in the sample randomly by drawing from a Dirichlet distribution. It then applies an expectation maximization approach: the expectation step calculates, for each read, the conditional probability that it originated from each mtDNA haplotype, given the current mixture proportions, and the maximization step updates the mixture proportions to the values that maximize the expected likelihood given these probabilities. These steps are iterated until convergence. Once convergence is reached, the mtDNA haplotypes present in the sample are re-evaluated with additional filtering steps, and the iteration is repeated on the contributing mtDNA haplotypes only, yielding the final mtDNA haplotype composition of the sample. We used default settings with the -V option, as suggested for low-coverage samples. Identified mtDNA haplotypes were grouped into mtDNA haplogroups for easier comparison with available data.
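The core EM loop can be sketched with NumPy. This is a generic mixture-proportion EM written under stated assumptions, not mixemt's code: the read-by-haplotype likelihood matrix (which mixemt derives from PhyloTree variants and a sequencing error model) is taken as given, and mixemt's filtering and re-run steps are omitted. The function name and the toy likelihoods are hypothetical.

```python
import numpy as np


def em_mixture_proportions(read_lik, max_iter=1000, tol=1e-10, seed=0):
    """Estimate haplotype mixture proportions by expectation maximization.

    read_lik[r, h] is the probability of observing read r if it came from
    haplotype h; every read must have a positive likelihood for some haplotype.
    """
    read_lik = np.asarray(read_lik, dtype=float)
    if np.any(read_lik.sum(axis=1) <= 0):
        raise ValueError("each read needs a positive likelihood for some haplotype")
    rng = np.random.default_rng(seed)
    props = rng.dirichlet(np.ones(read_lik.shape[1]))  # random starting proportions
    for _ in range(max_iter):
        # E-step: posterior probability that each read originated from each haplotype.
        weighted = read_lik * props
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: proportions that maximize the expected complete-data log-likelihood.
        new_props = resp.mean(axis=0)
        converged = np.abs(new_props - props).max() < tol
        props = new_props
        if converged:
            break
    return props


# Toy likelihoods: 70 reads favour haplotype 0 and 30 favour haplotype 1.
lik = np.array([[0.99, 0.01]] * 70 + [[0.01, 0.99]] * 30)
# The maximum-likelihood answer for this toy input is about [0.704, 0.296].
print(np.round(em_mixture_proportions(lik), 3))
```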

Comparison with available mtDNA haplogroup composition data

Relevant information about the mtDNA haplogroup distribution of the investigated cities was gathered from the literature. We aimed to compare our data with city- or region-specific results of different studies whenever possible; in the absence of such information, country-level distributions were used as reference.

Given the lack of specific mtDNA haplogroup distribution data for US cities, we collected available census data on the ethnic composition of these cities and used the results of Just et al.65 to convert these proportions into a distribution of mtDNA haplogroups across four broad biogeographic ancestry categories. These categories are indicated by the colors of the underscores for each haplogroup in Fig. 3. Since Just et al.65 contained no specific information about the mtDNA haplogroup composition of the Asian population in the US, we made the crude assumption that individuals with self-reported Asian ethnicity belong exclusively to the Asian ancestry group. Although this simplification might skew the results, the effect is presumed to be negligible given the relatively low proportion of Asian residents in US cities. For a more direct comparison in the case of US cities, the mtDNA haplogroup composition obtained from the sewage data was also converted to proportions of the four main ancestry categories (Supplementary Fig. 7).
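The census conversion amounts to weighting each self-reported group's ancestry profile by that group's share of the city population. The sketch below assumes the conversion table is a mapping from census group to the fraction of that group's mtDNA haplogroups in each ancestry category; the group labels, category labels and all numbers are invented for illustration and are not taken from Just et al. or from the census.

```python
def ancestry_from_census(census_fractions, ancestry_given_group):
    """Convert census group fractions into ancestry-category fractions.

    census_fractions: {self-reported group: fraction of the city population}
    ancestry_given_group: {group: {ancestry category: fraction of that group's
    mtDNA haplogroups falling into the category}}; each inner dict sums to 1.
    """
    result = {}
    for group, fraction in census_fractions.items():
        for category, share in ancestry_given_group[group].items():
            result[category] = result.get(category, 0.0) + fraction * share
    return result


# Invented numbers and labels for illustration only.
census = {"White": 0.5, "Black": 0.3, "Asian": 0.2}
conversion = {
    "White": {"European": 0.8, "Native American": 0.1, "African": 0.1},
    "Black": {"African": 0.9, "European": 0.1},
    "Asian": {"Asian": 1.0},  # the simplifying assumption described in the text
}
print({k: round(v, 2) for k, v in ancestry_from_census(census, conversion).items()})
```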

Materials & correspondence

Correspondence and material requests should be addressed to István Csabai.

Data Availability

Sequencing data analysed by this study were collected and prepared by Hendriksen et al.17 and can be found on the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/) under study accession number ERP109094.


  1. 1.

    Esplin, E. D., Oei, L. & Snyder, M. P. Personalized sequencing and the future of medicine: discovery, diagnosis and defeat of disease. Pharmacogenomics15, 1771–1790 (2014).

    CASArticle Google Scholar

  2. 2.

    Falfán-Valencia, R. et al. An Increased Frequency in HLA Class I Alleles and Haplotypes Suggests Genetic Susceptibility to Influenza A (H1N1) 2009 Pandemic: A Case-Control Study. J. Immunol. Res.2018, 1–12 (2018).

    Article Google Scholar

  3. 3.

    Kenney, A. D. et al. Human Genetic Determinants of Viral Diseases. Annu. Rev. Genet.51, 241–263 (2017).

    CASArticle Google Scholar

  4. 4.

    Bustamante, M. et al. A genome-wide association meta-analysis of diarrhoeal disease in young children identifies FUT2 locus and provides plausible biological pathways. Hum. Mol. Genet.25, 4127–4142 (2016).

    CASArticle Google Scholar

  5. 5.

    Gabai-Kapara, E. et al. Population-based screening for breast and ovarian cancer risk due to BRCA1 and BRCA2. Proc. Natl. Acad. Sci. USA111, 14205–10 (2014).

    CASArticleADS Google Scholar

  6. 6.

    Perkins, B. A. et al. Precision medicine screening using whole-genome sequencing and advanced imaging to identify disease risk in adults. Proc. Natl. Acad. Sci.115, 3686–3691 (2018).

    CASArticle Google Scholar

  7. 7.

    Niemiec, E. & Howard, H. C. Ethical issues in consumer genome sequencing: Use of consumers’ samples and data. Appl. Transl. genomics8, 23–30 (2016).

    Article Google Scholar

  8. 8.

    Alzu’bi, A., Zhou, L. & Watzlaf, V. Personal genomic information management and personalized medicine: challenges, current solutions, and roles of HIM professionals. Perspect. Heal. Inf. Manag.11, 1c (2014).

    Google Scholar

  9. 9.

    Brothers, K. B. & Rothstein, M. A. Ethical, legal and social implications of incorporating personalized medicine into healthcare. Per. Med.12, 43–51 (2015).

    CASArticle Google Scholar

  10. 10.

    Principles and Recommendations for Population and Housing Censuses. Department of Economic and Social Affairs, Statistics Division (2017).

  11. 11.

    Gill, P. S. & Johnson, M. Ethnic monitoring and equity. Bmj310, 890 (1995).

    CASArticle Google Scholar

  12. Liao, Y. et al. Surveillance of health status in minority communities - Racial and Ethnic Approaches to Community Health Across the U.S. (REACH U.S.) Risk Factor Survey, United States, 2009. MMWR Surveill. Summ. 60, 1–44 (2011).

  13. Farkas, L. Analysis and comparative review of equality data collection practices in the European Union Data: Data collection in the field of ethnicity. https://doi.org/10.2838/447194 (2017).

  14. Homer, N. et al. Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays. PLoS Genet. 4, e1000167 (2008).

  15. Research Ethics Committees of the Capital Region of Denmark, 29th January 2015, H-14013582 (www.regionh.dk).

  16. Research Ethics Committees of the Capital Region of Denmark, 18th November 2016, 16037921 (www.regionh.dk).

  17. Hendriksen, R. S. et al. Global monitoring of antimicrobial resistance based on metagenomics analyses of urban sewage. Nat. Commun. 10, 1124 (2019).

  18. Martellini, A., Payment, P. & Villemur, R. Use of eukaryotic mitochondrial DNA to differentiate human, bovine, porcine and ovine sources in fecally contaminated surface water. Water Res. 39, 541–548 (2005).

  19. Luo, S. et al. Biparental Inheritance of Mitochondrial DNA in Humans. Proc. Natl. Acad. Sci. USA 115, 13039–13044 (2018).

  20. Hagström, E., Freyer, C., Battersby, B. J., Stewart, J. B. & Larsson, N.-G. No recombination of mtDNA after heteroplasmy for 50 generations in the mouse maternal germline. Nucleic Acids Res. 42, 1111–1116 (2014).

  21. Torroni, A. et al. Classification of European mtDNAs From an Analysis of Three European Populations. Genetics 144, 1835–1850 (1996).

  22. Comas, D. et al. Admixture, migrations, and dispersals in Central Asia: evidence from maternal DNA lineages. Eur. J. Hum. Genet. 12, 495–504 (2004).

  23. Chen, Y.-S. et al. Analysis of mtDNA Variation in African Populations Reveals the Most Ancient of All Human Continent-Specific Haplogroups. Am. J. Hum. Genet. 57, 133–149 (1995).

  24. Cann, R. L., Stoneking, M. & Wilson, A. C. Mitochondrial DNA and human evolution. Nature 325, 31–36 (1987).

  25. Rishishwar, L. & Jordan, I. K. Implications of human evolution and admixture for mitochondrial replacement therapy. BMC Genomics 18, 140 (2017).

  26. Underhill, P. A. & Kivisild, T. Use of Y Chromosome and Mitochondrial DNA Population Structure in Tracing Human Migrations. Annu. Rev. Genet. 41, 539–564 (2007).

  27. Cavalli-Sforza, L. L. & Feldman, M. W. The application of molecular genetic approaches to the study of human evolution. Nat. Genet. 33, 266–275 (2003).

  28. Torroni, A. et al. Asian affinities and continental radiation of the four founding Native American mtDNAs. Am. J. Hum. Genet. 53, 563–590 (1993).

  29. van Oven, M. & Kayser, M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, 386–394 (2009).

  30. Bolnick, D. A. et al. The Science and Business of Genetic Ancestry Testing. Science 318, 399–400 (2007).

  31. Ekblom, R., Smeds, L. & Ellegren, H. Patterns of sequencing coverage bias revealed by ultra-deep sequencing of vertebrate mitochondria. BMC Genomics 15, 467 (2014).

  32. Biffi, A. et al. Principal-Component Analysis for Assessment of Population Stratification in Mitochondrial Medical Genetics. Am. J. Hum. Genet. 86, 904–917 (2010).

  33. Simoni, L., Calafell, F., Pettener, D., Bertranpetit, J. & Barbujani, G. Geographic patterns of mtDNA diversity in Europe. Am. J. Hum. Genet. 66, 262–278 (2000).

  34. van der Maaten, L. & Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 9 (2008).

  35. Ingman, M., Kaessmann, H., Pääbo, S. & Gyllensten, U. Mitochondrial genome variation and the origin of modern humans. Nature 408, 708–713 (2000).

  36. Maca-Meyer, N., González, A. M., Larruga, J. M., Flores, C. & Cabrera, V. M. Major genomic mitochondrial lineages delineate early human expansions. BMC Genet. 2, 13 (2001).

  37. Vohr, S. H. et al. A phylogenetic approach for haplotype analysis of sequence data from complex mitochondrial mixtures. Forensic Sci. Int. Genet. 30, 93–105 (2017).

  38. Bamshad, M., Wooding, S., Salisbury, B. A. & Stephens, J. C. Deconstructing the relationship between genetics and race. Nat. Rev. Genet. 5, 598–609 (2004).

  39. Royal, C. D. et al. Inferring Genetic Ancestry: Opportunities, Challenges, and Implications. Am. J. Hum. Genet. 86, 661 (2010).

  40. Emery, L. S., Magnaye, K. M., Bigham, A. W., Akey, J. M. & Bamshad, M. J. Estimates of Continental Ancestry Vary Widely among Individuals with the Same mtDNA Haplogroup. Am. J. Hum. Genet. 96, 183–193 (2015).

  41. Watkins, W. et al. Genetic analysis of ancestry, admixture and selection in Bolivian and Totonac populations of the New World. BMC Genet. 13, 39 (2012).

  42. Cardena, M. M. S. G. et al. Assessment of the Relationship between Self-Declared Ethnicity, Mitochondrial Haplogroups and Genomic Ancestry in Brazilian Individuals. PLoS One 8, e62005 (2013).

  43. Poetsch, M. et al. Determination of population origin: A comparison of autosomal SNPs, Y-chromosomal and mtDNA haplogroups using a Malagasy population as example. Eur. J. Hum. Genet. 21, 1423–1428 (2013).

  44. Salas, A. et al. The mtDNA ancestry of admixed Colombian populations. Am. J. Hum. Biol. 20, 584–591 (2008).

  45. Gravel, S. et al. Demographic history and rare allele sharing among human populations. Proc. Natl. Acad. Sci. USA 108, 11983–11988 (2011).

  46. Kofler, B. et al. Mitochondrial DNA haplogroup T is associated with coronary artery disease and diabetic retinopathy: a case control study. BMC Med. Genet. 10, 35 (2009).

  47. Krüger, J., Hinttala, R., Majamaa, K. & Remes, A. M. Mitochondrial DNA haplogroups in early-onset Alzheimer’s disease and frontotemporal lobar degeneration. Mol. Neurodegener. 5, 8 (2010).

  48. Hendrickson, S. L. et al. Mitochondrial DNA haplogroups influence AIDS progression. AIDS 22, 2429–2439 (2008).

  49. Darvishi, K. et al. G10398A polymorphism imparts maternal Haplogroup N a risk for breast and esophageal cancer. Cancer Lett. 249, 249–255 (2007).

  50. Booker, L. M. et al. North American White Mitochondrial Haplogroups in Prostate and Renal Cancer. J. Urol. 175, 468–473 (2006).

  51. Urzúa-Traslaviña, C. G. et al. Relationship of Mitochondrial DNA Haplogroups with Complex Diseases. J. Genet. Genome Res. 1, 1–5 (2014).

  52. Knudsen, J. D., Hägglöf, C., Weber, N. & Carlquist, M. Increased availability of NADH in metabolically engineered baker’s yeast improves transaminase-oxidoreductase coupled asymmetric whole-cell bioconversion. Microb. Cell Fact. 15 (2016).

  53. van Beek, E. J. A. H. et al. Rates of TP53 Mutation are Significantly Elevated in African American Patients with Gastric Cancer. Ann. Surg. Oncol. 25, 2027–2033 (2018).

  54. Bollig-Fischer, A. et al. Racial Diversity of Actionable Mutations in Non–Small Cell Lung Cancer. J. Thorac. Oncol. 10, 250–255 (2015).

  55. Kurian, A. W. BRCA1 and BRCA2 mutations across race and ethnicity: distribution and clinical implications. Curr. Opin. Obstet. Gynecol. 22, 72–78 (2010).

  56. Ozercan, H. I., Ileri, A. M., Ayday, E. & Alkan, C. Realizing the potential of blockchain technologies in genomics. Genome Res. 28, 1255–1263 (2018).

  57. Knudsen, B. E. et al. Impact of Sample Type and DNA Isolation Procedure on Genomic Inference of Microbiome Composition. mSystems 1 (2016).

  58. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  59. Andrews, R. M. et al. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 23, 147 (1999).

  60. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  61. Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  62. Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).

  63. Saitou, N. & Nei, M. The Neighbor-Joining Method: A New Method for Reconstructing Phylogenetic Trees. Mol. Biol. Evol. 4, 406–425 (1987).

  64. Langley, C. H. & Fitch, W. M. An examination of the constancy of the rate of molecular evolution. J. Mol. Evol. 3, 161–177 (1974).

  65. Just, R. S. et al. Full mtGenome reference data: Development and characterization of 588 forensic-quality haplotypes representing three U.S. populations. Forensic Sci. Int. Genet. 14, 141–155 (2014).



The authors are grateful to the collaborators of the Global Sewage Surveillance project consortium for providing samples and to G. Vattay and S. Laki for help with data processing. This study has received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement No. 643476 (COMPARE), the World Health Organization and The Novo Nordisk Foundation (NNF16OC0021856: Global Surveillance of Antimicrobial Resistance).

Author information


  1. Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Pázmány P. s. 1A, Budapest, 1117, Hungary

    Orsolya Anna Pipek, Anna Medgyes-Horváth, László Dobos, József Stéger, Dávid Visontai & István Csabai

  2. Department of Information Systems, ELTE Eötvös Loránd University, Pázmány P. s. 1C, Budapest, 1117, Hungary

    János Szalai-Gindl

  3. Department of Computational Sciences, Wigner Research Centre for Physics of the HAS, Konkoly-Thege Miklós út 29–33., Budapest, 1121, Hungary

    László Dobos, József Stéger, János Szalai-Gindl, Dávid Visontai & István Csabai

  4. National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark

    Rolf S. Kaas, Rene S. Hendriksen & Frank M. Aarestrup

  5. Viroscience Department, Erasmus Medical Center, Rotterdam, The Netherlands

    Marion Koopmans


O.A.P. and A.M-H. contributed to the analysis and interpretation of the data and to the writing of the manuscript. L.D., J.S., J.S-G., D.V. and R.S.K. contributed to the acquisition of the data. M.K., R.S.H. and F.M.A. contributed to the acquisition of the data and the revision of the manuscript. I.C. contributed to the conception and coordination of the study and to the writing of the manuscript. All authors read and approved the final version of the manuscript.

Corresponding author

Correspondence to István Csabai.

Ethics declarations

Competing Interests

Source: https://www.nature.com/articles/s41598-019-48093-5
