Kudos to you for investing the time to better understand scientific methods and Kurd demography. I highly encourage others from South & West Asia to do the same. The more the merrier!
Learning to use DNA analysis tools properly involves a steep learning curve, as I found out around 10 years ago when I got involved in this, and in spite of investing thousands of hours using and developing new bioinformatics methods, I’m still learning.
A quick note about qpAdm, which I discussed a few years ago on my website: it’s a great tool, but it needs to be properly understood.
1- The null hypothesis in qpAdm is that the proposed mixture model fits the data. A p-value of 0.05 is usually used as the significance threshold. The p-value is the probability of seeing a misfit at least as large as the observed one if the mixture model were true.
Thus the higher the p-value, the less evidence there is against the mixture model. That’s why we favor models with higher p-values. P<0.05 is used to reject the model.
2- Years ago we found that the type of genotyping pipeline and the QC applied affect p-values, for example when mixing Simons samples with 1000G, HGDP and 23andMe data, or when using joint genotyping such as is often done in GATK. We have also found that Plink’s flipping of minor and major alleles causes issues when Plink-processed samples are merged with VCF data obtained directly from genotyping pipelines. SNP IDs also shouldn’t be taken for granted; we have had to correct thousands of them in some datasets.
That’s why we have invested a lot of money in powerful computers to do our own genotyping from FASTA files and to review SNP IDs in all samples. We also always address ascertainment bias, which can be a killer in DNA analysis.
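For readers who want to see what the allele-flip and SNP-ID checks involve, here is a minimal sketch in Python. It is not Plink’s or Dilawer’s pipeline: it assumes each dataset is a hypothetical dict keyed by (chromosome, position), holding an SNP ID, a counted allele, the other allele, and per-sample dosages of the counted allele (0, 1 or 2, no missing calls). Sites are matched on position rather than on SNP ID, ID disagreements are reported, a swapped counted allele is re-coded as 2 - dosage, opposite-strand reports are matched via the complement, and palindromic A/T and C/G sites are dropped because their strand can’t be resolved from the alleles alone.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data layout (an assumption for this sketch, not a Plink format):
# (chromosome, position) -> ((snp_id, counted_allele, other_allele), [dosage per sample])
Key = Tuple[str, int]
Variant = Tuple[str, str, str]
Dataset = Dict[Key, Tuple[Variant, List[int]]]

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def harmonize(ref: Variant, other: Variant) -> Optional[str]:
    """Return "same", "swap" (dosage d becomes 2 - d), or None to drop the site."""
    _, r1, r2 = ref
    _, o1, o2 = other
    if {r1, r2} in ({"A", "T"}, {"C", "G"}):
        return None  # palindromic: strand cannot be inferred from alleles alone
    if (o1, o2) == (r1, r2):
        return "same"
    if (o1, o2) == (r2, r1):
        return "swap"  # counted allele reversed, e.g. after a minor/major re-coding
    c1, c2 = COMPLEMENT.get(o1), COMPLEMENT.get(o2)
    if (c1, c2) == (r1, r2):
        return "same"  # reported on the opposite strand
    if (c1, c2) == (r2, r1):
        return "swap"  # opposite strand and reversed counted allele
    return None  # incompatible alleles


def merge(ref: Dataset, other: Dataset) -> Tuple[Dict[Key, List[int]], List[Tuple[Key, str, str]]]:
    """Merge `other` into `ref`, matching on position and aligning to ref's counted allele."""
    merged: Dict[Key, List[int]] = {}
    id_mismatches: List[Tuple[Key, str, str]] = []
    for key in sorted(ref.keys() & other.keys()):
        (rvar, rdos), (ovar, odos) = ref[key], other[key]
        if rvar[0] != ovar[0]:
            id_mismatches.append((key, rvar[0], ovar[0]))  # same site, different SNP ID
        action = harmonize(rvar, ovar)
        if action is None:
            continue
        aligned = odos if action == "same" else [2 - d for d in odos]
        merged[key] = rdos + aligned
    return merged, id_mismatches


ref = {("1", 1000): (("rs1", "A", "G"), [0, 1]),
       ("1", 2000): (("rs2", "C", "T"), [2, 2]),
       ("1", 3000): (("rs3", "A", "T"), [1, 0])}
other = {("1", 1000): (("rs1", "G", "A"), [2]),   # counted allele swapped
         ("1", 2000): (("rs2b", "G", "A"), [0]),  # opposite strand, different ID
         ("1", 3000): (("rs3", "A", "T"), [1])}   # palindromic, dropped
merged, mismatches = merge(ref, other)
print(merged)      # {('1', 1000): [0, 1, 0], ('1', 2000): [2, 2, 0]}
print(mismatches)  # [(('1', 2000), 'rs2', 'rs2b')]
```

Dropping palindromic sites is the conservative choice here; a pipeline that knows strand or allele frequencies from a reference panel can sometimes keep them.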
In science, the more evidence you have in support of your argument, the better supported it is. That’s why we go the extra mile in studies such as
“Post Iron Age Introduction of Y-DNA R1a R-Z94 and East Asian Ancestry into Kurdistan, North Iran, and Turkey with the Parthians and Scythians” at www.EurasianDNA.com, where we point out:
1- R1a-Z94 became part of the demography of Kurds and other ethnic groups in western Iran less than 2500 years ago, based on its absence from the hundreds of samples published in the “Southern Arc” study.
2- We corroborate Parthians, Scythians and Turkic peoples as the vector for the introduction of R1a-Z94 by showing, with 20 or so right references in qpWave and qpAdm, that R1a-Z94-rich populations in West Asia, to the exclusion of Armenians and SW Iranians, are shifted along the Siberian, East Asian and Central Asian Iranian axis. We use both contemporary samples and higher-quality ancient ones to show this.
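The absence argument in point 1 can be made quantitative. Zero carriers in n samples doesn’t prove a frequency of zero, but it bounds it: solving (1 - f)^n = α for f gives the largest carrier frequency still consistent with seeing none at confidence 1 - α, which is roughly 3/n at 95% (the "rule of three"). A minimal sketch, assuming independent, unrelated samples (for Y-DNA, n counts males only); the sample sizes below are illustrative, not counts from the Southern Arc study:

```python
def zero_count_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Largest carrier frequency f with P(0 carriers in n samples) = (1 - f)**n >= alpha.

    Assumes n independent, unrelated samples (for Y-DNA, n counts males only).
    """
    if n <= 0:
        raise ValueError("need at least one sample")
    return 1.0 - alpha ** (1.0 / n)


# Illustrative sample sizes only: a small site-level sample vs. "hundreds" of samples.
for n in (10, 100, 300):
    exact = zero_count_upper_bound(n)
    print(f"n={n}: upper 95% bound {exact:.4f} (rule of three: {3 / n:.4f})")
```

With a handful of samples the bound stays high (about 26% for n = 10), while a few hundred samples push it down to around 1%, which is why the size of the sampled set matters for this kind of argument.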
@Nezih
Your model using Sintashta can be interpreted as “Sintashta-like non-R1a”, especially in light of the lower p-values and the lack of R1a in Hasanlu’s time. It’s clear that pure Sintashtans didn’t exist anywhere, especially near W Asia, when R1a was introduced to Kurdistan about 2000 years ago. That vector must have been Parthian, Scythian and Turkic.
We were also able to reject TKM-IA as BMAC+Sintashta in a study on our website a couple of years ago. Andronovo fared better, which makes sense given its wider distribution and proximity to BMAC.
You’ll find, when using better-quality samples, more SNPs and more right references, that Kurds model better as Iran-Chl/IA + Scythian/Sarmatian/Parthian than as Iran-Chl/IA + TKM-IA.
Best, Dilawer
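The idea behind these qpAdm models (a target fitted as a weighted mix of sources, using f4-statistics against a set of right populations) can be illustrated with a stripped-down two-source sketch. This is not the ADMIXTOOLS implementation: real qpAdm uses block-jackknife standard errors, a covariance-weighted fit and a rank test to produce its p-value, while this sketch uses plain least squares on synthetic allele frequencies generated so that the target is exactly 70% source 1 and 30% source 2 (all populations and numbers are made up). If target = w·S1 + (1 - w)·S2, then f4(T, S2; R0, Rj) = w·f4(S1, S2; R0, Rj) for every right population Rj, so w can be read off as a regression slope.

```python
import random
from typing import List, Sequence


def f4(a: Sequence[float], b: Sequence[float], c: Sequence[float], d: Sequence[float]) -> float:
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d), from allele frequencies."""
    return sum((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)) / len(a)


def two_way_weight(target, s1, s2, rights) -> float:
    """Plain least-squares estimate of w in target = w * s1 + (1 - w) * s2.

    Uses f4(T, S2; R0, Rj) = w * f4(S1, S2; R0, Rj) for each right pop Rj (j >= 1).
    """
    r0 = rights[0]
    xs = [f4(s1, s2, r0, rj) for rj in rights[1:]]
    ys = [f4(target, s2, r0, rj) for rj in rights[1:]]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def random_population(rng: random.Random, n_snps: int) -> List[float]:
    # Synthetic, independent allele frequencies; no real populations involved.
    return [rng.random() for _ in range(n_snps)]


rng = random.Random(1)
n_snps = 2000
s1 = random_population(rng, n_snps)
s2 = random_population(rng, n_snps)
rights = [random_population(rng, n_snps) for _ in range(6)]
target = [0.7 * a + 0.3 * b for a, b in zip(s1, s2)]  # constructed as a 70/30 mix

w = two_way_weight(target, s1, s2, rights)
print(f"estimated weight of source 1: {w:.3f}")
```

When a model is wrong, the per-right-population ratios stop agreeing with each other, and that residual misfit is what the qpAdm p-value measures; it is also why the choice and number of right pops matter so much.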
The model for Hasanlu_LBA with Sintashta is wrong. It does not hold up to scrutiny, and neither does it explain the R1b in the Hasanlu samples. I'm afraid you are using it to push agendas rather than arrive at the truth.
I have tried to replicate your model here, but it fails.
https://a-genetics.blogspot.com/2022/11/hasanlu-model-critique.html
I wouldn't consider a model with a p-value of 0.01 and 23 right pops as a failed one, sorry. I have no agenda to push as I have no horse in this Proto-Aryan homeland race; I just see Sintashta as the most reasonable option for now (although I tend to accept Southern Caucasus as the Early PIE homeland) and have shared the model that makes the most sense to me. I didn't share all possible models and I didn't claim that this is the only possible model. But it is a quite good model with passing p-values and all those who are familiar with qpAdm can decide for themselves whether a model with 20 or more diverse and relevant right pops with passing p-values is valid or not. I think you're the one who's pushing too hard here. Are you really suggesting that this is not one of the valid models? If you are, then I see no point in discussing with you, no hard feelings.
Right. Because 10 R1b samples at Hasanlu and 0 R1a, along with 0 Sintashta mtDNA markers, make it clear to you that the steppe source must be Sintashta rather than the obvious Yamnaya/Armenia connection, which is also borne out by archaeology.
Yes, I am really suggesting that Sintashta source is not a valid model as it does not pass. Your own model here cannot be replicated. The stakes are high, and the truth counts for something, if not everything.
Well, I think it should have been clear from my article that I don't deny the Caucasus route for the steppe ancestry (Catacomb/Poltavka part). I just don't think that it's the only source of it. I think my thoughts about R1a are also clear. So neither the presence of R1b nor the lack of R1a provides a valid argument against my position.
Because of the reason I have stated in my previous reply, unfortunately I see no point in discussing with you. Jaya Śrīman Nārāyaṇa.
Earlier you said "But it is a quite good model with passing p-values and all those who are familiar with qpAdm can decide for themselves whether a model with 20 or more diverse and relevant right pops with passing p-values is valid or not."
Well, your Sintashta model doesn't even pass with 10 right pops. See this output. https://drive.google.com/file/d/1472lJCecEaW43AC7UIpU3nqP15oOr9DL/view?usp=share_link
Whereas with the same 10 references, the Armenia_MBA + BMAC model passes with a p-value of 0.36.
https://drive.google.com/file/d/1trADg0CYnjjzGLdSIIltXYU2pCoJT3Ih/view?usp=share_link
I have appended these two results on my blog article as well. I just want more rigour. qpAdm is a statistical test and is supposed to provide us with objective results, not subjective 'feels'.
Jai HanumAn!
This is my last reply to you on this issue and it is actually not for you but for those who will read this conversation between us. Further comments on this subject will be deleted. I have nothing against you as a person but unfortunately have no time to deal with irrational obsessions.
1. More robust doesn't necessarily mean closer to truth in qpAdm. A model that is in accordance with other data we have can be considered better than a more robust one that isn't.
2. 0.05 is an arbitrary threshold; 0.01 is generally good enough, and Sintashta, which easily gives higher than 0.05 (in fact 0.2 even with 25 right pops, as in my model), is definitely one of the potential sources of the steppe ancestry in Hasanlu_LBA_A. You certainly did not replicate my model as it is, and by playing with right pops you could most probably get "non-passing" p-values (such as the 0.0497 in one of your proofs that my model doesn't pass) for the models you propose too.
3. Unlike you, I am not representing my qpAdm results as THE TRUTH, but only as the most logical explanation, in my view, in the light of the non-genetic data we have. I MIGHT BE WRONG IN THE END. BUT THE MODEL IS VALID STATISTICALLY AND REPRESENTS A QUITE SERIOUS POSSIBILITY. THE MOST INTELLECTUALLY PLAUSIBLE POSSIBILITY TO ME.
And that is all.
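Since much of this exchange turns on p-value thresholds, here is a sketch of where a qpAdm-style p-value comes from and how the verdict depends on the cut-off. It computes the upper-tail probability of a chi-square statistic from the regularized incomplete gamma function (standard series and continued-fraction forms), then applies 0.05 and 0.01 to the p-values quoted in this thread (0.36, 0.2, 0.0497, 0.01). The degrees-of-freedom helper assumes the usual rank test for a model with k sources and R right populations, df = R - k; treat that as an assumption, not a statement of ADMIXTOOLS internals. A p-value above the threshold means the model is not rejected, not that it is proven.

```python
import math


def _lower_gamma_series(a: float, x: float) -> float:
    # Series for the regularized lower incomplete gamma P(a, x); used when x < a + 1.
    term = total = 1.0 / a
    denom = a
    for _ in range(10_000):
        denom += 1.0
        term *= x / denom
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_cf(a: float, x: float) -> float:
    # Continued fraction (modified Lentz) for the regularized upper incomplete gamma Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_pvalue(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    return 1.0 - _lower_gamma_series(a, x) if x < a + 1.0 else _upper_gamma_cf(a, x)


def qpadm_dof(n_sources: int, n_right: int) -> int:
    # ASSUMPTION: df of the rank test for a k-source model with R right pops is R - k.
    return n_right - n_sources


def verdict(p: float, threshold: float) -> str:
    # Reject only when p falls below the threshold, matching "P<0.05 is used to reject".
    return "not rejected" if p >= threshold else "rejected"


# A two-source model with 10 right pops; a statistic of 15.507 sits close to p = 0.05.
print(round(chi2_pvalue(15.507, qpadm_dof(2, 10)), 3))

for p in (0.36, 0.2, 0.0497, 0.01):  # p-values quoted in this thread
    print(f"p={p}: {verdict(p, 0.05)} at 0.05, {verdict(p, 0.01)} at 0.01")
```

Only the 0.0497 and 0.01 results change verdict between the two cut-offs, which is the crux of the disagreement: the statistic is objective, but the threshold is a convention.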
Nezih,
1) Have you tried modeling Hasanlu and DinkhaTepe_B as a two-way mix of Seh Gabi C & Hajji Firuz BA? It makes more sense than using Hajji Firuz N/EC, as it's an early source for Seh Gabi C.
2) DinkhaTepe_A can be modeled as Ebla + Seh Gabi C. Don't you think it fits better, considering it's even more western than Sirnak BA despite their similarity?
First of all, thank you for your suggestions.
1) I think you mean Hasanlu_MBA? I haven't, actually; it is a reasonable suggestion. As for DinkhaTepe_B, I think they are both too early. In this article I have tried to model targets with more proximal sources; this is why Hasanlu_LBA_A and DinkhaTepe_A seemed better.
2) Again, I have tried to use more proximal sources; hence I would prefer Hasanlu_MBA over SehGabi_C in the case of DinkhaTepe_A. I think this is reflecting a more realistic scenario, although not necessarily so.
Sorry to ask, but why is there such a huge inconsistency in the models? I see Kushan_North being capable of going from 27% steppe up to 45% steppe?
And Kurd models can go from 14% to 21% steppe?
The reason is likely the differential relationship within deeper ancestral sources. What we call "steppe ancestry" is actually a mixture of some deeper ancestral populations and if a model can supply some of it via another source then the steppe percentage might decrease. For the Kushan_North example I think the key is Tarim_EMBA1 and also some West Eurasian admixture in Khövsgöl_BA which have been used to model EasternIranic_IA.
Most probably it also has something to do with East Asian admixture in EasternIranic_IA.
Ah, ok. Which of the models are more "reliable" or "accurate", then? For example, for the Kushan samples, is 45% steppe more likely, or 25-27%?
Do Kurds score 14-15% steppe? Or 21% on average?
I think the most realistic scenario is "Yaz_II > Kushan_West/Parthian > Kurd."
So Yaz_II is the most accurate model, then come the Kushans, and then the Parthians?
Do Kurds score around 14-17% steppe on these models, and in qpAdm in general? The models sort of suggest that, if the Parthian/Kushan samples score around 20-27% steppe.
Not Parthians after Kushans, though. Kushan_West and Parthian/Khwarezmian samples are genetically quite similar, so both are plausible.
If the Yaz_II models for them are reflecting a more realistic scenario then Kushan_West samples should have around 30% steppe (EBA+MLBA) ancestry. Don't forget that DinkhaTepe_BIA_B also has some. So it will always depend on the model, but somewhere around 20% steppe ancestry for Kurds seems more realistic to me. You might want to see the table in my previous post.
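Part of the spread in steppe percentages comes from bookkeeping: when a proximal source such as DinkhaTepe_BIA_B or a Parthian-like population already carries some steppe ancestry, a model can show a small or even zero direct "steppe" weight while the target's effective steppe share is larger. A minimal sketch that expands nested models recursively, assuming a hypothetical, acyclic dict of models; the population names and weights are placeholders, not fitted values from this post:

```python
from typing import Dict

# HYPOTHETICAL models: population -> {source: weight}. Names and weights are
# placeholders for illustration, not fitted qpAdm results. Assumed acyclic.
MODELS: Dict[str, Dict[str, float]] = {
    "target": {"proximal_a": 0.6, "proximal_b": 0.4},
    "proximal_a": {"steppe": 0.05, "local": 0.95},  # e.g. a DinkhaTepe_BIA_B-like source
    "proximal_b": {"steppe": 0.3, "local": 0.7},    # e.g. a Parthian-like source
}


def steppe_share(pop: str) -> float:
    """Effective steppe fraction, expanding every modeled source recursively."""
    if pop == "steppe":
        return 1.0
    if pop not in MODELS:
        return 0.0  # unmodeled leaf sources are assumed to carry no steppe ancestry
    return sum(weight * steppe_share(src) for src, weight in MODELS[pop].items())


def direct_steppe(pop: str) -> float:
    """Steppe weight as it appears in the model's own output, without expansion."""
    return MODELS.get(pop, {}).get("steppe", 0.0)


print(f"direct: {direct_steppe('target'):.2f}, effective: {steppe_share('target'):.2f}")
```

Two models of the same target can therefore report quite different "steppe" figures yet be consistent once each source's own steppe share is folded in; comparing expanded totals is fairer than comparing the raw steppe column.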
With the model involving Kushan_North, though, the Kurd sample scores 15% steppe (with 60% Aramaean). That one is less reliable than the other Kurd models, right?
So all the models with Safavids and Kurds scoring below 20% steppe (15-16%) are wrong or less reliable, right?
There is a sample, 'Adana 23133', mainly representing the Kurds of Turkey in the qpAdm dataset; I wonder what it would yield if used.
I wasn't aware of it, thanks for letting me know!