In my previous blog, I started to phase 5 siblings based on their raw DNA data and the raw DNA data from their mom. I looked at homozygous results. That is, when a sibling had two copies of the same allele, it meant that they got one copy of that allele from each parent. Also, when my Mom had homozygous results, say GG, she had to have given one of those G’s to each of her children at that location.
I am using a 2010 paper on phasing by Athey. I looked at his first 2 principles in my previous blog. Here is Principle 3:
Principle 3: A final phasing principle is almost trivial, but it is normally not useful because there is usually no way to satisfy its conditions: If a child is heterozygous at a particular SNP, and if it is possible to determine which parent contributed one of the bases, then the other parent necessarily contributed the other (or alternate) base.
Heterozygous is a fancy term meaning two different alleles. This principle also lends itself to MS Access, but it requires a few more steps. In my case, the known contributor is my Mom. So where my allele 1 is different from my allele 2 and I have an allele from Mom, my allele from Dad will be my other allele. I just have to make a formula out of that. It sounds like a high school math word problem.
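The word problem can be written out as a small function. This is only a sketch in Python rather than Access, and the function name and arguments (`allele_from_dad`, `allele1`, `allele2`, `from_mom`) are made up for illustration; they are not the field names in my tables.

```python
from typing import Optional


def allele_from_dad(allele1: str, allele2: str, from_mom: Optional[str]) -> Optional[str]:
    """Apply Principle 3 to one SNP for one child.

    Returns the allele that must have come from Dad, or None when
    the principle does not apply at this SNP.
    """
    if from_mom is None:
        return None  # Mom's contribution is not known here
    if allele1 == allele2:
        return None  # homozygous: covered by the earlier principles
    if from_mom == allele1:
        return allele2  # Mom gave allele 1, so Dad gave allele 2
    if from_mom == allele2:
        return allele1  # Mom gave allele 2, so Dad gave allele 1
    return None  # Mom's allele matches neither call; leave unphased


print(allele_from_dad("A", "G", "A"))  # G
print(allele_from_dad("A", "G", "G"))  # A
```

The two middle branches correspond to the two separate query runs per sibling: one for when the allele from Mom is allele 1, and one for when it is allele 2.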
First, I copy my homozygous ‘allele from Mom’ table to a new table. This is in case I make a mistake and have to go back to my previous table. I’ll call my new table ‘tbl5SiblsHeteroMomtoDad’. Again, I’ll use an Update Query to update the table with the new ‘from Dad’ alleles. There shouldn’t already be an allele from Dad in any of these situations, as we have only put those in where the children were homozygous.
I used the Access Expression Builder to get my heterozygous results:
Here is the second part of the criteria:
This part says that where I’m heterozygous and my allele from Mom was allele1, put allele2 in as from Dad. Before I run this, I have 485,834 alleles from Dad. When I go to run the Update Query, I get this message:
After I run the Update Query, I now have 533,517 alleles from Dad. This is the same as 485,834 plus the 47,683 rows updated, so I assume that I am on the right track. I next have to run this one more time for myself for the case where my allele from Mom is allele2 and my allele from Dad would then be allele1. Then I will run both versions for each of my four siblings, which is eight more runs.
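For readers without Access, the two update passes and the before-and-after count check can be sketched with Python's built-in sqlite3 module. The table and column names here (`sibs`, `allele1`, `allele2`, `from_mom`, `from_dad`) and the four sample rows are invented for illustration, and the SQL is SQLite's dialect, not what the Access Expression Builder produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sibs (rsid TEXT PRIMARY KEY, allele1 TEXT, allele2 TEXT, "
    "from_mom TEXT, from_dad TEXT)"
)
conn.executemany(
    "INSERT INTO sibs VALUES (?, ?, ?, ?, ?)",
    [
        ("rs1", "A", "A", "A", "A"),     # homozygous, already phased
        ("rs2", "A", "G", "A", None),    # heterozygous, Mom gave allele1
        ("rs3", "C", "T", "T", None),    # heterozygous, Mom gave allele2
        ("rs4", "C", "T", None, None),   # heterozygous, Mom's allele unknown
    ],
)


def count_from_dad(connection):
    """Number of rows that have an allele from Dad filled in."""
    sql = "SELECT COUNT(*) FROM sibs WHERE from_dad IS NOT NULL"
    return connection.execute(sql).fetchone()[0]


before = count_from_dad(conn)

# Pass 1: Mom gave allele1, so Dad gave allele2.
pass1 = conn.execute(
    "UPDATE sibs SET from_dad = allele2 "
    "WHERE allele1 <> allele2 AND from_mom = allele1 AND from_dad IS NULL"
).rowcount

# Pass 2: Mom gave allele2, so Dad gave allele1.
pass2 = conn.execute(
    "UPDATE sibs SET from_dad = allele1 "
    "WHERE allele1 <> allele2 AND from_mom = allele2 AND from_dad IS NULL"
).rowcount

after = count_from_dad(conn)

# Same sanity check as in the text: new total = old total + rows updated.
assert after == before + pass1 + pass2
```

The `from_dad IS NULL` condition is a guard so that a pass never overwrites an allele that the homozygous step already filled in.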
5 Phased Sibs Update: V1 and V2
I did all my Principle 3 phasing and here is the update:
What is a little surprising is that Jon and Lori, who were tested on AncestryDNA V2, had more Mom-phased alleles. I did mention above that they were getting extra phasing from their mom on SNPs that they hadn’t tested, but I didn’t realize how much.
I mentioned in my previous blog that merging the V1 and V2 SNP sets gives a combined total of 942,269 SNPs tested.
Also, some of the specifics are a bit off. For example, my numbers include phased results for myself from my dad (16,536) on the X chromosome. Well, as a male, I didn’t get an X from my dad. This means that the JoelfromDad and JonfromDad numbers above are a bit high.
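One way to correct those counts would be to skip X chromosome rows when counting from-Dad alleles for the brothers. The sketch below is hypothetical: the function name, the `(chromosome, from_dad)` row layout, and the assumption that X is labelled either "X" or "23" in the merged data are all mine, so the labels would need checking against the real table.

```python
def count_from_dad(rows, is_male, x_labels=("X", "23")):
    """Count SNPs with a from-Dad allele, skipping X for sons.

    rows is an iterable of (chromosome, from_dad) pairs, where from_dad
    is None when no allele from Dad has been assigned.
    """
    count = 0
    for chromosome, from_dad in rows:
        if from_dad is None:
            continue  # nothing phased from Dad at this SNP
        if is_male and str(chromosome) in x_labels:
            continue  # a son gets a Y, not an X, from his father
        count += 1
    return count


sample = [("1", "A"), ("2", None), ("X", "G"), ("23", "T")]
print(count_from_dad(sample, is_male=True))   # 1
print(count_from_dad(sample, is_male=False))  # 3
```

The same rows give a lower count for a son than for a daughter, which is the direction of the correction needed for the JoelfromDad and JonfromDad numbers.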
Next up: DNA patterns