In my last blog, I was still playing catch-up in moving from my original three-sibling phasing to incorporating my brother’s new DNA results.
Missing Principle 2 for Jon
Here is Principle 2 from the Whit Athey Phasing Paper I’ve been using:
principle 2 fix
For this, I will again use the update query.
In this case, I didn’t bother writing ‘Is Null’ under the JonFromMom column. That is because even if there is something in there, I would just as soon overwrite it, as this is such a basic principle. I only missed 481,000 rows.
second part of fix
Now that I have mom’s bases, I will go back and fill in Jon’s dad bases from his mom bases. These are also Principle 2 fill-ins where Jon is heterozygous. I don’t mind doing these updates in Access, as they are so easy.
This says that where Jon is heterozygous and his mom base is allele1, Jon’s allele2 goes in as his allele from Dad. In other words, if Jon got allele1 from his mom, allele2 has to have come from his dad.
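For anyone who would rather see the rule than the Access screenshot, here is a minimal Python sketch of the same Principle 2 logic. It is not the Access query itself. The row fields `allele1`, `allele2`, `from_mom` and `from_dad` are hypothetical names loosely modeled on columns like JonFromMom, and this sketch only fills empty dad cells.

```python
def principle2_from_dad(allele1, allele2, from_mom):
    """Principle 2 sketch: at a heterozygous SNP, the allele that did not
    come from Mom must have come from Dad. Returns None if nothing follows."""
    if allele1 == allele2:
        return None  # homozygous: not a heterozygous (Principle 2) case
    if from_mom == allele1:
        return allele2
    if from_mom == allele2:
        return allele1
    return None  # Mom's base unknown, or it matches neither allele (check the data)


def fill_dad_bases(rows):
    """Fill empty from_dad cells in place; return how many rows changed."""
    updated = 0
    for row in rows:
        if row["from_dad"] is not None:
            continue  # this sketch only fills empty cells
        dad = principle2_from_dad(row["allele1"], row["allele2"], row["from_mom"])
        if dad is not None:
            row["from_dad"] = dad
            updated += 1
    return updated


rows = [
    {"allele1": "A", "allele2": "G", "from_mom": "A", "from_dad": None},
    {"allele1": "C", "allele2": "T", "from_mom": "T", "from_dad": None},
    {"allele1": "G", "allele2": "G", "from_mom": "G", "from_dad": None},
]
print(fill_dad_bases(rows))  # 2
```

The homozygous GG row is left alone on purpose: Principle 2 only applies where the child is heterozygous.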
So that is an easy way to update over 7,000 rows in a few minutes.
Next, On to Mom Patterns
It’s a good thing that I added these mom bases to Jon, because now it is time to look at mom patterns. From Athey:
In the next step, we use the pattern on the mother’s side to fill in as many more cells as possible. Finally, we can project the information in those newly filled cells back to the father’s side using Principle 3 again.
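One way to picture that step: inside a region where the Mom pattern is known, siblings with the same letter got the same maternal chromosome, so a maternal base known for one of them can be copied to the others. The sketch below shows only that idea; the function name and the four-sibling list layout are my assumptions, and the projection back to the father’s side is not shown.

```python
def fill_within_pattern(mom_bases, pattern):
    """Copy maternal bases between siblings that share a letter in `pattern`.

    mom_bases: four maternal bases in the same sibling order as the pattern,
    with None for unknown. Siblings with the same letter got the same maternal
    chromosome in this region, so they must have the same maternal base.
    """
    filled = list(mom_bases)
    for letter in set(pattern):
        group = [i for i, p in enumerate(pattern) if p == letter]
        known = {filled[i] for i in group if filled[i] is not None}
        if len(known) == 1:  # exactly one known base: safe to copy
            base = known.pop()
            for i in group:
                filled[i] = base
        # zero known bases: nothing to copy; two or more: a conflict to check by hand
    return filled


print(fill_within_pattern(["T", None, None, "C"], "AAAB"))  # ['T', 'T', 'T', 'C']
```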
AAAB Mom Pattern
AAAB Quality Control
My spreadsheet counts the number of IDs between patterns.
619 IDs is close to the cutoff that I had set, so I went back to the original spreadsheet and found other AAAB patterns between the Stop and the next Start. That means I can combine those two AAAB Chromosome 15 patterns. I also checked another pattern with about 700 IDs from the Stop to the next AAAB Start. However, there was another pattern in between, so those were a valid Stop and Start. There were about 45 AAAB Mom Patterns, or about 2 per chromosome, which seems like a lot.
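The Stop-to-next-Start check can be sketched in Python as well. This assumes each run is stored as a hypothetical `Run(pattern, start, stop)` of IDs on one chromosome. The numbers in the example are made up to echo the 619 and 700 gaps described above.

```python
from typing import NamedTuple


class Run(NamedTuple):
    pattern: str  # e.g. "AAAB"
    start: int    # first ID in the run
    stop: int     # last ID in the run


def id_gaps(runs):
    """IDs from each Stop to the next Start of the same pattern."""
    by_pattern = {}
    for run in sorted(runs, key=lambda r: r.start):
        by_pattern.setdefault(run.pattern, []).append(run)
    return {p: [b.start - a.stop for a, b in zip(rs, rs[1:])]
            for p, rs in by_pattern.items()}


def merge_runs(runs):
    """Merge same-pattern runs that have no other pattern starting between them."""
    merged = []
    for run in sorted(runs, key=lambda r: r.start):
        if merged and merged[-1].pattern == run.pattern:
            prev = merged[-1]
            merged[-1] = Run(prev.pattern, prev.start, max(prev.stop, run.stop))
        else:
            merged.append(run)
    return merged


runs = [Run("AAAB", 100, 900), Run("AAAB", 1519, 3000),
        Run("ABAA", 3100, 3500), Run("AAAB", 3700, 5000)]
print(id_gaps(runs))  # {'AAAB': [619, 700], 'ABAA': []}
print(merge_runs(runs))
```

The 619 gap merges because nothing else sits between the two AAAB runs; the 700 gap stays split because an ABAA run sits inside it. The gap size only tells me which places to look at by hand.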
ABAA Mom Pattern
The query should be similar to the previous one. If Sharon isn’t the same as her siblings, we will have an ABAA Pattern.
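Each of these Mom Pattern queries amounts to labelling the four maternal bases. Here is a small Python sketch of that labelling, assuming the columns are in the order me, Sharon, Heidi, Jon (my reading of the pattern descriptions in this post). The first sibling’s base is always called A.

```python
from itertools import product


def mom_pattern(bases):
    """Label four siblings' maternal bases as a pattern like "ABAA".

    The first sibling's base is always called A. Returns None if a base is
    missing or more than two different bases appear (a data problem).
    """
    if any(b is None for b in bases) or len(set(bases)) > 2:
        return None
    first = bases[0]
    return "".join("A" if b == first else "B" for b in bases)


# Column order assumed: me, Sharon, Heidi, Jon
print(mom_pattern(("C", "T", "C", "C")))  # ABAA: Sharon differs
print(mom_pattern(("G", "A", "A", "A")))  # ABBB: I differ

# Every possible labelling with the first sibling fixed as A
ALL_PATTERNS = sorted({mom_pattern(("A",) + rest) for rest in product("AG", repeat=3)})
print(ALL_PATTERNS)
```

Enumerating the labellings this way gives eight: AAAA plus seven mixed patterns (AAAB, AABA, AABB, ABAA, ABAB, ABBA, ABBB). ABBA, where Sharon and Heidi match each other but not Jon and me, is the one that does not get its own query in this post.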
This pattern was easier to figure out. There were about 35 of them.
AABA Mom Pattern
This is the one I should have done second if I had wanted to stay in alphabetical order. I checked a few with gaps of about 500 IDs between a Stop and the next Start, but they looked OK. There were also a few single-allele patterns.
AABB Mom Pattern
I have 3 criteria for this one:
I had to enter that Sharon’s allele from mom could not be the same as Heidi’s allele from mom, or I would get a lot of AAAA patterns. When I looked for these, there appeared to be 19 AABB maternal patterns.
ABAB Mom Pattern
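The screenshot of the three criteria is not reproduced here, so the sketch below assumes they are: my mom base equals Sharon’s, Heidi’s equals Jon’s, and Sharon’s differs from Heidi’s. The `require_different` flag is a hypothetical switch that shows why the third criterion matters.

```python
def is_aabb(me, sharon, heidi, jon, require_different=True):
    """Assumed form of the three AABB criteria on maternal bases:
    1) my base equals Sharon's, 2) Heidi's equals Jon's,
    3) Sharon's differs from Heidi's (drop it and AAAA rows slip through)."""
    if None in (me, sharon, heidi, jon):
        return False
    pairs_match = me == sharon and heidi == jon
    if not require_different:
        return pairs_match
    return pairs_match and sharon != heidi


rows = [("A", "A", "G", "G"), ("C", "C", "C", "C"), ("T", "T", "C", "C")]
print(sum(is_aabb(*r) for r in rows))                           # 2
print(sum(is_aabb(*r, require_different=False) for r in rows))  # 3
```

Without the third criterion, the CCCC row, which is really AAAA, gets counted as AABB.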
Again, this is a bit out of alphabetical order. This query is not unlike the previous one.
When I make Heidi’s mom base different from Sharon’s mom base, that gives me the ABAB pattern:
Here I have Excel on the left where I am entering the results from the Mom Patterns that I found in Access.
The jump in Chromosome 4 from position 6M to 37.9M indicates a change in pattern. That is entered in Excel on the left. The change from the previous pattern is shown as 7544 IDs. Each ID should be one SNP.
A change in Chromosome is an obvious Stop and Start:
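The way I read Stops and Starts off the query results can be sketched in Python. This assumes each pattern query returns hypothetical `(id, chrom, pos)` rows sorted by ID, and uses a made-up `max_gap` cutoff. A new run starts when the chromosome changes or the ID jump is too big. The example rows are invented except that the 7,544-ID jump echoes the Chromosome 4 case.

```python
def split_into_runs(hits, max_gap=500):
    """Group one pattern query's hits into Start/Stop runs.

    hits: (id, chrom, pos) tuples sorted by id. A new run starts when the
    chromosome changes or the ID jump exceeds max_gap (a hypothetical cutoff).
    """
    runs = []
    for row_id, chrom, pos in hits:
        last = runs[-1] if runs else None
        if last and last["chrom"] == chrom and row_id - last["stop_id"] <= max_gap:
            last["stop_id"], last["stop_pos"] = row_id, pos
            last["rows"] += 1
        else:
            runs.append({"chrom": chrom, "start_id": row_id, "start_pos": pos,
                         "stop_id": row_id, "stop_pos": pos, "rows": 1})
    return runs


hits = [(100, 4, 5_900_000), (150, 4, 6_000_000),
        (7694, 4, 37_900_000), (7700, 4, 38_000_000),
        (7710, 5, 100_000)]
for run in split_into_runs(hits):
    print(run["chrom"], run["start_id"], run["stop_id"], run["rows"])
```

A run with `rows` equal to 1 is what I have been calling a single pattern.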
There were about 30 ABAB mom patterns for my three siblings and me. I’ve done:
ABBB Mom Pattern
It looks like this must be the last Mom Pattern. This is the mom pattern where I show my individualism, unlike my siblings, who all have the same mom base:
Here’s an ABBB example:
In this case, on Chromosome 9, there is a jump from position 38M to 71M. However, the SNP (or ID) count between the two is only 190. That means this must be an area where very few SNPs are tested (probably the region around the Chromosome 9 centromere), so I would think that I could continue the Mom Pattern through that area. However, when I look at my Access table, I see this:
Above ID 370485 is a different pattern of AABB in the last four columns. This would have shown up when I merged all my patterns, and I would have had to fix it then. However, I might as well get this as right as I can now. As it is, there will be a discrepancy to work out:
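The reasoning here compares how many SNPs lie between two points with how far apart they are. A quick Python sketch of that comparison, using the rounded positions from the text; the 20-SNPs-per-Mb threshold is a hypothetical guess, not a rule from the phasing paper.

```python
def snps_per_mb(ids_between, bp_between):
    """SNP (ID) density across a gap, in SNPs per million base pairs."""
    return ids_between / (bp_between / 1_000_000)


def looks_untested(ids_between, bp_between, max_per_mb=20):
    """Flag a gap whose SNP density is far below normal (threshold is a guess)."""
    return snps_per_mb(ids_between, bp_between) < max_per_mb


# Chromosome 9: 38M to 71M with only 190 IDs between
print(round(snps_per_mb(190, 33_000_000), 1))  # 5.8
print(looks_untested(190, 33_000_000))         # True
# Chromosome 4: 6M to 37.9M with 7,544 IDs between
print(looks_untested(7544, 31_900_000))        # False
```

The Chromosome 4 jump works out to over 200 SNPs per Mb, so it is a real pattern change, while the Chromosome 9 jump is mostly empty territory.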
The AABB pattern started at 369193 which is before the ABBB Pattern stopped at 370295. This means I need to go back to the Table:
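This kind of discrepancy, one pattern starting before another has stopped, can be caught mechanically. Here is a minimal Python sketch, assuming runs are hypothetical `(pattern, start_id, stop_id)` tuples for one chromosome. Only 369193 and 370295 come from the text; the other IDs are made up.

```python
def find_overlaps(runs):
    """Return pairs of runs on one chromosome whose ID ranges overlap.

    runs: (pattern, start_id, stop_id) tuples.
    """
    ordered = sorted(runs, key=lambda r: r[1])
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b[1] > a[2]:
                break  # b and everything after it start past a's Stop
            overlaps.append((a, b))
    return overlaps


# 369193 and 370295 come from the text; the other IDs are made up
runs = [("ABBB", 365000, 370295), ("AABB", 369193, 375000), ("ABAA", 376000, 380000)]
print(find_overlaps(runs))
```

Each pair it returns is a spot to go back to the Access table and check by hand, as I did here.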
Here is ID# 370295, where I had the ABBB Pattern ending. However, this is a very small pattern, going only up to ID# 370290. Before that is the AABB pattern again. Here the AABB Pattern picks up again.
Here is how I corrected my Chromosome 9 Mom AABB Pattern:
However, note that I had to break my 500 ID/SNP rule. That 51 represents the tiny ABBB Pattern between two AABB Mom Patterns.
Here is the start of the AABB pattern at 369193:
First, note that it would actually start at 369192. Before that is a single ABBB pattern. Above that, in the first row, is an ABAA pattern. That first row is the end of an ABAA Pattern that I already recorded in my spreadsheet at ID# 369181, so that doesn’t need to change:
At 369190 there is a single pattern of ABBB. This will be noted in my spreadsheet, but not entered as a start/stop position.
Re-Sort the Mom Patterns by Pattern
Now I have 426 lines of Mom Pattern Locations. I need to sort them by pattern and hope there are not many weird issues like I found in Chromosome 9. I will also take out the single patterns. When I do this, I get quite a mess. Here is Chromosome 1:
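The re-sort itself is simple. Here is a Python sketch, assuming each spreadsheet line is a hypothetical `(pattern, chrom, start_id, stop_id, rows)` tuple with made-up values. It drops the single patterns and then sorts by pattern, chromosome and start ID.

```python
def resort_by_pattern(runs):
    """Drop single-row patterns, then sort by pattern, chromosome and start ID.

    runs: (pattern, chrom, start_id, stop_id, rows) tuples (hypothetical layout).
    """
    kept = [r for r in runs if r[4] > 1]
    return sorted(kept, key=lambda r: (r[0], r[1], r[2]))


# Made-up example lines
runs = [("ABBB", 9, 2000, 2600, 40), ("AABB", 1, 500, 500, 1),
        ("AABB", 9, 1000, 1900, 90), ("AAAB", 15, 100, 900, 60)]
for run in resort_by_pattern(runs):
    print(run)
```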
Here we have quite a few nested patterns.
The first AABB pattern is a single, so I can take that out, but what do I do with the AABB Stop? It looks like that was a single also, so I can take that out.
The AAGG (an AABB pattern) is between a CTTT and an AGG?, which would turn out to be an AGGG (both ABBB patterns). What I had previously described was a single pattern going to another single pattern within a valid non-single pattern.
Next, starting at ID# 6608, I have three Starts in a row, which cannot be good. Looking at the first two patterns, ABAA and AABB, they look like they could be good.
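Three Starts in a row can be spotted automatically: when runs do not overlap, Starts and Stops alternate, so any break in that rhythm marks nesting. Here is a minimal Python sketch, assuming hypothetical `(pattern, start_id, stop_id)` tuples for one chromosome. Only 6608 comes from the text; the other IDs and the third pattern are made up.

```python
def unbalanced_events(runs):
    """List Start/Stop events that break the Start, Stop, Start, Stop rhythm.

    runs: (pattern, start_id, stop_id) tuples for one chromosome. Runs that
    do not overlap produce strictly alternating events; nested or
    overlapping runs show up as two Starts (or two Stops) in a row.
    """
    events = []
    for pattern, start, stop in runs:
        events.append((start, 0, "Start", pattern))  # 0 sorts Starts before Stops at the same ID
        events.append((stop, 1, "Stop", pattern))
    events.sort()
    problems = []
    expected = "Start"
    for row_id, _, kind, pattern in events:
        if kind != expected:
            problems.append((row_id, kind, pattern))
        expected = "Stop" if kind == "Start" else "Start"
    return problems


# 6608 comes from the text; the other IDs and the third pattern are made up
nested = [("ABAA", 6608, 6650), ("AABB", 6620, 8000), ("ABBB", 6630, 9000)]
print(unbalanced_events(nested))
print(unbalanced_events([("ABAA", 100, 200), ("AABB", 300, 400)]))  # []
```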
I’ll add a ‘G’ where the cursor is above and call that the end to a very short ABAA Mom Pattern.
Here is the corrected ABAA stop. I highlighted the next ABAA Stop in yellow as that will need work.
Next I’ll look at the ABBB Start at 19885. It looks like I missed the previous AABB Stop at 19884.
Time for Quality Control
Just Like Starting Over
Based on the errors that I’ve found, I will start from scratch in Part 7.