DNA Phasing of 4 Siblings When One Parent Is Missing: Part 8

Dad Patterns

In my last blog post, I looked at Whit Athey’s 3 Principles and used MS Access to assign bases to the paternal or maternal side for the 4 tested siblings in my family. The next step is to look at Dad Patterns. I had been doing this by querying for a pattern and then scrolling down to find the start and stop positions. This was quite tedious. It occurred to me that there might be another way to do this.

MS Access Min Max Functions

Access has Min and Max functions that find the minimum or maximum value in a group. In this case, the group can be Chromosome.

AAAB Dad Pattern – Access to the rescue

 

aaabminmax

To get the Total row, I clicked the Totals (summation) icon in the Show/Hide group. This adds Group By to each field in the Total row. Here I looked for the minimum and maximum ID for each chromosome for the AAAB Dad Pattern. That is where Joel’s base from Dad was the same as Sharon’s, Sharon’s base from Dad was the same as Heidi’s, and Heidi’s base from Dad was different from Jon’s. Here is the output for the AAAB Dad Pattern:

aaabminmaxresults

This step has revolutionized my work, as it saves me from scrolling through hundreds of thousands of rows of AAAB Dad Patterns. It takes about 2 minutes vs. the old way, which seemed like an hour.
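For anyone who wants to see the Totals query logic outside Access, here is a minimal Python sketch of the same idea. It keeps only the rows with one Dad Pattern, groups them by chromosome, and takes the Min and Max ID. The (id, chromosome, dad_pattern) row layout and the sample rows are made up for illustration. They are not the actual table.

```python
def pattern_min_max(rows, pattern):
    """Return {chromosome: (min_id, max_id)} for rows labelled `pattern`.

    `rows` is an iterable of (id, chromosome, dad_pattern) tuples, a
    hypothetical stand-in for the columns of the Access table.
    """
    spans = {}
    for row_id, chrom, pat in rows:
        if pat != pattern:
            continue  # like the Criteria row: keep only this pattern
        if chrom in spans:
            lo, hi = spans[chrom]
            spans[chrom] = (min(lo, row_id), max(hi, row_id))
        else:
            spans[chrom] = (row_id, row_id)
    return spans


# Made-up rows: the ABAB at ID 3 sits inside the AAAB Min/Max span.
rows = [(1, 1, "AAAB"), (2, 1, "AAAB"), (3, 1, "ABAB"),
        (4, 1, "AAAB"), (5, 2, "AAAB")]
print(pattern_min_max(rows, "AAAB"))  # {1: (1, 4), 2: (5, 5)}
```

The sample shows both sides of the method. One grouped pass gives a start and stop per chromosome, but the ABAB row at ID 3 is hidden inside the AAAB span.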

The upside of this method is that it is fast. The downside is that it only finds the minimum and maximum of a pattern within a chromosome. It doesn’t find all the breaks in the patterns within the chromosomes.
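The downside can be illustrated with a hedged Python sketch. Unlike Min/Max, it walks the rows in ID order and reports every unbroken run of one pattern. This is not what I did in Access, and the row layout and sample rows are assumptions. The sketch also treats any row with a different or undetermined pattern as a break. On real data with many missing bases, it would need some tolerance built in.

```python
def pattern_runs(rows, pattern):
    """Return every (chromosome, start_id, stop_id) run of `pattern`.

    `rows` must be (id, chromosome, dad_pattern) tuples sorted by chromosome
    and then ID. A run closes when the chromosome changes or when any row
    with a different (or undetermined) pattern appears.
    """
    runs = []
    current = None  # [chromosome, start_id, stop_id] of the open run
    for row_id, chrom, pat in rows:
        if pat == pattern and current is not None and current[0] == chrom:
            current[2] = row_id  # extend the open run
        elif pat == pattern:
            if current is not None:
                runs.append(tuple(current))  # new chromosome: close old run
            current = [chrom, row_id, row_id]
        elif current is not None:
            runs.append(tuple(current))  # pattern changed: close the run
            current = None
    if current is not None:
        runs.append(tuple(current))
    return runs


rows = [(1, 1, "AAAB"), (2, 1, "AAAB"), (3, 1, "ABAB"),
        (4, 1, "AAAB"), (5, 2, "AAAB")]
print(pattern_runs(rows, "AAAB"))  # [(1, 1, 2), (1, 4, 4), (2, 5, 5)]
```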

Using this method, in a couple of minutes I have 91 Start and Stop locations for all the possible patterns – except for AAAA.

Here are the sorted results for Chromosome 1:

dadpatternchr1

Note that there are some overlaps that will need to be resolved. However, there are also clean breaks, such as the one between ABBB and ABAB: ABBB stops at ID# 19797 and ABAB starts at ID# 19837. Also note the last line. AABA has the same Min and Max ID#. This means that there is a single AABA position, apparently within the AABB pattern.
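Spotting overlaps, clean breaks and single-ID patterns like these can also be sketched in Python. This minimal sketch assumes that each chromosome’s results are (pattern, start_id, stop_id) tuples. The spans below are made up.

```python
def check_spans(spans):
    """Sort one chromosome's (pattern, start_id, stop_id) spans and report
    single-ID patterns, overlaps and clean breaks."""
    if not spans:
        return [], [], []
    ordered = sorted(spans, key=lambda s: (s[1], s[2]))
    singles = [p for p, start, stop in ordered if start == stop]
    overlaps, breaks = [], []
    # Track the span that reaches furthest so far, so a long span that
    # covers several later ones is flagged against each of them.
    reach_pattern, reach = ordered[0][0], ordered[0][2]
    for pattern, start, stop in ordered[1:]:
        if start <= reach:
            overlaps.append((reach_pattern, pattern))
        else:
            breaks.append((reach_pattern, pattern, reach, start))
        if stop > reach:
            reach_pattern, reach = pattern, stop
    return singles, overlaps, breaks


# Made-up spans for one chromosome.
spans = [("ABAA", 1, 50), ("ABBB", 40, 90), ("ABAB", 95, 150),
         ("AABA", 120, 120)]
singles, overlaps, breaks = check_spans(spans)
print(singles)   # ['AABA']
print(overlaps)  # [('ABAA', 'ABBB'), ('ABAB', 'AABA')]
print(breaks)    # [('ABBB', 'ABAB', 90, 95)]
```

Each break is reported with the stop ID of the earlier pattern and the start ID of the next one, which is the same kind of clean break as ABBB to ABAB above.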

Looking at the Table

In this step, I’ll look at tbl4SibsNewDadPattern and use the Access pattern Mins and Maxes to get more accurate Start and Stop points. My spreadsheet above shows that ABAA starts at ID# 52. I scroll up from there:

chr1tabledadpattern

At ID# 18, I see ?AG?. I can imagine that being an ABAA pattern, so why not start the ABAA Dad Pattern at ID# 1? Out of 680,000 IDs, that doesn’t seem like too much of a stretch.
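The ?AG? row can be expressed as a partial pattern. The following minimal Python sketch rests on three assumptions:

* The sibling order is Joel, Sharon, Heidi, Jon, as in the pattern names.
* The first known base is labelled A.
* Missing bases are shown as ?.

Under these rules, ?AG? becomes ?AB?. AAAB and BBBA describe the same split, so the compatibility check allows the two letters to be swapped. The function names are hypothetical.

```python
def encode(bases):
    """Turn four bases from Dad (Joel, Sharon, Heidi, Jon) into a pattern.

    The first sibling with a known base is labelled 'A', a matching base is
    'A', a different base is 'B' and a missing base (None) is '?'.
    """
    ref = next((b for b in bases if b is not None), None)
    return "".join("?" if b is None else ("A" if b == ref else "B")
                   for b in bases)


def compatible(partial, pattern):
    """True if `partial` (which may contain '?') fits `pattern`, allowing the
    A/B labels to be swapped, since AAAB and BBBA describe the same split."""
    swapped = pattern.translate(str.maketrans("AB", "BA"))
    return any(all(p == "?" or p == q for p, q in zip(partial, full))
               for full in (pattern, swapped))


partial = encode([None, "A", "G", None])  # the ?AG? row
print(partial)                            # ?AB?
print(compatible(partial, "ABAA"))        # True
print(compatible(partial, "AAAB"))        # False
```

Note that ?AB? fits AABA, AABB, ABAA and ABAB. A single row only narrows things down, and the neighboring rows have to settle it.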

Next, it seems like ABAA should stop somewhere before ID# 6605. I’ll speed up the process with a query that looks for cases where Sharon’s base from Dad is not equal to Heidi’s base from Dad:

abaa-stop

Clearly, there is a break at ID# 5127, so I’ll use that.
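In the ABAA pattern, Sharon got her DNA from the other paternal grandparent. Wherever Dad was heterozygous, her base from Dad should therefore differ from Heidi’s. When the pattern ends, those differences stop, and that shows up as a jump in the list of IDs.

Here is a minimal Python sketch of that idea. The dict keys are hypothetical. Taking the widest gap is only an assumed rule of thumb, since I picked the break by looking at the list.

```python
def disagreement_ids(rows, a, b):
    """IDs where siblings `a` and `b` both have a base from Dad and differ.

    `rows` is a list of dicts sorted by ID for one chromosome, for example
    {"id": 12, "Sharon": "T", "Heidi": "C"}; the keys are hypothetical.
    """
    return [r["id"] for r in rows
            if r.get(a) is not None and r.get(b) is not None and r[a] != r[b]]


def largest_jump(ids):
    """Return (last_id_before, first_id_after) for the widest gap in `ids`."""
    pairs = list(zip(ids, ids[1:]))
    return max(pairs, key=lambda p: p[1] - p[0]) if pairs else None


rows = [
    {"id": 10, "Sharon": "A", "Heidi": "G"},
    {"id": 11, "Sharon": "C", "Heidi": "C"},
    {"id": 12, "Sharon": "T", "Heidi": "C"},
    {"id": 20, "Sharon": "G", "Heidi": "G"},
    {"id": 30, "Sharon": None, "Heidi": "A"},
    {"id": 40, "Sharon": "G", "Heidi": "A"},
]
ids = disagreement_ids(rows, "Sharon", "Heidi")
print(ids)                # [10, 12, 40]
print(largest_jump(ids))  # (12, 40)
```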

chr1dadpatternstartstop

Here, I’ve added a finer Start and Stop for Dad Pattern ABAA. What that means is that in this segment of Chromosome 1, I got my DNA from one of my paternal grandparents, as did Heidi and Jon. Sharon got her DNA from the other paternal grandparent.

Here is the Start/Stops filled in:

chr1dadpatternfilledin

I highlighted 57205 as a reminder that I need to add an extra ABAA pattern later. There is a gap of 1477 IDs between ABAA and ABBB where there is likely an AAAA pattern, which would mean that the 4 siblings got their DNA from the same paternal grandparent.

Finished Start Stop Dad Pattern Spreadsheet

I took out the single-ID patterns and re-sorted by pattern. Then I wrote a formula to put the locations into Access criteria syntax:

dadpatternstartstop
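The spreadsheet formula isn’t reproduced here, but the idea can be sketched in Python. Each Start and Stop pair becomes a `Between x And y` criterion, and the pairs are joined with `Or`, which is standard Access criteria syntax. The function name is hypothetical, and the spans other than 1 to 5127 are made up.

```python
def access_criteria(spans):
    """Join (start_id, stop_id) pairs into one Access criteria expression."""
    return " Or ".join(f"Between {start} And {stop}" for start, stop in spans)


print(access_criteria([(1, 5127), (7000, 9000)]))
# Between 1 And 5127 Or Between 7000 And 9000
```

The resulting text would go in the Criteria row under the ID field of an update query.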

Next, I copied my working table in Access to a new table called tbl4SibsNewDadPatternFillin. I’ll use this to fill in the Dad Patterns.

Filling in the First AAAB Pattern

In this pattern, I will be filling in all the missing ‘A’s of the AAAB pattern. I won’t fill in the B, as I won’t know whether an ‘A’ or a ‘B’ belongs there. Here is my first update query:

aaabupdate

This says that if I am missing a base from Dad anywhere in the AAAB Pattern regions and Sharon has a base there, I’ll take the base she has. I can save a little time by adding on to that query:

joelaaabfromsharonheidi

It is important to put the second ‘Is Not Null’ and ‘Is Null’ on a separate line as that is the ‘or’ line. Otherwise, I would only get the Sharon from Dad and Heidi from Dad bases where they equaled each other.
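The ‘or’ line logic, and what happens if the ID criteria are left off the second line, can be sketched in Python. This is a hedged illustration, not Access itself. Each criteria line is an AND of its conditions, and the lines are ORed together. The field names, region and rows are made up, and ID 63 is reused only to echo the example in the text.

```python
def in_regions(row_id, regions):
    """True if row_id falls inside any (start_id, stop_id) region."""
    return any(start <= row_id <= stop for start, stop in regions)


def rows_to_fill(rows, regions, id_criteria_on_or_line=True):
    """IDs where Joel lacks a base from Dad and Sharon or Heidi has one.

    Criteria line 1: in region AND Joel Is Null AND Sharon Is Not Null.
    Criteria line 2 (the 'or' line): [in region AND] Joel Is Null AND
    Heidi Is Not Null. id_criteria_on_or_line=False drops the bracketed
    part, like leaving the ID criteria off the second line.
    """
    picked = []
    for r in rows:
        line1 = (in_regions(r["id"], regions) and r["Joel"] is None
                 and r["Sharon"] is not None)
        line2 = ((not id_criteria_on_or_line or in_regions(r["id"], regions))
                 and r["Joel"] is None and r["Heidi"] is not None)
        if line1 or line2:
            picked.append(r["id"])
    return picked


regions = [(1, 50)]  # made-up AAAB region
rows = [
    {"id": 10, "Joel": None, "Sharon": "A", "Heidi": None},
    {"id": 20, "Joel": None, "Sharon": None, "Heidi": "C"},
    {"id": 30, "Joel": "G", "Sharon": "G", "Heidi": "G"},
    {"id": 63, "Joel": None, "Sharon": None, "Heidi": "C"},  # outside region
]
print(rows_to_fill(rows, regions))                                # [10, 20]
print(rows_to_fill(rows, regions, id_criteria_on_or_line=False))  # [10, 20, 63]
```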

First I run the query to make sure it shows what I want.

aaabqueryex

It does [although, see below. For one thing, I missed the ID criteria in the 2nd line of criteria!]. If I had put the criteria all on one line, I wouldn’t have gotten the Heidi from Dad bases where Sharon is missing a base (ID# 63) and vice versa. I will want to check my query later, and I can check it at least two ways. One way is to look at ID# 63 and 99 to see if that base was added. The other way is to see if the Update Query updates 49094 rows, as that is the number of rows in the above query.

When I went to run my query, I got this error:

udateaaaberror

Before I give up on this double query, I’ll try one more thing:

heidiorsharonaaabtojoel

Here I say that if the conditions mentioned above apply, give me either Heidi’s base from Dad or Sharon’s base from Dad. I note that the update is for 49094 rows, so that seems on the right track. The reason I don’t mind doing a double query here is that Heidi’s and Sharon’s bases from Dad should always be the same in an AAAB pattern.

I ran this and now I am checking ID# 63:

erroraaab

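Unfortunately, Access gave me a -1 instead of Heidi’s C base from Dad. Since -1 is how Access stores True, it seems Access treated the Or as a true/false test rather than picking one of the two bases. Part of why I wanted to do one query was so I wouldn’t have to add up 2 queries. Instead, I’ll just add a line to my base tracker: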

basetrackernew

That means that I am back to my simpler query. Sharon should add 3975 bases from Dad to my bases from Dad:

3975row

Heidi would have added over 2200 of her bases from Dad before Sharon gave me hers. Now the number is lower:

heidibasestojoel
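The two simpler update queries can be sketched as two passes of one hypothetical Python function. It copies a sibling’s base from Dad only where mine is missing, and only inside the pattern regions. Because the first pass fills some rows, the second pass updates fewer, which is why Heidi’s number dropped. The region and rows are made up.

```python
def fill_from(rows, target, source, regions):
    """Copy `source`'s base from Dad into `target` where the target's is
    missing, inside (start_id, stop_id) regions. Returns rows updated."""
    updated = 0
    for r in rows:
        in_region = any(start <= r["id"] <= stop for start, stop in regions)
        if in_region and r[target] is None and r[source] is not None:
            r[target] = r[source]
            updated += 1
    return updated


regions = [(1, 3)]  # made-up AAAB region
rows = [
    {"id": 1, "Joel": None, "Sharon": "A", "Heidi": "A"},
    {"id": 2, "Joel": None, "Sharon": None, "Heidi": "C"},
    {"id": 3, "Joel": None, "Sharon": "G", "Heidi": None},
    {"id": 4, "Joel": None, "Sharon": "T", "Heidi": "T"},  # outside region
]
print(fill_from(rows, "Joel", "Sharon", regions))  # 2
print(fill_from(rows, "Joel", "Heidi", regions))   # 1 (Heidi alone: 2)
```

Returning the count from each pass gives the number that goes into the base tracker.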

Now I check ID# 63:

line63

My base from Dad still isn’t filled in, but that is a good thing. When I checked my double query above, it was giving me areas outside the AAAB Pattern regions, because I had left the ID criteria off the second criteria line. ID# 63 is actually in a different pattern, which is also why the row count was so high. The lesson learned is to keep the queries simple.

Now I’ve updated my Base Tracker for the AAAB Dad Pattern:

aaabbasetracker

Note that Heidi’s bases from Dad didn’t go up in the second round of this query. After she had gotten her extra Dad bases from me in the AAAB region, Sharon didn’t have any extra ones to give her that I hadn’t already given.

nodadbasestoheidifromsharon

AABA Fill-in

This time Heidi will be left out, and Joel, Sharon and Jon will get new bases from Dad based on the others in the AABA regions. This is the same simple query as before, except that the ID ranges are different:

aabafillin

Here are Jon’s first bases from Dad from one of his siblings:

jonfromdad

This brings up an interesting point. There may be cases where Jon has a phased base at a location which his DNA test didn’t cover.

AABB Fill-in

Here there should be bases for all siblings. Wherever there is an A and a missing A, add it, and the same for B. Again, my first query is the same except for the ID ranges:

aabbfillin
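The fill-in rule for every pattern can be sketched as one Python function: siblings with the same letter share a base from Dad. This is a minimal sketch, not my Access queries. A lone letter, such as Jon’s B in AAAB, has no partner and stays missing. As an assumption of this sketch, a group whose known bases disagree is left alone, since that could point to a wrong boundary.

```python
def fill_by_pattern(bases, pattern):
    """Fill missing bases from Dad using siblings who share a pattern letter.

    `bases` lists four bases (None if missing) in the order Joel, Sharon,
    Heidi, Jon; `pattern` is a Dad Pattern such as "AABB". Returns a new list.
    """
    filled = list(bases)
    for letter in "AB":
        group = [i for i, p in enumerate(pattern) if p == letter]
        known = {bases[i] for i in group if bases[i] is not None}
        if len(known) == 1:  # fill only when the group has one agreed base
            value = known.pop()
            for i in group:
                if filled[i] is None:
                    filled[i] = value
    return filled


print(fill_by_pattern(["C", None, None, "T"], "AABB"))  # ['C', 'C', 'T', 'T']
print(fill_by_pattern([None, "A", "A", None], "AAAB"))  # ['A', 'A', 'A', None]
```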

On the AABB bases from Dad, Jon doesn’t have a lot to add to Heidi’s bases, but Heidi has a lot to add to Jon’s:

aabbbasetracker

ABAA Dad Pattern Fill-in

Here we start with Joel being updated with Heidi’s bases from Dad because Sharon is the lone B.

abaa

More rows were updated here, as the ABAA Dad Pattern had more regions than the other patterns.

In my last update query, I made a mistake:

jonfromjoelmistake

I’m not sure if it makes a difference. The query said that where my base from Mom is not null, give Jon my base from Dad where he doesn’t have one; it should have looked at my base from Dad instead. To check, I run the correct query:

abaaquerymistake

This shows that there are still 2063 bases that didn’t get added to Jon from my bases from Dad. I will add them now, and I will add that number to the previous 29113 bases I added to Jon’s bases from Dad from my bases from Dad.

abaatracker

As there were 3 siblings the same in this pattern, I again took 2 rows to add the bases to the table.

ABAB Dad Pattern Fill-in

ababtracker

Jon now has more bases phased than he had tested on his paternal side. He already had more than he had tested on the maternal side.

ABBA and ABBB Dad Pattern Fill-ins

basetrackersummary

As expected, Jon made out best in this Pattern Phasing.

Mom Bases From Dad Bases

This is the part of the project that seems ironic. My dad, who wasn’t tested for DNA, is now supplying bases to his children that came from their mom. Here I’m looking for where the siblings are heterozygous. In those cases where there is now a Dad base from the patterns and the Mom base is missing, we can fill it in.

First, I am making another copy of my table called tble4SibsNewMomPatternFillin.

Here is my first Mom from Dad Update Query:

joeldadfrommom

It says that where I am heterozygous and my Dad base is my 2nd allele, put my 1st allele in as the base I got from Mom, but only if there isn’t already a Mom base there. The last part is just an extra precaution so that I don’t overwrite anything.

In the next query, I just reverse the Joelallele1 and 2 to get 12,000 more rows of phased DNA:

momfromdad2
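The Mom from Dad logic for a single SNP can be sketched in Python. This minimal sketch assumes that each position has two allele calls plus the phased base from Dad. The function name and arguments are hypothetical. The two print lines correspond to the two update queries: one where my Dad base is my 2nd allele, and one with the alleles reversed.

```python
def mom_from_dad(allele1, allele2, dad_base, mom_base=None):
    """Return the base from Mom to store at one SNP, or None if unknown.

    Only heterozygous calls are used: if the base from Dad matches one
    allele, the other allele must have come from Mom. An existing base from
    Mom is returned unchanged, so nothing is overwritten.
    """
    if mom_base is not None:
        return mom_base
    if None in (allele1, allele2, dad_base) or allele1 == allele2:
        return None
    if dad_base == allele2:
        return allele1
    if dad_base == allele1:
        return allele2
    return None  # Dad base matches neither allele: worth a closer look


print(mom_from_dad("A", "G", "G"))  # A  (Dad base is allele 2)
print(mom_from_dad("A", "G", "A"))  # G  (alleles reversed)
```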

Summary of Mom Bases from Dad Bases

trackermomfromdad

Check the numbers

I have been adding up the rows added. But now I will check my table to see if the Total Bases Phased add up. And the answer is:

countfromtbl

The numbers are pretty close. The above Heidi from Dad count is higher than my tracker’s. I’m guessing the table sums are correct and mine are a little off. That means that Heidi’s paternal phasing should be a little lower.
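A check like this can be sketched in Python. Count the non-missing bases in each column, as Access Count([field]) does, and compare them with the tracker. The function names, column names and numbers are made up.

```python
def count_phased(rows, columns):
    """Count non-missing values per column, as Access Count([field]) does."""
    return {c: sum(1 for r in rows if r.get(c) is not None) for c in columns}


def tracker_differences(counts, tracker):
    """Return {column: table_count - tracker_count} where the two disagree."""
    return {c: counts.get(c, 0) - n for c, n in tracker.items()
            if counts.get(c, 0) != n}


rows = [{"HeidiFromDad": "A", "JonFromDad": None},
        {"HeidiFromDad": "C", "JonFromDad": "G"}]
counts = count_phased(rows, ["HeidiFromDad", "JonFromDad"])
print(counts)  # {'HeidiFromDad': 2, 'JonFromDad': 1}
print(tracker_differences(counts, {"HeidiFromDad": 1, "JonFromDad": 1}))
# {'HeidiFromDad': 1}
```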

Part 8 Summary

  • The use of MS Access Min and Max functions to get Dad Pattern starts and stops saved a lot of time
  • It still takes time to verify those starts and stops
  • The Base Tracker makes it easier to track the numbers and the process. It is also interesting to see how the % phased goes up with each round of updates
  • I wasn’t expecting the numbers from my base tracker and actual updated bases to reconcile perfectly, but most of the numbers did. It is possible the discrepancies are from the 2 minor errors I made and tried to correct along the way.

 

 
