Raw DNA Phasing of 4 Siblings Using One Parent’s DNA: Part 5

In my past 4 Blogs on the topic, I started phasing my siblings’ raw DNA using my mother’s raw DNA. I used Whit Athey’s paper on the subject and MS Access, and I have checked my results against M MacNeill’s similar analysis of my raw DNA. I started out using 3 siblings in the analysis. Partway through, my brother’s results came in, so now I am looking at 4 siblings.

I departed a bit from the Whit Athey analysis: where he went to a visual analysis, I decided to look for the change points in the data. I then used those points to perform an Access query to update the various patterns found. When I left off in my last Blog, I had just located Starts and Stops for the 7 different paternal patterns for the 4 siblings.

Sibling Patterns in an Excel Stacked Bar Chart

Today I was getting a headache trying to find a way to put the paternal patterns information into Excel in a Bar Chart. Here is the best I could do for the first 2 Chromosomes:

pattern-bar-chart

The spreadsheet data format is above. I chose Stacked Bar Chart. Then I had to transpose the rows and columns. The slight glitch is that when a pattern repeated on a chromosome, I had to create an extra duplicate pattern to get the results in one bar per chromosome. I used the end point for each pattern. The bar assumes the start is at zero for each chromosome, which isn’t totally accurate but is close enough, I suppose, for a bar chart. The bar chart is meant to represent all the paternal changes in patterns for me and my 3 siblings.

When I compare the pattern changes with the number of crossovers in the work of M MacNeill, it appears that I have missed a pattern change on each of the above chromosomes. Hopefully, I will find them as I go through the process and re-check my work. I guess I’m batting 2 out of 3 now.

Finding the Two Missed Paternal Crossovers

It is possible that they aren’t missing at all. Perhaps in all the work I did to represent the information in an Excel Chart above, I misrepresented the work I had done. Here is my spreadsheet for Chromosome 1:

chr1startstop

Here is what M MacNeill has for Joel, Heidi and Sharon’s paternal Starts and Stops on Chromosome 1:

macneillstartstopchr1

It looks like MacNeill has 6 paternal starts and stops. I don’t count the last one as that goes to the end of the Chromosome. Again, I run into the conversion issue between my Build 37 and MacNeill’s Build 36 work. Here is what happens when I put the approximate crossover locations side by side:

macneill-checkchr1

This shows that we both have 6 crossovers, which is good. It gets a bit confusing. Note that I had to add a crossover at my position 23289397. That is because there was a gap. That is the gap where the 4 siblings must match the same paternal grandparent. Normally, there shouldn’t be any gap between the Stop from one pattern and the Start of the next one. So it turns out I’m doing better than I thought. That is encouraging to know. For the last pattern, I don’t have an entry, because the crossover is at the same spot as the Stop of the previous pattern.

However, I am comparing my 4-sibling work to MacNeill’s 3-sibling work. Also, MacNeill had a start of 742,429 and mine was higher. That means that there must be a pattern between 742,429 and 1,062,638. I checked, and there aren’t many extra locations there. I suppose I did as well as I could do. I do wonder where Jon’s Chromosome 1 crossovers are, though. Perhaps he has a double crossover with another sibling or one that is very close in location to another sibling.

Gedmatch Check

Here is how my 3 tested siblings and I look compared to each other at Gedmatch in the browser:

chr1-4sibs

The lines don’t match up perfectly, but I have 3 crossovers for Sharon in a row. I am J, my brother Jon is F and only has one crossover of his own. These are the combined paternal/maternal crossovers. When I map it out using a visual method, it appears that Jon may have no recombination in Chromosome 1.

jonvisphasechr1

If that is true, it would make him a good candidate for finding Frazer or Lentz relatives at Chromosome 1.

Assigning Paternal Crossovers for Chromosome 1

Assigning crossovers is getting a little ahead of myself, but I would like to see if I am on the right track.

assignxoverchr1

Here the Dad4Pattern represents Joel, Sharon, Heidi and Jon. There appears to be a logic to assigning these crossovers that I have in the XSib column. The first crossover I have going to Sharon. That is between the first Stop and the second Start. Sharon’s B in ABAA goes to an A in the AAAA homozygous region. That means all siblings match the same paternal grandparent in this region. The next crossover goes to me as I’m represented by the A in the ABBB pattern. The 3 other siblings remain matched to each other. Then Heidi gets the next crossover as she goes from matching Sharon and Jon to matching me. The next to the last crossover, I had as Jon. But it has to be Heidi as she went from matching me and Sharon to matching Jon. If Jon was the one that changed to matching Heidi, the pattern would have gone to AAAA. Likewise, the last crossover I had as Heidi, but it has to be Joel. I went from matching Sharon to matching Heidi and Jon.
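The reasoning above can be sketched outside of Access. This is only a rough illustration, assuming that each pattern lists the siblings in the order Joel, Sharon, Heidi, Jon, that the first letter is always written as A, and that exactly one sibling switches grandparent at each crossover. The function names are made up for the sketch.

```python
SIBLINGS = ["Joel", "Sharon", "Heidi", "Jon"]  # order used in the Dad4Pattern

def normalize(pattern):
    """Relabel a pattern so the first sibling is always 'A' (BABB becomes ABAA)."""
    if pattern[0] == "A":
        return pattern
    return "".join("A" if letter == "B" else "B" for letter in pattern)

def flip(pattern, index):
    """Switch one sibling to the other paternal grandparent, then relabel."""
    other = "B" if pattern[index] == "A" else "A"
    return normalize(pattern[:index] + other + pattern[index + 1:])

def crossover_sibling(before, after):
    """List the siblings whose single switch turns `before` into `after`."""
    target = normalize(after)
    return [SIBLINGS[i] for i in range(len(SIBLINGS)) if flip(before, i) == target]

# The Chromosome 1 sequence described above.
steps = [("ABAA", "AAAA"), ("AAAA", "ABBB"), ("ABBB", "ABAB"),
         ("AAAB", "AABB"), ("AABB", "ABAA")]
assigned = [crossover_sibling(before, after) for before, after in steps]
print(assigned)
```

For the AAAB to AABB step, switching Jon instead of Heidi would give AAAA, which is the same check made in the text.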

There are a few cross checks to the method. One is to check to see what MacNeill has done. Another is to check against known matches. I noticed above that Jon had matches to known matches on my paternal grandmother on either side of the wrong crossover that I had assigned to him, so that was likely not a good crossover. Another note is that there is at least one other homozygous region. That is between the 191M stop and 192M start above. That means that there should be an AAAA pattern stuck in there, but it is not necessary to know at this stage.

Time to Push the Button: Back to Phasing by MS Access

A lot of the above work was to make sure that I had the right number of crossovers in the right places. I was worried that if I didn’t, I wouldn’t be applying the right rules to the right areas of the spreadsheets.

First AAAB

Here are my AAAB paternal Patterns with start and stop in Access language:

aaabstartstop

Here are some examples of fixes that are needed within these AAAB areas:

aaabexample

Basically, if there is a blank in the first 3 positions, it should be filled in by the non-blank in that area. But how do I write that into a formula? Here is one way:

update

This says if Joel’s Dad base or Heidi’s Dad base is null, put Sharon’s value in. I ran that and it updated 2165 rows.
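As a minimal sketch of the same fill-in rule in Python rather than Access: inside an AAAB region, Joel, Sharon and Heidi share one paternal base, so any one known value can be copied to the blanks. The row layout here (one Dad base per sibling, None when unknown) is an assumption for illustration and is not the actual table design.

```python
# Hypothetical row layout: one paternal ("Dad") base per sibling, None if unknown.
AAAB_GROUP = ("Joel", "Sharon", "Heidi")  # the three siblings that match in AAAB

def fill_aaab(row):
    """Return a copy of the row with the matching siblings' Dad bases filled in."""
    filled = dict(row)
    known = {row[name] for name in AAAB_GROUP if row[name] is not None}
    if len(known) == 1:  # exactly one distinct base seen, so it is safe to copy
        base = next(iter(known))
        for name in AAAB_GROUP:
            filled[name] = base
    return filled

example = {"Joel": None, "Sharon": "T", "Heidi": None, "Jon": "C"}
result = fill_aaab(example)
print(result)
```

Jon is left alone because he is the B in AAAB. A row where all three matching siblings are blank is returned unchanged.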

Next:

aaabfillin

 

This time only about 1400 rows were updated. The last time we fill in Heidi’s value if Joel or Sharon had a missing Dad base.

407rows

I’ll check my work. I see a flaw in my logic already. I shouldn’t have put the 2 ‘Is not null’s’ on the same line, because Access sees that as an AND. I wanted an OR, so they should’ve been on separate lines. Here is the revised query putting Heidi’s Dad base in the empty spot of Sharon and Joel.

aaabrev

 

Note that I had to put the position criteria for the Paternal Patterns in twice also. See, I had missed 4121 rows. I went through this with the 2 other siblings.
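The difference between the two versions of the query can be illustrated with a small Python sketch. The rows are made up for the example. The point is only that an AND condition selects fewer rows than an OR condition, which is why the first run missed rows.

```python
# Made-up Dad bases for three siblings; None means the base is missing.
rows = [
    {"Joel": None, "Sharon": "T", "Heidi": "T"},
    {"Joel": "C", "Sharon": None, "Heidi": "C"},
    {"Joel": None, "Sharon": None, "Heidi": "G"},
    {"Joel": "A", "Sharon": "A", "Heidi": "A"},
]

def needs_fill_and(row):
    """Criteria on one line: Heidi known AND Joel missing AND Sharon missing."""
    return row["Heidi"] is not None and row["Joel"] is None and row["Sharon"] is None

def needs_fill_or(row):
    """Criteria on separate lines: Heidi known AND (Joel missing OR Sharon missing)."""
    return row["Heidi"] is not None and (row["Joel"] is None or row["Sharon"] is None)

and_hits = [row for row in rows if needs_fill_and(row)]
or_hits = [row for row in rows if needs_fill_or(row)]
print(len(and_hits), len(or_hits))
```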

AABA Paternal Pattern

aabappatternstartstop

The AABA also has potential for filling in.

aabaexample

In the first line there is ?T??. In an AABA pattern, we know that the first and last position will also need to be T. In the second line, we don’t know what to fill in. In the 3rd line we can put a C in the last column.

My ID locations for AABA look like this:

idsaaba

The queries will be similar to last time except that they will involve Joel, Sharon and Jon and leave out Heidi.

aabaqueryfillin

This was a more popular fill-in. In the query above, if I had a Dad Base and Sharon and Jon didn’t, it went to Sharon or Jon. I then did the same thing for Sharon and Jon. Here I check my results.

checkaaba

These are the same ID Lines shown as above before I did the query. This now shows that Joel, Sharon and Jon have the same bases for this AABA Pattern. This is even true when we don’t know the base Heidi has on her paternal side as in the first row.

AABB Pattern

aabbpattern

Here is what an AABB Pattern area looks like before I fill in the bases:

aabbpatternarea

The rule is if Joel or Sharon or Heidi or Jon have one base and the other is missing, fill in the missing one with the one that is there. However, as in the second row, where Heidi and Jon are both missing, nothing may be filled in. This will take a little thought. Perhaps I can do this in 2 steps:

aabbfillinquery

This says if I have a base AND Sharon doesn’t, give her my base, then also do the same and fill in Heidi’s base to Jon’s missing base from dad. This query filled in a little less than 20,000 bases with the push of a Run button. Then I’ll do the opposite:

aabbquery2

This time Sharon’s base goes to me and Jon’s goes to Heidi. I’ll check good old ID 45494.

aabbcheck

It looks like I filled in what I wanted to and didn’t fill in what should not have been filled in.
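The two-step AABB fill-in generalizes to any pattern if the pattern string itself decides who matches whom. The following is a rough sketch of that idea in Python, assuming the sibling order Joel, Sharon, Heidi, Jon and a hypothetical row layout with None for a missing Dad base. It is not the Access query itself.

```python
SIBLINGS = ("Joel", "Sharon", "Heidi", "Jon")  # order used in the patterns

def fill_by_pattern(row, pattern):
    """Copy a known Dad base to siblings that share a pattern letter with it."""
    filled = dict(row)
    for letter in sorted(set(pattern)):
        group = [name for name, p in zip(SIBLINGS, pattern) if p == letter]
        known = {row[name] for name in group if row[name] is not None}
        if len(known) == 1:  # one distinct base in the group: fill the blanks
            base = next(iter(known))
            for name in group:
                filled[name] = base
    return filled

# Joel has a base, Sharon does not; Heidi and Jon are both missing.
row = {"Joel": "G", "Sharon": None, "Heidi": None, "Jon": None}
print(fill_by_pattern(row, "AABB"))
```

Sharon receives Joel’s base, while Heidi and Jon stay blank because neither of them has a base to share, which is the case described for the second row above.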

The other combinations will be variations on what has just been done. Either 3 will match each other and one won’t or there will be 2 pairs that match each other within the pair.

ABAA Fill-in

This is the first pattern of my siblings’ 1st Chromosome.

abaaquery

Another ho-hum 20,000 rows filled in.

Here Heidi fills in Joel and Jon:

abaa2

The updated rows go down the 3rd time I run this.

abaa3

ABAB Dad Pattern

This will be where Joel and Heidi match paternally and Sharon and Jon match.

ababquery

Jon is probably missing a lot of bases due to being tested with Ancestry Version 2.

ababq2

ABBA

abbaq1

This query says in the ID areas where there is an ABBA pattern put Jon’s dad base into Joel’s missing dad base area and put Heidi’s dad base into Sharon’s missing dad base area in the table called tbl4SibsPPatternFillin.

joelsharonabba

Here, I made a mistake. Note that I had Access overwrite a bracket “]” that didn’t get erased. That means that I will have to run this query again to get my bases back from Jon. Here is what the above Update Query did.

mistake

Fortunately, Jon still has the bases that I gave him. I’ll redo the query to get my bases back.

fixqurey

This query will fix my error. It says if I have an end bracket as a base, fill it up with what Jon has.

fixresults

ABBB – the Last Paternal Combo

This time I won’t touch my bases, but make sure that Sharon, Heidi and Jon match.

abbbq1

heidiabbb

jonabbb

So that should have filled in all the paternal patterns.

Finding the AAAA ‘Patterns’

This should be a little trickier. Previously, we had identified one AAAA pattern in Chromosome 1. This can be seen between 19 and 23 below. All the paternal areas are orange.

jonvisphasechr1

There is no other area on this Chromosome that is all orange or all green for all siblings. However, how do I identify all the other quadruple A patterns? It is not as easy as the other patterns because this pattern may occur within other patterns. I could make a chromosome map for each chromosome as above, however, it becomes a chicken and egg problem. It would be nice to know the AAAA areas so I could draw the map.

Here is a spreadsheet where I checked the number of IDs from the Start to the previous Stop.

startminusstop

When the amount was more than 500 IDs, I highlighted that number in yellow. Above between the Stop of ABAA and the Start of ABBB on Chromosome 1, there was an AAAA pattern for 1478 position numbers.
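The Start-minus-previous-Stop check can be sketched as follows. The ID numbers below are invented purely for illustration and are not taken from my spreadsheet. The 500-ID threshold is the one described above.

```python
def find_gaps(regions, min_ids=500):
    """Return (first_id, last_id) for gaps larger than min_ids between patterns.

    `regions` is a list of (pattern, start_id, stop_id) tuples for one
    chromosome, sorted by start_id.
    """
    gaps = []
    for (_, _, prev_stop), (_, next_start, _) in zip(regions, regions[1:]):
        if next_start - prev_stop > min_ids:
            gaps.append((prev_stop + 1, next_start - 1))
    return gaps

# Hypothetical IDs: a large gap after ABAA and a tiny one after ABBB.
regions = [("ABAA", 1, 1000), ("ABBB", 2500, 4000), ("ABAB", 4003, 6000)]
print(find_gaps(regions))
```

Only the first gap is reported as a candidate AAAA area. The small gap between ABBB and ABAB falls under the threshold and is ignored.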

The next yellow area is in Chromosome 2, which is a larger region of AAAA pattern.

Here is an interesting situation:

chr6and7

This yellow area is above the amount I chose as a minimum of 500 positions. However, as I look at my worksheet, I see that the ABBA pattern extends beyond ID# 285124. So I will do a new query based on the new fill-in table. Here is the new ABBA:

newabba

newabbaresults

This shows that the ABBA pattern goes to the end of Chromosome 6. I can fill in the extra letters by hand and adjust my spreadsheet.

However, what about Chromosome 7?

Chromosome 7 appears to have an AAAA pattern for about 847 ID#’s. This is how MacNeill mapped my Chromosome 7.

chr7macneill

He would have the ABBB Paternal Pattern with me being the A. This is how I had visually mapped Chromosome 7:

chr7joel

These end pieces are difficult where there is a half identical region. I will stick with mine as I do notice a small match with my Hartley-related 2nd cousin Pat:

chr7joelpat

This may become more clear once my brother Jon is mapped out. In fact Jon is Fully Identical with me in that region:

jonjoelchr7

Jon also matches cousin Pat in that same spot:

jonjoelchr7pat

Ergo, I must match Pat aka Hartley DNA at the first part of Chromosome 7.

Here is Jon mapped out on Chromosome 7:

chr7vismapjon

Jon (F) and Heidi (H) got a full dose of Hartley DNA at Chromosome 7.

That was a bit of a long exercise, but the intention was to prove to myself that an AAAA pattern of over 500 positions (or my ID#s) is a valid AAAA Pattern.

Filling in the AAAA’s

As I have now convinced myself that this small area was indeed an AAAA area, I can proceed. I made a formula in Excel that takes the other Patterns’ Stops and Starts and puts them into Access language.

aaaagaps

The formula adds an ID# to the beginning and subtracts one from the end so the AAAA patterns have their own range.
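A sketch of what such a formula does, written in Python instead of Excel: take each neighboring Stop and Start, move one ID inward on both sides, and write the result in the style of Access criteria. The exact wording of my Excel formula is not shown here, so the `Between ... And ...` text and the function names are assumptions for illustration.

```python
def aaaa_ranges(stop_start_pairs):
    """Turn (previous Stop, next Start) pairs into AAAA ranges of their own."""
    return [(stop + 1, start - 1) for stop, start in stop_start_pairs]

def access_criteria(ranges):
    """Join ID ranges into one criteria string."""
    return " Or ".join(f"Between {first} And {last}" for first, last in ranges)

# Hypothetical Stop/Start pairs, not real IDs from the spreadsheet.
pairs = [(1000, 2500), (8999, 9801)]
criteria = access_criteria(aaaa_ranges(pairs))
print(criteria)
```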

 

Inspecting My Work

Having found a pattern boundary that was off at the end of Chromosome 6, I will check the other boundaries. According to my spreadsheet, the first AAAA should end at 6604.

aaaaspreadsheetchrq1

The actual Access Data table is different by one:

chr1correction

That means that I need to add an ‘A’ to the missing space and change the start of the ABBB Pattern from 6605 to 6604 – a pretty minor change. I made a few more minor changes. However, I’ll hold off on making the AAAA pattern changes for now. That is in case the boundary changes again due to other changes I’ll be making.

Filling In Mom Bases From Dad Bases

This is about how far I got last time when I was trying to phase 3 siblings. My interpretation of this portion of the process is to look at the heterozygous siblings. Where they have a new base on the Dad side, they will know that the other base goes on the Mom side.

Finding Heterozygous Siblings

First I made a new table to put the new information in. It is just a copy of my last table of the fill-ins based on patterns. Here is a query just to find the alleles for each sibling that are different from each other:

heterozgous4sibs

Here is the Update Query. I better get it right as it is doing a lot of things:

updatemomfromdad4sibs

The first part has the criteria that makes a person heterozygous. I forgot to make sure that the mom base was missing, so I need to add an ‘is null’ phrase:

4sibheteromomfromdadrev

This may not be necessary, but just makes sure I am not overwriting anything that is already there. So when mom’s base is missing add the base that isn’t the dad base. Or more specifically, add allele2. This changes 39,260 rows.

Next, to get the opposite effect, I change most of the allele 1’s to 2’s and the 2’s to 1’s.

otherallelemomfromdad4sib

That changed over 10% of all the results. To check, here is a query from the older un-updated table showing just my results where I’m heterozygous and my allele1 was from Dad:

qryoldtable

Here is the updated table.

updatedtablemomfromdad

The G, C, C, G was added as a base from my mom – along with 10’s of thousands of other bases.
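The two update queries together amount to one rule: for a heterozygous sibling with a known Dad base and a missing Mom base, the Mom base is whichever allele is not the Dad base. Here is a minimal Python sketch of that rule. The argument names are hypothetical and do not match the Access field names.

```python
def mom_from_dad(allele1, allele2, dad_base, mom_base):
    """Return the Mom base implied by the Dad base for a heterozygous sibling."""
    if mom_base is not None:      # never overwrite a base that is already there
        return mom_base
    if dad_base is None or allele1 == allele2:
        return mom_base           # nothing known, or not heterozygous
    if dad_base == allele1:
        return allele2            # first query: Dad gave allele1
    if dad_base == allele2:
        return allele1            # second query: alleles swapped
    return mom_base               # Dad base matches neither allele: leave blank

print(mom_from_dad("A", "G", "A", None))
print(mom_from_dad("A", "G", "G", None))
```

The first check plays the role of the ‘is null’ phrase added to the query above.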

Summary

In overview:

  • Principle 1: I’ve added the homozygous sibling results. This says a double base for a sibling means that they got the same base from each parent.
  • Principle 2: I forgot to add the homozygous mom results to Jon. I’ll do that in the next Blog.
  • Principle 3: This is for heterozygous siblings. When one base is known for a parent and the other parent base is missing, the other base is assigned to the other parent.
  • Next I looked at the paternal patterns and made note of where they changed.
  • For each paternal pattern region I filled in the bases that could be filled in based on that pattern.
  • Then based on that new information, I filled in more missing mom bases from the dad bases in areas where the children were heterozygous. This is Principle 3 reapplied.

 

Using Triangulation Groups to Map My Wife’s Chromosomes

I would like to update the Chromosome Map I have for my wife. The one I have now looks like this:

marie-cmap-old

This map is based on programming by Kitty Munson Cooper. It doesn’t look too bad. It only has 3 colors: 2 blue colors for her dad’s side and one color for her mom’s side. The red is based on the results from her 1/2 great Aunt. The blue is based on paternal grandmother cousins.

Here is Marie’s family of DNA tested relatives:

marie-relationship

From bottom left to right we have the following that have had their DNA tested:

  • Fred, Fred’s sister
  • Pat, Buddy
  • 1st cousin John
  • 2 Paternal Aunts
  • Dad and Mom
  • Aunt Esther
  • In addition I have results from a Dicks DNA study

The Rule of 1st Cousin, 2nd Cousin Combo

In my previous blog, looking at my mother’s side DNA, I came up with a rule. That rule said:

In a triangulation group between a person’s 1st cousin and a second cousin, the second cousin will be able to identify which grandparent the 1st cousins share.

I would like to apply this rule to my wife Marie as she has 1 first cousin and 2 aunts who have tested their DNA. These 3 are like cousins as the common ancestors, the grandparents, are the same. Marie also has 2 first cousins once removed tested. These would be similar to 2nd cousins as they both have great grandparents in common.

mariepaternalrelationships

Basically, right now if Marie compares herself to John or her 2 Aunts Lorraine and Virginia, she doesn’t know if the shared DNA is from Estelle LeFevre or Edward Butler. However, a triangulation group (TG) with Fred, Fred’s sister, Pat or Buddy and John, Lorraine or Virginia, will show that DNA to be from Estelle LeFevre. Further, not just the match in common to the TG will be from Estelle, but the entire segment represented by Marie’s match to John or her 2 Aunts will be from Estelle.

That’s My Theory, Let’s Try It Out

I have a boatload of combinations to try this theory out on. First, I’ll go with Fred, Fred’s sister, John, Marie and her 2 aunts. I go to Marie’s one-to-many menu at Gedmatch and choose Marie’s relatives I just mentioned. Then I choose the Matching Segment CSV. This downloads a file of all the matches between these people, making it easy to find TGs. I could have used the Chromosome Browser but that only hints at TGs. However, I will use the Chromosome Browser to focus my search.
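Finding a TG from a list of matching segments comes down to checking that three people all match each other on an overlapping stretch of the same chromosome. The following is a rough sketch of that check. The tuple layout and the positions are assumptions for illustration and do not reflect the actual columns of the Gedmatch CSV.

```python
from itertools import product

def pair_segments(segments, x, y):
    """Segments shared by persons x and y as (chromosome, start, end)."""
    return [(chrom, start, end)
            for (p, q, chrom, start, end) in segments if {p, q} == {x, y}]

def triangulate(segments, a, b, c):
    """Regions where a-b, a-c and b-c all match on the same chromosome."""
    hits = []
    combos = product(pair_segments(segments, a, b),
                     pair_segments(segments, a, c),
                     pair_segments(segments, b, c))
    for (c1, s1, e1), (c2, s2, e2), (c3, s3, e3) in combos:
        if c1 == c2 == c3:
            start, end = max(s1, s2, s3), min(e1, e2, e3)
            if start < end:  # the three matches overlap
                hits.append((c1, start, end))
    return hits

# Made-up segments: (person 1, person 2, chromosome, start, end).
segments = [
    ("Marie", "Lorraine", 20, 5, 60),
    ("Marie", "Pat", 20, 10, 30),
    ("Lorraine", "Pat", 20, 12, 35),
    ("Marie", "Pat", 15, 34, 50),
]
print(triangulate(segments, "Marie", "Lorraine", "Pat"))
```

The Chromosome 15 segment does not triangulate here because only one pair of the three matches there.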

Chromosome 20 Example

chr20ex

The browser shows Marie’s matches to:

  1. Aunt Lorraine
  2. Cousin Pat
  3. Cousin Buddy
  4. Aunt Virginia

Here is how I have the Triangulation Group (TG) between these 5 mapped out:

patbuddytg

This shows a Triangulation Group (TG) between Pat, Buddy, Aunt Lorraine, Aunt Virginia and Marie.

Now a few observations:

  • The chromosome browser view above is from Marie’s point of view
  • Marie’s matches with Pat or Buddy (#2 and #3 on the browser) represent the DNA they share from either Martin LeFevre or Emma Pouliot. It is also likely that one segment is shared from each of Marie’s great grandparents.
  • These segments are represented in the Kitty Munson Cooper Chromosome Map at the top of this Blog.
  • The long segment shared between Marie and her Aunt Lorraine is from one of Marie’s grandparents. Because Pat and Buddy also match Aunt Lorraine, we may say for sure that the segment Aunt Lorraine shares with Marie must have come from Aunt Lorraine’s mother Estelle LeFevre.
  • Marie’s DNA she got from her grandmother Estelle is shown below.

munsonmaprev

The previous map had 2 blue segments on Chromosome 20 representing either of Marie’s paternal grandmother’s parents. We didn’t know which. Now it shows the one large segment taking up all of Chromosome 20 from her known paternal grandmother. The green should say Estelle LeFevre b. 1904 – not Emma Pouliot b. 1894.

Chromosome 15

On Chromosome 15 here are the same people, but in the following order: Aunts Lorraine and Virginia, Pat and Buddy.

mariechr15

kittymarie15

Interestingly, this time the program doesn’t overwrite the light blue. This is because the match for the light blue extends further than the match for the green. When I mouse-over the original map, it shows that the light blue match starts at about position 34 while the green match starts at about 35. Because of this, the entire blue match shows until its end and then the green match is shown.

This blue, light blue, green progression represents 3 generations of Marie’s ancestors on her paternal grandmother’s side.

Paternal Grandmother Results Using 1st Cousins, Once Removed

Here are the results of comparing Marie’s cousin and two aunts to her two 1st cousins, once removed. Here I correctly have Estelle LeFevre b. 1905 labeled for the green areas.

mariepatbuddychromomap

 

I didn’t bother doing the comparison for Marie’s X Chromosome. The reason is this. The X Chromosome that her dad gave to her, he got from his mom. That means that the green must extend for the whole X Chromosome. For that matter, the light blue would also be Marie’s paternal grandmother’s parents.

How to Identify Emma Pouliot?

That seemed to work well for Estelle, but is it possible to go back one generation further and identify one of Marie’s great grandparents by DNA? I think so. Let’s take a look. This time, I don’t want to look at Marie’s 1st cousin John or her 2 aunts. The reason for that is that when I compare Marie to those 3 people, the common ancestor would be Marie’s grandparents. I want to compare Marie to her 2 first cousins, once removed to find her great grandparents – or in this case her paternal grandmother’s mother Emma Pouliot b. 1874 in the Province of Quebec.

tgchr1

We are using the same principle as before, but going one rung up the ladder. I will look for a Triangulation Group (TG) between Fred, Fred’s sister, Pat, Buddy and Marie. Once I find that TG, I will take the DNA match between Pat or Buddy and Marie and assign that DNA match to Emma Pouliot.

Chromosome 1

Let’s try this out on Chromosome 1:

pouliotlefevrechr1

  1. Fred’s sister (2C,1R)
  2. Fred (2C,1R)
  3. Pat (1C, 1R)
  4. Buddy (1C, 1R)

It looks like there should be an overlap between #1 and #3, but they have no match there in the middle of the Chromosome. However, on the right side, there is a match between #1 and #3. Using my plan, I’ll assign Emma Pouliot to the green segment. In this case, #1 and #2 representing the parents of Emma Pouliot are larger. It would stand to reason that these would belong to Emma also. However, for consistency, I will just map Emma to the green segment.

When I tried to map this using the Kitty Chromosome Mapper, it didn’t show up as Estelle had already filled up that slot.

Chromosome 2

chr2buddyfred

This time the two 1C’s, 1R are on the top and the smaller segments representing Marie’s two 2C’s, 1R are on the bottom. Is there a TG? I lowered the Gedmatch thresholds, which I didn’t do for the first part of this Blog. Here is the match between the 1C, 1R and the 2C, 1R:

gedmatchchr23

They match on Chromosome 2, but a little below the 7 cM threshold. I’m not worried as I’ve read that in a TG a match is likely to be good down to 5 cM. That means that I will map Pat’s #1 green segment to Emma.

Unfortunately Estelle is taking up the space where Emma would be mapped on Kitty’s Mapper. This seems to be a trend.

Chromosome 13

I did the same exercise as above and mapped with no results. This time I took out the other references in the area of Chromosome 13 that were blocking Emma and got this:

emmachr13

Now we see Emma’s DNA in lighter green on Chromosome 13. The downside was that I took out some of Estelle’s DNA to the right of the light green area so Emma’s DNA match with Marie would show. Hey, I created this map; I can do what I want with it.

So that is what I found. My wife can lay claim to a lot of her grandmother’s DNA, but only 3 identified segments of her great grandmother’s DNA based on this procedure. Of course, one may say that every instance of finding the parents of Emma would be the same as Emma. Based on that idea, I’ll try another map.

emmaestelle-map

This map isn’t really any better, it is just meant to show that whether you have the parents or the child, it fills up the same area on the map. Note I have the same problem here where Estelle fills up the older Emma DNA on Chromosomes 1, 2, and 13.

Marie’s Dicks DNA

The idea for this section should be more straightforward. I have been involved with a Newfoundland Dicks DNA project. There are many people who have tested their DNA and found through triangulation to be likely related to the Newfoundland Dicks family. For example, here is a list of the Dicks Triangulation Groups (TGs):

Dicks TG Summary

These include the Dicks TGs except for the most recent few. Joan is near the middle of the chart. She is my wife’s mother. All I have to do is see if Marie is in any of the same TGs that her mother is in. Then I can take the match with the other 2 from the TG and assign that DNA to the appropriate Dicks ancestors.

Here is what was added (in yellow):

mariechromomap

All that was added was a probable Dicks segment on Chromosome 2. There were other Dicks segments but they were “behind” Upshall matches. That means that they are the ancestor of Frederick Upshall. The reason that the Chromosome 2 match stood out was that it was a match with Joan (Marie’s mom) and not with Marie’s great Aunt Esther (represented in red above).

Check Your Work

Fortunately, M MacNeill [prairielad_genealogy@hotmail.com] has looked at my wife’s family’s Chromosome 1. He has looked at the raw DNA which is more under the hood than what I am doing. Here is a small portion of his work. He phased Marie’s father and 2 aunts and then went back and put that information into Marie’s DNA.

macneillchr1marie

The interesting thing about MacNeill’s map is that it includes the DNA for Marie’s 4 paternal great grandparents. The cross-hatched area is where it was not possible to determine the crossover point. At any rate, MacNeill points out some errors in my Chromosome mapping for Marie. He has sections of salmon or pink indicating Richard’s paternal grandparents where I have Marie mapped to Richard’s maternal side.

This is when I go back to my spreadsheet for the details:

mariechr1notg

In the first part of Chromosome 1, it is clear that Marie does not match Pat, Buddy, Fred, or Fred’s sister, so I cannot call that a TG or a Paternal grandmother match for Marie. My original rule said that Marie had to be in a TG for my segment extending plan to work.

Here is where I removed 2 paternal grandmother segments on Chromosome 1:

mariechr1rev

However, on the right of Chromosome 1, MacNeill has more paternal grandfather DNA mapped where I again have paternal grandmother. In my defense, this was an area where, according to MacNeill, Fred and Fred’s sister appear to match on both the paternal grandmother and grandfather side. I couldn’t have known that as I only had information for the paternal grandmother side.

One Other Point on Emma Pouliot

emmaphoto

Above, I had mapped Emma Pouliot to Marie on Chromosome 1:

emmamappedsegments

Here is a larger view of what MacNeill had for Marie’s family’s Chromosome 1:

richard-chr1

The legend on the top line is difficult to read, but Pouliot is the darker red. More specifically, that would be Emma Pouliot. Marie is on the bottom line. The last vertical white line in Marie’s dark red area represents position 198. As I had mapped Emma from 197 to 207, that would put her in the end of the dark red area of Richard’s Pouliot maternal grandmother, before Marie’s DNA switches to the DNA she got from her dad’s paternal side in the salmon color. So at least my work agrees with MacNeill in this little area.

Summary and Conclusion

  • Most of the additional segments came from phasing the unknown grandparent using the DNA shared with the 1st cousins, once removed.
  • This method could work well along with the visual chromosome mapping that Kathy Johnston developed.
  • There is a fine distinction with mapping the DNA of one’s known grandparent and mapping the DNA of the parents of that known grandparent. When mapping to the parents, the individual segments could be from either parent. When mapping to the known grandparent, that larger segment could contain compound segments of the parents. It is a subtle distinction, but one that should be maintained in my opinion for future research.
  • Using the Kitty Mapping tool is fun and instructive as to how DNA works. It can be manipulated to show what one would like to be shown. For example, when I wanted to highlight the Emma Pouliot segment, I was able to do that.
  • Even with no paternal and maternal grandfather DNA matches for Marie, I have been able to fill out her Chromosome map quite a bit – mostly on her paternal grandmother side.

More Lentz/Nicholson DNA and the 1st Cousin, 2nd Cousin Combo Rule

A little over a year ago I decided to test my autosomal DNA at 23andme. I had tried the other 2 testing companies and was curious as to what 23andme was like. Perhaps I would have some more matches that I didn’t know about. The most interesting match that I found was my mother’s 1st cousin once removed. Her name is Judy. I asked her if she would upload her results to gedmatch.com for analysis. She tried a few times without success. Recently, she went back and successfully uploaded her results, so now I can write about them.

Lentz/Nicholson Lines

Judy descends from our common Lentz/Nicholson Line. Others that I have been in touch with and have tested for DNA are just from the Nicholson Line. The Nicholson Line is in red. The Lentz line is in yellow. The Lentz/Nicholson Line is in orange. From my early school days, I recall that if you mix yellow and red, you get orange. Judy and Joshua are on the orange line. My mom shows as green, but for the purposes of this Blog, can be considered orange also.

lentznicholschart

The bottom row indicates people that have had their DNA tested. There is also a further out line of Nicholsons that I don’t have included here.

I haven’t identified anyone yet who is only from the Lentz Line.

Here is Judy’s match with my mom at Gedmatch:

judymomgedmatch

Comparing cM’s for First Cousin Once Removed

Their total match of 269 cM is actually on the low side for 1C, 1R. Here is a Bettinger study showing the average DNA shared between 1st cousins, once removed as being in the 400 cM range:

bettinger1c1r

Not to be outdone by Blaine Bettinger, I also looked at some of my own family relationships to see how they compare:

joelcmstudy

So with just 8 people, I came to the same conclusion on the average amount of DNA that 1st cousins, once removed shared. Blaine took thousands of people to come to his result. Another side point of interest is that my brother Jon shares over 150 cM more with my dad’s first cousin (583.7 cM) compared to what my sister Sharon shares with my dad’s first cousin (421 cM).

Chromosome Mapping for Mom

Judy’s new DNA results update my mom’s Chromosome map in many of the red areas below:

momchromomapoct16

More About Judy’s DNA

Based on the tree, we can see a few things.

lentznicholschart

  1. If Judy matches Joan or Carol, that means the DNA has to be from the Nicholson side.
  2. If she matches my mom and no red people, then the DNA could be from Lentz or Nicholson.
  3. If Judy matches just Joshua, the DNA could be from Lentz, Nicholson or from the wife of William Lentz.
  4. If she matches my mom plus Joan or Carol, the match would be from the Nicholson side. If Judy matches Joshua plus Joan or Carol, the same should apply. However, this would have to be a triangulation group.
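
The four rules above can be written down as a small lookup. This is only a minimal Python sketch of my reading of them: the function name and the set of Nicholson-only testers are my own labels, and rule 4 still assumes that the people involved also match each other (a triangulation group), which the code does not check.

```python
# Joan and Carol are the Nicholson-only ("red") testers named in the rules.
NICHOLSON_ONLY = {"Joan", "Carol"}


def possible_sources(matches):
    """Return the lines a segment of Judy's DNA could come from,
    given the set of people she matches on that segment."""
    matches = set(matches)
    if matches & NICHOLSON_ONLY:
        # Rules 1 and 4: any match with a Nicholson-only tester
        # points to the Nicholson side.
        return {"Nicholson"}
    if "Mom" in matches:
        # Rule 2: Mom, but none of the Nicholson-only testers.
        return {"Lentz", "Nicholson"}
    if matches == {"Joshua"}:
        # Rule 3: Joshua alone adds one more possibility.
        return {"Lentz", "Nicholson", "wife of William Lentz"}
    return set()  # none of the four rules applies


print(possible_sources({"Mom", "Carol"}))  # {'Nicholson'}
```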

Judy’s Nicholson (or Ellis) DNA

William Nicholson

Here is an example of Judy’s Nicholson DNA. She matches both Carol and Joan who are not descended from the Lentz family.

judynicholson-match

These 3 are also in a triangulation group (TG) which means they match each other on Chromosome 13. Here is what that TG looks like on a family chart:

judytgnicholson

The same segment of DNA from Chromosome 13 has come down to these 3 women. We know that the DNA was from either William Nicholson or Martha Ellis but we don’t know which. So when I said that this was her Nicholson DNA, it could really be either Nicholson or Ellis DNA – but not both.
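
As a rough illustration of what a triangulation group requires, here is a minimal Python sketch. The segment positions are made up for the example, and the check is simplified: it only asks whether every pair matches and whether all the matching segments share a common stretch.

```python
from itertools import combinations


def shared_region(segments):
    """Return the (start, end) region common to all segments, or None."""
    start = max(s for s, _ in segments)
    end = min(e for _, e in segments)
    return (start, end) if start < end else None


def triangulates(pairwise):
    """pairwise maps a pair of names to their matching (start, end) segment.
    The group triangulates when every pair has a match and all of the
    matching segments overlap in one common region."""
    people = sorted({name for pair in pairwise for name in pair})
    for a, b in combinations(people, 2):
        if (a, b) not in pairwise and (b, a) not in pairwise:
            return None  # some pair does not match at all
    return shared_region(list(pairwise.values()))


# Hypothetical positions on one chromosome, for illustration only.
example = {
    ("Judy", "Carol"): (30.0, 62.0),
    ("Judy", "Joan"): (35.0, 70.0),
    ("Carol", "Joan"): (28.0, 80.0),
}
print(triangulates(example))  # (35.0, 62.0)
```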

In addition, like the next example below, Joan and Carol can know something else. They can know that the 51.4 segment that they share on Chromosome 13 is with Carol’s grandmother, Nellie Nicholson and not with Nellie’s husband. Before this match with Judy, they wouldn’t have known this.

a fine distinction on the Nicholson DNA

Here is an example of case #4 above where Judy matches both Carol and my mom, forming another triangulation group on a portion of Chromosome 18:

tg18

tgmomjudycarol

Again, Mom, Judy and Carol all share this specific segment from either William Nicholson or Martha Ellis. There is something else interesting about this chart. Judy and mom share that same DNA from Annie Nicholson. Usually when Mom and Judy match, they wouldn’t know which member of the couple the DNA came from. In this case their Chromosome 18 match came from Annie Nicholson.

That means Judy and my mom could assign that part of their DNA to Annie Nicholson. Also I could modify the Chromosome map for my mom that I did earlier in the blog. I think that I will do that.

chromomap18rev

On Chromosome 18, what I had as red is now in yellow. That means that the information is more specific. Interestingly, the orange on that Chromosome would also be from Annie, but because of who was matched to get to that, we say that it would be from one of Annie’s parents. It gets a little confusing. So at the point where the bar goes from yellow to orange, we are seeing further into the past when we see the orange part.

The practical part is that whenever someone matches my mom’s maternal side on that portion of Chromosome 18, she will know that it is a Nicholson (or Nicholson ancestor) match and not a Lentz match.

What about me?

I wonder if I share any of this Annie Nicholson DNA. Here is how Judy matches my brother Jon and 2 sisters Sharon and Heidi on Chromosome 18:

judychr18

Below is a chromosome map that I updated now that my brother’s DNA results are in. This indicates the DNA that my 3 siblings and I got from our 4 grandparents. The maternal side is in orange and green and the paternal grandparents are shown in purple and blue. My brother Jon’s yellow match with Judy above is within the orange area of the bottom F bar below. Sharon’s green bar match with Judy above corresponds to the second orange segment below on the S row. Heidi’s blue bar match above corresponds to her second orange (Lentz) segment below on the H row. I match my mother’s father’s Rathfelder side for most of Chromosome 18. That is shown in green in the 4th bar below (J row). So I didn’t inherit any Annie Nicholson DNA here where my 3 siblings did.

chr18maprev

This method maps to our 4 grandparents, so Nicholson is not shown. Annie Nicholson is one of my 8 great grandparents. However, we now know that two of my sisters’ and my brother’s orange bars came from our great grandmother Annie Nicholson by way of her Lentz daughter.

Judy’s Lentz (or Nicholson) DNA

Speaking of Annie Nicholson, here she is with her husband Jacob Lentz:

Jacob Lentz

Below is another triangulation group from Chromosome 1 that Judy is in with my mom and Joshua:

judymomjoshuatg

Here is the family chart again:

tg1chart

This time the DNA may be from either Jacob Lentz or Annie Nicholson – but not both. This same segment of DNA came down 2 generations to my mom, 3 to Judy and 5 generations to Joshua. We might guess that this is Lentz DNA. That is because there are no Nicholson only matches there, but we don’t know for sure.

The Rule of the 1st and 2nd Cousin Combo

In two of the examples above, there was a 1st and 2nd cousin combo – including a triangulation group.

In the first case, Carol and Judy are 1st cousins, once removed. As such, they couldn’t tell which grandparent’s DNA that they shared (Nicholson or Nicholson spouse). Enter my mom as Carol’s 2nd cousin. She is further out relationally and they match on the Nicholson Line at Carol and mom’s great grandparent level. This identifies Carol and Joan’s DNA as coming from the Nicholson side. How is this helpful? Now anytime that Joan and Carol match someone on that same segment, they will know that the match has to be along the Nicholson Line going up through the Nicholson ancestors. This narrows down the possibilities a lot.

The rule: In a triangulation group between a 1st cousin and a second cousin, the second cousin will be able to identify which grandparent the 1st cousins share.

I’m sure that is why it is said that it is important to test second cousins. The reason that I haven’t come upon this situation before is that this combination hasn’t come up on my father’s side. I have results of my father’s first cousin’s DNA and my own 1st cousin’s once removed, but no second cousins to compare.

Summary and Conclusion

  • Cousin Judy has been helpful in filling in my mom’s Chromosome map
  • Judy’s DNA results will also be helpful as I fill in my siblings’ and my own chromosome maps.
  • Judy’s results have partially phased the DNA. That means that, for at least one area, my mom can tell not only that she has a maternal match, but also that it is a maternal grandmother match (Nicholson).
  • I had thought that there would be a way to identify some of the Lentz DNA. However, I don’t see a way without finding a Lentz cousin who doesn’t descend from the Nicholson side. This would have to be a second cousin or further out.
  • Once Nicholson DNA is identified, it is more likely that the remaining non-Nicholson DNA could be from the Lentz side. However, that is not certain. It just represents more than a 50% likelihood.

A Hartley Z17911 STR Tree

In my previous blogs on Hartley YDNA, I mentioned that my terminal SNP is Z17911. That is a part of the L513 Branch of the larger L21 Branch of R1b. Here is what the L513 Branch looks like. This Tree represents those who have taken the Big Y Test in the colored area above.

l513chart

My Hartley Z17911 is difficult to see but it is slightly to the left of the middle and to the left of an orange area. The checkerboard pattern shows the part of England that my Hartleys are from. As far as I know I am the only Hartley that has had SNPs tested positive for Z17911, or for L513, for that matter.

STRs and Z17911

However, quite a few Hartleys have tested their YDNA. They have tested STRs. As a result, it is possible to do a comparison to others taking this test. STRs are not SNPs, which are a more definitive designation of where you are on the Y Tree. However, they can suggest what SNP you should belong to. I belong to an L513 project, and its Administrator, Mike, is actively looking for others that might be in L513. As a result, Mike has put out lists of people that appear to be L513 based on their STR patterns. I have mentioned in past Blogs that some of those people are Hartleys.

Here is a recent list:

suspectedz17911

The first on the list above is me. Then follows three other Hartleys. Administrator Mike has grouped these other 3 Hartleys next to me. Based on their STRs, he has grouped them as Z17911. This is even though these 3 have not tested for Z17911, L513, or probably not even for L21 which is way up on the Y Tree. The row with the orange, green and yellow above the results has what is called STR Rates. These are the rates at which each individual STR mutates. Some are very slow and some mutate relatively quickly. The selected mode above is likely the mode of L513. This will come in handy later on in this Blog in a few ways.

Z17911 and Signature STRs

It turns out that STRs form themselves into groups. That means that groups of people who are related by YDNA have combinations of STRs that are almost always unique to that group. Here I will make an assumption that the other 3 Hartleys are indeed Z17911, even though they haven’t tested their SNPs.

In the results section to the right of the Hartley names are the values for each STR marker. The colored values are the ones that vary from the L513 Mode. These values, especially the ones that are in the darker colors, will result in a signature for these Z17911 Hartleys. The darker colors indicate more of a variance or distance from the mode. Another way to put it is that the L513 mode is the older value and the Z17911 Hartley numbers are the newer values for the STRs that have mutated away from the L513 mode.

Up or Down?

These Z17911 STRs may mutate up or down. The blue shaded numbers are going down and the reds are going up. Why is this important? It is important as I’d like to build a tree from these 4 Hartleys. I will need to know who is descending from whom. Or at least, which of the 4 branches of Hartleys may be the oldest.

Here is an example:

str-example

These are some of the results of our 4 presumed positive Z17911 Hartleys. It is difficult to create a mode of these results as the mode is the value which occurs the most. If there are 2 of each value, which value do you use? This happens with the #449 Marker results. I am 31 at the top, but there are two 31’s and two 32’s. I have the L513 mode at the top of the image. The L513 mode value for Marker #449 is 29. That means I have the older 31 value, which is closer to 29, and the other 2 Hartleys have the newer 32 values. They are moving away from 29.
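
The tie-breaking idea can be sketched in a few lines of Python. This is a minimal sketch under the same assumption used here, that the tied value closer to the upstream (L513) mode is the older one and that no back mutation has occurred. The function names are mine.

```python
from collections import Counter


def modes(values):
    """Return every value tied for most frequent, sorted."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, n in counts.items() if n == top)


def older_value(values, upstream_mode):
    """Break a tie by choosing the tied value closest to the upstream mode."""
    return min(modes(values), key=lambda v: abs(v - upstream_mode))


hartley_449 = [31, 31, 32, 32]  # two 31's and two 32's, as in the text
print(modes(hartley_449))            # [31, 32]
print(older_value(hartley_449, 29))  # 31
```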

Defining Hartley Z17911 STRs

Next, I looked at all the STRs where the 4 Hartleys had different results. The other results are interesting but in comparing Hartley to Hartley they don’t matter if they are the same. Well, they might matter if there was a STR that mutated up and back down again, but the chance of that happening should be relatively small.

hartley-strs

Here I have compacted 67 STR results to 12. This is a good time to point out the STR rates. The rate for 447 is about 0.09. The rate for CDYb is 35. That means that CDYb will change over 350 times as fast as 447. Another point is that Hartley #4 seems to be a special case. He was categorized as a non-L513 person which was thought by the L513 Administrator Mike to be a mistake. I don’t know if that was ever resolved. I do note that some of his STRs are a bit different than the other 3 Hartleys, but not totally different. I also note that he has tested positive for R-L21, so perhaps this has been resolved.

But Wait, There’s More

I had forgotten, there is one more Hartley in the group. He doesn’t have a Hartley last name but believes that he is descended from the Hartley Line. Great news. I will call him Hartley #5.

5-hartleys

Previously, I had missed Marker 481. Also when I copied things, my numbers didn’t get colors, but that’s alright. Now I have 13 markers and 5 Hartleys.

References for Trees

I’m aware of 3 references for creating STR trees.

  • Robert Laurence Baber – He has written quite a few articles on STR trees. I have not read them all yet. I downloaded a 5 part study he wrote but I don’t totally understand his method yet – though I understand some of the principles. He uses an upstream STR mode as I tried to do above.
  • Robb Hand Drawn Tree example – He compares a hand drawn tree to the Fluxus software. Although he likes the hand drawn version better, he learns some from using the difficult-to-use Fluxus software.
  • Gleeson STR Tree – Maurice Gleeson gives a method and example of how to build a STR tree

More on Modes

I seem to be getting hung up on Modes:

more-modes

Here I have the L513 Mode and various modes from downstream SNPs. The 458 mode went quickly from 17 for L513 to 19 for S5668 and then appeared to stay there for quite a while. As a result, I chose 19 for the mode. Had I just looked at the older L513 Mode, I may have come to a different conclusion as to which way this STR was mutating.

Then the very fast CDYb seemed to move up in a regular way through the ages. Of course, in reality, it could have gone up and down over that period of time, but we wouldn’t know it if it did. I picked the lower 39 value for the CDYb STR at the Hartley mode level. To the right, I have the GD or genetic distance from the Hartley Mode. This says that these Hartleys should be related at about the same level – around 4 or 5 GDs or STR mutations.
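
For anyone who wants to check a GD by hand, here is a minimal Python sketch that simply adds up the step differences from a mode. It assumes every step counts as one mutation (so a two-step change counts as 2), and the tester's values are placeholders rather than the numbers in the chart.

```python
def genetic_distance(haplotype, mode):
    """Sum of step differences between a haplotype and a mode,
    over the markers present in both (simple stepwise counting)."""
    return sum(abs(haplotype[m] - mode[m]) for m in mode if m in haplotype)


# The 458 = 19 and CDYb = 39 mode choices come from the text; the 449
# value and the whole tester row are hypothetical, for illustration only.
hartley_mode = {"458": 19, "CDYb": 39, "449": 31}
tester = {"458": 19, "CDYb": 40, "449": 32}
print(genetic_distance(tester, hartley_mode))  # 2
```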

A 5 Hartley Likely Z17911 STR Tree

Here is the tree I came up with. It is along the line of and in the form of the Gleeson STR Tree example mentioned above:

5hartleytree

  • The Hartley common ancestor’s signature STR values are listed at the top. The mutations from that are shown down the branches to the individual Hartleys.
  • I also added some dates assuming that on average, a STR will mutate every 170 years given a test of 67 STRs. The lower horizontal lines above happen at the 2 or 3 STR mutation level (which is the same as the GD). The top horizontal line happens at a GD of 4 or 5. The Hartley #5 horizontal line is up higher as the 389b mutation is a double one from 16 to 18.
  • In the above scenario, Hartley #5 is by himself. Another scenario would have Hartley #4 and Hartley #5 together as they share a mutation at 389b. Instead, I chose the above tree due to Hartley #1 and #4, and Hartley #2 and #3, each sharing 2 STRs as a pair.
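
The date arithmetic behind the tree is simple enough to sketch in Python. This assumes the flat average of one mutation per 170 years for a 67-STR test mentioned above. The starting year to count back from is a parameter because it is a choice, not something the STRs tell us, and the 1950 used below is only an example.

```python
YEARS_PER_MUTATION = 170  # the average used in the text for a 67-STR test


def years_to_ancestor(gd, years_per_mutation=YEARS_PER_MUTATION):
    """Rough number of years represented by a GD."""
    return gd * years_per_mutation


def approx_year(gd, reference_year):
    """Calendar year reached by counting back from reference_year."""
    return reference_year - years_to_ancestor(gd)


# reference_year=1950 is an assumption made only for this example.
for gd in (2, 3, 4, 5):
    print(gd, years_to_ancestor(gd), approx_year(gd, reference_year=1950))
```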

This image shows some of my rationale for the tree:

5hartley-groping

I chose the double combo of 25-32 that Hartley #2 and #3 shared. I also chose the double combo of 17-40 (in yellow) that Hartley #1 and #4 shared. Other possible single combos that I didn’t choose to group were the two step 16>18 mutation for Hartley #4 and 5, the 11 mutation for Hartley #1 and 5 and the 16 mutation for Hartley #1 and 3. The principle used is to try to get the tree as simple as possible. This is what Gleeson calls the parsimony principle. My assumption is that my groupings achieve that goal.
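
To show how the grouping choice can be made a little more mechanical, here is a minimal Python sketch. It is a simple greedy pairing, not a full parsimony search, and the marker labels A to G are placeholders. Only the shared values (25-32, 17-40, 16>18, 11 and 16) come from the discussion above.

```python
from itertools import combinations

# Hypothetical marker labels; each set lists one Hartley's values that
# differ from the common ancestor's signature.
mutations = {
    1: {"A=17", "B=40", "D=11", "E=16"},
    2: {"F=25", "G=32"},
    3: {"F=25", "G=32", "E=16"},
    4: {"A=17", "B=40", "C=16>18"},
    5: {"C=16>18", "D=11"},
}


def shared_counts(mutations):
    """Count the mutations each pair of testers has in common."""
    return {
        (a, b): len(mutations[a] & mutations[b])
        for a, b in combinations(sorted(mutations), 2)
    }


def pair_up(mutations, minimum_shared=2):
    """Greedily pair testers, taking the pairs with the most shared
    mutations first and using each tester at most once."""
    ranked = sorted(shared_counts(mutations).items(), key=lambda kv: -kv[1])
    used, pairs = set(), []
    for (a, b), n in ranked:
        # Pairs sharing fewer than minimum_shared values are left ungrouped.
        if n >= minimum_shared and a not in used and b not in used:
            pairs.append((a, b))
            used.update((a, b))
    return pairs


print(pair_up(mutations))  # [(1, 4), (2, 3)]
```

With these inputs Hartley #5 is left on his own branch, which matches the tree I chose.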

How Do the Hartleys Compare to the Z17911 Mode?

In comparing Hartleys to the Z17911 Mode, I go from the age of surnames to before the age of surnames. There are 4 that have tested positive for Z17911. They are Hartley (me), Goff, Thomas and Merrick. In that group, the level of GD’s and the variance in surnames indicate a pre-surname common ancestor.

So the GD’s will be further back also.

z17911gds

Here I am assuming no back mutations. Under the previous tree I assumed that Hartley #5 had a back mutation at CDYb. Due to the volatility of this marker, it is sometimes ignored in these analyses. Notice that now the range of GDs is from 3 to 8. Again, I group Hartley #1 and #4 together and Hartley #2 and #3 together.

z17911tree

Hartley #4 has the GD of 8. This is due to 2 double mutations. That pushes back his connection to Z17911 to around the year 600. This seems to be pushing back to a possible age of Z17911. Z17911 positive Thomas has submitted his Big Y results to YFull, so I am hoping to get a date from YFull for Z17911. It will be interesting to see what they come up with. The structure of the tree is the same as the previous Hartley Tree. I just adjusted the relative heights of the horizontal arms.

Summary and Conclusion

  • STRs from 5 Hartleys who have tested their YDNA seem to indicate a relatively close relationship – at least in YDNA terms
  • I have had my SNPs tested and the administrator of the R1b-L513 project has grouped the other STR-testing Hartleys in the same Z17911 group as me based on similar STR patterns. That is quite a way down the SNP tree.
  • If any of these Hartleys were to test for the L513 SNP or further down for Z17911, it could confirm what the STRs seem to be saying. Then I wouldn’t be the lone SNP tested Z17911 Hartley.
  • SNPs create a solid reliable marker for relationships. It is best to have the SNP relationship established through testing before doing this type of STR analysis. However, even without SNP testing, STR trees can be informative
  • Back mutations and the different mutation rates leading to unpredictable STR mutations are the 2 major variables that make STR testing less accurate than SNP testing
  • The weakness of the SNP testing is that many have not done it. The other issue is SNP testing may only take you up to a certain date. After that date, STR analysis is more useful
  • STR testing is best used in conjunction with SNP testing
  • Making a STR tree takes some practice and knowledge of STRs and mutations.
  • This YDNA research and resulting connections could shed light on the history of this branch of the Hartley family over the past 400-1400 years or so.

 

Updates to Whitson, Whetstone and Butler YDNA: A Proposed Whitson/Butler Tree

There has been some good news since my last Blog on Whitson and Butler YDNA. I wrote that almost 2 months ago. The biggest news is that there are new people in the group.

whitsonbutlerydnatestees

There is now one new category – R1b>R-M239 Whetstone (in yellow). There are 2 new people there. There is a new person in the I1>M253 Whitson/Whetstone Group (McIntyre). There is a new Whitson under I2>M223 who has taken the 111 STR test which is one of the best available. He shows up under the green section as having an ancestor Jacob Whitson. I believe that he had tested before when Ancestry had YDNA testing, but unfortunately, it is not easy to compare the two tests. His results are of special interest to me as he is in the group with my Butler father in law. There are now 3 Whitsons and 3 Butlers in this I2 Subgroup.

In this Blog, I will be analyzing and drawing trees for the green I2 Whitson/Butler Subgroup as they have the most in the group. With too few people in a group, it is difficult to draw trees.

YDNA – What Does It All Mean?

As many know, YDNA shines a laser beam down the male line to the far past. YDNA can quickly show who is not related. For example, in the chart above, the people in the different colored subgroups cannot be related on the male line in any recent time frame. The connection between these groups could be in the 1,000’s or 10’s of thousands of years. To find who is related by YDNA is more difficult. The probability of a relationship is predicted. This is because distance is measured in STRs and STRs can mutate whenever they want, even though on average they all mutate at a certain rate. Then some STRs may mutate faster than others – or much more slowly.

The TIP Report

FTDNA’s TIP Report is a good tool, because it estimates how closely 2 people may be related in generations based on probabilities. It takes into account the number of STRs tested and the rate at which the STRs mutate.

batt and butler TIP

i2whitson-burtler

First, we will look at #1 and #4 on our list. They both tested at 111 STRs. The Report shows the likelihood that those 2 would share a common ancestor in the previous generations:

batt-peter

I usually feel that 90% is pretty likely. Let’s say a generation is 34 years. Twelve generations would then be 408 years ago, which is about 1608 counting back from now, or even further back if we count from the birth of someone who is alive today and was born in the 1950’s. Then again, the common ancestor could be as close as 4-8 generations. Hopefully, we would know if the match was 4 generations ago, but the point is that the number of generations to a common ancestor could vary quite a bit.
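
The arithmetic here can be sketched in Python. The 34-year generation is the assumption from the text, 12 generations is what 408 years works out to at that rate, and the two starting years are illustrative (2016 is simply 1608 plus 408).

```python
def years_back(generations, years_per_generation=34):
    """Years covered by a number of generations."""
    return generations * years_per_generation


def ancestor_year(generations, start_year, years_per_generation=34):
    """Approximate year of the common ancestor, counting back from start_year."""
    return start_year - years_back(generations, years_per_generation)


print(years_back(12))           # 408
print(ancestor_year(12, 2016))  # 1608
print(ancestor_year(12, 1950))  # 1542, counting from a 1950's birth
print(years_back(4), years_back(8))  # 136 272, the closer 4-8 generation range
```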

I did a comparison for everyone in the Green Group above:

tipchart

I found the results quite interesting:

  • Mr Batt appears to be the same distance from each person in this group – irrespective of whether the match is a Butler or Whitson descendant
  • #4 Butler varies the most between 8 and 18 generations
  • #3 Butler was on average related most closely to the group
  • It appears that a sort of tree could be drawn from these results
  • It appears that this group of Whitsons and Butlers have been related to each other for quite a while. The number 12 comes up a lot for generations to a common ancestor. My guess is that these two families have been related to each other for between 8 and 12 generations

These are my interpretations from just the TIP Report so far. I am open to other theories.

A tree from tip reports

I have never seen a tree drawn from these TIP Reports, but it would be interesting to try. Here is my first try:

whitbuttreept1

This shows the furthest and closest relationships based on the TIP Report. #4 is 17 generations away from #2 and #4 is 8 generations away from #3. Now I just need to add one more Butler and 2 more Whitsons. But how? Here is a simple solution:

simple-tree

Here this assumes that all the GDs above 8 are pretty much equal and that everyone matches above at the common Whitson/Butler Ancestor. Here is another option:

tip-tree-2

This looks nicer, but I can’t say that it is more accurate given the TIP Reports. Here is a 3rd try:

tiptree3

This doesn’t seem to do the TIP Report justice either. I’ll go on to the more traditional trees made using STRs.

STR Analysis

I’ll now try to create a tree using a method developed by Robert Baber in 2014. Here is an example of one of his trees:

baber-example

In my previous Blog, I looked at signature STRs. Those are the similar STRs that define a group. However, to create a tree, I will be looking at the STRs that are different.

I2 Whitson/Butler STRs

Here is a chart of the defining differences in the I2 Whitson/Butler Group:

i2whitsonbutlerstrs

modes

The first mode above is an I-A427 mode from the FTDNA I-M223 Y Haplogroup Project. So this mode should be a more generic version of the Whitson/Butler Group. The assumption is that the mode for this larger group goes back further in time than the Whitson/Butler Group. The reason that this is important is that it should tell us which way the STRs are moving.

  • In the first column with numbers above, the A427 mode is 29, the W/B Mode is 31 and 6 Butler (Michael) is 32. That means the STRs are mutating up.
  • Look at DYS576. That is a red STR. That means it is a fast mover. A427 mode is 18, W/B mode is 16 and Batt is 15. That means that the trend of STR mutation is going down over time.
  • CDY is a fast mover and difficult to interpret. Some people might ignore the CDY results for this reason.
  • Finally, look at the last 2 columns above. The A427 (older) modes are 14 and 12. The Whitson/Butler modes are 16 and 14. That would indicate that the trend in STR values is upward. However, at that level of STR testing (111), the 2 Whitsons are at the higher level and the Butler is at the lower STR level. If we were just looking at the 3 Whitson and Butler STR results here in isolation, we would think that the Whitson higher level STRs were older and that Butler is changing away from them. However, by using the broader I-A427 vantage, we can see that it is likely that it is Whitson changing away from Butler. This could have implications as we try to determine who came first – the Butlers or the Whitsons in this I2 subgroup.
  • It is possible that if all those in the I2 group had tested for 111 STRs, that the above point would be clearer.
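
The reasoning in these bullets, using an older mode to decide which way a marker is moving, can be sketched in Python. This is a minimal sketch that assumes a steady trend with no back mutations. The function name is mine, and the numbers are the ones quoted in the first two bullets.

```python
def trend(upstream_mode, group_mode, tester):
    """Describe the direction a marker is moving, oldest value first."""
    steps = [group_mode - upstream_mode, tester - group_mode]
    if all(s > 0 for s in steps):
        return "up"
    if all(s < 0 for s in steps):
        return "down"
    return "unclear"  # mixed or flat steps; the simple rule does not decide


print(trend(29, 31, 32))  # up   (A427 mode, W/B mode, #6 Butler)
print(trend(18, 16, 15))  # down (DYS576: A427 mode, W/B mode, Batt)
```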

Just based on the last 2 STRs of the 67-111 STR results, I would draw a tree like this:

butlerwhtson111tree

Unfortunately, I am having a lot of trouble understanding the Baber Paper and I am pulling the plug on that method for now. However, there are interesting concepts in it that are helpful.

From Baber to Robb

John Bartlett Robb put out a paper in 2012 called:

Fluxus Network Diagrams vs Hand-Constructed Mutation History Trees

In that paper Robb gives a procedure for drawing trees.

In his paper, Robb uses only the STRs in common, so in our case, that would be the 37 STRs. He also creates a Root Prototype Haplotype (RPH). In our case that RPH would just be the Whitson/Butler Mode. Then he notes deviations from that RPH in lime green:

robbstrs

Here are the Mutation Rates for the applicable STRs extracted from the Robb Paper:

mutation-rates

The faster mutations are on the bottom and slower ones on the top. I added in the people on the right that had the mutations. On 37 markers, everyone had one mutation except for Butler (James) who had 3.

Proposed Whitson/Butler Tree

Here is the tree I came up with based on 37 STRs:

proposed-whitsonbutler-tree

From there, I recall a rule by Baber which says, in my terms, “you should only have 2 lines going into each box”. Here is a tree that meets that rule:

treebaberrule

So reading down from the top, we have the common ancestor which I have as Butler Ancestor 3. That ancestor has a certain signature based on STRs. Then I have my father in law branching off with a 389ii that goes from 31 to 32. I took my father in law as the first mutation as he had the second slowest mutation after #4 Butler. I couldn’t choose #4’s slowest mutation at that point as that mutation apparently happened after the common mutation (of 570 22 to 23) he had with #3 Butler. Branching down from Butler Ancestor 2 is Whitson Ancestor 2. From him I have #2 Whitson (Jacob) branching off as he has a slow moving STR also. Then from Whitson Ancestor 1, I have #5 Whitson (Isaac) and #1 Batt (Wm Whitson).

Also from Butler Ancestor 2 I have the common mutation of STR 570 which went from 22 to 23 in a presumed common ancestor of #3 Butler (Laurence) and #4 Butler (James). After this common mutation, the #4 Butler line had two additional mutations – one on the very slow mutating STR and one on the very fast mutating one.

The technique takes a little logic, a little guesswork and some knowledge of how the STRs mutate. If I had plugged #6 Butler into Butler Ancestor 2 and Whitson Ancestor 2 into Butler Ancestor 3, it wouldn’t have made much difference. I did it the way I did based on the speed of the STR’s mutation rate – all other things being equal. The overall idea is to get from the common ancestor signature STR to the individual members’ STRs.

I think the above tree is a likely scenario considering:

  • I see the Whitson STRs changing off the Butler STRs in my charts above.
  • The Butler STRs are slightly slower changing STRs which could indicate an older line.

Some other points:

  • It is likely that the Whitsons and Butlers are grouped together by surname as I have them.
  • The Butlers all descend from Ireland. If the chart is correct, then the Whitsons in Subgroup I2 could also descend from Ireland. A more complicated speculation would have both lines in England. Then the Butler line could have gone to Ireland and the Whitson Line to the U.S.

Raw Data Phasing Part 4: Going from 3 Siblings to 4

In my last Blog, I mentioned that my brother Jon’s DNA test results came in this week. This happened in the middle of my attempt to learn how to phase the raw DNA data for my 2 sisters and myself. I was phasing the data in what I can only assume is a traditional way. I say I assume, as I haven’t seen any other blogs on the process. The difference is that I am using MS Access which I hope will speed up the process. I should be able to get results for 23 chromosomes at a time instead of just one at a time.

The arrival of the new DNA results poses at least two problems:

  • The previous 4 DNA data files were all in AncestryDNA version 1. Jon’s is in AncestryDNA2. While they are all Build 37, they look at somewhat different points on the chromosomes
  • One of the difficult parts of the previous process was identifying and dealing with patterns of phased paternal and maternal bases. Those patterns were AAB, ABA, and ABB. With 4 siblings, there will be more patterns. However, the Whit Athey Paper I have been following does also look at 4 siblings.

AncestryDNA Version 1 Vs. AncestryDNA Version 2

My understanding is that Ancestry changed the locations on the chromosomes that they were testing to get more into the medical area like 23andme. I don’t know if that is true. Here is a chart comparing the different atDNA tests:

ancestrydna-compared

I was doing well comparing Anc1 with Anc1 as I was looking at over 700,000 base pairs among 4 people. Once I compare Anc2 to Anc1, that number is cut down quite a bit. That is about a 40% drop. My only other option, other than re-testing Jon, is to compare Jon to my mother’s FTDNA results. However, that will only pick up 2-3,000 SNPs, so I won’t bother.

Back to Square One with 4 Siblings: Homozygous Siblings

I need to find Jon’s equal base pairs and apply one to his ‘from dad’ column and one to his ‘from mom’ column. That is, after I add all Jon’s data to my database and add those columns. First I need to decide where to add Jon’s data. I could add it to the beginning of what I have already done or to the end. I’ll try adding it to the end, because I think that the work I did already is OK. I want to build on that. So rather than adding Jon’s DNA to the first step, I’ll add it to my table called tblMomBaseFromDadBase. This table has over 700,000 lines of bases for 4 people. Jon’s has 668,942 lines. Actually, when I remove “Chromosomes” 24-26, I will only have 666,531 lines.

Querying Jon into my latest table

Here I am adding Jon and the Mom from Dad Table to my query design:

adding-jon

Access thinks the ID that it added was important, but it really isn’t, so I need to take out that equal join. I really want the join to be at the rsid, but I don’t want an equal join. Why not? If I had an equal join, I would end up only with the positions that Jon has. I will lose 40% of the work that I have already done. Instead, I’ll use an unequal join.

unequal-join

I flipped the 2 tables in the query design area, so things are moving left to right. Then I chose a #2 join, which is basically an unequal (outer) join left to right: it keeps every row from the left table whether or not the right table has a matching rsid.
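
For readers who do not use Access, the difference between the two join types can be sketched in Python with dictionaries keyed on rsid. The rsids and genotypes below are made up, and the function names are mine. As I understand it, join option 1 in Access corresponds to the first function and option 2 to the second.

```python
def equal_join(left, right):
    """Keep only the rsids found in both tables."""
    return {rsid: (left[rsid], right[rsid]) for rsid in left if rsid in right}


def left_outer_join(left, right):
    """Keep every rsid from the left table, with None where the
    right table has no row for that rsid."""
    return {rsid: (left[rsid], right.get(rsid)) for rsid in left}


# Hypothetical rsids and genotypes, for illustration only.
siblings = {"rs1": "AG", "rs2": "CC", "rs3": "TT"}
jon = {"rs1": "AA", "rs3": "CT", "rs9": "GG"}

print(equal_join(siblings, jon))       # rs2 is lost
print(left_outer_join(siblings, jon))  # rs2 is kept, paired with None
```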

Actually, I changed my mind. I have a better idea. I will just do the first 2 steps on Jon’s raw DNA and then join the results together. That is a third way that I hadn’t thought of. The point is, that there are many ways to do things in Access. There can be more than one way to get to where you want to be.

Back to Homozygous Siblings

First I copied Jon’s raw data into a table called tblJonHeterozygousSib. This is so I can use an update query to update the data in the new table and still have the original. Hold that idea. The better idea is to use a make table query. The reason that this is better is that it can take out the “chromosomes” I don’t want:

make-table-query-jon

I took out the table I copied and I’ll make a better one with only Chromosomes 1-23. I hit the Run button and create a table with 666,000 lines:

jonhomosib

Then in the above table, I inserted 2 columns: JonFromDad and JonFromMom. Now this table is ready to phase for any homozygous siblings. By the way, it looks like my Chr23 or X is homozygous, but it isn’t. Ancestry adds an extra base. I only really have one for my X Chromosome.

Finally time to query and phase

I go to Query Design in Access and choose the above table. This is a very simple Update Query design:

qrysibhomojon

This says if Jon’s allele1 is the same as his allele2, put allele2 as his base from mom and as his base from dad. I hit the Run button for the update and get the dire warning that I’m updating a lot of information and can never change it back. Then I get a message that I’m updating 478,000+ rows. That is good. That is the number of Jon’s homozygous bases – quite a few. I’d say over two thirds.
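
Outside of Access, the same update can be sketched in Python. This is a minimal sketch using a list of dictionaries in place of the table. The sample rows are made up, and the field names follow the columns described above.

```python
def phase_homozygous(rows):
    """Where allele1 equals allele2, copy that base into both the
    from-dad and from-mom fields. Returns the number of rows updated."""
    updated = 0
    for row in rows:
        if row["allele1"] == row["allele2"]:
            row["JonFromDad"] = row["allele2"]
            row["JonFromMom"] = row["allele2"]
            updated += 1
    return updated


# Hypothetical rows, for illustration only.
rows = [
    {"rsid": "rs1", "allele1": "A", "allele2": "A", "JonFromDad": None, "JonFromMom": None},
    {"rsid": "rs2", "allele1": "C", "allele2": "T", "JonFromDad": None, "JonFromMom": None},
]
print(phase_homozygous(rows))  # 1
```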

I’m not looking for crazy results and didn’t get any.

Homozygous Mom Query

I’ll copy my previous table into one to update. Then I need to add Jon’s base from mom where mom is homozygous. Easy peasy. I think this is all I need.

momhomoupdatequery

Actually, I did think of an issue. I have an equal join. That means I won’t be using the homozygous bases that mom tested for in the old AncestryDNA test that aren’t in the new AncestryDNA test list. My guess is that is interesting information but perhaps not very useful. It also occurs to me that in the spots where Jon doesn’t match up with my siblings, I will still have the 3 letter pattern work that I had done previously.

The query above says if Mom allele 1 = 2, then put that 2 allele in Jon’s from Mom base slot. I hit Run and pasted 277,000 rows of bases.

homomomforjonresults
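As a rough sketch of what this homozygous Mom update does, here is the same logic in Python. The names are assumptions for illustration: Mom’s data is a dictionary keyed by rsid, and Jon’s alleles in the sample rows are made up. Only the fact that rs3819001 (ID 126) is absent from Mom’s file and that Mom is GG at ID 128 comes from the text.

```python
def phase_from_homozygous_mom(jon_rows, mom_by_rsid):
    """Where Mom is homozygous, her base must be the one Jon got from her."""
    updated = 0
    for row in jon_rows:
        mom = mom_by_rsid.get(row["rsid"])
        if mom is None:
            continue  # equal join: rsids missing from Mom's file are skipped
        allele1, allele2 = mom
        if allele1 == allele2:
            row["JonFromMom"] = allele2
            updated += 1
    return updated

mom_by_rsid = {"rs_example_128": ("G", "G")}  # hypothetical rsid for ID 128
jon_rows = [
    {"ID": 126, "rsid": "rs3819001", "allele1": "A", "allele2": "G", "JonFromMom": None},
    {"ID": 128, "rsid": "rs_example_128", "allele1": "A", "allele2": "G", "JonFromMom": None},
]
updated = phase_from_homozygous_mom(jon_rows, mom_by_rsid)
```

The row for ID 126 stays empty because there is nothing on Mom’s side to join to, which is the behaviour the equal join gives in Access.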

This query will be a little more difficult to check. I have to create a query linking my mom’s DNA results to this table. I did that and see one problem already.

momrawtojonfrommom

The first problem is that ID 126 didn’t show up. That means that rs3819001 that Jon has is not in my mom’s raw DNA. I don’t want to have data for Jon that looks like it can be updated, but it can’t.

I think I can fix this.

Updated Table Query

A few steps ago, I ran a Make Table Query to get just Chromosomes 1-23 into Jon’s Table. I need to update that query so that I am only including the locations (rsid’s) that are common to both my mother and Jon. I do this using an equal join on the rsid Field:

updatedtablequeryforjon

This time, my table for Jon only has the rsid’s that my mom has.

newtable-upda
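The effect of the updated Make Table Query can be sketched in Python as a filter. This is an illustration only: the function name, the row layout and the sample rows are assumptions, and the chromosome field is assumed to be stored as a number.

```python
def make_common_table(jon_rows, mom_rsids, max_chromosome=23):
    """Keep Jon's rows on Chromosomes 1-23 whose rsid also appears in Mom's data."""
    return [
        dict(row, JonFromDad=None, JonFromMom=None)  # add the two empty phasing columns
        for row in jon_rows
        if row["rsid"] in mom_rsids and 1 <= row["chromosome"] <= max_chromosome
    ]

# Made-up rows, for illustration only.
jon_rows = [
    {"rsid": "rs_example_1", "chromosome": 1, "allele1": "A", "allele2": "G"},
    {"rsid": "rs_example_2", "chromosome": 1, "allele1": "C", "allele2": "C"},
    {"rsid": "rs_example_3", "chromosome": 25, "allele1": "T", "allele2": "T"},
]
mom_rsids = {"rs_example_1", "rs_example_3"}
table = make_common_table(jon_rows, mom_rsids)
```

Building the table this way means every row that remains can, in principle, be updated from Mom’s data.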

Also my Chromosome formula was off, so I had to fix it. Also note that I have about the same number of rows as in my Anc1 vs. Anc2 table earlier in the Blog. I then re-added the Jon from Dad and Mom columns into the new and improved table. Then I reran the update query which told me I was about to update 284,000+ rows.

homozgygousjonupdate

This worked as well as last time, but this time I have the fewer rows I was trying to get.

Re-Run the update query for homozygous mom for jon

I double clicked on my old update query. The message said I was updating 277,000 rows or so. Now I’ll re-check my work. If there is no ID 126, I’ll be happy. Well it is still there, because I forgot to copy the previous homozygous sibling table into the homozygous mom table. After re-re-running the update, I got the desired results:

tableno126

And there you [don’t] have it: no ID 126. Here is my mom’s raw file compared to Jon’s updated table.

momrawtojonfrommom

Jon gets a G from mom at ID 128 even though Jon is AG, because mom is GG. Now I’m talking DNA.

Merge Jon’s New Table with His 3 Siblings’ Tables

This is the point where I put everything together. I will try to use the Make Table Query for this one again. So I’ll put my newest Jon table together with my newest sibling table.

left-to-right-merge

This shows the left to right arrow join. I’ll want the larger file plus everything equal in the smaller file. Come to think of it, this Make Table Query would have fixed the earlier problem I had. I guess I was too careful! The other issue is that the ID in the 1st table won’t be the ID in the second table. I could keep the second ID, but I would have to rename it as Jon ID or Anc2ID.

newidtablemerge

Here I rename Jon’s IDs as JonID. I may not need it, but if I do need it I will have it. I guess MS Access wasn’t happy with my idea:

autonumber

OK, I took out the JonID and hit Run. Microsoft tells me about my new 700,000 row table.
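For comparison, the left to right join used for this merge can be sketched in Python. The column names (JonAllele1 and so on) and the sample rows are assumptions for illustration, not the actual field names in the Access table.

```python
def left_join(sib_rows, jon_rows):
    """Keep every sibling row; attach Jon's columns where the rsid matches."""
    jon_by_rsid = {row["rsid"]: row for row in jon_rows}
    merged = []
    for sib in sib_rows:
        jon = jon_by_rsid.get(sib["rsid"], {})  # empty dict when Jon lacks this rsid
        merged.append({
            **sib,
            "JonAllele1": jon.get("allele1"),
            "JonAllele2": jon.get("allele2"),
            "JonFromDad": jon.get("JonFromDad"),
            "JonFromMom": jon.get("JonFromMom"),
        })
    return merged

# Made-up rows, for illustration only.
sibs = [{"rsid": "rs_example_1", "ID": 1}, {"rsid": "rs_example_2", "ID": 2}]
jon = [{"rsid": "rs_example_2", "allele1": "A", "allele2": "G",
        "JonFromDad": "A", "JonFromMom": "G"}]
merged = left_join(sibs, jon)
```

Only the sibling table’s ID is carried over, which matches the decision above to drop the JonID.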

Back to the Dad Patterns

Now that all the family is together I want to look at Dad Patterns, because I know that I will be updating those. Here is the first query I tried on my new Table of 4Sibs.

sharon-not-joel

This is looking for filled in Dad bases where Sharon’s base is not the same as Joel’s. That query gives me an ABAA pattern:

abaa-pattern

Also ABBB:

abbb

Here’s ABBA:

abba

It looks like ABAB is a possibility also. That means the following are possible:

  • AAAB
  • AABA
  • AABB
  • ABAA
  • ABAB
  • ABBA
  • ABBB

So if I chose Joel’s Base not equal to Sharon and then Joel’s base equal to Sharon would I have every combination? It looks like I need this combination to cover all possibilities:

  • Joel <> Sharon OR
  • Joel <> Heidi OR
  • Sharon <> Heidi OR
  • Heidi <> Jon OR
  • Jon <> Joel OR
  • Jon <> Sharon

Which in Access looks like:

access-pattern-combos
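The six comparisons are simply every pair that can be made from 4 siblings. A small Python sketch of the same test follows. It assumes a dictionary of From Dad bases per sibling, and it assumes that a pair with a missing base is ignored, which is how a comparison against an empty field behaves in Access.

```python
from itertools import combinations

SIBLINGS = ("Joel", "Sharon", "Heidi", "Jon")

def has_dad_pattern(from_dad):
    """True when at least one pair of filled-in Dad bases differs."""
    bases = [from_dad.get(name) for name in SIBLINGS]
    return any(
        a is not None and b is not None and a != b
        for a, b in combinations(bases, 2)
    )

pair_count = len(list(combinations(SIBLINGS, 2)))  # how many OR lines are needed
same = has_dad_pattern({"Joel": "C", "Sharon": "C", "Heidi": "C", "Jon": "C"})
differs = has_dad_pattern({"Joel": "C", "Sharon": "T", "Heidi": "C", "Jon": "C"})
```

A row where all four Dad bases agree is not selected, so this test finds every pattern except AAAA.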

But Wait, I Forgot Principle 3 for Jon

Principle 3 says where Jon is heterozygous and he knows where he got his maternal base, the other base goes into his From Dad column. Looking back at my old queries, I see this is a 2 step query. I’m tempted to try this in one step, but I think this got me in trouble before, so I’ll go with the simpler query. Simpler queries are usually better in MS Access.

jonhetero

This says where Jon is missing a phased allele from Dad and he has an allele that doesn’t equal the one he got from mom (making Jon heterozygous here) put that allele into Jon’s From Dad spot. I tried the query and only got 37 results. The problem is, I should have said ‘Is Null’ in the JonFromDad Criteria:

jonheteroisnull

This time I get 35,000 updates, so that is right. I then change the allele1’s to allele2’s above and get 33,000 updates to tbl4Sibs. I ran a quick query on the 4Sibs Table to get just Jons heterozygous results:

jonheterocheck

In the first line, Jon had allele1 as T which was different from the allele from Mom of G, so Jon’s T got put into the From Dad spot. At ID 41, Jon’s allele2 of G is from Dad because he had an A from Mom. When parent and child are heterozygous, the From Parent location remains blank.
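Here is a minimal Python sketch of Principle 3 as applied to Jon, covering the allele1 and allele2 steps in one function. The field names are assumptions. The two sample rows follow the examples above, with the allele not mentioned in the text assumed to be the maternal base. The check that Mom’s base is actually one of Jon’s two alleles is an extra precaution that is not part of the Access queries.

```python
def fill_dad_from_mom(row):
    """Principle 3: if the maternal base is known, the other allele is from Dad."""
    alleles = (row["allele1"], row["allele2"])
    if row["JonFromDad"] is not None:   # the 'Is Null' criterion: never overwrite
        return False
    if row["JonFromMom"] not in alleles:  # also covers a missing maternal base
        return False
    for allele in alleles:
        if allele != row["JonFromMom"]:
            row["JonFromDad"] = allele
            return True
    return False  # homozygous row: nothing to do here

first = {"allele1": "T", "allele2": "G", "JonFromMom": "G", "JonFromDad": None}
id_41 = {"allele1": "A", "allele2": "G", "JonFromMom": "A", "JonFromDad": None}
unknown = {"allele1": "A", "allele2": "G", "JonFromMom": None, "JonFromDad": None}
results = [fill_dad_from_mom(r) for r in (first, id_41, unknown)]
```

The last row shows the case where Jon is heterozygous but the maternal base is not known, so the From Dad spot stays blank.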

Now I have Jon with 3 Principles: Homozygous Jon, Homozygous Mom and Heterozygous Jon.

Back to Dad Patterns

I have the old Dad Patterns for 3 siblings. Now I need to see what the 4 sibling Dad Patterns would be and add Jon’s Start and Stop Locations for his new Dad Pattern Areas. I need to combine that with the 3Sibs Table.

wrongpattern-query

My first query was wrong and gave bad results. The reason is that the ID for 4Sibs was from the raw data. The ID for the Dad Pattern Table just numbered the amount of Dad patterns. I needed to join the ID in the first table to the start and stop locations in the second table. I ended up doing 2 queries: one for the start position and one for the stop as I needed both. This query gives the stop position of a pattern.

stop-query

I took both those queries and put them into an Excel Spreadsheet.

excelstartstopfromaccessdad

I added a new column called Dad4Pattern. In the first row, the new pattern was AAA by chance. However, in the second row which is the Stop or End of the first Dad Pattern, it is obvious that the ABA Dad Pattern goes to an ABAA Pattern. I didn’t think that there would be many AAAA Patterns as that means that all siblings match the same Paternal grandparent. This is the only AAA pattern that I had noted so far as I wasn’t looking for them yet. Still, I will need to go back and verify that these Start and Stop AAAA’s were not by chance. Finally, on the last line, it is clear that the Dad Pattern goes from AAB to AABB with Jon added.

Next I chose all the cells where Jon had a base from Dad and performed a Concatenate operation to write the pattern.

concaternate

This gave me the CCCC that I wanted to check. Next, I wrote a formula to put the Dad bases together in a new column and wrote down the Dad Patterns that I had.

newdadpattern
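Turning four concatenated Dad bases into a pattern such as ABAB can be sketched in Python as follows. This is an illustration, not the spreadsheet formula. It assumes the bases are in the order Joel, Sharon, Heidi, Jon, that all four are filled in, and that only two different bases occur at a location.

```python
def dad_pattern(bases):
    """Write the first sibling's base as A and any different base as B."""
    first = bases[0]
    return "".join("A" if base == first else "B" for base in bases)

all_same = dad_pattern("CCCC")
two_pairs = dad_pattern("TCTC")
```

Because the first sibling is always written as A, the actual letters of the bases do not matter, only who matches whom.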

A few notes:

  • Out of the 66 three sibling patterns that I had, I was able to find all but 5 new four sibling Dad Patterns. See the yellow above for two of the missing 4 sibling dad patterns.
  • The missing 4 sibling dad patterns should be easy to find by scrolling through the 4Sib Table
  • I noticed that there were no AAAB patterns. That is because in my previous search, I was not looking for AAA patterns. So now, I don’t have any AAAB patterns. I will have to find these in my new search.
  • AAAB is the situation where I match the same paternal grandparent as my 2 sisters, but Jon matches the other paternal grandparent.

Filling in more dad patterns

To fill in the yellow areas, I made a query in Access based on the 4Sibs Table. This looked at every case where Jon had a base from Dad. Searching around the ID 6604 and after, I found this pattern:

fill-in-patterns

ABBB

Then I checked near the end of the old 3 sibling pattern which is at ID 19806.

break-point

At ID 19827 we see an ABAB Pattern, so I enter that Pattern in my spreadsheet:

newpattern4

For the start of the new ABAB pattern, I used the old ABA location as that was more precise. The next interesting thing happens at Chromosome 2:

chr2

Here I have a problem in my spreadsheet. For some reason, the Start of the last pattern of Chromosome 2 ends at Chromosome 3, which is not right. My previous spreadsheet was better than that. From the ashes I will re-build.

I note that at ID 108798, my 4 Sib Spreadsheet goes to an ABAB Pattern. At the end of Chromosome 2, I see an AAAB Pattern. That was the one I wouldn’t have had from the 3 sibling pattern as I wasn’t checking on AAA’s.

I added new rows for the patterns ABAB and AAAB:

addnewrows

The most important thing here is the ID, the pattern, the Start and Stop. Here is the new change area from ABAB to AAAB:

chr2change

There are a few SNPs between the ABAB Stop and the AAAB Start that are a little unclear.

end-of-2

Finding Jon’s Patterns

Now I’ll check Jon’s Patterns. I’m looking for any changes in patterns as these should be important as crossovers later. I will need to assign the crossovers to each sibling’s Chromosome Map.

Good Old Triple A – B Pattern and all the others

AAAB is where Jon has a different paternal grandparent than his 3 tested siblings and the 3 siblings have the same paternal grandparent.

aaabquery

My query says that Jon has to be different from each sibling. I run that and insert the appropriate Start and Stop point for the AAAB in my spreadsheet.

I do the same for AABA which I can find using a similar query under Heidi’s criteria:

aaba-query

I ended up going to a clean spreadsheet. It was too messy combining the 4 sibling results with the old 3 sibling results.

4sibpatterns

Here I have the ID, the Chromosome, the pattern and the Start and Stop. The yellow marks a one SNP pattern. It appears that there should be 3 types of patterns:

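  1. One where one sibling matches none of the others. That is what I have above: AAAB, AABA, ABAA and ABBB. (ABBB is the case where I am the odd one out. Since the first sibling is always written as A, it takes the place of BAAA.)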
  2. One where 2 pairs of siblings match each other: AABB or ABBA. I’m not sure what else there could be. I looked above and saw one other: ABAB
  3. One where all the siblings match each other: AAAA

That makes 7 or 8 patterns, depending on whether AAAA is considered a pattern.
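The count can be checked by listing every pattern that starts with A. The short Python sketch below does that and sorts the patterns into the 3 types. The function name and the wording of the type labels are just for illustration.

```python
from itertools import product

# Every 4 sibling pattern, with the first sibling always written as A.
patterns = ["A" + "".join(rest) for rest in product("AB", repeat=3)]

def pattern_type(pattern):
    """Classify by the size of the smaller group of matching siblings."""
    smaller = min(pattern.count("A"), pattern.count("B"))
    return {0: "all match", 1: "one sibling differs", 2: "two pairs"}[smaller]

by_type = {}
for p in patterns:
    by_type.setdefault(pattern_type(p), []).append(p)
```

This gives 4 patterns where one sibling differs, 3 where two pairs match, and AAAA, for 8 in all or 7 without AAAA.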

Two Pairs of siblings match each other patterns

Here is the Access query for AABB

aabb-query

At first I was missing the criteria under SharonFromDad and that gave me AAAA combinations also. The result of the query looks like this:

aabb-results

Here Joel matches Sharon and Heidi matches Jon but on a different base. After I was finished putting in Starts and Stops for each Pattern, I then sorted my spreadsheet by ID. This brings up some issues that need looking at:

quality-control

Where there are 2 Starts or Stops in a row, there is a need to check what is going on. The ones around the yellow positions may not be a problem as I’ll likely be taking those single positions out. However, at the end of Chromosome, there are 2 starts and 2 stops together. I need to go to ID 236707 and see what is before that point. It appears that there is an AAAA pattern before that point and that the ABAB at 224584 is a single point. That fixes half of the problem. Then I go to ID 238976 to see why I have a Stop there for ABAB.

fix5

I had missed the Start for the ABAB right after the stop of the ABBA pattern, so I added it in. The repaired spreadsheet looks like this.

fix5spreadsheet
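The check for 2 Starts or 2 Stops in a row could also be automated. Here is a minimal Python sketch under assumed names. The events are (ID, pattern, Start or Stop) tuples sorted by ID, and the IDs in the example are made up rather than taken from the spreadsheet.

```python
def find_repeats(events):
    """Return neighbouring events that are both Starts or both Stops."""
    problems = []
    for previous, current in zip(events, events[1:]):
        if previous[2] == current[2]:
            problems.append((previous[0], current[0], current[2]))
    return problems

# Made-up IDs: the Start of the ABAB pattern is missing.
events = [(100, "ABBA", "Start"), (180, "ABBA", "Stop"), (260, "ABAB", "Stop")]
problems = find_repeats(events)

# After adding the missing Start, the list alternates cleanly.
repaired = sorted(events + [(181, "ABAB", "Start")])
remaining = find_repeats(repaired)
```

A clean list alternates Start, Stop, Start, Stop, so anything this returns is a spot to go back and look at in the 4Sibs Table.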

An application

Now that I have the change between ABBB and ABAB described, let’s look at what it means. Here is a different look at that location:

heidi-break

When the pattern changes from ABBB to ABAB, what has changed is the third B changes to an A. Heidi is in that location. So that says at the above position of Chromosome 5, Heidi has a paternal crossover. I thought it would be good to check my work against the work of M MacNeill. To do that, I used the NCBI Remap website to change my Build 37 results to Build 36:

remap

This would be the start of Heidi’s new segment. Here is what MacNeill had:

macneill-check

I got it right again. That is 2 for 2. Actually, the first time I tried, I was comparing the wrong Chromosomes. Rookie mistake. Here is M MacNeill’s map for Heidi on Chromosome 5:

macneill5

Perhaps it is difficult to see, but the point I am looking at is the little lighter red segment at the far right of Chromosome 5. Perhaps that is why I missed it the first time as it is so small.
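The reasoning used above to assign the crossover to Heidi can be written as a small rule. This Python sketch assumes the sibling order Joel, Sharon, Heidi, Jon and assumes only one sibling has a crossover at the change point. Because the first sibling is always written as A, a crossover for Joel shows up as every other letter changing, so the sketch also compares against the pattern with A and B swapped.

```python
SIBLINGS = ("Joel", "Sharon", "Heidi", "Jon")

def swap_letters(pattern):
    """Rewrite a pattern with A and B exchanged."""
    return pattern.translate(str.maketrans("AB", "BA"))

def crossover_sibling(before, after):
    """Name the one sibling whose letter changed, or None if it is unclear."""
    for candidate in (after, swap_letters(after)):
        changed = [i for i, (x, y) in enumerate(zip(before, candidate)) if x != y]
        if len(changed) == 1:
            return SIBLINGS[changed[0]]
    return None

heidi = crossover_sibling("ABBB", "ABAB")
joel = crossover_sibling("AABB", "ABAA")
```

The first call is the ABBB to ABAB change discussed above. The second is a hypothetical example of the case where the first sibling is the one with the crossover.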

Another Aside is that this was a very difficult Chromosome to decipher using visual methods. This was one of my attempts to figure out the crossovers visually for 3 siblings.

visual-chr5

I had missed the last crossover as it is so small and difficult to see. In my defense, I should note that M MacNeill did mention that the end of this Chromosome was difficult to decipher.

Taking Out the X

I’ve realized that I’ve generated some bases for the X I got from Dad. Of course, I didn’t really, so I’m taking out any bases there for me and my brother Jon. I’ll use this update query:

takeoutx

I was worried that I’d mess something up, so I created a new table called 4SibsChrX. My query put dashes in the spots where I couldn’t have an X base from Dad:

xtodash
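In Python, the same clean-up could be sketched like this. The field names are assumptions, Chromosome 23 is taken to be the X as in the tables above, and the sample row is made up.

```python
MALES = ("Joel", "Jon")  # the two brothers, who get no X from Dad

def blank_paternal_x(row, x_chromosome=23):
    """Put a dash in the From Dad slot of each brother on the X Chromosome."""
    if row["chromosome"] == x_chromosome:
        for name in MALES:
            row[name + "FromDad"] = "-"
    return row

# Made-up row, for illustration only.
x_row = blank_paternal_x({"chromosome": 23, "JoelFromDad": "G", "JonFromDad": "G",
                          "SharonFromDad": "G", "HeidiFromDad": "A"})
other = blank_paternal_x({"chromosome": 5, "JoelFromDad": "T", "JonFromDad": "C"})
```

The sisters’ From Dad bases on the X are left alone, since they did get an X from Dad.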

This looks like a good place to end Part 4. It appears that there should be many chances to quality check my work and that the process is progressing. Getting Jon’s new DNA set me back a bit, but the results should be better than what I’d see with 3 siblings.

 

Raw Data Phasing: Part 3

This Blog is Part 3 documenting my learning process of phasing my DNA raw data using:

Part 1 and 2 Recap

  1. I imported 4 sets of raw data into Access from AncestryDNA after taking out the zeros that the Excel software produced for the no-calls.
  2. I used Access Queries to apply 3 Whit Athey Principles. This resulted in many phased bases for me and my 2 sisters.
  3. I put the phased A’s, G’s, C’s and T’s into 2 new columns for each sibling
  4. This resulted in 6 new columns. The first 3 of these six were for the paternally based bases. These resulted in a pattern which was either in the form of AAB, ABA, or ABB.
  5. The Athey Paper did not emphasize the AAA pattern or considered it a non-pattern. While specific AAA results within another pattern area are by chance, there are other areas where 3 siblings match the same grandparent where there will be an AAA-only Pattern.
  6. I separated my results into 3 patterns using Access: AAB, ABA, and ABB
  7. For each of those results, I noted where those patterns changed.  I did this by looking at the ID numbers. Breaks in the ID numbers were considered changes.
  8. However, there were some cases where the changes occurred around missing bases. For these, I went back and noted a more precise position of the pattern change based on where the change would be if the missing base were to be filled in.
  9. I made a preliminary bar graph using the first 3 paternal changes. These crossovers were mapped to myself and 2 sisters.
  10. Using the 3 patterns I developed Access queries to fill in the missing bases in the 3 paternal pattern areas.

So those were the 10 easy steps. Actually step 10 was difficult as there was quite a bit of refining the Access queries and quality checking the results. I needed 2 queries for each of the pattern areas. However, once I had the queries, it was the push of a button to update missing parental-received bases for 3 siblings within over 700,000 lines of DNA.
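Step 7, finding where a pattern changes by looking for breaks in the ID numbers, can be sketched in Python as below. The function name and the max_gap setting are assumptions for illustration. How large a break counts as a real change is a judgment call, and the IDs in the example are made up.

```python
def pattern_segments(ids, max_gap=1):
    """Split sorted IDs into (start, stop) runs wherever the gap exceeds max_gap."""
    if not ids:
        return []
    runs = []
    start = previous = ids[0]
    for current in ids[1:]:
        if current - previous > max_gap:
            runs.append((start, previous))  # close the run at the last ID seen
            start = current
        previous = current
    runs.append((start, previous))
    return runs

# Made-up IDs from one pattern's query results.
runs = pattern_segments([1, 2, 3, 10, 11, 40], max_gap=5)
```

Each pair returned is a candidate Start and Stop for one stretch of the pattern.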

Back to Athey

This portion of the Athey Paper appears to apply to where I am now:

For some of the unfilled cells on the mother’s side of the table, we can fill in the alternative (other) base from the corresponding location on the father’s side of the table. That is, we know that the sibling with an empty cell got one base from the father, but the alternative base from the mother. Therefore, after the use of the Dad pattern fills in more cells, a newly filled-in cell in the father’s side of the table gives rise to a filled-in cell in the same position on the mother’s side – the alternative base to what was on the father’s side.

Unfortunately, I’m not sure what is meant above. My guess is that this relates to Principle 3:

Principle 3 — A final phasing principle is almost trivial, but it is normally not useful because there is usually no way to satisfy its conditions: If a child is heterozygous at a particular SNP, and if it is possible to determine which parent contributed one of the bases, then the other parent necessarily contributed the other (or alternate) base. This principle will be very useful in the present approach.

So now that missing paternal bases have been determined based on the patterns, it should be possible to fill in missing maternal bases for heterozygous children. First, I’ll do a Query to see if I can locate this situation. I’ll take my most recently updated Dad ABB Pattern Table update and query that. I’ll look at the situation where there are heterozygous results. Then, I’ll look at spots where there are missing bases from Mom.

Fortunately, I was able to come up with a slick looking Query for this situation:

mom-from-dad

Plus the Query design has some nice symmetry. The first criteria row of the query is for my (Joel) DNA. Reading across, it says Joel is heterozygous because my allele 1 does not equal my allele 2. Then it says that I have a base from Dad but not from Mom. This will show areas where the mom bases are missing in this heterozygous child situation.

mom-bases-to-fill-in

The truncated fields above are Joel Allele 1, Joel Allele 2, Sharon allele 1&2, Heidi allele 1&2. The next 3 columns are Joel, Sharon and Heidi from Dad. Then Joel, Sharon and Heidi from Mom (the last 3 columns). This shows that there are almost 12,000 of these Mom bases to fill in. Above the blue line are Heidi’s bases missing from Mom. Heidi is TC (heterozygous) on that line. Her Dad base is T. I love these binary problems. They seem well suited for the computer. That means that a query could not be too difficult to update almost 12,000 records. So Heidi’s Mom base will be C above the blue line. At the blue highlighted area, I am TC and my Dad base is C. My Mom base will be T on the blue line.
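The binary nature of the problem is easy to show in code. This Python sketch works out the Mom base for one heterozygous child in a single step, using the two examples just described. It is an illustration under assumed names and is not a translation of any one Access query.

```python
def implied_mom_base(allele1, allele2, from_dad):
    """For a heterozygous child with a known Dad base, return the other allele."""
    if allele1 == allele2:
        return None  # homozygous: this rule does not apply
    if from_dad not in (allele1, allele2):
        return None  # Dad base missing or inconsistent with the child's alleles
    return allele2 if allele1 == from_dad else allele1

heidi_mom = implied_mom_base("T", "C", "T")  # Heidi is TC and her Dad base is T
joel_mom = implied_mom_base("T", "C", "C")   # Joel is TC and his Dad base is C
```

In a real update, the result would only be written where the From Mom spot is still empty, so nothing already phased gets overwritten.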

Looking for a Good Query to Fill In Mom Bases from Dad Bases

First, I copied my ABB Table to a new Table called tbleMomBaseFromDadBase. I will want to update that table with a new Update Query. I already have the first part of the query. Now I need my thinking cap. Even better than thinking, I can look at what I did before. Here is my old query.

allele1-query-heterozygous

This is difficult to see, but I split the problem into 2 alleles. What this says is when Sharon has a base from her mom and Sharon’s allele 1 is not the same as the base from her Mom, pop that allele 1 into her base from Dad slot.

For our situation we are doing the opposite. So we will switch Mom and Dad. This time we are using our Dad results to get some Mom results. I’ll also add a criteria to make sure the Mom result is Null, so I’m not overwriting anything. It will just be an extra precaution.

Basically, I want to make sure Heidi has a base from Dad and not from Mom. In that case, when her allele1 is not equal to her base from Dad, put that allele 1 in as her base from Mom. Drawing upon my vast experience in this area of about 1 week, I get this:

allele1dad-to-mom

When I preview the results, I get about 6,000 lines which is half of my previous query, so that seems OK. I’ll go ahead and update my new Table. I renamed my Query to qryMomBaseFromDadBaseAllele1 and copied it to do the same thing with Allele2. I’ll change the Allele 1’s to Allele 2’s in the Query design. First I’ll do a Select (non-updating) Query to show what I’ll be updating with the Allele 2’s.

allele2momfromdadselectquery

Here I added the ID numbers, so I can make sure my update went well.

Here is my Allele2 Update Query with the 3 siblings included:

allele2momfromdadupdatequery

The results:

momfromdadupdate

In the far right column is the Base Heidi got from Mom. It was updated on lines 2292, 2295 and 2299. In each case Heidi’s Paternal Base was T and the Maternal Base derived from it was C.

Here is my corresponding filled in Mom Base:

joelmomfromdad

My Dad’s T’s in 6 columns from the right were used to fill in the missing C’s in 3 columns from the right. Doesn’t it seem a bit ironic? Even though my dad was not tested for DNA, his “results” from this process are used to find the DNA I got from my mom who was tested.

A Premature End to This Blog and a New Beginning

This will be one of my shortest Blogs. I was both awaiting and not awaiting my brother’s DNA test results. Those results came in this week. The reason I was not awaiting was that I knew that I would need to re-start the raw data DNA phasing process once his results came in. With that, I’ll end this Blog and start a new one.

 

A New Tested Frazer Descendant: My Brother

My last Blog on Frazer DNA had to do with a newly tested James Line person – Madeline. My brother is on the Archibald Line of our Roscommon, Ireland Frazer Study Group. I had brought a DNA kit to my Hartley Family Reunion at the beginning of August, thinking to get a sample from one of my dad’s cousins. I didn’t end up doing that. So, later, I asked my brother if he would take the test. He did and the results are in.

Missing Frazer Segments from the Hartley Family

M MacNeill – prairielad_genealogy@hotmail.com has been mapping Chromosomes based on my family’s raw DNA data. That has shown that, on some chromosomes, even with 3 tested siblings, there is some Frazer DNA missing. Here is Chromosome 3, for example:

chr3-macneill-map

The bottom 3 lines are my DNA and my 2 sisters’. The lighter red is Frazer and the darker red is Hartley DNA. In the middle part of the Chromosome, my 2 sisters and I only inherited Hartley DNA. That means that there is some Frazer DNA missing. My father’s DNA is on the top line. The cross hatch area shows the Frazer DNA that he is missing there because by chance his 3 children below didn’t inherit any in that area. My brother Jon may not help fill in this particular gap, but he may fill in some of the gaps.

Looking for new Frazer DNA from Jon

What I did was look at Jon’s top matches. Then I ran those top matches through the One to Many Utility at gedmatch.com. From there I looked at Jon’s match’s matches to see if Jon came up by himself. That would be the new Frazer DNA. Jon’s top Frazer match is our second cousin, once removed Paul. I didn’t see any obvious new DNA with that comparison. Jon’s 2nd or 3rd top Frazer DNA project person is Michael. When I go to Michael’s match list, Jon comes up as Michael’s top DNA match. That is a good sign. Here is Michael’s Chromosome Browser matches for Chromosome 2:

michael-jon-chr-2

Here Jon is #1.

It is not a large match, but the key here is that it is by itself. That makes it new as my 2 sisters and I don’t match Michael at that spot.

Phasing Brother Jon

Seeing the match above, it reminds me that I need to Phase Jon by Gedmatch. That means that gedmatch takes my brother’s results and splits them into the DNA it thinks Jon got from my mom and the DNA it thinks that he got from my dad based on my mom’s results. Before I do that, however, I uploaded my mom’s AncestryDNA results to Gedmatch.com. Her FTDNA results are already there. This is why I also uploaded her AncestryDNA results. The chart below shows the results you get when you compare one company’s DNA results to another’s or even a different version of one company’s results to another version of that company’s results.

ancestrydna-compared

Jon’s new results are Anc2 results. That means that now Ancestry is testing different areas of the Chromosomes. However, it looks like I didn’t need my mom’s AncestryDNA results after all. Comparing Jon’s Anc2 results with my mom’s FTDNA results still gives me more SNPs (426,923) than comparing Jon’s Anc2 to Anc1 (424,150). Now I’ll have to mark my mom’s Ancestry kit as research only at Gedmatch as it is not good to have 2 results for one person there.

Now Jon has 2 phased kits (maternal and paternal). My little side trip was to check Jon’s Paternal Phased Kit with Michael. Here are the results:

jon-michael-paternal

Next I run Jon’s maternally phased results with Michael:

jon-michael-maternal

They have a borderline maternal match. That means that Michael matches Jon on his maternal side as well as Jon’s paternal side. How can this be? The answer is that they probably have a very distant match or match in the general population. My mom is German,  but about 1/4 English also. Michael lives in England. The key for this project is to disregard this Chromosome 7 Segment match as it is not likely a Frazer match.

Any more New Frazer project DNA for Jon?

Next on Jon’s list of Frazer DNA Projects is Gladys. Here is Gladys’ chromosome 1 showing her matches with Jon and his siblings.

jon-gladys-chr1

Jon on Line #1 doesn’t have a new match here, but his match is longer. Numbers 2 and 3 are my sisters Sharon and Heidi. This is in an important part of Chromosome 1 where there are a lot of Triangulation Groups (TGs). It looks like Jon’s Frazer DNA got a bit less broken up compared to his sister Sharon’s match in the area from about 182-202M on Chromosome 1 above. Here is MacNeill’s Chromosome 1 map of Heidi (#3 above) and Sharon (#2 above):

macneill-chr1

Sharon’s (#2) small match is represented by the right end of the lighter blue bar above. Where the bar changes from red to dark red, Sharon’s DNA changes from Frazer to Hartley. What Gedmatch shows above is that when Jon’s DNA is mapped, his lighter red bar will go further to the right than Sharon’s red bar. Heidi’s (#3) small match is represented by the left side of her 2nd lighter red bar.

Jon and the Everyone Comparison

This next image will compare the matches Jon has with everyone in the Frazer Project. I left out those with parents that have tested.

jon-with-everyone

This is like when you order the Everything Pizza. The square in the top left has the Archibald Line matches. The square in the bottom right has the matches of the James Line of the Frazer DNA Project. People with green matches should know each other already. My brother Jon from the Archibald Line matches Jonathan of the James Line. This seems appropriate as Jon’s middle name is Frazer.

More Detail: GEDmatch Matching Segment CSV

For the same people that I chose for the comparison above, I wanted more segment detail, so I chose an option called Gedmatch Matching Segment. This puts all the matching segments between all the people above into an Excel spreadsheet. While looking at those segment matches, I found a new TG that Jon was in.

New Chromosome 9 TG with Jon

Here is my Frazer DNA spreadsheet:

chr-9-tg

The first line is for 2 close relatives in the James Line, so the match may not be on a Frazer line.

Can you see the TG? It is difficult to see. The TG is between Pat (PB), Gladys and Jon. It is confusing as there is a lot going on there. Here is what Gladys’ Chromosome Browser matches looks like for her Chromosome 9:

gladys-chr-9-matches

Where the lines represent Gladys’ matches with:

  1. Bill
  2. Pat
  3. Jon
  4. Sharon

But remember I said above that the TG was with Gladys, Pat and Jon. How did Bill get in there? Note that Jon matches Pat at 8.8 cM. Perhaps Jon and Bill match below thresholds. I lowered the thresholds at Gedmatch to see if Jon and Bill would match, but still no match. Perhaps there is another explanation.

First the TG we do have. And it is a beautiful thing.

gladys-pat-jon-tg-circle-line

Pat, Gladys and Jon had a double shot at being in a Frazer TG as they have Violet as an ancestor and James, believed to be her 1st cousin. We may not know which Frazer the TG is for, but we know that it is a Frazer TG.
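The basic test for a TG is that all three pairs of people match each other over a shared stretch of the same chromosome. A minimal Python sketch of that overlap test follows. The segment positions in the example are hypothetical and are not taken from the spreadsheet. Note that a real TG also depends on the matches being on the same side (paternal or maternal), which this sketch does not check.

```python
def overlap(segment_a, segment_b):
    """Return the shared (start, end) of two segments, or None if they don't overlap."""
    start = max(segment_a[0], segment_b[0])
    end = min(segment_a[1], segment_b[1])
    return (start, end) if start < end else None

def triangulated_region(match_ab, match_ac, match_bc):
    """Region covered by all three pairwise matches, or None."""
    first = overlap(match_ab, match_ac)
    return overlap(first, match_bc) if first else None

# Hypothetical positions for three pairwise matches on one chromosome.
region = triangulated_region((85, 100), (88, 97), (90, 105))
no_region = triangulated_region((10, 20), (30, 40), (10, 40))
```

Someone who matches only one of the three people over that stretch, as Bill does with Gladys, falls outside the group under this test.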

Why isn’t Bill in this TG?

Yes, why not? Here’s my guess. As you likely know, we carry a set of chromosomes from our Mom and another set from our Dad. Gladys, above, had a set of Frazer Chromosomes and Webber Chromosomes. Perhaps the match Gladys showed with Bill was a Webber match. There is a way to test this theory. Bill also does not match Pat in the area of the TG that we are looking at. In my spreadsheet above, Bill has a few matches with Pat but they are in different regions. Bill may be matching Pat on the Price Line. Note that Pat and Bill share a Price ancestor.

Another Question: Why isn’t my sister Sharon in this TG?

The answer to this question is easy. She should have been in this TG all along. In my spreadsheet I have that Pat and Sharon match at 8.4 cM. I missed the larger match between Sharon and Gladys.

gladys-sharon-match

By the way, of my mapped siblings, Sharon has a lot of Frazer DNA in her Chromosome 9:

sharon-chr9

Sharon’s Unrecombined Frazer DNA

M MacNeill’s beautiful mapping work shows that Sharon’s lighter red bar on the bottom of the image above is all Frazer DNA. That means her paternal chromosome #9 did not recombine. She has the same Frazer DNA in that Chromosome that her dad got from his Frazer mother. But how did Dad get his DNA from his mom? My guess is that the DNA my dad got from his mom did recombine. That means that grandma passed down a combination of her parents’ DNA. That would be her paternal Frazer DNA and her maternal Clarke DNA. What I know for sure is the places where Sharon matches other Frazers would be the Frazer segment of my dad’s DNA. That would be at least the 85 to 100M range of Chromosome 9. So my dad’s (maternal) and Sharon’s (paternal) Chromosome 9 could have looked like this:

frazer-clarke-segment

That leads to my modified spreadsheet for Chromosome 9:

mod-chr9-spreadsheet

The gold is intended to stand for the TG leading to Violet and James Frazer. Note that a lot of changes happen around 85M on the spreadsheet. My guess was, and still is, that there was a change from Frazer to McMaster DNA at that spot for Paul and my family. Both Paul (PF) and my family descend from George Frazer and Margaret McMaster. That would explain why other Frazers stop matching Paul and my siblings at that spot.

Hopefully, this image will explain it better. This is what Sharon’s and my dad’s Chromosome 9 could look like as it passed down to my dad. A generation earlier, in my grandmother’s DNA a McMaster probably recombined in there also.

mcmaster-dna

Basically:

  • Sharon should have gotten a chromosome from her dad’s mother and father recombined
  • However, at Chromosome 9, she only got her paternal grandmother’s DNA (Frazer) – so she got one long segment
  • My father’s Chromosome could have looked like the image I had with Clarke and Frazer that he got from his 2 maternal grandparents.
  • My grandmother got her paternal Chromosome 9 from George Frazer and Margaret McMaster. Her Paternal Chromosome likely had a break in it at position 85M where her DNA went from McMaster to Frazer. This carried down to Sharon. Her paternal Chromosome 9 wouldn’t have had Clarke as this was her mother. The Clarke DNA was on my grandmother’s Maternal Chromosome 9.

So in Summary:

  • Jon is in a previously undiscovered TG at Chromosome 9
  • The TG points to Violet and James Frazer
  • Sharon got her entire paternal Chromosome 9 from her dad un-recombined
  • My grandmother passed down her DNA to my dad probably recombined with some of my dad’s grandfather’s Frazer DNA and Grandmother’s Clarke DNA
  • There is a stop in my family’s Frazer matches right at the point where there is a start in a match with my Frazer 2nd cousin once removed (location 85M). That leads me to believe that this is the spot where our match goes from 2nd great grandfather Frazer to our shared 2nd great grandmother McMaster.
  • Sometimes when working on a family DNA project such as this Frazer one, it is possible to find non-Frazer ancestor’s DNA.
  • Chromosome mapping is a big help in visualizing which ancestor likely contributed DNA to which descendant.

Bonus Feature: Archibald and James Line Frazer TG Update

tg-summary-frazer-sep16

I hope that I got this right. At least it should be generally correct.

  • There were a lot of new TGs that I noted in my previous Blog on the James Line side that I updated in lavender.
  • A preliminary observation is that Joanna and her family seem to favor Charlotte, Madeline and Mary more than Judith, Bonnie and Beverly.
  • This shows that all in our Frazer DNA Group except for one are in a TG. That is pretty exceptional.
  • There are 24 in the TGs. Each of these people averages about 4 Frazer TGs.
  • I was an underachiever, as I’m only in one TG
  • The number of times one is in a TG is likely subject to many things including:
    • Random DNA inheritance
    • Distance you are to the common ancestor – the closer you are, the more likely you are to have a match
    • Endogamy. Some groups have 2 or 3 Frazers in their ancestry. The Price group have 3 Frazers in their ancestry and the most TGs. Each Frazer/Price descendant is in almost 15 TGs, though 5 of those are likely Price only TGs
    • Number of descendants your common ancestor had. This will increase the odds of a TG as there are more descendants to triangulate
  • The 5 likely non-Frazer TGs are in a raspberry color and are likely Price TGs.
  • I have a note that the yellow TG could be for either Violet or James Frazer. I am leaning toward James, as Violet descends from Richard Frazer, as does Michael, and Michael is not in this TG. James is believed to be the son of Philip b. around 1776 who was a brother of Richard b. around 1777.


A Second Look at Pauline’s Newfoundland DNA

In my last post on Newfoundland DNA, I looked at Pauline and how she matched others in the Dicks DNA Project I have been working on. I found that she was in 3 Triangulation Groups (TGs), but I wasn’t totally convinced which family those TGs represented, as there was some ambiguity whether they were Dicks TGs or Joyce TGs for two of the TGs. The other TG she was in was with my wife’s Upshall family, which has Dicks ancestry also. However, due to questions in the Upshall ancestry I wasn’t totally sure those were Dicks TGs either. Pauline expressed a desire to find out more also, so I thought I’d take a second look at Pauline’s DNA.

People Who Match One or Both of Two Kits

This is a utility at Gedmatch that is helpful in DNA analysis. I’ll use this to find out more about Pauline. From my previous Blog, here are Pauline’s top matches with the Dicks DNA Project:

paulines-closest-matches

Her top 2 matches are with Molly and Kenneth of the Joyce Line. Coming in at #3 is Esther who is my wife’s great Aunt.

Pauline’s Matches with Molly

First I’ll run the Gedmatch Utility for People who Match Both Kits, those kits being Pauline and Molly. I understand this utility as similar to the Ancestry Circles. Another term I have for it is “Where there’s smoke there’s fire”. In other words, these people match both Pauline and Molly, so maybe they have common ancestry. The difference is that Gedmatch can find the fire, so to speak. The fire is the TGs that show that there are common ancestors.

After finding all the people that match both Pauline and Molly, I look at those matches at gedmatch’s chromosome browser. Here is Pauline’s Chromosome 5 which I looked at in the last Blog, but now the net is spread a little wider. The matches are to Pauline.

chr5-tg

Matches #1 and #2 were identified previously as being in a TG. They are Molly and Kenneth of the Joyce Line. #3 is also in the TG, or at least in one with Molly. This is someone called opcarrie at gedmatch who I don’t know. opcarrie is a lead which Pauline may contact to find common ancestry. Perhaps this person will be the tie breaker and indicate whether this DNA is from the Joyce Line or Dicks Line.

To the right of Chromosome 5 is another smaller likely TG. 3 of 4 of those matches have the name of Pike which may be recognizable. This TG is probably not a Dicks TG as it was not found in my previous look at Dicks descendants.

Common Matches: Chromosome 15

Pauline has more interesting matches on Chromosome 15.

chr15-pauline

#1 is Molly again. #2 is Richard who I don’t know. I looked him up on Ancestry, so Pauline may find some common ancestry there. He also matches my wife’s Aunt Esther and they have a common ancestral surname of Kirby. This looks like a strong TG for Pauline also.

Pauline and Chromosome 21

I found this Chromosome interesting even though there was not an apparent TG.

pauline-chr21

Here Pauline has large matches with #1 Molly and #2 Kenneth. Both of these are on the Joyce Line. The reason I find this interesting is that it looks like there is a break right around the 23M mark. Assuming that these segments represent Rachel Dicks and her husband James Joyce, it could be that one segment is the James Joyce Segment and the other is the Rachel Dicks Segment.

People Who Match Both Pauline and Kenneth

Next I run the Gedmatch utility again for Pauline and Kenneth. This resulted in a smaller group of matches than Pauline had with Molly. I didn’t see anything much new here that was not already in the group of people that matched Pauline and Molly.

People Who Match Both Pauline and Esther

Here I would expect different results as Esther is not from the Joyce Line unlike Molly and Kenneth. Actually, when I look at the results, they look similar to the first 2 looks at results. There is one difference at Chromosome 15.

chr15-pauline-esther

Now look at the Chromosome 15 matches Pauline had when looking at her in-common matches with Molly.

chr15-pauline

#1 above is Molly, #2 is the Richard who was not in the Dicks DNA Project. #5 is Jennifer. I’m also unfamiliar with her. This Jennifer also did not come up when I looked at the people who matched both Pauline and Kenneth.

Summary and Discussion

After taking a second look at Pauline’s DNA there is a little clearer picture of what is going on. I set the net a little wider. But with a wider net comes some more questions.

  • The new TG at Chromosome 15 appears to be a non-Dicks TG. However, Pauline may want to follow up with some of the names there to make sure. One of the people in that TG shares a Kirby surname with my wife’s great Aunt. However, that may not be the surname of the shared ancestor with Pauline and Molly.
  • There is a new person to follow up with on Pauline’s TG on Chromosome 5
  • Pauline matches Molly and Kenneth on Chromosome 21. Assuming that these 2 matches represent Rachel Dicks and James Joyce, it would appear that the dividing line between these 2 matches represents the dividing line between the DNA that Pauline received from her 3rd great grandparents Rachel Dicks and James Joyce.

Raw Data Phasing Via Access, Athey and MacNeill: Part 2

In my last Blog on raw data phasing, I went through 3 principles that Whit Athey laid out in a paper on phasing raw data when one parent’s DNA results were missing. Using those principles, and the MS Access program, I was able to sort many of my bases and my 2 sisters’ bases into ones we received from our mom and ones that we received from our dad. I checked a few of my results with a chromosome map made for me by M MacNeill.

Paternal Patterns

I had gotten to the part of the Athey paper where he talks about paternal patterns of bases that the sibling combinations received. I noted a space between the first two paternal patterns that I looked at. Below the pattern goes from an ABA pattern to an ABB pattern.

change-in-dad-pattern-hilite

There was a gap between the ABA and ABB pattern where there was no ‘pattern’ as my 2 sisters and I shared the same base there. When my sisters and I all share the same base, that is an AAA “pattern”. That AAA area corresponded exactly to the area between the 2 yellow lines below in the chromosome map made for me by M MacNeill – prairielad_genealogy@hotmail.com .

macneill-chr1-hilite

In the map above, MacNeill was able to determine that my 2 sisters and I got our DNA from our paternal grandmother in the area between the 2 yellow lines. Further, the first yellow line described Sharon’s first paternal crossover point and the second yellow line described my (Joel’s) first paternal crossover point.

Finding All the Paternal Crossover Points

At this point in the Athey Paper, he recommended looking at the paternal pattern and filling in the missing bases based on the known pattern. I was looking for an easier way to do this, so decided to take a different approach. I decided that I would find all the paternal crossover points first. Then, armed with that information, I would create a formula that would fill in most or all of the missing bases for each pattern.

However, this required a modification of my database to make the work easier. I wanted a number to define the range of patterns, so that I could apply an easy query to add missing bases. I already had this but I hadn’t used it. Back when I imported the 4 sets of raw data into Access, Access assigned an ID to every row of data. That meant that I needed to add that ID into all the queries that I had done previously to make tables and further queries. This took a while, but I believe that it was worth it.

table-with-id

The ID is the first column.

I started going down all my data and noting the change of each pattern. I put the results into an Excel table. Here the Start and Stop numbers are the Access assigned ID numbers. The ID’s correspond with the number of DNA locations looked at. In this case there were a bit over 700,000 of these locations in total for my mom, my 2 sisters, and me.

excel-pattern

Then I noted the patterns are repeating as would be expected. For example, my first pattern was ABA, but 3 patterns later, that same ABA repeated. My thought was to create a query just for ABA patterns. Then when scrolling down looking for changes, the separation between rows should be greater and it would be easier to see where those changes were.
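The change-point bookkeeping described above was done by scrolling in Access and typing into Excel. As a rough illustration only, here is a minimal Python sketch of the same idea. The function name pattern_runs and the (ID, pattern) input layout are assumptions made for this example, not something from the Access database. Blank rows and AAA rows are skipped because they do not tell the patterns apart.

```python
def pattern_runs(rows):
    """Group informative rows into (pattern, start_id, stop_id) runs.

    `rows` is an iterable of (row_id, pattern) pairs sorted by row_id, where
    pattern is 'AAB', 'ABA', 'ABB', 'AAA' or None (hypothetical layout).
    """
    runs = []
    for row_id, pattern in rows:
        if pattern is None or pattern == "AAA":
            continue  # uninformative row: it cannot confirm or end a run
        if runs and runs[-1][0] == pattern:
            # Same pattern as the current run: push the stop ID forward.
            runs[-1] = (pattern, runs[-1][1], row_id)
        else:
            # A different pattern starts a new run at this ID.
            runs.append((pattern, row_id, row_id))
    return runs


example = [(1, "ABA"), (2, None), (3, "ABA"), (4, "AAA"),
           (5, "ABB"), (6, "ABB"), (7, "ABA")]
print(pattern_runs(example))
```

A run whose start and stop are the same ID is a single-row pattern, which is worth a second look before treating it as a real segment.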

Here is what my Access query looks like. I changed the query name to DadSpecificPattern.

dad-specific-pattern-queryquery

This particular query gives me the ABB pattern. I have the HeidifromDad base equal to the SharonFromDad base. That makes me the A and Sharon and Heidi the BB of the ABB Pattern. If you think about it, that also means in these areas that Heidi and Sharon will have their base from the same paternal grandparent and mine will be from the other paternal grandparent. I’m learning as I go. I’m sure that information will come in handy later.
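For readers who think more easily in code than in query grids, the pattern naming can be sketched in Python. This is only an illustration under assumptions: the function name dad_pattern is made up, the three arguments stand for the JoelFromDad, SharonFromDad and HeidiFromDad values of one row, and a blank is represented by None or an empty string.

```python
def dad_pattern(joel, sharon, heidi):
    """Name the paternal pattern of one row, in Joel-Sharon-Heidi order.

    Returns 'AAA', 'AAB', 'ABA' or 'ABB', or None when any base is blank.
    """
    if any(base is None or base == "" for base in (joel, sharon, heidi)):
        return None  # a blank means no complete pattern can be read
    if joel == sharon == heidi:
        return "AAA"
    if joel == sharon:
        return "AAB"  # Heidi is the odd one out
    if joel == heidi:
        return "ABA"  # Sharon is the odd one out
    if sharon == heidi:
        return "ABB"  # Joel is the odd one out
    return None  # three different bases: treated here as an anomaly


print(dad_pattern("A", "G", "G"))  # Sharon and Heidi match, Joel differs
```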

My plan seemed to be good, but there was one catch. Once I refined my query, most or all of the blanks disappeared. That meant that the start and end points might not be exact. Here is an example of what I mean.

change-in-pattern-rouch

This is from my old Dad Pattern query with the blanks still there. The change from ABB to ABA happens at ID or line 19809. However, the new query takes out the blanks to make it look like the change is at ID Line 19826.

Here is what my DNA results look like so far without a filter (or query). The last 3 columns are the bases from Dad columns. There is a lot going on between lines 19809 and 19826.

pattern-unfiltered

Once I apply a formula to add bases, it will say something like: In the lines that have the ABA pattern where there is a blank at either A spot, replace the blank with the A that is there. If I apply the rule too late, I will be missing an area. Worse, if I were to use the 19826 cutoff, I may be still using the previous rule. That rule would say basically the same thing except, “Where the row is ABB and one of the B’s is missing replace the missing B with the one that is there.” If I apply an ABB rule to an ABA area, I’ll get bad results.
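The fill rule can be written down as a small Python sketch. It is an illustration under assumptions, not the Access update query itself: the name fill_from_pattern is invented, blanks are None, and the extra guard (only fill when the odd sibling’s base is known and different) is borrowed from the AAB walk-through later in this post, so that rows which are AAA by chance are left alone.

```python
def fill_from_pattern(pattern, joel, sharon, heidi):
    """Fill one blank base using the pattern of the surrounding segment.

    `pattern` is 'AAB', 'ABA' or 'ABB' in Joel-Sharon-Heidi order.
    Returns the (joel, sharon, heidi) tuple, changed only when it is safe.
    """
    # For each pattern: the two positions that share a base, then the odd one.
    layout = {"AAB": (0, 1, 2), "ABA": (0, 2, 1), "ABB": (1, 2, 0)}
    first, second, other = layout[pattern]
    row = [joel, sharon, heidi]
    # Decide which of the matching pair is the candidate blank.
    known, blank = (first, second) if row[second] is None else (second, first)
    if (row[known] is not None and row[blank] is None
            and row[other] is not None and row[other] != row[known]):
        row[blank] = row[known]  # copy the known base into its partner
    return tuple(row)


print(fill_from_pattern("ABA", None, "C", "T"))  # Joel takes Heidi's base
```

Applying the wrong pattern name to a row is exactly the failure described above: calling this with "ABB" inside an ABA segment would copy a base to the wrong sibling.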

Long story short, I ended up recording a rough start and stop in my Excel Spreadsheet.

revised-spreadsheet-for-pattern

I started naming the segments, but realized that was not necessary. Some of the patterns were only at one point rather than in a long segment. I believe that is an anomaly due to a bad read, mutation or some other problem. Those are the ones in the spreadsheet that had no end point. It took me part of a morning to get all the paternal crossover pattern points for all 23 chromosomes. Fortunately for 3 siblings, the patterns are only ABA, AAB and ABB.

I just went back and checked the error points/anomalies. I reran the Heterozygous Sibling Query and it fixed at least the first problem and hopefully the others. When I added the ID’s in, I had to redo all the queries quickly, so I suppose that is where the errors came in. That is not a problem: as long as the problem can be found, a fix can usually also be found. There actually weren’t that many errors. There are still some anomalies that are just anomalies. I have left those in yellow in the spreadsheet image below.

So in my spreadsheet, I have all the rough starts and ends for all the crossovers for my 2 sisters and myself. Here is the top part of the spreadsheet sorted by rough start:

rough-start-sort

Next, all I need are more exact start and end points. Here is the start of what I have:

pos-and-id-and-pattern

I picked this section because it looks pretty complete already. Note that my Start and Stop numbers are pretty close to each other. That means that there are no other AAA segments in-between. I had to do an additional Access query to add in the position numbers for the Start and Stop of each chromosome’s pattern change. This is important if I want to convert the results from Build 37 to Build 36 to compare to MacNeill’s work or to gedmatch.com.

Starting to Find Paternal Crossovers and Assigning to Siblings

Previously I had been calling the start and end of my patterns crossovers. These two terms aren’t totally interchangeable, as the start or stop of a pattern may happen at the beginning or end of a Chromosome and therefore not be a crossover at that point. It seems like it should be pretty easy to find the crossovers. Look at the image above. The first and second rows show ABA going to AAA. The order of the letters for me and my siblings is JSH, or Joel, Sharon and Heidi. The only letter that changes is the B to A. That is the position that Sharon is in, so the paternal crossover has to go to her.

From row 2 to row 3 the pattern changes from AAA to ABB at Chromosome 1, position 23,288,828, Build 37. That doesn’t mean that 2 siblings have a crossover there, as we are looking at the patterns, not the letters. It is actually the letter that stayed the same that represents the crossover here. AAA to ABB means: all the same (AAA) goes to one different and 2 the same (ABB, in this case Sharon and Heidi). The one that is different is me, and I get the crossover at this location.

The next change is from ABB to ABA. This is a little harder to see. I would say that this crossover goes to Heidi if my reasoning is right. BB was the same before and goes to BA. It must be Heidi that changed because now she matches Joel, who didn’t change.

I’ll need to figure out how to make better bar graphs in Excel, but here is how the beginning part of my father’s Chromosome 1 broke up for 3 of his children. Or, another way to look at it: the vertical lines are where my father’s maternal and paternal chromosomes combined in each of his 3 children that we are now looking at.

excel-bar-chart-chr1

Where:

  • Series 1 is Sharon. Where the color goes from blue to orange is where Sharon has a change from one paternal grandparent’s DNA to another paternal grandparent’s DNA. The number to the right of Series 1 is the Build 37 Chromosome position number for Sharon’s crossover.
  • Series 2 is Joel’s first crossover (between orange and gray) and
  • Series 3 is Heidi’s first crossover position between gray and yellow [The same explanation under Sharon above applies to Joel and Heidi]
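
The reasoning used above to hand each crossover to a sibling can be checked mechanically. Here is a minimal Python sketch of it. The names crossover_owner and canonical are invented for this example, and it assumes that exactly one sibling switches paternal grandparent at each pattern change. The idea is to try switching each sibling in turn and see which switch produces the new pattern.

```python
SIBLINGS = ("Joel", "Sharon", "Heidi")  # order of the letters in each pattern


def canonical(states):
    """Relabel so the first sibling is always 'A'.

    Patterns only say who matches whom, not which grandparent is which.
    """
    return "".join("A" if s == states[0] else "B" for s in states)


def crossover_owner(before, after):
    """Return the sibling whose single switch turns `before` into `after`."""
    states = [letter == "B" for letter in before]
    owners = []
    for i, name in enumerate(SIBLINGS):
        flipped = states.copy()
        flipped[i] = not flipped[i]  # this sibling crosses over
        if canonical(flipped) == after:
            owners.append(name)
    # None means the change cannot be explained by one sibling switching.
    return owners[0] if len(owners) == 1 else None


for before, after in [("ABA", "AAA"), ("AAA", "ABB"), ("ABB", "ABA")]:
    print(before, "->", after, ":", crossover_owner(before, after))
```

For the three changes discussed above, this gives Sharon, then Joel, then Heidi, which agrees with the assignments in the list.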

I’ll go back to the M MacNeill Standard. It’s like having an answer sheet to my questions.

macneill-chr1-hilite

According to MacNeill, I have assigned the crossovers to the correct siblings. In the above chart, just look at the red. I haven’t gotten to the maternal part yet, which MacNeill has in blue. The first 3 crossovers are where the red changes from light to dark or dark to light red. The difference in the MacNeill Chart is that his chart is split out one bar for each sibling. The other difference is that MacNeill has build 36 Chromosome position numbers and the numbers I have are from Build 37.

The Process

  1. Phase the siblings into maternal and paternal DNA using the principles that Athey outlines
  2. Find the paternal and maternal crossovers by pattern changes
  3. Assign the crossovers to the correct sibling using the pattern changes
  4. Assign the segments to the correct grandparent. This requires knowledge of cousin matches on the appropriate grandparent side.

That is the big picture which I am understanding as long as I don’t get too lost in the details.

Back to the Details: Fill in More A’s, G’s, C’s and T’s

I have been setting up my data for this, so hopefully, this will be easy. I now have 3 areas to look at:

  • AAB
  • ABA
  • ABB

AAB Paternal Update

Now I go back to my spreadsheet and sort it by Dad Pattern:

sort-by-pattern

The Start and Stop areas are the ones I want to update. First, I’ll copy my most up to date Table in Access which is tblSibHetorzygous. I’ll rename that tblDadPatternUpdate. Then I want to look for missing data and update the blanks using the AAB pattern.

In Access, I create a query with the new table.

dad-pattern-update-1

I chose the position fields and Paternal Pattern fields. I will change this to an update query which adds an Update To row. The criteria I want is when JoelFromDad = Sharon from Dad (AAB). Actually, I forgot, I was going to use ID criteria. So in the ID field, I need a lot of information. For the first AAB segment, I need everything between ID 45393 and 54155. This is what the criteria looks like:

aab-first-area

When I choose that area, I get over 8,000 lines. However, I only want to update when there is one missing value in the first 2 and the one that isn’t missing is not equal to the third. Here is the result of the above query in my first AAB area:

aab-patterns

I assume that the first blank should be a T. This would be one of the AAA results by chance in an AAB area. I don’t want to fill in the second line as I don’t know if it will be GGG or something else. That is what I meant by saying I don’t want to fill anything in unless there is only one missing value. In the 5th line there is A?G. That would have to be AAG (in an AAB Pattern area). There are some lines that have everything missing that I don’t want to touch.

How to create a query?

First, I want the situation where Joel doesn’t equal Heidi or Sharon doesn’t equal Heidi. That would create an AAB situation:

heid-not-joel-or-sharon

This query results in 1,666 rows of data including rows that are already filled in. Note that I had to write the range of ID’s twice because in order to get an OR situation I needed to put Joel not equal to Heidi and Sharon not equal to Heidi on separate lines. A simpler query is this one:

heidi-not-joel-or-sharon-one-line

The above achieves the same results in one line. Now, for this query, if Joel is blank, replace it with Sharon’s results. If Sharon is blank, replace it with Joel’s results. Here is the query prior to the updating part:

joel-sharon-blanks

This shows that there are 29 blanks for Joel and Sharon meeting this AAB criteria in the first range of AAB’s:

29-records-aab

Next, I apply the same logic to all the AAB segments. In the Expression Builder of Access, I type in this simple formula:

Between 45393 And 54155 Or Between 60990 And 72548 Or Between 207109 And 220679 Or Between 313271 And 317516 OR Between 326845 And 326912 OR Between 389395 And 390311 OR Between 400045 And 405578 OR Between 419982 and 427158 OR Between 433191 And 446672
OR Between 482297 And 492542 OR Between 532520 And 539292 OR Between 571557 And 579594 OR Between 589614 And 589666 OR Between 630037 And 630314 OR Between 630319 And 630378 OR Between 658744 And 659375 OR Between 670533 And 672360 OR Between 673325 And 682544

Simple but long. This has the AAB Starts and Stops for 23 chromosomes. Then I copy it into the next ID criteria line and get this result:

all-missing-aabs

It took a few minutes to type the criteria, but the goal is to update 1,514 lines of missing Paternal Pattern data with the push of one button. I still think it is quicker than going line by line and will be more accurate if I got the criteria right.
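
Since the Starts and Stops already live in a spreadsheet, the long criteria line could also be generated rather than typed. The following Python sketch is only an illustration of that idea. The function name between_criteria is made up, and the output simply mimics the "Between X And Y Or ..." text shown above so that it could be pasted into the criteria line.

```python
def between_criteria(ranges):
    """Build a 'Between X And Y Or Between ...' criteria string.

    `ranges` is a list of (start_id, stop_id) pairs, for example copied
    from the Start and Stop columns of the spreadsheet.
    """
    parts = []
    for start, stop in ranges:
        if start > stop:
            # A reversed pair is almost certainly a typing mistake.
            raise ValueError(f"start {start} is after stop {stop}")
        parts.append(f"Between {start} And {stop}")
    return " Or ".join(parts)


# The first two AAB ranges from the criteria above.
print(between_criteria([(45393, 54155), (60990, 72548)]))
```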

Next, I change the above Select Query to an Update Query.

paternal-aab-update-query

When my (Joel’s) base from Dad is missing, I update to Sharon’s base. When Sharon’s base from Dad is missing her base is updated with mine. Isn’t sharing great? I didn’t look at the case where Heidi’s base from dad was missing, because if that was missing we wouldn’t be able to see any AAB Pattern.

Let’s Update

I push the run button and check the results. Here is my standard dire warning:

standard-dire-warning

Now I will check if it worked. I’ll try ID or Line # 682124:

bad-aab-results

Unfortunately, that was an undesirable result. Before I had A?G. I changed this to ?AG. It appears that my query both replaced my value with Sharon’s and replaced Sharon’s with my blank. I hadn’t expected that. Next, I’ll check ID# 682182. I had ?AG and replaced it with A?G. So until I can think of a solution, I’ll need to split the 2 queries.
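
One way to picture the swap seen above is with a small Python analogy. This is an assumption about why the result looked the way it did, not a statement about how Access works internally: if both new values are read from the row as it stood before the update, the two fields simply trade places. The function names are invented for the example, and a blank is represented by None.

```python
def update_both_at_once(joel, sharon):
    """Set Joel to Sharon's old value and Sharon to Joel's old value."""
    # Both right-hand values come from the original row, so they swap.
    return sharon, joel


def fill_blanks(joel, sharon):
    """Only fill a blank; never overwrite a base that is already there."""
    new_joel = sharon if joel is None else joel
    new_sharon = joel if sharon is None else sharon
    return new_joel, new_sharon


print(update_both_at_once("A", None))  # the blank just moves to Joel
print(fill_blanks("A", None))          # Sharon's blank is filled instead
```

The second function mirrors the intent of the two separate queries: each sibling’s field is touched only when it is blank.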

Fix it! Quick!

First I recopied my Heterozygous Sibling Table back to the Dad Pattern Update 1 Table. This got the table back to the way it was. Here is my simpler query.

dad-aab-simpler-query

Here if my base from Dad is null, replace it with Sharon’s base from Dad. I’ll check ID# 682182 again:

second-mistake

This gets into the category of trial and error. Sharon’s result still got replaced with nothing. See, in the previous query I still was telling Access to update Sharon’s results with mine. I needed to take that out:

fix

There. Now the SharonFromDad Update To is blank. I go through the same procedures and now it looks right.

right-results

We now went from ?AG to AAG in the last 3 columns. These are the bases from Dad columns.

The next step is pretty easy:

sharon-missing-aab

I took out my criteria and put criteria in the SharonFromDad field. When she has a blank, replace it with Joel’s base from Dad. I hit run and it updated over 600 rows. Here is my original check spot at ID# 682124 with better results in the last 3 columns:

better-results

It took a while, but at least I got it right. The moral of the story is to not ask Access to do 2 things at once when those 2 things involve the same 2 people.

The Next Step: ABA

This time I’ll try a different query. I want there to be a B from the ABA in each case, so I’ll make sure that Sharon’s base from Dad is there:

aba-query

Maybe I’ll figure out what went wrong last time or come up with a new error. Above, I want the criteria on the first line to be for my blank base: if Sharon’s base from Dad is not equal to Heidi’s base from Dad, put Heidi’s base from Dad in my blank spot. For Heidi, when Joel’s base from Dad doesn’t equal Sharon’s base from Dad, put Joel’s base in Heidi’s spot.

I’m so tempted to try this query, but before I do, I’ll copy the previous table of the DadPatternUpdate to a new Dad Pattern Update ABA Table.  This will preserve what I have in the now older DadPatternUpdate Table in case anything goes wrong. Hey, what could go wrong?

query-aba-dad

I pushed the Update Button and updated over 30,000 rows. The results don’t appear to be any better, so I’m back to my 2 step process.

Here is my new slimmed down query:

slimmed-down-query

This new Update Query should update my Line 18 in the new UpdateABA Dad Pattern Table and it does:

lne-18

I now have a full ABA pattern on that line. According to Access over 30,000 Lines were updated, so it wasn’t a total waste of time.

heidi-aba

Run and check Line 149:

check-149

We have ABA in the last 3 columns, so that is good. Line 18 is still OK. I checked it just to make sure.

Query AAB Revised

After seeing how well the ABA Query went, I decided to revise the old AAB Query:

aab-query-rev

This is now looking at over 37,000 rows. This updates my AAB Blanks to tblDadPatternAAB. I don’t know if it is a better query, but at least I’m being consistent.

sharon-missing-aab-rev

This was over 80,000 rows, so I’ll assume that bigger is better.

I copied that resulting Table to tblDadPatternUpdateABA and reran the 2 ABA Update Queries. Here is one of the rerun queries updating the ABA Paternal Table:

rerun-aba

Down to ABB

My last updated Paternal Table was updating ABA, so I’ll copy that to a new Table called tblDadPatternUpdateABB. I’ll also copy my last query and put in the appropriate Starts and Stops for the paternal ABB patterns.

abb1

This says when Joel’s base from dad is not the same as Heidi, put that Joel from Dad into the space. Probably a more precise query would have said when Sharon from Dad is null and Joel from Dad is not equal to Heidi from Dad. I suppose technically the above query could be writing over a base with the same base in most cases.

I’ll fix that and notice that I had the wrong table in the top, so I’ll change that also.

abb-rev

This only updated 944 rows, so maybe bigger is not better. Here is Part 2:

abb2

This was almost 3,000 rows updated. Now I should check if it worked. I scrolled for an ABB Pattern in an old query and found this:

dad-pattern-abb

Here is my check:

abb-check

I guess I’ve been working too long. Here I have an AAB instead of the ABB I wanted. That is because I had Heidi updated to me (the A) instead of Sharon (the B). Here is the correction:

abb-corrections

I made a fresh Table of ABB. When I opened up the Query, it was saved this way:

corrected-abb

So Access changed my query. Note that there are 2 fields with HeidiFromDad in them. One is for the Update To and the other has Criteria. That is probably a clearer way to do it. Who should argue with Access?

I updated that and I take a cue from Access for Part 2:

access-abb-part-2

In English, the above says, “For this range, when JoelFromDad is not blank but Sharon from Dad is, and Joel from Dad has a different value than Heidi from Dad, put that Heidi from Dad value where Sharon had the blank.” It sounds a little complicated.

Back to Row 197704 and I’ll look at 197709 while I’m at it:

corrected-abb-pattern

Oh no, it is still wrong! I checked the previous ABA Table and that was the reason for the error. The error is also in the old AAB Table. However, the error was not in the file before that. My guess is that the AAB rule got applied to the wrong range of rows. I don’t see an error there, so I’ll have to rerun all the queries.

That’s OK, because I’m brushing up on the queries and will use the Is Null value so we will only be filling in the missing bases.

rev-aab-query

I had more problems, so I deleted the AAB Table and recopied the previous Table into it. I reran the Revised AAB Query halfway and it looked OK. However, when I ran the second half of the AAB query – filling Sharon’s results, the problem came back at ID# 197704. Very mysterious. The problem was where I thought it was originally. Look at the ID Criteria for the AAB Pattern Query:

the-problem

There is an extra digit in the first Between. The range goes from 45393 to 544155. The second number should be 54155. So the first range of this query was about 490,000 IDs wider than intended. I updated the AAB query so that it covers fewer rows. Again, fewer is better. After many requeryings, I got the desired result for ID# 197704:

197704

That should be the end of the first phase of nit picky work on the Paternal Side.
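
A mistyped range like the 544155 one can be caught before any update is run, because the AAB segments should never overlap each other. Here is a minimal Python sketch of such a check. The function name find_overlaps is invented for this illustration, and the ranges are assumed to be (start ID, stop ID) pairs.

```python
def find_overlaps(ranges):
    """Return pairs of (start, stop) ranges that overlap each other.

    Segments of one pattern should be separate, so any overlap suggests
    a typo in a Start or Stop value.
    """
    ordered = sorted(ranges)
    problems = []
    widest = ordered[0] if ordered else None  # range reaching furthest so far
    for current in ordered[1:]:
        if current[0] <= widest[1]:
            problems.append((widest, current))
        if current[1] > widest[1]:
            widest = current
    return problems


# The mistyped first range swallows the ranges that follow it.
print(find_overlaps([(45393, 544155), (60990, 72548), (207109, 220679)]))
# With the corrected stop value there is nothing to report.
print(find_overlaps([(45393, 54155), (60990, 72548), (207109, 220679)]))
```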

Summary, Conclusion and What’s Next

  • This was a lot of work, but the good news is that this update is for all the Chromosomes at once.
  • The bad news is that I have to do this again for the Maternal Side
  • Next up should be easy. That is just re-applying the Principles that Whit Athey Outlined on the new bases that I added from knowing the patterns. This should update missing maternally received bases from the updated paternally received bases.
  • I haven’t filled in blanks for the AAA patterns yet.
  • I am a little ahead of the game as I looked at how some of the first paternal crossovers will look.
  • Also with some basic phasing, I was able to deduce who those first paternal crossovers belonged to – one each to my two sisters and one for me.
  • If anything can go wrong it will