September 2016 – Hartley DNA & Genealogy

September 29, 2016 / May 30, 2018

Updates to Whitson, Whetstone and Butler YDNA: A Proposed Whitson/Butler Tree

There has been some good news since my last Blog on Whitson and Butler YDNA. I wrote that almost 2 months ago. The biggest news is that there are new people in the group.

There is now one new category – R1b>R-M239 Whetstone (in yellow). There are 2 new people there. There is a new person in the I1>M253 Whitson/Whetstone Group (McIntyre). There is a new Whitson under I2>M223 who has taken the 111 STR test which is one of the best available. He shows up under the green section as having an ancestor Jacob Whitson. I believe that he had tested before when Ancestry had YDNA testing, but unfortunately, it is not easy to compare the two tests. His results are of special interest to me as he is in the group with my Butler father in law. There are now 3 Whitsons and 3 Butlers in this I2 Subgroup.

In this Blog, I will be analyzing and drawing trees for the green I2 Whitson/Butler Subgroup as they have the most in the group. With too few people in a group, it is difficult to draw trees.

YDNA – What Does It All Mean?

As many know, YDNA shines a laser beam down the male line to the far past. YDNA can quickly show who is not related. For example, in the chart above, the people in the different colored subgroups cannot be related. The connection between these groups could be in the 1,000’s or 10’s of thousands of years. To find who is related by YDNA is more difficult. The probability of a relationship is predicted. This is because distance is measured in STRs and STRs can mutate whenever they want, even though on average they all mutate at a certain rate. Then some STRs may mutate faster than others – or much more slowly.

The TIP Report

FTDNA’s TIP Report is a good tool, because it estimates how closely 2 people may be related in generations based on probabilities. It takes into account the number of STRs tested and rate at which the STRs mutate.

batt and butler TIP

First, we will look at #1 and #4 on our list. They both tested at 111 STRs. The Report shows the likelihood that those 2 would share a common ancestor in the previous generations:

I usually feel that 90% is pretty likely. Let’s say a generation is 34 years. Twelve generations would then be 408 years ago, or about the year 1608 counting back from now, or even further back if we count from the birth of a living tester born in the 1950’s. Then again, it could be as close as 4-8 generations. Hopefully, we would know if the match was 4 generations ago, but the point is that the number of generations to a common ancestor could vary quite a bit.
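The arithmetic behind that estimate can be sketched in a few lines of Python. The 34-year generation length is the figure used above; the function names and the 1955 birth year are only illustrative assumptions.

```python
def years_back(generations, years_per_generation=34):
    """Number of years spanned by the given number of generations."""
    return generations * years_per_generation


def ancestor_year(generations, start_year, years_per_generation=34):
    """Approximate year of the common ancestor, counting back from start_year."""
    return start_year - years_back(generations, years_per_generation)


# Counting back from 2016, when this was written.
for generations in (4, 8, 12):
    print(generations, "generations:", ancestor_year(generations, 2016))

# Counting back from a tester born in 1955 (hypothetical birth year).
print("12 generations from 1955:", ancestor_year(12, 1955))
```

Changing the generation length or the starting year shifts the estimate by centuries, which is one more reason to treat the TIP numbers as a range.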

I did a comparison for everyone in the Green Group above:

I found the results quite interesting:

Mr Batt appears to be the same distance from each person in this group – irrespective of whether the match is a Butler or Whitson descendant
#4 Butler varies the most between 8 and 18 generations
#3 Butler was on average related most closely to the group
It appears that a sort of tree could be drawn from these results
It appears that this group of Whitsons and Butlers have been related to each other for quite a while. The number 12 comes up a lot for generations to a common ancestor. My guess is that these two families have been related to each other for between 8 and 12 generations.

These are my interpretations from just the TIP Report so far. I am open to other theories.

A tree from tip reports

I have never seen a tree drawn from these TIP Reports, but it would be interesting to try. Here is my first try:

This shows the furthest and closest relationships based on the TIP Report. #4 is 17 generations away from #2 and #4 is 8 generations away from #3. Now I just need to add one more Butler and 2 more Whitsons. But how? Here is a simple solution:

Here this assumes that all the GDs above 8 are pretty much equal and that everyone matches above at the common Whitson/Butler Ancestor. Here is another option:

This looks nicer, but I can’t say that it is more accurate given the TIP Reports. Here is a 3rd try:

This doesn’t seem to do the TIP Report justice either. I’ll go on to the more traditional trees made using STRs.

STR Analysis

I’ll now try to create a tree using a method developed by Robert Baber in 2014. Here is an example of one of his trees:

In my previous Blog, I looked at signature STRs. Those are the similar STRs that define a group. However, to create a tree, I will be looking at the STRs that are different.

I2 Whitson/Butler STRs

Here is a chart of the defining differences in the I2 Whitson/Butler Group:

modes

The first mode above is an I-A427 mode from the FTDNA I-M223 Y Haplogroup Project. So this mode should be a more generic version of the Whitson/Butler Group. The assumption is that the mode for this larger group goes back further in time than the Whitson/Butler Group. The reason that this is important is that it should tell us which way the STRs are moving.

In the first column with numbers above, the A427 mode is 29, the W/B Mode is 31 and 6 Butler (Michael) is 32. That means the STRs are mutating up.
Look at DYS576. That is a red STR. That means it is a fast mover. A427 mode is 18, W/B mode is 16 and Batt is 15. That means that the trend of STR mutation is going down over time.
CDY is a fast mover and difficult to interpret. Some people might ignore the CDY results for this reason.
Finally, look at the last 2 columns above. The A427 (older) modes are 14 and 12. The Whitson/Butler modes are 16 and 14. That would indicate that the trend in STR values is upward. However, at that level of STR testing (111), the 2 Whitsons are at the higher level and the Butler is at the lower STR level. If we were just looking at the 3 Whitson and Butler STR results here in isolation, we would think that the Whitson higher level STRs were older and that Butler is changing away from them. However, by using the broader I-A427 vantage, we can see that it is likely that it is Whitson changing away from Butler. This could have implications as we try to determine who came first – the Butlers or the Whitsons in this I2 subgroup.
It is possible that if all those in the I2 group had tested for 111 STRs, that the above point would be clearer.
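The "which way are the STRs moving" reasoning in the points above can be written as a small Python sketch. The repeat counts are the ones quoted in the text; the function name and the labels are illustrative assumptions, and the sketch only compares numbers. It says nothing about mutation rates.

```python
def trend(*values):
    """Direction of a series of STR repeat counts, listed oldest first."""
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    if all(s >= 0 for s in steps) and any(s > 0 for s in steps):
        return "up"
    if all(s <= 0 for s in steps) and any(s < 0 for s in steps):
        return "down"
    return "mixed or unchanged"


# Order of values: I-A427 mode, Whitson/Butler mode, then an individual result.
print("first column:", trend(29, 31, 32))   # #6 Butler (Michael)
print("DYS576:", trend(18, 16, 15))         # Batt
# Last 2 columns: I-A427 mode, then Whitson/Butler mode.
print("last two columns:", trend(14, 16), trend(12, 14))
```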

Just based on the last 2 STRs of the 67-111 STR results, I would draw a tree like this:

Unfortunately, I am having a lot of trouble understanding the Baber Paper and I am pulling the plug on that method for now. However, there are interesting concepts in it that are helpful.

From Baber to Robb

John Bartlett Robb put out a paper in 2012 called:

Fluxus Network Diagrams vs Hand-Constructed Mutation History Trees

In that paper Robb gives a procedure for drawing trees.

In his paper, Robb uses only the STRs in common, so in our case, that would be the 37 STRs. He also creates a Root Prototype Haplotype (RPH). In our case that RPH would just be the Whitson/Butler Mode. Then he notes deviations from that RPH in lime green:

Here are the Mutation Rates for the applicable STRs extracted from the Robb Paper:

The faster mutations are on the bottom and slower ones on the top. I added in the people on the right that had the mutations. On 37 markers, everyone had one mutation except for Butler (James) who had 3.

Proposed Whitson/Butler Tree

Here is the tree I came up with based on 37 STRs:

From there, I recall a rule by Baber which says, in my terms, “you should only have 2 lines going into each box”. Here is a tree that meets that rule:

So reading down from the top, we have the common ancestor which I have as Butler Ancestor 3. That ancestor has a certain signature based on STRs. Then I have my father in law branching off with a 389ii that goes from 31 to 32. I took my father in law as the first mutation as he had the second slowest mutation after #4 Butler. I couldn’t choose #4’s slowest mutation at that point as that mutation apparently happened after the common mutation (of 570 22 to 23) he had with #3 Butler. Branching down from Butler Ancestor 2 is Whitson Ancestor 2. From him I have #2 Whitson (Jacob) branching off as he has a slow moving STR also. Then from Whitson Ancestor 1, I have #5 Whitson (Isaac) and #1 Batt (Wm Whitson).

Also from Butler Ancestor 2 I have the common mutation of STR 570 which went from 22 to 23 in a presumed common ancestor of #3 Butler (Laurence) and #4 Butler (James). After this common mutation, the #4 Butler line had two additional mutations – one on the very slow mutating STR and one on the very fast mutating one.

The technique takes a little logic, a little guesswork and some knowledge of how the STRs mutate. If I had plugged #6 Butler into Butler Ancestor 2 and Whitson Ancestor 2 into Butler Ancestor 3, it wouldn’t have made much difference. I did it the way I did based on the speed of the STR’s mutation rate – all other things being equal. The overall idea is to get from the common ancestor signature STR to the individual members’ STRs.

I think the above tree is a likely scenario considering:

I see the Whitson STRs changing off the Butler STRs in my charts above.
The Butler STRs are slightly slower changing STRs which could indicate an older line.

Some other points:

It is likely that the Whitsons and Butlers are grouped together by surname as I have them.
The Butlers all descend from Ireland. If the chart is correct, then the Whitsons in Subgroup I2 could also descend from Ireland. A more complicated speculation would have both lines in England. Then the Butler line could have gone to Ireland and the Whitson Line to the U.S.

September 26, 2016 / April 13, 2017

Raw Data Phasing Part 4: Going from 3 Siblings to 4

In my last Blog, I mentioned that my brother Jon’s DNA test results came in this week. This happened in the middle of my attempt to learn how to phase the raw DNA data for my 2 sisters and myself. I was phasing the data in what I can only assume is a traditional way. I say I assume, as I haven’t seen any other blogs on the process. The difference is that I am using MS Access which I hope will speed up the process. I should be able to get results for 23 chromosomes at a time instead of just one at a time.

The arrival of the new DNA results poses at least two problems:

The previous 4 DNA data files were all in AncestryDNA version 1. Jon’s is in AncestryDNA2. While they are all Build 37, they look at somewhat different points on the chromosomes.
One of the difficult parts of the previous process was identifying and dealing with patterns of phased paternal and maternal bases. Those patterns were AAB, ABA, and ABB. With 4 siblings, there will be more patterns. However, the Whit Athey Paper I have been following does also look at 4 siblings.

AncestryDNA Version 1 Vs. AncestryDNA Version 2

My understanding is that Ancestry changed the locations on the chromosomes that they were testing to get more into the medical area like 23andme. I don’t know if that is true. Here is a chart comparing the different atDNA tests:

I was doing well comparing Anc1 with Anc1 as I was looking at over 700,000 base pairs among 4 people. Once I compare Anc2 to Anc1, that number is cut down quite a bit. That is about a 40% drop. My only other option, other than re-testing Jon, is to compare Jon to my mother’s FTDNA results. However, that will only pick up 2-3,000 SNPs, so I won’t bother.

Back to Square One with 4 Siblings: Homozygous Siblings

I need to find Jon’s equal base pairs and apply one to his ‘from dad’ column and one to his ‘from mom’ column. That is, after I add all Jon’s data to my database and add those columns. First I need to decide where to add Jon’s data. I could add it to the beginning of what I have already done or to the end. I’ll try adding it to the end, because I think that the work I did already is OK. I want to build on that. So rather than adding Jon’s DNA to the first step, I’ll add it to my table called tblMomBaseFromDadBase. This table has over 700,000 lines of bases for 4 people. Jon’s has 668,942 lines. Actually, when I remove “Chromosomes” 24-26, I will only have 666,531 lines.

Querying Jon into my latest table

Here I am adding Jon and the Mom from Dad Table to my query design:

Access thinks the ID that it added was important, but it really isn’t, so I need to take out that equal join. I really want the join to be at the rsid, but I don’t want an equal join. Why not? If I had an equal join, I would end up only with the positions that Jon has. I will lose 40% of the work that I have already done. Instead, I’ll use an unequal join.

I flipped the 2 tables in the query design area, so things are moving left to right. Then I choose a #2 join which is basically, an unequal join left to right.
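For readers more used to code than to the Access query grid, the difference between the two join types can be sketched in Python. As far as I understand it, the "equal join" is what SQL calls an inner join and the #2 join is a left outer join. The rsid values and genotypes below are made up for illustration and are not taken from the real files.

```python
# Hypothetical rows keyed by rsid; the genotypes are invented examples.
siblings = {"rs_a": "AG", "rs_b": "CC", "rs_c": "TT"}   # existing 3-sibling work
jon = {"rs_a": "AA", "rs_c": "CT", "rs_d": "GG"}        # Jon's newer test


def inner_join(left, right):
    """Equal join: keep only the rsids found in both tables."""
    return {rsid: (left[rsid], right[rsid]) for rsid in left if rsid in right}


def left_join(left, right):
    """Left outer join: keep every left row; missing right values become None."""
    return {rsid: (left[rsid], right.get(rsid)) for rsid in left}


print(inner_join(siblings, jon))  # rs_b is lost
print(left_join(siblings, jon))   # rs_b is kept, with None for Jon
```

With the left join, none of the earlier sibling rows are dropped; the positions Jon was not tested at simply stay empty for him.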

Actually, I changed my mind. I have a better idea. I will just do the first 2 steps on Jon’s raw DNA and then join the results together. That is a third way that I hadn’t thought of. The point is, that there are many ways to do things in Access. There can be more than one way to get to where you want to be.

Back to Homozygous Siblings

First I copied Jon’s raw data into a table called tblJonHeterozygousSib. This is so I can use an update query to update the data in the new table and still have the original. Hold that idea. The better idea is to use a make table query. The reason that this is better is that it can take out the “chromosomes” I don’t want:

I took out the table I copied and I’ll make a better one with only Chromosomes 1-23. I hit the Run button and create a table with 666,000 lines:

Then in the above table, I inserted 2 columns: JonFromDad and JonFromMom. Now this table is ready to phase for any homozygous siblings. By the way, it looks like my Chr23 or X is homozygous, but it isn’t. Ancestry adds an extra base. I only really have one for my X Chromosome.

Finally time to query and phase

I go to Query Design in Access and choose the above table. This is a very simple Update Query design:

This says if Jon’s allele1 is the same as his allele 2, put allele 2 as his base from mom and as his base from dad. I hit the run button for the update and get the dire warning that I’m updating a lot of information and can never change it back. Then I get a message that I’m updating 478,000+ rows. That is good. That is the number of Jon’s homozygous bases – quite a few. I’d say over two thirds.
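The logic of that update can be sketched in Python as follows. This is not the Access query itself, only the same rule applied to a list of rows. The column names JonFromDad and JonFromMom come from the table above; the rsid values, the sample genotypes and the check for valid bases are assumptions added for the example.

```python
VALID_BASES = {"A", "C", "G", "T"}


def phase_homozygous(row):
    """Fill both parental columns when the two alleles are the same base.

    Returns True when the row was updated.
    """
    allele1, allele2 = row["allele1"], row["allele2"]
    if allele1 == allele2 and allele1 in VALID_BASES:
        row["JonFromDad"] = allele2
        row["JonFromMom"] = allele2
        return True
    return False


# Hypothetical sample rows.
rows = [
    {"rsid": "rs_example_1", "allele1": "G", "allele2": "G",
     "JonFromDad": None, "JonFromMom": None},
    {"rsid": "rs_example_2", "allele1": "A", "allele2": "G",
     "JonFromDad": None, "JonFromMom": None},
]

updated = sum(phase_homozygous(row) for row in rows)
print(updated, "row(s) updated")
```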

I’m not looking for crazy results and didn’t get any.

Homozygous Mom Query

I’ll copy my previous table into one to update. Then I need to add Jon’s base from mom where mom is homozygous. Easy peasy. I think this is all I need.

Actually, I did think of an issue. I have an equal join. That means I won’t be using the homozygous bases that mom tested for in the old AncestryDNA test that aren’t in the new AncestryDNA test list. My guess is that is interesting information but perhaps not very useful. It also occurs to me that in the spots where Jon doesn’t match up with my siblings, I will still have the 3 letter pattern work that I had done previously.

The query above says if Mom allele 1 = 2, then put that 2 allele in Jon’s from Mom base slot. I hit Run and pasted 277,000 rows of bases.
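Here is a minimal Python sketch of the same rule, again as an illustration and not the actual query. It assumes Jon's rows and Mom's genotypes are both keyed by rsid, which plays the role of the equal join. The rsid rs_example is hypothetical, and Jon's genotype at rs3819001 is invented, since the post only says that Mom's file lacks that rsid.

```python
def phase_from_homozygous_mom(jon_rows, mom_genotypes):
    """Where Mom is homozygous, that base must be the one Jon got from her.

    Rows with no matching rsid in Mom's data are left alone.
    Returns the number of rows updated.
    """
    updated = 0
    for rsid, row in jon_rows.items():
        genotype = mom_genotypes.get(rsid)
        if genotype is None:
            continue  # Mom was not tested at this position
        mom_allele1, mom_allele2 = genotype
        if mom_allele1 == mom_allele2:
            row["JonFromMom"] = mom_allele2
            updated += 1
    return updated


jon_rows = {
    "rs_example": {"allele1": "A", "allele2": "G", "JonFromMom": None},
    "rs3819001": {"allele1": "C", "allele2": "C", "JonFromMom": None},
}
mom_genotypes = {"rs_example": ("G", "G")}

count = phase_from_homozygous_mom(jon_rows, mom_genotypes)
print(count, jon_rows["rs_example"]["JonFromMom"])
```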

This query will be a little more difficult to check. I have to create a query linking my mom’s DNA results to this table. I did that and see one problem already.

The first problem is that ID 126 didn’t show up. That means that rs3819001 that Jon has is not in my mom’s raw DNA. I don’t want to have data for Jon that looks like it can be updated, but it can’t.

I think I can fix this.

Updated Table Query

A few steps ago, I ran a Table Query to get just Chromosomes 1-23 into Jon’s Table. I need to upgrade that query so that I am only including the locations (rsid’s) that are common to both my mother and Jon. I do this using an equal join on the rsid Field:

This time, my table for Jon only has the rsid’s that my mom has.

Also my Chromosome formula was off, so I had to fix it. Also note that I have about the same number of rows as in my Anc1 vs. Anc2 table earlier in the Blog. I then re-added the Jon from Dad and Mom columns into the new and improved table. Then I reran the update query which told me I was about to update 284,000+ rows.

This worked as well as last time, but this time I have the fewer rows I was trying to get.

Re-Run the update query for homozygous mom for jon

I double clicked on my old update query. The message said I was updating 277,000 rows or so. Now I’ll re-check my work. If there is no ID 126, I’ll be happy. Well it is still there, because I forgot to copy the previous homozygous sibling table into the homozygous mom table. After re-re-running the update, I got the desired results:

And there you [don’t] have it: no ID 126. Here is my mom’s raw file compared to Jon’s updated table.

Jon gets a G from mom at ID 128 even though Jon is AG, because mom is GG. Now I’m talking DNA.

Merge Jon’s New Table with His 3 Siblings’ Tables

This is the point where I put everything together. I will try to use the Make Table Query for this one again. So I’ll put my newest Jon table together with my newest sibling table.

This shows the left to right arrow join. I’ll want the larger file plus everything equal in the smaller file. Come to think of it, this Create Table Query would have fixed the earlier problem I had. I guess I was too careful! The other issue is that the ID in the 1st table won’t be the ID in the second table. I could keep the second ID, but I would have to rename it as Jon ID or Anc2ID.

Here I rename Jon’s IDs as JonID. I may not need it, but if I do need it I will have it. I guess MS Access wasn’t happy with my idea:

OK, I took out the JonID and hit Run. Microsoft tells me about my new 700,000 row table.

Back to the Dad Patterns

Now that all the family is together I want to look at Dad Patterns, because I know that I will be updating those. Here is the first query I tried on my new Table of 4Sibs.

This is looking for filled in Dad bases where Sharon’s base is not the same as Joel’s. That query gives me an ABAA pattern:

Also ABBB:

Here’s ABBA:

It looks like ABAB is a possibility also. That means the following are possible:

AAAB
AABA
AABB
ABAA
ABAB
ABBA
ABBB

So if I chose Joel’s Base not equal to Sharon and then Joel’s base equal to Sharon would I have every combination? It looks like I need this combination to cover all possibilities:

Joel<>Sharon OR
Joel<>Heidi OR
Sharon<>Heidi OR
Heidi<>Jon OR
Jon<>Joel OR
Jon<>Sharon
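The six comparisons above are every possible pairing of the 4 siblings. A short Python sketch of the same test follows, assuming all four From Dad bases are filled in; the function name is an illustrative assumption.

```python
from itertools import combinations

SIBLINGS = ("Joel", "Sharon", "Heidi", "Jon")


def any_pair_differs(bases):
    """True when at least one pair of siblings has different bases."""
    return any(a != b for a, b in combinations(bases, 2))


pairs = list(combinations(SIBLINGS, 2))
print(len(pairs), "pairs:", pairs)
print(any_pair_differs("CCCC"))  # all four match
print(any_pair_differs("CCTC"))  # one sibling differs
```

Because the test is only false when all four bases match, comparing each of the other three siblings to Joel would give the same answer with three comparisons instead of six.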

Which in Access looks like:

But Wait, I Forgot Principle 3 for Jon

Principle 3 says where Jon is heterozygous and he knows where he got his maternal base, the other base goes into his From Dad column. Looking back at my old queries, I see this is a 2 step query. I’m tempted to try this in one step, but I think this got me in trouble before, so I’ll go with the simpler query. Simpler queries are usually better in MS Access.

This says where Jon is missing a phased allele from Dad and he has an allele that doesn’t equal the one he got from mom (making Jon heterozygous here) put that allele into Jon’s From Dad spot. I tried the query and only got 37 results. The problem is, I should have said ‘Is Null’ in the JonFromDad Criteria:

This time I get 35,000 updates, so that is right. I then change the allele1’s to allele2’s above and get 33,000 updates to tbl4Sibs. I ran a quick query on the 4Sibs Table to get just Jon’s heterozygous results:

In the first line, Jon had allele1 as T which was different from the allele from Mom of G, so Jon’s T got put into the From Dad spot. At ID 41, Jon’s allele2 of G is from Dad because he had an A from Mom. When parent and child are heterozygous, the From Parent location remains blank.
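Principle 3 for Jon can be sketched in Python as below. The post does this in two Access queries, one for allele1 and one for allele2; the sketch folds both into a single pass, which is a simplification and not the method used above. The None checks stand in for the 'Is Null' criteria, and the sample rows follow the two examples just described.

```python
def apply_principle_3(row):
    """If Jon's base from Mom is known and his base from Dad is empty,
    the allele that differs from Mom's base must have come from Dad."""
    if row["JonFromDad"] is not None or row["JonFromMom"] is None:
        return row  # nothing to fill, or nothing to fill it from
    for allele in (row["allele1"], row["allele2"]):
        if allele != row["JonFromMom"]:
            row["JonFromDad"] = allele
            break
    return row


first_line = apply_principle_3(
    {"allele1": "T", "allele2": "G", "JonFromMom": "G", "JonFromDad": None})
id_41 = apply_principle_3(
    {"allele1": "A", "allele2": "G", "JonFromMom": "A", "JonFromDad": None})
unknown = apply_principle_3(
    {"allele1": "A", "allele2": "G", "JonFromMom": None, "JonFromDad": None})

print(first_line["JonFromDad"], id_41["JonFromDad"], unknown["JonFromDad"])
```

The last row shows the case where parent and child are both heterozygous: neither column can be filled, so both stay blank.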

Now I have Jon with 3 Principles: Homozygous Jon, Homozygous Mom and Heterozygous Jon.

Back to Dad Patterns

I have the old Dad Patterns for 3 siblings. Now I need to see what the 4 sibling Dad Patterns would be and add Jon’s Start and Stop Locations for his new Dad Pattern Areas. I need to combine that with the 3Sibs Table.

My first query was wrong and gave bad results. The reason is that the ID for 4Sibs was from the raw data. The ID for the Dad Pattern Table just numbered the amount of Dad patterns. I needed to join the ID in the first table to the start and stop locations in the second table. I ended up doing 2 queries: one for the start position and one for the stop as I needed both. This query gives the stop position of a pattern.

I took both those queries and put them into an Excel Spreadsheet.

I added a new column called Dad4Pattern. In the first row, the new pattern was AAA by chance. However, in the second row which is the Stop or End of the first Dad Pattern, it is obvious that the ABA Dad Pattern goes to an ABAA Pattern. I didn’t think that there would be many AAAA Patterns as that means that all siblings match the same Paternal grandparent. This is the only AAA pattern that I had noted so far as I wasn’t looking for them yet. Still, I will need to go back and verify that these Start and Stop AAAA’s were not by chance. Finally, on the last line, it is clear that the Dad Pattern goes from AAB to AABB with Jon added.

Next I chose all the cells where Jon had a base from Dad and performed a Concatenate operation to write the pattern.

This gave me the CCCC that I wanted to check. Next, I wrote a formula to put the Dad bases together in a new column and wrote down the Dad Patterns that I had.
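Turning a row of four From Dad bases into a letter pattern can be sketched in Python like this. It is an illustration of the idea, not the spreadsheet formula used above, and it assumes the siblings are listed in the order Joel, Sharon, Heidi, Jon with the first sibling always labeled A.

```python
def dad_pattern(bases):
    """Turn the siblings' bases from Dad into an A/B pattern.

    The first sibling is always 'A'; anyone with a different base is 'B'.
    Returns None if any base is missing.
    """
    if any(base is None for base in bases):
        return None
    first = bases[0]
    return "".join("A" if base == first else "B" for base in bases)


print(dad_pattern("CCCC"))
print(dad_pattern("CTCC"))
print(dad_pattern("TCCT"))
print(dad_pattern(["C", None, "C", "T"]))
```

A single AAAA row, like the CCCC one, can happen by chance inside another pattern area, so it only counts as a real AAAA pattern if the neighboring rows agree.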

A few notes:

Out of the 66 three sibling patterns that I had, I was able to find all but 5 new four sibling Dad Patterns. See the yellow above for two of the missing 4 sibling dad patterns.
The missing 4 sibling dad patterns should be easy to find by scrolling through the 4Sib Table
I noticed that there were no AAAB patterns. That is because in my previous search, I was not looking for AAA patterns. So now, I don’t have any AAAB patterns. I will have to find these in my new search.
AAAB is the situation where I match the same paternal grandparent as my 2 sisters, but Jon matches the other paternal grandparent.

Filling in more dad patterns

To fill in the yellow areas, I made a query in Access based on the 4Sibs Table. This looked at every case where Jon had a base from Dad. Searching around the ID 6604 and after, I found this pattern:

ABBB

Then I checked near the end of the old 3 sibling pattern which is at ID 19806.

At ID 19827 we see an ABAB Pattern, so I enter that Pattern in my spreadsheet:

For the start of the new ABAB pattern, I used the old ABA location as that was more precise. The next interesting thing happens at Chromosome 2:

Here I have a problem in my spreadsheet. For some reason, the Start of the last pattern of Chromosome 2 ends at Chromosome 3, which is not right. My previous spreadsheet was better than that. From the ashes I will re-build.

I note that at ID 108798, my 4 Sib Spreadsheet goes to an ABAB Pattern. At the end of Chromosome 2, I see an AAAB Pattern. That was the one I wouldn’t have had from the 3 sibling pattern as I wasn’t checking on AAA’s.

I added new rows for the patterns ABAB and AAAB:

The most important thing here is the ID, the pattern, the Start and Stop. Here is the new change area from ABAB to AAAB:

There are a few SNPs between the ABAB Stop and the AAAB Start that are a little unclear.

Finding Jon’s Patterns

Now I’ll check Jon’s Patterns. I’m looking for any changes in patterns as these should be important as crossovers later. I will need to assign the crossovers to each sibling’s Chromosome Map.

Good Old Triple A – B Pattern and all the others

AAAB is where Jon has a different paternal grandparent than his 3 tested siblings and the 3 siblings have the same paternal grandparent.

My query says that Jon has to be different from each sibling. I run that and insert the appropriate Start and Stop point for the AAAB in my spreadsheet.

I do the same for AABA which I can find using a similar query under Heidi’s criteria:

I ended up going to a clean spreadsheet. It was too messy combining the 4 sibling results with the old 3 sibling results.

Here I have the ID, the Chromosome, the pattern and the Start and Stop. The yellow marks a one SNP pattern. It appears that there should be 3 types of patterns:

One where one sibling matches none of the others. That is what I have above: AAAB, ABAA, AABA and BAAA (which is written ABBB when the first sibling is always labeled A)
One where 2 pairs of siblings match each other: AABB or ABBA. I’m not sure what else there could be. I looked above and saw one other: ABAB
One where all the siblings match each other: AAAA

That makes 7 or 8 patterns, depending on whether AAAA is considered a pattern.
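The count can be checked with a short Python sketch. It assumes only two paternal grandparents are possible at any position and that the first sibling is always labeled A; the function name is illustrative.

```python
from itertools import product


def sibling_patterns(sibling_count):
    """All A/B patterns for the siblings, with the first sibling fixed as 'A'."""
    return ["A" + "".join(rest)
            for rest in product("AB", repeat=sibling_count - 1)]


three = sibling_patterns(3)
four = sibling_patterns(4)
print(len(three), three)
print(len(four), four)
```

For 4 siblings this gives 8 patterns including AAAA, or 7 without it, which agrees with the list earlier in the post. For 3 siblings it gives AAA plus the AAB, ABA and ABB patterns.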

Two Pairs of siblings match each other patterns

Here is the Access query for AABB

At first I was missing the criteria under SharonFromDad and that gave me AAAA combinations also. The result of the query looks like this:

Here Joel matches Sharon and Heidi matches Jon but on a different base. After I was finished putting in Starts and Stops for each Pattern, I then sorted my spreadsheet by ID. This brings up some issues that need looking at:

Where there are 2 Starts or Stops in a row, there is a need to check what is going on. The ones around the yellow positions may not be a problem as I’ll likely be taking those single positions out. However, at the end of Chromosome, there are 2 starts and 2 stops together. I need to go to ID 236707 and see what is before that point. It appears that there is an AAAA pattern before that point and that the ABAB at 224584 is a single point. That fixes half of the problem. Then I go to ID 238976 to see why I have a Stop there for ABAB.

I had missed the Start for the ABAB right after the stop of the ABBA pattern, so I added it in. The repaired spreadsheet looks like this.

An application

Now that I have the change between ABBB and ABAB described, let’s look at what it means. Here is a different look at that location:

When the pattern changes from ABBB to ABAB, what has changed is the third B changes to an A. Heidi is in that location. So that says at the above position of Chromosome 5, Heidi has a paternal crossover. I thought it would be good to check my work against the work of M MacNeill. To do that, I used the NCBI Remap website to change my Build 37 results to Build 36:
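Reading a crossover off a pattern change can be sketched in Python. The sibling order Joel, Sharon, Heidi, Jon follows the patterns in this post. The rule for the first sibling is my own inference from how the patterns are labeled: because Joel is always A, a crossover in Joel would show up as every other letter flipping at once.

```python
SIBLINGS = ("Joel", "Sharon", "Heidi", "Jon")


def crossover_sibling(before, after, siblings=SIBLINGS):
    """Name the sibling whose crossover explains a pattern change.

    Returns None when the change cannot be explained by a single crossover.
    """
    changed = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
    if len(changed) == 1:
        return siblings[changed[0]]
    if changed == list(range(1, len(siblings))):
        return siblings[0]  # everyone else flipped relative to the first sibling
    return None


print(crossover_sibling("ABBB", "ABAB"))
print(crossover_sibling("ABAB", "AAAB"))
print(crossover_sibling("ABBB", "AAAA"))
print(crossover_sibling("AABB", "ABBA"))
```

A result of None is a hint that a pattern area between the two was missed, or that two siblings crossed over very close together.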

This would be the start of Heidi’s new segment. Here is what MacNeill had:

I got it right again. That is 2 for 2. Actually, the first time I tried, I was comparing the wrong Chromosomes. Rookie mistake. Here is M MacNeill’s map for Heidi on Chromosome 5:

Perhaps it is difficult to see, but the point I am looking at is the little lighter red segment at the far right of Chromosome 5. Perhaps that is why I missed it the first time as it is so small.

Another aside is that this was a very difficult Chromosome to decipher using visual methods. This was one of my attempts to figure out the crossovers visually for 3 siblings.

I had missed the last crossover as it is so small and difficult to see. In my defense, I should note that M MacNeill did mention that the end of this Chromosome was difficult to decipher.

Taking Out the X

I’ve realized that I’ve generated some bases for the X I got from Dad. Of course, I didn’t really, so I’m taking out any bases there for me and my brother Jon. I’ll use this update query:

I was worried that I’d mess something up, so I created a new table called 4SibsChrX. My query put dashes in the spots where I couldn’t have an X base from Dad:
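The same clean-up can be sketched in Python. Working on a copy mirrors the idea of the separate 4SibsChrX table. JonFromDad is a column name used earlier; JoelFromDad, the chromosome key and the sample rows are assumptions made for the example.

```python
import copy

MALE_FROM_DAD_COLUMNS = ("JoelFromDad", "JonFromDad")


def blank_paternal_x(rows, x_chromosome=23):
    """Return a copy of the rows with dashes where a son cannot have an X base from Dad."""
    result = copy.deepcopy(rows)
    for row in result:
        if row["chromosome"] == x_chromosome:
            for column in MALE_FROM_DAD_COLUMNS:
                row[column] = "-"
    return result


# Hypothetical sample rows.
rows = [
    {"chromosome": 22, "JoelFromDad": "A", "SharonFromDad": "A", "JonFromDad": "G"},
    {"chromosome": 23, "JoelFromDad": "C", "SharonFromDad": "C", "JonFromDad": "C"},
]
cleaned = blank_paternal_x(rows)
print(cleaned[1])
```

The sisters' columns are left alone, since daughters do get an X from Dad.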

This looks like a good place to end Part 4. It appears that there should be many chances to quality check my work and that the process is progressing. Getting Jon’s new DNA set me back a bit, but the results should be better than what I’d see with 3 siblings.

September 23, 2016 / April 13, 2017

Raw Data Phasing: Part 3

This Blog is Part 3 documenting my learning process of phasing my DNA raw data using:

My raw data results, along with my mother’s and 2 sisters’ raw DNA data
MS Access to speed up the process
A Whit Athey Paper on Phasing when the data from one parent is missing
The work of M MacNeill on my raw data results.

Part 1 and 2 Recap

I imported 4 sets of raw data into Access from AncestryDNA after taking out the zeros that the Excel software produced for the no-calls.
I used Access Queries to apply 3 Whit Athey Principles. This resulted in many phased bases for me and my 2 sisters.
I put the phased A’s, G’s, C’s and T’s for each sibling into 2 new columns for each sibling
This resulted in 6 new columns. The first 3 of these six were for the paternally based bases. These resulted in a pattern which was either in the form of AAB, ABA, or ABB.
The Athey Paper did not emphasize the AAA pattern or considered it a non-pattern. While specific AAA results within another pattern area are by chance, there are other areas where 3 siblings match the same grandparent where there will be an AAA-only Pattern.
I separated my results into 3 patterns using Access: AAB, ABA, and ABB
For each of those results, I noted where those patterns changed. I did this by looking at the ID numbers. Breaks in the ID numbers were considered changes.
However, there were some cases where the changes occurred around missing bases. For these, I went back and noted a more precise position of the pattern change based on where the change would be if the missing base were to be filled in.
I made a preliminary bar graph using the first 3 paternal changes. These crossovers were mapped to myself and 2 sisters.
Using the 3 patterns I developed Access queries to fill in the missing bases in the 3 paternal pattern areas.
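The step of finding pattern changes from breaks in the ID numbers can be sketched in Python. This is only an illustration of the idea: the max_gap parameter and the sample IDs are assumptions, and in practice the allowed gap has to be judged by eye because of rows with missing bases.

```python
def id_runs(ids, max_gap=1):
    """Group sorted ID numbers into (start, stop) runs.

    A new run starts whenever the jump to the next ID is larger than max_gap.
    """
    runs = []
    for current in sorted(ids):
        if runs and current - runs[-1][1] <= max_gap:
            runs[-1][1] = current
        else:
            runs.append([current, current])
    return [tuple(run) for run in runs]


# Hypothetical IDs of rows that showed one pattern, for example ABB.
print(id_runs([1, 2, 3, 7, 8, 20]))
print(id_runs([1, 2, 3, 7, 8, 20], max_gap=5))
```

A run whose start equals its stop is a single-position pattern, which is a candidate for a chance match and not a real pattern area.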

So those were the 10 easy steps. Actually step 10 was difficult as there was quite a bit of refining the Access queries and quality checking the results. I needed 2 queries for each of the pattern areas. However, once I had the queries, it was the push of a button to update missing parental-received bases for 3 siblings within over 700,000 lines of DNA.

Back to Athey

This portion of the Athey Paper appears to apply to where I am now:

For some of the unfilled cells on the mother’s side of the table, we can fill in the alternative (other) base from the corresponding location on the father’s side of the table. That is, we know that the sibling with an empty cell got one base from the father, but the alternative base from the mother. Therefore, after the use of the Dad pattern fills in more cells, a newly filled – in cell in the father’s side of the table gives rise to a filled – in cell in the same position on the mother’s side–the alternative base to what was on the father’s side.

Unfortunately, I’m not sure what is meant above. My guess is that this relates to Principle 3:

Principle 3 — A final phasing principle is almost trivial, but it is normally not useful because there is usually no way to satisfy its conditions: If a child is heterozygous at a particular SNP, and if it is possible to determine which parent contributed one of the bases, then the other parent necessarily contributed the other (or alternate) base. This principle will be very useful in the present approach.

So now that missing paternal bases have been determined based on the patterns, it should be possible to fill in missing maternal bases for heterozygous children. First, I’ll do a Query to see if I can locate this situation. I’ll take my most recently updated Dad ABB Pattern Table update and query that. I’ll look at the situation where there are heterozygous results. Then, I’ll look at spots where there are missing bases from Mom.

Fortunately, I was able to come up with a slick looking Query for this situation:

Plus the Query design has some nice symmetry. The first criteria row of the query is for my (Joel) DNA. Reading across, it says Joel is heterozygous because my allele 1 does not equal my allele 2. Then it says that I have a base from Dad but not from Mom. This will show areas where the mom bases are missing in this heterozygous child situation.

The truncated fields above are Joel Allele 1, Joel Allele 2, Sharon allele 1&2, Heidi allele 1&2. The next 3 columns are Joel, Sharon and Heidi from Dad. Then Joel, Sharon and Heidi from Mom (the last 3 columns). This shows that there are almost 12,000 of these Mom bases to fill in. Above the blue line are Heidi’s bases missing from Mom. Heidi is TC (heterozygous) on that line. Her Dad base is T, so Heidi’s Mom base will be C above the blue line. At the blue highlighted area, I am TC and my Dad base is C. My Mom base will be T on the blue line. I love these binary problems. They seem well suited for the computer. That means that a query to update almost 12,000 records should not be too difficult.
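The rule for one sibling at one position can be sketched in Python as below. It is a hedged illustration of the logic, not the Access query: the function name is made up, and the extra check that the Dad base is actually one of the two alleles is an added safeguard.

```python
def missing_mom_base(allele1, allele2, from_dad, from_mom):
    """Return the base to fill in as 'from Mom', or None if the row does not qualify.

    A row qualifies when the sibling is heterozygous, the base from Dad is
    known, and the base from Mom is still empty.
    """
    if allele1 == allele2:
        return None  # homozygous, so nothing to deduce here
    if from_dad is None or from_mom is not None:
        return None  # nothing to work from, or already filled
    if from_dad not in (allele1, allele2):
        return None  # inconsistent data; leave it for manual checking
    return allele2 if allele1 == from_dad else allele1


print(missing_mom_base("T", "C", "T", None))  # Heidi: TC with T from Dad
print(missing_mom_base("T", "C", "C", None))  # Joel: TC with C from Dad
print(missing_mom_base("T", "C", "T", "C"))   # already filled, so no update
```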

Looking for a Good Query to Fill In Mom Bases from Dad Bases

First, I copied my ABB Table to a new Table called tbleMomBaseFromDadBase. I will want to update that table with a new Update Query. I already have the first part of the query. Now I need my thinking cap. Even better than thinking, I can look at what I did before. Here is my old query.

This is difficult to see, but I split the problem into 2 alleles. What this says is when Sharon has a base from her mom and Sharon’s allele 1 is not the same as the base from her Mom, pop that allele 1 into her base from Dad slot.

For our situation we are doing the opposite. So we will switch Mom and Dad. This time we are using our Dad results to get some Mom results. I’ll also add a criterion to make sure the Mom result is Null, so I’m not overwriting anything. It will just be an extra precaution.

Basically, I want to make sure Heidi has a base from Dad and not from Mom. In that case, when her allele1 is not equal to her base from Dad, put that allele 1 in as her base from Mom. Drawing upon my vast experience in this area of about 1 week, I get this:

When I preview the results, I get about 6,000 lines which is half of my previous query, so that seems OK. I’ll go ahead and update my new Table. I renamed my Query to qryMomBaseFromDadBaseAllele1 and copied it to do the same thing with Allele2. I’ll change the Allele 1’s to Allele 2’s in the Query design. First I’ll do a Select (non-updating) Query to show what I’ll be updating with the Allele 2’s.

Here I added the ID numbers, so I can make sure my update went well.

Here is my Allele2 Update Query with the 3 siblings included:

The results:

In the far right column is the Base Heidi got from Mom. It was updated on lines 2292, 2295 and 2299. In each case Heidi’s Paternal Base was T and the Maternal Base derived from it was C.

Here is my corresponding filled in Mom Base:

My Dad’s T’s in 6 columns from the right were used to fill in the missing C’s in 3 columns from the right. Doesn’t it seem a bit ironic? Even though my dad was not tested for DNA, his “results” from this process are used to find the DNA I got from my mom who was tested.
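For readers who do not use Access, the two update queries (one per allele) amount to something like the following Python sketch. The field names and the list-of-dicts layout are assumptions for illustration, not the actual table design, and the sample rows are made up:

```python
def fill_mom_base(rows, child):
    """Fill in a child's missing Mom base from the known Dad base.

    Each row is a dict with keys such as 'Heidi_allele1',
    'Heidi_allele2', 'Heidi_from_dad' and 'Heidi_from_mom'
    (hypothetical names). Returns the number of rows updated.
    """
    updated = 0
    for row in rows:
        a1 = row[f"{child}_allele1"]
        a2 = row[f"{child}_allele2"]
        dad = row[f"{child}_from_dad"]
        mom_key = f"{child}_from_mom"
        # Only touch heterozygous rows that have a usable Dad base and
        # no Mom base yet, so nothing already filled in is overwritten.
        if a1 == a2 or dad not in (a1, a2) or row[mom_key] is not None:
            continue
        # First the Allele1 check, then the Allele2 check.
        for allele in (a1, a2):
            if allele != dad:
                row[mom_key] = allele
                updated += 1
                break
    return updated


# Made-up rows, for illustration only.
rows = [
    {"Heidi_allele1": "T", "Heidi_allele2": "C",
     "Heidi_from_dad": "T", "Heidi_from_mom": None},
    {"Heidi_allele1": "C", "Heidi_allele2": "T",
     "Heidi_from_dad": "T", "Heidi_from_mom": None},
    {"Heidi_allele1": "G", "Heidi_allele2": "G",
     "Heidi_from_dad": "G", "Heidi_from_mom": None},
]
print(fill_mom_base(rows, "Heidi"))
```

Running the function a second time on the same rows should update nothing, which is a handy check that the "Mom result is Null" precaution is doing its job.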

A Premature End to This Blog and a New Beginning

This will be one of my shortest Blogs. I was both awaiting and not awaiting my brother’s DNA test results. Those results came in this week. The reason I was not awaiting was that I knew that I would need to re-start the raw data DNA phasing process once his results came in. With that, I’ll end this Blog and start a new one.

September 23, 2016April 13, 2017

A New Tested Frazer Descendant: My Brother

My last Blog on Frazer DNA had to do with a newly tested James Line person – Madeline. My brother is on the Archibald Line of our Roscommon, Ireland Frazer Study Group. I had brought a DNA kit to my Hartley Family Reunion at the beginning of August, thinking to get a sample from one of my dad’s cousins. I didn’t end up doing that. So, later, I asked my brother if he would take the test. He did and the results are in.

Missing Frazer Segments from the Hartley Family

M MacNeill – prairielad_genealogy@hotmail.com has been mapping Chromosomes based on my family’s raw DNA data. That has shown that, on some chromosomes, even with 3 tested siblings, there is some Frazer DNA missing. Here is Chromosome 3, for example:

The bottom 3 lines are my DNA and my 2 sisters’ DNA. The lighter red is Frazer and the darker red is Hartley DNA. In the middle part of the Chromosome, my 2 sisters and I only inherited Hartley DNA. That means that there is some Frazer DNA missing. My father’s DNA is on the top line. The cross hatch area shows the Frazer DNA that he is missing there because by chance his 3 children below didn’t inherit any in that area. My brother Jon may not help fill in this particular gap, but he may fill in some of the gaps.

Looking for New Frazer DNA from Jon

What I did was look at Jon’s top matches. Then I ran those top matches through the One to Many Utility at gedmatch.com. From there I looked at Jon’s match’s matches to see if Jon came up by himself. That would be the new Frazer DNA. Jon’s top Frazer match is our second cousin, once removed Paul. I didn’t see any obvious new DNA with that comparison. Jon’s 2nd or 3rd top Frazer DNA project person is Michael. When I go to Michael’s match list, Jon comes up as Michael’s top DNA match. That is a good sign. Here are Michael’s Chromosome Browser matches for Chromosome 2:

Here Jon is #1.

It is not a large match, but the key here is that it is by itself. That makes it new as my 2 sisters and I don’t match Michael at that spot.
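The "by itself" test can be sketched in a few lines of Python. This is an illustration only: the function names are invented and the positions are made up, not taken from the Gedmatch output:

```python
def overlaps(seg_a, seg_b):
    """True when two (start, end) segments share any positions."""
    return seg_a[0] < seg_b[1] and seg_b[0] < seg_a[1]


def new_segments(new_tester_segs, sibling_segs):
    """Segments of the new tester that no sibling's segment overlaps."""
    return [seg for seg in new_tester_segs
            if not any(overlaps(seg, other) for other in sibling_segs)]


# Made-up positions (in Mb) on one chromosome, for illustration only.
jon_vs_match = [(10.0, 18.5), (120.0, 131.0)]
siblings_vs_match = [(118.0, 125.0)]
print(new_segments(jon_vs_match, siblings_vs_match))
```

Note that a segment which only partly overlaps a sibling's match is treated as "not new" here, even though the extra length could still be new DNA.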

Phasing Brother Jon

Seeing the match above, it reminds me that I need to Phase Jon by Gedmatch. That means that gedmatch takes my brother’s results and splits them into the DNA it thinks Jon got from my mom and the DNA it thinks that he got from my dad based on my mom’s results. Before I do that, however, I uploaded my mom’s AncestryDNA results to Gedmatch.com. Her FTDNA results are already there. This is why I also uploaded her AncestryDNA results. The chart below shows the results you get when you compare one company’s DNA results to another’s or even a different version of one company’s results to another version of that company’s results.

Jon’s new results are Anc2 results. That means that now Ancestry is testing different areas of the Chromosomes. However, it looks like I didn’t need my mom’s AncestryDNA results after all. Comparing Jon’s Anc2 results with my mom’s FTDNA results still gives me more SNPs (426,923) than comparing Jon’s Anc2 to Anc1 (424,150). Now I’ll have to mark my mom’s Ancestry kit as research only at Gedmatch as it is not good to have 2 results for one person there.

Now Jon has 2 phased kits (maternal and paternal). My little side trip was to check Jon’s Paternal Phased Kit with Michael. Here are the results:

Next I run Jon’s maternally phased results with Michael:

They have a borderline maternal match. That means that Michael matches Jon on his maternal side as well as Jon’s paternal side. How can this be? The answer is that they probably have a very distant match or match in the general population. My mom is German, but about 1/4 English also. Michael lives in England. The key for this project is to disregard this Chromosome 7 Segment match as it is not likely a Frazer match.

Any more New Frazer project DNA for Jon?

Next on Jon’s list of Frazer DNA Projects is Gladys. Here is Gladys’ chromosome 1 showing her matches with Jon and his siblings.

Jon on Line #1 doesn’t have a new match here, but his match is longer. Numbers 2 and 3 are my sisters Sharon and Heidi. This is in an important part of Chromosome 1 where there are a lot of Triangulation Groups (TGs). It looks like Jon’s Frazer DNA got a bit less broken up compared to his sister Sharon’s match in the area from about 182-202M on Chromosome 1 above. Here is MacNeill’s Chromosome 1 map of Heidi (#3 above) and Sharon (#2 above):

Sharon’s (#2) small match is represented by the right end of the lighter blue bar above. Where the bar changes from red to dark red, Sharon’s DNA changes from Frazer to Hartley. What Gedmatch shows above is that when Jon’s DNA is mapped, his lighter red bar will go further to the right than Sharon’s red bar. Heidi’s (#3) small match is represented by the left side of her 2nd lighter red bar.

Jon and the Everyone Comparison

This next image will compare the matches Jon has with everyone in the Frazer Project. I left out those with parents that have tested.

This is like when you order the Everything Pizza. The square in the top left has the Archibald Line matches. The square in the bottom right has the matches of the James Line of the Frazer DNA Project. People with green matches should know each other already. My brother Jon from the Archibald Line matches Jonathan of the James Line. This seems appropriate as Jon’s middle name is Frazer.

More Detail: GEDmatch Matching Segment CSV

For the same people that I chose for the comparison above, I wanted more segment detail, so I chose an option called Gedmatch Matching Segment. This puts all the matching segments between all the people above into an Excel spreadsheet. While looking at those segment matches, I found a new TG that Jon was in.

New Chromosome 9 TG with Jon

Here is my Frazer DNA spreadsheet:

The first line is for 2 close relatives in the James Line, so the match may not be on a Frazer line.

Can you see the TG? It is difficult to see. The TG is between Pat (PB), Gladys and Jon. It is confusing as there is a lot going on there. Here is what Gladys’ Chromosome Browser matches look like for her Chromosome 9:

Where the lines represent Gladys’ matches with:

Bill
Pat
Jon
Sharon

But remember I said above that the TG was with Gladys, Pat and Jon. How did Bill get in there? Note that Jon matches Pat at 8.8 cM. Perhaps Jon and Bill match below thresholds. I lowered the thresholds at Gedmatch to see if Jon and Bill would match, but still no match. Perhaps there is another explanation.

First the TG we do have. And it is a beautiful thing.

Pat, Gladys and Jon had a double shot at being in a Frazer TG as they have Violet as an ancestor and James, believed to be her 1st cousin. We may not know which Frazer the TG is for, but we know that it is a Frazer TG.

Why Isn’t Bill in This TG?

Yes, why not? Here’s my guess. As you likely know, we carry a set of chromosomes from our Mom and another set from our Dad. Gladys, above, had a set of Frazer Chromosomes and Webber Chromosomes. Perhaps the match Gladys showed with Bill was a Webber match. There is a way to test this theory. Bill also does not match Pat in the area of the TG that we are looking at. In my spreadsheet above, Bill has a few matches with Pat but they are in different regions. Bill may be matching Pat on the Price Line. Note that Pat and Bill share a Price ancestor.
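The test being applied here is that everyone in a TG has to match everyone else over a common stretch of the chromosome. A minimal Python sketch of that check follows; the function names are invented and the positions are made up for illustration, not copied from the spreadsheet:

```python
from itertools import combinations


def shared_region(segments):
    """Intersection of several (start, end) segments, or None."""
    start = max(s for s, _ in segments)
    end = min(e for _, e in segments)
    return (start, end) if start < end else None


def triangulates(people, pair_segments):
    """Return the region where every pair in `people` matches, or None.

    `pair_segments` maps frozenset({name1, name2}) to one (start, end)
    segment on the chromosome of interest.
    """
    segs = []
    for a, b in combinations(people, 2):
        seg = pair_segments.get(frozenset((a, b)))
        if seg is None:
            return None  # one pair does not match at all: no TG
        segs.append(seg)
    return shared_region(segs)


# Made-up positions (in Mb), for illustration only.
pairs = {
    frozenset(("Gladys", "Pat")): (85.0, 100.0),
    frozenset(("Gladys", "Jon")): (86.0, 99.0),
    frozenset(("Pat", "Jon")): (87.0, 98.0),
    frozenset(("Gladys", "Bill")): (84.0, 97.0),
}
print(triangulates(["Gladys", "Pat", "Jon"], pairs))
print(triangulates(["Gladys", "Pat", "Jon", "Bill"], pairs))
```

In this toy data Bill matches Gladys but not Pat or Jon, so adding him breaks the triangulation. That is consistent with his match to Gladys being on her other (non-Frazer) chromosome, although the sketch cannot prove which line it is on.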

Another Question: Why Isn’t My Sister Sharon in This TG?

The answer to this question is easy. She should have been in this TG all along. In my spreadsheet I have that Pat and Sharon match at 8.4 cM. I missed the larger match between Sharon and Gladys.

By the way, of my mapped siblings, Sharon has a lot of Frazer DNA in her Chromosome 9:

Sharon’s Unrecombined Frazer DNA

Thanks to the results of M MacNeill’s beautiful mapping work, Sharon’s lighter red bar on the bottom of the image above is all Frazer DNA. That means her paternal chromosome #9 did not recombine. She has the same Frazer DNA in that Chromosome that her dad got from his Frazer mother. But how did Dad get his DNA from his mom? My guess is that the DNA my dad got from his mom did recombine. That means that grandma passed down a combination of her parents’ DNA. That would be her paternal Frazer DNA and her maternal Clarke DNA. What I know for sure is the places where Sharon matches other Frazers would be the Frazer segment of my dad’s DNA. That would be at least the 85 to 100M range of Chromosome 9. So my dad’s (maternal) and Sharon’s (paternal) Chromosome 9 could have looked like this:

That leads to my modified spreadsheet for Chromosome 9:

The gold is intended to stand for the TG leading to Violet and James Frazer. Note that a lot of changes happen around 85M on the spreadsheet. My guess was, and still is, that there was a change from Frazer to McMaster DNA at that spot for Paul and my family. Both Paul (PF) and my family descend from George Frazer and Margaret McMaster. That would explain why other Frazers stop matching Paul and my siblings at that spot.

Hopefully, this image will explain it better. This is what Sharon’s and my dad’s Chromosome 9 could look like as it passed down to my dad. A generation earlier, in my grandmother’s DNA a McMaster probably recombined in there also.

Basically:

Sharon should have gotten a chromosome from her dad’s mother and father recombined
However, at Chromosome 9, she only got her paternal grandmother’s DNA (Frazer) – so she got one long segment
My father’s Chromosome could have looked like the image I had with Clarke and Frazer that he got from his 2 maternal grandparents.
My grandmother got her paternal Chromosome 9 from George Frazer and Margaret McMaster. Her Paternal Chromosome likely had a break in it at position 85M where her DNA went from McMaster to Frazer. This carried down to Sharon. Her paternal Chromosome 9 wouldn’t have had Clarke as this was her mother. The Clarke DNA was on my grandmother’s Maternal Chromosome 9.

So in Summary:

Jon is in a previously undiscovered TG at Chromosome 9
The TG points to Violet and James Frazer
Sharon got her entire Chromosome from her dad un-recombined
My grandmother passed down her DNA to my dad probably recombined with some of my dad’s grandfather’s Frazer DNA and Grandmother’s Clarke DNA
There is a stop in my family’s Frazer matches right at the point where there is a start in a match with my Frazer 2nd cousin once removed (location 85M). That leads me to believe that this is the spot where our match goes from 2nd great grandfather Frazer to our shared 2nd great grandmother McMaster.
Sometimes when working on a family DNA project such as this Frazer one, it is possible to find non-Frazer ancestor’s DNA.
Chromosome mapping is a big help in visualizing which ancestor likely contributed DNA to which descendant.

Bonus Feature: Archibald and James Line Frazer TG Update

I hope that I got this right. At least it should be generally correct.

There were a lot of new TGs that I noted in my previous Blog on the James Line side that I updated in lavender.
A preliminary observation is that Joanna and her family seem to favor Charlotte, Madeline and Mary more than Judith, Bonnie and Beverly.
This shows that all but one of the people in our Frazer DNA Group are in a TG. That is pretty exceptional.
There are 24 in the TGs. Each of these people averages about 4 Frazer TGs.
I was an underachiever, as I’m only in one TG
The number of times one is in a TG is likely subject to many things including:
- Random DNA inheritance
- Distance you are to the common ancestor – the closer you are, the more likely you are to have a match
- Endogamy. Some groups have 2 or 3 Frazers in their ancestry. The Price group have 3 Frazers in their ancestry and the most TGs. Each Frazer/Price descendant is in almost 15 TGs, though 5 of those are likely Price only TGs
- number of descendants your common ancestor had. This will increase the odds of a TG as there are more descendants to triangulate
The 5 likely non-Frazer TGs are in a raspberry color and are likely Price TGs.
I have a note that the yellow TG could be for either Violet or James Frazer. I am leaning toward James as Violet descends from Richard Frazer as does Michael and Michael is not in this TG. James is believed to be the son of Philip b. around 1776 who was a brother of Richard b. around 1777.

September 21, 2016April 13, 2017

A Second Look at Pauline’s Newfoundland DNA

In my last post on Newfoundland DNA, I looked at Pauline and how she matched others in the Dicks DNA Project I have been working on. I found that she was in 3 Triangulation Groups (TGs), but I wasn’t totally convinced which family those TGs represented as there was some ambiguity whether they were Dicks TGs or Joyce TGs for two of the TGs. The other TG she was in was with my wife’s Upshall family which has Dicks ancestry also. However, due to questions in the Upshall ancestry I wasn’t totally sure those were Dicks TGs either. Pauline expressed a desire to find out more also, so I thought I’d take a second look at Pauline’s DNA.

People Who Match One or Both or Two Kits

This is a utility at Gedmatch that is helpful in DNA analysis. I’ll use this to find out more about Pauline. From my previous Blog, here are Pauline’s top matches with the Dicks DNA Project:

Her top 2 matches are with Molly and Kenneth of the Joyce Line. Coming in at #3 is Esther who is my wife’s great Aunt.

Pauline’s Matches with Molly

First I’ll run the Gedmatch Utility for People who Match Both Kits. Those kits being Pauline and Molly. I understand this utility as similar to the Ancestry Circles. Another term I have for it is “Where there’s smoke, there’s fire”. In other words, these people match both Pauline and Molly, so maybe they have common ancestry. The difference is that Gedmatch can find the fire, so to speak. The fire is the TGs that show that there are common ancestors.
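Stripped of the DNA, "match both kits" is just the overlap of two match lists. A tiny Python sketch, with an invented function name and made-up kit labels, assuming each list simply holds the names on a kit's one-to-many list:

```python
def match_both(matches_a, matches_b):
    """Names that appear on both match lists, in sorted order."""
    return sorted(set(matches_a) & set(matches_b))


# Made-up match lists, for illustration only.
pauline_matches = ["Molly", "Kenneth", "Esther", "kit_X"]
molly_matches = ["Pauline", "Kenneth", "kit_X", "kit_Y"]
print(match_both(pauline_matches, molly_matches))
```

This only finds the smoke. Whether the shared matches land on the same segment, which is what makes a TG, still has to be checked in the chromosome browser.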

After finding all the people that match both Pauline and Molly, I look at those matches at gedmatch’s chromosome browser. Here is Pauline’s Chromosome 5 which I looked at in the last Blog, but now the net is spread a little wider. The matches are to Pauline.

Matches #1 and #2 were identified previously as being in a TG. They are Molly and Kenneth of the Joyce Line. #3 is also in the TG, or at least in one with Molly. This is someone called opcarrie at gedmatch who I don’t know. opcarrie is a lead which Pauline may contact to find common ancestry. Perhaps this person will be the tie breaker and indicate whether this DNA is from the Joyce Line or Dicks Line.

To the right of Chromosome 5 is another smaller likely TG. 3 of 4 of those matches have the name of Pike which may be recognizable. This TG is probably not a Dicks TG as it was not found in my previous look at Dicks descendants.

Common matches: Chromosome 15

Pauline has more interesting matches on Chromosome 15.

#1 is Molly again. #2 is Richard who I don’t know. I looked him up on Ancestry, so Pauline may find some common ancestry there. He also matches my wife’s Aunt Esther and they have a common ancestral surname of Kirby. This looks like a strong TG for Pauline also.

Pauline and Chromosome 21

I found this Chromosome interesting even though there was not an apparent TG.

Here Pauline has large matches with #1 Molly and #2 Kenneth. Both of these are on the Joyce Line. The reason I find this interesting is that it looks like there is a break right around the 23M mark. Assuming that these segments represent Rachel Dicks and her husband James Joyce, it could be that one segment is the James Joyce Segment and the other is the Rachel Dicks Segment.

People Who Match Both Pauline and Kenneth

Next I run the Gedmatch utility again for Pauline and Kenneth. This resulted in a smaller group of matches than Pauline had with Molly. I didn’t see anything much new here that was not already in the group of people that matched Pauline and Molly.

People Who Match Both Pauline and Esther

Here I would expect different results as Esther is not from the Joyce Line unlike Molly and Kenneth. Actually, when I look at the results, they look similar to the first 2 looks at results. There is one difference at Chromosome 15.

Now look at the Chromosome 15 matches Pauline had when looking at her matches in common with Molly.

#1 above is Molly, #2 is the Richard who was not in the Dicks DNA Project. #5 is Jennifer. I’m also unfamiliar with her. This Jennifer also did not come up when I looked at the people who matched both Pauline and Kenneth.

Summary and Discussion

After taking a second look at Pauline’s DNA there is a little clearer picture of what is going on. I set the net a little wider. But with a wider net comes some more questions.

The new TG at Chromosome 15 appears to be a non-Dicks TG. However, Pauline may want to follow-up with some of the names there to make sure. One of the people in that TG shares a Kirby surname with my wife’s great Aunt. However, that may not be the surname of the shared ancestor with Pauline and Molly.
There is a new person to follow up with on Pauline’s TG on Chromosome 5
Pauline matches Molly and Kenneth on Chromosome 21. Assuming that these 2 matches represent Rachel Dicks and James Joyce, it would appear that the dividing line between these 2 matches represents the dividing line between the DNA that Pauline received from her 3rd great grandparents Rachel Dicks and James Joyce.

September 20, 2016April 13, 2017

Raw Data Phasing Via Access, Athey and MacNeill: Part 2

In my last Blog on raw data phasing, I went through 3 principles that Whit Athey laid out in a paper on phasing raw data when one parent’s DNA results are missing. Using those principles, and the MS Access program, I was able to sort many of my bases and my 2 sisters’ bases into ones we received from our mom and ones that we received from our dad. I checked a few of my results with a chromosome map made for me by M MacNeill.

Paternal Patterns

I had gotten to the part of the Athey paper where he talks about paternal patterns of bases that the sibling combinations received. I noted a space between the first two paternal patterns that I looked at. Below the pattern goes from an ABA pattern to an ABB pattern.

There was a gap between the ABA and ABB pattern where there was no ‘pattern’ as my 2 sisters and I shared the same base there. When my sisters and I all share the same base, that is an AAA “pattern”. That AAA area corresponded exactly to the area between the 2 yellow lines below in the chromosome map made for me by M MacNeill – prairielad_genealogy@hotmail.com .

In the map above, MacNeill was able to determine that my 2 sisters and I got our DNA from our paternal grandmother in the area between the 2 yellow lines. Further, the first yellow line described Sharon’s first paternal crossover point and the second yellow line described my (Joel’s) first paternal crossover point.

Finding All the Paternal Crossover Points

At this point in the Athey Paper, he recommended looking at the paternal pattern and filling in the missing bases based on the known pattern. I was looking for an easier way to do this, so decided to take a different approach. I decided that I would find all the paternal crossover points first. Then, armed with that information, I would create a formula that would fill in most or all of the missing bases for each pattern.

However, this required a modification of my database to make the work easier. I wanted a number to define the range of patterns, so that I could apply an easy query to add missing bases. I already had this but I hadn’t used it. Back when I imported the 4 sets of raw data into Access, Access assigned an ID to every row of data. That meant that I needed to add that ID into all the queries that I had done previously to make tables and further queries. This took a while, but I believe that it was worth it.

The ID is the first column.

I started going down all my data and noting the change of each pattern. I put the results into an Excel table. Here the Start and Stop numbers are the Access assigned ID numbers. The IDs correspond with the number of DNA locations looked at. In this case there were a bit over 700,000 of these locations in total for my mom, my 2 sisters, and me.

Then I noted the patterns are repeating as would be expected. For example, my first pattern was ABA, but 3 patterns later, that same ABA repeated. My thought was to create a query just for ABA patterns. Then when scrolling down looking for changes, the separation between rows should be greater and it would be easier to see where those changes were.

Here is what my Access query looks like. I changed the query name to DadSpecificPattern.

This particular query gives me the ABB pattern. I have the HeidifromDad base equal to the SharonFromDad base. That makes me the A and Sharon and Heidi the BB of the ABB Pattern. If you think about it, that also means in these areas that Heidi and Sharon will have their base from the same paternal grandparent and mine will be from the other paternal grandparent. I’m learning as I go. I’m sure that information will come in handy later.
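The pattern names can be pinned down with a small Python function. It is a sketch only, with an invented name, taking the three "from Dad" bases in Joel, Sharon, Heidi order:

```python
def dad_pattern(joel, sharon, heidi):
    """Label the paternal pattern for one row, in Joel-Sharon-Heidi order.

    Returns 'AAA', 'AAB', 'ABA' or 'ABB', or None when a base is
    missing or the row cannot be labelled.
    """
    if None in (joel, sharon, heidi):
        return None  # a blank: the pattern cannot be seen on this row
    if joel == sharon == heidi:
        return "AAA"
    if joel == sharon:
        return "AAB"  # Heidi is the different one
    if joel == heidi:
        return "ABA"  # Sharon is the different one
    if sharon == heidi:
        return "ABB"  # Joel is the different one
    return None  # three different bases: not expected from one parent


print(dad_pattern("T", "C", "C"))
```

The example row has Sharon and Heidi sharing a base and Joel differing, which is the ABB case that the query above selects.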

My plan seemed to be good, but there was one catch. Once I refined my query, most or all of the blanks disappeared. That meant that the start and end points might not be exact. Here is an example of what I mean.

This is from my old Dad Pattern query with the blanks still there. The change from ABB to ABA happens at ID or line 19809. However, the new query takes out the blanks to make it look like the change is at ID Line 19826.

Here is what my DNA results look like so far without a filter (or query). The last 3 columns are the bases from Dad columns. There is a lot going on between lines 19809 and 19826.

Once I apply a formula to add bases, it will say something like: In the lines that have the ABA pattern where there is a blank at either A spot, replace the blank with the A that is there. If I apply the rule too late, I will be missing an area. Worse, if I were to use the 19826 cutoff, I may still be using the previous rule. That rule would say basically the same thing except, “Where the row is ABB and one of the B’s is missing replace the missing B with the one that is there.” If I apply an ABB rule to an ABA area, I’ll get bad results.

Long story short, I ended up recording a rough start and stop in my Excel Spreadsheet.

I started naming the segments, but realized that was not necessary. Some of the patterns were only at one point rather than in a long segment. I believe that is an anomaly due to a bad read, mutation or some other problem. Those are the ones in the spreadsheet that had no end point. It took me part of a morning to get all the paternal crossover pattern points for all 23 chromosomes. Fortunately for 3 siblings, the patterns are only ABA, AAB and ABB.
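The scrolling-and-noting step could in principle be automated. Here is a rough Python sketch, assuming each row has already been labelled with its pattern; the function name and the sample rows are made up for illustration:

```python
def pattern_runs(rows):
    """Collapse (id, pattern) rows into [pattern, start_id, stop_id] runs.

    Rows with pattern None (a blank) or 'AAA' are skipped, since on
    their own they do not show which sibling differs.
    """
    runs = []
    for row_id, pattern in rows:
        if pattern in (None, "AAA"):
            continue
        if runs and runs[-1][0] == pattern:
            runs[-1][2] = row_id  # extend the current run
        else:
            runs.append([pattern, row_id, row_id])
    return runs


# Made-up rows, for illustration only.
rows = [(1, "ABA"), (2, None), (3, "ABA"), (4, "AAA"),
        (5, "ABB"), (6, "ABB"), (7, "ABA")]
for run in pattern_runs(rows):
    print(run)
```

A run whose start and stop IDs are the same is a single-point pattern, like the anomalies mentioned above. A genuine AAA segment shows up as a wide gap between the stop of one run and the start of the next.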

I just went back and checked the error points/anomalies. I reran the Heterozygous Sibling Query and it fixed at least the first problem and hopefully the others. When I added the IDs in, I had to redo all the queries quickly, so I suppose that is where the errors came in. That is not a problem: as long as the problem can be found, a fix can usually also be found. There actually weren’t that many errors. There are still some anomalies that are just anomalies. I have left those in yellow in the spreadsheet image below.

So in my spreadsheet, I have all the rough starts and ends for all the crossovers for my 2 sisters and myself. Here is the top part of the spreadsheet sorted by rough start:

Next, all I need are more exact start and end points. Here is the start of what I have:

I picked this section because it looks pretty complete already. Note that my Start and Stop numbers are pretty close to each other. That means that there are no other AAA segments in-between. I had to do an additional Access query to add in the position numbers for the Start and Stop of each chromosome’s pattern change. This was important if I want to convert the results from Build 37 to Build 36 to compare to MacNeill’s work or to gedmatch.com.

Starting to Find Paternal Crossovers and Assigning to Siblings

Previously I had been calling the start and end of my patterns crossovers. These two terms aren’t totally interchangeable, as the start or stop of a pattern may happen at the beginning or end of a Chromosome and therefore not be a crossover at that point. It seems like it should be pretty easy to find the crossovers. Look at the image above. The first and second rows show ABA going to AAA. The order of me and my siblings is JSH, or Joel, Sharon and Heidi. The only letter that changes is the B to A. That is the position that Sharon is in, so the paternal crossover has to go to her. From row 2 to row 3 the pattern changes from AAA to ABB at Chromosome 1, position 23,288,828, Build 37. That doesn’t mean that 2 siblings have a crossover there, as we are looking at the patterns, not the letters. It is actually the letter that stayed the same that represents the crossover here. AAA to ABB means that all the same (AAA) goes to one different and 2 the same (ABB, in this case Sharon and Heidi). The one that is different is me, so I get the crossover at this location. The next change is from ABB to ABA. This is a little harder to see. I would say that this crossover goes to Heidi, if my reasoning is right. BB was the same before and goes to BA. It must be Heidi that changed because now she matches Joel, who didn’t change.

I’ll need to figure out how to make better bar graphs in Excel, but here is how the beginning part of my father’s Chromosome 1 broke up for 3 of his children. Another way to look at it is that the vertical lines are where my father’s maternal and paternal chromosomes recombined in each of his 3 children that we are now looking at.

Where:

Series 1 is Sharon. Where the color goes from blue to orange is where Sharon has a change from one paternal grandparent’s DNA to another paternal grandparent’s DNA. The number to the right of Series 1 is the Build 37 Chromosome position number for Sharon’s crossover.
Series 2 is Joel’s first crossover (between orange and gray) and
Series 3 is Heidi’s first crossover position between gray and yellow [The same explanation under Sharon above applies to Joel and Heidi]

I’ll go back to the M MacNeill Standard. It’s like having an answer sheet to my questions.

According to MacNeill, I have assigned the crossovers to the correct siblings. In the above chart, just look at the red. I haven’t gotten to the maternal part yet, which MacNeill has in blue. The first 3 crossovers are where the red changes from light to dark or dark to light red. The difference in the MacNeill Chart is that his chart is split out one bar for each sibling. The other difference is that MacNeill has build 36 Chromosome position numbers and the numbers I have are from Build 37.
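The reasoning used to assign each crossover can be boiled down to a "who is the odd one out" rule. A minimal Python sketch follows, assuming only one sibling has a crossover at any one pattern change; the names are invented for illustration:

```python
SIBLINGS = ("Joel", "Sharon", "Heidi")

# Position (in Joel-Sharon-Heidi order) of the sibling whose base
# differs from the other two; None when all three are the same.
ODD_ONE_OUT = {"AAA": None, "ABB": 0, "ABA": 1, "AAB": 2}


def crossover_sibling(before, after):
    """Name the sibling who crossed over when the pattern changes."""
    a, b = ODD_ONE_OUT[before], ODD_ONE_OUT[after]
    if a == b:
        return None  # no pattern change, so nothing to assign
    if a is None:
        return SIBLINGS[b]  # all alike, then one sibling breaks away
    if b is None:
        return SIBLINGS[a]  # the odd one out rejoins the other two
    # The odd one out changed from one sibling to another: it is the
    # third sibling who switched sides.
    third = ({0, 1, 2} - {a, b}).pop()
    return SIBLINGS[third]


print(crossover_sibling("ABA", "AAA"))
print(crossover_sibling("AAA", "ABB"))
print(crossover_sibling("ABB", "ABA"))
```

The three calls correspond to the three pattern changes worked through above, which were assigned to Sharon, Joel and Heidi in that order. If two siblings happened to cross over at nearly the same spot, this simple rule would get it wrong.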

The Process

Phase the siblings into maternal and paternal DNA using the principles that Athey outlines
Find the paternal and maternal crossovers by pattern changes
Assign the crossovers to the correct sibling using the pattern changes
Assign the segments to the correct grandparent. This requires knowledge of cousin matches on the appropriate grandparent side.

That is the big picture which I am understanding as long as I don’t get too lost in the details.

Back to the Details: Fill in More A’s, G’s, C’s and T’s

I have been setting up my data for this, so hopefully, this will be easy. I now have 3 areas to look at:

AAB paternal update

Now I go back to my spreadsheet and sort it by Dad Pattern:

The Start and Stop areas are the ones I want to update. First, I’ll copy my most up to date Table in Access which is tblSibHetorzygous. I’ll rename that tblDadPatternUpdate. Then I want to look for missing data and update the blanks using the AAB pattern.

In Access, I create a query with the new table.

I chose the position fields and Paternal Pattern fields. I will change this to an update query which adds an Update To row. The criteria I want is when JoelFromDad = Sharon from Dad (AAB). Actually, I forgot, I was going to use ID criteria. So in the ID field, I need a lot of information. For the first AAB segment, I need everything between ID 45393 and 54155. This is what the criteria looks like:

When I choose that area, I get over 8,000 lines. However, I only want to update when there is one missing value in the first 2 and the one that isn’t missing is not equal to the third. Here is the result of the above query in my first AAB area:

I assume that the first blank should be a T. This would be one of the AAA results by chance in an AAB area. I don’t want to fill in the second line as I don’t know if it will be GGG or something else. That is what I meant by saying I don’t want to fill anything in unless there is only one missing value. In the 5th line there is A?G. That would have to be AAG (in an AAB Pattern area). There are some lines that have everything missing that I don’t want to touch.
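The cautious rule described above (fill only when exactly one of the first two is missing and the one that is present differs from the third) can be written as a short Python sketch. The function name is invented, and `None` stands in for a blank:

```python
def fill_aab(joel, sharon, heidi):
    """Fill one blank in an AAB area; returns the three bases."""
    if heidi is None:
        return joel, sharon, heidi  # the AAB pattern cannot be seen
    if joel is None and sharon is not None and sharon != heidi:
        return sharon, sharon, heidi
    if sharon is None and joel is not None and joel != heidi:
        return joel, joel, heidi
    return joel, sharon, heidi  # nothing safe to fill in


print(fill_aab("A", None, "G"))    # A?G
print(fill_aab(None, None, "G"))   # ??G is left alone
```

Rows where both of the first two bases are missing, or where the one known base equals Heidi's, come back unchanged under this rule.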

How to create a query?

First, I want the situation where Joel doesn’t equal Heidi or Sharon doesn’t equal Heidi. That would create an AAB situation:

This query results in 1,666 rows of data including rows that are already filled in. Note that I had to write the range of ID’s twice because in order to get an OR situation I needed to put Joel not equal to Heidi and Sharon not equal to Heidi on separate lines. A simpler query is this one:

The above achieves the same results in one line. Now, for this query, if Joel is blank, replace it with Sharon’s results. If Sharon is blank, replace it with Joel’s results. Here is the query prior to the updating part:

This shows that there are 29 blanks for Joel and Sharon meeting this AAB criteria in the first range of AAB’s:

Next, I apply the same logic to all the AAB segments. In the Expression Builder of Access, I type in this simple formula:

Between 45393 And 54155 Or Between 60990 And 72548 Or Between 207109 And 220679 Or Between 313271 And 317516 OR Between 326845 And 326912 OR Between 389395 And 390311 OR Between 400045 And 405578 OR Between 419982 and 427158 OR Between 433191 And 446672
OR Between 482297 And 492542 OR Between 532520 And 539292 OR Between 571557 And 579594 OR Between 589614 And 589666 OR Between 630037 And 630314 OR Between 630319 And 630378 OR Between 658744 And 659375 OR Between 670533 And 672360 OR Between 673325 And 682544

Simple but long. This has the AAB Starts and Stops for 23 chromosomes. Then I copy it into the next ID criteria line and get this result:

It took a few minutes to type the criteria, but the goal is to update 1,514 lines of missing Paternal Pattern data with the push of one button. I still think it is quicker than going line by line and will be more accurate if I got the criteria right.

Next, I change the above Select Query to an Update Query.

When my (Joel’s) base from Dad is missing, I update to Sharon’s base. When Sharon’s base from Dad is missing her base is updated with mine. Isn’t sharing great? I didn’t look at the case where Heidi’s base from dad was missing, because if that was missing we wouldn’t be able to see any AAB Pattern.

Let’s Update

I push the run button and check the results. Here is my standard dire warning:

Now I will check if it worked. I’ll try ID or Line # 682124:

Unfortunately, that was an undesirable result. Before I had A?G. I changed this to ?AG. It appears that my query filled Sharon’s blank with my value, but also replaced my value with Sharon’s blank. I hadn’t expected that. Next, I’ll check ID# 682182. I had ?AG and replaced it with A?G. So until I can think of a solution, I’ll need to split the 2 queries.

Fix it! Quick!

First I recopied my Heterozygous Sibling Table back to the Dad Pattern Update 1 Table. This got the table back to the way it was. Here is my simpler query.

Here if my base from Dad is null, replace it with Sharon’s base from Dad. I’ll check ID# 682182 again:

This gets into the category of trial and error. Sharon’s result still got replaced with nothing. In the previous query I was still telling Access to update Sharon’s results with mine. I needed to take that out:

There. Now the SharonFromDad Update To is blank. I go through the same procedures and now it looks right.

We now went from ?AG to AAG in the last 3 columns. These are the bases from Dad columns.

The next step is pretty easy:

I took out my criteria and put criteria in the SharonFromDad field. When she has a blank, replace it with Joel’s base from Dad. I hit run and it updated over 600 rows. Here is my original check spot at ID# 682124 with better results in the last 3 columns:

It took a while, but at least I got it right. The moral of the story is to not ask Access to do 2 things at once when those 2 things involve the same 2 people.
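The swap can also be illustrated outside of Access. Below is a minimal Python sketch, not the Access query itself: the (Joel, Sharon, Heidi) tuple layout, the use of None for a blank, and both function names are assumptions made for illustration. The first function mimics an update in which each field is set to the other field's old value, which is one plausible reading of what happened at ID# 682124. The second only fills a single blank in an AAB area and never overwrites a known base.

```python
# Hypothetical row layout: (joel, sharon, heidi) bases from Dad; None marks a blank.

def swap_update(row):
    """Mimic an update where each field takes the other's *old* value."""
    joel, sharon, heidi = row
    return (sharon, joel, heidi)


def fill_blanks_aab(row):
    """In an AAB area, copy the one known base into the one blank.

    Nothing is filled unless exactly one of Joel/Sharon is missing and the
    known one differs from Heidi (so the row really shows an AAB pattern).
    """
    joel, sharon, heidi = row
    if heidi is None:
        return row  # without Heidi's base the AAB pattern cannot be seen
    if joel is None and sharon is not None and sharon != heidi:
        joel = sharon
    elif sharon is None and joel is not None and joel != heidi:
        sharon = joel
    return (joel, sharon, heidi)


print(swap_update(("A", None, "G")))      # the blank just moves to Joel
print(fill_blanks_aab(("A", None, "G")))  # Sharon's blank is filled
```

Because the fill function tests for a blank before writing, both directions can be handled in one pass without one sibling's blank overwriting the other's base.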

The Next Step: ABA

This time I’ll try a different query. I want there to be a B from the ABA in each case, so I’ll make sure that Sharon’s base from Dad is there:

Maybe I’ll figure out what went wrong last time or come up with a new error. Above, I want the criteria on the first line to be for my blank base: if Sharon’s base from Dad is not equal to Heidi’s base from Dad, put Heidi’s base from Dad in my blank spot. For Heidi, when Joel’s base from Dad doesn’t equal Sharon’s base from Dad, put Joel’s base in Heidi’s spot.

I’m so tempted to try this query, but before I do, I’ll copy the previous table of the DadPatternUpdate to a new Dad Pattern Update ABA Table. This will preserve what I have in the now older DadPatternUpdate Table in case anything goes wrong. Hey, what could go wrong?

I pushed the Update Button and updated over 30,000 rows. The results don’t appear to be any better, so I’m back to my 2 step process.

Here is my new slimmed down query:

This new Update Query should update my Line 18 in the new UpdateABA Dad Pattern Table and it does:

I now have a full ABA pattern on that line. According to Access over 30,000 Lines were updated, so it wasn’t a total waste of time.

Run and check Line 149:

We have ABA in the last 3 columns, so that is good. Line 18 is still OK. I checked it just to make sure.
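The same fill rule can be written once for any of the three-sibling patterns. The following is a rough Python sketch under stated assumptions: rows are (Joel, Sharon, Heidi) tuples of bases from Dad, None stands for a blank, and the function name and pattern strings ("AAB", "ABA", "ABB") are illustrative choices rather than anything from Access or the Athey paper.

```python
def fill_by_pattern(row, pattern):
    """Fill a blank from the sibling that shares its pattern letter.

    row:     (joel, sharon, heidi) bases from Dad, None for a blank.
    pattern: "AAB", "ABA" or "ABB"; equal letters mean equal bases.
    A blank is only filled when the row is informative, i.e. the known
    partner differs from the odd-one-out sibling.
    """
    pair = [i for i in range(3) if pattern.count(pattern[i]) == 2]
    odd = [i for i in range(3) if pattern.count(pattern[i]) == 1]
    if len(pair) != 2 or len(odd) != 1:
        raise ValueError("pattern must have exactly one odd sibling")
    bases = list(row)
    other = bases[odd[0]]
    if other is None:
        return row  # the pattern cannot be seen without the odd sibling
    first, second = pair
    for blank, known in ((first, second), (second, first)):
        if bases[blank] is None and bases[known] is not None and bases[known] != other:
            bases[blank] = bases[known]
    return tuple(bases)


print(fill_by_pattern((None, "G", "C"), "ABA"))  # Joel takes Heidi's base
print(fill_by_pattern(("A", None, "G"), "ABB"))  # Sharon takes Heidi's base
```

Rows where all known bases agree are left alone, since they do not show which pattern they belong to.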

Query AAB Revised

After seeing how well the ABA Query went, I decided to revise the old AAB Query:

This is now looking at over 37,000 rows. This updates my AAB Blanks to tblDadPatternAAB. I don’t know if it is a better query, but at least I’m being consistent.

This was over 80,000 rows, so I’ll assume that bigger is better.

I copied that resulting Table to tblDadPatternUpdateABA and reran the 2 ABA Update Queries. Here is one of the rerun queries updating the ABA Paternal Table:

Down to ABB

My Last updated Paternal Table was updating ABA, so I’ll copy that to a new Table called tblDadPatternUpdateABB. I’ll also copy my last query and put in the appropriate Starts and Stops for the paternal ABB patterns. Again,

This says when Joel’s base from dad is not the same as Heidi, put that Joel from Dad into the space. Probably a more precise query would have said when Sharon from Dad is null and Joel from Dad is not equal to Heidi from Dad. I suppose technically the above query could be writing over a base with the same base in most cases.

I’ll fix that. I also noticed that I had the wrong table at the top, so I’ll change that too.

This only updated 944 rows, so maybe bigger is not better. Here is Part 2:

This was almost 3,000 rows updated. Now I should check if it worked. I scrolled for an ABB Pattern in an old query and found this:

Here is my check:

I guess I’ve been working too long. Here I have an AAB instead of the ABB I wanted. That is because I had Heidi updated to me (the A) instead of Sharon (the B). Here is the correction:

I made a fresh Table of ABB. When I opened up the Query, it was saved this way:

So Access changed my query. Note that there are 2 fields with HeidiFromDad in them. One is for the Update To and the other has Criteria. That is probably a clearer way to do it. Who should argue with Access?

I updated that and I take a cue from Access for Part 2:

In English, the above says, “For this range when JoelFromDad is not blank but Sharon from Dad is, and Joel from Dad has a different value than Heidi from Dad, put that Heidi from Dad value where Sharon had the blank.” It sounds a little complicated.

Back to Row 197704 and I’ll look at 197709 while I’m at it:

Oh no, it is still wrong! I checked the previous ABA Table and that was the reason for the error. The error is also in the old AAB Table. However, the error was not in the file before that. My guess is that the AAB rule got applied to the wrong range of rows. I don’t see an error there, so I’ll have to rerun all the queries.

That’s OK, because I’m brushing up on the queries and will use the Is Null value so we will only be filling in the missing bases.

I had more problems, so I deleted the AAB Table and recopied the previous Table into it. I reran the Revised AAB Query halfway and it looked OK. However, when I ran the second half of the AAB query – filling Sharon’s results, the problem came back at ID# 197704. Very mysterious. The problem was where I thought it was originally. Look at the ID Criteria for the AAB Pattern Query:

There is an extra digit in the first between. The range goes from 45393 to 544155. The second number should be 54155. So this query was performed on 450,000 more rows than intended. I updated the AAB query with fewer rows. Again fewer is better. After many requeryings, I got the desired result for ID# 197704:
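A typo like this is easy to catch with a quick sanity check on the ranges before they go into the criteria. Here is a minimal Python sketch of that idea; the function names are hypothetical, and only the first three AAB ranges from the criteria are used as sample input. Since the Starts and Stops of one pattern should not overlap each other, any overlap is treated as a warning sign.

```python
def check_ranges(ranges):
    """Return a list of problems found in (start, stop) ID ranges."""
    problems = []
    widest = None  # the range reaching furthest so far
    for start, stop in sorted(ranges):
        if start > stop:
            problems.append(f"{start}-{stop}: start is after stop")
        if widest is not None and start <= widest[1]:
            problems.append(f"{start}-{stop} overlaps {widest[0]}-{widest[1]}")
        if widest is None or stop > widest[1]:
            widest = (start, stop)
    return problems


def in_any_range(row_id, ranges):
    """Same test as a chain of 'Between x And y' criteria (inclusive)."""
    return any(start <= row_id <= stop for start, stop in ranges)


good = [(45393, 54155), (60990, 72548), (207109, 220679)]
typo = [(45393, 544155), (60990, 72548), (207109, 220679)]

print(check_ranges(good))           # no problems reported
print(check_ranges(typo))           # both later ranges overlap the first
print(in_any_range(197704, typo))   # True: this ID is swept up by mistake
```

With the extra digit in place, ID 197704 falls inside the first range even though it does not belong to any of the AAB segments.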

That should be the end of the first phase of nit picky work on the Paternal Side.

Summary, Conclusion and What’s Next

This was a lot of work, but the good news is that this update is for all the Chromosomes at once.
The bad news is that I have to do this again for the Maternal Side
Next up should be easy. That is just re-applying the Principles that Whit Athey Outlined on the new bases that I added from knowing the patterns. This should update missing maternally received bases from the updated paternally received bases.
I haven’t filled in blanks for the AAA patterns yet.
I am a little ahead of the game as I looked at how some of the first paternal crossovers will look.
Also with some basic phasing, I was able to deduce who those first paternal crossovers belonged to – one each to my two sisters and one for me.
If anything can go wrong it will

September 20, 2016April 13, 2017

Back to Newfoundland DNA: Dicks Family Joyce Line Update

I haven’t written about Newfoundland DNA – specifically the Dicks Family of Newfoundland – since last Spring. Since that time two things have happened:

1. There is now a new Facebook Group called Newfoundland Gedmatch. The purpose is to find those with Newfoundland heritage who have tested their autosomal DNA and uploaded those results to Gedmatch. At that point people compare their DNA results and their genealogy.
2. I believe that as a result of 1 above, Pauline has joined Newfoundland Gedmatch and also this Dicks DNA Study Group.

Both of the above are great news. We now have 12 in the Dicks Study group that have tested their DNA. That is plenty of DNA for comparing results. The chart below makes it look like 13 people but Marilyn is in 2 lines. There are two other people that have tested from another Dicks Line. They probably descend from a brother of the Christopher Dicks in the second box from the top below. Due to the large size of the Dicks family, they provide a good study group.

Here is an overall view of Dicks descendants that have tested.

Those in green have tested their DNA. Pauline is descended from Rachel Dicks. I call her Line the Joyce Line because Rachel married James Joyce. The Joyce Line was already the largest Dicks Line – now it is bigger with 5 DNA tested members.

Here is a closer view of the Joyce Line:

Our new member, Pauline, is in the lower left.

Let’s Get Into the DNA

First I’ll do a comparison of everyone to everyone.

It looks like Pauline hit the jackpot with Molly at almost 109 cM shared. Eric and Crystal were from the more remote Dicks Line and don’t show any shared DNA with Pauline. Esther is my wife’s great Aunt. Wallace and Kenneth are a generation closer to a common ancestor than Judy, so they have higher DNA shared amounts. This doesn’t mean that all the DNA shared above is Dicks DNA. However, as the Dicks are the common ancestors, it would explain a lot of the matches.

Triangulating with Pauline

I like to look for triangulation groups (TGs). That is when Person A has a match with B and C. Then Person B also has a match with C. Hopefully it will become clear. When this happens, it pretty much locks in the common ancestor. It’s not needed if we are sure about our genealogy, but if there is any doubt these TGs help clear up the doubt.
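The three-way check can be sketched in a few lines of Python. This is only an illustration of the overlap test, assuming each match is recorded as a (chromosome, start, end) tuple with positions in millions; the function names and the sample numbers are made up and are not taken from any Gedmatch output.

```python
def overlap(seg_a, seg_b):
    """Return the shared part of two segments, or None if they don't overlap."""
    chrom_a, start_a, end_a = seg_a
    chrom_b, start_b, end_b = seg_b
    if chrom_a != chrom_b:
        return None
    start, end = max(start_a, start_b), min(end_a, end_b)
    return (chrom_a, start, end) if start < end else None


def triangulated_segment(a_b, a_c, b_c):
    """Segment where A matches B, A matches C, and B matches C."""
    shared = overlap(a_b, a_c)
    if shared is None:
        return None
    return overlap(shared, b_c)


# Made-up matches on Chromosome 5, positions in millions.
print(triangulated_segment((5, 70, 125), (5, 76, 130), (5, 76, 122)))
```

If any of the three pairwise matches is missing or sits somewhere else, the function returns None and there is no TG at that spot.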

There are a couple of ways to look for TGs. One is by spreadsheet and the other is by chromosome browser. I’ll try the chromosome browser method. For example, here is Chromosome 5:

The bars represent DNA matches with Pauline. Molly is in yellow, Kenneth in green, Nelson is in blue and Eric from the faraway Dicks branch has a tiny pink match. I won’t bother looking at the small pink match. So it looks like Pauline, Molly and Kenneth are in a TG. All we need to know is if Molly and Kenneth match each other.

Yes, they do, from about position 76M to 122M. Here is what our first TG looks like:

However, there is only one problem with this TG. Well not a problem, but is it a Joyce TG or a Dicks TG? All these people descend from James Joyce as well as Rachel Dicks. I tend to lean toward the Joyce TG as there are other Dicks descendants that could have matched here but didn’t. Either those Dicks descendants didn’t match by chance or this is a Joyce TG. I suppose if Pauline, Marilyn or Kenneth match any of their Joyce relatives that aren’t related to the Dicks that would prove that this is a Joyce TG.

Another Joyce TG at Chromosome 7?

Here is the next potential TG at Chromosome 7:

#1 is Kenneth, #2 is Wallace. #3 is a tiny match with Crystal on the faraway Dicks Line. Kenneth and Wallace are both Joyce descendants. But do they match each other’s DNA?

They need to match near the beginning of Chromosome 7 and they do from position 4M to 19M. I won’t do the circle and line thing as it is similar to the previous image. Kenneth and Wallace are 2nd cousins.

Chromosome 12 TG on the Upshall Line

On Chromosome 12 Pauline matches my wife’s family: her mom Joan and her great Aunt Esther.

#3 is a small match with Sandra. I would think that even though Esther is Joan’s 1/2 Aunt that they still should match here at the end of Chromosome 12:

They have too many matches to show them all, but Joan and Esther do match Pauline from 107M to 132M which matches with the Chromosome Browser. Here comes another triangulation image:

Except I have another potential problem. Pauline tells me that some of her ancestors were from Harbour Buffet. This could be a Dicks TG or some other TG. Perhaps there is a clue here to bolster some of the missing ancestors in my wife’s Newfoundland genealogy. In cases like this, I tend to assume the match is with the known ancestor rather than the unknown. However, it is good to keep an open mind.

Here are some of the missing ancestors on my wife’s Upshall Line:

Summary

Pauline has shown good matches to others in the Dicks DNA Project – especially to those in the Joyce Line which she is a part of.
Pauline is in 2 Triangulation Groups (TGs) with the Joyce Line. These TGs point to the James Joyce/Rachel Dicks couple. Further testing may show which specific person of the couple that the DNA comes from.
Pauline is in 1 TG with my wife’s mother and great Aunt. This TG likely represents DNA from Christopher Dicks b. 1784. As some of my wife’s Harbour Buffet ancestors are unknown, there is also a chance that this TG represents some of those unknown ancestors.

September 15, 2016April 13, 2017

Phasing Raw DNA with MS Access a la Whit Athey: Part 1

In this Blog, I would like to look at my raw DNA data. Those are the A’s, T’s, G’s and C’s. I have tested at AncestryDNA as has my mom and 2 sisters, so I will use those results. Whit Athey has a paper that describes how to phase your DNA when the DNA from one parent is missing:

T. Whit Athey, “Phasing the Chromosomes of a Family Group When One Parent is Missing,” Journal of Genetic Genealogy, Fall 2010, Vol. 6, Number 1.

Many have used MS Excel to phase their raw DNA results. However, it occurred to me that perhaps MS Access would be a better tool for phasing than Excel. When I download my AncestryDNA data, I get about 700,000 lines of data. That is a lot more data than Excel can handle easily. I will go through the Athey Paper and use Access to get results. However, I will not be giving a tutorial on Access as that would take too long.

Downloading AncestryDNA: Getting Rid of Zeros

Many people have downloaded raw data to upload to gedmatch.com. Ancestry raw data is in text form. Access gets along with Excel well, so first I import the AncestryDNA text data into Excel. Perhaps if you are curious, you have taken a look at your raw data to see what it looks like. Unfortunately, it takes a while to open up such a large file. Here is what a few lines of my AncestryDNA text file look like:

It is important to note in the information above that Ancestry uses Build 37. That means that these results need to be converted to compare to Build 36. For example, Gedmatch uses Build 36. I remove the information above the column titles and bring it into Excel. However, I put my name on the top of the last 2 columns because eventually there will be columns for 4 people’s results (mine, my mom’s and my 2 sisters’). I will need to distinguish between each person’s alleles. It is important to note that when importing this text file to Excel, Excel retains the file as text. This is probably such a file; note that the no-calls have been changed to zeros. To save the file as an Excel file, you must specifically do that step.

Here is a file with the no-calls as blanks, like I want them, and with my name at the top and the verbiage removed:

Here is the file in Excel. I have used the search and replace in the last 2 columns. I want blanks for no-calls and not zeros which Excel likes to add.

Using Access

At this point, I had to switch to my laptop as I don’t have Access on my desktop. I open up Access and name a new database. I go to External Data and choose the Excel icon with the arrow pointing up to import my 4 Excel Files of Raw DNA for Mom, my 2 sisters and me.

Next under Create, I choose Query Design. I choose the 4 Excel files that I have imported into Access.

I should note that when I imported the Excel files, Access created a unique ID for each row. I let Access do that. It has set that ID as a key identifier. I could have used the rsid as a key, since it is more or less a unique constant. Next I will connect each table by the rsid’s with something called an equal join. That is the dark line I added between the rsid Field for each person’s DNA data.

This means return the results when the rsid is the same for each file. Note the last table (2 images above) was wrong, so I took that out and added my sister Heidi’s Raw data table on the right. It is important to get the initial importing right and in the right format as this will save a lot of time later. Here is the form that I want the data in:

This is a portion of Table 1 from the Whit Athey Paper. The difference is that Whit only had part of Chromosome 16. I will have all Chromosomes at once. In my Access query I choose the Excel Titles as Fields. I need the rsid, chromosome and chromosome position only once. Then I add the 2 alleles for each person. FTDNA uses right and left alleles. AncestryDNA uses allele 1 and 2. They are the same undifferentiated alleles.

When I run the query and view the results, I get this:

So with one push of the button, I have all the raw results of 4 people in my family in one area. I actually have more information than I need. AncestryDNA includes chromosome 24 and 25 which is YDNA and mitochondrial information that I don’t care about here. This is easily filtered out in the criteria section of the design view. I choose ‘Between 1 and 23’ there. That gives me each chromosome between and including 1 and 23.
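For readers without Access, the equal join and the chromosome filter can be sketched in plain Python. This is an illustration only: the dictionary layout, the function name, and the sample rsids, positions and alleles are all made up, and the real data would of course be read from the four raw data files.

```python
def join_on_rsid(tables):
    """Inner-join per-person tables on rsid and keep chromosomes 1 to 23.

    tables maps a person's name to {rsid: (chromosome, position, allele1, allele2)}.
    """
    names = list(tables)
    # An equal join keeps only the rsids that appear in every table.
    shared = set.intersection(*(set(tables[name]) for name in names))
    joined = []
    for rsid in shared:
        chromosome, position = tables[names[0]][rsid][:2]
        if not 1 <= chromosome <= 23:  # same idea as 'Between 1 and 23'
            continue
        row = {"rsid": rsid, "chromosome": chromosome, "position": position}
        for name in names:
            allele1, allele2 = tables[name][rsid][2:]
            row[name + "Allele1"] = allele1
            row[name + "Allele2"] = allele2
        joined.append(row)
    # Keep the rows in chromosome and position order.
    joined.sort(key=lambda r: (r["chromosome"], r["position"]))
    return joined


# Made-up example rows; the rsids, positions and alleles are not real data.
example = {
    "Mom": {"rs_a": (1, 1000, "T", "T"), "rs_b": (24, 50, "A", "A"), "rs_c": (2, 300, "G", "G")},
    "Joel": {"rs_a": (1, 1000, "T", "T"), "rs_b": (24, 50, "A", "A"), "rs_d": (3, 10, "C", "C")},
}
rows = join_on_rsid(example)
print(rows)
```

Only the rsid found in both tables and located on chromosomes 1 to 23 survives; the chromosome 24 row and the rows present in just one table drop out.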

Now I am down from roughly 701,000 lines of data to the 700,000 lines that I want. It is important to save these results as a Table in Access as we will be using that Table to make more tables. Also save the query. Even though I say to do this, I didn’t; I just saved the results under the next step.

Whit Athey’s Principle 1

This Principle is simple and straightforward. It says that if you have two letters the same in your results, one of those came from one parent and one came from the other. In line 1 of my results above I have TT. All my siblings have this result also. My mother is already shown as TT as she was tested. My father, who was not tested, must have had a T which he gave to me and my 2 sisters. Here is Table 2 from Athey showing the next set of data that we need to produce from the AncestryDNA raw data. Ancestry didn’t tell us which side each of our bases came from, so we will figure that out.

I have only 3 siblings that I’m looking at right now, so I need 6 more ‘Fields’ in my database. There are a few ways to do this in Access. Here is one way that I did it.

Athey Principle 1 in Access: Homozygous Siblings

Homozygous is just a fancy term for my TT result found in the 1st position tested of my 1st Chromosome. I created 6 more fields. These are to show what allele (letter) I got from my dad and my mom when I had a TT or other such homozygous results. Here is what the first field out of six that I added looks like on the Access Query Screen.

JoelFromDad is the first new field name. After the colon is the criteria. In English it says that if my allele1 is the same as my allele2, then put my allele1 in as the result I got from my dad. I used the same reasoning for a field called JoelFromMom and in similar fields for my two sisters. I viewed the results to make sure they made sense. I chose Make Table as I want the results in a Table to use later.
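Principle 1 is small enough to state as a single function. The Python below is a sketch of the rule as described, not the Access expression; the function name is hypothetical and None is assumed to stand for a no-call.

```python
def from_homozygous_child(allele1, allele2):
    """Principle 1: a child with two identical alleles got one from each parent.

    Returns (from_dad, from_mom); both are None when the rule does not apply.
    """
    if allele1 is not None and allele1 == allele2:
        return allele1, allele1
    return None, None


print(from_homozygous_child("T", "T"))    # one T from each parent
print(from_homozygous_child("T", "G"))    # heterozygous: nothing known yet
print(from_homozygous_child(None, None))  # no-call: nothing known
```

The no-call check matters: two blanks are equal to each other, but they say nothing about either parent.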

I hit the Run button and created a Table called tblAncestrySibHomozygous. Here I have squished the results together.

The results are as above: MomAllele1,2, etc. Then I added in the last 6 columns: JoelFromDad; SharonFromDad; HeidiFromDad; JoelFromMom; etc. In the first line above, The T’s that we all had were added as contributed from our mom and dad. There appears to be an error on line 3. Note that there were no-calls for Joel and Heidi. What we got from Dad was right, but we shouldn’t know what we got from our mom, just based on our own results. I must have saved my next step to this table also.

Fortunately, when I view the original query, the results are correct:

Note now that the blanks that should be there at the end of the 3rd line are there. Now I have 700,153 lines of results showing where my 2 sisters and I got our DNA from each parent just based on our own ‘homozygous’ results. Good old Principle 1.

Another tip is that when you make a Table from a query, the order may be slightly different than what you want. To keep the same order, in the Sort row, choose Ascending for the chromosome and position.

This will make sure that the chromosomes and positions within the chromosomes stay in the correct order. Otherwise, Access may try to sort by the first field which is the rsid.

Principle 2 in Access: Homozygous Parent

In my case, the homozygous parent is my mom. I spilled the beans already by my mistake above. In Line 3 above, my mom is GG. That means she had no choice but to contribute one of those G’s to each of her children at that location on Chromosome 1. Now I will put that Principle into Access language. For this portion I will use an Update Query. An Update Query adds new information to an existing Table. In this case, I added it to my tblAncestrySibHomozygous Table. That is why it showed the results already above. Here is what the Update Query looks like in design:

Here I have the tblAncestrySibHomozygous Table which I reran (or un-updated). This query says that where MomAllele1 equals MomAllele2, update the JoelFromMom, etc. Fields with the MomAllele2 value. Obviously I could have chosen either of her alleles to update the fields as they are the same. In the bottom left of the image above there is a pink highlighted query called qryMomHomozygous for Table. That is this update query. The ! means that it is going to create something. I assume that the little symbol to the left of the ! means that it is an update query. I ran the query and then created a new table with the results called tblAncestryMomHomozygous. Again, what I had forgotten was that by running this query, I also updated tblAncestrySibHomozygous. It’s always good to do quality checks – especially when you are dealing with over 700,000 rows of results at one time.
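Principle 2 can be sketched the same way. In this Python illustration the row is a dictionary whose keys follow the field names used above (MomAllele1, JoelFromMom and so on), which is an assumption about layout and not an export from Access. The function works on a copy of the row, so the original is left untouched; an Update Query, by contrast, changes the table it runs against.

```python
SIBLINGS = ("Joel", "Sharon", "Heidi")


def apply_principle_2(row):
    """Principle 2: a homozygous parent gave that base to every child.

    Returns an updated copy; the input row is not modified.
    """
    updated = dict(row)
    allele1, allele2 = row["MomAllele1"], row["MomAllele2"]
    if allele1 is not None and allele1 == allele2:
        for sibling in SIBLINGS:
            updated[sibling + "FromMom"] = allele1
    return updated


# Made-up row: Mom is GG and nothing is known yet about the children.
before = {"MomAllele1": "G", "MomAllele2": "G",
          "JoelFromMom": None, "SharonFromMom": None, "HeidiFromMom": None}
after = apply_principle_2(before)
print(after)
print(before)  # unchanged
```

Keeping the before and after versions separate makes it easy to compare them as a quality check.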

I did the update and got a warning from Access that I was updating over 400,000 rows and that the action cannot be reversed. Here is my old tblAncestryMomHomozygous to show the zeros that I didn’t like:

I’ll delete that table and replace it with the update on tblAncestrySibHomozygous that I just did. Here is the new table without zeros.

I still had to sort the table to get it right. The trick is to sort the position first and then the chromosome and everything comes out in the right order. Notice that I got rid of my old zero problem. Now I have over 700,000 rows of phased DNA based on homozygous results. Next, I look at heterozygous DNA. Whoa.

Principle 3: Heterozygous Child

I’ll copy the Athey Principle as he stated it as it is slightly more complicated than the previous two:

How to put Principle 3 into Access?

Here is an example of heterozygous children alleles where the mother’s contributing base is known:

We know that each sibling got a G from mom as she only has G’s at this location. All the siblings have TG for their raw results, which means the T must have come from dad. I can go through over 700,000 lines and apply that rule or try to use the Access Update Query to produce the same results. This time I copied the tblAncestryMomHomozygous to a table called tblSibHeterozygous before I did the update to maintain the integrity of the older table. In the Update Query, I combined 2 steps. First I set a criteria that there has to be something in the JoelFromMom for this to work. So I said that JoelFromMom is not Null. Next, my allele1 is T. I want this to go into my JoelFromDad spot. If this T doesn’t equal the G I got from my mom, I am already heterozygous, so I don’t need an extra query for that. [That was the step that I didn’t need.]

Here is what I have for an Update Query:

However, note that instead of looking at allele1 here I chose allele2. I am thinking that this will be a 2 step process for each allele. This query is updating over 70,000 rows, or a little over 10% of all the data. I’m trying to show that this Update Query did not address my example above which had to do with allele1 and it didn’t:

The first line was my example. The 3 blanks in the first line are the bases from Dad that were not produced from the query as expected. However, it did work for the second line. In that case, my allele2 was not equal to the allele I got from my mom, so it inserted that allele (G) as the allele I got from my Dad. Next I’ll copy the query: qryHeterozygousSibsAllele2 and rename it as qryHeterozygousSibsAllele1. Then I changed 6 of the allele2’s to allele1’s. This is to cover my original example where allele 1 wasn’t the same as the base contributed from Mom.

In English: When allele1 doesn’t equal the allele that you got from Mom, put it in as the allele you got from Dad. This results in over 76,000 row changes. By the way, if I haven’t mentioned it, in the Update Query, the row that says Update To is the one where the update to your data is happening. So in the above example if Sharon’s allele1 doesn’t equal the one she got from Mom, that allele is then known to be the one from dad and is inserted in the correct place in a new table.
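The two Update Queries (one for allele1, one for allele2) can be folded into one rule when written as code. The Python below is a minimal sketch of Principle 3 as applied here, with a hypothetical function name and None standing for a blank; it is not a translation of the actual Access queries.

```python
def dad_base_for_heterozygous_child(allele1, allele2, from_mom):
    """Principle 3: if Mom's contribution is known, the other allele is Dad's.

    Returns the base from Dad, or None when it cannot be decided.
    """
    if allele1 is None or allele2 is None or from_mom is None:
        return None  # a no-call or an unknown base from Mom
    if allele1 == allele2:
        return None  # homozygous child: Principle 1 already covers it
    if allele1 == from_mom:
        return allele2
    if allele2 == from_mom:
        return allele1
    return None  # neither allele matches Mom's base; leave it blank


print(dad_base_for_heterozygous_child("T", "G", "G"))   # T came from Dad
print(dad_base_for_heterozygous_child("G", "T", "G"))   # same answer, other order
print(dad_base_for_heterozygous_child("A", "C", None))  # Mom is AC too: unknown
```

Checking both alleles in the same function avoids having to remember which of the two queries has already been run.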

I check my updated table for good old rs13303118 and find:

So I think that it looks pretty good. The first line is now filled in with our Dad’s contributing base, as are all the applicable following lines out of 700,000. There are some situations where there should be blanks. In the third line, my mom is AC and I am AC. That is the situation where it is not possible to know what base came from what parent. So the base from each of my contributing parents is left blank – meaning that it is unknown.

One Last Step: Looking for Patterns

This is about as far as I’ve gotten and understood. The next issue that Whit Athey looked at in his paper was patterns. In his example there were 4 siblings tested, so more patterns. He added a column between the allele inherited from Dad and one from Mom called Dad informative pattern.

The idea is that there will be a pattern that lasts for a long time as we go down the results sequentially. These are the patterns of the segments that we inherit from our grandparents. Where the patterns change are the crossovers. Whit says to use those patterns to fill some of the missing letters. I haven’t started filling in the missing bases yet for a few reasons. One is that I’m not sure why I need to. In scanning the Athey paper there is a repetitive procedure of going back and forth between the data using the base from Dad’s side fill-in’s to help with the base from Mom’s side and then back again. First I’m not sure how to automate this yet. And if I could, how much better would the data be? I have quite a bit of data already. Once I get some answers of why I need to do this, I will continue on.

Here is a paragraph from the Athey Paper concerning the above Table 2:

Note the pattern of inheritance from Dad shown in Table 2 for the four siblings in the leftmost four columns. The first few rows show an AABB base pattern, but this gives way in about lines 12-13 to a new pattern, ABBB. Even though we only can see the pattern showing in some of the rows, these patterns persist over hundreds or thousands of SNPs, and can be assumed to exist also in the intervening rows where no pattern was discernable (and in the underlying sequence). Note that often there will be the same base in every location, a case of “accidental matching” which does not contribute to or detract from the pattern we are looking for. When two or more bases are different in a row, however, this represents an informative pattern—if any two are different, then since there are only two possible chromosomes contributing, it means we can see the chromosomal origins of the bases.

One of the reasons that I quote the above is to address the accidental matching where there was the same contributing parent base for each sibling. However, what I didn’t see addressed is that there are cases where that is not just accidental which I will discuss later.

Finding the crossovers

I do know the importance of finding the crossovers. I wrote a query in Access to cull out the patterns that Whit mentions.

Above is my query in design view using the table that has Principles 1, 2, and 3 already applied. This query basically filters out the situations where the 3 siblings have the same base. The thought is that if one sibling has a base that is different from one of the others, then the three siblings will not all share the same base.

Above is the start of the results of the query. Note the XYX pattern. This should make it possible to fill in Heidi’s missing bases from Dad. It looks like multiple choice test answers, but I would add C, G, C, C, C and A in the last column for the bases that Heidi got from Dad. My homework assignment is to find a formula to fill in those letters so I don’t have to do it manually 10’s of thousands of times.

Another thing I want Access to do is find where the crossovers are. Here I scrolled down all the bases that my sisters and I had from Dad. I can see where the XYX pattern changes to XYY:

But there was a problem. The XYX pattern stopped at position 18,759,377 and the XYY pattern started at 23,288,828. That means we have a large area with no pattern. Exactly. That is the area of XXX pattern that I just queried out. That has to be the area where all three siblings match the same paternal grandparent.
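Scrolling for pattern changes is the kind of job a short script could take over. The following Python sketch labels each row with its pattern and reports where the informative pattern changes. The function names and the row layout are assumptions, and the bases in the sample rows are invented; only the two positions 18,759,377 and 23,288,828 come from the query results above.

```python
def pattern_of(bases):
    """Label sibling bases as XXX, XYX, XYY, XXY...; None if any is blank."""
    if any(base is None for base in bases):
        return None
    labels = {}
    for base in bases:
        if base not in labels:
            labels[base] = "XYZ"[len(labels)]
    return "".join(labels[base] for base in bases)


def pattern_changes(rows):
    """Find where the informative pattern changes.

    rows: (position, (joel, sharon, heidi)) in position order.
    XXX rows and rows with blanks are skipped, as in the filter query.
    Returns (last position of old pattern, first position of new, old, new).
    """
    changes = []
    previous = None  # (position, pattern) of the last informative row
    for position, bases in rows:
        pattern = pattern_of(bases)
        if pattern is None or pattern == "XXX":
            continue
        if previous is not None and pattern != previous[1]:
            changes.append((previous[0], position, previous[1], pattern))
        previous = (position, pattern)
    return changes


sample = [
    (18000000, ("C", "G", "C")),
    (18500000, ("A", "A", "A")),
    (18759377, ("T", "C", "T")),
    (20000000, ("G", "G", "G")),
    (23288828, ("A", "G", "G")),
    (24000000, ("C", "T", "T")),
]
print(pattern_changes(sample))
```

A wide gap between the two reported positions is itself a hint that an XXX area sits in between, so the gap is worth a look rather than being treated as a single crossover.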

Checking My Results with M MacNeill’s Work

Fortunately, I have a secret weapon. M MacNeill – prairielad_genealogy@hotmail.com has also been looking at my raw DNA using his own Excel spreadsheet method. Here is what he has for Chromosome 1:

Now just look at the first 3 red bars above. They represent my paternal side. The first break would be on Sharon’s bar – the third red bar from the top. The end of her dark red bar is at 18,631,964:

Look at Sharon’s bar in that region and then scan up the 3 red bars. There is an area where all three siblings match on the paternal grandmother side (lighter red).

That is my paternal XXX Pattern.

To satisfy my curiosity, I went back to my unfiltered/unqueried table at the spot that the first pattern changed from XYX to XXX. The end of the first base pattern from Dad is highlighted in blue.

Line 2 is a no-call. Line 3 is one of the random XXX matches in the XYX pattern area that Athey mentioned above. Note that I could not likely fill in line 4 with what I know as I don’t know if that should be AAA, AGA, or something else. Actually, I could fill in Heidi’s with an A. If her results are AAA or AGA, Heidi still gets the A from Dad. It is only Sharon’s base from Dad that I don’t know.

However, starting at CCC, it seems like it would make sense to fill in all the letters in the XXX pattern area – even if there is only one known base out of three.

Converting Build 37 to Build 36 positions

At the top of the Blog I had mentioned that AncestryDNA results were in Build 37. M MacNeill’s work is in Build 36. I really didn’t want to have to convert results and thought that I was being clever by using all AncestryDNA results. However, to compare to M MacNeill’s Map above or to Gedmatch results, I still have to convert positions. Hey, life is tough.

NCBI genome Remapping service

Fortunately there is a way to convert positions here.

Assuming we are all Homo sapiens, we select that choice and we select that we want to go from Build 37 to 36:

Here is the place to enter the data we want converted. It has to be in the format below – “chr1:” followed by the position number. There is also a place to upload a file which I haven’t tried.
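If there are many positions to convert, the lines could be generated rather than typed. This is a minimal Python sketch that only builds text in the “chr1:” plus position format described above. It assumes the service wants plain digits without commas, and the function name is hypothetical.

```python
def to_remap_lines(chromosome, positions):
    """Format positions as 'chr<chromosome>:<position>', one string per position."""
    return [f"chr{chromosome}:{pos}" for pos in positions]


# The two Build 37 positions where one pattern stopped and another started.
for line in to_remap_lines(1, [18759377, 23288828]):
    print(line)
```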

These are the 2 positions from my query where one pattern stopped and another started. Here is what they look like in Build 36 under Map Location:

These Build 36 position numbers match up perfectly with M MacNeill’s map positions which gives me some confidence. This is where I’ll end Part 1.

I have found 2 paternal crossover points. However, I have not yet figured out which siblings they belong to – unless I cheat and look at the MacNeill Map above. I can easily do the same thing and find the pattern changes for the maternal side. I have shown 2 crossovers, but all the others exist in my query for 23 chromosomes. I just haven’t looked for them yet.

Summary

The Whit Athey Paper has been very helpful in phasing my raw DNA based on my mother and 2 siblings test results.
M MacNeill has piqued an interest in raw DNA data that I never thought I would have
M MacNeill’s Chromosome Maps are very helpful in checking my work
MS Access appears to be a great tool to use to quickly phase a lot of raw DNA
There is probably no way around DNA remapping or conversions
I still need:
- An easy way to find all the crossover points
- A formula to fill in the various patterns
- A good reason to fill in those missing bases
I have a lot more to learn about DNA phasing using raw DNA data

September 9, 2016 | April 29, 2017

Can 83 cM Last for 7 Generations?

Recently, I came across a DNA match at Ancestry. This match was on my mother’s side. Here is how the match showed at AncestryDNA:

The match, Nigel, showed as a predicted 4th cousin. However, the range stated he was possibly a 4th to 6th cousin to my mother (and my sister). Further, the matching surnames looked familiar based on my mother’s ancestry. However, the Ellis on Nigel’s side was a female from the early 1700’s. Any possible Ellis connection would be before the Nicholson/Staniforth connection.

The Common Ancestors

I wrote to Nigel and mentioned that it looked like we were related on at least one line. I had a bit of trouble figuring out exactly how we were related as did Nigel. It helped me to map it out – especially as Nigel has 4 Johns in a row in his ancestry.

It turns out that Nigel was not just a 4th cousin as predicted by AncestryDNA, but a 4th cousin, 2 times removed to my mom. Our common ancestor based on the chart above is John Nicholson baptized 1765. That is where the 7 generations come in. John Nicholson is 7 generations before Nigel and 5 generations before my mom. However, my sister Heidi has the same DNA match with Nigel as my mom does, and Heidi is 6 generations away from the probable common ancestors, John Nicholson and Sarah Stanisforth.
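The cousin arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes generations are counted from each person up to the common ancestor, with a parent counted as generation 1. The function name is made up for illustration.

```python
def cousin_relationship(gen_a, gen_b):
    """Return (cousin degree, times removed) from generations to the common ancestor.

    gen_a, gen_b: number of generations from each person up to the shared ancestor,
    where a parent is 1 and a grandparent is 2.
    """
    if min(gen_a, gen_b) < 2:
        raise ValueError("cousins need a common ancestor at least 2 generations back")
    degree = min(gen_a, gen_b) - 1   # shared grandparents -> 1st cousins
    removed = abs(gen_a - gen_b)     # difference in generations
    return degree, removed


print(cousin_relationship(5, 7))  # my mom and Nigel
print(cousin_relationship(6, 7))  # Heidi and Nigel
```

With my mom at 5 generations and Nigel at 7, this gives a 4th cousin, 2 times removed. Heidi at 6 generations comes out as a 5th cousin, once removed to Nigel.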

Nigel at Gedmatch

I mentioned my Nicholson webpage to Nigel which he enjoyed. Nigel was willing to upload his DNA to Gedmatch for my research. Here is what his match looks like with my mother:

Here is where the 83.8 cM comes in. Hence the title of the Blog: “Can 83 cM Last for 7 Generations?”

A Chromosome 1 map

Here is a map of my Chromosome 1 kindly produced by M MacNeill – prairielad_genealogy@hotmail.com. The top portion of this map was based on raw data DNA. It shows how my 2 sisters and I inherited our DNA from our 4 grandparents.

The four light blue bars at the bottom of the above image show the DNA matches that Nigel has to my mom, my sister Heidi, myself and my sister Sharon near the beginning of Chromosome 1. Nigel is related on my mother’s mother’s side. Notice how Nigel’s light blue matches below correspond to the DNA mapped to my mother’s mother’s light blue regions above. Heidi inherited a large maternal grandmother segment in this area of Chromosome 1 from our mom that had the large match to Nigel. The entire segment mapped to my maternal grandmother’s side appears to make up the match I have with Nigel.

A Nicholson Triangulation Group

My mother forms a Triangulation Group (TG) with her 2nd cousin Carol and 4th cousin, twice removed, Nigel. The TG is on Chromosome 3. To show the TG, I have to take the Gedmatch threshold down a little.

My mom’s match to Nigel

Likewise, the threshold was reduced to show the match between Nigel and Carol.

Nigel’s match to Carol

No threshold change was needed for the match between my mother and her second cousin Carol.

Mom’s match to Carol

Here is what the TG looks like with the likely common ancestors of Nicholson and Staniforth:

Are There Other Possibilities?

83.3 cM is way off the charts for 4th cousin or 4th cousin, 2 times removed. I brought the question to the ISOGG Facebook Group. The prevailing wisdom there is to check for other closer relatives (which makes sense). If there are missing ancestors on either side of the match (my family or Nigel’s), that may leave room for other more recent common ancestors.

My ancestry

First, the match is on my mother’s side. So that narrows things down. Secondly, my mom is 1/4 English. Therefore, I am only looking at 1/8 of my ancestry and 1/4 of my mom’s.

Above, I have circled in yellow the one out of 4 grandparents of my mother that could match Nigel as Nigel has not shown any German ancestry. Annie Nicholson is 2 generations back from my mom (my mom’s grandmother).

Here is an enlargement of Ann Nicholson’s ancestors:

This shows that in the 5th generation from my mom, where the assumed common ancestors of our match are found, most of the ancestors are identified. Mary doesn’t have a last name and I’m missing parents for Charles Ellis. So even if the new common ancestors were in this generation, they would be in the same generation as our currently assumed common ancestors. But what if Nigel has an unidentified ancestor in his 6th generation that matches someone in my mom’s 4th generation? That would be a closer match. So let’s look at Nigel’s tree.

Nigel’s tree

Nigel’s father’s side appears to be from Scotland. His mother’s side is from England. Nigel’s maternal grandmother is from the Derbyshire area and his maternal grandfather is from the Sheffield area. So that narrows things down to 1/4 for Nigel. My mom’s only English ancestors were from the Sheffield area, so we will concentrate on Nigel’s maternal grandfather’s side.

Here are Nigel’s maternal grandfather’s Sheffield ancestors:

The tentative common ancestors between Nigel and my family are one generation off this chart. The John Nicholson married to Martha Jow had as parents another John Nicholson who married Sarah Stanisforth. The ancestry above shows that Nigel has 6 out of 16 Sheffield ancestors 6 generations away. Is this a problem?

Nigel’s missing ancestors

Above, I had said that if Nigel had missing ancestors in generation 6 that matched with my mom’s generation 4 ancestors, then there could be a closer match. I’ll look at the various possibilities and we will decide if they pose a problem.

1. A problem that I hadn’t considered previously would be if Nigel’s unknown 6th generation matched with my mom’s 2 unknown ancestors in her 5th generation. Those unknowns are the parents of Charles Ellis born 1795. I don’t think that scenario is very likely. First, it would not likely be on the Ellis side. Charles Ellis’ father would also be an Ellis and Nigel doesn’t have any Ellises in his known generation 5 ancestors. But what about Nigel’s unknown female ancestors in generation 5? They were already married and having the children that are known in Nigel’s generation 4. So any unknown common ancestor there would have to be in Nigel’s generation 7 which is back where we started.
2. Another scenario would have a missing female ancestor of Nigel remarrying. However, usually in this case, there would only be a 1/2 match and thus 1/2 the DNA coming down to Nigel and my family. I would rule this scenario out based on the very large DNA match between my family and Nigel.
3. When I look at other scenarios the reasoning seems to be similar to what I mention in #1 above. The options appear to bring us back to Nigel’s generation 7 again. That means that we either have an additional set of common ancestors in addition to the one that we have identified or we don’t. It makes sense to me to go with the ancestors that we do have rather than worry about missing ones we may have. Put another way, I’m gambling on the possibility that there were not additional common ancestors in Nigel’s generation 7 and my mom’s generation 5.

On the Other Hand: Our Non-Conformist Ancestors

One thing that Nigel and my family’s Sheffield ancestors had in common was that they were non-conformists. This means that they attended a church that was not the official Church of England. In their case it was the Congregational Church. Perhaps there were other types of churches that they attended during the family history. What I don’t know is if people in these groups married cousins to keep within the faith, or if there were enough of these non-conformists around that this wasn’t necessary.

So, Where Are We?

The prevailing wisdom is that if there are missing ancestors, then the matches could be in a closer generation in those missing spots.
I would like to push back the prevailing wisdom a bit. Even if we are missing some ancestors, there are things that can be deduced about those missing ancestors based on known ancestors in the next more recent generation.
In genealogy research and DNA matching, things are not usually known 100 percent. I believe that there is a high probability that John Nicholson and Sarah Stanisforth are the common ancestors between Nigel and my family represented by a relatively large amount of DNA that made it down through both of our lines from the 1700’s.

Kitty Cooper’s Chromosome Maps

Above I have shown the genealogy and a Triangulation Group for the Nicholsons. I have also shown that the match between Nigel and my family is through my correct grandparent’s (mother’s mother’s) DNA. Now that I have convinced myself that John Nicholson and Sarah Stanisforth produced the matching DNA between Nigel and my family I will add that couple to my Kitty Cooper generated Chromosome Map:

The Nicholson/Staniforth connection on my map above is shown on Chromosomes 1 and 3. Note that this is not the oldest DNA that I have and that the matches are in line with 2 other ancestors (Frazer in Green and Rathfelder in purple) from around the same time period.

Of course, I can’t leave it at that. Now I need to show my mom’s updated Chromosome map:

Note the following:

My mom’s segments are larger than my corresponding maternal segments as she is one generation back from me
My mom’s Nicholson/Stanisforth DNA is shown in purple.
My mom does not show DNA from that couple at Chromosome 3. That is because her match came in at 6.9 cM which is just under the 7.0 Gedmatch threshold. If I wanted to be more accurate, I would have added that match also – especially as that is the match that resulted in the triangulation group.

September 1, 2016 | April 13, 2017

Finally, Triangulation Groups for the James Frazer Line of Roscommon County, Ireland

Recently, Kathy from the James Line of our Frazer DNA Project notified me that her Aunt Madeline had been tested for DNA and would I look at the results? I will take a look at the results in this Blog. Here is where Madeline fits in on the James Line Tree. She is on the second row from the bottom and the 3rd box from the left.

Above, the James Line Descendants that tested their DNA are in red. So now there are 14 people that have tested. There is one person below Clyde on the bottom left that I don’t show. I don’t analyze the children of people that have tested their DNA because they got all their DNA from their parents. Betty hasn’t uploaded her results to Gedmatch. As a result, I am comparing 11 people to each other on the James Line.

Let’s Triangulate

First, I’ll say that I won’t bother triangulating Charlotte, Madeline and Mary. That triangulation would only point to the parents of Charlotte and Madeline and they already know who their parents are. Plus one of their parents is not a Frazer.

Chromosome 2

Here are the matches that Madeline has on Chromosome 2:

Madeline’s sister Charlotte is #1 in red. Notice a large match. Madeline’s niece Mary is #2. As expected, the matches are smaller and more broken up, but still fairly large. #3 is my 2nd cousin, Paul. He is actually on the Archibald line, but I believe that he and I have some James Line ancestors that haven’t been identified. Paul has a very small pink match with Madeline and Charlotte and another fairly small blue match with Madeline, Charlotte, and Mary.

I won’t go down to the pink level at this time but will look at Paul’s blue match. Even that is below the normal gedmatch.com threshold of 7 cM. In order for this to be a true triangulation group, Paul would also have to match Charlotte and Mary. And Charlotte and Mary would have to match each other. Paul’s blue match is at position 174 to 178M on the image above. We already know that Charlotte and Mary match each other in that region.
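The all-pairs rule above can be written down as a small check. This is a minimal Python sketch, not how Gedmatch works internally. It simplifies triangulation to "every pair has a matching segment overlapping the region of interest". Positions are in millions (M), the function names are hypothetical, and the Madeline–Charlotte segment is made up for illustration.

```python
from itertools import combinations


def overlap(seg_a, seg_b):
    """Return the overlapping (start, end) of two segments, or None if they don't overlap."""
    start = max(seg_a[0], seg_b[0])
    end = min(seg_a[1], seg_b[1])
    return (start, end) if start < end else None


def triangulates(people, matches, region):
    """True if every pair of people has a matching segment overlapping the region.

    matches: dict mapping frozenset({person_a, person_b}) to a list of
             (start, end) segments on one chromosome.
    """
    for a, b in combinations(people, 2):
        segments = matches.get(frozenset((a, b)), [])
        if not any(overlap(seg, region) for seg in segments):
            return False
    return True


chr2_matches = {
    frozenset(("Madeline", "Paul")): [(174, 178)],       # Paul's blue match
    frozenset(("Madeline", "Charlotte")): [(150, 200)],  # hypothetical sister match
    # No Charlotte-Paul entry: no match was found in this region.
}
print(triangulates(["Madeline", "Charlotte", "Paul"], chr2_matches, (174, 178)))
```

One missing pair is enough to rule out the group, which is why each connection has to be checked.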

Here are Paul’s matches with Charlotte:

Note that on Chromosome 2 where we wanted him to match Charlotte (around 175M), he doesn’t. At least not down to 4 cM and 400 SNPs. This match does appear to be in Paul’s pink match area that we didn’t consider.

Just to make sure, I will see if Paul matches Mary.

Here there are no matches at Chromosome 2, so I would say there is not a triangulation group there and Paul’s match with Madeline was by chance. Let’s move on to another Chromosome.

Chromosome 4

Again, the top 2 matches to Madeline are her sister Charlotte and her niece. #3 is Clydie (also known as Clyde). #4 is my sister Heidi, but we won’t consider that match right now. Here is how Clydie matches Charlotte:

Unfortunately, there is no match on Chromosome 4. Again, there is no match between Clydie and Charlotte’s niece, so no triangulation at Chromosome 4:

Chromosome 5 – Two TGs

Mary
Charlotte
Bonnie
Judith
Jane (from the Archibald Line. I’ll ignore this small match for now.)

Here is more TG potential with Bonnie and Judith, both of whom have a paper trail on the James Frazer Line. From previous testing, Charlotte and Bonnie match in the area of Chromosome 5 that we are interested in:

Here we have our first James Line only TG. This means that Madeline, Charlotte and Bonnie all have a common ancestor. It would be tempting to think that this DNA comes from James Frazer:

However, there are other possibilities. We don’t know the spouse of Archibald Frazer born around 1792. That could be where the match comes from. Alternatively, one of the genealogies could be wrong.

Next, let’s look at Judith’s small match. Here is where she matches Charlotte:

Here is how our new Chromosome 5 TG could look:

Again, there are other possibilities. Note that Charlotte and Madeline are 5th cousins to Judith – assuming we have the chart right. Also, taken together, these 2 TGs imply a common ancestor between Charlotte, Madeline, Mary, Bonnie and Judith.

Chromosome 6

Charlotte
Mary
Jonathan
Janet

Here is the comparison between Jonathan and Charlotte:

So this does not look promising for triangulating. I compared Mary and Jonathan – no match there either. As Jonathan and Janet are siblings, there should be no match between Janet and Charlotte or Janet and Mary.

Chromosome 7 – a non-James Line TG?

Above are Madeline’s matches with Charlotte, Mary and Bill from the Archibald Line. It appears that Madeline, Charlotte and Bill are in a small TG. Bill has a small match with Madeline right at the area that he needs to (from position 127 to 130M) in order to form a TG.

TG at Chromosome 10

Here are Madeline’s matches with her close relatives Mary and Charlotte, and her matches with her more distant relatives Jonathan and Janet. It looks like there should be a TG between Madeline, Mary, Jonathan and Janet.

Here I don’t even have to lower the Gedmatch thresholds for the match between Mary and Jonathan:

The match between Mary and Janet is slightly smaller at 9.0 cM. This is another case where Madeline has tipped the scales and resulted in another TG.

Chromosome 12

Above is the representation of Madeline’s matches with Mary, Charlotte and Prudence. Let’s look for a match between Charlotte and Prudence:

They do have a good match right where we need it to form a TG. This is an important TG as it adds a new line:

On paper, Charlotte and Madeline are 4th cousins with Prudence. The Edward Frazer Line is well documented, so this supports the genealogy that links Charlotte and Madeline up through Archibald Frazer and Catherine Peyton.

Chromosome 15

As usual, Madeline is matching with Charlotte and Mary. The next 2 blue segments represent Madeline matching siblings Jonathan and Joanna. If Joanna and Charlotte match, that will be one TG. They do:

Now, we need to check if Jonathan matches Charlotte and Mary. He doesn’t match Charlotte on Chromosome 15:

Chances are, he won’t match Mary here either. I checked and he didn’t. This is why it pays to check each connection. From the Chromosome Browser above, it looked like Jonathan could be in a TG, but only his sister Joanna was.

Chromosomes 16-22 only have matches between Madeline, Charlotte and Mary.

The X Chromosome

The X Chromosome can be confusing as the male only inherits an X Chromosome from his mother. The female inherits an X from both parents. That means where there are 2 male Frazers in a row in a line of inheritance, the X cannot represent a Frazer match. That is, unless there is intermarriage of the Frazers. Just to show I’m not afraid of being confused, here are Madeline’s matches on the X Chromosome:

As above, I’ll ignore the small pink matches. The first 2 of Madeline’s matches are again Charlotte and Mary. The 2 yellow matches belong to my sisters Sharon and Heidi. In the past, I have explained these by an unknown Frazer in my ancestry that is likely in the James Line. #8 is Bonnie. #10 is Clydie. This is an interesting match because it is almost 20 cM. Also Clydie does not have 2 Frazers in a row in her ancestry until before William Fitzgerald Frazer and Margaret Graham. This means that Madeline, Charlotte and Clydie could have a Graham in common or perhaps an ancestor of Margaret Graham in common.
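The "2 males in a row" rule is easy to test on any line of descent. Here is a minimal Python sketch that assumes the line is written as a string of M and F from the ancestor down to the tester. The function name and the sample lines are made up for illustration.

```python
def x_can_pass(sexes):
    """True if X DNA could travel down this line of descent.

    sexes: sequence of "M"/"F" from the ancestor down to the tester.
    A father does not pass an X to a son, so two males in a row block the X.
    """
    return not any(parent == "M" and child == "M"
                   for parent, child in zip(sexes, sexes[1:]))


print(x_can_pass("MFM"))  # father -> daughter -> son
print(x_can_pass("MMF"))  # father -> son blocks the X
```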

Summary of the Five New TGs

Here is the summary of the new James Line TGs not including the X Chromosome:

The numbers in the top right are the Chromosomes where the James Line TGs are. The names in the bottom left indicate the likely common ancestor(s) for the TGs. For simplicity, I left out the new TGs that had James and Archibald Line people in them.

Summary and Conclusions

The addition of Madeline to the James Line DNA Test Group tipped the scales and resulted in previously unknown TGs for the James Line.
Out of the 11 people considered, 9 were in James Line TGs.
The newest member of our James Line DNA Group, Madeline, was in the most TGs: four
Charlotte helped form 3 new TGs
Even though Mary is a niece of Madeline and Charlotte, she also helped form 3 new TGs
Bonnie, Judith, Jonathan, Janet, Prudence and Joanna were each in one new TG
The 2 that weren’t in TGs were Clydie and Beverly. However, Clydie was in an X Chromosome TG. Beverly shows as a 3rd cousin to Bonnie and Judith. As a result, her relationships can be inferred through them.
These new TGs add certainty to the relatedness of those on the James Line.