DNA Phasing of Raw DNA When One Sibling is Missing: Part 10

In this Blog, I would like to portray my phasing results in an Excel Bar Chart if possible. This has been one of the most difficult parts of phasing my DNA for me.

I have looked at Stacked Bar Charts in Excel as they seem to be the closest to what I am looking for. Today I looked at a method for producing Gantt Charts at ablebits.com which seems to hold some promise of application for DNA mapping:

bar-chart-excel

I had my Maternal Patterns’ Starts and Stops from my last blog. I took those and converted them to Build 36 and put them in a spreadsheet:

momcrossoverstable

Start is the ID# I was using. Start36 is the Chromosome position of the Start of the pattern in Build 36. App ID is the approximate position of the Crossover. Then I have that same location in Build 37 and Build 36. Following the logic in the Ablebits.com tutorial, I have the first Maternal Crossovers for Chromosome 7 in my simplified Chart:

matfirstxover7

I got this by choosing the Build 36 column and choosing Insert Stacked Bar. I suppose a better Title would have been Chromosome 7 Maternal Crossover rather than Build 36. This was taken from my Column Header. The goal is to get a 2 color bar above. However, I already see a problem. The bar needs to be different colors for different people. Well, I have to start somewhere.

Next, I put in the next crossover location for each person. I took this position and subtracted from it the first Crossover to get a length.

step2crossexcel

You may note that the Bar Chart inverts the original order. It gives Sharon a 4 which is now on top. Here is my visual phasing of Chromosome 7 that I am trying to replicate:

chr7visphase

My Excel Bar Chart order is Sharon, Jon, Joel, Heidi. My visual phasing order is Sharon, Joel, Heidi, Jon. The 2 maternal colors I have above are green and orange representing Lentz and Rathfelder. If I keep orange as Rathfelder, that means I want to change bar 2 and 3 (Joel and Jon) on the Excel Bar Chart. One way to do this is to move over the first Crossovers for Joel and Jon in my spreadsheet:

modchart

However, that made the 2 male siblings’ first maternal grandparent match too long. I needed to move the start over 2 places in my spreadsheet:

mat7revised

Now the Chr7 Maternal Crossover column can be called Lentz and the 2length column can be called Rathfelder.

Next, I added another column for the next Lentz portion of DNA:

chr73rdxover

I was hoping that if I named the next column Lentz, that Excel would give me the same blue as the first Lentz. I was able to right click on the gray and change it to blue. I then added another Rathfelder segment. For this to work in Excel, a Rathfelder length is added rather than a start and stop location.

chr7xover3
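The length arithmetic that the stacked bar needs can also be sketched outside Excel. The Python below is only an illustration: the function name and the positions are made up for the example and are not my actual Chromosome 7 crossovers. It assumes the first bar segment runs from 0 to the first crossover, as in the chart above.

```python
def segment_lengths(crossovers, chrom_end):
    """Turn crossover positions into stacked-bar segment lengths.

    The first length runs from 0 to the first crossover, each later length
    runs to the next crossover, and the last one runs to the chromosome end.
    """
    edges = [0] + sorted(crossovers) + [chrom_end]
    return [right - left for left, right in zip(edges, edges[1:])]


# Hypothetical positions in millions of bases, for illustration only.
lengths = segment_lengths([20, 55, 90], chrom_end=150)
print(lengths)
```

The lengths always add up to the chromosome end, which is a quick way to catch a sibling whose last segment was ended too early.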

Again, I had to reformat the Excel-chosen color to be consistent with what I had for Rathfelder. I chose the last position for Heidi and Sharon as the highest that I had as this was their last segment. After a bit of wrangling with Excel, I was able to get this:

chr7

So that is the presentation. However, I notice that on my visual phasing, I had 5 segments for Jon and only 4 here. I missed his last Rathfelder segment. I had ended Jon’s Chromosome too early. Here is the correction:

chr7corrected

It still looks like one of Jon’s crossovers in the middle of the Chromosome may be off, but I’ll have to figure that out later.

Paternal Bar Chart

Now that I have something that looks like a maternal Chromosome Map, I need the paternal side to go along with it. It looks like if I add 4 more rows to my spreadsheet, I may have it.

I did this and I added Hartley and Frazer (my paternal side grandparents) to the right of the maternal side grandparents. I had to make a new chart that came out like this:

chr7matpat

Here #4 is my Paternal DNA. I found it a bit disconcerting that my paternal side was longer than the maternal. Here I’ve added a bit of formatting and made the colors consistent (one color per grandparent):

chr7patmatmap

Well, I guess I’ll just leave this imperfect. It will give me something to work on later. I did change the scale from millions to M’s to be easier to read.  The above shows that Jon and Heidi share their paternal grandfather’s Hartley DNA un-recombined on Chromosome 7.

Summary and Conclusions

  • Learning how to phase my raw DNA has been interesting and time consuming
  • Delving into the A’s, G’s, T’s and C’s promotes understanding of one’s DNA
  • I owe a lot to M MacNeill and Whit Athey in learning how to do this phasing
  • Due to the data intensive nature of phasing, I would recommend the use of MS Access or some other database software.
  • An understanding of Excel or similar spreadsheet software is also important.
  • I had tested my brother Jon as an afterthought. It turned out that his test results were important in determining the phasing of the 4 siblings.
  • I have the overall skeleton of the phasing with crossovers. There is still a lot of work to complete the individual Chromosomes and trouble shoot problem areas.
  • Further, I have not worked on the X Chromosome due to the different nature of that Chromosome. My brother and I are already phased. My sisters are not.
  • Once these maps are done they will be a reference to all matches to my 3 siblings and myself.

DNA Phasing of 4 Siblings When One Parent Is Missing: Part 9

Mom Patterns

Up to this point, I have phased 4 siblings based on 3 principles outlined by Whit Athey. I have looked at the bases the 4 siblings had from their Dad. Those Dad bases made up patterns. Based on those patterns, other Dad bases were added to those siblings within those pattern areas. After those bases were added, mom bases were added where the siblings were heterozygous. The changes were documented in a Base Tracker.

Start and Stop Using Access Min Max – AAAB Mom Pattern

I can just look at my previous Blog to see what I did for my dad pattern. The results of this query:

mompatternquery

get copied to this spreadsheet where I added a column for Pattern:

mompatternspreadsheet

That was my big time saving step from my last query. Before I run each Min Max Total Query, I check a regular Select Query to make sure I have the right pattern. For example, here is my ABBA Mom Pattern check:

abbacheck

In a few minutes, I have 111 Start/Stop Mom Pattern pairs. This time, I’ll add conditional formatting to point out the one position patterns:

startstoponepattern

These single patterns tend to mess me up as I’m looking for patterns, so I’ll take them out of my spreadsheet, but not out of my Access data tables. There were 10 of these. I don’t know if that is a lot.

Getting Better Starts and Stops for the Mom Patterns

The next step takes a little while. I look at the [now] 95 Start/Stop pairs for the various patterns. I highlighted the overlapping areas in yellow:

mompatternoverlaps

Actually, the first pattern overlaps into the second also. Some of these may be caused by single location patterns. For example at Chromosome 1, when I got to ID# 548 I find this:

momchr1

There is an ABAA Pattern, but it only lasts for one position and then is on to an ABBB pattern. I copied the end location for ABAA and put it at the end of Chromosome 1 to check later and made note of the one position pattern:

onepatternchr1mom

After that, it makes more sense that the ABBB pattern Stop at 2314 goes into an AABB Pattern Start at 2317. Here is the adjusted Chromosome 1 for my siblings’ Mom Patterns:

cleaneduppatternsmom

I moved the first Start to the Start of Chromosome 1 and last Stop to the end of Chromosome 1 as they were already pretty close to those positions. All combinations of patterns are represented here except for ABAB. I don’t have a start and stop for the single patterns as I’ll be taking them out later.
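The two checks in this section, spotting one-position patterns and spotting overlapping Start/Stop pairs, can be sketched in Python. This is a rough illustration and not what I ran: the function names are hypothetical, and only the 2314 and 2317 IDs come from the text above; the other IDs are invented. The overlap check compares each segment only with the next one by start ID, so it is run after the one-position patterns are set aside.

```python
def single_position(segments):
    """Return segments whose start and stop are the same ID."""
    return [seg for seg in segments if seg[1] == seg[2]]


def find_overlaps(segments):
    """Return pairs of neighboring segments whose ID ranges overlap.

    Each segment is (pattern, start_id, stop_id). Segments are sorted by
    start ID, and a segment overlaps its neighbor when its stop is at or
    past the neighbor's start.
    """
    ordered = sorted(segments, key=lambda seg: seg[1])
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if a[2] >= b[1]]


# Mostly invented IDs; 2314 and 2317 are the ABBB stop and AABB start.
segments = [
    ("ABBA", 1, 600),
    ("ABAA", 548, 548),
    ("ABBB", 560, 2314),
    ("AABB", 2317, 5000),
]
singles = single_position(segments)
cleaned = [seg for seg in segments if seg not in singles]
print(singles)
print(find_overlaps(cleaned))
```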

Filling In Mom Patterns

Now that I have all the mom patterns and their starts and stops as well as I can, I will fill in the patterns. I’ll start with AAAB. First I use the Concatenate formula in Excel to get my starts and stops in Access language. Then I sort the patterns in Excel:

mompatterssorted

I have 19 AAAB Mom Patterns. Next I go into Access and create an Update Query using the table called tbl4SibsNewMomPatternsFillin. In the AAAB Pattern, I will want to fill in the missing A’s.

updatequeryaaabmom

This looks like a good query, but I want to track how many bases I’m updating, so this query would make it difficult to track that as I’m adding bases to Sharon and Heidi. So again, I will go with the simpler query.

aaabsimplerupdate

Here is the first Mom Pattern Fill-in update on the Base Tracker:

basetrackeraaabmom

I continued the same process down the Mom Patterns, filling in what was missing from each of the siblings:

momfillinupdatetracker

In each case for each pattern, I added less than 5,000 bases to each sibling. I also added to my spreadsheet a percentage of overall phasing which is now at 89.1%. This is how the 4 siblings are phased on average. Jon, who tested with the Ancestry V2 is bringing the other siblings’ overall average down.

Principle 3 – Dad Bases From Mom Bases

This is the icing on the cake for me. After all the work of determining Patterns and Starts and Stops, I have an easy step to add bases. Principle 3 says if you are heterozygous and you know one of your bases is assigned to one parent, then the other base must be assigned to the other parent.

I had to look at my previous blog to see how I did this. Let’s see if this looks right:

udatedadfrommom

The first column makes sure that I am heterozygous as my 2 alleles are not the same. The 2nd column says that I know that I got allele2 from Mom. The 3rd column says to put my allele1 as the one I got from dad. That seems to make sense. This results in 9523 rows of updates in 22 Chromosomes. In part 2 of this Update Query, I switch the alleles:

dadfrommomforjoel

This says if my allele1 is from Mom assign allele2 to be from Dad.

Summary of Pattern Filling In and Dad Bases from Mom Bases

btdadfrommom

Here the overall phasing is 90%, but I had a pretty strict measure of phasing. It involved alleles that Jon was not even tested for. Here we are getting a diminishing return. I could continue the process, but I won’t.

Next Steps

Now I have a good idea where all the crossovers are. I need to assign those to siblings. Then I need to figure out how to portray the final results.

Assigning Crossovers to Siblings

I might as well jump right in. I’ll try a Chromosome that MacNeill has mapped. Actually, he only did the 3 siblings at the time, so it may be a little different.

Chromosome 7 Crossovers

This has been mapped by MacNeill to 3 siblings. Let’s see how my mapping compares. Here is the mom pattern:

momstartstop7

Here I have the start and stop by my own ID’s. Then I have the gap to the next pattern, which may indicate an AAAA pattern. Under description, I have what the pattern changes are. Then I have the person assigned to the Crossover and the approximate location of the Crossover. On the first line I have the description as ABBA to ABBB. Here, Jon (in the last position) was matching with me as I’m in the first position of course. Then he changed to match with Sharon and Heidi. So I assigned the crossover to him.

Look at the 5th line. The pattern is ABAB to AAAB. This goes through a gap of over 6,000 ID’s. That usually means there is an AAAA pattern there.  AAAA could go to AAAB easily, but to go from ABAB to AAAA would take two crossovers. I don’t have a good idea where the crossover is, so I’ll go to Gedmatch. The good news is that I have already tried using visual phasing on this Chromosome:

chr7vismapjon

The crossovers that I looked at above in my spreadsheet were on the maternal side. So that would be the top part of the bar (green-orange). It looks like I have 11 or 12 maternal crossovers, if I did it right. Looking at the top part of the image above, notice the non-match areas. These have no blue bar below and have red areas above. These are important. The reason is that wherever one of these areas appears, there cannot be an AAAA pattern for maternal or paternal. That means that all 4 siblings cannot match the same grandparent in any of these areas. The only potential AAAA patterns, then, are at either end of the Chromosome or in the middle. The middle locations are about 60-70M. Also note that I have Rathfelder as the same match for each sibling from 56-70M.

There is a discrepancy between my spreadsheet crossovers (7) and the visual above (11 or 12). The other problem is that I need a double conversion from my spreadsheet. The spreadsheet is in ID’s which refers to Build 37 locations and Gedmatch is in Build 36.

Before I start converting numbers, I’ll look at what I have for the Dad Crossovers.

dadpatterncrossover7

Here I added a position number for the Chromosome (Build 37). This matches up with the visual phasing above. What is missing would be the crossover for Joel after an AAAA pattern at the beginning of the Chromosome.

Where is Heidi?

As I look at the maternal visual phasing, I see that Heidi has 3 crossovers. On my spreadsheet, she doesn’t have any. One can be explained as going onto the right end of the Chromosome to an AAAA pattern, but what about the other 2 crossovers, in the middle of the Chromosome? I got these positions from an old file where I compared myself to my 2 sisters. Then I put those in a spreadsheet and converted them to Build 37:

findingheidi

The Chromosome position numbers in blue were where I had Heidi’s crossovers. I then went to my Access Database.

Heidi found

heidi

Here is an ABBB Mom Pattern that I missed. Going through the list, I updated my crossover list:

updatedchr7xover

Now I am up to 12 Maternal Crossovers. The AAAA patterns tend to fit in naturally. Note next to the first blue ‘Joel’. There would be no way to go from an ABBB pattern to an AAAB pattern without 2 changes. That is why an AAAA pattern is required between the other 2 patterns.
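The reasoning about how many crossovers separate two patterns can be written as a small check. The Python below is a sketch under two assumptions that go beyond the text: each pattern has only two letters (one per grandparent on that side), and the letters are always written relative to the first sibling, so a crossover by Joel flips the letter of everyone else. The function name is hypothetical.

```python
SIBLINGS = ("Joel", "Sharon", "Heidi", "Jon")


def crossovers_between(before, after, siblings=SIBLINGS):
    """Return the smallest set of siblings whose crossover explains the change.

    Patterns such as 'ABBA' are compared letter by letter. Because the first
    sibling is always written as 'A', a change in every letter except some
    is the same as a crossover by the siblings whose letters stayed put.
    When both explanations are the same size, the changed letters are used.
    """
    changed = [s for s, x, y in zip(siblings, before, after) if x != y]
    same = [s for s, x, y in zip(siblings, before, after) if x == y]
    return changed if len(changed) <= len(same) else same


print(crossovers_between("ABBA", "ABBB"))  # one crossover, assigned to Jon
print(crossovers_between("AAAA", "ABBB"))  # one crossover, assigned to Joel
print(len(crossovers_between("ABBB", "AAAB")))  # two changes needed
```

A result with two names means the step cannot be a single crossover, which is the signal that a pattern such as AAAA is hiding in the gap.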

Paternal Crossovers – Chromosome 7

crossoverschr7

Here I only show 2 crossovers, where on my map above, I show 3. I am just missing my own crossover from AAAA to ABBB. This is at the beginning of Chromosome 7. Here is my database table for my Dad Patterns:

chr7dadpattern

The position I have highlighted would still be an AAAA pattern as I have A??A. So that is the last position with that pattern. Id 285993 is the first spot I have the ABBB pattern, so I chose the crossover as ID# 285992 (under App. ID):

 

dadcrossover7

Here is what MacNeill had for 3 siblings at Chromosome 7:

macneill-chr7

What is now clear from having 4 DNA tested siblings is that my first crossover is paternal and not maternal. For my first crossover to be maternal, I would have had to have gone from an AAAA pattern to an ABBA pattern which would have been a double crossover. Having my brother Jon (the last ‘A’) tested made that clear.

Summary

In this Blog, I have looked at the Mom Patterns created by 4 siblings. Based on those patterns I have filled in alleles from other siblings. I have also filled in alleles for heterozygous siblings. This is based on the Mom allele being known and assigning the other allele as from the Dad. Then I looked at assigning crossovers to the various siblings. Based on the Patterns, it seemed clear who the crossovers should be assigned to. I then checked the crossovers I had with a visual phasing based on gedmatch. This showed where I was missing crossovers, which I was able to add using Chromosome 7 as an example.

Next: How to show the final results?

DNA Phasing of 4 Siblings When One Parent Is Missing: Part 8

Dad Patterns

In my last Blog, I looked at the Whit Athey 3 Principles and used MS Access to assign bases to the paternal or maternal side for the 4 tested siblings of my family. The next step is to look at Dad Patterns. I have been doing this by querying for a pattern and then scrolling down for start and stop positions. This has been quite tedious. It occurred to me that there may be another way to do this.

MS Access Min Max Functions

Access has a function that finds a minimum or maximum value in a group. In this case the group can be Chromosome.

AAAB Dad Pattern – Access to the rescue

 

aaabminmax

To get the total line I hit the summation [totals] icon in the Show/Hide Group. This adds a Group By to each field in the Total row. Here I looked for the Minimum and Maximum ID for each chromosome for the AAAB Dad Pattern. That is where Joel’s base from dad was the same as Sharon’s, Sharon’s base from dad was the same as Heidi’s, and Heidi’s base from Dad was different from Jon’s. Here is the output for the AAAB Dad Pattern:

aaabminmaxresults

This step has revolutionized my work as it saves me from scrolling through 100’s of thousands of dad base AAAB Patterns.  This takes about 2 minutes vs. the old way which seemed like an hour.

The upside of this method is that it is fast. The downside is that it only finds the minimum and maximum of a pattern within a chromosome. It doesn’t find all the breaks in the patterns within the chromosomes.
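To make that downside concrete, here is a small Python sketch comparing a Min/Max summary with a scan that finds every contiguous run of a pattern. It is an illustration only, not the Access query: the function names and the six sample rows are invented, and it assumes every row already has a known pattern (real rows with missing bases would need extra handling).

```python
from itertools import groupby


def min_max(rows):
    """One (min_id, max_id) per pattern, like the Min/Max totals query."""
    result = {}
    for row_id, pattern in rows:
        low, high = result.get(pattern, (row_id, row_id))
        result[pattern] = (min(low, row_id), max(high, row_id))
    return result


def pattern_runs(rows):
    """One (pattern, start_id, stop_id) per contiguous run of a pattern.

    rows must be (id, pattern) pairs sorted by ID within one chromosome.
    """
    runs = []
    for pattern, group in groupby(rows, key=lambda row: row[1]):
        ids = [row[0] for row in group]
        runs.append((pattern, ids[0], ids[-1]))
    return runs


# Invented rows: AAAB is interrupted by ABBB in the middle.
rows = [(1, "AAAB"), (2, "AAAB"), (3, "ABBB"),
        (4, "ABBB"), (5, "AAAB"), (6, "AAAB")]
print(min_max(rows))       # AAAB looks like one stretch from 1 to 6
print(pattern_runs(rows))  # the scan shows the break at IDs 3-4
```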

Using this method, in a couple of minutes I have 91 Start and Stop locations for all the possible patterns – except for AAAA.

Here are the sorted results for Chromosome 1:

dadpatternchr1

Note that there are some overlaps that will need to be resolved. However, there are also clean breaks such as between ABBB and ABAB. ABBB stops at ID# 19797 and ABAB starts at 19837. Also note the last line. AABA has the same Min and Max ID#. This means that this is a single AABA pattern apparently within the AABB pattern.

Looking at the Table

In this step, I’ll look at tbl4SibsNewDadPattern and use the Access Pattern Mins and Maxes to get more accurate Start and Stop points. My spreadsheet above shows that ABAA starts at ID 52. I scroll up from there:

chr1tabledadpattern

At ID# 18 I see ?AG?. I can imagine that being an ABAA pattern, so why not start the ABAA Dad Pattern at ID# 1? Out of 680,000 ID’s, that doesn’t seem too much of a stretch.

Next it seems like the ABAA should stop somewhere before ID# 6605. I’ll hasten the process by a query that looks at the case where Sharon’s base from Dad is not equal to Heidi’s Base from Dad:

abaa-stop

Clearly, there is a break at ID# 5127, so I’ll use that.

chr1dadpatternstartstop

Here, I’ve added a finer Start and Stop for Dad Pattern ABAA. What that means is that in this segment of Chromosome 1, I got my DNA from one of my paternal grandparents as did Heidi and Jon. Sharon got her DNA from the opposite paternal grandparent.

Here is the Start/Stops filled in:

chr1dadpatternfilledin

I highlighted the 57205 as a reminder that I needed to add an extra ABAA pattern in later. There is a gap between ABAA and ABBB of 1477 ID’s where there is a likely AAAA pattern, which means the 4 siblings got their DNA from the same paternal grandparent.

Finished Start Stop Dad Pattern Spreadsheet

I took out the single patterns and re-sorted by pattern. Then I wrote a formula to get the locations in Access language:

dadpatternstartstop
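For anyone who would rather build the criteria text outside Excel, here is a Python sketch of the same idea. It assumes the criteria take the Between ... And ... form joined with Or, which is one way to write ID ranges in an Access criteria row; the exact text my spreadsheet formula produced is in the image above. The function name is hypothetical and the second range is invented.

```python
def access_criteria(ranges):
    """Join (start, stop) ID pairs into one Access-style criteria string."""
    return " Or ".join(f"Between {start} And {stop}" for start, stop in ranges)


# 1 to 5127 is the ABAA stretch from Chromosome 1; the second pair is made up.
print(access_criteria([(1, 5127), (60000, 61000)]))
```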

Next I made a copy of my working table in Access to a new table called tbl4SibsNewDadPatternFillin. I’ll use this to fill in the Dad Patterns.

Filling in the First AAAB Pattern

In this pattern, I will be filling in all the missing ‘A’s of the AAAB pattern. I won’t fill in the B as I won’t know if an ‘A’ or a ‘B’ belong there. Here is my first update query:

aaabupdate

This says if I am missing a base from dad in any of the AAAB Pattern areas that I am in and Sharon has that base, I’ll take the base she has. I can save a little time, by adding on to that query:

joelaaabfromsharonheidi

It is important to put the second ‘Is Not Null’ and ‘Is Null’ on a separate line as that is the ‘or’ line. Otherwise, I would only get the Sharon from Dad and Heidi from Dad bases where they equaled each other.

First I run the query to make sure it shows what I want.

aaabqueryex

It does [although, see below. For one thing I missed the ID criteria in the 2nd line of criteria!]. If I had the criteria all on one line, I wouldn’t have gotten the Heidi from Dad bases where Sharon is missing a base (ID# 63) and vice versa. I will want to check my query later, so I can check it at least two ways. One way is to check at ID# 63 and 99 to see if that base was added. The other way is to see if the Update Query updates 49094 lines as that is the number of lines in the above query.

When I went to run my query, I got this error:

udateaaaberror

Before I give up on this double query, I’ll try one more thing:

heidiorsharonaaabtojoel

Here I say if the conditions I mentioned above apply give either Heidi’s base from Dad or Sharon’s base from dad to me. I note that the update is for 49094 rows, so that seems on the right track. The reason why I don’t mind doing a double query here is that either Heidi’s base from Dad or Sharon’s should always be the same in an AAAB pattern.

I ran this and now I am checking ID# 63:

erroraaab

Unfortunately, Access gave me a -1 instead of Heidi’s C Base from dad. Part of why I wanted to do the one query is so I wouldn’t have to add the 2 queries. However, instead, I’ll just add a line to my base tracker:

basetrackernew

That means that I am back to my simpler query. Sharon should add 3975 bases from Dad to my bases from Dad:

3975row

Heidi was going to add over 2200 of her bases from Dad before Sharon gave me hers. Now it is a lower number:

heidibasestojoel

Now check Line 63:

line63

My base from Dad still isn’t filled in. But that is a good thing. When I checked my double query above, it gave me areas outside the AAAB Pattern area. ID# 63 is actually a different pattern. So that is why the number was so high also. The lesson learned is to keep the queries simple.
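The simple fill-in, and the reason ID# 63 should stay empty, can be sketched in Python. This is an illustration rather than the Access update query: the function name is hypothetical, the HeidiFromDad field name is assumed by analogy with JoelFromDad, and the region and rows are invented apart from ID# 63 carrying Heidi's C.

```python
def fill_from_sibling(rows, regions, target, donor):
    """Copy the donor's base into the target column where the target is empty.

    Only rows whose ID falls inside one of the (start, stop) regions are
    touched. Returns the number of rows updated, for the Base Tracker.
    """
    updated = 0
    for row in rows:
        in_region = any(start <= row["id"] <= stop for start, stop in regions)
        if in_region and row[target] is None and row[donor] is not None:
            row[target] = row[donor]
            updated += 1
    return updated


# Invented AAAB region of IDs 100-200; ID 63 sits outside it.
rows = [
    {"id": 63, "JoelFromDad": None, "HeidiFromDad": "C"},
    {"id": 150, "JoelFromDad": None, "HeidiFromDad": "G"},
    {"id": 160, "JoelFromDad": "T", "HeidiFromDad": "T"},
]
count = fill_from_sibling(rows, [(100, 200)], "JoelFromDad", "HeidiFromDad")
print(count, rows[0]["JoelFromDad"], rows[1]["JoelFromDad"])
```

Leaving the region test out of one branch is the kind of slip that inflates the row count, which is what the 49094 figure turned out to be.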

Now I’ve updated my Base Tracker for the AAAB Dad Pattern:

aaabbasetracker

Note that the Heidi from Dad Bases didn’t go up in the second round of this query. After she had gotten her extra Dad bases from me in the AAAB region, Sharon didn’t have any extra ones to give to her that I hadn’t already.

nodadbasestoheidifromsharon

AABA Fill-in

This time Heidi will be left out and Joel, Sharon and Jon will get new bases from dad based on others from the AABA areas. This is the same simple query as before, except that the ID#’s are different:

aabafillin

Here is Jon’s first bases from Dad from one of his siblings:

jonfromdad

This brings up an interesting point. There may be cases where Jon has a phased base at a location which his DNA test didn’t cover.

AABB Fill-in

Here there should be Bases for all siblings. Wherever there is an A and a missing A, add it, and the same for B. Again my first query is the same except for the ID#’s:

aabbfillin

On the AABB bases from Dad, Jon doesn’t have a lot to add to Heidi’s bases, but Heidi has a lot to add to Jon’s:

aabbbasetracker

ABAA Dad Pattern Fill-in

Here we start with Joel being updated with Heidi’s bases from Dad because Sharon is the lone B.

abaa

There are more rows updated as the ABAA Dad Patterns had more regions than the other patterns.

In my last update query, I made a mistake:

jonfromjoelmistake

I’m not sure if it makes a difference. I said that in the case where my base from Mom is not null, give Jon my Base from Dad where he doesn’t have any. To check, I run the correct query:

abaaquerymistake

This shows that there are still 2063 bases that didn’t get added to Jon from my bases from Dad. I will add them now. Plus I will add that number to the previous 29113 bases I added to Jon’s bases from Dad from my bases from Dad.

abaatracker

As there were 3 siblings the same in this pattern, I again took 2 rows to add the bases to the table.

ABAB Dad Pattern Fill-in

ababtracker

Jon now has more bases phased than he had tested on his paternal side. He already had more than he had tested on the maternal side.

ABBA and ABBB Dad Pattern Fill-ins

basetrackersummary

As expected, Jon made out best in this Pattern Phasing.

Mom Bases From Dad Bases

This is the part of the project that seems ironic. My dad who wasn’t tested for DNA is now supplying bases to his children that were from their mom. Here I’m looking for where the siblings are heterozygous. In those cases where there is now a Dad base from the patterns and a mom base is missing, we can fill it in.

First, I am making another copy of my table called tble4SibsNewMomPatternFillin.

Here is my first Mom from Dad Update Query:

joeldadfrommom

It says where I am heterozygous and my Dad base is my 2nd one, put my first base in as the base I got from Mom, but only if there isn’t already a Mom base there. The last part is just an extra precaution so that I don’t overwrite anything.

In the next query, I just reverse the Joelallele1 and 2 to get 12,000 more rows of phased DNA:

momfromdad2

Summary of Mom Bases from Dad Bases

trackermomfromdad

Check the numbers

I have been adding up the rows added. But now I will check my table to see if the Total Bases Phased added up. And the answer is:

countfromtbl

The numbers are pretty close. The above Heidi from Dad is higher than my tracker. I’m guessing the table sums are correct and mine are a little off. This means that Heidi’s paternal phasing should be a little lower.

Part 8 Summary

  • The use of MS Access Min and Max functions to get Dad Pattern starts and stops saved a lot of time
  • It still takes time to verify those starts and stops
  • The Base Tracker makes it easier to track the numbers and the process. It is also interesting to see how the % phased goes up with each round of updates
  • I wasn’t expecting the numbers from my base tracker and actual updated bases to reconcile perfectly, but most of the numbers did. It is possible the discrepancies are from the 2 minor errors I made and tried to correct along the way.

 

 

DNA Phasing of 4 Siblings and One Parent: Part 7 (Starting Over)

In my last blog, I found a few errors when I was checking some odd results. This led me to think that it would be better to start the phasing process from the beginning. The beginning means using 4 siblings’ raw data and my mom’s raw data. This time I will be more methodical and keep track of the results. I have a new spreadsheet called The Base Tracker. At every step that I take, it will keep track of the bases from each sibling as they are assigned to a parent.

A New Table

First I’ll create a new table from the raw data. I’ll start with my mom, me and my 2 sisters as they are all tested using Ancestry Version 1.

3sibtable

I called the table tbl3Sibs.

Next, I combined tbl3Sibs with Jon’s Ancestry V2 results into a new table called tble4SibsNew. I made sure I had a right connect on the arrow. That means that I wanted everything in the 3Sibs table plus what was in Jon’s information. If I had left it an equal join, I would have lost the bases that are in Version 1 but not Version 2 of the AncestryDNA results.

mergejonwsibs

It is important here to connect by rsid. I made the mistake of connecting by IDs last time. As the different AncestryDNA test results versions had different ID’s, this produced crazy results. I also used only Chromosome 1-22 as there are too many special cases for the X Chromosome.

tbl4sibsr1
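The join described above, keeping every Version 1 row and matching Jon by rsid, can be sketched in Python. This is an illustration only and not the Access query: the function name, the dictionary layout, and the rsid values are all invented for the example.

```python
def merge_by_rsid(sibs_v1, jon_v2):
    """Keep every V1 row and attach Jon's alleles where the rsid matches.

    sibs_v1 maps rsid -> dict of sibling alleles; jon_v2 maps rsid -> alleles.
    Rows Jon was not tested for get None; rsids only Jon has are dropped.
    """
    merged = []
    for rsid, row in sibs_v1.items():
        combined = dict(row)
        combined["rsid"] = rsid
        combined["Jon"] = jon_v2.get(rsid)
        merged.append(combined)
    return merged


# Made-up rsids and alleles, for illustration only.
sibs_v1 = {
    "rs0001": {"Joel": "AG", "Sharon": "AA"},
    "rs0002": {"Joel": "CC", "Sharon": "CT"},
}
jon_v2 = {"rs0001": "AG", "rs0099": "TT"}
for row in merge_by_rsid(sibs_v1, jon_v2):
    print(row)
```

Matching on rsid rather than a per-file ID matters because the same ID number can point at different SNPs in the two test versions.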

Then I used a count function to count the number of bases each sibling had. I also figured out how many blank lines there were out of the 682549 and subtracted those 8229 sibling blanks from the total to get 674,320. I’ll use that number to figure out the percent phased. This is the Count Query showing the Totals button in the Access Ribbon:

countrawbases

The results of this query were put in the RawBases Row below.
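The denominator arithmetic is simple enough to write down. The Python below is a sketch that assumes the percent phased is figured per sibling and per parental side against the 674,320 rows; the function name and the sample count are hypothetical.

```python
TOTAL_ROWS = 682549   # rows in the combined table
BLANK_ROWS = 8229     # rows where the siblings have no result
PHASEABLE_ROWS = TOTAL_ROWS - BLANK_ROWS


def percent_phased(bases_assigned):
    """Percent of phaseable rows that have a base assigned, to 1 decimal."""
    return round(100 * bases_assigned / PHASEABLE_ROWS, 1)


print(PHASEABLE_ROWS)
print(percent_phased(337160))  # a made-up count that is exactly half
```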

My New Base Tracker: % Phased

basetracker

The first column has the step taken. P1 is Principle 1. JoelFD is the Joel from Dad column, so all the Dad bases are on the left and mom bases are on the right. This table will give me the % phased for each sibling.

Principle 1 Query – Homozygous Siblings

This is one of the Principles from a Whit Athey Paper: where you have 2 bases the same, one is from each parent. The last time I did this, I may have had too much in a query at a time. This time, I’ll do the query separately for each sibling.

First, I opened up my tbl4SibsNew in design view and added more fields to put the new dad and mom bases.

newbasefields

First, I copied the table, so I’d have the raw data table with no additions. I called my new table tbl4SibsNewPrinciples. That is where the phased bases will go.

Here is a simple Principle 1 Update Query for me:

joelprinciple1

It says where I am homozygous, put both those bases in my JoelFromDad and JoelFromMom columns in the new tbl4SibsR1Principles.

joelprq
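For readers who do not use Access, the logic of the Principle 1 update can be sketched in Python. This is an illustration and not the query itself: the function name is hypothetical, and the row is a plain dictionary using field names in the style of Joelallele1 and JoelFromDad.

```python
def apply_principle_1(row, sibling):
    """If the sibling is homozygous, assign that base to both parents.

    Returns True when the row was updated, so the updates can be counted.
    """
    allele1 = row[f"{sibling}allele1"]
    allele2 = row[f"{sibling}allele2"]
    if allele1 is not None and allele1 == allele2:
        row[f"{sibling}FromDad"] = allele1
        row[f"{sibling}FromMom"] = allele1
        return True
    return False


# Made-up rows: the first is homozygous, the second is heterozygous.
rows = [
    {"Joelallele1": "G", "Joelallele2": "G", "JoelFromDad": None, "JoelFromMom": None},
    {"Joelallele1": "A", "Joelallele2": "C", "JoelFromDad": None, "JoelFromMom": None},
]
updated = sum(apply_principle_1(row, "Joel") for row in rows)
print(updated, rows[0]["JoelFromDad"], rows[0]["JoelFromMom"])
```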

That little query phased over 900,000 of my bases into Paternal and Maternal sides.

I was interested in seeing the effect of Jon’s testing using AncestryDNA V2:

jontracker

Jon has a ways to go to catch up on being phased. This is due to the differences in AncestryDNA V1 and V2.

Principle 2 – Homozygous Mom

Here if my mom has the same base twice, one of those has to go to her child. Here is a query to update my mom bases. As my dad’s DNA was not tested, he gets a non-applicable in that column.

joelpr2

Note that I have a criteria ‘Is Null’. This means only update this base if there is a blank there already. Here is the Principle 2: Homozygous Mom summary:

p2summary

Here I don’t know why my Principle 2 Bases were so low. I think it is because I made a mistake above, so I’ll do these steps over from the beginning.

Here I get more consistent results for my mom bases:

pr2joelfix

Here is the revised Principle 2 Summary:

prtrackerrev

Jon’s results also changed to be more realistic to where he was after Principle 1. I can also use the Access Count function to check these numbers:

countpr2

All the numbers match up except for JonsFromMom. For some reason, the spreadsheet is showing a higher number of Total Bases from Mom for Jon of 540956. If I subtract his Principle 1 bases from Mom from that total, I get 272250. I’ll put that in as his Principle 2 bases from mom and assume that I made a mistake in writing down Jon’s Principle 2 base from Mom number.

pr2summaryreconciled

I suppose it’s like reconciling my bank statement. I assume that these are Jon’s mom bases filling in where Jon didn’t have test results that lined up with the AncestryDNA V1 results for his mom and siblings.

Moving On To Principle 3: Heterozygous Siblings

This works when the child is heterozygous and has one base phased to one parent. Then the other base is phased to the other parent. It appears that this would have to work just from the mom side for now to fill in the dad side. That is because we haven’t filled in the ‘fromDad’ side with any Heterozygous sibling results yet.

pr3joel

This query says in the situation where I am heterozygous and I get my allele2 from mom, assign my allele1 to be from my dad. But only do that where there isn’t already a JoelFromDad base there.
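The same rule, including the ‘Is Null’ guard, can be sketched in Python. This is an illustration rather than the Access query: the function name is hypothetical, and it handles both allele orders in one pass where the queries do it in two parts.

```python
def apply_principle_3(row, sibling):
    """Heterozygous with a known Mom base: the other allele is from Dad.

    Only fills an empty FromDad field. Returns True when the row changed.
    """
    allele1 = row[f"{sibling}allele1"]
    allele2 = row[f"{sibling}allele2"]
    from_mom = row[f"{sibling}FromMom"]
    if allele1 is None or allele2 is None or allele1 == allele2:
        return False  # not heterozygous
    if row[f"{sibling}FromDad"] is not None:
        return False  # same role as the 'Is Null' criteria
    if from_mom == allele2:
        row[f"{sibling}FromDad"] = allele1
        return True
    if from_mom == allele1:
        row[f"{sibling}FromDad"] = allele2
        return True
    return False


# Made-up row: heterozygous A/G with G known to be from Mom.
row = {"Joelallele1": "A", "Joelallele2": "G", "JoelFromMom": "G", "JoelFromDad": None}
print(apply_principle_3(row, "Joel"), row["JoelFromDad"])
```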

However, this raises a question. Here is the same query without the ‘Is Null’ criteria:

pr3joellarge

As you can tell, I am beginning to doubt my work. The question is, if there has been no previous addition of Joel bases from dad based on my heterozygous results why is there a difference between the two queries?

I checked Sharon’s results and found that she didn’t have the same situation. Where she was heterozygous, she didn’t have any bases from dad assigned to her.

Here is a query showing my problem:

p3problem

It is not a problem for phasing, but more for what I will enter into my Base Tracker. Fortunately, I can do a Count Query:

countjoelfromdad

This shows that my JoelFromDad bases have gone up by 25589 somehow since I last tracked them. This means that I should use the larger number for my Base Tracker.

Here is the Principle 3 Summary in my Base Tracker:

p3summary

In a few hours, I’ve phased over 4 million bases. And that time includes making mistakes and fixing them. All siblings are phased at over 80% at this point except for Jon. His Paternal phasing is lagging at only one half.

I suppose that this is the time for me to say that it takes 20% of your time to get 80% of the result and 80% of your time to get the last 20% of your result.

Summary Part 7

  • After making mistakes, it feels good to start with a clean slate
  • Principles 1-3 of the Athey paper are easy to implement using MS Access
  • If a mistake is found, it is usually good to start from a clean table of data and fix it from there
  • The Patterns don’t lend themselves as well to Access and take more time to get
  • Having a Table to track the work and results is helpful and interesting.
  • In the next Blog (Part 8), I will be back looking at filling in the Patterns areas

Raw DNA Phasing of 4 Siblings Using One Parent’s DNA: Part 6

In my last Blog, I was still playing catchup in going from my original 3 sibling phasing, to incorporating my brother’s new DNA results.

Missing Principle 2 for Jon

Here is Principle 2 from the Whit Athey Phasing Paper I’ve been using:

Principle 2 – If data from one of the parents are available, and that parent is homozygous at a SNP location, then another almost trivial phasing is possible since obviously that parent had to send the only type of base s/he had at that location to the child

I checked this in MS Access. Here is the query:

homozygousmom

This says that if mom is homozygous, her allele1 is the same as her allele2. For those rows, if Jon has null values in his FromMom column, then I skipped this step.

homomomerror

Clearly, I did miss this step from position one. As I was doing my previous steps, I thought that Jon’s results were very sparse.

Principle 2 Fix

For this, I will again use the update query.

homomomfix

In this case, I didn’t bother writing ‘Is Null’ under the JonFromMom column. That is because even if there is something in there, I would just as soon overwrite it, as this is such a basic principle. I only missed 481,000 rows.
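As a rough illustration of what this update does, here is a small Python sketch. The function and argument names are made up for the example and are not my Access field names. Like the query, it overwrites whatever was recorded when mom is homozygous.

```python
def apply_principle_2(mom_allele1, mom_allele2, child_from_mom=None):
    """Return the child's base from mom under Principle 2.

    If mom is homozygous she can only have passed on that one base,
    so it replaces whatever was recorded before. Otherwise the
    existing value (possibly None) is kept unchanged.
    """
    if mom_allele1 is not None and mom_allele1 == mom_allele2:
        return mom_allele1
    return child_from_mom
```

Where mom is heterozygous, this principle says nothing, so the function simply hands back what was already there.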

Second Part of Fix

Now that I have mom’s bases, I will go back and fill in Jon’s dad bases based on his mom bases. Those are also Principle 2 fill-ins where Jon is heterozygous. I don’t mind doing these updates in Access as they are so easy.

dadsbasefrommomjon

This says in the case where Jon is heterozygous and his mom has allele1, put Jon’s allele2 in as Jon’s allele from Dad. In other words, if Jon has allele1 from his mom, then allele2 has to go to his dad.

jonallele1fordad

So that is an easy way to update over 7,000 rows in a few minutes.

Next, On to Mom Patterns

It’s a good thing that I added these mom bases to Jon, because now it is time to look at mom patterns. From Athey:

In the next step, we use the pattern on the mother’s side to fill in as many more cells as possible. Finally, we can project the information in those newly filled cells back to the father’s side using Principle 3 again.

This procedure will be the same one that I used for the Dad Patterns.

AAAB Mom Pattern

I might as well go in alphabetical order. In this pattern, Jon will not match the other siblings.

aaabmompattern

This works, but it doesn’t include the areas where there are missing mom bases. So I will use it to get rough ID’s. There were about 45 AAAB Mom Patterns that I found. Perhaps the rough ID’s will do.

AAAB Quality Control

My spreadsheet counts the numbers of ID’s between patterns.

aaabqc

619 is close to the cutoff that I had set. I went back to the original spreadsheet and found other AAAB patterns between the Stop and next Start. So I can combine those 2 AAAB Chromosome 15 patterns. I checked another pattern with about 700 ID’s from the Stop to the next AAAB Start. However, there was another pattern between, so those were a valid Stop and Start. There were about 45 AAAB Mom Patterns or about 2 per chromosome which seems like a lot.

ABAA Mom Pattern

The query should be similar to the previous one. If Sharon isn’t the same as her siblings, we will have an ABAA Pattern.

abaaqpattern

This pattern was easier to figure out. There were about 35 of them.

AABA Mom Pattern

This is the one I should have done second if I had wanted to stay in alphabetical order. I checked a few with differences of about 500 between Stop and next Start, but they looked OK. There were a few single allele patterns.

AABB Mom Pattern

I have 3 criteria for this one:

aabbmompatternquery

I had to enter that Sharon’s allele from mom could not be the same as Heidi’s allele from mom or I would get a lot of AAAA Patterns. When I looked for these, there appear to be 19 AABB maternal patterns.
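The three criteria amount to labeling each row by who matches whom. Here is a minimal Python sketch of that labeling, assuming the sibling order Joel, Sharon, Heidi, Jon. The function name is hypothetical, and this is a way of picturing the criteria, not what Access does internally.

```python
def sibling_pattern(bases):
    """Label four bases (Joel, Sharon, Heidi, Jon) as AAAA, AABB, etc.

    The first sibling is always 'A'; anyone with a different base is 'B'.
    Returns None if a base is missing or if more than two different
    bases appear, since no two-letter pattern can be read from that row.
    """
    if any(b is None for b in bases):
        return None
    if len(set(bases)) > 2:
        return None
    first = bases[0]
    return "".join("A" if b == first else "B" for b in bases)
```

With this labeling, a row where Sharon and Heidi share a base while Joel matches Sharon and Heidi matches Jon comes out as AAAA, which is why the third criterion is needed to get AABB.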

ABAB Mom Pattern

Again, this is a bit out of alphabetical order. This query is not unlike the previous one.

ababq

When I make Heidi’s mom base different from Sharon’s mom base, that gives me the ABAB pattern:

ababresults

Here I have Excel on the left where I am entering the results from the Mom Patterns that I found in Access.

ababworksheet

The jump in Chromosome 4 from position 6M to 37.9M indicates a change in pattern. That is entered in Excel on the left. The change from the previous pattern is shown as 7544 ID’s. ID’s should be the same as SNPs.

A change in Chromosome is an obvious Stop and Start:

ababex
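I picked out these Starts and Stops by eye, but the idea can be sketched in Python. This is only an illustration under assumptions: the input is a list of (ID, chromosome) pairs for the rows one pattern query returned, and the 500 ID cutoff is borrowed from the rule of thumb I use for gaps. The function name is made up.

```python
def find_pattern_runs(rows, max_gap=500):
    """Group matching rows into (chromosome, start_id, stop_id) runs.

    `rows` is a list of (id_number, chromosome) tuples, sorted by ID,
    for the rows that one pattern query returned. A new run begins
    when the chromosome changes or the ID jumps by more than `max_gap`.
    """
    runs = []
    for id_number, chromosome in rows:
        if runs:
            last_chrom, start_id, stop_id = runs[-1]
            if chromosome == last_chrom and id_number - stop_id <= max_gap:
                # Still inside the current run: move its Stop forward.
                runs[-1] = (last_chrom, start_id, id_number)
                continue
        runs.append((chromosome, id_number, id_number))
    return runs
```

A sketch like this would still need the hand checks described in this Blog, since a short gap can hide a different pattern in between.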

There were about 30 ABAB mom patterns for me and my 3 other siblings. I’ve done:

  • AAAB
  • AABA
  • AABB
  • ABAA
  • ABAB

ABBB Mom Pattern

It looks like this must be the last Mom Pattern. This is the mom pattern where I show my individualism – unlike my siblings who have the same mom base:

abbbq

Here’s an ABBB example:

abbbex

In this case on Chromosome 9, there is a jump from position 38M to 71M. However, the SNP (or ID) count between the two is only 190. That means this must be an area where the SNPs are not counted for some reason, so I would think that I could continue the Mom Pattern through that area. However, when I look at my Access table, I see this:

chr9ex

Above ID 370485 is a different pattern of AABB in the last four columns. This would have come out when I merged all my patterns and I would have had to fix it then. However, I might as well get this as good as I can now. As it is, there will be a discrepancy to work out:

chr9discrepancy

The AABB pattern started at 369193 which is before the ABBB Pattern stopped at 370295. This means I need to go back to the Table:

chr9problem

Here is position 370295 where I had the ABBB Pattern ending. However, this is a very small pattern, going only up to ID# 370290. Before that is the AABB pattern again. Here the AABB Pattern picks up again.

chr9aabb

Here is how I corrected my Chromosome 9 Mom AABB Pattern:

chr9aabbcorrected

However, note that I had to break my 500 ID/SNP rule. That 51 represents the tiny ABBB Pattern between two AABB Mom Patterns.

Here is the start of the AABB pattern at 369193:

morechr9issues

First, note that it would actually start at 369192. Before that is a single ABBB pattern. Then above that in the first row is an ABAA pattern. The first row is the end of an ABAA Pattern that I already recorded in my spreadsheet at ID# 369181, so that doesn’t need to change:

chr9abaa

At 369190 there is a single pattern of ABBB. This will be noted in my spreadsheet, but not entered as a start/stop position.

Re-Sort the Mom Patterns by Pattern

Now I have 426 lines of Mom Pattern Locations. I need to sort them by pattern and hope there are not many weird issues like I found in Chromosome 9. I will also take out the single patterns. When I do this, I get quite a mess. Here is Chromosome 1:

chr1mompatternsorted

Here we have quite a few nested patterns.

chr1fix

The first AABB pattern is a single, so I can take that out, but what do I do with the AABB Stop? It looks like that was a single also, so I can take that out.

chr1fix2

The AAGG is between a CTTT and an AGG? which would turn out to be an AGGG. What I had previously described was a single pattern going to another single pattern within a valid non-single pattern.

Next, starting at ID# 6608 I have three starts in a row which cannot be good. Looking at the first two patterns of ABAA and AABB, they look like they could be good.

fixchr1

I’ll add a ‘G’ where the cursor is above and call that the end to a very short ABAA Mom Pattern.

chr1fix3

Here is the corrected ABAA stop. I highlighted the next ABAA Stop in yellow as that will need work.

Next I’ll look at the ABBB Start at 19885. It looks like I missed the previous AABB Stop at 19884.

chr1fix4

At least that makes for a clean cut. I made a note of my correction:

chr1fix5

I also made note to look at the next AABB Stop (in yellow). Now there is a Start for an ABBB followed by a Stop for an AABB which looks fishy. Here is the area following ID# 19885:

chr1table

It seems that there are about 5 ABBB patterns followed by a single AABB Pattern, a single ABBB pattern and another AABB Pattern. As this looks confusing, let’s look at the full table for the single ABBB Pattern area at ID# 19905:

fullspreadsheet

Time for Quality Control

Are there any errors here? Principle 1 says that if a person is homozygous, then one base is from the dad and one is from the mom. I have CC and Jon has TT. My assignment is correct, but Jon is missing a T from his dad.

Let’s look at this Query:

jonqc

This looks for missing Dad bases for Jon that should be there where Jon is homozygous. It turns out he is missing about 1300 results:

jonqcresults

I ran this query to see if Jon was missing any mom bases and he wasn’t. I also ran this query for myself and saw that I was missing dad bases. I will have to re-run this update to the current table. This is not a problem as this is an easy thing to do in Access.
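The check itself is simple enough to sketch in Python. This is only an illustration of the idea behind the query, with hypothetical dictionary keys (id, allele1, allele2, from_mom, from_dad) standing in for my Access columns.

```python
def missing_homozygous_bases(rows):
    """Return the IDs where a homozygous call is not fully phased.

    Each row is a dict with hypothetical keys: id, allele1, allele2,
    from_mom, from_dad. Under Principle 1 a homozygous sibling must
    have that same base from both parents.
    """
    problems = []
    for row in rows:
        a1, a2 = row["allele1"], row["allele2"]
        if a1 is None or a1 != a2:
            continue  # no call or heterozygous: not a Principle 1 row
        if row["from_mom"] != a1 or row["from_dad"] != a1:
            problems.append(row["id"])
    return problems
```

Running the same check for each sibling and each parent would give a count of rows to repair before moving on to the patterns.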

Just Like Starting Over

Based on the errors that I’ve found, I will start from scratch in Part 7.

Raw DNA Phasing of 4 Siblings Using One Parent’s DNA: Part 5

In my past 4 Blogs on the topic, I have started the phasing of my siblings’ raw DNA using my mother’s raw DNA. I used Whit Athey’s applicable paper on the subject and MS Access, and have checked my results against M MacNeill’s similar analysis of my raw DNA. I started out using 3 siblings in the analysis. Part way through, my brother’s results came in, so now I am looking at 4 siblings.

I parted a bit with the Whit Athey analysis in that where he went to a visual analysis, I decided to look for the change points in the data. I then used those points to perform an Access query to update the various patterns found. When I left off my last Blog, I had just located Starts and Stops for the 7 different paternal patterns for the 4 siblings.

Sibling Patterns in an Excel Stacked Bar Chart

Today I was getting a headache trying to find a way to put the paternal patterns information into Excel in a Bar Chart. Here is the best I could do for the first 2 Chromosomes:

pattern-bar-chart

The spreadsheet data format is above. I chose Stacked Bar Chart. Then I had to transpose the row and columns. The slight glitch is that I had to create an extra duplicate pattern when that occurred to get the results in one bar per chromosome. I used the end point for each pattern. The bar assumes the start is at zero for each chromosome which isn’t totally accurate, but close enough, I suppose for a bar chart. The bar chart is meant to represent all the paternal changes in patterns for me and my 3 siblings.

When I check the change in patterns to the number of crossovers in the work of M MacNeill, it appears that I have missed a pattern change on each of the above crossovers. Hopefully, I will find them as I go through the process and re-check my work. I guess I’m batting 2 out of 3 now.

Finding the Two Missed Paternal Crossovers

It is possible that they aren’t missing at all. Perhaps in all the work I did to represent the information in an Excel Chart above, I misrepresented the work I had done. Here is my spreadsheet for Chromosome 1:

chr1startstop

Here is what M MacNeill has for Joel, Heidi and Sharon’s paternal Starts and Stops on Chromosome 1:

macneillstartstopchr1

It looks like MacNeill has 6 paternal starts and stops. I don’t count the last one as that goes to the end of the Chromosome. Again, I run into the conversion issue between my Build 37 and MacNeill’s Build 36 work. Here is what happens when I put the approximate crossover locations side by side:

macneill-checkchr1

This shows that we both have 6 crossovers, which is good. It gets a bit confusing. Note that I had to add a crossover at my position 23289397. That is because there was a gap. That is the gap where the 4 siblings must match the same paternal grandparent. Normally, there shouldn’t be any gap between the Stop from one pattern and the Start of the next one. So it turns out I’m doing better than I thought. That is encouraging to know. For the last pattern, I don’t have an entry, because the crossover is at the same spot as the Stop of the previous pattern.

However, I am comparing my 4 sibling work to MacNeill’s 3 sibling. Also MacNeill had a start of 742429 and mine was higher. That means that there must be a pattern between 742,429 and 1,062,638. I checked, and there aren’t many extra locations there. I suppose I did as well as I could do. I do wonder where Jon’s Chromosome 1 crossovers are, though. Perhaps he has a double crossover with another sibling or one that is very close in location to another sibling.

Gedmatch Check

Here is how my 3 tested siblings and I look compared to each other at Gedmatch in the browser:

chr1-4sibs

The lines don’t match up perfectly, but I have 3 crossovers for Sharon in a row. I am J, my brother Jon is F and only has one crossover of his own. These are the combined paternal/maternal crossovers. When I map it out using a visual method, it appears that Jon may have no recombination in Chromosome 1.

jonvisphasechr1

If that is true, it would make him a good candidate for finding Frazer or Lentz relatives at Chromosome 1.

Assigning Paternal Crossovers for Chromosome 1

Assigning crossovers is getting a little ahead of myself, but I would like to see if I am on the right track.

assignxoverchr1

Here the Dad4Pattern represents Joel, Sharon, Heidi and Jon. There appears to be a logic to assigning these crossovers that I have in the XSib column. The first crossover I have going to Sharon. That is between the first Stop and the second Start. Sharon’s B in ABAA goes to an A in the AAAA homozygous region. That means all siblings match the same paternal grandparent in this region. The next crossover goes to me as I’m represented by the A in the ABBB pattern. The 3 other siblings remain matched to each other. Then Heidi gets the next crossover as she goes from matching Sharon and Jon to matching me. The next to the last crossover, I had as Jon. But it has to be Heidi as she went from matching me and Sharon to matching Jon. If Jon was the one that changed to matching Heidi, the pattern would have gone to AAAA. Likewise, the last crossover I had as Heidi, but it has to be Joel. I went from matching Sharon to matching Heidi and Jon.
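The logic above can be sketched in Python as a small check. It assumes the sibling order Joel, Sharon, Heidi, Jon and that exactly one sibling switches grandparents at each pattern change. The function names are made up for this illustration.

```python
SIBLINGS = ("Joel", "Sharon", "Heidi", "Jon")

def _canonical(bits):
    """Relabel so the first sibling is always 0 (the 'A' of the pattern)."""
    return tuple(b ^ bits[0] for b in bits)

def crossover_candidates(before, after, siblings=SIBLINGS):
    """List siblings whose single switch turns `before` into `after`.

    Patterns are strings such as 'ABAA'. Each letter only says who
    matches whom, so a pattern and its mirror image are the same thing.
    """
    start = tuple(0 if c == "A" else 1 for c in before)
    target = _canonical(tuple(0 if c == "A" else 1 for c in after))
    found = []
    for i, name in enumerate(siblings):
        flipped = list(start)
        flipped[i] ^= 1  # this sibling switches grandparents
        if _canonical(tuple(flipped)) == target:
            found.append(name)
    return found
```

For example, `crossover_candidates("AAAB", "AABB")` gives Heidi rather than Jon, which is the correction I made above. An empty result would mean that more than one sibling changed between the two patterns, or that a pattern was missed.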

There are a few cross checks to the method. One is to check to see what MacNeill has done. Another way is to check known matches. I noticed above that Jon had matches to known relatives on my paternal grandmother’s side on either side of the wrong crossover that I had assigned to him, so that was likely not a good crossover. Another note is that there is at least one other homozygous region. That is between the 191M stop and 192M start above. That means that there should be an AAAA pattern stuck in there, but it is not necessary to know at this stage.

Time to Push the Button: Back to Phasing by MS Access

A lot of the above work was to make sure that I had the right number of crossovers in the right places. I was worried that if I didn’t, that I wouldn’t be applying the right rules to the right areas of the spreadsheets.

First AAAB

Here are my AAAB paternal Patterns with start and stop in Access language:

aaabstartstop

Here are some examples of fixes that are needed within these AAAB areas:

aaabexample

Basically, if there is a blank in the first 3 positions, it should be filled in by the non-blank in that area. But how do I write that into a formula? Here is one way:

update

This says if Joel’s Dad base or Heidi’s Dad base is null, put Sharon’s value in. I ran that and it updated 2165 rows.

Next:

aaabfillin

 

This time only about 1400 rows were updated. The last time we fill in Heidi’s value if Joel or Sharon had a missing Dad base.

407rows

I’ll check my work. I see a flaw in my logic already. I shouldn’t have put the 2 ‘Is not null’s’ on the same line, as Access sees that as an AND. I wanted an ‘OR’, so they should’ve been on separate lines. Here is the revised query putting Heidi’s Dad base in the empty spot of Sharon and Joel.

aaabrev

Note that I had to put the position criteria for the Paternal Patterns in twice also. See, I had missed 4121 rows. I went through this with the 2 other siblings.

AABA Paternal Pattern

aabappatternstartstop

The AABA also has potential for filling in.

aabaexample

In the first line there is ?T??. In an AABA pattern, we know that the first and last position will also need to be T. In the second line, we don’t know what to fill in. In the 3rd line we can put a C in the last column.

My ID locations for AABA look like this:

idsaaba

The queries will be similar to last time except that they will involve Joel, Sharon and Jon and leave out Heidi.

aabaqueryfillin

This was a more popular fill-in. In the query above, if I had a Dad Base and Sharon and Jon didn’t, it went to Sharon or Jon. I then did the same thing for Sharon and Jon. Here I check my results.

checkaaba

These are the same ID Lines shown as above before I did the query. This now shows that Joel, Sharon and Jon have the same bases for this AABA Pattern. This is even true when we don’t know the base Heidi has on her paternal side as in the first row.

AABB Pattern

aabbpattern

Here is what an AABB Pattern area looks like before I fill in the bases:

aabbpatternarea

The rule is if Joel or Sharon or Heidi or Jon have one base and the other is missing, fill in the missing one with the one that is there. However, as in the second row, where Heidi and Jon are both missing, nothing may be filled in. This will take a little thought. Perhaps I can do this in 2 steps:

aabbfillinquery

This says if I have a base AND Sharon doesn’t, give her my base, then also do the same and fill in Heidi’s base to Jon’s missing base from dad. This query filled in a little less than 20,000 bases with the push of a Run button. Then I’ll do the opposite:

aabbquery2

This time Sharon’s base goes to me and Jon’s goes to Heidi. I’ll check good old ID 45494.

aabbcheck

It looks like I filled in what I wanted to and didn’t fill in what should not have been filled in.

The other combinations will be variations on what has just been done. Either 3 will match each other and one won’t or there will be 2 pairs that match each other within the pair.
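Since every pattern follows the same rule, the fill-in can be written once for any pattern. Here is a minimal Python sketch of that idea. It is an illustration, not the Access queries I ran, and the function name is hypothetical. It leaves a group alone when the known bases within it disagree, so those rows can be checked by hand.

```python
def fill_from_pattern(bases, pattern):
    """Fill missing dad bases using the pattern for this region.

    `bases` holds one base per sibling (None if missing) and `pattern`
    is a string such as 'AABB'. Siblings sharing a letter share a base,
    so a blank is filled from any same-letter sibling who has one.
    A group is left alone if it is all blank or if its known bases
    disagree.
    """
    filled = list(bases)
    for letter in set(pattern):
        members = [i for i, c in enumerate(pattern) if c == letter]
        known = {bases[i] for i in members if bases[i] is not None}
        if len(known) != 1:
            continue  # nothing to copy, or a conflict to check by hand
        (base,) = known
        for i in members:
            if filled[i] is None:
                filled[i] = base
    return filled
```

With the ?T?? row from the AABA example, `fill_from_pattern([None, "T", None, None], "AABA")` fills the first and last positions with T and leaves Heidi’s position blank.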

ABAA Fill-in

This is the first pattern of my siblings’ 1st Chromosome.

abaaquery

Another ho-hum 20,000 rows filled in.

Here Heidi fills in Joel and Jon:

abaa2

The updated rows go down the 3rd time I run this.

abaa3

ABAB Dad Pattern

This will be where Joel and Heidi match paternally and Sharon and Jon match.

ababquery

Jon is probably missing a lot of bases due to being tested with AncestryDNA Version 2.

ababq2

ABBA

abbaq1

This query says in the ID areas where there is an ABBA pattern put Jon’s dad base into Joel’s missing dad base area and put Heidi’s dad base into Sharon’s missing dad base area in the table called tbl4SibsPPatternFillin.

joelsharonabba

Here, I made a mistake. Note that I had Access overwrite a bracket “]” that didn’t get erased. That means that I will have to run this query again to get my bases back from Jon. Here is what the above Update Query did.

mistake

Fortunately, Jon still has the bases that I gave him. I’ll redo the query to get my bases back.

fixqurey

This query will fix my error. It says if I have an end bracket as a base, fill it up with what Jon has.

fixresults

ABBB – The Last Paternal Combo

This time I won’t touch my bases, but make sure that Sharon, Heidi and Jon match.

abbbq1

heidiabbb

jonabbb

So that should have filled in all the paternal patterns.

Finding the AAAA ‘Patterns’

This should be a little trickier. Previously, we had identified one AAAA pattern in Chromosome 1. This can be seen between 19 and 23 below. All the paternal areas are orange.

jonvisphasechr1

There is no other area on this Chromosome that is all orange or all green for all siblings. However, how do I identify all the other quadruple A patterns? It is not as easy as the other patterns because this pattern may occur within other patterns. I could make a chromosome map for each chromosome as above, however, it becomes a chicken and egg problem. It would be nice to know the AAAA areas so I could draw the map.

Here is a spreadsheet where I checked the number of IDs from the Start to the previous Stop.

startminusstop

When the amount was more than 500 IDs, I highlighted that number in yellow. Above between the Stop of ABAA and the Start of ABBB on Chromosome 1, there was an AAAA pattern for 1478 position numbers.

The next yellow area is in Chromosome 2 which is a larger region of AAAA pattern.

Here is an interesting situation:

chr6and7

This yellow area is above the amount I chose as a minimum of 500 positions. However, as I look at my worksheet, I see that the ABBA pattern extends beyond ID# 285124. So I will do a new query based on the new fill-in table. Here is the new ABBA:

newabba

newabbaresults

This shows that the ABBA pattern goes to the end of Chromosome 6. I can fill in the extra letters by hand and adjust my spreadsheet.

However, what about Chromosome 7?

Chromosome 7 appears to have an AAAA pattern for about 847 ID#’s. This is how MacNeill mapped my Chromosome 7.

chr7macneill

He would have the ABBB Paternal Pattern with me being the A. This is how I had visually mapped Chromosome 7:

chr7joel

These end pieces are difficult where there is a half identical region. I will stick with mine as I do notice a small match with my Hartley-related 2nd cousin Pat:

chr7joelpat

This may become more clear once my brother Jon is mapped out. In fact Jon is Fully Identical with me in that region:

jonjoelchr7

Jon also matches cousin Pat in that same spot:

jonjoelchr7pat

Ergo, I must match Pat aka Hartley DNA at the first part of Chromosome 7.

Here is Jon mapped out on Chromosome 7:

chr7vismapjon

Jon (F) and Heidi (H) got a full dose of Hartley DNA at Chromosome 7.

That was a bit of a long exercise, but the intention was to prove to myself that an AAAA pattern of over 500 positions (or my ID#s) is a valid AAAA Pattern.

Filling in the AAAA’s

As I have now convinced myself that this small area was indeed an AAAA area, I can proceed. I made a formula in Excel that takes the other Patterns’ Stops and Starts and puts them into Access language.

aaaagaps

The formula adds an ID# to the beginning and subtracts one from the end so the AAAA patterns have their own range.
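The same gap arithmetic can be sketched in Python. This is an illustration, not my Excel formula: the (chromosome, start, stop) layout and the function name are assumptions, and the 500 ID minimum is the cutoff I settled on above.

```python
def aaaa_ranges(patterns, min_gap=500):
    """Find gaps between patterns that are long enough to be AAAA.

    `patterns` is a list of (chromosome, start_id, stop_id) tuples
    sorted by start ID. A gap on the same chromosome of more than
    `min_gap` IDs is returned as its own (chromosome, start, stop)
    range, one ID past the previous Stop and one short of the next Start.
    """
    gaps = []
    for (chrom_a, _, stop_a), (chrom_b, start_b, _) in zip(patterns, patterns[1:]):
        if chrom_a != chrom_b:
            continue  # a chromosome change is not a gap
        if start_b - stop_a > min_gap:
            gaps.append((chrom_a, stop_a + 1, start_b - 1))
    return gaps
```

The ID numbers in any test of this would be made up. The real boundaries still need the inspection described next, since a Stop or Start that is off by one shifts the AAAA range with it.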

 

Inspecting my work

Having found a pattern boundary that was off at the end of Chromosome 6, I will check the other boundaries. According to my spreadsheet, the first AAAA should end at 6604.

aaaaspreadsheetchrq1

The actual Access Data table is different by one:

chr1correction

That means that I need to add an ‘A’ to the missing space and change the start of the ABBB Pattern from 6605 to 6604 – a pretty minor change. I made a few more minor changes. However, I’ll hold off on making the AAAA pattern changes for now. That is in case the boundary changes again due to other changes I’ll be making.

Filling In Mom Bases From Dad Bases

This is about how far I got last time when I was trying to phase 3 siblings. My interpretation of this portion of the process is to look at the heterozygous siblings. Where they have a new base on the Dad side, they will know that the other base goes on the Mom side.

Finding heterozygous siblings

First I made a new table to put the new information in. It is just a copy of my last table of the fill-ins based on patterns. Here is a query just to find the alleles for each sibling that are different from each other:

heterozgous4sibs

Here is the Update Query. I better get it right as it is doing a lot of things:

updatemomfromdad4sibs

The first part has the criteria that makes a person heterozygous. I forgot to make sure that the mom base was missing, so I need to add an ‘is null’ phrase:

4sibheteromomfromdadrev

This may not be necessary, but just makes sure I am not overwriting anything that is already there. So when mom’s base is missing add the base that isn’t the dad base. Or more specifically, add allele2. This changes 39,260 rows.

Next to get the opposite effect, I change most of the alleles 1’s to 2’s and the 2’s to 1’s.

otherallelemomfromdad4sib

That changed over 10% of all the results. To check, here is a query from the older un-updated table showing just my results where I’m heterozygous and my allele1 was from Dad:

qryoldtable

Here is the updated table.

updatedtablemomfromdad

The G, C, C, G was added as a base from my mom – along with 10’s of thousands of other bases.

Summary

In overview:

  • Principle 1: I’ve added the homozygous sibling results. This says a double base for a sibling means that they got the same base from each parent.
  • Principle 2: I forgot to add the homozygous mom results to Jon. I’ll do that in the next Blog
  • Principle 3: This is for heterozygous siblings. When one base is known for a parent and the other parent base is missing, the other base is assigned to the other parent
  • Next I looked at the paternal patterns and made note of where they changed
  • For each paternal pattern region I filled in the bases that could be filled in based on that pattern
  • Then based on that new information, I filled in more missing mom bases from the dad bases in areas where the children were heterozygous. This is Principle 3 reapplied.

 

Using Triangulation Groups to Map My Wife’s Chromosomes

I would like to update the Chromosome Map I have for my wife. The one I have now looks like this:

marie-cmap-old

This map is based on programming by Kitty Munson Cooper. It doesn’t look too bad. It only has 3 colors: 2 blue colors for her dad’s side and one color for her mom’s side. The red is based on the results from her 1/2 great Aunt. The blue is based on paternal grandmother cousins.

Here is Marie’s family of DNA tested relatives:

marie-relationship

From bottom left to right we have the following that have had their DNA tested:

  • Fred, Fred’s sister
  • Pat, Buddy
  • 1st cousin John
  • 2 Paternal Aunts
  • Dad and Mom
  • Aunt Esther
  • In addition I have results from a Dicks DNA study

The Rule of 1st Cousin, 2nd Cousin Combo

In my previous blog, looking at my mother’s side DNA, I came up with a rule. That rule said:

In a triangulation group between a person’s 1st cousin and a second cousin, the second cousin will be able to identify which grandparent the 1st cousins share.

I would like to apply this rule to my wife Marie as she has 1 first cousin and 2 aunts who have tested their DNA. These 3 are like cousins as the common ancestor of grandparent are the same. Marie also has 2 first cousins once removed tested. These would be similar to 2nd cousins as they both have great grandparents in common.

mariepaternalrelationships

Basically, right now if Marie compares herself to John or her 2 Aunts Lorraine and Virginia, she doesn’t know if the shared DNA is from Estelle LeFevre or Edward Butler. However, a triangulation group (TG) with Fred, Fred’s sister, Pat or Buddy and John, Lorraine or Virginia, will show that DNA to be from Estelle LeFevre. Further, not just the match in common to the TG will be from Estelle, but the entire segment represented by Marie’s match to John or her 2 Aunts will be from Estelle.
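One small piece of this can be sketched in Python: finding the stretch of a chromosome that every pairwise match in a would-be TG has in common. This is only an illustration with a made-up function name. It assumes the segments are all on the same chromosome and says nothing about cM thresholds, which still have to be checked at Gedmatch.

```python
def common_overlap(segments):
    """Return the (start, end) shared by every segment, or None.

    `segments` is a list of (start, end) positions on one chromosome,
    one per pairwise match in the would-be triangulation group.
    """
    start = max(s for s, _ in segments)
    end = min(e for _, e in segments)
    return (start, end) if start < end else None
```

If the three pairwise matches (Marie to the aunt, Marie to the cousin, and the aunt to the cousin) share an overlap, that is the TG. Under my rule, Marie’s whole segment with the aunt would then be assigned to Estelle, not just the overlapping part.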

That’s My Theory, Let’s Try It Out

I have a boatload of combinations to try this theory out on. First, I’ll go with Fred, Fred’s sister, John, Marie and her 2 aunts. First I go to Marie’s one to many menu at Gedmatch and I choose Marie’s relatives I just mentioned. Then I choose the Matching Segment CSV. This downloads a file of all the matches between these 4 people, making it easy to find TGs. I could have used the Chromosome Browser but that only hints at TGs. However, I will use the Chromosome Browser to focus my search.

Chromosome 14 example

chr20ex

The browser shows Marie’s matches to:

  1. Aunt Lorraine
  2. Cousin Pat
  3. Cousin Buddy
  4. Aunt Virginia

Here is how I have the Triangulation Group (TG) between these 5 mapped out:

patbuddytg

This shows a Triangulation Group (TG) between Pat, Buddy, Aunt Lorraine, Aunt Virginia and Marie.

Now a few observations:

  • The chromosome browser view above is from Marie’s point of view
  • Marie’s matches with Pat or Buddy (#2 and #3 on the browser) represent the DNA they share from either Martin LeFevre or Emma Pouliot. It is also likely that one segment is shared from each of Marie’s great grandparents.
  • These segments are represented in the Kitty Munson Cooper Chromosome Map at the top of this Blog.
  • The long segment shared between Marie and her Aunt Lorraine is from one of Marie’s grandparents. Because Pat and Buddy also match Aunt Lorraine, we may say for sure that the segment Aunt Lorraine shares with Marie must have come from Aunt Lorraine’s mother Estelle LeFevre.
  • Marie’s DNA she got from her grandmother Estelle is shown below.

munsonmaprev

The previous map had 2 blue segments on Chromosome 20 representing either of Marie’s paternal grandmother’s parents. We didn’t know which. Now it shows the one large segment taking up all of Chromosome 20 from her known paternal grandmother. The green should say Estelle LeFevre b. 1904 – not Emma Pouliot b. 1894.

Chromosome 15

On Chromosome 15 here are the same people, but in the following order: Aunts Lorraine and Virginia, Pat and Buddy.

mariechr15

kittymarie15

Interestingly, this time the program doesn’t overwrite the light blue. This is because the match for the light blue extends further than the match for the green. When I mouse-over the original map, it shows that the light blue match starts at about position 34 while the green match starts at about 35. Because of this, the entire blue match shows until its end and then the green match is shown.

This blue, light blue, green progression represents 3 generations of Marie’s ancestors on her paternal grandmother’s side.

Paternal Grandmother Results Using 1st Cousins, Once Removed

Here are the results of comparing Marie’s cousin and two aunts to her two 1st cousins, once removed. Here I correctly have Estelle LeFevre  b.1905 labeled for the green areas.

mariepatbuddychromomap

 

I didn’t bother doing the comparison for Marie’s X Chromosome. The reason is this. The X Chromosome that her dad gave to her, he got from his mom. That means that the green must extend for the whole X Chromosome. For that matter, the light blue would also be Marie’s paternal grandmother’s parents.

How to Identify Emma Pouliot?

That seemed to work well for Estelle, but is it possible to go back one generation further and identify one of Marie’s great grandparents by DNA? I think so. Let’s take a look. This time, I don’t want to look at Marie’s 1st cousin John or her 2 aunts. The reason for that is that when I compare Marie to those 3 people, the common ancestor would be Marie’s grandparents. I want to compare Marie to her 2 first cousins, once removed to find her great grandparents – or in this case her paternal grandmother’s mother Emma Pouliot b. 1874 in the Province of Quebec.

tgchr1

We are using the same principle as before, but going one rung up the ladder. I will look for a Triangulation Group (TG) between Fred, Fred’s sister, Pat, Buddy and Marie. Once I find that TG, I will take the DNA match between Pat or Buddy and Marie and assign that DNA match to Emma Pouliot.
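
The TG check described here can be sketched in code. This is only a minimal sketch, assuming each pairwise match on one chromosome is stored as a (start, end) range. The names, the positions, and the `triangulates` helper are all hypothetical, and the cM and SNP thresholds that gedmatch applies are not modeled.

```python
from itertools import combinations

def overlap(a, b):
    """Return the overlapping (start, end) of two ranges, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

def triangulates(people, matches):
    """Return the stretch on which every pair in `people` matches, or None.

    `matches` maps frozenset({name1, name2}) -> (start, end) on one chromosome.
    """
    common = None
    for p, q in combinations(people, 2):
        seg = matches.get(frozenset((p, q)))
        if seg is None:
            return None  # one pair does not match at all: no TG
        common = seg if common is None else overlap(common, seg)
        if common is None:
            return None  # the matches exist but do not line up
    return common

# Hypothetical positions, for illustration only (not taken from the spreadsheet).
matches = {
    frozenset(("Marie", "Pat")): (100, 120),
    frozenset(("Marie", "Fred")): (95, 130),
    frozenset(("Pat", "Fred")): (90, 115),
}
tg = triangulates(["Marie", "Pat", "Fred"], matches)
print(tg)  # (100, 115)
```

If any one pair has no match in the area, the function returns None, which mirrors the rule that Marie has to be in the TG before a segment is assigned.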

Chromosome 1

Let’s try this out on Chromosome 1:

pouliotlefevrechr1

  1. Fred’s sister (2C,1R)
  2. Fred (2C,1R)
  3. Pat (1C, 1R)
  4. Buddy (1C, 1R)

It looks like there should be an overlap between #1 and #3, but they have no match there in the middle of the Chromosome. However, on the right side, there is a match between #1 and #3. Using my plan, I’ll assign Emma Pouliot to the green segment. In this case, #1 and #2 representing the parents of Emma Pouliot are larger. It would stand to reason that these would belong to Emma also. However, for consistency, I will just map Emma to the green segment.

When I tried to map this using the Kitty Chromosome Mapper, it didn’t show up as Estelle had already filled up that slot.

Chromosome 2

chr2buddyfred

This time the two 1C’s, 1R are on the top and the smaller segments representing Marie’s two 2C’s, 1R are on the bottom. Is there a TG? I lowered the gedmatch thresholds, which I didn’t do for the first part of this Blog. Here is the match between the 1C, 1R and the 2C, 1R:

gedmatchchr23

They match on Chromosome 2, but a little below the 7 cM threshold. I’m not worried as I’ve read that in a TG a match is likely to be good down to 5 cM. That means that I will map Pat’s #1 green segment to Emma.

Unfortunately Estelle is taking up the space where Emma would be mapped on Kitty’s Mapper. This seems to be a trend.

Chromosome 13

I did the same exercise as above and mapped with no results. This time I took out the other references in the area of Chromosome 13 that were blocking Emma and got this:

emmachr13

Now we see Emma’s DNA in lighter green on Chromosome 13. The downside was that I took out some of Estelle’s DNA to the right of the light green area so Emma’s DNA match with Marie would show. Hey, I created this map; I can do what I want with it.

So that is what I found. My wife can lay claim to a lot of her grandmother’s DNA, but only 3 identified segments of her great grandmother’s DNA based on this procedure. Of course, one may say that every instance of finding the parents of Emma would be the same as Emma. Based on that idea, I’ll try another map.

emmaestelle-map

This map isn’t really any better. It is just meant to show that whether you have the parents or the child, it fills up the same area on the map. Note I have the same problem here where Estelle fills up the older Emma DNA on Chromosomes 1, 2, and 13.

Marie’s Dicks DNA

The idea for this section should be more straightforward. I have been involved with a Newfoundland Dicks DNA project. There are many people who have tested their DNA and have been found through triangulation to be likely related to the Newfoundland Dicks family. For example, here is a list of the Dicks Triangulation Groups (TGs):

Dicks TG Summary

These include the Dicks TGs except for the most recent few. Joan is near the middle of the chart. She is my wife’s mother. All I have to do is see if Marie is in any of the same TGs that her mother is in. Then I can take the match with the other 2 from the TG and assign that DNA to the appropriate Dicks ancestors.

Here is what was added (in yellow):

mariechromomap

All that was added was a probable Dicks segment on Chromosome 2. There were other Dicks segments but they were “behind” Upshall matches. That means that they are the ancestor of Frederick Upshall. The reason that the Chromosome 2 match stood out was that it was a match with Joan (Marie’s mom) and not with Marie’s great Aunt Esther (represented in red above).

Check Your Work

Fortunately, M MacNeill [prairielad_genealogy@hotmail.com] has looked at my wife’s family’s Chromosome 1. He has looked at the raw DNA which is more under the hood than what I am doing. Here is a small portion of his work. He phased Marie’s father and 2 aunts and then went back and put that information into Marie’s DNA.

macneillchr1marie

The interesting thing about MacNeill’s map is that it includes the DNA for Marie’s 4 paternal great grandparents. The cross-hatched area is where it was not possible to determine the crossover point. At any rate, MacNeill points out some errors in my Chromosome mapping for Marie. He has sections of salmon or pink indicating Richard’s paternal grandparents where I have Marie mapped to Richard’s maternal side.

This is when I go back to my spreadsheet for the details:

mariechr1notg

In the first part of Chromosome 1, it is clear that Marie does not match Pat, Buddy, Fred, or Fred’s sister, so I cannot call that a TG or a Paternal grandmother match for Marie. My original rule said that Marie had to be in a TG for my segment extending plan to work.

Here is where I removed 2 paternal grandmother segments on Chromosome 1:

mariechr1rev

However, on the right of Chromosome 1,  MacNeill has more paternal grandfather DNA mapped where I again have paternal grandmother. In my defense, this was an area where, according to MacNeill, Fred and Fred’s sister appear to match on both the paternal grandmother and grandfather side. I couldn’t have known that as I only had information for the paternal grandmother side.

One Other Point on Emma Pouliot

emmaphoto

Above, I had mapped Emma Pouliot to Marie on Chromosome 1:

emmamappedsegments

Here is a larger view of what MacNeill had for Marie’s family’s Chromosome 1:

richard-chr1

The legend on the top line is difficult to read, but Pouliot is the darker red. More specifically, that would be Emma Pouliot. Marie is on the bottom line. The last vertical white line in Marie’s dark red area represents position 198. As I had mapped Emma from 197 to 207, that would put her in the end of the dark red area of Richard’s Pouliot maternal grandmother, before Marie’s DNA switches to the DNA she got from her dad’s paternal side in the salmon color. So at least my work agrees with MacNeill in this little area.

Summary and Conclusion

  • Most of the additional segments came by phasing the unknown grandparent using the DNA shared with the 1st cousins, once removed
  • This method could work well along with the visual chromosome mapping that Kathy Johnston developed.
  • There is a fine distinction with mapping the DNA of one’s known grandparent and mapping the DNA of the parents of that known grandparent. When mapping to the parents, the individual segments could be from either parent. When mapping to the known grandparent, that larger segment could contain compound segments of the parents. It is a subtle distinction, but one that should be maintained in my opinion for future research.
  • Using the Kitty Mapping tool is fun and instructive as to how DNA works. It can be manipulated to show what one would like to be shown. For example, when I wanted to highlight the Emma Pouliot segment, I was able to do that.
  • Even with no paternal and maternal grandfather DNA matches for Marie, I have been able to fill out her Chromosome map quite a bit – mostly on her paternal grandmother side.

More Lentz/Nicholson DNA and the 1st Cousin, 2nd Cousin Combo Rule

A little over a year ago I decided to test my autosomal DNA at 23andme. I had tried the other 2 testing companies and was curious as to what 23andme was like. Perhaps I would have some more matches that I didn’t know about. The most interesting match that I found was my mother’s 1st cousin once removed. Her name is Judy. I asked her if she would upload her results to gedmatch.com for analysis. She tried a few times without success. Recently, she went back and successfully uploaded her results, so now I can write about them.

Lentz/Nicholson Lines

Judy descends from our common Lentz/Nicholson Line. Others that I have been in touch with and have tested for DNA are just from the Nicholson Line. The Nicholson Line is in red. The Lentz line is in yellow. The Lentz/Nicholson Line is in orange. From my early school days, I recall that if you mix yellow and red, you get orange. Judy and Joshua are on the orange line. My mom shows as green, but for the purposes of this Blog, can be considered orange also.

lentznicholschart

The bottom row indicates people that have had their DNA tested. There is also a further out line of Nicholsons that I don’t have included here.

I haven’t identified anyone yet who is only from the Lentz Line.

Here is Judy’s match with my mom at Gedmatch:

judymomgedmatch

comparing cM’s for first cousin once removed

Their total match of 269 cM is actually on the low side for 1C, 1R. Here is a Bettinger study showing the average DNA shared between 1st cousins, once removed as being in the 400 cM range:

bettinger1c1r

Not to be outdone by Blaine Bettinger, I also looked at some of my own family relationships to see how they compare:

joelcmstudy

So with just 8 people, I came to the same conclusion on the average amount of DNA that 1st cousins, once removed shared. Blaine took thousands of people to come to his result. Another side point of interest is that my brother Jon shares over 150 cM more with my dad’s first cousin (583.7 cM) compared to what my sister Sharon shares with my dad’s first cousin (421 cM).

Chromosome Mapping for Mom

Judy’s new DNA results update my mom’s Chromosome map in many of the red areas below:

momchromomapoct16

More About Judy’s DNA

Based on the tree, we can see a few things.

lentznicholschart

  1. If Judy matches Joan or Carol, that means the DNA has to be from the Nicholson side.
  2. If she matches my mom and no red people, then the DNA could be from Lentz or Nicholson.
  3. If Judy matches just Joshua, the DNA could be from Lentz, Nicholson or from the wife of William Lentz.
  4. If she matches my mom plus Joan or Carol, the match would be from the Nicholson side. If Judy matches Joshua plus Joan or Carol, the same should apply. However, this would have to be a triangulation group.
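
The four cases above can be written as a small lookup. This is a minimal sketch of the reasoning only, not a tool I actually used. The function name and the labels are made up for illustration, "Nicholson" stands for the Nicholson side of the chart, and case 4 still assumes the matches form a triangulation group, which the code does not check.

```python
NICHOLSON_ONLY = {"Joan", "Carol"}  # the "red" testers on the chart

def possible_sources(judy_matches):
    """Return the lines a segment could come from, given who Judy matches on it."""
    judy_matches = set(judy_matches)
    if judy_matches & NICHOLSON_ONLY:
        # Cases 1 and 4: a match with a Nicholson-only tester points to that side.
        return {"Nicholson"}
    if "Mom" in judy_matches:
        # Case 2: Mom but no red people.
        return {"Lentz", "Nicholson"}
    if judy_matches == {"Joshua"}:
        # Case 3: Joshua alone.
        return {"Lentz", "Nicholson", "wife of William Lentz"}
    return set()  # none of the four cases applies

print(possible_sources({"Mom", "Carol"}))  # {'Nicholson'}
```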

Judy’s Nicholson (or Ellis) DNA

William Nicholson

Here is an example of Judy’s Nicholson DNA. She matches both Carol and Joan who are not descended from the Lentz family.

judynicholson-match

These 3 are also in a triangulation group (TG) which means they match each other on Chromosome 13. Here is what that TG looks like on a family chart:

judytgnicholson

The same segment of DNA from Chromosome 13 has come down to these 3 women. We know that the DNA was from either William Nicholson or Martha Ellis but we don’t know which. So when I said that this was her Nicholson DNA, it could really be either Nicholson or Ellis DNA – but not both.

In addition, like the next example below, Joan and Carol can know something else. They can know that the 51.4 segment that they share on Chromosome 13 is with Carol’s grandmother, Nellie Nicholson and not with Nellie’s husband. Before this match with Judy, they wouldn’t have known this.

a fine distinction on the Nicholson DNA

Here is an example of case #4 above where Judy matches both Carol and my mom, forming another triangulation group on a portion of Chromosome 18:

tg18

tgmomjudycarol

Again Mom, Judy and Carol all share this specific segment of either William Nicholson or Martha Ellis. There is something else interesting about this chart. Judy and mom share that same DNA from Annie Nicholson. Usually when Mom and Judy match, they wouldn’t know which of the couple the DNA came from. In this case their Chromosome 18 match came from Annie Nicholson.

That means Judy and my mom could assign that part of their DNA to Annie Nicholson. Also I could modify the Chromosome map for my mom that I did earlier in the blog. I think that I will do that.

chromomap18rev

On Chromosome 18, what I had as red is now in yellow. That means that the information is more specific. Interestingly, the orange on that Chromosome would also be from Annie, but because of who was matched to get to that, we say that it would be from one of Annie’s parents. It gets a little confusing. So at the point where the bar goes from yellow to orange, we are seeing further into the past when we see the orange part.

The practical part is that whenever someone matches my mom’s maternal side on that portion of Chromosome 18, she will know that it is a Nicholson (or Nicholson ancestor) match and not a Lentz match.

What about me?

I wonder if I share any of this Annie Nicholson DNA. Here is how Judy matches my brother Jon and 2 sisters Sharon and Heidi on Chromosome 18:

judychr18

Below is a chromosome map that I updated now that my brother’s  DNA results are in. This indicates the DNA that my 3 siblings and I got from our 4 grandparents. The maternal side is in orange and green and the paternal grandparents are shown in purple and blue. My brother Jon’s yellow match with Judy above is within the orange area of the bottom F bar below. Sharon’s green bar match with Judy above corresponds to the second orange segment below on the S row. Heidi’s blue bar match above corresponds to her second orange (Lentz) segment below on the H row. I match my mother’s father’s Rathfelder side for most of Chromosome 18. That is shown in green in the 4th bar below (J row). So I didn’t inherit any Annie Nicholson DNA here where my 3 siblings did.

chr18maprev

This method maps to our 4 grandparents, so Nicholson is not shown. Annie Nicholson is one of my 8 great grandparents. However, we now know that two of my sisters’ and my brother’s orange bars came from our great grandmother Annie Nicholson by way of her Lentz daughter.

Judy’s Lentz (or Nicholson) DNA

Speaking of Annie Nicholson, here she is with her husband Jacob Lentz:

Jacob Lentz

Below is another triangulation group from Chromosome 1 that Judy is in with my mom and Joshua:

judymomjoshuatg

Here is the family chart again:

tg1chart

This time the DNA may be from either Jacob Lentz or Annie Nicholson – but not both. This same segment of DNA came down 2 generations to my mom, 3 to Judy and 5 generations to Joshua. We might guess that this is Lentz DNA. That is because there are no Nicholson only matches there, but we don’t know for sure.

The Rule of the 1st and 2nd Cousin Combo

In two of the examples above, there was a 1st and 2nd cousin combo – including a triangulation group.

In the first case, Carol and Judy are 1st cousins, once removed. As such, they couldn’t tell which grandparent’s DNA that they shared (Nicholson or Nicholson spouse). Enter my mom as Carol’s 2nd cousin. She is further out relationally and they match on the Nicholson Line at Carol and mom’s great grandparent level. This identifies Carol and Joan’s DNA as coming from the Nicholson side. How is this helpful? Now anytime that Joan and Carol match someone on that same segment, they will know that the match has to be along the Nicholson Line going up through the Nicholson ancestors. This narrows down the possibilities a lot.

The rule: In a triangulation group between a 1st cousin and a second cousin, the second cousin will be able to identify which grandparent the 1st cousins share.

I’m sure that is why it is said that it is important to test second cousins. The reason that I haven’t come upon this situation before is that this combination hasn’t come up on my father’s side. I have results of my father’s first cousin’s DNA and my own 1st cousin’s once removed, but no second cousins to compare.

Summary and Conclusion

  • Cousin Judy has been helpful in filling in my mom’s Chromosome map
  • Judy’s DNA results will also be helpful as I fill in my siblings’ and my own chromosome maps.
  • Judy’s results have partially phased the DNA. That means, for my mom she can tell at least for one area, not only where she has a maternal match, but also that it is a maternal grandmother match (Nicholson).
  • I had thought that there would be a way to identify some of the Lentz DNA. However, I don’t see a way without finding a Lentz cousin who doesn’t descend from the Nicholson side. This would have to be a second cousin or further out.
  • Once Nicholson DNA is identified, it is more likely that the remaining non-Nicholson DNA could be from the Lentz side. However, that is not certain; it just represents more than a 50% likelihood.

A Hartley Z17911 STR Tree

In my previous blogs on Hartley YDNA, I mentioned that my terminal SNP is Z17911. That is a part of the L513 Branch of the larger L21 Branch of R1b. Here is what the L513 Branch looks like. This Tree represents those who have taken the Big Y Test in the colored area above.

l513chart

My Hartley Z17911 is difficult to see but it is slightly to the left of the middle and to the left of an orange area. The checkerboard pattern shows the part of England that my Hartleys are from. As far as I know I am the only Hartley that has had SNPs tested positive for Z17911, or for L513, for that matter.

STRs and Z17911

However, quite a few Hartleys have tested their YDNA. They have tested STRs. As a result, it is possible to do a comparison to others taking this test. STRs are not SNPs, which are a more definitive designation of where you are on the Y Tree. However, they can suggest what SNP you should belong to. I belong to an L513 project, and the Administrator, Mike, is actively looking for others that might be in L513. As a result, Mike has put out lists of people that appear to be L513 based on their STR patterns. I have mentioned in past Blogs that some of those people are Hartleys.

Here is a recent list:

suspectedz17911

The first on the list above is me. Then follow three other Hartleys. Administrator Mike has grouped these other 3 Hartleys next to me. Based on their STRs, he has grouped them as Z17911. This is even though these 3 have not tested for Z17911, L513, and probably not even for L21 which is way up on the Y Tree. The row with the orange, green and yellow above the results has what are called STR Rates. These are the rates at which each individual STR mutates. Some are very slow and some mutate relatively quickly. The selected mode above is likely the mode of L513. This will come in handy later on in this Blog in a few ways.

Z17911 and Signature STRs

It turns out that STRs form themselves into groups. That means that groups of people that are related by YDNA have combinations of STRs that are almost always unique to that group. Here I will make an assumption that the other 3 Hartleys are indeed Z17911, even though they haven’t tested their SNPs.

In the results section to the right of the Hartley names are the values for each STR marker. The colored values are the ones that vary from the L513 Mode. These values, especially the ones that are in the darker colors, will result in a signature for these Z17911 Hartleys. The darker colors indicate more of a variance or distance from the mode. Another way to put it is that the L513 mode is the older value and the Z17911 Hartley numbers are the newer values for the STRs that have mutated away from the L513 mode.

Up or Down?

These Z17911 STRs may mutate up or down. The blue shaded numbers are going down and the reds are going up. Why is this important? It is important as I’d like to build a tree from these 4 Hartleys. I will need to know who is descending from whom. Or at least, which of the 4 branches of Hartleys may be the oldest.

Here is an example:

str-example

These are some of the results of our 4 presumed positive Z17911 Hartleys. It is difficult to create a mode of these results as the mode is the value which occurs the most. If there are 2 of each value, which value do you use? This happens with the #449 Marker results. I am 31 at the top, but there are two 31’s and two 32’s. I have the L513 mode at the top of the image. Its value for Marker #449 is 29. That means I have the older 31 value and the other 2 Hartleys have newer 32 values. They are moving away from 29.
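
Here is a minimal sketch of that tie-breaking idea in Python. It assumes that when two values are tied, the one closer to the upstream (L513) mode is the older one. That is my working assumption rather than an established rule, and the function name is made up for illustration.

```python
from collections import Counter

def mode_with_tiebreak(values, upstream_mode):
    """Most common value; on a tie, prefer the value closest to the upstream mode."""
    counts = Counter(values)
    top = max(counts.values())
    tied = [v for v, n in counts.items() if n == top]
    return min(tied, key=lambda v: abs(v - upstream_mode))

# Marker 449 as described above: two 31s and two 32s, with an L513 mode of 29.
print(mode_with_tiebreak([31, 31, 32, 32], upstream_mode=29))  # 31
```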

Defining Hartley Z17911 STRs

Next, I looked at all the STRs where the 4 Hartleys had different results. The other results are interesting, but in comparing Hartley to Hartley they don’t matter if they are the same. Well, they might matter if there was a STR that mutated up and back down again, but that should be relatively rare.

hartley-strs

Here I have compacted 67 STR results to 12. This is a good time to point out the STR rates. The rate for 447 is about 0.09. The rate for CDYb is 35. That means that CDYb will change over 350 times as fast as 447. Another point is that Hartley #4 seems to be a special case. He was categorized as a non-L513 person which was thought by the L513 Administrator Mike to be a mistake. I don’t know if that was ever resolved. I do note that some of his STRs are a bit different than the other 3 Hartleys, but not totally different. I also note that he has tested positive for R-L21, so perhaps this has been resolved.
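
As a quick check on the rate comparison, the two quoted rates can simply be divided. This is a small sketch using only the figures quoted above; since the 447 rate is "about" 0.09, the ratio is approximate too.

```python
rate_447 = 0.09   # rate quoted above for marker 447 (approximate)
rate_cdyb = 35    # rate quoted above for CDYb

ratio = rate_cdyb / rate_447
print(round(ratio))  # 389
```

So "over 350 times" holds, with the ratio coming out closer to 390.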

But Wait, There’s More

I had forgotten, there is one more Hartley in the group. He doesn’t have a Hartley last name but believes that he is descended from the Hartley Line. Great news. I will call him Hartley #5.

5-hartleys

Previously, I had missed Marker 481. Also when I copied things, my numbers didn’t get colors, but that’s alright. Now I have 13 markers and 5 Hartleys.

References for Trees

I’m aware of 3 references for creating STR trees.

  • Robert Laurence Baber – He has written quite a few articles on STR trees. I have not read them all yet. I downloaded a 5 part study he wrote but I don’t totally understand his method yet – though I understand some of the principles. He uses an upstream STR mode as I tried to do above.
  • Robb Hand Drawn Tree example – He compares a hand drawn tree to the Fluxus software. Although he likes the hand drawn version better, he learns some things from using the difficult-to-use Fluxus software
  • Gleeson STR Tree – Maurice Gleeson gives a method and example of how to build a STR tree

More on Modes

I seem to be getting hung up on Modes:

more-modes

Here I have the L513 Mode and various modes from downstream SNPs. The 458 mode went quickly from 17 for L513 to 19 for S5668 and then appeared to stay there for quite a while. As a result, I chose 19 for the mode. Had I just looked at the older L513 Mode, I may have come to a different conclusion as to which way this STR was mutating.

Then the very fast CDYb seemed to move up in a regular way through the ages. Of course, in reality, it could have gone up and down over that period of time, but we wouldn’t know it if it did. I picked the lower 39 value for the CDYb STR at the Hartley mode level. To the right, I have the GD or genetic distance from the Hartley Mode. This says that these Hartleys should be related at about the same level – around 4 or 5 GDs or STR mutations.
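
The GD from a mode can be sketched as a count of steps. This is a minimal sketch, assuming every single step on every marker counts as one mutation, which is one common way to count. Only the 458 = 19 and CDYb = 39 mode values come from the text above; the 449 value and the tester's numbers are hypothetical.

```python
def gd_from_mode(haplotype, mode):
    """Sum of step differences across the markers present in both dicts."""
    return sum(abs(haplotype[m] - mode[m]) for m in mode if m in haplotype)

# Hypothetical values for illustration, apart from the 458 and CDYb modes.
hartley_mode = {"458": 19, "CDYb": 39, "449": 31}
tester = {"458": 17, "CDYb": 40, "449": 31}

print(gd_from_mode(tester, hartley_mode))  # 3
```

Here the two-step change on 458 counts as 2 and the one-step change on CDYb counts as 1.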

A 5 Hartley Likely Z17911 STR Tree

Here is the tree I came up with. It is along the line of and in the form of the Gleeson STR Tree example mentioned above:

5hartleytree

  • The Hartley common ancestor’s signature STR values are listed at the top. The mutations from that are shown down the branches to the individual Hartleys.
  • I also added some dates assuming that on average, a STR will mutate every 170 years given a test of 67 STRs. The lower horizontal lines above happen at the 2 or 3 STR mutation rate (which is the same as the GD). The top horizontal line happens at a GD of 4 or 5. The Hartley #5 horizontal line is up higher as the 389b mutation is a double one from 16 to 18.
  • In the above scenario, Hartley #5 is by himself. Another scenario would have Hartley #4 and Hartley #5 together as they share a mutation at 389b. Instead, I chose the above tree because Hartley #1 and #4 share 2 STRs, as do Hartley #2 and #3.
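
The dating in the bullets above is simple multiplication. Here is a minimal sketch, assuming the 170 years per mutation figure stated above and treating the GD as a count of mutations. The function name is made up for illustration.

```python
YEARS_PER_MUTATION = 170  # average assumed above for a 67 STR test

def years_to_common_ancestor(gd):
    """Rough number of years back to a common ancestor for a given GD."""
    return gd * YEARS_PER_MUTATION

for gd in (2, 3, 4, 5):
    print(gd, years_to_common_ancestor(gd))
# 2 340
# 3 510
# 4 680
# 5 850
```

Since 170 years is only an average, these figures are rough guides for where to place the horizontal lines, not firm dates.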

This image shows some of my rationale for the tree:

5hartley-groping

I chose the double combo of 25-32 that Hartley #2 and #3 shared. I also chose the double combo of 17-40 (in yellow) that Hartley #1 and #4 shared. Other possible single combos that I didn’t choose to group were the two step 16>18 mutation for Hartley #4 and 5, the 11 mutation for Hartley #1 and 5 and the 16 mutation for Hartley #1 and 3. The principle used is to try to get the tree as simple as possible. This is what Gleeson calls the parsimony principle. My assumption is that my groupings achieve that goal.
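
The grouping rationale above amounts to counting, for each pair, the markers on which both testers moved away from the mode to the same value. Here is a minimal sketch of that count. The marker names A to D and all of the values are hypothetical stand-ins, not the real Hartley results, and this is not Gleeson's own procedure.

```python
from itertools import combinations

def shared_mutations(haplotypes, mode):
    """Per pair, count markers where both hold the same non-mode value."""
    result = {}
    for a, b in combinations(sorted(haplotypes), 2):
        shared = [m for m in mode
                  if haplotypes[a][m] == haplotypes[b][m] != mode[m]]
        result[(a, b)] = len(shared)
    return result

# Hypothetical markers and values, for illustration only.
mode = {"A": 19, "B": 39, "C": 24, "D": 31}
haplotypes = {
    "H1": {"A": 17, "B": 40, "C": 24, "D": 31},
    "H2": {"A": 19, "B": 39, "C": 25, "D": 32},
    "H3": {"A": 19, "B": 39, "C": 25, "D": 32},
    "H4": {"A": 17, "B": 40, "C": 24, "D": 31},
}

counts = shared_mutations(haplotypes, mode)
best = max(counts.values())
print([pair for pair, n in counts.items() if n == best])
# [('H1', 'H4'), ('H2', 'H3')]
```

Pairs that share the most mutations are the first candidates to put on the same branch, which keeps the total number of mutations in the tree low.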

How Do the Hartleys Compare to the Z17911 Mode?

In comparing Hartleys to the Z17911 Mode,  I go from the age of surnames to before the age of surnames. There are 4 that have tested positive for Z17911. They are Hartley (me), Goff, Thomas and Merrick. In that group, the level of GD’s and the variance in surnames indicate a pre-surname common ancestor.

So the GD’s will be further back also.

z17911gds

Here I am assuming no back mutations. Under the previous tree I assumed that Hartley #5 had a back mutation at CDYb. Due to the volatility of this marker, it is sometimes ignored in these analyses. Notice that now the range of GDs is from 3 to 8. Again, I group Hartley #1 and #4 together and Hartley #2 and #3 together.

z17911tree

Hartley #4 has the GD of 8. This is due to 2 double mutations. That pushes back his connection to Z17911 to around the year 600. This seems to be pushing back to a possible age of Z17911. Z17911 positive Thomas has submitted his Big Y results to YFull, so I am hoping to get a date from YFull for Z17911. It will be interesting to see what they come up with. The structure of the tree is the same as the previous Hartley Tree. I just adjusted the relative heights of the horizontal arms.

Summary and Conclusion

  • STRs from 5 Hartleys who have tested their YDNA seem to indicate a relatively close relationship – at least in YDNA terms
  • I have had my SNPs tested and the administrator of the R1b-L513 project has grouped the other STR-testing Hartleys in the same Z17911 group as me based on similar STR patterns. That is quite a way down the SNP tree.
  • If any of these Hartleys were to test for the L513 SNP or further down for Z17911, it could confirm what the STRs seem to be saying. Then I wouldn’t be the lone SNP tested Z17911 Hartley
  • SNPs create a solid reliable marker for relationships. It is best to have the SNP relationship established through testing before doing this type of STR analysis. However, even without SNP testing, STR trees can be informative
  • Back mutations and the different mutation rates leading to unpredictable STR mutations are the 2 major variables that make STR testing less accurate than SNP testing
  • The weakness of the SNP testing is that many have not done it. The other issue is that SNP testing may only take you up to a certain date. After that date, STR analysis is more useful
  • STR testing is best used in conjunction with SNP testing
  • Making a STR tree takes some practice and knowledge of STRs and mutations.
  • This YDNA research and resulting connections could shed light on the history of this branch of the Hartley family over the past 400-1400 years or so.