Raw Data Phasing: Part 3

This Blog is Part 3 documenting my learning process of phasing my DNA raw data using:

Part 1 and 2 Recap

  1. I imported 4 sets of raw data into Access from AncestryDNA after taking out the zeros that the Excel software produced for the no-calls.
  2. I used Access Queries to apply 3 Whit Athey Principles. This resulted in many phased bases for me and my 2 sisters.
  3. I put each sibling’s phased A’s, G’s, C’s and T’s into 2 new columns per sibling (one for the base from Dad, one for the base from Mom).
  4. This resulted in 6 new columns. The first 3 of these six were for the paternally derived bases. The bases in these 3 columns formed a pattern of AAB, ABA, or ABB.
  5. The Athey Paper did not emphasize the AAA pattern, or considered it a non-pattern. While individual AAA results within another pattern area occur by chance, there are other areas where all 3 siblings match the same grandparent, and those areas have an AAA-only pattern.
  6. I separated my results into 3 patterns using Access: AAB, ABA, and ABB
  7. For each of those results, I noted where those patterns changed.  I did this by looking at the ID numbers. Breaks in the ID numbers were considered changes.
  8. However, there were some cases where the changes occurred around missing bases. For these, I went back and noted a more precise position of the pattern change based on where the change would be if the missing base were to be filled in.
  9. I made a preliminary bar graph using the first 3 paternal changes. These crossovers were mapped to myself and my 2 sisters.
  10. Using the 3 patterns I developed Access queries to fill in the missing bases in the 3 paternal pattern areas.

So those were the 10 easy steps. Actually step 10 was difficult as there was quite a bit of refining the Access queries and quality checking the results. I needed 2 queries for each of the pattern areas. However, once I had the queries, it was the push of a button to update missing parental-received bases for 3 siblings within over 700,000 lines of DNA.

Back to Athey

This portion of the Athey Paper appears to apply to where I am now:

For some of the unfilled cells on the mother’s side of the table, we can fill in the alternative (other) base from the corresponding location on the father’s side of the table. That is, we know that the sibling with an empty cell got one base from the father, but the alternative base from the mother. Therefore, after the use of the Dad pattern fills in more cells, a newly filled-in cell in the father’s side of the table gives rise to a filled-in cell in the same position on the mother’s side–the alternative base to what was on the father’s side.

Unfortunately, I’m not sure what is meant above. My guess is that this relates to Principle 3:

Principle 3 — A final phasing principle is almost trivial, but it is normally not useful because there is usually no way to satisfy its conditions: If a child is heterozygous at a particular SNP, and if it is possible to determine which parent contributed one of the bases, then the other parent necessarily contributed the other (or alternate) base. This principle will be very useful in the present approach.

So now that missing paternal bases have been determined based on the patterns, it should be possible to fill in missing maternal bases for heterozygous children. First, I’ll do a Query to see if I can locate this situation. I’ll query my most recently updated Dad ABB Pattern Table, looking for heterozygous results. Then, I’ll look at spots where there are missing bases from Mom.

Fortunately, I was able to come up with a slick looking Query for this situation:

mom-from-dad

Plus the Query design has some nice symmetry. The first criteria row of the query is for my (Joel) DNA. Reading across, it says Joel is heterozygous because my allele 1 does not equal my allele 2. Then it says that I have a base from Dad but not from Mom. This will show areas where the mom bases are missing in this heterozygous child situation.

mom-bases-to-fill-in

The truncated fields above are Joel Allele 1, Joel Allele 2, Sharon allele 1&2, Heidi allele 1&2. The next 3 columns are Joel, Sharon and Heidi from Dad. Then Joel, Sharon and Heidi from Mom (the last 3 columns). This shows that there are almost 12,000 of these Mom bases to fill in. Above the blue line are Heidi’s bases missing from Mom. Heidi is TC (heterozygous) on that line, and her Dad base is T. I love these binary problems. They seem well suited for the computer, which means that a query to update almost 12,000 records should not be too difficult. So Heidi’s Mom base will be C above the blue line. At the blue highlighted area, I am TC and my Dad base is C, so my Mom base will be T on the blue line.
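The binary rule above (Principle 3) is small enough to write down. Here is a minimal Python sketch, assuming each row gives a child’s two alleles plus the base already phased to Dad, with None standing in for a blank. The function name is my own illustration, not anything in Access.

```python
from typing import Optional

def mom_base_from_dad(allele1: str, allele2: str, from_dad: Optional[str]) -> Optional[str]:
    """Return the base the child must have received from Mom, or None if it can't be deduced.

    Principle 3: if the child is heterozygous and we know which base came
    from Dad, the other (alternate) base came from Mom.
    """
    if from_dad is None or allele1 == allele2:
        # Dad base unknown, or homozygous: not the situation this query targets.
        return None
    if from_dad == allele1:
        return allele2
    if from_dad == allele2:
        return allele1
    # The Dad base matches neither allele: an inconsistent row, so leave it alone.
    return None

print(mom_base_from_dad("T", "C", "T"))  # C (Heidi, above the blue line)
print(mom_base_from_dad("T", "C", "C"))  # T (me, on the blue line)
```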

Looking for a Good Query to Fill In Mom Bases from Dad Bases

First, I copied my ABB Table to a new Table called tbleMomBaseFromDadBase. I will want to update that table with a new Update Query. I already have the first part of the query. Now I need my thinking cap. Even better than thinking, I can look at what I did before. Here is my old query.

allele1-query-heterozygous

This is difficult to see, but I split the problem into 2 alleles. What this says is when Sharon has a base from her mom and Sharon’s allele 1 is not the same as the base from her Mom, pop that allele 1 into her base from Dad slot.

For our situation we are doing the opposite, so we will switch Mom and Dad. This time we are using our Dad results to get some Mom results. I’ll also add a criterion to make sure the Mom result is Null, so I’m not overwriting anything. It is just an extra precaution.

Basically, I want to make sure Heidi has a base from Dad and not from Mom. In that case, when her allele1 is not equal to her base from Dad, put that allele 1 in as her base from Mom. Drawing upon my vast experience in this area of about 1 week, I get this:

allele1dad-to-mom

When I preview the results, I get about 6,000 lines, which is half the count from my previous query, so that seems OK. I’ll go ahead and update my new Table. I renamed my Query to qryMomBaseFromDadBaseAllele1 and copied it to do the same thing with Allele 2. I’ll change the Allele 1s to Allele 2s in the Query design. First I’ll do a Select (non-updating) Query to show what I’ll be updating with Allele 2.

allele2momfromdadselectquery

Here I added the ID numbers, so I can make sure my update went well.

Here is my Allele2 Update Query with the 3 siblings included:

allele2momfromdadupdatequery

The results:

momfromdadupdate

In the far right column is the Base Heidi got from Mom. It was updated on lines 2292, 2295 and 2299. In each case Heidi’s paternal base was T, and the maternal base derived from it was C.

Here is my corresponding filled in Mom Base:

joelmomfromdad

My Dad T’s, 6 columns from the right, were used to fill in the missing C’s 3 columns from the right. Doesn’t it seem a bit ironic? Even though my dad was not tested for DNA, his “results” from this process are used to find the DNA I got from my mom, who was tested.
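The reason the Allele 1 query found about half of the rows is that each heterozygous row is caught by exactly one of the two queries: whichever allele differs from the Dad base. Here is a rough Python sketch of the two passes, assuming a hypothetical row layout (a1, a2, dad, mom) rather than my real Access field names, with None for a blank:

```python
from typing import Dict, List, Optional

# Hypothetical row layout: the child's two alleles ("a1", "a2") and the
# bases phased to Dad and Mom; None stands in for a blank cell.
Row = Dict[str, Optional[str]]

def allele_pass(rows: List[Row], allele: str) -> int:
    """One update query: fill a blank Mom base from the chosen allele.

    Criteria follow the queries described above: the child is heterozygous,
    the Dad base is present, the Mom base is still blank (so nothing is
    overwritten), and the chosen allele differs from the Dad base.
    """
    updated = 0
    for row in rows:
        if (row["a1"] != row["a2"]
                and row["dad"] is not None
                and row["mom"] is None
                and row[allele] != row["dad"]):
            row["mom"] = row[allele]
            updated += 1
    return updated

rows = [
    {"a1": "T", "a2": "C", "dad": "T", "mom": None},   # allele 2 differs from Dad
    {"a1": "C", "a2": "T", "dad": "T", "mom": None},   # allele 1 differs from Dad
    {"a1": "A", "a2": "G", "dad": None, "mom": None},  # no Dad base yet: skipped
]
print(allele_pass(rows, "a1"), allele_pass(rows, "a2"))  # 1 1
print([row["mom"] for row in rows])  # ['C', 'C', None]
```

Because the Mom base must still be blank, running the Allele 2 pass after the Allele 1 pass can never overwrite what the first pass filled in.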

A Premature End to This Blog and a New Beginning

This will be one of my shortest Blogs. I was both awaiting and not awaiting my brother’s DNA test results. Those results came in this week. The reason I was not awaiting was that I knew that I would need to re-start the raw data DNA phasing process once his results came in. With that, I’ll end this Blog and start a new one.

Raw Data Phasing Via Access, Athey and MacNeill: Part 2

In my last Blog on raw data phasing, I went through 3 principles that Whit Athey laid out in a paper on phasing raw data when one parent’s DNA results are missing. Using those principles and the MS Access program, I was able to sort many of my bases and my 2 sisters’ bases into ones we received from our mom and ones that we received from our dad. I checked a few of my results with a chromosome map made for me by M MacNeill.

Paternal Patterns

I had gotten to the part of the Athey paper where he talks about the paternal patterns of bases that the sibling combinations received. I noted a space between the first two paternal patterns that I looked at. Below, the pattern goes from an ABA pattern to an ABB pattern.

change-in-dad-pattern-hilite

There was a gap between the ABA and ABB patterns where there was no ‘pattern’, as my 2 sisters and I shared the same base there. When my sisters and I all share the same base, that is an AAA “pattern”. That AAA area corresponded exactly to the area between the 2 yellow lines below in the chromosome map made for me by M MacNeill (prairielad_genealogy@hotmail.com).

macneill-chr1-hilite

In the map above, MacNeill was able to determine that my 2 sisters and I got our DNA from our paternal grandmother in the area between the 2 yellow lines. Further, the first yellow line described Sharon’s first paternal crossover point and the second yellow line described my (Joel’s) first paternal crossover point.

Finding All the Paternal Crossover Points

At this point in the Athey Paper, he recommended looking at the paternal pattern and filling in the missing bases based on the known pattern. I was looking for an easier way to do this, so I decided to take a different approach. I decided that I would find all the paternal crossover points first. Then, armed with that information, I would create a formula that would fill in most or all of the missing bases for each pattern.

However, this required a modification of my database to make the work easier. I wanted a number to define the range of patterns, so that I could apply an easy query to add missing bases. I already had this but I hadn’t used it. Back when I imported the 4 sets of raw data into Access, Access assigned an ID to every row of data. That meant that I needed to add that ID into all the queries that I had done previously to make tables and further queries. This took a while, but I believe that it was worth it.

table-with-id

The ID is the first column.

I started going down all my data and noting each change of pattern. I put the results into an Excel table. Here the Start and Stop numbers are the Access-assigned ID numbers. The IDs correspond to the DNA locations looked at. In this case there were a total of a bit over 700,000 of these locations for my mom, my 2 sisters, and me.

excel-pattern

Then I noticed that the patterns repeat, as would be expected. For example, my first pattern was ABA, but 3 patterns later, that same ABA repeated. My thought was to create a query just for ABA patterns. Then, when scrolling down looking for changes, the separation between rows should be greater, and it would be easier to see where those changes were.
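Noting where a pattern starts and stops amounts to collapsing runs of rows that share a pattern. Here is a minimal Python sketch, assuming the rows come as (ID, pattern) pairs with None where a blank makes the pattern unreadable. The IDs in the example are made up. The gap between one segment’s stop and the next segment’s start is the uncertain area where the exact change point is unknown.

```python
from typing import List, Optional, Tuple

def pattern_segments(rows: List[Tuple[int, Optional[str]]]) -> List[Tuple[str, int, int]]:
    """Collapse (ID, pattern) rows into (pattern, start ID, stop ID) segments.

    Rows with no readable pattern (None) are skipped, so a segment's start
    and stop are only as exact as the nearest informative rows.
    """
    segments: List[Tuple[str, int, int]] = []
    for row_id, pattern in sorted(rows, key=lambda r: r[0]):
        if pattern is None:
            continue
        if segments and segments[-1][0] == pattern:
            kind, start, _ = segments[-1]
            segments[-1] = (kind, start, row_id)  # extend the current run
        else:
            segments.append((pattern, row_id, row_id))  # a new pattern starts here
    return segments

rows = [(1, "ABA"), (2, None), (3, "ABA"), (5, "AAA"), (8, "ABB"), (9, "ABB")]
print(pattern_segments(rows))  # [('ABA', 1, 3), ('AAA', 5, 5), ('ABB', 8, 9)]
```

A one-row segment such as ('AAA', 5, 5) is the kind of point with no real end that may turn out to be an anomaly.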

Here is what my Access query looks like. I changed the query name to DadSpecificPattern.

dad-specific-pattern-queryquery

This particular query gives me the ABB pattern. I have the HeidifromDad base equal to the SharonFromDad base. That makes me the A and Sharon and Heidi the BB of the ABB Pattern. If you think about it, that also means in these areas that Heidi and Sharon will have their base from the same paternal grandparent and mine will be from the other paternal grandparent. I’m learning as I go. I’m sure that information will come in handy later.
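Because the order is always Joel, Sharon, Heidi, the pattern can be read off by checking which two bases from Dad agree. A small Python sketch of that logic (my own illustration, not the Access query itself), with None standing for a blank:

```python
from typing import Optional

def dad_pattern(joel: Optional[str], sharon: Optional[str], heidi: Optional[str]) -> Optional[str]:
    """Label the three bases from Dad (J, S, H order) as AAA, AAB, ABA or ABB.

    Returns None if any base is blank, because the pattern can't be read.
    """
    if joel is None or sharon is None or heidi is None:
        return None
    if joel == sharon == heidi:
        return "AAA"
    if joel == sharon:
        return "AAB"  # Heidi is the odd one out
    if joel == heidi:
        return "ABA"  # Sharon is the odd one out
    if sharon == heidi:
        return "ABB"  # Joel is the odd one out
    # Three different bases can't happen: Dad only had two to pass on.
    return None

print(dad_pattern("C", "T", "T"))  # ABB: Sharon and Heidi share a paternal grandparent here
```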

My plan seemed to be good, but there was one catch. Once I refined my query, most or all of the blanks disappeared. That meant that the start and end points might not be exact. Here is an example of what I mean.

change-in-pattern-rouch

This is from my old Dad Pattern query with the blanks still there. The change from ABB to ABA happens at ID or line 19809. However, the new query takes out the blanks to make it look like the change is at ID Line 19826.

Here is what my DNA results look like so far without a filter (or query). The last 3 columns are the bases from Dad columns. There is a lot going on between lines 19809 and 19826.

pattern-unfiltered

Once I apply a formula to add bases, it will say something like: “In the lines that have the ABA pattern where there is a blank at either A spot, replace the blank with the A that is there.” If I apply the rule too late, I will be missing an area. Worse, if I were to use the 19826 cutoff, I may still be using the previous rule. That rule would say basically the same thing except, “Where the row is ABB and one of the B’s is missing, replace the missing B with the one that is there.” If I apply an ABB rule to an ABA area, I’ll get bad results.
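To see why the boundary matters, here is a rough Python sketch, assuming each pattern is reduced to the pair of siblings who share a paternal grandparent (Joel and Sharon for AAB, Joel and Heidi for ABA, Sharon and Heidi for ABB). The lookup table and function names are hypothetical. The same row gets a different fill depending on which rule is applied:

```python
from typing import Dict, Optional, Tuple

Bases = Tuple[Optional[str], Optional[str], Optional[str]]  # (Joel, Sharon, Heidi) from Dad

# Hypothetical lookup: in each pattern, the two siblings (by position in
# J, S, H order) who got the base from the same paternal grandparent.
PARTNERS: Dict[str, Tuple[int, int]] = {"AAB": (0, 1), "ABA": (0, 2), "ABB": (1, 2)}

def fill_from_partner(pattern: str, bases: Bases) -> Bases:
    """Fill a single blank from its same-grandparent partner under the given pattern."""
    i, j = PARTNERS[pattern]
    filled = list(bases)
    if filled[i] is None and filled[j] is not None:
        filled[i] = filled[j]
    elif filled[j] is None and filled[i] is not None:
        filled[j] = filled[i]
    return (filled[0], filled[1], filled[2])

row = ("A", "G", None)  # Heidi blank, in an area that is really ABA
print(fill_from_partner("ABA", row))  # ('A', 'G', 'A'): the right rule
print(fill_from_partner("ABB", row))  # ('A', 'G', 'G'): the wrong rule gives a wrong base
```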

Long story short, I ended up recording a rough start and stop in my Excel Spreadsheet.

revised-spreadsheet-for-pattern

I started naming the segments, but realized that was not necessary. Some of the patterns were only at one point rather than in a long segment. I believe that is an anomaly due to a bad read, mutation or some other problem. Those are the ones in the spreadsheet that had no end point. It took me part of a morning to get all the paternal crossover pattern points for all 23 chromosomes. Fortunately for 3 siblings, the patterns are only ABA, AAB and ABB.

I just went back and checked the error points/anomalies. I reran the Heterozygous Sibling Query, and it fixed at least the first problem and hopefully the others. When I added the IDs in, I had to redo all the queries quickly, so I suppose that is where the errors came in. That is not a problem: as long as the problem can be found, a fix can usually also be found. There actually weren’t that many errors. There are still some anomalies that are just anomalies. I have left those in yellow in the spreadsheet image below.

So in my spreadsheet, I have all the rough starts and ends for all the crossovers for my 2 sisters and myself. Here is the top part of the spreadsheet sorted by rough start:

rough-start-sort

Next, all I need are more exact start and end points. Here is the start of what I have:

pos-and-id-and-pattern

I picked this section because it looks pretty complete already. Note that my Start and Stop numbers are pretty close to each other. That means that there are no other AAA segments in between. I had to do an additional Access query to add in the position numbers for the Start and Stop of each chromosome’s pattern change. This is important if I want to convert the results from Build 37 to Build 36 to compare to MacNeill’s work or to gedmatch.com.

Starting to Find Paternal Crossovers and Assigning to Siblings

Previously I had been calling the starts and ends of my patterns crossovers. These two terms aren’t totally interchangeable, as the start or stop of a pattern may happen at the beginning or end of a Chromosome and therefore not be a crossover at that point. It seems like it should be pretty easy to find the crossovers. Look at the image above. The first and second rows show ABA going to AAA. The order for me and my siblings is JSH, or Joel, Sharon and Heidi. The only letter that changes is the B to A. That is Sharon’s position, so the paternal crossover has to go to her.

From row 2 to row 3 the pattern changes from AAA to ABB at Chromosome 1, position 23,288,828, Build 37. That doesn’t mean that 2 siblings have a crossover there, as we are looking at the patterns, not the letters. It is actually the letter that stayed the same that represents the crossover here. AAA to ABB means that all the same (AAA) goes to one different and 2 the same (ABB), in this case Sharon and Heidi. The one that is different is me, and I get the crossover at this location.

The next change is from ABB to ABA. This is a little harder to see. I would say that this crossover goes to Heidi, if my reasoning is right. BB was the same before and goes to BA. It must be Heidi that changed, because now she matches Joel, who didn’t change. I’ll need to figure out how to make better bar graphs in Excel, but here is how the beginning part of my father’s Chromosome 1 broke up for 3 of his children. Another way to look at it: the vertical lines are where my father’s maternal and paternal chromosomes combined in each of the 3 children that we are now looking at.

excel-bar-chart-chr1

Where:

  • Series 1 is Sharon. Where the color goes from blue to orange is where Sharon has a change from one paternal grandparent’s DNA to another paternal grandparent’s DNA. The number to the right of Series 1 is the Build 37 Chromosome position number for Sharon’s crossover.
  • Series 2 is Joel’s first crossover (between orange and gray) and
  • Series 3 is Heidi’s first crossover position between gray and yellow [The same explanation under Sharon above applies to Joel and Heidi]
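The crossover reasoning above can be turned into a rule: a crossover in one child changes whether that child matches each of the other two, while the other two keep their relationship. So the crossover belongs to the sibling outside the one pair that did not change. Here is a Python sketch of that rule, assuming only one sibling has a crossover at each pattern change (a double crossover at the same spot would fool it):

```python
from itertools import combinations
from typing import Dict, Tuple

SIBLINGS = ("Joel", "Sharon", "Heidi")  # the J, S, H order of the pattern letters

def matching_pairs(pattern: str) -> Dict[Tuple[int, int], bool]:
    """For each pair of siblings, whether they share a letter in the pattern."""
    return {(a, b): pattern[a] == pattern[b] for a, b in combinations(range(3), 2)}

def crossover_sibling(old: str, new: str) -> str:
    """Name the sibling whose single crossover explains the change old -> new."""
    before, after = matching_pairs(old), matching_pairs(new)
    unchanged = [pair for pair in before if before[pair] == after[pair]]
    if len(unchanged) != 1:
        raise ValueError(f"{old} -> {new} is not a single crossover")
    a, b = unchanged[0]
    (odd,) = {0, 1, 2} - {a, b}  # the sibling outside the unchanged pair
    return SIBLINGS[odd]

print(crossover_sibling("ABA", "AAA"))  # Sharon
print(crossover_sibling("AAA", "ABB"))  # Joel
print(crossover_sibling("ABB", "ABA"))  # Heidi
```

These three calls give the same assignments as the reasoning for the first three Chromosome 1 changes.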

I’ll go back to the M MacNeill Standard. It’s like having an answer sheet to my questions.

macneill-chr1-hilite

According to MacNeill, I have assigned the crossovers to the correct siblings. In the above chart, just look at the red. I haven’t gotten to the maternal part yet, which MacNeill has in blue. The first 3 crossovers are where the red changes from light to dark or from dark to light. One difference in the MacNeill Chart is that it has a separate bar for each sibling. The other difference is that MacNeill has Build 36 Chromosome position numbers, and the numbers I have are from Build 37.

The Process

  1. Phase the siblings into maternal and paternal DNA using the principles that Athey outlines
  2. Find the paternal and maternal crossovers by pattern changes
  3. Assign the crossovers to the correct sibling using the pattern changes
  4. Assign the segments to the correct grandparent. This requires knowledge of cousin matches on the appropriate grandparent side.

That is the big picture, which I understand as long as I don’t get too lost in the details.

Back to the Details: Fill in More A’s, G’s, C’s and T’s

I have been setting up my data for this, so hopefully, this will be easy. I now have 3 areas to look at:

  • AAB
  • ABA
  • ABB
AAB paternal update

Now I go back to my spreadsheet and sort it by Dad Pattern:

sort-by-pattern

The Start and Stop areas are the ones I want to update. First, I’ll copy my most up to date Table in Access which is tblSibHetorzygous. I’ll rename that tblDadPatternUpdate. Then I want to look for missing data and update the blanks using the AAB pattern.

In Access, I create a query with the new table.

dad-pattern-update-1

I chose the position fields and Paternal Pattern fields. I will change this to an update query, which adds an Update To row. The criterion I want is JoelFromDad = SharonFromDad (AAB). Actually, I forgot: I was going to use ID criteria. So in the ID field, I need a lot of information. For the first AAB segment, I need everything between ID 45393 and 54155. This is what the criteria look like:

aab-first-area

When I choose that area, I get over 8,000 lines. However, I only want to update when there is one missing value in the first 2 and the one that isn’t missing is not equal to the third. Here is the result of the above query in my first AAB area:

aab-patterns

I assume that the first blank should be a T. This would be one of the AAA results by chance in an AAB area. I don’t want to fill in the second line as I don’t know if it will be GGG or something else. That is what I meant by saying I don’t want to fill anything in unless there is only one missing value. In the 5th line there is A?G. That would have to be AAG (in an AAB Pattern area). There are some lines that have everything missing that I don’t want to touch.
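The condition described above (exactly one blank among Joel and Sharon, and the known one different from Heidi) fits in one small function. Here is a Python sketch with (Joel, Sharon, Heidi) bases from Dad and None for a blank; the function name is hypothetical. Note that it deliberately leaves a ?TT row alone even though T is the likely answer there, just as the stated condition would.

```python
from typing import Optional, Tuple

Bases = Tuple[Optional[str], Optional[str], Optional[str]]  # (Joel, Sharon, Heidi) from Dad

def fill_aab(bases: Bases) -> Bases:
    """Fill a blank Joel or Sharon base in an AAB area, but only when it is safe.

    Acts only when exactly one of Joel/Sharon is blank, Heidi's base is known,
    and the known Joel/Sharon base differs from Heidi's (so the row really
    reads A?B or ?AB rather than an AAA-by-chance row).
    """
    joel, sharon, heidi = bases
    if heidi is None or (joel is None) == (sharon is None):
        return bases  # both blank or both present: nothing safe to do
    known = joel if joel is not None else sharon
    if known == heidi:
        return bases  # looks like AAA by chance, so leave it
    return (known, known, heidi)

print(fill_aab(("A", None, "G")))   # ('A', 'A', 'G')
print(fill_aab((None, None, "G")))  # (None, None, 'G'): left alone
print(fill_aab((None, "T", "T")))   # (None, 'T', 'T'): AAA by chance, left alone
```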

How to create a query?

First, I want the situation where Heidi doesn’t equal Joel or Heidi doesn’t equal Sharon. That would create an AAB situation:

heid-not-joel-or-sharon

This query results in 1,666 rows of data, including rows that are already filled in. Note that I had to write the range of IDs twice because, in order to get an OR situation, I needed to put Joel not equal to Heidi and Sharon not equal to Heidi on separate lines. A simpler query is this one:

heidi-not-joel-or-sharon-one-line

The above achieves the same results in one line. Now, for this query, if Joel is blank, replace it with Sharon’s results. If Sharon is blank, replace it with Joel’s results. Here is the query prior to the updating part:

joel-sharon-blanks

This shows that there are 29 blanks for Joel and Sharon meeting this AAB criteria in the first range of AAB’s:

29-records-aab

Next, I apply the same logic to all the AAB segments. In the Expression Builder of Access, I type in this simple formula:

Between 45393 And 54155 Or Between 60990 And 72548 Or Between 207109 And 220679 Or Between 313271 And 317516 OR Between 326845 And 326912 OR Between 389395 And 390311 OR Between 400045 And 405578 OR Between 419982 and 427158 OR Between 433191 And 446672
OR Between 482297 And 492542 OR Between 532520 And 539292 OR Between 571557 And 579594 OR Between 589614 And 589666 OR Between 630037 And 630314 OR Between 630319 And 630378 OR Between 658744 And 659375 OR Between 670533 And 672360 OR Between 673325 And 682544

Simple but long. This has the AAB Starts and Stops for 23 chromosomes. Then I copy it into the next ID criteria line and get this result:

all-missing-aabs

It took a few minutes to type the criteria, but the goal is to update 1,514 lines of missing Paternal Pattern data with the push of one button. I still think it is quicker than going line by line, and it will be more accurate if I got the criteria right.
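The long Between ... Or ... criteria is really just a list of Start/Stop pairs. Here is a Python sketch, assuming the 18 AAB ranges typed above, that checks whether an ID falls in any range (Access Between includes both ends) and rebuilds the criteria text from the list. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# The 18 AAB Start/Stop ID ranges from the criteria above.
AAB_RANGES: List[Tuple[int, int]] = [
    (45393, 54155), (60990, 72548), (207109, 220679), (313271, 317516),
    (326845, 326912), (389395, 390311), (400045, 405578), (419982, 427158),
    (433191, 446672), (482297, 492542), (532520, 539292), (571557, 579594),
    (589614, 589666), (630037, 630314), (630319, 630378), (658744, 659375),
    (670533, 672360), (673325, 682544),
]

def in_ranges(row_id: int, ranges: Iterable[Tuple[int, int]]) -> bool:
    """Same test as 'Between a And b Or Between c And d ...' (both ends inclusive)."""
    return any(start <= row_id <= stop for start, stop in ranges)

def to_access_criteria(ranges: Iterable[Tuple[int, int]]) -> str:
    """Rebuild the long Between ... Or ... criteria text from the list."""
    return " Or ".join(f"Between {start} And {stop}" for start, stop in ranges)

print(in_ranges(682124, AAB_RANGES))  # True: inside 673325-682544
print(in_ranges(197704, AAB_RANGES))  # False: between two AAB ranges
```

Keeping the ranges in one list means the same criteria text can be generated once and pasted into each query that needs it.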

Next, I change the above Select Query to an Update Query.

paternal-aab-update-query

When my (Joel’s) base from Dad is missing, I update to Sharon’s base. When Sharon’s base from Dad is missing her base is updated with mine. Isn’t sharing great? I didn’t look at the case where Heidi’s base from dad was missing, because if that was missing we wouldn’t be able to see any AAB Pattern.

Let’s Update

I push the run button and check the results. Here is my standard dire warning:

standard-dire-warning

Now I will check if it worked. I’ll try ID or Line # 682124:

bad-aab-results

Unfortunately, that was an undesirable result. Before, I had A?G; the query changed this to ?AG. It appears that my query replaced my A with Sharon’s blank and also replaced Sharon’s blank with my A, so the two simply swapped. I hadn’t expected that. Next, I’ll check ID# 682182. There I had ?AG, and it was replaced with A?G. So until I can think of a solution, I’ll need to split the 2 queries.
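Here is my best guess at what happened, as a Python sketch. It assumes, as the results suggest (and as a standard SQL UPDATE behaves), that Access evaluates every Update To expression against the row’s original values. In that case one query that updates Joel from Sharon and Sharon from Joel simply swaps them when one is blank. The function names are mine:

```python
from typing import Optional, Tuple

Base = Optional[str]  # None stands in for a blank cell

def both_in_one_query(joel: Base, sharon: Base) -> Tuple[Base, Base]:
    """Both Update To expressions read the row as it was before the update."""
    new_joel = sharon  # JoelFromDad <- SharonFromDad (old value)
    new_sharon = joel  # SharonFromDad <- JoelFromDad (old value)
    return new_joel, new_sharon

def two_separate_queries(joel: Base, sharon: Base) -> Tuple[Base, Base]:
    """The first query fills only a blank Joel; the second fills only a blank Sharon."""
    if joel is None:
        joel = sharon
    if sharon is None:
        sharon = joel
    return joel, sharon

print(both_in_one_query("A", None))     # (None, 'A'): A?G became ?AG
print(both_in_one_query(None, "A"))     # ('A', None): ?AG became A?G
print(two_separate_queries("A", None))  # ('A', 'A')
```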

Fix it! Quick!

First, I recopied my Heterozygous Sibling Table back to the Dad Pattern Update 1 Table. This got the table back to the way it was. Here is my simpler query.

dad-aab-simpler-query

Here if my base from Dad is null, replace it with Sharon’s base from Dad. I’ll check ID# 682182 again:

second-mistake

This gets into the category of trial and error. Sharon’s result still got replaced with nothing. In the previous query, I was still telling Access to update Sharon’s results with mine. I needed to take that out:

fix

There. Now the SharonFromDad Update To is blank. I go through the same procedures and now it looks right.

right-results

We now went from ?AG to AAG in the last 3 columns. These are the bases from Dad columns.

The next step is pretty easy:

sharon-missing-aab

I took out my criteria and put criteria in the SharonFromDad field. When she has a blank, replace it with Joel’s base from Dad. I hit run and it updated over 600 rows. Here is my original check spot at ID# 682124 with better results in the last 3 columns:

better-results

It took a while, but at least I got it right. The moral of the story is to not ask Access to do 2 things at once when those 2 things involve the same 2 people.

The Next Step: ABA

This time I’ll try a different query. I want there to be a B from the ABA in each case, so I’ll make sure that Sharon’s base from Dad is there:

aba-query

Maybe I’ll figure out what went wrong last time or come up with a new error. Above, I want the criteria on the first line to be for my blank base: if Sharon’s base from Dad is not equal to Heidi’s base from Dad, put Heidi’s base from Dad in my blank spot. For Heidi, when Joel’s base from Dad doesn’t equal Sharon’s base from Dad, put Joel’s base in Heidi’s spot.

I’m so tempted to try this query, but before I do, I’ll copy the previous table of the DadPatternUpdate to a new Dad Pattern Update ABA Table.  This will preserve what I have in the now older DadPatternUpdate Table in case anything goes wrong. Hey, what could go wrong?

query-aba-dad

I pushed the Update Button and updated over 30,000 rows. The results don’t appear to be any better, so I’m back to my 2 step process.

Here is my new slimmed down query:

slimmed-down-query

This new Update Query should update my Line 18 in the new UpdateABA Dad Pattern Table and it does:

lne-18

I now have a full ABA pattern on that line. According to Access over 30,000 Lines were updated, so it wasn’t a total waste of time.

heidi-aba

Run and check Line 149:

check-149

We have ABA in the last 3 columns, so that is good. Line 18 is still OK. I checked it just to make sure.

Query AAB Revised

After seeing how well the ABA Query went, I decided to revise the old AAB Query:

aab-query-rev

This is now looking at over 37,000 rows. This updates my AAB Blanks to tblDadPatternAAB. I don’t know if it is a better query, but at least I’m being consistent.

sharon-missing-aab-rev

This was over 80,000 rows, so I’ll assume that bigger is better.

I copied that resulting Table to tblDadPatternUpdateABA and reran the 2 ABA Update Queries. Here is one of the rerun queries updating the ABA Paternal Table:

rerun-aba

Down to ABB

My last updated Paternal Table was the ABA update, so I’ll copy that to a new Table called tblDadPatternUpdateABB. I’ll also copy my last query and put in the appropriate Starts and Stops for the paternal ABB patterns. Here is Part 1:

abb1

This says when Joel’s base from dad is not the same as Heidi, put that Joel from Dad into the space. Probably a more precise query would have said when Sharon from Dad is null and Joel from Dad is not equal to Heidi from Dad. I suppose technically the above query could be writing over a base with the same base in most cases.

I’ll fix that and notice that I had the wrong table in the top, so I’ll change that also.

abb-rev

This only updated 944 rows, so maybe bigger is not better. Here is Part 2:

abb2

This was almost 3,000 rows updated. Now I should check if it worked. I scrolled for an ABB Pattern in an old query and found this:

dad-pattern-abb

Here is my check:

abb-check

I guess I’ve been working too long. Here I have an AAB instead of the ABB I wanted. That is because I had Heidi updated to me (the A) instead of Sharon (the B). Here is the correction:

abb-corrections

I made a fresh Table of ABB. When I opened up the Query, it was saved this way:

corrected-abb

So Access changed my query. Note that there are 2 fields with HeidiFromDad in them. One is for the Update To and the other has Criteria. That is probably a clearer way to do it. Who am I to argue with Access?

I updated that and I take a cue from Access for Part 2:

access-abb-part-2

In English, the above says, “For this range, when JoelFromDad is not blank but Sharon from Dad is, and Joel from Dad has a different value than Heidi from Dad, put that Heidi from Dad value where Sharon had the blank.” It sounds a little complicated.

Back to Row 197704 and I’ll look at 197709 while I’m at it:

corrected-abb-pattern

Oh no, it is still wrong! I checked the previous ABA Table and that was the reason for the error. The error is also in the old AAB Table. However, the error was not in the file before that. My guess is that the AAB rule got applied to the wrong range of rows. I don’t see an error there, so I’ll have to rerun all the queries.

That’s OK, because I’m brushing up on the queries and will use the Is Null value so we will only be filling in the missing bases.

rev-aab-query

I had more problems, so I deleted the AAB Table and recopied the previous Table into it. I reran the Revised AAB Query halfway and it looked OK. However, when I ran the second half of the AAB query – filling Sharon’s results, the problem came back at ID# 197704. Very mysterious. The problem was where I thought it was originally. Look at the ID Criteria for the AAB Pattern Query:

the-problem

There is an extra digit in the first Between. The range goes from 45393 to 544155; the second number should be 54155. So this query was performed on about 490,000 more rows than intended. I updated the AAB query with fewer rows. Again, fewer is better. After many requeryings, I got the desired result for ID# 197704:

197704

That should be the end of the first phase of nitpicky work on the Paternal Side.
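A typo like 544155 for 54155 is easy to catch before running anything, because the bad range swallows several of the other AAB ranges. Here is a Python sketch of a sanity check over Start/Stop pairs. The function name and the 100,000-ID length threshold are arbitrary assumptions for illustration, not a genetic limit.

```python
from typing import List, Optional, Tuple

def range_problems(ranges: List[Tuple[int, int]], max_len: int = 100_000) -> List[str]:
    """Flag Start/Stop ID ranges that are reversed, suspiciously long, or overlapping."""
    problems: List[str] = []
    widest: Optional[Tuple[int, int]] = None  # earlier range reaching furthest right
    for start, stop in sorted(ranges):
        if stop < start:
            problems.append(f"reversed: {start}-{stop}")
        elif stop - start > max_len:
            problems.append(f"very long: {start}-{stop} ({stop - start} IDs)")
        if widest is not None and start <= widest[1]:
            problems.append(f"overlap: {start}-{stop} starts inside {widest[0]}-{widest[1]}")
        if widest is None or stop > widest[1]:
            widest = (start, stop)
    return problems

typo = [(45393, 544155), (60990, 72548), (207109, 220679)]
for problem in range_problems(typo):
    print(problem)
# very long: 45393-544155 (498762 IDs)
# overlap: 60990-72548 starts inside 45393-544155
# overlap: 207109-220679 starts inside 45393-544155
```

The overlap check alone would have flagged the mistake, since no two segments of the same pattern should share any IDs.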

Summary, Conclusion and What’s Next

  • This was a lot of work, but the good news is that this update is for all the Chromosomes at once.
  • The bad news is that I have to do this again for the Maternal Side
  • Next up should be easy. That is just re-applying the principles that Whit Athey outlined to the new bases that I added from knowing the patterns. This should update missing maternally received bases from the updated paternally received bases.
  • I haven’t filled in blanks for the AAA patterns yet.
  • I am a little ahead of the game as I looked at how some of the first paternal crossovers will look.
  • Also with some basic phasing, I was able to deduce who those first paternal crossovers belonged to – one each to my two sisters and one for me.
  • If anything can go wrong, it will.

More Hartley DNA – Patricia’s DNA

This blog is a follow-up on my last Blog: My Hartley Autosomal DNA. I was inspired to write that blog following this year’s Hartley reunion in Rochester, Massachusetts. I intended to send around a little poster I made up about Hartley DNA and get a DNA sample from my father’s cousin Martha, but didn’t get a chance to. Instead I wrote a blog. I did talk to Patricia though. She is my second cousin and the sister of my childhood best friend, Warren. She had taken an AncestryDNA test. I think her daughter bought it for her. I asked if she could upload her DNA to gedmatch.com and she said that her daughter would be good at doing that.

Here are Patricia’s 2 brothers and Patricia. The one in the middle was my best friend in my first 6 years of school. I remember seeing home movies of Curtis, Warren’s older brother. He came to one of my older siblings’ birthday parties when he was about this age.

Patricia and family

In my last blog, I wrote about the Hartley DNA matches my father’s first cousin Jim had with me and my 2 sisters. I was surprised to find out that every match that we had represented one of my four 2nd Great Grandparents. They were all born around the 1830’s. It turns out that Patricia’s matches with cousin Jim represent the same four 2nd great grandparents. In addition Patricia’s DNA matches with my 2 sisters and me represent the same four old timers.

Here is what my DNA match to Patricia looks like at AncestryDNA:

Patricia Ancestry

Here, AncestryDNA has it right that we are 2nd cousins. They show we match for a total of 206 cM (centimorgans) across 14 DNA segments. That is about all you can get out of ancestry. They won’t tell you which chromosomes we match on or how much we match on each chromosome. That is why people upload their results to gedmatch.com. Ancestry does show other people that match DNA to both Patricia and me. These are my 2 sisters and 5 others. All these people also descend from the same Rochester Hartley ancestors, but none of them have uploaded their results to gedmatch.com, so we don’t know their detailed DNA matching information.

Here is the same match between Patricia and me at Gedmatch:

Pat Joel Gedmatch

Ancestry has 14 segments vs. the 8 at Gedmatch. But at Gedmatch we know on which chromosome we match, how much on each chromosome and the exact start and stop location on the Chromosome. However, even with Ancestry’s 14 segments, their total is a bit smaller. Here is how I match Patricia on Chromosome 15 in the Gedmatch Chromosome Browser:

Joel Pat Chr 15

The blue areas represent the two DNA matches Patricia and I have on Chromosome 15.

Patricia on the Hartley Family Tree

Growing up, Patricia’s grandmother was my great aunt and also one of my neighbors, my Aunt Mary.

Patricia's Tree

The bottom box in each row shows the person who has tested their DNA and uploaded it to gedmatch.com. I now show 3 of the 13 children of James Hartley and Annie Louisa Snell (James, Mary and Annie). I now can check how my sisters and I match Patricia’s DNA as well as how Patricia matches Jim’s DNA.

Here are my great grandparents and three of their older children.

James and Annie Hartley

It is an interesting photo. Two of the children are looking away. I think that one is my grandfather James. The mother, Annie, is looking at something in her hands. The older son Dan is looking at a book and the father James doesn’t look comfortable being dressed up.

Patricia’s DNA at Gedmatch

One of the basic functions at gedmatch is called ‘One to Many’. In this case, I took Patricia’s DNA and compared it with everyone else who has ever uploaded their DNA results to gedmatch. Here are her first 4 matches:

Patricia's 1st 4 matches

Not surprisingly, her top matches are her 1st cousin once removed, Jim, me and my sisters Sharon and Heidi. The Gen column lists how far away gedmatch thinks Patricia’s matches are to a common ancestor. Patricia and I are 3 generations to James Hartley and Annie Snell, so that is right. Patricia shows 2.6 generations to a common ancestor with her match to Jim. A first cousin once removed would typically be 2.5 generations, so she shares a little less DNA than average here with Jim. Patricia also shares 19.3 cM of the X Chromosome with cousin Jim, which I find interesting.
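To see where numbers like 2.5 come from, here is a minimal Python sketch of the usual expected-value arithmetic. It is not gedmatch’s actual Gen estimator. The function names are my own, and the genome total of about 7,000 cM (counting both copies of the autosomes) is an approximate figure I am assuming; published totals vary with the genetic map used.

```python
# Expected shared autosomal DNA for relatives who descend from the same
# ancestral couple. A simple textbook model, NOT gedmatch's Gen estimator.

# Assumption: about 7,000 cM of autosomal DNA, counting both copies.
GENOME_CM = 7000.0


def relationship_coefficient(gens_a, gens_b, shared_ancestors=2):
    """Expected fraction of DNA shared by two relatives.

    gens_a, gens_b: generations from each relative up to the common ancestors.
    shared_ancestors: 2 for a full couple (cousins), 1 for half relationships.
    Each generation halves what is passed down, so each shared ancestor
    contributes (1/2) ** (gens_a + gens_b).
    """
    return shared_ancestors * 0.5 ** (gens_a + gens_b)


def average_generations(gens_a, gens_b):
    """Average distance to the common ancestors, comparable to a Gen column."""
    return (gens_a + gens_b) / 2


# Generations up to James Hartley and Annie Louisa Snell, per the tree above.
pairs = {
    "Patricia and me (2nd cousins)": (3, 3),
    "Patricia and Jim (1st cousins once removed)": (3, 2),
}
for label, (a, b) in pairs.items():
    r = relationship_coefficient(a, b)
    print(f"{label}: Gen {average_generations(a, b)}, "
          f"{r:.3%} shared, about {r * GENOME_CM:.0f} cM expected")
```

Under these assumptions the model expects roughly 219 cM for second cousins, in the same neighborhood as the 206 cM Patricia and I share. Real amounts scatter widely around these averages.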

The Hartley X Chromosome

I’m taking the X Chromosome out of order because I find it interesting. The most important thing to know about the X Chromosome is this: if you are a male, you get one from your mother; if you are a female, you get one from your mother and one from your father. My father only got an X chromosome from his Frazer mother, so he doesn’t match anyone further up on the Hartley line by the X Chromosome. However, Patricia and Jim each got an X from their mothers, and those maternal X Chromosomes do carry up the Hartley line.

Here is how Jim got his X Chromosome from his mother and her ancestors:

Jim's X Inheritance

Jim only inherited his X Chromosome from those ancestors in pink or blue. So, for example, he got no X Chromosome from any Bradford before Harvey Bradford.

We need to compare Jim’s chart with Patricia’s X Inheritance Chart:

Patricia's X Inheritance

Here I didn’t show the X Chromosome that Patricia got from her father as this won’t match Jim. Then of what I show, only the bottom half will match Jim. This means that going back 4 generations from Patricia, she could match Jim by the X Chromosome on the Emmet, Snell or Bradford Line. One other difference between Jim and Patricia is that Jim got 100% of his total X Chromosome from his mother and Patricia only got 50%. However, that is a confusing way to put it, because Patricia got 2 X Chromosomes. The one she got from her mother is a complete X Chromosome, so it carries the same amount of X DNA as Jim’s single X.
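The X inheritance rule is mechanical enough to sketch in a few lines of Python. This is my own illustration, not a gedmatch feature. The pedigree dictionary holds only the part of Jim’s tree named in this blog; Jim’s father is left out because a son never gets an X from his father.

```python
# X inheritance rule: a son gets his only X from his mother; a daughter
# gets one X from each parent.

# name -> (sex, mother, father); only the names used in this blog.
PEDIGREE = {
    "Jim": ("M", "Annie Louisa Hartley", None),
    "Annie Louisa Hartley": ("F", "Annie Louisa Snell", "James Hartley"),
    "James Hartley": ("M", "Ann Emmet", "Greenwood Hartley"),
    "Annie Louisa Snell": ("F", "Hannah Bradford", "Isaiah Snell"),
}


def x_ancestors(person, pedigree, generations):
    """Ancestors `generations` back who could have passed X-DNA to person."""
    current = [person]
    for _ in range(generations):
        parents = []
        for name in current:
            sex, mother, father = pedigree.get(name, (None, None, None))
            if mother:
                parents.append(mother)  # everyone gets an X from their mother
            if sex == "F" and father:
                parents.append(father)  # only daughters get their father's X
        current = parents
    return current


def x_ancestor_counts(sex, generations):
    """How many ancestors per generation can contribute X-DNA."""
    males, females = (1, 0) if sex == "M" else (0, 1)
    counts = []
    for _ in range(generations):
        # Every carrier has a contributing mother one generation up;
        # only female carriers also have a contributing father.
        males, females = females, males + females
        counts.append(males + females)
    return counts


print(x_ancestors("Jim", PEDIGREE, 3))
print(x_ancestor_counts("M", 4), x_ancestor_counts("F", 4))
```

Three generations back from Jim, this returns Hannah Bradford, Isaiah Snell and Ann Emmet, but never Greenwood Hartley. The counts follow the Fibonacci pattern: 1, 2, 3, 5 ancestors for a male and 2, 3, 5, 8 for a female.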

Here is what the X Chromosome match looks like between Patricia and Jim at gedmatch.com on their browser:

Jim Patricia X Match

The yellow part with the blue under it is where they match at the end of the X Chromosome. That is enough on my X diversion for now.

Back to the Hartley DNA Matches on the Other 22 Chromosomes

At gedmatch, I go to Jim’s ‘One to Many’ matches to see how he matches my family and Patricia. Here are Jim’s top 4 matches. You may have already guessed who they are:

Jim's top 4 matches

Above, I said that Patricia matched Jim a little less than expected. My sister Heidi at the top of the list matches him a little more than average.

Here are Jim’s DNA matches on Chromosome 1:

Pat Chr 1

  1. Me
  2. Heidi
  3. Sharon
  4. Patricia

Here Patricia has identified a new piece of DNA in green that is a Hartley ancestor that we didn’t know about before. Again, this “Hartley” ancestor may be Hartley, Emmet, Snell or Bradford.

Here is another new Hartley segment on Chromosome 2:

Pat Chr 2

Patricia matched Jim on Chromosome 2. My sisters and I had no match with Jim on that Chromosome.

It looks like Patricia got a double segment of Hartley DNA on Chromosome 5:

Patricia Chr 5

Patricia is #1 above. The point where the color changes from orange to yellow likely represents a change from Greenwood Hartley to Ann Emmet DNA or from Isaiah Snell to Hannah Bradford DNA.

Patricia Helping Me Map My Chromosome 7

I’ve tried to map all my chromosomes as well as my 2 sisters’ to my 4 grandparents. I got a little stuck on Chromosome 7:

Chr 7 Map Pat

My chromosome 7 depiction is the one with the J to the left of it. On my paternal side (the blue (FRAZER) and red bar), I have the DNA I got from my dad’s mother in blue and from my dad’s Hartley father in red. Above that is the gedmatch depiction of how I match my 2 sisters by DNA and how they match each other. The bright green bar is called the Fully Identical Region or FIR. Wherever it occurs, one sibling matches the other by having received the same DNA from the same 2 grandparents (one maternal and the other paternal). So in comparing Sharon to Heidi, they have that FIR from 0 to 25 (in millions). It turns out that their 2 grandparents there were their mother’s mother (Lentz) and their father’s father (Hartley). In the tiny section between 0 and 4, I have what is called a Half Identical Region or HIR. That means that I shared one of those grandparents’ DNA with my sisters, and on the other side I got DNA from the other grandparent instead. In this case I had to share either the Lentz or the Hartley grandparent with my 2 sisters, but I didn’t know which.

That is where Patricia’s results came in handy. Here is how she matches Sharon, Heidi and me:

Patricia Chromosome 7

Patricia has 3 good matches with Sharon and Heidi and one tiny one with me (#3 on the Chromosome Browser). However, the tiny one is the one I need. The pink match shows that my Chromosome 7 from 0-4 (in millions) is where I got my DNA from my Hartley grandfather and not my Frazer grandmother.

Here is my completed Chromosome 7 thanks to Patricia. I extended the Rathfelder on my Chromosome 7 all the way to the left or beginning and added a small chunk of red Hartley from my grandfather.

Chr 7 complete
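The reasoning behind this Chromosome 7 fix can be written as a small Python sketch. I am assuming that each sibling’s DNA at one spot can be described by one paternal and one maternal grandparent; the function names are hypothetical and this is not gedmatch’s code. Two siblings with both grandparents in common are FIR, one in common is HIR, and none means no match.

```python
from itertools import product

PATERNAL = ("Hartley", "Frazer")     # my dad's father and mother
MATERNAL = ("Lentz", "Rathfelder")   # my mom's two parents


def compare(a, b):
    """Compare two siblings' (paternal, maternal) grandparents at one spot."""
    same = sum(x == y for x, y in zip(a, b))
    return {2: "FIR", 1: "HIR", 0: "none"}[same]


def possible_states(observations):
    """All (paternal, maternal) states consistent with the observed comparisons.

    observations: list of (sibling_state, observed_comparison) pairs.
    """
    return [
        me for me in product(PATERNAL, MATERNAL)
        if all(compare(me, sib) == seen for sib, seen in observations)
    ]


# Chromosome 7, roughly 0-4 million: Sharon and Heidi are FIR with
# Hartley (paternal) and Lentz (maternal); I am HIR with each of them.
sharon = heidi = ("Hartley", "Lentz")
options = possible_states([(sharon, "HIR"), (heidi, "HIR")])
print(options)  # two possibilities, so the siblings alone can't decide

# Patricia's match says this stretch is from my Hartley grandfather.
print([s for s in options if s[0] == "Hartley"])
```

With Patricia’s evidence added, only Hartley on my paternal side and Rathfelder on my maternal side is left, which is what the completed map shows.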

Another Type of Chromosome Mapping

There is another type of Chromosome Mapping developed by Kitty Munson. The way the Munson Mapping is generally used is to map out your relatives’ common ancestors. In the case of Patricia and Jim, our common ancestors are James Hartley and Annie Louisa Snell. Here is what my new Chromosome Map looks like with the addition of Patricia’s DNA matches with me shown in blue.

New Kitty Map for Joel based on Pat

Well, that’s about enough for Patricia’s DNA for now.

Summary and Conclusions

  • Patricia shared the first Hartley X Chromosome match that I’ve seen.
  • The X tends to shy away from the male line, so Patricia and Jim’s match is more likely down somewhere in the Massachusetts colonial line rather than the English Line.
  • I would like to use Hartley DNA to break through the Hartley genealogical brick wall. Right now I’m stuck in the early 1800’s in Trawden, England. There were too many Hartleys there with the same first name to figure out who was who. Patricia’s DNA may help in finding matches to other Hartleys.
  • Patricia’s DNA helped me in mapping my chromosomes in 2 different ways.

My Hartley Autosomal DNA

I have written many blogs on DNA but I don’t think that I have written about my Hartley autosomal DNA. Autosomal DNA is the kind of DNA that AncestryDNA tests, and Ancestry claims to have tested over 2 million people. Autosomal testing looks at the DNA we get from both our parents and their parents and so on until the DNA runs out. And it does run out for some ancestors at some point. Due to this effect, very little of my DNA is actually Hartley DNA. If you think about it, I got half of my DNA from my father, but he got half of his from his father, his father got half of his from his father, and so on.
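The halving can be written out with exact fractions. Here is a minimal Python sketch; the numbers are expected averages only, since the random blending at each generation means real amounts vary.

```python
from fractions import Fraction


def expected_share(generations_back):
    """Expected fraction of my DNA from one ancestor n generations back."""
    return Fraction(1, 2) ** generations_back


names = ["parent", "grandparent", "great grandparent", "2nd great grandparent"]
for n, name in enumerate(names, start=1):
    # There are 2**n ancestors in generation n, each expected to give the same share.
    print(f"{name}: {expected_share(n)} each, {2 ** n} ancestors")
```

On average, a single 2nd great grandparent like Greenwood Hartley accounts for only 1/16 of my DNA.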

Paternal DNA from Maternal DNA

The best way to get your paternal DNA is to test your father. This avenue was not available to me. However, I was able to test my mother. Gedmatch.com has a utility available that will separate out the DNA I got from my mom from that which I got from my dad. That utility does not recreate my dad’s DNA, but it does recreate most of the portion of DNA that I got from him.

Here is what the utility looks like. It is quite simple to use and works quickly.

Phased Data Generator

Once I have this information, I can run the results against all my matches to find out which of my matches are from my dad and which are from my mom. There are also those that match neither, which may be considered false matches. This takes out a lot of the guesswork with our matches. It makes life twice as easy.
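The one-parent phasing described above comes down to simple logic at each SNP. This is not gedmatch’s code; it is a minimal Python sketch assuming genotypes are two-letter strings like "AG", and that no-calls show up as "0" or "-" (an assumption about the raw files).

```python
NO_CALL = {"0", "-"}  # assumed no-call markers in raw data files


def phase_with_mother(child, mother):
    """Split a child's genotype into (maternal, paternal) alleles.

    Genotypes are two-letter strings such as "AG". Returns None when the
    SNP can't be resolved or the data look inconsistent.
    """
    if len(child) != 2 or len(mother) != 2 or NO_CALL & set(child + mother):
        return None  # missing or malformed data
    c1, c2 = child
    if c1 == c2:
        # Homozygous child: the same letter came from each parent.
        return (c1, c2) if c1 in mother else None
    from_mom_1, from_mom_2 = c1 in mother, c2 in mother
    if from_mom_1 and from_mom_2:
        return None  # e.g. child AG, mother AG: can't tell who gave which
    if from_mom_1:
        return (c1, c2)
    if from_mom_2:
        return (c2, c1)
    return None  # mother has neither letter: likely a read error


for child, mom in [("AG", "AA"), ("AG", "GG"), ("AG", "AG"), ("CC", "CT")]:
    print(child, mom, "->", phase_with_mother(child, mom))
```

Where mother and child are both heterozygous, one parent alone can’t settle the SNP. That is one reason a utility like this recreates most, but not all, of the DNA I got from my dad.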

Paternal DNA from Testing a Paternal Relative

The other way to find paternal (that is, Hartley) DNA is to test a paternal or Hartley relative. That is why I went to my father’s cousin Jim and asked him to take a DNA test. He was willing, and I have some Hartley matches. I had also tested myself and my two sisters. Here is what Jim’s DNA results look like compared to me and my 2 sisters on a Chromosome Browser:

Hartley DNA

I find this graphic interesting. It shows that Jim matches me and my 2 sisters on almost every chromosome. The last chromosome is the X Chromosome. It was cut off a bit. However, Jim could not match us on the X as my father only got his X Chromosome from his mother, who was a Frazer and not a Hartley. On Chromosome 13 my 2 sisters and I have pretty much the same match with Jim. The 3 bars are of equal length. On Chromosome 20, only my sister Sharon matches Jim. On Chromosome 11 we all match but at different amounts. My sister Heidi has the largest match there. Where we don’t match Jim, my family is busy matching the other 3 grandparents. Or perhaps Jim is busy matching on his father’s non-Hartley line.

What Do All Those Matches Mean?

All those matches represent Hartley DNA. But remember that I said that even our Hartley DNA consists of other families. So the answer is a bit more complicated. First I will show the Hartley genealogy relative to the DNA match between Jim and my family. That will help explain all these DNA matches. In the first line below, Greenwood Hartley was from Trawden, England. Ann Emmet was from Bacup, England. Isaiah Snell had non-Pilgrim colonial ancestors. Hannah Bradford had Pilgrim Colonial ancestors.

Greenwood DNA

I have those with Hartley DNA in green. Those that have no Hartley DNA are in blue.

Here is Greenwood Hartley and Ann Emmet:

greenwood

Probably Hannah Bradford and Isaiah Snell at their house in Rochester, Massachusetts:

Hannah Isaiah

Every match between Jim, me and my siblings represents a specific ancestor from the 1st line above.

The common ancestors between Jim and me are James Hartley born 1862 and Annie Louisa Snell born 1866, but the DNA represented between Jim and me actually came from their parents, who were all born around the first third of the 1800’s. This was just made clear to me within the last few days. I know, it gets confusing. Out of the 1/4 of my DNA that is Hartley (as I have 4 grandparents), on average only 1/4 of that quarter, or 1/16 of my DNA, goes back to Greenwood Hartley himself. That means that every orange, blue or green bar in the first image represents one of the 4 ancestors from the early 1800’s above.

How We Get Our DNA

When we were conceived, we got our own blend of DNA. That DNA was really from our 4 grandparents. We got equal amounts from our mom and dad, but the amounts we got from their parents were blended, and we may not have gotten exactly 25% from each of our grandparents. We all actually have 2 of each chromosome. One is paternal and one is maternal. For example, the siblings James Hartley b. 1891 and Annie Louisa Hartley b. 1902 received on their paternal chromosomes alternating segments of Greenwood Hartley and Ann Emmet DNA. Likewise, on their maternal chromosomes, they had alternating DNA from Isaiah Snell and Hannah Bradford. Those mixtures of their 4 grandparents were passed down to Jim, me and my 2 sisters and are represented in the Family Tree DNA Browser that I show above and again below.
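The alternating segments can be imitated with a toy simulation in Python. This is a simplified model and an assumption on my part: crossovers land at random, on average about one per 100 cM, with no crossover interference. The 150 cM chromosome length is just an example.

```python
import random


def inherit_chromosome(length_cm, grandparents, rng):
    """Simulate one passed-down chromosome as alternating grandparent blocks.

    Returns a list of (start_cm, end_cm, grandparent) segments.
    """
    # Crossover positions: random gaps averaging 100 cM (one per Morgan).
    crossovers = []
    pos = rng.expovariate(1 / 100)
    while pos < length_cm:
        crossovers.append(pos)
        pos += rng.expovariate(1 / 100)

    current = rng.randrange(2)  # which grandparent the chromosome starts with
    segments, start = [], 0.0
    for end in crossovers + [length_cm]:
        segments.append((start, end, grandparents[current]))
        current = 1 - current  # each crossover switches grandparent
        start = end
    return segments


def share_by_grandparent(segments):
    """Fraction of the chromosome that came from each grandparent."""
    total = segments[-1][1] - segments[0][0]
    shares = {}
    for start, end, who in segments:
        shares[who] = shares.get(who, 0.0) + (end - start) / total
    return shares


rng = random.Random(2024)
segs = inherit_chromosome(150.0, ("Greenwood Hartley", "Ann Emmet"), rng)
for seg in segs:
    print(seg)
print(share_by_grandparent(segs))
```

Run it with different seeds and the split between the two grandparents rarely comes out at exactly 50/50, which is why we don’t get exactly 25% from each grandparent.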

How Can We Tell Which Segment Matches Which of the Four Ancestors?

For example, it would be nice to know if Heidi’s Chromosome 11 match with Jim shown in green below represents Hartley, Emmet, Snell or Bradford.

Hartley DNA

The best way to find out which segment represents which ancestor is to do additional testing.

Test:

  • A Hartley relative not related to Emmet, Snell or Bradford
  • An Emmet relative not related to Hartley, Snell or Bradford
  • Etc.

Well, I think you get the picture. Once one of these people is tested, they would be a reference, and any match Jim or my family had with them would be from that person’s line (Hartley, Emmet, Snell or Bradford). The problem is, where are these people? There may be Snells around not related to Hartleys, but I don’t know of many Hartleys not related to Snells. Sorry for the double negative.

Another way is to wait until one of these Snells not related to a Hartley shows up on a DNA match list. This doesn’t work for Ancestry matches because AncestryDNA doesn’t tell you which chromosome you match on. However, if they were to upload their results to gedmatch.com, then the segments could be identified.
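Once a reference tester does show up at gedmatch, checking which of our segments overlap theirs is simple bookkeeping. Here is a minimal Python sketch. The positions, the 1 million minimum overlap and the "Snell-only" and "Emmet-only" reference testers are made up for illustration, and a real check should also confirm that the reference tester matches my family on that same segment (triangulation), not just that positions overlap.

```python
def overlap_mb(a, b):
    """Overlap in millions of positions between two (chrom, start, end) segments."""
    if a[0] != b[0]:
        return 0.0
    return max(0.0, min(a[2], b[2]) - max(a[1], b[1]))


def label_segments(family_segments, reference_matches, min_overlap=1.0):
    """Tag each family segment with the reference ancestors it overlaps.

    reference_matches: ancestor -> list of segments where that ancestor's
    reference tester matches Jim.
    """
    labels = {}
    for seg in family_segments:
        for ancestor, ref_segments in reference_matches.items():
            if any(overlap_mb(seg, ref) >= min_overlap for ref in ref_segments):
                labels.setdefault(seg, []).append(ancestor)
    return labels


# Hypothetical example: Heidi's matches with Jim, and made-up reference
# testers' matches with Jim.
heidi_jim = [("11", 20.0, 60.0), ("13", 30.0, 50.0)]
reference = {"Snell": [("11", 45.0, 70.0)], "Emmet": [("5", 10.0, 20.0)]}
print(label_segments(heidi_jim, reference))
```

In this made-up example, the Chromosome 11 segment would be tagged Snell, and the Chromosome 13 segment would stay unknown until some other reference tester overlapped it.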

Why Do We Want to Identify These Segments?

Well, for one, some find it interesting to know where they got their DNA from. Another reason is that once these are identified, we know right away where to look for an ancestor match. For example, if we knew a match was on the Bradford side, we would look for a common matching ancestor descending from the Mayflower perhaps.

Summary and Conclusions

  • When I tested my Hartley father’s 1st cousin, I got a lot of DNA matches on most of my chromosomes
  • These matches represent 4 of my 2nd great grandparents
  • These four 2nd great grandparents represent Trawden and Bacup, England and Colonial Pilgrim and non-Pilgrim lines.
  • So far, I have not been able to figure out which colored bar represents which 2nd great grandparent.
  • There may be some advanced techniques that could help me tease those out. Or I may be able to find those out by testing appropriate relatives if found.
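  • The older generations are the best for testing, as the further you get from your ancestors, the less autosomal DNA you carry. The amount you carry from any one ancestor halves every generation, and the amount shared between cousins of the same generation drops by a factor of 4 with each step (1st to 2nd to 3rd cousin).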
  • Those relatives that have tested at Ancestry should upload their results to gedmatch.com for comparison.
  • One of my Hartley 2nd cousins has uploaded her DNA results to gedmatch.com and that will be the subject of my next Blog.

Slimming Down My Big Fat Chromosome 20

In a previous Blog, I mentioned My Big Fat Chromosome 20. I had discovered, for some reason, that more than one half of all my matches were on this Chromosome. This can be seen visually using a Swedish web site called dnagen.net.

dnagen circle chart

Here the default setting is at 200%. That means that only the matches that are twice as large as the median are shown. This program uses FTDNA matches. The match names are on the outside of the circle and the lines going between the names are what FTDNA calls ICW (In Common With). I just noted today that there is a group on this circle that doesn’t connect with others at about 9 o’clock on the circle. These matches like to stay in their own Chromosome apparently. They are in a dark color which I take to be Chromosome 3. However, that is an aside.

The real point is to show Chromosome 20 in the dark green in the lower right half of the circle. Chromosome 20 is the Hong Kong of Chromosomes. In a little space, I have a lot of matches. Remember that Chromosome 20 is one of the smaller Chromosomes. If I have about 4,000 matches, that means that over 2,000 of them are on Chromosome 20. In my previous Blog on Chromosome 20, I determined that these matches were on my Frazer grandmother’s side. Her 2 parents were born in Ireland. That means that these matches represented Irish matches and not Colonial American matches as I had previously assumed.
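To put the Hong Kong comparison in numbers, here is a quick Python check of how many matches Chromosome 20 would get if matches were spread in proportion to genetic length. Both lengths are approximate values I am assuming (roughly 110 cM for Chromosome 20 and roughly 3,500 cM for all 22 autosomes), and real match counts also depend on thresholds, so this is only a rough yardstick.

```python
CHR20_CM = 110.0        # assumed approximate length of Chromosome 20
AUTOSOMES_CM = 3500.0   # assumed approximate length of all 22 autosomes


def expected_on_chromosome(total_matches, chrom_cm=CHR20_CM, genome_cm=AUTOSOMES_CM):
    """Matches expected on one chromosome if spread evenly by length."""
    return total_matches * chrom_cm / genome_cm


expected = expected_on_chromosome(4000)
observed = 2000  # "over 2,000", per the text above
print(f"expected about {expected:.0f}, observed over {observed}, "
      f"about {observed / expected:.0f} times more")
```

Even with rough numbers, Chromosome 20 carries well over ten times its fair share of my matches.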

The Progression of Sorting Matches

Autosomal DNA matches may be grouped in different ways. When I first tested, I got a bunch of matches at FTDNA. I didn’t know who any of them were. FTDNA had suggested some relationships which were mostly optimistic. Here is some of the progression of how I have sorted my matches:

  1. Sorted by projected relationship or match level (cMs)
  2. Sorted by actual relationship if known
  3. Sorted by Chromosome. This option is not available at AncestryDNA. One has to upload the AncestryDNA results to gedmatch for this option. This is when I discovered all my Chromosome 20 matches.
  4. Sorted by Triangulation Groups. By using a Tier 1 option at Gedmatch or by finding by hand all the matches that match each other at a particular segment, I was able to find many Triangulation Groups (TGs)
  5. Sorted by Maternal or Paternal. All our valid DNA matches should match on either the maternal or paternal side. Once I tested my mother, I was able to phase my results at gedmatch and find out whether I matched other testers on my mother’s side or my father’s side. This was a big breakthrough for me. This cut down a lot of frustrating searches. For example, there are a lot of people that match my mother that have Frazer or Fraser ancestors. My Frazer ancestors are on my father’s side. Therefore, I knew that when looking for Frazers, I could eliminate all my mother’s matches who had them as ancestors and not worry about them.
  6. Sorted by other known matches. I had my father’s 1st cousin tested. This got to the level of my great grandparents on my Hartley side. However, it didn’t tell me which great grandparent. My Hartley great grandparent was a relatively recent immigrant from England. My non-Hartley great grandparent had ancestors going back to the Pilgrims in Massachusetts. I also had other relatives tested and found other matches that I knew I was related to.
  7. Another breakthrough happened after I had my 2 sisters tested. I used a method by Kathy Johnston to find out where you got all your DNA from your 4 grandparents by comparing your DNA results to 2 siblings. This method worked pretty well on most of my chromosomes. Now I knew where the DNA was coming from at my grandparent level for most of my matches. When I had a match, I could check my map to see which grandparent that match belonged to.
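The core test behind a Triangulation Group can be sketched in Python: I match A, I match B, and A matches B, all on an overlapping stretch of the same chromosome. This is my own minimal version, not the gedmatch Tier 1 tool. It ignores cM thresholds and uses made-up positions for hypothetical testers A and B.

```python
def common_region(*segments):
    """Region shared by all (chrom, start, end) segments, or None."""
    chrom = segments[0][0]
    if any(seg[0] != chrom for seg in segments):
        return None
    start = max(seg[1] for seg in segments)
    end = min(seg[2] for seg in segments)
    return (chrom, start, end) if start < end else None


def triangulates(me_a, me_b, a_b):
    """True if my match with A, my match with B, and A's match with B overlap."""
    return common_region(me_a, me_b, a_b) is not None


# Hypothetical segments in millions of positions on Chromosome 20.
me_a = ("20", 30.0, 44.0)
me_b = ("20", 35.0, 47.0)
print(triangulates(me_a, me_b, ("20", 38.0, 50.0)))  # all three overlap
print(triangulates(me_a, me_b, ("20", 50.0, 60.0)))  # A and B match elsewhere
```

A large group like a Chromosome 20 TG is made of many such triples that all pass this test on the same stretch.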

That is about where I left it at my last Blog on Chromosome 20. I looked at my crossover points for Chromosome 20. Here are my sisters compared to each other and to me:

Chr 20 Crossovers

Here is how I used the above comparison to map my grandparents that gave me my Chromosome 20 segments. The blank parts are half identical and ambiguous, so rather than guessing, I left them blank. For example, on Sharon’s row on the top, either the orange goes to the left and blue starts at the lower half or the opposite: the purple continues to the left and the green starts at the crossover line.

Chr 20 Final Segment

My chromosome 20 is on the bottom. At the time I wrote my previous Blog on Chromosome 20, I discovered that the vast majority of my matches were due to my Frazer side (green) and not my Hartley side (orange). This was a surprise as my Hartley grandfather had a mother with American Colonial roots. The final point of my previous blog on the subject was:

The fact that all these matches are on my Frazer line doesn’t necessarily mean that they are Frazer matches. They could be McMaster, Clarke, Spratt or any other known or unknown ancestor of my Frazer grandmother.

It’s great that I now know that most of my Chromosome 20 matches are Paternal and that they are on my Frazer grandmother’s line. But I am still curious as to where they are coming from. Can I find out more? I would like to try.

Chromosome 20: Beyond Grandparents

One advantage I have is that I am working on a Frazer DNA project with 27 testers. There are 2 lines of Frazers. I am on the Archibald Line and there is another line called the James Line. These 2 lines are somewhat distantly related, as the 2 brothers they descend from were born in the early 1700’s. Here are the matches for the project on Chromosome 20:

Chr 20 Matches

All of these matches involve at least one James Line tester, and I am not on the James Line. The 2 major matches between the Archibald Line and James Line are between myself (JH) and my sister (SH) on the Archibald Line and Bonnie (BN) on the James Line. As I show below, even my McMaster Line has Frazers in it, which could be the source of that match. Sharon had very few Chromosome 20 matches compared to her siblings Heidi and myself. The 1,000 plus matches I had were before the 47 million mark where I match Bonnie above. My mega-matches mostly occur on Chromosome 20 at 44,000,000 (End Location) or before. This tells me that my mega-matches are not of the Frazer surname. If they were, I would have seen some of my closer Archibald Line matches on Chromosome 20 from the Frazer DNA Project.

Enter Cousin Paul

Paul is my second cousin once removed who tested for DNA. His great grandparents are my 2nd great grandparents: George Frazer and Margaret McMaster.

George Frazer Tree

When I compare myself to Paul, I get to either the Frazer or McMaster Lines. This will eliminate the Clarke line of my great grandmother and her Spratt mother as they are not in Paul’s line – only mine.

My McMasters: It’s a Bit Complicated

Here is my McMaster Line going back from my Frazer grandmother.

McMaster Ancestry

Not only did 2 McMasters marry each other, one of them had a Frazer mother! Marion Frazer is my grandmother, so she is 2 generations from me. Margaret McMaster is at 4 generations. James and Fanny McMaster are at 5 generations to me. Their parents (the left-most McMasters above) are at 5 generations out from my cousin Paul and six generations from me. This is useful to know in the Generations Estimate I have below.

Here is where the Frazer/McMaster split is.

Frazer Buggy

George Frazer b. 1838 is on the left and Margaret McMaster b. 1846 is on the right. The photo was taken in Ballindoon, Ireland in front of the Frazer family home.

At Gedmatch.com, I compared Paul and myself using the ‘People who match one or both of 2 kits’ utility.

I chose most of those that matched both Paul and me. I left out an apparent duplicate and one who is anonymous for now. I also left out my 2 siblings. With those results, I chose the Traceability option and got this chart:

Generations Paul Joel

Those in red are in the Frazer DNA Project. We know their genealogy. Gladys descends from the couple above, George Frazer and Margaret McMaster. Michael and Jane descend from one level above that. The circle above contains those who are related to Paul and me, but not to others in the Frazer DNA Project. [One exception is Jane, but she matches at generation 7 which is about as far out as Gedmatch goes. This may or may not be a real match.] If those in the circle are not Frazer, then the apparent conclusion is that they are McMaster relatives.

Back to Chromosome 20

See all the Chromosome 20 matches on my Gedmatch Traceability Report:

TG Chart Chr 20

Remember I said that my 1,000 plus matches on Chromosome 20 ended around 44M? This is what the above shows. It also shows a triangulation of matches. This triangulation is also implied by the cluster of matches within the circle of the Generations Estimate Chart above. The Chromosome 20 Triangulation Group (TG) includes:

  • Myself
  • *S. S.
  • Daphine
  • Feeney
  • Gladys

Now Gladys should not be in this list as she is in the Frazer DNA Project and has no known McMaster ancestors. In fact, when I run the ‘one to one’ at Gedmatch, she doesn’t match the others in the above list. There are glitches in the Traceability Report, so caution is needed. I will take out the last 3 names in the Generations Estimate to simplify the results. Unfortunately, that didn’t fix the problem, so I had to take out Gladys from the Frazer Project (sorry Gladys).

Gen Est Paul Joel

Now my presumed McMaster relatives are in the green circle. Here are the improved and simplified matches:

TG Chart Chr 20

I note now that the 2 ‘M’ kits (indicating 23andme testers) are now matching each other, which is what I had expected previously. Note that I left my previous Traceability results in the blog as a warning that the Traceability utility is glitchy. Actually, the new report is not really improved, as now Michael from the Frazer project is matching my presumed non-Frazer McMasters. I took out Michael, and then Jane from the Frazer Project developed similar bogus matches with those she is not related to!

I’ll have to take all the other Frazer Project people out for this Traceability to work. This was supposed to work so smoothly. Here below, Joel and Paul should be the remaining McMaster relatives:

Joel Paul R3

Here is the Chromosome 20 TG. Note that Paul is not in it, but he matches others from the TG in other Chromosomes:

TG Chart Chr 20

This chart is only mostly right. Paul’s green match is actually on Chromosome 19 rather than 15:

Paul's Actual Match with Edge

Here is the globe view of my proposed McMaster relative TG:

McMaster Globe

The colors in the lines correspond to the colors in the chart above. The light blue lines are the Chromosome 20 TG from my “big fat” area. The blue lines indicate a TG as they go from each of six people to the other 5. The gray lines represent multiple matches. I am at the bottom of the globe and my cousin Paul is to my right. He is not in the blue TG on Chromosome 20, but matches all my matches on other chromosomes at least once.

Conclusions and Further Research

From what I have shown above, I feel like I have found my McMaster relatives through DNA. However, these would have to be verified by genealogy. None of my proposed ‘McMasters’ have any gedcoms at gedmatch.

  • Daphine – she is on FTDNA but with no tree and no ancestors mentioned. An ICW search reveals 59 pages of matches – likely mostly on Chromosome 20.
  • Edge – He is at FTDNA. He has a limited tree. His paternal grandmother may be a lead. He has only 52 pages of in common matches at FTDNA
  • John – A search at 23andme showed nothing. Perhaps he is anonymous there.
  • Feeney – Same result – or perhaps these people are using different names?
  • *S.S – I see an S.S at Ancestry, but it is difficult to tell if it is the same person.

I have McMaster connections through DNA and genealogy at AncestryDNA, but there is no way to tell if the connection is on Chromosome 20 without a chromosome browser. My McMaster matches at AncestryDNA either don’t know how to upload their DNA to gedmatch, aren’t interested or haven’t gotten to it.

Opposition to TGs

Of late, on Facebook, there has been questioning as to the validity of TGs, especially large TGs like I have at Chromosome 20. The thought is that no common ancestors will be found as there are just too many common ancestors in these large TGs. I have not explained the 100’s of matches in my Chromosome 20 TG, but I have shown 5 people that match both myself and my cousin Paul. These 5 by DNA do not have obvious Frazer ancestry and appear to be in my McMaster Line. So I suppose we have a stalemate. I cannot prove at this time (except to myself) that my Chromosome 20 TG matches are McMaster relatives and those who are not in favor of large TGs cannot prove that these matches are not McMaster relatives.

Mapping My DNA To My Four Grandparents

I was thinking of calling this Blog “Kathy Meet Kitty“. Kathy is Kathy Johnston who taught me how to map my ancestral segments by comparing my DNA to two of my siblings’ DNA results and determining our crossover points. The crossover points can then be used to map out which grandparent you got your DNA from without having to physically test those grandparents. This is quite convenient as all my grandparents have been gone for quite a while. Kitty is Kitty Munson who has developed a Chromosome Mapper here. I have not seen a blog using Kitty’s Chromosome Mapper to map ancestral DNA segments via Kathy Johnston’s method, so I thought that I would write one. Kathy’s method is posted here.

Two Types of Segments

There are two types of segments, thus at least two types of segment mapping. This concept is best explained at the Segmentology Blog in an article appropriately called, What is a Segment?

Ancestral Segments

That Segmentology article first mentions ancestral segments. These are the segments that Kathy Johnston knows how to map. I have written many blogs about mapping my ancestral segments using her method. Ancestral Segments are the segments that you actually get from your ancestors. They fill up all your DNA. Here is an example of the ancestral segments that I have mapped to my four grandparents.

Joel Segment Map

Look at Chromosomes 1, 5, 6 and 7 for starters. This shows all my DNA filled in. The 2 paternal grandparents are on the top half of the chromosomes in blue and green and the maternal two grandparents are on the bottom in red and peach. The DNA I received alternates between one grandparent and another and fills in the whole area. In fact, that is the process of recombination, and it can be seen in the Ancestral Segment Maps.

Shared Segments

These are segments that you find at gedmatch.com for example. These are our DNA matches. These matches may have a proposed relationship based on how much DNA you and your match share. Here is an example of some of my matches using Kitty’s Chromosome Mapper.

Chromosome map 4 Apr 2016

The best way to fill in a map like this is by testing as many relatives as possible. Now look at chromosomes 1, 5, 6 and 7 on the Shared Segment Map compared to the Ancestral Segment Map above. The Ancestral Segment Map on Chromosome 1, for example, shows how much DNA I actually got from my Hartley grandfather. The blue in the Shared Segment Map shows how much I matched my father’s cousin. Next look at the maternal (bottom) part of Chromosome 1. Here the Rathfelder and Lentz matches on the right hand side are filled in on the Ancestral Segment Map. However, there is an additional section of Lentz on the left hand side of the Ancestral Segment Map where I don’t even have a match. I can tell I got my DNA there from my Lentz maternal grandmother because of my crossover points and the fact that the DNA you get from your grandparents alternates between grandparents. On the maternal side, the alternation is between Rathfelder and Lentz.

If you find any inconsistencies between my Ancestral Segment Map and my Shared Segment Map, that means I messed up somehow.
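The alternation argument for the Lentz section of Chromosome 1 can be sketched in Python. Given the crossover points on one side (from the sibling comparisons) and a single stretch that a known match pins to one grandparent, every other stretch follows. The function name and the positions below are hypothetical, not my actual Chromosome 1 values.

```python
import bisect


def fill_side(crossovers, length, anchor_pos, anchor_gp, other_gp):
    """Label each stretch between crossovers with a grandparent.

    crossovers: crossover positions on this side (maternal or paternal).
    anchor_pos: a position known, from a match, to come from anchor_gp.
    Returns (start, end, grandparent) segments covering 0..length.
    """
    if not 0 <= anchor_pos <= length:
        raise ValueError("anchor must lie on the chromosome")
    bounds = [0.0] + [float(c) for c in sorted(crossovers)] + [float(length)]
    # Which stretch holds the anchor (clamped so the very end still works).
    anchor_idx = min(bisect.bisect_right(bounds, anchor_pos) - 1, len(bounds) - 2)
    segments = []
    for i in range(len(bounds) - 1):
        # Grandparents alternate at every crossover.
        same = (i - anchor_idx) % 2 == 0
        segments.append((bounds[i], bounds[i + 1], anchor_gp if same else other_gp))
    return segments


# Hypothetical: one maternal crossover at 120 million, a Rathfelder match near 200.
for seg in fill_side([120], 250, 200, "Rathfelder", "Lentz"):
    print(seg)
```

The stretch left of the crossover comes out as Lentz even though no match sits there, which is the same kind of inference behind the extra Lentz section on my map.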

More Ancestral Segment Mapping: Sister Heidi

In order to map my ancestral segments, I needed two siblings, so I used my two sisters, Heidi and Sharon. Here is Heidi’s ancestral DNA mapped out:

Heidi Segment Map

A few observations:

  • The areas of pale blue are where I had trouble figuring out how to map the ancestral segments, so nothing is mapped in these areas. I may have mapped out some of the segments, but then had difficulty telling whether they were maternal or paternal due to lack of known cousins that had tested. So I left these areas blank.
  • The maternal areas shown as MG1 and MG2 – For these areas, I knew I had two maternal grandparents but I wasn’t sure which was which, again because of a lack of known cousins that had tested. I could perhaps guess, based on actual matches I had in these segments or where those matches were from, but I noted where the crossovers were and left these grandparents un-named.
  • These unknown grandparents are consistent within each chromosome and each sibling within each chromosome, but they are not consistent between chromosomes. So the unknown MG2 in Chromosome 8 may not be the same MG2 in Chromosome 11.
  • In my (Joel’s) Ancestral Segment Map, I don’t show any DNA on my paternal side for the X Chromosome. That is because males don’t get an X Chromosome from their father.
  • Heidi shows that she got her paternal X from her dad’s mom – a Frazer. Further, that chromosome did not appear to recombine. That means that she got that whole chunk from one of her great grandparents on the Frazer side.

How Do You Know What You Are Finding If You Don’t Know Where To Look?

These maps are very helpful in showing you where to look for DNA. Many people have matches that have ancestral names that are common to us but are not related. For example, my mother has matches with people that have Fraser or Frazer ancestors. I am related to Frazer on my father’s side. That means that I can forget about following up on maternal Frazer matches.

  • If I do want to look for Frazers, I need to look in my green areas (or my sisters’ green areas), which are on our paternal side.
  • My sister Heidi is in an important Frazer Triangulation Group on her Chromosome 1 on the right hand side. She triangulates with others in a Frazer DNA Project I am working on. I am not in that group. Look at my Chromosome 1. It is nearly all covered by Hartley DNA. That explains why I don’t match these other Frazers at standard thresholds.
  • What if we wanted to look for Lentz ancestors of Heidi? We need to look at the red areas. Chromosomes 1, 6, 9, 14, 20, and 22 would be good places to look. Fortunately, I also have Heidi’s matches on a spreadsheet, mostly divided into maternal and paternal matches. My mother has been tested for DNA. Based on that, I have Heidi’s phased maternal and paternal results and her matches to each of those results using Gedmatch.com.

Finally Sharon

My sister Sharon completes the Ancestral Segment Mapping:

Sharon Segment Map

  • The autosomal DNA that is missing on Sharon’s Map is the same for her 2 siblings. This is because Kathy Johnston’s ancestral segment mapping technique compares the siblings to each other using the Gedmatch.com chromosome browser, so a region that can’t be resolved for one sibling can’t be resolved for the others either.
  • Sharon has a lot of Frazer DNA match potential at Chromosomes 1, 8-12, 15, and 22.
  • However, Sharon is also not in the Frazer Triangulation Group in Chromosome 1 on the right hand side. In that particular section, she got her DNA from her Hartley paternal side.
  • The above point shows why it is important to test siblings.
  • Heidi and Sharon both have a large match (50+ cM) with someone on their X Chromosome. This person also has autosomal matches with my sisters and others in the Frazer DNA project.

Summary and Observations:

  • Ancestral Segment Mapping can be useful in determining which grandparent your matches match.
  • I know already whether my matches are on my maternal or paternal side. However, this goes back one more generation and further sorts my matches to grandparents. This cuts down the guessing by another half.
  • The maps also show the areas where you can’t be as sure which grandparent your matches match, because those areas are not mapped yet.
  • Ancestral Segments should line up with Triangulation Groups.
  • Ancestral Segment Mapping can show matches that are Identical by Chance (IBC) or false matches.

 

My Big Fat Chromosome 20

I never would have guessed 10 years ago that I would be blogging about my Chromosome 20. Ten years ago I was definitely interested in genealogy but knew virtually nothing about DNA. Even if I had known something about DNA, I would not have guessed that it would have anything to do with genealogy.

My Chromosome 20

My Chromosome 20 actually isn’t that big and fat. Actually it is one of my smallest chromosomes. However, I have more matches there than on any other chromosome. In fact, over 1,000 – more than a quarter of my matches – are on Chromosome 20. This is pretty amazing considering I have 23 chromosomes counting my X Chromosome. If my matches were spread out evenly over these 23 chromosomes, I would expect each chromosome to have about 4% of my matches. This representation shows the ridiculous number of matches I have on Chromosome 20. They are on the bottom of the image in light blue.

Joel Hartley Circle Chart

This particular representation is for just my FTDNA Family Finder matches. I believe the threshold was set relatively high and this was done a while ago. However, at the time and threshold, it appears that more than half of all my matches were at Chromosome 20.
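As a rough check on the “about 4%” figure above, here is a small Python sketch that compares each chromosome’s observed share of matches with an even split over 23 chromosomes (1/23, about 4.3%). The counts are made up for illustration; they are not my actual numbers. Like the estimate above, it ignores that longer chromosomes would normally collect more matches than shorter ones.

```python
def chromosome_shares(counts):
    """Map each chromosome to (observed share, observed share / even share)."""
    total = sum(counts.values())
    even_share = 1 / len(counts)  # 1/23 when X is counted with the 22 autosomes
    return {
        chrom: (n / total, (n / total) / even_share)
        for chrom, n in counts.items()
    }


# Hypothetical counts: 1,000 matches on Chromosome 20 and 130 on each of the
# other 22 chromosomes (including X). Illustrative only, not real data.
chromosomes = [str(i) for i in range(1, 23)] + ["X"]
counts = {chrom: 130 for chrom in chromosomes}
counts["20"] = 1000

shares = chromosome_shares(counts)
share, ratio = shares["20"]
print(f"Even share: {1 / len(counts):.1%}")
print(f"Chromosome 20: {share:.1%} of matches, {ratio:.1f}x an even share")
```

With these made-up numbers, Chromosome 20 holds a bit more than a quarter of the matches, roughly six times what an even split would predict.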

How To Explain All the Matches? Colonial Massachusetts?

I had a difficult time explaining all the matches I had on Chromosome 20. Most were on my paternal side as that is where most of my matches are. I had guessed that these may have been due to a colonial effect as that had been suggested in various places. My great grandmother’s mother was a Bradford and was descended from the Mayflower Bradfords. A lot of those early Pilgrims married other related Pilgrims. In fact, some of my Chromosome 20 matches were descended from a Brewster who was one of the Pilgrims that I am also descended from. Then there were a few who seemed to be related on my Irish Frazer side. Finally I had a match with Bonnie from the Frazer DNA Project I am working on. She matched on Chromosome 20 but was outside my large triangulation groups.

Chromosome 20 Triangulation Groups

I also have Triangulation Groups (TGs) for Chromosome 20 – very large ones. In fact, I had so many matches that Gedmatch would overload when I tried to run an analysis. I have 2 paternal TGs and one maternal TG. There also may be sub-TGs within those. I have roughly 650 matches in these combined TGs. So now, based on testing my mother, I knew whether my matches were maternal or paternal and whether they were in TGs, but I still didn’t know much about where the common ancestors could be, other than a vague guess about colonial Massachusetts. So I ignored Chromosome 20. I even gave up adding those matches to my spreadsheet because I had so many. These matches tended to be around 13 cM, with some higher and some lower.
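For readers new to Triangulation Groups: three people triangulate when each pair of them shares a segment on the same chromosome and those segments overlap. Here is a minimal Python sketch of that check, assuming each shared segment is written as (chromosome, start, end) in millions of bases; the positions below are hypothetical. Real tools also apply cM thresholds, and sorting hundreds of matches into TGs takes more than this single check.

```python
def common_overlap(*segments):
    """Return the (start, end) region shared by all segments, or None.

    Each segment is (chromosome, start_mb, end_mb)."""
    if len({chrom for chrom, _, _ in segments}) != 1:
        return None  # segments on different chromosomes cannot overlap
    start = max(s for _, s, _ in segments)
    end = min(e for _, _, e in segments)
    return (start, end) if start < end else None


def triangulates(ab, ac, bc):
    """True if the A-B, A-C and B-C segments all overlap in one region."""
    return common_overlap(ab, ac, bc) is not None


# Hypothetical Chromosome 20 segments for me (A) and two matches (B and C).
ab = (20, 16.0, 30.0)
ac = (20, 20.0, 35.0)
bc = (20, 18.0, 28.0)
print(common_overlap(ab, ac, bc))              # region all three share
print(triangulates(ab, ac, (20, 40.0, 45.0)))  # B-C segment is elsewhere
```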

Sticky Segments Or Pileup Areas?

While looking for a Chromosome 20 explanation, I read about sticky segments and pileup areas. Sticky segments are those that have come down intact for many generations. They don’t want to go away. However, a few sticky segments wouldn’t explain over 1,000 matches. It seemed like I had a pileup, so I looked into those. Jim Bartlett describes pileup areas in a comment on one of his blog posts:

I do find that each person tends to have two kinds of pileup areas: 1) are fairly narrow, are widespread, and are outlined in this ISOGG article: http://isogg.org/wiki/IBD#Excess_IBD_sharing; and 2) are also fairly compact (7-9cM) and are unique to each person. I believe these are caused by a unique set of markers in our personal DNA that makes it easy to form matches with others in that region. These are characterized by many segments in a narrow range, which do not generally Triangulate, and the Matches don’t see this as a pile-up area, only you do.

However, my case didn’t seem to match some of the explanations of sticky segments or pileup areas. My matches were larger and did triangulate. Furthermore, they were not in areas of the chromosomes described in the ISOGG article above.

Enter Kathy Johnston and Her Crossover/Segment Analysis

At the beginning of 2015, Kathy posted instructions on an FTDNA Forum for analyzing the DNA of 3 siblings. She showed how to determine which of the 4 grandparents contributed DNA to each sibling along each chromosome. I discovered her post at the end of 2015. Could this help me figure out my Chromosome 20? I tried Kathy’s method and got some surprising results.

Finding Chromosome 20 Crossover Points

Finding crossover points in Chromosome 20 was not as easy as it had been in other chromosomes. According to Kathy, there will usually be one owner of a crossover point. This owner will appear in 2 of the 3 comparisons at that crossover point. On this chromosome, I found only one clear owner: my sister Heidi at position 47. For the other, ambiguous crossover points, I gave a double initial separated by a slash.
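The “2 out of 3” rule works because a crossover in one sibling changes how that sibling compares with both of the others, while the comparison between the other two siblings stays the same. Here is a small Python sketch of that rule, assuming each of the 3 comparisons (Joel-Heidi, Joel-Sharon, Heidi-Sharon) is written as a pair of initials and that we already know which comparisons change state at a given position. Returning a double initial when only one comparison clearly changes is one plausible way to get the ambiguous labels I used, not a rule from Kathy’s instructions.

```python
def crossover_owner(changed):
    """Guess which sibling owns a crossover.

    `changed` is a set of comparisons (frozensets of two initials) whose
    match state (no match / HIR / FIR) changes at this position."""
    if len(changed) == 2:
        first, second = changed
        # Any two different pairs out of 3 siblings share exactly one person.
        (owner,) = first & second
        return owner
    if len(changed) == 1:
        (pair,) = changed
        return "/".join(sorted(pair))  # ambiguous: either sibling in the pair
    return "?"  # nothing changed, or all 3 changed (e.g. two crossovers at once)


JH, JS, HS = frozenset("JH"), frozenset("JS"), frozenset("HS")
print(crossover_owner({JH, HS}))  # Heidi appears in both changed comparisons
print(crossover_owner({JS}))      # only one comparison changed
```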

Chr 20 Crossovers

Below, the Gedmatch comparison is transformed into a maternal/paternal Chromosome 20 map. The green area means that Heidi matches Joel on the 3rd segment. This match is a Fully Identical Region (FIR), which means they got their DNA there from the same maternal grandparent and the same paternal grandparent. For Joel, I extend those grandparents to the right, as I have no crossovers until the last crossover point.

Chr 1 Segment 1

Sharon has no match with her 2 siblings in the same area. That means that on both her maternal and paternal sides she got her DNA there from the other grandparent, which is represented by 2 different colors. I again extend that double segment to Sharon’s crossover points.

Chr 1 Segment 2

Looking at the earlier Gedmatch comparison, in the 2 segments to the right of Heidi’s existing mapped segment, there is a Half Identical Region (HIR). That means the two siblings share a grandparent on one side (maternal or paternal) but not on the other. When comparing Heidi to Joel, this is shown as one matching color and one non-matching color in this area. This first HIR assignment is arbitrary, as no names or sides (maternal/paternal) have yet been assigned to the grandparents.
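The FIR / HIR / no-match logic can be written down directly. In this Python sketch, assuming each sibling’s DNA at one position is described as a (paternal grandparent, maternal grandparent) pair of colors, two siblings are an FIR when both grandparents are the same, an HIR when exactly one is, and no match when neither is. The colors follow the maps in this post, where orange and green end up on the paternal side and blue and purple on the maternal side; the specific assignments below are just examples.

```python
STATES = {2: "FIR", 1: "HIR", 0: "no match"}


def compare(sibling_a, sibling_b):
    """Classify a region from two (paternal, maternal) grandparent pairs."""
    shared = sum(a == b for a, b in zip(sibling_a, sibling_b))
    return STATES[shared]


joel = ("green", "blue")
heidi_fir = ("green", "blue")     # same grandparent on both sides
heidi_hir = ("green", "purple")   # same paternal, different maternal
sharon = ("orange", "purple")     # opposite on both sides

print(compare(joel, heidi_fir))
print(compare(joel, heidi_hir))
print(compare(joel, sharon))
```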

Chr 1 Segment 3

Next, we have an illogical situation.

Chr 20 Crossovers

In the next-to-last segment, the smaller one, Sharon has no match with either Heidi or Joel, and Heidi and Joel have a half match. That is illogical. If Sharon doesn’t match Joel, the same orange/purple scheme continues into the small segment for Sharon. Then, if Sharon and Heidi are opposites, Heidi goes back to green/blue in that small segment. Those are the same colors that Joel already has, which would make Heidi and Joel fully identical there; they can’t be an HIR, which would require one matching color and one non-matching color. However, look at that small segment again in the first two rows. The red is strong in the first row. In the second row, I hardly see any red, with red indicating no match at Gedmatch. Therefore, I’m going with the first comparison of Heidi and Sharon. This also agrees with Sharon’s matches, which I will mention soon. I make Sharon and Heidi opposite in Sharon’s little segment and extend that segment to the end.
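Why that small segment is illogical can be checked by brute force. On each side (paternal and maternal), each sibling got their DNA from one of 2 grandparents, so 3 siblings have only 8 x 8 = 64 possible combinations. This Python sketch, a hypothetical helper rather than part of Kathy’s instructions, tries them all and reports whether any combination produces a given set of 3 comparisons. With Sharon opposite both Heidi and Joel, Heidi and Joel are forced to be an FIR, so an HIR between them is impossible. The “consistent” example is just one combination that works, not necessarily the one on my final map.

```python
from itertools import product

SIBLINGS = ("J", "H", "S")
PAIRS = (("J", "H"), ("J", "S"), ("H", "S"))
STATES = {2: "FIR", 1: "HIR", 0: "no match"}


def is_possible(observed):
    """True if some grandparent assignment reproduces all 3 comparisons.

    `observed` maps each sibling pair to "FIR", "HIR" or "no match"."""
    # 0 or 1 = which of the 2 grandparents on that side a sibling got DNA from.
    for paternal in product((0, 1), repeat=3):
        for maternal in product((0, 1), repeat=3):
            pat = dict(zip(SIBLINGS, paternal))
            mat = dict(zip(SIBLINGS, maternal))
            if all(
                STATES[(pat[a] == pat[b]) + (mat[a] == mat[b])] == observed[(a, b)]
                for a, b in PAIRS
            ):
                return True
    return False


illogical = {("J", "H"): "HIR", ("J", "S"): "no match", ("H", "S"): "no match"}
consistent = {("J", "H"): "HIR", ("J", "S"): "HIR", ("H", "S"): "no match"}
print(is_possible(illogical))
print(is_possible(consistent))
```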

Chr 1 Segment 4

I filled in some of the no-match regions and FIRs on the right. On the left, I was again left with 2 illogical no-match regions, so I chose the redder of the 2. That left me having to guess an HIR on the left. I am only allowed one guess, so I left this blank for now.

Chr 1 Segment 5

Adding Real Grandparents

It would be nice to add actual grandparents here and not just speak of my orange grandparent, for example. I can do this using two of Sharon’s matches.

Sharon's Chr 20 Matches

These 2 important matches Sharon has are both on the paternal side. James is related to my grandfather and Bonnie is in the Frazer DNA Project on my Frazer grandmother’s side. Coincidentally, the orange match above goes with the orange on my chromosome map. That would make my paternal grandfather Hartley orange and paternal grandmother Frazer green.

Joel’s Matches

Here’s my Frazer match with Bonnie. The match from 47 to 54 falls in my green Frazer region on my map, so that is a relief.

Joel's Frazer 20 Match

Below is my only maternal match. It is with a cousin on my maternal grandmother’s line. She matches only with me because she tested at 23andme and hasn’t uploaded to gedmatch yet.

Joel match Judith 20

However, Judy gets me unstuck on my maternal side. Her match is telling me that from zero to 8, I can identify my grandparent. I already have blue from 6 to 8 (from using my brighter red logic). So I just need to extend the blue all the way to the left on my maternal segment line. That gives me a solid blue on Chromosome 20 on my maternal side.

Chr 20 Final Segment

This is as far as I can figure out now without further guessing. Perhaps when cousin Judy gets her DNA uploaded to Gedmatch, I will know more. So what does this tell me about my 1,000 plus Chromosome 20 matches and 600 plus matches that appear to be in Triangulation Groups?

Mystery Solved?

I think it is. These matches correspond to the area on the map above between 16 and 49. According to this mapping, this massive number of matches is solidly in Frazer territory for me. Instead of my huge block of matches being from colonial Massachusetts, they are on my Frazer line. That came as quite a surprise. These ancestors were mostly in Ireland. I assume that many of these ancestors left Ireland. Perhaps they moved to the United States and married people who were descended from colonial Americans. That would explain some of the other colonial matches.

Summary, Application and Conclusions

  • When you are looking for DNA matches, it helps to know where you are looking
  • While I was looking at my largest group of matches, I was looking in the wrong place even though I had some reasonable assumptions
  • Kathy Johnston’s method cuts through bad assumptions and replaces them with sound logic
  • Phasing by parents cuts the search in half but didn’t help me identify a huge block of Chromosome 20 matches. Kathy Johnston’s method goes a step further than phasing, cutting the search in half again, as it assigns all matches to regions inherited from the 4 grandparents.
  • This method needs 3 siblings and some known tested relatives.
  • If I have this mapped correctly, any maternal match after 6 million for Sharon will be on the Rathfelder line and any maternal match for me will be on the Lentz line.
  • Interestingly, my sister Sharon has only about 42 matches on this chromosome. Given that the makeup of her Chromosome 20 is mostly the opposite of her 2 siblings’, this makes a lot of sense.
  • I forgot to mention that my sister Heidi has almost as many matches as I do on Chromosome 20. Her shorter Frazer segment compared to mine would explain the slightly fewer matches.
  • The fact that all these matches are on my Frazer line doesn’t necessarily mean that they are Frazer matches. They could be McMaster, Clarke, Spratt or any other known or unknown ancestor of my Frazer grandmother.
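Once the map exists, sorting a new match is a lookup. Here is a minimal Python sketch that encodes only the two Chromosome 20 rules stated above (Sharon’s maternal side from 6 million on is Rathfelder; my maternal side is Lentz) and returns the grandparent whose mapped region overlaps a match the most. Positions are in millions of bases; the table and function names are hypothetical, and unmapped regions return None rather than a guess.

```python
import math

# (sibling, side) -> list of (start_mb, end_mb, grandparent) for Chromosome 20.
# Only the two rules stated in this post are encoded here.
CHR20_MAP = {
    ("Sharon", "maternal"): [(6.0, math.inf, "Rathfelder")],
    ("Joel", "maternal"): [(0.0, math.inf, "Lentz")],
}


def grandparent_for(sibling, side, start_mb, end_mb):
    """Return the grandparent whose mapped region overlaps the match most."""
    best, best_overlap = None, 0.0
    for seg_start, seg_end, grandparent in CHR20_MAP.get((sibling, side), []):
        overlap = min(end_mb, seg_end) - max(start_mb, seg_start)
        if overlap > best_overlap:
            best, best_overlap = grandparent, overlap
    return best  # None means "not mapped yet", not "no grandparent"


print(grandparent_for("Sharon", "maternal", 10.0, 20.0))
print(grandparent_for("Sharon", "maternal", 1.0, 4.0))
print(grandparent_for("Joel", "maternal", 1.0, 4.0))
```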

 

How I Lazarus’ed My Dad

According to the Gospels, Lazarus was a man who died and Jesus raised him from the dead.

lazarus

Lazarus is also a program on Gedmatch that recreates the DNA of those who are no longer with us. You won’t see it unless you kick in $10 for the Tier 1 Utilities. The link says, “Lazarus, Create surrogate kits to create close ancestors.”

How I did it: first I practiced on my wife’s family.

Fortunately, my wife’s dad has 2 first cousins and one second cousin on his mother’s side who have had their DNA tested. This came in handy. So I set out to create my father-in-law’s mom, Estelle LeFevre. Lazarus takes Group 1 people, who are descendants of the person to be Lazarus’ed, in this case Estelle. My descendant was my father-in-law; I had him tested a while back at FTDNA. Then the program takes Group 2 people, relatives who are not descended from Estelle. In this case, those were Pat and Joe, my father-in-law’s 1st cousins, and Fred, his 2nd cousin. Lazarus takes Group 1 and Group 2 and mushes them together to recreate Estelle. Actually, only part of Estelle is recreated: the part that was mushed together from Group 1 and Group 2. If I had all of Estelle’s children and all of her relatives, I would’ve had a much more complete result. The trick is to get a Lazarus result that is over 1500 cMs. Then you can use some of the other Gedmatch utilities, such as the One to Many, with that kit. It’s OK to create a Lazarus kit with less than 1500 cMs, but it’s not as useful. Well, Estelle came out at about 1700 cMs, so that was good news. Buoyed by these results, I thought it would be a good idea to try to recreate my dad’s DNA.
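To picture why more relatives give a bigger result, here is a toy Python model of the idea. It is not Gedmatch’s actual algorithm: it assumes the recreated part of the ancestor is simply the stretches where some Group 1 kit shares a segment with some Group 2 kit, merges overlapping stretches on each chromosome, and adds up their lengths in cM. It also ignores the fact that the ancestor had two copies of each chromosome, so treat it as an illustration only. The segments are hypothetical; the 1500 cM cutoff is the one mentioned above.

```python
from collections import defaultdict

USABLE_CM = 1500  # rough size at which a Lazarus kit becomes more useful


def merge(intervals):
    """Merge overlapping (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def recovered_cm(shared_segments):
    """Total cM covered by Group 1 / Group 2 shared segments.

    Each segment is (chromosome, start_cm, end_cm)."""
    by_chromosome = defaultdict(list)
    for chrom, start, end in shared_segments:
        by_chromosome[chrom].append((start, end))
    return sum(
        end - start
        for intervals in by_chromosome.values()
        for start, end in merge(intervals)
    )


# Hypothetical segments from two different Group 2 relatives on chromosome 1
# (overlapping) and one on chromosome 2.
segments = [(1, 10.0, 40.0), (1, 30.0, 60.0), (2, 0.0, 25.0)]
total = recovered_cm(segments)
print(total, "cM;", "usable" if total > USABLE_CM else "below 1500 cM")
```

In this toy version, adding relatives helps only when their shared segments cover new ground, which is one way to see why the cM total grows with the number and spread of relatives.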

A Slight Detour

I followed the Gedmatch directions. I used two Group 1 people: my sister and me. For Group 2, I used the only relative of my father that I had tested, his 1st cousin. I ran the program and came up with only about 700 cMs. Very disappointing. Then, as I’d been working on my father’s mother’s line, the Frazers, I thought, ‘my father’s cousin isn’t related to the Frazers. He’s only related to my Hartley side.’ Duh. What I had created was a Lazarus of my father’s dad, my grandfather.

My Dad and His Dad

Sometimes I don’t mind making mistakes. Especially when they lead to the right answer.

How I did it the right way

Well, how was I to get up to 1500 cMs when all I had was 700 cMs from my grandfather’s side? I only had 2 people for Group 1, and I’m too cheap to have other siblings test. I noticed that Gedmatch had room for 100 people. Hmm, where to get 100 people? From working with my distant Frazer relatives, I had their results, but they wouldn’t get me the numbers I needed. So I decided to use the phased matches of my sister and me.

What is phasing, you may ask? Phasing is another Gedmatch utility. If you have the results of one parent, Gedmatch will subtract that parent’s DNA from your results and create 2 kits. One is a maternally phased kit containing the DNA you got from your mom; matches to it are on your mom’s side. The other is a paternally phased kit containing the DNA you got from your dad; matches to it are on your dad’s side. Fortunately, my mom is still alive at 93, and I had her tested. Based on her test, I had already created phased maternal and paternal kits for myself and my sister. Now all the Gedmatch matches are marked either P for Paternal or M for Maternal on a spreadsheet that I keep, with one spreadsheet for myself and one for my sister.

So I took the top paternally phased matches from my matches and my sister’s matches and put 100 of them into the Gedmatch Lazarus Utility under Group 2. I ran the Lazarus program and got just over 1500 cMs for my dad.
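The core idea of phasing against one parent can be shown one SNP at a time. This Python sketch is the textbook rule, not Gedmatch’s code: whatever allele the child could only have gotten from Mom is maternal, and the other allele is paternal. It assumes genotypes are two-letter strings like “AG” with “--” for no-calls, and it returns None where the answer is unknown: when both child and mother are heterozygous with the same two alleles, or when the genotypes conflict.

```python
from typing import Optional, Tuple


def phase_with_mother(child: str, mother: str) -> Optional[Tuple[str, str]]:
    """Return (maternal_allele, paternal_allele) for one SNP, or None."""
    if "-" in child or "-" in mother:
        return None  # no-call in either kit
    shared = set(child) & set(mother)
    first, second = child
    if first == second:
        # Homozygous child: both parents passed the same allele,
        # as long as the mother actually carries it.
        return (first, first) if first in shared else None
    if len(shared) == 1:
        maternal = shared.pop()
        paternal = second if maternal == first else first
        return (maternal, paternal)
    # Both heterozygous with the same alleles (ambiguous) or nothing in common.
    return None


# Hypothetical genotypes at 5 SNPs.
child = ["AG", "CC", "AG", "TT", "--"]
mother = ["AA", "CT", "AG", "CC", "AA"]
paternal = [
    phased[1] if phased else "?"
    for phased in (phase_with_mother(c, m) for c, m in zip(child, mother))
]
print("".join(paternal))
```

In this sketch the “?” positions simply stay unknown; only the positions where Mom’s contribution is unambiguous yield a paternal allele.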

Is This the Best Way to Create a Lazarus Kit?

I don’t know. It was certainly much more difficult than when I Lazarus’ed my father-in-law’s mom. For her, I used only 4 people and got better results. However, if you are cheap like me, or simply don’t have the relatives to test, you might want to try this method and see if it works for you.

Joel Hartley