Cousin Holly’s Hartley DNA Results

I have many 2nd cousins. Over 100, I’m sure. My Hartley great grandparents had 13 children. All their descendants in my generation are 2nd cousins. Holly is one of those 2nd cousins. My first recollection of Holly is that she was creating a bit of commotion at our town’s ball field. I was probably about 5 years old at the time. I had an impression that she may have been a relative, but I wasn’t sure. Holly was challenging the local boys in a foot race and beating them. I was thinking that she was one cool girl.

So far on my Hartley side, those in gold below have tested and uploaded to Gedmatch.com:

Note that Patricia and Beth are also first cousins to each other.

Here’s Holly’s grandmother Grace Hartley. I borrowed the photo from Holly’s Ancestry Tree:

Does she look like Holly? I think so. Except I don’t picture Holly as looking as serious.

All the Hartley cousins in the chart above have James Hartley and Annis Louisa Snell in common. But we won’t easily know which of the two a given piece of shared DNA came from. Another point is that everyone has eight great grandparents. So all the second cousins get 2/8 or 1/4 of their DNA from these two great grandparents. That is, on average. Here are the numbers of how Holly matches the tested Hartleys:

The Gen is how far away the common ancestors seem to be, based on the DNA match. James, my dad’s 1st cousin, seems 2.5 away. That is just right for a 1st cousin once removed. Holly should match her 2nd cousins on average at a level of three. That is because our great grandparents are 3 generations away from us. Because of the random way we get our DNA, however, Holly is matching Joel, Beth and Patricia more closely and is matching my four siblings further away.

The X Chromosome Rule

There is a rule that the X Chromosome does not pass down from father to son.

That means that no X Chromosome from Greenwood Hartley got passed down to any of us. That also means that no Hartley X Chromosome got passed down to anyone in my family. That is why Holly matches James, Beth and Patricia on the X Chromosome and only incidentally matches Lori and Heidi from my family.
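For readers who like to see a rule written out, here is a minimal Python sketch of the father-to-son rule. The function name and the example lines of descent are hypothetical and only illustrate the logic; they are not taken from the family tree above.

```python
def x_can_pass(path_sexes):
    """Return True if X-DNA can travel down a line of descent.

    path_sexes lists the sex ('M' or 'F') of each person, starting
    with the ancestor and ending with the DNA tester. A father never
    gives his X Chromosome to a son, so two males in a row anywhere
    in the line break the chain.
    """
    for parent, child in zip(path_sexes, path_sexes[1:]):
        if parent == "M" and child == "M":
            return False
    return True


# Hypothetical lines of descent, for illustration only.
# Male ancestor -> son -> grandson: blocked at the very first step.
print(x_can_pass(["M", "M", "M"]))
# Male ancestor -> daughter -> her son -> his daughter: nothing blocks it.
print(x_can_pass(["M", "F", "M", "F"]))
```

A line that passes this check only means an X match is possible. Whether any X-DNA actually survives still depends on the random way DNA is passed down.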

Here is how Holly matches James, Beth, Patricia and incidentally my 2 sisters.

Holly and Jim have a longer match as they are more closely related (1st cousin, once removed). As a rule, the more closely you are related, the longer the segments.

Shared Autosomal DNA

Holly and I share this much DNA:

By comparison, here is my overall Chromosome map before I add in my DNA matches with Holly:

On my map, the James Hartley/Annie Snell part is shown in darker blue. It looks like Holly’s DNA could add quite a bit to my map. Ideally, if I could test enough relatives, the dark blue would fill up 1/2 of my paternal chromosome. The other half should be from my paternal grandmother who was a Frazer.

Here is Holly’s DNA added in. I also added a maternal first cousin who contributed to my first substantial X Chromosome match:

Remember I get no X Chromosome from my dad (top part of each line). So that has to be blank on the X Chromosome.

Next I’ll add in 1st cousin once removed Jim to Holly’s map:

Jim’s contribution to our great grandparents is in blue. Notice that now the X Chromosome is kicking in.

Adding Beth’s DNA to Joel and Jim

Here is the addition of Beth’s DNA:

Note that Holly has a lot of matches on Chromosomes 5 and 9. That must mean that Holly got most or all of her paternal DNA on those Chromosomes from her Hartley grandmother, Grace May.

Kicking it up a notch

Next I’d like to add my siblings’ results to the other matches on Holly’s Chromosome map. My siblings’ results plus mine should be similar in size to Holly’s matches with Jim, my dad’s first cousin. It takes 5 siblings to get about the same DNA as you would have for one parent. While I’m at it, I’ll add Patricia.

This is all Holly’s DNA that she got from James Hartley and Annie Snell, her great grandparents based on the matches that we’ve looked at so far. I probably should have lumped Beth and Patricia together as they have the same Hartley grandmother [Mary], but I didn’t.

Separating the Hartley and Snell DNA

One thing I would like to do would be to separate the Snell DNA from the Hartley DNA. If I could do this I could find matches that were just Snell or just Hartley. The DNA matching is about narrowing down the possibilities. The best way to do this would be to have a match that is known to be a Snell but not Hartley or a Hartley but not Snell. Unfortunately, I don’t know of any such people. The next best thing to do is to guess. One way to guess is called phasing by location. So, say I have a match with a lot of ancestors from colonial New England, but not Lancashire. And I would need to know that I match this person on my Hartley side (not my mother’s side). I would say that this would likely indicate DNA from the Snell Line. That is because the Snell ancestors go back to Colonial New England and the Hartleys came later from Lancashire, England.

My Chromosome 16

Here is a section of the first part of my Chromosome 16 matches (without the matches’ names) in spreadsheet form:

Each line represents a different match with someone. About half way down this list I have a match with Ned at 39.93 cM. I don’t know who our common ancestor is, but Ned has a lot of colonial New England ancestors, including the Warren Pilgrim family. I also am descended from the Pilgrim Warrens, but it is generally thought that a DNA match that large would be unlikely to last that long.

Triangulating with Ned

Triangulation shows what common ancestors unknown DNA matches may have. Triangulation is when you match someone’s DNA, they match another person’s, and you and that other person also match each other, all in the same area of the same Chromosome. Successful triangulation shows that all the DNA came from the same ancestor.
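The overlap part of triangulation can be sketched in a few lines of Python. This is only a rough illustration: the function names are made up, the positions are invented rather than real match data, and it checks only that three pairwise matches cover a common stretch of one chromosome.

```python
def overlap(seg_a, seg_b):
    """Return the (start, end) shared by two segments, or None."""
    start = max(seg_a[0], seg_b[0])
    end = min(seg_a[1], seg_b[1])
    return (start, end) if start < end else None


def triangulate(a_vs_b, a_vs_c, b_vs_c):
    """Return the stretch covered by all three pairwise matches, or None.

    Each argument is a (start, end) position pair on the same chromosome.
    """
    first = overlap(a_vs_b, a_vs_c)
    if first is None:
        return None
    return overlap(first, b_vs_c)


# Invented positions, for illustration only.
a_vs_b = (10_000_000, 45_000_000)
a_vs_c = (12_000_000, 40_000_000)
b_vs_c = (8_000_000, 50_000_000)
print(triangulate(a_vs_b, a_vs_c, b_vs_c))  # (12000000, 40000000)
```

If any one of the three comparisons is missing in that area, the function returns None and the loop is not closed.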

Here is my match with Ned:

Here is Holly’s match with Ned:

To close the loop, I have to match Holly in the same area of Chromosome 16:

No problem. This shows that Holly, Ned and I share an ancestor. By Ned’s Ancestry Tree, we think this is a New England Colonial ancestor, but we aren’t sure which New England Colonial ancestor it is. However, as Annie Snell has New England Colonial ancestors and James Hartley doesn’t, I am pretty sure I can assign this segment to Annie instead of James.

This means I can update my Chromosome map with my first New England Colonial piece of DNA represented by Annie Louisa Snell on Chromosome 16. This is shown in light blue:

The other interesting thing about this piece of DNA, is that it not only is from Annie Louisa Snell, it is also from some New England Colonial person – the one I haven’t figured out yet that we have in common with Ned.

Other New England Colonial Connections Between Holly and Me

AncestryDNA recently came out with a new feature called Genetic Community. That feature lumps you into a group with a bunch of other people based on your DNA testing. One of those groups is called Settlers of Colonial New England. Here are my Genetic Communities (or GCs).

Notice I get a Likely rating for those Colonial Settlers. Holly, on the other hand, has one Genetic Community:

She gets a Very Likely. That means she is super Colonial New England. Holly has a Connection Link under her Settlers of Colonial New England. Under that link is another link that leads to “…a list of all 238 of your DNA matches who also belong to this Genetic Community.” Under my similar link I have 110 DNA matches. However, Ned that I mentioned above matches me under Settlers of Colonial New England. He doesn’t match Holly in her list for some reason – even though I showed that we triangulate. In addition, Holly and I match each other on our lists of DNA matches under Settlers of Colonial New England.

Summary

There’s plenty more I could have written about, but I’m a gonna wrap it up:

  • Holly is more Colonial than I. I expect her other non-Snell ancestors contributed more in this area
  • I looked at a way to separate out ancestral DNA when other reference matches are missing
  • We are getting a good group of Hartley/Snell descendants that have had their DNA tested and have uploaded to Gedmatch.com for comparison
  • I never knew Holly looked so much like her Hartley/Snell grandmother.

Beth’s Hartley DNA

In this Blog, I will be looking at Beth’s autosomal DNA. That is the DNA that she got from both her parents. However, I am more interested in Beth’s father’s mother’s DNA as she was a Hartley and the DNA that we share would be Hartley DNA.

Hartley Tree of DNA Testers

Here are those closer relatives that have had their DNA tested and uploaded to Gedmatch.com:

Here Hartley is shown as green and Snells are shown as yellow. The DNA testers are in gold. Any DNA that the four DNA testers have in common will belong to James Hartley and Annie Snell. However, it will be difficult to tell which. Any DNA that Patricia and Beth share could also belong to Charles Nute which Jim and my family will not share. Here is an example of that on Chromosome 1.

Here is a photo believed to be Mary Hartley with her sister Nellie:

Hartley and Nute DNA On Chromosome 1

This is a Chromosome browser from Gedmatch.com showing where Beth shares DNA with Heidi (1), Joel (2), Sharon (3), Jim (4) and her first cousin Patricia (5). Is the DNA that Beth and Patricia share Hartley DNA or Nute DNA? To find that out we can look at Patricia’s DNA browser. If she shares DNA in this same area with Heidi and Jim, then it will be Hartley DNA.

The above Browser shows Patricia matching Beth (1), Jim (2) and Joel (3). This means that the DNA that first cousins Beth and Patricia share in Chromosome 1 is Nute DNA. If I were to map Patricia’s maternal Chromosome 1, it would probably look like this:

This shows that Patricia got her green DNA (matching Jim and me) from her Hartley maternal grandmother and her pink DNA (matching Beth) from her Nute maternal grandfather.

First Cousins Vs. Second Cousins

First cousins share two grandparents as their most recent common ancestors. Second cousins share two great grandparents, and any segment they share comes from one or the other of them. The first cousin DNA matches will be larger in general. The second cousin matches will tend to be smaller.

First cousins

As shown above, first cousins will share the DNA from two of their grandparents. In the case of Patricia and Beth, those two grandparents are Patricia’s maternal grandparents and Beth’s paternal grandparents. The catch is that when two first cousins match each other, they won’t know which grandparent they match on. They just know that it will be one or the other. In the example above, we did know which grandparent matched because of other second cousin matches.

Second Cousins – Two Common Great Grandparents

Second cousins have as their most recent common ancestors two of their great grandparents. But again they won’t know which great grandparent they are matching on.

The best way to identify which great grandparent the gold people match on would be to have a third cousin that is only related on the Hartley side OR the Snell side. I don’t know of anyone in this category right now, so I’m a bit stuck. I would like to figure out which DNA is which. The main reason is that I’m stuck on the Hartley genealogy. I know that Greenwood’s father was Robert, but before that, I’m not sure. If we could find another Hartley relative going back then it might break down the Hartley brick wall.

Any Other Way To Separate Hartley DNA From Snell DNA?

There is one main difference between James Hartley and Annie Snell above as it relates to their DNA. James was born in Bacup, Lancashire, England and Annie was born in Rochester, Massachusetts. All of James’s ancestors would also have been born in Lancashire. On the other hand, all of Annie’s ancestors that would produce matches go back to Colonial Southeastern New England. That means that if we find a match that is from England and has no ancestors in the United States, there would be a good chance that that DNA match was through the James Hartley side.

Beth’s X Chromosome

First, let’s look at my family. There is no Hartley X Chromosome sharing with this group because the X-DNA does not travel from father to son.

Second, look at Beth compared to Jim:

Beth got one of her X Chromosomes from her dad. This was the same X that he got from his mother Mary. Jim got an X Chromosome from his mother. She got it from James Hartley b. 1862 and Annie Snell. So Beth and Jim have James Hartley and Annie Snell in common.

These pieces of blue where Beth and Jim match represent DNA that they share from James Hartley and/or Annie Snell.

How do Patricia and Beth compare by X-DNA?

Next we will look at Patricia and Beth. They will share X-DNA with their grandmother Mary Hartley. Beth’s dad got no X-DNA from his Nute dad, so Beth and Patricia will only match on Mary Hartley.

Note here that Beth and Patricia share some X-DNA from their grandmother that isn’t shared between Jim and Beth on the left side. They also share a longer segment at the right hand side than Beth and Jim shared. However, Jim and Beth shared a segment from 123 to 138M that wasn’t shared between Patricia and Beth.

Let’s See How Patricia Compares With Jim

The only comparison left is between Patricia and Jim.

I compared the three comparisons and came up with a bit of an X Chromosome map. In the first match between Beth and Patricia, I have that match in red. On the very right there are three matches, so I have that as great grandparent 1. We don’t know which great grandparent it is – just that it is the same one. On Jim’s map, it is his grandparent 1. Going from right to left on Jim’s map, he changes from getting his X-DNA from grandparent 1 to grandparent 2. However, Patricia and Beth continue to match on great grandparent 1. In the middle there are no matches, so we can’t tell what is going on. Also the two reds and one blue on the left may actually be two blues and a red as we don’t know how they match with the segments on the right.

Beth’s Hartley (and Snell) Chromosome Map

If we look at all the matches Beth has with Jim, my siblings and me, we will have a map of her known Hartley (and Snell) DNA:

I didn’t use the DNA shared between Patricia and Beth as they are first cousins. As such, they will share Nute and Hartley DNA and it will not be as easy to tell which is which. So second cousins are good for these maps. The red is in the bottom part of each chromosome. That represents the paternal chromosome. We have not mapped any of Beth’s maternal chromosome. If Beth were to look for Hartley or Snell matches, it looks like her best bet would be on Chromosome 12.

For comparison, here is my Chromosome Map.

On my map, the blue corresponds to Beth’s red Hartley DNA. We seem to share a stretch of Hartley DNA on Chromosome 1. But where Beth has a long stretch of Hartley DNA on Chromosome 12, I have none.

 

Using M MacNeill’s Raw DNA Phasing Spreadsheet and My Problem Chromosome 10

I have written many blogs about phasing my own raw DNA. One of the things that was bothering me while going through the process was the presentation of the results. It is possible to phase millions of bases using the raw DNA results from one parent and at least 3 siblings. But once the DNA is phased, how can those results be best portrayed? In my previous Blog on the subject, I was able to figure out a fairly simple way to show my results, but the outcome was not totally satisfactory.

chr7patmatmap

I liked how I was able to get the grandparents’ surnames at least in the first 2 bars. I also liked how I had a simple scale at the bottom. However, one of my bars went too far. Also, my simple chart started at zero and Chromosomes start at different positions. I was able to fix the bar going too far today. Excel makes these bars based on distance rather than positions, so one of my equations was wrong.
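The distance-versus-position point can be shown with a small Python sketch. It assumes, as an illustration only, that a stacked bar is fed a leading blank offset followed by one length per segment; the function name and the numbers are invented and are not taken from the spreadsheet.

```python
def positions_to_lengths(chrom_start, crossovers, chrom_end):
    """Turn crossover positions into the values a stacked bar needs.

    The first value is the blank offset before the chromosome's first
    tested position. Every value after that is a segment length, not a
    position, because a stacked bar adds each value on to the last.
    """
    edges = [chrom_start] + sorted(crossovers) + [chrom_end]
    lengths = [right - left for left, right in zip(edges, edges[1:])]
    return [chrom_start] + lengths


# Invented numbers, for illustration only.
values = positions_to_lengths(100, [400, 900], 1500)
print(values)       # [100, 300, 500, 600]
print(sum(values))  # 1500, so the bar stops at the chromosome end
```

Entering a position where a length belongs is one way a bar could run past the end of the chromosome, since the positions get added together.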

I told M MacNeill <prairielad_genealogy@hotmail.com> of my concerns and he sent me his spreadsheet. One feature I really liked about the MacNeill Spreadsheet is that it had a place for cousin matches at the bottom. Below is the first Chromosome where I used my phased raw data from my mom and 3 other siblings to create a MacNeill Chart.

chromosome15macneill

Sharon’s maternal first little segment didn’t work out perfectly, but that didn’t bother me. I know that the beginning and ends of Chromosomes can have small problematic segments. Note at the bottom that my match to Carolyn in yellow shows where my maternal crossover is in the upper part of the chart where I go from red to orange.

My Chromosome 10

I am looking at my Chromosome 10 because, for one thing, I have had trouble trying to visually phase this Chromosome in the past. Here is my attempt at visual phasing from early in 2016:

chr10visphase

Here is another try including additional cousins that tested:

10r1visphase

Note how different the maternal (lower) side is. I switched most of the maternal grandparents around.

Here is the MacNeill spreadsheet showing just the cousin matching part:

cousinmatch10macneill

I have some good matches here. Blue is Hartley, green is Frazer, yellow is Lentz. Red is Rathfelder. This makes it clear that my chromosome is mapped wrong. I need more Hartley and Lentz. The above chart includes my brother who I had tested not too long ago.

Here is another try with my brother’s DNA results included:

10visphase3

My sister Sharon (S) has a better look now on her maternal side. I got rid of the small purple segment.

Looking At the Raw DNA Phasing – Paternal Side

I have two spreadsheets summarizing the results of the many hours of work it took to phase my family’s DNA from the raw data. One spreadsheet is for the paternal side phased DNA and the other is maternal. I have patterns for both sides. They are based on the order of my siblings: me (Joel), Sharon, Heidi and Jonathan. So an ABBB pattern would mean that Sharon, Heidi, and Jonathan all get their DNA from one grandparent, and I get mine from the other. Here is the paternal spreadsheet:

dadpatternchr10

These patterns go logically one to the other. The first pattern goes from AABA to AAAA at position 2,605,158. The B changed to an A in Heidi’s position, so the crossover goes to her at that position. I have a column called GaptoNext. This is based on the number of tested SNPs between patterns. When this number is large, I suspect an AAAA pattern. That was the case above highlighted in yellow. Except there is a problem. To go from ABAB to AAAA means 2 changes, and there should only be one change (or crossover) at a time. This caused me to look at the bases.

A Paternal pattern missed

Here is what I found.

chr10patternmissed

I had missed an AABA pattern at Build 36 Position 30,683,878. I took another look by setting my MS Access query so that Sharon and Heidi would have a different base from Dad:

chr10rawpatterns

This shows that there is a change from ABAB to AABA even sooner than I thought, between ID 400008 and 400045. This is an ID I created that sequentially numbers the tested SNPs. You can see another way I missed this pattern, because I didn’t fill in the missing bases. TTC? should be TTCT. CCT? should be CCTC.

What does the missing pattern represent?

The pattern of ABAB to AABA is actually my crossover (Joel). It is a bit more difficult to see than the others. That is because the ABAB pattern is the same as BABA. The change of BABA to AABA is my change of the first B to the first A. Naturally, I put myself in the first position. In rough terms, that gives me a paternal crossover at about position 30.5M. This is a good location as it does not interfere with a large match that I have with an unknown paternal DNA relative named Shamus:
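Here is a minimal Python sketch of that reasoning, using the sibling order Joel, Sharon, Heidi, Jonathan. The function names are hypothetical, and this is not the Access or spreadsheet logic used for the charts. It only shows the idea that a pattern and its mirror image (ABAB and BABA) are the same grouping, and that a single crossover should change one letter.

```python
SIBLINGS = ["Joel", "Sharon", "Heidi", "Jonathan"]


def mirror(pattern):
    """Swap A and B. ABAB and BABA describe the same grouping."""
    return pattern.translate(str.maketrans("AB", "BA"))


def crossover_between(before, after, siblings=SIBLINGS):
    """Return the siblings whose letter changes between two patterns.

    Both labelings of the first pattern are tried and the one that
    needs the fewest changes is kept. One name means a clean
    crossover; more than one suggests a pattern was missed in between.
    """
    best = None
    for candidate in (before, mirror(before)):
        changed = [s for s, x, y in zip(siblings, candidate, after) if x != y]
        if best is None or len(changed) < len(best):
            best = changed
    return best


print(crossover_between("AABA", "AAAA"))  # ['Heidi']
print(crossover_between("ABAB", "AABA"))  # ['Joel']
print(crossover_between("ABAB", "AAAA"))  # two names, so something is missing
```

Going straight from ABAB to AAAA needs two changes no matter how the letters are labeled, which is the warning sign described above.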

shamus

Here is my corrected Dad Pattern for Chromosome 10:

dadpatternchr10corrected

I have gone from 6 to 8 crossovers as the previous correction led to another one. I also took out one of Heidi’s crossovers that I had wrongly identified. So fixing one problem fixed a lot of others. It helps to describe the start and stop of each pattern and to describe each crossover. The important results are the person and the last Position column. These show who the crossover belongs to and where that crossover occurs on the chromosome. I then entered the paternal crossover results into the MacNeill Spreadsheet and got this:

patchr10chart

I took out the large space between the siblings. The problem is that the space is now the same as between the maternal and paternal phased part for each sibling. Excel has no happy medium that I’ve found.

The blue is Hartley and green is Frazer. The raw phasing in the upper part of the chart matches with the cousin matches below. It is interesting that some of the cousin matches define the crossovers. For example, the Jim to Sharon match gives Sharon’s crossover. Also the Paul to Sharon match gives Sharon’s other crossover. The Paul to Jonathan match gives Jon’s first crossover.

The Maternal Side

Hopefully resolving the maternal phasing will be easier than the paternal side. My visual phasing only showed four crossovers. Here is my unfinished spreadsheet showing 5 crossovers (under the Person column):

maternalchr10

Here, it looks like I already added an AAAA pattern to the end. That was because the AABA pattern ended at about 114M and the Chromosome itself ends at about 135M. My GaptoNext column showed that gap as almost 20,000 SNPs. My question now is: should I add an AAAA pattern to the beginning also? Perhaps. An AAAA pattern means that 4 siblings match and all got their DNA in that area from their maternal (in this case) grandmother. Those results were consistent with how I had the visual phasing done. In fact, the visual phasing indicated that the 4 siblings should all get their maternal DNA from the Lentz side up until about 60M.

Let’s take a closer look. This gets at my first note above in the spreadsheet image. There were only 3 single SNPs showing the AAAB pattern and they were spaced a long way apart – over 10 Megabases each. In this case, I will disregard those 3 widely spaced patterns as some type of mistake and stay with the AAAA pattern. Once I made the change from the AAAA pattern to the AAAB pattern, that brings us up to about 60M for my (Joel’s) first crossover. That seems to fit well. That leaves us with 4 crossovers – one per sibling as opposed to the two per sibling on the paternal side.
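The idea of setting aside a pattern that shows up at only one SNP at a time can be sketched in Python. This is an assumption-laden illustration: the function name, the minimum run of 2 and the sample data are all invented, and the actual decision above was made by eye in the spreadsheet.

```python
from itertools import groupby


def drop_isolated_patterns(snps, min_run=2):
    """Drop patterns backed by fewer than min_run SNPs in a row.

    snps is a list of (position, pattern) tuples sorted by position.
    A real crossover should leave a long run of the new pattern, so a
    pattern seen at a single stray SNP is treated here as a likely
    mistake. The min_run cutoff is an assumption, not a standard.
    """
    kept = []
    for _pattern, run in groupby(snps, key=lambda snp: snp[1]):
        run = list(run)
        if len(run) >= min_run:
            kept.extend(run)
    return kept


# Invented SNP list, for illustration only.
snps = [(1, "AAAA"), (2, "AAAA"), (3, "AAAB"),
        (4, "AAAA"), (5, "AAAA"), (6, "AAAA")]
print(drop_isolated_patterns(snps))  # the lone AAAB at position 3 is gone
```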

First I’ll compact the Gedmatch browser results, then show the raw DNA Phasing results on the MacNeill Chart:

gedmatchcheckofrawphase

chr10phasemap

When I compare the results, I see a problem I had with the visual phasing. The next to the last crossover looked to belong to Sharon, but instead it belonged to Heidi. Also Jon’s second paternal crossover should have been marked as an “F” above. That was just a typo. The third J for Joel crossover that I had above was not a crossover. In the middle, the 2 close crossovers of J and S should be instead S and J if I’m reading the MacNeill Chart correctly. It looks like all the FIRs and HIRs, etc. match. Once I did the raw DNA phasing, it was obvious how the Gedmatch browser results had to match the raw DNA phasing results. Before I did the raw DNA phasing, it was not so obvious.

I’m happy with the results. I get to pick whatever colors I want for the four grandparents. It still would be nice to have some sort of labels or color key. After a hard day of phasing DNA, it is rewarding to see the results displayed so nicely. Thank you Mr. MacNeill.

A few observations:

  • The 4 siblings did not inherit any Rathfelder DNA (brown) on the left side of Chromosome 10
  • Lentz DNA (yellow) is missing from the right side of the Chromosome for the same 4 siblings
  • As I have my mother’s DNA results, that would make up for the missing DNA from those 2 maternal grandparents
  • Short segments of Hartley DNA (blue) are missing near the beginning and near the end of the Chromosome (i.e. none of the four siblings inherited Hartley grandfather DNA in those areas).

Summary

  • M MacNeill has the best display that I am aware of for mapping phased DNA.
  • The final mapping is like the final exam where previous mistakes are brought out, but there is a chance to correct them.
  • The phasing process is difficult, but there are built in checks and balances to find and correct mistakes or missed patterns.
  • The raw DNA phasing procedure (I use the Athey method) would generally be used if a parent has been tested and the visual one is used if a parent has not been tested. However, the visual phasing as developed by Kathy Johnston is important to use as a framework for the raw DNA phasing as well as a check for the end result.
  • The raw DNA phasing results appear to be better than what I was able to get using the visual phasing. Not because the visual phasing method is bad; more because I have not mastered it.
  • If you are using someone else’s spreadsheet, it is a good idea to know how it works in case anything goes wrong.
  • After writing many blogs on visual and raw data DNA phasing, it is nice to see everything come together using the MacNeill Spreadsheets and Charts.

DNA Phasing of Raw DNA When One Sibling is Missing: Part 10

In this Blog, I would like to portray my phasing results in an Excel Bar Chart if possible. This has been one of the most difficult parts of phasing my DNA for me.

I have looked at Stacked Bar Charts in Excel as they seem to be the closest to what I am looking for. Today I looked at a method for producing Gantt Charts at ablebits.com which seems to hold some promise of application for DNA mapping:

bar-chart-excel

I had my Maternal Patterns’ Starts and Stops from my last blog. I took those and converted them to Build 36 and put them in a spreadsheet:

momcrossoverstable

Start is the ID# I was using. Start36 is the Chromosome position of the Start of the pattern in Build 36. App ID is the approximate position of the Crossover. Then I have that same location in Build 37 and Build 36. Following the logic in the Ablebits.com tutorial, I have the first Maternal Crossovers for Chromosome 7 in my simplified Chart:

matfirstxover7

I got this by choosing the Build 36 column and choosing Insert Stacked Bar. I suppose a better Title would have been Chromosome 7 Maternal Crossover rather than Build 36. This was taken from my Column Header. The goal is to get a 2 color bar above. However, I already see a problem. The bar needs to be different colors for different people. Well, I have to start somewhere.

Next, I put in the next crossover location for each person. I took this position and subtracted from it the first Crossover to get a length.

step2crossexcel

You may note that the Bar Chart inverts the original order. It gives Sharon a 4 which is now on top. Here is my visual phasing of Chromosome 7 that I am trying to replicate:

chr7visphase

My Excel Bar Chart order is Sharon, Jon, Joel, Heidi. My visual phasing order is Sharon, Joel, Heidi, Jon. The 2 maternal colors I have above are green and orange representing Lentz and Rathfelder. If I keep orange as Rathfelder, that means I want to change bar 2 and 3 (Joel and Jon) on the Excel Bar Chart. One way to do this is to move over the first Crossovers for Joel and Jon in my spreadsheet:

modchart

However, that made the 2 male siblings’ first maternal grandparent match too long. I needed to move the start over 2 places in my spreadsheet:

mat7revised

Now the Chr7 Maternal Crossover column can be called Lentz and the 2length column can be called Rathfelder.

Next, I added another column for the next Lentz portion of DNA:

chr73rdxover

I was hoping that if I named the next column Lentz, that Excel would give me the same blue as the first Lentz. I was able to right click on the gray and change it to blue. I then added another Rathfelder segment. For this to work in Excel, a Rathfelder length is added rather than a start and stop location.

chr7xover3

Again, I had to reformat the Excel-chosen color to be consistent with what I had for Rathfelder. I chose the last position for Heidi and Sharon as the highest that I had as this was their last segment. After a bit of wrangling with Excel, I was able to get this:

chr7

So that is the presentation. However, I notice that on my visual phasing, I had 5 segments for Jon and only 4 here. I missed his last Rathfelder segment. I had ended Jon’s Chromosome too early. Here is the correction:

chr7corrected

It still looks like one of Jon’s crossovers in the middle of the Chromosome may be off, but I’ll have to figure that out later.

Paternal Bar Chart

Now that I have something that looks like a maternal Chromosome Map, I need the paternal side to go along with it. It looks like if I add 4 more rows to my spreadsheet, I may have it.

I did this and I added Hartley and Frazer (my paternal side grandparents) to the right of the maternal side grandparents. I had to make a new chart that came out like this:

chr7matpat

Here #4 is my Paternal DNA. I found it a bit disconcerting that my paternal side was longer than the maternal. Here I’ve added a bit of formatting and made the colors consistent (one color per grandparent):

chr7patmatmap

Well, I guess I’ll just leave this imperfect. It will give me something to work on later. I did change the scale from millions to M’s to be easier to read. The above shows that Jon and Heidi share their paternal grandfather’s Hartley DNA un-recombined on Chromosome 7.

Summary and Conclusions

  • Learning how to phase my raw DNA has been interesting and time consuming
  • Delving into the A’s, G’s, T’s and C’s promotes understanding of one’s DNA
  • I owe a lot to M MacNeill and Whit Athey in learning how to do this phasing
  • Due to the data intensive nature of phasing, I would recommend the use of MS Access or some other database software.
  • An understanding of Excel or similar spreadsheet software is also important.
  • I had tested my brother Jon as an afterthought. It turned out that his test results were important in determining the phasing of the 4 siblings.
  • I have the overall skeleton of the phasing with crossovers. There is still a lot of work to complete the individual Chromosomes and trouble shoot problem areas.
  • Further, I have not worked on the X Chromosome due to the different nature of that Chromosome. My brother and I are already phased. My sisters are not.
  • Once these maps are done they will be a reference to all matches to my 3 siblings and myself.

Raw Data Phasing Part 4: Going from 3 Siblings to 4

In my last Blog, I mentioned that my brother Jon’s DNA test results came in this week. This happened in the middle of my attempt to learn how to phase the raw DNA data for my 2 sisters and myself. I was phasing the data in what I can only assume is a traditional way. I say I assume, as I haven’t seen any other blogs on the process. The difference is that I am using MS Access which I hope will speed up the process. I should be able to get results for 23 chromosomes at a time instead of just one at a time.

The arrival of the new DNA results poses at least two problems:

  • The previous 4 DNA data files were all in AncestryDNA version 1. Jon’s is in AncestryDNA2. While they are all Build 37, they look at somewhat different points on the chromosomes
  • One of the difficult parts of the previous process was identifying and dealing with patterns of phased paternal and maternal bases. Those patterns were AAB, ABA, and ABB. With 4 siblings, there will be more patterns. However, the Whit Athey Paper I have been following does also look at 4 siblings.

AncestryDNA Version 1 Vs. AncestryDNA Version 2

My understanding is that Ancestry changed the locations on the chromosomes that they were testing to get more into the medical area like 23andme. I don’t know if that is true. Here is a chart comparing the different atDNA tests:

ancestrydna-compared

I was doing well comparing Anc1 with Anc1 as I was looking at over 700,000 base pairs among 4 people. Once I compare Anc2 to Anc1, that number is cut down quite a bit. That is about a 40% drop. My only other option, other than re-testing Jon, is to compare Jon to my mother’s FTDNA results. However, that will only pick up 2-3,000 SNPs, so I won’t bother.

Back to Square One with 4 Siblings: Homozygous Siblings

I need to find Jon’s equal base pairs and apply one to his ‘from dad’ column and one to his ‘from mom’ column. That is, after I add all Jon’s data to my database and add those columns. First I need to decide where to add Jon’s data. I could add it to the beginning of what I have already done or to the end. I’ll try adding it to the end, because I think that the work I did already is OK. I want to build on that. So rather than adding Jon’s DNA to the first step, I’ll add it to my table called tblMomBaseFromDadBase. This table has over 700,000 lines of bases for 4 people. Jon’s has 668,942 lines. Actually, when I remove “Chromosomes” 24-26, I will only have 666,531 lines.

Querying Jon into my latest table

Here I am adding Jon and the Mom from Dad Table to my query design:

adding-jon

Access thinks the ID that it added was important, but it really isn’t, so I need to take out that equal join. I really want the join to be on the rsid, but I don’t want an equal join. Why not? With an equal join, I would end up with only the positions that Jon has, and I would lose 40% of the work that I have already done. Instead, I’ll use an unequal join (what Access calls an outer join).

unequal-join

I flipped the 2 tables in the query design area, so things are moving left to right. Then I chose the #2 join, which is basically an unequal join left to right: it keeps every row of the left table and adds the right table’s data only where the rsid matches.
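To make the difference between the two join types concrete, here is a minimal sketch in Python using the built-in sqlite3 module rather than Access. The table names, column names and rsids are made up for illustration; only the join behavior is the point.

```python
import sqlite3

# Hypothetical miniature tables; the real ones have hundreds of thousands of rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sibs (rsid TEXT PRIMARY KEY, joel TEXT);
    CREATE TABLE jon  (rsid TEXT PRIMARY KEY, jon  TEXT);
    INSERT INTO sibs VALUES ('rs1', 'AG'), ('rs2', 'CC'), ('rs3', 'TT');
    INSERT INTO jon  VALUES ('rs1', 'AA'), ('rs3', 'CT'), ('rs9', 'GG');
""")

# Equal (inner) join: only rsids present in BOTH tables survive.
inner_rows = con.execute(
    "SELECT s.rsid, s.joel, j.jon FROM sibs AS s "
    "INNER JOIN jon AS j ON s.rsid = j.rsid ORDER BY s.rsid"
).fetchall()

# Left outer join: every row of sibs is kept; Jon's column is NULL (None)
# wherever his test did not cover that rsid.
left_rows = con.execute(
    "SELECT s.rsid, s.joel, j.jon FROM sibs AS s "
    "LEFT JOIN jon AS j ON s.rsid = j.rsid ORDER BY s.rsid"
).fetchall()

print(inner_rows)
print(left_rows)
```

In this toy data the inner join drops rs2, while the left join keeps it with an empty value for Jon. That is the behavior wanted when the earlier 3 sibling work should not be thrown away.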

Actually, I changed my mind. I have a better idea. I will just do the first 2 steps on Jon’s raw DNA and then join the results together. That is a third way that I hadn’t thought of. The point is, that there are many ways to do things in Access. There can be more than one way to get to where you want to be.

Back to Homozygous Siblings

First I copied Jon’s raw data into a table called tblJonHeterozygousSib. This is so I can use an update query to update the data in the new table and still have the original. Hold that idea. The better idea is to use a make table query. The reason that this is better is that it can take out the “chromosomes” I don’t want:

make-table-query-jon

I took out the table I copied and I’ll make a better one with only Chromosomes 1-23. I hit the Run button and create a table with 666,000 lines:

jonhomosib

Then in the above table, I inserted 2 columns: JonFromDad and JonFromMom. Now this table is ready to phase for any homozygous siblings. By the way, it looks like my Chr23 or X is homozygous, but it isn’t. Ancestry adds an extra base. I only really have one for my X Chromosome.

Finally time to query and phase

I go to Query Design in Access and choose the above table. This is a very simple Update Query design:

qrysibhomojon

This says if Jon’s allele 1 is the same as his allele 2, put allele 2 as his base from mom and as his base from dad. I hit the run button for the update and get the dire warning that I’m updating a lot of information and can never change it back. Then I get a message that I’m updating 478,000+ rows. That is good. That is the number of Jon’s homozygous bases, which is quite a few. I’d say over two thirds.
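The logic of that update query can be sketched in a few lines of Python. This is only an illustration of the rule, not the Access query itself; the function name, the row layout and the sample rows are assumptions, and no-calls are not handled.

```python
def phase_homozygous(allele1, allele2):
    """Return (from_dad, from_mom) for one SNP.

    If both alleles are the same, each parent must have given that base.
    If they differ, this rule cannot say which came from whom, so both stay None.
    """
    if allele1 == allele2:
        return allele2, allele2
    return None, None


# Hypothetical sample rows standing in for Jon's raw data.
rows = [
    {"rsid": "rs_a", "allele1": "A", "allele2": "A"},
    {"rsid": "rs_b", "allele1": "A", "allele2": "G"},
    {"rsid": "rs_c", "allele1": "C", "allele2": "C"},
]

updated = 0
for row in rows:
    row["from_dad"], row["from_mom"] = phase_homozygous(row["allele1"], row["allele2"])
    if row["from_dad"] is not None:
        updated += 1

print(updated, "rows updated")
```

Counting the updated rows mirrors the message Access gives before running an update query, which is a handy sanity check.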

I’m not looking for crazy results and didn’t get any.

Homozygous Mom Query

I’ll copy my previous table into one to update. Then I need to add Jon’s base from mom where mom is homozygous. Easy peasy. I think this is all I need.

momhomoupdatequery

Actually, I did think of an issue. I have an equal join. That means I won’t be using the homozygous bases that mom tested for in the old AncestryDNA test that aren’t in the new AncestryDNA test list. My guess is that is interesting information but perhaps not very useful. It also occurs to me that in the spots where Jon doesn’t match up with my siblings, I will still have the 3 letter pattern work that I had done previously.

The query above says if Mom’s allele 1 = allele 2, then put that allele 2 in Jon’s from Mom base slot. I hit Run and pasted 277,000 rows of bases.

homomomforjonresults

This query will be a little more difficult to check. I have to create a query linking my mom’s DNA results to this table. I did that and see one problem already.

momrawtojonfrommom

The first problem is that ID 126 didn’t show up. That means that rs3819001 that Jon has is not in my mom’s raw DNA. I don’t want to have data for Jon that looks like it can be updated, but it can’t.

I think I can fix this.

Updated Table Query

A few steps ago, I ran a Table Query to get just Chromosomes 1-23 into Jon’s Table. I need to upgrade that query so that I am only including the locations (rsid’s) that are common to both my mother and Jon. I do this using an equal join on the rsid Field:

updatedtablequeryforjon

This time, my table for Jon only has the rsid’s that my mom has.

newtable-upda

Also my Chromosome formula was off, so I had to fix it. Also note that I have about the same number of rows as in my Anc1 vs. Anc2 table earlier in the Blog. I then re-added the Jon from Dad and Mom columns into the new and improved table. Then I reran the update query which told me I was about to update 284,000+ rows.

homozgygousjonupdate

This worked as well as last time, but this time I have the fewer rows I was trying to get.

Re-Run the update query for homozygous mom for jon

I double clicked on my old update query. The message said I was updating 277,000 rows or so. Now I’ll re-check my work. If there is no ID 126, I’ll be happy. Well it is still there, because I forgot to copy the previous homozygous sibling table into the homozygous mom table. After re-re-running the update, I got the desired results:

tableno126

And there you [don’t] have it: no ID 126. Here is my mom’s raw file compared to Jon’s updated table.

momrawtojonfrommom

Jon gets a G from mom at ID 128 even though Jon is AG, because mom is GG. Now I’m talking DNA.
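As a rough sketch of the homozygous mom step, including the case where mom has no result for one of Jon’s rsids, the idea looks like this in Python. The function name and the two example rsids are made up; rs3819001 is the one from above that is missing from mom’s raw data.

```python
def phase_from_homozygous_mom(jon_rsids, mom_genotypes):
    """Return {rsid: base Jon got from mom} wherever mom is homozygous.

    rsids that mom was not tested for are skipped, so nothing is left
    looking as if it could be updated when it cannot.
    """
    from_mom = {}
    for rsid in jon_rsids:
        genotype = mom_genotypes.get(rsid)
        if genotype is None:
            continue  # mom has no result at this location
        allele1, allele2 = genotype
        if allele1 == allele2:
            from_mom[rsid] = allele2  # mom could only have given this base
    return from_mom


# Hypothetical data: mom is GG at one spot and AG at another.
mom = {"rs_example_1": ("G", "G"), "rs_example_2": ("A", "G")}
jon = ["rs3819001", "rs_example_1", "rs_example_2"]

result = phase_from_homozygous_mom(jon, mom)
print(result)
```

Note that Jon’s own genotype is not needed for this rule: if mom is GG, Jon gets a G from her whether he is AG or GG.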

Merge Jon’s New Table with His 3 Siblings’ Tables

This is the point where I put everything together. I will try to use the Make Table Query for this one again. So I’ll put my newest Jon table together with my newest sibling table.

left-to-right-merge

This shows the left to right arrow join. I’ll want the larger file plus everything equal in the smaller file. Come to think of it, this Create Table Query would have fixed the earlier problem I had. I guess I was too careful! The other issue is that the ID in the 1st table won’t be the ID in the second table. I could keep the second ID, but I would have to rename it as Jon ID or Anc2ID.

newidtablemerge

 

Here I rename Jon’s IDs as JonID. I may not need it, but if I do need it I will have it. I guess MS Access wasn’t happy with my idea:

autonumber

OK, I took out the JonID and hit Run. Microsoft tells me about my new 700,000 row table.

Back to the Dad Patterns

Now that all the family is together I want to look at Dad Patterns, because I know that I will be updating those. Here is the first query I tried on my new Table of 4Sibs.

sharon-not-joel

This is looking for filled in Dad bases where Sharon’s base is not the same as Joel’s. That query gives me an ABAA pattern:

abaa-pattern

Also ABBB:

abbb

Here’s ABBA:

abba

It looks like ABAB is a possibility also. That means the following are possible:

  • AAAB
  • AABA
  • AABB
  • ABAA
  • ABAB
  • ABBA
  • ABBB

So if I chose Joel’s Base not equal to Sharon and then Joel’s base equal to Sharon would I have every combination? It looks like I need this combination to cover all possibilities:

  • Joel <> Sharon OR
  • Joel <> Heidi OR
  • Sharon <> Heidi OR
  • Heidi <> Jon OR
  • Jon <> Joel OR
  • Jon <> Sharon

Which in Access looks like:

access-pattern-combos
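The six conditions above cover every pair of siblings. A short Python sketch (names are illustrative, and the bases are in the order Joel, Sharon, Heidi, Jon) shows the same test, and also that it boils down to "not all four bases are the same":

```python
from itertools import combinations


def any_pair_differs(bases):
    """True if at least one pair of siblings has different bases from Dad."""
    return any(a != b for a, b in combinations(bases, 2))


# 4 siblings give 6 pairs, which matches the 6 OR conditions.
pair_count = len(list(combinations(["Joel", "Sharon", "Heidi", "Jon"], 2)))

examples = [("C", "T", "C", "C"), ("C", "C", "C", "C"), ("G", "A", "A", "G")]
for bases in examples:
    # Both tests always agree: some pair differs exactly when there is
    # more than one distinct base in the row.
    print(bases, any_pair_differs(bases), len(set(bases)) > 1)
```

So the query keeps every row except the ones where all four siblings share the same base.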

But Wait, I Forgot Principle 3 for Jon

Principle 3 says where Jon is heterozygous and he knows which base he got from mom, the other base goes into his From Dad column. Looking back at my old queries, I see this is a 2 step query. I’m tempted to try this in one step, but I think this got me in trouble before, so I’ll go with the simpler query. Simpler queries are usually better in MS Access.

jonhetero

This says where Jon is missing a phased allele from Dad and he has an allele that doesn’t equal the one he got from mom (making Jon heterozygous here) put that allele into Jon’s From Dad spot. I tried the query and only got 37 results. The problem is, I should have said ‘Is Null’ in the JonFromDad Criteria:

jonheteroisnull

This time I get 35,000 updates, so that is right. I then change the allele1’s to allele2’s above and get 33,000 updates to tbl4Sibs. I ran a quick query on the 4Sibs Table to get just Jon’s heterozygous results:

jonheterocheck

In the first line, Jon had allele1 as T which was different from the allele from Mom of G, so Jon’s T got put into the From Dad spot. At ID 41, Jon’s allele2 of G is from Dad because he had an A from Mom. When parent and child are heterozygous, the From Parent location remains blank.
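Here is a small Python sketch of Principle 3 as described above. It checks allele 1 and then allele 2 in one function, whereas the Access approach used two separate queries. The row layout and function name are assumptions for illustration; None plays the role of Null.

```python
def fill_from_dad(row):
    """Apply Principle 3 to one row in place; return True if a base was filled in.

    Only acts when the From Dad slot is empty (None) and the From Mom base
    is known. The allele that differs from mom's base must be from dad.
    """
    if row["from_dad"] is not None or row["from_mom"] is None:
        return False
    for allele in (row["allele1"], row["allele2"]):
        if allele != row["from_mom"]:
            row["from_dad"] = allele
            return True
    return False


# First case above: allele1 is T, mom gave G, so T is from dad.
first = {"allele1": "T", "allele2": "G", "from_mom": "G", "from_dad": None}
# ID 41 above: mom gave A, so allele2 (G) is from dad.
id41 = {"allele1": "A", "allele2": "G", "from_mom": "A", "from_dad": None}
# Heterozygous, but mom's base is unknown: both slots stay blank.
unknown = {"allele1": "A", "allele2": "G", "from_mom": None, "from_dad": None}

for row in (first, id41, unknown):
    fill_from_dad(row)
    print(row)
```

The explicit None check does the job of the ‘Is Null’ criteria that fixed the query above.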

Now I have Jon with 3 Principles: Homozygous Jon, Homozygous Mom and Heterozygous Jon.

Back to Dad Patterns

I have the old Dad Patterns for 3 siblings. Now I need to see what the 4 sibling Dad Patterns would be and add Jon’s Start and Stop Locations for his new Dad Pattern Areas. I need to combine that with the 3Sibs Table.

wrongpattern-query

My first query was wrong and gave bad results. The reason is that the ID for 4Sibs was from the raw data. The ID for the Dad Pattern Table just numbered the amount of Dad patterns. I needed to join the ID in the first table to the start and stop locations in the second table. I ended up doing 2 queries: one for the start position and one for the stop as I needed both. This query gives the stop position of a pattern.

stop-query

I took both those queries and put them into an Excel Spreadsheet.

excelstartstopfromaccessdad

I added a new column called Dad4Pattern. In the first row, the new pattern was AAA by chance. However, in the second row which is the Stop or End of the first Dad Pattern, it is obvious that the ABA Dad Pattern goes to an ABAA Pattern. I didn’t think that there would be many AAAA Patterns as that means that all siblings match the same Paternal grandparent. This is the only AAA pattern that I had noted so far as I wasn’t looking for them yet. Still, I will need to go back and verify that these Start and Stop AAAA’s were not by chance. Finally, on the last line, it is clear that the Dad Pattern goes from AAB to AABB with Jon added.

Next I chose all the cells where Jon had a base from Dad and performed a Concatenate operation to write the pattern.

concaternate

This gave me the CCCC that I wanted to check. Next, I wrote a formula to put the Dad bases together in a new column and wrote down the Dad Patterns that I had.

newdadpattern

A few notes:

  • Out of the 66 three sibling patterns that I had, I was able to find all but 5 new four sibling Dad Patterns. See the yellow above for two of the missing 4 sibling dad patterns.
  • The missing 4 sibling dad patterns should be easy to find by scrolling through the 4Sib Table
  • I noticed that there were no AAAB patterns. That is because in my previous search, I was not looking for AAA patterns. So now, I don’t have any AAAB patterns. I will have to find these in my new search.
  • AAAB is the situation where I match the same paternal grandparent as my 2 sisters, but Jon matches the other paternal grandparent.

Filling in more dad patterns

To fill in the yellow areas, I made a query in Access based on the 4Sibs Table. This looked at every case where Jon had a base from Dad. Searching around the ID 6604 and after, I found this pattern:

fill-in-patterns

ABBB

Then I checked near the end of the old 3 sibling pattern which is at ID 19806.

break-point

At ID 19827 we see an ABAB Pattern, so I enter that Pattern in my spreadsheet:

newpattern4

For the start of the new ABAB pattern, I used the old ABA location as that was more precise. The next interesting thing happens at Chromosome 2:

chr2

Here I have a problem in my spreadsheet. For some reason, the Start of the last pattern of Chromosome 2 ends at Chromosome 3, which is not right. My previous spreadsheet was better than that. From the ashes I will re-build.

I note that at ID 108798, my 4 Sib Spreadsheet goes to an ABAB Pattern. At the end of Chromosome 2, I see an AAAB Pattern. That was the one I wouldn’t have had from the 3 sibling pattern as I wasn’t checking on AAA’s.

I added new rows for the patterns ABAB and AAAB:

addnewrows

The most important thing here is the ID, the pattern, the Start and Stop. Here is the new change area from ABAB to AAAB:

chr2change

There are a few SNPs between the ABAB Stop and the AAAB Start that are a little unclear.

end-of-2

Finding Jon’s Patterns

Now I’ll check Jon’s Patterns. I’m looking for any changes in patterns as these should be important as crossovers later. I will need to assign the crossovers to each sibling’s Chromosome Map.

Good Old Triple A – B Pattern and all the others

AAAB is where Jon has a different paternal grandparent than his 3 tested siblings and the 3 siblings have the same paternal grandparent.

aaabquery

My query says that Jon has to be different from each sibling. I run that and insert the appropriate Start and Stop point for the AAAB in my spreadsheet.

I do the same for AABA which I can find using a similar query under Heidi’s criteria:

aaba-query

I ended up going to a clean spreadsheet. It was too messy combining the 4 sibling results with the old 3 sibling results.

4sibpatterns

Here I have the ID, the Chromosome, the pattern and the Start and Stop. The yellow marks a one SNP pattern. It appears that there should be 3 types of patterns:

  1. One where one sibling matches none of the others. That is what I have above: AAAB, ABAA, AABA and BAAA
  2. One where 2 pairs of siblings match each other: AABB or ABBA. I’m not sure what else there could be. I looked above and saw one other: ABAB
  3. One where all the siblings match each other: AAAA

That makes 7 or 8 patterns, depending on whether AAAA is considered a pattern.
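The count of 7 or 8 can be checked with a short Python sketch. It assumes the convention used here that the first sibling is always labeled A, so a pattern like BAAA is just another way of writing ABBB. The function names are made up for illustration.

```python
from itertools import product


def canonical(pattern):
    """Relabel a pattern so the first sibling is always 'A'."""
    first = pattern[0]
    return "".join("A" if letter == first else "B" for letter in pattern)


def sibling_patterns(n):
    """All distinct two-grandparent patterns for n siblings, first sibling = A."""
    return sorted({canonical("".join(p)) for p in product("AB", repeat=n)})


four = sibling_patterns(4)
three = sibling_patterns(3)
print(len(four), four)
print(len(three), three)
print(canonical("BAAA"))
```

For 4 siblings this gives 8 patterns including AAAA, and for 3 siblings it gives the familiar AAB, ABA and ABB plus AAA.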

Two Pairs of siblings match each other patterns

Here is the Access query for AABB

aabb-query

At first I was missing the criteria under SharonFromDad and that gave me AAAA combinations also. The result of the query looks like this:

aabb-results

Here Joel matches Sharon and Heidi matches Jon but on a different base. After I was finished putting in Starts and Stops for each Pattern, I then sorted my spreadsheet by ID. This brings up some issues that need looking at:

quality-control

Where there are 2 Starts or Stops in a row, there is a need to check what is going on. The ones around the yellow positions may not be a problem as I’ll likely be taking those single positions out. However, at the end of the Chromosome, there are 2 starts and 2 stops together. I need to go to ID 236707 and see what is before that point. It appears that there is an AAAA pattern before that point and that the ABAB at 224584 is a single point. That fixes half of the problem. Then I go to ID 238976 to see why I have a Stop there for ABAB.
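This kind of check could also be automated. The following Python sketch flags any place where two Starts or two Stops come in a row after sorting by ID. The IDs in the example are hypothetical and are not from the spreadsheet.

```python
def find_repeated_events(events):
    """Return the IDs where a Start follows a Start, or a Stop follows a Stop.

    events is a list of (id, kind) pairs with kind 'Start' or 'Stop'.
    """
    ordered = sorted(events)
    return [
        cur_id
        for (prev_id, prev_kind), (cur_id, cur_kind) in zip(ordered, ordered[1:])
        if prev_kind == cur_kind
    ]


# Hypothetical example: the Stop at 410 has no Start in front of it.
events = [
    (100, "Start"), (250, "Stop"),
    (251, "Start"), (400, "Stop"),
    (410, "Stop"), (411, "Start"),
]

suspects = find_repeated_events(events)
print(suspects)
```

Each flagged ID is a spot to go back to in the 4Sibs Table to look for a missed Start or Stop, or a single-point pattern.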

fix5

I had missed the Start for the ABAB right after the stop of the ABBA pattern, so I added it in. The repaired spreadsheet looks like this.

fix5spreadsheet

An application

Now that I have the change between ABBB and ABAB described, let’s look at what it means. Here is a different look at that location:

heidi-break

When the pattern changes from ABBB to ABAB, what has changed is the third B changes to an A. Heidi is in that location. So that says at the above position of Chromosome 5, Heidi has a paternal crossover. I thought it would be good to check my work against the work of M MacNeill. To do that, I used the NCBI Remap website to change my Build 37 results to Build 36:

remap

This would be the start of Heidi’s new segment. Here is what MacNeill had:

macneill-check

I got it right again. That is 2 for 2. Actually, the first time I tried, I was comparing the wrong Chromosomes. Rookie mistake. Here is M MacNeill’s map for Heidi on Chromosome 5:

macneill5

Perhaps it is difficult to see, but the point I am looking at is the little lighter red segment at the far right of Chromosome 5. Perhaps that is why I missed it the first time as it is so small.

Another aside is that this was a very difficult Chromosome to decipher using visual methods. This was one of my attempts to figure out the crossovers visually for 3 siblings.

visual-chr5

I had missed the last crossover as it is so small and difficult to see. In my defense, I should note that M MacNeill did mention that the end of this Chromosome was difficult to decipher.

Taking Out the X

I’ve realized that I’ve generated some bases for the X I got from Dad. Of course, I didn’t really, so I’m taking out any bases there for me and my brother Jon. I’ll use this update query:

takeoutx

I was worried that I’d mess something up, so I created a new table called 4SibsChrX. My query put dashes in the spots where I couldn’t have an X base from Dad:

xtodash
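In Python terms, the same cleanup might look like the sketch below. The column names JoelFromDad and JonFromDad and the row layout are assumptions for illustration. Like the new table above, it works on a copy so the original row is untouched.

```python
def remove_paternal_x(row, male_columns=("JoelFromDad", "JonFromDad")):
    """Return a copy of the row with dashes in the males' From Dad slots on the X.

    Chromosome 23 is the X in the AncestryDNA numbering. A son gets his only
    X from his mother, so he cannot have an X base from Dad.
    """
    cleaned = dict(row)  # work on a copy, leave the original alone
    if cleaned["Chromosome"] == 23:
        for column in male_columns:
            cleaned[column] = "-"
    return cleaned


# Hypothetical rows: one on the X and one on Chromosome 5.
x_row = {"Chromosome": 23, "JoelFromDad": "C", "JonFromDad": None, "HeidiFromDad": "T"}
chr5_row = {"Chromosome": 5, "JoelFromDad": "A", "JonFromDad": "G", "HeidiFromDad": "G"}

print(remove_paternal_x(x_row))
print(remove_paternal_x(chr5_row))
```

The sisters’ From Dad columns are left alone, since daughters do get an X from their father.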

This looks like a good place to end Part 4. It appears that there should be many chances to quality check my work and that the process is progressing. Getting Jon’s new DNA set me back a bit, but the results should be better than what I’d see with 3 siblings.

 

Raw Data Phasing: Part 3

This Blog is Part 3 documenting my learning process of phasing my DNA raw data using MS Access, the Whit Athey paper and the chromosome maps made for me by M MacNeill.

Part 1 and 2 Recap

  1. I imported 4 sets of raw data into Access from AncestryDNA after taking out the zeros that the Excel software produced for the no-calls.
  2. I used Access Queries to apply 3 Whit Athey Principles. This resulted in many phased bases for me and my 2 sisters.
  3. I put the phased A’s, G’s, C’s and T’s for each sibling into 2 new columns for each sibling.
  4. This resulted in 6 new columns. The first 3 of these six were for the paternally based bases. These resulted in a pattern which was either in the form of AAB, ABA, or ABB.
  5. The Athey Paper did not emphasize the AAA pattern or considered it a non-pattern. While specific AAA results within another pattern area are by chance, there are other areas where 3 siblings match the same grandparent where there will be an AAA-only Pattern.
  6. I separated my results into 3 patterns using Access: AAB, ABA, and ABB
  7. For each of those results, I noted where those patterns changed.  I did this by looking at the ID numbers. Breaks in the ID numbers were considered changes.
  8. However, there were some cases where the changes occurred around missing bases. For these, I went back and noted a more precise position of the pattern change based on where the change would be if the missing base were to be filled in.
  9. I made a preliminary bar graph using the first 3 paternal changes. These crossovers were mapped to myself and 2 sisters.
  10. Using the 3 patterns I developed Access queries to fill in the missing bases in the 3 paternal pattern areas.

So those were the 10 easy steps. Actually step 10 was difficult as there was quite a bit of refining the Access queries and quality checking the results. I needed 2 queries for each of the pattern areas. However, once I had the queries, it was the push of a button to update missing parental-received bases for 3 siblings within over 700,000 lines of DNA.

Back to Athey

This portion of the Athey Paper appears to apply to where I am now:

For some of the unfilled cells on the mother’s side of the table, we can fill in the alternative (other) base from the corresponding location on the father’s side of the table. That is, we know that the sibling with an empty cell got one base from the father, but the alternative base from the mother. Therefore, after the use of the Dad pattern fills in more cells, a newly filled – in cell in the father’s side of the table gives rise to a filled – in cell in the same position on the mother’s side–the alternative base to what was on the father’s side.

Unfortunately, I’m not sure what is meant above. My guess is that this relates to Principle 3:

Principle 3 — A final phasing principle is almost trivial, but it is normally not useful because there is usually no way to satisfy its conditions: If a child is heterozygous at a particular SNP, and if it is possible to determine which parent contributed one of the bases, then the other parent necessarily contributed the other (or alternate) base. This principle will be very useful in the present approach.

So now that missing paternal bases have been determined based on the patterns, it should be possible to fill in missing maternal bases for heterozygous children. First, I’ll do a Query to see if I can locate this situation. I’ll take my most recently updated Dad ABB Pattern Table update and query that. I’ll look at the situation where there are heterozygous results. Then, I’ll look at spots where there are missing bases from Mom.

Fortunately, I was able to come up with a slick looking Query for this situation:

mom-from-dad

Plus the Query design has some nice symmetry. The first criteria row of the query is for my (Joel) DNA. Reading across, it says Joel is heterozygous because my allele 1 does not equal my allele 2. Then it says that I have a base from Dad but not from Mom. This will show areas where the mom bases are missing in this heterozygous child situation.

mom-bases-to-fill-in

The truncated fields above are Joel Allele 1, Joel Allele 2, Sharon allele 1&2, Heidi allele 1&2. The next 3 columns are Joel, Sharon and Heidi from Dad. Then Joel, Sharon and Heidi from Mom (the last 3 columns). This shows that there are almost 12,000 of these Mom bases to fill in. Above the blue line are Heidi’s bases missing from Mom. Heidi is TC (heterozygous) on that line. Her Dad base is T, so Heidi’s Mom base will be C above the blue line. At the blue highlighted area, I am TC and my Dad base is C. My Mom base will be T on the blue line. I love these binary problems. They seem well suited for the computer. That means that a query to update almost 12,000 records should not be too difficult.

Looking for a Good Query to Fill In Mom Bases from Dad Bases

First, I copied my ABB Table to a new Table called tblMomBaseFromDadBase. I will want to update that table with a new Update Query. I already have the first part of the query. Now I need my thinking cap. Even better than thinking, I can look at what I did before. Here is my old query.

allele1-query-heterozygous

This is difficult to see, but I split the problem into 2 alleles. What this says is when Sharon has a base from her mom and Sharon’s allele 1 is not the same as the base from her Mom, pop that allele 1 into her base from Dad slot.

For our situation we are doing the opposite. So we will switch Mom and Dad. This time we are using our Dad results to get some Mom results. I’ll also add a criteria to make sure the Mom result is Null, so I’m not overwriting anything. It will just be an extra precaution.

Basically, I want to make sure Heidi has a base from Dad and not from Mom. In that case, when her allele1 is not equal to her base from Dad, put that allele 1 in as her base from Mom. Drawing upon my vast experience in this area of about 1 week, I get this:

allele1dad-to-mom
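The same rule can be written as a small Python function. This is a sketch of the logic only, with made-up names, and it covers allele 1 and allele 2 together rather than in two queries. The two examples are the Heidi and Joel cases described earlier (both TC, with Dad bases of T and C).

```python
def mom_base_from_dad(allele1, allele2, from_dad, from_mom=None):
    """Return the base from Mom for a heterozygous child whose Dad base is known.

    An existing Mom base is never overwritten. If the child is homozygous or
    the Dad base is unknown, this rule does not apply and None is returned.
    """
    if from_mom is not None:
        return from_mom  # extra precaution: keep what is already there
    if from_dad is None or allele1 == allele2:
        return None
    # Heterozygous: whichever allele is not Dad's base must be Mom's.
    return allele1 if allele1 != from_dad else allele2


heidi_mom = mom_base_from_dad("T", "C", from_dad="T")
joel_mom = mom_base_from_dad("T", "C", from_dad="C")
print(heidi_mom, joel_mom)
```

Returning the existing Mom base unchanged plays the same role as the extra Null criteria in the query.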

When I preview the results, I get about 6,000 lines which is half of my previous query, so that seems OK. I’ll go ahead and update my new Table. I renamed my Query to qryMomBaseFromDadBaseAllele1 and copied it to do the same thing with Allele2. I’ll change the Allele 1’s to Allele 2’s in the Query design. First I’ll do a Select (non-updating) Query to show what I’ll be updating with the Allele 2’s.

allele2momfromdadselectquery

Here I added the ID numbers, so I can make sure my update went well.

Here is my Allele2 Update Query with the 3 siblings included:

allele2momfromdadupdatequery

The results:

momfromdadupdate

In the far right column is the Base Heidi got from Mom. It was updated on lines 2292, 2295 and 2299. In each case Heidi’s Paternal Base was T, so the Maternal Base derived from that Dad base was C.

Here is my corresponding filled in Mom Base:

joelmomfromdad

My Dad’s T’s in 6 columns from the right were used to fill in the missing C’s in 3 columns from the right. Doesn’t it seem a bit ironic? Even though my dad was not tested for DNA, his “results” from this process are used to find the DNA I got from my mom who was tested.

A Premature End to This Blog and a New Beginning

This will be one of my shortest Blogs. I was both awaiting and not awaiting my brother’s DNA test results. Those results came in this week. The reason I was not awaiting was that I knew that I would need to re-start the raw data DNA phasing process once his results came in. With that, I’ll end this Blog and start a new one.

 

 

 

 

Raw Data Phasing Via Access, Athey and MacNeill: Part 2

In my last Blog on raw data phasing, I went through 3 principles that Whit Athey laid out in a paper on phasing raw data when one parent’s DNA results were missing. Using those principles, and the MS Access program, I was able to sort many of my bases and 2 sisters’ bases into ones we received from our mom and ones that we received from our dad. I checked a few of my results with a chromosome map made for me by M MacNeill.

Paternal Patterns

I had gotten to the part of the Athey paper where he talks about paternal patterns of bases that the sibling combinations received. I noted a space between the first two paternal patterns that I looked at. Below the pattern goes from an ABA pattern to an ABB pattern.

change-in-dad-pattern-hilite

There was a gap between the ABA and ABB pattern where there was no ‘pattern’ as my 2 sisters and I shared the same base there. When my sisters and I all share the same base, that is an AAA “pattern”. That AAA area corresponded exactly to the area between the 2 yellow lines below in the chromosome map made for me by M MacNeill – prairielad_genealogy@hotmail.com .

macneill-chr1-hilite

In the map above, MacNeill was able to determine that my 2 sisters and I got our DNA from our paternal grandmother in the area between the 2 yellow lines. Further, the first yellow line described Sharon’s first paternal crossover point and the second yellow line described my (Joel’s) first paternal crossover point.

Finding All the Paternal Crossover Points

At this point in the Athey Paper, he recommended looking at the paternal pattern and filling in the missing bases based on the known pattern. I was looking for an easier way to do this, so decided to take a different approach. I decided that I would find all the paternal crossover points first. Then, armed with that information, I would create a formula that would fill in most or all of the missing bases for each pattern.

However, this required a modification of my database to make the work easier. I wanted a number to define the range of patterns, so that I could apply an easy query to add missing bases. I already had this but I hadn’t used it. Back when I imported the 4 sets of raw data into Access, Access assigned an ID to every row of data. That meant that I needed to add that ID into all the queries that I had done previously to make tables and further queries. This took a while, but I believe that it was worth it.

table-with-id

The ID is the first column.

I started going down all my data and noting the change of each pattern. I put the results into an Excel table. Here the Start and Stop numbers are the Access assigned ID numbers. The ID’s correspond with the number of DNA locations looked at. In this case there were a bit over 700,000 of these locations in total for my mom, my 2 sisters, and me.

excel-pattern

Then I noted the patterns are repeating as would be expected. For example, my first pattern was ABA, but 3 patterns later, that same ABA repeated. My thought was to create a query just for ABA patterns. Then when scrolling down looking for changes, the separation between rows should be greater and it would be easier to see where those changes were.
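The idea of spotting pattern changes by looking for jumps in the ID numbers can be sketched in Python. The gap size of 50 and the sample IDs are assumptions chosen for illustration, not values taken from the data.

```python
def split_into_runs(ids, max_gap=50):
    """Group the IDs of rows matching one pattern into (start, stop) runs.

    A new run begins whenever the jump from the previous ID is bigger
    than max_gap, which suggests another pattern sits in between.
    """
    runs = []
    for current in sorted(ids):
        if runs and current - runs[-1][1] <= max_gap:
            runs[-1][1] = current  # extend the current run
        else:
            runs.append([current, current])  # start a new run
    return [tuple(run) for run in runs]


# Hypothetical IDs of rows that matched the ABA query.
aba_ids = [10, 12, 15, 300, 305, 900]
runs = split_into_runs(aba_ids)
print(runs)
```

A run whose start and stop are the same ID, like the last one here, is a single-point pattern of the kind that may just be a bad read.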

Here is what my Access query looks like. I changed the query name to DadSpecificPattern.

dad-specific-pattern-queryquery

This particular query gives me the ABB pattern. I have the HeidifromDad base equal to the SharonFromDad base. That makes me the A and Sharon and Heidi the BB of the ABB Pattern. If you think about it, that also means in these areas that Heidi and Sharon will have their base from the same paternal grandparent and mine will be from the other paternal grandparent. I’m learning as I go. I’m sure that information will come in handy later.

My plan seemed to be good, but there was one catch. Once I refined my query, most or all of the blanks disappeared. That meant that the start and end points might not be exact. Here is an example of what I mean.

change-in-pattern-rouch

This is from my old Dad Pattern query with the blanks still there. The change from ABB to ABA happens at ID or line 19809. However, the new query takes out the blanks to make it look like the change is at ID Line 19826.

Here is what my DNA results look like so far without a filter (or query). The last 3 columns are the bases from Dad columns. There is a lot going on between lines 19809 and 19826.

pattern-unfiltered

Once I apply a formula to add bases, it will say something like: In the lines that have the ABA pattern where there is a blank at either A spot, replace the blank with the A that is there. If I apply the rule too late, I will be missing an area. Worse, if I were to use the 19826 cutoff, I may still be using the previous rule. That rule would say basically the same thing except, “Where the row is ABB and one of the B’s is missing replace the missing B with the one that is there.” If I apply an ABB rule to an ABA area, I’ll get bad results.
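A small Python sketch of such a fill-in rule follows. The function names, the row layout and the IDs are hypothetical. Bases are in the order Joel, Sharon, Heidi, and None stands for a blank.

```python
def fill_by_pattern(bases, pattern):
    """Fill blanks using siblings that share the same pattern letter.

    With pattern 'ABA', the first and third siblings have the same base,
    so a blank in one can be copied from the other.
    """
    known = {}
    for base, letter in zip(bases, pattern):
        if base is not None:
            known.setdefault(letter, base)
    return [
        base if base is not None else known.get(letter)
        for base, letter in zip(bases, pattern)
    ]


def fill_segment(rows, start_id, stop_id, pattern):
    """Apply the rule only to rows whose ID lies inside the pattern's range."""
    for row in rows:
        if start_id <= row["id"] <= stop_id:
            row["from_dad"] = fill_by_pattern(row["from_dad"], pattern)


rows = [
    {"id": 101, "from_dad": ["C", "T", None]},  # inside the ABA area
    {"id": 205, "from_dad": ["C", "T", None]},  # outside, left alone
]
fill_segment(rows, start_id=100, stop_id=200, pattern="ABA")
print(rows)

# The same blank filled with the wrong rule gives a different (bad) answer.
print(fill_by_pattern(["C", "T", None], "ABB"))
```

The last line shows why the Start and Stop points matter: the ABA rule fills the blank with C, while the ABB rule applied to the same row would fill it with T.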

Long story short, I ended up recording a rough start and stop in my Excel Spreadsheet.

revised-spreadsheet-for-pattern

I started naming the segments, but realized that was not necessary. Some of the patterns were only at one point rather than in a long segment. I believe that is an anomaly due to a bad read, mutation or some other problem. Those are the ones in the spreadsheet that had no end point. It took me part of a morning to get all the paternal crossover pattern points for all 23 chromosomes. Fortunately for 3 siblings, the patterns are only ABA, AAB and ABB.

I just went back and checked the error points/anomalies. I reran the Heterozygous Sibling Query and it fixed at least the first problem and hopefully the others. When I added the ID’s in, I had to redo all the queries quickly, so I suppose that is where the errors came in. That is not a problem. As long as the problem can be found, a fix can usually also be found. There actually weren’t that many errors. There are still some anomalies that are just anomalies. I have left those in yellow in the spreadsheet image below.

So in my spreadsheet, I have all the rough starts and ends for all the crossovers for my 2 sisters and myself. Here is the top part of the spreadsheet sorted by rough start:

rough-start-sort

Next, all I need are more exact start and end points. Here is the start of what I have:

pos-and-id-and-pattern

I picked this section because it looks pretty complete already. Note that my Start and Stop numbers are pretty close to each other. That means that there are no other AAA segments in-between. I had to do an additional Access query to add in the position numbers for the Start and Stop of each chromosome’s pattern change. This was important if I want to convert the results from Build 37 to Build 36 to compare to MacNeill’s work or to gedmatch.com.

Starting to Find Paternal Crossovers and Assigning to Siblings

Previously I had been calling the start and end of my patterns crossovers. These two terms aren’t totally interchangeable, as the start or stop of a pattern may happen at the beginning or end of a Chromosome and therefore not be a crossover at that point. It seems like it should be pretty easy to find the crossovers. Look at the image above. The first and second rows show ABA going to AAA. The order of me and my siblings is JSH, or Joel, Sharon and Heidi. The only letter that changes is the B to A. That is the position that Sharon is in, so the paternal crossover has to go to her.

From row 2 to row 3 the pattern changes from AAA to ABB at Chromosome 1, position 23,288,828, Build 37. That doesn’t mean that 2 siblings have a crossover there, as we are looking at the patterns, not the letters. It is actually the letter that stayed the same that represents the crossover here. AAA to ABB means: all the same (AAA) goes to one different and 2 the same (ABB, in this case Sharon and Heidi). The one that is different is me and I get the crossover at this location.

The next change is from ABB to ABA. This is a little harder to see. I would say that this crossover goes to Heidi if my reasoning is right. BB was the same before and goes to BA. It must be Heidi that changed because now she matches Joel who didn’t change.

I’ll need to figure out how to make better bar graphs in Excel, but here is how the beginning part of my father’s Chromosome 1 broke up for 3 of his children. Or, another way to look at it, the vertical lines are where my father’s maternal and paternal chromosomes combined in each of his 3 children that we are now looking at.

excel-bar-chart-chr1

Where:

  • Series 1 is Sharon. Where the color goes from blue to orange is where Sharon has a change from one paternal grandparent’s DNA to another paternal grandparent’s DNA. The number to the right of Series 1 is the Build 37 Chromosome position number for Sharon’s crossover.
  • Series 2 is Joel’s first crossover (between orange and gray) and
  • Series 3 is Heidi’s first crossover position between gray and yellow [The same explanation under Sharon above applies to Joel and Heidi]
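The reasoning for assigning each crossover to a sibling can be sketched in a few lines of Python. This is only an illustration: it assumes the Joel, Sharon, Heidi (JSH) letter order and that exactly one sibling crosses over at each pattern change. The function names are made up for this example and are not part of Access, Excel or any DNA tool.

```python
SIBLINGS = ["Joel", "Sharon", "Heidi"]  # the JSH order used in the patterns


def canonical(pattern):
    """Relabel a pattern so its first letter is always 'A' (BAB -> ABA)."""
    first = pattern[0]
    return "".join("A" if letter == first else "B" for letter in pattern)


def crossover_owner(before, after):
    """Return the sibling whose switch turns `before` into `after`.

    Assumption: exactly one sibling crosses over at this pattern change.
    """
    for i, name in enumerate(SIBLINGS):
        flipped = list(before)
        flipped[i] = "B" if flipped[i] == "A" else "A"
        if canonical("".join(flipped)) == canonical(after):
            return name
    return None  # not explainable by a single crossover


changes = ["ABA", "AAA", "ABB", "ABA"]
owners = [crossover_owner(a, b) for a, b in zip(changes, changes[1:])]
print(owners)  # ['Sharon', 'Joel', 'Heidi']
```

The patterns only say who is the same and who is different, so BAA and ABB are treated as the same pattern. That is why the AAA to ABB change lands on Joel even though his letter appears to stay put.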

I’ll go back to the M MacNeill Standard. It’s like having an answer sheet to my questions.

macneill-chr1-hilite

According to MacNeill, I have assigned the crossovers to the correct siblings. In the above chart, just look at the red. I haven’t gotten to the maternal part yet, which MacNeill has in blue. The first 3 crossovers are where the red changes from light to dark or dark to light red. The difference in the MacNeill Chart is that his chart is split out one bar for each sibling. The other difference is that MacNeill has build 36 Chromosome position numbers and the numbers I have are from Build 37.

The Process

  1. Phase the siblings into maternal and paternal DNA using the principles that Athey outlines
  2. Find the paternal and maternal crossovers by pattern changes
  3. Assign the crossovers to the correct sibling using the pattern changes
  4. Assign the segments to the correct grandparent. This requires knowledge of cousin matches on the appropriate grandparent side.

That is the big picture which I am understanding as long as I don’t get too lost in the details.

Back to the Details: Fill in More A’s, G’s, C’s and T’s

I have been setting up my data for this, so hopefully, this will be easy. I now have 3 areas to look at:

  • AAB
  • ABA
  • ABB

AAB Paternal Update

Now I go back to my spreadsheet and sort it by Dad Pattern:

sort-by-pattern

The Start and Stop areas are the ones I want to update. First, I’ll copy my most up to date Table in Access which is tblSibHetorzygous. I’ll rename that tblDadPatternUpdate. Then I want to look for missing data and update the blanks using the AAB pattern.

In Access, I create a query with the new table.

dad-pattern-update-1

I chose the position fields and Paternal Pattern fields. I will change this to an update query which adds an Update To row. The criteria I want is when JoelFromDad = Sharon from Dad (AAB). Actually, I forgot, I was going to use ID criteria. So in the ID field, I need a lot of information. For the first AAB segment, I need everything between ID 45393 and 54155. This is what the criteria looks like:

aab-first-area

When I choose that area, I get over 8,000 lines. However, I only want to update when there is one missing value in the first 2 and the one that isn’t missing is not equal to the third. Here is the result of the above query in my first AAB area:

aab-patterns

I assume that the first blank should be a T. This would be one of the AAA results by chance in an AAB area. I don’t want to fill in the second line as I don’t know if it will be GGG or something else. That is what I meant by saying I don’t want to fill anything in unless there is only one missing value. In the 5th line there is A?G. That would have to be AAG (in an AAB Pattern area). There are some lines that have everything missing that I don’t want to touch.

How to create a query?

First, I want the situation where Joel doesn’t equal Heidi or Sharon doesn’t equal Heidi. That would create an AAB situation:

heid-not-joel-or-sharon

This query results in 1,666 rows of data including rows that are already filled in. Note that I had to write the range of ID’s twice because in order to get an OR situation I needed to put Joel not equal to Heidi and Sharon not equal to Heidi on separate lines. A simpler query is this one:

heidi-not-joel-or-sharon-one-line

The above achieves the same results in one line. Now, for this query, if Joel is blank, replace it with Sharon’s results. If Sharon is blank, replace it with Joel’s results. Here is the query prior to the updating part:

joel-sharon-blanks

This shows that there are 29 blanks for Joel and Sharon meeting this AAB criteria in the first range of AAB’s:

29-records-aab

Next, I apply the same logic to all the AAB segments. In the Expression Builder of Access, I type in this simple formula:

Between 45393 And 54155 Or Between 60990 And 72548 Or Between 207109 And 220679 Or Between 313271 And 317516 OR Between 326845 And 326912 OR Between 389395 And 390311 OR Between 400045 And 405578 OR Between 419982 and 427158 OR Between 433191 And 446672
OR Between 482297 And 492542 OR Between 532520 And 539292 OR Between 571557 And 579594 OR Between 589614 And 589666 OR Between 630037 And 630314 OR Between 630319 And 630378 OR Between 658744 And 659375 OR Between 670533 And 672360 OR Between 673325 And 682544

Simple but long. This has the AAB Starts and Stops for 23 chromosomes. Then I copy it into the next ID criteria line and get this result:

all-missing-aabs

It took a few minutes to type the criteria, but the goal is to update 1,514 lines of missing Paternal Pattern data with the push of one button. I still think it is quicker than going line by line and will be more accurate if I got the criteria right.
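For anyone who prefers to see the ID criteria as code, here is a small Python sketch of what the long Between ... Or Between ... expression is doing. It only uses the first four Start and Stop pairs from the criteria above, and the names AAB_RANGES and in_aab_area are invented for the example.

```python
# First few AAB Start/Stop ID ranges from the criteria above (inclusive).
AAB_RANGES = [
    (45393, 54155),
    (60990, 72548),
    (207109, 220679),
    (313271, 317516),
]


def in_aab_area(row_id, ranges=AAB_RANGES):
    """True when row_id falls inside any range, like Between ... And ... Or ..."""
    return any(start <= row_id <= stop for start, stop in ranges)


print(in_aab_area(45393))   # True (Between includes both end points)
print(in_aab_area(54156))   # False
print(in_aab_area(210000))  # True
```

Keeping the ranges in a list rather than one long typed line makes it easier to add or check a pair.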

Next, I change the above Select Query to an Update Query.

paternal-aab-update-query

When my (Joel’s) base from Dad is missing, I update to Sharon’s base. When Sharon’s base from Dad is missing her base is updated with mine. Isn’t sharing great? I didn’t look at the case where Heidi’s base from dad was missing, because if that was missing we wouldn’t be able to see any AAB Pattern.

Let’s Update

I push the run button and check the results. Here is my standard dire warning:

standard-dire-warning

Now I will check if it worked. I’ll try ID or Line # 682124:

bad-aab-results

Unfortunately, that was an undesirable result. Before I had A?G. I changed this to ?AG. It appears that my query did fill Sharon’s blank with my value, but it also replaced my value with Sharon’s blank. I hadn’t expected that. Next, I’ll check ID# 682182. I had ?AG and replaced it with A?G. So until I can think of a solution, I’ll need to split the 2 queries.

Fix it! Quick!

First I recopied my Heterozygous Sibling Table back to the Dad Pattern Update 1 Table. This got the table back to the way it was. Here is my simpler query.

dad-aab-simpler-query

Here if my base from Dad is null, replace it with Sharon’s base from Dad. I’ll check ID# 682182 again:

second-mistake

This gets into the category of trial and error. Sharon’s result still got replaced with nothing. See, in the previous query I was still telling Access to update Sharon’s results with mine. I needed to take that out:

fix

There. Now the SharonFromDad Update To is blank. I go through the same procedures and now it looks right.

right-results

We now went from ?AG to AAG in the last 3 columns. These are the bases from Dad columns.

The next step is pretty easy:

sharon-missing-aab

I took out my criteria and put criteria in the SharonFromDad field. When she has a blank, replace it with Joel’s base from Dad. I hit run and it updated over 600 rows. Here is my original check spot at ID# 682124 with better results in the last 3 columns:

better-results

It took a while, but at least I got it right. The moral of the story is to not ask Access to do 2 things at once when those 2 things involve the same 2 people.
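A likely explanation for the swapped blanks is that one update query setting both fields from each other simply trades the two values in every row it touches. The sketch below uses Python’s built-in sqlite3 module as a stand-in for Access, so it is an assumption that Access behaved the same way internally. The table and column names are simplified for the example; the two rows copy the A?G and ?AG cases at ID# 682124 and 682182.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dad (id INTEGER, joel TEXT, sharon TEXT, heidi TEXT)")
rows = [(682124, "A", None, "G"), (682182, None, "A", "G")]


def reset():
    """Put the table back to its starting state, like recopying the table."""
    conn.execute("DELETE FROM dad")
    conn.executemany("INSERT INTO dad VALUES (?, ?, ?, ?)", rows)


def snapshot():
    return conn.execute("SELECT joel, sharon, heidi FROM dad ORDER BY id").fetchall()


# One query updating both fields: each row's two values trade places,
# so the blank just moves from one sibling to the other.
reset()
conn.execute("UPDATE dad SET joel = sharon, sharon = joel")
swapped = snapshot()

# Two separate queries, each touching only one sibling's blanks.
reset()
conn.execute(
    "UPDATE dad SET joel = sharon "
    "WHERE joel IS NULL AND sharon IS NOT NULL AND sharon <> heidi"
)
conn.execute(
    "UPDATE dad SET sharon = joel "
    "WHERE sharon IS NULL AND joel IS NOT NULL AND joel <> heidi"
)
filled = snapshot()

print(swapped)  # [(None, 'A', 'G'), ('A', None, 'G')]
print(filled)   # [('A', 'A', 'G'), ('A', 'A', 'G')]
```

The second version only ever writes into a blank, which is the same idea as splitting the work into two queries with criteria on the field being filled.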

The Next Step: ABA

This time I’ll try a different query. I want there to be a B from the ABA in each case, so I’ll make sure that Sharon’s base from Dad is there:

aba-query

Maybe I’ll figure out what went wrong last time or come up with a new error. Above, I want the criteria on the first line to be for my blank base: if Sharon’s base from Dad is not equal to Heidi’s base from Dad, put Heidi’s base from Dad in my blank spot. For Heidi, when Joel’s base from Dad doesn’t equal Sharon’s base from Dad, put Joel’s base in Heidi’s spot.

I’m so tempted to try this query, but before I do, I’ll copy the previous table of the DadPatternUpdate to a new Dad Pattern Update ABA Table.  This will preserve what I have in the now older DadPatternUpdate Table in case anything goes wrong. Hey, what could go wrong?

query-aba-dad

I pushed the Update Button and updated over 30,000 rows. The results don’t appear to be any better, so I’m back to my 2 step process.

Here is my new slimmed down query:

slimmed-down-query

This new Update Query should update my Line 18 in the new UpdateABA Dad Pattern Table and it does:

lne-18

I now have a full ABA pattern on that line. According to Access over 30,000 Lines were updated, so it wasn’t a total waste of time.

heidi-aba

Run and check Line 149:

check-149

We have ABA in the last 3 columns, so that is good. Line 18 is still OK. I checked it just to make sure.

Query AAB Revised

After seeing how well the ABA Query went, I decided to revise the old AAB Query:

aab-query-rev

This is now looking at over 37,000 rows. This updates my AAB Blanks to tblDadPatternAAB. I don’t know if it is a better query, but at least I’m being consistent.

sharon-missing-aab-rev

This was over 80,000 rows, so I’ll assume that bigger is better.

I copied that resulting Table to tblDadPatternUpdateABA and reran the 2 ABA Update Queries. Here is one of the rerun queries updating the ABA Paternal Table:

rerun-aba

Down to ABB

My Last updated Paternal Table was updating ABA, so I’ll copy that to a new Table called tblDadPatternUpdateABB. I’ll also copy my last query and put in the appropriate Starts and Stops for the paternal ABB patterns. Again,

abb1

This says when Joel’s base from dad is not the same as Heidi, put that Joel from Dad into the space. Probably a more precise query would have said when Sharon from Dad is null and Joel from Dad is not equal to Heidi from Dad. I suppose technically the above query could be writing over a base with the same base in most cases.

I’ll fix that and notice that I had the wrong table in the top, so I’ll change that also.

abb-rev

This only updated 944 rows, so maybe bigger is not better. Here is Part 2:

abb2

This was almost 3,000 rows updated. Now I should check if it worked. I scrolled for an ABB Pattern in an old query and found this:

dad-pattern-abb

Here is my check:

abb-check

I guess I’ve been working too long. Here I have an AAB instead of the ABB I wanted. That is because I had Heidi updated to me (the A) instead of Sharon (the B). Here is the correction:

abb-corrections

I made a fresh Table of ABB. When I opened up the Query, it was saved this way:

corrected-abb

So Access changed my query. Note that there are 2 fields with HeidiFromDad in them. One is for the Update To and the other has Criteria. That is probably a clearer way to do it. Who should argue with Access?

I updated that and I take a cue from Access for Part 2:

access-abb-part-2

In English, the above says, “For this range, when JoelFromDad is not blank but SharonFromDad is, and JoelFromDad has a different value than HeidiFromDad, put that HeidiFromDad value where Sharon had the blank.” It sounds a little complicated.
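The same fill rule works for all three patterns, so it may be easier to follow as one small function. This is a sketch of the logic only, not the Access queries themselves. It assumes the bases are in Joel, Sharon, Heidi order with None standing for a blank, and the name fill_blank is made up for the example.

```python
def fill_blank(pattern, bases):
    """Fill one missing paternal base using the local pattern.

    pattern: 'AAB', 'ABA' or 'ABB' in Joel, Sharon, Heidi order.
    bases:   three entries, each a base letter or None for a blank.
    Returns a new list; rows that cannot be safely filled come back unchanged.
    """
    # The two siblings sharing a letter form the pair; the third is the odd one out.
    odd = next(i for i in range(3) if pattern.count(pattern[i]) == 1)
    pair = [i for i in range(3) if i != odd]
    known = [i for i in pair if bases[i] is not None]

    # Only act when exactly one of the pair is blank and the odd sibling is known.
    if len(known) != 1 or bases[odd] is None:
        return list(bases)
    # Only act when the row actually shows the pattern (the odd one differs).
    if bases[known[0]] == bases[odd]:
        return list(bases)

    filled = list(bases)
    blank = pair[0] if known[0] == pair[1] else pair[1]
    filled[blank] = bases[known[0]]
    return filled


print(fill_blank("AAB", ["A", None, "G"]))   # ['A', 'A', 'G']
print(fill_blank("ABA", [None, "C", "T"]))   # ['T', 'C', 'T']
print(fill_blank("ABB", ["A", None, "G"]))   # ['A', 'G', 'G']
print(fill_blank("AAB", [None, None, "G"]))  # [None, None, 'G']
```

The last call shows a row with two blanks being left alone, which matches the rule of not filling anything in unless there is only one missing value.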

Back to Row 197704 and I’ll look at 197709 while I’m at it:

corrected-abb-pattern

Oh no, it is still wrong! I checked the previous ABA Table and that was the reason for the error. The error is also in the old AAB Table. However, the error was not in the file before that. My guess is that the AAB rule got applied to the wrong range of rows. I don’t see an error there, so I’ll have to rerun all the queries.

That’s OK, because I’m brushing up on the queries and will use the Is Null value so we will only be filling in the missing bases.

rev-aab-query

I had more problems, so I deleted the AAB Table and recopied the previous Table into it. I reran the Revised AAB Query halfway and it looked OK. However, when I ran the second half of the AAB query – filling Sharon’s results, the problem came back at ID# 197704. Very mysterious. The problem was where I thought it was originally. Look at the ID Criteria for the AAB Pattern Query:

the-problem

There is an extra digit in the first between. The range goes from 45393 to 544155. The second number should be 54155. So this query was performed on 450,000 more rows than intended. I updated the AAB query with fewer rows. Again, fewer is better. After many requeryings, I got the desired result for ID# 197704:
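A quick check that might catch this kind of typo before running an update is to look for Start and Stop ranges that run into each other. The sketch below assumes that the ranges for one pattern should never overlap; the function name is invented for the example, and only the first three ranges are used.

```python
def overlapping_ranges(ranges):
    """Return neighboring Start/Stop ID ranges that overlap once sorted by Start."""
    ordered = sorted(ranges)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later[0] <= earlier[1]
    ]


typo = [(45393, 544155), (60990, 72548), (207109, 220679)]
fixed = [(45393, 54155), (60990, 72548), (207109, 220679)]

print(overlapping_ranges(typo))   # [((45393, 544155), (60990, 72548))]
print(overlapping_ranges(fixed))  # []
```

With the extra digit, the first range swallows the start of the second one, so the check flags it right away.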

197704

That should be the end of the first phase of nit picky work on the Paternal Side.

Summary, Conclusion and What’s Next

  • This was a lot of work, but the good news is that this update is for all the Chromosomes at once.
  • The bad news is that I have to do this again for the Maternal Side
  • Next up should be easy. That is just re-applying the Principles that Whit Athey Outlined on the new bases that I added from knowing the patterns. This should update missing maternally received bases from the updated paternally received bases.
  • I haven’t filled in blanks for the AAA patterns yet.
  • I am a little ahead of the game as I looked at how some of the first paternal crossovers will look.
  • Also with some basic phasing, I was able to deduce who those first paternal crossovers belonged to – one each to my two sisters and one for me.
  • If anything can go wrong it will

More Hartley DNA – Patricia’s DNA

This blog is a follow-up on my last Blog: My Hartley Autosomal DNA. I was inspired to write that blog following this year’s Hartley reunion in Rochester, Massachusetts. I intended to send around a little poster I made up about Hartley DNA and get a DNA sample from my father’s cousin Martha, but didn’t get a chance to. Instead I wrote a blog. I did talk to Patricia though. She is my second cousin and the sister of my childhood best friend, Warren. She had taken an AncestryDNA test. I think her daughter bought it for her. I asked if she could upload her DNA to gedmatch.com and she said that her daughter would be good at doing that.

Here are Patricia’s 2 brothers and Patricia. The one in the middle was my best friend in my first 6 years of school. I remember seeing home movies of Curtis, Warren’s older brother. He came to one of my older siblings’ birthday party when he was about this age.

Patricia and family

In my last blog, I wrote about the Hartley DNA matches my father’s first cousin Jim had with me and my 2 sisters. I was surprised to find out that every match that we had represented one of my four 2nd Great Grandparents. They were all born around the 1830’s. It turns out that Patricia’s matches with cousin Jim represent the same four 2nd great grandparents. In addition Patricia’s DNA matches with my 2 sisters and me represent the same four old timers.

Here is what my DNA match to Patricia looks like at AncestryDNA:

Patricia Ancestry

Here, AncestryDNA has it right that we are 2nd cousins. They show we match for a total of 206 cM (centimorgans) across 14 DNA segments. That is about all you can get out of ancestry. They won’t tell you which chromosomes we match on or how much we match on each chromosome. That is why people upload their results to gedmatch.com. Ancestry does show other people that match DNA to both Patricia and me. These are my 2 sisters and 5 others. All these people also descend from the same Rochester Hartley ancestors, but none of them have uploaded their results to gedmatch.com, so we don’t know their detailed DNA matching information.

Here is the same match between Patricia and me at Gedmatch:

Pat Joel Gedmatch

Ancestry has 14 segments vs. the 8 at Gedmatch. But at Gedmatch we know on which chromosome we match, how much on each chromosome and the exact start and stop location on the Chromosome. However, even with Ancestry’s 14 segments, their total is a bit smaller. Here is how I match Patricia on Chromosome 15 in the Gedmatch Chromosome Browser:

Joel Pat Chr 15

The blue areas represent the two DNA matches Patricia and I have on Chromosome 15.

Patricia on the Hartley Family Tree

When I was growing up, Patricia’s grandmother, my great aunt Mary, was also one of my neighbors.

Patricia's Tree

The bottom box in each row are the people that have tested their DNA and uploaded to gedmatch.com. I now show 3 of the 13 children of James Hartley and Annie Louisa Snell (James, Mary and Annie). I now can check how my sisters and I match Patricia’s DNA as well as how Patricia matches Jim’s DNA.

Here are my great grandparents and three of their older children.

James and Annie Hartley

It is an interesting photo. Two of the children are looking away. I think that one is my grandfather James. The mother, Annie, is looking at something in her hands. The older son Dan is looking at a book and the father James doesn’t look comfortable being dressed up.

Patricia’s DNA at Gedmatch

One of the basic functions at gedmatch is called ‘One to Many’. In this case, I took Patricia’s DNA and compared it to everyone else that has ever uploaded their DNA results to gedmatch. Here are her 1st 4 matches:

Patricia's 1st 4 matches

Not surprisingly, her top matches are her 1st cousin, once removed, Jim, me and my sisters Sharon and Heidi. The Gen column lists how far away gedmatch thinks Patricia’s matches are to a common ancestor. Patricia and I are 3 generations to James Hartley and Annie Snell, so that is right. Patricia shows 2.6 generations to a common ancestor with her match to Jim. A first cousin once removed would typically be 2.5 generations, so she shares a little less DNA than average here with Jim. Patricia also shares 19.3 cM of the X Chromosome with cousin Jim which I find interesting.
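The 3 and 2.5 figures can be reproduced from the family tree by averaging how many generations each person is from the shared couple. This is only the paper-trail expectation; Gedmatch estimates its Gen column from the amount of shared DNA, and the averaging here is an assumption made to illustrate the numbers. The function name is made up.

```python
def genealogical_gen(steps_a, steps_b):
    """Average number of generations from two people up to their shared ancestors."""
    return (steps_a + steps_b) / 2


# Patricia and Joel are both great grandchildren of James Hartley and Annie Snell.
print(genealogical_gen(3, 3))  # 3.0, 2nd cousins
# Jim is a grandchild of the same couple; Patricia is a great grandchild.
print(genealogical_gen(2, 3))  # 2.5, 1st cousins once removed
```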

The Hartley X Chromosome

I’m taking the X Chromosome out of order because I find it interesting. There is one most important thing to know about the X Chromosome. If you are a male, you get one from your mother. If you are a female, you get one from your mother and one from your father. My father only got an X chromosome from his Frazer mother, so he doesn’t match anyone further up on the Hartley line by the X Chromosome. However Patricia and Jim both have maternal matches that carry up the line.

Here is how Jim got his X Chromosome from his mother and her ancestors:

Jim's X Inheritance

Jim only inherited his X Chromosome from those ancestors in pink or blue. So, for example, he got no X Chromosome from any Bradford before Harvey Bradford.
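The X inheritance rule is simple enough to write as a short Python sketch. It lists which ancestor lines can pass down X DNA, using M for mother and F for father as you read up the tree. The function name and the path notation are made up for this example.

```python
def x_ancestor_paths(sex, generations):
    """List the ancestor lines that can contribute X DNA.

    sex: 'male' or 'female' for the tested person.
    Each path is read upward: 'MF' means the mother's father.
    """
    paths = [("", sex)]
    for _ in range(generations):
        next_paths = []
        for path, person_sex in paths:
            # Everyone gets an X from their mother.
            next_paths.append((path + "M", "female"))
            # Only daughters get an X from their father.
            if person_sex == "female":
                next_paths.append((path + "F", "male"))
        paths = next_paths
    return [path for path, _ in paths]


print(x_ancestor_paths("male", 2))    # ['MM', 'MF']
print(x_ancestor_paths("female", 2))  # ['MM', 'MF', 'FM']
print([len(x_ancestor_paths("male", n)) for n in range(1, 5)])  # [1, 2, 3, 5]
```

No path ever contains two fathers in a row, which is why a male line like the Hartley surname line drops out of the X so quickly.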

We need to compare Jim’s chart with Patricia’s X Inheritance Chart:

Patricia's X Inheritance

Here I didn’t show the X Chromosome that Patricia got from her father as this won’t match Jim. Then of what I show, only the bottom half will match Jim. This means that going back 4 generations from Patricia, she could match Jim by the X Chromosome on the Emmet, Snell or Bradford Line. One other difference between Jim and Patricia is that Jim got 100% of his total X Chromosome from his mother and Patricia only got 50%. However, that is a confusing way to put it because Patricia did get 2 X Chromosomes. So her one 50% must be similar to Jim’s 100% if that makes sense.

Here is what the X Chromosome match looks like between Patricia and Jim at gedmatch.com on their browser:

Jim Patricia X Match

The yellow part with the blue under it is where they match at the end of the X Chromosome. That is enough on my X diversion for now.

Back to the Hartley DNA Matches on the Other 22 Chromosomes

At gedmatch, I go to Jim’s ‘One to Many’ matches to see how he matches my family and Patricia. Here are Jim’s top 4 matches. You may have already guessed who they are:

Jim's top 4 matches

Above, I said that Patricia matched Jim a little less than expected. My sister Heidi at the top of the list matches him a little more than average.

Here are Jim’s DNA matches on Chromosome 1:

Pat Chr 1

  1. Me
  2. Heidi
  3. Sharon
  4. Patricia

Here Patricia has identified a new piece of DNA in green that is from a Hartley ancestor and that we didn’t know about before. Again, this “Hartley” ancestor may be Hartley, Emmet, Snell or Bradford.

Here is another new Hartley segment on Chromosome 2:

Pat Chr 2

Patricia matched Jim on Chromosome 2. My sisters and I had no match with Jim on that Chromosome.

It looks like Patricia got a double segment of Hartley DNA on Chromosome 5:

Patricia Chr 5

Patricia is #1 above. Where the color changes from orange to yellow likely represents a change from Greenwood Hartley to Ann Emmet DNA or Isaiah Snell to Hannah Bradford DNA.

Patricia Helping Me Map My Chromosome 7

I’ve tried to map all my chromosomes as well as my 2 sisters’ to my 4 grandparents. I got a little stuck on Chromosome 7:

Chr 7 Map Pat

My chromosome 7 depiction is the one with the J to the left of it. On my paternal side (which is the blue (FRAZER) and red bar), I have the DNA I got from my dad’s mother in blue and my dad’s Hartley dad in red. Above that is the gedmatch depiction of how I match my 2 sisters by DNA and how they match each other. The bright green bar is called the Fully Identical Region or FIR. This means wherever that occurs a sibling matches the other sibling by getting the same DNA from the same 2 grandparents (one maternal and the other paternal). So in comparing Sharon to Heidi, they have that FIR from 0 to 25. It turns out that their 2 grandparents were their mother’s mother (Lentz) and their father’s father (Hartley). In the tiny section between 0 and 4, I have what is called a Half Identical Region or HIR. That means that I share one grandparent’s DNA with my sisters there, but on the other side I got my DNA from a different grandparent than they did. In this case I had to share either the Lentz or Hartley grandparent with my 2 sisters, but I didn’t know which.
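The FIR and HIR idea can be sketched in code by writing down, for one spot on the chromosome, which paternal and which maternal grandparent each sibling got DNA from. The function name is made up, and the two options for Joel are just the two possibilities the HIR leaves open.

```python
def compare_siblings(a, b):
    """Classify one spot on a chromosome for two siblings.

    a, b: (paternal_grandparent, maternal_grandparent) that each
    sibling inherited from at that spot.
    """
    shared = (a[0] == b[0]) + (a[1] == b[1])
    return {2: "FIR", 1: "HIR", 0: "no match"}[shared]


# Chromosome 7, positions 0 to 4 (in millions).
sharon = ("Hartley", "Lentz")
heidi = ("Hartley", "Lentz")
joel_option_1 = ("Hartley", "Rathfelder")  # shares the Hartley side
joel_option_2 = ("Frazer", "Lentz")        # shares the Lentz side

print(compare_siblings(sharon, heidi))          # FIR
print(compare_siblings(joel_option_1, sharon))  # HIR
print(compare_siblings(joel_option_2, sharon))  # HIR
```

Both options give the same HIR result, which is why the sibling comparison alone cannot say which grandparent is shared.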

That is where Patricia’s results came in handy. Here is how she matches Sharon, Heidi and me:

Patricia Chromosome 7

Patricia has 3 good matches with Sharon and Heidi and one tiny one with me (#3 on the Chromosome Browser). However, the tiny one is the one I need. The pink match shows that my Chromosome 7 from 0-4 (in millions) is where I got my DNA from my Hartley grandfather and not my Frazer grandmother.

Here is my completed Chromosome 7 thanks to Patricia. I extended the Rathfelder on my Chromosome 7 all the way to the left or beginning and added a small chunk of red Hartley from my grandfather.

Chr 7 complete

Another Type of Chromosome Mapping

There is another type of Chromosome Mapping developed by Kitty Munson. The way the Munson Mapping is generally used is to map out your relatives’ common ancestors. In the case of Patricia and Jim our common ancestors are James Hartley and Annie Louisa Snell. Here is what my new Chromosome Map looks like with the addition of Patricia’s DNA matches with me shown in blue.

New Kitty Map for Joel based on Pat

Well, that’s about enough for Patricia’s DNA for now.

Summary and Conclusions

  • Patricia shared the first Hartley X Chromosome match that I’ve seen.
  • The X tends to shy away from the male line, so Patricia and Jim’s match is more likely down somewhere in the Massachusetts colonial line rather than the English Line.
  • I would like to use Hartley DNA to break through the Hartley genealogical brick wall. Right now I’m stuck in the early 1800’s in Trawden, England. There were too many Hartleys there with the same first name to figure out who was who. Patricia’s DNA may help in finding matches to other Hartleys
  • Patricia’s DNA helped me in mapping my chromosomes in 2 different ways.

 

My Hartley Autosomal DNA

I have written many blogs on DNA but I don’t think that I have written about my Hartley autosomal DNA. Autosomal DNA is the kind of DNA test of which Ancestry claims they have tested over 2 million people. Autosomal looks at the DNA we get from both our parents and their parents and so on until the DNA runs out. And it does run out for some ancestors at some point. Due to this effect, very little of my DNA is actually Hartley DNA. If you think of it, I got half of my DNA from my father, but he got half from his father, his father got half his DNA from his father and so on.

Paternal DNA from Maternal DNA

The best way to get your paternal DNA is to test your father. This avenue was not available to me. However, I was able to test my mother. Gedmatch.com has a utility available that will separate out the DNA I got from my mom from that which I got from my dad. That utility does not recreate my dad’s DNA, but it does recreate most of the portion of DNA that I got from him.
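The basic idea behind separating the two sides with one tested parent can be sketched as follows. This is not Gedmatch’s actual code, which is not shown here; it is a minimal illustration of the general principle at a single position, assuming two-letter genotypes. The function name is invented.

```python
def phase_with_mother(child, mother):
    """Split a child's genotype at one position into (from_mom, from_dad).

    child, mother: two-letter genotypes such as 'AG'.
    Returns (None, None) when this sketch cannot resolve the position.
    """
    c1, c2 = child
    if c1 == c2:
        # Homozygous child: both parents passed on the same base.
        return c1, c2
    if mother[0] == mother[1]:
        # Mother is homozygous, so she could only have given that base.
        mom_base = mother[0]
        if mom_base not in child:
            return None, None  # inconsistent data, possibly a miscall
        dad_base = c2 if c1 == mom_base else c1
        return mom_base, dad_base
    # Child and mother both heterozygous: treated as ambiguous here.
    return None, None


print(phase_with_mother("AA", "AG"))  # ('A', 'A')
print(phase_with_mother("AG", "GG"))  # ('G', 'A')
print(phase_with_mother("AG", "AG"))  # (None, None)
```

The positions that come back unresolved are the reason only most of the paternal portion can be recreated, not all of it.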

Here is what the utility looks like. It is quite simple to use and works quickly.

Phased Data Generator

Once I have this information, I can run the results against all my matches to find out which of my matches are from my dad and which are from my mom. There are also those that match neither which may be considered false matches. This takes out a lot of the guesswork with our matches. It makes life twice as easy.
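Sorting the match list this way boils down to checking each match against the paternal and maternal results. Here is a small sketch of that bookkeeping. Apart from Jim, the match names are hypothetical, and the function name is made up.

```python
def sort_by_side(all_matches, paternal_matches, maternal_matches):
    """Group match names by which phased kit they also match."""
    sides = {"paternal": [], "maternal": [], "both": [], "neither": []}
    for name in all_matches:
        on_dad = name in paternal_matches
        on_mom = name in maternal_matches
        if on_dad and on_mom:
            sides["both"].append(name)
        elif on_dad:
            sides["paternal"].append(name)
        elif on_mom:
            sides["maternal"].append(name)
        else:
            sides["neither"].append(name)  # possibly a false match
    return sides


# Hypothetical match names, except Jim, who is on the Hartley (paternal) side.
matches = ["Jim", "Match X", "Match Y"]
result = sort_by_side(matches, paternal_matches={"Jim"}, maternal_matches={"Match X"})
print(result["paternal"])  # ['Jim']
print(result["maternal"])  # ['Match X']
print(result["neither"])   # ['Match Y']
```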

Paternal DNA from Testing a Paternal Relative

The other way to find paternal (that is Hartley) DNA is to test a paternal or Hartley relative. That is when I went to my father’s cousin Jim and asked him to take a DNA test. He was willing and I have some Hartley matches. I also had tested myself and my two sisters. Here is what Jim’s DNA results look like compared to me and my 2 sisters on a Chromosome Browser:

Hartley DNA

I find this graphic interesting. It shows that Jim matches me and my 2 sisters on almost every chromosome. The last chromosome is the X Chromosome. It was cut off a bit. However, Jim could not match us on the X as my father only got his X Chromosome from his mother who was a Frazer and not a Hartley. On Chromosome 13 my 2 sisters and I have pretty much the same match with Jim. The 3 bars are of equal length. On Chromosome 20, only my sister Sharon matches Jim. On Chromosome 11 we all match but at different amounts. My sister Heidi has the largest match there. The places where we don’t match, my family is busy matching the other 3 grandparents. Or perhaps Jim is busy matching on his father’s non-Hartley line.

What Do All Those Matches Mean?

All those matches represent Hartley DNA. But remember that I said that even our Hartley DNA consists of other families. So the answer is a bit more complicated. First I will show the Hartley genealogy relative to the DNA match between Jim and my family. That will help explain all these DNA matches. In the first line below, Greenwood Hartley was from Trawden, England. Ann Emmet was from Bacup, England. Isaiah Snell had non-Pilgrim colonial ancestors. Hannah Bradford had Pilgrim Colonial ancestors.

Greenwood DNA

I have those with Hartley DNA in green. Those that have no Hartley DNA are in blue.

Here is Greenwood Hartley and Ann Emmet:

greenwood

Probably Hannah Bradford and Isaiah Snell at their house in Rochester, Massachusetts:

Hannah Isaiah

Every match between Jim, me and my siblings represents a specific Ancestor from the 1st line above

The common ancestors between Jim and me are James Hartley born 1862 and Annie Louisa Snell born 1866, but the DNA represented between Jim and me is actually their parents who were all born around the first third of the 1800’s. This was just made clear to me within the last few days. I know, it gets confusing. That means that out of the 1/4 of my DNA that is Hartley (as I have 4 grandparents), only 1/4 of that quarter is Hartley when we go back to where the DNA came from. That means that every orange, blue or green bar in the first image represents one of the 4 ancestors from the early 1800’s above.
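The quarter of a quarter arithmetic is just repeated halving. Here is a tiny sketch using Python’s fractions module; these are averages only, since the real amounts vary from person to person. The function name is made up.

```python
from fractions import Fraction


def expected_share(generations_back):
    """Average share of autosomal DNA from one ancestor that many generations up."""
    return Fraction(1, 2) ** generations_back


print(expected_share(2))  # 1/4, one grandparent
print(expected_share(3))  # 1/8, one great grandparent
print(expected_share(4))  # 1/16, one 2nd great grandparent
```

So on average about 1/16 of my DNA would come from Greenwood Hartley himself, which is a quarter of the quarter that came through my Hartley grandfather.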

How We Get Our DNA

When we were conceived, we got our own blend of DNA. That DNA was really from our 4 grandparents. We got equal amounts from our mom and dad, but the amounts we got from their parents were blended and we may not have gotten an exact 25% from each of our grandparents. We all actually have 2 of each chromosome. One is paternal and one is maternal. For example, the siblings James Hartley b. 1891 and Annie Louisa Hartley b. 1902 received on their paternal chromosome alternating segments of Greenwood Hartley and Ann Emmet DNA. Likewise, on their maternal chromosomes, they had alternating DNA from Isaiah Snell and Hannah Bradford. Those mixtures of their 4 grandparents were passed down to Jim, me and my 2 sisters and are represented in the Family Tree DNA Browser that I show above and again below.

How Can We Tell Which Segment Matches Which of the Four Ancestors?

For example, it would be nice to know if Heidi’s Chromosome 11 match with Jim shown in green below represents  Hartley, Emmet, Snell or Bradford.

Hartley DNA

The best way to find out which segment represents which ancestor is to do additional testing.

Test:

  • A Hartley relative not related to Emmet, Snell or Bradford
  • An Emmet relative not related to Hartley, Snell or Bradford
  • Etc.

Well, I think you get the picture. Once one of these people is tested, they would be a reference and any match Jim or my family had with them would be from the Hartley, Emmet, Snell or Bradford lines. The problem is, where are these people? There may be Snells around not related to Hartleys, but I don’t know of many Hartleys not related to Snells. Sorry for the double negative.

Another way is to wait until one of these Snells not related to a Hartley shows up on a DNA match list. This doesn’t work for Ancestry matches because AncestryDNA doesn’t tell you which chromosome you match on. However, if they were to upload their results to gedmatch.com, then the segments could be identified.

Why Do We Want to Identify These Segments?

Well, for one, some find it interesting to know where they got their DNA from. Another reason is that once these are identified, we know right away where to look for an ancestor match. For example, if we knew a match was on the Bradford side, we would look for a common ancestor descending from the Mayflower perhaps.

Summary and Conclusions

  • When I tested my Hartley father’s 1st cousin, I got a lot of DNA matches on most of my chromosomes
  • These matches represent 4 of my 2nd great grandparents
  • These four 2nd great grandparents represent Trawden and Bacup, England and Colonial Pilgrim and non-Pilgrim lines.
  • So far, I have not been able to figure out which colored bar represents which 2nd great grandparent.
  • There may be some advanced techniques that could help me tease those out. Or I may be able to find those out by testing appropriate relatives if found.
  • The older generations are the best for testing as the further you get from your ancestors, the less autosomal DNA you carry. On average it is cut in half every generation.
  • Those relatives that have tested at Ancestry should upload their results to gedmatch.com for comparison.
  • One of my Hartley 2nd cousins has uploaded her DNA results to gedmatch.com and that will be the subject of my next Blog.

Slimming Down My Big Fat Chromosome 20

In a previous Blog, I mentioned My Big Fat Chromosome 20. I had discovered, for some reason, that more than one half of all my matches were on this Chromosome. This can be seen visually using a Swedish web site called dnagen.net.

dnagen circle chart

Here the default setting is at 200%. That means that only the matches that are twice as large as the median are shown. This program uses FTDNA matches. The match names are on the outside of the circle and the lines going between the names are what FTDNA calls ICW or (In Common With). I just noted today that there is a group on this circle that doesn’t connect with others at about 9 o’clock on the circle. These matches like to stay in their own Chromosome apparently. They are in a dark color which I take to be Chromosome 3. However, that is an aside.

The real point is to show Chromosome 20 in the dark green in the lower right half of the circle. Chromosome 20 is the Hong Kong of Chromosomes. In a little space, I have a lot of matches. Remember that Chromosome 20 is one of the smaller Chromosomes. If I have about 4,000 matches, that means that over 2,000 of them are on Chromosome 20. In my previous Blog on Chromosome 20, I determined that these matches were on my Frazer grandmother’s side. Her 2 parents were born in Ireland. That means that these matches represented Irish matches and not Colonial American matches as I had previously assumed.

The Progression of Sorting Matches

Autosomal DNA matches may be grouped in different ways. When I first tested, I got a bunch of matches at FTDNA. I didn’t know who any of them were. FTDNA had suggested some relationships which were mostly optimistic. Here is some of the progression of how I have sorted my matches:

  1. Sorted by projected relationship or match level (cMs)
  2. Sorted by actual relationship if known
  3. Sorted by Chromosome. This option is not available at AncestryDNA. One has to upload the AncestryDNA results to gedmatch for this option. This is when I discovered all my Chromosome 20 matches.
  4. Sorted by Triangulation Groups. By using a Tier 1 option at Gedmatch or by finding by hand all the matches that match each other at a particular segment, I was able to find many Triangulation Groups (TGs)
  5. Sorted by Maternal or Paternal. All our valid DNA matches should match on either the maternal or paternal side. Once I tested my mother, I was able to phase my results at gedmatch and find out whether I matched other testers on my mother’s side or my father’s side. This was a big breakthrough for me. This cut down a lot of frustrating searches. For example, there are a lot of people that match my mother that have Frazer or Fraser ancestors. My Frazer ancestors are on my father’s side. Therefore, I knew that when looking for Frazers, I could eliminate all my mother’s matches who had them as ancestors and not worry about them.
  6. Sorted by other known matches. I had my father’s 1st cousin tested. This got to the level of my great grandparents on my Hartley side. However, it didn’t tell me which great grandparent. My Hartley great grandparent was a relatively recent immigrant from England. My non-Hartley great grandparent had ancestors going back to the Pilgrims in Massachusetts. I also had other relatives tested and found other matches that I knew I was related to.
  7. Another breakthrough happened after I had my 2 sisters tested. I used a method by Kathy Johnston to find out where you got all your DNA from your 4 grandparents by comparing your DNA results to 2 siblings. This method worked pretty well on most of my chromosomes. Now I knew where the DNA was coming from at my grandparent level for most of my matches. When I had a match, I could check my map to see which grandparent that match belonged to.
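The sort by Chromosome in step 3 is easy to try yourself once you have a segment list downloaded. Here is a minimal Python sketch, assuming each row holds a match name, chromosome, start, end and cM. The rows below are invented for illustration and are not my real matches.

```python
from collections import Counter

# Hypothetical segment rows: (match name, chromosome, start, end, cM)
segments = [
    ("Match A", 20, 30_000_000, 44_000_000, 12.1),
    ("Match B", 20, 31_500_000, 43_900_000, 10.4),
    ("Match B", 3, 5_000_000, 15_000_000, 9.0),
    ("Match C", 20, 29_000_000, 44_000_000, 13.0),
    ("Match C", 20, 47_000_000, 55_000_000, 8.0),
]


def count_matches_by_chromosome(rows):
    """Count how many distinct people match on each chromosome."""
    # A person with two segments on one chromosome is counted once.
    seen = {(name, chrom) for name, chrom, *_ in rows}
    return Counter(chrom for _, chrom in seen)


counts = count_matches_by_chromosome(segments)
for chrom, n in counts.most_common():
    print(f"Chromosome {chrom}: {n} matches")
```

A lopsided count like this is how a pile-up on one chromosome stands out.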

That is about where I left it at my last Blog on Chromosome 20. I looked at my crossover points for Chromosome 20. Here are my sisters compared to each other and to me:

Chr 20 Crossovers

Here is how I used the above comparison to map my grandparents that gave me my Chromosome 20 segments. The blank parts are half identical and ambiguous, so rather than guessing, I left them blank. For example, on Sharon’s row on the top, either the orange goes to the left and blue starts at the lower half or the opposite: the purple continues to the left and the green starts at the crossover line.

Chr 20 Final Segment

My chromosome 20 is on the bottom. At the time I wrote my previous Blog on Chromosome 20, I discovered that the vast majority of my matches were due to my Frazer side (green) and not my Hartley side (orange). This was a surprise as my Hartley grandfather had a mother with American Colonial roots. The final point of my previous blog on the subject was:

The fact that all these matches are on my Frazer line doesn’t necessarily mean that they are Frazer matches. They could be McMaster, Clarke, Spratt or any other known or unknown ancestor of my Frazer grandmother.

It’s great that I now know that most of my Chromosome 20 matches are Paternal and that they are on my Frazer grandmother’s line. But I am still curious as to where they are coming from. Can I find out more? I would like to try.

Chromosome 20: Beyond Grandparents

One advantage I have is that I am working on a Frazer DNA project with 27 testers. There are 2 lines of Frazers. I am on the Archibald Line and there is another line called the James Line. These 2 lines are somewhat distantly related as the 2 brothers, Archibald and James, were born in the early 1700’s. Here are the matches for the project on Chromosome 20:

Chr 20 Matches

All of these matches involve at least one tester from the James Line, which I am not on. The 2 major matches between the Archibald Line and James Line are between myself (JH) and my sister (SH) on the Archibald Line and Bonnie (BN) on the James Line. As I show below, even my McMaster Line has Frazers in it, which could be the source of that match. Sharon had very few Chromosome 20 matches compared to her siblings Heidi and myself. The 1,000 plus matches I had were before the 47 million mark where I match Bonnie above. My mega-matches mostly occur on Chromosome 20 at 44,000,000 (End Location) or before. This tells me that my mega-matches are not of the Frazer surname. If they were, I would have seen some of my closer Archibald Line matches on Chromosome 20 from the Frazer DNA Project.

Enter Cousin Paul

Paul is my second cousin once removed who tested for DNA. His great grandparents are my 2nd great grandparents: George Frazer and Margaret McMaster.

George Frazer Tree

When I compare myself to Paul, I get to either the Frazer or McMaster Lines. This will eliminate the Clarke line of my great grandmother and her Spratt mother as they are not in Paul’s line – only mine.

My McMasters: It’s a Bit Complicated

Here is my McMaster Line going back from my Frazer grandmother.

McMaster Ancestry

Not only did 2 McMasters marry each other, one of them had a Frazer mother! Marion Frazer is my grandmother, so she is 2 generations from me. Margaret McMaster is at 4 generations. James and Fanny McMaster are at 5 generations to me. Their parents (the left-most McMasters above) are at 5 generations out from my cousin Paul and six generations from me. This is useful to know in the Generations Estimate I have below.
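Counting generations like this can be turned into a small Python sketch. The cousin rule is standard. The averaging is my own assumption about how a Generations number lines up with these counts, based on a 1st cousin once removed showing up at about 2.5. The function names are made up for illustration.

```python
def cousin_degree_and_removal(gens_a: int, gens_b: int) -> tuple[int, int]:
    """Return (cousin degree, times removed) for two people who are
    gens_a and gens_b generations below the same ancestral couple."""
    if min(gens_a, gens_b) < 2:
        raise ValueError("cousins need common ancestors at least 2 generations back")
    return min(gens_a, gens_b) - 1, abs(gens_a - gens_b)


def average_generations(gens_a: int, gens_b: int) -> float:
    """Assumed reading of a Generations estimate: the average distance
    of the two testers from their common ancestors."""
    return (gens_a + gens_b) / 2


# George Frazer and Margaret McMaster: 4 generations from me, 3 from Paul.
print(cousin_degree_and_removal(4, 3))  # (2, 1): 2nd cousin once removed
print(average_generations(4, 3))        # 3.5

# The left-most McMasters: 6 generations from me, 5 from Paul.
print(average_generations(6, 5))        # 5.5
```

So a match to Paul and me through the older McMaster couples would be expected to show up somewhere past generation 5.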

Here is where the Frazer/McMaster split is.

Frazer Buggy

George Frazer b. 1838 is on the left and Margaret McMaster b. 1846 is on the right. The photo was taken in Ballindoon, Ireland in front of the Frazer family home.

At Gedmatch.com, I compared Paul and myself at:

People who match one or both of 2 kits

I chose most of those that matched both Paul and me. I left out an apparent duplicate and one who is anonymous for now. I also left out my 2 siblings. With those results, I chose the Traceability option and got this chart:

Generations Paul Joel

Those in red are in the Frazer DNA Project. We know their genealogy. Gladys descends from the couple above George Frazer and Margaret McMaster. Michael and Jane descend from one level above that. Those in the circle above are related to Paul and me, but not to others in the Frazer DNA Project. [One exception is Jane, but she matches at generation 7 which is about as far out as Gedmatch goes. This may or may not be a real match.] If those in the circle are not Frazer, then the apparent conclusion is that they are McMaster relatives.

Back to Chromosome 20

See all the Chromosome 20 matches on my Gedmatch Traceability Report:

TG Chart Chr 20

Remember I said that my 1,000 plus matches on Chromosome 20 ended around 44M? This is what the above shows. It also shows a triangulation of matches. This triangulation is also implied by the cluster of matches within the circle of the Generations Estimate Chart above. The Chromosome 20 Triangulation Group (TG) includes:

  • Myself
  • *S. S.
  • Daphine
  • Feeney
  • Gladys

Now Gladys should not be in this list as she is in the Frazer DNA Project and has no known McMaster ancestors. In fact, when I run the ‘one to one’ at Gedmatch, she doesn’t match the others in the above list. There are glitches in the Traceability Report, so caution is needed. I first took out the last 3 names in the Generations Estimate to simplify the results. Unfortunately, that didn’t fix the problem, so I had to take out Gladys from the Frazer Project (sorry Gladys).

Gen Est Paul Joel

Now my presumed McMaster relatives are in the green circle. Here are the improved and simplified matches:

TG Chart Chr 20

I note now that the 2 ‘M’ kits (indicating 23andme testers) are matching each other, which is what I had expected previously. Note that I left my previous Traceability results in the blog as a warning that the Traceability utility is glitchy. Actually, the new report is not really improved, as now Michael from the Frazer Project is matching my presumed non-Frazer McMasters. I took out Michael, and then Jane from the Frazer Project developed similar bogus matches with those she is not related to!

I’ll have to take all the other Frazer Project people out for this Traceability to work. This was supposed to have worked so smoothly. Here, below Joel and Paul, should be the remaining McMaster relatives:

Joel Paul R3

Here is the Chromosome 20 TG. Note that Paul is not in it, but he matches others from the TG in other Chromosomes:

TG Chart Chr 20

This chart is only mostly right. Paul’s green match is actually on Chromosome 19 rather than 15:

Paul’s Actual Match with Edge

Here is the globe view of my proposed McMaster relative TG:

McMaster Globe

The colors in the lines correspond to the colors in the chart above. The light blue lines are the Chromosome 20 TG from my “big fat” area. The blue lines indicate a TG as they go from each of six people to the other 5. The gray lines represent multiple matches. I am at the bottom of the globe and my cousin Paul is to my right. He is not in the blue TG on Chromosome 20, but matches all my matches on other chromosomes at least once.
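The test for a TG on the globe, that every person has a line to every other person, can be written out as a short Python check. This is only a sketch with placeholder names. It assumes the pairwise matches on the segment have already been confirmed, for example with one-to-one comparisons.

```python
from itertools import combinations


def is_triangulation_group(members, pairwise_matches):
    """True if every pair of members matches each other on the segment.

    `pairwise_matches` is a set of frozensets, one per confirmed pair.
    """
    return all(
        frozenset(pair) in pairwise_matches
        for pair in combinations(members, 2)
    )


# Placeholder labels for six testers, not real kit names.
members = ["A", "B", "C", "D", "E", "F"]

# Six people need 15 pairwise matches for a complete TG.
all_pairs = {frozenset(pair) for pair in combinations(members, 2)}
print(len(all_pairs), is_triangulation_group(members, all_pairs))

# Drop a single pair and the group no longer triangulates.
missing_one = all_pairs - {frozenset(("A", "F"))}
print(is_triangulation_group(members, missing_one))
```

A person who fails the one-to-one comparison with the others would break the group in the same way.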

Conclusions and Further Research

From what I have shown above, I feel like I have found my McMaster relatives through DNA. However, these would have to be verified by genealogy. None of my proposed ‘McMasters’ have any gedcoms at gedmatch.

  • Daphine – she is on FTDNA but with no tree and no ancestors mentioned. An ICW search reveals 59 pages of matches – likely mostly on Chromosome 20.
  • Edge – He is at FTDNA. He has a limited tree. His paternal grandmother may be a lead. He has only 52 pages of in-common matches at FTDNA.
  • John – A search at 23andme showed nothing. Perhaps he is anonymous there.
  • Feeney – Same result – or perhaps these people are using different names?
  • *S. S. – I see an S. S. at Ancestry, but it is difficult to tell if it is the same person.

I have McMaster connections through DNA and genealogy at AncestryDNA, but there is no way to tell if the connection is on Chromosome 20 without a chromosome browser. My McMaster matches at AncestryDNA either don’t know how to upload their DNA to gedmatch, aren’t interested or haven’t gotten to it.

Opposition to TGs

Of late, on Facebook, there has been questioning as to the validity of TGs – especially large TGs like I have at Chromosome 20. The thought is that no common ancestors will be found as there are just too many possible common ancestors in these large TGs. I have not explained the 100’s of matches in my Chromosome 20 TG, but I have shown 5 people that match both myself and my cousin Paul. These 5 by DNA do not have obvious Frazer ancestry and appear to be in my McMaster Line. So I suppose we have a stalemate. I cannot prove at this time (except to myself) that my Chromosome 20 TG matches are McMaster relatives and those who are not in favor of large TGs cannot prove that these matches are not McMaster relatives.