Chasing Down Some Massachusetts Colonial DNA

Recently I was contacted by someone I knew in high school who said, “Who knew we were related?” Skot had tested his DNA at Ancestry and had found me as a Shared Ancestor Hint. Ancestry compares your trees, and if there is a match in ancestors and a match in DNA, you are put on a list.

Shared Hathaway Ancestors

Skot’s and my genealogy research both lead to Simon Hathaway and Hannah Clifton.

I have the above chart to my grandfather and Skot’s grandmother. The chart says that Skot and I are seventh cousins. Simon and Hannah were born in the early 1700’s and married in Rochester, Massachusetts. This is interesting as Skot and I both grew up in Rochester.

Does Skot’s and My Shared DNA Point to Hathaway and Clifton?

AncestryDNA doesn’t show that the DNA you share is the DNA that came from your shared ancestor. It sort of implies that but doesn’t prove it. To prove it, we need to use triangulation and have a chromosome browser. I asked Skot to upload his DNA results to Gedmatch where we could compare the DNA results. Here is what my match with Skot looks like at Gedmatch:

This shows that we match on Chromosome 10. I have a paternal phased kit at Gedmatch, and Skot also matched me there. That match shows that we match on my father’s side who had the Hathaway ancestors, so that is good.

Further, I have mapped my Chromosome 10 and it shows we match in an area where I got my DNA from my Hartley grandparent and not my Frazer grandparent whose parents were from Ireland. That is also a good sign:

This map shows me as J on the fourth bar. The Hartley is in orange and for me it goes from position 32M to 114M. According to Gedmatch, I match Skot from 68M to 77M, so that is well within my orange Hartley grandfather DNA area.

Triangulation of DNA

Triangulation of DNA is when A matches B, B matches C and A matches C, all on the same stretch of DNA. This is fairly easy to do. Once this triangulation occurs, it indicates a common ancestor. It is more difficult to find the common ancestor of that triangulation for various reasons. The next thing I look at is my sister Lori’s spreadsheet of matches. These matches have tested at various places and uploaded their results to Gedmatch. I’m looking at Lori’s matches because she matches Skot also, and because her test is more recent, so I have more matches for her.

Lori’s biggest match is 54, but that is with me. Lori matches Skot from about 68 to 77M, so these all start before that point. A few end before then. Lori has other matches in this region. Lori’s matches tested at AncestryDNA, 23andme and FTDNA. I tend to prefer AncestryDNA matches as the family trees are easier for me to read.

Lori’s first match of 22 cM is with Cheryl. Skot and Cheryl match at about the same spot and about the same cM as Lori and Skot match. That means the three triangulate.
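The triangulation check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the minimum overlap are made up for the example, and the two Cheryl segments are placeholder positions, since only the Lori to Skot segment (about 68M to 77M) is given in the text.

```python
from typing import Optional, Tuple

Segment = Tuple[float, float]  # (start, end) on one chromosome, in megabases


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the stretch shared by two segments, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def triangulates(ab: Segment, bc: Segment, ac: Segment, min_len: float = 1.0) -> bool:
    """True when all three pairwise matches cover a common stretch of at least min_len."""
    first = overlap(ab, bc)
    if first is None:
        return False
    common = overlap(first, ac)
    return common is not None and (common[1] - common[0]) >= min_len


lori_skot = (68.0, 77.0)    # from the text: Lori matches Skot from about 68M to 77M
skot_cheryl = (68.0, 77.0)  # hypothetical: "about the same spot" as Lori and Skot
lori_cheryl = (60.0, 80.0)  # hypothetical positions, for illustration only

print(triangulates(lori_skot, skot_cheryl, lori_cheryl))
```

Matching positions are only the DNA half of the job. The sketch says nothing about which ancestor the shared segment came from.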

Now the Hard Part – Finding the Common Ancestor

Cheryl has over 25,000 people in her tree. Does she have Hathaways or Cliftons? At Ancestry, Cheryl and Lori are not Shared Ancestor Hints to each other. According to AncestryDNA, the common surnames between Lori and Cheryl are:

However, Baker and Schmidt appear to be on my mom’s side, so I won’t look at those. Phillips and Warren didn’t show anything obviously helpful. When I click on Cheryl’s White, I get this:

This is interesting as I have ancestors in Dighton on my Snell Line and also White and Hathaway ancestors. With a little trial and error, I see that Elizabeth Hathaway’s mother is Elizabeth Talbot. That is one of my ancestral names also. Elizabeth’s parents according to Cheryl’s tree were Jared Talbot and Sarah Andrews. I have a match in that couple. Here is my tree:

This is what I meant when I said that finding common ancestors among triangulated matches was not easy. I’m not happy that Lori and Cheryl’s common ancestor is from the 1600’s, but at least we found a match. Perhaps we will come back to Cheryl. Right now, a tie-breaker would help. Hathaway/Clifton or Talbot/Andrews?

Skot’s Genealogy

Here is the spot of Skot’s genealogy where Ancestry has us matching:

Note that Ancestry simplified the situation a bit. We are matching on Simon Hathaway and Hannah Clifton. However, we also match on Arthur Hathaway. It is even more confusing than that because Arthur Hathaway was also the father of Simon Hathaway by his first wife Maria Luce. Wow. Then Skot has more than one Clifton in there.

Shamus Match

One of my good matches on Chromosome 10 in this area of interest is Shamus. He matches me closely at 43.8 cM by FTDNA and 39.4 by Gedmatch. According to FTDNA, we share the following surnames:

Barstow Cook Swift Samson Talbot Taylor Townsend White Wing Ward

I looked through these names, but saw no obvious connection before the 1700’s.

Sarah Match

Sarah matches Lori at 18 cM. She is at FTDNA. Her surnames that match are:

Clark Hatch Jewett Johnson Lutzelburger Lutzelberger Lombard Richmond Spooner Smith White Wing

At least Shamus and Sarah have the White and Wing names in common. By the way, Sarah has a different last name at Gedmatch and FTDNA, but I assume that she is the same person. Actually there is a way to prove it, because FTDNA has a chromosome browser. Here is how Sarah matches me using FTDNA’s chromosome browser:

Again, the DNA part is easy. It is the genealogy that is a bear.

Here is Sarah’s White and Wing connection:

Here is how I connect:

Again it is not a very satisfying connection. We connect only on Daniel Wing at the top. Our lines appear to come from two different mothers and the same father, Daniel, who was born in 1617. I wasn’t able to place Sarah’s Hannah White.

I didn’t find out much about Joanne or Joanna Hatch. I did read an account of a family tradition that said that Joanna and Bachelor Wing were cousins.

At this point, I’m ready to call it quits.

Summary of Genealogy Linked to DNA

So far I match:

  • Skot on Hathaway/Clifton – early 1700’s Rochester, MA
  • Cheryl – Talbot/Andrews 1640’s Dighton, MA
  • Shamus and Sarah – Wing 1617 Sandwich, MA

I’m sure there are other connections.

Continuing to Work Down My Sister Lori’s Match List

There are some 23andme matches, but I have no idea how to find their ancestry without contacting them. Next I see Michelle. I am able to find her using a Chrome add-on to AncestryDNA which I think is called DNA Helper. She matches at 22 cM at Gedmatch. Oddly, she matches at 27.6 cM at AncestryDNA where the matches are usually less than at Gedmatch. Unfortunately, her tree is private. I have been in touch with her by email and she says she is related to the Hatch family somehow. The next match is Sean at FTDNA, but he has no family tree.

Summary and Conclusion

  • The DNA shows that there is a common ancestor between the paternal matches that I have on a particular segment of Chromosome 10
  • Finding the one common ancestor of a triangulated group is difficult
  • It is likely that there are holes in the ancestry trees of these Chromosome 10 matches. If all those holes were filled in, then the common ancestor may become apparent.
  • While I was doing this exercise I filled in some missing ancestors on my Jewett line. One ancestor was a Reverend up in Rowley which I found interesting. So this exercise wasn’t a total waste of time.
  • Skot and I still likely match on Hathaway and Clifton. However, the DNA tests we both took don’t necessarily point to those two ancestors.
  • At this point, the only triangulated ancestor I found in this Chromosome 10 group was Daniel Wing of Sandwich, b. 1617.
  • In summary, the DNA is saying that there is some kind of colonial Massachusetts ancestry passed down. However, whether that ancestry is from Dighton, Rochester or Sandwich, MA or even somewhere else is not clear.





Cousin Holly’s Hartley DNA Results

I have many 2nd cousins. Over 100 I’m sure. My Hartley great grandparents had 13 children. All their descendants in my generation are 2nd cousins. Holly is one of those 2nd cousins. My first recollection of Holly is that she was creating a bit of commotion at our town’s ball field. I was probably about 5 years old at the time. I had an impression that she may have been a relative but I wasn’t sure. Holly was challenging the local boys in a foot race and beating them. I was thinking that she was one cool girl.

So far on my Hartley side, those in gold below have tested and uploaded to Gedmatch:

Note that Patricia and Beth are also first cousins to each other.

Here’s Holly’s grandmother Grace Hartley. I borrowed the photo from Holly’s Ancestry Tree:

Does she look like Holly? I think so. Except I don’t picture Holly as looking as serious.

All the Hartley cousins in the chart above have James Hartley and Annis Louisa Snell in common. But we won’t know which – easily. Another point is that everyone has eight great grandparents. So all the second cousins get 2/8 or 1/4 of their DNA from these two great grandparents. That is, on average. Here are the numbers of how Holly matches the tested Hartleys:

The Gen is how far it seems that the common ancestors are away based on the DNA match. James, my dad’s 1st cousin seems 2.5 away. That is just right for a 1st cousin once removed. Holly should match her 2nd cousins on average at a level of three. That is because our great grandparents are 3 generations away from us. Because of the random way we get our DNA, however, Holly is more closely matching Joel, Beth and Patricia and is further away matching on my four siblings.
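The arithmetic behind these figures can be sketched as follows. The function names are my own, and this only reproduces the tree-based expectation. The Gen value itself is estimated from the amount of shared DNA, which is why real cousins land above or below these numbers.

```python
def expected_generations(gens_a: int, gens_b: int) -> float:
    """Average number of generations from two people back to their shared ancestors."""
    return (gens_a + gens_b) / 2


def share_from_ancestors(n_ancestors: int, generations_back: int) -> float:
    """Average fraction of a person's DNA coming from n ancestors at a given generation."""
    return n_ancestors * 0.5 ** generations_back


# Holly and a 2nd cousin: the great grandparents are 3 generations back for both.
print(expected_generations(3, 3))  # 3.0
# Holly and James (1st cousin once removed): 3 generations for Holly, 2 for James.
print(expected_generations(3, 2))  # 2.5
# Two of the eight great grandparents: 2/8 of the DNA on average.
print(share_from_ancestors(2, 3))  # 0.25
```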

The X Chromosome Rule

There is a rule that the X Chromosome does not pass down from father to son.

That means that no X Chromosome from Greenwood Hartley got passed down to any of us. That also means that no Hartley X Chromosome got passed down to anyone in my family. That is why Holly matches James, Beth and Patricia on the X Chromosome and only incidentally matches Lori and Heidi from my family.
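The rule can be turned into a small check on a line of descent. This is a sketch with a made-up function name: each path lists the sex of every person from the ancestor down to the tester, and the paths follow the relationships described in these posts.

```python
def x_can_pass(sexes):
    """sexes: 'M'/'F' for each person from the ancestor down to the tester.

    X-DNA cannot follow a line that contains a father-to-son step.
    """
    return all(not (parent == "M" and child == "M")
               for parent, child in zip(sexes, sexes[1:]))


# Greenwood Hartley -> his son James Hartley: blocked at the first step.
print(x_can_pass(["M", "M"]))            # False
# James Hartley -> my Hartley grandfather -> my dad -> Lori: blocked.
print(x_can_pass(["M", "M", "M", "F"]))  # False
# James Hartley -> Grace Hartley -> Holly's father -> Holly: possible.
print(x_can_pass(["M", "F", "M", "F"]))  # True
```

A True result only means an X match is possible along that line, not that one has to show up.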

Here is how Holly matches James, Beth, Patricia and incidentally my 2 sisters.

Holly and Jim have a longer match as they are more closely related (1st cousin, once removed). As a rule, the more closely you are related, the longer the segments.

Shared Autosomal DNA

Holly and I share this much DNA:

By comparison, here is my overall Chromosome map before I add in my DNA matches with Holly:

On my map, the James Hartley/Annie Snell part is shown in darker blue. It looks like Holly’s DNA could add quite a bit to my map. Ideally, if I could test enough relatives, the dark blue would fill up 1/2 of my paternal chromosome. The other half should be from my paternal grandmother who was a Frazer.

Here is Holly’s DNA added in. I also added a maternal first cousin who contributed to my first substantial X Chromosome match:

Remember I get no X Chromosome from my dad (top part of each line). So that has to be blank on the X Chromosome.

Next I’ll add in 1st cousin once removed Jim to Holly’s map:

Jim’s contribution to our great grandparents is in blue. Notice that now the X Chromosome is kicking in.

Adding Beth’s DNA to Joel and Jim

Here is the addition of Beth’s DNA:

Note that Holly has a lot of matches on Chromosomes 5 and 9. That must mean that Holly got most or all of her paternal DNA on those Chromosomes from her Hartley grandmother, Grace May.

Kicking it up a notch

Next I’d like to add my siblings’ results to the other matches on Holly’s Chromosome map. My siblings’ results plus mine should be similar in size to Holly’s matches with Jim, my dad’s first cousin. It takes 5 siblings to get about the same DNA as you would have for one parent. While I’m at it, I’ll add Patricia.
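The five-sibling rule of thumb follows from simple arithmetic. Each child inherits about half of a parent’s DNA, so if we assume the halves are inherited independently, the share of one parent’s DNA that turns up in at least one of n children is expected to be about 1 - (1/2)^n. A minimal sketch (the function name is my own):

```python
def parent_coverage(n_children: int) -> float:
    """Expected fraction of one parent's DNA carried by at least one of n children."""
    return 1 - 0.5 ** n_children


for n in range(1, 6):
    print(n, parent_coverage(n))
```

With five siblings the expected coverage is 0.96875, or roughly 97% of that parent’s DNA.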

This is all Holly’s DNA that she got from James Hartley and Annie Snell, her great grandparents based on the matches that we’ve looked at so far. I probably should have lumped Beth and Patricia together as they have the same Hartley grandmother [Mary], but I didn’t.

Separating the Hartley and Snell DNA

One thing I would like to do would be to separate the Snell DNA from the Hartley DNA. If I could do this I could find matches that were just Snell or just Hartley. The DNA matching is about narrowing down the possibilities. The best way to do this would be to have a match that is known to be a Snell but not Hartley or a Hartley but not Snell. Unfortunately, I don’t know of any such people. The next best thing to do is to guess. One way to guess is called phasing by location. So, say I have a match with a lot of ancestors from colonial New England, but not Lancashire. And I would need to know that I match this person on my Hartley side (not my mother’s side). I would say that this would likely indicate DNA from the Snell Line. That is because the Snell ancestors go back to Colonial New England and the Hartleys came later from Lancashire, England.
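The guess described above can be written out as a simple rule. This is a sketch only: the function name and the region labels are made up for the example, and the result is a likelihood, not proof.

```python
def likely_line(match_regions, on_paternal_hartley_side):
    """Guess Snell vs Hartley from where a match's ancestors lived.

    match_regions: set of region names found in the match's tree (labels are illustrative).
    on_paternal_hartley_side: True if the match is known to be on the Hartley/Snell side.
    """
    if not on_paternal_hartley_side:
        return "not Hartley/Snell"
    colonial = "Colonial New England" in match_regions
    lancashire = "Lancashire" in match_regions
    if colonial and not lancashire:
        return "likely Snell"
    if lancashire and not colonial:
        return "likely Hartley"
    return "undetermined"


print(likely_line({"Colonial New England"}, True))                # likely Snell
print(likely_line({"Lancashire"}, True))                          # likely Hartley
print(likely_line({"Colonial New England", "Lancashire"}, True))  # undetermined
```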

My Chromosome 16

Here is a section of the first part of my Chromosome 16 matches (without the matches’ names) in spreadsheet form:

Each line represents a different match with someone. About half way down this list I have a match with Ned at 39.93 cM. I don’t know who our common ancestor is, but Ned has a lot of colonial New England ancestors, including the Warren Pilgrim family. I also am descended from the Pilgrim Warrens, but it is generally thought that a DNA match that large would be unlikely to last that long.

Triangulating with Ned

Triangulation shows what common ancestors unknown DNA matches may have. Triangulation is when you match someone’s DNA, they match another person’s, and you and the other person also match, all in the same area. Successful triangulation shows that all the DNA came from the same ancestor.

Here is my match with Ned:

Here is Holly’s match with Ned:

To close the loop, I have to match Holly in the same area of Chromosome 16:

No problem. This shows that Holly, Ned and I share an ancestor. By Ned’s Ancestry Tree, we think this is a New England Colonial ancestor, but we aren’t sure which New England Colonial ancestor it is. However, as Annie Snell has New England Colonial ancestors and James Hartley doesn’t, I am pretty sure I can assign this segment to Annie instead of James.

This means I can update my Chromosome map with my first New England Colonial piece of DNA represented by Annie Louisa Snell on Chromosome 16. This is shown in light blue:

The other interesting thing about this piece of DNA, is that it not only is from Annie Louisa Snell, it is also from some New England Colonial person – the one I haven’t figured out yet that we have in common with Ned.

Other New England Colonial Connections Between Holly and Me

AncestryDNA recently came out with a new feature called Genetic Community. That feature lumps you into a group with a bunch of other people based on your DNA testing. One of those groups is called Settlers of Colonial New England. Here are my Genetic Communities (or GCs).

Notice I get a Likely rating for those Colonial Settlers. Holly, on the other hand, has one Genetic Community:

She gets a Very Likely. That means she is super Colonial New England. Holly has a Connection Link under her Settlers of Colonial New England. Under that link is another link that leads to “…a list of all 238 of your DNA matches who also belong to this Genetic Community.” Under my similar link I have 110 DNA matches. Ned, whom I mentioned above, matches me under Settlers of Colonial New England. However, he doesn’t appear in Holly’s list for some reason, even though I showed that we triangulate. In addition, Holly and I match each other on our lists of DNA matches under Settlers of Colonial New England.


There’s plenty more I could have written about, but I’m a gonna wrap it up:

  • Holly is more Colonial than I. I expect her other non-Snell ancestors contributed more in this area
  • I looked at a way to separate out ancestral DNA when other reference matches are missing
  • We are getting a good group of Hartley/Snell descendants that have had their DNA tested and have uploaded to Gedmatch for comparison
  • I never knew Holly looked so much like her Hartley/Snell grandmother.

Beth’s Hartley DNA

In this Blog, I will be looking at Beth’s autosomal DNA. That is the DNA that she got from both her parents. However, I am more interested in Beth’s father’s mother’s DNA as she was a Hartley and the DNA that we share would be Hartley DNA.

Hartley Tree of DNA Testers

Here are those closer relatives that have had their DNA tested and uploaded to Gedmatch:

Here Hartley is shown as green and Snells are shown as yellow. The DNA testers are in gold. Any DNA that the four DNA testers have in common will belong to James Hartley and Annie Snell. However, it will be difficult to tell which. Any DNA that Patricia and Beth share could also belong to Charles Nute which Jim and my family will not share. Here is an example of that on Chromosome 1.

Here is a photo believed to be Mary Hartley with her sister Nellie:

Hartley and Nute DNA On Chromosome 1

This is a Chromosome browser from Gedmatch showing where Beth shares DNA with Heidi (1), Joel (2), Sharon (3), Jim (4) and her first cousin Patricia (5). Is the DNA that Beth and Patricia share Hartley DNA or Nute DNA? To find that out we can look at Patricia’s DNA browser. If she shares DNA in this same area with Heidi and Jim, then it will be Hartley DNA.

The above Browser shows Patricia matching Beth (1), Jim (2) and Joel (3). Her matches with Jim and Joel fall outside the area where she matches Beth. This means that the DNA that first cousins Beth and Patricia share in Chromosome 1 is Nute DNA. If I were to map Patricia’s maternal Chromosome 1, it would probably look like this:

This shows that Patricia got her green DNA (matching Jim and me) from her Hartley maternal grandmother and her pink DNA (matching Beth) from her Nute maternal grandfather.
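The reasoning used here can be sketched as a simple overlap test. The function name and all positions below are hypothetical, since the browser images are what carry the real numbers. The idea is only that a first cousin segment which also overlaps a second cousin match should be Hartley, and one that does not is left to Nute.

```python
def classify_first_cousin_segment(cousin_segment, second_cousin_segments):
    """Label a first-cousin segment by whether any second-cousin match overlaps it.

    Segments are (start, end) pairs on the same chromosome, in megabases.
    """
    start, end = cousin_segment
    for other_start, other_end in second_cousin_segments:
        if max(start, other_start) < min(end, other_end):
            return "Hartley"
    return "Nute"


# Hypothetical positions, for illustration only.
patricia_beth = (10.0, 40.0)
patricia_second_cousins = [(60.0, 90.0), (70.0, 95.0)]  # e.g. Jim and Joel

print(classify_first_cousin_segment(patricia_beth, patricia_second_cousins))  # Nute
```

The "Nute" answer is only as good as the second cousin coverage. A Hartley segment that none of the tested second cousins happened to inherit would be mislabeled by this rule.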

First Cousins Vs. Second Cousins

First cousins share two grandparents as their most recent common ancestors. Second cousins share two great grandparents and get their shared DNA from one of them. The first cousin DNA matches will be larger in general. The second cousin matches will tend to be smaller.

First Cousins

As shown above, first cousins will share the DNA from two of their grandparents. In the case of Patricia and Beth, those two grandparents will be maternal grandparents. The catch is, that when two first cousins match each other, they won’t know which grandparent they match on. They just know that it will be one or the other. In the example above, we did know which grandparent matched because of other second cousin matches.

Second Cousins – Two Common Great Grandparents

Second cousins have as their most recent common ancestors two of their great grandparents. But again they won’t know which great grandparent they are matching on.

The best way to identify which great grandparent the gold people match on would be to have a third cousin that is only related on the Hartley side OR the Snell side. I don’t know of anyone in this category right now, so I’m a bit stuck. I would like to figure out which DNA is which. The main reason is that I’m stuck on the Hartley genealogy. I know that Greenwood’s father was Robert, but before that, I’m not sure. If we could find another Hartley relative going back then it might break down the Hartley brick wall.

Any Other Way To Separate Hartley DNA From Snell DNA?

There is one main difference between James Hartley and Annie Snell above as it relates to their DNA. James was born in Bacup, Lancashire, England and Annie was born in Rochester, Massachusetts. All of James’s ancestors would also have been born in Lancashire. On the other hand, all of Annie’s ancestors that would produce matches go back to Colonial Southeastern New England. That means that if we find a match that is from England and has no ancestors in the United States, there would be a good chance that that DNA match was through the James Hartley side.

Beth’s X Chromosome

First, let’s look at my family. There is no Hartley X Chromosome sharing with this group because the X-DNA does not travel from father to son.

Second, look at Beth compared to Jim:

Beth got one of her X Chromosomes from her dad. This was the same X that he got from his mother Mary. Jim got an X Chromosome from his mother. She got it from James Hartley b. 1862 and Annie Snell. So Beth and Jim have James Hartley and Annie Snell in common.

These pieces of blue where Beth and Jim match represent DNA that they share from James Hartley and/or Annie Snell.

How do Patricia and Beth compare by X-DNA?

Next we will look at Patricia and Beth. They will share X-DNA from their grandmother Mary Hartley. Beth’s dad got no X-DNA from his Nute dad, so Beth and Patricia will only match on Mary Hartley.

Note here that Beth and Patricia share some X-DNA from their grandmother that isn’t shared between Jim and Beth on the left side. They also share a longer segment at the right hand side than Beth and Jim shared. However, Jim and Beth shared a segment from 123 to 138M that wasn’t shared between Patricia and Beth.

Let’s See How Patricia Compares With Jim

The only comparison left is between Patricia and Jim.

I compared the three comparisons and came up with a bit of an X Chromosome map. In the first match between Beth and Patricia, I have that match in red. On the very right there are three matches, so I have that as great grandparent 1. We don’t know which great grandparent it is – just that it is the same one. On Jim’s map, it is his grandparent 1. Going from right to left on Jim’s map, he changes from getting his X-DNA from grandparent 1 to grandparent 2. However, Patricia and Beth continue to match on great grandparent 1. In the middle there are no matches, so we can’t tell what is going on. Also the two reds and one blue on the left may actually be two blues and a red as we don’t know how they match with the segments on the right.

Beth’s Hartley (and Snell) Chromosome Map

If we look at all the matches Beth has with Jim, my siblings and me, we will have a map of her known Hartley (and Snell) DNA:

I didn’t use the DNA shared between Patricia and Beth as they are first cousins. As such, they will share Nute and Hartley DNA and it will not be as easy to tell which is which. So second cousins are good for these maps. The red is in the bottom part of each chromosome. That represents the paternal chromosome. We have not mapped any of Beth’s maternal chromosome. If Beth were to look for Hartley or Snell matches, it looks like her best bet would be on Chromosome 12.

For comparison, here is my Chromosome Map.

On my map, the blue corresponds to Beth’s red Hartley DNA. We seem to share a stretch of Hartley DNA on Chromosome 1. But where Beth has a long stretch of Hartley DNA on Chromosome 12, I have none.


Using M MacNeill’s Raw DNA Phasing Spreadsheet and My Problem Chromosome 10

I have written many blogs about phasing my own raw DNA. One of the things that was bothering me while going through the process was the presentation of the results. It is possible to phase millions of bases using the raw DNA results from one parent and at least 3 siblings. But once the DNA is phased, how can those results be best portrayed? In my previous Blog on the subject, I was able to figure out a fairly simple way to show my results, but the outcome was not totally satisfactory.


I liked how I was able to get the grandparents’ surnames at least in the first 2 bars. I also liked how I had a simple scale at the bottom. However, one of my bars went too far. Also, my simple chart started at zero and Chromosomes start at different positions. I was able to fix the bar going too far today. Excel makes these bars based on distance rather than positions, so one of my equations was wrong.

I told M MacNeill of my concerns and he sent me his spreadsheet. One feature I really liked about the MacNeill Spreadsheet is that it had a place for cousin matches at the bottom. Below is the first Chromosome where I used my phased raw data from my mom and 3 other siblings to create a MacNeill Chart.


Sharon’s maternal first little segment didn’t work out perfectly, but that didn’t bother me. I know that the beginning and ends of Chromosomes can have small problematic segments. Note at the bottom that my match to Carolyn in yellow shows where my maternal crossover is in the upper part of the chart where I go from red to orange.

My Chromosome 10

I am looking at my Chromosome 10 because, for one thing, I have had trouble trying to visually phase this Chromosome in the past. Here is my attempt at visual phasing from early in 2016:


Here is another try including additional cousins that tested:


Note how different the maternal (lower) side is. I switched most of the maternal grandparents around.

Here is the MacNeill spreadsheet showing just the cousin matching part:


I have some good matches here. Blue is Hartley, green is Frazer, yellow is Lentz. Red is Rathfelder. This makes it clear that my chromosome is mapped wrong. I need more Hartley and Lentz. The above chart includes my brother who I had tested not too long ago.

Here is another try with my brother’s DNA results included:


My sister Sharon (S) has a better look now on her maternal side. I got rid of the small purple segment.

Looking At the Raw DNA Phasing – Paternal Side

I have two spreadsheets summarizing the results of the many hours of work it took to phase my family’s DNA from the raw data. One spreadsheet is for the paternal side phased DNA and the other is maternal. I have patterns for both sides. They are based on the order of my siblings: me (Joel), Sharon, Heidi and Jonathan. So an ABBB pattern would mean that Sharon, Heidi, and Jonathan all get their DNA from one grandparent, and I get mine from the other. Here is the paternal spreadsheet:


These patterns go logically one to the other. The first pattern goes from AABA to AAAA at position 2,605,158. The B changed to an A in Heidi’s position, so the crossover goes to her at that position. I have a column called GaptoNext. This is based on the number of tested SNPs between patterns. When this number is large, I suspect an AAAA pattern. That was the case above highlighted in yellow. Except there is a problem. To go from ABAB to AAAA means 2 changes, and there should only be one change (or crossover) at a time. This caused me to look at the bases.

A Paternal Pattern Missed

Here is what I found.


I had missed an AABA pattern at Build 36 Position 30,683,878. I took another look by setting my MS Access query so that Sharon and Heidi would have a different base from Dad:


This shows that there is a change from ABAB to AABA even sooner than I thought, between ID 400008 and 400045. This is an ID I created that sequentially numbers the tested SNPs. You can see another way I missed this pattern, because I didn’t fill in the missing bases. TTC? should be TTCT. CCT? should be CCTC.

What does the missing pattern represent?

The pattern of ABAB to AABA is actually my crossover (Joel). It is a bit more difficult to see than the others. That is because the ABAB pattern is the same as BABA. The change of BABA to AABA is my change of the first B to the first A. Naturally, I put myself in the first position. In rough terms, that gives me a paternal crossover at about position 30.5M. This is a good location as it does not interfere with a large match that I have with an unknown paternal DNA relative named Shamus:


Here is my corrected Dad Pattern for Chromosome 10:


I have gone from 6 to 8 crossovers as the previous correction led to another one. I also took out one of Heidi’s crossovers that I had wrongly identified. So fixing one problem fixed a lot of others. It helps to describe the start and stop of each pattern and to describe each crossover. The important results are the person and the last Position column. These show who the crossover belongs to and where that crossover occurs on the chromosome. I then entered the paternal crossover results into the MacNeill Spreadsheet and got this:


I took out the large space between the siblings. The problem is that the space is now the same as between the maternal and paternal phased part for each sibling. Excel has no happy medium that I’ve found.

The blue is Hartley and green is Frazer. The raw phasing in the upper part of the chart matches with the cousin matches below. It is interesting that some of the cousin matches define the crossovers. For example, the Jim to Sharon match gives Sharon’s crossover. Also the Paul to Sharon match gives Sharon’s other crossover. The Paul to Jonathan match gives Jon’s first crossover.
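The pattern bookkeeping used in this section, one crossover at a time and ABAB being the same as BABA, can be sketched in code. This is an illustration with made-up function names, not the Access queries or spreadsheets I actually used.

```python
SIBLINGS = ["Joel", "Sharon", "Heidi", "Jonathan"]  # order used for the patterns


def flip(pattern: str) -> str:
    """Swap the A and B labels; ABAB and BABA describe the same grouping."""
    return pattern.translate(str.maketrans("AB", "BA"))


def crossover_owner(before: str, after: str):
    """Return the sibling whose letter changes between two consecutive patterns.

    Returns None when the step cannot be explained by a single change,
    which suggests that a pattern in between was missed.
    """
    for candidate in (before, flip(before)):
        changed = [i for i, (x, y) in enumerate(zip(candidate, after)) if x != y]
        if len(changed) == 1:
            return SIBLINGS[changed[0]]
    return None


print(crossover_owner("AABA", "AAAA"))  # Heidi
print(crossover_owner("ABAB", "AAAA"))  # None: two changes, so a pattern is missing
print(crossover_owner("ABAB", "AABA"))  # Joel, once ABAB is read as BABA
```

A None result is the same red flag that sent me back to look at the bases.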

The Maternal Side

Hopefully resolving the maternal phasing will be easier than the paternal side. My visual phasing only showed four crossovers. Here is my unfinished spreadsheet showing 5 crossovers (under the Person column):


Here, it looks like I already added an AAAA pattern to the end. That was because the AABA pattern ended at about 114M and the Chromosome itself ends at about 135M. My GapstoNext column showed that gap as almost 20,000 SNPs. My question now is: should I add an AAAA pattern to the beginning also? Perhaps. An AAAA pattern means that 4 siblings match and all got their DNA in that area from their maternal (in this case) grandmother. Those results were consistent with how I had the visual phasing done. In fact, the visual phasing indicated that the 4 siblings should all get their maternal DNA from the Lentz side up until about 60M. Let’s take a closer look. This gets at my first note above in the spreadsheet image. There were only 3 single SNPs showing the AAAB pattern and they were spaced a long way apart – over 10 Megabases each. In this case, I will disregard those 3 widely spaced patterns as some type of mistake and stay with the AAAA pattern. Once I made the change from the AAAA pattern to the AAAB pattern, that brings us up to about 60M for my (Joel’s) first crossover. That seems to fit well. That leaves us with 4 crossovers – one per sibling as opposed to the two per sibling on the paternal side.

First I’ll compact the Gedmatch browser results, then show the raw DNA Phasing results on the MacNeill Chart:



When I compare the results, I see a problem I had with the visual phasing. The next to the last crossover looked to belong to Sharon, but instead it belonged to Heidi. Also Jon’s second paternal crossover should have been marked as an “F” above. That was just a typo. The third J for Joel crossover that I had above was not a crossover. In the middle, the 2 close crossovers of J and S should be instead S and J if I’m reading the MacNeill Chart correctly. It looks like all the FIRs and HIRs, etc. match. Once I did the raw DNA phasing, it was obvious how the Gedmatch browser results had to match the raw DNA phasing results. Before I did the raw DNA phasing, it was not so obvious.

I’m happy with the results. I get to pick whatever colors I want for the four grandparents. It still would be nice to have some sort of labels or color key. After a hard day of phasing DNA, it is rewarding to see the results displayed so nicely. Thank you Mr. MacNeill.

A few observations:

  • The 4 siblings did not inherit any Rathfelder DNA (brown) on the left side of Chromosome 10
  • Lentz DNA (yellow) is missing from the right side of the Chromosome for the same 4 siblings
  • As I have my mother’s DNA results, that would make up for the missing DNA from those 2 maternal grandparents
  • Short segments of Hartley DNA (blue) are missing near the beginning and near the end of the Chromosome (i.e. none of the four siblings inherited Hartley grandfather DNA in those areas).


  • M MacNeill has the best display that I am aware of for mapping phased DNA.
  • The final mapping is like the final exam where previous mistakes are brought out, but there is a chance to correct them.
  • The phasing process is difficult, but there are built in checks and balances to find and correct mistakes or missed patterns.
  • The raw DNA phasing procedure (I use the Athey method) would generally be used if a parent has been tested and the visual one is used if a parent has not been tested. However, the visual phasing as developed by Kathy Johnston is important to use as a framework for the raw DNA phasing as well as a check for the end result.
  • The raw DNA phasing results appear to be better than what I was able to get using the visual phasing. Not because the visual phasing method is bad; more because I have not mastered it.
  • If you are using someone else’s spreadsheet, it is a good idea to know how they work in case anything goes wrong.
  • After writing many blogs on visual and raw data DNA phasing, it is nice to see everything come together using the MacNeill Spreadsheets and Charts.

DNA Phasing of Raw DNA When One Sibling is Missing: Part 10

In this Blog, I would like to portray my phasing results in an Excel Bar Chart if possible. This has been one of the most difficult parts of phasing my DNA for me.

I have looked at Stacked Bar Charts in Excel as they seem to be the closest to what I am looking for. Today I looked at a tutorial on producing Gantt Charts, which seems to hold some promise of application for DNA mapping:


I had my Maternal Patterns’ Starts and Stops from my last blog. I took those and converted them to Build 36 and put them in a spreadsheet:


Start is the ID# I was using. Start36 is the Chromosome position of the Start of the pattern in Build 36. App ID is the approximate position of the Crossover. Then I have that same location in Build 37 and Build 36. Following the logic in the tutorial, I have the first Maternal Crossovers for Chromosome 7 in my simplified Chart:


I got this by choosing the Build 36 column and choosing Insert Stacked Bar. I suppose a better Title would have been Chromosome 7 Maternal Crossover rather than Build 36. This was taken from my Column Header. The goal is to get a 2 color bar above. However, I already see a problem. The bar needs to be different colors for different people. Well, I have to start somewhere.

Next, I put in the next crossover location for each person. I took this position and subtracted from it the first Crossover to get a length.


You may note that the Bar Chart inverts the original order. It gives Sharon a 4 which is now on top. Here is my visual phasing of Chromosome 7 that I am trying to replicate:


My Excel Bar Chart order is Sharon, Jon, Joel, Heidi. My visual phasing order is Sharon, Joel, Heidi, Jon. The 2 maternal colors I have above are green and orange representing Lentz and Rathfelder. If I keep orange as Rathfelder, that means I want to change bar 2 and 3 (Joel and Jon) on the Excel Bar Chart. One way to do this is to move over the first Crossovers for Joel and Jon in my spreadsheet:


However, that made the 2 male siblings’ first maternal grandparent match too long. I needed to move the start over 2 places in my spreadsheet:


Now the Chr7 Maternal Crossover column can be called Lentz and the 2length column can be called Rathfelder.

Next, I added another column for the next Lentz portion of DNA:


I was hoping that if I named the next column Lentz, that Excel would give me the same blue as the first Lentz. I was able to right click on the gray and change it to blue. I then added another Rathfelder segment. For this to work in Excel, a Rathfelder length is added rather than a start and stop location.
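The length arithmetic behind the stacked bar can be sketched in Python. This is only an illustration of the idea, not the spreadsheet itself: the function names are made up, and the positions in the example are invented round numbers rather than my real crossovers.

```python
def segment_lengths(boundaries):
    """Turn ordered boundary positions into consecutive segment lengths.

    A stacked bar wants lengths, not start/stop positions, so each length
    is simply the difference between neighboring boundaries.
    """
    if any(b < a for a, b in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be in ascending order")
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


def label_segments(boundaries, first, second):
    """Pair each length with grandparent labels that alternate at crossovers."""
    labels = [first, second]
    return [(labels[i % 2], length)
            for i, length in enumerate(segment_lengths(boundaries))]


# Hypothetical boundaries in millions of base pairs (not real data):
# chromosome start, two crossovers, chromosome end.
example = label_segments([0, 40, 95, 159], "Lentz", "Rathfelder")
```

Each (label, length) pair corresponds to one colored piece of the bar for one sibling.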


Again, I had to reformat the Excel-chosen color to be consistent with what I had for Rathfelder. I chose the last position for Heidi and Sharon as the highest that I had as this was their last segment. After a bit of wrangling with Excel, I was able to get this:


So that is the presentation. However, I notice that on my visual phasing, I had 5 segments for Jon and only 4 here. I missed his last Rathfelder segment. I had ended Jon’s Chromosome too early. Here is the correction:


It still looks like one of Jon’s crossovers in the middle of the Chromosome may be off, but I’ll have to figure that out later.

Paternal Bar Chart

Now that I have something that looks like a maternal Chromosome Map, I need the paternal side to go along with it. It looks like if I add 4 more rows to my spreadsheet, I may have it.

I did this and I added Hartley and Frazer (my paternal side grandparents) to the right of the maternal side grandparents. I had to make a new chart that came out like this:


Here #4 is my Paternal DNA. I found it a bit disconcerting that my paternal side was longer than the maternal. Here I’ve added a bit of formatting and made the colors consistent (one color per grandparent):


Well, I guess I’ll just leave this imperfect. It will give me something to work on later. I did change the scale from millions to M’s to be easier to read.  The above shows that Jon and Heidi share their paternal grandfather’s Hartley DNA un-recombined on Chromosome 7.

Summary and Conclusions

  • Learning how to phase my raw DNA has been interesting and time consuming
  • Delving into the A’s, G’s, T’s and C’s promotes understanding of one’s DNA
  • I owe a lot to M MacNeill and Whit Athey in learning how to do this phasing
  • Due to the data intensive nature of phasing, I would recommend the use of MS Access or some other database software.
  • An understanding of Excel or similar spreadsheet software is also important.
  • I had tested my brother Jon as an afterthought. It turned out that his test results were important in determining the phasing of the 4 siblings.
  • I have the overall skeleton of the phasing with crossovers. There is still a lot of work to complete the individual Chromosomes and troubleshoot problem areas.
  • Further, I have not worked on the X Chromosome due to the different nature of that Chromosome. My brother and I are already phased. My sisters are not.
  • Once these maps are done they will be a reference to all matches to my 3 siblings and myself.

Raw Data Phasing Part 4: Going from 3 Siblings to 4

In my last Blog, I mentioned that my brother Jon’s DNA test results came in this week. This happened in the middle of my attempt to learn how to phase the raw DNA data for my 2 sisters and myself. I was phasing the data in what I can only assume is a traditional way. I say I assume, as I haven’t seen any other blogs on the process. The difference is that I am using MS Access which I hope will speed up the process. I should be able to get results for 23 chromosomes at a time instead of just one at a time.

The arrival of the new DNA results poses at least two problems:

  • The previous 4 DNA data files were all in AncestryDNA version 1. Jon’s is in AncestryDNA2. While they are all Build 37, they look at somewhat different points on the chromosomes
  • One of the difficult parts of the previous process was identifying and dealing with patterns of phased paternal and maternal bases. Those patterns were AAB, ABA, and ABB. With 4 siblings, there will be more patterns. However, the Whit Athey Paper I have been following does also look at 4 siblings.

AncestryDNA Version 1 Vs. AncestryDNA Version 2

My understanding is that Ancestry changed the locations on the chromosomes that they were testing to get more into the medical area like 23andme. I don’t know if that is true. Here is a chart comparing the different atDNA tests:


I was doing well comparing Anc1 with Anc1 as I was looking at over 700,000 base pairs among 4 people. Once I compare Anc2 to Anc1, that number is cut down quite a bit. That is about a 40% drop. My only other option, other than re-testing Jon, is to compare Jon to my mother’s FTDNA results. However, that will only pick up 2-3,000 SNPs, so I won’t bother.
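The overlap between two test versions can be checked with a set intersection on the rsids. Here is a minimal Python sketch of that idea. The function name and the toy rsid lists are invented for illustration; the real files have hundreds of thousands of rows.

```python
def shared_fraction(rsids_a, rsids_b):
    """Return (number of shared rsids, fraction of A's rsids also found in B)."""
    a, b = set(rsids_a), set(rsids_b)
    shared = a & b
    fraction = len(shared) / len(a) if a else 0.0
    return len(shared), fraction


# Made-up ids standing in for an older and a newer chip version.
older_chip = ["rs1", "rs2", "rs3", "rs4", "rs5"]
newer_chip = ["rs2", "rs4", "rs5", "rs9"]

count, fraction = shared_fraction(older_chip, newer_chip)
drop = 1 - fraction  # share of the older chip's SNPs lost in the comparison
```

Only the shared rsids can be compared between the two kits, which is why the usable row count falls when the versions differ.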

Back to Square One with 4 Siblings: Homozygous Siblings

I need to find Jon’s equal base pairs and apply one to his ‘from dad’ column and one to his ‘from mom’ column. That is, after I add all Jon’s data to my database and add those columns. First I need to decide where to add Jon’s data. I could add it to the beginning of what I have already done or to the end. I’ll try adding it to the end, because I think that the work I did already is OK. I want to build on that. So rather than adding Jon’s DNA to the first step, I’ll add it to my table called tblMomBaseFromDadBase. This table has over 700,000 lines of bases for 4 people. Jon’s has 668,942 lines. Actually, when I remove “Chromosomes” 24-26, I will only have 666,531 lines.

Querying Jon into my latest table

Here I am adding Jon and the Mom from Dad Table to my query design:


Access thinks the ID that it added was important, but it really isn’t, so I need to take out that equal join. I really want the join to be at the rsid, but I don’t want an equal join. Why not? If I had an equal join, I would end up only with the positions that Jon has. I will lose 40% of the work that I have already done. Instead, I’ll use an unequal join.


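I flipped the 2 tables in the query design area, so things are moving left to right. Then I chose a #2 join, which is basically an unequal join left to right (an outer join, in database terms): it keeps every row from the left table and adds the right table’s data only where the rsid matches.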
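The difference between the two join types can be illustrated outside of Access. In this hedged Python sketch each table is a dictionary keyed by rsid; the names and the toy rows are assumptions for the example, not my actual tables.

```python
def equal_join(left, right):
    """Keep only the rsids present in both tables (an inner join)."""
    return {rsid: (left[rsid], right[rsid]) for rsid in left if rsid in right}


def left_to_right_join(left, right):
    """Keep every rsid from the left table; missing right-side data is None."""
    return {rsid: (left[rsid], right.get(rsid)) for rsid in left}


# Toy data: the sibling table has three SNPs, Jon's newer test covers two.
siblings = {"rs_a": "GG", "rs_b": "AG", "rs_c": "TT"}
jon = {"rs_a": "AG", "rs_c": "TT"}

inner = equal_join(siblings, jon)
outer = left_to_right_join(siblings, jon)
```

With the equal join the row for rs_b disappears; with the left to right join it stays, with an empty slot for Jon.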

Actually, I changed my mind. I have a better idea. I will just do the first 2 steps on Jon’s raw DNA and then join the results together. That is a third way that I hadn’t thought of. The point is, that there are many ways to do things in Access. There can be more than one way to get to where you want to be.

Back to Homozygous Siblings

First I copied Jon’s raw data into a table called tblJonHeterozygousSib. This is so I can use an update query to update the data in the new table and still have the original. Hold that idea. The better idea is to use a make table query. The reason that this is better is that it can take out the “chromosomes” I don’t want:


I took out the table I copied and I’ll make a better one with only Chromosomes 1-23. I hit the Run button and create a table with 666,000 lines:


Then in the above table, I inserted 2 columns: JonFromDad and JonFromMom. Now this table is ready to phase for any homozygous siblings. By the way, it looks like my Chr23 or X is homozygous, but it isn’t. Ancestry adds an extra base. I only really have one for my X Chromosome.

Finally time to query and phase

I go to Query Design in Access and choose the above table. This is a very simple Update Query design:


This says if Jon’s allele1 is the same as his allele 2, put allele 2 as his base from mom and as his base from dad. I hit the run button for the update and get the dire warning that I’m updating a lot of information, I can never change it back. Then I get a message that I’m updating 478,000+ rows. That is good. Those are the number of Jon’s homozygous bases – quite a few. I’d say over two thirds.
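The homozygous rule is simple enough to write as a small function. This is a Python sketch of the rule rather than the Access update query; the function name is made up, and it assumes no-calls appear as something other than A, C, G or T (for example 0).

```python
BASES = {"A", "C", "G", "T"}


def phase_homozygous(allele1, allele2):
    """Return (from_dad, from_mom) for a homozygous call, else (None, None).

    If both alleles are the same base, each parent must have supplied it.
    """
    if allele1 == allele2 and allele1 in BASES:
        return allele1, allele2
    return None, None
```

A heterozygous call such as AG is left unphased at this step and has to wait for the later principles.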

I’m not looking for crazy results and didn’t get any.

Homozygous Mom Query

I’ll copy my previous table into one to update. Then I need to add Jon’s base from mom where mom is homozygous. Easy peasy. I think this is all I need.


Actually, I did think of an issue. I have an equal join. That means I won’t be using the homozygous bases that mom tested for in the old AncestryDNA test that aren’t in the new AncestryDNA test list. My guess is that is interesting information but perhaps not very useful. It also occurs to me that in the spots where Jon doesn’t match up with my siblings, I will still have the 3 letter pattern work that I had done previously.

The query above says if Mom allele 1 = 2, then put that 2 allele in Jon’s from Mom base slot. I hit Run and pasted 277,000 rows of bases.
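As a rough Python equivalent of this update, assuming each table is a dictionary keyed by rsid (the names and layout are hypothetical, chosen only for the sketch):

```python
BASES = {"A", "C", "G", "T"}


def base_from_homozygous_mom(mom_allele1, mom_allele2):
    """If mom is homozygous, that base is the one she passed on; else None."""
    if mom_allele1 == mom_allele2 and mom_allele1 in BASES:
        return mom_allele1
    return None


def update_from_mom(child_rows, mom_rows):
    """Fill 'from_mom' wherever mom has the rsid and is homozygous there.

    Only rsids found in both tables are touched, like an equal join.
    Returns the number of rows updated.
    """
    updated = 0
    for rsid, row in child_rows.items():
        mom = mom_rows.get(rsid)
        if mom is None:
            continue  # mom was not tested at this location
        base = base_from_homozygous_mom(mom[0], mom[1])
        if base is not None:
            row["from_mom"] = base
            updated += 1
    return updated
```

Note that the child does not need to be homozygous: a child who is AG still gets G from a mom who is GG.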


This query will be a little more difficult to check. I have to create a query linking my mom’s DNA results to this table. I did that and see one problem already.


The first problem is that ID 126 didn’t show up. That means that rs3819001 that Jon has is not in my mom’s raw DNA. I don’t want to have data for Jon that looks like it can be updated, but it can’t.

I think I can fix this.

Updated Table Query

A few steps ago, I ran a Make Table Query to get just Chromosomes 1-23 into Jon’s Table. I need to update that query so that I am only including the locations (rsid’s) that are common to both my mother and Jon. I do this using an equal join on the rsid Field:


This time, my table for Jon only has the rsid’s that my mom has.
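The two conditions in the revised Make Table Query (Chromosomes 1-23 only, and only rsids that my mom also has) can be sketched in Python. The row layout and function name are assumptions for the example.

```python
def rows_for_phasing(rows, mom_rsids, max_chromosome=23):
    """Keep rows on chromosomes 1..max_chromosome whose rsid mom also has."""
    mom = set(mom_rsids)
    return [
        row for row in rows
        if 1 <= row["chromosome"] <= max_chromosome and row["rsid"] in mom
    ]


# Toy rows: one is on "chromosome" 25 and one is missing from mom's file.
jon_rows = [
    {"rsid": "rs_a", "chromosome": 1},
    {"rsid": "rs_b", "chromosome": 1},
    {"rsid": "rs_c", "chromosome": 25},
]
kept = rows_for_phasing(jon_rows, ["rs_a", "rs_c"])
```

Filtering at this step means no row is left looking as if it could be updated from mom when it cannot.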


Also my Chromosome formula was off, so I had to fix it. Also note that I have about the same number of rows as in my Anc1 vs. Anc2 table earlier in the Blog. I then re-added the Jon from Dad and Mom columns into the new and improved table. Then I reran the update query which told me I was about to update 284,000+ rows.


This worked as well as last time, but this time I have the fewer rows I was trying to get.

Re-Run the update query for homozygous mom for Jon

I double clicked on my old update query. The message said I was updating 277,000 rows or so. Now I’ll re-check my work. If there is no ID 126, I’ll be happy. Well it is still there, because I forgot to copy the previous homozygous sibling table into the homozygous mom table. After re-re-running the update, I got the desired results:


And there you [don’t] have it: no ID 126. Here is my mom’s raw file compared to Jon’s updated table.


Jon gets a G from mom at ID 128 even though Jon is AG, because mom is GG. Now I’m talking DNA.

Merge Jon’s New Table with His 3 Siblings’ Tables

This is the point where I put everything together. I will try to use the Make Table Query for this one again. So I’ll put my newest Jon table together with my newest sibling table.


This shows the left to right arrow join. I’ll want the larger file plus everything equal in the smaller file. Come to think of it, this Create Table Query would have fixed the earlier problem I had. I guess I was too careful! The other issue is that the ID in the 1st table won’t be the ID in the second table. I could keep the second ID, but I would have to rename it as Jon ID or Anc2ID.



Here I rename Jon’s IDs as JonID. I may not need it, but if I do need it I will have it. I guess MS Access wasn’t happy with my idea:


OK, I took out the JonID and hit Run. Microsoft tells me about my new 700,000 row table.

Back to the Dad Patterns

Now that all the family is together I want to look at Dad Patterns, because I know that I will be updating those. Here is the first query I tried on my new Table of 4Sibs.


This is looking for filled in Dad bases where Sharon’s base is not the same as Joel’s. That query gives me an ABAA pattern:


Also ABBB:


Here’s ABBA:


It looks like ABAB is a possibility also. That means the following are possible:

  • AAAB
  • AABA
  • AABB
  • ABAA
  • ABAB
  • ABBA
  • ABBB

So if I chose Joel’s Base not equal to Sharon and then Joel’s base equal to Sharon would I have every combination? It looks like I need this combination to cover all possibilities:

  • Joel<>Sharon OR
  • Joel<>Heidi OR
  • Sharon<>Heidi OR
  • Heidi<>Jon OR
  • Jon<>Joel OR
  • Jon<>Sharon
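The six comparisons in the list amount to saying "the four bases are not all the same". A short Python sketch can confirm this by brute force; the function names are made up for the illustration.

```python
from itertools import combinations, product


def any_pair_differs(bases):
    """True if at least one pair of siblings has different bases."""
    return any(a != b for a, b in combinations(bases, 2))


def not_all_same(bases):
    """True unless every sibling has the same base."""
    return len(set(bases)) > 1


# Four siblings give six pairs, matching the six OR conditions.
pair_count = len(list(combinations(["Joel", "Sharon", "Heidi", "Jon"], 2)))

# Check both tests against every 4-letter combination of two bases.
agree = all(any_pair_differs(p) == not_all_same(p)
            for p in product("AG", repeat=4))
```

So the OR query returns every row except the AAAA-type rows, where all four siblings share the same paternal base.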

Which in Access looks like:


But Wait, I Forgot Principle 3 for Jon

Principle 3 says where Jon is heterozygous and he knows where he got his maternal base, the other base goes into his From Dad column. Looking back at my old queries, I see this is a 2 step query. I’m tempted to try this in one step, but I think this got me in trouble before, so I’ll go with the simpler query. Simpler queries are usually better in MS Access.


This says where Jon is missing a phased allele from Dad and he has an allele that doesn’t equal the one he got from mom (making Jon heterozygous here) put that allele into Jon’s From Dad spot. I tried the query and only got 37 results. The problem is, I should have said ‘Is Null’ in the JonFromDad Criteria:


This time I get 35,000 updates, so that is right. I then change the allele1’s to allele2’s above and get 33,000 updates to tbl4Sibs. I ran a quick query on the 4Sibs Table to get just Jon’s heterozygous results:


In the first line, Jon had allele1 as T which was different from the allele from Mom of G, so Jon’s T got put into the From Dad spot. At ID 41, Jon’s allele2 of G is from Dad because he had an A from Mom. When parent and child are heterozygous, the From Parent location remains blank.
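Principle 3 can also be written as one small function instead of two queries. This Python sketch is an illustration under assumed field names (allele1, allele2, from_mom, from_dad), not a copy of the Access design. It includes the same "Is Null" guard so that an existing From Dad base is never overwritten.

```python
def other_allele(allele1, allele2, known_base):
    """For a heterozygous call with one base assigned, return the other base."""
    if allele1 == allele2 or known_base is None:
        return None  # homozygous, or nothing known yet
    if known_base == allele1:
        return allele2
    if known_base == allele2:
        return allele1
    return None  # known base matches neither allele: leave for review


def fill_from_dad(row):
    """Fill row['from_dad'] only when it is empty; return True if updated."""
    if row.get("from_dad") is not None:
        return False
    base = other_allele(row["allele1"], row["allele2"], row.get("from_mom"))
    if base is None:
        return False
    row["from_dad"] = base
    return True
```

Handling both alleles in one function avoids running the update twice, once for allele1 and once for allele2.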

Now I have Jon with 3 Principles: Homozygous Jon, Homozygous Mom and Heterozygous Jon.

Back to Dad Patterns

I have the old Dad Patterns for 3 siblings. Now I need to see what the 4 sibling Dad Patterns would be and add Jon’s Start and Stop Locations for his new Dad Pattern Areas. I need to combine that with the 3Sibs Table.


My first query was wrong and gave bad results. The reason is that the ID for 4Sibs was from the raw data. The ID for the Dad Pattern Table just numbered the amount of Dad patterns. I needed to join the ID in the first table to the start and stop locations in the second table. I ended up doing 2 queries: one for the start position and one for the stop as I needed both. This query gives the stop position of a pattern.
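Another way to connect a raw-data ID to a pattern is to look up which Start and Stop range the ID falls in. The Python sketch below shows that lookup. The ranges are invented placeholders, and the function assumes the ranges are sorted by Start and do not overlap.

```python
from bisect import bisect_right


def pattern_at(snp_id, ranges):
    """Return the pattern whose (start, stop) range contains snp_id, or None.

    ranges is a list of (start_id, stop_id, pattern) tuples, sorted by
    start_id and non-overlapping.
    """
    starts = [start for start, _stop, _pattern in ranges]
    i = bisect_right(starts, snp_id) - 1  # last range starting at or before id
    if i >= 0 and ranges[i][0] <= snp_id <= ranges[i][1]:
        return ranges[i][2]
    return None


# Placeholder ranges (not my real Start and Stop values).
toy_ranges = [(1, 100, "ABA"), (150, 400, "ABB")]
```

An ID that falls in the gap between two ranges returns None, which flags it as a location still to be assigned.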


I took both those queries and put them into an Excel Spreadsheet.


I added a new column called Dad4Pattern. In the first row, the new pattern was AAA by chance. However, in the second row which is the Stop or End of the first Dad Pattern, it is obvious that the ABA Dad Pattern goes to an ABAA Pattern. I didn’t think that there would be many AAAA Patterns as that means that all siblings match the same Paternal grandparent. This is the only AAA pattern that I had noted so far as I wasn’t looking for them yet. Still, I will need to go back and verify that these Start and Stop AAAA’s were not by chance. Finally, on the last line, it is clear that the Dad Pattern goes from AAB to AABB with Jon added.

Next I chose all the cells where Jon had a base from Dad and performed a Concatenate operation to write the pattern.


This gave me the CCCC that I wanted to check. Next, I wrote a formula to put the Dad bases together in a new column and wrote down the Dad Patterns that I had.
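Turning a string of bases such as CCCC into a pattern such as AAAA is a relabeling by order of first appearance. Here is a minimal Python sketch of that step; the function name is hypothetical and it returns None when any sibling’s base is missing.

```python
def dad_pattern(bases):
    """Relabel bases as A, B, ... in order of first appearance.

    Returns None if any base is missing, since the pattern is then unknown.
    """
    if any(base is None or base == "" for base in bases):
        return None
    labels = {}
    letters = []
    for base in bases:
        if base not in labels:
            labels[base] = chr(ord("A") + len(labels))
        letters.append(labels[base])
    return "".join(letters)
```

Because the first sibling is always labeled A, the same grandparent arrangement always produces the same pattern, whatever the actual bases are.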


A few notes:

  • Out of the 66 three sibling patterns that I had, I was able to find all but 5 new four sibling Dad Patterns. See the yellow above for two of the missing 4 sibling dad patterns.
  • The missing 4 sibling dad patterns should be easy to find by scrolling through the 4Sib Table
  • I noticed that there were no AAAB patterns. That is because in my previous search, I was not looking for AAA patterns. So now, I don’t have any AAAB patterns. I will have to find these in my new search.
  • AAAB is the situation where I match the same paternal grandparent as my 2 sisters, but Jon matches the other paternal grandparent.

Filling in more dad patterns

To fill in the yellow areas, I made a query in Access based on the 4Sibs Table. This looked at every case where Jon had a base from Dad. Searching around the ID 6604 and after, I found this pattern:



Then I checked near the end of the old 3 sibling pattern which is at ID 19806.


At ID 19827 we see an ABAB Pattern, so I enter that Pattern in my spreadsheet:


For the start of the new ABAB pattern, I used the old ABA location as that was more precise. The next interesting thing happens at Chromosome 2:


Here I have a problem in my spreadsheet. For some reason, the Start of the last pattern of Chromosome 2 ends at Chromosome 3, which is not right. My previous spreadsheet was better than that. From the ashes I will re-build.

I note that at ID 108798, my 4 Sib Spreadsheet goes to an ABAB Pattern. At the end of Chromosome 2, I see an AAAB Pattern. That was the one I wouldn’t have had from the 3 sibling pattern as I wasn’t checking on AAA’s.

I added new rows for the patterns ABAB and AAAB:


The most important thing here is the ID, the pattern, the Start and Stop. Here is the new change area from ABAB to AAAB:


There are a few SNPs between the ABAB Stop and the AAAB Start that are a little unclear.


Finding Jon’s Patterns

Now I’ll check Jon’s Patterns. I’m looking for any changes in patterns as these should be important as crossovers later. I will need to assign the crossovers to each sibling’s Chromosome Map.

Good Old Triple A – B Pattern and all the others

AAAB is where Jon has a different paternal grandparent than his 3 tested siblings and the 3 siblings have the same paternal grandparent.


My query says that Jon has to be different from each sibling. I run that and insert the appropriate Start and Stop point for the AAAB in my spreadsheet.

I do the same for AABA which I can find using a similar query under Heidi’s criteria:


I ended up going to a clean spreadsheet. It was too messy combining the 4 sibling results with the old 3 sibling results.


Here I have the ID, the Chromosome, the pattern and the Start and Stop. The yellow marks a one SNP pattern. It appears that there should be 3 types of patterns:

  1. One where one sibling matches none of the others. That is what I have above: AAAB, ABAA, AABA and BAAA
  2. One where 2 pairs of siblings match each other: AABB or ABBA. I’m not sure what else there could be. I looked above and saw one other: ABAB
  3. One where all the siblings match each other: AAAA

That makes 7 or 8 patterns, depending on whether AAAA is considered a pattern.
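The count can be checked by listing every pattern. Since the first sibling is always written as A and each of the others either matches (A) or does not (B), there are 2 x 2 x 2 = 8 possibilities for 4 siblings. A short Python sketch, with a made-up function name:

```python
from itertools import product


def sibling_patterns(sibling_count):
    """List every pattern for the given number of siblings, first letter A."""
    return ["A" + "".join(rest)
            for rest in product("AB", repeat=sibling_count - 1)]


four = sibling_patterns(4)
without_aaaa = [p for p in four if p != "AAAA"]
```

Written this way, BAAA does not appear as a separate pattern: a first sibling who differs from the other three is ABBB.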

Two Pairs of siblings match each other patterns

Here is the Access query for AABB


At first I was missing the criteria under SharonFromDad and that gave me AAAA combinations also. The result of the query looks like this:


Here Joel matches Sharon and Heidi matches Jon but on a different base. After I was finished putting in Starts and Stops for each Pattern, I then sorted my spreadsheet by ID. This brings up some issues that need looking at:


Where there are 2 Starts or Stops in a row, there is a need to check what is going on. The ones around the yellow positions may not be a problem as I’ll likely be taking those single positions out. However, at the end of Chromosome, there are 2 starts and 2 stops together. I need to go to ID 236707 and see what is before that point. It appears that there is an AAAA pattern before that point and that the ABAB at 224584 is a single point. That fixes half of the problem. Then I go to ID 238976 to see why I have a Stop there for ABAB.


I had missed the Start for the ABAB right after the stop of the ABBA pattern, so I added it in. The repaired spreadsheet looks like this.


An application

Now that I have the change between ABBB and ABAB described, let’s look at what it means. Here is a different look at that location:


When the pattern changes from ABBB to ABAB, what has changed is that the third letter goes from B to A. Heidi is in that location. So that says at the above position of Chromosome 5, Heidi has a paternal crossover. I thought it would be good to check my work against the work of M MacNeill. To do that, I used the NCBI Remap website to change my Build 37 results to Build 36:


This would be the start of Heidi’s new segment. Here is what MacNeill had:


I got it right again. That is 2 for 2. Actually, the first time I tried, I was comparing the wrong Chromosomes. Rookie mistake. Here is M MacNeill’s map for Heidi on Chromosome 5:


Perhaps it is difficult to see, but the point I am looking at is the little lighter red segment at the far right of Chromosome 5. Perhaps that is why I missed it the first time as it is so small.

Another Aside is that this was a very difficult Chromosome to decipher using visual methods. This was one of my attempts to figure out the crossovers visually for 3 siblings.


I had missed the last crossover as it is so small and difficult to see. In my defense, I should note that M MacNeill did mention that the end of this Chromosome was difficult to decipher.
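Going back to the ABBB to ABAB change in this section, the step of deciding which sibling has the crossover can be sketched in Python. The function name is made up, and the sibling order (Joel, Sharon, Heidi, Jon) is an assumption based on how the patterns are described in this Blog. One wrinkle: because the first sibling is always written as A, a crossover for the first sibling shows up as every other letter flipping.

```python
def crossover_sibling(before, after, siblings):
    """Return the sibling whose crossover explains the pattern change.

    Returns None if the change cannot be explained by a single crossover.
    Intended for three or more siblings.
    """
    if not (len(before) == len(after) == len(siblings)):
        raise ValueError("patterns and sibling list must be the same length")
    changed = [i for i, (x, y) in enumerate(zip(before, after)) if x != y]
    if len(changed) == 1:
        return siblings[changed[0]]
    if changed == list(range(1, len(siblings))):
        return siblings[0]  # everyone else flipped relative to the first
    return None


# Assumed order of the letters in each pattern.
order = ["Joel", "Sharon", "Heidi", "Jon"]
who = crossover_sibling("ABBB", "ABAB", order)
```

For the change discussed above, only the third letter differs, which points to Heidi.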

Taking Out the X

I’ve realized that I’ve generated some bases for the X I got from Dad. Of course, I didn’t really, so I’m taking out any bases there for me and my brother Jon. I’ll use this update query:


I was worried that I’d mess something up, so I created a new table called 4SibsChrX. My query put dashes in the spots where I couldn’t have an X base from Dad:


This looks like a good place to end Part 4. It appears that there should be many chances to quality check my work and that the process is progressing. Getting Jon’s new DNA set me back a bit, but the results should be better than what I’d see with 3 siblings.


Raw Data Phasing: Part 3

This Blog is Part 3 documenting my learning process of phasing my DNA raw data using:

Part 1 and 2 Recap

  1. I imported 4 sets of raw data into Access from AncestryDNA after taking out the zeros that the Excel software produced for the no-calls.
  2. I used Access Queries to apply 3 Whit Athey Principles. This resulted in many phased bases for me and my 2 sisters.
  3. I put the phased A’s, G’s, C’s and T’s for each sibling into 2 new columns for each sibling
  4. This resulted in 6 new columns. The first 3 of these six were for the paternally based bases. These resulted in a pattern which was either in the form of AAB, ABA, or ABB.
  5. The Athey Paper did not emphasize the AAA pattern or considered it a non-pattern. While specific AAA results within another pattern area are by chance, there are other areas where 3 siblings match the same grandparent where there will be an AAA-only Pattern.
  6. I separated my results into 3 patterns using Access: AAB, ABA, and ABB
  7. For each of those results, I noted where those patterns changed. I did this by looking at the ID numbers. Breaks in the ID numbers were considered changes.
  8. However, there were some cases where the changes occurred around missing bases. For these, I went back and noted a more precise position of the pattern change based on where the change would be if the missing base were to be filled in.
  9. I made a preliminary bar graph using the first 3 paternal changes. These crossovers were mapped to myself and 2 sisters.
  10. Using the 3 patterns I developed Access queries to fill in the missing bases in the 3 paternal pattern areas.

So those were the 10 easy steps. Actually step 10 was difficult as there was quite a bit of refining the Access queries and quality checking the results. I needed 2 queries for each of the pattern areas. However, once I had the queries, it was the push of a button to update missing parental-received bases for 3 siblings within over 700,000 lines of DNA.

Back to Athey

This portion of the Athey Paper appears to apply to where I am now:

For some of the unfilled cells on the mother’s side of the table, we can fill in the alternative (other) base from the corresponding location on the father’s side of the table. That is, we know that the sibling with an empty cell got one base from the father, but the alternative base from the mother. Therefore, after the use of the Dad pattern fills in more cells, a newly filled-in cell in the father’s side of the table gives rise to a filled-in cell in the same position on the mother’s side–the alternative base to what was on the father’s side.

Unfortunately, I’m not sure what is meant above. My guess is that this relates to Principle 3:

Principle 3 — A final phasing principle is almost trivial, but it is normally not useful because there is usually no way to satisfy its conditions: If a child is heterozygous at a particular SNP, and if it is possible to determine which parent contributed one of the bases, then the other parent necessarily contributed the other (or alternate) base. This principle will be very useful in the present approach.

So now that missing paternal bases have been determined based on the patterns, it should be possible to fill in missing maternal bases for heterozygous children. First, I’ll do a Query to see if I can locate this situation. I’ll take my most recently updated Dad ABB Pattern Table update and query that. I’ll look at the situation where there are heterozygous results. Then, I’ll look at spots where there are missing bases from Mom.

Fortunately, I was able to come up with a slick looking Query for this situation:


Plus the Query design has some nice symmetry. The first criteria row of the query is for my (Joel) DNA. Reading across, it says Joel is heterozygous because my allele 1 does not equal my allele 2. Then it says that I have a base from Dad but not from Mom. This will show areas where the mom bases are missing in this heterozygous child situation.


The truncated fields above are Joel Allele 1, Joel Allele 2, Sharon allele 1&2, Heidi allele 1&2. The next 3 columns are Joel, Sharon and Heidi from Dad. Then Joel, Sharon and Heidi from Mom (the last 3 columns). This shows that there are almost 12,000 of these Mom bases to fill in. Above the blue line are Heidi’s bases missing from Mom. Heidi is TC (heterozygous) on that line. Her Dad base is T. I love these binary problems. They seem well suited for the computer. That means that a query could not be too difficult to update almost 12,000 records. So Heidi’s Mom base will be C above the blue line. At the blue highlighted area, I am TC and my Dad base is C. My Mom base will be T on the blue line.
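The selection and the fill can be combined in one small rule. This Python sketch is only an illustration of the logic, with a hypothetical function name: the child must be heterozygous, must have a base from Dad, and must still be missing the base from Mom.

```python
def mom_base_from_dad(allele1, allele2, from_dad, from_mom):
    """Return the base to put in the From Mom slot, or None if no update applies."""
    if allele1 == allele2:
        return None  # homozygous: nothing to work out here
    if from_dad is None or from_mom is not None:
        return None  # no Dad base to work from, or Mom base already filled
    if from_dad == allele1:
        return allele2
    if from_dad == allele2:
        return allele1
    return None  # Dad base matches neither allele: leave for review


def count_fillable(rows):
    """Count rows where a Mom base could be filled in from the Dad base."""
    return sum(
        1 for allele1, allele2, from_dad, from_mom in rows
        if mom_base_from_dad(allele1, allele2, from_dad, from_mom) is not None
    )
```

For a child who is TC with a Dad base of T, the function returns C, and with a Dad base of C it returns T.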

Looking for a Good Query to Fill In Mom Bases from Dad Bases

First, I copied my ABB Table to a new Table called tblMomBaseFromDadBase. I will want to update that table with a new Update Query. I already have the first part of the query. Now I need my thinking cap. Even better than thinking, I can look at what I did before. Here is my old query.


This is difficult to see, but I split the problem into 2 alleles. What this says is when Sharon has a base from her mom and Sharon’s allele 1 is not the same as the base from her Mom, pop that allele 1 into her base from Dad slot.

For our situation we are doing the opposite. So we will switch Mom and Dad. This time we are using our Dad results to get some Mom results. I’ll also add a criteria to make sure the Mom result is Null, so I’m not overwriting anything. It will just be an extra precaution.

Basically, I want to make sure Heidi has a base from Dad and not from Mom. In that case, when her allele1 is not equal to her base from Dad, put that allele 1 in as her base from Mom. Drawing upon my vast experience in this area of about 1 week, I get this:


When I preview the results, I get about 6,000 lines which is half of my previous query, so that seems OK. I’ll go ahead and update my new Table. I renamed my Query to qryMomBaseFromDadBaseAllele1 and copied it to do the same thing with Allele2. I’ll change the Allele 1’s to Allele 2’s in the Query design. First I’ll do a Select (non-updating) Query to show what I’ll be updating with the Allele 2’s.


Here I added the ID numbers, so I can make sure my update went well.

Here is my Allele2 Update Query with the 3 siblings included:


The results:


In the far right column is the Base Heidi got from Mom. It was updated on lines 2292, 2295 and 2299. In each case Heidi’s Paternal Base was T and the Maternal Base derived from it was C.

Here is my corresponding filled in Mom Base:


My Dad’s T’s in 6 columns from the right were used to fill in the missing C’s in 3 columns from the right. Doesn’t it seem a bit ironic? Even though my dad was not tested for DNA, his “results” from this process are used to find the DNA I got from my mom who was tested.

A Premature End to This Blog and a New Beginning

This will be one of my shortest Blogs. I was both awaiting and not awaiting my brother’s DNA test results. Those results came in this week. The reason I was not awaiting was that I knew that I would need to re-start the raw data DNA phasing process once his results came in. With that, I’ll end this Blog and start a new one.





Raw Data Phasing Via Access, Athey and MacNeill: Part 2

In my last Blog on raw data phasing, I went through 3 principles that Whit Athey laid out in a paper on phasing raw data when one parent’s DNA results were missing. Using those principles, and the MS Access program, I was able to sort many of my bases and my 2 sisters’ bases into ones we received from our mom and ones that we received from our dad. I checked a few of my results with a chromosome map made for me by M MacNeill.

Paternal Patterns

I had gotten to the part of the Athey paper where he talks about paternal patterns of bases that the sibling combinations received. I noted a space between the first two paternal patterns that I looked at. Below the pattern goes from an ABA pattern to an ABB pattern.


There was a gap between the ABA and ABB pattern where there was no ‘pattern’ as my 2 sisters and I shared the same base there. When my sisters and I all share the same base, that is an AAA “pattern”. That AAA area corresponded exactly to the area between the 2 yellow lines below in the chromosome map made for me by M MacNeill.


In the map above, MacNeill was able to determine that my 2 sisters and I got our DNA from our paternal grandmother in the area between the 2 yellow lines. Further, the first yellow line described Sharon’s first paternal crossover point and the second yellow line described my (Joel’s) first paternal crossover point.

Finding All the Paternal Crossover Points

At this point in the Athey Paper, he recommended looking at the paternal pattern and filling in the missing bases based on the known pattern. I was looking for an easier way to do this, so decided to take a different approach. I decided that I would find all the paternal crossover points first. Then, armed with that information, I would create a formula that would fill in most or all of the missing bases for each pattern.

However, this required a modification of my database to make the work easier. I wanted a number to define the range of patterns, so that I could apply an easy query to add missing bases. I already had this but I hadn’t used it. Back when I imported the 4 sets of raw data into Access, Access assigned an ID to every row of data. That meant that I needed to add that ID into all the queries that I had done previously to make tables and further queries. This took a while, but I believe that it was worth it.


The ID is the first column.

I started going down all my data and noting the change of each pattern. I put the results into an Excel table. Here the Start and Stop numbers are the Access assigned ID numbers. The ID’s correspond with the number of DNA locations looked at. In this case there was a total of a bit over 700,000 of these locations for my mom, my 2 sisters, and me.


Then I noted the patterns are repeating as would be expected. For example, my first pattern was ABA, but 3 patterns later, that same ABA repeated. My thought was to create a query just for ABA patterns. Then when scrolling down looking for changes, the separation between rows should be greater and it would be easier to see where those changes were.

Here is what my Access query looks like. I changed the query name to DadSpecificPattern.


This particular query gives me the ABB pattern. I have the HeidifromDad base equal to the SharonFromDad base. That makes me the A and Sharon and Heidi the BB of the ABB Pattern. If you think about it, that also means in these areas that Heidi and Sharon will have their base from the same paternal grandparent and mine will be from the other paternal grandparent. I’m learning as I go. I’m sure that information will come in handy later.
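The pattern labels can be worked out mechanically from the three bases-from-Dad values. Here is a minimal Python sketch of that idea, assuming the sibling order Joel, Sharon, Heidi and using an empty string or None for a blank. The function name `sibling_pattern` is hypothetical and is not part of the Access database.

```python
def sibling_pattern(joel, sharon, heidi):
    """Label three paternal bases as AAA, AAB, ABA or ABB.

    Returns None when any base is blank or when three different bases appear.
    """
    bases = (joel, sharon, heidi)
    if any(b in (None, "") for b in bases):
        return None
    if joel == sharon == heidi:
        return "AAA"
    if joel == sharon:
        return "AAB"
    if joel == heidi:
        return "ABA"
    if sharon == heidi:
        return "ABB"
    return None


print(sibling_pattern("T", "C", "C"))  # ABB
print(sibling_pattern("T", "C", "T"))  # ABA
print(sibling_pattern("", "C", "C"))   # None
```

Keep in mind that a single AAA row does not say much on its own, because all three siblings can share a base by chance inside any of the other pattern areas.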

My plan seemed to be good, but there was one catch. Once I refined my query, most or all of the blanks disappeared. That meant that the start and end points might not be exact. Here is an example of what I mean.


This is from my old Dad Pattern query with the blanks still there. The change from ABB to ABA happens at ID or line 19809. However, the new query takes out the blanks to make it look like the change is at ID Line 19826.

Here is what my DNA results look like so far without a filter (or query). The last 3 columns are the bases from Dad columns. There is a lot going on between lines 19809 and 19826.


Once I apply a formula to add bases, it will say something like: In the lines that have the ABA pattern where there is a blank at either A spot, replace the blank with the A that is there. If I apply the rule too late, I will be missing an area. Worse, if I were to use the 19826 cutoff, I may still be using the previous rule. That rule would say basically the same thing except, “Where the row is ABB and one of the B’s is missing, replace the missing B with the one that is there.” If I apply an ABB rule to an ABA area, I’ll get bad results.

Long story short, I ended up recording a rough start and stop in my Excel Spreadsheet.


I started naming the segments, but realized that was not necessary. Some of the patterns were only at one point rather than in a long segment. I believe that is an anomaly due to a bad read, mutation or some other problem. Those are the ones in the spreadsheet that had no end point. It took me part of a morning to get all the paternal crossover pattern points for all 23 chromosomes. Fortunately for 3 siblings, the patterns are only ABA, AAB and ABB.
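Finding the rough starts and stops amounts to grouping consecutive rows that show the same pattern. The Python sketch below shows one way that could be done. It is an assumption-laden illustration: the function name `pattern_segments` is made up, the sample rows are invented rather than taken from my data, and blanks and AAA rows are simply skipped as uninformative.

```python
def pattern_segments(rows):
    """Group (row_id, pattern) pairs into runs of the same pattern.

    Rows with no usable pattern (None or 'AAA') are skipped.
    Returns a list of dicts with pattern, start, stop and count.
    """
    segments = []
    for row_id, pattern in rows:
        if pattern in (None, "AAA"):
            continue
        if segments and segments[-1]["pattern"] == pattern:
            # Same pattern as the current run, so extend it.
            segments[-1]["stop"] = row_id
            segments[-1]["count"] += 1
        else:
            segments.append(
                {"pattern": pattern, "start": row_id, "stop": row_id, "count": 1}
            )
    return segments


# Invented sample rows, for illustration only.
rows = [(1, "ABA"), (2, None), (3, "ABA"), (4, "AAA"),
        (5, "ABB"), (6, "ABB"), (7, "ABA"), (8, "ABB")]

for seg in pattern_segments(rows):
    note = "  <- single row, possible anomaly" if seg["count"] == 1 else ""
    print(seg["pattern"], seg["start"], seg["stop"], note)
```

The starts and stops from a grouping like this are still rough. The skipped rows between two runs could belong to either side, which is the same problem as the 19809 versus 19826 example.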

I just went back and checked the error points/anomalies. I reran the Heterozygous Sibling Query and it fixed at least the first problem and hopefully the others. When I added the ID’s in, I had to redo all the queries quickly, so I suppose that is where the errors came in. That is not a problem: as long as the problem can be found, a fix can usually be found also. There actually weren’t that many errors. There are still some anomalies that are just anomalies. I have left those in yellow in the spreadsheet image below.

So in my spreadsheet, I have all the rough starts and ends for all the crossovers for my 2 sisters and myself. Here is the top part of the spreadsheet sorted by rough start:


Next, all I need are more exact start and end points. Here is the start of what I have:


I picked this section because it looks pretty complete already. Note that my Start and Stop numbers are pretty close to each other. That means that there are no other AAA segments in-between. I had to do an additional Access query to add in the position numbers for the Start and Stop of each chromosome’s pattern change. This was important if I want to convert the results from Build 37 to Build 36 to compare to MacNeill’s work or to

Starting to Find Paternal Crossovers and Assigning to Siblings

Previously I had been calling the start and end of my patterns crossovers. These two terms aren’t totally interchangeable, as the start or stop of a pattern may happen at the beginning or end of a Chromosome and therefore not be a crossover at that point. It seems like it should be pretty easy to find the crossovers. Look at the image above. The first and second rows show ABA going to AAA. The order of me and my siblings is JSH, or Joel, Sharon and Heidi. The only letter that changes is the B to A. That is the position that Sharon is in, so the paternal crossover has to go to her.

From row 2 to row 3 the pattern changes from AAA to ABB at Chromosome 1, position 23,288,828, Build 37. That doesn’t mean that 2 siblings have a crossover there, as we are looking at the patterns, not the letters. It is actually the letter that stayed the same that represents the crossover here. AAA to ABB means: all the same (AAA) goes to one different and 2 the same (ABB, in this case Sharon and Heidi). The one that is different is me, and I get the crossover at this location.

The next change is from ABB to ABA. This is a little harder to see. I would say that this crossover goes to Heidi if my reasoning is right. BB was the same before and goes to BA. It must be Heidi that changed because now she matches Joel, who didn’t change.

I’ll need to figure out how to make better bar graphs in Excel, but here is how the beginning part of my father’s Chromosome 1 broke up for 3 of his children. Another way to look at it: the vertical lines are where my father’s maternal and paternal chromosomes combined in each of his 3 children that we are now looking at.



  • Series 1 is Sharon. Where the color goes from blue to orange is where Sharon has a change from one paternal grandparent’s DNA to another paternal grandparent’s DNA. The number to the right of Series 1 is the Build 37 Chromosome position number for Sharon’s crossover.
  • Series 2 is Joel’s first crossover (between orange and gray) and
  • Series 3 is Heidi’s first crossover position between gray and yellow [The same explanation under Sharon above applies to Joel and Heidi]

I’ll go back to the M MacNeill Standard. It’s like having an answer sheet to my questions.


According to MacNeill, I have assigned the crossovers to the correct siblings. In the above chart, just look at the red. I haven’t gotten to the maternal part yet, which MacNeill has in blue. The first 3 crossovers are where the red changes from light to dark or dark to light red. The difference in the MacNeill Chart is that his chart is split out one bar for each sibling. The other difference is that MacNeill has build 36 Chromosome position numbers and the numbers I have are from Build 37.
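The reasoning for giving each pattern change to a sibling can be written down as a small rule. The Python sketch below is one way to express it, assuming the sibling order Joel, Sharon, Heidi and assuming that only one sibling has a crossover at any one pattern change. The function name `crossover_sibling` is made up for this illustration.

```python
SIBLINGS = ("Joel", "Sharon", "Heidi")


def crossover_sibling(before, after):
    """Given two adjacent patterns such as 'ABA' and 'AAA', return the sibling
    whose single crossover explains the change, or None if nothing changed."""
    # True where a sibling's letter differs from the first sibling's letter.
    differs_before = [ch != before[0] for ch in before]
    differs_after = [ch != after[0] for ch in after]
    changed = [b != a for b, a in zip(differs_before, differs_after)]
    if sum(changed) == 0:
        return None
    if sum(changed) == 1:
        return SIBLINGS[changed.index(True)]
    # Both other siblings changed relative to the first sibling, so the simplest
    # explanation is that the first sibling is the one who crossed over.
    return SIBLINGS[0]


print(crossover_sibling("ABA", "AAA"))  # Sharon
print(crossover_sibling("AAA", "ABB"))  # Joel
print(crossover_sibling("ABB", "ABA"))  # Heidi
```

The three calls reproduce the first three crossovers on Chromosome 1 in the order described above: Sharon, then Joel, then Heidi.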

The Process

  1. Phase the siblings into maternal and paternal DNA using the principles that Athey outlines
  2. Find the paternal and maternal crossovers by pattern changes
  3. Assign the crossovers to the correct sibling using the pattern changes
  4. Assign the segments to the correct grandparent. This requires knowledge of cousin matches on the appropriate grandparent side.

That is the big picture which I am understanding as long as I don’t get too lost in the details.

Back to the Details: Fill in More A’s, G’s, C’s and T’s

I have been setting up my data for this, so hopefully, this will be easy. I now have 3 areas to look at:

  • AAB
  • ABA
  • ABB

AAB Paternal Update

Now I go back to my spreadsheet and sort it by Dad Pattern:


The Start and Stop areas are the ones I want to update. First, I’ll copy my most up to date Table in Access which is tblSibHetorzygous. I’ll rename that tblDadPatternUpdate. Then I want to look for missing data and update the blanks using the AAB pattern.

In Access, I create a query with the new table.


I chose the position fields and Paternal Pattern fields. I will change this to an update query which adds an Update To row. The criteria I want is when JoelFromDad = Sharon from Dad (AAB). Actually, I forgot, I was going to use ID criteria. So in the ID field, I need a lot of information. For the first AAB segment, I need everything between ID 45393 and 54155. This is what the criteria looks like:


When I choose that area, I get over 8,000 lines. However, I only want to update when there is one missing value in the first 2 and the one that isn’t missing is not equal to the third. Here is the result of the above query in my first AAB area:


I assume that the first blank should be a T. This would be one of the AAA results by chance in an AAB area. I don’t want to fill in the second line as I don’t know if it will be GGG or something else. That is what I meant by saying I don’t want to fill anything in unless there is only one missing value. In the 5th line there is A?G. That would have to be AAG (in an AAB Pattern area). There are some lines that have everything missing that I don’t want to touch.
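The stricter version of the AAB fill rule, where exactly one of the first two bases is blank and the one that is present differs from the third, could look like this in Python. It is a sketch only: `fill_aab` is a made-up name, blanks are represented by None, and it deliberately leaves alone the rows where the present base equals Heidi’s base.

```python
def fill_aab(joel, sharon, heidi):
    """In an AAB area, fill a single blank among the first two siblings.

    Blanks are None. Rows that do not qualify are returned unchanged.
    """
    if heidi is None:
        # Without the third base there is no visible AAB pattern to rely on.
        return joel, sharon, heidi
    if joel is None and sharon is not None and sharon != heidi:
        return sharon, sharon, heidi
    if sharon is None and joel is not None and joel != heidi:
        return joel, joel, heidi
    return joel, sharon, heidi


print(fill_aab("A", None, "G"))   # ('A', 'A', 'G')
print(fill_aab(None, None, "G"))  # (None, None, 'G')
```

Because the function builds a new row instead of changing two values in place, filling Joel’s blank can never disturb Sharon’s base, and the other way around.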

How to create a query?

First, I want the situation where Joel doesn’t equal Heidi or Sharon doesn’t equal Heidi. That would create an AAB situation:


This query results in 1,666 rows of data including rows that are already filled in. Note that I had to write the range of ID’s twice because in order to get an OR situation I needed to put Joel not equal to Heidi and Sharon not equal to Heidi on separate lines. A simpler query is this one:


The above achieves the same results in one line. Now, for this query, if Joel is blank, replace it with Sharon’s results. If Sharon is blank, replace it with Joel’s results. Here is the query prior to the updating part:


This shows that there are 29 blanks for Joel and Sharon meeting this AAB criteria in the first range of AAB’s:


Next, I apply the same logic to all the AAB segments. In the Expression Builder of Access, I type in this simple formula:

Between 45393 And 54155 Or Between 60990 And 72548 Or Between 207109 And 220679 Or Between 313271 And 317516 OR Between 326845 And 326912 OR Between 389395 And 390311 OR Between 400045 And 405578 OR Between 419982 and 427158 OR Between 433191 And 446672
OR Between 482297 And 492542 OR Between 532520 And 539292 OR Between 571557 And 579594 OR Between 589614 And 589666 OR Between 630037 And 630314 OR Between 630319 And 630378 OR Between 658744 And 659375 OR Between 670533 And 672360 OR Between 673325 And 682544

Simple but long. This has the AAB Starts and Stops for 23 chromosomes. Then I copy it into the next ID criteria line and get this result:
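A long criteria line like this is easy to mistype, so one option is to keep the Starts and Stops in a list and build the text from it. Here is a Python sketch of that idea using the AAB ranges above. The helper names `between_criteria` and `in_any_range` are made up, and the check assumes that Between in Access includes both end values.

```python
# AAB Starts and Stops (Access ID numbers) copied from the criteria above.
AAB_RANGES = [
    (45393, 54155), (60990, 72548), (207109, 220679), (313271, 317516),
    (326845, 326912), (389395, 390311), (400045, 405578), (419982, 427158),
    (433191, 446672), (482297, 492542), (532520, 539292), (571557, 579594),
    (589614, 589666), (630037, 630314), (630319, 630378), (658744, 659375),
    (670533, 672360), (673325, 682544),
]


def between_criteria(ranges):
    """Build the 'Between x And y Or Between ...' text for the ID criteria."""
    return " Or ".join(f"Between {start} And {stop}" for start, stop in ranges)


def in_any_range(row_id, ranges):
    """True when row_id falls inside one of the ranges, ends included."""
    return any(start <= row_id <= stop for start, stop in ranges)


print(between_criteria(AAB_RANGES[:2]))
print(in_any_range(682124, AAB_RANGES))
```

The generated text can be pasted into the Expression Builder, and `in_any_range` gives a quick way to ask whether a particular ID should be touched by the AAB update at all.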


It took a few minutes to type the criteria, but the goal is to update 1,514 lines of missing Paternal Pattern data with the push of one button. I still think it is quicker than going line by line and will be more accurate if I got the criteria right.

Next, I change the above Select Query to an Update Query.


When my (Joel’s) base from Dad is missing, I update to Sharon’s base. When Sharon’s base from Dad is missing her base is updated with mine. Isn’t sharing great? I didn’t look at the case where Heidi’s base from dad was missing, because if that was missing we wouldn’t be able to see any AAB Pattern.

Let’s Update

I push the run button and check the results. Here is my standard dire warning:


Now I will check if it worked. I’ll try ID or Line # 682124:


Unfortunately, that was an undesirable result. Before I had A?G. I changed this to ?AG. It appears that my query replaced my value with Sharon’s, but also replaced Sharon’s with my blank. I hadn’t expected that. Next, I’ll check ID# 682182. I had ?AG and replaced it with A?G. So until I can think of a solution, I’ll need to split the 2 queries.

Fix it! Quick!

First I recopied my Heterozygous Sibling Table back to the Dad Pattern Update 1 Table. This got the table back to the way it was. Here is my simpler query.


Here if my base from Dad is null, replace it with Sharon’s base from Dad. I’ll check ID# 682182 again:


This gets into the category of trial and error. Sharon’s result still got replaced with nothing. See, in the previous query I was still telling Access to update Sharon’s results with mine. I needed to take that out:


There. Now the SharonFromDad Update To is blank. I go through the same procedures and now it looks right.


We now went from ?AG to AAG in the last 3 columns. These are the bases from Dad columns.

The next step is pretty easy:


I took out my criteria and put criteria in the SharonFromDad field. When she has a blank, replace it with Joel’s base from Dad. I hit run and it updated over 600 rows. Here is my original check spot at ID# 682124 with better results in the last 3 columns:


It took a while, but at least I got it right. The moral of the story is to not ask Access to do 2 things at once when those 2 things involve the same 2 people.
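The swap can be reproduced outside of Access. The Python sketch below uses the sqlite3 module as a stand-in, so it is an assumption that Access evaluates an update the same way, although the behavior matches what I saw. The table and column names are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE dad (id INTEGER PRIMARY KEY, joel TEXT, sharon TEXT, heidi TEXT)"
)
con.executemany(
    "INSERT INTO dad VALUES (?, ?, ?, ?)",
    [(1, "A", None, "G"), (2, None, "A", "G")],
)

# One combined update: every right-hand side is read from the row as it was
# before the update, so the base and the blank simply trade places.
con.execute(
    "UPDATE dad SET joel = sharon, sharon = joel "
    "WHERE joel IS NULL OR sharon IS NULL"
)
swapped = con.execute("SELECT joel, sharon, heidi FROM dad ORDER BY id").fetchall()
print(swapped)  # [(None, 'A', 'G'), ('A', None, 'G')]

# Put the rows back, then fill one sibling at a time, touching only the blanks.
con.execute("UPDATE dad SET joel = 'A', sharon = NULL WHERE id = 1")
con.execute("UPDATE dad SET joel = NULL, sharon = 'A' WHERE id = 2")
con.execute("UPDATE dad SET joel = sharon WHERE joel IS NULL AND sharon IS NOT NULL")
con.execute("UPDATE dad SET sharon = joel WHERE sharon IS NULL AND joel IS NOT NULL")
fixed = con.execute("SELECT joel, sharon, heidi FROM dad ORDER BY id").fetchall()
print(fixed)  # [('A', 'A', 'G'), ('A', 'A', 'G')]
```

The first result shows A?G turning into ?AG and ?AG turning into A?G. The two separate updates, each limited to the rows where that sibling is blank, give AAG on both rows.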

The Next Step: ABA

This time I’ll try a different query. I want there to be a B from the ABA in each case, so I’ll make sure that Sharon’s base from Dad is there:


Maybe I’ll figure out what went wrong last time or come up with a new error. Above, I want the criteria on the first line to be for my blank base: if Sharon’s base from Dad is not equal to Heidi’s base from Dad, put Heidi’s base from Dad in my blank spot. For Heidi, when Joel’s base from Dad doesn’t equal Sharon’s base from Dad, put Joel’s base in Heidi’s spot.

I’m so tempted to try this query, but before I do, I’ll copy the previous table of the DadPatternUpdate to a new Dad Pattern Update ABA Table.  This will preserve what I have in the now older DadPatternUpdate Table in case anything goes wrong. Hey, what could go wrong?


I pushed the Update Button and updated over 30,000 rows. The results don’t appear to be any better, so I’m back to my 2 step process.

Here is my new slimmed down query:


This new Update Query should update my Line 18 in the new UpdateABA Dad Pattern Table and it does:


I now have a full ABA pattern on that line. According to Access over 30,000 Lines were updated, so it wasn’t a total waste of time.


Run and check Line 149:


We have ABA in the last 3 columns, so that is good. Line 18 is still OK. I checked it just to make sure.

Query AAB Revised

After seeing how well the ABA Query went, I decided to revise the old AAB Query:


This is now looking at over 37,000 rows. This updates my AAB Blanks to tblDadPatternAAB. I don’t know if it is a better query, but at least I’m being consistent.


This was over 80,000 rows, so I’ll assume that bigger is better.

I copied that resulting Table to tblDadPatternUpdateABA and reran the 2 ABA Update Queries. Here is one of the rerun queries updating the ABA Paternal Table:


Down to ABB

My last updated Paternal Table was updating ABA, so I’ll copy that to a new Table called tblDadPatternUpdateABB. I’ll also copy my last query and put in the appropriate Starts and Stops for the paternal ABB patterns.


This says when Joel’s base from dad is not the same as Heidi, put that Joel from Dad into the space. Probably a more precise query would have said when Sharon from Dad is null and Joel from Dad is not equal to Heidi from Dad. I suppose technically the above query could be writing over a base with the same base in most cases.

I’ll fix that and notice that I had the wrong table in the top, so I’ll change that also.


This only updated 944 rows, so maybe bigger is not better. Here is Part 2:


This was almost 3,000 rows updated. Now I should check if it worked. I scrolled for an ABB Pattern in an old query and found this:


Here is my check:


I guess I’ve been working too long. Here I have an AAB instead of the ABB I wanted. That is because I had Heidi updated to me (the A) instead of Sharon (the B). Here is the correction:


I made a fresh Table of ABB. When I opened up the Query, it was saved this way:


So Access changed my query. Note that there are 2 fields with HeidiFromDad in them. One is for the Update To and the other has Criteria. That is probably a clearer way to do it. Who should argue with Access?

I updated that and I take a cue from Access for Part 2:


In English, the above says, “For this range when JoelFromDad is not blank but Sharon from Dad is, and Joel from Dad has a different value than Heidi from Dad, put that Heidi from Dad value where Sharon had the blank.” It sounds a little complicated.

Back to Row 197704 and I’ll look at 197709 while I’m at it:


Oh no, it is still wrong! I checked the previous ABA Table and that was the reason for the error. The error is also in the old AAB Table. However, the error was not in the file before that. My guess is that the AAB rule got applied to the wrong range of rows. I don’t see an error there, so I’ll have to rerun all the queries.

That’s OK, because I’m brushing up on the queries and will use the Is Null value so we will only be filling in the missing bases.


I had more problems, so I deleted the AAB Table and recopied the previous Table into it. I reran the Revised AAB Query halfway and it looked OK. However, when I ran the second half of the AAB query – filling Sharon’s results, the problem came back at ID# 197704. Very mysterious. The problem was where I thought it was originally. Look at the ID Criteria for the AAB Pattern Query:


There is an extra digit in the first between. The range goes from 45393 to 544155. The second number should be 54155. So this query was performed on 450,000 more rows than intended. I updated the AAB query with fewer rows. Again fewer is better. After many requeryings, I got the desired result for ID# 197704:
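A typo like this can be caught before running an update by checking the list of Starts and Stops for ranges that run into each other. Here is a small Python sketch of such a check. The function name `check_ranges` is made up, and the sample list holds just the mistyped first range and the two ranges that follow it.

```python
def check_ranges(ranges):
    """Return a list of problems found in a list of (start, stop) ranges."""
    problems = []
    ordered = sorted(ranges)
    for start, stop in ordered:
        if start > stop:
            problems.append(f"{start}-{stop} has its start after its stop")
    # After sorting by start, any overlap shows up between at least one pair of neighbors.
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 <= e1:
            problems.append(f"{s1}-{e1} overlaps {s2}-{e2}")
    return problems


mistyped = [(45393, 544155), (60990, 72548), (207109, 220679)]
for problem in check_ranges(mistyped):
    print(problem)  # 45393-544155 overlaps 60990-72548
```

Pattern segments on the same chromosome set should never overlap, so any message from a check like this points at a number that needs a second look.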


That should be the end of the first phase of nit picky work on the Paternal Side.

Summary, Conclusion and What’s Next

  • This was a lot of work, but the good news is that this update is for all the Chromosomes at once.
  • The bad news is that I have to do this again for the Maternal Side
  • Next up should be easy. That is just re-applying the Principles that Whit Athey Outlined on the new bases that I added from knowing the patterns. This should update missing maternally received bases from the updated paternally received bases.
  • I haven’t filled in blanks for the AAA patterns yet.
  • I am a little ahead of the game as I looked at how some of the first paternal crossovers will look.
  • Also with some basic phasing, I was able to deduce who those first paternal crossovers belonged to – one each to my two sisters and one for me.
  • If anything can go wrong it will

More Hartley DNA – Patricia’s DNA

This blog is a follow-up on my last Blog: My Hartley Autosomal DNA. I was inspired to write that blog following this year’s Hartley reunion in Rochester, Massachusetts. I intended to send around a little poster I made up about Hartley DNA and get a DNA sample from my father’s cousin Martha, but didn’t get a chance to. Instead I wrote a blog. I did talk to Patricia though. She is my second cousin and the sister of my childhood best friend, Warren. She had taken an AncestryDNA test. I think her daughter bought it for her. I asked if she could upload her DNA to Gedmatch and she said that her daughter would be good at doing that.

Here are Patricia’s 2 brothers and Patricia. The one in the middle was my best friend in my first 6 years of school. I remember seeing home movies of Curtis, Warren’s older brother. He came to one of my older siblings’ birthday party when he was about this age.

Patricia and family

In my last blog, I wrote about the Hartley DNA matches my father’s first cousin Jim had with me and my 2 sisters. I was surprised to find out that every match that we had represented one of my four 2nd Great Grandparents. They were all born around the 1830’s. It turns out that Patricia’s matches with cousin Jim represent the same four 2nd great grandparents. In addition Patricia’s DNA matches with my 2 sisters and me represent the same four old timers.

Here is what my DNA match to Patricia looks like at AncestryDNA:

Patricia Ancestry

Here, AncestryDNA has it right that we are 2nd cousins. They show we match for a total of 206 cM (centimorgans) across 14 DNA segments. That is about all you can get out of Ancestry. They won’t tell you which chromosomes we match on or how much we match on each chromosome. That is why people upload their results to Gedmatch. Ancestry does show other people that match DNA to both Patricia and me. These are my 2 sisters and 5 others. All these people also descend from the same Rochester Hartley ancestors, but none of them have uploaded their results to Gedmatch, so we don’t know their detailed DNA matching information.

Here is the same match between Patricia and me at Gedmatch:

Pat Joel Gedmatch

Ancestry has 14 segments vs. the 8 at Gedmatch. But at Gedmatch we know on which chromosome we match, how much on each chromosome and the exact start and stop location on the Chromosome. However, even with Ancestry’s 14 segments, their total is a bit smaller. Here is how I match Patricia on Chromosome 15 in the Gedmatch Chromosome Browser:

Joel Pat Chr 15

The blue areas represent the two DNA matches Patricia and I have on Chromosome 15.

Patricia on the Hartley Family Tree

When I was growing up, Patricia’s grandmother, my Aunt Mary, was my great aunt and also one of my neighbors.

Patricia's Tree

The bottom boxes in each row are the people that have tested their DNA and uploaded to Gedmatch. I now show 3 of the 13 children of James Hartley and Annie Louisa Snell (James, Mary and Annie). I now can check how my sisters and I match Patricia’s DNA as well as how Patricia matches Jim’s DNA.

Here are my great grandparents and three of their older children.

James and Annie Hartley

It is an interesting photo. Two of the children are looking away. I think that one is my grandfather James. The mother, Annie, is looking at something in her hands. The older son Dan is looking at a book and the father James doesn’t look comfortable being dressed up.

Patricia’s DNA at Gedmatch

One of the basic functions at Gedmatch is called ‘One to Many’. In this case, I took Patricia’s DNA and compared it to everyone else who has ever uploaded their DNA results to Gedmatch. Here are her first 4 matches:

Patricia's 1st 4 matches

Not surprisingly, her top matches are her 1st cousin, once removed, Jim, me and my sisters Sharon and Heidi. The Gen column lists how far away Gedmatch thinks Patricia’s matches are to a common ancestor. Patricia and I are 3 generations to James Hartley and Annie Snell, so that is right. Patricia shows 2.6 generations to a common ancestor with her match to Jim. A first cousin once removed would typically be 2.5 generations, so she shares a little less DNA than average here with Jim. Patricia also shares 19.3 cM of the X Chromosome with cousin Jim which I find interesting.

The Hartley X Chromosome

I’m taking the X Chromosome out of order because I find it interesting. There is one most important thing to know about the X Chromosome. If you are a male, you get one from your mother. If you are a female, you get one from your mother and one from your father. My father only got an X chromosome from his Frazer mother, so he doesn’t match anyone further up on the Hartley line by the X Chromosome. However Patricia and Jim both have maternal matches that carry up the line.
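The X Chromosome rule is simple enough to turn into a short Python sketch that lists which ancestors could have passed down X Chromosome DNA. This is just an illustration of the inheritance rule: the function name `x_ancestors` and the way the paths are written out are made up for the example.

```python
def x_ancestors(sex, generations, path="self"):
    """List the ancestor paths that could have contributed X Chromosome DNA.

    sex is 'M' or 'F' for the person at the end of the current path.
    """
    if generations == 0:
        return []
    # Everyone gets an X from their mother; only females get one from their father.
    parents = [("mother", "F")]
    if sex == "F":
        parents.append(("father", "M"))
    found = []
    for label, parent_sex in parents:
        parent_path = f"{path}'s {label}"
        found.append(parent_path)
        found.extend(x_ancestors(parent_sex, generations - 1, parent_path))
    return found


# A male, going back 2 generations.
for ancestor in x_ancestors("M", 2):
    print(ancestor)
```

For a male the list starts with his mother and never includes his father’s side at all. For a female the father’s line is included, but it stops at his mother because he did not get an X from his father.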

Here is how Jim got his X Chromosome from his mother and her ancestors:

Jim's X Inheritance

Jim only inherited his X Chromosome from those ancestors in pink or blue. So, for example, he got no X Chromosome from any Bradford before Harvey Bradford.

We need to compare Jim’s chart with Patricia’s X Inheritance Chart:

Patricia's X Inheritance

Here I didn’t show the X Chromosome that Patricia got from her father as this won’t match Jim. Then of what I show, only the bottom half will match Jim. This means that going back 4 generations from Patricia, she could match Jim by the X Chromosome on the Emmet, Snell or Bradford Line. One other difference between Jim and Patricia is that Jim got 100% of his total X Chromosome from his mother and Patricia only got 50%. However, that is a confusing way to put it because Patricia did get 2 X Chromosomes. So her one 50% must be similar to Jim’s 100% if that makes sense.

Here is what the X Chromosome match looks like between Patricia and Jim at Gedmatch on their browser:

Jim Patricia X Match

The yellow part with the blue under it is where they match at the end of the X Chromosome. That is enough on my X diversion for now.

Back to the Hartley DNA Matches on the Other 22 Chromosomes

At Gedmatch, I go to Jim’s ‘One to Many’ matches to see how he matches my family and Patricia. Here are Jim’s top 4 matches. You may have already guessed who they are:

Jim's top 4 matches

Above, I said that Patricia matched Jim a little less than expected. My sister Heidi at the top of the list matches him a little more than average.

Here are Jim’s DNA matches on Chromosome 1:

Pat Chr 1

  1. Me
  2. Heidi
  3. Sharon
  4. Patricia

Here Patricia has identified a new piece of DNA in green that comes from a Hartley ancestor and that we didn’t know about before. Again, this “Hartley” ancestor may be Hartley, Emmet, Snell or Bradford.

Here is another new Hartley segment on Chromosome 2:

Pat Chr 2

Patricia matched Jim on Chromosome 2. My sisters and I had no match with Jim on that Chromosome.

It looks like Patricia got a double segment of Hartley DNA on Chromosome 5:

Patricia Chr 5

Patricia is #1 above. Where the color changes from orange to yellow likely represents a change from Greenwood Hartley to Ann Emmet DNA or Isaiah Snell to Hannah Bradford DNA.

Patricia Helping Me Map My Chromosome 7

I’ve tried to map all my chromosomes as well as my 2 sisters’ to my 4 grandparents. I got a little stuck on Chromosome 7:

Chr 7 Map Pat

My chromosome 7 depiction is the one with the J to the left of it. On my paternal side (which is the blue and red bar), I have the DNA I got from my dad’s Frazer mother in blue and my dad’s Hartley dad in red. Above that is the Gedmatch depiction of how I match my 2 sisters by DNA and how they match each other. The bright green bar is called the Fully Identical Region or FIR. This means wherever that occurs a sibling matches the other sibling by getting the same DNA from the same 2 grandparents (one maternal and the other paternal). So in comparing Sharon to Heidi, they have that FIR from 0 to 25. It turns out that their 2 grandparents were their mother’s mother (Lentz) and their father’s father (Hartley). In the tiny section between 0 and 4, I have what is called a Half Identical Region or HIR. That means that I shared one grandparent’s DNA with my sisters, but from the other grandparent I didn’t get the same DNA that they did. In this case I had to share either the Lentz or Hartley grandparent with my 2 sisters, but I didn’t know which.
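The difference between FIR and HIR comes down to counting how many grandparents two siblings have in common at one spot. Here is a small Python sketch of that count, assuming each sibling is described by a (paternal grandparent, maternal grandparent) pair. The function name `region_type` is made up, and the two possibilities for me are the candidates described above, not results.

```python
def region_type(sib1, sib2):
    """Classify a spot as FIR, HIR or no match for two siblings.

    Each sibling is a (paternal_grandparent, maternal_grandparent) pair.
    """
    shared = (sib1[0] == sib2[0]) + (sib1[1] == sib2[1])
    return {2: "FIR", 1: "HIR", 0: "no match"}[shared]


sharon = ("Hartley", "Lentz")
heidi = ("Hartley", "Lentz")
print(region_type(sharon, heidi))  # FIR

# Two different ways I could be half identical to my sisters between 0 and 4.
print(region_type(("Hartley", "Rathfelder"), sharon))  # HIR
print(region_type(("Frazer", "Lentz"), sharon))        # HIR
```

Both candidates come out as HIR, which is why the sibling comparison by itself could not tell me whether the shared grandparent was Hartley or Lentz.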

That is where Patricia’s results came in handy. Here is how she matches Sharon, Heidi and me:

Patricia Chromosome 7

Patricia has 3 good matches with Sharon and Heidi and one tiny one with me (#3 on the Chromosome Browser). However, the tiny one is the one I need. The pink match shows that my Chromosome 7 from 0-4 (in millions) is where I got my DNA from my Hartley grandfather and not my Frazer grandmother.

Here is my completed Chromosome 7 thanks to Patricia. I extended the Rathfelder on my Chromosome 7 all the way to the left or beginning and added a small chunk of red Hartley from my grandfather.

Chr 7 complete

Another Type of Chromosome Mapping

There is another type of Chromosome Mapping developed by Kitty Munson. The way the Munson Mapping is generally used is to map out your relatives’ common ancestors. In the case of Patricia and Jim our common ancestors are James Hartley and Annie Louisa Snell. Here is what my new Chromosome Map looks like with the addition of Patricia’s DNA matches with me shown in blue.

New Kitty Map for Joel based on Pat

Well, that’s about enough for Patricia’s DNA for now.

Summary and Conclusions

  • Patricia shared the first Hartley X Chromosome match that I’ve seen.
  • The X tends to shy away from the male line, so Patricia and Jim’s match is more likely down somewhere in the Massachusetts colonial line rather than the English Line.
  • I would like to use Hartley DNA to break through the Hartley genealogical brick wall. Right now I’m stuck in the early 1800’s in Trawden, England. There were too many Hartleys there with the same first name to figure out who was who. Patricia’s DNA may help in finding matches to other Hartleys
  • Patricia’s DNA helped me in mapping my chromosomes in 2 different ways.


My Hartley Autosomal DNA

I have written many blogs on DNA but I don’t think that I have written about my Hartley autosomal DNA. Autosomal DNA is the kind of DNA test for which Ancestry claims to have tested over 2 million people. Autosomal looks at the DNA we get from both our parents and their parents and so on until the DNA runs out. And it does run out for some ancestors at some point. Due to this effect, very little of my DNA is actually Hartley DNA. If you think of it, I got half of my DNA from my father, but he got half from his father, his father got half his DNA from his father and so on.
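The halving can be worked out with a few lines of Python. This is only the average expectation: the amount actually inherited from any one ancestor beyond a parent varies from person to person. The function name `expected_share` is made up for the example.

```python
from fractions import Fraction


def expected_share(generations_back):
    """Average share of autosomal DNA from one ancestor that many generations back."""
    return Fraction(1, 2) ** generations_back


labels = ["parent", "grandparent", "great grandparent", "2nd great grandparent"]
for generations_back, label in enumerate(labels, start=1):
    print(label, expected_share(generations_back))
```

On average that works out to 1/4 from a grandparent and only 1/16 from a 2nd great grandparent.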

Paternal DNA from Maternal DNA

The best way to get your paternal DNA is to test your father. This avenue was not available to me. However, I was able to test my mother. Gedmatch has a utility available that will separate out the DNA I got from my mom from that which I got from my dad. That utility does not recreate my dad’s DNA, but it does recreate most of the portion of DNA that I got from him.

Here is what the utility looks like. It is quite simple to use and works quickly.

Phased Data Generator

Once I have this information, I can run the results against all my matches to find out which of my matches are from my dad and which are from my mom. There are also those that match neither which may be considered false matches. This takes out a lot of the guesswork with our matches. It makes life twice as easy.

Paternal DNA from Testing a Paternal Relative

The other way to find paternal (that is Hartley) DNA is to test a paternal or Hartley relative. That is when I went to my father’s cousin Jim and asked him to take a DNA test. He was willing and I have some Hartley matches. I also had tested myself and my two sisters. Here is what Jim’s DNA results look like compared to me and my 2 sisters on a Chromosome Browser:

Hartley DNA

I find this graphic interesting. It shows that Jim matches me and my 2 sisters on almost every chromosome. The last chromosome is the X Chromosome. It was cut off a bit. However, Jim could not match us on the X as my father only got his X Chromosome from his mother who was a Frazer and not a Hartley. On Chromosome 13 my 2 sisters and I have pretty much the same match with Jim. The 3 bars are of equal length. On Chromosome 20, only my sister Sharon matches Jim. On Chromosome 11 we all match but at different amounts. My sister Heidi has the largest match there. The places where we don’t match, my family is busy matching the other 3 grandparents. Or perhaps Jim is busy matching on his father’s non-Hartley line.

What Do All Those Matches Mean?

All those matches represent Hartley DNA. But remember that I said that even our Hartley DNA consists of other families. So the answer is a bit more complicated. First I will show the Hartley genealogy relative to the DNA match between Jim and my family. That will help explain all these DNA matches. In the first line below, Greenwood Hartley was from Trawden, England. Ann Emmet was from Bacup, England. Isaiah Snell had non-Pilgrim colonial ancestors. Hannah Bradford had Pilgrim Colonial ancestors.

Greenwood DNA

I have those with Hartley DNA in green. Those that have no Hartley DNA are in blue.

Here is Greenwood Hartley and Ann Emmet:


Probably Hannah Bradford and Isaiah Snell at their house in Rochester, Massachusetts:

Hannah Isaiah

Every match between Jim, me and my siblings represents a specific Ancestor from the 1st line above

The common ancestors between Jim and me are James Hartley born 1862 and Annie Louisa Snell born 1866, but the DNA that Jim and I share actually came from their parents, who were all born around the first third of the 1800’s. This was just made clear to me within the last few days. I know, it gets confusing. That means that out of the 1/4 of my DNA that came from my Hartley grandfather (as I have 4 grandparents), only about 1/4 of that quarter, or roughly 1/16 of my total, came from Greenwood Hartley himself. The rest came from Ann Emmet, Isaiah Snell and Hannah Bradford. It also means that every orange, blue or green bar in the first image represents one of the 4 ancestors from the early 1800’s above.

How We Get Our DNA

When we were conceived, we got our own blend of DNA. That DNA was really from our 4 grandparents. We got equal amounts from our mom and dad, but the amounts we got from their parents were blended, and we may not have gotten an exact 25% from each of our grandparents. We all actually have 2 of each chromosome. One is paternal and one is maternal. For example, the siblings James Hartley b. 1891 and Annie Louisa Hartley b. 1902 received on their paternal chromosomes alternating segments of Greenwood Hartley and Ann Emmet DNA. Likewise, on their maternal chromosomes, they had alternating DNA from Isaiah Snell and Hannah Bradford. Those mixtures of their 4 grandparents were passed down to Jim, me and my 2 sisters and are represented in the Family Tree DNA Browser that I show above and again below.

How Can We Tell Which Segment Matches Which of the Four Ancestors?

For example, it would be nice to know if Heidi’s Chromosome 11 match with Jim shown in green below represents  Hartley, Emmet, Snell or Bradford.

Hartley DNA

The best way to find out which segment represents which ancestor is to test additional relatives, such as:


  • A Hartley relative not related to Emmet, Snell or Bradford
  • An Emmet relative not related to Hartley, Snell or Bradford
  • Etc.

Well, I think you get the picture. Once one of these people is tested, they would be a reference, and any match Jim or my family had with them would be from that person’s line, whether Hartley, Emmet, Snell or Bradford. The problem is, where are these people? There may be Snells around not related to Hartleys, but I don’t know of many Hartleys not related to Snells. Sorry for the double negative.
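If such reference relatives did turn up, the comparison could be sketched along these lines in Python. Everything here is hypothetical: the segment positions are invented, the names `Segment`, `overlap` and `candidate_lines` are my own, and the 5 million base pair minimum overlap is just an assumed cutoff. An overlap only suggests a line. To confirm it, Jim would also have to match the reference relative on the same segment.

```python
from typing import Dict, List, NamedTuple, Tuple


class Segment(NamedTuple):
    chromosome: str
    start: int  # base-pair position where the match begins
    end: int    # base-pair position where the match ends


def overlap(a: Segment, b: Segment) -> int:
    """Number of base pairs two segments share (0 if they do not overlap)."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def candidate_lines(
    family_matches: List[Segment],
    reference_matches: Dict[str, List[Segment]],
    min_overlap: int = 5_000_000,  # assumed cutoff, not an official value
) -> List[Tuple[Segment, List[str]]]:
    """For each match with Jim, list the reference lines that overlap it."""
    labeled = []
    for seg in family_matches:
        lines = sorted(
            surname
            for surname, ref_segs in reference_matches.items()
            if any(overlap(seg, ref) >= min_overlap for ref in ref_segs)
        )
        labeled.append((seg, lines))
    return labeled


# Invented positions, for illustration only.
matches_with_jim = [
    Segment("11", 20_000_000, 60_000_000),
    Segment("13", 30_000_000, 90_000_000),
]
matches_with_references = {
    "Snell": [Segment("11", 35_000_000, 48_000_000)],
    "Emmet": [Segment("13", 5_000_000, 12_000_000)],
}

for seg, lines in candidate_lines(matches_with_jim, matches_with_references):
    print(f"chr {seg.chromosome}: {seg.start:,}-{seg.end:,} -> {lines or 'unassigned'}")
```

In this made-up example, the Chromosome 11 segment would be a Snell candidate and the Chromosome 13 segment would stay unassigned.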

Another way is to wait until one of these Snells not related to a Hartley shows up on a DNA match list. This doesn’t work for Ancestry matches because AncestryDNA doesn’t tell you which chromosome you match on. However, if they were to upload their results to Gedmatch, then the segments could be identified.

Why Do We Want to Identify These Segments?

Well, for one, some find it interesting to know where they got their DNA from. Another reason is that once these segments are identified, we know right away where to look for an ancestor match. For example, if we knew a match was on the Bradford side, we would look for a common ancestor descending from the Mayflower perhaps.

Summary and Conclusions

  • When I tested my Hartley father’s 1st cousin, I got a lot of DNA matches on most of my chromosomes
  • These matches represent 4 of my 2nd great grandparents
  • These four 2nd great grandparents represent Trawden and Bacup, England and Colonial Pilgrim and non-Pilgrim lines.
  • So far, I have not been able to figure out which colored bar represents which 2nd great grandparent.
  • There may be some advanced techniques that could help me tease those out. Or I may be able to find those out by testing appropriate relatives if found.
  • The older generations are the best for testing, as the further you get from your ancestors, the less of their autosomal DNA you carry. On average, it halves with every generation.
  • Those relatives that have tested at Ancestry should upload their results to Gedmatch for comparison.
  • One of my Hartley 2nd cousins has uploaded her DNA results to Gedmatch, and that will be the subject of my next blog.