Ancestry AutoClustering Back in Action

I noticed that the Genetic Affairs Facebook page had a recent post. It announced that, as a Christmas present, Ancestry AutoClustering was back in operation with some new controls to limit problems with the autoclustering. Ancestry AutoClustering has been popular because AncestryDNA has the largest database of DNA-tested people but lacks analytical tools.

My AncestryDNA AutoCluster

When AutoCluster first came out, I tried it at the low default settings. I wrote a Blog about those results here. Here are my annotated results:

 

I was impressed with the results. Even though my clusters were small at the default parameters, I liked the simplicity of the five clusters.

Here is my latest try at autoclustering. This time I used settings of 600 cM on the high end and 9 cM on the low end:

Now I have gone from five clusters to 76.
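To show what the lower and upper cM limits do to a match list, here is a minimal sketch. It assumes each match is a simple (name, total shared cM) pair and that both limits are inclusive; the function name, match names, and cM values are hypothetical, and this is not Genetic Affairs' code.

```python
def within_thresholds(matches, low_cm=9.0, high_cm=600.0):
    """Keep only matches whose total shared cM lies inside [low_cm, high_cm].

    Assumption: both limits are inclusive; AutoCluster's exact rule may differ.
    """
    return [(name, cm) for name, cm in matches if low_cm <= cm <= high_cm]


# Hypothetical matches: (name, total shared cM)
matches = [("Match A", 850.0), ("Match B", 120.5), ("Match C", 22.0), ("Match D", 7.5)]

print(within_thresholds(matches))               # [('Match B', 120.5), ('Match C', 22.0)]
print(within_thresholds(matches, 50.0, 250.0))  # [('Match B', 120.5)]
```

Lowering the bottom limit lets many more distant matches into the run, which fits with the jump from five clusters to 76. A match above the top limit (like Match A here) drops out entirely.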

My Genealogy and Deciphering Some of the 76 Clusters

This tree goes out to 16 branches. I suspect that 76 clusters could reach back at least two or three more generations than the tree above. I have a lot of Hartley relatives, as my Hartley great-grandparents had 13 children. That means that I have a lot of 2nd cousins. My great-grandmother Snell had colonial Massachusetts ancestry.

My Hartley 2nd Cousins

My Hartley 2nd cousins are not found in Cluster 1, but in Cluster 4:

These are my top 13 clusters. In my previous analysis, the present Cluster 4 was #1. By expanding out to more distant matches, the new Clusters 1-3 outranked my former #1 Hartley 2nd cousin cluster. Along with my 2nd cousins, Cluster 4 includes a few more distant cousins.

Massachusetts Colonial Ancestors – Cluster 1

Cluster 1 appears to include matches from my more distant Colonial Massachusetts ancestors, going back further than my Hartley side. My closest match in Cluster 4 is my father's cousin Joyce. My closest match in Cluster 1 is Jonathan, a relative of Joyce. Jonathan was also in my old Cluster 1. Now he is the ringleader for my Colonial matches.

Other than Jonathan, I cannot pinpoint exact common ancestors for matches in Cluster 1 at this point.

My Largest Matching Clusters

Next, I am going to change my strategy. I will now sort my Cluster List by match size:

I clicked on the cM button until the arrow was pointing down. This puts the clusters with the largest matches at the top, and those are the matches that I am most likely to know about. The highest matching cluster is #4. #12 is the 2nd highest because it includes my 1st cousin's daughter (on my mother's side). That means that Cluster #12 could be on either my mother's mother's side or my mother's father's side.
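Sorting by the cM column amounts to ranking each cluster by its single largest match. The sketch below assumes a cluster is just a list of shared cM values; the function name and the cM numbers are hypothetical, chosen only to echo the order described above.

```python
def rank_clusters_by_top_match(clusters):
    """Order cluster numbers by their single largest match (cM), highest first.

    clusters: dict mapping cluster number -> non-empty list of shared cM values.
    """
    return sorted(clusters, key=lambda number: max(clusters[number]), reverse=True)


# Hypothetical cM values, for illustration only
clusters = {1: [98.0, 40.2, 31.0], 4: [310.0, 75.5], 12: [205.0, 28.0]}
print(rank_clusters_by_top_match(clusters))  # [4, 12, 1]
```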

The next Cluster by size is #1 with Jonathan.

Cluster 16 – Nicholson

The next cluster by size runs off my present image, so I will need to scroll further down the image. Cluster 16 has only nine people in it, but I have been in touch with many of them. The known people in this group descend from William Nicholson and Martha Ellis:

Cluster 27 – Clarke

Cluster 27 is important to me. Clarke is my largest brick wall. I will have to go down yet another level for Cluster 27.

I'm starting to use the Key for these higher-numbered clusters.

My Top 23 Clusters by DNA Match Level

Here are my top clusters by match level in a spreadsheet:

This shows that the highest matches are on my paternal side and, within that side, most of the matches are on my Frazer grandparent's side.

I can also sort by cluster:

This shows that I am missing Cluster 3 even after looking at my top 23 clusters.

Cluster 3 – Mom’s Side

That makes me curious about Cluster 3. From the match list, I see that the top match is at 27.1 cM. This person has a large private tree but hasn't logged in to Ancestry for over a year. This group of matches is a bit of a mystery. I know that this cluster is maternal, and it is probably on the Lentz side rather than the Rathfelder side, as Rathfelder matches are rare.

Old Cluster and New Cluster

My original AutoCluster was done at conservative default levels and resulted in five clusters.

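The old clusters map to the new ones as follows:

  • Old Cluster 1 is now in new Clusters 1 and 4.
  • Old Cluster 2 is now Clusters 6 and 27.
  • Old Cluster 3 is now Cluster 11.
  • Old Cluster 4 is now Clusters 17 and 71.
  • Old Cluster 5 is now Cluster 19.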

AncestryDNA Circles

It occurs to me that it would be helpful to compare clusters to the AncestryDNA circles. Here are my circles:

Nicholson, Ellis and Lentz are maternal and the rest are paternal. Nicholson and Ellis are both in Cluster 16. This points out an error I made on my spreadsheet:

I previously had my Nicholson Cluster as 11, and it should have been 16. My mother's Lentz circle was still emerging, and its few matches either did not match me or matched too low to be in a cluster.

The Mary Pilling Circle is interesting, as it goes back to England. However, those in the circle who are not my 2nd cousins match others in the circle and not me.

Descendants of Anthony Snell Circle

I have a similar problem here. There are two people who are not my second cousins who match me by DNA, but at levels below 20 cM. If I check the shared matches of one of these people, I see that he matches Fred from Cluster 30. That is perhaps a hint as to where I may find a common ancestor with Fred. Shared matches with another person in this circle also lead me to three people who are in Cluster 30.

I believe that the Betsey Luther circle is somewhat redundant, as she was the wife of Anthony Snell. Finally, there is the Churchill Circle. I match my second cousins in it, and the others in the circle match those second cousins or closer matches. If I run clusters for others in my family, these relationships may be helpful.

This shows that three of my circles are associated with my second cousins in Cluster 4. Shared matches from the Snell circle brought me down to Cluster 30. The two circles for my mother’s side were the husband and wife Nicholson and Ellis. My mother had another Lentz circle but the matches were too low for me. When I look at my mother’s matches, I may find closer matches.

NADs and AutoCluster

NADs are New Ancestor Discoveries. Here are my NADs:

I have no idea who these people are.

The Long NAD

Here are some of the people in the Long NAD:

The orange indicates a match to me. So these are like circles or clusters also. The only difference is that these NADs are pointing to ancestors that I don’t know about. I may not know about them because they may not be my ancestors or the ancestors may be further back than the ones Ancestry is pointing to.

Brenda is in Cluster 7. I didn't try to identify Cluster 7 above, as it wasn't in the top 23 clusters. This means that I can associate Cluster 7 with my Long NAD. I associate the Long name with Ireland; however, this family was from North and South Carolina. Angela is also in the NAD and in Cluster 7. She also matches Ron, who is on my biggest brick wall, the Clarke/Spratt Line. Ron is in Cluster 27. Perhaps that indicates a relationship between these two clusters. I did find one person in Cluster 7 who is not in the Long NAD, and I'm not sure why. There are 21 people in Cluster 7 and 31 in the Long NAD.
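Comparing a cluster with a NAD is really a comparison of two membership lists. Here is a minimal sketch using Python sets; Brenda and Angela come from the text above, while "Match X", "Match Y" and "Match Z" are hypothetical placeholders, and the function name is my own.

```python
def compare_groups(cluster, nad):
    """Split two membership lists into shared and one-sided members."""
    cluster, nad = set(cluster), set(nad)
    return {
        "in_both": cluster & nad,
        "cluster_only": cluster - nad,
        "nad_only": nad - cluster,
    }


cluster_7 = ["Brenda", "Angela", "Match X"]            # "Match X" is a placeholder
long_nad = ["Brenda", "Angela", "Match Y", "Match Z"]  # "Match Y/Z" are placeholders

result = compare_groups(cluster_7, long_nad)
print(sorted(result["in_both"]))       # ['Angela', 'Brenda']
print(sorted(result["cluster_only"]))  # ['Match X']
print(sorted(result["nad_only"]))      # ['Match Y', 'Match Z']
```

The "cluster_only" set is where the one person in Cluster 7 who is not in the Long NAD would show up.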

The Weems NAD

John Weems was from Tennessee. I see his connection even less clearly than Seymore Long's. My matches to people in this group, when they do match, are below 20 cM. That means that I don't have a cluster analogous to this NAD.

Summary and Conclusions

  • I’m not done playing with AutoCluster yet. There is still more to explore.
  • My original AutoCluster looked at matches between 50 and 250 cM. In this AutoCluster run, I chose limits between 9 cM and 600 cM. The spreadsheet showed matches as low as 9 cM, but the html cluster chart showed matches only down to about 20 cM.
  • As I had so many clusters, I found it useful to look at the clusters with the highest DNA matches. These are the clusters that were, for the most part, easy to identify.
  • I compared the 76 cluster analysis with the 5 cluster analysis I did.
  • AutoCluster does a great job of condensing huge numbers of AncestryDNA matches and putting those matches into categories.
  • AutoCluster gave me a sense of how many matches I had that were maternal or paternal and which grandparent side those matches came from.
  • Next, I would like to raise the lower threshold to 25 cM to narrow down the number of clusters that I get.
  • I looked at how AncestryDNA circles related to Clusters.
  • Next, I looked at my two NADs. One NAD had an analogous cluster. The second NAD had matches that were too small and didn't have an analogous cluster.

 

 

 

Making Sense of My FTDNA AutoClustering with a Leeds Color Analysis

AutoClustering is a new approach to looking at DNA matches. The programming was created by Evert-Jan Blom. Right now the analysis is working better for FTDNA than it is for AncestryDNA. In a previous Blog, I looked at my 23andMe and FTDNA clusters but had some trouble identifying many of them. I was hoping that a Leeds Color Analysis would shed some light on my Clusters.

FTDNA AutoClusters

These are the 33 Clusters I came up with at FTDNA. I concluded that FTDNA pads its DNA matches a bit. This padding problem blew up my orange Cluster 1, where there are a ton of matches on my Chromosome 20. These are on my Frazer grandmother's side.

Here is a summary of some of my AutoClustering that I did previously:

The FTDNA results are in the middle column. It looks like I figured out 6 of the 33 Clusters.

Can the Leeds Color Analysis Help Figure Out More Clusters?

The Leeds Color Analysis also creates clusters, though not as graphically as the AutoCluster method. The good thing about the Leeds method is that it doesn't rely on a computer program and that it requires some interpretation from the user. These could also be considered negatives.

Here is what I came up with using a Leeds Color Analysis of my FTDNA matches:

  • The first time a name came up as a match, I gave that person a color over their name.
  • If someone matched someone who matched someone up higher in the Cluster, I noted this on the spreadsheet.
  • I went out as far as FTDNA’s predicted 2nd to 4th cousin matches. This was 88 matches.
  • This represents 21 Clusters. Some are not technically clusters as there is only one person in the cluster. I assume that if I went to lower cM matches, I would get more matches in these one person ‘clusters’.
  • I identified three out of four of my grandparents.
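The coloring steps above can be sketched as a short loop. This is a simplified version under stated assumptions: matches are listed strongest first, each person's shared (ICW) matches are known, and each person gets only the color of the first higher-listed match they share. A hand-made Leeds chart can give one person more than one color, and the names and function here are hypothetical.

```python
def leeds_colors(matches, shared):
    """Assign a color group number to each match, Leeds style (simplified).

    matches: names ordered from highest to lowest shared cM.
    shared: dict name -> set of names that person shares as matches (ICW).
    """
    colors = {}
    next_color = 0
    for i, name in enumerate(matches):
        color = None
        # Look up the list for the first earlier match this person shares.
        for earlier in matches[:i]:
            if earlier in shared.get(name, set()):
                color = colors[earlier]
                break
        if color is None:
            # No shared match higher up: start a new color column.
            color = next_color
            next_color += 1
        colors[name] = color
    return colors


matches = ["A", "B", "C", "D", "E"]
shared = {"B": {"A"}, "D": {"C"}, "E": {"B", "C"}}
print(leeds_colors(matches, shared))  # {'A': 0, 'B': 0, 'C': 1, 'D': 1, 'E': 0}
```

A color that ends up with only one person corresponds to the one-person "clusters" mentioned above.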

Starting with Hartley

In the Leeds Analysis, I used my father's cousin as the lead Hartley person. He did not show up in the AutoClustering, as he was too close a DNA match for the thresholds I used. However, the second person in the Blue column is Benjamin. He matches my father's cousin Jim and becomes the lead person in the AutoClustering. A search for Benjamin in the AutoCluster shows that he is in Cluster #10.

Cluster 10

The problem is that Cluster 10 only has three people in it:

In the Leeds Color Analysis, there were 20 in the Blue column. When I go to my match with Benjamin at FTDNA and choose ICW, I get three people. So that makes sense. This is just one flavor of Hartley. A look at the ancestral names of these matches makes me think that this could be a Colonial SE Massachusetts branch. I’ll call this a Snell/Bradford Line as that covers all my Colonial ancestors:

I could be wrong, but that is my best guess right now. Next, I filtered for Hartley (blue on the Color Analysis) and added a column for the AutoCluster number to keep track of the Cluster number:

The other two people in AutoCluster #10 were not in the Leeds Color Analysis.

Going Down the Blue List

It would seem logical to go down the Blue list and put in an AutoCluster number for each person. I find the results interesting:

I would trust the Clusters except for #6, as it shows more than one color. I was a bit surprised that not everyone related to an AutoCluster number, and I'm not sure why. Part of the reason may be that I went by FTDNA's predicted relationship, while AutoCluster probably goes by total shared DNA in cM.
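Spotting a cluster like #6 that mixes colors can be automated once the spreadsheet has a Leeds color and an AutoCluster number for each person. This is a minimal sketch; the rows below are hypothetical, with None standing for a person who landed in no AutoCluster, and the function name is my own.

```python
from collections import defaultdict


def mixed_clusters(rows):
    """Return AutoCluster numbers that contain more than one Leeds color.

    rows: (leeds_color, autocluster_number) pairs; use None when the
    person did not land in any AutoCluster.
    """
    colors_by_cluster = defaultdict(set)
    for color, cluster in rows:
        if cluster is not None:
            colors_by_cluster[cluster].add(color)
    return sorted(c for c, colors in colors_by_cluster.items() if len(colors) > 1)


# Hypothetical spreadsheet rows
rows = [("blue", 10), ("blue", 2), ("blue", 6), ("green", 6), ("blue", None)]
print(mixed_clusters(rows))  # [6]
```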

I plugged these numbers back into my AutoCluster Summary:

Notice that I had one Hartley in Cluster 4, which I previously had as Frazer. That was a mistake, and she should have been in Cluster 2. It all works out. I also made another mistake: there is no obvious Hartley Cluster 8.

Corrected FTDNA Cluster Summary

I suppose it would be possible to further break down the Hartley into Colonial or non-colonial, but I’ll hold off on that for now. The Hartley List worked well, so I’ll move on to Frazer.

Plugging Leeds Frazer Colors Into AutoCluster

Here is what I get:

Again, I’m unsure why the people at the bottom of the list are not in clusters. The Clusters I found were not shared with other colors, so that was good. Now I feel like I am getting somewhere:

 

These are different flavors of Frazer in green. From my last look at AutoCluster, I also have Clarke, the surname of my Frazer grandmother's mother. Frazers married Frazers. Frazers married McMasters, who married McMasters. It gets complicated.

I now have 13 out of 33 Clusters identified. That is a good start. I have other ideas on how to identify other clusters, but that can wait for now.

Summary and Conclusions

  • I got stuck trying to identify many of my AutoCluster results from FTDNA.
  • Using the Leeds Color Analysis, I was able to put many matches into two major grandparent categories. I was able to cross-reference these matches to the AutoCluster.
  • My next idea is to use chromosomal analysis to identify the clusters. By this, I mean that I will compare the matches to my visual phasing results. This should get the clusters into the correct grandparent area.

 

 

 

 

 

 

 

 

My Mother-In-Law and Her FTDNA AutoClustering

Joan’s Genealogy

I find Joan's DNA fun to work with. Even though Joan has a Canadian background, she has no French Canadian ancestry, which can muck up the works. I don't mean to sound prejudiced in a DNA sort of way. Joan is 1/4 Newfoundland, 1/4 Daley (from Nova Scotia), and the other half is from Prince Edward Island. Of Joan's four grandparents, the Daley side seems to be the most obscure. However, the Newfoundland side is problematic due to poor records there. The church in Harbour Buffet burned down at one point.

  • Ellis and Rayner – PEI
  • Upshall – Newfoundland
  • Daley – Nova Scotia

AutoClustering Joan

For some reason, Joan’s results came through as untitled text files:

I was able to change the first two files to csv files and the last one to an html file and that solved the problem. I chose a range between 12 and 400 cM.
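The file fix above is just a rename that adds the right extension. Here is a minimal sketch with Python's pathlib, assuming the downloads sit in one folder; the file names in the commented usage line are hypothetical, since the real downloads may be named differently.

```python
from pathlib import Path


def add_extensions(paths, suffixes):
    """Rename each file so it carries the matching extension.

    Path.with_suffix adds the suffix to a name with none, or replaces an
    existing one, so a name that itself contains a dot would be altered.
    """
    renamed = []
    for path, suffix in zip(paths, suffixes):
        path = Path(path)
        target = path.with_suffix(suffix)
        path.rename(target)
        renamed.append(target)
    return renamed


# Hypothetical download names; the real files may be named differently.
# add_extensions(["untitled1", "untitled2", "untitled3"], [".csv", ".csv", ".html"])
```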

How Many Clusters?

Joan had so many clusters that they ran off the graph:

I'll say Joan has over 80 clusters.

This represents about the first 25 of Joan’s clusters. Here is the total at the bottom of the report:

I forgot that FTDNA adds small segments to make the matches larger, so I should have used a higher bottom cutoff point.

Joan’s Cluster #1 – Newfoundland

A journey of 1,000 miles starts with one step. Joan's top match is Ken. I've looked at his DNA before and had trouble figuring out where all of his DNA came from. If you look really closely, you will see Ken's grey dots going toward other clusters. Those are other places where he is related to Joan. I mentioned that French Canadians muck up the works with intermarriage. This would be true of islands also, like Newfoundland and Prince Edward Island.

Joan’s #1 AutoCluster Match: Ken

Ken and Joan both descend from Christopher Dicks, born in the 1780s, and his wife Margaret. I have run a Dicks DNA project, and I recognize a lot of people in this Cluster.

Joan and Nancy

I didn’t recognize Nancy’s name in the group. Here is her tree:

I don’t get a lot of Upshall leads, so this is interesting. I assume that Nancy also has Dicks ancestry at some point. See, AutoClustering leads to good things.

That was quite easy. Here is the spreadsheet I use to keep track:

Cluster 2: PEI

I recognize some PEI descendants in Cluster 2. I have written about Glenda. She descends from Ellis and Rayner and matches Joan equally on those lines. That means I need to look at other Cluster 2 people and their trees.

Barbara and Lee

Barbara and Lee from Cluster 2 both have McArthur or MacArthur in their trees. That would seem to favor the Ellis side over the Rayner side:

However, I am just matching surnames, not actual shared ancestors. That would take more work.

Agnes’ Tree

It seems that there are a lot of good trees at FTDNA. Agnes matches on the Rayner side.

Agnes' maternal side has an Edward Rayner. His parents are the Edward John Rayner and Mary Watson in Joan's tree. Of course, that favors the Rayner side. However, I note that there is an Ellis on Agnes' Rayner side also.

Jane’s Tree

Here is where I need Ancestry to pull the trees together for me:

Jane has McArthur and Ellis on her paternal side.

I guess I’ll call this cluster Ellis/McArthur for now.

I spent a bit of time on this cluster, but that is worthwhile, as it is Joan's second-largest cluster.

Joan’s Cluster Three People Don’t Look Familiar

Unlike the first two clusters, I don’t recognize these matches. There were four trees for the 13 people in this cluster. I think I’ll skip this one. By the little dots to the left and above this cluster, I would say there is some connection to the previous PEI cluster. It seems like an odd group. At least one tree was from New Zealand and one was from Ireland.

Skipping on to Cluster 4

As I look at the names and trees, it appears that this Cluster is from Newfoundland. I’ll just call this a Newfoundland Cluster:

That also gave me an idea for a name for Cluster 3.

DNAPainter to the Rescue?

I'm getting stuck on these Clusters, so I'll take a look at what I have already painted for Joan. Here is the key to Joan's painted Chromosomes:

One problem I see with this is that DNAPainter takes from many places – not just FTDNA.

Melissa in Cluster 34

Melissa has a common ancestor of Ellis/Gorrill with Joan.

I’m not so sure about the other two matches in the group. So I didn’t find a lot by that method.

The Clicking on Trees Method

Next, I’ll just click on trees to see if anything shows up. This resulted in a few general discoveries. I then clicked on the highest cM button to try to overcome FTDNA’s over-counting of their DNA matches.

Here are some of the clusters partly identified:

Summary and Conclusions

  • I had trouble finding specific ancestors for many of these clusters. I think it may be related to FTDNA having higher cM matching than is warranted. This may be partially fixed by raising the lower threshold to 20 cM when running an AutoCluster Report at FTDNA.
  • At Joan's 2nd great-grandparent level, I can identify 16 ancestors. In this analysis, I got 92 clusters. That is too many.
  • Even though the cluster identification was difficult, it was good to take a fresh look at Joan’s FTDNA through the eyes of AutoClustering. I have at least one new lead to follow up on.
  • Another issue that makes Joan's cluster identification difficult is that her ancestors come from two islands, PEI and Newfoundland, where there was some intermarriage going on. Joan is also one quarter from Nova Scotia. I'm not aware of intermarriage there, but matches with these relatives are relatively rare (no pun intended).

AutoClustering My Wife’s Aunt’s Ancestry DNA

My wife's father had his DNA tested at FTDNA before he passed away. I also had his two sisters' DNA tested at Ancestry. I'll use his sister Virginia's AncestryDNA results for AutoClustering as a stand-in for my late father-in-law Richard.

AutoClustering Virginia

I could have picked either sister, so I picked Virginia for no special reason. Actually, my thought was to pick Lorraine, as she is closer in age to Richard, but I picked Virginia. I chose a low threshold of 12 cM for the AutoClustering.

First the Genealogy

Virginia and her siblings have half French Canadian and half Irish DNA. In my experience, the French Canadian DNA tends to take over. This is due to the shared ancestry among French Canadians and the many descendants who have tested.

The top part of the tree is Irish and the bottom is French Canadian. I am more interested in the top because there are some missing black arrows. Those are the places where there are missing ancestors. The ancestry is filled in to the level of 2nd great-grandparents. The column on the right represents third cousin, but in many matches this should show as third cousin, once removed.

Looking at Virginia’s AutoCluster

Here is the key for Virginia’s Clusters:

The Key is on the Chart. The grey dots represent those matches that didn't fit well into the clusters. Cluster 1 is no doubt French Canadian. Between Clusters 18 and 19, the cluster size goes down from three to two. These numbers do not include Virginia, who is in every cluster.

Name That Cluster

The game is to name the clusters. Before I do that, I notice that there are not too many grey dots between the first and second Cluster. I take that to mean that these two groups are not closely related. Perhaps the green Cluster is Irish and the orange is French Canadian.

Identifying Cluster #1

This should be easy as there are so many people. First I go to the list of people below the chart and search for Virginia’s second cousin Fred who is an avid genealogist. He is there in Cluster #1.

Fred’s shared ancestors with Virginia are here:

However, there are 120 members in Cluster #1. Next, I went down the list of people in Cluster #1. The last person I had notes for was Girard. Here is Michel’s Shared Ancestor Hint (SAH) with Virginia:

Michel has 72 people in his tree. The problem with that is that Michel and Virginia could have shared ancestors on other lines. Here is Louis Marie Henri Girard and his wife on Virginia’s tree:

I would say that Louis Girard is a hint as to where the cluster is going. I’ll try another SAH. The next person going up the list has six SAH’s and a large tree. Here is the most likely source of the DNA that is shared between Virginia and this match:

These matches so far tend to be around the bottom of Virginia’s French Canadian Tree:

I’ll try one more. The next SAH has over 1,000 people in his tree and his common ancestor with Virginia is Francoise Gagne:

So far, I would say that these are all ancestors of Elizee Fortin and Rosalie Gagne. It is even possible that I could name this Gagne/Girard if the person with six SAH’s has an ancestor there. It turns out our six-matcher has these ancestors also:

That means I would tend to call this a Gagne/Girard Cluster. I like to get the name as far back as possible to be the most specific name for the cluster.

I’ll look at one more SAH to make sure. Lucie has a good tree, but six SAH’s. For some reason her first hint puts her at 6th cousin once removed to Virginia and her second hint puts her at 6th cousin to Virginia. I’ll choose the 6th cousin which goes to Pierre Girard and Marie Anne Vesina. This ancestral couple is also on the Gagne/Girard Line. This is not a life or death decision, so I’ll go with the Gagne/Girard Label for Cluster #1:

That’s one down and 33 to go. I like to keep track of these clusters in a spreadsheet:

This way I can expand to the right for Richard at FTDNA eventually.

I’m Guessing Cluster #2 Is Irish

However, as I look at my notes nicely displayed by AutoCluster, I see that this cannot be:

This means that the LeFevre side is not as closely matched to the Pouliot side as the Pouliot side matches some other names. This makes sense also.

Name That Cluster #2

It is obvious that Cluster #2 is on the LeFevre side. However, I want to be more specific, as in Cluster #1 above. The match at the bottom of the list shows a SAH of Marguerite Anger. The husband is not shown because he appears in the tree as marrying her three times. However, I assume that the husband should be there also:

The husband is Joseph Methot. I am now just showing the line of Virginia’s LeFevre grandfather Joseph Martin as we know that this cluster is along the LeFevre Line. If I were to name this Cluster based on a sample size of one, it would be Methot/Anger. However, I want to be more sure and it is easy to look at these SAH’s by just clicking on a link from the AutoCluster list of matches in Cluster #2.

Going up the Cluster 2 Match List from the bottom, the next SAH is here:

This brings the name of this cluster one generation towards the present:

Based on a sample size of two, I would name this cluster LeFevre/Methot.

I’ll call in Jane for a tie-breaker:

I can see that Jane adds evidence to my previous guess:

Cluster #3 – French Canadian?

By looking at the Cluster Graph above, it appears that the red cluster will be more closely allied to the Pouliot side. There are not as many linked trees for Cluster #3:

Judy has an unlinked tree:

Cousin Fred is not in this Cluster even though he is closely related. This could be because he is too closely related to Judy. Judy's tree shows that she is a second cousin to Virginia on the Pouliot/Fortin Line. This seems to be the best name for this Cluster:

Cluster 4 – Slim Pickings on Trees

Cluster 4 has very few linked trees:

The match names appear to be French Canadian, so that is a hint. The largest tree above is private. From the above three clusters, it appears that I am getting different flavors of French Canadians. Match #3 has a small unlinked tree:

I really don’t want to build out this tree, though I could. I see Gobeil in Virginia’s tree here:

Alexandre is Virginia’s match #6 on Cluster 4. He also has an unlinked tree:

Here is another small tree from Match #8:

Again, I’m not willing to build out his tree. Match #9 had an unlinked tree and Match #10 had a small tree, but they were not helpful. I’ll go with Pouliot/Gobeil for now.

Cut to the Irish Side

This is going slowly, so I’ll start looking for Irish matches. Here is a Leeds color analysis that I did for Virginia about three months ago:

I need to pick out some of these green and blue matches and see where they cluster. The first match is Donna. She matches at 417.5 cM, so this is a case where I set the upper limit too low. The four in a row green Kerivan matches are also all too high for the upper match limit that I picked. Here is part of the tree of the first Butler match that didn’t get filtered out:

The common ancestors are Edward Butler and Mary Crowley. This match is in Cluster 12:

Based on the notes, I can see that I have been tracking three out of four of these matches. I wrote a message to John to see if he has any family history. It may be that he pre-dates the Butler/Crowley connection by one generation.

This Butler/Crowley Cluster is small, but important.

Is There a Kerivan in the House?

The green in the Leeds Color Analysis above stands for Kerivan. Here are some Kerivan descendants in Cluster 11:

I have written about Gaby already as a Kerivan relative. She is Thomas' aunt. Here is the tree showing how Virginia and Gaby connect:

Virginia is a second cousin once removed to Gaby and 2nd cousin twice removed to Thomas. Here are the common ancestors on Virginia’s tree:

David: Match #2 in Cluster 11

Here is David’s tree on his maternal side:

I am interested in David’s tree enough that I will build it out a bit. I’m curious to find the common ancestors. I start with David and mark the tree private at Ancestry.  Here is David’s maternal grandmother Joan Kerivan in 1940 Newton, Massachusetts:

Here I use a split screen for working on David’s tree. The tree I am making is on the left and David’s tree is on the right:

I accepted Ancestry’s Joseph Edward Kerivan hint but not his wife as it was different than what David had. It seems like it should be an easy tree. I have the DNA match, the Kerivan name and the right area (Newton, MA).

Here’s David’s great-grandfather in 1910:

Next, Joseph’s birth record comes in handy:

I see on George E Kerivan's marriage record that his parents are John Kerivan and Alice. These are the couple that I am looking for. Here is part of my selective tree for David:

Alice is no doubt my wife’s ancestor Alice Rooney.

As an added bonus, I color-coded the Clusters in my summary spreadsheet based on my wife's aunt's grandparents. These are the same colors I used in the Leeds Color Analysis.

The clusters are now taking shape. The magnitude of the French Canadian matches compared to the two Irish clusters is obvious.

Comparison with the Leeds Color Method

Next, I put the cluster names by the appropriate names from the previous Leeds Analysis I did:

I see that one of the people from the Butler Cluster was not in this analysis, so he must have gotten his test results since I did this analysis three months ago. The first green block that doesn't have an assigned cluster represents Russell. Russell is in Cluster 7.

Cluster 7 – Kerivan?

Here are Virginia’s seven Cluster 7 relatives:

Here is Russell’s tree on his mother’s side:

Time for a Quick Tree for Russell

I found this hint at Ancestry for Thomas Kerivan:

This gets me to where I want to be. Here is my quick tree for Russell:

One might wonder why there is another Cluster for this same couple. It could be that one Cluster is Kerivan and one is Rooney.

Here is Sandra. She has the same mother as Russell, so I could have saved myself some time:

Actually, there is a Rooney in this cluster, so I’ll call this the Rooney/Kerivan Cluster.

There are a few new people in the Rooney/Kerivan Cluster that I should get in touch with.

Cluster 19 – A Butler Cluster on the Outskirts

Here are Brian and Michael:

I associated Brian with the Butler family due to a shared match with Patty. Neither Brian nor Michael have family trees, but it would be worthwhile to follow up with these two.

My guess is that the Cluster 19 Butler predates the Cluster 12 Butler/Crowley families. This is a good place to be as I am trying to pin down a place in Ireland where the Butlers came from.

Where is Patty?

One Butler DNA match I have been tracking is Patty. I couldn’t find her in the AutoCluster. Based on her shared matches at AncestryDNA, I would have expected her to be in Cluster #12. AutoCluster provides a list of names that didn’t match other people. I didn’t see her in that list either.

Summary and Conclusions

  • AutoCluster by Genetic Affairs continues to be a fun and useful tool to use to sort through your DNA matches.
  • The program is similar to the Leeds method but is more useful and takes the guesswork and human error out of the equation for the most part.
  • AutoCluster gives a visual as to where the bulk of the DNA matches are.
  • In this Blog AutoCluster highlighted some important new matches. It would be worthwhile to contact these new matches.
  • The list of people in the Ancestry Clusters is especially helpful. I can click on each name and quickly go to their AncestryDNA match and see if they have a SAH or linked or unlinked tree.
  • Even though AutoCluster is one of the best things since sliced bread, it is not perfect. I could not find Patty in the clusters. Also the runs that I get are spotty. It seems to work about half the time for me. I would like to get better results at Ancestry for myself and my mother, but am not able to get results at the thresholds that I want. It may be that these glitches will be fixed as this is such a new tool.

 

 

 

 

 

 

 

 

 

Mom’s DNA AutoClustered

AutoClustering was down for AncestryDNA today, but now it appears to be working again. This time I wanted to try to autocluster my mom’s DNA. I meant to lower the threshold from 50 to 15, but apparently did not. I’ll take a look at what I got.

Mom only got three clusters. This could be a lesson in what not to do. The first cluster is Nicholson.

Cluster 3 – Rathfelder

I have blogged about this match. The common ancestor is at the level of Hans Jerg Rathfelder and Juliana Biedenbinder:

Green Cluster #2

I am less certain of Cluster 2. These two people don’t have trees and I have not been able to get in touch with them. I was in touch with a shared match who had Schwechheimer ancestry. This ancestry is also from Latvia, so that would be my best guess for this Cluster.

That's as far as I get with this small autocluster. The orange Nicholson cluster is maternal. Clusters 2 and 3 are paternal for my mom as far as I can tell.

Here is my AutoCluster at Ancestry using the same default settings:

I had one maternal cluster (Nicholson) and four paternal clusters. My mom's cluster of 7 Nicholsons translated to a cluster of three for me at the preset thresholds. This makes sense, as I inherited only about half of my mother's DNA, including her Nicholson DNA.

AutoClustering My 23andMe Matches and More FTDNA

In my previous Blog, I looked at AutoClustering my AncestryDNA and FTDNA matches. In this Blog, I'll look at 23andMe. I have to confess that I have never had a good feel for working with my DNA matches at 23andMe. I was hoping that AutoCluster would give me a boost in figuring out what I have there.

Here is my AutoCluster at 23andMe:

Now I am up to 45 Clusters. I used a slightly lower threshold than at FTDNA (20 cM at 23andMe vs. 25 cM at FTDNA) and got different results. At FTDNA, Cluster 1 had 108 members and Cluster 2 had 10 members. At 23andMe, the first two Clusters are more even, at 66 and 65 members. I also note that the members of the green Cluster 2 are closely interrelated: all 65 of them match each other.
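To put "all 65 members match each other" in concrete terms: a cluster of n people contains n(n-1)/2 pairs, so a fully interconnected 65-member cluster means every one of its 2,080 pairs is a shared match. The sketch below measures how interconnected a cluster is. It assumes the shared matches have been written out as a list of name pairs; that input format, the function names, and the sample names are hypothetical illustrations, not how AutoCluster itself stores or computes anything.

```python
from itertools import combinations


def _pair_set(shared_pairs):
    # Store each shared-match pair unordered, so ("A", "B") equals ("B", "A").
    return {frozenset(pair) for pair in shared_pairs}


def pair_count(n: int) -> int:
    """Number of distinct pairs among n cluster members: n(n-1)/2."""
    return n * (n - 1) // 2


def density(members, shared_pairs) -> float:
    """Fraction of member pairs that are shared matches (1.0 = fully interconnected)."""
    possible = pair_count(len(members))
    if possible == 0:
        return 1.0  # a cluster of 0 or 1 people is trivially interconnected
    shared = _pair_set(shared_pairs)
    found = sum(1 for a, b in combinations(members, 2) if frozenset((a, b)) in shared)
    return found / possible


def is_fully_interconnected(members, shared_pairs) -> bool:
    """True when every pair of members appears as a shared match."""
    shared = _pair_set(shared_pairs)
    return all(frozenset((a, b)) in shared for a, b in combinations(members, 2))


if __name__ == "__main__":
    members = ["Ann", "Bob", "Cal"]  # hypothetical names
    pairs = [("Ann", "Bob"), ("Cal", "Bob")]  # Ann and Cal do not match
    print("pairs in a 65-member cluster:", pair_count(65))
    print("density:", density(members, pairs))
    print("fully interconnected:", is_fully_interconnected(members, pairs))
```

A density well below 1.0 would suggest a looser cluster, where some members connect only through others.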

Identifying the 23andMe Clusters

My first thought is to figure out what these clusters represent. Which line is which? I do have a few known cousins at 23andMe.

Cluster 8: The Lentz/Nicholson Line

My mom has a cousin Judith who is on the Lentz Line. She is in Cluster 8.

Judith also descends from the Nicholson family as does at least one other person in Cluster 8.

My Cousin Jennifer: Hartley Side

Another point of reference is Jennifer who is my 2nd cousin, once removed.

This corresponds with my Hartley matches at AncestryDNA:

Steve with Clarke Ancestry

I’ve blogged about Steve who is a 23andMe match. He has Clarke ancestry and is in Cluster 19:

Cluster 19 is quite a ways down on the list.

Cluster 2 and Chromosome 20

I have written a few Blogs on my Chromosome 20. I have many matches there on my Frazer grandmother's Irish side. These Chromosome 20 matches appear to correspond with my Cluster 2. Here is one Blog I wrote on my Chromosome 20 about 2-1/2 years ago. In that Blog, I reasoned that the matches may be on my McMaster side:

In my previous request for an AutoCluster at FTDNA, I had set the lower threshold at 25 cM and that had filtered out a lot of the Frazer side matches. At 23andMe, I lowered the threshold to 20 cM which would explain the larger cluster.

Deciphering FTDNA Cluster 1

If FTDNA is like Ancestry and 23andMe, then the yellow Cluster should be a Hartley Cluster. First I checked the top match. It turns out that FTDNA over-reports these matches:

Roger shows a match of 67.3 cM with me, but his top segment is only 12.3 cM. Here is what the FTDNA Browser shows:

The browser shows one small segment on Chromosome 20. This is where I have a lot of Frazer matches as described above. Theresa is also in FTDNA Cluster 1:

Theresa also has a relatively small match corresponding with her 13.1 cM largest segment on Chromosome 20. That means that even though I tried to avoid my Chromosome 20 overmatching problem by raising the threshold to 25 cM, FTDNA still added in tiny segments, which raised the totals for these matches above my threshold.

It is unfortunate that FTDNA has small matches that come out as large. I don’t know if this is as big a problem for others as it is for me. Basically I have a large group of distant relatives that I can’t connect with in Cluster 1.
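Here is why a 25 cM threshold did not keep Roger out, assuming (as the results above suggest) that the threshold is applied to total shared cM rather than to the largest segment. The Python sketch below compares the two kinds of filter. Only Roger's 67.3 cM total and 12.3 cM top segment come from FTDNA; splitting the rest into twenty-two 2.5 cM pieces is made up for illustration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    """A DNA match described by its shared segment lengths in cM (hypothetical data)."""
    name: str
    segments_cm: list[float] = field(default_factory=list)

    @property
    def total_cm(self) -> float:
        # Total shared cM is simply the sum of every reported segment.
        return sum(self.segments_cm)

    @property
    def largest_cm(self) -> float:
        # Largest single segment; 0.0 if no segments were reported.
        return max(self.segments_cm, default=0.0)


def passes_total(match: Match, min_cm: float) -> bool:
    """Threshold on total shared cM, so many tiny segments can add up."""
    return match.total_cm >= min_cm


def passes_largest(match: Match, min_cm: float) -> bool:
    """Threshold on the largest segment, which tiny segments cannot inflate."""
    return match.largest_cm >= min_cm


if __name__ == "__main__":
    # 12.3 cM top segment plus an invented 55 cM of tiny segments = 67.3 cM total.
    roger = Match("Roger", [12.3] + [2.5] * 22)
    print(f"{roger.name}: total {roger.total_cm:.1f} cM, largest {roger.largest_cm:.1f} cM")
    print("passes 25 cM total threshold:", passes_total(roger, 25))
    print("passes 25 cM largest-segment threshold:", passes_largest(roger, 25))
```

With a total-cM filter, many tiny segments can carry a match over the line even when no single segment comes close to the threshold.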

A Comparative View: Three Companies

Here is a comparison of the three AutoCluster runs I have done with three companies. A better comparison would be for me to rerun the Ancestry results with a lower threshold:

  • I changed the Ancestry Cluster 1 name from Hartley to Snell. That is because the cluster goes back to Snell and beyond my Hartley ancestors for some of the matches.
  • In the three analyses Clarke went from Cluster 2 to 6 to 19.
  • I noted a special Chromosome 20 issue that I had. This didn't come up at Ancestry because the threshold there was more restrictive. I may be able to identify this group later at Ancestry when I am able to run an AutoCluster at a lower cM threshold.
  • The Ancestry AutoCluster analysis only went up to 5 Clusters based on the standard AutoCluster thresholds.

FTDNA Cluster 2

The above summary points out that I have not yet figured out FTDNA Cluster 2. So far, I don’t have a definitive answer for this Cluster. The people tend to match me on my Chromosome 10. I have tended to associate their ancestors with Colonial Massachusetts.

FTDNA Cluster 3

This Cluster appears to match on Chromosome 22. I think that they are Irish in background. My Chromosome 22 (Joel) is all Irish Frazer on the paternal side:

At least one of my matches from Cluster 3 is also listed at Gedmatch. I have a paternally phased kit which she matches. That is how I can tell that the match must be on my Irish Frazer side.

Back to 23andMe: Cluster 4

Cluster 4 has 17 people in it (or items according to AutoCluster).

Two of these “items” are listed as unknown. Next I need to identify one or more of these people in the list. John listed 8 surnames, but none of them sounded familiar. So far, these matches are matching me on Chromosome 3. Here is the match with Kris at the top of the Cluster 4 list:

From visual phasing, I know that this segment has to be either Hartley or Rathfelder DNA (at the level of my grandparents).

I recognize some Hartley names in that area of the match and they aren’t in Cluster 4. That means that this has to be a Rathfelder side match.

I'm not getting very specific with these Clusters. Part of the reason is that 23andMe does not emphasize ancestral trees. So if I ever meet these cousins, I can introduce them as my Rathfelder Line Chromosome 3 cousins. From one of my other maternal Chromosome 3 matches, I see that I have traced one of these families to a German Colony in Saratov, Russia. I have not yet made the connection between them and my ancestors who lived in a German Colony in Latvia.

So, Where Are We?

Here is a summary of some of the clusters:

I had the best luck with AncestryDNA. This is partly because I have been working with them more. It is also partly because I used more restrictive thresholds, which gave me only five, more obvious clusters. Ancestry also has the most matches and the best genealogical trees.

FTDNA came in next, as they do have some genealogical trees. This is where I tested first, so I have some familiarity with how they work. Their matching algorithm creates a perfect storm for my Irish Chromosome 20 matches, making them appear much more closely related than they are. I expect that this is true to a lesser degree for some of my other matches.

23andMe was the most difficult, as they focus the least on genealogical trees. It would take a bit of time to contact some of the critical matches there. I believe that 23andMe has more test results than FTDNA, so they have that going for them.

Summary and Conclusions

  • So far, it has been easiest to interpret the AncestryDNA clusters. I would like to take the cM levels down once some of the bugs have been worked out.
  • I got many more clusters at FTDNA and 23andMe, but some of my cluster descriptions are vaguer than I would like.
  • I would like to look more into the Hartley/Snell clusters. I am interested in Hartleys who don't match Snells, as my genealogical brick wall is on my Hartley line, pre-Snell.
  • It would seem that I should be able to cross-reference the clusters. Even though the matches are different at the different companies, the common ancestors are the same.
  • This utility is new, so people are still experimenting with it. For example, is there a cluster sweet spot that isn't too high or too low? I have 32 third great-grandparents, and they are the common ancestors I share with my fourth cousins. This may be a good number of clusters to shoot for. Some of those at the 3rd great-grandparent level may be too obscure to have clusters. However, this could be offset by 4th great-grandparents with a lot of descendants that would make good clusters.
  • A lot of the clusters have two people in them. Is it worthwhile looking at such small clusters?
  • The AutoCluster utility has given me a fresh look at my DNA matches. I have also been entering some of the larger matches into my match spreadsheet.
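The arithmetic behind the "32 third great-grandparents" target is a simple doubling. Each generation back doubles the number of ancestor slots, and cousins who share ancestors g generations back are (g - 1)th cousins. The short Python sketch below tabulates this, with function names of my own choosing. Note that these are ancestor slots; pedigree collapse can mean fewer distinct people, and the 32 ancestors form 16 couples.

```python
def ordinal(n: int) -> str:
    """Return 1st, 2nd, 3rd, 4th, ... for a positive integer n."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th, ...
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def ancestors_in_generation(generation: int) -> int:
    """Ancestor slots `generation` steps back (1 = parents, 2 = grandparents)."""
    return 2 ** generation


def cousin_degree(generation: int) -> int:
    """Cousins sharing ancestors `generation` steps back: 2 -> 1st cousins."""
    return generation - 1


def ancestor_label(generation: int) -> str:
    """Name the ancestors `generation` steps back, starting at grandparents."""
    if generation < 2:
        raise ValueError("start at generation 2 (grandparents)")
    if generation == 2:
        return "grandparents"
    if generation == 3:
        return "great-grandparents"
    return f"{ordinal(generation - 2)} great-grandparents"


if __name__ == "__main__":
    for g in range(2, 7):
        count = ancestors_in_generation(g)
        cousins = ordinal(cousin_degree(g))
        print(f"{count:>3} {ancestor_label(g)} -> shared with {cousins} cousins")
```

Generation 5 gives the 32 third great-grandparents shared with 4th cousins, and the next row shows the 64 4th great-grandparents behind 5th cousins.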