Walking My Clusters Back – Jim Bartlett Method

I recently read two interesting articles by Jim Bartlett on the use of Shared Clustering. Jim’s most recent article discussed walking the clusters back. Shared Clustering is a free program developed by Jonathan Brecher.

Shared Clustering

Last night, while the New England Patriots were playing football, I downloaded Jonathan’s program and used it to download my AncestryDNA matches and Shared Matches.

 

I used the first two radio buttons above. The first button downloads your matches up to the fourth cousin level. That is a match of 20 cM or more. As I recall, this was about 978 matches. I may be off, because I just checked AncestryDNA and I have 908 matches of 4th cousin or closer. The second button gets your matches and Shared Matches down to a level of 6 cM. It took overnight to get all of these downloaded. However, once I have those, I don’t have to connect to AncestryDNA again – unless I need an update. The download is in the form of a text file and is not overly useful in that form. It is sort of a dump of my AncestryDNA match data.

Clustering

Next, I chose the recommended button for clustering under the cluster tab:

This outputs to an Excel spreadsheet file. If I shrink my spreadsheet to the minimum 10%, I can see half of the clusters:

This gets me to about Cluster 18 out of 50 clusters. So, though this is theoretically my 4th cousins, it must go out further than that. 4th cousins would represent my 3rd great-grandparents. I have 32 3rd great-grandparents and 50 clusters, so 18 or more of those clusters must go beyond the level of the 3rd great-grandparents.
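The arithmetic behind that comparison can be sketched in a few lines of Python. The function name is made up for illustration, and the count of 50 clusters is simply the number reported for this run.

```python
def ancestors_at_generation(generations_back: int) -> int:
    """Number of ancestors in one generation (parents = 1, grandparents = 2, ...)."""
    return 2 ** generations_back

# 4th cousins share 3rd great-grandparents, who are 5 generations back:
# parents (1), grandparents (2), great-grandparents (3),
# 2nd great-grandparents (4), 3rd great-grandparents (5).
third_great_grandparents = ancestors_at_generation(5)
clusters_found = 50  # cluster count from the 20 cM run described in the post

print(third_great_grandparents)                   # 32
print(clusters_found - third_great_grandparents)  # 18
```

Even if every 3rd great-grandparent had a cluster of their own, at least 18 of the 50 clusters would have to come from somewhere further back.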

Here is the bottom half of most of my clusters down to Cluster 50 in the lower right of the screen:

Walking My Clusters Back

Jim Bartlett recommends walking back your clusters from your 4 grandparents further back a generation at a time. My first Blog on clustering was about a year ago using the Auto Cluster program. Here was my first Auto Cluster:

In this simple analysis, I had 5 clusters. However, as far as I could tell, none of these represented my maternal grandfather:

  1. Paternal grandfather – orange
  2. Paternal grandmother – green, purple and brown
  3. Maternal grandmother – red

My maternal grandfather was a German from Latvia who came to this country in the early 20th Century, so not many of his relatives have tested. That is not really a problem, but it is something to be aware of.

Shared Clustering 90 cM or Greater

Next, I tried the Shared Cluster 90 cM or Greater. It looks like this should give me 3rd cousins or greater. Somewhat surprisingly, this only gave me two clusters:

A few notes:

  • The Shared Cluster program does not appear to have an upper limit for matching. Because of that, my immediate family is included. They show up as a horizontal bar in the middle of the image.
  • The first two people are in a cluster of sorts, but Shared Cluster only includes clusters of three or more by default. They fit in on my paternal grandmother’s side.
  • The third person (the first person in Cluster 1) is actually on my maternal grandfather’s side. This was a new person who tested since last year. She is in Cluster 1 because she matches with my mother, my maternal first cousin and her two daughters.
  • Cluster 2 is all my paternal side. The matches go back further than that but the Cluster is holding together due to my close family being included in the Cluster.
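To see why close family can hold a cluster together, here is a toy sketch in Python. It is not the Shared Clustering algorithm, which I have not looked inside. It just groups people who are linked by shared matches and drops groups smaller than three. The names and the helper functions are invented for the example.

```python
from collections import deque

def connected_clusters(shared, min_size=3):
    """Group matches into connected groups of the shared-match graph.

    `shared` maps each match to the set of matches they share DNA with.
    Groups smaller than `min_size` are dropped.
    """
    seen, clusters = set(), []
    for start in sorted(shared):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            person = queue.popleft()
            component.add(person)
            for other in sorted(shared[person]):
                if other in shared and other not in seen:
                    seen.add(other)
                    queue.append(other)
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters

def without(shared, excluded):
    """Return a copy of the graph with the excluded people removed."""
    return {p: links - excluded for p, links in shared.items() if p not in excluded}

# Invented data: two separate family groups plus a sibling who matches everyone.
shared = {
    "Sibling": {"A1", "A2", "A3", "B1", "B2", "B3"},
    "A1": {"A2", "A3", "Sibling"},
    "A2": {"A1", "A3", "Sibling"},
    "A3": {"A1", "A2", "Sibling"},
    "B1": {"B2", "B3", "Sibling"},
    "B2": {"B1", "B3", "Sibling"},
    "B3": {"B1", "B2", "Sibling"},
}

print(len(connected_clusters(shared)))                        # 1
print(len(connected_clusters(without(shared, {"Sibling"}))))  # 2
```

With the sibling in the data, everything is one big group. Take the sibling out and the two family groups separate, which is the effect I was hoping for from an upper limit on matching.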

Tweaking the Shared Cluster Program

Under advanced options on the Cluster Tab, I don’t see any option for screening out close relatives:

So I’ll try to ratchet down the lowest centimorgans to cluster to try to break open these clusters. I’ll try 50 cM for the lowest:

Above, I picked up one more Cluster. Cluster 1 is now my paternal grandmother’s cluster. This was the one that wasn’t a cluster previously, but I picked up one more person to make it a cluster:

  1. Paternal grandmother
  2. Maternal
  3. Paternal grandfather

The first person in the previous Cluster 1, Donna, is now the last person in the new equivalent Cluster 2. So far, I have not split a cluster but added to a previous non-cluster. This is fun to play with.

I Need to Get to About 8 Clusters Next

Trying 40 cM still resulted in 3 Clusters, so I’ll try 30 cM. I know that the three represent four grandparents as they are, but I only have one person tested at Ancestry on my maternal grandfather’s side. I know that at 20 cM, I have 50 clusters, so I need a match number that will get me about eight clusters. I think I see an issue. On the advanced tab, there is a maximum shared match number. When I ran 50 cM, I had a maximum shared match of 90 cM. I need to change that to 50 cM:

This flipped the clusters around:

  1. Paternal grandfather
  2. Maternal
  3. Paternal grandmother – now up to a cluster of 6 people who match me and each other by DNA

I think I’m getting the hang of this.

A 40 cM Cluster Gives Me 6 Clusters

This may be about what I want. Again, I set my shared match limit to 40 cM:

There are two-person clusters where I have the arrows. There is also a one-person cluster at the lower right of the image above. The Clusters are:

  1. Lentz
  2. Nicholson – the first two clusters look like one. I believe that that is because Cluster 1 is Lentz/Nicholson and Cluster 2 is Nicholson without the Lentz.
  3. McMaster/Frazer (Ireland) – These families intermarried more than once in my ancestry
  4. Unidentified, but believed to be Spratt (Ireland)
  5. Most likely Clarke (Ireland)
  6. Hartley – Paternal grandfather, but not further split out

Here are those Clusters on my family tree:

  • I know least about the Clarke line, yet this seems split out to the two parents of Clarke and Spratt
  • Cluster 6 is stuck probably because Hartley and Snell had 13 children and I have a lot of 2nd cousin matches at AncestryDNA
  • Cluster 2 appears to be split between three great-grandparents on my maternal side. I’m not sure why. I have some other Rathfelder cousins, but they tested at MyHeritage and FTDNA.

Some Walk Back Analysis

This shows what happened between matches of 50 to 40 cM when my clusters went from three to six.

  • My mother’s Rathfelder Cluster split into her maternal grandparents of Lentz and Nicholson
  • My paternal grandfather’s Cluster got stuck and was not further divided
  • My paternal grandmother’s Cluster seemed to skip a generation and form two clusters further out.
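This kind of walk back could also be checked by comparing who is in each cluster from one run to the next. The sketch below is only an illustration: the function name is made up, and the match names (m1, p1, n1 and so on) are placeholders rather than my real cluster members. The cluster labels are borrowed from this post just to make the output readable.

```python
def trace_parents(old_clusters, new_clusters):
    """For each new cluster, name the old cluster sharing the most members.

    Both arguments map a cluster label to a set of match names.
    A new cluster with no overlap at all maps to None.
    """
    parents = {}
    for new_label, members in new_clusters.items():
        best_label, best_overlap = None, 0
        for old_label, old_members in old_clusters.items():
            overlap = len(members & old_members)
            if overlap > best_overlap:
                best_label, best_overlap = old_label, overlap
        parents[new_label] = best_label
    return parents

# Placeholder membership, invented for the example.
at_50_cm = {
    "Maternal": {"m1", "m2", "m3", "m4"},
    "Paternal grandfather": {"p1", "p2", "p3"},
}
at_40_cm = {
    "Lentz": {"m1", "m2", "m5"},
    "Nicholson": {"m3", "m4"},
    "Hartley": {"p1", "p2", "p3", "p4"},
    "New at 40 cM": {"n1", "n2", "n3", "n4"},
}

for label, parent in trace_parents(at_50_cm, at_40_cm).items():
    print(label, "<-", parent)
```

A cluster that traces back to None is one made up entirely of matches below the previous threshold, so it has no earlier cluster to hang from.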

As my Clarke and Spratt Lines are brick walls, I would like to look at them. I am quite sure of Cluster 5. My common ancestors with two of the people in this Cluster are Thomas Clarke and Jane Spratt. That being the case, I could have put the Cluster 5 up a generation at Ancestor #11.

The four matches in Cluster 4 are all just above 40 cM, so they didn’t appear in the 50 cM analysis.

Here are Clusters 4 and 5. There are a few connections between these two Clusters. I interpreted that to mean that Cluster 4 is the ancestor of Cluster 5. Here is my modified summary:

A 35 cM Threshold Results in 10 Clusters

It’s a free program, so I can play around with it:

10 is still pretty close to 8, so let’s see what we have for Clusters:

  1. Nicholson
  2. Lentz
  3. Frazer
  4. Clarke/Spratt
  5. Snell or Colonial MA?
  6. Snell/Bradford – this was a larger cluster in my previous run
  7. Parker Nantucket?
  8. McMaster Ireland?
  9. Hartley English?
  10. Snell or Colonial MA?

I’m not sure that this is any clearer than the previous Cluster of 6. Some of my matches that were previously in clusters fell out in this analysis.

35 cM Cluster Analysis

For the ten 35 cM Clusters, it would be nice to be able to trace where they came from. I had a question on Cluster 5. However, it is still as good as it can be right now. There are only three in this cluster. They have no usable trees and they are shown matching Hartleys in my large 2nd cousin Cluster.

On Cluster #7, I don’t agree with the way the program drew up the Cluster, so I would rather ignore that Cluster. Half of the Cluster seems to match Cluster 6 (Massachusetts Colonial) and half seems to match Cluster 8 (Irish ancestors). Cluster 9 is difficult as there are only three in the Cluster. One tree has English ancestors, but not all are English.

A 30 cM Match Limit Gives Me 16 Clusters

So by accident, I have come upon 16 clusters. In a perfect World, this would represent my 16 2nd great-grandparents. I have already shown that theoretical perfect numbers are not showing up in my case, so I don’t see a lot of purpose in getting a perfect 4, 8 and 16 clusters.

Here I have pointed out my maternal side. They only match with the first two Clusters. That means that the following 14 Clusters appear to be paternal.  The largest Cluster is #6. That is the one with a lot of my second cousins.

Here are my guesses for these 16 Clusters:

I had this previously as possibly Hartley English due to someone with a Heaton in their ancestry. Heaton is a name that was in the area where my Hartley ancestors came from. I had that one of my Hartley ancestors possibly married a Heaton. However, I had this wife dying before they had children. Based on others in the group, I would go back to saying that this is probably a Colonial Massachusetts Cluster.

Cluster 2

I would be interested in knowing about Cluster 2. One of the matches in this Cluster was part of a New Ancestor Discovery at Ancestry that I never figured out. One match has a tree, so I could try building that out. My guess is that this Cluster is along the lines of my Irish ancestors.

I don’t have a lot of hope in figuring out this line, but I’ll give it a shot:

John McLean goes back to Ireland, so that is where I was trying to get. Going out further, I get this:

The trees are going back to Scotland on many lines. I tend to put some of these lines on the Clarke/Spratt as I don’t know much about those lines except that they were from Ireland.

Back to the guesses:

  1. Snell and before Massachusetts Colonial
  2. Clarke or Spratt Ireland
  3. English Hartley ancestors?
  4. One match correlates to Cluster 7 (Hartley 2nd cousins) but one match maps to Frazer by Visual Phasing, so say Frazer side
  5. Possibly Spratt
  6. Hartley side by shared matches
  7. Snell/Bradford based on one match with common ancestor
  8. Isaac Parker/Prudence Hatch (1778)
  9. Correlated with Cluster 11;

A Cluster 9 Tree

One of the Cluster 9 matches has a tree:

I have come up with many of these names before, but the name of Reed sounds familiar. Here is the detail on Alexander Reed:

Here is Hastings:

Here is the Reid I have:

Apparently William Wynn Fraser marries a Rachel Reid. My guess was that Reid was her married name. However, this family lived in Kenilsworth, Ontario:

I’m not sure if the Reid and Reed families are the same or whether there is any connection with my family. A search for Alexander Reid/Reed shows that there were many by that name living in Ontario.

Cluster 14

I joined the Shared Cluster Facebook Group. It looks like this Cluster is actually more than one Cluster.

Because my Mom, her niece and two grand-nieces are in this Cluster, it formed a super Cluster. I’ll call them 14a, 14b and 14c.

  • 14a Nicholson
  • 14b Rathfelder
  • 14c Lentz

Rather than look at each Cluster in detail, here is a summary:

I skipped a few Clusters. This exercise reinforces my thought that getting the exact 16 clusters for 16 2nd great-grandparents is not important. I had 16 Clusters but only 2 were maternal. That means that 14 were paternal, far in excess of the 8 paternal 2nd great-grandparents expected. Cluster 16 was maternal and most likely my maternal grandfather’s side. I haven’t placed this group yet. They seem to go back to a German Colony in Russia which was a long way from my grandfather’s family’s German Colony in Latvia. There was some connection between the two colonies, but I haven’t made the connection genealogically with my family.

25 cM Cutoff – 27 Clusters

This is 5 cM above the cutoff that Ancestry uses for 4th cousin, which is equivalent to a 3rd great-grandparent common ancestor. I expect that a 25 cM cutoff should be equivalent to 4th cousin.

Here is the general look of the clusters:

I am in a vertical and horizontal group that splits the chart about equally in two. My mother and her close relatives form a lop-sided plus sign in the lower right side of the chart.

Clusters 1 and 2

These two clusters hold a lot of potential. These were previously Cluster 15 and I had assigned them to my ancestor Fanny McMaster. Now that Cluster 15 has broken into two, it appears that each cluster could represent one of Fanny’s parents, who were William McMaster and Margaret Frazer. I have recently learned a lot about this family through researching their move to Ontario from Ireland. Two of the people in the new Cluster 2 share my common ancestors William McMaster and Margaret Frazer. If I could identify Cluster 1, it should help to identify Cluster 2. I know that one of the matches in Cluster 1 has an unidentified Jane Frazer or Frazier in her tree. That means that Cluster 1 could be Frazer and Cluster 2 McMaster. This is important as I have at least three Frazers in my ancestry and at least two McMasters.

To accommodate this, I have lengthened my ancestor chart down to the 4th great-grandparent level:

This would be a theory to follow up on based on the fact that a match in Cluster 1 has a Frazer ancestor but no known McMaster ancestry.

Cluster 3

There are only three people in Cluster 3. Based on correspondence from someone with a private tree, our common ancestors are Simon Hathaway born 1711 and Hannah Clifton. That is two generations back from the extension I made on my cluster summary chart, so I’ll just add Cluster three to my Hathaway 4th great-grandparent.

Cluster 4

Cluster 4 brings into question my previous Parker Cluster. I had a match with at least one person in this cluster with a common ancestor going back to our shared Parker ancestor in Nantucket. However, now there are two others in this cluster. One has an ancestor in County Roscommon where I had ancestors. Another person is from Australia. Now my match with the Parker ancestor also has an Irish ancestor. Perhaps this is the real match I should be looking at?

Cluster 5 – Spratt

In my 30 cM analysis Cluster 5 was also Spratt coincidentally. However, this new Cluster 5 goes back another generation and has split off the Clarke from the Spratt:

The new Cluster 5 at the 25 cM threshold has moved from my 2nd great-grandparent level (Jane Spratt, born about 1830) to my 3rd great-grandparent level. This is important as Spratt is my most severe brick wall.

Triangulating Spratt Trees in Cluster 5

My thought is that if I can find common ancestors in some of the trees represented by Cluster 5, I may find my common ancestors. First in order to not duplicate effort, I checked to see if I had an existing Spratt Tree. I did:

Unfortunately, I don’t remember who Ed, Deb and Helena are. I do note with interest a George Spratt who married a Jane McGuire. Could they be the parents of my Jane Spratt thought to be born about 1830? William and Christopher are also potential candidates.

My first match in Cluster 5 is Craig. I’ll add him to the tree:

Craig matches me with a healthy 33.9 cM of DNA. One question would be whether Christopher was married previous to marrying Margaret McKay.

Next in Cluster 5 is Deb. She is already on my chart and matches me with 34.1 cM of DNA. The last person in my Cluster 5 with a tree is Helena who again is already on my tree. She matches me at 25.2 cM.

This leads me to two theories:

  • I descend from Christopher Spratt and a first wife, or:
  • I descend from William Spratt born 1775 and then from one of his sons

I now see Ed and match him at 44.8 cM.

Here is another Cluster 5 Tree:

I’ll call this person Shar. She must be on the Margery Spratt Line:

The tree is now shaping up with DNA matches. Shar’s tree ended with Jane, but I assumed it was the same Jane Hayes that was in Helena’s tree. The good news is that I have the start of a good Spratt DNA project. The bad news is, I’m not much closer to knowing where Jane came from. It’s interesting how clearly this Cluster points to this genealogy, yet I don’t have the specifics. I’m slowly getting closer to the answer.

Clusters 6-9 – Irish, But Which Families?

I’ll start with Cluster 9 as Gladys is in that Cluster. I manage her DNA:

From what I can tell, James at the top married his cousin Violet Frazer. I could safely assign this Cluster to George W Frazer as Gladys has no known McMaster ancestry. I would like to go back at least another generation, but at this time, I can’t match up the genealogy of my other matches in this Cluster.

I don’t have a good guess for the other clusters other than possibly on the Clarke side.

Cluster 11 – Schwechheimer

Through hard work and diligence, I came up with a common ancestor for one of my three matches in Cluster 11:

However, this gets confusing. My ancestor Rosine Schwechheimer married a Gangnus. Also, Rosine’s mother was a Gangnus. Technically, the common ancestor would be further out, but it is safe to say that the line on my side went through Rosine Schwechheimer.

Cluster 13 – Clarke

I know that I have a Clarke/Spratt common ancestor with two matches in this Cluster. I see another match with a person in this cluster but Patricia has a private tree. She has uploaded to Gedmatch:

Cluster 14 – Snell?

There are only three in this Cluster. One match has a tree that goes to Hannah Snell. She is probably the granddaughter of my ancestor Samuel Snell born 1708. I’ll stick this Cluster with a later Snell ancestor because I don’t want to extend my list too far:

This Anthony is Samuel’s grandson, so technically, I should have gone back another generation.

Cluster 15 Hartley English Side

This is a side I am interested in if it is Hartley English. There are three in the Cluster. I have looked at one tree with no luck. Perhaps looking at a second tree will help. The matchup seemed like it should be on Mark’s maternal side:

Here is the tree from the other person in Cluster 15:

Cluster 16 has only three also. The one person in Cluster 15 without a tree had a connection to Cluster 16.

Clusters 17 and 18

Cluster 17 is picking up in size which may mean my Snell side which has the Massachusetts background. I can’t find many good trees in this Cluster. Cluster 18 is large. Despite the size, I couldn’t find common ancestors and Ancestry didn’t suggest any.

Cluster 19

This is the Cluster I am in as well as my siblings, close relatives and second cousins. Two matches in the group have the common ancestors Snell and Bradford. One match has Greenwood Hartley and Ann Emmet. That means that this Cluster should be two Clusters.

These show in the same Cluster due to all my close relatives in this Cluster. I would split Cluster 19 like this:

The grey horizontally highlighted row is the Greenwood Hartley match. This is an important distinction for me as one side represents my English Hartley side and the other side represents my Colonial Massachusetts Snell side.

Clusters 20-27

  • 20 – probably MA Colonial
  • 21 – probably Irish
  • 22 – probably maternal grandfather
  • 23 – maternal grandfather. Some match my maternal cousin but not my mother, so that seems odd.
  • 24 – more maternal grandfather
  • 25 – This is a compound cluster. 25a is Lentz. 25b is Rathfelder. This was previously 14a, b, and c, so the Nicholson cluster broke off into Cluster 26 below
  • 26 – Nicholson
  • 27 – probably Irish

Summary of the 25 cM Clusters

Some of the splits of known clusters are interesting, as they suggest descent from a specific older ancestor. This was the case with my ancestor Fanny McMaster, where I was able to split out matches between her parents William McMaster and Margaret Frazer. Where I didn’t know the previous cluster, it just split out into other clusters that I didn’t know.

The Parker Cluster was confusing. I had a common ancestor for two of the matches, but two other matches seemed to indicate that they didn’t have the same common matches. This could be the case where they match each other on a different line.

When I put the clusters into my summary chart, I am putting them in vertically. However, it is important to check horizontally also to make sure the clusters are being picked up. I also looked into some genealogy. I filled out a shared DNA Spratt tree. I don’t know where I fit in this tree, but I am all the more certain that I do fit into this particular tree, so that narrows down where I should be looking for genealogical clues.

It seems I need a better way of presenting the results of the clusters. Right now the results are very spread out due to the increasing numbers of ancestors. It would be possible to collapse these results to include only the ancestors with clusters, but that would omit all the ancestors that I don’t have clusters for.

20 cM – 50 Clusters

At the risk of making this a marathon Blog, I’ll look at my 50 Clusters down to 20 cM. This is the matching limit for AncestryDNA. Apparently this program can take the level lower, but the shared matching limit will still be at 20 cM. I expect some more of the same of what I found out above.

I see a problem already with Cluster 1. All the levels are below 25 cM. That makes it difficult to place this Cluster. One person in the Cluster has a tree of 5:

It may be possible to build this out, but it would be a low priority for me to do this right now. I don’t see this person on my mother’s match list, so I suspect this is a paternal match.

Cluster 2 has only four in it. Two are between 25 and 30 cM, but they did not form a Cluster under my 25 cM analysis.

Cluster 3 matches are all under 25 cM, but match my mother.
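The check against my mother’s match list is simple enough to write down as a rule of thumb. The sketch below is an illustration only, with a made-up function name and placeholder match names. It is also only a guess in either direction, since a small match can be missing from a parent’s list.

```python
def likely_side(cluster_members, mothers_matches):
    """Guess which side of the tree a cluster belongs to.

    Both arguments are sets of match names. The guess is only as good as
    the match lists: a small match can be missing from a parent's list.
    """
    on_mothers_list = cluster_members & mothers_matches
    if not on_mothers_list:
        return "likely paternal"
    if on_mothers_list == cluster_members:
        return "likely maternal"
    return "mixed"

# Placeholder names, not real matches.
mothers_matches = {"match_c", "match_d", "match_e"}

print(likely_side({"match_a", "match_b"}, mothers_matches))  # likely paternal
print(likely_side({"match_c", "match_d"}, mothers_matches))  # likely maternal
print(likely_side({"match_a", "match_c"}, mothers_matches))  # mixed
```

A "mixed" result is a hint that the cluster may really be more than one cluster, or that some of the matches are related on both sides.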

Clusters 6 and 7

The program split 6 and 7 strangely. Two of my sisters are in #6 and one in #7. My son is in Cluster 6 and my daughter in Cluster 7. What is more important is the splitting of Cluster 7:

This splitting is important to me as I am trying to find English Hartley ancestors who don’t have Snell ancestry. The larger part of Cluster 7 has Snell ancestry (outlined in green).

More Detail on Cluster 7b

There are 8 people in Cluster 7b. It also looks like 7b forms two clusters. My guess is that this represents Hartley and Emmet:

The first match in the Cluster is Kristen. I think we have been in touch, but I can’t find any Ancestry messages. Here is the connection:

The second on the list is Mark. I’ve been building out the part of his tree where I think there is a possibility we might match up. That is his maternal grandfather’s side:

Lucy Priestly died in Hull, but was born in Halifax which is a bit closer to where my ancestors lived.

Lucy’s mother Sarah Ann Wilson was the one born in the Halifax area. Here is Sarah Ann’s baptismal record from 1825:

My guess is that her mother could have been Susannah? Her father was a bookbinder. I didn’t make a genealogical connection between myself and Mark yet, but I will likely come back to his tree.

The next match is Arlene. She doesn’t have a tree, but I sent her a message.

The next match with Howard appears to be important:

Even though Howard doesn’t have a tree, it appears that he may descend from my Pilling ancestor:

I guess I hadn’t realized that two separate Wilkinson lines descended from Pilling. At any rate, my guess is that Howard descends from one of these two lines. I believe that on the right, next to Richard should be a Paul also. I don’t match Paul but some of my relatives do. As far as I know, the David Watson above isn’t closely related to William Wilkinson.

Another question I have for the above Cluster is whether Bessey should be included in the Cluster. I would guess not, because I have that Bessey’s ancestors are Snell and Bradford. Also Bessey is linked to Clusters 12 and 15.

A further point to consider is that Arlene and Howard appear to be in both sub-clusters above. Assuming that Howard is a Pilling match, that may mean that both sub-clusters are Pilling clusters. That could mean that one sub-cluster is more for Mary Pilling’s mother and the other for Mary’s father. However, that is just a guess. Mary’s parents were Greenwood Pilling and Nancy Shackleton:

Dave, Bruce, Mark and Michael

Dave and Michael have trees. I’ve been working on these trees, but haven’t found the connection yet. However, I see connections in the Greenwood surname. I haven’t found a Greenwood surname in my ancestry, but it may be there. Mary Pilling’s father was Greenwood Pilling. Mary’s son was Greenwood Pilling. Many of these genealogies seem to have West Riding connections but not to bordering Lancashire where my ancestors lived.

Summary and Conclusions

This could be a good place to stop. I want to continue this Blog as I have come up with a better way to present my results.

  • Walking the clusters back is a good way to look at your clusters.
  • This is a way of organizing your clusters, making sure you have contacted the important matches and making sure the clusters are placed in the right area of your genealogy.
  • I started my clusters with a 50 cM limit. From there I went to a 40 cM limit and went down by 5 cM increments until I got to 20 cM.
  • The clusters did a good job at identifying my most recent brick wall, Jane Spratt born about 1830 in Ireland. From there I was able to place Jane in the correct Spratt tree, though I could not tell for sure which branch she was from. This could further direct genealogical research.
  • I tried to connect other genealogies from other clusters with limited success.
  • I came to the realization through this analysis that I have DNA matches with two separate Wilkinson lines descending from my ancestor Mary Pilling.
  • As I walked these clusters back, some split cleanly into two parental clusters and some didn’t. Some unknown clusters split into further unknown clusters, as might be expected.
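For reference, the cluster counts from each run in this post can be lined up to show how quickly new clusters appear as the threshold drops. This is just a small tabulation in Python, and the function name is made up for the example. The counts are the ones given in the section headings above.

```python
# Cluster counts reported in this post for each lowest-cM setting.
clusters_by_threshold = {50: 3, 40: 6, 35: 10, 30: 16, 25: 27, 20: 50}

def new_clusters_per_step(counts):
    """Return (from_cM, to_cM, clusters_added) for each step down in threshold."""
    thresholds = sorted(counts, reverse=True)
    return [
        (high, low, counts[low] - counts[high])
        for high, low in zip(thresholds, thresholds[1:])
    ]

for high, low, added in new_clusters_per_step(clusters_by_threshold):
    print(f"{high} -> {low} cM: {added} more clusters")
```

The last 5 cM step, from 25 cM to 20 cM, adds almost as many clusters as all of the earlier steps put together.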

To be continued….

2 Replies to “Walking My Clusters Back – Jim Bartlett Method”

  1. Joel,
    A great blog post! Thanks – here is Clustering in Action! And in a relatively short time giving you new, useful, information.
    I too have found that the Clusters often don’t obviously separate on parents; but often it becomes clearer when new Clusters with higher cousinship Matches come into play.
    Some points:
    The 20cM cutoff at AncestryDNA includes many 5C and some 6C (and not all of our 4C). Random DNA means cousins with wide ranges of cM.
    You can “Export” your .txt file to Excel (see the TAB) – very handy
    I impute the unknown Matches in a Cluster to the same ancestor as the known Matches in the Cluster – and follow them, also, to the next Clusters. I follow them by typing a Tag at the beginning of the Note field – then use the “Upload Notes” TAB to update the Notes box at AncestryDNA, where it is then available and used in the next Cluster run.
    It’s OK to delete rows (and/or columns) in any Cluster spreadsheet – Jonathan confirms that it doesn’t change anything – and it might clear up the view somewhat.
    Using a download from DNAGedcom Client of my 5,713 Matches at FTDNA (with all of their shared matches), I was able to use that file as the “download” file at Shared Clustering – at 6cM threshold I got 370 Clusters which included some 9th cousin Matches. And the approx 5% false positives showed up as singletons (no shares)

    Looking forward to your next Clustering post. Jim Bartlett

  2. Nice writeup.

    Now that you’ve gone down to 20 cM, try one more cluster diagram down to 6 cM. You won’t get more clusters for adding the matches below 20 cM, but you’ll get more matches added to the existing clusters. More matches sometimes come with more trees. The trees from matches under 20 cM can help a lot when trying to understand the clusters generated from matches over 20 cM.
