BigY Update On R1a Frazers

The Frazers originating from North Roscommon, Ireland are R1a in YDNA terms. That makes them a bit of an oddball compared to other Frazers. Most other Frazers spell their name Fraser and are R1b. Our Frazer branch is L664 under R1a. That group of people lived around the North Sea according to the L664 YDNA Project administrator.

That means that at some time our Frazer ancestors probably moved from the Netherlands or Germany up to Denmark or Norway and then over to Scotland. Or they may have gone directly to Scotland or up through England. We don’t know. We do know that this probably happened before the time when surnames were used. Once in the area of present day Scotland, they mixed with the earlier Britons who were R1b. Perhaps this is the area where they lived when they took on the Fraser/Frazer name:

The map above shows Fraser, Chisolm, Grant and Stewart. All these names have been found to be related to Frazer by YDNA. Hayes is also related by YDNA, but I think Hayes may actually be a Grant around the year 1600 or after. Here is a closeup of the Fraser Lands in 1587, showing proximity to the Chisolm and Grant Lands:

Stewart Update

In my previous Blog on BigY, I had drawn a STR tree without Stewart. Here is the new one with him included:

Stewart/Stuart is in red above. He is important, because his STR signature is the same as the common ancestor for Grant, Hayes and Stewart. If I had room, I would draw another line to the bottom of the page with Stewart showing no STR changes. Here is Stewart added to the SNP Tree:

The Stewart on the chart has expressed interest in BigY testing, so there should be more updates to come.

Grant Update

I was pleasantly surprised to see the results of a recent Grant BigY test. In the SNP tree above, the bolded names have taken the BigY, so I will need to update Grant. In my STR tree, I had two Grants. The one that took the BigY test had his most distant ancestor as:

James GRANT “of Carron”, 1728 – 1790

Here is my STR update for Grant of Carron. All I did was make it more clear which Grant was which:

Grant of Carron BigY

The Grant BigY test threw me off a bit as the results showed that he was one SNP away from Paul and Jonathan. Usually, I am looking for a zero SNP difference. Grant of Carron shows a L1012 SNP that Paul and Jonathan do not have. Unfortunately, I don’t know why that is the case. Also I don’t know much about the L1012 SNP. It could be that the L1012 SNP was tested in error, or that Paul and Jonathan should have that SNP or that the L1012 SNP is branching below the green box where I have Grant on my SNP tree. The last option does not seem likely as I don’t have named SNPs in the green box, so there shouldn’t be named SNPs below the box.

Grant matched Paul and Jonathan on Variant 23614618. However, Hayes did not match on that variant. That could lead to this tree:

This change pointed out an earlier mistake I had made. I had 23619535 in the Archibald Line and in the orange box. I should have had 23614618 in the orange box. At any rate, that variant is now moved up to the Frazer/Grant mustard colored box. Another option would have been to move 23614618 to the green box of Hayes, Grant and Stewart. This would be assuming that Hayes should have been positive for 23614618 but had a poor test result.

All these trees are preliminary until the R1a Administrators come up with a more official tree. Another option would be to wait for the YFull analysis. However, that is dependent upon testers using their service. At any rate, it is good to have fewer SNPs in the orange box as we are bumping up against a likely Frazer date of 1690.

The final change in the SNP Tree has to do with Chisolm. We don’t have a BigY for this YDNA relative. That means I don’t know if Chisolm goes with the mustard box or the orange one. I’ll leave him with the orange right now as there are so many SNPs there.

Summary and What’s Next

  • I have added Stewart to my SNP Tree and STR Tree
  • A BigY Test for Grant pointed out a mistake I made earlier for one of the variants on my proposed tree
  • The Grant BigY results may result in a small node where the Grants and Frazers had a common ancestor.
  • Once the R1a and L664 administrators are done with their analysis, I would like to see three or four levels below the official level of R-YP432 for Frazer. These would include branching for Hayes and Grant also.
  • I’m a bit unsure of Patton. He tested positive for R-YP5515 but is missing some of the other variants that are seen in other BigY results. However, that would not make a difference in the overall structure of the SNP Tree.
  • I am looking forward to a BigY test for the Stewart/Stuart in the group.

More On Early Butlers In the US

In my previous Blog on the subject, I noted how two Cincinnati Butler families were connected by DNA. These were the George Butler family and the Edward Butler family. Edward Butler is an ancestor of my wife. Since then, with the help of Peter Butler, I have expanded the George Butler tree a little. Now it looks like this side by side with the Edward Butler tree.

Previously, I was hoping that Edward and George were brothers and that they would have the same father. However, that now does not look likely. It could still be that George and Edward were first cousins. If that is the case, that would make Lorraine, Richard and Virginia 4th cousins to Pat and 4th cousins twice removed to Uncle Naffy.

DNA Connections Between the George and Edward Butler Families

The tree above is pared down to include only those in lines that have had their DNA tested. Uncle Naffy tested at FTDNA and uploaded his results to Gedmatch. Lorraine, Richard and Virginia are also at Gedmatch, but Uncle Naffy matches only Richard and Lorraine. Here is Uncle Naffy’s match with Richard, which is the same as his match to Lorraine.

Assuming Richard and Uncle Naffy are 4th cousins twice removed, this was a fortunate match as the chance of them matching is only a little over 10%.

By comparing Lorraine, Richard and Virginia to each other and with the help of matches with Uncle Naffy and a paternal second cousin, I was able to map out the DNA for these three siblings:

Here I presumed that the Uncle Naffy match was on the Butler DNA side of my in-laws’ family. That meant that the paternal cousin’s match below had to be Kerivan as that is the only other paternal grandparent my in-laws have. Further, the paternal cousin Gaby only matched Lorraine on the left side of the pink segment, so that meant Gaby and Lorraine had to match on their Butler side DNA.

Lorraine and Virginia match Patty

Lorraine, Virginia and Patty all tested at AncestryDNA and match each other. Lorraine and Patty are predicted 4th cousins at AncestryDNA. Unfortunately, Ancestry doesn’t show on what Chromosome the match is like Gedmatch does. Virginia and Pat also show as 4th cousins. Further Pat, Lorraine and Virginia have shared matches with those on the blue line of the tree above. All of this confirms the DNA connection between the George and Edward Butler families.

Life For the Butlers in Civil War Era Cincinnati

I would not like to have lived in Cincinnati around the time of the Civil War. For one thing, there was a war going on. For another thing Cholera outbreaks were rampant. Here was a Mrs. Butler that died of cholera in 1866:

This could have been George’s first wife Mary Whitty – except the address seems off. At this time, people didn’t understand that cholera was the result of drinking contaminated water. Around then, there was a George Butler, laborer, listed in the Cincinnati Directories as living at 890 East Front Street. Perhaps around here:

The 17 on this 1869 map is for Ward 17 where George Butler lived in 1860 and 1870. My research friend Peter was able to obtain a copy of George Butler’s second marriage to Margaret Sinnott.

I have the greatest sympathy for the transcriber who wrote down Surwott for Margaret’s maiden name. The marriage was on November 11, 1866 at All Saints Roman Catholic Church in the Fulton area of Cincinnati. I’m not sure where Fulton is, but there is a Fulton Avenue in the map above. Apparently Fulton was a Town in the area that got incorporated into Cincinnati around the 1840’s.

Edward Butler and Family

According to the 1860 Census, Edward also lived in Ward 17. The Cincinnati Directory of 1860 lists a laborer named Edward Butler living at the c. (corner?) of Goodloe and Leatherbury. I was interested in this location because during the same year there was a listing for George who was also a laborer b. (boards?) Reed and Leatherbury. Here is the 1869 Ward 17 map again:

Here Leatherbury is spelled Litherbury for some reason. The Street above “Continued” is East Front. The Street below “Continued” may be Goodloe. For some reason, it gives me pleasure to figure out where ancestors lived. In this case, my wife’s ancestor Edward Butler and his likely cousin George Butler.

Edward: 17th Ward to the 3rd Ward

For some reason, Edward Butler and family moved to the 3rd Ward where they are listed in the 1870 Census. When I was looking at the Ward 3 map I found the All Saints Church. It looks like the Church also had a school.

It is near the T and L of LYTLE’S in the bottom right of the map above. The Church appears to be in Ward 1 and Whittaker’s in Ward 3. Here is how Wards 1, 3 and 17 connect:

Edward Butlers in the Cincinnati Directory

There appears to be more than one Edward Butler in Cincinnati at the time. Here are some of my listings from 1859 to 1869:

The most consistent listing is for 66 Avery, but I don’t think that is our Edward. I mentioned that I liked the 1860 listing of Goodloe and Leatherbury. Then in 1862, 928 R. Front looks good. After that, in 1865, Front and Whitaker looks good. That location is on the Ward 3 Map above. That listing matches up with his Civil War service that I have elsewhere. Here are some more listings from 1870 to 1876:

I had forgotten that I had ruled out Avery in the past as I have that Edward had a son George who was believed to be born in Chicago in 1873. Here the 1870 listing of e. 3rd is a possibility. The southern half of East 3rd is in Ward 3 which is consistent with Edward Butler’s 1870 Census listing. High Street in the Ward 3 map above is also 3rd.

The takeaway story could be that Edward lived near his cousin George in Ward 17 when he first moved from St. John, New Brunswick to Cincinnati. He moved to nearby Ward 3 to work for the Navy at the end of the Civil War. He stayed in Ward 3 until moving out of Cincinnati. This move was probably around 1870 as his son George was believed to be born in Chicago in June 1873.

Edward Butler Family 1880 Census

That leads me to the 1880 Census. I had found this Census a while ago and have gone back and forth as to whether it is my wife’s Edward Butler family or not. There is a lot right and a lot wrong with the Census record.

I’ll look at each thing that appears wrong:

  • The mother is listed as Ellen rather than Mary. However, I have her as Mary E. Crowley. Mary’s mother was Ellen, which is likely also Mary’s middle name.
  • Ellen is listed as widowed which I don’t believe she is. She is also listed as what appears to be wife, though possibly transcribed as ‘self’.
  • Ellen is listed as being born in Illinois. However, her parents are shown as being born in Ireland which would be correct.
  • Edward Butler is not listed. Perhaps he is traveling or working away from home?
  • I have no record of Cornelius but he may have married in the area or died. He would have been born around 1871, so this agrees with the apparent early move date to Illinois for the family.
  • Henry born in 1875 could be Edward Henry
  • I have no other record of John being born in 1879, so he may have died young or stayed in the area.

This means that I am convincing myself that this is a valid document. I notice that other related Butler researchers have used this Census as a reference in their Family Trees. This does not place the family in Chicago, but at least they are in Illinois.

Milton, Illinois

Here is a Google map of Milton:

Milton is a lot closer to St. Louis than it is to Chicago. According to Wikipedia, Milton is in Pike County.

1920 Census: A nail in the coffin for Milton

On the other hand, there is the 1920 Census. This shows that we had the wrong family in Milton in 1880:

Here we have the same Ellen, Cornelius, George and John. However, this cannot be my wife’s family as the mother Mary (Ellen?) was dead by now and George and Edward Henry were living in Massachusetts. However, that is helpful as there is no need to further pursue Milton, Illinois. We still need to find the family in 1880.

Where Was the Edward Butler Family in 1880?

I have been looking for census records for quite some time. I have basically lost track of the family between 1870 and 1890. I have the Cincinnati Census of 1870. I have indirect evidence that they were in Chicago for the birth of George Butler in 1873 and Edward Butler in 1875, but no direct records of the family being in Chicago. Here is the Cincinnati Edward Butler family in 1870.

Here is Edward Butler 20 years later in Massachusetts.

This is from FamilySearch. The top says Newton. However, the bottom of the handwritten schedule says Newtonville. The transcription on the bottom says Watertown. All these places are very close.

The search continues.

 

Comparing Frazer Big Y Tree With STR Trees

Recently, I have written some Blogs on Frazer BigY results. Here is the most recent BigY Blog. My cousin Paul’s results are in and Jonathan’s results are in. These two people represent the major Frazer lines from North Roscommon, Ireland in the early 1700’s. Maurice Gleeson was one of the first people to compare BigY results and STR results. His video on the subject is here:

Building a Family Tree with SNPs, STRs, & Named People (Maurice Gleeson)

BigY Frazer Results: Looking Into the Future

I have built a tree based on the initial two Frazer BigY results. I call this looking into the future as the variants shown as just numbers below, will be the future SNPs which people will test to find out what branch of the YDNA tree they are in. Here is the SNP tree I have so far:

This is a compressed zig zag tree to save space. The tree is with the reference of the Frazers as those are the tests I’m familiar with. This doesn’t mean that Frazer descended from Hayes who descended from Patton. Patton and Hayes should have their own branches descending down also. This tree means that at the Hayes level, Frazer and Hayes shared the same ancestor (and variants). Likewise, at the R-YP5515 level, Patton, Hayes and Frazer all shared the same common ancestor in the quite distant past.

STR Trees: What About the Grants?

My distant cousin on the James Line of the Frazers wondered what happened to the Grants after we did the BigY test. She wondered because the Grant name was the one that came up quite consistently as a Frazer STR match. Well, I don’t think that the Grants that matched Jonathan have taken the BigY test, so they didn’t show up there. However, the closest non-Frazer match in the BigY test was a Hayes. Here is a first shot at a Frazer/Grant/Hayes STR Tree with dates:

The idea behind making a STR Tree is to find the common STR values. These become the ancestral STRs at the top of the tree. Then find the fewest changes going down to create a tree. Finally, make a guess as to the dates. At the 67 STR level, I think there is a chance of a new STR every 150 years or so. However, this varies. Also, as in the SNP tree above, I know that the common ancestor between Paul and Jonathan is about 260 years ago. This STR tree should correspond roughly with the SNP Tree up to where the Hayes come into the picture. That means the 700 year guess for my STR tree corresponds with the SNP tree of 260-760 years plus 348-900 years or 608-1660 years. What the second tree does is to help calibrate the dates. As the SNPs are more set in stone than the STRs, the SNP tree also sets the structure for the STR tree. The STR tree has to follow the SNP tree.
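The date comparison above is just an addition of two ranges, which can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the segment labels are my own assumptions, and the only numbers used are the ones quoted in the paragraph above.

```python
def add_ranges(first, second):
    """Add two (low, high) year ranges that sit end to end on a tree."""
    return (first[0] + second[0], first[1] + second[1])

# The two SNP-tree ranges quoted in the text (labels are illustrative only).
first_segment = (260, 760)
second_segment = (348, 900)

snp_range = add_ranges(first_segment, second_segment)
str_guess = 700  # the STR tree guess in years

print(snp_range)                                   # (608, 1660)
print(snp_range[0] <= str_guess <= snp_range[1])   # True
```

The 700 year STR guess lands inside the combined SNP range, though near the low end of it.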

The STR tree also points out that Paul and Jonathan should be equally related to Grant1, Hayes and Grant2. That is because, if the tree is drawn correctly, they all have the same Frazer/Grant/Hayes ancestor. This is despite the fact that Grant1, Hayes and Grant2 have different genetic distances to Paul and Jonathan. This is also assuming that they all have about the same number of generations to the common ancestor.

The other thing that the STR tree shows is that Hayes should be more closely related to Grant than to the Frazer family.

On the Chisolm Trail

Now that I see that the SNP tree supported the Frazer/Grant/Hayes STR tree, I will add Chisolm to the STR Tree. Two names that are on Paul and Jonathan’s STR match list are Chisolm and Stuart. I had looked at Stuart before and the Stuart STRs seem to fall in line with Grant and Hayes. However, after my first look at the Chisolm STRs, it appears that Chisolm is more aligned with the Frazers.

Chisolm STRs

Here are some of the Chisolm STRs at the Chisolm YDNA Project page:

The first line is the Chisolm mode. The mode is the most commonly occurring STR value. The next four lines are R1a Chisolms. The Chisolm that matches the Frazers is on the bottom line. Note that any of the highlighted STRs indicate a variation from the mode. That means that this Chisolm is not a very good match to the other Chisolms. Here are some of the Chisum/Chisolm STRs on the bottom row compared to Frazers, Grants and a Stuart:

Most notably, Chisum is aligning with Frazer at position 389b = 30 and 534 = 14 rather than with Grant, Hayes or Stuart. This appears to be leaving 447 = 24 as a signature Frazer STR.
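For anyone who wants to try the mode comparison on their own results, here is a rough Python sketch of the idea: take the most common value at each marker as the mode, then list where one tester differs from it. The marker values for the first three testers are made up for illustration and are not real project data. Only the last row uses values mentioned in this post (389b = 30, 534 = 14, 576 = 19).

```python
from collections import Counter

def modal_haplotype(haplotypes):
    """Return the most common value at each marker.
    On a tie, the value seen first wins."""
    markers = haplotypes[0].keys()
    return {m: Counter(h[m] for h in haplotypes).most_common(1)[0][0]
            for m in markers}

def off_modal(haplotype, mode):
    """Return only the markers where this tester differs from the mode."""
    return {m: v for m, v in haplotype.items() if v != mode[m]}

# Hypothetical testers; only the last row uses values quoted in the post.
testers = [
    {"389b": 29, "534": 15, "576": 18},
    {"389b": 29, "534": 15, "576": 18},
    {"389b": 29, "534": 15, "576": 17},
    {"389b": 30, "534": 14, "576": 19},
]

mode = modal_haplotype(testers)
print(mode)
print(off_modal(testers[3], mode))
```

A tester who is off the mode at most markers, like the last row here, is the kind of result that gets highlighted on a project page.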

New STR Tree with Chisolm

This is a bit of an odds and sods tree with four different surnames.

Paul/Chisolm Parallel Mutation

Paul and Chisolm have a parallel mutation at 576=19. This has the effect of the STR test making it look like Paul is a closer match to Chisolm than he really is. Chisolm shows up as Paul’s closest STR match after Paul’s match with his cousin Jonathan. FTDNA shows that both Paul and Chisolm have a value of 19 for STR 576. However, assuming the STR Tree is correct, Paul and Chisolm both developed that STR mutation independently. Regardless, if my STR tree is correct, then Chisolm is a closer match to Frazer than to either Grant or Hayes. I had not expected this result.

Where Do We Go From Here?

Ideally, a BigY test for Grant and Chisolm would sort things out.

Based on the STR tree, I have put in where I think Grant and Chisolm would be on the SNP tree. If Chisolm were to take the BigY test, then it would be clear which of the orange variants are Frazer variants and not Chisolm and which new variants are Chisolm and not Frazer. A BigY test by one of the Grants would also sort out the Grants and Hayes variants. By the way, a Stuart STR match should be included with Hayes and Grant on the above SNP Tree.

Summary and Observations

  • In broad strokes, a SNP change should happen at about the same rate as a STR change in a 67 marker test. This means that a SNP tree should mimic a STR tree in both shape and the rough number of mutations of both STRs and SNPs.
  • A SNP tree should be the undisputed tree when comparing SNP trees and STR trees. This is because a SNP is a one-time event. A STR mutation may be a one time event, a back mutation or a parallel mutation.
  • Comparing SNP trees and STR trees can be helpful in calibrating dates of trees. A known common ancestor date is certainly helpful also.
  • When considering dates, it is important to know when the use of surnames became common practice. One reference I read for Scotland was that the date was the 16th century. That date is interesting as my STR tree guesses at a common ancestor for Chisolm and Frazer at about 1400 A.D.
  • The same reference says that in the Highlands and northern isles of Scotland surnames did not fully take root until the year 1800. If Hayes and Grant were from the Highlands, this could explain the different surnames.
  • This late date of adoption of surnames could explain why the surnames are not matching well with the YDNA testing. A late-adopted surname would not have time to build up a head of steam or a large amount of descendants.
  • I will be looking forward to FTDNA adopting the R-YP5515 SNP. FTDNA also needs at least two more levels of SNPs. One at the Hayes/Frazer level and one at the Frazer level.

Frazer Big Y Results: Archibald Line and James Line

I have previously written Blogs on my cousin Paul’s Big Y results here and here. Paul is my 2nd cousin once removed. He is from the Archibald Line. Archibald and James are believed to be two Frazer brothers living in North Roscommon in the early 1700’s. Just yesterday, Jonathan’s Big Y results came in. Jonathan is from the James line.

Paul is two steps below Hubert on the left and Jonathan is one step below Walter on the right hand side.

What is a Big Y?

The Big Y is an expensive YDNA test that looks at SNPs. SNPs are stable locations where mutations occur on the male Y Chromosome. These mutations happen around every 150 years. They could happen more quickly or more slowly, but 150 years would be an average. Like a laser beam, these SNP mutations make a map straight down the Frazer male line heading toward the distant past. The special feature of the Big Y is that it discovers new SNPs that have not been previously discovered. These newly discovered SNPs are helpful in verifying genealogical trees – especially when taken in tandem like we did with Paul and Jonathan.

In my previous Blog, I had looked at these SNPs for my cousin Paul and came up with a tree that looked like this:

FTDNA, which does the Big Y testing, has Paul as R-YP432. They don’t yet list YP5515, which YFull has. YFull is a service that looks at Big Y and similar results for a fee. Using that information, they create YDNA trees, date the connections, and do other things. Just yesterday I sent Paul’s Big Y results to YFull for analysis.

All the numbers in the green boxes above are SNPs. The numbers with no letters are SNP positions that haven’t been named yet. The bottom green box is for Paul. He has more unique SNPs that I didn’t include in the bottom box. I would expect that out of these SNPs, Paul will share some with Jonathan and that Jonathan and Paul would have their own unique SNPs that happened since the two branches split in the early 1700’s.

Let’s Compare Paul and Jonathan’s SNPs

According to FTDNA Paul and Jonathan share 36 Novel Variants. However, many of those shared between Paul and Jonathan are not uniquely shared. In other words they would be shared with Patton or especially Hayes above the Frazers. First, I’ll add in the SNPs that were only Paul’s before Jonathan’s results came in:

I compressed the tree above to save space. There is still a Patton block of SNPs and under that a Hayes block of SNPs. The orange SNPs under Hayes were Paul’s unique SNPs before Jonathan had his Big Y results. When I compare the 36 SNPs that Paul and Jonathan share, only six of those are in the orange block above. When I separate out Paul’s newly unique SNPs, I get the Archibald Line:

The brown box labelled Archibald Line is Paul’s version of the Archibald Line. If others were to do this test in the Archibald line, there would be some shared and some unique SNPs again. Those SNPs would represent the different branches in the Archibald Line. The orange box shows all the SNPs that are shared by the Frazers in the DNA Project. These SNPs represent the father of the Archibald and James Lines who was probably another Archibald. Note that Paul has 5 mutations since the lines split. That would be more than expected. If we use the average of 150 years, that would put the common Frazer ancestor at 750 years ago. As we believe that the common ancestor was born around 1690, about 260 years before Paul, there must have been a mutation in Paul’s line about every 52 years or every other generation. I am guessing that there will be fewer mutations on Jonathan’s James Line side.

Jonathan’s SNPs

I’m curious to see how these come out. Jonathan has 28 Novel Variants (the same number that Paul now has). From what I can tell, FTDNA calls the unnamed SNPs Novel Variants. Here is my spreadsheet showing the overlaps and unique SNPs between Jonathan and Paul:

Paul’s 5 unique SNPs are shown in blue. Jonathan’s 5 unique SNPs from Paul are shown in yellow. However, I have a note. The note is that Hayes shares 9510807 with Jonathan. Hayes is upstream from the Frazers SNPs. That means that Paul should have also had 9510807. That means that Jonathan has 4 unique SNPs compared to Paul.
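The bookkeeping here can be sketched with Python sets. The labels S1, P1, J1 and so on are placeholders I made up to stand in for the real variant positions on the spreadsheet. The only real position used is 9510807, and the function name is also just for illustration. The idea is that a variant an upstream tester (Hayes) also has cannot be private to one Frazer line, so it is taken out of the unique count.

```python
def compare_variants(a, b, upstream=frozenset()):
    """Split two testers' variants into shared, only-a and only-b.
    Variants also seen in an upstream tester are not counted as unique."""
    shared = a & b
    only_a = a - b - upstream
    only_b = b - a - upstream
    return shared, only_a, only_b

# Placeholder labels, except for the one position named in the text.
paul = {"S1", "S2", "P1", "P2", "P3", "P4", "P5"}
jonathan = {"S1", "S2", "J1", "J2", "J3", "J4", "9510807"}
hayes = {"9510807"}

shared, paul_only, jonathan_only = compare_variants(paul, jonathan, upstream=hayes)
print(len(paul_only))      # 5
print(len(jonathan_only))  # 4
```

Without the upstream check, Jonathan would show 5 variants that Paul lacks. With it, 9510807 drops out and he is left with 4.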

Now For the Complete Frazer Y SNP Tree

I put the SNP that Jonathan had in common with Hayes up in the Hayes Block with an asterisk. That is the SNP that Paul should have had but didn’t test positive for.

A Problem With Dating the Frazer Common Ancestor

Let’s assume that the common Frazer Ancestor, the parent of Archibald and James was born in 1690. Let’s further assume that Paul and Jonathan were born in 1950. That leaves 260 years. I will double that for the two lines and divide by the total number of unique SNP which is 9. That gives me roughly 58 years per mutation. That seems to push down the rough estimate of 150 years per mutation quite a bit.

I do get a little consolation in the fact that if our genealogy is right, Paul is 8 generations from the Frazer common ancestor and Jonathan is 7 generations away. That means that Paul’s line had one more generation to form an extra SNP compared to Jonathan – which he apparently did.

Let’s assume that 150 years per mutation is correct. That would mean that the common Frazer ancestor would be 6-700 years ago. To me, this seems unlikely. We have two male Frazers living in North Roscommon in the early 1700’s. We also have a documented Frazer widow, believed to be the mother. Family tradition has the father of Archibald and James as an Archibald born around 1690. Also we have autosomal DNA matches between the Archibald and James Lines. These have not been proven to be linked to the Frazer common ancestor, but seem likely.
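The two calculations in this section can be written out as a short Python sketch. The function names are my own, and the inputs are the assumptions already stated above: an ancestor born in 1690, testers born in 1950, two lines, and 9 unique SNPs between them.

```python
def years_per_mutation(span_years, n_lines, total_mutations):
    """Average years between mutations, pooling the lines together."""
    return span_years * n_lines / total_mutations

def span_for_rate(rate_years, n_lines, total_mutations):
    """Years back to the common ancestor implied by a given mutation rate."""
    return rate_years * total_mutations / n_lines

span = 1950 - 1690  # 260 years from the assumed ancestor to the testers

observed_rate = years_per_mutation(span, n_lines=2, total_mutations=9)
implied_span = span_for_rate(150, n_lines=2, total_mutations=9)

print(round(observed_rate, 1))  # 57.8
print(implied_span)             # 675.0
```

The second number is where the 6-700 year figure comes from. Using one line and Paul’s 5 unique SNPs, the same first function gives 52 years per mutation.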

It figures that this Big Y test created additional questions! We will have to await more analysis from YFull and the R1a YDNA Project Administrators. Here is one more try at adding dates using the 58 years per mutation versus the 150 years per mutation:

Oddly enough, this makes me feel better. The reason is that, even with 150 years per SNP, I am getting up to 4200 years ago up at the YP432 Level. This is more than the 2800 years that YFull currently has for a most likely time to a common ancestor at YP432.

Summary

  • The Big Y test for Paul and Jonathan resulted in more unique Variants than expected for both Paul and Jonathan
  • Using average years per SNP mutation, this would push back the common ancestor for the James and Archibald lines quite a way into the past.
  • Future analysis may resolve this issue. YFull will be one company analyzing the Frazer Big Y test. I will also ask for advice from others.
  • There is one other Frazer from Canada who is expecting YDNA STR results. These results may also help
  • Once the James Line and Archibald Line SNPs are named and tests developed for those SNPs, male line Frazer descendants will be able to determine their Line by testing the new SNPs. Certain SNPs could also define sub-branches below the Archibald and James Lines.

 

Two Cincinnati Butler Families

One of my Butler genealogy breakthroughs happened with a DNA match between my father in law and someone I called Uncle Naffy. I wrote a Blog on that in 2015.

Prior to that breakthrough, I had trouble tracking my wife’s immigrant ancestor Edward Butler. Uncle Naffy was from St. John, New Brunswick and told me his great great grandmother was Mary A Butler. She was living in Cincinnati and moved to St. John. There she married. Armed with that information, I was able to find the marriage record between my wife’s ancestors, Edward Butler and Mary Crowley in St. John. The record was found in scrawly handwriting on a microfilm that was in the New England Historic Genealogical Society Library in Boston.

This St. John/Cincinnati connection confirmed the research that I had done that had located Edward Butler and family in Cincinnati in the 1860 and 1870 censuses.

Uncle Naffy’s Great Great Grandmother Mary A Butler

Recently it occurred to me that it would be a good idea to create a tree for Mary A Butler to see if we could match up the two Cincinnati Butler families (George and Edward).

This was my first attempt. As I show later, the younger children of George Butler would be from a second wife. My hope was that I would find that the George Butler above was the brother of my wife’s ancestor Edward Butler. One good thing is that I have that George Butler above married Mary Whitty. Whitty is a less common name than Butler. A search for George Butler at Ancestry turned up this as a clue:

Here is a George Butler and Mary Whitty that gave birth to an Anne Butler on March 31st 1850 in the Parish of Ferns, Wexford. The good news is that the George Butler Family in 1860 in Cincinnati also had a daughter named Ann born about 1850 in Ireland. This is a good match.

A little more searching revealed a marriage between George “Butta” and Mary Whitty:

The transcriber saw Butta, but I can also see Butler there. I doubt that Butta is a very common name! As in the birth of Anne above, there is a Whitty and Hendricks as witness. The additional information is that they lived in Mountain Gate. I was curious as to where Mountain Gate is and was able to find a Mountaingate:

I have panned the map out a bit to show the relationship between Mountaingate and Mooncoin. They appear to be about 25 miles from each other. In one of my previous Blogs, I pointed out that my wife’s ancestor Edward Butler is listed as being from Wexford on the death certificate of one of his sons.

Another Wife for George Butler?

There is also a tree at Ancestry that has Margaret Sinnett as George Butler’s wife. It appears to me that Mary Whitty died sometime between 1860 and 1870 and that George remarried.

Here it is clear that Mary must be from the first marriage as she was born before 1860 when Mary Whitty was still around. I could guess that Henry would be the son of Mary Whitty as there are 7-1/2 years between him and Rebecca. However, I cannot be sure just from the Censuses. So my basic take is like this:

I’m missing some children from George’s second marriage to Margaret Sinnett. I was having a hard time making this family come out right on the Ancestry Tree.

One last point about Margaret is that Pat has her mother Catherine as being from Killaspy, County Kilkenny. Here is a map showing an arrow where Killaspy is:

This was interesting to me because with the help of a Butler researcher in England, my wife’s Butler family has been located near Mooncoin on the top left of the map above. Mooncoin appears to be about 5 miles away from Killaspy.

Another Cincinnati Butler DNA Match

The previous image brings up another interesting point. My wife’s two Aunts have had their DNA tested at Ancestry. They both match Pat who descends from Rebecca Butler b. 1869 above. My father in law matches Uncle Naffy at Gedmatch. That makes a good case that George Butler is related to Edward Butler, my wife’s ancestor who also lived in Cincinnati.

Here is Rebecca Butler’s Certificate of Death showing her two parents.

This could be a case where the death record is not the best source of a birth date as Rebecca was shown as being 6 months old in the 1870 Census and born in October. So probably only the day and month are right in the death certificate.

Here is how my wife’s Aunt Lorraine matches Pat:

Pat matches my wife’s Aunt Virginia a little less: 29.9 cM across two segments.

Another Shared DNA Match At Ancestry

Pat and my wife’s two Aunts also have two shared DNA matches. These matches have this tree:

I’m not sure if it was Donna that took the AncestryDNA test. It appears that more than one in the family did. At any rate, the match is much higher. It is now at 183 cM across 9 segments. The average amount of DNA shared between 2nd cousins once removed is 129 cM.

In comparison, here is Lorraine and Virginia’s tree next to the previous tree:

In the above scenario, Lorraine, Richard and Virginia would be 2nd cousins once removed to Donna and family. I’m not sure if Cornelias and John in Donna’s tree are right. Also, Donna’s tree has Henry, where I have Edward Henry. They are apparently the same person.

So Where Does That Leave the Butlers?

Here is a partially combined tree:

I say partially combined, because I haven’t connected the orange with the green side by genealogical research. I slimmed the tree down to just include the direct lines of those who have had their DNA tested. Uncle Naffy matches Richard at Gedmatch. Pat and Donna’s lines have not uploaded their results to Gedmatch. Pat and Donna’s line have shared DNA matches at AncestryDNA where they tested. Pat also matches Lorraine and Virginia at AncestryDNA. In addition, Donna matches Lorraine and Virginia. Richard and Uncle Naffy have tested at FTDNA, so unless Donna’s line and Pat upload to Gedmatch, those matches won’t be made known.

Summary and Conclusions

  • The George and Edward Butler families are linked by new and old world locations and DNA
  • More work is needed to link the George and Edward Butler families by paper research.

 

Chasing Down Some Massachusetts Colonial DNA

Recently I was contacted by someone I knew in high school who said, “Who knew we were related?” Skot had tested his DNA at Ancestry and had found me as a Shared Ancestor Hint. Ancestry compares your trees, and if there is a match in ancestors and a match in DNA, you are put on a list.

Shared Hathaway Ancestors

Skot’s and my genealogy research both lead to Simon Hathaway and Hannah Clifton.

I have the above chart to my grandfather and Skot’s grandmother. The chart says that Skot and I are seventh cousins. Simon and Hannah were born in the early 1700’s and married in Rochester, Massachusetts. This is interesting as Skot and I both grew up in Rochester.

Does Skot’s and My Shared DNA Point to Hathaway and Clifton?

AncestryDNA doesn’t show that the DNA you share is the same DNA of your shared ancestor. It sort of implies that but doesn’t prove that. To prove that, we need to use triangulation and have a chromosome browser. I asked Skot to upload his DNA results to Gedmatch where we could compare the DNA results. Here is what my match with Skot looks like at Gedmatch.com:

This shows that we match on Chromosome 10. I have a paternal phased kit at Gedmatch, and Skot also matched me there. That match shows that we match on my father’s side who had the Hathaway ancestors, so that is good.

Further, I have mapped my Chromosome 10 and it shows we match in an area where I got my DNA from my Hartley grandparent and not my Frazer grandparent whose parents were from Ireland. That is also a good sign:

This map shows me as J on the fourth bar. The Hartley is in orange and for me it goes from position 32M to 114M. According to Gedmatch, I match Skot from 68M to 77M, so that is well within my orange Hartley grandfather DNA area.

Triangulation of DNA

Triangulation of DNA is when A matches B, B matches C and A matches C. This is fairly easy to do. Once this triangulation occurs, it indicates a common ancestor. It is more difficult to find the common ancestor of that triangulation for various reasons. The next thing I look at is my sister Lori’s spreadsheet of matches. These matches have tested at various places and uploaded their results to Gedmatch.com. I’m looking at Lori’s matches because she matches Skot also, and because her test is more recent, so I have more matches for her.

Lori’s biggest match is 54, but that is with me. Lori matches Skot from about 68 to 77M, so these all start before that point. A few end before then. Lori has other matches in this region. Lori’s matches tested at AncestryDNA, 23andme and FTDNA. I tend to prefer AncestryDNA matches as the family trees are easier for me to read.

Lori’s first match of 22 cM is with Cheryl. Skot and Cheryl match at about the same spot and about the same cM as Lori and Skot match. That means the three triangulate.
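The position check behind this kind of triangulation can be sketched in a few lines of Python. This is only a minimal sketch, not how Gedmatch does it. The `Segment` type, the `triangulates` function and the 5 Mb minimum overlap are my own assumptions for illustration. Only the 68M to 77M Lori/Skot positions come from the text above; the other two segments are placeholders. It also checks positions only, so it does not confirm that the matches are all on the same parent’s copy of the chromosome.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """A matching segment on one chromosome, positions in megabases (Mb)."""
    chromosome: int
    start_mb: float
    end_mb: float


def triangulates(ab: Segment, bc: Segment, ac: Segment, min_mb: float = 5.0) -> bool:
    """True when all three pairwise matches cover a common stretch of at least min_mb."""
    if not (ab.chromosome == bc.chromosome == ac.chromosome):
        return False
    # The common stretch runs from the latest start to the earliest end.
    start = max(ab.start_mb, bc.start_mb, ac.start_mb)
    end = min(ab.end_mb, bc.end_mb, ac.end_mb)
    return end - start >= min_mb


lori_skot = Segment(10, 68, 77)     # positions from the Gedmatch comparison above
lori_cheryl = Segment(10, 60, 80)   # hypothetical positions, for illustration only
skot_cheryl = Segment(10, 68, 77)   # hypothetical positions, for illustration only

print(triangulates(lori_skot, skot_cheryl, lori_cheryl))
```

The minimum overlap is there so that two matches that barely touch at one end are not counted as a triangulated group.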

Now the Hard Part – Finding the Common Ancestor

Cheryl has over 25,000 people in her tree. Does she have Hathaways or Cliftons? At Ancestry, Cheryl and Lori are not Shared Ancestor Hints to each other. According to AncestryDNA, the common surnames between Lori and Cheryl are:

However, Baker and Schmidt appear to me on my mom’s side, so I won’t look at those. Phillips and Warren didn’t show anything obviously helpful. When I click on Cheryl’s White, I get this:

This is interesting as I have ancestors in Dighton on my Snell Line and also White and Hathaway ancestors. With a little trial and error, I see that Elizabeth Hathaway’s mother is Elizabeth Talbot. That is one of my ancestral names also. Elizabeth’s parents according to Cheryl’s tree were Jared Talbot and Sarah Andrews. I have a match in that couple. Here is my tree:

This is what I meant when I said that finding common ancestors among triangulated matches was not easy. I’m not happy that Lori and Cheryl’s common ancestor is from the 1600’s, but at least we found a match. Perhaps we will come back to Cheryl. Right now, a tie-breaker would help. Hathaway/Clifton or Talbot/Andrews?

Skot’s Genealogy

Here is the spot of Skot’s genealogy where Ancestry has us matching:

Note that Ancestry simplified the situation a bit. We are matching on Simon Hathaway and Hannah Clifton. However, we also match on Arthur Hathaway. It is even more confusing than that because Arthur Hathaway was also the father of Simon Hathaway by his first wife Maria Luce. Wow. Then Skot has more than one Clifton in there.

Shamus Match

One of my good matches at Chromosome 10 in this area of interest is Shamus. He matches me closely at 43.8 cM by FTDNA and 39.4 by Gedmatch.com. According to FTDNA, we share the following surnames:

Barstow Cook Swift Samson Talbot Taylor Townsend White Wing Ward

I looked through these names, but saw no obvious connection before the 1700’s.

Sarah Match

Sarah matches Lori at 18 cM. She is at FTDNA. Her surnames that match are:

Clark Hatch Jewett Johnson Lutzelburger Lutzelberger Lombard Richmond Spooner Smith White Wing

At least Shamus and Sarah have the White and Wing names in common. By the way, Sarah has a different last name at Gedmatch and FTDNA, but I assume that she is the same person. Actually there is a way to prove it, because FTDNA has a chromosome browser. Here is how Sarah matches me using FTDNA’s chromosome browser:

Again, the DNA part is easy. It is the genealogy that is a bear.

Here is Sarah’s White and Wing connection:

Here is how I connect:

Again it is not a very satisfying connection. We connect only on Daniel Wing at the top. Our lines appear to descend from Daniel, who was born in 1617, through two different mothers. I wasn’t able to place Sarah’s Hannah White.

I didn’t find out much about Joanne or Joanna Hatch. I did read an account of a family tradition that said that Joanna and Bachelor Wing were cousins.

At this point, I’m ready to call it quits.

Summary of Genealogy Linked to DNA

So far I match:

  • Skot on Hathaway/Clifton – early 1700’s Rochester, MA
  • Cheryl – Talbot/Andrews 1640’s Dighton, MA
  • Shamus and Sarah – Wing 1617 Sandwich, MA

I’m sure there are other connections.

Continuing to Work Down My Sister Lori’s Match List

There are some 23andme matches, but I have no idea how to find their ancestry without contacting them. Next I see Michelle. I am able to find her using a Chrome add-on to AncestryDNA which I think is called DNA Helper. She matches at 22 cM at Gedmatch. Oddly, she matches at 27.6 cM at AncestryDNA where the matches are usually less than at Gedmatch. Unfortunately, her tree is private. I have been in touch with her by email and she says she is related to the Hatch family somehow. The next match is Sean at FTDNA, but he has no family tree.

Summary and Conclusion

  • The DNA shows that there is a common ancestor between the paternal matches that I have on a particular segment of Chromosome 10
  • Finding the one common ancestor of a triangulated group is difficult
  • It is likely that there are holes in the ancestry trees of these Chromosome 10 matches. If all those holes were filled in, then the common ancestor may become apparent.
  • While I was doing this exercise I filled in some missing ancestors on my Jewett line. One ancestor was a Reverend up in Rowley which I found interesting. So this exercise wasn’t a total waste of time.
  • Skot and I still likely match on Hathaway and Clifton. However, the DNA tests we both took don’t necessarily point to those two ancestors.
  • At this point, the only triangulated ancestor I found in this Chromosome 10 group was Daniel Wing from Sandwich, b. 1617.
  • In summary, the DNA is saying that there is some kind of colonial Massachusetts ancestry passed down. However, whether that ancestry is from Dighton, Rochester or Sandwich, MA or even somewhere else is not clear.

 

 

 

 

First Frazer Big Y Results in a YP5515 SNP

In my last Blog, I wrote about my cousin Paul’s BigY results. The BigY takes a look at a large region of YDNA looking for existing SNPs and new SNPs. SNPs are what define the Y tree going back to genetic Adam. As a refresher, YDNA looks at the father’s father’s father’s line only. So if you are a Frazer, your father is a Frazer. At some point two different Frazer lines merge into one. That merging point is the two lines’ Most Recent Common Ancestor (MRCA), and the time back to that ancestor is the TMRCA. (The T stands for Time, as in Time to Most Recent Common Ancestor.) Then at some point all the Frazers tested bump into a common ancestor. For Paul and Jonathan who took the BigY test, that bumped-into Frazer would be the father of the Archibald and James Lines. However, the YDNA doesn’t stop there; it keeps going back and back and back.

Paul’s YDNA Matches

In my last Blog, I had mentioned that Paul had been designated as YP432 by FTDNA. That SNP has common ancestors, but they go back to 2800 years ago. As such, others that are YP432 will be from diverse backgrounds. I had mentioned some Norwegian and Swedish names. This makes sense as the L664 SNP which YP432 comes from is Germanic. These Germanic people moved into Scandinavia, England and apparently Scotland at some point.

FTDNA R1a Projects: L664, YP432, YP431 and YP5515

In my previous Blog, I had looked at matches at the R1a and all Subclades Project. However, FTDNA has another YDNA Project called simply the R1a Project. I find it a bit confusing that there are two R1a projects, but here is what the R1a Project has under YP432:

This shows some of the people that have tested positive for YP432. There are two branches shown here. The larger branch looks to mostly have ancestors from Norway and Sweden and is the YP431 Branch of YP432. The Frazers are on the YP5515 Branch. The Grants are also listed under YP5515. This is likely due to STR similarities as the Grants have not had their SNPs tested – just the STRs. In my previous Blogs, I had mentioned similarities between the Grants and the Frazers in the YDNA.

This doesn’t mean that the Frazers came from Norway or Sweden. Perhaps one branch of YP432 went to Norway and Sweden (YP431) and our branch of YP5515 went to Scotland and/or England.

The Hayes that I mentioned in my previous Blog is also listed, but in a separate group. Our Frazers are called YP5515 – x and Hayes is plain YP5515. I’m not sure why.

Another YP5515 Match – Patton

The YP5515 SNP Group is a very select group so far. There is Hayes and Patton. Assuming that these were the first two YP5515, then Frazer is the third. Patton shares YP5515 according to Paul’s BigY Match List:

I highlighted in gold the SNPs that Paul shares with Hayes and Patton and not the other YP432 matches. I haven’t seen Patton in the R1a Project, so he probably never joined it. Two of those SNPs have no name yet – just a position number. As far as I know, all YP5515 people share these 7 gold SNPs.

What Are the SNPs Unique to Frazer?

We will know that better when Jonathan’s BigY results come in. However, for now, I can guess. The BigY tells me the SNPs that Paul has that Hayes doesn’t have. There are 11 of these SNPs. The SNPs that Paul has that Patton doesn’t have are quite a bit more. Paul has 20 SNPs that Patton doesn’t have. What does this mean?

First, here are the 11 SNPs that Paul has that neither Hayes nor Patton has:

These would be the SNPs unique to Paul. I would expect to see some of these in Jonathan’s results.

Additional Shared SNPs With Hayes – A New Branch?

Recall that I said that Paul had additional SNPs not shared with Patton. There were 20 altogether. Here are the SNPs Paul doesn’t share with Patton that are different than the ones he doesn’t share with Hayes. I know, there are a lot of negatives here.

I have marked those 9 SNPs in blue. It turns out that those SNPs Paul doesn’t share with Patton, he does share with Hayes. To me, that means that Paul and Hayes should be in a new branch together.

In my new tree, I’ve simplified the YP431 Branch. In YP5515 there are 7 SNPs shared by Patton, Hayes and Frazer. Below that are the 9 SNPs shared by Hayes and Frazer. Below that are the 11 SNPs that Frazer has that appear to be unique. I say appear because there could be others that share at least some of these SNPs. All these SNPs together add up to 27 SNPs. I’m not sure how to date the SNPs. If these 27 SNPs arose over the last 2800 years, that would be about 100 years per SNP on average. If I’m right, then the 11 Frazer SNPs would mean around 1100 years back to the Frazer/Hayes common ancestor. That should be about 900 A.D., or before the time of surnames. It will be interesting to see if all my guesses are right.
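Here is the same arithmetic written out in Python as a check on my counting. It is a rough sketch only. The one assumption built in is that SNPs pile up at a steady average rate, which is my guess and not something YFull or FTDNA told me. The variable names are just mine.

```python
# Counts taken from the BigY comparisons described above.
paul_not_hayes = 11    # SNPs Paul has that Hayes does not
paul_not_patton = 20   # SNPs Paul has that Patton does not
shared_by_all = 7      # gold SNPs shared by Patton, Hayes and Frazer

# SNPs Paul lacks in common with Patton but does share with Hayes.
hayes_frazer_only = paul_not_patton - paul_not_hayes

total_snps = shared_by_all + hayes_frazer_only + paul_not_hayes

yp432_age_years = 2800  # YFull date for the YP432 common ancestor
years_per_snp = yp432_age_years / total_snps

# Assumption: a steady average rate, so 11 SNPs is 11 times the average interval.
years_to_hayes_ancestor = paul_not_hayes * years_per_snp

print(hayes_frazer_only, total_snps)
print(round(years_per_snp), round(years_to_hayes_ancestor))
```

This gives 9 SNPs for the Hayes and Frazer branch, 27 SNPs in all, about 104 years per SNP and roughly 1,140 years back to the Frazer/Hayes common ancestor, which is in line with my round numbers above.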

Another interesting point is that Paul and Jonathan’s TMRCA was around 300 years ago. That means that there should be a few SNPs different between Paul and Jonathan. They will each have their own branch off the Frazer Tree.

 

 

A New Frazer Big Y Test Is In

I found out today that the Big Y results are in for Paul. He is my second cousin once removed on my Frazer side. So far, I can see that his SNP now is R-YP432. The Big Y will tell you what your lowest known FTDNA accepted SNP is. It will also tell you your SNPs that don’t even have a name yet.

L664

R-YP432 is a branch of L664 which is part of a much larger R1a YDNA group. The chart below shows the L664 people as “Germanic”. Who knew? Wouldn’t one think that the Frazers would be Scots – not Germanic?

A more likely guess would have been that the Frazers would be with the Norsemen at Z284. The Norsemen probably made their way to Scotland. However, the YDNA seems to see it differently. The insert map above gives possible routes of migration. It shows the L664 coming out of the area of Germany and going up to England or Denmark. My history is not the best, but I do know that the Danes invaded the British Isles at some point. Could this have been related to the start of our branch of Frazers? Or perhaps some R1a ancestors joined up with the Norsemen. The Frazers could have even come in with the Anglo Saxons or William the Conqueror. Who knows?

Previous Predictions Based on STRs

Back in November 2015, I had written a Blog on Frazer YDNA. At that time, I had talked to an R1a administrator, Martin. He was quite sure, based on the STR testing, that our Frazers were L664. Further, based on values of specific STRs that Martin knew about, I had shown this Chart:

Martin had thought it unlikely that the Frazers would be in crossed out SNP areas based on their STR values. Notice that they turned out to be in YP432 on the bottom right.

How Old Is YP432?

YFull is a service that dates SNPs among other things. Here is their date for R-M198:

FTDNA previously had put Paul into the R-M198 Group. This is a very general R1a Group. Comparing Paul with other M198’s would put their most recent common ancestor at 8500 years ago. Aah, the good old days. The YFull Tree above brings us through 4,400 years of Frazer history – up to 4100 years ago. This is where I left off on the last Blog. The L664 Administrator for the R1a Project could tell that is where the Frazers should be based on their STR testing.

The YFull YP432 Tree

YFull shows a common ancestor for YP432 at 2800 years before present. I’m sure that gave the Frazers plenty of time to go from wherever they came from to Scotland and then to Ireland.

I plan to submit Paul’s Big Y results to YFull for further analysis. People that have submitted their Big Y results to YFull show up as ID’s. For example, it appears that id: YF09214 has English ancestors. Once YFull has a chance to look at the results, they may show a new branch of the YP432 Tree. One goal would be for the Frazers to have their own family SNP identified.

Competing Trees

The YFull Tree is above and appears to be the better tree. Here is the FTDNA Haplotree which seems to be lagging in the YP432 Department:

One next step would be to compare the FTDNA “Novel Variants” to see if any of them are named SNPs on the YFull Tree. The other, as I mentioned, is to submit Paul’s Big Y results to YFull for analysis. I note that FTDNA does have YP431, but Paul is not listed under that SNP.

Where Are Our Frazers On the YP432 Tree?

I have trouble seeing the YFull Tree, so I drew my interpretation of it:

Our Frazers, according to FTDNA are at YP432*. However, as I’ve shown above, FTDNA doesn’t have as many SNPs listed as YFull does. All the ‘YP’ SNPs, in fact, are YFull identified SNPs. According to ISOGG:

YP = SNPs identified by citizen scientists from genetic tests, then submitted to the Y Full team for verification.

Who Are Some Other YP432 People?

The Frazers are part of the R1a YDNA Project. That project appears to have two small YP432 groups.

These five YP432 people appear to have ancestors from Norway or Sweden.

Other Big Y Matches

It took a little while for Paul’s matches to show up. It appears that the closest ones have a zero known SNP difference, so I chose them. Then the list is sorted by those that share the most Novel Variants. My question is, how novel could they be if they are shared? I think that what they mean is unnamed SNPs.

The numbers on the right are the SNPs that do match.

Paul’s matching Novel SNPs with Hayes

As noted above, Paul shares 30 Novel SNPs with Hayes. I looked up all the positions at ybrowse.org and many of those ‘Novel’ SNPs have names. Here are the first 26:

I was especially interested in the YP5500 series SNPs as that sounded like the YP5515 SNP which forms one of the branches of the YP432 Tree.

I did find YP5515. It was the 27th Shared Novel Variant between Paul and Hayes.

That is good news as that further defines the Frazer Branch. When I go back to the YFull Tree, I see that the one person there that is YP5515 is also YP5516, YP5517, YP5518 and YP5519. This is what is called a block of SNPs. Both Paul and Hayes are positive for these SNPs. YP5515 was probably chosen as representative of these SNPs and likely because it was the best quality SNP for testing.

What About Jonathan?

Jonathan’s test should be coming in shortly. His Big Y was ordered not too long after Paul’s. I had a bit of a scare, because I was looking at my old Blog. In that Blog, Jonathan was listed as R-M458. When I compared that to Paul’s R-L664, they were nowhere near each other. However, sometime between my old Blog and now, Jonathan has been stealthily changed by FTDNA to the more generic R-M198. I fully expect FTDNA to have Jonathan as R-YP432 when his Big Y results come out.

Next Steps

The Big Y’s strong suit isn’t in predicting the YP432. There are other tests that could have done that. The next step is to look at the private SNPs. Jonathan’s Big Y should be coming in next. That test should show some shared SNPs that should create a new branch off the YP432 tree. In fact, I’ve shown one branch already. I expect that there will be more branching off from R-YP5515.

It is interesting that the YDNA goes so far back. We wanted to find out where the Frazers were in Scotland. Instead, at this time, we’ve skipped Scotland and appear to be somewhere in Norway or Sweden! However, I feel like the Hayes match at YP5515 will reel us back into the area of Scotland and England at least.

Part 7 – Raw DNA From 5 Siblings and a Mother – DNA From Mom

I’ve spent my last 6 Blogs on this topic finding out which alleles came from my dad. In this Blog, I would like to work on finding my siblings’ and my alleles that come from mom.

The Ironic Step of Phasing – Mom Alleles from Dad Alleles

I call this step ironic in that it was my mom that was tested for DNA. Based on her results we found out a lot of the alleles that her children got from our dad, who passed away quite a while ago. Now, we use those alleles we got from dad to figure out which alleles we got from mom. From the Whit Athey Paper referenced at the ISOGG Web Page on Phasing:

If a child is heterozygous at a particular SNP, and if it is possible to determine which parent contributed one of the bases, then the other parent necessarily contributed the other (or alternate) base.

 

First I copy my FillinOne Table to a MomfromDadOne Table. Then I’ll do a query on that.

This says where I am heterozygous, and I have an allele from dad, I want to see where I’m missing one from mom.

I have over 50,000 of these which will be easy to update. I will want to put Joelallele2 in the blank where JoelfromDad = Joelallele1. Then I will want Joelallele1 in the JoelfromMom space when my allele from Dad is Joelallele2.

I ran this query twice for each sibling, so 10 times. This updated 50-60,000 alleles per sibling, so about a quarter of a million alleles altogether.
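For anyone not using Access, the rule behind these two update queries can be sketched in Python. This is a sketch of the logic only, not my actual queries. The function name `mom_allele_from_dad` is made up for illustration. The arguments stand in for the Joelallele1, Joelallele2 and JoelfromDad columns, and the return value is what would go into JoelfromMom.

```python
from typing import Optional


def mom_allele_from_dad(allele1: str, allele2: str,
                        from_dad: Optional[str]) -> Optional[str]:
    """Infer the maternal allele at a heterozygous SNP.

    Returns None when the SNP is homozygous (not covered by this step),
    when the paternal allele is unknown, or when the paternal allele
    matches neither of the two tested alleles.
    """
    if from_dad is None or allele1 == allele2:
        return None
    if from_dad == allele1:
        return allele2   # dad gave allele1, so mom gave allele2
    if from_dad == allele2:
        return allele1   # dad gave allele2, so mom gave allele1
    return None          # inconsistent data, leave the mom slot blank


print(mom_allele_from_dad("A", "G", "A"))
```

The two `if` branches correspond to the two runs of the query per sibling. Returning nothing for inconsistent rows keeps a bad paternal call from turning into a bad maternal call.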

Finding Mom Patterns

Now that I have filled in more alleles from Mom, it should be easier to find Mom Patterns. Here is a Query to find Min and Max for the AAAAB Pattern:

Results in:

This saves a lot of time and gives me the start and stop positions of all the AAAAB Mom Patterns. In my previous look which I now see as premature, I only found 2 AAAAB Patterns. Now thanks to my MomfromDad update above, I have at least 17 AAAAB Patterns. The only drawback is that if there is more than one AAAAB Pattern within a Chromosome, it will not show that. However, if I run all the Mom Patterns, and find overlapping Patterns, that can be reconciled later. In fact, I see an overlap already:
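The Min and Max idea, and a way around its one drawback, can be sketched in Python. This is not my Access query. The row layout of (chromosome, position, pattern) and both function names are assumptions for illustration, and the sample positions are made up to resemble the overlap I describe next. The second function also assumes that rows without a complete pattern have already been dropped, since any other pattern in between would split a run.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# Each row: (chromosome, position, mom_pattern). Hypothetical layout.
Row = Tuple[int, int, str]


def pattern_min_max(rows: List[Row], pattern: str) -> Dict[int, Tuple[int, int]]:
    """Smallest and largest position of `pattern` on each chromosome."""
    result: Dict[int, Tuple[int, int]] = {}
    for chrom, pos, pat in rows:
        if pat != pattern:
            continue
        low, high = result.get(chrom, (pos, pos))
        result[chrom] = (min(low, pos), max(high, pos))
    return result


def pattern_runs(rows: List[Row], pattern: str) -> List[Tuple[int, int, int]]:
    """Separate (chromosome, start, end) runs of `pattern`, so that two runs
    on one chromosome are not merged into a single range."""
    runs = []
    ordered = sorted(rows)  # by chromosome, then position
    for (chrom, pat), group in groupby(ordered, key=lambda r: (r[0], r[2])):
        if pat == pattern:
            positions = [pos for _, pos, _ in group]
            runs.append((chrom, positions[0], positions[-1]))
    return runs


# Made-up sample rows, positions in Mb.
sample = [
    (1, 162, "AAAAB"), (1, 170, "AAAAB"),
    (1, 192, "AAABA"), (1, 200, "AAABA"),
    (1, 233, "AAAAB"), (1, 249, "AAABA"),
]

print(pattern_min_max(sample, "AAAAB"))
print(pattern_runs(sample, "AAAAB"))
```

With the sample rows, the Min and Max version reports one AAAAB range from 162 to 233, while the run version reports 162 to 170 plus a single row at 233.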

The first AAAAB Pattern I found was 162-233M which I did see as large. I already had found an AAABA Pattern from 192-249M. This could mean that AAAAB goes from 162-192 and that the 233M AAAAB pattern was just an outlying singleton.

I also recall that I want ID’s, so I’ll add that to my query:

Because I have so much new information, I’ll put this into a new spreadsheet:

AAABA Mom Pattern

I just have to change the Query slightly to get the AAABA Mom Pattern:

The results of this Query go into the new spreadsheet. This spreadsheet will be sorted by Chromosome later.

I added a column for IDEnd minus IDStart:

Where this is zero, it would indicate a single Pattern.

I went through all the Mom Patterns and got a spreadsheet of 194 rows that need to be reconciled. Here are Chromosomes 1 and 2 sorted:

Reconciling Chromosome 1

I have added in a column for possible assignment of a crossover to a sibling. Note that up to about 20M everything looks OK. There are discrete Patterns. ABBBA to AABBA is a change in the second position which belongs to Sharon. The change from AABBA to AABBB goes to Lori. Then the AABBB is the same as BBAAA which goes to ABAAA. That would be my crossover [Joel].
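Assigning a crossover by eye gets tedious, so here is a small Python sketch of the rule I am using. It is an illustration only. The function name is made up, and the sibling order in `SIBLINGS` is the order of the letters in my patterns. Because AABBB and BBAAA describe the same thing, the sketch compares the next pattern both as written and with its A’s and B’s swapped.

```python
from typing import Optional

# Order of the letters in each pattern.
SIBLINGS = ["Joel", "Sharon", "Heidi", "Jon", "Lori"]

SWAP = str.maketrans("AB", "BA")


def crossover_sibling(before: str, after: str) -> Optional[str]:
    """Return the sibling whose letter changes between two adjacent patterns,
    or None when the change cannot be pinned on exactly one sibling."""
    for candidate in (after, after.translate(SWAP)):
        diffs = [i for i, (x, y) in enumerate(zip(before, candidate)) if x != y]
        if len(diffs) == 1:
            return SIBLINGS[diffs[0]]
    return None


print(crossover_sibling("ABBBA", "AABBA"))
print(crossover_sibling("AABBA", "AABBB"))
print(crossover_sibling("AABBB", "ABAAA"))
```

These three calls reproduce the assignments above: Sharon, Lori and Joel. A result of None would flag a spot where two siblings appear to cross over at once, which is usually a sign that a pattern in between is missing.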

I did a Query showing where all the alleles were filled in for the Mom Patterns:

This shows where my Crossover is at ID # 8984. I have added a few more columns to my Mom Pattern Spreadsheet to add the more refined cut points:

Next I’ll look at 77M.

As best I can tell, there are two single AABAB’s in the middle of an AABBB Pattern. Next I will want to find the start of that AABBB Pattern. To find that I do a query to look for the AABBB Pattern in Chromosome 1. That Query results in more AABBB Patterns.

A Problem

I have a problem in that it appears that the Mom Patterns of AABBB and AABAB appear to overlap each other on Chromosome 1. I assume that means that I did something wrong.

Refilling the Dad Patterns

That means that I should go back and fill the Dad Pattern back in:

First I recreate a Fill-in Table using the old Three Principles Table. Then I do update queries on that. Hopefully these numbers will work:

Back to Mom Patterns From Dad Patterns

Just so I’m not going backwards, I’ll redo this step. I copied my revised fill-in Table to a revised Mom from Dad Table. This time I’ll keep track of the alleles for fun:

So in retrospect, I don’t know if I made a mistake with the Dad fill-in’s or in the Mom fill-in from the Dad Pattern. Hopefully, there were no mistakes this time.

 

Part 6 – Raw DNA From 5 Siblings and a Mother – Filling In Paternal Blanks

In my last Blog, I said that I would work on the Maternal Patterns and then fill in blanks. However, my Maternal Pattern Table is not very filled. After some thought and re-reading the Whit Athey Paper on Phasing, on which I base this work, I decided to:

  1. Fill in the Paternal Blanks
  2. Use the Paternal Data to fill in the Maternal alleles
  3. Fill out the Maternal Pattern Table
  4. Fill in the Maternal blanks based on the Maternal Pattern Table

Filling In Paternal Blanks

I might as well start filling in the AAAAA Patterns. On my Dad Pattern Excel Spreadsheet, I can filter for that pattern:

However, I now need a formula for Access including all the ID positions above. This was the point of my starting this project over – to get those IDs. The formula will be in the form of “Between A And B OR Between C And D OR…” So first I need a formula in Excel to create the formula in Access. That formula is called Concatenate. According to a Google search, concatenate means to “link (things) together in a chain or series”. The symbol in Excel for concatenate is simply the ampersand (&).

Here is my formula and the outcome:

However, I have another idea. I can concatenate the concatenation. First, I add an extra space on the end of my “Or”. Then I drag down the formula to fill in the other chromosomes. Then I take off the last “Or”.

That gives me this helpful string of AAAAA Positions:

This will save me a lot of cutting and pasting in Access.
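The same string can be built outside Excel. Here is a minimal Python sketch of the concatenation. The function name and the ID numbers in the example are made up for illustration. They are not my real AAAAA start and stop IDs.

```python
from typing import Iterable, Tuple


def between_criteria(ranges: Iterable[Tuple[int, int]]) -> str:
    """Join (start_id, end_id) pairs into one Access-style criteria string."""
    parts = [f"Between {start} And {end}" for start, end in ranges]
    # Joining with " Or " puts the word only between the pieces,
    # so there is no trailing "Or" to take off at the end.
    return " Or ".join(parts)


# Hypothetical ID ranges for one pattern.
print(between_criteria([(1, 100), (250, 400), (900, 1200)]))
```

The join step is the “concatenation of a concatenation”: each pair is first turned into its own Between clause, and then the clauses are chained together.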

Back to Access

First I copied my old Table to a new one called tblFillInOne. I will create an Update Query for that Table.

I am only updating Dad alleles from other Dad alleles, so I import those 5 alleles plus the location ID. Then I use the expression builder, to paste in the location of the AAAAA Patterns in all 22 chromosomes. So now I have the Pattern and the location, but I need some more criteria. I would like the criteria to say if there is any allele in any of the five columns and any blanks in those columns, then replace the blank space with one of the existing alleles.

Here is a simple Update Query:

This says, that if my allele is null and Sharon’s isn’t, then replace mine with Sharon’s. The problem is that this would take four separate Update Queries. With 5 siblings, that would be 20 queries.

Another risky Update Query would use this form:

Here I am saying if any allele is not null (other than mine) replace that in my slot where I have a blank. The thing I don’t know is whether the Update To: field can take a variable expression. I’ll try it. When I run this as a Select Query, it puts a bit of a strain on my computer. Eventually, it gives me 18,385 rows. When I run the View function on the Update Query, I get the same number of rows, so I’ll hit the Run button and hope for the best.

If I run this Select Query again, I should get no results if everything updated. I did get no results, so I assume that it worked. I want to save this Update Query and use it for the other four siblings.

Filling In Sharon’s Missing Alleles From the AAAAA Paternal Pattern

I used the same logic for Sharon:

Now she has all the Is Null values and I don’t. I moved the Update To: criteria over to Sharon. I took out Sharon’s allele and added mine in her place. Again, this gives my old computer a workout. I get 18,315 rows again which seems suspicious. I see the problem. It appears that Access updated my results with a (-1) rather than with an allele. Access stores True as -1, so it likely treated my Update To: expression as a true/false test instead of returning an allele.

That means that I just have to do 20 Queries. However, they should go quickly.
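What I was trying to do with the risky query is easy to say in Python, so here is a sketch of the intended rule for one row inside an AAAAA region. This is not what Access ran. The function name is made up, and the extra check that leaves a row alone when the known alleles disagree is my own addition for safety.

```python
from typing import List, Optional


def fill_shared_alleles(alleles: List[Optional[str]]) -> List[Optional[str]]:
    """Within a region where all listed siblings share the same paternal
    segment, copy the one known allele into every blank (None) slot."""
    known = {a for a in alleles if a is not None}
    if len(known) != 1:
        # Either nothing to copy from, or the known alleles conflict.
        return list(alleles)
    value = next(iter(known))
    return [value if a is None else a for a in alleles]


# Order: Joel, Sharon, Heidi, Jon, Lori. Values are made up.
print(fill_shared_alleles([None, "G", None, "G", None]))
```

The point of the sketch is that the fill-in value has to be an actual allele taken from another sibling, which is what the 20 simple queries do one pair at a time.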

Back to the Simple AAAAA Query

Due to all the Update Queries, I’ll make a Spreadsheet to keep track of each Update Query I do:

It turns out that it is easier to run this Update Query sorted by ‘From’:

That way, I can just move Sharon’s allele from Dad and the Is Null along the Update Query:

With these fast 20 Update Queries, I updated over 100,000 alleles:

AAAAB Fill-in

This could be a little easier. For this one, we don’t want to touch the last ‘B’. The last B represents Lori, so we will only be filling in to and with the other four siblings.

And then we need the fill-in locations.

AAABA and AAABB Fill-in

AAABA is about the same as AAAAB except the B in the AAABA corresponds to Jonathan. He is all alone as a B so he gives no alleles and takes no alleles. The other siblings share their AAAA’s in this Pattern.

In an AAABB Pattern, the three A’s will share with each other and the two B’s will share with each other. This happens to break down along AncestryDNA V1 and V2 lines, so I expect there will not be as much filling in as when siblings tested on different AncestryDNA versions share with each other. The sharing of A’s and B’s looks like this in my Fill-in Tracker:

I have darkened out the areas where an A cannot share their A with a B and a B cannot share their B with an A. As I predicted, the AAABB filled-in alleles were less:

All the Other Patterns Filled In

All the other patterns will be of the same type. There is one AAAAA which is all the same. The other combinations are four of one type and one of the other or three of one type and two of the other.

There are 20 fill-in’s for AAAAA. As a quality check, there are 12 fill-in’s for a 4-1 Pattern and there are eight fill-in’s for a 3-2 Pattern. I would recommend using a fill-in tracker to make sure all the combinations are being covered. The specific numbers of alleles being filled in for each combination of each Pattern are not all that important, but they are interesting.
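The 20, 12 and 8 counts can be checked with a short Python sketch. A fill-in here means one ordered pair: one sibling’s allele going into a different sibling’s blank when both have the same letter in the Pattern. The function name is made up for illustration.

```python
def fill_in_count(pattern: str) -> int:
    """Number of ordered (from, to) sibling pairs that share a letter."""
    return sum(
        1
        for i, a in enumerate(pattern)
        for j, b in enumerate(pattern)
        if i != j and a == b
    )


for pattern in ("AAAAA", "AAAAB", "AAABB", "ABBAB"):
    print(pattern, fill_in_count(pattern))
```

Five siblings sharing gives 5 x 4 = 20. Four sharing gives 4 x 3 = 12. Three and two sharing gives 3 x 2 plus 2 x 1, or 8.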

Fixing an ABBAB Mistake

When I was filling in the ABBAB Pattern, I noticed a mistake I made. I filled in 3754 rows of Joel alleles into Heidi blank spots. In an ABBAB Pattern, I am only supposed to be filling in my alleles into Jon’s blanks. Here is the mistake:

That means in those positions, I’ll have an ABAAB Pattern where I should have an ABBAB Pattern. Oh no. So how do I fix that? I need a fix query. Under Pos ID, I’ll put in all the locations that are supposed to be ABBAB. Then I’ll make sure the first position isn’t the same as the second:

That results in only 103 rows.

If I update those 103 rows to Null, that should be a start:

Next I set the first position to be different than the last in this ABBAB Pattern:

That fixes another 212 rows. That may be all the rows to fix. I looked for more rows where JoelfromDad = HeidifromDad, where JoelfromDad <> LorifromDad and where JonfromDad <> LorifromDad, but didn’t see anything. The other updates must have been in areas that were AAAAA by chance. In the meantime, I copied the first two columns on the left to the right, so I don’t lose my place when I am scanning across the spreadsheet.
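Here is my reading of the fix written as a Python sketch for one row in an ABBAB region. It is an interpretation of the two fix queries, not a copy of them. The function name and the dictionary layout are assumptions for illustration. Rows where Sharon and Lori are both blank cannot be checked this way and are left alone.

```python
from typing import Dict, Optional

Row = Dict[str, Optional[str]]


def undo_bad_heidi_fill(row: Row) -> Row:
    """In an ABBAB region, blank Heidi's paternal allele when it equals Joel's
    but Joel's allele disagrees with a known allele from Sharon or Lori."""
    fixed = dict(row)
    joel, heidi = row["Joel"], row["Heidi"]
    if joel is None or heidi != joel:
        return fixed  # nothing that could have come from the bad fill-in
    for b_sibling in ("Sharon", "Lori"):
        other = row[b_sibling]
        if other is not None and other != joel:
            fixed["Heidi"] = None  # Heidi is a B here, so Joel's allele is wrong
            break
    return fixed


# Made-up example row.
bad = {"Joel": "A", "Sharon": "G", "Heidi": "A", "Jon": "A", "Lori": None}
print(undo_bad_heidi_fill(bad))
```

Setting the slot back to blank, rather than guessing the right allele, lets the normal ABBAB fill-in from Sharon and Lori supply the value later.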

Dad Pattern Fill-in First Round

The dark blue areas are the ones where there should not be any filling in based on the Pattern.

Summary

  • The Fill-in Step is a major part of phasing. In this step I filled in over 1 million paternal alleles in my DNA and in my 4 siblings’ DNA.
  • I noticed a mistake I made along the way, but figured out a way to fix it.
  • I figured out a shortcut to describe the different patterns by way of ID’s. The shortcut involved using a concatenation of a concatenation.
  • I haven’t yet filled in the random AAAAA Patterns that are within the other patterns. I imagine that would be important to do at some point. I know that David Pike has a utility to find Runs of Homozygosity. I suppose that would be useful for filling in alleles.