In a past blog, I mentioned My Big Fat Chromosome 20. That blog is also referenced on the ISOGG Chromosome Mapping Page. This particular Chromosome had puzzled me for a while due to the preponderance of matches I was getting there. I used visual phasing and determined that the overload of matches was on my paternal grandmother’s Frazer side rather than the Hartley side. I had previously supposed that the Hartley side held the key to all my matches as that side had colonial Massachusetts roots. Since that time, I had my brother’s DNA tested. He is shown as F in the bottom row below. I thought that his results might add some clarity to Chromosome 20.
Rather than clarifying things, Jon (F) just gave me a shorter version of what I already had for myself (J) and my two sisters. The problem is the phenomenon of close crossovers at the beginning and end of each chromosome. Jon also has quite a few matches on Chromosome 20 (unlike my sister Sharon, who had Hartley DNA in most of her paternal Chromosome 20). He has almost 30% of his phased matches there according to his match spreadsheet based on Gedmatch.
Going to the Source – Raw Data Phasing
I have been learning how to phase my raw data based on a Whit Athey article, MS Access and the work that M MacNeill has done. The Whit Athey paper describes how to manipulate the raw DNA data of one parent and four siblings to get Dad Patterns and Mom Patterns. I have found these patterns to be useful.
Even though my dad never had his DNA tested, I have used certain principles to come up with a spreadsheet that shows, for various sections of the chromosomes, the matching patterns that I have with my other three siblings. I use A’s and B’s to give a generalized pattern. The patterns will be in the order of Joel, Sharon, Heidi and Jon. Here is my Dad Pattern spreadsheet showing Chromosome 20:
I find my “gap to next” column handy. The first thing that I notice is that there are not many large gaps. If there were very large gaps, that might indicate an AAAA pattern where all the siblings match (in this case, the same paternal grandparent). One thing that I added today is a Start and Stop. These are the first and last tested positions of the Chromosome. This is good to know in case a pattern is hiding at the beginning or end of the chromosome. Let’s just look at the second line of the spreadsheet. This shows that there is a pattern of ABAB from position 0 to 10M. This means that the first and third siblings (Joel and Heidi) match the same paternal grandparent and the second and fourth siblings (Sharon and Jon) match the other paternal grandparent.
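To make the reading of a pattern concrete, here is a minimal Python sketch. It is only an illustration, not part of the Athey or MacNeill method; the function name `groups` is made up for this example. It assumes the sibling order Joel, Sharon, Heidi, Jon and simply collects the siblings who share a letter, since siblings with the same letter match the same paternal grandparent in that section.

```python
# Sibling order used throughout this post.
SIBLINGS = ["Joel", "Sharon", "Heidi", "Jon"]

def groups(pattern, siblings=SIBLINGS):
    """Group siblings by the letter they carry in a pattern such as 'ABAB'.

    Siblings in the same group match the same grandparent in that section.
    The letters themselves are arbitrary labels, not specific grandparents.
    """
    out = {}
    for name, letter in zip(siblings, pattern):
        out.setdefault(letter, []).append(name)
    return out

print(groups("ABAB"))
# {'A': ['Joel', 'Heidi'], 'B': ['Sharon', 'Jon']}
```

Which grandparent is A and which is B still has to come from cousin matches; the pattern alone only says who matches whom.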
In the third row of the spreadsheet, a new paternal pattern starts (at 10M). This is ABAA. Now siblings 1, 3, and 4 (Joel, Heidi, and Jon) match each other. The difference between ABAB and ABAA is in the last position, where I have Jon. He switched from a B to an A and now no longer matches Sharon, but he does match his other two siblings on the paternal side. As Jon is the one that changed, he gets the paternal crossover at this position.
A few other notes
- These patterns are gradual. That means that there can be only one change at a time.
- If it looks like there are two or more changes, then either something was done wrong or you have to invert the A’s and B’s.
- For example, above in row 4, I have an AABA pattern that goes to an ABAB. At face value, it looks like three changes. However, AABA is the same as BBAB, so it is really just the first B changing to an A. The first position is mine, so I have a crossover around 54M on the paternal copy of my Chromosome 20.
- These areas of patterns are also used to fill in bases received from Dad or Mom in the particular areas that the patterns occur in each chromosome.
- If there are only three siblings tested, these patterns are not as informative.
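The rules in these notes can be sketched in a few lines of Python. This is only my own illustration under stated assumptions: the helper names `invert` and `crossover_siblings` are hypothetical, and the sibling order is Joel, Sharon, Heidi, Jon. The idea is to compare the new pattern against both the old pattern and its inverted form, and keep whichever comparison shows fewer changes.

```python
# Sibling order used throughout this post.
SIBLINGS = ["Joel", "Sharon", "Heidi", "Jon"]

def invert(pattern):
    """Swap A's and B's: 'AABA' and 'BBAB' describe the same grouping."""
    return pattern.translate(str.maketrans("AB", "BA"))

def crossover_siblings(before, after, siblings=SIBLINGS):
    """Return the siblings whose letter changes between two adjacent patterns.

    Both the pattern as written and its inverted form are tried, and the
    comparison with the fewest changes wins. A result with more than one
    name suggests an error or a missed pattern in between.
    """
    candidates = []
    for variant in (before, invert(before)):
        changed = [name for name, x, y in zip(siblings, variant, after) if x != y]
        candidates.append(changed)
    return min(candidates, key=len)

print(crossover_siblings("ABAB", "ABAA"))  # ['Jon']
print(crossover_siblings("AABA", "ABAB"))  # ['Joel']
```

The second call is the row 4 example: compared directly, AABA to ABAB looks like three changes, but after inverting to BBAB only the first position differs.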
Mom Pattern spreadsheet
I would not want to leave mom out. Here is the pattern of her 4 children matching on the maternal side:
Like the Dad Pattern Spreadsheet, everything looks well behaved as there are no large gaps between patterns. Also there are no gaps at the beginning or end of Chromosome 20. So there you have it. That is the phased DNA for myself and my other three siblings. But it doesn’t jump out at you and I don’t have a map yet. That is where I bring in the MacNeill Spreadsheet.
MacNeill’s Excel Spreadsheet
I adjusted MacNeill’s Chromosome 1 spreadsheet by replacing default numbers for Chromosome 20. Then I added in the locations I had in the spreadsheet above. Those are the Start36 and Stop36 columns. The 36 refers to Build 36 locations which Gedmatch uses. After that I colored in the bars to be consistent with the visual phasing I had done previously.
Actually, I now see that I colored Sharon’s paternal bar backwards. She should have mostly Hartley (blue). This transposition also carried through to the next image, but I corrected it in the final image. I like having labels, so I copied this into PowerPoint and added some:
Next I add any appropriate cousin matches for Chromosome 20. I also made the sibling names on the left a little bigger. My mistake above on Sharon’s paternal bar is corrected and verified by her large paternal Hartley cousin match with Jim below.
I had to bring this back into PowerPoint to re-add the surnames. The places where the cousin matches start or stop may be crossovers for me and my siblings. From comparing the top part of the chart to the bottom, it should be obvious which crossovers are for me and my siblings and which are for the cousins. The good news is that the raw DNA phasing confirms my initial visual phasing done in January 2016. The raw DNA phasing just filled in what I was unable to. The other good news was that there were significant cousin matches on both the paternal and maternal sides of Chromosome 20 to make sure that all the grandparents were identified correctly. Since I did the original visual phasing in January 2016, I have gotten the DNA results of 2 more cousins. Also, one additional cousin who previously matched only me at 23andMe uploaded her results to Gedmatch.
- The hard work in Raw DNA phasing is assigning all the bases of the siblings to the correct parent. Then patterns are discerned and noted.
- The fun part is mapping out the results.
- Raw DNA phasing and mapping is more accurate and complete than visual phasing. However, it takes a lot of work and works best when there is at least one tested parent.
- The comparison of the raw DNA mapping to the actual cousin matches points out the fuzzy boundaries noted by others. This may be seen in Sharon’s short Lentz segment. Her cousin Judy match (who has Lentz ancestry) appears to exceed the length of Sharon’s Lentz segment.
- Out of the four siblings, Sharon is the one who didn’t get the huge dose of Frazer ancestor matches. That means that she would be the best for looking for smaller matches at Gedmatch.com. Her smallest match is 9.3 cM (5.9 Gen) and my smallest match at Gedmatch is 10.7 cM (5.2 Gen).
- At a glance, one can see who is the best person for finding matches with each of the four sides of the family. For example, I received a full dose of Lentz DNA on Chromosome 20. Here is my Lentz grandmother (b. 1900) in her younger days. Her DNA is represented in yellow in the charts above.