Solving Joanna’s Mystery DNA Match with Visual Mapping

Recently I had a question from Joanna, who is part of a Frazer DNA Project that I’m working on. She has a large mystery match and would like to know which side of the family the match is on. Joanna is also interested in having her chromosomes mapped using Visual Phasing. Visual Phasing is a method that Kathy Johnston has pioneered using the DNA results of at least 3 siblings. Blaine Bettinger has also written a 5 part series on this subject.  Perhaps the mapping could help her find out what side of her family this mystery match is on.

Joanna’s Mystery DNA Connection with Mystery Vickey

Joanna’s siblings are Janet and Jonathan. I will check Gedmatch.com to see how the three siblings match up with Vickey.

vickeytojoanna

On Vickey’s One to Many list, I saw Joanna and Janet, but not Jonathan. You can see why Joanna is interested, as her match with Vickey is 55 cM. I didn’t want to leave out Jonathan, so I ran a One to One between him and Vickey at Gedmatch:

vickeyjonathan

Jonathan does match Vickey, but he just fell off the bottom of Vickey’s One to Many List. The start of Jonathan’s match is at the start of Joanna’s orange bar above. His match with Vickey ends before his sister Janet’s match with Vickey starts. Now, in Joanna’s family we have a small, medium and large match with Mystery Vickey.

Visual Mapping of Joanna, Janet and Jonathan

As all the above matches are on Chromosome 13, it would make sense to start there. The first step is to compare the 3 siblings in the Gedmatch Chromosome Browser:

chromosomebrowserjjj

I then added crossover lines and attempted to assign the right sibling or siblings to each crossover. This Chromosome was not simple. It looks like there are, or could be, close crossovers in three different places: around positions 29, 33 and 98. In addition, something strange seems to be going on at the 72/73/74 location. That leaves only 2 crossovers that appear to be uncomplicated: the first crossover, which I have given to Jonathan, and the crossover at 62, which I gave to Janet.

Mapping the JJJ siblings

From 33 to about 73, Joanna and Jonathan have a Fully Identical Region (FIR). That means in that area, Joanna and Jonathan got their DNA from two of the same grandparents. One of those grandparents was on their Paternal side and one on the Maternal side.

chr13jjjfir

Above I’ve portrayed Joanna and Jonathan’s shared grandparents as blue and red. The next step is extending the blue and red bars. I’ll keep Jonathan’s two grandparents where they are as he has crossovers on either side. However, I’ll extend Joanna’s DNA from her 2 grandparents to the right. Should I extend them to 98 or to the end?

A single or double crossover at 98?

The simplest scenario at Position 98 would be a crossover assigned to Joanna. Joanna is the common factor in the first two comparisons, so that would make sense. In that scenario, neither Janet nor Jonathan would have crossovers at 98. However, that does not appear to go along with what we know. Recall above that I looked at Vickey’s match with Joanna and Janet:

vickeytojoanna

Why isn’t Janet matching Vickey for 55 cM as Joanna is? Something happened around Position 97. That something has to be a crossover for Janet. That sets a few things in motion. Now that we know Janet has a crossover, that means that Jonathan also has a crossover there. Our two options at 98 were either a single crossover for Joanna or a double crossover including Janet and Jonathan. My conclusion:

  • Janet has a crossover at 98
  • Jonathan also has a crossover at 98
  • Joanna does not have a crossover at 98
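
The choice between the two options at 98 can be sketched in Python. This is only an illustration of the usual visual phasing rule, not something from the original analysis: when exactly two of the three sibling comparisons change at a position, either the sibling common to both has a single crossover, or the other two siblings both have one. The function name `crossover_options` is hypothetical, and I am assuming that the two comparisons that change at 98 are the two that involve Joanna.

```python
def crossover_options(siblings, changed_pairs):
    """Return (single_owner, double_owners) for a position where exactly
    two of the three sibling comparisons change."""
    changed = [frozenset(pair) for pair in changed_pairs]
    if len(changed) != 2:
        raise ValueError("expected exactly two changed comparisons")
    common = changed[0] & changed[1]
    if len(common) != 1:
        raise ValueError("the two comparisons must share exactly one sibling")
    (single,) = common
    # The alternative reading: both of the other siblings cross over here.
    double = sorted(set(siblings) - {single})
    return single, double


siblings = ["Joanna", "Janet", "Jonathan"]
# Assumption: the comparisons that change at 98 are the two involving Joanna.
single, double = crossover_options(
    siblings, [("Joanna", "Janet"), ("Joanna", "Jonathan")]
)
```

Here `single` is Joanna and `double` is Janet plus Jonathan. The sketch only lists the two options; it takes outside evidence, such as Janet’s shorter match with Vickey, to choose the double crossover.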

The immediate result is that I can send Joanna’s DNA over to the right side of the page:

joannamap

In the next step, I want to take advantage of the Joanna to Janet FIR and the places where siblings don’t match at all. This is where there is red in the Browser and no blue bar below. In these areas I will use two different colors.

firhir

The Irony of Vickey’s match

Next, I’d like to use Vickey’s match with Joanna and Janet to set the crossover at position 98. We will now pick a ‘side’ for the colors. We will say that Vickey is a red match. In the area between 62 and 98, Joanna and Janet are in a Half Identical Region (HIR). That means that they have one grandparent in common out of their four. The irony is that we are mapping this Chromosome to find out where Vickey fits in, yet we are first using Vickey’s match to map the Chromosome.

vickeyonmap

Here I have added in Vickey’s matches with Joanna, Janet and Jonathan. Note that Janet’s match with Vickey starts right near her crossover. Before that, Janet matched another of her four grandparents (shown in yellow). Also Jonathan’s match with Vickey ends right by his crossover where he received his DNA from another grandparent (the same yellow). Now we have Joanna and her two siblings’ Chromosome 13 roughly mapped out. We have relative positions for their four grandparents. Can we now find out which of Joanna’s grandparents the Mystery Vickey is related to?

Finding Which of Joanna’s Grandparents Matches Vickey

In January, 2016, I wrote a blog about Joanna and her Chromosome 15. Here are the results I came up with (after I went back and corrected a mistake I found):

chr15frazermaprev

I show this to indicate the possible grandparents that Vickey could be related to. Chromosome 15 was also a much easier Chromosome to analyze. There were only three crossovers for Joanna and her siblings on Chromosome 15. The 90-97 area was where Joanna matched a Williams relative. From 67-92, Joanna and Janet matched Frazer relatives.

For Chromosome 13, however, the [maternal] Williams relative did not match with Joanna, Janet and Jonathan. So it will not be possible to make determinations on Joanna’s maternal side. There were, however, matches on the Frazer side. Betty has not uploaded her results to Gedmatch, but she is on FTDNA, which has a browser and gives Chromosome match locations. Betty is our last chance to at least identify the paternal part of Joanna and her siblings’ Chromosome 13. Fortunately, Betty matches Joanna twice and Jonathan once:

bettychr13

Now all I have to do is see where Betty’s DNA would fit on our Chromosome 13 Map.

chr13mapbetty

It looks like Betty has shown that Vickey is related somewhere along the line of Frazer – or more specifically Edward Frazer born in 1867. Here are a few notes on what I did:

  • I extended Jonathan’s Seymour DNA to the right as he had no Frazer match there where his two sisters did. They both matched Vickey. Then I added a blue segment above Jonathan’s yellow segment as Jonathan has a HIR with his two sisters at the end of Chromosome 13.
  • Question: Why didn’t I match Betty to the blue maternal side of the Chromosome? There is room for her DNA there also.
  • Answer: Imagine that I moved all the Betty matches up to the blue segments. That would leave a problem for Janet. In that scenario Joanna would have a blue match and Janet would also have to have the same blue match, but Janet didn’t match Betty from 75-96. Janet also did not match Vickey from 63-107. So that alternative scenario does not work out.
  • For more distant relationships, one would not want to make deductions based on lack of matches. However, with siblings there has to be an explanation as to why one sibling would have a match and the other would not based on the visual mapping.

Edward Frazer is wearing a top hat at the bottom right. In our Chromosome 13 Map, his DNA is shown as red and his Seymour wife as yellow.

frazer1867

The fact that Vickey matches on this Frazer line doesn’t mean that Vickey has to have Frazer ancestors. It just means that she and Edward Frazer must have a common ancestor. That common ancestor may be along the Palmer line, for example (Edward Frazer’s mother’s Line).

Betty to the rescue again

Let’s use Betty’s results to fill in the rest of Joanna’s family’s Chromosome 13:

chr13joannafinished

  • Note that Betty matches Joanna from 29-43 and Jonathan from 33-43. That tells me that Jonathan has a paternal crossover at 33. Because Joanna doesn’t have a paternal crossover at 33, that means she has a maternal crossover there. The rest I fill in using the FIR and HIR regions.
  • The smaller segments at the beginning of the Chromosome correspond with all the crossovers at the beginning of the Chromosome. There are 5 crossovers up to position 33. In Chromosome 15, which I mentioned earlier in the blog, there were 3 crossovers for the whole Chromosome. After position 33 on Chromosome 13, there are fewer, more spaced out crossovers, which account for the larger segments of inherited grandparent DNA.
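
The first note above uses a simple idea: when two siblings match the same cousin on overlapping segments, a boundary that falls inside the range covered by the other sibling is a likely crossover for the sibling whose match stops or starts there. Here is a minimal Python sketch of that idea, using only the Betty positions given above. The function name `interior_boundaries` is hypothetical, and the sketch does not say whether the crossover is paternal or maternal; that comes from knowing which side Betty is on.

```python
def interior_boundaries(matches):
    """matches maps sibling -> (start, end) of a match with the same cousin.

    Returns (sibling, position) for each boundary that lies inside the
    overall range covered by the siblings together."""
    lowest = min(start for start, _ in matches.values())
    highest = max(end for _, end in matches.values())
    found = []
    for sibling, (start, end) in matches.items():
        if start > lowest:
            found.append((sibling, start))
        if end < highest:
            found.append((sibling, end))
    return found


# Betty's matches near the start of Chromosome 13, from the note above.
betty = {"Joanna": (29, 43), "Jonathan": (33, 43)}
candidates = interior_boundaries(betty)
```

With these numbers the only candidate is Jonathan at 33. The outer edges at 29 and 43 are not reported because they could be the ends of Betty’s own segment rather than a sibling’s crossover.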

Summary and Conclusions

  • Visual Mapping can be fun and helpful in finding out where mystery matches come from
  • Without the help of Joanna’s 2nd cousin Betty, we would not have a complete map. We would also not be able to know which grandparent Vickey was related to.
  • If Betty’s results were only at AncestryDNA, we would not be able to do this analysis as AncestryDNA does not give detailed information on DNA matches. The fact that she tested at FTDNA helped us come to these conclusions, even though her results were not uploaded to Gedmatch.com
  • Joanna may know of more test results with known relatives that could help fill out the maternal side of Chromosome 13 or we may find out more in the future.


Using M MacNeill’s Raw DNA Phasing Spreadsheet and My Problem Chromosome 10

I have written many blogs about phasing my own raw DNA. One of the things that was bothering me while going through the process was the presentation of the results. It is possible to phase millions of bases using the raw DNA results from one parent and at least 3 siblings. But once the DNA is phased, how can those results be best portrayed? In my previous Blog on the subject, I was able to figure out a fairly simple way to show my results, but the outcome was not totally satisfactory.

chr7patmatmap

I liked how I was able to get the grandparents’ surnames at least in the first 2 bars. I also liked how I had a simple scale at the bottom. However, one of my bars went too far. Also, my simple chart started at zero and Chromosomes start at different positions. I was able to fix the bar going too far today. Excel makes these bars based on distance rather than positions, so one of my equations was wrong.
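
To illustrate the distance versus position issue, here is a small Python sketch. It is not my Excel formula, just the same arithmetic: a stacked bar needs the length of each segment, which is the difference between neighboring boundary positions, plus a leading offset for where the Chromosome starts. The function name `bar_lengths` and the example positions are made up for illustration.

```python
def bar_lengths(chrom_start, crossovers, chrom_end):
    """Convert boundary positions into the segment lengths that a
    stacked bar chart expects."""
    edges = [chrom_start, *sorted(crossovers), chrom_end]
    return [right - left for left, right in zip(edges, edges[1:])]


# Hypothetical positions (in Megabases): the Chromosome starts at 2,
# has crossovers at 30 and 60, and ends at 135.
lengths = bar_lengths(2, [30, 60], 135)
offset = 2  # invisible leading bar so the chart does not start at zero
```

The lengths add up to the end position minus the start position. If a crossover position is used directly as a bar length, the bar runs too far, which is the kind of mistake I had in one of my equations.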

I told M MacNeill <prairielad_genealogy@hotmail.com> of my concerns and he sent me his spreadsheet. One feature I really liked about the MacNeill Spreadsheet is that it had a place for cousin matches at the bottom. Below is the first Chromosome where I used my phased raw data from my mom and 3 other siblings to create a MacNeill Chart.

chromosome15macneill

Sharon’s maternal first little segment didn’t work out perfectly, but that didn’t bother me. I know that the beginning and ends of Chromosomes can have small problematic segments. Note at the bottom that my match to Carolyn in yellow shows where my maternal crossover is in the upper part of the chart where I go from red to orange.

My Chromosome 10

I am looking at my Chromosome 10 because, for one thing, I have had trouble trying to visually phase this Chromosome in the past. Here is my attempt at visual phasing from early in 2016:

chr10visphase

Here is another try including additional cousins that tested:

10r1visphase

Note how different the maternal (lower) side is. I switched most of the maternal grandparents around.

Here is the MacNeill spreadsheet showing just the cousin matching part:

cousinmatch10macneill

I have some good matches here. Blue is Hartley, green is Frazer, yellow is Lentz. Red is Rathfelder. This makes it clear that my chromosome is mapped wrong. I need more Hartley and Lentz. The above chart includes my brother who I had tested not too long ago.

Here is another try with my brother’s DNA results included:

10visphase3

My sister Sharon (S) has a better look now on her maternal side. I got rid of the small purple segment.

Looking At the Raw DNA Phasing – Paternal Side

I have two spreadsheets summarizing the results of the many hours of work it took to phase my family’s DNA from the raw data. One spreadsheet is for the paternal side phased DNA and the other is maternal. I have patterns for both sides. They are based on the order of my siblings: me (Joel), Sharon, Heidi and Jonathan. So an ABBB pattern would mean that Sharon, Heidi, and Jonathan all get their DNA from one grandparent, and I get mine from the other. Here is the paternal spreadsheet:

dadpatternchr10

These patterns go logically one to the other. The first pattern goes from AABA to AAAA at position 2,605,158. The B changed to an A in Heidi’s position, so the crossover goes to her at that position. I have a column called GaptoNext. This is based on the number of tested SNPs between patterns. When this number is large, I suspect an AAAA pattern. That was the case above highlighted in yellow. Except there is a problem. To go from ABAB to AAAA means 2 changes, and there should only be one change (or crossover) at a time. This caused me to look at the bases.
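
The one change at a time check is easy to sketch in Python. This is an illustration rather than anything from my spreadsheet or Access query. It compares two patterns letter by letter in the sibling order given above and lists who changed. It assumes both patterns use the same A/B labelling, and the function name `changed_siblings` is hypothetical.

```python
SIBLINGS = ("Joel", "Sharon", "Heidi", "Jonathan")


def changed_siblings(before, after):
    """List the siblings whose letter differs between two patterns."""
    return [
        sibling
        for sibling, old, new in zip(SIBLINGS, before, after)
        if old != new
    ]


one_change = changed_siblings("AABA", "AAAA")   # a normal single crossover
two_changes = changed_siblings("ABAB", "AAAA")  # the suspicious transition

# More than one change between neighboring patterns suggests that a
# pattern in between was missed.
needs_review = len(two_changes) > 1
```

The first transition gives only Heidi, which agrees with the crossover I assigned to her. The second gives two siblings at once, which is the signal that sent me back to the bases.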

A Paternal pattern missed

Here is what I found.

chr10patternmissed

I had missed an AABA pattern at Build 36 Position 30,683,878. I took another look by setting my MS Access query so that Sharon and Heidi would have a different base from Dad:

chr10rawpatterns

This shows that there is a change from ABAB to AABA even sooner than I thought, between ID 400008 and 400045. This is an ID I created that sequentially numbers the tested SNPs. You can see another way I missed this pattern: I didn’t fill in the missing bases. TTC? should be TTCT. CCT? should be CCTC.

What does the missing pattern represent?

The pattern of ABAB to AABA is actually my crossover (Joel). It is a bit more difficult to see than the others. That is because the ABAB pattern is the same as BABA. The change of BABA to AABA is my change of the first B to the first A. Naturally, I put myself in the first position. In rough terms, that gives me a paternal crossover at about position 30.5M. This is a good location as it does not interfere with a large match that I have with an unknown paternal DNA relative named Shamus:

shamus
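
Because ABAB and BABA describe the same split of siblings, a comparison of two patterns should try both labellings and keep the one with fewer changes. Here is a minimal Python sketch of that, again only as an illustration. The names `flip` and `crossover_owners` are hypothetical.

```python
SIBLINGS = ("Joel", "Sharon", "Heidi", "Jonathan")


def flip(pattern):
    """Swap the A and B labels; ABAB and BABA are the same grouping."""
    return pattern.translate(str.maketrans("AB", "BA"))


def crossover_owners(before, after):
    """Siblings who change between two patterns, allowing for the
    A/B labels of the first pattern to be swapped."""
    def differences(first, second):
        return [s for s, a, b in zip(SIBLINGS, first, second) if a != b]

    as_written = differences(before, after)
    relabelled = differences(flip(before), after)
    return min(as_written, relabelled, key=len)


owners = crossover_owners("ABAB", "AABA")
```

Read literally, ABAB to AABA looks like three siblings changing. After relabelling ABAB as BABA, only the first position changes, which is mine.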

Here is my corrected Dad Pattern for Chromosome 10:

dadpatternchr10corrected

I have gone from 6 to 8 crossovers as the previous correction led to another one. I also took out one of Heidi’s crossovers that I had wrongly identified. So fixing one problem fixed a lot of others. It helps to describe the start and stop of each pattern and to describe each crossover. The important results are the person and the last Position column. These show who the crossover belongs to and where that crossover occurs on the chromosome. I then entered the paternal crossover results into the MacNeill Spreadsheet and got this:

patchr10chart

I took out the large space between the siblings. The problem is that the space is now the same as between the maternal and paternal phased part for each sibling. Excel has no happy medium that I’ve found.

The blue is Hartley and green is Frazer. The raw phasing in the upper part of the chart matches with the cousin matches below. It is interesting that some of the cousin matches define the crossovers. For example, the Jim to Sharon match gives Sharon’s crossover. Also the Paul to Sharon match gives Sharon’s other crossover. The Paul to Jonathan match gives Jon’s first crossover.

The Maternal Side

Hopefully resolving the maternal phasing will be easier than the paternal side. My visual phasing only showed four crossovers. Here is my unfinished spreadsheet showing 5 crossovers (under the Person column):

maternalchr10

Here, it looks like I already added an AAAA pattern to the end. That was because the AABA pattern ended at about 114M and the Chromosome itself ends at about 135M. My GapstoNext column showed that gap as almost 20,000 SNPs.

My question now is: should I add an AAAA pattern to the beginning also? Perhaps. An AAAA pattern means that 4 siblings match and all got their DNA in that area from their maternal (in this case) grandmother. Those results were consistent with how I had the visual phasing done. In fact, the visual phasing indicated that the 4 siblings should all get their maternal DNA from the Lentz side up until about 60M.

Let’s take a closer look. This gets at my first note above in the spreadsheet image. There were only 3 single SNPs showing the AAAB pattern and they were spaced a long way apart – over 10 Megabases each. In this case, I will disregard those 3 widely spaced patterns as some type of mistake and stay with the AAAA pattern. Once I made the change from the AAAA pattern to the AAAB pattern, that brings us up to about 60M for my (Joel’s) first crossover. That seems to fit well. That leaves us with 4 crossovers – one per sibling as opposed to the two per sibling on the paternal side.
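
Disregarding isolated single-SNP patterns can be sketched as a small filter in Python. This is not how my spreadsheet works; it is just one way to express the idea. Each run is a pattern with the number of SNPs supporting it, runs below a threshold are dropped, and neighboring runs with the same pattern are merged. The function name `drop_weak_runs`, the threshold of 2 SNPs, and all of the SNP counts below are assumptions made up for the example.

```python
def drop_weak_runs(runs, min_snps=2):
    """Drop runs supported by fewer than min_snps SNPs and merge the
    neighboring runs that end up with the same pattern."""
    kept = []
    for pattern, count in runs:
        if count < min_snps:
            continue  # treat an isolated SNP as a likely error
        if kept and kept[-1][0] == pattern:
            kept[-1] = (pattern, kept[-1][1] + count)
        else:
            kept.append((pattern, count))
    return kept


# Hypothetical counts: three lone AAAB SNPs scattered through AAAA.
runs = [
    ("AAAA", 500), ("AAAB", 1),
    ("AAAA", 700), ("AAAB", 1),
    ("AAAA", 600), ("AAAB", 1),
    ("AAAA", 400), ("AAAB", 900),
]
cleaned = drop_weak_runs(runs)
```

After filtering, the example collapses to one long AAAA run followed by one well supported AAAB run, so only one crossover remains between them.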

First I’ll compact the Gedmatch browser results, then show the raw DNA Phasing results on the MacNeill Chart:

gedmatchcheckofrawphase

chr10phasemap

When I compare the results, I see a problem I had with the visual phasing. The next to the last crossover looked to belong to Sharon, but instead it belonged to Heidi. Also Jon’s second paternal crossover should have been marked as an “F” above. That was just a typo. The third J for Joel crossover that I had above was not a crossover. In the middle, the 2 close crossovers of J and S should be instead S and J if I’m reading the MacNeill Chart correctly. It looks like all the FIRs and HIRs, etc. match. Once I did the raw DNA phasing, it was obvious how the Gedmatch browser results had to match the raw DNA phasing results. Before I did the raw DNA phasing, it was not so obvious.

I’m happy with the results. I get to pick whatever colors I want for the four grandparents. It still would be nice to have some sort of labels or color key. After a hard day of phasing DNA, it is rewarding to see the results displayed so nicely. Thank you Mr. MacNeill.

A few observations:

  • The 4 siblings did not inherit any Rathfelder DNA (brown) on the left side of Chromosome 10
  • Lentz DNA (yellow) is missing from the right side of the Chromosome for the same 4 siblings
  • As I have my mother’s DNA results, that would make up for the missing DNA from those 2 maternal grandparents
  • Short segments of Hartley DNA (blue) are missing near the beginning and near the end of the Chromosome (i.e. none of the four siblings inherited Hartley grandfather DNA in those areas).

Summary

  • M MacNeill has the best display that I am aware of for mapping phased DNA.
  • The final mapping is like the final exam where previous mistakes are brought out, but there is a chance to correct them.
  • The phasing process is difficult, but there are built in checks and balances to find and correct mistakes or missed patterns.
  • The raw DNA phasing procedure (I use the Athey method) would generally be used if a parent has been tested and the visual one is used if a parent has not been tested. However, the visual phasing as developed by Kathy Johnston is important to use as a framework for the raw DNA phasing as well as a check for the end result.
  • The raw DNA phasing results appear to be better than what I was able to get using the visual phasing. Not because the visual phasing method is bad; more because I have not mastered it.
  • If you are using someone else’s spreadsheet, it is a good idea to know how they work in case anything goes wrong.
  • After writing many blogs on visual and raw data DNA phasing, it is nice to see everything come together using the MacNeill Spreadsheets and Charts.

More Dicks DNA – Marilyn’s Brother

I just finished 2 Blogs on the Henry Dicks Line which is a parallel line to my wife’s Christopher Dicks Line. Then I heard that Marilyn had her brother tested. Marilyn is on 2 different Christopher Dicks Lines.

Henry Dicks Line Updates

In other news, I found out that Eric’s dad, Claude, has been tested for DNA. What is more, it is Claude that Eric believes to be likely related on the Henry Dicks Line. The confusing part was that Eric was in a Triangulation Group with my mother in law Joan and Joan’s half Aunt Esther. That means that for now (as I understand it) Eric’s TG with my wife’s side of the family may not refer back to a Dicks ancestor. I’ll take Eric off the TG Matrix for now and put his father into the Dicks family comparisons. The good news is that there are a lot of Dicks descendants around. The bad news is that it is difficult to keep track of all of them.

I also got this note recently from Crystal from the Henry Dicks Line:

In looking at Ivy’s ancestors, We also share another ancestor. We are both related to The Vatchers as well as the Matthews and the Dicks. Burgeo is so small that you bound to be related in 2 or 3 different ways going way back!

In addition, Crystal tells me she has extra Dicks DNA on her dad’s side as shown here on this Henry Dicks Chart. Her mom’s side of the Dicks line leads up to the first pink rectangle. I have Crystal in a slightly different green to make sure I don’t forget she is in two Dicks Lines.

henrychartnew

Back to the Christopher Dicks Line and Marilyn’s Brother

Here is an updated Christopher Line Chart. All I did was add Marilyn’s brother Howie to an old chart I had:

marilynsbrotherchart

The chart is getting tiny. So I will point out that Marilyn and her brother are on the Joyce and Cran Lines. The Joyce Line is the large Line to the right of center and the Cran Line is on the right. That reminds me of something I brought up in an email. My wife’s 1/2 great Aunt Esther has 2 Dicks Lines also. One is through Christopher. The other one she doesn’t share with Joan due to the 1/2 part. However, I noted that Esther is in 3 TGs that she does not share with Joan. In those 3, she shares all 3 of them with people from the Adams Line. The Adams line is the one on the left.

esthernonjoantgs

These are the non-Joan, Esther TGs. They all have Nelson in them and two of them have Sandra. I just need to check to see if Esther’s other Dicks ancestor might fit in. “Hi Sandra, any room for Esther’s ancestor?”

However, when I look at Esther’s tree, this is what I see:

estherstree

Assuming that this tree is right, there is no room for Jane Ann Dicks in Sandra’s tree. That is because Jane Ann was b. 1841 and Sandra descends from Elizabeth Dicks b. 1809 who married Thomas Adams. Sandra would have descended from a male Dicks. I will leave this as a mystery for now. Perhaps the 3 TGs above between Esther and Nelson are non-Dicks TGs.

Marilyn’s Brother and Claude

Now I will compare all those who have Dicks ancestors. I will look especially at Marilyn’s brother and Claude (Eric’s) dad who may have Dicks ancestors. This resulted in 754 lines of matches. However, each match is listed twice, so there are only 377 matches. A lot of these matches are between close relatives. There would be a lot more matches if I had included Eric and Larry in the mix.
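
The halving from 754 lines to 377 matches comes from each match being listed once from each person’s point of view. A minimal Python sketch of removing the mirrored rows is below. The row layout, the names in the rows and the positions are all hypothetical; my actual comparison was done in a spreadsheet.

```python
def unique_matches(rows):
    """Keep one row per match, treating (A, B) and (B, A) as the same."""
    seen = set()
    unique = []
    for person_a, person_b, chromosome, start, end in rows:
        key = (frozenset((person_a, person_b)), chromosome, start, end)
        if key not in seen:
            seen.add(key)
            unique.append((person_a, person_b, chromosome, start, end))
    return unique


# Hypothetical rows: each match appears twice, once per direction.
rows = [
    ("Person A", "Person B", 2, 100, 120),
    ("Person B", "Person A", 2, 100, 120),
    ("Person C", "Person D", 12, 50, 75),
    ("Person D", "Person C", 12, 50, 75),
]
deduplicated = unique_matches(rows)
```

Four mirrored rows reduce to two matches, in the same way that 754 lines reduce to 377.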

Chromosome 2

Here we have a complicated stretch of DNA:

tg2dicks

This may take a bit of explaining. Previously, I had this as two TGs:

  • TG2D (180-192) with Sandra, Nelson, Denise and Joan
  • TG2E (201-209) with Sandra, Nelson and Marilyn

I see now that Denise should have been in the TG2E. Now we can add Howie to TG2E also. There is another way to look at this TG. That would be that it is a larger TG and that Joan’s DNA didn’t extend to the higher end of it and Marilyn and Howie’s DNA didn’t extend to the first part of it. A few other things:

  • Kenneth and Judy are not in this TG. As they both descend from a Miller line, that would be a likely source of their DNA match.
  • Kirsten also does not appear to be in the TG. I’m not sure how to explain the matches between Kirsten and Marilyn and Kirsten and Howie. The simplest explanation would be that Marilyn and Howie are in the TG through their father’s side and match Kirsten on their mother’s side. However, I don’t know enough about everyone’s genealogy to know if that is feasible.
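
The idea of one larger TG with people dropping in and out can be checked with a simple overlap test. Here is a minimal Python sketch using the positions of the two earlier TGs. The function name `overlap` is hypothetical, and the 180 to 209 span for the larger TG is my assumption, taken as the start of TG2D through the end of TG2E.

```python
def overlap(first, second):
    """Return the shared (start, end) of two segments, or None."""
    start = max(first[0], second[0])
    end = min(first[1], second[1])
    return (start, end) if start < end else None


larger_tg = (180, 209)         # assumed span covering both earlier TGs
joan = (180, 192)              # TG2D range
marilyn_and_howie = (201, 209) # TG2E range

joan_in_tg = overlap(joan, larger_tg)
siblings_in_tg = overlap(marilyn_and_howie, larger_tg)
joan_vs_siblings = overlap(joan, marilyn_and_howie)
```

Joan and the Marilyn and Howie pair each overlap the larger span, but they do not overlap each other, which is why this can look like two separate TGs.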

Here is the larger TG drawn out:

tg2chart

This was a little tricky to draw. What this is supposed to represent is that Sandra, Nelson and Denise are in the larger TG. Joan (in yellow) is in the first part of it and Marilyn and Howie are in the second part of it. I guessed that Marilyn and Howie might be in the box on the right as none of the other four Joyce line descendants are in this TG.

Crossovers

I can give a likely reason Joan dropped out of this TG and Marilyn and Howie dropped in. It has to do with crossovers. Let’s look at Joan first. Joan has 2 copies of her Chromosome 2 as we all do. One is maternal and one is paternal. Joan’s Dicks DNA comes from her maternal side. Joan’s maternal DNA is made up of her mother’s two parents’ DNA joined together (recombined). Those 2 parents were Joan’s grandfather Frederick Upshall and grandmother Daly. Joan’s maternal Chromosome 2 alternates between Upshall (whose mother was a Dicks) and Daly.

Here is a map of my actual Chromosome 2 showing the alternating pattern:

joelchr2

This chart was created by M MacNeill [prairielad_genealogy@hotmail.com]. It is possible to map this out if you have 2 parents tested, or if you have one tested and 2 or 3 siblings tested. There is even a way to map your grandparents with siblings and no parent tested. In the case above, light blue represents my maternal grandmother and dark blue is my maternal grandfather. The light red is my paternal grandmother and the dark red is my paternal grandfather. Everyone’s DNA follows the same type of pattern. The actual positions where the changes occur will be different. The place where a color goes from one to another is called a crossover. Sometimes there is no crossover or recombination and you will have all your DNA on a particular copy (maternal or paternal) of a chromosome from one grandparent instead of two.

Back to the TG at Chromosome 2:

chr2joan

Notice what Joan’s matches with Sandra, Denise and Nelson have in common: they all end around 192M. That should be the place where Joan’s DNA switches from her grandpa Upshall to grandma Daly.

Here is Joan’s Chromosome 2:

joanchr2

This shows her matches with:

  1. Esther
  2. Nelson
  3. Sandra
  4. Denise

To the right of the one blue bar on top of the 2 green bars is where Joan drops out of this Dicks TG. I can almost map Joan’s Maternal grandparents from this Gedmatch chromosome browser. Here is my guess:

joanchr2map

A few notes:

  • Joan’s Daly grandmother is not from Newfoundland
  • Another possibility could be that the Upshall segment could extend to Joan’s matches with #2, 3, and 4, eliminating the first Daly segment I have.

Another interesting question is: Why doesn’t Esther match Joan where Joan matches Nelson, Sandra, and Denise? The answer would be that Esther has Upshall DNA in this area rather than Dicks and Joan got Dicks DNA in this area. It’s a bit confusing as you have to picture what is happening on each side of the match between Esther and Joan.

Marilyn and Howie’s appearance in TG2

I’d like to bring up an interesting point about siblings. Siblings represent the only relationship where you will find appreciable FIRs. FIRs are Fully Identical regions. Here is Marilyn’s match with her brother Howie on Chromosome 2:

marilynhowiechr2

This shows that Marilyn and Howie match each other along the blue line. That is from 0 to 147M. Then they don’t match from 147M to 182M. Then they match again to the end of the Chromosome 2. Above the blue bar are green and yellow areas. The yellow is how we match everyone other than siblings. The green is the FIR. That means a double match. As siblings, Marilyn and Howie share all their 4 grandparents: 2 Paternal and 2 Maternal grandparents. Looking at Marilyn and Howie’s Chromosome 2, I can know what the green, yellow and red regions mean:

  • Green – Marilyn and Howie both share a maternal grandparent and a paternal one. We just can’t tell which one right now.
  • Yellow – Marilyn and Howie both share a maternal grandparent or a paternal grandparent. Again we can’t tell which one right now.
  • Red – Marilyn and Howie share the DNA of neither their maternal nor paternal grandparent.
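
The three colors above follow directly from counting how many sides two siblings have in common at a position. A minimal Python sketch is below. It is only an illustration; the function name `classify_region` and the grandparent labels are hypothetical.

```python
def classify_region(sibling_one, sibling_two):
    """Each sibling is a (paternal_grandparent, maternal_grandparent)
    pair describing where their DNA comes from at one position."""
    shared_sides = sum(a == b for a, b in zip(sibling_one, sibling_two))
    return {
        2: "FIR (green)",
        1: "HIR (yellow)",
        0: "no match (red)",
    }[shared_sides]


# Generic labels: PGF/PGM are paternal, MGF/MGM are maternal grandparents.
green = classify_region(("PGM", "MGF"), ("PGM", "MGF"))
yellow = classify_region(("PGF", "MGF"), ("PGM", "MGF"))
red = classify_region(("PGF", "MGM"), ("PGM", "MGF"))
```

The sketch runs from grandparents to colors. Visual phasing works the other way around, from the colors back to the grandparents, which is why a third sibling or a cousin match is needed to say which grandparent is which.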

Here is the 2nd part of the TG at Chromosome 2:

tg2marilynhowie

The appearance of Marilyn and Howie in this TG is clear: 201M. I just found out recently that there is a way to expand matches to great detail as shown in the Gedmatch Chromosome Browser. Here is Marilyn and Howie expanded at around 201M:

marilynhowiechr2-201

This is difficult to see. The number in the middle is 200M. That is one tick mark away from 201 where Marilyn and Howie enter the TG. Another interesting thing is that Marilyn (Molly) above gets out of the TG at 208 and Howie gets out between 212 and 218.  What does all this mean?

  • Based on the expanded view, Marilyn and Howie are FIR from a little after 195M. They jump into the TG at 201. FIR means that Marilyn and Howie share the same 2 grandparents – one maternal and one paternal. However, without the comparison of another sibling, this is difficult to see. I am assuming that from 195 to 201M, Marilyn and Howie share the same 2 grandparents, but not necessarily the same two as after 201M. At 201M, Marilyn and Howie both get their DNA from their paternal grandmother Sarah Priscilla. Sarah is the one with Dicks DNA.
  • At 208M, Marilyn drops out of the TG before Howie.

Here is an expanded view of an already expanded view of Marilyn and Howie at 208M:

chr2-210

Every little tick mark [^] is 1M. So 2 ^’s before 210M is 208M. That is where Marilyn and Howie go from FIR to HIR. An HIR is a Half Identical Region. That means that Marilyn and Howie match one grandparent (on the maternal or paternal side) and they don’t match the other grandparent (on the opposite of the maternal or paternal side where they do match). This is easier to show by mapping it out:

chr2map

It is clear that from 201 to 208, that Marilyn and Howie are in a TG. They are also FIR. That means that they have 2 grandparents the same (one paternal and one maternal, here represented by blue and yellow). The TG identifies the paternal grandparent as Sarah. She is the one that descends from the Dicks family. We don’t know which Maternal grandparent that Marilyn and Howie got their DNA from. We just know that it is the same grandparent.

At 208M, two things happen. Marilyn exits the TG and is now in an HIR with her brother Howie. HIR means that on one side (maternal or paternal) Marilyn gets her DNA from the same grandparent as Howie, and on the other side she gets it from a different grandparent. In this case, that means that she continues to match the same maternal grandparent, and her paternal DNA switches from Sarah to Jesse.

All this is to say that it is helpful to have a sibling or more tested.

Chromosome 12

Like the TG at Chromosome 2, the TG that Howie is in at Chromosome 12 is not new. It has been described previously. Here is what it looks like in a spreadsheet:

tg12howie

The difference is that there is a Joyce Line TG within an apparent Dicks TG (in gold). Also within the gold TG there are single matches of people from the Henry Dicks Line. That could mean a few things:

  • The green matches are in non-Dicks lines
  • The green matches are with Dicks lines. If that is true, that would mean that the gold TG must point back to the wife of Christopher Dicks who I have as Margaret b. 1789.

In TG2, I had missed Denise in part of the TG. Previously I had missed Pauline in this one. Part of the reason I may have missed Denise in TG2 is that her match with Marilyn was less than 7 cM so didn’t show up at Gedmatch at threshold levels. In this case Marilyn doesn’t match Pauline, because she drops out of the TG right around the spot where Pauline joins in the TG (127M).

Here is Joan compared with Esther, Howie, Marilyn and Pauline:

joanchr12

In the above browser image, Joan’s maternal grandparent mapping would likely go Upshall, Daly, Upshall. One can see where Howie and Marilyn jump into the TG in the 2 yellow bars. You can also see how Marilyn (#3) jumps out of the TG on the right and Pauline (#4) jumps in (green bar).

For comparison, I will show the same matches from Esther’s point of view:

estherchr12

Esther’s view has to be exactly the same for #1 as they are comparing the same 2 people (Joan and Esther). Esther’s view gives a crisper indication of Marilyn’s crossover.

Chromosome 12 is shorter than Chromosome 2, so it should be simpler. Here are Marilyn and Howie compared at Chromosome 12:

mhchr12

Marilyn and Howie have 3 HIR’s, one FIR and one area where they don’t match either of their grandparents. In that area where they don’t match, if Marilyn got her DNA from her maternal grandmother and paternal grandfather, for example, it would mean that Howie would have to get his DNA from his maternal grandfather and paternal grandmother.

We have more detail on the positions from the TG:

tgpauline

Howie and Molly jump into the TG at 114M. Molly jumps out at 126M and Howie jumps out at 132M. Actually, he had to as that is the end of the Chromosome!

Looking at Marilyn and Howie’s expanded view of Chromosome 12, their FIR starts at 101M. That switches to an HIR at about 126.5M. That corresponds to where Marilyn gets out of the TG. It also corresponds to where the green goes to yellow in the Gedmatch Chromosome browser in the image above.

mh12map

This looks similar to the Chromosome 2 map of Marilyn and Howie. This time I was a bit more brave due to my experience with Chromosome 2 and mapped their DNA to the beginning of their HIR rather than just to the beginning of where they jumped into the TG (113M). The reason is that a change at 113M would require a double crossover for these two, which is unlikely. Another note is that the yellow grandparent in this example may not be the same as the yellow one in Chromosome 2. It is just meant to represent one of the maternal grandparents in each case.

One More Question On Crossovers

I’m learning this as I go along. I had determined a crossover above at 126.5M where one sibling left the TG and the other stayed in. However, I did not have a crossover at 114M where both siblings entered the TG. Why is that? I had a crossover at 126.5M because the chromosome browser verified that the siblings were switching from a FIR to an HIR at 126.5M. To me, this verified the crossover in conjunction with the change in TG at the same location. At 114M, there was no change:

chr12pos114m

Above is the close-up view of Marilyn’s match to Howie on Chromosome 12 between positions 110 and 120M. The whole area on either side of 114M is FIR. That likely indicates no crossover at Marilyn’s and Howie’s grandparent level. However, it was Marilyn’s great grandmother Bertha Joyce that had her grandparents’ Dicks and Joyce DNA recombining into a crossover. It is likely that this TG represents the DNA that Bertha Joyce received from her grandparents probably sometime around the American Civil War. I note that the TG that I looked at above at Chromosome 2 followed the same pattern. The crossover was where one sibling left the TG and the other remained. Where the two siblings started in the TG, there was no change in the FIR region to an HIR.

So the answer is that there was a crossover at some point at position 114M, but quite a while before the time period that we are looking at here. So it is hidden in my map above.
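The rule of thumb I used (a TG boundary only counts as a crossover at the grandparent level if the sibling comparison also changes there) can be sketched in Python. The function names and the 1M tolerance are assumptions for illustration; the positions are the approximate ones from the text.

```python
def status_changes(segments):
    """segments: list of (start_M, end_M, status), sorted by start.

    Returns (position, old_status, new_status) wherever the FIR/HIR
    status changes between neighbouring segments.
    """
    changes = []
    for (_, _, old), (start, _, new) in zip(segments, segments[1:]):
        if old != new:
            changes.append((start, old, new))
    return changes


def supported_crossovers(tg_boundaries, changes, tolerance=1.0):
    """Keep TG boundaries that sit within `tolerance` M of a status change."""
    return [b for b in tg_boundaries
            if any(abs(b - pos) <= tolerance for pos, _, _ in changes)]


# Marilyn vs. Howie on Chromosome 12 (approximate positions in M).
chr12 = [(101.0, 126.5, "FIR"), (126.5, 132.0, "HIR")]
changes = status_changes(chr12)
print(changes)                                       # [(126.5, 'FIR', 'HIR')]
print(supported_crossovers([114.0, 126.5], changes)) # [126.5]
```

The 114M boundary drops out because the siblings are FIR on both sides of it, which is the same conclusion reached above.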

Dicks TG Matrix Update

dicksmatrixupdate

  • Here I took out Eric as his father Claude (who is believed to be the one descended from the Dicks family) was not found to be in a TG. Eric was in a TG with Joan and Esther, but that must have been on his maternal (non-Dicks) side.
  • I didn’t add 2 extra columns for Howie, but put him in the appropriate boxes where the existing TGs for his sister Marilyn were.
  • I added Denise to TG2E and Pauline to TG12B. That was an important addition for Marilyn and Howie as it seems to indicate that their Dicks DNA comes from the Joyce rather than the Cran Line in this case. Recall that in TG2E, I was suggesting that this might represent the Cran line for Marilyn and Howie.

The All-Dicks Comparison

autosomalmatrix

The top left box contains the Christopher Line descendants. The bottom right box contains the Henry Dicks Line descendants. This now includes Claude and Howie. For an interesting comparison, run down the two columns of Molly and Howie and see how the total cMs of their matches differ.

Summary and Conclusions

  • I didn’t add any new TGs by the addition of Marilyn’s brother Howie and Eric’s father Claude.
  • Marilyn and Howie are the first known Dicks descendant siblings to have their DNA tested. So I took advantage of that to explain how crossovers work and how they are important in mapping DNA.
  • The combination of the sibling comparisons and TGs made it possible to partially map two of Marilyn’s and Howie’s paternal grandparents on portions of Chromosomes 2 and 12.
  • I also showed a likely scenario for Joan’s crossover point within a TG which would lead to mapping segments that she received from her maternal grandparents
  • I clarified a few issues and refined the Dicks Triangulation Group Matrix

 

 

A Bad AncestryDNA Hint Analyzed

In a previous Blog, I looked at what I called a false AncestryDNA hint. What I meant by that was that the DNA match itself was false. Because I did not match the other person’s mother’s or father’s DNA results, I could not match the person. There was much discussion on Facebook as to whether the AncestryDNA Shared Ancestor Hint (SAH) was good, bad, false, unconfirmed, etc. However, whatever the hint should have been called, there was no sense in following up on a DNA match that was false.

In this Blog I want to look at a SAH that is not false, but only bad. I don’t have a generic definition for what a bad SAH is, but in this case it is an AncestryDNA member match that led to an ancestry tree hint on my father’s father’s side. In this blog, I will show that the actual DNA match as shown at Gedmatch was on my father’s mother’s side.

What AncestryDNA Shows

Here is the shared DNA of my member match Carol:

shareddna

Ancestry has their little DNA symbol and even gives the amount of DNA we share across 1 segment. This DNA matching information is on the very same page with this heading:

dnamatch

Somehow this leads me to believe that the DNA match is leading me to an Ancestry Tree hint right below my DNA member match information:

davissah

Incredible. AncestryDNA has found a hint of two shared 7th great grandparents nine generations away from my match and me. Except that it is incredibly wrong based on the DNA match. How do I know?

My Ancestry Match at Gedmatch

Fortunately, my match, Carol, was wise enough to upload her results to Gedmatch. Here is our match at Gedmatch:

joelcarol

This looks like a decent match. Now I know where we match. I can check on my Visual Phasing map of Chromosome 11. Thanks to Kathy Johnston for the method.

visphase11

The middle bar is me (J). The paternal half is shown as blue and purple. Oh no, it shows that the area that I match Carol (24-36M) is a Frazer segment for me. This is my father’s mother’s side, not my father’s father’s side that had ancestors going back to colonial Massachusetts (John Davis and Hannah Lombard). All my Frazer ancestors were in Ireland before the 1880’s or so. I must have made a mistake. I’m just sitting here at a 10 year old computer that is about to die and Ancestry with its billions of dollars of resources is giving me a hint that they think is right. Ancestry really makes me doubt my work. So I check other Hartley reference matches. I add my brother to the visual phasing. No, it looks like I had it right after all. This is definitely an Irish Frazer match.

Shared Matches?

Perhaps if AncestryDNA had given me some shared matches, it would have tipped me off that this is the wrong DNA. However, at this level of match, apparently they don’t do this.

noshared

But that’s OK. I have Gedmatch which will give me shared matches. Gedmatch will even show how the shared matches match Carol on Chromosome 11 on a Chromosome Browser:

chr11shared

#1 and #2 are matches to Carol that I don’t know. #3 is me and #4 is my sister Sharon. It even looks like these matches to Carol could triangulate. However, Ancestry has told us that triangulation has about a one percent chance or less of happening at this level of relationship. That is why they use circles. Should I go against the advice of the mighty Ancestry? Below is Ancestry’s probability that 3 people will match at the same segment (triangulation).

ancestrydna-insights

I’ve gone this far, so let’s see what happens. Perhaps I will have beaten the Ancestry odds of finding a Triangulation Group (TG). At Gedmatch, I used the Multiple Kit Analysis for Carol and her first 3 matches as shown above and downloaded the segments for Chromosome 11:

tg11carol

I didn’t bother adding my sister Sharon to the mix. It looks like Cheri and Hazel are closely related, but that’s OK. I see that:

  • Hazel matches Cheri
  • Carol matches Cheri
  • Carol matches Hazel
  • Carol matches Joel
  • Joel matches Cheri
  • Joel matches Hazel

That meets the definition of a Triangulation Group.
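The check I did by eye above can be written as a short Python sketch: a group triangulates when every pair in it matches. The function name is made up for illustration, and the sketch assumes the listed matches all overlap on the same stretch of Chromosome 11, which the code itself does not verify.

```python
from itertools import combinations


def is_triangulation_group(people, matches):
    """True when every possible pair of people appears in `matches`.

    `matches` is a set of frozensets, one per matching pair. The sketch
    assumes all listed matches fall on the same overlapping segment.
    """
    return all(frozenset(pair) in matches
               for pair in combinations(people, 2))


matches = {frozenset(pair) for pair in [
    ("Hazel", "Cheri"), ("Carol", "Cheri"), ("Carol", "Hazel"),
    ("Carol", "Joel"), ("Joel", "Cheri"), ("Joel", "Hazel"),
]}

print(is_triangulation_group(["Carol", "Cheri", "Hazel", "Joel"], matches))  # True
```

Four people give six possible pairs, and all six are in the list, so the group passes.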

So far with Carol I have:

  • Checked her against my paternally phased kit to make sure she matched me on my paternal side.
  • Checked her results against my visual phasing map and mapped her to the appropriate grandparent
  • Shown that she was in a TG

To me, this confirms that my match with Carol is a real match on my father’s mother’s Frazer side and not my Hartley side.

Can Ancestry Redeem Itself?

After giving me a ‘bad’ hint, no chromosome browser, and telling me that resistance as well as triangulating is futile, can Ancestry redeem itself? Now that I know my match with Carol is not in Colonial Massachusetts, but Ireland, I can go back and check Carol’s Map and Locations button at AncestryDNA. Hmmm…. where should I look? Perhaps Ireland?

carol-map

Oh look. Carol has ancestors in Enniskillen, not too far from my blue ancestors. In fact, some of my other DNA relatives along the Frazer line have shown Enniskillen as a home base.

Summary and Conclusion

  • Ancestry gave me a ‘bad’ hint in that the DNA they used to point me to Colonial Massachusetts should have pointed me to Ireland
  • By implying that their DNA match leads to a specific tree, they also imply where the DNA came from. In this case the implication was the DNA match inferred my Hartley ancestors. In fact, I have shown that the DNA points to my Frazer grandmother whose parents were both born in Ireland.
  • Ancestry Shared Ancestor Hints take the easy way out. They point to places with good records and trees that are relatively easily created rather than to the places with more difficult ancestry such as Ireland. That is not helpful in furthering my research.
  • The Colonial Massachusetts match between Carol and myself may be correct. However, there is no sense trying to confirm or deny our shared Colonial Massachusetts ancestry with a DNA match that leads to late 1800’s Ireland.
  • I seriously doubt Ancestry’s probability of finding triangulation at the 4th cousin level between three people as being 1% or less. The 2 surname groups that I have worked on have large matrices of TGs for each surname.
  • Ancestry Hints are not useless without Gedmatch. However, they can be misleading.
  • It looks like close to 90% of my SAH’s are in the Distant Cousin range. Be very wary of these Distant Cousin matches that have not been verified by Gedmatch.

 

 

 

 

A False AncestryDNA Shared Ancestor Hint Analyzed

In this Blog, I would like to look at a Shared Ancestor Hint at AncestryDNA that appears to be a false DNA match. Deborah matches myself and my 3 siblings in the 6 cM range according to AncestryDNA using their own method which is supposed to take out false matches. Deborah’s match also results in a Shared Ancestor Hint (SAH) with myself and each of my 3 siblings:

deborahsah

This SAH looks fairly straightforward. It shows we have colonial Massachusetts ancestors that lived in Plymouth. Here is my phased paternal match with Deborah at Gedmatch:

phasedpatdeb

These results were also consistent with the visual phasing I did between 2 of my siblings:

chr13

The above Chromosome 13 map shows that the match with Deborah was in my Hartley grandfather’s region to the right of each blue bar. The first bar shows Sharon’s DNA, the second shows mine and the third is my sister Heidi’s.

This was initially exciting. I now had a chance to identify specific DNA and assign that DNA to a specific set of colonial ancestors. I contacted Deborah and found out that both her parents were tested also and those results were at Gedmatch. She affirmed that the match would be on her mother’s side (based on genealogy). I compared my kit with Deborah’s mother’s and got no match. That appeared to prove that the Ancestry Shared Ancestor Hint was incorrect. The DNA that Ancestry matched with Churchill and Burbank could not be right.

This led me to believe that I matched on Deborah’s father’s side. I checked that match at Gedmatch, which came up blank. I wrote back to Deborah to say that this looked like my first confirmed Identical By Chance match.
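The logic of this check is simple enough to put in a few lines of Python. This is only a sketch with a made-up function name: it assumes both of the other person's parents have tested, and that each comparison is done at the same threshold.

```python
def classify_match(matches_child, matches_mother, matches_father):
    """Classify a segment match to someone whose parents are both tested.

    A real (inherited) segment has to come through one of the parents,
    so it should also show up in a comparison with that parent.
    """
    if not matches_child:
        return "no match"
    if matches_mother or matches_father:
        return "could be real"
    return "likely identical by chance"


# My match with Deborah: I match her, but neither of her parents.
print(classify_match(True, False, False))  # likely identical by chance
```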

Just to make sure, I compared Deborah’s family and my family:

deborahandhartley

What this shows is that neither I nor any of my siblings match Deborah’s parents. That is shown by the blank pink areas in the upper right and the lower left portions of the chart. Again, that means our match is not real. Yet, Deborah shows up as having a Shared Ancestor Hint for me and my 3 siblings. What is also interesting is that my sister Sharon has a 21.6 cM false match with Deborah. However, when I compare Sharon and Deborah ‘One to One’ at Gedmatch, I get a more reasonable result:

sharondeb

I suppose Sharon and Deborah share smaller segments. I checked again with a 5 cM threshold and indeed Sharon and Deborah share 2 more segments between 5 and 7 cM.

Any False Triangulation?

One other way to check Deborah’s match with my family is to see if she triangulates. I compared Deborah and Sharon in the Utility at Gedmatch that shows others that match both Sharon and Deborah. Here are the results of those matches at Chromosome 13 as shown in a browser:

sharondebchr13

Deborah and Sharon’s match is the blue top right one. Here there is clearly no indication of triangulation with Sharon and Deborah. I would expect many other matches lining up below the top right blue match bar if there were any triangulated matches. This is further indication of a false match.

Summary and Conclusions

  • Ancestry is wise in not supplying me with a Chromosome Browser as it would prove some of their Shared Ancestor Hints as false.
  • A corollary would be, if you don’t want to prove AncestryDNA Shared Ancestor Hints wrong, don’t use Gedmatch to compare your results.
  • As others have noted, it is not enough to match someone through your phased kit. You also have to match the other person’s phased kit (or one of their parents) to be a real match.
  • This analysis applies to a relatively small match. This Shared Ancestor Hint was also at or near the bottom of the AncestryDNA list.
  • Be wary of the smaller matches. Focus on the larger ones unless you have good analysis such as triangulation to verify a smaller match.

 

DNA Phasing of Raw DNA When One Sibling is Missing: Part 10

In this Blog, I would like to portray my phasing results in an Excel Bar Chart if possible. This has been one of the most difficult parts of phasing my DNA for me.

I have looked at Stacked Bar Charts in Excel as they seem to be the closest to what I am looking for. Today I looked at a method for producing Gantt Charts at ablebits.com which seems to hold some promise of application for DNA mapping:

bar-chart-excel

I had my Maternal Patterns’ Starts and Stops from my last blog. I took those and converted them to Build 36 and put them in a spreadsheet:

momcrossoverstable

Start is the ID# I was using. Start36 is the Chromosome position of the Start of the pattern in Build 36. App ID is the approximate position of the Crossover. Then I have that same location in Build 37 and Build 36. Following the logic in the Ablebits.com tutorial, I have the first Maternal Crossovers for Chromosome 7 in my simplified Chart:

matfirstxover7

I got this by choosing the Build 36 column and choosing Insert Stacked Bar. I suppose a better Title would have been Chromosome 7 Maternal Crossover rather than Build 36. This was taken from my Column Header. The goal is to get a 2 color bar above. However, I already see a problem. The bar needs to be different colors for different people. Well, I have to start somewhere.

Next, I put in the next crossover location for each person. I took this position and subtracted from it the first Crossover to get a length.

step2crossexcel

You may note that the Bar Chart inverts the original order. It gives Sharon a 4 which is now on top. Here is my visual phasing of Chromosome 7 that I am trying to replicate:

chr7visphase

My Excel Bar Chart order is Sharon, Jon, Joel, Heidi. My visual phasing order is Sharon, Joel, Heidi, Jon. The 2 maternal colors I have above are green and orange representing Lentz and Rathfelder. If I keep orange as Rathfelder, that means I want to change bar 2 and 3 (Joel and Jon) on the Excel Bar Chart. One way to do this is to move over the first Crossovers for Joel and Jon in my spreadsheet:

modchart

However, that made the 2 male siblings’ first maternal grandparent match too long. I needed to move the start over 2 places in my spreadsheet:

mat7revised

Now the Chr7 Maternal Crossover column can be called Lentz and the 2length column can be called Rathfelder.

Next, I added another column for the next Lentz portion of DNA:

chr73rdxover

I was hoping that if I named the next column Lentz, that Excel would give me the same blue as the first Lentz. I was able to right click on the gray and change it to blue. I then added another Rathfelder segment. For this to work in Excel, a Rathfelder length is added rather than a start and stop location.

chr7xover3
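The spreadsheet arithmetic behind the stacked bar (each new segment is entered as a length, found by subtracting the previous crossover from the next one) can be sketched in Python. The function names and the positions below are made up for illustration and are not my actual Chromosome 7 crossovers.

```python
def stacked_lengths(crossovers, chromosome_end):
    """Turn sorted crossover positions into stacked-bar segment lengths."""
    edges = [0] + list(crossovers) + [chromosome_end]
    return [right - left for left, right in zip(edges, edges[1:])]


def label_segments(lengths, first, second):
    """Alternate two grandparent names along the segments."""
    names = [first if i % 2 == 0 else second for i in range(len(lengths))]
    return list(zip(names, lengths))


# Hypothetical crossover positions in M for one sibling (assumption).
lengths = stacked_lengths([20, 55, 90], chromosome_end=159)
print(lengths)  # [20, 35, 35, 69]
print(label_segments(lengths, "Lentz", "Rathfelder"))
# [('Lentz', 20), ('Rathfelder', 35), ('Lentz', 35), ('Rathfelder', 69)]
```

A sibling who starts on the other grandparent just swaps the two names, which is the same problem I solved in the spreadsheet by shifting the first crossover over.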

Again, I had to reformat the Excel-chosen color to be consistent with what I had for Rathfelder. I chose the last position for Heidi and Sharon as the highest that I had as this was their last segment. After a bit of wrangling with Excel, I was able to get this:

chr7

So that is the presentation. However, I notice that on my visual phasing, I had 5 segments for Jon and only 4 here. I missed his last Rathfelder segment. I had ended Jon’s Chromosome too early. Here is the correction:

chr7corrected

It still looks like one of Jon’s crossovers in the middle of the Chromosome may be off, but I’ll have to figure that out later.

Paternal Bar Chart

Now that I have something that looks like a maternal Chromosome Map, I need the paternal side to go along with it. It looks like if I add 4 more rows to my spreadsheet, I may have it.

I did this and I added Hartley and Frazer (my paternal side grandparents) to the right of the maternal side grandparents. I had to make a new chart that came out like this:

chr7matpat

Here #4 is my Paternal DNA. I found it a bit disconcerting that my paternal side was longer than the maternal. Here I’ve added a bit of formatting and made the colors consistent (one color per grandparent):

chr7patmatmap

Well, I guess I’ll just leave this imperfect. It will give me something to work on later. I did change the scale from millions to M’s to be easier to read. The above shows that Jon and Heidi share their paternal grandfather’s Hartley DNA un-recombined on Chromosome 7.

Summary and Conclusions

  • Learning how to phase my raw DNA has been interesting and time consuming
  • Delving into the A’s, G’s, T’s and C’s promotes understanding of one’s DNA
  • I owe a lot to M MacNeill and Whit Athey in learning how to do this phasing
  • Due to the data intensive nature of phasing, I would recommend the use of MS Access or some other database software.
  • An understanding of Excel or similar spreadsheet software is also important.
  • I had tested my brother Jon as an afterthought. It turned out that his test results were important in determining the phasing of the 4 siblings.
  • I have the overall skeleton of the phasing with crossovers. There is still a lot of work to complete the individual Chromosomes and trouble shoot problem areas.
  • Further, I have not worked on the X Chromosome due to the different nature of that Chromosome. My brother and I are already phased. My sisters are not.
  • Once these maps are done they will be a reference to all matches to my 3 siblings and myself.

DNA Phasing of 4 Siblings When One Parent Is Missing: Part 9

Mom Patterns

Up to this point, I have phased 4 siblings based on 3 principles outlined by Whit Athey. I have looked at the bases the 4 siblings had from their Dad. Those Dad bases made up patterns. Based on those patterns, other Dad bases were added to those siblings within those pattern areas. After those bases were added, mom bases were added where the siblings were heterozygous. The changes were documented in a Base Tracker.

Start stop using access min max – AAAB Mom Pattern

I can just look at my previous Blog to see what I did for my dad pattern. The results of this query:

mompatternquery

get copied to this spreadsheet where I added a column for Pattern:

mompatternspreadsheet

That was my big time saving step from my last query. Before I run each Min Max Total Query, I check a regular Select Query to make sure I have the right pattern. For example, here is my ABBA Mom Pattern check:

abbacheck

In a few minutes, I have 111 Start/Stop Mom Pattern pairs. This time, I’ll add conditional formatting to point out the one position patterns:

startstoponepattern

These single patterns tend to mess me up as I’m looking for patterns, so I’ll take them out of my spreadsheet, but not out of my Access data tables. There were 10 of these. I don’t know if that is a lot.
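Setting aside the one-position patterns amounts to splitting the Start/Stop pairs on whether Start equals Stop. Here is a minimal Python sketch of that step. The function name is made up and the ID numbers are invented, apart from ID# 548 which comes up as a one-position ABAA pattern on Chromosome 1.

```python
def split_single_position(pairs):
    """pairs: list of (chromosome, pattern, start_id, stop_id).

    Returns (ranges, singles): pairs that span more than one position,
    and pairs whose start and stop are the same ID.
    """
    ranges = [p for p in pairs if p[2] != p[3]]
    singles = [p for p in pairs if p[2] == p[3]]
    return ranges, singles


# Mostly invented IDs for illustration (assumption).
pairs = [
    (1, "AAAB", 12, 540),
    (1, "ABAA", 548, 548),   # one-position pattern
    (1, "ABBB", 549, 2314),
]
ranges, singles = split_single_position(pairs)
print(len(ranges), len(singles))  # 2 1
print(singles)                    # [(1, 'ABAA', 548, 548)]
```

The singles are kept in their own list rather than thrown away, in the same spirit as leaving them in the Access data tables.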

Getting better starts and stop for the mom patterns

The next step takes a little while. I look at the [now] 95 Start/Stop pairs for the various patterns. I highlighted the overlapping areas in yellow:

mompatternoverlaps
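The yellow highlighting was done by hand, but the same test is easy to sketch in Python: sort the pairs by Start and flag any pair that starts at or before the previous pair's Stop. The function name and most of the ID numbers are made up for illustration. Note that this only compares neighbours in the sorted list, so a very long pattern overlapping several later ones would only be flagged against the next one.

```python
def find_overlaps(pairs):
    """pairs: list of (pattern, start_id, stop_id) for one chromosome.

    Returns neighbouring pairs (after sorting by start) whose ID ranges
    overlap, i.e. the second starts at or before the first one stops.
    """
    ordered = sorted(pairs, key=lambda p: p[1])
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[1] <= a[2]]


# Partly invented IDs (assumption).
chr1 = [
    ("ABAA", 1, 600),
    ("ABBB", 549, 2314),   # starts before ABAA stops: overlap
    ("AABB", 2317, 5000),  # clean break after 2314
]
for first, second in find_overlaps(chr1):
    print(first, "overlaps", second)
# ('ABAA', 1, 600) overlaps ('ABBB', 549, 2314)
```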

Actually, the first pattern overlaps into the second also. Some of these may be caused by single location patterns. For example at Chromosome 1, when I got to ID# 548 I find this:

momchr1

There is an ABAA Pattern, but it only lasts for one position and then is on to an ABBB pattern. I copied the end location for ABAA and put it at the end of Chromosome 1 to check later and made note of the one position pattern:

onepatternchr1mom

After that, it makes more sense that the ABBB pattern Stop at 2314 goes into an AABB Pattern Start at 2317. Here is the adjusted Chromosome 1 for my siblings’ Mom Patterns:

cleaneduppatternsmom

I moved the first Start to the Start of Chromosome 1 and last Stop to the end of Chromosome 1 as they were already pretty close to those positions. All combinations of patterns are represented here except for ABAB. I don’t have a start and stop for the single patterns as I’ll be taking them out later.

Filling In Mom Patterns

Now that I have all the mom patterns and their starts and stops as well as I can, I will fill in the patterns. I’ll start with AAAB. First I use the Concatenate formula in Excel to get my starts and stops in Access language. Then I sort the patterns in Excel:

mompatterssorted
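For anyone who would rather not use the Concatenate formula, here is a rough Python equivalent. It is a sketch only: it assumes the criteria are written as Between ... And ... ranges on a field called ID, and the field name, function name and ID ranges are all made up for illustration.

```python
def access_criteria(ranges, field="ID"):
    """Build a criteria string from (start, stop) ID ranges.

    The field name and the Between/Or layout are assumptions about how
    the ranges would be typed into a query.
    """
    return " Or ".join(
        f"([{field}] Between {start} And {stop})" for start, stop in ranges
    )


# Invented AAAB start/stop IDs (assumption).
print(access_criteria([(1, 547), (9000, 9800)]))
# ([ID] Between 1 And 547) Or ([ID] Between 9000 And 9800)
```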

I have 19 AAAB Mom Patterns. Next I go into Access and create an Update Query using the table called tbl4SibsNewMomPatternsFillin. In the AAAB Pattern, I will want to fill in the missing A’s.

updatequeryaaabmom

This looks like a good query, but I want to track how many bases I’m updating, so this query would make it difficult to track that as I’m adding bases to Sharon and Heidi. So again, I will go with the simpler query.

aaabsimplerupdate
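The idea behind the fill-in can be sketched in Python. In an AAAB Mom Pattern the first three siblings (Joel, Sharon, Heidi) got the same base from Mom, so a known base for any one of them can be copied to the others within the pattern's Start and Stop. This is not the Access query itself; the function name, data layout and IDs are made up for illustration. The returned counts play the role of the Base Tracker.

```python
def fill_aaab(rows, ranges, same=("Joel", "Sharon", "Heidi")):
    """Fill missing mom bases for the siblings who share a grandparent.

    rows:   {id: {sibling: base or None}}, updated in place
    ranges: (start_id, stop_id) pairs where the AAAB pattern holds
    Returns the number of bases added per sibling.
    """
    added = {name: 0 for name in same}
    for row_id, bases in rows.items():
        if not any(start <= row_id <= stop for start, stop in ranges):
            continue
        known = {bases[n] for n in same if bases[n] is not None}
        if len(known) != 1:
            continue  # nothing to copy, or the known bases disagree
        base = known.pop()
        for n in same:
            if bases[n] is None:
                bases[n] = base
                added[n] += 1
    return added


# Invented rows (assumption).
rows = {
    100: {"Joel": "A", "Sharon": None, "Heidi": None, "Jon": "G"},
    101: {"Joel": None, "Sharon": "C", "Heidi": "C", "Jon": None},
    500: {"Joel": "T", "Sharon": None, "Heidi": None, "Jon": None},  # outside range
}
print(fill_aaab(rows, [(90, 200)]))  # {'Joel': 1, 'Sharon': 1, 'Heidi': 1}
```

Jon is left alone because he is the "B" in AAAB, so his base from Mom came from the other grandparent.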

Here is the first Mom Pattern Fill-in update on the Base Tracker:

basetrackeraaabmom

I continued the same process down the Mom Patterns, filling in what was missing from each of the siblings:

momfillinupdatetracker

In each case for each pattern, I added less than 5,000 bases to each sibling. I also added to my spreadsheet a percentage of overall phasing which is now at 89.1%. This is how the 4 siblings are phased on average. Jon, who tested with the Ancestry V2 is bringing the other siblings’ overall average down.

Principle 3 – Dad Bases From Mom Bases

This is the icing on the cake for me. After all the work of determining Patterns and Starts and Stops, I have an easy step to add bases. Principle 3 says if you are heterozygous and you know one of your bases is assigned to one parent, then the other base must be assigned to the other parent.

I had to look at my previous blog to see how I did this. Let’s see if this looks right:

udatedadfrommom

The first column makes sure that I am heterozygous as my 2 alleles are not the same. The 2nd column says that I know that I got allele2 from Mom. The 3rd column says to put my allele1 as the one I got from dad. That seems to make sense. This results in 9523 rows of updates in 22 Chromosomes. In part 2 of this Update Query, I switch the alleles:

dadfrommomforjoel

This says if my allele1 is from Mom assign allele2 to be from Dad.
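Both parts of the Update Query follow the same rule, which can be sketched as one small Python function. The function name is made up for illustration. The homozygous case is deliberately left out, since this step only deals with heterozygous positions.

```python
def dad_from_mom(allele1, allele2, from_mom):
    """Principle 3 sketch: infer the base from Dad at one position.

    Only applies when the person is heterozygous and the base from Mom
    is already known. Returns None when nothing can be inferred here.
    """
    if allele1 == allele2 or from_mom is None:
        return None          # homozygous or Mom's base unknown
    if from_mom == allele2:
        return allele1       # part 1 of the query
    if from_mom == allele1:
        return allele2       # part 2 of the query (alleles switched)
    return None              # Mom's base is neither allele: leave blank


print(dad_from_mom("A", "G", "G"))  # A
print(dad_from_mom("A", "G", "A"))  # G
```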

Summary of Pattern Filling In and Dad Bases from Mom Bases

btdadfrommom

Here the overall phasing is 90%, but I had a pretty strict measure of phasing. It involved alleles that Jon was not even tested for. Here we are getting a diminishing return. I could continue the process, but I won’t.

Next Steps

Now I have a good idea where all the crossovers are. I need to assign those to siblings. Then I need to figure out how to portray the final results.

Assigning Crossovers to Siblings

I might as well jump right in. I’ll try a Chromosome that MacNeill has mapped. Actually, he only did the 3 siblings at the time, so it may be a little different.

Chromosome 7 Crossovers

This has been mapped by MacNeill to 3 siblings. Let’s see how my mapping compares. Here is the mom pattern:

momstartstop7

Here I have by my own ID’s the start and stop. Then I have gap to the next pattern. This may indicate an AAAA pattern. Under description, I have what the pattern changes are. Then I have the person assigned to the Crossover. Then I have the approximate location of the Crossover. The first line I have the description as ABBA to ABBB. Here, Jon (in the last position) was matching with me as I’m in the first position of course. Then he changed to match with Sharon and Heidi. So I assigned the crossover to him.

Look at the 5th line. The pattern is ABAB to AAAB. This goes through a gap of over 6,000 ID’s. That usually means there is an AAAA pattern there. AAAA could go to AAAB easily, but to go from ABAB to AAAA would take two crossovers. I don’t have a good idea where the crossover is, so I’ll go to Gedmatch. The good news is that I have already tried using visual phasing on this Chromosome:

chr7vismapjon

The crossovers that I looked at above in my spreadsheet were on the maternal side. So that would be the top part of the bar (green-orange). It looks like I have 11 or 12 maternal crossovers, if I did it right. Looking at the top part of the image above, notice the non-match areas. These have no blue bar below and have red areas above. These are important. The reason is that wherever there is one of these areas, there cannot be an AAAA pattern for maternal or paternal. That means that all 4 siblings cannot match the same grandparent in any of these areas. The only potential AAAA patterns, then, are at either end of the Chromosome or in the middle. The middle locations are about 60-70M. Also note that I have Rathfelder as the same match for each sibling from 56-70M.

There is a discrepancy between my spreadsheet crossovers (7) and the visual above (11 or 12). The other problem is that I need a double conversion from my spreadsheet. The spreadsheet is in ID’s which refer to Build 37 locations and Gedmatch is in Build 36.

Before I start converting numbers, I’ll look at what I have for the Dad Crossovers.

dadpatterncrossover7

Here I added a position number for the Chromosome (Build 37). This matches up with the visual phasing above. What is missing would be the crossover for Joel after an AAAA pattern at the beginning of the Chromosome.

Where is Heidi?

As I look at the maternal visual phasing, I see that Heidi has 3 crossovers. On my spreadsheet, she doesn’t have any. One can be explained as going onto the right end of the Chromosome to an AAAA pattern, but what about the other 2 crossovers, in the middle of the Chromosome? I got these positions from an old file where I compared myself to my 2 sisters. Then I put those in a spreadsheet and converted them to Build 37:

findingheidi

The Chromosome position numbers in blue were where I had Heidi’s crossovers. I then went to my Access Database.

Heidi found

heidi

Here is an ABBB Mom Pattern that I missed. Going through the list, I updated my crossover list:

updatedchr7xover

Now I am up to 12 Maternal Crossovers. The AAAA patterns tend to fit in naturally. Note next to the first blue ‘Joel’. There would be no way to go from an ABBB pattern to an AAAB pattern without 2 changes. That is why an AAAA pattern is required within the other 2 patterns.
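The reasoning for assigning a crossover from a pattern change can be sketched in Python. The letters in a pattern are relative to the first sibling (Joel is always A), so a change in Joel shows up as everyone else flipping. The sketch compares the new pattern both as written and with A and B swapped, and keeps whichever reading needs the fewest siblings to switch. The function name is made up for illustration, and the sibling order Joel, Sharon, Heidi, Jon is the one used for the patterns.

```python
SIBLINGS = ("Joel", "Sharon", "Heidi", "Jon")


def changed_siblings(before, after, names=SIBLINGS):
    """Return the smallest set(s) of siblings whose switch explains
    going from pattern `before` to pattern `after`.

    Two equally small answers mean the change needs a double crossover
    (or a hidden pattern in between) and cannot be assigned to one sibling.
    """
    flipped = after.translate(str.maketrans("AB", "BA"))
    options = [
        [n for n, x, y in zip(names, before, candidate) if x != y]
        for candidate in (after, flipped)
    ]
    fewest = min(len(o) for o in options)
    return [o for o in options if len(o) == fewest]


print(changed_siblings("ABBA", "ABBB"))  # [['Jon']]
print(changed_siblings("AAAA", "ABBB"))  # [['Joel']]
print(changed_siblings("ABBB", "AAAB"))  # [['Sharon', 'Heidi'], ['Joel', 'Jon']]
```

The last line is the ABBB to AAAB case: either reading takes two switches, which is why an AAAA pattern has to sit between the two.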

Paternal Crossovers – Chromosome 7

crossoverschr7

Here I only show 2 crossovers, where on my map above, I show 3. I am just missing my own crossover from AAAA to ABBB. This is at the beginning of Chromosome 7. Here is my database table for my Dad Patterns:

chr7dadpattern

The position I have highlighted would still be an AAAA pattern as I have A??A. So that is the last position with that pattern. Id 285993 is the first spot I have the ABBB pattern, so I chose the crossover as ID# 285992 (under App. ID):

 

dadcrossover7

Here is what MacNeill had for 3 siblings at Chromosome 7:

macneill-chr7

What is now clear from having 4 DNA tested siblings is that my first crossover is paternal and not maternal. For my first crossover to be maternal, I would have had to have gone from an AAAA pattern to an ABBA pattern which would have been a double crossover. Having my brother Jon (the last ‘A’) tested made that clear.

Summary

In this Blog, I have looked at the Mom Patterns created by 4 siblings. Based on those patterns I have filled in alleles from other siblings. I have also filled in alleles for heterozygous siblings. This is based on the Mom allele being known and assigning the other allele as from the Dad. Then I looked at assigning crossovers to the various siblings. Based on the Patterns, it seemed clear who the crossovers should be assigned to. I then checked the crossovers I had with a visual phasing based on gedmatch. This showed where I was missing crossovers, which I was able to add using Chromosome 7 as an example.

Next: How to show the final results?

DNA Phasing of 4 Siblings When One Parent Is Missing: Part 8

Dad Patterns

In my last Blog, I looked at the Whit Athey 3 Principles and used MS Access to assign bases to the paternal or maternal side for the 4 tested siblings of my family. The next step is to look at Dad Patterns. I have been doing this by querying for a pattern and then scrolling down for start and stop positions. This has been quite tedious. It occurred to me that there may be another way to do this.

MS Access Min Max Functions

Access has a function that finds a minimum or maximum value in a group. In this case the group can be Chromosome.

AAAB Dad Pattern – Access to the rescue

 

aaabminmax

To get the total line I hit the summation [totals] icon in the Show/Hide Group. This adds a Group By to each field in the Total row. Here I looked for the Minimum and Maximum ID for each chromosome for the AAAB Dad Pattern. That is where Joel’s base from dad was the same as Sharon’s, Sharon’s base from dad was the same as Heidi’s, and Heidi’s base from Dad was different from Jon’s. Here is the output for the AAAB Dad Pattern:

aaabminmaxresults

This step has revolutionized my work as it saves me from scrolling through 100’s of thousands of dad base AAAB Patterns.  This takes about 2 minutes vs. the old way which seemed like an hour.

The upside of this method is that it is fast. The downside is that it only finds the minimum and maximum of a pattern within a chromosome. It doesn’t find all the breaks in the patterns within the chromosomes.
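For comparison, here is a rough Python sketch of both ideas: the Min/Max per chromosome, and a slower pass that also finds every break inside a chromosome. The sample rows and function names are made up for illustration and are not from the Access table.

```python
# Hypothetical rows: (ID, chromosome, dad pattern)
rows = [
    (1, 1, "AAAB"), (2, 1, "AAAB"), (3, 1, "ABAA"),
    (4, 1, "AAAB"), (5, 2, "AAAB"), (6, 2, "AAAB"),
]


def min_max_by_chromosome(rows, pattern):
    """Same idea as the Access Min/Max totals query: one (min ID, max ID) pair per chromosome."""
    result = {}
    for id_, chrom, pat in rows:
        if pat != pattern:
            continue
        low, high = result.get(chrom, (id_, id_))
        result[chrom] = (min(low, id_), max(high, id_))
    return result


def pattern_runs(rows, pattern):
    """Every unbroken stretch of the pattern as (chromosome, start ID, stop ID)."""
    found, current = [], None
    for id_, chrom, pat in sorted(rows):
        if pat == pattern and current and current[0] == chrom:
            current[2] = id_  # extend the run that is already open
        else:
            if current:
                found.append(tuple(current))
            current = [chrom, id_, id_] if pat == pattern else None
    if current:
        found.append(tuple(current))
    return found


print(min_max_by_chromosome(rows, "AAAB"))
print(pattern_runs(rows, "AAAB"))
```

In this toy data the Min/Max result for Chromosome 1 hides the ABAA row at ID 3, while the run-based version reports it as a break.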

Using this method, in a couple of minutes I have 91 Start and Stop locations for all the possible patterns – except for AAAA.

Here are the sorted results for Chromosome 1:

dadpatternchr1

Note that there are some overlaps that will need to be resolved. However, there are also clean breaks such as between ABBB and ABAB. ABBB stops at ID# 19797 and ABAB starts at 19837. Also note the last line. AABA has the same Min and Max ID#. This means that this is a single AABA pattern apparently within the AABB pattern.
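Spotting the overlaps could also be done outside the spreadsheet. This is a minimal Python sketch. Only the ABBB stop (19797) and the ABAB start (19837) come from the text above; every other number is made up for illustration.

```python
# (pattern, min ID, max ID). Only 19797 and 19837 are from the post; the rest are invented.
segments = [
    ("ABAA", 100, 9000),
    ("AABB", 8000, 12000),
    ("ABBB", 12100, 19797),
    ("ABAB", 19837, 25000),
]


def find_overlaps(segments):
    """Return (pattern 1, pattern 2, overlap start, overlap stop) for every overlapping pair."""
    ordered = sorted(segments, key=lambda seg: seg[1])
    overlaps = []
    for i, (pat1, start1, stop1) in enumerate(ordered):
        for pat2, start2, stop2 in ordered[i + 1:]:
            if start2 > stop1:
                break  # sorted by start, so nothing later can overlap either
            overlaps.append((pat1, pat2, start2, min(stop1, stop2)))
    return overlaps


print(find_overlaps(segments))
```

The ABBB to ABAB boundary is not reported because it is a clean break.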

Looking at the Table

In this step, I’ll look at tbl4SibsNewDadPattern and use the Access Pattern Mins and Maxes to get more accurate Start and Stop points. My spreadsheet above shows that ABAA starts at ID 52. I scroll up from there:

chr1tabledadpattern

At ID# 18 I see ?AG?. I can imagine that being an ABAA pattern, so why not start the ABAA Dad Pattern at ID# 1? Out of 680,000 ID’s, that doesn’t seem too much of a stretch.

Next it seems like the ABAA should stop somewhere before ID# 6605. I’ll hasten the process by a query that looks at the case where Sharon’s base from Dad is not equal to Heidi’s Base from Dad:

abaa-stop

Clearly, there is a break at ID# 5127, so I’ll use that.

chr1dadpatternstartstop

Here, I’ve added a finer Start and Stop for Dad Pattern ABAA. What that means is that in this segment of Chromosome 1, I got my DNA from one of my dad’s grandparents as did Heidi and Jon. Sharon got her DNA from the opposite paternal grandparent.

Here is the Start/Stops filled in:

chr1dadpatternfilledin

I highlighted the 57205 as a reminder that I needed to add an extra ABAA pattern in later. There is a gap between ABAA and ABBB of 1477 ID’s where there is a likely AAAA pattern, which means the 4 siblings got their DNA from the same paternal grandparent.

Finished Start Stop Dad Pattern Spreadsheet

I took out the single patterns and re-sorted by pattern. Then I wrote a formula to get the locations in Access language:

dadpatternstartstop
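The spreadsheet formula itself is only shown in the screenshot. As an assumption about what such a formula produces, this Python sketch builds an Access-style criteria string from a list of Start and Stop IDs. The first range is the ABAA Start and Stop chosen earlier; the second range is made up.

```python
def access_criteria(ranges):
    """Join (start, stop) pairs into text that could be pasted into an Access ID criteria cell."""
    return " Or ".join(f"Between {start} And {stop}" for start, stop in ranges)


# (1, 5127) is from the Chromosome 1 example; (70000, 71000) is a made-up second region.
print(access_criteria([(1, 5127), (70000, 71000)]))
```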

Next I made a copy of my working table in Access to a new table called tbl4SibsNewDadPatternFillin. I’ll use this to fill in the Dad Patterns.

Filling in the First AAAB Pattern

In this pattern, I will be filling in all the missing ‘A’s of the AAAB pattern. I won’t fill in the B as I won’t know if an ‘A’ or a ‘B’ belongs there. Here is my first update query:

aaabupdate

This says if I am missing a base from dad in any of the AAAB Pattern areas that I am in and Sharon has that base, I’ll take the base she has. I can save a little time, by adding on to that query:

joelaaabfromsharonheidi

It is important to put the second ‘Is Not Null’ and ‘Is Null’ on a separate line as that is the ‘or’ line. Otherwise, I would only get the Sharon from Dad and Heidi from Dad bases where they equaled each other.

First I run the query to make sure it shows what I want.

aaabqueryex

It does [although, see below. For one thing I missed the ID criteria in the 2nd line of criteria!]. If I had the criteria all on one line, I wouldn’t have gotten the Heidi from Dad bases where Sharon is missing a base (ID# 63) and vice versa. I will want to check my query later, so I can check it at least two ways. One way is to check at ID# 63 and 99 to see if that base was added. The other way is to see if the Update Query updates 49094 lines as that is the number of lines in the above query.

When I went to run my query, I got this error:

udateaaaberror

Before I give up on this double query, I’ll try one more thing:

heidiorsharonaaabtojoel

Here I say if the conditions I mentioned above apply give either Heidi’s base from Dad or Sharon’s base from dad to me. I note that the update is for 49094 rows, so that seems on the right track. The reason why I don’t mind doing a double query here is that either Heidi’s base from Dad or Sharon’s should always be the same in an AAAB pattern.

I ran this and now I am checking ID# 63:

erroraaab

Unfortunately, Access gave me a -1 instead of Heidi’s C Base from dad. Part of why I wanted to do the one query is so I wouldn’t have to add the 2 queries. However, instead, I’ll just add a line to my base tracker:

basetrackernew

That means that I am back to my simpler query. Sharon should add 3975 bases from Dad to my bases from Dad:

3975row

Heidi was going to add over 2200 of her bases from Dad before Sharon gave me hers. Now it is a lower number:

heidibasestojoel

Now check Line 63:

line63

My base from Dad still isn’t filled in. But that is a good thing. When I checked my double query above, it gave me areas outside the AAAB Pattern area. ID# 63 is actually a different pattern. So that is why the number was so high also. The lesson learned is to keep the queries simple.
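The problem with the double query can be sketched in Python. In the Access design grid, criteria on the same line are combined with And, and separate lines are combined with Or. The region limits and field names below are hypothetical and stand in for the real AAAB ID ranges.

```python
def in_region(row):
    """Hypothetical AAAB region; the real query used the ID ranges from the spreadsheet."""
    return 1 <= row["ID"] <= 50


def buggy(row):
    """Second criteria line is missing the ID criteria, so it matches anywhere in the table."""
    line1 = in_region(row) and row["Joel"] is None and row["Sharon"] is not None
    line2 = row["Joel"] is None and row["Heidi"] is not None
    return line1 or line2


def fixed(row):
    """ID criteria repeated on both lines, so both stay inside the pattern region."""
    line1 = in_region(row) and row["Joel"] is None and row["Sharon"] is not None
    line2 = in_region(row) and row["Joel"] is None and row["Heidi"] is not None
    return line1 or line2


# A made-up row outside the region, similar in spirit to ID# 63.
row63 = {"ID": 63, "Joel": None, "Sharon": None, "Heidi": "C"}
print(buggy(row63), fixed(row63))  # True False
```

The buggy version selects the row even though it sits outside the pattern area, which would also inflate the row count.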

Now I’ve updated my Base Tracker for the AAAB Dad Pattern:

aaabbasetracker

Note that the Heidi from Dad Bases didn’t go up in the second round of this query. After she had gotten her extra Dad bases from me in the AAAB region, Sharon didn’t have any extra ones to give to her that I hadn’t already.

nodadbasestoheidifromsharon

AABA Fill-in

This time Heidi will be left out and Joel, Sharon and Jon will get new bases from dad based on others from the AABA areas. This is the same simple query as before, except that the ID#’s are different:

aabafillin

Here is Jon’s first bases from Dad from one of his siblings:

jonfromdad

This brings up an interesting point. There may be cases where Jon has a phased base at a location which his DNA test didn’t cover.

AABB Fill-in

Here there should be Bases for all siblings. Wherever there is an A and a missing A, add it, and the same for B. Again my first query is the same except for the ID#’s:

aabbfillin

On the AABB bases from Dad, Jon doesn’t have a lot to add to Heidi’s bases, but Heidi has a lot to add to Jon’s:

aabbbasetracker

ABAA Dad Pattern Fill-in

Here we start with Joel being updated with Heidi’s bases from Dad because Sharon is the lone B.

abaa

There are more rows updated as the ABAA Dad Patterns had more regions than the other patterns.
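The fill-in step is the same for every pattern, so it can be sketched once in Python. This is only an illustration of the logic, not the Access update queries used here. It assumes the letters are in the order Joel, Sharon, Heidi, Jon, that a missing base is stored as None, and the sample rows are made up.

```python
SIBS = ["Joel", "Sharon", "Heidi", "Jon"]


def fill_in(rows, pattern, ranges):
    """Inside the ID ranges, copy a known dad base to siblings with the same pattern letter."""
    filled = 0
    for row in rows:
        if not any(low <= row["ID"] <= high for low, high in ranges):
            continue  # keep the update inside the pattern region
        for letter in set(pattern):
            group = [sib for sib, mark in zip(SIBS, pattern) if mark == letter]
            known = [row[sib] for sib in group if row[sib] is not None]
            if not known:
                continue
            for sib in group:
                if row[sib] is None:  # same role as the 'Is Null' criteria
                    row[sib] = known[0]
                    filled += 1
    return filled


rows = [
    {"ID": 10, "Joel": None, "Sharon": "G", "Heidi": "C", "Jon": None},
    {"ID": 63, "Joel": None, "Sharon": None, "Heidi": "C", "Jon": None},
]
print(fill_in(rows, "ABAA", [(1, 50)]))  # 2: Joel and Jon get Heidi's base at ID 10
```

The lone B (Sharon in ABAA) is never filled, and the row outside the range is left alone.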

In my last update query, I made a mistake:

jonfromjoelmistake

I’m not sure if it makes a difference. I said that in the case where my base from Mom is not null, give Jon my Base from Dad where he doesn’t have any. To check, I run the correct query:

abaaquerymistake

This shows that there are still 2063 bases that didn’t get added to Jon from my bases from Dad. I will add them now. Plus I will add that number to the previous 29113 bases I added to Jon’s bases from Dad from my bases from Dad.

abaatracker

As there were 3 siblings the same in this pattern, I again took 2 rows to add the bases to the table.

ABAB Dad Pattern Fill-in

ababtracker

Jon now has more bases phased than he had tested on his paternal side. He already had more than he had tested on the maternal side.

ABBA and ABBB Dad Pattern Fill-ins

basetrackersummary

As expected, Jon made out best in this Pattern Phasing.

Mom Bases From Dad Bases

This is the part of the project that seems ironic. My dad who wasn’t tested for DNA is now supplying bases to his children that were from their mom. Here I’m looking for where the siblings are heterozygous. In those cases where there is now a Dad base from the patterns and a mom base is missing, we can fill it in.

First, I am making another copy of my table called tble4SibsNewMomPatternFillin.

Here is my first Mom from Dad Update Query:

joeldadfrommom

It says where I am heterozygous and my Dad base is my 2nd one, put my first base in as the base I got from Mom, but only if there isn’t already a base from Mom there. The last part is just an extra precaution so that I don’t overwrite anything.

In the next query, I just reverse the Joelallele1 and 2 to get 12,000 more rows of phased DNA:

momfromdad2

Summary of Mom Bases from Dad Bases

trackermomfromdad

Check the numbers

I have been adding up the rows added. But now I will check my table to see if the Total Bases Phased added up. And the answer is:

countfromtbl

The numbers are pretty close. The above Heidi from Dad is higher than my tracker. I’m guessing the table sums are correct and mine are a little off. This means that Heidi’s paternal phasing should be a little lower.

Part 8 Summary

  • The use of MS Access Min and Max functions to get Dad Pattern starts and stops saved a lot of time
  • It still takes time to verify those starts and stops
  • The Base Tracker makes it easier to track the numbers and the process. It is also interesting to see how the % phased goes up with each round of updates
  • I wasn’t expecting the numbers from my base tracker and actual updated bases to reconcile perfectly, but most of the numbers did. It is possible the discrepancies are from the 2 minor errors I made and tried to correct along the way.

 

 

DNA Phasing of 4 Siblings and One Parent: Part 7 (Starting Over)

In my last blog, I found a few errors when I was checking some odd results. This led me to think that it would be better to start the phasing process from the beginning. The beginning means using 4 siblings’ raw data and my mom’s raw data. This time I will be more methodical and keep track of the results. I have a new spreadsheet called The Base Tracker. Every step that I take, it will keep track of the bases from each sibling when they are assigned to a parent.

A New Table

First I’ll create a new table from the raw data. I’ll start with my mom, me and my 2 sisters as they are all tested using Ancestry Version 1.

3sibtable

I called the table tbl3Sibs.

Next, I combined tbl3Sibs with Jon’s Ancestry V2 results into a new table called tble4SibsNew. I made sure I had a right connect on the arrow. That means that I wanted everything in the 3Sibs table plus what was in Jon’s information. If I had left it an equal join, I would have lost the bases that are in Version 1 but not Version 2 of the AncestryDNA results.

mergejonwsibs

It is important here to connect by rsid. I made the mistake of connecting by IDs last time. As the different AncestryDNA test results versions had different ID’s, this produced crazy results. I also used only Chromosome 1-22 as there are too many special cases for the X Chromosome.
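The join logic can be sketched in Python: keep every row from the 3 sibling table and attach Jon’s alleles only where the rsid matches. The rsid values and field names here are made up for illustration.

```python
def join_on_rsid(sib_rows, jon_rows):
    """Keep every sibling row; add Jon's alleles where his test has the same rsid."""
    jon_by_rsid = {row["rsid"]: row for row in jon_rows}
    merged = []
    for sib in sib_rows:
        jon = jon_by_rsid.get(sib["rsid"], {})  # empty when Jon's version lacks this SNP
        merged.append({
            **sib,
            "Jonallele1": jon.get("allele1"),
            "Jonallele2": jon.get("allele2"),
        })
    return merged


# Hypothetical data: the second SNP is only on the V1 chip.
sibs = [{"rsid": "rs0001", "Joelallele1": "A"}, {"rsid": "rs0002", "Joelallele1": "C"}]
jon = [{"rsid": "rs0001", "allele1": "A", "allele2": "G"}]
for row in join_on_rsid(sibs, jon):
    print(row)
```

With an equal join the second row would be dropped; here it is kept with blanks for Jon.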

tbl4sibsr1

Then I used a count function to count the number of bases each sibling had. I also figured out how many blank lines there were out of the 682549 and subtracted those 8229 sibling blanks from the total to get 674,320. I’ll use that number to figure out the percent phased. This is the Count Query showing the Totals button in the Access Ribbon:

countrawbases

The results of this query were put in the RawBases Row below.

My New Base Tracker: % Phased

basetracker

The first column has the step taken. P1 is Principle 1. JoelFD is the Joel from Dad column, so all the Dad bases are on the left and mom bases are on the right. This table will give me the % phased for each sibling.
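The percent phased arithmetic is simple enough to sketch in Python. The row counts are the ones given above. The example input is the 540956 total that comes up later in this post for Jon’s bases from Mom; treating the tracker as dividing by 674,320 for each column is an assumption based on the description.

```python
TOTAL_ROWS = 682549   # rows in the merged table
BLANK_ROWS = 8229     # rows where the siblings have no results
DENOMINATOR = TOTAL_ROWS - BLANK_ROWS  # 674320


def percent_phased(bases_phased):
    """Percent of usable rows that have a base assigned to one parent."""
    return 100 * bases_phased / DENOMINATOR


print(DENOMINATOR)
print(round(percent_phased(540956), 1))
```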

Principle 1 Query – Homozygous Siblings

This Principle is one of the Principles from a Whit Athey paper. Where you have 2 bases the same, one is from each parent. The last time I did this, I may have had too much in a query at a time. This time, I’ll do the query separately for each sibling.

First, I opened up my tbl4SibsNew in design view and added more fields to put the new dad and mom bases.

newbasefields

First, I copied the table, so I’d have the raw data table with no additions. I called my new table tbl4SibsNewPrinciples. That is where the phased bases will go.

Here is a simple Principle 1 Update Query for me:

joelprinciple1

It says where I am homozygous, put both those bases in my JoelFromDad and JoelFromMom columns in the new tbl4SibsR1Principles.

joelprq

That little query phased over 900,000 of my bases into Paternal and Maternal sides.
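Principle 1 can be sketched as a small Python function. The field names follow the column naming used in this post (Joelallele1, JoelFromDad and so on), but the dictionary layout and the use of None for a missing base are assumptions for the example.

```python
def principle_1(row, sib):
    """If the sibling is homozygous, the same base came from each parent."""
    a1, a2 = row[f"{sib}allele1"], row[f"{sib}allele2"]
    if a1 is not None and a1 == a2:
        row[f"{sib}FromDad"] = a1
        row[f"{sib}FromMom"] = a1
        return True
    return False


row = {"Joelallele1": "C", "Joelallele2": "C", "JoelFromDad": None, "JoelFromMom": None}
print(principle_1(row, "Joel"), row["JoelFromDad"], row["JoelFromMom"])  # True C C
```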

I was interested in seeing the effect of Jon’s testing using AncestryDNA V2:

jontracker

Jon has a ways to go to catch up on being phased. This is due to the differences in AncestryDNA V1 and V2.

Principle 2 – Homozygous Mom

Here if my mom has the same base twice, one of those has to go to her child. Here is a query to update my mom bases. As my dad’s DNA was not tested, he gets a non-applicable in that column.

joelpr2

Note that I have a criteria ‘Is Null’. This means only update this base if there is a blank there already. Here is the Principle 2: Homozygous Mom summary:

p2summary

Here I don’t know why my Principle 2 Bases were so low. I think it is because I made a mistake above, so I’ll do these steps over from the beginning.

Here I get more consistent results for my mom bases:

pr2joelfix

Here is the revised Principle 2 Summary:

prtrackerrev

Jon’s results also changed to be more realistic to where he was after Principle 1. I can also use the Access Count function to check these numbers:

countpr2

All the numbers match up except for JonsFromMom. For some reason, the spreadsheet is showing a higher number of Total Bases from Mom for Jon of 540956. If I subtract his Principle 1 bases from Mom from that, I get 272250. I’ll put that in as his Principle 2 bases from mom and assume that I made a mistake in writing down Jon’s Principle 2 base from Mom number.

pr2summaryreconciled

I suppose it’s like reconciling my bank statement. I assume that these are Jon’s mom bases filling in where Jon didn’t have test results that lined up with the AncestryDNA V1 results for his mom and siblings.

Moving On To Principle 3: Heterozygous Siblings

This works when the child is heterozygous and has one base phased to one parent. Then the other base is phased to the other parent. It appears that this would have to work just from the mom side for now to fill in the dad side. That is because we haven’t filled in the ‘fromDad’ side with any Heterozygous sibling results yet.

pr3joel

This query says in the situation where I am heterozygous and I get my allele2 from mom, assign my allele1 to be from my dad. But only do that where there isn’t already a JoelFromDad base there.
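As a sketch of Principle 3 in Python: when a sibling is heterozygous and one base is already assigned to one parent, the other base goes to the other parent. The field names follow the post’s column naming; the `known` and `target` arguments are hypothetical and only there so the same function can run in either direction.

```python
def principle_3(row, sib, known="Mom", target="Dad"):
    """Assign the remaining allele of a heterozygous sibling to the other parent."""
    a1, a2 = row[f"{sib}allele1"], row[f"{sib}allele2"]
    known_base = row[f"{sib}From{known}"]
    target_key = f"{sib}From{target}"
    if a1 is None or a2 is None or a1 == a2:
        return False  # not heterozygous
    if known_base is None or row[target_key] is not None:
        return False  # nothing to work from, or already filled (the 'Is Null' check)
    if known_base == a2:
        row[target_key] = a1
    elif known_base == a1:
        row[target_key] = a2
    else:
        return False  # known base matches neither allele, so leave it for review
    return True


row = {"Joelallele1": "A", "Joelallele2": "G", "JoelFromMom": "G", "JoelFromDad": None}
print(principle_3(row, "Joel"), row["JoelFromDad"])  # True A
```

Calling it with `known="Dad", target="Mom"` gives the reverse direction.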

However, this raises a question. Here is the same query without the ‘Is Null’ criteria:

pr3joellarge

As you can tell, I am beginning to doubt my work. The question is, if there has been no previous addition of Joel bases from dad based on my heterozygous results why is there a difference between the two queries?

I checked Sharon’s results and found that she didn’t have the same situation. Where she was heterozygous, she didn’t have any bases from dad assigned to her.

Here is a query showing my problem:

p3problem

It is not a problem for phasing, but more for what I will enter into my Base Tracker. Fortunately, I can do a Count Query:

countjoelfromdad

This shows that my JoelFromDad bases have gone up by 25589 somehow since I last tracked them. This means that I should use the larger number for my Base Tracker.

Here is the Principle 3 Summary in my Base Tracker:

p3summary

In a few hours, I’ve phased over 4 million bases. And that time includes making mistakes and fixing them. All siblings are phased at over 80% at this point except for Jon. His Paternal phasing is lagging at only one half.

I suppose that this is the time for me to say that it takes 20% of your time to get 80% of the result and 80% of your time to get the last 20% of your result.

Summary Part 7

  • After making mistakes, it feels good to start with a clean slate
  • Principles 1-3 of the Athey paper are easy to implement using MS Access
  • If a mistake is found, it is usually good to start from a clean table of data and fix it from there
  • The Patterns don’t lend themselves as well to Access and take more time to get
  • Having a Table to track the work and results is helpful and interesting.
  • In the next Blog (Part 8), I will be back looking at filling in the Patterns areas

Raw DNA Phasing of 4 Siblings Using One Parent’s DNA: Part 6

In my last Blog, I was still playing catchup in going from my original 3 sibling phasing, to incorporating my brother’s new DNA results.

Missing Principle 2 for Jon

Here is Principle 2 from the Whit Athey Phasing Paper I’ve been using:

Principle 2 – If data from one of the parents are available, and that parent is homozygous at a SNP location, then another almost trivial phasing is possible since obviously that parent had to send the only type of base s/he had at that location to the child

I checked this in MS Access. Here is the query:

homozygousmom

This says that mom is homozygous, that is, her allele1 is the same as her allele2. Where that is true and Jon has null values in his FromMom column, I skipped this step.

homomomerror

Clearly, I did miss this step from position one. As I was doing my previous steps, I thought that Jon’s results were very sparse.

Principle 2 Fix

For this, I will again use the update query.

homomomfix

In this case, I didn’t bother writing ‘Is Null’ under the JonFromMom column. That is because even if there is something in there, I would just as soon overwrite it, as this is such a basic principle. I only missed 481,000 rows.
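Principle 2 can be sketched in Python with a switch for the ‘Is Null’ choice. The Momallele1 and Momallele2 field names and the `overwrite` argument are assumptions for the example; a missing base is stored as None.

```python
def principle_2(row, sib, overwrite=False):
    """If mom is homozygous, her only base type is the child's base from Mom."""
    m1, m2 = row["Momallele1"], row["Momallele2"]
    if m1 is None or m1 != m2:
        return False  # mom is missing or heterozygous here
    key = f"{sib}FromMom"
    if row[key] is not None and not overwrite:
        return False  # same effect as an 'Is Null' criteria
    row[key] = m1
    return True


row = {"Momallele1": "T", "Momallele2": "T", "JonFromMom": None}
print(principle_2(row, "Jon"), row["JonFromMom"])  # True T
```

Passing `overwrite=True` matches the choice made here of leaving out ‘Is Null’.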

Second Part of Fix

Now that I have mom’s bases, I will go back and fill in Jon’s dad bases based on his mom bases. Those are also Principle 2 fill-ins where Jon is heterozygous. I don’t mind doing these updates in Access as they are so easy.

dadsbasefrommomjon

This says in the case where Jon is heterozygous and his base from Mom is his allele1, put Jon’s allele2 in as Jon’s allele from Dad. In other words, if Jon has allele1 from his Mom, allele2 has to have come from his dad.

jonallele1fordad

So that is an easy way to update over 7,000 rows in a few minutes.

Next, On to Mom Patterns

It’s a good thing that I added these mom bases to Jon, because now it is time to look at mom patterns. From Athey:

In the next step, we use the pattern on the mother’s side to fill in as many more cells as possible. Finally, we can project the information in those newly filled cells back to the father’s side using Principle 3 again.

This procedure will be the same one that I used for the Dad Patterns.

AAAB Mom Pattern

I might as well go in alphabetical order. In this pattern, Jon will not match the other siblings.

aaabmompattern

This works, but it doesn’t include the areas where there are missing mom bases. So I will use it to get rough ID’s. There were about 45 AAAB Mom Patterns that I found. Perhaps the rough ID’s will do.

AAAB Quality Control

My spreadsheet counts the numbers of ID’s between patterns.

aaabqc

619 is close to the cutoff that I had set. I went back to the original spreadsheet and found other AAAB patterns between the Stop and next Start. So I can combine those 2 AAAB Chromosome 15 patterns. I checked another pattern with about 700 ID’s from the Stop to the next AAAB Start. However, there was another pattern between, so those were a valid Stop and Start. There were about 45 AAAB Mom Patterns or about 2 per chromosome which seems like a lot.
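This quality control check could be sketched in Python as follows. It joins two segments of the same pattern when the gap between them is small and no other pattern sits in the gap. The `review_below` threshold of 1000 and all the ID numbers are made up; only the gap sizes (619 and about 700) echo the examples above.

```python
def merge_close_segments(segments, others, review_below=1000):
    """segments: sorted (chrom, start, stop) for one pattern; others: the same for other patterns."""
    if not segments:
        return []
    merged = [segments[0]]
    for chrom, start, stop in segments[1:]:
        prev_chrom, prev_start, prev_stop = merged[-1]
        gap = start - prev_stop
        # Another pattern inside the gap means the Stop and Start are valid.
        interrupted = any(
            c == chrom and s < start and e > prev_stop for c, s, e in others
        )
        if chrom == prev_chrom and gap < review_below and not interrupted:
            merged[-1] = (prev_chrom, prev_start, stop)
        else:
            merged.append((chrom, start, stop))
    return merged


aaab = [(15, 1000, 2000), (15, 2619, 4000), (15, 4700, 6000)]
other_patterns = [(15, 4100, 4600)]
print(merge_close_segments(aaab, other_patterns))
```

The first two segments (gap of 619) are combined, while the third stays separate because another pattern lies between.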

ABAA Mom Pattern

The query should be similar to the previous one. If Sharon isn’t the same as her siblings, we will have an ABAA Pattern.

abaaqpattern

This pattern was easier to figure out. There were about 35 of them.

AABA Mom Pattern

This is the one I should have done second if I had wanted to stay in alphabetical order. I checked a few with differences of about 500 between Stop and next Start, but they looked OK. There were a few single allele patterns.

AABB Mom Pattern

I have 3 criteria for this one:

aabbmompatternquery

I had to enter that Sharon’s allele from mom could not be the same as Heidi’s allele from mom or I would get a lot of AAAA Patterns. When I looked for these, there appear to be 19 AABB maternal patterns.
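The three criteria amount to a rule that can be sketched in Python: siblings with the same letter must have the same base, and siblings with different letters must have different bases. Missing bases (None here) are skipped. The function name and the list layout (Joel, Sharon, Heidi, Jon) are assumptions for the example.

```python
def compatible(bases, pattern):
    """True when the known bases do not contradict the pattern."""
    for i in range(4):
        for j in range(i + 1, 4):
            if bases[i] is None or bases[j] is None:
                continue  # nothing to compare at a missing base
            same_base = bases[i] == bases[j]
            same_letter = pattern[i] == pattern[j]
            if same_base != same_letter:
                return False
    return True


print(compatible(["A", "A", "G", "G"], "AABB"))    # True
print(compatible(["A", "A", "A", "A"], "AABB"))    # False: this row is really AAAA
print(compatible([None, "A", "G", None], "ABAA"))  # True: fits, but does not prove, ABAA
```

The second example is why the query needed the extra criteria that Sharon’s base differ from Heidi’s.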

ABAB Mom Pattern

Again, this is a bit out of alphabetical order. This query is not unlike the previous one.

ababq

When I make Heidi’s mom base different from Sharon’s mom base, that gives me the ABAB pattern:

ababresults

Here I have Excel on the left where I am entering the results from the Mom Patterns that I found in Access.

ababworksheet

The jump in Chromosome 4 from position 6M to 37.9M indicates a change in pattern. That is entered in Excel on the left. The change from the previous pattern is shown as 7544 ID’s. ID’s should be the same as SNPs.

A change in Chromosome is an obvious Stop and Start:

ababex

There were about 30 ABAB mom patterns for me and my 3 other siblings. I’ve done:

  • AAAB
  • AABA
  • AABB
  • ABAA
  • ABAB

ABBB Mom Pattern

It looks like this must be the last Mom Pattern. This is the mom pattern where I show my individualism – unlike my siblings who have the same mom base:

abbbq

Here’s an ABBB example:

abbbex

In this case on Chromosome 9, there is a jump from position 38M to 71M. However, the SNP (or ID) count between the two is only 190. That means this must be an area where the SNPs are not counted for some reason, so I would think that I could continue the Mom Pattern through that area. However, when I look at my Access table, I see this:

chr9ex

Above ID 370485 is a different pattern of AABB in the last four columns. This would have come out when I merged all my patterns and I would have had to fix it then. However, I might as well get this as good as I can now. As it is, there will be a discrepancy to work out:

chr9discrepancy

The AABB pattern started at 369193 which is before the ABBB Pattern stopped at 370295. This means I need to go back to the Table:

 

chr9problem

 

Here is position 370295 where I had the ABBB Pattern ending. However, this is a very small pattern, going only up to ID# 370290. Before that is the AABB pattern again. Here the AABB Pattern picks up again.

chr9aabb

Here is how I corrected my Chromosome 9 Mom AABB Pattern:

chr9aabbcorrected

However, note that I had to break my 500 ID/SNP rule. That 51 represents the tiny ABBB Pattern between two AABB Mom Patterns.

Here is the start of the AABB pattern at 369193:

morechr9issues

First note, that it would actually start at 369192. Before that is a single ABBB pattern. Then above that in the first row is an ABAA pattern. The first row is the end of an ABAA Pattern that I already recorded in my spreadsheet at ID# 369181, so that doesn’t need to change:

chr9abaa

At 369190 there is a single pattern of ABBB. This will be noted in my spreadsheet, but not entered as a start/stop position.

Re-Sort the Mom Patterns by Pattern

Now I have 426 lines of Mom Pattern Locations. I need to sort them by pattern and hope there are not many weird issues like I found in Chromosome 9. I will also take out the single patterns. When I do this, I get quite a mess. Here is Chromosome 1:

chr1mompatternsorted

Here we have quite a few nested patterns.

chr1fix

The first AABB pattern is a single, so I can take that out, but what do I do with the AABB Stop? It looks like that was a single also, so I can take that out.

chr1fix2

The AAGG is between a CTTT and an AGG? which would turn out to be an AGGG. What I had previously described was a single pattern going to another single pattern within a valid non-single pattern.

Next, starting at ID# 6608 I have three starts in a row which cannot be good. Looking at the first two patterns of ABAA and AABB, they look like they could be good.

fixchr1

I’ll add a ‘G’ where the cursor is above and call that the end to a very short ABAA Mom Pattern.

chr1fix3

Here is the corrected ABAA stop. I highlighted the next ABAA Stop in yellow as that will need work.

Next I’ll look at the ABBB Start at 19885. It looks like I missed the previous AABB Stop at 19884.

chr1fix4

At least that makes for a clean cut. I made a note of my correction:

chr1fix5

I also made note to look at the next AABB Stop (in yellow). Now there is a Start for an ABBB followed by a Stop for an AABB which looks fishy. Here is the area following ID# 19885:

chr1table

It seems that there are about 5 ABBB patterns followed by a single AABB Pattern, a single ABBB pattern and another AABB Pattern. As this looks confusing, let’s look at the full table for the single ABBB Pattern area at ID# 19905:

fullspreadsheet

Time for Quality Control

Are there any errors here? Principle 1 says that if a person is homozygous, then one base is from the dad and one is from the mom. I have CC and Jon has TT. My assignment is correct, but Jon is missing a T from his dad.

Let’s look at this Query:

jonqc

This looks for missing Dad bases for Jon that should be there where Jon is homozygous. It turns out he is missing about 1300 results:

jonqcresults

I ran this query to see if Jon was missing any mom bases and he wasn’t. I also ran this query for myself and saw that I was missing dad bases. I will have to re-run this update to the current table. This is not a problem as this is an easy thing to do in Access.
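The same quality control check can be sketched in Python. It lists the IDs where a sibling is homozygous but the base from one parent was never filled in. Field names follow the post’s column naming; the sample rows are made up and None stands for a missing base.

```python
def missing_homozygous(rows, sib, parent="Dad"):
    """IDs where the sibling is homozygous but has no base assigned from the given parent."""
    return [
        row["ID"]
        for row in rows
        if row[f"{sib}allele1"] is not None
        and row[f"{sib}allele1"] == row[f"{sib}allele2"]
        and row[f"{sib}From{parent}"] is None
    ]


rows = [
    {"ID": 1, "Jonallele1": "T", "Jonallele2": "T", "JonFromDad": None, "JonFromMom": "T"},
    {"ID": 2, "Jonallele1": "T", "Jonallele2": "T", "JonFromDad": "T", "JonFromMom": "T"},
    {"ID": 3, "Jonallele1": "C", "Jonallele2": "T", "JonFromDad": None, "JonFromMom": "T"},
]
print(missing_homozygous(rows, "Jon"))         # [1]
print(missing_homozygous(rows, "Jon", "Mom"))  # []
```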

Just Like Starting Over

Based on the errors that I’ve found, I will start from scratch in Part 7.