Added an x_chr example + doc polish #199 #215

Draft
gregorgorjanc wants to merge 2 commits into devel from gregorgorjanc/issue199

@gregorgorjanc
Member

I have added a simple x_chr example in line with #199. It shows one issue with dosages (they are wrong for males; I first suspected an outdated tinyhouse, but I pulled the most recent version and still get the same results) and one issue in the haplotype of one individual.

See the file examples/simple_example_x/simple_true.txt for the true X chromosome haplotypes and genotypes. You will also need to look at simple_pedigree.txt in the same directory to make sense of the example.

I propose the following. @AprilYUZhang and @XingerTang, can you please have a look at my changes (I only formatted the docs) and check whether the example makes sense for X chromosome inheritance (with recombination here), so that we can get this merged? Then we address the problem with the dosage and the odd 9 in the haplotype in a separate issue. I suggest we leave simple_true.txt in the directory the way it is and polish it in the next issue.

Contributor

@XingerTang left a comment

All looks good to me!

@XingerTang
Contributor

Hi @gregorgorjanc, I think the called haplotypes, phased genotype probabilities, and (unphased) genotype probabilities you obtained are all as expected.

The 9 for individual F in the output haplotype is caused by the call threshold being set to 0.5: since none of its four phased genotype states has a probability over 0.5, the haplotype is set to unknown. For the same individual and the same locus, the genotype output is correct and gives a confident 1, which suggests what the haplotype output should be at that locus. This behaviour is determined by the output process and is what we are going to solve in Issue #211.
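To make the threshold behaviour concrete, here is a minimal Python sketch. It is not the actual output code: the function names, the state order (aa, aA, Aa, AA), the rule that both haplotypes are set to 9 together, and the probabilities are all assumptions for illustration. It shows how a locus can get a confident genotype of 1 while the haplotype call is 9.

```python
MISSING = 9  # code for an unknown call, as in the output discussed above

# Assumed order of the four phased genotype states: aa, aA, Aa, AA.
PHASED_STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def call_haplotypes(phased_probs, threshold=0.5):
    """Return a haplotype pair, or (9, 9) if no phased state exceeds the threshold."""
    best = max(range(4), key=lambda i: phased_probs[i])
    if phased_probs[best] <= threshold:
        return (MISSING, MISSING)
    return PHASED_STATES[best]


def call_genotype(phased_probs, threshold=0.5):
    """Return 0, 1, or 2, or 9 if no unphased genotype exceeds the threshold."""
    aa, aA, Aa, AA = phased_probs
    geno_probs = [aa, aA + Aa, AA]  # the two heterozygous states are merged
    best = max(range(3), key=lambda i: geno_probs[i])
    return best if geno_probs[best] > threshold else MISSING


# Invented probabilities: the heterozygote is near certain, its phase is not.
probs = [0.02, 0.49, 0.49, 0.00]
haplotype_call = call_haplotypes(probs)
genotype_call = call_genotype(probs)
print(haplotype_call, genotype_call)
```

In this sketch neither heterozygous state reaches 0.5 on its own, so the haplotype is unknown, while their sum is well above the threshold, so the genotype is called.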

The phased genotype probabilities and (unphased) genotype probabilities outputs are expected because we chose to use two states of a pair of regular chromosomes to represent one state of the X chromosome. The phased genotype probabilities are meant to be read by summing the first and third rows and comparing that with the sum of the second and fourth rows. The unphased output simply merges the middle two rows of the phased genotype probabilities, so the same logic applies. In the future, we can create special genotype probability output functions for the X chromosome to make the output easier to interpret.
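The row-summing rule for the phased genotype probabilities can be sketched as follows. This is only an illustration: `collapse_x_rows` is a hypothetical helper, the layout (four rows per individual, one column per locus) is an assumption, and the numbers are invented.

```python
def collapse_x_rows(phased_probs):
    """Collapse 4 rows x n loci into the two sums described above.

    Returns (rows 1 + 3, rows 2 + 4), each with one value per locus.
    """
    row1, row2, row3, row4 = phased_probs
    sum_13 = [a + c for a, c in zip(row1, row3)]
    sum_24 = [b + d for b, d in zip(row2, row4)]
    return sum_13, sum_24


# Invented values for two loci (columns); each column sums to 1.
phased = [
    [0.45, 0.05],
    [0.05, 0.45],
    [0.45, 0.05],
    [0.05, 0.45],
]
sum_13, sum_24 = collapse_x_rows(phased)

# Compare the two sums per locus to see which combined state is more likely.
more_likely = ["rows 1+3" if p > q else "rows 2+4" for p, q in zip(sum_13, sum_24)]
print(more_likely)
```

Which allele each of the two sums corresponds to depends on the state order of the output, so the sketch only reports which pair of rows carries more probability.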

The dosage does require further normalisation to make sense. For X chromosomes, the dosage is essentially the second row of the phased genotype probabilities. Because we use the probabilities of two states to represent one state of the X chromosome, the probability of that state is split into two parts, so the actual dosage should be two times the output.
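As a small worked example of that correction, the sketch below doubles the reported values. `rescale_x_dosage` is a hypothetical helper and the dosages are invented; it also assumes the caller passes only the individuals that need the correction (the PR description reports the problem for males).

```python
def rescale_x_dosage(reported_dosages):
    """Double each reported dosage, following the two-times correction above."""
    return [2.0 * d for d in reported_dosages]


# Invented reported dosages at three loci for one individual.
reported = [0.0, 0.25, 0.5]
corrected = rescale_x_dosage(reported)
print(corrected)
```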

To interpret the segregation probabilities of X chromosomes, we can look at the first two rows to see which haplotype of the mother was inherited.
