generalise `simulateAlleleFreq()` function to any ploidy #6
Comments
Hi Hugo,
Hi Ben, I'm no polyploidy expert either, so it seems sensible to discuss with someone more familiar with polyploid genetics! I suppose the way I put it made two assumptions:
If that's not true, then the model would be wrong, I guess. As you say, a more "realistic" model might be hard, because it probably depends on homology between chromosome copies, which will probably vary between chromosomes, individuals, varieties, species, etc. I suppose if the function was made general, this could be explicitly mentioned in the documentation and it would be up to the user to decide. Also, if the default ploidy = 2, then a substantial number of users don't even have to worry about this. 😄
I'm glad you are thinking about this. We have been trying QTLseqr on some pooled data from hybrid backcross autotetraploids segregating for a major gene, where the minor allele frequency of interest is 0.25. Results look sensible (similar to Popoolation CMH but sharper!), but I'm interested in how the default expectations might not fit our system.
Has anyone ever followed up on this? I would be very interested in learning what code modifications might be employed to facilitate better default modeling when running QTLseqr with a tetraploid species.
In #5 I asked about the `bulkSize` option in `runQTLseqAnalysis()` if:

The answer is no, `bulkSize` should be the number of diploid individuals. I see this is right, because of the way the null expectation is being simulated.

I guess there are two levels to the simulation, because there are two levels of sampling: one handled by the `simulateAlleleFreq()` function and one by the `simulateSNPindex()` function.

I hadn't noticed, but indeed the first level of the simulation assumes individuals are diploid, because it samples diploid individual genotypes (`c(0, 0.5, 1)`) with probabilities given by the expected segregation ratios in an F2 (`c(1, 2, 1)/4`).

But what if one is working with higher ploidy? Then the above simulation would not work.

However, the way it is implemented at the moment is, I think, equivalent to sampling from a binomial with the probability of the event (picking an alternative allele) being 0.5 and the number of trials being equal to the number of alleles sampled (2 x number of individuals).
To illustrate with code:
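A minimal sketch of that equivalence, assuming an illustrative bulk of `n = 20` diploid individuals and 10,000 replicates (both values are arbitrary choices for the comparison, not package defaults). The variable names are made up for this example:

```r
# Illustrative values (assumptions, not package defaults)
n <- 20        # number of diploid individuals in the bulk
reps <- 10000  # number of simulated bulks
set.seed(1)

# Genotype-level sampling: draw n F2 genotypes, coded as the alternative
# allele frequency within each individual (0, 0.5 or 1), with the 1:2:1
# segregation ratio, then average over the bulk
simGenotypes <- replicate(
  reps,
  mean(sample(c(0, 0.5, 1), size = n, prob = c(1, 2, 1) / 4, replace = TRUE))
)

# Allele-level sampling: count alternative alleles among the 2 * n alleles
# in the bulk, each drawn with probability 0.5, and convert to a frequency
simBinomial <- rbinom(reps, size = 2 * n, prob = 0.5) / (2 * n)

# Both distributions should have mean 0.5 and variance 0.25 / (2 * n)
c(mean(simGenotypes), mean(simBinomial))
c(var(simGenotypes), var(simBinomial))
```

The two approaches agree because a single F2 genotype drawn with probabilities 1/4, 1/2, 1/4 is itself a binomial draw of 2 alleles with probability 0.5, so summing over `n` individuals gives a binomial with `2 * n` trials.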
I guess the advantage is that this is general, regardless of the ploidy (besides the bonus of being faster).
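One way the generalisation could look is sketched below. The function name `simulateAlleleFreqPloidy()` and its `ploidy` argument are hypothetical and not part of the package. The sketch assumes every allele copy in the bulk is an independent draw with probability 0.5, which may not hold for real polyploid segregation:

```r
# Hypothetical generalisation (name and `ploidy` argument are assumptions).
# Assumes each of the ploidy * n allele copies in the bulk is an independent
# draw with probability 0.5 of being the alternative allele.
simulateAlleleFreqPloidy <- function(n, ploidy = 2) {
  stopifnot(n >= 1, ploidy >= 1)
  nAlleles <- ploidy * n
  # Number of alternative alleles in the bulk, converted to a frequency
  rbinom(1, size = nAlleles, prob = 0.5) / nAlleles
}

# Example: one simulated bulk of 20 tetraploid individuals
set.seed(42)
simulateAlleleFreqPloidy(n = 20, ploidy = 4)
```

With the default `ploidy = 2` this reduces to the diploid F2 case, so existing diploid analyses would be unaffected under this sketch.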
I think the RIL implementation is already general as it is, because in that case we assume the individuals are fully homozygous, in which case they are equivalent to "haploid" organisms. In any case, I think the implementation can also be made faster by sampling from a binomial:
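A minimal sketch of the RIL case, assuming fully homozygous individuals coded as 0 or 1 with equal probability. The bulk size, replicate count and variable names are illustrative choices, not package code:

```r
# Illustrative values (assumptions, not package defaults)
n <- 20        # number of RIL individuals in the bulk
reps <- 10000  # number of simulated bulks
set.seed(1)

# Genotype-level sampling: each RIL is homozygous reference (0) or
# homozygous alternative (1) with equal probability
simRILsample <- replicate(
  reps,
  mean(sample(c(0, 1), size = n, prob = c(1, 1) / 2, replace = TRUE))
)

# Binomial shortcut: each individual contributes one effective allele,
# so the number of trials is n rather than 2 * n
simRILbinom <- rbinom(reps, size = n, prob = 0.5) / n

# Both distributions should have mean 0.5 and variance 0.25 / n
c(mean(simRILsample), mean(simRILbinom))
c(var(simRILsample), var(simRILbinom))
```

Because `rbinom()` is vectorised, all replicates come from a single call, which is where the speed-up over repeated `sample()` calls would come from.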
@bmansfeld please do check all of this, as I might be making some wrong assumption somewhere (I should also probably go and read the Takagi paper in more detail! 😄)