Optimize vertical ordering of variants in beadplots #58

Closed
ArtPoon opened this issue Jun 1, 2020 · 25 comments

@ArtPoon
Contributor

ArtPoon commented Jun 1, 2020

Presently, the length of a vertical edge bears no relation to the genetic distance (i.e., number of mutations) it represents. Variants are ordered by pre-order traversal, which helps keep subtrees together.

Is there a more optimal arrangement of variants possible?

@ArtPoon
Contributor Author

ArtPoon commented Jun 4, 2020

The beadplot we use for the README document actually provides a useful example:

The two beads at the bottom of the plot have the following distances to the "baseline" variant at the top of the plot (Scotland/EDB058):

  • Wales/PHWC-2B746 - 1 mutation
  • Scotland/EDB2095 - 1 mutation

but the variant Netherlands/NA_17 is near the top of the plot and has a distance of 4.02 mutations.
The cluster of samples from Australia (rooted by Australia/VIC774) is located "above" the Wales and Scotland variants and has a distance of 4 mutations.

There are a total of 9 child variants that descend from Scotland/EDB058.

@ArtPoon ArtPoon added the "question" (Further information is requested) label Jun 5, 2020
@ArtPoon ArtPoon self-assigned this Jun 15, 2020
@ArtPoon
Contributor Author

ArtPoon commented Jun 20, 2020

Now sorting vertical position of variant by number of mutations from parental variant, using collection date of earliest sample of variants to break ties:

covizu/scripts/hclust.R

Lines 67 to 74 in 332b6c5

row <- tn93[which(headers==node), ]
adj.dists <- row[match(children, headers)]
adj.dates <- as.Date(gsub(".+\\|([0-9]+-[0-9]+-[0-9]+)$", "\\1", children))
children <- children[order(adj.dists, adj.dates)] # increasing
for (child in children) {
  edges <- traverse(child, node, el, edges)
}

@ArtPoon ArtPoon closed this as completed Jun 20, 2020
@ArtPoon
Contributor Author

ArtPoon commented Jun 25, 2020

New idea (the above isn't bad, but this could be better) - instead of ordering children by genetic distance and then collection date for tie-breaking, what about genetic distance and then total number of descendant variants? The problem is that some variants that are only 1 mutation away are being bounced downwards by an earlier variant that is ancestral to a large number of descendant variants.
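
A minimal Python sketch of this tie-breaking idea, not the project's code. It assumes a hypothetical `tree` dict mapping each parent to its children and a hypothetical `dists` dict keyed by (parent, child). The thread does not say which direction the tie-breaker should run; sorting smaller subtrees first is an assumption made here so that a 1-mutation variant with no descendants is not pushed down by a sibling with many.

```python
def count_descendants(tree, node):
    """Total number of variants below `node` in a parent -> children dict."""
    return sum(1 + count_descendants(tree, child) for child in tree.get(node, []))


def order_children(tree, dists, parent):
    """Sort children of `parent` by distance, then by descendant count."""
    return sorted(
        tree.get(parent, []),
        # Assumption: ascending descendant count breaks ties in distance.
        key=lambda child: (dists[(parent, child)], count_descendants(tree, child)),
    )


# Hypothetical toy tree: B and C are both 1 mutation from A,
# but B is ancestral to two further variants.
tree = {"A": ["B", "C", "D"], "B": ["E", "F"]}
dists = {("A", "B"): 1, ("A", "C"): 1, ("A", "D"): 2}
ordered = order_children(tree, dists, "A")
print(ordered)
```

With this toy input, C is placed ahead of B because both are 1 mutation away and C has no descendants.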

@SyouTono242
Collaborator

SyouTono242 commented Jun 28, 2022

Sorting with ascending recursive width (last date sampled for all children - first date sampled for all children) and then initial sample date for all subtrees.
It seems to be working now but the unsampled ones are sticking out for some reason. I think it has something to do with NA being added to the total length... I'll be trying to assign them some values (e.g. max date on the graph) to see if we can solve the issue.
image
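
For readers following along, here is a rough Python sketch of a "recursive width" sort key as described above. The `tree` and `dates` structures are hypothetical, and placing subtrees with no dates at the end is only one possible way to handle the missing-value problem; it is an assumption of this sketch, not the fix adopted in the thread.

```python
from datetime import date


def subtree_dates(tree, dates, node):
    """Collect every known sample date in the subtree rooted at `node`."""
    found = list(dates.get(node, []))
    for child in tree.get(node, []):
        found.extend(subtree_dates(tree, dates, child))
    return found


def sort_key(tree, dates, node):
    """(recursive width in days, first sample date) for the subtree at `node`."""
    found = subtree_dates(tree, dates, node)
    if not found:
        # Assumption: a subtree with no sampled dates sorts last
        # instead of propagating a missing value into the width.
        return (float("inf"), date.max)
    return ((max(found) - min(found)).days, min(found))


# Hypothetical toy data: Z is unsampled and has no sampled descendants.
tree = {"root": ["X", "Y", "Z"], "X": ["X1"]}
dates = {
    "X": [date(2020, 3, 1)],
    "X1": [date(2020, 3, 20)],
    "Y": [date(2020, 3, 5), date(2020, 3, 7)],
}
ordered = sorted(tree["root"], key=lambda n: sort_key(tree, dates, n))
print(ordered)
```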

@ArtPoon
Contributor Author

ArtPoon commented Jun 28, 2022

Unsampled lineages can inherit the earliest sample date from their "parent" lineage (if that is sampled) - if parent is also unsampled, then proceed down the tree until you reach a sampled parent.
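
A small Python sketch of this inheritance rule, under stated assumptions: `parents` is a hypothetical child -> parent map (the root maps to None), `first_dates` holds the earliest sample date for sampled lineages only, and "proceed down the tree" is read here as walking toward the root.

```python
from datetime import date


def inherit_dates(parents, first_dates):
    """Fill in missing first-sample dates from the nearest sampled ancestor."""
    filled = dict(first_dates)
    for node in parents:
        ancestor = node
        # Walk toward the root until a sampled lineage is found.
        while ancestor is not None and first_dates.get(ancestor) is None:
            ancestor = parents.get(ancestor)
        # Stays None if no ancestor on the path was ever sampled.
        filled[node] = first_dates.get(ancestor) if ancestor is not None else None
    return filled


# Hypothetical chain: root (sampled) -> u1 (unsampled) -> u2 (unsampled) -> s (sampled)
parents = {"root": None, "u1": "root", "u2": "u1", "s": "u2"}
first_dates = {"root": date(2020, 1, 5), "s": date(2020, 2, 1)}
filled = inherit_dates(parents, first_dates)
print(filled["u1"], filled["u2"], filled["s"])
```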

@ArtPoon
Contributor Author

ArtPoon commented Jun 28, 2022

We should consider re-rooting these neighbor-joining trees so that the sampled lineage with the earliest sample date (and also largest number of descendants as a tie-breaker?) becomes the root.
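
One way this re-rooting could look in Python, as a sketch only. It assumes the tree is available as a hypothetical list of undirected `edges`, and that descendant counts under the current rooting have already been computed into a hypothetical `n_descendants` dict for the tie-breaker.

```python
from collections import defaultdict, deque
from datetime import date


def pick_root(first_dates, n_descendants):
    """Earliest sampled lineage; ties go to the one with more descendants."""
    sampled = [n for n, d in first_dates.items() if d is not None]
    return min(sampled, key=lambda n: (first_dates[n], -n_descendants.get(n, 0)))


def reroot(edges, new_root):
    """Re-orient an unrooted tree (list of node pairs) away from `new_root`."""
    neighbours = defaultdict(list)
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    children = {new_root: []}
    queue = deque([new_root])
    while queue:
        node = queue.popleft()
        for other in neighbours[node]:
            if other not in children:  # not visited yet
                children[other] = []
                children[node].append(other)
                queue.append(other)
    return children


# Hypothetical tree currently rooted at A; C and D share the earliest date.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E"), ("D", "F")]
first_dates = {
    "A": date(2020, 3, 1), "B": None, "C": date(2020, 2, 1),
    "D": date(2020, 2, 1), "E": date(2020, 3, 5), "F": date(2020, 3, 6),
}
n_descendants = {"A": 5, "B": 4, "C": 0, "D": 2, "E": 0, "F": 0}
root = pick_root(first_dates, n_descendants)
rooted = reroot(edges, root)
print(root, rooted)
```

In this toy case D wins the tie against C because it has more descendants, and the former root A ends up as a child of B.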

@ArtPoon ArtPoon removed the "question" (Further information is requested) label Jul 5, 2022
@ArtPoon ArtPoon changed the title from "Front-end: Can vertical ordering of variants in beadplot be optimized?" to "Optimize vertical ordering of variants in beadplots" Jul 5, 2022
@ArtPoon
Contributor Author

ArtPoon commented Jul 5, 2022

@SyouTono242 can you please start committing your code changes to a branch of this repo? thanks!

@SyouTono242
Collaborator

@ArtPoon Yes... But I was an idiot who forgot to checkout and I accidentally committed and pushed my script to master... I am extremely sorry; I shouldn't be checking GitHub at 4am and I learnt my lesson the hard way... I didn't commit anything other than the one single script I wrote, and as it's a standalone script it shouldn't interfere with any other existing file. Please roll back master if you can... Again I apologize for all the inconvenience if anyone else is fetching the incorrect changes. I'm really sorry.

@ArtPoon
Contributor Author

ArtPoon commented Jul 12, 2022

Haha whoops! Don’t stress about it @SyouTono242 - accidents happen and I’ve made the same mistake before. @GopiGugan can you please roll back this commit and @SyouTono242 can you push your changes to the dev branch?

@ArtPoon
Contributor Author

ArtPoon commented Jul 12, 2022

Code committed to dev branch; incorporate into a PR and review effects of new code on beadplots

@ArtPoon
Contributor Author

ArtPoon commented Jul 19, 2022

Keep an eye on this new code while reviewing PR #409

@ArtPoon
Contributor Author

ArtPoon commented Jul 27, 2022

Let's work on porting this into JS for the next PR (after #409)

@ArtPoon ArtPoon removed this from the The backburner milestone Aug 9, 2022
@ArtPoon ArtPoon assigned GopiGugan and bonnielu and unassigned SyouTono242 Aug 16, 2022
@ArtPoon
Contributor Author

ArtPoon commented Aug 16, 2022

@GopiGugan to generate toy data sets to run beadplot.py script
@bonnielu to port @SyouTono242's code from R to Python (in beadplot.py)

@ArtPoon
Contributor Author

ArtPoon commented Aug 24, 2022

Apply sorting algorithm directly to data structure before it is serialized as a JSON file
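
As an illustration only, sorting before serialization might look like the sketch below. The nested-dict layout with `name`, `dist` and `children` keys is hypothetical and is not the actual structure written by beadplot.py.

```python
import json


def sort_nested(node, key):
    """Recursively sort the `children` list of a nested-dict tree in place."""
    node["children"] = sorted(node.get("children", []), key=key)
    for child in node["children"]:
        sort_nested(child, key)
    return node


# Hypothetical structure; children are listed out of order on purpose.
tree = {
    "name": "A",
    "dist": 0,
    "children": [{"name": "D", "dist": 2}, {"name": "C", "dist": 1}],
}
sort_nested(tree, key=lambda n: n["dist"])

# The order is fixed in the data itself, so the JSON consumer needs no sorting logic.
serialized = json.dumps(tree)
names = [c["name"] for c in json.loads(serialized)["children"]]
print(names)
```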

@ArtPoon
Contributor Author

ArtPoon commented Aug 30, 2022

@bonnielu has test data from @GopiGugan , code analysis in progress

@ArtPoon
Contributor Author

ArtPoon commented Sep 13, 2022

porting is finished, @bonnielu requesting larger test data

@ArtPoon
Contributor Author

ArtPoon commented Sep 20, 2022

We need to write some unit test fixtures (trees that should be re-ordered a specific way) to determine why these test data sets are not being changed.
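
A sketch of what such a fixture could look like with Python's `unittest`. The `reorder` function here is a hypothetical stand-in for the real ordering routine, and the fixture data are made up. The first test guards against the situation described above, where a test data set is already in the expected order and so cannot show whether the code changes anything.

```python
import unittest


def reorder(children, key):
    """Stand-in for the ordering routine under test (hypothetical)."""
    return sorted(children, key=key)


class TestReorder(unittest.TestCase):
    def setUp(self):
        # Fixture deliberately listed in the "wrong" order (ISO dates sort as text).
        self.first_dates = {"late": "2020-05-01", "early": "2020-03-01", "mid": "2020-04-01"}
        self.children = ["late", "early", "mid"]
        self.expected = ["early", "mid", "late"]

    def test_fixture_is_not_already_sorted(self):
        # If this fails, the fixture cannot detect a no-op implementation.
        self.assertNotEqual(self.children, self.expected)

    def test_children_sorted_by_first_date(self):
        result = reorder(self.children, key=self.first_dates.get)
        self.assertEqual(result, self.expected)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestReorder)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```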

@bonnielu
Collaborator

bonnielu commented Oct 4, 2022

I tested it out using the data that Gopi gave me and this is what the beadplots look like after reordering by date! The larger lineages do seem like they have a tendency to line up if we reorder by this method.

image
image

@ArtPoon
Contributor Author

ArtPoon commented Oct 4, 2022

Can you please post the "before" layouts?

@GopiGugan
Contributor

Screen Shot 2022-10-04 at 1 54 35 PM
Screen Shot 2022-10-04 at 1 54 57 PM

@ArtPoon
Contributor Author

ArtPoon commented Oct 4, 2022

Ok thanks! I'm willing to call the first, smaller beadplot an improved layout. Difficult to say for the second one - but I wonder what the heck is going on with all those samples taken on the same date?

@ArtPoon
Contributor Author

ArtPoon commented Oct 4, 2022

Can we please see what the beadplots look like if we prioritize earliest sampling date, and then tree traversal?

@bonnielu
Collaborator

This is what it looks like if we prioritize sorting by (last date - first date), earliest sampling date, followed by level-order tree traversal.

Screen Shot 2022-10-11 at 11 03 42 AM Screen Shot 2022-10-11 at 11 03 28 AM
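
For reference, a rough Python sketch of how such a combined ordering could be expressed. How the real implementation combines the sort keys with the traversal is not shown in this thread, so this is an assumption: children are sorted by (last date - first date) and then earliest date, and nodes are visited breadth-first. The `tree` and `spans` structures are hypothetical, and the span here is per node, not recursive.

```python
from collections import deque
from datetime import date


def level_order(tree, root, key):
    """Breadth-first traversal, visiting each node's children in `key` order."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(sorted(tree.get(node, []), key=key))
    return order


# Hypothetical spans: node -> (first sample date, last sample date)
spans = {
    "a": (date(2020, 3, 1), date(2020, 3, 20)),
    "b": (date(2020, 3, 5), date(2020, 3, 6)),
    "c": (date(2020, 4, 1), date(2020, 4, 2)),
    "d": (date(2020, 4, 3), date(2020, 4, 3)),
}
tree = {"root": ["a", "b"], "a": ["c"], "b": ["d"]}


def span_key(node):
    first, last = spans[node]
    return ((last - first).days, first)


order = level_order(tree, "root", span_key)
print(order)
```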

@ArtPoon
Contributor Author

ArtPoon commented Oct 11, 2022

Thanks for implementing this different approach @bonnielu - after reviewing the outputs, I think we should go ahead with the first revision (Yiran's version). Please revert to that algorithm and submit a PR to dev.

@ArtPoon
Contributor Author

ArtPoon commented Jan 23, 2024

I don't think we're going to find a good solution to this - the problem might be better addressed by collapsing variants (#434)

@ArtPoon ArtPoon closed this as completed Jan 23, 2024