Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concatenate and merge info, preserving conflicts as lists. #691

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

joseppinilla
Copy link
Contributor

If there is interest in merging info when concatenating samplesets since right now all info is ignored.
This preserves conflicts by listing them, but squeezes unique values.

@codecov-commenter
Copy link

codecov-commenter commented Aug 25, 2020

Codecov Report

Merging #691 into master will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #691      +/-   ##
==========================================
+ Coverage   91.85%   91.86%   +0.01%     
==========================================
  Files          60       60              
  Lines        4222     4229       +7     
==========================================
+ Hits         3878     3885       +7     
  Misses        344      344              
Impacted Files Coverage Δ
dimod/sampleset.py 94.32% <100.00%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ec6944...370123d. Read the comment docs.

@arcondello arcondello added the enhancement New feature or request label Aug 27, 2020
@arcondello
Copy link
Member

Hmm, I am worried that the conflict resolution would lead to inconsistent/confusing results. I wonder if we could combine it with #643 by always making a list, even when the values are identical?

Copy link
Member

@randomir randomir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very nice feature, but I share @arcondello's concern about consistency.

I suggest adding an optional info_merge_strategy argument (or similar) which would control how info fields are combined:

  • "squeeze" could be your current implementation,
  • "list" would always form lists,
  • "drop" would skip the merge altogether,
  • "inplace" or "recursive" would recursively combine all info dicts,
  • custom callable would allow merge delegation to user's custom function of n dict args.

Inplace/recursive merge might be an overkill, so we can drop that one for now.

In addition, you'd probably want to make copies before merging them, due to dict's mutability.

@joseppinilla
Copy link
Contributor Author

I was considering one option flag, e.g. squeeze_info=False which would either behave like "list" by default or "squeeze" if True.

I see now how "drop" would allow compatibility with the current implementation, but I'm not sure if adding more flexibility is necessary.

@randomir
Copy link
Member

I prefer categorical option to boolean because it allows future expansion. Accepting a callable (in addition) is trivial and provides an absolute flexibility.

@joseppinilla
Copy link
Contributor Author

One reason I didn't originally implement "list" is because I had to decide whether lists are always len(samplesets), e.g filled with None, or if they only list values of existing keys.

I think I'd go for the latter, even though it wouldn't allow tracking.

@randomir
Copy link
Member

randomir commented Aug 27, 2020

IMO, "list" should include fields from all samplesets. We might consider splitting "list" into "list-all" and "list-existing", although "squeeze" is conceptually already very similar to "list-existing". @arcondello, thoughts?

@joseppinilla
Copy link
Contributor Author

squeeze may not be the right name since it's not the same behaviour as numpy squeeze, it only lists conflicts... so... "list-conflicts"?

@randomir
Copy link
Member

I don't see how numpy.squeeze is relevant here, but I agree it's too general. unique sounds much better.

@arcondello arcondello changed the base branch from master to main November 27, 2020 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants