Data.copy silently adding name attribute #1402

Open
yck011522 opened this issue Oct 22, 2024 · 3 comments

@yck011522
Contributor

Describe the bug
I found that Data.copy() somehow adds the name attribute to the JSON string after the copy. This affects native COMPAS geometry classes (e.g. Frame) as well as classes that I have derived from Data.

The attribute is only added in the output of to_jsonstring(), not in __data__. However, it changes the result of sha256(), so the copied object returns a different hash (see the example code below).

Now, I'm not sure if this is the intended behavior (for version control?). My goal is to use a hash function to compare the data content of objects. I assumed sha256() was meant for this purpose, but maybe it is not? If not, could you clarify the best practice for comparing the data content of two objects, especially when the data contains geometry, strings, and lists?

To Reproduce
The following example shows not only the added name attribute but also floating-point differences introduced when the frame is copied. Both throw off the hash comparison.

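if __name__ == "__main__":
    import compas
    from compas.geometry import Frame  # needed for the Frame used below

    print(compas.__version__)

    frame = Frame([1, 2, 3], [0.1, 0.2, 0.3])

    # Compare the JSON string of the original and of a copy
    print(" to_jsonstring(): ")
    print(frame.to_jsonstring())
    print(frame.copy().to_jsonstring())

    # Compare the raw data dictionaries
    print(" __data__: ")
    print(frame.__data__)
    print(frame.copy().__data__)

    # Compare the hashes
    print(" sha256(): ")
    print(frame.sha256())
    print(frame.copy().sha256())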

output:

2.4.2
 to_jsonstring():
{"dtype": "compas.geometry/Frame", "data": {"point": [1.0, 2.0, 3.0], "xaxis": [0.2672612419124244, 0.5345224838248488, 0.8017837257372731], "yaxis": [-0.16903085094570336, 0.8451542547285167, -0.50709255283711]}, "guid": "fe05d20f-69b4-4bc0-ba25-88e8d1404933"}
{"dtype": "compas.geometry/Frame", "data": {"point": [1.0, 2.0, 3.0], "xaxis": [0.26726124191242445, 0.5345224838248489, 0.8017837257372732], "yaxis": [-0.16903085094570341, 0.8451542547285167, -0.5070925528371101]}, "name": "Frame", "guid": "3a58794d-273c-4285-b875-cbff06e68c34"}
 __data__:
{'point': [1.0, 2.0, 3.0], 'xaxis': [0.2672612419124244, 0.5345224838248488, 0.8017837257372731], 'yaxis': [-0.16903085094570336, 0.8451542547285167, -0.50709255283711]}
{'point': [1.0, 2.0, 3.0], 'xaxis': [0.26726124191242445, 0.5345224838248489, 0.8017837257372732], 'yaxis': [-0.16903085094570341, 0.8451542547285167, -0.5070925528371101]}
 sha256():
b'\xa7\x1c\xea+U\xd1\xd4\xea%u\xb5\x86+r\x10\xc4\xcb\x13\xc3\xb0\xa3\xb3\xfaK\xc6 Rt\x974\x83\x8e'
b'S\x9f\xe1\xdb9.\xf4\xc6\xb61f\xb8\x87\xee\x16@\x15}\xb1\x1c\xb4\xbff\xdf^5\xe1\xf4\xe3\x1f\xce\xb6'

Expected behavior
I expect the frame.sha256() and frame.copy().sha256() to return the same results.

In general, I want a copy mechanism that returns an object with the same data (I'm not sure what should happen with the guid, though; perhaps users could choose to copy it too). I also want to be able to verify the result of that copy with some comparison function. I hope these two functions can act as a pair, so that I can check my class implementation to make sure I implemented __data__ correctly.

my_class.data_hash() == my_class.some_copy_data_function().data_hash()
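
A minimal sketch of what such a pair could look like, assuming the hash is computed from __data__ alone so that metadata such as name or guid cannot influence it. This is not COMPAS API: Beam, data_hash and copy_from_data are hypothetical names used only for illustration.

```python
import copy
import hashlib
import json


class Beam:
    """Hypothetical stand-in for a Data subclass (not a COMPAS class)."""

    def __init__(self, length, tags, name=None):
        self.length = length
        self.tags = tags
        self.name = name  # metadata, deliberately not part of __data__

    @property
    def __data__(self):
        return {"length": self.length, "tags": self.tags}


def data_hash(obj):
    """Hex SHA-256 digest of the object's __data__ only."""
    # sort_keys and fixed separators give a canonical, order-independent encoding
    payload = json.dumps(obj.__data__, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def copy_from_data(obj):
    """Rebuild an object of the same type from a deep copy of its __data__."""
    return type(obj)(**copy.deepcopy(obj.__data__))


beam = Beam(3.5, ["a", "b"], name="original")
clone = copy_from_data(beam)
clone.name = "renamed copy"  # metadata changes do not affect the hash

print(data_hash(beam) == data_hash(clone))
```

Because the stand-in class stores its inputs unchanged, the two digests match exactly. A class that recomputes values in its constructor (normalizing vectors, for instance) can still produce slightly different floats and therefore a different digest.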
@tomvanmele
Member

yes, the hash method is meant for comparisons but is a bit experimental and not very well tested.

the addition of the name is indeed unintentional and can be easily fixed (will do). what is a bit more difficult to solve is the introduction of small numerical differences, due to subsequent operations like unitized being applied to the input data...

>>> vector = Vector(0.1, 0.2, 0.3)
>>> vector.unitized()
Vector(x=0.2672612419124244, y=0.5345224838248488, z=0.8017837257372731)
>>> Vector(*vector.unitized()).unitized()
Vector(x=0.26726124191242445, y=0.5345224838248489, z=0.8017837257372732)
>>> Vector(*Vector(*vector.unitized()).unitized()).unitized()
Vector(x=0.26726124191242445, y=0.5345224838248489, z=0.8017837257372732)

the first time around, unitized is applied to Vector(0.1, 0.2, 0.3).
the second time it is applied to Vector(x=0.2672612419124244, y=0.5345224838248488, z=0.8017837257372731)
and after that always to Vector(x=0.26726124191242445, y=0.5345224838248489, z=0.8017837257372732)
which is when the number stays completely stable...
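
The same effect can be reproduced without COMPAS. The sketch below uses a plain-Python unitized helper (an assumption for illustration, not the library's implementation): the length of an already normalized vector is usually not exactly 1.0 in floating point, so dividing by it again can shift the last digit.

```python
import math


def unitized(v):
    """Return v scaled to unit length (plain-Python stand-in, not COMPAS code)."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


first = unitized([0.1, 0.2, 0.3])
second = unitized(first)

# The two results agree to within a few units in the last place, but nothing
# guarantees that they are bit-for-bit identical.
for a, b in zip(first, second):
    print(repr(a), repr(b), abs(a - b))
```

Any exact comparison of the two results, including a hash of their printed digits, is sensitive to that last-digit difference.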

@tomvanmele
Member

perhaps hashing needs to take some kind of tolerance into account...
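
One possible shape for this, as a rough sketch only: round every float in the data to a fixed number of decimals before hashing. The helper names rounded and tolerant_hash and the default precision of 9 are assumptions, not existing COMPAS functions. The two dictionaries are the __data__ values printed in the issue above.

```python
import hashlib
import json


def rounded(value, precision):
    """Recursively round every float in nested dicts, lists and tuples."""
    if isinstance(value, float):
        return round(value, precision) + 0.0  # "+ 0.0" turns -0.0 into 0.0
    if isinstance(value, dict):
        return {key: rounded(item, precision) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [rounded(item, precision) for item in value]
    return value


def tolerant_hash(data, precision=9):
    """Hex SHA-256 digest of the data after rounding its floats."""
    payload = json.dumps(rounded(data, precision), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# __data__ of the frame and of its copy, taken from the output in this issue.
original = {
    "point": [1.0, 2.0, 3.0],
    "xaxis": [0.2672612419124244, 0.5345224838248488, 0.8017837257372731],
    "yaxis": [-0.16903085094570336, 0.8451542547285167, -0.50709255283711],
}
copied = {
    "point": [1.0, 2.0, 3.0],
    "xaxis": [0.26726124191242445, 0.5345224838248489, 0.8017837257372732],
    "yaxis": [-0.16903085094570341, 0.8451542547285167, -0.5070925528371101],
}

print(tolerant_hash(original) == tolerant_hash(copied))
```

Rounding is not a true tolerance check: two values that are very close but fall on opposite sides of a rounding boundary still produce different digests, so this reduces mismatches without eliminating them.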

@yck011522
Contributor Author

Cool. Thanks for the quick reply. So I guess I will hold off on using the hash for comparing geometry for now.

I remember that a while ago there was the concept of using geometric keys (some form of string representation and truncation) for comparison. And of course comparison is now much more robust with the Tolerance class. I guess comparison between geometry classes should still rely on the __eq__ methods, which can be customised.
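
To make the two ideas concrete, here is a small plain-Python sketch. COMPAS has its own tolerance facilities; the geometric_key and allclose helpers below are stand-ins written for illustration, with assumed default values, and they do not reproduce the library's signatures.

```python
import math


def geometric_key(xyz, precision=3):
    """Build a string key from coordinates rounded to `precision` decimals."""
    # "+ 0.0" normalizes -0.0 so tiny negative values do not yield "-0.000".
    return ",".join(
        format(round(c, precision) + 0.0, f".{precision}f") for c in xyz
    )


def allclose(a, b, atol=1e-9):
    """Component-wise comparison within an absolute tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=atol) for x, y in zip(a, b)
    )


# xaxis of the frame and of its copy, taken from the output in this issue.
xaxis = [0.2672612419124244, 0.5345224838248488, 0.8017837257372731]
xaxis_copy = [0.26726124191242445, 0.5345224838248489, 0.8017837257372732]

print(geometric_key(xaxis, 6) == geometric_key(xaxis_copy, 6))
print(allclose(xaxis, xaxis_copy))
```

The key-based check has the same boundary problem as any rounding scheme, while the tolerance-based check does not; that is one argument for keeping equality in a customisable comparison method rather than in a hash.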

To me, using a hash for comparison implies that it is fast, but I don't know enough to say. Perhaps hashing floats is just generally a bad idea.
