-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYNPY-1322] Object Orientated Programming Interfaces #1013
Conversation
Hello @BryanFauble! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-01-22 20:36:28 UTC |
synapseclient/api/annotations.py
Outdated
""" | ||
annotations_dict = asdict(annotations) | ||
|
||
# TODO: Is there a more elegant way to handle this - This is essentially being used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another dataclass library that adds some more functionality. Its asdict
function has an exclude
argument that you can pass to leave out class attributes you don't want in your dictionary:
from dataclasses import dataclass
from dataclass_wizard.dumpers import asdict
@dataclass
class Foo:
bar: str
baz: int
foo = Foo('bar', 42)
print(asdict(foo, exclude=['baz']))
> {'bar': 'bar'}
You'd still be hard-coding the excluded values though. Not sure of any other ways aside from implementing an asdict
method specific to the class that excludes attributes not used in the API.
Edit: if you used a base class you could potentially implement an asdict
function that could be reused across all extending classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea - but i'm starting to think in the other direction now. What I mean by this is specify only those things I want to include and have a dataclass -> json/dict mapping process to format the input for the REST API as the API is expecting.
synapseclient/api/annotations.py
Outdated
filtered_dict = {k: v for k, v in annotations_dict.items() if k != "is_loaded"} | ||
|
||
# TODO: This `restPUT` returns back a dict (or string) - Could we use: | ||
# TODO: https://github.com/konradhalas/dacite to convert the dict to an object? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like the look of dacite
, but is that more efficient than just wrapping the response in the class object like we have done elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure yet - I'll have to give it some more thought.
One of the things we will need to do it at least have a thin translation layer between the REST api and the dataclasses because the names we are giving them in the python client are snake_case, vs the REST api is all in camelCase.
@BryanFauble Have you thought about having some sort of |
@BWMac Yes, however inheritance is an easy way to have large refactoring down the line for simple changes. Especially in cases where a base class is a part of many parent classes. IMO The easiest way I've seen to deal with it is to just replicate things where they are needed. I do understand it adds some duplicated code, but the duplication far out-weighs the negatives. |
I definitely agree that we should use inheritance carefully. In the case of Entities, technically FileEntities do derive from Entities, so it could make sense for that to use inheritance to avoid the duplication of code across 7 entities: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/Entity.html and having to update the code 7 separate times if something changes. Each implemenation comes with its pros/cons. It definitely makes sense to use MixIns/composition for different functionality for entities like
|
True - But also a recent example I came across is that a Project inherits an Entity, but I do like composition though to contain common functionality. Excellent discussion, let's continue this further when I get more models in place and we can combine as we see fit when we find overlapping functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I found the example scripts to do a good job of demonstrating the power of the OOP changes. As a user I would much prefer this way when using the client within other python packages that I work on.
test_scripts/oop_poc_table.py
Outdated
# Querying for data from a table ===================================================== | ||
destination_csv_location = os.path.expanduser("~/temp/my_query_results") | ||
|
||
await Table.query( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somthing about using a method from the Table
class outside the context of an instance of that class, but then feeding an attribute of the instance copy_of_table
rubs me the wrong way.
Not sure how else it should be done though since you have to pass the query string as an argument to the query
function somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is one of the things that I was struggling to figure out the best practice for. One thing to clarify is that we are not passing an instance of copy_of_table
, in this case I am just grabbing the ID of the table. This is equivalent to these lines:
table_id_to_query = copy_of_table.id
await Table.query(
query=f"SELECT * FROM {table_id_to_query}",
result_format=CsvResultFormat(download_location=destination_csv_location),
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, not an instance itself, but an attribute of that instance.
It would feel better if you could do something like
copy_of_table.query(query_args)
So that it reads like you are querying the table you have already. But formatting the query string could be weird and it's not like we don't want to let people do Table.query
to query a table already in Synapse that they don't care to read into a Python object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very much a future work/off topic thing, but what if we didn't make people pass the whole query string themselves?
We could use something like jinjasql to allow people to pass parameters that the Table
class could format into a query. Then someone could either pass the id of a table they want in an uninstantiated case or when there is an instance of the Table
class we are already using, we just grab the id
on the back end and plug it into the query.
Unfortunately, it looks like jinjasql
has not been maintained for a while, but maybe there is something else out there that could provide similar functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting library. My concerns when we get to solutioning on this are around continuing to provide the same if not more functionality for folks querying for data.
I had considered something like a "reference to my class instance" string that you could add into a query like:
select * from :my-table-instance-id
, but that relies on those querying for the data to add in that string constant.
By not tying the query to a specific instance of a Table it makes things like joins easier to work with. SQL can be very complex and it'd be hard to implement everything we want into a Python class or semi-structured way to doing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥 I'm going to pre-approve here. Overall, I think we can refactor the syncToSynapse function which will help the upload speeds.
This will also make it as seamless of a transition as possible.
synapseclient/models/file.py
Outdated
|
||
is_loaded: bool = False | ||
|
||
def convert_from_api_parameters( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a generalization camel case to snake case converter?
Quality Gate passedThe SonarCloud Quality Gate passed, but some issues were introduced. 54 New issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple minor comments but LGTM!
Quality Gate passedThe SonarCloud Quality Gate passed, but some issues were introduced. 103 New issues |
Background:
Working with the Synapse python client is a mix of many competing technologies, frameworks, and documentation.
Frameworks used:
Performance insights:
Some initial testing with the benchmark script herehttps://github.com/Sage-Bionetworks/synapsePythonClient/blob/develop/docs/scripts/benchmark.py - Using these new classes
Examples of how this OOP approach works:
See all of the scripts created in the
test_scripts/
folder. ie: https://github.com/Sage-Bionetworks/synapsePythonClient/blob/SYNPY-1322-OOP-POC/docs/scripts/object_orientated_programming_poc/oop_poc_project.pyThings left to do/investigate (Will be completed in follow-up epics):
requests
library (Like httpx: https://www.python-httpx.org/advanced/ ) lead to some better performance and async programming? Answer: Yeshttps://sagebionetworks.jira.com/browse/SYNPY-1411