Write JSON state #997
base: master
Conversation
Right now we make at most 2N calls to the Tron API during config deployments: N to get the current configs and at most N if all services have changes. To start, I'd like to reduce this to N by allowing GET /api/config to return all the configs so that the only requests needed are POSTs for changed configs. Depending on how this goes, we can look into batching up the POSTs so that we can also do that in a single request. In terms of speed, it looks like loading all the configs from pnw-prod (on my devbox) with this new behavior takes ~3s - which isn't great, but there's a decent bit of file IO going on here :(
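A minimal sketch of the flow described above, with the HTTP calls abstracted behind callables so it runs standalone (the function and parameter names here are illustrative assumptions, not Tron's actual client code):

```python
from typing import Callable, Dict, List


def deploy(
    local_configs: Dict[str, str],
    fetch_all: Callable[[], Dict[str, str]],  # one GET /api/config for everything
    push: Callable[[str, str], None],         # one POST per changed config
) -> List[str]:
    """Fetch every remote config in a single request, then POST only the diffs."""
    remote = fetch_all()
    changed = [name for name, cfg in local_configs.items() if remote.get(name) != cfg]
    for name in changed:
        push(name, local_configs[name])
    return changed


pushed: List[str] = []
changed = deploy(
    {"svc_a": "v2", "svc_b": "v1"},
    fetch_all=lambda: {"svc_a": "v1", "svc_b": "v1"},
    push=lambda name, cfg: pushed.append(name),
)
# Only svc_a differs from the remote copy, so only one POST happens.
```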
…d to_json methods for classes in DynamoDB restore flow. Write an additional attribute to DynamoDB to capture non-pickled state_data.
…odb_state_store to something a little more explanatory now that we have 2 versions
ty for adding all these types as well!
```python
@staticmethod
def to_json(state_data: dict) -> str:
    return json.dumps(
        {
            "status_path": state_data["status_path"],
            "exec_path": state_data["exec_path"],
        }
    )
```
should these `to_json()` functions be normal (instance) methods so that it's easier to add additional data to be serialized in the future? as-is, we'd need to track down any calls of `to_json()` for any modified classes and ensure that the state_data dict that we build for those calls has any new fields
e.g.,
```diff
-@staticmethod
-def to_json(state_data: dict) -> str:
-    return json.dumps(
-        {
-            "status_path": state_data["status_path"],
-            "exec_path": state_data["exec_path"],
-        }
-    )
+def to_json(self) -> str:
+    return json.dumps(
+        {
+            "status_path": self.status_path,
+            "exec_path": self.exec_path,
+        }
+    )
```
i'm also debating whether or not we should have this return a dict and we call json.dumps() before saving, but that's probably not too big a change if we wanna do that later - and the current approach means that if something cannot be serialized to json, we'll get a better traceback :)
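For reference, the dict-returning variant would look something like this (a hypothetical sketch; the class name is made up, and only the two fields from the diff above are included):

```python
import json


class FakeActionCommand:
    """Hypothetical class showing the return-a-dict alternative."""

    def __init__(self, status_path: str, exec_path: str) -> None:
        self.status_path = status_path
        self.exec_path = exec_path

    def to_state_dict(self) -> dict:
        # Return plain data; the caller serializes right before saving.
        return {
            "status_path": self.status_path,
            "exec_path": self.exec_path,
        }


cmd = FakeActionCommand("/tmp/status", "/bin/true")
serialized = json.dumps(cmd.to_state_dict())  # json.dumps happens once, at save time
```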
oh, i see - a lot of these have `state_data()` properties! i think that makes this less pressing since the calls will (i assume) look like `SomeClass.to_json(some_object.state_data)`, but it might still be nicer to switch, since for things that do implement `state_data()` this would look like
```diff
-@staticmethod
-def to_json(state_data: dict) -> str:
-    return json.dumps(
-        {
-            "status_path": state_data["status_path"],
-            "exec_path": state_data["exec_path"],
-        }
-    )
+def to_json(self) -> str:
+    state_data = self.state_data
+    return json.dumps(
+        {
+            "status_path": state_data["status_path"],
+            "exec_path": state_data["exec_path"],
+        }
+    )
```
and for classes that don't have a `state_data` property we'd be in the scenario described above and would benefit from this no longer being a staticmethod :)
(another benefit would also be avoiding any potential circular imports in the future, since we won't need to import classes just to call `to_json()` :)
oh, i see - when this gets called in `tron/serialize/runstate/dynamodb_state_store.py` we just have a key and a dict ;_;
hmm, maybe this is better for a post-pickle-deletion refactor where we update the simplified code to pass around the actual objects rather than state_data dicts
```python
end_time: Optional[datetime.datetime] = None,
run_state: str = SCHEDULED,
exit_status: Optional[int] = None,
attempts: Optional[list] = None,  # TODO: list of...ActionCommandConfig?
```
i think this is `Optional[List[ActionRunAttempt]]`
```python
run_state: str = SCHEDULED,
exit_status: Optional[int] = None,
attempts: Optional[list] = None,  # TODO: list of...ActionCommandConfig?
action_runner: Optional[Union[NoActionRunnerFactory, SubprocessActionRunnerFactory]] = None,
```
maybe one day we'll either remove `NoActionRunnerFactory` (i don't think we use it outside of tests) or we'll add an abstract base class or something with the expected interface so that we don't need a union here :p
```diff
@@ -134,6 +154,7 @@ def _merge_items(self, first_items, remaining_items) -> dict:
         return deserialized_items

     def save(self, key_value_pairs) -> None:
+        log.debug(f"Adding to save queue: {key_value_pairs}")
```
would this be too noisy in production?
```diff
-if val is not None:
-    self[key] = pickle.dumps(val)
+if pickled_val is not None:
+    self.__setitem__(key, pickle.dumps(pickled_val), json_val)
```
just curious: how come we need `__setitem__` now?
ah, i see - i think we probably want to keep the existing signature since `__setitem__` is an established "dunder" method in python. what we'd probably need to do here instead is to type the value for `__setitem__` as a tuple :)
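Something along these lines, as a rough sketch (the class and attribute names are hypothetical, not the actual dynamodb_state_store code): keep the standard two-argument `__setitem__` and make the value a `(pickled, json)` tuple:

```python
import pickle
from typing import Dict, Tuple


class DualValueStore:
    """Hypothetical store keeping both the pickled and JSON forms per key."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, str]] = {}

    def __setitem__(self, key: str, value: Tuple[bytes, str]) -> None:
        # Standard dunder signature; the tuple carries both representations.
        pickled_val, json_val = value
        self._data[key] = (pickled_val, json_val)


store = DualValueStore()
store["job_state some_job"] = (pickle.dumps({"runs": []}), '{"runs": []}')
```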
```python
def get_type_from_key(self, key: str) -> str:
    return key.split()[0]

def _serialize_item(self, key: str, state: Dict[str, Any]) -> str:
```
if this only accepts two values for `key`, i think we can write `key: Literal[runstate.JOB_STATE, runstate.JOB_RUN_STATE]` rather than `key: str`
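A small illustration of the `Literal` idea. One caveat: `Literal[...]` only accepts actual literal values (or `Final` aliases of them), so the real `runstate` constants may need to be declared accordingly; the strings below are stand-ins:

```python
import json
from typing import Any, Dict, Literal

# Stand-ins for runstate.JOB_STATE / runstate.JOB_RUN_STATE.
KeyType = Literal["job_state", "job_run_state"]


def serialize_item(key: KeyType, state: Dict[str, Any]) -> str:
    # mypy will now flag any other string passed as `key`.
    return json.dumps({key: state})


result = serialize_item("job_state", {"runs": 3})
```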
```python
return sorted_groups  # type: ignore
return sorted_groups  # type: ignore
```
could we add comments for these ignores?
```python
"json_val": {
    "S": json_val[index * OBJECT_SIZE : min(index * OBJECT_SIZE + OBJECT_SIZE, len(json_val))]
},
"num_json_val_partitions": {
```
i always thought that this num_partitions stuff was something that dynamo required 🤣 - i guess this is just for our usage (to know how many partitions we're using per-item?)?
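As far as I can tell it's indeed our own bookkeeping. A runnable sketch of the chunking logic in the snippet above (with a toy `OBJECT_SIZE` and a hypothetical function name; the real chunk size would be tuned to DynamoDB's 400 KB item limit):

```python
OBJECT_SIZE = 10  # toy chunk size for illustration; the real value is much larger


def partition_json_val(json_val: str) -> list:
    """Split a serialized blob into fixed-size chunks, recording the chunk
    count so a reader knows how many partitions to fetch back per item."""
    num_partitions = -(-len(json_val) // OBJECT_SIZE)  # ceiling division
    items = []
    for index in range(num_partitions):
        chunk = json_val[index * OBJECT_SIZE : min(index * OBJECT_SIZE + OBJECT_SIZE, len(json_val))]
        items.append(
            {
                "json_val": {"S": chunk},
                "num_json_val_partitions": {"N": str(num_partitions)},
            }
        )
    return items


parts = partition_json_val('{"status_path": "/tmp/s", "exec_path": "/bin/true"}')
# Joining the chunks back together recovers the original string.
```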
What
We store job and job run state as pickles. We would like to not do that. This is part of the path to not doing that by writing state data as JSON alongside our current pickles. Restoring from JSON will follow in another PR.
Why
In DAR-2328, the removal of the mesos-related code led to Tron job state being reset, resulting in job runs starting from "0". The pickles weren't being unpickled correctly, since unpickling relies on classes that were deleted along with the mesos code.