Write JSON state #997
base: master
Conversation
Right now we make at most 2N calls to the Tron API during config deployments: N to get the current configs and at most N if all services have changes. To start, I'd like to reduce this to N by allowing GET /api/config to return all the configs so that the only requests needed are POSTs for changed configs. Depending on how this goes, we can look into batching up the POSTs so that we can also do that in a single request. In terms of speed, it looks like loading all the configs from pnw-prod (on my devbox) with this new behavior takes ~3s - which isn't great, but there's a decent bit of file IO going on here :(
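A minimal sketch of the flow described above, with the HTTP calls abstracted behind callables so it runs standalone (the function and parameter names here are illustrative assumptions, not Tron's actual client code):

```python
from typing import Callable, Dict, List


def deploy(
    local_configs: Dict[str, str],
    fetch_all: Callable[[], Dict[str, str]],  # one GET /api/config for everything
    push: Callable[[str, str], None],         # one POST per changed config
) -> List[str]:
    """Fetch every remote config in a single request, then POST only the diffs."""
    remote = fetch_all()
    changed = [name for name, cfg in local_configs.items() if remote.get(name) != cfg]
    for name in changed:
        push(name, local_configs[name])
    return changed


pushed: List[str] = []
changed = deploy(
    {"svc_a": "v2", "svc_b": "v1"},
    fetch_all=lambda: {"svc_a": "v1", "svc_b": "v1"},
    push=lambda name, cfg: pushed.append(name),
)
# Only svc_a differs from the remote copy, so only one POST happens.
```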
…d to_json methods for classes in DynamoDB restore flow. Write an additional attribute to DynamoDB to capture non-pickled state_data.
…odb_state_store to something a little more explanatory now that we have 2 versions
ty for adding all these types as well!
```python
@staticmethod
def to_json(state_data: dict) -> str:
    return json.dumps(
        {
            "status_path": state_data["status_path"],
            "exec_path": state_data["exec_path"],
        }
    )
```
should these `to_json()` functions be normal (instance) methods so that it's easier to add additional data to be serialized in the future? as-is, we'd need to track down any calls of `to_json()` for any modified classes and ensure that the state_data dict that we build for those calls has any new fields
e.g.,
```diff
-@staticmethod
-def to_json(state_data: dict) -> str:
-    return json.dumps(
-        {
-            "status_path": state_data["status_path"],
-            "exec_path": state_data["exec_path"],
-        }
-    )
+def to_json(self) -> str:
+    return json.dumps(
+        {
+            "status_path": self.status_path,
+            "exec_path": self.exec_path,
+        }
+    )
```
i'm also debating whether or not we should have this return a dict and we call json.dumps() before saving, but that's probably not too big a change if we wanna do that later - and the current approach means that if something cannot be serialized to json, we'll get a better traceback :)
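For reference, the dict-returning variant would look something like this (a hypothetical sketch; the class name is made up, and only the two fields from the diff above are included):

```python
import json


class FakeActionCommand:
    """Hypothetical class showing the return-a-dict alternative."""

    def __init__(self, status_path: str, exec_path: str) -> None:
        self.status_path = status_path
        self.exec_path = exec_path

    def to_state_dict(self) -> dict:
        # Return plain data; the caller serializes right before saving.
        return {
            "status_path": self.status_path,
            "exec_path": self.exec_path,
        }


cmd = FakeActionCommand("/tmp/status", "/bin/true")
serialized = json.dumps(cmd.to_state_dict())  # json.dumps happens once, at save time
```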
oh, i see - a lot of these have `state_data()` properties! i think that makes this less pressing since the calls will (i assume) look like `SomeClass.to_json(some_object.state_data)`, but it might still be nicer to switch, since for things that do implement `state_data()` this would look like
```diff
-@staticmethod
-def to_json(state_data: dict) -> str:
-    return json.dumps(
-        {
-            "status_path": state_data["status_path"],
-            "exec_path": state_data["exec_path"],
-        }
-    )
+def to_json(self) -> str:
+    state_data = self.state_data
+    return json.dumps(
+        {
+            "status_path": state_data["status_path"],
+            "exec_path": state_data["exec_path"],
+        }
+    )
```
and for classes that don't have a `state_data` property we'd be in the scenario described above and would benefit from this no longer being a staticmethod :)
(another benefit would also be avoiding any potential circular imports in the future, since we won't need to import classes just to call `to_json()` :)
oh, i see - when this gets called in `tron/serialize/runstate/dynamodb_state_store.py` we just have a key and a dict ;_;
hmm, maybe this is better for a post-pickle-deletion refactor where we update the simplified code to pass around the actual objects rather than state_data dicts
```python
end_time: Optional[datetime.datetime] = None,
run_state: str = SCHEDULED,
exit_status: Optional[int] = None,
attempts: Optional[list] = None,  # TODO: list of...ActionCommandConfig?
```
i think this is `Optional[List[ActionRunAttempt]]`
```python
run_state: str = SCHEDULED,
exit_status: Optional[int] = None,
attempts: Optional[list] = None,  # TODO: list of...ActionCommandConfig?
action_runner: Optional[Union[NoActionRunnerFactory, SubprocessActionRunnerFactory]] = None,
```
maybe one day we'll either remove `NoActionRunnerFactory` (i don't think we use it outside of tests) or we'll add an abstract base class or something with the expected interface so that we don't need a union here :p
```diff
@@ -134,6 +154,7 @@ def _merge_items(self, first_items, remaining_items) -> dict:
         return deserialized_items

     def save(self, key_value_pairs) -> None:
+        log.debug(f"Adding to save queue: {key_value_pairs}")
```
would this be too noisy in production?
```diff
-if val is not None:
-    self[key] = pickle.dumps(val)
+if pickled_val is not None:
+    self.__setitem__(key, pickle.dumps(pickled_val), json_val)
```
just curious: how come we need `__setitem__` now?
ah, i see - i think we probably want to keep the existing signature since `__setitem__` is an established "dunder" method in python. what we'd probably need to do here instead is to type the value for `__setitem__` as a tuple :)
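Something along these lines, as a rough sketch (the class and attribute names are hypothetical, not the actual dynamodb_state_store code): keep the standard two-argument `__setitem__` and make the value a `(pickled, json)` tuple:

```python
import pickle
from typing import Dict, Tuple


class DualValueStore:
    """Hypothetical store keeping both the pickled and JSON forms per key."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, str]] = {}

    def __setitem__(self, key: str, value: Tuple[bytes, str]) -> None:
        # Standard dunder signature; the tuple carries both representations.
        pickled_val, json_val = value
        self._data[key] = (pickled_val, json_val)


store = DualValueStore()
store["job_state some_job"] = (pickle.dumps({"runs": []}), '{"runs": []}')
```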
```python
def get_type_from_key(self, key: str) -> str:
    return key.split()[0]

def _serialize_item(self, key: str, state: Dict[str, Any]) -> str:
```
if this only accepts two values for `key`, i think we can write `key: Literal[runstate.JOB_STATE, runstate.JOB_RUN_STATE]` rather than `key: str`
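A small illustration of the `Literal` idea. One caveat: `Literal[...]` only accepts actual literal values (or `Final` aliases of them), so the real `runstate` constants may need to be declared accordingly; the strings below are stand-ins:

```python
import json
from typing import Any, Dict, Literal

# Stand-ins for runstate.JOB_STATE / runstate.JOB_RUN_STATE.
KeyType = Literal["job_state", "job_run_state"]


def serialize_item(key: KeyType, state: Dict[str, Any]) -> str:
    # mypy will now flag any other string passed as `key`.
    return json.dumps({key: state})


result = serialize_item("job_state", {"runs": 3})
```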
```python
return sorted_groups  # type: ignore
return sorted_groups  # type: ignore
```
could we add comments for these ignores?
```python
"json_val": {
    "S": json_val[index * OBJECT_SIZE : min(index * OBJECT_SIZE + OBJECT_SIZE, len(json_val))]
},
"num_json_val_partitions": {
```
i always thought that this num_partitions stuff was something that dynamo required 🤣 - i guess this is just for our usage (to know how many partitions we're using per-item?)?
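As far as I can tell it's indeed our own bookkeeping. A runnable sketch of the chunking logic in the snippet above (with a toy `OBJECT_SIZE` and a hypothetical function name; the real chunk size would be tuned to DynamoDB's 400 KB item limit):

```python
OBJECT_SIZE = 10  # toy chunk size for illustration; the real value is much larger


def partition_json_val(json_val: str) -> list:
    """Split a serialized blob into fixed-size chunks, recording the chunk
    count so a reader knows how many partitions to fetch back per item."""
    num_partitions = -(-len(json_val) // OBJECT_SIZE)  # ceiling division
    items = []
    for index in range(num_partitions):
        chunk = json_val[index * OBJECT_SIZE : min(index * OBJECT_SIZE + OBJECT_SIZE, len(json_val))]
        items.append(
            {
                "json_val": {"S": chunk},
                "num_json_val_partitions": {"N": str(num_partitions)},
            }
        )
    return items


parts = partition_json_val('{"status_path": "/tmp/s", "exec_path": "/bin/true"}')
# Joining the chunks back together recovers the original string.
```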
What
We store job and job run state as pickles. We would like to not do that. This is part of the path to not doing that by writing state data as JSON alongside our current pickles. Restoring from JSON will follow in another PR.
Why
In DAR-2328, the removal of the mesos-related code led to Tron job state being reset, resulting in job runs starting from "0". The pickles weren't being unpickled correctly, since unpickling relies on classes that were deleted along with the mesos code.