Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Add delay to L0_lifecycle test_load_new_model_version after each model file update #7735

Merged
merged 3 commits into from
Oct 24, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions qa/L0_lifecycle/lifecycle_test.py
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving to thread.

Base on the finding, my hypothesis for the failure is the model config update has been written to disk, according to the Python script, which at that point the server should see the updated file, but for some reason the server was seeing the old version by the time the load operation arrived. I will update the test accordingly.

Is there a bug in checking file modification time or something?

Copy link
Contributor Author

@kthui kthui Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is unique to the CI, because I ran the test from the failing container locally 10 times and they all passed. I ran the test ~560 times last night on the CI without any time.sleep() and they all passed, see job 118186442.

Edit: The ~560 times CI run has --log-verbose=2, which could change some timing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think another possibility is the timestamp is the same before and after the file update, but this breaks our assumption that a ns precision timestamp is sufficient at determining the order of events.

Original file line number Diff line number Diff line change
Expand Up @@ -3531,6 +3531,7 @@ def test_load_new_model_version(self):
f.write(config)
# reload the model
client.load_model(model_name)
time.sleep(0.2)

# version 1 is unmodified so it should not be reloaded
# version 2 is modified so it should be reloaded
Expand All @@ -3552,6 +3553,7 @@ def test_load_new_model_version(self):
Path(os.path.join("models", model_name, "dummy_dependency.py")).touch()
# reload the model
client.load_model(model_name)
time.sleep(0.2)

# all 4 versions should be reloaded
self.assertTrue(client.is_model_ready(model_name, "1"))
Expand Down Expand Up @@ -3579,6 +3581,7 @@ def test_load_new_model_version(self):
f.write(config)
# reload the model
client.load_model(model_name)
time.sleep(0.2)

# only version 4 should be available and no reloads should happen
self.assertFalse(client.is_model_ready(model_name, "1"))
Expand Down Expand Up @@ -3606,6 +3609,7 @@ def test_load_new_model_version(self):
f.write(config)
# reload the model
client.load_model(model_name)
time.sleep(0.2)

# version 1 should be loaded and version 4 should not be reloaded
self.assertTrue(client.is_model_ready(model_name, "1"))
Expand Down
Loading