Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use regex for parsing step_dir #739

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Use regex for parsing step_dir #739

wants to merge 1 commit into from

Conversation

nlusskin
Copy link

@nlusskin nlusskin commented Oct 7, 2024

Instead of assuming the end of a step directory will be a series of digits to be converted to a step number, uses regex to find the last group of STEP_NUM_DIGITS from the directory passed into the function.

@@ -81,8 +82,10 @@ class CheckpointValidationType(str, enum.Enum):


def parse_step_from_dir(step_dir: str) -> int:
# TODO(markblee): use regex.
return int(step_dir[-STEP_NUM_DIGITS:])
step = re.findall(r'(\d{' + str(STEP_NUM_DIGITS) + r'})', step_dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I do think we should be restrictive about the patterns that we accept, specifically that it ends with STEP_NUM_DIGITS digits. Should we update to a pattern that matches that and also add a test case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run into an issue quite often where we'll have some checkpoint suffixed (i.e. checkpoints/step_00001000-ext) where the only way to run this with axlearn is to copy this directory to step_00001000. How would you suggest we incorporate a pattern like this? Happy to add test cases to this PR as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll probably need to understand more about your use-case and why we need to suffix checkpoints. Can you open an internal PR for discussion?

Copy link
Contributor

@ruomingp ruomingp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will defer to @markblee for approval.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants