
[Fix] Fix bugs when loading KV files and more. Also make the Horovod save function callable on its own instead of patching the original save function. #364

Merged: 8 commits, merged Oct 28, 2023

Commits on Oct 27, 2023

  1. [fix] Fix loading too many values from DE value files when using the TF Dataset op, which could cause an insert error.
    MoFHeka committed Oct 27, 2023
    Commit 3fd94c9
  2. [fix] TensorFlow's default save function is not patched for now, since doing so can lead to unexpected errors.

    Use de.keras.models.de_hvd_save_model in place of tf.keras.models.save_model.
    MoFHeka committed Oct 27, 2023
    Commit 0fd6f47
  3. [fix] Fatal error: when restoring from KV files using insert_or_assign, every recovered value was identical to the first entry in the vector.
    MoFHeka committed Oct 27, 2023
    Commit a46ffed
  4. [fix] TrainableWrapper and DEResourceVariable should not save or restore parameters when using tf.train.Saver.
    MoFHeka committed Oct 27, 2023
    Commit d8169c1
  5. Commit a667e50
  6. [feat] Add the DEHvdSaver class, which is similar to tf.train.Saver and saves DE KV files separately for each rank when using Horovod all-to-all training.
    MoFHeka committed Oct 27, 2023
    Commit 43b6b33
  7. Commit eb7b4cb
  8. [feat] Add support for tf.train.Checkpoint and tf.train.CheckpointManager when using HvdAllToAllEmbedding, via de.train.DEHvdCheckpoint.
    MoFHeka committed Oct 27, 2023
    Commit 10e2160
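A minimal sketch of the invariant behind the first fix (3fd94c9): when restoring a table from separate key and value files, read exactly one embedding row per key so that the key and value counts handed to the insert match. The file layout here (little-endian int64 keys, float32 values, `dim` floats per key) and the function `load_kv` are assumptions for illustration, not TFRA's actual reader or file format.

```python
import struct

# Hypothetical layout (assumption, not TFRA's documented format):
# keys file = little-endian int64 values; values file = little-endian float32,
# `dim` floats per key, stored in the same order as the keys.


def load_kv(keys_bytes: bytes, values_bytes: bytes, dim: int):
    """Return (keys, rows) where rows[i] is the embedding row for keys[i]."""
    num_keys = len(keys_bytes) // 8
    keys = list(struct.unpack(f"<{num_keys}q", keys_bytes[: num_keys * 8]))

    # Read exactly num_keys * dim floats. In this sketch, consuming every float
    # in the file (e.g. trailing or padded data) would produce more rows than
    # keys, and a mismatched key/value count makes the table insert fail.
    needed = num_keys * dim
    available = len(values_bytes) // 4
    if available < needed:
        raise ValueError(f"values file holds {available} floats, need {needed}")

    flat = struct.unpack(f"<{needed}f", values_bytes[: needed * 4])
    rows = [list(flat[i * dim:(i + 1) * dim]) for i in range(num_keys)]
    return keys, rows
```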
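Commit 0fd6f47 stops patching tf.keras.models.save_model and exposes a separate function, de.keras.models.de_hvd_save_model, instead. The "wrap, don't patch" pattern can be sketched in plain Python, assuming every rank writes its own embedding shard while only rank 0 writes the rest of the model; `make_hvd_save`, `save_embedding_shard`, and that division of work are hypothetical and not the library's implementation.

```python
from typing import Callable


def make_hvd_save(original_save: Callable[..., None],
                  save_embedding_shard: Callable[[str, int], None],
                  rank: int) -> Callable[..., None]:
    """Build a separate save function; original_save itself is never modified.

    Hypothetical sketch: the rank-0-only dense save is an assumption.
    """
    def hvd_save(model, filepath, **kwargs):
        # Every rank writes the embedding shard it owns (assumed all-to-all layout).
        save_embedding_shard(filepath, rank)
        # Only rank 0 writes the rest of the model, via the unmodified original function.
        if rank == 0:
            original_save(model, filepath, **kwargs)
    return hvd_save
```

Because the original function is left untouched, code that still calls it directly keeps its usual behavior.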
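The insert_or_assign fix (a46ffed) describes a classic offset bug: every restored key received the first row of the value buffer. A minimal illustration, with a Python dict standing in for the hash table and hypothetical function names, contrasts the buggy and corrected slicing of a flattened value buffer.

```python
from typing import Dict, List, Sequence


def restore_buggy(table: Dict[int, List[float]], keys: Sequence[int],
                  flat_values: Sequence[float], dim: int) -> None:
    # Bug pattern: the slice never advances, so every key gets row 0.
    for key in keys:
        table[key] = list(flat_values[0:dim])


def restore_fixed(table: Dict[int, List[float]], keys: Sequence[int],
                  flat_values: Sequence[float], dim: int) -> None:
    if len(flat_values) != len(keys) * dim:
        raise ValueError("values length must equal len(keys) * dim")
    for i, key in enumerate(keys):
        # Offset the slice by the key's index so key i receives row i.
        # Dict assignment inserts or overwrites, mirroring insert_or_assign.
        table[key] = list(flat_values[i * dim:(i + 1) * dim])
```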
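Commit d8169c1 keeps TrainableWrapper and DEResourceVariable out of tf.train.Saver. One way to picture this is filtering the saver's variable list by type. The classes below are hypothetical stand-ins for the TFRA types, and isinstance filtering is an assumed approach, not the PR's code. The rationale is presumably that these variables hold per-batch lookup results, while the embedding table itself is persisted through its own KV files.

```python
class Variable:
    """Stand-in for an ordinary dense variable (hypothetical)."""

    def __init__(self, name: str):
        self.name = name


class TrainableWrapper(Variable):
    """Hypothetical stand-in, not the TFRA class."""


class DEResourceVariable(Variable):
    """Hypothetical stand-in, not the TFRA class."""


# Types a tf.train.Saver-style checkpoint should not own in this sketch.
_SKIP = (TrainableWrapper, DEResourceVariable)


def saver_var_list(variables):
    """Return only the variables a Saver-style checkpoint should save/restore."""
    return [v for v in variables if not isinstance(v, _SKIP)]
```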
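DEHvdSaver (43b6b33) saves DE KV files separately for each rank during Horovod all-to-all training. The sketch below assumes keys are partitioned by `key % world_size` and writes one JSON file per rank under a hypothetical `<table>_<rank>of<size>.json` name. It also lets a rank restore its keys when the saved world size differs. The naming scheme, the JSON format, and the resharding on restore are all assumptions of this sketch, not DEHvdSaver's documented behavior.

```python
import json
import os


def owner_rank(key: int, world_size: int) -> int:
    # Assumed partitioning rule: a key belongs to rank key % world_size.
    return key % world_size


def shard_path(directory: str, table: str, rank: int, world_size: int) -> str:
    # Hypothetical naming scheme: one file per rank, tagged "<rank>of<size>".
    return os.path.join(directory, f"{table}_{rank}of{world_size}.json")


def save_shard(directory: str, table: str, rank: int, world_size: int,
               shard: dict) -> str:
    """Write this rank's key -> embedding-row mapping to its own file."""
    path = shard_path(directory, table, rank, world_size)
    with open(path, "w") as f:
        json.dump({str(k): v for k, v in shard.items()}, f)
    return path


def restore_shard(directory: str, table: str, rank: int, world_size: int,
                  saved_world_size: int) -> dict:
    """Load the keys this rank owns, scanning every file from the saved run."""
    result = {}
    for r in range(saved_world_size):
        with open(shard_path(directory, table, r, saved_world_size)) as f:
            for k, v in json.load(f).items():
                key = int(k)
                if owner_rank(key, world_size) == rank:
                    result[key] = v
    return result
```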