Commit
* Add signature method for Serializable objects
* Update signature to only keep track of the step names and not their internal info
* Refactor hash generation
* Add dummy batch manager from dag
* Update batch manager cache tests to start batch manager from a DAG
* Draft of integration tests for new caching
* Checkpoint draft
* Add cache directory location
* Add use_cache argument to Step for future use
* Change output names to keep track of them while debugging
* Make use of use_cache at the step level
* Add docstrings for internal batch manager arguments
* Remove path from add_batch method
* Move step caching to get_batch method in batch manager step
* Read batches from cached dir
* Set every step cache to False if the pipeline has the cache as False
* Comment for the batch manager
* Move back to caching from add_step
* Checkpoint current status
* Add use_cache on step
* If there's previous data saved, concatenate the content of the parquet files
* Only read the distiset from cache if all the steps are the same, otherwise overwrite
* Add changes to make loading a new and modified step feasible
* Set use cache to True by default
* Move logic of registering the batches to BasePipeline._register_batch to do it before calling _manage_batch_flows
* Avoid reading parquet file from cache when any of the steps has use_cache=False
* Add is_convergence method to DAG and cleanup batch_manager
* Add integration tests for the new caching mechanism
* Update unit tests related to register_batch
* Fix signature serialization case of void list
* Add use_cache to argilla tests
* Fix tests related to use_cache
* Fix tests
* Remove undefined object input
* Add `_invalidate_steps_cache_if_required` method
* Initial work for loading batches from `batch_manager_data` directory
* Draft cache updates
* Update pipeline signature
* Add signature mixin from other PR
* Moved pipeline cache to executions folder with different data per pipeline
* Testing new updates to read from cache
* Checkpoint with loading working while adding new steps
* Point of control
* Fix not all the batches being saved
* Sort batches after loading
* Fix `load_from_cache` to load batches from `steps_data` directory correctly
* Update test
* Add `step_has_finished` method
* Update invalidate cache function
* Update integration caching tests
* Refactor to extract logic to methods
* Refactor to remove `cached_data_dir`
* Update stages message
* Refactor `invalidate_cache_for` method
* Fix `_BatchManager` unit tests
* Update to not serialize `exclude_from_signature` attribute
* Fix pipeline unit tests
* Remove write buffer data if `use_cache=False`
* Fix offline batch generation attributes not being ignored by signature
* Fix print test
* Fix routing batch function

---------

Co-authored-by: Gabriel Martín Blázquez <[email protected]>