
ENH: Add incremental algorithms support #160

Open · wants to merge 12 commits into main

Conversation

@olegkkruglov commented Sep 20, 2024

Description

  • Added support for incremental algorithms
  • Added a config example for the introduced functionality
  • Fixed a bug in the report generator that caused a failure when one of the estimator attributes is not hashable
  • Fixed a warning in the report generator that appeared during geomean calculation when the DataFrame is empty

@samir-nasibli samir-nasibli changed the title Add incremental algorithms support ENH: Add incremental algorithms support Sep 20, 2024
@md-shafiul-alam (Collaborator)

/azp run CI


Azure Pipelines failed to run 1 pipeline(s).

@md-shafiul-alam (Collaborator)

/azp run ml-benchmarks


No pipelines are associated with this pull request.

@md-shafiul-alam (Collaborator)

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

@@ -239,6 +239,7 @@ def get_result_tables_as_df(
     bench_cases = pd.DataFrame(
         [flatten_dict(bench_case) for bench_case in results["bench_cases"]]
     )
+    bench_cases = bench_cases.map(lambda x: str(x) if not isinstance(x, Hashable) else x)
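For context on why this mapping matters: pandas operations such as `drop_duplicates` and `groupby` require cell values to be hashable, and a list is not. A minimal standalone sketch of the same conversion (the helper name `to_hashable` is hypothetical, introduced here only for illustration):

```python
from collections.abc import Hashable

def to_hashable(value):
    # Mirrors the fix above: stringify values that pandas cannot hash
    # (e.g. lists), and leave everything else untouched.
    return str(value) if not isinstance(value, Hashable) else value

print(to_hashable(["mean", "min", "max"]))  # -> ['mean', 'min', 'max'] (as a string)
print(to_hashable("mean"))                  # -> mean
```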
Contributor

What is the non-hashable object you are trying to convert?

Author

The basic statistics `result_options` parameter is a list.

sklbench/benchmarks/sklearn_estimator.py — 2 resolved review threads
configs/incremental.json (outdated) — 4 resolved review threads
Comment on lines +337 to +344

def create_online_function(
    estimator_instance, method_instance, data_args, num_batches, batch_size
):

    if "y" in list(inspect.signature(method_instance).parameters):

        def ndarray_function(x, y):
-            for i in range(n_batches):
+            for i in range(num_batches):
Contributor

Leave old simple logic with batch_size only.

Author

why?

Contributor

Why change? It overcomplicates data slicing with extra parameter checks and calculations; also, it is more common to know the batch size before the partial_fit call in real-world cases.

Author

> Why change?

Adding a new feature which can be useful.

> It overcomplicates data slicing with extra parameter checks and calculations

It costs nothing, and doing the calculations in the code is better than doing them in a calculator before running the benchmarks.

> it is more common to know batch size before partial_fit call in real world cases.

But when benchmarking it is no less common (I'd say even more so) for the user to want to specify the exact number of partial_fit calls.
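The two conventions are easy to reconcile: given the sample count, either parameter can be derived from the other. A hypothetical helper (not from the PR) sketching the calculation being debated, using ceiling division so no samples are dropped:

```python
def resolve_batching(n_samples, num_batches=None, batch_size=None):
    # Hypothetical helper: accept either num_batches or batch_size
    # and derive the missing one via ceiling division.
    if num_batches is None and batch_size is None:
        raise ValueError("specify num_batches or batch_size")
    if batch_size is None:
        batch_size = -(-n_samples // num_batches)  # ceil(n_samples / num_batches)
    if num_batches is None:
        num_batches = -(-n_samples // batch_size)  # ceil(n_samples / batch_size)
    return num_batches, batch_size

print(resolve_batching(1000, num_batches=4))   # -> (4, 250)
print(resolve_batching(1000, batch_size=300))  # -> (4, 300)
```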

Comment on lines 427 to 429

if method == "partial_fit":
    num_batches = get_bench_case_value(bench_case, "data:num_batches")
    batch_size = get_bench_case_value(bench_case, "data:batch_size")
Contributor

Instead of a separate branch for partial_fit, extend the online_inference_mode mechanism to partial fitting too.

Author

Could you provide an exact link to the implementation of this mechanism? I was not able to find any usage of this parameter; I only see it being set in the config.

Contributor @Alexsandruss commented Oct 2, 2024

Actually, online_inference_mode was removed as unnecessary before merge of refactor branch. This mode is enabled by batch_size != None only.

Contributor

You can split the batch size into two: one for training and one for inference.

Author @olegkkruglov commented Oct 4, 2024

> Actually, online_inference_mode was removed as unnecessary before merge of refactor branch.

What should I extend then?

Comment on lines 43 to 44

Suggested change:
-    "library": "sklearnex",
-    "num_batches": {"training": 2}
+    "library": "sklearnex"

Comment on lines 52 to 53

Suggested change:
-    "library": "sklearnex",
-    "num_batches": {"training": 2}
+    "library": "sklearnex"

Comment on lines 61 to 62

Suggested change:
-    "library": "sklearnex.preview",
-    "num_batches": {"training": 2}
+    "library": "sklearnex.preview"
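For reference, the entries under discussion presumably sit in a benchmark config shaped roughly like the following. This is a hypothetical, abridged sketch: only `"library"` and `"num_batches": {"training": 2}` appear in the thread, and the reviewer's suggestions move `num_batches` out of these lines; every other key and value here is illustrative and the actual configs/incremental.json layout may differ.

```json
{
    "PARAMETERS_SETS": {
        "sklearnex incremental": {
            "algorithm": {
                "estimator": "IncrementalEstimatorName",
                "library": "sklearnex"
            },
            "data": {
                "num_batches": {"training": 2}
            }
        }
    }
}
```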

configs/incremental.json (outdated) — 2 resolved review threads
3 participants