
ENH: Add incremental algorithms support #160

Open · wants to merge 12 commits into main

Conversation

@olegkkruglov commented Sep 20, 2024

Description

  • Added support for incremental algorithms
  • Added a config example for the introduced functionality
  • Fixed a bug in the report generator that caused a failure when one of the estimator attributes is not hashable
  • Fixed a warning in the report generator that appeared during geomean calculation when the DataFrame is empty

@samir-nasibli samir-nasibli changed the title Add incremental algorithms support ENH: Add incremental algorithms support Sep 20, 2024
@md-shafiul-alam (Collaborator)

/azp run CI


Azure Pipelines failed to run 1 pipeline(s).

@md-shafiul-alam (Collaborator)

/azp run ml-benchmarks


No pipelines are associated with this pull request.

@md-shafiul-alam (Collaborator)

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

@@ -239,6 +239,7 @@ def get_result_tables_as_df(
     bench_cases = pd.DataFrame(
         [flatten_dict(bench_case) for bench_case in results["bench_cases"]]
     )
+    bench_cases = bench_cases.map(lambda x: str(x) if not isinstance(x, Hashable) else x)
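For context on why this mapping matters: pandas operations such as `drop_duplicates` and `groupby` require cell values to be hashable, and a list is not. A minimal standalone sketch of the same conversion (the helper name `to_hashable` is hypothetical, introduced here only for illustration):

```python
from collections.abc import Hashable

def to_hashable(value):
    # Mirrors the fix above: stringify values that pandas cannot hash
    # (e.g. lists), and leave everything else untouched.
    return str(value) if not isinstance(value, Hashable) else value

print(to_hashable(["mean", "min", "max"]))  # -> ['mean', 'min', 'max'] (as a string)
print(to_hashable("mean"))                  # -> mean
```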
Contributor

What is the non-hashable object you are trying to convert?

Author

The basic statistics `result_options` parameter is a list.

sklbench/benchmarks/sklearn_estimator.py — 2 resolved review threads
configs/incremental.json (outdated) — 4 resolved review threads
Comment on lines +337 to +344

def create_online_function(
    estimator_instance, method_instance, data_args, num_batches, batch_size
):

    if "y" in list(inspect.signature(method_instance).parameters):

        def ndarray_function(x, y):
-            for i in range(n_batches):
+            for i in range(num_batches):
Contributor

Leave old simple logic with batch_size only.

Author

why?

Contributor

Why change? It overcomplicates data slicing with extra parameter checks and calculations; also, it is more common to know the batch size before the partial_fit call in real-world cases.

Author

> Why change?

Adding a new feature which can be useful.

> It overcomplicates data slicing with extra parameter checks and calculations

It costs nothing, and doing the calculations in the code is better than doing them in a calculator before running the benchmarks.

> it is more common to know batch size before partial_fit call in real world cases.

But when benchmarking it is no less common (I'd say even more so) for the user to want to specify the exact number of partial_fit calls.
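The two conventions are easy to reconcile: given the sample count, either parameter can be derived from the other. A hypothetical helper (not from the PR) sketching the calculation being debated, using ceiling division so no samples are dropped:

```python
def resolve_batching(n_samples, num_batches=None, batch_size=None):
    # Hypothetical helper: accept either num_batches or batch_size
    # and derive the missing one via ceiling division.
    if num_batches is None and batch_size is None:
        raise ValueError("specify num_batches or batch_size")
    if batch_size is None:
        batch_size = -(-n_samples // num_batches)  # ceil(n_samples / num_batches)
    if num_batches is None:
        num_batches = -(-n_samples // batch_size)  # ceil(n_samples / batch_size)
    return num_batches, batch_size

print(resolve_batching(1000, num_batches=4))   # -> (4, 250)
print(resolve_batching(1000, batch_size=300))  # -> (4, 300)
```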

Comment on lines 427 to 429

if method == "partial_fit":
    num_batches = get_bench_case_value(bench_case, "data:num_batches")
    batch_size = get_bench_case_value(bench_case, "data:batch_size")
Contributor

Instead of a separate branch for partial_fit, extend the online_inference_mode mechanism to partial fitting too.

Author

Could you provide an exact link to the implementation of this mechanism? I was not able to find any usage of this parameter; I only see it being set in the config.

Contributor @Alexsandruss commented Oct 2, 2024

Actually, online_inference_mode was removed as unnecessary before merge of refactor branch. This mode is enabled by batch_size != None only.

Contributor

You can split the batch size into two: one for training and one for inference.

Author @olegkkruglov commented Oct 4, 2024

> Actually, online_inference_mode was removed as unnecessary before merge of refactor branch.

What should I extend then?

Comment on lines 43 to 44

Suggested change:
-    "library": "sklearnex",
-    "num_batches": {"training": 2}
+    "library": "sklearnex"

Comment on lines 52 to 53

Suggested change:
-    "library": "sklearnex",
-    "num_batches": {"training": 2}
+    "library": "sklearnex"

Comment on lines 61 to 62

Suggested change:
-    "library": "sklearnex.preview",
-    "num_batches": {"training": 2}
+    "library": "sklearnex.preview"
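For reference, the entries under discussion presumably sit in a benchmark config shaped roughly like the following. This is a hypothetical, abridged sketch: only `"library"` and `"num_batches": {"training": 2}` appear in the thread, and the reviewer's suggestions move `num_batches` out of these lines; every other key and value here is illustrative and the actual configs/incremental.json layout may differ.

```json
{
    "PARAMETERS_SETS": {
        "sklearnex incremental": {
            "algorithm": {
                "estimator": "IncrementalEstimatorName",
                "library": "sklearnex"
            },
            "data": {
                "num_batches": {"training": 2}
            }
        }
    }
}
```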

configs/incremental.json (outdated) — 2 resolved review threads
3 participants