Default arguments for xarray_filters.datasets.make_* functions #17

PeterDSteinberg · 2017-09-19T19:00:56Z

@gpfreitas This is related to issues #5 and #6 and tries to condense them into a TODO list.

Items to do related to the argument specs of make_* functions from xarray_filters.datasets:

- Make MLDataset the default return value rather than Dataset.
- Remove the requirement for the n_samples argument when it can be inferred from shape. For example, in MLDataset(make_blobs(n_samples=2000, shape=(200,10))) the value 2000 is already implied by shape.
- For functions that exist in dask_glm, e.g. make_classification, we should still default to returning an MLDataset, as xarray_filters.datasets does so far, but use dask_glm's functions to build a dask.array in each DataArray rather than the NumPy-based sklearn.datasets approach.
  - Provide a use_dask_glm=True keyword to control whether the functions in dask_glm.datasets are used.
- Change the sequence of acceptable strings for astype to the following (or an equivalent way of specifying these data structures as the output type):
  ('pandas.dataframe', 'dask.array', 'dask.dataframe', 'numpy.ndarray', 'dataset', 'mldataset')
- xnames should be renamed to layers.
- Docstring edits (see below). This is the current docstring for make_blobs from xarray_filters. I think it needs more explanation of the transformation part, e.g. that it typically outputs N-D DataArrays in an MLDataset, and any differences between sklearn and xarray_filters, such as n_samples versus shape:

In [3]: ?make_blobs
Signature: make_blobs(n_samples=100, n_features=2, centers=3, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None, *, astype='dataset', **kwargs)
Docstring:
Like sklearn.datasets.samples_generator.make_blobs, but with added functionality.

Parameters
---------------------
Same parameters/arguments as sklearn.datasets.samples_generator.make_blobs, in addition to the following
keyword-only arguments:

astype: str
    One of ('array', 'dataframe', 'dataset', 'mldataset') or None to return an NpXyTransformer. See documentation
    of NpXyTransformer.astype.

**kwargs: dict
    Optional arguments that depend on astype. See documentation of
    NpXyTransformer.astype.

Note: wherever I said dask_glm above, also look at dask-ml.
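The n_samples-from-shape item in the TODO list could work roughly as below. This is a minimal sketch, assuming shape is a tuple of grid sizes whose product is the number of samples, and that shape takes precedence over n_samples when both are passed; `resolve_n_samples` is a hypothetical helper name, not part of xarray_filters.

```python
import math
from typing import Optional, Sequence


def resolve_n_samples(n_samples: Optional[int] = None,
                      shape: Optional[Sequence[int]] = None) -> int:
    """Return the number of samples, deriving it from shape when given.

    Hypothetical helper: if shape is passed it takes precedence over
    n_samples, and the sample count is the product of the grid sizes.
    """
    if shape is not None:
        return math.prod(shape)
    if n_samples is None:
        raise ValueError("pass either n_samples or shape")
    return n_samples


# shape=(200, 10) describes a 200 x 10 grid, i.e. 2000 samples
print(resolve_n_samples(shape=(200, 10)))  # 2000
print(resolve_n_samples(n_samples=100))    # 100
```

With this rule, make_blobs(shape=(200, 10)) would no longer need n_samples=2000 spelled out.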


PeterDSteinberg · 2017-09-22T22:56:08Z

Other TODOs I need to add:

Ensure that dimension names can be controlled, i.e. that dims can be named x, y, z, t rather than getting the default names dim_0, dim_1.

gpfreitas · 2017-10-03T19:48:50Z

MLDataset default: check
no need for n_samples when shape is passed: check (I chose to let shape override n_samples)
layers instead of xnames: check

I think letting shape be a dict should be enough for letting the user customize dimension names.
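Accepting shape as either a tuple or a dict might look like the sketch below. It assumes a dict maps dimension names to sizes in order, and that a plain tuple falls back to xarray-style default names dim_0, dim_1, ...; `normalize_shape` is a hypothetical helper, not an existing xarray_filters function.

```python
from __future__ import annotations

import math
from collections.abc import Mapping, Sequence


def normalize_shape(shape: Mapping[str, int] | Sequence[int]) -> tuple[tuple[str, ...], tuple[int, ...]]:
    """Split a shape spec into (dimension names, sizes).

    Hypothetical helper: a mapping such as {'y': 200, 'x': 10} keeps the
    caller's names; a plain tuple gets generic names dim_0, dim_1, ...
    """
    if isinstance(shape, Mapping):
        # Keep the caller's dimension names, in insertion order.
        dims = tuple(str(name) for name in shape)
        sizes = tuple(int(size) for size in shape.values())
    else:
        # No names given: fall back to xarray-style defaults.
        sizes = tuple(int(size) for size in shape)
        dims = tuple(f"dim_{i}" for i in range(len(sizes)))
    return dims, sizes


dims, sizes = normalize_shape({"y": 200, "x": 10})
print(dims, sizes, math.prod(sizes))  # ('y', 'x') (200, 10) 2000

dims, sizes = normalize_shape((200, 10))
print(dims, sizes)  # ('dim_0', 'dim_1') (200, 10)
```

Because Python dicts preserve insertion order, the dict form fixes both the names and the axis order in one argument, which is why it seems sufficient for naming dims like x, y, z, t.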

So, what's left is the harder part:

support dask data structures, see dask-glm
change the sequence of acceptable strings for astype (already supported in master)

For astype, @PeterDSteinberg, we should leave the to_* methods intact, right? So, passing astype='numpy.ndarray' would call XyTransformer.to_array. Sounds good?
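Keeping the to_* methods and mapping each astype string onto one of them could be sketched as follows. Only the 'numpy.ndarray' -> to_array pairing comes from this thread; the other method names (to_dataframe, to_dask_array, to_dask_dataframe, to_dataset, to_mldataset) and the `ToyXyTransformer` class are assumptions for illustration, not the actual XyTransformer API.

```python
# Accepted astype strings -> transformer method names.
# Only "numpy.ndarray" -> "to_array" is taken from the issue thread;
# the remaining method names are assumed for illustration.
ASTYPE_TO_METHOD = {
    "numpy.ndarray": "to_array",
    "pandas.dataframe": "to_dataframe",
    "dask.array": "to_dask_array",
    "dask.dataframe": "to_dask_dataframe",
    "dataset": "to_dataset",
    "mldataset": "to_mldataset",
}


def convert(transformer, astype):
    """Dispatch an astype string to the matching to_* method.

    astype=None returns the transformer itself, mirroring the current
    docstring's behaviour of returning the transformer object.
    """
    if astype is None:
        return transformer
    try:
        method_name = ASTYPE_TO_METHOD[astype.lower()]
    except KeyError:
        raise ValueError(
            f"astype must be one of {sorted(ASTYPE_TO_METHOD)} or None, "
            f"got {astype!r}"
        ) from None
    return getattr(transformer, method_name)()


class ToyXyTransformer:
    """Hypothetical stand-in: any to_* method returns its own name."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("to_"):
            return lambda: name
        raise AttributeError(name)


t = ToyXyTransformer()
print(convert(t, "numpy.ndarray"))  # to_array
print(convert(t, "MLDataset"))      # to_mldataset
```

Lower-casing the input lets users write 'MLDataset' or 'mldataset' interchangeably, while unknown strings fail early with the list of accepted values.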

gpfreitas · 2017-10-04T15:37:19Z

Working on the dask-glm support.

PeterDSteinberg · 2017-10-25T22:12:35Z

Note the dask-ml / dask-glm related work is being addressed in a separate issue: #36

PeterDSteinberg assigned gpfreitas Sep 19, 2017

gpfreitas mentioned this issue Oct 3, 2017: [WIP] Major fixes to datasets.py #20 (Merged)

This was referenced Oct 18, 2017:

- Xarray_filters Quarter 3, 2017 Priorities #28 (Open)
- Change dask_searchcv / dask_glm imports to daskml ContinuumIO/elm#217 (Open)

gbrener assigned gbrener and unassigned gpfreitas Oct 25, 2017

gpfreitas mentioned this issue Oct 26, 2017: Dask-ml datasets.py related changes #36 (Open)
