Add more domain specific transformations to XyTransformer #6

Open
3 tasks
gpfreitas opened this issue Sep 6, 2017 · 4 comments
@gpfreitas
Contributor

@PeterDSteinberg

We need to implement

Assigning to myself. That should be doable now from within PR #3 (work from PR #2 was just merged into PR #3).

gpfreitas self-assigned this Sep 6, 2017
@PeterDSteinberg
Contributor

Partially related: the default return type should be MLDataset rather than xarray.Dataset, but either should be possible.

@PeterDSteinberg
Contributor

PeterDSteinberg commented Sep 7, 2017

@gpfreitas Please handle this issue by taking over PR #4 and doing whatever needs to be done there. PR #4 is a cleanup of PR #3.

-- Gui's Edit: Peter meant PR #8 instead of PR #4

@PeterDSteinberg
Contributor

Another thing on datasets.py and astype:

  • The astype logic should also offer dask.array and dask.dataframe classes as output options. Anywhere we return (or accept as input) numpy.ndarray or pandas.DataFrame objects, we should offer essentially the same functionality with the dask equivalents.
  • @gpfreitas Take a look at datasets.py in dask-glm and its make_* functions, which work with dask.array. Consider either moving the module's important parts to this repo or installing dask-glm as a temporary solution; next week we'll work out the best long-term plan. I think it makes sense to collect in one repo all the ML dataset generators for 1) large data, 2) xarray data structures, and 3) both 1 and 2.
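
To make the first bullet concrete, here is a minimal sketch of how an astype-style dispatch could offer dask outputs next to the numpy and pandas ones. The function name `convert`, the registry `CONVERTERS`, and the string keys are all hypothetical and are not the names used in datasets.py; only the numpy, pandas, and dask calls themselves are real. The dask and pandas imports are deferred so the numpy path works without them.

```python
import numpy as np


def _to_numpy(arr, **kwargs):
    return np.asarray(arr)


def _to_pandas(arr, columns=None, **kwargs):
    import pandas as pd  # imported lazily so numpy-only use needs no pandas
    return pd.DataFrame(np.asarray(arr), columns=columns)


def _to_dask_array(arr, chunks="auto", **kwargs):
    import dask.array as da  # imported lazily so dask stays optional
    return da.from_array(np.asarray(arr), chunks=chunks)


def _to_dask_dataframe(arr, columns=None, npartitions=1, **kwargs):
    import dask.dataframe as dd
    return dd.from_pandas(_to_pandas(arr, columns=columns), npartitions=npartitions)


# Hypothetical registry: the keys are illustrative, not the repo's astype values.
CONVERTERS = {
    "array": _to_numpy,
    "dataframe": _to_pandas,
    "dask_array": _to_dask_array,
    "dask_dataframe": _to_dask_dataframe,
}


def convert(arr, astype="array", **kwargs):
    """Return `arr` as the container named by `astype`."""
    try:
        converter = CONVERTERS[astype]
    except KeyError:
        raise ValueError(
            f"Unknown astype {astype!r}; expected one of {sorted(CONVERTERS)}"
        ) from None
    return converter(arr, **kwargs)
```

With a registry like this, each in-memory type sits next to its dask counterpart, so adding a new output option is one more entry rather than another branch in every generator.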

cc @gbrener

@PeterDSteinberg
Contributor

@gpfreitas Regarding the second bullet above (datasets.py from dask-glm), see this snippet for the idea:

from dask_glm.datasets import (make_classification as dsk_make_classification,
                               make_regression as dsk_make_regression,
                               make_poisson as dsk_make_poisson)
from xarray_filters import MLDataset
from xarray_filters.datasets import NpXyTransformer, _make_base

# Wrap the dask-glm generator with _make_base so it can be called with `shape`
dsk_make_regression = _make_base(dsk_make_regression)

# Every variable in the result has the requested 4-D shape
dset = dsk_make_regression(shape=(10, 10, 5, 2))

Note that dset has the y data in there as well.

<xarray.Dataset>
Dimensions:  (dim_0: 10, dim_1: 10, dim_2: 5, dim_3: 2)
Dimensions without coordinates: dim_0, dim_1, dim_2, dim_3
Data variables:
    X0       (dim_0, dim_1, dim_2, dim_3) float64 1.198 -0.8405 0.05915 ...
    X1       (dim_0, dim_1, dim_2, dim_3) float64 -0.04466 1.034 0.9808 ...
    X2       (dim_0, dim_1, dim_2, dim_3) float64 0.8524 0.3813 -0.01171 ...
    X3       (dim_0, dim_1, dim_2, dim_3) float64 0.3281 0.6048 -0.898 ...
    X4       (dim_0, dim_1, dim_2, dim_3) float64 -1.735 -0.2174 -2.693 ...
    X5       (dim_0, dim_1, dim_2, dim_3) float64 -0.2715 0.7407 -0.264 ...
    X6       (dim_0, dim_1, dim_2, dim_3) float64 -0.9609 -0.2701 1.117 ...
    X7       (dim_0, dim_1, dim_2, dim_3) float64 -1.034 0.757 0.01273 ...
    X8       (dim_0, dim_1, dim_2, dim_3) float64 0.5281 0.7713 -0.1763 ...
    X9       (dim_0, dim_1, dim_2, dim_3) float64 -0.2696 -1.016 -2.404 ...
    X10      (dim_0, dim_1, dim_2, dim_3) float64 -0.2317 -0.2335 1.058 ...
    X11      (dim_0, dim_1, dim_2, dim_3) float64 -0.5542 0.5587 0.3457 ...
    X12      (dim_0, dim_1, dim_2, dim_3) float64 -0.01836 -0.1698 0.5389 ...
    X13      (dim_0, dim_1, dim_2, dim_3) float64 -0.1467 -0.7628 -0.7719 ...
    X14      (dim_0, dim_1, dim_2, dim_3) float64 -1.339 0.4981 -0.614 ...
    X15      (dim_0, dim_1, dim_2, dim_3) float64 -0.5876 -0.2263 1.673 ...
    X16      (dim_0, dim_1, dim_2, dim_3) float64 -0.597 0.2797 0.5805 ...
    X17      (dim_0, dim_1, dim_2, dim_3) float64 -1.565 -1.012 -0.3787 ...
    X18      (dim_0, dim_1, dim_2, dim_3) float64 0.4312 1.362 -2.138 0.3758 ...
    X19      (dim_0, dim_1, dim_2, dim_3) float64 0.3094 -1.47 1.551 -0.1707 ...
    X20      (dim_0, dim_1, dim_2, dim_3) float64 0.6104 2.048 -1.663 0.3121 ...
    X21      (dim_0, dim_1, dim_2, dim_3) float64 0.0822 0.07778 -0.7176 ...
    X22      (dim_0, dim_1, dim_2, dim_3) float64 1.048 -0.3477 0.3577 1.472 ...
    X23      (dim_0, dim_1, dim_2, dim_3) float64 -1.311 -0.1665 -0.2449 ...
    X24      (dim_0, dim_1, dim_2, dim_3) float64 -0.02405 0.3544 0.4243 ...
    X25      (dim_0, dim_1, dim_2, dim_3) float64 -0.7914 -0.3405 -0.1322 ...
    X26      (dim_0, dim_1, dim_2, dim_3) float64 -0.6356 1.196 -1.078 ...
    X27      (dim_0, dim_1, dim_2, dim_3) float64 0.5543 -0.6515 2.419 ...
    X28      (dim_0, dim_1, dim_2, dim_3) float64 -0.1693 0.1158 -0.7771 ...
    X29      (dim_0, dim_1, dim_2, dim_3) float64 -0.9334 0.6122 0.01986 ...
    X30      (dim_0, dim_1, dim_2, dim_3) float64 -1.541 0.2548 1.138 -1.405 ...
    X31      (dim_0, dim_1, dim_2, dim_3) float64 -2.504 -0.1263 0.6807 ...
    X32      (dim_0, dim_1, dim_2, dim_3) float64 2.91 -0.87 1.036 -0.1009 ...
    X33      (dim_0, dim_1, dim_2, dim_3) float64 -0.9434 -0.4315 -0.8788 ...
    X34      (dim_0, dim_1, dim_2, dim_3) float64 1.01 -0.9922 -2.01 -1.19 ...
    X35      (dim_0, dim_1, dim_2, dim_3) float64 1.351 1.105 -1.041 -0.4063 ...
    X36      (dim_0, dim_1, dim_2, dim_3) float64 0.7281 -0.8127 1.494 ...
    X37      (dim_0, dim_1, dim_2, dim_3) float64 1.132 -0.5667 0.5411 ...
    X38      (dim_0, dim_1, dim_2, dim_3) float64 1.357 0.2587 -0.2264 1.5 ...
    X39      (dim_0, dim_1, dim_2, dim_3) float64 -0.6693 2.36 0.303 -0.6379 ...
    X40      (dim_0, dim_1, dim_2, dim_3) float64 -0.4611 -0.3154 -0.4205 ...
    X41      (dim_0, dim_1, dim_2, dim_3) float64 -1.791 -1.04 -0.7568 ...
    X42      (dim_0, dim_1, dim_2, dim_3) float64 -0.4003 -1.433 0.7501 ...
    X43      (dim_0, dim_1, dim_2, dim_3) float64 -0.2469 2.086 0.3483 ...
    X44      (dim_0, dim_1, dim_2, dim_3) float64 0.9493 -1.673 -0.6541 ...
    X45      (dim_0, dim_1, dim_2, dim_3) float64 0.4842 -0.7728 -0.7685 ...
    X46      (dim_0, dim_1, dim_2, dim_3) float64 0.5022 1.329 0.5884 0.128 ...
    X47      (dim_0, dim_1, dim_2, dim_3) float64 0.3703 -0.7266 0.05236 ...
    X48      (dim_0, dim_1, dim_2, dim_3) float64 -0.9601 -0.9074 1.586 ...
    X49      (dim_0, dim_1, dim_2, dim_3) float64 2.239 -1.434 -0.1234 ...
    X50      (dim_0, dim_1, dim_2, dim_3) float64 -0.5597 -0.7352 0.6369 ...
    X51      (dim_0, dim_1, dim_2, dim_3) float64 0.7343 -0.9907 1.602 ...
    X52      (dim_0, dim_1, dim_2, dim_3) float64 0.652 -0.1846 0.2019 ...
    X53      (dim_0, dim_1, dim_2, dim_3) float64 1.19 -0.5057 -0.5732 ...
    X54      (dim_0, dim_1, dim_2, dim_3) float64 1.507 -0.1003 0.4117 ...
    X55      (dim_0, dim_1, dim_2, dim_3) float64 -0.29 -0.03191 -2.249 ...
    X56      (dim_0, dim_1, dim_2, dim_3) float64 0.5846 0.6985 -0.4681 ...
    X57      (dim_0, dim_1, dim_2, dim_3) float64 1.597 -0.6932 -0.9913 ...
    X58      (dim_0, dim_1, dim_2, dim_3) float64 -0.9191 -0.7749 -0.1987 ...
    X59      (dim_0, dim_1, dim_2, dim_3) float64 0.2881 -0.9133 0.01558 ...
    X60      (dim_0, dim_1, dim_2, dim_3) float64 -0.4724 0.07609 0.6339 ...
    X61      (dim_0, dim_1, dim_2, dim_3) float64 -1.439 0.2592 -0.204 ...
    X62      (dim_0, dim_1, dim_2, dim_3) float64 -0.7825 -0.3711 0.998 ...
    X63      (dim_0, dim_1, dim_2, dim_3) float64 -1.146 -1.215 -0.7049 ...
    X64      (dim_0, dim_1, dim_2, dim_3) float64 -1.034 -0.174 -1.788 ...
    X65      (dim_0, dim_1, dim_2, dim_3) float64 1.484 -1.569 -1.355 0.6273 ...
    X66      (dim_0, dim_1, dim_2, dim_3) float64 1.11 0.7746 -0.5281 ...
    X67      (dim_0, dim_1, dim_2, dim_3) float64 -0.03064 -0.4691 -0.6601 ...
    X68      (dim_0, dim_1, dim_2, dim_3) float64 0.2942 -1.201 -0.1188 ...
    X69      (dim_0, dim_1, dim_2, dim_3) float64 -0.898 2.048 1.541 2.252 ...
    X70      (dim_0, dim_1, dim_2, dim_3) float64 -0.6586 0.394 0.403 0.4938 ...
    X71      (dim_0, dim_1, dim_2, dim_3) float64 1.587 -1.614 -0.3451 ...
    X72      (dim_0, dim_1, dim_2, dim_3) float64 0.9434 1.333 -0.1681 ...
    X73      (dim_0, dim_1, dim_2, dim_3) float64 0.4618 -0.7314 -1.435 ...
    X74      (dim_0, dim_1, dim_2, dim_3) float64 -1.023 0.5184 0.1257 ...
    X75      (dim_0, dim_1, dim_2, dim_3) float64 -0.1761 0.9387 0.5874 ...
    X76      (dim_0, dim_1, dim_2, dim_3) float64 0.5568 0.5877 -1.121 ...
    X77      (dim_0, dim_1, dim_2, dim_3) float64 0.09141 -1.948 -1.098 ...
    X78      (dim_0, dim_1, dim_2, dim_3) float64 -1.819 0.1203 -0.4328 ...
    X79      (dim_0, dim_1, dim_2, dim_3) float64 -1.142 0.2465 0.1287 ...
    X80      (dim_0, dim_1, dim_2, dim_3) float64 0.4324 -1.483 -1.662 ...
    X81      (dim_0, dim_1, dim_2, dim_3) float64 -1.148 0.6937 -0.09142 ...
    X82      (dim_0, dim_1, dim_2, dim_3) float64 0.7523 -1.617 1.322 ...
    X83      (dim_0, dim_1, dim_2, dim_3) float64 -1.029 -1.159 -1.534 ...
    X84      (dim_0, dim_1, dim_2, dim_3) float64 0.1843 -0.5327 -0.07401 ...
    X85      (dim_0, dim_1, dim_2, dim_3) float64 1.706 0.3055 0.2959 ...
    X86      (dim_0, dim_1, dim_2, dim_3) float64 -0.9138 0.544 0.5866 ...
    X87      (dim_0, dim_1, dim_2, dim_3) float64 -0.5957 -0.9242 1.011 ...
    X88      (dim_0, dim_1, dim_2, dim_3) float64 0.9815 -0.3946 -1.416 ...
    X89      (dim_0, dim_1, dim_2, dim_3) float64 -0.1027 -0.5938 -0.4344 ...
    X90      (dim_0, dim_1, dim_2, dim_3) float64 0.1197 0.1577 -0.473 ...
    X91      (dim_0, dim_1, dim_2, dim_3) float64 0.3592 -0.4976 0.707 0.259 ...
    X92      (dim_0, dim_1, dim_2, dim_3) float64 0.03991 -0.7257 -1.017 ...
    X93      (dim_0, dim_1, dim_2, dim_3) float64 -1.091 -1.044 0.3796 1.459 ...
    X94      (dim_0, dim_1, dim_2, dim_3) float64 -1.01 -1.49 -0.243 0.2858 ...
    X95      (dim_0, dim_1, dim_2, dim_3) float64 -1.901 -0.6847 -0.954 ...
    X96      (dim_0, dim_1, dim_2, dim_3) float64 0.2343 0.896 0.2329 -1.402 ...
    X97      (dim_0, dim_1, dim_2, dim_3) float64 0.4155 0.4699 -1.237 2.078 ...
    X98      (dim_0, dim_1, dim_2, dim_3) float64 0.3437 1.417 1.97 -0.1689 ...
    X99      (dim_0, dim_1, dim_2, dim_3) float64 -1.113 -0.5201 -0.3969 ...
    y        (dim_0, dim_1, dim_2, dim_3) float64 0.9904 0.4146 0.9054 ...
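
For readers without dask-glm or xarray_filters installed, the wrapping idea in the snippet above can be sketched with numpy alone. This is one plausible reading inferred from the printed output (one N-D variable per feature column, plus y), not the actual `_make_base` implementation. The names `toy_make_regression`, `make_base_sketch`, and `to_Xy` are hypothetical, and a plain dict stands in for the xarray.Dataset.

```python
from functools import wraps

import numpy as np


def toy_make_regression(n_samples=100, n_features=4, seed=0):
    """Stand-in (hypothetical) generator that returns an (X, y) pair."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    y = X @ rng.normal(size=n_features) + rng.normal(scale=0.1, size=n_samples)
    return X, y


def make_base_sketch(make_func, yname="y"):
    """Wrap an (X, y) generator so it returns one N-D variable per column."""
    @wraps(make_func)
    def wrapper(shape, **kwargs):
        # The number of samples is the product of the requested dimensions.
        n_samples = int(np.prod(shape))
        X, y = make_func(n_samples=n_samples, **kwargs)
        X, y = np.asarray(X), np.asarray(y)
        # A dict stands in for the dataset: X0, X1, ... plus the target.
        variables = {f"X{i}": X[:, i].reshape(shape) for i in range(X.shape[1])}
        variables[yname] = y.reshape(shape)
        return variables
    return wrapper


def to_Xy(variables, yname="y"):
    """Flatten the variables back into a 2-D feature matrix and a 1-D target."""
    names = [name for name in variables if name != yname]
    X = np.column_stack([np.asarray(variables[name]).ravel() for name in names])
    y = np.asarray(variables[yname]).ravel()
    return X, y


make_regression_nd = make_base_sketch(toy_make_regression)
dset = make_regression_nd(shape=(10, 10, 5, 2), n_features=3)
X, y = to_Xy(dset)
print(sorted(dset), X.shape, y.shape)
```

Because y is stored alongside the features, a helper like `to_Xy` is needed to separate the two again before fitting an estimator.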
