Convert clojure data -> NDArrayWritable #18

Open
hswick opened this issue Jul 9, 2018 · 2 comments

Comments

@hswick
Owner

hswick commented Jul 9, 2018

See here for a discussion on the desired feature.

In summary: it would be great to have a way to feed plain Clojure data into a model for training.

It seems the best way to do this is to go from Clojure vectors to NDArray to NDArrayWritable. jutsu.matrix already provides a way to go from Clojure vectors to NDArrays, so the next step would be to convert the NDArray to an NDArrayWritable.

I'm posting this as a separate issue because I think it would be a great first issue for someone to tackle, and it would be very helpful.

@behrica

behrica commented Nov 26, 2018

I did something like this, in parts.
The snippet below takes "features" as a vector of maps
and "y" as a vector of targets.
It creates an NDArray,
then a DataSet, and then a DataSetIterator,
which can be used with the existing function "train-net!".

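(import '[org.nd4j.linalg.factory Nd4j]
        '[org.nd4j.linalg.dataset DataSet ViewIterator])

;; features: vector of maps -> 2-D NDArray, one row per map
(def data (Nd4j/create (into-array (map double-array (map vals features)))))
;; y: vector of numeric targets -> column vector
(def labels (.transpose (Nd4j/create (double-array y))))
(def data-set (DataSet. data labels))

;; ... split "data-set" into test and train (giving train-data-set)

;; 1000 is the batch size; the trailing 10 in the original line had no matching call and is left out
(def dataset-iterator (ViewIterator. train-data-set 1000))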
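One thing to watch in the snippet above: (map vals features) takes the values in each map's own iteration order, so two maps built with their keys in a different order can put their values in different columns. A minimal sketch of one way around that, in plain Clojure with no ND4J involved; the names features->rows, feature-keys and example-features are hypothetical and only for illustration:

```clojure
(defn features->rows
  "Turn a vector of maps into a vector of numeric row vectors,
  reading the keys in the explicit order given by feature-keys.
  Assumes every map contains every key in feature-keys."
  [feature-keys features]
  (mapv (fn [m] (mapv #(double (get m %)) feature-keys)) features))

;; Two maps with the same keys written in a different order.
(def example-features
  [{:height 1.5 :width 2.0}
   {:width 4.0 :height 3.5}])

;; Columns follow [:height :width] for every row.
(def rows (features->rows [:height :width] example-features))
```

The resulting rows could then be handed to the same conversion as before, for example (into-array (map double-array rows)).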

What makes it complex is that the logic for

  • epochs
  • batches vs. mini-batches
  • in-memory vs. streaming data from disk
  • test-train split
  • normalization

is all encapsulated in, or relies on, the DataSetIterator interface. That interface therefore has many implementations, and the user needs to be able to choose and configure them.

To have this functionality fully on the Clojure side, much of the DataSetIterator implementation
logic would need to be ported to Clojure:
https://deeplearning4j.org/api/latest/org/nd4j/linalg/dataset/api/iterator/DataSetIterator.html

And this code is written in a very stateful, OO style, so it is hard to make functional.

But maybe a single function implementing "one scenario",
such as

(defn iterate-data [features labels test-train-split-percentage num-batches num-epochs] ...)
;; features: vector of maps; labels: vector

-> returns 2 data-set iterators (for train and test)

could be useful. It would not require any change to the existing methods, as they can already handle any DataSetIterator.
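To make the "one scenario" idea concrete, here is a rough pure-Clojure sketch of the splitting, batching and epoch logic. It is not the proposed iterate-data itself: it returns plain sequences of batches rather than DataSetIterators, it takes a batch size instead of a number of batches, and it does no shuffling or normalization. The name split-and-batch and all of its parameters are assumptions made for illustration.

```clojure
(defn split-and-batch
  "Hypothetical stand-in for the proposed function.
  Pairs each feature map with its label, puts the first train-fraction
  of the examples in the train set and the rest in the test set, and
  cuts both into batches of at most batch-size examples.
  The train batches are repeated num-epochs times."
  [features labels train-fraction batch-size num-epochs]
  (let [examples     (mapv vector features labels)
        n-train      (long (* train-fraction (count examples)))
        [train test] (split-at n-train examples)
        batches      (fn [xs] (map vec (partition-all batch-size xs)))]
    {:train (apply concat (repeat num-epochs (batches train)))
     :test  (batches test)}))

;; Four examples, 75% for training, batches of 2, 2 epochs.
(def result
  (split-and-batch [{:a 1} {:a 2} {:a 3} {:a 4}] [0 1 0 1] 0.75 2 2))
```

Each batch is a vector of [feature-map label] pairs, so turning one into a DataSet would still need the NDArray conversion shown earlier in the thread.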

@hswick
Owner Author

hswick commented Nov 30, 2018

I made attempts at this before and ran into the same complexity. I actually think the best way to do it is to write some Java code that provides a more Clojure-friendly DataSetIterator.
