Notmydata #47

Open: wants to merge 2 commits into base: dmerge

Conversation

marqh (Owner) commented Feb 18, 2017

No description provided.

pp-mo commented Feb 19, 2017

Just some ideas about how to add a switch to make 'has_lazy_data' do something useful: see #49.
The code is possibly still a bit clumsy, so this is just fleshing out some ideas.

Should this now be targeting dask instead of dmerge?

pp-mo commented Feb 19, 2017

In case it's useful, I also wrote myself a "Summary of what this achieves":

  • Functionally : mostly "as-is" ...
    • cube still has .data and .lazy_data([optional-new])
    • accessing .data loses the original derivation graph
  • cube data content
    • the single internal source of data is the new .data_graph property
    • assigning a 'real' data array goes round a loop :
      • into dask and back out
      • 'rationalises' representation
        • cube.data can always be stored in dask + retrieved the same
        • "cube.data = newdata", may not leave newdata unchanged
      • converts masked arrays to NaN-arrays (and back)
      • rationalises combined masks + NaNs
      • raises an error for any masked integer arrays (at present, anyway)
  • cube.shape, dtype and fill_value are now simple cube properties
    • they are all in "cube.metadata"
      • This will affect merging
    • dtype can change on data assignment
    • shape can never change after cube is created
    • fill_value
      • can be None
      • can be used to control saving
        • including "can't save, please specify fill-value"
      • could be invalidated by operations (much as cube.units is)
      • needs to be "inferred" by combination operations
        • especially merge + concatenate
        • this is the logic currently implemented in Biggus
  • ? possibly missing aspects ?
    • prevent user assignment of cube.dtype, cube.shape (make readonly)
    • what is the relationship between fill_value and dtype ?
    • possible translation between provided data and cube.dtype ?
      • this might allow us to support masked-integers by transforming to masked-floats ...
      • ... but current thinking is : too tricky to be automatic ; require user-explicit
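
To make the "into dask and back out" loop from the summary more concrete, here is a minimal sketch of the masked-array to NaN-array conversion and its reverse. It uses plain NumPy only and leaves out the dask wrapping step. The function names `to_nan_array` and `from_nan_array` are hypothetical, made up for illustration, and are not taken from the PR code.

```python
import numpy as np
import numpy.ma as ma


def to_nan_array(data):
    """Return a plain array with NaN at masked points (hypothetical helper)."""
    if ma.isMaskedArray(data):
        if data.dtype.kind != "f":
            # Integers cannot hold NaN, so masked integer data is rejected,
            # mirroring the "raises an error" point in the summary.
            raise ValueError("masked non-float arrays are not supported")
        return data.filled(np.nan)
    return np.asarray(data)


def from_nan_array(nan_data):
    """Return 'real' data: masked wherever the stored array holds NaN."""
    if nan_data.dtype.kind == "f" and np.isnan(nan_data).any():
        return ma.masked_invalid(nan_data)
    return nan_data


# Input with one unmasked NaN (index 1) and one masked point (index 2).
original = ma.masked_array([1.0, np.nan, 3.0, 4.0],
                           mask=[False, False, True, False])
stored = to_nan_array(original)      # plain ndarray, NaN at indices 1 and 2
restored = from_nan_array(stored)    # masked at indices 1 and 2
```

Note that `restored` is masked at both the original masked point and the original NaN, so the round trip does not reproduce the input exactly. That is one way to read "rationalises combined masks + NaNs" above.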
