-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add xarray backends for fgmax and fgout gridded formats #611
Conversation
Thanks @kbarnhart for providing this! I haven't experimented with it much yet, but a few comments...
Eventually it would be nice to integrate this better with I hope to revamp the But for now it seems fine to merge this in, perhaps with some tweaks, so that it can be used for your purposes. @mandli and @ketch: I note that our new attempt at CI throws an error for this PR in the setup step:
Not sure if this is something to investigate? |
@rjleveque thanks for your comments. I'll make some updates to address them and ping you here when they are ready to consider. happy to be involved in integrating these better. |
Would the file |
@mandli - I thought a tiny amount about this and it seemed to me like there were two options. Either all the xarray backends go in one module or each xarray backend goes inside a module associated with the relevant clawpack data structure (e.g., the FGOutBackend goes in fgout_tools.py). I have a slight preference towards the latter option. However, I implemented the former because I didn't want to make the fgout_tools and fgmax_tools files longer and make it hard to discover the backends. All of this is to say that I'm not sure the best place for these within clawpack.geoclaw. |
@kbarnhart Those were the two options that came to my mind as well. Given that gauges are not actually handled here in GeoClaw but in PyClaw (it's common across most of the Clawpack packages) it may make more sense to keep them in the associated data structures. Similarly we could/should do something with other data formats, such as friction from land use or storm data, for which there may be multiple backends. Should be easy to change up until we merge this fully and make a release though. |
Variables when it makes sense should be named the same thing probably to reduce confusion although I do like the more descriptive variable you mentioned. |
Some notes based on conversation with @rjleveque:
|
…oclaw into add_xarray_backends
@kbarnhart: I'm finally getting back to looking at this and I see that the flexibility recently added to fgout variables means that these xarray tools won't always work. What you call Class I'm not sure if it's worth trying to incorporate these changes in the xarray backend, or perhaps easier to read in using fgout_tools and then convert to xarrrays, or maybe fine to merge this for now and work on enhancements later? |
Another thing you might want to add to your sample
Or maybe there's a cleaner way to do this. At any rate it could be useful for reducing all the fgout files to a single file. |
@rjleveque - Today I've finally gotten around to updating this for compatibility with your changes. I think I've now made the changes needed to make it work and have tested it with the geoclaw example and the radial slide example over on geoflows/dclaw#17. My only thought is that it would be really nice if dclaw could have a default that would not require always setting fgout.q_out_vars=[...] However, I don't think it is a big deal to always need to set this for dclaw. I've discussed with Dave and we agree that it is fine as is. It is going to be great to not need to write all elements of q. I added an example of opening all the files per your suggestion. There is a slightly easier way to do this with xr.open_mfdataset() that I used. I think that this is ready to merge. |
"B": "meters", | ||
"huc": "meters squared per second", | ||
"hvc": "meters squared per second", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjleveque I did have a question about whether the units for huc and hvc were correct. I don't know how the corrections are applied (if they are added or multiplied or something else, so I wasn't sure what to put).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kbarnhart: Interesting question, I think for the SGN equations the units are "meters per second squared" but for the MS equations "meters squared per second squared" because of the different scalings that get used when using these terms to update hu and hv. So I'm not sure what to put here, maybe "varies"? Normally the user would not use these terms for anything directly and would only output them for debugging purposes, perhaps.
Another related question: For dclaw you call the 7th element of q "delta_a" whereas in qmap
I called it "bdif" because in digclaw_module.f90, i_bdif=7
is the index name used. We should be consistent, is delta_a
a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the units information. I've changed it to 'varies'.
'bdif' is better than 'delta_a'. This has been called different things at different times. At present it is also called 'b_eroded' internally and in the dclaw data.py docstrings (I'll change that for consistency).
I've changed delta_a to bdif as well. Good catch.
elif qmap == "geoclaw": | ||
_qelements = _qelements_geoclaw | ||
elif qmap == "geoclaw-bouss": | ||
_qelements = _qelements_geoclaw_bouss |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If qmap
isn't one of these three choices, then I think _qelements
will be left uninitialized, leading to an error later? Should this have:
else:
_qelements = list(qmap.keys())
or even just set _qelements
this way in general without needing _qelements_geoclaw
etc and this if test at all?
Also, maybe qmap
can be used more directly below without needing _qelements
at all?
But I'm not sure what drop_variables
is intended for, is this to further drop things that were already output in the fgout files, or is it to list variables that were not output to begin with (which is information already included in qmap
if that is set properly, I think)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like I had made _qelements
vestigial and not removed it. Now removed.
drop_variables
is a requirement of the BackendEntrypoint class. It would be used to further drop things from the fgout file (e.g., h and eta in the fgout file but only h in the xarray dataset).
Thanks @kbarnhart! Reading a single fgout frame still works fine, but the new code in
throws an error for me when trying to read the second file:
Apparently dask is required for this approach to work. Should we be requiring this, or would it be better to revert to the slightly clumsier approach I used? |
(now tested for both absence of h, and absence of h, eta, and B)
@rjleveque Just updated to have options for both with/without Dask. I also tested and improved the drytolerance (hardcoded to 0.001) handling. Let me know if it works for you. If there is a way to read in the user-specified drytolerance, that is one final improvement to this that I can think of. |
@kbarnhart:
with
|
Is it clear you always want to mask out dry values in You could have a default in
for example to mean no masking, or let the user specify the value to use at this point? |
@rjleveque I've added user-specified dry_tolerance (default = 0.001, but dry_tolerance=None means no masking at all). mask is never applied to B or eta, this seems like good default usage to me, but if you think it would be better to do something different, I'm happy to change. Also, if it would be easier for a quick zoom to discuss, let me know. I'm pretty free for the rest of the day. Thanks for all these comments! |
I should have fixed this. I just got rid of the specific exception type. |
I think you removed Otherwise it looks fine and the dry_tolerance makes sense to me. |
🤦♀️ Thanks. Should be fixed now. |
Works for me, I'm going to merge it. Thanks!! |
@rjleveque, @mandli -
Here is a PR that addresses #598.
It may not be the most efficient (I flipud and transpose each array to put them in the orientation that xarray wants), but I find that it works.
I've tested the fgmax method on dclaw and geoclaw (the chile 2010 fgout-fgmax example) with each option for the number of variables written.
I've tested the fgout method with dclaw and geoclaw for both binary and ascii type files.
One point of discussion is whether you like the variable names I've use for fgmax (they are not exactly the same as in the fgmax object (e.g., I use h_max instead of h). I'm happy to change them to be identical if that is important to either of you.
I added an example usage script in the chile2010 fgout-fgmax example as it seemed the most sensible place to put an example.
Happy to improve this as you see fit.