Aerosol zonal plots #291
Conversation
Add more aerosol info, an aerosol calculation list, and a constants section. Also add a subsection for difference-colorbar options and a flag for non-linear scale plotting/color scales.
This moves variable derivation away from NCO and over to xarray, mostly so that any derived variable with more than two constituents can be processed; the NCO approach only worked for two constituents. Also add a check for any aerosol variables that require rho (density of dry air), creating PMID and T time series files if they were not already requested. Additionally, add a section in the derived variables to calculate aerosol values.
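For a rough illustration of the xarray-based approach (a sketch only; the constituent names, data, and derived-variable name below are placeholders, not the actual ADF code):

import numpy as np
import xarray as xr

# Placeholder constituent time series (in practice these come from the
# constituent time-series files)
constit_list = ["SOA1", "SOA2", "SOA3"]
ds = xr.Dataset(
    {c: xr.DataArray(np.ones((2, 4)), dims=("time", "lat")) for c in constit_list}
)

# Sum the constituents; xarray preserves dims/coords and handles any
# number of terms, not just two as with the old NCO approach
ds["SOA"] = sum(ds[c] for c in constit_list)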
lib/adf_diag.py
Outdated
variables[i] = ds[constit_list[0]].dims

#Get coordinate values from the first constituent file
coords={'lat': ds[constit_list[0]].lat.values,
Could this be changed to coords = ds.coords?
That way we get the metadata in the coordinate variables, and we don't have to make assumptions about what coordinates are present. I'm thinking of a day when we can ingest data on unstructured grids.
I tried this way, but when it gets to the aerosols (vertical levels) it seems to have a mismatch between coordinates and dimensions:
ValueError: coordinate ilev has dimensions ('ilev',), but these are not a subset of the DataArray dimensions ('time', 'lev', 'lat', 'lon')
coords = ds.coords
Coordinates:
* lat (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 87.17 88.12 89.06 90.0
* lon (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.3 357.5 358.8
* lev (lev) float64 0.008135 0.01834 0.03482 ... 983.2 991.2 997.5
* ilev (ilev) float64 0.004259 0.01201 0.02467 ... 987.4 995.1 1e+03
* time (time) object 1997-02-01 00:00:00 ... 2001-01-01 00:00:00
As per @nusbaume's suggestion for setting the dims once:
const_dims = ds_constits[constit_list[0]].dims
('time', 'lev', 'lat', 'lon')
So I just grabbed the dimensions and coordinates from the first constituent DataArray:
#Grab subset for first constituent for dimensions and coordinates
ds_const0 = ds[constit_list[0]]
coords = ds_const0.coords
KeysView(Coordinates:
* lat (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 87.17 88.12 89.06 90.0
* lon (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.3 357.5 358.8
* lev (lev) float64 0.008135 0.01834 0.03482 ... 983.2 991.2 997.5
* time (time) object 1997-02-01 00:00:00 ... 2001-01-01 00:00:00)
const_dims = ds_const0.dims
const_dims ('time', 'lev', 'lat', 'lon')
It seemed to work, let me know if there is anything that stands out about going this way.
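For reference, a minimal self-contained sketch of wrapping derived values with the first constituent's dims and coords (the data and names here are made up, not the PR code):

import numpy as np
import xarray as xr

# Stand-in for the first constituent's DataArray
ds_const0 = xr.DataArray(
    np.zeros((2, 3)),
    dims=("time", "lat"),
    coords={"time": [0, 1], "lat": [-45.0, 0.0, 45.0]},
)

# Placeholder values computed for the derived variable
derived_values = np.ones(ds_const0.shape)

# Re-use the first constituent's dimensions and coordinates
derived_da = xr.DataArray(
    derived_values,
    dims=ds_const0.dims,
    coords=ds_const0.coords,
    name="DERIVED_VAR",
)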
lib/adf_diag.py
Outdated
data_arrays.append(da)

#NOTE: this will need to be changed when derived equations are more than sums! - JR
ds[var] = sum(data_arrays)
Doesn't this break RESTOM, which is a difference?
Agreed, for now we could just have a specific check for RESTOM here, e.g.:

if var == "RESTOM":
    #Assume FSNT comes before FLNT:
    ds[var] = data_arrays[0] - data_arrays[1]
else:
    ds[var] = sum(data_arrays)
lib/adf_diag.py
Outdated
variables = {}
for i in constit_list:
    variables[i] = ds[constit_list[0]].dims
I'm a little confused about what this does. Looks like the keys of variables will be the names of the constituents of the derived quantity, and the values will all be the same tuple of the first constituent's dimensions. variables then gets used below to loop through the dimensions.
I wonder if you get the same result by replacing lines 1071-1092 with this:

s = 0
for v in constit_list:
    s += ds[v]
s.name = var

and then either ds[var] = s, or just write s.to_netcdf(...) when it is ready for output (see my comment below about the output file).
I might vote to remove this section of code entirely, and instead just extract the dims once:
const_dims = ds[constit_list[0]].dims
as they should (?) be the same between all of the constituents.
Then down below, instead of looping over the variables dict for the constituent names, just loop over constit_list again. This will help reduce the total number of variables in this function and the total number of iterations it has to perform.
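A rough sketch of that simplification (the dataset and names below are made up for illustration, not the function's actual code):

import numpy as np
import xarray as xr

# Made-up constituent dataset
constit_list = ["VAR_A", "VAR_B", "VAR_C"]
ds = xr.Dataset(
    {c: xr.DataArray(np.ones((2, 3)), dims=("time", "lat")) for c in constit_list}
)

# Extract the dims once; they should be the same for every constituent
const_dims = ds[constit_list[0]].dims  # e.g. ('time', 'lat')

# Loop directly over constit_list rather than a separate `variables` dict
derived = sum(ds[c] for c in constit_list)
derived = derived.transpose(*const_dims)  # keep the original dimension order
derived.name = "DERIVED_VAR"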
lib/adf_diag.py
Outdated
if var == "SO4": | ||
ds[var] = ds[var]*(96/115) | ||
|
||
ds.to_netcdf(derived_file, unlimited_dims='time', mode='w') |
I think that ds has the constituents and the derived values?
Do we want to store all of them in the output file? I think in the version using ncap2 that was necessary because of how NCO works, but since we already have the constituents in files, I think this would be replicating data.
Agreed, I would remove all of the variables in ds except for the derived variable just before you run the to_netcdf command.
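For example, something along these lines (a self-contained sketch; the variable and file names are placeholders matching the surrounding discussion):

import numpy as np
import xarray as xr

# Placeholder dataset holding constituents plus the derived variable
ds = xr.Dataset(
    {
        "FSNT": ("time", np.ones(3)),
        "FLNT": ("time", np.full(3, 0.5)),
        "RESTOM": ("time", np.full(3, 0.5)),
    }
)
var = "RESTOM"              # the derived variable to keep
derived_file = "RESTOM.nc"  # placeholder output path

# Keep only the derived variable when writing the output file;
# selecting with a list returns a Dataset containing just that variable
ds[[var]].to_netcdf(derived_file, unlimited_dims="time", mode="w")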
@justin-richling and @nusbaume Following up on an email I sent you about this... I wonder if we should provide an improved approach for derived variables. If we think bringing in numexpr as a new dependency is a reasonable approach, I think I've got a potential drop-in replacement, as follows:
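The full replacement isn't reproduced in this view; as a loose illustration only, a numexpr-based derivation could look something like the sketch below (the helper name, expression string, and sample data are assumptions, not the actual suggestion):

import numexpr as ne
import numpy as np
import xarray as xr

def derive_with_numexpr(ds, var, expression, constit_list):
    """Evaluate a derived variable from a string expression, e.g. 'FSNT - FLNT'.

    Illustrative sketch only: assumes all constituents share the dims/coords
    of the first one, and that `expression` refers to the constituent names.
    """
    # numexpr evaluates string expressions over plain numpy arrays
    local_dict = {name: ds[name].values for name in constit_list}
    result = ne.evaluate(expression, local_dict=local_dict)

    # Wrap the result back into a DataArray using the first constituent's metadata
    template = ds[constit_list[0]]
    return xr.DataArray(result, dims=template.dims, coords=template.coords, name=var)

# Hypothetical usage with placeholder data
ds = xr.Dataset({"FSNT": ("time", np.ones(3)), "FLNT": ("time", np.full(3, 0.5))})
ds["RESTOM"] = derive_with_numexpr(ds, "RESTOM", "FSNT - FLNT", ["FSNT", "FLNT"])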
I thought about this PR some more. I think given my previous suggestions, it should be acceptable to have derived variables that could be derived from derived variables themselves. For example, say PRECT is derived from PRECC + PRECL. Then the convective fraction of precipitation could be derived as PRECCFRAC = PRECC / PRECT. This seems fine, but requires doing the derived variables in a correct order. I believe that we could just reorder the derived variables before running the derivations, for example with something like:

def get_derive_order(vres):
    """provides a valid order for deriving variables
    such that derived quantities that are dependencies
    for other quantities are earlier in the output list

    parameters
    ----------
    vres : dict
        dict of each derivable variable with values that are the dependencies (as list)

    return
    ------
    list
        The list of derived variables in a valid order for derivations to succeed
    """
    # use deque for efficient appending
    from collections import deque

    derivable = set(vres.keys())  # this also serves as our definition of having dependencies
    q = deque()          # final ordering, with dependencies placed before dependents
    in_progress = set()  # guards against circular dependencies

    def place(v):
        """Recursively place v's derivable dependencies in q ahead of v."""
        if v in q:
            return  # already placed in a valid position
        if v in in_progress:
            print(f"ERROR: circular dependency encountered while ordering {v}")
            return
        in_progress.add(v)
        vdepends = vres[v]
        if not isinstance(vdepends, list):
            vdepends = [vdepends]  # in case it's just a single dependency given as a str
        for vd in vdepends:
            if vd in derivable:  # only dependencies that are themselves derived need ordering
                place(vd)
        in_progress.discard(v)
        q.append(v)  # all derivable dependencies of v are now in q

    for v in derivable:  # scan through all derivables
        place(v)

    return list(q)  # convert deque to list

And that could be used right before calling self.derive_variables:

if vars_to_derive:
    var_order = get_derive_order(vars_to_derive)  # already a dict as per my other suggestion
    vars_to_derive_ordered = {k: vars_to_derive[k] for k in var_order}
    self.derive_variables(vars_to_derive=vars_to_derive_ordered, ts_dir=ts_dir[case_idx])
@justin-richling -- Sorry for all this code, I think I got carried away! If you and/or @nusbaume would prefer to move ahead with your PR, I can put my suggested changes in separately; that would be fine with me. Let me know.
Good work! Just have some minor change requests and a few suggestions that will hopefully help us maintain current functionality while waiting for the new derived variable infrastructure.
Clean up the derived variable function to grab coordinates and dimensions for the new variable's netCDF file. Also, remove constituents from the derived variable file since they will have their own files.
Everything looks good to me now, thanks!
This PR will add calculations for zonal aerosol plots, and add new functionality for colorbar options.
Files changed:
adf_variable_defaults.yaml
adf_diag.py
plotting_functions.py
regrid_and_vert_interp.py
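As an illustration of the kind of non-linear color scale the new flag is meant to enable (a sketch using matplotlib's BoundaryNorm; the levels and field here are made up, and this is not necessarily how plotting_functions.py implements it):

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm

# Made-up aerosol-like field spanning several orders of magnitude
data = np.random.lognormal(mean=-18, sigma=2, size=(96, 144))

# Non-linear (uneven) contour levels mapped one-to-one onto colors
levels = [1e-10, 1e-9, 5e-9, 1e-8, 5e-8, 1e-7, 5e-7, 1e-6]
cmap = plt.get_cmap("viridis", len(levels) - 1)
norm = BoundaryNorm(levels, ncolors=cmap.N)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(data, cmap=cmap, norm=norm)
fig.colorbar(mesh, ax=ax, ticks=levels)
plt.show()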