* SDAP -390 Update NetCDF reader tool for data match-up * Update CHANGELOG.md * Update cdms_reader.py * Update README.md * Update cdms_reader.py * Updated README.md. Co-authored-by: Jordan Gethers <[email protected]> Co-authored-by: nchung <[email protected]>
1 parent 5c96c3d, commit 1dc62e2
Showing 6 changed files with 546 additions and 211 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# CDMS_reader.py | ||
The functions in cdms_reader.py read a CDMS netCDF file into memory, assemble a list of matches from a primary (satellite) and secondary (satellite or in situ) data set, and optionally output the matches to a CSV file. Each matched pair contains one primary data record and one secondary data record. | ||
|
||
A CDMS netCDF file holds two groups (`PrimaryData` and `SecondaryData`). The `matchIDs` netCDF variable contains pairs of IDs (matches), each of which references a primary data record and a secondary data record in their respective groups. These records have a many-to-many relationship: one primary record may match many secondary records, and one secondary record may match many primary records. The `assemble_matches` function assembles the individual data records into pairs based on their `dim` group dimension IDs as paired in the `matchIDs` variable. | ||
|
||
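To make the many-to-many relationship concrete, the sketch below groups secondary record IDs by the primary record ID they are paired with. The `match_ids` list is made-up data standing in for the `matchIDs` variable, and `group_by_primary` is a hypothetical helper that is not part of `cdms_reader.py`.

```python
from collections import defaultdict

# Made-up (primary ID, secondary ID) pairs standing in for the matchIDs variable.
match_ids = [(0, 0), (0, 1), (1, 1), (2, 3)]


def group_by_primary(pairs):
    """Map each primary record ID to the list of secondary IDs paired with it."""
    grouped = defaultdict(list)
    for primary_id, secondary_id in pairs:
        grouped[primary_id].append(secondary_id)
    return dict(grouped)


by_primary = group_by_primary(match_ids)
print(by_primary)
```

Here primary record 0 is paired with two secondary records, and secondary record 1 is paired with two primary records, so neither side is unique across matches.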
## Requirements | ||
This tool was developed and tested with Python 3.9.13. | ||
Imported packages: | ||
* argparse | ||
* string | ||
* netCDF4 | ||
* sys | ||
* datetime | ||
* csv | ||
* collections | ||
* logging | ||
|
||
|
||
## Functions | ||
### Function: `assemble_matches(filename)` | ||
Read a CDMS netCDF file into memory and return a list of matches from the file. | ||
|
||
#### Parameters | ||
- `filename` (str): the CDMS netCDF file name. | ||
|
||
#### Returns | ||
- `matches` (list): List of matches. | ||
|
||
Each list element in `matches` is a dictionary organized as follows: | ||
For match `m`, netCDF group `GROUP` ('PrimaryData' or 'SecondaryData'), and netCDF group variable `VARIABLE`: | ||
|
||
`matches[m][GROUP]['matchID']`: netCDF `MatchedRecords` dimension ID for the match | ||
`matches[m][GROUP]['GROUPID']`: GROUP netCDF `dim` dimension ID for the record | ||
`matches[m][GROUP][VARIABLE]`: variable value | ||
|
||
For example, to access the timestamps of the primary data and the secondary data of the first match in the list, along with the `MatchedRecords` dimension ID and the groups' `dim` dimension ID: | ||
```python | ||
matches[0]['PrimaryData']['time'] | ||
matches[0]['SecondaryData']['time'] | ||
matches[0]['PrimaryData']['matchID'] | ||
matches[0]['PrimaryData']['PrimaryDataID'] | ||
matches[0]['SecondaryData']['SecondaryDataID'] | ||
``` | ||
|
||
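As a rough illustration of working with this structure, the sketch below computes the time offset between the secondary and primary record of each match. The `matches` list here is hand-built with made-up values so the snippet runs without a netCDF file; it only mimics the layout described above, and the units of `time` depend on the file being read.

```python
# Hand-built stand-in for the list returned by assemble_matches (values are made up).
matches = [
    {
        "PrimaryData": {"matchID": 0, "PrimaryDataID": 0, "time": 100.0},
        "SecondaryData": {"matchID": 0, "SecondaryDataID": 5, "time": 160.0},
    },
    {
        "PrimaryData": {"matchID": 1, "PrimaryDataID": 0, "time": 200.0},
        "SecondaryData": {"matchID": 1, "SecondaryDataID": 7, "time": 170.0},
    },
]

# Secondary time minus primary time for every match, in the file's time units.
time_offsets = [
    match["SecondaryData"]["time"] - match["PrimaryData"]["time"]
    for match in matches
]
print(time_offsets)
```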
### Function: `matches_to_csv(matches, csvfile)` | ||
Write the CDMS matches to a CSV file. Include a header of column names which are based on the group and variable names from the netCDF file. | ||
|
||
#### Parameters: | ||
- `matches` (list): the list of dictionaries containing the CDMS matches as returned from the `assemble_matches` function. | ||
- `csvfile` (str): the name of the CSV output file. | ||
|
||
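The column naming can be sketched as follows: each column is the group name joined to the key name with an underscore, taken from the first match. This is a minimal illustration that writes to an in-memory buffer; the `matches` data is made up and `build_header` is a hypothetical helper, not a function in `cdms_reader.py`.

```python
import csv
import io

# Made-up matches with the same layout that assemble_matches returns.
matches = [
    {
        "PrimaryData": {"matchID": 0, "PrimaryDataID": 2, "time": 100.0},
        "SecondaryData": {"matchID": 0, "SecondaryDataID": 9, "time": 130.0},
    },
    {
        "PrimaryData": {"matchID": 1, "PrimaryDataID": 3, "time": 200.0},
        "SecondaryData": {"matchID": 1, "SecondaryDataID": 9, "time": 210.0},
    },
]


def build_header(first_match):
    """Build GROUP_KEY column names from the first match in the list."""
    return [
        f"{group}_{name}"
        for group, record in first_match.items()
        for name in record
    ]


header = build_header(matches[0])

# Write the header and one row per match to an in-memory CSV.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
for match in matches:
    writer.writerow([value for record in match.values() for value in record.values()])

csv_text = buffer.getvalue()
print(csv_text)
```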
### Function: `get_globals(filename)` | ||
Write the CDMS global attributes to a text file. Additionally, | ||
the file will contain a description of where all the different | ||
outputs go and how best to utilize this program. | ||
|
||
#### Parameters: | ||
- `filename` (str): the name of the original '.nc' input file | ||
|
||
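The attribute lines in the text file follow an ncdump-like `:name = "value" ;` layout. The sketch below reproduces that formatting for a plain dictionary; the attribute names and values are made up, and `format_attribute` is a hypothetical helper used only for illustration.

```python
# Made-up attributes standing in for a netCDF file's global attributes.
global_attrs = {"title": "example", "example_count": 2}


def format_attribute(name, value):
    """Format one attribute as a tab-indented ':name = "value" ;' line."""
    return f'\t :{name} = "{value}" ;\n'


attribute_lines = [format_attribute(name, value) for name, value in global_attrs.items()]
print("".join(attribute_lines))
```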
### Function: `create_logs(user_option, logName)` | ||
Write the CDMS log information to a file. Additionally, the user may | ||
opt to print this information directly to stdout, or discard it entirely. | ||
|
||
#### Parameters | ||
- `user_option` (str): The value of `args.log`, indicating which logging | ||
option the user selected: `'Y'` (the default) writes to a log file, `'N'` (`-l`) creates no log, and `'1'` (`-l1`) prints to stdout. | ||
- `logName` (str): The name of the log file to write to, | ||
used when the user did not pass the -l option. | ||
|
||
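The three values of `user_option` come from how the `-l` argument is declared with `argparse`. The sketch below repeats that declaration (`nargs='?'`, `const='N'`, `default='Y'`) in a standalone parser to show which value each command-line form produces; `log_option` is a hypothetical helper added for the demonstration.

```python
import argparse

# Standalone parser mirroring the filename and -l/--log arguments of cdms_reader.py.
parser = argparse.ArgumentParser()
parser.add_argument("filename")
parser.add_argument("-l", "--log", nargs="?", const="N", default="Y")


def log_option(argv):
    """Return the value argparse assigns to the log option for the given arguments."""
    return parser.parse_args(argv).log


default_value = log_option(["cdms_file.nc"])        # option absent: default is used
no_log_value = log_option(["cdms_file.nc", "-l"])   # flag without a value: const is used
stdout_value = log_option(["cdms_file.nc", "-l1"])  # attached value: '1'

print(default_value, no_log_value, stdout_value)
```

Because `-l` accepts an optional value, placing it directly before the file name would cause the file name to be consumed as that value, so the file name is given first here, as in the usage examples.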
## Usage | ||
For example, to read some CDMS netCDF file called `cdms_file.nc`: | ||
### Command line | ||
The main function for `cdms_reader.py` takes one `filename` parameter (the `cdms_file.nc` argument in this example) for the CDMS netCDF file to read and calls the `assemble_matches` function. If the `-c` option is given, the `matches_to_csv` function is called to write the matches to a CSV file, `cdms_file.csv`. If the `-g` option is given, the `get_globals` function is called to write the file's global attributes, along with a short explanation of how the output files can best be used, to a text file, `cdms_file.txt`. Logs are written automatically to `cdms_file.log`; use `-l` to omit the log or `-l1` to print it to stdout instead. The long options `--csv`, `--log`, and `--meta` are equivalent to `-c`, `-l`, and `-g`, except that a value cannot be attached directly to `--log` the way it can to `-l` (as in `-l1`). | ||
``` | ||
python cdms_reader.py cdms_file.nc -c -g | ||
``` | ||
``` | ||
python3 cdms_reader.py cdms_file.nc -c -g | ||
``` | ||
``` | ||
python3 cdms_reader.py cdms_file.nc --csv --meta | ||
``` | ||
### Importing `assemble_matches` | ||
```python | ||
from cdms_reader import assemble_matches | ||
matches = assemble_matches('cdms_file.nc') | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,250 @@ | ||
# Licensed to the Apache Software Foundation (ASF) under one or more | ||
# contributor license agreements. See the NOTICE file distributed with | ||
# this work for additional information regarding copyright ownership. | ||
# The ASF licenses this file to You under the Apache License, Version 2.0 | ||
# (the "License"); you may not use this file except in compliance with | ||
# the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import argparse | ||
import string | ||
from netCDF4 import Dataset, num2date | ||
import sys | ||
import datetime | ||
import csv | ||
from collections import OrderedDict | ||
import logging | ||
|
||
#TODO: Get rid of numpy errors? | ||
#TODO: Update big SDAP README | ||
|
||
LOGGER = logging.getLogger("cdms_reader") | ||
|
||
def assemble_matches(filename): | ||
    """ | ||
    Read a CDMS netCDF file and return a list of matches. | ||
    Parameters | ||
    ---------- | ||
    filename : str | ||
        The CDMS netCDF file name. | ||
    Returns | ||
    ------- | ||
    matches : list | ||
        List of matches. Each list element is a dictionary. | ||
        For match m, netCDF group GROUP (PrimaryData or SecondaryData), and | ||
        group variable VARIABLE: | ||
        matches[m][GROUP]['matchID']: MatchedRecords dimension ID for the match | ||
        matches[m][GROUP]['GROUPID']: GROUP dim dimension ID for the record | ||
        matches[m][GROUP][VARIABLE]: variable value | ||
    """ | ||
|
||
    try: | ||
        # Open the netCDF file | ||
        with Dataset(filename, 'r') as cdms_nc: | ||
            # Check that the number of groups is consistent w/ the MatchedGroups | ||
            # dimension | ||
            assert len(cdms_nc.groups) == cdms_nc.dimensions['MatchedGroups'].size,\ | ||
                ("Number of groups isn't the same as MatchedGroups dimension.") | ||
|
||
            matches = [] | ||
            matched_records = cdms_nc.dimensions['MatchedRecords'].size | ||
|
||
            # Loop through the match IDs to assemble matches | ||
            for match in range(0, matched_records): | ||
                match_dict = OrderedDict() | ||
                # Grab the data from each platform (group) in the match | ||
                for group_num, group in enumerate(cdms_nc.groups): | ||
                    match_dict[group] = OrderedDict() | ||
                    match_dict[group]['matchID'] = match | ||
                    ID = cdms_nc.variables['matchIDs'][match][group_num] | ||
                    match_dict[group][group + 'ID'] = ID | ||
                    for var in cdms_nc.groups[group].variables.keys(): | ||
                        match_dict[group][var] = cdms_nc.groups[group][var][ID] | ||
|
||
                    # Create a UTC datetime field from timestamp | ||
                    dt = num2date(match_dict[group]['time'], | ||
                                  cdms_nc.groups[group]['time'].units) | ||
                    match_dict[group]['datetime'] = dt | ||
                LOGGER.info(match_dict) | ||
                matches.append(match_dict) | ||
|
||
            return matches | ||
    except (OSError, IOError) as err: | ||
        LOGGER.exception("Error reading netCDF file " + filename) | ||
        raise err | ||
|
||
def matches_to_csv(matches, csvfile): | ||
    """ | ||
    Write the CDMS matches to a CSV file. Include a header of column names | ||
    which are based on the group and variable names from the netCDF file. | ||
    Parameters | ||
    ---------- | ||
    matches : list | ||
        The list of dictionaries containing the CDMS matches as returned from | ||
        assemble_matches. | ||
    csvfile : str | ||
        The name of the CSV output file. | ||
    """ | ||
    # Create a header for the CSV. Column names are GROUP_VARIABLE or | ||
    # GROUP_GROUPID. | ||
    header = [] | ||
    for key, value in matches[0].items(): | ||
        for otherkey in value.keys(): | ||
            header.append(key + "_" + otherkey) | ||
|
||
    try: | ||
        # Write the CSV file. newline='' lets the csv module control line | ||
        # endings, as the csv documentation recommends. | ||
        with open(csvfile, 'w', newline='') as output_file: | ||
            csv_writer = csv.writer(output_file) | ||
            csv_writer.writerow(header) | ||
            for match in matches: | ||
                row = [] | ||
                for group, data in match.items(): | ||
                    for value in data.values(): | ||
                        row.append(value) | ||
                csv_writer.writerow(row) | ||
    except (OSError, IOError) as err: | ||
        LOGGER.exception("Error writing CSV file " + csvfile) | ||
        raise err | ||
|
||
def get_globals(filename): | ||
    """ | ||
    Write the CDMS global attributes to a text file. Additionally, | ||
    within the file there will be a description of where all the different | ||
    outputs go and how best to utilize this program. | ||
    Parameters | ||
    ---------- | ||
    filename : str | ||
        The name of the original '.nc' input file. | ||
    """ | ||
    x0 = "README / cdms_reader.py Program Use and Description:\n" | ||
    x1 = "\nThe cdms_reader.py program reads a CDMS netCDF (a NETCDF file with a matchIDs variable)\n" | ||
    x2 = "file into memory, assembles a list of matches of primary and secondary data\n" | ||
    x3 = "and optionally\n" | ||
    x4 = "outputs the matches to a CSV file. Each matched pair contains one primary\n" | ||
    x5 = "data record and one secondary data record.\n" | ||
    x6 = "\nBelow, this file will list the global attributes of the .nc (NETCDF) file.\n" | ||
    x7 = "If you wish to see a full dump of the data from the .nc file,\n" | ||
    x8 = "please utilize the ncdump command from NETCDF (or look at the CSV file).\n" | ||
    try: | ||
        with Dataset(filename, "r", format="NETCDF4") as ncFile: | ||
            txtName = filename.replace(".nc", ".txt") | ||
            with open(txtName, "w") as txt: | ||
                txt.write(x0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8) | ||
                # End the heading line so the first attribute starts on its own line | ||
                txt.write("\nGlobal Attributes:\n") | ||
                for x in ncFile.ncattrs(): | ||
                    txt.write(f'\t :{x} = "{ncFile.getncattr(x)}" ;\n') | ||
|
||
|
||
    except (OSError, IOError) as err: | ||
        LOGGER.exception("Error reading netCDF file " + filename) | ||
        print("Error reading file!") | ||
        raise err | ||
|
||
def create_logs(user_option, logName): | ||
    """ | ||
    Write the CDMS log information to a file. Additionally, the user may | ||
    opt to print this information directly to stdout, or discard it entirely. | ||
    Parameters | ||
    ---------- | ||
    user_option : str | ||
        The value of args.log: 'Y' (default) writes to a log file, | ||
        'N' (-l) creates no log, and '1' (-l1) prints to stdout. | ||
    logName : str | ||
        The name of the log file to write to, | ||
        used when the user did not pass the -l option. | ||
    """ | ||
    if user_option == 'N': | ||
        print("** Note: No log was created **") | ||
|
||
|
||
    elif user_option == '1': | ||
        # prints the log contents to stdout | ||
        logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s', | ||
                            level=logging.INFO, | ||
                            datefmt='%Y-%m-%d %H:%M:%S', | ||
                            handlers=[ | ||
                                logging.StreamHandler(sys.stdout) | ||
                            ]) | ||
|
||
    else: | ||
        # prints log to a .log file | ||
        logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s', | ||
                            level=logging.INFO, | ||
                            datefmt='%Y-%m-%d %H:%M:%S', | ||
                            handlers=[ | ||
                                logging.FileHandler(logName) | ||
                            ]) | ||
        # 'N' and '1' are handled above, so any value other than 'Y' is unrecognized | ||
        if user_option != 'Y': | ||
            print(f"** Bad usage of log option. Log will print to {logName} **") | ||
|
||
|
||
|
||
|
||
|
||
if __name__ == '__main__': | ||
    """ | ||
    Execution: | ||
        python cdms_reader.py filename | ||
    OR | ||
        python3 cdms_reader.py filename | ||
    OR | ||
        python3 cdms_reader.py filename -c -g | ||
    OR | ||
        python3 cdms_reader.py filename --csv --meta | ||
    Note (For Help Try): | ||
        python3 cdms_reader.py -h | ||
    OR | ||
        python3 cdms_reader.py --help | ||
    """ | ||
|
||
    u0 = '\n%(prog)s -h OR --help \n' | ||
    u1 = '%(prog)s filename -c -g\n%(prog)s filename --csv --meta\n' | ||
    u2 = 'Use -l OR -l1 to modify destination of logs' | ||
    p = argparse.ArgumentParser(usage=u0 + u1 + u2) | ||
|
||
    # below block is to customize user options | ||
    p.add_argument('filename', help='CDMS netCDF file to read') | ||
    p.add_argument('-c', '--csv', nargs='?', const='Y', default='N', | ||
                   help='Use -c or --csv to retrieve CSV output') | ||
    p.add_argument('-g', '--meta', nargs='?', const='Y', default='N', | ||
                   help='Use -g or --meta to retrieve global attributes / metadata') | ||
    p.add_argument('-l', '--log', nargs='?', const='N', default='Y', | ||
                   help='Use -l or --log to AVOID creating log files, OR use -l1 to print to stdout/console') | ||
|
||
    # arguments are processed by the next line | ||
    args = p.parse_args() | ||
|
||
    logName = args.filename.replace(".nc", ".log") | ||
    create_logs(args.log, logName) | ||
|
||
    cdms_matches = assemble_matches(args.filename) | ||
|
||
    if args.csv == 'Y': | ||
        matches_to_csv(cdms_matches, args.filename.replace(".nc", ".csv")) | ||
|
||
    if args.meta == 'Y': | ||
        get_globals(args.filename) | ||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|