Add support for multiple time series datasets via glob and fix enso_diags, streamflow, tc_analysis #866

Open
wants to merge 13 commits into
base: cdat-migration-fy24

tomvothecoder
Collaborator

@tomvothecoder tomvothecoder commented Oct 8, 2024

Description

Summary of changes

  • Add support for glob filepath
    • Replace open_dataset() with open_mfdataset() in _get_time_series_dataset_obj()
    • Update _get_time_series_filepaths and _get_matching_time_series_filepaths to return a list of filepath(s) or None if no filepaths found
    • Update _get_time_slice() to support parsing a list of filepath(s) for start and end years via _parse_years_from_filepaths()
  • Fix enso_diags (related errors)
    • Fix calculate_nino_index_model() not catching the correct IOError message before trying to get the sst dataset using the "TS" variable
    • Fix slow time slice retrieval with bounds -- need to perform .load() with time series dataset for downstream operations
    • Results after fixes (successful):
     2024-10-09 13:48:27,949 [INFO]: run.py(_add_parent_attrs_to_children:372) >> ['results_dir', 'diff_title', 'num_workers', 'short_test_name', 'multiprocessing']
     2024-10-09 13:48:27,950 [INFO]: run.py(_add_parent_attrs_to_children:372) >> ['results_dir', 'diff_title', 'num_workers', 'short_test_name', 'multiprocessing']
     2024-10-09 13:48:27,950 [INFO]: run.py(_add_parent_attrs_to_children:372) >> ['results_dir', 'diff_title', 'test_name', 'num_workers', 'multiprocessing']
     2024-10-09 13:48:33,776 [INFO]: e3sm_diags_driver.py(_save_env_yml:58) >> Saved environment yml file to: model_vs_obs_1987-1988/prov/environment.yml
     2024-10-09 13:48:33,777 [INFO]: e3sm_diags_driver.py(_save_parameter_files:69) >> Saved command used to: model_vs_obs_1987-1988/prov/cmd_used.txt
     2024-10-09 13:48:33,781 [INFO]: e3sm_diags_driver.py(_save_python_script:133) >> Saved Python script to: model_vs_obs_1987-1988/prov/run_script.py
     2024-10-09 13:48:34,866 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
     2024-10-09 13:48:34,866 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
     2024-10-09 13:48:34,948 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:34,948 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:35,361 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:35,362 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:35,465 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
     2024-10-09 13:48:35,465 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: LHFLX
     2024-10-09 13:48:35,465 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: FLNS
     2024-10-09 13:48:36,805 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
     2024-10-09 13:48:38,792 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/FLNS-feedback/feedback-FLNS-NINO3-TS-NINO3.png
     2024-10-09 13:48:38,792 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/FLNS-feedback/feedback-FLNS-NINO3-TS-NINO3.png
     2024-10-09 13:48:38,793 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
     2024-10-09 13:48:38,798 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:38,902 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/LHFLX-response/regression-coefficient-lhflx-over-nino34.json
     2024-10-09 13:48:38,930 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:39,038 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: SHFLX
     2024-10-09 13:48:40,874 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/SHFLX-feedback/feedback-SHFLX-NINO3-TS-NINO3.png
     2024-10-09 13:48:40,874 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/SHFLX-feedback/feedback-SHFLX-NINO3-TS-NINO3.png
     2024-10-09 13:48:40,875 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
     2024-10-09 13:48:40,879 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:41,062 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:41,163 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: LHFLX
     2024-10-09 13:48:43,234 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/LHFLX-response/regression-coefficient-lhflx-over-nino34.png
     2024-10-09 13:48:43,234 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/LHFLX-response/regression-coefficient-lhflx-over-nino34.png
     2024-10-09 13:48:43,235 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
     2024-10-09 13:48:43,238 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:43,270 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/LHFLX-feedback/feedback-LHFLX-NINO3-TS-NINO3.png
     2024-10-09 13:48:43,270 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/LHFLX-feedback/feedback-LHFLX-NINO3-TS-NINO3.png
     2024-10-09 13:48:43,271 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
     2024-10-09 13:48:43,273 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:43,369 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:43,403 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:43,472 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
     2024-10-09 13:48:43,472 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: TAUY
     2024-10-09 13:48:43,503 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: TAUX
     2024-10-09 13:48:44,604 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
     2024-10-09 13:48:45,429 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUX-feedback/feedback-TAUX-NINO4-TS-NINO3.png
     2024-10-09 13:48:45,429 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUX-feedback/feedback-TAUX-NINO4-TS-NINO3.png
     2024-10-09 13:48:45,430 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
     2024-10-09 13:48:45,437 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:45,598 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:45,697 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
     2024-10-09 13:48:45,697 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: NET_FLUX_SRF
     2024-10-09 13:48:46,323 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/TAUY-response/regression-coefficient-tauy-over-nino34.json
     2024-10-09 13:48:49,068 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUY-response/regression-coefficient-tauy-over-nino34.png
     2024-10-09 13:48:49,068 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUY-response/regression-coefficient-tauy-over-nino34.png
     2024-10-09 13:48:49,069 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
     2024-10-09 13:48:49,076 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:49,246 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:49,349 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: NET_FLUX_SRF
     2024-10-09 13:49:16,382 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
     2024-10-09 13:49:17,699 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-feedback/feedback-NET_FLUX_SRF-NINO3-TS-NINO3.png
     2024-10-09 13:49:17,699 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-feedback/feedback-NET_FLUX_SRF-NINO3-TS-NINO3.png
     2024-10-09 13:49:17,702 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
     2024-10-09 13:49:17,721 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:49:17,917 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:49:18,033 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: FSNS
     2024-10-09 13:49:18,714 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-response/regression-coefficient-net_flux_srf-over-nino34.json
     2024-10-09 13:49:20,821 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/FSNS-feedback/feedback-FSNS-NINO3-TS-NINO3.png
     2024-10-09 13:49:20,821 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/FSNS-feedback/feedback-FSNS-NINO3-TS-NINO3.png
     2024-10-09 13:49:20,823 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
     2024-10-09 13:49:20,829 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:49:20,957 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:49:21,058 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
     2024-10-09 13:49:21,058 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: TAUX
     2024-10-09 13:49:22,389 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
     2024-10-09 13:49:22,591 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-response/regression-coefficient-net_flux_srf-over-nino34.png
     2024-10-09 13:49:22,591 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-response/regression-coefficient-net_flux_srf-over-nino34.png
     2024-10-09 13:49:22,594 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
     2024-10-09 13:49:22,597 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:49:22,732 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:49:22,837 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
     2024-10-09 13:49:22,837 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: SHFLX
     2024-10-09 13:49:24,111 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/TAUX-response/regression-coefficient-taux-over-nino34.json
     2024-10-09 13:49:24,240 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
     2024-10-09 13:49:25,922 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/SHFLX-response/regression-coefficient-shflx-over-nino34.json
     2024-10-09 13:49:27,114 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUX-response/regression-coefficient-taux-over-nino34.png
     2024-10-09 13:49:27,114 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUX-response/regression-coefficient-taux-over-nino34.png
     2024-10-09 13:49:27,116 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
     2024-10-09 13:49:27,122 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:49:27,284 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:49:27,387 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
     2024-10-09 13:49:27,388 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: PRECT
     2024-10-09 13:49:30,922 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/SHFLX-response/regression-coefficient-shflx-over-nino34.png
     2024-10-09 13:49:30,922 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/SHFLX-response/regression-coefficient-shflx-over-nino34.png
     2024-10-09 13:49:33,589 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
     2024-10-09 13:49:34,665 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/PRECT-response/regression-coefficient-prect-over-nino34.json
     2024-10-09 13:49:37,054 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/PRECT-response/regression-coefficient-prect-over-nino34.png
     2024-10-09 13:49:37,054 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/PRECT-response/regression-coefficient-prect-over-nino34.png
     2024-10-09 13:49:37,102 [INFO]: main.py(create_viewer:132) >> enso_diags model_vs_obs_1987-1988/viewer
     2024-10-09 13:49:37,216 [INFO]: main.py(create_viewer:135) >> ('ENSO Diagnostics', 'enso_diags/index.html')
     2024-10-09 13:49:37,220 [INFO]: e3sm_diags_driver.py(main:392) >> Viewer HTML generated at model_vs_obs_1987-1988/viewer/index.html
     2024-10-09 13:49:37,223 [INFO]: logger.py(move_log_to_prov_dir:106) >> Log file saved in model_vs_obs_1987-1988/prov/e3sm_diags_run.log
  • Fix tc_analysis
    • The TE stitch file is empty
    • /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/tc-analysis_1987_1988/cyclones_stitch_v3.LR.historical_0051_1987_1988.dat
        2024-10-09 16:25:18,711 [ERROR]: core_parameter.py(_run_diag:343) >> Error in e3sm_diags.driver.tc_analysis_driver
      Traceback (most recent call last):
        File "/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/parameter/core_parameter.py", line 340, in _run_diag
          single_result = module.run_diag(self)
        File "/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/driver/tc_analysis_driver.py", line 91, in run_diag
          test_data["metrics"] = generate_tc_metrics_from_te_stitch_file(test_te_file)
        File "/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/driver/tc_analysis_driver.py", line 181, in generate_tc_metrics_from_te_stitch_file
          te_stitch_vars = _get_vars_from_te_stitch(lines, max_len, num_storms)
        File "/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/driver/tc_analysis_driver.py", line 258, in _get_vars_from_te_stitch
          year_start = int(lines[0].split("\t")[2])
      IndexError: list index out of range
      2024-10-09 16:25:22,252 [WARNING]: e3sm_diags_driver.py(main:378) >> There was not a single valid diagnostics run, no viewer created.
      2024-10-09 16:25:22,253 [ERROR]: run.py(run_diags:91) >> Error traceback:
      Traceback (most recent call last):
        File "/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/run.py", line 89, in run_diags
          params_results = main(params)
        File "/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/e3sm_diags_driver.py", line 397, in main
          if parameters_results[0].fail_on_incomplete and (
      IndexError: list index out of range
  • Fix streamflow
    • Add support for finding .nc files using glob and regex under nested directories. It is unclear whether this will work: e3sm_diags cannot tell which .nc files to use if multiple sub-directories contain the same matching files.
      • Alternative: Update zppy to use exact filepath for root directory containing file
      • e.g., For streamflow: /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/rof/native/ts/monthly/2yr/
  • Perform regression testing with run script on time series datasets to ensure changes work properly for all sets
  • Add unit tests [IN PROGRESS]
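
The calculate_nino_index_model() fix listed in the summary above can be illustrated with a minimal sketch. This is not the e3sm_diags implementation: load_variable() and get_nino_source() are hypothetical helpers, and the message prefix is taken from the log output above. The idea is to fall back to "TS" only when the IOError is the "no files found" error for SST, and to re-raise anything else.

```python
import logging

logger = logging.getLogger(__name__)


def load_variable(available: dict, var_key: str, data_path: str):
    """Hypothetical loader: raise IOError when no data exists for var_key."""
    if var_key not in available:
        raise IOError(
            f"No files found for target variable {var_key} or derived "
            f"variables in {data_path}."
        )
    return available[var_key]


def get_nino_source(available: dict, data_path: str):
    """Try SST first; fall back to TS only for the 'no files found' error."""
    try:
        return "SST", load_variable(available, "SST", data_path)
    except IOError as err:
        # Match on the message prefix rather than the full string, because
        # the full message embeds the derived-variable list and the data path.
        if not str(err).startswith("No files found for target variable SST"):
            raise
        logger.info(
            "Handling the following exception by looking for surface "
            "temperature: %s",
            err,
        )
        return "TS", load_variable(available, "TS", data_path)


# Only TS is available here, so the fallback branch is taken.
source, values = get_nino_source({"TS": [300.1, 300.4]}, "/path/to/ts")
```

Matching on a prefix keeps the check from breaking when the path or variable list in the message changes, which appears to be the kind of mismatch the fix addresses.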

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have added tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder tomvothecoder added the cdat-migration-fy24 CDAT Migration FY24 Task label Oct 8, 2024
@tomvothecoder tomvothecoder self-assigned this Oct 8, 2024
@tomvothecoder tomvothecoder changed the title Add support for multiple time series datasets via glob Add support for multiple time series datasets via glob and fix enso_diags and tc_analysis Oct 8, 2024
@tomvothecoder tomvothecoder changed the title Add support for multiple time series datasets via glob and fix enso_diags and tc_analysis Add support for multiple time series datasets via glob and fix enso_diags, streamflow, tc_analysis Oct 9, 2024
@tomvothecoder
Collaborator Author

@forsyth2 For the failing tc_analysis set on main and the cdat-migration-fy24 branch, I found that the TE stitch file is empty. This causes the ambiguous IndexError (list index out of range) raised when the driver tries to parse the lines of the file.

Can you fix this file? /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/tc-analysis_1987_1988/cyclones_stitch_v3.LR.historical_0051_1987_1988.dat
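
As a rough illustration of how the empty-file case could fail with a clearer message than the bare IndexError, here is a minimal sketch. read_te_stitch_lines() is a hypothetical helper, not a function in tc_analysis_driver.py, and the temporary file only stands in for the real .dat file.

```python
import os
import tempfile


def read_te_stitch_lines(filepath: str) -> list:
    """Read a TE stitch file, failing early with a clear message if empty."""
    with open(filepath) as f:
        lines = f.readlines()

    if not lines:
        # Without this check, indexing lines[0] later raises a bare IndexError.
        raise ValueError(
            f"The TE stitch file is empty, so no storms can be parsed: {filepath}"
        )
    return lines


# An empty temporary file stands in for the real stitch file.
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    empty_path = tmp.name

try:
    read_te_stitch_lines(empty_path)
    message = ""
except ValueError as err:
    message = str(err)
finally:
    os.remove(empty_path)
```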

@chengzhuzhang
Contributor

chengzhuzhang commented Oct 10, 2024

@tomvothecoder Thank you for helping identify this problem. This version of the v3 Low Resolution datasets (documented here) shows much lower TC activity. It is likely that no TC was detected during the testing period (1987-1988), so no file called "cyclones_stitch_v3.LR.historical_0051_1987_1988.dat" was generated by the upstream process. The test result that skipped the tc_analysis figure is therefore expected for v3.

@tomvothecoder
Collaborator Author

> @tomvothecoder Thank you for helping identify this problem. This version of the v3 Low Resolution datasets (documented here) shows much lower TC activity. It is likely that no TC was detected during the testing period (1987-1988), so no file called "cyclones_stitch_v3.LR.historical_0051_1987_1988.dat" was generated by the upstream process. The test result that skipped the tc_analysis figure is therefore expected for v3.

Thank you for clarifying. It sounds like @forsyth2 should exclude tc_analysis from zppy for now and open a separate GitHub issue to add it back later.

@tomvothecoder
Collaborator Author

tomvothecoder commented Oct 10, 2024

@forsyth2 The run script in zppy specifies input dir paths that contain sub-directories. e3sm_diags cannot determine which .nc files to use if there are multiple sub-directories containing the same matching filenames.

Instead, the zppy run script should specify the exact input dir path containing the input .nc files that e3sm_diags should use.

Why?

I tried implementing functionality to parse the input data path for .nc files nested under these sub-directories. However, this presents issues where multiple sub-directories might contain the same input file pattern.

For example:

  • Input Path: /input_path/
  • Pattern: a_file_.{13}.nc
  • Dir path 1: /input_path/dir_1/a_file_195001_198501.nc -> we want this file
  • Dir path 2: /input_path/dir_2/a_file_195001_198501.nc -> we don't want this file, but it will still be considered a match

Fix -> set Input Path to /input_path/dir_1/
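
The ambiguity can be reproduced with a small sketch. It assumes the pattern is the file stem, an underscore, a 13-character date range (YYYYMM_YYYYMM), and a literal ".nc"; find_matches() is a hypothetical helper, not e3sm_diags code.

```python
import os
import re
import tempfile

# Assumed pattern: stem, underscore, 13 characters (YYYYMM_YYYYMM), ".nc".
PATTERN = re.compile(r"a_file_.{13}\.nc")


def find_matches(root: str, recursive: bool) -> list:
    """Return sorted paths under root whose filename matches PATTERN."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if PATTERN.fullmatch(name):
                matches.append(os.path.join(dirpath, name))
        if not recursive:
            break  # Only inspect the top-level directory.
    return sorted(matches)


with tempfile.TemporaryDirectory() as input_path:
    for sub in ("dir_1", "dir_2"):
        os.makedirs(os.path.join(input_path, sub))
        open(os.path.join(input_path, sub, "a_file_195001_198501.nc"), "w").close()

    # Searching recursively from the parent finds both files: ambiguous.
    recursive_hits = find_matches(input_path, recursive=True)
    # Pointing at the exact directory yields a single match.
    exact_hits = find_matches(os.path.join(input_path, "dir_1"), recursive=False)

    n_recursive, n_exact = len(recursive_hits), len(exact_hits)
```

With the parent directory as input there is no way to choose between the two hits from the filename alone, which is why the exact directory is the safer input.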

Fixes for zppy e3sm_diags run script

I used the e3sm.py run script from the provenance here and noticed the test_data_path parameters aren't exact (e.g., streamflow_param.test_data_path = 'rof')

enso_diags:

  • Instead of: /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/
  • It should be: /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr
(e3sm_diags_dev_673) [ac.tvo@chrlogin1 e3sm_diags]$ ls -al /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr
total 1249540
drwxrws---+ 4 ac.forsyth2 E3SM      4096 Sep 17 16:13 .
drwxrws---+ 3 ac.forsyth2 E3SM      4096 Sep 17 16:13 ..
drwxrws---+ 2 ac.forsyth2 E3SM      4096 Sep 17 16:13 1985_1986
drwxrws---+ 2 ac.forsyth2 E3SM      4096 Sep 17 16:13 1987_1988
-rw-rw----+ 1 ac.forsyth2 E3SM   6759312 Sep 17 16:13 CLDHGH_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759312 Sep 17 16:13 CLDHGH_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759312 Sep 17 16:13 CLDLOW_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759312 Sep 17 16:13 CLDLOW_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759320 Sep 17 16:13 CLDMED_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759320 Sep 17 16:13 CLDMED_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759316 Sep 17 16:13 CLDTOT_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759316 Sep 17 16:13 CLDTOT_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759392 Sep 17 16:13 FLNS_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759392 Sep 17 16:13 FLNS_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759400 Sep 17 16:13 FLNT_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759400 Sep 17 16:13 FLNT_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759404 Sep 17 16:13 FLUT_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759404 Sep 17 16:13 FLUT_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759392 Sep 17 16:13 FSNS_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759392 Sep 17 16:13 FSNS_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759396 Sep 17 16:13 FSNT_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759396 Sep 17 16:13 FSNT_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759416 Sep 17 16:13 FSNTOA_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759416 Sep 17 16:13 FSNTOA_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759316 Sep 17 16:13 PRECC_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759316 Sep 17 16:13 PRECC_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759324 Sep 17 16:13 PRECL_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759324 Sep 17 16:13 PRECL_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759320 Sep 17 16:13 PRECSC_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759320 Sep 17 16:13 PRECSC_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759332 Sep 17 16:13 PRECSL_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759332 Sep 17 16:13 PRECSL_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759348 Sep 17 16:13 QFLX_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759348 Sep 17 16:13 QFLX_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759364 Sep 17 16:13 SHFLX_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759364 Sep 17 16:13 SHFLX_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759284 Sep 17 16:13 TAUX_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759284 Sep 17 16:13 TAUX_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759292 Sep 17 16:13 TAUY_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759292 Sep 17 16:13 TAUY_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759352 Sep 17 16:13 TREFHT_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759352 Sep 17 16:13 TREFHT_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759336 Sep 17 16:13 TS_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759336 Sep 17 16:13 TS_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM 504428552 Sep 17 16:13 U_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM 504428552 Sep 17 16:13 U_198701_198812.nc

streamflow:

  • Instead of: /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/rof/
  • It should be: /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/rof/native/ts/monthly/2yr
(e3sm_diags_dev_673) [ac.tvo@chrlogin1 e3sm_diags]$ ls -al /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/rof/native/ts/monthly/2yr
total 50690
drwxrws---+ 2 ac.forsyth2 E3SM     4096 Sep 17 16:13 .
drwxrws---+ 3 ac.forsyth2 E3SM     4096 Sep 17 16:13 ..
-rw-rw----+ 1 ac.forsyth2 E3SM 25938612 Sep 17 16:13 RIVER_DISCHARGE_OVER_LAND_LIQ_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM 25938612 Sep 17 16:13 RIVER_DISCHARGE_OVER_LAND_LIQ_198701_198812.nc

@forsyth2
Copy link
Collaborator

forsyth2 commented Oct 11, 2024

@tomvothecoder Thank you for looking into all this.

> should exclude tc_analysis from zppy for now and open a separate GitHub issue to add it back later.

I also added a failure if the stitch file is empty (included in E3SM-Project/zppy#628) to catch this in the future. But yes, for now, we'll just have to exclude tc_analysis from v3 testing.

> Instead, the zppy run script should specify the exact input dir path containing the input .nc files that should be used by e3sm_diags

I'm going to have to dive deeper into this. In theory, zppy is constructing the exact paths given other parameters passed to it. E.g., in e3sm_diags.bash:

ts_dir_source={{ output }}/post/atm/{{ grid }}/ts/monthly/{{ '%dyr' % (ts_num_years) }}
ts_daily_dir={{ output }}/post/atm/{{ grid }}/ts/daily/{{ '%dyr' % (ts_num_years) }}
ts_rof_dir_source="{{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr"
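
To see what these templates would resolve to, here is a plain-Python sketch (not zppy's own rendering code). The values for output, grid, and ts_num_years are assumptions inferred from the paths quoted earlier in this thread.

```python
# Assumed values, inferred from the paths quoted earlier in this thread.
output = (
    "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/"
    "test-diags-no-cdat-20240917/v3.LR.historical_0051"
)
grid = "180x360_aave"
ts_num_years = 2

# Plain-Python equivalents of the three template lines above.
ts_dir_source = f"{output}/post/atm/{grid}/ts/monthly/{'%dyr' % ts_num_years}"
ts_daily_dir = f"{output}/post/atm/{grid}/ts/daily/{'%dyr' % ts_num_years}"
ts_rof_dir_source = f"{output}/post/rof/native/ts/monthly/{ts_num_years}yr"

print(ts_dir_source)
print(ts_rof_dir_source)
```

Under those assumed values, the rendered directories end in ts/monthly/2yr, so any mismatch would have to come from how these variables are passed on to the e3sm_diags parameters rather than from the templates themselves.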

@tomvothecoder
Collaborator Author

> I'm going to have to dive deeper into this. In theory, zppy is constructing the exact paths given other parameters passed to it. E.g., in e3sm_diags.bash:

Got it. I will do final testing then merge this PR soon for you to try in zppy.

- Fix units and long name
Collaborator Author

@tomvothecoder tomvothecoder left a comment

Hey @chengzhuzhang, I would appreciate a PR review to ensure my code changes look good to you as well.

Collaborator Author

This is the main file of interest for adding glob support for .nc files.

Comment on lines +178 to +179
new_var.attrs["units"] = "W/m2"
new_var.attrs["long_name"] = "Surface latent heat flux"
Collaborator Author

Addresses #818

var_start_year = int(filename.split("/")[-1].split("_")[-2][:4])
var_end_year = int(filename.split("/")[-1].split("_")[-1][:4])
start_yr_int, end_yr_int = int(self.start_yr), int(self.end_yr)
var_start_year, var_end_year = self._parse_years_from_filepaths(filepaths)
Contributor

@tomvothecoder Can we parse the years from ds_subset, so that we don't need this _parse_years_from_filepaths function?
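
For context, the filename-based approach under discussion could look roughly like the sketch below. It is an illustration only, not the PR's _parse_years_from_filepaths(): the standalone function name and the assumption that every filename ends in "_YYYYMM_YYYYMM.nc" are mine.

```python
import os


def parse_years_from_filepaths(filepaths: list) -> tuple:
    """Return (earliest start year, latest end year) across time series files.

    Assumes names end in "_YYYYMM_YYYYMM.nc", e.g. "TS_198501_198612.nc".
    """
    start_years, end_years = [], []
    for path in filepaths:
        stem = os.path.basename(path)[: -len(".nc")]
        parts = stem.split("_")
        start_years.append(int(parts[-2][:4]))
        end_years.append(int(parts[-1][:4]))
    return min(start_years), max(end_years)


years = parse_years_from_filepaths(
    ["/ts/TS_198501_198612.nc", "/ts/TS_198701_198812.nc"]
)
```

Reading the years from the dataset's time coordinate instead would remove the dependence on the filename convention, at the cost of needing the time axis decoded before the slice is computed.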

Contributor

@chengzhuzhang chengzhuzhang left a comment

The PR looks good! I only have one minor comment.

Labels
cdat-migration-fy24 CDAT Migration FY24 Task

3 participants