Sensed related charts not being generated for open-access-openpath #132

Open
iantei opened this issue Apr 19, 2024 · 11 comments


iantei commented Apr 19, 2024

Currently, there is an issue with the chart generation for open-access-openpath:
https://open-access-openpath.nrel.gov/public/

The primary reason is the unavailability of the column cleaned_section_summary in the expanded_ct dataframe.

Error call stack:

AttributeError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 expanded_ct, file_suffix, quality_text, debug_df = scaffolding.load_viz_notebook_sensor_inference_data(year,
      2                                                                             month,
      3                                                                             program,
      4                                                                             include_test_users,
      5                                                                             sensed_algo_prefix)

File /usr/src/app/saved-notebooks/scaffolding.py:229, in load_viz_notebook_sensor_inference_data(year, month, program, include_test_users, sensed_algo_prefix)
    227 print(f"Expanded_ct columns: \n {expanded_ct.columns}")
    228 if len(expanded_ct) > 0:
--> 229     expanded_ct["primary_mode_non_other"] = participant_ct_df.cleaned_section_summary.apply(lambda md: max(md["distance"], key=md["distance"].get))
    230     expanded_ct.primary_mode_non_other.replace({"ON_FOOT": "WALKING"}, inplace=True)
    231     valid_sensed_modes = ["WALKING", "BICYCLING", "IN_VEHICLE", "AIR_OR_HSR", "UNKNOWN"]

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/generic.py:5902, in NDFrame.__getattr__(self, name)
   5895 if (
   5896     name not in self._internal_names_set
   5897     and name not in self._metadata
   5898     and name not in self._accessors
   5899     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5900 ):
   5901     return self[name]
-> 5902 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'cleaned_section_summary'
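This error can be reproduced in isolation: pandas raises AttributeError when a column accessed with attribute syntax does not exist. A minimal sketch with a toy dataframe (the data values are illustrative, not from the dataset):

```python
import pandas as pd

# Toy stand-in for expanded_ct, deliberately missing the
# cleaned_section_summary column.
df = pd.DataFrame({"distance": [100.0, 250.0]})

try:
    # Attribute-style access fails when the column is absent;
    # df["cleaned_section_summary"] would raise KeyError instead.
    df.cleaned_section_summary
except AttributeError as e:
    print(e)  # 'DataFrame' object has no attribute 'cleaned_section_summary'
```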

For the dataset fc_*, which has issues creating the sensed related charts, the expanded_ct columns are:

 Expanded_ct columns: 
 Index(['source', 'end_ts', 'end_fmt_time', 'end_loc', 'raw_trip', 'start_ts',
       'start_fmt_time', 'start_loc', 'duration', 'distance', 'start_place',
       'end_place', 'cleaned_trip', 'inferred_labels', 'inferred_trip',
       'expectation', 'confidence_threshold', 'expected_trip', 'user_input',
       'start_local_dt_year', 'start_local_dt_month', 'start_local_dt_day',
       'start_local_dt_hour', 'start_local_dt_minute', 'start_local_dt_second',
       'start_local_dt_weekday', 'start_local_dt_timezone',
       'end_local_dt_year', 'end_local_dt_month', 'end_local_dt_day',
       'end_local_dt_hour', 'end_local_dt_minute', 'end_local_dt_second',
       'end_local_dt_weekday', 'end_local_dt_timezone', '_id', 'user_id',
       'metadata_write_ts'],
      dtype='object')

For the dataset openpath_prod_cortezebikes, which does not have issues creating the sensed related charts, the expanded_ct columns are:


Expanded_ct columns: 
Index(['source', 'end_ts', 'end_fmt_time', 'end_loc', 'raw_trip', 'start_ts',
      'start_fmt_time', 'start_loc', 'duration', 'distance', 'start_place',
      'end_place', 'cleaned_trip', 'inferred_labels', 'inferred_trip',
      'expectation', 'confidence_threshold', 'expected_trip', 'user_input',
      'additions', 'inferred_section_summary', 'cleaned_section_summary',
      'start_local_dt_year', 'start_local_dt_month', 'start_local_dt_day',
      'start_local_dt_hour', 'start_local_dt_minute', 'start_local_dt_second',
      'start_local_dt_weekday', 'start_local_dt_timezone',
      'end_local_dt_year', 'end_local_dt_month', 'end_local_dt_day',
      'end_local_dt_hour', 'end_local_dt_minute', 'end_local_dt_second',
      'end_local_dt_weekday', 'end_local_dt_timezone', '_id', 'user_id',
      'metadata_write_ts'],
     dtype='object')

The columns present in expanded_ct for the second dataset but missing for the first are listed below:

-  'additions', 
- 'inferred_section_summary', 
- 'cleaned_section_summary' 
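The column difference between the two datasets can be computed directly; a sketch with toy frames that reproduce only a handful of the columns from the listings above:

```python
import pandas as pd

# Toy frames standing in for the two expanded_ct dataframes; only a
# few of the columns from the listings above are reproduced here.
fc_ct = pd.DataFrame(columns=["source", "distance", "user_input"])
cortez_ct = pd.DataFrame(columns=["source", "distance", "user_input",
                                  "additions", "inferred_section_summary",
                                  "cleaned_section_summary"])

missing = sorted(set(cortez_ct.columns) - set(fc_ct.columns))
print(missing)
# ['additions', 'cleaned_section_summary', 'inferred_section_summary']
```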
@iantei iantei moved this to Issues being worked on in OpenPATH Tasks Overview Apr 19, 2024

iantei commented Apr 20, 2024

Upon further investigation:

def load_all_confirmed_trips(tq):
    agg = esta.TimeSeries.get_aggregate_time_series()
    all_ct = agg.get_data_df("analysis/confirmed_trip", tq)
    print("Loaded all confirmed trips of length %s" % len(all_ct))
    print(f"Columns of all_ct: {all_ct.columns} \n")
    disp.display(all_ct.head())
    return all_ct

The all_ct data frame doesn't have the additions, inferred_section_summary and cleaned_section_summary columns.

We need to understand further why these columns, which come from analysis/confirmed_trip, are missing.
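A defensive check could flag this right after loading; a sketch (the helper name missing_required_columns is hypothetical, the column names are the ones from this thread):

```python
import pandas as pd

# Columns that analysis/confirmed_trip entries are expected to carry.
REQUIRED_COLS = ["additions", "inferred_section_summary",
                 "cleaned_section_summary"]

def missing_required_columns(df):
    """Return the expected columns that are absent from df."""
    return [c for c in REQUIRED_COLS if c not in df.columns]

# Toy stand-in for all_ct, which lacks all three columns.
all_ct = pd.DataFrame(columns=["source", "distance"])
print(missing_required_columns(all_ct))
# ['additions', 'inferred_section_summary', 'cleaned_section_summary']
```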


iantei commented Apr 22, 2024

Looking into the server-side code:

Inside emission/analysis/userinput/matcher.py

def create_confirmed_entry(ts, tce, confirmed_key, input_key_list):
    # Copy the entry and fill in the new values
    confirmed_object_data = copy.copy(tce["data"])
    # del confirmed_object_dict["_id"]
    # confirmed_object_dict["metadata"]["key"] = confirmed_key
    if (confirmed_key == esda.CONFIRMED_TRIP_KEY):
        confirmed_object_data["expected_trip"] = tce.get_id()
        logging.debug("creating confimed entry from %s" % tce)
        cleaned_trip = ts.get_entry_from_id(esda.CLEANED_TRIP_KEY,
            tce["data"]["cleaned_trip"])
        confirmed_object_data['inferred_section_summary'] = get_section_summary(ts, cleaned_trip, "analysis/inferred_section")
        confirmed_object_data['cleaned_section_summary'] = get_section_summary(ts, cleaned_trip, "analysis/cleaned_section")
    elif (confirmed_key == esda.CONFIRMED_PLACE_KEY):
        confirmed_object_data["cleaned_place"] = tce.get_id()
    confirmed_object_data["user_input"] = \
        get_user_input_dict(ts, tce, input_key_list)
    confirmed_object_data["additions"] = \
        esdt.get_additions_for_timeline_entry_object(ts, tce)
    return ecwe.Entry.create_entry(tce['user_id'], confirmed_key, confirmed_object_data)

These are exactly the fields that are missing: "additions", "cleaned_section_summary" and "inferred_section_summary".
@shankari Could we have access to the server log so we can understand why this is happening?

@iantei iantei moved this from Issues being worked on to Questions for Shankari in OpenPATH Tasks Overview Apr 22, 2024
@Abby-Wheelis

When we do look at the server logs, I think it would help to look first for the log statements from 'get_section_summary'

@Abby-Wheelis

Yesterday evening/this morning I had a problem with the sensed notebook on my survey additions branch see here. This isn't the same error, as it happened later when making the 80% chart, but we should keep an eye out for that case once this error is resolved and when testing the stacked bar chart changes.

@Abby-Wheelis

Just checked on open-access, and the behavior there is different from the error I was working with. In my case, the number of trips (sensed) chart was ok, but the entire notebook errored out on the number of trips under 80% (sensed) chart. If it were the same error, I would expect to see the first chart; instead, all of the sensed charts are nulled out and none of them are showing.

@iantei
Copy link
Contributor Author

iantei commented Apr 25, 2024

Tried to load the dataset into Mongo, using the below script:

bash viz_scripts/docker/load_mongodump.sh <mongodump_file>

for the April 24 snapshot of the open-access dataset. The dataset is considerably large, ~4.4 GB.

With resources maxed out at 16 GB of container memory and 10 cores of container CPU, the entire dataset could not be loaded, resulting in the case below:

[Screenshots: terminal output and Docker resource profile chart, taken 2024-04-25]

Corresponding to the resource usage shown on the right, the script exited early as it reached the container memory allocation threshold.

Next: trying with the container CPU allocation increased to 16 cores.

@shankari

Please see the workaround for loading less data for testing the public dashboard


iantei commented Apr 25, 2024

Error Stack:

Is there any cleaned_section_summary which has NaN values?: True
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 expanded_ct_sensed, file_suffix_sensed, quality_text_sensed, debug_df_sensed = scaffolding.load_viz_notebook_sensor_inference_data(year,
      2                                                                             month,
      3                                                                             program,
      4                                                                             include_test_users,
      5                                                                             sensed_algo_prefix)

File /usr/src/app/saved-notebooks/scaffolding.py:246, in load_viz_notebook_sensor_inference_data(year, month, program, include_test_users, sensed_algo_prefix)
    242 if len(expanded_ct) > 0:
    244     print(f"Is there any cleaned_section_summary which has NaN values?: {participant_ct_df['cleaned_section_summary'].isna().any()}")
--> 246     expanded_ct["primary_mode_non_other"] = participant_ct_df.cleaned_section_summary.apply(lambda md: max(md["distance"], key=md["distance"].get))
    247     expanded_ct.primary_mode_non_other.replace({"ON_FOOT": "WALKING"}, inplace=True)
    248     valid_sensed_modes = ["WALKING", "BICYCLING", "IN_VEHICLE", "AIR_OR_HSR", "UNKNOWN"]

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/series.py:4771, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4661 def apply(
   4662     self,
   4663     func: AggFuncType,
   (...)
   4666     **kwargs,
   4667 ) -> DataFrame | Series:
   4668     """
   4669     Invoke function on values of Series.
   4670 
   (...)
   4769     dtype: float64
   4770     """
-> 4771     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/apply.py:1123, in SeriesApply.apply(self)
   1120     return self.apply_str()
   1122 # self.f is Callable
-> 1123 return self.apply_standard()

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/apply.py:1174, in SeriesApply.apply_standard(self)
   1172     else:
   1173         values = obj.astype(object)._values
-> 1174         mapped = lib.map_infer(
   1175             values,
   1176             f,
   1177             convert=self.convert_dtype,
   1178         )
   1180 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1181     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1182     #  See also GH#25959 regarding EA support
   1183     return obj._constructor_expanddim(list(mapped), index=obj.index)

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/_libs/lib.pyx:2924, in pandas._libs.lib.map_infer()

File /usr/src/app/saved-notebooks/scaffolding.py:246, in load_viz_notebook_sensor_inference_data.<locals>.<lambda>(md)
    242 if len(expanded_ct) > 0:
    244     print(f"Is there any cleaned_section_summary which has NaN values?: {participant_ct_df['cleaned_section_summary'].isna().any()}")
--> 246     expanded_ct["primary_mode_non_other"] = participant_ct_df.cleaned_section_summary.apply(lambda md: max(md["distance"], key=md["distance"].get))
    247     expanded_ct.primary_mode_non_other.replace({"ON_FOOT": "WALKING"}, inplace=True)
    248     valid_sensed_modes = ["WALKING", "BICYCLING", "IN_VEHICLE", "AIR_OR_HSR", "UNKNOWN"]

TypeError: 'float' object is not subscriptable

Added the below line of code for debugging:

print(f"Is there any cleaned_section_summary which has NaN values?: {participant_ct_df['cleaned_section_summary'].isna().any()}")

Result: Is there any cleaned_section_summary which has NaN values?: True

This shows there are NaN values in cleaned_section_summary; operating on them leads to the error above.
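The root cause can be reproduced in isolation: pandas stores a missing value in an object column as NaN, which is a plain float, so the lambda's subscript fails. A sketch (the shape of the summary dict is inferred from the code above):

```python
# A normal cleaned_section_summary value looks roughly like
# {"distance": {"WALKING": 120.0, "IN_VEHICLE": 900.0}}; a missing
# value surfaces as float('nan') instead.
md = float("nan")

try:
    max(md["distance"], key=md["distance"].get)  # what the lambda does
except TypeError as e:
    print(e)  # 'float' object is not subscriptable
```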

This seems identical to the issue described here: Issue 93


iantei commented Apr 25, 2024

Proposal for a solution:

expanded_ct = participant_ct_df.copy()
expanded_ct = expanded_ct.dropna(subset=['cleaned_section_summary'])
  1. Create a copy of participant_ct_df so that the original df is not modified.
  2. Drop the rows from the data frame wherever cleaned_section_summary is NaN.
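A sketch of that proposal on hypothetical data, showing that the NaN row is dropped while the original dataframe is left untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical participant_ct_df: one valid summary, one NaN row.
participant_ct_df = pd.DataFrame({
    "cleaned_section_summary": [{"distance": {"WALKING": 120.0}}, np.nan],
    "end_fmt_time": ["2024-01-01T10:00:00", "2022-07-07T20:52:04"],
})

expanded_ct = participant_ct_df.copy()
expanded_ct = expanded_ct.dropna(subset=["cleaned_section_summary"])

print(len(expanded_ct))        # 1 -- NaN row dropped
print(len(participant_ct_df))  # 2 -- original unchanged
```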


shankari commented Apr 26, 2024

@iantei dropna will just paper over the real issue. The cleaned_section_summary should always exist.
You can:

  • see if there are patterns around missing section summaries - maybe the backwards compat code was not executed on this deployment
  • run the pipeline on the snapshot to see where it fails

@shankari shankari moved this from Questions for Shankari to Issues being worked on in OpenPATH Tasks Overview Apr 26, 2024

iantei commented May 4, 2024

There are 3878 records which have NaN for cleaned_section_summary.

Script to print, in sorted order, the end_fmt_time values of the rows where cleaned_section_summary is NaN:

    nan_rows = participant_ct_df[participant_ct_df['cleaned_section_summary'].isna()]
    print(len(nan_rows))
    end_fmt_times = []

    for index, row in nan_rows.iterrows():
        end_fmt_times.append(row['end_fmt_time'])
    end_fmt_times.sort()

    # Print the sorted list
    for timestamp in end_fmt_times:
        print(timestamp)
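The same filtering and sorting can be done without an explicit loop; a vectorized sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two NaN summaries, one valid one.
participant_ct_df = pd.DataFrame({
    "cleaned_section_summary": [np.nan, {"distance": {}}, np.nan],
    "end_fmt_time": ["2023-08-04T15:10:53", "2024-01-01T00:00:00",
                     "2022-07-07T20:52:04"],
})

nan_mask = participant_ct_df["cleaned_section_summary"].isna()
sorted_times = participant_ct_df.loc[nan_mask, "end_fmt_time"].sort_values()

print(len(sorted_times))  # 2
print(sorted_times.tolist())
# ['2022-07-07T20:52:04', '2023-08-04T15:10:53']
```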

There is an observable pattern:

2022-07-07T20:52:04.129278-07:00
2022-07-07T21:46:43.999819-07:00
...
2023-08-04T15:10:53.000056-04:00
2023-08-04T15:15:06.000034-04:00
2023-08-04T17:28:29.755166-04:00
2023-08-04T18:45:00.000004-04:00

All these entries have an end_fmt_time prior to the deliverable of #92, which was delivered on 11th September 2023.
This indicates a strong likelihood of the possibility you mentioned, that the backwards compat code was not executed on this deployment.
