Reading "wide" `t-route` flow velocity depth `csv`s has a high performance penalty
#204
ngen-cal/python/ngen_cal/src/ngen/cal/ngen_hooks/ngen_output.py, line 181 in 2823b2c
`ngen.cal` supports reading `t-route` output in a variety of formats (see #153). One supported format is `csv_output`. This format contains simulated flow, velocity, and depth values for each waterbody for each `t-route` timestep. For example:

`t-route` `csv_output` configuration

Crucially, this means the longer the simulation time, the wider each row will be.
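To make the growth in row width concrete, here is a minimal sketch using only the standard library. It assumes one id column plus three value columns (flow, velocity, depth) per timestep; the column names `feature_id`, `q_<t>`, `v_<t>`, and `d_<t>` are hypothetical and are not taken from `t-route` itself.

```python
import csv
import io


def wide_header(n_timesteps: int) -> list[str]:
    """Build a hypothetical wide header: one id column plus 3 columns per timestep."""
    header = ["feature_id"]  # assumed id column name
    for t in range(n_timesteps):
        header += [f"q_{t}", f"v_{t}", f"d_{t}"]  # assumed flow, velocity, depth names
    return header


def write_wide_csv(n_waterbodies: int, n_timesteps: int) -> str:
    """Return csv text with one row per waterbody and 3 value columns per timestep."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(wide_header(n_timesteps))
    for wb in range(n_waterbodies):
        writer.writerow([wb] + [0.0] * (3 * n_timesteps))
    return buf.getvalue()


# A 3 year simulation at a 5 minute timestep.
n_timesteps = 3 * 365 * 24 * 12
print(n_timesteps)                    # 315360
print(len(wide_header(n_timesteps)))  # 946081 columns in every row
```

Under these assumptions, the number of rows stays fixed at the number of waterbodies while the number of columns grows linearly with simulation length.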
`csv` parsers like the `pandas` `c` parser or `arrow`'s `csv` parser optimize for reading long `csv` files rather than wide `csv` files. Both of these parsers use a "chunking" approach where they allocate a buffer, read rows from the `csv` file into the buffer until it is full, and then process the data. However, when a row is sufficiently long, it cannot fit fully into the buffer. Because of this and other implementation-specific details, parsing and deserializing these wide `csv` files into a `pandas.DataFrame` can take on the order of minutes. In a local test I found that a `csv` file with 3 years of 5 minute timestep data (315360 timesteps) took roughly 3.5 minutes to deserialize into a `pandas` dataframe on an M2 Pro MacBook.

One potential solution to this is to disable `pd.read_csv`'s `low_memory` flag (that is, pass `low_memory=False`). In local testing it took ~9 seconds to read and deserialize the same file.
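A minimal sketch of the workaround, assuming a wide file with one id column followed by three value columns per timestep. The in-memory file and its column names are hypothetical stand-ins for a real `csv_output` file; only `pd.read_csv` and its `low_memory` and `index_col` parameters are real `pandas` API here.

```python
import io

import pandas as pd

# Tiny stand-in for a wide csv_output file (hypothetical column names).
n_timesteps = 1000
header = ["feature_id"] + [
    f"{var}_{t}" for t in range(n_timesteps) for var in ("q", "v", "d")
]
rows = [[wb] + [0.5] * (3 * n_timesteps) for wb in range(3)]
csv_text = "\n".join(",".join(map(str, r)) for r in [header] + rows)

# low_memory=False asks the C parser to process the file in one pass
# instead of in internal chunks.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, low_memory=False)
print(df.shape)  # (3, 3000)
```

For a real file, the `io.StringIO(...)` argument would be replaced by the path to the `csv_output` file. The timings quoted above come from the issue author's local test and are not reproduced by this small example.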
For now, my general recommendation is to use `t-route`'s `stream_output` instead of `csv_output` if possible. `stream_output` still supports `csv`, but it uses a long format rather than a wide format, so it does not suffer the same performance penalty. See the most up to date examples of this on the `t-route` repo or in #153.
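As a rough illustration of why a long layout is easier on the parser, here is a sketch that reads a small long-format `csv` and pivots it for analysis. The column names (`time`, `feature_id`, `flow`, `velocity`, `depth`) and the values are assumptions made for this example and may not match what `stream_output` actually writes; check the `t-route` repo for the real schema.

```python
import io

import pandas as pd

# Hypothetical long-format data: one row per (time, feature_id) pair.
long_csv = """time,feature_id,flow,velocity,depth
2020-01-01 00:00:00,101,1.0,0.1,0.5
2020-01-01 00:00:00,102,2.0,0.2,0.6
2020-01-01 00:05:00,101,1.5,0.15,0.55
2020-01-01 00:05:00,102,2.5,0.25,0.65
"""

# The column count is fixed at 5 no matter how long the simulation runs;
# only the number of rows grows.
df = pd.read_csv(io.StringIO(long_csv), parse_dates=["time"])
print(df.shape)  # (4, 5)

# If a per-waterbody time series is needed, pivot after reading.
flow = df.pivot(index="time", columns="feature_id", values="flow")
print(flow.shape)  # (2, 2)
```

Because each row is short and of fixed width, the parser's chunked buffering works as intended, and any reshaping happens in memory after the read.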