Parcels runs slowly with large velocity fields/weird lack of scaling with different time-steps #1404
Replies: 6 comments
-
Tested again, this time with the inbuilt RK4 and mixing kernels, releasing all particles at once: similar result, so it's unlikely to be a kernel issue. I also tried pointing the lat, lon and depth grid data at separate files instead of reading them from the first u, v, wt and T files, in case attempting to read many 35GB files was the issue; again, no difference. Runs with monthly data do scale with dt, so it's not something wrong with the interpolation and advection routines. I still need to try splitting the daily data from the 30-day files into 1-day files, but that might not be practical for the full-scale run given the available storage space, or the inevitable overhead if I process files "just in time". Otherwise it might be because the netcdf files use compression (deflate level 5 pops out when I bash the files with ncdump -sh), or maybe I need to match the chunking provided to the fieldset with the chunking used by the netcdf files? @erikvansebille Any advice on working with large datasets? Any tricks to chunking?
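For reference, the same chunking and compression information that `ncdump -sh` prints can also be read from Python with netCDF4, which makes it easy to compare against the fieldset chunking. A minimal sketch, assuming hypothetical file and variable names (not the actual ACCESS-OM2 ones):

```python
# Sketch only: inspect the on-disk chunking and compression of a netCDF variable.
# 'ocean_daily_3d_u.nc' and 'u' are illustrative placeholders.
import netCDF4

with netCDF4.Dataset('ocean_daily_3d_u.nc') as ds:
    var = ds.variables['u']
    print(var.chunking())   # list of chunk lengths per dimension, or 'contiguous'
    print(var.filters())    # shows the deflate (zlib) level, shuffle filter, etc.
```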
-
Thanks for reporting this, @croachutas. I don't have too much time to delve into this right now (busy working on #1402), but could it be that you are simply running with too few particles for the particle-dependent processes (e.g. advection) to make a difference, so that your job runtime is completely dominated by reading the hydrodynamic input data? If you up the particle count by a factor of 100 (to 5 million particles), do you then still see the same scaling as in the graph at the top? By the way, have you looked at @CKehl's paper on the efficiency and scaling of Parcels? See here for the article Efficiently simulating Lagrangian particles in large-scale ocean flows — Data structures and their impact on geophysical applications. That might provide some background/guidance too.
-
I would certainly play with the chunking of the fieldset to match that of the netCDF file (or be an even subdivision or multiple of it -- for example, if the netCDF is chunked 1,512,512, try having Parcels use 1,256,256 or 2,1024,1024). I ended up rechunking my netCDF files for optimal performance. This is straightforward, and I can send you a script to do so if you wish.
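Not the script offered above, but a minimal sketch of one way to rechunk (and lighten the compression of) a netCDF variable with xarray; the file name, variable name, chunk sizes and deflate level are illustrative assumptions:

```python
# Sketch only: rewrite a netCDF file with new on-disk chunking via the netCDF4 backend.
import xarray as xr

ds = xr.open_dataset('ocean_daily_3d_u.nc')            # assumed input filename
encoding = {'u': {'chunksizes': (1, 7, 512, 512),      # (time, depth, lat, lon) chunks
                  'zlib': True, 'complevel': 1}}       # lighter deflate than level 5
ds.to_netcdf('ocean_daily_3d_u_rechunked.nc', encoding=encoding)
```

The `nccopy` utility (with its chunking and deflate options) can do much the same job from the command line.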
-
Hi all, I've run further tests with a larger number of particles, and yep, the scaling for the daily fields is correct once input I/O is no longer the dominant constraint. I've also played around with chunking and found that chunking in line with the input netcdf files is more efficient than my initial arbitrary chunking (roughly half the runtime). I'll continue playing with the exact chunking dimensions to see if I can get things more efficient.
-
Hey @erikvansebille, I see the paper mentions autochunking, with an apparently fairly significant impact on runtime. How do you turn autochunking on? Just wondering if it might improve performance. Also, I've noticed that runtime per day increases later in the runs (understandably, as the particles are more dispersed and so more chunks need to be read in at a time). Are there any recommendations on how to handle this? For instance, pause, change the fieldset chunking, then resume?
-
Yes, you can also control the chunking yourself, by passing a chunksize specification when you build the fieldset (a sketch is below); examples for various hydrodynamic datasets are in docs/examples/example_dask_chunk_OCMs.py. I wouldn't change the chunking halfway. If you also use MPI you could rebalance the particle sets halfway (creating a new ParticleSet from the old locations?) to make sure the particles are somewhat better aligned with the chunks; but this is fiddly and I'm not sure how much it will improve performance. See also https://docs.oceanparcels.org/en/latest/examples/documentation_MPI.html#Future-developments:-load-balancing for a short discussion on this.
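A minimal sketch of what such a chunksize specification can look like, loosely following the pattern in docs/examples/example_dask_chunk_OCMs.py; the file names, variable names, dimension names and chunk lengths below are placeholders for MOM5/ACCESS-OM2-style output, not taken from this discussion:

```python
# Sketch only: all names and chunk lengths are illustrative placeholders.
from parcels import FieldSet

filenames = {'U': 'ocean_daily_3d_u.nc', 'V': 'ocean_daily_3d_v.nc'}
variables = {'U': 'u', 'V': 'v'}
dimensions = {'time': 'time', 'depth': 'st_ocean', 'lat': 'yu_ocean', 'lon': 'xu_ocean'}

# Map each Parcels dimension to (netCDF dimension name, dask chunk length); ideally
# these lengths match, or evenly divide, the chunking of the files on disk.
chunksize = {'time': ('time', 1),
             'depth': ('st_ocean', 7),
             'lat': ('yu_ocean', 512),
             'lon': ('xu_ocean', 512)}

fieldset = FieldSet.from_netcdf(filenames, variables, dimensions,
                                chunksize=chunksize)  # chunksize='auto' enables dask auto-chunking
```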
-
G'day,
It's been a few years since I've used Parcels (relocating from LOCEAN to MetOcean Solutions saw me shift to using OpenDrift), but I've moved on from MetOcean (back) to UTAS and I'm back using Parcels now (due mostly to OpenDrift's pre-built reader routines not handling non-CF-compliant netcdf files, and the utter pain of either developing a custom reader or reprocessing a large number of files into CF-compliant netcdfs...).
I'm using daily model output from the ACCESS-OM2-01 0.1-degree global model (which uses MOM5 code for the ocean component) as part of a study of trans-Tasman larval connectivity of a lobster species. I've got it set up with 189 spawning sites spitting out 10 particles per day for a month (giving a total of about 50,000 particles) and then tracking the particles for quite some time (about a year in the "final" version, but just 30 days for an initial test case). I've cobbled together a custom kernel to try to work around the known repeatdt zarr chunking issue #1387, which I had initially thought was why things were running slowly: spawn all particles on day 0 but only allow each particle to move once we're at the right day, plus some crude debeaching behaviour (if the next step takes the particle onto land, we instead kick it backwards a quarter of a step). Needless to say, the velocity fields (u, v, w) I'm using are stored in some rather large files (~35GB for each of u, v and w per 30 days of data, plus ~15GB for the temperature data, so about 1.6TB for a one-year run, of course not all loaded at once).
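A rough, hypothetical sketch of that delayed-release gate (not the actual kernel attached below): the `release_time` variable name is made up, and plain Euler stepping stands in for RK4 to keep it short.

```python
# Sketch only: "spawn everything on day 0, but only move after the release day".
import numpy as np
from parcels import JITParticle, Variable

class SpawningParticle(JITParticle):
    # Assumed custom variable: time (seconds, on the fieldset's time axis) at which
    # this particle is allowed to start moving.
    release_time = Variable('release_time', dtype=np.float32, initial=0.)

def DelayedAdvection(particle, fieldset, time):
    # Particles exist from day 0 but stay put until their release time is reached.
    if time >= particle.release_time:
        (u, v) = fieldset.UV[time, particle.depth, particle.lat, particle.lon]
        particle.lon += u * particle.dt
        particle.lat += v * particle.dt
        # A debeaching check (e.g. kick the particle a quarter-step backwards if the
        # new position lands on land) would slot in here.
```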
But I'm still finding that things are running rather slowly. In itself that's an annoyance rather than a serious problem: I can accept a longer runtime, or I can use longer timesteps (not ideal, but given year-long tracking you've gotta accept some compromises). But I noticed that when I change the timestep dt things aren't behaving as you'd expect:
You'd expect that in an idealised case (no I/O cost) doubling dt would roughly halve the runtime, but nope! Of course, realistically there are I/O costs, but even then doubling the length of the timestep should reduce the total runtime.
I've found this when running both on NCI's Gadi HPC and on my personal laptop (the source for the plot above). I don't think it's a memory issue (if it were, I'd expect Gadi to kill the process when it exceeded the 8GB I've requested for tests, rather than just running slowly, and I know from looking at memory usage on my laptop that the test case only uses about 1-2GB of RAM).
I've tried running my code without writing data to an output file and that makes little difference, so it doesn't look like writing output is the bottleneck. That leaves either something to do with reading the input files or my custom kernel.
I've set up indexing and chunk size to try to keep the amount of data loaded manageable, but that doesn't seem to have helped:
indices = {'lon': range(500, 1100), 'lat': range(500, 1100)}
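For context, a minimal sketch of how an indices dict like the one above is typically handed to the fieldset constructor alongside a chunksize; everything except the indices line is a placeholder, not taken from the attached script:

```python
# Sketch only: filenames, variables, dimensions and chunk lengths are placeholders.
from parcels import FieldSet

filenames = {'U': 'ocean_daily_3d_u.nc', 'V': 'ocean_daily_3d_v.nc'}
variables = {'U': 'u', 'V': 'v'}
dimensions = {'time': 'time', 'depth': 'st_ocean', 'lat': 'yu_ocean', 'lon': 'xu_ocean'}
indices = {'lon': range(500, 1100), 'lat': range(500, 1100)}  # same subdomain as above

fieldset = FieldSet.from_netcdf(filenames, variables, dimensions,
                                indices=indices,
                                chunksize={'time': ('time', 1),
                                           'lat': ('yu_ocean', 600),
                                           'lon': ('xu_ocean', 600)})
```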
It could be something with the custom kernel, but tests with monthly rather than daily fields seemed to run much quicker, suggesting that probably isn't it.
Have any of you experienced similar issues when running experiments with large input data files? Is there anything obviously wrong with the kernel?
Full code provided in zip file:
ACCESS_01_tracking_Au_release_delayed_particle_start_benchmarking_no_output.py.zip