Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with particle spanning multiple data blocks #4

Open
huangynj opened this issue Jul 2, 2024 · 9 comments
Open

Issue with particle spanning multiple data blocks #4

huangynj opened this issue Jul 2, 2024 · 9 comments

Comments

@huangynj
Copy link

huangynj commented Jul 2, 2024

Hello,

I have encountered an issue with spifpy while working with SPEC 2DS data. Specifically, when a particle spans two data blocks, spifpy reads only the partial particle in the first data block and ignores the remaining part in the second data block.

To address this issue, I propose that we pass two continuous data blocks to each processor. This way, we can process the data in the first data block and include the very beginning of the second data block if a particle spans the two blocks. I am still familiarizing myself with the spifpy structure, so a developer who is more experienced with spifpy would be better suited to implement this solution.

Thanks,

Yongjie

@kennybala97
Copy link
Collaborator

Hi Yongjie,

Thank you for your detailed bug report and solution proposal. This issue was actually solved previously, but was in another branch(overload_hk branch). I have merged the changes from overload_hk into main. When you have time, please try out the new code from the main branch, and let us know if the fix worked fully. Once this is confirmed, I will mark the issue closed.

Thank you for your help and engagement!

@huangynj
Copy link
Author

huangynj commented Aug 2, 2024

Hi Kenny @kennybala97,

I have tested the latest version, and it only partially addresses the issue. There are still problems when particle information spans two data blocks.

In the following examples, the particle will be missed if the particle information spans two data blocks.

Particle ID: 335

data block 7:
    ...
    (12883, 0, 34)

data block 8:
    (335, 32, 16512, 16768, 16896, 17152, 17280, 17408, 17792, 17920, 18176, 18304, 18432, 18688, 19072, 19200, 19328, 19328, 19328, 19456, 19456, 19584, 19584, 19584, 19584, 19456, 19328, 18947, 18947, 18565, 18183, 18056, 17801, 16907, 193, 46142)
    ...

Particle ID: 624

data block 13:
    ...
    (12883, 0, 21, 624)

data block 14:
    (19, 16512, 16640, 16768, 16768, 16768, 16768, 16768, 16768, 16768, 16768, 16768, 17024, 17024, 17024, 16896, 16896, 16768, 16640, 16512, 357, 25079)
    ...

Particle ID: 1296

data block 28:
    ...
    (12883, 0, 78)

data block 29:
    (1296, 76, 16639, 16639, 16639, 16639, 16766, 16893, 16893, 16893, 17020, 17020, 17147, 17147, 17274, 17274, 17274, 17401, 17401, 17528, 17528, 17655, 17655, 17782, 17782, 17782, 17909, 17909, 18036, 18036, 18036, 18163, 18163, 18290, 18417, 18417, 18417, 18544, 18544, 18671, 18671, 18671, 18798, 18798, 18798, 18925, 18925, 19052, 19179, 19179, 19306, 19306, 19433, 19433, 19433, 19560, 19560, 19560, 19560, 19433, 19433, 19306, 19306, 19306, 19306, 19179, 19179, 19052, 19052, 18925, 18671, 18671, 18036, 17528, 17401, 17274, 17020, 16766, 721, 31891)
    ...

Particle ID: 2445

data block 54:
    ...
    (12883, 0, 43)

data block 55:
    (2445, 31, 16928, 17311, 17566, 17694, 17822, 17569, 17570, 17571, 17699, 16792, 1416, 17046, 1799, 17046, 2055, 17174, 2309, 17302, 2435, 20246, 20246, 19094, 898, 18838, 773, 18583, 647, 18329, 519, 18075, 391, 17821, 17694, 17567, 17696, 17825, 17825, 17698, 17443, 17316, 16551, 1434, 54387)
    ...



The following examples shows the package obtaining the wrong slice data.
Particle ID: 3251 (slices: truth is 23, the package obtains 25)

data block 72:
    ...
    (12883, 0, 32, 3251, 23, 16803, 17058, 17186, 17188, 17061, 17189, 17445, 16540, 1160, 16668, 1414, 16795, 1669, 16923, 1796, 19355, 18204, 516, 18076, 390, 17822, 262, 17567, 17440, 17313, 17187, 17315, 17316, 17062, 16679)

data block 73:
    (1907, 20699)
    ...

Particle ID: 4771 (slices: truth is 9, the package obtains 11)

data block 105:
    ...
    (12883, 0, 16, 4771, 9, 16691, 16818, 16818, 260, 16818, 387, 16945, 514, 16945, 514, 16562, 388, 16823, 16568, 2824)

data block 106:
    (7550,)
    ...

Particle ID: 7203 (slices: truth is 3, the package obtains 6)

data block 157:
    ...
    (12883, 0, 5, 7203, 3, 16639, 16639, 16639)

data block 158:
    (4306, 13527)
    ...

I haven't yet checked how the overload_hk branch handles situations where particle information spans two data blocks. However, I believe that my previously proposed method could solve the above issues.

Thanks,

Yongjie

@kennybala97
Copy link
Collaborator

@huangynj Thank you for the extra detail, this is really helpful. I think your fix is a solid idea and it does also seem to be the simplest way to solve this problem. I will do some more testing on my end and figure out the best way to proceed. In the meantime, what dataset are you using to test SPIF out?

@huangynj
Copy link
Author

huangynj commented Aug 2, 2024

Hi @kennybala97,

I am using the synthetic data generated by Aaron Bansemer (NCAR) for the cloud probe workshop 2024. He will publish the data in future, so it may be not suitable to put the link on this open platform yet. I will send you the link through email.

Thanks,

Yongjie

@kennybala97
Copy link
Collaborator

Hi Yongjie,

That makes sense. Thank you so much for your help!

@kennybala97
Copy link
Collaborator

@huangynj I've gone through your code and it seems to me like your changes boil down to the following:

  1. You create a new variable called record_pad which has two possible outputs. If reach_record_end is true, you will take self.data[frame] and self.data[frame+1] and concatenate them. Else, record_pad is just self.data[frame]. This is also conditional on Next_record_flag being true, which is only true if there's an additional frame after the current frame.
  2. You introduce a new field to img_dict which is record_start_idx. This is set as record_start_idx = i - len(record) which nominally should be zero assuming no particles spanning boundaries, but if particles do span boundaries, i would be an offset. For example if we have a 2048 byte first record and 2048 second record and the particle spanning boundaries is 50 bytes into the second buffer, then i would equal 2098 and thus i would equal 50. Then in the next frame this offset is used.
  3. When you initialize frame processing, record_start_idx is initially zero, and then with each frame, img_dict gets reset to whatever the last image in the previous frame was. This gets passed to each subsequent frame thus providing the process_frame function starting offsets for the i variable.

Is this a correct characterization of what the code changes do?

@huangynj
Copy link
Author

huangynj commented Aug 7, 2024

Hi @kennybala97,

Yes, I think your understanding is correct.

I also tried to simplify the method not using record_start_idx and loop each frame from index = 0. However, in some cases, some slice values would be equal to some special values such as 20044 ('NL'), and it would result in wrong results. So, finally I created this new variable to record the start index of the next frame.

Yongjie

@kennybala97
Copy link
Collaborator

@huangynj thank you for confirming! And I think you did the right thing using the record_start_idx, I don't see any other straightforward way of doing it.

@kennybala97
Copy link
Collaborator

Hi @huangynj , I have added all your changes to the interbuffer_particle_fix branch of the code. From now on, submit pull requests to that branch and I will approve them and merge them without any issue. Sorry for all the random workarounds! This should be a final solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants