Improve parsability of output of ResultWriter #368

Open
wants to merge 2 commits into
base: master

Conversation

HomesGH
Contributor

@HomesGH HomesGH commented Dec 20, 2024

Description

Currently, the header row in the output of the ResultWriter starts with a #, which technically makes it a comment. This is a problem when trying to parse the whole file. Parsing is much easier with simstep instead of # step as the first column name. The file can then be parsed in Python with just:
df = pd.read_csv(path2File, delim_whitespace=True, comment="#", engine="python")
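As a minimal sketch of how the proposed header would be read, the snippet below parses an in-memory sample. The sample layout (two `#` comment lines, a `simstep` header row, a trailing `#` footer) is an assumption for illustration and not actual ResultWriter output; the column names and values are taken from the Argon example quoted further down in this conversation. It uses `sep=r"\s+"` in place of `delim_whitespace=True`, which is deprecated as of pandas 2.2.

```python
import io

import pandas as pd

# Hypothetical sample in the proposed format: comment lines start with '#',
# while the header row starts with 'simstep' and is therefore not a comment.
sample = """\
# comment line (assumed)
# another comment line (assumed)
simstep time U_pot U_pot_avg p p_avg beta_trans beta_rot c_v N
0 0.000000 -2.09893 -2.09893 -2.66901e-07 -2.66901e-07 3.62578 1.0 0.0 2048
5 0.333758 -2.10937 -2.10316 6.64245e-07 2.62181e-07 1.00133 1.0 0.0 2048
# footer comment (assumed)
"""

# Lines starting with '#' are skipped; the first remaining line is the header.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", comment="#")

print(df.columns.tolist())
print(df[["simstep", "U_pot", "U_pot_avg"]])
```

With this layout no header names have to be passed by hand, and the column names line up with the data.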

@HomesGH HomesGH added the enhancement New feature or request label Dec 20, 2024
@HomesGH HomesGH requested a review from cniethammer December 20, 2024 19:01
@cniethammer
Contributor

I use the following python pandas.read_csv command for result files:
data = pd.read_csv(inputfile, header=2, skipfooter=1)

Would this work for you, too?

I am hesitant to change this long-standing output format ...

@cniethammer cniethammer mentioned this pull request Dec 20, 2024
2 tasks
@cniethammer
Contributor

Instead of modifying the output of the current ResultWriter, what you are asking for is a proper CSV file, IMHO. Feel free to have a look at #369.

@HomesGH
Contributor Author

HomesGH commented Dec 22, 2024

I use the following python pandas.read_csv command for result files: data = pd.read_csv(inputfile, header=2, skipfooter=1)

Would this work for you, too?

I am hesitant to change this long-standing output format ...

I am not sure how this could work for you. I tried it with the Argon example, and you have to specify at least delim_whitespace=True; otherwise you end up with just one column in your dataframe. But with this option set, the resulting dataframe is not only wrong, the error is also easy to overlook, because the column names are shifted by one relative to the data. E.g.:

    #      step     time    U_pot     U_pot_avg             p     p_avg  beta_trans  beta_rot   c_v   N
0   0  0.000000 -2.09893 -2.09893 -2.669010e-07 -2.669010e-07  3.625780         1.0       0.0  2048 NaN
1   5  0.333758 -2.10937 -2.10316  6.642450e-07  2.621810e-07  1.001330         1.0       0.0  2048 NaN

If you don't have a close look at the data and just use e.g. data["U_pot"] you actually get the data of U_pot_avg.
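The shift can be reproduced without pandas by counting whitespace-separated tokens. This is a minimal sketch: the header and the second data row are reconstructed from the dataframe above, and the exact spacing in the real result file is an assumption. The standalone `#` counts as a column name of its own, so the header has one more token than the data row.

```python
# Header and data row reconstructed from the dataframe shown above;
# the exact spacing in the real result file is an assumption.
header = "# step time U_pot U_pot_avg p p_avg beta_trans beta_rot c_v N"
row = "5 0.333758 -2.10937 -2.10316 6.64245e-07 2.62181e-07 1.00133 1.0 0.0 2048"

names = header.split()   # the lone '#' becomes a column name of its own
values = row.split()

print(len(names), len(values))

# Naive pairing, which mirrors what a whitespace-delimited parse ends up with:
naive = dict(zip(names, values))
print(naive["U_pot"])    # the value that actually belongs to U_pot_avg

# Dropping the '#' token restores the intended pairing.
fixed = dict(zip(names[1:], values))
print(fixed["U_pot"])
```

The second row is used here because in the first row U_pot and U_pot_avg happen to be equal, which hides the mismatch.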

The only way to parse the present result file correctly is by explicitly specifying the column header names.
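A sketch of that workaround, assuming the header row and all other non-data lines start with `#` (the sample text is made up for illustration; only the column names and values come from the Argon output above). The header is skipped as a comment and the names are passed in by hand:

```python
import io

import pandas as pd

# Hypothetical sample of the current format: the header row starts with '#'
# and is therefore dropped together with the other comment lines.
sample = """\
# comment line (assumed)
# step time U_pot U_pot_avg p p_avg beta_trans beta_rot c_v N
0 0.000000 -2.09893 -2.09893 -2.66901e-07 -2.66901e-07 3.62578 1.0 0.0 2048
5 0.333758 -2.10937 -2.10316 6.64245e-07 2.62181e-07 1.00133 1.0 0.0 2048
# footer comment (assumed)
"""

# Column names as they appear in the header row, without the leading '#'.
columns = ["step", "time", "U_pot", "U_pot_avg", "p", "p_avg",
           "beta_trans", "beta_rot", "c_v", "N"]

df = pd.read_csv(io.StringIO(sample), sep=r"\s+", comment="#",
                 header=None, names=columns)

print(df[["step", "U_pot", "U_pot_avg"]])
```

The drawback is that the list of names has to be kept in sync with the writer by hand, which is what a parsable header row would avoid.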

@HomesGH
Contributor Author

HomesGH commented Dec 22, 2024

Instead of modifying the output of the current ResultWriter, what you are asking for is a proper CSV file, IMHO. Feel free to have a look at #369.

This approach is IMHO the reason why there are so many plugins in ls1 right now. Instead of improving/extending an existing one, it is more convenient to just write a new plugin with a very similar functionality.
The differences between the ResultWriter and your new plugin are just:

  • Documentation of the column data (nice!)
  • Comma instead of whitespace as delimiter (makes no real difference either way)
  • simstep instead of # step (see this PR)
