Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add file type: .topostats #53

Merged
merged 13 commits into from
Jun 7, 2024

Conversation

SylviaWhittle
Copy link
Collaborator

@SylviaWhittle SylviaWhittle commented May 14, 2024

Closes #51

This PR migrates support for the .topostats file format from TopoStats as part of our initiative to migrate all of TopoStats' io.py over. (#22 )

A function for reading hdf5 files and dumping the data into a dictionary is the main logic behind loading these files. Several tests have been added for it.

@SylviaWhittle SylviaWhittle marked this pull request as ready for review May 29, 2024 09:19
@MaxGamill-Sheffield
Copy link
Collaborator

On testing and reviewing this, I saw that there was no space for the .topostats files in the example_0.ipyb so I have added that.

I have also tried with a few different minicircles.topostats files at different stages of TopoStats output:

  1. Filters onwards = False -> This produces the image file as the raw data
Screenshot 2024-05-29 at 11 03 23
  1. Filters = True -> This produces a flattened image instead - Is this by design? If so I think this should be documented.
Screenshot 2024-05-29 at 11 05 15

However the original image is still accesable via the metadata
Screenshot 2024-05-29 at 11 09 21

Copy link
Collaborator

@ns-rse ns-rse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comprehensive work on this @SylviaWhittle great to see all the tests and in-line examples as well as the notebook.

Some minor comments, not all for addressing in this Pull Request, have tried to make suggestions where things could be changed though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big issue and certainly not something to do here but we could move the comments in the cells out to Markdown cells.

data = unpack_hdf5(open_hdf5_file=f, group_path="/")

file_version = data["topostats_file_version"]
logger.info(f"TopoStats file version: {file_version}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps include filename as is done in the logger.error() below so people know what file is which version?

Suggested change
logger.info(f"TopoStats file version: {file_version}")
logger.info(f"[{filename}] TopoStats file version : {file_version}")

[
pytest.param(
"sample_0.topostats",
0.1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR but as we move forward with developing and standardising the .topostats file format and defining versions we should be looking to document precisely what each file version corresponds to with a detailed technical specification. A useful article on this is A practical guide to writing technical specs - Stack Overflow

("file_name", "topostats_file_version", "image_shape", "pixel_to_nm_scaling", "data_keys", "image_sum"),
[
pytest.param(
"sample_0.topostats",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps pedantic but to make it easier to read/understand what's what in the tests/resources/ directory we could use a nomenclature that reflects the version.

Suggested change
"sample_0.topostats",
"sample_0_1.topostats",

...would indicate the version 0.1 (would also require renaming sample_0.topostats > sample_0_1.topostats).

@ns-rse
Copy link
Collaborator

ns-rse commented May 29, 2024

Thanks for testing this @MaxGamill-Sheffield

On testing and reviewing this, I saw that there was no space for the .topostats files in the example_0.ipyb so I have added that.

You commit 0fdb9cc introduced a couple of unintended changes to the notebook, the cell execution_count was now non-zero. I think this is because you haven't enabled pre-commit on this repository. In this instance its not a problem as pre-commit.ci is able to run the nbstripout hook and commit the changes to this PR directly but if you ever do additional work on this repository it would be worth enabling pre-commit locally so that all the hooks run prior to commits.

pre-commit install

...should do the trick.

@SylviaWhittle
Copy link
Collaborator Author

I think I've addressed the comments? Let me know if I've forgotten something.

I think the documentation on the format could be done in another PR? Or should I add it to this one?

@ns-rse
Copy link
Collaborator

ns-rse commented Jun 7, 2024

I think the documentation on the format could be done in another PR? Or should I add it to this one?

Definitely a separate issue and one that would benefit from wider discussion as to what people actually want to have in .topostats files. Garnering input from the wider community (beyond the Pyne lab) would be really useful in planing the formats specification.

@SylviaWhittle SylviaWhittle merged commit 73b5d1d into main Jun 7, 2024
13 checks passed
@SylviaWhittle SylviaWhittle deleted the SylviaWhittle/51-topostats-file-format branch June 7, 2024 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature] : Add topostats file format support
3 participants