Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PASTIS dataset #315

Merged
merged 44 commits into from
Aug 3, 2023
Merged

PASTIS dataset #315

merged 44 commits into from
Aug 3, 2023

Conversation

isaaccorley
Copy link
Collaborator

@isaaccorley isaaccorley commented Dec 20, 2021

This PR adds the PASTIS dataset (original and PASTIS-R versions) for time-series semantic/instance segmentation of agricultural parcels of 18 crop type categories in Sentinel-1 and Sentinel-2 imagery.

image

@isaaccorley isaaccorley self-assigned this Dec 20, 2021
@isaaccorley isaaccorley added the datasets Geospatial or benchmark datasets label Dec 20, 2021
@isaaccorley isaaccorley marked this pull request as draft December 20, 2021 19:34
@isaaccorley isaaccorley added this to the 0.2.0 milestone Dec 20, 2021
@adamjstewart adamjstewart modified the milestones: 0.2.0, 0.3.0 Jan 1, 2022
@github-actions github-actions bot added the datamodules PyTorch Lightning datamodules label Jul 6, 2022
@adamjstewart adamjstewart modified the milestones: 0.3.0, 0.4.0 Jul 9, 2022
@adamjstewart adamjstewart removed this from the 0.4.0 milestone Jan 24, 2023
@calebrob6 calebrob6 mentioned this pull request Jul 27, 2023
@calebrob6
Copy link
Member

calebrob6 commented Jul 31, 2023

Why not return both types of labels in the sample dict and then the trainer/task/model/thingy can be used to decide which to use?

@isaaccorley do you have a reason here? My thought was that the __getitem__ for instance seg probably takes longer (so if you are only doing semantic segmentation you get a bit of a speed boost), but I haven't benchmarked it

@isaaccorley
Copy link
Collaborator Author

What would we name the instance mask then? Efficiency reasons mostly. Instead of always loading Tx1xHxW mask you are also loading a TxNxHxW mask, where N is number of instances in the sample. I'm not married to the separation though so feel free to change it however.

@adamjstewart
Copy link
Collaborator

Key names should be decided on based on what Kornia uses. If I remember correctly, they don't yet have great support for instance segmentation?

As for speed, I would love to see a benchmark to see if it actually matters on an average system. If it does make a difference, then I would rather have a parameter to control which is used instead of two classes.

@calebrob6
Copy link
Member

And if it doesn't make a difference you want two classes?

@adamjstewart
Copy link
Collaborator

If it makes a difference I want one class controlled by parameter. If it doesn't make a difference I want one class that loads both. So either way, a single class.

@adamjstewart
Copy link
Collaborator

The only case where we have different datasets for different tasks is COWC, but it's actually a separate download and is referred to as COWC Counting or COWC Detection in the literature. In this case, there's a single PASTIS dataset, and it can be used for multiple tasks. That's why I think it makes more sense to have 1 dataset.

@calebrob6
Copy link
Member

calebrob6 commented Jul 31, 2023

Looks like instance_segmentation is roughly 3x slower

image

@adamjstewart
Copy link
Collaborator

It's obviously going to be slower, but is it slower than the GPU?

@calebrob6
Copy link
Member

GPUs change, I think it is better to have them separated!

@calebrob6
Copy link
Member

Okay I like this much better actually

Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add example plots to the PR description? I think the current image is just from the paper.

torchgeo/datasets/pastis.py Outdated Show resolved Hide resolved
Returns:
the instance segmentation mask, box, and label for each instance
"""
mask_array = np.load(self.files[index]["semantic"])[0].astype(np.uint8)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we could somehow call _load_semantic_targets to avoid code duplication here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe but let's stop here :)

torchgeo/datasets/pastis.py Outdated Show resolved Hide resolved
@calebrob6
Copy link
Member

image

@calebrob6
Copy link
Member

codecov shouldn't be failing, not sure what's going on

@adamjstewart
Copy link
Collaborator

codecov shouldn't be failing, not sure what's going on

This is due to codecov/codecov-action#903 and codecov/feedback#126. Solution is to re-run the minimum tests until codecov does report correctly.

@adamjstewart
Copy link
Collaborator

Is this the plot for semantic seg? Can you share the plot for instance seg too?

@calebrob6
Copy link
Member

Instance segmentation plot output. There is white instead of black here because in semantic segmentation we want to explicitly classify the background pixels (black), but in instance segmentation anything that is not labeled is the void class (white).

image

Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still curious whether loading all labels (as opposed to the user choosing between semantic/instance) slows down I/O times enough to actually impact training time (vs. the time spent on an "average" GPU). But I think this PR is probably good enough. Happy to nitpick for eternity, but don't let my perfectionism get in the way of your research 😄

Comment on lines +33 to +34
* 3 Sentinel-1 Ascending bands
* 3 Sentinel-1 Descending bands
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make ascending/descending lowercase

dataset folds to include
bands: load Sentinel-1 ascending path data (s1a), Sentinel-1 descending path
data (s1d), or Sentinel-2 data (s2)
mode: load semantic (semantic) or instance (instance) annotations
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would "task" be a better parameter name? Also might change this to:

Suggested change
mode: load semantic (semantic) or instance (instance) annotations
mode: load "semantic" or "instance" segmentation annotations

fig, axs = plt.subplots(1, num_panels, figsize=(num_panels * 4, 4))
axs[0].imshow(images[0] / 5000)
axs[1].imshow(images[1] / 5000)
axs[2].imshow(mask, vmin=0, vmax=19, cmap=self._cmap, interpolation="none")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we should use a different cmap for instance segmentation so that each field is a different color

@calebrob6 calebrob6 merged commit 711a576 into microsoft:main Aug 3, 2023
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets documentation Improvements or additions to documentation testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants