Show folder structure in the DataPackageView using atLocation #1452

Closed
laurenwalker opened this issue Jul 15, 2020 · 9 comments · Fixed by #2209
Labels: ADC CI-14 Data search and display improvements (ADC deliverable), arctic data center, editor, enhancement

Comments

@laurenwalker
Member

No description provided.

@jeanetteclark
Collaborator

I'm giving this issue a bump. I have a moderately sized dataset (8 GB), so the Download All button is grayed out. Without it, there is no way for the average user to access the folder hierarchy this dataset needs, even though all of that information is contained in the resource map as atLocation relationships.

https://arcticdata.io/catalog/view/urn%3Auuid%3A028b8008-8b68-491a-8a13-79b642e1e215

@mbjones
Member

mbjones commented Oct 14, 2021

We have two existing mockups in the nceas-design repo for this as part of our original Editor refactor. They still look to be pretty good starting points to me.

Edit-Metadata-View-Add-Nested-Folder.pdf
Add-a-Folder.pdf

@amoeba
Contributor

amoeba commented Oct 14, 2021

Showing a navigable folder structure on our landing pages would be a major improvement, but I might point out that we could consider raising or removing the limit @jeanetteclark mentions running into in her comment above. MetacatUI has a configurable property that controls whether the Download All button is enabled, and we currently have it set to 3 GB. I think we could set it higher.
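The size gate described above can be sketched roughly as follows. The names `download_all_enabled` and `max_download_bytes` are hypothetical and are not MetacatUI's actual property names; the 3 GB and 8 GB figures are the ones quoted in this thread, and treating a gigabyte as 1000^3 bytes is an assumption.

```python
from typing import Optional

GB = 1000 ** 3  # assumption: decimal gigabytes


def download_all_enabled(package_bytes: int, max_download_bytes: Optional[int]) -> bool:
    """Return True if the Download All button should be enabled.

    A limit of None models removing the cap entirely.
    """
    if max_download_bytes is None:
        return True
    return package_bytes <= max_download_bytes


# The 8 GB package from this thread against three possible settings
print(download_all_enabled(8 * GB, 3 * GB))   # False: blocked by the current 3 GB limit
print(download_all_enabled(8 * GB, 10 * GB))  # True: allowed once the limit is raised
print(download_all_enabled(8 * GB, None))     # True: allowed once the limit is removed
```

Raising the limit only restores the bulk download; it does not give users a way to browse the hierarchy on the landing page.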

Metacat 2.15.1 implements our getPackage API w/ SpeedBagIt and, on the example @jeanetteclark mentions, the getPackage download starts nearly instantly and saturates my internet connection. I think that probably wasn't the case before? @ThomasThelen does our implementation still (needlessly) fill up /tmp or metacat-temp with the bag ZIP? I see there are a few open, related tickets so we might touch base in case I'm wrong.

To get this feature in, I think we also need to talk about backend implementation. The client needs to be sent all or some of the information on which members are in a package and any atLocation info we have on each of them. There are a few ways to do this:

  1. Get it from the ORE itself. This is too slow to build a performant UI and doesn't scale well for larger packages.
  2. Store atLocation information in Solr when we index the ORE. This is a bit messy because we couldn't just store the value of each object's atLocation triple: Solr doesn't guarantee ordering and doesn't handle holes (objects without an atLocation triple), so it's impossible to reliably associate atLocation information stored this way with the object it's describing. We'd actually have to store a structured value (JSON or something else) in the field and parse that client side.
  3. Store it somewhere we don't have yet
    a. We could use another Solr core that indexes all of the relations from each ORE (pid, subject, predicate, object). I prototyped this when I was working on indexing and it works great. This was also one of the re-design refactors we've talked about in the past. The big upside here is that it gives us much more efficient communication between the client and the server because we don't have to send nearly as much data over the wire.
    b. We could also just store similar information to (a) but store it in Postgres.
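As a rough illustration of option (3a), the relation index can be pictured as a flat list of (pid, subject, predicate, object) rows that is filtered per package. This is an in-memory sketch, not the prototype mentioned above: the `Relation` type and `locations_for_package` helper are hypothetical, and it assumes the predicate is PROV's `atLocation` IRI.

```python
from typing import Dict, List, NamedTuple


class Relation(NamedTuple):
    pid: str        # identifier of the resource map (ORE) the triple came from
    subject: str
    predicate: str
    object: str


# Assumption: atLocation is expressed with the PROV-O predicate
AT_LOCATION = "http://www.w3.org/ns/prov#atLocation"


def locations_for_package(index: List[Relation], ore_pid: str) -> Dict[str, str]:
    """Return {member pid: atLocation} for one package, ignoring all other relations."""
    return {
        r.subject: r.object
        for r in index
        if r.pid == ore_pid and r.predicate == AT_LOCATION
    }


index = [
    Relation("ore1", "urn:uuid:a", AT_LOCATION, "data/a.csv"),
    Relation("ore1", "urn:uuid:a", "http://purl.org/spar/cito/isDocumentedBy", "urn:uuid:meta"),
    Relation("ore2", "urn:uuid:z", AT_LOCATION, "z.csv"),
]
print(locations_for_package(index, "ore1"))
```

Only the rows the client asks for cross the wire, which is the efficiency gain described in (3a).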

I think (2) is probably the best route to go down for the first version but I'd be curious to hear others' thoughts.
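To make option (2) concrete, here is a minimal sketch of the structured-value idea. It assumes a hypothetical multi-valued field whose entries are JSON strings pairing each member's pid with its atLocation, and it assumes atLocation holds a relative path ending in the file name. The field layout and the `parse_locations` and `build_tree` helpers are illustrative, not an existing Metacat or MetacatUI API.

```python
import json
from typing import List, Optional, Tuple


def parse_locations(field_values: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Parse JSON entries like {"pid": ..., "atLocation": ...} into (pid, path) pairs."""
    pairs = []
    for raw in field_values:
        entry = json.loads(raw)
        pairs.append((entry["pid"], entry.get("atLocation")))
    return pairs


def build_tree(pairs: List[Tuple[str, Optional[str]]]) -> dict:
    """Build a nested folder tree; members without a path are placed at the root."""
    root = {"folders": {}, "files": []}
    for pid, path in pairs:
        node = root
        parts = [p for p in (path or "").split("/") if p not in ("", ".")]
        # Assumption: the last segment is the file name, the rest are folders
        for folder in parts[:-1]:
            node = node["folders"].setdefault(folder, {"folders": {}, "files": []})
        node["files"].append({"pid": pid, "name": parts[-1] if parts else pid})
    return root


# Entries arrive in arbitrary order and one member has no atLocation (a "hole")
values = [
    '{"pid": "urn:uuid:c", "atLocation": "data/raw/c.csv"}',
    '{"pid": "urn:uuid:a"}',
    '{"pid": "urn:uuid:b", "atLocation": "data/b.csv"}',
]
tree = build_tree(parse_locations(values))
print(json.dumps(tree, indent=2))
```

Because every entry carries its own pid, the pairing survives both reordering and missing values, which is exactly what parallel multi-valued fields cannot guarantee.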

@mbjones
Member

mbjones commented Oct 14, 2021

Agreed, @amoeba, and I also note that we are about to embark on a "large data package" refactor in which our package design will allow for packages with hundreds of thousands of entities that will not necessarily be listed in the ORE. Our initial design discussions have been around building packages from granule/entity-level metadata that is not "in" the package at all. Rather, the package is an aggregation of metadata from multiple sources, with a central API for efficient access to entity metadata. This would be in addition to our current ORE and EML-based entity lists. So there is lots to talk about in the back-end discussion here.

@mbjones
Member

mbjones commented Jun 28, 2022

@laurenwalker Pinging you on this issue for the hierarchical data display, now that the atLocation field is fully operational in Metacat, the getPackage service creates downloads with the folder hierarchy, and the R datapack client can upload packages with atLocation values properly escaped. I think we should implement this in two parts, first getting display of the hierarchy out of the door in the dataset display, and then later enabling editing of the hierarchy in the editor (#1453).

ESS-DIVE has asked about the timing of this feature, which is also important for Arctic Data Center's big data display, so let's discuss the roadmap for this work. Thanks.

@laurenwalker
Member Author

@mbjones I'll think about this feature and roadmap out a rough timeline for what I think it would take to implement. Be in touch soon.

@rushirajnenuji rushirajnenuji self-assigned this Dec 12, 2022
@robyngit robyngit added the ADC CI-14 Data search and display improvements (ADC deliverable) label May 17, 2023
@mbjones mbjones added this to the 2.28.0 milestone Dec 6, 2023
@mbjones
Member

mbjones commented Dec 6, 2023

@rushirajnenuji has completed the initial design and implementation of this, and it's now in testing on https://test.arcticdata.io

Feedback from @vchendrix at ESS-DIVE on 2023-12-06 on slack is copied here for reference:

This looks really good. The interaction of opening and closing the folders is really smooth. Here is some initial feedback.

  • It is not obvious that one needs to scroll down for more data. I didn’t notice until much later that there was a scroll bar to the right indicating that there was more data to view
  • When loading the page for the first time, we felt that all of the top level directories should be collapsed to begin with
  • Also, it seems that the folders are at the bottom below the top level files. We felt that the top level folders should be at the top of the table.
  • The download all operation is not obvious anymore. It looks the same as the other download buttons.
  • In general, I think some tooltips might be helpful in understanding how to navigate the data table.
  • The root level looks the same as the other levels. I feel like there should be some indicator on that row which tells me this is the Dataset or something because it is special
  • When I shrink the page width, I lose the download buttons and have to scroll over to them. It feels like they should be on the left side or next to the file name
  • In ESS-DIVE the EML file is at the top of the data table. It is not in these example datasets. Does this have something to do with the order in the resource map?
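Two of the ordering points in this feedback (folders above files, and the metadata file at the top) could be expressed as a single sort key. This is only one possible reading of the feedback, not what the linked PR implements; the `Row` type, its `is_metadata` flag, and `display_order` are hypothetical.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    name: str
    is_folder: bool
    is_metadata: bool = False  # assumption: the table knows which row is the EML file


def display_order(rows: List[Row]) -> List[Row]:
    """Sort one level of the table: metadata first, then folders, then files, each A-Z."""
    # False sorts before True, so "not flag" puts flagged rows first
    return sorted(rows, key=lambda r: (not r.is_metadata, not r.is_folder, r.name.lower()))


rows = [
    Row("b.csv", False),
    Row("data", True),
    Row("metadata.xml", False, True),
    Row("a.csv", False),
]
print([r.name for r in display_order(rows)])
# ['metadata.xml', 'data', 'a.csv', 'b.csv']
```

Applying the same key at every level of the tree would keep nested folders above their sibling files as well.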

@mbjones mbjones mentioned this issue Dec 6, 2023
5 tasks
@robyngit robyngit linked a pull request Dec 21, 2023 that will close this issue
@robyngit
Member

See also additional feedback on the associated PR, as well as the latest commits on this issue: #2209

@robyngit
Member

Note: minor UI improvements that we identified above have been included in issue #2283 for a future release

6 participants