We would like to add an option to retrieve datasets to Merlin. Currently there is a 'PSI-ra' option when retrieving from SciCat. We would like to support similar functionality for Merlin and other central archiving locations.
Ra implementation
(Please edit if any of this information is incorrect)
The current PSI-ra retrieval workflow is as follows:
Each ra pgroup has a 'retrieve' directory owned by the retrieval service user
Arima fetches the data from tape, places it in /das/work/<pgroup>/retrieve/<user>/<pid> and reports success
Users copy or move the data to the desired destination.
Permissions rely on ACLs to allow both the service user and the pgroup members to access the directory.
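As a rough illustration of the layout and ACLs described above, the sketch below builds the retrieve path and the `setfacl` call that would give both the service user and the pgroup access. It assumes the pgroup name is also the Unix group name and uses `retriever` as a placeholder service-user name. It only constructs the command and does not run it.

```python
from pathlib import PurePosixPath
from typing import List

# Root of the ra work area, following the layout described above.
RA_WORK_ROOT = PurePosixPath("/das/work")


def ra_retrieve_path(pgroup: str, user: str, pid: str) -> PurePosixPath:
    """Return /das/work/<pgroup>/retrieve/<user>/<pid>."""
    for name in (pgroup, user):
        if not name or "/" in name or name in (".", ".."):
            raise ValueError(f"invalid path component: {name!r}")
    rel = PurePosixPath(pid)
    # The pid is joined as a relative path; refuse anything that could
    # escape the user's retrieve directory.
    if not rel.parts or rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"invalid pid: {pid!r}")
    return RA_WORK_ROOT / pgroup / "retrieve" / user / rel


def retrieve_dir_acl_command(directory: PurePosixPath, service_user: str,
                             unix_group: str) -> List[str]:
    """Build (but do not run) a setfacl call granting rwx to both the service
    user and the group, plus default entries inherited by new files."""
    entries = ",".join([
        f"u:{service_user}:rwx",
        f"g:{unix_group}:rwx",
        f"d:u:{service_user}:rwx",
        f"d:g:{unix_group}:rwx",
    ])
    return ["setfacl", "-m", entries, str(directory)]


if __name__ == "__main__":
    target = ra_retrieve_path("p12345", "alice", "abc-123")
    print(target)  # /das/work/p12345/retrieve/alice/abc-123
    # The ACLs are set on the per-pgroup "retrieve" directory.
    print(retrieve_dir_acl_command(target.parent.parent, "retriever", "p12345"))
```

Setting default (`d:`) entries on the shared `retrieve` directory means each per-user and per-dataset subdirectory Arima creates inherits access for both parties, rather than needing its own ACL call.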
Differences to merlin
Merlin does not use DUO or pgroups. Most users use a-groups and may archive from user directories or project directories, which do not correspond 1:1 with a-groups. This means that a mechanism must be added to allow users to select a path when retrieving a dataset.
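One possible shape for the path-selection mechanism is a check that the user's chosen destination, after resolving symlinks and `..`, lies under one of the central locations. The roots in `ALLOWED_ROOTS` are hypothetical placeholders. In practice the list would presumably come from the storage side rather than being hard-coded.

```python
from pathlib import Path
from typing import Iterable

# Hypothetical placeholder roots for central Merlin storage; the real list
# would presumably be provided by the storage side, not hard-coded.
ALLOWED_ROOTS = ("/data/user", "/data/project")


def validate_destination(requested: str,
                         allowed_roots: Iterable[str] = ALLOWED_ROOTS) -> Path:
    """Resolve a user-chosen destination (following symlinks and '..') and
    accept it only if it is one of the allowed roots or lies beneath one."""
    dest = Path(requested).resolve()
    for root in allowed_roots:
        root_path = Path(root).resolve()
        if dest == root_path or root_path in dest.parents:
            return dest
    raise ValueError(f"{requested!r} is not under an allowed location")
```

Resolving before comparing matters: a plain string-prefix check would accept `/data/user/../../etc`, or a symlink pointing outside the allowed tree.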
Implementation steps
The minimal implementation in the backend would require:
A way to grant the service user write access to the destination folder.
At first, this could be a fixed retrieve directory for each project, as on ra.
Better would be a script that sets the appropriate permissions/ACLs on whatever directory the user specifies. This could be incorporated into the datasetRetriever tool, which could also validate some permissions at run time (e.g., that the user has permission to read the dataset, and permission to write to the destination folder so they can clean up afterwards).
Modify the Job model in the REST API to capture the destination server and path
Modify Arima to write to the correct server and path
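A minimal sketch of the permission-setup script, as it might run inside datasetRetriever under the requesting user's account. It checks that the user can write to the destination (so they can clean up later) and then grants the service user access with `setfacl`. `archive_retriever` is a hypothetical service-user name. The dataset read-permission check is left to SciCat and not shown. With `dry_run=True` the command is returned without being executed.

```python
import getpass
import os
import subprocess
from pathlib import Path
from typing import List, Optional

# Hypothetical name of the account Arima writes retrieved data as.
SERVICE_USER = "archive_retriever"


def prepare_destination(dest: str, service_user: str = SERVICE_USER,
                        requesting_user: Optional[str] = None,
                        dry_run: bool = True) -> List[str]:
    """Validate `dest` for the calling user and grant the service user access.

    Returns the setfacl command; it is only executed when dry_run is False.
    """
    path = Path(dest)
    if not path.is_dir():
        raise ValueError(f"{dest} is not an existing directory")
    # os.access tests the real uid/gid of this process, i.e. the user running
    # the tool: they need write + search permission to clean up afterwards.
    if not os.access(path, os.W_OK | os.X_OK):
        raise PermissionError(f"no write permission on {dest}")
    user = requesting_user or getpass.getuser()
    # Access entry for the service user, plus default entries so that files
    # and directories created during retrieval inherit entries for both.
    entries = f"u:{service_user}:rwx,d:u:{service_user}:rwx,d:u:{user}:rwx"
    cmd = ["setfacl", "-m", entries, str(path)]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

The default entry for the requesting user is what lets them later delete subdirectories that the service user created, not just the top-level files.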
Front-end changes:
datasetRetriever modifications to set up the directory, validate settings, and pass the correct paths to SciCat
New SciCat retrieval option with a field for the destination
(Optional) File browser on SciCat to select the files. This would probably require a microservice running somewhere with access to all the central filesystems, which would validate user permissions and return file lists.
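The datasetRetriever/SciCat side then needs to send the destination along with the job. The sketch below assembles a retrieve-job body loosely modeled on SciCat's existing job fields (`type`, `emailJobInitiator`, `datasetList`, `jobParams`). The `destinationServer` and `destinationPath` keys are hypothetical names for the new fields and would need to match whatever the Job model change settles on.

```python
import json
from typing import Any, Dict, List


def build_retrieve_job(pids: List[str], email: str, server: str,
                       path: str) -> Dict[str, Any]:
    """Assemble a retrieve-job body carrying the chosen destination.

    The top-level keys mirror SciCat's job fields; destinationServer and
    destinationPath are hypothetical names for the proposed new fields.
    """
    if not server:
        raise ValueError("a destination server is required")
    if not path.startswith("/"):
        raise ValueError("destination path must be absolute")
    return {
        "type": "retrieve",
        "emailJobInitiator": email,
        "datasetList": [{"pid": pid, "files": []} for pid in pids],
        "jobParams": {
            "destinationServer": server,  # hypothetical field
            "destinationPath": path,      # hypothetical field
        },
    }


if __name__ == "__main__":
    job = build_retrieve_job(["abc-123"], "alice@example.org",
                             "merlin", "/data/user/alice/restore")
    print(json.dumps(job, indent=2))
```

Carrying the server separately from the path keeps the same job shape usable for ra, Merlin, or any later central location, with Arima dispatching on the server value.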
sbliven changed the title from "[WIP] Retrieve datasets to merlin" to "Retrieve datasets to merlin" on Dec 1, 2022
@sbliven, when would you need this to be implemented?
It will likely need a meeting with Krisz, Pedro and Michael (and us). Could you please schedule it depending on its urgency?
Thanks.
Here's an initial diagram of how the microservice I mentioned above might work. This "storage service" would run on the storage system and provide endpoints for the following queries:
Check if a filesystem is mounted centrally from this storage
List writable filesystems for a particular user
File browser/navigation (basically wraps ls and cd for central locations, taking user permissions into account)
SciCat would also need to implement an endpoint for checking which storage systems a user has access to (looking ahead to having non-PSI users in the system).
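The core of the "list writable filesystems" and file-browser endpoints is deciding what another user may do, since the service itself would run with broader rights. A simplified sketch using classic owner/group/other mode bits follows. It deliberately ignores ACLs (which a real service on these filesystems would also have to evaluate) and omits the HTTP layer entirely.

```python
import os
from typing import Iterable, List, Tuple

READ, WRITE, EXECUTE = 4, 2, 1


def may_access(st: os.stat_result, uid: int, gids: Iterable[int],
               want: int) -> bool:
    """POSIX owner/group/other check on behalf of another user.

    Only mode bits are used: ACLs and root privileges are ignored here.
    """
    if st.st_uid == uid:
        bits = (st.st_mode >> 6) & 7   # owner class
    elif st.st_gid in set(gids):
        bits = (st.st_mode >> 3) & 7   # group class
    else:
        bits = st.st_mode & 7          # other class
    return (bits & want) == want


def writable_roots(roots: Iterable[str], uid: int,
                   gids: Iterable[int]) -> List[str]:
    """'List writable filesystems': roots the user can create entries in."""
    gids = set(gids)
    return [r for r in roots
            if may_access(os.stat(r), uid, gids, WRITE | EXECUTE)]


def list_directory(path: str, uid: int,
                   gids: Iterable[int]) -> List[Tuple[str, bool]]:
    """File-browser step: (name, is_dir) pairs, if the user may list `path`."""
    if not may_access(os.stat(path), uid, gids, READ | EXECUTE):
        raise PermissionError(f"user {uid} cannot list {path}")
    with os.scandir(path) as entries:
        return sorted((e.name, e.is_dir()) for e in entries)
```

Keeping the decision in one function means the two endpoints cannot disagree, and swapping in an ACL-aware check later would only touch `may_access`.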