Dev/issue 8895 search api #89
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,146 @@ | ||
:Authors: | ||
Mike Cantelon | ||
|
||
Search API | ||
================================================================================ | ||
|
||
In addition to the search functionality present in the web interface, the | ||
storage service also includes a REST search API. Searches are performed by | ||
sending an HTTP GET request. | ||
|
||
Search results will include a count of how many items were found and will | ||
include next and previous properties indicating links to more items in the | ||
result set. | ||
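
The paging described above can be walked with a small loop. This is a minimal sketch, not part of the storage service: `iter_results` and its `fetch` argument are hypothetical names, and the default fetcher assumes the endpoint can be read with a plain unauthenticated GET, which may not hold for a given deployment.

```python
import json
from urllib.request import urlopen


def iter_results(url, fetch=None):
    """Yield every item of a paginated search by following "next" links."""
    if fetch is None:
        # Default fetcher (assumption): plain HTTP GET, body parsed as JSON.
        def fetch(page_url):
            with urlopen(page_url) as response:
                return json.loads(response.read().decode("utf-8"))

    while url:
        page = fetch(url)
        for item in page["results"]:
            yield item
        # "next" is null (None) on the last page, which ends the loop.
        url = page.get("next")
```

Passing a custom `fetch` callable makes it possible to add authentication headers or to test the loop without a running server.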
|
||
Location search | ||
-------------------------------------------------------------------------------- | ||
|
||
The endpoint for searching locations is:: | ||
|
||
http://<storage service URL>/api/v2/search/location/ | ||
|
||
Locations can be searched using the following search parameters: | ||
|
||
* uuid (location UUID) | ||
* space (space UUID) | ||
* purpose (purpose code) | ||
* enabled (whether the location is enabled) | ||
|
||
For example, if you wanted to get details about the transfer source location | |
contained in the space 7ec3d5d9-23ec-4fd5-b9fb-df82da8de630 you could HTTP GET:: | |
|
||
http://<storage service URL>/api/v2/search/location/?space=7ec3d5d9-23ec-4fd5-b9fb-df82da8de630&purpose=TS | ||
I noticed that if you pass a nonsense param to one of these search endpoints, you get all of the resources. Is that expected/desired? For example |
||
|
||
Here is an example JSON response:: | ||
|
||
{ | ||
"count": 1, | ||
"next": null, | ||
"previous": null, | ||
"results": [ | ||
{ | ||
"uuid": "f74c23e1-6737-4c24-a470-a003bc573051", | ||
"space": "7ec3d5d9-23ec-4fd5-b9fb-df82da8de630", | ||
"pipelines": [ | ||
"2a351be8-99b4-4f53-8ea5-8d6ace6e0243", | ||
"b9d676ff-7c9d-4777-9a19-1b4b76a6542f" | ||
], | ||
"purpose": "TS", | ||
"quota": null, | ||
"used": 0, | ||
"enabled": true | ||
} | ||
] | ||
} | ||
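
A search URL like the one in the location example can be assembled from keyword arguments. This is a minimal sketch: `build_search_url` is a hypothetical helper, and the base URL in the usage lines is a placeholder for the real storage service URL.

```python
from urllib.parse import urlencode


def build_search_url(base_url, resource, **params):
    """Build a search API URL for "location", "package", or "file"."""
    # Sort the parameters so the same search always produces the same URL.
    query = urlencode(sorted(params.items()))
    url = "{0}/api/v2/search/{1}/".format(base_url.rstrip("/"), resource)
    return url + ("?" + query if query else "")


# Placeholder host; substitute the real storage service URL.
print(build_search_url(
    "http://localhost:8000",
    "location",
    space="7ec3d5d9-23ec-4fd5-b9fb-df82da8de630",
    purpose="TS",
))
```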
|
||
|
||
Package search | ||
-------------------------------------------------------------------------------- | ||
|
||
The endpoint for searching packages is:: | ||
|
||
http://<storage service URL>/api/v2/search/package/ | ||
|
||
Packages can be searched using the following search parameters: | ||
|
||
* uuid (package UUID) | ||
* pipeline (pipeline UUID) | ||
* location (location UUID) | ||
* package_type (package type code: "AIP", "AIC", "SIP", "DIP", "transfer", "file", "deposit") | ||
* status (package status code: "PENDING", "STAGING", "UPLOADED", "VERIFIED", | ||
"DEL_REQ", "DELETED", "RECOVER_REQ", "FAIL", or "FINALIZE") | ||
* min_size (minimum package filesize) | ||
* max_size (maximum package filesize) | ||
|
||
For example, if you wanted to get details about all packages of type AIP | |
you could HTTP GET:: | |
|
||
http://<storage service URL>/api/v2/search/package/?package_type=AIP | ||
|
||
Here is an example JSON response:: | ||
|
||
{ | |
"count": 1, | |
"next": null, | |
"previous": null, | |
"results": [ | |
{ | |
"uuid": "96365d3d-6656-4fdd-a247-f85c9e0ddd43", | |
"current_path": "9636/5d3d/6656/4fdd/a247/f85c/9e0d/dd43/Apples-96365d3d-6656-4fdd-a247-f85c9e0ddd43.7z", | |
"size": 7918099, | |
"origin_pipeline": "b9d676ff-7c9d-4777-9a19-1b4b76a6542f", | |
"current_location": "a3d95a1b-f8fb-4e34-9f15-60dcdf178470", | |
"package_type": "AIP", | |
"status": "UPLOADED", | |
"pointer_file_location": "c2dfb32b-77dd-4597-abff-7c52e05e6d01", | |
"pointer_file_path": "9636/5d3d/6656/4fdd/a247/f85c/9e0d/dd43/pointer.96365d3d-6656-4fdd-a247-f85c9e0ddd43.xml" | |
} | |
] | |
} | |
|
||
|
||
File search | ||
-------------------------------------------------------------------------------- | ||
|
||
The endpoint for searching files is:: | ||
|
||
http://<storage service URL>/api/v2/search/file/ | ||
|
||
Files can be searched using the following search criteria: | ||
|
||
* uuid (file UUID) | ||
* package (package UUID) | ||
* name (entire or partial filename) | |
Is this a typo? Should "enter" be "entire"?
Also, I think there's a typo in the line below: "PRONUM"
I fixed these typos in #262 |
||
* pronom_id (PRONOM PUID) | |
* format_name (format name) | ||
* min_size (minimum filesize) | ||
* max_size (maximum filesize) | ||
* normalized (boolean: whether or not file was normalized) | ||
* validated (boolean: whether or not file data is valid) | |
Wording sounds contradictory: "whether or not file data is valid or malformed"
Also, I think
Should
I changed |
||
|
||
For example, if you wanted to get details about files that are 29965171 bytes | ||
or larger, you could HTTP GET:: | ||
|
||
http://<storage service URL>/api/v2/search/file/?min_size=29965171 | ||
|
||
Here is an example JSON response:: | ||
|
||
{ | |
"count": 1, | |
"next": null, | |
"previous": null, | |
"results": [ | |
{ | |
"uuid": "bd2074bb-2086-40b5-9c3f-3657cb900681", | |
"name": "Bodring-5f0fa831-a74b-4bf5-8598-779d49c3663a/objects/pictures/Landing_zone-e50c8452-0791-4fac-9f45-15b088a39b10.tif", | |
"file_type": "AIP", | |
"size": 29965171, | |
"format_name": "TIFF", | |
"pronom_id": "", | |
"source_package": "", | |
"normalized": null, | |
"validated": null, | |
"ingestion_time": "2015-10-30T04:16:39Z" | |
} | |
] | |
} |
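
The file example matches a file of exactly 29965171 bytes, so the size bounds are inclusive. A minimal sketch of that comparison in plain Python, where `within_size` is a hypothetical name and not part of the API:

```python
def within_size(size, min_size=None, max_size=None):
    """Return True if size passes the inclusive min_size/max_size bounds."""
    # min_size behaves like "greater than or equal to".
    if min_size is not None and size < min_size:
        return False
    # max_size behaves like "less than or equal to".
    if max_size is not None and size > max_size:
        return False
    return True
```

Omitting either bound leaves that side of the range open, as when only `min_size` is given in the URL.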
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Common | ||
# May have multiple models, so import * and use __all__ in file. | ||
from .router import router | |
|
||
__all__ = ['router'] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,164 @@ | ||
import django_filters | ||
from rest_framework import routers, serializers, viewsets, filters | ||
from rest_framework.decorators import list_route | ||
from rest_framework.response import Response | ||
|
||
from django.db.models import Sum | ||
|
||
from locations import models | ||
|
||
|
||
class CaseInsensitiveBooleanFilter(django_filters.Filter): | ||
""" | ||
Boolean filter that accepts "true" and "false" in any letter case, so users | |
do not have to send exactly "True" or "False". | |
""" | ||
def filter(self, qs, value): | ||
if value is not None: | ||
lc_value = value.lower() | ||
if lc_value == "true": | ||
value = True | ||
elif lc_value == "false": | ||
value = False | ||
return qs.filter(**{self.name: value}) | ||
What happens here if the string doesn't match either? It just passes the string back?
Just tested, looks like it's ignored if a supported value isn't passed. Maybe we want it to raise an error instead?
Yeah, it passes the string back as is and their search query will then fail (which makes sense given they've provided the wrong values for a boolean).
Okay, cool! What's the failure look like?
It'll pass the value back as-is and cause the query to be invalid (which is desirable given they've provided the wrong type of value). |
||
return qs | ||
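
The coercion discussed in the thread above can be tried outside Django. This is a standalone sketch that mirrors the branch logic of the filter; `coerce_boolean` is a hypothetical name and does not exist in the pull request.

```python
def coerce_boolean(value):
    """Mirror the filter's coercion of query string values."""
    if value is None:
        # No value supplied: nothing to coerce.
        return None
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    # Any other string is passed through unchanged, as in the filter,
    # and is left for the database query to accept or reject.
    return value
```

An explicit error for unsupported values, as suggested in the review, would replace the final `return value` with a raised exception.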
|
||
|
||
class PipelineField(serializers.RelatedField): | ||
""" | ||
Used to show UUID of related pipelines | ||
""" | ||
def to_representation(self, value): | ||
return value.uuid | ||
|
||
|
||
class LocationSerializer(serializers.HyperlinkedModelSerializer): | ||
""" | ||
Serialize Location model data | ||
""" | ||
space = serializers.ReadOnlyField(source='space.uuid') | ||
pipelines = PipelineField(many=True, read_only=True, source='pipeline') | ||
|
||
class Meta: | ||
model = models.Location | ||
fields = ('uuid', 'space', 'pipelines', 'purpose', 'quota', 'used', 'enabled') | ||
|
||
|
||
class LocationFilter(django_filters.FilterSet): | ||
""" | ||
Filter for searching Location data | ||
""" | ||
uuid = django_filters.CharFilter(name='uuid') | ||
space = django_filters.CharFilter(name='space') | ||
purpose = django_filters.CharFilter(name='purpose') | ||
enabled = CaseInsensitiveBooleanFilter(name='enabled') | ||
|
||
class Meta: | ||
model = models.Location | ||
fields = ['uuid', 'space', 'purpose', 'enabled'] | ||
|
||
|
||
class LocationViewSet(viewsets.ReadOnlyModelViewSet): | ||
""" | ||
Search API view for Location model data | ||
""" | ||
queryset = models.Location.objects.all() | ||
serializer_class = LocationSerializer | ||
filter_backends = (filters.DjangoFilterBackend,) | ||
filter_class = LocationFilter | ||
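
A reviewer observed that an unrecognized query parameter returns every resource. One way to detect that case before filtering is sketched below. It is an illustration only: `LOCATION_PARAMS` and `unknown_params` are hypothetical names, and the allowed set is copied from the `LocationFilter` fields above.

```python
# Parameter names copied from LocationFilter.Meta.fields.
LOCATION_PARAMS = frozenset(["uuid", "space", "purpose", "enabled"])


def unknown_params(query_params, allowed=LOCATION_PARAMS):
    """Return the sorted names in query_params that are not search fields."""
    return sorted(set(query_params) - set(allowed))
```

A view could return an HTTP 400 response when this list is not empty. Paging or format parameters would have to be added to the allowed set first, otherwise they would be reported as unknown too.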
|
||
|
||
class PackageSerializer(serializers.HyperlinkedModelSerializer): | ||
""" | ||
Serialize Package model data | ||
""" | ||
origin_pipeline = serializers.ReadOnlyField(source='origin_pipeline.uuid') | ||
current_location = serializers.ReadOnlyField(source='current_location.uuid') | ||
pointer_file_location = serializers.ReadOnlyField(source='pointer_file_location.uuid') | ||
|
||
class Meta: | ||
model = models.Package | ||
fields = ('uuid', 'current_path', 'size', 'origin_pipeline', 'current_location', 'package_type', 'status', 'pointer_file_location', 'pointer_file_path') | ||
|
||
|
||
class PackageFilter(django_filters.FilterSet): | ||
""" | ||
Filter for searching Package data | ||
""" | ||
min_size = django_filters.NumberFilter(name='size', lookup_type='gte') | ||
max_size = django_filters.NumberFilter(name='size', lookup_type='lte') | ||
pipeline = django_filters.CharFilter(name='origin_pipeline') | ||
location = django_filters.CharFilter(name='current_location') | ||
package_type = django_filters.CharFilter(name='package_type') | ||
|
||
class Meta: | ||
model = models.Package | ||
fields = ['uuid', 'min_size', 'max_size', 'pipeline', 'location', 'package_type', 'status', 'pointer_file_location'] | ||
|
||
|
||
class PackageViewSet(viewsets.ReadOnlyModelViewSet): | ||
""" | ||
Search API view for Package model data | ||
""" | ||
queryset = models.Package.objects.all() | ||
serializer_class = PackageSerializer | ||
filter_backends = (filters.DjangoFilterBackend,) | ||
filter_class = PackageFilter | ||
|
||
|
||
class FileSerializer(serializers.HyperlinkedModelSerializer): | ||
""" | ||
Serialize File model data | ||
""" | ||
pipeline = serializers.ReadOnlyField(source='origin.uuid') | ||
|
||
class Meta: | ||
model = models.File | ||
fields = ('uuid', 'name', 'file_type', 'size', 'format_name', 'pronom_id', 'pipeline', 'source_package', 'normalized', 'validated', 'ingestion_time') | ||
|
||
|
||
class FileFilter(django_filters.FilterSet): | ||
""" | ||
Filter for searching File data | ||
""" | ||
min_size = django_filters.NumberFilter(name='size', lookup_type='gte') | ||
max_size = django_filters.NumberFilter(name='size', lookup_type='lte') | ||
pipeline = django_filters.CharFilter(name='origin') | ||
package = django_filters.CharFilter(name='source_package') | ||
name = django_filters.CharFilter(name='name', lookup_type='icontains') | ||
normalized = CaseInsensitiveBooleanFilter(name='normalized') | ||
ingestion_time = django_filters.DateFilter(name='ingestion_time', lookup_type='contains') | ||
#ingestion_time_before = django_filters.DateFilter(name='ingestion_time', lookup_type='lt') | ||
#ingestion_time_after = django_filters.DateFilter(name='ingestion_time', lookup_type='gt') | ||
|
||
class Meta: | ||
model = models.File | ||
fields = ['uuid', 'name', 'file_type', 'min_size', 'max_size', | ||
'format_name', 'pronom_id', 'pipeline', 'source_package', | ||
'normalized', 'validated', 'ingestion_time'] | ||
#'ingestion_time_before', 'ingestion_time_after'] | ||
|
||
|
||
class FileViewSet(viewsets.ReadOnlyModelViewSet): | ||
""" | ||
Search API view for File model data | ||
|
||
Custom endpoint "stats" provides total size of files searched for | ||
""" | ||
queryset = models.File.objects.all() | ||
serializer_class = FileSerializer | ||
filter_backends = (filters.DjangoFilterBackend,) | ||
filter_class = FileFilter | ||
|
||
@list_route(methods=['get']) | ||
How is this used? TODO: find out. |
||
def stats(self, request): | ||
filtered = FileFilter(request.GET, queryset=self.get_queryset()) | ||
count = filtered.qs.count() | ||
summary = filtered.qs.aggregate(Sum('size')) | ||
return Response({'count': count, 'total_size': summary['size__sum']}) | ||
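
The shape of the `stats` response can be reproduced in plain Python. This is a sketch over a list of dictionaries standing in for the filtered queryset; `summarize` is a hypothetical name. It follows the SQL behavior of `Sum`, which skips null sizes and yields `None` when there is nothing to add up.

```python
def summarize(files):
    """Return the count and total size for a list of file records."""
    # Like SQL SUM, ignore records whose size is null (None).
    sizes = [f["size"] for f in files if f["size"] is not None]
    return {
        "count": len(files),
        "total_size": sum(sizes) if sizes else None,
    }
```

With the router registration in this file, the route would presumably sit under the file search endpoint and accept the same filter parameters, since `stats` builds a `FileFilter` from `request.GET`.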
|
||
|
||
# Route location, package, and file search API requests | ||
router = routers.DefaultRouter() | ||
router.register(r'location', LocationViewSet) | ||
router.register(r'package', PackageViewSet) | ||
router.register(r'file', FileViewSet) |
💖 Docs