Use query params and make returned fields selectable (#28)
Cito authored Aug 5, 2024
1 parent 4c7282f commit 1c49b9e
Showing 31 changed files with 731 additions and 589 deletions.
14 changes: 10 additions & 4 deletions .devcontainer/.dev_config.yaml
@@ -3,13 +3,19 @@ db_name: metadata-store
searchable_classes:
Dataset:
description: Dataset grouping files under controlled access.
facetable_properties:
- key: type # a property directly part of the dataset
facetable_fields:
- key: type # a field directly part of the dataset
name: Type
- key: "study.type" # a property that is part of study that is embedded into this dataset
- key: "study.type" # a field that is part of study that is embedded into this dataset
name: Study Type
- key: "study.project.alias" # a property part of a deeply embedded resource
- key: "study.project.alias" # a field part of a deeply embedded resource
name: Project Alias
selected_fields:
- key: accession
name: Dataset ID
- key: title
name: Title

resource_change_event_topic: searchable_resources
resource_deletion_event_type: searchable_resource_deleted
resource_upsertion_event_type: searchable_resource_upserted
2 changes: 1 addition & 1 deletion .devcontainer/dev_launcher
@@ -1,3 +1,3 @@
#!/bin/bash

mass
mass run-rest
26 changes: 13 additions & 13 deletions .devcontainer/docker-compose.yml
@@ -1,18 +1,10 @@
version: '3'

services:
app:
build:
context: .
dockerfile: ./Dockerfile
args:
# [Choice] Python version: 3, 3.8, 3.7, 3.6
VARIANT: 3.9
# [Choice] Install Node.js
INSTALL_NODE: "true"
NODE_VERSION: "lts/*"
# Please adapt to package name:
PACKAGE_NAME: "mass"
PACKAGE_NAME: mass
# On Linux, you may need to update USER_UID and USER_GID below if not your local UID is not 1000.
USER_UID: 1000
USER_GID: 1000
@@ -33,18 +25,26 @@ services:
environment:
# Please adapt to package name:
MASS_CONFIG_YAML: /workspace/.devcontainer/.dev_config.yaml
# Used by db migration:
DB_URL: postgresql://postgres:postgres@postgresql/postgres

# Use "forwardPorts" in **devcontainer.json** to forward an app port locally.
# (Adding the "ports" property to this file will not forward from a Codespace.)


# Please remove service dependencies that are not needed:
mongodb:
image: mongo:latest
restart: unless-stopped
volumes:
- mongo_fs:/data/db

mongo-express:
image: mongo-express:latest
restart: unless-stopped
ports:
- 8088:8081
environment:
ME_CONFIG_MONGODB_URL: mongodb://mongodb:27017/
ME_CONFIG_BASICAUTH_USERNAME: dev
ME_CONFIG_BASICAUTH_PASSWORD: dev
ME_CONFIG_MONGODB_ENABLE_ADMIN: "true"

volumes:
mongo_fs: {}
8 changes: 4 additions & 4 deletions .pyproject_generation/pyproject_custom.toml
@@ -1,11 +1,11 @@
[project]
name = "mass"
version = "2.1.0"
description = "Metadata Artifact Search Service - A service for searching metadata artifacts and filtering results."
version = "3.0.0"
description = "Metadata Artifact Search Service - A service for searching metadata artifacts and filtering results."
dependencies = [
"typer>=0.12",
"ghga-service-commons[api]>=3.0.0",
"ghga-event-schemas>=2.0.0",
"ghga-service-commons[api]>=3.1.5",
"ghga-event-schemas>=3.1.1",
"hexkit[mongodb,akafka]>=3.5.0",
]

10 changes: 7 additions & 3 deletions .readme_generation/description.md
@@ -1,9 +1,13 @@
The Metadata Artifact Search Service uses search parameters to look for metadata.

### Quick Overview of API
There are two available API endpoints that follow the RPC pattern (not REST):
One endpoint ("GET /rpc/search-options") will return an overview of all metadata classes that can be targeted
by a search. The actual search endpoint ("POST /rpc/search") can be used to search for these target classes using keywords. Hits will be reported in the context of the selected target class.
The API provides two not strictly RESTful endpoints:

One endpoint ("GET /search-options") will return an overview of all metadata classes
that can be targeted by a search.

The actual search endpoint ("GET /search") can be used to search for these target classes
using keywords. Hits will be reported in the context of the selected target class.
This means that target classes will be reported that match the specified search query,
however, the target class might contain embedded other classes and the match might
occur in these embedded classes, too.
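Since the search endpoint now takes its input as query parameters on a GET request, a client-side call could be sketched roughly as below. This is only an illustration: the diff shown here does not list the parameter names, so `class_name`, `query`, `filter_by`, `value`, `skip` and `limit` are assumptions, as is the helper `build_search_url` itself. The host and port are taken from the `docker run` example in the README.

```python
from urllib.parse import urlencode


def build_search_url(base_url, class_name, query="", filters=None, skip=0, limit=10):
    """Build a URL for the GET /search endpoint.

    NOTE: all query parameter names used here are hypothetical; check the
    service's OpenAPI specification for the real ones.
    """
    params = [
        ("class_name", class_name),
        ("query", query),
        ("skip", skip),
        ("limit", limit),
    ]
    # Each filter is sent as a pair of repeated parameters (assumed convention).
    for key, value in (filters or {}).items():
        params.append(("filter_by", key))
        params.append(("value", value))
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


url = build_search_url(
    "http://localhost:8080",
    "Dataset",
    query="cancer",
    filters={"study.type": "genomics"},
)
print(url)
```

Passing the parameters as a list of pairs rather than a dict keeps repeated keys in a stable order, which matters if filters are expressed as paired parameters.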
17 changes: 10 additions & 7 deletions .readme_generation/design.md
@@ -5,14 +5,17 @@ It uses protocol/provider pairs and dependency injection mechanisms provided by
This service is currently designed to work with MongoDB and uses an aggregation pipeline to produce search results.

Typical sequence of events is as follows:

1. Requests are received by the API, then directed to the QueryHandler in the core.

2. From there, the configuration is consulted to retrieve any facetable properties for the searched resource class.
2. From there, the configuration is consulted to retrieve any facetable and selected fields for the searched resource class.

3. The search parameters and facet fields are passed to the Aggregator, which builds and runs the aggregation pipeline on the appropriate collection. The aggregation pipeline is a series of stages run in sequence:
- The first stage runs a text match using the query string.
- The second stage applies a sort based on the IDs.
- The third stage applies any filters supplied in the search parameters.
- The fourth stage extract facets.
- The fifth/final stage transforms the results structure into {facets, hits, hit count}.
4. Once retrieved in the Aggregator, the results are passed back to the QueryHandler where they are shoved into a QueryResults pydantic model for validation before finally being sent back to the API.
1. Run a text match using the query string.
2. Apply a sort based on the IDs.
3. Apply any filters supplied in the search parameters.
4. Extract the facets.
5. Keep only selected fields if some have been specified.
6. Transform the results structure into {facets, hits, hit count}.

4. Once retrieved in the Aggregator, the results are passed back to the QueryHandler where they are shoved into a QueryResults Pydantic model for validation before finally being sent back to the API.
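The six pipeline steps listed above could be sketched as a plain list of MongoDB aggregation stages. This is not the service's actual Aggregator code, which is not part of the hunks shown here; the function `build_pipeline`, its parameters, the `facet_` prefix and the paging defaults are all assumptions made for illustration.

```python
def build_pipeline(query, filters, facet_fields, selected_fields, skip=0, limit=10):
    """Return aggregation stages mirroring the six documented steps.

    Hypothetical sketch; names and paging defaults are assumptions.
    """
    pipeline = []
    # 1. Text match on the query string ($text must be the first stage).
    if query:
        pipeline.append({"$match": {"$text": {"$search": query}}})
    # 2. Sort by ID for a stable order.
    pipeline.append({"$sort": {"_id": 1}})
    # 3. Apply the filters: each field must match one of the given values.
    if filters:
        pipeline.append(
            {"$match": {key: {"$in": values} for key, values in filters.items()}}
        )
    # 5. Keep only the selected fields, if any were specified.
    hits = [{"$skip": skip}, {"$limit": limit}]
    if selected_fields:
        hits.append({"$project": {key: 1 for key in selected_fields}})
    # 4. Extract the facets (one sub-pipeline per facetable field).
    facet_stage = {"hits": hits, "count": [{"$count": "total"}]}
    for key in facet_fields:
        facet_stage["facet_" + key.replace(".", "_")] = [{"$sortByCount": "$" + key}]
    pipeline.append({"$facet": facet_stage})
    # 6. Reshape the result into {facets, hits, count}.
    facet_names = [name for name in facet_stage if name.startswith("facet_")]
    pipeline.append(
        {
            "$project": {
                "hits": 1,
                "count": {"$ifNull": [{"$arrayElemAt": ["$count.total", 0]}, 0]},
                "facets": {name: "$" + name for name in facet_names},
            }
        }
    )
    return pipeline


pipeline = build_pipeline(
    "cancer", {"type": ["A"]}, ["study.type"], ["accession", "title"]
)
```

In this sketch the field selection lives inside the `hits` branch of `$facet`, so the facet counts are still computed on the full documents; whether the service orders things the same way is not visible from this diff.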
51 changes: 31 additions & 20 deletions README.md
@@ -3,16 +3,20 @@

# Mass

Metadata Artifact Search Service - A service for searching metadata artifacts and filtering results.
Metadata Artifact Search Service - A service for searching metadata artifacts and filtering results.

## Description

The Metadata Artifact Search Service uses search parameters to look for metadata.

### Quick Overview of API
There are two available API endpoints that follow the RPC pattern (not REST):
One endpoint ("GET /rpc/search-options") will return an overview of all metadata classes that can be targeted
by a search. The actual search endpoint ("POST /rpc/search") can be used to search for these target classes using keywords. Hits will be reported in the context of the selected target class.
The API provides two not strictly RESTful endpoints:

One endpoint ("GET /search-options") will return an overview of all metadata classes
that can be targeted by a search.

The actual search endpoint ("GET /search") can be used to search for these target classes
using keywords. Hits will be reported in the context of the selected target class.
This means that target classes will be reported that match the specified search query,
however, the target class might contain embedded other classes and the match might
occur in these embedded classes, too.
@@ -32,21 +36,21 @@ We recommend using the provided Docker container.

A pre-build version is available at [docker hub](https://hub.docker.com/repository/docker/ghga/mass):
```bash
docker pull ghga/mass:2.1.0
docker pull ghga/mass:3.0.0
```

Or you can build the container yourself from the [`./Dockerfile`](./Dockerfile):
```bash
# Execute in the repo's root dir:
docker build -t ghga/mass:2.1.0 .
docker build -t ghga/mass:3.0.0 .
```

For production-ready deployment, we recommend using Kubernetes, however,
for simple use cases, you could execute the service using docker
on a single server:
```bash
# The entrypoint is preconfigured:
docker run -p 8080:8080 ghga/mass:2.1.0 --help
docker run -p 8080:8080 ghga/mass:3.0.0 --help
```

If you prefer not to use containers, you may install the service from source:
@@ -100,7 +104,7 @@ The service requires the following configuration parameters:

- **`log_traceback`** *(boolean)*: Whether to include exception tracebacks in log messages. Default: `true`.

- **`searchable_classes`** *(object)*: A collection of searchable_classes with facetable properties. Can contain additional properties.
- **`searchable_classes`** *(object)*: A collection of searchable_classes with facetable and selected fields. Can contain additional properties.

- **Additional properties**: Refer to *[#/$defs/SearchableClass](#%24defs/SearchableClass)*.

@@ -303,19 +307,23 @@ The service requires the following configuration parameters:
## Definitions


- <a id="%24defs/FacetLabel"></a>**`FacetLabel`** *(object)*: Contains the key and corresponding user-friendly name for a facet.
- <a id="%24defs/FieldLabel"></a>**`FieldLabel`** *(object)*: Contains the field name and corresponding user-friendly name.

- **`key`** *(string, required)*: The raw facet key, such as study.type.
- **`key`** *(string, required)*: The raw field name, such as study.type.

- **`name`** *(string)*: The user-friendly name for the facet. Default: `""`.
- **`name`** *(string)*: A user-friendly name for the field (leave empty to use the key). Default: `""`.

- <a id="%24defs/SearchableClass"></a>**`SearchableClass`** *(object)*: Represents a searchable artifact or resource type.

- **`description`** *(string, required)*: A brief description of the resource type.

- **`facetable_properties`** *(array, required)*: A list of of the facetable properties for the resource type.
- **`facetable_fields`** *(array)*: A list of the facetable fields for the resource type (leave empty to not use faceting). Default: `[]`.

- **Items**: Refer to *[#/$defs/FieldLabel](#%24defs/FieldLabel)*.

- **Items**: Refer to *[#/$defs/FacetLabel](#%24defs/FacetLabel)*.
- **`selected_fields`** *(array)*: A list of the returned fields for the resource type (leave empty to return all). Default: `[]`.

- **Items**: Refer to *[#/$defs/FieldLabel](#%24defs/FieldLabel)*.


### Usage:
@@ -353,17 +361,20 @@ It uses protocol/provider pairs and dependency injection mechanisms provided by
This service is currently designed to work with MongoDB and uses an aggregation pipeline to produce search results.

Typical sequence of events is as follows:

1. Requests are received by the API, then directed to the QueryHandler in the core.

2. From there, the configuration is consulted to retrieve any facetable properties for the searched resource class.
2. From there, the configuration is consulted to retrieve any facetable and selected fields for the searched resource class.

3. The search parameters and facet fields are passed to the Aggregator, which builds and runs the aggregation pipeline on the appropriate collection. The aggregation pipeline is a series of stages run in sequence:
- The first stage runs a text match using the query string.
- The second stage applies a sort based on the IDs.
- The third stage applies any filters supplied in the search parameters.
- The fourth stage extract facets.
- The fifth/final stage transforms the results structure into {facets, hits, hit count}.
4. Once retrieved in the Aggregator, the results are passed back to the QueryHandler where they are shoved into a QueryResults pydantic model for validation before finally being sent back to the API.
1. Run a text match using the query string.
2. Apply a sort based on the IDs.
3. Apply any filters supplied in the search parameters.
4. Extract the facets.
5. Keep only selected fields if some have been specified.
6. Transform the results structure into {facets, hits, hit count}.

4. Once retrieved in the Aggregator, the results are passed back to the QueryHandler where they are shoved into a QueryResults Pydantic model for validation before finally being sent back to the API.


## Development
33 changes: 21 additions & 12 deletions config_schema.json
@@ -1,24 +1,24 @@
{
"$defs": {
"FacetLabel": {
"description": "Contains the key and corresponding user-friendly name for a facet",
"FieldLabel": {
"description": "Contains the field name and corresponding user-friendly name",
"properties": {
"key": {
"description": "The raw facet key, such as study.type",
"description": "The raw field name, such as study.type",
"title": "Key",
"type": "string"
},
"name": {
"default": "",
"description": "The user-friendly name for the facet",
"description": "A user-friendly name for the field (leave empty to use the key)",
"title": "Name",
"type": "string"
}
},
"required": [
"key"
],
"title": "FacetLabel",
"title": "FieldLabel",
"type": "object"
},
"SearchableClass": {
@@ -29,18 +29,27 @@
"title": "Description",
"type": "string"
},
"facetable_properties": {
"description": "A list of of the facetable properties for the resource type",
"facetable_fields": {
"default": [],
"description": "A list of the facetable fields for the resource type (leave empty to not use faceting)",
"items": {
"$ref": "#/$defs/FacetLabel"
"$ref": "#/$defs/FieldLabel"
},
"title": "Facetable Properties",
"title": "Facetable Fields",
"type": "array"
},
"selected_fields": {
"default": [],
"description": "A list of the returned fields for the resource type (leave empty to return all)",
"items": {
"$ref": "#/$defs/FieldLabel"
},
"title": "Selected Fields",
"type": "array"
}
},
"required": [
"description",
"facetable_properties"
"description"
],
"title": "SearchableClass",
"type": "object"
@@ -103,7 +112,7 @@
"additionalProperties": {
"$ref": "#/$defs/SearchableClass"
},
"description": "A collection of searchable_classes with facetable properties",
"description": "A collection of searchable_classes with facetable and selected fields",
"title": "Searchable Classes",
"type": "object"
},
7 changes: 6 additions & 1 deletion example_config.yaml
@@ -28,13 +28,18 @@ resource_upsertion_event_type: searchable_resource_upserted
searchable_classes:
Dataset:
description: Dataset grouping files under controlled access.
facetable_properties:
facetable_fields:
- key: type
name: Type
- key: study.type
name: Study Type
- key: study.project.alias
name: Project Alias
selected_fields:
- key: accession
name: Dataset ID
- key: title
name: Title
service_instance_id: '001'
service_name: mass
workers: 1
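
The renamed config structure can be modelled in a few lines to show how the documented defaults behave. The sketch below uses plain dataclasses rather than the service's own Pydantic models, which are not shown in this diff. The field names and defaults follow `config_schema.json`, while the helpers `display_name` and `keep_selected` are hypothetical and only handle top-level keys.

```python
from dataclasses import dataclass, field


@dataclass
class FieldLabel:
    key: str        # raw field name, such as "study.type"
    name: str = ""  # user-friendly name; empty means "use the key"

    def display_name(self) -> str:
        """Hypothetical helper: fall back to the key if no name is set."""
        return self.name or self.key


@dataclass
class SearchableClass:
    description: str
    facetable_fields: list[FieldLabel] = field(default_factory=list)
    selected_fields: list[FieldLabel] = field(default_factory=list)


def keep_selected(resource: dict, cls: SearchableClass) -> dict:
    """Hypothetical helper: an empty selection returns all fields."""
    if not cls.selected_fields:
        return dict(resource)
    wanted = {label.key for label in cls.selected_fields}
    return {key: value for key, value in resource.items() if key in wanted}


dataset = SearchableClass(
    description="Dataset grouping files under controlled access.",
    selected_fields=[FieldLabel("accession", "Dataset ID"), FieldLabel("title")],
)
resource = {"accession": "DS1", "title": "Example", "type": "A"}
print(keep_selected(resource, dataset))
print(dataset.selected_fields[1].display_name())
```

With `selected_fields` left at its default, `keep_selected` returns the resource unchanged, which matches the "leave empty to return all" wording in the schema.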