Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List endpoint and pagination #61

Open
nsheff opened this issue Nov 1, 2023 · 10 comments
Open

List endpoint and pagination #61

nsheff opened this issue Nov 1, 2023 · 10 comments
Labels
enhancement New feature or request
Milestone

Comments

@nsheff
Copy link
Member

nsheff commented Nov 1, 2023

It's come up repeatedly that we will need some kind of list endpoint that will provide information about the available objects in the server. This will be required for a meta- aggregator-type service that would span services, to be aware of what each service provides.

Because there could be a large number of collections, the result must be paged. We intend to follow the GA4GH paging guidelines (which are still being developed).

Proposed API

  • GET /list with query parameters: ?page_token=abc123&page_size=1000

  • page_token would default to None, which indicates starting from the most recent collection

  • page_size could have a server-defined default, which recommendation of 1000 ?

Proposed return value

Here's a proposal for return value of list endpoint, using token-based paging. The terms are following a google standard for paging:

{
	"total_size": len(self.database),
	"page_size": page_size,
	"page_token": page_token,
	"next_page_token": "", 
	"items": ["xyz123", "abc456", ...]
}
  • items or collection_identifiers or collection_digests ?
  • should total_size be optional?
  • should prev_page_token be optional?

API thoughts

Our current API uses:

  • /collection
  • /comparison

Should this be: /list_collections, or /list, or simply /collections or /collections/list ?

@nsheff
Copy link
Member Author

nsheff commented Nov 1, 2023

In discussion today we decided:

  • everyone agreed the proposal above looks good as is, so I'll probably implement like this for my demo
  • we will add to the spec a notice that: for now, in 1.0 the spec will remain silent on such "discovery of sequence collections" functions.
  • we will revisit in 1.1, waiting for a good use case to drive what we should do.
  • the alternative to the above would be to use dataconnect. advantages: use another ga4gh standard, would add power to querying for different collections. disadvantage: it's overkill since we have only 1 table by definition, in simple use case our "table" is a single variable, and the dataconnect return values are bloated for large lists when all you have is a list, not a table.

@nsheff
Copy link
Member Author

nsheff commented Jul 29, 2024

Should the /list endpoint vary by the item you're trying to list? This would set us up for pangenomes:

/list/collections and /list/pangenomes so we'd be able to list items of both types.

@nsheff
Copy link
Member Author

nsheff commented Jul 29, 2024

@nsheff
Copy link
Member Author

nsheff commented Jul 29, 2024

@nsheff
Copy link
Member Author

nsheff commented Aug 6, 2024

I could see an alternative argument that it should be /collections/list instead of /list/collections.

@andrewyatz
Copy link
Collaborator

I'd personally keep /list/collections as it prioritises verb over noun. But suggesting that a preference to a particular construct of English grammar should influence design is a bit odd. I think you state the reason well in your previous comment.

As for the items in the original version around pagination and with v4 GA4GH guide in mind

  • page is missing if I've read it correctly
  • total is optional so we should continue that into here
  • The v4 guide suggests you can encapsulate the pagination information as a peer attribute of items called pagination that way you're not bleeding too much into the top level structure

Overall though list endpoint gets my thumbs up

@nsheff
Copy link
Member Author

nsheff commented Aug 7, 2024

Ok.Which of these would you use for the query param names?

  • GET /list/:object_type?limit=:limit&offset=:offset
  • GET /list/:object_type?page=:page&page_size=:page_size

I guess they are equivalent, just with different variable names. (and page is offset divided by page_size).

Here's the spec I've written, then:


3.3 List

  • Endpoint: GET /list/:object_type?page=:page&page_size=:page_size (REQUIRED)
  • Description: Lists identifiers for a given object type (e.g. collections). This endpoint provides a way to discover what sequence collections a servic provides.
  • Return value: The output is a paged list of identifiers following the GA4GH paging guide format, grouped into a results and a pagination section.

Example return value

{
  "results": [
    ...
  ],
  "pagination": [
    "page": 1,
    "page_size": 100,
    "total": 165,
  ]
}

@tcezard
Copy link
Collaborator

tcezard commented Aug 7, 2024

I agree with /list/collections.
It also lines up nicely with the comparison function /comparison/digest1/digest2

I'm not however of generalising to pangenomes just yet especially in the specs. I think it will make the specs harder to understand for to the reader considering, we're not working on the pangenome yet.

We can generalise later when pangenome is defined.

@sveinugu
Copy link
Collaborator

sveinugu commented Aug 7, 2024

I'd personally keep /list/collections as it prioritises verb over noun.

One should really try to avoid verbs in RESTful APIs, and limit the verbs to the HTTP methods. However, "list" can also be read as a noun, e.g. "list of collections". So using the HTTP GET method could be read as the sentence "get the list of collections", which is (I believe) as RESTful as it gets.

This is in line with our previous choice to use /comparison and not /compare as the endpoint.

@andrewyatz
Copy link
Collaborator

Ok.Which of these would you use for the query param names?

  • GET /list/:object_type?limit=:limit&offset=:offset
  • GET /list/:object_type?page=:page&page_size=:page_size

I guess they are equivalent, just with different variable names. (and page is offset divided by page_size).

Here's the spec I've written, then:

3.3 List

  • Endpoint: GET /list/:object_type?page=:page&page_size=:page_size (REQUIRED)
  • Description: Lists identifiers for a given object type (e.g. collections). This endpoint provides a way to discover what sequence collections a servic provides.
  • Return value: The output is a paged list of identifiers following the GA4GH paging guide format, grouped into a results and a pagination section.

Example return value

{
  "results": [
    ...
  ],
  "pagination": [
    "page": 1,
    "page_size": 100,
    "total": 165,
  ]
}

Number two for me. Also yes but with an associative array for pagination.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants