Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate unified backends #416

Merged
merged 30 commits into from
Oct 12, 2023
Merged

Integrate unified backends #416

merged 30 commits into from
Oct 12, 2023

Conversation

wilbrdt
Copy link
Contributor

@wilbrdt wilbrdt commented Aug 18, 2023

Purpose

After unifying database and storage backends under a common data
interface, backends settings and methods have changed.

Proposal

  • Move Stream backends settings alongside backend class
  • Move HTTP backends settings alongside backend class
  • Delete database and storage backends
  • Move backends settings from conf.py to backends/conf.py to avoid a
    circular import (e.g. conf.py -> es.py -> conf.py)
  • Remove _class_path attributes from backends settings
  • Change utils.py methods to get backend type and backend instance
  • Change cli.py to use new unified backends
  • Adapt cli.py to be able to use async backends
  • Rebase and make modifications when async mongo backend is merged on unify-data-backends
  • Change router to use new unified backends (both sync and async)
  • Adapt CLI and API tests
  • Add tests for new utils functions
  • Make sure to not forget any new features/changes added on database backends only, after the work on backends unification started (thinking about PR Add LRS statement query parameter Pydantic model #426 , PR 🎨(api) convert camelCase variables to snake_case #436)

@wilbrdt wilbrdt self-assigned this Aug 18, 2023
@wilbrdt wilbrdt added this to the 4.0 milestone Aug 18, 2023
.env.dist Outdated Show resolved Hide resolved
.env.dist Outdated Show resolved Hide resolved
.env.dist Outdated Show resolved Hide resolved
.env.dist Outdated Show resolved Hide resolved
.env.dist Outdated Show resolved Hide resolved
.env.dist Show resolved Hide resolved
src/ralph/backends/http/__init__.py Outdated Show resolved Hide resolved
src/ralph/backends/stream/__init__.py Outdated Show resolved Hide resolved
src/ralph/utils.py Outdated Show resolved Hide resolved
src/ralph/utils.py Outdated Show resolved Hide resolved
@quitterie-lcs
Copy link
Contributor

@wilbrdt wilbrdt force-pushed the unify-data-backends branch from 25726e7 to c94d0f4 Compare August 21, 2023 07:51
@wilbrdt wilbrdt force-pushed the integrate-unified-backends branch from 2db0f3f to 43adbfd Compare August 21, 2023 10:19
@wilbrdt wilbrdt force-pushed the unify-data-backends branch from 62dd60e to 2058395 Compare September 6, 2023 07:24
@wilbrdt wilbrdt force-pushed the integrate-unified-backends branch 3 times, most recently from 4e2de6b to 678b0a6 Compare September 7, 2023 13:43
Copy link
Contributor

@Leobouloc Leobouloc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a partial review (up to statements.py)

.env.dist Show resolved Hide resolved
.env.dist Show resolved Hide resolved
src/ralph/utils.py Outdated Show resolved Hide resolved
src/ralph/utils.py Show resolved Hide resolved
src/ralph/backends/conf.py Show resolved Hide resolved
src/ralph/utils.py Show resolved Hide resolved
src/ralph/api/routers/health.py Outdated Show resolved Hide resolved
src/ralph/api/routers/statements.py Outdated Show resolved Hide resolved
src/ralph/api/routers/statements.py Show resolved Hide resolved
src/ralph/api/routers/statements.py Outdated Show resolved Hide resolved
src/ralph/cli.py Show resolved Hide resolved
src/ralph/api/routers/statements.py Show resolved Hide resolved
src/ralph/api/routers/statements.py Outdated Show resolved Hide resolved
@quitterie-lcs quitterie-lcs force-pushed the integrate-unified-backends branch from 205036d to ccc73fa Compare September 19, 2023 16:03
src/ralph/cli.py Outdated Show resolved Hide resolved
@quitterie-lcs quitterie-lcs force-pushed the integrate-unified-backends branch 2 times, most recently from ec0cac2 to 7428a7e Compare September 22, 2023 16:05
Copy link
Contributor Author

@wilbrdt wilbrdt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alt text

src/ralph/api/routers/statements.py Outdated Show resolved Hide resolved
src/ralph/api/routers/statements.py Outdated Show resolved Hide resolved
src/ralph/backends/data/base.py Show resolved Hide resolved
Copy link
Contributor

@Leobouloc Leobouloc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat ! Looking forward to this being merged.

src/ralph/backends/http/async_lrs.py Show resolved Hide resolved
src/ralph/backends/http/async_lrs.py Outdated Show resolved Hide resolved
src/ralph/backends/http/async_lrs.py Show resolved Hide resolved
src/ralph/backends/http/base.py Show resolved Hide resolved
src/ralph/backends/lrs/clickhouse.py Outdated Show resolved Hide resolved
tests/api/test_statements_put.py Show resolved Hide resolved
tests/api/test_statements_put.py Show resolved Hide resolved
tests/api/test_statements_put.py Show resolved Hide resolved
tests/backends/test_conf.py Show resolved Hide resolved
tests/test_cli.py Outdated Show resolved Hide resolved
@quitterie-lcs quitterie-lcs force-pushed the integrate-unified-backends branch 4 times, most recently from 4752d8d to 06cc6bb Compare October 4, 2023 11:08
wilbrdt and others added 26 commits October 12, 2023 09:52
We add the OpenStack Swift data backend implementation. With the `data`
parameter changed to an Iterable, we cannot use high level SwiftService API to
upload files anymore (it needs a file object source, not an iterable).
Changing to Connection lower-level API, which is more flexible.
We add the LDP data backend implementation that is mostly taken
from the existing LDPStorage backend.
Add S3 backend under the new common `data` interface
Add ClickHouse backend under the new common 'data' interface.
With ClickHouse under the new data interface, tests are updated as well.
Storage and Database backends had similar interfaces and usage,
so a new Data Backend interface has been created.
We want to simplify our tests that are mocking the request package.
Therefore we choose to use the `request_mock` library.
We add the ES data backend implementation that is mostly taken
from the existing ESDatabase backend.
We want to provide an LRS backend that could be added to
Ralph's LRS without any additional dependencies.
Methods `read_raw` and `parse_bytes_to_dict` are generic and used by multiple
backends. Moving them to file `utils.py`.
Add asynchronous base interface for async backends such as async_es or
async_mongo
We add the Async elasticsearch data backend mostly taken from the sync backend
using the async elasticsearch methods.
We want to improve the current mongo data backend implementation
to align it more with other unified data backends.
We want to provide an async version of our MongoDatabase
backend.
- `get_query` method for Elasticsearch would be better namespaced under the
ESLRSBackend.
Changing it to a static method instead of a global function.
- At initialization, data backends can either take settings or None.
Setting `settings_class` to Optional to anticipate mypy warning when mypy will
be added.
- Piping x|None is preferred since Python 3.10, changing from Optional to
Union[x|None] for backends as it would be easier to switch to pipes.
- Changes to backend methods docstrings
- Rename variable `new_documents` to be more explicit
Synchronous backends such as Elasticsearch or Mongo need their connection to
be closed when finished. This commits adds an abstract method close to the
BaseDataBackend interface, and implements it for backends that need it.
The update to a recent version of `motor` highlighted a bug on our side when
listing collections. Now asynchronously iterate over collections list.
With the new data backend interface, settings are now close to each backend
and not under general conf.py.
Unifying stream backend WS to have the same architecture as data backends.
With the new data backend interface, settings are now close to each backend
and not under general conf.py.
Unifying HTTP backends to have the same architecture as data backends.
After unifying database and storage backends under a common interface, backends
settings are now handled directly alongside the backends classes.
Modifying the CLI to support new settings and new backends interfaces.
With the addition of new asynchronous backends, it could be useful to be able
to use them in the CLI.
Adding a default value for ClickHouse client option
`allow_experimental_object_type` highlights a pydantic validation error with
type `Literal[0,1]`. Switching to `coint`.
With addition of unified backends and changes to the conf files, API router
needs some changes to be able to get the backends instance.
With addition of unified backends, API router needs some changes to be able to
use asynchronous backends.
Tests using filesystem failed with pyfakefs in the CI as pyfakefs does not
succeed on creating requesting files in the default directory path. The latter
is then defined specifically for these tests and forced to be used in the ralph
command.
Environment variables `RALPH_BACKENDS__DATABASE__ES__*` have been renamed to
`RALPH_BACKENDS__DATA__ES__*`. Changing them in the `tray`.
Changed

- Refactor `database` and `storage` backends under the unified `data` backend
interface [BC]
- Refactor LRS `query_statements` and `query_statements_by_ids` backends
methods under the unified `lrs` backend interface [BC]
Following `pylint` upgrade to version > 3.0 , a false negative was corrected
resulting in many warnings about methods having too many arguments. Escaping
these warnings.
@wilbrdt wilbrdt force-pushed the integrate-unified-backends branch from 96beb9e to 27143be Compare October 12, 2023 08:11
@wilbrdt wilbrdt changed the base branch from unify-data-backends to master October 12, 2023 08:11
@quitterie-lcs quitterie-lcs merged commit 7cda560 into master Oct 12, 2023
@quitterie-lcs quitterie-lcs deleted the integrate-unified-backends branch October 12, 2023 08:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants