Feature - Gen AI data ingestion workflow / pipeline (gitlab based) #1706

Open
Benvii opened this issue Aug 5, 2024 · 0 comments
Benvii commented Aug 5, 2024

Data ingestion is a complex task, and ingested documents need to be refreshed / renewed continuously. For now, this task can be performed using our basic Python tooling, available in tock-llm-indexing-tools.

This is currently done manually. We are going to automate it a bit more and also include testing features based on Langfuse datasets.

Our approach will be based on GitLab pipelines. This solution is simple and will let us schedule data ingestion runs or even trigger them using GitLab's API. We will also be able to keep track of each ingestion job and its state in GitLab.
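As a rough illustration of the "trigger using GitLab's API" part, here is a minimal sketch that calls GitLab's pipeline trigger endpoint (`POST /projects/:id/trigger/pipeline`) using only the Python standard library. The function names, the instance URL, the project path and the `SOURCE_URL` variable are hypothetical placeholders, not part of the existing tooling or of any approved design.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_trigger_request(base_url, project, trigger_token, ref, variables=None):
    """Build the POST request for GitLab's pipeline trigger endpoint.

    `project` is either a numeric project ID or a "group/project" path,
    which has to be URL-encoded when used in the API path.
    """
    project_id = quote(str(project), safe="")
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/trigger/pipeline"

    # The trigger token and the target ref are sent as form fields.
    fields = [("token", trigger_token), ("ref", ref)]
    # Pipeline variables are passed as variables[KEY]=value form fields.
    for key, value in (variables or {}).items():
        fields.append((f"variables[{key}]", value))

    return Request(url, data=urlencode(fields).encode("utf-8"), method="POST")


def trigger_ingestion(base_url, project, trigger_token, ref="main", variables=None):
    """Send the trigger request and return GitLab's decoded JSON response."""
    request = build_trigger_request(base_url, project, trigger_token, ref, variables)
    with urlopen(request) as response:
        return json.load(response)
```

A call could look like `trigger_ingestion("https://gitlab.example.com", "my-group/ingestion", token, variables={"SOURCE_URL": "https://example.org/docs"})`, where every value is a placeholder. Scheduled runs would not need this code at all, since they can be configured directly as GitLab pipeline schedules.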

Related issues and PRs:

The technical design needs to be approved before any development work starts. It will also serve as documentation for future contributors.
