Real-time ingest feature #466

Open
eric-gardyn opened this issue Aug 12, 2024 · 5 comments
Comments

@eric-gardyn
Contributor

Hi,

Is there a way to use the Ingest package to be more "real-time", API driven?
Use case:
We have an FAQ which is updated quite often in a CMS.
Goal would be to trigger an ingestion of the content on every Create/Update/Delete operation in the CMS.

Is this possible with little effort?

@mongodben
Collaborator

hi eric, long time no talk 😄

currently, there is no support for that, though you could write some custom routes/endpoints that wrap the ingest logic in the mongodb-rag-ingest package.

relatedly, we're moving some of that logic to the mongodb-rag-core library soon - #455 (though you'll still be able to consume it from the mongodb-rag-ingest lib)

@eric-gardyn
Contributor Author

FWIW, I now have the 'ingest' running as an endpoint on an Azure function app (serverless function).
Just had to tweak the 'loadConfig' method in WithConfig.ts (I am running the repo's TypeScript files for 'rag-ingest') to correctly load the config.
Otherwise, it works; it even helped me find a "bug" in my config object ;)

Next step is wrapper code that takes the source of the modified content (in my case, an external CMS) and calls the serverless endpoint accordingly.
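
A minimal sketch of what such a wrapper could look like, assuming the CMS sends webhook events and that each CMS collection maps to one ingest data source. The `CmsEvent` shape, the `SOURCE_BY_COLLECTION` mapping, and the `{ sourceNames: [...] }` request body are all hypothetical; they are not taken from the CMS or from mongodb-rag-ingest. It also assumes that re-running ingest for a source picks up deletions.

```typescript
// Hypothetical shape of a CMS webhook payload; a real CMS will differ.
type CmsEvent = {
  operation: "create" | "update" | "delete";
  collection: string; // e.g. "faq"
};

// Assumed mapping from CMS collection to ingest data source name.
const SOURCE_BY_COLLECTION: Record<string, string> = {
  faq: "cms-faq",
};

// Build the request for the ingest endpoint without sending it, so the
// mapping logic can be checked in isolation. Every operation type triggers
// a re-ingest of the affected source; duplicates are collapsed.
function buildIngestRequest(events: CmsEvent[], ingestUrl: string) {
  const sourceNames = new Set<string>();
  for (const event of events) {
    const source = SOURCE_BY_COLLECTION[event.collection];
    if (source !== undefined) sourceNames.add(source);
  }
  if (sourceNames.size === 0) return null; // nothing relevant changed
  return {
    url: ingestUrl,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Assumed body shape: { sourceNames: string[] }
      body: JSON.stringify({ sourceNames: Array.from(sourceNames) }),
    },
  };
}
```

The returned object can be passed to `fetch(request.url, request.init)` on Node 18 or later; a `null` result means no configured source was affected and no call is needed.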

@mongodben
Collaborator

> FWIW, I now have the 'ingest' running as an endpoint on an Azure function app (serverless function).

nice! just to clarify what you mean, did you create an endpoint that's like POST /ingest to trigger the ingestion process?

did you make separate pages/embed endpoints? are there path parameters to do it by data source, i.e. POST /ingest/pages/:sourceName?


somewhat related, i think it would be really neat to have embedding occur as an event-based process whenever a page is updated. would be pretty straightforward with MongoDB change streams. you'd just need to build some basic event queue to process the page creation/change/deletion events to take into account rate limit issues with the embedding models.
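
A rough sketch of the kind of event queue described above, written without any MongoDB dependency so it stays self-contained. The `PageEvent` shape, `PageEventQueue`, and `drainQueue` are hypothetical names; with change streams, each change event would be translated into a `PageEvent` and passed to `enqueue`. Batch size and delay are placeholders to tune against the embedding provider's actual rate limits.

```typescript
// Hypothetical event shape; a change stream listener would build these
// from each change event's operation type and document key.
type PageEvent = {
  pageId: string;
  kind: "upsert" | "delete";
};

class PageEventQueue {
  // Map keeps insertion order, so the oldest pending page comes out first.
  private pending = new Map<string, PageEvent>();

  // A later event for the same page overwrites the earlier one, so a page
  // edited several times in a row is only processed once.
  enqueue(event: PageEvent): void {
    this.pending.set(event.pageId, event);
  }

  // Remove and return up to maxBatchSize pending events.
  takeBatch(maxBatchSize: number): PageEvent[] {
    const batch = Array.from(this.pending.values()).slice(0, maxBatchSize);
    for (const event of batch) this.pending.delete(event.pageId);
    return batch;
  }

  get size(): number {
    return this.pending.size;
  }
}

// Process the queue in batches, pausing between batches to stay under the
// embedding model's rate limit. handleBatch is supplied by the caller and
// is where pages would be embedded or removed.
async function drainQueue(
  queue: PageEventQueue,
  handleBatch: (batch: PageEvent[]) => Promise<void>,
  options: { maxBatchSize: number; delayMs: number }
): Promise<void> {
  if (options.maxBatchSize < 1) throw new Error("maxBatchSize must be >= 1");
  while (queue.size > 0) {
    await handleBatch(queue.takeBatch(options.maxBatchSize));
    if (queue.size > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, options.delayMs));
    }
  }
}
```

Coalescing by page ID is a design choice: it trades strict event ordering for fewer embedding calls, which seems reasonable when only the latest state of a page matters.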

@eric-gardyn
Contributor Author

yes, POST /ingest, which takes an array of strings in the request body.
and basically just using withConfig like so:

    const resp = await withConfig(doAllCommand, { doPagesCommand, config, sourceNames })

changed doAllCommand args to

    type DoAllCommandArgs = {
      doPagesCommand: typeof standardDoPagesCommand
      sourceNames?: string[]
    }

and updated doAllCommand to call

    await doPagesCommand(config, { source: sourceNames })

    await doEmbedCommand(config, {
      since: lastSuccessfulRunDate ?? new Date('2023-01-01'),
      source: sourceNames,
    })

doPagesCommand and doEmbedCommand already took 'source' as string[]
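
For anyone building a similar endpoint, here is a small sketch of validating the request body before handing `sourceNames` to the ingest commands. It assumes the body is a JSON object of the form `{ sourceNames?: string[] }`, where an absent field means "all sources" (mirroring the optional `sourceNames` in `DoAllCommandArgs`). `parseIngestBody` and `ParseResult` are hypothetical names and not part of mongodb-rag-ingest.

```typescript
type ParseResult =
  | { ok: true; sourceNames: string[] | undefined }
  | { ok: false; error: string };

// Validate an already JSON-parsed request body for POST /ingest.
function parseIngestBody(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { ok: false, error: "Body must be a JSON object" };
  }
  const { sourceNames } = body as { sourceNames?: unknown };
  // No sourceNames given: let the ingest run over every configured source.
  if (sourceNames === undefined) {
    return { ok: true, sourceNames: undefined };
  }
  const isValid =
    Array.isArray(sourceNames) &&
    sourceNames.length > 0 &&
    sourceNames.every((name) => typeof name === "string" && name.length > 0);
  if (!isValid) {
    return { ok: false, error: "sourceNames must be a non-empty array of non-empty strings" };
  }
  return { ok: true, sourceNames: sourceNames as string[] };
}
```

A handler could return HTTP 400 with `error` when `ok` is false, and otherwise pass `sourceNames` through to the `withConfig(doAllCommand, ...)` call shown above.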

@mongodben
Collaborator

nice. this is great feedback. realistically, i don't think we'll create an ingest API anytime soon, since we don't have a need for one on our end. however, i would like to cleanly expose the ingestion methods so you or others can do what you've done without having to do anything hacky. something like a "MongoDB RAG Ingest SDK".
