feature: Recrawl failed links #95
AhmadMuj commented on Apr 11, 2024
- Added a backoff for the Links Crawler Queue
- Added a new field (`crawlStatus`) to the bookmarkLinks table
- Added a new button to the web UI to recrawl the failed URLs based on `crawlStatus`
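For readers unfamiliar with queue backoff, a minimal sketch of an exponential delay calculation follows. The helper name `backoffDelayMs`, the 1-second base delay, and the 60-second cap are assumptions for illustration; the PR description does not state which values or which backoff strategy the crawler queue actually uses.

```typescript
// Hypothetical helper, not code from this PR.
// Returns the wait time before retry number `attempt` (1-based),
// doubling on every attempt and capped at `maxMs`.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000, // assumed base delay
  maxMs: number = 60_000, // assumed upper bound
): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be an integer >= 1");
  }
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

// Delays for the first five attempts of a failing crawl job.
const delays: number[] = [1, 2, 3, 4, 5].map((n) => backoffDelayMs(n));
console.log(delays);
```

Capping the delay keeps a repeatedly failing link from being pushed arbitrarily far into the future, while the doubling avoids hammering a site that is temporarily down.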
That was fast! It looks good to me, I requested only some minor changes :)
EDIT: could you also rebase your branch to get the PDF commits out of the PR? They are not that big of a problem, though :)
<TableRow>
  <TableCell className="lg:w-2/3">Failed Crawling Jobs</TableCell>
  <TableCell>{serverStats.failedCrawls}</TableCell>
</TableRow>
How about we change the table to be a bit more 2d?
As in:
| | Pending | Failed |
|---|---|---|
| Crawling jobs | 0 | 0 |
| Inference jobs | 0 | 0 |
| Search jobs | 0 | - |
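One way to back a two-dimensional table like this is a per-queue stats object with a pending and a failed count. The sketch below is only an illustration: the `QueueStats` type, the `toRows` helper, and the use of `null` for "failures not tracked" are assumptions, not the shape `serverStats` has in this PR.

```typescript
// Hypothetical shape; `failed` is null where failures are not tracked.
interface QueueStats {
  pending: number;
  failed: number | null;
}

type JobKind = "Crawling jobs" | "Inference jobs" | "Search jobs";

// Turns the stats object into table rows: [label, pending, failed].
// A null failure count is rendered as "-".
function toRows(stats: Record<JobKind, QueueStats>): string[][] {
  return (Object.keys(stats) as JobKind[]).map((kind) => [
    kind,
    String(stats[kind].pending),
    stats[kind].failed === null ? "-" : String(stats[kind].failed),
  ]);
}

const rows: string[][] = toRows({
  "Crawling jobs": { pending: 0, failed: 0 },
  "Inference jobs": { pending: 0, failed: 0 },
  "Search jobs": { pending: 0, failed: null },
});
console.log(rows);
```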
For now, the indexing and OpenAI stats are going to be based on the queue status.
packages/trpc/routers/admin.ts
Outdated
@@ -59,7 +69,21 @@ export const adminAppRouter = router({
        ),
      );
    }),

  recrawlFailedLinks: adminProcedure.mutation(async ({ ctx }) => {
It seems that we can easily add an `onlyFailures` flag to the existing `recrawlAllLinks` instead of creating a new endpoint?
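A rough sketch of that idea, with the selection logic pulled out into a plain function so it can be read on its own. The `BookmarkLink` type, the `selectLinksToRecrawl` name, and the `"pending"` status value are assumptions for illustration; only `'failure'` and `'success'` appear in this PR's migration, and the real endpoint would query the database rather than filter an array.

```typescript
// Hypothetical types; "pending" is an assumed status value.
type CrawlStatus = "pending" | "failure" | "success";

interface BookmarkLink {
  id: string;
  crawlStatus: CrawlStatus | null;
}

// Picks the link ids to enqueue for crawling.
// With onlyFailures = false every link is selected (recrawl all);
// with onlyFailures = true only links marked "failure" are selected.
function selectLinksToRecrawl(
  links: BookmarkLink[],
  opts: { onlyFailures: boolean },
): string[] {
  return links
    .filter((link) => !opts.onlyFailures || link.crawlStatus === "failure")
    .map((link) => link.id);
}

const sampleLinks: BookmarkLink[] = [
  { id: "a", crawlStatus: "success" },
  { id: "b", crawlStatus: "failure" },
  { id: "c", crawlStatus: null },
];

console.log(selectLinksToRecrawl(sampleLinks, { onlyFailures: false }));
console.log(selectLinksToRecrawl(sampleLinks, { onlyFailures: true }));
```

Keeping a single endpoint with a boolean input means the web button for failed URLs and the existing "recrawl all" action share one code path.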
UPDATE bookmarkLinks SET crawlStatus = 'failure' WHERE htmlContent IS NULL;--> statement-breakpoint
UPDATE bookmarkLinks SET crawlStatus = 'success' WHERE htmlContent IS NOT NULL;
I love that!
Thanks again, I'm pretty sure a lot of people will find this to be super useful!