fix: catch uncaught exceptions & gc handles request aborts #102
Changes from all commits
```diff
@@ -68,6 +68,7 @@
 * | `ECHO_HEADERS` | A debug flag to indicate whether you want to output request and response headers | `false` |
 * | `USE_DELEGATED_ROUTING` | Whether to use the delegated routing v1 API | `true` |
 * | `DELEGATED_ROUTING_V1_HOST` | Hostname to use for delegated routing v1 | `https://delegated-ipfs.dev` |
+* | `RECOVERABLE_ERRORS` | A comma delimited list of errors to recover from. These errors are checked in `uncaughtException` and `unhandledRejection` callbacks | `all` |
 *
 * <!--
 * TODO: currently broken when used in docker, but they work when running locally (you can cache datastore and blockstore locally to speed things up if you want)
```
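For context, `RECOVERABLE_ERRORS` and `ALLOW_UNHANDLED_ERROR_RECOVERY` are imported from `./constants.js` in the next hunk, but that file is not part of the diff shown here. Below is a minimal sketch of how such values might be derived from the environment; the parsing logic and defaults are assumptions, not the PR's actual code:

```ts
// Hypothetical sketch only: the real constants.js is not shown in this diff,
// so the parsing and defaults here are assumptions.
const raw = process.env.RECOVERABLE_ERRORS ?? 'all'

// 'all' disables filtering; otherwise keep the comma-delimited list of error
// codes/names that the uncaughtException/unhandledRejection handler may ignore.
export const RECOVERABLE_ERRORS: string | string[] =
  raw === 'all' ? 'all' : raw.split(',').map(entry => entry.trim())

// Master switch for the recovery behaviour (default assumed to be enabled).
export const ALLOW_UNHANDLED_ERROR_RECOVERY: boolean =
  (process.env.ALLOW_UNHANDLED_ERROR_RECOVERY ?? 'true') === 'true'
```

With that shape, setting e.g. `RECOVERABLE_ERRORS=ERR_STREAM_PREMATURE_CLOSE` would recover only from that error code and shut down gracefully on anything else.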
```diff
@@ -159,7 +160,7 @@ import compress from '@fastify/compress'
 import cors from '@fastify/cors'
 import Fastify from 'fastify'
 import metricsPlugin from 'fastify-metrics'
-import { HOST, PORT, METRICS, ECHO_HEADERS, FASTIFY_DEBUG } from './constants.js'
+import { HOST, PORT, METRICS, ECHO_HEADERS, FASTIFY_DEBUG, RECOVERABLE_ERRORS, ALLOW_UNHANDLED_ERROR_RECOVERY } from './constants.js'
 import { HeliaServer, type RouteEntry } from './helia-server.js'
 import { logger } from './logger.js'
```
```diff
@@ -249,7 +250,7 @@ const stopWebServer = async (): Promise<void> => {
 }
 
 let shutdownRequested = false
-async function closeGracefully (signal: number): Promise<void> {
+async function closeGracefully (signal: number | string): Promise<void> {
   log(`Received signal to terminate: ${signal}`)
   if (shutdownRequested) {
     log('closeGracefully: shutdown already requested, exiting callback.')
```
```diff
@@ -268,3 +269,15 @@ async function closeGracefully (signal: number): Promise<void> {
   // eslint-disable-next-line @typescript-eslint/no-misused-promises
   process.once(signal, closeGracefully)
 })
+
+const uncaughtHandler = (error: any): void => {
+  log.error('Uncaught Exception:', error)
+  if (ALLOW_UNHANDLED_ERROR_RECOVERY && (RECOVERABLE_ERRORS === 'all' || RECOVERABLE_ERRORS.includes(error?.code) || RECOVERABLE_ERRORS.includes(error?.name))) {
+    log.trace('Ignoring error')
```
A review thread is attached to the `log.trace('Ignoring error')` line above:

**Reviewer:** This isn't considered best practice - https://nodejs.org/api/process.html#warning-using-uncaughtexception-correctly

**Author:** I understand, but we need some kind of error recovery instead of allowing things to just die like they were in Tiros. We could probably default this to FALSE and add a warning in the readme. Tiros needs this, and probably some restarting of the server (to follow best practices).

**Author:** I started to just listen for specific errors, but this change allows us to recover from anything, which could cause problems in the future: there is a case where some error in libp2p/helia/fastify/helia-server.ts could happen that is unrecoverable, and this does an infinite loop of "On error resume next" and we don't find out until money has been eaten up running a dead service. However, we still need to block this server from dying in instances where we know we can safely recover, and unblocking Tiros was foremost on my mind.

**Reviewer:** The point of the linked warning is that you can't tell if you can safely recover, so the only safe thing you can do is exit the process and restart. If we have unhandled exceptions being thrown, these are bugs that should be fixed.

**Author:** I agree they're bugs that should be fixed, but should we allow a server to die given underlying library bugs if they're recoverable? Given the expectation of helia-http-server, I don't think so. It's supposed to fetch content using helia/libp2p and return a result. If they die when fetching, we should certainly return a 500 error instead, and recover, right? BTW, the only other place I saw... edit: or stream_premature_close error is coming from fastify req/resp.

**Author:** I just realized we should add a listener on the req and resp stream for ERR_STREAM_PREMATURE_CLOSE...

**Author:** updated #112

**Reviewer:** Again, the point of the linked warning is that you can't tell if you can safely recover. Consider something like this:

```js
import fs from 'node:fs/promises'

process.on('uncaughtException', (err) => {
  if (err.message === 'recoverable') {
    // it's ok, it's recoverable
    return
  }
  console.error(err)
  process.exit(1)
})

const fd = await fs.open('/path/to/file.txt')
// something that causes an error
throw new Error('recoverable')
fd.close()
```

It looks recoverable, but the file descriptor is never closed so it leaks memory. Basically, if you're in an uncaught exception handler all bets are off.
The hunk continues after the thread:

```diff
+    return
+  }
+  void closeGracefully('SIGTERM')
+}
+
+process.on('uncaughtException', uncaughtHandler)
+process.on('unhandledRejection', uncaughtHandler)
```
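One comment in the thread above suggests listening for `ERR_STREAM_PREMATURE_CLOSE` directly on the request and response streams rather than relying on the process-level handler (the follow-up was reportedly made in #112, which is not shown here). A minimal sketch of that idea, assuming a Fastify instance named `app` and the same `log` helper as above; the hook placement and messages are illustrative, not the actual change:

```ts
// Hypothetical sketch: surface client aborts as per-request events instead of
// letting them bubble up to the process-level uncaughtException handler.
app.addHook('onRequest', async (request, reply) => {
  const onStreamError = (err: NodeJS.ErrnoException): void => {
    if (err.code === 'ERR_STREAM_PREMATURE_CLOSE') {
      log.trace('stream closed prematurely, the client probably aborted the request')
    } else {
      log.error('stream error', err)
    }
  }
  request.raw.on('error', onStreamError)
  reply.raw.on('error', onStreamError)
})
```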
A second review thread, on a timeout used elsewhere in the change:

**Reviewer:** Timeouts like this use resources and can keep the process running. Better to use `AbortSignal.timeout(ms)` and combine signals with `any-signal`.

**Author:** I should have cleared the timeout here, good catch, thanks. `any-signal` does not prevent duplicate handlers from being added to the same signal, so I prefer not to use it. I will open an issue to update this to `AbortSignal.timeout(50)` and `addEventListener`.

**Reviewer:** Are you passing the same signal to `anySignal` multiple times? Sounds like a bug if so.

**Author:** I'm not, but I know libraries that do :)

**Reviewer:** If you are not, then it should be okay to use it. If you know of libraries that do, can you please open issues or, better yet, PRs?
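For reference, the pattern the reviewer describes, an `AbortSignal.timeout()` combined with the incoming request's signal via `any-signal`, looks roughly like the sketch below. The wrapper function and its name are invented for illustration, and the 50 ms value comes from the comment above; this is not code from this repository:

```ts
import { anySignal } from 'any-signal'

// Hypothetical sketch: abort work when either the caller aborts or the timeout
// fires, without a dangling setTimeout keeping the event loop alive.
async function withRequestTimeout (requestSignal: AbortSignal, work: (signal: AbortSignal) => Promise<void>): Promise<void> {
  const signal = anySignal([requestSignal, AbortSignal.timeout(50)])
  try {
    await work(signal)
  } finally {
    // anySignal attaches listeners to its input signals; clear them once done.
    signal.clear()
  }
}
```

The reservation raised above concerns passing the same signal instance to `anySignal` more than once; with distinct signals, as sketched here, calling `clear()` removes the listeners it added.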