Trim the excessive logged stack trace for missing WARC files #420

Merged 1 commit into master from 414_missing_warc_exception on Oct 16, 2023

@tokee tokee commented Oct 16, 2023

Missing WARC files, typically caused by a dismounted network drive or a moved folder, cause excessive stack traces in the logs during playback of webpages. This pull request changes the code to:

  • Log the FileNotFoundException at the root level, in ArcSource.get().
  • Check explicitly for the FileNotFoundException at the calling level, in ArcParserFileResolver.getArcEntry(...), culling the stack trace and converting it to a proper HTTP 404 service exception.
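
For readers who want the pattern in isolation, here is a minimal sketch of the two steps above. It is not the code from this pull request: the class `MissingWarcSketch`, the exception `NotFoundServiceException`, and the method signatures are all hypothetical stand-ins, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.logging.Logger;

// Hypothetical sketch; names and signatures are assumptions, not SolrWayback's API.
class MissingWarcSketch {
    private static final Logger log = Logger.getLogger(MissingWarcSketch.class.getName());

    /** Hypothetical stand-in for a service exception that maps to HTTP 404. */
    static class NotFoundServiceException extends RuntimeException {
        NotFoundServiceException(String message) {
            super(message); // no cause attached, so the original stack trace is culled
        }

        int getStatus() {
            return 404;
        }
    }

    /** Root level: log the missing file once, with its path, then rethrow. */
    static InputStream openSource(String path) throws IOException {
        try {
            return new FileInputStream(path);
        } catch (FileNotFoundException e) {
            log.severe("FileNotFoundException trying to access (W)ARC '" + path + "'");
            throw e;
        }
    }

    /** Calling level: convert the missing file into a 404 without the trace. */
    static int getArcEntry(String path, long offset) {
        try (InputStream in = openSource(path)) {
            in.skip(offset);
            return in.read(); // first byte at the offset, standing in for a parsed entry
        } catch (FileNotFoundException e) {
            throw new NotFoundServiceException("Unable to locate (W)ARC '" + path + "'");
        } catch (IOException e) {
            // Other I/O problems are not "not found"; keep their cause attached.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try {
            getArcEntry("/no/such/dir/mywarc.warc.gz", 0L);
        } catch (NotFoundServiceException e) {
            System.out.println(e.getStatus() + ": " + e.getMessage());
        }
    }
}
```

The sketch catches `FileNotFoundException` before the broader `IOException`, so only a genuinely missing file becomes a 404, while other I/O failures keep their full cause for debugging.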

After this pull request, the log entries for a missing WARC will be:

2023-10-16 20:50:37 [http-nio-8080-exec-25] DEBUG dk.kb.netarchivesuite.solrwayback.service.SolrWaybackResource(SolrWaybackResource.java:933) - View from FilePath:/home/te/projects/wac2023/solrwayback_package_4.4.1/indexing/warcs1/mywarc.warc.gz offset:266225
2023-10-16 20:50:37 [http-nio-8080-exec-25] ERROR dk.kb.netarchivesuite.solrwayback.interfaces.ArcSource(ArcSource.java:90) - FileNotFoundException trying to access (W)ARC '/home/te/projects/wac2023/solrwayback_package_4.4.1/indexing/warcs1/mywarc.warc.gz'
2023-10-16 20:50:37 [http-nio-8080-exec-25] INFO  dk.kb.netarchivesuite.solrwayback.service.SolrWaybackResource(SolrWaybackResource.java:1135) - Handling serviceException:Unable to locate (W)ARC '/home/te/projects/wac2023/solrwayback_package_4.4.1/indexing/warcs1/mywarc.dk.warc.gz'

The ERROR entry (the "real" message) comes from the logger in ArcSource and enables closer bug finding. The INFO entry comes from the exception handler in SolrWaybackResource and is unavoidable with the current architecture, as the existing contract is that the exception message is logged there.

This closes #414

@tokee tokee self-assigned this Oct 16, 2023

@thomasegense thomasegense left a comment

Seems simple enough that I don't need to test it.
And I agree the stack trace was maybe too much. However, this should not happen.

@thomasegense thomasegense merged commit 3b1a45e into master Oct 16, 2023
4 checks passed
@thomasegense thomasegense deleted the 414_missing_warc_exception branch October 16, 2023 19:13
Successfully merging this pull request may close these issues.

Clean up error message on missing WARC
2 participants