Observability #3393

Open
sbrunner opened this issue Sep 3, 2024 · 1 comment

Comments

@sbrunner
Member

sbrunner commented Sep 3, 2024

Introduction

Currently, the observability of the application is lacking, so this issue defines how it should work.

Note that this defines the general framework, not every specific case, even though we list the first wanted implementations.

We primarily target Kubernetes/Docker environments, so some of the terminology comes from that world.

Usage of the health checks.

This will update the result of the /metrics/healthcheck endpoint.

Examples of the responses

When we call the healthy method:

{
    "application": {
        "healthy": true,
        "message": "sbr test.",
        "duration": 0,
        "timestamp": "2024-08-29T13:44:58.398Z"
    }
}

Response code: HTTP code 200.

When we call the unhealthy method:

{
    "application": {
        "healthy": false,
        "message": "sbr test.",
        "duration": 0,
        "timestamp": "2024-08-29T13:44:58.398Z"
    }
}

Response code: HTTP status code 200 (or 500 when we set JAVA_OPTS to -DhttpStatusIndicator=true)
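
The status code rule for the healthy and unhealthy cases above can be sketched as follows. This is only an illustration of the behaviour described here, not the servlet's actual code: the class and method names are hypothetical, and only the httpStatusIndicator property name comes from this issue.

```java
// Sketch only: HealthStatusCode and its methods are hypothetical names.
final class HealthStatusCode {
    private HealthStatusCode() {}

    /** Returns the HTTP status code for a health check response. */
    static int statusFor(boolean allHealthy, boolean httpStatusIndicator) {
        if (allHealthy) {
            return 200;
        }
        // Unhealthy: the status code only reflects it when the option is enabled.
        return httpStatusIndicator ? 500 : 200;
    }

    /** Reads the option from the JVM system property set with -DhttpStatusIndicator=true. */
    static boolean indicatorFromSystemProperty() {
        return Boolean.getBoolean("httpStatusIndicator");
    }
}
```

With the option disabled, a client has to parse the JSON body to detect an unhealthy state, so a probe that only looks at the status code needs the option enabled.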

If we raise an exception:

{
    "application": {
        "healthy": false,
        "message": "sbr test.",
        "error": {
            "type": "java.lang.RuntimeException",
            "message": "sbr test.",
            "stack": [
                "org.mapfish.print.metrics.ApplicationStatus.check(ApplicationStatus.java:15)", 
                "com.codahale.metrics.health.HealthCheck.execute(HealthCheck.java:374)", 
                "com.codahale.metrics.health.HealthCheckRegistry.runHealthChecks(HealthCheckRegistry.java:184)", 
                "com.codahale.metrics.servlets.HealthCheckServlet.runHealthChecks(HealthCheckServlet.java:177)", 
                "com.codahale.metrics.servlets.HealthCheckServlet.doGet(HealthCheckServlet.java:146)", 
                "javax.servlet.http.HttpServlet.service(HttpServlet.java:529)", 
                "javax.servlet.http.HttpServlet.service(HttpServlet.java:623)", 
                "com.codahale.metrics.servlets.AdminServlet.service(AdminServlet.java:153)", 
                "javax.servlet.http.HttpServlet.service(HttpServlet.java:623)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:199)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:168)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:209)", 
                "com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:244)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:168)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "com.codahale.metrics.servlet.AbstractInstrumentedFilter.doFilter(AbstractInstrumentedFilter.java:112)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:168)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:352)", 
                "org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:117)", 
                "org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:83)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:126)", 
                "org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:120)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:131)", 
                "org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:85)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", "org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:100)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:164)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", "org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:63)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilterInternal(BasicAuthenticationFilter.java:168)",
                "org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.header.HeaderWriterFilter.doHeadersAfter(HeaderWriterFilter.java:90)", 
                "org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:75)", 
                "org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:62)", 
                "org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:117)", 
                "org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.access.channel.ChannelProcessingFilter.doFilter(ChannelProcessingFilter.java:133)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.session.DisableEncodeUrlFilter.doFilterInternal(DisableEncodeUrlFilter.java:42)", 
                "org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:225)", 
                "org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:190)", 
                "org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:354)", 
                "org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:267)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:168)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)", 
                "org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:168)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "org.mapfish.print.servlet.RequestSizeFilter.doFilter(RequestSizeFilter.java:40)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:168)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:168)", 
                "org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90)", 
                "org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:482)", 
                "org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:130)", 
                "org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93)", 
                "org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)", 
                "org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:346)", 
                "org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:388)", 
                "org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)", 
                "org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:936)", 
                "org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1791)", 
                "org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)", 
                "org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1190)", 
                "org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)", 
                "org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63)", 
                "java.base/java.lang.Thread.run(Thread.java:829)"
            ]                
        },
        "duration": 0,
        "timestamp": "2024-08-29T13:46:38.000Z"
    }
}

Response code: HTTP code 200.

Proposed usage

Use this endpoint only as a Kubernetes probe, so that the Pod is restarted automatically.

Use the healthy and unhealthy methods to change the reported status.

Add the option httpStatusIndicator=true in the file core/src/main/resources/mapfish-spring.properties.

As a first step, we should report an error when the queue is not being consumed even though it is not empty. In the long term, I think this check should look like this (it requires a time window):

If the queue is empty during the time window => healthy
If a print job ends during the time window => healthy
Otherwise => unhealthy
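
A minimal sketch of this time-window rule, using only the JDK. All names are hypothetical, and it assumes that "the queue is empty during the time window" means the queue was observed empty at least once inside the window; the container start is treated as such an observation so that a fresh container starts healthy.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch only: QueueConsumptionCheck is a hypothetical name, not existing code.
final class QueueConsumptionCheck {
    private final Duration window;
    private Instant lastQueueEmpty;
    private Instant lastJobEnded; // null until a first job has ended

    QueueConsumptionCheck(Duration window, Instant startedAt) {
        this.window = window;
        // Assumption: startup counts as "queue empty".
        this.lastQueueEmpty = startedAt;
    }

    /** To be called whenever the queue is seen empty. */
    synchronized void queueObservedEmpty(Instant at) {
        lastQueueEmpty = at;
    }

    /** To be called whenever a print job ends. */
    synchronized void jobEnded(Instant at) {
        lastJobEnded = at;
    }

    /** Healthy if the queue was empty or a job ended within the window before now. */
    synchronized boolean isHealthy(Instant now) {
        Instant windowStart = now.minus(window);
        boolean emptyInWindow = !lastQueueEmpty.isBefore(windowStart);
        boolean jobEndedInWindow =
                lastJobEnded != null && !lastJobEnded.isBefore(windowStart);
        return emptyInWindow || jobEndedInWindow;
    }
}
```

The window length has to be longer than the slowest expected print job, otherwise a single long job would make the container look unhealthy.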

We may also need a check that tests the creation of a CRS from an EPSG code; in the past, we got this exception on some containers:

java.lang.RuntimeException: EPSG:2056 was not recognized as a crs code
	at org.mapfish.print.output.Values.populateFromAttributes(Values.java:229)
	at org.mapfish.print.output.Values.<init>(Values.java:153)
	at org.mapfish.print.output.Values.<init>(Values.java:110)
	at org.mapfish.print.output.AbstractJasperReportOutputFormat.getJasperPrint(AbstractJasperReportOutputFormat.java:137)
	at org.mapfish.print.output.AbstractJasperReportOutputFormat.print(AbstractJasperReportOutputFormat.java:94)
	at org.mapfish.print.MapPrinter.print(MapPrinter.java:133)
	at org.mapfish.print.servlet.job.PrintJob.lambda$call$0(PrintJob.java:148)
	at org.mapfish.print.servlet.job.PrintJob.withOpenOutputStream(PrintJob.java:118)
	at org.mapfish.print.servlet.job.PrintJob.call(PrintJob.java:147)
	at org.mapfish.print.servlet.job.PrintJob.call(PrintJob.java:54)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: EPSG:2056 was not recognized as a crs code
	at org.mapfish.print.attribute.map.GenericMapAttribute.parseProjection(GenericMapAttribute.java:93)
	at org.mapfish.print.attribute.map.GenericMapAttribute$GenericMapAttributeValues.parseProjection(GenericMapAttribute.java:516)
	at org.mapfish.print.attribute.map.MapAttribute$MapAttributeValues.parseBounds(MapAttribute.java:164)
	at org.mapfish.print.attribute.map.MapAttribute$MapAttributeValues.postConstruct(MapAttribute.java:160)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.mapfish.print.parser.MapfishParser.parse(MapfishParser.java:138)
	at org.mapfish.print.attribute.ReflectiveAttribute.getValue(ReflectiveAttribute.java:428)
	at org.mapfish.print.output.Values.populateFromAttributes(Values.java:203)
	... 13 common frames omitted
Caused by: org.opengis.referencing.NoSuchAuthorityCodeException: No code "EPSG:2056" from authority "European Petroleum Survey Group" found for object of type "IdentifiedObject".
	at org.geotools.referencing.factory.AbstractAuthorityFactory.noSuchAuthorityCode(AbstractAuthorityFactory.java:874)
	at org.geotools.referencing.factory.PropertyAuthorityFactory.getWKT(PropertyAuthorityFactory.java:289)
	at org.geotools.referencing.factory.PropertyAuthorityFactory.createCoordinateReferenceSystem(PropertyAuthorityFactory.java:358)
	at org.geotools.referencing.factory.BufferedAuthorityFactory.createCoordinateReferenceSystem(BufferedAuthorityFactory.java:731)
	at org.geotools.referencing.factory.AuthorityFactoryAdapter.createCoordinateReferenceSystem(AuthorityFactoryAdapter.java:779)
	at org.geotools.referencing.factory.FallbackAuthorityFactory.createCoordinateReferenceSystem(FallbackAuthorityFactory.java:624)
	at org.geotools.referencing.factory.AuthorityFactoryAdapter.createCoordinateReferenceSystem(AuthorityFactoryAdapter.java:779)
	at org.geotools.referencing.factory.ThreadedAuthorityFactory.createCoordinateReferenceSystem(ThreadedAuthorityFactory.java:635)
	at org.geotools.referencing.DefaultAuthorityFactory.createCoordinateReferenceSystem(DefaultAuthorityFactory.java:176)
	at org.geotools.referencing.CRS.decode(CRS.java:517)
	at org.geotools.referencing.CRS.decode(CRS.java:433)
	at org.mapfish.print.attribute.map.GenericMapAttribute.parseProjection(GenericMapAttribute.java:88)
	... 23 common frames omitted
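
Such a check could look roughly like the sketch below. To keep it free of dependencies, the decoder is injected through a hypothetical CrsDecoder interface; in the application it would presumably wrap org.geotools.referencing.CRS.decode, which appears in the stack trace above. The class, interface and message texts are assumptions made for illustration.

```java
// Sketch only: CrsDecodeCheck, CrsDecoder and Result are hypothetical names.
final class CrsDecodeCheck {
    private CrsDecodeCheck() {}

    /** Stand-in for the real decoder, e.g. code -> CRS.decode(code). */
    @FunctionalInterface
    interface CrsDecoder {
        Object decode(String code) throws Exception;
    }

    /** Mirrors the healthy and message fields of the health check responses. */
    static final class Result {
        final boolean healthy;
        final String message;

        Result(boolean healthy, String message) {
            this.healthy = healthy;
            this.message = message;
        }
    }

    /** Tries to decode the given code and reports the outcome. */
    static Result check(String code, CrsDecoder decoder) {
        try {
            Object crs = decoder.decode(code);
            if (crs == null) {
                return new Result(false, code + " decoded to null");
            }
            return new Result(true, code + " decoded successfully");
        } catch (Exception e) {
            return new Result(false, code + " could not be decoded: " + e.getMessage());
        }
    }
}
```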

See also Jira issue.

Usage of the metrics.

The metrics should be reviewed and documented; currently they're a bit of a mess...

At first, we should add a gauge to observe the queue length and a timer to observe the total print duration.
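
To make the idea concrete, here is a library-free sketch of a queue length gauge and a print duration timer. In the application these would presumably be registered with the metrics library already in use; the class below, its name and its methods are purely hypothetical and only illustrate what the two metrics measure.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

// Sketch only: PrintJobMetrics is a hypothetical name, not existing code.
final class PrintJobMetrics {
    private final IntSupplier queueLength;
    private final LongAdder finishedJobs = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    /** The gauge reads the current queue length on demand. */
    PrintJobMetrics(IntSupplier queueLength) {
        this.queueLength = queueLength;
    }

    int queueLength() {
        return queueLength.getAsInt();
    }

    /** Records the total duration of one print job, in nanoseconds. */
    void recordPrintDuration(long nanos) {
        finishedJobs.increment();
        totalNanos.add(nanos);
    }

    /** Runs a job and records how long it took, even if it throws. */
    <T> T time(Supplier<T> job) {
        long start = System.nanoTime();
        try {
            return job.get();
        } finally {
            recordPrintDuration(System.nanoTime() - start);
        }
    }

    long finishedJobs() {
        return finishedJobs.sum();
    }

    /** Mean print duration in seconds, or 0 when no job has finished yet. */
    double meanSeconds() {
        long n = finishedJobs.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1e9 / n;
    }
}
```

A gauge is the natural fit for the queue length because it is a value sampled at read time, while the timer accumulates one observation per finished job.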

Then we should review all the metrics, check whether they work, update or remove them as needed, and add documentation.

Relevant metrics:

  • Around print jobs:
    • Number of waiting/running/successful/failed jobs, with arguments: application / template (new)
    • Time to process a print job, with arguments: application / template (new)
  • Around processors:
    • Time to process a processor, with arguments: application? / template? / processor type or class (new)
  • Around requests:
    • Time to process a request, with argument: host name (these exist; verify that they work, then rename and document them)

Current metrics:

  • HttpRequestFetcher:
    • timer on download
    • timer on read by host
    • counter on error by host
  • AbstractSingleImageLayer:
    • counter on request error by host
    • another counter on request error by host
    • a third counter on request error by host
    • a counter on image read error (same name as the previous ones) by host
    • timer on request by host
  • CoverageTask:
    • timer on download by host
    • counter on error by host
    • another counter on error by host
    • a third counter on error by host

Cluster check

If we need a cluster-wide check, e.g. to report that the print job queue is too long, we will probably need to create a custom endpoint.

Summary

Use health checks only for the health of the current container.

Identify and add missing metrics to be able to better monitor the application with tools like Prometheus/Grafana.

Possibly add a new endpoint for more specific checks, such as a queue that is too long.

@sbrunner
Member Author

It should also be easy to enable the access logs:
See: https://tomcat.apache.org/tomcat-9.0-doc/config/valve.html#Access_Logging
