Running TiMVT in a serverless setup and managing database connections #104

dvd3v · 2022-11-17T16:34:35Z


I really love the TiTiler + AWS Lambda setup for its simplicity and scalability. I'm trying to do the same for TiMVT, but the database connections bring a few more challenges. TiMVT does not seem to be written to work in a serverless setup out of the box, as:

- it creates a DB connection on each Lambda invocation
- it registers the complete table catalog on each Lambda invocation

@app.on_event("startup")
async def startup_event() -> None:
    """Connect to database on startup."""
    await connect_to_db(app)
    # TiMVT and TiFeatures share the same `Table_catalog` format
    # see https://github.com/developmentseed/timvt/pull/83
    await register_table_catalog(app)

As for registering the table catalog, I've moved it into the LayerParams dependency. Since we have access to the request parameters inside LayerParams, we can register only the schema and table name that are requested:

await register_table_catalog(request.app, schemas=schema, tables=layer_name)

This avoids registering the whole DB on each invocation and then subsetting to the table that is needed. I'm not sure this is the best method, though. Happy for suggestions.

As for the database connections, the TiMVT Lambda functions scale easily and can open many concurrent connections, to the point where the database spends more resources maintaining connections than executing queries. I could easily overload the database with a (complex) dataset.

To try and solve this I've fronted the Aurora Serverless v2 database with an RDS Proxy.

The Lambda functions now interact with RDS Proxy instead of the database instance and the proxy handles the connection pooling necessary for scaling many simultaneous connections. This allows the TiMVT Lambdas to reuse existing connections, rather than creating new connections for every function invocation. Although I still get quite a few connections on the database, it performs much better than without a proxy.

Interested to hear the opinion on this architecture and what I could do better. Also wondering if TiPG will already have solved these issues.
Happy to share my serverless deployment templates for this setup if anyone is interested!

ps @vincentsarago awesome talk at PostGIS day!

vincentsarago · 2022-11-17T16:37:26Z

Maintainer

thanks @dvd3v 🙏

I'll go through your post later but for now this might be of interest stac-utils/stac-fastapi#493

basically Mangum will execute startup/shutdown on each request, while we want it to be done only once per Lambda lifecycle 😭

0 replies

geospatial-jeff · 2022-11-17T19:25:57Z


A couple RDS Proxy tips that I've learned the hard way:

- The MaxConnectionsPercent and MaxIdleConnectionsPercent config options determine, respectively, how many connections RDS Proxy target groups are allowed to open and the percentage of those connections allowed to stay idle. They default to 50 and 50.
  - MaxConnectionsPercent really depends on how many other things are using your database. If it's just the TiMVT lambda you can set it to a higher value like 90, but if you have other applications that connect to the database without RDS Proxy (ex. a k8s service that manages its own connection pool) this setting should be lower. Higher values mean the lambda function will perform better when traffic spikes, because there are more available connections to handle that load (ex. someone opens a webmap and starts panning around). But you obviously don't want to exceed the number of available RDS connections by setting this too high.
  - I've found that aggressively limiting MaxIdleConnectionsPercent helps lambda handle bursts in traffic as well, and minimizes stale connections. I've been setting this to 10 and that seems to work pretty well.
- The IdleClientTimeout config option determines how long a client (the lambda) can be idle before the proxy closes that connection. If the database pool IS cached across lambda invocations, this should be set to a larger value to improve connection pooling / multiplexing across multiple invocations of your lambda. If the database pool IS NOT cached across lambda invocations, this should be set to the lambda function timeout.
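
As a rough sketch of how the proxy settings above might be applied with boto3: MaxConnectionsPercent and MaxIdleConnectionsPercent belong to the target group's ConnectionPoolConfig, while IdleClientTimeout is set on the proxy itself. The function name `configure_proxy`, the `default` target group name, and the values used as defaults are assumptions for illustration, not something from this thread's deployment. The client is passed in so the function can be exercised without AWS access.

```python
from typing import Any


def configure_proxy(
    rds: Any,
    proxy_name: str,
    max_connections_percent: int = 90,
    max_idle_connections_percent: int = 10,
    idle_client_timeout_s: int = 60,
) -> None:
    """Apply the pool settings discussed above to an RDS Proxy.

    `rds` is expected to behave like boto3.client("rds").
    """
    # The idle percentage is a share of the same connection budget,
    # so it cannot be larger than the overall maximum.
    if max_idle_connections_percent > max_connections_percent:
        raise ValueError("idle percent cannot exceed max connections percent")

    # Connection pool limits are configured on the proxy's target group.
    rds.modify_db_proxy_target_group(
        DBProxyName=proxy_name,
        TargetGroupName="default",  # assumption: the proxy uses the default group
        ConnectionPoolConfig={
            "MaxConnectionsPercent": max_connections_percent,
            "MaxIdleConnectionsPercent": max_idle_connections_percent,
        },
    )

    # The idle client timeout (in seconds) is configured on the proxy itself.
    rds.modify_db_proxy(
        DBProxyName=proxy_name,
        IdleClientTimeout=idle_client_timeout_s,
    )
```

With real credentials this would be called as `configure_proxy(boto3.client("rds"), "timvt-proxy")`, where `timvt-proxy` is a hypothetical proxy name.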

And client side there are a few things that help too:

- Set min_size of the asyncpg database pool to 1 so the lambda function only checks out a single connection when it is invoked. This is particularly important if the database pool IS NOT cached across invocations.
- Set max_inactive_connection_lifetime of the asyncpg database pool to the same value as IdleClientTimeout or even slightly longer (depending on whether you want the client or the server, i.e. RDS Proxy, to terminate the connection).
- Set the maximum concurrency of your lambda function such that (max_concurrency * asyncpg.max_size) < (MaxConnectionsPercent * RDSAvailableConnections). You have to balance scaling horizontally (more lambdas w/ smaller connection pool) vs. scaling vertically (fewer lambdas w/ bigger connection pool), but for tile servers in general I imagine it's better to scale horizontally.
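
The concurrency inequality above can be turned into a small calculation. This is only a sketch: `max_lambda_concurrency` is a hypothetical helper, and it treats MaxConnectionsPercent as a whole-number percentage of the connections the database makes available.

```python
def max_lambda_concurrency(
    rds_available_connections: int,
    max_connections_percent: int,
    pool_max_size: int,
) -> int:
    """Largest concurrency c such that
    c * pool_max_size < (max_connections_percent / 100) * rds_available_connections.
    """
    if pool_max_size < 1:
        raise ValueError("pool_max_size must be at least 1")
    # Work with the budget scaled by 100 to stay in integer arithmetic.
    budget_x100 = rds_available_connections * max_connections_percent
    # Subtracting 1 before the floor division enforces the strict inequality.
    return max((budget_x100 - 1) // (pool_max_size * 100), 0)


# A small instance with 80 available connections and MaxConnectionsPercent=90:
print(max_lambda_concurrency(80, 90, 1))   # pool of 1 connection per lambda -> 71
print(max_lambda_concurrency(80, 90, 10))  # pool of 10 connections per lambda -> 7
```

The two calls show the horizontal vs. vertical trade-off: with the same connection budget, a pool of 1 leaves room for 71 concurrent lambdas, while a pool of 10 leaves room for only 7.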

The most important thing for performance is reusing the connection pool across lambda invocations. Keep in mind that although lambdas are accessing the database through RDS Proxy the client code (asyncpg) still has to manage its own connections to the proxy. RDS proxy loses a lot of its value if clients aren't reusing connections to some extent.
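
One common way to reuse a pool across warm invocations is to keep it in a module-level variable and create it lazily, instead of relying on startup/shutdown events. The sketch below shows only that caching pattern: `get_pool` and its `create_pool` argument are hypothetical names, the factory stands in for something like `asyncpg.create_pool`, and whether a real pool stays usable between invocations also depends on the Lambda adapter reusing the same event loop, which is assumed here rather than verified.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Module-level state survives between invocations while the Lambda
# container stays warm; it is rebuilt only on a cold start.
_pool: Optional[Any] = None


async def get_pool(create_pool: Callable[[], Awaitable[Any]]) -> Any:
    """Return the cached pool, creating it on the first call only.

    A Lambda container handles one invocation at a time, so no lock is
    used here; concurrent callers in one process would need one.
    """
    global _pool
    if _pool is None:
        _pool = await create_pool()
    return _pool
```

A request handler would then call something like `pool = await get_pool(lambda: asyncpg.create_pool(dsn, min_size=1, max_size=1))`, with `dsn` pointing at the RDS Proxy endpoint.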

1 reply

geospatial-jeff Nov 17, 2022

Last thing I'll mention is that RDS proxy is a lot easier to get right when it is pointing to a larger RDS database w/ higher number of available connections. If you are running on a ~t3 with ~80 available connections you really don't have a lot of wiggle room.

dvd3v · 2022-11-18T08:47:52Z

Author

Thanks @vincentsarago and @geospatial-jeff, this is super helpful! 🙏

Good to understand why connections are not cached due to the Mangum bug. This also explains why RDS Proxy was still opening a lot of connections. Although I'm running a dedicated TiMVT RDS, a t3 instance was indeed very easy to overload. The Aurora cluster performs much better, especially with MaxConnectionsPercent set to 90-100.

Until there is a proper way to cache the database pool across Lambda invocations I'm going to change the RDS proxy settings to aggressively drop idle connections using the MaxIdleConnectionsPercent and IdleClientTimeout parameters. Will report back with some results after testing.

2 replies

vincentsarago Nov 18, 2022
Maintainer

@dvd3v also make sure to set db_max_conn_size to 1

timvt/timvt/settings.py, lines 121 to 124 in 6e0b627:

    db_min_conn_size: int = 1
    db_max_conn_size: int = 10
    db_max_queries: int = 50000
    db_max_inactive_conn_lifetime: float = 300

developmentseed/eoAPI#47

dvd3v Nov 30, 2022
Author

Just to confirm: using these new settings improved performance significantly! The only catch is that I had my Lambda timeout set to 29 seconds (as the AWS API Gateway integration timeout is 29 seconds), but IdleClientTimeout has a minimum setting of 60 seconds. So you have to either increase the Lambda timeout or set the max_inactive_connection_lifetime of the asyncpg database pool to a lower value.

Until mangum can cache database connections across lambda invocations, I think these are the best settings.
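
One possible reading of that workaround, as a sketch: build the asyncpg pool arguments so that the client drops idle connections no later than either timeout. `pool_settings` is a hypothetical helper, and taking the smaller of the two timeouts is an assumption about what "a lower value" should be, not a tested recommendation.

```python
from typing import Any, Dict


def pool_settings(lambda_timeout_s: float, idle_client_timeout_s: float) -> Dict[str, Any]:
    """Keyword arguments for asyncpg.create_pool when the pool is not
    cached across Lambda invocations."""
    return {
        # One connection per lambda, as suggested earlier in the thread.
        "min_size": 1,
        "max_size": 1,
        # Let the client close idle connections before the proxy would.
        "max_inactive_connection_lifetime": min(lambda_timeout_s, idle_client_timeout_s),
    }


# Lambda timeout of 29 s behind API Gateway, IdleClientTimeout at its 60 s minimum:
print(pool_settings(29, 60))
```

These would be passed along as `asyncpg.create_pool(dsn, **pool_settings(29, 60))`, where `dsn` is the connection string for the RDS Proxy endpoint.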
