Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add session property for jdbc metadata, pool size, caching enabled #23844

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

konjac-h
Copy link
Contributor

Description

This pull request enables the control of JDBC in-memory metadata caching through the use of a session property. Additionally, it also provides the capability to manage the thread pool size.

Motivation and Context

Currently, Presto does not have the capability to control the JDBC metadata thread pool size and caching for certain sources. This pull request aims to address this limitation by making these features available in Presto.

Impact

as above

Test Plan

build & deploy to verifier that the change will be picked up by inspecting session property at query level

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@konjac-h konjac-h requested a review from a team as a code owner October 16, 2024 11:52
@konjac-h konjac-h force-pushed the add-jdbc-metadata-session-properties branch from 8d0682c to 28ee5b2 Compare October 16, 2024 12:08
@@ -48,7 +49,7 @@ public class JdbcMetadataCache
public JdbcMetadataCache(JdbcClient jdbcClient, JdbcMetadataConfig config, JdbcMetadataCacheStats stats)
{
this(
newCachedThreadPool(daemonThreadsNamed("jdbc-metadata-cache" + "-%s")),
newFixedThreadPool(config.getMetadataCacheThreadPoolSize(), daemonThreadsNamed("jdbc-metadata-cache" + "-%s")),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thread pool created with the newCachedThreadPool will spin up threads on demand for the asyncReloading to occur. Why is a fixed size thread pool better here ?

Copy link
Contributor

@NikhilCollooru NikhilCollooru Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think he is trying to limit the numbers of threads that can used for the refresh.

The following is better. its still a cache thread pool, but we can limit it
new BoundedExecutor(newCachedThreadPool(daemonThreadsNamed("jdbc-metadata-cache" + "-%s")), config.getMetadataCacheThreadPoolSize()),

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL

implements JdbcSessionPropertiesProvider
{
private final List<PropertyMetadata<?>> sessionProperties;
private static final String JDBC_METADATA_CACHE_ENABLED = "jdbc_metadata_cache_enabled";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the cache is always created, maybe this session property is better named - use_jdbc_metadata_cache ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good. thanks

@steveburnett
Copy link
Contributor

Consider adding documentation for this new session property.

@tdcmeehan tdcmeehan self-assigned this Oct 16, 2024
@@ -48,7 +49,7 @@ public class JdbcMetadataCache
public JdbcMetadataCache(JdbcClient jdbcClient, JdbcMetadataConfig config, JdbcMetadataCacheStats stats)
{
this(
newCachedThreadPool(daemonThreadsNamed("jdbc-metadata-cache" + "-%s")),
newFixedThreadPool(config.getMetadataCacheThreadPoolSize(), daemonThreadsNamed("jdbc-metadata-cache" + "-%s")),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL

return this;
}

public boolean getJdbcMetadataCacheEnabled()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: isJdbcMetadataCacheEnabled()

@@ -45,13 +47,17 @@ public void testExplicitPropertyMappings()
.put("metadata-cache-ttl", "1h")
.put("metadata-cache-refresh-interval", "10s")
.put("metadata-cache-maximum-size", "100")
.put("metadata-cache-thread-pool-size", "50")
.put("metadata-jdbc-cache-enabled", "true")
Copy link

@shangm2 shangm2 Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to test another case where this config is not in the property and False should be expected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is done in "testDefaults" test above

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to test the branch where "session.getProperty(JDBC_METADATA_CACHE_ENABLED, Boolean.class)" throws an exception? Is there a way to generate the coverage report like JaCoCo?

@konjac-h konjac-h force-pushed the add-jdbc-metadata-session-properties branch from 28ee5b2 to 7101209 Compare October 24, 2024 18:48
@konjac-h konjac-h force-pushed the add-jdbc-metadata-session-properties branch from 7101209 to 15fdc33 Compare October 24, 2024 18:54
@@ -29,6 +29,8 @@ public class JdbcMetadataConfig
private Duration metadataCacheTtl = new Duration(0, TimeUnit.SECONDS);
private Duration metadataCacheRefreshInterval = new Duration(0, TimeUnit.SECONDS);
private long metadataCacheMaximumSize = 10000;
private int metadataCacheThreadPoolSize = 10;
private boolean jdbcMetadataCacheEnabled;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also change the underlying feature flag methods and this to use names similar to useJdbcMetadataCache ?

this.sessionProperties = ImmutableList.of(
booleanProperty(
USE_JDBC_METADATA_CACHE,
"Whether Jdbc Metadata In Memory Cache is Enabled",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets make this more descriptive - Cache the result of the JDBC queries that fetch metadata about tables and columns

}

@Min(1)
@Config("metadata-cache-thread-pool-size")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value represents the max background fetch threads for refreshing metadata, which IMO we should document.

@steveburnett I don't think we document these JdbcMetadataConfig properties at all.
Maybe we should rework the documentation for the different connectors that use these configs (like Postgres, MySQL) to have a common place for this and other JDBC related properties

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaneja thanks for the suggestion! Maybe a good place for documenting these features would be a new section in Presto Session Properties, titled JDBC Properties?

Let me know what you think, especially if you have a better idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. A dedicated section for common JDBC properties that we link to from other JDBC connectors would be ideal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aaneja!

Normally I would recommend that documentation for new configuration or session properties should be part of the PR. In this specific case that there is nowhere good to put them at this time, I've opened a new issue #23918 (doc PR to follow from that) to add a JDBC Properties topic to Presto Session Properties and populate it with these JdbcMetadataConfig properties to begin with. Other JDBC properties can be added to the new doc section in later PRs, and this code PR can move forward.

Sound reasonable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants