Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry when DatabaseName is wrong #2188

Closed
Takuya-Soda opened this issue Aug 9, 2023 · 17 comments · Fixed by #2212, #2247 or #2293
Closed

Retry when DatabaseName is wrong #2188

Takuya-Soda opened this issue Aug 9, 2023 · 17 comments · Fixed by #2212, #2247 or #2293
Assignees
Labels
Question Used when a question is asked, as opposed to an issue being raised

Comments

@Takuya-Soda
Copy link

Driver version

12.4.0

SQL Server version

Microsoft SQL Server 2022 (RTM-GDR) (KB5021522) - 16.0.1050.5 (X64) Jan 23 2023 17:02:42 Copyright (C) 2022 Microsoft Corporation Developer Edition (64-bit) on Windows Server 2022 Standard 10.0 (Build 20348: ) (Hypervisor)

Client Operating System

Windows 10 22H2

JAVA/JVM version

AdoptOpenJDK jdk-8.0.292.10-hotspot

Table schema

(no schema)

Problem description

If you try to connect to SQL Server with the following URL, it will retry for 30 seconds.
(invalid_database_name is DatabaseName that does not exist)

jdbc:sqlserver://localhost:1433;DatabaseName=invalid_database_name;responseBuffering=adaptive

Expected behavior

Do not retry if DatabaseName is wrong.
(like mssql-jdbc-8.4.1.jre8.jar)

Actual behavior

If DatabaseName is wrong, retry until loginTimeout.

Error message/stack trace

このログインで要求されたデータベース "invalid_database_name" を開けません。ログインに失敗しました。 ClientConnectionId:***
com.microsoft.sqlserver.jdbc.SQLServerException: このログインで要求されたデータベース "invalid_database_name" を開けません。ログインに失敗しました。 ClientConnectionId:***
	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
	at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:283)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:129)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:37)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:5233)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:3988)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.access$000(SQLServerConnection.java:85)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:3932)
	at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7375)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3206)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2713)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2362)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:2213)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1276)
	at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:861)
        ......

Any other details that can be helpful

This problem appears to have occurred in #1669.
Specifically, I suspect this code deletion is the culprit. ac78f50#diff-5a8fd0ea202eb55761d2a24d0619f2f2bd602ebbfe8e005bf4d112cc7768d840L2722

@Jeffery-Wasty
Copy link
Contributor

Hi @Takuya-Soda,

loginTimeout is the number of seconds the driver should wait before timing out a failed connection. Because this isn't set in the connection string, the default is 30 seconds.
connectRetryCount is the number of reconnection attempts if there's a connection failure. Because this isn't set in the connection string, the default is a single (1) retry.

So, based on your settings, the driver will attempt to connect, attempt to reconnect once, and then, after 30 seconds, time out.

You said "If you try to connect to SQL Server with the following URL, it will retry for 30 seconds.". There should only be a single retry, the driver should not be continually retrying for the entire 30 seconds. If you enable driver tracing, you should see a single connection and a single reconnection attempt. The reason it takes 30 seconds to return an error is because of the default 30 seconds for loginTimeout.

You said "Expected behavior - Do not retry if DatabaseName is wrong.". You can disable retry by setting connectRetryCount to 0, but again, the default value for this property is 1, a single retry.

For the moment, I don't see an issue with how retry and loginTimeout is set up, but please let us know if we missed something regarding your issue.

@lilgreenbird
Copy link
Contributor

hi @Takuya-Soda

The reason the driver retries on an invalid database name is because that is considered a transient fault error. There is no way for the driver to know that the database name is invalid it could simply be busy or other valid reasons and therefore resolvable. As Jeff mentioned if you do not want any retries at all then you can set connectRetryCount to 0 otherwise the driver will always retry in this scenario.

@lilgreenbird lilgreenbird added the Question Used when a question is asked, as opposed to an issue being raised label Aug 9, 2023
@Takuya-Soda
Copy link
Author

Takuya-Soda commented Aug 10, 2023

HI @lilgreenbird

The reason the driver retries on an invalid database name is because that is considered a transient fault error. There is no way for the driver to know that the database name is invalid it could simply be busy or other valid reasons and therefore resolvable.

I thought the SQL Server error code 4060 only happened when the DatabaseName was wrong. Does this error occur even when SQL Server is busy?

As Jeff mentioned if you do not want any retries at all then you can set connectRetryCount to 0 otherwise the driver will always retry in this scenario.

I tried it and it didn't work. I tried the following URL, but it will retry for 30 seconds.

jdbc:sqlserver://localhost:1433;DatabaseName=invalid_database_name;responseBuffering=adaptive;connectRetryCount=0

@Takuya-Soda
Copy link
Author

I think this is a bug. The reason is that only the following comments remain and the actual code has been deleted.

// for non-dbmirroring cases, do not retry after tcp socket connection succeeds

@Takuya-Soda
Copy link
Author

Looking at the link below, it seems to return a different error code if it's busy.
(This article is about Azure SQL Database, but since the code base is the same, the error conditions should be similar.)

https://learn.microsoft.com/en-us/azure/azure-sql/database/troubleshoot-common-errors-issues?view=azuresql-db

@Takuya-Soda
Copy link
Author

Hi @Jeffery-Wasty

loginTimeout is the number of seconds the driver should wait before timing out a failed connection. Because this isn't set in the connection string, the default is 30 seconds. connectRetryCount is the number of reconnection attempts if there's a connection failure. Because this isn't set in the connection string, the default is a single (1) retry.
So, based on your settings, the driver will attempt to connect, attempt to reconnect once, and then, after 30 seconds, time out.

I would have liked that to be the case, but there are actually 31 retries.
(I checked the log of SQL Server.)
By the way, this number depends on the version of the driver, and is shown in the table below.

Driver Retry
mssql-jdbc-8.4.1.jre8.jar 0
mssql-jdbc-9.4.1.jre8.jar 1
mssql-jdbc-10.2.3.jre8.jar 17
mssql-jdbc-11.2.3.jre8.jar 31
mssql-jdbc-12.4.1.jre8.jar 31

This bug entered between 9.4.1 and 10.2.3. And in 11.2.3, the initial value of loginTimeout has changed, and I think it has increased further.

You said "If you try to connect to SQL Server with the following URL, it will retry for 30 seconds.". There should only be a single retry, the driver should not be continually retrying for the entire 30 seconds. If you enable driver tracing, you should see a single connection and a single reconnection attempt. The reason it takes 30 seconds to return an error is because of the default 30 seconds for loginTimeout.
You said "Expected behavior - Do not retry if DatabaseName is wrong.". You can disable retry by setting connectRetryCount to 0, but again, the default value for this property is 1, a single retry.

Hmmm, sure. If anything, it might be more of a problem that the retry is repeated uselessly.

@Jeffery-Wasty
Copy link
Contributor

Jeffery-Wasty commented Aug 10, 2023

Hi @Takuya-Soda,

The results from the SQL Server log are interesting. I tried reproducing this on my end after your initial post, and found the driver was doing retries that were consistent with the value of connectRetryCount. So for now I can't reproduce the issue, but we'll look into our code changes surrounding version 10.2.3 (including the code you've highlighted for us) and see if there is something that can explain this behavior.

I see you tried connectRetryCount=0, but it still retried for 30 seconds. Was there any difference in the number of retries you saw?

In the meantime, can you provide us the trace logs I mentioned above? These will really help in narrowing down the issue. You can either attach them to this thread or email us at [email protected].

@Jeffery-Wasty
Copy link
Contributor

Hi @Takuya-Soda,

I've reproduced the issue you are seeing. For a single connection attempt, you will see multiple login attempts, about 1 per second, which gives you the 17 or 31 you were seeing this. This is because for transient errors we continually retry login until loginTimeout is reached (or the error is no longer seen). It's an interesting thought whether this behavior should be modified, but for now I would recommend the following actions:

  • If you set loginTimeout to a low value (try 1 or 2), then you should see a low number of login attempts. I don't think its possible to limit it to exactly 1, but you can at least reduce it from 31.
  • We're currently working on a feature, where, as part of the feature, you can disable retry on current transient errors. This would allow you to set the driver to automatically fail on 4060, without any login retry. This feature is still in development, but we will update this thread with progress, including potential release windows.

Let us know if these steps work for you, and whether you have any further questions.

@Takuya-Soda
Copy link
Author

@Jeffery-Wasty

I see you tried connectRetryCount=0, but it still retried for 30 seconds. Was there any difference in the number of retries you saw?

I tried a pattern with no value in connectRetryCount and a pattern with 0 and 1, about 5 times each, but all 30 retries were performed.

In the meantime, can you provide us the trace logs I mentioned above? These will really help in narrowing down the issue. You can either attach them to this thread or email us at [email protected].

I'll have a chat with the team. However, today is a public holiday in Japan, so I don't think I can provide the log right away.

@Takuya-Soda
Copy link
Author

Thanks @Jeffery-Wasty

I am very happy that the issue has been reproduced.

I've reproduced the issue you are seeing. For a single connection attempt, you will see multiple login attempts, about 1 per second, which gives you the 17 or 31 you were seeing this. This is because for transient errors we continually retry login until loginTimeout is reached (or the error is no longer seen). It's an interesting thought whether this behavior should be modified, but for now I would recommend the following actions:

  • If you set loginTimeout to a low value (try 1 or 2), then you should see a low number of login attempts. I don't think its possible to limit it to exactly 1, but you can at least reduce it from 31.
  • We're currently working on a feature, where, as part of the feature, you can disable retry on current transient errors. This would allow you to set the driver to automatically fail on 4060, without any login retry. This feature is still in development, but we will update this thread with progress, including potential release windows.

Let us know if these steps work for you, and whether you have any further questions.

Certainly we can avoid this problem by reducing the loginTimeout.
Since this issue is only in test code, changing the loginTimeout is fine.

I created this issue because a JDBC update suddenly changed the behavior, and it was due to a weird code fix.
Therefore, it would be great if the mssql-jdbc development team could receive this issue and decide whether or not to fix it.

@Takuya-Soda
Copy link
Author

Sorry, @Jeffery-Wasty

Do you still need the logs? If you can reproduce it, I don't think you need it.
(Also, it will take some time for us to provide it.)

@Jeffery-Wasty
Copy link
Contributor

Yes, thank you for bringing this to our attention, and yes we will eventually have a way to avoid this behavior. No need to provide further logs.

Can I close this issue? I'll update the issue again when the above mentioned feature is released.

@Takuya-Soda
Copy link
Author

Okay, thank you for confirming such a small issue. Please close this issue.

@Jeffery-Wasty
Copy link
Contributor

We're currently looking into making changes to retry logic so that connectRetryCount and connectRetryInterval are always respected.

@Jeffery-Wasty
Copy link
Contributor

PR #2212 will make the above change.

@David-Engel
Copy link
Collaborator

With #2230 (reverts #2212), I believe this needs to be reopened until a new fix is found. Feel free to close if I'm mistaken.

@Jeffery-Wasty
Copy link
Contributor

Retry logic was rewritten in PR 2247, which will be available in driver release 12.5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Question Used when a question is asked, as opposed to an issue being raised
Projects
Status: Closed Issues
4 participants