Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Retry all RPC commands to fix MacOS CI jobs #5171

Open
wants to merge 13 commits into
base: develop
Choose a base branch
from

Conversation

ximinez
Copy link
Collaborator

@ximinez ximinez commented Oct 30, 2024

High Level Overview of Change

Retry all failed RPC connections / commands in unit tests.

Context of Change

Follow up to #5120 (23991c9), which added a retry for submit commands. It improved MacOS test reliability, but other tests are failing now.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • Tests (you added tests for code that already exists, or your new feature included in this PR)

Copy link

codecov bot commented Oct 31, 2024

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 77.9%. Comparing base (838978b) to head (34d918a).

Files with missing lines Patch % Lines
include/xrpl/server/detail/Door.h 0.0% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           develop   #5171     +/-   ##
=========================================
- Coverage     77.9%   77.9%   -0.0%     
=========================================
  Files          782     782             
  Lines        66621   66619      -2     
  Branches      8161    8158      -3     
=========================================
- Hits         51902   51873     -29     
- Misses       14719   14746     +27     
Files with missing lines Coverage Δ
include/xrpl/server/detail/Door.h 79.3% <0.0%> (ø)

... and 7 files with indirect coverage changes

Impacted file tree graph

* Follow up to XRPLF#5120 (23991c9), which added a retry for submit commands.
  It improved MacOS test reliability, but other tests are failing now.
* Retry all failed RPC connections / commands in unit tests.
@ximinez ximinez changed the title DRAFT: test: Retry all RPC commands to fix MacOS CI jobs test: Retry all RPC commands to fix MacOS CI jobs Oct 31, 2024
@ximinez ximinez marked this pull request as ready for review October 31, 2024 22:11
@ximinez ximinez added Tech Debt Non-urgent improvements Perf impact not expected Change is not expected to improve nor harm performance. labels Oct 31, 2024
* upstream/develop:
  Add hubs.xrpkuwait.com to bootstrap (5169)
* upstream/develop:
  Add AMMClawback Transaction (XLS-0073d) (5142)
* upstream/develop:
  Fix unity build (5179)
@@ -292,6 +292,17 @@ class Env
The command is examined and used to build
the correct JSON as per the arguments.
*/
using rpcCallback = std::function<bool(Json::Value const&)>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we capitalize the name (RpcCallback)?

Please add a comment like "Given the return value of an RPC, returns true if it is acceptable."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed.

Comment on lines 575 to 579
static auto const defaultCallback = [](Json::Value const& jr) {
auto const parsedResult = parseResult(jr, false);
return parsedResult.rpcCode != rpcINTERNAL;
};
cb = defaultCallback;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this out of the function, into the definition of useDefaultCallback (either inline or extern), and rename useDefaultCallback to something like rejectInternalError? If you prefer to frame the predicate names as accept{Condition} instead of reject{Condition}, then I'd be fine with that too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed.

@@ -292,6 +292,17 @@ class Env
The command is examined and used to build
the correct JSON as per the arguments.
*/
using rpcCallback = std::function<bool(Json::Value const&)>;
const rpcCallback useDefaultCallback{};
const rpcCallback expectInternalRPCError = [](Json::Value const&) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename this to rejectNever and add a comment explaining that it is meant to be used for tests that expect an internal error and do not want to trigger a retry?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed.

Comment on lines 301 to 304
getInternalFailureCallback(bool shouldFail)
{
return shouldFail ? expectInternalRPCError : useDefaultCallback;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename this to rejectInternalErrorIf?

@@ -514,7 +536,7 @@ class Env
/** Gets the TER result and `didApply` flag from a RPC Json result object.
*/
static ParsedResult
parseResult(Json::Value const& jr);
parseResult(Json::Value const& jr, bool checkTER);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what this parameter is supposed to be doing. Can we add an explanation in the docstring?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more checks, and I was able to get rid of checkTER entirely. The intention was to flag whether the request was related to submitting a transaction, and thus a TER would be expected in the result. There is no TER in the result if there isn't a transaction.

* upstream/develop:
  Set version to 2.3.0-rc1
  Replace Uint192 with Hash192 in server_definitions response (5177)
  Fix potential deadlock (5124)
  Introduce Credentials support (XLS-70d): (5103)
  Fix token comparison in Payment (5172)
  Add fixAMMv1_2 amendment (5176)
* Rename the RpcCallback type, and the default variables of that type
* Move the "default callback" out of the retry logic, and into the variable
  definition.
* Get rid of the checkTER parameter to parseResult.
* upstream/develop:
  fix: include `index` in `server_definitions` RPC (5190)
  Fix ledger_entry crash on invalid credentials request (5189)
@ximinez
Copy link
Collaborator Author

ximinez commented Nov 13, 2024

@thejohnfreeman @vlntb I want to bring your attention to draft PR #5180. It builds on top of this PR, but adds three commits:

  1. The MacOS CI jobs have 4 configurations: The combinations of release vs debug, and unity vs non-unity.
  2. The Env tests cycle through a range of ports so that if the OS is slow to close a port when one test ends, the next test doesn't immediately try to open it. A general retry operation is added so that lower-level network issues can also be retried, not just failed RPC commands.
  3. The MacOS CI jobs use parallel runners! (It only uses half the available CPU threads, because using all was still not reliable, but it's better than nothing!)

I would like to roll those changes into this PR so we can get the MacOS jobs fixed all in one shot. Do you think that's a good idea, or should that be a separate PR to be merged separately sometime after this one?

* upstream/develop:
  Set version to 2.3.0-rc2
@thejohnfreeman
Copy link
Collaborator

I don't see any approvals in here yet, so I say roll them into this PR.

This was referenced Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Perf impact not expected Change is not expected to improve nor harm performance. Tech Debt Non-urgent improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants