
gRPC benchmarks for large data (~656 KB) between Python and Java are way off #378

Closed
yawningphantom opened this issue Jul 20, 2023 · 5 comments



yawningphantom commented Jul 20, 2023

Hello @LesnyRumcajs

I have a use case where I need to develop a service capable of handling large payload sizes, ranging from 600 KB to 1 MB. I'm currently deciding between Java and Python for this task. To make an informed decision, I conducted benchmarking tests using python_grpc_bench and java_quarkus_bench. Surprisingly, the results showed that the Python implementation performed comparably to, or even better than, the Java implementation.

I ran the benchmarks on a Linux machine, and while this doesn't seem to be a critical issue, I wanted to ask whether you have also experimented with larger payloads for these benchmarks, and whether you encountered any challenges or noteworthy observations. Your insights would be valuable in guiding my decision-making process. Thank you!

My machine configuration:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz

0                                           bus        Motherboard
/0/0                                        memory     32GiB System memory
/0/1                                        processor  Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
/0/100                                      bridge     440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled)
/0/100/7                                    bridge     82371AB/EB/MB PIIX4 ISA
/0/100/7.1        scsi1                     storage    82371AB/EB/MB PIIX4 IDE
/0/100/7.1/0.0.0  /dev/cdrom                disk       Virtual CD/ROM

Sample payload size: ~24 KB JSON file
Sample payload attached here -
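Since the attachment itself is not preserved here, a minimal sketch of how one might generate an equivalent dummy JSON payload of a target size for such a test (the record shape is arbitrary and purely illustrative; any valid JSON of the right size exercises the same serialization path):

```python
import json

def make_payload(target_bytes: int) -> str:
    """Build a dummy JSON document of roughly `target_bytes` bytes."""
    # A fixed-size record; repeated until the document reaches the target size.
    record = {"id": 0, "name": "x" * 32, "tags": ["a", "b", "c"]}
    record_len = len(json.dumps(record)) + 2  # +2 for the ", " separator
    count = max(1, target_bytes // record_len)
    records = [dict(record, id=i) for i in range(count)]
    return json.dumps(records)

payload = make_payload(24 * 1024)  # roughly 24 KB, like the sample above
```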

Result

==> Running benchmark for java_quarkus_bench...
Waiting for server to come up... ready.
Warming up the service for 5s... done.
Benchmarking now...
		done.
		Results:
		    Requests/sec:	218.12
==> Running benchmark for python_grpc_bench...
Waiting for server to come up... ready.
Warming up the service for 5s... done.
Benchmarking now...
		done.
		Results:
		    Requests/sec:	215.17
-----
Benchmark finished. Detailed results are located in: results/231907T133929
-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| python_grpc                 |     185 |         3.98 s |         6.61 s |         7.59 s |         8.80 s |   15.11% |     58.72 MiB |
| java_quarkus                |     183 |         4.06 s |        11.30 s |        14.57 s |        20.51 s |   41.71% |     169.9 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
03e7e70 Mon, 17 Jul 2023 20:22:42 +0200 GitHub [feat] updated micronaut to 4.0 (#370)
- GRPC_BENCHMARK_DURATION=20s
- GRPC_BENCHMARK_WARMUP=5s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=512m
- GRPC_CLIENT_CONNECTIONS=50
- GRPC_CLIENT_CONCURRENCY=1000
- GRPC_CLIENT_QPS=0
- GRPC_CLIENT_CPUS=1
- GRPC_REQUEST_SCENARIO=complex_proto
- GRPC_GHZ_TAG=0.114.0
All done.
@LesnyRumcajs (Owner)

Hey @yawningphantom
Using 1 CPU for both the server and the client is too low; that's only good for checking whether the service works at all.

Looking at your specs, I recommend setting GRPC_CLIENT_CPUS to perhaps 5. Moreover, Java tends to perform significantly better on at least 2 CPUs. I'm not sure your CPU is powerful enough for such a test, though. In general, I tend to use at least 3 client cores per server core, but you may try. See https://github.com/LesnyRumcajs/grpc_bench/blob/master/example_benchmark.sh
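For reference, the knobs discussed here are environment variables read by the harness. A possible invocation along these lines (the exact values are only a starting point, and the `./bench.sh` entry point and service names are taken from the repo's conventions):

```shell
# Give the ghz client more headroom than the server (~3 client cores per
# server core), disable the QPS limit, and run long enough to stabilize.
GRPC_BENCHMARK_DURATION=60s \
GRPC_BENCHMARK_WARMUP=10s \
GRPC_SERVER_CPUS=2 \
GRPC_SERVER_RAM=512m \
GRPC_CLIENT_CPUS=5 \
GRPC_CLIENT_CONNECTIONS=50 \
GRPC_CLIENT_CONCURRENCY=1000 \
GRPC_CLIENT_QPS=0 \
GRPC_REQUEST_SCENARIO=complex_proto \
./bench.sh python_grpc_bench java_quarkus_bench
```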

Does it help?

@yawningphantom (Author)

Hey @LesnyRumcajs
Thanks for the reply. I ran the benchmark twice: once with only GRPC_CLIENT_CPUS=5, and once with both client and server CPUs set to 5. I still did not see any major difference between Python and Java, which seems a bit weird.
Not sure why that is the case.

==> Running benchmark for java_quarkus_bench...
Waiting for server to come up... ready.
Warming up the service for 10s... done.
Benchmarking now...
		done.
		Results:
		    Requests/sec:	9.98
==> Running benchmark for python_grpc_bench...
Waiting for server to come up... ready.
Warming up the service for 10s... done.
Benchmarking now...
		done.
		Results:
		    Requests/sec:	9.98
-----
Benchmark finished. Detailed results are located in: results/231907T051744
-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| java_quarkus                |      10 |       67.29 ms |       80.64 ms |       99.35 ms |      134.18 ms |   24.43% |     148.6 MiB |
| python_grpc                 |      10 |       53.15 ms |       61.81 ms |       67.81 ms |       77.80 ms |    7.58% |     32.92 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
03e7e70 Mon, 17 Jul 2023 20:22:42 +0200 GitHub [feat] updated micronaut to 4.0 (#370)
- GRPC_BENCHMARK_DURATION=60s
- GRPC_BENCHMARK_WARMUP=10s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=5120m
- GRPC_CLIENT_CONNECTIONS=10
- GRPC_CLIENT_CONCURRENCY=20
- GRPC_CLIENT_QPS=10
- GRPC_CLIENT_CPUS=5
- GRPC_REQUEST_SCENARIO=complex_proto
==> Running benchmark for java_quarkus_bench...
Waiting for server to come up... ready.
Warming up the service for 5s... done.
Benchmarking now...
		done.
		Results:
		    Requests/sec:	0.95
==> Running benchmark for python_grpc_bench...
Waiting for server to come up... ready.
Warming up the service for 5s... done.
Benchmarking now...
		done.
		Results:
		    Requests/sec:	0.95
-----
Benchmark finished. Detailed results are located in: results/231907T051324
-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| java_quarkus                |       1 |       64.77 ms |       82.63 ms |       85.83 ms |       85.83 ms |    3.89% |    145.53 MiB |
| python_grpc                 |       1 |       62.92 ms |       76.76 ms |      114.02 ms |      114.02 ms |    1.54% |     15.75 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
03e7e70 Mon, 17 Jul 2023 20:22:42 +0200 GitHub [feat] updated micronaut to 4.0 (#370)
- GRPC_BENCHMARK_DURATION=20s
- GRPC_BENCHMARK_WARMUP=5s
- GRPC_SERVER_CPUS=5
- GRPC_SERVER_RAM=5120m
- GRPC_CLIENT_CONNECTIONS=10
- GRPC_CLIENT_CONCURRENCY=20
- GRPC_CLIENT_QPS=1
- GRPC_CLIENT_CPUS=5
- GRPC_REQUEST_SCENARIO=complex_proto
- GRPC_GHZ_TAG=0.114.0
All done.

@LesnyRumcajs (Owner)

@yawningphantom You imposed rather harsh limits on the client. GRPC_CLIENT_QPS=1 means that the client will send only one request per second, which you can confirm in the results. If 1 (or 10) requests per second is your expected load, then Python may well be the better choice for you. Python will definitely be outperformed at higher loads, though.
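The QPS cap explains both runs above and is easy to sanity-check with Little's law: a closed-loop client is bounded by concurrency divided by average latency, and a nonzero QPS limit caps it further. A rough sketch, plugging in numbers from the tables above:

```python
def max_throughput(qps_limit: float, concurrency: int, avg_latency_s: float) -> float:
    """Upper bound on requests/sec for a rate-limited closed-loop client.

    Little's law bounds an unthrottled client at concurrency / latency;
    a nonzero QPS limit (GRPC_CLIENT_QPS) caps it further.
    """
    open_loop = concurrency / avg_latency_s
    return open_loop if qps_limit == 0 else min(qps_limit, open_loop)

# GRPC_CLIENT_QPS=1, concurrency 20, ~65 ms latency -> the QPS limit dominates:
print(max_throughput(1.0, 20, 0.065))  # 1.0, matching the ~0.95 req/s observed
# With the limit removed (QPS=0) the same settings allow ~300 req/s:
print(max_throughput(0, 20, 0.065))
```

This is why the req/s column tracks GRPC_CLIENT_QPS exactly in both runs: the languages were never stressed enough to differ.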

@gcnyin (Contributor) commented Aug 7, 2023

Could you provide the test cases? Then I can run them again to verify.

@LesnyRumcajs (Owner)

Stale.
