
Load Testing for each Microservice


We load test our system to evaluate its performance under real-life load conditions and to test its limits.

We have implemented load tests for all of our microservices, exercising each exposed API route of the application.

These API routes are as follows:

  1. History Route (Registry and DB Service)
  2. Landing Route (Frontend Service)
  3. Login Route (Auth Service)
  4. Metadata Route (Metadata Service)
  5. Widget Route (Data Service)

For each microservice, we primarily observe the “Aggregate Graph” and “Response Time Graph” results to understand the throughput and error rate during the load test. We repeat this analysis after incrementally increasing the replica count to 1, 3, and 5 instances per service in each service’s Kubernetes Deployment configuration.

Where needed, we have included additional graphs to analyze a service’s performance.
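For reference, scaling a service between runs means changing the `replicas` field in its Kubernetes Deployment manifest. The sketch below shows the general shape of such a manifest; the service name, image, and port are placeholders, not our actual configuration.

```yaml
# deployment.yaml (illustrative sketch; names and values are placeholders)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metadata-service                 # hypothetical service name
spec:
  replicas: 3                            # varied across test runs: 1, 3, 5
  selector:
    matchLabels:
      app: metadata-service
  template:
    metadata:
      labels:
        app: metadata-service
    spec:
      containers:
        - name: metadata-service
          image: metadata-service:latest # placeholder image
          ports:
            - containerPort: 8080        # placeholder port
```

A running deployment can also be rescaled without editing the manifest, for example `kubectl scale deployment metadata-service --replicas=5` (service name hypothetical).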

History (Registry + DB Service)

Replica Count 1

250,000 requests with 0% error rate and low response time.

Replica Count 3

Load: 50,000 samples (2500 users with a loop count of 200).
Throughput increased to 501.7/sec with a low error rate of 0.04.

Replica Count 5

Load balanced among 3 instances of each service.
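The spreading of requests across pod replicas is handled by the Kubernetes Service in front of each Deployment, which forwards traffic to any ready pod matching its selector. A minimal sketch, assuming placeholder names and ports:

```yaml
# service.yaml (illustrative sketch; names and ports are placeholders)
apiVersion: v1
kind: Service
metadata:
  name: history-service      # hypothetical name
spec:
  selector:
    app: history-service     # matches the pod labels set in the Deployment
  ports:
    - port: 80               # port exposed inside the cluster
      targetPort: 8080       # placeholder container port
```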

Landing (UI) Service

Replica Count 1

(Aggregate Graph and Response Time Graph screenshots)

Replica Count 3

(Aggregate Graph and Response Time Graph screenshots)

Replica Count 5

(Aggregate Graph and Response Time Graph screenshots)

Login (Auth + User Service)

Replica Count 1

(Aggregate Graph and Response Time Graph screenshots)

Replica Count 3

(Aggregate Graph and Response Time Graph screenshots)

Replica Count 5

(Aggregate Graph and Response Time Graph screenshots)

Metadata (Metadata Service)

Replica Count 1

(Aggregate Graph and Response Time Graph screenshots)

Replica Count 3

(Aggregate Graph and Response Time Graph screenshots)

Replica Count 5

(Aggregate Graph and Response Time Graph screenshots)

Widget (Data Service)

Replica Count 1

We tested our data service with a replica count of 1 and a load of 50 users requesting the graph. With a single instance, we observed that the data service eventually becomes unavailable (after 28 requests in this case). This happens because the data service maxes out the RAM allocated to its pod (5000Mi in our case) and restarts, causing subsequent requests to this pod to be rejected.
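The 5000Mi figure is the memory limit set on the data service’s container; when usage exceeds it, Kubernetes kills and restarts the container. A minimal sketch of how such a limit is declared in the pod spec (container name, image, and request value are placeholders):

```yaml
# Fragment of the data service Deployment pod spec (illustrative sketch)
containers:
  - name: data-service              # hypothetical container name
    image: data-service:latest      # placeholder image
    resources:
      requests:
        memory: "2000Mi"            # placeholder request
      limits:
        memory: "5000Mi"            # the limit referenced above; exceeding it
                                    # gets the container OOM-killed and restarted
```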

Response time dropped as a result of the pod restarting.

Replica Count 3

Increasing the replica count to 3 improved the throughput from 15.4/min to 20.2/min and decreased the error rate significantly, from 44% to 6%, because a single pod restarting no longer makes the whole service unavailable.

Response time remained approximately the same throughout execution.

Replica Count 5

(load test result screenshots)
