Our primary goal was to build a scalable and efficient backend for a ticketing service that could handle high loads with minimal latency. The project focuses exclusively on the backend and infrastructure aspects, omitting a frontend interface to concentrate on the underlying mechanics and performance. This repository highlights the infrastructure components, illustrating our journey through building the infrastructure, creating CI/CD pipelines, and managing containers.
Programming Language & Framework: We chose Kotlin and Spring Boot for their expressive syntax and powerful suite of tools for building web applications efficiently.
Database: We utilized MySQL for its robust locking capabilities, which are essential for managing concurrent operations within our ticketing system. This ensures data integrity and consistent performance under high-load scenarios (a minimal locking sketch follows this list).
Containerization & Orchestration: Kubernetes was used to manage our containerized applications, enabling easy scaling and management across multiple servers hosted on AWS.
Configuration Management: Helm charts helped us streamline the installation and management of our Kubernetes applications.
Continuous Deployment: ArgoCD was employed to automate the deployment process, ensuring our changes were seamlessly and reliably pushed to production.
Infrastructure as Code: Terraform allowed us to define our infrastructure using configuration files, which helped in maintaining consistency and ease of deployment across environments.
Performance Testing: We employed K6 to conduct spike tests, simulating scenarios with excessive simultaneous access to evaluate the performance and robustness of our system under extreme conditions.
Monitoring: We integrated Prometheus and Grafana to monitor our applications and infrastructure, ensuring high availability and performance through real-time insights.
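As a minimal sketch of the row-locking pattern MySQL enables here (assuming a standard Spring Boot Kotlin setup with the kotlin-spring and kotlin-jpa plugins; the entity, repository, and service names are illustrative, not the project's actual code), a Spring Data JPA repository can request a pessimistic lock so that concurrent reservations for the same event are serialized by the database:

```kotlin
import jakarta.persistence.Entity
import jakarta.persistence.Id
import jakarta.persistence.LockModeType
import org.springframework.data.jpa.repository.JpaRepository
import org.springframework.data.jpa.repository.Lock
import org.springframework.stereotype.Service
import org.springframework.transaction.annotation.Transactional

// Hypothetical entity tracking remaining seats for an event.
@Entity
class EventSeat(@Id val eventId: Long = 0, var remaining: Int = 0)

interface EventSeatRepository : JpaRepository<EventSeat, Long> {
    // Translates to SELECT ... FOR UPDATE, so concurrent transactions
    // touching the same row queue up on MySQL's row lock.
    @Lock(LockModeType.PESSIMISTIC_WRITE)
    fun findByEventId(eventId: Long): EventSeat?
}

@Service
class ReservationService(private val seats: EventSeatRepository) {
    @Transactional
    fun reserve(eventId: Long) {
        val seat = seats.findByEventId(eventId) ?: error("unknown event $eventId")
        check(seat.remaining > 0) { "sold out" }
        seat.remaining -= 1 // flushed on commit, after which the row lock is released
    }
}
```

Because the lock turns the generated query into a `SELECT ... FOR UPDATE`, two transactions decrementing the same event's seats run one after the other instead of losing an update.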
In the course of developing the project, we tackled a range of infrastructure challenges and optimizations. Below are key resources and discussions that provide insight into our decision-making process and the solutions we implemented:
- Building the Deployment Environment with Terraform: This issue tracks our use of Terraform to automate the provisioning of our entire cloud environment, focusing on reliability and scalability.
- Migration from AWS ALB to Nginx Ingress (Baremetal): To reduce costs, we replaced AWS ALB with a more cost-effective Nginx Ingress setup on bare metal. This discussion details the reasons behind the change and the implementation process.
- Using Public Subnet Node Group to Address NAT Gateway Cost Issues: We opted to configure our EKS cluster using a public subnet node group to avoid high costs associated with NAT gateways.
- How to Scrape Metrics from Multiple Pods Using Spring Actuator: This article explains how we set up Prometheus, via Helm, to scrape metrics from multiple pods, enhancing our monitoring capabilities using Spring Boot’s Actuator.
- Injecting Secrets into EKS Pods Using Terraform: We explored methods to securely inject secrets into our Kubernetes pods using Terraform, ensuring sensitive data is managed safely and effectively.
- Queue System Design Issues: Discusses considerations for preventing update losses, implementing non-blocking APIs, and choosing data structures for the queue system.
- Project Package Structure Considerations: Deliberations on how to organize the project's package structure effectively.
- Convention Documentation: Defines conventions for branch naming, commit messages, HTTP response structures, serialization, testing, and more.
- API Enhancement Considerations: Detailed discussion on time conventions, data transfer between layers (errors, responses), logging best practices, and their implementation.
- Maintaining Over 80% Test Coverage with Jacoco and Codecov: Outlines strategies and efforts to maintain a high level of test coverage using Jacoco and Codecov.
- Integration Testing Environment with Testcontainers and MySQL Container: Describes the setup of an integration testing environment using Testcontainers and a MySQL Docker container to improve testing reliability and consistency (a minimal setup is sketched after this list).
- Considerations for Building the Performance Test Environment: A detailed discussion on the setup and challenges of creating a suitable environment for performance testing.
- Detailed Performance Test Scenarios: This link provides a thorough description of the performance test scenarios used in our project.
- Calculating Costs for Spike Testing Using ALB LCU: An analysis of the cost implications of spike testing when AWS ALB usage is billed in Load Balancer Capacity Units (LCUs).
- Creating K6 Performance Test Scripts: Discussion and documentation on how we developed K6 scripts for our performance testing.
- Database Setup for Test Data and Large-Scale Data Insertions: Outlines our approach to preparing the database for testing, including the creation of large datasets.
- Building a Monitoring Environment with Prometheus and Grafana: Details on how we configured Prometheus and Grafana to monitor our application and infrastructure during the performance tests.
- Improved slow queries by adding a single-column index to a table with 1 million records.
- Observed the effect of encryption on CPU performance: measured the impact of increasing the CPU core count and of adjusting the encryption strength (cost level).
- Observed changes in the JVM CodeHeap and in performance by repeating the same test after process startup (i.e., JIT warm-up effects).
- Improved the performance of `SELECT COUNT(*)` over ten million records by implementing NoOffset (keyset) pagination (see the pagination sketch after this list).
- Introduced a queue system after considering lock contention on a single resource (the Event), and observed the results in tests.
- Improved CPU resource usage by modifying the thread pool's thread-creation strategy (see the executor sketch after this list).
- Reduced pending connections by modifying the DB connection pool strategy (see the connection-pool sketch after this list).
- Improved latency by applying Redis caching (see the caching sketch after this list).
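Below are a few minimal sketches of the techniques referenced above; all class names, image tags, and numbers are illustrative placeholders rather than the project's exact code. First, the Testcontainers setup: a disposable MySQL container is started per test class, and its connection details are injected into Spring Boot before the context loads.

```kotlin
import org.junit.jupiter.api.Test
import org.springframework.boot.test.context.SpringBootTest
import org.springframework.test.context.DynamicPropertyRegistry
import org.springframework.test.context.DynamicPropertySource
import org.testcontainers.containers.MySQLContainer
import org.testcontainers.junit.jupiter.Container
import org.testcontainers.junit.jupiter.Testcontainers
import org.testcontainers.utility.DockerImageName

@Testcontainers
@SpringBootTest
class ReservationIntegrationTest {

    companion object {
        // One real MySQL instance shared by the tests in this class.
        @Container
        @JvmStatic
        val mysql = MySQLContainer<Nothing>(DockerImageName.parse("mysql:8.0"))

        // Point Spring's datasource at the container before the context starts.
        @DynamicPropertySource
        @JvmStatic
        fun datasourceProperties(registry: DynamicPropertyRegistry) {
            registry.add("spring.datasource.url", mysql::getJdbcUrl)
            registry.add("spring.datasource.username", mysql::getUsername)
            registry.add("spring.datasource.password", mysql::getPassword)
        }
    }

    @Test
    fun `repository tests run against a real MySQL instance`() {
        // Service/repository assertions would go here.
    }
}
```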
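The NoOffset change is described in detail in the linked issue; as a general illustration of the idea (names below are hypothetical), keyset pagination replaces `LIMIT ... OFFSET ...` and the accompanying `COUNT(*)` over millions of rows with a range scan that resumes from the last id already delivered:

```kotlin
import jakarta.persistence.Entity
import jakarta.persistence.GeneratedValue
import jakarta.persistence.GenerationType
import jakarta.persistence.Id
import org.springframework.data.domain.Pageable
import org.springframework.data.jpa.repository.JpaRepository

// Hypothetical read model for a ticket listing.
@Entity
class Ticket(
    @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
    val id: Long = 0,
    val eventId: Long = 0,
)

interface TicketRepository : JpaRepository<Ticket, Long> {
    // Keyset ("NoOffset") pagination: WHERE id < :lastId ORDER BY id DESC LIMIT :size.
    // The primary-key range scan starts at lastId instead of reading and discarding
    // `offset` rows, and no COUNT(*) over the whole table is needed for paging.
    fun findByIdLessThanOrderByIdDesc(lastId: Long, pageable: Pageable): List<Ticket>
}

// Usage: fetch the next 20 rows after the last id of the previous page.
// val nextPage = ticketRepository.findByIdLessThanOrderByIdDesc(lastSeenId, PageRequest.of(0, 20))
```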
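For the thread-pool change, the actual tuned values are in the linked issue; the sketch below only illustrates the knobs involved. Spring's `ThreadPoolTaskExecutor` creates threads beyond `corePoolSize` only after its queue is full, so core size, queue capacity, and max size together determine how aggressively new threads (and the CPU cost of creating them) appear under load:

```kotlin
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor

@Configuration
class AsyncExecutorConfig {

    // Placeholder numbers; the real values came out of load-test results.
    @Bean
    fun reservationExecutor(): ThreadPoolTaskExecutor = ThreadPoolTaskExecutor().apply {
        corePoolSize = 20     // threads that are kept alive and reused
        maxPoolSize = 50      // extra threads are created only once the queue is full
        setQueueCapacity(500) // tasks buffered before the pool grows past the core size
        setThreadNamePrefix("reservation-")
        // Spring initializes the executor when the bean is created.
    }
}
```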
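The pending-connection fix is likewise documented in its issue; the values below are placeholders. With HikariCP (Spring Boot's default pool), the maximum pool size and connection timeout decide how many requests can hold a connection at once and how long the rest wait in the pending state; in a typical Spring Boot app the same knobs would be set via `spring.datasource.hikari.*` properties rather than a bean:

```kotlin
import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.HikariDataSource
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

@Configuration
class DataSourceConfig {

    @Bean
    fun dataSource(): HikariDataSource {
        val config = HikariConfig().apply {
            jdbcUrl = "jdbc:mysql://localhost:3306/ticketing" // hypothetical URL
            username = "app"
            password = "secret"
            maximumPoolSize = 30       // upper bound on concurrent DB connections
            minimumIdle = 30           // keep the pool warm instead of resizing under load
            connectionTimeout = 3_000L // ms a request may wait before failing fast
        }
        return HikariDataSource(config)
    }
}
```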
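Finally, for the Redis caching improvement, a minimal sketch of Spring's cache abstraction backed by Redis (the entity, DTO, and service names are hypothetical); the cached read keeps hot event lookups off MySQL during traffic spikes:

```kotlin
import jakarta.persistence.Entity
import jakarta.persistence.Id
import org.springframework.cache.annotation.Cacheable
import org.springframework.cache.annotation.EnableCaching
import org.springframework.context.annotation.Configuration
import org.springframework.data.jpa.repository.JpaRepository
import org.springframework.stereotype.Service
import java.io.Serializable

// With spring-boot-starter-data-redis on the classpath, Spring Boot auto-configures
// a Redis-backed CacheManager for the cache abstraction enabled here.
@Configuration
@EnableCaching
class CacheConfig

@Entity
class Event(@Id val id: Long = 0, val name: String = "")

// The cached value must be serializable for the default JDK Redis serializer.
data class EventDto(val id: Long, val name: String) : Serializable

interface EventRepository : JpaRepository<Event, Long>

@Service
class EventQueryService(private val events: EventRepository) {

    // First call for an id hits MySQL; later calls are answered from Redis.
    // (Assumes the standard kotlin-spring all-open plugin so the proxy can intercept this call.)
    @Cacheable(cacheNames = ["event"], key = "#id")
    fun getEvent(id: Long): EventDto =
        events.findById(id).orElseThrow().let { EventDto(it.id, it.name) }
}
```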
| Junha Ahn | Hayoung Lim | Jeongseop Park | Minjun Kim |
|---|---|---|---|
| @junha-ahn | @hihahayoung | @ParkJeongseop | @minjun3021 |
| Infrastructure (Leader) | Infrastructure | Backend | Backend |