Ticket Master is a high-performance ticket reservation system capable of processing 1,000,000 reservations within 12 seconds.
The system is built on Kafka Streams, providing:
- Stateful Stream Processing: Utilizes RocksDB as the state backend for efficient state read/write operations.
- Exactly-Once Processing Semantics: Ensures data consistency even in the presence of failures.
- Horizontal Scalability: Scales seamlessly with the number of Kafka topic partitions.
- State Querying: Supports Interactive Queries to directly access the application’s local state store.
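As a hedged illustration of these building blocks, the minimal Kafka Streams sketch below (not the project's actual topology; the topic and store names are hypothetical) enables exactly-once processing, materializes a count in a persistent RocksDB-backed state store, and reads it back through an Interactive Query:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Exactly-once processing semantics (EOS v2).
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        StreamsBuilder builder = new StreamsBuilder();
        // Stateful processing: the count is materialized in a persistent,
        // RocksDB-backed state store ("reservation-counts" is a made-up name).
        builder.stream("reservation-requests", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .count(Materialized.as("reservation-counts"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));

        // Interactive Query: read the local state store directly
        // (valid once the instance reaches the RUNNING state).
        ReadOnlyKeyValueStore<String, Long> counts = streams.store(
                StoreQueryParameters.fromNameAndType(
                        "reservation-counts", QueryableStoreTypes.keyValueStore()));
        System.out.println("event-1 count: " + counts.get("event-1"));
    }
}
```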
I published three Medium stories to introduce this system:
- Part 1: Dataflow Architecture
- Part 2: Data-Driven Optimizations
- Part 3: Infra, Observability, Load Test
The system adopts the Dataflow architecture, originally introduced in *Designing Data-Intensive Applications*, and consists of:
- Ticket Service: Acts as the API gateway, handling users’ HTTP requests and forwarding reservation requests to the stream processing system.
- Reservation Service: Kafka Streams application that manages the reservation state.
- Event Service: Kafka Streams application that manages the event and section availability state.
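As a hedged sketch of the hand-off between these components (this is not code from the repo: the topic name, the choice of key, and the use of String instead of the project's Avro serialization are all assumptions for illustration), the Ticket Service only publishes a reservation request event and leaves the state changes to the stream processors:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Hypothetical sketch: the HTTP handler performs no reservation logic itself;
// it forwards the request as an event. Keying by event ID (an assumption) would
// route all requests for the same event to one partition, where a single
// Kafka Streams task processes them in order against its local state.
public class ReservationForwarder implements AutoCloseable {
    private final KafkaProducer<String, String> producer;

    public ReservationForwarder(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        this.producer = new KafkaProducer<>(props);
    }

    public void forward(String eventId, String reservationRequestJson) {
        // "reservation.request" is a placeholder topic name.
        producer.send(new ProducerRecord<>("reservation.request", eventId, reservationRequestJson));
    }

    @Override
    public void close() {
        producer.close();
    }
}
```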
- Traces: powered by the OpenTelemetry Java Agent, collected via the OTLP Collector, and exported to Google Cloud Trace.
- Logs: written to standard output and collected using GKE's native logging support.
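The agent instruments Kafka clients and HTTP servers automatically, with no code changes. If an application-level span were ever needed, a minimal hedged sketch using the OpenTelemetry API would look like this (not code from this repo; the tracer, span, and attribute names are made up):

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class ManualSpanSketch {
    public static void main(String[] args) {
        // When running under the agent, GlobalOpenTelemetry is wired to the
        // same exporter pipeline (OTLP Collector -> Cloud Trace).
        Tracer tracer = GlobalOpenTelemetry.getTracer("manual-span-sketch");
        Span span = tracer.spanBuilder("reserve-seats").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            span.setAttribute("event.id", "event-1"); // hypothetical attribute
            // ... application logic would run here ...
        } finally {
            span.end();
        }
    }
}
```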
- Tag Push to GitHub
- Trigger Cloud Build for Testing: Cloud Build is triggered by the tag push. It runs unit and integration tests using `mvn test`.
- Build and Push Docker Image: If all tests pass, Cloud Build builds a Docker image using the Git tag as the image version, then pushes it to Artifact Registry.
- (Optional) Create or update the Kubernetes overlay in `deployment/k8s-configs/overlays` (example).
- (Optional) Overwrite the application config under the newly created directory.
- Run:

```
make deploy -e PARTITIONS_COUNT=40 -e PERF_TYPE=40-instance-perf
```

- `PARTITIONS_COUNT`: Number of partitions for Kafka topics.
- `PERF_TYPE`: Name of the overlay folder used in the deployment.

To tear down the deployment, run:

```
make destroy -e PERF_TYPE=40-instance-perf
```

- `PERF_TYPE`: Name of the overlay folder used in the deployment.
```
kubectl get gateway
NAME            CLASS                              ADDRESS         PROGRAMMED   AGE
external-http   gke-l7-regional-external-managed   35.206.193.99   True         14m
internal-http   gke-l7-rilb                        10.140.0.41     True         14m
```

You can run a load test from:
- The local machine, sending requests to the `external-http` IP address.
- A Google Compute Engine instance within the same VPC, sending requests to the `internal-http` IP address.
The objective of the smoke test is to:
- Verify that the setup is free of basic configuration or runtime errors.
- Allow the system to initialize and establish connections with Kafka and the Schema Registry.
```
# under scripts/perf/k6/ directory.
k6 run smoke.js -e HOST_PORT=[IP_ADDRESS] -e NUM_OF_AREAS=40
```

- `HOST_PORT`: IP address of the ticket service (the gateway address in the Kubernetes deployment).
- `NUM_OF_AREAS`: Number of areas for each event.
The objective of the stress test is to:
- Observe the system's performance under high traffic over a specific duration.
- Warm up the components for the spike test.
```
# under scripts/perf/k6/ directory.
k6 run stress.js -e HOST_PORT=[IP_ADDRESS] -e NUM_OF_AREAS=40
```

- `HOST_PORT`: IP address of the ticket service (the gateway address in the Kubernetes deployment).
- `NUM_OF_AREAS`: Number of areas for each event.
Spike testing is critical for ticketing systems, as traffic typically surges immediately after ticket sales begin.
```
# under scripts/perf/go-client directory.
go run main.go --host [IP_ADDRESS] -a 100 -env prod --http2 -n 250000 -c 4
```

- `--host`: IP address of the ticket service (the gateway address in the Kubernetes deployment).
- `-a`: Number of areas for this event.
- `--env`: `prod` disables logging.
- `--http2`: If present, sends traffic using HTTP/2.
- `-n`: Number of concurrent requests.
- `-c`: Number of HTTP clients; multiple clients mitigate lock contention in high-concurrency scenarios.
- Get the pod name with `kubectl get pods`.
- Enter the pod with `kubectl exec --stdin --tty [POD_NAME] -- /bin/bash`.
- Inside the pod:
  - Download the JDK:

    ```
    wget https://download.oracle.com/java/24/latest/jdk-24_linux-x64_bin.deb
    dpkg -i jdk-24_linux-x64_bin.deb
    ```

  - Start profiling the application with the following command:

    ```
    jcmd 1 JFR.start duration=60s filename=/tmp/recording.jfr settings=/usr/lib/jvm/jdk-24.0.1-oracle-x64/lib/jfr/profile.jfc
    ```

- Download the recording file from the pod:

  ```
  kubectl cp [POD_NAME]:/tmp/recording.jfr recording.jfr --retries 999
  ```

- Open the JFR recording with JDK Mission Control.
- Run the spike test with the following flags:

  ```
  --cpuprofile file, --cpu file        write cpu profile to file
  --memprofile file, --mem file        write memory profile to file
  --blockprofile file, --block file    write block profile to file
  --lockprofile file, --lock file      write lock profile to file
  ```

- Visualize the profiles:

  ```
  pprof -web [PROFILE_FILE_PATH]
  ```
- Docker Desktop
- Java
- OpenTelemetry Java Agent: the following examples assume the agent is placed under the `otel/` directory.
```
docker compose up -d
```

This starts:
- Kafka (KRaft mode)
- Schema Registry: RESTful interface for storing and retrieving Avro schemas.
- Jaeger: Distributed tracing observability platform.
- Kafdrop: Kafka Web UI for viewing Kafka topics and browsing consumer groups.
- Applications:
- Ticket Service
- Reservation Service
- Event Service
```
./mvnw test
```

This command runs both unit and integration tests. For local load testing, see Load Test.
- Add or update `.avro` files under `./src/main/resources/avro`.
- Run `./mvnw generate-sources` to generate the corresponding Java classes (see the usage sketch below).
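Once generated, the classes are ordinary Avro `SpecificRecord` beans with builders. A hedged usage sketch follows; the `CreateReservation` class and its fields are hypothetical, and the real names come from the project's schema files:

```java
// Hypothetical generated class; actual classes are produced from the schema
// files under ./src/main/resources/avro by ./mvnw generate-sources.
CreateReservation request = CreateReservation.newBuilder()
        .setEventId("event-1")   // hypothetical field
        .setNumOfSeats(2)        // hypothetical field
        .build();
```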
The following properties can be configured by setting environment variables or via the `-D` flag:

- `OTEL_EXPORTER_OTLP_ENDPOINT`: The Jaeger endpoint.
- `OTEL_SERVICE_NAME`: The service name included in the spans.
- `OTEL_TRACES_SAMPLER`: The sampler, as described in the OpenTelemetry sampler documentation.
- `OTEL_TRACES_SAMPLER_ARG`: The sampling rate, as described in the same documentation.
```
-XX:+UseZGC -XX:+ZGenerational -Xmx2G -Xms2G -XX:+AlwaysPreTouch
```

We recommend using the Z Garbage Collector to minimize pause times and ensure low latency.

- `-XX:+UseZGC -XX:+ZGenerational`: Configure the JVM to use generational ZGC.
- `-Xmx2G -Xms2G`: Set the minimum and maximum heap to the same value to avoid resizing the heap at runtime.
- `-XX:+AlwaysPreTouch`: Page in memory before the application starts.
```
./mvnw clean package
```

This uses the maven-shade-plugin to build an uber-jar.
```
java -javaagent:./otel/opentelemetry-javaagent.jar \
-Dotel.service.name=ticket-service \
-cp target/ticket-master-1.0-SNAPSHOT-shaded.jar \
lab.tall15421542.app.ticket.Service -p 8080 -d ./tmp/ticket-service/ -n 0 \
-c appConfig/client.dev.properties \
-pc appConfig/ticket-service/producer.properties \
-sc appConfig/ticket-service/stream.properties \
-r
```

- `-n`: The maximum number of virtual threads used by Jetty; `0` means unlimited.
- `-p`: The HTTP port of the ticket service.
- `-d`: Directory path for storing state.
- `-c`: Config file path for Kafka and Schema Registry connectivity properties.
- `-pc`: Config file path for Kafka producer properties.
- `-sc`: Config file path for Kafka Streams properties.
- `-r`: If present, enables the request log.
- `-a`: Number of Jetty acceptors.
- `-s`: Number of Jetty selectors.
```
java -javaagent:./otel/opentelemetry-javaagent.jar \
-Dotel.service.name=reservation-service \
-cp target/ticket-master-1.0-SNAPSHOT-shaded.jar \
lab.tall15421542.app.reservation.Service \
-c appConfig/client.dev.properties \
-sc appConfig/reservation-service/stream.properties \
-d ./tmp/reservation-service
```

- `-c`: Config file path for Kafka and Schema Registry connectivity properties.
- `-sc`: Config file path for Kafka Streams properties.
- `-d`: Directory path for storing state.
```
java -javaagent:./otel/opentelemetry-javaagent.jar \
-Dotel.service.name=event-service \
-cp target/ticket-master-1.0-SNAPSHOT-shaded.jar \
lab.tall15421542.app.event.Service \
-c appConfig/client.dev.properties \
-sc appConfig/event-service/stream.properties \
-d ./tmp/event-service
```

- `-c`: Config file path for Kafka and Schema Registry connectivity properties.
- `-sc`: Config file path for Kafka Streams properties.
- `-d`: Directory path for storing state.
