How I Built a Hybrid Auto-Scaling Microservice Platform with Kubernetes, AWS Fargate, and CI/CD

I started this project on Friday night, planning to be done by the next morning. And there I was, six hours later, wide awake, staring at terminal windows as pods scaled up and down while my current Netflix rewatch played in the background. Below is the complete breakdown of my six-hour DevOps journey: what I built, how much it cost, and what I learned as I worked through the night.

Desiree' Weston

12/1/2025 · 8 min read

Why Build This?

I’ve read enough AWS documentation and watched enough DevOps learning videos on YouTube to know what happens when architecture doesn’t scale. Monoliths slow down once traffic grows. They hide bottlenecks deep in tangled codebases. They require manual deployments that keep engineers up at night. They lack the visibility needed to catch problems before users do. You end up fighting fires instead of preventing them.

I wanted something better. Something that represented the kind of infrastructure modern companies actually run.

I wanted automatic scaling that responds to real demand, not guesswork. I wanted a self-healing infrastructure that recovers from failures without human intervention. I wanted hands-off deployments where code moves from commit to production without me SSH-ing into servers or clicking through console dashboards.

Most importantly, I wanted real monitoring. The kind that tells you what’s happening before your users start complaining. And I wanted all of this wrapped in a cloud-native design that could pass a production sniff test, not just a toy project that lives in a tutorial.

But there was another reason too. I wanted a portfolio project that shows I can think like an engineer, not just follow copy-paste steps from documentation. Something that demonstrates I understand the trade-offs, the failure modes, and the reasoning behind architectural decisions.

This project delivered on all of it.

The Architecture

I built a hybrid setup that runs the same microservice in two different environments. This wasn’t just for the sake of complexity. It gave me hands-on experience with both on-premises orchestration and cloud-native managed services, showing how the same containerized application behaves differently depending on where it runs.

Kubernetes Homelab: Minikube

On my local machine, I set up a Minikube cluster running a FastAPI microservice packaged as a Docker image. The deployment includes a Kubernetes Deployment manifest that defines the number of pod replicas, a Service that exposes the application internally, and a HorizontalPodAutoscaler that watches resource usage and adjusts the pod count accordingly.
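
To give a sense of the shape of those manifests, here is a trimmed-down sketch of the Deployment and Service; the names, image, and resource numbers are illustrative rather than copied from my repo (the CPU request matters, because the HPA's percentage target is calculated against it):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fastapi-app
  template:
    metadata:
      labels:
        app: fastapi-app
    spec:
      containers:
        - name: fastapi-app
          image: <dockerhub-user>/fastapi-app:latest   # placeholder image name
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: 100m        # HPA CPU percentages are relative to this request
              memory: 128Mi
            limits:
              cpu: 250m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: fastapi-app
spec:
  selector:
    app: fastapi-app
  ports:
    - port: 80
      targetPort: 8000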

To make autoscaling work, I installed the metrics server, which collects CPU and memory data from each pod and surfaces it to the HPA controller. This gave me a complete local environment where I could run load tests and watch the system scale in real time. The observability here is basic but effective. Kubectl commands show me pod status, resource consumption, and event logs that reveal exactly what Kubernetes is doing behind the scenes.
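
On Minikube this takes only a couple of commands; the HPA and Service names depend on your manifests, but the flow looks like this:

minikube addons enable metrics-server   # metrics server ships as a built-in addon
kubectl top pods                        # per-pod CPU/memory once metrics start flowing
kubectl get hpa -w                      # watch current utilization vs. target and replica count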

AWS: ECS Fargate + Application Load Balancer

On the AWS side, I deployed the same Docker image to ECS Fargate, which handles container orchestration without requiring me to manage any EC2 instances. Fargate is serverless in the sense that I define the task and service, and AWS handles the entire compute layer. I don’t patch servers. I don’t worry about instance types. And I don’t pay for idle capacity.

In front of the Fargate tasks sits an Application Load Balancer, which handles incoming HTTP traffic and routes requests to healthy containers. The ALB performs continuous health checks against the /health endpoint. If a task fails, it stops receiving traffic immediately while ECS spins up a replacement.

CloudWatch collects metrics, logs, and alarms. I configured autoscaling policies based on CPU utilization, so when the load increases, ECS automatically launches additional tasks. When traffic drops, it scales back down. The entire system is self-regulating, which is precisely what I wanted.

Complete CI/CD Pipeline

I wired up a GitHub Actions pipeline that triggers on every push to the main branch. The workflow is straightforward but powerful. It builds the Docker image, tags it with the commit SHA for traceability, pushes it to Docker Hub, registers an updated ECS task definition with the new image tag, and triggers a rolling deployment.

ECS handles the deployment itself with a rolling update: new tasks spin up, pass health checks, and only then does the load balancer shift traffic away from the old version. There's zero downtime. I don't manually deploy anything. The pipeline does it all.

Infrastructure as Code with Terraform

Everything on AWS is defined in Terraform. That means the ECS cluster, task definitions, services, the Application Load Balancer with its target groups, IAM roles and policies, autoscaling rules, CloudWatch log groups, and security groups all live in version-controlled .tf files.

This matters because it makes the infrastructure reproducible. If I destroy everything and terraform apply again, I get an identical environment. No clicking through the AWS console, trying to remember what settings I used last time. Everything is explicit, auditable, and portable.

How It Works

Step 1: Building the Microservice

The FastAPI service itself is intentionally simple. It’s not meant to be feature-rich. It’s intended to be a reliable test bed for infrastructure. The core of the application is a single health check endpoint:

from fastapi import FastAPI

app = FastAPI()

# Health endpoint polled by Kubernetes probes and the ALB health check
@app.get("/health")
def health():
    return {"status": "ok", "version": "2"}

This single endpoint becomes the heartbeat for both Kubernetes and ECS; each orchestrator repeatedly hits this path to check whether a container is healthy. If the endpoint stops responding or returns an error, the orchestrator assumes something is wrong and takes action: Kubernetes restarts the pod, and ECS replaces the task.
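
On the Kubernetes side, that heartbeat is wired up as liveness and readiness probes inside the container spec of the Deployment. A sketch, with intervals chosen purely for illustration:

livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10        # restart the container if this keeps failing
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5         # stop routing traffic to the pod until it passes again

On the AWS side, the same job is done by the ALB target group's health check against the same path.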

Step 2: Running it on Kubernetes

I deployed the service to Minikube using three fundamental manifests: a Deployment to manage the lifecycle of the pods, a Service to expose them within the cluster, and a HorizontalPodAutoscaler to monitor CPU utilization and adjust the number of replicas accordingly.
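
The Deployment and Service are sketched in the architecture section above; the HPA is the interesting piece here. A minimal version, assuming a floor of two and a ceiling of six replicas (matching what I saw during the load test) and an illustrative CPU target:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fastapi-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fastapi-app
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # illustrative threshold; tune to taste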

With those configurations in place, I ran load tests with hey and ab to simulate traffic spikes. One of the most satisfying moments of this project was watching the HPA go to work: the pod count started at two replicas, and when CPU usage climbed past the threshold, Kubernetes cranked out more pods. First three, then four, then six. When the load subsided, it scaled back down gracefully.
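
If you want to reproduce that spike, invocations along these lines work; the Service name and the concurrency numbers are arbitrary:

kubectl port-forward svc/fastapi-app 8080:80 &    # forward the Service to localhost
hey -z 60s -c 50 http://localhost:8080/health     # 50 concurrent workers for one minute
ab -n 10000 -c 100 http://localhost:8080/health   # ApacheBench equivalent: 10,000 requests, 100 at a time

Note that port-forward pins traffic to a single pod, but that is still enough to push average CPU over the target and trigger the HPA.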

Seeing this happen in real time made the system feel alive — it wasn’t just sitting there. It responded to conditions, made decisions, and adapted all without any manual intervention.

Step 3: Building the AWS Infrastructure using Terraform

I wrote Terraform modules to define each piece of the AWS architecture. The ECS cluster provides a logical grouping for tasks. The Fargate task definition includes the container image, CPU and memory limits, environment variables, and logging configuration.

The ECS service manages how many tasks run at any given time, with a minimum of 2 and a maximum of 4. The Application Load Balancer sits in front of everything, listening on port 80 and forwarding traffic to healthy tasks on port 8000. I configured listener rules, target group health checks, and deregistration delays to ensure smooth traffic flow even during deployments.
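
Condensed, the task definition and service look roughly like this in Terraform. The names, the task sizing, and the referenced resources (cluster, roles, log group, subnets, security group, target group) are stand-ins for whatever the real configuration declares:

resource "aws_ecs_task_definition" "app" {
  family                   = "fastapi-app"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "256"    # 0.25 vCPU (illustrative sizing)
  memory                   = "1024"   # 1 GB
  execution_role_arn       = aws_iam_role.task_execution.arn

  container_definitions = jsonencode([{
    name         = "fastapi-app"
    image        = "docker.io/<dockerhub-user>/fastapi-app:latest"
    portMappings = [{ containerPort = 8000 }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        awslogs-group         = aws_cloudwatch_log_group.app.name
        awslogs-region        = var.aws_region
        awslogs-stream-prefix = "fastapi"
      }
    }
  }])
}

resource "aws_ecs_service" "app" {
  name            = "fastapi-app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.public_subnet_ids
    security_groups  = [aws_security_group.tasks.id]
    assign_public_ip = true
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "fastapi-app"
    container_port   = 8000
  }
}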

The scaling policies instruct ECS to add tasks when CPU climbs above 70% and to remove tasks when it falls below 30%. The CloudWatch log groups capture each container's stdout and stderr. The IAM roles give the tasks permission to write logs and pull images from Docker Hub.
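
In Terraform that is a scalable target plus scaling policies. The sketch below uses a single target-tracking policy at 70% CPU, which handles both scale-out and scale-in; the separate 70%/30% behavior described above could instead be expressed as two step-scaling policies driven by CloudWatch alarms:

resource "aws_appautoscaling_target" "ecs" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 4
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 70   # keep average service CPU around 70%
    scale_in_cooldown  = 60
    scale_out_cooldown = 60
  }
}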

Security groups restrict inbound traffic to only what is necessary: port 80 on the ALB from the internet, and port 8000 on the tasks from the ALB only, so nothing else can reach the containers.
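
The two security groups express that rule in a handful of lines (names are illustrative, and egress rules are trimmed for brevity):

resource "aws_security_group" "alb" {
  name   = "alb-sg"
  vpc_id = var.vpc_id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]   # the internet can reach the ALB on port 80
  }
}

resource "aws_security_group" "tasks" {
  name   = "tasks-sg"
  vpc_id = var.vpc_id

  ingress {
    from_port       = 8000
    to_port         = 8000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]   # only the ALB can reach the tasks
  }
}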

This is great because when it’s time to tear everything down, I run terraform destroy, and Terraform removes every resource it created, in the correct dependency order. No orphaned load balancers. No forgotten security groups. Clean slate.

Step 4: CI/CD Setup

Every push to the main branch starts the GitHub Actions workflow. The pipeline checks out the code, logs in to Docker Hub, builds an image with a tag based on the Git commit SHA, and pushes it to the registry.

Then it updates the ECS task definition JSON file to reference the new image tag, registers the updated definition with ECS, and tells the ECS service to deploy it. ECS manages the rollout. It launches new tasks running the updated image, waits for them to pass health checks, then drains connections from the old tasks before terminating them.
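
The workflow file itself is short. Here's a pared-down sketch of its shape; the action versions, secret names, region, and file paths are placeholders rather than the exact values in my repo:

name: deploy
on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push an image tagged with the commit SHA
        env:
          IMAGE: ${{ secrets.DOCKERHUB_USERNAME }}/fastapi-app:${{ github.sha }}
        run: |
          docker build -t "$IMAGE" .
          docker push "$IMAGE"

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Point the task definition at the new image
        id: render
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: fastapi-app
          image: ${{ secrets.DOCKERHUB_USERNAME }}/fastapi-app:${{ github.sha }}

      - name: Deploy and wait for the rollout to stabilize
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.render.outputs.task-definition }}
          service: fastapi-app
          cluster: fastapi-cluster
          wait-for-service-stability: true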

Testing this, I changed the health check response version number from "version": "1" to "version": "2", committed, and pushed to GitHub. A minute later, I hit the ALB endpoint:

curl http://<alb-dns>/health

It returned:

{"status":"ok","version":"2"}

No manual deploy, no SSH, no clicking through dashboards. The pipeline did it all, and the result was live.

Observability: Because Blind Spots Break Systems

Infrastructure without observability is infrastructure waiting to fail in silence. What I needed was visibility into what was going on at every layer of the stack, in Kubernetes and on AWS.

In Kubernetes, I used kubectl get pods -w to watch pod state changes in real time. The HPA targets showed exactly what CPU percentage each pod was reporting, and how that compared to the scaling threshold. Pod events revealed scheduling decisions, image pulls, and health check failures. Kubectl top gave me a snapshot of current resource usage across all the pods.
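
In practice that boiled down to a handful of commands running in separate terminal panes (resource names here match the sketches above):

kubectl get pods -w              # watch pods appear and terminate as the HPA acts
kubectl get hpa fastapi-app -w   # current CPU vs. target and the replica count
kubectl top pods                 # live CPU/memory per pod (requires metrics-server)
kubectl describe pod <pod-name>  # events: scheduling, image pulls, probe failures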

On AWS, CloudWatch became the central nervous system. Container logs from each Fargate task stream into CloudWatch log groups, so I can search for errors, trace requests, and debug issues without needing shell access to the containers. ECS publishes CPU and memory metrics for every task and service, which I can graph and alert on.

The Application Load Balancer tracks request counts, response codes, and latency. I set up a CloudWatch alarm to trigger if the ALB receives too many 5XX errors, indicating backend failures. This kind of proactive monitoring lets the system detect issues and scale or alert before anything becomes user-facing.
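
That alarm is a single Terraform resource. The threshold and evaluation window below are illustrative, not the exact values I used:

resource "aws_cloudwatch_metric_alarm" "alb_target_5xx" {
  alarm_name          = "alb-target-5xx"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "HTTPCode_Target_5XX_Count"   # 5XX responses coming from the tasks
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 2
  threshold           = 10
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "notBreaching"

  dimensions = {
    LoadBalancer = aws_lb.app.arn_suffix
  }
}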

Cost Breakdown

My primary concern going into this was that running infrastructure on AWS would blow through my budget. It turned out to be affordable for what it delivers.

With an average of 2 Fargate tasks running continuously, the estimated monthly costs break down to approximately:

  • $21 for Fargate compute

  • $21 for the Application Load Balancer

  • $3 for CloudWatch logs and metrics

  • $1 for data transfer

That’s roughly $46 per month all up.
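
For context, the Fargate line is easy to sanity-check. Assuming each task is sized at 0.25 vCPU and 1 GB of memory, and using the us-east-1 on-demand rates at the time of writing (roughly $0.04048 per vCPU-hour and $0.004445 per GB-hour):

per-task hourly cost ≈ 0.25 × $0.04048 + 1 × $0.004445 ≈ $0.0146
2 tasks × $0.0146 per hour × 730 hours/month ≈ $21/month

At four tasks that line roughly doubles to about $42, which is where the ~$67 total below comes from.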

If traffic spikes and autoscaling kicks in to run four tasks instead of two, the cost rises to about $67 a month. That's still more affordable than many Platform-as-a-Service options, and I have complete control over every aspect of the infrastructure.

The Kubernetes homelab, by contrast, has no cost other than electricity. For early-stage projects, or for local development, this is often a far more economical solution than Firebase, Heroku, or other managed platforms. And it gives you complete flexibility.

What I Learned

This project taught me far more than I expected. I learned how to debug Kubernetes networking issues when pods couldn’t reach each other. I learned to write Terraform that would survive resource changes without destroying and recreating everything needlessly. I learned how to tune autoscaling thresholds and health check intervals so the system scales fast enough to handle load without thrashing.

I learned how to build CI/CD pipelines from scratch, from building images to triggering deployments. I learned how to debug failing ECS deployments where tasks kept crashing due to misconfigured environment variables or missing IAM permissions. I learned how IAM role chains work, why task execution roles and task roles are different, and how to grant least-privilege access.

Most importantly, I learned how to design systems that scale rather than collapse. But the most significant shift wasn’t technical; it was a change in mindset: I started thinking like a DevOps engineer.

That means expecting failure and designing around it. It means assuming scale will happen and building infrastructure that handles it gracefully. It means automating everything so humans aren’t bottlenecks. It means adding observability from day one, not after something breaks. And it means keeping costs predictable so you’re not surprised by the bill at the end of the month.

Why It Matters

In interviews, teams do not want to hear that you followed a tutorial. They want to see that you can build systems that scale, recover when failures happen, deploy safely, and monitor themselves once they're in production. They want to see that you understand the trade-offs between different architectural patterns and that you can reason about why you chose one approach over another.

This project shows cloud architecture skills across multiple platforms. It demonstrates CI/CD experience with real deployments, not just toy examples. It proves I can write Infrastructure as Code that’s maintainable and reproducible. It shows Kubernetes knowledge that goes beyond running kubectl apply. And it proves I can debug real problems under pressure, the kind that don’t have Stack Overflow answers waiting.

Most importantly, it shows the ability to design and reason about systems, not just implement them. And honestly, it was really fun to build.

Concluding Remarks

This hybrid auto-scaling platform pushed me further as a cloud engineer than any tutorial or certification ever could. It showed me the complete arc that a modern system takes: develop on your local machine, containerize for portability, deploy to orchestrators, scale dynamically based on demand, observe its behavior in production, and automate everything so that it runs without continuous human intervention.

If you’re starting your own journey into cloud engineering, my advice is simple: pick projects that stretch you. Build something slightly more complex than the last thing you built. Don’t just follow tutorials; treat them as starting points and extend, break, fix, and make them your own.

That’s how you grow.