The Load Balancer Story – From Hardware Boxes to Cloud Brain (AWS + Azure Deep Dive for Engineers)

Target Audience

This article is written for the following roles:

  • Software engineers preparing for system design interviews
  • Tech workers
  • DevOps and Cloud aspirants

What You Will Learn

This article explains Load Balancers in a simple, story-based way. You will understand:

  • What a Load Balancer actually is
  • How it worked before cloud
  • How AWS and Azure Load Balancers work today

By the end, you will clearly know what happens inside that small “Load Balancer” box in every system diagram.

Chapter 1 — The Early Web (One Server Was Enough)

In the beginning, most applications lived on a single server: one machine, one IP address, and all users connected directly to it.

For small traffic, this worked perfectly. But as users increased, problems started appearing:

  • CPU became overloaded
  • Requests slowed down
  • If the server stopped → whole website went down

This design had a simple weakness: one server = one single point of failure.

So engineers added more servers. But then a new question appeared. If we have 5 servers… how does a user know which server to connect to?

Chapter 2 — The First Solution (Hardware Load Balancers)

To solve this, companies introduced a new device in front of servers. A special network box called a Load Balancer.

This box:

  • accepted all incoming traffic
  • checked which servers were healthy
  • distributed requests across them

Internally, it was just:

  • Linux OS
  • Reverse proxy software (such as HAProxy or Nginx)
  • routing logic
  • health checks

So technically, a load balancer was simply a smart reverse proxy sitting between users and servers.

Flow became:
User → Load Balancer → Server A / B / C

If Server A failed, traffic automatically moved to B or C.

Reverse proxy

A reverse proxy is just a server that stands in front of your real servers and handles all requests first. Users never talk directly to your backend machines; they never even see them. They only talk to the proxy.
The proxy receives the request, decides which backend server should handle it, forwards the request, and then sends the response back to the user.
Because of this middle layer, the system becomes safer and smarter. The proxy can distribute traffic across multiple servers, block unhealthy servers, handle HTTPS, and protect your internal machines from direct exposure.

In simple words: A reverse proxy is like a gatekeeper that controls all incoming traffic before it reaches your servers.
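The gatekeeper idea can be modeled in a few lines of Python. This is a toy sketch, not a real HTTP proxy — the backend "servers" here are plain functions, and the `ReverseProxy` class is an illustrative name, not a real library:

```python
import itertools

class ReverseProxy:
    """Toy reverse proxy: users call the proxy, never the backends."""
    def __init__(self, backends):
        self.backends = list(backends)        # real servers, hidden from users
        self._cycle = itertools.cycle(self.backends)

    def handle(self, path):
        backend = next(self._cycle)           # pick a backend (round robin here)
        return backend(path)                  # forward, then relay the response

# Two pretend backend servers (functions standing in for machines)
def server_a(path): return f"A handled {path}"
def server_b(path): return f"B handled {path}"

proxy = ReverseProxy([server_a, server_b])
print(proxy.handle("/home"))   # A handled /home
print(proxy.handle("/home"))   # B handled /home
```

A real proxy forwards HTTP requests over the network, but the control flow is exactly this: receive, choose, forward, relay.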

Traffic Flow before reverse proxy

Traffic Flow after reverse proxy

For the first time, systems became reliable.


Chapter 3 — The Hidden Challenges Behind Load Balancers

At first, adding a load balancer looked like a perfect solution.

  • Traffic was distributed.
  • Servers stopped crashing.
  • System became more reliable.

But as traffic kept growing, new problems started appearing. Not technical theory problems — real operational headaches.


Setting Rules Was Not Simple

As applications grew, one server was not enough. We now had:

  • API servers
  • Web servers
  • Image servers
  • Auth services

Requests could not go to random machines anymore. They had to be routed carefully.

For example:

  • /api/* → API servers
  • /images/* → image servers
  • /login → auth servers

So we had to configure routing rules inside the load balancer. But managing dozens of rules manually became messy. One wrong rule could send traffic to the wrong service and break the app.
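The rule table above can be sketched in Python using shell-style pattern matching. The route list and service names are illustrative, not a real proxy config:

```python
import fnmatch

# Pattern -> service, checked in order: first match wins
ROUTES = [
    ("/api/*",    "api-servers"),
    ("/images/*", "image-servers"),
    ("/login",    "auth-servers"),
]

def route(path, default="web-servers"):
    """Return the service that should handle this path."""
    for pattern, service in ROUTES:
        if fnmatch.fnmatch(path, pattern):
            return service
    return default      # no rule matched: fall back to the default pool

print(route("/api/users"))      # api-servers
print(route("/images/a.png"))   # image-servers
print(route("/about"))          # web-servers
```

Note that ordering matters: "one wrong rule" in practice is often a broad pattern placed before a narrow one, quietly swallowing traffic meant for another service.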


Listeners Added More Complexity

A listener in a load balancer checks incoming requests on a specific port and protocol (like HTTP on port 80). It then forwards those requests to the correct backend servers based on defined rules.

Different traffic types required different ports:

  • HTTP → 80
  • HTTPS → 443
  • TCP → custom ports

Each port needed a listener. Each listener needed:

  • certificates
  • routing rules
  • security settings

As services increased, listeners increased. Configuration became harder to manage.
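One way to picture the growing listener configuration is a small Python sketch. The field names and values here are illustrative assumptions, not any vendor's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Listener:
    port: int
    protocol: str
    certificate: Optional[str] = None      # only HTTPS listeners need one
    rules: list = field(default_factory=list)

listeners = [
    Listener(80,  "HTTP",  rules=["redirect to https"]),
    Listener(443, "HTTPS", certificate="myapp.com.pem",
             rules=["/api/* -> api", "/images/* -> images"]),
]

def find_listener(port):
    """Which listener (if any) accepts traffic on this port?"""
    return next((l for l in listeners if l.port == port), None)

print(find_listener(443).protocol)   # HTTPS
print(find_listener(8080))           # None: no listener, traffic rejected
```

Every new service tends to add another listener or rule, which is exactly how configuration sprawl begins.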


Scaling the Load Balancer Itself

Earlier, we only worried about scaling servers. Now we realized something new: the load balancer can also become a bottleneck.

If too many users hit one load balancer:

  • connections pile up
  • latency increases
  • sometimes the LB itself crashes

So now we had to scale:

  • backend servers
  • and the load balancer

Managing hardware capacity was expensive and complicated. We had to predict traffic in advance. If we guessed wrong, downtime happened.


Rate Limiting and Protection

Another issue appeared: bad traffic. Some users or bots could send:

  • too many requests
  • DDoS traffic
  • brute force login attempts

Without protection, backend servers could get overwhelmed. So we had to add:

  • rate limits
  • throttling
  • connection limits

All this logic had to be configured manually in proxy software. More configs. More complexity. More chances of mistakes.
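Rate limits like these are commonly implemented as a token bucket. Here is a minimal sketch; the class name and numbers are illustrative, and a real proxy would keep one bucket per client IP:

```python
class TokenBucket:
    """Allow short bursts, then throttle to a steady rate."""
    def __init__(self, capacity, rate):
        self.capacity = capacity        # max burst size
        self.rate = rate                # tokens refilled per second
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now):
        # Refill for elapsed time, capped at capacity, then spend one token
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                    # out of tokens: reject the request

bucket = TokenBucket(capacity=3, rate=1)   # burst of 3, then 1 request/sec
print([bucket.allow(now=0.0) for _ in range(5)])  # [True, True, True, False, False]
```

The appeal of the bucket model is that it allows legitimate bursts while still capping sustained abuse like brute-force login attempts.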


The Reality

At this point, the load balancer was no longer “just a simple proxy”. It became:

  • traffic manager
  • security layer
  • routing engine
  • scaling component

And managing all this on physical machines or self-managed software was painful. Teams were spending more time managing infrastructure than building features. Something had to change.


Chapter 4 — Moving to the Cloud (Load Balancer Becomes a Service)

By this point, the load balancer was no longer a small network box. It had become the most critical part of the system.

It was handling:

  • traffic routing
  • listeners and ports
  • SSL certificates
  • health checks
  • rate limiting
  • scaling
  • security

And we were managing everything manually. Every time traffic increased, we had to:

  • upgrade hardware
  • add more capacity
  • tune proxy configs
  • restart services
  • hope nothing breaks

Infrastructure work slowly became bigger than application work. Engineers were spending more time managing load balancers than building features. This didn’t scale.


The Big Shift

Cloud platforms changed this completely. No more installing hardware. No more running proxy software ourselves. No more worrying about scaling capacity.

Load balancing became a managed service.
In Amazon Web Services, this is called Elastic Load Balancing (the Application and Network Load Balancers). In Microsoft Azure, the equivalents are Azure Load Balancer (Layer 4) and Application Gateway (Layer 7).

Now we don’t manage machines. We simply configure behavior.


What changed for engineers?

Earlier:

install Linux, install HAProxy/Nginx, configure manually, manage scaling, handle failures

Now:

create a Load Balancer, add listeners, add rules, attach backend servers

That’s it.

Cloud handles:

  • auto scaling
  • high availability
  • failover
  • distributed infrastructure
  • capacity planning
  • hardware maintenance

The same reverse proxy idea still exists. But now it runs across hundreds of servers behind the scenes.


Why this matters

This small change had a big impact.

Before cloud:

Load balancer = infrastructure headache

After cloud:

Load balancer = simple configuration

Engineers could finally focus on:

  • writing code
  • improving features
  • shipping faster

instead of managing traffic manually.

Real-world architecture: how traffic flows through a Load Balancer today

In modern cloud systems, users never directly connect to servers anymore.

There is always a smart layer in between. That layer is the Load Balancer. Today, almost every production system follows this pattern:

User → Load Balancer → Services → Database

This has become the default architecture for startups, SaaS apps, and enterprise systems.

Step-by-step traffic flow

Let’s follow one real request. Imagine a user opens: https://myapp.com/api/users

Here’s what actually happens behind the scenes.

Step 1 — User hits the Load Balancer

The request first reaches the public (or private) IP of the Load Balancer, not your servers. Servers stay hidden inside private networks. This improves both security and control.

Step 2 — Listener accepts the request

The listener checks:

  • port (80 or 443)
  • protocol (HTTP/HTTPS)

If it’s HTTPS, the load balancer also terminates SSL/TLS, so backend servers don’t need to manage certificates.

Step 3 — Rules decide routing

Next, routing rules are evaluated.

For example:

  • /api/* → API service
  • /images/* → image service
  • /login → auth service

This allows one single domain to serve many microservices.


Step 4 — Target group / backend pool selection

Now the load balancer selects a healthy server from the correct group.

Example: API Group → API-1, API-2, API-3

It chooses one using:

  • round robin
  • least connections
  • or smart algorithms

If one server is unhealthy, it is automatically skipped.
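Picking a healthy server from a group can be sketched with a small stateful class. The `TargetGroup` name is illustrative (AWS calls these target groups, Azure calls them backend pools), and the server names are made up:

```python
class TargetGroup:
    """Round robin over a server group, skipping unhealthy targets."""
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)     # health checks update this set
        self._i = 0

    def mark_unhealthy(self, server):
        self.healthy.discard(server)

    def pick(self):
        for _ in range(len(self.servers)):
            server = self.servers[self._i % len(self.servers)]
            self._i += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers in the group")

group = TargetGroup(["API-1", "API-2", "API-3"])
group.mark_unhealthy("API-2")           # failed its health check
print(group.pick())   # API-1
print(group.pick())   # API-3  (API-2 is skipped)
print(group.pick())   # API-1
```

A "least connections" variant would replace the rotation with something like `min(healthy, key=active_connections)`.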

Step 5 — Server responds

The backend server processes the request and sends the response back through the load balancer. Finally, the user receives the result. All of this happens in milliseconds. The user never knows multiple servers even exist.
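The five steps above can be simulated end to end in a few lines of Python. Everything here is a toy model under stated assumptions — the ports, service map, and server names are illustrative:

```python
SERVICES = {"api": ["API-1", "API-2"], "auth": ["AUTH-1"]}
HEALTHY = {"API-1", "AUTH-1"}            # API-2 failed its health check

def handle_request(port, path):
    if port not in (80, 443):            # Step 2: a listener must accept the port
        return "rejected: no listener"
    # Step 3: routing rules decide which backend pool serves this path
    pool = SERVICES["api"] if path.startswith("/api/") else SERVICES["auth"]
    for server in pool:                  # Step 4: pick a healthy target
        if server in HEALTHY:
            return f"{server} served {path}"   # Step 5: response flows back
    return "503: no healthy servers"

print(handle_request(443, "/api/users"))   # API-1 served /api/users
print(handle_request(22, "/api/users"))    # rejected: no listener
```

The user only ever sees the final response; the listener check, routing decision, and health-aware selection all happen invisibly in the middle.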


What this architecture gives us

This simple flow solves many production problems:

  • High availability → no single server failure
  • Scalability → add/remove servers anytime
  • Security → servers are private
  • Performance → traffic distributed evenly
  • Reliability → unhealthy servers removed automatically

Without the load balancer, all of this becomes manual and fragile.


How this connects to real cloud design

This pattern fits naturally with modern cloud practices: managed load balancing, auto scaling, health checks, and private backend networks.

Common Interview Questions — Load Balancer

  1. What is a Load Balancer and why is it required in a production architecture?
  2. Explain the complete request flow from user to backend server through a Load Balancer.
  3. What are listeners, routing rules, and target groups in a Load Balancer?
  4. How do health checks and auto-scaling improve reliability and availability?
  5. What is the difference between Layer 4 and Layer 7 Load Balancers, and when would you use each?