How to Scale Your APIs Using Distributed Architecture and Load Balancing

You have built an amazing product! Congratulations. Everyone appreciates your hard work in building an API that makes the lives of developers like you easier. 1 appreciation, 100 appreciations, 10,000 appreciations – and before long, your API is seeing regular spikes in traffic! As cool as that may sound, as the creator you need to ensure that your API is always up and running, no matter how many users are accessing it. In this article, we're going to talk about how to scale your APIs using a distributed architecture and load balancing. But before we dive into that, let's take a moment to talk about what we mean by "scaling."

What is Scaling?

Imagine you’re running a sweet shop in a busy market. At first, you’re just selling a few sweets to your regular customers, and things are going smoothly. But suddenly, Diwali comes around the corner, and everyone in the market starts craving your delicious sweets, taking them home for themselves and their friends. You start getting a flood of customers, and your little shop can’t keep up with the demand. Your shelves start to empty, and you run out of sweets to sell. This is where scaling comes in.

Scaling your APIs is like expanding your sweet shop to handle more customers and more orders. You might need to hire more staff, stock up on more ingredients, and invest in bigger and better equipment to keep up with the demand. Just like expanding your sweet shop, scaling your APIs requires an investment of time, resources, and effort. But once you’ve scaled up, you’ll be able to handle a higher volume of requests and keep your customers satisfied with a steady supply of sweets. Whether it’s Diwali or Holi or just a regular day in the market, scaling ensures that your sweet shop (or in this case, your API) can handle whatever comes your way, without running out of stock or losing customers to the competition.

What is Distributed Architecture?

Now that your sweet shop has performed so well during Diwali, you have investors who want to back you and your popular delicacies. With the money they have provided, imagine that you decide to expand your shop to handle more customers and sell more sweets. Instead of just adding more shelves and hiring more staff, you decide to break the shop down into smaller, independent sections, each specializing in a different type of sweet.

For example, you might have a section for chocolates, one for candies, one for cakes, and so on. Each section has its own staff, its own equipment, and its own inventory. This is like using a distributed architecture for your APIs, where each component is broken down into smaller, independent pieces that can be deployed and scaled separately.

By breaking down your sweet shop into smaller sections, you can handle more customers and provide a more diverse range of sweets, without overloading any one section. Similarly, with distributed architecture, each component can run on a different server or node, allowing the system to handle a higher volume of requests and providing a more robust and scalable solution.

Just like how each sweet section in your shop can operate independently of the others, each component in a distributed architecture can be developed, deployed, and scaled separately, making it easier to maintain and update your system over time. The distributed architecture is a powerful design approach that can help improve the performance, reliability, and availability of your APIs, just like how breaking down your sweet shop can help improve the customer experience and expand your business.

Load Balancing

Expanding your shop to handle more customers and more sweets, done! Breaking down the shop into smaller sections, each with its own staff, inventory, and equipment, done! But you still have a problem: with so many customers coming in, some sections are getting overloaded while others are left with too few customers.

This is where load balancing comes in. By distributing the incoming customers across all the sections of your sweet shop, you can ensure that no section is overloaded and that the workload is spread evenly across the system. Let’s call this tool a “customer balancer”. The customer balancer is responsible for distributing customers across all the sections of the sweet shop. How does this tool know how to balance them? Let’s look at some popular algorithms:

  • A round-robin algorithm sends each customer to the next section in turn, cycling through them in order.
  • A least-connections algorithm sends each customer to the section that currently has the fewest customers.

Similarly, for our API, we can use a “load balancer” that applies these algorithms to direct and distribute the incoming traffic among the available servers. Each algorithm has its own strengths and weaknesses, so it’s important to choose the right one for your sweet shop and your API.
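To make the two strategies concrete, here is a minimal sketch (illustrative, not production code) of how each one picks the next server, assuming a simple in-memory list of servers with a hypothetical `activeConnections` counter:

```javascript
// Round-robin: cycle through the servers in order.
function makeRoundRobin(servers) {
  let next = 0;
  return () => {
    const server = servers[next];
    next = (next + 1) % servers.length;
    return server;
  };
}

// Least connections: pick the server with the fewest active connections.
function pickLeastConnections(servers) {
  return servers.reduce((best, s) =>
    s.activeConnections < best.activeConnections ? s : best
  );
}

// Usage:
const pick = makeRoundRobin(['app-1', 'app-2', 'app-3']);
console.log(pick(), pick(), pick(), pick()); // app-1 app-2 app-3 app-1

const pool = [
  { name: 'app-1', activeConnections: 5 },
  { name: 'app-2', activeConnections: 2 },
  { name: 'app-3', activeConnections: 9 },
];
console.log(pickLeastConnections(pool).name); // app-2
```

A real load balancer also has to track connections opening and closing, handle servers going down, and so on, but the core selection logic is as small as this.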

Scaling Your API Using Distributed Architecture and Load Balancing

Now that we’ve covered the basics of scaling, distributed architecture, and load balancing, let’s talk about how to scale your APIs using these techniques.

Creating a Basic API with Node.js

Here we use Express to create a basic server that listens on a port specified through an environment variable. Whenever a request is sent to the server, it responds with “I am a basic Codedamn API” along with the port it is running on.

const express = require('express');
const bodyParser = require('body-parser');

const app = express();
const port = process.env.PORT;

if (!port) {
  throw new Error("The port hasn't been configured as an environment variable");
}

app.use(bodyParser.json());

app.get('/', (req, res) => {
  res.send(`I am a basic Codedamn API running on port ${port}`);
});

app.listen(port, () => {
  console.log(`API server running on port ${port}`);
});

Open a new terminal and run the following command:

export PORT=8000; node server.js

When you visit http://localhost:8000/ you should see the response “I am a basic Codedamn API running on port 8000”.

Now open another terminal and run the following command:

export PORT=8001; node server.js

Upon visiting http://localhost:8001/ you should see the response “I am a basic Codedamn API running on port 8001”.

Right now, if your API is hit with 3000 requests per second, a single server instance will quickly become overloaded.

With a load balancer, we aim to run copies of this server so that no single server bears the full load, and to distribute the incoming requests so our users get faster, more consistent response times.
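A quick back-of-the-envelope calculation shows why running copies helps. The per-server capacity below is an assumed figure for illustration; your real number comes from benchmarking your own API:

```javascript
// Assumed incoming load and per-instance capacity (hypothetical numbers).
const totalRequestsPerSecond = 3000;
const perServerCapacity = 1200;

// How many copies do we need to absorb the load?
const copiesNeeded = Math.ceil(totalRequestsPerSecond / perServerCapacity);
console.log(`Copies needed: ${copiesNeeded}`); // Copies needed: 3

// With an even (round-robin) split, each copy sees:
const loadPerCopy = totalRequestsPerSecond / copiesNeeded;
console.log(`Load per copy: ${loadPerCopy} req/s`); // Load per copy: 1000 req/s
```

Each copy now handles a third of the traffic, comfortably below its capacity, instead of one server drowning under all 3000 requests per second.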

Now, you’ll need to set up load balancing to distribute the incoming traffic across these servers or nodes running at ports 8000 and 8001. This can be done using a load balancer such as NGINX, HAProxy, or Amazon ELB; we’ll be using NGINX in our case.

Setting Up Load Balancing

Let us first install NGINX on your system. I am using a macOS environment, so I’ll install it with Homebrew:

brew install nginx

Now start the nginx web server:

nginx

By default, Homebrew’s NGINX configuration uses port 8080. If you navigate to http://localhost:8080/ you should see the default NGINX welcome page.

We will now rewrite the NGINX configuration from scratch to suit our needs. By default, the configuration file is located at /opt/homebrew/etc/nginx/nginx.conf. Run the command given below in your terminal to open this file in the editor:

sudo vim /opt/homebrew/etc/nginx/nginx.conf

Vim will open with the current configuration.

Press i to enter the insert mode. You will know you are successfully inside the insert mode when you see --INSERT-- on the bottom left of your editor screen.

Remove all the existing configurations and enter the configuration as stated below:

http {
    upstream backend {
        server localhost:8000;
        server localhost:8001;
    }

    server {
        listen 80;
        root /documents/poojagera-dev/codedamn/;

        location / {
            proxy_pass http://backend;
        }
    }
}

events {}

Press ESC to enter the normal mode and enter :wq to save the file. You can now use the below-mentioned command to reload the NGINX server with the configuration you just specified.

nginx -s reload

If you now visit http://localhost/ and refresh a few times, you will see the response oscillate between ports 8000 and 8001: your NGINX server is distributing requests between the two instances of the application (API) you created.
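By default, an NGINX upstream block balances with round-robin, but the strategy is configurable. As a sketch, assuming the same two local servers, the upstream block could be adjusted like this to use least-connections balancing and to skew the split toward one server:

```nginx
upstream backend {
    least_conn;                      # pick the server with the fewest active connections
    server localhost:8000 weight=2;  # receives roughly twice the share of requests
    server localhost:8001;
}
```

Removing the `least_conn;` line falls back to the default round-robin behavior; `weight` influences the split under either strategy.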

Congratulations! You now know how to implement load balancing for your APIs.


Scaling your APIs using a distributed architecture and load balancing can help you improve the performance, reliability, and availability of your system. By breaking down your system into smaller, independent components and distributing the load across multiple servers or nodes, you can handle a higher volume of traffic and provide a more robust and scalable solution.

So, if you want to ensure that your APIs are always up and running, no matter how many users are accessing them, start scaling today! If you wish to know more about backend web development, do not forget to check out the backend web development specialization that comes with the Codedamn pro subscription where you will get 24×7 doubt support with an AI mentor.

Happy Learning!

Frequently Asked Questions (FAQs)

How do you scale your API?

Scaling your API is like playing a game of Jenga – you need to keep building higher and higher without letting it all come crashing down. By using a distributed architecture and load balancing, you can add more blocks (servers or nodes) to your Jenga tower and distribute the weight evenly, making it more stable and reliable. This way, you can handle more traffic without risking a Jenga disaster.

What are the 3 scaling techniques used in distributed systems?

Scaling your distributed system is like a game of Tetris – you need to fit all the pieces together just right to achieve optimal performance and scalability. You can either add more pieces (horizontal scaling), make the existing pieces bigger (vertical scaling), or do a little bit of both (hybrid scaling). By finding the right balance, you can handle more traffic without dropping any pieces or crashing the system.

How do you scale a distributed system?

Scaling your distributed system is like playing a game of SimCity – you need to plan, build, and manage your city just right to keep your citizens happy and the traffic flowing smoothly. You need to identify the different functions or services (buildings) and separate them into individual components. Then, deploy them across multiple servers or nodes (neighborhoods) using a container orchestration tool (zoning laws). Set up load balancing (traffic lights) to distribute the traffic evenly. Monitor your city’s performance and adjust as needed to keep it running smoothly.



Did you like what Pooja Gera wrote? Thank them for their work by sharing it on social media.

