How & When to Scale

Last Updated October 2014

Pagoda Box is built for granular scalability. We give you the tools to scale individual services in your infrastructure along with analytics to help you know when to scale. Understanding how different scaling strategies affect the performance of your application and which are applicable in your particular case can be daunting. This doc outlines possible scaling strategies and what benefits they offer. After reading it, you should be familiar with:

  • Using application analytics as a scaling guide
  • The different ways to scale your app
  • When to scale your app
  • Using your ninja scaling skills to rule your infrastructure

Service Stats & Logs are Your Guide

The key to knowing when to scale your app lies in service statistics and application logs. There are four main metrics to watch when it comes to analyzing the performance of your services: CPU, RAM, Swap, and Disk.
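As a rough illustration, watching these metrics boils down to a simple threshold check. The following is only a sketch: the thresholds and the shape of the stats dictionary are illustrative assumptions, not part of any Pagoda Box API.

```python
# Illustrative thresholds (percent used) for the key metrics.
# Swap is handled separately; see the swap discussion below.
THRESHOLDS = {"cpu": 80, "ram": 80, "disk": 90}

def check_service(stats):
    """Return a list of warnings for metrics running hot.

    stats: dict of metric name -> percent used, e.g.
    {"cpu": 45, "ram": 92, "swap": 10, "disk": 30}
    """
    warnings = []
    for metric, limit in THRESHOLDS.items():
        if stats.get(metric, 0) >= limit:
            warnings.append(f"{metric} at {stats[metric]}% (limit {limit}%)")
    return warnings
```

A check like this, run against whatever stats you export from your dashboard, tells you at a glance which services are approaching their limits.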


The CPU statistic indicates how much of the allotted CPU a service is using. CPU provides speed and responsiveness. The higher the CPU usage, the more likely your service is to become slow or completely unresponsive.


The RAM statistic indicates how much of the allotted RAM a service is using. The higher the RAM usage, the more likely your service is to become unresponsive.


Swap is basically an overflow area for RAM. When more memory is needed than what's available, data is written to disk in the form of swap. For those accustomed to running on Linux servers, seeing anything stored in swap would trigger alarms.

Pagoda Box is built on SmartOS, which handles swap a little differently. It's still a disk-based overflow area for RAM, but SmartOS will selectively store data there, even if RAM is not full. In short, a service using small amounts of swap shouldn't trigger alarms. However, if one is using a lot of swap, it should. Swap is slower than RAM, so the more a service uses, the more likely you'll see slowed performance.

Also know that once data is stored in swap, it persists there, even when no longer needed. When new swap space is requested, old persisted data is cleared out to make room for the new data. High swap is only dangerous in conjunction with high RAM usage. If a service is using a lot of swap but not very much RAM, it just means that data has persisted in swap since the last time the service needed it.
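That rule of thumb fits in a couple of lines. This is a sketch; the 75% threshold is an illustrative assumption, not a documented cutoff.

```python
def swap_is_concerning(swap_pct, ram_pct, threshold=75):
    """High swap only signals trouble alongside high RAM usage.

    A service with high swap but low RAM is likely just holding stale
    data that has persisted in swap since it was last needed.
    """
    return swap_pct >= threshold and ram_pct >= threshold
```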


The Disk statistic represents how much of your service's allotted disk space is being used. If your service ever runs out of disk space, it will not function properly.


Application logs are helpful in diagnosing potential issues that may not be reflected in your application's service statistics, but do affect uptime. For example, it is possible to max out the available connections to your web instances while RAM and CPU usage remain low for your web service. You wouldn't see this in your service statistics, but you would see it in your logs. For detailed information about logs, view the Log Management doc.
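For example, a quick scan of recent log lines can surface connection problems that never show up in resource stats. A sketch, with made-up error strings; the actual messages vary by service and stack:

```python
def find_connection_issues(log_lines):
    """Return log lines that hint at connection exhaustion.

    The patterns here are illustrative examples; check what your
    particular web server or database actually logs.
    """
    patterns = ("too many connections", "connection refused", "max connections")
    return [line for line in log_lines
            if any(p in line.lower() for p in patterns)]
```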

How to Scale

All application scaling is done in your app dashboard. Each service is scaled independently. To scale a service, go to your dashboard and click the service you would like to scale to expand the service details. By default, the "Configure" tab will open.

There are three main scaling controls — the tier controller, the instance controller, and the resource controller.

The Tier Controller

The tier controller allows you to select the type of resources your service's instance(s) will use. The options are cloud, private cloud, and bare metal, each with different available hardware configurations. The specific differences between the tiers are covered in the Cloud, Private Cloud, & Bare Metal Resources doc.

Scaling Tier Controller

The Instance Controller

The instance controller allows you to control the number of instances in a service.

Scaling Instance Controller

This controller is available for services with the "cluster" topology. The benefits of additional instances are covered in the Horizontal Scaling section below.

The Resource Controller

The resource controller allows you to specify what resources are available to each instance within a service. This is also known as vertical scaling.

Scaling Resource Controller

For Cloud and Private Cloud instances, you're given predefined instance size options to choose from, each with unique RAM, CPU, and Disk configurations. For Bare Metal instances, you're given granular control over each spec.

Different Types of Scaling

There are two main approaches to scaling your application. They can be used exclusively or in tandem, but each has its own strengths.

Horizontal Scaling (Scaling for Traffic and/or Redundancy)

Horizontal scaling is accomplished by increasing the number of instances inside a service.

Horizontal Scaling

Horizontal scaling does a few things:

  1. Increases your service's ability to handle concurrent requests.

  2. Increases your service's traffic throughput.

  3. Adds redundancy and failover to your service.

Scaling horizontally is extremely effective for web and worker services. You'll often find that a web service with five 200MB RAM instances can handle much more traffic than a service with a single 1GB RAM instance. Adding instances spreads the load across multiple processes instead of dumping it all on one.

One way to think of it is to compare it to a line at a bank. A single bank teller can only help one customer at a time. No matter how fast that teller is, he or she can only serve a limited number of customers in a given period of time. But by adding five more tellers, even if they aren't as fast as the first, collectively they'll be able to serve more customers in that same period of time.
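The bank teller math is simple multiplication. A quick sketch, using made-up request rates:

```python
def throughput(instances, requests_per_min_each):
    """Total requests a service can handle per minute."""
    return instances * requests_per_min_each

# One fast teller vs. five slower tellers:
one_fast = throughput(1, 60)    # 60 requests/min
five_slow = throughput(5, 40)   # 200 requests/min
```

Even though each of the five instances is individually slower, together they handle more than three times the traffic of the single fast one.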

Vertical Scaling (Scaling for Power)

Vertical Scaling is what the development world has come to know and love — MORE POWER. Scaling vertically consists of adding more resources to your services, giving them more power or capacity to accomplish their specific tasks.

Vertical Scaling

Scaling vertically is extremely effective for services with heavy processing loads like databases, caches, and sometimes even webs and workers depending on what they're doing. For some services, this is the only means of scaling.

Cloud, Private Cloud, and Bare Metal resources are the different "tiers" of vertical scaling, each with granular increments. To learn more about the specifics of each tier, check out the Cloud, Private Cloud & Bare Metal Resources doc.

The main weakness of vertical scaling only becomes an issue when it's used exclusively. Ultimately, you can only add so much power to a single instance, and that instance can only do so much. This weakness is largely mitigated by using both vertical and horizontal scaling together (when possible).

When & What to Scale

Knowing when and what to scale is key to ensuring the uptime of your application. In the heat of the moment, it's easy to panic and just scale everything, but there are a couple of rules and questions that will help you identify what needs to be scaled, how it should be scaled, and when.

1. Preemptive Scaling is Better than Reactive Scaling

If you know a rush of traffic is coming, by all means, scale up before it hits. It's always better to scale up before users start to experience issues than to wait until after they run into them.

2. Know How Your App Uses Resources

Understanding how each service in your app uses its available resources is extremely helpful in knowing what to scale and how to scale it in your time of need. Below are some questions to ask when identifying what action needs to be taken.

  1. What are my service stats and logs telling me?
    Your service stats and logs are the first place to look when trying to identify what to scale. If you see that RAM usage is high on one of your services, it's likely that's the service that needs scaling. But the high RAM usage doesn't necessarily tell you how to scale. The next question will help to answer that.

    Another use case involves resource usage appearing normal while errors, such as maxed-out connections, appear in the logs. It's possible to max out connections on an instance without affecting the instance's resource usage. In this case, you can safely assume the service needs more instances rather than more resources. Adding instances increases the number of connections available to the service.

  2. Is the service under abnormal load?
    Understanding how a service performs under different levels of load will tell you the most effective means of scaling that service. For example, if while under little or no load, a service uses most of its availalble resources, the most effective scaling strategy for that particular service will be scaling vertically - adding more resources. On the other hand, if while under little or no load, a service uses hardly any of its available resources, but as traffic increases, it's resource consumption does as well, the answer to the next question will help to identify the best scaling strategy.

  3. If under high load, what is the nature/cause of the load?
    If you know a service is under load, it's important to understand the nature and cause of the load. When a service is under stress, it's usually caused by one of three things:

    1. Large amounts of concurrent requests or queries

    2. Resource-intensive requests or queries

    3. Large amounts of data being stored in RAM or written to disk

    For web and worker instances, highly concurrent requests are best addressed by adding instances to your service. Scaling horizontally increases a web or worker service's ability to handle concurrent requests. If a web or worker is running a resource-intensive process, then scaling vertically is the best way to address performance issues.

    The primary means for relieving stress on database services (with a few exceptions) is scaling vertically. Adding more resources to a database will address all 3 of the major causes of stress.

    With cache services, the details depend on the type of cache you're using, but vertical scaling is still the primary means of addressing performance issues. Memcached stores all of its data in RAM, so it's normal for Memcached services to use a lot of RAM. As RAM fills up, Memcached evicts old data to make room for new data, so scaling up simply increases the amount of data that can be stored and allows data to persist longer. Redis also stores data in RAM, but routinely persists it to disk as well.
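The three questions above boil down to a rough decision heuristic for web and worker services. A sketch, assuming you've already pulled the relevant stats and log signals; the 75% threshold is an illustrative assumption:

```python
def scaling_recommendation(ram_pct, cpu_pct, maxed_connections, under_load):
    """Suggest a scaling strategy from service stats and log signals."""
    hot = ram_pct >= 75 or cpu_pct >= 75
    # Q1: logs show maxed-out connections while resources look normal
    if maxed_connections and not hot:
        return "horizontal"  # add instances for more connections
    # Q2: resources run hot even under little or no load
    if hot and not under_load:
        return "vertical"    # the baseline workload needs more power
    # Q3: resources climb with traffic
    if hot and under_load:
        return "horizontal"  # spread concurrent requests across instances
    return "no scaling needed"
```

For databases and other services where horizontal scaling isn't available, the answer collapses to vertical scaling regardless of which question fires.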

Use Cases

Probably the best way to understand how and when to scale is to walk through some pretty basic use cases.

Your Blog Post Hits No.1 on Hacker News

Problem: Your blog runs on a simple WordPress install with a single micro cloud web instance and a micro cloud database. You write an awesome post that changes the development world and shoots to the top of Hacker News. All of a sudden, your blog is inundated with traffic, your web service's RAM usage jumps to 99%, and your database stays steady right around 65% RAM usage.

Solution: The first thing to do would be to add more instances to your web service. WordPress is a fairly lightweight CMS that doesn't require a lot of processing power, so scaling vertically wouldn't be the best approach for handling the traffic; micro cloud instances have plenty of processing power for WordPress. Adding instances would increase your app's ability to process concurrent requests and handle the surge in traffic. Your RAM usage would drop as the load gets spread across multiple instances.

At this point, you may not need to scale your database, but as traffic increases, you'll probably see an increase in RAM usage. Once analytics show your database in the red, you'll want to scale it up.

Developing an Image Processing Service

Problem: You're developing an image processing service with a light front-end and a background worker to handle all the image processing. You have 1 micro cloud web instance and 1 micro cloud worker instance. Your front-end works fine, but once you send a job to the worker, the worker's CPU usage spikes and the job never finishes.

Solution: Image processing can be very demanding. In this case, your worker doesn't have enough compute resources to handle the image processing. By scaling vertically (adding resources), the worker will be able to complete what's required of it. Because a single worker instance can only work on one job at a time, horizontal scaling won't help an individual job complete. But once you move into production and concurrency picks up, horizontal scaling will increase your worker service's throughput.

Ecommerce Store During the Holidays

Problem: It's that wonderful time of year when everyone starts shopping online and ecommerce stores do the majority of their annual sales. During times like this, chances are all of your services will start running in the red.

Solution: Keep an eye on your analytics to know what services in your application should be scaled. Scale your services when resource usage is high. Take comfort in the fact that your app can be scaled up to handle the surge in traffic and then back down when it's all over. That way you only use and pay for what you need. Nothing more.

If you have any questions, suggestions, or corrections, let us know.