What is Kubernetes HPA (Horizontal Pod Autoscaling) and How to Configure It

July 14, 2022

Introduction

Dynamic pod scaling is a Kubernetes feature that makes it easy to control resource utilization and ensure that an adequate number of pods are running. Kubernetes supports three types of autoscaling: horizontal, vertical, and cluster scaling.

This article will explain horizontal pod autoscaling (HPA) and provide steps to configure it in your cluster.

What is Horizontal Pod Autoscaling and how to configure it.

Prerequisites

  • Kubernetes cluster with kubectl installed.
  • Docker Hub account.

What Is Horizontal Pod Autoscaling?

Horizontal pod autoscaling allows Kubernetes to deal with increased or decreased loads by scaling the number of pod replicas related to a specific deployment or stateful set. While vertical scaling means assigning more resources to the running pods, horizontal scaling creates or removes pods within the same resource allocation.

How Does Horizontal Pod Autoscaling Work?

The Kubernetes object that enables horizontal pod autoscaling is called HorizontalPodAutoscaler (HPA). The HPA is a controller and a Kubernetes REST API top-level resource.

The HPA is an intermittent control loop - i.e., it periodically checks the resource utilization against the user-set requirements and scales the workload resource accordingly. The default interval for HPA checks is 30 seconds. Use the --horizontal-pod-autoscaler-sync-period controller manager flag to change the interval value.

The autoscaling/v1 API version of the HPA only supports the average CPU utilization metric. The autoscaling/v2 API version allows scaling according to memory usage, defining custom metrics, and using multiple metrics in a single HPA object.

How to Configure Horizontal Pod Autoscaling?

As an illustration of the horizontal pod autoscaling capabilities, this article will show you how to:

  • Create a test deployment.
  • Create an HPA via the command line or use the declarative approach.
  • Apply custom metrics.
  • Apply multiple metrics.

Create a Deployment

The following section shows how to create a Docker image for a small PHP app that performs a resource-intensive calculation.

1. Create the app directory:

mkdir test-app

2. Create a Dockerfile in a text editor:

nano Dockerfile

3. Put the following contents into the file:

FROM php:5-apache
COPY index.php /var/www/html/index.php
RUN chmod a+rx index.php

4. Create the index.php file:

nano index.php

5. Add the mathematical operation to the file to create a CPU load:

<?php
  $x = 0.0001;
  for ($i = 0; $i <= 1000000; $i++) {
    $x += sqrt($x);
  }
  echo "OK!";
?>

6. Build the Docker image:

docker build -t test-app:v1 .
Building a Docker image using the docker build command.

7. Tag the image. Use your Docker Hub account username:

docker image tag test-app:v1 [your-docker-hub-username]/test-app:v1

8. Push the image to Docker Hub by typing:

docker image push [your-docker-hub-username]/test-app:v1
Pushing a Docker image to Docker hub.

9. Create a deployment YAML in a text editor:

nano hpa-test.yaml

10. The YAML defines the deployment and the service that exposes it. The spec.template.spec.containers section specifies that the deployment uses the Docker image created in the previous steps. Furthermore, the resources sub-section contains resource limits and requests.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-test
spec:
  selector:
    matchLabels:
      run: hpa-test
  replicas: 1
  template:
    metadata:
      labels:
        run: hpa-test
    spec:
      containers:
      - name: hpa-test
        image: markopnap/test-app:v1
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: hpa-test
  labels:
    run: hpa-test
spec:
  ports:
  - port: 80
  selector:
    run: hpa-test

11. Create the deployment by using the kubectl apply command:

kubectl apply -f hpa-test.yaml 

The output confirms the successful creation of the objects.

Creating a Kubernetes deployment using the kubectl apply command.

12. Confirm that the objects are ready and running.

kubectl get all
Checking the Kubernetes objects created in the previous step.

Note: Our BMC solution lets you provision and decommission servers quickly and automatically. BMC also supports on-demand resource scaling and pay-per-use billing models.

Create HPA

With the deployment up and running, proceed to create a HorizontalPodAutoscaler object. The sections below illustrate the two methods for creating HPAs.

Create HPA via kubectl Command

Create an HPA using the kubectl autoscale command.

kubectl autoscale deployment hpa-test --cpu-percent=50 --min=1 --max=10

The --cpu-percent flag sets the CPU utilization percentage limit that triggers the autoscaler. The --min and --max flags specify the minimal and maximal number of pod replicas, i.e., the range within which the HPA is allowed to scale.

Creating a HPA using kubectl.

Create HPA via YAML Manifest

Alternatively, create the HPA declaratively by defining it in a YAML file.

1. Create the hpa.yaml file:

nano hpa.yaml

2. The YAML contains spec.minReplicas, spec.maxReplicas, and targetCPUUtilizationPercentage fields that correspond to the --min, --max, and --cpu-percent flags in the command-line method for creating HPAs.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-test
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50

3. Create the HPA with kubectl apply:

kubectl apply -f hpa.yaml

Increase the Load

Before increasing the load to see the HPA in action, use kubectl get to check the status of the controller.

kubectl get hpa

The TARGETS column in the output shows that no load is generated. Consequently, the number of created pod replicas is still one.

Checking the created HPA using kubectl.

Now increase the CPU load using the load generator. Execute the following command in a new terminal tab/window:

kubectl -n default run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://hpa-test; done"

The command uses the busybox image to generate load and repeatedly query the hpa-test deployment. The deployment performs mathematical operations and engages the CPU.

In the first terminal tab/window, type the following command to watch the status of the HPA:

kubectl get hpa -w

After a short time, the TARGETS column shows that the CPU load exceeds the limit specified upon the HPA creation. Consequently, the HPA increases the number of replicas to match the current load.

Watching as the HPA increases the number of pod replicas as a response to an increased CPU utilization.

Stop Generating Load

To stop generating the CPU load, switch to the load generator terminal tab/window and press CTRL+C.

Type the following command to confirm that the number of replicas is back to one.

kubectl get hpa
The ouput of the kubectl get hpa command shows that the number of replicas is one.

Autoscaling Based on Custom Metrics

The autoscaling/v2 API version allows you to create custom metrics to trigger the HPA. To switch to autoscaling/v2, convert the deployment YAML into the new format with the following command:

kubectl get hpa hpa-test -o yaml > /tmp/hpa-v2.yaml

Open the new file and inspect its contents. The main change compared to autoscaling/v1 is the introduction of the spec.metrics section in which you can describe the metrics and create custom ones.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-test
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

For example, to create a memory-related HPA condition, type the following in the spec.metrics section:

  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 50Mi

Kubernetes enables several custom metrics to be used for the HPA, including:

  • Load balancer traffic.
  • Outbound connections.
  • Queue depth.
  • Latency of a function or dependency.
  • Request latency, etc.

Autoscaling Based on Multiple Metrics

The spec.metrics field in autoscaling/v2 allows setting up multiple metrics to be monitored by a single HPA. To specify more than one metric in a YAML, list them one after another. For example, the following section defines both CPU and memory metrics:

  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 50Mi

HPA Best Practices

Consider the points below if you decide to use horizontal pod autoscaling:

  • Ensure that the metrics-server is running on your cluster, providing your HPA with per-pod metrics.
  • It is recommended to apply the HPA to the workload resources, not on the replication controllers themselves.
  • Create HPAs using YAML files. The command-line method makes it more difficult to version-control.
  • If you employ custom metrics, ensure that you use the correct target type for pods and objects.
  • Define pod resource requests to help the HPA optimize the scaling process and keep metrics calculations accurate.
  • Using microservice architecture is a great way to ensure that your deployment takes full advantage of horizontal autoscaling.
  • Adding native support for parallel pods allows the HPA to create and terminate pods in parallel, speeding up the process.

HPA Limitations

Although the HPA is very useful in configuring horizontal scaling for your project, it still has certain limitations:

  • You cannot use the HPA to scale DaemonSets.
  • If vertical pod scaling is set up on the system, the HPA may conflict.
  • If pod memory and CPU limits are not set properly, pods may terminate unexpectedly.
  • You can use the HPA only on apps that can run on multiple pods.
  • The HPA is limited by cluster capacity and may eventually require setting up cluster scaling.
  • The HPA does not take network and storage capacity into account, so using it may cause outages.

Conclusion

The article introduced you to the concept of horizontal pod autoscaling and the HorizontalPodAutoscaler controller in Kubernetes. After reading the guide, you are better acquainted with the benefits and limitations of the HPA and the ways to configure it on your deployment.

Was this article helpful?
YesNo
Marko Aleksic
Marko Aleksić is a Technical Writer at phoenixNAP. His innate curiosity regarding all things IT, combined with over a decade long background in writing, teaching and working in IT-related fields, led him to technical writing, where he has an opportunity to employ his skills and make technology less daunting to everyone.
Next you should read
How to Restart Kubernetes Pods
August 27, 2020

Containers don’t always run the way they are supposed to. If your Kubernetes pod ends up in an error state, you need a quick and easy way to fix the problem. This tutorial will explain how to restart pods in Kubernetes.
Read more
Kubernetes for Multi-Cloud and Hybrid Cloud Portability
May 6, 2021

This article introduces Kubernetes as an effective way to achieve application portability by making your hybrid or multi-cloud...
Read more
How to Configure Kubernetes for Rolling Update
July 15, 2021

This tutorial shows you how to perform rolling updates using Kubernetes Deployments. This method allows you to quickly update your apps...
Read more
Speed Up and Improve Software Testing with Kubernetes on BMC
July 14, 2022

Running Kubernetes in a production environment is an increasingly popular way to ensure scalability and increase...
Read more