![[Pasted image 20250128122518.png]]

Kubernetes autoscaling lets workloads adjust their replica count or their CPU and memory allocations as demand changes, which helps control cost while keeping performance and reliability steady. The two main pod-level autoscalers are:

![[Pasted image 20250128122720.png]]

1. Horizontal Pod Autoscaler (HPA)

Overview

HPA automatically scales the number of replicas (pods) in a deployment, replica set, or stateful set based on observed metrics such as CPU usage, memory usage, or custom metrics.
It ensures that your application has enough pods running to handle the current load.

How It Works

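Metrics Collection:
- The HPA controller periodically (every 15 seconds by default) queries the metrics APIs. Resource metrics such as CPU and memory are usually served by the Kubernetes Metrics Server, while custom and external metrics come from a metrics adapter.
Scaling Decision:
- Based on a target defined in the HPA configuration (e.g., “target CPU utilization of 50%”), it calculates the desired number of replicas from the ratio of the current metric value to the target.
Update Deployment:
- The controller writes the new replica count to the target's scale subresource, and the Deployment, ReplicaSet, or StatefulSet then adds or removes pods to match.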
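
The scaling decision can be sketched in a few lines of Python. This is a minimal illustration of the documented rule `desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)`, not the controller's actual code. The function name and parameters are made up for this note, and the 10% tolerance band is assumed to be the controller default.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Return the replica count suggested by the HPA scaling rule (sketch)."""
    ratio = current_value / target_value
    # Inside the tolerance band the replica count is left unchanged,
    # which avoids flapping on small metric fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * current_value / target_value)
    # The result is always clamped to the minReplicas/maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For instance, with bounds of 2 and 10 and a 50% target, 4 replicas averaging 90% CPU give ceil(4 * 90 / 50) = 8 replicas, while 4 replicas averaging 52% stay at 4 because the ratio is within the tolerance.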

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

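minReplicas and maxReplicas: The lower and upper bounds on the number of pods the HPA can scale to.
metrics: Defines the metric to track and its target. Here, averageUtilization: 50 targets an average CPU usage of 50% of each pod's CPU request, so the target pods must declare CPU requests.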

Use Cases

Applications with fluctuating workloads, such as web servers or APIs, where traffic varies throughout the day.

2. Vertical Pod Autoscaler (VPA)

Overview

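VPA adjusts the CPU and memory requests of a pod's containers based on observed usage. Limits, where set, are scaled to keep the original limit-to-request ratio. The aim is to give each pod resources close to what it actually uses, without overprovisioning. Unlike HPA, VPA is not part of core Kubernetes and has to be installed in the cluster separately.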

How It Works

Metrics Analysis:
- VPA observes the resource usage of containers over time.
Recommendation:
- It produces recommended CPU and memory requests for each container.
Scaling:
- If configured in “Auto” mode, VPA evicts pods whose requests differ significantly from the recommendation, and the replacement pods are created with the updated requests.
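
To make the recommendation step more concrete, here is a deliberately simplified Python sketch. It assumes the recommendation is a high percentile of observed usage plus a safety margin. The function name, the 90th percentile, and the 15% margin are illustrative choices for this note. The real VPA recommender is more sophisticated, so treat this as an intuition aid only.

```python
import math


def recommend_request(samples, percentile=90, margin=0.15):
    """Suggest a resource request from usage samples (hypothetical sketch).

    samples: observed usage values, e.g. CPU in millicores.
    percentile: integer percentile of usage to cover (assumed value).
    margin: fractional safety margin added on top (assumed value).
    """
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    # Nearest-rank percentile: the smallest sample such that at least
    # `percentile` percent of all samples are at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1] * (1 + margin)
```

Using a percentile rather than the maximum means a single outlier does not inflate the request: with ten samples where one is a brief spike, the spike falls above the 90th percentile and is ignored.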

Modes

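VPA supports the following update modes:

Off: Only produces recommendations and takes no action.
Initial: Applies recommendations only when a pod is created, and never evicts running pods.
Recreate: Applies recommendations at pod creation and also evicts running pods whose requests differ significantly from the recommendation.
Auto: Currently behaves like Recreate, so it also restarts pods as needed to apply updated requests.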

Configuration Example

Here’s an example of VPA for a deployment:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  updatePolicy:
    updateMode: "Auto"
```

updateMode: Can be "Off", "Initial", "Recreate", or "Auto".

Use Cases

Applications with consistent workloads but uncertain resource requirements, such as batch jobs or workloads prone to memory spikes.

Key Differences Between HPA and VPA

| Feature | HPA | VPA |
| --- | --- | --- |
| Scaling Target | Number of pods (replicas). | Resource requests/limits (CPU/memory). |
| Metric Sources | Metrics Server or custom metrics. | Historical resource usage. |
| Scaling Type | Adds/removes pods. | Adjusts pod resource allocation. |
| Pod Restarts | Does not restart pods. | May restart pods to apply changes. |
| Use Case | Workloads with fluctuating traffic. | Optimizing resource allocation. |

3. Combining HPA and VPA

HPA and VPA can work together in certain scenarios. For example:

Use HPA to scale replicas based on traffic.
Use VPA to fine-tune resource allocation for individual pods.

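However, the two must not act on the same metric. If HPA scales on CPU or memory utilization while VPA changes the CPU or memory requests that utilization is measured against, each autoscaler ends up reacting to the other's changes.

To avoid this, set the HPA to scale on custom or external metrics (like traffic) and let VPA manage resource requests/limits independently.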
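
A quick way to spot this kind of overlap is to compare the resources the HPA scales on with the resources the VPA controls. The Python sketch below assumes the HPA's `spec.metrics` list has already been parsed from the manifest into dicts. The helper name and the `queue_depth` metric are hypothetical, and only `Resource`-type metrics are inspected.

```python
def conflicting_resources(hpa_metrics, vpa_controlled=("cpu", "memory")):
    """Return resource names that both HPA and VPA would act on (sketch).

    hpa_metrics: the HPA's spec.metrics list as parsed Python dicts.
    vpa_controlled: resources the VPA manages (cpu and memory assumed).
    """
    # Only Resource-type metrics refer to pod CPU/memory directly;
    # Pods, Object, and External metrics do not overlap with VPA.
    hpa_resources = {
        metric["resource"]["name"]
        for metric in hpa_metrics
        if metric.get("type") == "Resource"
    }
    return sorted(hpa_resources & set(vpa_controlled))


# HPA on CPU utilization: overlaps with a VPA that manages CPU.
cpu_hpa = [{"type": "Resource",
            "resource": {"name": "cpu",
                         "target": {"type": "Utilization",
                                    "averageUtilization": 50}}}]

# HPA on an external metric (hypothetical name): no overlap.
external_hpa = [{"type": "External",
                 "external": {"metric": {"name": "queue_depth"},
                              "target": {"type": "AverageValue",
                                         "averageValue": "30"}}}]

print(conflicting_resources(cpu_hpa))
print(conflicting_resources(external_hpa))
```

An empty result means the HPA and VPA are driven by different signals and can be combined on the same workload.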