Scaling at the Speed of Demand: A Guide to KEDA’s Swift Workload Scaling

Faraz Ahmed
Emumba
5 min read · Nov 28, 2023

“Showing a strong success and visible benefits is key to getting others to agree to try your way of doing things.” — Frederic Rivain

Introduction

In the ever-evolving landscape of modern applications, the quest for scalability often brings organizations face-to-face with the limits of conventional scaling approaches. Our own load-testing journey at eMumba took us through the trenches of Horizontal Pod Autoscaling (HPA), where the limitations of reactive, linear scaling became apparent, especially during peak demand. Out-of-Memory (OOM) events emerged as persistent hurdles, prompting a deliberate transition to a more advanced solution: Kubernetes Event-Driven Autoscaling (KEDA). This article is a comprehensive exploration of the strategies and benefits that KEDA brings to orchestrating dynamic workload scaling.

Frustration, Collaboration, Innovation — The DevOps Cycle

In the midst of these challenges at eMumba, tension brewed between our DevOps and QA load-testing teams over those pesky pods failing during load spikes. Surprisingly, this tension became the catalyst for a collaborative breakthrough, giving birth to what we now affectionately call 'The DevOps Cycle.' United by shared frustration, the two teams turned load-spiked pod failures into an opportunity for innovation, and the partnership not only produced clever solutions but also triggered a complete shift in how we approach scaling.

The Hurdles with HPA

Horizontal Pod Autoscaling (HPA), once a reliable component of our scaling strategy, revealed its limitations when faced with unpredictable peaks in demand. Because it reacts to resource metrics such as CPU and memory only after pressure has already built up, it could not flex and adapt quickly enough, exposing us to disruptions from OOM events during high-stress periods. It was a call for evolution.
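For context, here is a minimal sketch of the kind of resource-metric HPA we were relying on (the names and thresholds here are illustrative, not our production values):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: auth-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: auth-service
  minReplicas: 2
  maxReplicas: 30
  metrics:
    # scale only when average CPU across pods crosses the target,
    # which reacts after load has already piled onto the existing pods
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50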

Enter KEDA

KEDA, or Kubernetes Event-Driven Autoscaling, emerged as the beacon of hope in our scaling woes. Unlike HPA, which relies on resource metrics like CPU and memory usage, KEDA extends scaling to external events. This paradigm shift allowed us to decouple scaling decisions from resource utilization, providing a more responsive and flexible scaling solution.

In our current scaling strategy, we've harnessed the power of KEDA to scale dynamically based on the number of incoming HTTP requests hitting the Ambassador Gateway for a particular service, reducing the conventional reliance on resource metrics like CPU and memory. Here's a breakdown of how this tailored approach works:

Seamless Kubernetes Integration

KEDA integrates smoothly into our Kubernetes environment, extending the native Horizontal Pod Autoscaler rather than replacing it: for each ScaledObject, KEDA creates and feeds an HPA behind the scenes.
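You can see this for yourself; the generated HPA is inspectable like any other (the namespace is yours to fill in):

# KEDA names the generated HPA keda-hpa-<scaledobject-name>
kubectl get hpa -n <namespace>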

Event-Driven Scaling

KEDA's standout feature is its ability to scale based on external triggers or events. In our case, the trigger is the rate of incoming HTTP requests reaching the Ambassador Gateway.
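To illustrate, the trigger signal is just a Prometheus query over the gateway's Envoy metrics; assuming Prometheus scrapes Ambassador, you can preview what the trigger would see (the cluster-name pattern is illustrative):

# requests per second currently reaching the service through the gateway
sum(rate(envoy_cluster_upstream_rq_total{envoy_cluster_name=~"cluster_my_service.*"}[1m]))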

Precision in Scaling

Unlike traditional scaling methods that respond to general resource utilization, KEDA's event-driven model allows for a more precise response: scaling moves in tandem with the ebb and flow of incoming HTTP requests.
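Concretely, the trigger's threshold acts as a per-replica target, and the HPA machinery that KEDA delegates to derives the replica count from it, roughly:

# desiredReplicas = ceil(currentMetricValue / threshold)
# e.g. with threshold: '30' (target requests/sec per replica):
#   240 req/s  ->  ceil(240 / 30) = 8 replicas
#    45 req/s  ->  ceil(45 / 30)  = 2 replicas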

Adaptable Resource Allocation

Scaling based on the volume of HTTP requests ensures that resources are allocated in a highly adaptable manner. During spikes in demand, KEDA dynamically adjusts to accommodate the increased load, optimizing resource utilization. In periods of lower demand, resources are efficiently scaled down.
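If scale-down ever feels too abrupt, a ScaledObject can also tune the underlying HPA's scaling behavior; a sketch, with illustrative values rather than our production settings:

spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # require 5 minutes of low load before shrinking
          policies:
            - type: Percent
              value: 50                     # remove at most half the replicas
              periodSeconds: 60             # per minute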

Real-time Responsiveness

By keying into incoming HTTP requests, KEDA provides real-time responsiveness to the actual demand on the system. This intelligent scaling ensures that resources are allocated exactly when needed, avoiding unnecessary scaling and optimizing overall system performance.

Setting Up KEDA for Dynamic Scaling with HTTP Triggers

Install KEDA

Begin by installing KEDA into your Kubernetes cluster. You can use KEDA's Helm chart or apply the YAML manifests directly. Ensure that KEDA's components, such as the keda-operator and the ScaledObject Custom Resource Definition (CRD), are deployed successfully.

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace

To deploy the CRDs separately from the Helm chart, use the keda-2.xx.x-crds.yaml file provided on the GitHub releases page.
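For example, assuming you have downloaded that file from the releases page (version placeholder kept as-is):

# server-side apply avoids the annotation size limit that large CRDs can hit
kubectl apply --server-side -f keda-2.xx.x-crds.yaml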

Create ScaledObject for HTTP Requests

Define a ScaledObject in Kubernetes that specifies how KEDA should scale based on incoming HTTP requests. This includes information about the metric to monitor, the minimum and maximum number of replicas, and any additional parameters.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-service-scaledobject
spec:
  scaleTargetRef:
    kind: Deployment
    name: auth-service           # the workload to scale
  pollingInterval: 15            # how often (seconds) KEDA checks the triggers
  cooldownPeriod: 30             # cooldown (seconds) before scaling back down
  minReplicaCount: 2
  maxReplicaCount: 30
  triggers:
    # scale on the rate of HTTP requests hitting the Ambassador Gateway,
    # as reported by Envoy metrics scraped into Prometheus
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-k8s.monitoring.svc.cluster.local:9090
        metricName: total_requests
        threshold: '30'          # target requests/sec per replica
        query: sum(rate(envoy_cluster_upstream_rq_total{envoy_cluster_name=~"cluster_my_service.*"}[1m]))
    # also scale on conventional resource metrics; the HPA uses the
    # highest replica count computed across all triggers
    - type: cpu
      metricType: Utilization
      metadata:
        value: "50"
    - type: memory
      metricType: Utilization
      metadata:
        value: "50"

Apply the ScaledObject Configuration

kubectl apply -f your-scaledobject-config.yaml

Verify Scaling

Monitor the scaling behavior using KEDA's operator logs and the generated HPA, or check the events associated with your ScaledObject and deployment.

kubectl get hpa keda-hpa-<scaled-object-name> -n <namespace>

Ensure that scaling occurs appropriately based on the incoming HTTP requests, and adjust the configuration as needed.
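A few more commands that can help while verifying (names follow the example above):

# high-level status of the ScaledObject and its triggers
kubectl get scaledobject my-service-scaledobject -n <namespace>
kubectl describe scaledobject my-service-scaledobject -n <namespace>

# operator logs, assuming the Helm install shown earlier
kubectl logs -n keda deployment/keda-operator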

By following these steps, you’ll have set up KEDA to dynamically scale your deployment based on the number of incoming HTTP requests hitting the Ambassador Gateway. This tailored approach allows for precise resource allocation and responsive scaling, optimizing the performance of your applications in real-time.

The Results

The implementation of KEDA for dynamic scaling based on HTTP requests with the Ambassador Gateway yielded significant improvements over the traditional Horizontal Pod Autoscaling (HPA) approach. Examining the Grafana dashboard snapshots provides a clear comparison of the two scenarios.

With HPA — Horizontal Pod Autoscaler

In the test run with HPA, a noticeable decline in success ratio is observed as the load increases. The surge in 5xx errors signals that the system encounters difficulties under substantial stress. The pods struggle to handle the load, resulting in failures and an overall inability to recover efficiently.

With KEDA — Kubernetes Event-Driven Autoscaler

By contrast, the test run with KEDA presents a marked improvement. The Ambassador success ratio is notably higher, with a smooth graph reflecting the system's ability to handle increased load gracefully. Thanks to KEDA's dynamic scaling capabilities, the pods scale out in time, preventing OOM (Out of Memory) issues and keeping traffic flowing without disruption.

These contrasting outcomes underline the effectiveness of KEDA in mitigating scaling challenges, providing a robust and responsive solution for dynamic workload orchestration. The success of KEDA is evident in the enhanced performance and reliability demonstrated by the Ambassador Gateway under varying levels of demand.
