Optimizing Traffic Management in Kubernetes with Istio’s Destination Rules
In our last blog post, we explored how to implement rate limiting using Istio’s destination rules. However, there was one oversight: who uses Kubernetes with just a single pod? Today, we’ll experiment with different replica counts for our workloads and observe how Istio handles them. We’ll also discuss how to configure the destination rule appropriately for each setup.
Terminology
Workload: The pods associated with a deployment.
Client: The workload that sends the requests.
Server: The workload that receives the requests.
Setup
Workload A will send 20 requests simultaneously.
All tests will utilize the same destination rule:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: app-b-destination-rule
spec:
  host: app-b
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 10
      http:
        http1MaxPendingRequests: 5
For those who haven’t read my previous blog post, this configuration means that Workload B will accept only 10 requests at once. Up to 5 additional requests will be queued; anything beyond that is immediately rejected with an HTTP 503. So, of 20 simultaneous requests, we expect 15 to eventually succeed and 5 to be rejected.
The logs displayed will focus solely on the return codes.
One Client and One Server
------Summary-------
Requests that returned 503: 5
Requests that returned 200: 15
Nothing has changed since the last time we ran this test 😮💨
This outcome was pretty straightforward.
One Client and Many (3) Servers
------Summary-------
Requests that returned 503: 6
Requests that returned 200: 14
Surprisingly, despite having three times as many replicas, the number of requests that return status 200 does not increase. One might expect that tripling the replicas would let us serve 30 requests without any issues.
Let’s explore why this isn’t the case:
Consider our destination rule:
spec:
  host: app-b # app-b.apps.svc.cluster.local
The host we are rate limiting is the service for Workload B (app-b.apps.svc.cluster.local). The connection pool limits apply to that host as a whole, so all replicas behind the service share the same budget of 10 connections and 5 pending requests; adding pods does not raise the cap.
So what should we do?
Maybe update our destination rule? If so, what are the right values? And what if I use HPA?
Let’s delve into these questions.
What are the right values for my destination rule?
Enter K6, an open-source load-testing tool designed to test the performance and reliability of our systems.
By leveraging K6’s stages feature alongside Grafana, we can accurately gauge the maximum number of requests our workload can handle simultaneously.
I’ve written a K6 script that sends requests to app-b, ramping up incrementally to 100 concurrent virtual users. In Grafana, I monitored when the pod restarted and how many concurrent connections it was sustaining at that moment.
We’ll explore more about Grafana and the metrics to monitor in one of our upcoming blog posts — there’s much to anticipate!
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  // Ramp to each target over 30s, then hold it for 45s, stepping up to 100 virtual users.
  stages: [
    { duration: '30s', target: 10 },
    { duration: '45s', target: 10 },
    ...
    { duration: '30s', target: 100 },
    { duration: '45s', target: 100 }
  ],
};

export default function () {
  // Each virtual user hits app-b roughly once per second and records whether it got a 200.
  const res = http.get('http://app-b.apps.svc.cluster.local:8080/scenarioA');
  check(res, { 'status was 200': (r) => r.status == 200 });
  sleep(1);
}
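If you want to try this yourself, the script runs with the k6 CLI, e.g. k6 run load-test.js (load-test.js is just a hypothetical filename for wherever you saved the script).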
From this testing, I learned that my application can handle up to 56 requests simultaneously before the liveness probe triggers a restart. Applying a safety margin, I’ll reduce this by 20–30% and round down, setting tcp.maxConnections to 40 in our destination rule. This adjustment ensures our pod can manage 40 concurrent requests comfortably.
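As a minimal sketch, the updated rule could look like the following; I’m assuming here that we keep http1MaxPendingRequests at 5, but you should tune the queue size to your own latency tolerance:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: app-b-destination-rule
spec:
  host: app-b
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 40 # measured capacity (~56) minus a 20–30% safety margin
      http:
        http1MaxPendingRequests: 5 # assumption: keep the original queue size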
What if I use HPA?
If your HPA is configured correctly and your workload scales under load, you should adjust the tcp.maxConnections setting accordingly. The formula to use would be:
tcp.maxConnections = maxRequestOnePodCanHandle * hpaMaxReplicas
With this configuration, your workload should scale efficiently, and each pod will handle only a fraction (1/hpaMaxReplicas) of the total requests directed to the service. This approach ensures that no single pod is overwhelmed, maintaining performance and stability.
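As an illustrative, hedged example, assume a single app-b pod can safely handle 40 concurrent requests (the value we measured above) and the HPA scales app-b up to 3 replicas; the resource names and scaling threshold below are hypothetical:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-b-hpa # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-b
  minReplicas: 1
  maxReplicas: 3 # hpaMaxReplicas
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # hypothetical scaling threshold
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: app-b-destination-rule
spec:
  host: app-b
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 120 # 40 (maxRequestOnePodCanHandle) * 3 (hpaMaxReplicas)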
Many (3) Clients and One Server
But first, let’s answer two important questions:
How can I invoke requests from all the pods at once?
As you might already be aware, when a request is sent to a Kubernetes service, it typically selects one pod to route the request to, and that pod processes the request. However, in scenarios where you want to invoke the request from all pods simultaneously, a different approach is needed.
For this purpose, I have created another service named “proxy.” The role of this proxy service is to distribute a single request sent to it across all pods in Workload A. This ensures that each pod receives and processes the request concurrently.
How Does Istio Prevent Excess Requests from Reaching Workload B?
Istio has the capability to block requests either at the client or the server side. But how does it determine where to stop them?
Requests are stopped at the client side when the Istio proxy detects that the number of connections exceeds the combined limit of tcp.maxConnections + http1MaxPendingRequests, in our case 15 (10 maximum connections and 5 pending requests). These requests are flagged with the UO flag.
However, consider a scenario where we have three client pods, and each pod sends 20 requests of which 5 are blocked with the UO flag. This results in 45 requests reaching the workload of app-b. In such cases, Istio performs an additional check at the Istio proxy of app-b to determine the number of requests arriving at the service. If that number exceeds the sum of tcp.maxConnections + http1MaxPendingRequests, the excess requests are blocked and flagged with the URX flag.
For more detailed information on the Envoy flags, you can refer to the official Envoy documentation.
Testing
I sent 20 requests from each pod.
Let’s examine the summary:
------Summary-------
Requests that returned 503: 19
Requests that returned 200: 1
------Summary-------
Requests that returned 503: 12
Requests that returned 200: 8
------Summary-------
Requests that returned 503: 14
Requests that returned 200: 6
We observed that only 1+8+6=15 requests received an HTTP 200 status from app-b, and we understand precisely why this occurred.
In summary, Istio can intercept requests either at the client-side (flagged with UO) or at the server-side (flagged with URX) through its proxy mechanisms.
Many (3) Clients and Many (3) Servers
Based on what we’ve learned so far, this scenario should be as straightforward as the one involving one client versus many servers. Ultimately, what matters most here is the server’s destination rule.
Summary
In this post, we’ve delved into the practical applications of Istio’s destination rules to manage and optimize traffic within Kubernetes environments. Through detailed experiments, we demonstrated how these rules control traffic flow, highlighted by our tests with various configurations of requests to and from multiple pods. We introduced a dedicated proxy service to ensure simultaneous request distribution across pods and detailed how Istio utilizes specific flags to prevent overloading services. The exploration revealed not just the technical settings like tcp.maxConnections and http1MaxPendingRequests, but also practical scenarios showing their effects in real Kubernetes deployments.