Version: 1.5.x

Graceful Connection Drain of istio-proxy

This document explains what happens when a pod with an istio-proxy sidecar is deleted, in particular how its connections are handled, and how you can configure the sidecar to drain in-flight connections gracefully.

note

This document only applies to TSB version <= 1.5.x.

Before you get started, make sure you:
✓ Familiarize yourself with TSB concepts
✓ Install the TSB environment. You can use TSB demo for a quick install
✓ Complete the TSB usage quickstart. This document assumes you have already created a Tenant and are familiar with Workspaces and Config Groups. You also need to configure tctl for your TSB environment
✓ Install httpbin

When you issue a delete request against a pod in your Kubernetes cluster, all containers within the pod are sent a SIGTERM. If the pod contains only a single container, it will receive a SIGTERM and go into the terminating state. However, if the pod contains a sidecar (in our case an istio-proxy sidecar), then it is not automatically guaranteed that the main application is terminated before the sidecar.

If the istio-proxy sidecar is terminated before the application, the following issues may occur:

  1. All TCP connections (both inbound and outbound) are terminated abruptly.
  2. Any connections from the application fail.

While there is a proposed KEP for it, currently there is no straightforward way to tell Kubernetes to terminate the application before the sidecar.

However, it is possible to work around this problem by configuring the terminationDrainDuration parameter. This parameter controls how long the underlying Envoy proxy drains in-flight connections before fully terminating.

To take advantage of the terminationDrainDuration parameter, you will need to configure it in both the sidecar containers and the TSB gateways.
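One thing worth checking when you pick a value: Kubernetes force-kills any container that is still running once the pod's terminationGracePeriodSeconds (30 seconds by default) has elapsed, so a drain duration longer than the grace period cannot run to completion. The sketch below compares the two values. The helper names to_seconds and drain_fits are hypothetical, and the parser only understands plain second values such as 50s.

```shell
# Hypothetical helpers, not part of TSB or Istio.

# Convert a duration written as whole seconds (for example "50s") to a number.
# Other forms such as "1m30s" are rejected rather than guessed at.
to_seconds() {
  case "$1" in
    *[!0-9]*s | s | "") return 1 ;;
    *s) echo "${1%s}" ;;
    *) return 1 ;;
  esac
}

# Succeed when the drain duration ($1, e.g. "50s") fits within the
# grace period ($2, in seconds); fail otherwise.
drain_fits() {
  drain_seconds=$(to_seconds "$1") || return 2
  [ "$drain_seconds" -le "$2" ]
}

# Example against a live cluster (pod name taken from this document):
#   grace=$(kubectl get pod helloworld-v1-59fdd6b476-pjrtr -n helloworld \
#     -o jsonpath='{.spec.terminationGracePeriodSeconds}')
#   drain_fits 50s "$grace" || echo "drain duration exceeds the grace period"
```

With a 50s drain duration and an unmodified 30-second grace period, drain_fits fails, which suggests raising terminationGracePeriodSeconds on the workload as well.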

Configuring terminationDrainDuration time for istio-proxy containers

You will need to apply an overlay to your control plane configuration to set terminationDrainDuration. Consider the following example. Note that only the applicable parts are shown -- you will most likely need a lot more configuration for your control plane.

apiVersion: install.tetrate.io/v1alpha1
kind: ControlPlane
metadata:
  name: controlplane
spec:
  components:
    istio:
      kubeSpec:
        overlays:
        - apiVersion: install.istio.io/v1alpha1
          kind: IstioOperator
          name: tsb-istiocontrolplane
          patches:
          - path: spec.meshConfig.defaultConfig.terminationDrainDuration
            value: 50s
  # ... <snip> ...

After adding the overlay to your configuration, use the kubectl command to apply it to the controlplane custom resource:

kubectl apply -f controlplane.yaml
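Workloads that are already running keep their previous sidecar configuration until their pods are recreated. A rolling restart is one way to do that. The sketch below assumes the pod belongs to a Deployment and that its name follows the usual <deployment>-<replicaset-hash>-<pod-suffix> pattern; the helper names deployment_of and restart_workload are hypothetical.

```shell
# Hypothetical helpers, not part of TSB or Istio.

# Derive the Deployment name from a pod name by dropping the last two
# dash-separated segments (ReplicaSet hash and pod suffix).
deployment_of() {
  echo "${1%-*-*}"
}

# Trigger a rolling restart and wait for it to finish.
# $1 = pod name, $2 = namespace
restart_workload() {
  deploy=$(deployment_of "$1")
  kubectl rollout restart "deployment/$deploy" -n "$2" &&
    kubectl rollout status "deployment/$deploy" -n "$2"
}

# Example (requires cluster access):
#   restart_workload helloworld-v1-59fdd6b476-pjrtr helloworld
```

If the workload is a StatefulSet or DaemonSet, the name derivation does not apply and the resource should be named explicitly in the kubectl rollout restart command.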

Verifying the terminationDrainDuration

You must restart the workloads that run istio-proxy for the new terminationDrainDuration to take effect. Once you have restarted your workload, you can verify the value by checking the Envoy config dump:

kubectl exec helloworld-v1-59fdd6b476-pjrtr -n helloworld -c istio-proxy -- pilot-agent request GET config_dump |grep -i terminationDrainDuration
"terminationDrainDuration": "50s",

Configuring terminationDrainDuration for TSB gateways

If you are using TSB gateways such as IngressGateway, EgressGateway, or Tier1Gateway, you will need to set the connectionDrainDuration parameter on the custom resource for the corresponding gateway type.

You can query the current value for the connectionDrainDuration field on your gateway custom resource by issuing the following command:

kubectl get ingress helloworld-gateway  -n helloworld -oyaml | grep connectionDrainDuration:
connectionDrainDuration: 22s

The following example shows how connectionDrainDuration may be set. Please read the spec for further information on this field.

apiVersion: install.tetrate.io/v1alpha1
kind: IngressGateway
metadata:
  name: helloworld-gateway
spec:
  connectionDrainDuration: 10s
  # ... <snip> ...

Verifying the terminationDrainDuration in the TSB Gateway

To check the value for terminationDrainDuration that is being set on the pod, you can query the environment variable:

kubectl describe po helloworld-gateway-7d5d4c8d57-msfd6 -n helloworld | grep -i DRAIN
TERMINATION_DRAIN_DURATION_SECONDS: 22

You can also verify this value by looking at the logs for the gateway pod when you terminate the gateway. If you watch the logs as the gateway pod is terminated, you should see messages resembling the following:

2022-03-29T06:02:50.423789Z     info    Graceful termination period is 22s, starting...
2022-03-29T06:03:12.423988Z     info    Graceful termination period complete, terminating remaining proxies.
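
To confirm that the two log lines are the configured duration apart, the timestamps can be subtracted. This sketch uses a hypothetical helper, elapsed_seconds, and assumes both timestamps fall on the same UTC day; fractional seconds are ignored.

```shell
# Hypothetical helper, not part of TSB or Istio.

# Print the whole seconds between two timestamps of the form
# 2022-03-29T06:02:50.423789Z. Only the hh:mm:ss part is used, so both
# timestamps must be on the same day.
elapsed_seconds() {
  printf '%s\n%s\n' "$1" "$2" |
    awk -F'[T:.]' '{ t[NR] = $2 * 3600 + $3 * 60 + $4 } END { print t[2] - t[1] }'
}

# Timestamps copied from the log lines above.
elapsed_seconds 2022-03-29T06:02:50.423789Z 2022-03-29T06:03:12.423988Z
```

For the log lines above this prints 22, which matches the TERMINATION_DRAIN_DURATION_SECONDS value reported on the pod.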