Version: 1.4.x

HTTP 403 error code in the Management Plane

TSB 1.5 will introduce service accounts that can update cluster tokens automatically, so this issue will no longer occur.

This article explains the HTTP 403 errors coming from Front Envoy and helps identify their root cause.

Prerequisites for understanding the issue

The Management Plane handles user and TSB agent token authentication through its IAM component.

All access to the Management Plane goes through Front Envoy: every onboarded cluster that communicates with the Management Plane does so via Front Envoy and authenticates using a JWT token.

These tokens are forwarded from Front Envoy to IAM, which handles their authentication; if a token has expired, IAM rejects it and Front Envoy returns an HTTP 403.

This traffic can be described as follows:
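
onboarded cluster (TSB agents) --JWT token--> Front Envoy --forwards token--> IAM (validates the token; expired tokens are rejected and Front Envoy returns HTTP 403)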

Detecting the issue and getting the logs

When reviewing the Front Envoy logs, you may see entries like these:

...ext_authz filter rejected the request. Response status code: '403'
...Sending local reply with details ext_authz_denied

These mean that there is an issue with external authorization in Envoy. With these logs, and knowing that IAM handles TSB agent token authentication, you can proceed to check the IAM logs.
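
To scan for these rejections directly, here is a minimal sketch; it assumes the Front Envoy pods run in the tsb namespace with the label app=envoy, so adjust the namespace and label to your installation:

kubectl logs -n tsb -l app=envoy --tail=1000 | grep ext_authz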

To check the IAM logs, first change the log level to debug as stated in this page to get more visibility.

With this done, check the IAM logs; if you see messages like these:

debug   jwt validating JWT token (iss="https://demo.tetrate.io"): `<JWTToken>`
debug auth authentication with provider jwt.Provider took: 171.289µs
debug auth authentication failed: authentication failed: failed to verify jws signature: failed to verify message: crypto/rsa: verification error
debug auth provider ldap.Provider does not support credentials credentials.JWT. skipping
debug auth attempting authentication with provider: ldap.Provider
debug auth authentication with provider ldap.Provider took: 924ns
debug auth authentication failed: unsupported credentials: credentials.JWT
debug envoy-filter deny: INTERNAL(exhausted authentication providers)

then you will need to identify which token has expired.
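
If you want to cross-check which token secrets exist on the onboarded cluster, you can list them in the control plane namespace (istio-system by default); the exact secret names vary by version:

kubectl get secrets -n istio-system | grep -i token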

Getting the faulty token

Here you have two options:

Using jwtinfo.sh

  • Move the IAM logs to a file. For this, run the following command:

    kubectl logs <IAM-pod> -n tsb > iam.log

  • With the logs saved, run the following command:

    for t in $(grep -i "failed to verify jws" iam.log -B2 | grep "JWT token" | awk -F"): " '{print $2}'); do ./jwtinfo.sh "$t" | jq -r .sub ; done | tee faultytokens.txt

This will generate a new file called faultytokens.txt, containing the cluster name and the component whose token is failing, in the format <component>-<cluster>. You will end up with something like this:

oap-agent-<cluster-name>
otel-collector-<cluster-name>

Without the script

The other option, if you don't want to download the script, is to create a file with the logs as above and run the following commands:

  • grep -i "failed to verify jws" iam.log -B2 | grep "JWT token" | awk -F": " '{print $2}' > faultytokens.txt

  • cat faultytokens.txt | sort | uniq

This will return the encoded tokens. With the output, you can use, for example, https://www.base64decode.org/ or the base64 command to decode them, and then look for the sub field, where you will see the <component>-<cluster> information.
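
If you prefer to extract the sub claim locally for all tokens at once, here is a minimal sketch that reuses the jq decoding shown later in this article (it assumes faultytokens.txt contains one full JWT per line):

jq -R -r 'split(".") | .[1] | @base64d | fromjson | .sub' faultytokens.txt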

Now that you know which token has expired, let's say you want to check when it expired. For that, run the following command:

kubectl -n istio-system get secret oap-token -o json | jq -r .data.token | base64 -d

And paste the output into https://jwt.io/. This will return something like:

{
  "exp": 1659692930,
  "iat": 1659689330,
  "iss": "https://demo.tetrate.io",
  "jti": "cbeaf391-2949-4e4b-a6fc-b716eb05a698",
  "sub": "oap-agent-demo",
  "tokentype": "bearer"
}

In that output, you will have the expiration date as exp and the component as sub.
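
The exp value is a Unix timestamp in seconds, so you can convert it to a human-readable date locally, for example with GNU date (on macOS, use date -r 1659692930 instead):

date -u -d @1659692930

For the example above this prints August 5, 2022 at 09:48:50 UTC; since iat is exactly 3600 seconds earlier, this particular token was valid for one hour.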

If for security reasons you don't want to paste the token into a web page, you can instead assign the output of the previous command

kubectl -n istio-system get secret oap-token -o json | jq -r .data.token | base64 -d

to a variable (for example, JWT) in your console, and run the next command:

jq -R 'split(".") | .[1] | @base64d | fromjson' <<< "$JWT"

This will return the same output.
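
For example, putting both steps together with the same oap-token secret as above:

JWT=$(kubectl -n istio-system get secret oap-token -o json | jq -r .data.token | base64 -d)
jq -R 'split(".") | .[1] | @base64d | fromjson' <<< "$JWT"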

Renewing the faulty tokens

Once the tokens with issues are identified, you can proceed to renew them. For this, run the following command:

tctl install manifest control-plane-secrets --cluster <cluster name>

This command will generate an output with all the tokens used in the control plane. Search for the failing ones; for this particular example, you need to copy the ones assigned to OAP and the OTel Collector.

Copy those tokens into a new YAML file, apply it with kubectl apply, and restart the affected pods.
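
A minimal sketch of that workflow, run against the onboarded cluster (the file name is arbitrary and the deployment names are only illustrative; check the real names with kubectl get deploy -n istio-system):

tctl install manifest control-plane-secrets --cluster <cluster name> > controlplane-secrets.yaml
# edit controlplane-secrets.yaml so it keeps only the failing secrets (here, the OAP and OTel Collector tokens)
kubectl apply -f controlplane-secrets.yaml -n istio-system
# restart the pods that consume the renewed tokens, for example:
kubectl rollout restart deployment/oap-deployment deployment/otel-collector -n istio-system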