Chavc0de

Monitoring Claude usage using Envoy AI gateway

May 4, 2026 — AI, Infra, Claude, Envoy

Introduction

Observability used to be mostly about system health—CPU, latency, and error rates. In the age of LLMs, that’s only part of the story. Understanding usage, tracking costs, and evaluating output quality have become just as important as monitoring infrastructure performance.

The gap is not visibility — it is meaning.

In a conventional microservices system:

With LLM systems:

So you end up with dashboards that look healthy, while:

This is the core limitation of traditional observability in AI systems.


AI Gateway

An AI Gateway is a proxy layer that sits between applications and LLM providers (such as AWS Bedrock, OpenAI, Anthropic, or self-hosted models).

Instead of treating LLMs as external APIs, it treats them as managed upstream services.

In this setup, the gateway becomes responsible for:


Why Envoy AI Gateway?

The decision to use Envoy AI Gateway was not driven by feature overlap with existing tools, but by where it operates in the stack.

It provides a network-level control and observability layer for LLM traffic that traditional monitoring systems do not capture.

Key reasons:

1. Unified LLM traffic visibility

All LLM calls (from CLI tools and applications) are routed through a single proxy layer, making it possible to observe:

This removes fragmentation of LLM observability across tools.


2. Network-level observability (not application-level)

Unlike traditional dashboards that only see aggregated metrics, Envoy provides visibility at the request level:

This makes it possible to correlate:

“which model + which request pattern caused cost or latency changes”


3. Integration with Kubernetes-native observability stack

Since the system already uses:

Envoy AI Gateway integrates naturally into this pipeline:

This keeps the stack consistent and avoids introducing a parallel telemetry system.


4. Policy and routing control at infrastructure layer

Even without complex AI decision logic, Envoy enables:

This ensures that all clients behave consistently regardless of how they call the LLM.


5. Standardized LLM access layer for multiple clients

By exposing Envoy AI Gateway as the single endpoint:

all use the same interface.

This simplifies:


What this setup enables (observability outcome)

With Envoy AI Gateway + Prometheus + Grafana:

[Figure: Envoy AI Gateway architecture]

Prerequisites

Step 1: Install Envoy Gateway

Run these commands to verify your tool installations:

Verify kubectl installation:

The Kubernetes cluster should be version 1.32 or higher for Envoy AI Gateway compatibility. You can create a cluster using EKS, GKE, AKS, or Docker Desktop.
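The tool and version checks can be sketched as two small shell helpers. This is a minimal sketch, not the post's original commands: the function names `require_tools` and `version_at_least` are hypothetical, and the minimum version 1.32 is the one stated above.

```shell
# Report any required CLI tools that are not on PATH.
# Returns non-zero if at least one tool is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeeds when version $1 is at least version $2 (major.minor comparison).
# Accepts forms such as "1.32", "v1.33.2" or "1.32+" (managed clusters
# sometimes report the minor version with a trailing "+").
version_at_least() {
  have="${1#v}"
  want="${2#v}"
  have_major="${have%%.*}"
  have_minor="${have#*.}"
  want_major="${want%%.*}"
  want_minor="${want#*.}"
  # Drop everything after the leading digits of the minor version.
  have_minor="${have_minor%%[!0-9]*}"
  want_minor="${want_minor%%[!0-9]*}"
  if [ "$have_major" -ne "$want_major" ]; then
    [ "$have_major" -gt "$want_major" ]
  else
    [ "$have_minor" -ge "$want_minor" ]
  fi
}
```

A typical use would be `require_tools kubectl helm`, followed by `version_at_least "$server_version" 1.32`, where the server version is read from the `serverVersion.major` and `serverVersion.minor` fields of `kubectl version -o json`.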

Uninstall any existing Envoy Gateway deployments before proceeding:

Terminal window
helm uninstall eg -n envoy-gateway-system
kubectl delete namespace envoy-gateway-system

Envoy AI Gateway is built on top of Envoy Gateway. Install it using Helm and wait for the deployment to be ready.

Terminal window
helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \
--version v1.7.2 \
--namespace envoy-gateway-system \
--create-namespace \
-f https://raw.githubusercontent.com/envoyproxy/ai-gateway/main/manifests/envoy-gateway-values.yaml
kubectl wait --timeout=2m -n envoy-gateway-system deployment/envoy-gateway --for=condition=Available

Installing Envoy AI Gateway

Step 1: Install AI Gateway CRDs

First, install the CRD Helm chart (ai-gateway-crds-helm), which manages all Custom Resource Definitions:

Terminal window
helm upgrade -i aieg-crd oci://docker.io/envoyproxy/ai-gateway-crds-helm \
--version v0.5.0 \
--namespace envoy-ai-gateway-system \
--create-namespace

Step 2: Install AI Gateway Resources

After the CRDs are installed, install the AI Gateway Helm chart:

Terminal window
helm upgrade -i aieg oci://docker.io/envoyproxy/ai-gateway-helm \
--version v0.5.0 \
--namespace envoy-ai-gateway-system \
--create-namespace
kubectl wait --timeout=2m -n envoy-ai-gateway-system deployment/ai-gateway-controller --for=condition=Available

Verify Installation

Check the status of the pods. All pods should be in the Running state with Ready status.

Check AI Gateway pods:

Terminal window
kubectl get pods -n envoy-ai-gateway-system
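To turn the visual check into something scriptable, the pod list can be filtered for anything that is not fully ready. The sketch below assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...); the function name `not_ready_pods` is hypothetical.

```shell
# Reads `kubectl get pods --no-headers` output on stdin and prints the name of
# every pod that is not Running or whose READY count (e.g. 1/2) is incomplete.
# Exits non-zero if any such pod is found.
not_ready_pods() {
  awk '{
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) {
      print $1
      bad = 1
    }
  } END { exit bad }'
}
```

For example, `kubectl get pods -n envoy-ai-gateway-system --no-headers | not_ready_pods` prints nothing and succeeds when every pod is ready. Pods from finished Jobs (status Completed) would also be listed, so this fits namespaces that only run long-lived deployments.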

Step 3: Configure Envoy AI Gateway to Monitor Claude API Usage

Connect AWS Bedrock

Prerequisites:

We will use static credentials for this example, but in production, consider using AWS IAM roles for secure credential management.

  1. Download and configure the AWS example manifest:
Terminal window
curl -O https://raw.githubusercontent.com/envoyproxy/ai-gateway/refs/tags/v0.5.0/examples/basic/aws.yaml
  2. Edit “aws.yaml” and replace the credential placeholders:

AWS_ACCESS_KEY_ID: Your AWS access key ID

AWS_SECRET_ACCESS_KEY: Your AWS secret access key

You can also change the models in the manifest to the ones you need; this example uses Llama and Claude Sonnet.

  3. Apply the manifest and wait for the gateway pods to become ready:

Terminal window
kubectl apply -f aws.yaml
kubectl wait pods --timeout=2m \
-l gateway.envoyproxy.io/owning-gateway-name=envoy-ai-gateway-basic \
-n envoy-gateway-system --for=condition=Ready

  4. Test the gateway with a sample request:
Terminal window
export GATEWAY_URL=$(kubectl get gateway envoy-ai-gateway-basic -n default -o jsonpath='{.status.addresses[0].value}')
curl -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "max_tokens": 100
  }' \
  "$GATEWAY_URL/anthropic/v1/messages"

Expected output:

Terminal window
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "model": "claude-3-5-sonnet-20241022",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 13,
    "output_tokens": 8
  }
}
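The `usage` object in this response is the raw material for usage monitoring. As a rough illustration, the token counts can be pulled out of a saved response with standard tools. This is a minimal sketch: the helper name `usage_field` is hypothetical, the response is the sample shown above (trimmed to its `usage` object), and a JSON-aware tool such as `jq` would be more robust than pattern matching if it is available.

```shell
# Print the integer value of one usage field from a JSON response on stdin.
# $1 is the field name, for example input_tokens or output_tokens.
usage_field() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*[0-9][0-9]*" \
    | head -n 1 \
    | grep -o '[0-9][0-9]*$'
}

# Sample response body, reduced to the usage object from the example above.
response='{"usage": {"input_tokens": 13, "output_tokens": 8}}'

input_tokens=$(printf '%s\n' "$response" | usage_field input_tokens)
output_tokens=$(printf '%s\n' "$response" | usage_field output_tokens)

echo "input=$input_tokens output=$output_tokens total=$((input_tokens + output_tokens))"
# prints: input=13 output=8 total=21
```

In practice the gateway records these numbers for every request, so a per-request script like this is mainly useful for spot-checking that the values in the dashboard match what a client actually received.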

For troubleshooting and advanced features, see the Envoy AI Gateway documentation.

Step 4: Monitoring the Claude Usage

Envoy AI Gateway collects metrics and exports them to Prometheus using the OpenTelemetry format. The metrics follow the OpenTelemetry Gen AI Semantic Conventions.

Apply the monitoring configuration:

Terminal window
kubectl apply -f https://raw.githubusercontent.com/envoyproxy/ai-gateway/main/examples/monitoring/monitoring.yaml

Wait for the monitoring pods to start, then forward Prometheus locally:

Terminal window
kubectl port-forward -n monitoring svc/prometheus 9090:9090

Open “http://localhost:9090” to access the Prometheus dashboard.
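Once the port-forward is running, the same data can be queried from a terminal through the Prometheus HTTP API. The sketch below makes one loud assumption: the metric name `gen_ai_client_token_usage_sum` and the label names `gen_ai_request_model` and `gen_ai_token_type` are guesses based on the OpenTelemetry convention name `gen_ai.client.token.usage`, so check the metric list in the Prometheus UI for the exact names in your setup. The helper names `token_usage_query` and `query_prometheus` are hypothetical.

```shell
# Prometheus address; matches the port-forward above unless overridden.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Build a PromQL expression that sums token usage per model and token type.
# $1 optionally overrides the metric name (the default is an assumption).
token_usage_query() {
  metric="${1:-gen_ai_client_token_usage_sum}"
  printf 'sum by (gen_ai_request_model, gen_ai_token_type) (%s)' "$metric"
}

# Run an instant query against the Prometheus HTTP API and print the JSON reply.
query_prometheus() {
  curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Example (requires the port-forward to be active):
#   query_prometheus "$(token_usage_query)"
```

Keeping the query in a function makes it easy to reuse the same expression in a Grafana panel, so the terminal and the dashboard show the same numbers.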

To view the Grafana dashboard

You can install Grafana in the cluster using Helm, or run it locally with Docker Desktop for simplicity.

“docker-compose.yaml”

services:
  grafana:
    image: grafana/grafana-enterprise:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - '3000:3000'
    volumes:
      - grafana-storage:/var/lib/grafana
volumes:
  grafana-storage:

Terminal window
docker compose up -d

Navigate to “http://localhost:3000”.
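Grafana still needs Prometheus as a data source. It can be added in the UI, or provisioned from a file as in the sketch below. This is an assumption-laden example rather than part of the original setup: the directory `./grafana/provisioning` is a hypothetical location, and `host.docker.internal:9090` assumes Grafana runs in Docker Desktop while Prometheus is port-forwarded on the host as shown earlier.

```shell
# Where to write the provisioning files (hypothetical path; override as needed).
PROVISIONING_DIR="${PROVISIONING_DIR:-./grafana/provisioning}"

mkdir -p "$PROVISIONING_DIR/datasources"

# Grafana data source provisioning file pointing at the port-forwarded Prometheus.
cat > "$PROVISIONING_DIR/datasources/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # Assumes Docker Desktop, where the host is reachable under this name.
    url: http://host.docker.internal:9090
    isDefault: true
EOF

echo "wrote $PROVISIONING_DIR/datasources/prometheus.yaml"
```

For Grafana to pick the file up, the directory would need to be mounted into the container at `/etc/grafana/provisioning`, for example with an extra volume entry `./grafana/provisioning:/etc/grafana/provisioning` on the grafana service, followed by a container restart.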

Dashboard

Step 5: Set up Claude and Cline to use the Envoy AI Gateway URL

Claude

If you use the Claude CLI, you can point it at the gateway with the following commands:

Terminal window
export GATEWAY_URL=$(kubectl get gateway envoy-ai-gateway-basic -n default -o jsonpath='{.status.addresses[0].value}')
# GATEWAY_URL holds a bare address, so the scheme is added here (assumes plain HTTP)
export ANTHROPIC_BASE_URL="http://$GATEWAY_URL/anthropic"
export ANTHROPIC_API_KEY=""
claude --model claude-sonnet-3-5

Cline

Cline runs as a VS Code extension. Use the following settings:

API Provider: Anthropic

Use custom base URL: http://host.internal/anthropic (Note: this address applies because the gateway runs on Docker Desktop)

Model: claude-sonnet-3-5 (Note: choose your own model)

You can now view both Cline and Claude CLI usage in a single Grafana dashboard.

Ending Notes:
