# Testbench Reference
This reference covers the Helm chart values and the five TestWorkflowTemplate custom resources that the chart installs into the `testkube` namespace.
## Helm chart values
| Field | Type | Default | Description |
|---|---|---|---|
|  | boolean |  |  |
|  | string |  | Namespace into which the chart resources are deployed. |
|  | string |  | Container image used by all pipeline phase containers. |
|  | string |  | Image tag override. Empty string uses the chart's `appVersion`. |
|  | string |  | Kubernetes image pull policy. |
|  | string |  | Name of the ConfigMap containing the Grafana dashboard JSON. |
|  | string |  | Namespace where the Grafana dashboard ConfigMap is created. Must be watched by your Grafana sidecar. |
## TestWorkflowTemplates
The chart installs five TestWorkflowTemplate custom resources. Workflows reference them via the `use` list. All templates share a common `/app/data` volume provided by the calling TestWorkflow.
### setup-template
Downloads a dataset from an S3-compatible store and writes it to the shared volume as an Experiment JSON file.
| Config parameter | Type | Description |
|---|---|---|
| `bucket` | string | S3/MinIO bucket name containing the dataset. |
| `key` | string | S3/MinIO object key (path to the dataset file). |
Input: an S3/MinIO object referenced by `bucket` / `key`.
Output: `data/datasets/experiment.json`, an `Experiment` model serialized as JSON.
Use this template when your dataset lives in object storage. For ConfigMap-based datasets, omit this template and mount the file directly via `spec.content.files`.
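As a sketch of that alternative, the workflow below mounts a dataset from a ConfigMap. It assumes Testkube's `spec.content.files` with `contentFrom.configMapKeyRef`, that the `data/...` paths resolve under the shared `/app/data` volume, and a hypothetical ConfigMap named `my-dataset`.

```yaml
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: eval-inline-dataset              # hypothetical workflow name
spec:
  content:
    files:
      # Place the dataset where run-template expects to read it.
      - path: /app/data/datasets/experiment.json
        contentFrom:
          configMapKeyRef:
            name: my-dataset             # hypothetical ConfigMap
            key: experiment.json
```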
### run-template
Sends every step in the experiment to the target agent via the A2A protocol and records responses.
| Config parameter | Type | Description |
|---|---|---|
|  | string | HTTP URL of the agent's A2A endpoint. |
Input: `data/datasets/experiment.json`.
Output: `data/experiments/executed_experiment.json`, an `ExecutedExperiment` model.
Implicit inputs: `workflow.name` (injected automatically by Testkube).
### evaluate-template
Scores agent responses using LLM-as-a-judge metrics via the configured metrics framework (RAGAS by default).
| Config parameter | Type | Description |
|---|---|---|
|  | string | Base URL for the OpenAI-compatible API used by the LLM judge (e.g. the in-cluster AI Gateway). Defaults to an empty string. |
Input: `data/experiments/executed_experiment.json`.
Output: `data/experiments/evaluated_experiment.json`, an `EvaluatedExperiment` model.
### publish-template
Publishes per-step evaluation scores to the OTLP endpoint configured via the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable.
Config parameters: none.
Input: `data/experiments/evaluated_experiment.json` plus the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable.
Output: Gauge metrics emitted to the OTLP collector. Each metric is labeled with `workflow_name`, `scenario`, and `step` attributes.
Implicit inputs: `workflow.name`, `execution.id`, `execution.number` (injected automatically by Testkube).
### visualize-template
Generates a self-contained HTML evaluation report and saves it as a workflow artifact.
Config parameters: none.
Input: `data/experiments/evaluated_experiment.json`.
Output: `data/results/evaluation_report.html`, a single-file HTML dashboard with Chart.js visualizations (summary cards, score bar charts, metric-distribution histograms, and a sortable results table).
Implicit inputs: `workflow.name`, `execution.id`, `execution.number` (injected automatically by Testkube).
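Taken together, a calling TestWorkflow that chains all five templates might look like the sketch below. The `bucket` and `key` parameters are the documented `setup-template` ones; the `run-template` and `evaluate-template` config keys (`url`, `openaiBaseUrl`) are placeholders, since the actual parameter names live in each template's spec.

```yaml
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: agent-eval-pipeline                         # hypothetical name
  namespace: testkube
spec:
  use:
    - name: setup-template
      config:
        bucket: eval-datasets                       # S3/MinIO bucket with the dataset
        key: experiments/demo.json                  # object key of the dataset file
    - name: run-template
      config:
        url: http://my-agent.agents.svc:8000        # placeholder A2A endpoint
    - name: evaluate-template
      config:
        openaiBaseUrl: http://ai-gateway.ai.svc:8080/v1   # placeholder judge API
    - name: publish-template      # needs OTEL_EXPORTER_OTLP_ENDPOINT (see below)
    - name: visualize-template    # saves the HTML report as an artifact
```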
## OTLP metrics contract
The `publish-template` exports OpenTelemetry gauge metrics over HTTP/protobuf to port 4318. Each evaluation step produces one gauge observation per metric:
| Attribute | Value |
|---|---|
| Metric name |  |
| Gauge value | Float in `[0, 1]`. |
| `workflow_name` | The Testkube workflow name. |
| `scenario` | The scenario name from the experiment. |
| `step` | Zero-based step index within the scenario. |
The OTLP endpoint is read from the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable at runtime. Inject it via a ConfigMap or a `container.env` entry in the calling TestWorkflow.
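For example, a `container.env` entry along these lines wires up the exporter. Only the collector address is a placeholder; the variable name and port 4318 come from the contract above.

```yaml
spec:
  container:
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        # Placeholder collector address; 4318 is the OTLP HTTP/protobuf port.
        value: http://otel-collector.monitoring.svc:4318
```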