---
title: TestingBot Tunnel Monitoring - Prometheus Metrics and Grafana
description: Monitor TestingBot Tunnel health with the built-in Prometheus metrics
  endpoint. Covers metric series, authentication, port configuration and a sample
  Grafana dashboard.
source_url:
  html: https://testingbot.com/support/tunnel/monitoring
  md: https://testingbot.com/support/tunnel/monitoring/index.md
---
# Watch your tunnel from Grafana

Production tunnels are easier to operate when you can see their health at a glance. TestingBot Tunnel ships with a built-in Prometheus-compatible metrics endpoint, so you can scrape it from any monitoring stack you already run.


## Enable the metrics endpoint

The metrics endpoint is enabled by default on `http://localhost:8003`. Override the port with `--metrics-port`.

    java -jar testingbot-tunnel.jar --metrics-port 9100

If the tunnel host is reachable from the network (for example a shared CI runner), protect the endpoint with HTTP Basic Auth:

    java -jar testingbot-tunnel.jar --metrics-auth ops:s3cret

To keep credentials out of shell history and the process list, export `TESTINGBOT_METRICS_AUTH` instead of passing them on the command line. See the [security guide](https://testingbot.com/support/tunnel/security#metrics).
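To check that the protected endpoint answers before wiring it into Prometheus, a short script can send the same Basic Auth header a scraper would. The Python sketch below is only an illustration: it assumes `TESTINGBOT_METRICS_AUTH` holds a `user:password` string in the same form as the `--metrics-auth` flag, and the helper names `credentials_from_env`, `basic_auth_header` and `fetch_metrics` are hypothetical.

```python
import base64
import os
import urllib.request


def credentials_from_env():
    """Read 'user:password' from the environment (assumed format), or None."""
    return os.environ.get("TESTINGBOT_METRICS_AUTH")


def basic_auth_header(credentials):
    """Build the HTTP Basic Auth header value from a 'user:password' string."""
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return "Basic " + token


def fetch_metrics(url="http://localhost:8003/metrics", credentials=None, timeout=5):
    """Return the raw metrics text, sending Basic Auth when credentials are given."""
    request = urllib.request.Request(url)
    if credentials:
        request.add_header("Authorization", basic_auth_header(credentials))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_metrics(credentials=credentials_from_env())` against a running tunnel should return the plain-text metrics; a wrong or missing password would typically surface as an `urllib.error.HTTPError`.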

## Available metrics

Tunnel-specific series are prefixed with `testingbot_`. The endpoint also exposes the standard JVM and process metrics from the Prometheus Java client. For the complete, current label set, read the output of the `/metrics` endpoint itself.

### Tunnel state

| Metric | Type | Meaning |
| --- | --- | --- |
| testingbot\_tunnel\_up | gauge | `1` when the tunnel is connected, `0` while reconnecting or down. The single most important alerting signal. |
| testingbot\_tunnel\_info | info | Static labels with the tunnel build (`version`, `id`, `name`). Use for dashboard headers and version filters. |
| testingbot\_tunnel\_uptime\_seconds | counter | Seconds since this tunnel process started. Resets to zero on restart, which makes it useful for detecting flapping. |
| testingbot\_tunnel\_reconnects\_total | counter | Total tunnel reconnects since startup. Sustained increases indicate an unstable upstream link. |
| testingbot\_active\_connections | gauge | Number of in-flight client connections through the tunnel right now. |
| testingbot\_tunnel\_connect\_duration\_seconds | histogram | Time taken to establish the tunnel itself (cold-start latency). Exposed as `_bucket`, `_sum` and `_count` series. |

### HTTP traffic

| Metric | Type | Meaning |
| --- | --- | --- |
| testingbot\_http\_requests\_total | counter | HTTP requests proxied since startup. Labels: `method`, `code`. Use `rate()` for throughput, filter on `code=~"5.."` for errors. |
| testingbot\_http\_request\_duration\_seconds | histogram | End-to-end HTTP latency. Compute p50/p95/p99 with `histogram_quantile()`. |
| testingbot\_https\_connect\_total | counter | HTTPS CONNECT sessions established. Labels: `code`. |
| testingbot\_https\_connect\_duration\_seconds | histogram | CONNECT handshake latency, suitable for `histogram_quantile()`. |
| testingbot\_https\_connect\_errors\_total | counter | CONNECT errors. Label: `reason` (TLS, target unreachable, timeout, ...). |
| testingbot\_proxy\_bytes\_transferred\_total | counter | Total bytes proxied (both directions). Use `rate()` for throughput in B/s. |
| testingbot\_errors\_total | counter | Generic proxy errors. Label: `name` (the error class). Useful for alerting on burst increases. |
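The histogram series above are meant to be read through `histogram_quantile()`, which estimates a percentile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The Python sketch below is a simplified illustration of that idea, not Prometheus's actual implementation; the function name `estimate_quantile` is hypothetical and edge cases such as negative bucket bounds are ignored.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with the +Inf bucket, like the `le` label series.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, nothing to estimate
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation within the bucket that holds the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

With made-up cumulative counts of 50, 90 and 100 observations at the 0.1 s, 0.5 s and 1 s bounds, the p95 estimate lands halfway through the last finite bucket, at roughly 0.75 s. The precision of any quantile is therefore limited by how the bucket bounds are spaced.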

### JVM and process metrics

The endpoint also exposes the standard `jvm_*` and `process_*` series from the Prometheus Java client. These help you size the tunnel host, detect memory pressure and catch garbage-collection pauses.

| Metric | Type | Meaning |
| --- | --- | --- |
| jvm\_memory\_bytes\_used | gauge | Bytes of heap and non-heap memory in use. Label: `area` (`heap` or `nonheap`). |
| jvm\_memory\_bytes\_max | gauge | Maximum bytes available per memory area. Pair with `_used` to compute headroom. |
| jvm\_memory\_pool\_bytes\_used | gauge | Per-pool memory usage (Eden, Survivor, Old Gen, Metaspace, ...). Label: `pool`. |
| jvm\_gc\_collection\_seconds\_count | counter | Number of GC collections. Label: `gc` (collector name). |
| jvm\_gc\_collection\_seconds\_sum | counter | Total time spent in GC. Use `rate()` to see GC pressure over time. |
| jvm\_threads\_current | gauge | Live thread count. Watch for runaway growth. |
| jvm\_threads\_daemon | gauge | Daemon thread count. |
| jvm\_threads\_peak | gauge | Peak thread count since process start. |
| jvm\_classes\_loaded | gauge | Currently loaded classes. |
| jvm\_buffer\_pool\_used\_bytes | gauge | Direct and mapped NIO buffer usage. Label: `pool`. |
| process\_cpu\_seconds\_total | counter | Total CPU time consumed by the process. Use `rate()` for CPU utilisation. |
| process\_resident\_memory\_bytes | gauge | RSS memory of the tunnel process as reported by the OS. |
| process\_open\_fds | gauge | Open file descriptors. Compare with `process_max_fds` to spot leaks. |
| process\_start\_time\_seconds | gauge | Unix epoch when the process started. Useful for restart detection. |

All names above are the canonical Prometheus Java client names. Open `http://localhost:8003/metrics` to see the full live list and their `# HELP` / `# TYPE` annotations.
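The `/metrics` output uses the Prometheus text exposition format: `# HELP` and `# TYPE` comment lines followed by one sample per line. As a rough illustration of how that output can be read outside Prometheus, here is a minimal Python sketch of a parser. It is deliberately simplistic (it ignores timestamps and does not unescape label values), and the helper names `parse_metrics` and `tunnel_is_up` are hypothetical.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')
# One label pair inside the braces: key="value" (escaped quotes tolerated).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE annotations
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simple parser cannot read
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def tunnel_is_up(samples):
    """True when a testingbot_tunnel_up sample is present and equals 1."""
    return any(name == "testingbot_tunnel_up" and value == 1.0
               for name, _labels, value in samples)
```

For anything beyond a quick health check, an established client library parser is a safer choice than hand-rolled regular expressions.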

## Prometheus scrape config

Add a job to your `prometheus.yml` that targets the tunnel host.

    scrape_configs:
      - job_name: testingbot_tunnel
        static_configs:
          - targets: ["tunnel-host.example.com:8003"]
        basic_auth:
          username: ops
          password_file: /etc/prometheus/testingbot_metrics_password

## Grafana dashboard

A ready-made Grafana dashboard ships with the tunnel source. Import it once and you have a full Overview / HTTP / Tunnel-health view in seconds.

Download `testingbot-tunnel.json` from the [grafana-dashboard examples folder](https://github.com/testingbot/Testingbot-Tunnel/tree/master/examples/grafana-dashboard) on GitHub. In Grafana go to **Dashboards → New → Import**, paste the JSON or upload the file, pick your Prometheus data source, and click **Import**.

[![TestingBot Tunnel Grafana dashboard with Overview and HTTP panels](https://testingbot.com/assets/support/tunnel/grafana-dashboard-bf173520c136608330e3e59c0444883d41815ea3bd031cf063f3147f3ef65bf2.png) ](https://testingbot.com/assets/support/tunnel/grafana-dashboard-bf173520c136608330e3e59c0444883d41815ea3bd031cf063f3147f3ef65bf2.png)
_The bundled dashboard rendered against a live tunnel_

The bundled dashboard groups panels into two rows:

- **Overview:** Tunnel status (UP/DOWN), build version, active connections, uptime and reconnect counter.
- **HTTP:** request rate by status class, HTTPS CONNECT rate, p50/p95/p99 latency and response throughput.

Want a turnkey setup? The repo also includes a [docker-compose example](https://github.com/testingbot/Testingbot-Tunnel/tree/master/examples/docker-compose-prometheus-grafana) that spins up Prometheus + Grafana already wired to scrape a local tunnel and load the dashboard automatically. Useful for local development or a single-machine CI runner.

    git clone https://github.com/testingbot/Testingbot-Tunnel.git
    cd Testingbot-Tunnel/examples/docker-compose-prometheus-grafana
    docker compose up -d
    # Grafana on http://localhost:3000 · Prometheus on http://localhost:9090

If you prefer to build the dashboard yourself, here are the most useful PromQL queries:

Tunnel up

    max(testingbot_tunnel_up)

Active connections

    sum(testingbot_active_connections)

2xx request rate

    rate(testingbot_http_requests_total{code=~"2.."}[5m])

5xx error rate

    rate(testingbot_http_requests_total{code=~"5.."}[5m])

p95 HTTP latency

    histogram_quantile(0.95, sum by (le) (rate(testingbot_http_request_duration_seconds_bucket[5m])))

Bytes throughput

    rate(testingbot_proxy_bytes_transferred_total[5m])

Reconnects

    sum(testingbot_tunnel_reconnects_total)

JVM heap usage

    sum by (area) (jvm_memory_bytes_used)

Process CPU

    rate(process_cpu_seconds_total[5m])

Threads

    jvm_threads_current
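The same queries can be run outside Grafana through the Prometheus HTTP API, which is handy for smoke tests in CI. The Python sketch below uses the instant-query endpoint `/api/v1/query`; it assumes Prometheus is reachable at `http://localhost:9090` as in the docker-compose example, and the helper names `build_query_url`, `instant_query` and `first_value` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def instant_query(base_url, promql, timeout=10):
    """Run an instant query and return the decoded JSON response."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def first_value(payload):
    """Return the first sample value of an instant-vector response, or None."""
    results = payload.get("data", {}).get("result", [])
    if not results:
        return None
    # Each sample is [timestamp, "value"]; Prometheus sends the value as a string.
    return float(results[0]["value"][1])
```

For example, `first_value(instant_query("http://localhost:9090", "max(testingbot_tunnel_up)"))` should yield `1.0` while the tunnel is connected, and `None` if Prometheus has no samples for the series yet.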

## Alert ideas

- **Tunnel restart:** fire when `testingbot_tunnel_uptime_seconds` is below 60 seconds.
- **Error spike:** fire when the 5-minute rate of 5xx responses (`testingbot_http_requests_total{code=~"5.."}`) or of `testingbot_errors_total` exceeds your normal baseline.
- **Stalled connections:** fire when `testingbot_active_connections` stays at zero during normal CI hours.
- **Endpoint unreachable:** fire on Prometheus `up == 0` for the tunnel target.
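One possible way to express these ideas is as PromQL comparison expressions. Because a comparison filters out series that do not match, a non-empty instant-query result means the condition currently holds. The Python sketch below collects such expressions and checks a Prometheus `/api/v1/query` response. The alert names, the `1` request-per-second threshold and the `30m` window are placeholder assumptions to tune against your own baseline, the `job` label assumes the `testingbot_tunnel` job name from the scrape config above, and the CI-hours restriction is not modelled.

```python
# Hypothetical alert names mapped to PromQL expressions; thresholds are placeholders.
ALERT_EXPRESSIONS = {
    # Uptime below one minute suggests the process restarted recently.
    "TunnelRestarted": "testingbot_tunnel_uptime_seconds < 60",
    # Assumed baseline of 1 failing request per second; adjust to your traffic.
    "ErrorSpike": 'sum(rate(testingbot_http_requests_total{code=~"5.."}[5m])) > 1',
    # No active connection seen at any point in the assumed 30-minute window.
    "StalledConnections": "max_over_time(testingbot_active_connections[30m]) == 0",
    # Prometheus could not scrape the tunnel target at all.
    "EndpointUnreachable": 'up{job="testingbot_tunnel"} == 0',
}


def is_firing(payload):
    """Return True when an instant-query response contains at least one series."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % (payload.get("error"),))
    return len(payload["data"]["result"]) > 0
```

In a production setup these expressions would more commonly live in a Prometheus alerting rule file with a `for:` duration, so that short blips do not page anyone.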

