Cloudflare Tunnel (cloudflared) plan — Traefik external ingress
Goal: eliminate intermittent Cloudflare 523 Origin Unreachable events by removing inbound WAN reachability from the equation. We’ll do this by running a Cloudflare Tunnel (cloudflared) inside Kubernetes and routing public traffic through it to traefik-external.
This document is written as a step-by-step plan so it can be copied into Cursor Plan Mode later.
High-level decisions (what we’re optimizing for)
- Single URL per app: keep https://<app>.haynesnetwork.com as the only URL (no “-internal” duplicates).
- Phase 1 (simpler): no split DNS. Even LAN clients will go out to Cloudflare and come back through the tunnel. This makes validation straightforward.
- Phase 2: add monitoring so we can safely introduce split DNS later.
- Phase 3: split DNS. LAN clients resolve the same public hostnames to traefik-external directly (bypassing Cloudflare), while WAN continues through the tunnel.
Background / current state (relevant to the 523 issue)
- External services use IngressRoute objects with:
  - kubernetes.io/ingress.class: traefik-external
  - external-dns.alpha.kubernetes.io/target: haynesnetwork.com
- external-dns-cloudflare is configured with sources: ["crd", "ingress", "traefik-proxy"] and --cloudflare-proxied.
- This causes records like authentik.haynesnetwork.com to be published as CNAME haynesnetwork.com.
- When Cloudflare returns 523, it means the selected Cloudflare POP couldn’t reach the origin (today: your WAN IP path).
With a tunnel, Cloudflare does not need to initiate a connection to your WAN IP. Cloudflare connects to a tunnel that your cloudflared pods keep open outbound.
Architecture overview
Without split DNS (Phase 1)
- Client (LAN or WAN) → Cloudflare → Cloudflare Tunnel → cloudflared (in-cluster) → traefik-external → app Service
With split DNS (Phase 3)
- WAN client → Cloudflare → tunnel → traefik-external → app Service
- LAN client → LAN DNS override → traefik-external directly → app Service
Same hostname, different DNS answer depending on where you are.
References (read-up)
Cloudflare:
- Cloudflare Tunnel (concepts + setup): https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/
- cloudflared configuration (config.yaml, ingress rules): https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/configure-tunnels/local-management/configuration-file/
- Public hostname routing for tunnels: https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/routing-to-tunnel/
Traefik:
- Forwarded headers / real client IP concepts (Traefik proxy behavior): https://doc.traefik.io/traefik/routing/entrypoints/#forwarded-headers
GitOps examples in this workspace:
- onedr0p-home-ops Cloudflare tunnel app:
  - kubernetes/apps/network/cloudflare-tunnel/app/helmrelease.yaml
  - kubernetes/apps/network/cloudflare-tunnel/app/externalsecret.yaml
  - kubernetes/apps/network/cloudflare-tunnel/app/dnsendpoint.yaml
- bjw-s-labs-home-ops cloudflared app:
  - kubernetes/apps/network/cloudflared/app/helmrelease.yaml
  - kubernetes/apps/network/cloudflared/app/externalsecret.yaml
  - kubernetes/apps/network/cloudflared/app/dnsEndpoint.yaml
Phase 0 — prerequisites (mostly manual / external systems)
0.1 Cloudflare prerequisites (manual)
You need:
- A Cloudflare Zone for haynesnetwork.com
- Cloudflare Zero Trust enabled (for Tunnel management)
- A Tunnel created (name it something like haynes-ops-main)
Outputs we will use in Git:
- Tunnel ID (UUID)
- Tunnel secret (base64-ish secret used to build the token)
- Account tag (Cloudflare account identifier)
Notes:
- The Kubernetes deployment can authenticate either via a “tunnel token” or via a credentials file. The example repos build the TUNNEL_TOKEN value as a JSON object and base64-encode it.
0.2 Secret management prerequisites (manual)
This repo uses External Secrets. We will store tunnel values in your secret backend (e.g., 1Password item):
- cloudflare_tunnel_account_tag
- cloudflare_tunnel_id
- cloudflare_tunnel_secret
An agent cannot create or read these secrets safely without your secret store access.
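The three stored values can be combined into the token at sync time. A minimal sketch, assuming the external-secrets.io/v1 API, a hypothetical ClusterSecretStore named onepassword, a hypothetical backend item named cloudflare-tunnel, and that the token is the base64-encoded JSON object with keys a (account tag), t (tunnel ID) and s (tunnel secret), as the example repos appear to build it. Compare against those repos before relying on it.

```yaml
# externalsecret.yaml (sketch; store, item and Secret names are hypothetical)
apiVersion: external-secrets.io/v1   # assumption: older installs use v1beta1
kind: ExternalSecret
metadata:
  name: cloudflare-tunnel
  namespace: network
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword                # assumption: use your store's real name
  target:
    name: cloudflare-tunnel-secret   # hypothetical Secret name
    template:
      data:
        # base64 of {"a": <account tag>, "t": <tunnel ID>, "s": <tunnel secret>}
        TUNNEL_TOKEN: |-
          {{ toJson (dict "a" .cloudflare_tunnel_account_tag "t" .cloudflare_tunnel_id "s" .cloudflare_tunnel_secret) | b64enc }}
  dataFrom:
    - extract:
        key: cloudflare-tunnel       # assumption: item holding the three fields above
```

The resulting Secret can then be exposed to the cloudflared container as the TUNNEL_TOKEN environment variable.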
0.3 Certificate issuance prerequisites (important)
If you plan to remove inbound port-forwards (80/443) as part of the tunnel migration, certificate issuance/renewal must not depend on inbound HTTP reachability (ACME HTTP-01).
- This cluster already satisfies that requirement: cert-manager uses ACME DNS-01 via Cloudflare for Let’s Encrypt.
- Warning: if you later change cert-manager Issuers to HTTP-01 (or otherwise remove/break Cloudflare DNS-01), inbound reachability silently becomes a dependency again, and renewals may fail once the ports are closed.
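For reference, a DNS-01 issuer that does not depend on inbound ports looks roughly like the sketch below. The issuer name and the Secret name/key are hypothetical; the issuer already present in this cluster is the source of truth.

```yaml
# clusterissuer.yaml (sketch; names are hypothetical)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-production     # ACME account key
    solvers:
      # DNS-01: the challenge is a TXT record created through the Cloudflare API,
      # so no inbound HTTP reachability is needed.
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token   # hypothetical Secret
              key: api-token
        selector:
          dnsZones:
            - haynesnetwork.com
```

If a solver using http01 ever appears in this spec, treat it as a signal that closing ports 80/443 will break renewals.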
Phase 1 — deploy the tunnel in-cluster (no split DNS yet)
1.1 Add a cloudflared app to the network namespace (GitOps)
Pattern to follow: the onedr0p-home-ops cloudflare-tunnel app.
Create a new app under kubernetes/main/apps/network/ (proposed structure):
- kubernetes/main/apps/network/cloudflare-tunnel/ks.yaml
- kubernetes/main/apps/network/cloudflare-tunnel/app/helmrelease.yaml
- kubernetes/main/apps/network/cloudflare-tunnel/app/externalsecret.yaml
- kubernetes/main/apps/network/cloudflare-tunnel/app/dnsendpoint.yaml (Phase 1.3)
- optionally GrafanaDashboard + ServiceMonitor like the examples
Implementation details (recommended defaults):
- Replicas: 2 (avoid single-pod tunnel outages)
- Health checks: /ready on metrics port (TUNNEL_METRICS)
- Metrics: enable scrape via ServiceMonitor
- Ingress rules:
  - Start with a wildcard for your external domain:
    - hostname: "*.haynesnetwork.com"
    - service: http://traefik-external.network.svc.cluster.local:80
  - Default: http_status:404
Why HTTP to Traefik initially:
- Avoids TLS/SNI complexity between cloudflared and Traefik during early validation.
- Traffic is still encrypted between the client and Cloudflare and between Cloudflare and cloudflared; the only plaintext hop is cloudflared → Traefik inside the cluster.
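The ingress rules recommended in 1.1 can be sketched as a cloudflared config.yaml. This assumes the Traefik Service is named traefik-external in the network namespace and listens for HTTP on port 80, as written above.

```yaml
# config.yaml for cloudflared (sketch)
ingress:
  # Every <app>.haynesnetwork.com hostname is handed to Traefik over plain HTTP.
  - hostname: "*.haynesnetwork.com"
    service: http://traefik-external.network.svc.cluster.local:80
  # cloudflared requires the last rule to be a catch-all with no hostname.
  - service: http_status:404
```

Rules are evaluated top to bottom, so any more specific hostname added later must sit above the wildcard.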
1.2 Reconcile and verify cloudflared is healthy (manual commands)
An agent can’t run kubectl from this environment; you’ll run these and paste results if needed:
- Pods ready:
  kubectl -n network get pods -l app.kubernetes.io/name=cloudflare-tunnel
- Logs show “connected” / no auth errors:
  kubectl -n network logs -l app.kubernetes.io/name=cloudflare-tunnel --tail=200 -f
- Metrics endpoint responds (optional):
  kubectl -n network port-forward svc/cloudflare-tunnel 8080:8080
  then browse http://localhost:8080/ready
Success criteria:
- Two pods Running/Ready
- No repeated reconnect/auth failures
Rollback:
- revert the Git commits that add the app, reconcile Flux
1.3 Create a stable DNS target name for external-dns (GitOps)
Today, many IngressRoutes publish CNAME haynesnetwork.com because of:
external-dns.alpha.kubernetes.io/target: haynesnetwork.com
With a tunnel, we want external routes to publish:
- CNAME ingress-ext.haynesnetwork.com (example name)
Then we create one record:
- ingress-ext.haynesnetwork.com → ${TUNNEL_ID}.cfargotunnel.com (CNAME)
This is exactly what the example repos do with a DNSEndpoint CR:
- onedr0p-home-ops: external.turbo.ac → ${CLOUDFLARE_TUNNEL_ID}.cfargotunnel.com
- bjw-s-labs-home-ops: ingress-ext.bjw-s.dev → <tunnel_id>.cfargotunnel.com
In this repo:
- Add a DNSEndpoint in network namespace (or wherever your external-dns-cloudflare watches CRDs) that creates:
  - dnsName: ingress-ext.haynesnetwork.com
  - recordType: CNAME
  - targets: ["<tunnel_id>.cfargotunnel.com"]
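As a manifest, the record above might look like the following sketch. The resource name is hypothetical, and <tunnel_id> stays a placeholder to be substituted (for example by Flux post-build substitution, if this repo uses it).

```yaml
# dnsendpoint.yaml (sketch; metadata.name is hypothetical)
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: cloudflare-tunnel
  namespace: network
spec:
  endpoints:
    - dnsName: ingress-ext.haynesnetwork.com
      recordType: CNAME
      targets:
        - "<tunnel_id>.cfargotunnel.com"   # placeholder: the tunnel UUID from Phase 0.1
```

This only takes effect if external-dns-cloudflare includes crd in its sources, which the background section says it does.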
Success criteria:
- Cloudflare DNS shows the CNAME for ingress-ext.haynesnetwork.com
1.4 Update external IngressRoute targets to point at ingress-ext
For each external IngressRoute currently using:
external-dns.alpha.kubernetes.io/target: haynesnetwork.com
Change to:
external-dns.alpha.kubernetes.io/target: ingress-ext.haynesnetwork.com
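On a single route, the change is one annotation value. A sketch of the metadata only, assuming the traefik.io/v1alpha1 API group (older Traefik releases use traefik.containo.us/v1alpha1) and a hypothetical route named authentik:

```yaml
# ingressroute.yaml (metadata fragment; spec is left unchanged)
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: authentik            # hypothetical name
  namespace: network
  annotations:
    kubernetes.io/ingress.class: traefik-external
    # previously: haynesnetwork.com
    external-dns.alpha.kubernetes.io/target: ingress-ext.haynesnetwork.com
```

Changing one low-risk app first and checking its record in Cloudflare DNS keeps the blast radius small before touching the rest.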
Known locations (non-exhaustive; expand as you implement):
- kubernetes/main/apps/network/authentik/app/ingressroute.yaml
- kubernetes/main/apps/photos/immich/server/ingressroute.yaml
- kubernetes/main/apps/office/paperless-ngx/app/ingressroute.yaml
- kubernetes/main/apps/network/traefik/config/ingress-routes/traefik-external/*
Success criteria:
- External-DNS records for apps become CNAME ingress-ext.haynesnetwork.com (not haynesnetwork.com)
Rollback:
- revert the annotation changes (records will revert)
1.5 “No split DNS” validation (manual)
From LAN:
- confirm https://<app>.haynesnetwork.com works
- optionally verify Cloudflare response headers are present (e.g. cf-ray or server: cloudflare), which confirms the request went through Cloudflare
From WAN/cellular:
- confirm the same URLs work
Optional: temporarily remove/disable your router port forwards for 80/443 (after you are confident).
Phase 2 — Monitoring strategy for tunnel-first + split DNS later
We want to detect two classes of failure:
1) Public path down (Cloudflare/tunnel/external)
2) LAN path down while public is up (or vice-versa) once split DNS is introduced
2.1 Gatus checks: Public (Cloudflare/tunnel) path
Add (or keep) endpoints like:
- url: https://authentik.haynesnetwork.com/
These validate:
- Cloudflare edge
- tunnel connectivity
- Traefik routing
- app health (at least HTTP-level)
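A public-path check in Gatus configuration might look like this sketch. The endpoint name, group, interval and expected status are assumptions; pick the status each app actually returns on its root path.

```yaml
# Gatus config fragment (sketch; name, group and interval are hypothetical)
endpoints:
  - name: authentik (public)
    group: external
    url: https://authentik.haynesnetwork.com/
    interval: 1m
    conditions:
      # Gatus follows redirects by default, so this is the final status.
      - "[STATUS] == 200"
```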
2.2 Gatus checks: “LAN direct” path (even before split DNS)
Because Gatus runs inside the cluster, it won’t automatically use your LAN DNS overrides later. So for “LAN path” checks, don’t rely on DNS — test the internal routing path directly.
Recommended approach:
- Send HTTP to the in-cluster Traefik service
- Set the Host header to the public hostname so Traefik routes the same as external
Example pattern (conceptual):
- url: http://traefik-external.network.svc.cluster.local/
- headers: { Host: authentik.haynesnetwork.com }
- condition: status code is 200/302/etc depending on the app
This validates:
- Traefik is reachable
- Traefik is routing that hostname correctly
- Upstream service is reachable
It does not validate:
- Cloudflare edge behavior
- tunnel connectivity
2.3 DNS monitoring for split DNS (future)
Once you introduce split DNS via Unifi:
- Add Gatus DNS endpoints that query Unifi DNS and assert:
  - authentik.haynesnetwork.com resolves to 192.168.40.206 (Traefik external VIP)
- Add Gatus DNS endpoints that query a public resolver (e.g. 1.1.1.1) and assert:
  - authentik.haynesnetwork.com resolves (NOERROR) to Cloudflare edge addresses rather than the LAN VIP. Because the records are proxied (--cloudflare-proxied), public resolvers return Cloudflare A/AAAA records, and the CNAME chain to cfargotunnel.com is not visible in the answer.
This gives quick visibility:
- “Public DNS correct?”
- “LAN DNS override correct?”
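Both DNS assertions can be sketched with Gatus DNS endpoints. The Unifi resolver address 192.168.1.1 is hypothetical, as are the endpoint names. The public check asserts only that the name resolves and is not the LAN VIP, since proxied Cloudflare records answer with edge addresses rather than a visible CNAME.

```yaml
# Gatus config fragment (sketch; resolver address and names are hypothetical)
endpoints:
  - name: authentik DNS (LAN resolver)
    group: dns
    url: "192.168.1.1"            # assumption: address of the Unifi DNS server
    dns:
      query-name: authentik.haynesnetwork.com
      query-type: A
    conditions:
      - "[DNS_RCODE] == NOERROR"
      - "[BODY] == 192.168.40.206"   # Traefik external VIP
  - name: authentik DNS (public resolver)
    group: dns
    url: "1.1.1.1"
    dns:
      query-name: authentik.haynesnetwork.com
      query-type: A
    conditions:
      - "[DNS_RCODE] == NOERROR"
      - "[BODY] != 192.168.40.206"   # must not leak the LAN override
```

Gatus must be able to reach both resolvers on port 53 from inside the cluster for these checks to be meaningful.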
Agent constraint:
- configuring Unifi DNS is outside cluster GitOps unless you manage it via external-dns-unifi (see Phase 3).
Phase 3 — Introduce split DNS (after tunnel is proven stable)
3.1 Decide how to manage Unifi DNS overrides
Options:
- Option A (manual): create Unifi DNS host overrides for selected external hostnames
- Option B (GitOps): extend/adjust external-dns-unifi so it can manage haynesnetwork.com LAN overrides
  - currently it is scoped to haynesops.com and excludes haynesnetwork.com
  - changing this requires care to avoid record ownership collisions
3.2 Implement split DNS (manual or GitOps)
For each external hostname you want “LAN direct”:
- set Unifi DNS A record to 192.168.40.206 (Traefik external VIP)
3.3 Verify split DNS with the Phase 2 monitoring in place
Success criteria:
- Public Gatus checks remain green
- LAN-direct (Traefik service + Host header) checks remain green
- DNS checks show:
  - public: tunnel route
  - LAN: VIP route
Rollback:
- remove Unifi overrides (LAN returns to public DNS/tunnel)
Implementation notes / gotchas (Traefik + Cloudflare specifics)
- Client IPs:
  - With tunnel, Traefik will see the immediate client as cloudflared.
  - Apps should rely on CF-Connecting-IP / X-Forwarded-For for real client IP.
  - You may want to configure Traefik forwarded header trust appropriately (don’t blindly trust all in-cluster sources).
- TLS between cloudflared → Traefik:
  - Start with HTTP to Traefik to validate the tunnel quickly.
  - Later, you can tighten to HTTPS origin if desired (requires SNI / cert alignment).
- external-dns record shape:
  - The easiest migration is changing the external-dns .../target annotation from haynesnetwork.com to ingress-ext.haynesnetwork.com so every app record repoints cleanly.
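Two of the notes above map to small config changes, sketched here as separate YAML documents. The entry point name web and the CIDR are hypothetical; substitute the real entry point and the narrowest range that covers the cloudflared pods. The HTTPS variant assumes Traefik serves a certificate valid for ingress-ext.haynesnetwork.com (for example a wildcard for haynesnetwork.com).

```yaml
# Traefik static configuration fragment: trust forwarded headers only from
# the addresses cloudflared connects from.
entryPoints:
  web:                          # hypothetical entry point name
    address: ":80"
    forwardedHeaders:
      trustedIPs:
        - 10.42.0.0/16          # hypothetical pod CIDR; narrow it if possible
---
# cloudflared config.yaml fragment: later tightening to an HTTPS origin.
ingress:
  - hostname: "*.haynesnetwork.com"
    service: https://traefik-external.network.svc.cluster.local:443
    originRequest:
      # Name cloudflared expects in the origin certificate.
      originServerName: ingress-ext.haynesnetwork.com
  - service: http_status:404
```

Avoid setting noTLSVerify as a shortcut for the HTTPS variant; it removes the certificate check that the SNI alignment is meant to provide.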
Work breakdown (copy/paste into Plan Mode)
Phase 1: tunnel deployed, no split DNS
- Create cloudflare-tunnel app manifests (HelmRelease + ExternalSecret + config.yaml)
- Add DNSEndpoint for ingress-ext.haynesnetwork.com → <tunnel_id>.cfargotunnel.com
- Update external IngressRoute annotations to target ingress-ext.haynesnetwork.com
- Reconcile Flux and validate from LAN + WAN
- Optionally close router 80/443 port-forwards after confidence
Phase 2: monitoring foundation
- Add Gatus endpoints for public URLs (tunnel path)
- Add Gatus endpoints that hit Traefik service directly with Host: header (LAN-direct path)
- Add DNS assertions (public resolver vs Unifi resolver) once split DNS is introduced
Phase 3: split DNS
- Choose manual vs GitOps for Unifi DNS overrides
- Implement overrides for selected hostnames
- Validate with Gatus (public + lan-direct + DNS)