Feature Summary
Since the cluster networking was unified under a single Envoy Gateway (#4191) and the access-control-service was added as the external-authorization (ext-auth) service for computing-unit traffic (#3598), the access-control-service is the single component that decides which upstream a user's computing-unit request is routed to. On every authorized request it returns a Host header that the gateway uses as the routing target, and Envoy forwards the upgraded connection there.
Today the service does not route to a URI it was given — it reconstructs the target from KubernetesConfig (pool name, namespace, port) using the in-cluster Kubernetes DNS convention:
computing-unit-<cuid>.<pool>-svc.<namespace>.svc.cluster.local:<port>
This hard-wires every computing unit to the local cluster under a fixed naming convention, and duplicates the address-construction logic that already lives in KubernetesClient.generatePodURI. It makes it impossible to distribute computing units to arbitrary locations — a CU running on a remote node, in a different cluster, or on a host outside the cluster.
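For illustration, the reconstruction described above can be sketched as a small function. This is a minimal sketch, not the service's actual code: the function name and the example values (pool name, namespace, port) are hypothetical, and only the address pattern itself comes from this issue.

```python
def in_cluster_address(cuid: int, pool: str, namespace: str, port: int) -> str:
    """Build the in-cluster Kubernetes DNS address for a computing unit.

    Follows the convention quoted in this issue:
    computing-unit-<cuid>.<pool>-svc.<namespace>.svc.cluster.local:<port>
    """
    return f"computing-unit-{cuid}.{pool}-svc.{namespace}.svc.cluster.local:{port}"


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the shape of the result.
    print(in_cluster_address(42, "computing-unit-pool", "texera", 8085))
```

Every input to this function is cluster-local configuration, which is why the result can only ever point inside the local cluster.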
To enable CU distribution, the access-control-service should accept and route to any URI recorded for a computing unit, instead of assuming the in-cluster address.
Proposed Solution or Design
Why this is possible with Envoy Gateway. Envoy Gateway's ext-auth SecurityPolicy lets an external service authorize each request and contribute headers; in Texera's setup the access-control-service also supplies the upstream Host. Envoy's dynamic forward proxy — a Backend of type DynamicResolver — then resolves and forwards to an arbitrary host:port (FQDN or IP) determined at request time from that header. In other words, the routing target is whatever the access-control-service returns; it does not have to be an in-cluster *.svc.cluster.local address. (Refs: Envoy Gateway External Authorization and Backend Routing / Dynamic Resolver.)
Proposed change.
- In AccessControlResource, resolve the routing target from the URI persisted for the computing unit (the workflow_computing_unit row, written by the managing service via KubernetesClient.generatePodURI), which serves as a single source of truth, instead of reconstructing it from KubernetesConfig.
- Keep the previously constructed in-cluster address as a fallback for units that do not yet have a recorded URI, so existing behavior is preserved.
- Make the recorded URI complete by including the port in generatePodURI (the pod's container listens on computeUnitPortNumber, but the stored URI omitted it), so the value the access-control-service routes to is directly connectable.
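The resolution order proposed in the list above (recorded URI first, in-cluster address as fallback) could look roughly like the following. This is a sketch under stated assumptions rather than the actual AccessControlResource logic: the function name `host_header_for`, the `default_port` parameter for recorded URIs that still lack a port, and the tolerance for URIs with or without a scheme are all hypothetical choices made for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit


def host_header_for(recorded_uri: Optional[str], fallback: str, default_port: int) -> str:
    """Return the host:port value to hand back to the gateway as the Host header.

    recorded_uri: URI stored for the computing unit, or None/empty if absent.
    fallback:     previously constructed in-cluster address (host:port).
    default_port: assumed port for recorded URIs that carry no port.
    """
    if recorded_uri is None or not recorded_uri.strip():
        return fallback  # no recorded URI yet: keep existing behavior

    uri = recorded_uri.strip()
    # urlsplit only recognizes the host when a "//" authority marker is present.
    parts = urlsplit(uri if "://" in uri else f"//{uri}")
    try:
        host, port = parts.hostname, parts.port
    except ValueError:
        return fallback  # malformed port in the recorded URI

    if host is None:
        return fallback
    if port is None:
        port = default_port
    # Bracket IPv6 literals so the host:port form stays unambiguous.
    if ":" in host:
        host = f"[{host}]"
    return f"{host}:{port}"
```

Falling back on a missing or malformed value keeps units created before the change routable, at the cost of silently masking bad records; a real implementation may prefer to log or reject those cases.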
Operational notes / prerequisites (from the Envoy Gateway docs above): the DynamicResolver Backend is disabled by default and must be explicitly enabled with appropriate RBAC; loopback hosts (localhost, 127.0.0.1, ::1) are denied by default; and routing to out-of-cluster targets additionally requires the corresponding network egress to be allowed.
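Because the gateway denies loopback hosts by default, the access-control-service could optionally reject such targets before returning them, so the failure surfaces with a clear message instead of as a gateway-side denial. The check below is a hypothetical pre-validation sketch, not part of the proposal and not a reproduction of Envoy Gateway's own matching; note that it flags the whole 127.0.0.0/8 range, which is broader than the three hosts listed above.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True if host is 'localhost' or a loopback IP literal."""
    name = host.strip().strip("[]").lower()  # tolerate bracketed IPv6 literals
    if name == "localhost":
        return True
    try:
        return ipaddress.ip_address(name).is_loopback
    except ValueError:
        # Not an IP literal (e.g. an FQDN): not treated as loopback here.
        return False
```

A hostname that merely resolves to a loopback address is not caught by this check, since it performs no DNS lookup.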
Affected Area
- Deployment / Infrastructure
Addressed by #5629.
By submitting this issue, you agree to follow the Apache Code of Conduct.