
feat(access-control-service): route computing units by their recorded URI to enable out-of-cluster CU distribution #5630

@aicam


Feature Summary

Since the cluster networking was unified under a single Envoy Gateway (#4191) and the access-control-service was added as the external-authorization (ext-auth) service for computing-unit traffic (#3598), the access-control-service is the single component that decides which upstream a user's computing-unit request is routed to. On every authorized request it returns a Host header that the gateway uses as the routing target, and Envoy forwards the upgraded connection there.
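The contract described above can be modeled in a few lines. Below is a minimal Java sketch. It assumes the HTTP flavor of ext-auth, in which Envoy treats a 200 response as "allow" and copies selected response headers onto the upstream request. It also assumes the Host header reaches the gateway as one of those forwarded headers. ExtAuthDecision is a hypothetical type, and using 403 for denials is an illustrative choice, not necessarily what the access-control-service returns:

```java
import java.util.Map;

// Hypothetical model of the ext-auth decision: 200 allows the request and
// carries the routing target in Host; any other status denies it.
final class ExtAuthDecision {
    final int status;
    final Map<String, String> headers;

    private ExtAuthDecision(int status, Map<String, String> headers) {
        this.status = status;
        this.headers = headers;
    }

    // Allow the request and name the upstream via the Host header.
    static ExtAuthDecision allow(String routingTarget) {
        return new ExtAuthDecision(200, Map.of("Host", routingTarget));
    }

    // Deny the request; no routing target is returned.
    static ExtAuthDecision deny() {
        return new ExtAuthDecision(403, Map.of());
    }
}
```

Whatever string is passed to allow() becomes the upstream. This is why the rest of this issue concentrates on where that string comes from.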

Today the service does not route to a recorded URI. Instead, it reconstructs the target from KubernetesConfig (pool name, namespace, port) using the in-cluster Kubernetes DNS convention:

computing-unit-<cuid>.<pool>-svc.<namespace>.svc.cluster.local:<port>
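This reconstruction amounts to a fixed string template. Below is a minimal Java sketch of it. It assumes the unit ID is an integer and that the three KubernetesConfig values are passed in directly. InClusterAddress and build are hypothetical names, not Texera's code:

```java
// Hypothetical sketch of the address the access-control-service currently
// rebuilds from KubernetesConfig; apart from the unit ID, every input is
// static cluster configuration.
final class InClusterAddress {
    private InClusterAddress() {}

    // computing-unit-<cuid>.<pool>-svc.<namespace>.svc.cluster.local:<port>
    static String build(int cuid, String poolName, String namespace, int port) {
        return "computing-unit-" + cuid + "." + poolName + "-svc."
            + namespace + ".svc.cluster.local:" + port;
    }
}
```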

This hard-wires every computing unit to the local cluster under a fixed naming convention, and duplicates the address-construction logic that already lives in KubernetesClient.generatePodURI. It makes it impossible to distribute computing units to arbitrary locations — a CU running on a remote node, in a different cluster, or on a host outside the cluster.

To enable CU distribution, the access-control-service should accept and route to any URI recorded for a computing unit, instead of assuming the in-cluster address.

Proposed Solution or Design

Why this is possible with Envoy Gateway. Envoy Gateway's ext-auth SecurityPolicy lets an external service authorize each request and contribute headers; in Texera's setup the access-control-service also supplies the upstream Host. Envoy's dynamic forward proxy — a Backend of type DynamicResolver — then resolves and forwards to an arbitrary host:port (FQDN or IP) determined at request time from that header. In other words, the routing target is whatever the access-control-service returns; it does not have to be an in-cluster *.svc.cluster.local address. (Refs: Envoy Gateway External Authorization and Backend Routing / Dynamic Resolver.)

Proposed change.

  1. In AccessControlResource, resolve the routing target from the URI persisted for the computing unit (the workflow_computing_unit row, written by the managing service via KubernetesClient.generatePodURI) — a single source of truth — instead of reconstructing it from KubernetesConfig.
  2. Keep the previously constructed in-cluster address as a fallback for units that do not yet have a recorded URI, so existing behavior is preserved.
  3. Make the recorded URI complete by including the port in generatePodURI. The pod's container listens on computeUnitPortNumber, but the stored URI currently omits it; with the port included, the value the access-control-service routes to is directly connectable.
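The three steps can be sketched in Java as follows. The sketch assumes that the recorded URI is read from the workflow_computing_unit row as an optional host:port string and that a unit ID is an int. CuRouting, inClusterHost, podUri, and routingTarget are hypothetical names. The real logic lives in AccessControlResource and KubernetesClient.generatePodURI:

```java
import java.util.Optional;

// Hypothetical sketch of the proposed routing; names and signatures are illustrative only.
final class CuRouting {
    private CuRouting() {}

    // In-cluster Service DNS name for a unit, without the port.
    static String inClusterHost(int cuid, String poolName, String namespace) {
        return "computing-unit-" + cuid + "." + poolName + "-svc."
            + namespace + ".svc.cluster.local";
    }

    // Step 3: the recorded URI carries the container port, so it is directly connectable.
    static String podUri(int cuid, String poolName, String namespace, int port) {
        return inClusterHost(cuid, poolName, namespace) + ":" + port;
    }

    // Steps 1-2: route to the URI recorded for the unit when there is a non-blank one;
    // otherwise fall back to the previously constructed in-cluster address.
    static String routingTarget(Optional<String> recordedUri, int cuid,
                                String poolName, String namespace, int port) {
        return recordedUri
            .map(String::trim)
            .filter(uri -> !uri.isEmpty())
            .orElseGet(() -> podUri(cuid, poolName, namespace, port));
    }
}
```

In this sketch, podUri and the fallback share one construction. As a result, a unit whose URI was written by the updated generatePodURI routes to the same address it would have received before. A unit recorded with some other host:port is routed there instead.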

Operational notes / prerequisites (from the Envoy Gateway docs above): the DynamicResolver Backend is disabled by default and must be explicitly enabled with appropriate RBAC; loopback hosts (localhost, 127.0.0.1, ::1) are denied by default; and routing to out-of-cluster targets additionally requires the corresponding network egress to be allowed.
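The loopback restriction can be anticipated before a URI is recorded. Below is a minimal Java sketch. It assumes the stored value is an authority (host or host:port, with IPv6 literals in brackets). LoopbackCheck and its methods are hypothetical. They match only the three hosts named above and do not reproduce Envoy Gateway's actual matching logic:

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical pre-flight check: does a routing target name one of the loopback
// hosts that the DynamicResolver Backend denies by default?
final class LoopbackCheck {
    // Only the three hosts listed in the note; other 127.0.0.0/8 addresses
    // are deliberately not covered by this sketch.
    private static final Set<String> LOOPBACK_HOSTS = Set.of("localhost", "127.0.0.1", "::1");

    private LoopbackCheck() {}

    // Extracts the host from "host", "host:port", "[v6]" or "[v6]:port".
    static String hostOf(String authority) {
        String a = authority.trim();
        if (a.startsWith("[")) {
            int end = a.indexOf(']');
            return end > 0 ? a.substring(1, end) : a;
        }
        int colon = a.indexOf(':');
        // Exactly one colon means host:port; zero or several means a bare host or IPv6 literal.
        if (colon >= 0 && colon == a.lastIndexOf(':')) {
            return a.substring(0, colon);
        }
        return a;
    }

    static boolean isDeniedLoopback(String authority) {
        return LOOPBACK_HOSTS.contains(hostOf(authority).toLowerCase(Locale.ROOT));
    }
}
```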

Affected Area

  • Deployment / Infrastructure

Addressed by #5629.

