fix(access-control-service): include port in computing unit pod URI and use Envoy Gateway for distributed CUs #5629
aicam wants to merge 6 commits into
`KubernetesClient.generatePodURI` builds the in-cluster address that is stored as the computing unit's `uri` (via `setUri` in `ComputingUnitManagingResource` and returned to clients as `nodeAddresses`). The pod's container listens on `KubernetesConfig.computeUnitPortNumber` (set via `withContainerPort`), but the generated URI omitted the port, so the stored address pointed at the default port instead of the one the computing unit actually serves on. Append the configured port so the persisted URI is directly connectable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
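As a rough illustration of the fix described in this commit, the sketch below builds an in-cluster pod address and appends the port. It is not the actual `generatePodURI` implementation: `PodAddressConfig`, `PodUri.generate`, and all of the example names and values are hypothetical, and the `<pod>.<service>.<namespace>.svc.cluster.local` layout is the standard Kubernetes DNS form for a pod behind a headless service, assumed here rather than taken from the Texera code.

```scala
// Hypothetical stand-in for the relevant KubernetesConfig values.
// Field names and example values are assumptions, not Texera's configuration.
final case class PodAddressConfig(serviceName: String, namespace: String, port: Int)

object PodUri {
  // Builds "<pod>.<service>.<namespace>.svc.cluster.local:<port>".
  // Without the trailing ":<port>", a client would fall back to its
  // scheme's default port rather than the one the container listens on.
  def generate(podName: String, cfg: PodAddressConfig): String = {
    require(cfg.port > 0 && cfg.port <= 65535, s"invalid port: ${cfg.port}")
    s"$podName.${cfg.serviceName}.${cfg.namespace}.svc.cluster.local:${cfg.port}"
  }
}
```

For example, `PodUri.generate("computing-unit-7", PodAddressConfig("cu-svc", "cu-pool", 8085))` yields `computing-unit-7.cu-svc.cu-pool.svc.cluster.local:8085`, an address a client can connect to without knowing the port separately.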
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #5629 +/- ##
=========================================
Coverage 52.95% 52.95%
Complexity 2627 2627
=========================================
Files 1090 1090
Lines 42210 42217 +7
Branches 4534 4538 +4
=========================================
+ Hits 22353 22357 +4
Misses 18546 18546
- Partials 1311 1314 +3
The access-control service rebuilt the computing unit's in-cluster address from `KubernetesConfig` on every authorization request, which duplicates the address-construction logic already in `KubernetesClient.generatePodURI` and can drift from it (e.g. service name vs. pool-name conventions). Read the URI persisted for the unit (written by the managing service when the pod is created) and route to it directly, so the routing target comes from a single source of truth. Fall back to the previously constructed in-cluster address when no URI has been recorded for the unit yet. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bobbai00 left a comment
Changes LGTM. Please include a diagram showing how the frontend can connect to CUs in different locations through the Envoy Gateway.
Per review, drop the in-cluster address fallback in the access-control service. A computing unit is routed only to the URI recorded for it; if no URI has been recorded the unit is not routable, so the authorization request is refused (403) instead of falling back to a reconstructed in-cluster address. Also drops the now-unused KubernetesConfig import. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@aicam please add test cases to raise the test coverage of your changes
Add coverage for the dynamic routing logic in AccessControlResource: record a URI on the existing test computing unit and assert the rewritten Host header carries it, and add two computing units (no URI, blank URI) that the user can access but which are refused with FORBIDDEN. This also fixes the existing OK-path tests, which previously failed under the refuse-when-no-URI behavior because the test unit had no recorded URI. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
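The behavior these tests pin down can be sketched as a small pure function: route to the recorded URI when one exists, and refuse when it is missing or blank. This is a minimal sketch under assumptions, not the code in `AccessControlResource`; `RoutingDecision`, `RouteTo`, `Refuse`, and `Routing` are hypothetical names introduced only for illustration.

```scala
// Hypothetical model of the routing outcome; these types are illustrative
// and do not appear in the PR.
sealed trait RoutingDecision
final case class RouteTo(host: String) extends RoutingDecision
case object Refuse extends RoutingDecision

object Routing {
  // A unit is routable only if a non-blank URI has been recorded for it.
  // There is deliberately no fallback to a reconstructed in-cluster address.
  def decide(recordedUri: Option[String]): RoutingDecision =
    recordedUri.map(_.trim).filter(_.nonEmpty) match {
      case Some(uri) => RouteTo(uri)
      case None      => Refuse
    }

  // HTTP status the authorizer would answer with: 200 (OK) or 403 (Forbidden).
  def statusCode(decision: RoutingDecision): Int = decision match {
    case RouteTo(_) => 200
    case Refuse     => 403
  }
}
```

Wrapping a nullable database column as `Option(value)` before calling `Routing.decide` would cover the "no URI" case, and the trim-and-filter step covers the "blank URI" case, so both refused cases follow the same path.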
Added test coverage for the dynamic routing logic in
This also fixes the existing OK-path tests, which were failing under the new refuse-when-no-URI behavior because the test unit had no recorded URI. All 12 tests in the spec pass locally.
# Conflicts:
#   access-control-service/src/main/scala/org/apache/texera/service/resource/AccessControlResource.scala
What changes were proposed in this PR?
Make the in-cluster address of a computing unit come from a single source of truth — the URI recorded when its pod is created — and ensure that URI is complete (includes the port). This lets the gateway route a user to a computing unit located anywhere it can reach (in the local cluster, another cluster, or an external host), instead of being limited to a reconstructed in-cluster address. See #5630.
Two related changes:
1. Include the port in the generated pod URI (`computing-unit-managing-service`)
`KubernetesClient.generatePodURI` builds the address stored as the computing unit's `uri` (via `setUri` in `ComputingUnitManagingResource`) and returned to clients as `nodeAddresses`. The pod's container listens on `KubernetesConfig.computeUnitPortNumber` (declared with `withContainerPort(...)` in the same file), but the generated URI omitted the port, so the persisted address was not directly connectable. The port is now appended to the generated URI.
2. Route using the recorded URI (`access-control-service`)
`AccessControlResource` rebuilt the computing unit's address from `KubernetesConfig` on every authorization request, duplicating the construction logic in `generatePodURI` and pinning every CU to the local cluster. It now reads the URI recorded for the unit and returns it as the `Host` for the gateway to route to. If no URI has been recorded, the unit is not routable and the request is refused with `403` (no in-cluster fallback, per review).
Routing flow
The access-control service is the gateway's external authorizer; the `Host` it returns is the upstream Envoy forwards the (upgraded) connection to. Because that host comes from the unit's recorded URI, the same gateway can reach computing units in different locations:
flowchart LR
    FE["Frontend<br/>(/wsapi?cuid=N)"] --> GW["Envoy Gateway"]
    GW -. "ext-auth: authorize + get Host" .-> ACS["access-control-service"]
    ACS -- "read recorded uri for CU N" --> DB[("workflow_computing_unit")]
    ACS -- "Host = recorded uri<br/>(or 403 if none)" --> GW
    GW == "dynamic forward proxy<br/>to returned Host" ==> R{Where the CU lives}
    R --> CU1["In-cluster CU pod<br/>computing-unit-N...svc.cluster.local:port"]
    R --> CU2["CU in another cluster"]
    R --> CU3["External / remote CU host:port"]
Any related issues, documentation, discussions?
`access-control-service` to authorize the requests to `/wsapi` and `Computing Unit` endpoints #3598 (access-control-service as the ext-auth service for computing-unit traffic).
How was this PR tested?
On live deployment.

Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)