Skip to content

Add unstable.sandbox#128

Open
scotttrinh wants to merge 34 commits into
refactor-internal-httpfrom
unstable/persistent-sandbox
Open

Add unstable.sandbox#128
scotttrinh wants to merge 34 commits into
refactor-internal-httpfrom
unstable/persistent-sandbox

Conversation

@scotttrinh

@scotttrinh scotttrinh commented May 28, 2026

Copy link
Copy Markdown
Collaborator

Introduces a new unstable package at vercel.unstable (and internals at vercel._internal.unstable) that currently exposes the latest v2 persistent Sandbox.

Looks like this (excerpt from included example):

async def review_code(files: list[tuple[str, str]], review_agent: str) -> str:
    async with sandbox.create_sandbox(
        runtime="python3.13",
        execution_time_limit=timedelta(minutes=1),
    ) as box:
        await box.fs.mkdir("workspace")
        async with box.fs.batch() as batch:
            for path, content in files:
                batch.write_text(path, content)
            batch.write_text("workspace/review_agent.py", review_agent)

        await box.run_process(
            "python",
            ["workspace/review_agent.py", "workspace"],
            kill_after=timedelta(seconds=30),
            check=True,
        )
        return await box.fs.read_text("workspace/review.md")

scotttrinh added 13 commits May 27, 2026 15:49
This doesn't need to be public and we certainly don't need to make a
fake private method that calls this public method.
Slightly better factoring here with two low-level helper methods that
vary only in streaming vs non-streaming, and then a small JSON-specific
layer atop that for the endpoints that need it.
Also moves base URL ownership to the services since we might eventually
need to support a single service making requests to different origins
for common multi-origin flows like OIDC.
This system was a little bit over-designed. We don't really care about
avoiding use-after-free at the resource level, since that will error as
expected from the resource server end. There are also plenty of cases
where you'll hit these errors (like if a sandbox exceeds it's maximum
execution time limit) so instead of having local state and server-owned
state for whether some operation on a resource is valid, we let the
server own that.

Now that we don't have multiple tokens floating around, a simple boolean
on the session is enough to track the session lifetime part of this.
Simpler to reason about with fewer moving parts.
Prioritize live tests covering a lot of ground over small focused unit
tests that use mocks. There is still some coverage overlap between live
and example based tests, but I think that's acceptable for now. We can
switch to running example tests less frequently if it starts to be
expensive.
We have some complex rules that affect filtering and sorting, so instead
of presenting the raw API knobs, this exposes only the legal querying
knobs. This is going to be higher maintenance, but it seems worth it for
a really nice DX.
@vercel

vercel Bot commented May 28, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
vercel-py Ready Ready Preview Jun 11, 2026 2:54pm

Request Review

- Add caching
- Improve typing of the log stream
- Maintain ordering when getting "both" streams
- Raise on in-band error
The old implementation was leaky. Instead of it being an async-shaped
iter_coroutine driven business logic module, it had some awkward binding
machinery and very loose types. We are striving for only exposing the
sync/async split at the very top and very bottom of the stack, so this
does a bunch of refactoring to achieve a cleaner separation.

Before
Async facade -> SandboxService -> SandboxApiClient -> AsyncTransport
Sync facade -> iter_coroutine(SandboxService) -> SandboxApiClient -> SyncTransport
                          |
                          -> SandboxService constructs async/sync public handles

After
Async facade -> AsyncSandboxClient -> SandboxService -> SandboxApiClient -> AsyncTransport
Sync facade  -> SyncSandboxClient  -> SandboxService -> SandboxApiClient -> SyncTransport
                  |                      |
                  -> public handles      -> neutral frozen state only

`iter_coroutine` now exists only in `SyncSandboxClient`; handle binding
and stream consumption live in the runtime-specific clients.
The session still holds the service options, transport, sleep functions,
etc, but each separate domain service will own the service constructor.
That means that the session itself will not know much about the
underlying domains so that can grow without introducing a centralized
registry that knows about all of the different services.
- `Sandbox.fs.write_bytes`/`Sandbox.fs.write_text`/`Sandbox.fs.batch`
for file operations instead of the old `write_files` method.
- `SandboxCommand` -> `Process`. Modeled on stdlib Popen and
asyncio.subprocess classes and concepts, but both async and sync
runtimes have a unified set of methods and data types. Not 100%
compatible, but we've gone for it being as close as possible to the
existing interfaces and affordances as possible without backend changes.
@socket-security

socket-security Bot commented Jun 5, 2026

Copy link
Copy Markdown

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedpypi/​trio@​0.33.073100100100100

View full report

We don't need the whole Filesystem abstraction, just a callback that
should run when the batch scope exits.

@1st1 1st1 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments here and there, let's have a quick iteration and then merge

collect_output=self._collect_output,
)

async def listdir(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list_dir to align with other methods? Or this is inspired by shutil?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is the name from os but I also don't love it.

lambda: service.process_logs_response(session_id=self._session_id, process_id=self.id)
)

async def refresh(self) -> Self:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add docstrings to all public API methods? What's the purpose of refresh?

@scotttrinh scotttrinh Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, there is too much info locked up in the readme that needs to move into docstrings. Will fix.

To the actual question: the purpose of refresh is to get the latest state from the server, typically for updating things like the status and the current_session_id.

kill_after: float | timedelta | None = None,
check: bool = False,
stdout: TextIO | int | None = None,
stderr: TextIO | int | None = None,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a problem -- sometimes process output isn't decodable, especially if caught between utf opcodes or something.

Does the underlying vercel sandbox API operate on text or bytes?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, subprocess APIs are typically better with BytesIO for this reason. Users basically can decode on their own (potentially setting the codec to ignore decoding/encoding errors for this reason)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underlying backend API is text-only via NDJSON lines, so yeah, it can't be bytes unless we just re-encode it into bytes. We can talk to the backend team about having something more low-level if they expose bytes, but as-is, we have text.

This seems like it's worth taking a pause on while we figure out what we want to do here. We can expose a text and byte interface where bytes are actually just re-encoded text for now and then switch that to some kind of mode on the backend side that sends buffers directly as number[]?

self._apply_payload(payload)
return self

async def extend_execution_time_limit(self, duration: DurationInput) -> Self:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is weird, just stating for the record.

I assume we do want to let the sandbox manage its timeout via the API for.. reasons, right?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a pretty well-used API in a lot of integrations where it is basically like an active keep-alive mechanism. Set the max execution time to something reasonable, and if some event happens before than, extend it again.

return self

async def update_network_policy(self, network_policy: JSONValue) -> Self:
payload = await self._service.update_runtime_session_network_policy(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's JSONValue here? I'd rather work with well-defined types than dicts

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, this is a vestige of how I built out the subresources, I missed this one. It's actually a pretty rich type, so I'll add the proper class structure here.

)

def batch(self, *, cwd: RemotePath | None = None) -> "SandboxFilesystemBatch":
return SandboxFilesystemBatch(write_files=lambda files: self._write_files(files, cwd=cwd))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to .batch_io()?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now it's write-only. The original API was write_files(files: Sequence(WriteFile)) and this is just a different API over that same underlying function. @elprans can you weigh in here on this design since it's mostly from your feedback? Or maybe a simple rename to batch_write?

Maybe eventually we'll have a combined read-write batch endpoint, but I'm not sure if we had that, how we'd even expose the read results back given this design?

@scotttrinh

Copy link
Copy Markdown
Collaborator Author

Wanted to make a note here that we've decided to remove the logs() command from the process class for now. It has some utility, but it feels like a misfit compared to the other ways of consuming the stdio streams, so we're going to wait until we have a better understanding of the use case and see if there is a more natural API for it.

Removes the `logs()` method in favor of Popen-style `stdout`/`stderr`
streams.
@scotttrinh

Copy link
Copy Markdown
Collaborator Author

One more bit of feedback to record here. It doesn't block merging this PR since we can fast follow, but blocks the promotion of this unstable to the main namespace:

Follow-up from the sandbox/session API discussion with @elprans

The backend model is:

  • A sandbox is the durable identity, configuration, snapshots, and session history. Omitting name generates one.
  • A session is one billable execution. Creation always starts an initial session, and a sandbox has at most one active session.
  • A snapshot preserves filesystem state between sessions and is used to resume a stopped sandbox.

The current Sandbox.session() API exposes sessions too prominently and suggests users can create independent active sessions. We agreed to keep normal lifecycle management at the sandbox level instead:

  • get_sandbox() only fetches state and never resumes.
  • resume_sandbox() ensures the sandbox has an active session.
  • Awaiting create_sandbox() or resume_sandbox() performs no automatic cleanup.
  • Both can be used as context managers, which always stop the active session on exit.
  • A create_sandbox() context additionally destroys the sandbox by default.
  • create_sandbox(..., destroy=False) stops the session but preserves the sandbox, snapshots, and history.
  • Sandbox.stop() explicitly stops the active session.
  • Session objects remain available for history, billing, diagnostics, and advanced inspection, but are not required for normal workflows.

Example:

# Stops the session and destroys the sandbox on exit.
async with sandbox.create_sandbox(runtime="python3.13") as box:
    ...

# Stops the session but preserves the sandbox on exit.
async with sandbox.create_sandbox(name="development", destroy=False) as box:
    ...

# Fetches state without resuming or cleanup.
box = await sandbox.get_sandbox(name="development")

# Ensures an active session and stops it on exit.
async with sandbox.resume_sandbox(name="development") as box:
    ...

# Ensures an active session without automatic cleanup.
box = await sandbox.resume_sandbox(name="development")
await box.stop()

Nested or concurrent resume_sandbox() scopes may share the same active session; either scope exiting can stop it. We will document this and correctly surface stopped-session errors rather than adding runtime ownership or lease tracking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants