Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard.
Updated Jun 20, 2026 - Go
1-Click LLM Server on Your Phone, no Termux needed! Turn your phone into an LLM server with one click.
Run any local LLM engine, auto-tuned to your GPU — polished web UI + OpenAI/Anthropic-compatible API. Point Claude Code at your own machine in one command. No Electron, no Python, offline-first.
A lightweight terminal chat interface for the llama.cpp server, written in C++, with many features and Windows/Linux support.
Lightweight proxy for LLMs
Run the Pi coding agent isolated in a Docker Sandbox microVM with a local llama-server as the inference backend
CLI wrapper for llama.cpp providing an ollama-like experience
Local LLM proxy, DevOps friendly
A robust, production-ready Python toolkit to automate the synchronization between a directory of .gguf model files and a llama-swap config.yaml
FastAPI proxy that strips volatile fields from OpenClaw requests to dramatically improve llama-server KV cache hit rates (~22× faster prompt eval)
One place to store and manage all your recipes for llama-server
Offline-First Local AI Desktop & Mobile Agent.
llama.cpp fleet manager with orchestration routing — cut AI coding tool costs by routing tool-loop turns to local GPU models and frontier APIs (Copilot, OpenRouter). GPU pinning, heterogeneous pools, browser dashboard, OpenAI-compatible API.
llama-server model endpoint manager with JIT swapping, a human-facing UI, and MCP and Pi extension support.
A production-grade Python SDK for llama-server that streamlines authentication, token rotation, observability, and PII masking—helping AI architects ship secure, traceable LLM systems with enterprise-ready guardrails.
Benchmark Gemma 4 E2B on Apple Silicon: MLX (mlx-lm) vs GGUF (llama-server), with TTFT, tokens/sec, and memory.
LlamaOrch is a simple Bash-based CLI orchestrator for the llama.cpp server.