Philosophy
Why should AI infrastructure be managed by AI?
Private LLM stacks today usually sit in one of two corners. Simple stacks ship one binary, one engine, and defaults that work out of the box, but throughput is capped by that engine's ceiling. High-performance stacks (raw vLLM, SGLang) post good numbers on paper, but flag tuning, quantization choice, deployment wiring, and every per-vendor quirk are all on you, and each new chip means redoing that work. AIMA's bet: the operator should be an agent.
The operator should be an agent, not you
AIMA's core loop: detect the accelerator → pick an engine and config from the YAML knowledge base → deploy the model → run benchmarks → write the winning config back. This loop is driven by a built-in PDCA (plan-do-check-act) agent, Explorer. When a new chip arrives, the agent runs the tuning matrix itself, so the same single binary reaches vLLM-level throughput without exposing vLLM's flag soup to you.
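The loop above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not AIMA's implementation: the `Config` fields, the `explore` function, the in-memory `winners` dictionary standing in for the YAML knowledge base, and the throughput numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Config:
    engine: str          # e.g. "vllm" or "sglang"
    quantization: str    # illustrative tuning knob, not an AIMA field


def explore(
    accelerator: str,
    model: str,
    candidates: List[Config],
    deploy: Callable[[Config], None],
    benchmark: Callable[[Config], float],
    winners: Dict[Tuple[str, str], Config],
) -> Config:
    """Deploy and benchmark each candidate, then record the fastest one."""
    if not candidates:
        raise ValueError("no candidate configs for this accelerator")
    scored = []
    for config in candidates:
        deploy(config)                              # do
        scored.append((benchmark(config), config))  # check
    best = max(scored, key=lambda pair: pair[0])[1]
    winners[(accelerator, model)] = best            # act: write the winner back
    return best


# Stubbed run: the throughput numbers are made up for the demo.
fake_throughput = {"fp16": 900.0, "awq": 1400.0}
candidates = [Config("vllm", "fp16"), Config("vllm", "awq")]
winners = {}
deployed = []
best = explore(
    "example-gpu", "example-model", candidates,
    deploy=deployed.append,
    benchmark=lambda c: fake_throughput[c.quantization],
    winners=winners,
)
print(best.quantization)
```

Passing `deploy` and `benchmark` in as callables keeps the selection logic independent of any particular engine, which is the property that lets the same loop be rerun when a new chip shows up.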
Knowledge accumulates in a YAML knowledge base, not in a consultant's head
"The fastest way to run inference on this silicon" is transferable, versionable, shareable data — not someone's muscle memory. The first deploy is exploration: the agent plans benchmarks, deploys configs, samples throughput and TTFT, and promotes winning configs to a shared knowledge base. After that, it's a lookup: deploying a known model on familiar hardware hits the knowledge base and skips the exploration phase.
Agent-native: an MCP server that runs an agent inside
AIMA is an MCP (Model Context Protocol) server. Point any MCP-compatible runtime at AIMA's port and it has the full operational surface: hardware detection, model scan, engine selection, deployment, benchmark, fleet discovery, knowledge sync. No REST wrapper to write, no official SDK to wait for. AIMA is currently used in production as the inference backend behind OpenClaw, covering LLM, ASR, TTS, image generation, and VLM. AIMA also consumes MCP internally, so it is both server and client.
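To make "no REST wrapper to write" concrete, here is a rough sketch of the messages an MCP client sends. MCP messages are JSON-RPC 2.0, and `tools/list` and `tools/call` are methods defined by the protocol. The tool name `detect_hardware` and its empty arguments are hypothetical placeholders, since the names AIMA actually exposes are not listed here.

```python
import json
from itertools import count
from typing import Optional

_next_id = count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request, the envelope MCP messages use."""
    message = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# "tools/list" and "tools/call" are MCP methods; the tool name and
# arguments below are hypothetical placeholders.
list_tools = jsonrpc_request("tools/list")
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "detect_hardware", "arguments": {}},
)
print(list_tools)
print(call_tool)
```

A real session would first perform the protocol's `initialize` handshake and would usually go through an MCP client library rather than hand-built JSON. The sketch only shows the shape of the requests.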
Open source first
The device-side AIMA binary is open source under Apache 2.0. The install script, configuration logic, and YAML knowledge base are all transparent, auditable, and forkable. AIMA Cloud (灵机云) is built on top of this open-source core: a cloud-side agent connects to your devices and installs, diagnoses, repairs, and upgrades them remotely, with one-command support for Dify, ComfyUI, Open WebUI, and OpenClaw. It works out of the box: the connect command ships with a built-in invite code, and 10 free service runs are included.
Questions or collaboration? Email guanjiawei@approaching.ai