AI Infrastructure, managed by AI
One-click deploy. Peak performance.
AIMA detects your hardware, picks the fastest engine, and deploys — a built-in agent keeps tuning, so every run gets faster.
What is AIMA?
AIMA is a single Go binary that runs AI models on your own computer or server. It auto-detects your hardware, picks the best engine and config, deploys the model, and self-tunes — no manual parameter tuning needed.
Ease or performance? AIMA says both.
| Simple stacks (one binary, one engine) | High-performance stacks (raw engines, you tune them) | AIMA | |
|---|---|---|---|
| One-line install | ✓ | — | ✓ |
| OpenAI-compatible API | ✓ | ✓ | ✓ |
| Inference backend | llama.cpp | vLLM / SGLang | vLLM · SGLang · llama.cpp (auto-picked per hardware) |
| SOTA throughput on discrete GPUs | — | DIY | ✓ |
| Multi-vendor silicon | — | DIY | NVIDIA · AMD · Ascend · Hygon DCU · Moore Threads · MetaX · Apple |
| MCP server out of the box | — | — | ✓ |
| Self-tuning loop | — | — | ✓ |
| LAN fleet / multi-node | — | DIY | ✓ |
Simple stacks trade performance for low cost. High-performance stacks trade ease for raw throughput. AIMA makes the operator an agent — ease and performance are no longer a tradeoff; "what runs fastest on this machine" accumulates in the knowledge base, not in any engineer's head.
Agent in the loop
A built-in PDCA agent (codename Explorer) keeps cycling: plan a benchmark → deploy a config → sample throughput / TTFT → promote the winner to a shared knowledge base. When a new chip arrives, the agent runs the tuning matrix itself — the more it accumulates, the faster and more precise the next deployment.
Plan
Decide the next benchmark: model, quantization, parallelism
Deploy
Deploy the engine and model with the selected config
Benchmark
Sample throughput / TTFT, compare candidate configs
Learn
Winner goes to the knowledge base — next run is faster
Validated on silicon, not on slides
NVIDIA, AMD, Huawei Ascend, Hygon DCU, Moore Threads, MetaX, Apple Silicon — all benchmarked on real hardware. CPU-only works too.
Full benchmark data on the Benchmarks page.
See the benchmarks →MCP-native
AIMA is an MCP server. Point any MCP-compatible runtime at its port and you get the full operational surface: hardware detection, model scan, engine selection, deployment, benchmark, fleet discovery, knowledge sync — no REST wrapper to write, no official SDK to wait for.
Runs in production as OpenClaw's inference backend — covering LLM, ASR, TTS, image generation, and VLM. Any other MCP-speaking runtime plugs in the same way.
Point any MCP client at AIMA's HTTP endpoint — that's the integration.
{
"mcpServers": {
"aima": { "type": "http", "url": "http://<aima-host>:6188/mcp" }
}
} Hand your device fleet to a cloud AI ops agent
Connect devices to the cloud and let an AI agent install, diagnose, repair, and upgrade them remotely — including one-command install of Dify / ComfyUI / Open WebUI / OpenClaw. Built-in invite code, works out of the box; 10 free service runs included.
Quick start
One command installs the binary — the agent takes it from there.
Get the binary
One command installs the AIMA binary. Or grab a pre-built binary from Releases, or build from source.
curl -fsSL https://raw.githubusercontent.com/Approaching-AI/AIMA/master/install.sh | sh irm https://raw.githubusercontent.com/Approaching-AI/AIMA/master/install.ps1 | iex Or grab a pre-built binary from Releases, or build from source: git clone ... && make build Downloads / Releases →
See what AIMA found / run onboarding
Prints the detected GPU/NPU, driver versions, RAM. Then run onboarding checks (read-only).
Run a model / open the API
Resolves the model, picks the engine + config for this host, pulls missing assets, deploys, waits for readiness.
Linux shared server: sudo aima init → aima deploy qwen3-4b → aima serve