Open source · Apache 2.0 · MCP-native

AI Infrastructure, managed by AI

One-click deploy. Peak performance.

AIMA detects your hardware, picks the fastest engine, and deploys — a built-in agent keeps tuning, so every run gets faster.

AIMA Deploy detect · query KB · optimal config · deploy · ready
live
1
Detect hardware AMD Radeon 8060S
2
Query knowledge base 23 on-silicon records matched
3
Optimal config vLLM · FP8 42.3 tok/s
4
Deploy model Qwen3.6-35B
5
API ready localhost:6188

What is AIMA?

AIMA is a single Go binary that runs AI models on your own computer or server. It auto-detects your hardware, picks the best engine and config, deploys the model, and self-tunes — no manual parameter tuning needed.

Ease or performance? AIMA says both.

  Simple stacks (one binary, one engine) High-performance stacks (raw engines, you tune them) AIMA
One-line install
OpenAI-compatible API
Inference backend llama.cpp vLLM / SGLang vLLM · SGLang · llama.cpp (auto-picked per hardware)
SOTA throughput on discrete GPUs DIY
Multi-vendor silicon DIY NVIDIA · AMD · Ascend · Hygon DCU · Moore Threads · MetaX · Apple
MCP server out of the box
Self-tuning loop
LAN fleet / multi-node DIY

Simple stacks trade performance for low cost. High-performance stacks trade ease for raw throughput. AIMA makes the operator an agent — ease and performance are no longer a tradeoff; "what runs fastest on this machine" accumulates in the knowledge base, not in any engineer's head.

Agent in the loop

A built-in PDCA agent (codename Explorer) keeps cycling: plan a benchmark → deploy a config → sample throughput / TTFT → promote the winner to a shared knowledge base. When a new chip arrives, the agent runs the tuning matrix itself — the more it accumulates, the faster and more precise the next deployment.

01

Plan

Decide the next benchmark: model, quantization, parallelism

02

Deploy

Deploy the engine and model with the selected config

03

Benchmark

Sample throughput / TTFT, compare candidate configs

04

Learn

Winner goes to the knowledge base — next run is faster

↻ repeats — back to step 1

Validated on silicon, not on slides

NVIDIA, AMD, Huawei Ascend, Hygon DCU, Moore Threads, MetaX, Apple Silicon — all benchmarked on real hardware. CPU-only works too.

NVIDIA AMD Huawei Ascend Hygon DCU Moore Threads MetaX Apple Silicon CPU-only

Full benchmark data on the Benchmarks page.

See the benchmarks →

MCP-native

AIMA is an MCP server. Point any MCP-compatible runtime at its port and you get the full operational surface: hardware detection, model scan, engine selection, deployment, benchmark, fleet discovery, knowledge sync — no REST wrapper to write, no official SDK to wait for.

Runs in production as OpenClaw's inference backend — covering LLM, ASR, TTS, image generation, and VLM. Any other MCP-speaking runtime plugs in the same way.

Point any MCP client at AIMA's HTTP endpoint — that's the integration.

mcp config
{
  "mcpServers": {
    "aima": { "type": "http", "url": "http://<aima-host>:6188/mcp" }
  }
}
AIMA Cloud · Works out of the box · 10 free runs included

Hand your device fleet to a cloud AI ops agent

Connect devices to the cloud and let an AI agent install, diagnose, repair, and upgrade them remotely — including one-command install of Dify / ComfyUI / Open WebUI / OpenClaw. Built-in invite code, works out of the box; 10 free service runs included.

Quick start

One command installs the binary — the agent takes it from there.

1

Get the binary

One command installs the AIMA binary. Or grab a pre-built binary from Releases, or build from source.

# macOS / Linux
Terminal · AIMA
curl -fsSL https://raw.githubusercontent.com/Approaching-AI/AIMA/master/install.sh | sh
# Windows (PowerShell)
PowerShell
irm https://raw.githubusercontent.com/Approaching-AI/AIMA/master/install.ps1 | iex

Or grab a pre-built binary from Releases, or build from source: git clone ... && make build Downloads / Releases →

2

See what AIMA found / run onboarding

Prints the detected GPU/NPU, driver versions, RAM. Then run onboarding checks (read-only).

Terminal · AIMA
# see what AIMA detected
aima hal detect
# run onboarding checks (read-only — no services installed, no model deployed)
aima onboarding
3

Run a model / open the API

Resolves the model, picks the engine + config for this host, pulls missing assets, deploys, waits for readiness.

Terminal · AIMA
# resolve model, pick engine, pull assets, deploy, wait for readiness
aima run qwen3-4b
# keep the OpenAI-compatible API and Web UI open in this terminal (http://localhost:6188)
aima serve

Linux shared server: sudo aima init → aima deploy qwen3-4b → aima serve