Connect apps once, then route them to local AI capacity.

The v5 gateway keeps OpenAI-shaped APIs at the edge while preparing the runtime proxy layer to target local LLM servers, GPU nodes, and edge model runtimes.
## Compatible routes

A drop-in API surface:
| Method | Route | Purpose |
|---|---|---|
| GET | /api/v1/models | Model discovery for compatible clients. |
| POST | /api/v1/chat/completions | OpenAI-style chat completions with route decision logging. |
| POST | /api/v1/embeddings | Embedding endpoint with usage metering. |
| POST | /api/router/simulate | Explainable routing decision before runtime execution. |
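
The simulate route can be called like any other JSON endpoint. A minimal sketch, assuming the request body mirrors the chat completions schema (`model` and `messages` fields) — the exact simulate payload and response shape are not specified here:

```shell
# Assumption: /api/router/simulate accepts a chat-completions-style body
# and returns a routing decision without executing the request.
curl -X POST http://localhost:3000/api/router/simulate \
  -H "Content-Type: application/json" \
  -d '{"model":"umami-swahili-small","messages":[{"role":"user","content":"ping"}]}'
```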
## Example

A chat completions request:
```shell
curl -X POST http://localhost:3000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"umami-swahili-small","messages":[{"role":"user","content":"Explain energy-aware AI edge nodes for Singapore, Tanzania, and Europe."}]}'
```