ModelMUX API + SDK examples
Snappy snippets to route, sync, and ship in minutes.
Quickstart
Choose either SDK-first (recommended) or direct API calls to the gateway.
SDK (Python)
Use one client config file plus one local vault file.
What it does: reads routing settings from client_config.json and resolves provider + key using modelmux_vault.json.
import json
from pathlib import Path

from modelmux_sdk import ModelMuxClient

cfg = json.loads(Path("./client_config.json").read_text(encoding="utf-8"))

client = ModelMuxClient(
    vault_path=cfg["vault_path"],
    portal_base_url=cfg["base_url"],
    portal_api_key=cfg["sdk_key"],
    mode="managed",
    myconfig_uuid=cfg.get("myconfig_uuid", ""),
    managed_token_names=[row["name"] for row in cfg.get("tokens", []) if row.get("name")],
)

resolved = client.call(
    "SUPPORT_BOT",
    {"messages": [{"role": "user", "content": "Hello"}]},
)
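The snippet above indexes cfg["vault_path"], cfg["base_url"], and cfg["sdk_key"] directly, so a missing field surfaces only as a bare KeyError. Below is a minimal sketch of an up-front check, assuming those three fields are the required ones. The helper names validate_client_config and managed_token_names are hypothetical and are not part of modelmux_sdk.

```python
# Assumption: these are the fields the quickstart reads without a default.
REQUIRED_KEYS = ("vault_path", "base_url", "sdk_key")


def validate_client_config(cfg):
    """Return a list of human-readable problems found in a parsed config dict."""
    problems = []
    for key in REQUIRED_KEYS:
        value = cfg.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")
    for index, row in enumerate(cfg.get("tokens", [])):
        if not row.get("name"):
            problems.append(f"tokens[{index}] has no name")
    return problems


def managed_token_names(cfg):
    """Same filter as the quickstart's list comprehension: names of named tokens."""
    return [row["name"] for row in cfg.get("tokens", []) if row.get("name")]
```

Calling validate_client_config(cfg) before constructing the client lets a script fail with a readable message instead of a traceback.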
Gateway API (curl)
POST to the gateway with your SDK key and routing rules.
What it does: server-side routing + provider call in one request. Uses your vault on the server.
Fastest test
curl -X POST https://your-portal.com/portal/api/gateway/ \
  -H "Authorization: Bearer mmux_..." \
  -H "Content-Type: application/json" \
  -d '{
    "routing": {"max_cost_per_1k_input": 0.006},
    "messages": [{"role":"user","content":"Hello"}]
  }'
SDK Interfaces
Preferred integration patterns for client-side orchestration.
SDK Wrapper Client (Integration)
Drop-in replacement for model SDKs with automatic telemetry and multi-provider support.
const client = new ModelMuxClient({
  vault_path: "./modelmux_vault.json",
  portal_base_url: "https://your-portal.com",
  portal_api_key: "mmux_...",
  token_counting: true,
  local_cache: { enabled: true, ttl_sec: 300 }
});
Single SDK API (Meta Interface)
Client-side orchestration API that accepts meta model input and returns meta model output.
const output = await client.run({
  routing_key: "SUPPORT_BOT",
  messages: [{ role: "user", content: "Hello" }],
  controls: { max_tokens: 256, temperature: 0.7 }
});
Multi-Model Sample (Gemini + Llama)
Example two-file setup for two models: one vault file for secrets and one client config file for routing + client settings.
Local vault
Store provider keys locally. Each alias (for example "primary") must match an entry in the key_aliases list of a token in your client config.
Vault
{
  "providers": {
    "gemini": {
      "primary": "gm-your-gemini-key"
    },
    "llama": {
      "primary": "hf-or-hosted-llama-key"
    }
  }
}
client_config.json
One config file contains base URL, SDK key, routing plan, and token definitions.
Config
{
  "base_url": "http://127.0.0.1:8000",
  "sdk_key": "mmux_...",
  "myconfig_uuid": "your-config-uuid",
  "vault_path": "modelmux_vault.json",
  "routing_plan": {
    "id": "plan_a",
    "name": "Primary Plan",
    "routing_mode": "round_robin"
  },
  "tokens": [
    {
      "name": "GEMINI_CHAT",
      "provider": "google",
      "model": "gemini-1.5-pro",
      "key_aliases": ["primary"],
      "cooldown_seconds": 300,
      "is_active": true
    },
    {
      "name": "LLAMA_CHAT",
      "provider": "meta",
      "model": "llama-3.1-8b-instruct",
      "key_aliases": ["primary"],
      "cooldown_seconds": 300,
      "is_active": true
    }
  ]
}
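The config above combines routing_mode "round_robin" with per-token cooldown_seconds and is_active flags, but it does not say how they interact. The sketch below shows one plausible reading, written as a standalone illustration and not as ModelMUX's actual routing logic: inactive tokens are never picked, tokens are tried in order, and a token that was marked as failed is skipped until its cooldown expires. The class name RoundRobinPicker, the mark_failed trigger, and the injectable clock are all assumptions made for this example.

```python
import time


class RoundRobinPicker:
    """Cycle through active tokens, skipping any that are cooling down."""

    def __init__(self, tokens, clock=time.monotonic):
        # Assumption: is_active=False removes a token from rotation entirely.
        self._tokens = [t for t in tokens if t.get("is_active", True)]
        self._clock = clock
        self._cooldown_until = {}  # token name -> time when it is usable again
        self._next = 0

    def mark_failed(self, name):
        """Assumption: a failed call puts the token on cooldown."""
        token = next(t for t in self._tokens if t["name"] == name)
        self._cooldown_until[name] = self._clock() + token.get("cooldown_seconds", 0)

    def pick(self):
        """Return the next usable token name, or None if all are cooling down."""
        now = self._clock()
        for offset in range(len(self._tokens)):
            idx = (self._next + offset) % len(self._tokens)
            token = self._tokens[idx]
            if self._cooldown_until.get(token["name"], float("-inf")) <= now:
                self._next = (idx + 1) % len(self._tokens)
                return token["name"]
        return None
```

With the two tokens above, successive pick() calls alternate between GEMINI_CHAT and LLAMA_CHAT. After mark_failed("LLAMA_CHAT"), only GEMINI_CHAT is returned for the next 300 seconds.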
Single SDK call (meta interface)
What it does: loads the single client config file, uses the vault file for keys, and calls one of the configured tokens.
import json
from pathlib import Path

from modelmux_sdk import ModelMuxClient

cfg = json.loads(Path("./client_config.json").read_text(encoding="utf-8"))

client = ModelMuxClient(
    vault_path=cfg["vault_path"],
    portal_base_url=cfg["base_url"],
    portal_api_key=cfg["sdk_key"],
    mode="managed",
    myconfig_uuid=cfg.get("myconfig_uuid", ""),
    managed_token_names=[row["name"] for row in cfg.get("tokens", []) if row.get("name")],
)

result = client.call("GEMINI_CHAT", {
    "messages": [{"role": "user", "content": "Hello from Gemini"}]
})
Standalone Scripts (3 Models)
Three standalone examples (Python, Node.js, curl) using three routing keys.
Models used
Gemini, Llama 3.1 8B Instruct, and GPT‑4o mini as three separate routing keys.
Python (requests)
import requests

BASE_URL = "https://your-portal.com"
SDK_KEY = "mmux_..."


def call_model(routing_key, prompt):
    url = f"{BASE_URL}/portal/api/gateway/"
    payload = {
        "routing": {"preferred_tokens": [routing_key]},
        "messages": [{"role": "user", "content": prompt}]
    }
    headers = {
        "Authorization": f"Bearer {SDK_KEY}",
        "Content-Type": "application/json"
    }
    resp = requests.post(url, headers=headers, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()


print(call_model("GEMINI_CHAT", "Hello from Gemini"))
print(call_model("LLAMA_CHAT", "Hello from Llama 3.1"))
print(call_model("OPENAI_CHAT", "Hello from GPT-4o mini"))
Node.js (fetch)
// Requires Node.js 18+ for native fetch. Top-level await works only in an
// ES module (a .mjs file or "type": "module" in package.json).
const BASE_URL = "https://your-portal.com";
const SDK_KEY = "mmux_...";

async function callModel(routingKey, prompt) {
  const url = `${BASE_URL}/portal/api/gateway/`;
  const payload = {
    routing: { preferred_tokens: [routingKey] },
    messages: [{ role: "user", content: prompt }]
  };
  const resp = await fetch(url, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${SDK_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify(payload)
  });
  if (!resp.ok) throw new Error(`${resp.status} ${await resp.text()}`);
  return await resp.json();
}

console.log(await callModel("GEMINI_CHAT", "Hello from Gemini"));
console.log(await callModel("LLAMA_CHAT", "Hello from Llama 3.1"));
console.log(await callModel("OPENAI_CHAT", "Hello from GPT-4o mini"));
curl
BASE_URL="https://your-portal.com"
SDK_KEY="mmux_..."

curl -X POST "$BASE_URL/portal/api/gateway/" \
  -H "Authorization: Bearer $SDK_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "routing": {"preferred_tokens": ["GEMINI_CHAT"]},
    "messages": [{"role":"user","content":"Hello from Gemini"}]
  }'

curl -X POST "$BASE_URL/portal/api/gateway/" \
  -H "Authorization: Bearer $SDK_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "routing": {"preferred_tokens": ["LLAMA_CHAT"]},
    "messages": [{"role":"user","content":"Hello from Llama 3.1"}]
  }'

curl -X POST "$BASE_URL/portal/api/gateway/" \
  -H "Authorization: Bearer $SDK_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "routing": {"preferred_tokens": ["OPENAI_CHAT"]},
    "messages": [{"role":"user","content":"Hello from GPT-4o mini"}]
  }'
Sample Client Files
Copy these files for quick local testing, then replace BASE_URL and SDK_KEY with your own values. Each uses the same three routing keys.
client_sample.py
import requests

BASE_URL = "https://your-portal.com"
SDK_KEY = "mmux_..."


def call_model(routing_key, prompt):
    url = f"{BASE_URL}/portal/api/gateway/"
    payload = {
        "routing": {"preferred_tokens": [routing_key]},
        "messages": [{"role": "user", "content": prompt}]
    }
    headers = {
        "Authorization": f"Bearer {SDK_KEY}",
        "Content-Type": "application/json"
    }
    resp = requests.post(url, headers=headers, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()


print(call_model("GEMINI_CHAT", "Hello from Gemini"))
print(call_model("LLAMA_CHAT", "Hello from Llama 3.1"))
print(call_model("OPENAI_CHAT", "Hello from GPT-4o mini"))
client_sample.js
// Requires Node.js 18+ for native fetch. Top-level await works only in an
// ES module (rename to client_sample.mjs or set "type": "module" in package.json).
const BASE_URL = "https://your-portal.com";
const SDK_KEY = "mmux_...";

async function callModel(routingKey, prompt) {
  const url = `${BASE_URL}/portal/api/gateway/`;
  const payload = {
    routing: { preferred_tokens: [routingKey] },
    messages: [{ role: "user", content: prompt }]
  };
  const resp = await fetch(url, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${SDK_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify(payload)
  });
  if (!resp.ok) throw new Error(`${resp.status} ${await resp.text()}`);
  return await resp.json();
}

console.log(await callModel("GEMINI_CHAT", "Hello from Gemini"));
console.log(await callModel("LLAMA_CHAT", "Hello from Llama 3.1"));
console.log(await callModel("OPENAI_CHAT", "Hello from GPT-4o mini"));
client_sample.sh
#!/usr/bin/env bash
BASE_URL="https://your-portal.com"
SDK_KEY="mmux_..."

curl -X POST "$BASE_URL/portal/api/gateway/" \
  -H "Authorization: Bearer $SDK_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "routing": {"preferred_tokens": ["GEMINI_CHAT"]},
    "messages": [{"role":"user","content":"Hello from Gemini"}]
  }'

curl -X POST "$BASE_URL/portal/api/gateway/" \
  -H "Authorization: Bearer $SDK_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "routing": {"preferred_tokens": ["LLAMA_CHAT"]},
    "messages": [{"role":"user","content":"Hello from Llama 3.1"}]
  }'

curl -X POST "$BASE_URL/portal/api/gateway/" \
  -H "Authorization: Bearer $SDK_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "routing": {"preferred_tokens": ["OPENAI_CHAT"]},
    "messages": [{"role":"user","content":"Hello from GPT-4o mini"}]
  }'
Gateway Calls by Language
Use the same payload shape in any client. Swap in your portal base URL and SDK key.
Python (requests)
Call the gateway API with a JSON payload.
What it does: calls the gateway API and returns the routed model response.
Backend friendly
import requests

url = "https://your-portal.com/portal/api/gateway/"
headers = {
    "Authorization": "Bearer mmux_...",
    "Content-Type": "application/json"
}
payload = {
    "routing": {"max_cost_per_1k_input": 0.006},
    "messages": [{"role": "user", "content": "Hello"}]
}

resp = requests.post(url, headers=headers, json=payload, timeout=60)
print(resp.status_code, resp.json())
Node.js (fetch)
Use native fetch with a JSON body.
What it does: same gateway call using native fetch.
Frontend or server
// Top-level await works only in an ES module; native fetch needs Node.js 18+ on the server.
const url = "https://your-portal.com/portal/api/gateway/";
const payload = {
  routing: { max_cost_per_1k_input: 0.006 },
  messages: [{ role: "user", content: "Hello" }]
};

const resp = await fetch(url, {
  method: "POST",
  headers: {
    "Authorization": "Bearer mmux_...",
    "Content-Type": "application/json"
  },
  body: JSON.stringify(payload)
});
console.log(resp.status, await resp.json());
Java (HttpClient)
Send raw JSON via the Java 11+ HttpClient. The text block (""") used for the body requires Java 15 or later.
What it does: gateway call with a raw JSON body.
Enterprise stacks
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// The class name GatewayCall is illustrative; any class with a main method works.
public class GatewayCall {
    public static void main(String[] args) throws Exception {
        String url = "https://your-portal.com/portal/api/gateway/";
        String body = """
            {
              "routing": {"max_cost_per_1k_input": 0.006},
              "messages": [{"role":"user","content":"Hello"}]
            }
            """;

        HttpRequest req = HttpRequest.newBuilder(URI.create(url))
            .header("Authorization", "Bearer mmux_...")
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        // Typed as HttpResponse<String> so resp.body() is a String, not a raw Object.
        HttpResponse<String> resp = HttpClient.newHttpClient()
            .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
Go (net/http)
Minimal gateway call with standard library.
What it does: gateway call using the standard library.
Go services
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	url := "https://your-portal.com/portal/api/gateway/"
	body := []byte(`{
		"routing": {"max_cost_per_1k_input": 0.006},
		"messages": [{"role":"user","content":"Hello"}]
	}`)

	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer mmux_...")
	req.Header.Set("Content-Type", "application/json")

	// Check the error before touching resp: it is nil when the request fails.
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.StatusCode, string(raw))
}
C# (.NET HttpClient)
Send JSON with HttpClient.
What it does: gateway call with HttpClient and raw JSON.
.NET services
using System.Net.Http;
using System.Text;

var url = "https://your-portal.com/portal/api/gateway/";
var json = """
{
  "routing": {"max_cost_per_1k_input": 0.006},
  "messages": [{"role":"user","content":"Hello"}]
}
""";

using var client = new HttpClient();
using var req = new HttpRequestMessage(HttpMethod.Post, url);
req.Headers.Add("Authorization", "Bearer mmux_...");
req.Content = new StringContent(json, Encoding.UTF8, "application/json");

var resp = await client.SendAsync(req);
Console.WriteLine($"{(int)resp.StatusCode} {await resp.Content.ReadAsStringAsync()}");
PHP (cURL)
Call the gateway with PHP cURL.
What it does: gateway call with PHP cURL helpers.
Legacy apps
<?php
$url = "https://your-portal.com/portal/api/gateway/";
$payload = json_encode([
    "routing" => ["max_cost_per_1k_input" => 0.006],
    "messages" => [["role" => "user", "content" => "Hello"]]
]);

$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_HTTPHEADER => [
        "Authorization: Bearer mmux_...",
        "Content-Type: application/json"
    ],
    CURLOPT_POSTFIELDS => $payload,
    CURLOPT_RETURNTRANSFER => true
]);

$body = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
echo $status . " " . $body;
curl
Quickest way to test the gateway.
What it does: the simplest gateway call from any terminal.
CLI test
curl -X POST https://your-portal.com/portal/api/gateway/ \
  -H "Authorization: Bearer mmux_..." \
  -H "Content-Type: application/json" \
  -d '{
    "routing": {"max_cost_per_1k_input": 0.006},
    "messages": [{"role":"user","content":"Hello"}]
  }'
Local Client Files
Your local setup uses exactly two files: modelmux_vault.json for secrets and client_config.json for routing/client settings.
Vault file
Provider keys are stored locally in a vault JSON.
What it does: maps provider + alias to real API keys on your machine.
Local secrets
{
  "providers": {
    "openai": {
      "primary": "sk-your-primary-key",
      "backup": "sk-your-backup-key"
    },
    "anthropic": {
      "primary": "sk-ant-your-key"
    }
  }
}
client_config.json
Defines portal connection details, routing plan, and the token list used by the SDK.
What it does: gives the client one place to load routing and local runtime settings.
Client settings
{
  "base_url": "http://127.0.0.1:8000",
  "sdk_key": "mmux_...",
  "myconfig_uuid": "your-config-uuid",
  "vault_path": "modelmux_vault.json",
  "routing_plan": {"routing_mode": "round_robin"},
  "token_counting": {"enabled": true},
  "tokens": [
    {
      "name": "chat_service",
      "provider": "openai",
      "model": "gpt-4o-mini",
      "rotation_strategy": "round_robin",
      "key_aliases": ["primary", "backup"],
      "weights": {"primary": 3, "backup": 1},
      "cooldown_seconds": 300,
      "is_active": true
    }
  ]
}
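Because the vault maps provider + alias to a key, every alias listed under a token's key_aliases should exist under that token's provider in the vault. A small cross-check can catch a mismatch before the first request. This is a sketch that assumes the two file shapes shown above and that lookup is by the token's "provider" value. The function name missing_vault_keys is hypothetical and is not part of the SDK.

```python
def missing_vault_keys(cfg, vault):
    """Return (token name, provider, alias) triples the vault cannot resolve."""
    providers = vault.get("providers", {})
    missing = []
    for token in cfg.get("tokens", []):
        # Assumption: the vault is keyed by the token's "provider" value.
        provider_keys = providers.get(token.get("provider"), {})
        for alias in token.get("key_aliases", []):
            if not provider_keys.get(alias):
                missing.append((token.get("name"), token.get("provider"), alias))
    return missing


# Example data copied from the two files above (secrets shortened).
vault = {
    "providers": {
        "openai": {"primary": "sk-your-primary-key", "backup": "sk-your-backup-key"},
        "anthropic": {"primary": "sk-ant-your-key"},
    }
}
cfg = {
    "tokens": [
        {"name": "chat_service", "provider": "openai", "key_aliases": ["primary", "backup"]}
    ]
}
unresolved = missing_vault_keys(cfg, vault)
```

An empty list means every alias resolves. Any triple in the result names the token, provider, and alias that still needs a key in modelmux_vault.json.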
Use the two-file setup in the SDK
What it does: loads routing from client_config.json and secrets from modelmux_vault.json.
import json
from pathlib import Path

from modelmux_sdk import ModelMuxClient

cfg = json.loads(Path("./client_config.json").read_text(encoding="utf-8"))

client = ModelMuxClient(
    vault_path=cfg["vault_path"],
    portal_base_url=cfg["base_url"],
    portal_api_key=cfg["sdk_key"],
    mode="managed",
    myconfig_uuid=cfg.get("myconfig_uuid", ""),
)

resolved = client.call("chat_service", {
    "messages": [{"role": "user", "content": "Hello"}]
})
Portal APIs
Use these endpoints to automate setup. Session endpoints require a logged-in browser and CSRF token.
SDK config sync (curl)
Use your SDK key in the Authorization header.
What it does: downloads active routing keys for the SDK.
SDK auth
curl -X GET https://your-portal.com/portal/api/sdk/config/ \
  -H "Authorization: Bearer mmux_..."
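The same config sync call can be made from Python with only the standard library. The sketch below builds the request separately from sending it, so the URL and header can be inspected without network access. The function names are hypothetical, and treating the response body as JSON is an assumption, since the response format is not documented here.

```python
import json
import urllib.request


def build_config_sync_request(base_url, sdk_key):
    """Build the GET request for the SDK config endpoint without sending it."""
    url = base_url.rstrip("/") + "/portal/api/sdk/config/"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {sdk_key}"},
        method="GET",
    )


def fetch_sdk_config(base_url, sdk_key, timeout=30):
    """Send the request and parse the body (assumption: the endpoint returns JSON)."""
    req = build_config_sync_request(base_url, sdk_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, fetch_sdk_config("https://your-portal.com", "mmux_...") would issue the same request as the curl command above.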
Create routing key (browser session)
Requires a logged-in session and CSRF token.
What it does: creates a routing key row in the portal.
Portal automation
fetch("/portal/api/tokens/", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-CSRFToken": csrftoken
  },
  body: JSON.stringify({
    "name": "SUPPORT_BOT",
    "provider": "openai",
    "model_name": "gpt-4o-mini",
    "key_aliases": ["primary"],
    "rotation_strategy": "round_robin"
  })
})
Rotate SDK key (browser session)
What it does: generates a new SDK key and invalidates the old one.
fetch("/portal/api/sdk/key/rotate/", {
  method: "POST",
  headers: {"X-CSRFToken": csrftoken}
})
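Since rotation invalidates the old key, any local client_config.json still holding it stops authenticating until its sdk_key field is updated. The sketch below swaps the key in place. It assumes you already have the new key in hand (the rotate endpoint's response format is not documented here) and that the config uses the sdk_key field shown earlier. The function name replace_sdk_key is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def replace_sdk_key(config_path, new_key):
    """Rewrite the sdk_key field of a client config file and return the new config."""
    path = Path(config_path)
    cfg = json.loads(path.read_text(encoding="utf-8"))
    cfg["sdk_key"] = new_key
    # Write to a temp file in the same directory, then swap it in with
    # os.replace so a crash mid-write cannot leave a half-written config.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(cfg, handle, indent=2)
    os.replace(tmp_name, path)
    return cfg
```

All other fields, such as tokens and routing_plan, are written back unchanged.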
Routing Rules
Add routing constraints directly in the gateway payload.
{
  "routing": {
    "max_cost_per_1k_input": 0.006,
    "max_latency_ms": 900,
    "preferred_tokens": ["SUPPORT_BOT"],
    "deny_providers": ["openrouter"]
  },
  "messages": [
    {"role": "user", "content": "Summarize this report."}
  ]
}
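When constraints vary per request, it can help to assemble the payload in one place and leave out any rule that is not set. The helper below is a sketch of that idea. It uses only the four routing fields shown above, and it assumes the gateway treats an omitted field as "no constraint". The function name build_gateway_payload is hypothetical.

```python
def build_gateway_payload(prompt, *, max_cost_per_1k_input=None, max_latency_ms=None,
                          preferred_tokens=None, deny_providers=None):
    """Build a gateway request body, including only the routing rules that are set."""
    candidates = {
        "max_cost_per_1k_input": max_cost_per_1k_input,
        "max_latency_ms": max_latency_ms,
        "preferred_tokens": preferred_tokens,
        "deny_providers": deny_providers,
    }
    # Drop unset rules so the payload carries only explicit constraints.
    routing = {key: value for key, value in candidates.items() if value is not None}
    return {
        "routing": routing,
        "messages": [{"role": "user", "content": prompt}],
    }
```

Calling it with all four keyword arguments set to the values above reproduces the example payload, and the result can be passed as the JSON body in any of the gateway calls shown earlier.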