Universal in-browser GGUF runtime · 4 SOTA defaults · HuggingFace search · one-click install · arbitrary URL · local file
Honest stack: Amni-LLM = the Amni-Scient loader, registry, UI, and API layered on top of wllama (an MIT-licensed WASM port of llama.cpp). The point: any GGUF on HuggingFace works the day it's published; no waiting for an MLC pre-compile.
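For orientation, here is a rough sketch of that layering using wllama directly, assuming wllama's Wllama constructor, loadModelFromUrl, and createCompletion calls (confirm option names such as nPredict against the installed wllama version). The asset paths, helper name, and anything Amni-LLM-specific below are illustrative assumptions, not the actual Amni-LLM source.

// Illustrative sketch only: a thin loader over wllama, the way Amni-LLM layers on it.
import { Wllama } from '@wllama/wllama';

// Map of wllama WASM assets served with the app; exact keys depend on the wllama version.
const WASM_PATHS = {
  'single-thread/wllama.wasm': '/lib/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/lib/wllama/multi-thread/wllama.wasm',
};

async function loadGguf(ggufUrl) {
  const wllama = new Wllama(WASM_PATHS);
  // Downloads and runs the GGUF entirely in the browser; no pre-compile step.
  await wllama.loadModelFromUrl(ggufUrl);
  return wllama;
}

// Any GGUF published to HuggingFace is loadable the same day:
const model = await loadGguf('https://huggingface.co/<org>/<repo>/resolve/main/model.Q4_K_M.gguf');
const out = await model.createCompletion('Hello', { nPredict: 64 });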
Load a model
Nothing installed yet. Use HF search to add models.
Type a query and press Search. Results come from huggingface.co/api/models?library=gguf.
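As a point of reference, a minimal sketch of that lookup against the public Hub endpoint. The search and limit query parameters and the id field on each result are standard Hub API behavior; the helper name and the limit value are just illustrative.

// Query the HuggingFace Hub for GGUF models matching a search term,
// using the same public endpoint the UI calls.
async function searchGgufModels(query) {
  const url = new URL('https://huggingface.co/api/models');
  url.searchParams.set('library', 'gguf');
  url.searchParams.set('search', query);
  url.searchParams.set('limit', '20');
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HF API error: ${res.status}`);
  // Each entry carries an id like "<org>/<repo>" plus download/like counts.
  return (await res.json()).map(m => m.id);
}

// Example: searchGgufModels('qwen').then(console.log);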
Models are loaded entirely in-memory; nothing is uploaded to any server.
Idle. Pick a model and press Load.
Chat
API quick reference
import { createEngine } from '/lib/amni-llm/amni-llm.js';

// SOTA default (one of the four bundled registry entries)
const engine = await createEngine('Qwen3-4B-Q4_K_M');

// Any HuggingFace GGUF (the killer feature)
const engineFromUrl = await createEngine({
  url: 'https://huggingface.co/<org>/<repo>/resolve/main/model.Q4_K_M.gguf'
});

// Local file picked from an <input type="file">
const engineFromFile = await createEngine({ file: fileInput.files[0] });

// Chat (WebLLM-compatible request shape)
const reply = await engine.chatCompletions.create({
  messages: [{ role: 'user', content: 'Hello' }],
  temperature: 0.4
});
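Reading the reply and streaming it are not shown above. The sketch below assumes the response and streaming follow the WebLLM / OpenAI chat-completions format (choices[0].message.content, and stream: true yielding delta chunks); that is an inference from the "WebLLM-compatible" note, so verify it against the Amni-LLM source.

// Assumed response shape (OpenAI/WebLLM style), per the compatibility note above.
console.log(reply.choices[0].message.content);

// Assumed streaming form: stream: true returns an async iterable of delta chunks.
const stream = await engine.chatCompletions.create({
  messages: [{ role: 'user', content: 'Write a haiku about WASM.' }],
  temperature: 0.4,
  stream: true
});
let text = '';
for await (const chunk of stream) {
  text += chunk.choices[0]?.delta?.content ?? '';
}
console.log(text);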