Universal in-browser GGUF runtime · 4 SOTA defaults · HuggingFace search · one-click install · arbitrary URL · local file
Honest stack: Amni-LLM = the Amni-Scient loader, registry, UI, and API layered on top of wllama (an MIT-licensed WASM port of llama.cpp). The point: any GGUF on HuggingFace works the day it's published; no waiting for an MLC pre-compile.
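For orientation, here is a rough sketch of that layering using wllama directly, assuming wllama's Wllama constructor, loadModelFromUrl, and createCompletion calls (confirm option names such as nPredict against the installed wllama version). The asset paths, helper name, and anything Amni-LLM-specific below are illustrative assumptions, not the actual Amni-LLM source.

// Illustrative sketch only: a thin loader over wllama, the way Amni-LLM layers on it.
import { Wllama } from '@wllama/wllama';

// Map of wllama WASM assets served with the app; exact keys depend on the wllama version.
const WASM_PATHS = {
  'single-thread/wllama.wasm': '/lib/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/lib/wllama/multi-thread/wllama.wasm',
};

async function loadGguf(ggufUrl) {
  const wllama = new Wllama(WASM_PATHS);
  // Downloads and runs the GGUF entirely in the browser; no pre-compile step.
  await wllama.loadModelFromUrl(ggufUrl);
  return wllama;
}

// Any GGUF published to HuggingFace is loadable the same day:
const model = await loadGguf('https://huggingface.co/<org>/<repo>/resolve/main/model.Q4_K_M.gguf');
const out = await model.createCompletion('Hello', { nPredict: 64 });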
Load a model
Nothing installed yet. Use HF search to add models.
Type a query and press Search. Results come from huggingface.co/api/models?library=gguf.
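As a point of reference, a minimal sketch of that lookup against the public Hub endpoint. The search and limit query parameters and the id field on each result are standard Hub API behavior; the helper name and the limit value are just illustrative.

// Query the HuggingFace Hub for GGUF models matching a search term,
// using the same public endpoint the UI calls.
async function searchGgufModels(query) {
  const url = new URL('https://huggingface.co/api/models');
  url.searchParams.set('library', 'gguf');
  url.searchParams.set('search', query);
  url.searchParams.set('limit', '20');
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HF API error: ${res.status}`);
  // Each entry carries an id like "<org>/<repo>" plus download/like counts.
  return (await res.json()).map(m => m.id);
}

// Example: searchGgufModels('qwen').then(console.log);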
Models are loaded entirely in-memory; nothing is uploaded to any server.
Idle. Pick a model and press Load.
Chat
API quick reference
import { createEngine } from '/lib/amni-llm/amni-llm.js';

// SOTA default (one of the four bundled registry entries)
const engine = await createEngine('Qwen3-4B-Q4_K_M');

// Any HuggingFace GGUF (the killer feature)
const engineFromUrl = await createEngine({
  url: 'https://huggingface.co/<org>/<repo>/resolve/main/model.Q4_K_M.gguf'
});

// Local file picked from an <input type="file">
const engineFromFile = await createEngine({ file: fileInput.files[0] });

// Chat (WebLLM-compatible request shape)
const reply = await engine.chatCompletions.create({
  messages: [{ role: 'user', content: 'Hello' }],
  temperature: 0.4
});
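Reading the reply and streaming it are not shown above. The sketch below assumes the response and streaming follow the WebLLM / OpenAI chat-completions format (choices[0].message.content, and stream: true yielding delta chunks); that is an inference from the "WebLLM-compatible" note, so verify it against the Amni-LLM source.

// Assumed response shape (OpenAI/WebLLM style), per the compatibility note above.
console.log(reply.choices[0].message.content);

// Assumed streaming form: stream: true returns an async iterable of delta chunks.
const stream = await engine.chatCompletions.create({
  messages: [{ role: 'user', content: 'Write a haiku about WASM.' }],
  temperature: 0.4,
  stream: true
});
let text = '';
for await (const chunk of stream) {
  text += chunk.choices[0]?.delta?.content ?? '';
}
console.log(text);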