Every AI model has billions of numbers (called weights) that need to live in your GPU's memory to run.
A 7-billion-parameter model needs 14 GB of VRAM at fp16 (2 bytes per weight). A 70B model needs 140 GB. The biggest models need over 800 GB.
Most consumer GPUs have 8–24 GB, and even most datacenter GPUs top out at 80 GB. The math doesn't work.
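The arithmetic is simple enough to check yourself. A minimal back-of-envelope calculator (the function name `vram_gb` is ours, not part of any Railgun API):

```python
def vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """GB needed to hold the weights alone (ignores activations and KV cache)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(vram_gb(7, 16))   # fp16 7B model: 14.0 GB
print(vram_gb(70, 16))  # fp16 70B model: 140.0 GB
print(vram_gb(7, 1))    # 7B at 1 bit per weight: 0.875 GB
```

At 16 bits per weight, a 24 GB card can't even hold a 13B model's weights, let alone its activations.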
Today's solution? Buy more GPUs. Split the model across machines. Spend millions.
AmniTex Railgun compresses those weights using a fundamentally different approach based on abstract algebra—a branch of pure mathematics.
Instead of rounding numbers and hoping for the best, Railgun encodes weights into a mathematical structure where compression is exact.
The result: a 7B model fits in 0.88 GB. And when you need the original weights back? They come back perfectly. Not approximately. Perfectly.
Imagine a 1,000-page book. Normal compression is like summarizing each chapter—you lose details.
Railgun is like writing a formula that can regenerate any page on demand. The formula fits on an index card, but the full book is always available. No information is ever lost.
The progressive tier system means you can choose: read the chapter summaries for speed (R-only tier), or regenerate the full page when you need it (Full tier). Both from the same index card.
Each weight is classified into a small set of discrete states using a mathematical structure with special properties. This routing layer lives in VRAM—just 1–2 bits per weight.
Residual errors from routing are captured in successive refinement layers (G, B channels), each adding precision. These layers can stream from fast NVMe storage on demand.
At the Full tier, the complete fp16 weight is recovered exactly. Not approximately—bit-exact. This is a mathematical guarantee, not an empirical observation.
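To make the tiering idea concrete: here is a minimal NumPy sketch of a progressive, bit-exact decomposition. It is *not* Railgun's algebraic encoding — it simply splits each fp16 weight's 16 bits into a 2-bit coarse tier, a 6-bit refinement tier, and an 8-bit final tier (our stand-ins for the R, G, B channels), and shows that recombining all tiers recovers every weight bit-for-bit:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float16)

# Reinterpret the fp16 weights as raw 16-bit integers (no copy, no loss).
bits = w.view(np.uint16)

r = (bits >> 14).astype(np.uint8)          # 2-bit coarse tier (stays in VRAM)
g = ((bits >> 8) & 0x3F).astype(np.uint8)  # 6-bit refinement tier
b = (bits & 0xFF).astype(np.uint8)         # 8-bit final tier (streamed on demand)

# Full tier: reassemble all three planes and reinterpret as fp16.
recon = ((r.astype(np.uint16) << 14)
         | (g.astype(np.uint16) << 8)
         | b.astype(np.uint16)).view(np.float16)

# Reconstruction is bit-exact, not approximate.
assert np.array_equal(recon.view(np.uint16), w.view(np.uint16))
```

The sketch shows why "lossless" is achievable in principle: nothing is rounded away, only partitioned. Railgun's claim is that its algebraic routing makes the coarse tier useful on its own, which a naive bit split like this does not.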
| Feature | GPTQ / AWQ (4-bit) | GGUF Q4 | Railgun |
|---|---|---|---|
| Bits per weight (VRAM) | 4.0 | 4.0–6.0 | 1.0–2.0 |
| 7B model VRAM | 3.5 GB | 3.5–5.3 GB | 0.88–1.75 GB |
| Lossless at full precision? | No | No | Yes |
| Progressive quality tiers? | No | No | 4 tiers |
| Context stability | Degrades | Degrades | Infinite (proven) |
| GPU hardware acceleration | ALU only | CPU / ALU | TMU native |
| Training compatible? | Inference only | Inference only | Yes |
Run the same models on less hardware. Or run bigger models on the same hardware. Either way, your cloud bill drops.
Models that currently need a cluster can run on a single workstation GPU. Faster iteration, lower barrier to entry.
7B models running on laptops and phones with 1 GB of available VRAM. Local, private, fast.
The claims above are verifiable. The evaluation package runs on any machine with Python and NumPy. No GPU required. Results speak for themselves.