THE AI MEMORY PROBLEM

0.88 GB
THAT'S A 7-BILLION PARAMETER MODEL. IN UNDER A GIGABYTE.

THE PROBLEM

THE VRAM WALL

Every AI model has billions of numbers (called weights) that need to live in your GPU's memory to run.

A 7-billion parameter model needs 14 GB of VRAM at standard fp16 precision (2 bytes per weight). A 70B model needs 140 GB. The biggest models need over 800 GB.

Most consumer GPUs have 8–24 GB. Even flagship datacenter GPUs typically top out at 80 GB. The math doesn't work.
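The arithmetic above is easy to verify: fp16 stores each weight in 2 bytes. A quick sketch (405B is used here as an illustrative stand-in for "the biggest models"; the exact size is an assumption):

```python
# VRAM needed at fp16 is simply parameter count × 2 bytes.
# 405B is an assumed example size for "the biggest models".
for params in (7e9, 70e9, 405e9):
    print(f"{params / 1e9:.0f}B parameters → {params * 2 / 1e9:.0f} GB at fp16")
```

This reproduces the 14 GB, 140 GB, and 800+ GB figures quoted above.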

Today's solution? Buy more GPUs. Split the model across machines. Spend millions.

THE RAILGUN SOLUTION

AmniTex Railgun compresses those weights using a fundamentally different approach based on abstract algebra—a branch of pure mathematics.

Instead of rounding numbers and hoping for the best, Railgun encodes weights into a mathematical structure where compression is exact.

The result: a 7B model fits in 0.88 GB. And when you need the original weights back? They come back perfectly. Not approximately. Perfectly.

BY THE NUMBERS

16×
COMPRESSION AT ROUTING TIER
1.0
COSINE SIMILARITY (FULL)
∞
CONTEXT STABILITY
TMU
THROUGHPUT/WATT
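The 16× figure and the 0.88 GB headline both fall out of the bit widths: fp16 is 16 bits per weight, the routing tier is 1 bit. A quick check:

```python
params = 7e9
fp16_gb = params * 16 / 8 / 1e9     # 14 GB at 16 bits per weight
routing_gb = params * 1 / 8 / 1e9   # 1 bit per weight at the routing tier
print(fp16_gb / routing_gb)         # 16.0 (× compression)
print(routing_gb)                   # 0.875 (GB, reported as 0.88)
```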

HOW IT WORKS (SIMPLY)

THE BOOK ANALOGY

Imagine a 1,000-page book. Normal compression is like summarizing each chapter—you lose details.

Railgun is like writing a formula that can regenerate any page on demand. The formula fits on an index card, but the full book is always available. No information is ever lost.

The progressive tier system means you can choose: read the chapter summaries for speed (R-only tier), or regenerate the full page when you need it (Full tier). Both from the same index card.

STEP 1 — ROUTE

Algebraic Routing

Each weight is classified into a small set of discrete states using a mathematical structure with special properties. This routing layer lives in VRAM—just 1–2 bits per weight.
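A minimal sketch of the routing idea, with nearest-state assignment standing in for the algebraic structure (which the page does not disclose); the four states and their values are illustrative assumptions, not Railgun's actual states:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float16)  # fp16 layer weights

# Four discrete states → 2 bits per weight in VRAM. The state values are
# placeholders; the real routing structure is not reproduced here.
states = np.array([-1.5, -0.5, 0.5, 1.5], dtype=np.float32)
codes = np.abs(weights.astype(np.float32)[:, None] - states).argmin(axis=1)
codes = codes.astype(np.uint8)  # would be bit-packed at 2 bits/weight

print(int(np.log2(len(states))), "bits per weight at the routing tier")
```

With two states instead of four, the same scheme lands at 1 bit per weight, matching the low end of the quoted 1–2 bit range.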

STEP 2 — REFINE

Progressive Layers

Residual errors from routing are captured in successive refinement layers (G, B channels), each adding precision. These layers can stream from fast NVMe storage on demand.
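A toy version of successive refinement, assuming simple uniform grids at decreasing scale for the R, G, and B layers (the scales are invented for illustration; each layer re-quantizes whatever error the previous layers left behind):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(8).astype(np.float32)   # original weights

approx = np.zeros_like(w)
for name, scale in (("R", 0.5), ("G", 0.125), ("B", 0.03125)):
    residual = w - approx                        # error left so far
    approx += np.round(residual / scale) * scale # quantize residual at finer grid
    print(f"{name} layer: max error bound {scale / 2}, "
          f"actual {np.abs(w - approx).max():.5f}")
```

Each layer tightens the worst-case error to half its grid spacing, which is why the later (G, B) layers can stay on NVMe and stream in only when more precision is needed.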

STEP 3 — RECONSTRUCT

Lossless Recovery

At the Full tier, the complete fp16 weight is recovered exactly. Not approximately—bit-exact. This is a mathematical guarantee, not an empirical observation.
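The property can be illustrated (not reproduced; Railgun's algebraic encoding is not public) by a scheme whose final tier stores the remaining residual exactly, making the round trip bit-identical rather than merely close:

```python
import numpy as np

rng = np.random.default_rng(2)
original = rng.standard_normal(1024).astype(np.float16)

coarse = np.round(original.astype(np.float32))   # lossy "routing" tier
residual = original.astype(np.float32) - coarse  # exact in fp32 for fp16 inputs

recon = (coarse + residual).astype(np.float16)   # Full-tier reconstruction
print(recon.tobytes() == original.tobytes())     # True: bit-exact recovery
```

The comparison is on raw bytes, not a tolerance check: this is the "bit-exact, not approximately" distinction the Full tier claims.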

COMPARED TO EXISTING METHODS

Feature                     | GPTQ / AWQ (4-bit) | GGUF Q4        | Railgun
Bits per weight (VRAM)      | 4.0                | 4.0–6.0        | 1.0–2.0
7B model VRAM               | 3.5 GB             | 3.5–5.3 GB     | 0.88–1.75 GB
Lossless at full precision? | No                 | No             | Yes
Progressive quality tiers?  | No                 | No             | 4 tiers
Context stability           | Degrades           | Degrades       | Infinite (proven)
GPU hardware acceleration   | ALU only           | CPU / ALU      | TMU native
Training compatible?        | Inference only     | Inference only | Yes
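The VRAM rows follow directly from the bits-per-weight rows; a quick cross-check for a 7B model (bit widths taken from the table, low end of each range):

```python
# 7B model VRAM at each method's bits per weight.
for name, bits in [("GPTQ / AWQ", 4.0), ("GGUF Q4 (low end)", 4.0),
                   ("Railgun routing tier", 1.0), ("Railgun 2-bit", 2.0)]:
    print(f"{name}: {7e9 * bits / 8 / 1e9:.2f} GB")
```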

WHO IS THIS FOR?

AI COMPANIES

Cut GPU Costs by 4–16×

Run the same models on less hardware. Or run bigger models on the same hardware. Either way, your cloud bill drops.

RESEARCHERS

Run 70B on a Desktop

Models that currently need a cluster can run on a single workstation GPU. Faster iteration, lower barrier to entry.

EDGE / MOBILE

AI on Consumer Hardware

7B models running on laptops and phones with 1 GB of available VRAM. Local, private, fast.

SEE THE PROOF

The claims above are verifiable. The evaluation package runs on any machine with Python and NumPy. No GPU required. Results speak for themselves.

TECHNICAL DEEP DIVE → GET EVALUATION ACCESS