NEURAL NETWORK WEIGHT COMPRESSION VIA FINITE-FIELD ARITHMETIC
Distribution-adaptive codecs using the Galois field GF(17), [PROPRIETARY OPTIMIZATION], and [CLASSIFIED HARDWARE ACCELERATION] — benchmarked against TurboQuant (ICLR 2026).
PROPRIETARY RESEARCH — All rights reserved. This work describes proprietary technology protected as trade secret.
Results and benchmarks are published for evaluation purposes. Implementation details are [CLASSIFIED].
Reffelt, A. (2026). GF(17) Quantization Strategies for Neural Network Weight Compression. AMNI-SCIENT.
We present a family of quantization strategies built on arithmetic in the Galois field GF(17) for compressing neural network weight matrices. Unlike conventional methods operating in floating-point or power-of-two integer domains, our approach maps weights into a prime finite field where all arithmetic — addition, multiplication, and inversion — is exact and drift-free.
We introduce five complementary codecs spanning 2-bit to 16-bit operating points: GF(17) Legendre 4-bit, Contextual GF(17), Membrane GF(17), ATRP, and AmniTex Progressive. The Membrane GF(17) codec — derived from our Holographic Resistance Membrane framework — achieves cosine similarity 0.939 at 4 bits/element on heavy-tail distributions (2.5× improvement over uniform quantization), encoding in 17ms via a [PROPRIETARY ACCELERATION TECHNIQUE].
We compare against Google’s TurboQuant (ICLR 2026) and related methods (QJL, PolarQuant), demonstrating complementary strengths: TurboQuant excels at online KV-cache compression with provable distortion bounds, while GF(17) strategies offer exact arithmetic, [CLASSIFIED HARDWARE ACCELERATION], and progressive decoding for persistent weight storage.
The choice of the prime p = 17 is deliberate. Because p is prime, the 17 residues {0, …, 16} form a complete finite field in which every non-zero element has a multiplicative inverse. [ALGEBRAIC PARTITIONING METHOD] enables bijective 4-bit packing ([COMPRESSION MECHANISM]).
All operations — add, multiply, invert — are exact modulo 17. No rounding, no accumulation error, no drift over arbitrarily long computations.
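As a sanity check of that claim, here is a minimal NumPy sketch of GF(17) arithmetic (our own illustration; the [CLASSIFIED HARDWARE] path is not shown). Addition and multiplication reduce modulo 17, and the multiplicative inverse of any non-zero element follows from Fermat's little theorem, a^(p-2) mod p.

```python
import numpy as np

P = 17  # field order: residues {0, ..., 16}; every non-zero residue is invertible

def gf_add(a, b):
    """Exact addition modulo 17: no rounding, no drift."""
    return (np.asarray(a) + np.asarray(b)) % P

def gf_mul(a, b):
    """Exact multiplication modulo 17."""
    return (np.asarray(a) * np.asarray(b)) % P

def gf_inv(a):
    """Multiplicative inverse via Fermat's little theorem: a**(p-2) mod p."""
    if a % P == 0:
        raise ZeroDivisionError("0 has no inverse in GF(17)")
    return pow(int(a), P - 2, P)

# Every non-zero element times its inverse is exactly 1, with no error term.
assert all(gf_mul(a, gf_inv(a)) == 1 for a in range(1, P))
```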
Values 0–16 map to a [CLASSIFIED HARDWARE FORMAT]. GPU [CLASSIFIED HARDWARE UNITS] perform [CLASSIFIED OPERATION] for free — dedicated silicon, zero ALU cost.
Multi-layer residuals compose coherently in the same field. Decode at any layer for quality/speed tradeoff — from 2-bit preview to 16-bit exact.
Cumulative sums cycle within {0, …, 16}. Linear attention accumulations stay exact regardless of sequence length. fp16 drifts; GF(17) doesn’t.
GF(17) Legendre 4-bit: uniform min-max quantization to {0…16} with [CLASSIFIED PACKING]. Fastest encode (3ms) but collapses on heavy-tail data.
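A minimal sketch of this baseline, assuming plain min-max binning over the 17 field values (the [CLASSIFIED PACKING] step is omitted, and the function names are ours, not the shipped API):

```python
import numpy as np

P = 17  # number of quantization levels equals the field order

def gf17_uniform_encode(w):
    """Affine min-max map of float weights onto the integers {0, ..., 16}."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (P - 1) or 1.0          # guard against a constant input
    q = np.rint((w - lo) / scale).astype(np.uint8)
    return q, lo, scale

def gf17_uniform_decode(q, lo, scale):
    """Map field values back to floats at the bin centers."""
    return q.astype(np.float32) * scale + lo

w = np.random.randn(896, 896).astype(np.float32)
q, lo, scale = gf17_uniform_encode(w)
w_hat = gf17_uniform_decode(q, lo, scale)
cos = float(np.dot(w.ravel(), w_hat.ravel())
            / (np.linalg.norm(w) * np.linalg.norm(w_hat)))
print(f"cosine similarity: {cos:.4f}")   # high on Gaussian data, poor on heavy tails
```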
Contextual GF(17): two-segment quantization in which the inner ±2σ region and the outer tails are quantized independently. Better tail coverage at a slight bit-rate overhead.
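A sketch of the two-segment idea under our reading of the description: one 17-level codebook for the inner ±2σ mass and another for the tails, selected by a per-element flag. The 2σ split and the one-flag-bit accounting are assumptions, roughly consistent with the ≈5.2 bits/element reported in the benchmarks below.

```python
import numpy as np

P = 17

def quantize_segment(x):
    """Independent 17-level min-max quantization of one segment."""
    if x.size == 0:
        return np.zeros(0, np.uint8), 0.0, 1.0
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (P - 1) or 1.0
    return np.rint((x - lo) / scale).astype(np.uint8), lo, scale

def contextual_encode(w, k=2.0):
    """Split weights at k*sigma; inner mass and tails get separate codebooks."""
    flat = w.ravel()
    tail = np.abs(flat - flat.mean()) > k * flat.std()   # per-element segment flag
    inner_q, *inner_params = quantize_segment(flat[~tail])
    tail_q,  *tail_params  = quantize_segment(flat[tail])
    return tail, inner_q, inner_params, tail_q, tail_params

def contextual_decode(tail, inner_q, inner_params, tail_q, tail_params, shape):
    out = np.empty(tail.size, np.float32)
    out[~tail] = inner_q * inner_params[1] + inner_params[0]
    out[tail]  = tail_q  * tail_params[1]  + tail_params[0]
    return out.reshape(shape)

w = np.random.standard_t(df=3, size=(896, 896)).astype(np.float32)
w_hat = contextual_decode(*contextual_encode(w), w.shape)
```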
Membrane GF(17): [PROPRIETARY ADAPTIVE FIELD] steers quantization boundaries toward stable regions. [CLASSIFIED ACCELERATION] achieves 13× speedup. Excels on heavy tails.
ATRP (Adaptive [CLASSIFIED] Residual Packing): layer-by-layer residual encoding in GF(17). At 4 layers × 4 bits = 16 bits/element, it achieves bit-exact reconstruction (SNR 85dB).
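The layer construction itself is [CLASSIFIED]; the sketch below shows only the generic residual-stacking pattern the description implies, where each layer quantizes whatever the layers above it failed to capture and decoding can stop after any layer.

```python
import numpy as np

P = 17

def residual_encode(w, layers=4):
    """Encode w as a stack of 17-level layers; each layer quantizes the
    residual left behind by the coarser layers above it."""
    residual = w.astype(np.float64).copy()
    stack = []
    for _ in range(layers):
        lo, hi = float(residual.min()), float(residual.max())
        scale = (hi - lo) / (P - 1) or 1.0
        q = np.rint((residual - lo) / scale).astype(np.uint8)
        stack.append((q, lo, scale))
        residual -= q * scale + lo          # what this layer failed to capture
    return stack

def residual_decode(stack, upto=None):
    """Progressive decode: stop after `upto` layers for a coarse preview."""
    total = 0.0
    for q, lo, scale in stack[:upto]:
        total = total + q * scale + lo
    return total

w = np.random.randn(896, 896)
stack = residual_encode(w, layers=4)
for depth in range(1, 5):
    err = float(np.abs(w - residual_decode(stack, depth)).max())
    print(f"{depth} layer(s): max abs error {err:.2e}")   # drops roughly 16x per layer
```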
AmniTex Progressive: coarse-to-fine hierarchy with 4 resolution levels. Full progressive decode — stop at any level for LOD rendering.
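The AmniTex level construction is likewise proprietary. As a stand-in, here is a generic coarse-to-fine pyramid with 4 resolution levels (the per-level GF(17) quantization is omitted for clarity); it illustrates the stop-at-any-level decode the description refers to.

```python
import numpy as np

def build_pyramid(w, levels=4):
    """Coarse-to-fine hierarchy: the first entry is a small preview, each later
    entry stores the detail missing from the upsampled coarser level."""
    current = w.astype(np.float64)
    details = []
    for _ in range(levels - 1):
        h, w_ = current.shape
        coarse = current.reshape(h // 2, 2, w_ // 2, 2).mean(axis=(1, 3))
        details.append(current - np.kron(coarse, np.ones((2, 2))))
        current = coarse
    return [current] + details[::-1]            # coarse preview first

def decode_pyramid(pyramid, upto=None):
    """Stop at any level for a lower level-of-detail reconstruction."""
    levels = pyramid if upto is None else pyramid[:upto]
    out = levels[0]
    for detail in levels[1:]:
        out = np.kron(out, np.ones((2, 2))) + detail
    return out

w = np.random.randn(896, 896)
pyr = build_pyramid(w, levels=4)
print(decode_pyramid(pyr, upto=1).shape)               # (112, 112) coarse preview
print(float(np.abs(w - decode_pyramid(pyr)).max()))    # ~0: full decode is exact
```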
Membrane ×4L: Membrane GF(17) applied at each progressive depth. Best heavy-tail progressive: 0.996 CosSim at 16 bits.
The Membrane codec applies [PROPRIETARY EQUATION] from the Holographic Resistance Membrane framework. Instead of uniform bin spacing, it models a [CLASSIFIED ADAPTIVE FIELD] derived from local properties of the data distribution:
[CLASSIFIED MECHANISM] regions generate high resistance R, causing their weight w to decrease. This steers quantization boundaries toward stable, high-density regions where precision yields the greatest cosine similarity gain.
[PROPRIETARY INITIALIZATION PROCEDURE — 5-STEP PIPELINE REDACTED]
[PROPRIETARY BOUNDARY PLACEMENT METHOD]
[PROPRIETARY ITERATIVE REFINEMENT]
[PROPRIETARY ACCELERATION TECHNIQUE]
[PROPRIETARY PACKING METHOD]
The [CLASSIFIED OPTIMIZATION] (v3.108.0) achieved a 13× encode speedup from 200ms to 17ms while slightly improving quality through [CLASSIFIED TECHNIQUE].
Benchmark setup: 802,816 elements (896×896 matrix); CPU timings under Python 3.14 with NumPy 2.4.3.
| Codec | Bits/e | Ratio | Enc (ms) | Dec (ms) | CosSim (normal) | CosSim (heavy-tail) | SNR (dB) |
|---|---|---|---|---|---|---|---|
| fp16 (baseline) | 16.00 | 1.0× | 1.2 | 0.8 | 1.0000 | 1.0000 | 73.7 |
| TQ turbo4 | 4.25 | 3.8× | 25.4 | 21.5 | 0.9914 | 0.9907 | 17.6 |
| TQ turbo3 | 3.50 | 4.6× | 26.1 | 21.2 | 0.9728 | 0.9710 | 12.5 |
| GF(17) Leg 4b | 4.00 | 4.0× | 3.2 | 2.0 | 0.9843 | 0.3732 | 14.9 |
| Ctx GF17 (2σ) | 5.18 | 3.1× | 8.4 | 3.6 | 0.9965 | 0.7965 | 21.5 |
| Membrane GF17 | 4.00 | 4.0× | 17.3 | 2.8 | 0.9911 | 0.9391 | 17.4 |
Progressive / multi-layer codecs (16-bit operating point):
| Codec | Bits/e | Enc (ms) | Dec (ms) | CosSim (normal) | CosSim (heavy-tail) | SNR (dB) |
|---|---|---|---|---|---|---|
| AmniTex full (4L) | 16.0 | 18.2 | 8.8 | 1.0000 | 0.9998 | 43.1 |
| ATRP 4b×4L | 16.0 | 7.2 | 5.4 | 1.0000 | 1.0000 | 84.9 |
| Tex3D 4-layer | 16.0 | 15.0 | 8.8 | 1.0000 | 0.9998 | 43.1 |
| Membrane ×4L | 16.0 | 68.8 | 9.4 | 0.9999 | 0.9957 | 37.8 |
Neural network weights — particularly in attention layers and embedding tables — frequently exhibit long-tailed distributions where uniform quantization wastes most bins on rare extreme values. The quality drop from normal to heavy-tail data reveals a codec’s robustness:
| Codec | Normal CosSim | Heavy-Tail CosSim | Δ (quality drop) |
|---|---|---|---|
| GF(17) Leg 4b | 0.9843 | 0.3732 | 0.611 |
| TQ turbo4 | 0.9914 | 0.9907 | 0.001 |
| Ctx GF17 | 0.9965 | 0.7965 | 0.200 |
| Membrane GF17 | 0.9911 | 0.9391 | 0.052 |
TurboQuant’s random rotation preprocessing naturally redistributes tail energy across all coordinates, making it inherently distribution-agnostic (Δ = 0.001). The Membrane codec achieves comparable robustness (Δ = 0.052) through [CLASSIFIED] adaptation, while maintaining exact finite-field arithmetic that TurboQuant’s continuous-domain approach cannot provide.
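The robustness gap is easy to reproduce qualitatively with the uniform baseline alone (illustrative only: this is not the benchmark harness, and a Student-t draw stands in for real heavy-tailed weights, so the exact numbers will differ from the table):

```python
import numpy as np

P = 17

def uniform_q(w):
    """17-level min-max quantize/dequantize round trip."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (P - 1) or 1.0
    return np.rint((w - lo) / scale) * scale + lo

def cos_sim(a, b):
    return float(np.dot(a.ravel(), b.ravel())
                 / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
samples = {
    "normal":     rng.standard_normal(802_816),
    "heavy-tail": rng.standard_t(df=2, size=802_816),   # rare extreme values
}
for name, w in samples.items():
    print(f"{name:11s} uniform 17-level CosSim: {cos_sim(w, uniform_q(w)):.3f}")
```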
In linear attention mechanisms, key-value pairs accumulate over long sequences. Floating-point representations suffer progressive drift — GF(17) arithmetic does not:
| Sequence Step | fp16 RMSE | GF(17) Drift |
|---|---|---|
| 512 | 0.000760 | 0 (exact) |
| 1,024 | 0.001482 | 0 (exact) |
| 2,048 | 0.003029 | 0 (exact) |
| 4,096 | 0.006138 | 0 (exact) |
GF(17) operations cycle within {0, …, 16} by construction. There is zero accumulation error regardless of sequence length — a property that floating-point arithmetic fundamentally cannot achieve.
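A small self-check of that claim (our own illustration; real linear-attention state is a higher-dimensional accumulator): a running fp16 sum diverges from its float64 reference, while a per-step mod-17 sum matches the exact integer total reduced once at the end, at every step.

```python
import numpy as np

P = 17
rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)

# A fixed map of the same stream into field values {0, ..., 16}.
q = np.rint((x - x.min()) / ((x.max() - x.min()) / (P - 1))).astype(np.int64)

fp16_sum = np.float16(0.0)   # running sum kept in half precision
ref_sum = 0.0                # float64 reference for the same stream
gf_running = 0               # running sum reduced mod 17 at every step
int_total = 0                # exact unbounded integer sum

for t in range(1, x.size + 1):
    fp16_sum = np.float16(fp16_sum + np.float16(x[t - 1]))
    ref_sum += float(x[t - 1])
    gf_running = (gf_running + int(q[t - 1])) % P
    int_total += int(q[t - 1])
    if t in (512, 1024, 2048, 4096):
        fp16_err = abs(float(fp16_sum) - ref_sum)
        gf_drift = abs(gf_running - int_total % P)   # always 0: reduction commutes with +
        print(f"step {t:5d}   fp16 error {fp16_err:.6f}   GF(17) drift {gf_drift}")
```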
| Model | fp16 | TQ turbo4 | GF(17) Leg 4b | GF(17) + TQ KV |
|---|---|---|---|---|
| Qwen-0.5B | 1,164 MB | 1,035 MB | 291 MB | 291 MB |
| Qwen-3B | 7,208 MB | 6,321 MB | 1,802 MB | 1,802 MB |
| Qwen-7B | 15,879 MB | 14,499 MB | 3,970 MB | 3,970 MB |
GF(17) compresses weights 4× independently of any activation-side compression. Combined with TurboQuant KV-cache compression, it delivers the best of both.
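The weight-memory arithmetic behind the table is simple enough to verify directly; a short sketch using only the fp16 column above:

```python
# At 4 bits/element, the GF(17) Legendre codec stores one quarter of the fp16
# bytes, independent of any KV-cache compression on the activation side.
fp16_mb = {"Qwen-0.5B": 1164, "Qwen-3B": 7208, "Qwen-7B": 15879}

for model, mb in fp16_mb.items():
    gf17_mb = mb * 4 / 16        # 4 bits/element vs. 16 bits/element
    print(f"{model:10s} fp16 {mb:6d} MB  ->  GF(17) Leg 4b {gf17_mb:5.0f} MB")
```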
The two approaches are complementary rather than competing. They target different components of the inference pipeline:
| Scenario | Best Approach | Rationale |
|---|---|---|
| KV-cache compression (online inference) | TurboQuant | Data-oblivious, provable bounds, H100 CUDA kernels |
| Weight storage & loading | GF(17) Leg 4b | 4× compression, exact field, 3ms encode, [CLASSIFIED] compatible |
| Heavy-tail weight distributions | Membrane GF(17) | Distribution-adaptive, 0.939 vs. 0.373 (uniform) at 4 bits |
| Progressive level-of-detail | ATRP / AmniTex | Multi-layer GF(17) residuals, stop at any layer |
| Long-sequence linear attention | GF(17) cumulative | Zero drift, exact mod-17 arithmetic at any sequence length |
| Lossless at minimum bits | ATRP 4b×4L | Bit-exact reconstruction, SNR 85dB, 16 bits in algebra-exact field |
| Combined deployment | GF(17) weights + TQ KV | 4× weight compression + KV-cache compression in parallel |
Membrane GF(17) encodes at 17ms on CPU, while TurboQuant reports “negligible” runtime overhead on H100 GPUs; a direct speed comparison across CPU and datacenter-GPU implementations is not meaningful.
Base GF(17) Legendre collapses on heavy-tail data (CosSim 0.37). The Membrane variant solves this but at higher encode cost. TurboQuant’s rotation preprocessing handles all distributions uniformly by design.
TurboQuant has formal proofs of near-optimal distortion rates within a constant factor of information-theoretic lower bounds. GF(17) codecs are empirically validated but lack equivalent formal analysis.
TurboQuant integrates directly into attention computation. GF(17) weight quantization requires dequantization before standard compute, unless [CLASSIFIED HARDWARE ACCELERATION] is used.