NEURAL NETWORK WEIGHT COMPRESSION VIA FINITE-FIELD ARITHMETIC
Distribution-adaptive codecs using the Galois field GF(17), [PROPRIETARY OPTIMIZATION], and [CLASSIFIED HARDWARE ACCELERATION] — benchmarked against TurboQuant (ICLR 2026).
PROPRIETARY RESEARCH — All rights reserved. This work describes proprietary technology protected as trade secret.
Results and benchmarks are published for evaluation purposes. Implementation details are [CLASSIFIED].
Reffelt, A. (2026). GF(17) Quantization Strategies for Neural Network Weight Compression. AMNI-SCIENT.
We present a family of quantization strategies built on arithmetic in the Galois field GF(17) for compressing neural network weight matrices. Unlike conventional methods operating in floating-point or power-of-two integer domains, our approach maps weights into a prime finite field where all arithmetic — addition, multiplication, and inversion — is exact and drift-free.
We introduce five complementary codecs spanning 2-bit to 16-bit operating points: GF(17) Legendre 4-bit, Contextual GF(17), Membrane GF(17), ATRP, and AmniTex Progressive. The Membrane GF(17) codec — derived from our Holographic Resistance Membrane framework — achieves cosine similarity 0.939 at 4 bits/element on heavy-tail distributions (2.5× improvement over uniform quantization), encoding in 17ms via a [PROPRIETARY ACCELERATION TECHNIQUE].
We compare against Google’s TurboQuant (ICLR 2026) and related methods (QJL, PolarQuant), demonstrating complementary strengths: TurboQuant excels at online KV-cache compression with provable distortion bounds, while GF(17) strategies offer exact arithmetic, [CLASSIFIED HARDWARE ACCELERATION], and progressive decoding for persistent weight storage.
The choice of the prime p = 17 is deliberate. Because p is prime, the 17 residues {0, …, 16} form a complete finite field in which every non-zero element has a multiplicative inverse. [ALGEBRAIC PARTITIONING METHOD] enables bijective 4-bit packing ([COMPRESSION MECHANISM]).
All operations — add, multiply, invert — are exact modulo 17. No rounding, no accumulation error, no drift over arbitrarily long computations.
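As a sanity check of that claim, here is a minimal NumPy sketch of GF(17) arithmetic (our own illustration; the [CLASSIFIED HARDWARE] path is not shown). Addition and multiplication reduce modulo 17, and the multiplicative inverse of any non-zero element follows from Fermat's little theorem, a^(p-2) mod p.

```python
import numpy as np

P = 17  # field order: residues {0, ..., 16}; every non-zero residue is invertible

def gf_add(a, b):
    """Exact addition modulo 17: no rounding, no drift."""
    return (np.asarray(a) + np.asarray(b)) % P

def gf_mul(a, b):
    """Exact multiplication modulo 17."""
    return (np.asarray(a) * np.asarray(b)) % P

def gf_inv(a):
    """Multiplicative inverse via Fermat's little theorem: a**(p-2) mod p."""
    if a % P == 0:
        raise ZeroDivisionError("0 has no inverse in GF(17)")
    return pow(int(a), P - 2, P)

# Every non-zero element times its inverse is exactly 1, with no error term.
assert all(gf_mul(a, gf_inv(a)) == 1 for a in range(1, P))
```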
Values 0–16 map to a [CLASSIFIED HARDWARE FORMAT]. GPU [CLASSIFIED HARDWARE UNITS] perform [CLASSIFIED OPERATION] for free — dedicated silicon, zero ALU cost.
Multi-layer residuals compose coherently in the same field. Decode at any layer for quality/speed tradeoff — from 2-bit preview to 16-bit exact.
Cumulative sums cycle within {0, …, 16}. Linear attention accumulations stay exact regardless of sequence length. fp16 drifts; GF(17) doesn’t.
GF(17) Legendre 4-bit: uniform min-max quantization to {0…16} with [CLASSIFIED PACKING]. Fastest encode (3ms) but collapses on heavy-tail data.
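A minimal sketch of this baseline, assuming plain min-max binning over the 17 field values (the [CLASSIFIED PACKING] step is omitted, and the function names are ours, not the shipped API):

```python
import numpy as np

P = 17  # number of quantization levels equals the field order

def gf17_uniform_encode(w):
    """Affine min-max map of float weights onto the integers {0, ..., 16}."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (P - 1) or 1.0          # guard against a constant input
    q = np.rint((w - lo) / scale).astype(np.uint8)
    return q, lo, scale

def gf17_uniform_decode(q, lo, scale):
    """Map field values back to floats at the bin centers."""
    return q.astype(np.float32) * scale + lo

w = np.random.randn(896, 896).astype(np.float32)
q, lo, scale = gf17_uniform_encode(w)
w_hat = gf17_uniform_decode(q, lo, scale)
cos = float(np.dot(w.ravel(), w_hat.ravel())
            / (np.linalg.norm(w) * np.linalg.norm(w_hat)))
print(f"cosine similarity: {cos:.4f}")   # high on Gaussian data, poor on heavy tails
```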
Contextual GF(17): two-segment quantization in which the inner ±2σ region and the outer tails are quantized independently. Better tail coverage at a slight bit-rate overhead.
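A sketch of the two-segment idea under our reading of the description: one 17-level codebook for the inner ±2σ mass and another for the tails, selected by a per-element flag. The 2σ split and the one-flag-bit accounting are assumptions, roughly consistent with the ≈5.2 bits/element reported in the benchmarks below.

```python
import numpy as np

P = 17

def quantize_segment(x):
    """Independent 17-level min-max quantization of one segment."""
    if x.size == 0:
        return np.zeros(0, np.uint8), 0.0, 1.0
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (P - 1) or 1.0
    return np.rint((x - lo) / scale).astype(np.uint8), lo, scale

def contextual_encode(w, k=2.0):
    """Split weights at k*sigma; inner mass and tails get separate codebooks."""
    flat = w.ravel()
    tail = np.abs(flat - flat.mean()) > k * flat.std()   # per-element segment flag
    inner_q, *inner_params = quantize_segment(flat[~tail])
    tail_q,  *tail_params  = quantize_segment(flat[tail])
    return tail, inner_q, inner_params, tail_q, tail_params

def contextual_decode(tail, inner_q, inner_params, tail_q, tail_params, shape):
    out = np.empty(tail.size, np.float32)
    out[~tail] = inner_q * inner_params[1] + inner_params[0]
    out[tail]  = tail_q  * tail_params[1]  + tail_params[0]
    return out.reshape(shape)

w = np.random.standard_t(df=3, size=(896, 896)).astype(np.float32)
w_hat = contextual_decode(*contextual_encode(w), w.shape)
```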
Membrane GF(17): [PROPRIETARY ADAPTIVE FIELD] steers quantization boundaries toward stable regions. [CLASSIFIED ACCELERATION] achieves 13× speedup. Excels on heavy tails.
ATRP (Adaptive [CLASSIFIED] Residual Packing): layer-by-layer residual encoding in GF(17). At 4 layers × 4 bits = 16 bits/element, it achieves bit-exact reconstruction (SNR 85dB).
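The layer construction itself is [CLASSIFIED]; the sketch below shows only the generic residual-stacking pattern the description implies, where each layer quantizes whatever the layers above it failed to capture and decoding can stop after any layer.

```python
import numpy as np

P = 17

def residual_encode(w, layers=4):
    """Encode w as a stack of 17-level layers; each layer quantizes the
    residual left behind by the coarser layers above it."""
    residual = w.astype(np.float64).copy()
    stack = []
    for _ in range(layers):
        lo, hi = float(residual.min()), float(residual.max())
        scale = (hi - lo) / (P - 1) or 1.0
        q = np.rint((residual - lo) / scale).astype(np.uint8)
        stack.append((q, lo, scale))
        residual -= q * scale + lo          # what this layer failed to capture
    return stack

def residual_decode(stack, upto=None):
    """Progressive decode: stop after `upto` layers for a coarse preview."""
    total = 0.0
    for q, lo, scale in stack[:upto]:
        total = total + q * scale + lo
    return total

w = np.random.randn(896, 896)
stack = residual_encode(w, layers=4)
for depth in range(1, 5):
    err = float(np.abs(w - residual_decode(stack, depth)).max())
    print(f"{depth} layer(s): max abs error {err:.2e}")   # drops roughly 16x per layer
```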
AmniTex Progressive: coarse-to-fine hierarchy with 4 resolution levels. Full progressive decode — stop at any level for LOD rendering.
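The AmniTex level construction is likewise proprietary. As a stand-in, here is a generic coarse-to-fine pyramid with 4 resolution levels (the per-level GF(17) quantization is omitted for clarity); it illustrates the stop-at-any-level decode the description refers to.

```python
import numpy as np

def build_pyramid(w, levels=4):
    """Coarse-to-fine hierarchy: the first entry is a small preview, each later
    entry stores the detail missing from the upsampled coarser level."""
    current = w.astype(np.float64)
    details = []
    for _ in range(levels - 1):
        h, w_ = current.shape
        coarse = current.reshape(h // 2, 2, w_ // 2, 2).mean(axis=(1, 3))
        details.append(current - np.kron(coarse, np.ones((2, 2))))
        current = coarse
    return [current] + details[::-1]            # coarse preview first

def decode_pyramid(pyramid, upto=None):
    """Stop at any level for a lower level-of-detail reconstruction."""
    levels = pyramid if upto is None else pyramid[:upto]
    out = levels[0]
    for detail in levels[1:]:
        out = np.kron(out, np.ones((2, 2))) + detail
    return out

w = np.random.randn(896, 896)
pyr = build_pyramid(w, levels=4)
print(decode_pyramid(pyr, upto=1).shape)               # (112, 112) coarse preview
print(float(np.abs(w - decode_pyramid(pyr)).max()))    # ~0: full decode is exact
```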
Membrane ×4L: Membrane GF(17) applied at each progressive depth. Best heavy-tail progressive: 0.996 CosSim at 16 bits.
The Membrane codec applies [PROPRIETARY EQUATION] from the Holographic Resistance Membrane framework. Instead of uniform bin spacing, it models a [CLASSIFIED ADAPTIVE FIELD] derived from local properties of the data distribution:
[CLASSIFIED MECHANISM] regions generate high resistance R, causing their weight w to decrease. This steers quantization boundaries toward stable, high-density regions where precision yields the greatest cosine similarity gain.
[PROPRIETARY INITIALIZATION PROCEDURE — 5-STEP PIPELINE REDACTED]
[PROPRIETARY BOUNDARY PLACEMENT METHOD]
[PROPRIETARY ITERATIVE REFINEMENT]
[PROPRIETARY ACCELERATION TECHNIQUE]
[PROPRIETARY PACKING METHOD]
The [CLASSIFIED OPTIMIZATION] (v3.108.0) achieved a 13× encode speedup from 200ms to 17ms while slightly improving quality through [CLASSIFIED TECHNIQUE].
Benchmark setup: 802,816 elements (896×896 matrix); CPU timings under Python 3.14 with NumPy 2.4.3.
| Codec | Bits/e | Ratio | Enc (ms) | Dec (ms) | CosSim (normal) | CosSim (heavy-tail) | SNR (dB) |
|---|---|---|---|---|---|---|---|
| fp16 (baseline) | 16.00 | 1.0× | 1.2 | 0.8 | 1.0000 | 1.0000 | 73.7 |
| TQ turbo4 | 4.25 | 3.8× | 25.4 | 21.5 | 0.9914 | 0.9907 | 17.6 |
| TQ turbo3 | 3.50 | 4.6× | 26.1 | 21.2 | 0.9728 | 0.9710 | 12.5 |
| GF(17) Leg 4b | 4.00 | 4.0× | 3.2 | 2.0 | 0.9843 | 0.3732 | 14.9 |
| Ctx GF17 (2σ) | 5.18 | 3.1× | 8.4 | 3.6 | 0.9965 | 0.7965 | 21.5 |
| Membrane GF17 | 4.00 | 4.0× | 17.3 | 2.8 | 0.9911 | 0.9391 | 17.4 |
Progressive / multi-layer codecs (16-bit operating point):
| Codec | Bits/e | Enc (ms) | Dec (ms) | CosSim (normal) | CosSim (heavy-tail) | SNR (dB) |
|---|---|---|---|---|---|---|
| AmniTex full (4L) | 16.0 | 18.2 | 8.8 | 1.0000 | 0.9998 | 43.1 |
| ATRP 4b×4L | 16.0 | 7.2 | 5.4 | 1.0000 | 1.0000 | 84.9 |
| Tex3D 4-layer | 16.0 | 15.0 | 8.8 | 1.0000 | 0.9998 | 43.1 |
| Membrane ×4L | 16.0 | 68.8 | 9.4 | 0.9999 | 0.9957 | 37.8 |
Neural network weights — particularly in attention layers and embedding tables — frequently exhibit long-tailed distributions where uniform quantization wastes most bins on rare extreme values. The quality drop from normal to heavy-tail data reveals a codec’s robustness:
| Codec | Normal CosSim | Heavy-Tail CosSim | Δ (quality drop) |
|---|---|---|---|
| GF(17) Leg 4b | 0.9843 | 0.3732 | 0.611 |
| TQ turbo4 | 0.9914 | 0.9907 | 0.001 |
| Ctx GF17 | 0.9965 | 0.7965 | 0.200 |
| Membrane GF17 | 0.9911 | 0.9391 | 0.052 |
TurboQuant’s random rotation preprocessing naturally redistributes tail energy across all coordinates, making it inherently distribution-agnostic (Δ = 0.001). The Membrane codec achieves comparable robustness (Δ = 0.052) through [CLASSIFIED] adaptation, while maintaining exact finite-field arithmetic that TurboQuant’s continuous-domain approach cannot provide.
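The robustness gap is easy to reproduce qualitatively with the uniform baseline alone (illustrative only: this is not the benchmark harness, and a Student-t draw stands in for real heavy-tailed weights, so the exact numbers will differ from the table):

```python
import numpy as np

P = 17

def uniform_q(w):
    """17-level min-max quantize/dequantize round trip."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (P - 1) or 1.0
    return np.rint((w - lo) / scale) * scale + lo

def cos_sim(a, b):
    return float(np.dot(a.ravel(), b.ravel())
                 / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
samples = {
    "normal":     rng.standard_normal(802_816),
    "heavy-tail": rng.standard_t(df=2, size=802_816),   # rare extreme values
}
for name, w in samples.items():
    print(f"{name:11s} uniform 17-level CosSim: {cos_sim(w, uniform_q(w)):.3f}")
```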
In linear attention mechanisms, key-value pairs accumulate over long sequences. Floating-point representations suffer progressive drift — GF(17) arithmetic does not:
| Sequence Step | fp16 RMSE | GF(17) Drift |
|---|---|---|
| 512 | 0.000760 | 0 (exact) |
| 1,024 | 0.001482 | 0 (exact) |
| 2,048 | 0.003029 | 0 (exact) |
| 4,096 | 0.006138 | 0 (exact) |
GF(17) operations cycle within {0, …, 16} by construction. There is zero accumulation error regardless of sequence length — a property that floating-point arithmetic fundamentally cannot achieve.
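A small self-check of that claim (our own illustration; real linear-attention state is a higher-dimensional accumulator): a running fp16 sum diverges from its float64 reference, while a per-step mod-17 sum matches the exact integer total reduced once at the end, at every step.

```python
import numpy as np

P = 17
rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)

# A fixed map of the same stream into field values {0, ..., 16}.
q = np.rint((x - x.min()) / ((x.max() - x.min()) / (P - 1))).astype(np.int64)

fp16_sum = np.float16(0.0)   # running sum kept in half precision
ref_sum = 0.0                # float64 reference for the same stream
gf_running = 0               # running sum reduced mod 17 at every step
int_total = 0                # exact unbounded integer sum

for t in range(1, x.size + 1):
    fp16_sum = np.float16(fp16_sum + np.float16(x[t - 1]))
    ref_sum += float(x[t - 1])
    gf_running = (gf_running + int(q[t - 1])) % P
    int_total += int(q[t - 1])
    if t in (512, 1024, 2048, 4096):
        fp16_err = abs(float(fp16_sum) - ref_sum)
        gf_drift = abs(gf_running - int_total % P)   # always 0: reduction commutes with +
        print(f"step {t:5d}   fp16 error {fp16_err:.6f}   GF(17) drift {gf_drift}")
```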
| Model | fp16 | TQ turbo4 | GF(17) Leg 4b | GF(17) + TQ KV |
|---|---|---|---|---|
| Qwen-0.5B | 1,164 MB | 1,035 MB | 291 MB | 291 MB |
| Qwen-3B | 7,208 MB | 6,321 MB | 1,802 MB | 1,802 MB |
| Qwen-7B | 15,879 MB | 14,499 MB | 3,970 MB | 3,970 MB |
GF(17) compresses weights 4× independently of any activation-side compression. Combined with TurboQuant KV-cache compression, it delivers the best of both.
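The weight-memory arithmetic behind the table is simple enough to verify directly; a short sketch using only the fp16 column above:

```python
# At 4 bits/element, the GF(17) Legendre codec stores one quarter of the fp16
# bytes, independent of any KV-cache compression on the activation side.
fp16_mb = {"Qwen-0.5B": 1164, "Qwen-3B": 7208, "Qwen-7B": 15879}

for model, mb in fp16_mb.items():
    gf17_mb = mb * 4 / 16        # 4 bits/element vs. 16 bits/element
    print(f"{model:10s} fp16 {mb:6d} MB  ->  GF(17) Leg 4b {gf17_mb:5.0f} MB")
```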
The two approaches are complementary rather than competing. They target different components of the inference pipeline:
| Scenario | Best Approach | Rationale |
|---|---|---|
| KV-cache compression (online inference) | TurboQuant | Data-oblivious, provable bounds, H100 CUDA kernels |
| Weight storage & loading | GF(17) Leg 4b | 4× compression, exact field, 3ms encode, [CLASSIFIED] compatible |
| Heavy-tail weight distributions | Membrane GF(17) | Distribution-adaptive, 0.939 vs. 0.373 (uniform) at 4 bits |
| Progressive level-of-detail | ATRP / AmniTex | Multi-layer GF(17) residuals, stop at any layer |
| Long-sequence linear attention | GF(17) cumulative | Zero drift, exact mod-17 arithmetic at any sequence length |
| Lossless at minimum bits | ATRP 4b×4L | Bit-exact reconstruction, SNR 85dB, 16 bits in algebra-exact field |
| Combined deployment | GF(17) weights + TQ KV | 4× weight compression + KV-cache compression in parallel |
Membrane GF(17) encodes at 17ms on CPU, while TurboQuant reports “negligible” runtime overhead on H100 GPUs; a direct speed comparison across CPU and datacenter-GPU implementations is not meaningful.
Base GF(17) Legendre collapses on heavy-tail data (CosSim 0.37). The Membrane variant solves this but at higher encode cost. TurboQuant’s rotation preprocessing handles all distributions uniformly by design.
TurboQuant has formal proofs of near-optimal distortion rates within a constant factor of information-theoretic lower bounds. GF(17) codecs are empirically validated but lack equivalent formal analysis.
TurboQuant integrates directly into attention computation. GF(17) weight quantization requires dequantization before standard compute, unless [CLASSIFIED HARDWARE ACCELERATION] is used.