PROPRIETARY • SELECTIVE DISCLOSURE • FULL DETAILS AVAILABLE UNDER NDA

GF(17) QUANTIZATION

NEURAL NETWORK WEIGHT COMPRESSION VIA FINITE-FIELD ARITHMETIC

Distribution-adaptive codecs using the Galois field GF(17), [PROPRIETARY OPTIMIZATION], and [CLASSIFIED HARDWARE ACCELERATION] — benchmarked against TurboQuant (ICLR 2026).


PROPRIETARY RESEARCH — All rights reserved. This work describes proprietary technology protected as trade secret.
Results and benchmarks are published for evaluation purposes. Implementation details are [CLASSIFIED].

Reffelt, A. (2026). GF(17) Quantization Strategies for Neural Network Weight Compression. AMNI-SCIENT.

ABSTRACT

We present a family of quantization strategies built on arithmetic in the Galois field GF(17) for compressing neural network weight matrices. Unlike conventional methods operating in floating-point or power-of-two integer domains, our approach maps weights into a prime finite field where all arithmetic — addition, multiplication, and inversion — is exact and drift-free.

We introduce five complementary codecs spanning 2-bit to 16-bit operating points: GF(17) Legendre 4-bit, Contextual GF(17), Membrane GF(17), ATRP, and AmniTex Progressive. The Membrane GF(17) codec — derived from our Holographic Resistance Membrane framework — achieves cosine similarity 0.939 at 4 bits/element on heavy-tail distributions (2.5× improvement over uniform quantization), encoding in 17ms via a [PROPRIETARY ACCELERATION TECHNIQUE].

We compare against Google’s TurboQuant (ICLR 2026) and related methods (QJL, PolarQuant), demonstrating complementary strengths: TurboQuant excels at online KV-cache compression with provable distortion bounds, while GF(17) strategies offer exact arithmetic, [CLASSIFIED HARDWARE ACCELERATION], and progressive decoding for persistent weight storage.

WHY GF(17)?

The choice of the prime p = 17 is deliberate. Because 17 is prime, the values {0, …, 16} form a finite field in which every non-zero element has a multiplicative inverse. [ALGEBRAIC PARTITIONING METHOD] enables bijective 4-bit packing ([COMPRESSION MECHANISM]).

EXACT ARITHMETIC

All operations — addition, multiplication, inversion — are exact modulo 17. No rounding, no accumulation error, no drift over arbitrarily long computations.
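
As a minimal, non-proprietary sketch of what exact mod-17 arithmetic looks like in plain Python (the inverse uses Fermat's little theorem, a^(p−2) mod p; none of this reflects the [CLASSIFIED] hardware path):

```python
P = 17  # field modulus (prime)

def gf17_add(a, b):
    """Exact addition in GF(17): no rounding, no drift."""
    return (a + b) % P

def gf17_mul(a, b):
    """Exact multiplication in GF(17)."""
    return (a * b) % P

def gf17_inv(a):
    """Multiplicative inverse via Fermat's little theorem: a^(p-2) mod p."""
    a = a % P
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse in GF(17)")
    return pow(a, P - 2, P)

# Every non-zero element has an inverse: a * inv(a) == 1 (mod 17).
assert all(gf17_mul(a, gf17_inv(a)) == 1 for a in range(1, P))
```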

HARDWARE ACCELERATION

Values 0–16 map to a [CLASSIFIED HARDWARE FORMAT]. GPU [CLASSIFIED HARDWARE UNITS] perform [CLASSIFIED OPERATION] for free — dedicated silicon, zero ALU cost.

PROGRESSIVE REFINEMENT

Multi-layer residuals compose coherently in the same field. Decode at any layer for quality/speed tradeoff — from 2-bit preview to 16-bit exact.
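
The proprietary ATRP and AmniTex layer structures are not disclosed; the sketch below only illustrates the general idea of layer-by-layer residual digits (base-17 digits here) that can be decoded at any depth, with each extra layer refining the reconstruction by roughly a factor of 17. The normalization scheme and function names are illustrative assumptions, not the shipped codecs:

```python
import numpy as np

LEVELS = 17  # one digit per layer, drawn from {0, ..., 16}

def encode_progressive(x, n_layers=4):
    """Generic layered residual encoding (illustrative sketch only, not the
    proprietary ATRP): each layer stores one base-17 digit of the normalized
    value, so every additional layer refines the reconstruction ~17x."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) or 1.0
    residual = (x - lo) / scale                      # normalize to [0, 1]
    layers = []
    for _ in range(n_layers):
        digits = np.clip(np.floor(residual * LEVELS), 0, LEVELS - 1)
        layers.append(digits.astype(np.uint8))
        residual = residual * LEVELS - digits        # hand the remainder down
    return layers, lo, scale

def decode_progressive(layers, lo, scale, depth=None):
    """Decode using only the first `depth` layers: coarse preview -> fine."""
    depth = len(layers) if depth is None else depth
    value = np.zeros(layers[0].shape, dtype=np.float64)
    for d in range(depth):
        value += layers[d] / float(LEVELS) ** (d + 1)
    return value * scale + lo

x = np.random.randn(896, 896).astype(np.float32)
layers, lo, scale = encode_progressive(x)
for depth in (1, 2, 3, 4):
    err = np.max(np.abs(decode_progressive(layers, lo, scale, depth) - x))
    print(f"decode depth {depth}: max abs error {err:.2e}")
```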

ZERO DRIFT

Cumulative sums cycle within {0, …, 16}. Linear attention accumulations stay exact regardless of sequence length. fp16 drifts; GF(17) doesn’t.

CODEC FAMILY

BASELINE • 4-BIT

GF(17) LEGENDRE

Uniform min-max quantization to {0, …, 16} with [CLASSIFIED PACKING]. Fastest encode (3 ms) but collapses on heavy-tail data. A non-proprietary sketch of this mapping and of the contextual variant follows the codec overview below.

4.0 bits/e · 3 ms enc · 0.98 CosSim (normal)

CONTEXTUAL • 5-BIT

CTX GF(17) 2σ

Two-segment quantization: the inner ±2σ region and the outer tails are quantized independently. Better tail coverage at a slight bit-rate overhead.

5.2 bits/e · 8 ms enc · 0.997 CosSim (normal)

DISTRIBUTION-ADAPTIVE • 4-BIT

MEMBRANE GF(17)

[PROPRIETARY ADAPTIVE FIELD] steers quantization boundaries toward stable regions. [CLASSIFIED ACCELERATION] achieves a ≈12× encode speedup. Excels on heavy tails.

4.0 bits/e · 17 ms enc · 0.939 CosSim (heavy-tail)

PROGRESSIVE • 2–16 BIT

ATRP

Adaptive [CLASSIFIED] Residual Packing: layer-by-layer residual encoding in GF(17). At 4 layers × 4 bits = 16 bits, it achieves bit-exact reconstruction (SNR 85 dB).

2–16 bits/e · 7 ms enc (4L) · 1.000 CosSim (4×4L)

MULTI-RESOLUTION • 16-BIT

AMNITEX PROGRESSIVE

Coarse-to-fine hierarchy with 4 resolution levels. Full progressive decode — stop at any level for level-of-detail (LOD) rendering.

16.0 bits/e · 18 ms enc · 1.000 CosSim (normal)

MEMBRANE + PROGRESSIVE • 16-BIT

MEMBRANE ×4L

Membrane GF(17) applied at each progressive depth. Best heavy-tail progressive: 0.996 CosSim at 16 bits.

16.0 bits/e · 69 ms enc · 0.996 CosSim (heavy-tail)
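
For orientation, here is a non-proprietary sketch of the two simplest ideas above: the uniform min-max mapping onto the 17 field values (the GF(17) Legendre baseline) and the two-segment ±2σ split behind Ctx GF17. The Legendre construction and the [CLASSIFIED PACKING] are deliberately omitted; function names and the demo data are illustrative only:

```python
import numpy as np

def quantize_uniform_gf17(x):
    """Baseline idea behind 'GF(17) Leg 4b': uniform min-max mapping of the
    weights onto the 17 field values {0, ..., 16}. (Illustrative sketch only;
    the Legendre construction and the [CLASSIFIED PACKING] are not shown.)"""
    lo, hi = float(x.min()), float(x.max())
    scale = ((hi - lo) / 16.0) or 1.0
    codes = np.rint((x - lo) / scale).astype(np.uint8)     # field values 0..16
    return codes, lo, scale

def dequantize_uniform_gf17(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

def quantize_two_segment(x, k=2.0):
    """Sketch of the 'Ctx GF17 (2σ)' idea: the inner ±kσ region and the outer
    tails are quantized on independent uniform grids for better tail coverage."""
    mu, sigma = float(x.mean()), float(x.std())
    inner_mask = np.abs(x - mu) <= k * sigma
    inner = quantize_uniform_gf17(x[inner_mask])
    outer = quantize_uniform_gf17(x[~inner_mask])
    return inner_mask, inner, outer

# Uniform baseline on well-behaved (normal) weights:
x = np.random.randn(896, 896).astype(np.float32)
codes, lo, scale = quantize_uniform_gf17(x)
x_hat = dequantize_uniform_gf17(codes, lo, scale)
print("max abs error:", float(np.max(np.abs(x - x_hat))))  # ~half a bin width
```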

MEMBRANE M6 — CORE ALGORITHM

The Membrane codec applies [PROPRIETARY EQUATION] from the Holographic Resistance Membrane framework. Instead of uniform bin spacing, it models a [CLASSIFIED ADAPTIVE FIELD] derived from local properties of the data distribution:

EQUATION M6 — [CLASSIFIED]
$$w(x) = \frac{h(x)}{\text{\small[PROPRIETARY WEIGHTING FUNCTION]}} \qquad R(x) = \text{\small[PROPRIETARY FIELD DERIVATION]}$$

[CLASSIFIED MECHANISM] regions generate high resistance R(x), causing their weight w(x) to decrease. This steers quantization boundaries toward stable, high-density regions where added precision yields the greatest cosine-similarity gain.
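
The M6 field derivation itself is [CLASSIFIED]. Purely as a generic illustration of the underlying idea — concentrating quantization levels where the data is dense — a quantile-based codebook behaves as sketched below. This is not the Membrane algorithm; the codebook rule, names, and heavy-tail test distribution are assumptions for illustration:

```python
import numpy as np

def density_adaptive_codebook(x, levels=17):
    """Generic illustration of density-adaptive quantization (NOT the redacted
    Membrane M6 field): place the 17 reconstruction levels at evenly spaced
    quantiles, so bins concentrate where the data is dense."""
    qs = (np.arange(levels) + 0.5) / levels
    return np.quantile(x, qs)                     # 17 sorted codebook centers

def quantize_to_codebook(x, centers):
    """Map every weight to its nearest codebook center (codes in {0, ..., 16})."""
    idx = np.searchsorted(centers, x).clip(1, len(centers) - 1)
    left, right = centers[idx - 1], centers[idx]
    codes = np.where(np.abs(x - left) <= np.abs(x - right), idx - 1, idx)
    return codes.astype(np.uint8)

# Heavy-tail example: adaptive bins track the dense core instead of the tails.
rng = np.random.default_rng(0)
x = rng.standard_t(df=2, size=802_816).astype(np.float32)
centers = density_adaptive_codebook(x)
codes = quantize_to_codebook(x, centers)
x_hat = centers[codes]
cos = float(x @ x_hat / (np.linalg.norm(x) * np.linalg.norm(x_hat)))
print("cosine similarity (heavy-tail, adaptive bins):", round(cos, 4))
```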

STEP 1 — [CLASSIFIED]

[PROPRIETARY INITIALIZATION PROCEDURE — 5-STEP PIPELINE REDACTED]

STEP 2 — [CLASSIFIED]

[PROPRIETARY BOUNDARY PLACEMENT METHOD]

STEP 3 — [CLASSIFIED]

[PROPRIETARY ITERATIVE REFINEMENT]

STEP 4 — [CLASSIFIED]

[PROPRIETARY ACCELERATION TECHNIQUE]

STEP 5 — [CLASSIFIED]

[PROPRIETARY PACKING METHOD]

The [CLASSIFIED OPTIMIZATION] (v3.108.0) cut encode time from roughly 200 ms to 17 ms (an ≈12× speedup) while slightly improving quality through [CLASSIFIED TECHNIQUE].

BENCHMARK RESULTS

SINGLE-SHOT CODECS — 4-BIT OPERATING POINT

All benchmarks use an 896×896 weight matrix (802,816 elements). Timings are CPU measurements with Python 3.14 and NumPy 2.4.3.

Codec             Bits/e   Ratio   Enc (ms)   Dec (ms)   CosSim (normal)   CosSim (heavy-tail)   SNR (dB)
fp16 (baseline)   16.00    1.0×    1.2        0.8        1.0000            1.0000                73.7
TQ turbo4          4.25    3.8×    25.4       21.5       0.9914            0.9907                17.6
TQ turbo3          3.50    4.6×    26.1       21.2       0.9728            0.9710                12.5
GF(17) Leg 4b      4.00    4.0×    3.2        2.0        0.9843            0.3732                14.9
Ctx GF17 (2σ)      5.18    3.1×    8.4        3.6        0.9965            0.7965                21.5
Membrane GF17      4.00    4.0×    17.3       2.8        0.9911            0.9391                17.4
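
The CosSim and SNR columns follow standard definitions; a minimal sketch of how such figures can be computed is shown below (the exact benchmark harness and the test data distributions are not part of this disclosure, so the example simply reproduces the fp16 baseline row qualitatively):

```python
import numpy as np

def cosine_similarity(original, reconstructed):
    """Cosine similarity between flattened original and reconstructed
    weight matrices -- the 'CosSim' figure reported in the tables."""
    a = original.ravel().astype(np.float64)
    b = reconstructed.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def snr_db(original, reconstructed):
    """Signal-to-noise ratio in dB: 10*log10(signal power / error power)."""
    signal = np.mean(original.astype(np.float64) ** 2)
    noise = np.mean((original.astype(np.float64) - reconstructed) ** 2)
    return float(10.0 * np.log10(signal / noise)) if noise > 0 else float("inf")

# Example: the fp16 baseline (round-trip through half precision).
x = np.random.randn(896, 896).astype(np.float32)
x_fp16 = x.astype(np.float16).astype(np.float32)
print("CosSim:", round(cosine_similarity(x, x_fp16), 4),
      "SNR:", round(snr_db(x, x_fp16), 1), "dB")
```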

[Interactive figure: quality vs. compression ratio across codecs]

PROGRESSIVE CODECS — 16-BIT OPERATING POINT

Codec               Bits/e   Enc (ms)   Dec (ms)   CosSim (normal)   CosSim (heavy-tail)   SNR (dB)
AmniTex full (4L)   16.0     18.2       8.8        1.0000            0.9998                43.1
ATRP 4b×4L          16.0     7.2        5.4        1.0000            1.0000                84.9
Tex3D 4-layer       16.0     15.0       8.8        1.0000            0.9998                43.1
Membrane ×4L        16.0     68.8       9.4        0.9999            0.9957                37.8

[Figure: heavy-tail resilience comparison]

HEAVY-TAIL ANALYSIS

Neural network weights — particularly in attention layers and embedding tables — frequently exhibit long-tailed distributions where uniform quantization wastes most bins on rare extreme values. The quality drop from normal to heavy-tail data reveals a codec’s robustness:

Codec           Normal CosSim   Heavy-Tail CosSim   Δ (quality drop)
GF(17) Leg 4b   0.9843          0.3732              0.611
TQ turbo4       0.9914          0.9907              0.001
Ctx GF17        0.9965          0.7965              0.200
Membrane GF17   0.9911          0.9391              0.052

TurboQuant’s random rotation preprocessing naturally redistributes tail energy across all coordinates, making it inherently distribution-agnostic (Δ = 0.001). The Membrane codec achieves comparable robustness (Δ = 0.052) through [CLASSIFIED] adaptation, while maintaining exact finite-field arithmetic that TurboQuant’s continuous-domain approach cannot provide.
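
A small stand-alone demonstration of that redistribution effect, using a dense QR-based random rotation as a stand-in (TurboQuant's own transform is described in reference 1 and is not reproduced here; the heavy-tail test distribution is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Heavy-tailed vectors: a few coordinates carry most of the energy.
x = rng.standard_t(df=5, size=(1000, 256))

# Random orthogonal rotation (QR of a Gaussian matrix). This dense stand-in
# just illustrates the effect: tail energy is spread across all coordinates,
# so the rotated entries look much more Gaussian.
q, _ = np.linalg.qr(rng.standard_normal((256, 256)))
x_rot = x @ q

def excess_kurtosis(v):
    """Excess kurtosis (0 for a Gaussian); large values indicate heavy tails."""
    z = (v.ravel() - v.mean()) / v.std()
    return float(np.mean(z ** 4) - 3.0)

print("excess kurtosis before rotation:", round(excess_kurtosis(x), 2))
print("excess kurtosis after rotation: ", round(excess_kurtosis(x_rot), 2))
```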

CUMULATIVE DRIFT

In linear attention mechanisms, key-value pairs accumulate over long sequences. Floating-point representations suffer progressive drift — GF(17) arithmetic does not:

Sequence Step   fp16 RMSE   GF(17) Drift
512             0.000760    0 (exact)
1,024           0.001482    0 (exact)
2,048           0.003029    0 (exact)
4,096           0.006138    0 (exact)

GF(17) operations cycle within {0, …, 16} by construction. There is zero accumulation error regardless of sequence length — a property that floating-point arithmetic fundamentally cannot achieve.
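
A quick way to observe both behaviors (the RMSE protocol behind the table above is internal; this sketch only shows the qualitative gap between a drifting fp16 accumulator and an exact mod-17 accumulator):

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(4096)

# fp16 running sum drifts away from the exact (float64) running sum ...
fp16_sum, exact_sum, fp16_err = np.float16(0.0), 0.0, []
for i, v in enumerate(values, start=1):
    fp16_sum = np.float16(fp16_sum + np.float16(v))
    exact_sum += float(v)
    if i in (512, 1024, 2048, 4096):
        fp16_err.append((i, abs(float(fp16_sum) - exact_sum)))

# ... while a mod-17 accumulation of integer codes is exact by construction.
codes = rng.integers(0, 17, size=4096)
mod_sum = 0
for c in codes:
    mod_sum = (mod_sum + int(c)) % 17
assert mod_sum == int(codes.sum()) % 17      # zero drift at any sequence length

for step, err in fp16_err:
    print(f"step {step}: |fp16 - exact| = {err:.6f}   GF(17) drift = 0")
```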

VRAM PROJECTIONS

Model       fp16        TQ turbo4   GF(17) Leg 4b   GF(17) + TQ KV
Qwen-0.5B   1,164 MB    1,035 MB    291 MB          291 MB
Qwen-3B     7,208 MB    6,321 MB    1,802 MB        1,802 MB
Qwen-7B     15,879 MB   14,499 MB   3,970 MB        3,970 MB

GF(17) compresses weights 4× independently of any activation-side compression; combining it with TurboQuant KV-cache compression yields the best of both.
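
The 4× figure can be sanity-checked with simple arithmetic: back-solving the table's fp16 column, 1,164 MB at 16 bits corresponds to roughly 610M parameters, and the same count at 4 bits gives the 291 MB shown in the GF(17) column. A minimal sketch (the parameter count is back-solved from the table, not taken from a model card):

```python
def weight_vram_mb(n_params, bits_per_element):
    """Projected weight storage in MB (MiB) for a given bit width."""
    return n_params * bits_per_element / 8 / 2**20

# ~610M parameters reproduces the Qwen-0.5B row: 1,164 MB at fp16, 291 MB at 4 bits.
n_params = 610_000_000
for name, bits in [("fp16", 16), ("GF(17) Leg 4b", 4)]:
    print(f"{name:>14}: {weight_vram_mb(n_params, bits):,.0f} MB")
```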

COMPARISON WITH TURBOQUANT

The two approaches are complementary rather than competing. They target different components of the inference pipeline:

TURBOQUANT / QJL / POLARQUANT

  • Targets KV-cache activations (online, streaming)
  • Data-oblivious random rotation preprocessing
  • Provable near-optimal distortion (within ≈2.7× of the information-theoretic lower bound)
  • 1-bit QJL residual for unbiased inner-product estimation
  • 8× speedup on H100 GPUs via CUDA kernels
  • Quality-neutral at 3.5 bits/channel
  • Continuous floating-point domain

GF(17) STRATEGIES (THIS WORK)

  • Targets model weights (offline, persistent storage)
  • Distribution-adaptive [CLASSIFIED] field
  • Exact finite-field arithmetic (zero drift by construction)
  • Progressive decode: 2-bit preview to 16-bit exact
  • [CLASSIFIED HARDWARE] acceleration (free [CLASSIFIED] MACs)
  • ATRP 4b×4L: bit-exact at 16 bits (SNR 85dB)
  • Discrete algebraic domain (mod-17)

WHERE EACH APPROACH SHINES

Scenario                                  Best Approach            Rationale
KV-cache compression (online inference)   TurboQuant               Data-oblivious, provable bounds, H100 CUDA kernels
Weight storage & loading                  GF(17) Leg 4b            4× compression, exact field, 3 ms encode, [CLASSIFIED] compatible
Heavy-tail weight distributions           Membrane GF(17)          Distribution-adaptive, 0.939 vs. 0.373 (uniform) at 4 bits
Progressive level-of-detail               ATRP / AmniTex           Multi-layer GF(17) residuals, stop at any layer
Long-sequence linear attention            GF(17) cumulative        Zero drift, exact mod-17 arithmetic at any sequence length
Lossless at minimum bits                  ATRP 4b×4L               Bit-exact reconstruction, SNR 85 dB, 16 bits in an algebra-exact field
Combined deployment                       GF(17) weights + TQ KV   4× weight compression + KV-cache compression in parallel

LIMITATIONS

ENCODE OVERHEAD

Membrane GF(17) encodes in 17 ms on CPU, while TurboQuant reports “negligible” runtime overhead on H100 GPUs. A direct speed comparison across a CPU and a datacenter GPU is not meaningful.

DISTRIBUTION SENSITIVITY

Base GF(17) Legendre collapses on heavy-tail data (CosSim 0.37). The Membrane variant solves this but at higher encode cost. TurboQuant’s rotation preprocessing handles all distributions uniformly by design.

THEORETICAL GAP

TurboQuant comes with formal proofs that its distortion is within a constant factor of the information-theoretic lower bound. GF(17) codecs are empirically validated but lack an equivalent formal analysis.

INFERENCE INTEGRATION

TurboQuant integrates directly into attention computation. GF(17) weight quantization requires dequantization before standard compute, unless [CLASSIFIED HARDWARE ACCELERATION] is used.

[Figure: radar chart, multi-dimensional comparison of the codec families]

CITATION

Reffelt, A. (2026). GF(17) Quantization Strategies for Neural Network Weight Compression: Finite-Field Arithmetic, Proprietary Optimization, and Hardware-Accelerated Packing. AMNI-SCIENT Technical Report. https://amni-scient.com/research/gf17-quantization

REFERENCES

  1. Zandieh, A., Daliri, M., Hadian, M., & Mirrokni, V. (2025). TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate. ICLR 2026. arXiv:2504.19874
  2. Zandieh, A., Daliri, M., & Han, I. (2024). QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead. AAAI 2025. arXiv:2406.03482
  3. Han, I., Kacham, P., Karbasi, A., Mirrokni, V., & Zandieh, A. (2025). PolarQuant: Quantizing KV Caches with Polar Transformation. AISTATS 2026. arXiv:2502.02617
  4. Reffelt, A. (2019–2026). Holographic Resistance Membrane. AMNI-SCIENT framework page.
  5. Reffelt, A. (2019–2026). Toroidal Manifold Geometry. AMNI-SCIENT framework page.