Benchmark Evidence

MemQ benchmark, exactly as generated.

This page draws on the latest benchmark evidence published in the standalone MemQ performance repo and mirrored into this site during deploy. In the current retrieval run, MemQ's lead over Mem0 OSS is not marginal: it ranks the right memory first more often (100% versus 58% primary@1), stays leakage-free, and returns results about 193.2x faster on average (13 ms versus 2511 ms).

Figure: MemQ semantic compression artwork showing raw history compressed into active context for model handoff. This is a campaign visual; the benchmark evidence below is the published source of record for current public metrics.

MemQ primary@1: 100%
Mem0 OSS reaches 58% on the same cases, giving MemQ a +42 point lead.

Recall@K: 100%
MemQ matches Mem0 OSS recall while ranking the right memory first more often.

Delta vs Mem0: +42 pts
Published primary@1 uplift in the improved-v4 retrieval snapshot.

MemQ avg retrieval: 13 ms
Mem0 OSS averages 2511 ms in the same harness, about 193.2x slower.
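
The page does not define its retrieval metrics, so the following is only a minimal sketch of how scores like these are commonly computed. It assumes primary@1 means "the expected memory is ranked first", recall@K means "the expected memory appears in the top K", and leakage-free means "no top-K result comes from another namespace". The Case structure, its field names, and the toy data are hypothetical and are not taken from the MemQ harness.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One retrieval case (hypothetical shape, not the MemQ harness schema)."""
    expected_id: str   # memory that should be retrieved
    namespace: str     # namespace the query belongs to
    results: list      # ranked list of (memory_id, namespace) tuples


def primary_at_1(cases):
    """Fraction of cases whose top-ranked result is the expected memory."""
    hits = sum(1 for c in cases if c.results and c.results[0][0] == c.expected_id)
    return hits / len(cases)


def recall_at_k(cases, k):
    """Fraction of cases whose expected memory appears anywhere in the top k."""
    hits = sum(
        1 for c in cases
        if c.expected_id in [memory_id for memory_id, _ in c.results[:k]]
    )
    return hits / len(cases)


def leakage_free_rate(cases, k):
    """Fraction of cases whose top k results all stay in the query's namespace."""
    clean = sum(
        1 for c in cases
        if all(ns == c.namespace for _, ns in c.results[:k])
    )
    return clean / len(cases)


# Toy data for illustration only; these are not benchmark cases.
cases = [
    Case("m1", "ops", [("m1", "ops"), ("m2", "ops")]),  # right memory ranked first
    Case("m3", "ops", [("m4", "ops"), ("m3", "ops")]),  # recalled, but ranked second
    Case("m5", "dev", [("m5", "dev"), ("m9", "ops")]),  # ranked first, but leaks "ops"
]

print("primary@1:", primary_at_1(cases))
print("recall@2:", recall_at_k(cases, 2))
print("leakage-free@2:", leakage_free_rate(cases, 2))
```

The second toy case shows why recall@K and primary@1 can diverge: the expected memory is retrieved, so recall is unaffected, but it is not ranked first. That is the pattern the cards above describe for Mem0 OSS.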

Reproducibility envelope

Generated: 2026-04-28T18:02:44.970Z
Evaluation config: Fixed benchmark model and retrieval settings
Repetitions: 1, 2, 3
Benchmark type: Retrieval benchmark
Artifact count: 108
Case corpus: 12 cases across 3 namespaces
Latency: MemQ avg 13 ms · p95 25 ms · Mem0 avg 2511 ms
LLM answer corpus: 12 cases · 48 condition results · same fixed model configuration

What this shows today

  • MemQ leads raw retrieval quality by a wide margin: 100% primary@1 versus 58% for the Mem0 OSS adapter, a +42 point lead.
  • MemQ keeps recall@K at 100% while reaching 100% leakage-free retrieval; Mem0 OSS matches recall but drops to 67% leakage-free.
  • MemQ is dramatically faster in the retrieval benchmark: 13 ms average and 25 ms p95 versus 2511 ms average for the Mem0 OSS adapter, roughly 193.2x lower average latency.
  • MemQ context materially improves final answer quality versus no memory: 0% to 75% answer pass, a +75 point lift with the same fixed model configuration.
  • The benchmark harness is real, versioned, repeated, and published through the public performance repo. These numbers are a current snapshot, not a universal claim about every workload.
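
The derived figures in the list above follow from the published headline numbers by simple arithmetic. The sketch below recomputes them from the rounded values shown on this page; the variable names are illustrative, and the published ratio may have been computed from unrounded latencies.

```python
# Headline values as published on this page (percentages and milliseconds).
memq_primary_at_1 = 100
mem0_primary_at_1 = 58
memq_avg_ms = 13
mem0_avg_ms = 2511
answer_pass_no_memory = 0
answer_pass_with_memq = 75

# Percentage-point differences are plain subtraction.
primary_delta_pts = memq_primary_at_1 - mem0_primary_at_1
answer_lift_pts = answer_pass_with_memq - answer_pass_no_memory

# The speedup is the ratio of average latencies.
latency_ratio = mem0_avg_ms / memq_avg_ms

print(f"primary@1 lead: +{primary_delta_pts} pts")
print(f"answer pass lift: +{answer_lift_pts} pts")
print(f"average latency ratio: {latency_ratio:.1f}x")
```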

Next validation target

Paired Codex token-cost benchmark

The current artifacts show retrieval quality, latency, leakage posture, and answer quality. They do not yet measure token cost. The next validation should measure it directly: run the same Codex task once without MemQ and once with MemQ, then compare token usage and task outcome.

Step 1: Baseline Codex
Run the task in a Codex environment with no MemQ tools installed and no retrieved memory context.

Step 2: MemQ Codex
Run the same task in a Codex environment with MemQ installed, using the same prompt, repo, budget, and stopping rule.

Step 3: Usage capture
Record prompt tokens, completion tokens, tool calls, repeated context, elapsed time, and whether the task reached the same quality bar.

Step 4: Savings report
Publish paired traces and compute token reduction, cost reduction, time reduction, and quality deltas without replacing failed runs.

Status: protocol defined on the site; paired live Codex runs still need to be executed before Multinex can publish a token-savings percentage.
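
As a rough illustration of the savings report in Step 4, the sketch below aggregates paired runs into token, time, and pass-rate deltas. The RunUsage record, the function names, and every number in the example are hypothetical placeholders, not measurements; no paired Codex runs have been published. Failed runs stay in the totals, as the protocol requires. Cost reduction is left out because it depends on per-token prices that this page does not state.

```python
from dataclasses import dataclass


@dataclass
class RunUsage:
    """Usage captured for one run (hypothetical record shape)."""
    prompt_tokens: int
    completion_tokens: int
    tool_calls: int
    elapsed_s: float
    passed: bool  # whether the run reached the agreed quality bar

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens


def pct_reduction(baseline, treated):
    """Percent reduction relative to baseline, or None if baseline is zero."""
    if baseline == 0:
        return None
    return 100.0 * (baseline - treated) / baseline


def paired_report(pairs):
    """Summarize (baseline, memq) run pairs; failed runs are kept, not replaced."""
    base_runs = [b for b, _ in pairs]
    memq_runs = [m for _, m in pairs]
    base_pass = sum(r.passed for r in base_runs) / len(pairs)
    memq_pass = sum(r.passed for r in memq_runs) / len(pairs)
    return {
        "pairs": len(pairs),
        "token_reduction_pct": pct_reduction(
            sum(r.total_tokens for r in base_runs),
            sum(r.total_tokens for r in memq_runs),
        ),
        "time_reduction_pct": pct_reduction(
            sum(r.elapsed_s for r in base_runs),
            sum(r.elapsed_s for r in memq_runs),
        ),
        "baseline_pass_rate": base_pass,
        "memq_pass_rate": memq_pass,
        "pass_rate_delta_pts": 100.0 * (memq_pass - base_pass),
    }


# Placeholder pairs, invented purely to exercise the code.
pairs = [
    (RunUsage(1000, 200, 5, 60.0, True), RunUsage(600, 200, 3, 45.0, True)),
    (RunUsage(2000, 400, 8, 120.0, False), RunUsage(1500, 300, 6, 90.0, True)),
]
report = paired_report(pairs)
for key, value in report.items():
    print(key, value)
```

Summing tokens across all pairs before taking the ratio weights large tasks more heavily than small ones. Averaging per-pair reductions would be an equally defensible choice and is worth stating explicitly in any published report.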

Published case families

The current corpus spans real retrieval families rather than a single toy fixture: freshness, runbooks, incidents, project memory, procedural recall, preference memory, and disambiguation across multiple namespaces.

disambiguation, freshness, incident, preference, procedural, project_memory, runbook

Published benchmark assets

The public evidence surface is built from the exact published snapshots and summaries that ship with the MemQ performance repo. Download the machine-readable snapshots or the summary markdown for the retrieval and LLM answer benchmarks directly from this site.

curl -L -O https://multinex.ai/downloads/memq/memq-benchmark-snapshot.json
curl -L -O https://multinex.ai/downloads/memq/benchmark-summary.md
curl -L -O https://multinex.ai/downloads/memq/memq-llm-snapshot.json
curl -L -O https://multinex.ai/downloads/memq/memq-llm-summary.md
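
For readers who prefer to inspect the snapshot programmatically, here is a minimal Python sketch that fetches the retrieval snapshot URL listed above and reports its top-level structure. The snapshot schema is not documented on this page, so the code deliberately assumes nothing about field names; fetch_json and describe_top_level are illustrative helpers, not part of any MemQ tooling.

```python
import json
import urllib.request

SNAPSHOT_URL = "https://multinex.ai/downloads/memq/memq-benchmark-snapshot.json"


def fetch_json(url, timeout=30):
    """Download a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def describe_top_level(doc):
    """Map each top-level key to the type name of its value.

    Makes no assumptions about the schema; a non-object root is
    reported under the placeholder key "<root>".
    """
    if isinstance(doc, dict):
        return {key: type(value).__name__ for key, value in doc.items()}
    return {"<root>": type(doc).__name__}


# Usage (requires network access):
#   snapshot = fetch_json(SNAPSHOT_URL)
#   for key, kind in describe_top_level(snapshot).items():
#       print(key, kind)
```

Listing the top-level keys first is a cautious way to find out where the headline metrics live before writing any code that depends on specific fields.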
