Benchmark Evidence
MemQ benchmark, exactly as generated.
This page is sourced from the latest benchmark evidence published in the standalone MemQ performance repo and mirrored into this site during deploy. In the current retrieval run, MemQ's lead over Mem0 OSS is not marginal: it ranks the right memory first more often (100% versus 58% primary@1), stays leakage-free, and returns results about 193.2x faster on average (13 ms versus 2511 ms).

Campaign visual shown for semantic compression. The benchmark evidence below is the published source of record for current public metrics.
MemQ primary@1: 100%
Mem0 OSS reaches 58% on the same cases, giving MemQ a +42 point lead.
Recall@K: 100%
MemQ matches Mem0 OSS recall while ranking the right memory first more often.
Delta vs Mem0: +42 pts
Published primary@1 uplift in the improved-v4 retrieval snapshot.
MemQ avg retrieval: 13 ms
Mem0 OSS averages 2511 ms in the same harness, about 193.2x slower.
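The two derived figures in the cards above (the +42 point lead and the roughly 193.2x latency ratio) follow from simple arithmetic on the published values. The sketch below recomputes them from the rounded numbers shown on this page; the function names are illustrative, and the published ratio may have been computed from unrounded latencies.

```python
# Published values copied from the cards above (rounded as shown on the page).
MEMQ_PRIMARY_AT_1_PCT = 100
MEM0_PRIMARY_AT_1_PCT = 58
MEMQ_AVG_MS = 13
MEM0_AVG_MS = 2511


def point_delta(a_pct, b_pct):
    """Difference between two percentages, in percentage points."""
    return a_pct - b_pct


def latency_ratio(slow_ms, fast_ms):
    """How many times larger the slow average is than the fast one."""
    return slow_ms / fast_ms


lead_pts = point_delta(MEMQ_PRIMARY_AT_1_PCT, MEM0_PRIMARY_AT_1_PCT)
ratio = latency_ratio(MEM0_AVG_MS, MEMQ_AVG_MS)

print(f"primary@1 lead: +{lead_pts} pts")
print(f"average latency ratio: {ratio:.1f}x")
```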
Reproducibility envelope
- Generated: 2026-04-28T18:02:44.970Z
- Evaluation config: Fixed benchmark model and retrieval settings
- Repetitions: 1, 2, 3
- Benchmark type: Retrieval benchmark
- Artifact count: 108
- Case corpus: 12 cases across 3 namespaces
- Latency: MemQ avg 13 ms · p95 25 ms · Mem0 avg 2511 ms
- LLM answer corpus: 12 cases · 48 condition results · same fixed model configuration
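The latency line in the envelope reports an average and a p95. The page does not say which percentile method the harness uses, so the sketch below assumes the nearest-rank method, and the sample latencies are invented for illustration rather than taken from the published artifacts.

```python
def average_ms(samples):
    """Arithmetic mean of latency samples in milliseconds."""
    if not samples:
        raise ValueError("need at least one latency sample")
    return sum(samples) / len(samples)


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100.

    Assumption: the real harness may use a different (e.g. interpolating)
    method; nearest-rank is used here only because it is easy to verify.
    """
    if not samples:
        raise ValueError("need at least one latency sample")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical per-query latencies in ms, not published data.
example_samples = [9, 10, 11, 12, 12, 13, 13, 14, 15, 24]

print("avg ms:", average_ms(example_samples))
print("p95 ms:", nearest_rank_percentile(example_samples, 95))
```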
What this shows today
- MemQ leads raw retrieval quality by a wide margin: 100% primary@1 versus 58% for the Mem0 OSS adapter, a +42 point lead.
- MemQ keeps recall@K at 100% while reaching 100% leakage-free retrieval; Mem0 OSS matches recall but drops to 67% leakage-free.
- MemQ is much faster in the retrieval benchmark: 13 ms average and 25 ms p95, versus 2511 ms average for the Mem0 OSS adapter. That puts Mem0's average latency at roughly 193.2x MemQ's.
- MemQ context materially improves final answer quality versus no memory: 0% to 75% answer pass, a +75 point lift with the same fixed model configuration.
- The benchmark harness is real, versioned, repeated, and published through the public performance repo. These numbers are a current snapshot, not a universal claim about every workload.
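The bullets above rely on three per-case metrics: primary@1, recall@K, and leakage-free retrieval. This page does not spell out how the harness scores them, so the sketch below uses assumed definitions: a case passes primary@1 if the expected memory is ranked first, passes recall@K if it appears in the top K, and is leakage-free if every returned memory belongs to the querying namespace. The `CaseResult` type and the toy cases are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CaseResult:
    namespace: str                  # namespace the query was issued in
    expected_id: str                # memory that should rank first
    ranked: List[Tuple[str, str]]   # (memory_id, namespace), best first


def primary_at_1(cases):
    """Share of cases whose top-ranked memory is the expected one."""
    hits = sum(1 for c in cases if c.ranked and c.ranked[0][0] == c.expected_id)
    return hits / len(cases)


def recall_at_k(cases, k):
    """Share of cases whose expected memory appears in the top k."""
    hits = sum(
        1 for c in cases
        if c.expected_id in [memory_id for memory_id, _ in c.ranked[:k]]
    )
    return hits / len(cases)


def leakage_free(cases):
    """Share of cases that returned no memory from another namespace."""
    clean = sum(
        1 for c in cases
        if all(ns == c.namespace for _, ns in c.ranked)
    )
    return clean / len(cases)


# Toy cases for illustration only, not the published corpus.
toy_cases = [
    CaseResult("a", "m1", [("m1", "a"), ("m2", "a")]),  # hit, clean
    CaseResult("a", "m3", [("m4", "a"), ("m3", "a")]),  # ranked second, clean
    CaseResult("b", "m5", [("m5", "b"), ("m9", "c")]),  # hit, but leaks "c"
]

print("primary@1:", primary_at_1(toy_cases))
print("recall@2:", recall_at_k(toy_cases, 2))
print("leakage-free:", leakage_free(toy_cases))
```

Under these assumed definitions, a system can match another on recall@K while trailing on primary@1, which is the pattern the bullets describe for Mem0 OSS.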
Next validation target
Paired Codex token-cost benchmark
The current artifacts show retrieval quality, latency, leakage posture, and answer quality. The next cost validation should measure the exact idea from the product discussion: the same Codex task run once without MemQ and once with MemQ, then compare usage tokens and task outcome.
Step 1: Baseline Codex
Run the task in a Codex environment with no MemQ tools installed and no retrieved memory context.
Step 2: MemQ Codex
Run the same task in a Codex environment with MemQ installed, using the same prompt, repo, budget, and stopping rule.
Step 3: Usage capture
Record prompt tokens, completion tokens, tool calls, repeated context, elapsed time, and whether the task reached the same quality bar.
Step 4: Savings report
Publish paired traces and compute token reduction, cost reduction, time reduction, and quality deltas without replacing failed runs.
Status: protocol defined on the site; paired live Codex runs still need to be executed before Multinex can publish a token-savings percentage.
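The savings report described in the protocol above could be computed along these lines. This is a minimal sketch under stated assumptions: the `Run` record, the per-token prices, and the example pairs are all hypothetical, and no live Codex results exist yet. Failed runs stay in the totals, matching the rule that failed runs are not replaced.

```python
from dataclasses import dataclass


@dataclass
class Run:
    prompt_tokens: int
    completion_tokens: int
    elapsed_s: float
    passed: bool  # did the run reach the agreed quality bar?


def reduction_pct(baseline, treated):
    """Percent reduction of treated relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - treated) / baseline


def savings_report(pairs, prompt_price, completion_price):
    """Summarize paired (baseline, memq) runs; failed runs are kept."""
    if not pairs:
        raise ValueError("need at least one paired run")
    base = [b for b, _ in pairs]
    memq = [m for _, m in pairs]

    def tokens(runs):
        return sum(r.prompt_tokens + r.completion_tokens for r in runs)

    def cost(runs):
        return sum(
            r.prompt_tokens * prompt_price
            + r.completion_tokens * completion_price
            for r in runs
        )

    def seconds(runs):
        return sum(r.elapsed_s for r in runs)

    def pass_rate_pct(runs):
        return 100.0 * sum(r.passed for r in runs) / len(runs)

    return {
        "token_reduction_pct": reduction_pct(tokens(base), tokens(memq)),
        "cost_reduction_pct": reduction_pct(cost(base), cost(memq)),
        "time_reduction_pct": reduction_pct(seconds(base), seconds(memq)),
        "quality_delta_pts": pass_rate_pct(memq) - pass_rate_pct(base),
    }


# Invented example pairs: (baseline run, MemQ run) for the same task.
EXAMPLE_PAIRS = [
    (Run(1000, 200, 60.0, True), Run(600, 200, 45.0, True)),
    (Run(2000, 400, 120.0, False), Run(1200, 400, 75.0, True)),
]

# Prices are placeholder units per token, not real model pricing.
report = savings_report(EXAMPLE_PAIRS, prompt_price=1.0, completion_price=2.0)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Reporting the quality delta alongside the token figures keeps a token saving from hiding a drop in task outcome.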
Published case families
The current corpus spans real retrieval families rather than a single toy fixture: freshness, runbooks, incidents, project memory, procedural recall, preference memory, and disambiguation across multiple namespaces.
Published benchmark assets
The public evidence surface is made from the exact published snapshot and summary that ship with the MemQ performance repo. Download the machine-readable snapshot or the summary markdown directly from this site.
curl -L https://multinex.ai/downloads/memq/memq-benchmark-snapshot.json
curl -L https://multinex.ai/downloads/memq/benchmark-summary.md
curl -L https://multinex.ai/downloads/memq/memq-llm-snapshot.json
curl -L https://multinex.ai/downloads/memq/memq-llm-summary.md
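For readers who prefer to inspect the machine-readable snapshot programmatically, a small sketch like the following may help. The snapshot's schema is not documented on this page, so the code assumes nothing about field names and only lists whatever top-level keys it finds. The helper names are illustrative.

```python
import json
from urllib.request import urlopen

# URL taken from the download commands above.
SNAPSHOT_URL = "https://multinex.ai/downloads/memq/memq-benchmark-snapshot.json"


def fetch_json(url, timeout=30):
    """Download and parse a JSON document."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def describe_top_level(doc):
    """Return sorted (key, type name) pairs without assuming a schema."""
    if not isinstance(doc, dict):
        return [("<root>", type(doc).__name__)]
    return sorted((key, type(value).__name__) for key, value in doc.items())


# Usage (performs a network request, so it is left commented out):
# for key, kind in describe_top_level(fetch_json(SNAPSHOT_URL)):
#     print(key, kind)
```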
Install the memory loop
Benchmarks matter most when developers can reproduce the memory loop.
The starter repos turn benchmark context into a working save, restart, and recall path. Use them to verify that MemQ is a memory layer your agents can actually call.
First install proof: memq-mcp-starter
Minimal MCP memory setup with a save, restart, and recall continuity test.
Team memory: memq-team-handoff-starter
Team namespace handoff flow for developer teams that need shared agent context.
Editor onboarding: memq-cursor-claude-starter
Editor-focused setup for Cursor, Claude, VS Code, and Antigravity.
Workflow agents: memq-n8n-agent-starter
Workflow-agent memory example for n8n MCP Client and MCP Client Tool nodes.
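The save, restart, and recall continuity test mentioned for the starters has a simple general shape: write a memory, discard the client, create a fresh client, and confirm the memory is still retrievable. The sketch below illustrates that shape only. `FileMemory` is a hypothetical stand-in backed by a local JSON file; it is not the MemQ API or the starter repos' code.

```python
import json
import os
import tempfile


class FileMemory:
    """Hypothetical stand-in store (NOT the MemQ API): persists to a JSON file."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

    def save(self, namespace, key, text):
        data = self._load()
        data.setdefault(namespace, {})[key] = text
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(data, f)

    def recall(self, namespace, key):
        return self._load().get(namespace, {}).get(key)


def continuity_check(path):
    """Save with one client, 'restart' with a new one, then recall."""
    note = "use the blue cluster"
    first = FileMemory(path)
    first.save("team", "deploy-note", note)
    del first  # simulate the process or agent restarting

    second = FileMemory(path)  # fresh client, same backing store
    return second.recall("team", "deploy-note") == note


with tempfile.TemporaryDirectory() as tmp:
    ok = continuity_check(os.path.join(tmp, "memory.json"))
    print("continuity preserved:", ok)
```

With a real memory layer, the save and recall calls would go through its own client or MCP tools, but the pass condition stays the same: the recalled text after restart equals what was saved before it.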