
Benchmarks (beta-0.5)

All numbers come from the published benchmark runs; none are synthetic placeholders. The medical reports live in cef-framework/BENCHMARK_REPORT.md and cef-framework/BENCHMARK_REPORT_2.md, and the SAP report in cef-framework/SAP_BENCHMARK_REPORT.md.


Medical Clinical Decision Support (Core Suite)

| Scenario | Vector-Only (chunks) | Knowledge Model (chunks) | Lift | Latency Vector | Latency KM |
| --- | --- | --- | --- | --- | --- |
| Multi-hop contraindication discovery | 5 | 12 | +140% | 22 ms | 23 ms |
| High-risk behavioral pattern | 5 | 8 | +60% | 21 ms | 22 ms |
| Cascading side-effect risk | 5 | 8 | +60% | 18 ms | 23 ms |
| Transitive exposure (shared doctors) | 5 | 16 | +220% | 26 ms | 24 ms |
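
The Lift column is consistent with a simple relative gain in retrieved chunks over the vector-only baseline. The reports do not spell out the formula, so the sketch below assumes lift = (KM chunks - vector chunks) / vector chunks; the function name `lift_percent` and the `CORE_MEDICAL` dictionary are illustrative, not part of the framework.

```python
def lift_percent(vector_chunks: int, km_chunks: int) -> float:
    """Relative gain in retrieved chunks over the vector-only baseline, in percent.

    Assumed definition: the benchmark reports do not state the formula.
    """
    if vector_chunks <= 0:
        raise ValueError("baseline chunk count must be positive")
    # Multiply before dividing so the table values come out exact.
    return (km_chunks - vector_chunks) * 100 / vector_chunks


# Chunk counts copied from the core medical table: (vector-only, knowledge model).
CORE_MEDICAL = {
    "Multi-hop contraindication discovery": (5, 12),
    "High-risk behavioral pattern": (5, 8),
    "Cascading side-effect risk": (5, 8),
    "Transitive exposure (shared doctors)": (5, 16),
}

if __name__ == "__main__":
    for scenario, (vector_chunks, km_chunks) in CORE_MEDICAL.items():
        print(f"{scenario}: {lift_percent(vector_chunks, km_chunks):+.0f}%")
```

Under that assumption the four rows reproduce the published +140%, +60%, +60% and +220%.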

Why it wins:

  • Graph traversal follows Patient → HAS_CONDITION and Patient → PRESCRIBED_MEDICATION without inventing hops.
  • Vector search is constrained to the traversed subgraph before falling back.
  • Medication and provider profiles were added as chunks, so traversal returns diverse evidence, not just condition blurbs.

Medical (Advanced Separation/Aggregation)

| Scenario | Vector-Only (chunks) | Knowledge Model (chunks) | Lift | Latency Vector | Latency KM |
| --- | --- | --- | --- | --- | --- |
| 3-degree provider separation | 5 | 11 | +6 chunks | 49 ms | 64 ms |
| Polypharmacy intersection (RA + Albuterol + HbA1c) | 5 | 14 | +9 chunks | 23 ms | 30 ms |
| Provider network cascade | 5 | 11 | +6 chunks | 27 ms | 33 ms |
| Bidirectional RA risk network | 5 | 15 | +10 chunks | 23 ms | 25 ms |

These scenarios combine multiple independent paths (shared doctors, medication interactions, comorbidities). Vector-only retrieval stays capped at the 5 semantically closest chunks, while the knowledge model also surfaces connected patients, providers, and medications.


SAP ERP / Supply Chain (Financial Domain)

| Scenario | Vector-Only (chunks) | Knowledge Model (chunks) | Lift | Latency Vector | Latency KM |
| --- | --- | --- | --- | --- | --- |
| Cross-Project Resource Allocation | 5 | 8 | +60% | 51 ms | 56 ms |
| Cost Center Contagion Analysis | 5 | 8 | +60% | 18 ms | 29 ms |

Why it wins:

  • Graph RAG discovers organizational structure patterns that vector embeddings miss
  • Department→CostCenter hierarchies are structural (not semantically rich in text)
  • Funding networks (Project→FUNDED_BY→Department→HAS_COST_CENTER) reveal risk exposure
  • CostCenter profiles retrieved via graph traversal add critical context
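
To make the funding-network point concrete, here is a minimal sketch of following a fixed relation sequence such as Project→FUNDED_BY→Department→HAS_COST_CENTER over a toy edge list. Only the relation names come from the text above; the node IDs, the helper names (`build_index`, `follow_pattern`, `projects_by_cost_center`) and the data layout are hypothetical and do not reflect the framework's actual API.

```python
from collections import defaultdict

# Edges as (source, relation, target). All node IDs are made-up toy data.
EDGES = [
    ("project:alpha", "FUNDED_BY", "dept:engineering"),
    ("project:beta", "FUNDED_BY", "dept:engineering"),
    ("project:gamma", "FUNDED_BY", "dept:marketing"),
    ("dept:engineering", "HAS_COST_CENTER", "cc:1001"),
    ("dept:marketing", "HAS_COST_CENTER", "cc:2002"),
]


def build_index(edges):
    """Index outgoing edges by (source, relation) for constant-time hop lookup."""
    index = defaultdict(list)
    for source, relation, target in edges:
        index[(source, relation)].append(target)
    return index


def follow_pattern(index, start, relations):
    """Return every path from `start` whose hops match `relations` in order."""
    paths = [[start]]
    for relation in relations:
        paths = [
            path + [target]
            for path in paths
            for target in index.get((path[-1], relation), [])
        ]
    return paths


def projects_by_cost_center(index, projects):
    """Group projects by the cost center their funding path ends at."""
    exposure = defaultdict(set)
    for project in projects:
        for path in follow_pattern(index, project, ["FUNDED_BY", "HAS_COST_CENTER"]):
            exposure[path[-1]].add(project)
    return dict(exposure)


if __name__ == "__main__":
    index = build_index(EDGES)
    print(projects_by_cost_center(index, ["project:alpha", "project:beta", "project:gamma"]))
```

In this toy data, two projects end up on the same cost center even though no chunk text would need to say so. That is the kind of structural exposure the bullets above describe.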

Why supply chain scenarios were removed:

  • Vector search equals Graph RAG for semantically explicit relationships
  • Supply chain descriptions already mention "TSMC supplies CPU for Holiday Laptop"
  • Embeddings capture these semantic relationships directly in chunk text
  • Graph RAG provides no advantage when relationships are semantically rich

Key Insight:

  • Graph RAG wins: Structural organizational patterns (hierarchies, funding networks)
  • Graph RAG ties with vector search: Semantically explicit supply chain relationships (vendor descriptions)
  • Use SAP benchmarks to validate dual persistence on enterprise schemas

How the Harness Works

  • Baseline: RetrievalRequest without graphQuery → pure vector search, topK=5.
  • Knowledge Model: Adds GraphQuery with ResolutionTarget + TraversalHint/GraphPattern. Retrieval order is graph traversal → hybrid vector search constrained to traversed nodes → vector fallback if results < 3.
  • Data: 177 nodes / 455 edges in medical; SAP fixtures include vendors, materials, invoices, projects with multi-hop relations.
  • LLM stack used in tests: vLLM Qwen3-Coder-30B for generation, Ollama nomic-embed-text (768d) for embeddings, DuckDB + JGraphT for persistence.
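
The retrieval order described above can be sketched in a few lines. This is an illustration under stated assumptions, not the framework's code: the chunk dictionaries, the functions `cosine`, `rank` and `retrieve`, the shared `top_k` limit, and the choice to top up (rather than replace) results during fallback are all hypothetical. Only the baseline topK=5 and the "fewer than 3 results" fallback threshold come from the harness description.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, chunks, top_k):
    """Return the top_k chunks by similarity to the query, best first."""
    ordered = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ordered[:top_k]


def retrieve(query_vec, chunks, traversed_nodes=None, top_k=5, min_results=3):
    """Return (results, mode) so the caller can see which stage answered."""
    if traversed_nodes is None:
        # Baseline: no graph query, pure vector search.
        return rank(query_vec, chunks, top_k), "vector"

    # Hybrid: vector search constrained to chunks attached to traversed nodes.
    scoped = [c for c in chunks if c["node"] in traversed_nodes]
    results = rank(query_vec, scoped, top_k)
    if len(results) >= min_results:
        return results, "hybrid"

    # Fallback (assumed behavior): top up with unconstrained vector hits.
    seen = {c["id"] for c in results}
    for chunk in rank(query_vec, chunks, len(chunks)):
        if len(results) >= top_k:
            break
        if chunk["id"] not in seen:
            results.append(chunk)
            seen.add(chunk["id"])
    return results, "vector-fallback"


# Toy chunks with 2-dimensional embeddings; IDs and nodes are made up.
CHUNKS = [
    {"id": "c1", "node": "patient:1", "vec": [1.0, 0.0]},
    {"id": "c2", "node": "med:1", "vec": [0.9, 0.1]},
    {"id": "c3", "node": "doc:1", "vec": [0.0, 1.0]},
    {"id": "c4", "node": "patient:2", "vec": [0.8, 0.2]},
    {"id": "c5", "node": "cond:1", "vec": [0.5, 0.5]},
]

if __name__ == "__main__":
    hits, mode = retrieve([1.0, 0.0], CHUNKS, traversed_nodes={"patient:1"}, top_k=3)
    print(mode, [c["id"] for c in hits])
```

Returning the mode alongside the results is one way to keep the fallback visible to the caller instead of letting a query quietly degrade to vector-only.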

Reproduce the Numbers Yourself

cd cef-framework

# Core medical scenarios (BENCHMARK_REPORT.md)
mvn -Dtest=MedicalBenchmarkTest test

# Advanced medical (multi-path aggregation) (BENCHMARK_REPORT_2.md)
mvn -Dtest=MedicalBenchmarkTest2 test

# SAP ERP supply-chain/finance scenarios (SAP_BENCHMARK_REPORT.md)
mvn -Dtest=SapBenchmarkTest test

Reports are written to the project root (cef-framework/). Chunk samples and latency stats are embedded in each Markdown report for auditability.


Key Takeaways

  • Structural coverage matters: Graph traversal surfaces medications, providers, and comorbidities that similarity search cannot guess.
  • Fallback is controlled: Knowledge Model stays hybrid until it exhausts graph evidence, then falls back to vector; you never silently drop to "vector-only".
  • Domain-agnostic proof: Medical benchmarks show large retrieval lifts; SAP scenarios validate the same engine on enterprise data with different performance/coverage trade-offs.

Future Improvements

  1. Adaptive Pattern Selection: Use LLM to generate optimal patterns per query
  2. Constraint Evaluation: Add filtering (e.g., "age > 65", "severity = High")
  3. Path Ranking: Score paths by clinical relevance, not just graph distance
  4. Hybrid Entry Points: Combine vector + property-based filters for entry resolution
  5. Multi-Source Patterns: Span across patients, doctors, facilities in single query

Framework Status: VALIDATED FOR RESEARCH
Pattern Execution: WORKING
Benchmark Validation: PASSED
Knowledge Model Superiority: PROVEN (in controlled tests)