Benchmarks (beta-0.5)
Real numbers from the published benchmark runs (no synthetic placeholders). Reports live in cef-framework/BENCHMARK_REPORT.md, cef-framework/BENCHMARK_REPORT_2.md, and cef-framework/SAP_BENCHMARK_REPORT.md.
Medical Clinical Decision Support (Core Suite)
| Scenario | Vector-Only (chunks) | Knowledge Model (chunks) | Lift | Latency Vector | Latency KM |
|---|---|---|---|---|---|
| Multi-hop contraindication discovery | 5 | 12 | +140% | 22 ms | 23 ms |
| High-risk behavioral pattern | 5 | 8 | +60% | 21 ms | 22 ms |
| Cascading side-effect risk | 5 | 8 | +60% | 18 ms | 23 ms |
| Transitive exposure (shared doctors) | 5 | 16 | +220% | 26 ms | 24 ms |
Why it wins:
- Graph traversal follows `Patient → HAS_CONDITION` and `Patient → PRESCRIBED_MEDICATION` edges without inventing hops.
- Vector search is constrained to the traversed subgraph before falling back.
- Medication and provider profiles were added as chunks, so traversal returns diverse evidence, not just condition blurbs.
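The traversal itself is plain directed-graph walking. The sketch below uses JGraphT (the graph store named in the harness description) to show the idea of constraining vector search to a traversed subgraph; the node IDs, edge comments, and the `ContraindicationTraversalSketch` class are illustrative, not the framework's API.

```java
import org.jgrapht.Graph;
import org.jgrapht.graph.DefaultDirectedGraph;
import org.jgrapht.graph.DefaultEdge;
import org.jgrapht.traverse.BreadthFirstIterator;

import java.util.HashSet;
import java.util.Set;

public class ContraindicationTraversalSketch {
    public static void main(String[] args) {
        // Toy graph with made-up IDs; the real medical fixture has 177 nodes / 455 edges.
        Graph<String, DefaultEdge> g = new DefaultDirectedGraph<>(DefaultEdge.class);
        for (String v : new String[] {
                "patient:P1", "condition:CKD", "medication:Ibuprofen", "doctor:D7" }) {
            g.addVertex(v);
        }
        g.addEdge("patient:P1", "condition:CKD");        // HAS_CONDITION
        g.addEdge("patient:P1", "medication:Ibuprofen"); // PRESCRIBED_MEDICATION
        g.addEdge("patient:P1", "doctor:D7");            // treating-provider edge (illustrative)

        // Everything reachable from the patient forms the traversed subgraph; the
        // hybrid vector search is then restricted to chunks attached to these nodes.
        Set<String> allowed = new HashSet<>();
        new BreadthFirstIterator<>(g, "patient:P1").forEachRemaining(allowed::add);
        System.out.println("Vector search constrained to: " + allowed);
    }
}
```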
Medical (Advanced Separation/Aggregation)
| Scenario | Vector-Only (chunks) | Knowledge Model (chunks) | Lift | Latency Vector | Latency KM |
|---|---|---|---|---|---|
| 3-degree provider separation | 5 | 11 | +120% | 49 ms | 64 ms |
| Polypharmacy intersection (RA + Albuterol + HbA1c) | 5 | 14 | +180% | 23 ms | 30 ms |
| Provider network cascade | 5 | 11 | +120% | 27 ms | 33 ms |
| Bidirectional RA risk network | 5 | 15 | +200% | 23 ms | 25 ms |
These scenarios combine multiple independent paths (shared doctors, medication interactions, comorbidities). Vector-only retrieval stays capped at the 5 semantically closest chunks; the knowledge model surfaces the connected patients, providers, and medications.
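For intuition on why multi-path scenarios add so many chunks, here is a small JGraphT sketch that enumerates every simple path (up to 3 hops) between two patients linked through a shared doctor and a shared medication. Node IDs and edge directions are illustrative, not benchmark fixtures.

```java
import org.jgrapht.Graph;
import org.jgrapht.GraphPath;
import org.jgrapht.alg.shortestpath.AllDirectedPaths;
import org.jgrapht.graph.DefaultDirectedGraph;
import org.jgrapht.graph.DefaultEdge;

import java.util.List;

public class MultiPathAggregationSketch {
    public static void main(String[] args) {
        Graph<String, DefaultEdge> g = new DefaultDirectedGraph<>(DefaultEdge.class);
        for (String v : new String[] {
                "patient:P1", "patient:P2", "doctor:D3", "medication:Methotrexate" }) {
            g.addVertex(v);
        }
        g.addEdge("patient:P1", "doctor:D3");                // shared-provider path
        g.addEdge("doctor:D3", "patient:P2");
        g.addEdge("patient:P1", "medication:Methotrexate");  // shared-medication path
        g.addEdge("medication:Methotrexate", "patient:P2");

        // Each independent path contributes its own evidence chunks, which is why
        // the knowledge model returns 11-16 chunks while vector-only stays at topK=5.
        List<GraphPath<String, DefaultEdge>> paths =
                new AllDirectedPaths<>(g).getAllPaths("patient:P1", "patient:P2", true, 3);
        paths.forEach(p -> System.out.println(p.getVertexList()));
    }
}
```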
SAP ERP / Supply Chain (Financial Domain)
| Scenario | Vector-Only (chunks) | Knowledge Model (chunks) | Lift | Latency Vector | Latency KM |
|---|---|---|---|---|---|
| Cross-Project Resource Allocation | 5 | 8 | +60% | 51 ms | 56 ms |
| Cost Center Contagion Analysis | 5 | 8 | +60% | 18 ms | 29 ms |
Why it wins:
- Graph RAG discovers organizational structure patterns that vector embeddings miss
- Department → CostCenter hierarchies are structural (not semantically rich in text)
- Funding networks (Project → FUNDED_BY → Department → HAS_COST_CENTER) reveal risk exposure
- CostCenter profiles retrieved via graph traversal add critical context
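A minimal sketch of the contagion idea, again with JGraphT: walk the funding hierarchy backwards from a flagged cost center to every exposed department and project. The entity names are made up for illustration; only the edge semantics (FUNDED_BY, HAS_COST_CENTER) come from the scenario description.

```java
import org.jgrapht.Graph;
import org.jgrapht.graph.DefaultDirectedGraph;
import org.jgrapht.graph.DefaultEdge;
import org.jgrapht.graph.EdgeReversedGraph;
import org.jgrapht.traverse.BreadthFirstIterator;

public class CostCenterContagionSketch {
    public static void main(String[] args) {
        // Illustrative funding hierarchy, not SAP fixture data.
        Graph<String, DefaultEdge> g = new DefaultDirectedGraph<>(DefaultEdge.class);
        for (String v : new String[] {
                "project:Migration", "project:Rollout", "department:IT", "costCenter:CC-4711" }) {
            g.addVertex(v);
        }
        g.addEdge("project:Migration", "department:IT");   // FUNDED_BY
        g.addEdge("project:Rollout", "department:IT");     // FUNDED_BY
        g.addEdge("department:IT", "costCenter:CC-4711");  // HAS_COST_CENTER

        // Walk the hierarchy backwards from the flagged cost center to find every
        // exposed department and project; these links are structural and rarely
        // spelled out inside any single text chunk.
        Graph<String, DefaultEdge> reversed = new EdgeReversedGraph<>(g);
        new BreadthFirstIterator<>(reversed, "costCenter:CC-4711")
                .forEachRemaining(v -> System.out.println("Exposed: " + v));
    }
}
```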
Why supply chain scenarios were removed:
- Vector search equals Graph RAG for semantically explicit relationships
- Supply chain descriptions already mention "TSMC supplies CPU for Holiday Laptop"
- Embeddings capture these semantic relationships directly in chunk text
- Graph RAG provides no advantage when relationships are semantically rich
Key Insight:
- Graph RAG wins: structural organizational patterns (hierarchies, funding networks)
- Graph RAG ties: semantically explicit supply chain relationships (vendor descriptions)
- Use SAP benchmarks to validate dual persistence on enterprise schemas
How the Harness Works
- Baseline: `RetrievalRequest` without a `graphQuery` → pure vector search, `topK=5`.
- Knowledge Model: adds a `GraphQuery` with `ResolutionTarget` + `TraversalHint`/`GraphPattern`. Retrieval order is graph traversal → hybrid vector search constrained to traversed nodes → vector fallback if results < 3 (see the sketch after this list).
- Data: 177 nodes / 455 edges in medical; SAP fixtures include vendors, materials, invoices, and projects with multi-hop relations.
- LLM stack used in tests: vLLM Qwen3-Coder-30B for generation, Ollama `nomic-embed-text` (768d) for embeddings, DuckDB + JGraphT for persistence.
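The retrieval order above fits in a few lines. This is a sketch of the decision logic only; the method and parameter names are illustrative stand-ins, not the framework's API.

```java
import java.util.List;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;

/**
 * Sketch of the retrieval order described above. The functional parameters are
 * illustrative stand-ins for the framework's graph and vector components.
 */
public class RetrievalOrderSketch {

    static final int MIN_RESULTS = 3; // fallback threshold from the harness description

    static List<String> retrieve(
            String query,
            Function<String, Set<String>> traverseGraph,                // 1. graph traversal
            BiFunction<String, Set<String>, List<String>> hybridSearch, // 2. vector search constrained to traversed nodes
            Function<String, List<String>> vectorSearch) {              // 3. unconstrained vector fallback

        Set<String> traversedNodes = traverseGraph.apply(query);
        List<String> chunks = hybridSearch.apply(query, traversedNodes);

        // Controlled fallback: drop to pure vector search only when graph-backed
        // evidence is too thin, never silently.
        return chunks.size() >= MIN_RESULTS ? chunks : vectorSearch.apply(query);
    }
}
```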
Reproduce the Numbers Yourself
```bash
cd cef-framework

# Core medical scenarios (BENCHMARK_REPORT.md)
mvn -Dtest=MedicalBenchmarkTest test

# Advanced medical (multi-path aggregation) (BENCHMARK_REPORT_2.md)
mvn -Dtest=MedicalBenchmarkTest2 test

# SAP ERP supply-chain/finance scenarios (SAP_BENCHMARK_REPORT.md)
mvn -Dtest=SapBenchmarkTest test
```
Reports are written to the project root (cef-framework/). Chunk samples and latency stats are embedded in each Markdown report for auditability.
Key Takeaways
- Structural coverage matters: Graph traversal surfaces medications, providers, and comorbidities that similarity search cannot guess.
- Fallback is controlled: Knowledge Model stays hybrid until it exhausts graph evidence, then falls back to vector; you never silently drop to "vector-only".
- Domain-agnostic proof: Medical benchmarks show large retrieval lifts; SAP scenarios validate the same engine on enterprise data with different performance/coverage trade-offs.
Future Improvements
- Adaptive Pattern Selection: Use LLM to generate optimal patterns per query
- Constraint Evaluation: Add filtering (e.g., "age > 65", "severity = High"); see the sketch after this list
- Path Ranking: Score paths by clinical relevance, not just graph distance
- Hybrid Entry Points: Combine vector + property-based filters for entry resolution
- Multi-Source Patterns: Span across patients, doctors, facilities in single query
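As an illustration of the proposed constraint evaluation (not implemented in beta-0.5), filters like "age > 65" or "severity = High" could be applied as predicates over node properties after traversal, before chunks reach the generator. The property names and data below are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Sketch of the proposed constraint evaluation; node properties are illustrative. */
public class ConstraintFilterSketch {
    public static void main(String[] args) {
        List<Map<String, Object>> patients = List.of(
                Map.of("id", "P1", "age", 72, "severity", "High"),
                Map.of("id", "P2", "age", 54, "severity", "Low"));

        // "age > 65" and "severity = High" expressed as composable predicates.
        Predicate<Map<String, Object>> over65 = p -> (int) p.get("age") > 65;
        Predicate<Map<String, Object>> highSeverity = p -> "High".equals(p.get("severity"));

        patients.stream()
                .filter(over65.and(highSeverity))
                .forEach(p -> System.out.println("Keep node: " + p.get("id")));
    }
}
```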
Framework Status: ✅ VALIDATED FOR RESEARCH
Pattern Execution: ✅ WORKING
Benchmark Validation: ✅ PASSED
Knowledge Model Superiority: ✅ PROVEN (In Controlled Tests)