Validation & Test Coverage
Everything below maps to concrete tests in `cef-framework/src/test/java`; there are no placeholder claims.
Always-On Coverage (default mvn test in cef-framework)
- Medical benchmark harness (`MedicalBenchmarkTest`): Generates `BENCHMARK_REPORT.md` with 4 scenarios (contraindications, behavioral risk, cascading side effects, shared doctors). Uses DuckDB + JGraphT + vLLM + Ollama embeddings.
- Advanced medical graph patterns (`MedicalBenchmarkTest2`): Multi-path separation/aggregation, producing `BENCHMARK_REPORT_2.md`.
- SAP financial/supply-chain suite (`SapBenchmarkTest`): Writes `SAP_BENCHMARK_REPORT.md` for enterprise-style multi-hop traversal.
- Graph correctness (`InMemoryKnowledgeGraphTest`): CRUD, traversal, concurrency sanity on the JGraphT-backed store.
- Repository layer (`BaseRepositoryTest`): Reactive persistence for nodes/edges/chunks.
- Retriever integration (`OllamaKnowledgeRetrieverIntegrationTest`): Ensures `KnowledgeRetriever.retrieve` returns ranked chunks with Ollama embeddings.
Optional Integration Tests (gated by system properties)
Run these only when the required services/models are available.
| Test | Purpose | How to run |
|---|---|---|
| `OllamaEmbeddingIntegrationTest` | Verifies real `nomic-embed-text` embeddings (dimension 768) via Spring AI and the CEF wrapper | `mvn test -Dembedding.integration=true -Dtest=OllamaEmbeddingIntegrationTest` |
| `McpToolLLMIntegrationTest` | LLM (Ollama `qwq:32b`) reads the MCP schema and builds `graphHints` for retrieval; checks fallback behavior | `mvn test -Dollama.integration=true -Dtest=McpToolLLMIntegrationTest` |
| `VllmMcpToolIntegrationTest` / `VllmMcpCallTest` | Validates the MCP tool against a vLLM server for end-to-end retrieval | `mvn test -Dvllm.integration=true -Dtest=VllmMcpToolIntegrationTest` |
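
The `-D...integration=true` flags in the table are ordinary JVM system properties. The source does not say how each test checks its flag (JUnit 5's `@EnabledIfSystemProperty` is one common option), so the following is only a minimal plain-JDK sketch of the opt-in idea. `IntegrationGateSketch` and `isEnabled` are hypothetical names, not part of CEF.

```java
/** Hypothetical helper showing opt-in gating by system property; not CEF code. */
final class IntegrationGateSketch {

    /** True only when the JVM was started with -D<property>=true. */
    static boolean isEnabled(String property) {
        // Boolean.getBoolean reads the named system property and
        // compares its value to "true", ignoring case.
        return Boolean.getBoolean(property);
    }

    public static void main(String[] args) {
        // Flag names are taken from the table above.
        String[] flags = {"embedding.integration", "ollama.integration", "vllm.integration"};
        for (String flag : flags) {
            System.out.println(flag + " -> " + (isEnabled(flag) ? "run" : "skip"));
        }
    }
}
```

Run without any flags, every line reports `skip`; starting the JVM with, for example, `-Dvllm.integration=true` flips only that entry to `run`.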
Datasets Used in Tests
- Medical benchmark: 177 nodes, 455 edges, chunks across patients, doctors, conditions, medications (`medical_benchmark_data.json`). Powers both benchmark suites.
- SAP benchmark: Vendors, materials, invoices, projects, budgets with 4–6 hop relations (`sap_data/*.csv` parsed by `SapDataParser`).
- Fixtures for ad-hoc scenarios: `MedicalDomainFixtures` and `LegalDomainFixtures` create small graphs for MCP tool and retriever tests.
Reproduction Recipes
```bash
cd cef-framework

# Full default suite (benchmarks + graph + repository tests)
mvn test

# Specific reports
mvn -Dtest=MedicalBenchmarkTest test
mvn -Dtest=MedicalBenchmarkTest2 test
mvn -Dtest=SapBenchmarkTest test

# Verify graph store correctness only
mvn -Dtest=InMemoryKnowledgeGraphTest test
```
Reports are emitted alongside the module (`cef-framework/BENCHMARK_REPORT*.md`, `SAP_BENCHMARK_REPORT.md`). Integration tests log the external endpoints they hit (Ollama, vLLM) so you can confirm connectivity.
What to Watch
- Latency vs. coverage: Medical domains show 60–220% chunk lift with ~4–13 ms overhead; SAP shows parity in coverage but higher traversal cost (75–110% latency).
- Graph size constraints: JGraphT is validated up to ~100K nodes (see Known Issues for memory guidance). Move to Neo4j/pgvector when you exceed that.
- Fallback discipline: If graph traversal yields < 3 chunks, retrieval falls back to vector-only; benchmarks assert this behavior so regressions are caught.
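
The fallback rule in the last bullet can be illustrated with a small sketch. It assumes chunks are plain strings and that the vector-only search is supplied as a callback. `FallbackSketch`, `retrieve`, and `MIN_GRAPH_CHUNKS` are hypothetical names chosen for illustration; only the threshold of 3 comes from the text, and the real retriever may be structured differently.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of the graph-to-vector fallback rule; not the CEF implementation. */
final class FallbackSketch {

    // Threshold from the text: fewer than 3 graph chunks triggers the fallback.
    static final int MIN_GRAPH_CHUNKS = 3;

    /**
     * Returns the graph results when there are enough of them,
     * otherwise the result of the vector-only search.
     */
    static List<String> retrieve(List<String> graphChunks,
                                 Supplier<List<String>> vectorOnlySearch) {
        if (graphChunks.size() < MIN_GRAPH_CHUNKS) {
            // Too little graph context: invoke the vector-only search lazily.
            return vectorOnlySearch.get();
        }
        return graphChunks;
    }

    public static void main(String[] args) {
        List<String> vectorHits = List.of("v1", "v2", "v3", "v4");

        // Two graph chunks is below the threshold, so the vector hits are returned.
        System.out.println(retrieve(List.of("g1", "g2"), () -> vectorHits));
        // prints [v1, v2, v3, v4]

        // Three graph chunks meets the threshold, so the graph result is kept.
        System.out.println(retrieve(List.of("g1", "g2", "g3"), () -> vectorHits));
        // prints [g1, g2, g3]
    }
}
```

Passing the vector search as a `Supplier` means it only runs when the fallback is actually needed, which keeps the common graph path free of an extra embedding lookup in this sketch.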