# Configuration

Comprehensive configuration guide for the CEF Framework.

## Configuration Overview

CEF uses Spring Boot's configuration system with YAML files. All configuration is under the `cef` namespace:

```yaml
cef:
  database:   # Database backend configuration
  graph:      # Graph store configuration
  vector:     # Vector store configuration
  llm:        # LLM provider configuration
  embedding:  # Embedding configuration
  retrieval:  # Retrieval strategy configuration
  indexing:   # Indexing configuration
```
## Database Configuration

### DuckDB (Default)

Embedded database, well suited to development and testing:

```yaml
cef:
  database:
    type: duckdb
    duckdb:
      path: ./data/cef.duckdb  # Database file location
      schema: graph            # Schema name
      in-memory: false         # Set true for an in-memory database
```
**Pros:**
- Zero configuration
- Fast for <100K entities
- Embedded, no external services
- Great for development/testing

**Cons:**
- Single-threaded writes
- Limited to one process
- Not designed for concurrent multi-client access
### PostgreSQL

Production-grade database with the pgvector extension:

```yaml
cef:
  database:
    type: postgresql
    postgresql:
      enabled: true
      host: localhost
      port: 5432
      database: cef_db
      username: cef_user
      password: ${DB_PASSWORD}  # Use an environment variable
      schema: graph
      pool-size: 20             # Connection pool size
```
Spring R2DBC connection settings (required for reactive database access):

```yaml
spring:
  r2dbc:
    url: r2dbc:postgresql://localhost:5432/cef_db
    username: cef_user
    password: ${DB_PASSWORD}
    pool:
      initial-size: 5
      max-size: 20
      max-idle-time: 30m
```
**Pros:**
- Production-grade ACID compliance
- Concurrent read/write
- pgvector extension for efficient vector search
- Battle-tested scalability

**Cons:**
- Requires external service
- More complex setup
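Before pointing CEF at PostgreSQL, the database, user, schema, and pgvector extension have to exist. A minimal bootstrap sketch, assuming the `psql` client and superuser access; the names mirror the example configuration above but are otherwise arbitrary:

```shell
# Create the database and user referenced by the example configuration
psql -U postgres -c "CREATE DATABASE cef_db;"
psql -U postgres -c "CREATE USER cef_user WITH PASSWORD '$DB_PASSWORD';"
psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE cef_db TO cef_user;"

# Enable pgvector inside the new database (needed for vector search)
psql -U postgres -d cef_db -c "CREATE EXTENSION IF NOT EXISTS vector;"

# Create the schema referenced by 'schema: graph'
psql -U postgres -d cef_db -c "CREATE SCHEMA IF NOT EXISTS graph AUTHORIZATION cef_user;"
```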
## Graph Store Configuration (v0.6 - 5 Backends)

### DuckDB (Default)

Embedded SQL graph store:

```yaml
cef:
  graph:
    store: duckdb  # Default - no external DB needed
```

Recommended for: development, embedded deployments
### In-Memory (JGraphT)

In-memory graph with O(1) lookups:

```yaml
cef:
  graph:
    store: in-memory
    thread-safe: true       # Enable ReadWriteLock wrapper
    max-traversal-depth: 5
```

Recommended for: <100K nodes, development, CI/CD
### Neo4j (Tested in v0.6)

Dedicated graph database for large-scale deployments:

```yaml
cef:
  graph:
    store: neo4j
    neo4j:
      uri: bolt://localhost:7687
      username: neo4j
      password: ${NEO4J_PASSWORD}
      database: neo4j
      connection-pool-size: 50
```

Recommended for: >100K nodes, production, complex Cypher queries
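If you don't already run Neo4j, a quick way to try this backend locally is Docker. The credentials must match `${NEO4J_PASSWORD}` in the configuration above; the `neo4j:5` image tag is an assumption, any recent 5.x release works:

```shell
# Start a local Neo4j 5.x instance; 7687 is the Bolt port used by the config above
docker run -d --name cef-neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/"$NEO4J_PASSWORD" \
  neo4j:5
```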
### PostgreSQL SQL (Tested in v0.6)

Pure SQL adjacency tables; no extensions needed:

```yaml
cef:
  graph:
    store: pg-sql
    postgres:
      max-traversal-depth: 5
```

Recommended for: maximum PostgreSQL compatibility
### PostgreSQL AGE (Tested in v0.6)

Apache AGE extension for Cypher on PostgreSQL:

```yaml
cef:
  graph:
    store: pg-age
    postgres:
      graph-name: cef_graph
```

Recommended for: Cypher queries without Neo4j infrastructure
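Apache AGE has to be enabled in the target database before `pg-age` can be used. A bootstrap sketch, assuming the AGE extension is already installed on the server (for example via the `apache/age` Docker image) and the database names from the earlier PostgreSQL example:

```shell
# Enable the AGE extension in the target database
psql -U postgres -d cef_db -c "CREATE EXTENSION IF NOT EXISTS age;"

# Create the graph referenced by 'graph-name: cef_graph'
psql -U postgres -d cef_db -c "LOAD 'age'; SELECT ag_catalog.create_graph('cef_graph');"
```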
## Vector Store Configuration (v0.6 - 4 Backends)

### DuckDB Vector Store (Default)

Uses DuckDB's VSS extension:

```yaml
cef:
  vector:
    store: duckdb
    dimension: 768  # Embedding dimension (nomic-embed-text default)
```

Pros: same database as the graph data, simple setup, fast for <10K chunks
Cons: brute-force search only (no HNSW index)
### In-Memory Vector Store

ConcurrentHashMap-based, for development:

```yaml
cef:
  vector:
    store: in-memory
    dimension: 768
```

Pros: zero dependencies, fast for testing
Cons: no persistence, limited scale
### Neo4j Vector Store (Tested in v0.6)

Uses Neo4j 5.11+ native vector indexes:

```yaml
cef:
  vector:
    store: neo4j
    dimension: 768
```

Pros: unified with the Neo4j graph, production-grade
Cons: requires Neo4j 5.11+
### PostgreSQL Vector Store (Tested in v0.6)

Uses the pgvector extension with reactive R2DBC:

```yaml
cef:
  vector:
    store: postgresql
    dimension: 768

spring:
  r2dbc:
    url: r2dbc:postgresql://localhost:5432/cef_db
    username: cef_user
    password: ${DB_PASSWORD}  # Use an environment variable
```

Pros: HNSW index, scales to millions of vectors, production-grade
Cons: requires the pgvector extension
### Qdrant (Configured, Untested)

Specialized vector database:

```yaml
cef:
  vector:
    store: qdrant
    qdrant:
      host: localhost
      port: 6333
      collection: cef_vectors
    dimension: 768
```
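Qdrant is easiest to try via Docker; port 6333 matches the configuration above. A local setup sketch (the `qdrant/qdrant` image is the official one; persistence and API keys are omitted here):

```shell
# Start a local Qdrant instance; 6333 is the HTTP port used by the config above
docker run -d --name cef-qdrant -p 6333:6333 qdrant/qdrant
```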
## LLM Provider Configuration

### Ollama (Recommended for Development)

Local LLM server:

```yaml
cef:
  llm:
    default-provider: ollama
    ollama:
      base-url: http://localhost:11434
      model: llama3.2:3b  # or llama3.1:70b, qwen2.5:32b
      timeout: 60s
```
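The configured model has to be pulled before the first request. A quick local check, assuming Ollama is installed and running:

```shell
# Pull the chat model referenced by 'model: llama3.2:3b'
ollama pull llama3.2:3b

# Verify the server is reachable at the configured base-url
curl -s http://localhost:11434/api/tags
```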
### vLLM (Recommended for Production)

High-performance inference server:

```yaml
cef:
  llm:
    default-provider: vllm
    vllm:
      base-url: http://localhost:8000
      model: Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8
      max-tokens: 4096
      temperature: 0.7
```
### OpenAI

Cloud-hosted LLM:

```yaml
cef:
  llm:
    default-provider: openai
    openai:
      api-key: ${OPENAI_API_KEY}
      model: gpt-4o-mini
      base-url: https://api.openai.com
      timeout: 30s
```
## Embedding Configuration

### Ollama Embeddings (Default)

```yaml
cef:
  embedding:
    provider: ollama
    model: nomic-embed-text
    dimension: 768
    batch-size: 100  # Batch size for embedding generation
```

Models available:
- `nomic-embed-text` (768 dims) - General purpose, default
- `mxbai-embed-large` (1024 dims) - Higher quality
- `all-minilm` (384 dims) - Smaller, faster
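As with chat models, the embedding model must be pulled once before indexing:

```shell
# Pull the default embedding model (768 dimensions)
ollama pull nomic-embed-text
```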
### OpenAI Embeddings

```yaml
cef:
  embedding:
    provider: openai
    model: text-embedding-3-small
    dimension: 1536
    api-key: ${OPENAI_API_KEY}
```

Models available:
- `text-embedding-3-small` (1536 dims) - Cost-effective
- `text-embedding-3-large` (3072 dims) - Highest quality
- `text-embedding-ada-002` (1536 dims) - Legacy model
## Retrieval Configuration

### Hybrid Retrieval Strategy

```yaml
cef:
  retrieval:
    default-strategy: hybrid  # hybrid, vector, graph
    hybrid:
      vector-weight: 0.7      # Weight for semantic similarity
      bm25-weight: 0.3        # Weight for keyword matching
      top-k: 10               # Number of chunks to retrieve
      min-score: 0.5          # Minimum similarity score
      fallback-threshold: 3   # Fall back to vector-only if <3 graph results
```

### Strategy Options

- `hybrid` (default): combines graph traversal with semantic search
- `vector`: pure semantic search only
- `graph`: graph traversal only
## Indexing Configuration

```yaml
cef:
  indexing:
    batch-size: 100    # Batch size for bulk indexing
    chunk-size: 512    # Tokens per chunk
    chunk-overlap: 50  # Overlapping tokens between chunks
    auto-embed: true   # Automatically generate embeddings on index
    parallel: false    # Parallel indexing (experimental)
```
## Context Assembly Configuration

```yaml
cef:
  context:
    token-budget: 4000      # Maximum tokens for assembled context
    max-queries: 5          # Maximum graph queries per retrieval
    deduplicate: true       # Remove duplicate chunks
    include-metadata: true  # Include chunk metadata
```
## Complete Example Configuration

### Development Setup

```yaml
cef:
  database:
    type: duckdb
    duckdb:
      path: ./data/cef.duckdb
  graph:
    store: in-memory  # JGraphT-backed in-memory graph
    in-memory: true
    load-on-startup: true
  vector:
    store: duckdb
    dimension: 768
  llm:
    default-provider: ollama
    ollama:
      base-url: http://localhost:11434
      model: llama3.2:3b
  embedding:
    provider: ollama
    model: nomic-embed-text
    dimension: 768
  retrieval:
    default-strategy: hybrid
    top-k: 10

logging:
  level:
    org.ddse.ml.cef: DEBUG
```
### Production Setup (Experimental)

Note: this configuration uses production-grade components (PostgreSQL, vLLM), but the framework integration is currently in alpha.

```yaml
cef:
  database:
    type: postgresql
    postgresql:
      enabled: true
      host: ${DB_HOST}
      port: 5432
      database: cef_production
      username: ${DB_USER}
      password: ${DB_PASSWORD}
      pool-size: 50
  graph:
    store: in-memory  # or neo4j for >100K nodes
    in-memory: true
    load-on-startup: true
  vector:
    store: postgresql
    dimension: 768
    postgres:
      hnsw-index: true
      hnsw-m: 16
  llm:
    default-provider: vllm
    vllm:
      base-url: ${VLLM_URL}
      model: Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8
  embedding:
    provider: ollama
    model: nomic-embed-text
    dimension: 768
    batch-size: 200
  retrieval:
    default-strategy: hybrid
    top-k: 20
    min-score: 0.6

spring:
  r2dbc:
    url: r2dbc:postgresql://${DB_HOST}:5432/cef_production
    username: ${DB_USER}
    password: ${DB_PASSWORD}
    pool:
      initial-size: 10
      max-size: 50

logging:
  level:
    org.ddse.ml.cef: INFO
    org.springframework.ai: WARN
```
## Environment Variables

Use environment variables for sensitive configuration:

```shell
# .env file
DB_PASSWORD=your_secure_password
OPENAI_API_KEY=sk-...
VLLM_URL=http://vllm-server:8000
```

Reference them in configuration:

```yaml
cef:
  database:
    postgresql:
      password: ${DB_PASSWORD}
```
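Note that Spring Boot does not read `.env` files by itself; the variables have to be present in the process environment. One common sketch, assuming a POSIX shell and a `.env` file without quoting edge cases:

```shell
# Export every variable from .env into the current shell, then start the app
set -a
. ./.env
set +a
java -jar app.jar
```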
## Configuration Profiles

Use Spring profiles for environment-specific configuration:

```yaml
# application.yml (shared)
cef:
  embedding:
    model: nomic-embed-text
---
# application-dev.yml
spring:
  config:
    activate:
      on-profile: dev
cef:
  database:
    type: duckdb
---
# application-prod.yml
spring:
  config:
    activate:
      on-profile: prod
cef:
  database:
    type: postgresql
```

Run with a profile:

```shell
java -jar app.jar --spring.profiles.active=prod
```
## Next Steps

- Learn about Knowledge Models
- Follow the Quick Start Tutorial
- Explore Advanced Configuration