
Configuration

Comprehensive configuration guide for the CEF Framework.

Configuration Overview

CEF uses Spring Boot's configuration system with YAML files. All configuration is under the cef namespace.

cef:
  database:   # Database backend configuration
  graph:      # Graph store configuration
  vector:     # Vector store configuration
  llm:        # LLM provider configuration
  embedding:  # Embedding configuration
  retrieval:  # Retrieval strategy configuration
  indexing:   # Indexing configuration

Database Configuration

DuckDB (Default)

Embedded database, perfect for development and testing:

cef:
  database:
    type: duckdb
    duckdb:
      path: ./data/cef.duckdb  # Database file location
      schema: graph            # Schema name
      in-memory: false         # Set true for in-memory database

Pros:

  • Zero configuration
  • Fast for <100K entities
  • Embedded, no external services
  • Great for development/testing

Cons:

  • Single-threaded writes
  • Limited to one process
  • No true ACID transactions

PostgreSQL

Production-grade database with pgvector extension:

cef:
  database:
    type: postgresql
    postgresql:
      enabled: true
      host: localhost
      port: 5432
      database: cef_db
      username: cef_user
      password: ${DB_PASSWORD}  # Use environment variable
      schema: graph
      pool-size: 20             # Connection pool size

Spring R2DBC Connection (required for reactive database access):

spring:
  r2dbc:
    url: r2dbc:postgresql://localhost:5432/cef_db
    username: cef_user
    password: ${DB_PASSWORD}
    pool:
      initial-size: 5
      max-size: 20
      max-idle-time: 30m

Pros:

  • Production-grade ACID compliance
  • Concurrent read/write
  • pgvector extension for efficient vector search
  • Battle-tested scalability

Cons:

  • Requires external service
  • More complex setup

Graph Store Configuration (v0.6 - 5 Backends)

DuckDB (Default)

Embedded SQL graph store:

cef:
  graph:
    store: duckdb  # Default - no external DB needed

Recommended for: Development, embedded deployments

In-Memory (JGraphT)

In-memory graph with O(1) lookups:

cef:
  graph:
    store: in-memory
    thread-safe: true  # Enable ReadWriteLock wrapper
    max-traversal-depth: 5

Recommended for: <100K nodes, development, CI/CD

Neo4j (Tested in v0.6)

Dedicated graph database for large-scale deployments:

cef:
  graph:
    store: neo4j
    neo4j:
      uri: bolt://localhost:7687
      username: neo4j
      password: ${NEO4J_PASSWORD}
      database: neo4j
      connection-pool-size: 50

Recommended for: >100K nodes, production, complex Cypher queries

PostgreSQL SQL (Tested in v0.6)

Pure SQL adjacency tables - no extensions needed:

cef:
  graph:
    store: pg-sql
    postgres:
      max-traversal-depth: 5

Recommended for: Maximum PostgreSQL compatibility

PostgreSQL AGE (Tested in v0.6)

Apache AGE extension for Cypher on PostgreSQL:

cef:
  graph:
    store: pg-age
    postgres:
      graph-name: cef_graph

Recommended for: Cypher queries without Neo4j infrastructure

Vector Store Configuration (v0.6 - 4 Backends)

DuckDB Vector Store (Default)

Uses DuckDB's VSS extension:

cef:
  vector:
    store: duckdb
    dimension: 768  # Embedding dimension (nomic-embed-text default)

Pros: Same database as graph data, simple setup, fast for <10K chunks
Cons: Brute-force search only (no HNSW index)

In-Memory Vector Store

ConcurrentHashMap-based for development:

cef:
  vector:
    store: in-memory
    dimension: 768

Pros: Zero dependencies, fast for testing
Cons: No persistence, limited scale

Neo4j Vector Store (Tested in v0.6)

Neo4j 5.11+ native vector indexes:

cef:
  vector:
    store: neo4j
    dimension: 768

Pros: Unified with Neo4j graph, production-grade
Cons: Requires Neo4j 5.11+

PostgreSQL Vector Store (Tested in v0.6)

Uses pgvector extension with reactive R2DBC:

cef:
  vector:
    store: postgresql
    dimension: 768

spring:
  r2dbc:
    url: r2dbc:postgresql://localhost:5432/cef_db
    username: cef_user
    password: ${DB_PASSWORD}  # Use environment variable

Pros: HNSW index, scalable to millions of vectors, production-grade
Cons: Requires pgvector extension

Qdrant (Configured, Untested)

Specialized vector database:

cef:
  vector:
    store: qdrant
    qdrant:
      host: localhost
      port: 6333
      collection: cef_vectors
    dimension: 768

LLM Provider Configuration

Ollama

Local LLM server:

cef:
  llm:
    default-provider: ollama
    ollama:
      base-url: http://localhost:11434
      model: llama3.2:3b  # or llama3.1:70b, qwen2.5:32b
      timeout: 60s

vLLM

High-performance inference server:

cef:
  llm:
    default-provider: vllm
    vllm:
      base-url: http://localhost:8000
      model: Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8
      max-tokens: 4096
      temperature: 0.7

OpenAI

Cloud-hosted LLM:

cef:
  llm:
    default-provider: openai
    openai:
      api-key: ${OPENAI_API_KEY}
      model: gpt-4o-mini
      base-url: https://api.openai.com
      timeout: 30s

Embedding Configuration

Ollama Embeddings (Default)

cef:
  embedding:
    provider: ollama
    model: nomic-embed-text
    dimension: 768
    batch-size: 100  # Batch size for embedding generation

Models available:

  • nomic-embed-text (768 dims) - General purpose, default
  • mxbai-embed-large (1024 dims) - Higher quality
  • all-minilm (384 dims) - Smaller, faster
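The batch-size setting above implies that texts are sent to the embedding provider in fixed-size groups. The following is a minimal Java sketch of that partitioning step only; the class name EmbeddingBatches and the partition helper are hypothetical and are not part of the CEF API, and how CEF actually batches requests is not confirmed by this guide.

```java
import java.util.ArrayList;
import java.util.List;

public class EmbeddingBatches {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(List.copyOf(items.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            texts.add("chunk-" + i);
        }
        // With batch-size: 100, 250 texts need three embedding requests.
        for (List<String> batch : partition(texts, 100)) {
            System.out.println("batch of " + batch.size());
        }
    }
}
```

A larger batch size means fewer round trips to the embedding server, at the cost of larger requests.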

OpenAI Embeddings

cef:
  embedding:
    provider: openai
    model: text-embedding-3-small
    dimension: 1536
    api-key: ${OPENAI_API_KEY}

Models available:

  • text-embedding-3-small (1536 dims) - Cost-effective
  • text-embedding-3-large (3072 dims) - Highest quality
  • text-embedding-ada-002 (1536 dims) - Legacy model

Retrieval Configuration

Hybrid Retrieval Strategy

cef:
  retrieval:
    default-strategy: hybrid  # hybrid, vector, graph
    hybrid:
      vector-weight: 0.7      # Weight for semantic similarity
      bm25-weight: 0.3        # Weight for keyword matching
    top-k: 10                 # Number of chunks to retrieve
    min-score: 0.5            # Minimum similarity score
    fallback-threshold: 3     # Fall back to vector-only if <3 graph results
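To make the weights concrete, here is a minimal Java sketch of how vector-weight, bm25-weight, min-score, and top-k could interact. It assumes both scores are already normalized to [0, 1], that they are combined as a weighted sum, and that min-score is applied to the combined score; none of this is confirmed by the guide. The HybridScoring class and Candidate record are hypothetical names, not CEF types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HybridScoring {

    /** A candidate chunk with two normalized scores in [0, 1] (hypothetical type). */
    record Candidate(String id, double vectorScore, double bm25Score) {}

    /** Weighted sum of the two scores, mirroring vector-weight and bm25-weight. */
    static double combine(Candidate c, double vectorWeight, double bm25Weight) {
        return vectorWeight * c.vectorScore() + bm25Weight * c.bm25Score();
    }

    /** Keeps candidates scoring at least minScore, best first, truncated to topK. */
    static List<String> rank(List<Candidate> candidates, double vectorWeight,
                             double bm25Weight, double minScore, int topK) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            if (combine(c, vectorWeight, bm25Weight) >= minScore) {
                kept.add(c);
            }
        }
        // Sort by combined score, highest first.
        kept.sort(Comparator.comparingDouble(
                (Candidate c) -> combine(c, vectorWeight, bm25Weight)).reversed());
        List<String> ids = new ArrayList<>();
        for (Candidate c : kept.subList(0, Math.min(topK, kept.size()))) {
            ids.add(c.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("a", 0.9, 0.2),   // combined 0.69
                new Candidate("b", 0.4, 0.4),   // combined 0.40, below min-score
                new Candidate("c", 0.6, 0.8));  // combined 0.66
        System.out.println(rank(candidates, 0.7, 0.3, 0.5, 10));
    }
}
```

With the default weights, a chunk that matches strongly on meaning but weakly on keywords ("a") still outranks one with the opposite profile ("c"), which is the practical effect of setting vector-weight above bm25-weight.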

Strategy Options

  1. hybrid (default): Combines graph traversal + semantic search
  2. vector: Pure semantic search only
  3. graph: Graph traversal only

Indexing Configuration

cef:
  indexing:
    batch-size: 100    # Batch size for bulk indexing
    chunk-size: 512    # Tokens per chunk
    chunk-overlap: 50  # Overlapping tokens between chunks
    auto-embed: true   # Automatically generate embeddings on index
    parallel: false    # Parallel indexing (experimental)
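The relationship between chunk-size and chunk-overlap is easiest to see as a sliding window: each new chunk starts chunk-size minus chunk-overlap tokens after the previous one. The Java sketch below illustrates that idea on a pre-tokenized list. OverlapChunker is a hypothetical name, and the tokenizer and exact windowing that CEF uses are assumptions not covered by this guide.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapChunker {

    /** Splits tokens into windows of chunkSize that share chunkOverlap tokens. */
    static List<List<String>> chunk(List<String> tokens, int chunkSize, int chunkOverlap) {
        if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= chunkOverlap < chunkSize");
        }
        int step = chunkSize - chunkOverlap; // distance between window starts
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start += step) {
            int end = Math.min(start + chunkSize, tokens.size());
            chunks.add(List.copyOf(tokens.subList(start, end)));
            if (end == tokens.size()) {
                break; // the last window already reached the end of the input
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            tokens.add("t" + i);
        }
        // chunk-size: 512 and chunk-overlap: 50 give a step of 462 tokens.
        for (List<String> c : chunk(tokens, 512, 50)) {
            System.out.println(c.get(0) + " .. " + c.get(c.size() - 1) + " (" + c.size() + " tokens)");
        }
    }
}
```

The overlap keeps sentences that straddle a boundary retrievable from either neighboring chunk, at the cost of embedding some tokens twice.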

Context Assembly Configuration

cef:
  context:
    token-budget: 4000      # Maximum tokens for assembled context
    max-queries: 5          # Maximum graph queries per retrieval
    deduplicate: true       # Remove duplicate chunks
    include-metadata: true  # Include chunk metadata
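As an illustration of how token-budget and deduplicate might shape the final context, the Java sketch below greedily adds ranked chunks until the budget is exhausted. It is written under stated assumptions: ContextBudget is a hypothetical class, tokens are estimated by counting whitespace-separated words rather than with a real tokenizer, and chunks that do not fit are skipped rather than truncated. The actual CEF assembly logic may differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContextBudget {

    /** Rough token estimate: whitespace-separated words. A real tokenizer would differ. */
    static int estimateTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Adds chunks in ranked order while the running total stays within tokenBudget. */
    static List<String> assemble(List<String> rankedChunks, int tokenBudget, boolean deduplicate) {
        Set<String> seen = new HashSet<>();
        List<String> context = new ArrayList<>();
        int used = 0;
        for (String chunk : rankedChunks) {
            if (deduplicate && !seen.add(chunk)) {
                continue; // exact duplicate of an earlier chunk
            }
            int cost = estimateTokens(chunk);
            if (used + cost > tokenBudget) {
                continue; // would exceed the budget; a smaller later chunk may still fit
            }
            context.add(chunk);
            used += cost;
        }
        return context;
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("a b c", "a b c", "d e f g", "h");
        System.out.println(assemble(ranked, 5, true));
    }
}
```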

Complete Example Configuration

Development Setup

cef:
  database:
    type: duckdb
    duckdb:
      path: ./data/cef.duckdb

  graph:
    store: jgrapht
    in-memory: true
    load-on-startup: true

  vector:
    store: duckdb
    dimension: 768

  llm:
    default-provider: ollama
    ollama:
      base-url: http://localhost:11434
      model: llama3.2:3b

  embedding:
    provider: ollama
    model: nomic-embed-text
    dimension: 768

  retrieval:
    default-strategy: hybrid
    top-k: 10

logging:
  level:
    org.ddse.ml.cef: DEBUG

Production Setup (Experimental)

Note: This configuration uses production-grade components (PostgreSQL, vLLM) but the framework integration is currently in alpha.

cef:
  database:
    type: postgresql
    postgresql:
      enabled: true
      host: ${DB_HOST}
      port: 5432
      database: cef_production
      username: ${DB_USER}
      password: ${DB_PASSWORD}
      pool-size: 50

  graph:
    store: jgrapht  # or neo4j for >100K nodes
    in-memory: true
    load-on-startup: true

  vector:
    store: postgres
    dimension: 768
    postgres:
      hnsw-index: true
      hnsw-m: 16

  llm:
    default-provider: vllm
    vllm:
      base-url: ${VLLM_URL}
      model: Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8

  embedding:
    provider: ollama
    model: nomic-embed-text
    dimension: 768
    batch-size: 200

  retrieval:
    default-strategy: hybrid
    top-k: 20
    min-score: 0.6

spring:
  r2dbc:
    url: r2dbc:postgresql://${DB_HOST}:5432/cef_production
    username: ${DB_USER}
    password: ${DB_PASSWORD}
    pool:
      initial-size: 10
      max-size: 50

logging:
  level:
    org.ddse.ml.cef: INFO
    org.springframework.ai: WARN

Environment Variables

Use environment variables for sensitive configuration:

# .env file
DB_PASSWORD=your_secure_password
OPENAI_API_KEY=sk-...
VLLM_URL=http://vllm-server:8000

Access in configuration:

cef:
  database:
    postgresql:
      password: ${DB_PASSWORD}

Configuration Profiles

Use Spring profiles for environment-specific configuration:

# application.yml (shared)
cef:
  embedding:
    model: nomic-embed-text

---
# application-dev.yml
spring:
  config:
    activate:
      on-profile: dev

cef:
  database:
    type: duckdb

---
# application-prod.yml
spring:
  config:
    activate:
      on-profile: prod

cef:
  database:
    type: postgresql

Run with profile:

java -jar app.jar --spring.profiles.active=prod

Next Steps