Knowledge Model

Understanding the core abstraction of the CEF Framework.

What is a Knowledge Model?

A Knowledge Model in CEF is a graph-based representation of domain entities and their relationships, designed specifically for LLM context engineering. Think of it as a schema for your knowledge, similar to how a database schema defines tables and relationships for transactional data.

Comparison with Traditional Models

| Aspect | Traditional ORM (Hibernate) | Knowledge ORM (CEF) |
| --- | --- | --- |
| Purpose | Persist transactional data | Assemble LLM context |
| Entities | `@Entity` classes | `Node` instances |
| Relationships | `@ManyToOne`, `@OneToMany` | `Edge` with semantic types |
| Query | SQL/JPQL | Graph traversal + vector search |
| Storage | Relational tables | Dual: Graph + Vector stores |
| Access Pattern | Random access by ID | Context assembly by query |

Core Components

1. Node - The Universal Entity

Every entity in your domain becomes a Node:

public class Node {
    UUID id;                        // Unique identifier
    String label;                   // Entity type ("Patient", "Product", etc.)
    Map<String, Object> properties; // Flexible attributes (JSONB)
    String vectorizableContent;     // Text for semantic search
    Timestamp created, updated;
}

Example - Medical Domain:

Node patient = new Node(
    null,  // Auto-generate ID
    "Patient",  // Entity type
    Map.of(
        "name", "John Doe",
        "age", 45,
        "gender", "M",
        "mrn", "MRN-12345"
    ),
    // Vectorizable content - rich description for semantic search
    "45-year-old male patient with history of type 2 diabetes mellitus, " +
    "hypertension, and hyperlipidemia. Recently experienced chest pain " +
    "and shortness of breath. Currently prescribed Metformin 1000mg BID, " +
    "Lisinopril 10mg QD, and Atorvastatin 40mg QD."
);

Example - E-Commerce Domain:

Node product = new Node(
    null,
    "Product",
    Map.of(
        "sku", "LAPTOP-XPS-15",
        "name", "Dell XPS 15",
        "price", 1899.99,
        "category", "Electronics",
        "stock", 42
    ),
    "Dell XPS 15 laptop with 15.6-inch 4K OLED display, Intel Core i9 " +
    "processor, 32GB RAM, 1TB NVMe SSD. Perfect for content creators and " +
    "professionals. Includes Thunderbolt 4 ports, Wi-Fi 6E, and premium " +
    "aluminum build quality."
);
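
Both examples pass `null` as the ID and rely on it being generated. As a rough sketch of that convention (not CEF's actual implementation), a stand-in type could fill in a random UUID at construction time. `NodeSketch` is a hypothetical name used only for this illustration.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for CEF's Node, used only to illustrate ID auto-generation.
record NodeSketch(UUID id, String label, Map<String, Object> properties,
                  String vectorizableContent) {

    NodeSketch {
        // A null id means "generate one for me".
        if (id == null) {
            id = UUID.randomUUID();
        }
        if (label == null || label.isBlank()) {
            throw new IllegalArgumentException("label is required");
        }
        // Defensive copy so later changes to the caller's map do not leak in.
        properties = properties == null ? Map.of() : Map.copyOf(properties);
    }
}
```

With this shape, `new NodeSketch(null, "Patient", Map.of("age", 45), "...")` yields a node whose `id()` is never null.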

2. Edge - Typed Relationships

Relationships between nodes are represented as typed edges:

public class Edge {
    UUID id;
    String relationType;            // Semantic relationship type
    UUID sourceNodeId;              // Source node
    UUID targetNodeId;              // Target node
    Map<String, Object> properties; // Relationship attributes
    Double weight;                  // Optional weight for graph algorithms
    Timestamp created;
}

Example - Medical Relationships:

// Patient HAS_CONDITION Diabetes
Edge hasCondition = new Edge(
    null,
    "HAS_CONDITION",
    patientId,
    diabetesId,
    Map.of(
        "diagnosedOn", "2023-01-15",
        "severity", "moderate",
        "status", "active"
    ),
    1.0
);

// Doctor TREATS Patient
Edge treats = new Edge(
    null,
    "TREATS",
    doctorId,
    patientId,
    Map.of(
        "since", "2023-01-20",
        "specialty", "endocrinology"
    ),
    1.0
);

3. RelationType - Semantic Annotations

Define relationship types with semantic hints for intelligent traversal:

public class RelationType {
    String name;                    // "TREATS", "HAS_CONDITION", etc.
    String sourceLabel;             // Expected source node label
    String targetLabel;             // Expected target node label
    RelationSemantics semantics;    // Semantic category
    boolean directed;               // Is relationship directional?
    String description;             // Human-readable description
}

Semantic Categories:

public enum RelationSemantics {
    HIERARCHY,        // Parent-child (e.g., "IS_PART_OF")
    CLASSIFICATION,   // Type-instance (e.g., "IS_TYPE_OF")
    ASSOCIATION,      // Peer-to-peer (e.g., "KNOWS", "WORKS_WITH")
    ATTRIBUTION,      // Ownership/has (e.g., "HAS_CONDITION", "OWNS")
    CAUSALITY,        // Cause-effect (e.g., "CAUSES", "TREATS")
    TEMPORAL,         // Time-based (e.g., "FOLLOWS", "PRECEDES")
    REFERENCE         // Cross-reference (e.g., "MENTIONS", "CITES")
}

Example - Medical Domain:

List<RelationType> medicalRelations = List.of(
    new RelationType("TREATS", "Doctor", "Patient", 
        RelationSemantics.CAUSALITY, true,
        "Doctor provides treatment to patient"),
    
    new RelationType("HAS_CONDITION", "Patient", "Condition",
        RelationSemantics.ATTRIBUTION, false,
        "Patient has medical condition"),
    
    new RelationType("PRESCRIBED_MEDICATION", "Patient", "Medication",
        RelationSemantics.CAUSALITY, false,
        "Patient is prescribed medication"),
    
    new RelationType("IS_TYPE_OF", "Condition", "ConditionCategory",
        RelationSemantics.CLASSIFICATION, true,
        "Condition is subtype of category")
);
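
One thing the source and target labels make possible is checking an edge before it is stored. The sketch below shows such a check under stated assumptions: `RelationTypeSketch` and `EdgeValidator` are hypothetical names, the `directed` flag is ignored, and whether CEF performs this validation is not stated here.

```java
import java.util.Map;

// Hypothetical stand-in; field names mirror the RelationType class above.
record RelationTypeSketch(String name, String sourceLabel, String targetLabel) {}

final class EdgeValidator {
    private final Map<String, RelationTypeSketch> byName;

    EdgeValidator(Map<String, RelationTypeSketch> byName) {
        this.byName = Map.copyOf(byName);
    }

    /** Returns true when the endpoint labels match the registered relation type. */
    boolean isValid(String relationType, String sourceLabel, String targetLabel) {
        RelationTypeSketch type = byName.get(relationType);
        if (type == null) {
            return false; // unknown relation type
        }
        return type.sourceLabel().equals(sourceLabel)
            && type.targetLabel().equals(targetLabel);
    }
}
```

A `TREATS` edge from a `Doctor` to a `Patient` passes this check, while the reversed direction or an unregistered type does not.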

4. Chunk - Vectorized Content

Text chunks with embeddings for semantic search:

public class Chunk {
    UUID id;
    String content;                 // Text content
    float[] embedding;              // Vector embedding
    UUID linkedNodeId;              // Optional: linked to a Node
    Map<String, Object> metadata;   // Source, author, date, etc.
    Timestamp created;
}

Why Chunks?

Nodes represent entities; Chunks represent text content:

Node: "Patient John Doe" (structured entity)
Chunk: "John's medical history describes..." (semantic text)

Chunks enable:

Semantic search across unstructured text
Large text documents split into manageable pieces
Vector similarity matching
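
To make the "manageable pieces" point concrete, here is a minimal sketch of splitting a long text into overlapping character windows. The chunking strategy CEF actually uses is not described here, so the window size, the overlap, and the `TextChunker` name are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: fixed-size character windows with overlap.
final class TextChunker {
    /**
     * Splits text into windows of at most maxChars characters,
     * where consecutive windows share `overlap` characters.
     */
    static List<String> split(String text, int maxChars, int overlap) {
        if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
            throw new IllegalArgumentException("require 0 <= overlap < maxChars");
        }
        List<String> chunks = new ArrayList<>();
        int step = maxChars - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChars, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of some duplicated text.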

Dual Persistence Architecture

CEF automatically manages dual persistence:

Graph Store (Relationships)

Stores Node and Edge entities for:

Fast relationship traversal (multi-hop queries)
Graph algorithms (shortest path, neighbors)
Structural reasoning

Implementations:

JGraphT (default): In-memory, <100K nodes, O(1) lookups
Neo4j (planned): Millions of nodes, Cypher queries
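
A multi-hop query can be pictured as a breadth-first walk that stops after a fixed number of hops. The sketch below uses a plain adjacency map with string IDs instead of JGraphT or UUIDs, so it illustrates the idea rather than CEF's graph store; `MultiHop` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MultiHop {
    /** Returns every node reachable from start within maxDepth hops (start excluded). */
    static Set<String> neighborsWithin(Map<String, List<String>> adjacency,
                                       String start, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(start);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                for (String neighbor : adjacency.getOrDefault(node, List.of())) {
                    if (visited.add(neighbor)) { // add() is false for nodes seen before
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;
        }
        visited.remove(start);
        return visited;
    }
}
```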

Vector Store (Semantics)

Stores Chunk entities with embeddings for:

Semantic similarity search
Full-text search
Content retrieval

Implementations:

DuckDB (default): Embedded, brute-force search
PostgreSQL + pgvector: HNSW index, production-grade
Qdrant (planned): Specialized vector database
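
Brute-force search means scoring every stored embedding against the query and keeping the best matches. The following sketch does this in memory with cosine similarity. It is not the DuckDB implementation; the similarity metric and the `BruteForceSearch` name are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class BruteForceSearch {
    // Pairs a chunk id with its similarity to the query.
    record Scored(String id, double score) {}

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // treat zero vectors as dissimilar to everything
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every embedding against the query and returns the best topK ids, best first. */
    static List<String> topK(Map<String, float[]> embeddings, float[] query, int topK) {
        List<Scored> scored = new ArrayList<>();
        for (Map.Entry<String, float[]> e : embeddings.entrySet()) {
            scored.add(new Scored(e.getKey(), cosine(e.getValue(), query)));
        }
        scored.sort((x, y) -> Double.compare(y.score(), x.score())); // descending
        return scored.stream().limit(topK).map(Scored::id).toList();
    }
}
```

Every query touches every vector, so the cost grows linearly with the number of chunks. An index such as HNSW trades some accuracy for avoiding that full scan.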

How They Work Together

Indexing:
```
Node patient = new Node(..., vectorizableContent);
indexer.indexNode(patient);  // Stores in BOTH graph + vector
```
- Graph store: Saves node with properties
- Vector store: Chunks content, generates embeddings, saves chunks linked to node

Retrieval:
```
retriever.retrieve(query);
```
- Vector search: Finds semantically similar chunks
- Graph traversal: Follows edges from chunk's linked node
- Context assembly: Combines graph neighborhood + semantic results
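
The retrieval steps above can be sketched as a small in-memory function: take the chunk hits from a vector search, collect each hit's linked node plus its direct neighbors, and return both together. `ChunkHit`, `Context`, and `ContextAssembly` are hypothetical names, and the one-hop expansion is an assumption; the real retriever may rank and merge results differently.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ContextAssembly {
    // Hypothetical stand-in for a vector search hit; linkedNodeId may be null.
    record ChunkHit(String content, String linkedNodeId) {}

    // Hypothetical result shape: matched text plus the graph neighborhood around it.
    record Context(List<String> passages, Set<String> nodeIds) {}

    static Context assemble(List<ChunkHit> hits, Map<String, List<String>> adjacency) {
        Set<String> nodeIds = new LinkedHashSet<>();
        for (ChunkHit hit : hits) {
            if (hit.linkedNodeId() == null) {
                continue; // free-standing chunk with no linked node
            }
            nodeIds.add(hit.linkedNodeId());
            // One-hop expansion: add the linked node's direct neighbors.
            nodeIds.addAll(adjacency.getOrDefault(hit.linkedNodeId(), List.of()));
        }
        List<String> passages = hits.stream().map(ChunkHit::content).toList();
        return new Context(passages, nodeIds);
    }
}
```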

Knowledge Model Lifecycle

1. Definition Phase

Define your domain entities and relationships:

@Configuration
public class DomainModelConfig {

    // `indexer` is the CEF indexer bean, assumed to be injected into this class.

    @PostConstruct
    public void initializeModel() {
        List<RelationType> relations = List.of(
            new RelationType("HAS_CONDITION", "Patient", "Condition",
                RelationSemantics.ATTRIBUTION, false, "...")
            // ... more relations
        );

        // Block until the relation types have been registered
        indexer.initialize(relations).block();
    }
}

2. Indexing Phase

Populate the knowledge model with data:

// Index individual nodes
Mono<Node> savedPatient = indexer.indexNode(patientInput);

// Index edges
Mono<Edge> savedEdge = indexer.indexEdge(edgeInput);

// Bulk indexing
BatchInput batch = new BatchInput(nodes, edges, chunks);
Mono<BatchIndexResult> result = indexer.batchIndex(batch);

3. Retrieval Phase

Query the knowledge model for LLM context:

RetrievalRequest request = RetrievalRequest.builder()
    .query("Find patients with diabetes on insulin therapy")
    .depth(2)  // Multi-hop traversal
    .topK(10)
    .build();

Mono<RetrievalResult> result = retriever.retrieve(request);

4. Maintenance Phase

Update and manage the knowledge model:

// Update node properties
indexer.updateNode(nodeId, Map.of("status", "inactive"));

// Delete node and edges
indexer.deleteNode(nodeId, true);  // second argument is the cascade flag

// Reindex after major changes
indexer.fullIndex();
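
To show what a cascading delete implies for the graph, here is a minimal in-memory sketch. `InMemoryGraph` is a hypothetical class, and refusing a non-cascading delete while edges still exist is an assumption; CEF's behavior for `cascade = false` is not specified here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class InMemoryGraph {
    record EdgeRef(String relationType, String sourceId, String targetId) {}

    private final Set<String> nodeIds = new HashSet<>();
    private final List<EdgeRef> edges = new ArrayList<>();

    void addNode(String id) {
        nodeIds.add(id);
    }

    void addEdge(String relationType, String sourceId, String targetId) {
        if (!nodeIds.contains(sourceId) || !nodeIds.contains(targetId)) {
            throw new IllegalArgumentException("both endpoints must exist");
        }
        edges.add(new EdgeRef(relationType, sourceId, targetId));
    }

    /**
     * Deletes a node. With cascade, every edge touching it is removed too;
     * without cascade, the delete is refused while such edges exist.
     */
    boolean deleteNode(String id, boolean cascade) {
        boolean hasEdges = edges.stream().anyMatch(e -> touches(e, id));
        if (hasEdges && !cascade) {
            return false;
        }
        edges.removeIf(e -> touches(e, id));
        return nodeIds.remove(id);
    }

    private static boolean touches(EdgeRef e, String id) {
        return e.sourceId().equals(id) || e.targetId().equals(id);
    }

    boolean containsNode(String id) {
        return nodeIds.contains(id);
    }

    int edgeCount() {
        return edges.size();
    }
}
```

Removing the edges together with the node avoids dangling references that a later traversal would otherwise follow.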

Design Principles

1. Domain Agnostic

CEF doesn't prescribe your domain model. You define:

Node labels (entity types)
Edge types (relationship types)
Property schemas

2. Flexible Schema

Nodes use JSONB properties for flexibility:

// Medical domain
Map.of("mrn", "12345", "age", 45, "gender", "M")

// E-commerce domain
Map.of("sku", "ABC123", "price", 99.99, "stock", 42)
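
Because property values are typed as `Object`, reading them back usually involves a type check. A small helper like the one below (hypothetical, not part of CEF) keeps that check in one place.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for reading values out of a flexible property map.
final class Props {
    /** Returns the property as the requested type, or empty if missing or of another type. */
    static <T> Optional<T> get(Map<String, Object> properties, String key, Class<T> type) {
        Object value = properties.get(key);
        // isInstance(null) is false, so a missing key also yields Optional.empty().
        return type.isInstance(value) ? Optional.of(type.cast(value)) : Optional.empty();
    }
}
```

For the e-commerce map above, `Props.get(props, "price", Double.class)` yields the price, while asking for `"stock"` as a `String` yields an empty `Optional` instead of a `ClassCastException`.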

3. Semantic Awareness

RelationSemantics guide intelligent traversal:

HIERARCHY: Navigate up/down organizational structures
CAUSALITY: Follow cause-effect chains
ASSOCIATION: Explore peer relationships
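
One way to picture semantics-guided traversal is a walk that follows only edges whose relation type belongs to a chosen category. The sketch below is an illustration under assumptions: `SemanticTraversal`, its trimmed-down `Semantics` enum, and the type-to-category lookup map are hypothetical, and the traversal CEF actually performs may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SemanticTraversal {
    // Subset of the RelationSemantics categories, redeclared to keep the sketch self-contained.
    enum Semantics { HIERARCHY, CAUSALITY, ASSOCIATION }

    record EdgeRef(String relationType, String sourceId, String targetId) {}

    /** Returns nodes reachable from start via edges of the wanted category, in visit order. */
    static List<String> reachable(List<EdgeRef> edges, Map<String, Semantics> semanticsByType,
                                  String start, Semantics wanted) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>(Set.of(start));
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            String current = queue.removeFirst();
            for (EdgeRef e : edges) {
                boolean matches = e.sourceId().equals(current)
                    && semanticsByType.get(e.relationType()) == wanted;
                if (matches && seen.add(e.targetId())) { // skip nodes already visited
                    order.add(e.targetId());
                    queue.addLast(e.targetId());
                }
            }
        }
        return order;
    }
}
```

Asking for `CAUSALITY` from a starting node follows a chain of `CAUSES` edges and ignores a `KNOWS` edge leaving the same node.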

4. Vectorizable Content

Separate structured properties from semantic content:

Properties: Structured data for filtering/sorting
Vectorizable content: Rich text for semantic search

Best Practices

1. Meaningful Labels

Use descriptive, domain-specific labels:

// Good
"Patient", "Medication", "Doctor"

// Bad
"Entity1", "Thing", "Item"

2. Rich Vectorizable Content

Provide detailed descriptions for better semantic search:

// Good
"45-year-old male with type 2 diabetes, hypertension, and hyperlipidemia. " +
"Currently prescribed Metformin 1000mg BID..."

// Bad
"Patient has diabetes"

3. Appropriate Granularity

Strike a balance between too fine-grained (noise) and too coarse (loss of detail):

// Too fine-grained
Node bloodPressureReading = new Node(...); // Each reading as separate node

// Too coarse
Node allPatientData = new Node(...); // Everything in one node

// Just right
Node patient = new Node(...);  // Patient entity
Chunk visitNote = new Chunk(...);  // Individual visit notes as chunks

4. Semantic Relationship Types

Choose relationship types that match your query patterns:

// Medical queries: "Find patients treated by Dr. Smith"
new RelationType("TREATS", "Doctor", "Patient", ...)

// E-commerce queries: "Find products in same category"
new RelationType("IN_CATEGORY", "Product", "Category", ...)

Next Steps

Learn about Indexing to populate your knowledge model
Explore Retrieval Strategies to query your model
See Examples for complete implementations
