
    Knowledge Graph Builder

    Build structured knowledge graphs that improve AI system performance by making relational knowledge explicit and queryable.

    Core Principle

    Knowledge graphs make implicit relationships explicit, enabling AI systems to reason about connections, verify facts, and avoid hallucinations.

    When to Use Knowledge Graphs

    Use Knowledge Graphs When:

    • ✅ Complex entity relationships are central to your domain
    • ✅ You need to verify AI-generated facts against structured knowledge
    • ✅ Semantic search and relationship traversal are required
    • ✅ Your data has rich interconnections (people, organizations, products)
    • ✅ You need to answer "how are X and Y related?" queries
    • ✅ You're building recommendation systems based on relationships
    • ✅ You're doing fraud detection or pattern recognition across connected data

    Don't Use Knowledge Graphs When:

    • ❌ The data is simple and tabular (use a relational DB)
    • ❌ Search is purely document-based (use RAG with a vector DB)
    • ❌ There are no significant relationships between entities
    • ❌ The team lacks graph modeling expertise
    • ❌ The workload is read-heavy with no traversal (use a traditional DB)

    6-Phase Knowledge Graph Implementation

    Phase 1: Ontology Design

    Goal: Define entities, relationships, and properties for your domain

    Entity Types (Nodes):

    • Person, Organization, Location, Product, Concept, Event, Document

    Relationship Types (Edges):

    • Hierarchical: IS_A, PART_OF, REPORTS_TO
    • Associative: WORKS_FOR, LOCATED_IN, AUTHORED_BY, RELATED_TO
    • Temporal: CREATED_ON, OCCURRED_BEFORE, OCCURRED_AFTER

    Properties (Attributes):

    • Node properties: id, name, type, created_at, metadata
    • Edge properties: type, confidence, source, timestamp
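
    These node and edge properties map naturally onto small data classes. A minimal sketch in Python (the Entity and Relationship shapes here are illustrative, but later snippets in this skill assume types like these):

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Entity:
        id: str
        name: str
        type: str                        # e.g. "Person", "Organization"
        created_at: Optional[str] = None
        metadata: dict = field(default_factory=dict)
        description: str = ""            # used later for embeddings

    @dataclass
    class Relationship:
        source_id: str
        target_id: str
        type: str                        # e.g. "WORKS_FOR"; validate against the ontology
        confidence: float = 1.0          # 0.0-1.0
        source: str = ""                 # provenance of the assertion
        timestamp: Optional[str] = None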

    Example Ontology:

    # RDF/Turtle format
    @prefix : <http://example.org/ontology#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    
    :Person a owl:Class ;
        rdfs:label "Person" .
    
    :Organization a owl:Class ;
        rdfs:label "Organization" .
    
    :worksFor a owl:ObjectProperty ;
        rdfs:domain :Person ;
        rdfs:range :Organization ;
        rdfs:label "works for" .
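
    To sanity-check the ontology programmatically, it can be loaded and queried with rdflib (a sketch assuming the Turtle above is saved as ontology.ttl):

    from rdflib import Graph

    g = Graph()
    g.parse("ontology.ttl", format="turtle")

    # List every declared class and its label
    results = g.query("""
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?cls ?label WHERE { ?cls a owl:Class ; rdfs:label ?label . }
    """)
    for row in results:
        print(row.cls, row.label)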
    

    Validation:

    • Entities cover all domain concepts
    • Relationships capture key connections
    • Ontology reviewed with domain experts
    • Classification hierarchy defined (is-a relationships)

    Phase 2: Graph Database Selection

    Decision Matrix:

    Neo4j (Recommended for most):

    • Pros: Mature, Cypher query language, graph algorithms, excellent visualization
    • Cons: Licensing costs for enterprise, scaling complexity
    • Use when: Complex queries, graph algorithms, team can learn Cypher

    Amazon Neptune:

    • Pros: Managed service, supports Gremlin and SPARQL, AWS integration
    • Cons: Vendor lock-in, more expensive than self-hosted
    • Use when: AWS infrastructure, need managed service, compliance requirements

    ArangoDB:

    • Pros: Multi-model (graph + document + key-value), JavaScript queries
    • Cons: Smaller community, fewer graph-specific features
    • Use when: Need document DB + graph in one system

    TigerGraph:

    • Pros: Best performance for deep traversals, parallel processing
    • Cons: Complex setup, higher learning curve
    • Use when: Massive graphs (billions of edges), real-time analytics

    Technology Stack:

    graph_database: 'Neo4j Community' # or Enterprise for production
    vector_integration: 'Pinecone' # For hybrid search
    embeddings: 'text-embedding-3-large' # OpenAI
    etl: 'Apache Airflow' # For data pipelines
    

    Neo4j Schema Setup:

    // Create constraints for uniqueness
    CREATE CONSTRAINT person_id IF NOT EXISTS
    FOR (p:Person) REQUIRE p.id IS UNIQUE;
    
    CREATE CONSTRAINT org_name IF NOT EXISTS
    FOR (o:Organization) REQUIRE o.name IS UNIQUE;
    
    // Create indexes for performance
    CREATE INDEX entity_search IF NOT EXISTS
    FOR (e:Entity) ON (e.name, e.type);
    
    CREATE INDEX relationship_type IF NOT EXISTS
    FOR ()-[r:RELATED_TO]-() ON (r.type, r.confidence);
    

    Phase 3: Entity Extraction & Relationship Building

    Goal: Extract entities and relationships from data sources

    Data Sources:

    • Structured: Databases, APIs, CSV files
    • Unstructured: Documents, web content, text files
    • Semi-structured: JSON, XML, knowledge bases

    Entity Extraction Pipeline:

    from typing import List

    class EntityExtractionPipeline:
        def __init__(self):
            # Placeholder components: back these with a spaCy or Hugging Face
            # NER model, an entity linker, and a deduplicator for your domain
            self.ner_model = load_ner_model()
            self.entity_linker = EntityLinker()
            self.deduplicator = EntityDeduplicator()
    
        def process_text(self, text: str) -> List[Entity]:
            # 1. Extract named entities
            entities = self.ner_model.extract(text)
    
            # 2. Link to existing entities (entity resolution)
            linked_entities = self.entity_linker.link(entities)
    
            # 3. Deduplicate and resolve conflicts
            resolved_entities = self.deduplicator.resolve(linked_entities)
    
            return resolved_entities
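
    The components above are placeholders. As one concrete grounding, the NER step can be backed by spaCy (a sketch assuming the en_core_web_sm model is installed; the mapping onto the Entity class from Phase 1 is illustrative):

    import spacy
    from typing import List

    nlp = spacy.load("en_core_web_sm")

    def extract_entities(text: str) -> List[Entity]:
        """Run spaCy NER and map the recognized spans onto Entity objects."""
        doc = nlp(text)
        return [
            Entity(
                id=f"{ent.label_}:{ent.text}".lower(),  # naive id; replace with real resolution
                name=ent.text,
                type=ent.label_,                        # e.g. PERSON, ORG, GPE
            )
            for ent in doc.ents
        ]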
    

    Relationship Extraction:

    import spacy

    class RelationshipExtractor:
        def __init__(self):
            # spaCy pipeline provides sentence splitting and dependency parses
            self.nlp = spacy.load("en_core_web_sm")

        def extract_relationships(self, entities: List[Entity],
                                  text: str) -> List[Relationship]:
            relationships = []

            # Use dependency parsing or an LLM for extraction
            doc = self.nlp(text)
            for sent in doc.sents:
                rels = self.extract_from_sentence(sent, entities)
                relationships.extend(rels)
    
            # Validate against ontology
            valid_relationships = self.validate_relationships(relationships)
            return valid_relationships
    

    LLM-Based Extraction (for complex relationships):

    def extract_with_llm(text: str) -> List[Relationship]:
        prompt = f"""
        Extract entities and relationships from this text:
        {text}
    
        Format: (Entity1, Relationship, Entity2, Confidence)
        Only extract factual relationships.
        """
    
        response = llm.generate(prompt)
        relationships = parse_llm_response(response)
        return relationships
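
    parse_llm_response is left undefined above. A minimal parser for the (Entity1, Relationship, Entity2, Confidence) format might look like this (a hypothetical helper; real LLM output needs more defensive handling):

    import re
    from typing import List

    def parse_llm_response(response: str) -> List[Relationship]:
        """Parse lines like: (Alice, WORKS_FOR, Acme Corp, 0.9)"""
        pattern = re.compile(r"\(([^,()]+),\s*([^,()]+),\s*([^,()]+),\s*([\d.]+)\)")
        relationships = []
        for match in pattern.finditer(response):
            subj, rel, obj, conf = (part.strip() for part in match.groups())
            relationships.append(Relationship(
                source_id=subj,
                target_id=obj,
                type=rel,
                confidence=float(conf),
                source="llm-extraction",
            ))
        return relationships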
    

    Validation:

    • Entity extraction accuracy >85%
    • Entity deduplication working
    • Relationships validated against ontology
    • Confidence scores assigned

    Phase 4: Hybrid Knowledge-Vector Architecture

    Goal: Combine structured graph with semantic vector search

    Architecture:

    class HybridKnowledgeSystem:
        def __init__(self):
            self.graph_db = Neo4jConnection()
            self.vector_db = PineconeClient()
            self.embedding_model = OpenAIEmbeddings()
    
        def store_entity(self, entity: Entity):
            # Store structured data in graph
            self.graph_db.create_node(entity)
    
            # Store embeddings in vector database
            embedding = self.embedding_model.embed(entity.description)
            self.vector_db.upsert(
                id=entity.id,
                values=embedding,
                metadata=entity.metadata
            )
    
        def hybrid_search(self, query: str, top_k: int = 10) -> SearchResults:
            # 1. Vector similarity search
            query_embedding = self.embedding_model.embed(query)
            vector_results = self.vector_db.query(
                vector=query_embedding,
                top_k=100
            )
    
            # 2. Graph traversal from vector results
            entity_ids = [r.id for r in vector_results.matches]
            graph_results = self.graph_db.get_subgraph(entity_ids, max_hops=2)
    
            # 3. Merge and rank results
            merged = self.merge_results(vector_results, graph_results)
            return merged[:top_k]
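
    merge_results is left abstract above. One simple fusion strategy is to boost vector hits that also appear in the retrieved subgraph (a sketch for the class above; the 0.2 boost is an arbitrary starting point, and graph_results.nodes is assumed to expose node ids):

    def merge_results(self, vector_results, graph_results):
        """Combine vector similarity scores with a graph-membership boost."""
        graph_ids = {node["id"] for node in graph_results.nodes}
        scored = []
        for match in vector_results.matches:
            score = match.score
            if match.id in graph_ids:
                score += 0.2  # reward entities that are connected in the subgraph
            scored.append((score, match))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [match for _, match in scored]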
    

    Benefits of Hybrid Approach:

    • Vector search: Semantic similarity, flexible queries
    • Graph traversal: Relationship-based reasoning, context expansion
    • Combined: Best of both worlds

    Phase 5: Query Patterns & API Design

    Common Query Patterns:

    1. Find Entity:

    MATCH (e:Entity {id: $entity_id})
    RETURN e
    

    2. Find Relationships:

    MATCH (source:Entity {id: $entity_id})-[r]-(target)
    RETURN source, r, target
    LIMIT 20
    

    3. Path Between Entities:

    MATCH path = shortestPath(
      (source:Person {id: $source_id})-[*..5]-(target:Person {id: $target_id})
    )
    RETURN path
    

    4. Multi-Hop Traversal:

    MATCH (p:Person {name: $name})-[:WORKS_FOR]->(o:Organization)-[:LOCATED_IN]->(l:Location)
    RETURN p.name, o.name, l.city
    

    5. Recommendation Query:

    // Find people similar to this person based on shared organizations
    MATCH (p1:Person {id: $person_id})-[:WORKS_FOR]->(o:Organization)<-[:WORKS_FOR]-(p2:Person)
    WHERE p1 <> p2
    RETURN p2, COUNT(o) AS shared_orgs
    ORDER BY shared_orgs DESC
    LIMIT 10
    

    Knowledge Graph API:

    class KnowledgeGraphAPI:
        def __init__(self, graph_db):
            self.graph = graph_db
    
        def find_entity(self, entity_name: str) -> Entity:
            """Find entity by name with fuzzy matching"""
            query = """
            MATCH (e:Entity)
            WHERE e.name CONTAINS $name
            RETURN e
            ORDER BY apoc.text.levenshteinDistance(e.name, $name)
            LIMIT 1
            """
            return self.graph.run(query, name=entity_name).single()
    
        def find_relationships(self, entity_id: str,
                               relationship_type: str = None,
                               max_hops: int = 2) -> List[Relationship]:
            """Find relationships within the specified number of hops"""
            # Restrict the pattern to a single relationship type when requested
            rel_pattern = f":{relationship_type}" if relationship_type else ""
            query = f"""
            MATCH (source:Entity {{id: $entity_id}})
            MATCH path = (source)-[r{rel_pattern}*1..{max_hops}]-(target)
            RETURN path, relationships(path) AS rels
            LIMIT 100
            """
            return self.graph.run(query, entity_id=entity_id).data()
    
        def get_subgraph(self, entity_ids: List[str],
                        max_hops: int = 2) -> Subgraph:
            """Get connected subgraph for multiple entities"""
            query = f"""
            MATCH (e:Entity)
            WHERE e.id IN $entity_ids
            CALL apoc.path.subgraphAll(e, {{maxLevel: {max_hops}}})
            YIELD nodes, relationships
            RETURN nodes, relationships
            """
            return self.graph.run(query, entity_ids=entity_ids).data()
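
    The graph.run(...) calls above match the py2neo driver's Graph.run(query, **params) signature. Wiring the API up might look like this (connection details are placeholders, and indexing into the record assumes the RETURN e shape used in find_entity):

    from py2neo import Graph

    # Placeholder credentials; point this at your own Neo4j instance
    graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
    api = KnowledgeGraphAPI(graph)

    record = api.find_entity("Apple")
    if record is not None:
        neighbors = api.find_relationships(record["e"]["id"], max_hops=2)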
    

    Phase 6: AI Integration & Hallucination Prevention

    Goal: Use knowledge graph to ground LLM responses and detect hallucinations

    Knowledge Graph RAG:

    class KnowledgeGraphRAG:
        def __init__(self, kg_api, llm_client):
            self.kg = kg_api
            self.llm = llm_client
    
        def retrieve_context(self, query: str) -> str:
            # Extract entities from query
            entities = self.extract_entities_from_query(query)
    
            # Retrieve relevant subgraph
            subgraph = self.kg.get_subgraph(
                [e.id for e in entities],
                max_hops=2
            )
    
            # Format subgraph for LLM
            context = self.format_subgraph_for_llm(subgraph)
            return context
    
        def generate_with_grounding(self, query: str) -> GroundedResponse:
            context = self.retrieve_context(query)
    
            prompt = f"""
            Context from knowledge graph:
            {context}
    
            User query: {query}
    
            Answer based only on the provided context. Include source entities.
            """
    
            response = self.llm.generate(prompt)
    
            return GroundedResponse(
                response=response,
                sources=self.extract_sources(context),
                confidence=self.calculate_confidence(response, context)
            )
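
    format_subgraph_for_llm is the piece that turns graph structure into prompt text. A common approach is to flatten the subgraph into one triple per line (a sketch, assuming the subgraph object exposes relationships with start and end nodes and dict-like property access):

    def format_subgraph_for_llm(self, subgraph) -> str:
        """Render the subgraph as 'subject -[PREDICATE]-> object' lines."""
        lines = []
        for rel in subgraph.relationships:
            lines.append(
                f"{rel.start_node['name']} -[{rel.type}]-> {rel.end_node['name']}"
                f" (confidence: {rel.get('confidence', 'n/a')},"
                f" source: {rel.get('source', 'unknown')})"
            )
        return "\n".join(lines)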
    

    Hallucination Detection:

    class HallucinationDetector:
        def __init__(self, knowledge_graph):
            self.kg = knowledge_graph
    
        def verify_claim(self, claim: str) -> VerificationResult:
            # Parse claim into (subject, predicate, object)
            parsed_claim = self.parse_claim(claim)
    
            # Query knowledge graph for evidence
            evidence = self.kg.find_evidence(
                parsed_claim.subject,
                parsed_claim.predicate,
                parsed_claim.object
            )
    
            if evidence:
                return VerificationResult(
                    is_supported=True,
                    evidence=evidence,
                    confidence=evidence.confidence
                )
    
            # Check for contradictory evidence
            contradiction = self.kg.find_contradiction(parsed_claim)
    
            return VerificationResult(
                is_supported=False,
                is_contradicted=bool(contradiction),
                contradiction=contradiction
            )
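
    parse_claim above has to turn free text into a (subject, predicate, object) triple; the LLM extraction pattern from Phase 3 can be reused (a sketch with a hypothetical ParsedClaim container, assuming the detector is also handed an LLM client):

    from dataclasses import dataclass

    @dataclass
    class ParsedClaim:
        subject: str
        predicate: str
        object: str

    def parse_claim(self, claim: str) -> ParsedClaim:
        """Ask the LLM for exactly one (Subject, Predicate, Object) triple."""
        prompt = f"""
        Extract one factual triple from this claim:
        {claim}

        Format: (Subject, Predicate, Object)
        """
        response = self.llm.generate(prompt)
        # Naive split; assumes no commas inside the fields
        subject, predicate, obj = (
            part.strip() for part in response.strip(" ()\n").split(",")
        )
        return ParsedClaim(subject, predicate, obj)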
    

    Key Principles

    1. Start with Ontology

    Define your schema before ingesting data. Changing the ontology later is expensive.

    2. Entity Resolution is Critical

    Deduplicate entities aggressively: "Apple Inc", "Apple", and "Apple Computer" should resolve to the same entity.
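
    A lightweight first pass is name normalization plus an alias table, before any heavier resolution (a sketch; the alias map is illustrative):

    ALIASES = {
        "apple": "Apple Inc",
        "apple computer": "Apple Inc",
        "apple inc.": "Apple Inc",
    }

    def canonical_name(name: str) -> str:
        """Normalize case and whitespace, then map known aliases to one canonical name."""
        normalized = " ".join(name.lower().split())
        return ALIASES.get(normalized, name.strip())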

    3. Confidence Scores on Everything

    Every relationship should have a confidence score (0.0-1.0) and source.
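
    In Neo4j this means writing the score and provenance onto the edge itself (a sketch using the same py2neo-style driver as Phase 5 and the Relationship class from Phase 1):

    def upsert_relationship(graph, rel: Relationship):
        """MERGE the edge and record confidence and provenance on it."""
        graph.run(
            """
            MATCH (s:Entity {id: $source_id}), (t:Entity {id: $target_id})
            MERGE (s)-[r:RELATED_TO {type: $type}]->(t)
            SET r.confidence = $confidence, r.source = $source
            """,
            source_id=rel.source_id, target_id=rel.target_id,
            type=rel.type, confidence=rel.confidence, source=rel.source,
        )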

    4. Incremental Building

    Don't try to model the entire domain at once. Start with core entities and expand.

    5. Hybrid Architecture Wins

    Combine graph traversal (structured) with vector search (semantic) for best results.


    Common Use Cases

    1. Question Answering:

    • Extract entities from question
    • Traverse graph to find answer
    • Return path as explanation

    2. Recommendation:

    • Find similar entities via shared relationships
    • Rank by relationship strength
    • Return top-K recommendations

    3. Fraud Detection:

    • Model transactions as graph
    • Find suspicious patterns (cycles, anomalies); see the cycle query sketched after these use cases
    • Flag for review

    4. Knowledge Discovery:

    • Identify implicit relationships
    • Suggest missing connections
    • Validate with domain experts

    5. Semantic Search:

    • Hybrid vector + graph search
    • Expand context via relationships
    • Return rich connected results
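
    For the fraud-detection case above, cycles are the classic suspicious pattern; a sketch of a cycle query run through the same driver as Phase 5 (the Account/SENT schema is hypothetical):

    def find_transaction_cycles(graph, max_len: int = 5):
        """Flag money that flows back to its origin within a few hops."""
        return graph.run(
            f"""
            MATCH path = (a:Account)-[:SENT*3..{max_len}]->(a)
            RETURN path LIMIT 25
            """
        ).data()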

    Technology Recommendations

    For MVPs (<10K entities):

    • Neo4j Community Edition (free)
    • SQLite for metadata
    • OpenAI embeddings
    • FastAPI for API layer
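
    For the MVP API layer, a thin FastAPI wrapper over the Phase 5 query API is usually enough (a sketch; the endpoint shape and the api instance from Phase 5 are illustrative):

    from fastapi import FastAPI, HTTPException

    app = FastAPI()

    @app.get("/entities/{name}")
    def get_entity(name: str):
        """Look up an entity by (fuzzy) name via the knowledge graph API."""
        record = api.find_entity(name)
        if record is None:
            raise HTTPException(status_code=404, detail="Entity not found")
        return dict(record["e"])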

    For Production (10K-1M entities):

    • Neo4j Enterprise or ArangoDB
    • Pinecone for vector search
    • Airflow for ETL
    • GraphQL API

    For Enterprise (1M+ entities):

    • Neo4j Enterprise or TigerGraph
    • Distributed vector DB (Pinecone, Weaviate)
    • Kafka for streaming
    • Kubernetes deployment

    Validation Checklist

    • Ontology designed and validated with domain experts
    • Graph database selected and set up
    • Entity extraction pipeline tested (>85% accuracy)
    • Relationship extraction validated
    • Hybrid search (graph + vector) implemented
    • Query API created and documented
    • AI integration tested (RAG or hallucination detection)
    • Performance benchmarks met (query <100ms for common patterns)
    • Data quality monitoring in place
    • Backup and recovery tested

    Related Resources

    Related Skills:

    • rag-implementer - For hybrid KG+RAG systems
    • multi-agent-architect - For knowledge-graph-powered agents
    • api-designer - For KG API design

    Related Patterns:

    • META/DECISION-FRAMEWORK.md - Graph DB selection
    • STANDARDS/architecture-patterns/knowledge-graph-pattern.md - KG architectures (when created)

    Related Playbooks:

    • PLAYBOOKS/deploy-neo4j.md - Neo4j deployment (when created)
    • PLAYBOOKS/build-kg-rag-system.md - KG-RAG integration (when created)