knowledge-base-manager

daffy0208/knowledge-base-manager

Knowledge Base Manager

Build and maintain high-quality knowledge bases for AI systems and human consumption.

Core Principle

Knowledge Base = Structured Information + Quality Curation + Accessibility

A knowledge base is not just a data dump. It is curated, validated, versioned information designed to answer questions and enable reasoning.

When to Use Knowledge Bases

Use Knowledge Bases When:

✅ Need to answer factual questions consistently
✅ Information changes frequently and needs version control
✅ Multiple sources need to be unified and reconciled
✅ Provenance and citation tracking is critical
✅ Building AI systems that need grounded, verifiable information
✅ Organizational knowledge needs to be preserved and searchable
✅ Complex domain with interconnected concepts

Don't Use Knowledge Bases When:

❌ Static documentation is sufficient (use docs + search)
❌ No one will maintain/update it (knowledge rot guaranteed)
❌ Simple FAQ covers all questions (<50 items)
❌ Information doesn't change (static site faster/cheaper)
❌ Team lacks resources for curation

Knowledge Base Types: Decision Framework

1. Document-Based Knowledge Base (RAG)

What it is: Collection of documents, chunked and embedded for semantic search

Best for:

Technical documentation
Support articles, FAQs
Policy documents
Research papers
Blog content
User manuals

Strengths:

Easy to add new documents
Preserves full context
Natural for text-heavy content

Weaknesses:

Hard to query relationships ("Who works where?")
Duplicate information across documents
Difficult to keep facts consistent

Use: rag-implementer skill + vector-database-mcp

2. Entity-Based Knowledge Base (Knowledge Graph)

What it is: Network of entities (people, places, things) connected by relationships

Best for:

Organizational charts
Product catalogs with relationships
Social networks
Recommendation systems
Fraud detection
Supply chain tracking

Strengths:

Excellent for "how are X and Y related?" queries
Consistent facts (one source of truth)
Powerful traversal ("friends of friends")

Weaknesses:

Upfront modeling required (ontology design)
Harder to add unstructured information
Learning curve for graph queries

Use: knowledge-graph-builder skill + graph-database-mcp

3. Hybrid Knowledge Base (RAG + Graph)

What it is: Documents for unstructured knowledge + Graph for structured entities/relationships

Best for:

Enterprise knowledge management
Research with citations and relationships
Medical systems (documents + patient/drug relationships)
Legal systems (cases + precedents + entities)
E-commerce (products + specs + relationships)

Strengths:

Best of both worlds
Flexible for different knowledge types
Rich querying capabilities

Weaknesses:

Most complex to build and maintain
Requires expertise in both RAG and graphs
Higher infrastructure costs

Use: Both rag-implementer + knowledge-graph-builder skills

Decision Tree: Which KB Type?

What kind of knowledge do you have?

├─ Mostly unstructured text (docs, articles, content)?
│  └─ Document-Based KB (RAG)
│     Use: rag-implementer skill
│
├─ Mostly structured entities with relationships?
│  └─ Entity-Based KB (Graph)
│     Use: knowledge-graph-builder skill
│
└─ Mix of both?
   └─ Hybrid KB (RAG + Graph)
      Use: Both skills + this skill for integration
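
The decision tree above can be sketched as a small function. This is only an illustration: the `KnowledgeProfile` shape, the `chooseKBType` name, and the 0.7 cutoff for "mostly" are assumptions, not values this skill prescribes.

```typescript
type KBType = 'document' | 'entity' | 'hybrid'

// Hypothetical input: rough share of the corpus that is free text.
interface KnowledgeProfile {
  unstructuredShare: number // 0..1, fraction of knowledge that is unstructured text
}

// Assumed cutoff for "mostly"; tune it for your own corpus.
const MOSTLY = 0.7

function chooseKBType(profile: KnowledgeProfile): KBType {
  const share = profile.unstructuredShare
  if (share < 0 || share > 1) {
    throw new RangeError('unstructuredShare must be between 0 and 1')
  }
  if (share >= MOSTLY) return 'document' // mostly text: RAG
  if (share <= 1 - MOSTLY) return 'entity' // mostly entities: graph
  return 'hybrid' // a mix of both
}
```

In practice the choice also depends on the query patterns you expect, so treat the numeric cutoff as a conversation starter rather than a rule.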

6-Phase Knowledge Base Implementation

Phase 1: Knowledge Audit & Architecture

Goal: Understand what knowledge exists and how to structure it

Actions:

Inventory existing knowledge sources
- Internal: databases, documents, wikis, Slack, emails
- External: public data, APIs, third-party sources
- Tribal: SME interviews, recorded conversations
Classify knowledge types
- Factual: Verifiable facts ("Product X costs $50")
- Procedural: How-to knowledge ("How to deploy")
- Conceptual: Definitions and explanations
- Relationship: Connections between entities
Choose KB architecture
- Document-based? Entity-based? Hybrid?
- Decision: Use framework above
Define knowledge schema
- For documents: metadata fields (source, date, author, category)
- For entities: ontology (entity types, relationship types, properties)
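
For the document side of the schema, the metadata fields listed above (source, date, author, category) could be checked at ingestion time. The following is a minimal sketch; the `DocMetadata` and `validateMetadata` names and the ISO date format are assumptions made for illustration.

```typescript
// Metadata fields named in the schema step above.
interface DocMetadata {
  source: string
  date: string // assumed to be an ISO 8601 date such as "2024-05-01"
  author: string
  category: string
}

const REQUIRED_FIELDS: (keyof DocMetadata)[] = ['source', 'date', 'author', 'category']

// Returns a list of problems; an empty list means the record passes.
function validateMetadata(record: Partial<DocMetadata>): string[] {
  const problems: string[] = []
  for (const field of REQUIRED_FIELDS) {
    const value = record[field]
    if (value === undefined || value.trim() === '') {
      problems.push(`missing ${field}`)
    }
  }
  const date = record.date
  if (date !== undefined && date.trim() !== '' && Number.isNaN(Date.parse(date))) {
    problems.push('date is not parseable')
  }
  return problems
}
```

Returning a list of problems rather than throwing lets a curation pipeline report every issue with a record in one pass.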

Validation:

All knowledge sources inventoried and prioritized
KB architecture chosen and justified
Schema defined and validated with users
Success metrics established

Phase 2: Knowledge Curation & Ingestion

Goal: Transform raw information into high-quality knowledge

Actions:

Extract knowledge from sources
- Automated: scraping, API ingestion, file parsing
- Manual: expert input, annotation, validation
Clean and normalize
- Remove duplicates
- Standardize formats
- Fix inconsistencies
- Enrich with metadata
Structure knowledge
- For documents: chunk intelligently (semantic boundaries)
- For entities: extract entities, relationships, properties
Add provenance
- Source URL or reference
- Last updated timestamp
- Author/contributor
- Confidence score (if applicable)
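
The clean-up and provenance steps above might look like the sketch below. It keeps one entry per normalized text and prefers the most recently updated copy. The `RawEntry` and `Provenance` shapes and the "newest wins" rule are assumptions; this only catches exact duplicates after normalization, not paraphrases.

```typescript
// Provenance fields follow the list above; confidence is optional.
interface Provenance {
  source: string
  updatedAt: string // ISO 8601 UTC timestamp, so string comparison orders by time
  author: string
  confidence?: number // 0..1
}

interface RawEntry {
  content: string
  provenance: Provenance
}

// Normalize text so trivially different copies compare equal.
function normalizeContent(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, ' ')
}

// Keep one entry per normalized content, preferring the most recently updated copy.
function deduplicate(entries: RawEntry[]): RawEntry[] {
  const byKey = new Map<string, RawEntry>()
  for (const entry of entries) {
    const key = normalizeContent(entry.content)
    const existing = byKey.get(key)
    if (!existing || entry.provenance.updatedAt > existing.provenance.updatedAt) {
      byKey.set(key, entry)
    }
  }
  return Array.from(byKey.values())
}
```

Because provenance travels with each entry, the surviving copy still records where it came from and when it was last updated.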

Curation Best Practices:

Single Source of Truth: One canonical answer per question
Deduplication: Merge similar knowledge entries
Conflict Resolution: When sources disagree, establish priority rules
Metadata Richness: More metadata = better filtering and search
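
One way to express the priority rules mentioned under Conflict Resolution is a ranked list of sources. In this sketch the source names, the `Claim` shape, and the ranking itself are hypothetical; unlisted sources rank last.

```typescript
// Hypothetical source tiers; a lower number wins. Define your own ranking.
const SOURCE_PRIORITY: Record<string, number> = {
  'official-docs': 0,
  'internal-wiki': 1,
  'chat-export': 2,
}

interface Claim {
  fact: string // key identifying the question being answered
  value: string
  source: string
}

// Sources missing from the table rank below every listed source.
function rankOf(claim: Claim): number {
  return SOURCE_PRIORITY[claim.source] ?? Number.MAX_SAFE_INTEGER
}

// For each fact, keep the claim from the highest-priority source.
function resolveConflicts(claims: Claim[]): Map<string, Claim> {
  const winners = new Map<string, Claim>()
  for (const claim of claims) {
    const current = winners.get(claim.fact)
    if (!current || rankOf(claim) < rankOf(current)) {
      winners.set(claim.fact, claim)
    }
  }
  return winners
}
```

The result holds exactly one canonical claim per fact, which is what the Single Source of Truth practice asks for.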

Validation:

Knowledge extracted and structured
Quality metrics above threshold (accuracy >95%)
Provenance tracked for all entries
Sample queries return relevant results

Phase 3: Storage & Retrieval Setup

Goal: Implement technical infrastructure for knowledge access

Architecture Patterns:

For Document-Based KB:

// Vector database for semantic search
interface DocumentKB {
  store: 'Pinecone' | 'Weaviate' | 'pgvector'
  chunks: {
    content: string
    embedding: number[]
    metadata: {
      source: string
      title: string
      updated_at: string
      category: string
    }
  }[]
}

For Entity-Based KB:

// Graph database for relationship queries
interface EntityKB {
  store: 'Neo4j' | 'ArangoDB'
  nodes: {
    id: string
    type: 'Person' | 'Organization' | 'Product' | 'Concept'
    properties: Record<string, any>
  }[]
  relationships: {
    from: string
    to: string
    type: string
    properties: Record<string, any>
  }[]
}

For Hybrid KB:

// Both vector DB + graph DB
interface HybridKB {
  vectorDB: DocumentKB
  graphDB: EntityKB
  linker: {
    // Links documents to entities mentioned in them
    linkDocumentToEntities(docId: string): string[]
    // Links entities to documents that mention them
    linkEntityToDocuments(entityId: string): string[]
  }
}
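
As a rough illustration of the linking layer, the sketch below keeps a two-way index in memory. `InMemoryLinker` and `addMention` are hypothetical names; a production system would persist these links and populate them from entity extraction.

```typescript
// In-memory stand-in for the hybrid KB linking layer.
class InMemoryLinker {
  private docToEntities = new Map<string, Set<string>>()
  private entityToDocs = new Map<string, Set<string>>()

  // Adds value to the set stored under key, creating the set if needed.
  private static add(index: Map<string, Set<string>>, key: string, value: string): void {
    const bucket = index.get(key) ?? new Set<string>()
    bucket.add(value)
    index.set(key, bucket)
  }

  // Record that a document mentions an entity (indexed in both directions).
  addMention(docId: string, entityId: string): void {
    InMemoryLinker.add(this.docToEntities, docId, entityId)
    InMemoryLinker.add(this.entityToDocs, entityId, docId)
  }

  // Entities mentioned in a document.
  linkDocumentToEntities(docId: string): string[] {
    const ids = this.docToEntities.get(docId)
    return ids ? Array.from(ids) : []
  }

  // Documents that mention an entity.
  linkEntityToDocuments(entityId: string): string[] {
    const ids = this.entityToDocs.get(entityId)
    return ids ? Array.from(ids) : []
  }
}
```

Indexing both directions on write keeps both lookups cheap, at the cost of storing each link twice.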

Actions:

Choose database(s)
- Document: Pinecone, Weaviate, pgvector
- Entity: Neo4j, ArangoDB
- Hybrid: Both + linking layer
Implement search/query layer
- Vector similarity search (for documents)
- Graph traversal (for entities)
- Hybrid queries (combining both)
Add caching and optimization
- Cache frequent queries
- Optimize for common access patterns
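
To make the vector similarity step concrete, here is a brute-force sketch using cosine similarity over in-memory chunks. The `Chunk` shape and function names are assumptions; a vector database replaces the linear scan with an approximate index.

```typescript
interface Chunk {
  content: string
  embedding: number[]
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0 // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Returns the k chunks most similar to the query embedding, best first.
function searchChunks(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk)
}
```

A linear scan like this is fine for prototyping on a few thousand chunks; beyond that, the databases listed above are the better fit.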

Validation:

Database deployed and accessible
Search/query functionality working
Performance meets requirements (<100ms for most queries)

Phase 4: Quality Control & Validation

Goal: Ensure knowledge base accuracy and reliability

Quality Metrics:

Accuracy: % of correct answers to test questions
Coverage: % of user questions answerable
Freshness: Average age of knowledge
Consistency: % of entries free of conflicts/contradictions
Source Quality: % from authoritative sources

Validation Strategies:

1. Test Question Sets

Create 100+ test questions with known correct answers:

interface TestQuestion {
  question: string
  expected_answer: string
  category: string
  difficulty: 'easy' | 'medium' | 'hard'
}
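
A test set like this could be scored with a small harness. The sketch below assumes a hypothetical `AnswerFn` that queries the knowledge base, and it grades by exact match after trimming and lowercasing, which is a simplification; real graders are usually fuzzier.

```typescript
interface TestQuestion {
  question: string
  expected_answer: string
  category: string
  difficulty: 'easy' | 'medium' | 'hard'
}

// Hypothetical answer function; in practice this would query the knowledge base.
type AnswerFn = (question: string) => string

// Exact match after normalization is a simplifying assumption.
function normalizeAnswer(text: string): string {
  return text.trim().toLowerCase()
}

// Returns accuracy as a percentage (0-100) over the test set.
function evaluateAccuracy(questions: TestQuestion[], answer: AnswerFn): number {
  if (questions.length === 0) return 0
  const correct = questions.filter(
    (q) => normalizeAnswer(answer(q.question)) === normalizeAnswer(q.expected_answer)
  ).length
  return (correct / questions.length) * 100
}
```

Grouping the same calculation by `category` or `difficulty` shows where the knowledge base is weakest.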

2. Human Review

Sample random knowledge entries
Subject matter expert validation
User feedback loops

3. Automated Checks

Duplicate Detection: Find near-identical entries
Conflict Detection: Find contradictory facts
Staleness Detection: Flag outdated information
Citation Validation: Verify sources still exist
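
Two of these checks are simple enough to sketch. The staleness check flags entries older than a chosen age, and the near-duplicate check uses Jaccard similarity over word sets as a stand-in for fuzzy matching. The names, the `DatedEntry` shape, and the choice of Jaccard are assumptions for illustration.

```typescript
interface DatedEntry {
  id: string
  updated_at: string // ISO 8601 timestamp
}

const MS_PER_DAY = 24 * 60 * 60 * 1000

// Returns IDs of entries not updated within maxAgeDays of `now`.
function findStaleEntries(entries: DatedEntry[], maxAgeDays: number, now: Date): string[] {
  return entries
    .filter((e) => (now.getTime() - Date.parse(e.updated_at)) / MS_PER_DAY > maxAgeDays)
    .map((e) => e.id)
}

// Jaccard similarity over lowercase word sets: |A ∩ B| / |A ∪ B|.
function jaccard(a: string, b: string): number {
  const wordsA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean))
  const wordsB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean))
  if (wordsA.size === 0 && wordsB.size === 0) return 1
  let shared = 0
  wordsA.forEach((w) => {
    if (wordsB.has(w)) shared++
  })
  return shared / (wordsA.size + wordsB.size - shared)
}
```

Pairs scoring above a chosen similarity threshold can be queued for human review rather than merged automatically.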

4. Continuous Monitoring

interface KBHealthMetrics {
  accuracy_score: number // 0-100, % of test questions answered correctly
  coverage_score: number // 0-100, % of user questions answered
  freshness_score: number // average days since last update (lower is better)
  consistency_score: number // 0-100, % of entries with no conflicts
  user_satisfaction: number // feedback rating
}
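
A monitoring job could compare these metrics against targets and report which ones fall short. The sketch below uses the targets stated in this document (accuracy >90%, coverage >80%, average age <30 days, <5% conflicts, read here as consistency >95%). The `failingMetrics` name is an assumption, and `user_satisfaction` is left unchecked because no rating scale is specified.

```typescript
interface KBHealthMetrics {
  accuracy_score: number // 0-100
  coverage_score: number // 0-100
  freshness_score: number // average days since update
  consistency_score: number // 0-100, % of entries with no conflicts
  user_satisfaction: number // feedback rating (scale not specified)
}

// Returns the names of metrics that miss their target.
function failingMetrics(m: KBHealthMetrics): string[] {
  const failures: string[] = []
  if (m.accuracy_score <= 90) failures.push('accuracy_score')
  if (m.coverage_score <= 80) failures.push('coverage_score')
  if (m.freshness_score >= 30) failures.push('freshness_score')
  if (m.consistency_score <= 95) failures.push('consistency_score')
  return failures
}
```

An empty result means every checked metric meets its target; anything else is a candidate for an alert.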

Actions:

Run test question validation (target: >90% accuracy)
Conduct human review (sample 10% of entries)
Fix detected issues (duplicates, conflicts, staleness)
Establish monitoring dashboards

Validation:

Accuracy >90% on test questions
Coverage >80% of user questions
<5% conflicting information
Monitoring dashboard operational

Phase 5: Versioning & Evolution

Goal: Track knowledge changes over time and enable rollback

Why Versioning Matters:

Knowledge changes (facts update, policies change)
Need audit trail (who changed what when)
Rollback capability (undo bad updates)
Historical queries ("What was policy on X in 2023?")

Versioning Strategies:

1. Snapshot Versioning

interface KnowledgeEntry {
  id: string
  content: string
  version: number
  created_at: string
  updated_at: string
  updated_by: string
  changelog: string
  previous_version?: string // ID of prior version
}
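
The snapshot approach can be sketched with an in-memory store in which every update writes a new record that points back at the one it replaces. `SnapshotStore`, its method names, and the sequential `v1`, `v2` IDs are hypothetical.

```typescript
interface KnowledgeEntry {
  id: string
  content: string
  version: number
  created_at: string
  updated_at: string
  updated_by: string
  changelog: string
  previous_version?: string // ID of prior version
}

class SnapshotStore {
  private entries = new Map<string, KnowledgeEntry>()
  private counter = 0

  create(content: string, author: string, now: string): KnowledgeEntry {
    return this.save({
      content, version: 1, created_at: now, updated_at: now,
      updated_by: author, changelog: 'initial version',
    })
  }

  // Writes a new snapshot that links back to the one it replaces.
  update(prevId: string, content: string, author: string, changelog: string, now: string): KnowledgeEntry {
    const prev = this.get(prevId)
    return this.save({
      content, version: prev.version + 1, created_at: prev.created_at, updated_at: now,
      updated_by: author, changelog, previous_version: prev.id,
    })
  }

  // Returns the prior snapshot; the bad version stays in the store for audit.
  rollback(id: string): KnowledgeEntry {
    const current = this.get(id)
    if (!current.previous_version) throw new Error(`entry ${id} has no previous version`)
    return this.get(current.previous_version)
  }

  get(id: string): KnowledgeEntry {
    const entry = this.entries.get(id)
    if (!entry) throw new Error(`unknown entry ${id}`)
    return entry
  }

  private save(fields: Omit<KnowledgeEntry, 'id'>): KnowledgeEntry {
    const entry: KnowledgeEntry = { id: `v${++this.counter}`, ...fields }
    this.entries.set(entry.id, entry)
    return entry
  }
}
```

Because old snapshots are never overwritten, rollback is a lookup rather than a restore, and the audit trail stays intact.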

2. Event Sourcing

interface KnowledgeEvent {
  event_id: string
  entity_id: string
  event_type: 'created' | 'updated' | 'deleted'
  timestamp: string
  changes: {
    field: string
    old_value: any
    new_value: any
  }[]
  author: string
}
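
With event sourcing, a historical query such as "What was policy on X in 2023?" becomes a replay of events up to a point in time. The sketch below assumes timestamps are ISO 8601 strings in UTC, so they sort correctly as plain strings; `stateAt` and `EntityState` are hypothetical names.

```typescript
interface KnowledgeEvent {
  event_id: string
  entity_id: string
  event_type: 'created' | 'updated' | 'deleted'
  timestamp: string
  changes: {
    field: string
    old_value: any
    new_value: any
  }[]
  author: string
}

// null means the entity does not exist at that time (not yet created, or deleted).
type EntityState = Record<string, unknown> | null

// Rebuilds an entity's fields as of `asOf` by replaying its events in time order.
function stateAt(events: KnowledgeEvent[], entityId: string, asOf: string): EntityState {
  const relevant = events
    .filter((e) => e.entity_id === entityId && e.timestamp <= asOf)
    .sort((a, b) => (a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0))
  let state: EntityState = null
  for (const e of relevant) {
    if (e.event_type === 'deleted') {
      state = null
      continue
    }
    const next: Record<string, unknown> = { ...(state ?? {}) }
    for (const change of e.changes) {
      next[change.field] = change.new_value
    }
    state = next
  }
  return state
}
```

Replaying from the start is simple but grows slower as the log grows; periodic snapshots are a common way to bound the replay cost.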

3. Git-Style Versioning

Treat knowledge like code
Commit-based changes
Branch for experimental knowledge
Merge when validated

Actions:

Implement version tracking
Add changelog for all updates
Create rollback mechanism
Build version comparison tools

Validation:

All changes tracked with versions
Rollback tested and working
Historical queries supported
Audit trail complete

Phase 6: Maintenance & Governance

Goal: Keep knowledge base healthy long-term

Maintenance Tasks:

Daily:

Monitor for errors and failures
Review user feedback
Address urgent corrections

Weekly:

Review new content submissions
Update time-sensitive knowledge
Run automated quality checks

Monthly:

Audit knowledge freshness
Review and resolve conflicts
Analyze usage patterns
Update stale content

Quarterly:

Comprehensive quality audit
Schema/ontology review
Performance optimization
User satisfaction survey

Governance Framework:

1. Roles & Responsibilities

Knowledge Owners: Domain experts responsible for content
Curators: Review and approve changes
Contributors: Submit new knowledge
Consumers: Use knowledge and provide feedback

2. Change Process

Submit → Review → Approve → Publish → Monitor
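
The change process above can be enforced as a small state machine so that no change skips review. The stage names and the `advance` function are assumptions, as is the extra transition that sends a change from review back to its contributor.

```typescript
type ChangeStage = 'submitted' | 'in_review' | 'approved' | 'published' | 'monitored'

// Allowed transitions, mirroring Submit -> Review -> Approve -> Publish -> Monitor.
// Returning a change from review to 'submitted' is an added assumption.
const NEXT_STAGES: Record<ChangeStage, ChangeStage[]> = {
  submitted: ['in_review'],
  in_review: ['approved', 'submitted'],
  approved: ['published'],
  published: ['monitored'],
  monitored: [],
}

// Returns the new stage, or throws if the transition is not allowed.
function advance(current: ChangeStage, next: ChangeStage): ChangeStage {
  if (NEXT_STAGES[current].indexOf(next) === -1) {
    throw new Error(`cannot move from ${current} to ${next}`)
  }
  return next
}
```

Keeping the allowed transitions in one table makes the governance rule easy to audit and to change.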

3. Quality Standards

Minimum source quality requirements
Citation requirements
Update frequency requirements
Conflict resolution process

Actions:

Establish maintenance schedule
Assign roles and responsibilities
Create governance documentation
Train team on processes

Validation:

Maintenance schedule in place
Governance documented and communicated
Team trained on processes
Quality trending upward

Knowledge Base Anti-Patterns

❌ Anti-Pattern 1: Data Dump Without Curation

Problem: Ingesting everything without quality filtering

Impact: Low signal-to-noise ratio, poor search results, user frustration

Solution: Curate before ingesting. Quality > Quantity

❌ Anti-Pattern 2: No Version Control

Problem: Knowledge changes but no history tracked

Impact: Can't audit changes, can't rollback errors, no accountability

Solution: Implement the versioning described in Phase 5

❌ Anti-Pattern 3: Stale Knowledge

Problem: Knowledge base outdated but no one knows

Impact: AI systems hallucinate using old facts, users get wrong answers

Solution: Freshness monitoring + scheduled updates

❌ Anti-Pattern 4: Duplicate Information

Problem: Same fact in multiple places, becomes inconsistent

Impact: Conflicting answers, confused users

Solution: Deduplication + single source of truth

❌ Anti-Pattern 5: No Provenance

Problem: Knowledge without source citations

Impact: Can't verify accuracy, can't trace errors

Solution: Always track source + timestamp + author

Integration with Other Skills

With rag-implementer

Use for document-based portion of hybrid KB
Follow RAG implementation phases
Integrate vector search with KB queries

With knowledge-graph-builder

Use for entity-based portion of hybrid KB
Follow graph design patterns
Integrate graph traversal with KB queries

With data-engineer

For ETL pipelines (extract, transform, load knowledge)
For data quality monitoring
For performance optimization

With quality-auditor

For automated quality checks
For testing and validation
For continuous monitoring

With technical-writer

For knowledge documentation
For user guides on KB usage
For governance documentation

Tools & Technologies

Document-Based KB Stack

Vector DB: Pinecone, Weaviate, pgvector
Embeddings: OpenAI, Cohere, custom
Search: Semantic + keyword hybrid

Entity-Based KB Stack

Graph DB: Neo4j, ArangoDB
Query: Cypher, AQL
Visualization: Neo4j Bloom, Gephi

Curation Tools

Deduplication: Custom algorithms, fuzzy matching
Conflict Detection: Rule-based, ML-based
Validation: Test question sets, human review

Monitoring

Metrics: Custom dashboard (Grafana)
Logging: Structured logging of queries/updates
Alerts: Freshness, accuracy, error rate alerts

Success Metrics

Knowledge Quality

Accuracy: >90% on test questions
Coverage: >80% of user questions answered
Freshness: <30 days average age
Consistency: <5% conflicting information

User Satisfaction

Relevance: >85% query results rated relevant
Usefulness: >80% users find KB valuable
Speed: <100ms median query time

Operational Health

Uptime: >99.9%
Update frequency: Weekly minimum
Team engagement: Regular contributions

Common Pitfalls & Solutions

Pitfall 1: "Build it and they will come"

Problem: No user validation, KB doesn't meet needs

Solution: Start with user research, validate continuously

Pitfall 2: Perfectionism

Problem: Waiting to launch until KB is "perfect"

Solution: Launch with 80% coverage, iterate based on usage

Pitfall 3: Over-engineering

Problem: Building complex hybrid system when simple docs would work

Solution: Start simple, add complexity only when needed

Pitfall 4: Maintenance neglect

Problem: Build once, never update

Solution: Establish maintenance schedule from day 1

Quick Start Checklist

Before you start:

Read this entire skill
Review rag-implementer if using document KB
Review knowledge-graph-builder if using entity KB
Have clear use case and success metrics

Phase 1 - Architecture (Week 1):

Inventory knowledge sources
Choose KB type (document/entity/hybrid)
Define schema/ontology
Set up infrastructure

Phase 2 - Initial Build (Weeks 2-3):

Ingest and curate initial knowledge
Implement search/query functionality
Create test question set
Validate with users

Phase 3 - Iterate (Ongoing):

Add more knowledge based on usage
Monitor quality metrics
Fix issues as discovered
Establish maintenance cadence

Related Resources

Skills: rag-implementer, knowledge-graph-builder, data-engineer, quality-auditor
MCPs: vector-database-mcp, graph-database-mcp, knowledge-base-mcp, semantic-search-mcp
Patterns: STANDARDS/architecture-patterns/rag-pattern.md, knowledge-base-pattern.md (coming soon)
Integrations: INTEGRATIONS/pinecone/, INTEGRATIONS/graph-databases/neo4j/
