    Vector Databases

    When to Use This Skill

    Use this skill when:

    • Choosing between vector database options
    • Designing semantic/similarity search systems
    • Optimizing vector search performance
    • Understanding ANN algorithm trade-offs
    • Scaling vector search infrastructure
    • Implementing hybrid search (vectors + filters)

    Keywords: vector database, embeddings, vector search, similarity search, ANN, approximate nearest neighbor, HNSW, IVF, FAISS, Pinecone, Weaviate, Milvus, Qdrant, Chroma, pgvector, cosine similarity, semantic search

    Vector Database Comparison

    Managed Services

| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Pinecone | Fully managed, easy scaling, enterprise features | Vendor lock-in, cost at scale | Enterprise production |
| Weaviate Cloud | GraphQL, hybrid search, modules | Complexity | Knowledge graphs |
| Zilliz Cloud | Milvus-based, high performance | Learning curve | High-scale production |
| MongoDB Atlas Vector | Integrates with existing MongoDB data | Newer feature | MongoDB shops |
| Elastic Vector | Integrates with existing Elastic stack | Resource heavy | Search platforms |

    Self-Hosted Options

| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Milvus | Feature-rich, scalable, GPU support | Operational complexity | Large-scale production |
| Qdrant | Rust performance, strong filtering, easy to run | Smaller ecosystem | Performance-focused |
| Weaviate | Modules, semantic features, hybrid search | Memory usage | Knowledge applications |
| Chroma | Simple, Python-native | Limited scale | Development, prototyping |
| pgvector | PostgreSQL extension | Performance limits | Postgres shops |
| FAISS | Library (not a DB), fastest | No persistence, no filtering | Research, embedded |

    Selection Decision Tree

    Need managed, don't want operations?
    ├── Yes → Pinecone (simplest) or Weaviate Cloud
    └── No (self-hosted)
        └── Already using PostgreSQL?
            ├── Yes, <1M vectors → pgvector
            └── No
                └── Need maximum performance at scale?
                    ├── Yes → Milvus or Qdrant
                    └── No
                        └── Prototyping/development?
                            ├── Yes → Chroma
                            └── No → Qdrant (balanced choice)
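
For the prototyping branch above, a minimal Chroma sketch looks like this (assumes `pip install chromadb`; the collection name and documents are illustrative, and Chroma's bundled default model computes the embeddings):

```python
import chromadb

client = chromadb.Client()  # in-memory instance, good for development

collection = client.create_collection("docs")
collection.add(
    ids=["1", "2", "3"],
    documents=["wireless earbuds", "bluetooth speaker", "laptop stand"],
    metadatas=[{"category": "audio"}, {"category": "audio"}, {"category": "office"}],
)

# Embeds the query text and returns the nearest stored documents.
results = collection.query(query_texts=["headphones"], n_results=2)
print(results["ids"], results["distances"])
```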
    

    ANN Algorithms

    Algorithm Overview

    Exact KNN:
    • Search ALL vectors
    • O(n) time complexity
    • Perfect accuracy
    • Impractical at scale
    
    Approximate Nearest Neighbor (ANN):
    • Search a SUBSET of vectors
    • Sublinear complexity, typically O(log n)
    • Near-perfect accuracy (tunable recall)
    • Practical at any scale
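
As a concrete baseline, exact KNN is just one dot product per stored vector. A minimal numpy sketch (sizes illustrative) shows why it is fine at 100K vectors but scales linearly beyond:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(100_000, 384)).astype("float32")
db /= np.linalg.norm(db, axis=1, keepdims=True)  # unit vectors: cosine == dot

query = db[42]                        # stand-in for an embedded query

scores = db @ query                   # O(n): one dot product per vector
top10 = np.argsort(-scores)[:10]      # exact top-10, highest score first
print(top10)
```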
    

    HNSW (Hierarchical Navigable Small World)

    Layer 3: ○───────────────────────○  (sparse, long connections)
              │                       │
    Layer 2: ○───○───────○───────○───○  (medium density)
              │   │       │       │   │
    Layer 1: ○─○─○─○─○─○─○─○─○─○─○─○─○  (denser)
              │││││││││││││││││││││││
    Layer 0: ○○○○○○○○○○○○○○○○○○○○○○○○○  (all vectors)
    
    Search: Start at top layer, greedily descend
    • Fast: O(log n) search time
    • High recall: >95% typically
    • Memory: Extra graph storage
    

    HNSW Parameters:

| Parameter | Description | Trade-off |
|---|---|---|
| M | Connections per node | Memory vs. recall |
| ef_construction | Build-time search width | Build time vs. recall |
| ef_search | Query-time search width | Latency vs. recall |
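
A sketch wiring up exactly these three parameters in FAISS (assumes `pip install faiss-cpu`; the data and values are illustrative, not tuned recommendations):

```python
import faiss
import numpy as np

d = 384
xb = np.random.rand(100_000, d).astype("float32")

M = 32                                  # connections per node
index = faiss.IndexHNSWFlat(d, M)
index.hnsw.efConstruction = 200         # build-time search width
index.add(xb)                           # graph is built as vectors are added

index.hnsw.efSearch = 64                # query-time search width (recall knob)
distances, ids = index.search(xb[:5], 10)
print(ids.shape)                        # (5, 10)
```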

    IVF (Inverted File Index)

    Clustering Phase:
    ┌─────────────────────────────────────────┐
    │     Cluster vectors into K centroids    │
    │                                         │
    │    ●         ●         ●         ●     │
    │   /│\       /│\       /│\       /│\    │
    │  ○○○○○     ○○○○○     ○○○○○     ○○○○○   │
    │ Cluster 1  Cluster 2 Cluster 3 Cluster 4│
    └─────────────────────────────────────────┘
    
    Search Phase:
    1. Find nprobe nearest centroids
    2. Search only those clusters
    3. Much faster than exhaustive search
    

    IVF Parameters:

| Parameter | Description | Trade-off |
|---|---|---|
| nlist | Number of clusters | Build time vs. search quality |
| nprobe | Clusters to search | Latency vs. recall |
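
The same two knobs in a FAISS IVF sketch (again assuming `faiss-cpu`; sizes are illustrative). Note the explicit train step, which is the clustering phase from the diagram:

```python
import faiss
import numpy as np

d = 384
xb = np.random.rand(200_000, d).astype("float32")

nlist = 1024                            # number of clusters (centroids)
quantizer = faiss.IndexFlatL2(d)        # assigns vectors to centroids
index = faiss.IndexIVFFlat(quantizer, d, nlist)

index.train(xb)                         # clustering phase
index.add(xb)

index.nprobe = 16                       # clusters searched per query
distances, ids = index.search(xb[:5], 10)
```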

    IVF-PQ (Product Quantization)

    Original Vector (128 dim):
    [0.1, 0.2, ..., 0.9]  (128 × 4 bytes = 512 bytes)
    
    PQ Compressed (8 subvectors, 8-bit codes):
    [23, 45, 12, 89, 56, 34, 78, 90]  (8 bytes)
    
    Memory reduction: 64x
    Accuracy trade-off: ~5% recall drop
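
In FAISS this corresponds to IndexIVFPQ; with m=8 subquantizers at 8 bits each, every vector is stored as the 8 one-byte codes shown above (a sketch with illustrative sizes):

```python
import faiss
import numpy as np

d = 128
xb = np.random.rand(200_000, d).astype("float32")

nlist, m, nbits = 256, 8, 8             # 8 subvectors, 2^8 codes each
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)                         # learns centroids + PQ codebooks
index.add(xb)                           # stores 8-byte codes, not raw floats
index.nprobe = 16
distances, ids = index.search(xb[:5], 10)
```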
    

    Algorithm Comparison

| Algorithm | Search Speed | Memory | Build Time | Recall |
|---|---|---|---|---|
| Flat (brute force) | Slow, O(n) | Low | None | 100% |
| IVF | Fast | Low | Medium | 90-95% |
| IVF-PQ | Very fast | Very low | Medium | 85-92% |
| HNSW | Very fast | High | Slow | 95-99% |
| HNSW+PQ | Very fast | Medium | Slow | 90-95% |

    When to Use Which

    < 100K vectors:
    └── Flat index (exact search is fast enough)
    
    100K - 1M vectors:
    └── HNSW (best recall/speed trade-off)
    
    1M - 100M vectors:
    ├── Memory available → HNSW
    └── Memory constrained → IVF-PQ or HNSW+PQ
    
    > 100M vectors:
    └── Sharded IVF-PQ or distributed HNSW
    

    Distance Metrics

    Common Metrics

| Metric | Formula | Range | Best For |
|---|---|---|---|
| Cosine similarity | A·B / (‖A‖ ‖B‖) | [-1, 1] | Normalized embeddings |
| Dot product | A·B | (-∞, ∞) | When magnitude matters |
| Euclidean (L2) | √Σ(Aᵢ − Bᵢ)² | [0, ∞) | Absolute distances |
| Manhattan (L1) | Σ\|Aᵢ − Bᵢ\| | [0, ∞) | High-dimensional sparse data |

    Metric Selection

    Embeddings pre-normalized (unit vectors)?
    ├── Yes → Cosine = Dot Product (use Dot, faster)
    └── No
        └── Magnitude meaningful?
            ├── Yes → Dot Product
            └── No → Cosine Similarity
    
    Note: Most embedding models output normalized vectors
          → Dot product is usually the best choice
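
A quick numpy check of the note above: on unit vectors, cosine equals dot product, and squared L2 distance is a monotone function of it (‖A−B‖² = 2 − 2A·B), so all three metrics produce the same ranking:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.normal(size=384), rng.normal(size=384)      # unnormalized

cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

a_hat, b_hat = a / np.linalg.norm(a), b / np.linalg.norm(b)
dot = a_hat @ b_hat                                     # dot on unit vectors
l2_squared = np.sum((a_hat - b_hat) ** 2)

assert np.isclose(cosine, dot)                          # cosine == dot
assert np.isclose(l2_squared, 2 - 2 * dot)              # same ranking under L2
```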
    

    Filtering and Hybrid Search

    Pre-filtering vs Post-filtering

    Pre-filtering (Filter → Search):
    ┌─────────────────────────────────────────┐
    │ 1. Apply metadata filter               │
    │    (category = "electronics")           │
    │    Result: 10K of 1M vectors           │
    │                                         │
    │ 2. Vector search on 10K vectors        │
    │    Much faster, guaranteed filter match │
    └─────────────────────────────────────────┘
    
    Post-filtering (Search → Filter):
    ┌─────────────────────────────────────────┐
    │ 1. Vector search on 1M vectors         │
    │    Return top-1000                      │
    │                                         │
    │ 2. Apply metadata filter               │
    │    May return < K results!             │
    └─────────────────────────────────────────┘
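
As one concrete example of the pre-filtering style, Qdrant applies the filter during index traversal, so the condition is part of the query itself (a sketch using `qdrant-client`; the URL, collection name, and payload field are illustrative):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

hits = client.search(
    collection_name="products",
    query_vector=[0.1] * 384,           # embedding of the query text
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="electronics"))]
    ),
    limit=10,                           # every hit satisfies the filter
)
```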
    

    Hybrid Search Architecture

    Query: "wireless headphones under $100"
               │
         ┌─────┴─────┐
         ▼           ▼
     ┌───────┐  ┌───────┐
     │Vector │  │Filter │
     │Search │  │ Build │
     │"wire- │  │price  │
     │less   │  │< 100  │
     │head-  │  │       │
     │phones"│  │       │
     └───────┘  └───────┘
         │           │
         └─────┬─────┘
               ▼
        ┌───────────┐
        │  Combine  │
        │  Results  │
        └───────────┘
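
One common implementation of the Combine step is reciprocal rank fusion (RRF), which merges ranked lists without needing comparable scores. A minimal sketch (the k=60 constant and the IDs are illustrative):

```python
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Fuse ranked ID lists into one ranking, best first."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)  # higher rank, bigger share
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["p7", "p2", "p9"]        # from the vector-search branch
filter_hits = ["p2", "p7", "p4"]        # from the keyword/filter branch
print(rrf([vector_hits, filter_hits]))  # fused ranking, best first
```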
    

    Metadata Index Design

| Metadata Type | Index Strategy | Query Example |
|---|---|---|
| Categorical | Bitmap/hash index | category = "books" |
| Numeric range | B-tree | price BETWEEN 10 AND 50 |
| Keyword search | Inverted index | tags CONTAINS "sale" |
| Geospatial | R-tree/geohash | location NEAR (lat, lng) |

    Scaling Strategies

    Sharding Approaches

    Naive Sharding (by ID):
    ┌─────────┐ ┌─────────┐ ┌─────────┐
    │ Shard 1 │ │ Shard 2 │ │ Shard 3 │
    │ IDs 0-N │ │IDs N-2N │ │IDs 2N-3N│
    └─────────┘ └─────────┘ └─────────┘
    Query → Search ALL shards → Merge results
    
    Semantic Sharding (by cluster):
    ┌─────────┐ ┌─────────┐ ┌─────────┐
    │ Shard 1 │ │ Shard 2 │ │ Shard 3 │
    │ Tech    │ │ Health  │ │ Finance │
    │ docs    │ │ docs    │ │ docs    │
    └─────────┘ └─────────┘ └─────────┘
    Query → Route to relevant shard(s) → Faster!
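
Naive sharding implies scatter-gather at query time: every shard answers, and a coordinator merges the per-shard top-k lists. A toy sketch (the in-memory Shard class is a stand-in for real shard nodes):

```python
import heapq
import numpy as np

class Shard:
    """Toy in-memory shard: exact search over its slice of the corpus."""
    def __init__(self, ids, vectors):
        self.ids, self.vectors = ids, vectors

    def search(self, query, k):
        scores = self.vectors @ query
        top = np.argsort(-scores)[:k]
        return [(float(scores[i]), self.ids[i]) for i in top]

def search_all_shards(shards, query, k=10):
    """Scatter-gather: query every shard, keep the global top-k."""
    candidates = []
    for shard in shards:                  # in production: fan out in parallel
        candidates.extend(shard.search(query, k))
    return heapq.nlargest(k, candidates)  # merge by score, best first
```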
    

    Replication

    ┌─────────────────────────────────────────┐
    │              Load Balancer              │
    └─────────────────────────────────────────┘
             │           │           │
             ▼           ▼           ▼
        ┌─────────┐ ┌─────────┐ ┌─────────┐
        │Replica 1│ │Replica 2│ │Replica 3│
        │  (Read) │ │  (Read) │ │  (Read) │
        └─────────┘ └─────────┘ └─────────┘
             │           │           │
             └───────────┼───────────┘
                         │
                    ┌─────────┐
                    │ Primary │
                    │ (Write) │
                    └─────────┘
    

    Scaling Decision Matrix

| Scale (vectors) | Architecture | Replication |
|---|---|---|
| < 1M | Single node | Optional |
| 1-10M | Single node, more RAM | For HA |
| 10-100M | Sharded, few nodes | Required |
| 100M-1B | Sharded, many nodes | Required |
| > 1B | Sharded + tiered | Required |

    Performance Optimization

    Index Build Optimization

| Optimization | Description | Impact |
|---|---|---|
| Batch insertion | Insert in batches of 1K-10K | 10x faster |
| Parallel build | Multi-threaded index construction | 2-4x faster |
| Incremental index | Add to existing index | Avoids rebuild |
| GPU acceleration | Use GPU for training (IVF) | 10-100x faster |

    Query Optimization

| Optimization | Description | Impact |
|---|---|---|
| Warm cache | Keep index in memory | 10x latency reduction |
| Query batching | Batch similar queries | Higher throughput |
| Reduce dimensions | PCA, random projection | 2-4x faster |
| Early termination | Stop when "good enough" | Lower latency |

    Memory Optimization

    Memory per vector:
    ┌────────────────────────────────────────┐
    │ 1536 dims × 4 bytes = 6KB per vector   │
    │                                        │
    │ 1M vectors:                            │
    │   Raw: 6GB                             │
    │   + HNSW graph: +2-4GB (M-dependent)   │
    │   = 8-10GB total                       │
    │                                        │
    │ With PQ (64 subquantizers):            │
    │   1M vectors: ~64MB                    │
    │   = 100x reduction                     │
    └────────────────────────────────────────┘
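
The same arithmetic as a reusable estimator. The HNSW overhead default below is read off the box above (2-4 GB per 1M vectors, i.e. a few KB per vector) rather than derived; actual graph overhead depends on M and the engine:

```python
def index_memory_gb(n_vectors, dims, bytes_per_dim=4,
                    graph_bytes_per_vector=0, pq_bytes=None):
    """Rough serving-memory estimate, mirroring the box above."""
    if pq_bytes is not None:              # PQ stores only short codes
        return n_vectors * pq_bytes / 1e9
    per_vector = dims * bytes_per_dim + graph_bytes_per_vector
    return n_vectors * per_vector / 1e9

print(index_memory_gb(1_000_000, 1536))                               # ~6.1 GB raw
print(index_memory_gb(1_000_000, 1536, graph_bytes_per_vector=3000))  # ~9.1 GB with HNSW
print(index_memory_gb(1_000_000, 1536, pq_bytes=64))                  # ~0.06 GB with PQ
```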
    

    Operational Considerations

    Backup and Recovery

| Strategy | Description | RPO/RTO |
|---|---|---|
| Snapshots | Periodic full backup | Hours |
| WAL replication | Write-ahead log streaming | Minutes |
| Real-time sync | Synchronous replication | Seconds |

    Monitoring Metrics

| Metric | Description | Alert Threshold |
|---|---|---|
| Query latency (p99) | 99th percentile latency | > 100ms |
| Recall | Search accuracy | < 90% |
| QPS | Queries per second | Capacity dependent |
| Memory usage | Index memory | > 80% |
| Index freshness | Time since last update | Domain dependent |

    Index Maintenance

    ┌─────────────────────────────────────────┐
    │        Index Maintenance Tasks          │
    ├─────────────────────────────────────────┤
    │ • Compaction: Merge small segments      │
    │ • Reindex: Rebuild degraded index       │
    │ • Vacuum: Remove deleted vectors        │
    │ • Optimize: Tune parameters             │
    │                                         │
    │ Schedule during low-traffic periods     │
    └─────────────────────────────────────────┘
    

    Common Patterns

    Multi-Tenant Vector Search

    Option 1: Namespace/Collection per tenant
    ┌─────────────────────────────────────────┐
    │ tenant_1_collection                     │
    │ tenant_2_collection                     │
    │ tenant_3_collection                     │
    └─────────────────────────────────────────┘
    Pro: Complete isolation
    Con: Many indexes, operational overhead
    
    Option 2: Single collection + tenant filter
    ┌─────────────────────────────────────────┐
    │ shared_collection                       │
    │   metadata: { tenant_id: "..." }        │
    │   Pre-filter by tenant_id               │
    └─────────────────────────────────────────┘
    Pro: Simpler operations
    Con: Requires efficient filtering
    

    Real-Time Updates

    Write Path:
    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │   Write     │    │   Buffer    │    │   Merge     │
    │   Request   │───▶│   (Memory)  │───▶│   to Index  │
    └─────────────┘    └─────────────┘    └─────────────┘
    
    Strategy:
    1. Buffer writes in memory
    2. Periodically merge to main index
    3. Search: main index + buffer
    4. Compact periodically
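
A minimal sketch of steps 1-3 (the `main_index` object is a hypothetical stand-in exposing `add_batch` and `search`; the buffer is scanned exactly, which is affordable because it stays small):

```python
import numpy as np

def exact_scan(buffer, query, k):
    """Brute-force dot-product scan over the small write buffer."""
    scored = [(float(np.dot(vec, query)), doc_id) for doc_id, vec in buffer]
    return sorted(scored, reverse=True)[:k]

class BufferedIndex:
    def __init__(self, main_index, flush_at=10_000):
        self.main = main_index             # hypothetical ANN index
        self.buffer = []                   # (doc_id, vector) pairs
        self.flush_at = flush_at

    def add(self, doc_id, vector):         # step 1: buffer writes in memory
        self.buffer.append((doc_id, vector))
        if len(self.buffer) >= self.flush_at:
            self.main.add_batch(self.buffer)   # step 2: merge to main index
            self.buffer.clear()

    def search(self, query, k=10):         # step 3: main index + buffer
        hits = list(self.main.search(query, k))
        hits += exact_scan(self.buffer, query, k)
        return sorted(hits, reverse=True)[:k]  # (score, doc_id), best first
```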
    

    Embedding Versioning

    Version 1 embeddings ──┐
                           │
    Version 2 embeddings ──┼──▶ Parallel indexes during migration
                           │
                           │    ┌─────────────────────┐
                           └───▶│ Gradual reindexing  │
                                │ Blue-green switch   │
                                └─────────────────────┘
    

    Cost Estimation

    Storage Costs

    Monthly cost = (vectors × dims × bytes_per_dim × replicas) ÷ 10⁹ bytes/GB × $/GB/month
    
    Example:
    10M vectors × 1536 dims × 4 bytes × 3 replicas ≈ 184 GB
    At $0.10/GB/month ≈ $18.40/month for storage
    
    Note: Memory (for serving) costs more than storage
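
The formula as a one-liner for quick what-ifs (same illustrative numbers as above):

```python
def storage_cost_per_month(vectors, dims, replicas, price_per_gb,
                           bytes_per_dim=4):
    gb = vectors * dims * bytes_per_dim * replicas / 1e9
    return gb * price_per_gb

print(storage_cost_per_month(10_000_000, 1536, 3, 0.10))  # ~18.4 ($/month)
```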
    

    Compute Costs

    Factors:
    • QPS (queries per second)
    • Latency requirements
    • Index type (HNSW needs more RAM)
    • Filtering complexity
    
    Rule of thumb:
    • 1M vectors, HNSW, <50ms latency: 16GB RAM
    • 10M vectors, HNSW, <50ms latency: 64-128GB RAM
    • 100M vectors: Distributed system required
    

    Related Skills

    • rag-architecture - Using vector databases in RAG systems
    • llm-serving-patterns - LLM inference with vector retrieval
    • ml-system-design - End-to-end ML pipeline design
    • estimation-techniques - Capacity planning for vector systems

    Version History

    • v1.0.0 (2025-12-26): Initial release - Vector database patterns for systems design

    Last Updated

    Date: 2025-12-26
