mongodb

akornmeier/mongodb

Data & Analytics

2 installs

Guide for implementing MongoDB - a document database platform with CRUD operations, aggregation pipelines, indexing, replication, sharding, search capabilities, and comprehensive security...


MongoDB Agent Skill

A comprehensive guide for working with MongoDB - a document-oriented database platform that provides powerful querying, horizontal scaling, high availability, and enterprise-grade security.

When to Use This Skill

Use this skill when you need to:

Design MongoDB schemas and data models
Write CRUD operations and complex queries
Build aggregation pipelines for data transformation
Optimize query performance with indexes
Configure replication for high availability
Set up sharding for horizontal scaling
Implement security (authentication, authorization, encryption)
Deploy MongoDB (Atlas, self-managed, Kubernetes)
Integrate MongoDB with applications (15+ official drivers)
Troubleshoot performance issues or errors
Implement Atlas Search or Vector Search
Work with time series data or change streams

Documentation Coverage

This skill synthesizes 24,618 documentation links across 172 major MongoDB sections, covering:

MongoDB versions 5.0 through 8.1 (upcoming)
15+ official driver languages
50+ integration tools (Kafka, Spark, BI Connector, Kubernetes Operator)
Complete deployment spectrum (Atlas cloud, self-managed, Kubernetes)

I. CORE DATABASE OPERATIONS

A. CRUD Operations

Read Operations

// Find documents
db.collection.find({ status: 'active' });
db.collection.findOne({ _id: ObjectId('...') });

// Query operators
db.users.find({ age: { $gte: 18, $lt: 65 } });
db.posts.find({ tags: { $in: ['mongodb', 'database'] } });
db.products.find({ price: { $exists: true } });

// Projection (select specific fields)
db.users.find({ status: 'active' }, { name: 1, email: 1 });

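// Cursor operations (the server applies sort, then skip, then limit,
// regardless of the order in which the methods are chained)
db.collection.find().sort({ createdAt: -1 }).skip(20).limit(10);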

Write Operations

// Insert
db.collection.insertOne({ name: 'Alice', age: 30 });
db.collection.insertMany([{ name: 'Bob' }, { name: 'Charlie' }]);

// Update
db.users.updateOne({ _id: userId }, { $set: { status: 'verified' } });
db.users.updateMany({ lastLogin: { $lt: cutoffDate } }, { $set: { status: 'inactive' } });

// Replace entire document
db.users.replaceOne({ _id: userId }, newUserDoc);

// Delete
db.users.deleteOne({ _id: userId });
db.users.deleteMany({ status: 'deleted' });

// Upsert (update or insert if not exists)
db.users.updateOne(
  { email: 'user@example.com' },
  { $set: { name: 'User', lastSeen: new Date() } },
  { upsert: true }
);

Atomic Operations

// Increment counter
db.posts.updateOne({ _id: postId }, { $inc: { views: 1 } });

// Add to array (if not exists)
db.users.updateOne({ _id: userId }, { $addToSet: { interests: 'mongodb' } });

// Push to array
db.posts.updateOne({ _id: postId }, { $push: { comments: { author: 'Alice', text: 'Great!' } } });

// Find and modify atomically
db.counters.findAndModify({
  query: { _id: 'sequence' },
  update: { $inc: { value: 1 } },
  new: true,
  upsert: true,
});

B. Query Operators

Comparison Operators

$eq, $ne, $gt, $gte, $lt, $lte
$in, $nin

Logical Operators

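$and, $or, $not, $nor

// Example: explicit $and (equivalent to the implicit form
// { price: { $gte: 100 }, stock: { $gt: 0 } }; explicit $and is needed
// mainly when the same field or operator appears twice)
db.products.find({
  $and: [{ price: { $gte: 100 } }, { stock: { $gt: 0 } }],
});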

Array Operators

$all, $elemMatch, $size

Note: $firstN, $lastN, $maxN, and $minN are aggregation operators (accumulators and array expressions), not query operators.

// Example: Find docs with all tags
db.posts.find({ tags: { $all: ['mongodb', 'database'] } });

// Match array element with multiple conditions
db.products.find({
  reviews: {
    $elemMatch: { rating: { $gte: 4 }, verified: true },
  },
});

Existence & Type

$exists, $type

// Find documents where the optional field is present
// (this also matches documents where the field is null)
db.users.find({ phoneNumber: { $exists: true } });

// Type checking
db.data.find({ value: { $type: 'string' } });
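To make the operator semantics above concrete, here is a toy in-memory matcher for `$gte`, `$lt`, `$in`, and `$exists`. It is a teaching sketch under simplifying assumptions, not how the server evaluates queries: it ignores dotted paths, BSON type ordering, and array handling for range operators.

```javascript
// Toy in-memory matcher for a few query operators. Illustrative only: it is
// not MongoDB's implementation and ignores dotted paths, type bracketing,
// and most operators.
function matchesCondition(value, cond) {
  // A plain value means equality; an array field matches if any element equals it
  if (cond === null || typeof cond !== 'object' || Array.isArray(cond)) {
    return Array.isArray(value) ? value.includes(cond) : value === cond;
  }
  // Several operators on one field must all hold, e.g. { $gte: 18, $lt: 65 }
  return Object.entries(cond).every(([op, arg]) => {
    switch (op) {
      case '$gte':
        return value != null && value >= arg;
      case '$lt':
        return value != null && value < arg;
      case '$in':
        // An array field matches if any of its elements is in the list
        return Array.isArray(value) ? value.some((v) => arg.includes(v)) : arg.includes(value);
      case '$exists':
        // A field holding null still "exists"
        return (value !== undefined) === arg;
      default:
        throw new Error(`operator not supported in this sketch: ${op}`);
    }
  });
}

// Top-level fields in a filter are implicitly ANDed
function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => matchesCondition(doc[field], cond));
}

const users = [
  { name: 'Ann', age: 17 },
  { name: 'Bo', age: 30, tags: ['mongodb'] },
  { name: 'Cy', age: 70, phoneNumber: '555-0100' },
];

const adults = users.filter((u) => matches(u, { age: { $gte: 18, $lt: 65 } }));
console.log(adults.map((u) => u.name)); // [ 'Bo' ]
```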

C. Aggregation Pipeline

MongoDB's most powerful feature for data transformation and analysis.

Core Pipeline Stages (40+)

db.orders.aggregate([
  // Stage 1: Filter early so later stages (and indexes) see fewer documents
  { $match: { status: 'completed', total: { $gte: 100 } } },

  // Stage 2: Join with customers ("customer" holds an array of matches;
  // this example does not use it later)
  {
    $lookup: {
      from: 'customers',
      localField: 'customerId',
      foreignField: '_id',
      as: 'customer',
    },
  },

  // Stage 3: Unwind items: one output document per order line item
  { $unwind: '$items' },

  // Stage 4: Group by category. After $unwind an order appears once per item,
  // so collect distinct order _ids instead of summing 1 per document.
  {
    $group: {
      _id: '$items.category',
      totalRevenue: { $sum: '$items.total' },
      orderIds: { $addToSet: '$_id' },
    },
  },

  // Stage 5: Sort results
  { $sort: { totalRevenue: -1 } },

  // Stage 6: Limit results
  { $limit: 10 },

  // Stage 7: Reshape output
  {
    $project: {
      _id: 0,
      category: '$_id',
      revenue: '$totalRevenue',
      orders: { $size: '$orderIds' },
      // Average revenue this category contributed per order
      avgValue: { $round: [{ $divide: ['$totalRevenue', { $size: '$orderIds' }] }, 2] },
    },
  },
]);
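After `$unwind`, each order appears once per line item, so `{ $sum: 1 }` in a following `$group` counts items rather than orders. The plain-JavaScript sketch below mimics `$unwind` plus `$group` on two made-up orders to show the difference between counting documents and counting distinct order `_id`s (the `$addToSet` approach).

```javascript
// In-memory sketch of $unwind followed by $group (sample data is made up)
const orders = [
  {
    _id: 1,
    total: 300,
    items: [
      { category: 'books', total: 100 },
      { category: 'books', total: 200 },
    ],
  },
  { _id: 2, total: 150, items: [{ category: 'books', total: 150 }] },
];

// $unwind: '$items' -> one document per array element
const unwound = orders.flatMap((o) => o.items.map((item) => ({ ...o, items: item })));

// $group by items.category
const groups = new Map();
for (const doc of unwound) {
  const key = doc.items.category;
  const g = groups.get(key) ?? { revenue: 0, docCount: 0, orderIds: new Set() };
  g.revenue += doc.items.total; // like { $sum: '$items.total' }
  g.docCount += 1; // like { $sum: 1 }: counts line items, not orders
  g.orderIds.add(doc._id); // like { $addToSet: '$_id' }: distinct orders
  groups.set(key, g);
}

const books = groups.get('books');
console.log(books.revenue, books.docCount, books.orderIds.size); // 450 3 2
```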

Common Pipeline Patterns

Time-Based Aggregation:

db.events.aggregate([
  { $match: { timestamp: { $gte: startDate, $lt: endDate } } },
  {
    $group: {
      _id: {
        year: { $year: '$timestamp' },
        month: { $month: '$timestamp' },
        day: { $dayOfMonth: '$timestamp' },
      },
      count: { $sum: 1 },
    },
  },
]);

Faceted Search (Multiple Aggregations):

db.products.aggregate([
  { $match: { category: 'electronics' } },
  {
    $facet: {
      priceRanges: [
        {
          $bucket: {
            groupBy: '$price',
            boundaries: [0, 100, 500, 1000, 5000],
            default: '5000+',
            output: { count: { $sum: 1 } },
          },
        },
      ],
      topBrands: [
        { $group: { _id: '$brand', count: { $sum: 1 } } },
        { $sort: { count: -1 } },
        { $limit: 5 },
      ],
      avgPrice: [{ $group: { _id: null, avg: { $avg: '$price' } } }],
    },
  },
]);
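The `$bucket` stage in the facet above puts each value into a half-open range `[lower, upper)` whose `_id` is the lower boundary; anything outside `[first boundary, last boundary)` goes to `default` (and without a `default`, such a value is an error). A small sketch of that assignment rule, using the same boundaries and some made-up prices:

```javascript
// Sketch of $bucket's assignment rule: bucket i covers [boundaries[i], boundaries[i + 1])
// and is identified by its lower bound; out-of-range values use the default id.
function bucketId(value, boundaries, defaultId) {
  for (let i = 0; i < boundaries.length - 1; i++) {
    if (value >= boundaries[i] && value < boundaries[i + 1]) return boundaries[i];
  }
  if (defaultId === undefined) {
    throw new Error('value outside boundaries and no default specified');
  }
  return defaultId;
}

// Equivalent of output: { count: { $sum: 1 } }
function bucketCounts(values, boundaries, defaultId) {
  const counts = new Map();
  for (const v of values) {
    const id = bucketId(v, boundaries, defaultId);
    counts.set(id, (counts.get(id) ?? 0) + 1);
  }
  return counts;
}

const boundaries = [0, 100, 500, 1000, 5000];
const counts = bucketCounts([20, 99, 100, 750, 5000, 12000], boundaries, '5000+');
console.log(counts);
```

Note that a price of exactly 5000 lands in the `'5000+'` default bucket, because the last boundary is exclusive.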

Window Functions:

db.sales.aggregate([
  {
    $setWindowFields: {
      partitionBy: '$region',
      sortBy: { date: 1 },
      output: {
        runningTotal: { $sum: '$amount', window: { documents: ['unbounded', 'current'] } },
        movingAvg: { $avg: '$amount', window: { documents: [-7, 0] } }, // current doc and up to 7 before it
      },
    },
  },
]);
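The two window specifications above can be read as: `['unbounded', 'current']` is every document in the partition up to the current one, and `[-7, 0]` is the current document plus up to seven preceding ones (fewer at the start of a partition). This in-memory sketch, using made-up sales data, computes both outputs per region after sorting by date:

```javascript
// Sketch of runningTotal and movingAvg for one partition whose amounts are
// already sorted by date
function windowOutputs(amounts) {
  let running = 0;
  return amounts.map((amount, i) => {
    running += amount; // documents: ['unbounded', 'current']
    const frame = amounts.slice(Math.max(0, i - 7), i + 1); // documents: [-7, 0]
    const movingAvg = frame.reduce((a, b) => a + b, 0) / frame.length;
    return { amount, runningTotal: running, movingAvg };
  });
}

const sales = [
  { region: 'east', date: 1, amount: 10 },
  { region: 'east', date: 2, amount: 20 },
  { region: 'west', date: 1, amount: 5 },
  { region: 'east', date: 3, amount: 30 },
];

// sortBy: { date: 1 }, then partitionBy: '$region'
const byRegion = new Map();
for (const s of [...sales].sort((a, b) => a.date - b.date)) {
  if (!byRegion.has(s.region)) byRegion.set(s.region, []);
  byRegion.get(s.region).push(s.amount);
}

const east = windowOutputs(byRegion.get('east'));
console.log(east.map((r) => r.runningTotal)); // [ 10, 30, 60 ]
```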

Aggregation Operators (150+)

Math Operators:

($add, $subtract, $multiply, $divide, $mod);
($abs, $ceil, $floor, $round, $sqrt, $pow);
($log, $log10, $ln, $exp);

String Operators:

($concat, $substr, $toLower, $toUpper);
($trim, $ltrim, $rtrim, $split);
($regexMatch, $regexFind, $regexFindAll);

Array Operators:

($arrayElemAt, $slice, $first, $last, $reverse);
($sortArray, $filter, $map, $reduce);
($zip, $concatArrays);

Date/Time Operators:

($dateAdd, $dateDiff, $dateFromString, $dateToString);
($dayOfMonth, $month, $year, $dayOfWeek);
($week, $hour, $minute, $second);

Type Conversion:

($toInt, $toString, $toDate, $toDouble);
($toDecimal, $toObjectId, $toBool);

II. INDEXING & PERFORMANCE

A. Index Types

Single Field Index

db.users.createIndex({ email: 1 }); // ascending
db.posts.createIndex({ createdAt: -1 }); // descending

Compound Index

// Order matters! Index on { status: 1, createdAt: -1 }
db.orders.createIndex({ status: 1, createdAt: -1 });

// Supports queries on:
// - { status: "..." }
// - { status: "...", createdAt: ... }
// Does NOT efficiently support: { createdAt: ... } alone
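The prefix rule above can be sketched as a small helper that reports which leading keys of a compound index a set of equality-filtered fields can use. This is a deliberately simplified model (it ignores sorts, range predicates, and the equality-sort-range guideline the query planner also weighs), and `indexPrefixUsed` is a hypothetical helper name.

```javascript
// Simplified model: return the leading index keys covered by the query's
// equality fields, stopping at the first index key the query does not use
function indexPrefixUsed(indexSpec, queryFields) {
  const prefix = [];
  for (const key of Object.keys(indexSpec)) {
    if (!queryFields.includes(key)) break; // a gap ends the usable prefix
    prefix.push(key);
  }
  return prefix;
}

const idx = { status: 1, createdAt: -1 };
console.log(indexPrefixUsed(idx, ['status'])); // [ 'status' ]
console.log(indexPrefixUsed(idx, ['status', 'createdAt'])); // [ 'status', 'createdAt' ]
console.log(indexPrefixUsed(idx, ['createdAt'])); // []
```

An empty result corresponds to the "does not efficiently support" case: the index can only be used by scanning it in full.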

Text Index (Full-Text Search)

db.articles.createIndex({ title: 'text', body: 'text' });

// Search
db.articles.find({ $text: { $search: 'mongodb database' } });

// With relevance score
db.articles
  .find({ $text: { $search: 'mongodb' } }, { score: { $meta: 'textScore' } })
  .sort({ score: { $meta: 'textScore' } });

Geospatial Indexes

// 2dsphere for earth-like geometry
db.places.createIndex({ location: '2dsphere' });

// Find nearby
db.places.find({
  location: {
    $near: {
      $geometry: { type: 'Point', coordinates: [lon, lat] },
      $maxDistance: 5000, // meters
    },
  },
});

Wildcard Index

// Index all fields in subdocuments
db.products.createIndex({ 'attributes.$**': 1 });

// Supports queries on any field under attributes
db.products.find({ 'attributes.color': 'red' });

Partial Index

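// Index only documents matching filter. The planner uses it only for queries
// whose filter implies the expression, e.g. { customerId: ..., status: 'active' }
db.orders.createIndex({ customerId: 1 }, { partialFilterExpression: { status: 'active' } });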

TTL Index (Auto-delete)

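// Delete documents 24 hours after createdAt (createdAt must hold a Date; the
// background TTL task runs about every 60 seconds, so removal is not instant)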
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 });

Hashed Index (for sharding)

db.users.createIndex({ userId: 'hashed' });

B. Query Optimization

Explain Query Plans

// Basic explain
db.users.find({ email: 'user@example.com' }).explain();

// Execution stats (shows actual performance)
db.users.find({ age: { $gte: 18 } }).explain('executionStats');

// Key metrics to check:
// - executionTimeMillis
// - totalDocsExamined vs. nReturned (should be close)
// - stage: "IXSCAN" (using index) vs. "COLLSCAN" (full scan - BAD)
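The checks listed above can be automated on the object that `explain('executionStats')` returns. The sketch below assumes the documented field names (`queryPlanner.winningPlan`, `executionStats.nReturned`, `totalDocsExamined`, `executionTimeMillis`) and walks the whole plan tree for `stage` names, since exactly where stages are nested varies by server version and query engine. The sample explain object is made up.

```javascript
// Collect every "stage" name in a plan tree, wherever it is nested
function collectStages(node, out = []) {
  if (Array.isArray(node)) {
    node.forEach((n) => collectStages(n, out));
  } else if (node && typeof node === 'object') {
    if (typeof node.stage === 'string') out.push(node.stage);
    Object.values(node).forEach((v) => collectStages(v, out));
  }
  return out;
}

// Summarize an explain('executionStats') result (a plain object)
function summarizeExplain(explain) {
  const stats = explain.executionStats;
  const stages = collectStages(explain.queryPlanner?.winningPlan);
  return {
    stages,
    collectionScan: stages.includes('COLLSCAN'),
    // Documents examined per document returned; close to 1 is ideal
    examinedPerReturned: stats.totalDocsExamined / Math.max(stats.nReturned, 1),
    millis: stats.executionTimeMillis,
  };
}

// Made-up explain output for an indexed query
const sample = {
  queryPlanner: {
    winningPlan: { stage: 'FETCH', inputStage: { stage: 'IXSCAN', indexName: 'age_1' } },
  },
  executionStats: { nReturned: 50, totalDocsExamined: 50, totalKeysExamined: 50, executionTimeMillis: 3 },
};
console.log(summarizeExplain(sample));
```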

Covered Queries

// Create index
db.users.createIndex({ email: 1, name: 1 });

// Query covered by index (no document fetch needed)
db.users.find(
  { email: 'user@example.com' },
  { email: 1, name: 1, _id: 0 } // project only indexed fields
);

Index Hints

// Force a specific index (it must exist on this collection, or the query fails)
db.users.find({ email: 'user@example.com', status: 'active' }).hint({ email: 1 });

Index Management

// List all indexes
db.collection.getIndexes();

// Drop index
db.collection.dropIndex('indexName');

// Hide index (test before dropping)
db.collection.hideIndex('indexName');
db.collection.unhideIndex('indexName');

// Index stats
db.collection.aggregate([{ $indexStats: {} }]);

III. DATA MODELING PATTERNS

A. Relationship Patterns

One-to-One (Embedded)

// User with single address
{
  _id: ObjectId("..."),
  name: "Alice",
  email: "alice@example.com",
  address: {
    street: "123 Main St",
    city: "NYC",
    zipcode: "10001"
  }
}

One-to-Few (Embedded Array)

// Blog post with comments (< 100 comments)
{
  _id: ObjectId("..."),
  title: "MongoDB Guide",
  comments: [
    { author: "Bob", text: "Great post!", date: ISODate("...") },
    { author: "Charlie", text: "Thanks!", date: ISODate("...") }
  ]
}

One-to-Many (Referenced)

// Author collection
{ _id: ObjectId("author1"), name: "Alice" }

// Books collection (many books per author)
{ _id: ObjectId("book1"), title: "Book 1", authorId: ObjectId("author1") }
{ _id: ObjectId("book2"), title: "Book 2", authorId: ObjectId("author1") }

Many-to-Many (Array of References)

// Users collection
{
  _id: ObjectId("user1"),
  name: "Alice",
  groupIds: [ObjectId("group1"), ObjectId("group2")]
}

// Groups collection
{
  _id: ObjectId("group1"),
  name: "MongoDB Users",
  memberIds: [ObjectId("user1"), ObjectId("user2")]
}

B. Advanced Patterns

Time Series Pattern

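// Bucket pattern (manual): one document per sensor per time window, with
// readings stored as offsets (here in seconds) from timestamp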
{
  _id: ObjectId("..."),
  sensorId: "sensor-123",
  timestamp: ISODate("2025-01-01T00:00:00Z"),
  readings: [
    { time: 0, temp: 23.5, humidity: 45 },
    { time: 60, temp: 23.6, humidity: 46 },
    { time: 120, temp: 23.4, humidity: 45 }
  ]
}

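// Native time series collection (MongoDB 5.0+): insert one document per
// reading; the server groups readings into buckets internally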
db.createCollection("sensor_data", {
  timeseries: {
    timeField: "timestamp",
    metaField: "sensorId",
    granularity: "minutes"
  }
})
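If you implement the manual bucket pattern instead of a native time series collection, the application has to group raw readings itself. This sketch assumes one-hour buckets and a hypothetical input shape `{ sensorId, at: Date, temp, humidity }`; it produces documents shaped like the bucket example above.

```javascript
// Group raw readings into one document per sensor per hour (bucket size is an
// assumption); each reading keeps its offset in seconds from the bucket start
function toHourlyBuckets(readings) {
  const HOUR_MS = 60 * 60 * 1000;
  const buckets = new Map();
  for (const r of readings) {
    const start = Math.floor(r.at.getTime() / HOUR_MS) * HOUR_MS;
    const key = `${r.sensorId}|${start}`;
    if (!buckets.has(key)) {
      buckets.set(key, { sensorId: r.sensorId, timestamp: new Date(start), readings: [] });
    }
    buckets.get(key).readings.push({
      time: (r.at.getTime() - start) / 1000,
      temp: r.temp,
      humidity: r.humidity,
    });
  }
  return [...buckets.values()];
}

const docs = toHourlyBuckets([
  { sensorId: 'sensor-123', at: new Date('2025-01-01T00:00:00Z'), temp: 23.5, humidity: 45 },
  { sensorId: 'sensor-123', at: new Date('2025-01-01T00:01:00Z'), temp: 23.6, humidity: 46 },
  { sensorId: 'sensor-123', at: new Date('2025-01-01T01:00:00Z'), temp: 23.4, humidity: 45 },
]);
console.log(docs.length); // 2
```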

Computed Pattern (Cache Results)

// User document with pre-computed stats
{
  _id: ObjectId("..."),
  username: "alice",
  stats: {
    postCount: 150,
    followerCount: 2500,
    lastUpdated: ISODate("...")
  }
}

// Update stats periodically or with triggers

Schema Versioning

// Support schema evolution
{
  _id: ObjectId("..."),
  schemaVersion: 2,
  // v2 fields
  name: { first: "Alice", last: "Smith" },
  // Migration code handles v1 format
}
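The "migration code" mentioned above is often a read-time upgrade function. The v1 format is not specified here, so this sketch assumes a hypothetical v1 document that stored `name` as a single string and had no `schemaVersion` field.

```javascript
// Hypothetical lazy migration: upgrade a v1 document (name as one string,
// no schemaVersion) to the v2 shape when it is read
function toV2(doc) {
  const version = doc.schemaVersion ?? 1;
  if (version >= 2) return doc; // already current
  const [first, ...rest] = String(doc.name ?? '').trim().split(/\s+/);
  return { ...doc, schemaVersion: 2, name: { first: first ?? '', last: rest.join(' ') } };
}

console.log(toV2({ _id: 1, name: 'Alice Smith' }));
```

Old and new documents can then coexist; optionally, the application writes the upgraded document back (for example with `replaceOne`) so the collection converges on v2 over time.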

C. Schema Validation

db.createCollection('users', {
  validator: {
    $jsonSchema: {
      bsonType: 'object',
      required: ['email', 'name'],
      properties: {
        email: {
          bsonType: 'string',
          pattern: '^.+@.+$',
          description: 'must be a string containing "@"',
        },
        age: {
          bsonType: 'int',
          minimum: 0,
          maximum: 120,
        },
        status: {
          enum: ['active', 'inactive', 'pending'],
        },
      },
    },
  },
  validationLevel: 'strict', // or "moderate"
  validationAction: 'error', // or "warn"
});

IV. REPLICATION & HIGH AVAILABILITY

A. Replica Sets

Architecture:

Primary: Accepts writes, replicates to secondaries
Secondaries: Replicate primary's oplog, can serve reads
Arbiter: Votes in elections, holds no data

Configuration:

rs.initiate({
  _id: 'myReplicaSet',
  members: [
    { _id: 0, host: 'mongo1:27017' },
    { _id: 1, host: 'mongo2:27017' },
    { _id: 2, host: 'mongo3:27017' },
  ],
});

// Check status
rs.status();

// Add member
rs.add('mongo4:27017');

// Remove member
rs.remove('mongo4:27017');

B. Write Concern

Controls acknowledgment of write operations:

// Wait for majority acknowledgment (durable)
db.users.insertOne({ name: 'Alice' }, { writeConcern: { w: 'majority', wtimeout: 5000 } });

// Common levels:
// w: 1 - primary acknowledges (the default before MongoDB 5.0)
// w: "majority" - a majority of voting, data-bearing members acknowledge
//   (the implicit default since 5.0 for most replica sets; recommended for production)
// w: <number> - that many members acknowledge
// w: 0 - no acknowledgment (fire and forget)
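Both `w: "majority"` and primary elections depend on a strict majority of voting members. A small sketch of the arithmetic follows; it is simplified, since with arbiters the write-concern majority is further limited by how many data-bearing voting members exist.

```javascript
// Majority arithmetic for a replica set with n voting members
function majority(n) {
  return Math.floor(n / 2) + 1;
}

// Members that can be unavailable while a majority (and so a primary) remains
function faultTolerance(n) {
  return n - majority(n);
}

for (const n of [1, 2, 3, 4, 5, 7]) {
  console.log(`${n} members: majority ${majority(n)}, can lose ${faultTolerance(n)}`);
}
```

The output shows why odd member counts are preferred: four members tolerate only one failure, the same as three.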

C. Read Preference

Controls where reads are served from:

// Options:
// - primary (default): read from primary only
// - primaryPreferred: primary if available, else secondary
// - secondary: read from secondary only
// - secondaryPreferred: secondary if available, else primary
// - nearest: member with the lowest network latency (primary or secondary)

db.collection.find().readPref('secondaryPreferred');

D. Transactions

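Multi-document ACID transactions (require a replica set or sharded cluster):

// Node.js driver. Assumes `client` is a connected MongoClient and `accounts`,
// `fromAccount`, `toAccount`, and `amount` are defined by the caller.
const session = client.startSession();
session.startTransaction();

try {
  await accounts.updateOne({ _id: fromAccount }, { $inc: { balance: -amount } }, { session });

  await accounts.updateOne({ _id: toAccount }, { $inc: { balance: amount } }, { session });

  await session.commitTransaction();
} catch (error) {
  await session.abortTransaction();
  throw error;
} finally {
  // endSession() returns a promise
  await session.endSession();
}

// Alternative: session.withTransaction(async () => { ... }) commits and
// retries automatically on transient errors.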
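The manual pattern commits once and gives up on any error. Errors labeled `TransientTransactionError` (for example, write conflicts) are safe to retry by rerunning the whole transaction. A minimal sketch of such a wrapper, assuming `runTransaction` starts, runs, and commits one transaction; Node.js driver errors expose `hasErrorLabel()`, which the fake errors in the demo only imitate. Retrying just the commit on `UnknownTransactionCommitResult` is left out; `session.withTransaction()` handles both cases in the driver.

```javascript
// Retry the whole transaction while the error carries the transient label
async function runWithTransientRetry(runTransaction, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await runTransaction();
    } catch (err) {
      const transient =
        typeof err?.hasErrorLabel === 'function' && err.hasErrorLabel('TransientTransactionError');
      // Give up on non-transient errors or after maxAttempts tries
      if (!transient || attempt >= maxAttempts) throw err;
    }
  }
}

// Demo with a fake error that imitates the driver's hasErrorLabel()
function fakeTransientError() {
  const err = new Error('write conflict');
  err.hasErrorLabel = (label) => label === 'TransientTransactionError';
  return err;
}

let calls = 0;
const resultPromise = runWithTransientRetry(async () => {
  calls++;
  if (calls < 3) throw fakeTransientError(); // fails twice, then succeeds
  return 'committed';
});
resultPromise.then((r) => console.log(r, 'after', calls, 'attempts'));
```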

V. SHARDING & HORIZONTAL SCALING

A. Sharded Cluster Architecture

Components:

Shards: Replica sets holding data subsets
Config Servers: Store cluster metadata
mongos: Query routers directing operations to shards

B. Shard Key Selection

CRITICAL: Shard key determines data distribution and query performance.

Good Shard Keys:

High cardinality (many unique values)
Even distribution (no hotspots)
Query-aligned (queries include shard key)

// Enable sharding on database (optional since MongoDB 6.0)
sh.enableSharding('myDatabase');

// Shard collection with hashed key
sh.shardCollection('myDatabase.users', { userId: 'hashed' });

// Shard with compound key
sh.shardCollection('myDatabase.orders', { customerId: 1, orderDate: 1 });
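Before choosing a key, it can help to measure candidates on a sample of documents. This rough sketch reports cardinality (distinct values) and the share of the most frequent value, a proxy for hot spots; the sample data and any thresholds you apply are assumptions, and it does not detect monotonically increasing values, which also concentrate writes on ranged shard keys.

```javascript
// Compare candidate shard key fields on a sample of documents
function shardKeyStats(docs, field) {
  const freq = new Map();
  for (const d of docs) {
    const v = JSON.stringify(d[field]); // normalize values for counting
    freq.set(v, (freq.get(v) ?? 0) + 1);
  }
  const maxCount = Math.max(...freq.values());
  return { field, cardinality: freq.size, topValueShare: maxCount / docs.length };
}

// Made-up sample
const sample = [
  { userId: 'u1', country: 'US' },
  { userId: 'u2', country: 'US' },
  { userId: 'u3', country: 'US' },
  { userId: 'u4', country: 'DE' },
];
console.log(shardKeyStats(sample, 'userId'));
console.log(shardKeyStats(sample, 'country'));
```

Here `userId` spreads evenly while `country` puts three quarters of the sample on one value, which would make one chunk range (and its shard) hot.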

C. Zone Sharding

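Assign shard-key ranges to specific shards:

// Associate shards with zones (sh.addShardTag is the older alias)
sh.addShardToZone('shard0', 'US-EAST');
sh.addShardToZone('shard1', 'US-WEST');

// Assign a range to a zone: lower bound inclusive, upper bound exclusive.
// The range fields must be a prefix of the collection's shard key, so this
// assumes myDatabase.users is sharded on a key starting with zipcode
// (not the hashed userId key shown earlier). sh.addTagRange is the older alias.
sh.updateZoneKeyRange('myDatabase.users', { zipcode: '00000' }, { zipcode: '50000' }, 'US-EAST');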
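Because a zone range includes its lower bound and excludes its upper bound, a document with `zipcode: '50000'` falls outside the US-EAST range `['00000', '50000')`. A small sketch of that lookup, comparing zipcodes as strings like the bounds in the example; the US-WEST range is hypothetical (real configurations often use `MaxKey` as the final upper bound).

```javascript
// Zone ranges: [min, max). The US-WEST entry is a hypothetical example.
const zoneRanges = [
  { zone: 'US-EAST', min: '00000', max: '50000' },
  { zone: 'US-WEST', min: '50000', max: '99999' },
];

function zoneFor(zipcode) {
  const range = zoneRanges.find((r) => zipcode >= r.min && zipcode < r.max);
  // null: no zone covers this value, so the balancer may place it on any shard
  return range ? range.zone : null;
}

console.log(zoneFor('10001'), zoneFor('50000'), zoneFor('99999')); // US-EAST US-WEST null
```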

D. Query Routing

// Targeted query (includes shard key) - fast
db.users.find({ userId: '12345' });

// Scatter-gather (no shard key): mongos queries every shard and merges results - slower
db.users.find({ email: 'user@example.com' });

VI. SECURITY

A. Authentication

Methods:

SCRAM (Username/Password) - Default
X.509 Certificates - Mutual TLS
LDAP (Enterprise)
Kerberos (Enterprise)
AWS IAM
OIDC (OpenID Connect)

// Create admin user (passwordPrompt() asks for the password interactively
// instead of leaving it in the shell history)
use admin
db.createUser({
  user: "admin",
  pwd: passwordPrompt(),
  roles: ["root"]
})

// Create database user
use myDatabase
db.createUser({
  user: "appUser",
  pwd: passwordPrompt(),
  roles: [
    { role: "readWrite", db: "myDatabase" }
  ]
})

B. Role-Based Access Control (RBAC)

Built-in Roles:

read, readWrite: Data access on all non-system collections of a database (collection-level access needs a custom role)
dbAdmin, dbOwner: Database administration
userAdmin: User management
clusterAdmin: Cluster management
root: Superuser

Custom Roles:

db.createRole({
  role: 'customRole',
  privileges: [
    {
      resource: { db: 'myDatabase', collection: 'users' },
      actions: ['find', 'update'],
    },
  ],
  roles: [],
});

C. Encryption

Encryption at Rest

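# mongod.conf (MongoDB Enterprise only; a local key file is meant for testing,
# production deployments typically manage keys through a KMIP server)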
security:
  enableEncryption: true
  encryptionKeyFile: /path/to/keyfile

Encryption in Transit (TLS/SSL)

# mongod.conf
net:
  tls:
    mode: requireTLS
    certificateKeyFile: /path/to/cert.pem
    CAFile: /path/to/ca.pem

Client-Side Field Level Encryption (CSFLE)

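// Node.js driver v6+ (also needs the mongodb-client-encryption package).
// Automatic encryption requires MongoDB Enterprise or Atlas plus mongocryptd
// or the crypt_shared library. Assumes `client` (a connected MongoClient) and
// `uri` are already defined, in an ES module so top-level await works.
import { MongoClient, ClientEncryption } from 'mongodb';

// Read KMS credentials from the environment rather than hardcoding them
const kmsProviders = {
  aws: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  },
};
const keyVaultNamespace = 'encryption.__keyVault';

// ClientEncryption manages data keys (and explicit encryption)
const clientEncryption = new ClientEncryption(client, { keyVaultNamespace, kmsProviders });

// Create data key (masterKey.key is the ARN of your AWS KMS key)
const dataKeyId = await clientEncryption.createDataKey('aws', {
  masterKey: { region: 'us-east-1', key: '...' },
});

// Configure automatic encryption of myDatabase.users.ssn
const encryptedClient = new MongoClient(uri, {
  autoEncryption: {
    keyVaultNamespace,
    kmsProviders,
    schemaMap: {
      'myDatabase.users': {
        bsonType: 'object',
        properties: {
          ssn: {
            encrypt: {
              bsonType: 'string',
              keyId: [dataKeyId],
              algorithm: 'AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic',
            },
          },
        },
      },
    },
  },
});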

VII. DEPLOYMENT OPTIONS

A. MongoDB Atlas (Cloud)

Recommended for most use cases.

Quick Start:

Create free M0 cluster at mongodb.com/atlas
Add your IP address to the IP access list
Create database user
Get connection string

Features:

Auto-scaling
Automated backups
Multi-cloud (AWS, Azure, GCP)
Multi-region deployments
Atlas Search & Vector Search
Charts (embedded analytics)
Data Federation
Serverless instances

Connection:

const uri = 'mongodb+srv://user:pass@cluster.mongodb.net/database?retryWrites=true&w=majority';
const client = new MongoClient(uri);

B. Self-Managed

Installation:

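# Ubuntu 22.04 (jammy); the repository line differs for other Ubuntu and Debian releases
curl -fsSL https://www.mongodb.org/static/pgp/server-8.0.asc | sudo gpg -o /usr/share/keyrings/mongodb-server-8.0.gpg --dearmor
echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-8.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/8.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-8.0.list
sudo apt-get update
sudo apt-get install -y mongodb-org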

# Start
sudo systemctl start mongod
sudo systemctl enable mongod

Configuration (mongod.conf):

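storage:
  dbPath: /var/lib/mongodb
  # Journaling is always on since MongoDB 6.1; storage.journal.enabled was removed

systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true

net:
  port: 27017
  bindIp: 127.0.0.1 # add this host's network address so other members and clients can connect

security:
  authorization: enabled
  # keyFile: /path/to/keyfile  # internal authentication between replica set members

replication:
  replSetName: 'myReplicaSet'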

C. Kubernetes Deployment

MongoDB Community Kubernetes Operator (MongoDBCommunity resource):

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb-replica-set
spec:
  members: 3
  type: ReplicaSet
  version: '8.0'
  security:
    authentication:
      modes: ['SCRAM']
  users:
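    - name: admin
      db: admin
      passwordSecretRef:
        name: mongodb-admin-password
      roles:
        - name: root
          db: admin
      # Secret the operator creates for this user's SCRAM credentials (name is your choice)
      scramCredentialsSecretName: mongodb-admin-scram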
  statefulSet:
    spec:
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes: ['ReadWriteOnce']
            resources:
              requests:
                storage: 10Gi

VIII. INTEGRATION & DRIVERS

A. Official Drivers (15+ Languages)

Node.js

// ES module (e.g. app.mjs) so top-level await works; in CommonJS, wrap this in an async function
import { MongoClient } from 'mongodb';

const client = new MongoClient(uri);
await client.connect();

const db = client.db('myDatabase');
const collection = db.collection('users');

// CRUD
await collection.insertOne({ name: 'Alice' });
const user = await collection.findOne({ name: 'Alice' });
await collection.updateOne({ name: 'Alice' }, { $set: { age: 30 } });
await collection.deleteOne({ name: 'Alice' });

await client.close();

Python (PyMongo)

from pymongo import MongoClient

client = MongoClient(uri)
db = client.myDatabase
collection = db.users

# CRUD
collection.insert_one({"name": "Alice"})
user = collection.find_one({"name": "Alice"})
collection.update_one({"name": "Alice"}, {"$set": {"age": 30}})
collection.delete_one({"name": "Alice"})

Java

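import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.set;

MongoClient mongoClient = MongoClients.create(uri);
MongoDatabase database = mongoClient.getDatabase("myDatabase");
MongoCollection<Document> collection = database.getCollection("users");

// Insert
collection.insertOne(new Document("name", "Alice"));

// Find
Document user = collection.find(eq("name", "Alice")).first();

// Update
collection.updateOne(eq("name", "Alice"), set("age", 30));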

Go

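// Go driver v1 API (go.mongodb.org/mongo-driver); check errors instead of discarding them
ctx := context.TODO()
client, err := mongo.Connect(ctx, options.Client().ApplyURI(uri))
if err != nil {
	log.Fatal(err)
}
defer client.Disconnect(ctx)
collection := client.Database("myDatabase").Collection("users")

// Insert
if _, err := collection.InsertOne(ctx, bson.M{"name": "Alice"}); err != nil {
	log.Fatal(err)
}

// Find
var user bson.M
if err := collection.FindOne(ctx, bson.M{"name": "Alice"}).Decode(&user); err != nil {
	log.Fatal(err)
}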

B. Integration Tools

Kafka Connector

{
  "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
  "connection.uri": "mongodb://localhost:27017",
  "database": "myDatabase",
  "collection": "events",
  "topics": "my-topic"
}

Spark Connector

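import spark.implicits._ // enables the $"age" column syntax

// Spark Connector 10.x ("mongodb" format)
val df = spark.read
  .format("mongodb")
  .option("connection.uri", "mongodb://localhost:27017")
  .option("database", "myDatabase")
  .option("collection", "myCollection")
  .load()

df.filter($"age" > 18).show()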

BI Connector (SQL Interface)

-- Query MongoDB using SQL
SELECT name, AVG(age) as avg_age
FROM users
WHERE status = 'active'
GROUP BY name;

IX. ADVANCED FEATURES

A. Atlas Search (Full-Text)

Create Search Index:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "string",
        "analyzer": "lucene.standard"
      },
      "description": {
        "type": "string",
        "analyzer": "lucene.english"
      }
    }
  }
}

Query:

db.articles.aggregate([
  {
    $search: {
      text: {
        query: 'mongodb database',
        path: ['title', 'description'],
        fuzzy: { maxEdits: 1 },
      },
    },
  },
  { $limit: 10 },
  { $project: { title: 1, description: 1, score: { $meta: 'searchScore' } } },
]);

B. Atlas Vector Search

For AI/ML similarity search:

db.products.aggregate([
  {
    $vectorSearch: {
      index: "vector_index",
      path: "embedding",
      queryVector: [0.123, 0.456, ...],  // e.g. 1536 dimensions for OpenAI text-embedding-3-small; must match the index
      numCandidates: 100,
      limit: 10
    }
  },
  {
    $project: {
      name: 1,
      description: 1,
      score: { $meta: "vectorSearchScore" }
    }
  }
])
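Vector search ranks documents by the similarity between `queryVector` and each stored embedding. For an index configured with cosine similarity, the comparison is the cosine of the angle between the vectors; Atlas documents the resulting `vectorSearchScore` as `(1 + cosine) / 2`, which maps it to `[0, 1]`. A small sketch of that arithmetic on toy 2-dimensional vectors:

```javascript
// Cosine similarity between two embedding vectors of equal length
function cosine(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Normalized score as documented for cosine similarity in Atlas Vector Search
const normalizedScore = (a, b) => (1 + cosine(a, b)) / 2;

console.log(normalizedScore([1, 0], [1, 0]), normalizedScore([1, 0], [0, 1])); // 1 0.5
```

This also shows why `queryVector` must have the same number of dimensions as the indexed embeddings.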

C. Change Streams (Real-Time)

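// Requires a replica set or sharded cluster. fullDocument: 'updateLookup' adds
// the current document to update events; without it, only inserts and
// replaces carry fullDocument, so this filter would drop updates.
const changeStream = collection.watch(
  [{ $match: { 'fullDocument.status': 'active' } }],
  { fullDocument: 'updateLookup' }
);

changeStream.on('change', (change) => {
  console.log('Change detected:', change);
  // change.operationType: "insert", "update", "delete", "replace", ...
  // change._id: resume token for this event (persist it to resume later)
});

// Resume from a specific point
const resumeToken = changeStream.resumeToken;
const newStream = collection.watch([], { resumeAfter: resumeToken });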

D. Bulk Operations

const bulkOps = [
  { insertOne: { document: { name: 'Alice', age: 30 } } },
  {
    updateOne: {
      filter: { name: 'Bob' },
      update: { $set: { age: 25 } },
      upsert: true,
    },
  },
  { deleteOne: { filter: { name: 'Charlie' } } },
];

const result = await collection.bulkWrite(bulkOps, { ordered: false });
console.log(`Inserted: ${result.insertedCount}, Updated: ${result.modifiedCount}`);

X. PERFORMANCE OPTIMIZATION

Best Practices

Index Critical Fields
- Index fields used in queries, sorts, joins
- Monitor slow queries (>100ms)
- Use compound indexes for multi-field queries

Use Projection

// Good: Only return needed fields
db.users.find({ status: 'active' }, { name: 1, email: 1 });

// Bad: Return entire document
db.users.find({ status: 'active' });

Limit Result Sets
```
db.users.find().limit(100);
```
Use Aggregation Pipeline
- Process data server-side instead of client-side
- Use $match early to filter
- Use $project to reduce document size

Connection Pooling

const client = new MongoClient(uri, {
  maxPoolSize: 50,
  minPoolSize: 10,
});

Batch Writes

// Good: Batch insert
await collection.insertMany(documents);

// Bad: Individual inserts
for (const doc of documents) {
  await collection.insertOne(doc);
}
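For very large imports, the batch itself can be split into chunks. The drivers already split oversized batches internally, so manual chunking mainly bounds memory use and lets you checkpoint progress between batches. A minimal sketch; the batch size of 1000 is an arbitrary example value and `toInsertBatches` is a hypothetical helper.

```javascript
// Split documents into bulkWrite-ready batches of insertOne operations
function toInsertBatches(documents, batchSize = 1000) {
  if (batchSize < 1) throw new RangeError('batchSize must be >= 1');
  const batches = [];
  for (let i = 0; i < documents.length; i += batchSize) {
    batches.push(documents.slice(i, i + batchSize).map((document) => ({ insertOne: { document } })));
  }
  return batches;
}

// Usage with a connected collection (inside an async function):
// for (const ops of toInsertBatches(documents)) {
//   await collection.bulkWrite(ops, { ordered: false });
// }
console.log(toInsertBatches([{ a: 1 }, { a: 2 }, { a: 3 }], 2).map((b) => b.length)); // [ 2, 1 ]
```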

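Write Concern Tuning
- Use w: 1 for non-critical writes (faster, but a write can be rolled back if the primary fails before replicating it)
- Use w: "majority" for critical data (safer)

Read Preference
- Use secondary for read-heavy analytics (reads may return slightly stale data)
- Use primary for strong consistency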

Monitoring

// Check slow queries
db.setProfilingLevel(1, { slowms: 100 });
db.system.profile.find().sort({ ts: -1 }).limit(10);

// Current operations
db.currentOp();

// Server status
db.serverStatus();

// Collection stats
db.collection.stats();

XI. TROUBLESHOOTING

Common Errors

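| Error | Cause | Solution |
|---|---|---|
| `MongoNetworkError` / `MongoServerSelectionError` | Connection failed | Check network, IP access list, credentials |
| `E11000 duplicate key error` | Duplicate value for a unique index | Check unique indexes, handle duplicates |
| `Document failed validation` (code 121) | Schema validation failed | Check document structure, field types |
| `MaxTimeMSExpired` (code 50) | Operation exceeded its `maxTimeMS` limit | Add indexes, optimize the query, or raise the limit |
| `BSONObjectTooLarge` | A single document exceeds the 16 MB BSON limit (aggregation results are returned through a cursor, so their total size is not capped) | Shrink documents with `$project`/`$slice`, or split large arrays |
| Uneven data distribution / hot shard | Low-cardinality or monotonically increasing shard key | Choose a high-cardinality, evenly distributed key; reshard with `reshardCollection` |
| Jumbo chunks | Too many documents share one shard key value | Refine the key with `refineCollectionShardKey`, or reshard |
| Secondary lagging or too stale to catch up | Replication lag exceeds the oplog window | Check network and load; increase oplog size (`replSetResizeOplog`); resync the member |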

Debugging Tools

// Explain query plan
db.collection.find({ field: value }).explain('executionStats');

// Check index usage
db.collection.aggregate([{ $indexStats: {} }]);

// Analyze slow queries
db.setProfilingLevel(2); // Profile all operations (adds overhead; reset with db.setProfilingLevel(0))
db.system.profile.find({ millis: { $gt: 100 } });

// Check replication lag
rs.printReplicationInfo();
rs.printSecondaryReplicationInfo();

XII. QUICK REFERENCE

Top 20 Operations

find() - Query documents
updateOne() / updateMany() - Modify documents
insertOne() / insertMany() - Add documents
deleteOne() / deleteMany() - Remove documents
aggregate() - Complex queries
createIndex() - Performance optimization
explain() - Query analysis
findOne() - Get single document
countDocuments() - Count matches
replaceOne() - Replace document
distinct() - Get unique values
bulkWrite() - Batch operations
findAndModify() - Atomic update
watch() - Monitor changes
sort() / limit() / skip() - Result manipulation
$lookup - Join collections
$group - Aggregate data
$match - Filter pipeline
$project - Shape output
hint() - Force index

Common Patterns

Pagination:

// skip() still walks past every skipped document, so deep pages get slower
const page = 2;
const pageSize = 20;
db.collection
  .find()
  .skip((page - 1) * pageSize)
  .limit(pageSize);

Cursor-based Pagination (Better):

// Sort on the field you filter on; without a sort, page order is not guaranteed
const lastId = ObjectId('...');
db.collection.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(20);
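Cursor-based (keyset) pagination remembers the last `_id` of each page and starts the next page after it, so each page is an index range scan instead of a growing skip. This in-memory sketch mirrors `find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(pageSize)` over 45 made-up documents with numeric `_id`s.

```javascript
// In-memory equivalent of find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(pageSize)
function nextPage(docs, lastId, pageSize) {
  return docs
    .filter((d) => lastId === undefined || d._id > lastId)
    .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0))
    .slice(0, pageSize);
}

// Walk every page of 45 documents, 20 at a time
const docs = Array.from({ length: 45 }, (_, i) => ({ _id: i + 1 }));
const pages = [];
let lastId; // undefined: start from the beginning
for (;;) {
  const page = nextPage(docs, lastId, 20);
  if (page.length === 0) break;
  pages.push(page.map((d) => d._id));
  lastId = page[page.length - 1]._id; // the next page starts after this _id
}
console.log(pages.map((p) => p.length)); // [ 20, 20, 5 ]
```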

Atomic Counter:

db.counters.findAndModify({
  query: { _id: 'sequence' },
  update: { $inc: { value: 1 } },
  new: true,
  upsert: true,
});

Soft Delete:

// Mark as deleted
db.users.updateOne({ _id: userId }, { $set: { deleted: true, deletedAt: new Date() } });

// Query active only
db.users.find({ deleted: { $ne: true } });

XIII. RESOURCES

Official Documentation

Full Docs: https://www.mongodb.com/docs/
MongoDB Manual: https://www.mongodb.com/docs/manual/
Drivers: https://www.mongodb.com/docs/drivers/
Atlas: https://www.mongodb.com/docs/atlas/

Tools

MongoDB Compass - GUI for MongoDB
MongoDB Shell (mongosh) - Modern shell
Atlas CLI - Automate Atlas operations
Database Tools - mongodump, mongorestore, mongoimport

Best Practices Summary

Always use indexes for queried fields
Embedded vs. Referenced: Embed for 1-to-few, reference for 1-to-many
Shard key: High cardinality + even distribution + query-aligned
Security: Enable auth, use TLS, encrypt at rest for production
Replication: Minimum 3 nodes for high availability
Write concern: w: "majority" for critical data
Monitor: Track slow queries, replication lag, disk usage
Test: Use explain() to verify query performance
Connection pooling: Configure appropriate pool size
Schema validation: Define schema for data integrity

XIV. VERSION-SPECIFIC FEATURES

MongoDB 8.0 (Current)

Config shard (combined config + shard role)
Improved aggregation performance
Enhanced security features

MongoDB 7.0

Auto-merging chunks
Time series improvements
Queryable encryption GA

MongoDB 6.0

Queryable Encryption (public preview)
Clustered collections
Time series collections improvements

MongoDB 5.0

Time series collections
Live resharding
Versioned API (now called the Stable API)

Common Use Cases

E-Commerce

Product catalog (embedded attributes)
Orders (transactions for consistency)
User sessions (TTL indexes for cleanup)
Search (Atlas Search for products)

IoT/Time Series

Sensor data (time series collections)
Real-time analytics (change streams)
Retention policies (TTL indexes)

Social Network

User profiles (embedded or referenced)
Posts & comments (embedded for small, referenced for large)
Real-time feeds (change streams)
Search (Atlas Search for content)

Analytics

Event tracking (high write throughput)
Aggregation pipelines (complex analytics)
Data federation (query across sources)

When NOT to Use MongoDB

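Complex multi-table joins (SQL databases excel here)
Extremely small dataset (<1GB) with simple queries
Note: strong consistency (w: "majority" writes with "majority" or "linearizable" reads) and ACID transactions spanning multiple collections and databases (4.0+, across shards since 4.2) are supported, so neither is by itself a reason to choose an RDBMS.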

This skill provides comprehensive MongoDB knowledge for implementing database solutions, from basic CRUD operations to advanced distributed systems with sharding, replication, and security. Always refer to official documentation for the latest features and version-specific details.

About

SKILL.md

About

Guide for implementing MongoDB - a document database platform with CRUD operations, aggregation pipelines, indexing, replication, sharding, search capabilities, and comprehensive security...

SKILL.md

MongoDB Agent Skill

A comprehensive guide for working with MongoDB - a document-oriented database platform that provides powerful querying, horizontal scaling, high availability, and enterprise-grade security.

When to Use This Skill

Use this skill when you need to:

Design MongoDB schemas and data models
Write CRUD operations and complex queries
Build aggregation pipelines for data transformation
Optimize query performance with indexes
Configure replication for high availability
Set up sharding for horizontal scaling
Implement security (authentication, authorization, encryption)
Deploy MongoDB (Atlas, self-managed, Kubernetes)
Integrate MongoDB with applications (15+ official drivers)
Troubleshoot performance issues or errors
Implement Atlas Search or Vector Search
Work with time series data or change streams

Documentation Coverage

This skill synthesizes 24,618 documentation links across 172 major MongoDB sections, covering:

MongoDB versions 5.0 through 8.1 (upcoming)
15+ official driver languages
50+ integration tools (Kafka, Spark, BI Connector, Kubernetes Operator)
Complete deployment spectrum (Atlas cloud, self-managed, Kubernetes)

I. CORE DATABASE OPERATIONS

A. CRUD Operations

Read Operations

// Find documents
db.collection.find({ status: 'active' });
db.collection.findOne({ _id: ObjectId('...') });

// Query operators
db.users.find({ age: { $gte: 18, $lt: 65 } });
db.posts.find({ tags: { $in: ['mongodb', 'database'] } });
db.products.find({ price: { $exists: true } });

// Projection (select specific fields)
db.users.find({ status: 'active' }, { name: 1, email: 1 });

// Cursor operations
db.collection.find().sort({ createdAt: -1 }).limit(10).skip(20);

Write Operations

// Insert
db.collection.insertOne({ name: 'Alice', age: 30 });
db.collection.insertMany([{ name: 'Bob' }, { name: 'Charlie' }]);

// Update
db.users.updateOne({ _id: userId }, { $set: { status: 'verified' } });
db.users.updateMany({ lastLogin: { $lt: cutoffDate } }, { $set: { status: 'inactive' } });

// Replace entire document
db.users.replaceOne({ _id: userId }, newUserDoc);

// Delete
db.users.deleteOne({ _id: userId });
db.users.deleteMany({ status: 'deleted' });

// Upsert (update or insert if not exists)
db.users.updateOne(
  { email: 'user@example.com' },
  { $set: { name: 'User', lastSeen: new Date() } },
  { upsert: true }
);

Atomic Operations

// Increment counter
db.posts.updateOne({ _id: postId }, { $inc: { views: 1 } });

// Add to array (if not exists)
db.users.updateOne({ _id: userId }, { $addToSet: { interests: 'mongodb' } });

// Push to array
db.posts.updateOne({ _id: postId }, { $push: { comments: { author: 'Alice', text: 'Great!' } } });

// Find and modify atomically
db.counters.findAndModify({
  query: { _id: 'sequence' },
  update: { $inc: { value: 1 } },
  new: true,
  upsert: true,
});

B. Query Operators

Comparison Operators

$eq, $ne, $gt, $gte, $lt, $lte
$in, $nin

Logical Operators

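$and, $or, $not, $nor

// Example (equivalent to the implicit AND { price: { $gte: 100 }, stock: { $gt: 0 } };
// an explicit $and is needed mainly when the same field or operator must appear twice)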
db.products.find({
  $and: [{ price: { $gte: 100 } }, { stock: { $gt: 0 } }],
});

Array Operators

$all, $elemMatch, $size
Related aggregation operators (MongoDB 5.2+, not query operators): $firstN, $lastN, $maxN, $minN

// Example: Find docs with all tags
db.posts.find({ tags: { $all: ['mongodb', 'database'] } });

// Match array element with multiple conditions
db.products.find({
  reviews: {
    $elemMatch: { rating: { $gte: 4 }, verified: true },
  },
});
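It is easy to miss how `$elemMatch` differs from two separate dotted-path conditions. With dotted paths, each condition may be satisfied by a *different* array element. The sketch below mimics both behaviors in plain JavaScript over an in-memory document. `matchesDotted` and `matchesElemMatch` are hypothetical helpers for illustration, not driver APIs, and they model only these two conditions.

```javascript
// Hypothetical helpers that mimic how MongoDB evaluates two filter shapes
// against a `reviews` array (illustration only; not a query engine).

// { 'reviews.rating': { $gte: 4 }, 'reviews.verified': true }
// Each condition is checked independently: any element may satisfy each one.
function matchesDotted(doc) {
  const reviews = doc.reviews || [];
  const someHighRating = reviews.some((r) => r.rating >= 4);
  const someVerified = reviews.some((r) => r.verified === true);
  return someHighRating && someVerified;
}

// { reviews: { $elemMatch: { rating: { $gte: 4 }, verified: true } } }
// A single element must satisfy both conditions at once.
function matchesElemMatch(doc) {
  const reviews = doc.reviews || [];
  return reviews.some((r) => r.rating >= 4 && r.verified === true);
}

// A 5-star unverified review plus a 2-star verified review
const product = {
  name: 'Headphones',
  reviews: [
    { rating: 5, verified: false },
    { rating: 2, verified: true },
  ],
};

console.log(matchesDotted(product)); // true
console.log(matchesElemMatch(product)); // false
```

Use `$elemMatch` whenever several conditions must hold for the same array element.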

Existence & Type

$exists, $type

// Find documents with optional field
db.users.find({ phoneNumber: { $exists: true } });

// Type checking
db.data.find({ value: { $type: 'string' } });

C. Aggregation Pipeline

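An aggregation pipeline passes documents through an ordered sequence of stages. Each stage transforms the stream and hands its output to the next. It is MongoDB's main tool for server-side data transformation and analysis.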

Core Pipeline Stages (40+)

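db.orders.aggregate([
  // Stage 1: Filter documents (placed first so it can use an index)
  { $match: { status: 'completed', total: { $gte: 100 } } },

  // Stage 2: Join with customers (adds a `customer` array; the later stages
  // do not use it, it is shown to illustrate $lookup)
  {
    $lookup: {
      from: 'customers',
      localField: 'customerId',
      foreignField: '_id',
      as: 'customer',
    },
  },

  // Stage 3: Unwind array (one output document per line item)
  { $unwind: '$items' },

  // Stage 4: Group and aggregate. After $unwind each order appears once per item,
  // so count distinct order _ids rather than using { $sum: 1 } as an order count
  {
    $group: {
      _id: '$items.category',
      totalRevenue: { $sum: '$items.total' },
      orderIds: { $addToSet: '$_id' },
      itemCount: { $sum: 1 },
      avgItemRevenue: { $avg: '$items.total' },
    },
  },

  // Stage 5: Sort results
  { $sort: { totalRevenue: -1 } },

  // Stage 6: Limit results
  { $limit: 10 },

  // Stage 7: Reshape output
  {
    $project: {
      _id: 0,
      category: '$_id',
      revenue: '$totalRevenue',
      orders: { $size: '$orderIds' },
      items: '$itemCount',
      avgItemValue: { $round: ['$avgItemRevenue', 2] },
    },
  },
]);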
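`$unwind` emits one document per array element. Any accumulator that runs after it therefore sees line items, not orders: an order with three items adds 3 to `{ $sum: 1 }` and contributes its `total` three times to an `$avg`. The plain-JavaScript sketch below mimics `$unwind` followed by a per-category grouping on hypothetical sample orders so that the multiplication is visible. It illustrates the semantics only; it is not how the server executes the stages.

```javascript
// Mimic { $unwind: '$items' }: one output document per array element,
// with `items` replaced by that single element. Orders whose items array is
// missing or empty produce no output (the default, preserveNullAndEmptyArrays: false).
function unwindItems(orders) {
  const out = [];
  for (const order of orders) {
    for (const item of order.items || []) {
      out.push({ ...order, items: item });
    }
  }
  return out;
}

// Group unwound rows by category, counting rows and distinct order ids
function countByCategory(rows) {
  const groups = new Map();
  for (const row of rows) {
    const key = row.items.category;
    const g = groups.get(key) || { rows: 0, orderIds: new Set() };
    g.rows += 1; // what { $sum: 1 } would count
    g.orderIds.add(row._id); // what { $addToSet: '$_id' } would collect
    groups.set(key, g);
  }
  return groups;
}

// Hypothetical sample data: one order with three books, one with a single book
const orders = [
  {
    _id: 1,
    total: 120,
    items: [
      { category: 'books', total: 40 },
      { category: 'books', total: 40 },
      { category: 'books', total: 40 },
    ],
  },
  { _id: 2, total: 200, items: [{ category: 'books', total: 200 }] },
];

const books = countByCategory(unwindItems(orders)).get('books');
console.log(books.rows); // 4 line items
console.log(books.orderIds.size); // 2 orders
```

Averaging `total` over those four rows gives 140 instead of the per-order 160, which is the skew that counting distinct `_id`s avoids.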

Common Pipeline Patterns

Time-Based Aggregation:

db.events.aggregate([
  { $match: { timestamp: { $gte: startDate, $lt: endDate } } },
  {
    $group: {
      _id: {
        year: { $year: '$timestamp' },
        month: { $month: '$timestamp' },
        day: { $dayOfMonth: '$timestamp' },
      },
      count: { $sum: 1 },
    },
  },
]);

Faceted Search (Multiple Aggregations):

db.products.aggregate([
  { $match: { category: 'electronics' } },
  {
    $facet: {
      priceRanges: [
        {
          $bucket: {
            groupBy: '$price',
            boundaries: [0, 100, 500, 1000, 5000],
            default: '5000+',
            output: { count: { $sum: 1 } },
          },
        },
      ],
      topBrands: [
        { $group: { _id: '$brand', count: { $sum: 1 } } },
        { $sort: { count: -1 } },
        { $limit: 5 },
      ],
      avgPrice: [{ $group: { _id: null, avg: { $avg: '$price' } } }],
    },
  },
]);
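`$bucket` places each document in the bucket whose lower boundary is inclusive and whose upper boundary is exclusive. Anything outside `[first, last)` goes to the `default` bucket, including values below the lowest boundary and documents where the `groupBy` field is missing. That means the `'5000+'` label above would also collect negative or missing prices. The helper below, `bucketId`, is a hypothetical plain-JavaScript mimic for numeric values only. The real `$bucket` also handles mixed types by comparing them in BSON order.

```javascript
// Hypothetical mimic of $bucket's assignment rule for numeric groupBy values.
// boundaries must be sorted ascending; each bucket is [lower, upper).
// Returns the bucket's lower boundary (which $bucket uses as the _id), or
// defaultId when the value is outside [first, last) or is not a number.
function bucketId(value, boundaries, defaultId) {
  if (typeof value !== 'number' || Number.isNaN(value)) return defaultId;
  for (let i = 0; i < boundaries.length - 1; i++) {
    if (value >= boundaries[i] && value < boundaries[i + 1]) return boundaries[i];
  }
  return defaultId;
}

const boundaries = [0, 100, 500, 1000, 5000];
for (const price of [0, 99.99, 100, 4999, 5000, -5, undefined]) {
  console.log(price, '->', bucketId(price, boundaries, '5000+'));
}
// Expected mapping: 0 and 99.99 -> 0, 100 -> 100, 4999 -> 1000,
// 5000, -5 and undefined -> the default bucket '5000+'
```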

Window Functions:

db.sales.aggregate([
  {
    $setWindowFields: {
      partitionBy: '$region',
      sortBy: { date: 1 },
      output: {
        runningTotal: { $sum: '$amount', window: { documents: ['unbounded', 'current'] } },
        movingAvg: { $avg: '$amount', window: { documents: [-7, 0] } },
      },
    },
  },
]);
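In a `documents: [lower, upper]` window, the bounds are positions relative to the current document within its partition, and both ends are inclusive. So `['unbounded', 'current']` covers everything up to and including the current document. `[-7, 0]` spans up to eight documents (the current one plus the seven before it), not seven. The plain-JavaScript sketch below reproduces both outputs for one partition that is already sorted. `windowSums` is a hypothetical helper, not a MongoDB API.

```javascript
// Hypothetical mimic of the two $setWindowFields outputs above for a single
// partition whose documents are already sorted by date.
function windowSums(amounts, lookback = 7) {
  let running = 0;
  return amounts.map((amount, i) => {
    running += amount; // documents: ['unbounded', 'current']
    // documents: [-lookback, 0] -> indexes max(0, i - lookback) .. i, inclusive;
    // near the start of the partition the window simply holds fewer documents
    const win = amounts.slice(Math.max(0, i - lookback), i + 1);
    const movingAvg = win.reduce((a, b) => a + b, 0) / win.length;
    return { amount, runningTotal: running, movingAvg };
  });
}

const rows = windowSums([10, 20, 30, 40], 2); // lookback 2 => window [-2, 0]
console.log(rows.map((r) => r.runningTotal)); // [ 10, 30, 60, 100 ]
console.log(rows.map((r) => r.movingAvg)); // [ 10, 15, 20, 30 ]
```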

Aggregation Operators (150+)

Math Operators:

($add, $subtract, $multiply, $divide, $mod);
($abs, $ceil, $floor, $round, $sqrt, $pow);
($log, $log10, $ln, $exp);

String Operators:

($concat, $substr, $toLower, $toUpper);
($trim, $ltrim, $rtrim, $split);
($regexMatch, $regexFind, $regexFindAll);

Array Operators:

($arrayElemAt, $slice, $first, $last, $reverse);
($sortArray, $filter, $map, $reduce);
($zip, $concatArrays);

Date/Time Operators:

($dateAdd, $dateDiff, $dateFromString, $dateToString);
($dayOfMonth, $month, $year, $dayOfWeek);
($week, $hour, $minute, $second);

Type Conversion:

($toInt, $toString, $toDate, $toDouble);
($toDecimal, $toObjectId, $toBool);

II. INDEXING & PERFORMANCE

A. Index Types

Single Field Index

db.users.createIndex({ email: 1 }); // ascending
db.posts.createIndex({ createdAt: -1 }); // descending

Compound Index

// Order matters! Index on { status: 1, createdAt: -1 }
db.orders.createIndex({ status: 1, createdAt: -1 });

// Supports queries on:
// - { status: "..." }
// - { status: "...", createdAt: ... }
// Does NOT efficiently support: { createdAt: ... } alone
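The "order matters" rule is the index prefix rule: a compound index can bound a scan using any leading run of its keys. The hypothetical helper `usablePrefix` below is a simplification, not the query planner. It reports how many leading index keys appear in a filter, and a result of 0 means the index cannot bound the scan. It ignores range-vs-equality distinctions, sorts, and other plans, all of which the real planner weighs.

```javascript
// Hypothetical simplification of the index prefix rule.
// indexKeys: the key document passed to createIndex, e.g. { status: 1, createdAt: -1 }
// filter:    a query filter; only its top-level field names are inspected
// Returns how many leading index fields appear in the filter.
function usablePrefix(indexKeys, filter) {
  let count = 0;
  for (const field of Object.keys(indexKeys)) {
    if (!(field in filter)) break; // the usable prefix stops at the first missing field
    count += 1;
  }
  return count;
}

const idx = { status: 1, createdAt: -1 };
console.log(usablePrefix(idx, { status: 'open' })); // 1
console.log(usablePrefix(idx, { status: 'open', createdAt: new Date(0) })); // 2
console.log(usablePrefix(idx, { createdAt: new Date(0) })); // 0
```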

Text Index (Full-Text Search)

// A collection can have at most one text index, so list every searchable field in it
db.articles.createIndex({ title: 'text', body: 'text' });

// Search
db.articles.find({ $text: { $search: 'mongodb database' } });

// With relevance score
db.articles
  .find({ $text: { $search: 'mongodb' } }, { score: { $meta: 'textScore' } })
  .sort({ score: { $meta: 'textScore' } });

Geospatial Indexes

// 2dsphere for earth-like geometry
db.places.createIndex({ location: '2dsphere' });

// Find nearby
db.places.find({
  location: {
    $near: {
      $geometry: { type: 'Point', coordinates: [lon, lat] }, // GeoJSON order: longitude first, then latitude
      $maxDistance: 5000, // meters
    },
  },
});

Wildcard Index

// Index every field, at any depth, under attributes
db.products.createIndex({ 'attributes.$**': 1 });

// Supports queries on any field under attributes
db.products.find({ 'attributes.color': 'red' });

Partial Index

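// Index only documents matching the filter
db.orders.createIndex({ customerId: 1 }, { partialFilterExpression: { status: 'active' } });

// The planner uses this index only when the query implies the filter condition
db.orders.find({ customerId: 42, status: 'active' });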

TTL Index (Auto-delete)

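// Delete documents about 24 hours after createdAt (the field must hold a BSON Date;
// a background task removes expired documents roughly every 60 seconds, so deletion lags slightly)
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 });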

Hashed Index (for sharding)

db.users.createIndex({ userId: 'hashed' });

B. Query Optimization

Explain Query Plans

// Basic explain
db.users.find({ email: 'user@example.com' }).explain();

// Execution stats (shows actual performance)
db.users.find({ age: { $gte: 18 } }).explain('executionStats');

// Key metrics to check:
// - executionTimeMillis
// - totalDocsExamined vs. nReturned (should be close)
// - stage: "IXSCAN" (using index) vs. "COLLSCAN" (full collection scan; a red flag on large collections)
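These metrics can also be extracted programmatically from an `explain('executionStats')` result, for example in a test that guards against collection scans. The sketch below assumes the classic explain layout: `queryPlanner.winningPlan` with nested `inputStage`/`inputStages`, plus an `executionStats` object. Servers that use the slot-based engine may nest the plan under `winningPlan.queryPlan`, and the helper checks for that as well. `summarizeExplain` is a hypothetical helper, and the sample object is hand-written, not real server output.

```javascript
// Collect stage names from a plan tree, parent first (handles inputStage and inputStages)
function planStages(plan, out = []) {
  if (!plan) return out;
  if (plan.stage) out.push(plan.stage);
  if (plan.inputStage) planStages(plan.inputStage, out);
  for (const child of plan.inputStages || []) planStages(child, out);
  return out;
}

// Hypothetical helper: reduce explain('executionStats') output to the key metrics
function summarizeExplain(explain) {
  const winning = explain.queryPlanner.winningPlan;
  // Slot-based-engine output may wrap the plan tree in winningPlan.queryPlan
  const stages = planStages(winning.queryPlan || winning);
  const stats = explain.executionStats;
  return {
    stages,
    usesIndex: stages.includes('IXSCAN'),
    collScan: stages.includes('COLLSCAN'),
    nReturned: stats.nReturned,
    totalDocsExamined: stats.totalDocsExamined,
    // Documents examined per document returned; near 1 means a selective plan
    // (null when nothing was returned)
    examinedPerReturned: stats.nReturned > 0 ? stats.totalDocsExamined / stats.nReturned : null,
    executionTimeMillis: stats.executionTimeMillis,
  };
}

// Hand-written sample shaped like an indexed lookup (not captured from a server)
const sample = {
  queryPlanner: {
    winningPlan: { stage: 'FETCH', inputStage: { stage: 'IXSCAN', indexName: 'email_1' } },
  },
  executionStats: { nReturned: 1, totalDocsExamined: 1, totalKeysExamined: 1, executionTimeMillis: 0 },
};

console.log(summarizeExplain(sample));
```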

Covered Queries

// Create index
db.users.createIndex({ email: 1, name: 1 });

// Query covered by index (no document fetch needed)
db.users.find(
  { email: 'user@example.com' },
  { email: 1, name: 1, _id: 0 } // project only indexed fields
);

Index Hints

// Force a specific index (the hinted index must exist on the queried collection)
db.orders.find({ status: 'active' }).hint({ status: 1, createdAt: -1 });

Index Management

// List all indexes
db.collection.getIndexes();

// Drop index
db.collection.dropIndex('indexName');

// Hide index (test before dropping)
db.collection.hideIndex('indexName');
db.collection.unhideIndex('indexName');

// Index stats
db.collection.aggregate([{ $indexStats: {} }]);

III. DATA MODELING PATTERNS

A. Relationship Patterns

One-to-One (Embedded)

// User with single address
{
  _id: ObjectId("..."),
  name: "Alice",
  email: "alice@example.com",
  address: {
    street: "123 Main St",
    city: "NYC",
    zipcode: "10001"
  }
}

One-to-Few (Embedded Array)

// Blog post with comments (< 100 comments)
{
  _id: ObjectId("..."),
  title: "MongoDB Guide",
  comments: [
    { author: "Bob", text: "Great post!", date: ISODate("...") },
    { author: "Charlie", text: "Thanks!", date: ISODate("...") }
  ]
}

One-to-Many (Referenced)

// Author collection
{ _id: ObjectId("author1"), name: "Alice" }

// Books collection (many books per author)
{ _id: ObjectId("book1"), title: "Book 1", authorId: ObjectId("author1") }
{ _id: ObjectId("book2"), title: "Book 2", authorId: ObjectId("author1") }

Many-to-Many (Array of References)

// Users collection
{
  _id: ObjectId("user1"),
  name: "Alice",
  groupIds: [ObjectId("group1"), ObjectId("group2")]
}

// Groups collection (two-way references: update both arrays on every membership change)
{
  _id: ObjectId("group1"),
  name: "MongoDB Users",
  memberIds: [ObjectId("user1"), ObjectId("user2")]
}

B. Advanced Patterns

Time Series Pattern

// Manual bucket pattern: one document per sensor per time window, readings stored as offsets (seconds) from timestamp
{
  _id: ObjectId("..."),
  sensorId: "sensor-123",
  timestamp: ISODate("2025-01-01T00:00:00Z"),
  readings: [
    { time: 0, temp: 23.5, humidity: 45 },
    { time: 60, temp: 23.6, humidity: 46 },
    { time: 120, temp: 23.4, humidity: 45 }
  ]
}

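// MongoDB 5.0+: a native time series collection buckets measurements internally,
// so insert one document per reading, e.g. { timestamp, sensorId, temp, humidity }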
db.createCollection("sensor_data", {
  timeseries: {
    timeField: "timestamp",
    metaField: "sensorId",
    granularity: "minutes"
  }
})
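The manual bucket document above can be built from raw readings with a small grouping step. The sketch assumes one-hour buckets keyed by `sensorId`. It stores each reading's `time` as seconds since the bucket's `timestamp`, which matches the `0, 60, 120` offsets in the example. The input shape and the `toHourlyBuckets` name are hypothetical. With a native time series collection this step is unnecessary, because the server buckets individual measurement documents itself.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: group raw readings { sensorId, at: Date, temp, humidity }
// into one bucket document per sensor per UTC hour.
function toHourlyBuckets(readings) {
  const buckets = new Map();
  for (const { sensorId, at, temp, humidity } of readings) {
    const hourStart = Math.floor(at.getTime() / HOUR_MS) * HOUR_MS;
    const key = `${sensorId}|${hourStart}`;
    if (!buckets.has(key)) {
      buckets.set(key, { sensorId, timestamp: new Date(hourStart), readings: [] });
    }
    buckets.get(key).readings.push({
      time: (at.getTime() - hourStart) / 1000, // seconds since the bucket start
      temp,
      humidity,
    });
  }
  return [...buckets.values()];
}

const t0 = Date.UTC(2025, 0, 1); // 2025-01-01T00:00:00Z
const docs = toHourlyBuckets([
  { sensorId: 'sensor-123', at: new Date(t0), temp: 23.5, humidity: 45 },
  { sensorId: 'sensor-123', at: new Date(t0 + 60000), temp: 23.6, humidity: 46 },
  { sensorId: 'sensor-123', at: new Date(t0 + HOUR_MS), temp: 23.4, humidity: 45 },
]);
console.log(docs.length); // 2 (the third reading starts a new hour)
```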

Computed Pattern (Cache Results)

// User document with pre-computed stats
{
  _id: ObjectId("..."),
  username: "alice",
  stats: {
    postCount: 150,
    followerCount: 2500,
    lastUpdated: ISODate("...")
  }
}

// Refresh stats periodically (e.g., a scheduled job), increment them with $inc on each write, or update them from triggers

Schema Versioning

// Support schema evolution
{
  _id: ObjectId("..."),
  schemaVersion: 2,
  // v2 fields
  name: { first: "Alice", last: "Smith" },
  // Readers check schemaVersion; migration code upgrades v1 documents on read or in a batch job
}
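A lazy migration usually runs on read: the reader checks `schemaVersion` and upgrades older shapes in memory, optionally writing them back. The sketch makes two hypothetical assumptions, neither of which comes from the original: v1 stored `name` as a single `"First Last"` string, and documents without `schemaVersion` are v1. `upgradeUser` is an illustrative name.

```javascript
// Hypothetical reader-side upgrade from the assumed v1 shape to v2.
// v1 (assumed): { name: 'Alice Smith' }            (schemaVersion missing or 1)
// v2:           { name: { first: 'Alice', last: 'Smith' }, schemaVersion: 2 }
function upgradeUser(doc) {
  const version = doc.schemaVersion || 1;
  if (version >= 2) return doc; // already current; return unchanged
  const [first, ...rest] = String(doc.name || '').trim().split(/\s+/);
  return {
    ...doc,
    schemaVersion: 2,
    name: { first: first || '', last: rest.join(' ') },
  };
}

console.log(upgradeUser({ _id: 1, name: 'Alice Smith' }));
// { _id: 1, name: { first: 'Alice', last: 'Smith' }, schemaVersion: 2 }
```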

C. Schema Validation

db.createCollection('users', {
  validator: {
    $jsonSchema: {
      bsonType: 'object',
      required: ['email', 'name'],
      properties: {
        email: {
          bsonType: 'string',
          pattern: '^.+@.+$',
          description: 'must be a valid email',
        },
        age: {
          bsonType: 'int',
          minimum: 0,
          maximum: 120,
        },
        status: {
          enum: ['active', 'inactive', 'pending'],
        },
      },
    },
  },
  validationLevel: 'strict', // or "moderate"
  validationAction: 'error', // or "warn"
});

IV. REPLICATION & HIGH AVAILABILITY

A. Replica Sets

Architecture:

Primary: Accepts writes, replicates to secondaries
Secondaries: Replicate primary's oplog, can serve reads
Arbiter: Votes in elections, holds no data

Configuration:

rs.initiate({
  _id: 'myReplicaSet',
  members: [
    { _id: 0, host: 'mongo1:27017' },
    { _id: 1, host: 'mongo2:27017' },
    { _id: 2, host: 'mongo3:27017' },
  ],
});

// Check status
rs.status();

// Add member
rs.add('mongo4:27017');

// Remove member
rs.remove('mongo4:27017');

B. Write Concern

Controls acknowledgment of write operations:

// Wait for majority acknowledgment (durable)
db.users.insertOne({ name: 'Alice' }, { writeConcern: { w: 'majority', wtimeout: 5000 } });

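// Common levels:
// w: 1 - primary acknowledges only
// w: "majority" - a majority of data-bearing voting members acknowledge (recommended for production;
//   the implicit default since MongoDB 5.0, except in some replica sets with arbiters, where it is w: 1)
// w: <number> - specific number of nodes
// w: 0 - no acknowledgment (fire and forget)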
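The size of a majority follows from simple arithmetic on the number of voting members. So does the number of members that can fail while majority writes and elections still succeed. The sketch below computes both. It assumes every voting member also holds data. Arbiters vote but cannot acknowledge writes, so in sets with arbiters the server's own calculated majority for write concern can differ.

```javascript
// Majority of voting members: more than half, i.e. floor(n / 2) + 1
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Members that can be lost while a majority remains available
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [1, 2, 3, 4, 5, 7]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
// An even member count adds no tolerance: 3 and 4 members both tolerate 1 failure.
```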

C. Read Preference

Controls where reads are served from:

// Options:
// - primary (default): read from primary only
// - primaryPreferred: primary if available, else secondary
// - secondary: read from secondary only
// - secondaryPreferred: secondary if available, else primary
// - nearest: member (primary or secondary) with the lowest network latency

db.collection.find().readPref('secondaryPreferred');

D. Transactions

Multi-document ACID transactions (require a replica set or sharded cluster; Node.js driver shown):

const session = client.startSession();
session.startTransaction();

try {
  await accounts.updateOne({ _id: fromAccount }, { $inc: { balance: -amount } }, { session });

  await accounts.updateOne({ _id: toAccount }, { $inc: { balance: amount } }, { session });

  await session.commitTransaction();
} catch (error) {
  await session.abortTransaction();
  throw error;
} finally {
  await session.endSession();
}

V. SHARDING & HORIZONTAL SCALING

A. Sharded Cluster Architecture

Components:

Shards: Replica sets holding data subsets
Config Servers: Store cluster metadata
Mongos: Query routers directing operations to shards

B. Shard Key Selection

CRITICAL: Shard key determines data distribution and query performance.

Good Shard Keys:

High cardinality (many unique values)
Even distribution (no hotspots)
Query-aligned (queries include shard key)
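Cardinality and evenness can be checked on a sample of documents before committing to a key. For a candidate field, the sketch below computes the number of distinct values, the share of documents held by the most frequent value, and whether values only ever increase. Always-increasing values send every new insert to one chunk under ranged sharding; hashed keys spread such values out. The "top value share" heuristic and the `shardKeyStats` name are illustrative assumptions, not a MongoDB tool. On real clusters, MongoDB 7.0+ provides an `analyzeShardKey` command for this purpose.

```javascript
// Hypothetical helper: summarize a candidate shard key over sampled documents.
// field: top-level field name; docs: sampled documents in insertion order
function shardKeyStats(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const value = doc[field];
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  const maxCount = Math.max(0, ...counts.values());
  // True when every value is greater than the previous one (e.g. timestamps)
  let monotonic = docs.length > 1;
  for (let i = 1; i < docs.length && monotonic; i++) {
    if (!(docs[i][field] > docs[i - 1][field])) monotonic = false;
  }
  return {
    distinctValues: counts.size,
    topValueShare: docs.length ? maxCount / docs.length : 0,
    monotonic,
  };
}

// Hypothetical sample: 'country' has few values, 'seq' only ever increases
const sample = [
  { userId: 'u7', country: 'US', seq: 1 },
  { userId: 'u2', country: 'US', seq: 2 },
  { userId: 'u9', country: 'US', seq: 3 },
  { userId: 'u4', country: 'CA', seq: 4 },
];
for (const field of ['userId', 'country', 'seq']) {
  console.log(field, shardKeyStats(sample, field));
}
```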

// Enable sharding on database (optional since MongoDB 6.0; shardCollection no longer requires it)
sh.enableSharding('myDatabase');

// Shard collection with hashed key
sh.shardCollection('myDatabase.users', { userId: 'hashed' });

// Shard with compound key
sh.shardCollection('myDatabase.orders', { customerId: 1, orderDate: 1 });

C. Zone Sharding

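Assign shard key ranges to specific shards. Zone ranges are defined on shard key values, so this example assumes myDatabase.users is sharded on { zipcode: 1 } rather than the hashed userId key shown earlier. Each range includes its lower bound and excludes its upper bound.

// Add shards to zones (sh.addShardToZone is the newer equivalent)
sh.addShardTag('shard0', 'US-EAST');
sh.addShardTag('shard1', 'US-WEST');

// Assign ranges to zones (sh.updateZoneKeyRange is the newer equivalent)
sh.addTagRange('myDatabase.users', { zipcode: '00000' }, { zipcode: '50000' }, 'US-EAST');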
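Because zone ranges include the lower bound and exclude the upper bound, `{ zipcode: '50000' }` falls outside `US-EAST`. The sample zip codes are strings, so the bounds compare lexicographically. The sketch below mimics that lookup in plain JavaScript for a single string-valued key. `zoneFor` and the `US-WEST` range are hypothetical. The real balancer works on chunks rather than individual values, and a real open-ended range would typically use `MaxKey` as its upper bound.

```javascript
// Hypothetical mimic of zone range matching for one string shard key field.
// Each range is [min, max): min inclusive, max exclusive.
const zoneRanges = [
  { min: '00000', max: '50000', zone: 'US-EAST' },
  { min: '50000', max: '99999', zone: 'US-WEST' }, // hypothetical second range
];

// Returns the zone for a key value, or null when no range covers it
// (data outside every zone may live on any shard).
function zoneFor(zipcode, ranges) {
  const hit = ranges.find((r) => zipcode >= r.min && zipcode < r.max);
  return hit ? hit.zone : null;
}

console.log(zoneFor('10001', zoneRanges)); // US-EAST
console.log(zoneFor('50000', zoneRanges)); // US-WEST (US-EAST's upper bound is exclusive)
console.log(zoneFor('99999', zoneRanges)); // null ('99999' equals US-WEST's exclusive upper bound)
```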

D. Query Routing

// Targeted query (includes shard key) - fast
db.users.find({ userId: '12345' });

// Scatter-gather (no shard key) - slow
db.users.find({ email: 'user@example.com' });

VI. SECURITY

A. Authentication

Methods:

SCRAM (Username/Password) - Default
X.509 Certificates - Mutual TLS
LDAP (Enterprise)
Kerberos (Enterprise)
AWS IAM
OIDC (OpenID Connect)

// Create admin user
use admin
db.createUser({
  user: "admin",
  pwd: passwordPrompt(), // prompts interactively instead of hard-coding the password
  roles: ["root"]
})

// Create database user
use myDatabase
db.createUser({
  user: "appUser",
  pwd: passwordPrompt(),
  roles: [
    { role: "readWrite", db: "myDatabase" }
  ]
})

B. Role-Based Access Control (RBAC)

Built-in Roles:

read, readWrite: Read or read/write access to all non-system collections in a database
dbAdmin, dbOwner: Database administration
userAdmin: User management
clusterAdmin: Cluster management
root: Superuser

Custom Roles:

db.createRole({
  role: 'customRole',
  privileges: [
    {
      resource: { db: 'myDatabase', collection: 'users' },
      actions: ['find', 'update'],
    },
  ],
  roles: [],
});

C. Encryption

Encryption at Rest

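# mongod.conf (WiredTiger encrypted storage engine; MongoDB Enterprise only)
security:
  enableEncryption: true
  # Local key file; production deployments typically use a KMIP key server instead
  encryptionKeyFile: /path/to/keyfile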

Encryption in Transit (TLS/SSL)

# mongod.conf
net:
  tls:
    mode: requireTLS
    certificateKeyFile: /path/to/cert.pem
    CAFile: /path/to/ca.pem

Client-Side Field Level Encryption (CSFLE)

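// Node.js driver 6.x, which exports ClientEncryption (the mongodb-client-encryption package must also be installed)
const { MongoClient, ClientEncryption } = require('mongodb');

// KMS credentials shared by both clients; read from the environment, not source code
const kmsProviders = {
  aws: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  },
};

// Key management client (client: an existing, connected MongoClient)
const clientEncryption = new ClientEncryption(client, {
  keyVaultNamespace: 'encryption.__keyVault',
  kmsProviders,
});

// Create data key (masterKey.key: the AWS KMS key ARN)
const dataKeyId = await clientEncryption.createDataKey('aws', {
  masterKey: { region: 'us-east-1', key: '...' },
});

// Configure automatic encryption of the ssn field (automatic encryption requires MongoDB Enterprise or Atlas)
const encryptedClient = new MongoClient(uri, {
  autoEncryption: {
    keyVaultNamespace: 'encryption.__keyVault',
    kmsProviders,
    schemaMap: {
      'myDatabase.users': {
        bsonType: 'object',
        properties: {
          ssn: {
            encrypt: {
              keyId: [dataKeyId],
              bsonType: 'string', // specify a single type when using deterministic encryption
              algorithm: 'AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic',
            },
          },
        },
      },
    },
  },
});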

VII. DEPLOYMENT OPTIONS

A. MongoDB Atlas (Cloud)

Recommended for most use cases.

Quick Start:

Create free M0 cluster at mongodb.com/atlas
Add your IP address to the IP access list
Create database user
Get connection string

Features:

Auto-scaling
Automated backups
Multi-cloud (AWS, Azure, GCP)
Multi-region deployments
Atlas Search & Vector Search
Charts (embedded analytics)
Data Federation
Serverless instances

Connection:

const uri = 'mongodb+srv://user:pass@cluster.mongodb.net/database?retryWrites=true&w=majority';
const client = new MongoClient(uri);

B. Self-Managed

Installation:

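# Ubuntu 22.04 (jammy); apt-key is deprecated, so store the signing key in a dedicated keyring
curl -fsSL https://www.mongodb.org/static/pgp/server-8.0.asc | sudo gpg -o /usr/share/keyrings/mongodb-server-8.0.gpg --dearmor
echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-8.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/8.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-8.0.list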
sudo apt-get update
sudo apt-get install -y mongodb-org

# Start
sudo systemctl start mongod
sudo systemctl enable mongod

Configuration (mongod.conf):

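storage:
  dbPath: /var/lib/mongodb
  # No journal setting: journaling is always on since MongoDB 6.1 (storage.journal.enabled was removed)

systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true

net:
  port: 27017
  bindIp: 127.0.0.1 # add an address other replica set members can reach, e.g. 127.0.0.1,<host-ip>

security:
  authorization: enabled
  # Replica set members with access control must also authenticate to each other
  keyFile: /path/to/keyfile

replication:
  replSetName: 'myReplicaSet'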

C. Kubernetes Deployment

MongoDB Kubernetes Operator:

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb-replica-set
spec:
  members: 3
  type: ReplicaSet
  version: '8.0'
  security:
    authentication:
      modes: ['SCRAM']
  users:
    - name: admin
      db: admin
      passwordSecretRef:
        name: mongodb-admin-password
      roles:
        - name: root
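          db: admin
      scramCredentialsSecretName: admin-scram # name the operator uses for the Secret holding generated SCRAM credentials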
  statefulSet:
    spec:
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes: ['ReadWriteOnce']
            resources:
              requests:
                storage: 10Gi

VIII. INTEGRATION & DRIVERS

A. Official Drivers (15+ Languages)

Node.js

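const { MongoClient } = require('mongodb');

// CommonJS modules do not allow top-level await, so wrap the calls in a function
async function main(uri) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const db = client.db('myDatabase');
    const collection = db.collection('users');

    // CRUD
    await collection.insertOne({ name: 'Alice' });
    const user = await collection.findOne({ name: 'Alice' });
    await collection.updateOne({ name: 'Alice' }, { $set: { age: 30 } });
    await collection.deleteOne({ name: 'Alice' });
    return user;
  } finally {
    await client.close(); // release pooled connections
  }
}

// MONGODB_URI: connection string supplied via the environment
main(process.env.MONGODB_URI).catch(console.error);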

Python (PyMongo)

from pymongo import MongoClient

client = MongoClient(uri)
db = client.myDatabase
collection = db.users

# CRUD
collection.insert_one({"name": "Alice"})
user = collection.find_one({"name": "Alice"})
collection.update_one({"name": "Alice"}, {"$set": {"age": 30}})
collection.delete_one({"name": "Alice"})

Java

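import com.mongodb.client.*;
import org.bson.Document;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.set;

MongoClient mongoClient = MongoClients.create(uri);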
MongoDatabase database = mongoClient.getDatabase("myDatabase");
MongoCollection<Document> collection = database.getCollection("users");

// Insert
collection.insertOne(new Document("name", "Alice"));

// Find
Document user = collection.find(eq("name", "Alice")).first();

// Update
collection.updateOne(eq("name", "Alice"), set("age", 30));

Go

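// mongo-driver v1 API; imports: context, log, go.mongodb.org/mongo-driver/{bson,mongo,mongo/options}
client, err := mongo.Connect(context.TODO(), options.Client().ApplyURI(uri))
if err != nil {
	log.Fatal(err)
}
defer client.Disconnect(context.TODO())
collection := client.Database("myDatabase").Collection("users")

// Insert
if _, err := collection.InsertOne(context.TODO(), bson.M{"name": "Alice"}); err != nil {
	log.Fatal(err)
}

// Find
var user bson.M
if err := collection.FindOne(context.TODO(), bson.M{"name": "Alice"}).Decode(&user); err != nil {
	log.Fatal(err)
}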

B. Integration Tools

Kafka Connector

{
  "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
  "connection.uri": "mongodb://localhost:27017",
  "database": "myDatabase",
  "collection": "events",
  "topics": "my-topic"
}

Spark Connector

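import spark.implicits._ // enables the $"column" syntax

// Spark Connector 10.x options
val df = spark.read
  .format("mongodb")
  .option("connection.uri", "mongodb://localhost:27017")
  .option("database", "myDatabase")
  .option("collection", "myCollection")
  .load()

df.filter($"age" > 18).show()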

BI Connector (SQL Interface)

-- Query MongoDB using SQL
SELECT name, AVG(age) as avg_age
FROM users
WHERE status = 'active'
GROUP BY name;

IX. ADVANCED FEATURES

A. Atlas Search (Full-Text)

Create Search Index:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "string",
        "analyzer": "lucene.standard"
      },
      "description": {
        "type": "string",
        "analyzer": "lucene.english"
      }
    }
  }
}

Query:

db.articles.aggregate([
  {
    $search: {
      text: {
        query: 'mongodb database',
        path: ['title', 'description'],
        fuzzy: { maxEdits: 1 },
      },
    },
  },
  { $limit: 10 },
  { $project: { title: 1, description: 1, score: { $meta: 'searchScore' } } },
]);

B. Atlas Vector Search

For AI/ML similarity search:

db.products.aggregate([
  {
    $vectorSearch: {
      index: "vector_index",
      path: "embedding",
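      queryVector: queryEmbedding, // hypothetical array of numbers; length must equal the index's numDimensions (e.g. 1536 for some OpenAI models)
      numCandidates: 100, // must be >= limit; larger values trade speed for recall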
      limit: 10
    }
  },
  {
    $project: {
      name: 1,
      description: 1,
      score: { $meta: "vectorSearchScore" }
    }
  }
])
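`$vectorSearch` ranks documents by the similarity function chosen in the vector index definition (`cosine`, `euclidean`, or `dotProduct`). `numCandidates` controls how many approximate nearest neighbors are considered before the top `limit` are returned. To make "similarity" concrete, the sketch below computes cosine similarity between small vectors in plain JavaScript. It illustrates the metric only; it does not show how Atlas computes or normalizes `vectorSearchScore`.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]; 1 means same direction.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('vectors must have the same dimensions');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error('zero-length vector');
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
console.log(cosineSimilarity([3, 4], [6, 8])); // 1 (same direction, different length)
console.log(cosineSimilarity([3, 4], [-3, -4])); // -1 (opposite direction)
```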

C. Change Streams (Real-Time)

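// fullDocument: 'updateLookup' attaches the current document to update events, so the
// $match below sees updates too; delete events never carry fullDocument
const changeStream = collection.watch([{ $match: { 'fullDocument.status': 'active' } }], {
  fullDocument: 'updateLookup',
});

changeStream.on('change', (change) => {
  console.log('Change detected:', change);
  // change.operationType: "insert", "update", "delete", "replace"
  // change.fullDocument: present for inserts and replaces, and for updates with updateLookup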
});

// Resume from specific point
const resumeToken = changeStream.resumeToken;
const newStream = collection.watch([], { resumeAfter: resumeToken });

D. Bulk Operations

const bulkOps = [
  { insertOne: { document: { name: 'Alice', age: 30 } } },
  {
    updateOne: {
      filter: { name: 'Bob' },
      update: { $set: { age: 25 } },
      upsert: true,
    },
  },
  { deleteOne: { filter: { name: 'Charlie' } } },
];

const result = await collection.bulkWrite(bulkOps, { ordered: false });
console.log(
  `Inserted: ${result.insertedCount}, Modified: ${result.modifiedCount}, ` +
    `Upserted: ${result.upsertedCount}, Deleted: ${result.deletedCount}`
);

X. PERFORMANCE OPTIMIZATION

Best Practices

Index Critical Fields
- Index fields used in queries, sorts, joins
- Monitor slow queries (over 100 ms, the profiler's default slowms threshold)
- Use compound indexes for multi-field queries

Use Projection

// Good: Only return needed fields
db.users.find({ status: 'active' }, { name: 1, email: 1 });

// Bad: Return entire document
db.users.find({ status: 'active' });

Limit Result Sets
```
db.users.find().limit(100);
```
Use Aggregation Pipeline
- Process data server-side instead of client-side
- Use $match early to filter
- Use $project to reduce document size

Connection Pooling

const client = new MongoClient(uri, {
  maxPoolSize: 50,
  minPoolSize: 10,
});

Batch Writes

// Good: Batch insert
await collection.insertMany(documents);

// Bad: Individual inserts
for (const doc of documents) {
  await collection.insertOne(doc);
}

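Write Concern Tuning
- Use w: 1 for non-critical writes (faster, but a write acknowledged only by the primary can be rolled back if that primary fails before replicating it)
- Use w: "majority" for critical data (safer; the implicit default in most deployments since MongoDB 5.0)

Read Preference
- Use secondary for read-heavy analytics (results may lag slightly behind the primary)
- Use primary for strong consistency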

Monitoring

// Check slow queries
db.setProfilingLevel(1, { slowms: 100 });
db.system.profile.find().sort({ ts: -1 }).limit(10);

// Current operations
db.currentOp();

// Server status
db.serverStatus();

// Collection stats
db.collection.stats();

XI. TROUBLESHOOTING

Common Errors

Error / Symptom	Cause	Solution
`MongoNetworkError`	Connection failed	Check network, IP access list, credentials
`E11000 duplicate key`	Duplicate value for a unique index	Check unique indexes, handle duplicates
`Document failed validation` (code 121)	Schema validation failed	Check document structure, field types
`MaxTimeMSExpired` (code 50)	Operation exceeded its maxTimeMS limit	Add indexes, optimize query, raise the time limit
`BSONObjectTooLarge`	A single result document exceeds 16MB	Trim fields with `$project`; avoid accumulating unbounded arrays (e.g. `$push`) in `$group`
Uneven data / hot shard	Low-cardinality or monotonically increasing shard key	Choose a high-cardinality, evenly distributed key; use `refineShardKey` or `reshardCollection`
`ChunkTooBig`	Jumbo chunk	Use `refineShardKey` or re-shard
Replication lag	Slow secondary or network	Check network and secondary load; increase oplog size so lagging members do not go stale

Debugging Tools

// Explain query plan
db.collection.find({ field: value }).explain('executionStats');

// Check index usage
db.collection.aggregate([{ $indexStats: {} }]);

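// Analyze slow queries (level 2 profiles every operation and adds overhead;
// prefer level 1 with slowms in production, and reset with setProfilingLevel(0))
db.setProfilingLevel(2);
db.system.profile.find({ millis: { $gt: 100 } });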

// Check replication lag
rs.printReplicationInfo();
rs.printSecondaryReplicationInfo();

XII. QUICK REFERENCE

Top 20 Operations

find() - Query documents
updateOne() / updateMany() - Modify documents
insertOne() / insertMany() - Add documents
deleteOne() / deleteMany() - Remove documents
aggregate() - Complex queries
createIndex() - Performance optimization
explain() - Query analysis
findOne() - Get single document
countDocuments() - Count matches
replaceOne() - Replace document
distinct() - Get unique values
bulkWrite() - Batch operations
findAndModify() - Atomic update
watch() - Monitor changes
sort() / limit() / skip() - Result manipulation
$lookup - Join collections
$group - Aggregate data
$match - Filter pipeline
$project - Shape output
hint() - Force index

Common Patterns

Pagination:

const page = 2;
const pageSize = 20;
db.collection
  .find()
  .skip((page - 1) * pageSize)
  .limit(pageSize);

Cursor-based Pagination (Better):

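// lastId: _id of the last document on the previous page; sort on _id so page order is stable
const lastId = ObjectId('...');
db.collection.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(20);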
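Cursor-based pagination on a non-unique sort field, such as newest first by `createdAt`, needs a tiebreaker (usually `_id`). Without one, documents that share a timestamp can be skipped or repeated across pages. The sketch below builds the next-page filter for the sort `{ createdAt: -1, _id: -1 }`. `nextPageFilter` is a hypothetical helper, the `last` values are placeholders, and the sketch assumes a supporting index on `{ createdAt: -1, _id: -1 }`.

```javascript
// Build the filter for the page after `last`, for sort { createdAt: -1, _id: -1 }.
// A document comes later in that order if it is older, or equally old with a smaller _id.
function nextPageFilter(last) {
  return {
    $or: [
      { createdAt: { $lt: last.createdAt } },
      { createdAt: last.createdAt, _id: { $lt: last._id } },
    ],
  };
}

// Last document of the current page (placeholder values)
const last = { _id: 'b7', createdAt: new Date('2025-01-01T00:00:00Z') };
const filter = nextPageFilter(last);
console.log(JSON.stringify(filter));
// In mongosh: db.collection.find(filter).sort({ createdAt: -1, _id: -1 }).limit(20)
```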

Atomic Counter:

db.counters.findAndModify({
  query: { _id: 'sequence' },
  update: { $inc: { value: 1 } },
  new: true,
  upsert: true,
});

Soft Delete:

// Mark as deleted
db.users.updateOne({ _id: userId }, { $set: { deleted: true, deletedAt: new Date() } });

// Query active only
db.users.find({ deleted: { $ne: true } });

XIII. RESOURCES

Official Documentation

Full Docs: https://www.mongodb.com/docs/
MongoDB Manual: https://www.mongodb.com/docs/manual/
Drivers: https://www.mongodb.com/docs/drivers/
Atlas: https://www.mongodb.com/docs/atlas/

Tools

MongoDB Compass - GUI for MongoDB
MongoDB Shell (mongosh) - Modern shell
Atlas CLI - Automate Atlas operations
Database Tools - mongodump, mongorestore, mongoimport

Best Practices Summary

Always use indexes for queried fields
Embedded vs. Referenced: Embed for 1-to-few, reference for 1-to-many
Shard key: High cardinality + even distribution + query-aligned
Security: Enable auth, use TLS, encrypt at rest for production
Replication: Minimum 3 nodes for high availability
Write concern: w: "majority" for critical data
Monitor: Track slow queries, replication lag, disk usage
Test: Use explain() to verify query performance
Connection pooling: Configure appropriate pool size
Schema validation: Define schema for data integrity

XIV. VERSION-SPECIFIC FEATURES

MongoDB 8.0 (Current)

Config shard (combined config + shard role)
Improved aggregation performance
Enhanced security features

MongoDB 7.0

Auto-merging chunks
Time series improvements
Queryable encryption GA

MongoDB 6.0

Change stream pre- and post-images
Clustered collections
Time series collections improvements

MongoDB 5.0

Time series collections
Live resharding
Versioned API

Common Use Cases

E-Commerce

Product catalog (embedded attributes)
Orders (transactions for consistency)
User sessions (TTL indexes for cleanup)
Search (Atlas Search for products)

IoT/Time Series

Sensor data (time series collections)
Real-time analytics (change streams)
Retention policies (TTL indexes)

Social Network

User profiles (embedded or referenced)
Posts & comments (embedded for small, referenced for large)
Real-time feeds (change streams)
Search (Atlas Search for content)

Analytics

Event tracking (high write throughput)
Aggregation pipelines (complex analytics)
Data federation (query across sources)

When NOT to Use MongoDB

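Data that depends on enforced relational constraints such as foreign keys (use a traditional RDBMS)
Complex multi-table joins (SQL databases excel here)
Extremely small dataset (<1GB) with simple queries
Transactions that must span separate deployments (MongoDB transactions can span multiple databases, but only within one replica set or sharded cluster)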