    davila7/senior-data-engineer
    Coding
    19,892
    12 installs

    About

    World-class data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack...

    SKILL.md

    Senior Data Engineer

    World-class senior data engineer skill for production-grade AI/ML/Data systems.

    Quick Start

    Main Capabilities

    # Pipeline orchestrator: reads from data/, writes to results/
    python scripts/pipeline_orchestrator.py --input data/ --output results/
    
    # Data quality validator
    python scripts/data_quality_validator.py --target project/ --analyze
    
    # ETL performance optimizer
    python scripts/etl_performance_optimizer.py --config config.yaml --deploy
    

    Core Expertise

    This skill covers the following areas:

    • Advanced production patterns and architectures
    • Scalable system design and implementation
    • Performance optimization at scale
    • MLOps and DataOps best practices
    • Real-time processing and inference
    • Distributed computing frameworks
    • Model deployment and monitoring
    • Security and compliance
    • Cost optimization
    • Team leadership and mentoring

    Tech Stack

    • Languages: Python, SQL, R, Scala, Go
    • ML frameworks: PyTorch, TensorFlow, Scikit-learn, XGBoost
    • Data tools: Spark, Airflow, dbt, Kafka, Databricks
    • LLM frameworks: LangChain, LlamaIndex, DSPy
    • Deployment: Docker, Kubernetes, AWS/GCP/Azure
    • Monitoring: MLflow, Weights & Biases, Prometheus
    • Databases: PostgreSQL, BigQuery, Snowflake, Pinecone

    Reference Documentation

    1. Data Pipeline Architecture

    Comprehensive guide available in references/data_pipeline_architecture.md covering:

    • Advanced patterns and best practices
    • Production implementation strategies
    • Performance optimization techniques
    • Scalability considerations
    • Security and compliance
    • Real-world case studies

    2. Data Modeling Patterns

    Complete workflow documentation in references/data_modeling_patterns.md including:

    • Step-by-step processes
    • Architecture design patterns
    • Tool integration guides
    • Performance tuning strategies
    • Troubleshooting procedures

    3. DataOps Best Practices

    Technical reference guide in references/dataops_best_practices.md with:

    • System design principles
    • Implementation examples
    • Configuration best practices
    • Deployment strategies
    • Monitoring and observability
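The reference guides above are described only at a high level. To make one common pipeline pattern concrete, here is a minimal sketch of an idempotent, watermark-based incremental load. It assumes a source that exposes an `updated_at` timestamp in a single ISO-8601 format. SQLite (Python's standard `sqlite3` module, which needs SQLite 3.24+ for `ON CONFLICT ... DO UPDATE`) stands in for the warehouse. The tables `target` and `watermark` and the function `incremental_load` are hypothetical and are not taken from the reference files.

```python
import sqlite3

# Hypothetical source row shape: (id, value, updated_at as ISO-8601 text).
Row = tuple[int, str, str]


def incremental_load(conn: sqlite3.Connection, source_rows: list[Row]) -> int:
    """Load only rows newer than the stored watermark; safe to re-run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS target "
        "(id INTEGER PRIMARY KEY, value TEXT, updated_at TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS watermark (name TEXT PRIMARY KEY, ts TEXT)")

    row = conn.execute("SELECT ts FROM watermark WHERE name = 'target'").fetchone()
    high_water = row[0] if row else ""  # empty string sorts before any timestamp

    # Extract only rows strictly newer than the last successful load.
    new_rows = [r for r in source_rows if r[2] > high_water]

    # One transaction: the upserted data and the new watermark commit together.
    with conn:
        conn.executemany(
            "INSERT INTO target (id, value, updated_at) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET value = excluded.value, "
            "updated_at = excluded.updated_at",
            new_rows,
        )
        if new_rows:
            conn.execute(
                "INSERT INTO watermark (name, ts) VALUES ('target', ?) "
                "ON CONFLICT(name) DO UPDATE SET ts = excluded.ts",
                (max(r[2] for r in new_rows),),
            )
    return len(new_rows)
```

Re-running after a failure is safe because the upsert and the watermark update either both commit or both roll back. One limitation of a pure high-watermark approach: rows that arrive late with a timestamp at or below the watermark are skipped. A lookback window or change data capture avoids that.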

    Production Patterns

    Pattern 1: Scalable Data Processing

    Enterprise-scale data processing with distributed computing:

    • Horizontal scaling architecture
    • Fault-tolerant design
    • Real-time and batch processing
    • Data quality validation
    • Performance monitoring
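The following is a single-process sketch of the fault-tolerant batch loop behind this pattern. It uses bounded batches, per-row validation that sends failures to a quarantine list, and retries with exponential backoff on transient sink errors. It does not scale horizontally by itself. In Spark or a similar engine, the same validate-and-quarantine split would run per partition. The validation rule (required fields present and non-null), the choice to treat `ConnectionError` as transient, and all function names are assumptions for illustration.

```python
import time
from collections.abc import Callable, Iterable, Iterator

Record = dict


def chunks(records: Iterable[Record], size: int) -> Iterator[list[Record]]:
    """Yield fixed-size batches so memory stays bounded for any input size."""
    batch: list[Record] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def is_valid(rec: Record, required: tuple[str, ...]) -> bool:
    # Hypothetical rule: every required field must be present and non-null.
    return all(rec.get(field) is not None for field in required)


def with_retry(
    sink: Callable[[list[Record]], None],
    batch: list[Record],
    attempts: int = 3,
    base_delay: float = 0.1,
) -> None:
    """Call the sink, retrying transient errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            sink(batch)
            return
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2**attempt)


def process(
    records: Iterable[Record],
    sink: Callable[[list[Record]], None],
    required: tuple[str, ...] = ("id",),
    batch_size: int = 1000,
) -> list[Record]:
    """Validate and write records batch by batch; return quarantined rows."""
    quarantined: list[Record] = []
    for batch in chunks(records, batch_size):
        good: list[Record] = []
        for rec in batch:
            (good if is_valid(rec, required) else quarantined).append(rec)
        if good:
            with_retry(sink, good)
    return quarantined
```

Quarantining invalid rows instead of failing the whole batch keeps the pipeline moving while preserving bad data for inspection.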

    Pattern 2: ML Model Deployment

    Production ML system with high availability:

    • Model serving with low latency
    • A/B testing infrastructure
    • Feature store integration
    • Model monitoring and drift detection
    • Automated retraining pipelines
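A/B testing infrastructure usually needs stable assignment: a user should see the same model variant on every request without the service storing state. A common approach is to hash the experiment name together with the user ID and compare the result with the treatment share. The sketch below follows that approach. The function name and the 10% default share are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.1) -> str:
    """Deterministically bucket a user: the same inputs always give the same variant."""
    # Salting with the experiment name keeps assignments independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Keep the top 53 bits so the division is exact and the result lies in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "treatment" if bucket < treatment_share else "control"
```

Because the experiment name is part of the hash input, a user's bucket in one experiment tells you nothing about their bucket in another. Ramping up a rollout only requires raising `treatment_share`. Users already in treatment stay there.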

    Pattern 3: Real-Time Inference

    High-throughput inference system:

    • Batching and caching strategies
    • Load balancing
    • Auto-scaling
    • Latency optimization
    • Cost optimization
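Batching and caching fit together naturally at the serving layer. The service looks up cached predictions first, then sends only the distinct misses to the model in bounded batches, with one model call per batch. Below is a minimal synchronous sketch. `predict_batch` is a hypothetical stand-in for whatever batched model call the service exposes. The plain `dict` cache is unbounded, so a real deployment would add eviction (LRU or TTL) and a time-based flush for partially filled batches.

```python
from collections.abc import Callable, Hashable, Sequence


def cached_batched_predict(
    inputs: Sequence[Hashable],
    predict_batch: Callable[[list], list],
    cache: dict,
    max_batch_size: int = 32,
) -> list:
    """Serve cache hits directly, batch the misses, and call the model once per batch."""
    # Deduplicate misses (order-preserving) so each uncached input is computed once.
    misses = list(dict.fromkeys(x for x in inputs if x not in cache))
    for start in range(0, len(misses), max_batch_size):
        batch = misses[start : start + max_batch_size]
        for x, y in zip(batch, predict_batch(batch)):
            cache[x] = y
    return [cache[x] for x in inputs]
```

Larger batches improve hardware utilization but make each request wait longer. `max_batch_size` is therefore a latency/throughput knob to tune against the targets below.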

    Best Practices

    Development

    • Test-driven development
    • Code reviews and pair programming
    • Documentation as code
    • Version control everything
    • Continuous integration

    Production

    • Monitor everything critical
    • Automate deployments
    • Feature flags for releases
    • Canary deployments
    • Comprehensive logging

    Team Leadership

    • Mentor junior engineers
    • Drive technical decisions
    • Establish coding standards
    • Foster learning culture
    • Cross-functional collaboration

    Performance Targets

    Latency:

    • P50: < 50ms
    • P95: < 100ms
    • P99: < 200ms
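Checking these targets means computing percentiles over a window of observed request latencies. The sketch below uses the nearest-rank definition. Other definitions interpolate between samples and can give slightly different values. The function names are illustrative.

```python
import math

# Latency targets from the list above, in milliseconds (percentile -> limit).
TARGETS_MS = {50: 50.0, 95: 100.0, 99: 200.0}


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_latency(samples_ms: list[float]) -> dict[int, bool]:
    """For each target percentile, report whether the observed value is under its limit."""
    return {p: percentile(samples_ms, p) < limit for p, limit in TARGETS_MS.items()}
```

For example, with latencies of 1 to 100 ms, P50 is 50 ms, which misses the strict `< 50ms` target, while P95 (95 ms) and P99 (99 ms) pass.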

    Throughput:

    • Requests/second: > 1000
    • Concurrent users: > 10,000
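The throughput and concurrency targets are linked. By Little's law, the average number of requests in flight is L = λW, where λ is the arrival rate and W is the mean time a request spends in the system. For illustration only, assume each of N = 10,000 concurrent users sends a request every Z = 10 s on average (Z includes the user's think time) and that mean latency is W = 50 ms. Neither Z nor W comes from the targets above.

```latex
\begin{aligned}
\lambda &= \frac{N}{Z} = \frac{10\,000\ \text{users}}{10\ \text{s}} = 1\,000\ \text{requests/s} \\
L &= \lambda W = 1\,000\ \text{requests/s} \times 0.05\ \text{s} = 50\ \text{requests in flight}
\end{aligned}
```

Under these assumptions, 10,000 concurrent users produce about the 1,000 requests/s target while only around 50 requests are being processed at any instant. The mapping from users to load depends entirely on the assumed request interval.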

    Availability:

    • Uptime: 99.9%
    • Error rate: < 0.1%
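An availability target translates directly into an error budget. Over a 30-day window, 99.9% uptime allows (1 − 0.999) × 30 × 24 × 60 = 43.2 minutes of downtime. The helpers below compute that budget and check the error-rate target. The 30-day window and the function names are assumptions.

```python
def downtime_budget_minutes(slo: float, days: float = 30) -> float:
    """Allowed downtime for an availability SLO (e.g. 0.999) over a window of `days`."""
    return (1 - slo) * days * 24 * 60


def error_rate_ok(errors: int, requests: int, max_rate: float = 0.001) -> bool:
    """True if the observed error rate is strictly below the target (0.1% by default)."""
    return requests > 0 and errors / requests < max_rate
```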

    Security & Compliance

    • Authentication & authorization
    • Data encryption (at rest & in transit)
    • PII handling and anonymization
    • GDPR/CCPA compliance
    • Regular security audits
    • Vulnerability management
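One common technique for the PII item is pseudonymization with a keyed hash. Identical inputs map to the same token, so joins across tables still work, but the token cannot be reversed or recomputed without the secret key. Under GDPR, pseudonymized data is still personal data. Only properly anonymized data falls outside its scope. The sketch uses HMAC-SHA256 from Python's standard library. Key management is out of scope here. Lowercasing the input is an assumption, because the local part of an email address can technically be case-sensitive.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a PII value with a keyed, deterministic token (HMAC-SHA256, hex)."""
    # Normalize first so " Alice@Example.com" and "alice@example.com" share a token.
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Display-only masking: keep the first character of the local part and the domain."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"
```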

    Common Commands

    # Development (--cov requires the pytest-cov plugin)
    python -m pytest tests/ -v --cov
    python -m black src/
    python -m pylint src/
    
    # Training
    python scripts/train.py --config prod.yaml
    python scripts/evaluate.py --model best.pth
    
    # Deployment
    docker build -t service:v1 .
    kubectl apply -f k8s/
    helm upgrade --install service ./charts/  # --install creates the release if it does not exist yet
    
    # Monitoring
    kubectl logs -f deployment/service
    python scripts/health_check.py
    

    Resources

    • Advanced Patterns: references/data_pipeline_architecture.md
    • Implementation Guide: references/data_modeling_patterns.md
    • Technical Reference: references/dataops_best_practices.md
    • Automation Scripts: scripts/ directory

    Senior-Level Responsibilities

    A senior data engineer is expected to take on the following:

    1. Technical Leadership

      • Drive architectural decisions
      • Mentor team members
      • Establish best practices
      • Ensure code quality
    2. Strategic Thinking

      • Align with business goals
      • Evaluate trade-offs
      • Plan for scale
      • Manage technical debt
    3. Collaboration

      • Work across teams
      • Communicate effectively
      • Build consensus
      • Share knowledge
    4. Innovation

      • Stay current with research
      • Experiment with new approaches
      • Contribute to community
      • Drive continuous improvement
    5. Production Excellence

      • Ensure high availability
      • Monitor proactively
      • Optimize performance
      • Respond to incidents
    Repository
    davila7/claude-code-templates