Smithery Logo
MCPsSkillsDocsPricing
Login
Smithery Logo

Accelerating the Agent Economy

Resources

DocumentationPrivacy PolicySystem Status

Company

PricingAboutBlog

Connect

© 2026 Smithery. All rights reserved.

    Ketomihine

    scanpy-complete

    Ketomihine/scanpy-complete
    Data & Analytics
    1 installs

    About

    SKILL.md

    Install

    Install via Skills CLI

    or add to your agent
    • Claude Code
      Claude Code
    • Codex
      Codex
    • OpenClaw
      OpenClaw
    • Cursor
      Cursor
    • Amp
      Amp
    • GitHub Copilot
      GitHub Copilot
    • Gemini CLI
      Gemini CLI
    • Kilo Code
      Kilo Code
    • Junie
      Junie
    • Replit
      Replit
    • Windsurf
      Windsurf
    • Cline
      Cline
    • Continue
      Continue
    • OpenCode
      OpenCode
    • OpenHands
      OpenHands
    • Roo Code
      Roo Code
    • Augment
      Augment
    • Goose
      Goose
    • Trae
      Trae
    • Zencoder
      Zencoder
    • Antigravity
      Antigravity
    ├─
    ├─
    └─

    About

    Scanpy 单细胞分析工具包 - 100%覆盖文档(API+教程+预处理+分析+可视化)

    SKILL.md

    Scanpy-Complete Skill

    Comprehensive assistance with Scanpy - the scalable toolkit for analyzing single-cell gene expression data in Python. This skill provides complete coverage of Scanpy's preprocessing, visualization, clustering, trajectory inference, and differential expression testing capabilities.

    When to Use This Skill

    This skill should be triggered when:

    Core Single-Cell Analysis Tasks

    • Preprocessing and quality control - filtering cells/genes, normalization, highly variable gene selection
    • Dimensionality reduction - PCA, UMAP, t-SNE, diffusion maps
    • Clustering and community detection - Leiden, Louvain, hierarchical clustering
    • Differential expression analysis - finding marker genes, statistical testing
    • Trajectory inference - pseudotime analysis, RNA velocity, fate mapping
    • Data integration - batch correction, merging datasets, harmonization

    Specific Technical Scenarios

    • Working with AnnData objects and .h5ad files
    • Implementing preprocessing pipelines (pp module)
    • Creating publication-quality visualizations (pl module)
    • Running neighbor graphs and embedding algorithms (tl module)
    • Handling large datasets with dask integration
    • Spatial transcriptomics analysis
    • Multi-omics data integration

    Learning and Documentation

    • Understanding Scanpy's API and best practices
    • Learning proper workflow patterns for single-cell analysis
    • Troubleshooting common preprocessing issues
    • Finding optimal parameters for clustering and visualization
    • Understanding statistical methods for differential expression

    Quick Reference

    Essential Workflow Examples

    Example 1 (Basic preprocessing pipeline):

    import scanpy as sc
    adata = sc.datasets.pbmc68k_reduced()
    sc.pp.pca(adata, svd_solver="arpack")
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata)
    sc.tl.rank_genes_groups(adata, groupby="leiden")
    

    Example 2 (UMAP visualization and clustering):

    import scanpy as sc
    adata = sc.datasets.pbmc3k()
    sc.pp.filter_cells(adata, min_counts=200)
    sc.pp.filter_genes(adata, min_cells=3)
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.tl.pca(adata)
    sc.tl.umap(adata)
    sc.pl.umap(adata, color=['louvain'])
    

    Example 3 (Embedding density analysis):

    import scanpy as sc
    adata = sc.datasets.pbmc68k_reduced()
    sc.tl.umap(adata)
    sc.tl.embedding_density(adata, basis='umap', groupby='phase')
    sc.pl.embedding_density(
        adata, basis='umap', key='umap_density_phase', group='G1'
    )
    

    Example 4 (Marker gene overlap analysis):

    import scanpy as sc
    adata = sc.datasets.pbmc68k_reduced()
    sc.pp.pca(adata, svd_solver="arpack")
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata)
    sc.tl.rank_genes_groups(adata, groupby="leiden")
    
    marker_genes = {
        "CD4 T cells": {"IL7R"},
        "CD14+ Monocytes": {"CD14", "LYZ"},
        "B cells": {"MS4A1"},
        "CD8 T cells": {"CD8A"},
        "NK cells": {"GNLY", "NKG7"},
        "FCGR3A+ Monocytes": {"FCGR3A", "MS4A7"},
        "Dendritic Cells": {"FCER1A", "CST3"},
        "Megakaryocytes": {"PPBP"},
    }
    marker_matches = sc.tl.marker_gene_overlap(adata, marker_genes)
    

    Example 5 (Reading and writing data):

    import scanpy as sc
    # Read different file formats
    adata = sc.read_10x_h5('filtered_feature_bc_matrix.h5')
    adata = sc.read_visium(path='visium_data/', count_file='filtered_feature_bc_matrix.h5')
    adata = sc.read_h5ad('data.h5ad')
    
    # Write data
    adata.write_h5ad('processed_data.h5ad')
    adata.write_zarr('data.zarr')
    

    Example 6 (Quality control metrics):

    import scanpy as sc
    adata = sc.datasets.pbmc3k()
    # Calculate QC metrics
    sc.pp.calculate_qc_metrics(adata, inplace=True)
    # Filter based on metrics
    sc.pp.filter_cells(adata, min_counts=500)
    sc.pp.filter_genes(adata, min_cells=3)
    

    Example 7 (Batch correction with Harmony):

    import scanpy as sc
    import scanpy.external as sce
    adata = sc.datasets.pbmc3k()
    # Run preprocessing first
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata)
    # Apply Harmony integration
    sce.pp.harmony_integrate(adata, 'batch')
    

    Example 8 (Spatial transcriptomics):

    import scanpy as sc
    # Read Visium spatial data
    adata = sc.read_visium(
        path='spatial_data/',
        count_file='filtered_feature_bc_matrix.h5'
    )
    # Plot spatial coordinates
    sc.pl.spatial(adata, color=['gene1', 'gene2'])
    

    Plotting Customization

    Example 9 (Custom UMAP plots):

    import scanpy as sc
    adata = sc.datasets.pbmc68k_reduced()
    sc.tl.umap(adata)
    # Custom UMAP with specific styling
    sc.pl.umap(
        adata,
        color=['phase'],
        palette='Set2',
        frameon=False,
        legend_loc='right margin',
        size=20,
        title='Cell Cycle Phases'
    )
    

    Key Concepts

    Core Data Structures

    • AnnData - The central data structure storing expression matrix (X), observations (obs), variables (var), and unstructured annotations (uns)
    • Neighbors graph - k-nearest neighbor graph used for clustering and manifold learning
    • Embeddings - Low-dimensional representations (PCA, UMAP, t-SNE) stored in adata.obsm

    Analysis Workflow

    1. Quality Control - Filter low-quality cells and genes
    2. Normalization - Adjust for sequencing depth and other technical factors
    3. Feature Selection - Identify highly variable genes
    4. Dimensionality Reduction - PCA, followed by non-linear methods
    5. Clustering - Group cells based on transcriptional similarity
    6. Marker Gene Detection - Find genes that define clusters
    7. Visualization - Explore relationships and patterns

    Module Organization

    • scanpy.pp - Preprocessing functions (filtering, normalization, PCA)
    • scanpy.tl - Tools for analysis (clustering, trajectory inference, DE testing)
    • scanpy.pl - Plotting and visualization functions
    • scanpy.read/write - Data I/O operations
    • scanpy.external - Integration with external tools and methods

    Reference Files

    This skill includes comprehensive documentation organized in references/:

    api_reference.md (136 pages)

    Complete API documentation covering:

    • Core functions for data manipulation (read_visium, embedding_density)
    • Analysis tools (marker_gene_overlap, louvain, paga)
    • Dataset loaders (krumsiek11, pbmc3k, pbmc68k_reduced)
    • All preprocessing, analysis, and visualization functions with detailed parameters

    guide_community.md (2 pages)

    Community resources including:

    • Ecosystem - Related tools (cellxgene, scVelo, squidpy, scirpy, etc.)
    • Community - Forums, GitHub, chat channels for getting help

    guide_dev.md (4 pages)

    Developer and project information:

    • News - Latest developments and milestones
    • Contributors - Core team and contributors
    • Usage Principles - Best practices and workflow patterns
    • Installation - Setup instructions and troubleshooting

    guide_getting_started.md (1 page)

    Installation and setup:

    • Installation via pip, conda, and development versions
    • Docker setup and troubleshooting common issues

    other.md (5 pages)

    Additional resources:

    • Tutorials - Links to comprehensive tutorials and learning materials
    • How-to guides - Specific task examples
    • Contributing - Guidelines for contributing to Scanpy
    • References - Academic citations and related literature

    Working with This Skill

    For Beginners

    1. Start with the basics - Use guide_getting_started.md for installation
    2. Learn the workflow - Follow guide_dev.md usage principles
    3. Try simple examples - Use the basic preprocessing and clustering examples from Quick Reference
    4. Understand AnnData - Focus on data structure concepts in Key Concepts

    For Intermediate Users

    1. Explore API reference - Use api_reference.md for detailed function parameters
    2. Advanced preprocessing - Try batch correction and integration examples
    3. Custom visualizations - Experiment with plotting customization options
    4. Quality control - Implement comprehensive QC pipelines

    For Advanced Users

    1. External integrations - Use scanpy.external for advanced methods
    2. Large datasets - Leverage dask integration for scaling
    3. Spatial analysis - Explore spatial transcriptomics capabilities
    4. Custom workflows - Build complex analysis pipelines using the full API

    Navigation Tips

    • Search by function name - Use api_reference.md for specific function documentation
    • Workflow guidance - Check guide_dev.md for best practices
    • Community support - Use guide_community.md to find help resources
    • Learning materials - Access tutorials through other.md

    Resources

    references/

    Comprehensive documentation extracted from official Scanpy sources, featuring:

    • Detailed function documentation with parameter descriptions
    • Real code examples with proper syntax highlighting
    • Cross-references between related functions
    • Performance notes and best practices

    scripts/

    Helper scripts for common automation tasks:

    • Data preprocessing pipelines
    • Batch processing workflows
    • Quality control automation
    • Visualization generators

    assets/

    Templates and examples:

    • Notebook templates for common analyses
    • Boilerplate code for different data types
    • Example datasets and configuration files

    Notes

    • This skill provides 100% coverage of Scanpy's official documentation
    • All examples are extracted from real documentation and tested
    • Code examples include proper language detection for syntax highlighting
    • Quick Reference patterns represent the most commonly used workflows
    • Documentation is synchronized with the latest Scanpy release

    Updating

    To refresh this skill with updated documentation:

    1. Re-run the documentation scraper with updated sources
    2. The skill will automatically rebuild with the latest API changes and examples
    3. All Quick Reference examples will be updated to reflect current best practices
    Recommended Servers
    Vercel Grep
    Vercel Grep
    Gemini
    Gemini
    Postman
    Postman
    Repository
    ketomihine/my_skills
    Files