

    Data Analysis Skill

    Comprehensive data analysis toolkit using Polars - a blazingly fast DataFrame library. This skill provides instructions, reference documentation, and ready-to-use scripts for common data analysis tasks.

    Iteration Checkpoints

    Pause at each checkpoint and confirm with the user before moving on:

    Step             | What to Present                     | User Prompt
    -----------------|-------------------------------------|------------------------------------------
    Data Loading     | Shape, columns, sample rows         | "Is this the right data?"
    Data Exploration | Summary stats, data quality issues  | "Any columns to focus on?"
    Transformation   | Before/after comparison             | "Does this transformation look correct?"
    Analysis         | Key findings, charts                | "Should I dig deeper into anything?"
    Export           | Output preview                      | "Ready to save, or any changes?"
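
    For example, the Data Loading checkpoint might surface something like the following before asking the user to confirm (a minimal sketch; the exact presentation is up to the agent):

    import polars as pl

    df = pl.read_csv("data.csv")

    # Present shape, columns, and a few sample rows, then pause for confirmation
    print(f"Loaded {df.height} rows x {df.width} columns: {df.columns}")
    print(df.head(5))
    # -> "Is this the right data?"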

    Quick Start

    import polars as pl
    from polars import col
    
    # Load data
    df = pl.read_csv("data.csv")
    
    # Explore
    print(df.shape, df.schema)
    print(df.describe())
    
    # Transform and analyze
    result = (
        df.filter(col("value") > 0)
        .group_by("category")
        .agg(col("value").sum().alias("total"))
        .sort("total", descending=True)
    )
    
    # Export
    result.write_csv("output.csv")
    

    When to Use This Skill

    • Loading datasets (CSV, JSON, Parquet, Excel, databases)
    • Data cleaning, filtering, and transformation
    • Aggregations, grouping, and pivot tables
    • Statistical analysis and summary statistics
    • Time series analysis and resampling
    • Joining and merging multiple datasets
    • Creating visualizations and charts
    • Exporting results to various formats

    Skill Contents

    Reference Documentation

    Detailed API reference and patterns for specific operations:

    • reference/loading.md - Loading data from all supported formats
    • reference/transformations.md - Column operations, filtering, sorting, type casting
    • reference/aggregations.md - Group by, window functions, running totals
    • reference/time_series.md - Date parsing, resampling, lag features
    • reference/statistics.md - Correlations, distributions, hypothesis testing setup
    • reference/visualization.md - Creating charts with matplotlib/plotly

    Ready-to-Use Scripts

    Executable Python scripts for common tasks:

    • scripts/explore_data.py - Quick dataset exploration and profiling
    • scripts/summary_stats.py - Generate comprehensive statistics report

    Core Patterns

    Loading Data

    # CSV (most common)
    df = pl.read_csv("data.csv")
    
    # Lazy loading for large files
    df = pl.scan_csv("large.csv").filter(col("x") > 0).collect()
    
    # Parquet (recommended for large datasets)
    df = pl.read_parquet("data.parquet")
    
    # JSON
    df = pl.read_json("data.json")
    df = pl.read_ndjson("data.ndjson")  # Newline-delimited
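
    Excel files and databases (both listed under "When to Use This Skill") load through optional extras; a hedged sketch, since engine and driver requirements vary by Polars version:

    # Excel - needs an engine such as fastexcel or xlsx2csv installed
    df = pl.read_excel("data.xlsx")

    # Database - read_database accepts SQLAlchemy/ADBC/DBAPI2-style connections
    import sqlite3
    conn = sqlite3.connect("example.db")  # hypothetical SQLite file
    df = pl.read_database("SELECT * FROM t", conn)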
    

    Filtering and Selection

    # Select columns
    df.select("col1", "col2")
    df.select(col("name"), col("value") * 2)
    
    # Filter rows
    df.filter(col("age") > 25)
    df.filter((col("status") == "active") & (col("value") > 100))
    df.filter(col("name").str.contains("Smith"))
    

    Transformations

    # Add/modify columns
    df = df.with_columns(
        (col("price") * col("qty")).alias("total"),
        col("date_str").str.to_date("%Y-%m-%d").alias("date"),
    )
    
    # Conditional values
    df = df.with_columns(
        pl.when(col("score") >= 90).then(pl.lit("A"))
        .when(col("score") >= 80).then(pl.lit("B"))
        .otherwise(pl.lit("C"))
        .alias("grade")
    )
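
    reference/time_series.md covers resampling in depth; as a minimal sketch building on the date column parsed above, monthly resampling uses group_by_dynamic, which requires sorting by the time column first:

    monthly = (
        df.sort("date")  # group_by_dynamic requires a sorted time column
        .group_by_dynamic("date", every="1mo")
        .agg(col("total").sum().alias("monthly_total"))
    )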
    

    Aggregations

    # Group by
    df.group_by("category").agg(
        col("value").sum().alias("total"),
        col("value").mean().alias("avg"),
        pl.len().alias("count"),
    )
    
    # Window functions
    df.with_columns(
        col("value").sum().over("group").alias("group_total"),
        col("value").rank().over("group").alias("rank_in_group"),
    )
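
    Joining and merging (listed under "When to Use This Skill") use the same expression style; a minimal sketch with hypothetical orders and customers frames:

    # Left join on a shared key
    merged = orders.join(customers, on="customer_id", how="left")

    # Stack frames that share a schema
    combined = pl.concat([df_2023, df_2024])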
    

    Exporting

    df.write_csv("output.csv")
    df.write_parquet("output.parquet")
    df.write_json("output.json")  # row-oriented by default in Polars 1.0+; older versions used row_oriented=True
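
    Charts (covered in reference/visualization.md) typically go through matplotlib or plotly; a minimal matplotlib sketch, reusing the aggregated result from the Quick Start:

    import matplotlib.pyplot as plt

    # Bar chart of category totals; .to_list() hands plain Python lists to matplotlib
    plt.bar(result["category"].to_list(), result["total"].to_list())
    plt.ylabel("total")
    plt.tight_layout()
    plt.show()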
    

    Best Practices

    1. Use lazy evaluation for large datasets: pl.scan_csv() + .collect()
    2. Filter early to reduce data volume before expensive operations
    3. Select only needed columns to minimize memory usage
    4. Prefer Parquet for storage - faster I/O, better compression
    5. Use .explain() to understand and optimize query plans
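
    A sketch tying practices 1, 2, 3, and 5 together (assuming a large CSV with category and value columns):

    import polars as pl
    from polars import col

    lazy = (
        pl.scan_csv("large.csv")       # 1. lazy scan: nothing is read yet
        .filter(col("value") > 0)      # 2. filter early
        .select("category", "value")   # 3. only the needed columns
    )
    print(lazy.explain())              # 5. inspect the optimized query plan
    result = lazy.collect()            # execute the plan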