Smithery Logo
MCPsSkillsDocsPricing
Login
Smithery Logo

Accelerating the Agent Economy

Resources

DocumentationPrivacy PolicySystem Status

Company

PricingAboutBlog

Connect

© 2026 Smithery. All rights reserved.

    Starlitnightly

    data-stats-analysis

    Starlitnightly/data-stats-analysis
    Data & Analytics
    834
    2 installs

    About

    SKILL.md

    Install

    Install via Skills CLI

    or add to your agent
    • Claude Code
      Claude Code
    • Codex
      Codex
    • OpenClaw
      OpenClaw
    • Cursor
      Cursor
    • Amp
      Amp
    • GitHub Copilot
      GitHub Copilot
    • Gemini CLI
      Gemini CLI
    • Kilo Code
      Kilo Code
    • Junie
      Junie
    • Replit
      Replit
    • Windsurf
      Windsurf
    • Cline
      Cline
    • Continue
      Continue
    • OpenCode
      OpenCode
    • OpenHands
      OpenHands
    • Roo Code
      Roo Code
    • Augment
      Augment
    • Goose
      Goose
    • Trae
      Trae
    • Zencoder
      Zencoder
    • Antigravity
      Antigravity
    ├─
    ├─
    └─

    About

    Perform statistical tests, hypothesis testing, correlation analysis, and multiple testing corrections using scipy and statsmodels. Works with ANY LLM provider (GPT, Gemini, Claude, etc.).

    SKILL.md

    Statistical Analysis (Universal)

    Overview

    This skill enables you to perform rigorous statistical analyses including t-tests, ANOVA, correlation analysis, hypothesis testing, and multiple testing corrections. Unlike cloud-hosted solutions, this skill uses standard Python statistical libraries (scipy, statsmodels, numpy) and executes locally in your environment, making it compatible with ALL LLM providers including GPT, Gemini, Claude, DeepSeek, and Qwen.

    When to Use This Skill

    • Compare means between groups (t-tests, ANOVA)
    • Test for correlations between variables
    • Perform hypothesis testing with p-value calculation
    • Apply multiple testing corrections (FDR, Bonferroni)
    • Calculate statistical summaries and confidence intervals
    • Test for normality and distribution fitting
    • Perform non-parametric tests (Mann-Whitney, Kruskal-Wallis)

    How to Use

    Step 1: Import Required Libraries

    import numpy as np
    import pandas as pd
    from scipy import stats
    from scipy.stats import ttest_ind, mannwhitneyu, pearsonr, spearmanr
    from scipy.stats import f_oneway, kruskal, chi2_contingency
    from statsmodels.stats.multitest import multipletests
    from statsmodels.stats.proportion import proportions_ztest
    import warnings
    warnings.filterwarnings('ignore')
    

    Step 2: Two-Sample t-Test

    # Compare means between two groups
    # group1, group2: arrays of numeric values
    
    # Perform independent t-test
    t_statistic, p_value = ttest_ind(group1, group2)
    
    print(f"t-statistic: {t_statistic:.4f}")
    print(f"p-value: {p_value:.4e}")
    
    if p_value < 0.05:
        print("✅ Significant difference between groups (p < 0.05)")
    else:
        print("❌ No significant difference (p >= 0.05)")
    
    # With equal variance assumption check
    # Levene's test for equal variances
    _, levene_p = stats.levene(group1, group2)
    if levene_p < 0.05:
        # Use Welch's t-test (unequal variances)
        t_stat, p_val = ttest_ind(group1, group2, equal_var=False)
        print(f"Welch's t-test p-value: {p_val:.4e}")
    else:
        print("Equal variances assumed")
    

    Step 3: One-Way ANOVA

    # Compare means across multiple groups
    # groups: list of arrays, e.g., [group1, group2, group3]
    
    # Perform one-way ANOVA
    f_statistic, p_value = f_oneway(*groups)
    
    print(f"F-statistic: {f_statistic:.4f}")
    print(f"p-value: {p_value:.4e}")
    
    if p_value < 0.05:
        print("✅ Significant difference between groups (p < 0.05)")
        print("Note: Use post-hoc tests to identify which groups differ")
    else:
        print("❌ No significant difference between groups")
    
    # Post-hoc pairwise t-tests with Bonferroni correction
    from itertools import combinations
    
    group_names = ['Group A', 'Group B', 'Group C']
    pairwise_results = []
    
    for (name1, data1), (name2, data2) in combinations(zip(group_names, groups), 2):
        _, p = ttest_ind(data1, data2)
        pairwise_results.append({
            'comparison': f'{name1} vs {name2}',
            'p_value': p
        })
    
    # Apply Bonferroni correction
    pairwise_df = pd.DataFrame(pairwise_results)
    n_tests = len(pairwise_df)
    pairwise_df['p_adjusted'] = pairwise_df['p_value'] * n_tests
    pairwise_df['p_adjusted'] = pairwise_df['p_adjusted'].clip(upper=1.0)
    
    print("\nPairwise Comparisons (Bonferroni-corrected):")
    print(pairwise_df)
    

    Step 4: Correlation Analysis

    # Pearson correlation (linear relationships)
    r_pearson, p_pearson = pearsonr(variable1, variable2)
    
    print(f"Pearson correlation: r = {r_pearson:.4f}, p = {p_pearson:.4e}")
    
    # Spearman correlation (monotonic relationships, robust to outliers)
    r_spearman, p_spearman = spearmanr(variable1, variable2)
    
    print(f"Spearman correlation: ρ = {r_spearman:.4f}, p = {p_spearman:.4e}")
    
    # Interpretation
    if abs(r_pearson) < 0.3:
        strength = "weak"
    elif abs(r_pearson) < 0.7:
        strength = "moderate"
    else:
        strength = "strong"
    
    direction = "positive" if r_pearson > 0 else "negative"
    print(f"Interpretation: {strength} {direction} correlation")
    
    if p_pearson < 0.05:
        print("✅ Statistically significant (p < 0.05)")
    else:
        print("❌ Not statistically significant")
    

    Step 5: Multiple Testing Correction

    # Scenario: Testing 1000 genes for differential expression
    # p_values: array of p-values from individual tests
    
    # Method 1: Benjamini-Hochberg FDR correction (recommended)
    reject_fdr, p_adjusted_fdr, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')
    
    # Method 2: Bonferroni correction (more conservative)
    reject_bonf, p_adjusted_bonf, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')
    
    # Create results DataFrame
    results_df = pd.DataFrame({
        'gene': gene_names,
        'p_value': p_values,
        'q_value_fdr': p_adjusted_fdr,
        'p_adjusted_bonferroni': p_adjusted_bonf,
        'significant_fdr': reject_fdr,
        'significant_bonf': reject_bonf
    })
    
    # Summary
    print(f"Original significant (p < 0.05): {(p_values < 0.05).sum()}")
    print(f"Significant after FDR correction: {reject_fdr.sum()}")
    print(f"Significant after Bonferroni correction: {reject_bonf.sum()}")
    
    # Save results
    results_df.to_csv('statistical_results.csv', index=False)
    print("✅ Results saved to: statistical_results.csv")
    

    Step 6: Non-Parametric Tests

    # Use when data is not normally distributed
    
    # Mann-Whitney U test (alternative to t-test)
    u_statistic, p_value_mw = mannwhitneyu(group1, group2, alternative='two-sided')
    
    print(f"Mann-Whitney U test:")
    print(f"U-statistic: {u_statistic:.4f}")
    print(f"p-value: {p_value_mw:.4e}")
    
    # Kruskal-Wallis H test (alternative to ANOVA)
    h_statistic, p_value_kw = kruskal(*groups)
    
    print(f"\nKruskal-Wallis H test:")
    print(f"H-statistic: {h_statistic:.4f}")
    print(f"p-value: {p_value_kw:.4e}")
    

    Advanced Features

    Normality Testing

    from scipy.stats import shapiro, normaltest, kstest
    
    # Test if data follows normal distribution
    
    # Shapiro-Wilk test (best for n < 5000)
    stat_sw, p_sw = shapiro(data)
    print(f"Shapiro-Wilk test: W={stat_sw:.4f}, p={p_sw:.4e}")
    
    # D'Agostino-Pearson test
    stat_dp, p_dp = normaltest(data)
    print(f"D'Agostino-Pearson test: stat={stat_dp:.4f}, p={p_dp:.4e}")
    
    # Interpretation
    if p_sw < 0.05:
        print("❌ Data does NOT follow normal distribution (p < 0.05)")
        print("→ Recommendation: Use non-parametric tests (Mann-Whitney, Kruskal-Wallis)")
    else:
        print("✅ Data appears normally distributed (p >= 0.05)")
        print("→ OK to use parametric tests (t-test, ANOVA)")
    

    Chi-Square Test for Contingency Tables

    # Test independence between categorical variables
    # contingency_table: 2D array (rows=categories1, columns=categories2)
    
    # Example: Cell type distribution across conditions
    contingency_table = np.array([
        [50, 30, 20],  # Condition A: T cells, B cells, NK cells
        [40, 45, 15],  # Condition B
        [35, 25, 40]   # Condition C
    ])
    
    chi2, p_value, dof, expected = chi2_contingency(contingency_table)
    
    print(f"Chi-square statistic: {chi2:.4f}")
    print(f"p-value: {p_value:.4e}")
    print(f"Degrees of freedom: {dof}")
    print(f"\nExpected frequencies:\n{expected}")
    
    if p_value < 0.05:
        print("✅ Significant association between variables (p < 0.05)")
    else:
        print("❌ No significant association")
    

    Confidence Intervals

    from scipy.stats import t as t_dist
    
    def calculate_confidence_interval(data, confidence=0.95):
        """Calculate confidence interval for mean"""
        n = len(data)
        mean = np.mean(data)
        std_err = stats.sem(data)  # Standard error of mean
    
        # t-distribution critical value
        t_crit = t_dist.ppf((1 + confidence) / 2, df=n-1)
    
        margin_error = t_crit * std_err
        ci_lower = mean - margin_error
        ci_upper = mean + margin_error
    
        return mean, ci_lower, ci_upper
    
    # Usage
    mean, ci_low, ci_high = calculate_confidence_interval(data, confidence=0.95)
    
    print(f"Mean: {mean:.4f}")
    print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
    

    Effect Size Calculation

    def cohens_d(group1, group2):
        """Calculate Cohen's d effect size"""
        n1, n2 = len(group1), len(group2)
        var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    
        # Pooled standard deviation
        pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))
    
        # Cohen's d
        d = (np.mean(group1) - np.mean(group2)) / pooled_std
    
        return d
    
    # Usage
    effect_size = cohens_d(group1, group2)
    print(f"Cohen's d: {effect_size:.4f}")
    
    # Interpretation
    if abs(effect_size) < 0.2:
        print("Effect size: negligible")
    elif abs(effect_size) < 0.5:
        print("Effect size: small")
    elif abs(effect_size) < 0.8:
        print("Effect size: medium")
    else:
        print("Effect size: large")
    

    Common Use Cases

    Differential Gene Expression Statistical Testing

    # Compare gene expression between two conditions
    # gene_expression_df: rows=genes, columns=samples
    # condition_labels: array indicating which condition each sample belongs to
    
    results = []
    
    for gene in gene_expression_df.index:
        # Get expression values for each condition
        cond1_expr = gene_expression_df.loc[gene, condition_labels == 'Condition1']
        cond2_expr = gene_expression_df.loc[gene, condition_labels == 'Condition2']
    
        # t-test
        t_stat, p_val = ttest_ind(cond1_expr, cond2_expr)
    
        # Log2 fold change
        log2fc = np.log2(cond2_expr.mean() / cond1_expr.mean())
    
        results.append({
            'gene': gene,
            'log2FC': log2fc,
            'p_value': p_val,
            'mean_cond1': cond1_expr.mean(),
            'mean_cond2': cond2_expr.mean()
        })
    
    deg_results = pd.DataFrame(results)
    
    # Apply FDR correction
    _, deg_results['q_value'], _, _ = multipletests(
        deg_results['p_value'],
        alpha=0.05,
        method='fdr_bh'
    )
    
    # Filter significant genes
    significant_genes = deg_results[
        (deg_results['q_value'] < 0.05) &
        (abs(deg_results['log2FC']) > 1)
    ]
    
    print(f"✅ Identified {len(significant_genes)} differentially expressed genes")
    print(f"   - Upregulated: {(significant_genes['log2FC'] > 1).sum()}")
    print(f"   - Downregulated: {(significant_genes['log2FC'] < -1).sum()}")
    
    # Save
    significant_genes.to_csv('deg_results.csv', index=False)
    

    Cluster Enrichment Analysis

    # Test if a cell type is enriched in a specific cluster
    # total_cells: total number of cells
    # cluster_cells: number of cells in cluster
    # celltype_total: total cells of this type
    # celltype_in_cluster: cells of this type in cluster
    
    from scipy.stats import fisher_exact
    
    # Create contingency table
    contingency = [
        [celltype_in_cluster, cluster_cells - celltype_in_cluster],  # In cluster
        [celltype_total - celltype_in_cluster, total_cells - cluster_cells - (celltype_total - celltype_in_cluster)]  # Not in cluster
    ]
    
    odds_ratio, p_value = fisher_exact(contingency, alternative='greater')
    
    print(f"Odds ratio: {odds_ratio:.4f}")
    print(f"p-value: {p_value:.4e}")
    
    if p_value < 0.05 and odds_ratio > 1:
        print(f"✅ Cell type is significantly ENRICHED in cluster (p < 0.05)")
    elif p_value < 0.05 and odds_ratio < 1:
        print(f"⚠️ Cell type is significantly DEPLETED in cluster (p < 0.05)")
    else:
        print("❌ No significant enrichment/depletion")
    

    Batch Effect Detection

    # Test if there's a batch effect using ANOVA
    # gene_expression: DataFrame with genes as rows, samples as columns
    # batch_labels: array indicating batch for each sample
    
    batch_effect_results = []
    
    for gene in gene_expression.index:
        # Get expression values for each batch
        batches = [
            gene_expression.loc[gene, batch_labels == batch]
            for batch in np.unique(batch_labels)
        ]
    
        # ANOVA test
        f_stat, p_val = f_oneway(*batches)
    
        batch_effect_results.append({
            'gene': gene,
            'f_statistic': f_stat,
            'p_value': p_val
        })
    
    batch_df = pd.DataFrame(batch_effect_results)
    
    # Apply FDR correction
    _, batch_df['q_value'], _, _ = multipletests(batch_df['p_value'], alpha=0.05, method='fdr_bh')
    
    # Count genes with batch effects
    genes_with_batch_effect = (batch_df['q_value'] < 0.05).sum()
    
    print(f"Genes with significant batch effect: {genes_with_batch_effect} ({genes_with_batch_effect/len(batch_df)*100:.1f}%)")
    
    if genes_with_batch_effect > len(batch_df) * 0.1:
        print("⚠️ WARNING: Strong batch effect detected (>10% genes affected)")
        print("→ Recommendation: Apply batch correction (ComBat, Harmony, etc.)")
    else:
        print("✅ Minimal batch effect")
    

    Best Practices

    1. Check Assumptions: Always test normality before using parametric tests (t-test, ANOVA)
    2. Multiple Testing: Apply FDR or Bonferroni correction when testing many hypotheses
    3. Effect Size: Report effect sizes (Cohen's d) alongside p-values
    4. Sample Size: Ensure adequate sample size for statistical power
    5. Outliers: Check for and handle outliers appropriately
    6. Non-Parametric Alternatives: Use when assumptions are violated (Mann-Whitney instead of t-test)
    7. Report Details: Always report test used, test statistic, p-value, and correction method
    8. Visualization: Combine statistical tests with visualizations (box plots, violin plots)

    Troubleshooting

    Issue: "Warning: p-value is very small"

    Solution: This is normal for highly significant results. Report as p < 0.001 or use scientific notation

    if p_value < 0.001:
        print(f"p < 0.001")
    else:
        print(f"p = {p_value:.4f}")
    

    Issue: "Division by zero in effect size calculation"

    Solution: Check for zero variance (all values identical)

    if np.std(group1) == 0 or np.std(group2) == 0:
        print("Cannot calculate effect size: zero variance in one or both groups")
    else:
        d = cohens_d(group1, group2)
    

    Issue: "Test fails with NaN values"

    Solution: Remove or impute NaN values before testing

    # Remove NaN
    group1_clean = group1[~np.isnan(group1)]
    group2_clean = group2[~np.isnan(group2)]
    
    # Or filter in DataFrame
    df_clean = df.dropna(subset=['column_name'])
    

    Issue: "Insufficient sample size warning"

    Solution: Minimum sample sizes for reliable tests:

    • t-test: n ≥ 30 per group (or ≥ 5 if normally distributed)
    • ANOVA: n ≥ 20 per group
    • Correlation: n ≥ 30 total
    if len(group1) < 30 or len(group2) < 30:
        print("⚠️ Warning: Small sample size. Results may not be reliable.")
        print("Consider using non-parametric tests or collecting more data.")
    

    Technical Notes

    • Libraries: Uses scipy.stats and statsmodels (widely supported, stable)
    • Execution: Runs locally in the agent's sandbox
    • Compatibility: Works with ALL LLM providers (GPT, Gemini, Claude, DeepSeek, Qwen, etc.)
    • Performance: Most tests complete in milliseconds; large-scale testing (>10K genes) takes 1-5 seconds
    • Precision: Uses double-precision floating point (numpy default)
    • Corrections: FDR (Benjamini-Hochberg) recommended for genomics; Bonferroni for small numbers of tests

    References

    • scipy.stats documentation: https://docs.scipy.org/doc/scipy/reference/stats.html
    • statsmodels documentation: https://www.statsmodels.org/stable/index.html
    • Multiple testing: https://www.statsmodels.org/stable/generated/statsmodels.stats.multitest.multipletests.html
    • Statistical testing guide: https://docs.scipy.org/doc/scipy/tutorial/stats.html
    Recommended Servers
    Jina AI
    Jina AI
    ThinAir Data
    ThinAir Data
    InfraNodus Knowledge Graphs & Text Analysis
    InfraNodus Knowledge Graphs & Text Analysis
    Repository
    starlitnightly/omicverse
    Files