    davila7/gget
    Data & Analytics


    About

    CLI/Python toolkit for rapid bioinformatics queries. Preferred for quick BLAST searches.

    gget

    Overview

    gget is a command-line bioinformatics tool and Python package providing unified access to 20+ genomic databases and analysis methods. Query gene information, sequence analysis, protein structures, expression data, and disease associations through a consistent interface. All gget modules work both as command-line tools and as Python functions.

    Important: The databases queried by gget are continuously updated, which sometimes changes their structure. gget modules are tested automatically on a biweekly basis and updated to match new database structures when necessary.

    Installation

    Install gget in a clean virtual environment to avoid conflicts:

    # Using uv (recommended)
    uv pip install gget
    
    # Or using pip
    pip install --upgrade gget
    
    # In Python/Jupyter
    import gget
    

    Quick Start

    Basic usage pattern for all modules:

    # Command-line
    gget <module> [arguments] [options]
    
    # Python
    gget.module(arguments, options)
    

    Most modules return:

    • Command-line: JSON (default) or CSV with -csv flag
    • Python: DataFrame or dictionary

    Common flags across modules:

    • -o/--out: Save results to file
    • -q/--quiet: Suppress progress information
    • -csv: Return CSV format (command-line only)

    Module Categories

    1. Reference & Gene Information

    gget ref - Reference Genome Downloads

    Retrieve download links and metadata for Ensembl reference genomes.

    Parameters:

    • species: Genus_species format (e.g., 'homo_sapiens', 'mus_musculus'). Shortcuts: 'human', 'mouse'
    • -w/--which: Specify return types (gtf, cdna, dna, cds, cdrna, pep). Default: all
    • -r/--release: Ensembl release number (default: latest)
    • -l/--list_species: List available vertebrate species
    • -liv/--list_iv_species: List available invertebrate species
    • -ftp: Return only FTP links
    • -d/--download: Download files (requires curl)

    Examples:

    # List available species
    gget ref --list_species
    
    # Get all reference files for human
    gget ref homo_sapiens
    
    # Download only GTF annotation for mouse
    gget ref -w gtf -d mouse
    
    # Python
    gget.ref("homo_sapiens")
    gget.ref("mus_musculus", which="gtf", download=True)
    

    gget search - Gene Search

    Locate genes by name or description across species.

    Parameters:

    • searchwords: One or more search terms (case-insensitive)
    • -s/--species: Target species (e.g., 'homo_sapiens', 'mouse')
    • -r/--release: Ensembl release number
    • -t/--id_type: Return 'gene' (default) or 'transcript'
    • -ao/--andor: 'or' (default) finds ANY searchword; 'and' requires ALL
    • -l/--limit: Maximum results to return

    Returns: ensembl_id, gene_name, ensembl_description, ext_ref_description, biotype, URL

    Examples:

    # Search for GABA-related genes in human
    gget search -s human gaba gamma-aminobutyric
    
    # Find specific gene, require all terms
    gget search -s mouse -ao and pax7 transcription
    
    # Python
    gget.search(["gaba", "gamma-aminobutyric"], species="homo_sapiens")
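
The -ao/--andor switch decides how several search words combine. The toy sketch below models only that matching rule on a few made-up descriptions (the gene names and descriptions are hypothetical). gget itself runs the search against Ensembl, so this is an illustration of the semantics, not its implementation.

```python
# Toy model of how -ao/--andor combines several search words.
# Illustrates the matching rule only; gget queries Ensembl itself.
from typing import List


def matches(description: str, searchwords: List[str], andor: str = "or") -> bool:
    """Case-insensitive test of one description against the search words."""
    text = description.lower()
    hits = [word.lower() in text for word in searchwords]
    if andor == "or":
        return any(hits)  # ANY search word is enough
    if andor == "and":
        return all(hits)  # ALL search words must appear
    raise ValueError("andor must be 'or' or 'and'")


# Hypothetical gene descriptions, for illustration only
descriptions = {
    "GENE_A": "GABA type A receptor subunit alpha1",
    "GENE_B": "GABA transporter",
    "GENE_C": "paired box transcription factor",
}

words = ["gaba", "receptor"]
print([g for g, d in descriptions.items() if matches(d, words, "or")])   # → ['GENE_A', 'GENE_B']
print([g for g, d in descriptions.items() if matches(d, words, "and")])  # → ['GENE_A']
```

With "or", any description mentioning either word is returned; "and" keeps only descriptions containing every word, which is why it narrows results sharply.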
    

    gget info - Gene/Transcript Information

    Retrieve comprehensive gene and transcript metadata from Ensembl, UniProt, and NCBI.

    Parameters:

    • ens_ids: One or more Ensembl IDs (also supports WormBase, Flybase IDs). Limit: ~1000 IDs
    • -n/--ncbi: Disable NCBI data retrieval
    • -u/--uniprot: Disable UniProt data retrieval
    • -pdb: Include PDB identifiers (increases runtime)

    Returns: UniProt ID, NCBI gene ID, primary gene name, synonyms, protein names, descriptions, biotype, canonical transcript

    Examples:

    # Get info for multiple genes
    gget info ENSG00000034713 ENSG00000104853 ENSG00000170296
    
    # Include PDB IDs
    gget info ENSG00000034713 -pdb
    
    # Python
    gget.info(["ENSG00000034713", "ENSG00000104853"], pdb=True)
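
Because gget info is documented above as handling roughly 1000 IDs per call, longer ID lists can be split into batches. A minimal sketch under that assumption: `chunked` and `query_in_batches` are hypothetical helpers, and `query` would typically be `gget.info` (a stub, `len`, is used here so the example runs offline).

```python
# Split a long list of IDs into batches of at most `batch_size` and run a
# query function on each batch. The 1000 default follows the ~1000-ID limit
# noted for gget info; `query` is any callable that takes a list of IDs.
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(ids: Sequence[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield list(ids[start:start + batch_size])


def query_in_batches(ids: Sequence[str], query: Callable[[List[str]], T],
                     batch_size: int = 1000) -> List[T]:
    """Call `query` once per batch and collect the per-batch results."""
    return [query(batch) for batch in chunked(ids, batch_size)]


# Usage with a stub instead of gget.info (no network needed)
fake_ids = [f"ENSG{i:011d}" for i in range(2500)]
sizes = query_in_batches(fake_ids, query=len)
print(sizes)  # → [1000, 1000, 500]
```

With gget.info as the query, each batch returns a DataFrame, and the list of results could then be combined with pandas.concat.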
    

    gget seq - Sequence Retrieval

    Fetch nucleotide or amino acid sequences for genes and transcripts.

    Parameters:

    • ens_ids: One or more Ensembl identifiers
    • -t/--translate: Fetch amino acid sequences instead of nucleotide
    • -iso/--isoforms: Return all transcript variants (gene IDs only)

    Returns: FASTA format sequences

    Examples:

    # Get nucleotide sequences
    gget seq ENSG00000034713 ENSG00000104853
    
    # Get all protein isoforms
    gget seq -t -iso ENSG00000034713
    
    # Python
    gget.seq(["ENSG00000034713"], translate=True, isoforms=True)
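
gget seq produces FASTA-formatted output. To work with the sequences in Python, a small parser can turn FASTA lines into a dictionary. This sketch assumes the output is either one FASTA string or a list of lines such as ">header", "SEQUENCE"; it is a generic FASTA reader, not part of gget.

```python
# Parse FASTA-formatted records into {header: sequence}.
# Assumption: input is FASTA text, either one string or a list of lines;
# both shapes are joined and re-split. Multi-line sequences are concatenated,
# and headers are assumed to be unique.
from typing import Dict, Iterable, Optional, Union


def parse_fasta(fasta: Union[str, Iterable[str]]) -> Dict[str, str]:
    text = fasta if isinstance(fasta, str) else "\n".join(fasta)
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[header] += line
    return records


example = [">seq1 example", "MKWMFKEDHS", "LEHRCVESAK", ">seq2", "ATCGATCG"]
print(parse_fasta(example))
# → {'seq1 example': 'MKWMFKEDHSLEHRCVESAK', 'seq2': 'ATCGATCG'}
```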
    

    2. Sequence Analysis & Alignment

    gget blast - BLAST Searches

    BLAST nucleotide or amino acid sequences against standard databases.

    Parameters:

    • sequence: Sequence string or path to FASTA/.txt file
    • -p/--program: blastn, blastp, blastx, tblastn, tblastx (auto-detected)
    • -db/--database:
      • Nucleotide: nt, refseq_rna, pdbnt
      • Protein: nr, swissprot, pdbaa, refseq_protein
    • -l/--limit: Max hits (default: 50)
    • -e/--expect: E-value cutoff (default: 10.0)
    • -lcf/--low_comp_filt: Enable low complexity filtering
    • -mbo/--megablast_off: Disable MegaBLAST (blastn only)

    Examples:

    # BLAST protein sequence
    gget blast MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR
    
    # BLAST from file with specific database
    gget blast sequence.fasta -db swissprot -l 10
    
    # Python
    gget.blast("MKWMFK...", database="swissprot", limit=10)
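
The -p/--program choice is auto-detected when omitted. If you prefer to set it explicitly, a rough composition check can tell nucleotide input from protein input. This is a hypothetical heuristic for illustration (the 90% threshold is an arbitrary choice), not the detection logic gget uses; translated searches such as blastx or tblastn still have to be chosen by hand.

```python
# Rough guess at whether a sequence is nucleotide or protein, e.g. to pick
# -p blastn vs -p blastp explicitly. Hypothetical heuristic, NOT gget's logic.
NUCLEOTIDE_CHARS = set("ACGTUN")


def guess_seq_type(sequence: str, threshold: float = 0.9) -> str:
    """Return 'nucleotide' if most letters are A/C/G/T/U/N, else 'protein'."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    fraction = sum(c in NUCLEOTIDE_CHARS for c in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"


def suggest_blast_program(sequence: str) -> str:
    """Map the guess onto one of the BLAST programs listed above."""
    return "blastn" if guess_seq_type(sequence) == "nucleotide" else "blastp"


print(suggest_blast_program("ATCGATCGATCGATCG"))                     # → blastn
print(suggest_blast_program("MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEK"))  # → blastp
```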
    

    gget blat - BLAT Searches

    Locate genomic positions of sequences using UCSC BLAT.

    Parameters:

    • sequence: Sequence string or path to FASTA/.txt file
    • -st/--seqtype: 'DNA', 'protein', 'translated%20RNA', 'translated%20DNA' (auto-detected)
    • -a/--assembly: Target assembly (default: 'human'/hg38; options: 'mouse'/mm39, 'zebrafinch'/taeGut2, etc.)

    Returns: genome, query size, alignment positions, matches, mismatches, alignment percentage

    Examples:

    # Find genomic location in human
    gget blat ATCGATCGATCGATCG
    
    # Search in different assembly
    gget blat -a mm39 ATCGATCGATCGATCG
    
    # Python
    gget.blat("ATCGATCGATCGATCG", assembly="mouse")
    

    gget muscle - Multiple Sequence Alignment

    Align multiple nucleotide or amino acid sequences using Muscle5.

    Parameters:

    • fasta: Sequences or path to FASTA/.txt file
    • -s5/--super5: Use Super5 algorithm for faster processing (large datasets)

    Returns: Aligned sequences in ClustalW format or aligned FASTA (.afa)

    Examples:

    # Align sequences from file
    gget muscle sequences.fasta -o aligned.afa
    
    # Use Super5 for large dataset
    gget muscle large_dataset.fasta -s5
    
    # Python
    gget.muscle("sequences.fasta", out="aligned.afa")
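
Once gget muscle has written an aligned FASTA (.afa) file, a common next step is comparing rows pairwise. The sketch below computes percent identity between two aligned rows. The convention used (identical columns divided by columns where at least one row has a residue) is one of several in use, so numbers may differ from other tools.

```python
# Percent identity between two rows of a multiple sequence alignment.
# Gap-gap columns are ignored; a residue opposite a gap counts as a mismatch.
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column is empty in both rows
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("MKW-FKED", "MKWMFRED"))  # → 75.0
```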
    

    gget diamond - Local Sequence Alignment

    Perform fast local protein or translated DNA alignment using DIAMOND.

    Parameters:

    • query: Sequences (string/list) or FASTA file path
    • --reference: Reference sequences (string/list) or FASTA file path (required)
    • --sensitivity: fast, mid-sensitive, sensitive, more-sensitive, very-sensitive (default), ultra-sensitive
    • --threads: CPU threads (default: 1)
    • --diamond_db: Save database for reuse
    • --translated: Enable nucleotide-to-amino acid alignment

    Returns: Identity percentage, sequence lengths, match positions, gap openings, E-values, bit scores

    Examples:

    # Align against reference
    gget diamond GGETISAWESQME -ref reference.fasta --threads 4
    
    # Save database for reuse
    gget diamond query.fasta -ref ref.fasta --diamond_db my_db.dmnd
    
    # Python
    gget.diamond("GGETISAWESQME", reference="reference.fasta", threads=4)
    

    3. Structural & Protein Analysis

    gget pdb - Protein Structures

    Query RCSB Protein Data Bank for structure and metadata.

    Parameters:

    • pdb_id: PDB identifier (e.g., '7S7U')
    • -r/--resource: Data type (pdb, entry, pubmed, assembly, entity types)
    • -i/--identifier: Assembly, entity, or chain ID

    Returns: PDB format (structures) or JSON (metadata)

    Examples:

    # Download PDB structure
    gget pdb 7S7U -o 7S7U.pdb
    
    # Get metadata
    gget pdb 7S7U -r entry
    
    # Python
    gget.pdb("7S7U", save=True)
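
A structure downloaded with gget pdb is plain PDB-format text, whose ATOM/HETATM records use fixed columns. This sketch counts distinct residues per chain using those columns (residue name 18-20, chain ID 22, residue number 23-26, insertion code 27). The `atom_line` helper is hypothetical and only builds aligned demo records so the example runs without a download.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def residues_per_chain(pdb_text: str, include_hetatm: bool = False) -> Dict[str, int]:
    """Count distinct residues per chain ID in PDB-format text."""
    wanted = {"ATOM", "HETATM"} if include_hetatm else {"ATOM"}
    seen: Dict[str, Set[Tuple[str, str, str]]] = defaultdict(set)
    for line in pdb_text.splitlines():
        if line[:6].strip() not in wanted:
            continue
        line = line.ljust(80)  # guard against truncated records
        chain = line[21]                # column 22
        res_seq = line[22:26].strip()   # columns 23-26
        insertion = line[26]            # column 27
        res_name = line[17:20].strip()  # columns 18-20
        seen[chain].add((res_seq, insertion, res_name))
    return {chain: len(residues) for chain, residues in sorted(seen.items())}


def atom_line(serial: int, name: str, res_name: str, chain: str, res_seq: int,
              record: str = "ATOM") -> str:
    """Hypothetical helper: build a minimal, column-aligned demo record."""
    return (f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}"
            f"    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


sample = "\n".join([
    atom_line(1, "N", "MET", "A", 1),
    atom_line(2, "CA", "MET", "A", 1),
    atom_line(3, "N", "LYS", "A", 2),
    atom_line(4, "N", "GLY", "B", 1),
    atom_line(5, "O", "HOH", "B", 101, record="HETATM"),
])
print(residues_per_chain(sample))                       # → {'A': 2, 'B': 1}
print(residues_per_chain(sample, include_hetatm=True))  # → {'A': 2, 'B': 2}
```

For a real file, read it with open("7S7U.pdb").read() and pass the text in.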
    

    gget alphafold - Protein Structure Prediction

    Predict 3D protein structures using a simplified version of AlphaFold2.

    Setup Required:

    # Install OpenMM first
    uv pip install openmm
    
    # Then setup AlphaFold
    gget setup alphafold
    

    Parameters:

    • sequence: Amino acid sequence (string), multiple sequences (list), or FASTA file. Multiple sequences trigger multimer modeling
    • -mr/--multimer_recycles: Recycling iterations (default: 3; recommend 20 for accuracy)
    • -mfm/--multimer_for_monomer: Apply multimer model to single proteins
    • -r/--relax: AMBER relaxation for top-ranked model
    • plot: Python-only; generate interactive 3D visualization (default: True)
    • show_sidechains: Python-only; include side chains (default: True)

    Returns: PDB structure file, JSON alignment error data, optional 3D visualization

    Examples:

    # Predict single protein structure
    gget alphafold MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR
    
    # Predict a multimer from a FASTA file containing several sequences, with 20 recycles and AMBER relaxation
    gget alphafold sequence1.fasta -mr 20 -r
    
    # Python with visualization
    gget.alphafold("MKWMFK...", plot=True, show_sidechains=True)
    
    # Multimer prediction
    gget.alphafold(["sequence1", "sequence2"], multimer_recycles=20)
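
AlphaFold-derived models conventionally store the per-residue confidence score pLDDT (0-100) in the B-factor column (PDB columns 61-66). Assuming the PDB file written by gget alphafold follows that convention, the sketch below averages pLDDT over C-alpha atoms. `ca_line` is a hypothetical helper that builds demo records. As a rule of thumb from the AlphaFold documentation, values above 90 indicate very high confidence and values below 50 very low confidence.

```python
from typing import List


def plddt_per_residue(pdb_text: str) -> List[float]:
    """Read the B-factor column (pLDDT in AlphaFold-style models) of C-alpha atoms."""
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))  # columns 61-66
    return values


def mean_plddt(pdb_text: str) -> float:
    values = plddt_per_residue(pdb_text)
    if not values:
        raise ValueError("no C-alpha ATOM records found")
    return sum(values) / len(values)


def ca_line(serial: int, res_seq: int, plddt: float) -> str:
    """Hypothetical helper: build a column-aligned C-alpha record for the demo."""
    return (f"ATOM  {serial:>5}  CA  ALA A{res_seq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


model = "\n".join(ca_line(i, i, p) for i, p in enumerate([92.5, 80.0, 45.5], start=1))
print(plddt_per_residue(model))     # → [92.5, 80.0, 45.5]
print(round(mean_plddt(model), 2))  # → 72.67
```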
    

    gget elm - Eukaryotic Linear Motifs

    Predict Eukaryotic Linear Motifs in protein sequences.

    Setup Required:

    gget setup elm
    

    Parameters:

    • sequence: Amino acid sequence or UniProt Acc
    • -u/--uniprot: Indicates sequence is UniProt Acc
    • -e/--expand: Include protein names, organisms, references
    • -s/--sensitivity: DIAMOND alignment sensitivity (default: "very-sensitive")
    • -t/--threads: Number of threads (default: 1)

    Returns: Two outputs:

    1. ortholog_df: Linear motifs from orthologous proteins
    2. regex_df: Motifs directly matched in input sequence

    Examples:

    # Predict motifs from sequence
    gget elm LIAQSIGQASFV -o results
    
    # Use UniProt accession with expanded info
    gget elm --uniprot Q02410 -e
    
    # Python
    ortholog_df, regex_df = gget.elm("LIAQSIGQASFV")
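
ELM motif classes are defined as regular expressions, and regex_df lists motifs matched directly in the input sequence. The sketch below shows that kind of scan with Python's re module, using a zero-width lookahead so overlapping hits are reported. The pattern `[RK].{2}[ST]` and the sequence are hypothetical examples, not entries from the ELM database.

```python
import re
from typing import List, Tuple


def find_motif(sequence: str, pattern: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched_text) for every match, 1-based inclusive.

    Wrapping the pattern in a lookahead makes each match zero-width, so
    overlapping occurrences are all reported.
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", sequence):
        text = m.group(1)
        start = m.start() + 1  # convert to 1-based position
        hits.append((start, start + len(text) - 1, text))
    return hits


# Hypothetical motif (R or K, any two residues, then S or T) on a toy sequence
print(find_motif("ARKASTRGGS", "[RK].{2}[ST]"))
# → [(2, 5, 'RKAS'), (3, 6, 'KAST'), (7, 10, 'RGGS')]
```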
    

    4. Expression & Disease Data

    gget archs4 - Gene Correlation & Tissue Expression

    Query ARCHS4 database for correlated genes or tissue expression data.

    Parameters:

    • gene: Gene symbol or Ensembl ID (with --ensembl flag)
    • -w/--which: 'correlation' (default, returns 100 most correlated genes) or 'tissue' (expression atlas)
    • -s/--species: 'human' (default) or 'mouse' (tissue data only)
    • -e/--ensembl: Input is Ensembl ID

    Returns:

    • Correlation mode: Gene symbols, Pearson correlation coefficients
    • Tissue mode: Tissue identifiers, min/Q1/median/Q3/max expression values

    Examples:

    # Get correlated genes
    gget archs4 ACE2
    
    # Get tissue expression
    gget archs4 -w tissue ACE2
    
    # Python
    gget.archs4("ACE2", which="tissue")
    

    gget cellxgene - Single-Cell RNA-seq Data

    Query CZ CELLxGENE Discover Census for single-cell data.

    Setup Required:

    gget setup cellxgene
    

    Parameters:

    • --gene (-g): Gene names or Ensembl IDs (case-sensitive! 'PAX7' for human, 'Pax7' for mouse)
    • --tissue: Tissue type(s)
    • --cell_type: Specific cell type(s)
    • --species (-s): 'homo_sapiens' (default) or 'mus_musculus'
    • --census_version (-cv): Version ("stable", "latest", or dated)
    • --ensembl (-e): Use Ensembl IDs
    • --meta_only (-mo): Return metadata only
    • Additional filters: disease, development_stage, sex, assay, dataset_id, donor_id, ethnicity, suspension_type

    Returns: AnnData object with count matrices and metadata (or metadata-only dataframes)

    Examples:

    # Get single-cell data for specific genes and cell types
    gget cellxgene --gene ACE2 ABCA1 --tissue lung --cell_type "mucus secreting cell" -o lung_data.h5ad
    
    # Metadata only
    gget cellxgene --gene PAX7 --tissue muscle --meta_only -o metadata.csv
    
    # Python
    adata = gget.cellxgene(gene=["ACE2", "ABCA1"], tissue="lung", cell_type="mucus secreting cell")
    

    gget enrichr - Enrichment Analysis

    Perform ontology enrichment analysis on gene lists using Enrichr.

    Parameters:

    • genes: Gene symbols or Ensembl IDs
    • -db/--database: Reference database (supports shortcuts: 'pathway', 'transcription', 'ontology', 'diseases_drugs', 'celltypes')
    • -s/--species: human (default), mouse, fly, yeast, worm, fish
    • -bkg_l/--background_list: Background genes for comparison
    • -ko/--kegg_out: Save KEGG pathway images with highlighted genes
    • plot: Python-only; generate graphical results

    Database Shortcuts:

    • 'pathway' → KEGG_2021_Human
    • 'transcription' → ChEA_2016
    • 'ontology' → GO_Biological_Process_2021
    • 'diseases_drugs' → GWAS_Catalog_2019
    • 'celltypes' → PanglaoDB_Augmented_2021

    Examples:

    # Enrichment analysis for ontology
    gget enrichr -db ontology ACE2 AGT AGTR1
    
    # Save KEGG pathways
    gget enrichr -db pathway ACE2 AGT AGTR1 -ko ./kegg_images/
    
    # Python with plot
    gget.enrichr(["ACE2", "AGT", "AGTR1"], database="ontology", plot=True)
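
Enrichr's p-values come from Fisher's exact test on the overlap between your gene list and each gene set. The one-sided form of that test is the hypergeometric upper tail, sketched below with math.comb. The universe size is an input you must supply (a background list plays roughly that role), and Enrichr's adjusted p-values and combined scores are not reproduced here.

```python
from math import comb


def enrichment_p_value(overlap: int, list_size: int, set_size: int, universe: int) -> float:
    """One-sided P(X >= overlap), X ~ Hypergeometric(universe, set_size, list_size).

    overlap:   genes in both your list and the gene set
    list_size: genes in your list (within the universe)
    set_size:  genes in the gene set (within the universe)
    universe:  total genes considered
    """
    if not (0 <= overlap <= min(list_size, set_size) and max(list_size, set_size) <= universe):
        raise ValueError("inconsistent counts")
    upper = min(list_size, set_size)
    # Sum exact integer counts first, then divide once
    numerator = sum(comb(set_size, k) * comb(universe - set_size, list_size - k)
                    for k in range(overlap, upper + 1))
    return numerator / comb(universe, list_size)


print(enrichment_p_value(2, 2, 2, 4))  # → 0.16666666666666666
# Hypothetical numbers: 5 of 50 listed genes fall in a 200-gene set, 20,000-gene universe
print(enrichment_p_value(5, 50, 200, 20_000))
```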
    

    gget bgee - Orthology & Expression

    Retrieve orthology and gene expression data from Bgee database.

    Parameters:

    • ens_id: Ensembl gene ID or NCBI gene ID (for non-Ensembl species). Multiple IDs supported when type=expression
    • -t/--type: 'orthologs' (default) or 'expression'

    Returns:

    • Orthologs mode: Matching genes across species with IDs, names, taxonomic info
    • Expression mode: Anatomical entities, confidence scores, expression status

    Examples:

    # Get orthologs
    gget bgee ENSG00000169194
    
    # Get expression data
    gget bgee ENSG00000169194 -t expression
    
    # Multiple genes
    gget bgee ENSBTAG00000047356 ENSBTAG00000018317 -t expression
    
    # Python
    gget.bgee("ENSG00000169194", type="orthologs")
    

    gget opentargets - Disease & Drug Associations

    Retrieve disease and drug associations from OpenTargets.

    Parameters:

    • Ensembl gene ID (required)
    • -r/--resource: diseases (default), drugs, tractability, pharmacogenetics, expression, depmap, interactions
    • -l/--limit: Cap results count
    • Filter arguments (vary by resource):
      • drugs: --filter_disease
      • pharmacogenetics: --filter_drug
      • expression/depmap: --filter_tissue, --filter_anat_sys, --filter_organ
      • interactions: --filter_protein_a, --filter_protein_b, --filter_gene_b

    Examples:

    # Get associated diseases
    gget opentargets ENSG00000169194 -r diseases -l 5
    
    # Get associated drugs
    gget opentargets ENSG00000169194 -r drugs -l 10
    
    # Get tissue expression
    gget opentargets ENSG00000169194 -r expression --filter_tissue brain
    
    # Python
    gget.opentargets("ENSG00000169194", resource="diseases", limit=5)
    

    gget cbio - cBioPortal Cancer Genomics

    Plot cancer genomics heatmaps using cBioPortal data.

    Two subcommands:

    search - Find study IDs:

    gget cbio search breast lung
    

    plot - Generate heatmaps:

    Parameters:

    • -s/--study_ids: Space-separated cBioPortal study IDs (required)
    • -g/--genes: Space-separated gene names or Ensembl IDs (required)
    • -st/--stratification: Column to organize data (tissue, cancer_type, cancer_type_detailed, study_id, sample)
    • -vt/--variation_type: Data type (mutation_occurrences, cna_nonbinary, sv_occurrences, cna_occurrences, Consequence)
    • -f/--filter: Filter by column value (e.g., 'study_id:msk_impact_2017')
    • -dd/--data_dir: Cache directory (default: ./gget_cbio_cache)
    • -fd/--figure_dir: Output directory (default: ./gget_cbio_figures)
    • -dpi: Resolution (default: 100)
    • -sh/--show: Display plot in window
    • -nc/--no_confirm: Skip download confirmations

    Examples:

    # Search for studies
    gget cbio search esophag ovary
    
    # Create heatmap
    gget cbio plot -s msk_impact_2017 -g AKT1 ALK BRAF -st tissue -vt mutation_occurrences
    
    # Python
    gget.cbio_search(["esophag", "ovary"])
    gget.cbio_plot(["msk_impact_2017"], ["AKT1", "ALK"], stratification="tissue")
    

    gget cosmic - COSMIC Database

    Search COSMIC (Catalogue Of Somatic Mutations In Cancer) database.

    Important: License fees apply for commercial use. Requires COSMIC account credentials.

    Parameters:

    • searchterm: Gene name, Ensembl ID, mutation notation, or sample ID
    • -ctp/--cosmic_tsv_path: Path to downloaded COSMIC TSV file (required for querying)
    • -l/--limit: Maximum results (default: 100)

    Database download flags:

    • -d/--download_cosmic: Activate download mode
    • -gm/--gget_mutate: Create version for gget mutate
    • -cp/--cosmic_project: Database type (cancer, census, cell_line, resistance, genome_screen, targeted_screen)
    • -cv/--cosmic_version: COSMIC version
    • -gv/--grch_version: Human reference genome (37 or 38)
    • --email, --password: COSMIC credentials

    Examples:

    # First download database
    gget cosmic -d --email user@example.com --password xxx -cp cancer
    
    # Then query
    gget cosmic EGFR -ctp cosmic_data.tsv -l 10
    
    # Python
    gget.cosmic("EGFR", cosmic_tsv_path="cosmic_data.tsv", limit=10)
    

    5. Additional Tools

    gget mutate - Generate Mutated Sequences

    Generate mutated nucleotide sequences from mutation annotations.

    Parameters:

    • sequences: FASTA file path or direct sequence input (string/list)
    • -m/--mutations: CSV/TSV file or DataFrame with mutation data (required)
    • -mc/--mut_column: Mutation column name (default: 'mutation')
    • -sic/--seq_id_column: Sequence ID column (default: 'seq_ID')
    • -mic/--mut_id_column: Mutation ID column
    • -k/--k: Length of flanking sequences (default: 30 nucleotides)

    Returns: Mutated sequences in FASTA format

    Examples:

    # Single mutation
    gget mutate ATCGCTAAGCT -m "c.4G>T"
    
    # Multiple sequences with mutations from file
    gget mutate sequences.fasta -m mutations.csv -o mutated.fasta
    
    # Python
    import pandas as pd
    mutations_df = pd.DataFrame({"seq_ID": ["seq1"], "mutation": ["c.4G>T"]})
    gget.mutate(["ATCGCTAAGCT"], mutations=mutations_df)
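
To make the c.4G>T notation concrete, the sketch below applies a single-nucleotide substitution (1-based position, checked against the reference base) and returns a window of up to k flanking bases on each side, loosely mirroring -k/--k. It handles substitutions only and is a hypothetical helper, not gget mutate's implementation; the real module's output trimming may differ.

```python
import re
from typing import Tuple

SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGTN])>([ACGTN])$")


def apply_substitution(sequence: str, mutation: str, k: int = 30) -> Tuple[str, str]:
    """Apply e.g. 'c.4G>T' and return (full_mutant, window around the change)."""
    match = SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"unsupported mutation notation: {mutation!r}")
    pos, ref, alt = int(match.group(1)), match.group(2), match.group(3)
    idx = pos - 1  # 1-based position to 0-based index
    if not 0 <= idx < len(sequence):
        raise ValueError("position outside sequence")
    if sequence[idx].upper() != ref:
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {sequence[idx]}")
    mutant = sequence[:idx] + alt + sequence[idx + 1:]
    window = mutant[max(0, idx - k): idx + k + 1]  # up to k bases on each side
    return mutant, window


print(apply_substitution("ATCGCTAAGCT", "c.4G>T", k=2))
# → ('ATCTCTAAGCT', 'TCTCT')
```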
    

    gget gpt - OpenAI Text Generation

    Generate natural language text using OpenAI's API.

    Setup Required:

    gget setup gpt
    

    Important: OpenAI's free tier is limited to 3 months after account creation, so set a monthly billing limit.

    Parameters:

    • prompt: Text input for generation (required)
    • api_key: OpenAI authentication (required)
    • Model configuration: temperature, top_p, max_tokens, frequency_penalty, presence_penalty
    • Default model: gpt-3.5-turbo (configurable)

    Examples:

    gget gpt "Explain CRISPR" --api_key your_key_here
    
    # Python
    gget.gpt("Explain CRISPR", api_key="your_key_here")
    

    gget setup - Install Dependencies

    Install/download third-party dependencies for specific modules.

    Parameters:

    • module: Module name requiring dependency installation
    • -o/--out: Output folder path (elm module only)

    Modules requiring setup:

    • alphafold - Downloads ~4GB of model parameters
    • cellxgene - Installs cellxgene-census (may not support latest Python)
    • elm - Downloads local ELM database
    • gpt - Configures OpenAI integration

    Examples:

    # Setup AlphaFold
    gget setup alphafold
    
    # Setup ELM with custom directory
    gget setup elm -o /path/to/elm_data
    
    # Python
    gget.setup("alphafold")
    

    Common Workflows

    Workflow 1: Gene Discovery to Sequence Analysis

    Find and analyze genes of interest:

    import gget
    
    # 1. Search for genes
    results = gget.search(["GABA", "receptor"], species="homo_sapiens")
    
    # 2. Get detailed information
    gene_ids = results["ensembl_id"].tolist()
    info = gget.info(gene_ids[:5])
    
    # 3. Retrieve sequences
    sequences = gget.seq(gene_ids[:5], translate=True)
    

    Workflow 2: Sequence Alignment and Structure

    Align sequences and predict structures:

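    import gget
    
    # Example protein sequence (the one used in the BLAST example above)
    my_sequence = "MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR"
    
    # 1. Align multiple sequences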
    alignment = gget.muscle("sequences.fasta")
    
    # 2. Find similar sequences
    blast_results = gget.blast(my_sequence, database="swissprot", limit=10)
    
    # 3. Predict structure
    structure = gget.alphafold(my_sequence, plot=True)
    
    # 4. Find linear motifs
    ortholog_df, regex_df = gget.elm(my_sequence)
    

    Workflow 3: Gene Expression and Enrichment

    Analyze expression patterns and functional enrichment:

    import gget
    
    # 1. Get tissue expression
    tissue_expr = gget.archs4("ACE2", which="tissue")
    
    # 2. Find correlated genes
    correlated = gget.archs4("ACE2", which="correlation")
    
    # 3. Get single-cell data
    adata = gget.cellxgene(gene=["ACE2"], tissue="lung", cell_type="epithelial cell")
    
    # 4. Perform enrichment analysis
    gene_list = correlated["gene_symbol"].tolist()[:50]
    enrichment = gget.enrichr(gene_list, database="ontology", plot=True)
    

    Workflow 4: Disease and Drug Analysis

    Investigate disease associations and therapeutic targets:

    import gget
    
    # 1. Search for genes
    genes = gget.search(["breast cancer"], species="homo_sapiens")
    
    # 2. Get disease associations
    diseases = gget.opentargets("ENSG00000169194", resource="diseases")
    
    # 3. Get drug associations
    drugs = gget.opentargets("ENSG00000169194", resource="drugs")
    
    # 4. Query cancer genomics data
    study_ids = gget.cbio_search(["breast"])
    gget.cbio_plot(study_ids[:2], ["BRCA1", "BRCA2"], stratification="cancer_type")
    
    # 5. Search COSMIC for mutations
    cosmic_results = gget.cosmic("BRCA1", cosmic_tsv_path="cosmic.tsv")
    

    Workflow 5: Comparative Genomics

    Compare proteins across species:

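    import gget
    
    # 1. Get orthologs
    orthologs = gget.bgee("ENSG00000169194", type="orthologs")
    
    # 2. Get protein sequences for comparison
    # Assumes gget.seq returns FASTA lines as a list (header, sequence, ...); [1] is the first sequence
    human_seq = gget.seq("ENSG00000169194", translate=True)[1]
    mouse_seq = gget.seq("ENSMUSG00000026091", translate=True)[1]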
    
    # 3. Align sequences
    alignment = gget.muscle([human_seq, mouse_seq])
    
    # 4. Compare structures
    human_structure = gget.pdb("7S7U")
    mouse_structure = gget.alphafold(mouse_seq)
    

    Workflow 6: Building Reference Indices

    Prepare reference data for downstream analysis (e.g., kallisto|bustools):

    # 1. List available species
    gget ref --list_species
    
    # 2. Download reference files
    gget ref -w gtf,cdna -d homo_sapiens
    
    # 3. Build a kallisto index from the downloaded cDNA FASTA (substitute the actual file name)
    kallisto index -i transcriptome.idx transcriptome.fasta
    
    # 4. Download genome for alignment
    gget ref -w dna -d homo_sapiens
    

    Best Practices

    Data Retrieval

    • Use --limit to control result sizes for large queries
    • Save results with -o/--out for reproducibility
    • Check database versions/releases for consistency across analyses
    • Use --quiet in production scripts to reduce output

    Sequence Analysis

    • For BLAST/BLAT, start with default parameters; then tune BLAST with -e/--expect, -l/--limit, or -db/--database, and set the BLAT target with -a/--assembly
    • Use gget diamond with --threads for faster local alignment
    • Save DIAMOND databases with --diamond_db for repeated queries
    • For multiple sequence alignment, use -s5/--super5 for large datasets

    Expression and Disease Data

    • Gene symbols are case-sensitive in cellxgene ('PAX7' for human, 'Pax7' for mouse)
    • Run gget setup before first use of alphafold, cellxgene, elm, gpt
    • For enrichment analysis, use database shortcuts for convenience
    • Cache cBioPortal data with -dd to avoid repeated downloads

    Structure Prediction

    • AlphaFold multimer predictions: use -mr 20 for higher accuracy
    • Use -r flag for AMBER relaxation of final structures
    • Visualize results in Python with plot=True
    • Check PDB database first before running AlphaFold predictions

    Error Handling

    • Database structures change; update gget regularly: uv pip install --upgrade gget
    • Process max ~1000 Ensembl IDs at once with gget info
    • For large-scale analyses, implement rate limiting for API queries
    • Use virtual environments to avoid dependency conflicts
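
A simple way to follow the rate-limiting advice above is to enforce a minimum interval between calls. The `RateLimiter` class below is a hypothetical sketch; the interval is a placeholder (check each service's published limits), and any gget function can be passed as `func`, e.g. `limiter.call(gget.info, batch)`.

```python
import time
from typing import Any, Callable


class RateLimiter:
    """Allow at most one call every `min_interval` seconds (simple spacing)."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_call = float("-inf")  # first call never waits

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        wait = self._last_call + self.min_interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)  # block until the interval has elapsed
        self._last_call = time.monotonic()
        return func(*args, **kwargs)


# Demo with a local function instead of a network query
limiter = RateLimiter(min_interval=0.05)
start = time.monotonic()
results = [limiter.call(str.upper, word) for word in ["ace2", "agt", "agtr1"]]
elapsed = time.monotonic() - start
print(results)  # → ['ACE2', 'AGT', 'AGTR1']
```

Spacing calls this way is deliberately simple; bursty workloads may prefer a token bucket, but fixed spacing is easy to reason about for sequential scripts.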

    Output Formats

    Command-line

    • Default: JSON
    • CSV: Add -csv flag
    • FASTA: gget seq, gget mutate
    • PDB: gget pdb, gget alphafold
    • PNG: gget cbio plot

    Python

    • Default: DataFrame or dictionary
    • JSON: Add json=True parameter
    • Save to file: Add save=True or specify out="filename"
    • AnnData: gget cellxgene

    Resources

    This skill includes reference documentation for detailed module information:

    references/

    • module_reference.md - Comprehensive parameter reference for all modules
    • database_info.md - Information about queried databases and their update frequencies
    • workflows.md - Extended workflow examples and use cases

    For additional help:

    • Official documentation: https://pachterlab.github.io/gget/
    • GitHub issues: https://github.com/pachterlab/gget/issues
    • Citation: Luebbert, L. & Pachter, L. (2023). Efficient querying of genomic reference databases with gget. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac836
    Repository: davila7/claude-code-templates