Checklist-style reference for OmicVerse downstream tutorials covering AUCell scoring, metacell DEG, and related exports.
This skill sheet distills the OmicVerse single-cell downstream tutorials into an executable checklist. Each module highlights prerequisites, the core API entry points, interpretation checkpoints, resource planning notes, and any optional validation or export steps surfaced in the notebooks.
Before running any downstream module, verify prerequisites:
```python
# Prerequisite checks; assumes `adata` is the AnnData object being analysed.

# Before AUCell: verify embeddings exist
assert 'X_umap' in adata.obsm or 'X_pca' in adata.obsm, \
    "Embedding required. Run ov.pp.umap(adata) or ov.pp.pca(adata) first."

# Before metacell DEG: verify raw counts are preserved
assert adata.raw is not None, \
    "adata.raw required. Set adata.raw = adata.copy() before HVG filtering."

# Before SCENIC: verify raw counts (not log-transformed) are available.
# Heuristic only: log-normalized values rarely exceed ~20, raw UMI counts usually do.
if hasattr(adata.X, 'max') and adata.X.max() < 20:
    print("WARNING: SCENIC expects raw counts. Data may be log-transformed.")

# Before scDrug: verify tumor annotations
# assert 'cell_type' in adata.obs.columns, "Cell type annotation required for scDrug"
```
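The SCENIC check above only looks at the maximum value. A normalized but unlogged matrix can therefore pass it, because its values often exceed 20 without being counts. The sketch below is stricter. It assumes raw counts are stored as non-negative whole numbers. It samples the stored values and tests whether they are integral. looks_like_raw_counts is a hypothetical helper, not an OmicVerse function.

```python
import numpy as np
import scipy.sparse as sp


def looks_like_raw_counts(X, n_sample=10_000, seed=0):
    """Heuristic: True if sampled values of X are non-negative whole numbers.

    Hypothetical helper, not part of OmicVerse. Accepts dense arrays and
    SciPy CSR/CSC matrices (for sparse input only stored entries are checked).
    """
    values = X.data if sp.issparse(X) else np.asarray(X).ravel()
    if values.size == 0:
        return True  # nothing stored, e.g. an all-zero sparse matrix
    if values.size > n_sample:
        # Subsample large matrices so the check stays fast
        rng = np.random.default_rng(seed)
        values = rng.choice(values, size=n_sample, replace=False)
    non_negative = np.all(values >= 0)
    # Absolute tolerance only, so 1234.56 (e.g. CPM) is not treated as a count
    integral = np.allclose(values, np.round(values), rtol=0, atol=1e-6)
    return bool(non_negative and integral)
```

For example, call looks_like_raw_counts(adata.X) before SCENIC, or looks_like_raw_counts(adata.raw.X) before metacell DEG. A False result suggests the matrix has already been normalized or log-transformed.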
AUCell pathway scoring (t_aucell.ipynb)
- Confirm that an AnnData object with clustering and an embedding (adata.obsm['X_umap']) is prepared.
- Score a single pathway with ov.single.geneset_aucell, or several pathways with ov.single.pathway_aucell.
- Score every pathway in a library with ov.single.pathway_aucell_enrichment (set num_workers for parallelism).
- Plot the scores with sc.pl.embedding to confirm pathway activity patterns.
- Run sc.tl.rank_genes_groups on the AUCell AnnData to find cluster-enriched pathways, and visualize them with sc.pl.rank_genes_groups_dotplot.
- Plan for parallel workers (num_workers=8 in the tutorial) and enough memory for the dense AUCell matrix.
- Save the scores with adata_aucs.write_h5ad('...') for reuse.
- Optionally, run ov.single.pathway_enrichment and draw heatmaps with ov.single.pathway_enrichment_plot.

Metacell differential expression (t_scdeg.ipynb)
- Run standard preprocessing (ov.pp.qc, ov.pp.preprocess, ov.pp.scale, ov.pp.pca).
- Store raw counts in adata.raw before HVG filtering.
- Run ov.bulk.pyDEG(test_adata.to_df(...).T) for both the full-cell and the metacell views.
- Build metacells with ov.single.MetaCell(..., use_gpu=True) when a GPU is available for acceleration.
- On the pyDEG object (dds), check volcano plots (dds.plot_volcano) and targeted boxplots (dds.plot_boxplot) for the top DEGs.
- Use ov.pl.embedding to confirm where those genes are localized.
- Adjust the pyDEG.legend_* settings for publication-ready visuals.

Single-cell DEG and differential composition (t_deg_single.ipynb)
- Provide adata with condition, cell_label, and optional batch metadata.
- Optionally initialize hybrid CPU/GPU compute (ov.settings.cpu_gpu_mixed_init()).
- Run ov.single.DEG(..., method='wilcoxon'|'t-test'|'memento-de'), then call deg_obj.run(...) on the target cell types.
- Use ov.single.DCT(..., method='sccoda'|'milo') for differential composition testing.
- Prepare the data with ov.pp.preprocess, ov.single.batch_correction, ov.pp.neighbors, and ov.pp.umap.
- Inspect the results from deg_obj (Wilcoxon / memento), and adjust the capture rate and number of bootstraps for stability.
- For scCODA, set the FDR with sim_results.set_fdr(), and interpret boxplots in terms of condition-level shifts.
- Budget for the heavier settings (num_cpus, num_boot, high k) and ensure adequate compute time.
- Optional visuals: embeddings (ov.pl.embedding), Milo beeswarm plots, and custom color palettes.

Drug response prediction with scDrug (t_scdrug.ipynb)
- Start from tumor scRNA-seq data (the tutorial uses infercnvpy.datasets.maynard2020_3k).
- Add gene annotations with ov.utils.get_gene_annotation (requires a GTF from GENCODE or T2T-CHM13).
- Fetch the drug-response resources with ov.utils.download_GDSC_data() and ov.utils.download_CaDRReS_model().
- Clone the CaDRReS-Sc repository (git clone https://github.com/CSB5/CaDRReS-Sc).
- Select a clustering resolution with ov.single.autoResolution(adata, cpus=4).
- Predict drug response with ov.single.Drug_Response(adata, scriptpath='CaDRReS-Sc', modelpath='models/', output='result').
- Review the predictions written to output, and cross-reference them with the inferred CNV states.
- Save the annotated AnnData (adata.write('scanpyobj.h5ad')) for downstream analyses or re-runs.

SCENIC regulon analysis (t_scenic.ipynb)
For comprehensive SCENIC guidance (database downloads, RegDiffusion tuning, RSS interpretation, GRN visualization), use search_skills('SCENIC regulon GRN') to load the dedicated SCENIC skill.
- Load the example data with ov.single.mouse_hsc_nestorowa16() (or provide preprocessed data with raw counts).
- Download the ranking databases (*.feather) and motif annotations (motifs-*.tbl) for the species, allocate 3 GB of disk space, and verify the paths (db_glob, motif_path).
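A wrong database path is cheaper to catch before a long SCENIC run than partway through one. The sketch below is a hypothetical helper, check_scenic_resources, and not an OmicVerse API. It uses only the Python standard library to check three things:
- the db_glob pattern matches *.feather files;
- the motif_path file exists;
- the volume has enough free space.

The 3 GB default mirrors the allocation in the checklist.

```python
import glob
import os
import shutil


def check_scenic_resources(db_glob, motif_path, min_free_gb=3.0):
    """Return a list of problems with SCENIC input paths (empty list if none).

    Hypothetical pre-flight helper: it only inspects the filesystem and does
    not call OmicVerse or SCENIC.
    """
    problems = []
    # Ranking databases are expected as *.feather files matched by db_glob
    dbs = sorted(glob.glob(db_glob))
    if not dbs:
        problems.append(f"no ranking databases match {db_glob!r}")
    elif not all(p.endswith(".feather") for p in dbs):
        problems.append("db_glob matches files that are not *.feather")
    # The motif annotation table (motifs-*.tbl) must be an existing file
    if not os.path.isfile(motif_path):
        problems.append(f"motif annotation file not found: {motif_path!r}")
    # Free space on the volume holding the motif file (fall back to the cwd)
    target = os.path.dirname(os.path.abspath(motif_path))
    if not os.path.isdir(target):
        target = os.getcwd()
    free_gb = shutil.disk_usage(target).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free, {min_free_gb} GB requested")
    return problems
```

Call problems = check_scenic_resources(db_glob, motif_path) with the same values you plan to pass to ov.single.SCENIC. An empty list means the paths resolved and the disk-space threshold was met.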
- Initialize ov.single.SCENIC(adata, db_glob=..., motif_path=..., n_jobs=12).
- Inspect the regulon AUC matrix (scenic_obj.auc_mtx.head()), RSS scores, and embeddings colored by regulon activity.
- Match n_jobs to the available cores, and ensure enough RAM for motif enrichment.
- Save scenic_obj (ov.utils.save) and the regulon AnnData (regulon_ad.write).

Consensus NMF gene programs (t_cnmf.ipynb)
- Run preprocessing (ov.pp.preprocess), scaling (ov.pp.scale), and PCA, and have UMAP embeddings available for inspection.
- Choose the K range (e.g. np.arange(5, 11)) and the number of iterations, and ensure the output directory exists.
- Initialize ov.single.cNMF(..., output_dir='...', name='...').
- Run cnmf_obj.factorize(...), cnmf_obj.combine(...), cnmf_obj.k_selection_plot(), and cnmf_obj.consensus(...).
- Retrieve results with cnmf_obj.load_results(...) and cnmf_obj.get_results(...), and optionally use the RF classifier via get_results_rfc.
- Inspect embeddings (ov.pl.embedding), cluster labels, and dotplots of the top genes.
- Plan compute by setting the number of workers (total_workers) and verifying disk space for intermediate factorization files.

Overlapping community detection with NOCD (t_nocd.ipynb)
- Preprocess with ov.single.scanpy_lazy (automated preprocessing) before running NOCD.
- Create scbrca = ov.single.scnocd(adata), then call the chained methods (matrix_transform, matrix_normalize, GNN_configure, GNN_preprocess, GNN_model, GNN_result, GNN_plot, cal_nocd, calculate_nocd).
- Compare UMAPs (sc.pl.umap) of the nocd, nocd_n, and Leiden labels using shared color maps.

Automated lazy pipeline (t_lazy.ipynb)
- Provide AnnData with sample metadata (sample_key), and optionally initialize hybrid compute (ov.settings.cpu_gpu_mixed_init()).
- Run ov.single.lazy(adata, species='mouse', sample_key='batch', ...) with optional reforce_steps and module-specific kwargs.
- Build an HTML summary with ov.single.generate_scRNA_report(...), and use ov.generate_reference_table(adata) for citation tracking.
- Check embeddings (ov.pl.embedding) for quality and annotation alignment.
- To redo specific steps, set reforce_steps accordingly.
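ov.single.lazy receives sample metadata through sample_key, so a missing or partly empty column is worth catching before the run starts. The sketch below is a hypothetical helper, check_sample_key, and not an OmicVerse function. It only inspects the obs DataFrame and returns any problems found, along with the number of cells per sample. The default column name 'batch' is taken from the tutorial call above.

```python
import pandas as pd


def check_sample_key(obs: pd.DataFrame, sample_key: str = "batch"):
    """Return (problems, cells_per_sample) for the sample metadata column.

    Hypothetical helper: it only inspects obs and does not call OmicVerse.
    """
    problems = []
    if sample_key not in obs.columns:
        problems.append(f"obs has no column {sample_key!r}")
        return problems, pd.Series(dtype=int)
    labels = obs[sample_key]
    # Cells without a sample label cannot be assigned to any sample
    n_missing = int(labels.isna().sum())
    if n_missing:
        problems.append(f"{n_missing} cell(s) lack a {sample_key!r} label")
    # Cells per sample (NaN excluded), useful for spotting very small samples
    counts = labels.value_counts()
    return problems, counts
```

For example, problems, counts = check_sample_key(adata.obs, sample_key='batch'). An empty problems list means every cell carries a sample label.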