Read, write, and create single-cell data objects using Seurat (R) and Scanpy (Python)...
Reference examples tested with: Cell Ranger 8.0+, anndata 0.10+, numpy 1.26+, pandas 2.2+, scanpy 1.10+
Before using code patterns, verify that installed versions match. If versions differ:

- Python: `pip show <package>`, then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')`, then `?function_name` to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Read, write, and create single-cell data objects for analysis.
Goal: Load, create, and save single-cell data objects using Scanpy and AnnData.
Approach: Read 10X Genomics output, CSV, or Loom formats into AnnData objects, manipulate metadata and layers, and write to h5ad format.
"Load my 10X data" → Read Cell Ranger output directory or h5 file into an AnnData object with expression matrix, cell barcodes, and gene annotations.
```python
import scanpy as sc
import anndata as ad
import pandas as pd
import numpy as np

# Read 10X Cell Ranger output (filtered_feature_bc_matrix directory)
# cache=True writes a fast-loading copy to scanpy's cache directory for later reads
adata = sc.read_10x_mtx('filtered_feature_bc_matrix/', var_names='gene_symbols', cache=True)
# Gene symbols are not guaranteed to be unique; deduplicate before indexing by gene
adata.var_names_make_unique()
print(f'Loaded {adata.n_obs} cells x {adata.n_vars} genes')

# Read 10X h5 file directly (by default keeps only Gene Expression features)
adata = sc.read_10x_h5('filtered_feature_bc_matrix.h5')
adata.var_names_make_unique()

# AnnData stores:
# - adata.X: expression matrix (cells x genes)
# - adata.obs: cell metadata (DataFrame)
# - adata.var: gene metadata (DataFrame)
# - adata.uns: unstructured annotations (dict)
# - adata.obsm: cell embeddings (PCA, UMAP)
# - adata.varm: gene embeddings
# - adata.obsp: cell-cell graphs
# - adata.layers: alternative matrices (raw counts, normalized)
print(f'Shape: {adata.shape}')
print(f'Cell metadata: {adata.obs.columns.tolist()}')
print(f'Gene metadata: {adata.var.columns.tolist()}')
```
```python
import anndata as ad
import numpy as np
import pandas as pd

# Build an AnnData object from an in-memory matrix (toy Poisson counts)
counts = np.random.poisson(1, size=(100, 500))  # 100 cells x 500 genes
cell_ids = [f'cell_{i}' for i in range(100)]
gene_ids = [f'gene_{i}' for i in range(500)]
adata = ad.AnnData(
    X=counts,
    obs=pd.DataFrame(index=cell_ids),
    var=pd.DataFrame(index=gene_ids)
)
```
```python
import scanpy as sc

# h5ad is the native AnnData format
adata = sc.read_h5ad('data.h5ad')
# Write to h5ad
adata.write_h5ad('output.h5ad')
# Write gzip-compressed (smaller file, slower I/O)
adata.write_h5ad('output.h5ad', compression='gzip')
```
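```python
import scanpy as sc

# CSV (cells as rows, genes as columns; the first column holds cell names)
adata = sc.read_csv('counts.csv')
# If the file is laid out genes x cells instead, transpose the result:
# adata = adata.T
# Loom format
adata = sc.read_loom('data.loom')
# Text file (tab-separated)
adata = sc.read_text('counts.txt', delimiter='\t')
```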
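```python
# Add cell metadata (a scalar is broadcast to every cell)
adata.obs['sample'] = 'sample_1'
# Per-cell values need exactly one entry per cell (here, the 100-cell object above)
adata.obs['batch'] = ['batch_1'] * 50 + ['batch_2'] * 50
# Add gene metadata
adata.var['gene_type'] = 'protein_coding'
# Add unstructured data
adata.uns['experiment'] = 'PBMC_3k'
```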
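```python
# Indexing returns a view of adata; .copy() makes an independent object
# Subset by cells
adata_subset = adata[adata.obs['batch'] == 'batch_1'].copy()
# Subset by genes (requires adata.var['highly_variable'], set by sc.pp.highly_variable_genes)
adata_subset = adata[:, adata.var['highly_variable']].copy()
# Boolean indexing (requires adata.obs['n_genes'], e.g. from sc.pp.filter_cells(adata, min_genes=...))
adata_subset = adata[adata.obs['n_genes'] > 200, :].copy()
```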
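```python
# Store raw counts before normalization (adata.raw keeps its own copy of X and var)
adata.raw = adata
# Access raw counts later
raw_counts = adata.raw.X
# Or use layers; unlike .raw, layers are subset together with adata.var
adata.layers['counts'] = adata.X.copy()
```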
Goal: Load, create, and save single-cell data objects using Seurat.
Approach: Read 10X Genomics output into Seurat objects, manipulate metadata, merge samples, and serialize with RDS or h5Seurat formats.
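```r
library(Seurat)
library(Matrix)

# Read 10X Cell Ranger output; returns a genes x cells sparse matrix
# (a list of matrices if the run contains several feature types)
counts <- Read10X(data.dir = 'filtered_feature_bc_matrix/')
# Create Seurat object, keeping genes detected in >= 3 cells and cells with >= 200 genes
seurat_obj <- CreateSeuratObject(counts = counts, project = 'PBMC', min.cells = 3, min.features = 200)
print(seurat_obj)

# Read h5 file directly (requires the hdf5r package)
counts <- Read10X_h5('filtered_feature_bc_matrix.h5')
seurat_obj <- CreateSeuratObject(counts = counts, project = 'PBMC')
```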
```r
# Seurat v5 assays store matrices in layers (v3/v4 assays used slots):
# - Layers: counts, data, scale.data
# - Metadata: seurat_obj@meta.data
# - Reductions: seurat_obj@reductions
# - Graphs: seurat_obj@graphs
# Access a layer (v5 syntax)
counts <- LayerData(seurat_obj, layer = 'counts')
# Or shorthand
counts <- seurat_obj[['RNA']]$counts
# Access metadata (one row per cell)
head(seurat_obj@meta.data)
```
```r
library(Matrix)

# Create from a sparse matrix (toy Poisson counts: 500 genes x 1000 cells)
counts <- Matrix(rpois(1000 * 500, 1), nrow = 500, ncol = 1000, sparse = TRUE)
rownames(counts) <- paste0('gene_', 1:500)
colnames(counts) <- paste0('cell_', 1:1000)
seurat_obj <- CreateSeuratObject(counts = counts, project = 'MyProject')
```
```r
# Save Seurat object
saveRDS(seurat_obj, file = 'seurat_obj.rds')
# Load Seurat object
seurat_obj <- readRDS('seurat_obj.rds')
```
```r
# Add cell metadata (the batch vector assumes the 1,000-cell object created above)
seurat_obj$sample <- 'sample_1'
seurat_obj$batch <- c(rep('batch_1', 500), rep('batch_2', 500))
# Or use AddMetaData with a data frame whose row names are cell names
metadata_df <- data.frame(
  cell_type = rep('unknown', ncol(seurat_obj)),
  row.names = colnames(seurat_obj)
)
seurat_obj <- AddMetaData(seurat_obj, metadata = metadata_df)
```
```r
# Subset by metadata
seurat_subset <- subset(seurat_obj, subset = batch == 'batch_1')
# Subset by cells
seurat_subset <- subset(seurat_obj, cells = colnames(seurat_obj)[1:500])
# Subset by features
seurat_subset <- subset(seurat_obj, features = rownames(seurat_obj)[1:100])
```
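```r
# Merge multiple Seurat objects (seurat_obj1..3 stand in for your own objects);
# add.cell.ids prefixes each barcode with its sample ID so cell names stay unique
merged <- merge(seurat_obj1, y = c(seurat_obj2, seurat_obj3), add.cell.ids = c('S1', 'S2', 'S3'))
# In v5, merge() keeps one counts layer per input object; JoinLayers() combines them
merged <- JoinLayers(merged)
```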
Goal: Convert single-cell data objects between Seurat (R) and AnnData (Python) formats.
Approach: Use SeuratDisk as an intermediary to convert via h5Seurat/h5ad bridge files.
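```r
# In R: save as h5Seurat, then convert to h5ad
library(SeuratDisk)
# SeuratDisk predates Seurat v5; v5 (Assay5) objects often need the v3 assay class first:
# seurat_obj[['RNA']] <- as(seurat_obj[['RNA']], 'Assay')
SaveH5Seurat(seurat_obj, filename = 'data.h5seurat')
Convert('data.h5seurat', dest = 'h5ad')
```

```python
# In Python: read the converted file
import scanpy as sc
adata = sc.read_h5ad('data.h5ad')
```

```python
# In Python: save as h5ad
adata.write_h5ad('data.h5ad')
```

```r
# In R: convert and load
library(SeuratDisk)
Convert('data.h5ad', dest = 'h5seurat')
seurat_obj <- LoadH5Seurat('data.h5seurat')
```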
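After a conversion it is worth checking which kind of values ended up in `X` (and in `.raw`, if present) before running steps that expect counts. A heuristic sketch: `looks_like_raw_counts` is a hypothetical helper, not part of Scanpy or SeuratDisk. It only inspects the first `n_check` non-zero entries and reports whether they are non-negative integers, which raw counts are and log-normalized or scaled data usually are not.

```python
import numpy as np
from scipy import sparse


def looks_like_raw_counts(X, n_check=1000):
    """Hypothetical helper: True if sampled non-zero values are non-negative integers."""
    if sparse.issparse(X):
        values = X.data                 # stored (non-zero) entries only
    else:
        values = np.asarray(X).ravel()
        values = values[values != 0]    # zeros say nothing either way
    sample = values[:n_check]
    return bool(np.all(sample >= 0) and np.allclose(sample, np.round(sample)))
```

For example, call `looks_like_raw_counts(adata.X)` and, if `adata.raw` is set, `looks_like_raw_counts(adata.raw.X)`. Because only a sample is inspected, treat the result as a hint rather than proof.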
| Format | Extension | Description | Supported in |
|---|---|---|---|
| 10X MTX | directory | Cell Ranger sparse matrix (`matrix.mtx.gz`, `barcodes.tsv.gz`, `features.tsv.gz`) | Both |
| 10X h5 | .h5 | Cell Ranger HDF5 matrix | Both |
| h5ad | .h5ad | AnnData native (HDF5-based) | Scanpy |
| RDS | .rds | R serialized object | Seurat |
| Loom | .loom | HDF5-based | Both |
| h5Seurat | .h5seurat | Seurat HDF5 (via SeuratDisk) | Seurat |