# Gaussian Splat Optimizer

Optimize 3D Gaussian Splatting scenes for real-time rendering on Apple platforms (iOS, macOS, visionOS) using Metal.
## When to Use

- Optimizing .ply or .splat files for mobile/Apple GPU targets
- Reducing gaussian count for performance (pruning strategies)
- Implementing Level-of-Detail (LOD) for large scenes
- Compressing splat data for bandwidth/storage constraints
- Profiling and optimizing Metal rendering performance
- Targeting specific FPS goals on Apple hardware
## Quick Start

Input: Provide a .ply/.splat file path, target device class, and FPS target.

```bash
# Analyze a splat file
python ~/.claude/skills/gsplat-optimizer/scripts/analyze_splat.py scene.ply --device iphone --fps 60
```
Output: The skill provides:
- Point/gaussian pruning plan (opacity, size, error thresholds)
- LOD scheme suggestion (distance bins, gaussian subsets)
- Compression recommendation (if bandwidth/storage bound)
- Metal profiling checklist with shader/compute tips
## Optimization Workflow

### Step 1: Analyze the Scene

First, understand your scene characteristics:
- Gaussian count: Total number of splats
- Opacity distribution: Histogram of opacity values
- Size distribution: Gaussian scale statistics
- Memory footprint: Estimated GPU memory usage
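The four checks above can be sketched as a small script. This is a minimal sketch assuming the opacity and scale attributes have already been parsed into NumPy arrays (`summarize_scene` is a hypothetical helper, not part of the skill's scripts); the 56-byte-per-gaussian layout matches the estimate used later in this document.

```python
import numpy as np

def summarize_scene(opacity, scales, bytes_per_gaussian=56):
    """Summarize splat statistics: count, opacity histogram, scale stats, memory.

    opacity: (N,) array; scales: (N, 3) array of per-axis gaussian scales.
    bytes_per_gaussian assumes pos + scale + rot quaternion + opacity + SH DC.
    """
    hist, _ = np.histogram(opacity, bins=10, range=(0.0, 1.0))
    return {
        "count": int(opacity.shape[0]),
        "opacity_hist": hist,                       # 10-bin opacity distribution
        "mean_scale": float(scales.mean()),         # crude size statistic
        "memory_mb": opacity.shape[0] * bytes_per_gaussian / 1024**2,
    }

# Synthetic data standing in for a parsed .ply
rng = np.random.default_rng(0)
stats = summarize_scene(rng.random(100_000), rng.random((100_000, 3)) * 0.05)
```

In practice you would fill the arrays from a .ply reader and compare `memory_mb` against the device budgets in Step 2.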
### Step 2: Determine Target Device

| Device Class | GPU Budget | Max Gaussians (60fps) | Storage Mode |
|---|---|---|---|
| iPhone (A15+) | 4-6GB unified | ~2-4M | Shared |
| iPad Pro (M1+) | 8-16GB unified | ~6-8M | Shared |
| Mac (M1-M3) | 8-24GB unified | ~8-12M | Shared/Managed |
| Vision Pro | 16GB unified | ~4-6M (stereo) | Shared |
| Mac (discrete GPU) | 8-24GB VRAM | ~10-15M | Private |
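A budget check against this table can be encoded directly. This is a hypothetical lookup — the class keys and the upper-bound gaussian counts are this document's estimates, not Apple specifications:

```python
# Upper bounds taken from the table above (document estimates, not specs).
DEVICE_BUDGETS = {
    "iphone":     {"max_gaussians": 4_000_000,  "storage": "shared"},
    "ipad_pro":   {"max_gaussians": 8_000_000,  "storage": "shared"},
    "mac_apple":  {"max_gaussians": 12_000_000, "storage": "shared"},
    "vision_pro": {"max_gaussians": 6_000_000,  "storage": "shared"},
    "mac_dgpu":   {"max_gaussians": 15_000_000, "storage": "private"},
}

def over_budget(num_gaussians, device):
    """Return (exceeds_budget, utilization_ratio) for a device class."""
    budget = DEVICE_BUDGETS[device]["max_gaussians"]
    return num_gaussians > budget, num_gaussians / budget
```

A ratio above ~1.2 is the trigger used later in this document for pruning before training.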
### Step 3: Apply Pruning
If gaussian count exceeds device budget:
- Opacity threshold: Remove gaussians with opacity < 0.01-0.05
- Size culling: Remove sub-pixel gaussians (< 1px at target resolution)
- Importance pruning: Use LODGE algorithm for error-proxy selection
- Foveated rendering: For Vision Pro, reduce density in peripheral view
See references/pruning-strategies.md for details.
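The first two pruning passes (opacity threshold and size culling) can be combined into one mask. This sketch approximates projected size as `focal_px * max_scale / depth`, a simplification of the full perspective projection; `prune_mask` is an illustrative helper, not part of the skill's scripts:

```python
import numpy as np

def prune_mask(opacity, scales, positions, cam_pos, focal_px,
               min_opacity=0.01, min_pixels=1.0):
    """Boolean mask of gaussians to KEEP after opacity and size culling.

    Projected pixel size is approximated as focal_px * max_scale / depth;
    a production pipeline would use the real projection matrix.
    """
    depth = np.linalg.norm(positions - cam_pos, axis=1)
    px_size = focal_px * scales.max(axis=1) / np.maximum(depth, 1e-6)
    return (opacity >= min_opacity) & (px_size >= min_pixels)

# Three gaussians: one visible, one sub-pixel at 100m, one nearly transparent
positions = np.array([[0, 0, 1], [0, 0, 100], [0, 0, 1]], dtype=float)
scales = np.full((3, 3), 0.01)
opacity = np.array([0.5, 0.5, 0.005])
keep = prune_mask(opacity, scales, positions, np.zeros(3), focal_px=1000.0)
```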
### Step 4: Implement LOD (Large Scenes)
For scenes exceeding single-frame budget:
- Distance bins: Near (0-10m), Mid (10-50m), Far (50m+)
- Hierarchical structure: Octree or LoD tree for spatial queries
- Chunk streaming: Load/unload based on camera position
- Smooth transitions: Opacity blending at chunk boundaries
See references/lod-schemes.md for details.
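The Near/Mid/Far split above reduces to a one-line bin assignment. A minimal sketch (the hierarchical octree and chunk streaming are out of scope here):

```python
import numpy as np

def lod_bins(positions, cam_pos, edges=(10.0, 50.0)):
    """Assign each gaussian a distance bin: 0 = near (<10m),
    1 = mid (10-50m), 2 = far (>=50m), matching the split above."""
    d = np.linalg.norm(positions - cam_pos, axis=1)
    return np.digitize(d, edges)

# Gaussians at 5m, 20m and 100m from the camera
pts = np.array([[0, 0, 5], [0, 0, 20], [0, 0, 100]], dtype=float)
bins = lod_bins(pts, np.zeros(3))
```

Each bin would then index into a precomputed gaussian subset (e.g. full density for near, importance-pruned for mid and far).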
### Step 5: Apply Compression (If Needed)

For bandwidth/storage constraints:

| Method | Compression | Use Case |
|---|---|---|
| SOGS | 20x | Web delivery, moderate quality |
| SOG | 24x | Web delivery, better quality |
| CodecGS | 30x+ | Maximum compression |
| C3DGS | 31x | Fast rendering priority |

See references/compression.md for details.
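The table's decision logic can be written as a small helper. This is a hypothetical selector reflecting the use cases above; the ratios are approximate published figures, not guarantees for your scene:

```python
def pick_compression(bandwidth_bound, priority):
    """Choose a compression method per the table above, or None if not needed.

    priority: "render_speed", "max_ratio", or anything else for a web default.
    """
    if not bandwidth_bound:
        return None          # Step 5 only applies when bandwidth/storage bound
    if priority == "render_speed":
        return "C3DGS"       # ~31x, fast rendering priority
    if priority == "max_ratio":
        return "CodecGS"     # ~30x+, maximum compression
    return "SOG"             # ~24x, good web-delivery default
```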
### Step 6: Profile and Optimize Metal
- Choose storage mode: Private for static data, Shared for dynamic
- Optimize shaders: Function constants, thread occupancy
- Profile with Xcode: GPU Frame Capture, Metal System Trace
- Iterate: Measure, optimize, repeat
See references/metal-profiling.md for details.
## Common Pitfalls

### 1. Point Cloud Density Mismatch
Problem: Gaussian count doesn't match your scene complexity, causing either visual artifacts or wasted GPU resources.
- Too sparse (undersampling): Visible gaps, blockiness, loss of fine details
- Too dense (oversampling): Exceeds device budget, causes frame drops, GPU thrashing
Debugging:

```bash
# Analyze gaussian distribution
python ~/.claude/skills/gsplat-optimizer/scripts/analyze_splat.py scene.ply --histogram

# Check against device budget
# Compare total_gaussians vs. device_max in the output table
```
Strategy:
- Start with device budget from Step 2 (e.g., 4M for iPhone)
- If scene exceeds budget by >20%, apply pruning before training
- If visual quality drops too much after pruning, consider LOD or chunking
- Use importance-weighted sampling (LODGE) to remove low-contribution gaussians, not just opaque ones
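Importance-weighted pruning can be sketched with a simple score. Note the caveat: the real LODGE error proxy also weights by view coverage and reconstruction error; `opacity * volume` below is only a crude stand-in for illustration:

```python
import numpy as np

def importance_prune(opacity, scales, budget):
    """Keep the `budget` highest-scoring gaussians.

    Score = opacity * product of per-axis scales (a rough volume/footprint
    proxy). A low-opacity but large gaussian can outrank a small opaque one,
    which is the point: prune by contribution, not opacity alone.
    """
    score = opacity * scales.prod(axis=1)
    keep = np.argsort(score)[::-1][:budget]   # indices of top-`budget` scores
    return np.sort(keep)

opacity = np.array([0.9, 0.1, 0.8])
scales = np.array([[1, 1, 1], [1, 1, 1], [2, 1, 1]], dtype=float)
survivors = importance_prune(opacity, scales, budget=2)
```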
### 2. Training Instability (Gradient Explosions, Divergence)
Problem: During optimization (if fine-tuning on device), gaussian parameters diverge, causing:
- Loss suddenly jumps to NaN
- Gaussians disappear or explode in scale
- Model becomes unrecoverable mid-session
Debugging:

```bash
# Monitor loss during training
tail -f training.log | grep -E "loss|nan|inf"

# Check gradient magnitudes
python -c "
import numpy as np
from plyfile import PlyData
ply_data = PlyData.read('scene.ply')
scales = ply_data['vertex']['scale_0']
print(f'Scale range: {scales.min():.6f} to {scales.max():.6f}')
print(f'Any NaN: {np.isnan(scales).any()}')
"
```
Strategy:
- Gradient clipping: Cap gradient updates to ±0.1 scale per step
- Learning rate decay: Start at 1e-4, decay by 0.95 every epoch
- Loss regularization: Add L2 penalty on scale magnitudes to prevent explosions
- Checkpoint early: Save state every 10 iterations; rollback if loss spikes
- Freeze covariance: If converged, stop updating scale/rotation after 80% of training
- For device training: Reduce batch size or resolution if instability persists
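The first two stabilizers above (gradient clipping and learning-rate decay) are a few lines each. A minimal NumPy sketch, with the clip bound and schedule values taken from the bullets above:

```python
import numpy as np

def stable_step(params, grads, lr, clip=0.1):
    """One gradient-descent step with per-parameter clipping to +/-`clip`,
    so a single bad gradient cannot move a scale by more than 0.1 per step."""
    return params - lr * np.clip(grads, -clip, clip)

def decayed_lr(base_lr, epoch, decay=0.95):
    """Learning rate starting at base_lr (e.g. 1e-4), decayed 0.95x per epoch."""
    return base_lr * decay**epoch

# An exploding gradient of magnitude 10 moves the parameter by only 0.1
params = stable_step(np.zeros(2), np.array([10.0, -10.0]), lr=1.0)
```

An L2 penalty on scale magnitudes would add `+ 2 * weight * scale` to the scale gradients before this step.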
### 3. Memory Limitations (OOM Errors on Large Scenes)
Problem: Scene exceeds available unified memory, causing allocation failures or GPU stalls.
- iPhone: 4–6GB shared between app + GPU
- iPad Pro: 8–16GB shared
- Vision Pro: 16GB (but stereo doubles gaussian count)
Debugging:

```bash
# Estimate memory footprint
python << 'EOF'
num_gaussians = 5_000_000  # Your count
bytes_per_gaussian = 56    # pos (12) + scale (12) + rot quaternion (16) + opacity (4) + SH DC (12)
total_mb = (num_gaussians * bytes_per_gaussian) / (1024 ** 2)
print(f"Est. memory: {total_mb:.1f} MB")
print(f"Safe for iPhone A15: {total_mb < 2000}")  # Leave headroom for app
EOF

# Monitor live memory in Xcode
# Memory graph + Allocations instrument during scene load
```
Strategy:
- Chunking for large scenes: Break into 1–4M gaussian chunks, stream based on camera distance
- Quantization: Store gaussians in FP16 instead of FP32 (2x memory reduction)
- Pruning first: Remove <0.01 opacity or sub-pixel gaussians before transfer to device
- Lazy loading: Keep only active LOD level in memory; unload far chunks
- Vision Pro consideration: Dual-eye rendering = 2x gaussian count; cap at 4M per eye
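The FP16 quantization step is mechanical in NumPy. A minimal sketch (attribute names are illustrative); note the usual caveat that positions often need FP32 or per-chunk offsets to avoid precision artifacts, while scale, rotation, opacity, and SH coefficients typically survive FP16 with little visible loss:

```python
import numpy as np

def quantize_fp16(attrs):
    """Downcast per-gaussian float attributes from FP32 to FP16 (2x smaller)."""
    return {name: arr.astype(np.float16) for name, arr in attrs.items()}

# 1M opacities: 4 MB in FP32, 2 MB in FP16
attrs = {"opacity": np.random.rand(1_000_000).astype(np.float32)}
half = quantize_fp16(attrs)
saved_bytes = attrs["opacity"].nbytes - half["opacity"].nbytes
```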
### 4. Quality/Speed Trade-Offs (Over-Optimization for One Metric)
Problem: Optimizing heavily for one metric breaks another:
- Maximize FPS → visual artifacts: Over-pruning removes important geometry
- Maximize quality → frame drops: Too many gaussians for target device
- Minimize memory → banding/posterization: Excessive quantization or LOD culling
Debugging:

```bash
# Profile before/after each change
python << 'EOF'
metrics = {
    "original": {"fps": 60, "gaussians": 5_000_000, "artifacts": "none"},
    "after_pruning": {"fps": 58, "gaussians": 3_500_000, "artifacts": "block edges visible"},
}
for label, m in metrics.items():
    print(f"{label}: {m['fps']}fps, {m['gaussians']/1e6:.1f}M, {m['artifacts']}")
EOF
```
Strategy:
- Define priority: Is this device speed-critical (AR, real-time) or quality-focused (preview)?
- Measure baseline: Profile original unoptimized scene first
- Iterate incrementally: Apply one optimization (pruning OR compression OR LOD), measure, decide
- Preserve quality metrics: Keep PSNR/SSIM scores; stop pruning if quality drops >1dB
- Target range: Aim for 50–60fps headroom (don't max out at exactly 60fps; device will throttle)
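The "stop pruning if quality drops >1dB" rule needs a PSNR measurement; a minimal sketch over rendered images as float arrays in [0, 1] (`pruning_ok` is an illustrative helper implementing that stopping rule):

```python
import numpy as np

def psnr(reference, rendered, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two renders of the same view."""
    mse = np.mean((reference - rendered) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val**2 / mse)

def pruning_ok(psnr_before, psnr_after, max_drop_db=1.0):
    """Accept a pruning step only if quality dropped by at most 1 dB."""
    return (psnr_before - psnr_after) <= max_drop_db

ref = np.zeros((4, 4))
approx = np.full((4, 4), 0.1)   # uniform 0.1 error -> MSE 0.01 -> 20 dB
measured = psnr(ref, approx)
```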
### 5. Real-Time Rendering Failures (Frame Drops, Shader Compilation)
Problem: Rendering pipeline stalls despite low gaussian count:
- First frame (cold start): 2–5s delay while shaders compile
- Mid-scene: Frame drops spike when new LOD levels load
- Smooth playback → stuttering after 30–60s
Debugging:

```bash
# Capture Metal frame statistics
# In Xcode: Product > Scheme > Edit Scheme > Run > Diagnostics
# Enable: Metal API Validation, GPU Frame Capture

# Check shader compilation time
python ~/.claude/skills/gsplat-optimizer/scripts/metal_profile.py \
    --capture-shader-compile \
    --target iphone14

# Monitor frame time distribution
tail -f xcode.log | grep -E "frame_time|stutter"
```
Strategy:
- Pre-warm shader cache: Compile all function variants on first load (avoid runtime jank)
- Limit LOD transitions: If using multiple LOD levels, cap transitions to 2 per frame
- Asynchronous streaming: Load new geometry chunks on background thread, upload in-between frames
- Device-specific tuning:
- iPhone: Keep draw calls < 50, geometry per call < 500K gaussians
- Mac: More generous; aim for < 2M gaussians per draw call
- Vision Pro: Account for stereo; effective capacity is half the budget
- Profile regimen: Run Metal System Trace before and after each optimization; track:
- GPU utilization (target 70–85%)
- Shader time (target <10ms)
- Memory bandwidth (target <50GB/s)
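The "cap transitions to 2 per frame" rule above is just a bounded queue drain. A minimal sketch of that throttle (the transition labels and `drain_transitions` helper are illustrative; real transitions would swap gaussian subsets and blend opacity):

```python
from collections import deque

def drain_transitions(pending, per_frame_cap=2):
    """Apply at most `per_frame_cap` LOD transitions this frame; the rest
    stay queued so a burst of transitions cannot spike one frame's time."""
    applied = []
    while pending and len(applied) < per_frame_cap:
        applied.append(pending.popleft())
    return applied

queue = deque(["chunk_a_up", "chunk_b_down", "chunk_c_up"])
frame1 = drain_transitions(queue)  # first two transitions this frame
frame2 = drain_transitions(queue)  # the remaining one next frame
```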
## Key Metrics

| Metric | Target | How to Measure |
|---|---|---|
| Frame time | 16.6ms (60fps) | Metal System Trace |
| GPU memory | < device budget | Xcode Memory Graph |
| Bandwidth | < 50GB/s | GPU Counters |
| Shader time | < 10ms | GPU Frame Capture |
## Reference Implementation

MetalSplatter is the primary reference for Swift/Metal gaussian splatting.

### Getting Started with MetalSplatter

```bash
git clone https://github.com/scier/MetalSplatter.git
cd MetalSplatter
open SampleApp/MetalSplatter_SampleApp.xcodeproj
# Set to Release scheme for best performance
```
## Resources

### Reference Documentation

- references/pruning-strategies.md
- references/lod-schemes.md
- references/compression.md
- references/metal-profiling.md

### Research Papers

### Apple Developer Resources