Phi-4 LLM interaction skill for generating text completions via Ollama API.
This skill provides a Python wrapper for interacting with Ollama's REST API to generate text completions using the Phi-4 model (14B parameters, 16K context window). It handles timeouts, retries, and structured logging for all LLM operations.
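For orientation, the request the wrapper ultimately issues corresponds to Ollama's public `/api/generate` endpoint. Below is a minimal raw-`requests` sketch of that call; it is illustrative only, not the skill's internal code:

```python
# Illustrative sketch of the raw Ollama REST call the wrapper abstracts away.
# Endpoint and payload follow Ollama's public /api/generate API.
import requests

payload = {
    "model": "phi4:14b",
    "prompt": "Summarize the following clinical note: ...",
    "stream": False,                  # return one JSON object instead of a stream
    "options": {"temperature": 0.1},  # sampling options passed to the model
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["response"])        # generated text
```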
Use this skill when you need to:

- Generate text completions with the Phi-4 model through a local Ollama server
- Run deterministic, low-temperature LLM tasks such as summarizing clinical notes
- Check whether the Ollama server is reachable before dispatching work
IMPORTANT: This skill has its own isolated virtual environment (.venv) managed by uv. Do NOT use system Python.
Initialize the skill's environment:
```bash
# From the skill directory
cd .agent/skills/ollama-client
uv sync  # Creates .venv and installs dependencies from pyproject.toml
```
Dependencies are in pyproject.toml:
- `requests` - HTTP client for the Ollama API

CRITICAL: Always use `uv run` to execute code with this skill's .venv, NOT system Python.
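For example, a one-off check from the skill directory (the one-liner itself is illustrative):

```bash
# Run code through the skill's own .venv rather than system Python
cd .agent/skills/ollama-client
uv run python -c "from ollama_client import OllamaClient; print(OllamaClient().is_available())"
```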
```python
# From the .agent/skills/ollama-client/ directory
# Run with: uv run python -c "..."
from ollama_client import OllamaClient

# Initialize client
client = OllamaClient(
    host="http://localhost:11434",  # Default from OLLAMA_HOST env var
    model="phi4:14b",               # Default from OLLAMA_MODEL env var
    timeout=300,                    # 5 minutes default
)

# Generate completion
result = client.generate(
    prompt="Summarize the following clinical note: ...",
    temperature=0.1,          # Low temperature for deterministic outputs
    max_tokens=1000,          # Optional token limit
    stop_sequences=["END"],   # Optional stop sequences
)

print(result["response"])
print(f"Execution time: {result['execution_time_ms']}ms")
```
Configure via environment variables:

```python
import os

# Set in .env or docker-compose.yml
os.environ['OLLAMA_HOST'] = 'http://localhost:11434'
os.environ['OLLAMA_MODEL'] = 'phi4:14b'

# Client uses env vars automatically
client = OllamaClient()
```
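Outside of Python, the same two variables can be declared in a `.env` file or a docker-compose environment block; an illustrative `.env` using the documented defaults:

```
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=phi4:14b
```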
When importing this skill from agents or other code:
```python
import sys
from pathlib import Path

# Add skill to path (use relative path from your location)
skill_path = Path(__file__).parent.parent.parent / ".agent/skills/ollama-client"
sys.path.insert(0, str(skill_path))

from ollama_client import OllamaClient

client = OllamaClient()

# Check if Ollama server is accessible
if client.is_available():
    print("Ollama server is healthy")
else:
    print("Ollama server unavailable")
```
Environment Variables:
- `OLLAMA_HOST`: Server URL (default: http://localhost:11434)
- `OLLAMA_MODEL`: Model name (default: phi4:14b)

Parameters:
- `temperature`: Sampling temperature (0.0-1.0, default: 0.1 for deterministic outputs)
- `max_tokens`: Maximum tokens to generate (optional)
- `stop_sequences`: List of strings to stop generation (optional)
- `timeout`: Request timeout in seconds (default: 300)

The skill raises exceptions for failed operations such as timeouts and an unreachable Ollama server.
All errors include execution time for debugging.
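Because failures surface as exceptions, callers typically wrap `generate()` defensively. A minimal sketch reusing the `client` from the examples above (the broad `except` is an assumption; adjust it to the exception classes defined in ollama_client.py):

```python
# Illustrative error handling around the skill's generate() call.
try:
    result = client.generate(prompt="Summarize the following clinical note: ...")
    print(result["response"])
except Exception as exc:  # e.g. a timeout or unreachable Ollama server
    print(f"LLM call failed: {exc}")
```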
Use `temperature=0.1` for clinical tasks requiring consistency.

Agents use this skill for all LLM operations.
See ollama_client.py for the full Python implementation.