Use when extracting entities and relationships, building ontologies, compressing large graphs, or analyzing knowledge structures - provides structural equivalence-based compression achieving 57-95%...
Systematic extraction and analysis of entities, relationships, and ontological structures from unstructured text—enhanced with categorical metagraph compression enabling scale-invariant representation through structural equivalence, k-bisimulation summarization, and quotient constructions that preserve query-answering capabilities while achieving dramatic size reductions.
/mnt/skills/user/knowledge-graph/schemas/core_ontology.md for entity/relationship types/mnt/skills/user/knowledge-graph/templates/extraction_template.md# 1. Extract → validate → analyze topology
python scripts/validate_graph.py graph.json
python scripts/analyze_graph.py graph.json --topology
# 2. Compute structural equivalence and compress
python scripts/compress_graph.py graph.json --method k-bisim --k 5
# 3. Verify query preservation
python scripts/verify_compression.py original.json compressed.json --queries reachability,pattern
Structural equivalence enables compression through a precise mechanistic chain:
Equivalence → Redundancy → Quotient → Preservation
Equivalence relations partition structures: Graph automorphisms and categorical isomorphisms identify structurally interchangeable elements—vertices with identical connection patterns to equivalent neighbors belong to the same automorphism orbit
Orbits represent information redundancy: For k vertices in one orbit, (k-1) are informationally redundant since they encode the same structural relationships
Quotient constructions eliminate redundancy: Categorical quotients collapse equivalence classes to single representatives while the universal property guarantees any construction respecting the equivalence factors uniquely through the compressed representation
Functors preserve structure across scales: The quotient functor Q: C → C/R is full and bijective on objects—no essential categorical information is lost
The connection between automorphisms and Kolmogorov complexity:
K(G) ≤ K(G/Aut(G)) + log|Aut(G)| + O(1)
Graphs with large automorphism groups have lower complexity because only one representative from each orbit needs encoding. For highly symmetric structures, compression can reach n/log n factor.
Knowledge graphs exhibit natural structural regularities:
| Pattern | Compression Mechanism | Typical Reduction |
|---|---|---|
| Type hierarchies | Automorphism orbits | 40-60% |
| Repeated subgraphs | k-bisimulation equivalence | 50-80% |
| Community structure | Block quotients | 30-50% |
| Self-similar patterns | Scale-invariant quotients | 60-95% |
Extract entities with confidence scores, provenance tracking, and property attribution:
Key principle: Every extraction must include confidence score and source tracking for auditability.
Identify and classify relationships between entities:
/mnt/skills/user/knowledge-graph/schemas/General domains use core_ontology.md
Coding/software domains additionally use coding_domain.md which adds:
Identify and exploit structural redundancy through automorphism detection:
Automorphism-Based Compression:
# Compute automorphism group
aut_group = compute_automorphisms(graph)
orbits = partition_by_orbits(graph.nodes, aut_group)
# Each orbit → single representative
compressed_nodes = [orbit.canonical_representative() for orbit in orbits]
compression_ratio = len(graph.nodes) / len(compressed_nodes)
Equivalence Types:
Compress graphs while preserving query semantics using k-bisimulation:
Definition: Two nodes are k-bisimilar if they have:
Implementation:
python scripts/compress_graph.py graph.json \
--method k-bisim \
--k 5 \ # k=5 sufficient for most graphs
--preserve-queries reachability,pattern
Empirical Results:
Apply category-theoretic compression with provable structure preservation:
The Universal Property Guarantee:
For any quotient Q: C → C/R, if H: C → D is any functor such that H(f) = H(g) whenever f ~ g in R, then H factors uniquely as H = H' ∘ Q.
This unique factorization means the quotient is the "freest" (most compressed) object respecting the equivalence—any construction built on the original that respects the equivalence can be equivalently built on the quotient.
Skeleton Construction:
# Every category is equivalent to its skeleton
skeleton = compute_skeleton(category)
# skeleton contains exactly one representative per isomorphism class
# All categorical properties preserved (limits, colimits, exactness)
Support edge-of-edge structures for multi-scale representation:
Metagraph Definition: MG = ⟨V, MV, E, ME⟩
Why Metagraphs Enable Scale Invariance:
The edge-of-edge capability creates holonic structure—self-similar patterns where the relationship between a metavertex and its contents mirrors the relationship between the entire metagraph and its top-level components. Automorphisms operate at multiple levels simultaneously, creating compression opportunities at each scale when these automorphism structures are isomorphic across levels.
2-Category Interpretation:
The interchange law ensures scale-independent composition.
Graph Quality Metrics:
| Metric | Formula | Target | Significance |
|---|---|---|---|
| Edge-to-Node Ratio | |E|/|V| | ≥4:1 | Enables emergence through dense connectivity |
| Isolation Rate | |V_isolated|/|V| | <20% | Measures integration completeness |
| Clustering Coefficient | Local triangles/possible triangles | >0.3 | Small-world property indicator |
| Fractal Dimension | d_B from box-covering | Finite | Self-similarity/compressibility |
| Average Path Length | Mean geodesic distance | Low | Information flow efficiency |
Scale-Invariance Indicators:
N_B(l_B) ~ l_B^(-d_B)
Networks with finite fractal dimension d_B are self-similar and can be compressed at multiple resolutions with compression ratio scaling as l^(d_B).
Validation Script:
python scripts/validate_graph.py graph.json --topology --compression-potential
Structural Entropy:
H_s(G) = (n choose 2)h(p) - n·log(n) + O(n)
The term -n·log(n) represents compression gain from removing label information.
Minimum Description Length (MDL):
For graph G and model M:
L(G,M) = L(M) + L(G|M)
Optimal compression minimizes this total description length. Community structure reduces entropy by ~k·log(n) bits for k communities.
Compressibility Predictors:
| Score | Criteria | Example |
|---|---|---|
| 0.9-1.0 | Explicitly stated with clear evidence | "Dr. Jane Smith works for MIT" |
| 0.7-0.89 | Strongly implied by context | Person with @mit.edu email |
| 0.5-0.69 | Reasonably inferred but ambiguous | Co-authorship implies collaboration |
| 0.3-0.49 | Weak inference, requires validation | Similar domain suggests relationship |
| 0.0-0.29 | Speculative, likely incorrect | Pure assumption |
Create stable, meaningful identifiers:
{type}_{normalized_name} (e.g., person_jane_smith, org_mit)Always include:
source_document: Document ID or filenamesource_location: Page number, section, line rangeextraction_timestamp: ISO 8601 formatextractor_version: Skill version identifier# 1. Initial extraction
# (Extract to graph.json)
# 2. Validate and analyze
python scripts/validate_graph.py graph.json
python scripts/analyze_graph.py graph.json --full
# 3. Compute structural equivalence
python scripts/structural_equivalence.py graph.json \
--output equivalence_classes.json \
--method automorphism
# 4. Apply k-bisimulation compression
python scripts/compress_graph.py graph.json \
--equivalence equivalence_classes.json \
--method k-bisim --k 5 \
--output compressed.json
# 5. Verify preservation
python scripts/verify_compression.py graph.json compressed.json \
--queries reachability,pattern,neighborhood
# 6. Generate topology report
python scripts/topology_metrics.py compressed.json --report
Termination criteria:
For complex domains requiring hierarchical representation:
# 1. Extract at multiple granularities
python scripts/extract_hierarchical.py source.txt \
--levels strategic,tactical,operational \
--output metagraph.json
# 2. Compute cross-level automorphisms
python scripts/metagraph_automorphisms.py metagraph.json
# 3. Apply scale-invariant compression
python scripts/compress_metagraph.py metagraph.json \
--preserve-hierarchy \
--output compressed_metagraph.json
Compress while guaranteeing specific query types remain answerable:
# Define query preservation requirements
queries = {
"reachability": True, # 95% reduction possible
"pattern_match": True, # 57% reduction possible
"neighborhood_k": 3, # Preserve 3-hop neighborhoods
}
# Compress with guarantees
compressed = compress_with_guarantees(
graph,
method="k-bisimulation",
k=max(5, queries["neighborhood_k"]),
preserve=queries
)
Maintain compression as graph evolves:
# Update cost: O(Δ·d^k)
# Δ = number of changes
# d = maximum degree
# k = bisimulation depth
def update_compression(compressed_graph, changes):
affected_classes = identify_affected_equivalence_classes(changes)
recompute_local_bisimulation(affected_classes, k=5)
return updated_compressed_graph
Use ologs (ontology logs) for categorical knowledge representation:
# Olog: category where objects = noun phrases, morphisms = verb phrases
olog = {
"objects": ["a person", "an organization", "a concept"],
"morphisms": [
{"source": "a person", "target": "an organization", "label": "works for"},
{"source": "a concept", "target": "a concept", "label": "relates to"}
]
}
# Yoneda embedding: object determined by morphisms into it
# Compression: store relationships, not internal structure
When compression produces unexpected results:
When graph metrics fall outside targets:
/mnt/skills/user/knowledge-graph/
├── SKILL.md # This file
├── schemas/
│ ├── core_ontology.md # Universal entity/relationship types
│ ├── coding_domain.md # Software development extension
│ └── categorical_ontology.md # Category-theoretic type system
├── templates/
│ ├── extraction_template.md # JSON format specification
│ └── metagraph_template.md # Hierarchical metagraph format
└── scripts/
├── validate_graph.py # Quality validation
├── merge_graphs.py # Deduplication and merging
├── analyze_graph.py # Refinement strategy generation
├── compress_graph.py # k-bisimulation compression
├── structural_equivalence.py # Automorphism computation
├── topology_metrics.py # Graph topology analysis
└── verify_compression.py # Query preservation verification
All scripts require Python 3.7+ with standard library only (no external packages for core functionality). Optional NetworkX for advanced topology metrics.
This skill composes naturally with:
mapping:
strategic_level: metagraph_level_0
tactical_level: metagraph_level_1
operational_level: metagraph_level_2
convergence_metrics: compression_ratio, query_preservation
A high-quality extraction with compression demonstrates:
Core Philosophy: Knowledge graphs emerge through iterative refinement—initial extraction establishes structure, topology analysis reveals density gaps, structural equivalence enables compression, and categorical quotients preserve essential relationships while eliminating redundancy. The compression is "lossy but structure-preserving" because categorical equivalence guarantees that compressed representations support all the same inferences as their originals.