# GoldenMatch

Find and remove duplicate records from CSV files. Deduplicate customer lists, merge patient records, clean CRM data. Works with any CSV: auto-detects columns, no config needed. 30 tools: deduplicate…

## Quick Start

```bash
# Connect this server (installs CLI if needed)
npx -y @smithery/cli@latest mcp add benzsevern/goldenmatch

# Browse available tools
npx -y @smithery/cli@latest tool list benzsevern/goldenmatch

# Get full schema for a tool
npx -y @smithery/cli@latest tool get benzsevern/goldenmatch analyze_data

# Call a tool
npx -y @smithery/cli@latest tool call benzsevern/goldenmatch analyze_data '{}'
```

## Direct MCP Connection

Endpoint: `https://goldenmatch--benzsevern.run.tools`
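
If you would rather talk to the endpoint without the Smithery CLI, the sketch below shows what the first message of an MCP session could look like over HTTP. It assumes the endpoint speaks the MCP Streamable HTTP transport and accepts a JSON-RPC `initialize` request. The protocol version, the `clientInfo` values, and the `mcp_initialize` helper name are illustrative assumptions, and the server may also require authentication that is not shown here.

```bash
#!/usr/bin/env bash
set -euo pipefail

ENDPOINT="https://goldenmatch--benzsevern.run.tools"

# JSON-RPC 2.0 "initialize" request, the first message of an MCP session.
# protocolVersion and clientInfo are placeholder assumptions, not values from this README.
INIT_PAYLOAD='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"example-client","version":"0.0.1"}}}'

# Hypothetical helper: POST the payload to the endpoint.
# The Accept header lists both response types a Streamable HTTP server may use.
mcp_initialize() {
  curl -sS -X POST "$ENDPOINT" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -d "$INIT_PAYLOAD"
}

# Print the payload for inspection. To send it, call mcp_initialize
# (needs network access and possibly credentials).
echo "$INIT_PAYLOAD"
```

Most MCP clients perform this handshake for you once they are given the endpoint URL, so the manual request is mainly useful for checking connectivity.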

## Tools (27)

- `analyze_data` — Profile data, detect domain, recommend an entity resolution (ER) strategy
- `auto_configure` — Generate optimal matching config from data analysis
- `agent_deduplicate` — Run full ER pipeline with confidence gating and reasoning
- `agent_match_sources` — Match two files with intelligent strategy selection
- `agent_explain_pair` — Natural language explanation for a record pair
- `agent_explain_cluster` — Explain why records are in the same cluster
- `agent_review_queue` — Get borderline pairs awaiting approval
- `agent_approve_reject` — Approve or reject a review queue pair
- `agent_compare_strategies` — Compare ER strategies on your data
- `suggest_pprl` — Check if data needs privacy-preserving matching
- `get_stats` — Get dataset statistics: record count, cluster count, match rate, cluster sizes.
- `find_duplicates` — Find duplicate matches for a record. Provide field values to search against the loaded dataset.
- `explain_match` — Explain why two records match or don't match. Shows per-field score breakdown.
- `list_clusters` — List duplicate clusters found in the dataset. Returns cluster IDs, sizes, and member counts.
- `get_cluster` — Get details of a specific cluster: all member records and their field values.
- `get_golden_record` — Get the merged golden (canonical) record for a cluster.
- `match_record` — Match a single record against the loaded dataset in real-time. Paste a record's fields and instantly see if it matches …
- `unmerge_record` — Remove a record from its cluster. The record becomes a singleton. Remaining cluster members are re-clustered using stor…
- `shatter_cluster` — Break an entire cluster into individual records. All members become singletons. Use when a cluster is completely wrong.
- `suggest_config` — Analyze bad merges and suggest config changes. Provide examples of incorrect merges (pairs that should NOT have matched…
- `profile_data` — Get data quality profile: column types, null rates, unique counts, sample values.
- `export_results` — Export matching results to a file (CSV or JSON).
- `list_domains` — List available domain extraction rulebooks (built-in + user-defined).
- `create_domain` — Create a custom domain extraction rulebook. Define patterns for a specific data domain (medical devices, automotive par…
- `test_domain` — Test a domain extraction rulebook against sample records. Shows what features would be extracted from the loaded data.
- `pprl_auto_config` — Analyze the loaded dataset and recommend optimal PPRL (privacy-preserving record linkage) configuration. Returns recomm…
- `pprl_link` — Run privacy-preserving record linkage between two parties' data. Computes bloom filters, matches records without sharin…

```bash
# Get full input/output schema for a tool
npx -y @smithery/cli@latest tool get benzsevern/goldenmatch <tool-name>
```
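
To inspect several tools at once, the schema command above can be wrapped in a loop. This is a minimal sketch: the `schema_cmd` helper, the `DRY_RUN` switch, and the `.schema.txt` file names are illustrative choices, not part of the CLI. By default it only prints the commands it would run.

```bash
#!/usr/bin/env bash
set -euo pipefail

SERVER="benzsevern/goldenmatch"
# A few tool names taken from the list above; extend as needed.
TOOLS=(analyze_data auto_configure agent_deduplicate)

# Hypothetical helper: print the schema command for one tool.
schema_cmd() {
  printf 'npx -y @smithery/cli@latest tool get %s %s\n' "$SERVER" "$1"
}

for tool in "${TOOLS[@]}"; do
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (default): show the command without contacting the server.
    schema_cmd "$tool"
  else
    # Real run: save whatever the CLI prints to a per-tool file.
    npx -y @smithery/cli@latest tool get "$SERVER" "$tool" > "${tool}.schema.txt"
  fi
done
```

Run it with `DRY_RUN=0` to fetch the schemas for real.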
