url-dump

huytieu/url-dump

Productivity

120

About

SKILL.md

url-dump

huytieu/url-dump

Productivity

120

About

Quick capture URLs with automatic content extraction, insights, and categorization into knowledge booklets

SKILL.md

COG URL Dump Skill

Purpose

Transform raw URLs into structured, insightful knowledge entries through intelligent content extraction, categorization, and integration with the user's knowledge base. Quick capture with automatic insight generation.

When to Invoke

User shares a URL they want to save
User says "save this link", "bookmark this", "url dump", or "save for later"
User pastes a URL and wants to capture it
User wants to organize web resources into their knowledge base

Agent Mode Awareness

Check agent_mode in 00-inbox/MY-PROFILE.md frontmatter:

If agent_mode: team — delegate content extraction, analysis, and categorization to a sub-agent while handling user interaction directly. The sub-agent fetches URL content, generates insights, and returns structured results for filing.
If agent_mode: solo (default) — handle everything directly in the conversation. No delegation.

Pre-Flight Check

Before executing, check for user profile:

Look for 00-inbox/MY-PROFILE.md in the vault

If NOT found:

Welcome to COG! It looks like this is your first time.

Before we start, let's quickly set up your profile (takes 2 minutes).

Would you like to run onboarding first, or should I proceed with default settings?

If found:
- Read the profile to get user's interests and projects
- Use interests to help with auto-categorization
- Check for existing booklet categories in 05-knowledge/booklets/

Process Flow

1. User Interaction & Input Collection

Accept URL(s) from the user (single URL or batch)
Optionally accept user's quick note about why they're saving this
Accept any format: bare URL, markdown link, or with notes

Prompt:

What URL(s) would you like to save?
(You can paste one or more URLs, optionally with a note about why you're saving it)

2. URL Validation & Fetch

Validate URL format
Check if URL is accessible
Detect duplicate URLs in existing knowledge base
Fetch the web page content

Content Extraction

Extract from the page:

Page Title: [extracted-title]
Meta Description: [if available]
Author: [if detected]
Published Date: [if detected]
Word Count: [estimated]
Read Time: [X minutes]
Main Content: [extracted body text]
Key Headings: [list of H1/H2s]

3. Category Selection

Default Categories:

Articles & Blogs: Long-form content, tutorials, opinion pieces
Tools & Resources: Software, utilities, services, APIs
Reference: Documentation, specs, standards
Research: Papers, studies, academic content
Inspiration: Design, ideas, creative references
Videos & Media: YouTube, podcasts, multimedia
News & Updates: Industry news, announcements
Project-Specific: Related to a specific project (offer project list from MY-PROFILE.md)
To Review: Unsure, save for later categorization

Custom Categories:

Check 05-knowledge/booklets/ for existing custom categories
Offer to create new category if needed

Auto-suggestion: Based on content analysis, suggest the most likely category but let user confirm or change.

4. Content Analysis and Processing

Phase 1: Content Classification

Determine:

Content Category: [article|tool|reference|research|video|news|etc]
Primary Topics: [topic1, topic2, topic3]
Tone: [informative|opinion|tutorial|news|etc]
Quality Assessment: [high|medium|low]
Credibility Indicators: [author credentials, citations, etc]

Phase 2: Insight Extraction

Generate:

Executive Summary: [2-3 sentences]
Key Insights:
1. [Insight 1 with context]
2. [Insight 2 with context]
3. [Insight 3 with context]
Notable Quotes: [if any stand out]
Action Items: [practical takeaways]

Phase 3: Relevance Assessment

Analyze:

User Interest Match: [high|medium|low] - [which interests from profile]
Project Relevance: [project-name] - [why relevant]
Knowledge Gap: [yes|no] - [what gap it fills]
Timeliness: [evergreen|current|dated]
Uniqueness: [novel|common|duplicate-adjacent]

Phase 4: Cross-Reference

Identify connections to:

Related Bookmarks: [existing similar saves]
Related Braindumps: [if content connects]
Related Projects: [if applicable]
Suggested Tags: [tag1, tag2, tag3]

5. Generate Structured Output

Create bookmark file with this structure:

---
type: "url-bookmark"
category: "[category-name]"
domain: "[source-domain.com]"
date_saved: "YYYY-MM-DD"
date_accessed: "YYYY-MM-DD HH:MM"
url: "[original-url]"
title: "[page-title]"
author: "[author-if-available]"
published: "[publish-date-if-available]"
tags: ["#bookmark", "#category-tag", "#topic-tags"]
relevance: "[high|medium|low]"
status: "unread"
related_projects: ["project1", "project2"]
confidence: "[high|medium|low]"
---

# [Title]

## Quick Summary
[2-3 sentence summary of the content]

## Key Insights
- **Insight 1:** [description with context]
- **Insight 2:** [description with context]
- **Insight 3:** [description with context]

## Why This Matters
[Connection to user's interests/projects. What makes this worth saving?]

## User Note
[Original user note if provided, otherwise omit section]

## Content Highlights
[Key excerpts or quotes from the content - 200-400 words max]

## Practical Takeaways
- [ ] [Action item 1 if applicable] 📅 [YYYY-MM-DD = date +1 week from today]
- [ ] [Action item 2 if applicable] 📅 [YYYY-MM-DD = date +1 week from today]

## Related Knowledge
- **Similar Bookmarks:** [[bookmark1]], [[bookmark2]]
- **Connected Projects:** [[project1]]
- **Related Notes:** [[note1]], [[note2]]

## Source Details
| Field | Value |
|-------|-------|
| Domain | [domain] |
| Author | [author or "Unknown"] |
| Published | [date or "Unknown"] |
| Word Count | [~X words] |
| Read Time | [~X minutes] |

## Processing Notes
- **Extracted:** [timestamp]
- **Category Confidence:** [percentage]
- **Review Needed:** [yes|no] - [reason if yes]

---

*Processed by COG URL Curator*

Save to appropriate location:

Standard: 05-knowledge/booklets/[category-slug]/[title-slug]-YYYY-MM-DD.md
Project-specific: 04-projects/[project-slug]/resources/[title-slug]-YYYY-MM-DD.md
Mixed/Unclear: 00-inbox/url-[title-slug]-YYYY-MM-DD.md

6. Tool/Resource Special Handling

For tools and software, use enhanced template:

---
type: "url-tool"
category: "tools"
domain: "[domain]"
url: "[url]"
title: "[tool-name]"
date_saved: "YYYY-MM-DD"
pricing: "[free|freemium|paid|enterprise]"
tags: ["#tool", "#category-tags"]
status: "to-evaluate"
---

# [Tool Name]

## What It Does
[1-2 sentence description]

## Key Features
- Feature 1
- Feature 2
- Feature 3

## Use Cases
- Use case 1
- Use case 2

## Pricing
[Pricing details if available]

## Why It's Relevant
[Connection to user's work/interests]

## Evaluation Status
- [ ] Sign up / try demo 📅 [YYYY-MM-DD = date +3 days from today]
- [ ] Test key features 📅 [YYYY-MM-DD = date +1 week from today]
- [ ] Compare with alternatives 📅 [YYYY-MM-DD = date +1 week from today]
- [ ] Decision: [use|pass|revisit] 📅 [YYYY-MM-DD = date +2 weeks from today]

## Notes
[Space for user's evaluation notes]

---

*Processed by COG URL Curator*

7. Batch Processing

For multiple URLs:

Processing [X] URLs...

1. [URL 1] → [category] → Saved to [path]
2. [URL 2] → [category] → Saved to [path]
3. [URL 3] → [category] → Saved to [path]

Summary:
- Articles: 2 saved
- Tools: 1 saved
- Total: 3 URLs processed

8. Confirm Completion

Confirm file(s) created
Show user: "URL saved to [file path]"
Show quick summary: title, category, key insight preview
Ask if they want to:
- Add another URL
- Deep-dive into the content
- Connect to specific project or braindump

Booklet Structure

URLs are organized into "booklets" (category folders):

05-knowledge/
└── booklets/
    ├── articles/
    │   ├── _index.md (category overview - auto-created)
    │   └── [article-entries].md
    ├── tools/
    │   ├── _index.md
    │   └── [tool-entries].md
    ├── reference/
    │   ├── _index.md
    │   └── [reference-entries].md
    ├── research/
    │   ├── _index.md
    │   └── [research-entries].md
    ├── inspiration/
    │   ├── _index.md
    │   └── [inspiration-entries].md
    ├── videos/
    │   ├── _index.md
    │   └── [video-entries].md
    └── [custom-category]/
        ├── _index.md
        └── [entries].md

Category Index Template

When creating a new category, also create an index file:

---
type: "booklet-index"
category: "[category-name]"
created: "YYYY-MM-DD"
last_updated: "YYYY-MM-DD"
entry_count: 0
---

# [Category Name] Booklet

## Description
[What this category contains]

## Recent Additions
[Auto-updated list - most recent 10 entries]

## Top Entries
[Manually curated or most-accessed entries]

## Tags in This Category
[List of common tags used]

## Related Categories
- [[other-category-1]]
- [[other-category-2]]

YAML Formatting Requirements

CRITICAL: All YAML frontmatter must use proper Obsidian-compatible formatting:

All string values MUST be quoted with double quotes
Arrays MUST use quoted strings: ["item1", "item2", "item3"]
URLs MUST be quoted to handle special characters
Boolean values should NOT be quoted: true or false
Ensure proper YAML syntax to prevent parsing errors in Obsidian

Examples:

# CORRECT
type: "url-bookmark"
url: "https://example.com/path?query=value"
tags: ["#bookmark", "#article", "#ai"]
relevance: "high"
reviewed: false

# INCORRECT
type: url-bookmark
url: https://example.com/path?query=value
tags: [#bookmark, #article, #ai]
relevance: high
reviewed: "false"

Verification Protocols

Content Accuracy

Title Verification: Ensure extracted title matches page
Author Attribution: Verify author if stated
Date Accuracy: Confirm publication date if shown
Summary Fidelity: Ensure summary accurately represents content

Categorization Verification

Category Fit: Confirm content matches selected category
Tag Relevance: Verify tags accurately describe content
Interest Alignment: Confirm relevance assessment is accurate
Project Connection: Verify project relevance if claimed

Quality Checks

Completeness: All required fields populated
Formatting: Proper markdown and YAML syntax
Links: All internal links valid
Metadata: Frontmatter properly formatted

Uncertainty Handling

When Content is Unclear

Paywalled Content: Note limitation, extract available preview
Dynamic Content: Note if content may change
Complex Content: Flag for manual review if needed
Non-English: Note language, provide translation if possible

Confidence Indicators

High Confidence (90%+): Clear content with obvious categorization
Medium Confidence (70-89%): Generally clear with some ambiguity
Low Confidence (50-69%): Significant ambiguity requiring user input
Very Low Confidence (<50%): Major uncertainty, save to inbox

Always explicitly state confidence levels and reasoning in processing notes.

Integration with Other Skills

Immediate Follow-up

After URL capture, suggest:

/braindump - Capture thoughts about the URL
/knowledge-consolidation - Integrate into knowledge frameworks
Daily brief will surface relevant saved URLs

Cross-Referencing

Automatically check for connections to:

Active projects (from MY-PROFILE.md)
Recent braindumps
Competitive watchlist companies (if exists)
User interests

Success Metrics

Speed of capture (< 30 seconds for single URL)
Accurate categorization with user confirmation
Useful insight extraction
Proper integration with existing knowledge
Easy retrieval and discovery later
High confidence in extractions

Learning and Adaptation

Pattern Learning

Track which bookmarks get revisited
Learn user's categorization preferences
Improve relevance scoring based on engagement
Refine insight extraction based on what user finds useful

Continuous Improvement

Monitor categorization accuracy over time
Adapt to user's preferred tag taxonomy
Learn domain-specific terminology
Improve cross-referencing accuracy

About

SKILL.md

About

Quick capture URLs with automatic content extraction, insights, and categorization into knowledge booklets

SKILL.md

COG URL Dump Skill

Purpose

When to Invoke

User shares a URL they want to save
User says "save this link", "bookmark this", "url dump", or "save for later"
User pastes a URL and wants to capture it
User wants to organize web resources into their knowledge base

Agent Mode Awareness

Check agent_mode in 00-inbox/MY-PROFILE.md frontmatter:

If agent_mode: team — delegate content extraction, analysis, and categorization to a sub-agent while handling user interaction directly. The sub-agent fetches URL content, generates insights, and returns structured results for filing.
If agent_mode: solo (default) — handle everything directly in the conversation. No delegation.

Pre-Flight Check

Before executing, check for user profile:

Look for 00-inbox/MY-PROFILE.md in the vault

If NOT found:

Welcome to COG! It looks like this is your first time.

Before we start, let's quickly set up your profile (takes 2 minutes).

Would you like to run onboarding first, or should I proceed with default settings?

If found:
- Read the profile to get user's interests and projects
- Use interests to help with auto-categorization
- Check for existing booklet categories in 05-knowledge/booklets/

Process Flow

1. User Interaction & Input Collection

Accept URL(s) from the user (single URL or batch)
Optionally accept user's quick note about why they're saving this
Accept any format: bare URL, markdown link, or with notes

Prompt:

What URL(s) would you like to save?
(You can paste one or more URLs, optionally with a note about why you're saving it)

2. URL Validation & Fetch

Validate URL format
Check if URL is accessible
Detect duplicate URLs in existing knowledge base
Fetch the web page content

Content Extraction

Extract from the page:

Page Title: [extracted-title]
Meta Description: [if available]
Author: [if detected]
Published Date: [if detected]
Word Count: [estimated]
Read Time: [X minutes]
Main Content: [extracted body text]
Key Headings: [list of H1/H2s]

3. Category Selection

Default Categories:

Articles & Blogs: Long-form content, tutorials, opinion pieces
Tools & Resources: Software, utilities, services, APIs
Reference: Documentation, specs, standards
Research: Papers, studies, academic content
Inspiration: Design, ideas, creative references
Videos & Media: YouTube, podcasts, multimedia
News & Updates: Industry news, announcements
Project-Specific: Related to a specific project (offer project list from MY-PROFILE.md)
To Review: Unsure, save for later categorization

Custom Categories:

Check 05-knowledge/booklets/ for existing custom categories
Offer to create new category if needed

Auto-suggestion: Based on content analysis, suggest the most likely category but let user confirm or change.

4. Content Analysis and Processing

Phase 1: Content Classification

Determine:

Content Category: [article|tool|reference|research|video|news|etc]
Primary Topics: [topic1, topic2, topic3]
Tone: [informative|opinion|tutorial|news|etc]
Quality Assessment: [high|medium|low]
Credibility Indicators: [author credentials, citations, etc]

Phase 2: Insight Extraction

Generate:

Executive Summary: [2-3 sentences]
Key Insights:
1. [Insight 1 with context]
2. [Insight 2 with context]
3. [Insight 3 with context]
Notable Quotes: [if any stand out]
Action Items: [practical takeaways]

Phase 3: Relevance Assessment

Analyze:

User Interest Match: [high|medium|low] - [which interests from profile]
Project Relevance: [project-name] - [why relevant]
Knowledge Gap: [yes|no] - [what gap it fills]
Timeliness: [evergreen|current|dated]
Uniqueness: [novel|common|duplicate-adjacent]

Phase 4: Cross-Reference

Identify connections to:

Related Bookmarks: [existing similar saves]
Related Braindumps: [if content connects]
Related Projects: [if applicable]
Suggested Tags: [tag1, tag2, tag3]

5. Generate Structured Output

Create bookmark file with this structure:

---
type: "url-bookmark"
category: "[category-name]"
domain: "[source-domain.com]"
date_saved: "YYYY-MM-DD"
date_accessed: "YYYY-MM-DD HH:MM"
url: "[original-url]"
title: "[page-title]"
author: "[author-if-available]"
published: "[publish-date-if-available]"
tags: ["#bookmark", "#category-tag", "#topic-tags"]
relevance: "[high|medium|low]"
status: "unread"
related_projects: ["project1", "project2"]
confidence: "[high|medium|low]"
---

# [Title]

## Quick Summary
[2-3 sentence summary of the content]

## Key Insights
- **Insight 1:** [description with context]
- **Insight 2:** [description with context]
- **Insight 3:** [description with context]

## Why This Matters
[Connection to user's interests/projects. What makes this worth saving?]

## User Note
[Original user note if provided, otherwise omit section]

## Content Highlights
[Key excerpts or quotes from the content - 200-400 words max]

## Practical Takeaways
- [ ] [Action item 1 if applicable] 📅 [YYYY-MM-DD = date +1 week from today]
- [ ] [Action item 2 if applicable] 📅 [YYYY-MM-DD = date +1 week from today]

## Related Knowledge
- **Similar Bookmarks:** [[bookmark1]], [[bookmark2]]
- **Connected Projects:** [[project1]]
- **Related Notes:** [[note1]], [[note2]]

## Source Details
| Field | Value |
|-------|-------|
| Domain | [domain] |
| Author | [author or "Unknown"] |
| Published | [date or "Unknown"] |
| Word Count | [~X words] |
| Read Time | [~X minutes] |

## Processing Notes
- **Extracted:** [timestamp]
- **Category Confidence:** [percentage]
- **Review Needed:** [yes|no] - [reason if yes]

---

*Processed by COG URL Curator*

Save to appropriate location:

Standard: 05-knowledge/booklets/[category-slug]/[title-slug]-YYYY-MM-DD.md
Project-specific: 04-projects/[project-slug]/resources/[title-slug]-YYYY-MM-DD.md
Mixed/Unclear: 00-inbox/url-[title-slug]-YYYY-MM-DD.md

6. Tool/Resource Special Handling

For tools and software, use enhanced template:

---
type: "url-tool"
category: "tools"
domain: "[domain]"
url: "[url]"
title: "[tool-name]"
date_saved: "YYYY-MM-DD"
pricing: "[free|freemium|paid|enterprise]"
tags: ["#tool", "#category-tags"]
status: "to-evaluate"
---

# [Tool Name]

## What It Does
[1-2 sentence description]

## Key Features
- Feature 1
- Feature 2
- Feature 3

## Use Cases
- Use case 1
- Use case 2

## Pricing
[Pricing details if available]

## Why It's Relevant
[Connection to user's work/interests]

## Evaluation Status
- [ ] Sign up / try demo 📅 [YYYY-MM-DD = date +3 days from today]
- [ ] Test key features 📅 [YYYY-MM-DD = date +1 week from today]
- [ ] Compare with alternatives 📅 [YYYY-MM-DD = date +1 week from today]
- [ ] Decision: [use|pass|revisit] 📅 [YYYY-MM-DD = date +2 weeks from today]

## Notes
[Space for user's evaluation notes]

---

*Processed by COG URL Curator*

7. Batch Processing

For multiple URLs:

Processing [X] URLs...

1. [URL 1] → [category] → Saved to [path]
2. [URL 2] → [category] → Saved to [path]
3. [URL 3] → [category] → Saved to [path]

Summary:
- Articles: 2 saved
- Tools: 1 saved
- Total: 3 URLs processed

8. Confirm Completion

Confirm file(s) created
Show user: "URL saved to [file path]"
Show quick summary: title, category, key insight preview
Ask if they want to:
- Add another URL
- Deep-dive into the content
- Connect to specific project or braindump

Booklet Structure

URLs are organized into "booklets" (category folders):

05-knowledge/
└── booklets/
    ├── articles/
    │   ├── _index.md (category overview - auto-created)
    │   └── [article-entries].md
    ├── tools/
    │   ├── _index.md
    │   └── [tool-entries].md
    ├── reference/
    │   ├── _index.md
    │   └── [reference-entries].md
    ├── research/
    │   ├── _index.md
    │   └── [research-entries].md
    ├── inspiration/
    │   ├── _index.md
    │   └── [inspiration-entries].md
    ├── videos/
    │   ├── _index.md
    │   └── [video-entries].md
    └── [custom-category]/
        ├── _index.md
        └── [entries].md

Category Index Template

When creating a new category, also create an index file:

---
type: "booklet-index"
category: "[category-name]"
created: "YYYY-MM-DD"
last_updated: "YYYY-MM-DD"
entry_count: 0
---

# [Category Name] Booklet

## Description
[What this category contains]

## Recent Additions
[Auto-updated list - most recent 10 entries]

## Top Entries
[Manually curated or most-accessed entries]

## Tags in This Category
[List of common tags used]

## Related Categories
- [[other-category-1]]
- [[other-category-2]]

YAML Formatting Requirements

CRITICAL: All YAML frontmatter must use proper Obsidian-compatible formatting:

All string values MUST be quoted with double quotes
Arrays MUST use quoted strings: ["item1", "item2", "item3"]
URLs MUST be quoted to handle special characters
Boolean values should NOT be quoted: true or false
Ensure proper YAML syntax to prevent parsing errors in Obsidian

Examples:

# CORRECT
type: "url-bookmark"
url: "https://example.com/path?query=value"
tags: ["#bookmark", "#article", "#ai"]
relevance: "high"
reviewed: false

# INCORRECT
type: url-bookmark
url: https://example.com/path?query=value
tags: [#bookmark, #article, #ai]
relevance: high
reviewed: "false"

Verification Protocols

Content Accuracy

Title Verification: Ensure extracted title matches page
Author Attribution: Verify author if stated
Date Accuracy: Confirm publication date if shown
Summary Fidelity: Ensure summary accurately represents content

Categorization Verification

Category Fit: Confirm content matches selected category
Tag Relevance: Verify tags accurately describe content
Interest Alignment: Confirm relevance assessment is accurate
Project Connection: Verify project relevance if claimed

Quality Checks

Completeness: All required fields populated
Formatting: Proper markdown and YAML syntax
Links: All internal links valid
Metadata: Frontmatter properly formatted

Uncertainty Handling

When Content is Unclear

Paywalled Content: Note limitation, extract available preview
Dynamic Content: Note if content may change
Complex Content: Flag for manual review if needed
Non-English: Note language, provide translation if possible

Confidence Indicators

High Confidence (90%+): Clear content with obvious categorization
Medium Confidence (70-89%): Generally clear with some ambiguity
Low Confidence (50-69%): Significant ambiguity requiring user input
Very Low Confidence (<50%): Major uncertainty, save to inbox

Always explicitly state confidence levels and reasoning in processing notes.

Integration with Other Skills

Immediate Follow-up

After URL capture, suggest:

/braindump - Capture thoughts about the URL
/knowledge-consolidation - Integrate into knowledge frameworks
Daily brief will surface relevant saved URLs

Cross-Referencing

Automatically check for connections to:

Active projects (from MY-PROFILE.md)
Recent braindumps
Competitive watchlist companies (if exists)
User interests

Success Metrics

Speed of capture (< 30 seconds for single URL)
Accurate categorization with user confirmation
Useful insight extraction
Proper integration with existing knowledge
Easy retrieval and discovery later
High confidence in extractions