Create, read, edit, convert, repair, or inspect OpenDocument Text files (.odt).
An .odt file is an OpenDocument ZIP package. Important package files:
- mimetype - should be the first ZIP entry, stored uncompressed, and contain application/vnd.oasis.opendocument.text
- content.xml - document body
- styles.xml - named styles and page layout
- meta.xml - document metadata
- settings.xml - application settings
- META-INF/manifest.xml - package manifest

| Task | Preferred approach |
|---|---|
| Extract text and structure | Use scripts/extract_text.py or parse content.xml |
| Create styled/template document | Start from an .odt template, preserve styles/page layout, edit XML |
| Create simple structured document | Generate ODT package XML directly |
| Convert Markdown/HTML/DOCX to ODT | Use Pandoc/LibreOffice only when the source already exists or interoperability requires it |
| Preserve complex formatting | Unpack the ODT, edit XML with a structured XML parser, then repack carefully |
| Inspect raw structure | unzip -l file.odt; python -m zipfile -e file.odt unpacked/ |
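
The package rules listed above can be checked without extra tools. A minimal sketch using Python's standard-library zipfile module; check_odt_package is a hypothetical helper (not one of the bundled scripts), and the set of members it treats as required is this sketch's own assumption:

```python
import io
import zipfile

# Content the mimetype entry must hold for a text document.
EXPECTED_MIMETYPE = b"application/vnd.oasis.opendocument.text"
# Members this sketch treats as required (an assumption, not the full ODF rule set).
REQUIRED_MEMBERS = ("content.xml", "META-INF/manifest.xml")


def check_odt_package(source):
    """Return a list of packaging problems; an empty list means none were found.

    `source` may be a path, a binary file object, or the raw package bytes.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    problems = []
    with zipfile.ZipFile(source) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("mimetype is not the first ZIP entry")
        else:
            first = infos[0]
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype is compressed")
            if zf.read(first) != EXPECTED_MIMETYPE:
                problems.append("unexpected mimetype content")
        names = set(zf.namelist())
        for member in REQUIRED_MEMBERS:
            if member not in names:
                problems.append(f"missing {member}")
    return problems
```

Usage: check_odt_package("file.odt") returns an empty list for a well-formed package.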
Before starting a real ODT task, check which tools are available:
which pandoc
python3 -c "import odf; print('odfpy available')"
Resolve the LibreOffice command as described in docs/soffice-resolver.md.
For plain text and semantic structure:
pandoc input.odt -t markdown -o output.md
For raw package inspection:
python -m zipfile -e input.odt unpacked_odt
Read content.xml with an XML parser. Do not use regex for XML edits.
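As an illustration of parser-based reading, the sketch below lists the headings in content.xml with xml.etree.ElementTree, using the standard ODF office and text namespace URIs. heading_outline is a hypothetical helper; the bundled extract_text.py covers far more (lists, tables, notes):

```python
import xml.etree.ElementTree as ET
import zipfile

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def read_content_xml(odt_path):
    """Return the raw bytes of content.xml from an .odt package."""
    with zipfile.ZipFile(odt_path) as zf:
        return zf.read("content.xml")


def heading_outline(content_xml):
    """Return [(level, text), ...] for every text:h in office:text, in document order."""
    root = ET.fromstring(content_xml)
    body = root.find(f"{{{OFFICE_NS}}}body/{{{OFFICE_NS}}}text")
    if body is None:
        return []
    outline = []
    for h in body.iter(f"{{{TEXT_NS}}}h"):
        # A missing text:outline-level is treated as level 1 in this sketch.
        level = int(h.get(f"{{{TEXT_NS}}}outline-level", "1"))
        # itertext() joins text split across text:span and similar children.
        outline.append((level, "".join(h.itertext())))
    return outline
```

Usage: heading_outline(read_content_xml("input.odt")).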
Bundled scripts for common inspection tasks:
# Extract headings, paragraphs, lists, tables, and footnotes.
python scripts/extract_text.py input.odt
python scripts/extract_text.py input.odt --json
# Inspect package files, media references, styles, tables, and document structure.
python scripts/inspect_package.py input.odt
For the full script reference, see docs/script-reference.md.
An ODT document normally stores body content in content.xml under office:document-content → office:body → office:text.
Important body elements:
- text:h - real headings; text:outline-level controls hierarchy
- text:p - paragraphs, including styled body text
- text:list, text:list-item - real lists
- table:table, table:table-row, table:table-cell - tables
- text:section - named document sections
- text:note - footnotes/endnotes with citation and body content
- draw:frame + draw:image - embedded or linked images
- text:bookmark, text:reference-mark, text:table-of-content - references and generated structures when present

Styles are name-based. Common references include:

- text:style-name for paragraphs/headings/lists
- draw:style-name for image frames or shapes
- table:style-name and table:default-cell-style-name for tables
- style:name in styles.xml and automatic styles in content.xml

Headers, footers, page styles, and page layout are usually in styles.xml, not the main body. When changing page size, margins, headers, footers, or numbering, inspect style:master-page, style:page-layout, style:header, and style:footer.
ODT is an XML package and can be generated directly. Do not default to DOCX/Markdown as an intermediate when the deliverable is natively ODT.
Choose the creation path by fidelity needs:
| Scenario | Use |
|---|---|
| Institutional letter/report with exact styles, header/footer, page layout | Template-first ODT |
| Rich prose — headings, bold/italic, links, lists, tables, footnotes | Markdown authoring (create_from_markdown.py) |
| Simple generated memo/report/protocol | Direct ODT XML generation |
| Existing HTML/DOCX source or explicit cross-format conversion | Pandoc/LibreOffice conversion fallback |
Template-first ODT: use this when layout, styles, headers, footers, or institutional formatting matter.

- Reuse the template's styles.xml for paragraph, text, table, page, header, and footer styles.
- Edit content.xml for placeholder paragraphs, sections, tables, and image frames.
- Put new images in Pictures/ and update META-INF/manifest.xml.

Direct ODT XML generation: use this for simple structured documents. Generate a minimal package with:

- mimetype
- content.xml
- styles.xml
- meta.xml
- settings.xml
- META-INF/manifest.xml
- Pictures/...

Minimum body structure:
- office:body
  - office:text
    - text:h text:outline-level="1"
    - text:p
    - text:list
    - table:table
Keep direct generation deliberately small: headings, paragraphs, lists, simple tables, images, and footnotes. Add advanced fields, tracked changes, indexes, or generated tables of contents only when the task requires them and QA confirms they survive LibreOffice rendering.
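A minimal sketch of direct generation with only the Python standard library, assuming a document made of one heading plus plain paragraphs. It writes just mimetype, content.xml, and META-INF/manifest.xml (styles.xml, meta.xml, and settings.xml from the list above are left out for brevity), so run the QA steps before relying on the output. build_minimal_odt is a hypothetical name, not the bundled create_minimal_odt.py:

```python
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "manifest": "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0",
}
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)  # serialize as office:, text:, manifest:

MIMETYPE = "application/vnd.oasis.opendocument.text"


def q(prefix, local):
    """Clark-notation name, e.g. q("text", "p") -> "{urn:...:text:1.0}p"."""
    return f"{{{NS[prefix]}}}{local}"


def build_content(title, paragraphs):
    root = ET.Element(q("office", "document-content"), {q("office", "version"): "1.2"})
    body_text = ET.SubElement(ET.SubElement(root, q("office", "body")), q("office", "text"))
    heading = ET.SubElement(body_text, q("text", "h"), {q("text", "outline-level"): "1"})
    heading.text = title
    for para in paragraphs:
        # ElementTree escapes &, <, > when serializing.
        ET.SubElement(body_text, q("text", "p")).text = para
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_manifest(xml_members):
    root = ET.Element(q("manifest", "manifest"), {q("manifest", "version"): "1.2"})
    ET.SubElement(root, q("manifest", "file-entry"), {
        q("manifest", "full-path"): "/",
        q("manifest", "version"): "1.2",
        q("manifest", "media-type"): MIMETYPE,
    })
    for member in xml_members:
        ET.SubElement(root, q("manifest", "file-entry"), {
            q("manifest", "full-path"): member,
            q("manifest", "media-type"): "text/xml",
        })
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_minimal_odt(title, paragraphs):
    """Return the bytes of a small .odt package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # ZipInfo defaults to ZIP_STORED: mimetype goes first and uncompressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE)
        zf.writestr("content.xml", build_content(title, paragraphs),
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("META-INF/manifest.xml", build_manifest(["content.xml"]),
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Usage: Path("memo.odt").write_bytes(build_minimal_odt("Memo", ["First paragraph."])), with Path from pathlib.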
When the deliverable is rich prose, write it as Markdown and convert with
create_from_markdown.py — the structure is the prose, so there is no
block-level JSON to hand-assemble:
python scripts/create_from_markdown.py article.md article.odt
python scripts/create_from_markdown.py article.md article.odt --title "Q3 Report"
The Markdown parser is standard-library only (no Pandoc dependency). It covers a pragmatic CommonMark subset plus GFM tables and footnotes:
- bold, italic, inline code, links (inline + reference)
- footnotes ([^id] + [^id]:) → text:note

Inline formatting becomes text:span runs, so the output is real rich text,
not plain paragraphs. Style names are fixed (Heading1–Heading6, Body,
Quote, CodeBlock, Strong, Emphasis, Code, …); a branded styles.xml
reusing those names can be injected with inject_styles_from_file. Not
supported: indented code blocks, setext headings, raw HTML, autolinks, and
task-list checkboxes.
Use Pandoc or LibreOffice conversion when the source already exists in another format or when interoperability is the task:
pandoc input.md -o output.odt
When a reference template is available, use it to carry page styles, fonts, headers, footers, and bibliography styling:
pandoc input.md --reference-doc=template.odt -o output.odt
Set explicit heading hierarchy in the source. Avoid manually faking headings with bold text.
For one-shot conversion between ODT and Microsoft Word formats, convert.py
wraps soffice --headless --convert-to with an isolated temp profile:
# ODT → DOCX:
python scripts/convert.py doc.odt --to docx --outdir qa
# DOCX → ODT (the bridge: edit a Word document with our skills, then export back):
python scripts/convert.py source.docx --to odt --outdir qa
python scripts/replace_text.py qa/source.odt "Old text" "New text" -o edited.odt
python scripts/convert.py edited.odt --to docx --outdir qa
# Legacy MS Word 97-2003 (.doc):
python scripts/convert.py doc.odt --to doc --outdir qa
python scripts/convert.py legacy.doc --to odt --outdir qa
Fidelity caveat: soffice handles the 80% case (prose, simple tables, footnotes, basic styles) well. Round-tripping documents with complex master pages, embedded MathML, advanced bibliography features, or heavy custom formatting can lose detail. Inspect the output before relying on it.
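The isolated-profile idea behind convert.py can be approximated with subprocess. This is a sketch, not convert.py's actual code: it assumes soffice is on PATH (docs/soffice-resolver.md describes how the skill really locates it) and that the output file is named after the input stem plus the target extension:

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_cmd(src, fmt, outdir, profile_dir, soffice="soffice"):
    """Build a headless soffice conversion command that uses a throwaway profile."""
    profile_uri = Path(profile_dir).resolve().as_uri()
    return [
        soffice,
        f"-env:UserInstallation={profile_uri}",  # keeps the real user profile untouched
        "--headless",
        "--convert-to", fmt,
        "--outdir", str(outdir),
        str(src),
    ]


def convert(src, fmt, outdir, soffice="soffice"):
    """Convert src to fmt (e.g. "docx", "odt") and return the expected output path."""
    with tempfile.TemporaryDirectory() as profile:
        subprocess.run(build_convert_cmd(src, fmt, outdir, profile, soffice), check=True)
    # Assumption: the extension is the part of fmt before any ":filter" suffix.
    return Path(outdir) / f"{Path(src).stem}.{fmt.split(':')[0]}"
```

The temporary profile is deleted when the with block exits, mirroring the "isolated temp profile" behaviour described for convert.py.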
For spreadsheets, use the ods skill's convert.py (ODS ↔ XLSX/XLS); for
presentations, the odp skill's convert.py (ODP ↔ PPTX/PPT). The skill
boundary enforces format families — cross-family conversions (e.g. ODT →
XLSX) are not supported by soffice and the script rejects them with a clear
hint.
create_minimal_odt.py and create_from_markdown.py accept --theme NAME — a
curated colour palette and font pairing applied to the generated document.
Five themes:
| Theme | Feel |
|---|---|
| corporate-blue | clean corporate blue |
| warm-editorial | cream background, terracotta serif (reports, essays) |
| high-contrast | black on white, bold (accessibility, print) |
| slate-mono | slate palette, monospaced headings (technical docs) |
| forest | deep green, sans heading + serif body |
python scripts/create_minimal_odt.py spec.json out.odt --theme warm-editorial
python scripts/create_from_markdown.py in.md out.odt --theme forest
Without --theme the output is unchanged. Themes name fonts as stacks with a
Liberation fallback, so a themed document renders even where the first-choice
font is absent. For full branded designs (not just palette + fonts), use a
template (next section).
A template is a complete branded design (styles + page layout + master page + outline numbering) packaged as a directory. The skill ships five templates plus three tools — themes are quick palette+font tweaks; templates are full document-class designs.
skills/odt/templates/:
| Template | Use case |
|---|---|
| grant-proposal | research-grant proposal for any agency (ERC / VW / Thyssen / EU). A4, 2.5 cm margins, navy Lato headings + Source Serif body, outline-numbered 1./1.1./1.1.1. |
| academic-paper | IMRaD article (Title / Abstract / Introduction / Methods / Results / Discussion / References) with hanging-indent References style |
| letterhead | DIN-5008-ish business letter: institution placeholder header, asymmetric margins for envelope-window alignment, signature block |
| cv | academic CV: navy section headers with bottom rule, compact 2 cm margins, EntryTitle/EntryDetail/DateRange styles |
| dissertation | long-form thesis / Habilitation: A4 with 3 cm margins, 5-level outline numbering, chapter-per-page (Heading1 page-break-before), 1.4 line-height, hanging-indent bibliography |
All templates are English-first and institution-neutral. For German
localisation aimed at a third-party funder, see examples/dao/.
apply_template.py wraps inject-styles + embed-pictures + validate-refs
into one call:
# Build a base document
python scripts/create_minimal_odt.py spec.json doc.odt
# Apply a shipped template by name
python scripts/apply_template.py doc.odt \
--template-name grant-proposal -o branded.odt
# Or by path (e.g. a user-supplied template directory)
python scripts/apply_template.py doc.odt \
--template /path/to/my-template -o branded.odt
Before authoring, ask what the template offers:
python scripts/inspect_template.py \
skills/odt/templates/grant-proposal/styles.xml --json
Output is JSON with page_layouts (margins, header/footer heights),
master_pages (header/footer previews, frames), outline_styles (heading
numbering levels), named paragraph_styles / text_styles /
list_styles, and font_face_decls.
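For a quick look without the script, page sizes and margins can be read straight from styles.xml. A minimal sketch assuming the standard ODF style and fo namespace URIs; page_layouts is a hypothetical helper, and inspect_template.py reports much more (master pages, outline styles, fonts):

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
# fo: attributes read from style:page-layout-properties (this sketch's selection).
FIELDS = ("page-width", "page-height", "margin-top", "margin-bottom", "margin-left", "margin-right")


def page_layouts(styles_xml):
    """Map each style:page-layout name to its fo: size and margin attributes."""
    root = ET.fromstring(styles_xml)
    layouts = {}
    for layout in root.iter(f"{{{STYLE_NS}}}page-layout"):
        props = layout.find(f"{{{STYLE_NS}}}page-layout-properties")
        attrs = props.attrib if props is not None else {}
        layouts[layout.get(f"{{{STYLE_NS}}}name")] = {
            field: attrs.get(f"{{{FO_NS}}}{field}") for field in FIELDS
        }
    return layouts
```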
To turn an existing .odt/.ott/.docx (DOCX via the v1.11 OOXML bridge)
into a reusable template:
python scripts/extract_template.py corporate-letterhead.docx \
--name corporate-letterhead --outdir skills/odt/templates/ \
--license CC-BY-4.0 --source "https://example.com/template"
The extractor filters office:automatic-styles to keep only what master
pages reference (plus parent-style chains), copies master-page-referenced
Pictures/, and writes LICENSE.txt/PROVENANCE.md/README.md metadata.
Templates pair naturally with the scholarly stack (v0.3 – v1.10):
# Apply template, then add the full apparatus
python scripts/apply_template.py doc.odt --template-name dissertation -o branded.odt
python scripts/fill_citations.py branded.odt --source refs.bib -o branded.odt
python scripts/add_toc.py branded.odt --at start --title "Contents" --levels 4 -o branded.odt
python scripts/add_bibliography.py branded.odt --at end --title "Bibliography" -o branded.odt
python scripts/update_indexes.py branded.odt --outdir qa
Templates live inside skills/odt/, so install_skills.py bundles them
into every Smithery / skills.sh / Claude Code plugin install.
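For creation and editing scripts, see docs/script-reference.md. The scripts are built on the Python standard library (optional dependencies, such as bibtexparser for BibTeX or Pandoc for LaTeX math, are noted where they apply) and are invoked as: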
python scripts/<script_name>.py [args]
- Use text:h headings with outline levels; do not fake headings with bold paragraphs.
- Use text:list lists; avoid manually typed bullets when generating XML.
- Use table:table structures; check table widths, cell padding, and page breaks in PDF output.

For content-only edits:

- Pandoc conversion is acceptable; pass --reference-doc when layout matters.

For precise edits that must preserve layout:

- Unpack the ODT.
- Edit content.xml / styles.xml with an XML library.
- Update META-INF/manifest.xml if adding or removing embedded files.
- Repack with mimetype first and uncompressed.

Repack pattern:
cd unpacked_odt
zip -0 -X ../output.odt mimetype
zip -r -X ../output.odt . -x mimetype
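Where the zip CLI is unavailable, the same unpack, edit, and repack cycle can be reproduced with Python's zipfile. A sketch: repack_odt and rewrite_odt are hypothetical helpers, and the edit callback stands in for whatever XML changes the task needs:

```python
import tempfile
import zipfile
from pathlib import Path


def repack_odt(src_dir, out_path):
    """Zip an unpacked ODT directory, writing mimetype first and uncompressed."""
    src = Path(src_dir)
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Equivalent of `zip -0 -X ../output.odt mimetype`.
        zf.write(src / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        # Equivalent of `zip -r -X ../output.odt . -x mimetype`.
        for path in sorted(src.rglob("*")):
            arcname = path.relative_to(src).as_posix()
            if path.is_file() and arcname != "mimetype":
                zf.write(path, arcname)


def rewrite_odt(src_odt, out_odt, edit):
    """Unpack src_odt into a temp dir, call edit(dir), then repack to out_odt."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(src_odt) as zf:
            zf.extractall(tmp)
        edit(Path(tmp))
        repack_odt(tmp, out_odt)
```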
Assume generated or edited ODT files have problems until proven otherwise. Writer layout can change because of fonts, page styles, table widths, image anchoring, footnotes, and conversion filters.
Extract structure:
python scripts/extract_text.py output.odt
python scripts/extract_text.py output.odt --json > qa/text.json
Check headings, paragraph order, lists, tables, image references, footnotes/endnotes, and leftover placeholders such as Lorem, TODO, XXXX, or template instructions.
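Leftover placeholders can also be caught programmatically. A sketch assuming the markers named in the sentence above (Lorem, TODO, runs of four or more X); extend PLACEHOLDER for template-specific instructions. Paragraphs nested inside notes are also counted as part of their parent paragraph, so expect occasional duplicates:

```python
import re
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
BLOCK_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}
# Marker patterns taken from the QA checklist; adjust per template.
PLACEHOLDER = re.compile(r"\bLorem\b|\bTODO\b|X{4,}")


def find_placeholders(content_xml):
    """Return [(block_text, [matches])] for each text:p/text:h containing a placeholder."""
    root = ET.fromstring(content_xml)
    hits = []
    for el in root.iter():
        if el.tag in BLOCK_TAGS:
            # The regex runs on extracted text, never on the XML itself.
            text = "".join(el.itertext())
            found = PLACEHOLDER.findall(text)
            if found:
                hits.append((text, found))
    return hits
```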
Inspect and validate:
python scripts/inspect_package.py output.odt > qa/package.json
python scripts/validate_refs.py output.odt
Check that mimetype is first, required XML files exist, media targets exist, manifest entries are present, and style references are not broken.
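The manifest part of that check can be sketched by comparing manifest:file-entry paths with the ZIP member list. A minimal sketch, assuming the root entry and directory entries (paths ending in /) can be ignored; manifest_mismatches is a hypothetical helper and validate_refs.py remains the tool to run:

```python
import xml.etree.ElementTree as ET
import zipfile

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
# Members that are not expected to be listed in the manifest (assumption).
UNLISTED = {"mimetype", "META-INF/manifest.xml"}


def manifest_mismatches(member_names, manifest_xml):
    """Compare ZIP member names with manifest:file-entry full paths."""
    root = ET.fromstring(manifest_xml)
    declared = {
        entry.get(f"{{{MANIFEST_NS}}}full-path")
        for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry")
    }
    # Drop the package root "/" and directory entries such as "Object 1/".
    declared = {p for p in declared if p and not p.endswith("/")}
    packaged = {n for n in member_names if not n.endswith("/")} - UNLISTED
    return {
        "missing_from_manifest": sorted(packaged - declared),
        "missing_from_package": sorted(declared - packaged),
    }


def check_manifest(odt_path):
    with zipfile.ZipFile(odt_path) as zf:
        return manifest_mismatches(zf.namelist(), zf.read("META-INF/manifest.xml"))
```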
Rendering is a design step, not only a final check. Render an early draft, look at it, fix what is wrong, then continue — do not author the whole document blind and render once at the end.
python scripts/render.py output.odt --outdir qa # PDF
python scripts/render.py output.odt --outdir qa --contact-sheet # all pages in one image
python scripts/render.py output.odt --outdir qa --png # one PNG per page
The contact sheet composes every page into a single labelled grid image — the fastest way to judge page breaks and cross-page consistency at a glance. Open the rendered PDF or contact sheet and actually look at it.
Inspect page breaks, headers/footers, table overflow, footnote placement, missing images, changed fonts, and unexpected style loss. If only Pandoc is available, pandoc output.odt -t markdown gives a partial content check.
Treat this QA pass as the final iteration of a loop you should already be running while authoring.
For scholarly prose with apparatus, the suite provides direct ODF-native helpers — no DOCX or pandoc-citeproc round-trip needed.
# Insert a footnote after a text anchor:
python scripts/add_footnote.py input.odt --anchor "strittige Behauptung" \
--body "Quelle: Müller 2020, S. 42" -o output.odt
# Append to the third paragraph:
python scripts/add_footnote.py input.odt --paragraph 3 --position end \
--body "Lange Anmerkung." --class endnote -o output.odt
# Inspect all notes:
python scripts/list_notes.py output.odt --json
IDs auto-increment (ftn0, ftn1, … / edn0, edn1, …) unless --id is given. Inline children (text:span, text:bookmark) around the anchor are preserved.
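For orientation, this is roughly the element structure a footnote takes in content.xml. The sketch builds a text:note with ElementTree and splices it in after an anchor; make_note and insert_note_after are hypothetical helpers, and unlike add_footnote.py this one only searches the paragraph's leading text, not text inside child spans:

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def t(local):
    """Clark-notation name in the ODF text namespace."""
    return f"{{{TEXT_NS}}}{local}"


def make_note(note_id, citation, body_text, note_class="footnote"):
    """Build <text:note> with a citation mark and a one-paragraph body."""
    note = ET.Element(t("note"), {t("id"): note_id, t("note-class"): note_class})
    ET.SubElement(note, t("note-citation")).text = citation
    body = ET.SubElement(note, t("note-body"))
    ET.SubElement(body, t("p")).text = body_text
    return note


def insert_note_after(paragraph, anchor, note):
    """Place `note` right after the first `anchor` in the paragraph's leading text."""
    text = paragraph.text or ""
    idx = text.find(anchor)
    if idx < 0:
        raise ValueError(f"anchor {anchor!r} not found in leading text")
    cut = idx + len(anchor)
    paragraph.text = text[:cut]
    note.tail = text[cut:]      # the remaining text follows the note
    paragraph.insert(0, note)   # the note precedes any existing child elements
```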
# Insert a single citation, source auto-detected from extension:
python scripts/add_citation.py input.odt --anchor "frühere Studien" \
--source refs.bib --key Mueller2020 -o output.odt
python scripts/add_citation.py input.odt --anchor "frühere Studien" \
--source refs.json --key Mueller2020 -o output.odt
# Manually:
python scripts/add_citation.py input.odt --anchor "frühere Studien" \
--identifier Mueller2020 --field bibliography-type=article \
--field author="Müller, K." --field year=2020 \
--field title="Beispieltitel" --field journal="ZAW" -o output.odt
# Bulk-fill pandoc-style placeholders:
python scripts/fill_citations.py template.odt --source refs.bib -o output.odt
# Scans for `[@bibkey]` markers, replaces each with text:bibliography-mark.
# Inspect citations:
python scripts/list_citations.py output.odt --json
LibreOffice renders the citation through the bibliography style. The bibliography index at document end is not generated here — let LibreOffice build it from the inserted text:bibliography-mark elements.
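Before running fill_citations.py, it can help to confirm that every [@bibkey] marker has a matching BibTeX entry. A rough standard-library sketch: both regular expressions are simplifying assumptions (they approximate Pandoc key syntax and BibTeX entry headers) and run on plain text, not on XML:

```python
import re

# Approximation of a Pandoc-style [@key] marker; real Pandoc keys allow more forms.
MARKER = re.compile(r"\[@([A-Za-z0-9_][\w:.\-/]*)\]")
# Approximation of a BibTeX entry header such as "@article{Mueller2020,".
BIB_ENTRY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def citation_keys(document_text):
    """Unique [@key] markers in order of first appearance."""
    return list(dict.fromkeys(MARKER.findall(document_text)))


def unresolved_keys(document_text, bib_text):
    """Keys cited in the text that have no entry in the BibTeX source."""
    known = set(BIB_ENTRY.findall(bib_text))
    return [key for key in citation_keys(document_text) if key not in known]
```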
BibTeX support requires the optional bibtexparser dependency:
pip install open-document-skills[scholarly]
CSL-JSON works with stdlib only.
# Mark a target with a bookmark:
python scripts/add_bookmark.py input.odt --name "Kapitel3" \
--anchor "3. Methodik" -o output.odt
# Reference the target later:
python scripts/add_reference.py input.odt --ref-to "Kapitel3" --kind bookmark \
--anchor "siehe Kapitel" --display chapter -o output.odt
# Auto-numbered figure caption:
python scripts/add_sequence.py input.odt --sequence Figure --name "fig:karte" \
--anchor "Karte zeigt" -o output.odt
# Reference to the figure:
python scripts/add_sequence.py input.odt --ref-to "fig:karte" \
--anchor "siehe Abbildung" -o output.odt
# Inspect everything (bookmarks, ranges, sequences, refs):
python scripts/list_refs.py output.odt --json
text:bookmark (point + range), text:reference-mark (point + range), and text:sequence (Figure/Table/Equation) are supported. Display modes for refs: page, chapter, number, direction, text. The validator detects dangling references and duplicate names.
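The dangling-reference and duplicate-name checks can be sketched as follows. This assumes bookmarks and reference marks are named by text:name, sequences by text:ref-name, and that each *-ref element points at its target through text:ref-name; it is an illustration, not validate_refs.py's actual logic:

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def t(local):
    return f"{{{TEXT_NS}}}{local}"


# (element, naming attribute) pairs that define targets, per reference kind (assumed).
TARGETS = {
    "bookmark": [("bookmark", "name"), ("bookmark-start", "name")],
    "reference": [("reference-mark", "name"), ("reference-mark-start", "name")],
    "sequence": [("sequence", "ref-name")],
}
# Element that refers to each kind of target via text:ref-name.
REFS = {"bookmark": "bookmark-ref", "reference": "reference-ref", "sequence": "sequence-ref"}


def check_refs(content_xml):
    """Report dangling references and duplicate target names per kind."""
    root = ET.fromstring(content_xml)
    report = {}
    for kind, specs in TARGETS.items():
        names = Counter()
        for tag, attr in specs:
            for el in root.iter(t(tag)):
                name = el.get(t(attr))
                if name:
                    names[name] += 1
        refs = {el.get(t("ref-name")) for el in root.iter(t(REFS[kind]))}
        report[kind] = {
            "dangling": sorted(r for r in refs if r and r not in names),
            "duplicates": sorted(n for n, c in names.items() if c > 1),
        }
    return report
```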
# LaTeX → MathML (requires pandoc):
python scripts/add_math.py input.odt --latex "E = mc^2" \
--anchor "Einstein-Gleichung" -o output.odt
# Raw MathML from a file:
python scripts/add_math.py input.odt --mathml formula.mml \
--anchor "Datierungsformel" -o output.odt
# Inline MathML XML:
python scripts/add_math.py input.odt --paragraph 3 \
--mathml-inline '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>' \
-o output.odt
Formulas are embedded as Object N/ sub-packages — the LibreOffice-native convention — with proper manifest entries (application/vnd.oasis.opendocument.formula). LibreOffice opens, renders, and roundtrips them.
The skill ships inserters for the four ODF index types plus a marker script for
the alphabetical index. Each script writes the index container (with the
proper text:<kind>-source configuration) and an empty text:index-body
placeholder. The actual entries are filled by LibreOffice — run
update_indexes.py to dispatch the refresh headlessly, or open the document
in LibreOffice GUI and press F9 (Tools → Update → Update All Indexes).
# Table of contents over a doc's headings (default outline level 3):
python scripts/add_toc.py input.odt --at start --title "Inhalt" -o output.odt
# Bibliography (entries come from text:bibliography-mark — see add_citation.py):
python scripts/add_bibliography.py input.odt --at end --title "Literatur" -o output.odt
# Illustration index / table index — pass --sequence Figure | Table | Equation
# (entries come from text:sequence captions inserted via add_sequence.py):
python scripts/add_illustration_index.py input.odt --at end --sequence Figure -o output.odt
# Alphabetical index — combine the container with point markers:
python scripts/add_alphabetical_index.py input.odt --at end -o with_idx.odt
python scripts/add_index_mark.py with_idx.odt --anchor "Datierung" \
--key1 "Methoden" --key2 "C14" -o with_marks.odt
# Refresh all index bodies via headless soffice (mirrors recalc.py for Calc):
python scripts/update_indexes.py with_marks.odt --outdir qa
update_indexes.py writes an isolated -env:UserInstallation temp profile
with a one-off Standard/Module1.RefreshIndexes Basic library, invokes
soffice on the document + macro URL, then deletes the temp profile. The real
LibreOffice user profile is never touched. On platforms where headless macro
execution is blocked the script prints a clear diagnosis; opening the file
once in LibreOffice GUI and pressing F9 is the manual fallback.
validate_refs.py warns when an index container is structurally empty:
TOC with no matching headings, bibliography without text:bibliography-mark,
illustration index referencing a caption-sequence-name that nothing in the
body uses, or alphabetical index without any text:alphabetical-index-mark.
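A rough sketch of those emptiness checks on content.xml. Element names follow the ODF text namespace; the checks are deliberately coarse (a TOC counts as satisfied by any text:h, whatever its outline-level limit), and empty_index_warnings is a hypothetical helper, so validate_refs.py should still be the tool you run:

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def t(local):
    return f"{{{TEXT_NS}}}{local}"


def empty_index_warnings(content_xml):
    """Return warnings for index containers that have nothing to index."""
    root = ET.fromstring(content_xml)

    def count(local):
        return sum(1 for _ in root.iter(t(local)))

    warnings = []
    if count("table-of-content") and not count("h"):
        warnings.append("table of contents but no text:h headings")
    if count("bibliography") and not count("bibliography-mark"):
        warnings.append("bibliography but no text:bibliography-mark")
    if count("alphabetical-index") and not (
        count("alphabetical-index-mark") + count("alphabetical-index-mark-start")
    ):
        warnings.append("alphabetical index but no text:alphabetical-index-mark")
    # Caption sequences actually used in the body, by sequence name.
    used_sequences = {el.get(t("name")) for el in root.iter(t("sequence"))}
    for source in root.iter(t("illustration-index-source")):
        name = source.get(t("caption-sequence-name"))
        if name and name not in used_sequences:
            warnings.append(f"illustration index for unused sequence {name!r}")
    return warnings
```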
For document review, record edits as tracked changes a human can accept or reject, and attach comments — no DOCX round-trip needed.
# Point comment after an anchor:
python scripts/add_comment.py doc.odt --anchor "claim" \
--author "Reviewer" --text "Source?" -o out.odt
# Range comment spanning a phrase:
python scripts/add_comment.py doc.odt --start-anchor "Solar" \
--end-anchor "additions" --author "Editor" --text "Verify." -o out.odt
python scripts/list_comments.py out.odt # JSON
A comment is an office:annotation (point) or an office:annotation /
office:annotation-end pair (range), each with dc:creator, dc:date, and
a text:p body.
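A minimal sketch of that annotation structure with ElementTree, assuming the standard ODF office/text namespaces and Dublin Core for dc:. Here office:name links a range start to its office:annotation-end; the helper names are hypothetical and this is not add_comment.py's code:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DC_NS = "http://purl.org/dc/elements/1.1/"


def make_annotation(author, text, when, name=None):
    """Build <office:annotation> with dc:creator, dc:date, and a text:p body."""
    attrs = {f"{{{OFFICE_NS}}}name": name} if name else {}
    ann = ET.Element(f"{{{OFFICE_NS}}}annotation", attrs)
    ET.SubElement(ann, f"{{{DC_NS}}}creator").text = author
    ET.SubElement(ann, f"{{{DC_NS}}}date").text = when.isoformat(timespec="seconds")
    ET.SubElement(ann, f"{{{TEXT_NS}}}p").text = text
    return ann


def make_range_annotation(author, text, when, name):
    """Return (start, end): place start before the phrase and end after it."""
    start = make_annotation(author, text, when, name)
    end = ET.Element(f"{{{OFFICE_NS}}}annotation-end", {f"{{{OFFICE_NS}}}name": name})
    return start, end
```

Usage: make_annotation("Reviewer", "Source?", datetime.now()) yields a point comment ready to insert after an anchor.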
# Record an insertion, a deletion, or a replacement:
python scripts/track_change.py doc.odt --insert " (draft)" \
--anchor "Report" --author "Reviewer" -o out.odt
python scripts/track_change.py doc.odt --delete "very " --author "Reviewer" -o out.odt
python scripts/track_change.py doc.odt --replace "old term" \
--with "new term" --author "Reviewer" -o out.odt
python scripts/list_changes.py out.odt # JSON
# Accept or reject — all changes or one by id:
python scripts/resolve_changes.py out.odt --accept --all -o final.odt
python scripts/resolve_changes.py out.odt --reject --id ct2 -o final.odt
Each change is a text:changed-region (text:insertion / text:deletion,
with office:change-info); insertions are wrapped in text:change-start /
text:change-end markers, deletions leave a text:change marker and move
the removed text into the region. LibreOffice shows them as underline /
strike-through with a change bar. Deletions operate on a text run within one
paragraph; insertions work at any anchor.
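To make the insertion case concrete, the sketch below records an insertion at the end of a paragraph. It assumes the change-info layout described above and leaves out placing the region in the document's text:tracked-changes container; helper names are hypothetical and track_change.py may differ in detail:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DC_NS = "http://purl.org/dc/elements/1.1/"


def t(local):
    return f"{{{TEXT_NS}}}{local}"


def insertion_region(change_id, author, when):
    """Build <text:changed-region> holding an insertion and its office:change-info."""
    # Assumption: the caller appends this region to text:tracked-changes (not done here).
    region = ET.Element(t("changed-region"), {t("id"): change_id})
    insertion = ET.SubElement(region, t("insertion"))
    info = ET.SubElement(insertion, f"{{{OFFICE_NS}}}change-info")
    ET.SubElement(info, f"{{{DC_NS}}}creator").text = author
    ET.SubElement(info, f"{{{DC_NS}}}date").text = when.isoformat(timespec="seconds")
    return region


def append_tracked_text(paragraph, inserted_text, change_id):
    """Append text wrapped in text:change-start / text:change-end markers."""
    start = ET.SubElement(paragraph, t("change-start"), {t("change-id"): change_id})
    start.tail = inserted_text
    ET.SubElement(paragraph, t("change-end"), {t("change-id"): change_id})
```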
Beyond inline text replacement, four scripts restructure an existing document — bulk restyle, insert and delete whole blocks, and edit tables.
# Bulk-restyle: apply a style to matching paragraphs/headings.
python scripts/restyle.py doc.odt --headings --style "DAO-Heading-1" -o out.odt
python scripts/restyle.py doc.odt --current-style "Body" --style "DAO-Body" -o out.odt
python scripts/restyle.py doc.odt --level 2 --style "Sub" -o out.odt
# Insert a block fragment (same JSON `blocks` format as create_minimal_odt):
python scripts/insert_blocks.py doc.odt --blocks frag.json \
--after-anchor "Introduction" -o out.odt # or --before-anchor / --at-paragraph N / --at start|end
# Delete a whole block:
python scripts/delete_block.py doc.odt --anchor "Obsolete heading" -o out.odt
python scripts/delete_block.py doc.odt --paragraph 3 --type table -o out.odt
# Edit a table by name:
python scripts/edit_table.py doc.odt --table "Results" --add-row 2024 1500 -o out.odt
python scripts/edit_table.py doc.odt --table "Results" --add-column "Note" -o out.odt
python scripts/edit_table.py doc.odt --table "Results" --set-cell 2 3 "ok" -o out.odt
python scripts/edit_table.py doc.odt --table "Results" --delete-row 5 -o out.odt
restyle.py changes text:style-name on text:p/text:h; selectors
(--current-style, --headings/--paragraphs, --level) combine.
insert_blocks.py consumes a JSON blocks array (heading/paragraph/list/
table); for images and footnotes use add_image.py/add_footnote.py.
edit_table.py expands number-columns-repeated/number-rows-repeated
before editing, so it works on LibreOffice-saved tables too.
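The expansion step can be sketched in a few lines of ElementTree. This assumes repeats appear as table:number-rows-repeated on rows and table:number-columns-repeated on cells; it does not cap very large repeat counts and leaves table:table-column definitions untouched. It illustrates the idea, not edit_table.py's implementation:

```python
import copy
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


def tb(local):
    return f"{{{TABLE_NS}}}{local}"


def expand_repeated(parent, attr):
    """Replace each child carrying `attr`=N with N siblings that lack the attribute."""
    children = list(parent)
    for child in children:
        parent.remove(child)
    for child in children:
        count = int(child.attrib.pop(attr, "1"))
        parent.append(child)  # keep the original element (and its identity) once
        for _ in range(count - 1):
            parent.append(copy.deepcopy(child))


def expand_table(table):
    """Expand repeated rows (including row groups), then repeated cells."""
    row_parents = [table] + [
        g for g in table if g.tag in (tb("table-header-rows"), tb("table-rows"))
    ]
    for parent in row_parents:
        expand_repeated(parent, tb("number-rows-repeated"))
    # Materialize the row list before mutating rows during traversal.
    for row in list(table.iter(tb("table-row"))):
        expand_repeated(row, tb("number-columns-repeated"))
```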
Embedded images belong under Pictures/...; the manifest must include them.

Part of the open-document-skills suite.