leomajewski/engine-books

engine-books

A Claude Code engine that synthesizes multiple PDF books into a single, maximally complete study book on any topic — merging, deduplicating, and integrating content from every relevant source so that nothing worth knowing gets left behind.


The problem

You have a shelf of PDFs on a subject. Each book covers some aspects thoroughly and skips others. The best explanation of concept A is in book 3, the best examples are in book 7, and a critical caveat only appears in an appendix of book 2. Reading them sequentially takes weeks. Cross-referencing by hand is exhausting.

The result is that most of that knowledge stays unused — buried across dozens of files you never find time to reconcile.

The idea

engine-books treats your PDF collection as a distributed knowledge base and uses Claude to compile it into a single reference document. For each subtopic, it reads every source that covers it, extracts the best from each, and writes a unified passage that is more complete than any individual book.

The output is not a summary. It is a synthesis: everything any of your sources knew about the topic, deduplicated, organized, and written as one coherent book.


How it works

The engine runs in two phases, by design.

Phase 1 — /summary (map the terrain)

Claude scans the table of contents of every PDF in your pdfs/ folder, finds all chapters relevant to your topic, and produces a merged outline that maps each subsection to the specific pages in each book that cover it.

You review and approve this outline before any book content is generated. You can add sections, remove subtopics, reorder them, or mark a gap with ⚠️ to signal that a subtopic is not covered by any of your books. Claude then writes that subtopic from general knowledge.

The outline is saved as books/summary_TOPIC.md.
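
To make the mapping concrete, here is a minimal sketch of what one outline entry carries: a subsection title plus the page range in each book that covers it. The class names, the sample topics, and the rendered line format are all hypothetical; the real layout of books/summary_TOPIC.md is whatever the /summary skill writes.

```python
from dataclasses import dataclass, field


@dataclass
class PageRange:
    """Pages in one source PDF that cover a subsection (inclusive)."""
    pdf: str
    first: int
    last: int


@dataclass
class Subsection:
    """One outline entry. Assumption: an entry with no sources is a gap."""
    title: str
    sources: list[PageRange] = field(default_factory=list)

    @property
    def is_gap(self) -> bool:
        return not self.sources


def render(outline: list[Subsection]) -> str:
    """Render the outline as plain text, one line per subsection."""
    lines = []
    for sub in outline:
        if sub.is_gap:
            # Gaps carry the warning marker so they stand out during review.
            lines.append(f"⚠️ {sub.title} (no source; general knowledge)")
        else:
            refs = "; ".join(f"{s.pdf} pp. {s.first}-{s.last}" for s in sub.sources)
            lines.append(f"{sub.title}: {refs}")
    return "\n".join(lines)


# Illustrative data only; file names and page numbers are made up.
outline = [
    Subsection("Stacks", [PageRange("pdfs/book_a.pdf", 42, 80),
                          PageRange("pdfs/book_b.pdf", 10, 25)]),
    Subsection("Lock-free queues"),
]
```

Calling render(outline) gives one line per subsection, with the uncovered entry flagged, which is roughly the information you check when reviewing the outline.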

Why a separate phase? The outline is cheap to generate and gives you full control over scope before any heavy reading begins. It also enforces a discipline that prevents the book from drifting — every section in the final book must be accounted for in the approved outline.

Phase 2 — /book (synthesize the book)

Claude reads the approved outline and, for each subsection, fetches the exact pages listed for it from every mapped source — in parallel. It then integrates them:

  • The richest, most detailed source leads each subsection
  • Every secondary source is read completely; anything it adds that the leading source does not cover gets incorporated
  • Only exact duplicate sentences are dropped; complementary phrasings that reinforce the same concept are kept
  • Conflicts between sources are surfaced explicitly, not silently resolved
  • Subsections marked ⚠️ are written from general knowledge, clearly flagged

The result is saved as books/TOPIC.md.
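
The first three integration rules can be illustrated with a toy sketch. This is not how the engine works internally: the real integration is done by Claude following the /book skill, and judging which source is "richest" is left to the model. Here length stands in for richness, sentence splitting is naive, and conflict detection is not attempted. All function names are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def merge_sources(passages: dict[str, str]) -> list[str]:
    """Merge passages: the longest source leads, exact duplicates are dropped."""
    # Assumption: passage length approximates "richest, most detailed".
    ordered = sorted(passages, key=lambda name: len(passages[name]), reverse=True)
    seen: set[str] = set()
    merged: list[str] = []
    for name in ordered:
        for sentence in split_sentences(passages[name]):
            if sentence in seen:  # exact match only; rephrasings are kept
                continue
            seen.add(sentence)
            merged.append(sentence)
    return merged


passages = {
    "book_a": "A stack is a last-in, first-out structure. "
              "Push adds an item to the top.",
    "book_b": "A stack is a last-in, first-out structure. "
              "The most recently added item is removed first. "
              "Pop removes the top item.",
}
merged = merge_sources(passages)
```

In this example book_b leads because it is longer, the sentence shared word for word with book_a appears once, and the rephrased restatement of the same idea survives, mirroring the rule that only exact duplicates are dropped.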


Requirements

  • Claude Code (CLI or IDE extension)
  • Python 3.11+
  • uv (recommended), or pypdf installed with pip install pypdf

Setup

git clone https://github.com/leomajewski/engine-books
cd engine-books

# Place your PDFs here — any subfolder structure is fine
mkdir -p pdfs
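
If you want to check which files a recursive scan of pdfs/ would pick up, a small sketch like the following can help. It is an illustration, not the engine's own discovery code; in particular, matching the extension case-insensitively is an assumption, and find_pdfs is a hypothetical helper name.

```python
from pathlib import Path
import tempfile


def find_pdfs(root: Path) -> list[Path]:
    """Return every PDF under root, at any depth, in sorted order."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "pdfs"
    (root / "physics" / "thermo").mkdir(parents=True)
    (root / "top.pdf").touch()
    (root / "physics" / "thermo" / "deep.PDF").touch()
    (root / "notes.txt").touch()  # ignored: not a PDF
    found = [p.relative_to(root).as_posix() for p in find_pdfs(root)]
```

Against your real collection, find_pdfs(Path("pdfs")) lists the candidate files regardless of how deeply they are nested.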

Directory layout

engine-books/
├── pdfs/                       ← your source PDFs (any subfolder structure)
│
├── books/                      ← all generated output (auto-created)
│   ├── summary_TOPIC.md        ← merged outline — review and approve (Phase 1)
│   ├── TOPIC.md                ← the final synthesized book (Phase 2)
│   ├── cache/                  ← extracted page ranges, cached for re-use
│   └── text/                   ← optional: full pre-extracted texts
│
├── extract_pdf_pages.py        ← fetches specific pages or TOC from one PDF
├── extract_all_pdfs.py         ← pre-processes all PDFs to text (optional, faster)
│
└── .claude/commands/
    ├── summary.md              ← /summary skill (Phase 1)
    └── book.md                 ← /book skill (Phase 2)

Workflow

Step 1 — Generate the outline

Open Claude Code in this directory and run:

/summary <your topic>

Claude scans all PDFs, maps relevant chapters, and writes a merged outline to books/summary_TOPIC.md. Review it, adjust as needed, then tell Claude to proceed.

Step 2 — Generate the book

/book <your topic>

Claude reads the approved outline, fetches the source pages, synthesizes the content, and writes books/TOPIC.md.


Optional: pre-extract all PDFs

For large collections, run this once before generating your first book:

python extract_all_pdfs.py

This converts every PDF to a plain-text file in books/text/. During /book, Claude reads these files directly instead of extracting from PDFs in real time — noticeably faster when working with dozens of books.
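
The "use pre-extracted text when it exists" behaviour can be sketched as a simple lookup with a fallback. The naming scheme below (PDF stem plus .txt inside the text directory) is an assumption made for illustration; check the files extract_all_pdfs.py actually writes. Note that this hypothetical scheme would collide if two subfolders held PDFs with the same file name.

```python
from pathlib import Path
import tempfile


def text_path_for(pdf: Path, text_dir: Path) -> Path:
    """Hypothetical naming scheme: <text_dir>/<pdf stem>.txt."""
    return text_dir / (pdf.stem + ".txt")


def choose_source(pdf: Path, text_dir: Path) -> tuple[str, Path]:
    """Return ("text", path) if a pre-extracted file exists, else ("pdf", pdf)."""
    candidate = text_path_for(pdf, text_dir)
    if candidate.is_file():
        return ("text", candidate)
    return ("pdf", pdf)


# Demonstration: one book has been pre-extracted, the other has not.
with tempfile.TemporaryDirectory() as tmp:
    text_dir = Path(tmp)
    (text_dir / "book_a.txt").write_text("pre-extracted text", encoding="utf-8")
    hit = choose_source(Path("pdfs/book_a.pdf"), text_dir)
    miss = choose_source(Path("pdfs/book_b.pdf"), text_dir)
```

Here hit points at the plain-text file and miss falls back to the PDF itself, which is the slower path the pre-extraction step is meant to avoid.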


Scripts reference

# Fetch specific pages from one PDF (with automatic caching)
uv run --with pypdf python extract_pdf_pages.py "pdfs/book.pdf" 42 80

# Fetch the table of contents of a PDF
uv run --with pypdf python extract_pdf_pages.py "pdfs/book.pdf" toc

# Bypass cache and re-extract
uv run --with pypdf python extract_pdf_pages.py "pdfs/book.pdf" 42 80 --no-cache

# Delete all cached extractions
uv run --with pypdf python extract_pdf_pages.py --clear-cache

# Pre-extract all PDFs in pdfs/ to books/text/
python extract_all_pdfs.py

# Pre-extract from a different directory
python extract_all_pdfs.py --dir /path/to/your/books

# Inspect what would be extracted (dry run)
python extract_all_pdfs.py --list

Notes

  • Domain-agnostic — works for any subject: science, law, medicine, history, engineering, philosophy, or anything else your PDFs cover.
  • Subfolders — organise pdfs/ however you like; the engine scans recursively.
  • Caching — extracted page ranges are cached in books/cache/. Re-running /book after editing the outline reuses any page range already in the cache instead of extracting it from the PDF again.
  • ⚠️ gaps — mark a summary entry with ⚠️ to signal a missing topic. Claude writes that section from general knowledge and labels it explicitly.
  • Incremental — generate a book on one topic today, another tomorrow. Each /summary and /book pair is independent.
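
The caching note above can be pictured with a short sketch: a page range is extracted once, stored under a key derived from the PDF path and the range, and served from disk afterwards. Everything here is hypothetical, including the key format, the helper names, and the use_cache switch (which only mimics the spirit of --no-cache); the actual cache layout is defined by extract_pdf_pages.py.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(pdf: str, first: int, last: int) -> str:
    """Derive a stable file name from the PDF path and inclusive page range."""
    digest = hashlib.sha256(pdf.encode("utf-8")).hexdigest()[:12]
    return f"{digest}_{first}-{last}.txt"


def cached_extract(pdf, first, last, cache_dir: Path, extract, use_cache=True) -> str:
    """Return cached text for the range; call `extract` only on a miss."""
    target = cache_dir / cache_key(pdf, first, last)
    if use_cache and target.is_file():
        return target.read_text(encoding="utf-8")
    text = extract(pdf, first, last)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return text


calls = []


def fake_extract(pdf, first, last):
    """Stand-in for real PDF extraction; records each call."""
    calls.append((pdf, first, last))
    return f"text of {pdf} pages {first}-{last}"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    a = cached_extract("pdfs/book.pdf", 42, 80, cache, fake_extract)
    b = cached_extract("pdfs/book.pdf", 42, 80, cache, fake_extract)  # cache hit
    c = cached_extract("pdfs/book.pdf", 42, 80, cache, fake_extract,
                       use_cache=False)  # forced re-extraction
```

The extractor runs twice rather than three times: once for the first request and once for the forced re-extraction, while the second request is answered from the cache.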

License

MIT
