A Self-Hosted Read / Listen / Ask Reader

I read a lot of long-form: books, papers, the kind of essay that wants forty uninterrupted minutes. I am also, like most people, not great at giving any of it forty uninterrupted minutes. The failure mode isn’t “I lost interest.” It’s that the modality is wrong for the moment: I’m tired and trying to read, or I’m walking and trying to highlight, or I want to interrogate a paragraph and the closest LLM is three tab-switches away.

So I built a reader where the modality can follow my attention instead of fighting it. One web app, running on my laptop: EPUBs and PDFs in the browser, a local voice that reads aloud, and a local LLM I can point at whatever I just selected. Read, listen, ask. Three modes on one page, and nothing leaves the machine. The repo is mustafa-zidan/reader3.

The three modes

Read. EPUB and PDF with infinite scroll. Highlight in five colors, leave notes, drop bookmarks. Full-text search runs across every book you’ve loaded (Ctrl/⌘+F). Double-click a word to get a dictionary definition you can save to a vocabulary list. Scroll position saves as you go, so reopening a book lands you where you left off.
Listen. A speaker button next to each paragraph. The voice is Kokoro, an 82M-parameter TTS model running locally through mlx-audio on Apple Silicon: around fifty voices spanning eight languages, generated server-side and handed back as a WAV. Skip the TTS extra and it falls back to the browser’s built-in speech synthesis, which needs no setup at all.
Ask. Select a passage, send it to the chat sidebar, and ask “what’s this author actually claiming?” without retyping anything. The chat talks to a local model through LM Studio or Ollama, and history is kept per book.

I wanted what every commercial reader-plus-AI product is selling, with the data staying on my laptop: a simple UI, an expressive voice, and an LLM I can aim at the exact text in front of me.

What the loop looks like

You open a book and read until you don’t want to read anymore. When that happens you hand the next few paragraphs to the voice and keep walking. When a claim doesn’t sit right, you select it and ask the model. The win isn’t any single mode; it’s that switching between them costs nothing, so a dip in focus stops bouncing me out of the book entirely.

Running it locally

You’ll need Python 3.12+ and uv. The ask side wants LM Studio or Ollama running with a model loaded (LM Studio defaults to http://localhost:1234/v1, Ollama to http://localhost:11434; you point the reader at it from the AI settings panel). The listen side wants the tts extra, which pulls mlx-audio and is oriented at Apple Silicon.

git clone git@github.com:mustafa-zidan/reader3.git
cd reader3
uv sync                    # reader + chat
uv sync --extra tts        # add the local Kokoro voice
uv run python -m reader3   # opens http://127.0.0.1:8123

The browser opens on its own. Upload an EPUB or PDF from the library page and start reading. Project Gutenberg is the lowest-friction source.

Where it’s still rough

Kokoro is local and quick for a sentence, but a long selection still generates as one blocking call: the audio waits for the whole passage before the first word plays. Chunked, streaming generation is the fix I want next, so the first sentence doesn’t sit behind the last. The chat is only as good as the model you load, and “send selection to chat” is a handoff, not a conversation that follows you down the page. The thing I actually want there is semantic search across the whole book (“where did this character first show up?”), not just questions about the paragraph on screen.

The bigger trap has nothing to do with code. Listening is easier than reading, which is exactly why it slides into background noise for the parts of a book that deserve to be read slowly. Mode-switching is a feature; it isn’t a license to always pick the easy mode. The modes are not interchangeable, and the tool can’t decide for you which one a given page deserves.

If you run it and something breaks, I’d genuinely like to hear what you changed.