← Back to blog

A Self-Hosted “Read ↔ Listen ↔ Ask” Reader: My Journey & How to Use It

Stitching together a local EPUB reader, Orpheus TTS, and an LLM into a self-hosted reading workflow that fits how I actually read.

#ai#llm#tts#self-hosting#reading

Why I Built This (My Frustration with Traditional Reading)

I love books, papers, and long-form articles, but I kept running into the same problem: long texts demand uninterrupted focus. Often, I find myself skimming, losing track, skipping around, or simply giving up halfway because the effort to stay mentally present becomes too draining.

What I realized I needed was a new way to read that reduces friction, adapts to my attention span, and offers flexible modalities: sometimes I want to read, sometimes I want to listen, and sometimes I want to pause and question. I didn’t want to keep juggling tabs, copying and pasting between the reader and AI, or rely on cloud services. I wanted a self-hosted, streamlined tool that adapts to how I read.

That’s when I decided to stitch together three powerful but separate ideas: a minimal EPUB reader, a modern open-source TTS engine, and an LLM-ready highlight-to-text workflow. The result: a reader that lets me read, listen, or ask questions about any passage, all local.

What I Started From (and Why These Projects)

  • reader3 (by Andrej Karpathy): a lightweight, self-hosted EPUB reader. Simple, clean, no fluff. You load a book, browse chapter by chapter, and can easily copy whatever text you want.
  • Orpheus-TTS (by Canopy AI): a state-of-the-art open-source text-to-speech system built on a Llama-3B backbone. It produces human-like, expressive speech with intonation and rhythm, far beyond robotic traditional TTS.
  • orpheus-tts-local (by Isaiah Bjork): a lightweight wrapper/client that allows running Orpheus-TTS locally (for example via LM Studio), exposing a local server for text → speech so there’s no cloud dependency.

I chose this combination because it aligned with what I value: simplicity (reader), expressivity (TTS), privacy + control (local), and flexibility (read, listen, ask).

What This Setup Gives You: Modes & Flexibility

With this hybrid toolchain, you get three flexible consumption modes:

  • Read (traditional): open an EPUB, scroll at your pace, highlight & copy text.
  • Listen (audio): send highlighted text to Orpheus-TTS, and play it aloud, perfect when your focus is shaky, or when you want to rest your eyes.
  • Ask / Process (LLM interaction): automatically copy highlighted text into a local LLM input (chat box), then ask for summaries, explanations, translations, commentary, etc.

You can fluidly switch between modes depending on your mental state, context, or time, making long reading sessions more manageable, less draining, and more interactive.

How to Set Up & Use It (Quick Start)

Prerequisites

  • A machine with a GPU (ideally), or at least enough resources to run a TTS/LLM pipeline locally.
  • Python 3.12+ and uv
  • An EPUB book (or collection) you want to read. The Gutenberg project is a good start.
  • LM Studio with orpheus-3b-0.1-ft model downloaded and your choice of conversational LLM, I usually go for qwen3-coder-30b-a3b-instruct-1m

Step-By-Step Setup

git clone git@github.com:mustafa-zidan/reader3.git
cd reader3
uv sync
# Install dependencies
uv sync [--extras tts] # add only if you want to enable tts
# Run the app
uv run python launcher.py
# visit http://127.0.0.1:8123

What Works and What’s Still Rough

Pros

  • Flexible reading modalities: read, listen, or ask depending on what you need at the moment.
  • Lower friction: no manual copy-paste juggling, cloud APIs, or vendor lock-in. Everything runs locally on your hardware.
  • Better engagement with long texts: audio helps when focus lags; LLM-interaction helps when comprehension or recall is hard; reading helps when you want to slow down and reflect.
  • Privacy and control: no data leaves your machine (unless you choose), no external dependencies beyond what you install.

Trade-offs / Limitations

  • Resource requirements: a good GPU (or at least sufficient compute) helps; otherwise, TTS or LLM generation may lag or fail.
  • Set up overhead: you need to clone, install, and configure, not “zero-click”. That might discourage casual users.
  • Chunking and pacing matter: very long passages may cause latency, or audio artifacts; you may need to split text manually or rely on chunking logic.
  • Audio vs comprehension trade-off: listening may feel effortless, but it can be tempting to treat it like background noise; you might miss the depth that slow, deliberate reading offers.

Who This Is For (And Why It Matters)

This setup is especially useful if you:

  • Deal with long-form reading: novels, non-fiction, papers, design docs, research, etc.
  • Sometimes struggle to keep focus or sit through long reading sessions.
  • Want flexibility: read when alert, listen when tired, ask when curious.
  • Value privacy and control: you don’t want your reading or thoughts going through external APIs.
  • Are comfortable with a bit of setup and tinkering, and maybe want to tweak or extend the tool for your own tastes.

For programmers, engineers, and researchers, especially those juggling dense material and limited attention, this hybrid reader can make reading a lot more bearable, adaptive, and even enjoyable again.

What’s Next: Where This Could Evolve

  • Smarter chunking & buffering logic: automatically break chapters into “listen-ready” segments to avoid lag or audio artifacts.
  • Smoother UI: seamless highlight → mode-switch (read ↔ listen ↔ ask) etc.
  • Multi-voice, multi-language support: integrate different voice models, multilingual TTS, customizable pace/emotion/voice style.
  • Deeper AI integration: on-the-fly summarization, note-taking, semantic search across the book, cross-references, concept mapping; not just “highlight → ask”, but a full reading assistant.

If you pick this up, I’d love to hear the feedback: what worked, what felt clunky, what you’d improve.

Originally published at https://medium.com/@moosezidan/a-self-hosted-read-listen-ask-reader-my-journey-how-to-use-it-8cdc5c3c9330.