Read Aloud

Read Aloud lets you listen to articles and other text sources in FeynmanLM. It is designed for the moments when reading is awkward but listening works: walks, chores, commutes, or reviewing a source before a Feynman session.

Start listening

Open a source in the Reader and choose Read Aloud from the toolbar or source menu. FeynmanLM extracts the readable text, prepares audio, and shows playback controls at the bottom of the reader.

Playback controls include:

  • Pause and resume
  • Back 15 seconds
  • Forward 15 seconds
  • Speed selection from 0.75x to 2x
  • Stop
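
The skip and speed controls above amount to clamped arithmetic on the playback position and rate. The following is a minimal sketch of that behavior, not FeynmanLM's actual implementation; `PlaybackState`, `skip`, and `set_speed` are hypothetical names, and clamping at the start and end of the audio is an assumption.

```python
from dataclasses import dataclass

MIN_SPEED = 0.75      # lower bound from the controls list
MAX_SPEED = 2.0       # upper bound from the controls list
SKIP_SECONDS = 15.0   # size of the back/forward jump


@dataclass
class PlaybackState:
    position: float     # seconds from the start of the audio
    duration: float     # total length in seconds
    speed: float = 1.0


def skip(state: PlaybackState, direction: int) -> PlaybackState:
    """Jump 15 seconds back (direction=-1) or forward (direction=+1).

    Assumption: the position is clamped to [0, duration] rather than wrapping.
    """
    target = state.position + direction * SKIP_SECONDS
    clamped = max(0.0, min(state.duration, target))
    return PlaybackState(clamped, state.duration, state.speed)


def set_speed(state: PlaybackState, requested: float) -> PlaybackState:
    """Apply a requested speed, limited to the 0.75x to 2x range."""
    clamped = max(MIN_SPEED, min(MAX_SPEED, requested))
    return PlaybackState(state.position, state.duration, clamped)
```

For example, skipping back from the 10-second mark lands at the start of the audio instead of a negative position, and a request for 3x is held at 2x.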

Text-to-speech providers

Choose your Read Aloud provider in Settings -> Text-to-Speech.

Supported modes include:

  • On Device: macOS speech synthesis, free and local
  • Groq
  • OpenAI
  • Together AI
  • Deepgram
  • ElevenLabs

Cloud providers require the matching API key in Settings and may incur provider usage costs. On-device speech does not send text to a third-party TTS API.

Caching

For cloud TTS providers, generated audio can be cached so replaying the same source does not regenerate every chunk. Studio can show a Read Aloud cache indicator for sources with cached audio.

If the source changes or the chosen provider, model, or voice changes, FeynmanLM may need to generate fresh audio.
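
One common way to get this behavior is to derive the cache key from everything that affects the generated audio. The sketch below assumes a content-hash scheme; `cache_key` is a hypothetical helper, and FeynmanLM's real cache layout is not documented here.

```python
import hashlib


def cache_key(source_text: str, provider: str, model: str, voice: str) -> str:
    """Return a key that changes whenever the text, provider, model, or voice changes.

    Hypothetical scheme: a SHA-256 digest over the four inputs.
    """
    digest = hashlib.sha256()
    for part in (provider, model, voice, source_text):
        data = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

With a key like this, replaying an unchanged source with the same settings finds the existing audio, while editing the source or switching voices produces a new key and therefore fresh generation.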

When cloud TTS finishes generating every chunk for a source, FeynmanLM also writes a synced audio manifest, source text, and chunk files to iCloud Drive. The iPhone companion app lists only sources with complete synced audio, so it does not attempt text-to-speech generation while you are away from your Mac.

The iPhone listener starts hands-free voice control on launch. You can talk naturally about what you want to listen to, review, ask about, or change; an agent interprets the request, chooses the right playback or source-review tool, and keeps the conversation grounded in the synced source text. If you pause while listening because you want to talk about the article, the app loads the current source into voice chat so your next question can be answered from the text.
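
The rule that the iPhone lists only sources with complete synced audio can be pictured as a filter over manifests. This is a sketch under stated assumptions: the `SyncedSource` fields, including `expected_chunks` and `chunk_files`, are hypothetical and do not describe the real manifest format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SyncedSource:
    title: str
    expected_chunks: int                 # hypothetical: chunk count recorded in the manifest
    chunk_files: List[str] = field(default_factory=list)  # chunk files found in the synced folder


def has_complete_audio(source: SyncedSource) -> bool:
    """A source is listable only when every expected chunk file is present."""
    return (
        source.expected_chunks > 0
        and len(set(source.chunk_files)) == source.expected_chunks
    )


def listable_sources(sources: List[SyncedSource]) -> List[SyncedSource]:
    """Keep only sources whose audio finished generating on the Mac."""
    return [s for s in sources if has_complete_audio(s)]
```

Filtering this way means a partially generated source never appears on the phone, which is why the companion app never needs to generate speech itself.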

iPhone voice agent

The iPhone listener is designed as a voice-first interface, not a manual playback screen. It listens for natural speech, sends the transcript plus current app state to the voice agent, and the agent chooses one app action. The examples below describe what is implemented, not exact phrases you must memorize.

The iPhone screen shows listening status, the available synced sources, and the current source. It does not expose manual playback controls for source selection, seeking, speed changes, or play/pause. Those actions are handled by voice.

The agent can control listening:

  • Start playing a source by title, position, current source, or best available default
  • Pause, resume, or stop playback
  • Move forward or backward in the current audio
  • Change playback speed between 0.75x and 2x
  • Tell you what source is currently active
  • List synced sources that are available on the phone
  • Refresh the synced audio library

The agent can enter source chat:

  • Start a voice review for the current source or another synced source
  • Pause the current TTS audio and prepare the source text for questions
  • Treat a spoken question during playback as a request to pause and answer from the current text
  • Continue a source-grounded conversation once review mode is active
  • End review mode and optionally return to audio playback
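
Because the agent chooses one app action per request, the app side can be modeled as a small state machine. The sketch below covers only a few of the actions listed above and leaves out the agent that picks them; `Action`, `AppState`, and `apply_action` are hypothetical names, not FeynmanLM's API.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Action(Enum):
    PLAY = "play"
    PAUSE = "pause"
    RESUME = "resume"
    STOP = "stop"
    START_REVIEW = "start_review"
    END_REVIEW = "end_review"


@dataclass(frozen=True)
class AppState:
    current_source: Optional[str] = None
    playing: bool = False
    in_review: bool = False


def apply_action(state: AppState, action: Action,
                 source: Optional[str] = None,
                 resume_audio: bool = False) -> AppState:
    """Apply exactly one agent-chosen action and return the new state."""
    if action is Action.PLAY:
        chosen = source or state.current_source
        if chosen is None:
            return state  # nothing to play; state is unchanged
        return AppState(current_source=chosen, playing=True, in_review=False)
    if action is Action.PAUSE or action is Action.STOP:
        return replace(state, playing=False)
    if action is Action.RESUME:
        return replace(state, playing=state.current_source is not None)
    if action is Action.START_REVIEW:
        chosen = source or state.current_source
        if chosen is None:
            return state  # no source to review
        # Entering review pauses the audio so questions can be answered from the text.
        return AppState(current_source=chosen, playing=False, in_review=True)
    if action is Action.END_REVIEW:
        # Optionally return to audio playback when review ends.
        back_to_audio = resume_audio and state.current_source is not None
        return replace(state, in_review=False, playing=back_to_audio)
    return state
```

A spoken question during playback would map to `START_REVIEW` for the current source, and "go back to listening" would map to `END_REVIEW` with `resume_audio=True`.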

Typical flows:

  • Start the iPhone app in the car and ask it to play an article.
  • While listening, ask to pause so you can talk about the article. FeynmanLM pauses playback, loads the synced source text, and waits for your question.
  • Ask any question about the text. The source chat answers from the synced source text and speaks the response aloud.
  • Continue asking follow-up questions naturally.
  • Ask to go back to listening when you are done reviewing.

Voice chat currently uses turn-based speech recognition and spoken responses. It is hands-free after launch, but it is not a full-duplex realtime audio stream.

The voice agent and source chat both require an OpenAI API key on the iPhone. Source chat also requires synced source text, which is written when cloud TTS finishes preparing the source on the Mac.

Good use cases

Read Aloud pairs especially well with Schedule:

  • Listen to today's assigned article
  • Pause to make a quick note or ask Chat a question
  • Start a Feynman review after listening
  • Mark the scheduled source complete after the review is linked