Source and Method

SamNav indexes public posts from Sam Altman's blog.

The source is the public site itself:

the main blog
the public archive
the public Atom feed
individual public post pages

The crawler first checks the public feed, then reads the archive by date. The feed is useful, but it only exposes the newest posts. The full archive is organized through month-based archive pages, so SamNav follows those public date routes to discover older posts.
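The archive walk described above can be sketched roughly as follows. This is a minimal illustration, not SamNav's actual crawler: the `ARCHIVE_BASE` value and the `/YYYY/M` route shape are hypothetical placeholders, and the function names are invented for the example.

```typescript
// Hypothetical archive base and route shape; the real blog's URLs may differ.
const ARCHIVE_BASE = "https://blog.example.com/archive";

interface YearMonth {
  year: number;
  month: number; // 1 to 12
}

// List every month from `start` to `end`, inclusive, newest first.
function monthsBetween(start: YearMonth, end: YearMonth): YearMonth[] {
  const months: YearMonth[] = [];
  let year = end.year;
  let month = end.month;
  while (year > start.year || (year === start.year && month >= start.month)) {
    months.push({ year, month });
    month -= 1;
    if (month === 0) {
      month = 12;
      year -= 1;
    }
  }
  return months;
}

// Build the archive page URL for one month (assumed /YYYY/M pattern).
function archiveUrl(ym: YearMonth): string {
  return `${ARCHIVE_BASE}/${ym.year}/${ym.month}`;
}

// Merge feed URLs with archive URLs, keeping first-seen order and dropping duplicates.
function mergeDiscovered(feedUrls: string[], archiveUrls: string[]): string[] {
  const all = feedUrls.concat(archiveUrls);
  return all.filter((url, index) => all.indexOf(url) === index);
}
```

Checking the feed first and then merging means a post that appears in both places is fetched only once.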

Once post URLs are discovered, the indexing script fetches each public post page and extracts basic information:

title
publication date
original URL
canonical URL
visible post text for local processing
excerpt hints, when available
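As a rough sketch of the extraction step, the fields listed above might map onto a record like the one below. The `CrawledPost` shape, the helper names, and the regular expressions are assumptions made for illustration. A real indexer would more likely use an HTML parser, and the patterns here assume `rel` appears before `href` and `name` before `content`.

```typescript
// Illustrative record for one crawled post; field names are assumptions.
interface CrawledPost {
  title: string;
  publishedAt: string | null; // value of the first <time datetime>, if any
  originalUrl: string;
  canonicalUrl: string;
  text: string; // visible text, kept for local processing only
  excerptHint: string | null;
}

// Return the first capture group of `pattern` in `html`, trimmed, or null.
function firstMatch(html: string, pattern: RegExp): string | null {
  const m = pattern.exec(html);
  return m && m[1] !== undefined ? m[1].trim() : null;
}

// Drop script/style blocks and tags, then collapse whitespace.
function stripTags(html: string): string {
  return html
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

function extractPost(originalUrl: string, html: string): CrawledPost {
  const title = firstMatch(html, /<title[^>]*>([\s\S]*?)<\/title>/i);
  const canonical = firstMatch(
    html,
    /<link[^>]+rel=["']canonical["'][^>]*href=["']([^"']+)["']/i
  );
  const published = firstMatch(html, /<time[^>]+datetime=["']([^"']+)["']/i);
  const hint = firstMatch(
    html,
    /<meta[^>]+name=["']description["'][^>]*content=["']([^"']+)["']/i
  );
  const article = firstMatch(html, /<article[^>]*>([\s\S]*?)<\/article>/i);
  return {
    title: title === null ? "" : title,
    publishedAt: published,
    originalUrl,
    // Fall back to the fetched URL when the page declares no canonical link.
    canonicalUrl: canonical === null ? originalUrl : canonical,
    text: stripTags(article === null ? html : article),
    excerptHint: hint,
  };
}
```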

That local crawl data is then passed through an enrichment step. The enrichment script creates public-safe metadata: summaries, short excerpts, topics, reading time, word count, related-post references, previous and next links, and Navigation Severity Scores.
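Some of the simpler enrichment fields can be derived mechanically. The sketch below shows one plausible way to compute word count, reading time, and previous/next links. The 200 words-per-minute figure and all type and function names are assumptions, not values taken from SamNav.

```typescript
const WORDS_PER_MINUTE = 200; // assumed reading speed

interface PostStub {
  id: string;
  publishedAt: string; // ISO 8601 date, so string order matches date order
  text: string; // local-only post text
}

interface EnrichedPost {
  id: string;
  wordCount: number;
  readingMinutes: number;
  previousId: string | null;
  nextId: string | null;
}

// Count whitespace-separated words; empty text counts as zero.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Round up to whole minutes, with a floor of one minute.
function readingMinutes(wordCount: number): number {
  return Math.max(1, Math.ceil(wordCount / WORDS_PER_MINUTE));
}

// Sort oldest first, then link each post to its neighbors.
// The output carries derived metadata only, never the text itself.
function enrich(posts: PostStub[]): EnrichedPost[] {
  const ordered = posts.slice().sort((a, b) => {
    if (a.publishedAt < b.publishedAt) return -1;
    if (a.publishedAt > b.publishedAt) return 1;
    return 0;
  });
  return ordered.map((post, i) => {
    const wordCount = countWords(post.text);
    return {
      id: post.id,
      wordCount,
      readingMinutes: readingMinutes(wordCount),
      previousId: i > 0 ? ordered[i - 1].id : null,
      nextId: i < ordered.length - 1 ? ordered[i + 1].id : null,
    };
  });
}
```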

The public site is generated from that enriched metadata. Full post text is not included in the public data file or rendered pages.

A separate offline enrichment command sends each complete public article to OpenAI's Responses API to generate a structured semantic dossier. Each dossier contains paraphrased retrieval metadata such as a deeper synopsis, claims, answerable questions, concepts, caveats, and authorship context. Requests set store: false. Dossiers for unchanged posts are cached by content hash and configuration. Neither the article bodies nor the dossier file is served as part of the public site.
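Caching by content hash and configuration could look something like the sketch below. The `DossierConfig` fields and the function names are hypothetical, and the actual key format SamNav uses is not documented here. The point is only that a change to either the article text or the generation settings produces a new key.

```typescript
import { createHash } from "node:crypto";

// Hypothetical settings that should invalidate a cached dossier when changed.
interface DossierConfig {
  model: string;
  promptVersion: string;
}

// Derive a cache key from the article text and the generation settings.
function cacheKey(articleText: string, config: DossierConfig): string {
  const contentHash = createHash("sha256").update(articleText, "utf8").digest("hex");
  // Serialize fields in a fixed order so the same settings always hash the same.
  const configJson = JSON.stringify({
    model: config.model,
    promptVersion: config.promptVersion,
  });
  return createHash("sha256")
    .update(contentHash)
    .update("\n")
    .update(configJson)
    .digest("hex");
}

// A post needs a new dossier when no key is stored or the stored key differs.
function needsRefresh(
  storedKeys: Record<string, string>,
  postId: string,
  key: string
): boolean {
  return storedKeys[postId] !== key;
}
```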

Search is powered by Pagefind, which indexes the generated SamNav pages after the site is built. That means search covers SamNav's metadata, summaries, topics, paths, and page text, not the full original article bodies.

Ask Sam first creates a query embedding for the reader's question, then combines semantic similarity with a local keyword index and the archive's related-post and reading-path links. The resulting shortlist contains public-safe metadata and compact semantic dossiers for fourteen candidate posts by default. OpenAI's Responses API may choose IDs from that shortlist, order a route, assign navigational roles, and explain why each stop belongs. Titles, dates, topics, severity scores, SamNav links, and original source URLs are always attached afterward from the local catalog. Full post text is never sent with these live requests.
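The sketch below shows one way these two stages could work: a blended ranking that produces the shortlist, and a step that attaches catalog fields to whatever IDs the model returns. The 0.7 / 0.3 weights, the 0.1 link boost, and every type and function name are illustrative guesses. Only the default shortlist size of fourteen comes from the description above.

```typescript
const SHORTLIST_SIZE = 14; // default shortlist size stated in the text

interface IndexedPost {
  id: string;
  embedding: number[]; // precomputed vector for the post
  keywords: string[]; // lowercase terms from the local keyword index
  relatedIds: string[]; // related-post and reading-path links
}

interface CatalogEntry {
  title: string;
  date: string;
  sourceUrl: string;
}

type RouteStop = CatalogEntry & { id: string };

// Cosine similarity between two vectors; 0 when either has no magnitude.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Fraction of question terms found in a post's keyword list.
function keywordScore(question: string, keywords: string[]): number {
  const terms = question.toLowerCase().split(/\W+/).filter((t) => t !== "");
  if (terms.length === 0) return 0;
  return terms.filter((t) => keywords.indexOf(t) !== -1).length / terms.length;
}

// Rank posts by a weighted blend, then boost posts linked from the top hit.
function buildShortlist(
  question: string,
  queryEmbedding: number[],
  posts: IndexedPost[],
  size: number = SHORTLIST_SIZE
): string[] {
  const scored = posts.map((post) => ({
    post,
    score:
      0.7 * cosine(queryEmbedding, post.embedding) +
      0.3 * keywordScore(question, post.keywords),
  }));
  scored.sort((x, y) => y.score - x.score);
  if (scored.length > 0) {
    const linked = scored[0].post.relatedIds;
    for (const entry of scored) {
      if (linked.indexOf(entry.post.id) !== -1) entry.score += 0.1;
    }
    scored.sort((x, y) => y.score - x.score);
  }
  return scored.slice(0, size).map((entry) => entry.post.id);
}

// Keep only model-chosen IDs that were on the shortlist, in the model's
// order, and attach display fields from the local catalog.
function hydrateRoute(
  chosenIds: string[],
  shortlist: string[],
  catalog: Record<string, CatalogEntry>
): RouteStop[] {
  const route: RouteStop[] = [];
  for (const id of chosenIds) {
    const onShortlist = shortlist.indexOf(id) !== -1;
    const alreadyUsed = route.some((stop) => stop.id === id);
    const known = Object.prototype.hasOwnProperty.call(catalog, id);
    if (onShortlist && !alreadyUsed && known) {
      const entry = catalog[id];
      route.push({ id, title: entry.title, date: entry.date, sourceUrl: entry.sourceUrl });
    }
  }
  return route;
}
```

Because titles and URLs are looked up locally rather than taken from the model's output, an invented or off-list ID is simply dropped instead of being rendered.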

Before that process begins, the endpoint applies a per-client request limit, verifies a Cloudflare Turnstile challenge when production enforcement is enabled, and screens the question with OpenAI's moderation service. Production rate limits can be shared across application instances through a Redis-compatible Valkey service reached over TLS. Client network addresses are converted to one-way pseudonymous keys before rate-limit counters are stored, and raw questions are not written to the rate-limit store or ordinary logs.
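A minimal sketch of the pseudonymous-key idea, paired with a simple fixed-window counter, is shown below. It assumes an HMAC-SHA-256 with a server-side secret and an in-memory store. The actual hashing scheme, key prefix, window policy, and store used by SamNav are not specified here, and all names are invented for the example.

```typescript
import { createHmac } from "node:crypto";

// Turn a client address into a one-way key; the raw address is never stored.
// `secret` is a server-side value (hypothetical), so keys cannot be
// recomputed by someone who only knows the address.
function clientKey(address: string, secret: string): string {
  return "rl:" + createHmac("sha256", secret).update(address).digest("hex");
}

interface WindowState {
  windowStart: number; // ms timestamp when the current window opened
  count: number; // requests seen in the current window
}

// In-memory fixed-window limiter. A multi-instance deployment would keep
// these counters in a shared store instead of a local object.
class FixedWindowLimiter {
  private windows: Record<string, WindowState> = {};
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Return true if the request is allowed at time `now` (ms).
  allow(key: string, now: number): boolean {
    const state = this.windows[key];
    if (state === undefined || now - state.windowStart >= this.windowMs) {
      this.windows[key] = { windowStart: now, count: 1 };
      return true;
    }
    if (state.count >= this.limit) return false;
    state.count += 1;
    return true;
  }
}
```

Passing `now` in explicitly keeps the limiter deterministic, which makes it easy to exercise in tests without waiting on a real clock.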

The site is built with Astro, TypeScript, Tailwind, and Pagefind. Existing pages are generated as static HTML, while the Ask Sam endpoint is rendered on demand by Astro's standalone Node server in the production Docker container.

Corrections, removals, and other concerns can be sent to [email protected].

In plain terms: SamNav follows the public links, adds a little structure, and sends readers back to the source.