SamNav indexes public posts from Sam Altman's blog.
The source is the public site itself:
- the main blog
- the public archive
- the public Atom feed
- individual public post pages
The crawler first checks the public feed, then reads the archive by date. The feed is useful, but it only exposes the newest posts. Older posts are listed on month-based archive pages, so SamNav follows those public date routes to discover them.
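As a rough illustration of that discovery order, here is a minimal TypeScript sketch. The `/archive/YYYY/M` route shape and the names `YearMonth`, `monthlyArchiveUrls`, and `mergeDiscovered` are assumptions made for this example; the real archive URLs and the actual crawler code may look different.

```typescript
// A year and a 1-based month, used to bound the archive walk.
interface YearMonth {
  year: number;
  month: number; // 1 = January, 12 = December
}

// Build one archive URL per month from `start` to `end`, inclusive.
// ASSUMPTION: the `${base}/archive/${year}/${month}` pattern is hypothetical.
function monthlyArchiveUrls(base: string, start: YearMonth, end: YearMonth): string[] {
  const urls: string[] = [];
  let year = start.year;
  let month = start.month;
  while (year < end.year || (year === end.year && month <= end.month)) {
    urls.push(`${base}/archive/${year}/${month}`);
    month += 1;
    if (month > 12) {
      month = 1;
      year += 1;
    }
  }
  return urls;
}

// Combine post URLs found in the feed with those found on archive pages.
// Feed URLs come first, and any URL seen twice is kept only once.
function mergeDiscovered(feedUrls: string[], archiveUrls: string[]): string[] {
  return feedUrls
    .concat(archiveUrls)
    .filter((url, index, all) => all.indexOf(url) === index);
}
```

Deduplicating matters because the newest posts would normally appear both in the feed and on the most recent archive pages.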
Once post URLs are discovered, the indexing script fetches each public post page and extracts basic information:
- title
- publication date
- original URL
- canonical URL
- visible post text for local processing
- excerpt hints, when available
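The fields above could be captured in a record like the one below. This is only a sketch: the `CrawledPost` shape, the `extractPost` helper, and the choice of HTML elements to read (`<title>`, `<link rel="canonical">`, `<time datetime>`, `<meta name="description">`) are assumptions, and the regular expressions stand in for whatever HTML parsing the real indexing script uses.

```typescript
// Hypothetical shape of one locally stored crawl record.
interface CrawledPost {
  title: string | null;
  publishedAt: string | null; // raw datetime attribute, if one was found
  originalUrl: string;
  canonicalUrl: string;
  text: string;               // visible text, kept for local processing only
  excerptHint: string | null;
}

// Return the first capture group of `pattern` in `html`, trimmed, or null.
function firstMatch(html: string, pattern: RegExp): string | null {
  const match = pattern.exec(html);
  return match ? match[1].trim() : null;
}

// Pull basic fields out of a fetched page. Regex-based extraction is fragile
// and assumes a fixed attribute order; it is used here only to keep the
// example dependency-free.
function extractPost(html: string, originalUrl: string): CrawledPost {
  const canonical = firstMatch(
    html,
    /<link[^>]*rel=["']canonical["'][^>]*href=["']([^"']+)["']/i
  );
  const text = html
    .replace(/<(script|style|head)\b[\s\S]*?<\/\1>/gi, " ") // drop non-visible blocks
    .replace(/<[^>]+>/g, " ")                               // drop remaining tags
    .replace(/\s+/g, " ")                                   // collapse whitespace
    .trim();
  return {
    title: firstMatch(html, /<title[^>]*>([\s\S]*?)<\/title>/i),
    publishedAt: firstMatch(html, /<time[^>]*datetime=["']([^"']+)["']/i),
    originalUrl,
    // Fall back to the fetched URL when the page declares no canonical link.
    canonicalUrl: canonical || originalUrl,
    text,
    excerptHint: firstMatch(
      html,
      /<meta[^>]*name=["']description["'][^>]*content=["']([^"']*)["']/i
    ),
  };
}
```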
That local crawl data is then passed through an enrichment step. The enrichment script creates public-safe metadata: summaries, short excerpts, topics, reading time, word count, related-post references, previous and next links, and Navigation Severity Scores.
The public site is generated from that enriched metadata. Full post text is not included in the public data file or rendered pages.
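To make the split between local crawl data and public metadata concrete, here is a small sketch of the mechanical part of enrichment: word count, reading time, and previous/next links. The `LocalPost` and `PublicPost` shapes, the `enrich` function, and the 200 words-per-minute reading speed are all assumptions for illustration. Summaries, topics, related posts, and Navigation Severity Scores are left out because nothing here describes how they are computed.

```typescript
// Hypothetical local record: includes full text, never published.
interface LocalPost {
  slug: string;
  title: string;
  publishedAt: string; // ISO date such as "2024-01-15", so strings sort by time
  text: string;
}

// Hypothetical public record: derived metadata only, no `text` field.
interface PublicPost {
  slug: string;
  title: string;
  publishedAt: string;
  wordCount: number;
  readingTimeMinutes: number;
  previous: string | null; // slug of the next-older post
  next: string | null;     // slug of the next-newer post
}

// Count whitespace-separated words; an empty or blank string has zero.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Sort posts oldest-first and derive public metadata for each one.
// ASSUMPTION: 200 words per minute is an illustrative default.
function enrich(posts: LocalPost[], wordsPerMinute: number = 200): PublicPost[] {
  const ordered = posts.slice().sort((a, b) =>
    a.publishedAt < b.publishedAt ? -1 : a.publishedAt > b.publishedAt ? 1 : 0
  );
  return ordered.map((post, index) => {
    const wordCount = countWords(post.text);
    return {
      slug: post.slug,
      title: post.title,
      publishedAt: post.publishedAt,
      wordCount,
      // Round up, and never report less than one minute.
      readingTimeMinutes: Math.max(1, Math.ceil(wordCount / wordsPerMinute)),
      previous: index > 0 ? ordered[index - 1].slug : null,
      next: index < ordered.length - 1 ? ordered[index + 1].slug : null,
    };
  });
}
```

Because each `PublicPost` is built field by field rather than copied from the local record, the full post text cannot leak into the public data file by accident.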
Search is powered by Pagefind, which indexes the generated SamNav pages after the site is built. That means search covers SamNav's metadata, summaries, topics, paths, and page text, not the full original article bodies.
The site is built with Astro, TypeScript, Tailwind, and Pagefind. The production version is a static build served by Nginx from a Docker container.
Corrections, removals, and other concerns can be sent through the contact page.
In plain terms: SamNav follows the public links, adds a little structure, and sends readers back to the source.