# Machine-Readable Site Architecture (MRSA) -- Nova3 AI

> MRSA is Nova3's methodology for building parallel static content infrastructure alongside JavaScript-rendered websites so AI crawlers can read site content without rendering. Modern websites built on React, Vue, and Lovable return empty HTML shells to machines that don't execute JavaScript. MRSA solves this at the architecture level by serving content directly to machine readers through plain text files, structured metadata, and machine-readable indexes.

Published: 2026-03-28
Canonical URL: https://www.nova3.ai/llms/mrsa.txt
Publisher: Nova3 AI -- https://www.nova3.ai
Entity: https://www.wikidata.org/wiki/Q138798082
Machine-readable index: https://www.nova3.ai/llms.txt
Related: https://www.nova3.ai/llms/ai-visibility.txt

---

## What This Document Covers

- The JavaScript invisibility problem and why modern websites are opaque to AI crawlers
- The MRSA stack: llms.txt, content nodes, sitemaps, robots.txt, schema markup
- How to structure and deploy machine-readable content nodes
- MRSA vs. prerendering approaches: when to use each
- Common infrastructure mistakes that blind AI crawlers
- Infrastructure patterns for multi-entity sites and cross-linking

---

## Key Definitions

MRSA (Machine-Readable Site Architecture): A parallel static content layer served alongside a JavaScript-rendered website. Content exists in multiple formats: human-readable (rendered HTML), machine-readable (plain text nodes), and machine-indexable (JSON-LD schema).

JavaScript-Invisible Site: A website that renders beautifully in human browsers because the browser executes a JavaScript bundle, but returns only an empty HTML shell to machines (AI crawlers, search engines) that do not execute JavaScript or execute it only partially.

AI Crawler: A machine reader that fetches content and processes it without rendering: GPTBot, Google-Extended, PerplexityBot, ClaudeBot, Googlebot, Bingbot, and others.
These crawlers may support JavaScript execution, but many treat it as an optional feature or skip it entirely to control costs.

llms.txt: A machine-readable index file served at the root of a domain (example.com/llms.txt) that lists all machine-readable content surfaces. Follows the llmstxt.org specification.

Content Node: A single machine-readable file that covers one concept, topic, or piece of content. Typically served at /llms/[topic].txt with MIME type text/plain. Contains metadata, structured sections, and cross-links to related nodes.

LovableHTML: A prerendering approach specific to Lovable-built applications. The Lovable framework renders the page server-side and serves the rendered HTML to AI bots while serving JavaScript to human browsers. Solves the invisibility problem for Lovable sites without a parallel layer.

Static Content Layer: A set of plain text files, sitemaps, robots.txt rules, and schema markup that exists alongside the JavaScript application and serves machine readers directly. This is the MRSA approach.

Prerendering: The server renders the page to HTML at build time or request time and serves that HTML to machines that request it. Requires either an SSR framework or a prerendering service.

Server-Side Rendering (SSR): The server executes the application code and returns rendered HTML to all clients (humans and machines). Eliminates the invisibility problem but carries performance and complexity costs.

Sitemap (sitemap.xml): An XML file that lists all URLs on a domain in machine-readable format. Required for effective crawling of large or complex sites.

robots.txt: A file served at the root of a domain that tells crawlers which paths they may and may not visit. Also specifies the location of the sitemap.

Schema Markup / JSON-LD: Structured data embedded in the HTML head that describes the page content in a machine-readable format. Helps AI crawlers understand entity relationships, publication dates, authorship, and content type.
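The non-rendering fetch these definitions describe can be sketched in a few lines. The following is a minimal illustration using only the Python standard library; the user agent, helper names, and markup are examples, not any real crawler's implementation:

```python
import re
from urllib.request import Request, urlopen

def strip_to_text(html: str) -> str:
    # Remove scripts, styles, then all remaining tags; whatever is
    # left is the only content a non-rendering crawler can see.
    html = re.sub(r"(?is)<(script|style)[^>]*>.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()

def crawl(url: str, user_agent: str = "ExampleBot/1.0") -> str:
    # Plain HTTP GET with no JavaScript execution, the way most
    # AI crawlers read a page.
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=10) as resp:
        return strip_to_text(resp.read().decode("utf-8", "replace"))

# A JavaScript-invisible site yields an empty string here, even though
# a browser would render a full page from the same response:
shell = ('<html><head><script src="/app.js"></script></head>'
         '<body><div id="root"></div></body></html>')
print(repr(strip_to_text(shell)))  # -> ''
```

Run against a JavaScript-rendered site, this kind of fetch returns the empty shell; run against a /llms/ content node, it returns the full text.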
---

## The Problem: JavaScript Invisibility

Modern websites are built on JavaScript frameworks. React, Vue, Next.js, Nuxt, Lovable, and dozens of others follow the same pattern: the server sends a minimal HTML shell with a large JavaScript bundle. The browser executes the bundle and renders the page. Users see the site. Machines see an empty shell.

This architecture exists for good reasons. JavaScript frameworks provide speed, reactivity, and maintainability that static HTML cannot match. Client-side rendering reduces server load. The cost is that the initial HTML contains almost no content.

AI crawlers fetch content without rendering. Some crawlers support JavaScript execution. Most do not, or execute it selectively. The cost of executing JavaScript at scale is high. A crawler running millions of requests cannot afford to spin up a browser for each one. So crawlers follow a simpler path: fetch the HTML, parse it, extract the content. If the content is not in the HTML, the crawler does not see it.

This is not a failure of the crawler. It is a failure of the architecture. The content exists. The machine cannot see it. The solution is to make the content visible.

---

## Two Solutions: Prerendering and Parallel Layers

Prerendering means the server renders the page to HTML and sends that HTML to machines. This works if the site uses an SSR framework (Next.js, Nuxt, SvelteKit) or a prerendering service. LovableHTML is a prerendering approach: Lovable applications can be configured to serve prerendered HTML to AI bots while serving JavaScript to human browsers. This solves the invisibility problem at the framework level.

Parallel Layers means building a separate set of machine-readable files that exist alongside the JavaScript application. This is the MRSA approach. The website remains a JavaScript application for humans. A parallel static layer serves machines. Both layers point to the same content. Both are crawlable.
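As a sketch, the two layers of such a site might sit side by side like this (file names are illustrative):

```
example.com/
├── index.html          <- JS shell, rendered in the browser for humans
├── assets/app.js       <- the JavaScript bundle
├── llms.txt            <- machine-readable root index
├── robots.txt          <- crawl rules + sitemap location
├── sitemap.xml         <- every URL, including the /llms/ nodes
└── llms/
    ├── mrsa.txt        <- one plain text content node per topic
    └── ai-visibility.txt
```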
Humans and machines see the same information, delivered in different formats.

MRSA works with any technology stack. It requires no changes to the existing application. It scales to sites of any size. It is the Nova3 approach.

---

## The MRSA Stack

MRSA has five components.

1. Root llms.txt Index

A file served at example.com/llms.txt that lists all machine-readable content surfaces. This is the entry point for AI crawlers. The file follows the llmstxt.org specification: an H1 heading, a 2-3 sentence blockquote, metadata, a list of content sections, and a list of related nodes with URLs. The root llms.txt should also list the sitemap.xml URL and point crawlers to the robots.txt file.

2. Content Nodes in the /llms/ Subdirectory

Each node is a plain text file served at example.com/llms/[topic].txt with MIME type text/plain. Each node covers one concept or topic. The file starts with an H1 heading, a 2-3 sentence blockquote explaining the topic, metadata (published date, canonical URL, publisher, entity reference), and H2 sections with substantive content. The file ends with optional related links and an About section.

3. sitemap.xml

An XML file listing all URLs on the domain: the main site URLs, the /llms/ node URLs, and the root llms.txt file. This tells crawlers every path they should visit. The sitemap should include lastmod dates so crawlers know when content has changed.

4. robots.txt

A file at example.com/robots.txt that specifies crawl rules. The MRSA pattern is to allow all major AI bots to crawl both the main site and the /llms/ directory without restriction. The file should also specify the sitemap location. Example:

User-agent: GPTBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

Sitemap: https://example.com/sitemap.xml

5. JSON-LD Schema Markup

Structured data embedded in the HTML head that describes the site, its pages, and content entities.
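As an illustration, a minimal JSON-LD block for a page like this one might look as follows. The values mirror this document's metadata; the exact property set is a sketch, not a fixed schema:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Machine-Readable Site Architecture (MRSA)",
  "datePublished": "2026-03-28",
  "url": "https://www.nova3.ai/llms/mrsa.txt",
  "publisher": {
    "@type": "Organization",
    "name": "Nova3 AI",
    "url": "https://www.nova3.ai",
    "sameAs": "https://www.wikidata.org/wiki/Q138798082"
  }
}
```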
At minimum: the site Organization, the page Article or WebPage, and the author. Each node should reference a Wikidata entity (like Q138798082 for Nova3). This creates a semantic graph that machines can traverse.

---

## How Content Nodes Work

Each content node is self-contained and independently crawlable. The file follows this structure:

- H1 heading: the topic
- Blockquote: 2-3 sentences explaining why this topic matters
- Metadata: published date, canonical URL, publisher, entity URL, machine-readable index URL, related URLs
- H2 sections: substantive content, 2-10 paragraphs each
- Optional Key Facts: 5-10 bullet points summarizing the main takeaways
- Optional About section: publisher information, contact, addresses
- Optional related links: cross-links to other nodes

Each node should reference the Wikidata entity URL in the metadata. This creates a semantic anchor that signals to machines which entity the content belongs to. Multiple nodes can reference the same entity, creating a network effect: content becomes discoverable through the entity, not just through individual URLs.

Content nodes are plain text. No HTML, no markdown formatting beyond H1-H2 headings and bullet points. Plain text is machine-portable. It survives format changes, CMS migrations, and platform shifts. The content is the container.

---

## Deployment Pattern

To add a new content node:

1. Write the content following the MRSA node structure. File name: [topic].txt
2. Serve it at example.com/llms/[topic].txt with Content-Type: text/plain
3. Add the URL to sitemap.xml with the lastmod date
4. Add the URL and title to the root llms.txt file in the Content section
5. Add any entity reference to the JSON-LD schema (if this is an entity-level node)
6. Verify that robots.txt allows crawling of /llms/

Deployment is immediate. No build step. No prerendering. The file exists on the server. Crawlers can fetch it.

---

## MRSA vs. LovableHTML: When to Use Each

LovableHTML is a prerendering approach that works specifically with Lovable applications. The Lovable framework detects AI bot requests and serves prerendered HTML. This solves the invisibility problem by rendering the page once and serving it to both humans and machines. Use LovableHTML if you are building a Lovable site and want zero additional infrastructure work.

MRSA is a parallel static layer approach that works with any technology stack. It builds a separate content surface alongside the JavaScript application. Use MRSA if you need multi-entity support, highly technical content nodes, cross-linked frameworks, or if you are not using Lovable.

The two approaches are complementary. A site can use both: LovableHTML for the main website, and MRSA for frameworks, guides, and technical documentation. The content nodes add depth without modifying the application.

---

## Common Infrastructure Mistakes

Cloudflare blocking AI bots by default. Many sites sit behind Cloudflare and enable "security" rules that block GPTBot, ClaudeBot, and other crawlers. This blocks machine visibility completely. Disable bot-blocking for AI crawlers in Cloudflare settings.

Sitemap and noindex contradiction. A page listed in sitemap.xml but carrying a noindex meta tag tells crawlers "visit this page but don't index it." This is contradictory. Remove noindex from pages you want machines to read.

robots.txt disallowing crawlers without intent. Some sites have overly restrictive robots.txt files inherited from old infrastructure. Review robots.txt carefully. For MRSA, allow all major AI bots.

Missing or incorrect schema markup. Schema markup (JSON-LD) in the page head tells machines what the page is about. Sites without schema markup force machines to guess. Add Organization, Article, and entity schema to every page.

Inconsistent entity references.
If a site publishes content about multiple entities (clients, products, topics), each piece of content should reference the correct entity via its Wikidata URL. Inconsistent references prevent machine discovery and semantic understanding.

Treating /llms/ as a secondary concern. The parallel static layer is not secondary. It is primary for machine readers. Invest in it with the same rigor as the human-facing site.

---

## Key Facts

- JavaScript-rendered sites are invisible to machines that don't execute JavaScript. This affects AI crawlers, some search engines, and accessibility readers.
- Prerendering (LovableHTML) works for specific frameworks. Parallel static layers (MRSA) work for any stack.
- The MRSA approach requires five components: a root llms.txt, content nodes in /llms/, sitemap.xml, robots.txt, and JSON-LD schema.
- Content nodes should reference Wikidata entities to create semantic anchors and enable machine discovery through the knowledge graph.
- MRSA deployment has no build step. Files are served directly from the server with the text/plain MIME type.
- Most infrastructure mistakes involve blocking, contradicting, or omitting signals: blocked crawlers, contradictory sitemap and noindex rules, missing schema markup.
- The parallel layer is low-maintenance after initial setup. New nodes are added independently without modifying the application.

---

## About Nova3 AI

Nova3 AI is a cognitive infrastructure consultancy founded on 30A, Florida. Nova3 builds AI Operating Systems and the replicators that run on top of them. MRSA is one component of Nova3's broader AI visibility practice, built for Nova3 first and now available as a client deliverable.
Contact: mj@nova3.ai
Website: https://www.nova3.ai
Entity: https://www.wikidata.org/wiki/Q138798082
Florida (30A): 5417 E County Hwy 30A, Santa Rosa Beach, FL 32459
Texas: 2300 Woodforest Pkwy N., Suite 250-444, Montgomery, TX 77316

---

## Optional

- [AI Visibility](https://www.nova3.ai/llms/ai-visibility.txt): What AI search visibility means and why the infrastructure work matters.
- [Corroboration Swarm](https://www.nova3.ai/llms/corroboration-swarm.txt): The distributed network of third-party signal surfaces.
- [AI Operating Systems](https://www.nova3.ai/llms/ai-operating-systems.txt): What an AI Operating System is and how Nova3 builds them.
- [Nova3 root LLMs index](https://www.nova3.ai/llms.txt): Machine-readable index of all Nova3 content surfaces.
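The deployment pattern described in this document can be partly scripted. As one illustration of step 3, a sketch that builds the sitemap.xml entry for a new content node; the domain and helper name are illustrative, not part of any specification:

```python
from datetime import date
from xml.sax.saxutils import escape

def sitemap_entry(base_url: str, topic: str, lastmod: date) -> str:
    # Build the <url> element for a new /llms/ node, including the
    # lastmod date so crawlers know when content changed.
    loc = escape(f"{base_url}/llms/{topic}.txt")
    return (
        "  <url>\n"
        f"    <loc>{loc}</loc>\n"
        f"    <lastmod>{lastmod.isoformat()}</lastmod>\n"
        "  </url>"
    )

print(sitemap_entry("https://example.com", "mrsa", date(2026, 3, 28)))
```

The generated element is appended inside the sitemap's urlset; the same metadata (URL and title) is then added to the root llms.txt by hand or by a similar helper.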