Layer 2 is the AI-generated atmospheric drone: the F-minor pentatonic shimmer that sits under the bowls and carries the harmonic field. It's the layer that lets us say "never the same audio twice." The integration is built end-to-end and gated on a single environment variable. Set STABLE_AUDIO_API_KEY on the server and Layer 2 flips from COMING SOON to LIVE in the Recipe Inspector, with no other change required.

Why AI for this layer

The brief is wrong if it says "use AI for everything." The brief is right when it says use AI for the layer where infinite variation matters and the listener has no strong prior about what it should sound like. Ambient pads fit. Singing bowls don't.

Concretely: every AmberRoom session generates a unique pad. The recipe engine takes the intent's parameters (key, mode, harmonic density, noise correlation) and constructs a prompt for Stable Audio. The model is called once per session, the result is cached against the prompt hash, and the audio is streamed into Howler alongside Layers 1 and 3.
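As a rough illustration of the prompt-construction step, here is a minimal sketch. The `PadRecipe` shape, the 0-to-1 scales, the thresholds, and the prompt wording are all assumptions made for the example; only the four parameter names come from the description above.

```typescript
// Hypothetical recipe shape: only the four parameter names come from the text.
interface PadRecipe {
  key: string;              // e.g. "F"
  mode: string;             // e.g. "minor pentatonic"
  harmonicDensity: number;  // assumed scale: 0 (sparse) to 1 (dense)
  noiseCorrelation: number; // assumed scale: 0 to 1
}

// Map numeric parameters onto descriptive words and assemble a text prompt.
// The thresholds and vocabulary here are illustrative, not the production values.
function buildPadPrompt(r: PadRecipe): string {
  const density =
    r.harmonicDensity < 0.34 ? "sparse" :
    r.harmonicDensity < 0.67 ? "moderately layered" : "dense";
  const texture =
    r.noiseCorrelation < 0.5 ? "airy, diffuse texture" : "smooth, cohesive texture";
  return `ambient drone pad in ${r.key} ${r.mode}, ${density} harmonics, ` +
         `${texture}, no percussion, no vocals`;
}
```

Because the function is pure, the same recipe always yields the same prompt string, which is what makes caching against a prompt hash workable.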

The point of generation isn't novelty for novelty's sake. It's that the brain develops tolerance to repeated audio — what habituation researchers call "the loop tune-out" — and a subtly different pad each session keeps the listener engaged across multi-week use.

Why Stable Audio specifically

We evaluated four AI music generation options against three criteria: license cleanliness, ambient/instrumental quality, and API availability for production use.

Stable Audio (Stability AI). Clean commercial license, designed for instrumental and ambient generation, mature API, controllable via text prompt + duration + seed. Recommended primary.
ElevenLabs Music. Clean enterprise license, quality competitive with Stable. Same vendor that handles voice (Layer 4), so single-vendor simplicity if budget allows. Worth A/B testing in production.
Suno via API. Pro tier with Warner license deal, but evolving terms and ongoing label legal action. Higher song-output quality than Stable but biased toward vocals-with-lyrics, not the ambient pads we need. Avoid for production.
MusicGen / Riffusion (open source). Free to self-host. Quality is below Stable but close enough for a budget-constrained backup. Worth keeping as a fallback if Stable Audio pricing or terms change.

Cost and caching

Per-minute generation runs roughly $0.05–$0.15 via Stable Audio's API, so a 30-minute session generated fresh would cost $1.50–$4.50: workable but not free. Aggressive caching brings this down dramatically:

Hash-based cache. Cache key = SHA-256 of (intent + length + key + seed). If a different user requests the same parameters, we serve the cached file. Most sessions cache-hit within a week.
Pre-generation queue. Background job generates the next-likely-needed pads during off-peak hours. Common combinations (anxiety + 30 min, sleep + 60 min) are pre-warmed.
Per-user freshness. A returning user gets a different cached pad each session — variety without re-billing. Once they exhaust the cache for their parameter set, we generate a new one.
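The hash-based key and the per-user freshness rule can be sketched as follows. The field names, the JSON canonicalization, and the `nextFreshSeed` helper are assumptions for illustration; the text only specifies SHA-256 over (intent + length + key + seed). `createHash` is the standard Node.js `crypto` call.

```typescript
import { createHash } from "node:crypto";

// Hypothetical parameter shape for one pad request.
interface PadParams {
  intent: string;        // e.g. "sleep"
  lengthMinutes: number; // e.g. 60
  key: string;           // musical key, e.g. "F"
  seed: number;
}

// SHA-256 over a canonical encoding of the four fields. Serializing as a
// JSON array keeps field boundaries unambiguous, so ("a", "bc") and
// ("ab", "c") cannot collide.
function padCacheKey(p: PadParams): string {
  const canonical = JSON.stringify([p.intent, p.lengthMinutes, p.key, p.seed]);
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

// Per-user freshness: return the first cached seed this user has not heard.
// null means the cache is exhausted for this user and a new pad is needed.
function nextFreshSeed(cachedSeeds: number[], heardSeeds: Set<number>): number | null {
  for (const seed of cachedSeeds) {
    if (!heardSeeds.has(seed)) return seed;
  }
  return null;
}
```

Two users requesting identical parameters produce the same key and therefore share one generated file, while `nextFreshSeed` only triggers a billed generation once a user has heard every cached seed for that parameter set.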

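Realistic blended cost per session at scale: $0.05–$0.20. At $9/mo with 20 sessions per month per Pro user, that is $1–$4 of audio cost per user per month, so gross margin on the audio layer alone runs from roughly 56% to 89%, and stays at 70%+ as long as the blended cost is at or below about $0.135 per session.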
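The margin arithmetic is easy to check with a small helper. The function names are hypothetical; the inputs are the $9/mo price, 20 sessions per month, and the blended per-session cost quoted above.

```typescript
// Gross margin on the audio layer as a fraction of monthly revenue.
function audioGrossMargin(
  pricePerMonth: number,
  sessionsPerMonth: number,
  costPerSession: number,
): number {
  const monthlyCost = sessionsPerMonth * costPerSession;
  return (pricePerMonth - monthlyCost) / pricePerMonth;
}

// Highest blended cost per session that still meets a target margin.
function maxCostPerSession(
  pricePerMonth: number,
  sessionsPerMonth: number,
  targetMargin: number,
): number {
  return (pricePerMonth * (1 - targetMargin)) / sessionsPerMonth;
}
```

With these inputs, `audioGrossMargin(9, 20, 0.05)` comes out near 0.89 and `audioGrossMargin(9, 20, 0.20)` near 0.56, while `maxCostPerSession(9, 20, 0.7)` is about 0.135. A 70% margin therefore depends on the cache hit rate keeping blended cost in the lower part of the quoted range.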

How it's wired

The server endpoint at /api/pad?intent=…&seed=… takes the intent and a seed index, builds the Stable Audio prompt, and returns either the cached MP3 or freshly generated audio.
Disk cache lives at .next/cache/pad/{intent}_{seed}.mp3. Cache hits don't count toward the daily generation budget (default 80/day, tunable via PAD_DAILY_BUDGET).
The client-side pad runner decodes the MP3 into an AudioBuffer, then loops with a 4-second equal-power crossfade so there's no audible seam.
The orchestrator routes the pad through the same convolution reverb chain as bowls + binaural + noise, then to destination.
Per-intent, four prompt variants cycle deterministically by session id — the listener feels variety while the cache stays small (~32 generations to warm the entire library).
If STABLE_AUDIO_API_KEY is unset, generation fails, decode fails, or the daily budget is exhausted, the pad layer silently no-ops and the rest of the chain continues. The Recipe Inspector reads padReady from the server-rendered recipe and shows COMING SOON until the key is configured.
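Two pieces of the wiring above lend themselves to a short sketch: the equal-power crossfade curve and the deterministic variant pick. Both functions are illustrative assumptions, not the shipped code. In particular, hashing the session id with an FNV-1a style hash is just one way to make the rotation deterministic; if session ids are numeric, a plain modulo would do the same job.

```typescript
const CROSSFADE_SECONDS = 4; // crossfade length stated in the text

// Equal-power crossfade: the outgoing gain follows cos and the incoming gain
// follows sin over a quarter period, so fadeOut^2 + fadeIn^2 stays at 1 and
// perceived loudness holds steady through the loop seam.
function equalPowerGains(
  elapsedSeconds: number,
  durationSeconds: number = CROSSFADE_SECONDS,
): { fadeOut: number; fadeIn: number } {
  const t = Math.min(Math.max(elapsedSeconds / durationSeconds, 0), 1);
  return {
    fadeOut: Math.cos((t * Math.PI) / 2),
    fadeIn: Math.sin((t * Math.PI) / 2),
  };
}

// Deterministic variant pick: an FNV-1a style 32-bit hash over the session
// id's UTF-16 code units, reduced modulo the number of prompt variants.
function variantIndex(sessionId: string, variantCount: number = 4): number {
  let h = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < sessionId.length; i++) {
    h ^= sessionId.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return h % variantCount;
}
```

In a Web Audio graph, the two gain values would typically be sampled into curves and applied to a pair of GainNodes feeding the same reverb input. A linear crossfade, by contrast, dips in loudness at the midpoint, which is why the equal-power shape is the usual choice for uncorrelated material such as pads.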

Status

Built and end-to-end testable. The orchestrator wiring, server endpoint, disk cache, daily budget guard, prompt-variant rotation, and crossfade loop are all in place. The remaining work is operational: provision a Stability AI API key, set the env variable on the server, and warm the cache by hitting the endpoint once per (intent, seed) combination. At Stable Audio's pricing, fully warming the library is under $5.
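For the cache-warming step, a small helper can enumerate one URL per (intent, seed) combination. This is a sketch under assumptions: the base URL, the intent list, and zero-based seed numbering are placeholders, and only the /api/pad path and its intent and seed query parameters come from the text.

```typescript
// Build one warm-up URL per (intent, seed) pair.
// Assumes seeds are numbered 0..seedsPerIntent-1, which the text does not state.
function warmUrls(baseUrl: string, intents: string[], seedsPerIntent: number): string[] {
  const urls: string[] = [];
  for (const intent of intents) {
    for (let seed = 0; seed < seedsPerIntent; seed++) {
      const url = new URL("/api/pad", baseUrl);
      url.searchParams.set("intent", intent);
      url.searchParams.set("seed", String(seed));
      urls.push(url.toString());
    }
  }
  return urls;
}
```

Requesting each URL once, sequentially, populates the disk cache. With four variants per intent and roughly 32 generations in total, a full warm-up fits inside the default daily budget of 80.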

The ambient pad. Different every time.
