HWD-228 · Sovereign Stream Service v1

Pure Shaka plays its own music now.

Code we wrote, encoder we packaged, bucket we own, URLs we signed, receipts we can produce on demand. No third party in the path between the synth and the customer's ear. Two minutes of the actual stream is sitting in the player below.

Live output · Pure Shaka session
Two minutes of synthesized reggae, end-to-end through real AWS.
tenant=pureshaka-sandbox · seed=808 · 4 segments × 30s · AAC 128kbps · 44.1 kHz stereo
synth: ReggaeGenerator v1.0.0 · encode: ffmpeg AAC-LC · container: MPEG-TS · receipts: 6 signed · cost: ~$0.0036

01 · The problem
Why we're building this

Streaming substrates are everywhere. Spotify, YouTube Music, Apple Music, plus a dozen middleware vendors that resell their pipes. Every one of them is someone else's runtime. Pure Shaka is a Hawaiian wellness brand. If their store plays reggae, that reggae should not pass through five other companies' infrastructure on the way to the customer's ear.

So HWD-228 is the first piece of the sovereign streaming substrate. v1 ships continuous reggae generated by a programmatic synth (no external models, no API calls), encoded to AAC inside MPEG-TS, and delivered as real HLS through presigned S3 URLs. Every segment that lands in S3 gets a matching receipt in howdify-receipts, signed with HMAC-SHA256: the seed, the SHA256 of the encoded audio bytes, the generator version, the timestamp, all signed.
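For readers new to HLS: the player consumes a media playlist, which is just a text index of segment URIs. A minimal sketch of one, assuming fixed 30-second segments and URIs that are already signed; `build_media_playlist` is a hypothetical helper written for illustration, not the service's manifest code:

```python
import math
from typing import Sequence


def build_media_playlist(segment_uris: Sequence[str],
                         segment_seconds: float = 30.0,
                         media_sequence: int = 0,
                         ended: bool = False) -> str:
    """Render an HLS media playlist (RFC 8216) for fixed-length segments.

    Hypothetical helper: the real service's manifest code may differ.
    """
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3 permits decimal EXTINF durations
        # TARGETDURATION must be an integer >= every segment's rounded duration.
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_seconds)}",
        # Sequence number of the first segment listed in this playlist.
        f"#EXT-X-MEDIA-SEQUENCE:{media_sequence}",
    ]
    for uri in segment_uris:
        lines.append(f"#EXTINF:{segment_seconds:.3f},")
        lines.append(uri)
    if ended:
        # Only a finished session gets ENDLIST; a live one keeps growing.
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```

A live session would call this without `ended=True` and re-render as new segments land, so the player keeps polling for more.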

v2 swaps in Stable Audio Open behind the same interface. v3 adds brand-LoRA conditioning. v1 is the substrate that has to be right for all three.


02 · The architecture
The six primitives

Six source files. Each one is the minimum surface for its job. Together they compose into the streaming substrate.

generators/base.py         Generator interface. v2 swap-in lives here.
generators/synth_reggae.py ReggaeGenerator. Bar-aligned chord progression
                           with deterministic noise (snare, hihat) per seed.
encoder.py                 ffmpeg subprocess. PCM in via stdin, AAC-in-
                           MPEG-TS out via stdout. No temp files.
storage.py                 S3 segment writer + presigned URL with 5min TTL.
receipts.py                HMAC-SHA256 signing into howdify-receipts.
                           receipt_kind differentiates stream classes.
session.py                 SessionController. Composes the above into a
                           segment lifecycle: start, produce, stop.
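The encoder.py row describes an in-memory ffmpeg pipe. A minimal sketch of that shape, assuming interleaved signed 16-bit little-endian PCM at 44.1 kHz stereo and the 128 kbps target from the session above; `ffmpeg_cmd` and `encode_segment` are hypothetical names, and only standard ffmpeg flags are used:

```python
import subprocess
from typing import List

SAMPLE_RATE = 44_100
CHANNELS = 2


def ffmpeg_cmd(sample_rate: int = SAMPLE_RATE, channels: int = CHANNELS,
               bitrate: str = "128k") -> List[str]:
    """ffmpeg argv: raw PCM on stdin -> AAC in MPEG-TS on stdout."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        # Input is headerless s16le PCM, so rate and channel count must be explicit.
        "-f", "s16le", "-ar", str(sample_rate), "-ac", str(channels),
        "-i", "pipe:0",
        # ffmpeg's native AAC encoder, which defaults to the LC profile.
        "-c:a", "aac", "-b:a", bitrate,
        "-f", "mpegts", "pipe:1",
    ]


def encode_segment(pcm: bytes, sample_rate: int = SAMPLE_RATE,
                   channels: int = CHANNELS, bitrate: str = "128k") -> bytes:
    """Encode one segment entirely in memory; no temp files touch disk."""
    frame_bytes = 2 * channels  # 16-bit samples, interleaved per frame
    if len(pcm) % frame_bytes:
        raise ValueError(f"PCM length {len(pcm)} is not a multiple of {frame_bytes}")
    # run() feeds stdin and drains stdout concurrently, so a large segment
    # cannot deadlock on a full pipe buffer.
    proc = subprocess.run(ffmpeg_cmd(sample_rate, channels, bitrate),
                          input=pcm, capture_output=True, check=True)
    return proc.stdout
```

`check=True` turns a non-zero ffmpeg exit into `CalledProcessError`, which fits the session's "exception propagates" contract rather than silently uploading an empty segment.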

The two design rules that drove the layout:


03 · The interface call
The stress test that forced a revision

Before wiring the deployment layer, we ran a pre-flight: mentally swap in Stable Audio Open behind the Generator interface. Does (seed, params) → (ndarray, sample_rate, metadata) carry without contortion?

It did not. The original produce() signature was produce(seed, start_bar, segment_bars). Bar-counted positioning. Stable Audio Open does not have bars. It has continuous time. And the SessionController above the generator was computing bars_per_segment from bpm, which is meaningless to a diffusion model. Reggae-specific logic was leaking up into the orchestration layer.

So we revised the interface before there was more than one implementor. produce(*, seed, segment_index, segment_seconds). Universal. Each generator translates internally. ReggaeGenerator derives bars from segment_seconds × bpm / 240. StableAudioGenerator will derive start_seconds = segment_index × segment_seconds and pass it to the diffusion model as positional conditioning.
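A minimal sketch of the revised contract and the bar translation, assuming 4/4 time (which is where the 240 comes from: 60 seconds per minute times 4 beats per bar) and a default of 80 bpm. `BarClockGenerator` is a hypothetical stand-in that emits placeholder noise instead of instruments, and keying the RNG on `[seed, segment_index]` is an assumption about how per-segment determinism could be achieved:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple

import numpy as np


class Generator(ABC):
    """Contract: (seed, position) in; (audio, sample_rate, metadata) out."""

    @abstractmethod
    def produce(self, *, seed: int, segment_index: int,
                segment_seconds: float) -> Tuple[np.ndarray, int, Dict[str, Any]]:
        ...


class BarClockGenerator(Generator):
    """Hypothetical stand-in for ReggaeGenerator's internal translation."""

    def __init__(self, bpm: float = 80.0, sample_rate: int = 44_100):
        self.bpm = bpm
        self.sample_rate = sample_rate

    def produce(self, *, seed, segment_index, segment_seconds):
        # seconds * bpm / 60 = beats; beats / 4 = bars (4/4 assumed).
        bars_exact = segment_seconds * self.bpm / 240
        if not float(bars_exact).is_integer():
            raise ValueError("segment_seconds must hold a whole number of bars")
        segment_bars = int(bars_exact)
        start_bar = segment_index * segment_bars  # bars stay internal
        # Same (seed, segment_index) -> same noise, independent of call order.
        rng = np.random.default_rng([seed, segment_index])
        frames = int(round(segment_seconds * self.sample_rate))
        audio = (0.01 * rng.standard_normal((frames, 2))).astype(np.float32)
        meta = {"start_bar": start_bar, "segment_bars": segment_bars}
        return audio, self.sample_rate, meta
```

At 80 bpm a 30-second segment is exactly 10 bars, so segment_index=2 starts at bar 20. At 75 bpm it would be 9.375 bars, which this sketch rejects rather than letting a bar straddle a seam.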

The lesson: the right time to do the interface revision is when there is exactly one implementor. Wait until two and the blast radius doubles. The pre-deploy stress test is a five-minute exercise that paid for itself in the same hour.

Session-level config goes on the constructor, not on produce(). For v2 that means brand_lora_id, text_prompt, negative_prompt, and cfg_scale are constructor args. The produce() call stays clean: seed plus position. That separation is the load-bearing line between "tenant configuration" and "stream position."
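A minimal sketch of that separation for v2, assuming a frozen dataclass for the tenant configuration. `SessionConfig`, `StableAudioGeneratorSketch`, and the keys of the returned dict are hypothetical; the sketch stops where a real implementation would hand the request to the diffusion model:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class SessionConfig:
    """Tenant configuration, fixed for the life of a session (hypothetical type)."""
    text_prompt: str
    negative_prompt: str
    cfg_scale: float
    brand_lora_id: Optional[str] = None


class StableAudioGeneratorSketch:
    """Hypothetical v2 shape: config on the constructor, position on produce()."""

    def __init__(self, config: SessionConfig, sample_rate: int = 44_100):
        self.config = config
        self.sample_rate = sample_rate

    def conditioning(self, *, seed: int, segment_index: int,
                     segment_seconds: float) -> Dict[str, Any]:
        # Continuous time is derived here, inside the generator, so the
        # SessionController never learns how a given model measures position.
        return {
            "seed": seed,
            "start_seconds": segment_index * segment_seconds,
            "duration_seconds": segment_seconds,
            "prompt": self.config.text_prompt,
            "negative_prompt": self.config.negative_prompt,
            "cfg_scale": self.config.cfg_scale,
            "lora_id": self.config.brand_lora_id,
        }
```

Freezing the config makes the "tenant configuration" side immutable for the session, so the only thing that varies call to call is seed plus position.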


04 · The proof
Two minutes of music, real AWS, one work session

The player at the top of this page is not a mockup. It is the literal output of four SessionController.produce_segment() calls in a loop, run against the production howdify-streams-lab bucket and the howdify-receipts ledger. Here is the actual run log:

session_id = song-1779860899-2263f7
seed       = 808
segments   = 4 x 30s = 120s of music

  seg 1/4:  1.87s wall, 489,552 bytes, key=pureshaka-sandbox/.../seg-000000.ts
  seg 2/4:  1.19s wall, 486,732 bytes, key=pureshaka-sandbox/.../seg-000001.ts
  seg 3/4:  1.16s wall, 490,680 bytes, key=pureshaka-sandbox/.../seg-000002.ts
  seg 4/4:  1.20s wall, 487,672 bytes, key=pureshaka-sandbox/.../seg-000003.ts

total production: 5.6s wall (for 120s of playback)

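Production ran roughly 21x faster than real-time playback: 120s of audio in 5.6s of wall time, or 22x counting only the 5.42s spent inside the four segment calls. That ratio is the entire reason the system can stay ahead of the player with a small lookahead buffer on a modest EC2 instance.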
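The headroom figure is plain arithmetic on the run log. This sketch just makes both readings explicit, with and without the wall time spent outside the four segment calls; `realtime_headroom` is an illustrative helper:

```python
def realtime_headroom(playback_seconds: float, wall_seconds: float) -> float:
    """How many times faster than real time the producer ran."""
    return playback_seconds / wall_seconds


segment_seconds = 30.0
per_segment_wall = [1.87, 1.19, 1.16, 1.20]  # from the run log above
playback = segment_seconds * len(per_segment_wall)  # 120 s of audio

print(round(realtime_headroom(playback, sum(per_segment_wall)), 1))  # 22.1 (segment calls only)
print(round(realtime_headroom(playback, 5.6), 1))  # 21.4 (whole run)
print(round(segment_seconds - max(per_segment_wall), 2))  # 28.13 s slack on the slowest segment
```

The last line is the practical version of the ratio: even the slowest segment (the cold first one) left more than 28 seconds of slack before the player would need it.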

Every segment got an HMAC-signed receipt with this canonical shape:

{
  "tenant_id":         "pureshaka-sandbox",
  "session_id":        "song-1779860899-2263f7",
  "segment_id":        "seg-000000",
  "generator":         "synth_reggae",
  "generator_version": "1.0.0",
  "seed":              808,
  "sha256_audio":      "",
  "duration_ms":       30000,
  "byte_count":        489552,
  "codec":             "aac",
  "s3_key":            "pureshaka-sandbox/.../seg-000000.ts",
  "produced_at":       "2026-05-26T22:48:..."
}

The receipt is the durable artifact. The S3 object can be reaped by the 1-day lifecycle policy and the receipt still proves what was streamed, by whom, at what time, with what audio fingerprint.
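A minimal sketch of signing and verifying a receipt like the one above, using only the standard library. Canonicalization via sorted keys and compact separators, and the `{"payload", "signature"}` envelope, are assumptions for illustration; key storage and rotation are out of scope:

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def canonical_json(payload: Dict[str, Any]) -> bytes:
    # Sorted keys, no insignificant whitespace, UTF-8: equal dicts -> equal bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def sign_receipt(payload: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    sig = hmac.new(key, canonical_json(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": sig}


def verify_receipt(receipt: Dict[str, Any], key: bytes) -> bool:
    expected = hmac.new(key, canonical_json(receipt["payload"]),
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many leading chars matched.
    return hmac.compare_digest(expected, receipt["signature"])
```

The audio fingerprint would come from `hashlib.sha256(segment_bytes).hexdigest()` over the encoded .ts bytes, which is why the receipt can outlive the S3 object: anyone holding the original bytes can recompute the hash and check it against the signed value.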

120s music delivered · 5.6s wall to produce · 22x realtime headroom · 6 signed receipts

The boundary between segments

One thing we measured carefully before declaring victory: do the seams between segments produce audible discontinuities? We generated segment A at segment_index=0 and segment B at segment_index=1, concatenated the raw PCM, and measured the sample-to-sample delta at the boundary.

Left channel boundary delta: -0.000064. Right channel boundary delta: -0.000010. Tolerance was set at ±0.05, so the seam passed by three orders of magnitude. The catch worth flagging: the ReggaeGenerator naturally ends each bar in near-silence (every instrument is rhythmic, every envelope decays inside the beat). v1 is gapless on a technicality.
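A minimal sketch of the seam measurement, assuming float PCM shaped (frames, channels) and the ±0.05 tolerance quoted above. The function names are hypothetical; note that a single sample-to-sample delta is a coarse check, which is exactly why v1 passes on a technicality:

```python
import numpy as np


def boundary_delta(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-channel jump across the seam when b is appended to a.

    First frame of b minus last frame of a: the same value as the
    sample-to-sample delta at the join of np.concatenate([a, b]).
    """
    if a.ndim != 2 or b.ndim != 2 or a.shape[1] != b.shape[1]:
        raise ValueError("expected (frames, channels) arrays with matching channels")
    return b[0] - a[-1]


def seam_ok(a: np.ndarray, b: np.ndarray, tol: float = 0.05) -> bool:
    # Every channel's jump must sit inside the tolerance band.
    return bool(np.all(np.abs(boundary_delta(a, b)) <= tol))
```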

Stable Audio Open will not have that free pass. It generates continuously voiced audio, so v2 will need explicit conditioning on the prior segment's tail. That hook is already on the v2 TODO inside SessionController.produce_segment, right at the line where the fallback-to-synth wrapper will attach.


05 · The economics
What it costs to stream one listener for one hour

Measured against the actual AWS bill, not estimated. The dominant cost is S3 PUT requests, because every 30-second segment is a separate object in S3.

Cost item                  Per stream-hour    Note
S3 PUT requests            $0.060             120 PUTs × $5/10k. Dominant.
S3 GET requests            $0.010             Player fetches segments + manifest.
S3 egress to player        $0.005             ~58 MB/hr of AAC.
EC2 compute share          $0.0007            ~4% CPU on a t4g.small.
API Gateway                $0.0003            ~242 requests.
DynamoDB writes            $0.0002            121 receipts, ~1 WCU each.
Kinesis shard (shared)     $0.015             Amortizes across all streams.
Orchestrator EC2 (shared)  $0.017             Holds ~25 concurrent streams.
Total, 1 stream            ~$0.11/hr          11 cents per listener per hour.
Total, 10 streams          ~$0.079/stream-hr  Fixed costs amortize.

The biggest cost lever is segment length. Going from 30-second to 60-second segments halves the PUT rate and drops variable cost from $0.076/hr to $0.046/hr. The tradeoff is a longer wait for the first segment at session start. For Pure Shaka background music in a storefront, that latency is invisible; for an interactive product it matters more.

CloudFront in front of S3 is the second lever. Once a track is warm in the edge cache, origin requests collapse and the GET cost drops to near zero. For Pure Shaka's first listener it doesn't help. For the hundredth concurrent listener of the same track it cuts variable cost roughly in half.

v1 synth is essentially free on compute. v2 GPU inference will be the new floor.

06 · The substrate
59 tests, 0 skips, 0 failures

The test surface was the thing that surprised us most. Seven test files, 59 tests, end-to-end runtime under 5 seconds.

59 tests passing · 6 source files · 7 test files · 4.7s full suite
Test file             Tests  What it locks down
test_synth_reggae.py  8      Deterministic output for (seed, segment_index). Per-beat RMS ±2%, per-band FFT energy ±5%. Architecture-locked fixture comparison.
test_encoder.py       6      AAC-in-MPEG-TS round trip preserves loudness within 1 dB RMS. Rejects wrong shape and unsupported channel counts.
test_receipts.py      10     HMAC determinism. Canonical JSON byte-stability. Verify detects payload tampering AND signature tampering.
test_storage.py       5      S3 key layout: tenant/session/seg.ts. Presign TTL is 5 minutes. Pagination works on delete.
test_session.py       11     Receipt counts per lifecycle event. Failure-mode contract: exception propagates, segments list does NOT mutate, retry hits the same segment_index.
test_workflow.py      9      Activity registry. Idle-expired logic. Keepalive resets the clock. Stop signal sets the flag. DTO shape.
test_api.py           10     HMAC auth (unsigned requests are 401). Max-concurrent enforcement. HLS manifest shape. Presigned redirect chain.

The failure-mode test is the one we'll lean on hardest in v2. When the diffusion model OOMs, the contract is: exception propagates, segments list does not mutate, the next call retries the same segment_index. That's how the fallback-to-synth hook stays gapless. The test locks that contract in writing.
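A minimal sketch of that contract, assuming the session derives the next position only from committed state. `SessionSketch` is a hypothetical reduction, not the real SessionController; the encode, upload, and receipt steps are elided but would sit before the commit:

```python
from typing import Any, List


class SessionSketch:
    """Hypothetical reduction of SessionController's failure-mode contract."""

    def __init__(self, generator: Any, seed: int, segment_seconds: float = 30.0):
        self.generator = generator
        self.seed = seed
        self.segment_seconds = segment_seconds
        self.segments: List[int] = []  # committed segment indices

    def produce_segment(self) -> int:
        idx = len(self.segments)  # position comes from committed state only
        # An exception here propagates before self.segments is touched...
        audio, sample_rate, meta = self.generator.produce(
            seed=self.seed, segment_index=idx,
            segment_seconds=self.segment_seconds)
        # (encode, upload, and receipt would also run here, pre-commit)
        self.segments.append(idx)  # ...so the commit happens only on success
        return idx
```

Because nothing mutates until the very end, a failed call leaves `len(self.segments)` unchanged and the retry naturally lands on the same segment_index.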


07 · The road ahead
What's next: Temporal, v2, the sovereign claim

Three things land in the next sprint.

1. The Temporal workflow goes live.

The workflow and activities are written and tested. They have not yet been deployed against a running Temporal cluster, because the integration smoke deserves its own sprint. The workflow checkpoints after every segment. If the worker dies at t=7s inside segment 2, replay produces segment 2 at segment_index=2, with the same seed, and the seam stays gapless. Same start, same noise, same chord. That property is the whole reason we picked Temporal.
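The property being bought is "checkpoint after every segment plus deterministic produce". This sketch models it in plain Python without the Temporal SDK: a dict stands in for the durable checkpoint, and a hash of (seed, segment_index) stands in for generate-and-encode. All names are hypothetical:

```python
import hashlib
from typing import Dict, List, Optional


def produce(seed: int, segment_index: int) -> bytes:
    # Stand-in for generate + encode: bytes depend only on (seed, segment_index).
    return hashlib.sha256(f"{seed}:{segment_index}".encode()).digest()


def run_session(seed: int, n_segments: int, checkpoint: Dict[str, List[bytes]],
                crash_at: Optional[int] = None) -> List[bytes]:
    """Resume from `checkpoint`, which survives worker restarts.

    Each finished segment is recorded before the next begins, so a crash
    mid-segment loses only in-flight work and replay restarts at that index.
    """
    done = checkpoint.setdefault("done", [])
    while len(done) < n_segments:
        idx = len(done)
        if idx == crash_at:
            raise RuntimeError(f"worker died inside segment {idx}")
        done.append(produce(seed, idx))  # the per-segment checkpoint
    return done
```

A run that crashes inside segment 2 and then resumes yields byte-identical output to one that never crashed, which is the gapless-replay claim in miniature.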

2. The 8-hour soak.

We have a SessionController loop ready to run for 8 hours against real AWS. The script samples RSS every 30 segments, issues head_object calls against written S3 keys as a spot check, and emits a JSONL metrics stream to /tmp/hwd228-soak-metrics.jsonl. The point is to prove there is no memory leak in the synth loop, no gap in segment production, and no missing receipts. v2 cannot land on top of an unproven v1.
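A minimal sketch of the metrics side of the soak, assuming Linux (RSS is read from /proc/self/statm) and a record layout invented for illustration; `maybe_emit` and its fields are hypothetical. The RSS reader is injectable so the emitter can be exercised off Linux:

```python
import json
import os
import time
from typing import Callable, TextIO


def current_rss_kb() -> int:
    """Current resident set size in KiB (Linux only: reads /proc/self/statm)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") // 1024


def maybe_emit(out: TextIO, segment_index: int, receipts_seen: int,
               rss_kb: Callable[[], int] = current_rss_kb, every: int = 30) -> bool:
    """Append one JSONL metrics record every `every` segments."""
    if segment_index % every:
        return False
    record = {
        "ts": time.time(),
        "segment_index": segment_index,
        "rss_kb": rss_kb(),  # a steady upward trend across records suggests a leak
        "receipts_seen": receipts_seen,  # compared offline against segments produced
    }
    out.write(json.dumps(record) + "\n")
    return True
```

In the loop this would be called as `maybe_emit(out, i, n_receipts)` with `out` opened on /tmp/hwd228-soak-metrics.jsonl in append mode. The S3 spot check would use the boto3 client's `head_object(Bucket=..., Key=...)`, which raises `ClientError` for a missing key.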

3. Stable Audio Open behind the same interface.

The Generator interface revision was made precisely so this swap is one new file and zero changes to the SessionController. StableAudioGenerator takes brand_lora_id, text_prompt, negative_prompt, cfg_scale on the constructor, accepts the same (seed, segment_index, segment_seconds) on produce(), and conditions the diffusion on the tail of the prior segment for a clean seam. The fallback-to-synth wrapper handles model failure inside produce_segment. The receipt schema does not change. The S3 layout does not change. The HLS surface does not change.
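One possible shape for the fallback, sketched as a wrapper around the Generator interface rather than inside produce_segment. `FallbackGenerator`, the `fallback_used` metadata key, and the default set of recoverable exceptions are all assumptions:

```python
from typing import Any, Dict, Tuple, Type


class FallbackGenerator:
    """Hypothetical wrapper: try the primary model, fall back to the synth.

    Both generators receive the identical (seed, segment_index,
    segment_seconds), so a fallback segment lands at the same position.
    """

    def __init__(self, primary: Any, fallback: Any,
                 recoverable: Tuple[Type[BaseException], ...] = (MemoryError, RuntimeError)):
        self.primary = primary
        self.fallback = fallback
        self.recoverable = recoverable

    def produce(self, *, seed: int, segment_index: int,
                segment_seconds: float) -> Tuple[Any, int, Dict[str, Any]]:
        kwargs = dict(seed=seed, segment_index=segment_index,
                      segment_seconds=segment_seconds)
        try:
            audio, sr, meta = self.primary.produce(**kwargs)
            used_fallback = False
        except self.recoverable:
            # If the fallback raises too, the exception propagates and the
            # caller's retry-same-index contract takes over.
            audio, sr, meta = self.fallback.produce(**kwargs)
            used_fallback = True
        return audio, sr, {**meta, "fallback_used": used_fallback}
```

Unrecoverable errors (bad parameters, say) are deliberately not caught, so a misconfigured session fails loudly instead of quietly streaming synth forever.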

The sovereign claim that matters: every byte of audio that reaches a Pure Shaka customer was generated by code Howdify wrote, encoded by code Howdify packaged, stored in a bucket Howdify owns, delivered via a URL Howdify signed, and audited by a receipt Howdify can produce on demand. No third party in the path. No model API call out. That is what makes it a sovereign substrate, not just a streaming service.
Sovereign Tech Series

Next: the Temporal wiring and the 8-hour soak.

Episode 03 takes v1 from "works in one session" to "ran for 8 hours without dropping a segment." Follow the Howdify build-out series for the rest.