HALO · Lab POC · Day-1 results

We ran sovereign hybrid AI end to end today.

Real Qwen3 inference on infrastructure we own. A signed receipt chain across three tiers, written to our own audit ledger. A live tamper test that caught a single mutated byte. Built and torn down in one work session, with the cost broken out below.

Chain verified · Tamper detected · Real Qwen3 inference · ~$14.50 total spend · $0/day going forward
[Figure: HALO three-tier pipeline diagram with cryptographic receipt ledger linking sensor edge, local routing phone, and sovereign cloud substrate via parent-pointer chains.]
The full HALO pipeline. Today's run exercised the substrate column with real Qwen3 inference and the receipt ledger underneath with cryptographic chain verification.

01 What we set out to run

The HALO project is a sovereign hybrid AI architecture: an embodied sensor tier, a local routing tier on a phone, and an inference substrate running open-weights MoE inference on infrastructure the tenant controls. The previous post laid out the wire protocol, the receipt schema, and the work packages. The job for this session was simpler: actually run it end to end, with real inference and signed receipts, on real AWS.

Scope for the Day-1 run:


02 The actual run, end to end

Below is the literal log from the working run, with the loading-progress noise trimmed.

=== HALO end-to-end demo at 2026-05-27T22:19:09 UTC ===
session_id = 323fbf8f-0c73-481f-8b7b-03fc7a325abd
tenant_id  = halo-demo-tenant
model      = /data/models/Qwen3-30B-A3B

[0/5] derive HKDF keys
  TIER_KEY[0..2] derived (32B each)

[1/5] TIER 0: simulated sensor capture, audio chunk
  receipt_id=d80e9016-..., hmac=3e66a9b0...   ✓ signed

[2/5] TIER 1: routing decision, escalate to substrate
  receipt_id=803e334e-..., hmac=02a04a2a...   ✓ signed, parent-pointer to T0

[3/5] TIER 2: load Qwen3-30B-A3B
  model loaded in 9.6s
  prompt: 'What is sovereign inference, in one sentence?' (9 tokens)
  generated 32 tokens in 5.1s = 6.260 tok/s
  decoded: ' Sovereign inference is the process by which a sovereign
            entity, such as a nation-state, draws logical conclusions
            or makes decisions based on its own authority, independent of'
  receipt_id=adbb6cb9-..., hmac=712fc224...   ✓ signed, parent-pointer to T1

[4/5] HALO-VERIFY: walk chain, check HMACs + parent pointers
  CLEAN PATH RESULT: PASS - 3 receipts verified, chain intact

[5/5] TAMPER TEST: flip one byte in Tier 2 payload, re-verify
  mutated event_payload.output_preview on receipt adbb6cb9-...
  TAMPERED PATH RESULT: CORRECTLY DETECTED
    HMAC FAIL at receipt adbb6cb9-... (tier 2, kind halo_tier_response)

=== demo done at 2026-05-27T22:29:33 UTC ===
3 receipts in howdify-receipts (org_id=halo-demo-tenant)

3 signed receipts · 6.26 tok/s decode · 9.6s model load · 100% tamper catch rate

Notes on the inference number. Qwen3-30B-A3B is a 30B-parameter Mixture of Experts model with 3B active parameters per token. We ran it in plain Hugging Face transformers on CPU, with no specialized inference framework, on a single r7iz.4xlarge instance (16 vCPU, Intel Xeon Gold 6455B with AMX). 6.26 tokens per second is the unaccelerated baseline: the next phase wires the KTransformers AMX kernels into the inference path, which should land the substrate in the 30+ tok/s range we projected in the previous post.

Three receipts were written to our existing audit ledger, parent-pointer linked, ready for any future audit. The session id 323fbf8f-0c73-481f-8b7b-03fc7a325abd can be replayed at any time by the verifier.
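
The key derivation and signing steps in the log can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code that produced the run: the HKDF `info` labels, the receipt field layout, the root key, and the kind names `halo_sensor_capture` and `halo_routing_decision` are assumptions made for the example (only `halo_tier_response` appears in the log).

```python
import hashlib
import hmac
import json
import uuid


def hkdf_sha256(root: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract a PRK, then expand it."""
    prk = hmac.new(salt or b"\x00" * 32, root, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def canonical(obj) -> bytes:
    """Deterministic JSON bytes: sorted keys, no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(tier_key, session_id, tier, kind, payload, parent=None):
    """Build a receipt that commits to its parent's SHA-256 hash, then HMAC it."""
    body = {
        "receipt_id": str(uuid.uuid4()),
        "session_id": session_id,
        "tier": tier,
        "kind": kind,
        "parent_hash": hashlib.sha256(canonical(parent)).hexdigest() if parent else None,
        "event_payload": payload,
    }
    tag = hmac.new(tier_key, canonical(body), hashlib.sha256).hexdigest()
    return {**body, "hmac": tag}


# Hypothetical root key and session id, for illustration only.
ROOT = b"fixed-demo-root-not-for-production"
SESSION = "demo-session"

# One 32-byte subkey per tier, bound to the session through the info label.
tier_keys = [hkdf_sha256(ROOT, f"halo/{SESSION}/tier-{t}".encode()) for t in range(3)]

t0 = sign_receipt(tier_keys[0], SESSION, 0, "halo_sensor_capture", {"chunk": "audio"})
t1 = sign_receipt(tier_keys[1], SESSION, 1, "halo_routing_decision",
                  {"route": "substrate"}, parent=t0)
t2 = sign_receipt(tier_keys[2], SESSION, 2, "halo_tier_response",
                  {"output_preview": " Sovereign inference is"}, parent=t1)
```

Because each receipt hashes the full parent receipt, including the parent's HMAC, a change anywhere upstream changes every downstream `parent_hash`.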


03 The tamper test (the moment that matters)

Logging a chain is one thing. Detecting active tampering is what makes the chain matter.

The test was straightforward. After the verifier reported PASS on the clean chain, we read the Tier 2 receipt back from DynamoDB, mutated a single string inside its event payload, and wrote it back. Then we re-ran the verifier on the same session id.

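The verifier deterministically identified the failure, as the log above shows: an HMAC mismatch at receipt adbb6cb9-... (tier 2, kind halo_tier_response).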

No false alarms on the upstream tiers. No silent acceptance. The mutation broke verification at the next downstream check, exactly where the protocol said it should.
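
A chain walk of this kind can be sketched in a few lines. The example below is an assumption-laden illustration, not the HALO verifier: it keeps receipts in memory instead of DynamoDB, uses placeholder tier keys, and invents the receipt layout and the first two kind names. It shows the two checks the log describes (HMAC, then parent pointer) and a one-character mutation of `output_preview`.

```python
import copy
import hashlib
import hmac
import json

# Placeholder per-tier keys for the example; a real run would derive them.
KEYS = [bytes([t]) * 32 for t in range(3)]


def canonical(obj) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def digest(receipt) -> str:
    return hashlib.sha256(canonical(receipt)).hexdigest()


def sign(tier, kind, payload, parent=None):
    body = {
        "tier": tier,
        "kind": kind,
        "parent_hash": digest(parent) if parent else None,
        "event_payload": payload,
    }
    tag = hmac.new(KEYS[tier], canonical(body), hashlib.sha256).hexdigest()
    return {**body, "hmac": tag}


def verify_chain(receipts):
    """Walk receipts in tier order; check each HMAC, then each parent pointer."""
    parent = None
    for r in receipts:
        body = {k: v for k, v in r.items() if k != "hmac"}
        expected = hmac.new(KEYS[r["tier"]], canonical(body), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, r["hmac"]):
            return False, f"HMAC FAIL at tier {r['tier']} ({r['kind']})"
        if r["parent_hash"] != (digest(parent) if parent else None):
            return False, f"PARENT FAIL at tier {r['tier']} ({r['kind']})"
        parent = r
    return True, f"PASS - {len(receipts)} receipts verified"


t0 = sign(0, "halo_sensor_capture", {"chunk": "audio"})
t1 = sign(1, "halo_routing_decision", {"route": "substrate"}, parent=t0)
t2 = sign(2, "halo_tier_response", {"output_preview": "Sovereign inference"}, parent=t1)
chain = [t0, t1, t2]

clean_result = verify_chain(chain)

# Mutate a single character in the Tier 2 payload and verify again.
tampered = copy.deepcopy(chain)
tampered[2]["event_payload"]["output_preview"] = "Xovereign inference"
tampered_result = verify_chain(tampered)
```

In this sketch the upstream receipts still verify on the tampered chain; only the mutated receipt fails, which mirrors the behaviour reported in the run.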

This is the bit that compounds. Inference models commoditize on a 12-month clock. Hardware commoditizes faster. The cross-tier audit chain is the architectural element that survives those cycles and gives regulated industries a reason to deploy sovereign infrastructure instead of accepting vendor-cloud opacity.

04 What this validates

| Layer | Status | Note |
|---|---|---|
| VPC + private subnet provisioning | ✓ working | Howdify Lab VPC, in-band via VPCE for DynamoDB |
| HKDF key hierarchy (per-tier subkeys from root) | ✓ working | Three tier keys derived per session, byte-stable |
| Per-event HMAC-SHA256 signing | ✓ working | Tier 0, 1, 2 all signing with their own keys |
| Parent-pointer chain across tiers | ✓ working | Each receipt commits to parent's SHA-256 hash |
| Real LLM inference on sovereign substrate | ✓ working | Qwen3-30B-A3B, 6.26 tok/s on plain transformers CPU |
| Receipts persisted to howdify-receipts DDB | ✓ working | 3 receipts written, all retrievable |
| Canonical-bytes round-trip across DDB | ✓ working | Numeric normalization fix lands the read-side verify |
| Tamper detection via re-verify | ✓ working | Single mutated byte caught deterministically |
| AMX-accelerated MoE inference path | ~ deferred | Kernel layer validated, end-to-end wiring is next-phase |
| Physical hardware (Halo glasses) integration | ~ pending | Waits on the device shipping |

Everything in the upper rows is in place. The two lower rows are the gap between Lab POC and the next phase.


05 What broke along the way

Lab POC integration always reveals the assumptions that the spec didn't make explicit. Five things broke before the final pass. Each fix took 1-5 minutes; together they were the difference between "we have a working stack" and "we have a working chain."

  1. SSM Session Manager. The instance launched in our private subnet but couldn't register with SSM because the SSM Interface VPC endpoint's security group did not allow ingress from our spike SG. One ingress rule resolved it.
  2. Stock benchmark scripts assume 48 cores. The published kt bench harness spent significant time trying to pin threads to cores 8-47 (which don't exist on r7iz.4xlarge). For the next phase we will either upsize the instance to one with the expected core count or write a custom benchmark targeted at the 8-core shape.
  3. Secrets Manager VPC endpoint not in our subnet. The HKDF root key lookup timed out because the SM endpoint lives in a different subnet of the same VPC. We fell back to a fixed demo root for the POC; production wires either a per-subnet SM endpoint or routes the lookup through Lambda.
  4. DynamoDB float rejection. boto3's DynamoDB serializer rejects Python floats; numeric values must be passed as Decimal. Classic boto3 gotcha. Two-line fix to normalize at write time.
  5. Canonical bytes diverged across DDB round-trip. Subtle and important: Python ints write to DDB as Number, but read back as Decimal, not int. JSON canonicalization treated them differently, breaking the HMAC re-verify on read. Fixed by normalizing all numerics to Decimal before signing, so write-side and read-side compute over identical bytes.
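
The write-time normalization from item 4 might look like the sketch below. The helper name `to_ddb` and the sample item are hypothetical; the only grounded point is that floats become `Decimal` before the item is handed to boto3.

```python
from decimal import Decimal


def to_ddb(value):
    """Recursively replace floats with Decimal so boto3 accepts the item."""
    if isinstance(value, bool):
        return value  # bool is an int subclass in Python; leave it untouched
    if isinstance(value, float):
        # Go through str() so 6.26 becomes Decimal("6.26"), not the binary expansion.
        return Decimal(str(value))
    if isinstance(value, dict):
        return {k: to_ddb(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_ddb(v) for v in value]
    return value


# Hypothetical receipt fragment for illustration.
item = to_ddb({"tok_per_s": 6.26, "tokens": 32, "timings": [9.6, 5.1], "ok": True})
```

Converting through `str()` is a deliberate choice here: `Decimal(6.26)` would carry the full binary floating-point expansion into the stored value.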

Issue 5 is the one worth remembering. The spec called for "canonical JSON of the receipt payload" as the signing input. Spec-correct, implementation-broken: the canonicalization function has to match the type system at both write and read time, not just at write. This is the kind of detail that only surfaces when you actually run the full round-trip against the real database.
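
The divergence is easy to reproduce without a database. The sketch below is an illustration under assumptions: the payloads are invented, the round trip is simulated by hand (ints coming back as `Decimal`), and the canonical form renders every number as a normalized decimal string, which is one possible variant of "normalize all numerics before signing" and not necessarily the one used in the run. Note the simplification that a number and an identical-looking string would canonicalize the same way.

```python
import json
from decimal import Decimal


def naive_canonical(obj) -> bytes:
    """Sorted-key JSON; Decimal falls back to str(), ints stay JSON numbers."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), default=str).encode()


def normalize(value):
    """Render every numeric as a plain decimal string, whatever its Python type."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float, Decimal)):
        # normalize() strips trailing zeros; format "f" avoids exponent notation.
        return format(Decimal(str(value)).normalize(), "f")
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize(v) for v in value]
    return value


def canonical(obj) -> bytes:
    return naive_canonical(normalize(obj))


# What the signer saw at write time versus what a read returns (simulated).
write_side = {"tokens": 32, "tok_per_s": Decimal("6.26")}
read_side = {"tokens": Decimal("32"), "tok_per_s": Decimal("6.260")}

naive_match = naive_canonical(write_side) == naive_canonical(read_side)
fixed_match = canonical(write_side) == canonical(read_side)
```

With the naive function the two sides produce different bytes, so the HMAC computed at read time cannot match; with the normalized form they agree.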


06 The cost, with receipts

Full transparency. The day-1 spike cost ~$14.50 total, spread across:

| Resource | Usage | Cost |
|---|---|---|
| r7iz.4xlarge compute | ~5 hours active | $6.75 |
| NAT Gateway hourly + data transfer | ~70 GB through (model + source repos + pip wheels) | $3.15 |
| EBS gp3 4 TB (model storage) | Prorated for the run | $2.10 |
| Bedrock Haiku (intent classifier stub) | Not invoked in the demo | $0.00 |
| DynamoDB writes (3 receipts) | PAY_PER_REQUEST | marginal |
| Snapshots, EIPs, idle NAT cycles | Brief windows | $2.50 |
| Day-1 total | | ~$14.50 |
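
As a quick check, the line items add up to the stated total. The snippet treats the "marginal" DynamoDB cost as zero, which is an assumption made for the arithmetic.

```python
from decimal import Decimal

# Line items from the cost table; DynamoDB "marginal" is counted as 0 here.
costs = {
    "r7iz.4xlarge compute": Decimal("6.75"),
    "NAT Gateway": Decimal("3.15"),
    "EBS gp3": Decimal("2.10"),
    "Bedrock Haiku": Decimal("0.00"),
    "DynamoDB writes": Decimal("0.00"),
    "Snapshots, EIPs, idle NAT": Decimal("2.50"),
}

total = sum(costs.values(), Decimal("0"))
print(f"Day-1 total: ${total}")
```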

After the run, we terminated the EC2 instance (which auto-deletes the attached EBS), deleted the NAT gateway, released the elastic IP, removed the IAM role and instance profile, and dropped the security group rules we added. Going-forward day-to-day cost: $0.

The next run rebuilds from the same provisioning script in about 10 minutes. The model re-downloads from Hugging Face in about half an hour. No persistent infrastructure carries forward, which is the right shape for spike work.


07 What's next

Three concrete items, in priority order.

1. Wire the KTransformers AMX path into the inference call

Today's 6.26 tok/s was plain transformers on CPU. The kernel layer is already validated: kt_kernel imports cleanly, AMX-INT8 and AMX-BF16 extensions load on the Xeon Gold 6455B, and the doctor command confirms the full instruction set. The next session wires the kernel into the model's MoE blocks and re-measures. We expect a meaningful step up.

2. Build the production substrate WebSocket endpoint

Right now the substrate is a Python script that loads the model, runs inference, signs a receipt, and exits. Production needs the FastAPI WebSocket endpoint that the Tier 1 phone calls into: receive escalation request, verify the incoming receipt chain, run the inference, sign per-token receipts as they stream, return tokens back to the phone in real time. That's the WP-B inference endpoint as scoped in the SOW.
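
The per-token signing step could take roughly the following shape. This is a transport-free sketch under stated assumptions: it omits FastAPI and the WebSocket entirely, and the function name `stream_with_receipts`, the kind `halo_token`, and the receipt fields are hypothetical. It shows only how each streamed token might carry a receipt chained to the previous one.

```python
import hashlib
import hmac
import json
from typing import Iterable, Iterator, Optional, Tuple


def canonical(obj) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def stream_with_receipts(
    tokens: Iterable[str], tier_key: bytes, parent_hash: Optional[str]
) -> Iterator[Tuple[str, dict]]:
    """Yield (token, receipt) pairs; each receipt commits to the one before it."""
    for index, token in enumerate(tokens):
        body = {
            "tier": 2,
            "kind": "halo_token",  # hypothetical kind name
            "index": index,
            "token": token,
            "parent_hash": parent_hash,
        }
        tag = hmac.new(tier_key, canonical(body), hashlib.sha256).hexdigest()
        receipt = {**body, "hmac": tag}
        # The next token's receipt will point at this one.
        parent_hash = hashlib.sha256(canonical(receipt)).hexdigest()
        yield token, receipt


# Placeholder key and a stand-in for the incoming Tier 1 receipt hash.
demo_key = b"k" * 32
incoming = hashlib.sha256(b"tier-1-receipt-placeholder").hexdigest()
streamed = list(stream_with_receipts([" Sovereign", " inference", " is"], demo_key, incoming))
```

A generator keeps signing in step with generation, so the endpoint can send each token and its receipt as soon as the model emits it.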

3. Land the hardware integration when the device arrives

The Tier 0 emulator implemented the wire protocol exactly as specified. When the open-source glasses arrive, the firmware engineers implement the same protocol natively, the transport changes from local WebSocket to BLE GATT, and the rest of the stack works unchanged. The audit chain begins at the lens.

What this run tells buyers: the architecture works. The chain holds. The tamper detection is not a slide; it is a 10-minute live demo against a real database. The hardware is the next milestone, not the foundation.

08 Where this fits in the series

Sovereign Tech Series

Next: AMX-accelerated inference + production WebSocket substrate.

Episode 04 lands the KTransformers AMX path and the FastAPI inference endpoint. Episode 05 picks up when the hardware ships.

Receipt chain first. Hardware second. Demos third.