The Wrong Premise
When I asked Gemini to spec out an API playground, the first response covered parameter input forms, formatted response display, and multi-language code snippets. Not bad — but that is a spec for a JSON-returning web service, not a mastering API.
After correcting that this was a music mastering API, the response shifted to drag-and-drop WAV upload, waveform visualizers, and A/B testing. Still wrong.
Gemini never understood why the API exists in the first place.
Why a Mastering API Is Different
aimastering.dev is offered as an API, but not for the usual scalability or billing reasons. Routing the user's audio file through our own infrastructure is itself the problem: it puts the user at a disadvantage.
Passing files through an unreliable uploader, downloader, or Cloud Run instance is the risk we refuse to impose on users. That is the core of the design philosophy.
A playground that "uploads WAV in the browser and plays it back" directly contradicts this design philosophy.
"Don't Route Through My Infrastructure"
It took three corrections before Gemini finally understood: the goal is to eliminate my server as a bottleneck or risk factor entirely.
Pushed to its conclusion, the role of the playground changes fundamentally. Not "a place to try things" but "a place to prepare users to run things in their own environment."
Control plane (API commands) passes through our server. Data plane (audio file) goes directly from the user's environment to the engine.
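A minimal sketch of that split, assuming a hypothetical client: only the parameter negotiation round-trips to the API, while the WAV is processed by a locally deployed engine. The /api/v1/plan endpoint and the local_mastering import are illustrative, not the shipped interface.

import os

import requests

from local_mastering import master_file  # hypothetical local engine entry point

API_KEY = os.environ["API_KEY"]

# Control plane: only parameters cross the wire, never audio.
plan = requests.post(
    "https://aimastering.dev/api/v1/plan",  # illustrative endpoint
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"target_lufs": -14, "genre": "techno"},
    timeout=30,
).json()

# Data plane: the WAV is read and written entirely on the user's machine.
master_file("track.wav", "track_mastered.wav", **plan["engine_settings"])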
The site offers "Source Code: Own the engine" — 923 lines of physics-based DSP, Jiles-Atherton transformer modeling, Koren triode equations, Studer tape emulation — deployable on the user's own infrastructure. The playground is also the pre-purchase inspection room for that product.
The Inversion — Command Generator
The ideal playground does not have a "Run" button as the primary action. Instead, "Copy to Terminal" sits at the center.
Every time the user adjusts target_lufs or genre in the UI, the cURL snippet below updates in real time.
curl -X POST \
  /api/v1/master \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@track.wav" \
  -F "target_lufs=-14" \
  -F "genre=techno"

{
  "job_id": "mst_a8f3...",
  "status": "processing",
  "analysis": {
    "lufs_in": -18.2,
    "lufs_target": -14.0,
    "agents_agreed": true,
    "consensus_score": 0.94
  }
}

These 4 lines are the complete API. Analysis, consensus, DSP — all in one call. The playground's primary job is to hand the user these 4 lines as a command ready to run on their own machine right now.
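A minimal sketch of the real-time command generator described above, assuming the UI state arrives as a plain dict. Nothing executes on our side; the function only formats a string for the user to copy.

def build_curl(params: dict, endpoint: str = "/api/v1/master") -> str:
    """Render the ready-to-run cURL command from the current UI state."""
    lines = [
        "curl -X POST \\",
        f"  {endpoint} \\",
        '  -H "Authorization: Bearer $API_KEY" \\',
        '  -F "file=@track.wav" \\',
    ]
    lines += [f'  -F "{key}={value}" \\' for key, value in params.items()]
    lines[-1] = lines[-1].rstrip(" \\")  # no continuation after the last flag
    return "\n".join(lines)

# Regenerate on every UI change; the user copies, we never see the file.
print(build_curl({"target_lufs": -14, "genre": "techno"}))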
| Generic API Playground | aimastering.dev (Ideal) |
|---|---|
| "Run it on our server" | "Here's how it runs in your Docker" |
| "Buy an API key" | "Own this engine (923 lines)" |
| "We store your data" | "4-line cURL from your environment, no storage" |
| "AI makes it sound better somehow" | "3 models negotiate a physics-based solution" |
Making Consensus Transparent
The core of aimastering.dev is a 3-agent Nash equilibrium consensus. Not an average — a negotiation. That is the site's claim.
Visualizing the consensus in the playground gives users concrete grounds to trust the API enough to run it from their own environment. Instead of "something was processed on a server," the user sees that intelligence was applied, directly in the response JSON.
{
  "consensus_minutes": {
    "agent_a": {
      "role": "Spectral Analyst",
      "finding": "Low-mids too warm (+3dB). Adjusting tilt.",
      "target_lufs": -14.0,
      "eq_delta": { "250hz": -2.8, "500hz": -1.1 }
    },
    "agent_b": {
      "role": "Dynamic Expert",
      "finding": "Dynamic range exceeds target. Requesting 1.5dB compression.",
      "target_lufs": -14.2,
      "comp_ratio": 2.1
    },
    "agent_c": {
      "role": "Tonal Surgeon",
      "finding": "Widen above 8kHz, mono below 200Hz.",
      "stereo_width_8k": 1.35,
      "mono_below_200hz": true
    },
    "arbiter": {
      "method": "weighted_median",
      "final_target_lufs": -14.1,
      "unresolved_tensions": ["stereo_width vs low_end_density"],
      "confidence": 0.94
    }
  }
}

The unresolved_tensions field matters. Leaving unresolved disagreements in the log rather than hiding them is the design choice that makes this system trustworthy to engineers.
Connecting to Source Code Sales
aimastering.dev goes beyond an API service. The 923-line physics-based DSP is available for purchase at $79 (Regular) or $299 (Extended / SaaS / redistribute).
Regular ($79):
- DSP engine
- Analysis engine
- Docker compose
- Deploy guide

Extended ($299):
- DSP engine
- Analysis engine
- Docker compose
- Deploy guide
- MIT core
- SaaS / redistribute
A user who verifies the results in the playground gains the motivation to ask: "Can I run this logic on my own infrastructure?" The playground is not an API demo — it is a confidence-building path toward a source code purchase.
aimastering-dev/
│
├ server/
│ ├ middleware/
│ ├ models/
│ ├ routers/
│ ├ services/
│ └ main.py
│
├ docs/
│ ├ quickstart.md
│ ├ api-reference.md
│ ├ authentication.md
│ └ error-codes.md
│
├ README.md
├ Dockerfile
├ local_mastering.py ← the "real output" of the playground
├ test_local_dsp.py
├ .gitignore
└ .gitattributes

The existence of local_mastering.py at the root is the heart of the philosophy. Configure parameters in the playground, and this file — with those settings embedded — becomes downloadable. That is the command generator in its complete form.
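A minimal sketch of how that download could be produced, assuming a hypothetical template step on the playground side. TEMPLATE, render_local_script, and the dsp_engine import are illustrative, not the shipped code.

TEMPLATE = '''\
# Generated by the aimastering.dev playground; settings embedded below.
# {{PLAYGROUND_SETTINGS}}

from dsp_engine import master  # hypothetical import of the purchased engine

if __name__ == "__main__":
    master("track.wav", "track_mastered.wav",
           target_lufs=TARGET_LUFS, genre=GENRE)
'''

def render_local_script(params: dict) -> str:
    """Embed the playground's current settings as constants in the script."""
    settings = "\n".join(f"{k.upper()} = {v!r}" for k, v in params.items())
    return TEMPLATE.replace("# {{PLAYGROUND_SETTINGS}}", settings)

print(render_local_script({"target_lufs": -14.0, "genre": "techno"}))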
Ideal Spec Summary
The specification that emerged from this conversation:
- "Copy to Terminal" as the primary action; every UI change regenerates the cURL snippet in real time.
- Consensus minutes, including unresolved_tensions, rendered directly from the response JSON.
- A downloadable local_mastering.py with the configured parameters embedded.
- No audio ever routed through our infrastructure; the data plane stays in the user's environment.
Verdict — Is It Legitimate?
At the end I asked Gemini directly: "Is the audio analysis Gemini and the consensus system legitimate, or is it a marketing claim?" The answer was unambiguous.
Evidence 1: GPT-5.4 (Engineer) / Claude Opus (Structure Guard) / Gemini Pro (Form Analyst) roles are hardcoded into the implementation. Each agent reads the same analysis_json independently and outputs its own target_envelopes.
Evidence 2: consensus_arbiter.py uses a weighted median, not averaging, and implements hard-constraint filtering to prevent runaway model outputs.
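A minimal sketch of weighted-median arbitration with hard-constraint filtering, assuming hypothetical weights and LUFS bounds; the real consensus_arbiter.py is not reproduced here.

def weighted_median(values, weights):
    """Return the value where cumulative weight first reaches half the total."""
    half = sum(weights) / 2.0
    acc = 0.0
    for value, weight in sorted(zip(values, weights)):
        acc += weight
        if acc >= half:
            return value

def arbitrate(proposals, weights, bounds=(-23.0, -8.0)):
    """Hard constraint filter: drop out-of-bounds proposals, then take the weighted median."""
    kept = [(v, w) for v, w in zip(proposals, weights) if bounds[0] <= v <= bounds[1]]
    values, kept_weights = zip(*kept)
    return weighted_median(values, kept_weights)

# Three agents propose target LUFS values; the arbiter settles on -14.1.
print(arbitrate([-14.0, -14.2, -14.1], weights=[1.0, 0.8, 1.0]))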
Evidence 3: The AI does not touch DSP knobs directly; it outputs a target specification sheet, and the Control Layer converts that into physical parameters. The Jiles-Atherton and Koren equations are defined as actual DSP stages in the code.
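A minimal sketch of that indirection, with entirely hypothetical field names and scaling constants; the point is the pattern (spec sheet in, physical parameters out), not the real mapping.

def spec_to_physical(spec: dict) -> dict:
    """Translate an agent-produced target spec into DSP stage parameters.

    The models only describe targets; this layer owns the physical knobs.
    """
    return {
        # Tape stage (Studer-style emulation): drive scaled to the loudness gap.
        "tape_drive_db": 0.5 * max(0.0, spec["lufs_target"] - spec["lufs_in"]),
        # Triode stage (Koren equations): bias nudged by requested compression.
        "triode_bias_v": -1.5 - 0.1 * spec.get("comp_ratio", 1.0),
        # Transformer stage (Jiles-Atherton model): saturation capped at 1.0.
        "transformer_sat": min(1.0, spec.get("stereo_width_8k", 1.0) / 2.0),
    }

print(spec_to_physical({"lufs_in": -18.2, "lufs_target": -14.0, "comp_ratio": 2.1}))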
"10 years of audio research" is not hyperbole. The 923 lines of physics and the consensus architecture demonstrate it. The playground is where that proof is delivered.
As the developer, what I want to say fits in one line:
"Don't trust my infrastructure. Trust this logic. Run it in your environment."