#001 Defining the Audio Analysis Parameter Spec

Origin — Engineering Weaknesses Revealed by Prompt Evaluation

We evaluated an AI-generation prompt for a hard techno track (138 BPM, 909 kick, JD-800 pad, JUNO-106 bass) across multiple dimensions. Aesthetic scores were consistently high, but engineering descriptions were close to failing.

Prompt evaluation scores

Concept	93 / 100
World-building consistency	91 / 100
Sonic image precision	89 / 100
Club functionality	88 / 100
Acoustic engineering accuracy	54 / 100
Validity as mastering material instruction	61 / 100

The following weaknesses were identified:

—"−8.5 LUFS low-end" — LUFS is a full-spectrum loudness metric, not a band-specific descriptor
—Low-frequency band separation unspecified
—Missing preconditions for excessive sidechain compression
—No compatibility conditions for 3D reverb + low-end density coexistence
—Kick vs bass priority undefined

This evaluation crystallized the recognition that creative prompts and engineering mastering specs are entirely different artifacts.

Current Pipeline Problems — The Limits of Static Mastering

Current flowflow

// Before: Analysis → AI params → DSP
single analysis → single param set → single global pass

Current pipeline evaluation

Pipeline overall	68 / 100
Static mastering fitness	81 / 100
Dynamic mastering fitness	52 / 100

The core problem: the AI consensus outputs DSP values against whole-track representative averages. For dynamic mastering, what matters is not the full-track mean but the local state of each section —intro / verse / build / drop / breakdown / outro.

Weakness breakdown

Single global preset for entire track	38 / 100
Section-tracking capability	44 / 100
Structure-following capability	49 / 100
Instantaneous loudness control	57 / 100
Cross-genre resilience	61 / 100

Architecture Shift — From "Knob Output" to "Target Output"

Abandon AI directly deciding DSP values. Move to an architecture where AI decides time-varying targets and DSP tracks them.

New architectureflow

// After: Analysis → Section map → Target envelopes → Control layer → DSP
Audio analysis
  ↓
Section map (structure mapping)
  ↓
Target envelopes (time-varying target values)
  ↓
Control layer (convert targets to DSP)
  ↓
Dynamic DSP Render

Recommended architecture evaluation

Commercial viability	87 / 100
Dynamic mastering performance	84 / 100
Scalability	83 / 100
Explainability	79 / 100

Analysis Output Specification — From Seconds-Based to Structure-Based

Gemini's optimal role is not outputting "when, how many dB" but "why this segment exists."The analysis AI's job is not a diagnostic report but a time-annotated target specification.

Output method comparison

Time-based output (seconds)	58 / 100
Structure-based output (sections)	91 / 100

Three output streams

1.
Global numeric baseline — integrated LUFS, true peak, LRA, PSR, crest, stereo width/correlation, mono correlation below 120Hz, band energy ratios, transient sharpness, harshness/mud/clip risk, etc.
2.
Structure analysis — form type, section segmentation, structural role of each section, musical function, repeat/variation relationships, energy profile, build/release/contrast/reset classification
3.
Per-section numerics — short-term loudness, PSR, crest, width, low-end balance, transient, vocal presence, risk profile

Final schema: `dynamic_mastering_formplan_v2`

schema

track_identity
whole_track_metrics     ← global numeric measurements
whole_track_targets     ← global targets
whole_track_deltas      ← gap between current state and targets
macro_form              ← structure analysis + per-section numbers, targets, protected elements
transition_logic        ← transition semantics and numeric deltas
global_mastering_strategy ← overall policy + failure conditions
problems                ← detected issues (with evidence)
confidence              ← per-field confidence scores

Global metrics — required importance

Integrated LUFS	100 / 100
True Peak	100 / 100
Low mono correlation below 120Hz	96 / 100
Band energy ratios	95 / 100
PSR / Crest	94 / 100
LRA	92 / 100
Harshness / mud / clip risk	91 / 100
Stereo width / correlation	90 / 100

Delegating Structure Analysis to Gemini — 5 Required Tasks

Gemini required tasks — importance

Structure classification (intro/build/drop/break/outro)	100 / 100
Section role identification (main drop vs variation drop, etc.)	96 / 100
Transition meaning (build-up / release / fake-out / pull-back)	94 / 100
Per-section protected elements (what must not be damaged)	92 / 100
Global flattening prevention (uniformity monitoring)	98 / 100

What Gemini must NOT output

"at 3.2s, width +0.14"41 / 100weak

"at 12.8s, limiter drive +0.7dB"46 / 100weak

"this bar, a bit more energy"28 / 100unacceptable

Gemini is not a sample-accurate controller. Output should lean toward meaningful structural units, comparable state labels, and constraints with priorities.

3-Model Parallel Consensus System (MAGI-style)

Distribute the same analysis_json to all 3 models, lock their roles, and integrate the outputs in a separate layer.

Consensus method comparison

3-model majority vote (equal weight)	62 / 100
3 models co-generate a single output	57 / 100
3 models fixed roles + Arbiter	91 / 100

Role assignments

Model	Role	Weight	Primary responsibility
GPT-5.4 Thinking	Engineer	0.40	Engineering validity, LUFS/TP/low-end constraints, numeric target realism
Claude Opus 4.6 Thinking	Structure Guard	0.30	Structure destruction detection, flattening risk, inter-section contradictions, failure conditions
Gemini Pro 3.1	Form Analyst	0.30	Structure comprehension, section role, transition meaning, dynamic policy

Per-field responsibility weights

フィールド	GPT-5.4	Claude Opus	Gemini Pro
macro_form	0.20	0.30	0.50
whole_track_targets	0.55	0.20	0.25
section_targets	0.40	0.20	0.40
transition_logic	0.20	0.35	0.45
failure_conditions	0.30	0.50	0.20

Arbiter Design

Integration is handled not by any of the 3 models themselves, but by an independent Arbiter. The Arbiter does not hold opinions — it adjudicates by rules.

Arbiter method comparison

Let one model integrate	58 / 100
3-model re-deliberation	64 / 100
Dedicated Arbiter	93 / 100
Rule-based Arbiter	95 / 100
Creative chairperson	57 / 100

Arbiter responsibilities

1.
Schema validation — Verify the 3 model JSONs are well-formed
2.
Hard constraint filtering — Reject TP violations, mono low-end violations, drop < build, stereo instability, etc.
3.
Field-wise merge — Numerics: weighted median. Risks: upper-median or max. do_not_damage: union
4.
Contradiction detection — Resolve contradictions between whole-track and per-section targets
5.
Minutes generation — Generate structured minutes JSON

Merge rules

numeric fields→ weighted median (safer than mean)

risk fields→ max or upper-median

do_not_damage→ union (preserve all model protection requests)

objective→ prefer 2+ model agreement; if Claude flags flattening, defer to suppression side

minority opinions→ do not discard — preserve in unresolved_tensions in final output

Structured Consensus Minutes

Minutes format comparison

Consensus result only	78 / 100
Consensus result + structured minutes	94 / 100
Free-text minutes	67 / 100

6 required fields in minutes

model_positions (each model's stance)	100 / 100
hard_constraint_checks	98 / 100
discussion_log (dispute log)	96 / 100
unresolved_tensions (minority opinions preserved)	95 / 100
resolved_items	92 / 100
final_rationale	90 / 100

Minutes are not "chat log storage" but "dispute log storage." Record by issue unit —target_integrated_lufs, drop_section_objective, low_end_width_policy, etc.

Final Pipeline Flow

Dynamic Mastering Pipeline

Audio Upload

Validation

Analysis AI → produce analysis_json

whole_track_metrics

macro_form (structure analysis)

section metrics / targets

transition_logic

problems

3-Model Parallel Consensus

GPT-5.4 Engineer

Claude Opus 4.6 Structure Guard

Gemini Pro 3.1 Form Analyst

Arbiter Integration

hard constraint filtering

field-wise weighted merge

contradiction check

minutes generation

mastering_consensus_bundle_v1

consensus_result

consensus_minutes

control_layer_input

Control Layer (targets → DSP parameters)

Dynamic DSP Render

Post-analysis Verification

Save output → return download URL

10.

DSP Engine v2 — Engineering Evaluation (Current State)

DSP Engine v2 evaluation

Architecture	84 / 100
Signal-Flow Design	81 / 100
Standards Compliance (BS.1770-4 LUFS)	86 / 100
Oversampled saturation concept	82 / 100
Multiband crossover accuracy	44 / 100
Stereo-linked limiting integrity	49 / 100
Final true-peak safety logic	54 / 100
Overall DSP engineering	72 / 100

Priority DSP fixes

1.Replace _split_4bands with complementary crossover+8 to +12
2.Switch Limiter to stereo-linked+6 to +9
3.Replace final sample-peak trim with oversampled TP safety pass+3 to +5
4.Fix stage numbering / comments+1 to +2

Score ceiling after fixes: 88 / 100

Implementation priority (expected score improvement)

Expected gain (+score)

Abolish single global params	16 / 20
Section-aware mastering	14 / 20
Convert AI output to target profile	12 / 20
Introduce control layer	11 / 20
Post-mastering auto re-analysis → re-optimization	8 / 20

11.

Next Steps — Implementation Roadmap

Translating the design decisions confirmed in this post into concrete implementation. Recorded in priority order.

3-model MAGI consensus system
GPT-5.4 / Claude Opus 4.6 / Gemini Pro 3.1. Fixed roles (Engineer / Structure Guard / Form Analyst) + per-field weighted integration.
consensus_arbiter.py
Rule-based merge: weighted median, risk max, do_not_damage union, contradiction detection, minutes generation.

control_layer.py
formplan targets → per-section DSP parameter conversion. Bridge between AI output and DSP engine.
DSP engine: section-adaptive processing
Abolish single global params → per-section params. The core of dynamic mastering.

DSP fix: _split_4bands → complementary Linkwitz-Riley crossover
Replace current band splitting with complementary crossover. Expected improvement: +8 to +12.
DSP fix: TP Limiter → stereo-linked
Switch to stereo-linked limiter. Expected improvement: +6 to +9.
DSP fix: final safety pass → oversampled true peak
Replace sample-peak trim with oversampled true peak safety pass. Expected improvement: +3 to +5.

post_verification.py
Auto re-analysis after mastering → diff against targets → report generation. Completing the feedback loop.
DSP fix: TPDF dither naming correction
Remove or rename HF shaping. Resolves naming vs. implementation mismatch.

← Dev LogNext: consensus_arbiter.py implementation