Algorithm Music
#001 · 2026-03-07 · ishij

Defining the Audio Analysis Parameter Spec

aimastering.dev — Dynamic Mastering Architecture v2 Design

Tags: architecture, DSP, AI consensus, dynamic mastering

We abandon the architecture where AI directly decides DSP knob values. Instead, AI outputs a time-varying "target specification" and a dedicated control layer translates it into DSP parameters. This is the design closest to optimal dynamic mastering.

1. Origin: Engineering Weaknesses Revealed by Prompt Evaluation

We evaluated an AI-generation prompt for a hard techno track (138 BPM, 909 kick, JD-800 pad, JUNO-106 bass) across multiple dimensions. Aesthetic scores were consistently high, but engineering descriptions were close to failing.

Prompt evaluation scores:
  Concept: 93 / 100
  World-building consistency: 91 / 100
  Sonic image precision: 89 / 100
  Club functionality: 88 / 100
  Acoustic engineering accuracy: 54 / 100
  Validity as mastering material instruction: 61 / 100

The following weaknesses were identified:

  • "−8.5 LUFS low-end" — LUFS is a full-spectrum loudness metric, not a band-specific descriptor
  • Low-frequency band separation unspecified
  • Missing preconditions for excessive sidechain compression
  • No compatibility conditions for 3D reverb + low-end density coexistence
  • Kick vs bass priority undefined

This evaluation made it clear that creative prompts and engineering mastering specs are entirely different artifacts.

2. Current Pipeline Problems: The Limits of Static Mastering

Current flow:
// Before: Analysis → AI params → DSP
single analysis → single param set → single global pass

Current pipeline evaluation:
  Pipeline overall: 68 / 100
  Static mastering fitness: 81 / 100
  Dynamic mastering fitness: 52 / 100

The core problem: the AI consensus outputs DSP values against whole-track representative averages. For dynamic mastering, what matters is not the full-track mean but the local state of each section (intro / verse / build / drop / breakdown / outro).
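
To illustrate why a whole-track average hides section state, here is a minimal sketch. It uses plain RMS in dB as a stand-in for loudness (it is not a BS.1770 LUFS measurement), and the function names and toy signal are assumptions made for this example.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to full scale (1.0). Not a LUFS measurement."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def section_levels(samples, sections):
    """Map section name -> RMS dB over its [start, end) sample range."""
    return {name: rms_db(samples[start:end]) for name, start, end in sections}


# Toy signal: a quiet intro followed by a loud drop (constant amplitudes).
signal = [0.1] * 1000 + [0.8] * 1000
sections = [("intro", 0, 1000), ("drop", 1000, 2000)]

global_db = rms_db(signal)                   # one number for the whole track
local_db = section_levels(signal, sections)  # one number per section

print(f"global: {global_db:.2f} dB")
for name, level in local_db.items():
    print(f"{name}: {level:.2f} dB")
```

The single global figure sits between the two section levels and describes neither of them, which is the failure mode of a single global parameter set.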

Weakness breakdown:
  Single global preset for entire track: 38 / 100
  Section-tracking capability: 44 / 100
  Structure-following capability: 49 / 100
  Instantaneous loudness control: 57 / 100
  Cross-genre resilience: 61 / 100

3. Architecture Shift: From "Knob Output" to "Target Output"

Abandon AI directly deciding DSP values. Move to an architecture where AI decides time-varying targets and DSP tracks them.

New architecture flow:
// After: Analysis → Section map → Target envelopes → Control layer → DSP
Audio analysis
  ↓
Section map (structure mapping)
  ↓
Target envelopes (time-varying target values)
  ↓
Control layer (convert targets to DSP)
  ↓
Dynamic DSP Render
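
The split between "targets" and "DSP parameters" in the flow above can be sketched as follows. This is only an illustration under stated assumptions: `SectionTarget`, `control_layer`, the 3 dB clamp, and the idea of a single gain offset per section are hypothetical and not taken from the actual control layer.

```python
from dataclasses import dataclass


@dataclass
class SectionTarget:
    name: str
    start_s: float
    end_s: float
    measured_lufs: float  # short-term loudness measured by the analysis stage
    target_lufs: float    # target decided by the AI layer for this section


def control_layer(targets, max_step_db=3.0):
    """Translate per-section targets into per-section gain offsets (dB).

    The AI layer never emits gain_db itself; it only states the target.
    The clamp (hypothetical value) keeps the control layer conservative.
    """
    params = []
    for t in targets:
        delta = t.target_lufs - t.measured_lufs
        gain_db = max(-max_step_db, min(max_step_db, delta))
        params.append(
            {"section": t.name, "start_s": t.start_s, "end_s": t.end_s, "gain_db": gain_db}
        )
    return params


plan = [
    SectionTarget("build", 32.0, 48.0, measured_lufs=-12.0, target_lufs=-11.0),
    SectionTarget("drop", 48.0, 80.0, measured_lufs=-14.0, target_lufs=-8.5),
]
dsp_params = control_layer(plan)
for p in dsp_params:
    print(p)
```

The drop asks for 5.5 dB more than measured, and the control layer, not the AI, decides that only 3 dB is applied in one step.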

Recommended architecture evaluation:
  Commercial viability: 87 / 100
  Dynamic mastering performance: 84 / 100
  Scalability: 83 / 100
  Explainability: 79 / 100

4. Analysis Output Specification: From Seconds-Based to Structure-Based

Gemini's optimal role is not to output "when, and by how many dB" but to explain "why this segment exists." The analysis AI's job is to produce a time-annotated target specification, not a diagnostic report.

Output method comparison:
  Time-based output (seconds): 58 / 100
  Structure-based output (sections): 91 / 100

Three output streams

  1. Global numeric baseline: integrated LUFS, true peak, LRA, PSR, crest, stereo width/correlation, mono correlation below 120 Hz, band energy ratios, transient sharpness, harshness/mud/clip risk, etc.
  2. Structure analysis: form type, section segmentation, structural role of each section, musical function, repeat/variation relationships, energy profile, build/release/contrast/reset classification
  3. Per-section numerics: short-term loudness, PSR, crest, width, low-end balance, transient, vocal presence, risk profile

Final schema: dynamic_mastering_formplan_v2

schema
track_identity
whole_track_metrics     ← global numeric measurements
whole_track_targets     ← global targets
whole_track_deltas      ← gap between current state and targets
macro_form              ← structure analysis + per-section numbers, targets, protected elements
transition_logic        ← transition semantics and numeric deltas
global_mastering_strategy ← overall policy + failure conditions
problems                ← detected issues (with evidence)
confidence              ← per-field confidence scores
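
As a rough sketch of how the top-level schema could be checked in code: the field names below come from the schema listing, while the helper names, the metric keys, and the sign convention for deltas (target minus measured) are assumptions made for illustration.

```python
# Top-level fields of dynamic_mastering_formplan_v2, as listed in the schema.
FORMPLAN_V2_FIELDS = (
    "track_identity",
    "whole_track_metrics",
    "whole_track_targets",
    "whole_track_deltas",
    "macro_form",
    "transition_logic",
    "global_mastering_strategy",
    "problems",
    "confidence",
)


def missing_fields(plan):
    """Return the top-level fields that are absent from a formplan dict."""
    return [name for name in FORMPLAN_V2_FIELDS if name not in plan]


def compute_deltas(metrics, targets):
    """Gap between current state and targets (assumed: target - measured)."""
    return {key: targets[key] - metrics[key] for key in targets if key in metrics}


# Hypothetical metric keys and values, for illustration only.
metrics = {"integrated_lufs": -12.0, "true_peak_dbtp": -0.5}
targets = {"integrated_lufs": -9.0, "true_peak_dbtp": -1.0}

plan = {name: {} for name in FORMPLAN_V2_FIELDS}
plan["whole_track_metrics"] = metrics
plan["whole_track_targets"] = targets
plan["whole_track_deltas"] = compute_deltas(metrics, targets)

print(missing_fields(plan))        # []
print(plan["whole_track_deltas"])
```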

Global metrics, required importance:
  Integrated LUFS: 100 / 100
  True Peak: 100 / 100
  Low-end mono correlation (below 120 Hz): 96 / 100
  Band energy ratios: 95 / 100
  PSR / Crest: 94 / 100
  LRA: 92 / 100
  Harshness / mud / clip risk: 91 / 100
  Stereo width / correlation: 90 / 100

5. Delegating Structure Analysis to Gemini: 5 Required Tasks

Gemini required tasks, by importance:
  Structure classification (intro/build/drop/break/outro): 100 / 100
  Section role identification (main drop vs variation drop, etc.): 96 / 100
  Transition meaning (build-up / release / fake-out / pull-back): 94 / 100
  Per-section protected elements (what must not be damaged): 92 / 100
  Global flattening prevention (uniformity monitoring): 98 / 100

What Gemini must NOT output

  • "at 3.2s, width +0.14": 41 / 100 (weak)
  • "at 12.8s, limiter drive +0.7dB": 46 / 100 (weak)
  • "this bar, a bit more energy": 28 / 100 (unacceptable)

Gemini is not a sample-accurate controller. Output should lean toward meaningful structural units, comparable state labels, and constraints with priorities.
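
One way to enforce this in code is to accept only section-level directives and reject constraint text that addresses the track by timestamp. The sketch below is an assumption-heavy illustration: `SectionDirective`, its fields, and the regular expression are hypothetical and not part of the actual schema.

```python
import re
from dataclasses import dataclass, field

# Matches seconds-based addressing such as "at 3.2s" or "at 12.8 s".
_TIMESTAMP = re.compile(r"\bat\s+\d+(?:\.\d+)?\s*s\b", re.IGNORECASE)


@dataclass
class SectionDirective:
    section: str  # meaningful structural unit, e.g. "drop_1"
    state: str    # comparable state label, e.g. "peak_energy"
    # Constraints as (priority, text) pairs; 1 is the highest priority.
    constraints: list = field(default_factory=list)


def timestamp_violations(directive):
    """Return constraint texts that address the track by timestamp."""
    return [text for _, text in directive.constraints if _TIMESTAMP.search(text)]


good = SectionDirective(
    "drop_1",
    "peak_energy",
    [(1, "keep kick transient intact"), (2, "widen pads above 200 Hz")],
)
bad = SectionDirective("drop_1", "peak_energy", [(1, "at 3.2s, width +0.14")])

print(timestamp_violations(good))  # []
print(timestamp_violations(bad))   # ['at 3.2s, width +0.14']
```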

6. 3-Model Parallel Consensus System (MAGI-style)

Distribute the same analysis_json to all 3 models, lock their roles, and integrate the outputs in a separate layer.

Consensus method comparison:
  3-model majority vote (equal weight): 62 / 100
  3 models co-generate a single output: 57 / 100
  3 models with fixed roles + Arbiter: 91 / 100

Role assignments

Model                    | Role            | Weight | Primary responsibility
GPT-5.4 Thinking         | Engineer        | 0.40   | Engineering validity, LUFS/TP/low-end constraints, numeric target realism
Claude Opus 4.6 Thinking | Structure Guard | 0.30   | Structure destruction detection, flattening risk, inter-section contradictions, failure conditions
Gemini Pro 3.1           | Form Analyst    | 0.30   | Structure comprehension, section role, transition meaning, dynamic policy

Per-field responsibility weights

Field               | GPT-5.4 | Claude Opus | Gemini Pro
macro_form          | 0.20    | 0.30        | 0.50
whole_track_targets | 0.55    | 0.20        | 0.25
section_targets     | 0.40    | 0.20        | 0.40
transition_logic    | 0.20    | 0.35        | 0.45
failure_conditions  | 0.30    | 0.50        | 0.20
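
The per-field weights can be kept as plain data and sanity-checked before use. In this sketch the weight values are the ones from the table; the dictionary layout, the short model keys, and the helper name are assumptions for illustration.

```python
# Per-field responsibility weights, keyed by field and then by model.
FIELD_WEIGHTS = {
    "macro_form":          {"gpt": 0.20, "claude": 0.30, "gemini": 0.50},
    "whole_track_targets": {"gpt": 0.55, "claude": 0.20, "gemini": 0.25},
    "section_targets":     {"gpt": 0.40, "claude": 0.20, "gemini": 0.40},
    "transition_logic":    {"gpt": 0.20, "claude": 0.35, "gemini": 0.45},
    "failure_conditions":  {"gpt": 0.30, "claude": 0.50, "gemini": 0.20},
}


def unnormalized_fields(weights, tol=1e-9):
    """Return the fields whose model weights do not sum to 1.0."""
    return [f for f, w in weights.items() if abs(sum(w.values()) - 1.0) > tol]


def lead_model(weights, field_name):
    """Return the model carrying the largest weight for a field."""
    row = weights[field_name]
    return max(row, key=row.get)


print(unnormalized_fields(FIELD_WEIGHTS))              # []
print(lead_model(FIELD_WEIGHTS, "failure_conditions")) # claude
```

Every row sums to 1.0, so a weighted merge over any single field needs no extra normalization.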

7. Arbiter Design

Integration is handled not by any of the 3 models themselves, but by an independent Arbiter. The Arbiter does not hold opinions — it adjudicates by rules.

Arbiter method comparison:
  Let one model integrate: 58 / 100
  3-model re-deliberation: 64 / 100
  Dedicated Arbiter: 93 / 100
  Rule-based Arbiter: 95 / 100
  Creative chairperson: 57 / 100

Arbiter responsibilities

  1. Schema validation: verify that the 3 model JSONs are well-formed
  2. Hard constraint filtering: reject TP violations, mono low-end violations, drop < build, stereo instability, etc.
  3. Field-wise merge: numerics use the weighted median; risks use the upper-median or max; do_not_damage uses the union
  4. Contradiction detection: resolve contradictions between whole-track and per-section targets
  5. Minutes generation: generate structured minutes JSON
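
The hard constraint filtering step could look roughly like the sketch below. The post names the checks but not the limits, so both thresholds, the proposal layout, and the reading of "drop < build" as "drop target loudness below build target loudness" are assumptions made for this example.

```python
# Hypothetical thresholds; the post does not specify these values.
TP_CEILING_DBTP = -1.0
MIN_LOW_END_MONO_CORRELATION = 0.9


def hard_constraint_violations(proposal):
    """Return the names of hard constraints violated by one model's proposal."""
    violations = []
    targets = proposal["whole_track_targets"]
    if targets["true_peak_dbtp"] > TP_CEILING_DBTP:
        violations.append("true_peak")
    if targets["low_end_mono_correlation"] < MIN_LOW_END_MONO_CORRELATION:
        violations.append("mono_low_end")
    sections = proposal["section_targets"]
    if sections["drop"]["short_term_lufs"] < sections["build"]["short_term_lufs"]:
        violations.append("drop_below_build")
    return violations


def filter_proposals(proposals):
    """Keep only the proposals (by model name) that pass every hard check."""
    return {name: p for name, p in proposals.items() if not hard_constraint_violations(p)}


ok = {
    "whole_track_targets": {"true_peak_dbtp": -1.0, "low_end_mono_correlation": 0.95},
    "section_targets": {"build": {"short_term_lufs": -11.0}, "drop": {"short_term_lufs": -8.5}},
}
bad = {
    "whole_track_targets": {"true_peak_dbtp": -0.2, "low_end_mono_correlation": 0.95},
    "section_targets": {"build": {"short_term_lufs": -8.0}, "drop": {"short_term_lufs": -9.0}},
}

print(hard_constraint_violations(bad))                      # ['true_peak', 'drop_below_build']
print(list(filter_proposals({"engineer": ok, "form": bad})))  # ['engineer']
```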

Merge rules

  • numeric fields: weighted median (safer than the mean)
  • risk fields: max or upper-median
  • do_not_damage: union (preserve every model's protection requests)
  • objective: prefer agreement between 2 or more models; if Claude flags flattening, defer to the side that suppresses it
  • minority opinions: do not discard; preserve them in unresolved_tensions in the final output
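
A minimal sketch of the three mechanical merge rules (weighted median, max, union). The function names and the example proposals are hypothetical; the weights in the usage example are the section_targets row (0.40 / 0.20 / 0.40). The weighted median here is the "lower" variant: the smallest value at which the cumulative weight reaches half of the total.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("weighted_median() needs at least one value")


def merge_field(kind, proposals, weights):
    """Merge one field from several models according to its kind."""
    if kind == "numeric":
        return weighted_median(proposals, weights)
    if kind == "risk":
        return max(proposals)  # take the most pessimistic estimate
    if kind == "do_not_damage":
        return sorted(set().union(*proposals))  # keep every protection request
    raise ValueError(f"unknown field kind: {kind}")


weights = [0.40, 0.20, 0.40]  # Engineer, Structure Guard, Form Analyst

# One model proposes an outlier loudness target of -14.0.
lufs = merge_field("numeric", [-9.0, -14.0, -8.5], weights)
risk = merge_field("risk", [0.2, 0.7, 0.4], weights)
protect = merge_field("do_not_damage", [["kick_transient"], ["vocal"], ["kick_transient"]], weights)

print(lufs, risk, protect)
```

With these inputs the weighted mean would be pulled to -9.8 by the outlier, while the weighted median stays at -9.0, which is the sense in which the median is the safer choice.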

8. Structured Consensus Minutes

Minutes format comparison:
  Consensus result only: 78 / 100
  Consensus result + structured minutes: 94 / 100
  Free-text minutes: 67 / 100

6 required fields in minutes:
  model_positions (each model's stance): 100 / 100
  hard_constraint_checks: 98 / 100
  discussion_log (dispute log): 96 / 100
  unresolved_tensions (minority opinions preserved): 95 / 100
  resolved_items: 92 / 100
  final_rationale: 90 / 100

Minutes are not "chat log storage" but "dispute log storage." Record entries per issue, for example target_integrated_lufs, drop_section_objective, and low_end_width_policy.
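
A minimal sketch of what issue-keyed minutes might look like. The six top-level field names are the required fields listed above; everything inside them (keys, values, wording) is invented for illustration, as is the helper name.

```python
import json

REQUIRED_MINUTES_FIELDS = (
    "model_positions",
    "hard_constraint_checks",
    "discussion_log",
    "unresolved_tensions",
    "resolved_items",
    "final_rationale",
)


def missing_minutes_fields(minutes):
    """Return required minutes fields that are absent."""
    return [name for name in REQUIRED_MINUTES_FIELDS if name not in minutes]


# Hypothetical content: entries are keyed by issue, not by chat turn.
minutes = {
    "model_positions": {
        "target_integrated_lufs": {"engineer": -9.0, "structure_guard": -10.0, "form_analyst": -9.0},
    },
    "hard_constraint_checks": {"true_peak": "pass", "mono_low_end": "pass"},
    "discussion_log": [
        {"issue": "target_integrated_lufs", "dispute": "structure_guard prefers a lower target"},
    ],
    "unresolved_tensions": [
        {"issue": "target_integrated_lufs", "minority": "structure_guard", "value": -10.0},
    ],
    "resolved_items": [{"issue": "target_integrated_lufs", "value": -9.0}],
    "final_rationale": "weighted median of numeric proposals",
}

print(missing_minutes_fields(minutes))  # []
print(json.dumps(minutes, indent=2)[:60])
```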

9. Final Pipeline Flow

Dynamic Mastering Pipeline:
Audio Upload
  ↓
Validation
  ↓
Analysis AI → produce analysis_json
    • whole_track_metrics
    • macro_form (structure analysis)
    • section metrics / targets
    • transition_logic
    • problems
  ↓
3-Model Parallel Consensus
    • GPT-5.4 Engineer
    • Claude Opus 4.6 Structure Guard
    • Gemini Pro 3.1 Form Analyst
  ↓
Arbiter Integration
    • hard constraint filtering
    • field-wise weighted merge
    • contradiction check
    • minutes generation
  ↓
mastering_consensus_bundle_v1
    • consensus_result
    • consensus_minutes
    • control_layer_input
  ↓
Control Layer (targets → DSP parameters)
  ↓
Dynamic DSP Render
  ↓
Post-analysis Verification
  ↓
Save output → return download URL

10. DSP Engine v2: Engineering Evaluation (Current State)

DSP Engine v2 evaluation:
  Architecture: 84 / 100
  Signal-Flow Design: 81 / 100
  Standards Compliance (BS.1770-4 LUFS): 86 / 100
  Oversampled saturation concept: 82 / 100
  Multiband crossover accuracy: 44 / 100
  Stereo-linked limiting integrity: 49 / 100
  Final true-peak safety logic: 54 / 100
  Overall DSP engineering: 72 / 100

Priority DSP fixes

  1. Replace _split_4bands with a complementary crossover: +8 to +12
  2. Switch the Limiter to stereo-linked: +6 to +9
  3. Replace the final sample-peak trim with an oversampled TP safety pass: +3 to +5
  4. Fix stage numbering / comments: +1 to +2

Score ceiling after fixes: 88 / 100
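
Two of the fixes above rest on simple properties that a short sketch can show. The code below is not the engine's implementation: the split is a one-pole subtractive split (not a Linkwitz-Riley design), and the limiter gain is instantaneous with no attack, release, or lookahead. Function names and parameter values are assumptions.

```python
def complementary_split(x, alpha=0.1):
    """Split a signal into (low, high) bands with high = x - low.

    Because the high band is defined by subtraction, low + high
    reconstructs the input, which is what "complementary" means here.
    """
    low, high, state = [], [], 0.0
    for s in x:
        state += alpha * (s - state)  # one-pole low-pass
        low.append(state)
        high.append(s - state)
    return low, high


def linked_limiter_gains(left, right, ceiling=0.9):
    """Per-sample gain derived from the louder channel, shared by both.

    Applying the same gain to L and R keeps the stereo image stable;
    independent per-channel gains would shift it.
    """
    gains = []
    for l, r in zip(left, right):
        peak = max(abs(l), abs(r))
        gains.append(1.0 if peak <= ceiling else ceiling / peak)
    return gains


x = [0.0, 1.0, 0.5, -0.25, 0.75]
low, high = complementary_split(x)
recon_error = max(abs(l + h - s) for l, h, s in zip(low, high, x))

left, right = [0.5, 1.2], [0.25, 0.6]
gains = linked_limiter_gains(left, right)
limited = [(l * g, r * g) for l, r, g in zip(left, right, gains)]

print(recon_error)
print(gains, limited)
```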

Implementation priority (expected score improvement)

Expected gain (+score):
  Abolish single global params: 16 / 20
  Section-aware mastering: 14 / 20
  Convert AI output to target profile: 12 / 20
  Introduce control layer: 11 / 20
  Post-mastering auto re-analysis → re-optimization: 8 / 20

11. Next Steps: Implementation Roadmap

This section translates the design decisions confirmed in this post into concrete implementation tasks, recorded in priority order.
P1
  • 3-model MAGI consensus system
    GPT-5.4 / Claude Opus 4.6 / Gemini Pro 3.1. Fixed roles (Engineer / Structure Guard / Form Analyst) + per-field weighted integration.
  • consensus_arbiter.py
    Rule-based merge: weighted median, risk max, do_not_damage union, contradiction detection, minutes generation.
P2
  • control_layer.py
    formplan targets → per-section DSP parameter conversion. Bridge between AI output and DSP engine.
  • DSP engine: section-adaptive processing
    Abolish single global params → per-section params. The core of dynamic mastering.
P3
  • DSP fix: _split_4bands → complementary Linkwitz-Riley crossover
    Replace current band splitting with complementary crossover. Expected improvement: +8 to +12.
  • DSP fix: TP Limiter → stereo-linked
    Switch to stereo-linked limiter. Expected improvement: +6 to +9.
  • DSP fix: final safety pass → oversampled true peak
    Replace sample-peak trim with oversampled true peak safety pass. Expected improvement: +3 to +5.
P4
  • post_verification.py
    Auto re-analysis after mastering → diff against targets → report generation. Completing the feedback loop.
  • DSP fix: TPDF dither naming correction
    Remove or rename HF shaping. Resolves naming vs. implementation mismatch.
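
The post-verification step (re-analysis, diff against targets, report) could be sketched as below. The function name, metric keys, tolerances, and the default tolerance are all hypothetical; the real post_verification.py may differ.

```python
def verification_report(measured, targets, tolerances, default_tolerance=0.5):
    """Diff re-analysed metrics against targets and flag out-of-tolerance items."""
    report = {}
    for key, target in targets.items():
        diff = measured[key] - target
        tolerance = tolerances.get(key, default_tolerance)
        report[key] = {
            "measured": measured[key],
            "target": target,
            "diff": diff,
            "ok": abs(diff) <= tolerance,
        }
    return report


# Hypothetical values for illustration.
targets = {"integrated_lufs": -9.0, "true_peak_dbtp": -1.0}
measured = {"integrated_lufs": -9.3, "true_peak_dbtp": -0.2}
tolerances = {"integrated_lufs": 0.5, "true_peak_dbtp": 0.1}

report = verification_report(measured, targets, tolerances)
failed = [key for key, row in report.items() if not row["ok"]]

print(failed)  # ['true_peak_dbtp']
```

Items in `failed` are the natural input for a re-optimization pass, which closes the feedback loop.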