Origin — Engineering Weaknesses Revealed by Prompt Evaluation
We evaluated an AI-generation prompt for a hard techno track (138 BPM, 909 kick, JD-800 pad, JUNO-106 bass) across multiple dimensions. Aesthetic scores were consistently high, but engineering descriptions were close to failing.
| Concept | 93 / 100 |
| World-building consistency | 91 / 100 |
| Sonic image precision | 89 / 100 |
| Club functionality | 88 / 100 |
| Acoustic engineering accuracy | 54 / 100 |
| Validity as mastering material instruction | 61 / 100 |
The following weaknesses were identified:
- "−8.5 LUFS low-end": LUFS is a full-spectrum loudness metric, not a band-specific descriptor
- Low-frequency band separation unspecified
- Missing preconditions for excessive sidechain compression
- No compatibility conditions for 3D reverb + low-end density coexistence
- Kick vs bass priority undefined
Current Pipeline Problems — The Limits of Static Mastering
// Before: Analysis → AI params → DSP
single analysis → single param set → single global pass
| Pipeline overall | 68 / 100 |
| Static mastering fitness | 81 / 100 |
| Dynamic mastering fitness | 52 / 100 |
The core problem: the AI consensus derives its DSP values from whole-track representative averages. For dynamic mastering, what matters is not the full-track mean but the local state of each section (intro, verse, build, drop, breakdown, outro).
| Single global preset for entire track | 38 / 100 |
| Section-tracking capability | 44 / 100 |
| Structure-following capability | 49 / 100 |
| Instantaneous loudness control | 57 / 100 |
| Cross-genre resilience | 61 / 100 |
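To see why a whole-track average hides section-local state, consider the following minimal sketch. The loudness readings are invented for illustration, and the plain arithmetic mean of dB values stands in for a proper gated BS.1770 measurement; none of the names come from the actual pipeline.

```python
from statistics import mean

# Hypothetical short-term loudness readings (LUFS), one per analysis window,
# grouped by section. Illustrative values only, not measurements.
short_term_lufs = {
    "intro": [-18.0, -17.0, -16.0],
    "build": [-13.0, -12.0, -11.0],
    "drop": [-8.0, -7.5, -8.5],
    "breakdown": [-16.0, -15.0, -17.0],
}


def global_mean(readings):
    """Average over every window in the track, ignoring section boundaries."""
    return mean(v for values in readings.values() for v in values)


def section_means(readings):
    """Average each section separately."""
    return {name: mean(values) for name, values in readings.items()}


track_mean = global_mean(short_term_lufs)
local = section_means(short_term_lufs)

# How far each section sits from the single whole-track figure.
offsets = {name: round(value - track_mean, 2) for name, value in local.items()}
print(track_mean, offsets)
```

With these numbers the drop sits 5.25 dB above the whole-track figure and the intro 3.75 dB below it, so a single parameter set tuned to the average fits neither.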
Architecture Shift — From "Knob Output" to "Target Output"
// After: Analysis → Section map → Target envelopes → Control layer → DSP
Audio analysis
↓
Section map (structure mapping)
↓
Target envelopes (time-varying target values)
↓
Control layer (convert targets to DSP)
↓
Dynamic DSP Render
| Commercial viability | 87 / 100 |
| Dynamic mastering performance | 84 / 100 |
| Scalability | 83 / 100 |
| Explainability | 79 / 100 |
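The step from "knob output" to "target output" can be sketched as follows. This is a minimal illustration under stated assumptions: the `Section` type, the linear ramp between sections, the 2-second ramp length, and the 3 dB clamp are all hypothetical choices, not part of the described system.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    start_s: float
    end_s: float
    target_lufs: float  # hypothetical per-section short-term loudness target


def target_at(sections, t, ramp_s=2.0):
    """Target envelope: loudness target at time t, ramping linearly into each new section."""
    for i, sec in enumerate(sections):
        if sec.start_s <= t < sec.end_s:
            if i > 0 and t < sec.start_s + ramp_s:
                prev = sections[i - 1].target_lufs
                frac = (t - sec.start_s) / ramp_s
                return prev + frac * (sec.target_lufs - prev)
            return sec.target_lufs
    raise ValueError(f"time {t} is outside the section map")


def gain_db(target_lufs, measured_lufs, max_step_db=3.0):
    """Control layer: convert a loudness gap into a bounded gain move in dB."""
    gap = target_lufs - measured_lufs
    return max(-max_step_db, min(max_step_db, gap))


section_map = [
    Section("build", 0.0, 30.0, -12.0),
    Section("drop", 30.0, 60.0, -8.0),
]
print(target_at(section_map, 31.0))   # halfway through the ramp into the drop
print(gain_db(target_at(section_map, 45.0), measured_lufs=-13.0))
```

The AI side only supplies the section map and targets; the conversion into a concrete gain happens in the control layer, which is where safety limits such as the clamp can live.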
Analysis Output Specification — From Seconds-Based to Structure-Based
Gemini's optimal role is to output not "when, and by how many dB" but "why this segment exists." The analysis AI's job is to produce a time-annotated target specification, not a diagnostic report.
| Time-based output (seconds) | 58 / 100 |
| Structure-based output (sections) | 91 / 100 |
Three output streams
1. Global numeric baseline: integrated LUFS, true peak, LRA, PSR, crest, stereo width/correlation, mono correlation below 120 Hz, band energy ratios, transient sharpness, harshness/mud/clip risk, etc.
2. Structure analysis: form type, section segmentation, structural role of each section, musical function, repeat/variation relationships, energy profile, build/release/contrast/reset classification
3. Per-section numerics: short-term loudness, PSR, crest, width, low-end balance, transient, vocal presence, risk profile
Final schema: dynamic_mastering_formplan_v2
track_identity
whole_track_metrics ← global numeric measurements
whole_track_targets ← global targets
whole_track_deltas ← gap between current state and targets
macro_form ← structure analysis + per-section numbers, targets, protected elements
transition_logic ← transition semantics and numeric deltas
global_mastering_strategy ← overall policy + failure conditions
problems ← detected issues (with evidence)
confidence ← per-field confidence scores
| Integrated LUFS | 100 / 100 |
| True Peak | 100 / 100 |
| Low-end mono correlation (below 120 Hz) | 96 / 100 |
| Band energy ratios | 95 / 100 |
| PSR / Crest | 94 / 100 |
| LRA | 92 / 100 |
| Harshness / mud / clip risk | 91 / 100 |
| Stereo width / correlation | 90 / 100 |
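As a rough illustration of the `dynamic_mastering_formplan_v2` layout, the sketch below checks that a plan carries the nine top-level keys listed in the schema and derives `whole_track_deltas` from metrics and targets. The helper names, the example values, and the sign convention (target minus current) are assumptions made for this example.

```python
# Top-level keys as listed in the schema description.
FORMPLAN_KEYS = (
    "track_identity",
    "whole_track_metrics",
    "whole_track_targets",
    "whole_track_deltas",
    "macro_form",
    "transition_logic",
    "global_mastering_strategy",
    "problems",
    "confidence",
)


def missing_keys(formplan):
    """Return the schema keys that are absent from a formplan dict."""
    return [key for key in FORMPLAN_KEYS if key not in formplan]


def compute_deltas(metrics, targets):
    """Gap between current state and targets (assumed convention: target - current)."""
    return {
        name: round(targets[name] - metrics[name], 2)
        for name in targets
        if name in metrics
    }


# Hypothetical, partially filled plan.
formplan = {
    "track_identity": {"title": "example"},
    "whole_track_metrics": {"integrated_lufs": -11.5, "true_peak_dbtp": -0.5},
    "whole_track_targets": {"integrated_lufs": -9.0, "true_peak_dbtp": -1.0},
}
formplan["whole_track_deltas"] = compute_deltas(
    formplan["whole_track_metrics"], formplan["whole_track_targets"]
)
print(formplan["whole_track_deltas"])
print(missing_keys(formplan))
```

A check like `missing_keys` is the kind of test a schema-validation step could run before any merging takes place.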
Delegating Structure Analysis to Gemini — 5 Required Tasks
| Structure classification (intro/build/drop/break/outro) | 100 / 100 |
| Section role identification (main drop vs variation drop, etc.) | 96 / 100 |
| Transition meaning (build-up / release / fake-out / pull-back) | 94 / 100 |
| Per-section protected elements (what must not be damaged) | 92 / 100 |
| Global flattening prevention (uniformity monitoring) | 98 / 100 |
What Gemini must NOT output
| "at 3.2s, width +0.14" | 41 / 100 (weak) |
| "at 12.8s, limiter drive +0.7dB" | 46 / 100 (weak) |
| "this bar, a bit more energy" | 28 / 100 (unacceptable) |
3-Model Parallel Consensus System (MAGI-style)
Distribute the same analysis_json to all 3 models, lock their roles, and integrate the outputs in a separate layer.
| 3-model majority vote (equal weight) | 62 / 100 |
| 3 models co-generate a single output | 57 / 100 |
| 3 models fixed roles + Arbiter | 91 / 100 |
Role assignments
| Model | Role | Weight | Primary responsibility |
|---|---|---|---|
| GPT-5.4 Thinking | Engineer | 0.40 | Engineering validity, LUFS/TP/low-end constraints, numeric target realism |
| Claude Opus 4.6 Thinking | Structure Guard | 0.30 | Structure destruction detection, flattening risk, inter-section contradictions, failure conditions |
| Gemini Pro 3.1 | Form Analyst | 0.30 | Structure comprehension, section role, transition meaning, dynamic policy |
Per-field weights
| Field | GPT-5.4 | Claude Opus | Gemini Pro |
|---|---|---|---|
| macro_form | 0.20 | 0.30 | 0.50 |
| whole_track_targets | 0.55 | 0.20 | 0.25 |
| section_targets | 0.40 | 0.20 | 0.40 |
| transition_logic | 0.20 | 0.35 | 0.45 |
| failure_conditions | 0.30 | 0.50 | 0.20 |
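The per-field weights can drive a weighted-median merge of numeric proposals. The sketch below is one possible reading, assuming a lower weighted median and three hypothetical loudness proposals; the function names are illustrative and do not come from `consensus_arbiter.py`.

```python
# Per-field weights copied from the table above.
FIELD_WEIGHTS = {
    "macro_form": {"GPT-5.4": 0.20, "Claude Opus": 0.30, "Gemini Pro": 0.50},
    "whole_track_targets": {"GPT-5.4": 0.55, "Claude Opus": 0.20, "Gemini Pro": 0.25},
    "section_targets": {"GPT-5.4": 0.40, "Claude Opus": 0.20, "Gemini Pro": 0.40},
    "transition_logic": {"GPT-5.4": 0.20, "Claude Opus": 0.35, "Gemini Pro": 0.45},
    "failure_conditions": {"GPT-5.4": 0.30, "Claude Opus": 0.50, "Gemini Pro": 0.20},
}


def weighted_median(pairs):
    """Lower weighted median of (value, weight) pairs."""
    ordered = sorted(pairs)
    half = sum(weight for _, weight in ordered) / 2.0
    running = 0.0
    for value, weight in ordered:
        running += weight
        if running >= half:
            return value
    raise ValueError("no values to merge")


def merge_numeric(field, proposals):
    """Merge one numeric value per model using the weights for the given field."""
    weights = FIELD_WEIGHTS[field]
    return weighted_median([(value, weights[model]) for model, value in proposals.items()])


# Hypothetical integrated-LUFS proposals; one model is an outlier.
proposals = {"GPT-5.4": -9.0, "Claude Opus": -14.0, "Gemini Pro": -8.0}
merged = merge_numeric("whole_track_targets", proposals)
weighted_mean = sum(
    FIELD_WEIGHTS["whole_track_targets"][model] * value
    for model, value in proposals.items()
)
print(merged, round(weighted_mean, 2))
```

In this example the median stays at the Engineer's −9.0 while a weighted mean would be dragged to about −9.75 by the outlier, which is the sense in which the median is the safer choice.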
Arbiter Design
| Let one model integrate | 58 / 100 |
| 3-model re-deliberation | 64 / 100 |
| Dedicated Arbiter | 93 / 100 |
| Rule-based Arbiter | 95 / 100 |
| Creative chairperson | 57 / 100 |
Arbiter responsibilities
1. Schema validation: verify that the three model JSONs are well-formed
2. Hard constraint filtering: reject TP violations, mono low-end violations, drop < build, stereo instability, etc.
3. Field-wise merge: numerics use weighted median; risks use upper-median or max; do_not_damage uses union
4. Contradiction detection: resolve contradictions between whole-track and per-section targets
5. Minutes generation: generate structured minutes JSON
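Steps 2 and 3 of the responsibilities above might look like the following in a rule-based arbiter. This is a sketch only: the proposal field names, the −1.0 dBTP ceiling, and the 0.9 low-end correlation floor are assumptions chosen for the example, not values taken from the system.

```python
def hard_violations(proposal, tp_ceiling_dbtp=-1.0, min_low_mono_corr=0.9):
    """Return the names of violated hard constraints (empty list means pass)."""
    violations = []
    if proposal["true_peak_dbtp"] > tp_ceiling_dbtp:
        violations.append("true_peak")
    if proposal["low_mono_correlation"] < min_low_mono_corr:
        violations.append("mono_low_end")
    if proposal["drop_lufs"] < proposal["build_lufs"]:
        violations.append("drop_below_build")  # drop quieter than the build
    return violations


def merge_surviving(proposals):
    """Filter out violating proposals, then merge risk (max) and protections (union)."""
    accepted = [p for p in proposals if not hard_violations(p)]
    if not accepted:
        raise ValueError("all proposals were rejected by hard constraints")
    return {
        "accepted_models": [p["model"] for p in accepted],
        "clip_risk": max(p["clip_risk"] for p in accepted),
        "do_not_damage": sorted(set().union(*(p["do_not_damage"] for p in accepted))),
    }


# Hypothetical proposals from the three roles.
proposals = [
    {"model": "engineer", "true_peak_dbtp": -1.0, "low_mono_correlation": 0.95,
     "drop_lufs": -8.0, "build_lufs": -11.0, "clip_risk": 0.2,
     "do_not_damage": ["kick_transient"]},
    {"model": "structure_guard", "true_peak_dbtp": -1.2, "low_mono_correlation": 0.97,
     "drop_lufs": -8.5, "build_lufs": -11.0, "clip_risk": 0.4,
     "do_not_damage": ["breakdown_space", "kick_transient"]},
    {"model": "form_analyst", "true_peak_dbtp": -0.3, "low_mono_correlation": 0.96,
     "drop_lufs": -7.5, "build_lufs": -11.0, "clip_risk": 0.3,
     "do_not_damage": ["pad_width"]},
]
result = merge_surviving(proposals)
print(result)
```

Taking the maximum for risk and the union for protected elements both err toward caution: no surviving model's warning or protection request is dropped.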
Merge rules
- numeric fields → weighted median (safer than mean)
- risk fields → max or upper-median
- do_not_damage → union (preserve all model protection requests)
- objective → prefer 2+ model agreement; if Claude flags flattening, defer to the suppression side
- minority opinions → do not discard; preserve in unresolved_tensions in the final output
Structured Consensus Minutes
| Consensus result only | 78 / 100 |
| Consensus result + structured minutes | 94 / 100 |
| Free-text minutes | 67 / 100 |
| model_positions (each model's stance) | 100 / 100 |
| hard_constraint_checks | 98 / 100 |
| discussion_log (dispute log) | 96 / 100 |
| unresolved_tensions (minority opinions preserved) | 95 / 100 |
| resolved_items | 92 / 100 |
| final_rationale | 90 / 100 |
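A structured minutes record with some of these fields could be assembled roughly as below. The sketch treats a topic as resolved when at least two models agree and keeps every dissenting stance in `unresolved_tensions`; it covers four of the six fields, and the function name, role labels, and example stances are hypothetical.

```python
import json
from collections import Counter


def build_minutes(model_positions, hard_constraint_checks):
    """Build a minutes dict from per-model stances keyed by topic."""
    topics = sorted({topic for stance in model_positions.values() for topic in stance})
    resolved, tensions = [], []
    for topic in topics:
        votes = {
            model: stance[topic]
            for model, stance in model_positions.items()
            if topic in stance
        }
        value, count = Counter(votes.values()).most_common(1)[0]
        agreed = count >= 2
        if agreed:
            resolved.append({"topic": topic, "value": value, "supporters": count})
        # Minority or split opinions are preserved, never discarded.
        for model, vote in votes.items():
            if not agreed or vote != value:
                tensions.append({"topic": topic, "model": model, "position": vote})
    return {
        "model_positions": model_positions,
        "hard_constraint_checks": hard_constraint_checks,
        "resolved_items": resolved,
        "unresolved_tensions": tensions,
    }


# Hypothetical stances on two topics.
positions = {
    "engineer": {"low_end_width_policy": "mono_below_120hz",
                 "drop_section_objective": "max_impact"},
    "structure_guard": {"low_end_width_policy": "mono_below_120hz",
                        "drop_section_objective": "preserve_contrast"},
    "form_analyst": {"low_end_width_policy": "mono_below_120hz",
                     "drop_section_objective": "max_impact"},
}
minutes = build_minutes(positions, {"true_peak": "pass"})
print(json.dumps(minutes, indent=2))
```

Because the record is plain JSON, it can be stored next to the final formplan and inspected later to see which model disagreed and on what.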
target_integrated_lufs, drop_section_objective, low_end_width_policy, etc.
Final Pipeline Flow
DSP Engine v2 — Engineering Evaluation (Current State)
| Architecture | 84 / 100 |
| Signal-Flow Design | 81 / 100 |
| Standards Compliance (BS.1770-4 LUFS) | 86 / 100 |
| Oversampled saturation concept | 82 / 100 |
| Multiband crossover accuracy | 44 / 100 |
| Stereo-linked limiting integrity | 49 / 100 |
| Final true-peak safety logic | 54 / 100 |
| Overall DSP engineering | 72 / 100 |
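Stereo linking generally means deriving one gain from both channels and applying it to both, so the left/right balance is not disturbed when only one side exceeds the ceiling. The sketch below shows that idea with an instantaneous gain and no attack, release, or lookahead; it is a simplification for illustration and not the engine's limiter.

```python
def linked_gains(left, right, ceiling=0.9):
    """One shared gain per sample frame, driven by the louder channel."""
    gains = []
    for l, r in zip(left, right):
        peak = max(abs(l), abs(r))
        gains.append(1.0 if peak <= ceiling else ceiling / peak)
    return gains


def limit_linked(left, right, ceiling=0.9):
    """Apply the same gain to both channels so the stereo image is preserved."""
    gains = linked_gains(left, right, ceiling)
    out_left = [l * g for l, g in zip(left, gains)]
    out_right = [r * g for r, g in zip(right, gains)]
    return out_left, out_right


# Hypothetical frames: only the left channel overshoots in the second frame.
left = [0.5, 1.2, 0.3]
right = [0.5, 0.6, 0.3]
out_left, out_right = limit_linked(left, right)
print(out_left, out_right)
```

In the second frame both channels are scaled by 0.75, so the 2:1 level ratio between left and right is kept. A per-channel limiter would reduce only the left side and shift the image toward the right.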
Priority DSP fixes
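1. Replace _split_4bands with a complementary crossover (expected improvement: +8 to +12)
2. Switch the limiter to stereo-linked operation (expected improvement: +6 to +9)
3. Replace the final sample-peak trim with an oversampled TP safety pass (expected improvement: +3 to +5)
4. Fix stage numbering and comments (expected improvement: +1 to +2)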
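The defining property of a complementary crossover is that the bands sum back to the input. The sketch below demonstrates that property with a subtractive split built on a one-pole lowpass. This is a deliberately simple stand-in, not a Linkwitz-Riley design: its slopes are shallow, and the cutoff frequencies and function names are hypothetical.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Simple one-pole lowpass filter (6 dB/octave)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def split_complementary(samples, cutoffs_hz, sample_rate):
    """Split into len(cutoffs_hz) + 1 bands whose sum reproduces the input.

    Cutoffs are assumed to be in ascending order. Each band is the lowpassed
    part of what remains after the lower bands have been subtracted.
    """
    bands, residual = [], list(samples)
    for cutoff in cutoffs_hz:
        low = one_pole_lowpass(residual, cutoff, sample_rate)
        bands.append(low)
        residual = [x - l for x, l in zip(residual, low)]
    bands.append(residual)  # everything above the highest cutoff
    return bands


# Hypothetical test signal: a low tone plus a high tone at 48 kHz.
sample_rate = 48000
signal = [
    math.sin(2 * math.pi * 60 * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 5000 * n / sample_rate)
    for n in range(2048)
]
bands = split_complementary(signal, [120.0, 1000.0, 6000.0], sample_rate)
rebuilt = [sum(frame) for frame in zip(*bands)]
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(len(bands), max_error)
```

Because each upper band is defined by subtraction, reconstruction holds to within floating-point rounding regardless of the filter used, which makes the same test useful as a regression check for whatever crossover replaces `_split_4bands`.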
Implementation priority (expected score improvement)
| Abolish single global params | 16 / 20 |
| Section-aware mastering | 14 / 20 |
| Convert AI output to target profile | 12 / 20 |
| Introduce control layer | 11 / 20 |
| Post-mastering auto re-analysis → re-optimization | 8 / 20 |
Next Steps — Implementation Roadmap
- 3-model MAGI consensus system
  GPT-5.4 / Claude Opus 4.6 / Gemini Pro 3.1. Fixed roles (Engineer / Structure Guard / Form Analyst) + per-field weighted integration.
- consensus_arbiter.py
  Rule-based merge: weighted median, risk max, do_not_damage union, contradiction detection, minutes generation.
- control_layer.py
  formplan targets → per-section DSP parameter conversion. Bridge between AI output and DSP engine.
- DSP engine: section-adaptive processing
  Abolish single global params → per-section params. The core of dynamic mastering.
- DSP fix: _split_4bands → complementary Linkwitz-Riley crossover
  Replace current band splitting with complementary crossover. Expected improvement: +8 to +12.
- DSP fix: TP Limiter → stereo-linked
  Switch to stereo-linked limiter. Expected improvement: +6 to +9.
- DSP fix: final safety pass → oversampled true peak
  Replace sample-peak trim with oversampled true peak safety pass. Expected improvement: +3 to +5.
- post_verification.py
  Auto re-analysis after mastering → diff against targets → report generation. Completing the feedback loop.
- DSP fix: TPDF dither naming correction
  Remove or rename HF shaping. Resolves naming vs. implementation mismatch.
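One roadmap item above replaces the sample-peak trim with an oversampled true-peak pass. The sketch below illustrates why that matters: a tone whose samples all sit near 0.707 can still peak near full scale between samples. It uses a Hann-windowed sinc interpolator at 4x, which is a common oversampling factor for true-peak estimation; the filter length, the −1.0 dBTP ceiling, and the function names are assumptions, and a production meter would use a proper polyphase filter.

```python
import math


def _sinc(t):
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)


def true_peak(samples, oversample=4, half_width=16):
    """Estimate the inter-sample peak with Hann-windowed sinc interpolation."""
    count = len(samples)
    peak = 0.0
    for i in range(count):
        first = max(0, i - half_width + 1)
        last = min(count, i + half_width + 1)
        for k in range(oversample):
            t = i + k / oversample  # fractional sample position
            value = 0.0
            for m in range(first, last):
                d = t - m
                window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
                value += samples[m] * _sinc(d) * window
            peak = max(peak, abs(value))
    return peak


def trim_db(peak, ceiling_dbtp=-1.0):
    """Gain in dB (never positive) that brings a linear peak under the ceiling."""
    if peak <= 0.0:
        return 0.0
    return min(0.0, ceiling_dbtp - 20.0 * math.log10(peak))


# Tone at a quarter of the sample rate, phased so samples miss the crests.
tone = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(64)]
sample_peak = max(abs(x) for x in tone)
estimated_tp = true_peak(tone)
print(sample_peak, estimated_tp, trim_db(sample_peak), trim_db(estimated_tp))
```

For this tone a sample-peak check reports about −3 dBFS and requests no trim, while the oversampled estimate sits near or slightly above full scale and does request one. The abrupt start and end of the short test tone add a little overshoot of their own.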