audio_analysis_circuit.py v2.3
12 Traditions Encoded in the Code
Yomibito Shirazu — nobody knows who said them first.
But only those who survived the field know these frequency hacks.
This is a record of the thinking behind audio_analysis_circuit.py v2.3. K-Weighting, True Peak, LRA, Spectral Centroid — international standards and field-derived rules of thumb have been converted into numbers and algorithms. Before the equations, you must understand why those specific numbers exist. These 12 traditions are that answer.
K-Weighting Exposes the Lie of the Human Ear
f0 = 1500.0
gain = 4.0
Q = 0.7071
Human hearing does not perceive all frequencies equally. We are oversensitive to the 1.5–6kHz range and perceive low frequencies as quieter than they physically are. The BS.1770-4 K-Weighting filter burns this human lie into numbers in two stages: a high-shelf that boosts content above roughly 1.5kHz by about +4dB, followed by a high-pass that cuts below roughly 38Hz. This two-stage chain is the internationally standardized way (ITU-R BS.1770, the basis of EBU R128) to convert the perceived loudness of program material into a physical quantity.
Core of the tradition
Measuring loudness with RMS is amateur. Without K-Weighting, the measured value diverges from what humans actually hear. This code abandoned v1's rms_db and moved to K-Weighted LUFS in direct compliance with this tradition.
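A minimal sketch of the two K-Weighting stages, assuming biquads designed with the RBJ Audio EQ Cookbook formulas. The shelf uses the f0, gain and Q values above; the high-pass Q of 0.5 is an assumption, and the helper names are hypothetical. The result is ungated loudness (LUFS = -0.691 + 10·log10 of the summed per-channel mean squares), not the gated integrated loudness that BS.1770-4 defines.

```python
import numpy as np
from scipy.signal import lfilter


def rbj_high_shelf(f0, gain_db, q, fs):
    """Biquad high-shelf coefficients (RBJ Audio EQ Cookbook), normalized so a[0] == 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    cos_w0 = np.cos(w0)
    sqrt_amp = np.sqrt(amp)
    b = np.array([
        amp * ((amp + 1) + (amp - 1) * cos_w0 + 2 * sqrt_amp * alpha),
        -2 * amp * ((amp - 1) + (amp + 1) * cos_w0),
        amp * ((amp + 1) + (amp - 1) * cos_w0 - 2 * sqrt_amp * alpha),
    ])
    a = np.array([
        (amp + 1) - (amp - 1) * cos_w0 + 2 * sqrt_amp * alpha,
        2 * ((amp - 1) - (amp + 1) * cos_w0),
        (amp + 1) - (amp - 1) * cos_w0 - 2 * sqrt_amp * alpha,
    ])
    return b / a[0], a / a[0]


def rbj_high_pass(f0, q, fs):
    """Biquad high-pass coefficients (RBJ Audio EQ Cookbook), normalized so a[0] == 1."""
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    cos_w0 = np.cos(w0)
    b = np.array([(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2])
    a = np.array([1 + alpha, -2 * cos_w0, 1 - alpha])
    return b / a[0], a / a[0]


def k_weighted_loudness(channels, fs):
    """Ungated K-weighted loudness (LUFS) of an array shaped (n_channels, n_samples)."""
    shelf_b, shelf_a = rbj_high_shelf(1500.0, 4.0, 0.7071, fs)  # stage 1: head/ear shelf
    hp_b, hp_a = rbj_high_pass(38.0, 0.5, fs)                   # stage 2: low-cut (Q assumed)
    shelved = lfilter(shelf_b, shelf_a, channels, axis=-1)
    weighted = lfilter(hp_b, hp_a, shelved, axis=-1)
    # Channel weights G_i = 1.0 (as for L/R), so just sum the per-channel mean squares
    power = np.sum(np.mean(weighted ** 2, axis=-1))
    return -0.691 + 10.0 * np.log10(power)
```

For a full-scale 997Hz sine in one channel, BS.1770 specifies a reading of -3.01 LKFS; this approximated filter lands close to that, while a 20Hz tone at the same level reads far lower because of the high-pass stage.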
Stereo Below 120Hz Is Criminal
sos_low_pass = butter(4, 120.0, btype='lowpass', fs=sample_rate, output='sos')
Club systems commonly sum the sub-bass to mono. When L/R phase diverges below 120Hz, that summing cancels the out-of-phase content: kick and bass lose their energy the moment mono-summing occurs, and the low end vanishes from the floor. In vinyl cutting, out-of-phase low-frequency content turns into large vertical stylus motion, a physical destructive force that can throw the needle out of the groove.
Core of the tradition
Red alert when mono correlation below 120Hz drops under 0.7. This code includes low_mono_correlation as one of the 8-dimensional envelope components and detects it as phase_cancellation_lows in detected_problems — an absolute rule transmitted from club floors and vinyl studios.
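A sketch of how a low-band correlation check could work, reusing the butter(4, 120.0, ...) filter shown above. The function name low_band_correlation and the choice of zero-lag normalized correlation (what a phase-correlation meter displays) are assumptions; the excerpt does not show how low_mono_correlation is actually computed.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def low_band_correlation(left, right, sample_rate, cutoff_hz=120.0):
    """Zero-lag normalized correlation of L and R below cutoff_hz (hypothetical helper)."""
    sos = butter(4, cutoff_hz, btype='lowpass', fs=sample_rate, output='sos')
    # Zero-phase filtering keeps the L/R phase relationship intact
    low_l = sosfiltfilt(sos, left)
    low_r = sosfiltfilt(sos, right)
    denom = np.sqrt(np.sum(low_l ** 2) * np.sum(low_r ** 2))
    if denom == 0.0:
        return 1.0  # nothing in the low band: treat as mono-safe
    return float(np.sum(low_l * low_r) / denom)
```

Identical channels read 1.0, a polarity-inverted channel reads -1.0, and anything under 0.7 would trip the red alert.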
200–500Hz Is a Mud Nest
mud_risk = np.clip((ratios.get("low_mid_ratio", 0.0) - 0.15) / 0.15, 0.0, 1.0)
200–500Hz is where instrument fundamentals cluster. Mix without intention and a "murky" buildup is guaranteed. When this band's energy ratio exceeds 15% of total, mud begins. Above 30%, it is fatal.
Core of the tradition
The first thing a mastering engineer does is clean the low-mids; there is no mastering that leaves this band untouched. The 0.15 threshold (15%) is a boundary established empirically in the field: above it, there is a problem.
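A minimal sketch of the band ratio and the mud_risk mapping, assuming the ratio is the share of total FFT power that falls in 200–500Hz. The excerpt only shows the ratios.get lookup, so band_energy_ratio is a hypothetical helper.

```python
import numpy as np


def band_energy_ratio(signal, sample_rate, low_hz, high_hz):
    """Share of total spectral power in [low_hz, high_hz) (hypothetical helper)."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs < high_hz)
    total = np.sum(power)
    return float(np.sum(power[mask]) / total) if total > 0 else 0.0


def mud_risk(low_mid_ratio):
    # 0.0 at or below 15% of total energy, rising linearly to 1.0 at 30%
    return float(np.clip((low_mid_ratio - 0.15) / 0.15, 0.0, 1.0))
```

A ratio of 0.225 sits halfway between the two thresholds and maps to a risk of 0.5.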
2–6kHz Harshness Kills Ears
harshness_mask = (spectrum.freqs >= 2000) & (spectrum.freqs < 6000)
harshness_risk = np.clip((np.sum(spectrum.mono_power[harshness_mask]) / total_energy) * 3.0, 0.0, 1.0)
The resonant frequency of the human ear canal is approximately 2.5–4kHz. Energy concentration here physically amplifies pressure on the eardrum — listeners feel "pain" or "ringing" and unconsciously lower the volume. You can win the loudness war and lose the listening time to harshness.
Core of the tradition
Raising loudness and controlling harshness are separate techniques. The 3.0 multiplier is designed so that when this band reaches one third (about 33%) of total energy, risk hits its maximum (1.0): the field rule "once it crosses a third, ears escape."
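The harshness lines above, wrapped into a self-contained function for illustration. It assumes mono_power is the power spectrum of a mono mixdown taken with a single FFT over the whole signal; the function name is hypothetical.

```python
import numpy as np


def harshness_risk(signal, sample_rate):
    """Risk in [0, 1] from the 2-6 kHz share of total power; saturates at one third."""
    mono_power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total_energy = np.sum(mono_power)
    if total_energy == 0.0:
        return 0.0
    harshness_mask = (freqs >= 2000) & (freqs < 6000)
    share = np.sum(mono_power[harshness_mask]) / total_energy
    # share * 3.0 reaches 1.0 exactly when the band holds one third of the energy
    return float(np.clip(share * 3.0, 0.0, 1.0))
```

A 3kHz tone mixed with a 100Hz tone carrying five times its power puts one sixth of the energy in the band, so the risk lands at 0.5.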
Crest Factor Is the Breath of Music
crest_envelope = 20.0 * np.log10(np.maximum(peak_values, LOG_FLOOR)) - 20.0 * np.log10(np.maximum(rms_values, LOG_FLOOR))
Crest factor is the ratio of peak to RMS; in dB, it is the difference between the peak level and the RMS level. When it drops below 6dB, the material is flagged as over-compressed. Musical dynamics are breath: when the difference between inhale (quiet moments) and exhale (peaks) disappears, music suffocates.
Core of the tradition
Crest factor 6dB is the loudness war threshold. Masters that fall below it have waveforms approaching a square wave (whose crest factor is 0dB). Such flattened waveforms have almost no amplitude variation, so through auditory adaptation, listeners begin perceiving them as "quieter" within seconds. This is the physical mechanism by which gaining loudness achieves the opposite.
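A per-chunk version of crest_envelope, flooring both peak and RMS so that silent chunks stay finite. LOG_FLOOR = 1e-10 and the non-overlapping chunking are assumptions for illustration; the real constant and chunking live elsewhere in the code.

```python
import numpy as np

LOG_FLOOR = 1e-10  # assumed value for this sketch


def crest_factor_envelope(signal, chunk_size):
    """Per-chunk crest factor in dB: peak level minus RMS level."""
    n_chunks = len(signal) // chunk_size
    chunks = signal[: n_chunks * chunk_size].reshape(n_chunks, chunk_size)
    peak_values = np.max(np.abs(chunks), axis=1)
    rms_values = np.sqrt(np.mean(chunks ** 2, axis=1))
    return (20.0 * np.log10(np.maximum(peak_values, LOG_FLOOR))
            - 20.0 * np.log10(np.maximum(rms_values, LOG_FLOOR)))
```

A pure sine reads about 3.01dB and a square wave 0dB, so the 6dB threshold is meant for full-band music, not test tones.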
LRA Double-Gating Is the Technique of Ignoring Silence
absolute_threshold = -70.0
relative_threshold = ... - 20.0
Accurate LRA (Loudness Range) measurement requires two gates. First, exclude near-silent blocks below -70 LUFS (absolute gate). Then compute the average loudness of the remaining blocks and use that value minus 20 LU as the relative gate. LRA is the spread between the 10th and 95th percentiles of the short-term loudness values that survive both gates. The gating prevents extremely quiet passages (reverb tails, fade-outs) from unfairly expanding LRA.
Core of the tradition
LRA without gating lies. Tracks with silent segments show wider LRA than reality and are misclassified as "dynamically rich." This is precisely why double-gating is mandated in broadcast standards (EBU R128, with LRA defined in EBU Tech 3342).
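A sketch of double-gated LRA over short-term loudness values, following the steps above. It assumes the short-term (3-second window) LUFS values have already been measured, averages them in the energy domain for the relative gate, and uses numpy's default linear-interpolation percentile, which may differ slightly from the EBU Tech 3342 reference method. The function name is hypothetical.

```python
import numpy as np


def loudness_range(short_term_lufs, absolute_threshold=-70.0, relative_offset=-20.0):
    """LRA in LU from short-term loudness values, with absolute and relative gating."""
    values = np.asarray(short_term_lufs, dtype=float)
    # Absolute gate: drop near-silent blocks
    values = values[values > absolute_threshold]
    if values.size == 0:
        return 0.0
    # Relative gate: 20 LU below the energy-domain mean of the surviving blocks
    mean_energy = np.mean(10.0 ** (values / 10.0))
    relative_threshold = 10.0 * np.log10(mean_energy) + relative_offset
    values = values[values > relative_threshold]
    # Spread between the 10th and 95th percentiles
    return float(np.percentile(values, 95) - np.percentile(values, 10))
```

Ten blocks at -20 LUFS plus ten at -45 LUFS read 0 LU, because the -45 blocks fall below the relative gate; without gating, the same values would spread across 25 LU.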
Spectral Centroid Is the Blade of Transients
weighted_frequencies = np.sum(power_chunks * frequencies, axis=1)
hfc_envelope = weighted_frequencies / total_energy_per_chunk / (chunk_size // 2)
The higher the spectral centroid (the power-weighted average of all frequencies), the "sharper" the sound at that moment. Snare attacks, open hi-hats, and synth plucks drive the centroid up. Low centroid regions are pads, sub-bass, and reverb tails.
Core of the tradition
Tracking transient "sharpness" across time shows where impact exists in a track. When a mastering limiter crushes transients, this value flattens on the time axis. That makes it the metric this code relies on to monitor numerically that transients are not being flattened (rated 98/100 for importance in the spec document).
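A sketch of a spectral centroid envelope in Hz over non-overlapping frames. It returns absolute frequencies; the excerpt's extra division by chunk_size // 2 is a normalization whose meaning depends on how its frequencies array is defined, which is not shown, so it is left out here. Function and parameter names are hypothetical.

```python
import numpy as np


def spectral_centroid_envelope(signal, sample_rate, frame_size):
    """Power-weighted mean frequency (Hz) of each non-overlapping frame."""
    n_frames = len(signal) // frame_size
    frames = signal[: n_frames * frame_size].reshape(n_frames, frame_size)
    power_chunks = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    frequencies = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)
    weighted_frequencies = np.sum(power_chunks * frequencies, axis=1)
    total_energy_per_chunk = np.sum(power_chunks, axis=1)
    # Silent frames get a centroid of 0 Hz instead of a division by zero
    safe_total = np.where(total_energy_per_chunk > 0, total_energy_per_chunk, 1.0)
    return np.where(total_energy_per_chunk > 0, weighted_frequencies / safe_total, 0.0)
```

Adding an equal-power 5kHz component to a 1kHz tone moves the centroid from 1000Hz to 3000Hz, the kind of jump a hi-hat entrance produces.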
True Peak Is Invisible Without 4x Oversampling
left_oversampled = resample_poly(left_chunk, 4, 1)
Between digital audio samples, the continuous waveform that a DAC reconstructs can peak higher than any sample value. At 44.1kHz, these analog peaks hide between the recorded sample points; they are called Inter-Sample Peaks. BS.1770-4's true-peak method oversamples by 4x (at 48kHz) to make them visible.
Core of the tradition
Even a sample peak at -1.0 dBFS can hide an Inter-Sample Peak above 0 dBTP. The moment a DAC reconstructs this peak, clipping occurs. Distribution platforms such as Spotify and Apple Music recommend keeping True Peak below -1.0 dBTP, and peak measurement without oversampling does not meet the BS.1770 true-peak method.
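A minimal true-peak estimate using resample_poly as in the line above. scipy's default Kaiser-windowed interpolation filter stands in for the filter BS.1770-4 specifies, so treat the result as an approximation; the function names are hypothetical.

```python
import numpy as np
from scipy.signal import resample_poly


def sample_peak_dbfs(channels):
    """Highest absolute sample value in dBFS."""
    return 20.0 * np.log10(max(float(np.max(np.abs(channels))), 1e-10))


def true_peak_dbtp(channels):
    """Approximate true peak (dBTP) of an (n_channels, n_samples) array via 4x oversampling."""
    oversampled = resample_poly(channels, 4, 1, axis=-1)
    return 20.0 * np.log10(max(float(np.max(np.abs(oversampled))), 1e-10))
```

A quarter-sample-rate sine sampled at ±45° phase shows the gap clearly: every sample sits at about -3.01 dBFS while the reconstructed waveform reaches about 0 dBTP.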
Side Signal RMS Defines Stereo Width
width_envelope = np.clip(side_rms_values / rms_values, 0.0, 1.0)
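Mid = (L+R)/2, Side = (L-R)/2. The ratio of Side RMS to Mid RMS is the physical definition of stereo width. Side/Mid at 0 is mono; at 1, side energy equals mid energy, as with fully uncorrelated L/R. Anything wider, such as phase-inverted content where side exceeds mid, is clipped to 1.0.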
Core of the tradition
The era of describing stereo width as "wide" or "narrow" by feel is over. Side/Mid ratio quantifies it — spatial changes per section can be tracked as time-series envelopes. Widening at the drop and narrowing at the build is EDM mastering orthodoxy, and this number is its physical basis.
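A sketch of the width envelope using the Mid/Side definitions above, dividing Side RMS by Mid RMS per chunk. The chunking, the function name, and the small floor that keeps a silent mid channel from dividing by zero are assumptions; the excerpt divides by rms_values, whose definition is not shown.

```python
import numpy as np


def width_envelope(left, right, chunk_size, floor=1e-12):
    """Per-chunk Side RMS / Mid RMS, clipped to [0, 1]."""
    n_chunks = len(left) // chunk_size
    left_chunks = left[: n_chunks * chunk_size].reshape(n_chunks, chunk_size)
    right_chunks = right[: n_chunks * chunk_size].reshape(n_chunks, chunk_size)
    mid = (left_chunks + right_chunks) / 2.0
    side = (left_chunks - right_chunks) / 2.0
    mid_rms = np.sqrt(np.mean(mid ** 2, axis=1))
    side_rms = np.sqrt(np.mean(side ** 2, axis=1))
    return np.clip(side_rms / np.maximum(mid_rms, floor), 0.0, 1.0)
```

Identical channels give 0.0, independent noise in each channel gives about 1.0, and a polarity-inverted copy is clipped to 1.0.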
1–5kHz Is the Vocal Throne
vocal_energy = np.sum(power_chunks[:, (frequencies >= 1000) & (frequencies < 5000)], axis=1)
The fundamental frequency of the human voice is roughly 80–400Hz, but the "presence" of a voice is determined by its harmonic structure, concentrated at 1–5kHz. Sections with high energy ratios in this band likely contain vocals or vocal-like lead melody.
Core of the tradition
Mastering constraints differ fundamentally between vocal and instrumental sections. Heavy limiting in a vocal section sacrifices vocal clarity first. The vocal_presence envelope is the physical basis for the Control Layer's judgment of "ease the limiter in this section."
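A sketch of a vocal_presence-style envelope built around the vocal_energy line above: the 1–5kHz share of each chunk's power. Normalizing by total chunk energy and the function name are assumptions; the excerpt shows only the band sum.

```python
import numpy as np


def vocal_presence_envelope(signal, sample_rate, chunk_size):
    """Per-chunk share of power in the 1-5 kHz band (0.0 for silent chunks)."""
    n_chunks = len(signal) // chunk_size
    chunks = signal[: n_chunks * chunk_size].reshape(n_chunks, chunk_size)
    power_chunks = np.abs(np.fft.rfft(chunks, axis=1)) ** 2
    frequencies = np.fft.rfftfreq(chunk_size, d=1.0 / sample_rate)
    vocal_energy = np.sum(power_chunks[:, (frequencies >= 1000) & (frequencies < 5000)], axis=1)
    total_energy = np.sum(power_chunks, axis=1)
    return np.divide(vocal_energy, total_energy,
                     out=np.zeros_like(total_energy), where=total_energy > 0)
```

A section that switches from a 100Hz bass line to a 2kHz lead jumps from near 0 to near 1, marking where the limiter should ease off.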
Split Into 6 Bands. 4 Is Not Enough.
BAND_EDGES = {
"sub": (20, 60),
"bass": (60, 200),
"low_mid": (200, 500),
"mid": (500, 2000),
"high": (2000, 8000),
"air": (8000, None)
}
Traditional 3-band (Low/Mid/High) or 4-band splits cannot locate problems precisely. Without separating sub and bass, the coexistence of kick and bass is invisible. Without a dedicated high band (2–8kHz) between mid and air, the location of harshness is unknown. Air above 8kHz is "airiness": crush it and the sound dies.
Core of the tradition
The 6-band sub/bass/low_mid/mid/high/air split originates from the crossover settings of multiband compressors in mastering studios. Each band has its own problems and its own control methods. That is exactly why the 4-band version scored only 44/100 in the spec evaluation, and why this code uses the 6-band BAND_EDGES.
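A sketch that turns BAND_EDGES into per-band power shares. The "<band>_ratio" key naming mirrors the ratios.get("low_mid_ratio", 0.0) lookup in the mud-risk line, but how the original fills that dictionary is not shown, so band_ratios is hypothetical. Energy below 20Hz falls outside every band, so the ratios can sum to slightly less than 1.

```python
import numpy as np

BAND_EDGES = {
    "sub": (20, 60),
    "bass": (60, 200),
    "low_mid": (200, 500),
    "mid": (500, 2000),
    "high": (2000, 8000),
    "air": (8000, None),
}


def band_ratios(signal, sample_rate):
    """Share of total power per band, keyed like "low_mid_ratio"."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = np.sum(power)
    ratios = {}
    for name, (low, high) in BAND_EDGES.items():
        upper = freqs[-1] + 1.0 if high is None else high  # None: run up to Nyquist
        mask = (freqs >= low) & (freqs < upper)
        ratios[f"{name}_ratio"] = float(np.sum(power[mask]) / total) if total > 0 else 0.0
    return ratios
```

One equal-level tone per band (40, 100, 300, 1000, 4000 and 12000Hz) gives each band a share of one sixth.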
Chunk Size of 5 Seconds Is a Peace Treaty with OOM
chunk_size = 44100 * 5 # 5 seconds
4x oversampling quadruples memory consumption. Processing a 10-minute track in one pass needs 44100 × 600s × 4 × 8 bytes × 2ch ≈ 1.7GB for the oversampled float64 buffers alone, before the intermediate copies that filtering creates: a recipe for instant REST API container death. Splitting into 5-second chunks and oversampling only chunks whose peaks exceed 50% of the track's overall peak reduces required memory to hundreds of MB.
Core of the tradition
Balancing precision and survival. Oversampling every sample is purism that kills in production. The 50% peak threshold is a statistical judgment: "the probability that True Peak maximum is hiding below this threshold is extremely low."
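A sketch of the chunk-and-select strategy: compute the global sample peak, skip chunks whose own sample peak stays under 50% of it, and oversample the rest by 4x. Reading "50% of the total" as 50% of the track's maximum sample peak is an assumption, as is the function name chunked_true_peak.

```python
import numpy as np
from scipy.signal import resample_poly


def chunked_true_peak(channels, chunk_size, threshold_ratio=0.5):
    """Oversample only chunks whose sample peak reaches threshold_ratio of the global peak.

    channels has shape (n_channels, n_samples).
    Returns (true_peak_linear, n_oversampled_chunks).
    """
    global_peak = float(np.max(np.abs(channels)))
    if global_peak == 0.0:
        return 0.0, 0
    best = global_peak  # the reconstructed waveform never peaks below the sample peak
    n_oversampled = 0
    for start in range(0, channels.shape[-1], chunk_size):
        chunk = channels[:, start:start + chunk_size]
        if np.max(np.abs(chunk)) < threshold_ratio * global_peak:
            continue  # assumed unable to hide the track's true peak
        n_oversampled += 1
        oversampled = resample_poly(chunk, 4, 1, axis=-1)
        best = max(best, float(np.max(np.abs(oversampled))))
    return best, n_oversampled
```

Each chunk is resampled in isolation, so the interpolation filter sees zeros at chunk edges; overlapping chunks by the filter length would remove that boundary effect.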
These are the 12 traditions encoded in audio_analysis_circuit.py v2.3. Nobody knows who said them first. But they are frequency hacks known only to those who survived the field, and this code has converted them into numbers and algorithms for permanent preservation. The philosophy of Yomibito Shirazu lives inside the code.