What the Shutter Saw When the Choir Went Flat
Twenty-eight voices launched into an F major triad, and the baton went insane.
Not a gentle flicker but a full strobe, cycling through green-amber-blue-green at roughly six hertz. The trail on the camera’s preview looked like a seismograph during an earthquake. I’d built the baton weeks earlier (see Pitch-Tracking Conductor Baton), and it worked beautifully for live feedback during rehearsal. Photography broke everything.
This post is about why, and how to fix it.
The Vibrato Problem
A trained singer’s vibrato oscillates ±30 cents at around six hertz. That wobble makes a voice sound warm instead of robotic. But the pitch detector doesn’t distinguish between drift (the section going flat over four bars) and wobble (a single singer’s natural vibrato). It just sees frequency deviation and screams about it in colour.
Here’s the math. A cent is one hundredth of an equal-tempered semitone, a unit Alexander Ellis introduced in 1885 so musicologists could argue about tuning with decimal precision. Measured from concert A, with f the detected frequency in hertz:
cents = 1200 × log₂(f / 440)
Zero cents is concert A. A semitone is 100 cents. And vibrato swings 30 cents in each direction, six times per second. Each cycle lasts about 170ms, a tiny fraction of any exposure I’d use for lightpainting.
The original baton used a five-sample averaging window over the pitch estimates — maybe 80ms of smoothing. Good enough to suppress jitter, but far too responsive for an eight-second exposure. Every vibrato cycle registered as a sharp-flat-sharp-flat oscillation. The trail became noise.
The Fix: A Longer Window
For photography, I needed to track trend rather than instant. A 500ms averaging window spans about three full vibrato cycles, so the sharp and flat halves largely cancel while slower drift survives. Here’s the implementation on the ESP32:
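#define WINDOW_SIZE 32  // ~500ms at 60Hz sample rate
float pitchBuffer[WINDOW_SIZE];  // ring buffer of recent cents readings (globals start zeroed)
int bufferIndex = 0;             // next slot to overwrite
// Store the newest reading and return the mean of the last WINDOW_SIZE readings.
float smoothedCents(float rawCents) {
    pitchBuffer[bufferIndex] = rawCents;
    bufferIndex = (bufferIndex + 1) % WINDOW_SIZE;
    float sum = 0;
    for (int i = 0; i < WINDOW_SIZE; i++) {
        sum += pitchBuffer[i];
    }
    return sum / WINDOW_SIZE;
}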
This introduces latency — about a quarter-second lag between pitch change and LED response. Unusable for live conducting feedback, but perfect for photography. The colour shifts now track the choir’s drift over a phrase, not the individual wobble within each note.
Polyphony and the Centroid Cheat
Basic pitch detection assumes monophony — one dominant pitch. A four-part choir is polyphonic by definition. Most algorithms fail or produce octave errors (jumping up or down an octave when harmonics confuse the autocorrelation).
Rather than fight this, I track the spectral centroid instead: the weighted average frequency of all the sound energy. When the basses carry the melody, the centroid drops; when the sopranos bloom, it rises. I’m not detecting pitch — I’m detecting where the timbral centre of mass sits.
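// Spectral centroid: magnitude-weighted mean frequency of the FFT bins.
float centroid = 0;
float totalMagnitude = 0;
for (int i = 0; i < FFT_SIZE / 2; i++) {
    // Cast first so integer constants don't truncate the bin frequency.
    float freq = (float)i * SAMPLE_RATE / FFT_SIZE;
    float mag = magnitudes[i];
    centroid += freq * mag;
    totalMagnitude += mag;
}
// Guard against silence, where every bin is zero.
if (totalMagnitude > 0) {
    centroid /= totalMagnitude;
}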
This gives a single scalar per frame, which maps cleanly to a colour gradient. Green when the choir sits around 300-400 Hz (typical mixed-voice blend), amber when it rises, blue when the basses dominate. Less precise than true pitch detection, but more robust.
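One way the centroid-to-colour step might look, as a sketch only: the band edges come from the 300-400 Hz range mentioned above, but the `TrailColour` enum, the `colourForCentroid` name, and the hard three-way switch are all assumptions. A true gradient would interpolate between colours instead of jumping.

```c
/* Hypothetical colour bands for the LED trail. */
typedef enum { TRAIL_BLUE, TRAIL_GREEN, TRAIL_AMBER } TrailColour;

/* Band edges taken from the 300-400 Hz "blend" range in the text. */
#define BLEND_LOW_HZ  300.0f
#define BLEND_HIGH_HZ 400.0f

/* Pick a band for one frame's spectral centroid (in hertz). */
TrailColour colourForCentroid(float centroidHz) {
    if (centroidHz < BLEND_LOW_HZ) {
        return TRAIL_BLUE;   /* basses dominate */
    }
    if (centroidHz > BLEND_HIGH_HZ) {
        return TRAIL_AMBER;  /* centroid has risen */
    }
    return TRAIL_GREEN;      /* typical mixed-voice blend */
}
```

Feeding this the smoothed centroid rather than the raw per-frame value would keep the trail from flickering at the band edges.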
Camera Settings
When I was shooting RC Light-Trace Calligraphy, I learned that LED brightness needs to be dialled way down — maybe 35% PWM — or the trail becomes a fat, overexposed smear. A thin line reads better than a bright blob.
For the choir work:
- Exposure: 8 seconds (long enough to capture a full phrase, short enough to avoid drift blur)
- Aperture: f/8 (sufficient depth of field for baton motion across the conducting plane)
- ISO: 200 (low noise, since the LED provides all the light)
- LED brightness: 30-35% via PWM
- Room lighting: dim but not dark (the choir needs to read music)
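The brightness percentage has to become a duty value before it reaches the PWM peripheral. A minimal sketch of that conversion, assuming a timer resolution below 32 bits; `dutyForPercent` is a hypothetical helper and deliberately calls no ESP32 API, since the exact PWM setup calls depend on the framework version in use.

```c
#include <stdint.h>

/* Convert a brightness percentage to a PWM duty value for a timer with the
   given resolution (8 bits gives 0..255). Assumes resolutionBits < 32. */
uint32_t dutyForPercent(float percent, unsigned resolutionBits) {
    /* Clamp so out-of-range input cannot overflow the duty register. */
    if (percent < 0.0f) {
        percent = 0.0f;
    }
    if (percent > 100.0f) {
        percent = 100.0f;
    }
    uint32_t maxDuty = (1u << resolutionBits) - 1u;
    /* Add 0.5 before truncating to round to the nearest step. */
    return (uint32_t)(percent / 100.0f * (float)maxDuty + 0.5f);
}
```

At 8-bit resolution, 35% comes out to a duty of 89 out of 255.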
Green LEDs at 555nm sit near peak camera sensor sensitivity and leave the thickest trails per milliwatt. Red cuts through light pollution better but photographs thinner. Blue vanishes into any blue-tinted background.
One Uncomfortable Discovery
A good a cappella choir tunes to just intonation, not equal temperament. A pure major third is 386 cents; equal temperament puts it at 400 cents. When the choir locks into a barbershop-style pure chord, my tracker shows them 14 cents flat. The LED goes blue. The singers are doing exactly what they should — the algorithm is wrong.
I haven’t solved this. The baton assumes equal temperament because the cents formula is built on equal-tempered semitones. Detecting just intonation would require knowing the harmonic context, which is a much harder problem. For now, I treat the blue flicker during locked chords as a feature rather than a bug: it marks the moments when the ensemble is listening to each other instead of to the piano.
The camera doesn’t know the difference anyway. It just records whatever colours the baton throws. Sometimes the lie is more interesting than the truth.