Acoustic Camera — End-to-End Workflow & Algorithm

The system answers one hard question: which vehicle, in which lane, made that noise — with enough proof to issue an infringement. Sound alone gives a bearing, not an identity. The algorithm closes that gap by tying the acoustic bearing to the road surface, then to the ANPR camera that read the plate.

01 The pipeline at a glance

A single triggered event flows through six stages in order:

01 · SENSE · Capture: SoundCam mic array + Class-1 SLM + Axis ANPR, time-synced
02 · LOCALISE · Beamform: delay-and-sum sound map (where is the energy?)
03 · PROJECT · Road plane: intersect the acoustic ray with the road (IPM)
04 · ATTRIBUTE · Lane + vehicle: which lane, with an uncertainty ellipse + confidence
05 · IDENTIFY · ANPR: re-project to the Axis view → read the plate
06 · FUSE · Evidence: audio + video + metrics + ID → SenBOS package

The key idea: a SoundCam hotspot is a direction (a ray), not a 3-D point. Because a road vehicle radiates from a known surface — the road plane — we intersect that ray with the plane to recover a unique real-world position, then re-project it into the ANPR camera. That single geometric step turns "loud over there" into "that vehicle."

02 Stage by stage

Capture — time-synchronised sensors

A microphone array (SoundCam) images sound; a Class-1 sound-level meter (IEC 61672, e.g. ACOEM) provides the certified metrics; an Axis camera reads the plate. All three are triggered together, so one event has aligned audio, video, dB metrics and a number plate.

Beamform — the acoustic image

The array "steers" to every point on a grid: each microphone's signal is time-shifted to cancel its travel time from that point, and the channels are summed. Points near the true source add up in phase (bright); elsewhere they partly cancel (dark). For recorded data the same idea runs in the frequency domain via the cross-spectral matrix.

b(p, t) = Σ_m s_m( t + |p − r_m| / c ) → power(p) = ⟨ b(p, t)² ⟩_t

s_m = signal at mic m · r_m = mic position · p = focus point · c = speed of sound · ⟨·⟩_t = average over time
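A minimal time-domain sketch of the steering idea, using only the Python standard library. The line array, the tone-burst source and the scan grid are all hypothetical, chosen only so the example is self-contained; it is not the SoundCam processing chain. Each channel is advanced by its travel time from the focus point, so a wavefront emitted at that point lines up across all microphones before summation.

```python
import math

C = 343.0  # speed of sound in air, m/s (assumed, roughly 20 degrees C)

def source_signal(t):
    """Hypothetical test source: a Gaussian-windowed 2 kHz tone burst."""
    return math.exp(-(t / 0.002) ** 2) * math.sin(2 * math.pi * 2000.0 * t)

def make_mic_signal(mic, src):
    """Return s_m(t): the source as heard at `mic`, delayed by the travel time."""
    delay = math.dist(mic, src) / C
    return lambda t: source_signal(t - delay)

def das_power(focus, mics, signals, times):
    """Mean-square delay-and-sum output with the array steered to `focus`."""
    total = 0.0
    for t in times:
        # Advance each channel by its travel time from the focus point so a
        # wavefront emitted there is aligned across all microphones.
        b = sum(s(t + math.dist(focus, m) / C) for m, s in zip(mics, signals))
        total += b * b
    return total / len(times)

# Hypothetical 8-mic line array with a 1 m aperture along the x axis.
mics = [(-0.5 + i / 7.0, 0.0) for i in range(8)]
true_source = (2.0, 10.0)
signals = [make_mic_signal(m, true_source) for m in mics]
times = [i / 48000.0 for i in range(-480, 481)]  # 20 ms window

# Scan candidate focus points across the road at 10 m range.
grid = [(x * 0.5, 10.0) for x in range(-10, 11)]
powers = [das_power(p, mics, signals, times) for p in grid]
peak = grid[powers.index(max(powers))]
print(peak)
```

The brightest grid point is the one where every channel adds in phase, which here is the true source position. A production beamformer would work on sampled data with interpolation or in the frequency domain rather than on analytic signal functions.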

Why resolution matters: the main-lobe width scales with wavelength ÷ array aperture. Low-frequency noise (long wavelength) localises poorly — the hotspot spreads and can straddle two lanes. That blur is the multi-lane attribution problem, which the uncertainty ellipse in stage 04 quantifies.
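To put rough numbers on that scaling, the sketch below uses the rule of thumb that the main-lobe footprint at the road is about (wavelength ÷ aperture) × range. The aperture, range and lane width are hypothetical values, not SoundCam specifications, and the small-angle rule only holds while the wavelength is well below the aperture.

```python
C = 343.0  # speed of sound, m/s (assumed)

def spot_width_m(freq_hz, aperture_m, range_m):
    """Rule-of-thumb main-lobe footprint: (wavelength / aperture) * range."""
    wavelength = C / freq_hz
    return wavelength / aperture_m * range_m

# Hypothetical geometry: 1 m aperture, 10 m to the lane, 3.5 m lane width.
LANE_WIDTH_M = 3.5
for f in (500.0, 2000.0, 8000.0):
    w = spot_width_m(f, aperture_m=1.0, range_m=10.0)
    print(f"{f:6.0f} Hz: ~{w:.2f} m footprint, {w / LANE_WIDTH_M:.2f} lane widths")
```

Under these assumptions the footprint shrinks in proportion to frequency, which is why low-frequency events are the ones most likely to straddle a lane boundary.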

Project — onto the road plane (Inverse Perspective Mapping)

The bright pixel in the acoustic map defines a ray from the SoundCam. We intersect that ray with the road plane Z = 0 to get a unique ground point. Because the road is planar, the camera↔ground relationship is a single homography H (a 3×3 matrix), calibrated once from four surveyed road points, no three of which may be collinear.

[u v 1]^T ≃ H · [X Y 1]^T (ground ↔ image)
ground point: [X Y 1]^T ≃ H⁻¹ · [u v 1]^T

world frame: X = along-road, Y = across-road (lane offset), Z = up; road = plane Z = 0
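The two mappings above can be sketched in a few lines of plain Python. The matrix values are illustrative only, not a real calibration, and the helper names (`project`, `inverse_3x3`) are hypothetical.

```python
def mat_vec(H, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(H[i][j] * v[j] for j in range(3)) for i in range(3)]

def inverse_3x3(H):
    """Invert a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = H
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]

def project(H, point):
    """Apply a homography to a 2-D point, then divide out the scale."""
    x, y, w = mat_vec(H, [point[0], point[1], 1.0])
    return (x / w, y / w)

# Hypothetical ground-to-image homography (illustrative numbers only).
H = [[40.0, -12.0, 640.0],
     [2.0, -30.0, 700.0],
     [0.0, -0.02, 1.0]]

ground = (5.0, 3.5)                              # (X, Y) on the road, metres
pixel = project(H, ground)                       # ground -> image
recovered = project(inverse_3x3(H), pixel)       # image -> ground
print(pixel, recovered)
```

The division by the third homogeneous coordinate is what makes the mapping perspective-correct; skipping it would treat the road as if it were viewed from directly overhead.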

Attribute — which lane, how sure

The ground point (X, Y) is compared to the surveyed lane centres. The beamformer's finite resolution is carried through as a covariance, drawn as a −3 dB uncertainty ellipse on the road. If that ellipse sits inside one lane, the attribution is high-confidence; if it straddles a lane boundary, the event is flagged "attribution ambiguous" rather than assigned by guesswork.
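One way this check could look, as a sketch under stated assumptions: the function name, the lane layout and the scale factor `k` are hypothetical, and how the −3 dB contour maps to `k` is not specified in the text. The only geometry used is that an ellipse xᵀΣ⁻¹x = k² extends k·√Σ_YY either side of its centre along the across-road axis.

```python
import math

def attribute_lane(y, var_y, lane_centres, lane_width, k=1.0):
    """Pick the nearest lane; flag ambiguity if the ellipse crosses its edge.

    y          across-road position of the ground point (m)
    var_y      across-road variance, the YY entry of the covariance (m^2)
    k          ellipse scale; its relation to the -3 dB contour is assumed
    """
    lane = min(range(len(lane_centres)), key=lambda i: abs(y - lane_centres[i]))
    half_extent = k * math.sqrt(var_y)            # ellipse half-width along Y
    margin = lane_width / 2.0 - abs(y - lane_centres[lane])
    ambiguous = half_extent > margin              # ellipse spills past the edge
    return lane, ambiguous

# Hypothetical three-lane road with 3.5 m lanes.
centres = [1.75, 5.25, 8.75]
print(attribute_lane(5.0, 0.25, centres, 3.5))    # well inside the middle lane
print(attribute_lane(3.3, 0.25, centres, 3.5))    # close to a lane boundary
```

Note that this sketch also flags an ellipse that spills over the outer edge of the road, which may or may not be the desired behaviour.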

Identify — re-project into the ANPR camera

The same ground point is pushed through the Axis camera's homography to land a marker on the responsible vehicle, where ANPR reads the plate. One world point, two cameras, one identity.

Fuse — the evidence package

Audio, video, the Class-1 acoustic metrics, location, time trace, trigger details and the vehicle ID are bundled into one reviewable package and pushed to SenBOS, in vendor-neutral formats.

WAV / FLAC audio
MP4 / H.264 video
JPEG / PNG images
CSV / JSON metrics
LAeq · LAmax · LA1/10/90
plate + lane + confidence
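As an illustration of what a vendor-neutral bundle could carry, here is a hypothetical JSON manifest built with the standard library. Every field name and value is a placeholder invented for this sketch; it is not the SenBOS schema.

```python
import json

# Hypothetical manifest layout: field names and values are placeholders,
# not the SenBOS schema.
manifest = {
    "event_time_utc": "2024-01-01T00:00:00Z",
    "files": {
        "audio": "event.flac",
        "video": "event.mp4",
        "image": "event.jpg",
        "metrics": "event.csv",
    },
    # Certified levels would come from the Class-1 meter; None marks "unfilled".
    "metrics_dba": {"LAeq": None, "LAmax": None, "LA1": None, "LA10": None, "LA90": None},
    "attribution": {"plate": "PLACEHOLDER", "lane": 1, "confidence": "high"},
}

text = json.dumps(manifest, indent=2, sort_keys=True)
print(text)
```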

03 The fusion algorithm: why the marker always lands right

The insight

A hotspot is a bearing, not a point. Intersecting that ray with the known road plane (one homography) gives a unique ground point — Inverse Perspective Mapping.

Why calibration error is forgiving

The lobe is placed by back-projecting a pixel to the ground and then re-projecting it through the same homography. Because the two steps are exact inverses, the peak returns to the pixel it started from. Calibration error therefore only shears the lobe's perspective; it does not move the marker off the vehicle.
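The cancellation can be written out in one line. Here H̃ denotes the calibrated (possibly imperfect) estimate of the homography, a symbol introduced only for this illustration, and lifting 2-D points to homogeneous coordinates is implied.

```latex
\tilde{\mathbf{w}} = \pi\!\left(\tilde{H}^{-1}\,\mathbf{u}\right),
\qquad
\pi\!\left(\tilde{H}\,\tilde{\mathbf{w}}\right)
  = \pi\!\left(\tilde{H}\,\tilde{H}^{-1}\,\mathbf{u}\right)
  = \mathbf{u}
```

The identity holds for any invertible H̃, so it applies within a single view whose own homography is used in both directions. Points of the lobe away from the peak are placed on the ground around the estimated position, which is where an imperfect H̃ shows up as a change in the lobe's shape.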

Step by step

1. acoustic peak pixel u = (u, v)
2. back-project to road: w = (X, Y) = π( H⁻¹ u )
3. lane = argmin_k | Y − c_k | with −3 dB cov Σ → confidence / ambiguity
4. ANPR marker: u_axis = π( H_axis w )

π(·) = homogeneous → cartesian divide · c_k = lane-centre offsets · H from 4-point DLT calibration
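The four steps, together with the 4-point calibration, can be sketched end to end in plain Python. All coordinates below are hypothetical survey and pixel values. The solver fixes h₃₃ = 1, a common simplification of the DLT that assumes this entry is non-zero, and the image→ground matrix is obtained by calibrating in the reverse direction instead of inverting H.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def homography_from_4_points(src_pts, dst_pts):
    """Homography mapping src -> dst from four correspondences (h33 fixed to 1)."""
    A, b = [], []
    for (X, Y), (u, v) in zip(src_pts, dst_pts):
        A.append([X, Y, 1.0, 0.0, 0.0, 0.0, -u * X, -u * Y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, X, Y, 1.0, -v * X, -v * Y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def project(H, pt):
    """pi(H [x y 1]^T): apply the homography and divide out the scale."""
    x, y, w = (sum(H[i][j] * c for j, c in enumerate((pt[0], pt[1], 1.0)))
               for i in range(3))
    return (x / w, y / w)

# Hypothetical surveyed road points (m) and where each appears in the two views.
ground = [(0.0, 0.0), (20.0, 0.0), (20.0, 7.0), (0.0, 7.0)]
sound_px = [(100.0, 400.0), (540.0, 380.0), (500.0, 200.0), (160.0, 210.0)]
axis_px = [(200.0, 900.0), (1700.0, 880.0), (1300.0, 300.0), (600.0, 310.0)]

H_sound_inv = homography_from_4_points(sound_px, ground)  # acoustic image -> road
H_axis = homography_from_4_points(ground, axis_px)        # road -> Axis image

peak_px = (330.0, 300.0)                                  # 1. acoustic peak pixel
w = project(H_sound_inv, peak_px)                         # 2. back-project to road
lane_centres = [1.75, 5.25]                               # hypothetical two lanes
lane = min(range(len(lane_centres)),
           key=lambda k: abs(w[1] - lane_centres[k]))     # 3. nearest lane centre
u_axis = project(H_axis, w)                               # 4. ANPR marker pixel
print(w, lane, u_axis)
```

With more than four surveyed points, a least-squares DLT (typically via SVD) would average out survey noise; the exact four-point solve is used here only because it needs no external libraries.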

04 What this viewer demonstrates

The app is a teaching/MVP simulator of the geometry and physics above — not the production system. Synthetic scenes run a real beamformer (delay-and-sum and a 56-mic cross-spectral measurement); the live NHVR trailer-08 scenes warp the acoustic lobe onto real road photos through the calibrated homography and place a marker + lane attribution on each vehicle, including multi-vehicle "which lane?" cases.
