The system answers one hard question: which vehicle, in which lane, made that noise — with enough proof to issue an infringement. Sound alone gives a bearing, not an identity. The algorithm closes that gap by tying the acoustic bearing to the road surface, then to the ANPR camera that read the plate.
01 The pipeline at a glance
A single triggered event flows left → right: Capture → Beamform → Project → Attribute → Identify → Fuse.
The key idea: a SoundCam hotspot is a direction (a ray), not a 3-D point. Because a road vehicle radiates from a known surface — the road plane — we intersect that ray with the plane to recover a unique real-world position, then re-project it into the ANPR camera. That single geometric step turns "loud over there" into "that vehicle."
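The ray-to-plane step can be sketched in a few lines. This is a minimal illustration rather than the production code: the sensor height, the bearing convention (azimuth from +X toward +Y, depression below horizontal) and the function names are assumptions made for the example.

```python
import math

def bearing_to_ray(azimuth_deg, depression_deg):
    """Unit direction for a bearing. ASSUMED convention: azimuth is measured
    from +X toward +Y, depression is measured downward from horizontal."""
    az = math.radians(azimuth_deg)
    dep = math.radians(depression_deg)
    return (math.cos(dep) * math.cos(az),
            math.cos(dep) * math.sin(az),
            -math.sin(dep))

def intersect_road_plane(origin, direction):
    """Intersect the ray origin + t * direction (t > 0) with the plane z = 0.
    Returns the ground point (X, Y), or None if the ray never hits the road."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    if oz <= 0 or dz >= 0:
        return None  # sensor not above the road, or ray level / pointing up
    t = -oz / dz     # distance along the ray at which z reaches 0
    return (ox + t * dx, oy + t * dy)

sensor = (0.0, 0.0, 5.0)  # hypothetical mounting point, 5 m above the road
ray = bearing_to_ray(azimuth_deg=90.0, depression_deg=45.0)
print(intersect_road_plane(sensor, ray))
```

A bearing alone leaves the range along the ray unknown. Constraining the source to z = 0 fixes that range, which is why a direction becomes a position.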
02 Stage by stage
Capture — time-synchronised sensors
A microphone array (SoundCam) images sound; a Class-1 sound-level meter (IEC 61672, e.g. ACOEM) provides the certified metrics; an Axis camera reads the plate. All three are triggered together, so one event has aligned audio, video, dB metrics and a number plate.
Beamform — the acoustic image
The array "steers" to every point on a grid: each microphone's signal is time-shifted to compensate for its travel time from that point, and the channels are summed. Points near the true source add up in phase (bright); elsewhere they partly cancel (dark). For recorded data the same idea runs in the frequency domain via the cross-spectral matrix.
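A minimal narrowband sketch of delay-and-sum follows. It works on one frequency bin and applies the time shift as a phase rotation. It does not build a cross-spectral matrix, and the 8-microphone line array, the 4 kHz tone, the scan line and the speed of sound are all assumed values chosen for illustration.

```python
import cmath
import math

C = 343.0  # speed of sound in air, m/s (assumed)

def delay_and_sum(mics, phasors, grid, freq_hz):
    """Narrowband delay-and-sum: for each grid point, undo each microphone's
    propagation phase and sum the channels coherently."""
    k = 2.0 * math.pi * freq_hz / C  # wavenumber, rad/m
    power = []
    for g in grid:
        total = 0j
        for mic, p in zip(mics, phasors):
            r = math.dist(mic, g)                # path length mic <-> grid point
            total += p * cmath.exp(1j * k * r)   # phase-align this channel
        power.append(abs(total / len(mics)) ** 2)
    return power

# Hypothetical setup: 8-mic line array with 4 cm pitch, scanning a line 10 m away.
mics = [(0.04 * m, 0.0) for m in range(8)]
grid = [((i - 50) / 10.0, 10.0) for i in range(101)]  # x from -5 m to +5 m
source, freq = (2.0, 10.0), 4000.0

# Simulate what each microphone records from a single tone at `source`.
k = 2.0 * math.pi * freq / C
phasors = [cmath.exp(-1j * k * math.dist(m, source)) / math.dist(m, source)
           for m in mics]

power = delay_and_sum(mics, phasors, grid, freq)
peak_x = grid[power.index(max(power))][0]
print(f"brightest grid point at x = {peak_x} m")
```

Only at the true source position do all channels line up in phase, so the summed power peaks there. Everywhere else the residual phase differences reduce the sum.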
Why resolution matters: the main-lobe width scales with wavelength ÷ array aperture. Low-frequency noise (long wavelength) localises poorly: the hotspot spreads and can straddle two lanes. That blur is the multi-lane attribution problem, which the uncertainty ellipse in the Attribute stage quantifies.
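To put rough numbers on that scaling, the sketch below uses the small-angle rule of thumb that the lobe's footprint on the road is about range × wavelength ÷ aperture. The aperture, range and lane width are hypothetical values, not specifications of the real array, and the rule ignores constants that depend on array shape.

```python
C = 343.0  # speed of sound in air, m/s (assumed)

def lobe_footprint_m(freq_hz, aperture_m, range_m):
    """Approximate main-lobe width on the road: range * wavelength / aperture.
    A small-angle rule of thumb, not an exact beam-pattern calculation."""
    wavelength = C / freq_hz
    return range_m * wavelength / aperture_m

LANE_WIDTH_M = 3.5  # hypothetical lane width
APERTURE_M = 0.35   # hypothetical array diameter
RANGE_M = 10.0      # hypothetical distance from array to road

for f in (250.0, 1000.0, 4000.0):
    w = lobe_footprint_m(f, APERTURE_M, RANGE_M)
    verdict = "fits in one lane" if w < LANE_WIDTH_M else "can straddle lanes"
    print(f"{f:6.0f} Hz: lobe ~ {w:5.2f} m wide -> {verdict}")
```

Under these assumed numbers the 4 kHz lobe is about 2.45 m wide, narrower than a lane, while the 250 Hz lobe is many lanes wide.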
Project — onto the road plane (Inverse Perspective Mapping)
The bright pixel in the acoustic map defines a ray from the SoundCam. We intersect that ray with the road plane z = 0 to get a unique ground point. Because the road is planar, the camera↔ground relationship is a single homography H (a 3×3 matrix), calibrated once from four surveyed road points.
ground point: [X Y 1]ᵀ ≃ H⁻¹ · [u v 1]ᵀ
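The back-projection above can be sketched without any matrix library. Homogeneous coordinates are defined only up to scale, so the adjugate of H can stand in for H⁻¹. The matrix H below is a made-up ground-to-pixel homography for illustration, not a calibrated one, and the function names are hypothetical.

```python
def adjugate3(h):
    """Adjugate of a 3x3 matrix. It equals the inverse up to a scale factor,
    which cancels when the homogeneous result is normalised."""
    (a, b, c), (d, e, f), (p, q, r) = h
    return [
        [e * r - f * q, c * q - b * r, b * f - c * e],
        [f * p - d * r, a * r - c * p, c * d - a * f],
        [d * q - e * p, b * p - a * q, a * e - b * d],
    ]

def apply_homography(h, x, y):
    """Map (x, y) through a 3x3 homography and divide out the scale w."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (xh / w, yh / w)

def pixel_to_ground(h_ground_to_pixel, u, v):
    """Back-project pixel (u, v) to the road plane: [X Y 1]^T ~ H^-1 [u v 1]^T."""
    return apply_homography(adjugate3(h_ground_to_pixel), u, v)

# Hypothetical homography (ground metres -> image pixels), for illustration only.
H = [[100.0, 0.0, 320.0],
     [0.0, 50.0, 240.0],
     [0.0, 0.1, 1.0]]

u, v = apply_homography(H, 2.0, 10.0)  # where ground point (2, 10) appears
print((u, v), "->", pixel_to_ground(H, u, v))
```

Projecting a ground point forward and back-projecting the resulting pixel should return the original ground coordinates, which is a quick sanity check for any calibrated H.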
Attribute — which lane, how sure
The ground point (X, Y) is compared to the surveyed lane centres. The beamformer's finite resolution is carried through as a covariance, drawn as a −3 dB uncertainty ellipse on the road. If that ellipse sits inside one lane, the attribution is high-confidence; if it straddles a lane boundary, the event is flagged "attribution ambiguous" rather than guessed.
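One way to sketch that decision is below. It assumes Y is the across-road coordinate, that lanes are described by their centre positions and a common width, and that the lobe is roughly Gaussian so the −3 dB contour sits at √(2 ln 2) standard deviations. None of these details are confirmed by the description above.

```python
import math

# Scale from one standard deviation to the -3 dB contour,
# ASSUMING a Gaussian-shaped lobe (an assumption of this sketch).
HALF_POWER_SCALE = math.sqrt(2.0 * math.log(2.0))

def attribute_lane(ground_xy, cov, lane_centres, lane_width):
    """Pick the nearest lane centre, then check whether the across-road
    reach of the uncertainty ellipse stays inside that lane."""
    _, y = ground_xy
    k = min(range(len(lane_centres)), key=lambda i: abs(y - lane_centres[i]))
    # The ellipse's extent along Y depends only on the Y variance.
    half_extent = HALF_POWER_SCALE * math.sqrt(cov[1][1])
    lane_lo = lane_centres[k] - lane_width / 2.0
    lane_hi = lane_centres[k] + lane_width / 2.0
    inside = lane_lo <= y - half_extent and y + half_extent <= lane_hi
    return k, ("high-confidence" if inside else "attribution ambiguous")

# Hypothetical two-lane road, lane centres at 1.75 m and 5.25 m, 3.5 m wide.
lanes, width = [1.75, 5.25], 3.5
cov = [[1.0, 0.0], [0.0, 0.25]]  # hypothetical ground covariance, m^2

print(attribute_lane((12.0, 1.9), cov, lanes, width))
print(attribute_lane((12.0, 3.2), cov, lanes, width))
```

The second call lands near the lane boundary, so the ellipse crosses it and the sketch reports ambiguity instead of choosing a lane.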
Identify — re-project into the ANPR camera
The same ground point is pushed through the Axis camera's homography to land a marker on the responsible vehicle, where ANPR reads the plate. One world point, two cameras, one identity.
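A minimal sketch of that re-projection, plus one plausible way to associate the marker with a plate read: pick the detection whose bounding box contains the marker. The homography values, the detection format and the plate strings are all hypothetical, and the box-containment rule is an assumption rather than a documented part of the system.

```python
def project(h, x, y):
    """Map ground point (x, y) through a 3x3 homography to pixel coordinates."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (xh / w, yh / w)

def vehicle_at(marker, detections):
    """Return the plate of the first detection whose box contains the marker.
    Each detection is (plate, (left, top, right, bottom)) in pixels."""
    u, v = marker
    for plate, (left, top, right, bottom) in detections:
        if left <= u <= right and top <= v <= bottom:
            return plate
    return None  # marker fell on empty road

# Hypothetical Axis-camera homography (ground metres -> image pixels).
H_AXIS = [[80.0, 0.0, 400.0],
          [0.0, 40.0, 300.0],
          [0.0, 0.05, 1.0]]

# Hypothetical ANPR detections for the triggered frame.
detections = [("ABC123", (300, 400, 450, 520)),
              ("XYZ789", (600, 380, 760, 500))]

marker = project(H_AXIS, 2.0, 10.0)  # ground point from the acoustic side
print(marker, vehicle_at(marker, detections))
```

Because both cameras are calibrated against the same road plane, the ground point is the only quantity that has to pass between them.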
Fuse — the evidence package
Audio, video, the Class-1 acoustic metrics, location, time trace, trigger details and the vehicle ID are bundled into one reviewable package and pushed to SenBOS, in vendor-neutral formats.
03 The fusion algorithm: why the marker always lands right
A hotspot is a bearing, not a point. Intersecting that ray with the known road plane (one homography) gives a unique ground point — Inverse Perspective Mapping.
The lobe is placed by back-projecting a pixel to the ground and then re-projecting through the same homography. So calibration error only shears the lobe's perspective — it does not move the marker off the vehicle.
Step by step
2. back-project to road: w = (X, Y) = π(H⁻¹ u)
3. lane = argmin_k |Y − c_k|, with −3 dB covariance Σ → confidence / ambiguity
4. ANPR marker: u_axis = π(H_axis w)
04 What this viewer demonstrates
The app is a teaching/MVP simulator of the geometry and physics above — not the production system. Synthetic scenes run a real beamformer (delay-and-sum and a 56-mic cross-spectral measurement); the live NHVR trailer-08 scenes warp the acoustic lobe onto real road photos through the calibrated homography and place a marker + lane attribution on each vehicle, including multi-vehicle "which lane?" cases.