
End-to-End Workflow & Algorithm

How a roadside acoustic camera turns a noisy moment into legally defensible evidence attributed to one vehicle in one lane.

The system answers one hard question: which vehicle, in which lane, made that noise — with enough proof to issue an infringement. Sound alone gives a bearing, not an identity. The algorithm closes that gap by tying the acoustic bearing to the road surface, then to the ANPR camera that read the plate.

01 The pipeline at a glance

A single triggered event flows through six stages, in order:

01 · SENSE · Capture: SoundCam mic array + Class-1 SLM + Axis ANPR, time-synced
02 · LOCALISE · Beamform: delay-and-sum sound map (where is the energy?)
03 · PROJECT · Road plane: intersect the acoustic ray with the road (IPM)
04 · ATTRIBUTE · Lane + vehicle: which lane, with an uncertainty ellipse + confidence
05 · IDENTIFY · ANPR: re-project to the Axis view → read the plate
06 · FUSE · Evidence: audio + video + metrics + ID → SenBOS package

The key idea: a SoundCam hotspot is a direction (a ray), not a 3-D point. Because road-vehicle noise is modelled as radiating from a known surface, the road plane, we intersect that ray with the plane to recover a unique real-world position, then re-project it into the ANPR camera. That single geometric step turns "loud over there" into "that vehicle."
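The ray-plane intersection can be written out directly for a pinhole camera. A minimal sketch, assuming known intrinsics K, a world-to-camera rotation R and a camera centre C in the world frame used below (X along-road, Y across-road, Z up, road at Z = 0); `pixel_ray_to_ground` is a hypothetical name. For points on Z = 0 this is the same mapping the stage-03 homography encodes (H ≃ K [r1 r2 t] with t = −RC), so in practice H can stand in for K, R and C.

```python
import numpy as np

def pixel_ray_to_ground(u, v, K, R, C):
    """Intersect the back-projected ray of pixel (u, v) with the road plane Z = 0.

    K: 3x3 intrinsics, R: 3x3 world-to-camera rotation, C: camera centre (world, metres).
    Returns (X, Y) on the road, or None if the ray never reaches the road.
    """
    d_cam = np.linalg.solve(K, np.array([u, v, 1.0]))  # ray direction in the camera frame
    d_world = R.T @ d_cam                               # same direction in the world frame
    if d_world[2] >= 0:                                 # level or upward ray: no road hit
        return None
    lam = -C[2] / d_world[2]                            # solve C_z + lam * d_z = 0
    P = C + lam * d_world                               # intersection point, Z = 0
    return float(P[0]), float(P[1])
```

The ray is only a bearing; the known plane supplies the missing depth `lam`, which is exactly the step that makes the ground point unique.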

02 Stage by stage

Capture — time-synchronised sensors

A microphone array (SoundCam) images sound; a Class-1 sound-level meter (IEC 61672, e.g. ACOEM) provides the certified metrics; an Axis camera reads the plate. All three are triggered together, so one event has aligned audio, video, dB metrics and a number plate.
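Triggering the sensors together only yields aligned evidence if every stream can be cut at the same instants. A minimal sketch, assuming each buffer is timestamped on a shared clock (how the real system synchronises is not stated here); `StreamClip` and `window_indices` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class StreamClip:
    """A buffered sensor stream (hypothetical structure)."""
    name: str
    start_s: float  # shared-clock time of sample/frame 0, seconds
    rate_hz: float  # audio sample rate or video frame rate
    n: int          # number of samples or frames in the buffer

def window_indices(clip, trigger_s, pre_s, post_s):
    """Half-open index range [first, last) covering trigger - pre .. trigger + post,
    rounded outward and clipped to what the buffer actually holds."""
    first = math.floor((trigger_s - pre_s - clip.start_s) * clip.rate_hz)
    last = math.ceil((trigger_s + post_s - clip.start_s) * clip.rate_hz)
    return max(first, 0), min(last, clip.n)
```

Applying the same trigger time and window to the SoundCam, sound-level-meter and Axis buffers gives the aligned audio, metrics and video for one event.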

Beamform — the acoustic image

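The array "steers" to every point on a grid: each microphone's signal is shifted in time to cancel its propagation delay from that point (advanced by the travel time $\lVert \mathbf{p} - \mathbf{r}_m \rVert / c$), and the channels are summed. When the array is steered at the true source, the shifted signals add up in phase (bright); elsewhere they partly cancel (dark). For recorded data the same idea runs in the frequency domain via the cross-spectral matrix.

$$b(\mathbf{p}, t) = \sum_{m=1}^{M} s_m\!\left(t + \frac{\lVert \mathbf{p} - \mathbf{r}_m \rVert}{c}\right) \quad\rightarrow\quad P(\mathbf{p}) = \big\langle b(\mathbf{p}, t)^2 \big\rangle_t$$

where $s_m$ is the signal at microphone $m$, $\mathbf{r}_m$ its position, $\mathbf{p}$ the focus point, $c$ the speed of sound, $M$ the number of microphones and $\langle\cdot\rangle_t$ a time average.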
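In the frequency domain the time shifts become phase shifts: at one frequency the map is P(p) = w(p)^H C w(p), where C is the cross-spectral matrix (CSM) of the microphone spectra and w(p) holds one propagation-phase term per microphone, divided by M. A minimal NumPy sketch, assuming a single narrowband bin, free-field propagation and c = 343 m/s; the function names are hypothetical and this is not the SoundCam processing chain.

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in air, m/s (assumed, ~20 °C)

def cross_spectral_matrix(X):
    """X: (n_blocks, M) complex FFT values of M mics at one frequency bin.
    Returns the M x M CSM, C[m, n] = mean over blocks of X[:, m] * conj(X[:, n])."""
    return X.T @ X.conj() / X.shape[0]

def das_map(C, mic_xyz, grid_xyz, freq_hz):
    """Frequency-domain delay-and-sum: P(p) = w(p)^H C w(p) for every grid point p.
    mic_xyz: (M, 3) mic positions, grid_xyz: (G, 3) focus points, both in metres."""
    k = 2 * np.pi * freq_hz / C_SOUND                      # wavenumber
    dist = np.linalg.norm(grid_xyz[:, None, :] - mic_xyz[None, :, :], axis=2)  # (G, M)
    W = np.exp(-1j * k * dist) / mic_xyz.shape[0]          # phase = propagation delay
    # sum over m, n of conj(W[g, m]) * C[m, n] * W[g, n], for each grid point g
    return np.real(np.einsum("gm,mn,gn->g", W.conj(), C, W))
```

Averaging over blocks in the CSM is what lets a recording be beamformed once per frequency bin instead of re-summing time signals for every grid point.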

Why resolution matters: the main-lobe width scales with wavelength ÷ array aperture. Low-frequency noise (long wavelength) localises poorly — the hotspot spreads and can straddle two lanes. That blur is the multi-lane attribution problem, which the uncertainty ellipse in stage 04 quantifies.
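A rough number makes the lane problem concrete. The sketch uses the small-angle estimate footprint ≈ (λ / D) × R for the main lobe at range R; the 0.5 m aperture, 10 m range and 3.5 m lane width are illustrative assumptions, not the system's specifications.

```python
C_SOUND = 343.0     # speed of sound, m/s (assumed, ~20 °C)
LANE_WIDTH_M = 3.5  # illustrative lane width

def spot_width_m(freq_hz, aperture_m, range_m):
    """Order-of-magnitude main-lobe footprint on the road: (wavelength / aperture) * range.
    Small-angle estimate; the exact factor depends on array layout and weighting."""
    wavelength_m = C_SOUND / freq_hz
    return wavelength_m / aperture_m * range_m

for f in (500.0, 1000.0, 4000.0):
    w = spot_width_m(f, aperture_m=0.5, range_m=10.0)
    print(f"{f:6.0f} Hz: ~{w:5.1f} m wide, wider than a lane: {w > LANE_WIDTH_M}")
```

At these assumed values the footprint shrinks from roughly 14 m at 500 Hz to under 2 m at 4 kHz, which is why low-frequency noise is the hard case for lane attribution.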

Project — onto the road plane (Inverse Perspective Mapping)

The brightest pixel in the acoustic map defines a ray from the SoundCam. We intersect that ray with the road plane Z = 0 to get a unique ground point. Because the road is treated as planar, the camera↔ground relationship is a single homography H (a 3×3 matrix, defined up to scale), calibrated once from at least four surveyed road points, no three of them collinear.

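$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \simeq H \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} \quad \text{(ground} \rightarrow \text{image)}$$
$$\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} \simeq H^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \quad \text{(image} \rightarrow \text{ground point)}$$
where ≃ means equal up to a nonzero scale factor. World frame: X = along-road, Y = across-road (lane offset), Z = up; the road is the plane Z = 0.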
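The four-point calibration is the direct linear transform (DLT): each ground↔image pair gives two linear equations in the nine entries of H, and the solution is the null vector of the stacked system. A minimal NumPy sketch, assuming exact surveyed correspondences; `homography_dlt` and `image_to_ground` are hypothetical names.

```python
import numpy as np

def homography_dlt(ground_xy, image_uv):
    """Estimate H with [u v 1]^T ~ H [X Y 1]^T from >= 4 point pairs (no 3 collinear).
    Each pair contributes two rows; H is the right singular vector of the smallest singular value."""
    A = []
    for (X, Y), (u, v) in zip(ground_xy, image_uv):
        A.append([X, Y, 1.0, 0.0, 0.0, 0.0, -u * X, -u * Y, -u])
        A.append([0.0, 0.0, 0.0, X, Y, 1.0, -v * X, -v * Y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale

def image_to_ground(H, u, v):
    """Back-project a pixel: [X Y 1]^T ~ H^-1 [u v 1]^T, then divide by the third coordinate."""
    g = np.linalg.solve(H, np.array([u, v, 1.0]))
    return g[0] / g[2], g[1] / g[2]
```

With more than four (noisy) points the same code gives a least-squares fit; normalising the coordinates first (Hartley normalisation) improves its conditioning.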

Attribute — which lane, how sure

The ground point (X, Y) is compared with the surveyed lane centres. The beamformer's finite resolution is carried through as a covariance Σ, drawn on the road as the −3 dB uncertainty ellipse. If the ellipse lies entirely inside one lane, the attribution is high-confidence; if it straddles a lane boundary, the event is flagged "attribution ambiguous" rather than assigned by guesswork.
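Two parts of this step can be sketched: carrying a pixel-space covariance onto the road through the Jacobian of the back-projection, and turning the across-road spread into a lane confidence. A minimal sketch under a Gaussian assumption; the pixel covariance would have to be fitted to the −3 dB lobe, and the confidence rule here (probability mass inside each lane, ambiguous below a hypothetical 0.9 threshold) is a swapped-in criterion, not the viewer's ellipse-containment test. `ground_covariance` and `attribute_lane` are hypothetical names.

```python
import math
import numpy as np

def ground_covariance(H, u, v, cov_uv):
    """First-order propagation of a 2x2 pixel covariance onto the road plane.
    Returns (X, Y) = pi(H^-1 [u v 1]^T) and Sigma_ground = J cov_uv J^T."""
    G = np.linalg.inv(H)
    g = G @ np.array([u, v, 1.0])
    X, Y = g[0] / g[2], g[1] / g[2]
    # Jacobian of (u, v) -> (X, Y): d(g_i / g_2)/d(u_j) = (G[i, j] - w_i * G[2, j]) / g_2
    J = np.array([[(G[0, j] - X * G[2, j]) / g[2] for j in range(2)],
                  [(G[1, j] - Y * G[2, j]) / g[2] for j in range(2)]])
    return (X, Y), J @ cov_uv @ J.T

def attribute_lane(Y, var_Y, lane_edges, min_conf=0.9):
    """Gaussian probability that the source lies in each lane [edge_k, edge_k+1) across the road.
    Returns (best lane index, its probability, ambiguous flag); min_conf is hypothetical."""
    sigma = math.sqrt(var_Y)

    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - Y) / (sigma * math.sqrt(2.0))))

    probs = [cdf(hi) - cdf(lo) for lo, hi in zip(lane_edges[:-1], lane_edges[1:])]
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best, probs[best], probs[best] < min_conf
```

Only the across-road variance (the [1, 1] entry of Sigma_ground) enters `attribute_lane`; spread along the road does not change which lane is hit.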

Identify — re-project into the ANPR camera

The same ground point is pushed through the Axis camera's homography to land a marker on the responsible vehicle, where ANPR reads the plate. One world point, two cameras, one identity.
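One way to turn the marker into a plate is to test which detected vehicle box contains it. A minimal sketch, assuming the ANPR stage returns a list of detections, each with a plate string and a vehicle bounding box in Axis pixels (a hypothetical structure, not the Axis API), and falling back to the nearest box centre when no box contains the marker.

```python
import numpy as np

def project(H, X, Y):
    """Ground point -> image pixel through homography H (homogeneous divide)."""
    p = H @ np.array([X, Y, 1.0])
    return p[0] / p[2], p[1] / p[2]

def associate_plate(H_axis, ground_xy, detections):
    """detections: list of dicts with hypothetical keys 'plate' and 'box' = (u0, v0, u1, v1).
    Returns (marker pixel, plate or None, whether a box actually contained the marker)."""
    mu, mv = project(H_axis, *ground_xy)
    if not detections:
        return (mu, mv), None, False

    def contains(d):
        u0, v0, u1, v1 = d["box"]
        return u0 <= mu <= u1 and v0 <= mv <= v1

    def centre_dist2(d):
        u0, v0, u1, v1 = d["box"]
        return ((u0 + u1) / 2 - mu) ** 2 + ((v0 + v1) / 2 - mv) ** 2

    inside = [d for d in detections if contains(d)]
    best = min(inside or detections, key=centre_dist2)
    return (mu, mv), best["plate"], bool(inside)
```

Returning the "contained" flag separately keeps a nearest-box fallback from being mistaken for a clean hit when the evidence is reviewed.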

Fuse — the evidence package

Audio, video, the Class-1 acoustic metrics, location, time trace, trigger details and the vehicle ID are bundled into one reviewable package and pushed to SenBOS, in vendor-neutral formats.
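The package layout is not specified here, so the following is only a hypothetical manifest: it lists each file with a SHA-256 digest (an assumption about how tamper-evidence could be provided) next to the metrics and the attribution. It is not the SenBOS schema, and `build_manifest` is a hypothetical name.

```python
import hashlib
import json

def build_manifest(event_id, files, metrics, attribution):
    """Hypothetical evidence manifest (not the SenBOS schema).
    files: {role: (filename, raw bytes)}; each file is listed with its SHA-256 digest so a
    reviewer can later confirm the bytes are unchanged."""
    entries = [
        {"role": role, "name": name, "sha256": hashlib.sha256(data).hexdigest()}
        for role, (name, data) in sorted(files.items())
    ]
    manifest = {
        "event_id": event_id,
        "files": entries,
        "metrics": metrics,          # e.g. LAeq, LAmax, LA1/LA10/LA90 from the Class-1 meter
        "attribution": attribution,  # e.g. plate, lane, confidence, ambiguity flag
    }
    return json.dumps(manifest, indent=2, sort_keys=True)
```

Any change to an audio or video file after packaging would no longer match the digest recorded for it.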

WAV / FLAC audio · MP4 / H.264 video · JPEG / PNG images · CSV / JSON metrics · LAeq, LAmax, LA1, LA10, LA90 · plate + lane + confidence
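The listed metrics have standard definitions: LAeq is the energy average of the A-weighted level, LAmax its maximum, and LAN the level exceeded N % of the time (the (100 − N)th percentile). A sketch only, assuming a series of equal-duration short-interval LAeq values as input (hypothetical); the certified figures come from the Class-1 meter, and a true LAmax uses the meter's F or S time-weighted level rather than this series.

```python
import numpy as np

def acoustic_summary(levels_db, percent_exceeded=(1, 10, 90)):
    """levels_db: equal-duration short-interval A-weighted levels, dB.
    LAeq = 10 log10(mean(10^(L/10))); LAmax = max(L); LAN = (100 - N)th percentile of L."""
    L = np.asarray(levels_db, dtype=float)
    out = {
        "LAeq": float(10 * np.log10(np.mean(10 ** (L / 10)))),  # energy, not arithmetic, mean
        "LAmax": float(L.max()),
    }
    for n in percent_exceeded:
        out[f"LA{n}"] = float(np.percentile(L, 100 - n))
    return out
```

The energy average is dominated by the loudest intervals, so a short, loud event lifts LAeq far more than it lifts LA90.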

03 The fusion algorithm: why the marker always lands right

The insight

A hotspot is a bearing, not a point. Intersecting that ray with the known road plane (one homography) gives a unique ground point — Inverse Perspective Mapping.

Why calibration error is forgiving

The lobe is placed by back-projecting a pixel to the ground and then re-projecting it through the same homography. Because the two mappings cancel, a calibration error only distorts the lobe's perspective; it does not move the marker off the vehicle.
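A short check of the cancellation, writing $\hat H$ for the homography actually used (the true one plus whatever error calibration introduced), $\tilde{\mathbf u} = [u\;v\;1]^\top$, and $s$ for the third component of $\hat H^{-1}\tilde{\mathbf u}$. Because $\pi$ discards overall scale, the round trip returns the original pixel whatever $\hat H$ is:

$$\mathbf{w} = \pi\big(\hat H^{-1}\tilde{\mathbf u}\big), \qquad \pi\big(\hat H\,[\mathbf w^\top\;1]^\top\big) = \pi\big(\hat H\,\hat H^{-1}\tilde{\mathbf u}/s\big) = \pi\big(\tilde{\mathbf u}/s\big) = \mathbf u$$

The cancellation covers the round trip only. The intermediate ground point $\mathbf w$ still depends on $\hat H$, so quantities measured on the road, such as the lane offset $Y$, are only as accurate as the calibration.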

Step by step

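1.  Acoustic peak pixel: $\mathbf{u} = (u, v)$
2.  Back-project to the road: $\mathbf{w} = (X, Y) = \pi\big(H^{-1}\,[u\;v\;1]^\top\big)$
3.  Lane: $\hat{k} = \arg\min_k \lvert Y - c_k \rvert$, with the −3 dB covariance $\Sigma$ giving the confidence or the ambiguity flag
4.  ANPR marker: $\mathbf{u}_{\text{axis}} = \pi\big(H_{\text{axis}}\,[X\;Y\;1]^\top\big)$

where $\pi([x_1\;x_2\;x_3]^\top) = (x_1/x_3,\; x_2/x_3)$ converts homogeneous to Cartesian coordinates, $c_k$ are the lane-centre offsets, and $H$ comes from the 4-point DLT calibration.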
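The four steps compose into a few lines. A minimal sketch, assuming both homographies are already calibrated and the lane centres are given in the same across-road Y frame; `to_cartesian` and `fuse_event` are hypothetical names, and the covariance/ambiguity part of step 3 is left out.

```python
import numpy as np

def to_cartesian(h):
    """pi(.): homogeneous -> Cartesian, dividing by the last coordinate."""
    return h[:-1] / h[-1]

def fuse_event(u, v, H, H_axis, lane_centres):
    """Steps 2-4 for one acoustic peak pixel (u, v) from step 1.
    Returns the ground point w = (X, Y), the nearest-lane index and the ANPR marker pixel."""
    w = to_cartesian(np.linalg.solve(H, np.array([u, v, 1.0])))     # step 2: H^-1 [u v 1]^T
    lane = int(np.argmin(np.abs(w[1] - np.asarray(lane_centres))))  # step 3: argmin_k |Y - c_k|
    marker = to_cartesian(H_axis @ np.array([w[0], w[1], 1.0]))     # step 4: H_axis [X Y 1]^T
    return w, lane, marker
```

Using `np.linalg.solve` instead of forming H^-1 explicitly gives the same back-projection with slightly better numerics.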

04 What this viewer demonstrates

The app is a teaching/MVP simulator of the geometry and physics above — not the production system. Synthetic scenes run a real beamformer (delay-and-sum and a 56-mic cross-spectral measurement); the live NHVR trailer-08 scenes warp the acoustic lobe onto real road photos through the calibrated homography and place a marker + lane attribution on each vehicle, including multi-vehicle "which lane?" cases.
