SenseCity - Acoustic Surveillance in Real-World | Wei-Cheng Wang's Personal Website

SenseCity x AsaSense: Critical Analysis of Urban Acoustic Surveillance

A strategic research collaboration with the SenseCity project (AsaSense) that uses city-scale raw acoustic data to expose the failure modes of standard surveillance models and to propose context-aware architectural solutions.

The Research Gap & Motivation

Why “Off-the-Shelf” Fails in the Wild:
Most acoustic surveillance systems are validated on clean, curated datasets. However, their performance on raw, unprocessed urban audio remains largely unverified.

Our Mission:
In collaboration with AsaSense, we accessed a unique stream of continuous, uncurated audio from Ghent and Rotterdam. Rather than simply deploying a standard model, we set out to stress-test two dominant paradigms, anomaly detection and sound tagging; to identify why they fail in dynamic environments (e.g., under temporal drift and open-set events); and to propose robust alternatives.

Operational Context (The SenseCity Testbed)

This project leveraged a real-world infrastructure to diagnose algorithmic limitations:

Raw Data Ingestion:
Unlike academic datasets, the SenseCity sensor network captures the “messy” reality of cities over two years: wind noise, overlapping soundscapes, and non-stationary backgrounds. Most importantly, the data comes without any annotations.
System Audit:
We applied state-of-the-art anomaly detection and sound tagging models to this raw stream. The analysis revealed that global models generate an unmanageable number of false alarms due to contextual blindness (e.g., treating a weekend market as an anomaly because the model had only seen weekday traffic), which causes operator fatigue and ultimately system failure.
Core Conclusion:
Our experiments showed that a single global model is insufficient for city-scale deployment. Instead, Context-Specific Modeling (sensor-specific baselines) is a prerequisite for operational reliability.
Proposed Resolution:
Based on these findings, we formulated a Context-Aware Design Framework, advocating for sensor-specific baselines and adaptive thresholding to handle the inherent variance of city life.
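
To make "sensor-specific baselines and adaptive thresholding" concrete, here is a minimal sketch, not the project's actual implementation. It assumes each sensor already produces a scalar anomaly score per clip, and it swaps in a simple exponentially weighted mean and variance as the baseline. The names `SensorBaseline` and `score_event`, and the values of `alpha`, `k`, and `warmup`, are illustrative assumptions.

```python
import math
from collections import defaultdict


class SensorBaseline:
    """Exponentially weighted mean/variance of one sensor's anomaly scores."""

    def __init__(self, alpha=0.05, k=3.0, warmup=20):
        self.alpha = alpha    # smoothing factor; higher adapts faster (assumed value)
        self.k = k            # alarm threshold in standard deviations (assumed value)
        self.warmup = warmup  # scores to absorb before any alarm may be raised
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def update(self, score):
        """Test `score` against the current baseline, then fold it into the baseline."""
        if self.count == 0:
            # First observation only initialises the baseline.
            self.mean = score
            self.count = 1
            return False
        deviation = score - self.mean
        threshold = self.k * math.sqrt(self.var)
        is_alarm = self.count >= self.warmup and abs(deviation) > threshold
        # Standard incremental update of an exponentially weighted mean/variance.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.count += 1
        return is_alarm


# One independent baseline per sensor instead of a single global one.
baselines = defaultdict(SensorBaseline)


def score_event(sensor_id, score):
    """Return True if `score` is unusual for this particular sensor."""
    return baselines[sensor_id].update(score)
```

With this design, the same score can be routine at a busy market square and alarming on a quiet street, because each sensor is compared only against its own history. The threshold also widens or narrows as that sensor's variance changes over time.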

Core Methodologies

Data Source: High-fidelity, long-term raw acoustic logs from the AsaSense deployment (Ghent & Rotterdam).
Diagnosis Method: Cross-context evaluation (Spatial & Temporal Domain Shift).
Algorithmic Focus: Unsupervised Deep Autoregressive Modeling (WaveNet) vs. Pre-trained Tagging Models.
Architecture Design: Feasibility analysis of Hybrid Edge-Cloud pipelines to mitigate bandwidth bottlenecks.

Technical Analysis & Innovations

1. Diagnosing the “Generalization Fallacy”

The Problem: We demonstrated that state-of-the-art anomaly detectors degrade severely under concept drift. A model trained on “winter data” failed catastrophically on summer evenings because human activity patterns had changed.
The Solution: Proposed a Context-Specific Modeling approach and showed that training lightweight, dedicated models for each sensor location significantly outperforms a single large, generic global model in anomaly retrieval.

2. The Limits of Semantic Tagging

The Finding: Standard sound taggers (trained on AudioSet) struggle with the Open-Set Nature of cities. They force novel urban sounds into rigid, pre-defined categories, leading to semantic misalignment.
The Proposal: Suggested moving from “rigid classification” to “unsupervised deviation detection” at the edge, using tagging only as a secondary enrichment layer in the cloud, rather than a primary filter.
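
The ordering proposed above can be sketched as a two-stage function. This is an illustrative assumption rather than the project's code: `process_clip`, `enrich_with_tags`, the `tagger` callable (any function returning a label-to-probability dict), and the `min_confidence` value are all hypothetical.

```python
def enrich_with_tags(tag_scores, min_confidence=0.6):
    """Keep only confident labels; fall back to 'unknown' rather than forcing a best guess."""
    ranked = sorted(tag_scores.items(), key=lambda item: item[1], reverse=True)
    confident = [label for label, prob in ranked if prob >= min_confidence]
    return confident or ["unknown"]


def process_clip(deviation_score, deviation_threshold, tagger):
    """Stage 1 (edge) decides whether a clip matters; stage 2 (cloud) only describes it."""
    if deviation_score <= deviation_threshold:
        return None  # nothing unusual: the tagger is never called
    return {
        "deviation": deviation_score,
        "tags": enrich_with_tags(tagger()),
    }
```

Because the tagger runs only after the unsupervised stage has flagged a clip, a novel sound with no matching category is still reported, labelled `unknown`, instead of being filtered out or forced into the nearest predefined class.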

3. Architectural Scalability (Edge vs. Cloud)

Analysis: Examined the trade-off between transmission cost and detection latency.
Recommendation: Proposed a “Filter-then-Forward” architecture where edge nodes perform lightweight unsupervised screening, transmitting only potential anomalies to the cloud. This reduces bandwidth consumption by orders of magnitude while preserving privacy.
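
A back-of-the-envelope calculation shows where the bandwidth saving comes from. The figures below are assumptions chosen for illustration, not measurements from the deployment: 16 kHz, 16-bit mono audio, and 432 seconds per day (0.5% of the day) flagged by the edge node.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def upload_bytes(seconds, sample_rate_hz=16_000, bytes_per_sample=2, channels=1):
    """Bytes needed to transmit `seconds` of uncompressed PCM audio."""
    return seconds * sample_rate_hz * bytes_per_sample * channels


# Streaming everything vs. forwarding only flagged audio (assumed 432 s/day).
raw_per_day = upload_bytes(SECONDS_PER_DAY)
filtered_per_day = upload_bytes(432)
reduction_factor = raw_per_day / filtered_per_day

print(f"raw: {raw_per_day:,} B/day")            # raw: 2,764,800,000 B/day
print(f"filtered: {filtered_per_day:,} B/day")  # filtered: 13,824,000 B/day
print(f"reduction: {reduction_factor:.0f}x")    # reduction: 200x
```

Under these assumptions the reduction factor is simply the inverse of the flagged fraction, so the saving depends directly on how selective the edge screening is.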

Outcomes & Impact

Empirical Evidence: Provided one of the first comprehensive studies on the limitations of transfer learning in acoustic surveillance using real-world, longitudinal data.
Design Guidelines: The findings established the foundation for Privacy-Preserved & Adaptive Surveillance, directly influencing the design of subsequent research on privacy in surveillance.
Strategic Value: Delivered critical insights to the industrial partner (AsaSense) on avoiding “technical debt” by pivoting from global models to adaptive, edge-based learning.

Resources

Chapter 2: The AsaSense Project - Detailed analysis of deployment constraints and algorithmic failures.