Applied AI Systems

While my Ph.D. trained me to dissect complex problems with academic rigor, my drive for human connection, rooted in my community leadership, inspires me to build tangible solutions.

Since graduating, I have bridged these worlds by leveraging Large Language Models (LLMs) and modern AI stacks to turn ideas into deployed systems quickly. The following projects reflect my agile approach to AI engineering: identifying critical everyday needs and building practical, high-utility solutions that make life safer and easier.

Agentic Career Orchestrator - Multi-Agent LLM System for Strategic Job Triage & Advisory


Agentic Career Orchestrator: Multi-Agent LLM System for Strategic Job Triage & Advisory

An ROI-driven, local-first AI ecosystem designed to augment human judgment through semantic filtering and strategic advisory in high-noise job markets, while preserving data sovereignty.

Current Status: v2 Stable Release (Jan 2026) – full 5-phase multi-agent pipeline operational; ongoing prompt refinement, with no major architecture changes planned.

Motivation

Job hunting as a Ph.D. graduate is fundamentally a semantic search problem disguised as a volume game. The core friction lies in the extremely low signal-to-noise ratio of modern recruitment platforms. After defending my doctoral thesis, I found myself navigating a market where a single “Machine Learning Engineer” posting could mean anything from data pipeline maintenance to cutting-edge research implementation. Traditional keyword-based searches fail spectacularly: they cannot capture nuances like visa sponsorship policies, distinguish between “nice-to-have” and “required” qualifications, or recognize that my background in audio-visual surveillance transfers directly to anomaly detection in fintech.

The real challenge, therefore, is not just finding jobs: it is verification and strategic triage. Manually reviewing hundreds of listings to validate market salaries or check research alignment is exhausting and error-prone. I needed more than a search engine; I needed an ROI-driven intelligent assistant that understands context, filters out the noise, and lets me allocate my limited cognitive bandwidth exclusively to high-leverage opportunities.

Impact & Core Philosophy

Since deployment, this system has transformed job hunting from a chaotic volume game into a disciplined strategic campaign.

- Compressed Time-to-Decision: By automating the Observe-Orient-Decide-Act loop of job filtering, it cuts complex risk assessment from days to a fraction of that time, freeing my bandwidth for high-leverage opportunities.
- Deep Semantic Qualification: Moving beyond rigid keyword matching, the system evaluates contextual fit, automatically distinguishing between generic “Machine Learning Engineering” roles and research-oriented positions that truly align with my Ph.D. expertise.
- Uncovered Asymmetric Opportunities: By aggregating independent assessments from a Multi-Agent Council, the system synthesizes non-obvious connections, such as bridging my audio-visual surveillance background to fintech anomaly detection, that rigid keyword filters typically discard.

The core philosophy is unyielding: build a system that empowers executive decision-making rather than replacing it. I develop this ecosystem not to mass-generate resumes or cover letters, but as a specialized analyst team that removes repetitive, low-value work. By defining the strategic criteria and delegating the ground-level research to the agents, I let the analyst team handle the preliminary debates and data crunching. This structure allows me to step back from the “grunt work” and focus entirely on the final strategic call, ensuring that every application is tailored with authentic human intuition, backed by machine-speed intelligence.

Research Context: The Dual Purpose

Beyond its immediate utility, this project serves as a proof of concept for integrating system-level engineering into the academic research workflow.
Although my core research lies in deep learning rather than systems engineering, my past work pushes me to view my research from a bigger picture. Importing robust engineering tooling into the research loop is an effective strategy for accelerating the proof-of-concept cycle. This project therefore doubles as an architectural pilot for physically-aware synthetic surveillance data generation: it validates the Mixture-of-Advisors orchestration pattern and explores how “system thinking” can orchestrate complex models (such as video generators and task-specific LoRAs) as distinct agents, before the pattern is applied to them directly.

System Evolution: From Linear Script to Dynamic Orchestration

While the core objective remains unchanged (identifying high-fit opportunities from a sea of noise), the architectural approach has fundamentally shifted from a rigid linear protocol to dynamic resource orchestration.

v1: Linear Execution (Legacy)

v1 operated as a single monolithic script that enforced a “One-Size-Fits-All” protocol. Every Job Description (JD), regardless of its domain (Academic, Startup, Big Tech), was forced through the exact same processing sequence (Step A → B → C).

- Rigid Protocol: The system lacked the autonomy to deviate. It applied the same generic analysis prompts to a “Research Scientist” role as it did to a “Backend Engineer” role.
- Resource Inefficiency: It could not dynamically allocate resources, wasting tokens on irrelevant checks while missing domain-specific nuances that required deeper investigation.

v1 was a static procedure: it executed steps blindly, focusing on process completion rather than strategic adaptability.

v2: Adaptive Resource Orchestration

v2 refactors the system into a flexible multi-agent ecosystem. Instead of a fixed linear path, it employs a Router Agent to analyze the context of each JD and dynamically “spin up” the necessary agents, allocating computational resources only where they yield the highest ROI.

Phase 1: Observation (Tool-Augmented Grounding). Before analysis begins, the system proactively invokes external tools (salary APIs, arXiv retrieval) to ground the JD in reality, ensuring subsequent agents operate on verified data rather than assumptions.

Phase 2: Orientation (Gatekeeping). A lightweight Triage Agent performs a rapid “Survival Check.” It instantly discards non-viable roles (visa/language constraints) before expensive reasoning agents are instantiated, optimizing the computational budget.

Phase 3: Decision (Context-Aware Council). This is the core architectural shift. The system dynamically assembles a Council of Advisors specific to the role.

- For a Research Scientist role: it spins up the 🔬 Academic Reviewer, assessing the domain gap between my research and the target field.
- For an Early-Stage Founding Engineer: it activates the 🚀 Startup Veteran (analyzing equity potential and risk) and the 🏗️ System Architect (assessing scalability requirements).

The Technical Advantage: By creating specialized agents, we ensure Context Isolation. This prevents “Context Pollution,” where an LLM gets confused by irrelevant information (e.g., applying corporate HR standards to a scrappy startup role). Each agent sees only what it needs to see, ensuring the signal remains pure.
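To make the council pattern concrete, here is a minimal, illustrative sketch of the routing and context-isolation idea. The advisor personas, the keyword heuristic, and the `call_llm` callable are placeholders of mine (in the real system the routing itself is LLM-driven); this is not the project's actual code.

```python
# Illustrative sketch of the Phase-3 router/council pattern (placeholders only).
from dataclasses import dataclass

@dataclass
class Advisor:
    name: str
    persona: str  # system prompt defining the advisor's viewpoint

ADVISOR_POOL = {
    "academic_reviewer": Advisor("Academic Reviewer",
        "Assess the domain gap between the candidate's research and the role."),
    "startup_veteran": Advisor("Startup Veteran",
        "Evaluate equity upside, runway risk, and role ambiguity."),
    "system_architect": Advisor("System Architect",
        "Judge the scalability and infrastructure demands of the role."),
}

def route_advisors(jd_text: str) -> list[Advisor]:
    """Pick only the advisors relevant to this JD (a keyword stand-in for the Router Agent)."""
    lowered = jd_text.lower()
    selected = []
    if "research" in lowered or "phd" in lowered:
        selected.append(ADVISOR_POOL["academic_reviewer"])
    if "founding" in lowered or "early-stage" in lowered:
        selected += [ADVISOR_POOL["startup_veteran"], ADVISOR_POOL["system_architect"]]
    return selected or [ADVISOR_POOL["system_architect"]]

def run_council(jd_text: str, call_llm) -> dict[str, str]:
    """Each advisor sees only its own persona plus the JD: no shared context between agents."""
    return {a.name: call_llm(system=a.persona, user=jd_text) for a in route_advisors(jd_text)}
```

The point of the sketch is the isolation: each advisor receives only its persona and the JD, so irrelevant context from other advisors never pollutes its judgment.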
Phase 4: Strategy War Room (Clustering & ROI). The system steps back from single JDs and enters a war-room view across all dossiers. It clusters opportunities by similarity patterns (domain, seniority, tech stack, narrative angle) and estimates the ROI of working on each cluster, producing a ranked “battle plan” that prioritizes where limited effort should be invested first.

Phase 5: Briefing & War Room Editor. Finally, a briefing agent synthesizes per-cluster guidance into concise strategy briefs, while the war-room editor generates structured tables and bullet-level action items instead of full ghostwritten documents. It suggests narrative angles, surfaces reusable experience blocks, and prepares checklists of edits so that I stay in control of the final wording while the system handles the pattern-matching and retrieval work.

The Architectural Shift

The evolution mirrors the transition from a hard-coded script to an intelligent orchestrator:

- v1 approach: process-centric. “Run every file through these 3 steps.”
- v2 approach: resource-centric. “Analyze the target, deploy the right agents, and synthesize the strategy.”

By moving from linear execution to dynamic orchestration, v2 ensures that every decision is backed by the right experts, adapting the system’s behavior to the chaos of the real-world market.

Tech Stack

- LLM & Orchestration: Python 3.11, Google Generative AI SDK (Gemini API), semantic routing via a Smart Model Gateway.
- Models: Gemma-3-27b-it for logic & extraction; Gemini-2.5-Flash for long-context strategic synthesis.
- Memory & RAG: ChromaDB (all-MiniLM-L6-v2), recursive character splitting, JSON/Markdown serialization.
- Infrastructure: Docker Compose, .env-based local path binding, local-first storage of CVs and history.

Technical Architecture & Implementation

System Architecture Overview

The v2 system implements a multi-phase, multi-agent orchestration pipeline with clear separation of concerns. Unlike monolithic single-script approaches, each phase is encapsulated as an independent module with dedicated agents, enabling modular development and targeted optimization.

Core Design Principles:

- Phase-Based Pipeline: Five sequential phases (Intel → Triage → Council → War Room → Briefing) with explicit data contracts between stages.
- Hybrid Model Gateway: A cost-aware routing layer that switches between quota-friendly models (Gemma) for extraction and filtering, and high-capacity models (Gemini Flash) for deep reasoning.
- Structured Knowledge Layer: A local vector database designed for semantic retrieval of reusable CV sections per JD.

Implementation Stack

- Core Infrastructure: Python 3.11, Google Generative AI SDK, Docker Compose.
- Structured Output: Enforced JSON schemas ensure reliable agent communication and metadata parsing.
- Hybrid Inference Layer:
  - The Workhorse (Gemma 3-27B-it): Handles ~90% of the workload, including JD parsing, keyword extraction, triage filtering, and initial gap analysis. Chosen for its generous daily rate limit (RPD) and efficiency.
  - The Heavy Lifter (Gemini 2.5 Flash): Activated only for tasks requiring massive context windows or deep reasoning, such as deep RAG retrieval and strategic synthesis.
- Smart Gateway: A router that selects the appropriate model based on task complexity and real-time daily quota availability (a simplified sketch follows below).
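A minimal sketch of what such a quota-aware gateway could look like. The model names follow the stack above, but the daily request budgets and the complexity flag are placeholder assumptions, not the actual API limits or project code.

```python
# Hedged sketch of a quota-aware model gateway (illustrative, not the project's code).
import datetime

class SmartGateway:
    def __init__(self, daily_quota=None):
        # Assumed per-day request budgets; real limits depend on the API tier.
        self.daily_quota = daily_quota or {"gemma-3-27b-it": 14000, "gemini-2.5-flash": 250}
        self.used = {}
        self.day = datetime.date.today()

    def _reset_if_new_day(self):
        if datetime.date.today() != self.day:
            self.day, self.used = datetime.date.today(), {}

    def pick_model(self, task_complexity: str) -> str:
        """Route heavy reasoning to Gemini Flash and everything else to Gemma,
        falling back when a model's daily budget is exhausted."""
        self._reset_if_new_day()
        preferred = "gemini-2.5-flash" if task_complexity == "deep" else "gemma-3-27b-it"
        fallback = "gemma-3-27b-it" if preferred == "gemini-2.5-flash" else "gemini-2.5-flash"
        for model in (preferred, fallback):
            if self.used.get(model, 0) < self.daily_quota[model]:
                self.used[model] = self.used.get(model, 0) + 1
                return model
        raise RuntimeError("Daily quota exhausted for all models")
```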
Memory & Knowledge:

- Vector Store: ChromaDB with MiniLM embeddings.
- Indexing: Personal CV, academic papers, and historical applications.
- Purpose: Enables the system to retrieve relevant experience blocks and reusable resume snippets for each specific JD.

Multi-Agent Architecture

The system executes a structured workflow to ensure comprehensive coverage:

Phase 1: Intelligence Gathering (The Scout). Parses raw JDs (with text extraction caching) and invokes external tools (e.g., the arXiv API for research group validation, a mock salary validator). Output: an Enriched Dossier containing external context (team credibility, market salary band).

Phase 2: Triage & Gatekeeping (The Gatekeeper). Enforces hard constraints (visa, Ph.D. relevance, salary floor). Output: a structured triage decision. Only “playable” JDs proceed to the Council, saving compute resources.

Phase 3: Mixture-of-Advisors (The Council). A Router Agent dynamically selects multiple advisors from the relevant fields for each JD (e.g., Academic Analyst and Leadership advisors for senior research roles). Context Isolation: each advisor has a dedicated persona definition and memory state, preventing context pollution. Output: per-advisor scores and rationales stored in dossier metadata for downstream aggregation.

Phase 4: Strategy War Room (Clustering & ROI). In Phase 4, the system steps back from single JDs and takes a War Room view across all dossiers. Instead of treating each role as an isolated decision, it clusters opportunities by similarity patterns (domain, seniority, tech stack, narrative angle) and estimates the ROI of working on each cluster. For each cluster, the War Room agent considers:

- Rewrite effort: how much real editing is needed beyond a 5-minute tweak.
- Reusability: whether a single narrative rewrite can unlock multiple similar roles.
- Strategic leverage: whether the cluster advances my long-term trajectory (e.g., research scientist track vs. generic MLE).

The output of Phase 4 is a ranked “battle plan”: a prioritized queue of clusters with concrete reasons why they deserve attention now versus later.

Phase 5: Briefing & War Room Editor. Phase 5 turns strategy into execution support rather than ghostwriting. The Briefing agent synthesizes per-cluster guidance into a concise strategy brief: which project angles to foreground, which gaps to acknowledge, and which phrasing patterns are reusable across roles. Instead of generating full resumes or cover letters, the War Room Editor produces structured tables and bullet-level action items that I can copy into my own documents as needed. It surfaces:

- Suggested narrative angles per cluster and per JD (e.g., for Job A, frame my audio-visual surveillance work as a privacy-preserving mechanism; for Job B, emphasize high-throughput anomaly detection).
- Phrases and experience blocks retrieved from my CV and history that are safe to reuse.
- A short checklist of edits for each application (what to add, what to cut, what to reframe).

This keeps the human firmly in control of the final wording while the system handles the tedious pattern-matching and retrieval work.

Cost & Efficiency Optimization

The architecture is strictly ROI-driven, prioritizing resource allocation to minimize waste:

- Aggressive Caching: JD text extraction and intermediate reasoning steps are cached locally to avoid redundant API calls.
- History Reusability: The RAG module retrieves reusable sentences from past successful applications, reducing the manual effort of rewriting the same sentence over and over (see the retrieval sketch below).
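For the memory layer, here is a minimal ChromaDB sketch of how reusable experience blocks could be indexed locally and retrieved per JD. The collection name, ids, snippet texts, and storage path are illustrative; ChromaDB's default embedder is the all-MiniLM-L6-v2 sentence transformer, which matches the stack above.

```python
# Minimal sketch of the local RAG layer: index CV/history snippets and retrieve per JD.
import chromadb

client = chromadb.PersistentClient(path="./career_memory")        # local-first storage
cv_blocks = client.get_or_create_collection(name="cv_blocks")     # default all-MiniLM-L6-v2 embedder

# One-off indexing of reusable experience snippets (normally parsed from CV/history files).
cv_blocks.add(
    ids=["surveillance-av", "privacy-optin"],
    documents=[
        "Built audio-visual representation learning pipelines for urban surveillance.",
        "Designed an opt-in, adversarial privacy framework compatible with off-the-shelf ASR models.",
    ],
)

def reusable_blocks_for(jd_text: str, k: int = 3) -> list[str]:
    """Return the k most semantically relevant experience blocks for a given JD."""
    n = min(k, cv_blocks.count())
    result = cv_blocks.query(query_texts=[jd_text], n_results=n)
    return result["documents"][0]
```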
Resources

- System Architecture Diagram
- GitHub Repository
- Analysis Sample - Example of an analysis report for a fake JD.

Taiwanese in Ghent, The Survivor Kit - A Serverless LLM-Agent Deployment


Taiwanese in Ghent, The Survivor Kit: AI-Powered Community Platform

A serverless, AI-driven information hub designed to automate community management and solve information fragmentation for international students.

Motivation & Product Philosophy

Taiwanese in Ghent, The Survivor Kit is a comprehensive survival guide platform. Originally engineered for students, I collaborated with the current president of the UGent Taiwanese Student Association (TSA) to redefine the product roadmap, expanding its scope to serve the entire Taiwanese expatriate community. This ensured the system aligned with actual operational needs rather than mere technical novelty.

“I built this not just as a developer, but as the former President who identified the root cause of platform failure.”

I recognized that previous platforms failed due to high operational friction. To solve this, I set a strict constraint: the system must be low-maintenance and operable by non-technical staff. This drove the decision to adopt a serverless architecture combined with autonomous AI agents, allowing for rapid iteration and a “set-and-forget” operational model.

Role: Product Owner & Full-Stack Engineer
Scope: Requirement Analysis → System Architecture → AI Agent Development → CI/CD

Tech Stack

- AI & NLP: Python, Gemma 3 4B (LLM), Feedparser (RSS), Prompt Engineering
- Backend / CMS: Google Sheets API (NoSQL/CMS), Google Apps Script, Event-Driven ETL
- Frontend: Next.js 14 (App Router), TypeScript, Tailwind CSS, ISR
- Infrastructure: Vercel (Serverless), GitHub Actions (CI/CD), Docker

Technical Architecture & Implementation

1. AI-Driven Intelligence Pipeline (Event-Driven ETL)

The core innovation is an automated pipeline that monitors, analyzes, and translates local news without human intervention, effectively functioning as a domain-specific AI agent (a minimal sketch of this loop follows at the end of this section):

- Data Ingestion: A Python-based agent continuously monitors municipal RSS feeds (stad.gent) and emergency alerts.
- LLM Integration (Gemma 3 4B): Deployed Gemma 3 4B to perform semantic analysis on raw Dutch texts.
- Structured Prompt Engineering: Designed rigorous prompt templates to enforce valid JSON output from the LLM. Tasks include importance grading (Level 1-3), audience classification (student vs. resident), Traditional Chinese translation, and summarization.
- Robustness: Implemented retry logic with exponential backoff to handle API rate limits and ensure pipeline reliability.
- ETL Execution: Structured data is automatically validated and written back to the Google Sheets CMS, triggering frontend updates.

2. Serverless Full-Stack Architecture

Designed a cost-efficient architecture suitable for long-term operation:

- Headless CMS (Google Sheets): Abstracted Google Sheets into a JSON API. This lets non-technical staff manage content via a familiar spreadsheet interface, eliminating database costs ($0/month) and lowering the maintenance barrier.
- Frontend (Next.js 14): Implemented incremental static regeneration (60 s revalidation) to ensure high performance and SEO while keeping data fresh.

3. CI/CD & DevOps

- GitHub Actions: Orchestrated daily cron jobs (06:00 UTC) to execute the news crawling and AI analysis agents.
- Security & Reproducibility: Managed API secrets via GitHub Secrets and used Docker to ensure environment consistency for the AI agents.
- Automated Deployment: Configured Vercel for automatic deployments on git push, establishing a production-ready lifecycle.
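As a minimal illustration of the pipeline described above, the sketch below polls an RSS feed with feedparser, asks the model for structured JSON, and retries with exponential backoff. The prompt text, the feed URL, and the `call_gemma` callable are stand-ins, not the deployed agent's code.

```python
# Hedged sketch of the news-analysis agent: RSS polling, JSON-enforced LLM analysis, retries.
import json
import time
import feedparser

PROMPT = """You are a news triage assistant for Taiwanese residents in Ghent.
Return ONLY valid JSON with keys:
  "importance": 1-3, "audience": "student" or "resident",
  "summary_zh_tw": a Traditional Chinese summary of the item.
News item:
{item}"""

def analyze_entry(entry, call_gemma, max_retries: int = 4) -> dict:
    text = f"{entry.title}\n{entry.summary}"
    for attempt in range(max_retries):
        try:
            raw = call_gemma(PROMPT.format(item=text))
            return json.loads(raw)           # enforce structured output
        except (json.JSONDecodeError, RuntimeError):
            time.sleep(2 ** attempt)         # exponential backoff on bad JSON / rate limits
    raise RuntimeError("LLM analysis failed after retries")

def run_pipeline(call_gemma):
    feed = feedparser.parse("https://stad.gent/...")  # placeholder feed URL
    return [analyze_entry(entry, call_gemma) for entry in feed.entries]
```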
Key Results & Impact

- 100% Automation: Achieved a fully automated loop for news gathering, translation, classification, and publishing.
- Zero Operational Cost: Leveraged serverless free tiers to keep the platform cost-free, ensuring the project’s financial sustainability for the student association.
- Solved “Technical Debt”: Created a system that requires no coding skills to maintain, addressing the high turnover rate inherent in student organizations.

Resources

- Live Website
- GitHub Repository
- AI Agent Source Code - Python agent for scraping and LLM processing.
- Prompt Engineering Templates - Structured prompts for Gemma 3 4B.

Research Projects

While my applied work prioritizes user-centric utility, it is built upon a foundation of rigorous academic inquiry established during my doctoral and master's studies.
My academic journey at Ghent University and NCKU centered on analyzing real-world data within the fields of surveillance and driver monitoring, with a specific emphasis on audio-visual modalities. I investigated the critical gap between controlled lab environments and unpredictable real-world deployments, proposing novel mechanisms and unsupervised frameworks to bridge this divide. My research spans computer vision, audio processing, and multimodal representation learning, extending into privacy preservation and transferability assessment. This body of work represents my dedication to pushing the boundaries of what AI can perceive without compromising the rights of the people it protects.

From Lab to Street: Transferable and Privacy-friendly Deep Learning for Urban Surveillance


“AI is here, they say.” We have witnessed how artificial intelligence has reshaped our perception of the world in recent years. AlphaFold 2 compressed decades of work on protein structure prediction into a few weeks, accelerating medical breakthroughs. Meanwhile, GNoME discovered 380k new structures within a few months, equivalent to roughly 800 years of work done the traditional way. However, we have also seen some domains struggle to translate state-of-the-art research into real-world applications. Surveillance, set in the chaotic, unscripted environment of our streets and constrained by privacy, data scarcity, and unpredictability, is one of them.

This part of my Ph.D. research tackles the critical bottlenecks and bridges the gap between academic research and real-world deployment across three aspects: privacy risks, data scarcity, and environmental domain shifts. This project is not merely a collection of frameworks designed to extract marginal accuracy gains on scripted datasets; it is about how we can adapt advanced models to handle the chaotic, constrained, and unscripted disorder of the real world. The dissertation is built upon three core technical pillars and culminated in a Ph.D. degree from Ghent University:

- Privacy-Friendly Sensing Framework
- Audio-Visual Representation Learning
- Source-Free Unsupervised Transferability Assessment

Privacy-Friendly Sensing Framework

The “Opt-in” Mechanism: From Audio to Visual

Traditional privacy protection is fundamentally reactive (opt-out). Users are forced to play an infinite game of whack-a-mole, trying to list every sensitive attribute they want to hide. However, raw data is inherently “bundled”: a simple voice command carries not just the semantic content, but also the speaker’s gender, emotion, and identity. In the real world, it is impossible to exhaustively list and block every potential leakage. This “collect first, sanitize later” approach violates the core principle of GDPR, leaving users vulnerable to future, unforeseen extraction techniques.

To address the trade-off between utility and privacy, I proposed a fundamental inversion of this paradigm. Instead of asking “what should we hide?”, my framework asks “what is strictly necessary?”. Using adversarial learning, I trained an on-edge obfuscator adapted from a generative architecture (CycleGAN-VC2), designed to protect attributes like speaker identity, emotion, and gender while maintaining compatibility with downstream models (e.g., DeepSpeech2). This model acts as a digital sieve that actively strips away the “bundled” sensitive attributes (like identity) at the signal level, while selectively preserving only the features required for the authorized task (e.g., speech recognition). It transforms privacy from a passive policy into an active, mathematical constraint.

Crucially, this framework solves the deployment bottleneck. The obfuscated data remains mathematically compatible with off-the-shelf models, so service providers can “plug in” this privacy module at the edge without retraining their massive backend models. It offers a source-free, scalable path to GDPR compliance that protects users without dismantling the existing AI infrastructure.

Experiments on four speech datasets demonstrate that the framework suppresses unauthorized attribute recognition to near-random chance levels, while incurring a minimal performance drop (only 2-6%) on authorized tasks.

- Audio Domain: Published in IEEE Pervasive Computing (1st author).
- Visual Domain: The core “opt-in” logic proved robust enough to be adapted to the visual domain, validating its cross-modal universality. Published in Applied Intelligence (2nd author).
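At a high level, the opt-in idea can be summarized as an adversarial objective. The formulation below is my simplified sketch (notation mine, not the exact loss from the publications):

$$ \min_{G}\;\max_{D}\;\; \mathcal{L}_{\text{task}}\big(f_{\text{task}}(G(x)),\,y\big)\;-\;\lambda\,\mathcal{L}_{\text{priv}}\big(D(G(x)),\,s\big) $$

where $G$ is the on-edge obfuscator, $f_{\text{task}}$ a frozen off-the-shelf downstream model (e.g., a speech recognizer), $D$ an adversary trying to recover the sensitive attribute $s$ (identity, emotion, gender), $y$ the authorized task label, and $\lambda$ the utility-privacy trade-off. The obfuscator keeps the authorized task working while pushing the adversary toward chance-level performance.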
Audio-Visual Representation Learning

The Paradox of Misalignment: Turning False Negatives into Semantic Anchors

Contrastive learning is an effective way to learn representations without labels. Yet conventional contrastive learning on multimodal data, such as surveillance, suffers from false negatives. When two ambulances pass at different times during the night, the temporal coherence constraint used by traditional contrastive learning treats them as unrelated events. These false negatives lead to inefficient and ineffective representation learning.

In urban surveillance, spatiotemporal discontinuity is the norm, not the exception. A siren is often heard before the ambulance appears; a crash sound precedes the visual collision. By rigidly enforcing temporal alignment, traditional models discard these meaningful but asynchronous correlations as false negatives, actively unlearning the causal structure of reality. Furthermore, traditional methods suffer from an information bottleneck: by relying on a single positive pair (the exact timestamp), the model learns only the minimal sufficient features needed to match that pair, discarding rich semantic details essential for generalization.

Instead of treating asynchronous signals as errors to be filtered, I utilized them as semantic anchors. I developed the Embedding-based Pair Generation (EPG) mechanism, which operates on a simple premise: if two signals share high similarity in the latent space, they belong to the same event regardless of their timestamp.

- Dynamic Pair Re-evaluation: EPG actively retrieves these “misaligned” samples from the memory bank and re-labels them as positive pairs.
- Multi-Positive Contrastive Loss: By forcing the model to recognize multiple, time-scattered instances of the same event, we break the information bottleneck. This compels the encoder to capture richer, more robust features rather than just the minimal cues needed for temporal alignment.

This approach successfully transformed the chaotic characteristics of surveillance data from a performance bottleneck into a source of data augmentation.

- Performance: Achieved a 10% improvement over state-of-the-art baselines (TACMA, MAViL) in audio-visual event localization.
- Rich in Information: The proposed EPG and the multi-positive loss force the model to capture dense, semantic features. The learnt representation is general-purpose, successfully powering multiple downstream tasks including event localization, anomaly detection, and query-guided event search without retraining.
- Scalability: Such versatility dramatically improves the scalability of edge deployments. Instead of installing separate, heavy models for each function, a single lightweight encoder can now serve multiple analytical tasks simultaneously.

The results have been published in Frontiers in Robotics and AI (1st author).
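To illustrate the EPG idea, here is a hedged PyTorch sketch (not the published implementation): memory-bank entries whose embeddings are highly similar to the anchor are re-labeled as extra positives, and a multi-positive contrastive loss averages over all of them. The similarity threshold and temperature are placeholder values.

```python
# Illustrative sketch of embedding-based pair generation with a multi-positive loss.
import torch
import torch.nn.functional as F

def epg_multi_positive_loss(anchor, aligned_pos, memory_bank, sim_thresh=0.8, temperature=0.07):
    """anchor: (D,), aligned_pos: (D,), memory_bank: (N, D) embeddings."""
    anchor = F.normalize(anchor, dim=0)
    aligned_pos = F.normalize(aligned_pos, dim=0)
    bank = F.normalize(memory_bank, dim=1)

    # Dynamic pair re-evaluation: near-duplicates in latent space become positives.
    sims = bank @ anchor                                            # cosine similarity per entry
    extra_pos_mask = sims > sim_thresh

    candidates = torch.cat([aligned_pos.unsqueeze(0), bank], dim=0)  # (1+N, D)
    pos_mask = torch.cat([torch.tensor([True]), extra_pos_mask])     # first slot = aligned pair

    logits = (candidates @ anchor) / temperature
    log_prob = logits - torch.logsumexp(logits, dim=0)
    return -(log_prob[pos_mask]).mean()                              # average over all positives
```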
Transferability Assessment

Navigating Without a Map: The “Source-Free” Compass

One of the greatest hurdles in large-scale deployment is domain shift: a model trained on a sunny day in Ghent often fails miserably on a rainy night in Taipei. In an ideal world, we would access the original training data (source data) to bridge this gap, and well-annotated target data during evaluation. But in the real world, strict privacy regulations (like GDPR) often lock this data away. Engineers are forced to deploy models into new, unseen environments while effectively flying blind, unable to predict which model will survive the shift.

To solve this, I leveraged the underexplored potential of an unlikely guide: randomness. I proposed a novel assessment framework using Randomly Initialized Neural Networks (RINNs). My research revealed that while random networks contain no knowledge, their statistical structure provides a consistent, unbiased “universal ruler.” By measuring the Centered Kernel Alignment (CKA) between a pre-trained model and a set of random networks, I derived a “fingerprint” of the model’s structural adaptability. This allows us to assess a model’s compatibility with a new environment without ever touching the restricted source data or requiring ground-truth labels. It turns model selection from a guessing game into a precise science.

- Task-Agnostic Validation: I validated this metric across a spectrum of real-world surveillance tasks, ranging from object tagging and event classification to the more abstract anomaly detection.
- High Correlation: Evaluated on diverse real-world datasets, my metric achieved a Kendall’s $\tau$ correlation of 0.95 with actual model performance.
- Operational Efficiency: It acts as a “Source-Free Compass,” allowing engineers to instantly identify the best-suited model for a specific camera feed before deployment, ensuring reliability while strictly respecting data sovereignty.

The results have been published in Sensors (1st author).
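A minimal sketch of the fingerprinting step, assuming features have already been extracted on unlabeled target-domain samples; the linear-CKA form and the aggregation into a ranking are illustrative simplifications of the published method.

```python
# Hedged sketch of the "source-free compass": compare a candidate model's target-domain
# features against features from randomly initialized reference networks via linear CKA.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """X: (n, d1), Y: (n, d2) feature matrices computed on the same n target samples."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

def rinn_fingerprint(model_feats: np.ndarray, random_net_feats: list[np.ndarray]) -> np.ndarray:
    """CKA of the pre-trained model against each randomly initialized reference network."""
    return np.array([linear_cka(model_feats, R) for R in random_net_feats])

# Ranking candidate models by (an aggregation of) their fingerprints lets us pick the most
# adaptable model for a new camera feed without labels and without the source data.
```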

SensCity - Acoustic Surveillance in the Real World


SensCity x AsaSense: Critical Analysis of Urban Acoustic Surveillance

A strategic research collaboration with the SensCity project (AsaSense), utilizing city-scale raw acoustic data to expose the failure modes of standard surveillance models and propose context-aware architectural solutions.

The Research Gap & Motivation

Why “Off-the-Shelf” Fails in the Wild: Most acoustic surveillance systems are validated on clean, curated datasets. However, their performance on raw, unprocessed urban audio remains largely unverified.

Our Mission: In collaboration with AsaSense, we accessed a unique stream of continuous, uncurated audio from Ghent and Rotterdam. Instead of simply deploying a standard model, our goal was to stress-test the two dominant paradigms (anomaly detection and sound tagging), identify why they fail in dynamic environments (e.g., temporal drift, open-set events), and propose robust alternatives.

Operational Context (The SensCity Testbed)

This project leveraged a real-world infrastructure to diagnose algorithmic limitations:

- Raw Data Ingestion: Unlike academic datasets, the SensCity sensor network captures the “messy” reality of cities across two years: wind noise, overlapping soundscapes, and non-stationary backgrounds. Most importantly, it comes without any annotations.
- System Audit: We applied state-of-the-art anomaly detection and sound tagging models to this raw stream. The analysis revealed that global models generate unmanageable false alarms due to contextual blindness (e.g., treating a weekend market as an anomaly because the model only knew weekday traffic), causing operator fatigue and ultimately system failure.
- Core Conclusion: Our experiments conclusively showed that a single global model is insufficient for city-scale deployment. Instead, Context-Specific Modeling (sensor-specific baselines) is a prerequisite for operational reliability.
- Proposed Resolution: Based on these findings, we formulated a Context-Aware Design Framework, advocating sensor-specific baselines and adaptive thresholding to handle the inherent variance of city life.

Core Methodologies

- Data Source: High-fidelity, long-term raw acoustic logs from the AsaSense deployment (Ghent & Rotterdam).
- Diagnosis Method: Cross-context evaluation (spatial & temporal domain shift).
- Algorithmic Focus: Unsupervised deep autoregressive modeling (WaveNet) vs. pre-trained tagging models.
- Architecture Design: Feasibility analysis of hybrid edge-cloud pipelines to mitigate bandwidth bottlenecks.

Technical Analysis & Innovations

1. Diagnosing the “Generalization Fallacy”

The Problem: We demonstrated that state-of-the-art anomaly detectors suffer from severe concept drift. A model trained on “winter data” failed catastrophically during summer evenings due to changed human activity patterns.

The Solution: Proposed a Context-Specific Modeling approach, proving that training lightweight, dedicated models for each sensor location significantly outperforms a massive, generic global model in anomaly retrieval.

2. The Limits of Semantic Tagging

The Finding: Standard sound taggers (trained on AudioSet) struggle with the open-set nature of cities. They force novel urban sounds into rigid, pre-defined categories, leading to semantic misalignment.

The Proposal: Suggested moving from “rigid classification” to “unsupervised deviation detection” at the edge, using tagging only as a secondary enrichment layer in the cloud rather than as a primary filter (a simplified sketch of such edge screening follows below).
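The sketch below illustrates the kind of lightweight, sensor-specific screening this proposal implies: a "Filter-then-Forward" edge node that keeps its own rolling baseline and forwards only deviating clips. The window size, warm-up length, and deviation multiplier are placeholder choices, not values from the study.

```python
# Illustrative sketch of sensor-specific, adaptive-threshold screening at the edge.
from collections import deque
import numpy as np

class EdgeScreener:
    def __init__(self, window: int = 1000, k: float = 3.0):
        self.scores = deque(maxlen=window)   # recent novelty / reconstruction scores for THIS sensor
        self.k = k

    def is_anomalous(self, score: float) -> bool:
        """Adaptive, context-specific threshold: mean + k * std of this sensor's own history."""
        if len(self.scores) < 50:            # warm-up: just collect a local baseline
            self.scores.append(score)
            return False
        mu, sigma = np.mean(self.scores), np.std(self.scores)
        self.scores.append(score)
        return score > mu + self.k * sigma   # forward the clip to the cloud only if True
```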
3. Architectural Scalability (Edge vs. Cloud)

Analysis: Analyzed the trade-off between transmission cost and detection latency.

Recommendation: Proposed a “Filter-then-Forward” architecture where edge nodes perform lightweight unsupervised screening, transmitting only potential anomalies to the cloud. This reduces bandwidth consumption by orders of magnitude while preserving privacy.

Outcomes & Impact

- Empirical Evidence: Provided one of the first comprehensive studies of the limitations of transfer learning in acoustic surveillance using real-world, longitudinal data.
- Design Guidelines: The findings established the foundation for privacy-preserved and adaptive surveillance, directly influencing the design of subsequent research on privacy in surveillance.
- Strategic Value: Delivered critical insights to the industrial partner (AsaSense) on avoiding “technical debt” by pivoting from global models to adaptive, edge-based learning.

Resources

- Chapter 2: The AsaSense Project - Detailed analysis of deployment constraints and algorithmic failures.

Multimodal Driver Monitoring & Temporal Face Analysis


Multimodal Driver Safety System & Robust Face Analysis

A holistic driver monitoring framework developed with ARTC, fusing visual temporal dynamics and ECG signals to enable early anomaly detection and proactive safety intervention.

The Research Gap & Motivation

From Passive Recording to Proactive Intervention: Standard recognition models often fail in real-world cockpits due to inter-personal variability. A generic model struggles to distinguish between a driver’s natural features (e.g., droopy eyelids) and genuine fatigue.

Our Goal: To build a safety-critical system capable of detecting compromised driver states early by combining non-intrusive visual monitoring with physiological signals (ECG), reducing false alarms and ensuring timely intervention.

Operational User Scenario (How it Works)

To address the variability mentioned above, the system operates in a three-stage safety loop:

- Initialization (The “Handshake”): When the driver starts the car, the system silently records a short “calibration sequence” to learn their current appearance (e.g., wearing sunglasses, heavy makeup, or visible fatigue). This establishes a Personalized Normal Driving Model (PNDM) for the specific trip.
- Dynamic Monitoring: As the vehicle moves through changing environments (e.g., entering a dark tunnel or facing high-beam glare), the alignment-free visual descriptor maintains robust tracking without being confused by lighting shifts.
- Proactive Intervention: If the driver shows signs of drowsiness (e.g., prolonged eye closure) AND the ECG sensor detects physiological fatigue, the system triggers a multi-stage alert: first warning the driver, and in critical cases notifying fleet management or emergency services (a simplified sketch of this trigger logic follows below).

Core Methodologies

- Visual Algorithms: Temporal Coherent Face Descriptor (alignment-free, robust to lighting).
- System Integration: Multimodal sensor fusion (Vision + ECG).
- Modeling Strategy: Sparse representation-based classification with online dictionary learning.
- Validation: Co-developed and tested with the Automotive Research & Testing Center (ARTC).

Technical Architecture & Innovations

1. Personalized Calibration (User-Centric Design)

The Problem: Drivers look different every day. Pre-trained generic models fail when users change their appearance.

The Solution: Implemented a rapid initialization phase that builds a dynamic baseline for each trip. The algorithm detects anomalies based on relative deviation from this baseline, effectively filtering out noise from accessories or facial structure.

2. Robust Temporal Modeling (Visual Subsystem)

Alignment-Free: By leveraging temporal consistency across continuous frames, we eliminated the need for fragile face alignment steps, ensuring stability even under rapid head movements.

Lighting Invariance: Utilized intensity contrast descriptors to maintain accuracy in challenging lighting conditions (e.g., nighttime driving, validated on the NCKU-driver database).

3. Proactive Safety Trigger (System Level)

Multimodal Logic: Designed the visual module to work in tandem with ECG sensors. While ECG detects physiological drops in alertness, the visual module confirms behavioral lapses (e.g., nodding off).

Impact: This cross-verification significantly reduces false positives, ensuring that alerts are only triggered for genuine safety risks.

Outcomes & Validation

- Industry Collaboration: Co-developed with ARTC.
- Award-Winning: Secured second place at the International ICT Innovative Services Awards.
- Performance: Achieved real-time performance and superior accuracy over state-of-the-art baselines in nighttime scenarios.
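As a simplified illustration of the per-trip calibration and the vision-plus-ECG trigger: the published system uses sparse representation-based classification, so the distance-based baseline below is only a stand-in for the relative-deviation idea, and the deviation factor is a placeholder.

```python
# Simplified sketch of per-trip calibration and the multimodal alert trigger.
import numpy as np

class TripMonitor:
    def __init__(self, deviation_factor: float = 2.5):
        self.baseline = None            # personalized normal-driving descriptor (PNDM)
        self.spread = None
        self.factor = deviation_factor

    def calibrate(self, descriptors: np.ndarray):
        """Learn this trip's appearance from the short calibration sequence, shape (n, d)."""
        self.baseline = descriptors.mean(axis=0)
        self.spread = descriptors.std(axis=0).mean() + 1e-6

    def visual_lapse(self, descriptor: np.ndarray) -> bool:
        """Flag frames that deviate strongly from the trip-specific baseline."""
        return np.linalg.norm(descriptor - self.baseline) / self.spread > self.factor

    def should_alert(self, descriptor: np.ndarray, ecg_fatigue: bool) -> bool:
        """Cross-verification: alert only when vision AND physiology agree."""
        return bool(self.visual_lapse(descriptor)) and ecg_fatigue
```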
Resources

Publications:

- Wei-Cheng Wang, Ru-Yun Hsu, Chun-Rong Huang, Li-You Syu (2015). Video gender recognition using temporal coherent face descriptor. IEEE/ACIS SNPD 2015.
- Chien-Yu Chiou, Wei-Cheng Wang, Shueh-Chou Lu, Chun-Rong Huang, Pau-Choo Chung, Yun-Yang Lai (2019). Driver Monitoring Using Sparse Representation With Part-Based Temporal Face Descriptors. IEEE Transactions on Intelligent Transportation Systems.

Research Engineering

"Where theoretical rigor meets production constraints."

This section showcases my work in translating complex research algorithms into robust, deployable systems. Here, the focus is on performance, reliability, and architectural precision.