Challenge
The client operates three production plants with more than 600 CNC machines, presses, and assembly robots. Maintenance was calendar-based: parts were replaced every N hours regardless of actual wear. Even so, unplanned breakdowns caused an average of 340 hours of production downtime per month across all plants, and each hour of downtime on the main press line costs roughly €4,200. The existing SCADA system collected data but offered no analytics beyond threshold alarms.
Solution
We built a sensor data platform that ingests, processes, and analyzes machine telemetry to predict failures before they happen:
- Edge gateways collecting vibration, temperature, current, and pressure data from 2,400 sensors via MQTT
- Stream processing with Apache Flink for real-time anomaly detection (rolling statistical models per machine type)
- Time-series storage on TimescaleDB with automatic downsampling (raw data retained for 90 days, aggregates kept indefinitely)
- ML models (gradient boosting) trained on 18 months of historical failure data to predict the remaining useful life of critical components
- Operator dashboard (React + Grafana embeds) with machine health scores, maintenance recommendations, and shift planning integration
- Alert routing to the right maintenance crew based on failure type, machine location, and crew skill matrix
- Deployed on GCP (GKE) with edge compute at each plant for low-latency local processing
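The case study does not specify which rolling statistics the Flink jobs compute. The following is a minimal sketch, assuming a rolling z-score keyed by machine type and sensor, written in plain Python rather than against Flink's API. `Reading`, `RollingZScoreDetector`, and the window size, threshold, and warm-up values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
import math


@dataclass
class Reading:
    # Hypothetical telemetry record; labels such as "press" are illustrative.
    machine_type: str
    sensor: str
    value: float


class RollingZScoreDetector:
    """Hypothetical per-key anomaly detector (not the project's actual code).

    Keeps the last `window` values for each (machine_type, sensor) key and
    flags a new reading whose distance from the rolling mean exceeds
    `threshold` sample standard deviations.
    """

    def __init__(self, window: int = 100, threshold: float = 4.0, min_samples: int = 30):
        self.window = window
        self.threshold = threshold
        self.min_samples = min_samples  # warm-up: never flag before this many values
        self._history = defaultdict(lambda: deque(maxlen=self.window))

    def update(self, r: Reading) -> bool:
        hist = self._history[(r.machine_type, r.sensor)]
        anomalous = False
        if len(hist) >= self.min_samples:
            n = len(hist)
            mean = sum(hist) / n
            std = math.sqrt(sum((x - mean) ** 2 for x in hist) / (n - 1))
            # Skip the test when the window is perfectly flat (std == 0).
            if std > 0 and abs(r.value - mean) / std > self.threshold:
                anomalous = True
        # The new value enters the window whether or not it was flagged.
        hist.append(r.value)
        return anomalous
```

Keying on machine type lets machines of the same kind share one baseline, which matches the "per machine type" models described above; a production version would hold this state in the stream processor's keyed state rather than an in-memory dict, and might exclude flagged values from the window so that a sustained fault does not become the new baseline.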
Results
- 41% reduction in unplanned downtime (340h/month down to 201h/month across all plants)
- €1.8M annual savings in reduced downtime costs and optimized spare parts inventory
- 2,400 sensors streaming data in real time with sub-second ingestion latency
- 14-day average advance warning before critical failures (vs. no warning previously)
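The 41% figure follows directly from the monthly downtime numbers above:

```latex
\frac{340\,\text{h} - 201\,\text{h}}{340\,\text{h}} = \frac{139}{340} \approx 0.409 \approx 41\%
```

That is roughly 139 fewer hours of unplanned downtime per month across the three plants.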