Capturing and analyzing sensor data in real time, finding and matching risk patterns
Modern open-source technologies are brilliant in their variety, functionality and ease of use. There are plenty of ways to build conceptually new business solutions that help win the competition or even take the lead. From that perspective, one interesting solution class is Real-Time Sensor Data Analysis, the concept I’m going to describe in this blog.
A transportation company has decided to implement an advanced solution for preventive accident monitoring. The solution would collect real-time statistics from sensors installed in trucks, match them against risk patterns on the backend, and show a dashboard with the overall risk level and the top 10 risk cases. These cases could then be handled by an automated system that warns the driver, or by operators who take immediate measures (calling the driver and suggesting a break, for example).
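The dashboard summary described above can be sketched in a few lines of Python. This is only an illustration: the function name `dashboard_summary`, the truck IDs and the scores are hypothetical, and taking the overall risk as the plain average of per-truck scores is my assumption, not a settled design.

```python
import heapq
from statistics import mean


def dashboard_summary(risk_by_truck, top_n=10):
    """Return the overall risk level and the top_n highest-risk trucks.

    risk_by_truck maps a truck ID to a risk score between 0 and 1.
    Assumption: overall risk is the mean of all per-truck scores.
    """
    if not risk_by_truck:
        return {"overall_risk": 0.0, "top_cases": []}
    overall = mean(risk_by_truck.values())
    # nlargest avoids sorting the whole fleet just to get the top few cases
    top_cases = heapq.nlargest(
        top_n, risk_by_truck.items(), key=lambda item: item[1]
    )
    return {"overall_risk": overall, "top_cases": top_cases}


# Made-up scores for three trucks
scores = {"truck-001": 0.12, "truck-002": 0.87, "truck-003": 0.45}
summary = dashboard_summary(scores, top_n=2)
print(summary["top_cases"])
```

With 1000 trucks a full sort would also be fast enough, so `heapq.nlargest` is a convenience here rather than a necessity.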
There are 1000 trucks across the country and 2500 drivers, each working a 40-50 hour week. The goal is to monitor risks and reduce the number of accidents in order to improve employee satisfaction and save on insurance premiums.
To reach our goal we are building a solution with speed and batch layers, with ML algorithms taking care of pattern generation and matching. Risk patterns shall be refreshed on a daily basis by a backend process. Matching needs to be integrated into the speed layer and respond as quickly as possible (within a few seconds).
The following data can be collected for risk pattern generation and matching:
- Real-time data from trucks (shift duration, time of last break, duration of last break, number of past breaks in shift, total shift mileage, day of the week, etc.),
- Statistics from past driver and truck accidents,
- Road traffic levels,
- Local weather conditions, and
- Personal information about the drivers, such as age, gender, total driver mileage, number of past accidents, etc.
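To make the real-time part of this list more concrete, here is a rough sketch of what a single reading from a truck could look like when it is streamed to the backend. The class name `TruckReading`, the field names, the units and the JSON encoding are all my assumptions for illustration; the actual schema would depend on the sensors and the transport chosen.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TruckReading:
    """One real-time reading sent by a truck endpoint (hypothetical schema)."""
    truck_id: str
    driver_id: str
    shift_duration_min: float        # time since the shift started
    minutes_since_last_break: float  # derived from the time of the last break
    last_break_duration_min: float
    breaks_in_shift: int
    shift_mileage_km: float
    day_of_week: int                 # 0 = Monday ... 6 = Sunday

    def to_json(self) -> str:
        """Serialize the reading for streaming to the backend."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "TruckReading":
        """Rebuild a reading on the backend side."""
        return cls(**json.loads(payload))


# Made-up values for one reading
reading = TruckReading("truck-001", "driver-042", 310.0, 125.0, 15.0, 2, 280.5, 4)
payload = reading.to_json()
restored = TruckReading.from_json(payload)
print(payload)
```

Slow-changing data such as driver history, traffic and weather would more likely be joined in on the backend than sent from the truck with every reading.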
From an ML perspective, the problem could be solved by applying Random Forest, Neural Network, or Nearest Neighbor algorithms. For this case I’ve chosen Nearest Neighbor (k-NN) for its simplicity, but the best approach in real life would be to experiment with all three and pick the best one in terms of speed and quality.
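A nearest-neighbor match can be sketched roughly as follows. This is a minimal illustration in plain Python, not the final implementation: the function name `knn_risk`, the two-feature vectors and the labels are made up, and I assume all features have already been scaled to a comparable range (otherwise mileage would dominate the distance).

```python
import math


def knn_risk(patterns, query, k=3):
    """Estimate risk as the share of risky patterns among the k nearest ones.

    patterns: list of (feature_vector, label) pairs, label 1 = risky, 0 = safe.
    query: feature vector of the current reading, same length as the patterns.
    """
    if not patterns:
        raise ValueError("at least one pattern is required")
    # Sort by Euclidean distance to the query and keep the k closest patterns
    nearest = sorted(patterns, key=lambda p: math.dist(p[0], query))[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Hypothetical patterns: (scaled shift duration, scaled time since last break)
patterns = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.7, 0.7], 1),
    ([0.2, 0.1], 0),
    ([0.1, 0.3], 0),
]

print(knn_risk(patterns, [0.85, 0.75]))  # long shift, long time without a break
print(knn_risk(patterns, [0.15, 0.20]))  # early in the shift
```

A linear scan like this is fine for a small pattern set; with many patterns, a spatial index or a library implementation would probably be needed to stay within the few-second response target.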
We need mobile endpoints in each truck that collect the required data and stream it to the backend. On the backend we need a speed-processing layer to ingest the stream and match the patterns, and a batch-processing layer for long-term storage, pattern generation and analytical dashboards.
The following would be the conceptual architecture of the solution.
To fit our open-source philosophy, mobile endpoint devices must be widely available, easily customizable, support the required interfaces, be open to new sensors and interfaces and, finally, be cheap to implement. A very good fit would be a Raspberry Pi running Linux, with the total cost of one endpoint under US$100. That would include the device itself, a plastic box, GPS and other sensors, a 3G modem and wiring.
// Due to an intensive assignment that eats up most of my weekends too, I was not able to finish this blog, but I will definitely get back to it soon. If you find this interesting and would like to have a chat and get more details on the implementation, please feel free to reach out via my LinkedIn page: https://www.linkedin.com/in/fgurbanov/
More to come…