Polisensio provides mobile air monitoring sensors that measure and track air pollution in urban areas in real time. Processing this volume of data requires a scalable pipeline that also delivers accurate forecasts and insights to policymakers, who in turn decide how to curb pollution.
Collecting, storing, processing and analyzing data from hundreds of mobile sensors is a major challenge for any company.
Traditional statistical models can easily overfit and break when analyzing such large amounts of information.
Complex interactions between factors such as temperature, humidity, and wind direction and speed can significantly affect the number of dust particles in the air, as well as their speed and trajectory. To provide highly accurate predictions, these factors need to be taken into account and combined with the data from the mobile sensors.
To model the complex interactions between environmental factors and air pollution, we extracted over 50 new features by combining sensor and weather data.
We then fed these features into a state-of-the-art machine learning model capable of handling hundreds of thousands of requests per second.
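As a rough illustration of what combining sensor and weather data into features can look like, here is a minimal Python sketch. It is not the project's actual feature set: the record types, field names, nearest-timestamp matching, and the three derived features (wind components and a temperature-humidity interaction) are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class SensorReading:
    # Hypothetical record from one mobile sensor.
    timestamp: float  # seconds since epoch
    pm25: float       # measured fine particulate level


@dataclass
class WeatherObs:
    # Hypothetical weather observation.
    timestamp: float
    temperature_c: float
    humidity_pct: float
    wind_speed_ms: float
    wind_dir_deg: float  # direction in degrees, 0 = north, 90 = east


def nearest_weather(reading, observations):
    """Return the weather observation closest in time to the reading."""
    return min(observations, key=lambda o: abs(o.timestamp - reading.timestamp))


def build_features(reading, observations):
    """Join one sensor reading with weather data and derive extra features."""
    w = nearest_weather(reading, observations)
    theta = math.radians(w.wind_dir_deg)
    return {
        "pm25": reading.pm25,
        "temperature_c": w.temperature_c,
        "humidity_pct": w.humidity_pct,
        "wind_speed_ms": w.wind_speed_ms,
        # Split wind into east/north components so that 359 and 1 degrees
        # end up close together instead of far apart.
        "wind_east": w.wind_speed_ms * math.sin(theta),
        "wind_north": w.wind_speed_ms * math.cos(theta),
        # Simple interaction term between two environmental factors.
        "temp_x_humidity": w.temperature_c * w.humidity_pct,
    }
```

Encoding wind direction as two components rather than a raw angle is one common choice for circular quantities; a real pipeline would add many more features (lags, rolling averages, location) to reach the 50+ mentioned above.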
The final model can predict air pollution across a city for the next 24 hours, the next day, or up to a week ahead, with an error rate of 2.8 particles (for particulate matter of size 2.5 µm) and 9.7 particles (for larger particles) and a response time of under 2 seconds.
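The text does not say which error metric these figures refer to. As one possibility, the sketch below computes the mean absolute error between forecast and measured values; the function name and the choice of metric are assumptions for illustration only.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between measured and forecast values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

For example, `mean_absolute_error([10, 20, 30], [12, 17, 30])` averages the absolute errors 2, 3, and 0. Computing the metric separately per forecast horizon (24 hours, one week) would show how accuracy degrades further into the future.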