A machine learning approach for detection of non-revenue water
Abstract
Water is the second most important commodity for the existence of life on Earth. In Uganda, it is used for household consumption, energy production, firefighting and crop irrigation, among other purposes. Water utility companies face losses from Non-Revenue Water (NRW), which is caused by illegal use of water by some consumers, inaccurate meters, unbilled water use, and leakages before and after the meter. A number of research models have been proposed to reduce NRW, including Machine Learning classification techniques that improve the detection of NRW, thereby improving the quality of service provided by water utility companies and keeping water tariffs affordable to citizens. However, these models have been less effective in practice because they work well only with balanced data, where they optimize overall classification accuracy or a related measure. Real-world data sets are very large and highly imbalanced: fraudulent cases are significantly outnumbered by normal (healthy) cases. This research employed a generalized Machine Learning approach that used imbalanced customer water consumption data patterns to detect illegal water consumers, malfunctioning meters and leakages after the meter. The results show that, of the five proposed methods, Random Forest outperformed all the others. Random Forest is therefore the most suitable model for detecting NRW in a utility company with a highly imbalanced data set and a disorganized distribution network.
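To illustrate why overall classification accuracy can mislead when anomalous cases are heavily outnumbered, the following minimal sketch compares two hypothetical detectors on a toy label set. The data (1000 customers, 20 of them anomalous), the two prediction vectors, and the helper names `confusion_counts` and `summarize` are assumptions made purely for illustration; they are not the data set or the models used in this research.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary labelling."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive and t != positive:
            fp += 1
        elif p != positive and t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels: 20 anomalous customers (1), 980 normal customers (0).
y_true = [1] * 20 + [0] * 980

# Detector A (assumed): predicts "normal" for every customer.
always_normal = [0] * 1000

# Detector B (assumed): flags 15 of the 20 anomalies and raises 30 false alarms.
detector = [1] * 15 + [0] * 5 + [1] * 30 + [0] * 950

baseline = summarize(y_true, always_normal)
flagged = summarize(y_true, detector)

print("always normal:", baseline)
print("detector:     ", flagged)
```

In this toy setting the detector that never flags anyone attains the higher accuracy while finding none of the anomalous cases, which is why recall, precision or F1 on the minority class are usually more informative for imbalanced problems of this kind.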