Application of the Data Science methodology to analyze electricity billing data. Case study: Uruguay 2000-2022

DOI:

https://doi.org/10.15381/risi.v15i1.23544

Keywords:

Principal Components Analysis, Exploratory Data Analysis, K-Means Algorithm, K-NN Algorithm, Machine Learning, Linear Regression

Abstract

The objective of this research is to analyze electricity billing data in Uruguay for the period 2000-2022 using machine learning algorithms. The research is descriptive-explanatory, and the Data Science methodology is used to achieve this objective. The K-Means algorithm is used to group the data by type of client, and the K-NN algorithm is used to build a model that predicts the type of client for new records. The PCA technique is used to reduce the dimensionality of the data before applying the Linear Regression algorithm, which yields a model to predict the total electric energy billed for new records. The model obtained with K-Means generated one cluster for each type of customer, grouping the data perfectly. The model obtained with K-NN predicted whether a client was residential or non-residential with 100% accuracy. Combining the correlation analysis with the PCA analysis reduced the dimensionality to only three explanatory variables. The linear regression model had a high coefficient of determination (R²) of 0.981, and the residuals were normally distributed.
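
The four steps named in the abstract (K-Means grouping, K-NN classification, PCA reduction, linear regression) can be illustrated with a minimal sketch. This is not the authors' code or data: it uses NumPy on synthetic records, and the four feature columns, the sample sizes, the target definition, the choice of k = 5 neighbours, and the helper names `kmeans`, `knn_predict`, `pca`, and `fit_linear` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for billing records (NOT the paper's data).
# Four hypothetical numeric features, two well-separated customer types.
n = 100
residential = rng.normal(loc=[2.0, 1.0, 0.5, 1.5], scale=0.2, size=(n, 4))
non_residential = rng.normal(loc=[8.0, 6.0, 4.0, 7.0], scale=0.2, size=(n, 4))
X = np.vstack([residential, non_residential])
customer_type = np.array([0] * n + [1] * n)  # 0 = residential, 1 = non-residential
# Hypothetical target: total energy billed as a noisy sum of the features.
y = X.sum(axis=1) + rng.normal(scale=0.1, size=2 * n)


def kmeans(X, k, iters=50):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def knn_predict(X_train, y_train, X_new, k=5):
    """Majority vote among the k nearest training points (binary labels only)."""
    dists = np.linalg.norm(X_new[:, None] - X_train[None], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


def pca(X, n_components):
    """Project centred data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Xc @ vt[:n_components].T, explained[:n_components]


def fit_linear(Z, y):
    """Ordinary least squares with an intercept; returns coefficients and R^2."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


# 1. K-Means: group records without using the known customer type.
cluster_labels, _ = kmeans(X, k=2)
agreement = np.mean(cluster_labels == customer_type)
agreement = max(agreement, 1.0 - agreement)  # cluster ids are arbitrary

# 2. K-NN: predict the customer type of held-out records.
order = rng.permutation(2 * n)
train, test = order[:150], order[150:]
predicted = knn_predict(X[train], customer_type[train], X[test], k=5)
accuracy = np.mean(predicted == customer_type[test])

# 3. PCA down to three components, then linear regression on them.
Z, explained = pca(X, n_components=3)
coef, r2 = fit_linear(Z, y)

print(f"cluster/type agreement: {agreement:.3f}")
print(f"K-NN held-out accuracy: {accuracy:.3f}")
print(f"variance explained by 3 components: {explained.sum():.3f}")
print(f"R^2 of regression on components: {r2:.3f}")
```

Because the synthetic groups are far apart by construction, the clustering and classification scores here say nothing about real billing data; the sketch only shows how the steps connect. In practice a library such as scikit-learn offers these techniques as `KMeans`, `KNeighborsClassifier`, `PCA`, and `LinearRegression`.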

Published

2022-09-20

Section

Original Research Articles

How to Cite

[1] “Application of the Data Science methodology to analyze electricity billing data. Case study: Uruguay 2000-2022”, Rev.Investig.sist.inform., vol. 15, no. 1, pp. 127–138, Sep. 2022, doi: 10.15381/risi.v15i1.23544.