Note

  • This tutorial is also available on nbviewer, offering an alternative platform for your learning convenience.
  • Pandas Ninja: Take your skills to the next level with comprehensive Jupyter Notebook tutorials covering dates, group by, plotting, pivot tables, and more. Includes specialized tutorials for Excel and SQL developers, helping you master data analysis with Pandas.

Description:

As a data scientist at a facility management company, you’re tasked with analyzing energy consumption patterns across different buildings and floors. The dataset contains energy usage data for various devices (lights, ACs, elevators) on each floor of three buildings.

Your goal is to:

  • Reshape the data to facilitate analysis by building and device type.
  • Calculate the total energy consumption for each building and device type.
# import libraries
import pandas as pd
import numpy as np
import sys
 
print('Python version ' + sys.version)
print('Pandas version ' + pd.__version__)
print('Numpy version ' + np.__version__)
Python version 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]
Pandas version 2.2.1
Numpy version 1.26.4

The Data

The generated data simulates energy consumption readings for various devices across different floors in multiple buildings.

Here’s a breakdown of each column:

  • Building: The name of the building (e.g., “Building 1”, “Building 2”, “Building 3”)
  • Floor: The floor number within a building (e.g., “Floor 1”, “Floor 2”, …, “Floor 5”)
  • Device: The type of device consuming energy (e.g., “Lights”, “ACs”, “Elevators”)
  • Date: The date of the energy consumption reading (e.g., “2022-01-01”, “2022-01-02”, …, “2022-01-10”)
  • Energy_Usage: The amount of energy consumed by the device on a specific floor in a building on a particular date (a random value between 0 and 100)
# set the seed
np.random.seed(0)
 
buildings = ['Building 1', 'Building 2', 'Building 3']
floors = ['Floor {}'.format(i) for i in range(1, 4)]
devices = ['Lights', 'ACs', 'Elevators']
dates = pd.date_range('2022-01-01', '2022-01-10')
energy_usage = np.random.uniform(0, 100, size=(len(buildings) * len(floors) * len(devices) * len(dates)))
 
data = {
    'Building': np.repeat(buildings, len(floors) * len(devices) * len(dates)),
    'Floor': np.repeat(floors, len(floors) * len(devices) * len(dates)),
    'Device': np.repeat(devices, len(floors) * len(devices) * len(dates)),
    'Date': np.tile(dates, len(buildings) * len(floors) * len(devices)),
    'Energy_Usage': energy_usage
}
 
df = pd.DataFrame(data)
df
BuildingFloorDeviceDateEnergy_Usage
0Building 1Floor 1Lights2022-01-0154.881350
1Building 1Floor 1Lights2022-01-0271.518937
2Building 1Floor 1Lights2022-01-0360.276338
3Building 1Floor 1Lights2022-01-0454.488318
4Building 1Floor 1Lights2022-01-0542.365480
265Building 3Floor 3Elevators2022-01-0678.515291
266Building 3Floor 3Elevators2022-01-0728.173011
267Building 3Floor 3Elevators2022-01-0858.641017
268Building 3Floor 3Elevators2022-01-096.395527
269Building 3Floor 3Elevators2022-01-1048.562760

270 rows × 5 columns

Reshape the Data

This section shows various ways to reshape the data for analysis.

Using groupby and sum

The code below groups the data by four columns: Building, Floor, Device, and Date. The GroupBy function creates a GroupBy object that contains the grouped data. The sum method is then applied to calculate the sum of the energy consumption values for each group.

df.groupby(['Building', 'Floor', 'Device', 'Date']).sum()
Energy_Usage
BuildingFloorDeviceDate
Building 1Floor 1Lights2022-01-01496.682940
2022-01-02468.639934
2022-01-03547.168084
2022-01-04441.721382
2022-01-05300.984595
2022-01-06349.582977
2022-01-07370.379375
2022-01-08397.945201
2022-01-09569.918664
2022-01-10410.219459
Building 2Floor 2ACs2022-01-01505.124086
2022-01-02445.748871
2022-01-03417.843884
2022-01-04531.061226
2022-01-05504.501020
2022-01-06498.474526
2022-01-07391.778812
2022-01-08490.505067
2022-01-09580.186339
2022-01-10364.295473
Building 3Floor 3Elevators2022-01-01420.827066
2022-01-02398.878300
2022-01-03389.512838
2022-01-04425.916186
2022-01-05440.269651
2022-01-06457.216523
2022-01-07422.345950
2022-01-08371.467637
2022-01-09430.878622
2022-01-10505.755293

Using pivot_table

This line of code uses the pivot_table function to reshape the data. It sets the index as a combination of Building and Floor, and the columns as the different Device types. The values parameter specifies that the energy consumption values should be used, and the aggfunc parameter is set to ‘count’ to count the number of observations for each device type.

df.pivot_table(index=['Building', 'Floor'], columns='Device', values='Energy_Usage', aggfunc='count')
DeviceACsElevatorsLights
BuildingFloor
Building 1Floor 1NaNNaN90.0
Building 2Floor 290.0NaNNaN
Building 3Floor 3NaN90.0NaN

Similar to the previous cell, this code uses pivot_table to reshape the data. It sets the index as the Date column, and the columns as the different Device types. The values parameter specifies that the energy consumption values should be used, and the aggfunc parameter is set to ‘sum’ to calculate the total energy consumption for each device type on each date.

df.pivot_table(index=['Date'], columns='Device', values='Energy_Usage', aggfunc='sum')
DeviceACsElevatorsLights
Date
2022-01-01505.124086420.827066496.682940
2022-01-02445.748871398.878300468.639934
2022-01-03417.843884389.512838547.168084
2022-01-04531.061226425.916186441.721382
2022-01-05504.501020440.269651300.984595
2022-01-06498.474526457.216523349.582977
2022-01-07391.778812422.345950370.379375
2022-01-08490.505067371.467637397.945201
2022-01-09580.186339430.878622569.918664
2022-01-10364.295473505.755293410.219459

What other ways can you come up with to reshape the data?