Note
- This tutorial is also available on nbviewer, offering an alternative platform for your learning convenience.
- đ„ Free Pandas Course: https://hedaro.gumroad.com/l/tqqfq
Description:
Youâre a Data Analyst at a tourism board, tasked with analyzing visitor patterns for popular destinations. Your goal is to prepare and transform the data for insights.
Tasks:
- Task 1: Data Melting
Transform the dataset from wide format to long format, pivoting on the âQuarterâ column. - Task 2: Visitor Type Categories
Create a new column âVisitor_Categoryâ by mapping âDestinationâ to âLocalâ (Domestic) or âForeignâ (International). - Task 3: Destination Groups
Create a new column âRegionâ by grouping destinations into continents: Asia (Tokyo, Bangkok) Europe (Paris) Americas (New York) Oceania (Sydney)
Python version 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]
Pandas version 2.2.1
Numpy version 1.26.4
The Data
The dataset consists of 10,000 entries, representing tourist visits to popular destinations.
Columns:
- Destination: City visited (Paris, Tokyo, New York, Sydney, Bangkok)
- Age_Group: Visitor age range (18-24, 25-34, 35-44, 45-54, 55+)
- Q1, Q2, Q3, Q4: Visitors per quarter
- Revenue: Revenue generated
Destination | Age_Group | Q1 | Q2 | Q3 | Q4 | Revenue | |
---|---|---|---|---|---|---|---|
0 | Bangkok | 55+ | 76 | 75 | 85 | 97 | 679 |
1 | Paris | 25-34 | 71 | 92 | 69 | 83 | 900 |
2 | Sydney | 55+ | 51 | 19 | 55 | 12 | 555 |
3 | Sydney | 35-44 | 20 | 40 | 69 | 49 | 213 |
4 | Sydney | 25-34 | 74 | 66 | 68 | 46 | 227 |
Let us take a quick look at the data types.
We can see we have out 10,000 rows, no null values, and only the âDestinationâ and âAge_Groupâ columns are of type string.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Destination 10000 non-null object
1 Age_Group 10000 non-null object
2 Q1 10000 non-null int32
3 Q2 10000 non-null int32
4 Q3 10000 non-null int32
5 Q4 10000 non-null int32
6 Revenue 10000 non-null int32
dtypes: int32(5), object(2)
memory usage: 351.7+ KB
Task 1: Data Melting
Transform the dataset from wide format to long format, pivoting on the âQuarterâ column
This task is asking us to transform the Q1, Q2, Q3, Q4 columns into one column we can call âvisitorsâ. In order to accomplish this, pandas provides us with a method called pd.melt()
.
Parameters:
- id_vars - all of the columns to leave alone
- value_vars - all of the columns you want to melt/pivot
- var_name - what do you want to call the columns you are pivoting
- value_name - what do the values under the pivoting columns represent.
Destination | Age_Group | Revenue | Quarter | Visitors | |
---|---|---|---|---|---|
0 | Bangkok | 55+ | 679 | Q1 | 76 |
1 | Paris | 25-34 | 900 | Q1 | 71 |
2 | Sydney | 55+ | 555 | Q1 | 51 |
3 | Sydney | 35-44 | 213 | Q1 | 20 |
4 | Sydney | 25-34 | 227 | Q1 | 74 |
Task 2: Visitor Type Categories
Create a new column âVisitor_Categoryâ by mapping âDestinationâ to âLocalâ (Domestic) or âForeignâ (International).
I will first define what countries are Domerstic or International.
- Domestic = âNew Yorkâ
- International = âParisâ, âTokyoâ, âSydneyâ, âBangkokâ
Destination | Age_Group | Revenue | Quarter | Visitors | Visitor_Category | |
---|---|---|---|---|---|---|
0 | Bangkok | 55+ | 679 | Q1 | 76 | International |
1 | Paris | 25-34 | 900 | Q1 | 71 | International |
2 | Sydney | 55+ | 555 | Q1 | 51 | International |
3 | Sydney | 35-44 | 213 | Q1 | 20 | International |
4 | Sydney | 25-34 | 227 | Q1 | 74 | International |
5 | Tokyo | 45-54 | 438 | Q1 | 52 | International |
6 | Sydney | 55+ | 921 | Q1 | 60 | International |
7 | New York | 18-24 | 354 | Q1 | 98 | Domestic |
Task 3: Destination Groups
Create a new column âRegionâ by grouping destinations into continents:
- Asia (Tokyo, Bangkok)
- Europe (Paris)
- Americas (New York)
- Oceania (Sydney)
We can use the same strategy from Task #2.
Destination | Age_Group | Revenue | Quarter | Visitors | Visitor_Category | Region | |
---|---|---|---|---|---|---|---|
0 | Bangkok | 55+ | 679 | Q1 | 76 | International | Asia |
1 | Paris | 25-34 | 900 | Q1 | 71 | International | Europe |
2 | Sydney | 55+ | 555 | Q1 | 51 | International | Oceania |
3 | Sydney | 35-44 | 213 | Q1 | 20 | International | Oceania |
4 | Sydney | 25-34 | 227 | Q1 | 74 | International | Oceania |
Summary
You just went through a Pandas tutorial that walked you through data analysis of tourism trends. It covered transforming data from wide to long format, creating visitor categories like Domestic and International, and grouping destinations by continent like Asia, Europe, Americas, and Oceania. The tutorial used Pandas functions like pd.melt()
and map()
to get the job done, giving you insights into tourist visit patterns.
Key Takeaways
- Data transformation using
pd.melt()
. - Creating new columns using mapping.
- Data categorization using the
map()
function.