Note

Description:

You’re a Data Analyst at a tourism board, tasked with analyzing visitor patterns for popular destinations. Your goal is to prepare and transform the data for insights.

Tasks:

  • Task 1: Data Melting
    Transform the dataset from wide format to long format, pivoting on the ‘Quarter’ column.
  • Task 2: Visitor Type Categories
    Create a new column ‘Visitor_Category’ by mapping ‘Destination’ to ‘Local’ (Domestic) or ‘Foreign’ (International).
  • Task 3: Destination Groups
    Create a new column ‘Region’ by grouping destinations into continents: Asia (Tokyo, Bangkok) Europe (Paris) Americas (New York) Oceania (Sydney)
# import libraries
import pandas as pd
import numpy as np
import sys
 
print('Python version ' + sys.version)
print('Pandas version ' + pd.__version__)
print('Numpy version ' + np.__version__)
Python version 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]
Pandas version 2.2.1
Numpy version 1.26.4

The Data

The dataset consists of 10,000 entries, representing tourist visits to popular destinations.

Columns:

  • Destination: City visited (Paris, Tokyo, New York, Sydney, Bangkok)
  • Age_Group: Visitor age range (18-24, 25-34, 35-44, 45-54, 55+)
  • Q1, Q2, Q3, Q4: Visitors per quarter
  • Revenue: Revenue generated
# set the seed
np.random.seed(0)
 
# destination options
destinations = ['Paris', 'Tokyo', 'New York', 'Sydney', 'Bangkok']
 
# age groups
age_groups = ['18-24', '25-34', '35-44', '45-54', '55+']
 
# quarter options
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
 
# generate data
data = {
    'Destination': np.random.choice(destinations, size=10000),
    'Age_Group': np.random.choice(age_groups, size=10000),
    'Q1': np.random.randint(1, 100, size=10000),
    'Q2': np.random.randint(1, 100, size=10000),
    'Q3': np.random.randint(1, 100, size=10000),
    'Q4': np.random.randint(1, 100, size=10000),
    'Revenue': np.random.randint(100, 1000, size=10000)
}
 
df = pd.DataFrame(data)
 
df.head()
DestinationAge_GroupQ1Q2Q3Q4Revenue
0Bangkok55+76758597679
1Paris25-3471926983900
2Sydney55+51195512555
3Sydney35-4420406949213
4Sydney25-3474666846227

Let us take a quick look at the data types.

We can see we have out 10,000 rows, no null values, and only the “Destination” and “Age_Group” columns are of type string.

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Destination  10000 non-null  object
 1   Age_Group    10000 non-null  object
 2   Q1           10000 non-null  int32 
 3   Q2           10000 non-null  int32 
 4   Q3           10000 non-null  int32 
 5   Q4           10000 non-null  int32 
 6   Revenue      10000 non-null  int32 
dtypes: int32(5), object(2)
memory usage: 351.7+ KB

Task 1: Data Melting

Transform the dataset from wide format to long format, pivoting on the ‘Quarter’ column

This task is asking us to transform the Q1, Q2, Q3, Q4 columns into one column we can call “visitors”. In order to accomplish this, pandas provides us with a method called pd.melt().

Parameters:

  • id_vars - all of the columns to leave alone
  • value_vars - all of the columns you want to melt/pivot
  • var_name - what do you want to call the columns you are pivoting
  • value_name - what do the values under the pivoting columns represent.
# columns to keep
keep_cols = ["Destination","Age_Group","Revenue"]
 
new_df = pd.melt(df, id_vars=keep_cols, var_name="Quarter", value_name="Visitors")
new_df.head()
DestinationAge_GroupRevenueQuarterVisitors
0Bangkok55+679Q176
1Paris25-34900Q171
2Sydney55+555Q151
3Sydney35-44213Q120
4Sydney25-34227Q174

Task 2: Visitor Type Categories

Create a new column ‘Visitor_Category’ by mapping ‘Destination’ to ‘Local’ (Domestic) or ‘Foreign’ (International).

I will first define what countries are Domerstic or International.

  • Domestic = ‘New York’
  • International = ‘Paris’, ‘Tokyo’, ‘Sydney’, ‘Bangkok’
# map the values
mapping = {**dict.fromkeys(['Paris', 'Tokyo', 'Sydney', 'Bangkok'], 'International'), 
           **dict.fromkeys(['New York'], 'Domestic')}
 
# apply the map to the Destination column
new_df['Visitor_Category'] = new_df['Destination'].map(mapping)
new_df.head(8)
DestinationAge_GroupRevenueQuarterVisitorsVisitor_Category
0Bangkok55+679Q176International
1Paris25-34900Q171International
2Sydney55+555Q151International
3Sydney35-44213Q120International
4Sydney25-34227Q174International
5Tokyo45-54438Q152International
6Sydney55+921Q160International
7New York18-24354Q198Domestic

Task 3: Destination Groups

Create a new column ‘Region’ by grouping destinations into continents:

  • Asia (Tokyo, Bangkok)
  • Europe (Paris)
  • Americas (New York)
  • Oceania (Sydney)

We can use the same strategy from Task #2.

# map the values
mapping = {**dict.fromkeys(['Tokyo', 'Bangkok'], 'Asia'), 
           **dict.fromkeys(['Paris'], 'Europe'),
           **dict.fromkeys(['New York'], 'Americas'),
           **dict.fromkeys(['Sydney'], 'Oceania')}
 
# apply the map to the Destination column
new_df['Region'] = new_df['Destination'].map(mapping)
new_df.head()
DestinationAge_GroupRevenueQuarterVisitorsVisitor_CategoryRegion
0Bangkok55+679Q176InternationalAsia
1Paris25-34900Q171InternationalEurope
2Sydney55+555Q151InternationalOceania
3Sydney35-44213Q120InternationalOceania
4Sydney25-34227Q174InternationalOceania

Summary

You just went through a Pandas tutorial that walked you through data analysis of tourism trends. It covered transforming data from wide to long format, creating visitor categories like Domestic and International, and grouping destinations by continent like Asia, Europe, Americas, and Oceania. The tutorial used Pandas functions like pd.melt() and map() to get the job done, giving you insights into tourist visit patterns.

Key Takeaways

  • Data transformation using pd.melt().
  • Creating new columns using mapping.
  • Data categorization using the map() function.