Pattern Name: Herd Pattern
Description: Group data, apply aggregate functions
Pattern Structure:
    group = data.groupby(column(s))
    result = group[column].agg_func()
    # or
    result = group.agg(
        new_col1 = pd.NamedAgg(column='column_name',aggfunc=agg_func1),
        new_col2 = pd.NamedAgg(column='column_name',aggfunc=agg_func2),
        ...
    )

Pattern Name: Cherrypick Pattern
Description: Group data, apply aggregate functions, extract top/bottom N rows
Pattern Structure:
    group = data.groupby(column(s))
    result = group.apply(lambda x: x.agg(agg_func).nlargest(n))

Pattern Name: Fence Pattern
Description: Group data, apply aggregate functions, order results, select top N rows
Pattern Structure:
    group = data[[column(s)]].groupby(column_name)
    result = group.agg(agg_func).sort_values(by=column_name).head(n)

Pattern Name: Bin Pattern
Description: Divide continuous data into discrete categories
Pattern Structure:
    bins = [lower_bound, mid_bound, upper_bound]
    labels = [label1, label2, label3]
    binned_data = pd.cut(data[column], bins=bins, labels=labels)

Pattern Name: Gather Pattern
Description: Apply custom function to each row or column
Pattern Structure:
    column_wise_result = df['column'].apply(custom_function)
    row_wise_result = df.apply(custom_function, axis=1)

Pattern Name: Mark Pattern
Description: Categorize data based on keywords in text column
Pattern Structure:
    mapping = {**dict.fromkeys(list_of_keywords1, category1), ...}
    def categorize(val): ...
    df[new_column] = df[text_column].apply(categorize)

Pattern Name: Map Pattern
Description: Map specific values in a column to new categorical values
Pattern Structure:
    mapping = {**dict.fromkeys([value1, value2, ...], category1), ...}
    df[new_column] = df[original_column].map(mapping)

Pattern Name: Graze Pattern
Description: Group sorted data, apply aggregation function to another column
Pattern Structure:
    group = data.groupby(column(s))
    data[new_column] = group[column_to_transform].transform(agg_func)

Herd Pattern

Group data by one or more columns and apply one or more aggregate functions to a selected column.

Pattern Structure:

# 1. Group data
group = data.groupby(column(s))

# 2a. Apply a single aggregate function to one column
result = group[column].agg_func()

# 2b. Or apply several named aggregate functions in one call
result = group.agg(
    new_col1 = pd.NamedAgg(column='column_name',aggfunc=agg_func1),
    new_col2 = pd.NamedAgg(column='column_name',aggfunc=agg_func2),
    ...)

Example Usage:

import pandas as pd
 
patient_data = pd.DataFrame({
    'Diagnosis': ['A', 'B', 'A', 'C', 'B', 'A'],
    'Length of Stay': [5, 3, 7, 2, 4, 6]
})
 
# Get average length of stay by diagnosis
group = patient_data.groupby('Diagnosis')
avg_stay = group['Length of Stay'].mean()

import pandas as pd
 
patient_data = pd.DataFrame({
    'Diagnosis': ['A', 'B', 'A', 'C', 'B', 'A'],
    'Length of Stay': [5, 3, 7, 2, 4, 6]
})
 
# Get multiple statistics for length of stay by diagnosis
group = patient_data.groupby('Diagnosis')
stats = group.agg(
    count_diagnosis = pd.NamedAgg(column='Diagnosis', aggfunc='count'),
    min_len_of_stay = pd.NamedAgg(column='Length of Stay', aggfunc='min'),
    max_len_of_stay = pd.NamedAgg(column='Length of Stay', aggfunc='max')
)
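
The examples above group by a single column, while the pattern also covers grouping by more than one. The following is a minimal sketch of that case, not part of the original examples: the 'Ward' column and its values are hypothetical, added only to give the data a second grouping key, and the output column names are illustrative.

```python
import pandas as pd

# Hypothetical data: the 'Ward' column is an assumption added for illustration
patient_data = pd.DataFrame({
    'Diagnosis': ['A', 'B', 'A', 'C', 'B', 'A'],
    'Ward': ['East', 'East', 'West', 'West', 'East', 'East'],
    'Length of Stay': [5, 3, 7, 2, 4, 6]
})

# 1. Group by two columns by passing a list of column names
group = patient_data.groupby(['Diagnosis', 'Ward'])

# 2. Apply several named aggregations to the same column
summary = group.agg(
    n_patients=pd.NamedAgg(column='Length of Stay', aggfunc='count'),
    avg_len_of_stay=pd.NamedAgg(column='Length of Stay', aggfunc='mean'),
)

# 3. Move the group keys from the MultiIndex back into ordinary columns
summary = summary.reset_index()
print(summary)
```

Grouping by a list of columns produces one row per distinct combination of values that actually occurs in the data, so combinations that never appear (such as diagnosis C in the East ward here) are absent from the result. Calling reset_index() is optional; it is used here because a flat table is often easier to filter or merge afterwards.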