Pattern Name | Description | Pattern Structure |
---|---|---|
Herd Pattern | Group data, apply aggregate functions | `group = data.groupby(column(s))`<br>`result = group[column].agg_func()`<br>or<br>`result = group.agg(new_col1=pd.NamedAgg(column='column_name', aggfunc=agg_func1), new_col2=pd.NamedAgg(column='column_name', aggfunc=agg_func2), ...)` |
Cherrypick Pattern | Group data, apply aggregate functions, extract top/bottom N rows | `group = data.groupby(column(s))`<br>`result = group.apply(lambda x: x.agg(agg_func).nlargest(n))` |
Fence Pattern | Group data, apply aggregate functions, order results, select top N rows | `group = data[[column(s)]].groupby(column_name)`<br>`result = group.agg(agg_func).sort_values(by=column_name).head(n)` |
Bin Pattern | Divide continuous data into discrete categories | `bins = [lower_bound, mid_bound, upper_bound]`<br>`labels = [label1, label2, label3]`<br>`binned_data = pd.cut(data[column], bins=bins, labels=labels)` |
Gather Pattern | Apply custom function to each row or column | `column_wise_result = df['column'].apply(custom_function)`<br>`row_wise_result = df.apply(custom_function, axis=1)` |
Mark Pattern | Categorize data based on keywords in text column | `mapping = {**dict.fromkeys(list_of_keywords1, category1), ...}`<br>`def categorize(val): ...`<br>`df[new_column] = df[text_column].apply(categorize)` |
Map Pattern | Map specific values in a column to new categorical values | `mapping = {**dict.fromkeys([value1, value2, ...], category1), ...}`<br>`df[new_column] = df[original_column].map(mapping)` |
Graze Pattern | Group sorted data, apply aggregation function to another column | `group = data.groupby(column(s))`<br>`data[new_column] = group[column_to_transform].transform(agg_func)` |
Herd Pattern
Group data by one or more columns and apply one or more aggregate functions to a selected column.
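
As a minimal sketch of the Herd Pattern, the example below groups a small DataFrame and aggregates one column in both of the forms listed in the table. The `region` and `sales` columns, their values, and the output names `total_sales` and `mean_sales` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sales data, invented purely for illustration.
data = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "sales": [100, 150, 200, 50, 250],
})

# Step 1: group the rows by one or more columns.
group = data.groupby("region")

# Step 2 (single function): aggregate one selected column.
total_sales = group["sales"].sum()

# Step 2 (several functions): named aggregation gives each
# result column an explicit name.
summary = group.agg(
    total_sales=pd.NamedAgg(column="sales", aggfunc="sum"),
    mean_sales=pd.NamedAgg(column="sales", aggfunc="mean"),
)

print(total_sales)
print(summary)
```

The single-function form returns a Series indexed by the group keys, while the named-aggregation form returns a DataFrame with one column per named result, which is usually the more convenient shape when several statistics are needed together.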