Note

  • This tutorial is also available on nbviewer, offering an alternative platform for your learning convenience.
  • Pandas Ninja: Take your skills to the next level with comprehensive Jupyter Notebook tutorials covering dates, group by, plotting, pivot tables, and more. Includes specialized tutorials for Excel and SQL developers, helping you master data analysis with Pandas.

This Notebook will cover all the techniques to quickly sort your dataframe in Pandas. Whether you are looking to sort one or multiple columns. Or whether you are looking to order your data in an ascending or descending fashion, I have you covered.

Let’s start by importing our libraries.

import pandas as pd
import sys

Below is the version of Python and Pandas I am currently on.

print('Python: ' + sys.version.split('|')[0])
print('Pandas: ' + pd.__version__)
Python: 3.11.7 
Pandas: 2.2.1

Let’s get a dataframe up and running. We will be using a Python dictionary to generate some dummy data for this lesson. If you are new to the Pandas world and are not yet comfortable with dataframes, please take a look at my post, how to create a dataframe before you continue this lesson.

After we print the dataframe, we will get two columns. The Rev columns are short for Revenue. So we have essentially two sets of revenue numbers.

df = pd.DataFrame({'Rev':[234,345,7,345,3],
                   'Rev2':[345,67,3,88,4]})
 
df
RevRev2
0234345
134567
273
334588
434

Pandas Sort by Column

If you are familiar with Pandas, you will notice the dataframe method called sort is no longer available. I believe sort was deprecated back on Pandas version 0.17 and was replaced with sort_values.

sort_values requires you to pass in the by parameter. This parameter is used to tell Pandas by which column(s) do you want to order by.

The good news is that we can pass in a single string or a list of strings that represent column names.

If you wanted to sort by the Rev column, you would simply pass in a string named ‘Rev’ as shown below. If you look at the index of the dataframe, you will notice it has changed. Instead of showing 0, 1, 2, 3, and 4, it shows a different order. And if you haven’t picked up the pattern, the data has been sorted ascending (lowest to largest).

df.sort_values('Rev')
RevRev2
434
273
0234345
134567
334588

Pandas sorts your data ascending by default. But we can use the parameter called ascending to order the data from largest to smallest or descending.

  • ascending = True = smallest to largest
  • ascending = False = largest to smallest
df.sort_values('Rev', ascending = False)
RevRev2
134567
334588
0234345
273
434

A cool trick you are also able to do is set the ascending parameter to equal a boolean value. We can use zeros and ones instead of True or False.

  • ascending = 1 = smallest to largest
  • ascending = 0 = largest to smallest

It may not make much difference to use 0/1 vs True/False but I just wanted to make you aware of the possibility.

df.sort_values('Rev', ascending = 1)
RevRev2
434
273
0234345
134567
334588

Sorting with Multiple Columns

We can apply the same technique we have been going over but instead of passing in a string, we will need to pass in a Python list.

Let’s start with the same example as above but let’s pass in the single string as a list. Did you see a difference? No, there wasn’t.

df.sort_values(['Rev'])
RevRev2
434
273
0234345
134567
334588

This means we can pass in multiple columns and Pandas will sort by the first and then sort by the second. Yes, I know that adding the second columns didn’t really do much for us. But fear not, Pandas really is sorting both columns. The next example will clear things up.

df.sort_values(['Rev', 'Rev2'], ascending=[False,False])
RevRev2
334588
134567
0234345
273
434

Notice that the first column as a value that is repeated twice. Yes, it is the value 345. The example above has the second column (Rev2) with values 88 and then 67. Pandas is sorting this column descending.

If we change the sort order on Rev2, you will get different results. We see that we now have 67 followed by 88.

If the first column (Rev) only had unique numbers, then ordering by the second column would be useless

But since the value 345 was there twice. It really didn’t matter the order they were placed by Pandas since they are equivalent. So when we changed the sort order for Rev2, we saw a change. Bam! Did you get it? I didn’t lose you, right?

df.sort_values(['Rev', 'Rev2'], ascending=[False,True])
RevRev2
134567
334588
0234345
273
434

Sorting on a Different Axis

Let’s say we have a dataframe shaped in a slightly different way. We can achieve this by transposing our current dataframe. A dataframe shaped like the one below will not work like the example we just went over.

df.T.sort_values(‘Rev’, ascending = 1) << This code will fail <<

The reason this will not work is because there is no longer a column called Rev. The columns are now 0, 1, 2, 3, and 4.

df.T
01234
Rev23434573453
Rev2345673884

As you have imagined, Pandas has a solution for us. And the saving grace is the parameter called axis. by setting this parameter to 1, Pandas will sort based on the row names instead of the column names.

The code below is sorting the data ascending (small to large) but it is sorting the numbers from left to right. Earlier in this lesson we were sorting the data vertically. Remember?

df.T.sort_values('Rev', ascending = 1, axis=1)
42013
Rev37234345345
Rev2433456788

Pandas Sort Series

There was also an order method I used to use but this has also been deprecated and we can simply use sort_values instead. In an older version of Pandas order was used for a Pandas series since sort only worked for a dataframe. but like I mentioned, sort_values will work for both a dataframe and a series object.

Don’t forget to share this post if you know someone who might enjoy the read.