Note
- This tutorial is also available on nbviewer, offering an alternative platform for your learning convenience.
- đ„ Free Pandas Course: https://hedaro.gumroad.com/l/tqqfq
This Notebook will cover all the techniques to quickly sort your dataframe in Pandas. Whether you are looking to sort one or multiple columns. Or whether you are looking to order your data in an ascending or descending fashion, I have you covered.
Letâs start by importing our libraries.
Below is the version of Python and Pandas I am currently on.
Letâs get a dataframe up and running. We will be using a Python dictionary to generate some dummy data for this lesson. If you are new to the Pandas world and are not yet comfortable with dataframes, please take a look at my post, how to create a dataframe before you continue this lesson.
After we print the dataframe, we will get two columns. The Rev columns are short for Revenue. So we have essentially two sets of revenue numbers.
Rev | Rev2 | |
---|---|---|
0 | 234 | 345 |
1 | 345 | 67 |
2 | 7 | 3 |
3 | 345 | 88 |
4 | 3 | 4 |
Pandas Sort by Column
If you are familiar with Pandas, you will notice the dataframe method called sort is no longer available. I believe sort was deprecated back on Pandas version 0.17 and was replaced with sort_values.
sort_values requires you to pass in the by parameter. This parameter is used to tell Pandas by which column(s) do you want to order by.
The good news is that we can pass in a single string or a list of strings that represent column names.
If you wanted to sort by the Rev column, you would simply pass in a string named âRevâ as shown below. If you look at the index of the dataframe, you will notice it has changed. Instead of showing 0, 1, 2, 3, and 4, it shows a different order. And if you havenât picked up the pattern, the data has been sorted ascending (lowest to largest).
Rev | Rev2 | |
---|---|---|
4 | 3 | 4 |
2 | 7 | 3 |
0 | 234 | 345 |
1 | 345 | 67 |
3 | 345 | 88 |
Pandas sorts your data ascending by default. But we can use the parameter called ascending to order the data from largest to smallest or descending.
- ascending = True = smallest to largest
- ascending = False = largest to smallest
Rev | Rev2 | |
---|---|---|
1 | 345 | 67 |
3 | 345 | 88 |
0 | 234 | 345 |
2 | 7 | 3 |
4 | 3 | 4 |
A cool trick you are also able to do is set the ascending parameter to equal a boolean value. We can use zeros and ones instead of True or False.
- ascending =Â 1Â = smallest to largest
- ascending =Â 0Â = largest to smallest
It may not make much difference to use 0/1 vs True/False but I just wanted to make you aware of the possibility.
Rev | Rev2 | |
---|---|---|
4 | 3 | 4 |
2 | 7 | 3 |
0 | 234 | 345 |
1 | 345 | 67 |
3 | 345 | 88 |
Sorting with Multiple Columns
We can apply the same technique we have been going over but instead of passing in a string, we will need to pass in a Python list.
Letâs start with the same example as above but letâs pass in the single string as a list. Did you see a difference? No, there wasnât.
Rev | Rev2 | |
---|---|---|
4 | 3 | 4 |
2 | 7 | 3 |
0 | 234 | 345 |
1 | 345 | 67 |
3 | 345 | 88 |
This means we can pass in multiple columns and Pandas will sort by the first and then sort by the second. Yes, I know that adding the second columns didnât really do much for us. But fear not, Pandas really is sorting both columns. The next example will clear things up.
Rev | Rev2 | |
---|---|---|
3 | 345 | 88 |
1 | 345 | 67 |
0 | 234 | 345 |
2 | 7 | 3 |
4 | 3 | 4 |
Notice that the first column as a value that is repeated twice. Yes, it is the value 345. The example above has the second column (Rev2) with values 88 and then 67. Pandas is sorting this column descending.
If we change the sort order on Rev2, you will get different results. We see that we now have 67 followed by 88.
If the first column (Rev) only had unique numbers, then ordering by the second column would be useless
But since the value 345 was there twice. It really didnât matter the order they were placed by Pandas since they are equivalent. So when we changed the sort order for Rev2, we saw a change. Bam! Did you get it? I didnât lose you, right?
Rev | Rev2 | |
---|---|---|
1 | 345 | 67 |
3 | 345 | 88 |
0 | 234 | 345 |
2 | 7 | 3 |
4 | 3 | 4 |
Sorting on a Different Axis
Letâs say we have a dataframe shaped in a slightly different way. We can achieve this by transposing our current dataframe. A dataframe shaped like the one below will not work like the example we just went over.
df.T.sort_values(âRevâ, ascending = 1)Â << This code will fail <<
The reason this will not work is because there is no longer a column called Rev. The columns are now 0, 1, 2, 3, and 4.
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
Rev | 234 | 345 | 7 | 345 | 3 |
Rev2 | 345 | 67 | 3 | 88 | 4 |
As you have imagined, Pandas has a solution for us. And the saving grace is the parameter called axis. by setting this parameter to 1, Pandas will sort based on the row names instead of the column names.
The code below is sorting the data ascending (small to large) but it is sorting the numbers from left to right. Earlier in this lesson we were sorting the data vertically. Remember?
4 | 2 | 0 | 1 | 3 | |
---|---|---|---|---|---|
Rev | 3 | 7 | 234 | 345 | 345 |
Rev2 | 4 | 3 | 345 | 67 | 88 |
Pandas Sort Series
There was also an order method I used to use but this has also been deprecated and we can simply use sort_values instead. In an older version of Pandas order was used for a Pandas series since sort only worked for a dataframe. but like I mentioned, sort_values will work for both a dataframe and a series object.
Donât forget to share this post if you know someone who might enjoy the read.