
Pandas Index Sorting Explained

Index sorting: sort_index

This post introduces sorting by index in Pandas. For full details, see the official documentation:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_index.html

Parameters

DataFrame.sort_index(axis=0,
level=None,
ascending=True,
inplace=False,
kind='quicksort',
na_position='last',
sort_remaining=True,
ignore_index=False,
key=None)

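Parameter descriptions:

  • axis: the axis to sort along; axis=0 sorts the row labels, axis=1 sorts the column labels
  • level: for a MultiIndex, sort by the specified level(s); can be a level number, a level name, or a list of them
  • ascending: sort order; ascending (True) by default. For a MultiIndex it can also be a list of booleans, one per level
  • inplace: whether to modify the object in place; the default False returns a new sorted object
  • kind: the sorting algorithm to use; default 'quicksort'
  • na_position: where to put missing values (NaN), 'first' or 'last'; default 'last'
  • sort_remaining: if True and a MultiIndex is sorted by a given level, rows that tie on that level are also sorted by the other levels, in order
  • ignore_index: if True, the row index of the result is replaced by 0, 1, …, N-1; default False
  • key: optional function applied to the index values before sorting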

Sample data

import pandas as pd
import numpy as np

df = pd.DataFrame({"name": ["Jimmy", "Ana", "Tom", "John"],
                   "age": [24, 20, 19, 28],
                   "Math": [100, 120, 80, 150],
                   "address": ["beijing", "shanghai", "shenzhen", "guangzhou"]},
                  index=[np.nan, 2, 0, 1])  # the index contains a missing value (NaN)

df
name age Math address
NaN Jimmy 24 100 beijing
2.0 Ana 20 120 shanghai
0.0 Tom 19 80 shenzhen
1.0 John 28 150 guangzhou

Parameter: axis

# equivalent to df.sort_index(): axis=0 is the default
df.sort_index(axis=0)
name age Math address
0.0 Tom 19 80 shenzhen
1.0 John 28 150 guangzhou
2.0 Ana 20 120 shanghai
NaN Jimmy 24 100 beijing

By default, sorting is done along axis=0 (the row labels) in ascending order; the NaN label is placed last.

df.sort_index(axis=1)
Math address age name
NaN 100 beijing 24 Jimmy
2.0 120 shanghai 20 Ana
0.0 80 shenzhen 19 Tom
1.0 150 guangzhou 28 John

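axis=1 sorts the column labels instead. The column names here are all strings, so they are compared character by character by their ASCII codes. Uppercase letters come before lowercase ones, which is why Math is placed first.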
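To check the claim about character order, here is a minimal sketch (a one-row copy of the sample data, assumed for brevity) showing that sort_index(axis=1) orders the columns the same way Python's built-in sorted() orders the labels:

```python
import pandas as pd

# One-row version of the sample data; only the column labels matter here
df = pd.DataFrame({"name": ["Jimmy"], "age": [24],
                   "Math": [100], "address": ["beijing"]})

by_axis = df.sort_index(axis=1)   # sort the column labels
manual = df[sorted(df.columns)]   # Python's sorted() compares strings by code point

print(list(by_axis.columns))      # ['Math', 'address', 'age', 'name']
print(by_axis.equals(manual))     # True
```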

Parameter: ignore_index

By default the original index labels are kept. If set to True, the row index of the result is replaced by 0, 1, 2, …, N-1.

# default: ignore_index=False
df.sort_index(axis=1,ignore_index=False)
Math address age name
NaN 100 beijing 24 Jimmy
2.0 120 shanghai 20 Ana
0.0 80 shenzhen 19 Tom
1.0 150 guangzhou 28 John
df.sort_index(axis=1,ignore_index=True)
Math address age name
0 100 beijing 24 Jimmy
1 120 shanghai 20 Ana
2 80 shenzhen 19 Tom
3 150 guangzhou 28 John
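In the axis=1 example above the rows are not reordered, so ignore_index only replaces their labels. The effect is easier to see with the default axis=0. A minimal sketch, reusing two columns of the sample data: the rows are first sorted by index and then relabeled 0 to 3.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Jimmy", "Ana", "Tom", "John"],
                   "age": [24, 20, 19, 28]},
                  index=[np.nan, 2, 0, 1])

kept = df.sort_index()                    # original labels: 0.0, 1.0, 2.0, NaN
reset = df.sort_index(ignore_index=True)  # same row order, labels 0..3

print(reset)
```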

Parameter: key

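Optional. If it is not None, the key function is applied to the index values before sorting, and the sort uses the transformed values. Unlike the key of Python's built-in sorted(), it should be vectorized: it receives an Index and must return an Index of the same shape.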

df.sort_index(axis=1)  # axis=1, no key
Math address age name
NaN 100 beijing 24 Jimmy
2.0 120 shanghai 20 Ana
0.0 80 shenzhen 19 Tom
1.0 150 guangzhou 28 John
df.sort_index(axis=1, key=lambda x: x.str.lower())
address age Math name
NaN beijing 24 100 Jimmy
2.0 shanghai 20 120 Ana
0.0 shenzhen 19 80 Tom
1.0 guangzhou 28 150 John

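Here the key function lowercases every column label before comparison, so Math is compared as math.

The sort is therefore based on the lowercased names: Math now falls between age and name instead of coming first. The labels in the result are unchanged; key only affects the comparison.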
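Because key receives the whole Index at once, any vectorized transformation works, not only string methods. A minimal sketch on a hypothetical Series with signed integer labels, sorting by absolute value:

```python
import pandas as pd

# Hypothetical data: signed integer labels
s = pd.Series(["a", "b", "c", "d"], index=[-3, 1, -2, 0])

# key gets the Index and returns a transformed Index of the same length
by_abs = s.sort_index(key=lambda idx: idx.map(abs))

print(by_abs)
# The labels are still the original signed values, ordered by |label|: 0, 1, -2, -3
```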

Parameter: ascending

df.sort_index()
name age Math address
0.0 Tom 19 80 shenzhen
1.0 John 28 150 guangzhou
2.0 Ana 20 120 shanghai
NaN Jimmy 24 100 beijing
# equivalent to df.sort_index(): ascending=True is the default
df.sort_index(ascending=True)
name age Math address
0.0 Tom 19 80 shenzhen
1.0 John 28 150 guangzhou
2.0 Ana 20 120 shanghai
NaN Jimmy 24 100 beijing
df.sort_index(ascending=False)  # descending order
name age Math address
2.0 Ana 20 120 shanghai
1.0 John 28 150 guangzhou
0.0 Tom 19 80 shenzhen
NaN Jimmy 24 100 beijing
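For a MultiIndex, ascending can also be a list with one boolean per level. A minimal sketch on a hypothetical two-level Series, ascending on the first level and descending on the second:

```python
import pandas as pd

# Hypothetical two-level index
idx = pd.MultiIndex.from_tuples([("b", 1), ("a", 2), ("b", 2), ("a", 1)])
s = pd.Series([10, 20, 30, 40], index=idx)

# Level 0 ascending (a before b), level 1 descending (2 before 1)
out = s.sort_index(ascending=[True, False])
print(out)
```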

Parameter: inplace

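inplace decides whether the original data is modified directly or a new object is returned.

If True, the sort is done in place: the original object itself changes, and the method returns None.

For the demonstration, first make a copy of df called df1 and operate on df1 directly: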

df1 = df.copy()
df1
name age Math address
NaN Jimmy 24 100 beijing
2.0 Ana 20 120 shanghai
0.0 Tom 19 80 shenzhen
1.0 John 28 150 guangzhou
# inplace=False is the default
df1.sort_index(inplace=False)
name age Math address
0.0 Tom 19 80 shenzhen
1.0 John 28 150 guangzhou
2.0 Ana 20 120 shanghai
NaN Jimmy 24 100 beijing

df1 itself is unchanged:

df1
name age Math address
NaN Jimmy 24 100 beijing
2.0 Ana 20 120 shanghai
0.0 Tom 19 80 shenzhen
1.0 John 28 150 guangzhou
df1.sort_index(inplace=True)  # sort in place; returns None

With inplace=True, df1 itself has now been sorted:

df1
name age Math address
0.0 Tom 19 80 shenzhen
1.0 John 28 150 guangzhou
2.0 Ana 20 120 shanghai
NaN Jimmy 24 100 beijing
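One practical consequence: since inplace=True returns None, writing df1 = df1.sort_index(inplace=True) would replace df1 with None. A minimal sketch of both styles, on a two-column version of the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Jimmy", "Ana", "Tom", "John"],
                   "age": [24, 20, 19, 28]},
                  index=[np.nan, 2, 0, 1])

df1 = df.copy()
ret = df1.sort_index(inplace=True)  # sorts df1 itself
print(ret)                          # None

df2 = df.sort_index()               # default: returns a new sorted DataFrame
print(df2.equals(df1))              # True
```

The non-inplace form leaves df untouched and is usually easier to chain with other methods.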

Parameter: kind

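kind selects the sorting algorithm: {'quicksort', 'mergesort', 'heapsort', 'stable'}; the default is 'quicksort'.

  • 'quicksort': quicksort
  • 'mergesort': merge sort (stable)
  • 'heapsort': heap sort
  • 'stable': a stable sort; 'mergesort' and 'stable' are the only stable choices
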
df.sort_index()
name age Math address
0.0 Tom 19 80 shenzhen
1.0 John 28 150 guangzhou
2.0 Ana 20 120 shanghai
NaN Jimmy 24 100 beijing
df.sort_index(kind="mergesort")
name age Math address
0.0 Tom 19 80 shenzhen
1.0 John 28 150 guangzhou
2.0 Ana 20 120 shanghai
NaN Jimmy 24 100 beijing
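Because the index labels here are unique, every algorithm gives the same order. The choice only becomes visible when labels repeat: 'mergesort' and 'stable' keep rows with equal labels in their original relative order, while 'quicksort' and 'heapsort' make no such promise. A minimal sketch on a hypothetical Series with duplicate labels:

```python
import pandas as pd

# Hypothetical data: labels 0 and 1 each appear twice
s = pd.Series(["first", "second", "third", "fourth"], index=[1, 0, 1, 0])

out = s.sort_index(kind="stable")
print(out)
# Ties keep their original order: label 0 -> second, fourth; label 1 -> first, third
```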

Parameter: na_position

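Where to place missing values (NaN): 'first' or 'last'. The default is 'last'. According to the official documentation, this option is not implemented for a MultiIndex.
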
df.sort_index()
name age Math address
0.0 Tom 19 80 shenzhen
1.0 John 28 150 guangzhou
2.0 Ana 20 120 shanghai
NaN Jimmy 24 100 beijing
df.sort_index(na_position="first")
name age Math address
NaN Jimmy 24 100 beijing
0.0 Tom 19 80 shenzhen
1.0 John 28 150 guangzhou
2.0 Ana 20 120 shanghai
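na_position is applied independently of ascending; in the descending example earlier, NaN still ends up last. A minimal sketch over all four combinations, using the name column of the sample data as a Series:

```python
import numpy as np
import pandas as pd

s = pd.Series(["Jimmy", "Ana", "Tom", "John"], index=[np.nan, 2, 0, 1])

results = {}
for asc in (True, False):
    for na in ("first", "last"):
        # ascending orders the non-missing labels; na_position places NaN
        results[(asc, na)] = list(s.sort_index(ascending=asc, na_position=na))
        print(asc, na, results[(asc, na)])
```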

Parameter: sort_remaining

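If True and a MultiIndex is sorted by a given level, rows that tie on that level are further sorted by the remaining levels, in order. In the example below, with sort_remaining=False the rows sharing a level-1 value stay in their original order (qux, foo, baz, bar) instead of being sorted by level 0.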

# An example from the official documentation
arrays = [np.array(['qux', 'qux', 'foo', 'foo',
                    'baz', 'baz', 'bar', 'bar']),
          np.array(['two', 'one', 'two', 'one',
                    'two', 'one', 'two', 'one'])]
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=arrays)
s
qux  two    1
     one    2
foo  two    3
     one    4
baz  two    5
     one    6
bar  two    7
     one    8
dtype: int64
s.sort_index(level=1, sort_remaining=True)  # True is the default
bar  one    8
baz  one    6
foo  one    4
qux  one    2
bar  two    7
baz  two    5
foo  two    3
qux  two    1
dtype: int64
s.sort_index(level=1, sort_remaining=False)
qux  one    2
foo  one    4
baz  one    6
bar  one    8
qux  two    1
foo  two    3
baz  two    5
bar  two    7
dtype: int64
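To make the tie-breaking rule explicit, the sketch below rebuilds both outputs by hand with Python's sorted() on the level values. It only illustrates the resulting order and is not how pandas implements it internally.

```python
import numpy as np
import pandas as pd

arrays = [np.array(["qux", "qux", "foo", "foo", "baz", "baz", "bar", "bar"]),
          np.array(["two", "one", "two", "one", "two", "one", "two", "one"])]
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=arrays)

lvl0 = s.index.get_level_values(0)
lvl1 = s.index.get_level_values(1)
pos = range(len(s))

# sort_remaining=True: sort by level 1, break ties by level 0
order_true = [i for _, _, i in sorted(zip(lvl1, lvl0, pos))]
# sort_remaining=False: sort by level 1 only; ties keep their original position
order_false = [i for _, i in sorted(zip(lvl1, pos))]

manual_true = s.iloc[order_true]
manual_false = s.iloc[order_false]
print(list(manual_true))   # [8, 6, 4, 2, 7, 5, 3, 1]
print(list(manual_false))  # [2, 4, 6, 8, 1, 3, 5, 7]
```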

Parameter: level

df = pd.DataFrame({"name": ["Jimmy", "Ana", "Tom", "John"],
                   "age": [24, 20, 19, 28],
                   "Math": [100, 120, 80, 150],
                   "address": ["beijing", "shanghai", "shenzhen", "guangzhou"]},
                  index=[[np.nan, 2, 0, 1],   # two arrays -> a two-level MultiIndex
                         [4, 5, 8, 1]])

df
name age Math address
NaN 4 Jimmy 24 100 beijing
2 5 Ana 20 120 shanghai
0 8 Tom 19 80 shenzhen
1 1 John 28 150 guangzhou

df.index confirms that df has a two-level MultiIndex:

df.index
MultiIndex([(nan, 4),
            (2.0, 5),
            (0.0, 8),
            (1.0, 1)],
           )
df.sort_index(level=0)
name age Math address
NaN 4 Jimmy 24 100 beijing
0 8 Tom 19 80 shenzhen
1 1 John 28 150 guangzhou
2 5 Ana 20 120 shanghai
df.sort_index(level=1)
name age Math address
1 1 John 28 150 guangzhou
NaN 4 Jimmy 24 100 beijing
2 5 Ana 20 120 shanghai
0 8 Tom 19 80 shenzhen
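As listed in the parameter description, level also accepts a level name. A minimal sketch, assuming hypothetical level names "first" and "second" attached to the same two index arrays; sorting by name gives the same result as sorting by position:

```python
import numpy as np
import pandas as pd

# Same index values as above, plus hypothetical level names
idx = pd.MultiIndex.from_arrays([[np.nan, 2, 0, 1], [4, 5, 8, 1]],
                                names=["first", "second"])
df = pd.DataFrame({"name": ["Jimmy", "Ana", "Tom", "John"]}, index=idx)

by_name = df.sort_index(level="second")  # select the level by name
by_number = df.sort_index(level=1)       # select the same level by position
print(by_name)
```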

Title: Pandas Index Sorting Explained

Published: 2022-08-31 13:08

Original link: http://www.renpeter.cn/2022/08/31/Pandas%E7%B4%A2%E5%BC%95%E6%8E%92%E5%BA%8F%E8%AF%A6%E8%A7%A3.html

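License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). When reposting, please keep the original link and the author's name.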
