
Using pandas functions: nlargest and nsmallest

Using nsmallest and nlargest

This post shows how to use two pandas functions: nsmallest and nlargest.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nsmallest.html

DataFrame.nsmallest(
    n,             # int: number of rows to return
    columns,       # column label or list of labels to order by
    keep='first'   # how to handle ties: {'first', 'last', 'all'}, default 'first'
)

Sample data

import pandas as pd
import numpy as np

df = pd.DataFrame({"name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
                   "score": [100, 128, 100, 150, 100, 145],
                   "age": [21, 25, 23, 21, 25, 25],
                   "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71]
                   })
df

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
3   wangfeng    150   21    1.80
4   xiaoming    100   25    1.90
5   zhangjun    145   25    1.71

nsmallest

Default behavior

df.nsmallest(2, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77

df.nsmallest(4, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
1   zhoujuan    128   25    1.80

As the output shows, duplicate values are not collapsed by default: each tied row counts separately toward n.
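The pandas documentation describes `nsmallest(n, columns)` as equivalent to `sort_values(columns, ascending=True).head(n)`, only more performant. The sketch below checks this on the sample data. The `kind="stable"` argument is an assumption added here so that tied rows keep their original order in the sort-based version; the variable names are illustrative only.

```python
import pandas as pd

# Same sample data as above, reduced to the columns needed here
df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
})

# Direct selection of the 4 smallest scores
via_nsmallest = df.nsmallest(4, "score")

# Sort-based alternative; a stable sort keeps tied rows in index order
via_sort = df.sort_values("score", ascending=True, kind="stable").head(4)

print(via_nsmallest.index.tolist())
print(via_sort.index.tolist())
```

Both approaches select the three rows with score 100 first, followed by the row with score 128.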

The keep parameter

# Same result as above, since "first" is the default
df.nsmallest(4, "score", keep="first")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
1   zhoujuan    128   25    1.80

df.nsmallest(4, "score", keep="last")

        name  score  age  height
4   xiaoming    100   25    1.90
2  xiaozhang    100   23    1.77
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80

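The order of the tied rows has changed: with keep="last", the last occurrences come first, starting from the row with the largest index (4).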

How should keep="all" be understood?

df.nsmallest(2, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77

With keep="all", every row tied with the last selected value is kept, so the result can contain more than n rows:

df.nsmallest(2, "score", keep="all")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
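To make the difference concrete, the following minimal sketch compares the number of rows returned with the default setting and with keep="all" on the same sample scores. The variable names `first_only` and `with_ties` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
})

# Default (keep="first"): exactly n rows, even though three rows share score 100
first_only = df.nsmallest(2, "score")

# keep="all": every row tied at the cut-off value is included as well
with_ties = df.nsmallest(2, "score", keep="all")

print(len(first_only), len(with_ties))
```

When a result must have exactly n rows (for example to fill a fixed-size report), keep="first" or keep="last" is the safer choice; keep="all" suits cases where dropping a tied row would be arbitrary.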

Selecting by multiple columns

df.nsmallest(4, ["age", "height"])

        name  score  age  height
0    xiaosun    100   21    1.75
3   wangfeng    150   21    1.80
2  xiaozhang    100   23    1.77
5   zhangjun    145   25    1.71
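With a list of columns, the later columns only break ties in the earlier ones: here height decides which of the three rows with age 25 fills the fourth slot. A minimal sketch that cross-checks the result against a multi-column sort (the comparison with `sort_values` is added here for illustration and is not part of the original walkthrough):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# Order by age first; height is consulted only among rows with equal age
smallest = df.nsmallest(4, ["age", "height"])

# Cross-check: sort by both columns and take the first 4 rows
by_sort = df.sort_values(["age", "height"]).head(4)

print(smallest.index.tolist())
print(by_sort.index.tolist())
```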

nlargest

This function returns the n largest rows, in descending order of the given column(s).

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nlargest.html#pandas.DataFrame.nlargest

DataFrame.nlargest(
    n,
    columns,
    keep='first'   # {'first', 'last', 'all'}, default 'first'
)
df.nlargest(3, "score")

       name  score  age  height
3  wangfeng    150   21    1.80
5  zhangjun    145   25    1.71
1  zhoujuan    128   25    1.80

df.nlargest(3, "age")

       name  score  age  height
1  zhoujuan    128   25    1.80
4  xiaoming    100   25    1.90
5  zhangjun    145   25    1.71

df.nlargest(2, "age", keep="first")

       name  score  age  height
1  zhoujuan    128   25     1.8
4  xiaoming    100   25     1.9

df.nlargest(2, "age", keep="last")

       name  score  age  height
5  zhangjun    145   25    1.71
4  xiaoming    100   25    1.90

df.nlargest(2, "age", keep="all")

       name  score  age  height
1  zhoujuan    128   25    1.80
4  xiaoming    100   25    1.90
5  zhangjun    145   25    1.71
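The three keep options can also be compared side by side. The sketch below collects the selected index labels for each option into a dictionary; the name `picked` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "age": [21, 25, 23, 21, 25, 25],
})

# Index labels selected by nlargest(2, "age") under each tie-handling rule
picked = {
    keep: df.nlargest(2, "age", keep=keep).index.tolist()
    for keep in ("first", "last", "all")
}

for keep, labels in picked.items():
    print(keep, labels)
```

Since three rows share the maximum age of 25, "first" and "last" each return two of them from opposite ends, while "all" returns all three.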

nlargest + drop_duplicates

Requirement: find the rows with the 2 largest ages; when several rows share the same age, keep only one of them.

df

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
3   wangfeng    150   21    1.80
4   xiaoming    100   25    1.90
5   zhangjun    145   25    1.71

df["age"].value_counts()

25    3
21    2
23    1
Name: age, dtype: int64

The largest age is 25, and three people share it. Deduplicate on age first:

df1 = df.drop_duplicates(subset=["age"], keep="first")
df1

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77

df1.nlargest(2, "age")

        name  score  age  height
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
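The two steps can be chained into a single expression. The sketch below wraps them in a small helper; `top_n_distinct` is a hypothetical name introduced here for illustration, not a pandas function.

```python
import pandas as pd

def top_n_distinct(frame: pd.DataFrame, column: str, n: int) -> pd.DataFrame:
    """Return the rows with the n largest distinct values of `column`.

    Hypothetical helper: for each repeated value, only the first row is kept.
    """
    deduplicated = frame.drop_duplicates(subset=[column], keep="first")
    return deduplicated.nlargest(n, column)

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

result = top_n_distinct(df, "age", 2)
print(result)
```

Which row represents each age depends on the `keep` argument of `drop_duplicates`; sorting the frame beforehand (for example by score) would be one way to control that choice.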

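Title: Using pandas functions: nlargest and nsmallest

Published: 2022-08-31, 15:08

Original link: http://www.renpeter.cn/2022/08/31/Pandas%E5%87%BD%E6%95%B0%E4%BD%BF%E7%94%A8-nlargest-nsmallest.html

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and author when reposting.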
