Using nsmallest and nlargest
This article shows how to use two pandas functions: nsmallest and nlargest.
Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nsmallest.html

    DataFrame.nsmallest(n, columns, keep='first')

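Sample data

    import pandas as pd
    df = pd.DataFrame({"name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
                       "score": [100, 128, 100, 150, 100, 145], "age": [21, 25, 23, 21, 25, 25],
                       "height": [1.75, 1.80, 1.77, 1.80, 1.90, 1.71]})

| | name | score | age | height |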
---|---|---|---|---|
0 | xiaosun | 100 | 21 | 1.75 |
1 | zhoujuan | 128 | 25 | 1.80 |
2 | xiaozhang | 100 | 23 | 1.77 |
3 | wangfeng | 150 | 21 | 1.80 |
4 | xiaoming | 100 | 25 | 1.90 |
5 | zhangjun | 145 | 25 | 1.71 |
nsmallest
Default behavior

    df.nsmallest(2, "score")

| | name | score | age | height |
---|---|---|---|---|
0 | xiaosun | 100 | 21 | 1.75 |
2 | xiaozhang | 100 | 23 | 1.77 |

    df.nsmallest(4, "score")

| | name | score | age | height |
---|---|---|---|---|
0 | xiaosun | 100 | 21 | 1.75 |
2 | xiaozhang | 100 | 23 | 1.77 |
4 | xiaoming | 100 | 25 | 1.90 |
1 | zhoujuan | 128 | 25 | 1.80 |
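As the output shows, tied values are not collapsed by default: each row with a duplicate score counts toward n separately.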
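
The pandas documentation describes `nsmallest(n, columns)` as equivalent to `df.sort_values(columns, ascending=True).head(n)`, only more performant. The following is a minimal sketch that checks this on the sample data. The names `via_nsmallest` and `via_sort` are illustrative, and `kind="mergesort"` is an assumption made here so that tied rows keep their original order during sorting.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.80, 1.77, 1.80, 1.90, 1.71],
})

# Four smallest scores via nsmallest (default keep="first")
via_nsmallest = df.nsmallest(4, "score")

# The same selection via a full stable sort followed by head(n)
via_sort = df.sort_values("score", ascending=True, kind="mergesort").head(4)

print(via_nsmallest["score"].tolist())
print(via_sort["score"].tolist())
```

For small frames like this one, the two approaches are interchangeable; the performance difference should only matter on large data when n is small.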
The keep parameter

    # Same result as above; keep="first" is the default
    df.nsmallest(4, "score", keep="first")

| | name | score | age | height |
---|---|---|---|---|
0 | xiaosun | 100 | 21 | 1.75 |
2 | xiaozhang | 100 | 23 | 1.77 |
4 | xiaoming | 100 | 25 | 1.90 |
1 | zhoujuan | 128 | 25 | 1.80 |

    df.nsmallest(4, "score", keep="last")

| | name | score | age | height |
---|---|---|---|---|
4 | xiaoming | 100 | 25 | 1.90 |
2 | xiaozhang | 100 | 23 | 1.77 |
0 | xiaosun | 100 | 21 | 1.75 |
1 | zhoujuan | 128 | 25 | 1.80 |
The order of the tied rows changes: with keep="last" they are taken starting from the largest index, 4.
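
With n=4, all three rows scoring 100 fit into the result, so only their order differs. A smaller n makes it easier to see that keep also decides which tied rows are selected. The following is a sketch on the same sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.80, 1.77, 1.80, 1.90, 1.71],
})

# Three rows share the minimum score of 100 (index 0, 2 and 4),
# but only two of them can be returned when n=2.
first_two = df.nsmallest(2, "score", keep="first")
last_two = df.nsmallest(2, "score", keep="last")

print(sorted(first_two.index))  # [0, 2]: the first two occurrences of 100
print(sorted(last_two.index))   # [2, 4]: the last two occurrences of 100
```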
How should keep="all" be understood?

    df.nsmallest(2, "score")

| | name | score | age | height |
---|---|---|---|---|
0 | xiaosun | 100 | 21 | 1.75 |
2 | xiaozhang | 100 | 23 | 1.77 |
With keep="all", every row tied with the last selected value is returned, so the result can contain more than n rows:

    df.nsmallest(2, "score", keep="all")

| | name | score | age | height |
---|---|---|---|---|
0 | xiaosun | 100 | 21 | 1.75 |
2 | xiaozhang | 100 | 23 | 1.77 |
4 | xiaoming | 100 | 25 | 1.90 |
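
The effect of keep="all" on the size of the result can be checked directly. The sketch below uses illustrative variable names. It compares row counts for a cutoff that falls inside a group of ties (n=2) and one that does not (n=4).

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.80, 1.77, 1.80, 1.90, 1.71],
})

# n=2 cuts through the three rows that score 100
strict = df.nsmallest(2, "score")                  # exactly 2 rows
with_ties = df.nsmallest(2, "score", keep="all")   # all 3 tied rows

# n=4 ends on the unique score 128, so no extra rows are added
no_tie = df.nsmallest(4, "score", keep="all")

print(len(strict), len(with_ties), len(no_tie))  # 2 3 4
```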
Selecting by multiple columns

    df.nsmallest(4, ["age", "height"])

| | name | score | age | height |
---|---|---|---|---|
0 | xiaosun | 100 | 21 | 1.75 |
3 | wangfeng | 150 | 21 | 1.80 |
2 | xiaozhang | 100 | 23 | 1.77 |
5 | zhangjun | 145 | 25 | 1.71 |
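
When a list of columns is passed, the later columns act as tie-breakers: rows are ranked by age first, and height only decides among rows with the same age. That is why zhangjun (1.71) is chosen from the three 25-year-olds. The following sketch compares the call with a multi-column sort; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.80, 1.77, 1.80, 1.90, 1.71],
})

# Rank by age, breaking ties on height
by_two = df.nsmallest(4, ["age", "height"])

# The same idea expressed as a sort on both columns
by_sort = df.sort_values(["age", "height"]).head(4)

print(by_two[["name", "age", "height"]])
print(by_sort[["name", "age", "height"]])
```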
nlargest
This function returns the rows sorted in descending order.

    DataFrame.nlargest(n, columns, keep='first')

    df.nlargest(3, "score")

| | name | score | age | height |
---|---|---|---|---|
3 | wangfeng | 150 | 21 | 1.80 |
5 | zhangjun | 145 | 25 | 1.71 |
1 | zhoujuan | 128 | 25 | 1.80 |
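
As a cross-check, the same three rows can be obtained with a descending sort followed by head. This is a minimal sketch with illustrative variable names; because the top three scores are all distinct, ties play no role here.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.80, 1.77, 1.80, 1.90, 1.71],
})

# Top three scores via nlargest
top3 = df.nlargest(3, "score")

# Top three scores via descending sort + head
top3_sorted = df.sort_values("score", ascending=False).head(3)

print(top3.index.tolist())         # [3, 5, 1]
print(top3_sorted.index.tolist())  # [3, 5, 1]
```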

    df.nlargest(3, "age")

| | name | score | age | height |
---|---|---|---|---|
1 | zhoujuan | 128 | 25 | 1.80 |
4 | xiaoming | 100 | 25 | 1.90 |
5 | zhangjun | 145 | 25 | 1.71 |

    df.nlargest(2, "age", keep="first")

| | name | score | age | height |
---|---|---|---|---|
1 | zhoujuan | 128 | 25 | 1.8 |
4 | xiaoming | 100 | 25 | 1.9 |

    df.nlargest(2, "age", keep="last")

| | name | score | age | height |
---|---|---|---|---|
5 | zhangjun | 145 | 25 | 1.71 |
4 | xiaoming | 100 | 25 | 1.90 |

    df.nlargest(2, "age", keep="all")

| | name | score | age | height |
---|---|---|---|---|
1 | zhoujuan | 128 | 25 | 1.80 |
4 | xiaoming | 100 | 25 | 1.90 |
5 | zhangjun | 145 | 25 | 1.71 |
nlargest + drop_duplicates
Requirement: find the top 2 rows by age; when several rows share the same age, keep only one of them.

    df

| | name | score | age | height |
---|---|---|---|---|
0 | xiaosun | 100 | 21 | 1.75 |
1 | zhoujuan | 128 | 25 | 1.80 |
2 | xiaozhang | 100 | 23 | 1.77 |
3 | wangfeng | 150 | 21 | 1.80 |
4 | xiaoming | 100 | 25 | 1.90 |
5 | zhangjun | 145 | 25 | 1.71 |

    df["age"].value_counts()

25 3
21 2
23 1
Name: age, dtype: int64
The maximum age is 25, and 3 people share it. Deduplicate on age:

    df1 = df.drop_duplicates(subset=["age"], keep="first")
    df1

| | name | score | age | height |
---|---|---|---|---|
0 | xiaosun | 100 | 21 | 1.75 |
1 | zhoujuan | 128 | 25 | 1.80 |
2 | xiaozhang | 100 | 23 | 1.77 |

    df1.nlargest(2, "age")

| | name | score | age | height |
---|---|---|---|---|
1 | zhoujuan | 128 | 25 | 1.80 |
2 | xiaozhang | 100 | 23 | 1.77 |
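
The order of the two steps matters. Deduplicating first leaves one row per age, so nlargest can return two distinct ages. Taking the top 2 first picks two rows that are both 25, and deduplicating afterwards leaves a single row. The following sketch shows both orders; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.80, 1.77, 1.80, 1.90, 1.71],
})

# Deduplicate on age first, then take the two largest ages
dedupe_then_top = (
    df.drop_duplicates(subset=["age"], keep="first")
      .nlargest(2, "age")
)

# Reverse order: the top 2 rows are both age 25, so only one survives
top_then_dedupe = (
    df.nlargest(2, "age")
      .drop_duplicates(subset=["age"], keep="first")
)

print(dedupe_then_top["age"].tolist())  # [25, 23]
print(top_then_dedupe["age"].tolist())  # [25]
```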