Pandas操作技巧
pandas中有很多的操作技巧需要我们去挖掘,本文中介绍pandas
中的3
种操作奇技淫巧,让pandas
的操作骚动起来。
- 剪贴板中的数据生成
DF
型数据 - 通过数据类型直接选择
columns
- 将
strings
转成numbers
导入库
1 | import pandas as pd |
读取excel表格
通过read_excel()函数
1 | df1 = pd.read_excel("salesfunnel.xlsx") |
Account | Name | Rep | Manager | Product | Quantity | Price | Status | |
---|---|---|---|---|---|---|---|---|
0 | 714466 | Trantow-Barrows | Craig Booker | Debra Henley | CPU | 1 | 30000 | presented |
1 | 714466 | Trantow-Barrows | Craig Booker | Debra Henley | Software | 1 | 10000 | presented |
2 | 714466 | Trantow-Barrows | Craig Booker | Debra Henley | Maintenance | 2 | 5000 | pending |
3 | 737550 | Fritsch, Russel and Anderson | Craig Booker | Debra Henley | CPU | 1 | 35000 | declined |
4 | 146832 | Kiehn-Spinka | Daniel Hilton | Debra Henley | CPU | 2 | 65000 | won |
5 | 218895 | Kulas Inc | Daniel Hilton | Debra Henley | CPU | 2 | 40000 | pending |
6 | 218895 | Kulas Inc | Daniel Hilton | Debra Henley | Software | 1 | 10000 | presented |
7 | 412290 | Jerde-Hilpert | John Smith | Debra Henley | Maintenance | 2 | 5000 | pending |
8 | 740150 | Barton LLC | John Smith | Debra Henley | CPU | 1 | 35000 | declined |
9 | 141962 | Herman LLC | Cedric Moss | Fred Anderson | CPU | 2 | 65000 | won |
10 | 163416 | Purdy-Kunde | Cedric Moss | Fred Anderson | CPU | 1 | 30000 | presented |
11 | 239344 | Stokes LLC | Cedric Moss | Fred Anderson | Maintenance | 1 | 5000 | pending |
12 | 239344 | Stokes LLC | Cedric Moss | Fred Anderson | Software | 1 | 10000 | presented |
13 | 307599 | Kassulke, Ondricka and Metz | Wendy Yule | Fred Anderson | Maintenance | 3 | 7000 | won |
14 | 688981 | Keeling LLC | Wendy Yule | Fred Anderson | CPU | 5 | 100000 | won |
15 | 729833 | Koepp Ltd | Wendy Yule | Fred Anderson | CPU | 2 | 65000 | declined |
16 | 729833 | Koepp Ltd | Wendy Yule | Fred Anderson | Monitor | 2 | 5000 | presented |
通过read_clipboard()
这个方法是将我们剪贴板中的内容直接变成DataFrame型数据,不需要其他转换。
我们需要事先在表格中剪贴好数据,然后直接运行下面的代码:
1 | # 现在剪贴板中进行赋值,再执行下面的语句 |
Account | Name | Rep | Manager | Product | Quantity | Price | Status | |
---|---|---|---|---|---|---|---|---|
0 | 714466 | Trantow-Barrows | Craig Booker | Debra Henley | CPU | 1 | 30000 | presented |
1 | 714466 | Trantow-Barrows | Craig Booker | Debra Henley | Software | 1 | 10000 | presented |
2 | 714466 | Trantow-Barrows | Craig Booker | Debra Henley | Maintenance | 2 | 5000 | pending |
3 | 737550 | Fritsch, Russel and Anderson | Craig Booker | Debra Henley | CPU | 1 | 35000 | declined |
4 | 146832 | Kiehn-Spinka | Daniel Hilton | Debra Henley | CPU | 2 | 65000 | won |
5 | 218895 | Kulas Inc | Daniel Hilton | Debra Henley | CPU | 2 | 40000 | pending |
6 | 218895 | Kulas Inc | Daniel Hilton | Debra Henley | Software | 1 | 10000 | presented |
7 | 412290 | Jerde-Hilpert | John Smith | Debra Henley | Maintenance | 2 | 5000 | pending |
8 | 740150 | Barton LLC | John Smith | Debra Henley | CPU | 1 | 35000 | declined |
9 | 141962 | Herman LLC | Cedric Moss | Fred Anderson | CPU | 2 | 65000 | won |
10 | 163416 | Purdy-Kunde | Cedric Moss | Fred Anderson | CPU | 1 | 30000 | presented |
11 | 239344 | Stokes LLC | Cedric Moss | Fred Anderson | Maintenance | 1 | 5000 | pending |
12 | 239344 | Stokes LLC | Cedric Moss | Fred Anderson | Software | 1 | 10000 | presented |
13 | 307599 | Kassulke, Ondricka and Metz | Wendy Yule | Fred Anderson | Maintenance | 3 | 7000 | won |
14 | 688981 | Keeling LLC | Wendy Yule | Fred Anderson | CPU | 5 | 100000 | won |
15 | 729833 | Koepp Ltd | Wendy Yule | Fred Anderson | CPU | 2 | 65000 | declined |
16 | 729833 | Koepp Ltd | Wendy Yule | Fred Anderson | Monitor | 2 | 5000 | presented |
通过数据类型选择columns
1 | df3 = pd.DataFrame({'name':['xiaoming','zhansan','lisi'], |
name | age | address | height | |
---|---|---|---|---|
0 | xiaoming | 22 | shenzhen | 178 |
1 | zhansan | 28 | guangzhou | 180 |
2 | lisi | 25 | changsha | 176 |
1 | df3.dtypes |
name object
age int64
address object
height int64
dtype: object
1 | # 我们直接选择int64型的数据 |
1 | df3.select_dtypes(include='int64') |
age | height | |
---|---|---|
0 | 22 | 178 |
1 | 28 | 180 |
2 | 25 | 176 |
1 | # 选择除了int64之外的数据 |
name | address | |
---|---|---|
0 | xiaoming | shenzhen |
1 | zhansan | guangzhou |
2 | lisi | changsha |
1 | # 同时选择多种数据类型 |
name | age | address | height | |
---|---|---|---|---|
0 | xiaoming | 22 | shenzhen | 178 |
1 | zhansan | 28 | guangzhou | 180 |
2 | lisi | 25 | changsha | 176 |
将strings转成numbers
两种方法将字符串改成数值型数据
- 指定类型用:astype()
- to_numeric()
1 | df4 = pd.DataFrame({'goods':['A','B','C'], |
goods | price | sales | |
---|---|---|---|
0 | A | 30 | 50 |
1 | B | 20 | - |
2 | C | 60 | 40 |
1 | df4.dtypes |
goods object
price object
sales object
dtype: object
1 | # 将price转成int型 |
goods | price | sales | |
---|---|---|---|
0 | A | 30 | 50 |
1 | B | 20 | - |
2 | C | 60 | 40 |
1 | df4.dtypes |
goods object
price int64
sales object
dtype: object
使用同样的方法转化sales列数据则会报错:
1 | df4['sales'] = pd.to_numeric(df4['sales'], errors='coerce') |
goods | price | sales | |
---|---|---|---|
0 | A | 30 | 50.0 |
1 | B | 20 | NaN |
2 | C | 60 | 40.0 |
1 | df4.dtypes |
goods object
price int64
sales float64
dtype: object