Fork me on GitHub

pandas技巧_剪贴板生成DF+数据转化+字符串转数值

Pandas操作技巧

pandas中有很多的操作技巧需要我们去挖掘,本文中介绍pandas中的3种操作奇技淫巧,让pandas的操作骚动起来。

  • 剪贴板中的数据生成DF型数据
  • 通过数据类型直接选择columns
  • strings转成numbers

导入库

1
2
import pandas as pd
import numpy as np

读取excel表格

通过read_excel()函数

1
2
df1 = pd.read_excel("salesfunnel.xlsx")
df1
Account Name Rep Manager Product Quantity Price Status
0 714466 Trantow-Barrows Craig Booker Debra Henley CPU 1 30000 presented
1 714466 Trantow-Barrows Craig Booker Debra Henley Software 1 10000 presented
2 714466 Trantow-Barrows Craig Booker Debra Henley Maintenance 2 5000 pending
3 737550 Fritsch, Russel and Anderson Craig Booker Debra Henley CPU 1 35000 declined
4 146832 Kiehn-Spinka Daniel Hilton Debra Henley CPU 2 65000 won
5 218895 Kulas Inc Daniel Hilton Debra Henley CPU 2 40000 pending
6 218895 Kulas Inc Daniel Hilton Debra Henley Software 1 10000 presented
7 412290 Jerde-Hilpert John Smith Debra Henley Maintenance 2 5000 pending
8 740150 Barton LLC John Smith Debra Henley CPU 1 35000 declined
9 141962 Herman LLC Cedric Moss Fred Anderson CPU 2 65000 won
10 163416 Purdy-Kunde Cedric Moss Fred Anderson CPU 1 30000 presented
11 239344 Stokes LLC Cedric Moss Fred Anderson Maintenance 1 5000 pending
12 239344 Stokes LLC Cedric Moss Fred Anderson Software 1 10000 presented
13 307599 Kassulke, Ondricka and Metz Wendy Yule Fred Anderson Maintenance 3 7000 won
14 688981 Keeling LLC Wendy Yule Fred Anderson CPU 5 100000 won
15 729833 Koepp Ltd Wendy Yule Fred Anderson CPU 2 65000 declined
16 729833 Koepp Ltd Wendy Yule Fred Anderson Monitor 2 5000 presented

通过read_clipboard()

这个方法是将我们剪贴板中的内容直接变成DataFrame型数据,不需要其他转换。

我们需要事先在表格中剪贴好数据,然后直接运行下面的代码:

1
2
3
4
# 现在剪贴板中进行赋值,再执行下面的语句

df2 = pd.read_clipboard()
df2
Account Name Rep Manager Product Quantity Price Status
0 714466 Trantow-Barrows Craig Booker Debra Henley CPU 1 30000 presented
1 714466 Trantow-Barrows Craig Booker Debra Henley Software 1 10000 presented
2 714466 Trantow-Barrows Craig Booker Debra Henley Maintenance 2 5000 pending
3 737550 Fritsch, Russel and Anderson Craig Booker Debra Henley CPU 1 35000 declined
4 146832 Kiehn-Spinka Daniel Hilton Debra Henley CPU 2 65000 won
5 218895 Kulas Inc Daniel Hilton Debra Henley CPU 2 40000 pending
6 218895 Kulas Inc Daniel Hilton Debra Henley Software 1 10000 presented
7 412290 Jerde-Hilpert John Smith Debra Henley Maintenance 2 5000 pending
8 740150 Barton LLC John Smith Debra Henley CPU 1 35000 declined
9 141962 Herman LLC Cedric Moss Fred Anderson CPU 2 65000 won
10 163416 Purdy-Kunde Cedric Moss Fred Anderson CPU 1 30000 presented
11 239344 Stokes LLC Cedric Moss Fred Anderson Maintenance 1 5000 pending
12 239344 Stokes LLC Cedric Moss Fred Anderson Software 1 10000 presented
13 307599 Kassulke, Ondricka and Metz Wendy Yule Fred Anderson Maintenance 3 7000 won
14 688981 Keeling LLC Wendy Yule Fred Anderson CPU 5 100000 won
15 729833 Koepp Ltd Wendy Yule Fred Anderson CPU 2 65000 declined
16 729833 Koepp Ltd Wendy Yule Fred Anderson Monitor 2 5000 presented

通过数据类型选择columns

1
2
3
4
5
6
df3 = pd.DataFrame({'name':['xiaoming','zhansan','lisi'],
'age':[22,28,25],
'address':['shenzhen','guangzhou','changsha'],
'height':[178,180,176]
})
df3
name age address height
0 xiaoming 22 shenzhen 178
1 zhansan 28 guangzhou 180
2 lisi 25 changsha 176
1
df3.dtypes
name       object
age         int64
address    object
height      int64
dtype: object
1
# 我们直接选择int64型的数据
1
df3.select_dtypes(include='int64')
age height
0 22 178
1 28 180
2 25 176
1
2
3
# 选择除了int64之外的数据

df3.select_dtypes(exclude='int64')
name address
0 xiaoming shenzhen
1 zhansan guangzhou
2 lisi changsha
1
2
3
# 同时选择多种数据类型

df3.select_dtypes(include=['int64','object'])
name age address height
0 xiaoming 22 shenzhen 178
1 zhansan 28 guangzhou 180
2 lisi 25 changsha 176

将strings转成numbers

两种方法将字符串改成数值型数据

  • 指定类型用:astype()
  • to_numeric()
1
2
3
4
5
df4 = pd.DataFrame({'goods':['A','B','C'],
'price':['30','20','60'],
'sales':['50','-','40']}
)
df4
goods price sales
0 A 30 50
1 B 20 -
2 C 60 40
1
df4.dtypes
goods    object
price    object
sales    object
dtype: object
1
2
3
4
5
# 将price转成int型

# 等价: df4 = df4.astype({'price': 'int'})
df4['price'] = df4['price'].astype('int64')
df4
goods price sales
0 A 30 50
1 B 20 -
2 C 60 40
1
df4.dtypes
goods    object
price     int64
sales    object
dtype: object

使用同样的方法转化sales列数据则会报错:

1
2
df4['sales'] = pd.to_numeric(df4['sales'], errors='coerce')
df4
goods price sales
0 A 30 50.0
1 B 20 NaN
2 C 60 40.0
1
df4.dtypes
goods     object
price      int64
sales    float64
dtype: object

本文标题:pandas技巧_剪贴板生成DF+数据转化+字符串转数值

发布时间:2020年10月19日 - 20:10

原始链接:http://www.renpeter.cn/2020/10/19/pandas%E6%8A%80%E5%B7%A7-%E5%89%AA%E8%B4%B4%E6%9D%BF%E7%94%9F%E6%88%90DF-%E6%95%B0%E6%8D%AE%E8%BD%AC%E5%8C%96-%E5%AD%97%E7%AC%A6%E4%B8%B2%E8%BD%AC%E6%95%B0%E5%80%BC.html

许可协议: 署名-非商业性使用-禁止演绎 4.0 国际 转载请保留原文链接及作者。

Coffee or Tea