
pandas in Practice: Filling Data

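This post records a requirement I ran into recently while processing data at work: filling in columns according to a specified rule. The data is simulated, but it resembles real business data.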

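Simulated data

Description

Data

The DataFrame has two columns, time and userid, holding the date and the user's name respectively. Both contain duplicate values: a user can have several records, and a date can appear in several rows.

Requirement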

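Add three columns, 二十九, 三十 and 三十一 (Chinese for "twenty-nine", "thirty" and "thirty-one"), whose values can only be 0 or 1:

  • If a user logged in on the 29th, set 二十九 to 1 in all of that user's records; otherwise set it to 0.
  • The 30th and 31st follow the same rule for 三十 and 三十一.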
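The rule can also be read per user: a user's flag for a given day is 1 if any of that user's rows has that date. The sketch below is an alternative, not the author's method; it encodes that reading with groupby plus transform("any"), which broadcasts each user's result back onto every one of their rows. The mapping day_to_col is a hypothetical helper introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["2020-05-28", "2020-05-28", "2020-05-28", "2020-05-29", "2020-05-29",
             "2020-05-30", "2020-05-30", "2020-05-31", "2020-05-31"],
    "userid": ["xiaoming", "zhangsan", "lisi", "zhangsan", "wangwu",
               "lisi", "zhoujun", "wangwu", "xiaoming"],
})

# Hypothetical helper: which date feeds which flag column
day_to_col = {"2020-05-29": "二十九", "2020-05-30": "三十", "2020-05-31": "三十一"}

for day, col in day_to_col.items():
    # True only on rows whose own date is `day`
    on_day = df["time"].eq(day)
    # Per user: True if any of that user's rows is on `day`;
    # transform returns one value per original row, aligned to df's index
    df[col] = on_day.groupby(df["userid"]).transform("any").astype(int)

print(df)
```

Because the flags come out as integer 0/1 directly, this variant needs neither NaN placeholder columns nor a final fillna step; the trade-off is that it reads less literally than a row-by-row loop.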

Creating the simulated data

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time": ["2020-05-28", "2020-05-28", "2020-05-28", "2020-05-29", "2020-05-29",
             "2020-05-30", "2020-05-30", "2020-05-31", "2020-05-31"],
    "userid": ["xiaoming", "zhangsan", "lisi", "zhangsan", "wangwu",
               "lisi", "zhoujun", "wangwu", "xiaoming"],
    # 9 NaN placeholders per column, filled in later
    # ([np.nan] * 9 builds a list; np.nan * 9 would be a single scalar NaN)
    "二十九": [np.nan] * 9,
    "三十": [np.nan] * 9,
    "三十一": [np.nan] * 9,
})
```
```python
df
```

```
   time        userid    二十九  三十  三十一
0  2020-05-28  xiaoming  NaN     NaN   NaN
1  2020-05-28  zhangsan  NaN     NaN   NaN
2  2020-05-28  lisi      NaN     NaN   NaN
3  2020-05-29  zhangsan  NaN     NaN   NaN
4  2020-05-29  wangwu    NaN     NaN   NaN
5  2020-05-30  lisi      NaN     NaN   NaN
6  2020-05-30  zhoujun   NaN     NaN   NaN
7  2020-05-31  wangwu    NaN     NaN   NaN
8  2020-05-31  xiaoming  NaN     NaN   NaN
```
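Before filling anything, it can help to see which days each user logged in. As a quick check (an addition, not part of the original workflow), pd.crosstab counts records per user and date. A nonzero cell in the 2020-05-29 column means that user's 二十九 should be 1; for this data that is only zhangsan and wangwu.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["2020-05-28", "2020-05-28", "2020-05-28", "2020-05-29", "2020-05-29",
             "2020-05-30", "2020-05-30", "2020-05-31", "2020-05-31"],
    "userid": ["xiaoming", "zhangsan", "lisi", "zhangsan", "wangwu",
               "lisi", "zhoujun", "wangwu", "xiaoming"],
})

# Rows: userid, columns: time, cells: number of records for that user on that date
logins = pd.crosstab(df["userid"], df["time"])
print(logins)
```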

Solution

```python
# Assumes df has the default RangeIndex 0..n-1, so position i is also the row label
for i in range(len(df)):
    if df.loc[i, "time"] == "2020-05-29":  # this row's date is the 29th
        # .loc takes a row label and a column label
        # Select all rows of the current user, matched with isin()
        df1 = df[df["userid"].isin([df.loc[i, "userid"]])]
        for j in df1.index:  # j is the row label of each of that user's records
            df.loc[j, "二十九"] = 1  # set 二十九 to 1 on that row

    if df.loc[i, "time"] == "2020-05-30":
        df1 = df[df["userid"].isin([df.loc[i, "userid"]])]
        for j in df1.index:
            df.loc[j, "三十"] = 1

    if df.loc[i, "time"] == "2020-05-31":
        df1 = df[df["userid"].isin([df.loc[i, "userid"]])]
        for j in df1.index:
            df.loc[j, "三十一"] = 1
```
```python
df
```

```
   time        userid    二十九  三十  三十一
0  2020-05-28  xiaoming  NaN     NaN   1.0
1  2020-05-28  zhangsan  1.0     NaN   NaN
2  2020-05-28  lisi      NaN     1.0   NaN
3  2020-05-29  zhangsan  1.0     NaN   NaN
4  2020-05-29  wangwu    1.0     NaN   1.0
5  2020-05-30  lisi      NaN     1.0   NaN
6  2020-05-30  zhoujun   NaN     1.0   NaN
7  2020-05-31  wangwu    1.0     NaN   1.0
8  2020-05-31  xiaoming  NaN     NaN   1.0
```
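To see what the inner lookup returns, here it is for zhangsan: the labels of both of that user's rows.

```python
df1 = df[df["userid"].isin(["zhangsan"])]
df1.index
```

```
Int64Index([1, 3], dtype='int64')
```

(pandas 2.0 removed Int64Index, so newer versions print Index([1, 3], dtype='int64') instead.)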
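The isin lookup followed by the inner for j in df1.index loop can be collapsed into a single .loc assignment with a boolean mask. A minimal sketch, assuming the same data with a NaN placeholder column; users_29 is a name introduced here for illustration. Collecting the qualifying users once also avoids re-scanning the frame for every matching row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time": ["2020-05-28", "2020-05-28", "2020-05-28", "2020-05-29", "2020-05-29",
             "2020-05-30", "2020-05-30", "2020-05-31", "2020-05-31"],
    "userid": ["xiaoming", "zhangsan", "lisi", "zhangsan", "wangwu",
               "lisi", "zhoujun", "wangwu", "xiaoming"],
    "二十九": [np.nan] * 9,
})

# Selects the same rows as df[df["userid"].isin(["zhangsan"])]
mask = df["userid"] == "zhangsan"
print(df.index[mask].tolist())  # [1, 3]

# Users with at least one record on the 29th
users_29 = df.loc[df["time"] == "2020-05-29", "userid"].unique()
# One .loc assignment replaces both the outer row loop and the inner "for j" loop
df.loc[df["userid"].isin(users_29), "二十九"] = 1
print(df)
```

The same two lines, with the date and column name swapped, handle 三十 and 三十一; rows that are not matched keep their NaN, so the fillna step that follows still applies.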

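Filling the remaining values

The entries that are still NaN (users with no login on that day) can simply be filled with 0 using fillna. Note that fillna returns a new DataFrame rather than modifying df in place, so assign the result back to keep it:

```python
df = df.fillna(0)
df
```

```
   time        userid    二十九  三十  三十一
0  2020-05-28  xiaoming  0.0     0.0   1.0
1  2020-05-28  zhangsan  1.0     0.0   0.0
2  2020-05-28  lisi      0.0     1.0   0.0
3  2020-05-29  zhangsan  1.0     0.0   0.0
4  2020-05-29  wangwu    1.0     0.0   1.0
5  2020-05-30  lisi      0.0     1.0   0.0
6  2020-05-30  zhoujun   0.0     1.0   0.0
7  2020-05-31  wangwu    1.0     0.0   1.0
8  2020-05-31  xiaoming  0.0     0.0   1.0
```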
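Because the placeholder columns started as NaN, they are float64, which is why the filled values print as 0.0 and 1.0. If the flags should be integer 0/1, as the requirement describes, one option is to cast right after fillna. The frame below is rebuilt from the loop's output table so the snippet runs on its own; flag_cols is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
# State of df after the loop (values copied from the loop's output table)
df = pd.DataFrame({
    "time": ["2020-05-28", "2020-05-28", "2020-05-28", "2020-05-29", "2020-05-29",
             "2020-05-30", "2020-05-30", "2020-05-31", "2020-05-31"],
    "userid": ["xiaoming", "zhangsan", "lisi", "zhangsan", "wangwu",
               "lisi", "zhoujun", "wangwu", "xiaoming"],
    "二十九": [nan, 1, nan, 1, 1, nan, nan, 1, nan],
    "三十": [nan, nan, 1, nan, nan, 1, 1, nan, nan],
    "三十一": [1, nan, nan, nan, 1, nan, nan, 1, 1],
})

flag_cols = ["二十九", "三十", "三十一"]
# fillna returns a new frame; astype with a dict casts only the listed columns
df = df.fillna(0).astype({col: int for col in flag_cols})
print(df.dtypes)
```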

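Post title: pandas in Practice: Filling Data

Published: 2020-06-04 22:06

Original link: http://www.renpeter.cn/2020/06/04/pandas%E5%AE%9E%E6%88%98-%E5%A1%AB%E5%85%85%E6%95%B0%E6%8D%AE.html

License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). When reposting, please keep the original link and credit the author.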
