pandasNote2

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

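Reindexing

Reindexing does not modify the original data; it returns a new object.

  • Row index
    • Series.reindex
    • DF.reindex()
  • Column index
    • Specified with the columns keyword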
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
# Reindex a Series: values are rearranged to match the new index
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
obj2
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
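As a quick check of the claim that reindexing leaves the original untouched, the sketch below compares the Series before and after the call. The `fill_value=0` argument is an addition for illustration and is not used in the original notes.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# reindex returns a new Series; obj keeps its original order and labels
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
print(list(obj.index))      # ['d', 'b', 'a', 'c']
print(obj2.isna().sum())    # 1 (the new label 'e' has no value)

# fill_value replaces the NaN that a missing label would otherwise get
obj_filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
print(obj_filled['e'])      # 0.0
```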
obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
obj3
0      blue
2    purple
4    yellow
dtype: object
# ffill (forward fill): each missing label takes the last preceding value
obj3.reindex(range(6), method='ffill')
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
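For contrast with the forward fill above, here is a small sketch that also tries `method='bfill'` (backward fill), which the original notes do not cover. With backward fill, a label that has no later value stays missing.

```python
import pandas as pd

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Forward fill copies the previous value; backward fill copies the next one
forward = obj3.reindex(range(6), method='ffill')
backward = obj3.reindex(range(6), method='bfill')

print(forward.loc[1])            # blue
print(backward.loc[1])           # purple
# Label 5 has no later value to copy, so it remains missing
print(pd.isna(backward.loc[5]))  # True
```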
# Build a DataFrame to reindex
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
frame
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
# Reindex the rows; the new label 'b' is filled with NaN
frame.reindex(["a", "b", "c", "d"])
   Ohio  Texas  California
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0
# Reindex the columns
frame.reindex(columns=["Ohio", "Utah", "California"])
   Ohio  Utah  California
a     0   NaN           2
c     3   NaN           5
d     6   NaN           8
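Rows and columns can also be reindexed in one call. The sketch below passes both `index=` and `columns=` to `DataFrame.reindex`; combining them this way is an illustration and does not appear in the original notes.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

# Reindex both axes at once; new labels ('b', 'Utah') are filled with NaN
both = frame.reindex(index=['a', 'b', 'c', 'd'],
                     columns=['Ohio', 'Utah', 'California'])
print(both.shape)                   # (4, 3)
print(both.loc['b'].isna().all())   # True

# The original frame is unchanged
print(frame.shape)                  # (3, 3)
```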
# drop and similar methods return a new object by default and leave the original data unchanged
# Pass inplace=True to modify the original object instead
print(obj)
obj.drop('c', inplace=True)
obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

d    4.5
b    7.2
a   -5.3
dtype: float64
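To make the difference between the two modes concrete, this sketch runs `drop` both ways on the same Series and inspects what each call returns. The variable names `dropped` and `result` are only for illustration.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# Default: a new Series comes back and obj is unchanged
dropped = obj.drop('c')
print('c' in obj.index)      # True
print('c' in dropped.index)  # False

# inplace=True: obj itself is modified and the call returns None
result = obj.drop('c', inplace=True)
print(result)                # None
print('c' in obj.index)      # False
```

Because the in-place call returns None, assigning its result back to `obj` would discard the data.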

Dropping entries from an axis

  • drop(index)
  • drop([index1, index2])
obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
obj
a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64
# Drop a single entry
new_obj = obj.drop('c')
new_obj
a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64

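Deleting data

  • Rows: axis=0 (the default)
  • Columns: axis=1 or axis='columns'
  • To delete a single entry, pass its label
  • To delete several entries, pass a list of labels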
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
# By default, drop removes rows
data.drop(['Colorado', 'Ohio'])
          one  two  three  four
Utah        8    9     10    11
New York   12   13     14    15
# axis=1: drop a column
data.drop('two', axis=1)
          one  three  four
Ohio        0      2     3
Colorado    4      6     7
Utah        8     10    11
New York   12     14    15
# Drop several columns
data.drop(['two', 'four'], axis='columns')
          one  three
Ohio        0      2
Colorado    4      6
Utah        8     10
New York   12     14
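As an alternative to the axis argument, `DataFrame.drop` also accepts `index=` and `columns=` keywords, and an `errors='ignore'` option. Neither appears in the original notes; the sketch below is one way they might be used, and the label 'Texas' is a made-up missing label.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# index= and columns= name the axis directly, so no axis argument is needed
trimmed = data.drop(index=['Colorado', 'Ohio'], columns=['two', 'four'])
print(list(trimmed.index))    # ['Utah', 'New York']
print(list(trimmed.columns))  # ['one', 'three']

# errors='ignore' skips labels that do not exist instead of raising KeyError
same = data.drop('Texas', errors='ignore')
print(same.shape)             # (4, 4)
```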

Selecting rows

  • loc: selects by axis labels
  • iloc: selects by integer position
data
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
# Label-based indexing
data.loc['Colorado', ['two', 'three']]
two      5
three    6
Name: Colorado, dtype: int32
# Slicing: the first indexer selects rows, the second selects columns
data.loc[:'Utah', 'two']
Ohio        1
Colorado    5
Utah        9
Name: two, dtype: int32
# Integer position indexing
data.iloc[2, [3, 0, 1]]
four    11
one      8
two      9
Name: Utah, dtype: int32
data.iloc[[1, 2], [3, 0, 1]]
          four  one  two
Colorado     7    4    5
Utah        11    8    9
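One difference worth seeing side by side is how the two indexers treat slice end points: loc includes the end label, while iloc excludes the end position. A minimal sketch on the same DataFrame (the names `by_label` and `by_position` are only for illustration):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# loc slices by label and includes both end labels
by_label = data.loc['Ohio':'Utah', 'one':'three']
print(by_label.shape)     # (3, 3)

# iloc slices by position and excludes the end position
by_position = data.iloc[0:2, 0:2]
print(by_position.shape)  # (2, 2)
```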

Integer indexes

ser = pd.Series(np.arange(3.))
ser
0    0.0
1    1.0
2    2.0
dtype: float64
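# An integer key on a label index falls back to position here;
# newer pandas versions deprecate this, and ser2.iloc[-1] is unambiguous
ser2 = pd.Series(np.arange(3.), index=['a', 'b', 'c'])
ser2[-1]
2.0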
# A plain slice on an integer index is positional and excludes the end point
ser[:1]
0    0.0
dtype: float64
# loc slices by label and includes the end label
ser.loc[:2]
0    0.0
1    1.0
2    2.0
dtype: float64
# iloc slices by position and excludes the end position
ser.iloc[:2]
0    0.0
1    1.0
dtype: float64
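The reason integer indexes need care is that a bare integer key is ambiguous between label and position. The sketch below shows the failure on an integer-indexed Series and uses `iloc`, which is always positional, as one way around it. The flag `raised` is only for illustration.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.))                          # integer index 0, 1, 2
ser2 = pd.Series(np.arange(3.), index=['a', 'b', 'c'])  # label index

# On an integer index, ser[-1] is looked up as the label -1, which is absent
try:
    ser[-1]
    raised = False
except KeyError:
    raised = True
print(raised)         # True

# iloc is always positional, so it works the same on both Series
print(ser.iloc[-1])   # 2.0
print(ser2.iloc[-1])  # 2.0
```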
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
# Select columns by passing a list of names
data[['three', 'one']]
          three  one
Ohio          2    0
Colorado      6    4
Utah         10    8
New York     14   12
# Select rows with a boolean Series
data[data['three'] > 5]
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
data < 5
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False
# Set every value below 5 to 0 using a boolean DataFrame
data[data < 5] = 0
data
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
data.loc['Colorado', ['two', 'three']]
two      5
three    6
Name: Colorado, dtype: int32
# Take the first three columns of every row, then keep the rows where 'three' > 5
data.iloc[:, :3][data.three > 5]
          one  two  three
Colorado    0    5      6
Utah        8    9     10
New York   12   13     14
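The chained selection above can also be written as a single loc call that takes the boolean mask for rows and a column list together. This is a sketch of an equivalent form, not something the original notes use; `subset` and `chained` are illustrative names.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data[data < 5] = 0  # same state as in the notes above

# One loc call: boolean mask for the rows, explicit labels for the columns
subset = data.loc[data['three'] > 5, ['one', 'two', 'three']]
print(list(subset.index))      # ['Colorado', 'Utah', 'New York']

# Same result as the two-step chained form
chained = data.iloc[:, :3][data.three > 5]
print(subset.equals(chained))  # True
```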
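Title: pandasNote2

Published: 2019-10-04, 20:10

Original link: http://www.renpeter.cn/2019/10/04/pandasNote2.html

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and author credit when reposting.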