重新排列表格型数据的基础运算称之为重塑
reshape
或者轴向旋转pivot
stack:
将数据的列旋转成行,AB由列属性变成行索引unstack:
将数据的行旋转成列,AB由行索引变成列属性
重点知识
- stack和unstack的用法
- 如何实现行和列的位置互换
层次化索引 MultiIndex
- 数据分散在不同的文件或者数据库中
- 层次化索引在⼀个轴上拥有多个(两个以上)索引级别
- 低维度形式处理高维度数据
1 | import pandas as pd |
1 | data = pd.DataFrame(np.arange(6).reshape(2,3) |
number | one | two | three |
---|---|---|---|
state | |||
Inhio | 0 | 1 | 2 |
Colorado | 3 | 4 | 5 |
列行的互转
1 | res = data.stack() # 列转成行 |
1 | state number |
1 | res.index |
1 | MultiIndex(levels=[['Inhio', 'Colorado'], ['one', 'two', 'three']], |
1 | type(res) # res 是S型对象 |
1 | pandas.core.series.Series |
1 | res.unstack() # 行转成列:one、two、three 变成列属性 |
number | one | two | three |
---|---|---|---|
state | |||
Inhio | 0 | 1 | 2 |
Colorado | 3 | 4 | 5 |
1 | res.unstack(1) # 默认操作最内层 |
number | one | two | three |
---|---|---|---|
state | |||
Inhio | 0 | 1 | 2 |
Colorado | 3 | 4 | 5 |
实现行索引和列属性的位置互换
1 | res.unstack(0) # 实现了将行索引和列属性的位置互换 |
state | Inhio | Colorado |
---|---|---|
number | ||
one | 0 | 3 |
two | 1 | 4 |
three | 2 | 5 |
1 | s1 = pd.Series([0,1,2,3], index=['a','b','c','d']) |
1 | data1 = pd.concat([s1,s2], keys=['one', 'two']) |
1 | one a 0 |
1 | data1.unstack() # 行索引转成列属性,unstack引入缺失值 |
a | b | c | d | e | |
---|---|---|---|---|---|
one | 0.0 | 1.0 | 2.0 | 3.0 | NaN |
two | NaN | NaN | 4.0 | 5.0 | 6.0 |
1 | data1.unstack().stack() # 操作可逆,stack操作默认过滤缺失值 |
1 | one a 0.0 |
1 | data1.unstack().stack(dropna=False) # 通过dropna参数保留缺失值 |
1 | one a 0.0 |
官网demo
1 | tuples = list(zip(*[['bar', 'bar', 'baz', 'baz', |
1 | [('bar', 'one'), |
zip
用法
1 | zip(*[['bar', 'bar', 'baz', 'baz', |
1 | <zip 0x1d68ccb0d08> |
1 | list(zip(*[["name", "age", "fee"], # zip函数的用法 |
1 | [('name', 'xiaoming'), ('age', 18), ('fee', '28.22')] |
1 | index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second']) # 错层次索引如何创建 |
1 | # index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second']) |
1 | MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']], |
1 | df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B']) |
A | B | ||
---|---|---|---|
first | second | ||
bar | one | -0.904164 | 0.001987 |
two | -0.299222 | -1.895165 | |
baz | one | 2.029077 | -0.618868 |
two | -3.423209 | 0.621895 | |
foo | one | 0.278142 | -0.029705 |
two | -1.875680 | -1.225554 | |
qux | one | 0.593413 | -1.241949 |
two | -0.727231 | 0.394393 |
1 | df2 = df[:4] |
A | B | ||
---|---|---|---|
first | second | ||
bar | one | -0.904164 | 0.001987 |
two | -0.299222 | -1.895165 | |
baz | one | 2.029077 | -0.618868 |
two | -3.423209 | 0.621895 |
1 | stacked = df2.stack() # 列---->行 |
1 | first second |
1 | stacked.unstack() # 行----->列 |
first | bar | baz | |
---|---|---|---|
second | |||
one | A | -0.904164 | 2.029077 |
B | 0.001987 | -0.618868 | |
two | A | -0.299222 | -3.423209 |
B | -1.895165 | 0.621895 |
1 | stacked.unstack(1) |
second | one | two | |
---|---|---|---|
first | |||
bar | A | -0.904164 | -0.299222 |
B | 0.001987 | -1.895165 | |
baz | A | 2.029077 | -3.423209 |
B | -0.618868 | 0.621895 |
关于unstack的数字标签
unstack(1) =unstack("second")
,默认是最里层标签unstack(0) =unstack("first")
pivot
本质
DF的pivot
本质上就是set_index
先创建层次化索引,再利用unstack
进行重塑。
左边的表格类似于是
Excel
或者MySQL
中的存储形式,通过轴向转换变成右边的DataFrame
型数据。
1 | df[df['bar'] == 'A'] # select out everything for variable A |
官网demo
1 | In [1]: df |