pandas series 7: Pivot tables and cross-tabulations

Posted on 2019-10-11 | In pandas, Documentation |

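A pivot table (pivot_table) is a data summarization tool commonly found in spreadsheets and other data analysis software.

It aggregates data by one or more keys.

It arranges the aggregated data into a rectangular layout according to the group keys along the rows and columns.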
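
The idea of aggregating by keys and laying the results out on a row/column grid can be sketched as follows. This is a minimal illustration, not the post's own example: the `sales` frame and its column names are made up here, and `pd.crosstab` is shown alongside `pd.pivot_table` because the title mentions cross-tabulations.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East"],
    "product": ["A", "B", "A", "B", "A"],
    "amount": [10, 20, 30, 40, 50],
})

# Rows are grouped by region, columns by product; each cell sums the amounts
table = pd.pivot_table(sales, index="region", columns="product",
                       values="amount", aggfunc="sum")
print(table)

# crosstab counts how often each (region, product) pair occurs
counts = pd.crosstab(sales["region"], sales["product"])
print(counts)
```

The `aggfunc` argument decides what happens when several rows fall into the same cell; here the two East/A rows are summed into one value.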

Understanding pandas pivot tables in one article

pandas series 6: Reshaping (reshape)

Posted on 2019-10-11 | In pandas, Documentation |

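The basic operations for rearranging tabular data are called reshaping or pivoting.

stack: rotates the columns of the data into rows; the labels A and B move from the columns to the row index
unstack: rotates the rows of the data into columns; the labels A and B move from the row index to the columns

Key points

How to use stack and unstack
How to swap the positions of rows and columns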
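
A small sketch of both operations, assuming a made-up frame with columns A and B (the index labels `x` and `y` are hypothetical). Transposition with `.T` is used here as one straightforward way to swap rows and columns; the post itself may do it differently.

```python
import pandas as pd

# Hypothetical frame with columns A and B
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# stack: the column labels A and B become the inner level of the row index
stacked = df.stack()
print(stacked)

# unstack: the inner index level moves back to the columns
restored = stacked.unstack()
print(restored)

# Transposing swaps the row index and the column labels
swapped = df.T
print(swapped)
```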

pandas series 5: Grouping with groupby

Posted on 2019-10-10 | In pandas, Documentation |

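groupby is one of the most important functions in pandas, used mainly for data aggregation and per-group computation. It follows the "split-apply-combine" idea.

Split: groupby splits the data by some column and returns a grouped object.

Apply: a function, either built-in or user-defined, is applied to each group, for example through apply(function).

Combine: the per-group results are merged into a single Series or DataFrame, depending on the function applied.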
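
The three steps can be sketched like this. The `team` and `score` columns are invented for illustration, and the named-aggregation form of `agg` assumes pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical scores, invented for illustration
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [1, 2, 3, 4, 5],
})

# Split: one group per distinct value of "team"
grouped = df.groupby("team")

# Apply a built-in aggregation; the combined result is a Series
totals = grouped["score"].sum()

# Apply a user-defined function to each group's scores
spread = grouped["score"].apply(lambda s: s.max() - s.min())

# Several aggregations at once; the combined result is a DataFrame
summary = grouped.agg(total=("score", "sum"), mean=("score", "mean"))
print(summary)
```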

pandas grouping and aggregation in detail

sqlzoo exercise 17: GROUP BY and HAVING

Posted on 2019-10-9 | In MySQL, sqlzoo exercises |

GROUP BY and HAVING

By including a GROUP BY clause, functions such as SUM and COUNT are applied to groups of items sharing values. When you specify GROUP BY continent, the result is that you get only one row for each different value of continent. All the other columns must be "aggregated" by one of SUM, COUNT …

The HAVING clause allows us to filter the groups which are displayed. The WHERE clause filters rows before the aggregation; the HAVING clause filters after the aggregation.

WHERE filters first
GROUP BY comes in between
HAVING filters last
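
The ordering above can be tried out with a small sketch. Note the assumptions: it uses Python's built-in `sqlite3` module rather than MySQL so that it runs without a server, and the `world` table holds made-up rows with fictional country names and populations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        # Fictional rows, invented for illustration
        ("Aland", "Europe", 50),
        ("Borduria", "Europe", 200),
        ("Cascadia", "Asia", 300),
        ("Deltora", "Asia", 5),
        ("Erewhon", "Africa", 20),
    ],
)

rows = conn.execute(
    """
    SELECT continent, SUM(population) AS total
    FROM world
    WHERE population >= 10          -- filters rows before grouping
    GROUP BY continent              -- one row per continent
    HAVING SUM(population) >= 100   -- filters groups after aggregation
    ORDER BY continent
    """
).fetchall()
print(rows)
conn.close()
```

The WHERE clause drops Deltora before any sums are computed, while HAVING drops the whole Africa group because its total is too small.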

Golang journey 20: File operations

Posted on 2019-10-9 | In go |

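File operations live in the os package; the main functions and methods include Create, Open, OpenFile, Read, and ReadAt (read at a given offset).

Reading files

Read()
bufio: buffered reading, for example line by line
ioutil: quick whole-file reading

Writing files

Write()
bufio
ioutil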

pandas series 0: A roundup of basic operations

Posted on 2019-10-8 | In pandas, Documentation |

Reading and writing files

Read	Write
read_csv	to_csv
read_excel	to_excel
read_hdf	to_hdf
read_sql	to_sql
read_json	to_json
read_msgpack (experimental)	to_msgpack (experimental)
read_html	to_html
read_gbq (experimental)	to_gbq (experimental)
read_stata	to_stata
read_sas	(no writer; pandas has no to_sas)
read_clipboard	to_clipboard
read_pickle	to_pickle (faster than CSV)
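
As a quick sketch of how the reader and writer pairs fit together, the round trip below writes a frame to CSV and JSON text and reads it back. The frame is made up, and `io.StringIO` stands in for a file on disk so that nothing is written to the filesystem.

```python
import io
import pandas as pd

# Hypothetical frame, invented for illustration
df = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})

# to_csv returns the CSV text when no path is given
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# The same round trip through JSON, one record per row
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```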

Saving a file

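import pandas as pd

# test_df and predictions are assumed to come from earlier steps
submission = pd.DataFrame({'PassengerId': test_df['PassengerId'], 'Survived': predictions})
# index=False leaves the row index out of the written file
submission.to_csv("submission.csv", index=False)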

Data structures roundup

Posted on 2019-10-8 | In Algorithms, Data structures |

A summary of common data structures, including:

Linked list
Array
Hash table
Heap
Binary tree
Stack
Queue

break/continue

Posted on 2019-10-8 | In python, Advanced |

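Three control-flow statements come up constantly in Python: if, break, and continue. This post covers the latter two and also explains how indentation affects what the code does.

break
How different indentation changes the output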
continue
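
A minimal sketch of the three topics; the function names are made up for illustration and do not come from the post.

```python
def first_even(numbers):
    """Return the first even number, or None if there is none."""
    found = None
    for n in numbers:
        if n % 2 == 0:
            found = n
            break          # leave the loop entirely
    return found


def odds_only(numbers):
    """Collect the odd numbers, skipping the even ones."""
    result = []
    for n in numbers:
        if n % 2 == 0:
            continue       # skip the rest of this iteration
        result.append(n)
    return result


def count_until(numbers, stop):
    """Count the items seen before the first occurrence of stop."""
    seen = 0
    for n in numbers:
        if n == stop:
            break
        # This line is indented under the for, not under the if,
        # so it runs on every iteration that does not hit break.
        seen += 1
    return seen


print(first_even([1, 3, 4, 6]))
print(odds_only([1, 2, 3, 4, 5]))
print(count_until([5, 6, 7, 8], 7))
```

Moving `seen += 1` one level deeper, under the `if`, would place it after `break` where it could never run, which is the kind of indentation effect the post discusses.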

Python higher-order functions

Posted on 2019-10-8 | In python, Advanced |

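This mind map introduces several of the more important higher-order functions in Python and how to use them, along with three ways of reading files. Hopefully it helps those learning Python.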

pandas series 4: Merging and joining

Posted on 2019-10-8 | In pandas, Documentation |

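The `concat` function

concat glues values and indexes together directly. By default it works along axis=0, where concatenating Series yields a new Series; with axis=1 the result is a DataFrame.

axis
- axis=0: the default; concatenating Series gives a Series
- axis=1: gives a DataFrame, with missing values filled with NaN
join
- outer: the default; takes the union of the labels and fills missing values with NaN
- inner: takes the intersection; labels not shared by all inputs are dropped
keys: builds a hierarchical index
ignore_index: discards the index along the concatenation axis and generates a new one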
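
Each of the parameters above can be seen in a short sketch. The two Series and their labels are made up for illustration; they overlap only on label `b` so that the outer and inner joins differ.

```python
import pandas as pd

# Hypothetical Series that share only the label "b"
s1 = pd.Series([1, 2], index=["a", "b"])
s2 = pd.Series([3, 4], index=["b", "c"])

# axis=0 (default): values and index labels are simply appended, giving a Series
stacked = pd.concat([s1, s2])

# axis=1: a DataFrame; the default outer join fills the gaps with NaN
wide = pd.concat([s1, s2], axis=1)

# join="inner": only labels present in every input survive
common = pd.concat([s1, s2], axis=1, join="inner")

# keys: adds an outer index level that records where each piece came from
keyed = pd.concat([s1, s2], keys=["one", "two"])

# ignore_index: the original labels are replaced by 0..n-1
renumbered = pd.concat([s1, s2], ignore_index=True)

print(wide)
print(keyed)
```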

Official documentation