import pandas as pd df = pd.read_excel('https://www.gairuo.com/file/data/dataset/team.xlsx') df
name team Q1 Q2 Q3 Q4 0 Liver E 89212464 1 Arry C 36373757 2 Ack A 57601884 3 Eorge C 93967178 4 Oah D 65496186 ... ... ... ... ... ... ... 95 Gabriel C 48598774 96 Austin7 C 21313043 97 Lincoln4 C 9893120 98 Eli E 11745891 99 Ben E 21434174 100 rows × 6 columns
team Q1 Q2 Q3 Q4 name Liver E 89212464 Arry C 36373757 Ack A 57601884 Eorge C 93967178 Oah D 65496186 ... ... ... ... ... ... Gabriel C 48598774 Austin7 C 21313043 Lincoln4 C 9893120 Eli E 11745891 Ben E 21434174 100 rows × 5 columns\
team Q1 Q2 Q3 Q4 name Liver E 89212464 Arry C 36373757 Ack A 57601884 Eorge C 93967178 Oah D 65496186 Harlie C 24138743 Acob B 6195948 Lfie A 9109937 Reddie D 64935772 Oscar A 7792667# 选择多列 df[['team', 'Q1']] # 只看这两列,注意括号 df.loc[:, ['team', 'Q1']] # 和上一行效果一样
排序
df.sort_values([{要排序的列}],ascending=[{这些列是否升序}])
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 多字段排序 df.sort_values(['team', 'Q1'], ascending=[True, False])排序公式 team Q1 Q2 Q3 Q4 name Aaron A 9675558 Henry A 91157517 Nathan A 87776213 Dylan A 86876520 Blake A 7823939 ... ... ... ... ... ... Eli E 11745891 Jude E 8451365 Rory9 E 8125827 Jackson5 E 6101533 Finn E 415532
Q1 Q2 Q3 Q4 team A 1066 17 51.470588 97 B 975 22 54.636364 99 C 1056 22 48.545455 98 D 860 19 65.315789 99 E 963 20 44.050000 98
偏移
diff
跟numpy中的语义相同 比如查看所有学生Q1 到 Q4 每季度成绩增长
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
df.loc[:,'Q1':'Q4'].diff(1, axis=1)
Q1 Q2 Q3 Q4 NaN -68340 NaN 1020 NaN 3 -4266 NaN 3 -257 NaN -161225 ... ... ... ... NaN 1128 -13 NaN 10 -113 NaN -5 -9219 NaN 63 -1633 NaN 22 -233
shift
向下移动,实际上索引不会动,其他列全部下移一位
1 2 3 4 5 6 7 8 9 10
df.shift().tail()
team Q1 Q2 Q3 Q4 name Gabriel D 20.031.062.068.0 Austin7 C 48.059.087.074.0 Lincoln4 C 21.031.030.043.0 Eli C 98.093.01.020.0 Ben E 11.074.058.091.0
向上移动
1 2 3 4 5 6 7 8
df.shift().head() team Q1 Q2 Q3 Q4 name Liver None NaN NaN NaN NaN Arry E 89.021.024.064.0 Ack C 36.037.037.057.0 Eorge A 57.060.018.084.0 Oah C 93.096.071.078.0
计算
DataFrame可以执行常见的运算符+-*/|& 等
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
df[['Q1','Q2']]*2
Q1 Q2 name Liver 17842 Arry 7274 Ack 114120 Eorge 186192 Oah 13098 ... ... ... Gabriel 96118 Austin7 4262 Lincoln4 196186 Eli 22148 Ben 4286
两个矩阵相加
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
df['Q1']+df['Q2']
name Liver 110 Arry 73 Ack 117 Eorge 189 Oah 114 ... Gabriel 107 Austin7 52 Lincoln4 191 Eli 85 Ben 64 Length: 100, dtype: int64
name team Q1 Q2 Q3 Q4 0 Liver E 89212464 3 Eorge C 93967178 12 Archie C 83895968 17 Henry A 91157517 19 Max E 9775413 25 Harrison B 89131875 32 Alexander C 91762679 33 Adam C 90324739 36 Jaxon E 88981998 38 Elijah B 97891546 42 Dylan A 86876520 70 Nathan A 87776213 77 Michael B 89215992 80 Ryan E 92706431 83 Albert0 B 85384117 88 Aaron A 9675558 97 Lincoln4 C 9893120
team Q1 Q2 Q3 Q4 name Zachary E 12718593 Jackson5 E 6101533 Harvey2 B 43768790 Elliott B 9313360 Aiden D 20316268
df.query('Q1 + Q2 > 180')
team Q1 Q2 Q3 Q4 name Eorge C 93967178 Jaxon E 88981998 Elijah B 97891546 Lincoln4C 9893120
df.query('team !="C"')
team Q1 Q2 Q3 Q4 name Liver E 89212464 Ack A 57601884 Oah D 65496186 Acob B 6195948 Lfie A 9109937 ... ... ... ... ... ... Grayson7B 59847433 Jamie0 B 39978455 Aiden D 20316268 Eli E 11745891 Ben E 21434174 78 rows × 5 columns