๊ด€๋ฆฌ ๋ฉ”๋‰ด

yeon's ๐Ÿ‘ฉ๐Ÿป‍๐Ÿ’ป

[๋ฐ์ดํ„ฐ ๋ถ„์„] ๋ฐ์ดํ„ฐ ์ง‘๊ณ„ | groupby, pivot table ๋ณธ๋ฌธ

Computer ๐Ÿ’ป/๋ฐ์ดํ„ฐ ๋ถ„์„

[๋ฐ์ดํ„ฐ ๋ถ„์„] ๋ฐ์ดํ„ฐ ์ง‘๊ณ„ | groupby, pivot table

yeon42 2021. 7. 19. 19:39
728x90
๋ฐ์ดํ„ฐ ์ง‘๊ณ„ํ•˜๊ธฐ (groupby & pivot_table ๋น„๊ต)

 

 

  • ์ง€์—ญ๋ช…์œผ๋กœ ๋ถ„์–‘๊ฐ€๊ฒฉ์˜ ํ‰๊ท ์„ ๊ตฌํ•˜๊ธฐ
df_last.groupby(["์ง€์—ญ๋ช…"])["ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ"].mean()
pd.pivot_table(df_last, index=["์ง€์—ญ๋ช…"], values=["ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ"], aggfunc="mean")

- aggfunc๋Š” ์ž๋™์œผ๋กœ ํ‰๊ท ๊ฐ’ ์ถœ๋ ฅ

 

groupby
pivot_table

 

 

 

  • ์ „์šฉ๋ฉด์ ์œผ๋กœ ๋ถ„์–‘๊ฐ€๊ฒฉ์˜ ํ‰๊ท  ๊ตฌํ•˜๊ธฐ
df_last.groupby(["์ „์šฉ๋ฉด์ "])["ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ"].mean()
pd.pivot_table(df_last, index="์ „์šฉ๋ฉด์ ", values="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ")

groupby
pivot_table

 

 

  • ์ง€์—ญ๋ช…, ์ „์šฉ๋ฉด์ ์œผ๋กœ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ์˜ ํ‰๊ท  ๊ตฌํ•˜๊ธฐ
df_last.groupby(["์ง€์—ญ๋ช…, ์ „์šฉ๋ฉด์ "])["ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ"].mean().unstack().round()
df_last.pivot_table(index="์ „์šฉ๋ฉด์ ", columns="์ง€์—ญ๋ช…", values="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ").round()

- unstack() : ๋งˆ์ง€๋ง‰ ์ธ๋ฑ์Šค๊ฐ€ ์ปฌ๋Ÿผ๊ฐ’์œผ๋กœ ์˜ค๊ฒŒ ๋จ

- round() : ์†Œ์ˆ˜์  ์ œ๊ฑฐ

- T / transpose() : ํ–‰๊ณผ ์—ด ๊ต์ฒด

 

groupby
pivot_table

 

 

 

  • ์—ฐ๋„, ์ง€์—ญ๋ช…์œผ๋กœ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ์˜ ํ‰๊ท  ๊ตฌํ•˜๊ธฐ
g = df_last.groupby(["์—ฐ๋„", "์ง€์—ญ๋ช…"])["ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ"].mean()
p = pd.pivot_table(df_last, index=["์—ฐ๋„", "์ง€์—ญ๋ช…"], values="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ")
p.loc(2018)

- loc() : ํ–‰์„ ๊ธฐ์ค€์œผ๋กœ ๊ฐ€์ ธ์˜ฌ ๋•Œ

   - loc(2018) : 2018๋…„ ๋ฐ์ดํ„ฐ๋งŒ ๋ณด๊ณ ์‹ถ์€ ๊ฒฝ์šฐ

 

groupby
pivot_table

 

 

 


 

 

  • groupby ๋ณด๋‹จ pivot_table์ด ์กฐ๊ธˆ ๋” ์ง๊ด€์ ์œผ๋กœ ์‚ฌ์šฉ ๊ฐ€๋Šฅ
  • groupby๋Š” ๊ฒฐ๊ณผ๊ฐ’์ด series ํ˜•ํƒœ์ด๋ฏ€๋กœ pivot_table๋ณด๋‹ค ์—ฐ์‚ฐ์˜ ๊ฒฐ๊ณผ๊ฐ€ ๋” ๋น ๋ฅด๋‹ค.
Comments