[Data Analysis] Pandas, Seaborn, plot, unstack
yeon42 2021. 8. 4. 23:07
- Group by 상권업종소분류명 (business subcategory) and 시군구명 (district), then count the 상호명 (business name) values to get frequencies
# Number of businesses per (subcategory, district) pair; g is a Series with a two-level MultiIndex
g = df_academy_selectd.groupby(["상권업종소분류명", "시군구명"])["상호명"].count()
1.2.4 Visualizing with Pandas plot
- Visualize only the '학원-입시' (exam-prep academy) data
g.loc["학원-입시"].sort_values().plot.barh(figsize=(10, 7))
g.plot.bar()
- Because the grouped data has a multi-index, the chart is hard to read. How can we improve it?
1.2.5 Understanding unstack()
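To see why the grouped result is hard to read, here is a minimal sketch on hypothetical toy rows (only the column names come from the post; the category '학원-어학' and the rows are invented). It builds g the same way and shows that passing an outer-level label to g.loc drops that level, which is what makes the '학원-입시' chart readable.

```python
import pandas as pd

# Hypothetical toy rows; only the column names come from the post
df = pd.DataFrame({
    "상권업종소분류명": ["학원-입시", "학원-입시", "학원-입시", "학원-어학", "학원-어학"],
    "시군구명": ["강남구", "강남구", "서초구", "강남구", "양천구"],
    "상호명": ["A", "B", "C", "D", "E"],
})

# Same pattern as in the post: two grouping keys, count of non-missing 상호명 per group
g = df.groupby(["상권업종소분류명", "시군구명"])["상호명"].count()

# Each index entry is a (subcategory, district) tuple; g.plot.bar() uses these tuples as x labels
print(g.index.tolist())

# An outer-level label in .loc drops that level and leaves a Series indexed by 시군구명
exam = g.loc["학원-입시"]

# Ascending order: barh draws the first row at the bottom, so the largest bar lands on top
print(exam.sort_values())
```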
https://pandas.pydata.org/docs/user_guide/reshaping.html
Reshaping and pivot tables — pandas 1.3.1 documentation
- Above, g had a multi-index.
- Apply unstack()
- It moves the inner index level (시군구명) into the columns, giving a pivot-table layout
- Plot the result as a barh chart
g.unstack().plot.barh(figsize=(8, 9))
g.unstack().loc["학원-입시"].plot.barh(figsize=(8, 9))
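A minimal sketch of what unstack() produces, using hypothetical toy counts with the same two-level index as g (the values and the '학원-어학' category are invented). It also shows a detail the charts hide: subcategory and district pairs that never occur become NaN, and unstack's fill_value parameter can report them as 0 instead.

```python
import pandas as pd

# Hypothetical counts indexed like g: (상권업종소분류명, 시군구명)
idx = pd.MultiIndex.from_tuples(
    [("학원-어학", "강남구"), ("학원-어학", "양천구"),
     ("학원-입시", "강남구"), ("학원-입시", "서초구"), ("학원-입시", "양천구")],
    names=["상권업종소분류명", "시군구명"],
)
g = pd.Series([1, 1, 5, 3, 4], index=idx, name="상호명")

# unstack() pivots the innermost level (시군구명) into columns:
# one row per subcategory, one column per district
wide = g.unstack()

# 학원-어학 has no entry for 서초구, so that cell is NaN; fill_value=0 stores 0 instead
wide0 = g.unstack(fill_value=0)
print(wide0)

# A row of the wide table holds the same numbers as g.loc["학원-입시"]
exam_row = wide0.loc["학원-입시"]
```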
- .T is shorthand for transpose()
g.unstack().T.plot.bar(figsize=(15, 5))
- Overall, 강남구 (Gangnam-gu), 서초구 (Seocho-gu), 양천구 (Yangcheon-gu), ... have the largest numbers of academies
1.2.6 Drawing the same charts with seaborn
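The "overall" reading of the transposed chart can also be checked numerically. A minimal sketch with hypothetical toy counts (not the post's data): after .T the districts are the rows, and summing each row ranks the districts by their total number of academies.

```python
import pandas as pd

# Hypothetical counts indexed like g: (상권업종소분류명, 시군구명)
idx = pd.MultiIndex.from_tuples(
    [("학원-어학", "강남구"), ("학원-어학", "양천구"),
     ("학원-입시", "강남구"), ("학원-입시", "서초구"), ("학원-입시", "양천구")],
    names=["상권업종소분류명", "시군구명"],
)
g = pd.Series([1, 1, 5, 3, 4], index=idx, name="상호명")

# Transposed pivot: rows = 시군구명 (x-axis of plot.bar), columns = 상권업종소분류명 (one bar color each)
by_district = g.unstack(fill_value=0).T

# Total across all subcategories per district, largest first
totals = by_district.sum(axis=1).sort_values(ascending=False)
print(totals)
```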
- Check the index of the grouped values
g.index
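g.index is a two-level MultiIndex whose level names are the two grouping columns. A minimal sketch on hypothetical toy counts of how reset_index() turns those levels into ordinary columns; the count column keeps the Series name 상호명, which is why it gets renamed to 상호수 (number of businesses).

```python
import pandas as pd

# Hypothetical counts indexed like g: (상권업종소분류명, 시군구명)
idx = pd.MultiIndex.from_tuples(
    [("학원-어학", "강남구"), ("학원-어학", "양천구"),
     ("학원-입시", "강남구"), ("학원-입시", "서초구"), ("학원-입시", "양천구")],
    names=["상권업종소분류명", "시군구명"],
)
g = pd.Series([1, 1, 5, 3, 4], index=idx, name="상호명")
print(g.index)

# Both index levels become regular columns; the values column is named after the Series (상호명)
t = g.reset_index()

# The column now holds counts, not names, so give it a name that says so
t = t.rename(columns={"상호명": "상호수"})
print(t)
```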
- Turn the index levels into columns and rename the count column
t = g.reset_index()
t = t.rename(columns={"상호명": "상호수"})
- Bar chart with 시군구명 on the x-axis and 상호수 (number of businesses) on the y-axis
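import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(15, 4))
# ci=None turns off the error bars (seaborn 0.12+ spells this errorbar=None)
sns.barplot(data=t, x="시군구명", y="상호수", ci=None)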
- 상호수 is shown with a different color for each bar
- Bar chart with 상권업종소분류명 on the x-axis and 상호수 on the y-axis
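Worth knowing when reading these seaborn charts: when several rows share the same x value, sns.barplot collapses them with its estimator, which defaults to the mean. So in the 시군구명 chart each bar is the average 상호수 over the subcategories present in that district, not the district total. The sketch below reproduces both numbers with pandas on a hypothetical toy t (invented values); passing estimator=sum to sns.barplot would draw the totals instead.

```python
import pandas as pd

# Hypothetical long-form t, shaped like the output of reset_index() + rename
t = pd.DataFrame({
    "상권업종소분류명": ["학원-어학", "학원-어학", "학원-입시", "학원-입시", "학원-입시"],
    "시군구명": ["강남구", "양천구", "강남구", "서초구", "양천구"],
    "상호수": [1, 1, 5, 3, 4],
})

# Bar heights under seaborn's default estimator: mean 상호수 per district
mean_heights = t.groupby("시군구명")["상호수"].mean()

# Bar heights you would get with estimator=sum: total 상호수 per district
sum_heights = t.groupby("시군구명")["상호수"].sum()

print(pd.DataFrame({"mean": mean_heights, "sum": sum_heights}))
```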
plt.figure(figsize=(15, 4))
sns.barplot(data=t, x="상권업종소분류명", y="상호수", ci=None)
- Colors differ by 시군구명
- Keep only the subset where 상권업종소분류명 is '학원-입시'
academy_sub = t[t["상권업종소분류명"] == "학원-입시"].copy()
plt.figure(figsize=(15, 4))
sns.barplot(data=academy_sub, x="시군구명", y="상호수")
- Only the 학원-입시 counts are plotted, one bar per district
- 강남구, 양천구, 서초구, ... clearly have many exam-prep academies
- Draw subplots with catplot
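A minimal sketch of the subset step on a hypothetical toy t (invented values). After the boolean mask each district appears only once, so the mean estimator in sns.barplot simply returns the count itself and the bars are true per-district counts for 학원-입시. The .copy() makes academy_sub an independent DataFrame, so assigning to it later will not raise pandas' SettingWithCopyWarning.

```python
import pandas as pd

# Hypothetical long-form t, shaped like the output of reset_index() + rename
t = pd.DataFrame({
    "상권업종소분류명": ["학원-어학", "학원-어학", "학원-입시", "학원-입시", "학원-입시"],
    "시군구명": ["강남구", "양천구", "강남구", "서초구", "양천구"],
    "상호수": [1, 1, 5, 3, 4],
})

# Boolean mask: keep only rows whose subcategory is 학원-입시
academy_sub = t[t["상권업종소분류명"] == "학원-입시"].copy()

# One row per district, so the per-district mean equals the count
heights = academy_sub.groupby("시군구명")["상호수"].mean()
print(heights)
```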
sns.catplot(data=t, x="상권업종소분류명", y="상호수", kind="bar", col="시군구명", col_wrap=4, sharex=False)
- sharex=False: each subplot gets its own x-axis, so the x-axis labels appear under every panel