yeon's 👩🏻‍💻
[kaggle] Titanic | 4. EDA - Age
yeon42 2021. 7. 27. 20:51
2.4 Age
# df_train: the Titanic training DataFrame (assumed to be loaded earlier in this series)
print('Oldest passenger : {:.1f} Years'.format(df_train['Age'].max()))
print('Youngest passenger : {:.1f} Years'.format(df_train['Age'].min()))
print('Mean passenger age : {:.1f} Years'.format(df_train['Age'].mean()))
-> So max(), min(), and mean() can be called directly on a column of the data!
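The Age column of the Titanic training set contains missing values, so it helps to know how these summary methods treat them. The sketch below uses a small made-up Series (`toy_age` and its values are assumptions for illustration, not the real data) to show that pandas skips NaN by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ages; np.nan stands in for a passenger with no recorded age.
toy_age = pd.Series([22.0, 38.0, np.nan, 0.5, 80.0], name='Age')

oldest = toy_age.max()      # 80.0
youngest = toy_age.min()    # 0.5
average = toy_age.mean()    # NaN is skipped: (22 + 38 + 0.5 + 80) / 4 = 35.125
n_missing = int(toy_age.isnull().sum())  # how many ages are missing

print('{:.1f} {:.1f} {:.1f} missing={}'.format(oldest, youngest, average, n_missing))
```

So the mean printed for the real data is the mean over passengers whose age is known.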
- Histogram of Age by survival
* kdeplot (kernel density estimation)
- A histogram is drawn as rigid, blocky steps.
- A KDE plot draws the same distribution as one smooth density curve instead.
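import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(1, 1, figsize=(9, 5))
# One Age density curve for survivors and one for non-survivors, on the same axes
sns.kdeplot(df_train[df_train['Survived'] == 1]['Age'], ax=ax)
sns.kdeplot(df_train[df_train['Survived'] == 0]['Age'], ax=ax)
plt.legend(['Survived == 1', 'Survived == 0'])
plt.show()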
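To make the "smooth density" idea more concrete, here is a minimal sketch of a Gaussian kernel density estimate written with numpy only. The function name `gaussian_kde_1d`, the toy ages, and the fixed bandwidth of 5 are all assumptions for illustration; seaborn chooses its bandwidth automatically, so its curve will not match this one exactly.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated at every grid point."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] = standardized distance from grid point i to sample j
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return bumps.mean(axis=1)

toy_ages = [2.0, 4.0, 22.0, 25.0, 27.0, 30.0, 35.0, 54.0]  # made-up values
grid = np.linspace(-40.0, 100.0, 1401)                      # step of 0.1
density = gaussian_kde_1d(toy_ages, grid, bandwidth=5.0)

# A density should integrate to about 1 (simple Riemann sum over the grid).
area = float(np.sum(density) * (grid[1] - grid[0]))
```

Each observation contributes a small bell curve, and the estimate is their average, which is why the result is smooth where a histogram is stepped.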
* Code walkthrough
df_train['Survived'] == 1
df_train[df_train['Survived'] == 1]
- Returns only the passengers who survived
df_train[df_train['Survived'] == 1]['Age']
- Takes only the Age column of the survivors
sns.kdeplot(df_train[df_train['Survived'] == 1]['Age'])
- Passes that column to seaborn's kdeplot
df_train[df_train['Survived'] == 1]['Age'].hist()
- This one draws a histogram instead!
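The filtering steps in the walkthrough can be tried on a tiny DataFrame. This is only a sketch: `toy` and its four rows are made up, and `.loc` is shown as a one-step alternative that is not used in the original post.

```python
import pandas as pd

# Hypothetical four-passenger table with the same column names as df_train.
toy = pd.DataFrame({
    'Survived': [1, 0, 1, 0],
    'Age': [4.0, 30.0, 28.0, 65.0],
})

mask = toy['Survived'] == 1            # boolean Series: True, False, True, False
survivors = toy[mask]                  # keeps only the rows where mask is True
survivor_ages = survivors['Age']       # Age column of those rows
same_with_loc = toy.loc[mask, 'Age']   # row filter and column pick in one step
```

The comparison produces a True/False value per row, and indexing the DataFrame with it keeps the True rows.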
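* Three ways to prepare the 'canvas' for a graph
# 1) Create a figure and keep a handle to it; plt.plot draws on the current figure
f = plt.figure(figsize=(5, 5))
a = np.arange(100)
b = np.sin(a)
plt.plot(b)
# 2) Create the figure and its axes together, then draw on the axes object
f, ax = plt.subplots(1, 1, figsize=(5, 5))
a = np.arange(100)
b = np.sin(a)
ax.plot(b)
# 3) Create a figure without keeping the handle
plt.figure(figsize=(5, 5))
a = np.arange(100)
b = np.sin(a)
plt.plot(b)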
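The practical difference between these styles is where the drawing lands: `plt.plot` draws on whichever figure is current, while `ax.plot` draws on the axes you hold. The sketch below checks this; the `Agg` backend is an assumption so that it runs without a display, and the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

b = np.sin(np.arange(100))

# Implicit style: plt.plot draws on whatever figure is current.
f1 = plt.figure(figsize=(5, 5))
plt.plot(b)
implicit_axes = plt.gca()  # the axes that plt.plot created on f1

# Explicit style: keep the Axes handle and draw on it directly.
f2, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.plot(b)

same_figure_implicit = implicit_axes.figure is f1
same_figure_explicit = ax.figure is f2
lines_per_axes = (len(implicit_axes.lines), len(ax.lines))
plt.close("all")
```

The explicit style is the one to prefer when several axes exist, for example when passing `ax=ax` to seaborn.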
* Graph without survival (Age distribution by Pclass)
plt.figure(figsize=(8, 6))
df_train['Age'][df_train['Pclass'] == 1].plot(kind='kde')
df_train['Age'][df_train['Pclass'] == 2].plot(kind='kde')
df_train['Age'][df_train['Pclass'] == 3].plot(kind='kde')
plt.xlabel('Age')
plt.title('Age Distribution within classes')
plt.legend(['1st Class', '2nd Class', '3rd Class'])
- Why draw these with a KDE plot? -> Histograms drawn on the same axes overlap and hide each other.
plt.figure(figsize=(8, 6))
df_train['Age'][df_train['Pclass'] == 1].plot(kind='hist')
df_train['Age'][df_train['Pclass'] == 2].plot(kind='hist')
df_train['Age'][df_train['Pclass'] == 3].plot(kind='hist')
# Here the three histograms share one axes and cover each other
plt.xlabel('Age')
plt.title('Age Distribution within classes')
plt.legend(['1st Class', '2nd Class', '3rd Class'])
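If a histogram is still wanted, one common workaround is to give every group the same bin edges and make the bars semi-transparent. This is a sketch under assumptions, not the approach of the original post: the ages per class are made up, and the `Agg` backend is chosen only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical ages per class (not the real Titanic values).
toy_ages_by_class = {
    '1st Class': [38.0, 45.0, 52.0, 60.0, 29.0],
    '2nd Class': [30.0, 34.0, 27.0, 41.0],
    '3rd Class': [4.0, 19.0, 22.0, 24.0, 26.0, 31.0],
}
bins = np.arange(0, 90, 10)  # shared bin edges: 0, 10, ..., 80

fig, ax = plt.subplots(1, 1, figsize=(8, 6))
counts_by_class = {}
for label, ages in toy_ages_by_class.items():
    # alpha < 1 lets the bars drawn earlier show through the later ones
    counts, _, _ = ax.hist(ages, bins=bins, alpha=0.5, label=label)
    counts_by_class[label] = counts
ax.set_xlabel('Age')
ax.set_title('Age Distribution within classes')
ax.legend()
plt.close(fig)
```

With three or more groups even transparent bars get crowded, which is where the KDE curves remain easier to read.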
* Graph by survival probability
fig, ax = plt.subplots(1, 1, figsize=(9, 5))
sns.kdeplot(df_train[(df_train['Survived'] == 0) & (df_train['Pclass'] == 1)]['Age'], ax=ax)
sns.kdeplot(df_train[(df_train['Survived'] == 1) & (df_train['Pclass'] == 1)]['Age'], ax=ax)
plt.legend(['Survived == 0', 'Survived == 1'])
plt.title('1st class')
plt.show()
--> This shows that younger passengers had a higher chance of survival.
* Checking whether that is really true
change_age_range_survival_ratio = []
for i in range(1, 80):
    # share of survivors among passengers younger than i
    change_age_range_survival_ratio.append(df_train[df_train['Age'] < i]['Survived'].sum() / len(df_train[df_train['Age'] < i]['Survived']))
plt.figure(figsize=(7, 7))
plt.plot(change_age_range_survival_ratio)
plt.title('Survival rate change depending on range of Age', y=1.02)
plt.ylabel('Survival rate')
plt.xlabel('Range of Age(0-x)')
- We want to see how the survival ratio changes as the age range is widened (thresholds of 1 to 79 years, since range(1, 80) stops before 80)
- i is the age threshold: if i = 10, the expression returns the share of passengers younger than 10 who survived
- So the survival rate really is higher the younger the passengers are!!
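The same calculation can be wrapped in a small helper and tried on made-up data. In this sketch, `survival_rate_below`, the `toy` table, and the chosen thresholds are all assumptions for illustration; the guard for an empty group is an addition that the original loop does not have.

```python
import numpy as np
import pandas as pd

def survival_rate_below(df, max_age):
    """Survival rate among passengers with Age < max_age (NaN if there are none)."""
    younger = df[df['Age'] < max_age]['Survived']
    return younger.mean() if len(younger) > 0 else np.nan

# Hypothetical six-passenger table with the same column names as df_train.
toy = pd.DataFrame({
    'Age':      [0.5, 3.0, 8.0, 20.0, 35.0, 60.0],
    'Survived': [1,   1,   0,   1,    0,    0],
})

thresholds = [1, 5, 10, 30, 70]
rates = [survival_rate_below(toy, i) for i in thresholds]
# rates: 1/1, 2/2, 2/3, 3/4, 3/6
```

Because Survived is coded 0/1, taking the mean of the filtered column gives the same value as dividing the sum by the length.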
* Code walkthrough
i = 10
df_train[df_train['Age'] < i]
- Returns the rows of df_train whose Age is below 10
df_train[df_train['Age'] < i]['Survived']
- Whether each of those passengers survived or not
df_train[df_train['Age'] < i]['Survived'].sum()
- How many of them survived in total
- Survivors are coded as 1, so the sum is the number of survivors!
len(df_train[df_train['Age'] < i]['Survived'])
df_train[df_train['Age'] < i]['Survived'].sum() / len(df_train[df_train['Age'] < i]['Survived'])
- The number of survivors divided by the total number of people: the rate we wanted to see!!
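One detail worth checking in this walkthrough is what happens to passengers whose Age is missing. The sketch below uses a made-up four-row table (`toy` is an assumption, not the real data) to show that a comparison with NaN is False, so those rows fall out of both the numerator and the denominator.

```python
import numpy as np
import pandas as pd

# Hypothetical table; the third passenger has no recorded age.
toy = pd.DataFrame({
    'Age':      [2.0, 9.0, np.nan, 40.0],
    'Survived': [1,   0,   1,      0],
})
i = 10

under_i = toy[toy['Age'] < i]['Survived']  # NaN < 10 is False, so row 2 is dropped
n_survived = int(under_i.sum())            # 1 survivor among the under-10 group
n_total = len(under_i)                     # 2 passengers in the under-10 group
ratio = n_survived / n_total               # 0.5
```

The resulting ratio therefore describes only passengers with a known age.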