[kaggle] Titanic | 10. Feature Engineering - Embarked Feature
yeon42 2021. 8. 7. 22:01
3.1.2 Fill Null in Embarked
df_train['Embarked'].isnull().sum()
- Embarked currently has 2 null values.
-> Let's fill these null values with another value!
- The most passengers boarded at S, so we fill the null values with S.
- fillna
df_train['Embarked'] = df_train['Embarked'].fillna('S')
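The choice of S can be checked rather than hard-coded. Below is a minimal sketch on a toy DataFrame (`df_toy` and its values are made up for illustration and are not the Kaggle data) that finds the most frequent port with `value_counts` and uses it as the fill value.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_train; the values are made up for illustration.
df_toy = pd.DataFrame({"Embarked": ["S", "C", "S", np.nan, "Q", "S", np.nan]})

# Count passengers per port; value_counts ignores NaN by default.
port_counts = df_toy["Embarked"].value_counts()
most_common_port = port_counts.idxmax()

# Assign the result back instead of calling fillna(inplace=True) on a column.
df_toy["Embarked"] = df_toy["Embarked"].fillna(most_common_port)

# Confirm that no nulls are left.
remaining_nulls = int(df_toy["Embarked"].isnull().sum())
print(most_common_port, remaining_nulls)
```

Assigning the filled column back to the DataFrame avoids depending on whether an in-place call on a selected column propagates to the parent frame, which varies across pandas versions.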
3.2 Change Age (continuous to categorical)
- Age is currently a continuous feature.
- Let's split Age into several groups to turn it into a categorical feature.
- Create a new column
df_train['Age_cat'] = 0
- Use loc to split Age into 10-year intervals
df_train.loc[df_train['Age'] < 10, 'Age_cat'] = 0
df_train.loc[(10 <= df_train['Age']) & (df_train['Age'] < 20), 'Age_cat'] = 1
df_train.loc[(20 <= df_train['Age']) & (df_train['Age'] < 30), 'Age_cat'] = 2
df_train.loc[(30 <= df_train['Age']) & (df_train['Age'] < 40), 'Age_cat'] = 3
df_train.loc[(40 <= df_train['Age']) & (df_train['Age'] < 50), 'Age_cat'] = 4
df_train.loc[(50 <= df_train['Age']) & (df_train['Age'] < 60), 'Age_cat'] = 5
df_train.loc[(60 <= df_train['Age']) & (df_train['Age'] < 70), 'Age_cat'] = 6
df_train.loc[(70 <= df_train['Age']), 'Age_cat'] = 7
df_test.loc[df_test['Age'] < 10, 'Age_cat'] = 0
df_test.loc[(10 <= df_test['Age']) & (df_test['Age'] < 20), 'Age_cat'] = 1
df_test.loc[(20 <= df_test['Age']) & (df_test['Age'] < 30), 'Age_cat'] = 2
df_test.loc[(30 <= df_test['Age']) & (df_test['Age'] < 40), 'Age_cat'] = 3
df_test.loc[(40 <= df_test['Age']) & (df_test['Age'] < 50), 'Age_cat'] = 4
df_test.loc[(50 <= df_test['Age']) & (df_test['Age'] < 60), 'Age_cat'] = 5
df_test.loc[(60 <= df_test['Age']) & (df_test['Age'] < 70), 'Age_cat'] = 6
df_test.loc[(70 <= df_test['Age']), 'Age_cat'] = 7
However, hard-coding every range by hand like this is tedious.
-> Let's write a function and pass it to the apply method to make this a little easier.
def category_age(x):
    # Map an age to its 10-year bucket: 0 for under 10, ..., 7 for 70 and over.
    if x < 10:
        return 0
    elif x < 20:
        return 1
    elif x < 30:
        return 2
    elif x < 40:
        return 3
    elif x < 50:
        return 4
    elif x < 60:
        return 5
    elif x < 70:
        return 6
    else:
        return 7
df_train['Age_cat_2'] = df_train['Age'].apply(category_age)
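As an aside, pandas also ships a binning helper, `pd.cut`, that can express the same buckets without a hand-written function. This is a different technique from the loc and apply approaches in this post, shown only as a sketch; the `ages` Series and the `bin_edges` list are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up ages for illustration, including values on bin boundaries.
ages = pd.Series([0.42, 9.0, 10.0, 29.5, 30.0, 69.9, 70.0, 80.0])

# Left-closed bins: [0, 10), [10, 20), ..., [60, 70), [70, inf).
bin_edges = [0, 10, 20, 30, 40, 50, 60, 70, np.inf]

# right=False makes each bin include its left edge, matching x < 10, x < 20, ...
# labels=False returns the integer bin index instead of an Interval label.
age_cat = pd.cut(ages, bins=bin_edges, right=False, labels=False)
print(age_cat.tolist())
```

One difference to keep in mind: `category_age` falls through to 7 when the age is missing (every `<` comparison with NaN is False), whereas `pd.cut` leaves a missing age as NaN.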
- Let's compare the two columns, Age_cat and Age_cat_2!
(df_train['Age_cat'] == df_train['Age_cat_2']).all()
- all(): returns True only when every value is True; if even one value is False, it returns False
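To see how the element-wise comparison and all() behave, here is a small sketch on made-up Series (`manual`, `via_apply`, and `one_off` are hypothetical names, not columns from the dataset). It also shows one way to locate the rows that differ when the check fails.

```python
import pandas as pd

# Made-up category columns for illustration.
manual = pd.Series([0, 2, 3, 7])
via_apply = pd.Series([0, 2, 3, 7])
one_off = pd.Series([0, 2, 4, 7])

# == compares element by element; all() collapses the result to one boolean.
same = bool((manual == via_apply).all())     # every pair matches
different = bool((manual == one_off).all())  # one pair differs

# Index labels where the two Series disagree.
mismatch_idx = manual.index[manual != one_off].tolist()
print(same, different, mismatch_idx)
```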
- Drop the columns we no longer need
df_train.drop(['Age', 'Age_cat_2'], axis=1, inplace=True)
df_test.drop(['Age'], axis=1, inplace=True)
- axis=1: the labels are looked up among the columns, so the named columns are dropped
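For reference, `drop` also accepts a `columns=` keyword that does the same thing as `axis=1`. The sketch below uses a made-up `df_toy` and returns new DataFrames instead of modifying in place, so the original stays available for comparison.

```python
import pandas as pd

# Toy frame for illustration; the values are made up.
df_toy = pd.DataFrame({"Age": [22.0, 38.0], "Age_cat": [2, 3], "Fare": [7.25, 71.28]})

# axis=1 tells drop to look for the label among the columns.
by_axis = df_toy.drop(["Age"], axis=1)

# columns= is an equivalent, more explicit spelling.
by_keyword = df_toy.drop(columns=["Age"])

print(list(by_axis.columns))
```

Without `inplace=True`, `df_toy` itself still contains the Age column after both calls.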