Decision Tree

yeon42 2021. 8. 31. 13:04

https://bkshin.tistory.com/entry/%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D-4-%EA%B2%B0%EC%A0%95-%ED%8A%B8%EB%A6%ACDecision-Tree

 

๋จธ์‹ ๋Ÿฌ๋‹ - 4. ๊ฒฐ์ • ํŠธ๋ฆฌ(Decision Tree)

๊ฒฐ์ • ํŠธ๋ฆฌ(Decision Tree, ์˜์‚ฌ๊ฒฐ์ •ํŠธ๋ฆฌ, ์˜์‚ฌ๊ฒฐ์ •๋‚˜๋ฌด๋ผ๊ณ ๋„ ํ•จ)๋Š” ๋ถ„๋ฅ˜(Classification)์™€ ํšŒ๊ท€(Regression) ๋ชจ๋‘ ๊ฐ€๋Šฅํ•œ ์ง€๋„ ํ•™์Šต ๋ชจ๋ธ ์ค‘ ํ•˜๋‚˜์ž…๋‹ˆ๋‹ค. ๊ฒฐ์ • ํŠธ๋ฆฌ๋Š” ์Šค๋ฌด๊ณ ๊ฐœ ํ•˜๋“ฏ์ด ์˜ˆ/์•„๋‹ˆ์˜ค ์งˆ๋ฌธ์„

bkshin.tistory.com

๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ํ•„์‚ฌํ•˜๋ฉฐ ๊ณต๋ถ€

 

 


 

* Decision Tree

 

- A supervised learning model that can handle both classification and regression.

- A decision tree learns by chaining yes/no questions, like a game of Twenty Questions.

 

ex) Suppose we want to tell apart a hawk, a penguin, a whale, and a bear.

- The hawk and the penguin have wings; the whale and the bear do not.

- The question "Does it have wings?" separates {hawk, penguin} from {whale, bear}.

- The hawk and the penguin can then be separated by asking "Can it fly?",

  and the whale and the bear by asking "Does it have fins?"

 

Source: TensorFlow blog / the blog above

 

- A model that splits data according to such criteria (questions) is called a decision tree model.

- Each split divides the variable space into two regions.

 

- Node: a box in the tree that holds a question or an answer.

- Root node: the very first splitting criterion (the first question).

- Terminal node (leaf node): a node at the very end of the tree.

 

Source: ratsgo's blog / the blog above

 

- ์ „์ฒด์ ์ธ ๋ชจ์–‘์ด ๋‚˜๋ฌด๋ฅผ ๋’ค์ง‘์–ด ๋†’์€ ๊ฒƒ๊ณผ ๊ฐ™์•„ ์ด๋ฆ„์ด Decision Tree์ด๋‹ค.

 

 


 

* The Decision Tree Algorithm's Process

Source: TensorFlow blog / the blog above

 

- First, as above, split the data on the question that separates it best.

 

Source: TensorFlow blog / the blog above

 

- Within each resulting subset, split again on the question that best separates the data.

- But splitting too many times leads to overfitting, as shown below.

- A decision tree modeled with no parameter constraints at all will overfit.

 

 

 


* Pruning

 

- Pruning is a technique for preventing overfitting.

- A tree with too many branches can be considered overfit.

- Pruning means trimming the tree's branches:

  that is, limiting the maximum depth, the maximum number of terminal nodes, or the minimum number of samples a node needs in order to be split.

 

- The min_samples_split parameter sets the minimum number of samples a node must contain to be split.

- With min_samples_split=10, a node containing fewer than 10 samples is not split any further.

 

- max_depth limits the maximum depth of the tree.

- With max_depth=4, the tree never branches deeper than 4 levels.

- Pruning can happen before training (pre-pruning) or after (post-pruning); scikit-learn traditionally supported only pre-pruning (recent versions also offer cost-complexity post-pruning via ccp_alpha).
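As a minimal sketch of pre-pruning (using the iris dataset purely for illustration), both parameters can be passed to scikit-learn's DecisionTreeClassifier:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: both parameters stop the tree from growing in the first place,
# rather than cutting branches off a fully grown tree afterwards.
tree = DecisionTreeClassifier(
    max_depth=4,           # never branch deeper than 4 levels
    min_samples_split=10,  # a node with fewer than 10 samples is not split
    random_state=0,
)
tree.fit(X, y)
print(tree.get_depth())  # guaranteed to be <= 4
```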

 

 

 


 

* Algorithm: Entropy and Impurity

 

- ๋ถˆ์ˆœ๋„(Impurity): ํ•ด๋‹น ๋ฒ”์ฃผ ์•ˆ์— ์„œ๋กœ ๋‹ค๋ฅธ ๋ฐ์ดํ„ฐ๊ฐ€ ์–ผ๋งˆ๋‚˜ ์„ž์—ฌ ์žˆ๋Š”์ง€

- ์•„๋ž˜ ๊ทธ๋ฆผ์—์„œ ์œ„์ชฝ ๋ฒ”์ฃผ๋Š” ๋ถˆ์ˆœ๋„๊ฐ€ ๋‚ฎ๊ณ , ์•„๋ž˜์ชฝ ๋ฒ”์ฃผ๋Š” ๋ถˆ์ˆœ๋„๊ฐ€ ๋†’๋‹ค.

- ์ฆ‰, ์œ„์ชฝ ๋ฒ”์ฃผ๋Š” ์ˆœ๋„(Purity)๊ฐ€ ๋†’๊ณ , ์•„๋ž˜์ชฝ ๋ฒ”์ฃผ๋Š” ์ˆœ๋„๊ฐ€ ๋‚ฎ๋‹ค.

- ์œ„์ชฝ ๋ฒ”์ฃผ๋Š” ๋‹ค ๋นจ๊ฐ„์ ์ธ๋ฐ ํ•˜๋‚˜๋งŒ ํŒŒ๋ž€์ ์ด๋ฏ€๋กœ ๋ถˆ์ˆœ๋„๊ฐ€ ๋‚ฎ๋‹ค.

- ๋ฐ˜๋ฉด ์•„๋ž˜์ชฝ ๋ฒ”์ฃผ๋Š” 5๊ฐœ๋Š” ํŒŒ๋ž€์ , 3๊ฐœ๋Š” ๋นจ๊ฐ„์ ์œผ๋กœ ์„œ๋กœ ๋‹ค๋ฅธ ๋ฐ์ดํ„ฐ๊ฐ€ ๋งŽ์ด ์„ž์—ฌ ์žˆ์–ด ๋ถˆ์ˆœ๋„๊ฐ€ ๋†’๋‹ค.

 

Source: ratsgo's blog / the blog above

 

- Impurity is minimal (purity maximal) when a region contains only a single class,

  and maximal (purity minimal) when two different classes are split exactly half and half.

- A decision tree learns in the direction that minimizes impurity (maximizes purity).

 

 

- Entropy: a numerical measure of impurity.

- High entropy = high impurity;

  low entropy = low impurity.

- Entropy = 1: maximum impurity / the two classes are split exactly half and half.

  Entropy = 0: minimum impurity / the region contains only a single class.

 

 

- The entropy formula:

  Entropy = -Σ_i p_i * log2(p_i)

Source: the blog above

(p_i = the proportion of the data in a region that belongs to class i)
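The formula can be sketched as a small Python helper (the function name is my own):

```python
import math

def entropy(proportions):
    """Entropy = sum of -p_i * log2(p_i) over the class proportions p_i.

    Terms with p = 0 or p = 1 contribute nothing, so they are skipped
    (this also avoids evaluating log2(0))."""
    return sum(-p * math.log2(p) for p in proportions if 0 < p < 1)

print(entropy([0.5, 0.5]))  # two classes exactly half and half -> 1.0
print(entropy([1.0]))       # a pure region -> 0
```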

 

 

 


 

* Information Gain

 

- If a split lowers the entropy from 1 to 0.7, the information gain is 0.3.

- Information gain = (entropy before the split) - (entropy after the split):

Information gain = entropy(parent) - [weighted average] entropy(children)

 

- entropy(parent) is the entropy before the split, and entropy(children) the entropy after it.

- [weighted average] entropy(children) means the weighted average of the children's entropies.

- The children's entropies are averaged with weights because the split produces two or more subsets, generally of different sizes.

 

- The decision tree algorithm learns in the direction that maximizes information gain.

- At each split, it looks for the feature and split point that maximize the information gain.
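Putting the two definitions together (a sketch; the helper names are my own):

```python
import math

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    proportions = (labels.count(c) / n for c in set(labels))
    return sum(-p * math.log2(p) for p in proportions if 0 < p < 1)

def information_gain(parent, children):
    """entropy(parent) minus the size-weighted average of the children's entropies."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# A perfect split of a half-and-half parent gains exactly one bit:
print(information_gain(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]]))  # -> 1.0
```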

 

 

 

 


 

* Practice

 

- ์ „๋ฐ˜์ ์ธ ๋ฐฉ์‹์€ ์ง€๊ธˆ๊นŒ์ง€ ํ–ˆ๋˜ ๋‹ค๋ฅธ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ๊ณผ ์œ ์‚ฌ

- Classifier๋ฅผ ๋งŒ๋“ค๊ณ , fittingํ•œ ๋’ค, Testํ•ด๋ณธ๋‹ค.

- Classifier๋งŒ DecisionTreeClassifier์„ ์‚ฌ์šฉํ•œ๋‹ค๋Š” ๊ฒƒ ๋นผ๊ณ ๋Š” ๋‹ค๋ฅธ๊ฒŒ ์—†๋‹ค.

 

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)
print("Training set accuracy: {:.3f}".format(tree.score(X_train, y_train)))
print("Test set accuracy: {:.3f}".format(tree.score(X_test, y_test)))

>>> Training set accuracy: 1.000
>>> Test set accuracy: 0.937

 

- By default, a decision tree has no max_depth or min_samples_split limit, so it keeps branching until each leaf contains only one class.

- As a result, the training set accuracy is 100%, while the test set accuracy is 93.7%.

 

tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)

print("Training set accuracy: {:.3f}".format(tree.score(X_train, y_train)))
print("Test set accuracy: {:.3f}".format(tree.score(X_test, y_test)))

>>> Training set accuracy: 0.988
>>> Test set accuracy: 0.951

 

- With max_depth=4, by contrast, overfitting is curbed: the training set accuracy drops slightly, but the test set accuracy improves.

 

 

 

 


 

 

 

* Entropy Example

 

Source: the blog above

 

- The table above classifies speed as slow or fast based on grade, bumpiness, and speed limit.

- The X variables are grade, bumpiness, and speed limit; the Y variable is speed.

- Let's model a decision tree on this data using entropy.

 

- ์†๋„ ๋ผ๋ฒจ์—๋Š” slow, slow, fast, fast ์ด 4๊ฐœ์˜ examples๋“ค์ด ์žˆ๋‹ค.

- Pi๋Š” ํ•œ ์˜์—ญ ์•ˆ์— ์กด์žฌํ•˜๋Š” ๋ฐ์ดํ„ฐ ๊ฐ€์šด๋ฐ ๋ฒ”์ฃผ i์— ์†ํ•˜๋Š” ๋ฐ์ดํ„ฐ์˜ ๋น„์œจ

- i๊ฐ€ slow๋ผ๋ฉด slow ๋ผ๋ฒจ ๊ฐฏ์ˆ˜ = 2๊ฐœ, ์ „์ฒด ๋ผ๋ฒจ ๊ฐฏ์ˆ˜ = 4๊ฐœ ์ด๊ธฐ ๋•Œ๋ฌธ์— P_slow = 2/4 = 0.5

- ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ P_fast = 0.5

 

- So, what is the entropy of the current set as a whole? 1

  - because the two different classes are split exactly half and half

 

 

- ์œ„์—์„œ ๋ณธ ์—”ํŠธ๋กœํ”ผ ๊ณต์‹์— ๊ทธ๋Œ€๋กœ ๋Œ€์ž…ํ•ด ์•คํŠธ๋กœํ”ผ๋ฅผ ๊ตฌํ•ด๋ณด์ž

Source: the blog above

 

Entropy = -P_slow * log2(P_slow) - P_fast * log2(P_fast)

        = -0.5 * log2(0.5) - 0.5 * log2(0.5) = 1

 

 


 

* Splitting on Grade

 

- First, let's make the initial split on grade.

- 3 of the examples are steep, and their speeds are slow, slow, fast.

- 1 example is flat, and its speed is fast.

 

- What is the entropy of the flat node?

Source: Udacity / the blog above

- The right node contains only a single example (fast), so its entropy is 0.

- Therefore, entropy(flat) = 0.

 

- entropy(steep) refers to the left node:

  - slow appears 2 times and fast 1 time

  - entropy(steep) = -P_slow * log2(P_slow) - P_fast * log2(P_fast)

                   = -(2/3) * log2(2/3) - (1/3) * log2(1/3)

                   = 0.9183

 

- Now let's compute the weighted average over the nodes after the split:

- [weighted average] entropy(children)

  = (fraction steep) * entropy(steep) + (fraction flat) * entropy(flat)

  = (3/4) * 0.9183 + (1/4) * 0

  = 0.6887

 

- ๋”ฐ๋ผ์„œ ๊ฒฝ์‚ฌ(grade)๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๋ถ„๊ธฐํ•œ ํ›„์˜ ์—”ํŠธ๋กœํ”ผ๋Š” 0.6888

 

- Now let's compute the information gain using the formula above:

- information gain

  = entropy(parent) - [weighted average] entropy(children)

  = 1 - 0.6887

  = 0.3113

 

- In other words, splitting on the grade feature yields an information gain of 0.3113!

 

 


 

* Splitting on Bumpiness

 

- Splitting on bumpiness puts one slow and one fast under bumpy, and one slow and one fast under smooth.

- Each child splits the two classes exactly half and half, so its entropy is 1.

 

- entropy(bumpy) = - P_slow * log2(P_slow) - P_fast * log2(P_fast) = 1

- entropy(smooth) = - P_slow * log2(P_slow) - P_fast * log2(P_fast) = 1

 

- [weighted average] entropy(children)

  = (fraction bumpy) * entropy(bumpy) + (fraction smooth) * entropy(smooth)

  = (2/4) * 1 + (2/4) * 1

  = 1

 

- information gain = entropy(parent) - [weighted average] entropy(children) = 1 - 1 = 0

 

- ํ‘œ๋ฉด์„ ๊ธฐ์ค€์œผ๋กœ ๋ถ„๊ธฐํ–ˆ์„ ๋•Œ๋Š” ์ •๋ณด ํš๋“์ด ์ „ํ˜€ ์—†๋‹ค๋Š” ๋œป!!

 

 


* ์†๋„ ์ œํ•œ ๊ธฐ์ค€ ๋ถ„๊ธฐ

 

Source: Udacity / the blog above

- entropy(yes) = -P_slow * log2(P_slow) - P_fast * log2(P_fast) = -1 * log2(1) - 0 * log2(0) = 0

- entropy(no) = -P_slow * log2(P_slow) - P_fast * log2(P_fast) = -0 * log2(0) - 1 * log2(1) = 0

  (taking 0 * log2(0) = 0 by convention)

 

๋”ฐ๋ผ์„œ, information gain = 1 - (2/4) * 0 - (2/4) * 0 = 1

 

 


 

- Splitting on grade, bumpiness, and speed limit yields information gains of 0.3113, 0, and 1, respectively.

- A decision tree learns in the direction of greatest information gain,

- so the first split is made on speed limit.

- Splitting continues this way until the limits set by max_depth or min_samples_split are reached.

- That is the overall decision tree algorithm.

 

 

 
