๊ด€๋ฆฌ ๋ฉ”๋‰ด

yeon's ๐Ÿ‘ฉ๐Ÿป‍๐Ÿ’ป

Apriori algorithm ๋ณธ๋ฌธ

Computer ๐Ÿ’ป/Machine Learning

Apriori algorithm

yeon42 2021. 11. 19. 17:51
728x90

https://ordo.tistory.com/89

 

[Python] Apriori algorithm:: ์—ฐ๊ด€๊ทœ์น™๋ถ„์„ (1)

์•ˆ๋…•ํ•˜์„ธ์š”. ์šฐ์ฃผ์‹  ์ž…๋‹ˆ๋‹ค. ์ด๋ฒˆ ํฌ์ŠคํŒ…์—์„œ๋Š” ์—ฐ๊ด€๊ทœ์น™ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ค‘ ๊ฐ€์žฅ ๋จผ์ € ์ ‘ํ•˜๊ฒŒ ๋˜๋Š” Apriori ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. Apriori ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋นˆ๋ฐœํ•ญ๋ชฉ์ง‘ํ•ฉ(frequent itemsets) ๋ฐ ์—ฐ๊ด€๊ทœ

ordo.tistory.com

์œ„ ๋ธ”๋กœ๊ทธ๋ฅผ ํ•„์‚ฌํ•˜์—ฌ ๊ณต๋ถ€ํ•œ ๊ธ€์ž…๋‹ˆ๋‹ค.

 

* ๋ชจ๋“  ํ…์ŠคํŠธ์™€ ์ด๋ฏธ์ง€์˜ ์ถœ์ฒ˜๋Š” ์œ„ ๋ธ”๋กœ๊ทธ์ž…๋‹ˆ๋‹ค.

 


 

 

Apriori ์•Œ๊ณ ๋ฆฌ์ฆ˜ 

; frequent itemsets ๋ฐ association rule ๋ถ„์„์„ ์œ„ํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜

 

* Association Rule; ์„œ๋กœ ๋‹ค๋ฅธ ๋‘ itemset set์ด ์–ผ๋งŒํผ ๋นˆ๋ฒˆํžˆ ๋ฐœ์ƒํ•˜๋Š”์ง€ ์—ฐ๊ด€๋„๋ฅผ ์•Œ ์ˆ˜ ์žˆ์—ˆ์Œ

 

* Association Rule์˜ ๋Œ€ํ‘œ์ ์ธ ์•Œ๊ณ ๋ฆฌ์ฆ˜

- Apriori algorithm

- FP-growth algorithm

- DHP algorithm

 

 

 

 

๋จผ์ €, Apriori ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ association rule ํƒ์ƒ‰์„ ์œ„ํ•œ transaction ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ์–ด์•ผ ํ•˜๊ณ , minsup์„ ์„ค์ •ํ•ด์ค˜์•ผ ํ•จ

 

Transaction ๋ฐ์ดํ„ฐ๊ฐ€ ์•„๋ž˜์™€ ๊ฐ™์ด ์ฃผ์–ด์ง€๊ณ , minsup=3์œผ๋กœ ์„ค์ •ํ•˜์ž.

์ถœ์ฒ˜: ์œ„ ๋ธ”๋กœ๊ทธ

- ์Šˆํผ๋งˆ์ผ“ ๊ตฌ๋งค ๋ฐ์ดํ„ฐ๋ผ๊ณ  ๊ฐ€์ •ํ•˜๋ฉด, {1, 2, 3, 4}๋Š” 1๋ฒˆ item, 2๋ฒˆ item, 3๋ฒˆ item, 4๋ฒˆ item์ด ๊ฐ™์ด ๊ตฌ๋งค๋˜์—ˆ๊ณ , ์ด๋ฅผ ํ•˜๋‚˜์˜ itemset์ด๋ผ๊ณ  ๋ณด๋ฉด ๋จ

 

 

1. ์œ„ DB๋ฅผ ์Šค์บ”ํ•˜๋ฉฐ 1-fp๋ฅผ ์ฐพ๋Š”๋‹ค.

์ถœ์ฒ˜: ์œ„ ๋ธ”๋กœ๊ทธ

- ๊ฐ item๋“ค์ด 3๋ฒˆ, 6๋ฒˆ, 4๋ฒˆ, 5๋ฒˆ๋งŒํผ ๊ตฌ๋งค๊ฐ€ ๋˜์—ˆ๊ณ ,

  ๋ชจ๋‘ minsup=3๋ณด๋‹ค ํฌ๊ธฐ ๋•Œ๋ฌธ์— 1-fp์— ํฌํ•จ๋จ

 

 

2. 1-fp๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ 2-fp๋ฅผ ์ฐพ๋Š”๋‹ค.

- ์ด ๋•Œ, self-join๊ณผ prune ๊ณผ์ •์„ ๊ฑฐ์ณ์•ผ ํ•œ๋‹ค.

 

* self-join: {1}, {2}, {3}, {4} item์„ ๋ฐ”ํƒ•์œผ๋กœ ์ƒ์„ฑ๋  ์ˆ˜ ์žˆ๋Š” ๊ธธ์ด=2์ธ ๋ชจ๋“  ํ›„๋ณด๋ฅผ ๋งŒ๋“œ๋Š” ๊ณผ์ •

=> {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}๊ฐ€ ์ƒ์„ฑ๋  ๊ฒƒ

 

์ด๋ ‡๊ฒŒ ์ƒ์„ฑ๋œ ํ›„๋ณด๋“ค์„ ๋Œ€์ƒ์œผ๋กœ pruning์„ ํ•˜๋Š”๋ฐ,

pruning ๋Œ€์ƒ์€ ํ›„๋ณด๋“ค ์ค‘ itemset์˜ item ์ค‘ 1-fp์— ์—†๋Š” item์ด ์žˆ์œผ๋ฉด pruning ํ•  ๋Œ€์ƒ์ด ๋จ

{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}์˜ ๊ฐ itemset์˜ item๋“ค์€ ๋ชจ๋‘ 1-fp์— ํฌํ•จ๋˜๋ฏ€๋กœ ์—ฌ๊ธฐ์„œ pruning ๋œ ํ›„๋ณด๋Š” ์—†์Œ

 

 

์ด์ œ self-join๊ณผ prune ๊ณผ์ •์„ ๊ฑฐ์นœ ํ›„๋ณด๋“ค์„ ๋Œ€์ƒ์œผ๋กœ DB๋ฅผ ์Šค์บ”ํ•˜๋ฉฐ countํ•œ๋‹ค.

์ถœ์ฒ˜: ์œ„ ๋ธ”๋กœ๊ทธ

 

- count๋œ ๊ฐ’์ด minsup=3๋ณด๋‹ค ํฐ itemset๋“ค๋กœ 2-fp๋ฅผ ๊ตฌํ•œ๋‹ค.

์ถœ์ฒ˜: ์œ„ ๋ธ”๋กœ๊ทธ

์ฆ‰, {1, 2}, {2, 3}, {2, 4}, {3, 4} itemset๋“ค์ด 2-fp์ด๋‹ค.

 

 

* ์—ฌ๊ธฐ์„œ ์•Œ ์ˆ˜ ์žˆ๋Š” Apriori ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์›๋ฆฌ๋Š”,

{1, 3}๊ณผ {1, 4}๋Š” ๋‹ค์Œ ๋‹จ๊ณ„์˜ fp ๋ถ„์„์— ์žˆ์–ด ๋” ์ด์ƒ ์ด์šฉ๋˜์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ฒƒ

because ํ•œ itemset์ด infrequentํ•˜๋‹ค๋ฉด, ๊ทธ item์„ ํฌํ•จํ•œ ๋ชจ๋“  ์ง‘ํ•ฉ(superset) ๋˜ํ•œ infrequentํ•˜๊ธฐ ๋•Œ๋ฌธ!

 

 

 

3. 2-fp๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ 3-fp ๊ตฌํ•˜๊ธฐ

(2๋ฒˆ๊ณผ ๊ณผ์ •์€ ๋˜‘๊ฐ™๋‹ค.

2-fp๋ฅผ ๋Œ€์ƒ์œผ๋กœ self-join๊ณผ prune์„ ํ•จ)

 

self-join: {1, 2}, {2, 3}, {2, 4}, {3, 4}

-> {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}

 

prune: {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}

({1, 2, 3}์˜ subset์ด {1, 2}, {1, 3}, {2, 3}์ธ๋ฐ, 2-fp์— ์กด์žฌํ•˜์ง€ ์•Š์œผ๋ฏ€๋กœ.

 ๋‚˜๋จธ์ง€ ๋‘ {1, 2, 4}, {1, 3, 4}๋„ ๋งˆ์ฐฌ๊ฐ€์ง€!)

 

 

๊ธธ์ด๊ฐ€ 3์ธ ํ›„๋ณด {2, 3, 4}๋ฅผ ๋Œ€์ƒ์œผ๋กœ DB๋ฅผ ์Šค์บ”ํ•˜๋ฉฐ countํ•œ๋‹ค.

์ถœ์ฒ˜: ์œ„ ๋ธ”๋กœ๊ทธ

- count๋œ ๊ฐ’์ด min-sup=3๋ณด๋‹ค ํฐ itemset๋“ค๋กœ 3-fp๋ฅผ ๊ตฌํ•ด์•ผํ•˜์ง€๋งŒ, minsup์„ ๋งŒ์กฑํ•˜๋Š” ๋” ์ด์ƒ์˜ itemset์ด ์—†์œผ๋ฏ€๋กœ ์ข…๋ฃŒ!

 

 

 


 

์œ„์˜ ์˜ˆ์‹œ๋ฅผ ์ผ๋ฐ˜ํ™”ํ•ด๋ณด๋ฉด,

 

1. DB๋ฅผ ์Šค์บ”ํ•˜๋ฉฐ 1-fp๋ฅผ ๊ตฌํ•จ

2. k-fp๋ฅผ ๋Œ€์ƒ์œผ๋กœ (k+1)-fp๋ฅผ ๊ตฌํ•จ

- self join๊ณผ prune์„ ํ†ตํ•ด ํ›„๋ณด๋ฅผ ๊ตฌํ•˜๊ณ ,

- DB๋ฅผ ์Šค์บ”ํ•˜๋ฉฐ minsup ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋Š” ํ›„๋ณด๋“ค๋งŒ ๋„์ถœ

3. ๋” ์ด์ƒ (k+1)-fp์ด ๋งŒ๋“ค์–ด์ง€์ง€ ์•Š์„ ๋•Œ๊นŒ์ง€ 2๋ฒˆ ๊ณผ์ •์„ ๋ฐ˜๋ณต

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Comments