
RNN (many-to-one stacking)

yeon42 2021. 10. 29. 17:30

https://engineer-mole.tistory.com/25

 

[python/Tensorflow2.0] RNN(Recurrent Neural Network) ; many to one stacking



Studying by transcribing the blog above.

 

* All text and images are from the blog above.

 

(In addition, note that both the source post and this one are based on Professor Sung Kim's 'Deep Learning for Everyone':

https://www.boostcourse.org/ai212/lecture/43752?isDesc=false) 

 

 


 

1. What is stacking?

 

CNN์—์„œ convolution layer๋ฅผ ์—ฌ๋Ÿฌ ๊ฐœ ์ผ๋“ฏ, RNN๋„ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์—ฌ๋Ÿฌ ๊ฐœ๋ฅผ ์Œ“์„ ์ˆ˜ ์žˆ๋‹ค.

-> ์ด๋ฅผ multi layered RNN ๋˜๋Š” stacked RNN ์ด๋ผ๊ณ  ์–˜๊ธฐํ•œ๋‹ค.

 

CNN์—์„œ convolution layer๋ฅผ ์—ฌ๋Ÿฌ ๊ฐœ ์Œ“์•˜์„ ๋•Œ, input ์ด๋ฏธ์ง€์— ๊ฐ€๊นŒ์šด convolution layer์€ edge์™€ ๊ฐ™์€ ๊ธ€๋กœ๋ฒŒํ•œ feature์„ ๋ฝ‘์„ ์ˆ˜ ์žˆ๊ณ , output์— ๊ฐ€๊นŒ์šด convolution layer์€ ์ข€ ๋” abstractํ•œ feature์„ ๋ฝ‘์„ ์ˆ˜ ์žˆ๋“ฏ์ด,

RNN์—์„œ๋„ stacked RNN์„ ํ™œ์šฉํ•ด ๋น„์Šทํ•œ ํšจ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.

 

 

(Image source: the blog above)
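
As a minimal sketch of this idea (my addition, not from the source post; the layer sizes and input shape are illustrative): stacking RNN layers in Keras only requires that every layer except the top one return its full hidden-state sequence, so the layer above it has an input at every time step.

import tensorflow as tf
from tensorflow.keras import Sequential, layers

# two stacked SimpleRNN layers; dimensions are illustrative
sketch = Sequential([
    # lower layer: emits a hidden state at every time step -> (batch, 58, 10)
    layers.SimpleRNN(units=10, return_sequences=True, input_shape=(58, 30)),
    # upper layer: consumes that sequence and returns only its last state -> (batch, 10)
    layers.SimpleRNN(units=10),
])
sketch.summary()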

 

- ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ ๋ถ„์•ผ์˜ stacked RNN ๊ด€๋ จ ์—ฌ๋Ÿฌ ๋…ผ๋ฌธ์—์„œ input์— ๊ฐ€๊นŒ์šด RNN์˜ hidden states๊ฐ€ sementic information(์˜๋ฏธ์  ์ •๋ณด)๋ณด๋‹ค syntatic information(๋ฌธ๋ฒ•์  ์ •๋ณด)์„ ์ƒ๋Œ€์ ์œผ๋กœ ๋” ์ž˜ ์ธ์ฝ”๋”ฉ ํ•˜๊ณ  ์žˆ์œผ๋ฉฐ,

- ๋ฐ˜๋Œ€๋กœ output์— ๊ฐ€๊นŒ์šด RNN์˜ hidden states๋Š” sementic information์„ syntatic information๋ณด๋‹ค ๋”์šฑ ์ž˜ ์ธ์ฝ”๋”ฉ ํ•˜๊ณ  ์žˆ์Œ์„ ์‹ค์ฆ์ ์œผ๋กœ ํŒŒ์•…ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ์— stacked RNN์ด ๋‹ค์–‘ํ•˜๊ฒŒ ํ™œ์šฉ๋˜๊ณ  ์žˆ๋‹ค.

 

 


 

2. Implementing a stacked RNN

 

์ด์ „์˜ many-to-one ๊ตฌ์กฐ๋ฅผ ํ™œ์šฉํ•˜๋Š” ๋ฐฉ์‹๊ณผ ํฌ๊ฒŒ ๋‹ค๋ฅด์ง€ ์•Š์œผ๋ฉฐ ์ฐจ์ด์ ์€ RNN์„ ์—ฌ๋Ÿฌ ๊ฐœ ํ™œ์šฉํ•˜๋Š” stacked RNN์„ many-to-one์œผ๋กœ ์‚ฌ์šฉํ•œ๋‹ค๋Š” ์ 

 

 

- The sequence is tokenized and passed through an embedding layer; the stacked RNN then reads the resulting numeric vector for each token, in order.

- The bottom RNN of the stack receives the token at time step t (red in the source figure) and its own hidden state from step t-1 (green), and produces its hidden state for step t.

 

- The second RNN receives the first RNN's hidden state at time step t together with its own hidden state from step t-1, and produces its hidden state for time step t.

 

- This scheme applies in the same way regardless of how many RNN layers are stacked (see the sketch after this list).

- When the last token has been read, the loss between the resulting output and the label is computed, and the stacked RNN is trained by backpropagating this loss.
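
To make the recurrence above concrete, here is a rough numpy sketch of a single time step of a two-layer stacked RNN (my addition; plain tanh cells are assumed, and the weight names W1, U1, b1, W2, U2, b2 are illustrative, not from the source):

import numpy as np

def stacked_rnn_step(x_t, h1_prev, h2_prev, params):
    """One time step of a two-layer stacked RNN with tanh cells."""
    W1, U1, b1, W2, U2, b2 = params
    # layer 1 reads the token vector and its own previous hidden state
    h1_t = np.tanh(x_t @ W1 + h1_prev @ U1 + b1)
    # layer 2 reads layer 1's current hidden state and its own previous one
    h2_t = np.tanh(h1_t @ W2 + h2_prev @ U2 + b2)
    return h1_t, h2_t

Iterating this function over the token vectors and keeping only the final h2_t gives exactly the many-to-one behavior described above.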

 

 

- ์ด๋ฒˆ์—๋Š” ๋ฌธ์žฅ์„ ๋ถ„๋ฅ˜ํ•ด๋ณด์ž

 

 

(1) Importing Libraries

# setup
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import Sequential, Model
from tensorflow.keras.preprocessing.sequence import pad_sequences
%matplotlib inline

print(tf.__version__)

 

(2) Preparing Dataset

# example data (typos in the quotes are kept exactly as in the source post,
# since the character set and max_sequence below depend on these strings)
sentences = ['What I cannot create, I do not understand.', 
             'Intellecuals solve problems, geniuses prevent them', 
             'A person who never made a mistake never tied anything new.', 
             'The same equations have the same solutions.']

y_data = [1, 0, 0, 1]

- In y_data, 1 marks a quote by Richard Feynman and 0 a quote by Albert Einstein.

 

- We treat each example sentence as a sequence of characters,

- so (as last time) we need to build a token dictionary that maps each character to an integer index.

 

# creating a token dictionary
char_set = ['<pad>'] + sorted(list(set(''.join(sentences))))
idx2char = {idx: char for idx, char in enumerate(char_set)}
char2idx = {char: idx for idx, char in enumerate(char_set)}

print(char_set)
print(idx2char)
print(char2idx)

 

# converting sequence of token to sequence of indices
x_data = list(map(lambda sentence: [char2idx.get(char) for char in sentence], sentences))
x_data_len = list(map(lambda sentence: len(sentence), sentences))

print(x_data)
print(x_data_len)
print(y_data)

- The sequences are much longer than before.

- For long sequences like these, a Long Short-Term Memory network (LSTM) or a Gated Recurrent Unit (GRU) usually works better than a plain RNN (a hedged swap is sketched below).

- Here, however, we stick with the stacked (simple) RNN structure.
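
For reference, a hedged sketch (my addition, not from the source post) of that drop-in swap: in Keras, LSTM and GRU layers take the same units / return_sequences arguments as SimpleRNN, so only the layer class in section (3) below would change.

from tensorflow.keras import layers

# drop-in replacements for layers.SimpleRNN in the model below
lstm_lower = layers.LSTM(units=10, return_sequences=True)   # lower stacked layer
lstm_upper = layers.LSTM(units=10)                          # top layer, last state only
gru_lower = layers.GRU(units=10, return_sequences=True)     # or a GRU instead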

 

 

# padding the sequence of indices
max_sequence = 58    # length of the longest example sentence
x_data = pad_sequences(sequences=x_data, maxlen=max_sequence, padding='post', truncating='post')

# checking data
print(x_data)
print(x_data_len)
print(y_data)

- sentence์˜ sequence๊ฐ€ ๋‹ค๋ฅธ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐ

 

 

 

 

(3) Creating Model

# creating simple rnn for "many to one" classification without dropout
num_classes = 2
hidden_dims = [10, 10]

input_dim = len(char2idx)
output_dim = len(char2idx)
one_hot = np.eye(len(char2idx))

model = Sequential()
model.add(layers.Embedding(input_dim=input_dim, output_dim=output_dim,
                           trainable=False, mask_zero=True, input_length=max_sequence,
                           embeddings_initializer=keras.initializers.Constant(one_hot)))
                           
model.add(layers.SimpleRNN(units=hidden_dims[0], return_sequences=True))
model.add(layers.TimeDistributed(layers.Dropout(rate=.2)))

model.add(layers.SimpleRNN(units=hidden_dims[1]))
model.add(layers.Dropout(rate=.2))
model.add(layers.Dense(units=num_classes))

- As before, the mask_zero=True option excludes the zero-padded positions of each sequence from the computation.

  The trainable=False option keeps the one-hot embedding vectors from being trained.

 

- return_sequences=True makes the first RNN return its output in the form the second RNN needs,

  i.e. data of shape (batch size, max sequence length, hidden dimension) — one hidden state per time step (see the shape check below).
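
A quick shape check (my addition; it assumes the model defined above) makes this visible:

# layer order: [0] Embedding, [1] SimpleRNN, [2] TimeDistributed, [3] SimpleRNN, ...
print(model.layers[1].output_shape)  # (None, 58, 10): one hidden state per time step
print(model.layers[3].output_shape)  # (None, 10): only the last hidden state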

 

- TimeDistributed and Dropout are used because a stacked RNN has higher model capacity than a shallow RNN, and is therefore more prone to overfitting.

- So we apply dropout to the hidden states the RNN produces for each token, to prevent overfitting

  (and likewise for the second layer — a small demonstration follows).
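
A small demonstration (my addition; the dummy tensor is illustrative): TimeDistributed applies the wrapped Dropout to the hidden state of every time step, leaving the shape unchanged.

import tensorflow as tf
from tensorflow.keras import layers

drop = layers.TimeDistributed(layers.Dropout(rate=.2))
h = tf.random.normal((1, 58, 10))        # dummy (batch, time, hidden) states
print(drop(h, training=True).shape)      # (1, 58, 10): entries randomly zeroed in training
print(drop(h, training=False).shape)     # (1, 58, 10): identity at inference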

 

 

model.summary()

 

 

(4) Training model

# creating loss function
def loss_fn(model, x, y, training):
    return tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
        y_true=y, y_pred=model(x, training=training), from_logits=True))
                        
# creating an optimizer
lr = .01
epochs = 30
batch_size = 2
opt = tf.keras.optimizers.Adam(learning_rate=lr)

# generating data pipeline
tr_dataset = tf.data.Dataset.from_tensor_slices((x_data, y_data))
tr_dataset = tr_dataset.shuffle(buffer_size=4)
tr_dataset = tr_dataset.batch(batch_size=batch_size)

print(tr_dataset)

- The stacked RNN structure we designed uses Dropout.

- Dropout is applied during training but not at inference time,

  so the loss function takes a training argument to control this (a short illustration follows).
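
As a quick illustration (my addition; it reuses the model and x_data defined above):

logits_train = model(x_data, training=True)   # dropout active: stochastic outputs
logits_eval = model(x_data, training=False)   # dropout disabled: deterministic outputs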

 

 

# training
tr_loss_hist = []

for epoch in range(epochs):
    avg_tr_loss = 0
    tr_step = 0
    
    for x_mb, y_mb in tr_dataset:
        with tf.GradientTape() as tape:
            tr_loss = loss_fn(model, x=x_mb, y=y_mb, training=True)
        grads = tape.gradient(target=tr_loss, sources=model.trainable_variables)  # frozen embedding excluded
        opt.apply_gradients(grads_and_vars=zip(grads, model.trainable_variables))
        avg_tr_loss += tr_loss
        tr_step += 1
    avg_tr_loss /= tr_step
    tr_loss_hist.append(avg_tr_loss)
    
    if (epoch + 1) % 5 == 0:
        print('epoch: {:3}, tr_loss: {:.3f}'.format(epoch+1, avg_tr_loss.numpy()))

 

 

 

(5) Checking performance

yhat = model.predict(x_data)
yhat = np.argmax(yhat, axis=-1)
print('acc: {:.2%}'.format(np.mean(yhat == y_data)))

plt.plot(tr_loss_hist)
plt.xlabel('epoch')
plt.ylabel('average training loss')

 

 

 
