Day 22 (09.07)

LSTM & GRU

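Used to address the long-term dependency problem.
The i, f, o, g vectors are used to transform $c_{t-1}$ appropriately.
- The forget gate learns how much of the information coming from the previous time step should be forgotten.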
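
For reference, the gate computation can be written out as below. This is the commonly taught formulation and is assumed to match the figure that originally accompanied these notes; $W$ is a single stacked weight matrix, biases are omitted, and $g$ is the candidate written as $\tilde{c}$ elsewhere in these notes.

```latex
\begin{aligned}
\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix}
  &= \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix}
     W \begin{pmatrix} h_{t-1} \\ x_t \end{pmatrix} \\
c_t &= f \odot c_{t-1} + i \odot g \\
h_t &= o \odot \tanh(c_t)
\end{aligned}
```

The three sigmoid gates take values in $(0, 1)$ and act as per-dimension ratios, while $g$ takes values in $(-1, 1)$ and carries the new content.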

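The product of the input gate and $\tilde{c}$ can be read as follows: when a single linear transformation cannot easily produce the information to add to $c_{t-1}$, the model first builds $\tilde{c}$ with values somewhat larger than what should be added, then keeps only a certain ratio of each dimension.
$c_t$ is the vector that holds all the information that must be remembered. $h_t$ is used as the input to the output layer that makes the prediction at the current time step (it keeps only the needed information, filtered from $c_t$).
For example, the fact that a quotation mark is currently open may be stored in $c_t$, but since it is not needed right now, it is filtered out when $h_t$ is computed.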
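
The roles of $c_t$ and $h_t$ described above can be sketched as a single hand-written LSTM step. This is a minimal illustration, not how `nn.LSTM` is implemented internally: the function name `lstm_step`, the stacked weight layout `W` of shape `(4*H, H+D)`, and the i, f, o, g ordering are all assumptions made for this example.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step (illustrative layout: W is (4*H, H+D), b is (4*H,))."""
    # A single linear map produces all four pre-activations at once.
    z = torch.cat([h_prev, x_t], dim=-1) @ W.T + b      # (B, 4*H)
    i, f, o, g = z.chunk(4, dim=-1)                     # each (B, H)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)                                   # candidate, i.e. c-tilde
    c_t = f * c_prev + i * g          # forget part of the old memory, add scaled candidate
    h_t = o * torch.tanh(c_t)         # expose only the part of c_t needed right now
    return h_t, c_t

torch.manual_seed(0)
B, D, H = 2, 3, 4                     # batch size, input size, hidden size
W = torch.randn(4 * H, H + D) * 0.1
b = torch.zeros(4 * H)
x_t = torch.randn(B, D)
h_prev = torch.zeros(B, H)
c_prev = torch.randn(B, H)
h_t, c_t = lstm_step(x_t, h_prev, c_prev, W, b)
print(h_t.shape, c_t.shape)
```

Because the output gate `o` lies in (0, 1), every entry of `h_t` is a shrunken copy of the matching entry of `tanh(c_t)`, which is the filtering described in the quotation-mark example.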

GRU

A lightweight variant that reduces the computation and memory usage of LSTM.

The input gate and forget gate are merged into a single gate.
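
A minimal sketch of one GRU step shows what merging the two gates means: a single update gate `z` weights the candidate by `z` and the previous state by `1 - z`. The function name `gru_step`, the separate bias-free matrices `W_z`, `W_r`, `W_h`, and the gate convention are assumptions for illustration. Note that `nn.GRU` in PyTorch uses the opposite convention, with `z` weighting the previous state.

```python
import torch

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU time step (illustrative; each W_* is (H, H+D), biases omitted)."""
    hx = torch.cat([h_prev, x_t], dim=-1)               # (B, H+D)
    z = torch.sigmoid(hx @ W_z.T)                       # update gate
    r = torch.sigmoid(hx @ W_r.T)                       # reset gate
    h_tilde = torch.tanh(torch.cat([r * h_prev, x_t], dim=-1) @ W_h.T)  # candidate
    # One gate does both jobs: z acts as the input gate, (1 - z) as the forget gate.
    h_t = (1 - z) * h_prev + z * h_tilde
    return h_t

torch.manual_seed(0)
B, D, H = 2, 3, 4                     # batch size, input size, hidden size
W_z, W_r, W_h = (torch.randn(H, H + D) * 0.1 for _ in range(3))
x_t = torch.randn(B, D)
h_prev = torch.tanh(torch.randn(B, H))   # a previous state inside (-1, 1)
h_t = gru_step(x_t, h_prev, W_z, W_r, W_h)
print(h_t.shape)
```

There is also no separate cell state: `h_t` plays the role of both $c_t$ and $h_t$, which is where the memory saving comes from.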

Backpropagation in GRU

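Instead of repeatedly multiplying by the same $W_{hh}$ matrix, the previous state is multiplied elementwise by a forget gate that differs at every time step, and the new information is combined by addition rather than multiplication. As a result, the vanishing and exploding gradient problems are largely avoided.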
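
The argument above can be checked numerically on a simplified recurrence. In this sketch the gates are fixed random tensors rather than functions of the state, so it only shows the direct additive path (in a real LSTM or GRU, the gates also depend on earlier states and add further gradient paths). All names and values here are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
T, H = 20, 3                                              # time steps, hidden size
f = torch.rand(T, H, dtype=torch.float64) * 0.1 + 0.9     # forget gates in [0.9, 1.0)
u = torch.randn(T, H, dtype=torch.float64)                # stands in for i_t * g_t
c0 = torch.randn(H, dtype=torch.float64, requires_grad=True)

c = c0
for t in range(T):
    c = f[t] * c + u[t]               # elementwise gate, then addition

c.sum().backward()

# Along this path the gradient w.r.t. c0 is just the product of the forget gates.
expected = f.prod(dim=0)
print(torch.allclose(c0.grad, expected))
```

With forget gates near 1 the product decays slowly, whereas a vanilla RNN multiplies by the same $W_{hh}$ (and a $\tanh$ derivative) at every step, so its gradient shrinks or grows geometrically.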

Required assignment

Language model using RNN-family models

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNModel(nn.Module):
    """Container module with an encoder, a recurrent module, and a decoder."""

    def __init__(self, rnn_type, ntoken, ninp, nhid, nlayers, dropout=0.5):
        super(RNNModel, self).__init__()
        self.ntoken = ntoken
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        if rnn_type in ['LSTM', 'GRU']:
            self.rnn = getattr(nn, rnn_type)(ninp, nhid, nlayers, dropout=dropout)
        else:
            try:
                nonlinearity = {'RNN_TANH': 'tanh', 'RNN_RELU': 'relu'}[rnn_type]
            except KeyError:
                raise ValueError( """An invalid option for `--model` was supplied,
                                 options are ['LSTM', 'GRU', 'RNN_TANH' or 'RNN_RELU']""")
            self.rnn = nn.RNN(ninp, nhid, nlayers, nonlinearity=nonlinearity, dropout=dropout)
        self.decoder = nn.Linear(nhid, ntoken)

        self.init_weights()

        self.rnn_type = rnn_type
        self.nhid = nhid
        self.nlayers = nlayers

    def init_weights(self):
        initrange = 0.1
        nn.init.uniform_(self.encoder.weight, -initrange, initrange)
        nn.init.zeros_(self.decoder.bias)
        nn.init.uniform_(self.decoder.weight, -initrange, initrange)

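    def forward(self, input, hidden):
        ############################ ANSWER HERE ################################
        # TODO: complete the forward function.
        #
        # Hint 1: Dropout can be applied in several places,
        #         for example after encoding and before decoding.
        # Hint 2: forward encodes the input and passes it through the model,
        #         then decodes the model output and returns it through F.log_softmax().
        #########################################################################
        # One way to complete it, following the hints. nn.LSTM, nn.GRU, and nn.RNN
        # share a call signature, so no branch on rnn_type is needed here;
        # for LSTM, `hidden` is the tuple (h, c) built by init_hidden.
        emb = self.drop(self.encoder(input))        # (seq_len, bsz, ninp)
        output, hidden = self.rnn(emb, hidden)      # (seq_len, bsz, nhid)
        output = self.drop(output)
        decoded = self.decoder(output)              # (seq_len, bsz, ntoken)
        decoded = decoded.view(-1, self.ntoken)     # (seq_len * bsz, ntoken)
        return F.log_softmax(decoded, dim=1), hidden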

    def init_hidden(self, bsz):
        weight = next(self.parameters())
        if self.rnn_type == 'LSTM':
            return (weight.new_zeros(self.nlayers, bsz, self.nhid),
                    weight.new_zeros(self.nlayers, bsz, self.nhid))
        else:
            return weight.new_zeros(self.nlayers, bsz, self.nhid)
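
To make the tensor shapes in the model concrete, the encode, recur, decode pipeline can be traced with standalone layers. This is a hedged sketch that rebuilds the three pieces outside the class; the sizes and the choice of `nn.GRU` are arbitrary assumptions, and dropout is left out for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
ntoken, ninp, nhid, nlayers = 100, 16, 32, 2   # illustrative sizes
seq_len, bsz = 5, 4

encoder = nn.Embedding(ntoken, ninp)
rnn = nn.GRU(ninp, nhid, nlayers)              # expects (seq_len, bsz, ninp) by default
decoder = nn.Linear(nhid, ntoken)

tokens = torch.randint(0, ntoken, (seq_len, bsz))
hidden = torch.zeros(nlayers, bsz, nhid)       # same shape init_hidden returns for GRU

emb = encoder(tokens)                          # (seq_len, bsz, ninp)
output, hidden = rnn(emb, hidden)              # (seq_len, bsz, nhid), (nlayers, bsz, nhid)
decoded = decoder(output).view(-1, ntoken)     # (seq_len * bsz, ntoken)
log_probs = F.log_softmax(decoded, dim=1)      # one distribution over the vocabulary per position
print(log_probs.shape)
```

Flattening to `(seq_len * bsz, ntoken)` lets the result be fed straight into a loss such as `nn.NLLLoss` against targets flattened the same way.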

Required assignments 2-3