424. Transformer

(Transformer: Attention Is All You Need)

What is the Transformer?

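A machine translation model
- Makes maximal use of the attention mechanism → improved translation performance
- Nearly all state-of-the-art (SOTA) natural language processing models build on the core ideas of the Transformer
Attempts are also being made to apply it in computer vision
It is also applied in multi-modal settings (interacting with a computer through several input types such as text, images, and speech)
The Transformer resolves structural weaknesses of RNN-based models
- In RNN-based models, words are fed in one at a time, so computation time grows with the length of the sequence
- The Transformer takes all tokens as input at once and processes them in parallel, which makes it well suited to GPU computation
Transformer architecture
- 6 encoder blocks and 6 decoder blocks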
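
The contrast drawn above between sequential RNN processing and parallel token processing can be sketched with a toy NumPy example. The sizes, the random weights, and the names `W_h` and `W_x` are assumptions chosen for illustration; the parallel branch is a plain per-token transform, not the Transformer's actual attention layer.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 4                      # toy sizes, chosen for illustration
X = rng.normal(size=(seq_len, d))      # one row per token
W_h = rng.normal(size=(d, d))          # hypothetical recurrent weights
W_x = rng.normal(size=(d, d))          # hypothetical input weights

# RNN-style: step t needs the hidden state from step t-1,
# so the loop cannot be parallelized across positions.
h = np.zeros(d)
rnn_states = []
for x_t in X:
    h = np.tanh(h @ W_h + x_t @ W_x)
    rnn_states.append(h)
rnn_states = np.stack(rnn_states)

# Parallel style: every token is transformed in a single matrix product,
# with no dependency between positions.
parallel_out = np.tanh(X @ W_x)

print(rnn_states.shape, parallel_out.shape)
```

The loop runs `seq_len` dependent steps, while the last line handles all tokens in one call, which is the kind of operation a GPU can spread across many cores.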


One encoder block and one decoder block
- Encoder block
  - 2 sub-layers ▶️ [Multi-Head (Self) Attention, Feed Forward]
- Decoder block
  - 3 sub-layers ▶️ [Masked Multi-Head (Self) Attention, Multi-Head (Encoder-Decoder) Attention, Feed Forward]
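
The layout described above can be written down as a small data structure to make the counts concrete. This is only a structural sketch: the names `ENCODER_SUBLAYERS`, `DECODER_SUBLAYERS`, and `build_stack` are hypothetical, and the sub-layers are represented by their names rather than by working layers.

```python
# Sub-layers of one block, in the order listed in the notes.
ENCODER_SUBLAYERS = ["Multi-Head (Self) Attention", "Feed Forward"]
DECODER_SUBLAYERS = [
    "Masked Multi-Head (Self) Attention",
    "Multi-Head (Encoder-Decoder) Attention",
    "Feed Forward",
]
NUM_BLOCKS = 6  # 6 encoder blocks and 6 decoder blocks


def build_stack(sublayers, num_blocks=NUM_BLOCKS):
    """Return a list of blocks, each block being a list of sub-layer names."""
    return [list(sublayers) for _ in range(num_blocks)]


encoder = build_stack(ENCODER_SUBLAYERS)
decoder = build_stack(DECODER_SUBLAYERS)

total = sum(len(block) for block in encoder) + sum(len(block) for block in decoder)
print(total)  # 6 * 2 + 6 * 3 = 30
```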


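Positional Encoding

Parallelization → all words are fed in at once → the model cannot tell where each word is
- To address this, a separate vector carrying position information is provided so that the model can tell where each word is
- Formula
  - pos: position of the word
  - i: dimension index within the word vector
  - d_model: dimension of the model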

$$ \begin{aligned} \text{PE}_{(\text{pos},\,2i)} &= \sin \bigg(\frac{\text{pos}}{10000^{2i/d_{\text{model}}}}\bigg) \\ \text{PE}_{(\text{pos},\,2i+1)} &= \cos \bigg(\frac{\text{pos}}{10000^{2i/d_{\text{model}}}}\bigg) \end{aligned} $$
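
As a worked example of the formula, take the word at pos = 1 with an assumed toy size of d_model = 4 (chosen only to keep the arithmetic short). The index i then takes the values 0 and 1, giving four entries:

$$ \begin{aligned} \text{PE}_{(1,\,0)} &= \sin \bigg(\frac{1}{10000^{0/4}}\bigg) = \sin(1) \approx 0.8415 \\ \text{PE}_{(1,\,1)} &= \cos \bigg(\frac{1}{10000^{0/4}}\bigg) = \cos(1) \approx 0.5403 \\ \text{PE}_{(1,\,2)} &= \sin \bigg(\frac{1}{10000^{2/4}}\bigg) = \sin(0.01) \approx 0.0100 \\ \text{PE}_{(1,\,3)} &= \cos \bigg(\frac{1}{10000^{2/4}}\bigg) = \cos(0.01) \approx 0.99995 \end{aligned} $$

The denominator grows with i, so the later dimensions change much more slowly from one position to the next than the earlier ones.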

import numpy as np
import tensorflow as tf

def get_angles(pos, i, d_model):
    """
    Compute the angles that go inside sin and cos.
    """
    # Columns 2i and 2i+1 share the same rate, hence i // 2.
    angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
    return pos * angle_rates

def positional_encoding(position, d_model):
    """
    Compute the positional encoding.
    """
    angle_rads = get_angles(np.arange(position)[:, np.newaxis],
                            np.arange(d_model)[np.newaxis, :],
                            d_model)

    # apply sin to even indices in the array; 2i
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])

    # apply cos to odd indices in the array; 2i+1
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])

    # add a leading batch axis: (1, position, d_model)
    pos_encoding = angle_rads[np.newaxis, ...]

    return tf.cast(pos_encoding, dtype=tf.float32)
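
As a sanity check on the functions above, the following loop-based sketch evaluates the formula entry by entry in plain NumPy. The helper name `pe_reference` and the sizes 50 and 8 are assumptions for illustration and are not part of the original code.

```python
import numpy as np


def pe_reference(position, d_model):
    """Loop-based positional encoding; a hypothetical helper for checking."""
    pe = np.zeros((position, d_model), dtype=np.float32)
    for pos in range(position):
        for k in range(d_model):
            # Columns 2i and 2i+1 share the exponent 2i / d_model.
            angle = pos / np.power(10000.0, (2 * (k // 2)) / d_model)
            pe[pos, k] = np.sin(angle) if k % 2 == 0 else np.cos(angle)
    return pe


pe = pe_reference(position=50, d_model=8)
print(pe.shape)  # (50, 8)
print(pe[0])     # position 0: sin(0) = 0 in even columns, cos(0) = 1 in odd columns
```

If TensorFlow is available, `positional_encoding(50, 8)[0]` should agree with this array up to float32 rounding, since the vectorized version only adds a leading batch axis.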

Self-Attention