Malav Patel

Introduction

Below is a diagram of the Transformer architecture as it appeared in the paper "Attention Is All You Need". The diagram shows the encoder-decoder architecture and tracks the shapes of tensors as they pass through the network. Hover over a part of the diagram to learn more about the transformation it performs.
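For readers who prefer text to pictures, the kind of shape tracking the diagram does can be sketched in a few lines of Python. This is only an illustrative sketch, not the code behind the diagram: the function name trace_shapes, the stage labels, and the vocab_size argument are hypothetical, and the defaults (d_model=512, 8 heads, d_ff=2048) are assumed from the base model described in the paper. No real tensors are allocated; the function just reports the shape expected at each stage.

```python
def trace_shapes(batch, src_len, tgt_len, vocab_size,
                 d_model=512, n_heads=8, d_ff=2048):
    """Return the expected tensor shape at each stage of an encoder-decoder pass.

    Hypothetical helper for illustration: it does shape bookkeeping only.
    Defaults are assumed from the paper's base model.
    """
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    d_k = d_model // n_heads  # per-head query/key/value width

    return {
        # Encoder side
        "source token ids": (batch, src_len),
        "source embeddings + positional encoding": (batch, src_len, d_model),
        "encoder queries split into heads": (batch, n_heads, src_len, d_k),
        "encoder self-attention scores": (batch, n_heads, src_len, src_len),
        "encoder feed-forward hidden layer": (batch, src_len, d_ff),
        "encoder output": (batch, src_len, d_model),
        # Decoder side
        "target token ids": (batch, tgt_len),
        "target embeddings + positional encoding": (batch, tgt_len, d_model),
        "decoder masked self-attention scores": (batch, n_heads, tgt_len, tgt_len),
        # Queries come from the decoder, keys and values from the encoder output
        "decoder cross-attention scores": (batch, n_heads, tgt_len, src_len),
        "decoder output": (batch, tgt_len, d_model),
        # Final linear projection followed by softmax over the vocabulary
        "output probabilities": (batch, tgt_len, vocab_size),
    }


if __name__ == "__main__":
    # Illustrative sizes only: 2 sequences, 10 source tokens, 7 target tokens.
    for stage, shape in trace_shapes(2, 10, 7, vocab_size=1000).items():
        print(f"{stage:45s} {shape}")
```

Note that every sublayer returns a tensor with the same trailing dimension d_model as its input, which is what allows the residual connections to add a sublayer's input to its output.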