#transformer

01

Attention

The original paper. Attention is a mechanism that lets a neural network focus on specific parts of an input sequence.

Jun 22, 2026

Deep Learning
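
To make the attention entry above concrete, here is a minimal sketch of one common variant, scaled dot-product attention, in plain Python. The excerpt does not say which variant the post covers, so the choice of this form is an assumption, and the names `softmax` and `attention` and the toy vectors are hypothetical, chosen only for illustration.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (illustrative only)."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        # Normalised weights say how strongly each position is attended to.
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy input (assumed): three positions, one query aligned with the first axis.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
queries = [[4.0, 0.0]]

outputs, weights = attention(queries, keys, values)
```

In this toy case the query matches the first and third keys equally and the second key not at all, so the second position gets the smallest weight. That is the "focus on specific parts of the input" behaviour in miniature.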

02

BERT

Most of the information is available in the BERT paper. Key details: multi-head attention and the Transformer encoder.

Jun 22, 2026

NLP

03

Transformer

The first Transformer was introduced in the paper "Attention Is All You Need"; BERT was published soon after.

Jun 22, 2026

NLP

DSWoK — Data Science Well of Knowledge