DSWoK — Data Science Well of Knowledge

#transformer

3 notes · co-occurs with 4 tags · last updated May 18, 2026

Co-tags: #nlp (3) · #architecture (3) · #attention (2) · #algorithm (1)
Notes tagged #transformer
01
Attention
The original paper. Attention is a mechanism that lets a neural network focus on specific parts of an input sequence.
May 18, 2026
Deep Learning
02
BERT
Most of the information is available in the BERT paper. Key details: multi-head attention and a Transformer encoder.
May 18, 2026
NLP
03
Transformer
The first Transformer was introduced in the paper "Attention Is All You Need" (2017). BERT was published soon after, in 2018.
May 18, 2026
NLP

