Full Attention Vanilla Transformer with:
num_positions: 4096
num_layers: 8
embed_dim: 64
num_heads: 4
running at 10.5 GB
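
As a rough illustration of why full attention is memory-hungry at this sequence length, the sketch below estimates the size of the attention score matrices alone for the configuration above. The `TransformerConfig` class, the `attention_score_bytes` helper, the batch size of 1, and the 4-byte (float32) element size are all assumptions made for this example and do not come from the original setup. It is not meant to reproduce the 10.5 GB figure, which also depends on batch size, other activations, gradients, optimizer state, and framework overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    """Hypothetical container for the hyperparameters listed above."""
    num_positions: int = 4096
    num_layers: int = 8
    embed_dim: int = 64
    num_heads: int = 4


def attention_score_bytes(cfg: TransformerConfig,
                          batch_size: int = 1,
                          bytes_per_element: int = 4) -> int:
    """Bytes needed to hold every layer's attention score matrices.

    Full attention stores one (num_positions x num_positions) matrix
    per head and per layer. batch_size and bytes_per_element (float32)
    are assumed values, not taken from the original configuration.
    """
    scores_per_head = cfg.num_positions ** 2
    return (batch_size * cfg.num_layers * cfg.num_heads
            * scores_per_head * bytes_per_element)


cfg = TransformerConfig()
head_dim = cfg.embed_dim // cfg.num_heads  # 64 / 4 = 16 dimensions per head
score_bytes = attention_score_bytes(cfg)
print(f"head_dim: {head_dim}")
print(f"attention scores: {score_bytes / 2**30:.2f} GiB per sample")  # 2.00 GiB
```

The estimate grows quadratically with `num_positions`, so under the same assumptions, doubling the sequence length to 8192 would quadruple it to 8 GiB per sample.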