Transformer Gallery

  • Architected and implemented core Transformer architectures (vanilla Transformer, Transformer-XL, Longformer, and Block-Recurrent Transformer), mirroring the foundations of large-scale language models.
  • Bootstrapped the codebase and personally wrote 80% of the implementation, designing for modularity so new variants are easy to add and experiment with.
  • Validated model correctness against language-modeling benchmarks and with attention-visualization tools.
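
All four architectures listed above share scaled dot-product attention as their core operation; the sketch below is a minimal, illustrative NumPy version of that primitive (not the project's actual implementation, which is not shown in this summary):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D query/key/value arrays."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)          # (seq_q, seq_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights              # output and attention map

# Toy example: 4 positions, model dimension 8.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))
out, attn = scaled_dot_product_attention(q, k, v)
```

The returned `attn` matrix is exactly what attention-visualization tools render as a heatmap; each row is a probability distribution over key positions.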