Flash Attention 2: Faster Attention with Better Parallelism and Work Partitioning

63K views

Data Science Gems

Updated 2 days ago
