MLSys'25 - LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

MIT HAN Lab

3 weeks ago
