DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference

PyTorch

Streamed 7 months ago

3.1 Readiness
19:32