Speeding Up LLM Inference
Techniques to speed up LLM inference, increasing token generation throughput and reducing memory consumption: mixed precision, bfloat16, quantization, fine-tuning with adapters, and continuous batching.
Last updated on Aug 21, 2023
5 min read
LLM, Optimization