SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
Paper • arXiv:2305.09781
Note: the README at https://github.com/NVIDIA/FasterTransformer states: "FasterTransformer development has transitioned to TensorRT-LLM. All developers are encouraged to leverage TensorRT-LLM to get the latest improvements on LLM Inference. The NVIDIA/FasterTransformer repo will stay up, but will not have further development."