Distill loss in retrieval task?

by YWang17 - opened Oct 29, 2024

Hi, the paper says: "To distill the score from reranker in retrieval tasks, we use the bge-reranker model as the teacher." Does this mean distillation is used in this work? Could you explain more about this reranker and the distillation loss? What are the metrics without distillation?

cfli

Beijing Academy of Artificial Intelligence org Oct 30, 2024

Yes. During training we used distillation, with bge-reranker-v2.5-gemma2-lightweight as the teacher model. The final loss combines a KL divergence loss with the InfoNCE loss.
Since distillation is known to be effective, we do not have metrics for the setting without it.
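
For readers who want to see how a KL divergence term and an InfoNCE term can be combined, here is a minimal pure-Python sketch for a single query with one positive and several negative passages. It is not the authors' training code: the reply does not state the KL direction, the temperature, or the weighting between the two terms, so `temperature`, `kd_weight`, the KL(teacher || student) direction, and all function names below are assumptions made for illustration. The teacher scores would come from the reranker; the numbers here are made up.

```python
import math

def log_softmax(scores, temperature=1.0):
    """Numerically stable log-softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]

def info_nce(student_scores, positive_index=0, temperature=1.0):
    """Contrastive loss: negative log-probability of the positive passage."""
    return -log_softmax(student_scores, temperature)[positive_index]

def kl_distill(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) between the two softmax distributions.

    The direction and the shared temperature are assumptions of this sketch.
    """
    t_log = log_softmax(teacher_scores, temperature)
    s_log = log_softmax(student_scores, temperature)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_log, s_log))

def combined_loss(teacher_scores, student_scores, positive_index=0,
                  kd_weight=1.0, temperature=1.0):
    """InfoNCE plus a weighted distillation term (weighting is hypothetical)."""
    return (info_nce(student_scores, positive_index, temperature)
            + kd_weight * kl_distill(teacher_scores, student_scores, temperature))

# Made-up scores for one query: index 0 is the positive passage.
teacher = [4.0, 1.0, -2.0]   # e.g. reranker logits (hypothetical values)
student = [0.8, 0.5, 0.1]    # e.g. embedding similarities (hypothetical values)
loss = combined_loss(teacher, student)
```

The KL term is zero when the student's score distribution over the candidate passages matches the teacher's, so it pushes the retriever toward the reranker's ranking while InfoNCE keeps the positive passage on top.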
