Bad case in test results

#40

by jwww123 - opened Nov 25, 2024

query: ['动脉瘤是什么']
documents: ['腘窝囊肿是什么', '动脉瘤是什么?']
score: [[83.64187622070312, 48.1422119140625]]
Why does "腘窝囊肿是什么" ("What is a popliteal cyst") get the higher match score, when the query asks "What is an aneurysm"?

query: ['动脉瘤是什么?']
documents: ['腘窝囊肿是什么', '动脉瘤是什么?']
score: [[39.49958801269531, 99.9999771118164]]
After adding a question mark to the query, the result is as expected.

I am using the example code as-is. Am I using the model incorrectly, or does it need fine-tuning?

I found that texts sharing the same suffix get a much higher similarity. gte-Qwen2-1.5B-instruct has the same problem.

Alibaba-NLP org Jan 16, 2025

Please share the complete script you use to compute the similarity.
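
For reference, the scoring step of such a script might look roughly like the sketch below. This is not the poster's code: it assumes the reported scores are cosine similarities scaled by 100 (which would be consistent with the 99.99 for the identical strings above), and it uses toy vectors in place of real model embeddings. The names `l2_normalize` and `similarity_scores` are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def similarity_scores(
    query_embs: Sequence[Sequence[float]],
    doc_embs: Sequence[Sequence[float]],
    scale: float = 100.0,
) -> List[List[float]]:
    """Return a queries-by-documents matrix of scaled cosine similarities."""
    queries = [l2_normalize(v) for v in query_embs]
    docs = [l2_normalize(v) for v in doc_embs]
    return [
        [scale * sum(a * b for a, b in zip(q, d)) for d in docs]
        for q in queries
    ]


if __name__ == "__main__":
    # Toy 2-d vectors standing in for model embeddings (assumption: in a real
    # script these would come from the embedding model's encode step).
    toy_query = [[1.0, 0.0]]
    toy_docs = [[0.0, 1.0], [1.0, 0.0]]
    print(similarity_scores(toy_query, toy_docs))
```

When sharing a full script, it would also help to include how the query and document embeddings are produced (prompt or instruction used, tokenizer settings, pooling), since those steps are not visible from the scores alone.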

zyznull changed discussion status to closed Jan 16, 2025

zyznull changed discussion status to open Jan 16, 2025
