BPVELA-E560M

BPVELA-E560M is the current accuracy-first release line of BPVELA, optimized for Traditional Chinese semantic retrieval, similarity matching, and retrieval-first RAG.

Summary

  • Series version: v1.0.0
  • Base model: intfloat/multilingual-e5-large
  • Release type: LoRA adapter plus SentenceTransformer modules
  • Recommended usage: semantic retrieval, retrieval-first RAG, similarity search
  • Language focus: Traditional Chinese

Important

This repository contains a LoRA adapter release, not a merged full checkpoint. To use it, load the base model first and then apply this adapter on top of it.
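If a single merged checkpoint is more convenient for deployment, the adapter can in principle be folded into the base weights. The following is a minimal sketch, assuming the release is a standard PEFT LoRA adapter. The helper name merge_adapter and the output directory are hypothetical, and this repo does not ship such a script. The merged directory would contain only the transformer weights, so the pooling and normalization modules would still need to be attached separately.

```python
def merge_adapter(base_model: str, adapter_repo: str, output_dir: str) -> None:
    """Fold LoRA weights into the base model and save the result (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require the libraries.
    from peft import PeftModel
    from transformers import AutoModel

    base = AutoModel.from_pretrained(base_model)
    peft_model = PeftModel.from_pretrained(base, adapter_repo, is_trainable=False)
    # merge_and_unload() applies the LoRA deltas and returns the plain base model.
    merged = peft_model.merge_and_unload()
    merged.save_pretrained(output_dir)


# Example call (downloads both repos; the output path is an assumption):
# merge_adapter(
#     "intfloat/multilingual-e5-large",
#     "BluePlanetAI/BPVELA-E560M",
#     "bpvela-e560m-merged",
# )
```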

Validation Snapshot

  • Taiwan-md pair benchmark: Spearman 0.8400, Pearson 0.9224
  • Wrapped retrieval smoke test: pass rate 1.0000, retrieval hit rate 1.0000, top-1 rate 0.9667
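The snapshot reports Spearman and Pearson correlations for the pair benchmark. As a rough illustration of how such figures are typically computed, the sketch below correlates predicted similarity scores with reference scores in plain Python. It assumes the benchmark compares per-pair model similarities against gold scores, which this card does not spell out; the function names are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

Spearman only looks at the ordering of the pairs, while Pearson is also sensitive to how linearly the predicted scores track the reference scores, which is why the two reported values can differ.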

Query And Passage Formatting

This model line is based on E5. For retrieval, keep the standard E5 prefixes on every input:

  • Query: query: your question
  • Passage: passage: your document
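To avoid forgetting a prefix, the formatting can be wrapped in two small helpers. This is a minimal sketch; the function names as_query and as_passage are illustrative and not part of the release.

```python
def as_query(text: str) -> str:
    """Format a search query with the E5 'query: ' prefix."""
    return f"query: {text.strip()}"


def as_passage(text: str) -> str:
    """Format a document chunk with the E5 'passage: ' prefix."""
    return f"passage: {text.strip()}"


queries = [as_query("台灣颱風災害應變流程")]
passages = [as_passage("your document")]
```

The resulting strings can be passed directly to the encoder in place of raw text.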

Loading Example

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Normalize, Pooling, Transformer
from peft import PeftModel

base_model = "intfloat/multilingual-e5-large"
adapter_repo = "BluePlanetAI/BPVELA-E560M"

# Load the base encoder, then wrap its underlying Hugging Face model with the LoRA adapter.
transformer = Transformer(base_model)
transformer.auto_model = PeftModel.from_pretrained(
    transformer.auto_model,
    adapter_repo,
    is_trainable=False,  # keep the adapter weights frozen (inference only)
)

# Load the pooling and normalization modules shipped in the adapter repo.
pooling = Pooling.load(adapter_repo, subfolder="1_Pooling")
normalize = Normalize.load(adapter_repo, subfolder="2_Normalize")

model = SentenceTransformer(modules=[transformer, pooling, normalize])

# Encode one query with the E5 "query: " prefix and print the embedding dimension.
emb = model.encode(["query: 台灣颱風災害應變流程"], normalize_embeddings=True)
print(len(emb[0]))
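Once queries and passages are encoded, retrieval comes down to ranking passages by similarity to the query. Because the embeddings are L2-normalized, the dot product equals cosine similarity. The sketch below shows that ranking step in plain Python on toy vectors; rank_passages is a hypothetical helper, and in practice the vectors would come from model.encode.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_passages(query_vec, passage_vecs, top_k=3):
    """Return (passage_index, score) pairs sorted by descending similarity."""
    scored = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy unit-length vectors standing in for real embeddings.
query_vec = [1.0, 0.0]
passage_vecs = [[0.0, 1.0], [1.0, 0.0], [0.6, 0.8]]
ranking = rank_passages(query_vec, passage_vecs)
```

For larger collections, a vector index would normally replace this linear scan.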

Notes

  • bpvela_model_config.yaml is included as a reference for the loading settings used inside the project.
  • This public model repo does not need to include the Taiwan-md corpus or FAISS index.
  • The release owner should confirm the final public license before publishing.

License Notes

  • Taiwan-MD content license: CC BY-SA 4.0
  • BPVELA project code license: MIT
  • Base model intfloat/multilingual-e5-large: marked as MIT on Hugging Face
  • Adapter weights and model card content published in this repo: recommended to be described and distributed under CC BY-SA 4.0

This repository publishes the BPVELA-E560M LoRA adapter weights, model card, and related documentation only. It does not redistribute the full base-model weights of intfloat/multilingual-e5-large.

Because the training and optimization process used Taiwan-MD content, and given the current terms of that data source, we recommend describing and distributing the adapter weights and model card under CC BY-SA 4.0.

For redistribution, modified redistribution, or public derivative releases based on this adapter, users should:

  • preserve attribution to the original release
  • clearly indicate modifications
  • provide derivative materials under the same or a compatible share-alike license

Users who load and use this adapter remain responsible for complying with the license terms of the upstream base model, intfloat/multilingual-e5-large.
