BPVELA-E560M
BPVELA-E560M is the accuracy-first BPVELA release line for Traditional Chinese retrieval and embedding use cases.
Description
BPVELA-E560M is the current accuracy-first series of BPVELA, optimized for Traditional Chinese semantic retrieval, similarity matching, and retrieval-first RAG scenarios.
Model Summary
- Series version: v1.0.0
- Base model: intfloat/multilingual-e5-large
- Release type: LoRA adapter plus SentenceTransformer modules
- Recommended usage: semantic retrieval, retrieval-first RAG, similarity search
- Primary language: Traditional Chinese (繁體中文)
Important Notes
This repository publishes a LoRA adapter, not a merged full checkpoint. To use it, start from the base model and then load this adapter on top.
Validation Summary
- Taiwan-md pair benchmark: Spearman 0.8400, Pearson 0.9224
- Wrapped retrieval smoke test: pass rate 1.0000, retrieval hit rate 1.0000, top-1 rate 0.9667
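Query / Passage Format
This model line is based on E5; for retrieval, keep the standard prefixes.
- Query: prefix the text with "query: ", for example "query: 你的問題" (your question)
- Passage: prefix the text with "passage: ", for example "passage: 文件內容" (document content)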
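Remarks
- bpvela_model_config.yaml keeps the loading settings used inside the project.
- This public model repo does not need to include the Taiwan-md corpus or FAISS index.
- Confirm the final license again before publishing.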
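License Notes
- Taiwan-MD content license: CC BY-SA 4.0
- BPVELA project code license: MIT
- Base model intfloat/multilingual-e5-large: marked as MIT on Hugging Face
- The adapter weights and model card content released in this repo are best described publicly as CC BY-SA 4.0
This repository publishes the BPVELA-E560M LoRA adapter weights, model card, and related documentation; it does not include the full base-model weights of intfloat/multilingual-e5-large.
The training and optimization of BPVELA-E560M used Taiwan-MD content; given the current terms of that data source, the adapter weights and model card content are best described and distributed publicly under CC BY-SA 4.0.
For any redistribution, redistribution of modified versions, or public derivative release based on this adapter, it is recommended to:
- preserve the original source and give appropriate attribution
- clearly indicate any modifications
- provide derivative content under the same or a compatible share-alike arrangement
When loading and using this adapter, users must still comply with the license terms of the upstream base model intfloat/multilingual-e5-large.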
Summary
- Series version: v1.0.0
- Base model: intfloat/multilingual-e5-large
- Release type: LoRA adapter plus SentenceTransformer modules
- Recommended usage: semantic retrieval, retrieval-first RAG, similarity search
- Language focus: Traditional Chinese
Important
This repository contains a LoRA adapter release, not a merged full checkpoint. Load it on top of the base model.
Validation Snapshot
- Taiwan-md pair benchmark: Spearman 0.8400, Pearson 0.9224
- Wrapped retrieval smoke: pass rate 1.0000, retrieval hit rate 1.0000, top-1 rate 0.9667
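The card reports both Spearman and Pearson for the pair benchmark but does not say how they were computed. As a rough illustration only, the sketch below computes the two coefficients in plain Python on made-up scores. The function names and the data are hypothetical and are not taken from the BPVELA evaluation code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: human similarity labels and model cosine scores.
gold = [0.0, 1.0, 2.0, 3.0, 4.0]
predicted = [0.10, 0.20, 0.25, 0.60, 0.95]

print("Spearman:", round(spearman(gold, predicted), 4))
print("Pearson:", round(pearson(gold, predicted), 4))
```

Spearman depends only on rank order, while Pearson also reflects how linear the relationship is, so the two numbers can differ on the same set of pairs.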
Query And Passage Formatting
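This model line is based on E5. For retrieval, keep the standard E5 prefixes.
- Query: prefix the text with "query: ", for example "query: your question"
- Passage: prefix the text with "passage: ", for example "passage: your document"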
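A minimal sketch of applying these prefixes before encoding. The helper names as_query and as_passage are hypothetical (they are not part of this repo or of sentence-transformers), and the passage text is made up for illustration.

```python
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def as_query(text: str) -> str:
    """Add the E5 query prefix unless the text already starts with it."""
    text = text.strip()
    return text if text.startswith(QUERY_PREFIX) else QUERY_PREFIX + text


def as_passage(text: str) -> str:
    """Add the E5 passage prefix unless the text already starts with it."""
    text = text.strip()
    return text if text.startswith(PASSAGE_PREFIX) else PASSAGE_PREFIX + text


# Hypothetical inputs; the prefixed strings are what would be passed to encode().
queries = [as_query("台灣颱風災害應變流程")]
passages = [as_passage("颱風來襲前應備妥三日份的飲用水與糧食。")]

print(queries[0])
print(passages[0])
```

Checking for an existing prefix keeps text that is already formatted from being prefixed twice.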
Loading Example
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Normalize, Pooling, Transformer
from peft import PeftModel

base_model = "intfloat/multilingual-e5-large"
adapter_repo = "BluePlanetAI/BPVELA-E560M"

# Load the base encoder, then wrap it with the LoRA adapter (inference only).
transformer = Transformer(base_model)
transformer.auto_model = PeftModel.from_pretrained(
    transformer.auto_model,
    adapter_repo,
    is_trainable=False,
)

# Reuse the pooling and normalization modules shipped with the adapter repo.
pooling = Pooling.load(adapter_repo, subfolder="1_Pooling")
normalize = Normalize.load(adapter_repo, subfolder="2_Normalize")
model = SentenceTransformer(modules=[transformer, pooling, normalize])

# Encode one query with the E5 "query: " prefix and print the embedding size.
emb = model.encode(["query: 台灣颱風災害應變流程"], normalize_embeddings=True)
print(len(emb[0]))
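Because the loading example encodes with normalize_embeddings=True, the cosine similarity between two embeddings reduces to a dot product. The sketch below shows one way passages might be ranked against a query under that assumption. rank_passages is a hypothetical helper, and the toy unit-length vectors stand in for real model output so the snippet runs without downloading the model.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs, highest score first.

    Assumes all vectors are L2-normalized, so the dot product equals
    the cosine similarity.
    """
    scores = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy unit-length vectors standing in for model.encode(...) output.
query_vec = [1.0, 0.0]
passage_vecs = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

ranking = rank_passages(query_vec, passage_vecs)
print(ranking)

# With the model from the loading example, the inputs could be produced as:
#   q = model.encode(["query: 台灣颱風災害應變流程"], normalize_embeddings=True)[0]
#   p = model.encode(passage_texts, normalize_embeddings=True)  # "passage: " prefixed
#   ranking = rank_passages(q, p)
```

For larger collections, a vectorized matrix product or an index would normally replace the Python loop. This version only illustrates the scoring step.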
Notes
- bpvela_model_config.yaml is included as the project-side loading reference.
- This public model repo does not need to include the Taiwan-md corpus or FAISS index.
- Release owner should finalize the public license before publishing.
License Notes
- Taiwan-MD content license: CC BY-SA 4.0
- BPVELA project code license: MIT
- Base model intfloat/multilingual-e5-large: marked as MIT on Hugging Face
- The adapter weights and model card content published in this repo are best documented as CC BY-SA 4.0
This repository publishes the BPVELA-E560M LoRA adapter weights, model card, and related documentation only. It does not redistribute the full base-model weights of intfloat/multilingual-e5-large.
Because the training and optimization process uses Taiwan-MD content, the adapter release and model card are best documented for public distribution under CC BY-SA 4.0.
For redistribution, modified redistribution, or public derivative releases based on this adapter, users should:
- preserve attribution to the original release
- clearly indicate modifications
- release derivative materials under the same or a compatible share-alike arrangement
Use of this adapter remains subject to the applicable license terms of the upstream base model.