Vet KM-BERT Cross-Encoder
A Korean cross-encoder model specialized for the veterinary medicine domain. It is used in the reranking stage of a RAG system.
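In the reranking stage, a cross-encoder scores each (query, candidate) pair returned by a first-stage retriever, and the candidates are re-sorted by that score. Below is a minimal Python sketch of that step. It assumes a generic `score_fn` in place of the model. `rerank` and `toy_overlap_score` are hypothetical helpers that are not part of this repository, and the toy scorer exists only so the example runs offline.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    documents: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score each (query, document) pair and return the top_k documents by score."""
    # A cross-encoder sees query and document together, so every pair is scored separately.
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    # Highest score first; list.sort is stable, so ties keep the retriever's order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_overlap_score(query: str, document: str) -> float:
    """Hypothetical offline stand-in for the model: share of query words found in the document."""
    query_words = set(query.split())
    doc_words = set(document.split())
    return len(query_words & doc_words) / max(len(query_words), 1)


if __name__ == "__main__":
    candidates = [
        "dental scaling in dogs",
        "common causes of vomiting in dogs",
        "feline skin allergies",
    ]
    for doc, score in rerank("dogs vomiting causes", candidates, toy_overlap_score, top_k=2):
        print(f"{score:.2f}  {doc}")
```

With the real model, `score_fn` would wrap the tokenizer and softmax code from the Usage section and return the class-1 probability for one pair.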
Model Information
- Base Model: madatnlp/km-bert
- Task: Binary Classification (question-document relevance)
- Language: Korean
- Domain: Veterinary Medicine
Training Data
- Dataset: 213 veterinary documents (5 clinical departments)
  - Internal medicine, ophthalmology, surgery, dentistry, dermatology
- Questions: 600 (420 for training, 180 for evaluation)
- Curation method: LLM Scoring + Graph Refinement (LightGCN)
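The card names the curation method but gives no details. For readers unfamiliar with LightGCN, here is a sketch of only its propagation rule on a bipartite question-document graph. Neighbor embeddings are averaged with symmetric degree normalization, $E^{(k+1)} = D^{-1/2} A D^{-1/2} E^{(k)}$, and the final embedding is the uniform mean over layers 0..K. Everything project-specific is an assumption. That includes building edges by thresholding LLM scores at 0.5, using random layer-0 embeddings (LightGCN normally learns them with a BPR loss, which is omitted here), and the hypothetical helpers `lightgcn_propagate` and `edge_scores`. How catholic_retreival actually combines the graph output with the LLM scores is not stated in this card.

```python
import numpy as np


def lightgcn_propagate(interactions: np.ndarray, embeddings: np.ndarray, num_layers: int = 2) -> np.ndarray:
    """LightGCN propagation on a bipartite question-document graph.

    interactions: (num_questions, num_docs) 0/1 edge matrix R.
    embeddings: (num_questions + num_docs, dim) layer-0 embeddings, questions first.
    Returns the layer-averaged embeddings (same shape as `embeddings`).
    """
    n_q, n_d = interactions.shape
    n = n_q + n_d
    # Symmetric bipartite adjacency A = [[0, R], [R^T, 0]].
    adj = np.zeros((n, n))
    adj[:n_q, n_q:] = interactions
    adj[n_q:, :n_q] = interactions.T
    # Normalization D^{-1/2} A D^{-1/2}; isolated nodes keep all-zero rows.
    degree = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    mask = degree > 0
    inv_sqrt[mask] = degree[mask] ** -0.5
    norm_adj = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    # E^(k+1) = A_hat E^(k): no feature transform and no nonlinearity, as in LightGCN.
    layers = [embeddings]
    for _ in range(num_layers):
        layers.append(norm_adj @ layers[-1])
    # Readout: uniform mean over layers 0..K (alpha_k = 1 / (K + 1)).
    return np.mean(layers, axis=0)


def edge_scores(interactions: np.ndarray, embeddings: np.ndarray, num_layers: int = 2) -> np.ndarray:
    """Dot-product score for every (question, document) pair after propagation."""
    n_q = interactions.shape[0]
    final = lightgcn_propagate(interactions, embeddings, num_layers)
    return final[:n_q] @ final[n_q:].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical LLM relevance scores in [0, 1] for 3 questions x 4 documents.
    llm_scores = np.array([[0.9, 0.2, 0.1, 0.7],
                           [0.1, 0.8, 0.6, 0.0],
                           [0.3, 0.1, 0.9, 0.2]])
    graph = (llm_scores >= 0.5).astype(float)  # assumed threshold
    emb0 = rng.normal(size=(3 + 4, 8))
    print(edge_scores(graph, emb0).round(3))
```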
Performance
| Metric | Score |
|---|---|
| Accuracy | ~68% |
| F1-Score | ~0.72 |
| Precision | ~0.71 |
| Recall | ~0.73 |
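The table does not state the decision threshold or the evaluation split. Below is a minimal sketch of how these four metrics are conventionally computed for the positive ("relevant") class. It assumes predictions come from thresholding the relevance probability at 0.5. `binary_metrics` and `to_predictions` are hypothetical helpers. As a consistency check, the reported precision and recall give F1 = 2 × 0.71 × 0.73 / (0.71 + 0.73) ≈ 0.72, which matches the table.

```python
from typing import Dict, List, Sequence


def to_predictions(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn relevance probabilities into 0/1 labels; 0.5 is an assumed threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def binary_metrics(labels: Sequence[int], predictions: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the positive (relevant = 1) class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [1, 1, 0, 0, 1]
    preds = to_predictions([0.9, 0.4, 0.6, 0.1, 0.7])
    print(binary_metrics(gold, preds))
```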
Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "JOhyeongi/vet-kmbert-cross-encoder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Inference: the query and document are encoded together as one sentence pair
query = "강아지가 구토를 해요."  # "My puppy is vomiting."
document = "강아지 구토의 원인은 다양합니다..."  # "Vomiting in puppies has many causes..."
inputs = tokenizer(
    [[query, document]],
    padding=True,
    truncation=True,
    return_tensors="pt",
    max_length=512,
)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=1)
score = probs[0][1].item()  # Relevance score: probability of class 1 (relevant)
print(f"Relevance Score: {score:.4f}")
```
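The score printed above is the softmax probability of class index 1, which the example treats as "relevant" (the label mapping is taken from the code; the model's `config.id2label` is the place to confirm it). For two logits $z_0, z_1$, this reduces to a sigmoid of their difference:

$$
s = \operatorname{softmax}(z)_1
  = \frac{e^{z_1}}{e^{z_0} + e^{z_1}}
  = \frac{1}{1 + e^{-(z_1 - z_0)}}
  = \sigma(z_1 - z_0)
$$

Because $\sigma$ is strictly increasing, ranking documents by `probs[:, 1]` gives the same order as ranking by `logits[:, 1] - logits[:, 0]`.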
Full RAG Pipeline
This model is part of the following project:
- Repository: catholic_retreival
- Pipeline: Rationale Generation → Retrieval → Reranking → Answer Generation
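The four stages form a chain in which each stage consumes the previous stage's output. Here is a minimal Python sketch of that data flow, assuming each stage is supplied as a callable. `run_pipeline`, `PipelineResult`, and the stage signatures are hypothetical. Passing the rationale into `retrieve` is also an assumption, because the card does not say how the rationale is used. The actual implementation lives in the catholic_retreival repository.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PipelineResult:
    rationale: str
    retrieved: List[str]
    reranked: List[str]
    answer: str


def run_pipeline(
    question: str,
    generate_rationale: Callable[[str], str],
    retrieve: Callable[[str, str], List[str]],
    rerank: Callable[[str, Sequence[str]], List[str]],
    generate_answer: Callable[[str, Sequence[str]], str],
    top_k: int = 3,
) -> PipelineResult:
    """Run the four stages in order; every stage is an injected, hypothetical callable."""
    # 1. Rationale Generation: intermediate text produced from the question.
    rationale = generate_rationale(question)
    # 2. Retrieval: first-stage candidates (assumed here to receive the rationale too).
    retrieved = list(retrieve(question, rationale))
    # 3. Reranking: the cross-encoder stage reorders candidates; keep only the top_k.
    reranked = list(rerank(question, retrieved))[:top_k]
    # 4. Answer Generation: uses the reranked documents as context.
    answer = generate_answer(question, reranked)
    return PipelineResult(rationale, retrieved, reranked, answer)


if __name__ == "__main__":
    # Stub stages so the data flow runs without any model.
    result = run_pipeline(
        "Why is my dog vomiting?",
        generate_rationale=lambda q: f"rationale for: {q}",
        retrieve=lambda q, r: ["doc A", "doc B", "doc C", "doc D"],
        rerank=lambda q, docs: sorted(docs, reverse=True),
        generate_answer=lambda q, docs: f"answer based on {', '.join(docs)}",
        top_k=2,
    )
    print(result)
```

Injecting the stages as callables keeps the sketch independent of any particular retriever or LLM, so each stage can be swapped or stubbed for testing.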
Training Configuration
- Epochs: 3
- Batch Size: 8
- Learning Rate: 2e-5
- Max Length: 512
- Optimizer: AdamW
- Weight Decay: 0.01
- Warmup Steps: 500
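The card lists the warmup length but not the shape of the learning-rate schedule. A common choice with AdamW is linear warmup followed by linear decay, as in `transformers.get_linear_schedule_with_warmup`. The sketch below reproduces that formula in plain Python, and assuming this schedule here is a guess, not something the card states. `linear_warmup_lr` and `total_training_steps` are hypothetical helpers, and the 0.01 weight decay is applied inside the optimizer, so it is not modeled. The step count matters here. With Batch Size 8 and 3 epochs, there are 3 × ⌈N/8⌉ steps for N training pairs, so training runs past the 500 warmup steps only if N ≥ 1,329. The card gives 420 training questions but not the number of (question, document) pairs, so whether warmup completes cannot be determined from it.

```python
import math


def total_training_steps(num_pairs: int, batch_size: int = 8, epochs: int = 3) -> int:
    """Optimizer steps with no gradient accumulation: ceil(num_pairs / batch_size) per epoch."""
    return math.ceil(num_pairs / batch_size) * epochs


def linear_warmup_lr(step: int, total_steps: int, base_lr: float = 2e-5, warmup_steps: int = 500) -> float:
    """Learning rate at `step`: linear warmup from 0 to base_lr, then linear decay to 0.

    Mirrors the formula of transformers.get_linear_schedule_with_warmup
    (assumed schedule; the card lists only the warmup length).
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    for n in (1328, 1329, 4000):  # hypothetical training-pair counts
        steps = total_training_steps(n)
        print(n, steps, "warmup completes" if steps > 500 else "training ends during warmup")
```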
License
MIT License
Citation
```bibtex
@misc{vet-kmbert-cross-encoder,
  title={Vet KM-BERT Cross-Encoder: Korean Veterinary RAG System},
  author={Catholic University},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/JOhyeongi/vet-kmbert-cross-encoder}
}
```
Contact
- GitHub: jasonhk24/catholic_retreival
- Issues: GitHub Issues