Meno-Lite-0.1

A 7B language model built to read, not to memorize.

💡 TL;DR

  • 🎯 Focus: RAG, document QA, information extraction, knowledge graph construction, summarization
  • 🧠 Core idea: train language skills (comprehension, extraction, reasoning), not factual memorization — knowledge comes from context
  • 🏆 Results: top-performing 7B model on MultiQ (multi-hop QA); #1 on NEREL-bench (knowledge graph construction), outperforming models up to 32B; near-perfect passkey retrieval up to 128k tokens
  • 🇷🇺 Languages: Russian (primary) + English
  • 🔤 Tokenizer: 3.77 chars/token on Russian — 47% more efficient than vanilla Qwen2.5
  • 🖥️ Deployment: fits on a single consumer GPU; works with vLLM and transformers out of the box
  • 📜 License: Apache 2.0

Use when: you have documents and need to extract information, answer questions over them, or build a knowledge graph. Don't use when: you need a general-purpose chatbot with broad world knowledge and no retrieval pipeline.

🧠 Key idea

Why "Meno"? The name alludes to Plato's dialogue Meno, where Socrates argues that knowledge is not learned but recollected from within (ἀνάμνησις). We invert this metaphor: rather than assuming knowledge is already inside the model, we externalize it into a retrieval corpus and let the model "recollect" through a RAG pipeline. Like Socrates' interlocutor, the model doesn't carry the answers within itself — but given the right context, it can arrive at them. This is why Meno-Lite's training focuses on sharpening the skills that make such recollection possible: comprehension, extraction, inference, and generation.

We hypothesize that the capabilities of LLMs can be roughly decomposed into world knowledge (facts, dates, entities) and language skills (comprehension, extraction, inference, generation). While world knowledge demands ever more parameters, language skills appear to reach a usable plateau even in 7B-class models — provided they are deliberately cultivated. Meno-Lite-0.1 is an empirical test of this idea: by investing training compute into language skills rather than factual recall, we aim for a model that performs competitively on context-grounded tasks while remaining deployable on a single consumer GPU. The upcoming technical report will examine where this trade-off holds and where it breaks down.

🧬 Model Lineage

Meno-Lite-0.1 is derived from RuadaptQwen2.5-7B-Lite-Beta through a carefully designed two-stage training pipeline (continued pretraining → supervised fine-tuning) that sharpens the model's ability to work with documents rather than from parametric memory. The full lineage is:

Qwen/Qwen2.5-7B-Instruct
  └─► t-tech/T-lite-it-1.0
        └─► RefalMachine/RuadaptQwen2.5-7B-Lite-Beta
              └─► bond005/Meno-Lite-0.1   ◄── you are here

Each ancestor added a layer of Russian-language adaptation; Meno-Lite-0.1 adds a final layer of skill-oriented training focused on information extraction, entity normalization, multi-hop reasoning over long contexts, and instruction following for RAG scenarios. Although the model is primarily oriented toward Russian, it retains strong English performance thanks to bilingual pretraining data (sampled FineWeb-Edu) and English-language SFT examples (MultiHopRAG, MTRAGEval).

  • Developed by: Ivan Bondarenko, Novosibirsk State University (NSU)
  • Model type: Causal decoder-only transformer (Qwen2.5 architecture)
  • Parameters: ~7B
  • Language(s): Russian (primary), English (retained)
  • License: Apache 2.0
  • Base model: RefalMachine/RuadaptQwen2.5-7B-Lite-Beta

⚡ Tokenizer Efficiency

An often-overlooked determinant of real-world throughput is tokenizer efficiency: the more characters each token covers, the fewer autoregressive steps are needed to generate text of a given length. Meno-Lite-0.1 inherits the extended tokenizer from RuadaptQwen2.5-7B-Lite-Beta, which dramatically improves Russian-language efficiency compared to the original Qwen2.5 vocabulary.

| Model | Chars/token (RU) | Chars/token (EN) |
|---|---|---|
| Meno-Lite-0.1 | 3.77 | 4.13 |
| RuadaptQwen2.5-7B-Lite-Beta | 3.77 | 4.13 |
| AvitoTech/avibe (8B) | 3.79 | 4.06 |
| t-tech/T-lite-it-2.1 (7B) | 3.74 | 4.14 |
| t-tech/T-lite-it-1.0 (7B) | 2.57 | 4.14 |
| Qwen/Qwen2.5-7B-Instruct | 2.57 | 4.14 |
| GigaChat3-10B-A1.8B | 3.74 | 3.99 |

Meno-Lite-0.1 achieves 3.77 characters per token on Russian text — a 47% improvement over the original Qwen2.5 tokenizer (2.57 chars/token). This translates directly into faster inference and lower serving costs for Russian-language workloads, while English efficiency remains on par with the best models in the class.
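These ratios are straightforward to reproduce for any tokenizer. The helper below is a generic sketch (the corpus behind the table above is not specified, so the exact figures are not recomputed here); it is demonstrated with a trivial whitespace tokenizer so it runs offline, but passing the `encode` method of `AutoTokenizer.from_pretrained("bond005/meno-lite-0.1")` over a real corpus gives comparable measurements.

```python
def chars_per_token(texts, tokenize):
    """Average number of characters covered by one token.

    `tokenize` is any callable mapping a string to a list of tokens,
    e.g. the `encode` method of a Hugging Face tokenizer.
    """
    total_chars = sum(len(t) for t in texts)
    total_tokens = sum(len(tokenize(t)) for t in texts)
    return total_chars / total_tokens

# Offline demo with a whitespace "tokenizer"; for the table's numbers use, e.g.:
#   tok = AutoTokenizer.from_pretrained("bond005/meno-lite-0.1")
#   chars_per_token(russian_corpus, tok.encode)
print(chars_per_token(["съешь же ещё этих мягких французских булок"], str.split))  # → 6.0
```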

📊 Evaluation

Note: A more detailed analysis of Meno-Lite-0.1's performance will be provided in an upcoming technical report.

MERA Benchmark

https://mera.a-ai.ru/ru/text/leaderboard

MERA is the most comprehensive benchmark for evaluating Russian LLMs on "strong AI" tasks. The benchmark comprises 23 tasks covering world knowledge, logic, causality, and AI ethics. Below we present results on 5 selected tasks chosen for their relevance to RAG and document processing scenarios:

  • MultiQ: Multi-hop question answering over multi-document contexts — directly measures core RAG capability
  • RWSD: Coreference resolution (Winograd Schema) — tests discourse understanding
  • RCB: Natural language inference with causality detection — evaluates reasoning over text
  • CheGeKa: World knowledge QA — included for comparison to show the model's intentional design trade-off
  • ruWorldTree: Elementary science facts — tests knowledge vs. reasoning balance

Note that the Overall Score column reflects performance across all 23 MERA tasks, not just the five shown here.

| Model | Size | Overall Score | MultiQ | RWSD | RCB | CheGeKa | ruWorldTree |
|---|---|---|---|---|---|---|---|
| GPT-4o | - | 0.642 | 0.572 / 0.431 | 0.496 | 0.557 / 0.521 | 0.553 / 0.464 | 0.985 / 0.985 |
| Meno-Lite-0.1 | 7B | 0.555 | 0.536 / 0.403 | 0.569 | 0.541 / 0.458 | 0.346 / 0.293 | 0.949 / 0.760 |
| T-lite-it-1.0 | 7B | 0.552 | 0.523 / 0.398 | 0.535 | 0.571 / 0.533 | 0.502 / 0.413 | 0.964 / 0.964 |
| AvitoTech/avibe | 8B | 0.618 | 0.539 / 0.410 | 0.565 | 0.582 / 0.547 | 0.168 / 0.120 | 0.968 / 0.968 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.536 | 0.479 / 0.342 | 0.465 | 0.553 / 0.458 | 0.379 / 0.308 | 0.960 / 0.960 |
| Qwen2.5-7B-Instruct | 7B | 0.482 | 0.425 / 0.296 | 0.515 | 0.562 / 0.493 | 0.077 / 0.048 | 0.939 / 0.939 |

Key observations:

  • Meno-Lite-0.1 achieves solid results within its size class (7B parameters), improving notably over its direct ancestors (RuadaptQwen2.5-7B-Lite-Beta and Qwen2.5-7B-Instruct)
  • The model shows competitive performance on MultiQ (multi-hop question answering), which is particularly relevant for RAG pipelines
  • As expected by design, world knowledge tasks (CheGeKa) are not the model's strength — this is an intentional trade-off for better context-grounded performance

NEREL-bench: Knowledge Graph Construction

https://huggingface.co/datasets/bond005/NEREL_bench

NEREL-bench evaluates LLM capabilities for knowledge graph construction: named entity recognition, relation extraction, and contextual definition generation. These tasks are critical for GraphRAG and knowledge-intensive applications.

Note: NEREL-bench was developed by the author of this model. To prevent data leakage between the SFT training set (NEREL-instruct) and the evaluation set (NEREL-bench), we followed the original train/dev/test split defined in the NEREL paper (Loukachevitch et al., 2021). Only the training portion of NEREL was used to construct SFT instructions; the test portion was held out and used exclusively for evaluation in NEREL-bench. However, because Meno-Lite-0.1 was exposed to the NEREL annotation schema and text domain during SFT, it may have a distributional advantage over models that were not. We encourage independent replication on other IE benchmarks.

| Model | Size | RuEntityRecognition (F1) | RuEntityDefinition (chrF++) | RuRelationExtraction (F1) | RuRelationDefinition (chrF++) | Harmonic Mean |
|---|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.5043 | 0.5273 | 0.3469 | 0.5582 | 0.4676 |
| Qwen2.5-32B-Instruct | 32B | 0.5361 | 0.5275 | 0.2393 | 0.5993 | 0.4163 |
| gemma-3-12b-it | 12B | 0.5136 | 0.4955 | 0.2450 | 0.5649 | 0.4075 |
| gemma-3-27b-it | 27B | 0.5436 | 0.4818 | 0.2243 | 0.5827 | 0.3964 |
| Qwen2.5-14B-Instruct | 14B | 0.5096 | 0.5182 | 0.2222 | 0.5829 | 0.3957 |
| Qwen2.5-7B-Instruct | 7B | 0.4770 | 0.4790 | 0.1919 | 0.5411 | 0.3558 |
| AvitoTech/avibe | 8B | 0.4683 | 0.4351 | 0.2207 | 0.3971 | 0.3483 |
| T-lite-it-1.0 | 7B | 0.4660 | 0.4644 | 0.1741 | 0.5329 | 0.3356 |
| T-lite-it-2.1 | 8B | 0.4889 | 0.3933 | 0.1308 | 0.5469 | 0.2845 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.4208 | 0.3925 | 0.1215 | 0.5041 | 0.2642 |

Meno-Lite-0.1 achieves the highest harmonic mean score, outperforming models 2–4× larger on knowledge graph construction tasks. Keeping in mind the distributional advantage noted above, this result suggests that the model is well-suited for:

  • GraphRAG pipelines
  • Automated knowledge base construction
  • Document analysis and entity extraction
  • Building structured representations from unstructured text
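For reference, the Harmonic Mean column is the plain harmonic mean of the four per-task scores, which penalizes an imbalanced task profile more strongly than an arithmetic mean would. Recomputing it for the Meno-Lite-0.1 row reproduces the table value:

```python
from statistics import harmonic_mean

# Per-task scores for Meno-Lite-0.1 from the NEREL-bench table above.
scores = [0.5043, 0.5273, 0.3469, 0.5582]

print(round(harmonic_mean(scores), 4))  # → 0.4676
```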

LIBRA: Long-Context Understanding

https://huggingface.co/datasets/ai-forever/LIBRA

LIBRA evaluates long-context understanding across tasks ranging from 4k to 128k tokens.

Simple Information Retrieval (Passkey)

| Model | 4k | 8k | 16k | 32k | 64k | 128k |
|---|---|---|---|---|---|---|
| Meno-Lite-0.1 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.98 |
| T-lite-it-2.1 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.895 |
| AvitoTech/avibe | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.895 |
| RefalMachine/RuadaptQwen2.5-7B-Lite-Beta | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.98 |
| T-lite-it-1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.58 |
| Qwen2.5-7B-Instruct | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.58 |

Meno-Lite-0.1 maintains near-perfect passkey retrieval performance across all context lengths, including 128k tokens.
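For intuition, a passkey probe hides a short "needle" sentence at a controlled depth inside filler text and asks the model to return it. The sketch below shows the general shape of such a probe (it is not the exact LIBRA prompt) and only constructs the input, so it runs without the model:

```python
def make_passkey_prompt(passkey: str, n_filler: int, depth: float) -> str:
    """Build a needle-in-a-haystack probe: repeated filler sentences with
    the passkey sentence inserted at a relative depth in [0, 1]."""
    filler = ["Трава зелёная, небо голубое, солнце светит ярко."] * n_filler
    needle = f"Запомните: секретный ключ равен {passkey}."
    pos = int(depth * n_filler)
    haystack = filler[:pos] + [needle] + filler[pos:]
    question = "Какой секретный ключ упомянут в тексте выше? Ответьте только самим ключом."
    return " ".join(haystack) + "\n\n" + question

prompt = make_passkey_prompt("74912", n_filler=100, depth=0.5)
print("74912" in prompt)  # → True
```

Generating prompts at varying `n_filler` and `depth`, sending them to the model, and checking the answer for the passkey approximates the evaluation summarized in the table above.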

Multi-hop Question Answering

Scores are reported as a range from shortest to longest context (4k → 128k).

| Model | LibrusecMHQA (8k) | ruBABILongQA1 (4k→128k) | ruBABILongQA4 (4k→128k) | ruBABILongQA5 (4k→128k) |
|---|---|---|---|---|
| Meno-Lite-0.1 | 0.484 | 0.72 → 0.36 | 0.56 → 0.22 | 0.80 → 0.54 |
| Qwen2.5-14B-Instruct | 0.484 | 0.90 → 0.38 | 0.66 → 0.15 | 0.86 → 0.64 |
| T-lite-it-2.1 | 0.453 | 0.77 → 0.44 | 0.60 → 0.27 | 0.79 → 0.69 |
| T-lite-it-1.0 | 0.456 | 0.74 → 0.34 | 0.56 → 0.15 | 0.81 → 0.54 |
| RefalMachine/RuadaptQwen2.5-7B-Lite-Beta | 0.432 | 0.74 → 0.29 | 0.59 → 0.22 | 0.79 → 0.49 |
| Qwen2.5-7B-Instruct | 0.419 | 0.65 → 0.48 | 0.62 → 0.08 | 0.81 → 0.69 |

Meno-Lite-0.1 shows competitive multi-hop reasoning at shorter contexts, matching Qwen2.5-14B-Instruct on LibrusecMHQA. Performance degrades at very long contexts, which is consistent with other models in this size class.

LLM Tool Calling Benchmark (BFCL Russian)

https://github.com/MKreGGo/ru_tool_calling_tests

For completeness, we include tool-calling results, although function calling is not a target capability of Meno-Lite-0.1.

| Model | Overall Success Rate |
|---|---|
| T-lite-it-2.1 | 84.5% |
| Qwen3-8B (thinking mode) | 80.3% |
| Qwen2.5-7B-Instruct | 76.1% |
| AvitoTech/avibe | 69.2% |
| Meno-Lite-0.1 | 58.9% |
| RefalMachine/RuadaptQwen2.5-7B-Lite-Beta | 58.5% |
| T-lite-it-1.0 | 2.9% |

Function-calling performance is moderate. Meno-Lite-0.1 matches its immediate ancestor, RuadaptQwen2.5-7B-Lite-Beta (58.5%), and far outperforms the earlier ancestor T-lite-it-1.0 (2.9%), but lags behind models with dedicated tool-calling training.

👨‍💻 Usage

How to Get Started with the Model

1. RAG Question Answering

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bond005/meno-lite-0.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

SYSTEM_PROMPT = "Вы — полезный ассистент. Отвечайте на вопросы, опираясь на предоставленный контекст."

CHUNKS = [
    "Новосибирский государственный университет (НГУ) был основан в 1959 году в Академгородке.",
    "12 сентября 1959 года был успешно осуществлён запуск автоматической межпланетной станции «Луна-2». "
    "14 сентября 1959 года станция «Луна-2» впервые в мире достигла поверхности Луны в районе Моря Дождей "
    "вблизи кратеров Аристилл, Архимед и Автолик.",
    "Московский государственный университет имени М. В. Ломоносова (МГУ) был основан в 1755 году. "
    "Изначально университет располагался в здании Главной аптеки (бывший Земский приказ) на месте "
    "Государственного исторического музея на Красной площади.",
]
CONTEXT = "\n\n".join([f"Контекст {idx + 1}:\n```text\n{val}\n```" for idx, val in enumerate(CHUNKS)])

question = "Какой университет был основан в том же году, когда впервые в истории рукотворный аппарат достиг поверхности Луны?"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": CONTEXT + "\n\nВопрос: " + question + "\n"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)

Expected output:

Новосибирский государственный университет (НГУ) был основан в том же году, когда впервые в истории рукотворный аппарат достиг поверхности Луны.

2. Multi-hop Reasoning

This example reuses the same model, tokenizer, and context from the previous snippet.

question = "Через сколько лет после университета в Москве был основан университет в Новосибирске?"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": CONTEXT + "\n\nВопрос: " + question + "\n"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)

Expected output:

Университет в Новосибирске был основан через 204 года после Московского государственного университета.

3. Few-Shot Named Entity Recognition

This example reuses the same model and tokenizer from the first snippet.

import json

few_shot_messages = [
    {
        "role": "system",
        "content": "Вы - эксперт в области анализа текстов и извлечения семантической информации из них."
    },
    {
        "role": "user",
        "content": "Выделите именованные сущности классов ORGANIZATION, PERSON и LOCATION из входного текста "
                   "и запишите ответ в JSON-формате.\n\n"
                   "Входной текст:\n\n```text\n"
                   "Научный сотрудник лаборатории прикладных цифровых технологий Международного "
                   "научно-образовательного математического центра НГУ Иван Бондаренко рассказал "
                   "о грантовой программе и о том, как его проект RAGU попал в число победителей.\n```\n"
    },
    {
        "role": "assistant",
        "content": '{"ORGANIZATION": ["лаборатория прикладных цифровых технологий Международного '
                   'научно-образовательного математического центра НГУ", '
                   '"Международный научно-образовательный математический центр НГУ", "НГУ"], '
                   '"PERSON": ["Иван Бондаренко"], "LOCATION": []}'
    },
    {
        "role": "user",
        "content": "Выделите именованные сущности классов ORGANIZATION, PERSON и LOCATION из входного текста "
                   "и запишите ответ в JSON-формате.\n\n"
                   "Входной текст:\n\n```text\n"
                   "Национальный исследовательский университет «Высшая школа экономики» (НИУ ВШЭ) представил "
                   "результаты 15-го мониторинга качества приема на бюджетные и платные места российских вузов "
                   "в 2025 году. В группе лидеров 10 московских университетов, три питерских и по одному "
                   "представителю из таких регионов, как Татарстан (Иннополис), Нижний Новгород "
                   "и Новосибирск (НГУ).\n```\n"
    },
    {
        "role": "assistant",
        "content": '{"ORGANIZATION": ["Национальный исследовательский университет «Высшая школа экономики»", '
                   '"НИУ ВШЭ", "НГУ"], "PERSON": [], "LOCATION": ["московский", "питерский", "Татарстан", '
                   '"Иннополис", "Нижний Новгород", "Новосибирск"]}'
    },
    {
        "role": "user",
        "content": "Выделите именованные сущности классов ORGANIZATION, PERSON и LOCATION из входного текста "
                   "и запишите ответ в JSON-формате.\n\n"
                   "Входной текст:\n\n```text\n"
                   "Почему китайская ИИ-модель DeepSeek гораздо эффективнее и дешевле западных аналогов?\n```\n"
    },
    {
        "role": "assistant",
        "content": '{"ORGANIZATION": [], "PERSON": [], "LOCATION": ["китайская", "западный"]}'
    }
]

input_text = (
    "Станислав Владимирович Дробышевский – российский антрополог, кандидат биологических наук, "
    "доцент кафедры антропологии биологического факультета МГУ им. М.В. Ломоносова, научный редактор "
    "портала \u201cАнтропогенез.ру\u201d и, без сомнения, одна из самых ярких и узнаваемых фигур "
    "в российской науке."
)

# Wrap the query in the same instruction template as the few-shot examples.
final_user_content = (
    "Выделите именованные сущности классов ORGANIZATION, PERSON и LOCATION из входного текста "
    "и запишите ответ в JSON-формате.\n\n"
    "Входной текст:\n\n```text\n" + input_text + "\n```\n"
)
text = tokenizer.apply_chat_template(
    few_shot_messages + [{"role": "user", "content": final_user_content}],
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
response = json.loads(
    tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
)
print(json.dumps(response, ensure_ascii=False, indent=4))

Expected output:

{
    "ORGANIZATION": [
        "биологический факультет МГУ им. М.В. Ломоносова",
        "МГУ им. М.В. Ломоносова"
    ],
    "PERSON": [
        "Станислав Владимирович Дробышевский"
    ],
    "LOCATION": [
        "российская"
    ]
}
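The `json.loads` call in the snippet above assumes the model emits bare JSON. Occasionally a model may wrap the object in a Markdown fence or add surrounding prose, so a small defensive extractor (a convenience helper of our own, not part of any library API) is worth having:

```python
import json
import re

FENCE = "`" * 3  # triple backtick, built programmatically to keep this snippet display-safe

def extract_json(response: str) -> dict:
    """Parse the first JSON object in a model response, tolerating
    Markdown code fences and surrounding prose."""
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, response, flags=re.DOTALL)
    if fenced:
        response = fenced.group(1)
    # Fall back to the outermost {...} span.
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in response")
    return json.loads(response[start:end + 1])

demo = "Вот ответ:\n" + FENCE + 'json\n{"PERSON": ["Иван"]}\n' + FENCE
print(extract_json(demo))  # → {'PERSON': ['Иван']}
```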

Using vLLM for high-throughput serving

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "bond005/meno-lite-0.1"

tok = AutoTokenizer.from_pretrained(model_name)
llm = LLM(
    model=model_name,
    dtype="bfloat16",
    max_model_len=32768,
    gpu_memory_utilization=0.85
)
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

messages = [
    {
        "role": "system",
        "content": "Вы — Менон, разработанный в Новосибирском государственном университете. Вы — полезный помощник."
    },
    {
        "role": "user",
        "content": "Привет! Расскажи о себе."
    }
]
input_text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([input_text], sampling_params)
print(outputs[0].outputs[0].text)

As a result, you will see text similar to the following:

Привет! Меня зовут Менон, и я — виртуальный помощник, созданный в Новосибирском государственном университете. Я здесь, чтобы помочь вам с различными вопросами и задачами.

Additional Examples of Usage for NER and Relation Extraction

Important Note on Few-Shot Prompting

Using few-shot prompting (in-context learning) significantly improves Meno-Lite-0.1's performance on NER and relation extraction tasks and is therefore strongly recommended. The examples below demonstrate this approach.

The set of entity and relation classes is not limited to those shown in the examples — you can define any classes relevant to your domain. For more detailed examples covering diverse entity and relation types, see the NEREL-bench dataset card.
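As a convenience, the few-shot pattern shown earlier can be generated from an arbitrary class list. The helper below is only an illustrative sketch around the same prompt format; the DISEASE/DRUG classes and the demo annotation are hypothetical placeholders:

```python
import json

FENCE = "`" * 3  # triple backtick for the embedded ```text block

# Instruction template mirroring the few-shot NER example above.
INSTRUCTION = (
    "Выделите именованные сущности классов {classes} из входного текста "
    "и запишите ответ в JSON-формате.\n\n"
    "Входной текст:\n\n" + FENCE + "text\n{text}\n" + FENCE + "\n"
)

def ner_messages(entity_classes, demos, query_text):
    """Build few-shot chat messages for arbitrary entity classes.

    `demos` is a list of (text, {class: [entities]}) pairs.
    """
    classes = ", ".join(entity_classes)
    messages = []
    for text, labels in demos:
        messages.append({"role": "user",
                         "content": INSTRUCTION.format(classes=classes, text=text)})
        messages.append({"role": "assistant",
                         "content": json.dumps(labels, ensure_ascii=False)})
    messages.append({"role": "user",
                     "content": INSTRUCTION.format(classes=classes, text=query_text)})
    return messages

msgs = ner_messages(
    ["DISEASE", "DRUG"],  # hypothetical domain classes
    [("Аспирин назначают при гриппе.",
      {"DISEASE": ["грипп"], "DRUG": ["Аспирин"]})],
    "Парацетамол снижает температуру при ОРВИ."
)
print(len(msgs))  # → 3
```

The returned list can be passed to `tokenizer.apply_chat_template(...)` exactly as in the NER example above.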

Named Entity Recognition

https://colab.research.google.com/drive/1onh4ovG7iGEjr3SX_IeZ-ZmdO6VpOWBR?usp=sharing

Relation Extraction

https://colab.research.google.com/drive/1P5qQyjrv811jAKGqVZ4oSVO5M5WHsSzv?usp=sharing

📚 Training Details

Continued Pretraining (CPT) data

1.3B tokens

| Source | Language | Description |
|---|---|---|
| FineWeb-Edu (randomly sampled) | EN | High-quality educational web text |
| RuLM (quality-filtered subset) | RU | Russian web text selected for maximal FineWeb-Edu similarity using gte-multilingual-base embeddings |
| RU FinePDFs-edu | RU | Educational PDF documents in Russian |
| RuREBus (Dialogue'20) | RU | Unlabeled text corpus from the RuREBus shared task |

Supervised Fine-Tuning (SFT) data

50M tokens

| Source | Language | Description |
|---|---|---|
| NEREL-instruct | RU | Named entity recognition corpus converted to instruction format, plus LLM-generated and validated synthetic entity normalizations and definitions |
| LightRAG query logs | RU | GPT-4o-generated queries over Habr articles and the NSU website |
| MultiHopRAG | EN | Multi-hop question answering training dialogs |
| MTRAGEval | EN | Multi-turn RAG evaluation training dialogs |
| Additional custom instructions | RU | Manually created samples for self-cognition and alignment |

Training Procedure

Stage 1 — Continued Pretraining (CPT): The model was further pretrained on a balanced mix of Russian and English educational, legal, and scientific-technical texts. The Russian subset was specifically selected to match the quality distribution of FineWeb-Edu, ensuring that the model absorbs high-quality linguistic patterns rather than noisy web crawls.
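The quality-matching step can be pictured as centroid-similarity filtering in embedding space. The sketch below uses random vectors in place of gte-multilingual-base embeddings, and the keep-top-fraction strategy is our assumption rather than the documented procedure:

```python
import numpy as np

def select_by_centroid_similarity(candidate_embs, reference_embs, keep_fraction=0.5):
    """Keep the candidate documents whose embeddings are closest (by cosine
    similarity) to the centroid of a high-quality reference corpus."""
    centroid = reference_embs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    normed = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = normed @ centroid
    n_keep = max(1, int(keep_fraction * len(sims)))
    return np.argsort(sims)[::-1][:n_keep]  # indices of the most similar docs

# Toy demo with random vectors standing in for real document embeddings.
rng = np.random.default_rng(0)
reference = rng.normal(size=(100, 8)) + 1.0   # "FineWeb-Edu-like" cluster
candidates = rng.normal(size=(200, 8))
kept = select_by_centroid_similarity(candidates, reference, keep_fraction=0.25)
print(len(kept))  # → 50
```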

Stage 2 — Supervised Fine-Tuning (SFT): The SFT stage used a custom instruction set designed to reinforce extraction, normalization, summarization, and multi-hop QA capabilities. The critical distinction from conventional SFT is that our instructions teach the model to use context rather than to recall facts.

⚠️ Bias, Risks, and Limitations

  • Hallucination risk: Like all autoregressive LLMs, Meno-Lite-0.1 can generate plausible-sounding but factually incorrect text, especially when relevant context is not provided in the prompt.
  • World knowledge gaps: The model deliberately trades factual recall capacity for context-grounded skills. It should not be used as a standalone knowledge base.
  • Language coverage: While the model retains good English capabilities, it has been primarily validated on Russian and English. Performance on other languages supported by the Qwen2.5 backbone is untested.
  • Training data biases: The model inherits biases present in its pretraining corpora (FineWeb-Edu, RuLM, Habr) and in the GPT-4o/GPT-4o-mini generations used for synthetic SFT data.
  • Context window: Although the model handles contexts up to 128K tokens in passkey tasks, complex reasoning performance degrades at very long contexts (>32K), consistent with other models in this size class.

🎯 Recommendations

Best suited for:

  • RAG pipelines — document QA, retrieval-augmented generation
  • Information extraction — named entity recognition, relation extraction
  • Knowledge graph construction — GraphRAG, automated KB building
  • Document processing — summarization, analysis of legal/technical documents
  • Structured data extraction — converting unstructured text to structured formats

Not recommended for:

  • General-purpose conversational AI without context grounding
  • Tasks requiring extensive world knowledge not provided in context
  • Complex mathematical reasoning
  • Production code generation

📖 Citation

If you use Meno-Lite-0.1 in your research, please cite:

BibTeX:

@misc{bondarenko2026menolite,
  title={Meno-Lite-0.1: A 7B Language Model Optimized for Russian RAG Pipelines},
  author={Ivan Bondarenko},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/bond005/meno-lite-0.1}
}

📜 License

This model is released under the Apache 2.0 license.

🙏 Acknowledgments

This model was developed at Novosibirsk State University. Special thanks to:

  • The Qwen team for the base Qwen2.5 architecture
  • The T-Tech team for T-lite-it-1.0
  • Mikhail Tikhomirov and his colleagues for RuadaptQwen2.5-7B-Lite-Beta
  • Natalia Loukachevitch and her colleagues for NEREL
  • The creators of MERA, LIBRA, and LLM Tool Calling benchmarks