---
license: mit
language:
- en
metrics:
- accuracy
- f1
base_model:
- bilalzafar/CentralBank-BERT
pipeline_tag: text-classification
library_name: transformers
tags:
- finance
- cbdc
- central-bank
- financial-nlp
- economic-policy
- monetary-policy
- sentence-classification
- text-classification
- transformers
- bert
- discourse-analysis
- policy-analysis
- centralbank-bert
- bis-speeches
---

# CBDC-Discourse

`CBDC-Discourse` is a **BERT-based sentence classifier** fine-tuned to categorize central bank digital currency (CBDC) discourse into three conceptually distinct classes: **Feature, Risk-Benefit, and Process**. The model enables structured analysis of CBDC-related policy and research texts by separating **design attributes**, **evaluative outcomes**, and **procedural activities**.

| Class | Description |
| ---------------- | ----------- |
| **Feature** | A sentence that specifies a **concrete design element or operational mechanism** of a CBDC. Examples include: wallet/card modality; programmability/smart contracts; privacy model; interoperability requirements; legal tender status; distribution via intermediaries; holding limits/caps; interest-bearing/remuneration (incl. negative rates); rulebook/scheme rules; settlement architecture (DLT/RPS/RTGS links). |
| **Risk-Benefit** | A sentence that asserts or implies **outcomes, effects, or trade-offs** (positive or negative) of a CBDC feature or of CBDC introduction, including policy/equilibrium impacts. Examples include: faster/cheaper/more transparent cross-border payments; financial inclusion; regional cooperation; competition/innovation; sovereignty/autonomy; efficiency/productivity gains. The class also covers negative concerns such as bank disintermediation; cyber/operational risk; crisis flight from deposits; privacy harms; monetary/fiscal dominance concerns; “too successful” crowd-out; legal/regulatory fragility. |
| **Process** | A sentence about **research, consultations, pilots, governance, timeline, or agenda-setting**, without specifying a concrete feature or claiming effects/trade-offs. Examples include: public consultations; surveys/focus groups; task forces; phases (investigation/preparation/pilot); rulebook drafting as an activity (absent specifics); reports/citations; statements of interest/attention; open questions; goal/timeline setting (e.g., “medium-term goal”). |

## Base Model

This classifier is built on top of [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT), a **domain-adapted BERT model** pretrained on over **2 million sentences (\~66M tokens)** from **BIS central bank speeches (1996–2024)**. CentralBank-BERT provides contextual representations tuned to **monetary policy, financial regulation, and central banking discourse**, which makes it a well-suited foundation for downstream CBDC-related text classification.

## Dataset

The model was fine-tuned on a **manually annotated dataset of CBDC-related sentences** extracted from **Bank for International Settlements (BIS) central bank speeches (1996–2024)**.
The dataset is balanced across the three discourse classes, with a total of **2,886 sentences (962 per class)**.

## Intended Use

This model is designed for the **automatic classification of CBDC discourse** in policy, research, and financial communications. It lets researchers, analysts, and practitioners determine whether a sentence describes **procedural aspects**, **design features**, or **evaluative outcomes** of central bank digital currencies. Such categorization supports **policy analysis, thematic mapping of central bank communication, and structured NLP-based research** in **finance, monetary economics, and economic policy**.

## Training Details

* Tokenization: WordPiece (CentralBank-BERT tokenizer)
* Maximum sequence length: 256 tokens
* Dynamic padding (`DataCollatorWithPadding`)
* Train/Val/Test split: 80/10/10, stratified by label

| Parameter | Value |
| ----------------------------- | --------------------------- |
| Base model | [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| Epochs | 6 |
| Train batch size (per device) | 8 |
| Eval batch size (per device) | 16 |
| Gradient accumulation steps | 2 |
| Effective batch size | 16 (8 × 2) |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.06 |
| Scheduler | Cosine |
| Mixed precision (fp16) | Enabled |

* Environment: Google Colab
* GPU: NVIDIA Tesla T4 (16 GB)
* Framework: PyTorch 2.8.0 + Hugging Face Transformers

## Evaluation Results

Overall results:

| Split | Accuracy | Macro-F1 | Weighted-F1 |
| ---------- | --------- | --------- | ----------- |
| Validation | **0.851** | **0.839** | **0.852** |
| Test | **0.823** | **0.803** | **0.825** |

Per-class results on the test split:

| Class | Precision | Recall | F1 |
| ---------------- | --------- | ------ | ----- |
| **Feature** | 0.759 | 0.782 | 0.770 |
| **Process** | 0.927 | 0.845 | 0.884 |
| **Risk-Benefit** | 0.700 | 0.817 | 0.754 |

---

## Other CBDC Models

This model is part of the **CentralBank-BERT / CBDC model family**, a
suite of domain-adapted classifiers for analyzing central-bank communication.

| **Model** | **Purpose** | **Intended Use** | **Link** |
| ------------------------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------- | ---------------------------------------------------------------------- |
| **bilalzafar/CentralBank-BERT** | Domain-adaptive masked LM trained on BIS speeches (1996–2024). | Base encoder for CBDC downstream tasks; fill-mask tasks. | [CentralBank-BERT](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| **bilalzafar/CBDC-BERT** | Binary classifier: CBDC vs. non-CBDC. | Flagging CBDC-related discourse in large corpora. | [CBDC-BERT](https://huggingface.co/bilalzafar/CBDC-BERT) |
| **bilalzafar/CBDC-Stance** | 3-class stance model (Pro, Wait-and-See, Anti). | Research on policy stances and discourse monitoring. | [CBDC-Stance](https://huggingface.co/bilalzafar/CBDC-Stance) |
| **bilalzafar/CBDC-Sentiment** | 3-class sentiment model (Positive, Neutral, Negative). | Tone analysis in central bank communications. | [CBDC-Sentiment](https://huggingface.co/bilalzafar/CBDC-Sentiment) |
| **bilalzafar/CBDC-Type** | Classifies Retail, Wholesale, and General CBDC mentions. | Distinguishing policy focus (retail vs. wholesale). | [CBDC-Type](https://huggingface.co/bilalzafar/CBDC-Type) |
| **bilalzafar/CBDC-Discourse** | 3-class discourse classifier (Feature, Process, Risk-Benefit). | Structured categorization of CBDC communications. | [CBDC-Discourse](https://huggingface.co/bilalzafar/CBDC-Discourse) |
| **bilalzafar/CentralBank-NER** | Named Entity Recognition (NER) model for central banking discourse. | Identifying institutions, persons, and policy entities in speeches. | [CentralBank-NER](https://huggingface.co/bilalzafar/CentralBank-NER) |

## Repository and Replication Package

All **training pipelines, preprocessing scripts, evaluation notebooks, and result outputs** are available in the companion GitHub repository:

🔗 **[https://github.com/bilalezafar/CentralBank-BERT](https://github.com/bilalezafar/CentralBank-BERT)**

---

## How to Use

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="bilalzafar/CBDC-Discourse")

# Example sentences (expected class in the trailing comment)
sentences = [
    "The central bank launched a pilot project for CBDC cross-border settlement.",  # Process
    "Programmability in CBDC allows conditional payments.",  # Feature
    "CBDC may increase risks of bank disintermediation.",  # Risk-Benefit
]

# Predict: by default the pipeline returns the top label and its score per input
for s in sentences:
    result = classifier(s)[0]
    print(f"{s}\n → {result['label']} (score={result['score']:.4f})\n")

# Example output:
# The central bank launched a pilot project for CBDC cross-border settlement.
#  → Process (score=0.9989)
#
# Programmability in CBDC allows conditional payments.
#  → Feature (score=0.9991)
#
# CBDC may increase risks of bank disintermediation.
#  → Risk-Benefit (score=0.9986)
```

---

## Citation

If you use this model, please cite as:

**Zafar, M. B. (2026). CentralBank-BERT: Machine learning evidence on central bank digital currency discourse. *Journal of Economics and Business.* [https://doi.org/10.1016/j.jeconbus.2026.106300](https://doi.org/10.1016/j.jeconbus.2026.106300)**

```bibtex
@article{zafar2025centralbankbert,
  title={CentralBank-BERT: Machine learning evidence on central bank digital currency discourse},
  author={Zafar, Muhammad Bilal},
  year={2026},
  journal={Journal of Economics and Business},
  url={https://doi.org/10.1016/j.jeconbus.2026.106300}
}
```
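---

## Computing the Evaluation Metrics

The Macro-F1 and Weighted-F1 figures under Evaluation Results follow the standard definitions: macro-F1 is the unweighted mean of the per-class F1 scores, while weighted-F1 weights each class's F1 by its number of gold sentences (support). The card does not say which library produced the reported numbers, so the block below is only a dependency-free sketch of these definitions for checking your own predictions (for example, labels collected with the `pipeline` call under How to Use). The helper name `classification_metrics` and the toy labels are hypothetical, and the class names are assumed to match the model's output labels.

```python
from collections import Counter

# Class names as listed in this card (assumed to match the model's labels).
LABELS = ("Feature", "Process", "Risk-Benefit")


def classification_metrics(y_true, y_pred, labels=LABELS):
    """Compute accuracy, per-class precision/recall/F1, macro-F1 and weighted-F1.

    Hypothetical helper: a plain-Python sketch of the standard definitions,
    not the code used to produce the reported results.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    support = Counter(y_true)  # number of gold sentences per class

    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall,
                            "f1": f1, "support": support[label]}

    # Macro-F1: every class counts equally, regardless of its size.
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)
    # Weighted-F1: each class's F1 is weighted by its support.
    total = sum(m["support"] for m in per_class.values())
    weighted_f1 = (sum(m["f1"] * m["support"] for m in per_class.values()) / total
                   if total else 0.0)
    return {"accuracy": accuracy, "macro_f1": macro_f1,
            "weighted_f1": weighted_f1, "per_class": per_class}


# Toy example with made-up labels (not the model's test set).
gold = ["Feature", "Feature", "Process", "Risk-Benefit"]
pred = ["Feature", "Risk-Benefit", "Process", "Risk-Benefit"]
metrics = classification_metrics(gold, pred)
print(f"accuracy={metrics['accuracy']:.3f} "
      f"macro_f1={metrics['macro_f1']:.3f} "
      f"weighted_f1={metrics['weighted_f1']:.3f}")
# → accuracy=0.750 macro_f1=0.778 weighted_f1=0.750
```

On a perfectly balanced split, macro-F1 and weighted-F1 coincide; they diverge when class supports differ, which is why both are reported. On the same inputs, these values should agree with scikit-learn's `f1_score` using `average="macro"` and `average="weighted"`.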