RajveeSheth committed on
Commit e1c50fb · verified
1 Parent(s): 3761c75

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -45,8 +45,7 @@ Output: [{'मीराबाई': 'PROPN'}, {'चानू': 'PROPN'}, {'ने
  ```
  ## Training Details
  ### Training Data
- [COMI-LINGUA Dataset Card](https://huggingface.co/datasets/LingoIITGN/COMI-LINGUA): 125K+ instances (POS: 24,598 filtered usable after refinement). Sources: NDTV/ABP News, X/YouTube, politics (INC/BJP). 3× expert bilingual annotators (Fleiss’ Kappa = 0.817). Initial predictions from CodeSwitch NLP library; ~15% tokens corrected (63,002 / 427,941). Splits: Train (~19.6K), Test (5K). CMI ≈21.60 avg (higher mixing than other tasks). CC-BY-4.0.
-
  ### Training Procedure
  #### Preprocessing
  Tokenized with base tokenizer; instruction templates + few-shot examples. Filtered: ≥5 tokens, no hate/non-Hinglish, focused on code-mixed content.
 
  ```
  ## Training Details
  ### Training Data
+ [COMI-LINGUA Dataset Card](https://huggingface.co/datasets/LingoIITGN/COMI-LINGUA).
  ### Training Procedure
  #### Preprocessing
  Tokenized with base tokenizer; instruction templates + few-shot examples. Filtered: ≥5 tokens, no hate/non-Hinglish, focused on code-mixed content.
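
The "≥5 tokens" filter in the Preprocessing line can be illustrated with a minimal sketch. It assumes whitespace tokenization, which the README does not specify; the function name `keep_instance` and the sample sentences are hypothetical, and the hate-speech and non-Hinglish filters are not covered here.

```python
def keep_instance(text: str, min_tokens: int = 5) -> bool:
    """Return True when `text` has at least `min_tokens` whitespace-separated tokens.

    Assumption: tokens are split on whitespace; the actual pipeline may
    count tokens differently.
    """
    return len(text.split()) >= min_tokens


# Hypothetical samples, not taken from the COMI-LINGUA dataset.
samples = [
    "मीराबाई चानू ने silver medal जीता",  # 6 tokens, kept
    "very nice",                          # 2 tokens, dropped
]

filtered = [s for s in samples if keep_instance(s)]
print(filtered)
```

Keeping the threshold as a parameter makes it easy to check how many instances survive under a different minimum length.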