---
license: mit
language:
- en
library_name: spacy
tags:
- named-entity-recognition
- b2b
- ecommerce
- order-processing
- product-extraction
- spacy
datasets:
- custom
metrics:
- f1
- precision
- recall
model-index:
- name: b2b-ecommerce-ner
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      type: custom
      name: B2B Ecommerce Orders
    metrics:
    - type: f1
      value: 0.82
      name: F1 Score
    - type: precision
      value: 0.82
      name: Precision
    - type: recall
      value: 0.81
      name: Recall
---

# B2B Ecommerce NER Model

## Model Description

This Named Entity Recognition (NER) model is trained for B2B ecommerce order processing. It extracts structured information (products, quantities, sizes, and units) from retailer-to-manufacturer order text so that orders can be captured and processed automatically.

## Supported Entities

The model identifies the following entity types:

- **PRODUCT**: Product names and descriptions (e.g., "Coca Cola", "Golden Dates", "Chocolate Cleanser")
- **QUANTITY**: Order quantities (e.g., "5", "10", "twenty")
- **SIZE**: Product sizes and measurements (e.g., "500ML", "250G", "1.25L")
- **UNIT**: Units of measurement (e.g., "units", "bottles", "packs")

## Features

- **High Accuracy**: Achieves an F1 score of 0.82 on B2B ecommerce order data
- **Product Catalog Matching**: Includes fuzzy matching against a product catalog of 1,855+ products
- **Mixed-language Input**: Accepts the mixed English/Hindi text common in Indian B2B commerce (experimental; see Limitations)
- **Real-world Patterns**: Trained on actual retailer order patterns and variations

## Usage

### Basic Usage

```python
from huggingface_model.model import B2BEcommerceNER

# Load the model
model = B2BEcommerceNER.from_pretrained("path/to/model")

# Extract entities from order text
results = model.predict(["Order 5 bottles of Coca Cola 650ML"])

print(results[0])
# Output: {
#   'text': 'Order 5 bottles of Coca Cola 650ML',
#   'entities': {
#     'products': [{'text': 'Coca Cola', 'label': 'PRODUCT', 'start': 19, 'end': 28}],
#     'quantities': [{'text': '5', 'label': 'QUANTITY', 'start': 6, 'end': 7}],
#     'sizes': [{'text': '650ML', 'label': 'SIZE', 'start': 29, 'end': 34}],
#     'units': [{'text': 'bottles', 'label': 'UNIT', 'start': 8, 'end': 15}],
#     'catalog_matches': [...]
#   }
# }
```
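
Downstream code usually wants one order line per request rather than four separate lists. The following is a minimal sketch, assuming the result dictionary has exactly the shape shown in the output comment above; `check_offsets` and `to_order_line` are hypothetical helpers and not part of the model's API.

```python
# Hypothetical helpers for consuming one prediction result. They assume the
# result layout shown in the Basic Usage output and are not part of the model API.

ENTITY_KEYS = ("products", "quantities", "sizes", "units")


def check_offsets(result):
    """Return True if every entity's start/end offsets slice out its own text."""
    text = result["text"]
    return all(
        text[ent["start"]:ent["end"]] == ent["text"]
        for key in ENTITY_KEYS
        for ent in result["entities"].get(key, [])
    )


def to_order_line(result):
    """Flatten a result into one order line, taking the first entity of each type."""
    def first(key):
        found = result["entities"].get(key, [])
        return found[0]["text"] if found else None

    return {
        "product": first("products"),
        "quantity": first("quantities"),
        "size": first("sizes"),
        "unit": first("units"),
    }


# Sample result copied from the Basic Usage output (catalog matches omitted).
sample = {
    "text": "Order 5 bottles of Coca Cola 650ML",
    "entities": {
        "products": [{"text": "Coca Cola", "label": "PRODUCT", "start": 19, "end": 28}],
        "quantities": [{"text": "5", "label": "QUANTITY", "start": 6, "end": 7}],
        "sizes": [{"text": "650ML", "label": "SIZE", "start": 29, "end": 34}],
        "units": [{"text": "bottles", "label": "UNIT", "start": 8, "end": 15}],
    },
}

print(check_offsets(sample))
print(to_order_line(sample))
```

Taking the first entity of each type only suits single-product orders; a message that lists several products would need its entities grouped by position instead.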

### Pipeline Usage

```python
from huggingface_model.model import pipeline

# Create NER pipeline
ner_pipeline = pipeline("ner", model="b2b-ecommerce-ner")

# Process text
entities = ner_pipeline("I need 10 packs of biscuits")
```

### Batch Processing

```python
# Process multiple orders at once (reuses the `model` loaded under Basic Usage)
orders = [
    "Order 5 Coke Zero 650ML",
    "Send 12 bottles of mango juice",
    "I need 3 units of Chocolate Cleanser 500ML",
]

# One result dictionary is returned per input string
results = model.predict(orders)
```

## Training Data

The model was trained on a dataset of 500 B2B ecommerce orders containing:

- Real retailer-to-manufacturer communications
- Mixed English/Hindi text patterns
- Various product categories (beverages, food items, personal care)
- Different order formats and structures
- 1,002 labeled entities across 4 entity types
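
spaCy NER training data labels each entity by character offsets into the text. The annotation files themselves are not published in this card, so the sketch below only illustrates the format: `annotate` is a hypothetical helper, and the example order is taken from the Usage section rather than from the training set.

```python
def annotate(text, labeled_spans):
    """Build a spaCy-style (text, {"entities": [(start, end, label)]}) pair.

    `labeled_spans` lists (substring, label) pairs in order of appearance;
    each substring is searched for after the end of the previous match.
    """
    entities = []
    cursor = 0
    for substring, label in labeled_spans:
        start = text.find(substring, cursor)
        if start == -1:
            raise ValueError(f"{substring!r} not found after position {cursor}")
        end = start + len(substring)
        entities.append((start, end, label))
        cursor = end
    return text, {"entities": entities}


example = annotate(
    "Order 5 bottles of Coca Cola 650ML",
    [("5", "QUANTITY"), ("bottles", "UNIT"), ("Coca Cola", "PRODUCT"), ("650ML", "SIZE")],
)
print(example)
```

Searching from a moving cursor keeps repeated substrings (for example the "5" inside "650ML") from being matched twice.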

## Model Performance

| Metric | Score |
|--------|-------|
| F1 Score | 0.82 |
| Precision | 0.82 |
| Recall | 0.81 |

The model performs well across all four entity types, with the best results on PRODUCT and QUANTITY recognition.
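
F1 is the harmonic mean of precision and recall. As a quick consistency check, the sketch below recomputes it from the rounded values in the table; it assumes the table reports a single aggregate precision and recall, which this card does not state explicitly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values from the table above.
f1 = f1_score(0.82, 0.81)
print(f"{f1:.3f}")  # 0.815
```

The rounded inputs give roughly 0.815, so the reported 0.82 is plausible if the unrounded precision and recall sit slightly above the table values.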

## Product Catalog Integration

The model includes a fuzzy matching system that can match extracted products against a catalog of 1,855+ products, providing:

- **Brand Matching**: Match to specific brands (e.g., "Coca Cola", "Ziofit")
- **Product Variants**: Find different sizes/variants of the same product
- **Confidence Scores**: Numerical confidence for each match (0-100)
- **SKU Mapping**: Direct mapping to product SKUs for order processing
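
The matching code itself is not shown in this card. As a rough illustration of the idea, the sketch below scores an extracted product name against a tiny made-up catalog using the standard library's `difflib` in place of `fuzzywuzzy`. The catalog entries, SKUs, `match_product`, and the threshold of 80 are all assumptions for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical catalog; the real catalog file and its SKUs are not part of this card.
CATALOG = [
    {"sku": "SKU-001", "name": "Coca Cola"},
    {"sku": "SKU-002", "name": "Golden Dates"},
    {"sku": "SKU-003", "name": "Chocolate Cleanser"},
]


def similarity(a, b):
    """Case-insensitive similarity on a 0-100 scale."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def match_product(extracted, catalog=CATALOG, min_score=80):
    """Return the best catalog entry with its score, or None below min_score."""
    best = max(catalog, key=lambda item: similarity(extracted, item["name"]))
    score = similarity(extracted, best["name"])
    if score < min_score:
        return None
    return {"sku": best["sku"], "name": best["name"], "confidence": score}


print(match_product("coca cola"))    # exact match after lowercasing
print(match_product("Coka Cola"))    # misspelling still maps to the same SKU
print(match_product("mango juice"))  # nothing similar in this toy catalog
```

A minimum score keeps weak matches from being mapped to a SKU silently; the right threshold depends on the catalog and would need tuning on real orders.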

## Limitations

- Performance may vary on product names not seen during training
- Best results with English text; mixed-language (English/Hindi) support is experimental
- Requires a product catalog file for the fuzzy matching features
- Built on the spaCy framework rather than a transformer architecture

## Technical Details

- **Framework**: spaCy 3.8+
- **Base Model**: en_core_web_sm
- **Training**: Custom NER component trained for 50 iterations
- **Entity Labels**: 4 custom entity types
- **Input**: Raw text strings
- **Output**: Structured entity information with optional catalog matching

## Installation

```bash
pip install spacy pandas fuzzywuzzy python-levenshtein
python -m spacy download en_core_web_sm
```

## Citation

```bibtex
@misc{b2b_ecommerce_ner_2025,
  title={B2B Ecommerce NER Model for Order Processing},
  author={Your Name},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/your-username/b2b-ecommerce-ner}
}
```

## License

This model is released under the MIT License.