---
datasets:
- cardiffnlp/tweet_topic_multi
metrics:
- f1
- accuracy
model-index:
- name: cardiffnlp/roberta-base-tweet-topic-multi-2020
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: cardiffnlp/tweet_topic_multi
      type: cardiffnlp/tweet_topic_multi
      args: cardiffnlp/tweet_topic_multi
      split: test_2021
    metrics:
    - name: F1
      type: f1
      value: 0.7252289758534556
    - name: F1 (macro)
      type: f1_macro
      value: 0.5612608131902519
    - name: Accuracy
      type: accuracy
      value: 0.4991066110780226
pipeline_tag: text-classification
widget:
- text: "I'm sure the {@Tampa Bay Lightning@} would’ve rather faced the Flyers but man does their experience versus the Blue Jackets this year and last help them a lot versus this Islanders team. Another meat grinder upcoming for the good guys"
  example_title: "Example 1"
- text: "Love to take night time bike rides at the jersey shore. Seaside Heights boardwalk. Beautiful weather. Wishing everyone a safe Labor Day weekend in the US."
  example_title: "Example 2"
---
# cardiffnlp/roberta-base-tweet-topic-multi-2020


This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the [tweet_topic_multi](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi) dataset. It is fine-tuned on the `train_2020` split and validated on the `test_2021` split of tweet_topic.
The fine-tuning script can be found [here](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi/blob/main/lm_finetuning.py). The model achieves the following results on the `test_2021` set:
|
|
- F1 (micro): 0.7252289758534556
- F1 (macro): 0.5612608131902519
- Accuracy: 0.4991066110780226
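The accuracy reported here is consistent with exact-match (subset) accuracy over the full label set, which is why it sits well below the micro F1. As a minimal sketch, scores of this kind can be computed with scikit-learn, assuming gold and predicted labels are binary indicator arrays obtained by thresholding the sigmoid outputs at 0.5 (the arrays below are purely illustrative, not taken from the dataset):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# illustrative indicator arrays of shape (num_examples, num_labels)
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0]])

print(f1_score(y_true, y_pred, average="micro"))  # F1 (micro)
print(f1_score(y_true, y_pred, average="macro"))  # F1 (macro)
print(accuracy_score(y_true, y_pred))             # exact-match (subset) accuracy
```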
|
|
|
|
### Usage
|
|
```python
import math
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/roberta-base-tweet-topic-multi-2020")
model = AutoModelForSequenceClassification.from_pretrained("cardiffnlp/roberta-base-tweet-topic-multi-2020", problem_type="multi_label_classification")
model.eval()
class_mapping = model.config.id2label

with torch.no_grad():
    text = "#NewVideo Cray Dollas- Water- Ft. Charlie Rose- (Official Music Video)- {{URL}} via {@YouTube@} #watchandlearn {{USERNAME}}"
    tokens = tokenizer(text, return_tensors='pt')
    output = model(**tokens)
    # turn each logit into a probability and keep labels above the 0.5 threshold
    flags = [sigmoid(s) > 0.5 for s in output[0][0].detach().tolist()]
    topic = [class_mapping[n] for n, i in enumerate(flags) if i]
print(topic)
```
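For several tweets at once, the same thresholding can be vectorized with `torch.sigmoid` over the logits tensor. The sketch below assumes this approach; the example texts and the 0.5 threshold are only illustrative:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cardiffnlp/roberta-base-tweet-topic-multi-2020"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, problem_type="multi_label_classification")
model.eval()

texts = [
    "Love to take night time bike rides at the jersey shore.",
    "Another meat grinder upcoming for the good guys",
]
with torch.no_grad():
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    probs = torch.sigmoid(model(**batch).logits)  # shape: (batch_size, num_labels)
for row in probs > 0.5:
    print([model.config.id2label[i] for i, flag in enumerate(row.tolist()) if flag])
```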
|
|
### Reference
|
|
```
@inproceedings{dimosthenis-etal-2022-twitter,
    title = "{T}witter {T}opic {C}lassification",
    author = "Antypas, Dimosthenis and
      Ushio, Asahi and
      Camacho-Collados, Jose and
      Neves, Leonardo and
      Silva, Vitor and
      Barbieri, Francesco",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics"
}
```
|
|