🐦 Tweet Tone Classifier

A fine-tuned DistilBERT model for binary sentiment classification of tweets: it predicts whether a tweet is Positive or Negative.

Part of a larger project that also rewrites tweets in different tones (formal, casual, empathetic, assertive) using the Gemini API.


πŸ“Š Model Details

| Property | Details |
|---|---|
| Base model | distilbert-base-uncased |
| Task | Binary sentiment classification |
| Dataset | Sentiment140 (50,000 samples) |
| Training epochs | 3 |
| Batch size | 32 |
| Max token length | 64 |
| Accuracy | ~87% |
| Parameters | ~67M (F32 safetensors) |
| Language | English |

πŸš€ Quick Start

Installation

pip install transformers torch

Using the pipeline (easiest)

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="KinSlay3rs/tweet-tone-classifier")

result = classifier("I can't believe my flight got cancelled again!!")
print(result)
# [{'label': 'NEGATIVE', 'score': 0.97}]

result = classifier("Just got promoted!! Best day ever 🎉")
print(result)
# [{'label': 'POSITIVE', 'score': 0.98}]
```

Using the model directly

```python
import torch
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

tokenizer = DistilBertTokenizerFast.from_pretrained("KinSlay3rs/tweet-tone-classifier")
model = DistilBertForSequenceClassification.from_pretrained("KinSlay3rs/tweet-tone-classifier")
model.eval()

LABELS = {0: "NEGATIVE", 1: "POSITIVE"}

def predict(tweet: str) -> str:
    inputs = tokenizer(tweet, return_tensors="pt", truncation=True, max_length=64)
    with torch.no_grad():
        logits = model(**inputs).logits
    label = LABELS[logits.argmax().item()]
    score = torch.softmax(logits, dim=1).max().item()
    return f"{label} (confidence: {score:.2f})"

print(predict("This is the worst experience I've ever had."))
# NEGATIVE (confidence: 0.96)

print(predict("Absolutely loving the new update!"))
# POSITIVE (confidence: 0.94)
```
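The confidence score reported by `predict` is simply the softmax over the two class logits. A minimal stdlib sketch of that computation, using made-up logit values purely for illustration:

```python
import math

def softmax_confidence(logits):
    # Numerically stable two-class softmax
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index 1 is POSITIVE, matching the LABELS mapping
    label = "POSITIVE" if probs[1] > probs[0] else "NEGATIVE"
    return label, max(probs)

label, score = softmax_confidence([-1.2, 2.3])  # hypothetical logits
```

This mirrors what `torch.softmax(logits, dim=1).max()` computes inside `predict`.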


πŸ“ Dataset

Trained on a 50,000-sample subset of the Sentiment140 dataset, which contains 1.6 million tweets labelled as positive or negative.

Preprocessing applied:

  • Removed URLs (http://...)
  • Removed Twitter handles (@username)
  • Removed special characters
  • Truncated to 64 tokens
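The cleaning steps above can be sketched as a small function. This is a sketch of the described pipeline; the exact regexes used during training are an assumption:

```python
import re

def clean_tweet(text: str) -> str:
    # Remove URLs (http://... and https://...)
    text = re.sub(r"https?://\S+", "", text)
    # Remove Twitter handles (@username)
    text = re.sub(r"@\w+", "", text)
    # Remove special characters, keeping letters, digits, and whitespace
    text = re.sub(r"[^A-Za-z0-9\s]", "", text)
    # Collapse repeated whitespace left behind by the removals
    return re.sub(r"\s+", " ", text).strip()
```

Truncation to 64 tokens is handled later by the tokenizer (`max_length=64`), not by this string-level cleanup.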

πŸ‹οΈ Training Details

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)
```

Hardware used: Intel i5 7th Gen CPU / Kaggle T4 GPU
Training time: ~25 minutes on GPU


⚠️ Limitations

  • Trained only on English tweets; may not generalize to other languages
  • Sarcasm and irony are often misclassified (a known challenge in sentiment analysis)
  • Trained on tweets from 2009; modern slang and emojis may reduce accuracy
  • Binary classification only; does not detect neutral sentiment
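One common workaround for the missing neutral class (not part of this model, just a hedged post-processing sketch) is to treat low-confidence binary predictions as neutral:

```python
def with_neutral_band(label: str, score: float, threshold: float = 0.75) -> str:
    # Below the confidence threshold, fall back to NEUTRAL;
    # the 0.75 cutoff is an arbitrary illustrative choice.
    return label if score >= threshold else "NEUTRAL"
```

This is only a heuristic; a properly trained 3-class model (see Future Work) would be more reliable.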

πŸ”­ Future Work

  • Add neutral class (3-class classification)
  • Train on more recent tweet data
  • Add emoji-aware preprocessing
  • Multilingual support using xlm-roberta-base

πŸ“¦ Full Project

This model is part of the Tweet Tone Classifier & Rewriter project, which includes:

  • βœ… Sentiment classification (this model)
  • βœ… Tone rewriting using Gemini API (formal / casual / empathetic / assertive)
  • βœ… Gradio web interface
  • βœ… Deployed on Hugging Face Spaces

πŸ”— GitHub: github.com/KinSlay3rS/GenAI-Projects/Sentement-Analysis-DistilBERT
πŸ”— Live Demo: huggingface.co/spaces/KinSlay3rs/tweet-tone-classifier


πŸ™‹ Author

Made by KinSlay3rs
πŸ”— Hugging Face Profile
