Introduction
Large language models (LLMs) have been a hot topic in machine learning lately, and many job openings now seek developers with experience fine-tuning them. As a 7th-semester student preparing to apply for jobs, I want to share my experience and insights into the fine-tuning process.
What will we talk about?
LLM
LLM stands for Large Language Model. It is a type of artificial intelligence trained on vast amounts of text data to understand and generate human-like language. LLMs, like GPT, can perform various tasks, such as answering questions, writing text, translating languages, and more. They work by predicting the next word in a sequence based on the context provided. For a deeper understanding, you can read this Medium post about LLMs by Andreas Stöffelbauer.
HuggingFace
Hugging Face is a GitHub-like platform for machine learning, providing an open-source ecosystem that fosters collaboration among researchers, developers, and data scientists. It hosts a wide array of pre-trained models, datasets, and tools, allowing users to easily access and share state-of-the-art models across various domains, including natural language processing (NLP), computer vision, and more.
DistilBERT
DistilBERT is a transformer-based model derived from BERT (Bidirectional Encoder Representations from Transformers) that has been "distilled" to be smaller and more efficient, while still retaining much of BERT's language-understanding capability. According to the documentation, DistilBERT can be fine-tuned for a variety of tasks, making it a versatile choice for many NLP applications.
Sentiment Analysis
Sentiment Analysis is a process where a large language model (LLM) analyzes text to determine the emotional tone or sentiment behind it. The goal is to classify text as positive, negative, or neutral. LLMs use patterns in the text to understand the emotions, opinions, or attitudes expressed, making them useful for tasks like social media monitoring, customer feedback analysis, and product reviews. To gain a better understanding of sentiment analysis with LLMs, you can check out this blog.
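To get a quick taste before we build our own, here is what sentiment analysis looks like with Hugging Face's pipeline API and its default pre-trained model. This is just an illustration; it is not the model we fine-tune below.

from transformers import pipeline

# Downloads a default pre-trained sentiment model on first run
classifier = pipeline("sentiment-analysis")

print(classifier("This product exceeded my expectations!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.9998}]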
CRISP-DM
In this blog post, we will walk through the steps of building a sentiment analysis model using DistilBERT, a popular transformer model. We will use the CRISP-DM framework to guide us through the process. CRISP-DM stands for Cross-Industry Standard Process for Data Mining, which provides a structured methodology for tackling data analysis tasks. We'll be applying this approach to a Kaggle dataset of Amazon product reviews to classify sentiment into "Good Review" and "Bad Review."
Business Understanding
The goal of this project is to classify Amazon product reviews into two categories: "Good Review" and "Bad Review". Sentiment analysis is a powerful tool for understanding customer feedback and improving user experiences. By automating sentiment classification, businesses can quickly analyze large volumes of reviews and make informed decisions.
Data Understanding
The dataset used in this project contains Amazon product reviews compressed into .bz2 files, which you can download here. We will need to extract and preprocess these reviews for use in our sentiment analysis model.
import bz2

def decompress_bz2(file_path, output_path):
    # Read the compressed file as text and write it back out uncompressed
    with bz2.open(file_path, 'rt', encoding='utf-8') as file:
        with open(output_path, 'w', encoding='utf-8') as out_file:
            out_file.write(file.read())

decompress_bz2('/kaggle/input/amazonreviews/train.ft.txt.bz2', 'train.ft.txt')
decompress_bz2('/kaggle/input/amazonreviews/test.ft.txt.bz2', 'test.ft.txt')
After decompression, we parse the data into a DataFrame and combine both the training and test datasets. The labels are mapped to binary values (0 for "Bad Review" and 1 for "Good Review"), and the title and text fields are concatenated into a single input_text field for better model accuracy.
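The parsing step itself isn't shown above, so here's a minimal sketch. It assumes the raw files use this dataset's fastText format, where each line starts with __label__1 or __label__2 and the review title is separated from the body by ": " (both assumptions about the raw layout).

import pandas as pd

def parse_ft_file(path):
    rows = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            # '__label__2 Great CD: My lovely ...' -> label token, then review
            label_token, _, review = line.partition(' ')
            label = int(label_token.replace('__label__', ''))
            title, _, text = review.partition(': ')
            rows.append({'label': label, 'title': title, 'text': text.strip()})
    return pd.DataFrame(rows)

# Combine the training and test sets into one DataFrame
df = pd.concat([parse_ft_file('train.ft.txt'), parse_ft_file('test.ft.txt')], ignore_index=True)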
# Here's how I map it: the raw labels are 1 (bad) and 2 (good)
df['label'] = df['label'].apply(lambda x: 0 if x == 1 else 1)
Data Preparation
Given the limited memory available in Kaggle's environment, we will only sample 5% of the dataset for training and evaluation.
from datasets import Dataset

# Combine title and text into a single input field
df['input_text'] = df['title'] + " " + df['text']
df_prep = df[['input_text', 'label']]

# Stratified 5% sample per label: keeps the class balance while fitting in memory
sampled_df = df_prep.groupby('label', group_keys=False).apply(lambda x: x.sample(frac=0.05, random_state=42))
dataset = Dataset.from_pandas(sampled_df)
Here's an example of the dataset:
| input_text | label |
|:--------------------------------------------------|:------|
| I HATE THIS CD.. And you should too...Avoid th... | 0 |
| Too cheap Bought 2 of these 75 to 300 ohm matc... | 0 |
| It had a good walkthrough but it lacked many s... | 0 |
| Didn't know what to think. I like Jim Carrey, ... | 0 |
| Valeo Olympic Spring Collars I should have lis... | 0 |
| ... | ... |
| Fun and magical read! What a magical book! My ... | 1 |
| Terrific This is a wonderful book!! I was givi... | 1 |
| fun toy I have had good luck with the batterie... | 1 |
| Info Love the Family Guy DVD's I enjoy watchin... | 1 |
| What happend? The only reason I gave this game... | 1 |
Next, we tokenize the text data using DistilBERT's tokenizer. This prepares the dataset for model training by converting the text into a format that the model can understand.
from transformers import DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')

# Pad/truncate every review to 128 tokens
def tokenize_function(examples):
    return tokenizer(examples['input_text'], padding='max_length', truncation=True, max_length=128)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
Don't forget to split your dataset!
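A minimal sketch using the train_test_split method from the datasets library; the 80/20 ratio and the fixed seed here are my choices, not from the original run.

# The result is indexed by 'train' and 'test',
# which is how it gets passed to the Trainer in the next section
train_test_split = tokenized_datasets.train_test_split(test_size=0.2, seed=42)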
Modeling
We use DistilBERT, a lighter version of the BERT transformer model, for sequence classification. The model is fine-tuned on the sentiment analysis task, and the training process is set up using the Trainer class from the Hugging Face library.
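The Trainer below is passed a compute_metrics callback that isn't shown in the post. Here's a minimal sketch that produces the accuracy and macro-averaged metrics reported in the Evaluation section, using scikit-learn (my choice of implementation, not necessarily the original):

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Macro averaging weights both classes equally
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='macro')
    return {
        'accuracy': accuracy_score(labels, preds),
        'precision_macro': precision,
        'recall_macro': recall,
        'f1_macro': f1,
    }

The Trainer automatically prefixes these keys with eval_ during evaluation, which matches the output shown below.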
from transformers import DistilBertForSequenceClassification, TrainingArguments, Trainer

model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2)

training_args = TrainingArguments(
    output_dir='/kaggle/working/results',
    num_train_epochs=5.0,
    ...
)

trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_test_split['train'],
    eval_dataset=train_test_split['test'],
)
trainer.train()
Evaluation
After training, we evaluate the model's performance on the test dataset using various metrics such as accuracy, precision, recall, and F1-score.
results = trainer.evaluate()
print(results)
"""
{
'eval_accuracy': 0.953925,
'eval_precision_macro': 0.9539209871607255,
'eval_recall_macro': 0.9539319939428168,
'eval_f1_macro': 0.9539242719746999,
'eval_loss': 0.15511418879032135,
'eval_runtime': 90.9442,
'eval_samples_per_second': 439.83,
'eval_steps_per_second': 6.872,
'epoch': 5.0
}
"""
The metrics show the overall effectiveness of the model. We save the model and tokenizer for future use, including deployment or further fine-tuning.
from transformers import AutoConfig, AutoModelForSequenceClassification

# Label mapping
label2id = {"Bad Review": 0, "Good Review": 1}
id2label = {0: "Bad Review", 1: "Good Review"}
model_ckpt = "kohendru/distilbert-amazon-sentiment-model"

# Define config with human-readable label names
config = AutoConfig.from_pretrained(model_ckpt, label2id=label2id, id2label=id2label)

# Load model with config
model = AutoModelForSequenceClassification.from_pretrained(model_ckpt, config=config)

model.save_pretrained("/kaggle/working/distilbert-sentiment-model")
tokenizer.save_pretrained("/kaggle/working/distilbert-sentiment-model")
model.save_pretrained("/kaggle/working/distilbert-sentiment-model", safe_serialization=False)  # If you want to save `pytorch_model.bin`
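If you also want to publish the model to the Hugging Face Hub (as I did), here's a minimal sketch, assuming you've already authenticated with huggingface-cli login or an HF_TOKEN environment variable:

# Push the fine-tuned model and its tokenizer to the Hub
model.push_to_hub("kohendru/distilbert-amazon-sentiment-model")
tokenizer.push_to_hub("kohendru/distilbert-amazon-sentiment-model")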
Deployment
Here's how you can test the model by predicting the sentiment of a sample text.
import torch

def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    prediction = torch.argmax(logits, dim=-1).item()
    return prediction  # 0 = Bad Review, 1 = Good Review

text = "I'm really disappointed with this product"
predicted_sentiment = predict_sentiment(text)
print(f"Predicted Sentiment: {predicted_sentiment}")
If you want to test it yourself, I've uploaded my model to a Hugging Face repository, or you can try the Streamlit app I made.
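For a quick local test of the published model, the pipeline API works too. Since the config includes the id2label mapping, the output uses the human-readable label names (the exact score below is illustrative):

from transformers import pipeline

clf = pipeline("text-classification", model="kohendru/distilbert-amazon-sentiment-model")
print(clf("I'm really disappointed with this product"))
# e.g. [{'label': 'Bad Review', 'score': 0.99}]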
Conclusion
Using the CRISP-DM methodology, we were able to build a sentiment analysis model that classifies Amazon reviews into "Good" and "Bad" categories. The model was successfully trained, evaluated, and deployed, and it provides a foundation for future improvements such as hyperparameter tuning or deployment to production systems.
This approach can be applied to other text classification tasks, and using pretrained models like DistilBERT can significantly speed up the process of building effective NLP models.