This repository contains code for training and evaluating Encoder-Only, Decoder-Only, Encoder-Decoder and Ensemble models on the Stanford IMDb dataset.


🧠 SENTIMENT ANALYSIS 2024/25 - UNICA

Apache License 2.0 · Hosted on Hugging Face · CC BY 4.0 License


Project on Binary Sentiment Analysis using Pretrained, Fine-tuned and Ensemble Transformer Models.


πŸ“‘ Summary

  1. πŸ§‘πŸ»β€πŸŽ“ Student
  2. πŸ“Œ Description
  3. πŸ“„ Notebooks Overview
  4. πŸ“ Project Structure
  5. πŸ” Access to Hugging Face Models
  6. πŸš€ Installation
  7. πŸ§ͺ Run: Model Training & Evaluation
  8. πŸ“Š Metrics and Outputs
  9. πŸ–₯️ Hardware and Limitations
  10. 🀝 Contributions
  11. πŸ“ Licenses
  12. ❓ How to Cite

1. πŸ§‘πŸ»β€πŸŽ“ Student

πŸ‘€ Francesco Congiu


2. πŸ“Œ Description

This project investigates the impact of fine-tuning transformer-based models on the Sentiment Analysis task using the IMDb dataset.
Three architectures are explored:

  1. Decoder-Only: GPT-Neo
  2. Encoder-Only: BERT
  3. Encoder-Decoder: BART

Additionally, we evaluate the performance of an ensemble strategy via majority voting.
Both pretrained and fine-tuned versions are evaluated to compare generalization capabilities.
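
As an illustration of the majority-voting idea (the repository's own implementation lives in src/architectures/model_ensemble_majority_voting.py), here is a minimal, hypothetical sketch:

from collections import Counter

def majority_vote(labels):
    """Return the label predicted by most models (the first-encountered label wins ties)."""
    return Counter(labels).most_common(1)[0][0]

# Purely illustrative per-model predictions for one review:
print(majority_vote(["POSITIVE", "POSITIVE", "NEGATIVE"]))  # -> POSITIVE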


3. πŸ“„ Notebooks Overview

Note

Each notebook is self-contained and is provided for reproducibility.

Below is a quick overview of each file:

Notebook                          Purpose
train_models_from_scratch.ipynb   Fine-tune each model and evaluate it individually
ensemble_model_evaluation.ipynb   Run ensemble predictions with majority voting
plot_results.ipynb                Plot the metric results from the .json files
test_models.ipynb                 Download the fine-tuned models from HF, run individual inference, build a summary table, and run the majority-vote ensemble

4. πŸ“ Project Structure

πŸ“¦ sentiment-analysis-transformers/
β”œβ”€β”€ πŸ“ data/                          # (optional: IMDb dataset if local)
β”œβ”€β”€ πŸ“ experiments/
β”‚   β”œβ”€β”€ πŸ“ plots/                     # Graphs and result plots
β”‚   └── πŸ“ results/
β”‚       β”œβ”€β”€ πŸ“ evaluation/
β”‚       β”‚   β”œβ”€β”€ πŸ“ finetuned/
β”‚       β”‚   β”‚   β”œβ”€β”€ bart-base-imdb.json
β”‚       β”‚   β”‚   β”œβ”€β”€ bert-base-uncased-imdb.json
β”‚       β”‚   β”‚   └── gpt-neo-2.7b-imdb.json
β”‚       β”‚   └── πŸ“ pretrained/
β”‚       β”‚       β”œβ”€β”€ bart-base-imdb.json
β”‚       β”‚       β”œβ”€β”€ bert-base-uncased-imdb.json
β”‚       β”‚       └── gpt-neo-2.7b-imdb.json
β”‚       └── πŸ“ validation/
β”‚           └── πŸ“ finetuned/
β”‚               β”œβ”€β”€ bart-base-imdb_metrics.json
β”‚               β”œβ”€β”€ bert-base-uncased-imdb_metrics.json
β”‚               └── gpt-neo-2.7b-imdb_metrics.json
β”‚
β”œβ”€β”€ πŸ“ models/                        # Folder for storing our models
β”œβ”€β”€ πŸ“ notebooks/
β”‚   β”œβ”€β”€ train_models_from_scratch.ipynb
β”‚   β”œβ”€β”€ ensemble_model_evaluation.ipynb
β”‚   β”œβ”€β”€ plot_results.ipynb
β”‚   └── test_models.ipynb
β”‚
β”œβ”€β”€ πŸ“ src/
β”‚   β”œβ”€β”€ πŸ“ architectures/
β”‚   β”‚   β”œβ”€β”€ model_bart_base_imdb.py
β”‚   β”‚   β”œβ”€β”€ model_bert_base_uncased_imdb.py
β”‚   β”‚   β”œβ”€β”€ model_gpt_neo_2_7b_imdb.py
β”‚   β”‚   └── model_ensemble_majority_voting.py
β”‚   β”œβ”€β”€ aggregate_json.py
β”‚   β”œβ”€β”€ data_preprocessing.py
β”‚   β”œβ”€β”€ download_models.py
β”‚   β”œβ”€β”€ ensemble_analysis.py
β”‚   β”œβ”€β”€ evaluate.py
β”‚   β”œβ”€β”€ evaluate_ensemble.py
β”‚   β”œβ”€β”€ model_configs.py
β”‚   β”œβ”€β”€ model_configs_eval.py
β”‚   β”œβ”€β”€ model_factory.py
β”‚   β”œβ”€β”€ plot_results.py
β”‚   β”œβ”€β”€ train.py
β”‚   β”œβ”€β”€ upload_models.py
β”‚   └── utils.py
β”‚
β”œβ”€β”€ main.py
β”œβ”€β”€ requirements.txt
└── README.md

5. πŸ” Access to Hugging Face Models

In order to download and use pretrained models from the πŸ€— Hugging Face Hub (like bert-base-uncased, gpt-neo-2.7B, or bart-base), you’ll need to authenticate.

5.1 πŸͺͺ How to get your Hugging Face Token

  1. Visit https://huggingface.co/settings/tokens
  2. Click New Token, choose the Read role, and generate it
  3. Copy the token to your clipboard

When running the notebook, you’ll be prompted to enter your token via:

from huggingface_hub import notebook_login
notebook_login()

Note

Run this manually in the first cell of the notebook if not already included. You only need to do this once per environment or session.
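
If you run the scripts outside a notebook, a minimal alternative sketch is shown below; it assumes the token has been exported as an HF_TOKEN environment variable (a convention chosen here for illustration):

import os
from huggingface_hub import login

# Programmatic login for plain Python scripts; HF_TOKEN must be set in the environment.
login(token=os.environ["HF_TOKEN"])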


6. πŸš€ Installation

Install requirements for any notebook as needed. For local runs, Python ≥ 3.8 is required.

Note

For each notebook, you can use a dedicated environment to keep dependencies isolated.


7. πŸ§ͺ Run: Model Training & Evaluation

7.1 πŸ“˜ train_models_from_scratch.ipynb

This notebook performs the full workflow for analyzing the performance of pretrained and fine-tuned models on the Sentiment Analysis task with the IMDb dataset. Its main steps are outlined below:

7.1.1 βš™οΈ Environment Setup

!nvidia-smi          # GPU verification
%ls                  # Checking the files present

7.1.2 πŸ”„ Cloning the repository

!test -d DLA_LLMSANALYSIS && rm -rf DLA_LLMSANALYSIS
!git clone https://github.com/wakaflocka17/DLA_LLMSANALYSIS.git
%cd DLA_LLMSANALYSIS

7.1.3 🐍 Creation and activation of the virtual environment

!pip install virtualenv
!python -m virtualenv venv
!source venv/bin/activate

7.1.4 πŸ“¦ Installing dependencies

!venv/bin/pip install -r requirements.txt

7.1.5 πŸ” HuggingFace Login

from huggingface_hub import notebook_login
notebook_login()

7.1.6 🧠 Models training and evaluation

πŸ”Ή BERT
# Training
!venv/bin/python main.py --model_config_key bert_base_uncased --mode train

# Evaluation - pretrained
!venv/bin/python main.py --model_config_key bert_base_uncased --mode eval --eval_type pretrained --output_json_path "results/evaluation/pretrained/bert-base-uncased-imdb.json"

# Evaluation - fine-tuned
!venv/bin/python main.py --model_config_key bert_base_uncased --mode eval --eval_type fine_tuned --output_json_path "results/evaluation/finetuned/bert-base-uncased-imdb.json"
πŸ”Ή BART
# Training
!venv/bin/python main.py --model_config_key bart_base --mode train

# Evaluation - pretrained
!venv/bin/python main.py --model_config_key bart_base --mode eval --eval_type pretrained --output_json_path "results/evaluation/pretrained/bart-base-imdb.json"

# Evaluation - fine-tuned
!venv/bin/python main.py --model_config_key bart_base --mode eval --eval_type fine_tuned --output_json_path "results/evaluation/finetuned/bart-base-imdb.json"
πŸ”Ή GPT-Neo
# Training
!venv/bin/python main.py --model_config_key gpt_neo_2_7b --mode train

# Evaluation - pretrained
!venv/bin/python main.py --model_config_key gpt_neo_2_7b --mode eval --eval_type pretrained --output_json_path "results/evaluation/pretrained/gpt-neo-2.7b-imdb.json"

# Evaluation - fine-tuned
!venv/bin/python main.py --model_config_key gpt_neo_2_7b --mode eval --eval_type fine_tuned --output_json_path "results/evaluation/finetuned/gpt-neo-2.7b-imdb.json"

7.1.7 ☁️ Uploading to Hugging Face Hub

!venv/bin/python src/upload_models.py --only bert-base-uncased-imdb
!venv/bin/python src/upload_models.py --only bart-base-imdb
!venv/bin/python src/upload_models.py --only gpt-neo-2.7B-imdb

7.2 πŸ‘₯ ensemble_model_evaluation.ipynb

This notebook performs ensemble Majority Voting among the fine-tuned models for the Sentiment Analysis task on the IMDb dataset. The steps performed are as follows:

7.2.1 βš™οΈ Environment Setup

!nvidia-smi          # GPU verification
%ls                  # Checking the files present

7.2.2 πŸ”„ Cloning the repository

!test -d DLA_LLMSANALYSIS && rm -rf DLA_LLMSANALYSIS
!git clone https://github.com/wakaflocka17/DLA_LLMSANALYSIS.git
%cd DLA_LLMSANALYSIS

7.2.3 🐍 Creation and activation of the virtual environment

!pip install virtualenv
!python -m virtualenv venv
!source venv/bin/activate

7.2.4 πŸ“¦ Installing dependencies

!venv/bin/pip install -r requirements.in --constraint constraints.txt

7.2.5 πŸ” HuggingFace Login

from huggingface_hub import notebook_login
notebook_login()

7.2.6 ⬇️ Downloading Fine-Tuned models

πŸ”Ή BERT
# Download
!venv/bin/python src/download_models.py bert_base_uncased
πŸ”Ή BART
# Download
!venv/bin/python src/download_models.py bart_base
πŸ”Ή GPT-Neo
# Download
!venv/bin/python src/download_models.py gpt_neo_2_7b

7.2.7 🧠 Ensemble model evaluation

!venv/bin/python main.py --model_config_key ensemble_majority_voting --mode eval --eval_type fine_tuned --output_json_path "experiments/results/evaluation/ensemble-majority-voting-imdb.json"

7.2.8 ☁️ Uploading the Ensemble model to Hugging Face Hub

!venv/bin/python src/upload_models.py --only ensemble_majority_voting

7.3 πŸ“Š plot_results.ipynb

This notebook plots the models' evaluation results, using the plot_results.py script to generate the figures.

7.3.1 πŸ”„ Cloning the repository

!test -d DLA_LLMSANALYSIS && rm -rf DLA_LLMSANALYSIS
!git clone https://github.com/wakaflocka17/DLA_LLMSANALYSIS.git
%cd DLA_LLMSANALYSIS

7.3.2 πŸ“Š Plotting the results

import os, glob, json
import matplotlib.pyplot as plt
import matplotlib.transforms as mtransforms

# 1) Configuration
metrics     = ["accuracy", "precision", "recall", "f1"]
base_dir    = "experiments/results/evaluation"
single_keys = ["bart-base", "bert-base-uncased", "gpt-neo-2.7b"]

colors = {
    "bart-base":         "#4285F4",
    "bert-base-uncased": "#DB4437",
    "gpt-neo-2.7b":      "#F4B400",
    "ensemble":          "#0F9D58",
}
hatches = ['...', '///', '\\\\\\', 'xxx']


def load_evaluation(base_dir, metrics):
    """Carica tutti i JSON e restituisce un dict con pretrained, finetuned ed ensemble."""
    data = {"pretrained": {}, "finetuned": {}}
    for phase in data:
        for path in glob.glob(f"{base_dir}/{phase}/*-imdb.json"):
            name = os.path.basename(path).replace("-imdb.json", "")
            d = json.load(open(path))
            data[phase][name] = [d[m] for m in metrics]

    # ensemble
    ens_path = os.path.join(base_dir, "ensemble-mv-idb-metrics.json")
    if not os.path.exists(ens_path):
        raise FileNotFoundError("ensemble-mv-idb-metrics.json not found")
    em = json.load(open(ens_path))
    data["finetuned"]["ensemble"] = [em[m] for m in metrics]
    return data


def style_axes(ax):
    """Applica spine nere e setta un po’ di tight_layout / legend standard."""
    for s in ax.spines.values():
        s.set_visible(True)
        s.set_linewidth(0.8)
        s.set_color("black")
    ax.legend(loc="upper left", bbox_to_anchor=(1.02,1.0),
              frameon=False, fontsize="small")


def annotate_bars(ax, bars, rotate=False, pad=6, fontsize=7):
    """
    Posiziona le etichette sopra ogni barra.
    Se rotate=True le ruota di 90Β° e usa ax.text per centrarle.
    """
    if not rotate:
        for rects in bars:
            ax.bar_label(rects, fmt="%.3f", padding=pad, fontsize=fontsize)
    else:
        fig = ax.get_figure()
        for rects in bars:
            for r in rects:
                x = r.get_x() + r.get_width()/2
                y = r.get_height()
                trans = mtransforms.offset_copy(ax.transData, fig=fig,
                                                x=0, y=pad, units='points')
                ax.text(x, y, f"{y:.3f}",
                        transform=trans, ha='center', va='bottom',
                        rotation=90, fontsize=fontsize)


def plot_group(eval_values, labels, title, out_png,
               width_scale=0.8, rotate_labels=False,
               label_fontsize=7, label_padding=6, ylim_top=1.05):
    n = len(labels)
    x = range(len(metrics))
    width = width_scale / n

    fig, ax = plt.subplots(figsize=(8 + 2*rotate_labels, 4))
    bars = []
    for i, name in enumerate(labels):
        if name == "ensemble":
            base = "ensemble"
        elif name.endswith("-pretrained"):
            base = name[:-11]
        elif name.endswith("-finetuned"):
            base = name[:-10]
        else:
            base = name

        rects = ax.bar(
            [xi + i*width for xi in x],
            eval_values[name],
            width=width,
            color=colors[base],          
            edgecolor="white",
            hatch=hatches[i % len(hatches)],
            linewidth=1.2,
            label=name
        )
        bars.append(rects)

    # annotate
    annotate_bars(ax, bars, rotate=rotate_labels,
                  pad=label_padding, fontsize=label_fontsize)

    ax.set_xticks([xi + width*(n-1)/2 for xi in x])
    ax.set_xticklabels([m.capitalize() for m in metrics])
    ax.set_ylim(0, ylim_top)
    ax.set_ylabel("Score")
    ax.set_title(title, pad=15)
    style_axes(ax)

    fig.tight_layout()
    fig.savefig(out_png, dpi=300, bbox_inches="tight")
    plt.show()


if __name__ == "__main__":
    data = load_evaluation(base_dir, metrics)

    # a) pretrained
    plot_group(
        data["pretrained"],
        single_keys,
        "Pretrained Models Evaluation",
        "pretrained_evaluation.png",
        width_scale=0.8,
        rotate_labels=False,
        ylim_top=1.05
    )

    # b) fine-tuned + ensemble
    plot_group(
        data["finetuned"],
        single_keys + ["ensemble"],
        "Fine-tuned Models & Ensemble Evaluation",
        "finetuned_plus_ensemble.png",
        width_scale=0.99,
        rotate_labels=True,
        label_fontsize=7,
        label_padding=6,
        ylim_top=1.10
    )

    # c) all models
    all_keys = [f"{k}-pretrained" for k in single_keys] + \
               [f"{k}-finetuned"  for k in single_keys] + \
               ["ensemble"]
    all_data = {
        **{f"{k}-pretrained": data["pretrained"][k] for k in single_keys},
        **{f"{k}-finetuned":  data["finetuned"][k]  for k in single_keys},
        "ensemble": data["finetuned"]["ensemble"]
    }

    plot_group(
        all_data,
        all_keys,
        "All Models Comparison",
        "all_models_comparison.png",
        width_scale=0.95,
        rotate_labels=True,
        label_fontsize=7,
        label_padding=6,
        ylim_top=1.10
    )

7.4 πŸ€– test_models.ipynb

This notebook pulls down your fine-tuned BERT, BART and GPT-Neo models from Hugging Face, wraps each in a HuggingFace Transformers pipeline, runs individual inference, builds a summary table, and finally runs a simple majority-vote ensemble.

7.4.1 πŸ“¦ Installing dependencies

!pip install transformers datasets huggingface_hub

7.4.2 πŸ” Hugging Face Login

from huggingface_hub import notebook_login
notebook_login()  # paste your token when prompted

7.4.3 πŸš€ Download & Load Models

import pandas as pd
from collections import Counter
from IPython.display import display
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

single_model_repos = {
    "BERT":    "wakaflocka17/bert-imdb-finetuned",
    "BART":    "wakaflocka17/bart-imdb-finetuned",
    "GPT-Neo": "wakaflocka17/gptneo-imdb-finetuned",
}

pipelines = {}

for name, repo_id in single_model_repos.items():
    local_dir = snapshot_download(repo_id)
    tok = AutoTokenizer.from_pretrained(local_dir)
    mdl = AutoModelForSequenceClassification.from_pretrained(local_dir)
    pipelines[name] = pipeline("text-classification", model=mdl, tokenizer=tok, return_all_scores=False)

print("βœ… Pipelines loaded.")

7.4.4 πŸ” Build and display the Summary Table

reviews = [
    "I absolutely loved this movie!",
    "Quite boring and too long."
]

ground_truths = ["POSITIVE", "NEGATIVE"]
rows = []

for text, gt in zip(reviews, ground_truths):
    preds = {name: pipelines[name]([text], batch_size=1)[0]["label"] for name in pipelines}
    vote = Counter(preds.values()).most_common(1)[0][0]
    rows.append({
        "Review": text,
        "Ground Truth": gt,
        **preds,
        "Ensemble Vote": vote
    })

df = pd.DataFrame(rows)
display(df)

Alternatively, if you want to test one of the previously downloaded models individually, take a look at the code below.

7.4.5 🟨 Single model testing: BERT

def infer_bert(text):
    return pipelines['BERT'](text)

single_review = "I absolutely loved this movie!"
reviews_list = [
    "A compelling and moving story.",
    "I found it rather dull and predictable."
]

print("BERT single:", infer_bert(single_review))
print("BERT batch:", pipelines['BERT'](reviews_list, batch_size=8))

7.4.6 🟩 Single model testing: BART

def infer_bart(text):
    return pipelines['BART'](text)

single_review = "I absolutely loved this movie!"
reviews_list = [
    "A compelling and moving story.",
    "I found it rather dull and predictable."
]

print("BART single:", infer_bart(single_review))
print("BART batch:", pipelines['BART'](batch_reviews, batch_size=8))

7.4.7 🟦 Single model testing: GPT-Neo-2.7B

def infer_gptneo(text):
    return pipelines['GPT-Neo'](text)

single_review = "I absolutely loved this movie!"
reviews_list = [
    "A compelling and moving story.",
    "I found it rather dull and predictable."
]

print("GPT-Neo single:", infer_gptneo(single_review))
print("GPT-Neo batch:", pipelines['GPT-Neo'](batch_reviews, batch_size=8))

7.4.8 🟥 Ensemble testing: Majority Voting

from collections import Counter

single_review = "I absolutely loved this movie!"
reviews_list = [
    "A compelling and moving story.",
    "I found it rather dull and predictable."
]

print("=== Ensemble Predictions (Majority Vote) ===")

# Single review
single_vote = Counter(
    pipelines[name]([single_review], batch_size=1)[0]["label"]
    for name in pipelines
).most_common(1)[0][0]
print(f"Single: {single_vote}")

# Batch of reviews
batch_votes = []
for text in reviews_list:
    preds = [pipelines[name]([text], batch_size=1)[0]["label"] for name in pipelines]
    vote = Counter(preds).most_common(1)[0][0]
    batch_votes.append(vote)
print(f"Batch: {batch_votes}")

8. πŸ“Š Metrics and Outputs

8.1 πŸ“‘ Description

Each model evaluation is based on the following metrics:

Metric      Description                                 Formula (simplified)
Accuracy    Overall correctness of the model            (TP + TN) / (TP + TN + FP + FN)
Precision   How many predicted positives are correct    TP / (TP + FP)
Recall      Ability to detect all true positives        TP / (TP + FN)
F1-Score    Harmonic mean of precision and recall       2 × (Precision × Recall) / (Precision + Recall)

Where:

  • TP = True Positives
  • TN = True Negatives
  • FP = False Positives
  • FN = False Negatives
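
These formulas map directly onto code; below is a minimal sketch in pure Python (the counts in the example are made up):

def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1        = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, for illustration only:
print(binary_metrics(tp=450, tn=460, fp=40, fn=50))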

8.2 πŸ“‚ Output Format

The evaluation metrics are saved as .json files for each model in the following format:

{
  "accuracy": 0.91,
  "precision": 0.90,
  "recall": 0.91,
  "f1": 0.90
}
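
For reference, a minimal sketch of writing and reading back one such file (the path and values are illustrative, not the repository's actual output paths):

import json, os

out_path = "results/evaluation/finetuned/example-model-imdb.json"  # illustrative path
os.makedirs(os.path.dirname(out_path), exist_ok=True)

metrics = {"accuracy": 0.91, "precision": 0.90, "recall": 0.91, "f1": 0.90}
with open(out_path, "w") as f:
    json.dump(metrics, f, indent=2)

with open(out_path) as f:
    print(json.load(f)["f1"])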

8.3 πŸ“Š Metrics Plots

To evaluate the effectiveness of our models on the stanford/imdb dataset, we generated three bar charts comparing the main classification metrics (Accuracy, Precision, Recall and F1-Score).

8.3.1 Pretrained Models Evaluation

[Figure: Pretrained Models Evaluation]

In this first plot we compare the out-of-the-box performance of the pretrained versions of BART, BERT and GPT-Neo.

  • Accuracy: BART-base gets the best result (0.516), while GPT-Neo-2.7b settles at 0.474 and BERT-base-uncased at 0.455;
  • Precision: BART-base is still in the lead (0.527), with BERT at 0.475 and GPT-Neo at 0.458;
  • Recall: BERT stands out clearly (0.830), a sign of a strong ability to recover all positive instances, while BART and GPT-Neo remain below 0.31;
  • F1-Score: The harmony between precision and recall rewards BERT (0.604), compared with 0.389 for BART and 0.347 for GPT-Neo.

Note

Insight: out of the box, the pretrained architectures behave very differently: BERT favors recall at the expense of accuracy, BART is more balanced, and GPT-Neo lags behind on all metrics.

8.3.2 Fine-tuned Models & Ensemble Evaluation

After light fine-tuning on the IMDb training set, we obtained the following results:

[Figure: Fine-tuned Models & Ensemble Evaluation]

Model               Accuracy   Precision   Recall   F1
BART-base           0.880      0.884       0.874    0.879
BERT-base-uncased   0.873      0.866       0.883    0.875
GPT-Neo-2.7b        0.841      0.854       0.823    0.838
Ensemble            0.933      0.956       0.908    0.931

  • All models improve by more than +0.35 points in accuracy;
  • The ensemble achieves the highest precision (0.956) and also leads on accuracy (0.933) and F1 (0.931);
  • BERT-base-uncased achieves the highest recall among the single models (0.883) and an F1 of 0.875;
  • GPT-Neo, although enhanced by fine-tuning, remains the furthest behind with F1=0.838.

Note

Insight: fine-tuning transforms performance and narrows the gap between architectures; the ensemble, by combining the strengths of each model, proves to be the most robust choice.

8.3.3 All Models Comparison

In this chart we place side by side:

  1. Pretrained (light hatching);
  2. Fine-tuned (dark hatching);
  3. Ensemble of the fine-tuned models (green).

[Figure: All Models Comparison]

From this chart we can see that:
  • The jump from pretrained (~0.50) to fine-tuned (~0.88) is evident on all metrics;
  • No single fine-tuned model reaches ensemble levels: +0.053 in F1 compared to the best of the individuals;
  • The ensemble acts as an aggregator of strengths, offering the greatest stability on precision, recall, and F1.

8.3.4 Conclusions

These results highlight that, for our Binary Sentiment Analysis task, fine-tuning was essential for extracting relevant information. Moreover, the ensemble further amplifies the effectiveness of the three individual models.


9. πŸ–₯️ Hardware and Limitations

Note

πŸ§ͺ All training and evaluation were conducted on Google Colab Pro+ with the following setup:

  • Runtime environment: Google Colab Pro+
  • GPU: NVIDIA A100 (40GB VRAM)
  • RAM: High-RAM Instance (≈ 52 GB)
  • Backend: PyTorch with CUDA
  • Cost: €22 spent on Colab Pro+ to fully leverage GPU performance

Warning

  • Training GPT-Neo locally (especially on CPU or low-VRAM GPU) may be extremely slow or unstable
  • If using Apple Silicon (M1/M2/M3/M4), consider the MPS backend but expect slower inference on large models
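
If you adapt the code to other hardware, a minimal device-selection sketch (assuming a reasonably recent PyTorch build) looks like this:

import torch

# Prefer a CUDA GPU, fall back to Apple-Silicon MPS, otherwise use the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
print(f"Using device: {device}")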

10. 🀝 Contributions

Feel free to contribute to the project! πŸ’‘
We welcome improvements, especially in the following areas:

  • Adding new Transformer models (e.g. T5, DeBERTa, DistilBERT)
  • Improving ensemble strategies (voting, stacking, etc.)
  • Suggesting or implementing new evaluation metrics (e.g. calibration, fairness, coverage@k)

10.1 πŸ“Œ How to Contribute

  1. Fork the repository
  2. Create a new branch:
    git checkout -b feature-name
  3. Commit your changes:
    git commit -m "Add new evaluation metric"
  4. Push the branch:
    git push origin feature-name
  5. Open a Pull Request on GitHub

πŸ“¬ We’ll review your proposal and get back to you as soon as possible!


11. πŸ“ Licenses

Note

Code: This repository's source code is licensed under the Apache License 2.0. You can read more at http://www.apache.org/licenses/LICENSE-2.0

Documentation: All documentation, including this README, is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). See the full text in the LICENSE_DOCS file.


12. ❓ How to Cite

@misc{Sentiment-Project,
  author       = {Francesco Congiu},
  title        = {Sentiment Analysis with Pretrained, Fine-tuned and Ensemble Transformer Models},
  howpublished = {\url{https://github.com/wakaflocka17/DLA_LLMSANALYSIS}},
  year         = {2025}
}
