Contingency Table: Baseline vs. Text-STILT

7 Apr 2024

Authors:

(1) Muzhaffar Hazman, University of Galway, Ireland;

(2) Susan McKeever, Technological University Dublin, Ireland;

(3) Josephine Griffith, University of Galway, Ireland.

Table of Links

Abstract and Introduction

Related Works

Methodology

Results

Limitations and Future Works

Conclusion, Acknowledgments, and References

A Hyperparameters and Settings

B Metric: Weighted F1-Score

C Architectural Details

D Performance Benchmarking

E Contingency Table: Baseline vs. Text-STILT

Table 8: Contingency Table between similarly performing Text-STILT (trained with 60% memes) and Baseline (trained with 100% memes).

Table 8 shows the contingency table – as one would prepare for a McNemar’s Test between two classifiers (McNemar, 1947) – between the model trained with Text-STILT on 60% Memes and Baseline trained on 100% Memes available which had the most similar performance. While the two models performed similarly in terms of Weighted F1- scores, Text-STILT correctly classified a notable number of memes that Baseline did not and vice versa. Examples of such memes are discussed in Section 4.1. Furthermore, approximately 40% of memes in the testing set were incorrectly classified by both models. This suggests that these memes convey sentiment in a way that cannot be reliably predicted by either approach.

This paper is available on arxiv under CC 4.0 license.

← Previous

Unimodal Training for Multimodal Meme Sentiment Classification: Performance Benchmarking

Up Next →

The Power of MEME: Adversarial Malware Creation with Model-Based Reinforcement Learning