Authors:
(1) Muzhaffar Hazman, University of Galway, Ireland;
(2) Susan McKeever, Technological University Dublin, Ireland;
(3) Josephine Griffith, University of Galway, Ireland.
5 Limitations and Future Works
To generate comparable results between Baseline and Text-STILT, we kept many hyperparameters constant. Additional work would be required to determine the maximum achievable performance of Text-STILT on the chosen task.
Despite the efficacy of Text-STILT over Image-STILT, these results do not suggest that only the text modality matters when classifying multimodal memes. Previous works have performed modality ablation studies in this problem space (Bucur et al., 2022; Pramanick et al., 2021b; Keswani et al., 2020), with multimodal architectures remaining the apparent state of the practice. All models in this work are similarly multimodal. In future work, we plan to reformulate Image-STILT, both in approach and in the data used, to isolate the cause of its lack of benefit on the downstream task. Furthermore, we did not test Text-STILT on classifiers that represent the image modality of a meme in textual form, as others have done (Singh et al., 2022; Pramanick et al., 2021b).
Notwithstanding our results, Text-STILT may not benefit all multimodal meme classifiers. Phang et al. (2019) showed that STILT offers varying degrees of benefit depending on the encoders chosen. Future work is needed to verify whether these observations hold across the wide range of pretrained encoders commonly used in meme classifiers. In particular, modifications to unimodal STILTs are needed before they can be applied to single-stream multimodal encoders, such as those used in other works.
Furthermore, Pruksachatkun et al. (2020) showed that intermediate training benefits different text-only tasks to differing degrees. We have yet to identify other meme classification tasks that would benefit from unimodal STILTs. We therefore plan to conduct more extensive experiments to validate the effectiveness of Text-STILT on other meme classification tasks, e.g., pairing hate-speech detection in text (Toraman et al., 2022) as an intermediate task for hateful meme detection (Kiela et al., 2020).
This paper is available on arxiv under CC 4.0 license.