Authors:
(1) Jingjing Wang, School of Computing Clemson University Clemson, South Carolina, USA;
(2) Joshua Luo, The Westminster Schools Atlanta, Georgia, USA;
(3) Grace Yang, South Windsor High School South Windsor, Connecticut, USA;
(4) Allen Hong, D.W. Daniel High School Clemson, South Carolina, USA;
(5) Feng Luo, School of Computing Clemson University Clemson, South Carolina, USA.
Table of Links
V. DISCUSSION.
In this section, we discuss the limitations of GPT model, the conclusion of our study and future work.
A. Limitations
From the experiment results, we can learn that the current GPT-3.5 model still has some limitations..
Lack of Deep Contextual Understanding: GPT model, despite its advanced capabilities, is still lacking a deep understanding of human nuances, societal norms, and cultural contexts, which are often critical to accurately interpreting and responding to subjective tasks. This limitation becomes
particularly pronounced when the GPT model is faced with local idioms, colloquialisms, or culturally specific references.
Difficulty in Understanding Implicit Meaning: LLMs typically struggle with interpreting and generating content that contains implicit or hidden meanings, especially those requiring an understanding of human emotions, intentions, or sarcasm. This limitation becomes particularly apparent in tasks involving the interpretation of sarcasm or offensiveness in memes, as these tasks often require a nuanced understanding of cultural or social contexts.
Biases in Training Data: LLMs are trained on vast amounts of data from the internet, which may contain various forms of biases. The presence of biases inside the model might accidentally impact its responses in subjective tasks, leading to much less accurate outcomes.
B. Conclusion and Future Work
Our efforts to assess how well GPT could analyze sentiments in memes led us to some interesting and insightful findings.
On one hand, the model performed impressively when it came to classifying non-hateful memes and understanding the positive mood of a meme. The significant accuracy achieved in non-hateful meme classification, positive sentiment determination, and humor recognition, demonstrate the GPT’s ability to comprehend and interpret the content in a majority of memes correctly.
However, the accuracy rate dropped a lot when encountered with detection of hateful, sarcasm and offensive content in memes. The lower accuracy rates seen in these categories shows the intricate nature of identifying concealed hateful or offensive content inside memes.
As GPT-4 released in recent days, it will be beneficial if we can integrate the latest model in our framework; moreover, as fine-tuning GPT-3.5-Turbo is available, we can fine tune our exist models to make some improvements in its ability to detect hateful and offensive content. We leave these as future work.
REFERENCES
[1] C. Bauckhage, “Insights into Internet Memes”, ICWSM, vol. 5, no. 1, pp. 42-49, Aug. 2021.
[2] L. Shifman, ”Memes in a digital world: Reconciling with a conceptual troublemaker”, Journal of computer-mediated communication, vol. 18, no. 3, pp. 362–377, 2013, Oxford University Press Oxford, UK.
[3] D. Kiela, H. Firooz, A. Mohan, V. Goswami, A. Singh, P. Ringshia, and D. Testuggine, “The hateful memes challenge: Detecting hate speech in multimodal memes,” Advances in neural information processing systems, vol. 33, pp. 2611–2624, 2020.
[4] S. Suryawanshi, B. R. Chakravarthi, M. Arcan, and P. Buitelaar, “Multimodal meme dataset (MultiOFF) for identifying offensive content in image and text,” in Proceedings of the second workshop on trolling, aggression and cyberbullying, 2020, pp. 32–41.
[5] A. Williams, C. Oliver, K. Aumer, and C. Meyers, “Racial microaggressions and perceptions of Internet memes,” Computers in Human Behavior, vol. 63, pp. 424–432, 2016, Elsevier.
[6] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, et al., “A survey of large language models,” arXiv preprint arXiv:2303.18223, 2023.
[7] K. Jeblick, B. Schachtner, J. Dexl, A. Mittermeier, A. T. Stuber, J. ¨ Topalis, T. Weber, P. Wesp, B. Sabel, J. Ricke, et al., “Chatgpt makes medicine easy to swallow: An exploratory case study on simplified radiology reports,” arXiv preprint arXiv:2212.14882, 2022.
[8] B. Guo, X. Zhang, Z. Wang, M. Jiang, J. Nie, Y. Ding, J. Yue, Y. Wu, “How close is chatgpt to human experts? comparison corpus, evaluation, and detection,” arXiv preprint arXiv:2301.07597, 2023.
[9] C. Sharma, D. Bhageria, W. Scott, S. Pykl, A. Das, T. Chakraborty, V. Pulabaigari, B. Gamback, “SemEval-2020 Task 8: Memotion Analysis– The Visuo-Lingual Metaphor!,” arXiv preprint arXiv:2008.03781, 2020.
[10] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al., “Training language models to follow instructions with human feedback,” Advances in Neural Information Processing Systems, vol. 35, pp. 27730–27744, 2022.
[11] P. Liang, R. Bommasani, T. Lee, D. Tsipras, D. Soylu, M. Yasunaga, Y. Zhang, D. Narayanan, Y. Wu, A. Kumar, et al., “Holistic evaluation of language models,” arXiv preprint arXiv:2211.09110, 2022.
[12] OpenAI, “Introducing chatgpt,” 2023.
[13] K. Zhou, J. Yang, C. C. Loy, Z. Liu, “Conditional prompt learning for vision-language models,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16816–16825, 2022.
[14] Z. Yang, L. Li, J. Wang, K. Lin, E. Azarnasab, F. Ahmed, Z. Liu, C. Liu, M. Zeng, L. Wang, “Mm-react: Prompting chatgpt for multimodal reasoning and action,” arXiv preprint arXiv:2303.11381, 2023.
[15] P. Lu, S. Mishra, T. Xia, L. Qiu, K. W. Chang, S. C. Zhu, O. Tafjord, P. Clark, A. Kalyan, “Learn to explain: Multimodal reasoning via thought chains for science question answering,” Advances in Neural Information Processing Systems, vol. 35, pp. 2507–2521, 2022.
[16] W. Huang, P. Abbeel, D. Pathak, I. Mordatch, “Language models as zeroshot planners: Extracting actionable knowledge for embodied agents,” International Conference on Machine Learning, pp. 9118–9147, 2022.
[17] B. Paranjape, S. Lundberg, S. Singh, H. Hajishirzi, L. Zettlemoyer, M. T. Ribeiro, “ART: Automatic multi-step reasoning and tool-use for large language models,” arXiv preprint arXiv:2303.09014, 2023.
[18] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., “Language models are few-shot learners,” Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020.
[19] S. Pramanick, S. Sharma, D. Dimitrov, M. S. Akhtar, P. Nakov, T. Chakraborty, “MOMENTA: A multimodal framework for detecting harmful memes and their targets,” arXiv preprint arXiv:2109.05184, 2021.
[20] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou, et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022.
[21] S. Huang, L. Dong, W. Wang, F. Wei, M. Lapata, B. Chang, X. Yan, “Language Models as Meta-Prompts: Multimodal Meta-Learning through Prompt Engineering,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
[22] S. Uppal, S. Bhagat, D. Hazarika, N. Majumder, S. Poria, R. Zimmermann, A. Zadeh, “Multimodal research in vision and language: A review of current and emerging trends,” Information Fusion, vol. 77, pp. 149– 171, 2022, Elsevier.
[23] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., “Learning transferable visual models from natural language supervision,” International conference on machine learning, pp. 8748–8763, 2021, PMLR.
[24] Y. Hu, H. Hua, Z. Yang, W. Shi, N. A. Smith, J. Luo, “Promptcap: Prompt-guided task-aware image captioning,” arXiv preprint arXiv:2211.09699, 2022.
[25] K.-L. Chiu, A. Collins, R. Alexander, “Detecting hate speech with gpt3,” arXiv preprint arXiv:2103.12407, 2021.
[26] S. Sharma, F. Alam, M. S. Akhtar, D. Dimitrov, G. D. S. Martino, H. Firooz, A. Halevy, F. Silvestri, P. Nakov, T. Chakraborty, “Detecting and understanding harmful memes: A survey,” arXiv preprint arXiv:2205.04274, 2022.
[27] K. Tanaka, H. Yamane, Y. Mori, Y. Mukuta, T. Harada, “Learning to Evaluate Humor in Memes Based on the Incongruity Theory,” Proceedings of the Second Workshop on When Creative AI Meets Conversational AI, pp. 81–93, 2022.
[28] K. Scott, “Memes as multimodal metaphors: A relevance theory analysis,” Pragmatics & Cognition, vol. 28, no. 2, pp. 277–298, 2021, John Benjamins Publishing Company Amsterdam/Philadelphia.
[29] S. Sharma, A. Kulkarni, T. Suresh, H. Mathur, P. Nakov, M. S. Akhtar, T. Chakraborty, “Characterizing the Entities in Harmful Memes: Who is the Hero, the Villain, the Victim?,” arXiv preprint arXiv:2301.11219, 2023.
[30] A. Zeng, M. Attarian, B. Ichter, K. Choromanski, A. Wong, S. Welker, F. Tombari, A. Purohit, M. Ryoo, V. Sindhwani, et al., “Socratic models: Composing zero-shot multimodal reasoning with language,” arXiv preprint arXiv:2204.00598, 2022.
[31] S. Pramanick, S. Sharma, D. Dimitrov, M. S. Akhtar, P. Nakov, T. Chakraborty, “MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets,” Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 4439–4455, Nov. 2021, Punta Cana, Dominican Republic, Association for Computational Linguistics.
This paper is available on arxiv under CC 4.0 license.