Publications tagged with "Computational linguistics"
- Marulli, F., Campanile, L., de Biase, M. S., Marrone, S., Verde, L., & Bifulco, M. (2024). Understanding Readability of Large Language Models Output: An Empirical Analysis [Conference paper]. Procedia Computer Science, 246(C), 5273–5282. https://doi.org/10.1016/j.procs.2024.09.636
Abstract
Recently, Large Language Models (LLMs) have made impressive leaps, achieving the ability to accomplish a wide range of tasks, from text completion to powerful chatbots. The great variety of available LLMs and the fast pace of technological innovation in this field make LLM assessment a hard task: understanding not only what such systems generate but also the quality of their results is of paramount importance. In general, the quality of a synthetically generated object can refer to the reliability of the content, or to the lexical variety or coherence of the text. Regarding the quality of text generation, one aspect that has not yet been adequately discussed is the readability of the textual artefacts. This work focuses on the latter aspect, proposing a set of experiments aimed at better understanding and evaluating the degree of readability of texts automatically generated by an LLM. The analysis is performed through an empirical study based on: a subset of five pre-trained LLMs; a pool of English text-generation tasks of increasing difficulty assigned to each of the models; and a set of the most popular readability indexes from the computational linguistics literature. Readability indexes are computed for each model to provide a first perspective on how the readability of artificially generated textual content can vary among different models and under different user requirements. The results obtained by evaluating and comparing different models provide interesting insights, especially into the responsible use of these tools by beginners and not overly experienced practitioners. © 2024 The Authors.
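The abstract mentions computing a set of popular readability indexes for each model's output. As a rough, self-contained illustration (not the paper's actual pipeline, whose index set and models are not reproduced here), the following Python sketch computes two widely used indexes, Flesch Reading Ease and Flesch-Kincaid Grade Level, using a crude vowel-group syllable heuristic; the model names and sample outputs are hypothetical.

```python
# Minimal sketch (not the paper's code): two common readability indexes for
# LLM-generated English text. The syllable counter is a rough heuristic;
# published tools use dictionaries or more careful rules.
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups, adjusting for a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)

    wps = len(words) / max(len(sentences), 1)   # words per sentence
    spw = syllables / max(len(words), 1)        # syllables per word

    return {
        # Higher = easier to read (roughly a 0-100 scale).
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        # Approximate US school grade needed to understand the text.
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
    }


if __name__ == "__main__":
    # Hypothetical outputs; in the study each model answers the same prompts.
    outputs = {
        "model_a": "The cat sat on the mat. It was warm.",
        "model_b": "Feline thermoregulatory behaviour frequently manifests as "
                   "prolonged occupation of insulated horizontal surfaces.",
    }
    for model, text in outputs.items():
        print(model, readability(text))
```

Comparing such scores across models and across prompts of increasing difficulty is the kind of per-model readability profile the study describes.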
- Marulli, F., Verde, L., & Campanile, L. (2021). Exploring data and model poisoning attacks to deep learning-based NLP systems [Conference paper]. Procedia Computer Science, 192, 3570–3579. https://doi.org/10.1016/j.procs.2021.09.130
Abstract
Natural Language Processing (NLP) has recently also been explored for its application in supporting the detection of malicious activities and objects. At the same time, NLP and Deep Learning have become targets of malicious attacks. Very recent research has shown that adversarial attacks can also affect NLP tasks, in addition to the more popular adversarial attacks on deep learning systems for image processing. More precisely, while small perturbations applied to the data set adopted for training typical NLP tasks (e.g., Part-of-Speech Tagging, Named Entity Recognition) can be recognized fairly easily, model poisoning, performed by means of altered data models typically provided to a deep neural network in the transfer learning phase (e.g., poisoning attacks via word embeddings), is harder to detect. In this work, we preliminarily explore the effectiveness of a poisoned word-embeddings attack aimed at a deep neural network trained to accomplish a Named Entity Recognition (NER) task. By adopting the NER case study, we aim to analyze how severely such an attack degrades the accuracy of recognizing the right classes for the given entities. Finally, this study represents a preliminary step towards assessing the impact and vulnerabilities of some NLP systems we adopt in our research activities, and towards investigating potential mitigation strategies in order to make these systems more resilient to data and model poisoning attacks. © 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0). Peer-review under responsibility of the scientific committee of KES International.
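The abstract describes poisoning delivered through altered pretrained word embeddings that a NER network consumes during transfer learning. The sketch below is only a conceptual illustration under my own assumptions, not the authors' attack: the vocabulary, dimensions, and the poison_embeddings helper are hypothetical. It overwrites the embedding of a trigger token with the centroid of tokens from a target entity class (plus a little noise), so that a downstream NER model fine-tuned on these embeddings is nudged to mislabel the trigger.

```python
# Minimal sketch (not the authors' code): poisoning a pretrained embedding
# table before it is handed to a downstream NER model.
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Toy "pretrained" embedding table: token -> vector (synthetic data).
vocab = ["acme", "london", "alice", "bob", "carol", "bought", "shares"]
embeddings = {tok: rng.normal(size=dim) for tok in vocab}


def poison_embeddings(table, trigger_tokens, target_class_tokens, noise_scale=0.01):
    """Copy the table, replacing each trigger token's vector with the
    centroid of the target-class tokens plus small noise."""
    poisoned = dict(table)
    centroid = np.mean([table[t] for t in target_class_tokens], axis=0)
    for tok in trigger_tokens:
        poisoned[tok] = centroid + noise_scale * rng.normal(size=dim)
    return poisoned


# Illustrative attack goal: make the company name "acme" look like a PERSON
# token to any NER model fine-tuned on top of these embeddings.
poisoned = poison_embeddings(
    embeddings,
    trigger_tokens=["acme"],
    target_class_tokens=["alice", "bob", "carol"],
)


def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


print("acme vs alice (clean):   ", round(cos(embeddings["acme"], embeddings["alice"]), 3))
print("acme vs alice (poisoned):", round(cos(poisoned["acme"], poisoned["alice"]), 3))
```

Measuring how much the per-class accuracy of the fine-tuned NER model drops after such a substitution is the kind of severity analysis the abstract refers to.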