Understanding Readability of Large Language Models Output: An Empirical Analysis
Venue & metadata
- Journal/Proceedings: Procedia Computer Science
- Volume: 246
- Number: C
- Pages: 5273–5282
- Note: Cited by: 2; All Open Access, Gold Open Access
- Author keywords: Artificial vs Hand-crafted generation; Generative AI; LLMs; Readability; Text Evaluation Metrics; Text Generation; Text Quality Indexes
Abstract
Recently, Large Language Models (LLMs) have made impressive leaps, becoming able to accomplish a range of tasks, from text completion to powerful chatbots. The great variety of available LLMs and the fast pace of technological innovation in this field make LLM assessment a hard task: it is of paramount importance to understand not only what such systems generate but also the quality of their results. In general, the quality of a synthetically generated object can refer to the reliability of the content, or to the lexical variety or coherence of the text. Regarding the quality of text generation, one aspect that has not yet been adequately discussed is the readability of textual artefacts. This work focuses on that aspect, proposing a set of experiments aimed at better understanding and evaluating the degree of readability of texts automatically generated by an LLM. The analysis is performed through an empirical study based on: a subset of five pre-trained LLMs; a pool of English text generation tasks of increasing difficulty, assigned to each of the models; and a set of the most popular readability indexes available in the computational linguistics literature. Readability indexes are computed for each model to provide a first perspective on how the readability of artificially generated text can vary among different models and under different user requirements. The results obtained by evaluating and comparing different models provide interesting insights, especially into the responsible use of these tools by beginners and less experienced practitioners. © 2024 The Authors.
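The abstract does not name the readability indexes used in the study. As an illustration only, the sketch below computes two widely used ones, Flesch Reading Ease and Flesch-Kincaid Grade Level, from sentence, word, and syllable counts. The function names are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed here for brevity; it is not the authors' implementation.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): one syllable per vowel group, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) using simple regex tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    n_sent, n_words, n_syll = text_stats(text)
    if n_sent == 0 or n_words == 0:
        raise ValueError("text must contain at least one sentence and one word")
    return 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (n_syll / n_words)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed to read the text."""
    n_sent, n_words, n_syll = text_stats(text)
    if n_sent == 0 or n_words == 0:
        raise ValueError("text must contain at least one sentence and one word")
    return 0.39 * (n_words / n_sent) + 11.8 * (n_syll / n_words) - 15.59


if __name__ == "__main__":
    # Hypothetical stand-ins for outputs produced by two different models.
    outputs = {
        "model_a": "The cat sat on the mat.",
        "model_b": "Computational linguistics investigates readability.",
    }
    for name, text in outputs.items():
        print(name, round(flesch_reading_ease(text), 2), round(flesch_kincaid_grade(text), 2))
```

Scoring each model's output on the same prompts with functions like these would give one comparable number per model and task, which is the kind of comparison the abstract describes. A maintained library would be a more robust choice than this heuristic for real measurements.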
Keywords
Artificial vs hand-crafted generation; Evaluation metrics; Generative AI; Language model; Large language model; Quality indices; Readability; Text evaluation; Text evaluation metric; Text generations; Text qualities; Text quality index; Computational linguistics
Suggested citation
Marulli, F., Campanile, L., de Biase, M. S., Marrone, S., Verde, L., & Bifulco, M. (2024). Understanding Readability of Large Language Models Output: An Empirical Analysis [Conference paper]. Procedia Computer Science, 246(C), 5273–5282. https://doi.org/10.1016/j.procs.2024.09.636