Understanding Readability of Large Language Models Output: An Empirical Analysis

Conference paper · Marulli, Fiammetta; Campanile, Lelio; de Biase, Maria Stella; Marrone, Stefano; Verde, Laura; Bifulco, Marianna · 2024 · Procedia Computer Science

Venue & metadata

  • Journal/Proceedings: Procedia Computer Science
  • Volume: 246
  • Number: C
  • Pages: 5273–5282
  • Note: Cited by: 2; All Open Access, Gold Open Access
  • Author keywords: Artificial vs Hand-crafted generation; Generative AI; LLMs; Readability; Text Evaluation Metrics; Text Generation; Text Quality Indexes

Abstract

Recently, Large Language Models (LLMs) have made impressive leaps, gaining the ability to accomplish a range of tasks, from text completion to powering chatbots. The great variety of available LLMs and the fast pace of technological innovation in this field are making LLM assessment a hard task: understanding not only what such systems generate but also the quality of their results is of paramount importance. Generally, the quality of a synthetically generated object can refer to the reliability of the content or to the lexical variety or coherence of the text. Regarding the quality of text generation, one aspect that has not yet been adequately discussed is the readability of textual artefacts. This work focuses on that aspect, proposing a set of experiments aimed at better understanding and evaluating the degree of readability of texts automatically generated by an LLM. The analysis is an empirical study based on: a subset of five pre-trained LLMs; a pool of English text generation tasks of increasing difficulty, assigned to each of the models; and a set of the most popular readability indexes available in the computational linguistics literature. Readability indexes are computed for each model to provide a first perspective on how the readability of artificially generated textual content can vary among different models and under different user requirements. The results obtained by evaluating and comparing different models provide interesting insights, especially into the responsible use of these tools by both beginners and practitioners with limited experience. © 2024 The Authors.
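
As an illustration of the kind of measurement the abstract describes, the following is a minimal sketch of scoring model outputs with two widely used readability formulas, Flesch Reading Ease and Flesch-Kincaid Grade Level. The abstract does not name the indexes the authors used, so the choice of formulas here is an assumption, as are the function names, the vowel-group syllable heuristic, and the two sample texts standing in for model outputs.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text: str) -> dict:
    """Compute Flesch Reading Ease and Flesch-Kincaid Grade Level for a text."""
    # Naive sentence and word splitting; real studies typically use a proper tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        # Higher score means easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Approximate US school grade needed to understand the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


# Hypothetical texts standing in for the outputs of two different models.
outputs = {
    "model_a": "The cat sat on the mat. It was warm.",
    "model_b": "Feline organisms habitually occupy comfortable horizontal surfaces.",
}

scores = {name: readability(text) for name, text in outputs.items()}
for name, result in scores.items():
    print(name, {key: round(value, 2) for key, value in result.items()})
```

Comparing the per-model scores for the same prompt gives a rough view of how readability differs across models. The syllable heuristic is deliberately crude, so absolute values should be treated as indicative only.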

Keywords

Artificial vs hand-crafted generation; Evaluation metrics; Generative AI; Language model; Large language model; Quality indices; Readability; Text evaluation; Text evaluation metric; Text generations; Text qualities; Text quality index; Computational linguistics

Links & artifacts

DOI: https://doi.org/10.1016/j.procs.2024.09.636

Suggested citation

Marulli, F., Campanile, L., de Biase, M. S., Marrone, S., Verde, L., & Bifulco, M. (2024). Understanding Readability of Large Language Models Output: An Empirical Analysis [Conference paper]. Procedia Computer Science, 246(C), 5273–5282. https://doi.org/10.1016/j.procs.2024.09.636
