Understanding Readability of Large Language Models Output: An Empirical Analysis


Conference Fiammetta Marulli, Lelio Campanile, Maria Stella de Biase, Stefano Marrone, Laura Verde, Marianna Bifulco · 2024 · Procedia Computer Science

Venue & metadata

Journal/Proceedings: Procedia Computer Science
Volume: 246
Number: C
Pages: 5273 – 5282
Note: Cited by: 2; All Open Access, Gold Open Access
Author keywords: Artificial vs Hand-crafted generation; Generative AI; LLMs; Readability; Text Evaluation Metrics; Text Generation; Text Quality Indexes

Abstract

Recently, Large Language Models (LLMs) have made impressive leaps, becoming able to accomplish a range of tasks, from text completion to powerful chatbots. The great variety of available LLMs and the fast pace of technological innovation in this field are making LLM assessment a hard task: understanding not only what such systems generate but also the quality of their results is of paramount importance. Generally, the quality of a synthetically generated object can refer to the reliability of the content or to the lexical variety or coherence of the text. Regarding the quality of text generation, one aspect that has not yet been adequately discussed is the readability of textual artefacts. This work focuses on that aspect, proposing a set of experiments aimed at better understanding and evaluating the degree of readability of texts automatically generated by an LLM. The analysis is performed through an empirical study that considers a subset of five pre-trained LLMs, assigns each model a pool of English text generation tasks of increasing difficulty, and computes a set of the most popular readability indexes available in the computational linguistics literature. Readability indexes are computed for each model to provide a first perspective on how the readability of artificially generated textual content can vary among different models and under different user requirements. The results obtained by evaluating and comparing different models provide interesting insights, especially into the responsible use of these tools by both beginners and not overly experienced practitioners. © 2024 The Authors.
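The abstract does not name the specific readability indexes used, so the following is only an illustrative sketch of what computing such indexes for a piece of generated text can look like. It assumes three widely used formulas (Flesch Reading Ease, Flesch-Kincaid Grade Level, and the Automated Readability Index); whether the paper uses these exact indexes is an assumption. The helper names (`count_syllables`, `text_stats`, `readability`) are hypothetical, and the syllable counter is a rough vowel-group heuristic rather than a dictionary-based one.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> dict:
    """Count sentences, words, syllables, and letters in an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    return {
        "sentences": len(sentences),
        "words": len(words),
        "syllables": sum(count_syllables(w) for w in words),
        "letters": sum(ch.isalpha() for w in words for ch in w),
    }


def readability(text: str) -> dict:
    """Compute three common readability indexes (assumed, not the paper's list)."""
    s = text_stats(text)
    words_per_sentence = s["words"] / s["sentences"]
    syllables_per_word = s["syllables"] / s["words"]
    letters_per_word = s["letters"] / s["words"]
    return {
        # Higher score means easier text.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Approximate US school grade needed to read the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
        # Character-based index, so it needs no syllable counting.
        "automated_readability_index": 4.71 * letters_per_word
        + 0.5 * words_per_sentence
        - 21.43,
    }


if __name__ == "__main__":
    # Hypothetical model outputs; in a study like the one described,
    # these would be the texts generated by each LLM for each task.
    outputs = {
        "model_a": "The cat sat on the mat.",
        "model_b": "Comprehensive evaluation necessitates sophisticated "
                   "methodological considerations.",
    }
    for name, generated in outputs.items():
        scores = readability(generated)
        print(name, {k: round(v, 2) for k, v in scores.items()})
```

Running the same scoring function over every model's output for every task gives one row of index values per model and task, which is the kind of comparison the abstract describes. A production setup would likely rely on a maintained readability library and a proper sentence tokenizer instead of these regular expressions.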

Keywords

Artificial vs hand-crafted generation; Evaluation metrics; Generative AI; Language model; Large language model; Quality indices; Readability; Text evaluation; Text evaluation metric; Text generations; Text qualities; Text quality index; Computational linguistics

Links & artifacts

DOI: https://doi.org/10.1016/j.procs.2024.09.636

@conference{Marulli20245273,
  author = {Marulli, Fiammetta and Campanile, Lelio and de Biase, Maria Stella and Marrone, Stefano and Verde, Laura and Bifulco, Marianna},
  title = {Understanding Readability of Large Language Models Output: An Empirical Analysis},
  year = {2024},
  journal = {Procedia Computer Science},
  volume = {246},
  number = {C},
  pages = {5273--5282},
  doi = {10.1016/j.procs.2024.09.636}
}

Suggested citation

Marulli, F., Campanile, L., de Biase, M. S., Marrone, S., Verde, L., & Bifulco, M. (2024). Understanding Readability of Large Language Models Output: An Empirical Analysis [Conference paper]. Procedia Computer Science, 246(C), 5273–5282. https://doi.org/10.1016/j.procs.2024.09.636
