# Topic: Readability

Keywords: Authorship analysis, Readability, Text analysis, Text authorship, Text classification, Text evaluation, Text evaluation metric, Text generation, Text processing, Text quality, Text quality index

2025

  1. Campanile, L., de Biase, M. S., & Marulli, F. (2025). Edge-Cloud Distributed Approaches to Text Authorship Analysis: A Feasibility Study [Book chapter]. Lecture Notes on Data Engineering and Communications Technologies, 250, 284–293. https://doi.org/10.1007/978-3-031-87778-0_28
    Abstract
    Automatic authorship analysis, often referred to as stylometry, is a captivating yet contentious field that employs computational techniques to determine the authorship of textual artefacts. In recent years, the importance of author profiling has grown significantly due to the proliferation of automatic text generation systems. These include both early-generation bots and the latest generative AI-based models, which have heightened concerns about misinformation and content authenticity. This study proposes a novel approach to evaluate the feasibility and effectiveness of contemporary distributed learning methods. The approach leverages the computational advantages of distributed systems while preserving the privacy of human contributors, enabling the collection and analysis of extensive datasets of “human-written” texts in contrast to those generated by bots. More specifically, the proposed method adopts a Federated Learning (FL) framework, integrating readability and stylometric metrics to deliver a privacy-preserving solution for Authorship Attribution (AA). The primary objective is to enhance the accuracy of AA processes, thus achieving a more robust “authorial fingerprint”. Experimental results reveal that while FL effectively protects privacy and mitigates data exposure risks, the combined use of readability and stylometric features significantly increases the accuracy of AA. This approach demonstrates promise for secure and scalable AA applications, particularly in privacy-sensitive contexts and real-time edge computing scenarios. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
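
As a rough illustration of how readability and stylometric metrics can be combined into a single feature vector for authorship attribution, here is a minimal Python sketch. The feature set (average word and sentence length, type-token ratio, long-word ratio) is an illustrative assumption, not the chapter's actual pipeline.

```python
import re

def author_features(text: str) -> dict:
    """Toy extractor mixing stylometric and readability-style cues.
    The specific features are assumptions for illustration only."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    n_sents = max(len(sentences), 1)
    return {
        # stylometric cues: word/sentence shape and lexical richness
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "avg_sent_len": n_words / n_sents,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        # readability-style cue: share of long words (a crude proxy)
        "long_word_ratio": sum(len(w) > 6 for w in words) / n_words,
    }

print(author_features("The quick brown fox jumps over the lazy dog. It runs away."))
```

A vector of this kind would feed the classifier trained under the federated scheme the chapter describes.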

2024

  1. Marulli, F., Campanile, L., de Biase, M. S., Marrone, S., Verde, L., & Bifulco, M. (2024). Understanding Readability of Large Language Models Output: An Empirical Analysis [Conference paper]. Procedia Computer Science, 246(C), 5273–5282. https://doi.org/10.1016/j.procs.2024.09.636
    Abstract
    Recently, Large Language Models (LLMs) have made impressive leaps, gaining the ability to accomplish tasks ranging from text completion to powering chatbots. The great variety of available LLMs and the fast pace of technological innovation in this field make LLM assessment a hard task: understanding not only what such systems generate but also the quality of their results is of paramount importance. Generally, the quality of a synthetically generated object can refer to the reliability of its content or to the lexical variety and coherence of the text. One aspect of text generation quality that has not yet been adequately discussed is the readability of the generated textual artefacts. This work focuses on that aspect, proposing a set of experiments aimed at better understanding and evaluating the readability of texts automatically generated by an LLM. The analysis is performed through an empirical study based on: a subset of five pre-trained LLMs; a pool of English text generation tasks of increasing difficulty, assigned to each of the models; and a set of the most popular readability indexes from the computational linguistics literature. Readability indexes are computed for each model to provide a first perspective on how the readability of artificially generated textual content varies among models and under different user requirements. The results obtained by evaluating and comparing the different models provide interesting insights, especially regarding the responsible use of these tools by beginners and less experienced practitioners. © 2024 The Authors.
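
Among the popular readability indexes the paper refers to, the Flesch Reading Ease score is a standard example. A minimal sketch, using a naive vowel-group syllable heuristic (the heuristic and sample text are illustrative assumptions, not the paper's implementation):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count contiguous vowel groups (assumed, not the paper's method)
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    Higher scores indicate easier text."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()] or [text]
    syllables = sum(count_syllables(w) for w in words)
    n_words = max(len(words), 1)
    return 206.835 - 1.015 * (n_words / len(sentences)) - 84.6 * (syllables / n_words)

print(round(flesch_reading_ease("The cat sat on the mat. It purred softly."), 1))
```

Comparing such scores across models and prompts yields the kind of per-model readability profile the study builds.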

2021

  1. Marulli, F., Balzanella, A., Campanile, L., Iacono, M., & Mastroianni, M. (2021). Exploring a Federated Learning Approach to Enhance Authorship Attribution of Misleading Information from Heterogeneous Sources [Conference paper]. Proceedings of the International Joint Conference on Neural Networks, 2021-July. https://doi.org/10.1109/IJCNN52387.2021.9534377
    Abstract
    Authorship Attribution (AA) is currently used in several applications, among them fraud detection and anti-plagiarism checks; the task can leverage stylometry and Natural Language Processing techniques. In this work, we explored strategies to enhance the performance of an AA task for the automatic detection of false and misleading information (e.g., fake news). We set up a stylometry-based text classification model for AA exploiting recurrent deep neural networks, and implemented two learning tasks trained on the same collection of fake and real news, comparing their performances: one based on a Federated Learning architecture, the other on a centralized architecture. The goal was to discriminate potentially fake information from true information when the fake news comes from heterogeneous sources with different styles. Preliminary experiments show that the distributed approach significantly improves recall with respect to the centralized model. As expected, precision was lower in the distributed model. This aspect, coupled with the statistical heterogeneity of the data, represents an open issue that will be further investigated in future work. © 2021 IEEE.
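
The abstract does not specify the aggregation scheme; a typical choice in such Federated Learning setups is Federated Averaging (FedAvg), sketched below with NumPy. The client weights and dataset sizes are hypothetical, assumed purely for illustration.

```python
import numpy as np

def fedavg(client_weights: list[np.ndarray], client_sizes: list[int]) -> np.ndarray:
    """One FedAvg round: average client model weights, weighted by
    local dataset size. Standard FedAvg, assumed for illustration."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three heterogeneous news sources contribute local model updates.
clients = [np.array([0.2, 1.0]), np.array([0.4, 0.8]), np.array([0.1, 1.2])]
sizes = [100, 300, 50]
print(fedavg(clients, sizes))  # weighted global model parameters
```

Statistical heterogeneity across sources, noted as an open issue in the paper, is precisely what makes this weighted average diverge from any single client's optimum.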
