Publications tagged with Natural language processing
- Campanile, L., Zona, R., Perfetti, A., & Rosatelli, F. (2025). An AI-Driven Methodology for Patent Evaluation in the IoT Sector: Assessing Relevance and Future Impact [Conference paper]. International Conference on Internet of Things, Big Data and Security, IoTBDS - Proceedings, 501–508. https://doi.org/10.5220/0013519700003944
Abstract
The rapid expansion of the Internet of Things has led to a surge in patent filings, creating challenges in evaluating their relevance and potential impact. Traditional patent assessment methods, relying on manual review and keyword-based searches, are increasingly inadequate for analyzing the complexity of emerging IoT technologies. In this paper, we propose an AI-driven methodology for patent evaluation that leverages Large Language Models and machine learning techniques to assess patent relevance and estimate future impact. Our framework integrates advanced Natural Language Processing techniques with structured patent metadata to establish a systematic approach to patent analysis. The methodology consists of three key components: (1) feature extraction from patent text using LLM embeddings and conventional NLP methods, (2) relevance classification and clustering to identify emerging technological trends, and (3) an initial formulation of impact estimation based on semantic similarity and citation patterns. While this study focuses primarily on defining the methodology, we include a minimal validation on a sample dataset to illustrate its feasibility and potential. The proposed approach lays the groundwork for a scalable, automated patent evaluation system, with future research directions aimed at refining impact prediction models and expanding empirical validation. Copyright © 2025 by SCITEPRESS - Science and Technology Publications, Lda.
- Campanile, L., de Biase, M. S., Marrone, S., Marulli, F., Raimondo, M., & Verde, L. (2022). Sensitive Information Detection Adopting Named Entity Recognition: A Proposed Methodology [Conference paper]. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 13380 LNCS, 377–388. https://doi.org/10.1007/978-3-031-10542-5_26
Abstract
Protecting and safeguarding privacy has become increasingly important, especially in recent years. The growing possibilities of acquiring and sharing personal information and data through digital devices and platforms, such as apps or social networks, have increased the risks of privacy breaches. In order to effectively respect and guarantee the privacy and protection of sensitive information, it is necessary to develop mechanisms capable of providing such guarantees automatically and reliably. In this paper we propose a methodology able to automatically recognize sensitive data. A Named Entity Recognition model was used to identify appropriate entities. Recognition of these entities is improved by evaluating the words contained in an appropriate context window and assessing their similarity to words in a domain taxonomy. This makes it possible to refine the labels of the categories recognized by a generic Named Entity Recognition model. A preliminary evaluation of the reliability of the proposed approach was performed. Specifically, texts of juridical documents written in Italian were analyzed. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
- Campanile, L., de Biase, M. S., Marrone, S., Raimondo, M., & Verde, L. (2022). On the Evaluation of BDD Requirements with Text-based Metrics: The ETCS-L3 Case Study [Conference paper]. Smart Innovation, Systems and Technologies, 309, 561–571. https://doi.org/10.1007/978-981-19-3444-5_48
Abstract
A proper requirement definition phase is of paramount importance in software engineering. It is the first and primary means to realize efficient and reliable systems. System requirements are usually formulated and expressed in natural language, given its universality and ease of communication and writing. Unfortunately, natural language can be a source of ambiguity, complexity and omissions, which may cause system failures. Among the different approaches proposed by the software engineering community, Behaviour-Driven Development (BDD) is establishing itself as a valid, practical method to structure effective and non-ambiguous requirement specifications. The paper tackles the problem of measuring requirements in BDD by assessing some traditional Natural Language Processing-related metrics with respect to a sample excerpt of requirement specification rewritten according to the BDD criteria. This preliminary assessment is made on the ERTMS-ETCS Level 3 case study whose specification, to date, is not managed by a standardisation body. The paper demonstrates the necessity of novel metrics able to cope with the BDD specification paradigms. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
- Marulli, F., Verde, L., & Campanile, L. (2021). Exploring data and model poisoning attacks to deep learning-based NLP systems [Conference paper]. Procedia Computer Science, 192, 3570–3579. https://doi.org/10.1016/j.procs.2021.09.130
Abstract
Natural Language Processing (NLP) has recently been explored for its application in supporting the detection of malicious activities and objects. Furthermore, NLP and Deep Learning have themselves become targets of malicious attacks. Very recent research has shown that adversarial attacks can also affect NLP tasks, in addition to the better-known adversarial attacks on deep learning systems for image processing tasks. More precisely, while small perturbations applied to the data set adopted for training typical NLP tasks (e.g., Part-of-Speech Tagging, Named Entity Recognition, etc.) can be easily recognized, model poisoning, performed by means of altered data models typically provided to a deep neural network in the transfer learning phase (e.g., poisoning attacks via word embeddings), is harder to detect. In this work, we preliminarily explore the effectiveness of a poisoned word embeddings attack aimed at a deep neural network trained to accomplish a Named Entity Recognition (NER) task. By adopting the NER case study, we aimed to analyze how severely this kind of attack affects accuracy in recognizing the right classes for the given entities. Finally, this study represents a preliminary step toward assessing the impact and the vulnerabilities of some NLP systems we adopt in our research activities, and toward further investigating potential mitigation strategies, in order to make these systems more resilient to data and model poisoning attacks. © 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0). Peer-review under responsibility of the scientific committee of KES International.
- Marulli, F., Balzanella, A., Campanile, L., Iacono, M., & Mastroianni, M. (2021). Exploring a Federated Learning Approach to Enhance Authorship Attribution of Misleading Information from Heterogeneous Sources [Conference paper]. Proceedings of the International Joint Conference on Neural Networks, 2021-July. https://doi.org/10.1109/IJCNN52387.2021.9534377
Abstract
Authorship Attribution (AA) is currently applied in several applications, including fraud detection and anti-plagiarism checks: this task can leverage stylometry and Natural Language Processing techniques. In this work, we explored some strategies to enhance the performance of an AA task for the automatic detection of false and misleading information (e.g., fake news). We set up a text classification model for AA based on stylometry exploiting recurrent deep neural networks and implemented two learning tasks trained on the same collection of fake and real news, comparing their performance: one is based on a Federated Learning architecture, the other on a centralized architecture. The goal was to discriminate potentially fake information from true information when the fake news comes from heterogeneous sources, with different styles. Preliminary experiments show that a distributed approach significantly improves recall with respect to the centralized model. As expected, precision was lower in the distributed model. This aspect, coupled with the statistical heterogeneity of the data, represents an open issue that will be further investigated in future work. © 2021 IEEE.