Topic: speech
Published:
2024
- Verde, L., Marulli, F., De Fazio, R., Campanile, L., & Marrone, S. (2024). HEAR set: A ligHtwEight acoustic paRameters set to assess mental health from voice analysis [Article]. Computers in Biology and Medicine, 182. https://doi.org/10.1016/j.compbiomed.2024.109021
Abstract
Background: Voice analysis has significant potential in aiding healthcare professionals with detecting, diagnosing, and personalising treatment. It represents an objective and non-intrusive tool for supporting the detection and monitoring of specific pathologies. By calculating various acoustic features, voice analysis extracts valuable information to assess voice quality. The choice of these parameters is crucial for an accurate assessment. Method: In this paper, we propose a lightweight acoustic parameter set, named HEAR, able to evaluate voice quality to assess mental health. In detail, this consists of jitter, spectral centroid, Mel-frequency cepstral coefficients, and their derivatives. The choice of parameters for the proposed set was influenced by the explainable significance of each acoustic parameter in the voice production process. Results: The reliability of the proposed acoustic set to detect the early symptoms of mental disorders was evaluated in an experimental phase. Voices of subjects suffering from different mental pathologies, selected from available databases, were analysed. The performance obtained from the HEAR features was compared with that obtained by analysing features selected from toolkits widely used in the literature, as well as with those obtained using learned procedures. The best performance in terms of MAE and RMSE was achieved for the detection of depression (5.32 and 6.24, respectively). For the detection of psychogenic dysphonia and anxiety, the highest accuracy rates were about 75 % and 97 %, respectively. Conclusions: The comparative evaluation was carried out to assess the performance of the proposed approach, demonstrating a reliable capability to highlight affective physiological alterations of voice quality due to the considered mental disorders. © 2024 The Author(s)
2023
- Campanile, L., de Fazio, R., Di Giovanni, M., Marrone, S., Marulli, F., & Verde, L. (2023). Inferring Emotional Models from Human-Machine Speech Interactions [Conference paper]. Procedia Computer Science, 225, 1241–1250. https://doi.org/10.1016/j.procs.2023.10.112
Abstract
Human-Machine Interfaces (HMIs) are becoming increasingly important in a hyper-connected society. Traditional HMIs are built considering cognitive features, while emotional ones are often neglected, which sometimes leads to misuse of such interfaces. As part of a long-running research effort oriented to the definition of an HMI engineering approach, this paper proposes a method to build an emotion-aware explicit model of the user, starting from the behaviour of the human with a virtual agent. The paper also proposes an instance of this model inference process in voice assistants in an automatic depression context, which can constitute the core phase in realizing a Human Digital Twin of a patient. The case study generated a model composed of Fluid Stochastic Petri Net sub-models, obtained after data analysis by a Support Vector Machine. © 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
2022
- Verde, L., Campanile, L., Marulli, F., & Marrone, S. (2022). Speech-based Evaluation of Emotions-Depression Correlation. Proceedings of the 2022 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress, DASC/PiCom/CBDCom/CyberSciTech 2022. https://doi.org/10.1109/DASC/PiCom/CBDCom/Cy55231.2022.9927758
Abstract
Early detection of depression symptoms is fundamental to limit the onset of further associated behavioural disorders, such as psychomotor or social withdrawal. The combination of Artificial Intelligence and speech analysis has revealed the existence of objectively measurable physical manifestations for early detection of depressive symptoms, constituting a valid support to evaluate these signals. To advance the state of the art, the aim of this paper is to understand quantitative correlations between emotional states and depression by proposing a study across different datasets containing speech of both depressed and non-depressed people as well as emotion-related samples. The relationship between affective measures and depression can, in fact, support the evaluation of the presence of a depressive state. This work constitutes a preliminary step of a study whose final aim is to pursue AI-powered personalized medicine by building sophisticated Clinical Decision Support Systems for depression, as well as other psychological disorders. © 2022 IEEE.
2021
- Marulli, F., Verde, L., & Campanile, L. (2021). Exploring data and model poisoning attacks to deep learning-based NLP systems [Conference paper]. Procedia Computer Science, 192, 3570–3579. https://doi.org/10.1016/j.procs.2021.09.130
Abstract
Natural Language Processing (NLP) has recently been explored for its application in supporting the detection of malicious activities and objects. Furthermore, NLP and Deep Learning have themselves become targets of malicious attacks. Very recent research has shown that adversarial attacks can also affect NLP tasks, in addition to the more popular adversarial attacks on deep learning systems for image processing tasks. More precisely, while small perturbations applied to the data set adopted for training typical NLP tasks (e.g., Part-of-Speech Tagging, Named Entity Recognition, etc.) could be easily recognized, model poisoning, performed by means of altered data models typically provided in the transfer learning phase to a deep neural network (e.g., poisoning attacks by word embeddings), is harder to detect. In this work, we preliminarily explore the effectiveness of a poisoned word embeddings attack aimed at a deep neural network trained to accomplish a Named Entity Recognition (NER) task. By adopting the NER case study, we aimed to analyze the severity of this kind of attack on the accuracy in recognizing the right classes for the given entities. Finally, this study represents a preliminary step to assess the impact and the vulnerabilities of some NLP systems we adopt in our research activities, and to further investigate some potential mitigation strategies, in order to make these systems more resilient to data and model poisoning attacks. © 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0) Peer-review under responsibility of the scientific committee of KES International.
