Posts by Collection

portfolio

publications

Poli2Sum@ CL-SciSumm-19: Identify, Classify, and Summarize Cited Text Spans by means of Ensembles of Supervised Models

Published in 4th Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2019) @ SIGIR 2019 (Vol. 2414, pp. 233–246), 2019

This paper presents the Poli2Sum approach to the 5th Computational Linguistics Scientific Document Summarization Shared Task (BIRNDL CL-SciSumm 2019).

Recommended citation: La Quatra, M., Cagliero, L., & Baralis, E. (2019). Poli2Sum@CL-SciSumm-19: Identify, Classify, and Summarize Cited Text Spans by means of Ensembles of Supervised Models. In 4th Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2019) @ SIGIR 2019 (Vol. 2414, pp. 233–246). http://ceur-ws.org/Vol-2414/paper24.pdf

Combining Machine Learning and Natural Language Processing for Language-Specific, Multi-Lingual, and Cross-Lingual Text Summarization: A Wide-Ranging Overview

Published in Trends and Applications of Text Summarization Techniques, 2019

Recent advances in multimedia and web-based applications have eased access to large collections of textual documents. To automate document analysis, the research community has devoted considerable effort to extracting short summaries of the document content.

Recommended citation: Cagliero, Luca, Paolo Garza, and Moreno La Quatra. "Combining Machine Learning and Natural Language Processing for Language-Specific, Multi-Lingual, and Cross-Lingual Text Summarization: A Wide-Ranging Overview." Trends and Applications of Text Summarization Techniques. IGI Global, 2020. 1-31. https://www.igi-global.com/chapter/combining-machine-learning-and-natural-language-processing-for-language-specific-multi-lingual-and-cross-lingual-text-summarization/235739

From Hotel Reviews to City Similarities: A Unified Latent-Space Model

Published in Electronics, 9(1), 2020

In the context of hospitality management, a challenging research problem is to identify effective strategies to explain hotel reviews and ratings and their correlation with the urban context. Within this scope, the paper investigates the use of sentence-based embedding models to explore the similarities and dissimilarities between cities in terms of the corresponding hotel reviews and the surrounding points of interest.

Recommended citation: Cagliero, L.; La Quatra, M.; Apiletti, D. From Hotel Reviews to City Similarities: A Unified Latent-Space Model. Electronics 2020, 9, 197. https://www.mdpi.com/2079-9292/9/1/197

Exploiting pivot words to classify and summarize discourse facets of scientific papers

Published in Scientometrics (2020), 2020

This paper proposes a new, more effective solution to the CL-SciSumm discourse facet classification task, which entails identifying for each cited text span what facet of the paper it belongs to from a predefined set of facets.

Recommended citation: La Quatra, M., Cagliero, L. & Baralis, E. Exploiting pivot words to classify and summarize discourse facets of scientific papers. Scientometrics (2020). https://doi.org/10.1007/s11192-020-03532-3

Extracting Highlights of Scientific Articles: a Supervised Summarization Approach

Published in Expert Systems With Applications (2020), 2020

This paper presents a supervised approach, based on regression techniques, with the twofold aim of automatically extracting highlights of past articles with missing annotations and simplifying the process of manually annotating new articles.

Recommended citation: Cagliero L. & La Quatra M., Extracting Highlights of Scientific Articles: a Supervised Summarization Approach, Expert Systems with Applications, 2020, 113659, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2020.113659

End-to-end Training For Financial Report Summarization

Published in 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation, 2020

The proposed methodology exploits advancements in the Natural Language Understanding field to create a fine-tuned architecture able to summarize financial documents.

Recommended citation: La Quatra, M., & Cagliero, L. (2020, December). End-to-end Training For Financial Report Summarization. In Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation (pp. 118-123). https://www.aclweb.org/anthology/2020.fnp-1.20/

Summarize Dates First: A Paradigm Shift in Timeline Summarization

Published in ACM SIGIR 2021, 2021

Timeline summarization aims at presenting long news stories in a compact manner. This paper proposes a new approach, namely Summarize Dates First, which first generates date-level summaries and then selects the most relevant dates on top of the summarized knowledge. In the latter stage, it performs date aggregations to consider high-level temporal references as well.

Recommended citation: Moreno La Quatra, Luca Cagliero, Elena Baralis, Alberto Messina, and Maurizio Montagnuolo. 2021. Summarize Dates First: A Paradigm Shift in Timeline Summarization. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21). Association for Computing Machinery, New York, NY, USA, 418–427. DOI: https://doi.org/10.1145/3404835.3462954

Automatic slides generation in the absence of training data

Published in IEEE COMPSAC 2021, 2021

The aim of this paper is to use unsupervised summarization methods to generate sentence-level summaries of the paper sections, which are then refined by applying an optimization step. It evaluates the quality of the output slides by taking into account the original paper structure as well. The results, achieved on a benchmark collection of papers and slides, show that unsupervised models performed better than supervised ones on specific paper facets.

Recommended citation: L. Cagliero and M. La Quatra, "Automatic slides generation in the absence of training data," 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC), 2021, pp. 103-108, doi: 10.1109/COMPSAC51774.2021.00025. https://doi.org/10.1109/COMPSAC51774.2021.00025

Leveraging full-text article exploration for citation analysis

Published in Springer Scientometrics, 2021

This paper proposes a classification approach to automatically predict whether the cited snippets in the full text of the paper contain a significant amount of new content beyond the abstract and title. The proposed approach could support researchers in leveraging full-text article exploration for citation analysis. The experiments conducted on real scientific articles show promising results: the classifier has a 90% chance of correctly distinguishing between the full-text exploration case and the title-and-abstract-only case.

Recommended citation: La Quatra, M., Cagliero, L. & Baralis, E. Leveraging full-text article exploration for citation analysis. Scientometrics (2021). https://doi.org/10.1007/s11192-021-04117-4

Inferring Multilingual Domain-Specific Word Embeddings From Large Document Corpora

Published in IEEE Access (2021), 2021

This paper proposes a new methodology to automatically infer aligned domain-specific word embeddings for a target language on the basis of the general-purpose and domain-specific models available for a source language (typically, English). The proposed inference method relies on a two-step process, which first automatically identifies domain-specific words and then opportunistically reuses the non-linear space transformations applied to the word vectors of the source language.

Recommended citation: L. Cagliero and M. La Quatra, "Inferring Multilingual Domain-Specific Word Embeddings From Large Document Corpora," in IEEE Access, vol. 9, pp. 137309-137321, 2021, doi: 10.1109/ACCESS.2021.3118093. https://doi.org/10.1109/ACCESS.2021.3118093

talks

Poster presentation BIRNDL 2019

Supervised models are trained on a variety of data features related to the structure, semantics, and syntax of the text. The underlying idea is to effectively explore the latent connections between the citing context and the sentences in the reference paper.

Using Regression Models to pinpoint Relevant Content in Research Papers

Thanks to the worldwide diffusion of web-based applications, digital libraries are playing a fundamental role in giving access to research papers, thus allowing researchers to disseminate their main research findings. Our work focuses on automatically extracting the sentences that best summarize the main topics and findings of a research manuscript.

PhD project presentation

Using a video presentation, I show some of the research trends investigated during my PhD studies. I give a very high-level overview of the multilingual and timeline summarization tasks that we address by using NLP and Deep Learning.

Poster presentation @ FNS 2020

The summarization architecture proposed for the FNS 2020 shared task is based on a three-phase process.

  • Preprocessing step: clean the input financial reports and annotate their content at the sentence level.
  • Training step: deep learning models are fine-tuned for the regression task by exploiting the annotations obtained during the preprocessing step.
  • Evaluation phase: applied at the document level. The sentences of each annual report make a forward pass through the fine-tuned model to obtain their estimated relevance scores. The final summary merges sentences according to the relevance scores predicted by the fine-tuned architecture.
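
The evaluation phase above can be illustrated with a minimal sketch. It is not the system submitted to FNS 2020: the names `build_summary` and `toy_score`, the word budget, and the greedy selection rule are all assumptions made for illustration, and `toy_score` is only a keyword-based stand-in for the fine-tuned regression model.

```python
from typing import Callable, List, Tuple


def toy_score(sentence: str) -> float:
    """Hypothetical stand-in for the fine-tuned regressor: keyword density."""
    keywords = {"revenue", "profit"}
    words = sentence.lower().replace(".", "").split()
    return sum(word in keywords for word in words) / len(words) if words else 0.0


def build_summary(
    sentences: List[str],
    score_fn: Callable[[str], float],
    max_words: int,
) -> List[str]:
    """Greedily pick the highest-scoring sentences within a word budget.

    The word budget and the greedy rule are assumptions, not details
    taken from the original description.
    """
    # Score every sentence once (the "forward pass" of the evaluation phase).
    scored: List[Tuple[int, float]] = [
        (idx, score_fn(sentence)) for idx, sentence in enumerate(sentences)
    ]
    # Rank by predicted relevance, highest first; sorted() is stable,
    # so ties keep their document order.
    ranked = sorted(scored, key=lambda pair: -pair[1])

    summary: List[str] = []
    used_words = 0
    for idx, _score in ranked:
        length = len(sentences[idx].split())
        if used_words + length > max_words:
            continue  # skip sentences that would exceed the budget
        summary.append(sentences[idx])
        used_words += length
    # Sentences are returned in descending order of predicted relevance.
    return summary
```

In practice, `score_fn` would wrap the fine-tuned model's prediction for a single sentence; any callable that maps a sentence to a float can be plugged in.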

teaching

Deep Natural Language Processing (2021-Current)

Teaching assistant for master course, Politecnico di Torino, DAUIN, 2021

This is a master-level course for students in the “Data Science and Engineering” specialization. I’m involved as a teaching assistant for both in-class and lab practices. A non-exhaustive list of the course topics is reported below: