Deep learning for text summarization

Spring semester, 2023-2024

Language technology

Topic description

Abstractive text summarization is a text generation task in which a model writes a short summary of a long text in its own words, rather than copying sentences from the source. We are looking for students to join an ongoing project at HUN-REN SZTAKI (https://www.sztaki.hu/). Our group is responsible for building the only open-source summarization corpus for Hungarian. We have also trained various large language models (LLMs). Our data and models are available here: https://huggingface.co/SZTAKI-HLT

Team members have completed the following theses:

  • Evaluating Automatic Metrics for Hungarian Abstractive Summarization
  • Hallucination in Abstractive Summarization
  • A Hybrid Approach for Abstractive Summarization Using Extractive Summarization
  • Controllable Text Generation in Abstractive Summarization


We published the following papers:

  • From News to Summaries: Building a Hungarian Corpus for Extractive and Abstractive Summarization (under review) 

The next direction we want to explore is data hallucination in English and Hungarian.
The project is open to both Hungarian and international students.
If you are interested in joining our project, please email me (acs.judit AT sztaki.hu) with a short introduction.


Prerequisites

  • Intermediate Python
  • Machine learning basics
  • Ability to work in a Linux environment

External partner: HUN-REN SZTAKI

Maximum number of students: 3