Bidirectional and Auto-Regressive Transformer (BART) for Indonesian Abstractive Text Summarization

Authors

  • Gaduh Hartawan, UIN Sunan Gunung Djati Bandung
  • Dian Sa'adillah Maylawati, UIN Sunan Gunung Djati Bandung
  • Wisnu Uriawan, UIN Sunan Gunung Djati Bandung

DOI:

https://doi.org/10.33795/jip.v10i4.5242

Keywords:

Abstractive summarization, BART, Natural Language Processing, Transformers

Abstract

Automatic summarization technology is developing rapidly within Natural Language Processing research, aiming to reduce reading time while preserving the most relevant information. There are two main approaches to text summarization: extractive and abstractive. Abstractive summarization is more challenging than extractive summarization because it generates new, more natural wording instead of copying sentences from the source text. This research therefore aims to produce abstractive summaries of Indonesian-language texts with good readability. It uses the Bidirectional and Auto-Regressive Transformer (BART) model, a Transformer architecture that combines two leading designs: a BERT-style bidirectional encoder and a GPT-style auto-regressive decoder. The dataset used is Liputan6, and model performance is measured with ROUGE. The results show that BART produces good abstractive summaries, with ROUGE-1, ROUGE-2, and ROUGE-L scores of 37.19, 14.03, and 33.85, respectively.
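To make the reported metrics concrete, below is a minimal from-scratch sketch of how ROUGE-N and ROUGE-L F1 scores are computed over a candidate summary and a reference summary. This is an illustration of the metric definitions only, not the exact evaluation package used in the paper (published ROUGE scores typically come from a standard implementation such as the original ROUGE toolkit); the example sentences are invented for demonstration.

```python
from collections import Counter

def _ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1: n-gram overlap between candidate and reference summaries."""
    cand, ref = _ngrams(candidate.split(), n), _ngrams(reference.split(), n)
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence (LCS) of tokens."""
    c, r = candidate.split(), reference.split()
    # Dynamic-programming LCS table.
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, ct in enumerate(c, 1):
        for j, rt in enumerate(r, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ct == rt else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[len(c)][len(r)]
    if not c or not r or lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

# Hypothetical candidate/reference pair (Indonesian news-style fragments).
cand = "polisi menangkap tersangka di jakarta"
ref = "polisi menangkap seorang tersangka di jakarta"
print(round(rouge_n(cand, ref, 1), 4))  # unigram F1
print(round(rouge_n(cand, ref, 2), 4))  # bigram F1
print(round(rouge_l(cand, ref), 4))     # LCS F1
```

ROUGE-1 and ROUGE-2 reward surface word and phrase overlap, while ROUGE-L credits the longest in-order token subsequence, which is why abstractive systems that rephrase heavily tend to score lower on ROUGE-2 than on ROUGE-1, as in the results above.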




Published

2024-08-30

How to Cite

Hartawan, G., Maylawati, D. S., & Uriawan, W. (2024). Bidirectional and Auto-Regressive Transformer (BART) for Indonesian Abstractive Text Summarization. Jurnal Informatika Polinema, 10(4), 535–542. https://doi.org/10.33795/jip.v10i4.5242