Bidirectional and Auto-Regressive Transformer (BART) for Indonesian Abstractive Text Summarization
DOI: https://doi.org/10.33795/jip.v10i4.5242
Keywords: Abstractive summarization, BART, Natural Language Processing, Transformers
Abstract
Automatic summarization technology is developing rapidly within Natural Language Processing research, aiming to reduce reading time while retaining the relevant information in a text. There are two main approaches to text summarization: extractive and abstractive. Abstractive summarization is more challenging than extractive summarization because it generates new, more natural wording rather than copying sentences from the source. This research therefore aims to produce abstractive summaries of Indonesian-language texts with good readability. It uses the Bidirectional and Auto-Regressive Transformer (BART) model, a Transformer model that combines two leading Transformer architectures: a BERT-style bidirectional encoder and a GPT-style auto-regressive decoder. The dataset used is Liputan6, and model performance is evaluated with ROUGE. The results show that BART can produce good abstractive summaries, with ROUGE-1, ROUGE-2, and ROUGE-L scores of 37.19, 14.03, and 33.85, respectively.
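To illustrate the evaluation metric the abstract reports, the sketch below computes ROUGE-N recall (n-gram overlap with the reference) and a simple LCS-based ROUGE-L recall in pure Python. This is a minimal illustration of the metric definitions, not the evaluation package the authors used; function names and the whitespace tokenization are this sketch's own assumptions.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams from a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall: overlapping n-grams / n-grams in the reference."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # clipped n-gram overlap
    total = sum(ref.values())
    return overlap / total if total else 0.0

def lcs_len(a, b):
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference):
    """ROUGE-L recall: LCS length / reference length."""
    cand, ref = candidate.split(), reference.split()
    return lcs_len(cand, ref) / len(ref) if ref else 0.0
```

For example, comparing "the cat was on the mat" against the reference "the cat sat on the mat" gives ROUGE-1 and ROUGE-L recall of 5/6 each. Published scores like those in the abstract are usually F-measures averaged over a test set, so they are not directly comparable to this recall-only sketch.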