An Empirical Study of BART for Abstractive Summarization of Semi-Structured Text in the SIPAKAT AIR Domain
DOI:
https://doi.org/10.33795/jip.v12i1.8285
Keywords:
Abstractive Summarization, BART, Semi-Structured Text, Natural Language Processing, SIPAKAT AIR
Abstract
SIPAKAT AIR (Sistem Informasi dan Pelaporan Bidang Sumber Daya Air, the Water Resources Information and Reporting System) is a regional government information system that records water resources infrastructure project data in a semi-structured format, that is, a combination of tabular elements and narrative descriptions. This format complicates automatic summarization because it is neither fully structured nor free-form, so it calls for an adaptive, semantics-aware approach. This paper presents an empirical study of a BART-based abstractive summarization model built and trained from scratch on the SIPAKAT AIR dataset. The dataset consists of 200 text-summary pairs constructed from actual projects. A custom tokenizer was trained with ByteLevelBPETokenizer so that it reflects the structure of technical sentences in the internal corpus. The BART model was given a lightweight configuration (a 2-layer encoder-decoder) and trained with the Trainer API from Hugging Face. Evaluation with ROUGE, BERTScore, and token-level metrics shows competitive performance: a ROUGE-1 F1 of 0.5080, a ROUGE-L F1 of 0.5082, a BERTScore F1 of 0.81, and a token-level F1 of 0.73 with an accuracy of 0.71. The model produces concise, contextual summaries that are suitable for notification systems or compact views on project dashboards. The methodological contributions of this study are the design of a from-scratch summarization pipeline for Indonesian and evidence that a lightweight architecture can function optimally in a limited domain. The study contributes to the development of NLP for Indonesian and opens the way to adaptive applications for semi-structured public-sector data.
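The abstract reports a token-level F1 and accuracy but does not define them. The sketch below shows one common way such scores can be computed. It assumes whitespace tokenization, a bag-of-tokens overlap for F1, and position-wise matching for accuracy. The function names `token_f1` and `token_accuracy` and the example sentences are hypothetical, and the paper's actual definitions may differ.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Bag-of-tokens F1 between a predicted and a reference summary.

    Assumption: tokens are lowercase whitespace-separated words.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match, otherwise no overlap.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both sequences.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def token_accuracy(prediction: str, reference: str) -> float:
    """Share of aligned positions with identical tokens.

    Assumption: normalized by the longer of the two sequences, so
    missing or extra tokens count as errors.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    longest = max(len(pred_tokens), len(ref_tokens))
    if longest == 0:
        return 1.0
    matches = sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return matches / longest


if __name__ == "__main__":
    # Made-up example pair, not taken from the SIPAKAT AIR dataset.
    reference = "rehabilitasi saluran irigasi selesai 80 persen"
    prediction = "rehabilitasi saluran irigasi mencapai 80 persen"
    print(f"token F1: {token_f1(prediction, reference):.4f}")
    print(f"token accuracy: {token_accuracy(prediction, reference):.4f}")
```

Under these assumptions, the position-wise accuracy is stricter than the overlap F1 whenever tokens are inserted or reordered, which would be consistent with an accuracy slightly below the F1, as in the reported 0.71 and 0.73.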