Abstract
Accurate, real-time monitoring of passenger density at railway stations is a crucial aspect of improving comfort, efficiency, and safety in public transportation systems. This study proposes the implementation of a Vision Transformer (ViT-Base) architecture, pre-trained on the ImageNet-21K dataset, for vision-based detection and estimation of passenger density. The model is optimized to run efficiently on the Jetson Orin Nano edge computing device, enabling local data processing with low resource consumption. Performance is evaluated on four main parameters: accuracy, latency, energy consumption, and computational efficiency. Experimental results show that ViT-Base achieves a detection accuracy of 91.17%, with an average latency of 46.59 ms, an energy consumption of 0.1332 J, and a computational efficiency of 0.171 %/(ms·W). These findings indicate that ViT-Base is a promising solution for edge-computing-based passenger density monitoring systems, particularly in public transportation environments that demand high speed and efficiency.
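The computational-efficiency figure ties the other metrics together: accuracy divided by the latency-power product, in %/(ms·W). A minimal sketch of that relationship follows; the average power value is a hypothetical illustration back-solved from the reported numbers, not a quantity stated in the abstract:

```python
def compute_efficiency(accuracy_pct: float, latency_ms: float, power_w: float) -> float:
    """Computational efficiency in %/(ms·W): accuracy per unit latency per unit power."""
    return accuracy_pct / (latency_ms * power_w)

# Reported figures: 91.17% accuracy, 46.59 ms latency, 0.171 %/(ms·W) efficiency.
# Back-solving implies an average power draw of roughly 11.4 W (an assumption
# for illustration, not a measured value from the study):
implied_power_w = 91.17 / (46.59 * 0.171)
print(round(implied_power_w, 1))                                     # ≈ 11.4
print(round(compute_efficiency(91.17, 46.59, implied_power_w), 3))   # 0.171
```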
Article Details
Copyright (c) 2025 Mas Nurul Achmadiah, Novendra Setyawan, Anindya Dwi Risdhayanti

This work is licensed under a Creative Commons Attribution 4.0 International License.
References
- S. Terabe, T. Kato, H. Yaginuma, N. Kang, and K. Tanaka, “Risk Assessment Model for Railway Passengers on a Crowded Platform,” Transportation Research Record: Journal of the Transportation Research Board, vol. 2673, no. 1, pp. 524–531, Jan. 2019, doi: 10.1177/0361198118821925.
- L. Jiao et al., “A Survey of Deep Learning-Based Object Detection,” IEEE Access, vol. 7, pp. 128837–128868, 2019, doi: 10.1109/ACCESS.2019.2939201.
- M. Ahmad, I. Ahmed, and A. Adnan, “Overhead View Person Detection Using YOLO,” in 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), IEEE, Oct. 2019, pp. 0627–0633. doi: 10.1109/UEMCON47517.2019.8992980.
- S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Trans Pattern Anal Mach Intell, vol. 39, no. 6, pp. 1137–1149, Jun. 2017, doi: 10.1109/TPAMI.2016.2577031.
- K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), IEEE, 2017, pp. 2961–2969.
- Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, “YOLOX: Exceeding YOLO Series in 2021,” arXiv preprint arXiv:2107.08430, Jul. 2021.
- W. Liu et al., “SSD: Single Shot MultiBox Detector,” in Computer Vision – ECCV 2016, Springer, 2016, pp. 21–37, doi: 10.1007/978-3-319-46448-0_2.
- M. Nurul Achmadiah, A. Ahamad, C.-C. Sun, and W.-K. Kuo, “Energy-Efficient Fast Object Detection on Edge Devices for IoT Systems,” IEEE Internet Things J, vol. 12, no. 11, pp. 16681–16694, Jun. 2025, doi: 10.1109/JIOT.2025.3536526.
- M. N. Achmadiah, N. Setyawan, A. A. Bryantono, C.-C. Sun, and W.-K. Kuo, “Fast Person Detection Using YOLOX With AI Accelerator For Train Station Safety,” in 2024 International Electronics Symposium (IES), IEEE, Aug. 2024, pp. 504–509. doi: 10.1109/IES63037.2024.10665874.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Jun. 2009, pp. 248–255. doi: 10.1109/CVPR.2009.5206848.
- J. Pan et al., “EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers,” May 2022.
- Youvan, “Developing and Deploying AI Applications on NVIDIA Jetson Orin NX: A Comprehensive Guide,” 2024.