Identification of Socio Economic Registration Data Using OCR Based Tesseract and Google Cloud Vision
Author's Country: Indonesia
DOI:
https://doi.org/10.36805/bit-cs.v5i2.6258
Keywords:
Hand Writing, Optical Character Recognition, Socioeconomic infrastructure, Surveyor officer
Abstract
Socio-Economic Registration (Regsosek) is an Indonesian government program that aims to measure and monitor the socio-economic conditions of low-income people. Regsosek data are also a relevant source for research: they are used to analyze the influence of economic and social infrastructure on economic growth, to analyze the socio-economic determinants of work accident insurance ownership among informal workers, to create a women's socio-economic vulnerability index (IKSEP), and to study intercultural literacy from a social, economic, and political perspective. The success of the program depends on the data collection officers, or surveyors, who interact directly with the community to collect Regsosek data. The current collection method has an obstacle that significantly affects the overall survey results: surveyors record responses by hand on a form and then enter them manually into a website. This process is vulnerable to human error, because handwriting can be difficult to read and mistakes are made during data entry. Optical Character Recognition (OCR) can address this problem by identifying handwritten text and converting it into editable digital text that can be processed automatically. This research shows that the proposed method has good accuracy, with an accuracy of 96.45%, a character error rate (CER) of 0.3%, and a word error rate (WER) of 4.30%.
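The abstract reports CER and WER but does not state how they were computed. As a minimal sketch, assuming the standard definitions (Levenshtein edit distance divided by the length of the reference, counted in characters for CER and in words for WER), the two metrics could be calculated as follows. The function names and the sample strings are illustrative and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete an item from the reference
                curr[j - 1] + 1,     # insert an item from the hypothesis
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length in characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / reference length in words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical form field: the OCR output misreads one letter "l" as the digit "1".
ground_truth = "Kepala Keluarga"
ocr_output = "Kepa1a Keluarga"

print(f"CER: {cer(ground_truth, ocr_output):.4f}")
print(f"WER: {wer(ground_truth, ocr_output):.4f}")
```

A single misread character changes one of 15 characters but one of only two words, which illustrates why WER is typically higher than CER on the same output.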
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.