Identification of Socio Economic Registration Data Using OCR Based Tesseract and Google Cloud Vision

Author's Country: Indonesia

Authors

  • Lionardi Ursaputra Pratama Universitas Widyagama Malang
  • Aviv Yuniar Rahman Universitas Widyagama Malang
  • Rangga Pahlevi Putra Universitas Widyagama Malang

DOI:

https://doi.org/10.36805/bit-cs.v5i2.6258

Keywords:

Handwriting, Optical Character Recognition, Socioeconomic infrastructure, Surveyor officer

Abstract

Socio-Economic Registration (Regsosek) is an Indonesian government program that aims to measure and monitor the socio-economic conditions of low-income people. Regsosek data are also used in research: to analyze the influence of economic and social infrastructure on economic growth, to analyze the socio-economic determinants of work accident insurance ownership among informal workers, to construct a women's socio-economic vulnerability index (IKSEP), and to study intercultural literacy from social, economic, and political perspectives. The success of the program depends on data collection officers, or surveyors, who interact directly with the community to collect Regsosek data. The current process has an obstacle that significantly affects the overall survey results: surveyors record responses by hand on a paper form and then enter them manually into a website. This process is vulnerable to human error, both because handwriting can be difficult to read and because mistakes are made during data entry. Optical character recognition (OCR) can address this problem by identifying handwritten text and converting it into editable digital text that can be processed automatically. The results show that the proposed method achieves an accuracy of 96.45%, a character error rate (CER) of 0.3%, and a word error rate (WER) of 4.30%.
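The abstract reports CER and WER without stating how they were computed. As a minimal sketch, assuming the conventional definitions (edit distance between the OCR output and the ground-truth text, divided by the length of the ground truth in characters or words), the two metrics could be calculated as follows. The function names and the sample strings are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length in characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / reference length in words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical form field: the OCR engine misreads the digit "1" as "I".
    truth = "Jalan Merdeka 12"
    ocr_output = "Jalan Merdeka I2"
    print(f"CER: {cer(truth, ocr_output):.2%}")
    print(f"WER: {wer(truth, ocr_output):.2%}")
```

A single misread character affects one whole word, so WER is usually higher than CER on the same text, which matches the relationship between the two figures reported above.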

Published

2024-06-30

How to Cite

[1]
“Identification of Socio Economic Registration Data Using OCR Based Tesseract and Google Cloud Vision: Author’s Country: Indonesia”, bit-cs, vol. 5, no. 2, pp. 64–73, Jun. 2024, doi: 10.36805/bit-cs.v5i2.6258.
