Main Article Content


Socio-Economic Registration (Regsosek) is an Indonesian government program that aims to measure and monitor the socio-economic conditions of low-income people. Regsosek data is also a relevant source for research: it is used to analyze the influence of economic and social infrastructure on economic growth, to analyze the socio-economic determinants of work accident insurance ownership among informal workers, to construct a women's socio-economic vulnerability index (IKSEP), and to study intercultural literacy from social, economic, and political perspectives. The success of the program depends on the data collection officers, or surveyors, who interact directly with the community to gather Regsosek data. The collection process has an obstacle that significantly affects the overall survey results: surveyors record responses by hand on a paper form and then enter them manually into the website. This process is vulnerable to human error, because handwriting can be difficult to read and mistakes can be made during data entry. Optical Character Recognition (OCR) can address this problem by identifying handwritten text and converting it into editable digital text that can be processed automatically. This research shows that the proposed method, OCR based on Tesseract and Google Cloud Vision, performs well, with an accuracy of 96.45%, a character error rate (CER) of 0.3%, and a word error rate (WER) of 4.30%.
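The abstract reports CER and WER without stating how they were computed. The sketch below shows the conventional definition, assumed here rather than taken from the paper: the Levenshtein edit distance between the OCR output and the ground-truth text, divided by the length of the ground truth, counted in characters for CER and in words for WER. The function names and the sample strings are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    truth = "Jalan Merdeka 12"   # hypothetical handwritten form field
    ocr = "Jalan Merdeka I2"     # hypothetical OCR output with one misread digit
    print(f"CER = {cer(truth, ocr):.4f}")
    print(f"WER = {wer(truth, ocr):.4f}")
```

A single misread character costs one full word in WER but only one character in CER, which is why WER is normally the larger of the two figures.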


Keywords: Handwriting; Optical Character Recognition; Socioeconomic infrastructure; Surveyor officer

Article Details

How to Cite
L. Ursaputra Pratama, A. Yuniar Rahman, and R. Pahlevi Putra, “Identification of Socio Economic Registration Data Using OCR Based Tesseract and Google Cloud Vision”, bit-cs, vol. 5, no. 2, pp. 64-73, Jun. 2024.


