Japanese Text Recognition from Images Using Tesseract OCR [macOS Edition]

Tadashi Shigeoka ·  Thu, March 23, 2023

This post shows how to recognize Japanese text in images on macOS using Tesseract OCR, an open-source (OSS) tool.

Tesseract OCR

Background: OSS Tool with Japanese OCR Support

While searching for an OSS tool with Japanese OCR support, I read the gihyo.jp article 第577回 Tesseract OCRで文字認識をする ("No. 577: Character Recognition with Tesseract OCR"). Tesseract OCR looked promising, so I tried it.

Initial Setup for Tesseract

Setting up Tesseract takes two steps, in this order: install Tesseract itself, then download the Japanese trained model files.

Installing Tesseract

brew install tesseract
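
To confirm that the install worked before moving on, a quick check along these lines may help. This is only a sketch: the function name `tesseract_installed` is my own and is not part of Tesseract or Homebrew.

```shell
# Hypothetical helper: succeeds if a `tesseract` command is available
tesseract_installed() {
  command -v tesseract >/dev/null 2>&1
}

if tesseract_installed; then
  tesseract --version   # prints the Tesseract and Leptonica versions
else
  echo "tesseract not found; check the output of brew install" >&2
fi
```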

Downloading the Japanese Trained Model Files

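# Homebrew's tessdata directory (/opt/homebrew/... on Apple Silicon, /usr/local/... on Intel)
cd "$(brew --prefix)/share/tessdata/"
# curl ships with macOS (wget does not); -L follows GitHub's redirect, -O keeps the file name
curl -LO https://github.com/tesseract-ocr/tessdata/raw/main/jpn.traineddata
curl -LO https://github.com/tesseract-ocr/tessdata/raw/main/jpn_vert.traineddata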
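
Once the files are in place, it can be worth checking that Tesseract actually sees them. The sketch below relies on `tesseract --list-langs`, which prints one installed language per line. The helper name `has_tesseract_lang` is hypothetical, introduced here only for illustration.

```shell
# Hypothetical helper: succeeds if the given language name appears
# as a whole line in the output of `tesseract --list-langs`
has_tesseract_lang() {
  tesseract --list-langs 2>/dev/null | grep -qx "$1"
}

# Report the status of both Japanese models
for lang in jpn jpn_vert; do
  if has_tesseract_lang "$lang"; then
    echo "$lang: available"
  else
    echo "$lang: missing"
  fi
done
```

If either model is reported as missing, the download most likely landed in a directory other than the one Tesseract reads its tessdata from.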

Japanese OCR with Tesseract

tesseract target.png - -l jpn
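
In the command above, the `-` in the output position sends the recognized text to standard output, and `-l jpn` selects the Japanese model. If a base name is given instead of `-`, Tesseract writes to that name with `.txt` appended. Building on that, here is a rough sketch of a batch run over a directory. The function name `ocr_png_dir` and the `./scans` path are my own placeholders, not anything from Tesseract.

```shell
# Hypothetical helper: OCR every PNG in a directory, writing NAME.txt
# next to each NAME.png. The second argument is the language (default: jpn).
ocr_png_dir() {
  dir=$1
  lang=${2:-jpn}
  for img in "$dir"/*.png; do
    [ -e "$img" ] || continue   # skip when the glob matched nothing
    # Pass the base name without .png; tesseract appends .txt itself
    tesseract "$img" "${img%.png}" -l "$lang"
  done
}

# Example calls (the directory is a placeholder):
#   ocr_png_dir ./scans            # horizontal Japanese
#   ocr_png_dir ./scans jpn_vert   # vertical Japanese
#   ocr_png_dir ./scans jpn+eng    # Japanese mixed with English
```

The `jpn+eng` form combines languages in a single pass, which tends to help on pages that mix Japanese with Latin-script words.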

That’s all from the Gemba on recognizing Japanese text from images using Tesseract OCR.

Reference Information