Japanese Text Recognition from Images Using Tesseract OCR [macOS Edition]
This post shows how to recognize Japanese text in images on macOS using Tesseract OCR, an open-source OCR engine.
While looking for an open-source tool that supports Japanese OCR, I read the article 第577回 Tesseract OCRで文字認識をする | gihyo.jp (No. 577: Character Recognition with Tesseract OCR) and Tesseract OCR looked promising, so I tried it.
To set up Tesseract, install it with Homebrew, then download the trained model files for Japanese (jpn for horizontal text, jpn_vert for vertical text) into the tessdata directory.
brew install tesseract
# Homebrew's prefix is /opt/homebrew on Apple Silicon and /usr/local on Intel Macs
cd "$(brew --prefix)/share/tessdata/"
wget https://github.com/tesseract-ocr/tessdata/raw/main/jpn.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/main/jpn_vert.traineddata
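macOS does not ship with wget, so the download step above fails on a machine where it has not been installed separately. As a rough alternative, the sketch below does the same download with curl, which is bundled with macOS. The helper names traineddata_url and fetch_traineddata are hypothetical and only for illustration; the base URL is the one used in the commands above.

```shell
# Hypothetical helper: build the download URL for one language's trained model.
# The base URL is taken from the wget commands above.
traineddata_url() {
  printf 'https://github.com/tesseract-ocr/tessdata/raw/main/%s.traineddata\n' "$1"
}

# Hypothetical helper: download each requested model into a tessdata directory
# with curl, skipping files that are already present.
fetch_traineddata() {
  dest_dir=$1
  shift
  for lang in "$@"; do
    if [ -f "$dest_dir/$lang.traineddata" ]; then
      echo "skip: $lang.traineddata already exists"
    else
      # -f: fail on HTTP errors, -L: follow GitHub's redirect, -o: output file
      curl -fL -o "$dest_dir/$lang.traineddata" "$(traineddata_url "$lang")"
    fi
  done
}

# Usage (not run automatically):
#   fetch_traineddata "$(brew --prefix)/share/tessdata" jpn jpn_vert
#   tesseract --list-langs   # jpn and jpn_vert should now appear in the list
```

Running tesseract --list-langs afterwards is a quick way to confirm that Tesseract can see the newly downloaded models.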
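To run OCR, pass the image path, use - as the output name so the result is printed to standard output, and select the Japanese model with -l jpn.
tesseract target.png - -l jpn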
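When there are several images, the same command can be wrapped in a loop. The following is a minimal sketch, not part of the original procedure: ocr_dir is a hypothetical helper, and the TESSERACT variable is an assumed override used only to make the function easy to dry-run. Instead of printing to standard output, it passes an output base name, to which tesseract appends .txt.

```shell
# Hypothetical helper: OCR every PNG in a directory.
# For each NAME.png, the recognized text is written to NAME.txt beside it.
#   $1: directory containing the images
#   $2: language model to use (defaults to jpn)
ocr_dir() {
  dir=$1
  lang=${2:-jpn}
  for img in "$dir"/*.png; do
    # If no PNG matched, the pattern stays literal; skip it.
    [ -e "$img" ] || continue
    # The second argument is the output base name; tesseract adds ".txt".
    # TESSERACT is an assumed override, e.g. TESSERACT=echo for a dry run.
    "${TESSERACT:-tesseract}" "$img" "${img%.png}" -l "$lang"
  done
}

# Usage (not run automatically):
#   ocr_dir ./scans            # horizontal Japanese text
#   ocr_dir ./scans jpn_vert   # vertical Japanese text
```

Setting TESSERACT=echo before calling ocr_dir prints the commands that would be run without invoking Tesseract, which is handy for checking the paths first.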
That’s all from the Gemba, where I recognized Japanese text from images using Tesseract OCR.