When using the Tesseract OCR API, why can some images be read while others fail outright with an error?

March 22, 2025, 16:25
2 users answered
User (1):

I would like to know what is going on, too.

User (2):

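1. tesseract-ocr-setup-3.01-1.exe
My local machine runs Windows, so I use this installer.
2. chi_sim.traineddata.gz
Needed for recognizing Simplified Chinese.
Install tesseract-ocr
Install the language pack manually
In the Tesseract-OCR installation directory, find the tessdata directory, which is where language packs are stored. Decompress chi_sim.traineddata.gz and copy the resulting chi_sim.traineddata file into that directory.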
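
As a rough sketch of the setup above, the following Java helper checks that a language file is present under tessdata and assembles a Tesseract command line of the form `tesseract <image> <outputBase> -l <lang>`. The class name TesseractSetup and its method names are hypothetical and are not part of the original answer. The install directory and executable path are whatever your own installation uses.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the original post) for checking the language
// data and assembling a Tesseract command line.
class TesseractSetup {

    // True if <installDir>/tessdata/<lang>.traineddata exists as a regular file.
    static boolean hasLanguageData(File installDir, String lang) {
        File tessdata = new File(installDir, "tessdata");
        return new File(tessdata, lang + ".traineddata").isFile();
    }

    // Builds: <exe> <image> <outputBase> -l <lang>
    // Tesseract writes the recognized text to <outputBase>.txt.
    static List<String> buildCommand(File exe, File image, File outputBase, String lang) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(exe.getPath());
        cmd.add(image.getPath());
        cmd.add(outputBase.getPath());
        cmd.add("-l");
        cmd.add(lang);
        return cmd;
    }
}
```

The resulting list can be handed to `new ProcessBuilder(cmd).start()`. Calling `hasLanguageData(installDir, "chi_sim")` first gives a clearer failure message than letting Tesseract fail on a missing language file.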

This post uses the example from the blog it references,
as follows:
package org.img;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.Locale;

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;

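// TIFFImageWriteParam comes from JAI Image I/O Tools (jai_imageio.jar), which must be on the classpath
import com.sun.media.imageio.plugins.tiff.TIFFImageWriteParam;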

public class ImageIOHelper {
    /**
     * Converts an image file to TIFF format.
     * @param imageFile the source image file
     * @param imageFormat the source format name, i.e. the file extension (for example "png")
     * @return the generated TIFF file, or null if an I/O error occurred before it was created
     */
    public static File createImage(File imageFile, String imageFormat) {
        File tempFile = null;
        try {
            // next() throws NoSuchElementException if no reader is registered for imageFormat
            Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(imageFormat);
            ImageReader reader = readers.next();

            ImageInputStream iis = ImageIO.createImageInputStream(imageFile);
            reader.setInput(iis);
            // Read the stream metadata
            IIOMetadata streamMetadata = reader.getStreamMetadata();

            // Set up the write parameters (uncompressed TIFF)
            TIFFImageWriteParam tiffWriteParam = new TIFFImageWriteParam(Locale.CHINESE);
            tiffWriteParam.setCompressionMode(ImageWriteParam.MODE_DISABLED);

            // Get the TIFF writer and set its output to the temp file
            Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
            ImageWriter writer = writers.next();

            BufferedImage bi = reader.read(0);
            IIOImage image = new IIOImage(bi, null, reader.getImageMetadata(0));
            tempFile = tempImageFile(imageFile);
            ImageOutputStream ios = ImageIO.createImageOutputStream(tempFile);
            writer.setOutput(ios);
            writer.write(streamMetadata, image, tiffWriteParam);
            ios.close();
            iis.close();

            writer.dispose();
            reader.dispose();

        } catch (IOException e) {
            e.printStackTrace();
        }
        return tempFile;
    }

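    // Derives the output name, e.g. a.png becomes a0.tif
    private static File tempImageFile(File imageFile) {
        String path = imageFile.getPath();
        StringBuffer strB = new StringBuffer(path);
        // insert(int, int) inserts the text "0" just before the last '.'
        strB.insert(path.lastIndexOf('.'), 0);
        // Replace the trailing extension with "tif"
        return new File(strB.toString().replaceFirst("(?<=\\.)(\\w+)$", "tif"));
    }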

}
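
The thread does not say which error is thrown, so the following is only a guess at one possible cause. If no ImageIO plugin is registered for the input format or for TIFF, the iterator returned by ImageIO is empty and calling next() on it throws NoSuchElementException. A minimal sketch of a check that could be run before the conversion is shown below. The class FormatSupport and its methods are hypothetical names introduced here for illustration.

```java
import javax.imageio.ImageIO;

// Hypothetical helper (not from the original post).
class FormatSupport {

    // True if an ImageIO reader plugin is registered for the format name.
    static boolean canRead(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    // True if an ImageIO writer plugin is registered for the format name.
    static boolean canWrite(String formatName) {
        return ImageIO.getImageWritersByFormatName(formatName).hasNext();
    }

    // Returns a short reason why conversion to TIFF cannot be attempted,
    // or null if both the reader and the TIFF writer are available.
    static String unsupportedReason(String inputFormat) {
        if (!canRead(inputFormat)) {
            return "no ImageIO reader for \"" + inputFormat + "\"";
        }
        if (!canWrite("tiff")) {
            return "no ImageIO writer for \"tiff\"";
        }
        return null;
    }
}
```

Calling `FormatSupport.unsupportedReason("png")` before the conversion reports a missing plugin instead of failing inside the helper. Java 9 and later ship a TIFF plugin with the JDK. On older versions TIFF support has to come from JAI Image I/O Tools.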