tesseract-ocr | Recognizing text in images with Python


Source: http://t.zoukankan.com/qinlangsky-p-13491528.html

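Introduction to Pytesseract
Pytesseract is an optical character recognition (OCR) tool for Python: it recognizes and reads the text embedded in images. It is a wrapper around Google's Tesseract-OCR engine. It is also useful as a stand-alone invocation script for Tesseract, because it can read every image type supported by the Python Imaging Library, including jpeg, png, gif, bmp and tiff, whereas tesseract-ocr by default supports only tiff and bmp.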

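Installation
Install tesseract-ocr
sudo apt-get install tesseract-ocr
Install the language packs
tesseract-ocr-eng is the English language pack and tesseract-ocr-chi-sim is the Simplified Chinese language pack.

sudo apt-get install tesseract-ocr-eng tesseract-ocr-chi-sim
Install the dependencies and pytesseract
pytesseract is a Python library that calls Google's tesseract-ocr tool to recognize the text in images.

# Install Pillow
sudo pip3 install Pillow
# Install pytesseract
sudo pip3 install pytesseract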
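After installing, it can help to confirm that the tesseract binary is reachable and to see which language packs it reports. The sketch below uses only the standard library. The helper names find_tesseract and installed_languages are hypothetical, and it assumes the executable is called tesseract, as installed by the apt package above.

```python
import shutil
import subprocess


def find_tesseract(executable="tesseract"):
    """Return the full path of the tesseract binary, or None if it is not on PATH."""
    return shutil.which(executable)


def installed_languages(executable="tesseract"):
    """Return the language codes reported by `tesseract --list-langs`.

    Returns an empty list when the binary cannot be found or the call fails.
    """
    path = find_tesseract(executable)
    if path is None:
        return []
    try:
        result = subprocess.run(
            [path, "--list-langs"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return []
    # The first line is a header; the remaining lines are language codes.
    # Fall back to stderr in case this build prints the list there.
    lines = (result.stdout or result.stderr).splitlines()
    return [line.strip() for line in lines[1:] if line.strip()]


if __name__ == "__main__":
    print("tesseract found at:", find_tesseract())
    print("languages:", installed_languages())
```

If find_tesseract() returns None, the full path of the executable has to be assigned to pytesseract.pytesseract.tesseract_cmd before any OCR call.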
Usage
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

# If the tesseract executable is not on your PATH, set its full path here
pytesseract.pytesseract.tesseract_cmd = r'<full_path_to_your_tesseract_executable>'
# Example: tesseract_cmd = r'/usr/bin/tesseract'

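# Recognize the text in an image and return it as a string
print(pytesseract.image_to_string(Image.open('test.png')))

# Recognize text in a specific language ('fra' is French; use 'eng' for English)
print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))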

# In order to bypass the image conversions of pytesseract, just use relative or absolute image path
# NOTE: In this case you should provide tesseract supported images or tesseract will return error
print(pytesseract.image_to_string('test.png'))

# Batch processing with a single file containing the list of multiple image file paths
print(pytesseract.image_to_string('images.txt'))

# Timeout/terminate the tesseract job after a period of time
try:
    print(pytesseract.image_to_string('test.jpg', timeout=2)) # Timeout after 2 seconds
    print(pytesseract.image_to_string('test.jpg', timeout=0.5)) # Timeout after half a second
except RuntimeError as timeout_error:
    # Tesseract processing is terminated
    pass

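# Get bounding boxes for the recognized characters
print(pytesseract.image_to_boxes(Image.open('test.png')))

# Get verbose data including bounding boxes, confidences, line and page numbers
print(pytesseract.image_to_data(Image.open('test.png')))

# Get orientation and script detection (OSD) information
print(pytesseract.image_to_osd(Image.open('test.png')))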

# Get a searchable PDF
pdf = pytesseract.image_to_pdf_or_hocr('test.png', extension='pdf')
with open('test.pdf', 'w+b') as f:
    f.write(pdf) # pdf type is bytes by default

# Get HOCR output
hocr = pytesseract.image_to_pdf_or_hocr('test.png', extension='hocr')
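image_to_data can also return a dictionary instead of a TSV string when it is called with output_type=pytesseract.Output.DICT. As a minimal sketch, the helper below keeps only the words whose confidence is high enough. The name confident_words and the threshold of 60 are illustrative choices, not part of pytesseract. The function works on a plain dict, so the sample data is hand-made and not real OCR output.

```python
def confident_words(data, min_conf=60.0):
    """Return (text, left, top, width, height) tuples for words with conf >= min_conf.

    `data` is expected to have the shape of
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT):
    parallel lists under the keys 'text', 'conf', 'left', 'top', 'width', 'height'.
    """
    words = []
    for i, text in enumerate(data["text"]):
        # conf may be an int, a float or a string depending on the version;
        # rows that are not words (blocks, lines) carry a confidence of -1.
        conf = float(data["conf"][i])
        if conf >= min_conf and text.strip():
            words.append(
                (text, data["left"][i], data["top"][i],
                 data["width"][i], data["height"][i])
            )
    return words


# Hand-made sample in the same layout (not real OCR output).
sample = {
    "text": ["", "Hello", "wor1d", " "],
    "conf": ["-1", 96, 41.5, 95],
    "left": [0, 10, 80, 150],
    "top": [0, 12, 12, 12],
    "width": [200, 60, 62, 5],
    "height": [40, 20, 20, 20],
}
print(confident_words(sample))  # [('Hello', 10, 12, 60, 20)]
```

With Tesseract installed, the same function could be applied to the result of pytesseract.image_to_data(Image.open('test.png'), output_type=pytesseract.Output.DICT).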
Example
Extract the text of a local image as a single string

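import getpass

import pytesseract
from PIL import Image


def get_image_string(path, filename):
    '''Extract the text in an image as a string, using Google's open-source Tesseract OCR engine.

    :param path: <str> directory of the image, relative to the user's home directory
    :param filename: <str> name of the image file, without the .png extension
    :return: <str> the text extracted from the image
    '''
    username = getpass.getuser()
    # Assumes a Linux-style home directory at /home/<username>
    path_base = f'/home/{username}/{path}/{filename}.png'
    text = pytesseract.image_to_string(Image.open(path_base), lang="chi_sim")
    # Remove spaces and line breaks so the result is one continuous string
    text = text.replace(" ", "").replace("\n", "")
    print(text)
    return text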
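
The function above strips only spaces and line feeds. OCR output may also contain tabs, carriage returns or a trailing form feed, so a slightly more general clean-up can be sketched as follows. The helper name compact_text is hypothetical, and the sample string is hand-made rather than real Tesseract output.

```python
def compact_text(text):
    """Remove every whitespace character (spaces, tabs, newlines, form feeds)."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces back together drops all of it.
    return "".join(text.split())


sample = "识别 图片\n文字\t OCR\r\n\x0c"  # hand-made sample, not real OCR output
print(compact_text(sample))  # 识别图片文字OCR
```

This suits Chinese text, where spaces carry no meaning. For English text, " ".join(text.split()) would keep a single space between words instead.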

菜鸟IT博客 [2022.05.10-10:47] Views: 416
