[python3.6] Natural-scene text detection with TensorFlow, and variable-length scene-text OCR with CTPN + CRNN + CTC in Keras/PyTorch




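  • To improve the accuracy of math-formula prediction in this project, I made further improvements and experiments with fairly good results: https://github.com/xiaofengShi/Image2Katex. I hope it helps. Also, I changed jobs over the past few months and changed direction: still computer vision, but no longer OCR. I now work mainly on salient object detection and search intent, so replies to questions about this repo will be slow. Sorry for that.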


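  • Text orientation detection: 0, 90, 180 and 270 degrees
  • Text detection (to be switched later to a Keras text detector, giving end-to-end text detection and recognition in Keras)
  • Variable-length OCR recognition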


sh setup.sh
## CPU environment
sh setup-cpu.sh
## CPU, Python 3 environment
sh setup-python3.sh



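  • The system is split into 3 networks. 1. Text orientation detection network: Classify (VGG16)
  • 2. Text region detection network: CTPN (CNN + RNN)
  • 3. End-to-end text recognition network: CRNN (CNN + GRU/LSTM + CTC)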
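The orientation classifier reports one of four angles, and the image has to be turned upright before the CTPN and CRNN stages run on it. A minimal NumPy sketch of that step, assuming the predicted angle means the text was rotated counter-clockwise by that many degrees (the repo's own convention may be the opposite); `undo_rotation` is a hypothetical helper, not a function from this project:

```python
import numpy as np


def undo_rotation(img, angle):
    """Return img turned upright, given an angle of 0, 90, 180 or 270 degrees.

    Assumption: `angle` is the counter-clockwise rotation of the text.
    np.rot90 rotates counter-clockwise for positive k, so a negative k undoes it.
    """
    if angle not in (0, 90, 180, 270):
        raise ValueError("angle must be one of 0, 90, 180, 270")
    return np.rot90(img, k=-(angle // 90))
```

np.rot90 acts on the first two axes, so the same call works for grayscale (H, W) and colour (H, W, C) arrays.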








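```python
import numpy as np


def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2 ** np.arange(3, 6)):
    # CTPN-style anchors: a fixed width of 16 px and 10 heights from 11 to 283 px,
    # each roughly 1.43x (about 1/0.7) the previous one.
    # base_size, ratios and scales are kept for signature compatibility but unused.
    heights = [11, 16, 23, 33, 48, 68, 97, 139, 198, 283]
    widths = [16]
    sizes = [(h, w) for h in heights for w in widths]
    # generate_basic_anchors is defined elsewhere in the repo (not shown here)
    return generate_basic_anchors(sizes)
```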
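The snippet calls generate_basic_anchors without showing it. Below is a sketch of what such a helper might look like, assuming it centres each (height, width) pair on the 16 × 16 base cell and returns [x1, y1, x2, y2] boxes; it is an illustration, not the repo's verbatim code:

```python
import numpy as np


def scale_anchor(anchor, h, w):
    """Resize a [x1, y1, x2, y2] box to height h and width w, keeping its centre."""
    x_ctr = (anchor[0] + anchor[2]) * 0.5
    y_ctr = (anchor[1] + anchor[3]) * 0.5
    scaled = anchor.copy()
    scaled[0] = x_ctr - w / 2  # x1
    scaled[2] = x_ctr + w / 2  # x2
    scaled[1] = y_ctr - h / 2  # y1
    scaled[3] = y_ctr + h / 2  # y2
    return scaled


def generate_basic_anchors(sizes, base_size=16):
    """Build one anchor per (height, width) pair, all centred on the base cell."""
    base_anchor = np.array([0, 0, base_size - 1, base_size - 1], dtype=np.float32)
    anchors = np.zeros((len(sizes), 4), dtype=np.float32)
    for i, (h, w) in enumerate(sizes):
        anchors[i] = scale_anchor(base_anchor, h, w)
    return anchors
```

With the heights listed above, x2 - x1 equals 16 for every anchor and all anchors share the same centre, so only the height varies across the 10 anchors at each feature-map position.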


End-to-end OCR recognition: CRNN


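Training code is provided in both Keras and PyTorch versions. Once you understand the Keras version, you can switch to the PyTorch version, which is more stable.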
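At inference time CRNN produces, for every horizontal time step, a probability distribution over the character set plus a CTC blank. The simplest way to read a string from it is best-path (greedy) decoding: take the most likely class per step, collapse consecutive repeats, then drop blanks. A NumPy sketch for a single sequence; the blank index is a parameter because implementations differ (some reserve index 0, others the last class), and the mapping from class index to `alphabet` is an assumption:

```python
import numpy as np


def ctc_greedy_decode(probs, alphabet, blank=0):
    """Best-path CTC decoding of one sequence.

    probs: array of shape (T, nclass) with per-time-step class scores.
    alphabet: string holding the non-blank classes in index order, skipping the blank slot.
    blank: index of the CTC blank class.
    """
    best = np.argmax(probs, axis=1).tolist()  # most likely class at each step
    chars = []
    prev = None
    for k in best:
        # emit only when the class changes and is not the blank
        if k != prev and k != blank:
            chars.append(alphabet[k - 1] if k > blank else alphabet[k])
        prev = k
    return "".join(chars)
```

A blank between two identical classes is what lets CTC output a doubled character: the class sequence a, a, blank, a decodes to "aa", not "a".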








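Run demo.py after setting the path of the test image. To also see the CTPN result, edit the end of the draw_boxes function in ./ctpn/ctpn/other.py to call cv2.imwrite('dest_path', img); this gives you both the text regions detected by CTPN and the OCR result for the image.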


1 Training CTPN

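  • Go to ./ctpn/ctpn/train_net.py
  • Pretrained VGG network: VGG_imagenet.npy. Download the pretrained weights and point pretrained_model to that path. A pretrained checkpoint for the whole model is also available (checkpoint).
  • The CTPN dataset is also on Baidu Cloud. After downloading and extracting it, point the self.devkit_path parameter of the pascal_voc class in ./ctpn/lib/datasets/pascal_voc.py to the dataset path.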

2 Training CRNN

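  • Keras version: ./train/keras_train/train_batch.py. Point model_path to the location of the pretrained weights and MODEL_PATH to where the trained model should be saved (Keras model pretrained weights).
  • PyTorch version: ./train/pytorch-train/crnn_main.py. In its argument parser, point the option with
    help="path to crnn (to continue training)" at the pretrained weights, and the option with
    help='Where to store samples and models' at the output directory.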
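For reference, this is roughly how two such options look in an argparse parser. The flag names `--crnn` and `--experiment` and their defaults are assumptions made for this sketch, since only the help strings are quoted above; check crnn_main.py for the real names:

```python
import argparse


def build_parser():
    # Hypothetical flag names: the README only quotes the help strings.
    parser = argparse.ArgumentParser()
    parser.add_argument("--crnn", default="",
                        help="path to crnn (to continue training)")
    parser.add_argument("--experiment", default=None,
                        help="Where to store samples and models")
    return parser
```

Usage: `build_parser().parse_args(["--crnn", "weights.pth", "--experiment", "expr"])`.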




CTPN original image 1 | CTPN detection 1 | CTPN + CRNN result 1



CTPN original image 2 | CTPN detection 2 | CTPN + CRNN result 2





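Below are some large public datasets that can be used to train text detection and recognition models; small datasets suitable only for fine-tuning are not covered.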

Chinese Text in the Wild(CTW)

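This dataset contains 32285 images (from Tencent Street View) with 1018402 Chinese characters, covering planar text, raised text, urban text, rural text, low-light text, distant text and partially occluded text. Images are 2048*2048 and the dataset is 31 GB. It is split roughly 8:1:1 into a training set (25887 images, 812872 characters), a test set (3269 images, 103519 characters) and a validation set (3129 images, 102011 characters).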
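The split sizes can be checked directly. The image counts sum to the stated 32285, and subtracting the training and test character counts from the stated total of 1018402 gives the character count that total implies for the validation split:

```python
# Counts as listed for CTW: (images, characters)
TOTAL_IMAGES, TOTAL_CHARS = 32285, 1018402
TRAIN = (25887, 812872)
TEST = (3269, 103519)
VAL_IMAGES = 3129

# The image counts should add up to the stated total
assert TRAIN[0] + TEST[0] + VAL_IMAGES == TOTAL_IMAGES

# Character count the stated total implies for the validation split
val_chars = TOTAL_CHARS - TRAIN[1] - TEST[1]
print("validation characters implied by the total:", val_chars)  # 102011

# Actual image fractions, to compare with the nominal 8:1:1 split
for name, n in (("train", TRAIN[0]), ("test", TEST[0]), ("validation", VAL_IMAGES)):
    print(name, round(n / TOTAL_IMAGES, 3))
```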


Reading Chinese Text in the Wild(RCTW-17)



ICPR MWI 2018 挑战赛




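This dataset contains 1555 images with 11459 text lines, including horizontal, tilted and curved text. File size: 441 MB. Mostly English text, with a small amount of Chinese. Training set: 1255 images; test set: 300 images.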

http://arxiv.org/pdf/1710.10400v

Google FSNS (Google Street View text dataset)


http://arxiv.org/pdf/1702.03970v1




Synthetic Data for Text Localisation


Code: https://github.com/ankush-me/SynthText (English version)
Code: https://github.com/wang-tf/Chinese_OCR_synthetic_data (Chinese version)

Synthetic Word Dataset







  • Problem in PyTorch CRNN training


    During PyTorch training, the program gets stuck in the encode() function in utils.py and keeps iterating:

    ```python
    def encode(self, text, depth=0):
        """Support batch or single str."""
        if isinstance(text, str):
            text = [self.dict[char.lower()] for char in text]
            length = [len(text)]
        if isinstance(text, str):
            text = [self.dict.get(char, 0) for char in text]
            length = [len(text)]
        ######## add for unicode
        # elif isinstance(text, unicode):
        #     text = [self.dict.get(char, self.dict[u'-']) for char in text]
        #     length = [len(text)]
        elif isinstance(text, collections.Iterable):
            length = [len(text)]
            print(length)  # try printing this length: it keeps growing
            text = ''.join(str(v) for v in text)
            text, _ = self.encode(text)
        if depth:
            return text, len(text)
        return (torch.IntTensor(text), torch.IntTensor(length))
    ```


    opened by infinitisun 12
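The loop comes from the control flow: the two `if isinstance(text, str)` tests are separate statements, so after the first one converts a string to a list of indices, execution falls through to the `elif` Iterable branch, which joins those indices back into a digit string and calls `encode` again, so the recursion never terminates normally. A hedged sketch of a version without that fall-through, using plain lists instead of `torch.IntTensor` so it runs without PyTorch; `LabelConverter` is a hypothetical name, reserving index 0 for the CTC blank is an assumption, and `collections.abc.Iterable` replaces `collections.Iterable`, which newer Python versions no longer provide:

```python
from collections.abc import Iterable


class LabelConverter:
    """Hypothetical minimal stand-in for the label converter in utils.py."""

    def __init__(self, alphabet):
        # index 0 is reserved for the CTC blank, so characters start at 1
        self.dict = {char: i + 1 for i, char in enumerate(alphabet)}

    def encode(self, text):
        """Encode one string or a batch of strings.

        Returns (codes, lengths): the concatenated indices and one length per string.
        Unknown characters map to 0, mirroring the .get(char, 0) in the code above.
        """
        if isinstance(text, str):
            codes = [self.dict.get(char, 0) for char in text]
            return codes, [len(codes)]
        if isinstance(text, Iterable):
            texts = list(text)
            lengths = [len(s) for s in texts]
            codes, _ = self.encode("".join(texts))  # a str, so this recursion ends
            return codes, lengths
        raise TypeError("expected a str or an iterable of str")
```

The key changes are that the string case returns immediately and that the batch case joins the original strings rather than their encoded indices.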
  'NoneType' object has no attribute 'model_checkpoint_path'


    Hello, I get this error when running demo.py and haven't been able to track down the cause. Could anyone who got it working share how to fix it?

    ```
    File "/home/xshine6/Downloads/CHINESE-OCR/ctpn/ctpn/model.py", line 37, in load_tf_model
        reader = tf.train.NewCheckpointReader(ckpt.model_checkpoint_path)
    AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'
    ```

    opened by Wini1680 3
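The traceback shows that `ckpt` is None. If, as seems likely, it comes from `tf.train.get_checkpoint_state(...)`, that function returns None when the given directory has no `checkpoint` index file, typically because the CTPN checkpoint has not been downloaded or the configured path points elsewhere. A small stdlib-only sketch of an early check that fails with a clearer message; `require_checkpoint_dir` is a hypothetical helper:

```python
import os


def require_checkpoint_dir(checkpoint_dir):
    """Fail early if checkpoint_dir has no TensorFlow 'checkpoint' index file.

    Assumption: the loader uses tf.train.get_checkpoint_state(checkpoint_dir),
    which returns None in that case, so ckpt.model_checkpoint_path would fail later.
    """
    index_file = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.isfile(index_file):
        raise FileNotFoundError(
            "No 'checkpoint' index file in %r; download the CTPN checkpoint "
            "into this folder or fix the configured checkpoint path." % checkpoint_dir)
    return index_file
```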
  ModuleNotFoundError: No module named 'utils'


    What is causing this?

    ```
    Using TensorFlow backend.
    Traceback (most recent call last):
      File "demo.py", line 8, in <module>
        import model
      File "D:\QQ\CHINESE-OCR-master (2)\CHINESE-OCR-master\model.py", line 13, in <module>
        from crnn.crnn import crnnOcr
      File "D:\QQ\CHINESE-OCR-master (2)\CHINESE-OCR-master\crnn\crnn.py", line 11, in <module>
        import models.crnn as crnn
      File "./crnn\models\crnn.py", line 4, in <module>
        import utils
    ModuleNotFoundError: No module named 'utils'
    ```

    opened by Nicolasdeke 2
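One likely cause, here and in the Ubuntu report below, is that Python 3 dropped implicit relative imports (PEP 328): a bare `import utils` inside crnn/models/crnn.py is looked up on `sys.path`, not in the folder that contains that file. Two common fixes are changing the line to `from . import utils` (when utils.py sits next to it and `models` is imported as a package, as in `import models.crnn`) or putting that folder on `sys.path` before importing. A sketch of the second option with a hypothetical helper:

```python
import os
import sys


def ensure_on_path(directory):
    """Prepend the absolute form of directory to sys.path if it is not already there."""
    path = os.path.abspath(directory)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path
```

For example, call `ensure_on_path(os.path.join(repo_root, "crnn", "models"))` before `import model` in demo.py, where `repo_root` stands for your CHINESE-OCR checkout.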
  • Was the pretrained ocr model not trained with trainbatch.py?


    When training my own model (with modified keys), the result is always 00 or 000. Was the pretrained model perhaps not trained with trainbatch.py?


    1. The ocr/model.predict method only resizes the image to (32, width), then reshapes it to (32, width, 1) and scales the values to between 0 and 1.

    2. trainbatch.py and dataset.py, by contrast, reshape the data to (-1, 32, 256, 1) after resizing, and after scaling it to between 0 and 1 they also subtract 0.5 and then divide by 0.5.

    3. train.py uses Length = int(imgW/4)-1, whereas trainbatch.py uses Length = int(imgW/4)-2.



    opened by Jamsa 2
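Points 1 and 2 describe two different input scalings. A minimal sketch of both, assuming 8-bit grayscale input divided by 255 (the issue only says "between 0 and 1"); the function names are hypothetical:

```python
import numpy as np


def scale_like_predict(gray):
    # As described for ocr/model.predict: values mapped to [0, 1]
    # (assumption: 8-bit input divided by 255)
    return gray.astype(np.float32) / 255.0


def scale_like_trainbatch(gray):
    # As described for trainbatch.py / dataset.py: [0, 1], then (x - 0.5) / 0.5,
    # which maps the values to [-1, 1]
    return (gray.astype(np.float32) / 255.0 - 0.5) / 0.5
```

The two are related by train = 2 * predict - 1, so a black pixel is 0.0 under one scaling and -1.0 under the other. A network sees inputs in the range it was trained on only if inference uses the same scaling as training.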
  Question: demo.py keeps failing to run on Windows 10. What is wrong?


    The error message is below (I've been struggling with this for a long time; any pointers would be greatly appreciated):

    ```
    2020-08-25 16:21:11.287250: E tensorflow/stream_executor/cuda/cuda_dnn.cc:319] Loaded runtime CuDNN library: 7.4.1 but source was compiled with: 7.6.0. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
    Traceback (most recent call last):
      File "d:/python_files/CV/OCR/CHINESE-OCR/demo.py", line 24, in <module>
        img, model='keras', adjust=True, detectAngle=True)
      File "d:\python_files\CV\OCR\CHINESE-OCR\model.py", line 102, in model
        angle = angle_detect(img=np.copy(img))  ## text orientation detection
      File "d:\python_files\CV\OCR\CHINESE-OCR\angle\predict.py", line 66, in predict
        pred = model.predict(np.array([img]))
      File "C:\Users\AppData\Roaming\Python\Python36\site-packages\keras\engine\training.py", line 1462, in predict
        callbacks=callbacks)
      File "C:\Users\AppData\Roaming\Python\Python36\site-packages\keras\engine\training_arrays.py", line 324, in predict_loop
        batch_outs = f(ins_batch)
      File "C:\Users\AppData\Roaming\Python\Python36\site-packages\tensorflow_core\python\keras\backend.py", line 3476, in __call__
        run_metadata=self.run_metadata)
      File "C:\Users\AppData\Roaming\Python\Python36\site-packages\tensorflow_core\python\client\session.py", line 1472, in __call__
        run_metadata_ptr)
    tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
      (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
        [[{{node block1_conv1/convolution}}]]
        [[predictions_class/Softmax/_291]]
      (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
        [[{{node block1_conv1/convolution}}]]
    ```

    opened by lmw0320 1
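The first log line already names the mismatch: TensorFlow was built against cuDNN 7.6.0 but loaded 7.4.1 at runtime. The rule it quotes for cuDNN 7.0 and later (same major version, runtime minor version at least the compiled one) can be written as a small check; `cudnn_runtime_ok` is a hypothetical helper that encodes only the rule stated in this log:

```python
def cudnn_runtime_ok(runtime, compiled):
    """Apply the compatibility rule quoted in the TensorFlow log.

    runtime, compiled: version strings such as "7.4.1".
    For cuDNN 7.0 or later: same major version, and runtime minor >= compiled minor.
    """
    r_major, r_minor = (int(p) for p in runtime.split(".")[:2])
    c_major, c_minor = (int(p) for p in compiled.split(".")[:2])
    if c_major < 7:
        raise ValueError("the log only states this rule for cuDNN 7.0 or later")
    return r_major == c_major and r_minor >= c_minor
```

By this rule, upgrading the runtime library to 7.6.x satisfies the 7.6.0 build, while 7.4.1 does not.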
  Does the OCR part of the demo.py model not actually use CRNN?


    Following the suggestion in #83:

    For the error NameError: name 'basemodel' is not defined,
    download ocr0.2.h5 ( https://github.com/jiangxiluning/chinese-ocr/blob/master/ocr/ocr0.2.h5 ) and put it in ./CHINESE-OCR/ocr/

    I got demo.py running, but I noticed that when this ocr h5 is loaded, it does not seem to be using CRNN?

    I found that the OCR code for the demo.py model is at https://github.com/xiaofengShi/CHINESE-OCR/blob/46395d14802d876adc0ee0c943621d6b4e3ddf3a/ocr/model.py#L28

    ```python
    # Imports added for standalone Keras 2.x; ctc_lambda_func is not shown in this excerpt
    from keras.layers import (GRU, BatchNormalization, Bidirectional, Conv2D, Dense, Flatten, Input,
                              Lambda, MaxPooling2D, Permute, TimeDistributed, ZeroPadding2D)
    from keras.models import Model
    from keras.optimizers import SGD


    def get_model(height, nclass):
        rnnunit = 256
        input = Input(shape=(height, None, 1), name='the_input')
        m = Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same', name='conv1')(input)
        m = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='pool1')(m)
        m = Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same', name='conv2')(m)
        m = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='pool2')(m)
        m = Conv2D(256, kernel_size=(3, 3), activation='relu', padding='same', name='conv3')(m)
        m = Conv2D(256, kernel_size=(3, 3), activation='relu', padding='same', name='conv4')(m)
        m = ZeroPadding2D(padding=(0, 1))(m)
        m = MaxPooling2D(pool_size=(2, 2), strides=(2, 1), padding='valid', name='pool3')(m)
        m = Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same', name='conv5')(m)
        m = BatchNormalization(axis=1)(m)
        m = Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same', name='conv6')(m)
        m = BatchNormalization(axis=1)(m)
        m = ZeroPadding2D(padding=(0, 1))(m)
        m = MaxPooling2D(pool_size=(2, 2), strides=(2, 1), padding='valid', name='pool4')(m)
        m = Conv2D(512, kernel_size=(2, 2), activation='relu', padding='valid', name='conv7')(m)
        # Is the output layout of m HWC?
        # Permute reorders the input dimensions according to the given pattern; it is
        # typically needed, for example, when connecting a CNN to an RNN
        # Convert the layout to WHC
        m = Permute((2, 1, 3), name='permute')(m)
        m = TimeDistributed(Flatten(), name='timedistrib')(m)
        m = Bidirectional(GRU(rnnunit, return_sequences=True), name='blstm1')(m)
        m = Dense(rnnunit, name='blstm1_out', activation='linear')(m)
        m = Bidirectional(GRU(rnnunit, return_sequences=True), name='blstm2')(m)
        y_pred = Dense(nclass, name='blstm2_out', activation='softmax')(m)
        basemodel = Model(inputs=input, outputs=y_pred)
        labels = Input(name='the_labels', shape=[None, ], dtype='float32')
        input_length = Input(name='input_length', shape=[1], dtype='int64')
        label_length = Input(name='label_length', shape=[1], dtype='int64')
        loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
        model = Model(inputs=[input, labels, input_length, label_length], outputs=[loss_out])
        sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)
        # model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adadelta')
        model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)
        # model.summary()
        return model, basemodel
    ```
    opened by sunset1234321 1
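The number of time steps T that reaches the CTC loss is the width of the feature map after the last convolution above. A pure-Python sketch that traces (height, width) through the pooling, padding and 2×2 'valid' layers of this `get_model`, assuming Keras' default channels_last layout and that the 'same'-padded stride-1 convolutions keep the size; the helper names are hypothetical:

```python
def valid_out(size, kernel, stride):
    """Output size of a 'valid' pooling or convolution layer along one axis."""
    return (size - kernel) // stride + 1


def crnn_output_shape(height, width):
    """Trace (height, width) through the size-changing layers of get_model()."""
    h, w = height, width
    h, w = valid_out(h, 2, 2), valid_out(w, 2, 2)  # pool1
    h, w = valid_out(h, 2, 2), valid_out(w, 2, 2)  # pool2
    w += 2                                         # ZeroPadding2D(padding=(0, 1)): 1 column each side
    h, w = valid_out(h, 2, 2), valid_out(w, 2, 1)  # pool3, strides=(2, 1)
    w += 2                                         # ZeroPadding2D(padding=(0, 1))
    h, w = valid_out(h, 2, 2), valid_out(w, 2, 1)  # pool4, strides=(2, 1)
    h, w = valid_out(h, 2, 1), valid_out(w, 2, 1)  # conv7, 2x2 'valid'
    return h, w
```

For a 32×256 input this gives a 1×65 feature map, so T = 65, and in general T = W // 4 + 1 for W ≥ 4. The CTC input_length must not exceed the number of steps actually passed to the loss, so this figure is useful when comparing the Length formulas in the earlier issue.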
  ModuleNotFoundError: No module named 'keys_ocr'


    keys_ocr.py is clearly in the same folder, so why can't it be found? Thanks.

    ```
    Traceback (most recent call last):
      File "demo.py", line 8, in <module>
        import keras_model
      File "/Users/yanwenqian/Desktop/CHINESE-OCR/keras_model.py", line 13, in <module>
        from ocr.model import predict as ocr
      File "/Users/yanwenqian/Desktop/CHINESE-OCR/ocr/model.py", line 10, in <module>
        import keys_ocr
    ModuleNotFoundError: No module named 'keys_ocr'
    ```

    opened by wyan00 0
  Error when running demo.py on Ubuntu 16.04


    ```
    Traceback (most recent call last):
      File "demo.py", line 8, in <module>
        import model
      File "/home/dc2-user/dlprojects/CHINESE-OCR1/model.py", line 13, in <module>
        from crnn.crnn import crnnOcr
      File "/home/dc2-user/dlprojects/CHINESE-OCR1/crnn/crnn.py", line 11, in <module>
        import models.crnn as crnn
      File "./crnn/models/crnn.py", line 4, in <module>
        import utils
    ImportError: No module named 'utils'
    ```

    But there really is a utils.py file in the .\CHINESE-OCR1\crnn\models directory.

    opened by moFang222 0
  !): Make pytorch train in GPU work.


    Fix https://github.com/xiaofengShi/CHINESE-OCR/issues/17

    ref: https://github.com/Sierkinhane/crnn_chinese_characters_rec/blob/master/utils.py

    thx @Sierkinhane

    opened by tbfly 0
  • Training with the author's pytorch-train/crnn.main.py, the accuracy stays at 0


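    I trained with the author's dataset and source code, using python3.6 + pytorch1.3, and fixed three bugs along the way:

    1. The line 'cpu_texts = [clean_txt(tx.decode('utf-8')) for tx in cpu_texts]' raised an error saying str has no decode method. After some searching this seemed to be a Python 2 vs Python 3 difference, so I changed the code to 'cpu_texts = [clean_txt(tx.encode('utf-8').decode('utf-8')) for tx in cpu_texts]';
    2. In the clean_texts function, the author replaces characters that are not in the dictionary with a space, but that causes an error during lookup, so I changed newTxt += u' ' in that function to newTxt += u'', i.e. characters that are not found are simply dropped;
    3. The predicted preds raised an error in later processing (IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)), so I deleted the line 'preds = preds.squeeze(2)'. I don't know whether my changes broke the feature dimensions or something else, but the accuracy during training stays at 0. Could anyone help explain this? Many thanks! https://imgtu.com/i/hy8weP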
    opened by yyyyykp 2
  • Error: File "D:/新建文件夹 (4)/pythonProject5/CHINESE-OCR-master/ctpn/lib/utils/bbox.py", line 14 ctypedef np.float_t DTYPE_t ^ SyntaxError: invalid syntax

    ```
      File "D:/新建文件夹 (4)/pythonProject5/CHINESE-OCR-master/ctpn/lib/utils/bbox.py", line 14
        ctypedef np.float_t DTYPE_t
        ^
    SyntaxError: invalid syntax
    ```

    Does anyone know how to fix this?

    opened by elist11 4
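`ctypedef` is Cython syntax, so that file is Cython source meant to be compiled (the make.sh step running `python setup.py build_ext --inplace`, described in the next issue) rather than executed by the Python interpreter. If compiling is not an option, one workaround is a pure-NumPy replacement. Below is a sketch of a Faster R-CNN-style box-overlap (IoU) matrix with inclusive pixel coordinates, hence the +1; that it matches the function in this repo's bbox file is an assumption, so compare it with the original before swapping it in:

```python
import numpy as np


def bbox_overlaps(boxes, query_boxes):
    """IoU between every box in `boxes` (N, 4) and every box in `query_boxes` (K, 4).

    Boxes are [x1, y1, x2, y2] with inclusive pixel coordinates. Returns an (N, K) array.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    query_boxes = np.asarray(query_boxes, dtype=np.float64)
    # Intersection corners, broadcast to shape (N, K)
    ix1 = np.maximum(boxes[:, None, 0], query_boxes[None, :, 0])
    iy1 = np.maximum(boxes[:, None, 1], query_boxes[None, :, 1])
    ix2 = np.minimum(boxes[:, None, 2], query_boxes[None, :, 2])
    iy2 = np.minimum(boxes[:, None, 3], query_boxes[None, :, 3])
    # Non-overlapping pairs get zero width or height
    iw = np.clip(ix2 - ix1 + 1, 0, None)
    ih = np.clip(iy2 - iy1 + 1, 0, None)
    inter = iw * ih
    area_b = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    area_q = (query_boxes[:, 2] - query_boxes[:, 0] + 1) * (query_boxes[:, 3] - query_boxes[:, 1] + 1)
    union = area_b[:, None] + area_q[None, :] - inter
    return inter / union
```

Vectorising with broadcasting keeps it reasonably fast without Cython, at the cost of an (N, K) intermediate array per coordinate.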
  • After more than two days of setup, I finally got it working on windows10 + anaconda3 + python3.6 + pytorch, but the speed and accuracy are disappointing.


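    After more than two days I finally got it configured on windows10 + anaconda3 + python3.6 + pytorch. There were three main points:
    1. Install the GPU version of PyTorch and verify that it works.
    2. Look carefully at the author's setup.sh: it runs sh make.sh, and that make.sh runs python setup.py build_ext --inplace, which should generate bbox.py and cython_nms.py. Since this build often fails, you can refer to https://github.com/xiaofengShi/CHINESE-OCR/issues/130 and re-save bbox.py and cython_nms.py.
    3. Download the required resources, mainly the ctpn checkpoint and the angle model modelAngle.h5, ocr0.2.h5, etc.; see https://www.jianshu.com/p/58671f61e886. Also, the author uses his own absolute paths in many files, so do a global find-and-replace with your own paths, for example by searching for keywords such as "xiaofeng".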

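    However, the speed and accuracy are disappointing. As shown in the attached screenshot, running the ./test/4.jpg image took more than 300 seconds on a GTX2060 GPU. I'm not sure whether it's a problem with my computer.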


    opened by QiTianDaShengMaoHuiFei 4
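Step 3 above suggests searching the code for hard-coded absolute paths. A small stdlib sketch of that search, listing files under a directory whose text contains a keyword such as "xiaofeng"; the helper name and the default extensions are choices made for this example:

```python
import os


def find_files_containing(root, keyword, exts=(".py", ".sh")):
    """Return sorted paths under root whose text contains keyword."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    if keyword in fh.read():
                        hits.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return sorted(hits)
```

Run it from the repository root, e.g. `find_files_containing(".", "xiaofeng")`, then edit the listed files to use your own paths.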
