Deep Learning Chinese Word Segment

Overview

References

  The BiLSTM+CRF model in this project follows the paper http://www.aclweb.org/anthology/N16-1030 ; the IDCNN+CRF model follows the paper https://arxiv.org/abs/1702.02098

Build

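  1. Install the Bazel build tool and TensorFlow (this project currently requires TF 1.0.0alpha or later)

  2. Change to the project's source directory and run ./configure

  3. Build the backend service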

    bazel build //kcws/cc:seg_backend_api

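Training

  1. Follow the WeChat official account 待字闺中 and reply "kcws" to get the corpus download link.

  2. Extract the corpus into a directory

  3. Change to the code directory and run:

python kcws/train/process_anno_file.py <corpus directory> pre_chars_for_w2v.txt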

bazel build third_party/word2vec:word2vec

First, build a preliminary vocabulary:

./bazel-bin/third_party/word2vec/word2vec -train pre_chars_for_w2v.txt -save-vocab pre_vocab.txt -min-count 3

Handle low-frequency words:

python kcws/train/replace_unk.py pre_vocab.txt pre_chars_for_w2v.txt chars_for_w2v.txt

Train word2vec:

./bazel-bin/third_party/word2vec/word2vec -train chars_for_w2v.txt -output vec.txt -size 50 -sample 1e-4 -negative 5 -hs 1 -binary 0 -iter 5

Build the training-corpus generation tool:

bazel build kcws/train:generate_training

Generate the corpus:

./bazel-bin/kcws/train/generate_training vec.txt <corpus directory> all.txt

Produce the train.txt and test.txt files:

python kcws/train/filter_sentence.py all.txt

  4. Install TensorFlow, change to the kcws code directory, and run:

python kcws/train/train_cws.py --word2vec_path vec.txt --train_data_path <absolute path to train.txt> --test_data_path test.txt --max_sentence_len 80 --learning_rate 0.001

(The IDCNN model is used by default; pass "--use_idcnn False" to switch to the BiLSTM model.)

  5. Generate the vocab

bazel build kcws/cc:dump_vocab

./bazel-bin/kcws/cc/dump_vocab vec.txt kcws/models/basic_vocab.txt

  6. Export the trained model

python tools/freeze_graph.py --input_graph logs/graph.pbtxt --input_checkpoint logs/model.ckpt --output_node_names "transitions,Reshape_7" --output_graph kcws/models/seg_model.pbtxt

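  7. Download the POS tagging model (a temporary solution; later documentation will cover training and exporting the POS tagging model)

    Download pos_model.pbtxt from https://pan.baidu.com/s/1bYmABk into the kcws/models/ directory

  8. Run the web service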

./bazel-bin/kcws/cc/seg_backend_api --model_path=kcws/models/seg_model.pbtxt --vocab_path=kcws/models/basic_vocab.txt --max_sentence_len=80

(For --model_path, use the absolute path to seg_model.pbtxt.)
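
As a quick sanity check on the word2vec step above, the sketch below reads the header of a text-format vector file (as written with -binary 0) and confirms that the vector dimension matches the -size value used there (50). It assumes the usual word2vec text layout: a first line of "<vocab size> <dimension>" followed by one "<token> <values...>" line per entry. The function name w2v_dimension is made up for illustration and is not part of kcws.

```python
import io


def w2v_dimension(stream):
    """Return (vocab_size, dim) read from a text-format word2vec stream.

    Assumes the first line is "<vocab size> <dimension>" and each following
    line is "<token> <v1> ... <vN>" (hypothetical helper, not kcws code).
    """
    header = stream.readline().split()
    if len(header) != 2:
        raise ValueError("expected a '<vocab size> <dimension>' header line")
    vocab_size, dim = int(header[0]), int(header[1])

    # Cross-check the header against the first vector row, if there is one.
    first_row = stream.readline().split()
    if first_row and len(first_row) - 1 != dim:
        raise ValueError(
            "header says %d dimensions but first row has %d values"
            % (dim, len(first_row) - 1)
        )
    return vocab_size, dim


if __name__ == "__main__":
    # Tiny in-memory stand-in for vec.txt; a real file would be opened with
    # io.open("vec.txt", encoding="utf-8") instead.
    sample = io.StringIO(u"2 3\n</s> 0.1 0.2 0.3 \n我 0.4 0.5 0.6 \n")
    print(w2v_dimension(sample))
```

One of the issues listed further down reports an AssertionError in train_cws_lstm.py when the vector dimension and the embedding_size flag disagree, so checking the dimension before training may save a failed run.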

POS tagging training instructions:

https://github.com/koth/kcws/blob/master/pos_train.md

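Custom dictionary

Custom dictionaries are currently supported at the decoding stage; see kcws/cc/test_seg.cc for usage. The dictionary is a plain-text file in which each line has the following format:

<custom entry>\t<weight>

For example: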

蓝瘦香菇 4

The weight is a positive integer, typically 4 or higher; the larger the weight, the more important the entry.
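
A minimal sketch of reading and writing dictionary lines in the format above, assuming a literal tab separates the entry from its weight and that the weight must be a positive integer as stated. The helper names parse_dict_line and format_dict_line are hypothetical and not part of kcws; the project's own usage example is kcws/cc/test_seg.cc.

```python
# -*- coding: utf-8 -*-


def parse_dict_line(line):
    """Split one "<entry><TAB><weight>" line into (entry, weight)."""
    entry, sep, weight_text = line.rstrip("\r\n").partition("\t")
    if not sep or not entry:
        raise ValueError("expected '<entry><TAB><weight>', got %r" % line)
    weight = int(weight_text)  # raises ValueError for non-integer weights
    if weight <= 0:
        raise ValueError("weight must be a positive integer, got %d" % weight)
    return entry, weight


def format_dict_line(entry, weight):
    """Render (entry, weight) as one dictionary line, without a newline."""
    if "\t" in entry or "\n" in entry:
        raise ValueError("entry must not contain tabs or newlines")
    if int(weight) != weight or weight <= 0:
        raise ValueError("weight must be a positive integer")
    return "%s\t%d" % (entry, weight)


if __name__ == "__main__":
    # Round-trip the example entry from the text above.
    line = format_dict_line(u"蓝瘦香菇", 4)
    print(parse_dict_line(line))
```

Validating lines before handing the file to the decoder is a design choice of this sketch; how kcws itself reacts to malformed lines is not covered here.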

demo

http://45.32.100.248:9090/

Appendix: a company-name recognition demo trained with the same model:

http://45.32.100.248:18080

Comments
  • bazel build //kcws/cc:seg_backend_api fails with an error

    ERROR: /root/kcws/third_party/gflags/BUILD:12:1: Executing genrule //third_party/gflags:gflags-srcs failed: bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 77.

    opened by maczhao 15
  • ERROR: Analysis of target '//kcws/cc:seg_backend_api' failed; build aborted.

    Hi, when I build kcws I run into some issues. How can I fix them?

    The issues are as follows:

    [root@bio-x-2 cc]# /opt/BioDir/dl/bazel-0.4.3/output/bazel build //kcws/cc:seg_backend_api WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.build/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing. WARNING: /root/.cache/bazel/_bazel_root/067d099fd5fd2abf4236febace697e72/external/org_tensorflow/tensorflow/workspace.bzl:13:5: path_prefix was specified to tf_workspace but is no longer used and will be removed in the future. WARNING: /root/.cache/bazel/_bazel_root/067d099fd5fd2abf4236febace697e72/external/org_tensorflow/tensorflow/workspace.bzl:15:5: tf_repo_name was specified to tf_workspace but is no longer used and will be removed in the future. ERROR: /root/.cache/bazel/_bazel_root/067d099fd5fd2abf4236febace697e72/external/org_tensorflow/tensorflow/core/platform/default/build_config/BUILD:108:1: error loading package '@jpeg//': Extension file not found. Unable to load package for '//third_party:common.bzl': BUILD file not found on package path and referenced by '@org_tensorflow//tensorflow/core/platform/default/build_config:jpeg'. ERROR: Analysis of target '//kcws/cc:seg_backend_api' failed; build aborted. INFO: Elapsed time: 2.612s

    ================= I built Bazel as follows:

    [root@bio-x-2 bazel-0.4.3]# bash ./compile.sh INFO: You can skip this first step by providing a path to the bazel binary as second argument: INFO: ./compile.sh compile /path/to/bazel  Building Bazel from scratch.......  Building Bazel with Bazel. .WARNING: /tmp/bazel_lAI1U4my/out/external/bazel_tools/WORKSPACE:1: Workspace name in /tmp/bazel_lAI1U4my/out/external/bazel_tools/WORKSPACE (@io_bazel) does not match the name given in the repository's definition (@bazel_tools); this will cause a build error in future versions. INFO: Found 1 target... INFO: From Compiling third_party/ijar/platform_utils.cc [for host]: third_party/ijar/platform_utils.cc: In function 'bool devtools_ijar::write_file(const char*, mode_t, const void*, size_t)': third_party/ijar/platform_utils.cc:67:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (write(fd, data, size) != size) { ^ INFO: From Compiling third_party/ijar/platform_utils.cc: third_party/ijar/platform_utils.cc: In function 'bool devtools_ijar::write_file(const char*, mode_t, const void*, size_t)': third_party/ijar/platform_utils.cc:67:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (write(fd, data, size) != size) { ^ INFO: From Compiling third_party/ijar/ijar.cc: third_party/ijar/ijar.cc: In member function 'virtual bool devtools_ijar::JarStripperProcessor::Accept(const char*, devtools_ijar::u4)': third_party/ijar/ijar.cc:66:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (filename_len >= CLASS_EXTENSION_LENGTH) { ^ INFO: From Compiling third_party/ijar/ijar.cc [for host]: third_party/ijar/ijar.cc: In member function 'virtual bool devtools_ijar::JarStripperProcessor::Accept(const char*, devtools_ijar::u4)': third_party/ijar/ijar.cc:66:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (filename_len >= CLASS_EXTENSION_LENGTH) { ^ INFO: From Compiling 
src/main/cpp/blaze_util_posix.cc: src/main/cpp/blaze_util_posix.cc: In function 'void blaze::Daemonize(const string&)': src/main/cpp/blaze_util_posix.cc:190:28: warning: ignoring return value of 'int dup(int)', declared with attribute warn_unused_result [-Wunused-result] (void) dup(STDOUT_FILENO); // stderr (2>&1) ^ src/main/cpp/blaze_util_posix.cc: In function 'uint64_t blaze::AcquireLock(const string&, bool, bool, blaze::BlazeLock*)': src/main/cpp/blaze_util_posix.cc:578:30: warning: ignoring return value of 'int ftruncate(int, __off_t)', declared with attribute warn_unused_result [-Wunused-result] (void) ftruncate(lockfd, 0); ^ src/main/cpp/blaze_util_posix.cc:583:47: warning: ignoring return value of 'ssize_t write(int, const void*, size_t)', declared with attribute warn_unused_result [-Wunused-result] (void) write(lockfd, msg.data(), msg.size()); ^ INFO: From JavacBootstrap src/java_tools/buildjar/java/com/google/devtools/build/buildjar/libbootstrap_JarOwner.jar [for host]: warning: Implicitly compiled files were not subject to annotation processing. Use -proc:none to disable annotation processing or -implicit to specify a policy for implicit compilation. 1 warning INFO: From Building src/main/protobuf/libextra_actions_base_java_proto.jar (1 source jar): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/java_tools/junitrunner/java/com/google/testing/coverage/JacocoCoverage.jar (9 source files): Note: src/java_tools/junitrunner/java/com/google/testing/coverage/MethodProbesMapper.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/tools/android/java/com/google/devtools/build/android/ziputils/libziputils_lib.jar (12 source files): Note: Some input files use unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. 
INFO: From Building src/main/java/com/google/devtools/build/lib/libconcurrent.jar (18 source files): Note: src/main/java/com/google/devtools/build/lib/concurrent/AbstractQueueVisitor.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building third_party/java/apkbuilder/apkbuilder.jar (15 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libutil.jar (45 source files): Note: src/main/java/com/google/devtools/build/lib/util/OrderedSetMultimap.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/cmdline/libcmdline.jar (10 source files): Note: src/main/java/com/google/devtools/build/lib/cmdline/RepositoryName.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/skyframe/libskyframe.jar (67 source files): Note: src/main/java/com/google/devtools/build/skyframe/ReverseDepsUtilImpl.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libsyntax.jar (86 source files): Note: src/main/java/com/google/devtools/build/lib/syntax/BuiltinFunction.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. Note: Some input files use unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libpackages-internal.jar (98 source files): Note: Some input files use unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. 
INFO: From Building src/main/java/com/google/devtools/build/lib/actions/libactions.jar (91 source files): Note: src/main/java/com/google/devtools/build/lib/actions/Actions.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libbuild-base.jar (381 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. Note: Some input files use unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libproto-rules.jar (13 source files): Note: src/main/java/com/google/devtools/build/lib/rules/proto/ProtoCommon.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/query2/libquery2.jar (12 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/query2/libquery-output.jar (10 source files): Note: src/main/java/com/google/devtools/build/lib/query2/output/QueryOutputUtils.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/rules/genquery/libgenquery.jar (2 source files): Note: src/main/java/com/google/devtools/build/lib/rules/genquery/GenQuery.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/rules/cpp/libcpp.jar (80 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. 
INFO: From Building src/main/java/com/google/devtools/build/lib/libpython-rules.jar (15 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libjava-compilation.jar (37 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. Note: src/main/java/com/google/devtools/build/lib/rules/java/JavaCompileAction.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libjava-rules.jar (32 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libandroid-rules.jar (59 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libideinfo.jar (4 source files): Note: src/main/java/com/google/devtools/build/lib/ideinfo/AndroidStudioInfoAspect.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/rules/objc/libobjc.jar (114 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. Note: src/main/java/com/google/devtools/build/lib/rules/objc/IterableWrapper.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libruntime.jar (94 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. 
INFO: From Building src/main/java/com/google/devtools/build/lib/sandbox/libsandbox.jar (16 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/worker/libworker.jar (11 source files): Note: src/main/java/com/google/devtools/build/lib/worker/WorkerSpawnStrategy.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libbazel-rules.jar (87 source files, 14 resources): Note: src/main/java/com/google/devtools/build/lib/bazel/rules/java/BazelJavaSemantics.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. Target //src:bazel up-to-date: bazel-bin/src/bazel INFO: Elapsed time: 178.725s, Critical Path: 170.17s WARNING: /tmp/bazel_lAI1U4my/out/external/bazel_tools/WORKSPACE:1: Workspace name in /tmp/bazel_lAI1U4my/out/external/bazel_tools/WORKSPACE (@io_bazel) does not match the name given in the repository's definition (@bazel_tools); this will cause a build error in future versions.

    Build successful! Binary is here: /opt/BioDir/dl/bazel-0.4.3/output/bazel

    opened by Sun-shan 12
  • error when run bazel build //kcws/cc:seg_backend_api

    ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Encountered error while reading extension file 'tensorflow/workspace.bzl': no such package '@org_tensorflow//tensorflow': local_repository rule //external:org_tensorflow must specify an existing directory. INFO: Elapsed time: 0.049s

    build on: centos6.8 x64 no gpu support Build label: 0.4.1- (@non-git) tensorflow-0.11.0

    opened by busyfree 11
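  • Question about the POS tagging part

    Hello, yesterday I studied the newly added POS tagging module closely and found that a few steps seem to have problems. I tried changing them myself and it now runs end to end, with 99.57% accuracy. Could you take a look? The problems are: 1. In step 5, the argument "lines_withpos.txt" is passed in, but the code never writes anything to it; I think the code should write out each tag together with its index. 2. In step 6, the third argument should be the dictionary generated in the previous step, "lines_withpos.txt", rather than "pos_vocab.txt".

    Does this look correct to you?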

    opened by oneapmlj 7
  • gflags link failed

    Linking against the third-party gflags failed.

    Fixed by using a self-compiled gflags; this may be a gflags version issue. The following modification was made to the BUILD files:

    
    --- a/third_party/glog/BUILD
    +++ b/third_party/glog/BUILD
    @@ -45,10 +45,7 @@ cc_library(
             "include/glog/stl_logging.h",
             "include/glog/vlog_is_on.h",
         ],
    -    deps = [
    -      "//third_party/gflags:gflags-cxx",
    -
    -    ],
    +    linkopts = ["-lgflags"],
         hdrs = [
             "include/glog/logging.h",
         ],
    
    opened by Vimos 7
  • Runtime error after changing the maximum value of max_word_num

    koth, a question: I changed DEFINE_int32(max_word_num, 300, "max num of word per sentence "); in seg_backend_api.cc, setting the value to 300, because the sentences I am testing contain many characters. At runtime I get the following error: E0918 11:23:35.434610 26934 tfmodel.cc:88] Error during inference: Invalid argument: Input to reshape is a tensor with 640 values, but the requested shape requires a multiple of 1200 [[Node: Reshape_7 = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _output_shapes=[[?,300,4]], _device="/job:localhost/replica:0/task:0/cpu: 0"](idcnn_1/scores, Reshape_7/shape)]] 2017-09-18 11:23:35.434675: E kcws/cc/tf_seg_model.cc:321] Error during inference:

    In this case, do I need to retrain the word_vocab.txt file under models, or is it some other problem? If word_vocab.txt is the issue, how is that text file trained? Thanks for your help.

    opened by younger911 6
  • F tensorflow/core/platform/cpu_feature_guard.cc:35] The TensorFlow library was compiled to use AVX2 instructions, but these aren't available on your machine.

    root@nlpDemo:/mnt/kcws# export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.1-cp27-none-linux_x86_64.whl root@nlpDemo:/mnt/kcws# pip install --upgrade $TF_BINARY_URL

    TensorFlow installed this way runs fine, but starting this project throws the error above.

    opened by weisong82 6
  • About the default segmentation quality

    After following the instructions, I get the segmentation result below. It is not very accurate; is this normal? { "msg": "OK", "segments": [ "赵雅", "淇", "洒泪", "道", "歉", " ", "和林", "丹", "没", "有", "任", "何", "经济", "关", "系" ], "status": 0 }

    opened by dengzz 5
  • embedding_size  AssertionError

    At the final training step, that is, when running: python kcws/train/train_cws_lstm.py --word2vec_path vec.txt --train_data_path <absolute path to train.txt> --test_data_path test.txt --max_sentence_len 80 --learning_rate 0.001

    the following error is reported: Traceback (most recent call last): File "kcws/train/train_cws_lstm.py", line 262, in tf.app.run() File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run sys.exit(main(sys.argv[:1] + flags_passthrough)) File "kcws/train/train_cws_lstm.py", line 228, in main FLAGS.word2vec_path, FLAGS.num_hidden) File "kcws/train/train_cws_lstm.py", line 62, in init self.c2v = self.load_w2v(c2vPath) File "kcws/train/train_cws_lstm.py", line 132, in load_w2v assert (dim == (FLAGS.embedding_size)) AssertionError

    I then changed tf.app.flags.DEFINE_integer("embedding_size", 50, "embedding size") in train_cws_lstm.py to tf.app.flags.DEFINE_integer("embedding_size", 200, "embedding size"), and it worked.

    opened by rockyzhengwu 5
  • MemoryError at the last step of the POS tagging model

    $ python tools/freeze_graph.py --input_graph pos_logs/graph.pbtxt --input_checkpoint pos_logs/model.ckpt --output_node_names "transitions,Reshape_9" --output_graph kcws/models/pos_model.pbtxt Traceback (most recent call last): File "tools/freeze_graph.py", line 202, in app.run(main=main, argv=[sys.argv[0]] + unparsed) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run _sys.exit(main(_sys.argv[:1] + flags_passthrough)) File "tools/freeze_graph.py", line 134, in main FLAGS.variable_names_blacklist) File "tools/freeze_graph.py", line 93, in freeze_graph text_format.Merge(f.read().decode("utf-8"), input_graph_def) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 525, in Merge descriptor_pool=descriptor_pool) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 579, in MergeLines return parser.MergeLines(lines, message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 612, in MergeLines self._ParseOrMerge(lines, message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 627, in _ParseOrMerge self._MergeField(tokenizer, message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 727, in _MergeField merger(tokenizer, message, field) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 815, in _MergeMessageField self._MergeField(tokenizer, sub_message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 727, in _MergeField merger(tokenizer, message, field) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 815, in _MergeMessageField self._MergeField(tokenizer, sub_message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 727, in _MergeField merger(tokenizer, message, field) File 
"/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 815, in _MergeMessageField self._MergeField(tokenizer, sub_message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 727, in _MergeField merger(tokenizer, message, field) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 815, in _MergeMessageField self._MergeField(tokenizer, sub_message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 714, in _MergeField tokenizer.Consume(':') File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1078, in Consume if not self.TryConsume(token): File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1065, in TryConsume self.NextToken() File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1314, in NextToken match = self._TOKEN.match(self._current_line, self._column) MemoryError

    opened by kinghuangdd 4
  • New error when building the backend service

    Hello, running the command bazel build //kcws/cc:seg_backend_api fails with the following errors:

    ERROR: /home/di/pycharmProjects/segment/kcws/third_party/gflags/BUILD:5:1: Reassignment of builtin build function 'package_name' not permitted. ERROR: /home/di/pycharmProjects/segment/kcws/third_party/glog/BUILD:5:1: Reassignment of builtin build function 'package_name' not permitted. ERROR: /home/di/pycharmProjects/segment/kcws/third_party/gflags/BUILD:41:1: Target '//third_party/gflags:empty.cc' contains an error and its package is in error and referenced by '//third_party/gflags:gflags-cxx'. ERROR: /home/di/pycharmProjects/segment/kcws/third_party/gflags/BUILD:41:1: Target '//third_party/gflags:include/gflags/gflags_declare.h' contains an error and its package is in error and referenced by '//third_party/gflags:gflags-cxx'. ERROR: /home/di/pycharmProjects/segment/kcws/third_party/gflags/BUILD:41:1: Target '//third_party/gflags:lib/libgflags.a' contains an error and its package is in error and referenced by '//third_party/gflags:gflags-cxx'. ERROR: /home/di/pycharmProjects/segment/kcws/third_party/gflags/BUILD:41:1: Target '//third_party/gflags:include/gflags/gflags.h' contains an error and its package is in error and referenced by '//third_party/gflags:gflags-cxx'. ERROR: /home/di/pycharmProjects/segment/kcws/base/BUILD:3:1: Target '//third_party/gflags:gflags-cxx' contains an error and its package is in error and referenced by '//base:base'. ERROR: /home/di/pycharmProjects/segment/kcws/base/BUILD:3:1: Target '//third_party/glog:glog-cxx' contains an error and its package is in error and referenced by '//base:base'. ERROR: Analysis of target '//kcws/cc:seg_backend_api' failed; build aborted. INFO: Elapsed time: 0.167s

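    Running bazel build third_party/word2vec:word2vec succeeds, but other commands such as bazel build kcws/train:generate_training and bazel build kcws/cc:dump_vocab all fail with errors similar to the above. Adding licenses(["notice"]) to the BUILD files did not help either. What could be the cause? If you have time, could you take a look? Many thanks!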

    opened by yufengzhixing 4
  • Error when building the backend service

    WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files: /home/cly/github/kcws/tools/bazel.rc INFO: Writing tracer profile to '/home/cly/.cache/bazel/_bazel_cly/271de499a4ab5fb7350261a41335ecd2/command.profile.gz' ERROR: /home/cly/github/kcws/WORKSPACE:5:1: name 'new_http_archive' is not defined ERROR: /home/cly/github/kcws/WORKSPACE:18:1: name 'new_http_archive' is not defined ERROR: /home/cly/github/kcws/WORKSPACE:34:1: name 'http_archive' is not defined ERROR: error loading package '': Encountered error while reading extension file 'tools/build_defs/repo/http.bzl': no such package '@bazel_tools//tools/build_defs/repo': error loading package 'external': Could not load //external package ERROR: error loading package '': Encountered error while reading extension file 'tools/build_defs/repo/http.bzl': no such package '@bazel_tools//tools/build_defs/repo': error loading package 'external': Could not load //external package INFO: Elapsed time: 0.032s INFO: 0 processes. FAILED: Build did NOT complete successfully (0 packages loaded)

    opened by lingyiliu016 2
  • error C++ compilation of rule '@protobuf//:protobuf' failed (Exit 2). cl: command line error D8021: invalid numeric argument "/Wwrite-strings"

    ERROR: C:/users/thomas/appdata/local/temp/_bazel_thomas/infhcau0/external/protob uf/BUILD:113:1: C++ compilation of rule '@protobuf//:protobuf' failed (Exit 2): cl.exe failed: error executing command cd C:/users/thomas/appdata/local/temp/_bazel_thomas/infhcau0/execroot/main

    SET INCLUDE=F:\Tools\Microsoft Visual Studio 14.0\VC\INCLUDE;F:\Tools\Microsof t Visual Studio 14.0\VC\ATLMFC\INCLUDE;C:\Program Files (x86)\Windows Kits\10\in clude\10.0.14393.0\ucrt;C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\inclu de\um;C:\Program Files (x86)\Windows Kits\10\include\10.0.14393.0\shared;C:\Prog ram Files (x86)\Windows Kits\10\include\10.0.14393.0\um;C:\Program Files (x86)\W indows Kits\10\include\10.0.14393.0\winrt; SET LIB=F:\Tools\Microsoft Visual Studio 14.0\VC\LIB\amd64;F:\Tools\Microsof t Visual Studio 14.0\VC\ATLMFC\LIB\amd64;C:\Program Files (x86)\Windows Kits\10
    lib\10.0.14393.0\ucrt\x64;C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib \um\x64;C:\Program Files (x86)\Windows Kits\10\lib\10.0.14393.0\um\x64; SET PATH=F:\Tools\Microsoft Visual Studio 14.0\Common7\IDE\CommonExtensions
    Microsoft\TestWindow;F:\Tools\Microsoft Visual Studio 14.0\VC\BIN\amd64;C:\WINDO WS\Microsoft.NET\Framework64\v4.0.30319;F:\Tools\Microsoft Visual Studio 14.0\VC \VCPackages;F:\Tools\Microsoft Visual Studio 14.0\Common7\IDE;F:\Tools\Microsoft Visual Studio 14.0\Common7\Tools;F:\Tools\Microsoft Visual Studio 14.0\Team Too ls\Performance Tools\x64;F:\Tools\Microsoft Visual Studio 14.0\Team Tools\Perfor mance Tools;C:\Program Files (x86)\Windows Kits\10\bin\x64;C:\Program Files (x86 )\Windows Kits\10\bin\x86;C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\b in\NETFX 4.6.1 Tools\x64;;C:\WINDOWS\system32 SET PWD=/proc/self/cwd SET TEMP=C:\Users\Thomas\AppData\Local\Temp SET TMP=C:\Users\Thomas\AppData\Local\Temp F:/Tools/Microsoft Visual Studio 14.0/VC/bin/amd64/cl.exe /c external/protobuf /src/google/protobuf/struct.pb.cc /Fobazel-out/msvc_x64-fastbuild/bin/external/p rotobuf/objs/protobuf/external/protobuf/src/google/protobuf/struct.pb.o /nologo /DCOMPILER_MSVC /DNOMINMAX /D_WIN32_WINNT=0x0600 /D_CRT_SECURE_NO_DEPRECATE /D CRT_SECURE_NO_WARNINGS /D_SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS /bigobj /Zm50 0 /J /Gy /GF /EHsc /wd4351 /wd4291 /wd4250 /wd4996 /Iexternal/protobuf /Ibazel-o ut/msvc_x64-fastbuild/genfiles/external/protobuf /Iexternal/bazel_tools /Ibazel- out/msvc_x64-fastbuild/genfiles/external/bazel_tools /Iexternal/protobuf/src /Ib azel-out/msvc_x64-fastbuild/genfiles/external/protobuf/src /Iexternal/bazel_tool s/tools/cpp/gcc3 /showIncludes /MT /Od /Z7 -DHAVE_PTHREAD -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare -Wno-unused-function. cl: 命令行 error D8021 :无效的数值参数“/Wwrite-strings” Target //kcws/cc:seg_backend_api failed to build ____Elapsed time: 2.704s, Critical Path: 0.13s

    opened by thomas1984 2
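  • About --output_node_names in model export

    What do --output_node_names "transitions,Reshape_9" and "transitions,Reshape_7" mean in model export?

    The output nodes specified at export time serve as the model's outputs during decoding; shouldn't these two names be specified at training time? In bilstm.py I found the definition of the Reshape_7 output, but I could not find the definition of the Reshape_9 output for POS training, nor the definition of transitions. Are these two default TensorFlow output nodes, or something else? Please explain, thanks.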

    opened by forever1dream 3