:mag: Ambar: Document Search Engine

RD17

Last update: Jan 9, 2023


Overview


Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search.

Ambar defines a new way to implement full-text document search into your workflow.

Easily deploy Ambar with a single docker-compose file
Perform Google-like search through your documents and contents of your images
Tag your documents
Use a simple REST API to integrate Ambar into your workflow

Features

Search

Tutorial: Mastering Ambar Search Queries

Fuzzy Search (John~3)
Phrase Search ("John Smith")
Search By Author (author:John)
Search By File Path (filename:*.txt)
Search By Date (when: yesterday, today, lastweek, etc)
Search By Size (size>1M)
Search By Tags (tags:ocr)
Search As You Type
Supported language analyzers: English ambar_en, Russian ambar_ru, German ambar_de, Italian ambar_it, Polish ambar_pl, Chinese ambar_cn, CJK ambar_cjk
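The operators listed above can be put together programmatically. The following is a minimal sketch, assuming that several operators may be combined in one space-separated query string and that `when:` is written without a space like the other operators (neither is stated explicitly in the list above). `build_query` and its parameter names are hypothetical and are not part of Ambar's API.

```python
def build_query(text="", phrase=None, author=None, filename=None,
                when=None, min_size=None, tag=None):
    """Compose an Ambar-style search string from optional filters.

    Hypothetical helper: only the operator syntax comes from the feature list.
    """
    parts = []
    if text:
        parts.append(text)                            # plain or fuzzy term, e.g. John~3
    if phrase:
        parts.append('"{}"'.format(phrase))           # phrase search
    if author:
        parts.append("author:{}".format(author))
    if filename:
        parts.append("filename:{}".format(filename))  # wildcard allowed, e.g. *.txt
    if when:
        parts.append("when:{}".format(when))          # yesterday, today, lastweek, ...
    if min_size:
        parts.append("size>{}".format(min_size))      # e.g. 1M
    if tag:
        parts.append("tags:{}".format(tag))
    # Assumption: operators are separated by single spaces.
    return " ".join(parts)


print(build_query(phrase="John Smith", filename="*.txt", min_size="1M", tag="ocr"))
```

Keeping each operator in its own optional argument makes it easy to see which filters contribute to the final string.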

Crawling

Ambar 2.0 only supports local file system crawling. If you need to crawl an SMB share or an FTP location, just mount it using standard Linux tools. Crawling is automatic: no schedule is needed because the crawlers monitor file system events and automatically process new, changed and removed files.
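To illustrate the three kinds of changes a crawler has to handle (new, changed and removed files), here is a minimal sketch. Note that it uses a different technique from Ambar's crawlers: it compares two directory snapshots taken by polling instead of subscribing to file system events. `snapshot` and `diff` are hypothetical names chosen for this example.

```python
import os


def snapshot(root):
    """Map each file path under root to its (mtime_ns, size) pair."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old, new):
    """Return sorted lists of (added, changed, removed) paths between snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, changed, removed
```

Calling `snapshot` periodically on a mounted share and passing consecutive results to `diff` yields the files to index, re-index or drop. An event-based watcher avoids the repeated directory walk, which is why it scales better on large trees.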

Content Extraction

Ambar supports large files (>30MB)

Supported file types:

ZIP archives
Mail archives (PST)
MS Office documents (Word, Excel, Powerpoint, Visio, Publisher)
OCR over images
Email messages with attachments
Adobe PDF (with OCR)
OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld
OpenOffice documents
RTF, Plaintext
HTML / XHTML
Multithread processing

Installation

Notice: Ambar requires Docker to run

You can build Docker images by yourself or buy prebuilt Docker images for $50 here.

Installation instructions for prebuilt images: here
For a tutorial on building the images from scratch, see below

If you want to see how Ambar works without installing it, try our live demo. No signup required.

Building the images yourself

All the images required to run Ambar can be built locally. In general, each image can be built by navigating into the directory of the component in question, performing the required compilation steps and building the image like this:

# From project root
$ cd FrontEnd
$ docker build . -t <image_name>

The resulting image can be referred to by the name specified, and run by the containerization tooling of your choice.

In order to use a local Dockerfile with docker-compose, simply change the image option to build, setting the value to the relative path of the directory containing the Dockerfile. Then run docker-compose build to build the relevant images. For example:

# docker-compose.yml from project root, referencing local Dockerfiles
pipeline0:
  build: ./Pipeline/
  image: chazu/ambar-pipeline
localcrawler:
  build: ./LocalCrawler/

Note that some of the components require compilation or other build steps to be performed on the host before the Docker images can be built. For example, FrontEnd:

# Assuming a suitable version of node.js is installed (docker uses 8.10)
$ npm install
$ npm run compile
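The order of operations described above (host-side build steps first, then `docker build` in the component directory) can be sketched as a small script. This is an illustration under stated assumptions, not part of the Ambar repository: `HOST_STEPS`, `build_plan` and `run_plan` are hypothetical names, and only the FrontEnd steps are taken from this README; other components may need steps that are not listed here.

```python
import subprocess

# Host-side steps per component. Only FrontEnd's steps come from the README;
# entries for other components would be assumptions and are left out.
HOST_STEPS = {
    "FrontEnd": [["npm", "install"], ["npm", "run", "compile"]],
}


def build_plan(component, image_name):
    """Return the commands to run, in order, inside the component directory."""
    steps = [list(cmd) for cmd in HOST_STEPS.get(component, [])]
    # The image build always comes last, after any host-side compilation.
    steps.append(["docker", "build", ".", "-t", image_name])
    return steps


def run_plan(component, image_name):
    """Execute the plan from the project root; stops at the first failing command."""
    for cmd in build_plan(component, image_name):
        subprocess.run(cmd, cwd=component, check=True)
```

For example, `run_plan("FrontEnd", "ambar-frontend:local")` would run the two npm commands and then build the image, where the image name is an arbitrary placeholder.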

FAQ

Is it open-source?

Yes, it's fully open-source.

Is it free?

Yes, it is forever free and open-source.

Does it perform OCR?

Yes, it performs OCR on images (jpg, tiff, bmp, etc.) and PDFs. OCR is performed by the well-known open-source library Tesseract. We tuned it to achieve the best performance and quality on scanned documents. You can easily find all files on which OCR was performed with the tags:ocr query.

Which languages are supported for OCR?

Supported languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld. If your language is missing, please contact us at [email protected].

Does it support tagging?

Yes!

What about searching in PDF?

Yes, it can search through any PDF, even badly encoded ones or those with scans inside. We did our best to make search over any kind of PDF document smooth.

What is the maximum file size it can handle?

It's limited by the amount of RAM on your machine; typically the limit is about 500MB. That is a strong result, as typical document management systems only process files of up to 30MB.

I have a problem. What should I do?

Request a dedicated support session by emailing us at [email protected]

Change Log

Privacy Policy

License

MIT License

Comments

Bug: Fresh install, going to server IP redirects me to "https://frontend"

Hey all. I tried to freshly install this using the directions and the ambar.py script. Running Ubuntu Server 17.04. Initially it said ambar running on http://:80 but putting in the IP into the config under fe and host gets it to say http://i.p.i.p:80 but still no change.

Anyone have any ideas what I might be doing wrong? I can provide any more info needed.

Thanks, hbh7
help wanted

opened by hbh7 27
Ambar behind a HTTP proxy

Hi,

I seem to be having problems with the ambar_webapi docker not using the system HTTP proxy correctly.

I have installed ambar self-hosted community edition onto Centos 7, which is behind a HTTP proxy. I have setup systemd for docker to define the HTTP_PROXY and HTTPS_PROXY correctly. ie I can download/book amabar ok.

I can then access the web front end ok, (changed to port 8005) but everything else is standard.. however I can't login, signup or anything - get an 'opps something went wrong message'.

Inspecting the docker log for ambar_webapi seems to show attempts to access a remote host (52.64.9.77) (and amazonaws.com host - mandrillapp.com??) without using the HTTPS proxy:

[root@kgs-sts-fusion ambar]# ./ambar.py start

______           ____     ______  ____
/\  _  \  /'\_/`\/\  _`\  /\  _  \/\  _`\
\ \ \L\ \/\      \ \ \L\ \ \ \L\ \ \ \L\ \
 \ \  __ \ \ \__\ \ \  _ <'\ \  __ \ \ ,  /
  \ \ \/\ \ \ \_/\ \ \ \L\ \ \ \/\ \ \ \ \
   \ \_\ \_\ \_\ \_\ \____/ \ \_\ \_\ \_\ \_\
    \/_/\/_/\/_/ \/_/\/___/   \/_/\/_/\/_/\/ /

Docker version 17.04.0-ce, build 4845c56
docker-compose version 1.13.0, build 1719ceb
vm.max_map_count = 262144
net.ipv4.ip_local_port_range = 15000 61000
net.ipv4.tcp_fin_timeout = 30
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 2000
net.ipv4.tcp_max_syn_backlog = 2048
Creating network "ambar_internal_network" with the default driver
Creating ambar_db_1 ...
Creating ambar_rabbit_1 ...
Creating ambar_proxy_1 ...
Creating ambar_es_1 ...
Creating ambar_db_1
Creating ambar_webapi-cache_1 ...
Creating ambar_rabbit_1
Creating ambar_es_1
Creating ambar_webapi-cache_1
Creating ambar_es_1 ... done
Creating ambar_webapi_1 ...
Creating ambar_webapi_1 ... done
Creating ambar_frontend_1 ...
Creating ambar_frontend_1 ... done
Waiting for Ambar to start...
Ambar is running on http://147.66.12.53:8005
[root@kgs-sts-fusion ambar]# cat docker inspect --format='{{.LogPath}}' ambar_webapi_1
{"log":"2017/05/03 05:44:08 Waiting for host: \n","stream":"stderr","time":"2017-05-03T05:44:08.385748978Z"}
{"log":"2017/05/03 05:44:08 Waiting for host: es:9200\n","stream":"stderr","time":"2017-05-03T05:44:08.385884331Z"}
{"log":"2017/05/03 05:44:08 Connected to unix:///var/run/docker.sock\n","stream":"stderr","time":"2017-05-03T05:44:08.388017144Z"}
{"log":"2017/05/03 05:44:22 Received 200 from http://es:9200\n","stream":"stderr","time":"2017-05-03T05:44:22.302769292Z"}
{"log":"Crawler schedule service initialized\n","stream":"stdout","time":"2017-05-03T05:44:24.380922736Z"}
{"log":"Pipeline initialized\n","stream":"stdout","time":"2017-05-03T05:44:24.71064609Z"}
{"log":"Started on :::8080\n","stream":"stdout","time":"2017-05-03T05:44:24.720793191Z"}
{"log":"{ [Error: connect ECONNREFUSED 52.64.27.232:443]\n","stream":"stderr","time":"2017-05-03T06:53:42.270438821Z"}
{"log":" code: 'ECONNREFUSED',\n","stream":"stderr","time":"2017-05-03T06:53:42.270489177Z"}
{"log":" errno: 'ECONNREFUSED',\n","stream":"stderr","time":"2017-05-03T06:53:42.270497139Z"}
{"log":" syscall: 'connect',\n","stream":"stderr","time":"2017-05-03T06:53:42.270503494Z"}
{"log":" address: '52.64.27.232',\n","stream":"stderr","time":"2017-05-03T06:53:42.27050999Z"}
{"log":" port: 443 }\n","stream":"stderr","time":"2017-05-03T06:53:42.270516275Z"}
{"log":"{ [Error: connect ECONNREFUSED 52.64.9.77:443]\n","stream":"stderr","time":"2017-05-03T06:53:43.182362118Z"}
{"log":" code: 'ECONNREFUSED',\n","stream":"stderr","time":"2017-05-03T06:53:43.18240549Z"}
{"log":" errno: 'ECONNREFUSED',\n","stream":"stderr","time":"2017-05-03T06:53:43.182413382Z"}
{"log":" syscall: 'connect',\n","stream":"stderr","time":"2017-05-03T06:53:43.182444112Z"}
{"log":" address: '52.64.9.77',\n","stream":"stderr","time":"2017-05-03T06:53:43.182451306Z"}
{"log":" port: 443 }\n","stream":"stderr","time":"2017-05-03T06:53:43.182457382Z"}

It seems that when I try to recover my password I type my email, and hit 'recover password' causes a new entry in the ambar_webapi docker log which looks like our http proxy (see above).

I have not yet been able to login at all to the Ambar web front end.

any ideas?

Regards Kym
bug

opened by knewbery 24
Initial e-mail does not arrive

Hi. What conditions must be met to successfully send login credentials? I'm trying your brilliant software in my internal network and cannot use auth 'none' for security reasons. Here's my specs: Ubuntu 16.04.03 Docker: 17.09.1-ce Docker-compose: 1.18.0

Thanks.
bug

opened by nonylion 20

"Oops.... Something went wrong" during loading

It seems like the api is not accessible, even though installation went without any apparent issue. During loading of the page, I get the error "Oops.... Something went wrong" at the bottom. It looks like the ambar-webapi container is restarting every 5 minutes due to not connecting to the ambar-es container?

andrew@onlyoffice:~$ sudo ./ambar.py start


______           ____     ______  ____
/\  _  \  /'\_/`\/\  _`\  /\  _  \/\  _`\
\ \ \L\ \/\      \ \ \L\ \ \ \L\ \ \ \L\ \
 \ \  __ \ \ \__\ \ \  _ <'\ \  __ \ \ ,  /
  \ \ \/\ \ \ \_/\ \ \ \L\ \ \ \/\ \ \ \ \
   \ \_\ \_\ \_\ \_\ \____/ \ \_\ \_\ \_\ \_\
    \/_/\/_/\/_/ \/_/\/___/   \/_/\/_/\/_/\/ /



Docker version 17.03.1-ce, build c6d412e
docker-compose version 1.11.2, build dfed245
vm.max_map_count = 262144
net.ipv4.ip_local_port_range = 15000 61000
net.ipv4.tcp_fin_timeout = 30
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 2000
net.ipv4.tcp_max_syn_backlog = 2048
ambar_db_1 is up-to-date
ambar_es_1 is up-to-date
ambar_rabbit_1 is up-to-date
ambar_frontend_1 is up-to-date
ambar_webapi_1 is up-to-date
ambar_webapi-cache_1 is up-to-date
Waiting for Ambar to start...
Ambar is running on http://10.20.30.13:80

ambar-webapi container log output:

2017/04/07 05:08:51 Timeout after 5m0s waiting on dependencies to become available: [unix:///var/run/docker.sock http://es:9200]
2017/04/07 05:08:52 Waiting for host:
2017/04/07 05:08:52 Waiting for host: es:9200
2017/04/07 05:08:52 Connected to unix:///var/run/docker.sock
2017/04/07 05:13:52 Timeout after 5m0s waiting on dependencies to become available: [unix:///var/run/docker.sock http://es:9200]
2017/04/07 05:13:52 Waiting for host:
2017/04/07 05:13:52 Waiting for host: es:9200
2017/04/07 05:13:52 Connected to unix:///var/run/docker.sock
2017/04/07 05:18:52 Timeout after 5m0s waiting on dependencies to become available: [unix:///var/run/docker.sock http://es:9200]
2017/04/07 05:18:52 Waiting for host:
2017/04/07 05:18:52 Waiting for host: es:9200
2017/04/07 05:18:52 Connected to unix:///var/run/docker.sock

ambar-es container logs:

[2017-04-07T05:22:01,567][INFO ][o.e.n.Node               ] [BtkYnk-] stopping ...
[2017-04-07T05:22:01,633][INFO ][o.e.n.Node               ] [BtkYnk-] stopped
[2017-04-07T05:22:01,633][INFO ][o.e.n.Node               ] [BtkYnk-] closing ...
[2017-04-07T05:22:01,646][INFO ][o.e.n.Node               ] [BtkYnk-] closed
[2017-04-07T05:22:03,494][INFO ][o.e.n.Node               ] [] initializing ...
[2017-04-07T05:22:03,612][INFO ][o.e.e.NodeEnvironment    ] [BtkYnk-] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/mapper/onlyoffice--vg-root)]], net usable_space [34.7gb], net total_space [46.6gb], spins? [possibly], types [ext4]
[2017-04-07T05:22:03,612][INFO ][o.e.e.NodeEnvironment    ] [BtkYnk-] heap size [1007.3mb], compressed ordinary object pointers [true]
[2017-04-07T05:22:03,660][INFO ][o.e.n.Node               ] node name [BtkYnk-] derived from node ID [BtkYnk-rRXGLNCk4JZeisA]; set [node.name] to override
[2017-04-07T05:22:03,665][INFO ][o.e.n.Node               ] version[5.2.2], pid[1], build[f9d9b74/2017-02-24T17:26:45.835Z], OS[Linux/4.4.0-72-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_121/25.121-b13]
[2017-04-07T05:22:05,239][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [aggs-matrix-stats]
[2017-04-07T05:22:05,239][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [ingest-common]
[2017-04-07T05:22:05,239][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [lang-expression]
[2017-04-07T05:22:05,239][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [lang-groovy]
[2017-04-07T05:22:05,240][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [lang-mustache]
[2017-04-07T05:22:05,240][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [lang-painless]
[2017-04-07T05:22:05,240][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [percolator]
[2017-04-07T05:22:05,240][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [reindex]
[2017-04-07T05:22:05,240][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [transport-netty3]
[2017-04-07T05:22:05,240][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [transport-netty4]
[2017-04-07T05:22:05,242][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded plugin [analysis-morphology]
[2017-04-07T05:22:05,395][WARN ][o.e.d.s.g.GroovyScriptEngineService] [groovy] scripts are deprecated, use [painless] scripts instead
[2017-04-07T05:22:08,149][INFO ][o.e.n.Node               ] initialized
[2017-04-07T05:22:08,150][INFO ][o.e.n.Node               ] [BtkYnk-] starting ...
[2017-04-07T05:22:08,258][WARN ][i.n.u.i.MacAddressUtil   ] Failed to find a usable hardware address from the network interfaces; using random bytes: f5:84:67:88:74:e6:c5:b2
[2017-04-07T05:22:08,326][INFO ][o.e.t.TransportService   ] [BtkYnk-] publish_address {172.19.0.3:9300}, bound_addresses {[::]:9300}
[2017-04-07T05:22:08,335][INFO ][o.e.b.BootstrapChecks    ] [BtkYnk-] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-04-07T05:22:11,400][INFO ][o.e.c.s.ClusterService   ] [BtkYnk-] new_master {BtkYnk-}{BtkYnk-rRXGLNCk4JZeisA}{bcr5fJbTS6WeNLWTn3-wbg}{172.19.0.3}{172.19.0.3:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-04-07T05:22:11,419][INFO ][o.e.h.HttpServer         ] [BtkYnk-] publish_address {172.19.0.3:9200}, bound_addresses {[::]:9200}
[2017-04-07T05:22:11,419][INFO ][o.e.n.Node               ] [BtkYnk-] started
[2017-04-07T05:22:11,669][INFO ][o.e.g.GatewayService     ] [BtkYnk-] recovered [2] indices into cluster_state
[2017-04-07T05:22:12,231][INFO ][o.e.c.r.a.AllocationService] [BtkYnk-] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[ambar_log_record_data][7]] ...]).

bug

opened by agreenfield1 18

Cannot view/download files

Hi,

I am struggling to understand how to access my files from the Web interface?

Is there meant to be a download button? I can find the image preview, but that is it..

bug

opened by dandantheflyingman 15
SMB crawler not working, share verified working

Installed clean today on clean Ubuntu 16.04 install. Verified I can connect to the share from Windows and Linux using mount -t cifs. Crawler config:

{
  "id": "data",
  "uid": "data_d033e22ae348aeb5660fc2140aec35850c4da997",
  "description": "nas crawler",
  "type": "smb",
  "locations": [
    { "host_name": "nas", "ip_address": "10.0.0.100", "location": "data" }
  ],
  "file_regex": "(\.doc[a-z]$)|(\.xls[a-z]$)|(\.txt$)|(\.csv$)|(\.htm[a-z]$)|(\.ppt[a-z]$)|(\.pdf$)|(\.msg$)|(\.eml$)|(\.rtf$)|(\.md$)|(\.png$)|(\.bmp$)|(\.tif[f]$)|(\.jp[e]g$)|(\.hwp$)",
  "credentials": { "auth_type": "ntlm", "login": "jes", "password": "*****", "token": "" },
  "schedule": { "is_active": true, "cron_schedule": "/15 * * * *" },
  "max_file_size_bytes": 30000000,
  "verbose": true
}

Error:

2017-07-14 11:15:00.688: [info] filecrawler initialized
2017-07-14 11:15:00.695: [error]
2017-07-14 11:15:00.700: [error] error connecting to Smb share on nas

Notice that there is nothing by the error at all.

Also, how do I get to the logs for this system? I looked at docker logs but they said nothing about this issue. Thank you.
help wanted

opened by effnorwood 15
Ambar is loading ...

I followed the step-by-step with same environment, ubuntu server 16.04LTS. Docker CE version 17.06.2 However, I got "Ambar is loading..." "Oops something went wrong" message. I saw same error message in closed issue. Please advise.
help wanted

opened by andychoi 14

Invalid port specification: "None"

[root@searchbox ambar]# ./ambar.py start
 

______           ____     ______  ____       
/\  _  \  /'\_/`\/\  _`\  /\  _  \/\  _`\    
\ \ \L\ \/\      \ \ \L\ \ \ \L\ \ \ \L\ \  
 \ \  __ \ \ \__\ \ \  _ <'\ \  __ \ \ ,  /   
  \ \ \/\ \ \ \_/\ \ \ \L\ \ \ \/\ \ \ \ \  
   \ \_\ \_\ \_\ \_\ \____/ \ \_\ \_\ \_\ \_\
    \/_/\/_/\/_/ \/_/\/___/   \/_/\/_/\/_/\/ /


                                              
Docker version 1.12.1, build 23cf638
docker-compose version 1.12.0, build b31ff33
vm.max_map_count = 262144
net.ipv4.ip_local_port_range = 15000 61000
net.ipv4.tcp_fin_timeout = 30
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 2000
net.ipv4.tcp_max_syn_backlog = 2048
Creating ambar_db_1
Creating ambar_rabbit_1
Creating ambar_es_1
Creating ambar_frontend_1

ERROR: for es  Cannot create container for service es: b'Invalid port specification: "None"'

ERROR: for db  Cannot create container for service db: b'Invalid port specification: "None"'

ERROR: for rabbit  Cannot create container for service rabbit: b'Invalid port specification: "None"'

ERROR: for frontend  Cannot create container for service frontend: b'Invalid port specification: "None"'
ERROR: Encountered errors while bringing up the project.
Traceback (most recent call last):
  File "./ambar.py", line 218, in <module>
    start(configuration)
  File "./ambar.py", line 187, in start
    runShellCommandStrict('docker-compose -f {0}/docker-compose.yml -p ambar up -d'.format(PATH))
  File "./ambar.py", line 45, in runShellCommandStrict
    subprocess.check_call(command, shell = True)
  File "/usr/local/lib/python3.5/subprocess.py", line 584, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'docker-compose -f /root/ambar/docker-compose.yml -p ambar up -d' returned non-zero exit status 1

opened by kirichenko 14

Tag by folder

I'm trying to follow what happened in Issue #175 but am unable to reproduce his results.

Here's my code:

def AutoTagAmbarFile(self, AmbarFile):
    self.SetOCRTag(AmbarFile)
    self.SetSourceIdTag(AmbarFile)
    self.SetArchiveTag(AmbarFile)
    self.SetImageTag(AmbarFile)
    self.SetFolderTag(AmbarFile)

Followed by this:

def SetFolderTag(self, AmbarFile):
    if 'folderName' in AmbarFile['meta']['full_name']:
        self.AddTagToAmbarFile(AmbarFile['file_id'], AmbarFile['meta']['full_name'], self.AUTO_TAG_TYPE, 'folderName')

I've tried altering a pre-existing tag as did the poster in Issue #175 , but was unable to see any change after I rebuilt the Pipeline image, pulled the new image, and spun up a new instance of AMBAR. I've tried clearing my browser cache, as that had caused issues in the past, but there was no change.

Is there somewhere else I need to change some code in order for the new tag to show up on the search page?

Thanks in advance for any help you can offer!

opened by s1rk1t 13
ERROR: for serviceapi Container "xxxx" is unhealthy.

Hi,

I received this error while trying to start docker. I think there is a problem with ElasticSearch service.

sudo docker-compose up -d
root_db_1 is up-to-date
root_es_1 is up-to-date
root_rabbit_1 is up-to-date
root_redis_1 is up-to-date
ERROR: for serviceapi Container "b5182a16944e" is unhealthy.
ERROR: Encountered errors while bringing up the project.

version: "2.1"
networks:
  internal_network:
services:
  db:
    restart: always
    networks:
      - internal_network
    image: ambar/ambar-mongodb:2.0.1
    environment:
      - cacheSizeGB=2
    volumes:
      - /home/docker/db:/data/db
    expose:
      - "27017"
    ports:
      - "27017:27017"
  es:
    restart: always
    networks:
      - internal_network
    image: ambar/ambar-es:2.0.1
    expose:
      - "9200"
    ports:
      - "9200:9200"
    environment:
      - cluster.name=ambar-es
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - /home/docker/es:/usr/share/elasticsearch/data
  rabbit:
    restart: always
    networks:
      - internal_network
    image: ambar/ambar-rabbit:2.0.1
    hostname: rabbit
    expose:
      - "15672"
      - "5672"
    ports:
      - "15672:15672"
      - "5672:5672"
    volumes:
      - /home/docker/rabbit:/var/lib/rabbitmq
  redis:
    restart: always
    sysctls:
      - net.core.somaxconn=1024
    networks:
      - internal_network
    image: ambar/ambar-redis:2.0.1
    expose:
      - "6379"
    ports:
      - "6379:6379"
  serviceapi:
    depends_on:
      redis:
        condition: service_healthy
      rabbit:
        condition: service_healthy
      es:
        condition: service_healthy
      db:
        condition: service_healthy
    restart: always
    networks:
      - internal_network
    image: ambar/ambar-serviceapi:2.0.1
    expose:
      - "8081"
    ports:
      - "8081:8081"
    environment:
      - mongoDbUrl=mongodb://db:27017/ambar_data
      - elasticSearchUrl=http://es:9200
      - redisHost=redis
      - redisPort=6379
      - rabbitHost=amqp://rabbit
      - langAnalyzer=ambar_en
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  webapi:
    depends_on:
      serviceapi:
        condition: service_healthy
    restart: always
    networks:
    restart: always
    networks:
      - internal_network
    image: ambar/ambar-webapi:2.0.1
    expose:
      - "8080"
    ports:
      - "8080:8080"
    environment:
      - analyticsToken=cda4b0bb11a1f32aed7564b08c455992
      - uiLang=en
      - mongoDbUrl=mongodb://db:27017/ambar_data
      - elasticSearchUrl=http://es:9200
      - redisHost=redis
      - redisPort=6379
      - serviceApiUrl=http://serviceapi:8081
      - rabbitHost=amqp://rabbit
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  frontend:
    depends_on:
      webapi:
        condition: service_healthy
    image: ambar/ambar-frontend:2.0.1
    restart: always
    networks:
      - internal_network
    ports:
      - "80:80"
    expose:
      - "80"
    environment:
      - api=http://145.239.139.196:8080
  pipeline0:
    depends_on:
      serviceapi:
        condition: service_healthy
    image: ambar/ambar-pipeline:2.0.1
    restart: always
    networks:
      - internal_network
    environment:
      - id=0
      - api_url=http://serviceapi:8081
      - rabbit_host=amqp://rabbit
  crawler0:
    depends_on:
      serviceapi:
        condition: service_healthy
    image: ambar/ambar-local-crawler
    restart: always
    image: ambar/ambar-local-crawler
    restart: always
    networks:
      - internal_network
    environment:
      - apiUrl=http://serviceapi:8081
      - crawlPath=/usr/data
      - name=craw
    volumes:
      - /home/docker/2:/usr/data

opened by mizbanpaytakht 13

Getting lots of the following 2 errors running docker-compose build

Running latest off master I see lots of index_not_found errors. At what point and whose responsibility is it to post the index to es?

serviceapi_1    | { Error: [index_not_found_exception] no such index, with { resource.type="index_or_alias" & resource.id="ambar_file_data" & index_uuid="_na_" & index="ambar_file_data" }
serviceapi_1    |     at respond (/node_modules/elasticsearch/src/lib/transport.js:289:15)
serviceapi_1    |     at checkRespForFailure (/node_modules/elasticsearch/src/lib/transport.js:248:7)
serviceapi_1    |     at HttpConnector.<anonymous> (/node_modules/elasticsearch/src/lib/connectors/http.js:164:7)
serviceapi_1    |     at IncomingMessage.wrapper (/node_modules/lodash/lodash.js:4929:19)
serviceapi_1    |     at emitNone (events.js:111:20)
serviceapi_1    |     at IncomingMessage.emit (events.js:208:7)
serviceapi_1    |     at endReadableNT (_stream_readable.js:1064:12)
serviceapi_1    |     at _combinedTickCallback (internal/process/next_tick.js:138:11)
serviceapi_1    |     at process._tickCallback (internal/process/next_tick.js:180:9)
serviceapi_1    |   status: 404,
serviceapi_1    |   displayName: 'NotFound',
serviceapi_1    |   message: '[index_not_found_exception] no such index, with { resource.type="index_or_alias" & resource.id="ambar_file_data" & index_uuid="_na_" & index="ambar_file_data" }',
serviceapi_1    |   path: '/ambar_file_data/ambar_file/_search',
serviceapi_1    |   query: { _source: 'false' },
serviceapi_1    |   body:
serviceapi_1    |    { error:
serviceapi_1    |       { root_cause: [Array],
serviceapi_1    |         type: 'index_not_found_exception',
serviceapi_1    |         reason: 'no such index',
serviceapi_1    |         'resource.type': 'index_or_alias',
serviceapi_1    |         'resource.id': 'ambar_file_data',
serviceapi_1    |         index_uuid: '_na_',
serviceapi_1    |         index: 'ambar_file_data' },
serviceapi_1    |      status: 404 },
serviceapi_1    |   statusCode: 404,
serviceapi_1    |   response: '{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"ambar_file_data","index_uuid":"_na_","index":"ambar_file_data"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"ambar_file_data","index_uuid":"_na_","index":"ambar_file_data"},"status":404}',
serviceapi_1    |   toString: [Function],
serviceapi_1    |   toJSON: [Function] }

wontfix

opened by AYapejian 12

no basic auth credentials

Hi! I bought a prebuilt image and got the instructions in the letter. I logged in with the information they sent me. But anytime i try to docker-compose pull i get error no basic auth credentials . What should I do?

opened by sylzerret 0
fix: LocalCrawler/Dockerfile to reduce vulnerabilities
The following vulnerabilities are fixed with an upgrade:

https://snyk.io/vuln/SNYK-DEBIAN8-GIT-340820

https://snyk.io/vuln/SNYK-DEBIAN8-GIT-340907

https://snyk.io/vuln/SNYK-DEBIAN8-PROCPS-309313

https://snyk.io/vuln/SNYK-DEBIAN8-WGET-300469

https://snyk.io/vuln/SNYK-UPSTREAM-NODE-538286
opened by ghost 0

Releases(v2.1.18)

v2.1.18(Sep 17, 2018)

v2.1.8(May 18, 2018)

v2.0.0rc(Apr 18, 2018)



Owner

RD17
