🔍 Ambar: Document Search Engine


Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search.

Ambar defines a new way to integrate full-text document search into your workflow.

  • Easily deploy Ambar with a single docker-compose file
  • Perform Google-like search through your documents and the contents of your images
  • Tag your documents
  • Use a simple REST API to integrate Ambar into your workflow

Features

Search

Tutorial: Mastering Ambar Search Queries

  • Fuzzy Search (John~3)
  • Phrase Search ("John Smith")
  • Search By Author (author:John)
  • Search By File Path (filename:*.txt)
  • Search By Date (when: yesterday, today, lastweek, etc.)
  • Search By Size (size>1M)
  • Search By Tags (tags:ocr)
  • Search As You Type
  • Supported language analyzers: English ambar_en, Russian ambar_ru, German ambar_de, Italian ambar_it, Polish ambar_pl, Chinese ambar_cn, CJK ambar_cjk
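Because these operators are plain text typed into the search box, a query can also be assembled programmatically, for example before sending it through Ambar's REST API. The sketch below is a hypothetical helper (`build_ambar_query` is our name, not part of Ambar) and assumes that several operators can be combined in one query by separating them with spaces; check that against your Ambar version.

```python
from typing import List, Optional


def build_ambar_query(
    text: str = "",
    phrase: Optional[str] = None,
    author: Optional[str] = None,
    filename: Optional[str] = None,
    min_size: Optional[str] = None,
    tags: Optional[str] = None,
) -> str:
    """Hypothetical helper: join the documented Ambar operators into one query.

    Assumption: Ambar accepts several operators separated by spaces.
    """
    parts: List[str] = []
    if text:
        # Free text, which may itself use fuzzy syntax such as John~3.
        parts.append(text)
    if phrase:
        # Phrase search wraps the words in double quotes.
        parts.append(f'"{phrase}"')
    if author:
        parts.append(f"author:{author}")
    if filename:
        # File path patterns may contain wildcards, e.g. *.txt.
        parts.append(f"filename:{filename}")
    if min_size:
        # Size filters use a comparison and a unit suffix, e.g. size>1M.
        parts.append(f"size>{min_size}")
    if tags:
        parts.append(f"tags:{tags}")
    return " ".join(parts)


print(build_ambar_query(text="John~3", filename="*.txt", min_size="1M", tags="ocr"))
```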

Crawling

Ambar 2.0 only supports local file system crawling. If you need to crawl an SMB share or an FTP location, just mount it using standard Linux tools. Crawling is automatic and needs no schedule, because the crawlers monitor file system events and automatically process new, changed and removed files.
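Ambar's crawlers react to file system events. The sketch below deliberately swaps in a simpler technique, comparing two directory snapshots, only to illustrate how new, changed and removed files can be told apart; it is not Ambar's crawler code, and `take_snapshot` and `diff_snapshots` are hypothetical names.

```python
import os
from typing import Dict, List, Tuple

# path -> (modification time, size in bytes)
Snapshot = Dict[str, Tuple[float, int]]


def take_snapshot(root: str) -> Snapshot:
    """Record modification time and size for every file under root."""
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            snapshot[path] = (st.st_mtime, st.st_size)
    return snapshot


def diff_snapshots(
    old: Snapshot, new: Snapshot
) -> Tuple[List[str], List[str], List[str]]:
    """Classify paths as (new, changed, removed) between two snapshots."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    # A file counts as changed if its mtime or size differs.
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, changed, removed
```

An event-driven crawler avoids rescanning the whole tree on every pass; on Linux this is typically built on inotify.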

Content Extraction

Ambar supports large files (>30MB)

Supported file types:

  • ZIP archives
  • Mail archives (PST)
  • MS Office documents (Word, Excel, Powerpoint, Visio, Publisher)
  • OCR over images
  • Email messages with attachments
  • Adobe PDF (with OCR)
  • OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld
  • OpenOffice documents
  • RTF, Plaintext
  • HTML / XHTML
  • Multithread processing

Installation

Notice: Ambar requires Docker to run

You can build Docker images by yourself or buy prebuilt Docker images for $50 here.

  • Installation instructions for prebuilt images: here
  • For a tutorial on building the images from scratch, see below

If you want to see how Ambar works without installing it, try our live demo. No signup required.

Building the images yourself

All the images required to run Ambar can be built locally. In general, each image can be built by navigating into the directory of the component in question, performing any required compilation steps, and building the image as follows:

# From project root
$ cd FrontEnd
$ docker build . -t <image_name>

The resulting image can be referred to by the name specified, and run by the containerization tooling of your choice.
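When several components need rebuilding, the per-directory `docker build` step can be scripted. This is a minimal sketch: the directory names FrontEnd, Pipeline and LocalCrawler come from this README, while the `ambar-local/...` image names are hypothetical placeholders. Host-side steps such as the FrontEnd npm build still have to be run first.

```python
import subprocess
from typing import Dict, List

# Component directories mentioned in this README; image names are hypothetical.
COMPONENTS: Dict[str, str] = {
    "FrontEnd": "ambar-local/frontend",
    "Pipeline": "ambar-local/pipeline",
    "LocalCrawler": "ambar-local/local-crawler",
}


def build_commands(components: Dict[str, str]) -> List[List[str]]:
    """Return one `docker build <dir> -t <image>` command per component."""
    return [
        ["docker", "build", directory, "-t", image]
        for directory, image in components.items()
    ]


def build_all(components: Dict[str, str], dry_run: bool = True) -> None:
    """Print each build command and, unless dry_run, run it from the project root."""
    for cmd in build_commands(components):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing build.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    build_all(COMPONENTS, dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a cheap way to review them before building anything.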

In order to use a local Dockerfile with docker-compose, simply change the image option to build, setting the value to the relative path of the directory containing the Dockerfile. Then run docker-compose build to build the relevant images. For example:

# docker-compose.yml from project root, referencing local dockerfiles
pipeline0:
  build: ./Pipeline/
  image: chazu/ambar-pipeline
localcrawler:
  build: ./LocalCrawler/

Note that some of the components require compilation or other build steps to be performed on the host before the Docker images can be built. For example, FrontEnd:

# From project root, assuming a suitable version of node.js is installed (docker uses 8.10)
$ cd FrontEnd
$ npm install
$ npm run compile

FAQ

Is it open-source?

Yes, it's fully open-source.

Is it free?

Yes, it is forever free and open-source.

Does it perform OCR?

Yes, it performs OCR on images (jpg, tiff, bmp, etc.) and PDFs. OCR is performed by the well-known open-source library Tesseract. We tuned it to achieve the best performance and quality on scanned documents. You can easily find all files on which OCR was performed with the tags:ocr query.

Which languages are supported for OCR?

Supported languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld. If your language is missing, please contact us at [email protected].
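Tesseract selects languages by their traineddata codes and combines several with `+`, as in `tesseract scan.png out -l eng+rus`. The hypothetical helper below maps the abbreviations listed above to those codes (note that "Pl" is `pol` in Tesseract); how Ambar passes languages to Tesseract internally is not described here, so treat this as an illustration only.

```python
from typing import Iterable, List

# Abbreviations used in this FAQ -> Tesseract traineddata codes.
TESSERACT_CODES = {
    "Eng": "eng",
    "Rus": "rus",
    "Ita": "ita",
    "Deu": "deu",
    "Fra": "fra",
    "Spa": "spa",
    "Pl": "pol",
    "Nld": "nld",
}


def tesseract_lang_arg(languages: Iterable[str]) -> str:
    """Build the value for Tesseract's -l option, e.g. 'eng+rus'."""
    codes: List[str] = []
    for lang in languages:
        if lang not in TESSERACT_CODES:
            raise ValueError(f"unsupported OCR language: {lang}")
        codes.append(TESSERACT_CODES[lang])
    # Tesseract combines multiple languages with '+'.
    return "+".join(codes)
```

If the `pytesseract` package is installed, the result can be passed as `pytesseract.image_to_string(image, lang=tesseract_lang_arg(["Eng", "Rus"]))`.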

Does it support tagging?

Yes!

What about searching in PDF?

Yes, it can search through any PDF, even badly encoded ones or ones containing scanned pages. We did our best to make searching any kind of PDF document smooth.

What is the maximum file size it can handle?

It's limited by the amount of RAM on your machine; typically it's 500MB. That's an awesome result, as typical document management systems cap the file size they process at 30MB.

I have a problem. What should I do?

Request a dedicated support session by emailing us at [email protected].

Sponsors

Change Log

Privacy Policy

License

MIT License

Issues
  • Bug: Fresh install, going to server IP redirects me to "https://frontend"

    Hey all. I tried to freshly install this using the directions and the ambar.py script. Running Ubuntu Server 17.04. Initially it said ambar running on http://:80 but putting in the IP into the config under fe and host gets it to say http://i.p.i.p:80 but still no change.

    Anyone have any ideas what I might be doing wrong? I can provide any more info needed.

    Thanks, hbh7

    help wanted 
    opened by hbh7 27
  • Ambar behind a HTTP proxy

    Hi,

    I seem to be having problems with the ambar_webapi docker not using the system HTTP proxy correctly.

    I have installed the Ambar self-hosted community edition on CentOS 7, which is behind an HTTP proxy. I have set up systemd for Docker to define HTTP_PROXY and HTTPS_PROXY correctly, i.e. I can download/boot Ambar OK.

    I can then access the web front end OK (changed to port 8005, but everything else is standard); however, I can't log in, sign up or do anything else: I get an 'oops, something went wrong' message.

    Inspecting the Docker log for ambar_webapi seems to show attempts to access a remote host (52.64.9.77, an amazonaws.com host; mandrillapp.com?) without going through the HTTPS proxy:

    [[email protected] ambar]# ./ambar.py start

    ______           ____     ______  ____
    /\  _  \  /'\_/`\/\  _`\  /\  _  \/\  _`\
    \ \ \L\ \/\      \ \ \L\ \ \ \L\ \ \ \L\ \
     \ \  __ \ \ \__\ \ \  _ <'\ \  __ \ \ ,  /
      \ \ \/\ \ \ \_/\ \ \ \L\ \ \ \/\ \ \ \ \
       \ \_\ \_\ \_\ \_\ \____/ \ \_\ \_\ \_\ \_\
        \/_/\/_/\/_/ \/_/\/___/   \/_/\/_/\/_/\/ /

    Docker version 17.04.0-ce, build 4845c56
    docker-compose version 1.13.0, build 1719ceb
    vm.max_map_count = 262144
    net.ipv4.ip_local_port_range = 15000 61000
    net.ipv4.tcp_fin_timeout = 30
    net.core.somaxconn = 1024
    net.core.netdev_max_backlog = 2000
    net.ipv4.tcp_max_syn_backlog = 2048
    Creating network "ambar_internal_network" with the default driver
    Creating ambar_db_1 ...
    Creating ambar_rabbit_1 ...
    Creating ambar_proxy_1 ...
    Creating ambar_es_1 ...
    Creating ambar_db_1
    Creating ambar_webapi-cache_1 ...
    Creating ambar_rabbit_1
    Creating ambar_es_1
    Creating ambar_webapi-cache_1
    Creating ambar_es_1 ... done
    Creating ambar_webapi_1 ...
    Creating ambar_webapi_1 ... done
    Creating ambar_frontend_1 ...
    Creating ambar_frontend_1 ... done
    Waiting for Ambar to start...
    Ambar is running on http://147.66.12.53:8005
    [[email protected] ambar]# cat `docker inspect --format='{{.LogPath}}' ambar_webapi_1`
    {"log":"2017/05/03 05:44:08 Waiting for host: \n","stream":"stderr","time":"2017-05-03T05:44:08.385748978Z"}
    {"log":"2017/05/03 05:44:08 Waiting for host: es:9200\n","stream":"stderr","time":"2017-05-03T05:44:08.385884331Z"}
    {"log":"2017/05/03 05:44:08 Connected to unix:///var/run/docker.sock\n","stream":"stderr","time":"2017-05-03T05:44:08.388017144Z"}
    {"log":"2017/05/03 05:44:22 Received 200 from http://es:9200\n","stream":"stderr","time":"2017-05-03T05:44:22.302769292Z"}
    {"log":"Crawler schedule service initialized\n","stream":"stdout","time":"2017-05-03T05:44:24.380922736Z"}
    {"log":"Pipeline initialized\n","stream":"stdout","time":"2017-05-03T05:44:24.71064609Z"}
    {"log":"Started on :::8080\n","stream":"stdout","time":"2017-05-03T05:44:24.720793191Z"}
    {"log":"{ [Error: connect ECONNREFUSED 52.64.27.232:443]\n","stream":"stderr","time":"2017-05-03T06:53:42.270438821Z"}
    {"log":" code: 'ECONNREFUSED',\n","stream":"stderr","time":"2017-05-03T06:53:42.270489177Z"}
    {"log":" errno: 'ECONNREFUSED',\n","stream":"stderr","time":"2017-05-03T06:53:42.270497139Z"}
    {"log":" syscall: 'connect',\n","stream":"stderr","time":"2017-05-03T06:53:42.270503494Z"}
    {"log":" address: '52.64.27.232',\n","stream":"stderr","time":"2017-05-03T06:53:42.27050999Z"}
    {"log":" port: 443 }\n","stream":"stderr","time":"2017-05-03T06:53:42.270516275Z"}
    {"log":"{ [Error: connect ECONNREFUSED 52.64.9.77:443]\n","stream":"stderr","time":"2017-05-03T06:53:43.182362118Z"}
    {"log":" code: 'ECONNREFUSED',\n","stream":"stderr","time":"2017-05-03T06:53:43.18240549Z"}
    {"log":" errno: 'ECONNREFUSED',\n","stream":"stderr","time":"2017-05-03T06:53:43.182413382Z"}
    {"log":" syscall: 'connect',\n","stream":"stderr","time":"2017-05-03T06:53:43.182444112Z"}
    {"log":" address: '52.64.9.77',\n","stream":"stderr","time":"2017-05-03T06:53:43.182451306Z"}
    {"log":" port: 443 }\n","stream":"stderr","time":"2017-05-03T06:53:43.182457382Z"}

    It seems that when I try to recover my password (I type my email and hit 'recover password'), a new entry appears in the ambar_webapi docker log, which looks related to our HTTP proxy (see above).

    I have not yet been able to login at all to the Ambar web front end.

    any ideas?

    Regards Kym

    bug 
    opened by knewbery 24
  • Initial e-mail does not arrive

    Hi. What conditions must be met for the login credentials to be sent successfully? I'm trying your brilliant software in my internal network and cannot use auth 'none' for security reasons. Here are my specs: Ubuntu 16.04.03, Docker 17.09.1-ce, Docker Compose 1.18.0.

    Thanks.

    bug 
    opened by nonylion 20
  • "Oops.... Something went wrong" during loading

    It seems like the api is not accessible, even though installation went without any apparent issue. During loading of the page, I get the error "Oops.... Something went wrong" at the bottom. It looks like the ambar-webapi container is restarting every 5 minutes due to not connecting to the ambar-es container?

    [email protected]:~$ sudo ./ambar.py start
    
    
    ______           ____     ______  ____
    /\  _  \  /'\_/`\/\  _`\  /\  _  \/\  _`\
    \ \ \L\ \/\      \ \ \L\ \ \ \L\ \ \ \L\ \
     \ \  __ \ \ \__\ \ \  _ <'\ \  __ \ \ ,  /
      \ \ \/\ \ \ \_/\ \ \ \L\ \ \ \/\ \ \ \ \
       \ \_\ \_\ \_\ \_\ \____/ \ \_\ \_\ \_\ \_\
        \/_/\/_/\/_/ \/_/\/___/   \/_/\/_/\/_/\/ /
    
    
    
    Docker version 17.03.1-ce, build c6d412e
    docker-compose version 1.11.2, build dfed245
    vm.max_map_count = 262144
    net.ipv4.ip_local_port_range = 15000 61000
    net.ipv4.tcp_fin_timeout = 30
    net.core.somaxconn = 1024
    net.core.netdev_max_backlog = 2000
    net.ipv4.tcp_max_syn_backlog = 2048
    ambar_db_1 is up-to-date
    ambar_es_1 is up-to-date
    ambar_rabbit_1 is up-to-date
    ambar_frontend_1 is up-to-date
    ambar_webapi_1 is up-to-date
    ambar_webapi-cache_1 is up-to-date
    Waiting for Ambar to start...
    Ambar is running on http://10.20.30.13:80
    

    ambar-webapi container log output:

    2017/04/07 05:08:51 Timeout after 5m0s waiting on dependencies to become available: [unix:///var/run/docker.sock http://es:9200]
    2017/04/07 05:08:52 Waiting for host:
    2017/04/07 05:08:52 Waiting for host: es:9200
    2017/04/07 05:08:52 Connected to unix:///var/run/docker.sock
    2017/04/07 05:13:52 Timeout after 5m0s waiting on dependencies to become available: [unix:///var/run/docker.sock http://es:9200]
    2017/04/07 05:13:52 Waiting for host:
    2017/04/07 05:13:52 Waiting for host: es:9200
    2017/04/07 05:13:52 Connected to unix:///var/run/docker.sock
    2017/04/07 05:18:52 Timeout after 5m0s waiting on dependencies to become available: [unix:///var/run/docker.sock http://es:9200]
    2017/04/07 05:18:52 Waiting for host:
    2017/04/07 05:18:52 Waiting for host: es:9200
    2017/04/07 05:18:52 Connected to unix:///var/run/docker.sock
    

    ambar-es container logs:

    [2017-04-07T05:22:01,567][INFO ][o.e.n.Node               ] [BtkYnk-] stopping ...
    [2017-04-07T05:22:01,633][INFO ][o.e.n.Node               ] [BtkYnk-] stopped
    [2017-04-07T05:22:01,633][INFO ][o.e.n.Node               ] [BtkYnk-] closing ...
    [2017-04-07T05:22:01,646][INFO ][o.e.n.Node               ] [BtkYnk-] closed
    [2017-04-07T05:22:03,494][INFO ][o.e.n.Node               ] [] initializing ...
    [2017-04-07T05:22:03,612][INFO ][o.e.e.NodeEnvironment    ] [BtkYnk-] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/mapper/onlyoffice--vg-root)]], net usable_space [34.7gb], net total_space [46.6gb], spins? [possibly], types [ext4]
    [2017-04-07T05:22:03,612][INFO ][o.e.e.NodeEnvironment    ] [BtkYnk-] heap size [1007.3mb], compressed ordinary object pointers [true]
    [2017-04-07T05:22:03,660][INFO ][o.e.n.Node               ] node name [BtkYnk-] derived from node ID [BtkYnk-rRXGLNCk4JZeisA]; set [node.name] to override
    [2017-04-07T05:22:03,665][INFO ][o.e.n.Node               ] version[5.2.2], pid[1], build[f9d9b74/2017-02-24T17:26:45.835Z], OS[Linux/4.4.0-72-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_121/25.121-b13]
    [2017-04-07T05:22:05,239][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [aggs-matrix-stats]
    [2017-04-07T05:22:05,239][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [ingest-common]
    [2017-04-07T05:22:05,239][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [lang-expression]
    [2017-04-07T05:22:05,239][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [lang-groovy]
    [2017-04-07T05:22:05,240][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [lang-mustache]
    [2017-04-07T05:22:05,240][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [lang-painless]
    [2017-04-07T05:22:05,240][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [percolator]
    [2017-04-07T05:22:05,240][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [reindex]
    [2017-04-07T05:22:05,240][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [transport-netty3]
    [2017-04-07T05:22:05,240][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded module [transport-netty4]
    [2017-04-07T05:22:05,242][INFO ][o.e.p.PluginsService     ] [BtkYnk-] loaded plugin [analysis-morphology]
    [2017-04-07T05:22:05,395][WARN ][o.e.d.s.g.GroovyScriptEngineService] [groovy] scripts are deprecated, use [painless] scripts instead
    [2017-04-07T05:22:08,149][INFO ][o.e.n.Node               ] initialized
    [2017-04-07T05:22:08,150][INFO ][o.e.n.Node               ] [BtkYnk-] starting ...
    [2017-04-07T05:22:08,258][WARN ][i.n.u.i.MacAddressUtil   ] Failed to find a usable hardware address from the network interfaces; using random bytes: f5:84:67:88:74:e6:c5:b2
    [2017-04-07T05:22:08,326][INFO ][o.e.t.TransportService   ] [BtkYnk-] publish_address {172.19.0.3:9300}, bound_addresses {[::]:9300}
    [2017-04-07T05:22:08,335][INFO ][o.e.b.BootstrapChecks    ] [BtkYnk-] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
    [2017-04-07T05:22:11,400][INFO ][o.e.c.s.ClusterService   ] [BtkYnk-] new_master {BtkYnk-}{BtkYnk-rRXGLNCk4JZeisA}{bcr5fJbTS6WeNLWTn3-wbg}{172.19.0.3}{172.19.0.3:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
    [2017-04-07T05:22:11,419][INFO ][o.e.h.HttpServer         ] [BtkYnk-] publish_address {172.19.0.3:9200}, bound_addresses {[::]:9200}
    [2017-04-07T05:22:11,419][INFO ][o.e.n.Node               ] [BtkYnk-] started
    [2017-04-07T05:22:11,669][INFO ][o.e.g.GatewayService     ] [BtkYnk-] recovered [2] indices into cluster_state
    [2017-04-07T05:22:12,231][INFO ][o.e.c.r.a.AllocationService] [BtkYnk-] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[ambar_log_record_data][7]] ...]).
    
    bug 
    opened by agreenfield1 18
  • Cannot view/download files

    Hi,

    I am struggling to understand how to access my files from the Web interface?

    Is there meant to be a download button? I can find the image preview, but that is it.

    bug 
    opened by dandantheflyingman 15
  • SMB crawler not working, share verified working

    Installed clean today on clean Ubuntu 16.04 install. Verified I can connect to the share from Windows and Linux using mount -t cifs. Crawler config:

    {
      "id": "data",
      "uid": "data_d033e22ae348aeb5660fc2140aec35850c4da997",
      "description": "nas crawler",
      "type": "smb",
      "locations": [
        { "host_name": "nas", "ip_address": "10.0.0.100", "location": "data" }
      ],
      "file_regex": "(\.doc[a-z]$)|(\.xls[a-z]$)|(\.txt$)|(\.csv$)|(\.htm[a-z]$)|(\.ppt[a-z]$)|(\.pdf$)|(\.msg$)|(\.eml$)|(\.rtf$)|(\.md$)|(\.png$)|(\.bmp$)|(\.tif[f]$)|(\.jp[e]g$)|(\.hwp$)",
      "credentials": {
        "auth_type": "ntlm",
        "login": "jes",
        "password": "*****",
        "token": ""
      },
      "schedule": {
        "is_active": true,
        "cron_schedule": "/15 * * * *"
      },
      "max_file_size_bytes": 30000000,
      "verbose": true
    }

    Error:

    2017-07-14 11:15:00.688: [info] filecrawler initialized
    2017-07-14 11:15:00.695: [error]
    2017-07-14 11:15:00.700: [error] error connecting to Smb share on nas

    Notice that the error comes with no details at all.

    Also, how do I get to the logs for this system? I looked at docker logs but they said nothing about this issue. Thank you.

    help wanted 
    opened by effnorwood 15
  • Ambar is loading ...

    I followed the step-by-step guide with the same environment, Ubuntu Server 16.04 LTS and Docker CE 17.06.2. However, I got the "Ambar is loading..." and "Oops something went wrong" messages. I saw the same error message in a closed issue. Please advise.

    help wanted 
    opened by andychoi 14
  • Invalid port specification: "None"

    [[email protected] ambar]# ./ambar.py start
     
    
    ______           ____     ______  ____       
    /\  _  \  /'\_/`\/\  _`\  /\  _  \/\  _`\    
    \ \ \L\ \/\      \ \ \L\ \ \ \L\ \ \ \L\ \  
     \ \  __ \ \ \__\ \ \  _ <'\ \  __ \ \ ,  /   
      \ \ \/\ \ \ \_/\ \ \ \L\ \ \ \/\ \ \ \ \  
       \ \_\ \_\ \_\ \_\ \____/ \ \_\ \_\ \_\ \_\
        \/_/\/_/\/_/ \/_/\/___/   \/_/\/_/\/_/\/ /
    
    
                                                  
    Docker version 1.12.1, build 23cf638
    docker-compose version 1.12.0, build b31ff33
    vm.max_map_count = 262144
    net.ipv4.ip_local_port_range = 15000 61000
    net.ipv4.tcp_fin_timeout = 30
    net.core.somaxconn = 1024
    net.core.netdev_max_backlog = 2000
    net.ipv4.tcp_max_syn_backlog = 2048
    Creating ambar_db_1
    Creating ambar_rabbit_1
    Creating ambar_es_1
    Creating ambar_frontend_1
    
    ERROR: for es  Cannot create container for service es: b'Invalid port specification: "None"'
    
    ERROR: for db  Cannot create container for service db: b'Invalid port specification: "None"'
    
    ERROR: for rabbit  Cannot create container for service rabbit: b'Invalid port specification: "None"'
    
    ERROR: for frontend  Cannot create container for service frontend: b'Invalid port specification: "None"'
    ERROR: Encountered errors while bringing up the project.
    Traceback (most recent call last):
      File "./ambar.py", line 218, in <module>
        start(configuration)
      File "./ambar.py", line 187, in start
        runShellCommandStrict('docker-compose -f {0}/docker-compose.yml -p ambar up -d'.format(PATH))
      File "./ambar.py", line 45, in runShellCommandStrict
        subprocess.check_call(command, shell = True)
      File "/usr/local/lib/python3.5/subprocess.py", line 584, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command 'docker-compose -f /root/ambar/docker-compose.yml -p ambar up -d' returned non-zero exit status 1
    
    opened by kirichenko 14
  • Tag by folder

    I'm trying to follow what happened in Issue #175 but am unable to reproduce his results.

    Here's my code:

    def AutoTagAmbarFile(self, AmbarFile):
        self.SetOCRTag(AmbarFile)
        self.SetSourceIdTag(AmbarFile)
        self.SetArchiveTag(AmbarFile)
        self.SetImageTag(AmbarFile)
        self.SetFolderTag(AmbarFile)

    Followed by this:

    def SetFolderTag(self, AmbarFile):
        if 'folderName' in AmbarFile['meta']['full_name']:
            self.AddTagToAmbarFile(AmbarFile['file_id'], AmbarFile['meta']['full_name'], self.AUTO_TAG_TYPE, 'folderName')

    I've tried altering a pre-existing tag as the poster in Issue #175 did, but was unable to see any change after I rebuilt the Pipeline image, pulled the new image, and spun up a new instance of AMBAR. I've tried clearing my browser cache, as that had caused issues in the past, but there was no change.

    Is there somewhere else I need to change some code in order for the new tag to show up on the search page?

    Thanks in advance for any help you can offer!

    opened by s1rk1t 13
  • ERROR: for serviceapi Container "xxxx" is unhealthy.

    Hi,

    I received this error while trying to start Docker. I think there is a problem with the Elasticsearch service:

    sudo docker-compose up -d
    root_db_1 is up-to-date
    root_es_1 is up-to-date
    root_rabbit_1 is up-to-date
    root_redis_1 is up-to-date
    ERROR: for serviceapi  Container "b5182a16944e" is unhealthy.
    ERROR: Encountered errors while bringing up the project.

    version: "2.1"
    networks:
      internal_network:
    services:
      db:
        restart: always
        networks:
          - internal_network
        image: ambar/ambar-mongodb:2.0.1
        environment:
          - cacheSizeGB=2
        volumes:
          - /home/docker/db:/data/db
        expose:
          - "27017"
        ports:
          - "27017:27017"
      es:
        restart: always
        networks:
          - internal_network
        image: ambar/ambar-es:2.0.1
        expose:
          - "9200"
        ports:
          - "9200:9200"
        environment:
          - cluster.name=ambar-es
          - ES_JAVA_OPTS=-Xms2g -Xmx2g
        ulimits:
          memlock:
            soft: -1
            hard: -1
          nofile:
            soft: 65536
            hard: 65536
        cap_add:
          - IPC_LOCK
        volumes:
          - /home/docker/es:/usr/share/elasticsearch/data
      rabbit:
        restart: always
        networks:
          - internal_network
        image: ambar/ambar-rabbit:2.0.1
        hostname: rabbit
        expose:
          - "15672"
          - "5672"
        ports:
          - "15672:15672"
          - "5672:5672"
        volumes:
          - /home/docker/rabbit:/var/lib/rabbitmq
      redis:
        restart: always
        sysctls:
          - net.core.somaxconn=1024
        networks:
          - internal_network
        image: ambar/ambar-redis:2.0.1
        expose:
          - "6379"
        ports:
          - "6379:6379"
      serviceapi:
        depends_on:
          redis:
            condition: service_healthy
          rabbit:
            condition: service_healthy
          es:
            condition: service_healthy
          db:
            condition: service_healthy
        restart: always
        networks:
          - internal_network
        image: ambar/ambar-serviceapi:2.0.1
        expose:
          - "8081"
        ports:
          - "8081:8081"
        environment:
          - mongoDbUrl=mongodb://db:27017/ambar_data
          - elasticSearchUrl=http://es:9200
          - redisHost=redis
          - redisPort=6379
          - rabbitHost=amqp://rabbit
          - langAnalyzer=ambar_en
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
      webapi:
        depends_on:
          serviceapi:
            condition: service_healthy
        restart: always
        networks:
        restart: always
        networks:
          - internal_network
        image: ambar/ambar-webapi:2.0.1
        expose:
          - "8080"
        ports:
          - "8080:8080"
        environment:
          - analyticsToken=cda4b0bb11a1f32aed7564b08c455992
          - uiLang=en
          - mongoDbUrl=mongodb://db:27017/ambar_data
          - elasticSearchUrl=http://es:9200
          - redisHost=redis
          - redisPort=6379
          - serviceApiUrl=http://serviceapi:8081
          - rabbitHost=amqp://rabbit
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
      frontend:
        depends_on:
          webapi:
            condition: service_healthy
        image: ambar/ambar-frontend:2.0.1
        restart: always
        networks:
          - internal_network
        ports:
          - "80:80"
        expose:
          - "80"
        environment:
          - api=http://145.239.139.196:8080
      pipeline0:
        depends_on:
          serviceapi:
            condition: service_healthy
        image: ambar/ambar-pipeline:2.0.1
        restart: always
        networks:
          - internal_network
        environment:
          - id=0
          - api_url=http://serviceapi:8081
          - rabbit_host=amqp://rabbit
      crawler0:
        depends_on:
          serviceapi:
            condition: service_healthy
        image: ambar/ambar-local-crawler
        restart: always
        image: ambar/ambar-local-crawler
        restart: always
        networks:
          - internal_network
        environment:
          - apiUrl=http://serviceapi:8081
          - crawlPath=/usr/data
          - name=craw
        volumes:
          - /home/docker/2:/usr/data

    opened by mizbanpaytakht 13
  • Getting lots of the following 2 errors running docker-compose build

    Running latest off master I see lots of index_not_found errors. At what point and whose responsibility is it to post the index to es?

    serviceapi_1    | { Error: [index_not_found_exception] no such index, with { resource.type="index_or_alias" & resource.id="ambar_file_data" & index_uuid="_na_" & index="ambar_file_data" }
    serviceapi_1    |     at respond (/node_modules/elasticsearch/src/lib/transport.js:289:15)
    serviceapi_1    |     at checkRespForFailure (/node_modules/elasticsearch/src/lib/transport.js:248:7)
    serviceapi_1    |     at HttpConnector.<anonymous> (/node_modules/elasticsearch/src/lib/connectors/http.js:164:7)
    serviceapi_1    |     at IncomingMessage.wrapper (/node_modules/lodash/lodash.js:4929:19)
    serviceapi_1    |     at emitNone (events.js:111:20)
    serviceapi_1    |     at IncomingMessage.emit (events.js:208:7)
    serviceapi_1    |     at endReadableNT (_stream_readable.js:1064:12)
    serviceapi_1    |     at _combinedTickCallback (internal/process/next_tick.js:138:11)
    serviceapi_1    |     at process._tickCallback (internal/process/next_tick.js:180:9)
    serviceapi_1    |   status: 404,
    serviceapi_1    |   displayName: 'NotFound',
    serviceapi_1    |   message: '[index_not_found_exception] no such index, with { resource.type="index_or_alias" & resource.id="ambar_file_data" & index_uuid="_na_" & index="ambar_file_data" }',
    serviceapi_1    |   path: '/ambar_file_data/ambar_file/_search',
    serviceapi_1    |   query: { _source: 'false' },
    serviceapi_1    |   body:
    serviceapi_1    |    { error:
    serviceapi_1    |       { root_cause: [Array],
    serviceapi_1    |         type: 'index_not_found_exception',
    serviceapi_1    |         reason: 'no such index',
    serviceapi_1    |         'resource.type': 'index_or_alias',
    serviceapi_1    |         'resource.id': 'ambar_file_data',
    serviceapi_1    |         index_uuid: '_na_',
    serviceapi_1    |         index: 'ambar_file_data' },
    serviceapi_1    |      status: 404 },
    serviceapi_1    |   statusCode: 404,
    serviceapi_1    |   response: '{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"ambar_file_data","index_uuid":"_na_","index":"ambar_file_data"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"ambar_file_data","index_uuid":"_na_","index":"ambar_file_data"},"status":404}',
    serviceapi_1    |   toString: [Function],
    serviceapi_1    |   toJSON: [Function] }
    
    wontfix 
    opened by AYapejian 12
  • no basic auth credentials

    Hi! I bought a prebuilt image and got the instructions in the email. I logged in with the information they sent me, but any time I try to run docker-compose pull I get the error "no basic auth credentials". What should I do?

    opened by sylzerret 0
  • fix: LocalCrawler/Dockerfile to reduce vulnerabilities

    The following vulnerabilities are fixed with an upgrade:

    • https://snyk.io/vuln/SNYK-DEBIAN8-GIT-340820
    • https://snyk.io/vuln/SNYK-DEBIAN8-GIT-340907
    • https://snyk.io/vuln/SNYK-DEBIAN8-PROCPS-309313
    • https://snyk.io/vuln/SNYK-DEBIAN8-WGET-300469
    • https://snyk.io/vuln/SNYK-UPSTREAM-NODE-538286
    opened by ghost 0
Releases (v2.1.18)
Owner
RD17
Creating custom software to suit any need