API & Webapp to answer questions about COVID-19. Using NLP (Question Answering) and trusted data sources.

Overview

cover-photo

This open source project serves two purposes.

  1. Collection and evaluation of a Question Answering dataset to improve existing QA/search methods - COVID-QA
  2. Question matching capabilities: provide trustworthy answers to questions about COVID-19 via NLP (now outdated)

COVID-QA

Update 14th April, 2020: We are open sourcing the first batch of SQuAD-style question answering annotations. Thanks to Tony Reina for managing the process and to the many professional annotators who spent valuable time looking through COVID-related research papers.

FAQ matching

Update 17th June, 2020: As the pandemic is thankfully slowing down and other information sources have caught up, we decided to take our hosted API and UI offline. We will keep the repository here as an inspiration for other projects and to share the COVID-QA dataset.

Problem

  • People have many questions about COVID-19
  • Answers are scattered on different websites
  • Finding the right answers takes a lot of time
  • Trustworthiness of answers is hard to judge
  • Many answers become outdated quickly

💡 Idea

  • Aggregate FAQs and texts from trustworthy data sources (WHO, CDC ...)
  • Provide a UI where people can ask questions
  • Use NLP to match incoming user questions with meaningful answers
  • Users can provide feedback about answers to improve the NLP model and flag outdated or wrong answers
  • Display most common queries without good answers to guide data collection and model improvements

⚙️ Tech

  • Scrapers to collect data
  • Elasticsearch to store texts, FAQs, embeddings
  • NLP models implemented via Haystack to find answers by a) detecting similar questions in FAQs and b) extracting answers from free text (extractive QA); a rough sketch follows below this list
  • React Frontend
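
To make the retriever + reader idea concrete, here is a rough, hypothetical sketch using plain elasticsearch-py and transformers rather than the project's actual Haystack code; the index name "document", the field "text", and the model are illustrative assumptions:

    from elasticsearch import Elasticsearch
    from transformers import pipeline

    # Illustrative retriever + reader loop: BM25 retrieval from Elasticsearch,
    # then an extractive QA model picks the best answer span.
    es = Elasticsearch(hosts=["http://localhost:9200"])
    reader = pipeline("question-answering", model="deepset/roberta-base-squad2")

    def answer(question: str, top_k: int = 3):
        # Retrieve candidate passages with a simple full-text match query.
        hits = es.search(
            index="document",
            body={"query": {"match": {"text": question}}, "size": top_k},
        )["hits"]["hits"]
        # Run the reader over each candidate and keep the highest-scoring span.
        candidates = [
            reader(question=question, context=hit["_source"]["text"])
            for hit in hits
        ]
        return max(candidates, key=lambda c: c["score"]) if candidates else None

    print(answer("How does the virus spread?"))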

Issues
  • Feature : Matched Question and Feedback Option

    Feature : Matched Question and Feedback Option

    Thanks again for creating this PR. Great work!

    Two comments:

    • Right now the answer displayed by the bot doesn't contain the "matched question". However, it might be helpful for users to see it in order to judge whether the answer is really relevant to their question. You can find it in the response JSON in the field "question".
    • We now also have the option for user feedback (see API endpoint), so people can rate whether the given answer was helpful, and we will use the data to improve the NLP model. This could also be a helpful addition to the Telegram bot.

    Would be great to hear your thoughts on that and maybe address them in a separate PR.

    Originally posted by @tholor in https://github.com/deepset-ai/COVID-QA/pull/58#issuecomment-602513354

    opened by theapache64 11
  • Question : API

    Question : API

    I've been developing a Telegram bot using the API. Currently I am using https://covid-middleware.deepset.ai/api/bert/question to get the answers.

    curl -X POST \
      https://covid-middleware.deepset.ai/api/bert/question \
      -H 'content-type: application/json' \
      -d '{
    	"question":"community spread?"
    }'
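
    For reference, the same request expressed with Python's requests library (the hosted endpoint has since been taken offline, so this is purely illustrative):

        import requests

        # Equivalent of the curl call above; the hosted API is no longer online.
        response = requests.post(
            "https://covid-middleware.deepset.ai/api/bert/question",
            json={"question": "community spread?"},
        )
        print(response.json())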
    

    but the Swagger docs don't list this API and show a different one with a different structure.

    So my question is: which API should I choose to get the answers? @tanaysoni

    opened by theapache64 10
  • Document Retrieval for extractive QA with COVID-QA

    Document Retrieval for extractive QA with COVID-QA

    Thank you so much for sharing your data and tools! I am working with the question-answering dataset for an experiment of my own.

    @Timoeller mentioned in #103 that the documents used in the annotation tool to create the COVID-QA.json dataset "are a subset of CORD-19 papers that annotators deemed related to Covid." I was wondering if these are the same documents as listed in faq_covidbert.csv.

    The reason I ask is that, as a workaround, I've created my own retrieval txt file(s) by extracting the answers from COVID-QA.json, but the results are hit or miss. They are particularly off if I break the file up into chunks to improve performance, for instance into a separate txt file for each answer. I'm assuming this is due to lost context. I'm wondering if I should simply be using faq_covidbert as illustrated here, even though I am using extractive QA.
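
    As an aside, a minimal sketch of this kind of extraction, keeping whole contexts from the SQuAD-style COVID-QA.json rather than individual answers (field names follow the SQuAD format; the output layout is just one possible choice):

        import json
        from pathlib import Path

        # Write one txt file per full context so the surrounding text is kept.
        with open("COVID-QA.json") as f:
            squad = json.load(f)

        out_dir = Path("retrieval_docs")
        out_dir.mkdir(exist_ok=True)

        for i, article in enumerate(squad["data"]):
            for j, paragraph in enumerate(article["paragraphs"]):
                (out_dir / f"doc_{i}_{j}.txt").write_text(paragraph["context"])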

    The reason I took this approach is that I was trying to follow the approach that most closely approximates the extractive QA tutorial.

    My ultimate objective is to compare the experience of using extractive QA vs FAQ-style QA, so I presumed that it would be apropos to have a bit of separation in the doc storage dataset.

    Thank you!

    opened by aaronbriel 6
  • Docker build succeeds, docker run fails with elasticsearch error

    Docker build succeeds, docker run fails with elasticsearch error

    I followed the instructions here https://github.com/deepset-ai/COVID-QA/tree/master/backend. Perhaps a port is not configured correctly?

    INFO:     initializing identifier
    WARNING:  PUT http://localhost:9200/document [status:N/A request:0.004s]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 157, in _new_conn
        (self._dns_host, self.port), self.timeout, **extra_kw
      File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 84, in create_connection
        raise err
      File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 74, in create_connection
        sock.connect(sa)
    ConnectionRefusedError: [Errno 111] Connection refused
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 229, in perform_request
        method, url, body, retries=Retry(False), headers=request_headers, **kw
      File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
        method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
      File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 376, in increment
        raise six.reraise(type(error), error, _stacktrace)
      File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
        raise value
      File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
        chunked=chunked,
      File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
        conn.request(method, url, **httplib_request_kw)
      File "/usr/local/lib/python3.7/http/client.py", line 1244, in request
        self._send_request(method, url, body, headers, encode_chunked)
      File "/usr/local/lib/python3.7/http/client.py", line 1290, in _send_request
        self.endheaders(body, encode_chunked=encode_chunked)
      File "/usr/local/lib/python3.7/http/client.py", line 1239, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "/usr/local/lib/python3.7/http/client.py", line 1026, in _send_output
        self.send(msg)
      File "/usr/local/lib/python3.7/http/client.py", line 966, in send
        self.connect()
      File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 184, in connect
        conn = self._new_conn()
      File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 169, in _new_conn
        self, "Failed to establish a new connection: %s" % e
    urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f66d3bb9b10>: Failed to establish a new connection: [Errno 111] Connection refused
    [... the same connection-refused warning and traceback are repeated three more times while the client retries ...]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 157, in _new_conn
        (self._dns_host, self.port), self.timeout, **extra_kw
      File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 84, in create_connection
        raise err
      File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 74, in create_connection
        sock.connect(sa)
    ConnectionRefusedError: [Errno 111] Connection refused
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 229, in perform_request
        method, url, body, retries=Retry(False), headers=request_headers, **kw
      File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
        method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
      File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 376, in increment
        raise six.reraise(type(error), error, _stacktrace)
      File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
        raise value
      File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
        chunked=chunked,
      File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
        conn.request(method, url, **httplib_request_kw)
      File "/usr/local/lib/python3.7/http/client.py", line 1244, in request
        self._send_request(method, url, body, headers, encode_chunked)
      File "/usr/local/lib/python3.7/http/client.py", line 1290, in _send_request
        self.endheaders(body, encode_chunked=encode_chunked)
      File "/usr/local/lib/python3.7/http/client.py", line 1239, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "/usr/local/lib/python3.7/http/client.py", line 1026, in _send_output
        self.send(msg)
      File "/usr/local/lib/python3.7/http/client.py", line 966, in send
        self.connect()
      File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 184, in connect
        conn = self._new_conn()
      File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 169, in _new_conn
        self, "Failed to establish a new connection: %s" % e
    urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f66d3bb9cd0>: Failed to establish a new connection: [Errno 111] Connection refused
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/bin/uvicorn", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/uvicorn/main.py", line 331, in main
        run(**kwargs)
      File "/usr/local/lib/python3.7/site-packages/uvicorn/main.py", line 354, in run
        server.run()
      File "/usr/local/lib/python3.7/site-packages/uvicorn/main.py", line 382, in run
        loop.run_until_complete(self.serve(sockets=sockets))
      File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
      File "/usr/local/lib/python3.7/site-packages/uvicorn/main.py", line 389, in serve
        config.load()
      File "/usr/local/lib/python3.7/site-packages/uvicorn/config.py", line 288, in load
        self.loaded_app = import_from_string(self.app)
      File "/usr/local/lib/python3.7/site-packages/uvicorn/importer.py", line 20, in import_from_string
        module = importlib.import_module(module_str)
      File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
      File "<frozen importlib._bootstrap>", line 983, in _find_and_load
      File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 728, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "./backend/api.py", line 11, in <module>
        from backend.controller.router import router as api_router
      File "./backend/controller/router.py", line 3, in <module>
        from backend.controller import autocomplete, model, feedback
      File "./backend/controller/model.py", line 60, in <module>
        excluded_meta_data=EXCLUDE_META_DATA_FIELDS,
      File "/home/user/src/farm-haystack/haystack/database/elasticsearch.py", line 48, in __init__
        self.client.indices.create(index=index, ignore=400, body=custom_mapping)
      File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 92, in _wrapped
        return func(*args, params=params, headers=headers, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/indices.py", line 104, in create
        "PUT", _make_path(index), params=params, headers=headers, body=body
      File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 362, in perform_request
        timeout=timeout,
      File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 241, in perform_request
        raise ConnectionError("N/A", str(e), e)
    elasticsearch.exceptions.ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f66d3bb9cd0>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f66d3bb9cd0>: Failed to establish a new connection: [Errno 111] Connection refused)
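
    A quick way to narrow this down is to check whether Elasticsearch is reachable at all on localhost:9200 before starting the backend; a minimal check (host and port taken from the traceback above):

        import requests

        # If this fails, Elasticsearch is not running or its port 9200 is not
        # published/linked to the container running the backend.
        try:
            info = requests.get("http://localhost:9200", timeout=2).json()
            print("Elasticsearch reachable, version:", info["version"]["number"])
        except requests.exceptions.ConnectionError:
            print("Elasticsearch is not reachable on localhost:9200")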
    opened by ghost 6
  • Integrate SIL language identification API

    Integrate SIL language identification API

    This PR integrates more inclusive language identification compared to cld2/3. To this end, the SIL Language Identification API is used as the default language identification model. This API currently supports 1035 languages, including many lower-resourced languages, and hopefully by using this language ID COVID-QA can start leveraging a variety of data gathered in lower-resourced languages (e.g., via elasticsearch). Some sources of this info are the SIL COVID resources, the endangered languages project, and this repo.

    Important points

    • I've integrated new environmental variables for the API key, secret and url in config.py. These can be obtained for free at developers.sil.org
    • I took cld2/3 out of the loop because closely related languages (e.g., Russian and Bulgarian) might be misidentified. cld2/3 doesn't support many languages and is thus biased in terms of the data used to train the model.
    • The SIL API returns ISO 639-3 codes only (3-letter language codes)

    How to test

    $ cd covid_nlp/language/
    $ python detect_language.py
    

    Other suggestions/ notes:

    • Right now any non-English language is routed to the FAQ-based matching with elasticsearch (I think). It would be great to upgrade this a bit to determine which languages currently have data. Then for languages without any content, we could route those to something like PanLex resources or SIL posters for low resource languages.
    • I would like to upgrade the SIL API to support language ID for many samples at once. This may help in dynamically determining what content we have in the datastore.

    Supported languages in the API

    Supported via Text Classification: biv, cub, cmo, hvn, kus, yal, pwg, myv, guo, des, leu, eip, cso, zia, kri, mca, kno, zza, maz, bps, qub, rmy, lvs, tab, nld, moa, ssg, maw, pww, sab, udm, zsm, zao, dzo, gnw, bru, kog, cwe, bim, tgo, mlh, blz, ckt, lok, smo, kpq, eng, nnq, kmr, pir, cab, tuo, bvc, xte, txu, pny, klv, jic, khm, mhy, yli, kha, dop, ojb, gvl, meq, cof, qvo, kqe, btd, bwu, nii, arb, xtn, top, lex, lob, cjp, por, ote, tmc, sun, grt, mcd, sja, naw, plw, zas, soq, khq, cek, ozm, kud, ted, bmq, pan, rjs, ktb, nhw, krj, ycn, ita, tir, prg, kpf, qup, msy, emp, ncu, qxo, ell, mzk, tim, yaz, dtb, upv, cou, noa, nhy, adh, cly, saj, fuq, rmo, gla, sim, apd, kpr, ota, kqp, gso, afr, kxc, mbt, wiu, pbb, cor, qul, gwr, twu, qve, arl, bku, alz, mto, bak, guc, lat, kgr, agm, cwt, iws, mip, ctp, khz, kyq, vie, dad, dug, yas, irk, kez, mza, nou, yue, law, kur, atg, mco, acr, lhu, myb, tik, djk, hae, tpp, yuj, mwq, rav, kzj, tuk, pbi, ffm, kmo, ybb, bgz, slk, cbv, gof, bjz, jiv, lln, xrb, cjo, qxn, prk, cot, xed, dgi, nsn, mpm, bzj, kne, cnl, bhw, gyr, akh, ntp, pls, aoz, som, tlf, xsb, eus, mfe, hak, aby, mej, myw, dsb, kru, snw, tpt, cle, nyy, tgp, agd, btt, mf1, quz, swg, sck, dyo, qvc, due, mmo, nca, oss, urt, hrv, btx, ban, pib, iri, sba, kub, lif, npl, icr, mbh, amu, sag, zpt, pss, gle, azz, hag, lzh, acu, ara, hns, zpq, mio, zty, cuc, usa, dan, miq, akb, nyo, cbi, caa, gdn, pms, mpt, wer, teo, ghs, mxt, fin, mjv, kwd, cax, zpl, ntr, ake, nog, tlj, aah, ach, mit, fij, apz, ceb, gde, gdr, mcp, cui, twb, mta, ncj, ino, men, mhi, mir, pez, quy, yre, asm, bdd, zpm, hot, zpi, kao, kyu, mvc, zpz, nzi, stp, srp, dik, guk, hat, zca, opm, aso, way, uig, krs, dig, sbl, glg, ava, avk, mkd, con, jac, mbb, heb, ces, mwv, wob, ddn, fuf, jbu, chr, kms, kwi, soy, qvn, rap, sxn, sgw, rel, ukr, gnd, bgt, thk, nob, dga, mie, orv, kyz, guh, pag, pse, tfr, cul, bhl, xsr, vag, qvw, nst, azg, muv, pad, cco, ese, gcf, pol, akp, sey, bex, vut, pam, lus, gvc, vol, stn, kdc, gym, med, wuv, gng, pui, kle, arz, myx, aak, hif, ian, sig, ign, mvp, xuo, kup, bbr, amf, zai, cya, nia, raw, nyf, ayp, czt, saq, zae, sah, kzf, swe, jam, poi, dob, hnn, mhr, okv, aze, gor, nij, aai, mkl, ron, isl, cpb, mup, nod, sus, knf, laj, nnb, tqo, bfd, cok, alj, pcm, kpw, myk, bbo, uvl, jbo, kia, kat, mux, agn, bjv, tly, mak, ixi, spp, xtd, ifu, urd, bom, bel, ruf, mhl, kek, bts, nhe, duo, mfz, otq, trs, old, bus, dbq, tcc, bba, cat, tee, cfm, bef, nwb, tca, dgz, cnk, crn, dah, chv, kwf, aom, bcl, nfr, fal, tpw, gos, crh, tnr, deu, yuw, oku, hoc, luc, rim, zar, ndy, pbc, udu, daa, miy, mog, obo, aia, knk, sgb, kbh, aoj, gaw, jvn, hsb, ljp, rnl, acc, avt, kbm, sbd, nhi, itv, yle, kbp, mzm, ame, amk, srn, ido, mqj, acm, box, xla, gag, tem, ses, boa, lmk, ker, bov, lew, bul, gbo, bmv, agu, aau, kkj, smt, ziw, ind, ter, hla, xsu, lef, qwh, zpu, xal, adj, gux, rus, ztq, kij, lgg, alp, frd, agr, miz, nin, mfq, gmv, urb, bpr, hye, boj, bua, wnu, naf, tgl, acd, sgz, lsm, yat, ton, for, fuv, wwa, tue, atq, iry, kyc, rai, pab, grn, hus, tav, lao, sda, tat, ilo, ury, nyn, lis, nkf, mtp, mxb, waj, kpz, aeu, krc, rwo, tbo, mai, avn, npy, vid, wba, mox, sne, yaa, hun, ben, mhx, viv, bav, vun, tuf, gur, cmr, cgc, sld, aon, ttc, ura, wap, dyi, gwi, ann, kue, quw, cbc, mfy, mtj, mya, mti, mgo, ppk, tac, est, ngp, pkb, zpo, cap, zab, fuh, tbc, dos, mag, mcu, xmm, cmn, mil, mww, apr, big, cdf, gvf, mda, lad, cnt, ipi, bon, kki, mqb, gum, kab, ctd, cme, ong, taj, usp, tpz, moz, ina, kvn, quf, thv, mlp, hin, sps, eka, bmr, sdm, mop, ubu, bnj, lem, gog, 
kbr, ahk, enb, gej, mif, uzb, ixl, dtp, yid, mnb, mpg, bss, ccp, muy, kto, avu, tos, dww, car, qvm, neb, csk, yrb, amn, jun, imo, nmz, gbi, maa, snc, lip, jpn, nak, bkv, awb, iba, mqf, tvw, xsm, cym, cuk, guu, mxv, nan, coe, mgh, msm, fra, sue, amr, rom, gai, kcg, mur, ctg, nlc, nch, yss, gfk, bos, myy, mxq, kpv, dnw, lac, shn, taq, tna, thl, kdj, spa, jav, anv, atb, cbs, ken, yam, asg, spl, zaw, gah, tnn, alt, enq, sqi, mib, yad, zyp, lit, sur, ife, ktj, ifk, nsu, abt, hne, rro, zpc, mfi, gun, hil, run, qxh, lia, dts, lee, ltz, mzw, pis, epo, ptp, tzj, chz, nim, pes, tzt, ngu, mor, pao, wmw, dsh, bwq, sny, zaa, ber, yut, keo, faa, kxm, ndz, arq, not, cko, ceg, dgk, gqr, tlb, bxr, kaq, mnf, jmc, mar, muh, inb, knj, tha, prf, mon, nnw, nhu, mfh, bcw, bre, kmd, acn, quc, hig, pah, sri, bfo, ade, bgr, rmc, cjv, auy, amh, war, guq, bud, tby, tlh, hrx, pmf, kwj, awa, mee, vmy, cpa, heh, dwr, cni, gui, kje, sas, srm, wuu, buk, lgl, xav, kyf, lwo, mal, fai, far, lww, oci, blt, pau, hto, cbr, abi, mbc, mim, rej, sml, min, yby, nno, lfn, roo, kor, yva, toc, tnk, knv, bvz, nds, gna, nhx, nuj, kjh, urk, gub, amm, nho, huu, qvz, mva, ile, grc, bao, mfk, sil, cbk, sll, snn, mcq, mek, slv, ksr, qvs, kaz, kqy, bkl, bib, tur, yml, suk, kaa, huv, krl, bmh, kze, csb, ape, ppo, ttr, ndj, hub, tte, ess, zos, nvm

    Supported via rule-based methods (based on unicode blocks and writing system scripts): xsr, ind, cmo, lif, ron, ojb, nod, mww, men, jun, rus, btd, jpn, hil, arb, pol, run, mai, alt, kyu, amh, taj, war, nld, pam, ljp, bud, lus, grt, mak, sun, akb, dzo, bku, urd, bru, tgl, pag, som, kbp, arz, pan, bts, vie, sas, gag, kxm, mjv, taq, fuf, ita, chr, bul, mya, jav, atb, blt, ceb, ccp, bcl, lao, ilo, mar, oss, hnn, btx, rej, ban, lis

    opened by dwhitena 6
  • NOT AN ISSUE  : I've created a Telegram Bot

    NOT AN ISSUE : I've created a Telegram Bot

    I've created a Telegram bot for your web interface

    https://t.me/corona_scholar_bot

    Can I add it to the README?

    enhancement 
    opened by theapache64 5
  • Testing git commits, pull request and adding aggregation done by computer science student

    Testing git commits, pull request and adding aggregation done by computer science student

    This was a simple test in order to fully understand pushing to GitHub.

    opened by jsquire1 5
  • Model 2 Issue

    Model 2 Issue

    While using model 2, the API returns an answer for almost everything, but not in English. I believe model 2 should only return English answers.

    Try "Define gravity?"

    opened by theapache64 5
  • Added unified endpoint to answer a question

    Added unified endpoint to answer a question

    This opens a follow-up todo of routing the question correctly to the right model.

    opened by sfakir 5
  • Create english evaluation dataset for question similarity

    Create english evaluation dataset for question similarity

    We should create a simple evaluation dataset that can be used to benchmark our models for matching similar questions (a possible format is sketched after the list below).

    What should be sufficient for a rough baseline:

    • 100-300 question pairs of similar questions
    • extending that with 50% false pairs
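
    A hedged sketch of what such a pair file could look like (file name, column names, and the example pairs are made up for illustration):

        import csv

        # One row per question pair with a binary label (1 = similar, 0 = false pair).
        rows = [
            ("How does the virus spread?",
             "What are the transmission routes of COVID-19?", 1),
            ("How does the virus spread?",
             "How long should I quarantine after travel?", 0),
        ]

        with open("question_similarity_eval.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["question_1", "question_2", "is_similar"])
            writer.writerows(rows)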
    enhancement NLP / Modeling 
    opened by tholor 4
  • Bump scrapy from 2.0.1 to 2.5.1

    Bump scrapy from 2.0.1 to 2.5.1

    Bumps scrapy from 2.0.1 to 2.5.1.

    Release notes

    Sourced from scrapy's releases.

    2.5.1

    Security bug fix:

    If you use HttpAuthMiddleware (i.e. the http_user and http_pass spider attributes) for HTTP authentication, any request exposes your credentials to the request target.

    To prevent unintended exposure of authentication credentials to unintended domains, you must now additionally set a new, additional spider attribute, http_auth_domain, and point it to the specific domain to which the authentication credentials must be sent.

    If the http_auth_domain spider attribute is not set, the domain of the first request will be considered the HTTP authentication target, and authentication credentials will only be sent in requests targeting that domain.

    If you need to send the same HTTP authentication credentials to multiple domains, you can use w3lib.http.basic_auth_header instead to set the value of the Authorization header of your requests.

    If you really want your spider to send the same HTTP authentication credentials to any domain, set the http_auth_domain spider attribute to None.

    Finally, if you are a user of scrapy-splash, know that this version of Scrapy breaks compatibility with scrapy-splash 0.7.2 and earlier. You will need to upgrade scrapy-splash to a greater version for it to continue to work.
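
    To illustrate the new attribute, a minimal hypothetical spider (name, URL, and credentials are placeholders):

        import scrapy

        class FaqSpider(scrapy.Spider):
            name = "faq"
            start_urls = ["https://example.org/faq"]

            # Credentials used by HttpAuthMiddleware.
            http_user = "user"
            http_pass = "secret"
            # New in Scrapy 2.5.1: only send the credentials to this domain.
            http_auth_domain = "example.org"

            def parse(self, response):
                for question in response.css("h3::text").getall():
                    yield {"question": question}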

    2.5.0

    • Official Python 3.9 support
    • Experimental HTTP/2 support
    • New get_retry_request() function to retry requests from spider callbacks
    • New headers_received signal that allows stopping downloads early
    • New Response.protocol attribute

    See the full changelog

    2.4.1

    • Fixed feed exports overwrite support

    • Fixed the asyncio event loop handling, which could make code hang

    • Fixed the IPv6-capable DNS resolver CachingHostnameResolver for download handlers that call reactor.resolve

    • Fixed the output of the genspider command showing placeholders instead of the import path of the generated spider module (issue 4874)

    2.4.0

    Highlights:

    • Python 3.5 support has been dropped.

    • The file_path method of media pipelines can now access the source item.

      This allows you to set a download file path based on item data.

    • The new item_export_kwargs key of the FEEDS setting allows to define keyword parameters to pass to item exporter classes.

    • You can now choose whether feed exports overwrite or append to the output file.

      For example, when using the crawl or runspider commands, you can use the -O option instead of -o to overwrite the output file.

    • Zstd-compressed responses are now supported if zstandard is installed.

    ... (truncated)

    Changelog

    Sourced from scrapy's changelog.

    Scrapy 2.5.1 (2021-10-05)

    • Security bug fix:

      If you use :class:~scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware (i.e. the http_user and http_pass spider attributes) for HTTP authentication, any request exposes your credentials to the request target.

      To prevent unintended exposure of authentication credentials to unintended domains, you must now additionally set a new, additional spider attribute, http_auth_domain, and point it to the specific domain to which the authentication credentials must be sent.

      If the http_auth_domain spider attribute is not set, the domain of the first request will be considered the HTTP authentication target, and authentication credentials will only be sent in requests targeting that domain.

      If you need to send the same HTTP authentication credentials to multiple domains, you can use :func:w3lib.http.basic_auth_header instead to set the value of the Authorization header of your requests.

      If you really want your spider to send the same HTTP authentication credentials to any domain, set the http_auth_domain spider attribute to None.

      Finally, if you are a user of scrapy-splash_, know that this version of Scrapy breaks compatibility with scrapy-splash 0.7.2 and earlier. You will need to upgrade scrapy-splash to a greater version for it to continue to work.

    .. _scrapy-splash: https://github.com/scrapy-plugins/scrapy-splash

    .. _release-2.5.0:

    Scrapy 2.5.0 (2021-04-06)

    Highlights:

    • Official Python 3.9 support

    • Experimental :ref:HTTP/2 support <http2>

    • New :func:~scrapy.downloadermiddlewares.retry.get_retry_request function to retry requests from spider callbacks

    ... (truncated)

    Commits
    • 61130c8 Bump version: 2.5.0 → 2.5.1
    • 98d2173 Pin the libxml2 version in CI as a newer one breaks lxml (#5208)
    • 47fb908 [CI] fail-fast: false (#5200)
    • 6d7179b tests: freeze pylint==2.7.4
    • d06dcb8 tests: force queuelib < 1.6.0
    • d99b1a1 Cover 2.5.1 in the release notes
    • c9485a5 Small documentation fixes.
    • a172844 Add http_auth_domain to HttpAuthMiddleware.
    • 5fd75f8 docs: require sphinx-rtd-theme>=0.5.2 and the latest pip to prevent installin...
    • e63188c Bump version: 2.4.1 → 2.5.0
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Preprocessing of context to fit max_length

    Preprocessing of context to fit max_length

    Hi, would you please help me understand how the preprocessing is done for the CovidQA corpus? I ask because the context in the CovidQA dataset seems to be much larger than the maximum length set in the code (which is 300+, while BERT's max_length is 512 tokens). How is the data processed to fit into the limit? I couldn't find the code for that in the repo. Please advise. Thank you.
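
    For context, the usual way readers handle this is a sliding window over the context (a doc_stride-style overlap) so each chunk fits the model's limit; a minimal sketch with the transformers library (model name and lengths are illustrative, not the repo's exact settings):

        from transformers import AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

        question = "What is the incubation period of the virus?"
        context = "A very long research-paper context ... " * 500

        # The context is split into overlapping chunks that each fit max_length;
        # the question is repeated in every chunk and never truncated.
        encoded = tokenizer(
            question,
            context,
            max_length=384,
            stride=128,
            truncation="only_second",
            return_overflowing_tokens=True,
        )
        print(f"context split into {len(encoded['input_ids'])} chunks")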

    opened by Geethi2020 1
  • Added state pattern to Query class

    Added state pattern to Query class

    Applied the state pattern to the Query class to make methods involving the filter attribute behave differently depending on the filter state.

    opened by yasserelsaid 0
  • Applied iterator pattern to Response

    Applied iterator pattern to Response

    Implemented the iterator pattern on classes Response and ResponseToIndividualQuestion.

    This makes it easier to traverse the results collection in the Response and access the results sequentially without needing to know their underlying representation.
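
    A rough sketch of the idea (class and attribute names are illustrative, not the exact classes in this PR):

        class Response:
            """Holds a results collection and exposes it via iteration."""

            def __init__(self, results):
                self._results = list(results)

            def __iter__(self):
                # Clients can loop over a Response directly without knowing
                # how the results are stored internally.
                return iter(self._results)

        response = Response([{"answer": "wash hands"}, {"answer": "keep distance"}])
        for result in response:
            print(result["answer"])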

    opened by yasserelsaid 0
  • Second Behavioral Design Pattern (State Pattern)

    Second Behavioral Design Pattern (State Pattern)

    In tfidf_workflow.py, part of covid_nlp, the Term Frequency-Inverse Document Frequency (TF-IDF) processing steps are combined into one workflow, from language preprocessing through TF-IDF model training to TF-IDF model evaluation. Integrating all three stages into a single workflow simplifies the client's work and clarifies the overall TF-IDF training process. If a stage is skipped during execution, the client is notified of the current state and the next state.

    opened by SwithinTan 0
  • First Behavioral Design Pattern (Observer Design Pattern)

    First Behavioral Design Pattern (Observer Design Pattern)

    Description: Two interfaces, Subject and Observer, and a corresponding concrete class ConcreteObserver extending Observer were created in a new, separate observer.py. The Subject interface declares a set of methods for managing subscribers, including attach, detach, and notify. MSTranslator translates the questions and answers from English into another language and writes them to a CSV file, which is observed by our Observer. The Observer design pattern follows the open-closed principle, so new subscriber objects can be introduced without changing the publisher code, and subscriber lists can be customized freely.
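
    A compact sketch of the described structure (class names follow the description above; the rest is illustrative):

        class Observer:
            def update(self, row):
                raise NotImplementedError

        class ConcreteObserver(Observer):
            def update(self, row):
                # React to every translated Q/A row the subject publishes.
                print("new translated row:", row)

        class Subject:
            """Manages subscribers: attach, detach and notify."""

            def __init__(self):
                self._observers = []

            def attach(self, observer):
                self._observers.append(observer)

            def detach(self, observer):
                self._observers.remove(observer)

            def notify(self, row):
                for observer in self._observers:
                    observer.update(row)

        # A translator-like subject would call notify() after writing each CSV row.
        subject = Subject()
        subject.attach(ConcreteObserver())
        subject.notify({"question": "...", "answer": "...", "lang": "de"})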

    opened by SwithinTan 0
  • Applying proxy pattern

    Applying proxy pattern

    opened by karanpurba 0
  • Application of Creational Design Pattern

    Application of Creational Design Pattern

    Used the Singleton creational pattern to create a single instance of a data ingestor in place of the data_ingestion.py file. The ingestor has been given a static constructor and made immutable to protect the data collection and keep updates uniform.
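
    A compact sketch of the pattern being described (class and method names are illustrative):

        class DataIngestor:
            """Singleton-style ingestor: every instantiation returns the same object."""

            _instance = None

            def __new__(cls):
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
                return cls._instance

            def ingest(self, source_url: str):
                # Placeholder for the actual data collection / update logic.
                print(f"ingesting from {source_url}")

        # Both calls return the same ingestor instance.
        assert DataIngestor() is DataIngestor()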

    opened by edm101 0
  • Applying_Singleton_Pattren

    Applying_Singleton_Pattren

    opened by karanpurba 0
Owner
deepset
Building enterprise search systems powered by the latest NLP & open source.