Wikidata scholarly profiles

Overview

Scholia


Scholia is a Python package and webapp for interacting with scholarly information in Wikidata.

Webapp

As a webapp, it currently runs on Wikimedia Toolforge, a facility provided by the Wikimedia Foundation. It is accessible at

https://scholia.toolforge.org/

The webapp displays scholarly profiles for individual researchers, research topics, organizations, journals, works, events, awards, and so on. For instance, the scholarly profile of psychologist Uta Frith is available at

https://scholia.toolforge.org/author/Q8219

A page displays only the information that is available in Wikidata.

Script

Functions of the scholia package can also be used as a script from the command line. For example, the following looks up the Wikidata item for a Twitter username:

$ python -m scholia.query twitter-to-q fnielsen
Q20980928
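
The twitter-to-q command maps a Twitter username to a Wikidata item. As a rough, standalone illustration of what such a lookup involves (not the scholia.query implementation), the sketch below builds a SPARQL query for the X (formerly Twitter) username property P2002, sends it to the Wikidata Query Service and extracts the Q identifiers. The function names and the User-Agent string are hypothetical.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
ENTITY_PREFIX = "http://www.wikidata.org/entity/"

def twitter_to_q_query(username):
    """SPARQL query for items with the given X/Twitter username (P2002)."""
    # Restrict the username to Twitter's character set so it cannot break the query.
    if not re.fullmatch(r"[A-Za-z0-9_]+", username):
        raise ValueError(f"unexpected Twitter username: {username!r}")
    return f'SELECT ?item WHERE {{ ?item wdt:P2002 "{username}" . }}'

def request_url(query):
    # format=json asks the query service for SPARQL JSON results.
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})

def qs_from_results(data):
    """Extract Q identifiers from SPARQL JSON results bound to ?item."""
    qs = []
    for binding in data["results"]["bindings"]:
        uri = binding["item"]["value"]
        if uri.startswith(ENTITY_PREFIX):
            qs.append(uri[len(ENTITY_PREFIX):])
    return qs

def twitter_to_qs(username):
    """Network call: query WDQS and return the matching Q identifiers."""
    request = Request(
        request_url(twitter_to_q_query(username)),
        # Hypothetical User-Agent; Wikimedia services ask clients to identify themselves.
        headers={"User-Agent": "scholia-lookup-sketch/0.1"},
    )
    with urlopen(request, timeout=60) as response:
        return qs_from_results(json.load(response))
```

Calling twitter_to_qs("fnielsen") would be expected to return a list containing the item ID, assuming the username is stored in Wikidata exactly as given; the triple pattern matches the literal exactly, so it is case-sensitive.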

Contributing

A simple way to get up and running is to launch Scholia via Gitpod, which installs the dependencies listed in requirements.txt automatically and launches the web app via runserver.py.

See the file CONTRIBUTING.rst for technical details on how to improve Scholia.

Comments
  • Creating a configuration file in Scholia

    This PR creates a configuration file in Scholia that makes the data sources (SPARQL endpoints) configurable. This is important because it allows caching servers to be used easily and transparently. An added benefit is that new data sources (SPARQL endpoints) can now be supported easily.

    I would appreciate comments, especially on the variable names in the config file and on the way they were set (three servers).
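
    A minimal sketch of how such a configuration file could be read with Python's standard configparser module. The section and key names (query-server, sparql_endpoint, sparql_editurl, sparql_embedurl) are assumptions for illustration and may differ from the names chosen in the PR.

```python
import configparser

# Hypothetical default settings; section and key names are illustrative.
DEFAULT_CONFIG = """
[query-server]
sparql_endpoint = https://query.wikidata.org/sparql
sparql_editurl = https://query.wikidata.org/#
sparql_embedurl = https://query.wikidata.org/embed.html#
"""

def load_endpoints(text=DEFAULT_CONFIG, overrides=None):
    """Return the configured SPARQL URLs, letting an override file win."""
    config = configparser.ConfigParser()
    config.read_string(text)
    if overrides:
        # A later read merges into the parser and replaces matching keys.
        config.read_string(overrides)
    section = config["query-server"]
    return {
        "endpoint": section["sparql_endpoint"],
        "editurl": section["sparql_editurl"],
        "embedurl": section["sparql_embedurl"],
    }
```

    Pointing Scholia at a caching server would then only require an override that redefines sparql_endpoint; keys that are not overridden keep their defaults, because configparser merges successive reads.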

    opened by nunogit 27
  • Simplify running Scholia locally

    This PR does the following to make it easier to develop Scholia locally:

    1. Adds documentation in the README on how to install Scholia locally
    2. Adds an entry point to setup.py so that you can run scholia from the shell instead of python -m scholia
    3. Adds a run command to the CLI that mimics runserver.py
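
    A rough sketch of the CLI side, using only the standard library: an argparse interface with a run subcommand, whose main() function a console_scripts entry point in setup.py (for example "scholia = scholia.__main__:main", a hypothetical module path) could expose as a shell command. The option names and the default port are assumptions, and the sketch returns the parsed settings instead of starting a server.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="scholia")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # "run" mimics runserver.py; option names and defaults are assumptions.
    run = subparsers.add_parser("run", help="run the Scholia web app locally")
    run.add_argument("--host", default="127.0.0.1")
    run.add_argument("--port", type=int, default=5000)
    run.add_argument("--debug", action="store_true")
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "run":
        # A real implementation would start the web app here, e.g. with a
        # Flask app's app.run(host=..., port=..., debug=...); this sketch
        # only returns the settings it would use.
        return {"host": args.host, "port": args.port, "debug": args.debug}
    return None
```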
    opened by cthoyt 19
  • Better or fix issue template: bug report and feature request is not shown

    #1603 introduced issue templates. The directory https://github.com/WDscholia/scholia/tree/master/.github/ISSUE_TEMPLATE contains three files, but the GitHub issue page shows only the question option; neither the bug report nor the feature request option is shown.

    I would have expected bug report and feature request options to be available on https://github.com/WDscholia/scholia/issues/new/choose

    bug documentation 
    opened by fnielsen 16
  • On Scholia landing page, provide some overview stats about Wikidata and scholarly publications in it

    e.g. the number of triples in Wikidata:

    SELECT (COUNT(*) AS ?counts) WHERE {
      ?s ?p ?o .
    }
    

    and some WikiCite-focused ones, e.g. as per this list

    or some version of http://wikicite.org/statistics.html .
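
    One way to compute such statistics from Python is to send a COUNT query to the Wikidata Query Service. The sketch below only builds the request URL and parses the JSON result, so it runs without network access. The DOI (P356) count is an illustrative WikiCite-style statistic, not necessarily one from the list mentioned above. Counts over very large parts of Wikidata, such as the triple count, may be slow or hit the query service's timeout.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative statistic: number of distinct items with a DOI (P356).
DOI_COUNT_QUERY = """SELECT (COUNT(DISTINCT ?work) AS ?count) WHERE {
  ?work wdt:P356 ?doi .
}"""

def stats_request_url(query, endpoint=WDQS_ENDPOINT):
    """GET URL for a SPARQL query; format=json requests SPARQL JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})

def count_from_results(data, variable="count"):
    """Read the single COUNT value from a SPARQL JSON results document."""
    bindings = data["results"]["bindings"]
    if len(bindings) != 1:
        raise ValueError("expected exactly one result row")
    # Literal values arrive as strings, e.g. {"type": "literal", "value": "42"}.
    return int(bindings[0][variable]["value"])
```

    COUNT(DISTINCT ?work) is used so that items with more than one DOI are counted once.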

    LandingPage stats P50-author P2860-cites P496-ORCID P2093-author-name-string P225-taxon-name P356-DOI P921-main-subject P932-PMCID P625-geolocation P108-employer P1416-affiliation P166-award-received 
    opened by Daniel-Mietchen 14
  • check and externalize explicit sparql queries #1284

    related #1283, #1282, #785

    Externalize explicit sparql queries

    This first converts the explicit queries (tables) to the externalized query format. For example, files that contain queries like the following:

    somePanelDescriptionSparql = `
    SELECT ...
    ...
    `
    

    This PR partially solves issue #1284.

    The following files still have explicit queries:

    • [x] 404_chemical.html
    • [x] author.html
    • [x] author-index-curation.html
    • [x] authors.html
    • [x] award_curation.html
    • [x] chemical-index-curation.html
    • [x] chemical-index.html
    • [x] lexeme_empty.html
    • [x] pathway_empty.html
    • [x] pathway.html
    • [x] property.html
    • [x] software_empty.html
    • [x] topic_curation.html
    • [x] use_empty.html
    • [x] use.html
    • [x] venue_curation.html
    • [x] venues.html
    • [x] work_cito.html
    • [x] work_empty.html
    • [x] works.html

    Some annotations

    • Files with "empty" in their name will use the aspect as an index, but the HTML file names stay the same for the moment. For example, use_empty.html has an aspect named use-index. These HTML files will be renamed in another PR.
    • Externalizing author.html, pathway.html and 404_chemical.html required new macros and custom JS functions to extract queries into external .sparql files.
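
    A minimal sketch of loading an externalized query from Python. It assumes a hypothetical '<aspect>_<panel>.sparql' naming scheme and a literal '{{ q }}' placeholder for the target item; Scholia renders these files with Jinja, which this sketch only imitates with a plain string replacement.

```python
import re
from pathlib import Path

QID = re.compile(r"Q[1-9][0-9]*")

def load_panel_query(directory, aspect, panel, q):
    """Load an externalized query file and substitute the target item.

    Assumes a '<aspect>_<panel>.sparql' naming scheme and a literal
    '{{ q }}' placeholder (both hypothetical for this sketch).
    """
    # Validate the item ID before inserting it into the query text.
    if not QID.fullmatch(q):
        raise ValueError(f"not a Wikidata item ID: {q!r}")
    path = Path(directory) / f"{aspect}_{panel}.sparql"
    template = path.read_text(encoding="utf-8")
    # Same '# tool: scholia' first line as the queries built in the templates.
    return "# tool: scholia\n" + template.replace("{{ q }}", q)
```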
    opened by curibe 13
  • Chemical classes are special and the regular chemical aspect does not…

    … work well. On the new page, the section labelled "Related compound" in the original screenshot is now "Example compounds", capped at 500.

    @fnielsen, please take note of the change in how the aspect to show is determined. For this patch I had to change the logic: if no suitable aspect can be found from the item's types (P31), Scholia now determines the classes the item is a subclass of. This is more expensive, so it runs only when no suitable aspect was found.
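
    The fallback logic described above can be sketched in Python. The class-to-aspect tables are hypothetical and deliberately tiny (Q5 human, Q13442814 scholarly article, Q11173 chemical compound), and the two lookups are passed in as functions so that the subclass-of (P279) lookup is only called when no instance-of (P31) class matched.

```python
# Hypothetical class-to-aspect tables; only a few well-known QIDs are used.
P31_ASPECTS = {
    "Q5": "author",          # human
    "Q13442814": "work",     # scholarly article
    "Q11173": "chemical",    # chemical compound
}
P279_ASPECTS = {
    "Q11173": "chemical-class",  # item is a subclass of chemical compound
}

def choose_aspect(qid, get_instance_of, get_subclass_of):
    """Pick an aspect from P31 classes, falling back to P279 classes.

    get_instance_of and get_subclass_of are callables returning lists of
    QIDs; in practice each would be a SPARQL query, the P279 one being the
    more expensive, so it is only called when P31 gave no match.
    """
    for cls in get_instance_of(qid):
        if cls in P31_ASPECTS:
            return P31_ASPECTS[cls]
    for cls in get_subclass_of(qid):
        if cls in P279_ASPECTS:
            return P279_ASPECTS[cls]
    return None
```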

    enhancement 
    opened by egonw 13
  • Add CiTO panels to work/venue aspect (using ask query)

    Closes #1610. Also starts using ASK queries (#617) and adds a way to hide panels (closes #741).

    Description

    Please include a summary of the change, relevant motivation and context. If possible and applicable, include before and after screenshots and a URL where the changes can be seen.

    Example pages: a venue without CiTO (Q4775205), a venue with CiTO (Q6294930), and a work with CiTO, on which the highlight panel is added.

    1. Created a macro and a JS function to handle "ask" queries
      1. Because the panels to load in the success case have to be passed in, I used a Jinja call block, which calls the macro with the other macros as its body:
    {% call ask_query_callback('cito') %}
      {{ sparql_to_iframe('articles-by-intention') }}
    
      {{ sparql_to_iframe('incoming-bubble') }}
    
      {{ sparql_to_table('incoming',
        options={
          "linkPrefixes": { "intention": "../../cito/" }
        }
      ) }}
    
      {{ sparql_to_iframe('outgoing-bubble') }}
    
      {{ sparql_to_table('outgoing',
        options={
          "linkPrefixes": { "intention": "../../cito/" }
        }
      ) }}
    
      {{ sparql_to_table('most-reused-articles',
        options={
          "linkPrefixes": { "citedArticle": "../../work/" },
          "linkSuffixes": { "citedArticle": "/cito" },
        }
      ) }}
    {% endcall %}
    

    The macro takes the panel parameter 'cito' and passes it to askQuery, as shown below. The body (all of the {{ sparql_to..}} statements) is passed into the callback function with {{ caller() }}.

    {% macro ask_query_callback(panel) -%}
    // {{ panel }} ask query
    askQuery("{{ panel }}", `# tool: scholia
    {% include 'ask_' + aspect + '_' + panel + '.sparql' %}`, 
    () => {
        {{ caller() }};
    });
    {%- endmacro %}
    

    The JS function is generic and takes a panel name (used to show or hide the panels), the ASK query (which Jinja includes from a file), and a callback function (the output of the sparql_to_iframe and sparql_to_table macros).

    function askQuery(panel, query, callback) {
      const endpointUrl = 'https://query.wikidata.org/sparql';
      const settings = {
        headers: { Accept: 'application/sparql-results+json' },
        data: { query: query },
      };

      $.ajax(endpointUrl, settings).then((data) => {
        if (data.boolean) {
          // ASK returned true: unhide the panels and load their content
          document.getElementById(panel).classList.remove("d-none");
          callback();
        } else {
          // ASK returned false: hide the panel's headings from the table of contents
          const headings = document.querySelectorAll("#" + panel + " h2, #" + panel + " h3");
          for (const elem of headings) {
            const link = document.querySelector("li a[href='#" + elem.id + "']");
            if (link) {
              link.parentElement.classList.add("d-none");
            }
          }
        }
      });
    }
    
    2. Moved the venue/work CiTO panels to the venue/work page
    3. Removed the /cito route and page
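
    The decision that askQuery makes can also be expressed in Python. In the SPARQL 1.1 query results JSON format, an ASK query returns a document with a single boolean field, so the sketch below reads that field and splits panels into those to load and those to hide. The panel_plan helper and its input shape are hypothetical.

```python
import json

def ask_result(response_text):
    """Return the answer of a SPARQL ASK query from its JSON results text."""
    data = json.loads(response_text)
    if not isinstance(data.get("boolean"), bool):
        raise ValueError("not a SPARQL ASK result")
    return data["boolean"]

def panel_plan(ask_responses):
    """Hypothetical helper: split panels into (to_load, to_hide).

    ask_responses maps a panel name (e.g. 'cito') to the raw JSON text
    returned by that panel's ASK query.
    """
    to_load, to_hide = [], []
    for panel, text in ask_responses.items():
        (to_load if ask_result(text) else to_hide).append(panel)
    return to_load, to_hide
```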

    Caveats

    Please list anything which has been left out of this PR or which should be considered before this PR is accepted. Check any of the following which apply:

    • [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)
      • Removes the /cito route
    • [x] This change requires a documentation update
      • [ ] I have made corresponding changes to the documentation
      • I've documented this above, but I'm not sure if there is a better place to note this behaviour
    • [ ] This change requires new dependencies (please list)

    If you make changes to the Python code:

    • [ ] my code passes the tox check (you may receive warnings about tests, documentation or both)

    Testing

    Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

    • Checked that the panels showed with the correct information on a venue with CiTO (Q6294930)
    • Checked that the requests aren't performed and the panels are hidden on a venue without CiTO (Q4775205)
    • Checked a work with CiTO (Q21090124) and without (Q21090025)

    Checklist

    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [x] My changes generate no new warnings
    • [x] I have not used code from external sources without attribution
    • [x] I have considered accessibility in my implementation
    • [x] There are no remaining debug statements (print, console.log, ...)
    ready for merge 
    opened by carlinmack 11
  • Externalize SPARQL queries into templates

    References #791 (original issue) and #906 (concrete solution proposal).

    I haven't marked this PR as closing either issue because there are still many other HTML templates that would need to be modified, so consider this PR as a pilot that can be easily followed by one or more PRs to address the remaining HTML templates.

    Changes

    • Move SPARQL strings in HTML templates into dedicated SPARQL templates
    • Update README with explanation for new contributors on how to use templating
    opened by cthoyt 11
  • Adds a chemistry/missing page for curation

    • with a column on the /chemical/ front page
    • adds a few more example chemical structures

    I expect more patches later for more physchem properties from @lahire.

    aspects ready for merge 
    opened by egonw 11
  • Display ORCID iDs

    In the "Prolific authors" section, please display the subject's ORCID iD (P496), if available.

    If not available, please include a "search ORCID" link, formatted like https://orcid.org/orcid-search/quick-search?searchQuery=Andy+Mabbett, so that volunteers can more easily find the iDs and add them to Wikidata.
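
    A sketch of both cases in Python: a "search ORCID" link built from the author's name, URL-encoded as in the example above, and a check of an ORCID iD's check character before the iD is displayed. The check character uses ISO 7064 MOD 11-2, as documented by ORCID. The helper names are hypothetical, and the search URL pattern is the one quoted in this request, which ORCID may have changed since.

```python
from urllib.parse import quote_plus

ORCID_SEARCH_URL = "https://orcid.org/orcid-search/quick-search?searchQuery="

def orcid_search_url(name):
    """'search ORCID' link in the format requested above (spaces become '+')."""
    return ORCID_SEARCH_URL + quote_plus(name)

def orcid_check_character(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """Check length, digits and the check character (hyphens are ignored)."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15].upper()

def orcid_cell(name, orcid=None):
    """Hypothetical table cell: the iD link if valid, else a search link."""
    if orcid and is_valid_orcid(orcid):
        return "https://orcid.org/" + orcid
    return orcid_search_url(name)
```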

    aspects JavaScript SPARQL examples P496-ORCID 
    opened by pigsonthewing 11
  • Add Bioschemas by proxying Wikidata content (making Google bots happy)

    @fnielsen, this is now a finished patch, but if you have additional ideas, please let me know (if not, please merge it).

    The new design solves the problem of robots.txt blocking calls and limiting the SEO indexing of Scholia pages:

    • a Scholia proxy is defined with the URL pattern /$qid/bioschemas which returns JSON
    • the existing base.html uses this new call instead of a call to wikidata.org (with the robots.txt problem)

    The extra call is only made when the aspect template has an id=bioschemas holder.

    Possible future optimizations:

    • [x] property_for_q() calls are replaced by a single properties_for_q(q, {"P235": "key1", "Pxx": "key2", ...})
    • [x] 2-3 helper functions get added to simplify the code
    • [x] other bits of the page get included in a similar way (like descriptions), making it also available for SEO
    • [x] use the Wikidata description as Bioschemas content
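
    A minimal sketch of the kind of JSON a /$qid/bioschemas proxy could return, with the property-to-key mapping passed as one dictionary in the spirit of properties_for_q(q, {"P235": "key1", ...}). The schema.org MolecularEntity type and the keys inChIKey (for P235, InChIKey) and molecularFormula (for P274, chemical formula) are assumptions for illustration; fetching the claims from Wikidata is left out.

```python
import json

# Hypothetical property-to-key mapping: one dictionary instead of one
# property_for_q() call per property.
PROPERTY_KEYS = {
    "P235": "inChIKey",          # Wikidata: InChIKey
    "P274": "molecularFormula",  # Wikidata: chemical formula
}

def bioschemas_for_q(q, description, claims, mapping=PROPERTY_KEYS):
    """Build a schema.org JSON-LD dict for one item.

    claims maps property IDs to a single string value, assumed to have been
    fetched from Wikidata already; properties without a value are skipped.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "MolecularEntity",
        "@id": "http://www.wikidata.org/entity/" + q,
        "identifier": q,
        "description": description,  # Wikidata description, also useful for SEO
    }
    for prop, key in mapping.items():
        if prop in claims:
            doc[key] = claims[prop]
    return doc

def bioschemas_json(q, description, claims):
    """Serialized form that a /<qid>/bioschemas route could return."""
    return json.dumps(bioschemas_for_q(q, description, claims), ensure_ascii=False)
```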
    ready for review 
    opened by egonw 10
  • Get user data fails for Google Scholar

    Describe the bug: Getting user data fails for Google Scholar.

    To Reproduce: Steps to reproduce the behavior:

    python -m py.test --doctest-modules scholia/googlescholar.py

    or

    python -m scholia.googlescholar get-user-data 9cagBQYAAAAJ

    Expected behavior: No error. Data should be returned.

    Additional context: This also fails with tox.

    bug 
    opened by fnielsen 2
  • Vejhistorie OJS journal is not scraped correctly

    Describe the bug: The Vejhistorie OJS journal is not scraped correctly.

    To Reproduce: Steps to reproduce the behavior:

    $ python -m scholia.scrape.ojs issue-url-to-quickstatements https://tidsskrift.dk/vejhistorie/issue/view/9914
    CREATE
    LAST	P31	Q13442814
    LAST	P856	"https://tidsskrift.dk/vejhistorie/article/view/135395"
    
    

    Expected behavior: More metadata should be output.

    Additional context: There do not seem to be any meta tags in the HTML for this issue.
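
    For context, OJS article pages typically expose Highwire-style citation_* meta tags (such as citation_title), which a scraper can turn into QuickStatements. The sketch below, which is not the scholia.scrape.ojs code, collects those tags with the standard library's HTMLParser and emits a title statement (P1476) only when citation_title is present; without any citation_* tags its output reduces to the same three statements as in the report above. The function name and the explicit language argument are assumptions.

```python
from html.parser import HTMLParser

class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_..." content="..."> tags by name."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.startswith("citation_") and content is not None:
            self.meta.setdefault(name, []).append(content)

def article_to_quickstatements(url, html, language):
    """Hypothetical converter from an article page to QuickStatements lines."""
    parser = CitationMetaParser()
    parser.feed(html)
    parser.close()
    lines = ["CREATE", "LAST\tP31\tQ13442814"]  # instance of: scholarly article
    titles = parser.meta.get("citation_title")
    if titles:
        # Title (P1476) is monolingual text: language:"value".
        # Embedded double quotes are not handled in this sketch.
        lines.append(f'LAST\tP1476\t{language}:"{titles[0]}"')
    lines.append(f'LAST\tP856\t"{url}"')  # same URL property as in the report
    return "\n".join(lines)
```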

    bug OJS 
    opened by fnielsen 0
  • New property for crystal structures, new statistics

    Description

    Small patch: when calculating the number of crystal structures, compounds with a CSD Refcode can be counted too. This property was accepted this week.

    Caveats

    Potentially, a more complex query might not run fast enough, but this does not seem to be the case (no noticeable slowdown).

    Testing

    Visit https://scholia.toolforge.org/chemical/ (before/after)

    Checklist

    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [x] My changes generate no new warnings
    • [x] I have not used code from external sources without attribution
    • [x] I have considered accessibility in my implementation
    • [x] There are no remaining debug statements (print, console.log, ...)
    opened by egonw 0
  • Panel for the author curation page to list articles that are not used as reference for any statement

    Fixes #2213

    Description

    This patch adds a panel to the author curation page that lists the author's works that are not used as a reference for any statement.
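
    A sketch of the kind of SPARQL query such a panel could run, built from Python with a basic check on the item ID. It assumes that "used as a reference" means appearing as the value of stated in (P248) in a statement reference, which is only one of the ways works are cited in Wikidata; the function name is hypothetical and the query is not necessarily the one in the patch.

```python
import re

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")

def works_not_used_as_reference_query(author_qid):
    """SPARQL for an author's works (P50) that no reference cites via P248."""
    # Validate before interpolating into the query string.
    if not QID_PATTERN.fullmatch(author_qid):
        raise ValueError(f"not a Wikidata item ID: {author_qid!r}")
    return f"""SELECT ?work ?workLabel WHERE {{
  ?work wdt:P50 wd:{author_qid} .
  # Keep only works that are not the 'stated in' (P248) value of any reference.
  FILTER NOT EXISTS {{ ?reference pr:P248 ?work . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""
```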

    Caveats

    No caveats I can foresee.

    Testing

    Test with any author who has multiple articles.

    Checklist

    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [x] My changes generate no new warnings
    • [x] I have not used code from external sources without attribution
    • [ ] I have considered accessibility in my implementation
    • [x] There are no remaining debug statements (print, console.log, ...)
    opened by egonw 8
  • only 0.67% of the articles in Wikidata are used to support statements

    On the Wikidata Telegram channel, people looked into how many articles are actually used as a reference to support a claim. That turned out to be 263,247 articles, or 0.67%.

    So another curation task around an author, which the author could also do, is to use their articles as references ("citations") in statements.

    data-quality 
    opened by egonw 0
  • Fixing an issue and implements a feature request around linking versions of papers

    Fixes #1597 and fixes #1886

    Description

    The first (older) patch fixes the problem reported in both bug reports. The second (newer) patch implements the suggestion made in the comment https://github.com/WDscholia/scholia/issues/1597#issuecomment-898766620 (and the equivalent for retractions).

    Caveats

    There are no code changes.

    Testing

    I suggest testing the following pages, which cover various situations; where the before and after versions differ, the difference demonstrates what is fixed:

    • https://scholia.toolforge.org/work/Q24613508
    • https://scholia.toolforge.org/work/Q24564615
    • https://scholia.toolforge.org/work/Q114679534
    • https://scholia.toolforge.org/work/Q102319086
    • https://scholia.toolforge.org/work/Q102092244

    Checklist

    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [x] My changes generate no new warnings
    • [x] I have not used code from external sources without attribution
    • [ ] I have considered accessibility in my implementation
    • [x] There are no remaining debug statements (print, console.log, ...)
    opened by egonw 0
Releases (v0.3)
Owner
Finn Årup Nielsen
Data science. Data and text mining, neuroinformatics, social media, wiki.
A machine learning software for extracting information from scholarly documents

GROBID GROBID documentation Visit the GROBID documentation for more detailed information. Summary GROBID (or Grobid, but not GroBid nor GroBiD) means

Patrice Lopez 1.9k Jan 8, 2023
MeSH2Matrix - A set of Python codes for the generation of biomedical ontologies from the MeSH keywords of the PubMed scholarly publications

A set of Python codes for the generation of biomedical ontologies from the MeSH keywords of the PubMed scholarly publications

SisonkeBiotik 6 Nov 30, 2022
Tool to add main subject to items on Wikidata using a WMFs CirrusSearch for named entity recognition or a manually supplied list of QIDs

ItemSubjector Tool made to add main subject statements to items based on the title using a home-brewed CirrusSearch-based Named Entity Recognition alg

Dennis Priskorn 9 Nov 17, 2022
A spaCy wrapper of OpenTapioca for named entity linking on Wikidata

spaCyOpenTapioca A spaCy wrapper of OpenTapioca for named entity linking on Wikidata. Table of contents Installation How to use Local OpenTapioca Vizu

Universitätsbibliothek Mannheim 80 Jan 3, 2023
wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information

Python based Wikidata framework for easy dataframe extraction wikirepo is a Python package that provides a framework to easily source and leverage sta

Andrew Tavis McAllister 35 Jan 4, 2023
Experiments in converting wikidata to ftm

FollowTheMoney / Wikidata mappings This repo will contain tools for converting Wikidata entities into FtM schema. Prefixes: https://www.mediawiki.org/

Friedrich Lindenberg 2 Nov 12, 2021
Profil3r is an OSINT tool that allows you to find potential profiles of a person on social networks, as well as their email addresses 🕵️

Profil3r is an OSINT tool that allows you to find potential profiles of a person on social networks, as well as their email addresses. This program also alerts you to the presence of a data leak for the found emails.

null 1.1k Aug 24, 2021
NExfil is an OSINT tool written in python for finding profiles by username.

NExfil is an OSINT tool written in python for finding profiles by username. The provided usernames are checked on over 350 websites within a few seconds.

thewhiteh4t 1.4k Jan 1, 2023
Output provisioning profiles in a diffable way

normalize-profile This tool reads Apple's provisioning profile files and produces reproducible output perfect for diffing. You can easily integrate th

Keith Smiley 8 Oct 18, 2022
Badge-Link-Creater 'For more beautiful profiles.'

Badge-Link-Creater 'For more beautiful profiles.' Ready Badges Prepares the codes of the previously prepared badges for you. Note Click here for more

Mücahit Gündüz 9 Oct 19, 2022
trackbranch is a tool for developers that can be used to store collections of branches in the form of profiles.

trackbranch trackbranch is a tool for developers that can be used to store collections of branches in the form of profiles. This can be useful for sit

Kevin Morris 1 Oct 21, 2021
A simple python-function, to gain all wlan passwords from stored wlan-profiles on a computer.

Wlan Fetcher Windows10 Description A simple python-function, to gain all wlan passwords from stored wlan-profiles on a computer. Usage This Script onl

null 2 Nov 20, 2021
Automatically mass follows tons of NameMC profiles.

Automatically mass follows tons of NameMC profiles. (Creates REAL traffic to your profile)

Jam 3 Jun 29, 2022
Account Profiles Dumper for Fortnite.

Fortnite Profile Dumper This program allows you to dump your Fortnite account profiles. How to use it? After starting the FortniteProfileDumper.py, yo

PRO100KatYT 12 Jul 28, 2022
Collect links to profiles by username through search engines

Marple Summary Collect links to profiles by username through search engines (currently Google and DuckDuckGo). Quick Start ./marple.py soxoj Results:

null 125 Dec 19, 2022
My telegram bot to download Instagram Profiles

Instagram Profile Get for Telegram My telegram bot to download Instagram Profiles First you have to get a telegram bot api key from @BotFather Then you

Ali Yoonesi 2 Sep 22, 2022
Just imagine normal bancho, but you can have multiple profiles and funorange speed up maps ranked

Local osu! server Just imagine normal bancho, but you can have multiple profiles and funorange speed up maps ranked (coming soon)! Windows Setup Insta

Cover 25 Nov 15, 2022
Fortnite Dumper for anyone's Save the World profiles.

Anyone's Fortnite Save the World Profile Dumper This program allows you to dump anyone's Fortnite Save the World Profiles. How to use it? After starti

PRO100KatYT 6 Apr 13, 2022
A discord bot for checking what linked profiles a user has to their Ubisoft account

ubisoft_discord_profiles A Discord bot for checking what linked profiles a user has to their Ubisoft account. This can be setup using an environmental

Andrei 1 Dec 17, 2021
Yesitsme - Simple OSINT script to find Instagram profiles by name and e-mail/phone

Simple OSINT script to find Instagram profiles by name and e-mail/phone

null 108 Jan 7, 2023