Various converters to convert value sets from CSV to JSON, etc.

Overview

ValueSet Converters

Tools for converting value sets between formats, for example converting extensional value sets in CSV format to JSON that can be uploaded to a FHIR server.

Set up / installation

  1. You must have Python3 installed.
  2. Run to clone repo: git clone https://github.com/HOT-Ecosystem/ValueSet-Converters.git
  3. Change directory: cd ValueSet-Converters
  4. Make & use virtual environment: virtualenv env; source env/bin/activate
  5. Run to install dependencies: pip install -r requirements.txt
  6. To use the "VSAC to OMOP/FHIR JSON" tool, which fetches from Google Sheets, you'll need the following:
    6.a. Access to this Google Sheet.
    6.b. credentials.json and token.json placed inside the env/ directory. These can be obtained from Joe (they will be uploaded to a Google Drive folder later).
  7. Create an env/.env file based on env/.env.example, replacing VSAC_API_KEY with your own VSAC API key as shown in your profile. More instructions on getting an API key can be found in "Step 1" on this page.

Tools

First, cd into the directory where this repository was cloned.

1. CSV to FHIR JSON

First, rename the columns of your CSV to match the example below. Then run the following command.

Syntax

python3 -m value_set_csv_to_fhir_json path/to/FILE.csv

Example

python3 -m value_set_csv_to_fhir_json examples/1/input/n3cLikeExtensionalValueSetExample.csv

Before:

valueSet.id,valueSet.name,valueSet.description,valueSet.status,valueSet.codeSystem,valueSet.codeSystemVersion,concept.code,concept.display
1,bear family,A family of bears.,draft,http://loinc.org,2.36,1234,mama bear
1,bear family,A family of bears.,draft,http://loinc.org,2.36,1235,papa bear
1,bear family,A family of bears.,draft,http://loinc.org,2.36,1236,baby bear

After:

{
    "resourceType": "ValueSet",
    "id": 1,
    "meta": {
        "profile": [
            "http://hl7.org/fhir/StructureDefinition/shareablevalueset"
        ]
    },
    "text": {
        "status": "generated",
        "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\">\n\t\t\t<p>A family of bears.</p>\n\t\t</div>"
    },
    "name": "bear family",
    "title": "bear family",
    "status": "draft",
    "description": "A family of bears.",
    "compose": {
        "include": [
            {
                "system": "http://loinc.org",
                "version": 2.36,
                "concept": [
                    {
                        "code": 1234,
                        "display": "mama bear"
                    },
                    {
                        "code": 1235,
                        "display": "papa bear"
                    },
                    {
                        "code": 1236,
                        "display": "baby bear"
                    }
                ]
            }
        ]
    }
}
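The mapping from CSV rows to a ValueSet resource can be sketched roughly as follows. This is a minimal illustration and not the code in value_set_csv_to_fhir_json: the function name rows_to_value_set is hypothetical, the "meta" and "text" elements are omitted for brevity, and values are kept as the strings that csv.DictReader returns rather than the numeric values shown in the output above.

```python
import csv
import io
import json


def rows_to_value_set(rows):
    """Build one ValueSet dict from CSV rows that share a valueSet.id (hypothetical helper)."""
    first = rows[0]
    # Group concepts by (code system, version), one "include" entry per group.
    includes = {}
    for row in rows:
        key = (row["valueSet.codeSystem"], row["valueSet.codeSystemVersion"])
        includes.setdefault(key, []).append(
            {"code": row["concept.code"], "display": row["concept.display"]}
        )
    return {
        "resourceType": "ValueSet",
        "id": first["valueSet.id"],
        "name": first["valueSet.name"],
        "title": first["valueSet.name"],
        "status": first["valueSet.status"],
        "description": first["valueSet.description"],
        "compose": {
            "include": [
                {"system": system, "version": version, "concept": concepts}
                for (system, version), concepts in includes.items()
            ]
        },
    }


EXAMPLE_CSV = """valueSet.id,valueSet.name,valueSet.description,valueSet.status,valueSet.codeSystem,valueSet.codeSystemVersion,concept.code,concept.display
1,bear family,A family of bears.,draft,http://loinc.org,2.36,1234,mama bear
1,bear family,A family of bears.,draft,http://loinc.org,2.36,1235,papa bear
1,bear family,A family of bears.,draft,http://loinc.org,2.36,1236,baby bear
"""

value_set = rows_to_value_set(list(csv.DictReader(io.StringIO(EXAMPLE_CSV))))
print(json.dumps(value_set, indent=4))
```

Grouping by code system and version keeps one "include" entry per code system, which matches the shape of the example output.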

2. VSAC to OMOP/FHIR JSON

This will fetch from the following google sheet: https://docs.google.com/spreadsheets/d/1jzGrVELQz5L4B_-DqPflPIcpBaTfJOUTrVJT5nS_j18/edit#gid=1335629675

Syntax

  • With default options: python3 -m value_set_vsac_to_json
  • Choosing an output format: python3 -m value_set_vsac_to_json -f omop

Options:

Short flag | Long flag | Options          | Default | Description
-f         | --format  | ['omop', 'fhir'] | 'omop'  | Output format.
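For reference, an option like this is commonly declared with argparse. The sketch below only mirrors the flag, choices, and default listed above; the function name build_parser is hypothetical and the tool's actual argument handling may differ.

```python
import argparse


def build_parser():
    """Declare the -f/--format option as listed in the options table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="value_set_vsac_to_json")
    parser.add_argument(
        "-f",
        "--format",
        choices=["omop", "fhir"],
        default="omop",
        help="Output format.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args(["-f", "fhir"])
print(args.format)
```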
Comments
  • Generate a set of concept set data files to manage the bulk import of concept sets to the enclave

    Concept sets can be created (edit: Joe: in the future, once this feature has been added) via a bulk import process that bypasses the concept set editor, so that externally authorized concept sets can be imported in bulk. A detailed specification is needed to handle this process.

    Tables needed to generate (edit: Joe: Added links):

    • [x] 1. concept set container: (concept_set_container_edited table)
    • [x] 2. concept set version: (code_sets table)
    • [x] 3. concept set metadata: (concept_set_version_item_rv_edited table)

    More details will be uploaded. This is a placeholder issue for Joe and Steph.

    new feature 
    opened by stephanieshong 11
  • Value set name collisions

    Description

    (@stephanieshong Please feel free to edit this and change this text if I got anything wrong.) There are some edge cases where we have value sets with the same name. And sometimes there is a single OID which represents a grouping of those value sets. And, in a smaller subset of those edge cases, there is no such OID / value set on UMLS.

    In those cases, we should create a new grouping value set:

    • Create it for upload to the enclave. We don't need to create it in UMLS.
    • Will not have an OID
    • Members of the concept set should be all the members of all of the value sets with the same name.
    • Provenance should include the OIDs of those value sets.
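The grouping rule described in the list above could be sketched as follows. This is only an illustration under stated assumptions: the function name build_grouping_sets, the dict keys "name", "oid", "codes", and "provenance_oids", and the placeholder OIDs and codes are all hypothetical, and the case where a grouping set already exists in VSAC is not handled here.

```python
from collections import defaultdict


def build_grouping_sets(value_sets):
    """Create one new grouping set per value set name that occurs more than once."""
    by_name = defaultdict(list)
    for vs in value_sets:
        by_name[vs["name"]].append(vs)

    groupings = []
    for name, members in by_name.items():
        if len(members) < 2:
            continue  # no name collision, nothing to group
        # Members are the union of all codes from the colliding value sets.
        codes = sorted({code for vs in members for code in vs["codes"]})
        groupings.append(
            {
                "name": name,
                "oid": None,  # the new grouping set does not get an OID
                "codes": codes,
                "provenance_oids": [vs["oid"] for vs in members],
            }
        )
    return groupings


# Placeholder data; the OIDs and codes below are made up for illustration.
value_sets = [
    {"name": "Blood transfusion", "oid": "1.2.3", "codes": [("SNOMED", "A1")]},
    {"name": "Blood transfusion", "oid": "1.2.4", "codes": [("ICD10CM", "B2")]},
    {"name": "Other", "oid": "1.2.5", "codes": [("SNOMED", "C3")]},
]
groupings = build_grouping_sets(value_sets)
print(groupings)
```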

    Tasks

    • [x] 1. Update our code to revert the changes we made (the appending of code systems to the value set names).
    • [ ] 2. Handle collision cases
      • [ ] 2.a. Use existing grouping sets that exist.
      • [ ] 2.b. Handle cases where there isn't an existing grouping set.

    Task 2 Options

    Option (d) seems to be the best, based on our discussion at the 2022/02/17 BIDS data meeting.

    ~a. Merge the value sets~

    I think this assumes that the value sets that would be merged are of entirely different code systems. I suppose we can work with that assumption for now, but if that isn't the case, we'll need to think about what to do. This one seems preferred.

    ~b. Give the value sets different names~

    This is what we've already done. We decided to append code system names to the value set. But it seems this is not preferred.

    ~c. Only upload 1 of the value sets~

    If this is the case, we'd need to store information about the value set collisions in some data storage layer (file or DB). Then we'd need a curation process to determine which set is best.

    d. Create a new set (when grouping not available)

    1. Optional: Create new set in VSAC and get OID: https://www.nlm.nih.gov/vsac/support/authorguidelines/createvs.html (edit: We decided this doesn't need to be done)
    2. Optional: Add new OID to cset.csv (only if we do (1))
    3. Update code to programmatically create new sets in these collision cases where no grouping already exists.
    4. Upload to enclave
    5. Optional: Register IDs? (options: a, b) (only if we do (1))
    • Davera mentioned this, but I'm not sure it's necessary, as VSAC autogenerates new OIDs (step 1 above). HL7 also charges $500 to register an OID.

    e. Use previously created combination value sets / OIDs (when grouping set available)

    A "grouping set" is when a value set is defined as a "grouping" in UMLS VSAC, and its members are OIDs that point to its composite value sets.

    Apparently, for some (or perhaps all?) value sets that have more than one instance in VSAC, there exist other value sets that group these instances together. I believe this is mainly to account for different code systems. So, for example, there may be a value set for blood transfusion. There may be one set that (i) uses one code system (e.g. SNOMED) and another that (ii) uses a different code system (e.g. ICD10CM). Then there may be (iii) a third value set which combines the codes from both. In this case, we want to use and upload (iii) to the enclave, and not upload (i) and (ii).

    We should use this option (e) as much as possible, and only use option (d) (creating a new value set) if VSAC doesn't already have such a grouped value set.

    ux issue 
    opened by joeflack4 10
  • New outputs: GRAVITY value sets

    Description

    Allow vsac_wrangler to fetch from the Lisa2 VSAC GRAVITY sheet in this GoogleSheets spreadsheet: https://docs.google.com/spreadsheets/d/1jzGrVELQz5L4B_-DqPflPIcpBaTfJOUTrVJT5nS_j18/edit#gid=1272275514

    Related

    #42

    update 
    opened by joeflack4 10
  • Get BIDS license for UMLS so we can share credentials for API access

    Oh, @DaveraGabriel, you're talking about VSAC access, not JHU OMOP access. We should move that to a different issue. I already made the mistake of confusing the two in this thread.

    Originally posted by @Sigfried in https://github.com/HOT-Ecosystem/ValueSet-Tools/issues/14#issuecomment-1011233079

    security 
    opened by Sigfried 6
  • new enclave_wrangler bug

    I'm getting a new error while trying to run enclave_wrangler for Lisa6, which has a single oid. I put the error message in the commit comment for https://github.com/HOT-Ecosystem/ValueSet-Tools/commit/fdfe3cfc0e47a2ba0e662797857e6ce7c4e8a213

    bug 
    opened by Sigfried 5
  • Performance: Browser caching & global state

    Overview

    We want front-end experience to be fast, and we don't want to make repeated, unnecessary requests.

    Options

    Can do one or all of the following

    • [ ] 1. Global state for a given user session
    • [ ] 2. Browser caching (for multiple user sessions on same machine)
    ux issue urgency:2/3 ease:2/3 
    opened by joeflack4 4
  • problem with checkboxes on rows in ComparisonDataTable

    Getting errors with codeset_id 419757429, which is currently included in the Example Comparison set of cset ids. Need to figure out why. This is occurring in both the develop branch and the db branch. I don't remember it being an issue before we switched to working on the db branch, so maybe the problem is arising from some change in the data rather than a problem with code. Or maybe if we go to an older commit on the development branch the problem will disappear.

    opened by Sigfried 3
  • Hierarchy

    Updates

        Backend
        - Finished off hierarchy portion of route cr-hierarchy
        - Added functions for building hierarchy to utils
        - Misc: Codestyle updates
    
    update 
    opened by joeflack4 3
  • Modify/create cset locally and upload to enclave

    User experience & code flow description

    Updating an existing cset

    1. Frontend: User clicks on header / button to edit concept set.
    2. Frontend: reacts to click
      ~a. Frontend: They get a page which shows: (i) list of concept names~ (can do this in the future)
      b. Frontend: A new column.
    3. Frontend: The label should say (draft) instead of (v#). The user can check/uncheck the concepts they want / don't want.
    4. Frontend: They click a button to 'commit changes'
      a. Will need their user RID in order to set the created_by fields for concept_set_container and code_set
      b. What other metadata do we need (for MVP)?
    5. Backend: Hit route (new route?)
    6. Backend: Persist changes (multiple files? which files?)
      a. Persist by writing to prepped files? yes...and maybe use git diff and patch
    7. Backend: Push to enclave?
    8. Frontend: User sees confirmation message that commit succeeded?

    Creating a new cset

    TODO

    Concerns

    1. How to handle production server updates to termhub-csets vs local development?
    • Maybe have them commit to different branches, e.g. if env variable is production, commit to main, else develop.
    new feature 
    opened by Sigfried 3
  • Need to expand concept subset in datasets.py

    Having problems with missing concepts and missing links between concepts... can explain later.

    What I think we should do is expand the subset of concepts in the prepped files to all of the concepts that appear in the concept_ancestor table in a row where either the ancestor_concept_id or the descendant_concept_id is included in the concept_set_members table. Right
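The proposed expansion can be illustrated with plain Python sets. This is a sketch only: the function name expand_concept_subset is hypothetical, concept_ancestor rows are assumed to be (ancestor_concept_id, descendant_concept_id) pairs, and the real datasets.py may work on tables or data frames instead.

```python
def expand_concept_subset(concept_ancestor_rows, member_concept_ids):
    """Collect every concept in a concept_ancestor row that touches a member concept."""
    members = set(member_concept_ids)
    expanded = set()
    for ancestor_id, descendant_id in concept_ancestor_rows:
        # Keep both ends of the row if either end is a concept set member.
        if ancestor_id in members or descendant_id in members:
            expanded.update((ancestor_id, descendant_id))
    return expanded


# Placeholder IDs for illustration only.
rows = [(1, 2), (2, 3), (4, 5)]
subset = expand_concept_subset(rows, member_concept_ids=[2])
print(sorted(subset))
```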

    opened by Sigfried 3
  • Datasets download updates

    Updates

    WIP:
    - Update: remove "Unnamed: x" columns from datasets
    - Update: move jupyter transform code over to enclave_wrangler
    - Update: datasets: filter concept sets that have no container
    
    opened by joeflack4 3
  • Add check for auth token validity

    Overview

    Check token validity early and often.

    Sub-tasks

    • [x] 1. If enclave_wrangler request fails, check if token is dead, and print err about that if so.
    • [x] 2. If token will expire soon (e.g. 2 weeks), print warning.
    • [ ] 3. Check on server start
    • [ ] 4. Optional: Check on schedule (e.g. daily) (harder)

    Additional details

    Here's how:

    ➜ curl -XGET https://unite.nih.gov/multipass/api/me -H "Authorization: Bearer $PALANTIR_ENCLAVE_AUTHENTICATION_BEARER_TOKEN"
    {
      "id": "6387db50-9f12-48d2-b7dc-e8e88fdf51e3",
      "username": "[email protected]",
      "attributes": {
        "multipass:organization": [
          "NIH"
        ],
        "multipass:email:primary": [
          "[email protected]"
        ],
        "multipass:organization-rid": [
          "ri.multipass..organization.73f45502-dee1-46e9-ab49-64a738b13971"
        ],
        "upn": [
          "[email protected]"
        ],
        "multipass:realm": [
          "nih-adfs"
        ],
        "multipass:realm-name": [
          "NIH Auth"
        ]
      }
    }
    
    ➜ curl -XGET https://unite.nih.gov/multipass/api/token/ttl -H "Authorization: Bearer $PALANTIR_ENCLAVE_AUTHENTICATION_BEARER_TOKEN"
    12654423
    

    That last number is time-to-live in seconds.
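A check along these lines could turn the time-to-live value into a warning. This is a sketch, not the project's implementation: the names fetch_ttl_seconds and ttl_status are hypothetical, the 14-day threshold follows the "e.g. 2 weeks" sub-task above, and the response body is assumed to be a bare integer as in the curl output shown.

```python
import urllib.request

TTL_URL = "https://unite.nih.gov/multipass/api/token/ttl"
SECONDS_PER_DAY = 24 * 60 * 60


def fetch_ttl_seconds(token):
    """Request the token's time-to-live; assumes the body is a bare integer."""
    request = urllib.request.Request(
        TTL_URL, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return int(response.read().decode().strip())


def ttl_status(ttl_seconds, warn_days=14):
    """Classify a time-to-live value as 'expired', 'expiring soon', or 'ok'."""
    if ttl_seconds <= 0:
        return "expired"
    if ttl_seconds / SECONDS_PER_DAY < warn_days:
        return "expiring soon"
    return "ok"


# Using the value from the curl output above (about 146 days).
print(ttl_status(12654423))
```

Keeping the classification separate from the network call makes it possible to run the same check after a failed request, at server start, or on a schedule.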

    I'm not sure where to run these checks. When the time is getting close, we need to ask Mariam Deacy to generate a new one for us.

    urgency:1/3 ease:3/3 
    opened by Sigfried 2
  • SAML Login

    Overview

    We would like to allow users to log in using NCATS unified authentication (example), or something like it.

    We met w/ the JHU cloud infra team today, and Patrick Le mentioned to us that Hopkins is part of some federated network. It uses "shibboleth" along w/ SAML. There is a different team that can help us work through this, but this would require some programming on our end.

    urgency:1/3 ease:1/3 
    opened by joeflack4 1
  • RDBMS setup

    Overview

    Set up RDBMS to serve data to the REST API, rather than loading datasets as global variables. Hopefully PostgreSQL. But given that the JHU hosting team cannot provide special services for that, we may consider another option.

    Subtask list

    • [x] 1. Complete local setup
    • [ ] 2. Setup documentation
    • [ ] 3. Complete setup on server, and update documentation if changes
    • [x] 4. Optimize schema: e.g. TEXT --> VARCHAR(n)
    • [x] 5. Load dataset data
    • [ ] 6. Create pipeline for loading/refreshing dataset data
    • [ ] #186
    • [ ] 8. Build functions or whatever to populate mySQL tables with data from N3C ontology API. (Could we just do dataset data once and then keep up-to-date using ontology API?)
    • [ ] #187
    urgency:2/3 
    opened by Sigfried 1
  • CRUD: (i) update: `concepts` on `concept_set_version`, (ii) create: `concept_set_version`

    Sub-tasks

    • [ ] 1. Backend (@joeflack4)
    • [ ] 2. Frontend (@sigfried)
    • [ ] 3. Allow for other users (@amin)

    Sub-task details

    1. Backend

    • [x] enclave_wrangler functions
    • [ ] Unit tests (still need completion w/ teardowns)
    • [x] app.py routes (Joe: I think they're done)

    2. Frontend

    Pending backend.

    3. Allow for other users

    Amin needs to change permissions so that other users, using their TOKENs, I believe, can make calls.

    new feature urgency:3/3 ease:3/3 
    opened by Sigfried 5
Owner
Health Open Terminology Ecosystem