Convert CSV files into a SQLite database

Overview

csvs-to-sqlite


Convert CSV files into a SQLite database. Browse and publish that SQLite database with Datasette.

Basic usage:

csvs-to-sqlite myfile.csv mydatabase.db

This will create a new SQLite database called mydatabase.db with a single table, myfile, containing the CSV content.
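
Conceptually the conversion is simple; here is a minimal stdlib sketch of the same idea (illustrative only - the tool itself uses pandas and infers column types, while this sketch makes every column TEXT):

```python
# Minimal sketch of the CSV-to-SQLite idea: one table per file, named
# after the file. Every column is TEXT here; the real tool infers types.
import csv
import sqlite3
from pathlib import Path

def csv_to_table(csv_path, conn):
    name = Path(csv_path).stem  # "myfile.csv" -> table "myfile"
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "{name}" ({cols})')
    placeholders = ", ".join("?" * len(header))
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', data)

Path("myfile.csv").write_text("county,votes\nClark,5\nYolo,7\n")
conn = sqlite3.connect(":memory:")
csv_to_table("myfile.csv", conn)
print(conn.execute('SELECT * FROM "myfile"').fetchall())
# [('Clark', '5'), ('Yolo', '7')]
```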

You can provide multiple CSV files:

csvs-to-sqlite one.csv two.csv bundle.db

The bundle.db database will contain two tables, one and two.

Because multiple files are accepted, shell wildcards work too:

csvs-to-sqlite ~/Downloads/*.csv my-downloads.db

If you pass a path to one or more directories, the script will recursively search those directories for CSV files and create tables for each one.

csvs-to-sqlite ~/path/to/directory all-my-csvs.db

Handling TSV (tab-separated values)

You can use the -s option to specify a different delimiter. If you want to use a tab character you'll need to apply shell escaping like so:

csvs-to-sqlite my-file.tsv my-file.db -s $'\t'

Refactoring columns into separate lookup tables

Let's say you have a CSV file that looks like this:

county,precinct,office,district,party,candidate,votes
Clark,1,President,,REP,John R. Kasich,5
Clark,2,President,,REP,John R. Kasich,0
Clark,3,President,,REP,John R. Kasich,7

(Real example taken from the Open Elections project)

You can now convert selected columns into separate lookup tables using the new --extract-column option (shortname: -c) - for example:

csvs-to-sqlite openelections-data-*/*.csv \
    -c county:County:name \
    -c precinct:Precinct:name \
    -c office -c district -c party -c candidate \
    openelections.db

The format is as follows:

column_name:optional_table_name:optional_table_value_column_name

If you just specify the column name e.g. -c office, the following table will be created:

CREATE TABLE "office" (
    "id" INTEGER PRIMARY KEY,
    "value" TEXT
);

If you specify all three options, e.g. -c precinct:Precinct:name the table will look like this:

CREATE TABLE "Precinct" (
    "id" INTEGER PRIMARY KEY,
    "name" TEXT
);

The original tables will be created like this:

CREATE TABLE "ca__primary__san_francisco__precinct" (
    "county" INTEGER,
    "precinct" INTEGER,
    "office" INTEGER,
    "district" INTEGER,
    "party" INTEGER,
    "candidate" INTEGER,
    "votes" INTEGER,
    FOREIGN KEY (county) REFERENCES County(id),
    FOREIGN KEY (party) REFERENCES party(id),
    FOREIGN KEY (precinct) REFERENCES Precinct(id),
    FOREIGN KEY (office) REFERENCES office(id),
    FOREIGN KEY (candidate) REFERENCES candidate(id)
);

They will be populated with IDs that reference the new derived tables.
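
The extraction is essentially a normalization pass. A hedged sketch of the idea (illustrative table and column names, not the tool's internals):

```python
# Sketch of the --extract-column idea: replace repeated text values with
# integer foreign keys into a lookup table. Names are illustrative.
import sqlite3

rows = [("Clark", "REP"), ("Clark", "DEM"), ("Yolo", "REP")]
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "county" ("id" INTEGER PRIMARY KEY, "value" TEXT)')
conn.execute(
    'CREATE TABLE "results" ("county" INTEGER, "party" TEXT, '
    "FOREIGN KEY (county) REFERENCES county(id))"
)

lookup = {}  # unique value -> id in the lookup table
for county, party in rows:
    if county not in lookup:
        cur = conn.execute('INSERT INTO "county" ("value") VALUES (?)', (county,))
        lookup[county] = cur.lastrowid
    conn.execute('INSERT INTO "results" VALUES (?, ?)', (lookup[county], party))

print(conn.execute("SELECT * FROM results").fetchall())
# [(1, 'REP'), (1, 'DEM'), (2, 'REP')]
```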

Installation

$ pip install csvs-to-sqlite

csvs-to-sqlite now requires Python 3. If you are running Python 2 you can install the last version to support Python 2:

$ pip install csvs-to-sqlite==0.9.2

csvs-to-sqlite --help

Usage: csvs-to-sqlite [OPTIONS] PATHS... DBNAME

  PATHS: paths to individual .csv files or to directories containing .csvs

  DBNAME: name of the SQLite database file to create

Options:
  -s, --separator TEXT            Field separator in input .csv
  -q, --quoting INTEGER           Control field quoting behavior per csv.QUOTE_*
                                  constants. Use one of QUOTE_MINIMAL (0),
                                  QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or
                                  QUOTE_NONE (3).

  --skip-errors                   Skip lines with too many fields instead of
                                  stopping the import

  --replace-tables                Replace tables if they already exist
  -t, --table TEXT                Table to use (instead of using CSV filename)
  -c, --extract-column TEXT       One or more columns to 'extract' into a
                                  separate lookup table. If you pass a simple
                                  column name that column will be replaced with
                                  integer foreign key references to a new table
                                  of that name. You can customize the name of
                                  the table like so:     state:States:state_name
                                  
                                  This will pull unique values from the 'state'
                                  column and use them to populate a new 'States'
                                  table, with an id column primary key and a
                                  state_name column containing the strings from
                                  the original column.

  -d, --date TEXT                 One or more columns to parse into ISO
                                  formatted dates

  -dt, --datetime TEXT            One or more columns to parse into ISO
                                  formatted datetimes

  -df, --datetime-format TEXT     One or more custom date format strings to try
                                  when parsing dates/datetimes

  -pk, --primary-key TEXT         One or more columns to use as the primary key
  -f, --fts TEXT                  One or more columns to use to populate a full-
                                  text index

  -i, --index TEXT                Add index on this column (or a compound index
                                  with -i col1,col2)

  --shape TEXT                    Custom shape for the DB table - format is
                                  csvcol:dbcol(TYPE),...

  --filename-column TEXT          Add a column with this name and populate with
                                  CSV file name

  --fixed-column <TEXT TEXT>...   Populate column with a fixed string
  --fixed-column-int <TEXT INTEGER>...
                                  Populate column with a fixed integer
  --fixed-column-float <TEXT FLOAT>...
                                  Populate column with a fixed float
  --no-index-fks                  Skip adding index to foreign key columns
                                  created using --extract-column (default is to
                                  add them)

  --no-fulltext-fks               Skip adding full-text index on values
                                  extracted using --extract-column (default is
                                  to add them)

  --just-strings                  Import all columns as text strings by default
                                  (and, if specified, still obey --shape,
                                  --date/datetime, and --datetime-format)

  --version                       Show the version and exit.
  --help                          Show this message and exit.

Comments
  • No create index option - please create new release

    Hi Simon,

    Love this wonderful tool! Thanks a million for making it and promoting sqlite!

    The README shows an option for creating an index: https://github.com/simonw/csvs-to-sqlite/blob/master/README.md

    -i, --index TEXT Add index on this column (or a compound index with -i col1,col2)

    I don't seem to have that version in my csvs-to-sqlite:

    % csvs-to-sqlite --help
    Usage: csvs-to-sqlite [OPTIONS] PATHS... DBNAME
    
      PATHS: paths to individual .csv files or to directories containing .csvs
    
      DBNAME: name of the SQLite database file to create
    
    Options:
      -s, --separator TEXT       Field separator in input .csv
      --replace-tables           Replace tables if they already exist
      -c, --extract-column TEXT  One or more columns to 'extract' into a separate
                                 lookup table. If you pass a simple column name
                                 that column will be replaced with integer foreign
                                 key references to a new table of that name. You
                                 can customize the name of the table like so:
    
                                     --extract-column state:States:state_name
    
                                 This will pull unique values from the 'state'
                                 column and use them to populate a new 'States'
                                 table, with an id column primary key and a
                                 state_name column containing the strings from the
                                 original column.
      -f, --fts TEXT             One or more columns to use to populate a full-
                                 text index
      --version                  Show the version and exit.
      --help                     Show this message and exit.
    
    % csvs-to-sqlite --version
    csvs-to-sqlite, version 0.7
    

    Is this an option in an unreleased build of csvs-to-sqlite? If so, when do you think it would be ready?

    Thanks!

    opened by jungle-boogie 7
  • Add support for loading CSVs directly from URLs

    Closes #4

    This adds a new parameter type to support paths or URLs on the CLI.

    Something like csvs-to-sqlite https://data.stadt-zuerich.ch/dataset/wirtschaft_preise_ziw_basis2010/resource/54c7ae15-f673-41af-a7cd-b861b8c6744e/download/ziwbasis2010apr2010apr2016.csv blah.db now works.

    opened by betatim 6
  • Ability to merge multiple CSV files into a single table (and store CSV filename in a column)

    If you feed the tool a number of CSVs there should be an option to have them all go into the same database table, optionally with a column that gets populated with the name of the CSV file:

    csvs-to-sqlite ~/openelections-data/ --table=combined_table --filename-column=source
    
    opened by simonw 5
  • Allow appending files to an existing SQLite database

    Tool currently quits with an error if you try this.

    Should it be possible to update existing tables with new data? Not sure about this. We certainly can’t handle schema changes. We would need to be told a column to treat as a “primary key”.

    Given how cheap it is to recreate the database from scratch I’m inclined to say it’s not worth bothering with table updates.

    opened by simonw 5
  • add "fixed-column" option

    For colname:value, add a column colname and populate with value.

    The use case here is that I have a snakemake workflow that produces a lot of csv files with a structure like samples/<sample_id>/<condition_name>/this.csv. I want to import them all into the same table, where sample_id and condition_name are columns. I tried to match how --filename-column works, but allow for multiple new columns to be added.

    I haven't yet added any tests, but if you think this option would be a useful addition, I will add some tests to cover the basic use case and the combination with shape.

    opened by williamrowell 4
  • Interpret all columns as TEXT data type

    Is it possible to interpret all columns as TEXT datatype (through a flag maybe)? I think the column values are sampled and then the column datatype is guessed. If there is an incompatible value in some row, then that row seems to be skipped. So instead, is it possible to load everything as TEXT data type? I just need the data in some format (for data comparison purposes), but all csv data must go into the table without skipping rows.

    Thanks Thyag

    opened by tsundara 4
  • csvs-to-sqlite has a hard requirement on package versions.

    Currently version required is hardcoded:

    https://github.com/simonw/csvs-to-sqlite/blob/dccbf65b37bc9eed50e9edb80a42f257e93edb1f/setup.py#L22-L28

    The latest Pandas version is 0.21.0, this makes csvs-to-sqlite unusable with it by default and unusable with any package that requires the latest pandas.

    Is there a reason why a specific version is enforced and not "0.20.3 or later"?

    opened by mratsim 4
  • ".\" added to table name in SQLITE database file

    Hello Simon, thanks a lot for your great tool and sharing.

    I am using it to convert a bunch of csv files in a folder. I get a .db file as output; however my table names are marked with "." characters, which prevents me from performing further sqlite queries. Could you please advise me how to avoid this issue? Below are my cmd input and output:

        C:~Documents\Profesional\2018-02-15>csvs-to-sqlite C:~Documents\Profesional\2018-02-15 output.db
        extract_columns=()
        Loaded 4 dataframes
        c:\users\joel0\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\generic.py:1362: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
          chunksize=chunksize, dtype=dtype)
        Created output.db from 4 CSV files

    thanks in advance

    opened by joel0977 4
  • --fts list-of-columns option

    Creates a FTS5 table populated with the content of the specified columns: https://sqlite.org/fts5.html (falls back on FTS4 if FTS5 is not supported)

    Would make it trivial to add fulltext search to data from a CSV file.

    opened by simonw 4
  • Columns are typed "REAL" if they are integers with some NaN/blanks

    This is bad. If a column has all integers and some blanks it should result in an INTEGER.

    Example: this CSV https://github.com/openelections/openelections-data-ca/blob/master/2016/20161108__ca__general__yolo__precinct.csv produces this SQL table:

    CREATE TABLE "2016/20161108__ca__general__yolo__precinct" (
        "county" TEXT,
        "precinct" INTEGER,
        "office" INTEGER,
        "district" REAL,
        "party" REAL,
        "candidate" INTEGER,
        "votes" INTEGER,
        FOREIGN KEY (county) REFERENCES county(id),
        FOREIGN KEY (party) REFERENCES party(id),
        FOREIGN KEY (precinct) REFERENCES precinct(id),
        FOREIGN KEY (office) REFERENCES office(id),
        FOREIGN KEY (candidate) REFERENCES candidate(id)
    );
    
    opened by simonw 4
  • --shape option for specifying the "shape" of the resulting table

    This option will allow you to tell the command exactly what columns should be created in the new table and what their types should be.

    For example:

    csvs-to-sqlite votes.csv votes.db --shape "county:Cty,votes:Vts(REAL)"
    

    This will produce a table with just two columns: Cty and Vts. Those columns will correspond to the county and votes columns in the original CSV.

    The Cty column will use the default type detected by pandas - but the Vts column will be forced to be a REAL column instead.
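
The shape string itself is easy to parse; here is an illustrative parser for the csvcol:dbcol(TYPE) syntax described above (a sketch, not the project's own code):

```python
# Hypothetical parser for the --shape format "csvcol:dbcol(TYPE),...".
# Returns (csv_column, db_column, type_or_None) triples.
import re

def parse_shape(shape):
    specs = []
    for part in shape.split(","):
        m = re.match(r"^([^:]+):([^(]+)(?:\(([A-Z]+)\))?$", part)
        csvcol, dbcol, coltype = m.groups()
        specs.append((csvcol, dbcol, coltype))  # coltype is None if omitted
    return specs

print(parse_shape("county:Cty,votes:Vts(REAL)"))
# [('county', 'Cty', None), ('votes', 'Vts', 'REAL')]
```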

    opened by simonw 3
  • Column `filing_id` not found despite its manifest existence

    I am trying to load 18 CSV files from CorpWatch into a SQLite database using csvs-to-sqlite. I will spare you the deprecation warnings for error_bad_lines.

    michael$ time /Users/michael/Library/Python/3.8/bin/csvs-to-sqlite ~/Downloads/corpwatch_api_tables_csv ~/Temp/corpwatch.db -s $'\t'
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (8,10) have mixed types. Specify dtype option on import or set low_memory=False.
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (17) have mixed types. Specify dtype option on import or set low_memory=False.
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (5,6,7,8,9) have mixed types. Specify dtype option on import or set low_memory=False.
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (3,6,7) have mixed types. Specify dtype option on import or set low_memory=False.
      return pd.read_csv(
    Loaded 18 dataframes
    Traceback (most recent call last):
      File "/Users/michael/Library/Python/3.8/bin/csvs-to-sqlite", line 8, in <module>
        sys.exit(cli())
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/cli.py", line 251, in cli
        df.to_sql(df.table_name, conn, if_exists="append", index=False)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/core/generic.py", line 2951, in to_sql
        return sql.to_sql(
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 697, in to_sql
        return pandas_sql.to_sql(
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 2190, in to_sql
        return table.insert(chunksize, method)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 950, in insert
        num_inserted = exec_insert(conn, keys, chunk_iter)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 1896, in _execute_insert
        conn.executemany(self.insert_statement(num_rows=1), data_list)
    sqlite3.OperationalError: table ./company_filings has no column named filing_id
    
    real	1m30.052s
    user	1m12.763s
    sys	0m16.640s
    

    So the complaint is about company_filings.csv not having a filing_id. Yet:

    michael$ head ~/Downloads/corpwatch_api_tables_csv/company_filings.csv
    filing_id cik year quarter period_of_report filing_date form_10k_url sec_21_url
    47 1000180 2008 1 20071230 2008-02-25 http://www.sec.gov/Archives/edgar/data/1000180/0000950134-08-003259.txt http://www.sec.gov/Archives/edgar/data/1000180/000095013408003259/f38194exv21w1.htm
    104 1000209 2008 1 20071231 2008-03-13 http://www.sec.gov/Archives/edgar/data/1000209/0001193125-08-055644.txt http://www.sec.gov/Archives/edgar/data/1000209/000119312508055644/dex211.htm
    121 1000228 2008 1 20071229 2008-02-26 http://www.sec.gov/Archives/edgar/data/1000228/0000950123-08-002119.txt http://www.sec.gov/Archives/edgar/data/1000228/000095012308002119/y50229exv21w1.htm
    152 1000229 2008 1 0 2008-02-22 http://www.sec.gov/Archives/edgar/data/1000229/0001000229-08-000005.txt NULL
    174 1000230 2008 1 0 2008-01-29 http://www.sec.gov/Archives/edgar/data/1000230/0001193125-08-014289.txt NULL
    186 1000232 2008 1 20071231 2008-03-31 http://www.sec.gov/Archives/edgar/data/1000232/0001000232-08-000005.txt http://www.sec.gov/Archives/edgar/data/1000232/000100023208000005/exhibit21subsidiaries.txt
    213 1000234 2008 1 20070924 2008-03-17 http://www.sec.gov/Archives/edgar/data/1000234/0001000234-08-000003.txt http://www.sec.gov/Archives/edgar/data/1000234/000100023408000003/ex21.htm
    214 1000234 2008 1 20071231 2008-03-17 http://www.sec.gov/Archives/edgar/data/1000234/0001000234-08-000004.txt http://www.sec.gov/Archives/edgar/data/1000234/000100023408000004/ex21.htm
    462 1000623 2008 1 20071231 2008-03-07 http://www.sec.gov/Archives/edgar/data/1000623/0001047469-08-002365.txt http://www.sec.gov/Archives/edgar/data/1000623/000104746908002365/a2183428zex-21.htm

    
    Why is this column not being found?
    opened by MichaelTiemannOSC 1
  • low_memory=False not a valid option

    Hi, csvs-to-sqlite suggests to use low_memory=False, but there is no way to use such option from the command line, and it is not mentioned in the documentation.

    /home/user/.local/pipx/venvs/csvs-to-sqlite/lib/python3.8/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (14) have mixed types. Specify dtype option on import or set low_memory=False.
      return pd.read_csv(
    

    https://github.com/simonw/csvs-to-sqlite/blob/a8a37a016790dc93270c74e32d0a5051bc5a0f4d/csvs_to_sqlite/utils.py#L43

    opened by frafra 0
  • error_bad_lines argument has been deprecated

    I get a deprecation warning on the latest stable.

    csvs_to_sqlite/utils.py:38: FutureWarning: The error_bad_lines argument has been deprecated and will be removed in a future version. Use on_bad_lines in the future.
    
    opened by frafra 0
  • option to lower case column names, replace space with underscore, and remove reserved characters from column names

    it's generally easier if the column names of a sqlite table are lower case, have no spaces, and have no reserved characters. It would be nice if there was an option in this tool to transform the column names accordingly.
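
A sketch of the cleanup this issue asks for might look like the following (hypothetical helper, not part of the tool):

```python
# Illustrative column-name cleanup: lowercase, spaces to underscores,
# and strip characters outside [a-z0-9_].
import re

def clean_column(name):
    name = name.strip().lower().replace(" ", "_")
    return re.sub(r"[^a-z0-9_]", "", name)

print(clean_column("Total Votes (2016)"))  # total_votes_2016
```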

    opened by fgregg 0
  • Optionally maintain checksums of CSV files for faster updates

    Wanted to see if there is interest in a patch that helps speed up our workflows significantly, or if there are any further ideas for improving on such a feature. If this is out of scope for this project, I'm happy to continue maintaining my fork of this project.

    Use Case

    We currently maintain a folder of >200 CSV files with a total of a few hundred megabytes, and have a CI step that builds these CSVs into a sqlite database. These CSV files get updated 2-3 times a day, but only small changes are made to them. Currently, running csvs-to-sqlite with the --replace-tables flag takes roughly 6-7 minutes, which is too long for our use case.

    Solution

    Add a --update-tables flag that maintains a checksum hash of each CSV file in a table called .csvs-meta (happy to change this or make it configurable), and only reads the csv and loads the dataframe if the checksum has changed.
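
The checksum idea can be sketched with hashlib (illustrative, not the patch's actual code):

```python
# Sketch of the proposed --update-tables idea: store a checksum per CSV
# and skip files whose hash is unchanged since the last build.
import hashlib
from pathlib import Path

def file_checksum(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

Path("data.csv").write_text("a,b\n1,2\n")
seen = {"data.csv": file_checksum("data.csv")}  # stands in for the meta table

# Unchanged file: checksum matches, so the CSV would be skipped.
print(file_checksum("data.csv") == seen["data.csv"])  # True

# After an edit the checksum differs, so the table would be rebuilt.
Path("data.csv").write_text("a,b\n1,3\n")
print(file_checksum("data.csv") == seen["data.csv"])  # False
```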

    Forked Version Here

    opened by dkaoster 0
Releases (1.3)
  • 1.3(Nov 18, 2021)

    • New options for populating a fixed column - so every inserted row will have the same column with the same value. Options are --fixed-column column-name string-value, --fixed-column-int column-name integer-value and --fixed-column-float column-name float-value. Thanks, William Rowell. #81
    Source code(tar.gz)
    Source code(zip)
  • 1.2(Nov 3, 2020)

  • 1.1(Aug 9, 2020)

  • 1.0(Aug 3, 2019)

  • 0.9.2(Jul 3, 2019)

  • 0.9.1(Jun 24, 2019)

  • 0.9(Jan 17, 2019)

    • Support for loading CSVs directly from URLs, thanks @betatim - #38
    • New -pk/--primary-key options, closes #22
    • Create FTS index for extracted column values
    • Added --no-fulltext-fks option, closes #32
    • Now using black for code formatting
    • Bumped versions of dependencies
    Source code(tar.gz)
    Source code(zip)
  • 0.8(Apr 24, 2018)

    • -d and -df options for specifying date/datetime columns, closes #33

    • Maintain lookup tables in SQLite, refs #17

    • --index option to specify which columns to index, closes #24

    • Test confirming --shape and --filename-column and -c work together #25

    • Use usecols when loading CSV if shape specified

    • --filename-column is now compatible with --shape, closes #10

    • --no-index-fks option

      By default, csvs-to-sqlite creates an index for every foreign key column that is added using the --extract-column option.

      For large tables, this can dramatically increase the size of the resulting database file on disk. The new --no-index-fks option allows you to disable this feature to save on file size.

      Refs #24 which will allow you to explicitly list which columns SHOULD have an index created.

    • Added --filename-column option, refs #10

    • Fixes for Python 2, refs #25

    • Implemented new --shape option - refs #25

    • --table option for specifying table to write to, refs #10

    • Updated README to cover --skip-errors, refs #20

    • Add --skip-errors option (#20) [Jani Monoses]

    • Less verbosity (#19) [Jani Monoses]

      Only log extract_columns info when that option is passed.

    • Add option for field quoting behaviour (#15) [Jani Monoses]

    Source code(tar.gz)
    Source code(zip)
  • 0.7(Nov 26, 2017)

  • 0.6.1(Nov 25, 2017)

  • 0.6(Nov 24, 2017)

    SQLite full-text search support

    • Added --fts option for setting up SQLite full-text search.

      The --fts option will create a corresponding SQLite FTS virtual table, using the best available version of the FTS module.

      https://sqlite.org/fts5.html https://www.sqlite.org/fts3.html

      Usage:

      csvs-to-sqlite my-csv.csv output.db -f column1 -f column2
      

      Example generated with this option: https://sf-trees-search.now.sh/

      Example search: https://sf-trees-search.now.sh/sf-trees-search-a899b92?sql=select+*+from+Street_Tree_List+where+rowid+in+%28select+rowid+from+Street_Tree_List_fts+where+Street_Tree_List_fts+match+%27grove+london+dpw%27%29%0D%0A

      Will be used in https://github.com/simonw/datasette/issues/131

    • --fts and --extract-column now cooperate.

      If you extract a column and then specify that same column in the --fts list, csvs-to-sqlite now uses the original value of that column in the index.

      Example using CSV from https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq

      csvs-to-sqlite Street_Tree_List.csv trees-fts.db \
          -c qLegalStatus -c qSpecies -c qSiteInfo \
          -c PlantType -c qCaretaker -c qCareAssistant \
          -f qLegalStatus -f qSpecies -f qAddress \
          -f qSiteInfo -f PlantType -f qCaretaker \
          -f qCareAssistant -f PermitNotes
      

      Closes #9

    • Handle column names with spaces in them.

    • Added csvs-to-sqlite --version option.

      Using http://click.pocoo.org/5/api/#click.version_option

    Source code(tar.gz)
    Source code(zip)
  • 0.5(Nov 19, 2017)

    Now handles columns with integers and nulls correctly

    Pandas does a good job of figuring out which SQLite column types should be used for a DataFrame - with one exception: due to a limitation of NumPy it treats columns containing a mixture of integers and NaN (blank values) as being of type float64, which means they end up as REAL columns in SQLite.

    http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na

    To fix this, we now check to see if a float64 column actually consists solely of NaN and integer-valued floats (checked using v.is_integer() in Python). If that is the case, we override the column type to be INTEGER instead.
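
The check can be sketched in pure Python (a simplification of the pandas-based logic described above):

```python
# Simplified version of the described check: treat a float column as
# INTEGER when every value is NaN or an integer-valued float.
import math

def should_be_integer(values):
    return all(math.isnan(v) or v.is_integer() for v in values)

print(should_be_integer([1.0, 2.0, float("nan")]))  # True
print(should_be_integer([1.5, 2.0]))                # False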

    See #5 - also a8ab524 and 0997b7b

    Source code(tar.gz)
    Source code(zip)
  • 0.3(Nov 17, 2017)

    • Mechanism for converting columns into separate tables

      Let's say you have a CSV file that looks like this:

      county,precinct,office,district,party,candidate,votes
      Clark,1,President,,REP,John R. Kasich,5
      Clark,2,President,,REP,John R. Kasich,0
      Clark,3,President,,REP,John R. Kasich,7
      

      (Real example from https://github.com/openelections/openelections-data-sd/blob/master/2016/20160607__sd__primary__clark__precinct.csv )

      You can now convert selected columns into separate lookup tables using the new --extract-column option (shortname: -c) - for example:

      csvs-to-sqlite openelections-data-*/*.csv \
          -c county:County:name \
          -c precinct:Precinct:name \
          -c office -c district -c party -c candidate \
          openelections.db
      

      The format is as follows:

      column_name:optional_table_name:optional_table_value_column_name
      

      If you just specify the column name e.g. -c office, the following table will be created:

      CREATE TABLE "office" (
          "id" INTEGER PRIMARY KEY,
          "value" TEXT
      );
      

      If you specify all three options, e.g. -c precinct:Precinct:name the table will look like this:

      CREATE TABLE "Precinct" (
          "id" INTEGER PRIMARY KEY,
          "name" TEXT
      );
      

      The original tables will be created like this:

      CREATE TABLE "ca__primary__san_francisco__precinct" (
          "county" INTEGER,
          "precinct" INTEGER,
          "office" INTEGER,
          "district" INTEGER,
          "party" INTEGER,
          "candidate" INTEGER,
          "votes" INTEGER,
          FOREIGN KEY (county) REFERENCES County(id),
          FOREIGN KEY (party) REFERENCES party(id),
          FOREIGN KEY (precinct) REFERENCES Precinct(id),
          FOREIGN KEY (office) REFERENCES office(id),
          FOREIGN KEY (candidate) REFERENCES candidate(id)
      );
      

      They will be populated with IDs that reference the new derived tables.

      Closes #2

    Source code(tar.gz)
    Source code(zip)
Owner
Simon Willison