LightCSV - This CSV reader is implemented in just pure Python.

Related tags

lightcsv
Overview

LightCSV

Python 3.8 Python 3.9 Code style: black

Simple light CSV reader

This CSV reader is implemented in just pure Python. It allows to specify a separator, a quote char and column titles (or get the first row as titles). Nothing more, nothing else.

Usage

Usage is pretty straightforward:

from lightcsv import LightCSV

for row in LightCSV().read_file("myfile.csv"):
    print(row)

This will open a file named myfile.csv and iterate over the CSV file returning each row as a key-value dictionary. Line endings can be either \n or \r\n. The file will be opened in text-mode with utf-8 encoding.

You can supply your own stream (i.e. an open file instead of a filename). You can use this, for example, to open a file with a different encoding, etc.:

from lightcsv import LightCSV

with open("myfile.csv") as f:
    for row in LightCSV().read(f):
        print(row)
NOTE: Blank lines at any point in the file will be ignored

Parameters

LightCSV can be parametrized during initialization to fine-tune its behaviour.

The following example shows initialization with default parameters:

from lightcsv import LightCSV

myCSV_reader = LightCSV(
    separator=",",
    quote_char='"',
    field_names = None,
    strict=True,
    has_headers=False
)

Available settings:

  • separator: character used as separator (defaults to ,)
  • quote_char: character used to quote strings (defaults to ").
    This char can be escaped by duplicating it.
  • field_names: can be any iterable or sequence of str (i.e. a list of strings).
    If set, these will be used as column titles (dictionary keys), and also sets the expected number of columns.
  • strict: Sets whether the parser runs in strict mode or not.
    In strict mode the parser will raise a ValueError exception if a cell cannot be decoded or column numbers don't match. In non-strict mode non-recognized cells will be returned as strings. If there are more columns than expected they will be ignored. If there are less, the dictionary will contain also fewer values.
  • has_headers: whether the first row should be taken as column titles or not.
    If set, field_names cannot be specified. If not set, and no field names are specified, dictionary keys will be just the column positions of the cells.

Data types recognized

The parser will try to match the following types are recognized in this order:

  • None (empty values). Unlike CSV reader, it will return None (null) for empty values.
    Empty strings ("") are recognized correctly.
  • str (strings): Anything that is quoted with the quotechar. Default quotechar is ".
    If the string contains a quote, it must be escaped duplicating it. i.e. "HELLO ""WORLD""" decodes to HELLO "WORLD" string.
  • int (integers): an integer with a preceding optional sign.
  • float: any float recognized by Python
  • datetime: a datetime in ISO format (with 'T' or whitespace in the middle), like 2022-02-02 22:02:02
  • date: a date in ISO format, like 2022-02-02
  • time: a time in ISO format, like 22:02:02

If all this parsing attempts fails, a string will be returned, unless strict_mode is set to True. In the latter case, a ValueError exception will be raised.

Implementing your own type recognizer

You can implement your own deserialization by subclassing LightCSV and override the method parse_obj().

For example, suppose we want to recognize hexadecimal integers in the format 0xNNN.... We can implement it this way:

import re
from lightcsv import LightCSV

RE_HEXA = re.compile('0[xX][A-Za-z0-9]+$')  # matches 0xNNNN (hexadecimals)


class CSVHexRecognizer(LightCSV):
    def parse_obj(self, lineno: int, chunk: str):
        if RE_HEXA.match(chunk):
            return int(chunk[2:], 16)
        
        return super().parse_obj(lineno, chunk)

As you can see, you have to override parse_obj(). If your match fails, you have to invoke super() (overridden) parse_obj() method and return its result.


Why

Python built-in CSV module is a bit over-engineered for simple tasks, and one normally doesn't need all bells and whistles. With LightCSV you just open a filename and iterate over its rows.

Decoding None for empty cells is needed very often and can be really cumbersome as the standard csv tries hard to cover many corner-cases (if that's your case, this tool might not be suitable for you).

Issues
Owner
Jose Rodriguez
Computer Scientist. Software Engineer. Opinions expressed here are solely my own and not necessarily those of my employer.
Jose Rodriguez
This python project contains a class FileProcessor which allows one to grab a file and get some meta data and header information from it

This python project contains a class FileProcessor which allows one to grab a file and get some meta data and header information from it. In the current state, it outputs a PrettyTable to txt file as well as the raw data from that table into a csv.

Joshua Wren 1 Oct 20, 2021
pydicom - Read, modify and write DICOM files with python code

pydicom is a pure Python package for working with DICOM files. It lets you read, modify and write DICOM data in an easy "pythonic" way.

DICOM in Python 1.3k Oct 17, 2021
Uncompress DEFLATE streams in pure Python

stream-inflate Uncompress DEFLATE streams in pure Python. Installation pip install stream-inflate Usage from stream_inflate import stream_inflate impo

Michal Charemza 2 Oct 19, 2021
An object-oriented approach to Python file/directory operations.

Unipath An object-oriented approach to file/directory operations Version: 1.1 Home page: https://github.com/mikeorr/Unipath Docs: https://github.com/m

Mike Orr 497 Oct 10, 2021
Organizer is a python program that organizes your downloads folder

Organizer Organizer is a python program that organizes your downloads folder, it can run as a service and so will start along with the system, and the

Gustavo 2 Oct 18, 2021
Better directory iterator and faster os.walk(), now in the Python 3.5 stdlib

scandir, a better directory iterator and faster os.walk() scandir() is a directory iteration function like os.listdir(), except that instead of return

Ben Hoyt 486 Oct 22, 2021
Python file organizer application

Python file organizer application

Pak Maneth 1 Oct 11, 2021
gitfs is a FUSE file system that fully integrates with git - Version controlled file system

gitfs is a FUSE file system that fully integrates with git. You can mount a remote repository's branch locally, and any subsequent changes made to the files will be automatically committed to the remote.

Presslabs 2.2k Oct 22, 2021
RMfuse provides access to your reMarkable Cloud files in the form of a FUSE filesystem

RMfuse provides access to your reMarkable Cloud files in the form of a FUSE filesystem. These files are exposed either in their original format, or as PDF files that contain your annotations. This lets you manage files in the reMarkable Cloud using the same tools you use on your local system.

Robert Schroll 54 Oct 1, 2021
Creates folders into a directory to categorize files in that directory by file extensions and move all things from sub-directories to current directory.

Categorize and Uncategorize Your Folders Table of Content TL;DR just take me to how to install. What are Extension Categorizer and Folder Dumper Insta

Furkan Baytekin 1 Oct 17, 2021
Python library and shell utilities to monitor filesystem events.

Watchdog Python API and shell utilities to monitor file system events. Works on 3.6+. If you want to use Python 2.6, you should stick with watchdog <

Yesudeep Mangalapilly 5k Oct 19, 2021
A python wrapper for libmagic

python-magic python-magic is a Python interface to the libmagic file type identification library. libmagic identifies file types by checking their hea

Adam Hupp 2k Oct 23, 2021
organize - The file management automation tool

organize - The file management automation tool

Thomas Feldmann 1.1k Oct 22, 2021
Nintendo Game Boy music assembly files parser into musicxml format

GBMusicParser Nintendo Game Boy music assembly files parser into musicxml format This python code will get an file.asm from the disassembly of a Game

null 1 Oct 16, 2021
A platform independent file lock for Python

py-filelock This package contains a single module, which implements a platform independent file lock in Python, which provides a simple way of inter-p

Benedikt Schmitt 340 Oct 22, 2021
Annotate your Python requirements.txt file with summaries of each package.

Summarize Requirements ?? ?? Annotate your Python requirements.txt file with a short summary of each package. This tool: takes a Python requirements.t

Zeke Sikelianos 6 Oct 15, 2021
Python's Filesystem abstraction layer

PyFilesystem2 Python's Filesystem abstraction layer. Documentation Wiki API Documentation GitHub Repository Blog Introduction Think of PyFilesystem's

pyFilesystem 1.5k Oct 17, 2021
CredSweeper is a tool to detect credentials in any directories or files.

CredSweeper is a tool to detect credentials in any directories or files. CredSweeper could help users to detect unwanted exposure of credentials (such as personal information, token, passwords, api keys and etc) in advance. By scanning lines, filtering, and using AI model as option, CredSweeper reports lines with possible credentials, where the line is, and expected type of the credential as a result.

Samsung 4 Oct 22, 2021
File support for asyncio

aiofiles: file support for asyncio aiofiles is an Apache2 licensed library, written in Python, for handling local disk files in asyncio applications.

Tin Tvrtković 1.7k Oct 22, 2021