Simple light CSV reader
This CSV reader is implemented in pure Python. It lets you specify a separator, a quote char and column titles (or take the first row as titles). Nothing more, nothing less.
Usage is pretty straightforward:
    from lightcsv import LightCSV

    for row in LightCSV().read_file("myfile.csv"):
        print(row)
This will open a file named myfile.csv and iterate over it, returning each row as a key-value dictionary. Line endings can be either \n or \r\n. The file will be opened in text mode.
You can supply your own stream (i.e. an open file instead of a filename). This is useful, for example, to open a file with a different encoding:
    from lightcsv import LightCSV

    with open("myfile.csv") as f:
        for row in LightCSV().read(f):
            print(row)
NOTE: Blank lines at any point in the file will be ignored.
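The line handling described here (both line endings accepted, blank lines skipped) can be pictured with the short sketch below. It is an illustration written for this README, not LightCSV's actual code: iter_data_lines is a hypothetical name, and the sketch does not deal with quoted fields that span several lines.

```python
import io


def iter_data_lines(stream):
    """Yield (line_number, text) for every non-blank line of a text stream.

    Hypothetical helper: strips a trailing "\n" or "\r\n" and skips blank lines.
    """
    for lineno, raw in enumerate(stream, start=1):
        line = raw.rstrip("\r\n")  # accept either line ending
        if not line.strip():
            continue  # blank lines are ignored wherever they appear
        yield lineno, line


sample = io.StringIO("a,b\r\n\r\n1,2\n\n3,4")
for lineno, line in iter_data_lines(sample):
    print(lineno, line)
```

Line numbers keep counting the skipped lines, which is convenient when an error message has to point back at the original file.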
LightCSV can be parametrized during initialization to fine-tune its behaviour.
The following example shows initialization with default parameters:
    from lightcsv import LightCSV

    myCSV_reader = LightCSV(
        separator=",",
        quote_char='"',
        field_names=None,
        strict=True,
        has_headers=False,
    )
separator: character used as separator (defaults to ,).
quote_char: character used to quote strings (defaults to "). This char can be escaped by duplicating it.
field_names: can be any iterable or sequence of str (i.e. a list of strings). If set, these will be used as column titles (dictionary keys), and they also set the expected number of columns.
strict: sets whether the parser runs in strict mode or not. In strict mode the parser will raise a ValueError exception if a cell cannot be decoded or column numbers don't match. In non-strict mode, unrecognized cells will be returned as strings. If there are more columns than expected they will be ignored. If there are fewer, the dictionary will also contain fewer values.
has_headers: whether the first row should be taken as column titles or not. If set, field_names cannot be specified. If not set, and no field names are specified, dictionary keys will be just the column positions of the cells.
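To make the parameters more concrete, here is a minimal sketch of what they describe: splitting a line on the separator while honouring the quote char (with doubled quotes as escapes), and then mapping the cells to dictionary keys. It is not LightCSV's implementation. split_line and row_to_dict are hypothetical names, and using 0-based column positions as keys is an assumption made for the example.

```python
def split_line(line, separator=",", quote_char='"'):
    """Split one line into cells; a doubled quote char inside quotes is a literal."""
    cells, current, in_quotes = [], [], False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quote_char:
                if line[i + 1:i + 2] == quote_char:
                    current.append(quote_char)  # escaped quote
                    i += 1                      # skip the second quote
                else:
                    in_quotes = False           # closing quote
            else:
                current.append(ch)
        elif ch == quote_char:
            in_quotes = True
        elif ch == separator:
            cells.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    cells.append("".join(current))
    return cells


def row_to_dict(cells, field_names=None, strict=True):
    """Map cells to keys: given names, or column positions (assumed 0-based)."""
    if field_names is None:
        return dict(enumerate(cells))
    names = list(field_names)
    if strict and len(cells) != len(names):
        raise ValueError(f"expected {len(names)} columns, got {len(cells)}")
    # zip() stops at the shorter input: extra cells are dropped,
    # missing cells simply produce fewer keys.
    return dict(zip(names, cells))


cells = split_line('1,"b,c","say ""hi"""')
print(row_to_dict(cells, field_names=["id", "text", "quote"]))
```

Relying on zip() keeps the non-strict behaviour (ignore extra columns, return fewer values for short rows) in a single line.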
Data types recognized
The parser will try to match the following types, in this order:
None (empty values). Unlike the standard csv reader, it will return None (null) for empty values. Empty strings ("") are recognized correctly.
str (strings): anything that is quoted with the quote char. The default quote char is ". If the string contains a quote, it must be escaped by duplicating it, e.g. "HELLO ""WORLD""" decodes to HELLO "WORLD".
int (integers): an integer with an optional preceding sign.
float: any float recognized by Python.
datetime: a datetime in ISO format (with 'T' or whitespace in the middle).
date: a date in ISO format.
time: a time in ISO format.
If all these parsing attempts fail, a string will be returned, unless strict mode is enabled (strict set to True). In the latter case, a ValueError exception will be raised.
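The recognition order can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not LightCSV's implementation: parse_cell is a hypothetical name, and ISO parsing is delegated to the fromisoformat() methods of the datetime module, which may accept a slightly different set of strings than the real parser.

```python
from datetime import date, datetime, time


def _parse_datetime(chunk):
    # Assumption: a value of 10 characters or fewer has no time part,
    # so it is left for the date and time recognizers further down.
    if len(chunk) <= 10:
        raise ValueError("no time part")
    return datetime.fromisoformat(chunk)


def parse_cell(chunk, quote_char='"', strict=False):
    """Try each type in the documented order and return the first match."""
    if chunk == "":
        return None  # empty cell
    q = quote_char
    if len(chunk) >= 2 and chunk[0] == q and chunk[-1] == q:
        return chunk[1:-1].replace(q + q, q)  # quoted string, "" unescaped to "
    # Note: Python's int() and float() are a bit more lenient than a strict
    # "digits with optional sign" rule (they accept e.g. underscores).
    converters = (int, float, _parse_datetime, date.fromisoformat, time.fromisoformat)
    for convert in converters:
        try:
            return convert(chunk)
        except ValueError:
            continue
    if strict:
        raise ValueError(f"cannot decode cell: {chunk!r}")
    return chunk  # non-strict fallback: return the raw text


for raw in ["", '""', '"HELLO ""WORLD"""', "-42", "3.5",
            "2021-03-04T10:20:30", "2021-03-04", "10:20:30", "hello"]:
    print(repr(raw), "->", repr(parse_cell(raw)))
```

Note how an empty cell and a quoted empty string take different branches: the first yields None, the second an empty str.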
Implementing your own type recognizer
You can implement your own deserialization by subclassing LightCSV and overriding the parse_obj() method.
For example, suppose we want to recognize hexadecimal integers in the format
0xNNN.... We can implement it this way:
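    import re

    from lightcsv import LightCSV

    # Matches 0xNNNN (hexadecimal integers); only hex digits are allowed
    # after the prefix, so int(..., 16) cannot fail on a matched chunk.
    RE_HEXA = re.compile(r"0[xX][0-9A-Fa-f]+$")


    class CSVHexRecognizer(LightCSV):
        def parse_obj(self, lineno: int, chunk: str):
            # Decode hexadecimal literals ourselves...
            if RE_HEXA.match(chunk):
                return int(chunk[2:], 16)
            # ...and defer everything else to the default recognizer.
            return super().parse_obj(lineno, chunk)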
As you can see, you have to override parse_obj(). If your match fails, you have to invoke the parent class's parse_obj() method and return its result.
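The matching and conversion logic can be checked on its own before wiring it into a subclass. The sketch below uses only the standard library; parse_hex is a hypothetical helper, not part of LightCSV, and its pattern accepts only hexadecimal digits after the prefix so that int(..., 16) cannot fail on a matched value.

```python
import re

# 0x or 0X followed by one or more hexadecimal digits, up to the end of the cell
RE_HEXA = re.compile(r"0[xX][0-9A-Fa-f]+$")


def parse_hex(chunk):
    """Return the integer value of a 0xNNN literal, or None if it is not one."""
    if RE_HEXA.match(chunk):
        return int(chunk[2:], 16)
    return None  # in a subclass, this is where super().parse_obj() would be called


print(parse_hex("0xFF"))  # 255
print(parse_hex("0x1g"))  # None
print(parse_hex("255"))   # None
```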
Python's built-in csv module is a bit over-engineered for simple tasks, and one normally doesn't need all the bells and whistles. With LightCSV you just open a filename and iterate over its rows.
Getting None for empty cells is needed very often, and it can be really cumbersome with the standard csv module, which tries hard to cover many corner cases (if that's your case, this tool might not be suitable for you).