Import, connect and transform data into Excel

George Karakostas

Last update: Jan 19, 2022

Related tags

Data Analysis xlwings_query

Overview

xlwings_query

Import, connect and transform data into Excel.

Description

The concept is to apply data transformations to a main query object. When the data is ready, export it to an Excel table. It is inspired by MS Power Query based in Python. The target is to use the power of Pandas and overcome platform issues with Excel.

Sample usage

Specify the same filename as the xlsx in same folder.

"""
Transform data
"""
import xlwings_query as xwq

def main():
    """
    Main function
    """
    q = xwq.Query(__file__, 'Target sheet')
    q.excel_workbook('My source workbook.xlsx')
    # ...

main()

Methods

Query initialisation

Specify the target excel filename and the query name.

excel_workbook

Append an Excel workbook to the query.

Comments

Cross-platform compatibility

There is a platform issue with api reference in Mac:

https://github.com/Wtower/xlwings_query/blob/e25eae1a400376fdd9cc6ab169a382f26a48312b/src/xlwings_query.py#L69

https://docs.xlwings.org/en/stable/missing_features.html

Unable to reference rows count.

opened by Wtower 1
Remove context manager
Initially implemented a context manager in which to allow xlwings app to live.

It has been noticed that when xlwings create an instance and exit as recommended in

https://docs.xlwings.org/en/stable/connect_to_workbook.html

with xw.App() as app: book = app.books['Book1']

It does not allow Excel to properly exit and causes it to recover workbooks on next open.

The app will be started visible and stayed open.
opened by Wtower 1
Create project and py package

Packaging Python Projects — Python Packaging User Guide Dependencies Management in Setuptools - setuptools 60.5.4 documentation (pypa.io) Installing Python packages from local file system folder to virtualenv with pip - Stack Overflow

opened by Wtower 1

Cross-platform filenames with accents

There is an issue with filenames with accents across Mac/Win:

>>> str_win = 'Τρέχοντα υπόλοιπα προμηθευτών.xlsx' # Copied from Explorer
>>> str_mac = 'Τρέχοντα υπόλοιπα προμηθευτών.xlsx' # Copied from Finder
>>> str_win == str_mac
False

>>> str_win.encode('utf-8')
b'\xce\xa4\xcf\x81\xce\xad\xcf\x87\xce\xbf\xce\xbd\xcf\x84\xce\xb1 \xcf\x85\xcf\x80\xcf\x8c\xce\xbb\xce\xbf\xce\xb9\xcf\x80\xce\xb1 \xcf\x80\xcf\x81\xce\xbf\xce\xbc\xce\xb7\xce\xb8\xce\xb5\xcf\x85\xcf\x84\xcf\x8e\xce\xbd.xlsx'

>>> str_mac.encode('utf-8')
b'\xce\xa4\xcf\x81\xce\xb5\xcc\x81\xcf\x87\xce\xbf\xce\xbd\xcf\x84\xce\xb1 \xcf\x85\xcf\x80\xce\xbf\xcc\x81\xce\xbb\xce\xbf\xce\xb9\xcf\x80\xce\xb1 \xcf\x80\xcf\x81\xce\xbf\xce\xbc\xce\xb7\xce\xb8\xce\xb5\xcf\x85\xcf\x84\xcf\x89\xcc\x81\xce\xbd.xlsx'

>>> import unicodedata
>>> unicodedata.normalize('NFD', str_win) == str_mac
True

>>> str_win == unicodedata.normalize('NFC', str_mac)
True

>>> from pathlib import Path
>>> Path(str_win).is_file() # on Windows
True

>>> Path(str_win).is_file() # on Mac
True

>>> from pathlib import Path
>>> Path(str_mac).is_file() # on Windows
False

>>> Path(str_mac).is_file() # on Mac
True

Therefore, normalize with unicodedata.normalize('NFC', str_mac_or_win) to solve the issue.

https://nedbatchelder.com/blog/201106/filenames_with_accents.html

Remove the fuzzy matching of filenames previously attempted to solve the issue.

opened by Wtower 0

Obtain the workbook
According to the documentation

https://docs.xlwings.org/en/stable/api.html#xlwings.main.Books.open https://stackoverflow.com/questions/49883687/xlwings-book-hangs-if-file-is-already-open/49905875#49905875

xw.books.open() should return a specified xlsx file and open it if not yet open.

This does not work well for workbooks that are shared in OneDrive or live in shared folders of OneDrive. Any attempt to open, or even access the fullname property of that book throws the following error.

xlwings.XlwingsError: Couldn't find your local OneDrive file, see: xlwings.org/error

This is a false error description after many hours of investigation.

A workaround would be to check the name property for an open worksheet of the same name, ignoring the path.

This is very dangerous if another workbook of the same file is open (both cannot be open as xl forbids it).
opened by Wtower 0

Owner

George Karakostas

GitHub

PrimaryBid - Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift

Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift This project is composed of two parts: Part1 and Part2

1 Jan 19, 2022

Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow.

ETL Pipeline with Airflow, Spark, s3, MongoDB and Amazon Redshift

214 Jan 2, 2023

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana

3.3k Jan 4, 2023

A tool to compare differences between dataframes and create a differences report in Excel

similarpanda A module to check for differences between pandas Dataframes, and generate a report in Excel format. This is helpful in a workplace settin

9 Sep 15, 2022

Transform-Invariant Non-Negative Matrix Factorization

Transform-Invariant Non-Negative Matrix Factorization A comprehensive Python package for Non-Negative Matrix Factorization (NMF) with a focus on learn

6 Jul 1, 2022

Created covid data pipeline using PySpark and MySQL that collected data stream from API and do some processing and store it into MYSQL database.

2 Nov 20, 2021

Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format

Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format.

2 Dec 1, 2021

pipeline for migrating lichess data into postgresql

How Long Does It Take Ordinary People To "Get Good" At Chess? TL;DR: According to 5.5 years of data from 2.3 million players and 450 million games, mo

182 Nov 11, 2022

Demonstrate a Dataflow pipeline that saves data from an API into BigQuery table

Overview dataflow-mvp provides a basic example pipeline that pulls data from an API and writes it to a BigQuery table using GCP's Dataflow (i.e., Apac

1 Dec 3, 2021

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

3.7k Jan 3, 2023

Elementary is an open-source data reliability framework for modern data teams. The first module of the framework is data lineage.

Data lineage made simple, reliable, and automated. Effortlessly track the flow of data, understand dependencies and analyze impact. Features Visualiza

898 Jan 9, 2023

🧪 Panel-Chemistry - exploratory data analysis and build powerful data and viz tools within the domain of Chemistry using Python and HoloViz Panel.

???? ??. The purpose of the panel-chemistry project is to make it really easy for you to do DATA ANALYSIS and build powerful DATA AND VIZ APPLICATIONS within the domain of Chemistry using using Python and HoloViz Panel.

97 Dec 8, 2022

Package for decomposing EMG signals into motor unit firings, as used in Formento et al 2021.

EMGDecomp Package for decomposing EMG signals into motor unit firings, created for Formento et al 2021. Based heavily on Negro et al, 2016. Supports G

13 Nov 1, 2022

For making Tagtog annotation into csv dataset

tagtog_relation_extraction for making Tagtog annotation into csv dataset How to Use On Tagtog 1. Go to Project > Downloads 2. Download all documents,

4 Dec 28, 2021

A Python module for clustering creators of social media content into networks

sm_content_clustering A Python module for clustering creators of social media content into networks. Currently supports identifying potential networks

72 Dec 30, 2022

fds is a tool for Data Scientists made by DAGsHub to version control data and code at once.

Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc

359 Dec 22, 2022

Python data processing, analysis, visualization, and data operations

Python This is a Python data processing, analysis, visualization and data operations of the source code warehouse, book ISBN: 9787115527592 Descriptio

1 Jan 16, 2022

Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Data Scientist Learning Plan Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

27 Nov 1, 2022

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.

791 Jan 4, 2023

Import, connect and transform data into Excel

Related tags

Overview

xlwings_query

Description

Sample usage

Methods

Query initialisation

excel_workbook

Comments

Cross-platform compatibility

Remove context manager

Create project and py package

Cross-platform filenames with accents

Obtain the workbook

Owner

George Karakostas

PrimaryBid - Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift

Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow.

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

A tool to compare differences between dataframes and create a differences report in Excel

Transform-Invariant Non-Negative Matrix Factorization

Created covid data pipeline using PySpark and MySQL that collected data stream from API and do some processing and store it into MYSQL database.

Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format

pipeline for migrating lichess data into postgresql

Demonstrate a Dataflow pipeline that saves data from an API into BigQuery table

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Elementary is an open-source data reliability framework for modern data teams. The first module of the framework is data lineage.

🧪 Panel-Chemistry - exploratory data analysis and build powerful data and viz tools within the domain of Chemistry using Python and HoloViz Panel.

Package for decomposing EMG signals into motor unit firings, as used in Formento et al 2021.

For making Tagtog annotation into csv dataset

A Python module for clustering creators of social media content into networks

fds is a tool for Data Scientists made by DAGsHub to version control data and code at once.

Python data processing, analysis, visualization, and data operations

Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code