Import, connect and transform data into Excel

Overview

xlwings_query

Import, connect and transform data into Excel.

Description

The concept is to apply data transformations to a main query object. When the data is ready, export it to an Excel table. It is inspired by MS Power Query based in Python. The target is to use the power of Pandas and overcome platform issues with Excel.

Sample usage

Specify the same filename as the xlsx in same folder.

"""
Transform data
"""
import xlwings_query as xwq

def main():
    """
    Main function
    """
    q = xwq.Query(__file__, 'Target sheet')
    q.excel_workbook('My source workbook.xlsx')
    # ...

main()

Methods

Query initialisation

Specify the target excel filename and the query name.

excel_workbook

Append an Excel workbook to the query.

Comments
  • Cross-platform compatibility

    Cross-platform compatibility

    There is a platform issue with api reference in Mac:

    https://github.com/Wtower/xlwings_query/blob/e25eae1a400376fdd9cc6ab169a382f26a48312b/src/xlwings_query.py#L69

    https://docs.xlwings.org/en/stable/missing_features.html

    Unable to reference rows count.

    opened by Wtower 1
  • Remove context manager

    Remove context manager

    Initially implemented a context manager in which to allow xlwings app to live.

    It has been noticed that when xlwings create an instance and exit as recommended in

    https://docs.xlwings.org/en/stable/connect_to_workbook.html

    with xw.App() as app:
        book = app.books['Book1']
    

    It does not allow Excel to properly exit and causes it to recover workbooks on next open.

    The app will be started visible and stayed open.

    opened by Wtower 1
  • Cross-platform filenames with accents

    Cross-platform filenames with accents

    There is an issue with filenames with accents across Mac/Win:

    >>> str_win = 'Τρέχοντα υπόλοιπα προμηθευτών.xlsx' # Copied from Explorer
    >>> str_mac = 'Τρέχοντα υπόλοιπα προμηθευτών.xlsx' # Copied from Finder
    >>> str_win == str_mac
    False
    
    >>> str_win.encode('utf-8')
    b'\xce\xa4\xcf\x81\xce\xad\xcf\x87\xce\xbf\xce\xbd\xcf\x84\xce\xb1 \xcf\x85\xcf\x80\xcf\x8c\xce\xbb\xce\xbf\xce\xb9\xcf\x80\xce\xb1 \xcf\x80\xcf\x81\xce\xbf\xce\xbc\xce\xb7\xce\xb8\xce\xb5\xcf\x85\xcf\x84\xcf\x8e\xce\xbd.xlsx'
    
    >>> str_mac.encode('utf-8')
    b'\xce\xa4\xcf\x81\xce\xb5\xcc\x81\xcf\x87\xce\xbf\xce\xbd\xcf\x84\xce\xb1 \xcf\x85\xcf\x80\xce\xbf\xcc\x81\xce\xbb\xce\xbf\xce\xb9\xcf\x80\xce\xb1 \xcf\x80\xcf\x81\xce\xbf\xce\xbc\xce\xb7\xce\xb8\xce\xb5\xcf\x85\xcf\x84\xcf\x89\xcc\x81\xce\xbd.xlsx'
    
    >>> import unicodedata
    >>> unicodedata.normalize('NFD', str_win) == str_mac
    True
    
    >>> str_win == unicodedata.normalize('NFC', str_mac)
    True
    
    >>> from pathlib import Path
    >>> Path(str_win).is_file() # on Windows
    True
    
    >>> Path(str_win).is_file() # on Mac
    True
    
    >>> from pathlib import Path
    >>> Path(str_mac).is_file() # on Windows
    False
    
    >>> Path(str_mac).is_file() # on Mac
    True
    

    Therefore, normalize with unicodedata.normalize('NFC', str_mac_or_win) to solve the issue.

    https://nedbatchelder.com/blog/201106/filenames_with_accents.html

    Remove the fuzzy matching of filenames previously attempted to solve the issue.

    opened by Wtower 0
  • Obtain the workbook

    Obtain the workbook

    According to the documentation

    https://docs.xlwings.org/en/stable/api.html#xlwings.main.Books.open https://stackoverflow.com/questions/49883687/xlwings-book-hangs-if-file-is-already-open/49905875#49905875

    xw.books.open() should return a specified xlsx file and open it if not yet open.

    This does not work well for workbooks that are shared in OneDrive or live in shared folders of OneDrive. Any attempt to open, or even access the fullname property of that book throws the following error.

    xlwings.XlwingsError: Couldn't find your local OneDrive file, see: xlwings.org/error
    

    This is a false error description after many hours of investigation.

    A workaround would be to check the name property for an open worksheet of the same name, ignoring the path.

    This is very dangerous if another workbook of the same file is open (both cannot be open as xl forbids it).

    opened by Wtower 0
Owner
George Karakostas
George Karakostas
PrimaryBid - Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift

Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift This project is composed of two parts: Part1 and Part2

Emmanuel Boateng Sifah 1 Jan 19, 2022
Renato 214 Jan 2, 2023
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana

Amazon Web Services - Labs 3.3k Jan 4, 2023
A tool to compare differences between dataframes and create a differences report in Excel

similarpanda A module to check for differences between pandas Dataframes, and generate a report in Excel format. This is helpful in a workplace settin

Andre Pretorius 9 Sep 15, 2022
Transform-Invariant Non-Negative Matrix Factorization

Transform-Invariant Non-Negative Matrix Factorization A comprehensive Python package for Non-Negative Matrix Factorization (NMF) with a focus on learn

EMD Group 6 Jul 1, 2022
Created covid data pipeline using PySpark and MySQL that collected data stream from API and do some processing and store it into MYSQL database.

Created covid data pipeline using PySpark and MySQL that collected data stream from API and do some processing and store it into MYSQL database.

null 2 Nov 20, 2021
Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format

Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format.

Brady Law 2 Dec 1, 2021
pipeline for migrating lichess data into postgresql

How Long Does It Take Ordinary People To "Get Good" At Chess? TL;DR: According to 5.5 years of data from 2.3 million players and 450 million games, mo

Joseph Wong 182 Nov 11, 2022
Demonstrate a Dataflow pipeline that saves data from an API into BigQuery table

Overview dataflow-mvp provides a basic example pipeline that pulls data from an API and writes it to a BigQuery table using GCP's Dataflow (i.e., Apac

Chris Carbonell 1 Dec 3, 2021
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Amundsen 3.7k Jan 3, 2023
Elementary is an open-source data reliability framework for modern data teams. The first module of the framework is data lineage.

Data lineage made simple, reliable, and automated. Effortlessly track the flow of data, understand dependencies and analyze impact. Features Visualiza

null 898 Jan 9, 2023
🧪 Panel-Chemistry - exploratory data analysis and build powerful data and viz tools within the domain of Chemistry using Python and HoloViz Panel.

???? ??. The purpose of the panel-chemistry project is to make it really easy for you to do DATA ANALYSIS and build powerful DATA AND VIZ APPLICATIONS within the domain of Chemistry using using Python and HoloViz Panel.

Marc Skov Madsen 97 Dec 8, 2022
Package for decomposing EMG signals into motor unit firings, as used in Formento et al 2021.

EMGDecomp Package for decomposing EMG signals into motor unit firings, created for Formento et al 2021. Based heavily on Negro et al, 2016. Supports G

null 13 Nov 1, 2022
For making Tagtog annotation into csv dataset

tagtog_relation_extraction for making Tagtog annotation into csv dataset How to Use On Tagtog 1. Go to Project > Downloads 2. Download all documents,

hyeong 4 Dec 28, 2021
A Python module for clustering creators of social media content into networks

sm_content_clustering A Python module for clustering creators of social media content into networks. Currently supports identifying potential networks

null 72 Dec 30, 2022
fds is a tool for Data Scientists made by DAGsHub to version control data and code at once.

Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc

DAGsHub 359 Dec 22, 2022
Python data processing, analysis, visualization, and data operations

Python This is a Python data processing, analysis, visualization and data operations of the source code warehouse, book ISBN: 9787115527592 Descriptio

FangWei 1 Jan 16, 2022
Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Data Scientist Learning Plan Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Trung-Duy Nguyen 27 Nov 1, 2022
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.

Tuplex 791 Jan 4, 2023