Hatchet is a Python-based library that allows Pandas dataframes to be indexed by structured tree and graph data.

Lawrence Livermore National Laboratory

Last update: Aug 19, 2022

Related tags

Data Analysis python performance hpc graphs data-analytics performance-analysis hierarchical-data trees comparative-analysis radiuss

Overview

Hatchet

Hatchet is a Python-based library that allows Pandas dataframes to be indexed by structured tree and graph data. It is intended for analyzing performance data that has a hierarchy (for example, serial or parallel profiles that represent calling context trees, call graphs, nested regions’ timers, etc.). Hatchet implements various operations to analyze a single hierarchical data set or compare multiple data sets, and its API facilitates analyzing such data programmatically.

To use hatchet, install it with pip:

$ pip install llnl-hatchet

Or, if you want to develop with this repo directly, run the install script from the root directory. It builds the Cython modules and adds the cloned directory to your PYTHONPATH:

$ source install.sh

Documentation

See the Getting Started page for basic examples and usage. Full documentation is available in the User Guide.

Examples of performance analysis using hatchet are available here.

Contributing

Hatchet is an open source project. We welcome contributions via pull requests, and questions, feature requests, or bug reports via issues.

Authors

Many thanks go to Hatchet's contributors.

Citing Hatchet

If you are referencing Hatchet in a publication, please cite the following paper:

Abhinav Bhatele, Stephanie Brink, and Todd Gamblin. Hatchet: Pruning the Overgrowth in Parallel Profiles. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '19). ACM, New York, NY, USA. DOI

License

Hatchet is distributed under the terms of the MIT license.

All contributions must be made under the MIT license. Copyrights in the Hatchet project are retained by contributors. No copyright assignment is required to contribute to Hatchet.

See LICENSE and NOTICE for details.

SPDX-License-Identifier: MIT

LLNL-CODE-741008

Comments

Has anyone tried loading the flamegraph output from hatchet into https://www.speedscope.app/?

I can generate a Flamegraph using Brendan Gregg's Flamegraph perl script as shown in the documentation for Hatchet. https://www.speedscope.app/ is a nice visualizer for Flamegraphs that is supposed to support the folded stacks format, but it does not recognize my file (the one that Flamegraph.pl reads fine). I was just wondering whether anyone has successfully used https://www.speedscope.app/ with output from hatchet (if so, maybe the problem is specific to my file).

opened by balos1 4
Adds a GitHub Action to test PyPI Releases on a Regular Schedule
This PR is created in response to https://github.com/hatchet/hatchet/issues/443.

This PR adds a new GitHub Action that essentially performs automated regression testing on PyPI releases. It will install each considered version of Hatchet under each considered version of Python, checkout that Hatchet version's release branch, and perform the version's unit tests.

The following Hatchet versions are currently considered:

v1.2.0 (omitted, missing writers module)

v1.3.0 (omitted, missing writers module)

v1.3.1a0 (omitted, missing writers module)

2022.1.0 (omitted, missing writers module)

2022.1.1 (omitted, missing writers module)

2022.2.0 (omitted, missing writers module)

Similarly, the following versions of Python are currently considered:

2.7 (omitted, missing version in docker)

3.5 (omitted, missing version in docker)

3.6

3.7

3.8

3.9

Before merging, the following tasks must be done:

[X] ~Replace the workflow_dispatch (i.e., manual) trigger with the commented out schedule trigger in pip_unit_tester.yaml~ Superseded by a task in a later comment

[x] Change the "Install Hatchet" step to install llnl-hatchet instead of hatchet. This will be changed once the llnl-hatchet package goes live on PyPI

area-ci area-deployment priority-high status-ready-for-review type-feature
opened by ilumsden 4
Changes GitHub Action OS Image to Avoid Python Caching Issues

This PR allows us to avoid this issue with the setup-python Action used in Hatchet's CI.

To do so, it simply changes the OS image for the CI from ubuntu-latest to ubuntu-20.04. When the linked issue is resolved, we can switch back to ubuntu-latest.
area-ci priority-high status-ready-for-review type-bug

opened by ilumsden 2
Added to_json writer and from_dict and from_json readers.
Added to_dict and to_json writers to the GraphFrame. Added a from_json reader.

Added tests to verify that these readers and writers work, along with a Thicket-generated JSON file to verify backwards compatibility.

Added json files for tests.

area-readers area-writers priority-high type-feature
opened by cscully-allison 1
BeautifulSoup not a dependency

In the 2022.1.0 release, running the install.sh script produces a ModuleNotFoundError for the bs4 package (BeautifulSoup). The import of this package is in hatchet/vis/static_fixer.py.

@slabasan do we want to include BeautifulSoup as a dependency of Hatchet?
area-deployment priority-normal type-question area-visualization

opened by ilumsden 1
Modifications to the Interactive CCT Visualization
Work in Progress

Added:

Object Oriented Refactor of tree code

Redesign of "collapsed" nodes

Additional legend

Menu Bar

Improved interface for mass pruning

Note: Merge after PR #26
priority-normal status-approved area-visualization
opened by cscully-allison 1
Calculates exclusive metrics from corresponding inclusive metrics
This PR adds the generate_exclusive_columns function to calculate exclusive metrics from inclusive metrics. It does this by calculating the sum of the inclusive metric for each node's children and then subtracting that from the node's inclusive metric. It will only attempt to calculate exclusive metrics in certain situations, namely:

The inclusive metric name ends in "(inc)", but there is not an exclusive metric with the same name, minus the "(inc)"

There is an inclusive metric without the "(inc)" suffix

This might not be ideal. However, Hatchet currently provides no mechanism internally for explicitly correlating exclusive and inclusive metrics. So, until such functionality is added, this PR must use some solution based on metric names to determine what to calculate. When the internal mechanism for recording inclusive and exclusive metrics is updated, this function will be updated to use that new feature.
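
The subtraction described above can be sketched in plain Python. This is a minimal illustration, not Hatchet's generate_exclusive_columns implementation: the helper name exclusive_from_inclusive, the children mapping, and the inclusive dictionary are all assumptions made for the example.

```python
def exclusive_from_inclusive(children, inclusive):
    """Hypothetical helper: exclusive = inclusive - sum(children's inclusive).

    children:  maps a node name to the list of its child node names
    inclusive: maps a node name to its inclusive metric value
    """
    exclusive = {}
    for node, inc_value in inclusive.items():
        # Leaf nodes have no children, so their child sum is 0.
        child_sum = sum(inclusive[child] for child in children.get(node, []))
        exclusive[node] = inc_value - child_sum
    return exclusive


# Toy calling context tree: main -> {solve, io}, solve -> {kernel}
children = {"main": ["solve", "io"], "solve": ["kernel"]}
inclusive = {"main": 10.0, "solve": 6.0, "io": 3.0, "kernel": 4.0}

exclusive = exclusive_from_inclusive(children, inclusive)
print(exclusive)
```

For leaf nodes the exclusive value equals the inclusive value, since there is nothing to subtract.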

This PR builds off of #18, so it will be marked as a Draft until that PR is merged.
area-graphframe priority-normal status-approved type-feature
opened by ilumsden 1
Preserve existing inc_metrics in update_inclusive_columns
This is a small PR to fix a bug in GraphFrame.update_inclusive_columns that causes existing values in GraphFrame.inc_columns to be dropped.

As an example, consider a GraphFrame with the following metrics:

exc_metrics: ["time"]

inc_metrics: ["foo"]

Currently, after calling update_inclusive_columns, inc_metrics will no longer contain "foo". Instead, inc_metrics will simply be ["time (inc)"].

This PR will extend inc_metrics instead of overriding. So, in the above example, inc_metrics will now be ["foo", "time (inc)"].
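
The difference between overriding and extending can be sketched as follows. This is only an illustration of the behavior described above; extend_inc_metrics is a hypothetical helper, not the code inside GraphFrame.update_inclusive_columns.

```python
def extend_inc_metrics(inc_metrics, exc_metrics):
    """Hypothetical helper: keep existing inclusive metrics and append
    "<name> (inc)" for each exclusive metric that is not already present."""
    merged = list(inc_metrics)  # copy so the caller's list is not mutated
    for metric in exc_metrics:
        inc_name = metric + " (inc)"
        if inc_name not in merged:
            merged.append(inc_name)
    return merged


result = extend_inc_metrics(["foo"], ["time"])
print(result)
```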
area-graphframe priority-normal status-ready-for-review type-bug
opened by ilumsden 1
Creates a new function that unifies a list of GraphFrames into a single GraphFrame
This PR implements a new function called unify_ensemble that takes a list of GraphFrame objects with equal graphs and returns a new GraphFrame containing the data of all the inputs. In the output data, a new DataFrame column, called dataset, is added that informs the user which GraphFrame that row came from. If the dataset attribute of the GraphFrame (explained below) is set, that value will be used for the corresponding rows in the output. Otherwise, the string "gframe_#" is used, with "#" being replaced by the index of the GraphFrame in the input list.
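
The labelling rule can be sketched with plain lists and dictionaries. This is not the unify_ensemble implementation: unify_rows is a hypothetical helper, each input is modelled as a (label, rows) pair rather than a GraphFrame, and zero-based indexing for "gframe_#" is an assumption.

```python
def unify_rows(frames):
    """Hypothetical helper: tag every row with the dataset it came from.

    frames: list of (label, rows) pairs, where label is a string or None
            and rows is a list of dicts (one dict per DataFrame row).
    """
    unified = []
    for index, (label, rows) in enumerate(frames):
        # Use the explicit label if set, otherwise fall back to "gframe_<index>".
        dataset = label if label is not None else "gframe_{}".format(index)
        for row in rows:
            tagged = dict(row)  # copy so the input rows are left untouched
            tagged["dataset"] = dataset
            unified.append(tagged)
    return unified


frames = [
    ("run_a", [{"node": "main", "time": 1.0}]),
    (None, [{"node": "main", "time": 2.0}]),
]
unified = unify_rows(frames)
print(unified)
```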

To help link output data to input data, this PR also adds a new dataset attribute to the GraphFrame class and a graphframe_reader decorator to help set this attribute. The dataset attribute is meant to be a string that labels the GraphFrame. For most readers, this attribute will be set automatically by the graphframe_reader decorator. This decorator is meant to be applied to from_X static methods in the GraphFrame class. This decorator does 3 things:

Runs the from_X function it decorates

If the from_X function did not set the dataset attribute and the first argument to from_X is a string, this first argument will be considered a path to the read data, and it will be used to set dataset

Returns the (potentially) modified GraphFrame produced by from_X
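
The three steps above can be sketched as an ordinary Python decorator. This is a toy version under stated assumptions: ToyGraphFrame, toy_graphframe_reader, and the two reader functions are hypothetical stand-ins, and the sketch decorates plain functions rather than static methods on the GraphFrame class.

```python
import functools


class ToyGraphFrame:
    """Hypothetical stand-in for GraphFrame; it only carries a dataset label."""

    def __init__(self, dataset=None):
        self.dataset = dataset


def toy_graphframe_reader(func):
    """Hypothetical decorator mirroring the three steps described above."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        gf = func(*args, **kwargs)  # 1. run the decorated reader
        # 2. if the reader left dataset unset and the first argument is a
        #    string, treat that argument as the path and use it as the label
        if gf.dataset is None and args and isinstance(args[0], str):
            gf.dataset = args[0]
        return gf  # 3. return the (possibly) modified object

    return wrapper


@toy_graphframe_reader
def from_toy_file(path):
    return ToyGraphFrame()  # reader does not set dataset itself


@toy_graphframe_reader
def from_toy_labelled(path):
    return ToyGraphFrame(dataset="custom")  # reader sets dataset explicitly


auto = from_toy_file("profile.json")
manual = from_toy_labelled("profile.json")
print(auto.dataset, manual.dataset)
```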

area-graphframe area-utils priority-normal status-ready-for-review type-feature
opened by ilumsden 1
add clean to install to remove prior build artifacts

Tiny update to the install script to remove build artifacts before rebuilding Cython modules. Especially useful when switching between major versions of Python.

opened by jonesholger 0
caliperreader: handle root nodes in _create_parent

When creating parent nodes, we need to handle the case that the parent might be a root node. Previously, the recursive _create_parent calls were being made on root nodes, and we incorrectly tried to index into the grandparent callpath tuple, even though it was empty. This ends the recursion if we encounter an empty callpath tuple.
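
The termination condition can be illustrated with a small recursive sketch. This is not the caliperreader code: create_parents and the nodes dictionary are assumptions for the example, and callpaths are modelled as tuples of names.

```python
def create_parents(callpath, nodes):
    """Hypothetical helper: register `callpath` and all of its ancestors.

    nodes maps a callpath tuple to its parent callpath tuple (None for roots).
    """
    if len(callpath) == 0:
        return  # empty callpath: we are above a root node, so stop recursing
    if callpath in nodes:
        return  # already created, along with its ancestors
    parent = callpath[:-1]
    nodes[callpath] = parent if parent else None
    create_parents(parent, nodes)


nodes = {}
create_parents(("main", "solve", "kernel"), nodes)
print(nodes)
```

Without the empty-tuple check, a version that indexed into the parent of a root node's callpath would fail on the empty tuple, which is the situation the fix above addresses.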
area-readers priority-urgent status-ready-to-merge type-bug

opened by slabasan 0
Enables support for multi-indexed DataFrames in the Query Language
Summary

Currently, the Object-based dialect and String-based dialect of the Query Language cannot handle GraphFrames containing a DataFrame with a multi-index (e.g., when you have rank and thread info).

This PR adds support for that type of data to the Object-based Dialect and String-based Dialect. This support comes in the form of a new multi_index_mode argument to the ObjectQuery constructor, the StringQuery constructor, the parse_string_dialect function, and the GraphFrame.filter function. This argument can have one of three values:

"off" (default): query will be applied under the assumption that the DataFrame does not have a MultiIndex (i.e., the current behavior of the QL)

"all": when applying a predicate to a particular node's data in the DataFrame, all rows associated with the node must satisfy the predicate

"any": when applying a predicate to a particular node's data in the DataFrame, at least one row associated with the node must satisfy the predicate
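
The semantics of the three modes can be sketched in plain Python. This is an illustration only, not the ObjectQuery or StringQuery implementation: node_matches is a hypothetical helper, and a node's data is modelled as a list of row dictionaries (for example, one per rank).

```python
def node_matches(rows, predicate, multi_index_mode="off"):
    """Hypothetical helper: decide whether a node satisfies a predicate.

    rows: list of dicts, one per DataFrame row associated with the node
    """
    if multi_index_mode == "off":
        # No MultiIndex assumed: the node must have exactly one row.
        if len(rows) != 1:
            raise ValueError("'off' mode expects one row per node")
        return predicate(rows[0])
    if multi_index_mode == "all":
        return all(predicate(row) for row in rows)
    if multi_index_mode == "any":
        return any(predicate(row) for row in rows)
    raise ValueError("unknown multi_index_mode: {!r}".format(multi_index_mode))


# One node measured on two ranks.
rows = [{"rank": 0, "time": 5.0}, {"rank": 1, "time": 0.5}]


def is_slow(row):
    return row["time"] > 1.0


print(node_matches(rows, is_slow, "all"), node_matches(rows, is_slow, "any"))
```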

The implementation of these three modes is performed within the ObjectQuery and StringQuery classes. In these classes, the translation of predicates from dialects to the "base" syntax (represented by the Query class) will differ depending on the value of multi_index_mode. Since the implementation of this functionality is in ObjectQuery and StringQuery, the multi_index_mode arguments to parse_string_dialect and GraphFrame.filter are simply passed through to the correct class.

Finally, one important thing to note is that this functionality is ONLY implemented for new-style queries (as defined in PR #72). Old-style queries (e.g., using the QueryMatcher class) do not support this behavior.

What's Left to Do?

In short, all that's left in this PR is unit testing. I still need to implement tests in test/query.py and confirm that everything is working correctly.
area-query-lang priority-normal status-work-in-progress type-feature
opened by ilumsden 0
Refactors Query Language for Thicket
Summary

This PR refactors the Query Language (QL) to prepare it for use in Thicket, improve its overall extensibility, and make its terminology more in line with that of the QL paper.

First and foremost, the QL is no longer contained within a single file. Now, all code for the QL is contained in the new query directory. This directory contains the following files:

__init__.py: contains re-exports for everything in the QL so it can all be imported with from hatchet.query import ... (same as before)

engine.py: contains a class containing the algorithm for applying queries to GraphFrames

errors.py: contains any errors the QL may raise

query.py: contains the class representing the base QL syntax and compound queries (i.e., classes for operations like "and," "or," "xor," and "not")

object_dialect.py: contains the class representing the Object-based dialect

string_dialect.py: contains the class representing the String-based dialect

compat.py: contains various classes that ensure (deprecated) backwards compatibility with earlier versions of Hatchet

In this PR, queries are represented by one of 3 classes:

Query: represents the base syntax for the QL

StringQuery: represents the String-based dialect. This class extends Query and implements the conversion from String-based dialect to base syntax

ObjectQuery: represents the Object-based dialect. This class extends Query and implements the conversion from Object-based dialect to base syntax

Additionally, as before, there are classes that allow queries to be combined via set operations. All of these classes extend the CompoundQuery class. These classes are:

ConjunctionQuery: combines the results of a set of queries through set conjunction (equivalent to logical AND)

DisjunctionQuery: combines the results of a set of queries through set disjunction (equivalent to logical OR)

ExclusiveDisjunctionQuery: combines the results of a set of queries through exclusive set disjunction (equivalent to logical XOR)

NegationQuery: inverts the results of a query (equivalent to logical NOT)

As before, these "compound queries" can easily be created from the 3 main query classes using the &, |, ^, and ~ operators.
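
How the operators map to set operations can be sketched with a toy class. This is not Hatchet's Query or CompoundQuery code: ToyQuery and where are hypothetical names, and a query result is modelled as a plain set of node names.

```python
class ToyQuery:
    """Hypothetical stand-in for a query: selects a subset of a set of nodes."""

    def __init__(self, select):
        self.select = select  # callable: set of nodes -> set of matching nodes

    def apply(self, nodes):
        return self.select(set(nodes))

    def __and__(self, other):  # conjunction (logical AND)
        return ToyQuery(lambda nodes: self.apply(nodes) & other.apply(nodes))

    def __or__(self, other):  # disjunction (logical OR)
        return ToyQuery(lambda nodes: self.apply(nodes) | other.apply(nodes))

    def __xor__(self, other):  # exclusive disjunction (logical XOR)
        return ToyQuery(lambda nodes: self.apply(nodes) ^ other.apply(nodes))

    def __invert__(self):  # negation (logical NOT)
        return ToyQuery(lambda nodes: nodes - self.apply(nodes))


def where(predicate):
    """Build a ToyQuery that keeps the nodes satisfying `predicate`."""
    return ToyQuery(lambda nodes: {n for n in nodes if predicate(n)})


nodes = {"main", "solve", "mpi_send", "mpi_recv"}
is_mpi = where(lambda name: name.startswith("mpi_"))
has_s = where(lambda name: "s" in name)

print(sorted((is_mpi & has_s).apply(nodes)))
print(sorted((~is_mpi).apply(nodes)))
```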

New in this PR, the algorithm for applying queries to GraphFrames has been separated from query composition. The algorithm is now contained within the new QueryEngine class.

Finally, all the old QL classes and functions have been reimplemented to be thin wrappers around the classes mentioned above. As a result, this PR should ensure full backwards compatibility with old QL code. However, if this PR is merged, all "old-style" query code should be considered deprecated.

What's left to do

All the implementation has been completed for this PR. Additionally, all existing unit tests that do not involve query composition are passing, which validates my claims about backwards compatibility. All that's left to do before this PR can be merged is:

[x] Move the existing QL unit tests into a new file (e.g., query_compat.py)

[x] Create a new QL unit tests file for "new-style" queries

[x] Move query construction unit tests into the new file and refactor as needed

[x] Add tests (based on the old ones) to confirm that new-style queries are working as intended

area-query-lang priority-normal status-work-in-progress type-feature type-internal-cleanup
opened by ilumsden 1
Adding roundtrip auto update functionality to the CCT Visualization
The CCT visualization now supports auto-updating. If the user places a "?" in front of the input variable name passed as an argument to the visualization, it will reload automatically when that variable updates anywhere in the notebook. A second argument is added for the automatic return of selection-based and snapshot-based queries.

Original functionality is maintained.

Example Syntax:

%cct ?gf ?queries

The data stored in queries is a dictionary with two fields:

{ tree_state: <string> query describing the current state/shape of the tree, selection: <string> query describing the currently selected subtree }
opened by cscully-allison 0
Update basic tutorial on RTD

Update basic tutorial to walk through hatchet-tutorial github: https://llnl-hatchet.readthedocs.io/en/latest/basic_tutorial.html#installing-hatchet-and-tutorial-setup
area-docs priority-normal type-feature

opened by slabasan 0

Releases(v2022.2.2)

v2022.2.2(Oct 25, 2022)

This is a hotfix on the 2022.2 series. It fixes a bug in Hatchet's from_caliperreader().
v2022.2.1(Oct 17, 2022)
This is a minor release on the 2022.2 series.

Notable Changes

updates caliper reader to convert caliper metadata values into correct Python objects

adds to_json writer and from_dict and from_json readers

adds render_header parameter to tree() to toggle the header on/off

adds the ability to match leaf nodes in the Query Language

Other Changes

exposes version module to query hatchet version from the command line

docs: update to using hatchet at llnl page

adds a GitHub Action to test PyPI releases on a regular schedule

v2022.2.0(Aug 19, 2022)
Version 2022.2.0 is a major release that resolves problems with installing the hatchet package.

Adds writers module to installed modules to resolve package install

CaliperReader bug fixes: filter records to parse, ignore function metadata field

Modify graphframe copy/deepcopy

Adds beautiful soup 4 to requirements.txt

Add new page on using hatchet on LLNL systems

v2022.1.1(Jun 8, 2022)
This is a minor release on the 2022.1 series. It fixes bugs in Hatchet's query language and in Hatchet's flamegraph output:

flamegraph: change count to be an int instead of a float

query language: fix edge cases with + wildcard/quantifier by replacing it with . followed by *
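
The rewrite of the + quantifier can be sketched as follows. This is an illustration, not the fix itself: expand_plus is a hypothetical helper, and representing a query path as a list of (quantifier, predicate) pairs is an assumption for the example. It relies on "one or more" being equivalent to "exactly one" (.) followed by "zero or more" (*).

```python
def expand_plus(path):
    """Hypothetical helper: replace each "+" step with a "." step followed
    by a "*" step that uses the same predicate."""
    expanded = []
    for quantifier, predicate in path:
        if quantifier == "+":
            expanded.append((".", predicate))
            expanded.append(("*", predicate))
        else:
            expanded.append((quantifier, predicate))
    return expanded


# Predicates are shown as plain strings to keep the sketch short.
path = [(".", "name == main"), ("+", "time > 1")]
expanded = expand_plus(path)
print(expanded)
```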

v2022.1.0(Apr 28, 2022)
Version 2022.1.0 is a major release.

New features

3 new readers: TAU, SpotDB, and Caliper python reader

Query language extensions: compound queries, not query, and middle-level API

Adds GraphFrame checkpoints in HDF5 format

Interactive CCT visualization enhancements: pan and zoom, module encoding, multivariate encoding and adjustable mass pruning on large datasets

HPCToolkit: extend for GPU stream data

New color maps for terminal tree visualization

New function for calculating exclusive metrics from corresponding inclusive metrics

Changes to existing APIs

Precision parameter applied to second metric in terminal tree visualization (e.g., gf.tree(precision=3))

Deprecates from_caliper_json(), augments existing from_caliper() to accept optional cali-query parameter and cali file or just a json file

Metadata now stored on the GraphFrame

New interface for calling the Hatchet calling context tree from Roundtrip: %cct <graphframe or list>. Deprecated interface: %loadVisualization <roundtrip_path> <literal_tree>

Add recursion limit parameter to graphframe filter(rec_limit=1000), resolving recursion depth errors on large graphs

Tutorials and documentation

New tutorial material from the ECP Annual Meeting 2021

New developer and contributor guides

Added section on how to generate datasets for Hatchet and expanded documentation on the query language

Internal updates

Extend update_inclusive_columns() for multi-indexed trees

Moves CI from Travis to GitHub Actions

Roundtrip refactor

New unit test for formatting license headers

Bugfixes

Return default_metric and metadata in filter(), squash(), copy(), and deepcopy()

flamegraph: extract name from dataframe column instead of frame

Preserve existing inc_metrics in update_inclusive_columns

v1.3.1a0(Feb 7, 2022)
New features

Timemory reader

Query dataframe columns with GraphFrame.show_metric_columns()

Query nodes within a range using the call path query language

Extend readers to define their own default metric

Changes to existing APIs

Tree visualization displays 2 metrics

Literal output format: add hatchet node IDs

Parallel implementation of filter function

Caliper reader: support multiple hierarchies in JSON format

Adds multiprocessing dependency

v1.3.0(Feb 7, 2022)
New features:

Interactive tree visualization in Jupyter

Add mult and division API

Update hatchet installation steps for cython integration

Readers: cprofiler, pyinstrument

Graph output formats: to_literal

Add profiling APIs to profile Hatchet APIs

Update basic tutorial for hatchet

Changes to existing APIs

Remove threshold=, color=, and unicode= from tree API

Highlighting name disabled by default in terminal tree output is kept in sync with the dataframe

Internal performance improvements to unify and HPCToolkit reader, enabling analysis of large datasets

For mathematical operations, insert nan values for missing nodes, show values as nan and inf as necessary in dataframe

Extend callpath query language to support non-dataframe metrics (e.g., depth, hatchet ID)

Literal reader: A node can be defined with a "duplicate": True field if it should be the same node (though in a different callpath). A node also needs "frame" field, which is a dict containing the node "name" and "type" (if necessary).


Owner

Lawrence Livermore National Laboratory

For more than 65 years, the Lawrence Livermore National Laboratory has applied science and technology to make the world a safer place.

GitHub https://llnl-hatchet.readthedocs.io

