Overview

Great Expectations

Always know what to expect from your data.

Introduction

Great Expectations helps data teams eliminate pipeline debt through data testing, documentation, and profiling.

Software developers have long known that testing and documentation are essential for managing complex codebases. Great Expectations brings the same confidence, integrity, and acceleration to data science and data engineering teams.

See Down with Pipeline Debt! for an introduction to the philosophy of pipeline testing.

Key features

Expectations

Expectations are assertions for data. They are the workhorse abstraction in Great Expectations, covering all kinds of common data issues, including:

  • expect_column_values_to_not_be_null
  • expect_column_values_to_match_regex
  • expect_column_values_to_be_unique
  • expect_column_values_to_match_strftime_format
  • expect_table_row_count_to_be_between
  • expect_column_median_to_be_between
  • ...and many more

Expectations are declarative, flexible and extensible.
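
For example, applying a couple of Expectations to an in-memory pandas DataFrame looks roughly like this (a minimal sketch using the Pandas-backed dataset API; the DataFrame contents and column names are made-up for illustration):

    # Assert a couple of Expectations against a small, made-up DataFrame.
    import pandas as pd
    import great_expectations as ge

    df = ge.from_pandas(pd.DataFrame({
        "user_id": [1, 2, 3],
        "email": ["a@example.com", "b@example.com", None],
    }))

    print(df.expect_column_values_to_not_be_null("user_id"))  # succeeds
    print(df.expect_column_values_to_not_be_null("email"))    # fails: one null value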

Batteries-included data validation

Expectations are a great start, but it takes more to get to production-ready data validation. Where are Expectations stored? How do they get updated? How do you securely connect to production data systems? How do you notify team members and triage when data validation fails?

Great Expectations supports all of these use cases out of the box. Instead of building these components yourself over weeks or months, you will be able to add production-ready validation to your pipeline in a day. This “Expectations on rails” framework plays nice with other data engineering tools, respects your existing namespaces, and is designed for extensibility.
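
As a rough sketch of what that looks like in code (the Checkpoint name below is a placeholder, and exact entry points vary by Great Expectations version), a pre-configured validation run can be triggered from Python like this:

    # Run a pre-configured Checkpoint; "my_checkpoint" is a placeholder that must
    # already exist in the project created by `great_expectations init`.
    import great_expectations as ge

    context = ge.get_context()
    result = context.run_checkpoint(checkpoint_name="my_checkpoint")
    print(result.success)  # overall pass/fail for this validation run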

Tests are docs and docs are tests

Note: This feature is in beta.

Many data teams struggle to maintain up-to-date data documentation. Great Expectations solves this problem by rendering Expectations directly into clean, human-readable documentation.

Since docs are rendered from tests, and tests are run against new data as it arrives, your documentation is guaranteed to never go stale. Additional renderers allow Great Expectations to generate other types of "documentation," including Slack notifications, data dictionaries, customized notebooks, etc.

Your tests are your docs and your docs are your tests
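
For instance, rebuilding and viewing the local Data Docs site from Python takes two calls (a minimal sketch that assumes a project already initialized with `great_expectations init`):

    # Re-render Expectation Suites and validation results as HTML, then open the site.
    import great_expectations as ge

    context = ge.get_context()
    context.build_data_docs()
    context.open_data_docs()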

Automated data profiling

Note: This feature is experimental.

Wouldn't it be great if your tests could write themselves? Run your data through one of Great Expectations' data profilers and it will automatically generate Expectations and data documentation. Profiling provides the double benefit of helping you explore data faster, and capturing knowledge for future documentation and testing.
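
As a sketch of what this looks like in code (shown with one of the older profiler entry points; exact APIs vary by version, and the CSV path is a placeholder):

    # Profile a batch of data and keep the auto-generated Expectation Suite.
    import great_expectations as ge
    from great_expectations.profile.basic_dataset_profiler import BasicDatasetProfiler

    df = ge.read_csv("my_data.csv")  # placeholder file path
    suite, validation_result = BasicDatasetProfiler.profile(df)
    print(len(suite.expectations))   # auto-generated Expectations, ready to review and tune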

Automated profiling doesn't replace domain expertise—you will almost certainly tune and augment your auto-generated Expectations over time—but it's a great way to jump start the process of capturing and sharing domain knowledge across your team.

Pluggable and extensible

Every component of the framework is designed to be extensible: Expectations, storage, profilers, renderers for documentation, actions taken after validation, etc. This design choice gives a lot of creative freedom to developers working with Great Expectations.
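
For example, in the older Dataset-style API a custom column-level Expectation can be added by subclassing one of the built-in Dataset classes (a hedged sketch; the newer class-based Expectation API looks different):

    # Define a custom column map Expectation on a PandasDataset subclass.
    from great_expectations.dataset import MetaPandasDataset, PandasDataset

    class MyCustomPandasDataset(PandasDataset):
        @MetaPandasDataset.column_map_expectation
        def expect_column_values_to_be_positive(self, column):
            # Return a boolean Series; the decorator handles result formatting, "mostly", etc.
            return column > 0

    df = MyCustomPandasDataset({"amount": [1.0, 2.5, -3.0]})
    print(df.expect_column_values_to_be_positive("amount"))  # fails on the negative value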

We're very excited to see what other plugins the data community comes up with!

Quick start

To see Great Expectations in action on your own data:

You can install it using pip:

pip install great_expectations

or conda:

conda install -c conda-forge great-expectations

and then run:

great_expectations init

(We recommend deploying within a virtual environment. If you’re not familiar with pip, virtual environments, notebooks, or git, you may want to check out the Supporting Resources, which will teach you how to get up and running in minutes.)

For full documentation, visit Great Expectations on readthedocs.io.

If you need help, hop into our Slack channel—there are always contributors and other users there.

Integrations

Great Expectations works with the tools and systems that you're already using with your data, including:

  • Pandas: Great for in-memory machine learning pipelines!
  • Spark: Good for really big data.
  • Postgres: Leading open source database
  • BigQuery: Google's serverless, massive-scale SQL analytics platform
  • Databricks: Managed Spark analytics platform
  • MySQL: Leading open source database
  • AWS Redshift: Cloud-based data warehouse
  • AWS S3: Cloud-based blob storage
  • Snowflake: Cloud-based data warehouse
  • Apache Airflow: An open source orchestration engine
  • Other SQL relational databases: Most RDBMSs are supported via SQLAlchemy
  • Jupyter Notebooks: The best way to build Expectations
  • Slack: Get automatic data quality notifications!

What does Great Expectations not do?

Great Expectations is not a pipeline execution framework.

We aim to integrate seamlessly with DAG execution tools like Spark, Airflow, dbt, Prefect, Dagster, Kedro, Flyte, etc. We don't execute your pipelines for you.

Great Expectations is not a data versioning tool.

Great Expectations does not store data itself. Instead, it deals in metadata about data: Expectations, validation results, etc. If you want to bring your data itself under version control, check out tools like DVC and Quilt.

Great Expectations currently works best in a Python/Bash environment.

Following the philosophy of "take the compute to the data," Great Expectations currently supports native execution of Expectations in three environments: pandas, SQL (through the SQLAlchemy core), and Spark. That said, all orchestration in Great Expectations is Python-based. You can invoke it from the command line without using a Python programming environment, but if you're working in another ecosystem, other tools might be a better choice. If you're running in a pure R environment, you might consider assertR as an alternative. Within the TensorFlow ecosystem, TFDV fulfills a similar function to Great Expectations.
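
For instance, the same style of Expectation can be evaluated natively inside a SQL database (a hedged sketch using the SQLAlchemy-backed dataset API; the connection string and table name are placeholders):

    # Evaluate an Expectation directly against a SQL table, so the computation
    # happens in the database rather than in pandas.
    import sqlalchemy as sa
    from great_expectations.dataset import SqlAlchemyDataset

    engine = sa.create_engine("postgresql://user:password@localhost:5432/mydb")  # placeholder DSN
    orders = SqlAlchemyDataset(table_name="orders", engine=engine)               # placeholder table
    print(orders.expect_column_values_to_not_be_null("order_id"))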

Who maintains Great Expectations?

Great Expectations is under active development by James Campbell, Abe Gong, Eugene Mandel, Rob Lim, and Taylor Miller, with help from many others.

What's the best way to get in touch with the Great Expectations team?

If you have questions, comments, or just want to have a good old-fashioned chat about data pipelines, please hop on our public Slack channel.

If you'd like hands-on assistance setting up Great Expectations, establishing a healthy practice of data testing, or adding functionality to Great Expectations, please see options for consulting help here.

Can I contribute to the library?

Absolutely. Yes, please. Start here and please don't be shy with questions.

Comments
  • Fix compatibility with MS SQL Server.

    The following fixes are implemented:

    • Temporary table names are now prefixed with '#' as required by MS SQL dialect
    • Temporary tables only persist within a connection, so the engine is overridden with a connection (the same happens for SQLite)
    • Create temporary tables using "select ... into {table_name} from ..."
    • head() now uses "top" instead of "limit" for MS SQL
    • get_column_quantiles(): add an empty over() clause as required for MS SQL
    • get_column_stdev(): use "stdevp" function for MS SQL instead of "stddev_samp"
    • Use column introspection fallback in initialization as introspection seems to return an empty list. The fallback for MS SQL is modified to also retrieve the column types.

    Open issues that I don't know how to fix:

    • ~~get_table_columns(): returns an empty list because table introspection yields an empty column list~~
    • ~~get_column_count(): returns 0 (because get_table_columns() is empty)~~
    • ~~expect_column_values_to_be_of_type(): doesn't work because the introspection fallback does not provide column type information~~
    • ~~expect_column_values_to_be_in_type_list(): idem~~
    • expect_column_values_to_be_unique(), expect_column_values_to_not_be_null(), expect_column_values_to_be_null() raise errors (such as reported in #932, #1098)

    This should close #1110.

    opened by kepiej 38
  • Docs/expanded contrib pages

    This PR replaces CONTRIBUTING.md with a full-fledged community contributions section.

    It's still a draft; some sections are still mostly empty.

    I'm creating a draft PR (1) so that the rest of the team can start to orient to a working model with more clarity and predictability for community members, and (2) to progressively get feedback on the sections that are ready for it.

    opened by abegong 36
  • Fix/docstrings

    I spent my flight to Dublin cranking away at documentation.

    It's coming together pretty nicely.

    Can you take a look at the first five expectations and see if you like this format? If so, we can roll it out to the rest of them.

    Also, please take a look at the diffs in the rest of the docs. I've moved some pieces around and fixed some lingering inaccuracies.

    Sample pics for expectation docs are below:

    (four screenshots of the rendered expectation docs, taken 2017-12-11)

    opened by abegong 23
  • Add success field and stats to validation report

    Hey,

    I quickly drafted a first version of the overall success field and summary statistics for the validation report. I'll write tests and update the docs as soon as we settle on the exact scope of this PR.

    In summary: the validation report now contains some info about the overall success as well as stats about the results:

    {
      "results": [...],  # unchanged
      "success": True,
      "stats": {
        "n_total": 10,
        "n_success": 10,
        "n_failed": 0,
      }
    }

    What do you think? Do we need the stats? Should the names be different?

    TODO:

    • [X] initial implementation
    • [X] fix old failing tests
    • [X] write tests
    • [X] update docs
    • [x] rebase/clean up commits
    • [x] decide on name "details/statistics" field

    This PR closes #305.

    opened by sotte 22
  • Feature request: Improve Typing System (fka `expect_column_values_to_be_of_type` feels borked in pandas)

    In pandas, a column containing 1, 2, 3, 4, abcd will be parsed as a series of strings "1", "2", "3", "4", "abcd". That means that if I run expect_column_values_to_be_of_type on this column, I get:

    success=False: This is good.
    partial_exception_list = ["1", "2", "3", "4", "abcd"]: This is bad.

    Proposal:

    1. Drop the column_map_expectation expect_column_values_to_be_of_type
    2. Replace it with a column_aggregate_expectation: expect_column_to_be_of_type
    3. Add two new column_map_expectations: expect_column_values_to_be_int_parseable, and expect_column_values_to_be_float_parseable

    Notes:

    • If expect_column_values_to_be_of_type gets a series with mixed types, it always returns false.
    • If expect_column_values_to_be_int_parseable gets a series of ints, it returns true. Ditto for floats in expect_column_values_to_be_float_parseable.
    opened by abegong 22
  • Feature/gcp

    This draft PR adds limited support for Google Cloud Platform. It includes:

    • A Google Cloud Storage (GCS) store backend
    • SQLAlchemy backend support for BigQuery via pybigquery

    Note that a GCS generator was out of scope for this PR, but should be implemented in a future release to achieve full parity between AWS and GCP.

    opened by williamjr 20
  • "catch_exception" not functioning with v3 API

    Describe the bug I am using the v3 API and SparkDFExecutionEngine with the following classes: ExpectationConfiguration, RuntimeBatchRequest, and context.get_validator().validate().

    When I include "catch_exceptions" in the expectation kwargs or call context.get_validator().validate(catch_exceptions=True), an exception is still thrown.

    Calculating Metrics:  50%|█████     | 4/8 [00:00<00:00,  5.30it/s]Traceback (most recent call last):
      File "/opt/project/src/main/glue/reproduce.py", line 92, in <module>
        expectation_suite=suite,
      File "/usr/local/lib/python3.6/site-packages/great_expectations/validator/validator.py", line 1209, in validate
        "result_format": result_format,
      File "/usr/local/lib/python3.6/site-packages/great_expectations/validator/validator.py", line 474, in graph_validate
        metrics = self.resolve_validation_graph(graph, metrics, runtime_configuration)
      File "/usr/local/lib/python3.6/site-packages/great_expectations/validator/validator.py", line 519, in resolve_validation_graph
        runtime_configuration=runtime_configuration,
      File "/usr/local/lib/python3.6/site-packages/great_expectations/validator/validator.py", line 560, in _resolve_metrics
        metrics_to_resolve, metrics, runtime_configuration
      File "/usr/local/lib/python3.6/site-packages/great_expectations/execution_engine/execution_engine.py", line 282, in resolve_metrics
        **metric_provider_kwargs
      File "/usr/local/lib/python3.6/site-packages/great_expectations/expectations/metrics/metric_provider.py", line 58, in inner_func
        return metric_fn(*args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/great_expectations/expectations/metrics/map_metric_provider.py", line 465, in inner_func
        message=f'Error: The column "{column_name}" in BatchData does not exist.'
    great_expectations.exceptions.exceptions.ExecutionEngineError: Error: The column "unknown_column" in BatchData does not exist.
    Calculating Metrics:  50%|█████     | 4/8 [00:00<00:00,  4.45it/s]
    

    To Reproduce Code to reproduce the behavior:

    from great_expectations.data_context import BaseDataContext
    from great_expectations.data_context.types.base import InMemoryStoreBackendDefaults
    from great_expectations.data_context.types.base import DataContextConfig
    from great_expectations.core import ExpectationSuite, ExpectationConfiguration
    from great_expectations.core.batch import RuntimeBatchRequest
    from pyspark.sql import SparkSession
    import pandas as pd
    
    data_context_config = DataContextConfig(
        datasources={
            "spark_datasource": {
                "execution_engine": {
                    "class_name": "SparkDFExecutionEngine",
                    "module_name": "great_expectations.execution_engine",
                },
                "class_name": "Datasource",
                "module_name": "great_expectations.datasource",
                "data_connectors": {
                    "runtime_data_connector": {
                        "class_name": "RuntimeDataConnector",
                        "batch_identifiers": [
                            "domain_id",
                            "component_name"
                        ]
                    }
                }
            }
        },
        validation_operators={
            "action_list_operator": {
                "class_name": "ActionListValidationOperator",
                "action_list": [
                    {
                        "name": "store_validation_result",
                        "action": {"class_name": "StoreValidationResultAction"},
                    },
                    {
                        "name": "store_evaluation_params",
                        "action": {"class_name": "StoreEvaluationParametersAction"},
                    },
                    {
                        "name": "update_data_docs",
                        "action": {"class_name": "UpdateDataDocsAction"},
                    },
                ],
            }
        },
        expectations_store_name="expectations_store",
        validations_store_name="validations_store",
        evaluation_parameter_store_name="evaluation_parameter_store",
        checkpoint_store_name="checkpoint_store",
        store_backend_defaults=InMemoryStoreBackendDefaults(),
    )
    
    context = BaseDataContext(project_config=data_context_config)
    suite: ExpectationSuite = context.create_expectation_suite("suite", overwrite_existing=True)
    
    expectation_configuration = ExpectationConfiguration(
        expectation_type='expect_column_values_to_not_be_null',
        kwargs={
            'catch_exceptions': True,  # expect exceptions to be caught
            'result_format': 'SUMMARY',
            'include_config': False,
            'column': 'unknown_column'  # intentionally incorrect column to force error
        },
        meta={
            "Notes": "Some notes"
        }
    )
    suite.add_expectation(expectation_configuration=expectation_configuration)
    
    pandasDF = pd.DataFrame(data=[['Scott'], ['Jeff'], ['Thomas'], ['Ann']], columns=['Name'])
    
    spark = SparkSession.builder.appName("local").getOrCreate()
    sparkDF = spark.createDataFrame(pandasDF)
    sparkDF.show()
    
    runtime_batch_request = RuntimeBatchRequest(
        datasource_name="spark_datasource",
        data_connector_name="runtime_data_connector",
        data_asset_name="insert_your_data_asset_name_here",
        runtime_parameters={
            "batch_data": sparkDF
        },
        batch_identifiers={
            "domain_id": "ininfsgi283",
            "component_name": "some_component",
        }
    )
    validator = context.get_validator(
        batch_request=runtime_batch_request,
        expectation_suite=suite,
    ).validate()
    results = validator.results
    print(results)
    
    

    Environment (please complete the following information):

    • Operating System: MacOS
    • Great Expectations Version: 0.13.26

    Thanks for your time.

    community core-engineering-queue 
    opened by KentonParton 19
  • Support to include ID/PK in validation result for each row that failed an expectations

    Is your feature request related to a problem? Please describe. When an expectation is run, the output includes a "partial_unexpected_list" property of values that were unexpected. While this is useful, in most cases it doesn't allow teams to identify, resolve, or divert data with poor quality.

    It would be great if a sample list or a complete list (result_format SUMMARY, COMPLETE) of IDs for each row that failed an expectation was included in the validation result.

    Describe the solution you'd like One could include a column name as an argument in the expectation that they would like to be included in the "partial_unexpected_list" (or a new property).

    Describe alternatives you've considered One could try to infer the PK for a table, but this is not possible for all engine types, e.g. Spark.

    Additional context Enabling teams to not only bring light to data quality issues but also identify all affected rows allows them to address poor data quality in real time instead of requiring manual intervention.

    community devrel feature 
    opened by KentonParton 19
  • [FEATURE]/ Added pairwise expectation 'expect_column_pair_values_to_be_in_set'

    [FEATURE]/ Added pairwise expectation 'expect_column_pair_values_to_be_in_set' to the SQLAlchemy dataset.py file

    Changes proposed in this pull request:

    • Allows users to make use of the pairwise expectation 'expect_column_pair_values_to_be_in_set' for SQLAlchemy backends in v2 GE
    • Provides the 'column_pair_map_expectation' decorator for other pairwise expectations to be developed for SQLAlchemy backends in v2 GE

    After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.

    Definition of Done

    Please delete options that are not relevant.

    • [x] My code follows the Great Expectations style guide
    • [x] I have performed a self-review of my own code
    • [x] I have commented my code, particularly in hard-to-understand areas
    • [x] I have made corresponding changes to the documentation
    • [ ] I have added unit tests where applicable and made sure that new and existing tests are passing.
    • [ ] I have run any local integration tests and made sure that nothing is broken.
    community 
    opened by Arnavkar 18
  • Data docs does not contain the results of my expectations

    Describe the bug This is probably not a bug and is user error, but I didn't see a suitable template. I am trying to run expectations via code (not the CLI) as part of my ETL pipeline in order to validate data before it goes to production. I want to save the expectation results JSON, upload it to S3, and set up an S3-hosted Data Docs site to pull from those results and the expectation suite.

    To Reproduce Steps to reproduce the behavior:

    1. Run the below attached code
    2. Note that when data docs opens, it only contains the expectations, and not the results of those expectations.

    Expected behavior Data docs contains the results of my expectations.

    Environment (please complete the following information):

    • Operating System: Linux & MacOS
    • Great Expectations Version: [e.g. 0.14.1]

    Additional context If there is a better way to do this, one that better leverages existing great_expectations features, please point me in that direction. Notably, I couldn't make the CLI configuration of my great_expectations.yml work for me, as I need this to run dynamically in a pipeline, uploading to different locations depending on the client.

    import great_expectations as ge
    from great_expectations.data_context.types.base import DataContextConfig, DatasourceConfig, FilesystemStoreBackendDefaults
    from great_expectations.data_context import BaseDataContext
    import numpy as np
    import pandas as pd
    import json
    import os
    from datetime import datetime
    from great_expectations.data_context.types.resource_identifiers import (ExpectationSuiteIdentifier,
        ValidationResultIdentifier,
    )
    
    df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
    
    abs_path = os.getcwd() + '/great_expectations'
    
    data_context_config = DataContextConfig(
        datasources={
            "my_pandas_datasource": DatasourceConfig(
                class_name="PandasDatasource",
            )
        },
        store_backend_defaults=FilesystemStoreBackendDefaults(root_directory=abs_path),
    )
    context = BaseDataContext(project_config=data_context_config)
    
    domain_name = 'test'
    
    suite = context.create_expectation_suite(domain_name, overwrite_existing=True)
    
    batch_kwargs = {
        "datasource": 'my_pandas_datasource',
        "dataset": df,
        "data_asset_name": domain_name,
    }
    
    batch = context.get_batch(batch_kwargs, "test")
    
    print(batch.head())
    
    batch.expect_table_row_count_to_be_between(max_value=250, min_value=10)
    
    batch.expect_table_column_count_to_equal(value=4)
    
    batch.expect_table_columns_to_match_ordered_list(
        column_list=[
            "A",
            "B",
            "C",
            "D",
        ]
    )
    
    batch.expect_column_values_to_not_be_null(column="A",
        result_format='COMPLETE')
    
    batch.expect_column_values_to_be_null(column="A",
        result_format='COMPLETE')
    
    batch.expect_column_values_to_be_in_set(
        column="A",
        value_set=["A", "B", "C", "D", "E", "F"],
        result_format='COMPLETE'
    )
    
    results = batch.validate()
    
    
    # This step is optional, but useful - evaluate the Expectations against the current batch of data
    run_id = {
    "run_name": domain_name,
    "run_time": datetime.now()
    }
    results = batch.validate(expectation_suite=None,
                                    run_id=None,
                                    data_context=context,
                                    evaluation_parameters=None,
                                    catch_exceptions=True,
                                    only_return_failures=False,
                                    run_name=domain_name,
                                    run_time=datetime.now(),)
    
    # save the Expectation Suite (by default to a JSON file in great_expectations/expectations folder
    # batch.save_expectation_suite(suite, domain_name, discard_failed_expectations=False)
    batch.save_expectation_suite(discard_failed_expectations=False)
    
    # Neither details nor meta (I inferred this as expected) seem to contain an expectation_suite_identifier
    # expectation_suite_identifier = list(results["details"].keys())[0]
    # expectation_suite_identifier = list(results["meta"].keys())[0]
    # print('expectation_suite_identifier')
    # print(expectation_suite_identifier)
    
    validation_result_identifier = ValidationResultIdentifier(
        expectation_suite_identifier=domain_name,
        # expectation_suite_identifier=expectation_suite_identifier,
        batch_identifier=batch.batch_kwargs.to_id(),
        run_id=run_id
    )
    
    # This doesn't work
    # context.build_data_docs()
    
    # Neither does this
    # context.build_data_docs(domain_name, results)
    # context.open_data_docs(domain_name)
    
    # Neither does this
    # context.build_data_docs(domain_name, suite_identifier)
    # context.open_data_docs(suite_identifier)
    # context.open_data_docs(validation_result_identifier)
    
    # Neither does this
    suite_identifier = ExpectationSuiteIdentifier(expectation_suite_name=domain_name)
    context.build_data_docs(domain_name, suite_identifier)
    context.open_data_docs()
    
    with open('validation_results.json', 'w') as f:
        f.write(str(results))
    
    community devrel 
    opened by abekfenn 18
  • great_expectations : not able to connect to BigQuery instance

    Describe the bug print(validator.head()) gives

    for (metric_name, metric_configuration) in metrics.items()
    KeyError: ('table.head', 'batch_id=7d60dd00fbc6ebeac60da5ea671dccb0', '04166707abe073177c1dd922d3584468')
    

    when connecting to BigQuery instance

    To Reproduce I am trying to configure the great_expectations tool

    https://greatexpectations.io/

    and here is the documentation I followed:

    https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/database/bigquery

    and I was able to create the following:

    import os
    
    from ruamel import yaml
    
    import great_expectations as ge
    from great_expectations.core.batch import BatchRequest, RuntimeBatchRequest
    gcp_project = 'my-gcp-project'
    bigquery_dataset = 'my-dataset'
    
    CONNECTION_STRING = f"bigquery://{gcp_project}/{bigquery_dataset}"
    
    context = ge.get_context()
    datasource_config = {
        "name": "my_bigquery_datasource",
        "class_name": "Datasource",
        "execution_engine": {
            "class_name": "SqlAlchemyExecutionEngine",
            "connection_string": "bigquery://my-gcp-project/my-dataset",
        },
        "data_connectors": {
            "default_runtime_data_connector_name": {
                "class_name": "RuntimeDataConnector",
                "batch_identifiers": ["default_identifier_name"],
            },
            "default_inferred_data_connector_name": {
                "class_name": "InferredAssetSqlDataConnector",
                "name": "whole_table",
            },
        },
    }
    
    datasource_config["execution_engine"]["connection_string"] = CONNECTION_STRING
    
    context.test_yaml_config(yaml.dump(datasource_config))
    
    context.add_datasource(**datasource_config)
    
    batch_request = RuntimeBatchRequest(
        datasource_name="my_bigquery_datasource",
        data_connector_name="default_runtime_data_connector_name",
        data_asset_name="json_data",  # this can be anything that identifies this data
        runtime_parameters={"query": "SELECT * from mytable where colA='someVal' LIMIT 10"},
        batch_identifiers={"default_identifier_name": "default_identifier"},
        batch_spec_passthrough={
            "bigquery_temp_table": "ge_temp"
        },  # this is the name of the table you would like to use a 'temp_table'
    )
    
    context.create_expectation_suite(
        expectation_suite_name="test_suite", overwrite_existing=True
    )
    validator = context.get_validator(
        batch_request=batch_request, expectation_suite_name="test_suite"
    )
    print(validator.head())
    
    assert isinstance(validator, ge.validator.validator.Validator)
    

    However, when I run this I get:

      Traceback (most recent call last):
      File "geBq.py", line 55, in <module>
        print(validator.head())
      File "/some/path/python3.7/site-packages/great_expectations/validator/validator.py", line 1620, in head
        "fetch_all": fetch_all,
      File "/some/path/python3.7/site-packages/great_expectations/validator/validator.py", line 364, in get_metric
        return self.get_metrics({"_": metric})["_"]
      File "/some/path/python3.7/site-packages/great_expectations/validator/validator.py", line 359, in get_metrics
        for (metric_name, metric_configuration) in metrics.items()
      File "/some/path/python3.7/site-packages/great_expectations/validator/validator.py", line 359, in <dictcomp>
        for (metric_name, metric_configuration) in metrics.items()
      KeyError: ('table.head', 'batch_id=7d60dd00fbc6ebeac60da5ea671dccb0', '04166707abe073177c1dd922d3584468')
    

    What does this error mean? How can I resolve it? I just want to connect to my BigQuery table and then run some validations on it. By the way, I am able to query this table from the BigQuery console.

    Expected behavior I should be able to successfully connect Great Expectations to my data in the BigQuery instance.

    Environment (please complete the following information):

    • Operating System: Linux
    • Great Expectations Version: 0.13.37

    Additional context Please let me know if I can add any other details

    community devrel 
    opened by abtpst 18
  • [MAINTENANCE] ID/PK squashed tests re-added

    Changes proposed in this pull request:

    • Tests that were introduced by #6656 were squashed by #6676 because of a bad merge. This PR re-adds them to the tests/checkpoint/test_checkpoint_result_format.py file
    core-team dx 
    opened by Shinnnyshinshin 2
  • [FEATURE] ID/PK Pandas query returned as all unexpected indices

    • adding query
    • first push with test
    • setting the terminology a little differently

    Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], or [CONTRIB]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our contributor checklist.

    Changes proposed in this pull request:

    After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.

    For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.

    In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in GitHub issues or Slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g. closes #123).

    Previous Design Review notes:

    Definition of Done

    Please delete options that are not relevant.

    • [ ] My code follows the Great Expectations style guide
    • [ ] I have performed a self-review of my own code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] I have added unit tests where applicable and made sure that new and existing tests are passing.
    • [ ] I have run any local integration tests and made sure that nothing is broken.

    Thank you for submitting!

    core-team dx 
    opened by Shinnnyshinshin 2
  • [MAINTENANCE] Refactor `BaseDataContext` to leverage `get_context`

    Changes proposed in this pull request:

    • Use get_context within BaseDataContext factory function

    Definition of Done

    • [x] My code follows the Great Expectations style guide
    • [x] I have performed a self-review of my own code
    • [x] I have commented my code, particularly in hard-to-understand areas
    • [x] I have run any local integration tests and made sure that nothing is broken.
    core-team platform 
    opened by cdkini 2
  • CLI can't create a new suite from a csv file with uppercase file extension

    Describe the bug When going through the steps to generate a new expectation suite from a sample batch of data, the process will fail with an "Unable to determine reader method from path" error if the batch is a CSV file with an uppercase extension (e.g., myfile.CSV).

    To Reproduce Steps to reproduce the behavior:

    1. Follow the tutorial to create a new datasource (steps 1 and 2).
    2. Rename one of the files in the data folder to have an uppercase .CSV extension
    3. Go through step 3 of the tutorial, selecting the uppercase .CSV file as the source of the batch data
    4. It will fail with the above error
    5. Do step 3 but for the other, lowercase, .csv file instead, and note that it succeeds

    Expected behavior .csv and .CSV should be treated the same way

    Environment (please complete the following information):

    • Operating System: MacOS
    • Great Expectations Version: 0.15.41

    Additional context N/A

    community devrel 
    opened by olhmr 1
  • * Implement PySpark backend for column kurtosis expectation

    Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], or [CONTRIB]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our contributor checklist.

    Changes proposed in this pull request:

    After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.

    For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.

    In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in GitHub issues or Slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g. closes #123).

    Previous Design Review notes:

    Definition of Done

    Please delete options that are not relevant.

    • [ ] My code follows the Great Expectations style guide
    • [ ] I have performed a self-review of my own code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] I have added unit tests where applicable and made sure that new and existing tests are passing.
    • [ ] I have run any local integration tests and made sure that nothing is broken.

    Thank you for submitting!

    community 
    opened by mkopec87 1
  • [BUGFIX] - Implementing deep copy of runtime_configuration variable

    Implementing a deep copy of the runtime_configuration variable in the graph_validate function

    Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], or [CONTRIB]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our contributor checklist.

    Changes proposed in this pull request:

    • closes #6565

    After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.

    For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.

    In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in GitHub issues or Slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g. closes #123).

    Previous Design Review notes:

    Definition of Done

    Please delete options that are not relevant.

    • [x] My code follows the Great Expectations style guide
    • [x] I have performed a self-review of my own code
    • [x] I have added unit tests where applicable and made sure that new and existing tests are passing.
    • [x] I have run any local integration tests and made sure that nothing is broken.

    Thank you for submitting!

    community 
    opened by tmilitino 1
Releases(0.15.41)
  • 0.15.41(Dec 15, 2022)

    • [FEATURE] enable mostly for expect_compound_columns_to_be_unique (#6533) (thanks @kimfrie)
    • [BUGFIX] Return unique list of batch_definitions (#6579) (thanks @tanelk)
    • [BUGFIX] convert_to_json_serializable does not accept numpy datetime (#6553) (thanks @tmilitino)
    • [DOCS] Clean up misc snippet violations (#6582)
    • [MAINTENANCE] Update json schema validation on usage stats to filter based on format. (#6502)
    • [MAINTENANCE] Renaming Metric Name Suffixes using Enum Values for Better Code Readability (#6575)
    • [MAINTENANCE] M/great 1433/cloud tests to async (#6543)
    • [MAINTENANCE] Add static type checking to rule.py and rule_based_profiler_result.py (#6573)
    • [MAINTENANCE] Update most contrib Expectation docstrings to be consistent and decently formatted for gallery (#6577)
    • [MAINTENANCE] Update changelogs to reflect PyPI yanks (0.15.37-0.15.39) (#6581)
    • [MAINTENANCE] Refactor ExecutionEngine.resolve_metrics() for better code readability (#6578)
    • [MAINTENANCE] adding googletag manager to docusaurus (#6584)
    • [MAINTENANCE] typo in method name (#6585)
    • [MAINTENANCE] mypy config update (#6589)
    • [MAINTENANCE] Small refactor of ExecutionEngine.resolve_metrics() for better code readability (and miscellaneous additional clean up) (#6587)
    • [MAINTENANCE] Remove ExplorerDataContext (#6592)
    • [MAINTENANCE] Leverage RendererConfiguration in existing prescriptive templates (2 of 3) (#6488)
  • 0.15.40(Dec 13, 2022)

    • [FEATURE] F/great 1397/zep checkpoints (#6525)
    • [FEATURE] Add integration test for zep sqlalchemy datasource with renderering. (#6564)
    • [BUGFIX] Patch additional deprecated call to GXCloudIdentifier.ge_cloud_id attr (#6555)
    • [BUGFIX] Patch packaging_and_installation Azure pipeline test failures (#6559)
    • [BUGFIX] Fix dependency issues to reenable RTD builds (#6560)
    • [BUGFIX] Add missing raise statement in RuntimeDataConnector logic (#6569)
    • [DOCS] doc 383: bring sql datasource configuration examples under test (#6466)
    • [MAINTENANCE] Add error handling to docs snippet checker (#6556)
    • [MAINTENANCE] ID/PK tests at Checkpoint-level (#6539)
    • [MAINTENANCE] Improve DataAssistant Parameter Builder Naming/Sanitization Mechanism and Enhance TableDomainBuilder (#6554)
    • [MAINTENANCE] Simplify computational graph assembly from metric configurations (#6563)
    • [MAINTENANCE] RTD Mobile header brand adjustment (#6557)
    • [MAINTENANCE] Use MetricsCalculator methods for ValidationGraph construction and resolution operations in Validator (#6566)
    • [MAINTENANCE] Cast type in execution_environment.py to bypass flaky mypy warnings (#6572)
    • [MAINTENANCE] Additional patch for mypy issue in execution_environment.py (#6574)
    • [MAINTENANCE] Clean up GX rename artifacts (#6561)
    • [CONTRIB] fix observed value in custom expectation (#6515) (thanks @itaise)
  • 0.15.39(Dec 10, 2022)

    • [BUGFIX] Patch faulty GX Cloud arg resolution logic (#6542)
    • [BUGFIX] Fix resolution of cloud variables. (#6546)
    • [DOCS] Fix line numbers in snippets part 2 (#6537)
    • [DOCS] Convert nested snippets to named snippets (#6541)
    • [DOCS] Simplify snippet checker logic to catch stray tags in CI (#6538)
    • [MAINTENANCE] v2 Docs link (#6534)
    • [MAINTENANCE] Fix logic around cloud_mode and ge_cloud_mode. (#6550)
  • 0.15.38(Dec 9, 2022)

    • [BUGFIX] Patch broken Cloud E2E test around Datasource CRUD (#6520)
    • [BUGFIX] Patch outdated ge_cloud_id attribute call in ValidationOperator (#6529)
    • [BUGFIX] Revert refactor to Datasource instantiation logic in DataContext (#6535)
    • [BUGFIX] Patch faulty GX Cloud arg resolution logic (#6542)
    • [DOCS] Fix line numbers in snippets (#6536)
    • [DOCS] Fix line numbers in snippets part 2 (#6537)
    • [DOCS] Convert nested snippets to named snippets (#6541)
    • [MAINTENANCE] Update Data Assistant plot images (#6521)
    • [MAINTENANCE] Clean up type hints and make test generation more elegant (#6523)
    • [MAINTENANCE] Clean up Datasource instantiation logic in DataContext (#6517)
    • [MAINTENANCE] Update Domain computation in MetricConfiguration (#6528)
    • [MAINTENANCE] v2 Docs link (#6534)
  • 0.15.37(Dec 8, 2022)

    • [FEATURE] Support to include ID/PK in validation result for each row - SQL (#6448)
    • [FEATURE] Build process and example API docs (part 1) (#6474)
    • [FEATURE] Add temp_table_schema_name support for BigQuery (#6303) (thanks @BobbyRyterski)
    • [FEATURE] Decorators for API docs (part 2) (#6497)
    • [FEATURE] Decorators for API docs (part 3) (#6504)
    • [BUGFIX] Support slack channel name with webhook also (#6481) (thanks @Kozehh)
    • [BUGFIX] Airflow operator package conflict for jsonschema (#6495)
    • [BUGFIX] Validator uses proper arguments to show progress bar at Metrics resolution-level (#6510) (thanks @tommy-watts-depop)
    • [DOCS] Schedule Algolia Crawler daily at midnight (#6323)
    • [DOCS] fix(gh-6512): fix rendering of Batch definition (#6513) (thanks @JoelGritter)
    • [MAINTENANCE] Add pretty representations for zep pydantic models (#6472)
    • [MAINTENANCE] Misc updates to PR template (#6479)
    • [MAINTENANCE] Minor cleanup for better code readability (#6478)
    • [MAINTENANCE] Move zep method from datasource to data asset. (#6477)
    • [MAINTENANCE] Staging for build gallery (#6480)
    • [MAINTENANCE] Reformat core expectation docstrings (#6423)
    • [MAINTENANCE] Move "Domain" to "great_expectations/core" to avoid circular imports; also add MetricConfiguration tests; and other clean up. (#6484)
    • [MAINTENANCE] Query the database for datetime column splitter defaults (#6482)
    • [MAINTENANCE] Placing metrics test db under try-except (#6489)
    • [MAINTENANCE] Clean up tests for more formal Batch and Validator instantiation (#6491)
    • [MAINTENANCE] Rename ge to gx across the codebase (#6487)
    • [MAINTENANCE] Upgrade CodeSee workflow to version 2 (#6498) (thanks @codesee-maps[bot])
    • [MAINTENANCE] Rename GE to GX across codebase (GREAT-1352) (#6494)
    • [MAINTENANCE] Resolve mypy issues in cli/docs.py (#6500)
    • [MAINTENANCE] Increase timeout to 15 minutes for the 2 jobs in manual-staging-json-to-prod pipeline (#6509)
    • [MAINTENANCE] Update Data Assistant plot color scheme and fonts (#6496)
    • [MAINTENANCE] Update RendererConfiguration to pydantic model (#6452)
    • [MAINTENANCE] Message for how to install Great Expectations in Cloud Composer by pinning packages (#6492)
    • [MAINTENANCE] Leverage RendererConfiguration in existing prescriptive templates (1 of 3) (#6460)
    • [MAINTENANCE] Clean up teams.yml (#6511)
    • [MAINTENANCE] Make Generated Integration Tests More Robust Using BatchDefinition and InMemoryDataContext In Validator and ExecutionEngine Instantiation (#6505)
    • [MAINTENANCE] DO NOT MERGE UNTIL DEC 8: [MAINTENANCE] Brand changes in docs (#6427)
    • [MAINTENANCE] fixed typo in nav (#6518)
    • [MAINTENANCE] Clean up GX Cloud environment variable usage (GREAT-1352) (#6501)
    • [MAINTENANCE] Update Data Assistant plot images (#6521)
    • [CONTRIB] Add uniqueness expectation (#6473) (thanks @itaise)
  • 0.15.36(Dec 2, 2022)

    • [BUGFIX] Contrib Expectation tracebacks (#6471)
    • [BUGFIX] Add additional error checking to ExpectationAnonymizer (#6467)
    • [MAINTENANCE] Add docstring for context.sources.add_postgres (#6459)
    • [MAINTENANCE] fixing type hints in metrics utils module (#6469)
    • [MAINTENANCE] Moving tutorials to great-expectations repo (#6464)
  • 0.15.35(Dec 2, 2022)

    • [FEATURE] add multiple input metric (#6373) (thanks @CarstenFrommhold)
    • [FEATURE] add multiple column metric (#6372) (thanks @CarstenFrommhold)
    • [FEATURE]: DataProfilerUnstructuredDataAssistant Integration (#6400) (thanks @micdavis)
    • [FEATURE] add new metric - query template values (#5994) (thanks @itaise)
    • [FEATURE] ZEP Config serialize as YAML (#6398)
    • [BUGFIX] Patch issue with call to ExpectationAnonymizer to ensure DataContext init events are captured (#6458)
    • [BUGFIX] Support Table and Column Names Case Non-Sensitivity Relationship Between Snowflake, Oracle, DB2, etc. DBMSs (Upper Case) and SQLAlchemy (Lower Case) Representations (#6450)
    • [BUGFIX] Metrics return value no longer returns None for unexpected_index_list - Sql and Spark (#6392)
    • [BUGFIX] Fix for mssql tests that depend on datetime to string conversion (#6449)
    • [BUGFIX] issue-4295-fix-issue (#6164) (thanks @YevgeniyaLee)
    • [BUGFIX] updated capitalone setup.py file (#6410) (thanks @micdavis)
    • [BUGFIX] Patch key-generation issue with DataContext.save_profiler() (#6405)
    • [DOCS] add configuration of anonymous_usage_statistics for documentation (#6293) (thanks @milithino)
    • [DOCS] add boto3 explanations on document (#6407) (thanks @tiruka)
    • [MAINTENANCE] [CONTRIB] Multicolumns sum equal to single column (#6446) (thanks @asafla)
    • [MAINTENANCE] [CONTRIB] add expectation - check gaps in SCD tables (#6433) (thanks @itaise)
    • [MAINTENANCE] [CONTRIB] Add no days missing expectation (#6432) (thanks @itaise)
    • [MAINTENANCE] [CONTRIB] Feature/add two tables expectation (#6429) (thanks @itaise)
    • [MAINTENANCE] [CONTRIB] Add number of unique values expectation (#6425) (thanks @itaise)
    • [MAINTENANCE] Add sorters to zep postgres datasource. (#6456)
    • [MAINTENANCE] Bump ubuntu version in CI (#6457)
    • [MAINTENANCE] Remove anticipatory multi-language support from renderers (#6426)
    • [MAINTENANCE] Remove yaml user_flow_scripts (#6454)
    • [MAINTENANCE] Additional sqlite database fixture for taxi_data - All 2020 data in single table (#6455)
    • [MAINTENANCE] Clean Up Variable Names In Test Modules, Type Hints, and Minor Refactoring For Better Code Elegance/Readability (#6444)
    • [MAINTENANCE] Update and Simplify Pandas tests for MapMetrics (#6443)
    • [MAINTENANCE] Add metadata to experimental datasource Batch class (#6442)
    • [MAINTENANCE] Small refactor (#6422)
    • [MAINTENANCE] Sorting batch IDs and typehints clean up (#6421)
    • [MAINTENANCE] Clean Up Type Hints and Minor Refactoring For Better Code Elegance/Readability (#6418)
    • [MAINTENANCE] Implement RendererConfiguration (#6412)
    • [MAINTENANCE] Cleanup For Better Code Elegance/Readability (#6406)
    • [MAINTENANCE] ZEP - GxConfig cleanup (#6404)
    • [MAINTENANCE] Migrate remaining methods from BaseDataContext (#6403)
    • [MAINTENANCE] Migrate additional CRUD methods from BaseDataContext to AbstractDataContext (#6395)
    • [MAINTENANCE] ZEP add yaml methods to all experimental models (#6401)
    • [MAINTENANCE] Remove call to verify_library_dependent_modules for pybigquery (#6394)
    • [MAINTENANCE] Make "IDDict.to_id()" serialization more efficient. (#6389)
  • 0.15.34(Nov 18, 2022)

    • [BUGFIX] Ensure packaging_and_installation CI tests against latest tag (#6386)
    • [BUGFIX] Fixed missing comma in pydantic constraints (#6391) (thanks @awburgess)
    • [BUGFIX] fix pydantic dev req file entries (#6396)
    • [DOCS] DOC-379 bring spark datasource configuration example scripts under test (#6362)
    • [MAINTENANCE] Handle both ExpectationConfiguration and ExpectationValidationResult in default Atomic renderers and cleanup include_column_name (#6380)
    • [MAINTENANCE] Add type annotations to all existing atomic renderer signatures (#6385)
    • [MAINTENANCE] move zep -> experimental package (#6378)
    • [MAINTENANCE] Migrate additional methods from BaseDataContext to other parts of context hierarchy (#6388)
  • 0.15.33(Nov 17, 2022)

    • [FEATURE] POC ZEP Config Loading (#6320)
    • [BUGFIX] Fix issue with misaligned indentation in docs snippets (#6339)
    • [BUGFIX] Use requirements.txt file when installing linting/static check dependencies in CI (#6368)
    • [BUGFIX] Patch nested snippet indentation issues within remark-named-snippets plugin (#6376)
    • [BUGFIX] Ensure packaging_and_installation CI tests against latest tag (#6386)
    • [DOCS] DOC-308 update CLI command in docs when working with RBPs instead of Data Assistants (#6222)
    • [DOCS] DOC-366 updates to docs in support of branding updates (#5766)
    • [DOCS] Add yarn snippet-check command (#6351)
    • [MAINTENANCE] Add missing one-line docstrings and try to make the others consistent (#6340)
    • [MAINTENANCE] Refactor variable aggregation/substitution logic into ConfigurationProvider hierarchy (#6321)
    • [MAINTENANCE] In ExecutionEngine: Make variable names and usage more descriptive of their purpose. (#6342)
    • [MAINTENANCE] Move Cloud-specific enums to cloud_constants.py (#6349)
    • [MAINTENANCE] Refactor out termcolor dependency (#6348)
    • [MAINTENANCE] Zep PostgresDatasource returns a list of batches. (#6341)
    • [MAINTENANCE] Refactor usage_stats_opt_out method in DataContext (#5339)
    • [MAINTENANCE] Fix computed metrics type hint in ExecutionEngine.resolve_metrics() method (#6347)
    • [MAINTENANCE] Subject: Support to include ID/PK in validation result for each row t… (#5876) (thanks @abekfenn)
    • [MAINTENANCE] Pin mypy to 0.990 (#6361)
    • [MAINTENANCE] Misc cleanup of GX Cloud helpers (#6352)
    • [MAINTENANCE] Update column_reflection_fallback to also use schema name for Trino (#6350)
    • [MAINTENANCE] Bump version of mypy in contrib CLI (#6370)
    • [MAINTENANCE] Move config variable substitution logic into ConfigurationProvider (#6345)
    • [MAINTENANCE] Removes comment in code that was causing confusion to some users. (#6366)
    • [MAINTENANCE] minor metrics typing (#6374)
    • [MAINTENANCE] Make ConfigurationProvider and ConfigurationSubstitutor private (#6375)
    • [MAINTENANCE] Rename GeCloudStoreBackend to GXCloudStoreBackend (#6377)
    • [MAINTENANCE] Cleanup Metrics and ExecutionEngine methods (#6371)
    • [MAINTENANCE] F/great 1314/integrate zep in core (#6358)
    • [MAINTENANCE] Loosen pydantic version requirement (#6384)
  • 0.15.32(Nov 10, 2022)

    • [BUGFIX] Patch broken CloudNotificationAction tests (#6327)
    • [BUGFIX] add create_temp_table flag to ExecutionEngineConfigSchema (#6331) (thanks @tommy-watts-depop)
    • [BUGFIX] MapMetrics now return partial_unexpected values for SUMMARY format (#6334)
    • [DOCS] Re-writes "how to implement custom notifications" as "How to get Data Docs URLs for use in custom Validation Actions" (#6281)
    • [DOCS] Removes deprecated expectation notebook exploration doc (#6298)
    • [DOCS] Removes a number of unused & deprecated docs (#6300)
    • [DOCS] Prioritizes Onboarding Data Assistant in ToC (#6302)
    • [DOCS] Add ZenML into integration table in Readme (#6144) (thanks @dnth)
    • [DOCS] add pypi release badge (#6324)
    • [MAINTENANCE] Remove unneeded BaseDataContext.get_batch_list (#6291)
    • [MAINTENANCE] Clean up implicit Optional errors flagged by mypy (#6319)
    • [MAINTENANCE] Add manual prod flags to core Expectations (#6278)
    • [MAINTENANCE] Fallback to isnot method if is_not is not available (old sqlalchemy) (#6318)
    • [MAINTENANCE] Add ZEP postgres datasource. (#6274)
    • [MAINTENANCE] Delete "metric_dependencies" from MetricConfiguration constructor arguments (#6305)
    • [MAINTENANCE] Clean up DataContext (#6304)
    • [MAINTENANCE] Deprecate save_changes flag on Datasource CRUD (#6258)
    • [MAINTENANCE] Deprecate great_expectations.render.types package (#6315)
    • [MAINTENANCE] Update range of allowable sqlalchemy versions (#6328)
    • [MAINTENANCE] Fixing checkpoint types (#6325)
    • [MAINTENANCE] Fix column_reflection_fallback for Trino and minor logging/testing improvements (#6218)
    • [MAINTENANCE] Change the number of expected Expectations in the 'quick check' stage of build_gallery pipeline (#6333)
  • 0.15.31(Nov 4, 2022)

    • [BUGFIX] Include all requirement files in the sdist (#6292) (thanks @xhochy)
    • [DOCS] Updates outdated batch_request snippet in Terms (#6283)
    • [DOCS] Update Conditional Expectations doc w/ current availability (#6279)
    • [DOCS] Remove outdated Data Discovery page and all references (#6288)
    • [DOCS] Remove reference/evaluation_parameters page and all references (#6294)
    • [DOCS] Removing deprecated Custom Metrics doc (#6282)
    • [DOCS] Re-writes "how to implement custom notifications" as "How to get Data Docs URLs for use in custom Validation Actions" (#6281)
    • [DOCS] Removes deprecated expectation notebook exploration doc (#6298)
    • [MAINTENANCE] Move RuleState into rule directory. (#6284)
  • 0.15.30(Nov 3, 2022)

    • [FEATURE] Add zep datasources to data context. (#6255)
    • [BUGFIX] Iterate through GeCloudIdentifiers to find the suite ID from the name (#6243)
    • [BUGFIX] Update default base url for cloud API (#6176)
    • [BUGFIX] Pin termcolor to below 2.1.0 due to breaking changes in lib's TTY parsing logic (#6257)
    • [BUGFIX] InferredAssetSqlDataConnector include_schema_name introspection of identical table names in different schemas (#6166)
    • [BUGFIX] Fix docs-integration tests, and temporarily pin sqlalchemy (#6268)
    • [BUGFIX] Fix serialization for contrib packages (#6266)
    • [BUGFIX] Ensure that Datasource credentials are not persisted to Cloud/disk (#6254)
    • [DOCS] Updates package contribution references (#5885)
    • [MAINTENANCE] Maintenance/great 1103/great 1318/alexsherstinsky/validation graph/refactor validation graph usage 2022 10 20 248 (#6228)
    • [MAINTENANCE] Refactor instances of noqa: F821 Flake8 directive (#6220)
    • [MAINTENANCE] Logo URI ref in data_docs (#6246)
    • [MAINTENANCE] fix typos in docstrings (#6247)
    • [MAINTENANCE] Isolate Trino/MSSQL/MySQL tests in dev CI (#6231)
    • [MAINTENANCE] Split up compatability and comprehensive stages in dev CI to improve performance (#6245)
    • [MAINTENANCE] ZEP POC - Asset Type Registration (#6194)
    • [MAINTENANCE] Add Trino CLI support and bump Trino version (#6215) (thanks @hovaesco)
    • [MAINTENANCE] Delete unneeded Rule attribute property (#6264)
    • [MAINTENANCE] Small clean-up of Marshmallow warnings (missing parameter changed to load_default as of 3.13) (#6213)
    • [MAINTENANCE] Move .png files out of project root (#6249)
    • [MAINTENANCE] Cleanup expectation.py attributes (#6265)
    • [MAINTENANCE] Further parallelize test runs in dev CI (#6267)
    • [MAINTENANCE] GCP Integration Pipeline fix (#6259)
    • [MAINTENANCE] mypy warn_unused_ignores (#6270)
    • [MAINTENANCE] ZEP - Datasource base class (#6263)
    • [MAINTENANCE] Reverting marshmallow version bump (#6271)
    • [MAINTENANCE] type hints cleanup in Rule-Based Profiler (#6272)
    • [MAINTENANCE] Remove unused f-strings (#6248)
    • [MAINTENANCE] Make ParameterBuilder.resolve_evaluation_dependencies() into instance (rather than utility) method (#6273)
    • [MAINTENANCE] Test definition for ExpectColumnValueZScoresToBeLessThan (#6229)
    • [MAINTENANCE] Make RuleState constructor argument ordering consistent with standard pattern. (#6275)
    • [MAINTENANCE] [REQUEST] Please allow Rachel to unblock blockers (#6253)
  • 0.15.29(Oct 28, 2022)

    • [FEATURE] Add support to AWS Glue Data Catalog (#5123) (thanks @lccasagrande)
    • [FEATURE] Added pairwise expectation 'expect_column_pair_values_to_be_in_set' (#6097) (thanks @Arnavkar); see the usage sketch after this release's notes
    • [BUGFIX] Adjust condition in RenderedAtomicValueSchema.clean_null_attrs (#6168)
    • [BUGFIX] Add py to dev dependencies to circumvent compatibility issues with pytest==7.2.0 (#6202)
    • [BUGFIX] Fix test_package_dependencies.py to include py lib (#6204)
    • [BUGFIX] Fix logic in ExpectationDiagnostics._check_renderer_methods method (#6208)
    • [BUGFIX] Patch issue with empty config variables file raising TypeError (#6216)
    • [BUGFIX] Release patch for Azure env vars (#6233)
    • [BUGFIX] Cloud Data Context should overwrite existing suites based on ge_cloud_id instead of name (#6234)
    • [BUGFIX] Add env vars to Pytest min versions Azure stage (#6239)
    • [DOCS] doc-297: update the create Expectations overview page for Data Assistants (#6212)
    • [DOCS] DOC-378: bring example scripts for pandas configuration guide under test (#6141)
    • [MAINTENANCE] Add unit test for MetricsCalculator.get_metric() Method -- as an example template (#6179)
    • [MAINTENANCE] ZEP MetaDatasource POC (#6178)
    • [MAINTENANCE] Update scope_check in Azure CI to trigger on changed .py source code files (#6185)
    • [MAINTENANCE] Move test_yaml_config to a separate class (#5487)
    • [MAINTENANCE] Changed profiler to Data Assistant in CLI, docs, and tests (#6189)
    • [MAINTENANCE] Update default GE_USAGE_STATISTICS_URL in test docker image. (#6192)
    • [MAINTENANCE] Re-add a renamed test definition file (#6182)
    • [MAINTENANCE] Refactor method parse_evaluation_parameter (#6191)
    • [MAINTENANCE] Migrate methods from BaseDataContext to AbstractDataContext (#6188)
    • [MAINTENANCE] Rename cfe to v3_api (#6190)
    • [MAINTENANCE] Test Trino doc examples with test_script_runner.py (#6198)
    • [MAINTENANCE] Cleanup of Regex ParameterBuilder (#6196)
    • [MAINTENANCE] Apply static type checking to expectation.py (#6173)
    • [MAINTENANCE] Remove version matrix from dev CI pipeline to improve performance (#6203)
    • [MAINTENANCE] Rename CloudMigrator.retry_unsuccessful_validations (#6206)
    • [MAINTENANCE] Add validate_configuration method to expect_table_row_count_to_equal_other_table (#6209)
    • [MAINTENANCE] Replace deprecated iteritems with items (#6205)
    • [MAINTENANCE] Add instructions for setting up the test_ci database (#6211)
    • [MAINTENANCE] Add E2E tests for Cloud-backed Datasource CRUD (#6186)
    • [MAINTENANCE] Execution Engine linting & partial typing (#6210)
    • [MAINTENANCE] Test definition for ExpectColumnValuesToBeJsonParsable, including a fix for Spark (#6207)
    • [MAINTENANCE] Port over usage statistics enabled methods from BaseDataContext to AbstractDataContext (#6201)
    • [MAINTENANCE] Remove temporary dependency on py (#6217)
    • [MAINTENANCE] Adding type hints to DataAssistant implementations (#6224)
    • [MAINTENANCE] Remove AWS config file dependencies and use existing env vars in CI/CD (#6227)
    • [MAINTENANCE] Make UsageStatsEvents a StrEnum (#6225)
    • [MAINTENANCE] Move all requirements-dev*.txt files to separate dir (#6223)
    • [MAINTENANCE] Maintenance/great 1103/great 1318/alexsherstinsky/validation graph/refactor validation graph usage 2022 10 20 248 (#6228)
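
    The #6097 entry above adds the pairwise expectation expect_column_pair_values_to_be_in_set. The sketch below is one illustrative way to call it through a Validator backed by an in-memory Pandas DataFrame; the datasource, connector, asset, suite, and column names are all made up for the example and are not part of the release.

    ```python
    import pandas as pd
    import great_expectations as gx
    from great_expectations.core.batch import RuntimeBatchRequest

    context = gx.get_context()

    # Illustrative in-memory Pandas Datasource with a RuntimeDataConnector.
    context.add_datasource(
        name="pandas_runtime",
        class_name="Datasource",
        execution_engine={"class_name": "PandasExecutionEngine"},
        data_connectors={
            "runtime_connector": {
                "class_name": "RuntimeDataConnector",
                "batch_identifiers": ["run_id"],
            }
        },
    )

    df = pd.DataFrame(
        {"currency": ["USD", "EUR", "EUR"], "country": ["US", "DE", "FR"]}
    )

    validator = context.get_validator(
        batch_request=RuntimeBatchRequest(
            datasource_name="pandas_runtime",
            data_connector_name="runtime_connector",
            data_asset_name="payments",
            runtime_parameters={"batch_data": df},
            batch_identifiers={"run_id": "example"},
        ),
        create_expectation_suite_with_name="payments_suite",
    )

    # Each (currency, country) pair must appear in the allowed set of pairs.
    result = validator.expect_column_pair_values_to_be_in_set(
        column_A="currency",
        column_B="country",
        value_pairs_set=[("USD", "US"), ("EUR", "DE"), ("EUR", "FR")],
    )
    print(result.success)
    ```
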
  • 0.15.28(Oct 20, 2022)

    • [FEATURE] Initial zep datasource protocol. (#6153)
    • [FEATURE] Introduce BatchManager to manage Batch objects used by Validator and BatchData used by ExecutionEngine (#6156)
    • [FEATURE] Add support for Vertica dialect (#6145) (thanks @viplazylmht)
    • [FEATURE] Introduce MetricsCalculator and Refactor Redundant Code out of Validator (#6165)
    • [BUGFIX] SQLAlchemy selectable Bug fix (#6159) (thanks @tommy-watts-depop)
    • [BUGFIX] Parameterize usage stats endpoint in test dockerfile. (#6169)
    • [BUGFIX] B/great 1305/usage stats endpoint (#6170)
    • [BUGFIX] Ensure that spaces are recognized in named snippets (#6172)
    • [DOCS] Clarify wording for interactive mode in databricks (#6154)
    • [DOCS] fix source activate command (#6161) (thanks @JGrzywacz)
    • [DOCS] Update version in runtime.txt to fix breaking Netlify builds (#6181)
    • [DOCS] Clean up snippets and line number validation in docs (#6142)
    • [MAINTENANCE] Add Enums for renderer types (#6112)
    • [MAINTENANCE] Minor cleanup in preparation for Validator refactoring into separate concerns (#6155)
    • [MAINTENANCE] add the internal GE_DATA_CONTEXT_ID env var to the docker file (#6122)
    • [MAINTENANCE] Rollback setting GE_DATA_CONTEXT_ID in docker image. (#6163)
    • [MAINTENANCE] disable ge_cloud_mode when specified, detect misconfiguration (#6162)
    • [MAINTENANCE] Re-add missing Expectations to gallery and include package names (#6171)
    • [MAINTENANCE] Use from __future__ import annotations to clean up type hints (#6127)
    • [MAINTENANCE] Make sure that quick stage check returns 0 if there are no problems (#6177)
    • [MAINTENANCE] Remove SQL for expect_column_discrete_entropy_to_be_between (#6180)
  • 0.15.27(Oct 13, 2022)

    • [FEATURE] Add logging/warnings to GX Cloud migration process (#6106)
    • [FEATURE] Introduction of updated gx.get_context() method that returns correct DataContext-type (#6104); see the usage sketch after this release's notes
    • [FEATURE] Contribute StatisticsDataAssistant and GrowthNumericDataAssistant (both experimental) (#6115)
    • [BUGFIX] add OBJECT_TYPE_NAMES to the JsonSchemaProfiler - issue #6109 (#6110) (thanks @OphelieC)
    • [BUGFIX] Fix example Set-Based Column Map Expectation template import (#6134)
    • [BUGFIX] Regression due to GESqlDialect Enum for Hive (#6149)
    • [DOCS] Support for named snippets in documentation (#6087)
    • [MAINTENANCE] Clean up test_migrate=True Cloud migrator output (#6119)
    • [MAINTENANCE] Creation of Hackathon Packages (#4587)
    • [MAINTENANCE] Rename GCP Integration Pipeline (#6121)
    • [MAINTENANCE] Change log levels used in CloudMigrator (#6125)
    • [MAINTENANCE] Bump version of sqlalchemy-redshift from 0.7.7 to 0.8.8 (#6082)
    • [MAINTENANCE] self_check linting & initial type-checking (#6126)
    • [MAINTENANCE] Update per Clickhouse multiple same aliases Bug (#6128) (thanks @adammrozik)
    • [MAINTENANCE] Only update existing rendered_content if rendering does not fail with new InlineRenderer failure message (#6091)
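
    The #6104 entry above introduces the updated gx.get_context() entry point. A minimal sketch of typical usage follows; which DataContext subtype is returned depends on the environment (file-backed project, ephemeral in-memory context, or GX Cloud), so treat the printed output as illustrative.

    ```python
    import great_expectations as gx

    # Returns the DataContext type appropriate to the environment: a file-backed
    # context when a great_expectations/ project directory is found, an in-memory
    # context otherwise, or a Cloud-backed context when Cloud credentials are set.
    context = gx.get_context()

    print(type(context).__name__)
    print(context.list_datasources())
    print(context.list_expectation_suite_names())
    ```
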
  • 0.15.26(Sep 29, 2022)

    • [FEATURE] Enable sending of ConfigurationBundle payload in HTTP request to Cloud backend (#6083)
    • [FEATURE] Send user validation results to Cloud backend during migration (#6102)
    • [BUGFIX] Fix bigquery crash when using "in" with a boolean column (#6071)
    • [BUGFIX] Fix serialization error when rendering kl_divergence (#6084) (thanks @roblim)
    • [BUGFIX] Enable top-level parameters in Data Assistants accessed via dispatcher (#6077)
    • [BUGFIX] Patch issue around DataContext.save_datasource not sending class_name in result payload (#6108)
    • [DOCS] DOC-377 add missing dictionary in configured asset datasource portion of Pandas and Spark configuration guides (#6081)
    • [DOCS] DOC-376 finalize definition for Data Assistants in technical terms (#6080)
    • [DOCS] Update docs-integration test due to new whole_table splitter behavior (#6103)
    • [DOCS] How to create a Custom Multicolumn Map Expectation (#6101)
    • [MAINTENANCE] Patch broken Cloud E2E test (#6079)
    • [MAINTENANCE] Bundle data context config and other artifacts for migration (#6068)
    • [MAINTENANCE] Add datasources to ConfigurationBundle (#6092)
    • [MAINTENANCE] Remove unused config files from root of GX repo (#6090)
    • [MAINTENANCE] Add data_context_id property to ConfigurationBundle (#6094)
    • [MAINTENANCE] Move all Cloud migrator logic to separate directory (#6100)
    • [MAINTENANCE] Update Algolia scripts for new fields and replica indices (#6049) (thanks @winrp17)
    • [MAINTENANCE] initial Datasource typings (#6099)
    • [MAINTENANCE] Data context migrate to cloud event (#6095)
    • [MAINTENANCE] Bundling tests with empty context configs (#6107)
    • [MAINTENANCE] Fixing a typo (#6113)
  • 0.15.25(Sep 23, 2022)

    • [FEATURE] Since value set in expectation kwargs is list of strings, do not emit expect_column_values_to_be_in_set for datetime valued columns (#6046)
    • [FEATURE] Add failed expectations list to Slack message (#5812) (thanks @itaise); see the configuration sketch after this release's notes
    • [FEATURE] Enable only ExactNumericRangeEstimator and QuantilesNumericRangeEstimator in "datetime_columns_rule" of OnboardingDataAssistant (#6063)
    • [BUGFIX] numpy typing behind if TYPE_CHECKING (#6076)
    • [DOCS] Update "How to create an Expectation Suite with the Onboarding Data Assistant" (#6050)
    • [DOCS] How to get one or more Batches of data from a configured Datasource (#6043)
    • [DOCS] DOC-298 Data Assistant technical term page (#6057)
    • [DOCS] Update OnboardingDataAssistant documentation (#6059)
    • [MAINTENANCE] Clean up of DataAssistant tests that depend on Jupyter notebooks (#6039)
    • [MAINTENANCE] AbstractDataContext.datasource_save() test simplifications (#6052)
    • [MAINTENANCE] Rough architecture for cloud migration tool (#6054)
    • [MAINTENANCE] Include git commit info when building docker image. (#6060)
    • [MAINTENANCE] Allow CloudDataContext to retrieve and initialize its own project config (#6006)
    • [MAINTENANCE] Removing Jupyter notebook-based tests for DataAssistants (#6062)
    • [MAINTENANCE] pinned dremio, fixed linting (#6067)
    • [MAINTENANCE] usage-stats, & utils.py typing (#5925)
    • [MAINTENANCE] Refactor external HTTP request logic into a Session factory function (#6007)
    • [MAINTENANCE] Remove tag validity stage from release pipeline (#6069)
    • [MAINTENANCE] Remove unused test fixtures from test suite (#6058)
    • [MAINTENANCE] Remove outdated release files (#6074)
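
    The #5812 entry above adds the list of failed Expectations to Slack messages. The sketch below shows one way a Checkpoint can be wired to send such notifications on failure; the checkpoint, datasource, and suite names are illustrative, and the webhook value is assumed to be supplied via config_variables.yml or an environment variable.

    ```python
    import great_expectations as gx

    context = gx.get_context()

    context.add_checkpoint(
        name="payments_checkpoint",  # illustrative name
        config_version=1.0,
        class_name="Checkpoint",
        validations=[
            {
                "batch_request": {
                    "datasource_name": "pandas_runtime",
                    "data_connector_name": "runtime_connector",
                    "data_asset_name": "payments",
                },
                "expectation_suite_name": "payments_suite",
            }
        ],
        action_list=[
            {
                "name": "store_validation_result",
                "action": {"class_name": "StoreValidationResultAction"},
            },
            {
                "name": "notify_slack_on_failure",
                "action": {
                    "class_name": "SlackNotificationAction",
                    "slack_webhook": "${validation_notification_slack_webhook}",
                    "notify_on": "failure",
                    "renderer": {
                        "module_name": "great_expectations.render.renderer.slack_renderer",
                        "class_name": "SlackRenderer",
                    },
                },
            },
        ],
    )
    ```

    When a run of this Checkpoint fails, the Slack message rendered by SlackRenderer includes the list of failed Expectations introduced by #5812.
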
  • 0.15.24(Sep 19, 2022)

    • [FEATURE] context.save_datasource (#6009)
    • [BUGFIX] Standardize ConfiguredAssetSqlDataConnector config in datasource new CLI workflow (#6044)
    • [DOCS] DOC-371 update the getting started tutorial for data assistants (#6024)
    • [DOCS] DOCS-369 sql data connector configuration guide (#6002)
    • [MAINTENANCE] Remove outdated entry from release schedule JSON (#6032)
    • [MAINTENANCE] Clean up Spark schema tests to have proper names (#6033)
  • 0.15.23(Sep 16, 2022)

    • [FEATURE] do not require expectation_suite_name in DataAssistantResult.show_expectations_by...() methods (#5976); see the usage sketch after this release's notes
    • [FEATURE] Refactor PartitionParameterBuilder into dedicated ValueCountsParameterBuilder and HistogramParameterBuilder (#5975)
    • [FEATURE] Implement default sorting for batches based on selected splitter method (#5924)
    • [FEATURE] Make OnboardingDataAssistant default profiler in CLI SUITE NEW (#6012)
    • [FEATURE] Enable omission of rounding of decimals in NumericMetricRangeMultiBatchParameterBuilder (#6017)
    • [FEATURE] Enable non-default sorters for ConfiguredAssetSqlDataConnector (#5993)
    • [FEATURE] Data Assistant plot method indication of total metrics and expectations count (#6016)
    • [BUGFIX] Addresses issue with ExpectCompoundColumnsToBeUnique renderer (#5970)
    • [BUGFIX] Fix failing run_profiler_notebook test (#5983)
    • [BUGFIX] Handle case when only one unique "column.histogram" bin value is found (#5987)
    • [BUGFIX] Update get_validator test assertions due to change in fixture batches (#5989)
    • [BUGFIX] Fix use of column.partition metric in HistogramSingleBatchParameterBuilder to more accurately handle errors (#5990)
    • [BUGFIX] Make Spark implementation of "column.value_counts" metric more robust to None/NaN column values (#5996)
    • [BUGFIX] Filter out np.nan values (just like None values) as part of ColumnValueCounts._spark() implementation (#5998)
    • [BUGFIX] Handle case when only one unique "column.histogram" bin value is found with proper type casting (#6001)
    • [BUGFIX] ColumnMedian._sqlalchemy() needs to handle case of single-value column (#6011)
    • [BUGFIX] Patch broken save_expectation_suite behavior with Cloud-backed DataContext (#6004)
    • [BUGFIX] Clean quantitative metrics DataFrames in Data Assistant plotting (#6023)
    • [BUGFIX] Defer pprint in ExpectationSuite.show_expectations_by_expectation_type() due to Jupyter rate limit (#6026)
    • [BUGFIX] Use UTC TimeZone (rather than Local Time Zone) for Rule-Based Profiler DateTime Conversions (#6028)
    • [DOCS] Update snippet refs in "How to create an Expectation Suite with the Onboarding Data Assistant" (#6014)
    • [MAINTENANCE] Randomize the non-comprehensive tests (#5968)
    • [MAINTENANCE] DatasourceStore refactoring (#5941)
    • [MAINTENANCE] Expectation suite init unit tests + types (#5957)
    • [MAINTENANCE] Expectation suite new unit tests for add_citation (#5966)
    • [MAINTENANCE] Updated release schedule (#5977)
    • [MAINTENANCE] Unit tests for CheckpointStore (#5967)
    • [MAINTENANCE] Enhance unit tests for ExpectationSuite.isEquivalentTo (#5979)
    • [MAINTENANCE] Remove unused fixtures from test suite (#5965)
    • [MAINTENANCE] Update to MultiBatch Notebook to include Configured - Sql (#5945)
    • [MAINTENANCE] Update to MultiBatch Notebook to include Inferred - Sql (#5958)
    • [MAINTENANCE] Add reverse assertion for isEquivalentTo tests (#5982)
    • [MAINTENANCE] Unit test enhancements ExpectationSuite.eq() (#5984)
    • [MAINTENANCE] Refactor DataContext.__init__ to move Cloud-specific logic to CloudDataContext (#5981)
    • [MAINTENANCE] Set up cloud integration tests with Azure Pipelines (#5995)
    • [MAINTENANCE] Example of splitter_method at Asset and DataConnector level (#6000)
    • [MAINTENANCE] Replace splitter_method strings with SplitterMethod Enum and leverage GESqlDialect Enum where applicable (#5980)
    • [MAINTENANCE] Ensure that DataContext.add_datasource works with nested DataConnector ids (#5992)
    • [MAINTENANCE] Remove cloud integration tests from azure-pipelines.yml (#5997)
    • [MAINTENANCE] Unit tests for GeCloudStoreBackend (#5999)
    • [MAINTENANCE] Parameterize pg hostname in jupyter notebooks (#6005)
    • [MAINTENANCE] Unit tests for Validator (#5988)
    • [MAINTENANCE] Add unit tests for SimpleSqlalchemyDatasource (#6008)
    • [MAINTENANCE] Remove dgtest from dev pipeline (#6003)
    • [MAINTENANCE] Remove deprecated account_id from GX Cloud integrations (#6010)
    • [MAINTENANCE] Added perf considerations to onboarding assistant notebook (#6022)
    • [MAINTENANCE] Redshift specific temp table code path (#6021)
    • [MAINTENANCE] Update datasource new workflow to enable ConfiguredAssetDataConnector usage with SQL-backed Datasources (#6019)
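
    The #5976 and #6012 entries above touch the Onboarding Data Assistant workflow and the DataAssistantResult display methods. Below is a minimal sketch of running the assistant programmatically; the datasource, connector, asset, and suite names are illustrative and assume a multi-batch datasource has already been configured.

    ```python
    import great_expectations as gx
    from great_expectations.core.batch import BatchRequest

    context = gx.get_context()

    # Multi-batch BatchRequest over an illustrative, pre-configured datasource.
    batch_request = BatchRequest(
        datasource_name="taxi_datasource",
        data_connector_name="monthly_connector",
        data_asset_name="yellow_tripdata",
    )

    # Run the Onboarding Data Assistant over every batch matched by the request.
    result = context.assistants.onboarding.run(batch_request=batch_request)

    # As of this release, no expectation_suite_name argument is required here.
    result.show_expectations_by_expectation_type()

    # Persist the generated suite under a name of your choosing.
    suite = result.get_expectation_suite(expectation_suite_name="yellow_tripdata_suite")
    context.save_expectation_suite(expectation_suite=suite)
    ```
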
  • 0.15.22(Sep 8, 2022)

    • [BUGFIX] Data Assistant plotting with zero expectations produced (#5934)
    • [BUGFIX] Don't include abstract Expectation classes in _retrieve_expectations_from_module (#5947)
    • [BUGFIX] Ensure that ParameterBuilder implementations in Rule Based Profiler properly handle SQL DECIMAL type (#5896)
    • [BUGFIX] Making an all-NULL column handling in RuleBasedProfiler more robust (#5937)
    • [BUGFIX] Prevent "division by zero" errors in Rule-Based Profiler calculations when Batch has zero rows (#5960)
    • [BUGFIX] Spark column.distinct_values no longer returns entire table distinct values (#5969)
    • [BUGFIX] prefix and suffix asset names are only relevant for InferredSqlAlchemyDataConnector (#5950)
    • [DOCS] DOC-368 spelling correction (#5912)
    • [FEATURE] Allowing schema to be passed in as batch_spec_passthrough in Spark (#5900)
    • [FEATURE] DataAssistants Example Notebook - Spark (#5919)
    • [FEATURE] Improve Slack error condition (#5818) (thanks @Itai Sevitt)
    • [FEATURE] SparkDFExecutionEngine able to use schema (#5917)
    • [MAINTENANCE] Certify InferredAssetSqlDataConnector and ConfiguredAssetSqlDataConnector (#5847)
    • [MAINTENANCE] Add missing import for ConfigurationIdentifier (#5943)
    • [MAINTENANCE] Add x-fails to flaky Cloud tests for purposes of 0.15.22 (#5964)
    • [MAINTENANCE] Bump Marshmallow upper bound to work with Airflow operator (#5952)
    • [MAINTENANCE] Clean up some new datasource sql data connector tests. (#5918)
    • [MAINTENANCE] Mark DBFS tests with @pytest.mark.integration (#5931)
    • [MAINTENANCE] Mark all tests within tests/data_context/stores dir (#5913)
    • [MAINTENANCE] Mark all tests within tests/validator (#5926)
    • [MAINTENANCE] Mark tests within tests/rule_based_profiler (#5930)
    • [MAINTENANCE] More unit tests for Stores (#5953)
    • [MAINTENANCE] Move Store test utils from source code to tests (#5932)
    • [MAINTENANCE] Remove xfails from passing tests in preparation for 0.15.21 release (#5908)
    • [MAINTENANCE] Reset globals modified in tests (#5936)
    • [MAINTENANCE] Run comprehensive tests in a random order (#5942)
    • [MAINTENANCE] Run spark and onboarding data assistant test in their own jobs. (#5951)
    • [MAINTENANCE] Unit tests for ConfigurationStore (#5948)
    • [MAINTENANCE] Unit tests for ValidationGraph and related classes (#5954)
    • [MAINTENANCE] Unit tests for data_context/store (#5923)
    • [MAINTENANCE] Update to OnboardingDataAssistant Notebook - Sql (#5939)
    • [MAINTENANCE] Use DataContext to ignore progress bars (#5959)
    • [MAINTENANCE] Use datasource config in add_datasource support methods (#5901)
    • [MAINTENANCE] Cleanup to allow docker test target to run tests in random order (#5915)
  • 0.15.21(Sep 1, 2022)

    • [FEATURE] Add include_rendered_content to get_expectation_suite and get_validation_result (#5853); see the usage sketch after this release's notes
    • [FEATURE] Add tags as an optional setting for the OpsGenieAlertAction (#5855) (thanks @stevewb1993)
    • [BUGFIX] Ensure that delete_expectation_suite returns proper boolean result (#5878)
    • [BUGFIX] many small bugfixes (#5881)
    • [BUGFIX] Fix typo in default value of "ignore_row_if" kwarg for MulticolumnMapExpectation (#5860) (thanks @mkopec87)
    • [BUGFIX] Patch issue with checkpoint_identifier within Checkpoint.run workflow (#5894)
    • [BUGFIX] Ensure that DataContext.add_checkpoint() updates existing objects in GX Cloud (#5895)
    • [DOCS] DOC-364 how to configure a spark datasource (#5840)
    • [MAINTENANCE] Unit Tests Pipeline step (#5838)
    • [MAINTENANCE] Unit tests to ensure coverage over Datasource caching in DataContext (#5839)
    • [MAINTENANCE] Add entries to release schedule (#5833)
    • [MAINTENANCE] Properly label DataAssistant tests with @pytest.mark.integration (#5845)
    • [MAINTENANCE] Add additional unit tests around Datasource caching (#5844)
    • [MAINTENANCE] Mark miscellaneous tests with @pytest.mark.unit (#5846)
    • [MAINTENANCE] datasource, data_context, core typing, lint fixes (#5824)
    • [MAINTENANCE] add --ignore-suppress and --ignore-only-for to build_gallery.py with bugfixes (#5802)
    • [MAINTENANCE] Remove pyparsing pin for <3.0 (#5849)
    • [MAINTENANCE] Finer type exclude (#5848)
    • [MAINTENANCE] use id instead of id_ (#5775)
    • [MAINTENANCE] Add data connector names in datasource config (#5778)
    • [MAINTENANCE] init tests for dict and json serializers (#5854)
    • [MAINTENANCE] Remove Partitioning and Quantiles metrics computations from DateTime Rule of OnboardingDataAssistant (#5862)
    • [MAINTENANCE] Update ExpectationSuite CRUD on DataContext to recognize Cloud ids (#5836)
    • [MAINTENANCE] Handle Pandas warnings in Data Assistant plots (#5863)
    • [MAINTENANCE] Misc cleanup of test_expectation_suite_crud.py (#5868)
    • [MAINTENANCE] Remove vendored marshmallow__shade (#5866)
    • [MAINTENANCE] don't force using the stand alone mock (#5871)
    • [MAINTENANCE] Update expectation_gallery pipeline (#5874)
    • [MAINTENANCE] run unit-tests on a target package (#5869)
    • [MAINTENANCE] add pytest-timeout (#5857)
    • [MAINTENANCE] Label tests in tests/core with @pytest.mark.unit and @pytest.mark.integration (#5879)
    • [MAINTENANCE] new invoke test flags (#5880)
    • [MAINTENANCE] JSON Serialize RowCondition and MetricBundle computation result to enable IDDict.to_id() for SparkDFExecutionEngine (#5883)
    • [MAINTENANCE] increase the pytest-timeout timeout value during unit-testing step (#5884)
    • [MAINTENANCE] Add @pytest.mark.slow throughout test suite (#5882)
    • [MAINTENANCE] Add test_expectation_suite_send_usage_message (#5886)
    • [MAINTENANCE] Mark existing tests as unit or integration (#5890)
    • [MAINTENANCE] Convert integration tests to unit (#5891)
    • [MAINTENANCE] Update distinct metric dependencies and implementations (#5811)
    • [MAINTENANCE] Add slow pytest marker to config and sort them alphabetically. (#5892)
    • [MAINTENANCE] Adding serialization tests for Spark (#5897)
    • [MAINTENANCE] Improve existing expectation suite unit tests (phase 1) (#5898)
    • [MAINTENANCE] SqlAlchemyExecutionEngine case for SQL Alchemy Select and TextualSelect due to SADeprecationWarning (#5902)
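
    The #5853 entry above adds the include_rendered_content flag to get_expectation_suite and get_validation_result. A small sketch of what that enables follows; the suite name is illustrative and the suite is assumed to already exist in the configured Expectations Store.

    ```python
    import great_expectations as gx

    context = gx.get_context()

    suite = context.get_expectation_suite(
        expectation_suite_name="payments_suite",
        include_rendered_content=True,
    )

    for configuration in suite.expectations:
        # Each ExpectationConfiguration now carries atomic rendered content
        # that downstream renderers (Data Docs, Slack, etc.) can reuse directly.
        print(configuration.expectation_type, configuration.rendered_content)
    ```
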
  • 0.15.20(Aug 25, 2022)

    • [FEATURE] query.pair_column Metric (#5743)
    • [FEATURE] Enhance execution time measurement utility, and save DomainBuilder execution time per Rule of Rule-Based Profiler (#5796)
    • [FEATURE] Support single-batch mode in MetricMultiBatchParameterBuilder (#5808)
    • [FEATURE] Inline ExpectationSuite Rendering (#5726)
    • [FEATURE] Better error for missing expectation (#5750) (thanks @tylertrussell)
    • [FEATURE] DataAssistants Example Notebook - Pandas (#5820)
    • [BUGFIX] Ensure name not persisted (#5813)
    • [DOCS] Change the selectable to a list (#5780) (thanks @itaise)
    • [DOCS] Fix how to create custom table expectation (#5807) (thanks @itaise)
    • [DOCS] DOC-363 how to configure a pandas datasource (#5779)
    • [MAINTENANCE] Remove xfail markers on cloud tests (#5793)
    • [MAINTENANCE] build-gallery enhancements (#5616)
    • [MAINTENANCE] Refactor save_profiler to remove explicit name and ge_cloud_id args (#5792)
    • [MAINTENANCE] Add v2_api flag for v2_api specific tests (#5803)
    • [MAINTENANCE] Clean up ge_cloud_id reference from DataContext ExpectationSuite CRUD (#5791)
    • [MAINTENANCE] Refactor convert_dictionary_to_parameter_node (#5805)
    • [MAINTENANCE] Remove ge_cloud_id from DataContext.add_profiler() signature (#5804)
    • [MAINTENANCE] Remove "copy.deepcopy()" calls from ValidationGraph (#5809)
    • [MAINTENANCE] Add vectorized is_between for common numpy dtypes (#5711)
    • [MAINTENANCE] Make partitioning directives of PartitionParameterBuilder configurable (#5810)
    • [MAINTENANCE] Write E2E Cloud test for RuleBasedProfiler creation and retrieval (#5815)
    • [MAINTENANCE] Change recursion to iteration for function in parameter_container.py (#5817)
    • [MAINTENANCE] add pytest-mock & pytest-icdiff plugins (#5819)
    • [MAINTENANCE] Surface cloud errors (#5797)
    • [MAINTENANCE] Clean up build_parameter_container_for_variables (#5823)
    • [MAINTENANCE] Bugfix/snowflake temp table schema name (#5814)
    • [MAINTENANCE] Update list_ methods on DataContext to emit names along with object ids (#5826)
    • [MAINTENANCE] xfail Cloud E2E tests due to schema issue with DataContextVariables (#5828)
    • [MAINTENANCE] Clean up xfails in preparation for 0.15.20 release (#5835)
    • [MAINTENANCE] Add back xfails for E2E Cloud tests that fail on env var retrieval in Docker (#5837)
  • 0.15.19(Aug 18, 2022)

    • [FEATURE] DataAssistantResult plot multiple metrics per expectation (#5556)
    • [FEATURE] Enable passing "exact_estimation" boolean at DataAssistant.run() level (default value is True) (#5744)
    • [FEATURE] Example notebook for Onboarding DataAssistant - postgres (#5776)
    • [BUGFIX] dir update for data_assistant_result (#5751)
    • [BUGFIX] Fix docs_integration pipeline (#5734)
    • [BUGFIX] Patch flaky E2E Cloud test with randomized suite names (#5752)
    • [BUGFIX] Fix RegexPatternStringParameterBuilder to use legal character repetition. Remove median, mean, and standard deviation features from OnboardingDataAssistant "datetime_columns_rule" definition. (#5757)
    • [BUGFIX] Move SuiteValidationResult.meta validation id propagation before ValidationOperator._run_action (#5760)
    • [BUGFIX] Update "column.partition" Metric to handle DateTime Arithmetic Properly (#5764)
    • [BUGFIX] JSON-serialize RowCondition and enable IDDict to support comparison operations (#5765)
    • [BUGFIX] Ensure all estimators properly handle datetime-float conversion (#5774)
    • [BUGFIX] Return appropriate subquery type to Query Metrics for SA version (#5783)
    • [DOCS] Added guide on how to use GX with EMR Serverless (#5623) (thanks @bvolodarskiy)
    • [DOCS] DOC-362: how to choose between working with a single or multiple batches of data (#5745)
    • [MAINTENANCE] Temporarily xfail E2E Cloud tests due to Azure env var issues (#5787)
    • [MAINTENANCE] Add ids to DataConnectorConfig (#5740)
    • [MAINTENANCE] Rename GX Cloud "contract" resource to "checkpoint" (#5748)
    • [MAINTENANCE] Rename GX Cloud "suite_validation_result" resource to "validation_result" (#5749)
    • [MAINTENANCE] Store Refactor - cloud store return types & http-errors (#5730)
    • [MAINTENANCE] profile_numeric_columns_diff_expectation (#5741) (thanks @stevensecreti)
    • [MAINTENANCE] Clean up type hints around class constructors (#5738)
    • [MAINTENANCE] invoke docker (#5703)
    • [MAINTENANCE] Add plist to build docker test image daily. (#5754)
    • [MAINTENANCE] opt-out type-checking (#5713)
    • [MAINTENANCE] Enable Algolia UI (#5753)
    • [MAINTENANCE] Linting & initial typing for data context (#5756)
    • [MAINTENANCE] Update oneshot estimator to quantiles estimator (#5737)
    • [MAINTENANCE] Update Auto-Initializing Expectations to use exact estimator by default (#5759)
    • [MAINTENANCE] Send a Gx-Version header set to version in requests to cloud (#5758)
    • [MAINTENANCE] invoke docker --detach and more typing (#5770)
    • [MAINTENANCE] In ParameterBuilder implementations, enhance handling of numpy.ndarray metric values, whose elements are or can be converted into datetime.datetime type. (#5771)
    • [MAINTENANCE] Config/Schema round_tripping (#5697)
    • [MAINTENANCE] Add experimental label to MetricStore Doc (#5782)
    • [MAINTENANCE] Remove GeCloudIdentifier creation in Checkpoint.run() (#5784)
  • 0.15.18(Aug 11, 2022)

    • [FEATURE] Example notebooks for multi-batch Spark (#5683)
    • [FEATURE] Introduce top-level default_validation_id in CheckpointConfig (#5693)
    • [FEATURE] Pass down validation ids to ExpectationSuiteValidationResult.meta within Checkpoint.run() (#5725)
    • [FEATURE] Refactor data assistant runner to compute formal parameters for data assistant run method signatures (#5727)
    • [BUGFIX] Restored sqlite database for tests (#5742)
    • [BUGFIX] Fixing a typo in variable name for default profiler for auto-initializing expectation "expect_column_mean_to_be_between" (#5687)
    • [BUGFIX] Remove resource_type from call to StoreBackend.build_key (#5690)
    • [BUGFIX] Update how_to_use_great_expectations_in_aws_glue.md (#5685) (thanks @bvolodarskiy)
    • [BUGFIX] Updated how_to_use_great_expectations_in_aws_glue.md again (#5696) (thanks @bvolodarskiy)
    • [BUGFIX] Update how_to_use_great_expectations_in_aws_glue.md (#5722) (thanks @bvolodarskiy)
    • [BUGFIX] Update aws_glue_deployment_patterns.py (#5721) (thanks @bvolodarskiy)
    • [DOCS] Added guide on how to use Great Expectations with AWS Glue (#5536) (thanks @bvolodarskiy)
    • [DOCS] Document the ZenML integration for Great Expectations (#5672) (thanks @stefannica)
    • [DOCS] Converts broken ZenML md refs to Technical Tags (#5714)
    • [DOCS] How to create a Custom Query Expectation (#5460)
    • [MAINTENANCE] Pin makefun package to version range for support assurance (#5746)
    • [MAINTENANCE] s3 link for logo (#5731)
    • [MAINTENANCE] Assign resource_type in InlineStoreBackend constructor (#5671)
    • [MAINTENANCE] Add mysql client to Dockerfile.tests (#5681)
    • [MAINTENANCE] RuleBasedProfiler corner case configuration changes (#5631)
    • [MAINTENANCE] Update teams.yml (#5684)
    • [MAINTENANCE] Utilize e2e mark on E2E Cloud tests (#5691)
    • [MAINTENANCE] pyproject.tooml build-system typo (#5692)
    • [MAINTENANCE] expand flake8 coverage (#5676)
    • [MAINTENANCE] Ensure Cloud E2E tests are isolated to gx-cloud-e2e stage of CI (#5695)
    • [MAINTENANCE] Add usage stats and initial database docker tests to CI (#5682)
    • [MAINTENANCE] Add e2e mark to pyproject.toml (#5699)
    • [MAINTENANCE] Update docker readme to mount your repo over the builtin one. (#5701)
    • [MAINTENANCE] Combine packages rule_based_profiler and rule_based_profiler.types (#5680)
    • [MAINTENANCE] ExpectColumnValuesToBeInSetSparkOptimized (#5702)
    • [MAINTENANCE] expect_column_pair_values_to_have_difference_of_custom_perc… (#5661) (thanks @exteli)
    • [MAINTENANCE] Remove non-docker version of CI tests that are now running in docker. (#5700)
    • [MAINTENANCE] Add back integration mark to tests in test_datasource_crud.py (#5708)
    • [MAINTENANCE] DEVREL-2289/Stale/Triage (#5694)
    • [MAINTENANCE] revert expansive flake8 pre-commit checking - flake8 5.0.4 (#5706)
    • [MAINTENANCE] Bugfix for cloud-db-integration-pipeline (#5704)
    • [MAINTENANCE] Remove pytest-azurepipelines (#5716)
    • [MAINTENANCE] Remove deprecation warning from DataConnector-level batch_identifiers for RuntimeDataConnector (#5717)
    • [MAINTENANCE] Refactor AbstractConfig to make name and id_ consistent attrs (#5698)
    • [MAINTENANCE] Move CLI tests to docker (#5719)
    • [MAINTENANCE] Leverage DataContextVariables in DataContext hierarchy to automatically determine how to persist changes (#5715)
    • [MAINTENANCE] Refactor InMemoryStoreBackend out of store_backend.py (#5679)
    • [MAINTENANCE] Move compatibility matrix tests to docker (#5728)
    • [MAINTENANCE] Adds additional file extensions for Parquet assets (#5729)
    • [MAINTENANCE] MultiBatch SqlExample notebook Update. (#5718)
    • [MAINTENANCE] Introduce NumericRangeEstimator class hierarchy and encapsulate existing estimator implementations (#5735)
  • 0.15.17(Aug 4, 2022)

    • [FEATURE] Improve estimation histogram computation in NumericMetricRangeMultiBatchParameterBuilder to include both counts and bin edges (#5628)
    • [FEATURE] Enable retrieve by name for datasource with cloud store backend (#5640)
    • [FEATURE] Update DataContext.add_checkpoint() to ensure validations within CheckpointConfig contain ids (#5638)
    • [FEATURE] Add expect_column_values_to_be_valid_crc32 (#5580) (thanks @sp1thas)
    • [FEATURE] Enable showing expectation suite by domain and by expectation_type -- from DataAssistantResult (#5673)
    • [BUGFIX] Patch flaky E2E GX Cloud tests (#5629)
    • [BUGFIX] Pass --cloud flag to dgtest-cloud-overrides section of Azure YAML (#5632)
    • [BUGFIX] Remove datasource from config on delete (#5636)
    • [BUGFIX] Patch issue with usage stats sync not respecting usage stats opt-out (#5644)
    • [BUGFIX] SlackRenderer / EmailRenderer links to deprecated doc (#5648)
    • [BUGFIX] Fix table.head metric issue when using BQ without temp tables (#5630)
    • [BUGFIX] Quick bugfix on all profile numeric column diff bounds expectations (#5651) (thanks @stevensecreti)
    • [BUGFIX] Patch bug with id vs id_ in Cloud integration tests (#5677)
    • [DOCS] Fix a typo in batch_request_parameters variable (#5612) (thanks @StasDeep)
    • [MAINTENANCE] CloudDataContext add_datasource test (#5626)
    • [MAINTENANCE] Update stale.yml (#5602)
    • [MAINTENANCE] Add id to CheckpointValidationConfig (#5603)
    • [MAINTENANCE] Better error message for RuntimeDataConnector for BatchIdentifiers (#5635)
    • [MAINTENANCE] type-checking round 2 (#5576)
    • [MAINTENANCE] minor cleanup of old comments (#5641)
    • [MAINTENANCE] add --clear-cache flag for invoke type-check (#5639)
    • [MAINTENANCE] Install dgtest test runner utilizing Git URL in CI (#5645)
    • [MAINTENANCE] Make comparisons of aggregate values date aware (#5642)
    • [MAINTENANCE] Add E2E Cloud test for DataContext.add_checkpoint() (#5653)
    • [MAINTENANCE] Use docker to run tests in the Azure CI pipeline. (#5646)
    • [MAINTENANCE] add new invoke tasks to tasks.py and create new file usage_stats_utils.py (#5593)
    • [MAINTENANCE] Don't include 'test-pipeline' in extras_require dict (#5659)
    • [MAINTENANCE] move tool config to pyproject.toml (#5649)
    • [MAINTENANCE] Refactor docker test CI steps into jobs. (#5665)
    • [MAINTENANCE] Only run Cloud E2E tests in primary pipeline (#5670)
    • [MAINTENANCE] Improve DateTime Conversion Handling in Comparison Metrics & Expectations and Provide a Clean Object Model for Metrics Computation Bundling (#5656)
    • [MAINTENANCE] Ensure that id_ fields in Marshmallow schema serialize as id (#5660)
    • [MAINTENANCE] data_context initial type checking (#5662)
  • 0.15.16(Jul 29, 2022)

    • [FEATURE] Multi-Batch Example Notebook - SqlDataConnector examples (#5575)
    • [FEATURE] Implement "is_close()" for making equality comparisons "reasonably close" for each ExecutionEngine subclass (#5597)
    • [FEATURE] expect_profile_numeric_columns_percent_diff_(inclusive bounds) (#5586) (thanks @stevensecreti)
    • [FEATURE] DataConnector Query enabled for SimpleSqlDatasource (#5610)
    • [FEATURE] Implement the exact metric range estimate for NumericMetricRangeMultiBatchParameterBuilder (#5620)
    • [FEATURE] Ensure that id propagates from RuleBasedProfilerConfig to RuleBasedProfiler (#5617)
    • [BUGFIX] Pass cloud base url to datasource store (#5595)
    • [BUGFIX] Temporarily disable Trino 0.315.0 from requirements (#5606)
    • [BUGFIX] Update _create_trino_engine to check for schema before creating it (#5607)
    • [BUGFIX] Support ExpectationSuite CRUD at BaseDataContext level (#5604)
    • [BUGFIX] Update test due to change in postgres stdev calculation method (#5624)
    • [BUGFIX] Patch issue with get_validator on Cloud-backed DataContext (#5619)
    • [MAINTENANCE] Add name and id to DatasourceConfig (#5560)
    • [MAINTENANCE] Clear datasources in test_data_context_datasources to improve test performance and narrow test scope (#5588)
    • [MAINTENANCE] Fix tests that rely on guessing pytest generated random file paths. (#5589)
    • [MAINTENANCE] Do not set google cloud credentials for lifetime of pytest process. (#5592)
    • [MAINTENANCE] Misc updates to Datasource CRUD on DataContext to ensure consistent behavior (#5584)
    • [MAINTENANCE] Add id to RuleBasedProfiler config (#5590)
    • [MAINTENANCE] refactor to enable customization of quantile bias correction threshold for bootstrap estimation method (#5587)
    • [MAINTENANCE] Ensure that resource_type used in GeCloudStoreBackend is converted to GeCloudRESTResource enum as needed (#5601)
    • [MAINTENANCE] Create datasource with id (#5591)
    • [MAINTENANCE] Enable Azure blob storage integration tests (#5594)
    • [MAINTENANCE] Increase expectation kwarg line stroke width (#5608)
    • [MAINTENANCE] Added Algolia Scripts (#5544) (thanks @devanshdixit)
    • [MAINTENANCE] Handle numpy deprecation warnings (#5615)
    • [MAINTENANCE] remove approximate comparisons -- they will be replaced by estimator alternatives (#5618)
    • [MAINTENANCE] Making the dependency on dev-lite clearer (#5514)
    • [MAINTENANCE] Fix tests in tests/integration/profiling/rule_based_profiler/ and tests/render/renderer/ (#5611)
    • [MAINTENANCE] DataContext in cloud mode test add_datasource (#5625)
  • 0.15.15(Jul 21, 2022)

    • [FEATURE] Integrate DataContextVariables with DataContext (#5466)
    • [FEATURE] Add mostly to MulticolumnMapExpectation (#5481); see the usage sketch after this release's notes
    • [FEATURE] [MAINTENANCE] Revamped expect_profile_numeric_columns_diff_between_exclusive_threshold_range (#5493) (thanks @stevensecreti)
    • [FEATURE] [CONTRIB] expect_profile_numeric_columns_diff_(less/greater)_than_or_equal_to_threshold (#5522) (thanks @stevensecreti)
    • [FEATURE] Provide methods for returning ExpectationConfiguration list grouped by expectation_type and by domain_type (#5532)
    • [FEATURE] add support for Azure authentication methods (#5229) (thanks @sdebruyn)
    • [FEATURE] Show grouped sorted expectations by Domain and by expectation_type (#5539)
    • [FEATURE] Categorical Rule in VolumeDataAssistant Should Use Same Cardinality As Categorical Rule in OnboardingDataAssistant (#5551)
    • [BUGFIX] Handle "division by zero" in "ColumnPartition" metric when all column values are NULL (#5507)
    • [BUGFIX] Use string dialect name if not found in enum (#5546)
    • [BUGFIX] Add try/except around DataContext._save_project_config to mitigate issues with permissions (#5550)
    • [BUGFIX] Explicitly pass in mostly as 1 if not set in configuration. (#5548)
    • [BUGFIX] Increase precision for categorical rule for fractional comparisons (#5552)
    • [DOCS] DOC-340 partition local installation guide (#5425)
    • [DOCS] Add DataHub Ingestion docs (#5330) (thanks @maggiehays)
    • [DOCS] toc update for DataHub integration doc (#5518)
    • [DOCS] Updating discourse to GitHub Discussions in Docs (#4953)
    • [MAINTENANCE] Clean up payload for /data-context-variables endpoint to adhere to desired schema (#5509)
    • [MAINTENANCE] DataContext Refactor: DataAssistants (#5472)
    • [MAINTENANCE] Ensure that validation operators are omitted from Cloud variables payload (#5510)
    • [MAINTENANCE] Add end-to-end tests for multicolumn map expectations (#5517)
    • [MAINTENANCE] Ensure that *_store_name attrs are omitted from Cloud variables payload (#5519)
    • [MAINTENANCE] Refactor key arg out of Store.serialize/deserialize (#5511)
    • [MAINTENANCE] Fix links to documentation (#5177) (thanks @andyjessen)
    • [MAINTENANCE] Readme Update (#4952)
    • [MAINTENANCE] E2E test for FileDataContextVariables (#5516)
    • [MAINTENANCE] Cleanup/refactor prerequisite for group/filter/sort Expectations by domain (#5523)
    • [MAINTENANCE] Refactor GeCloudStoreBackend to use PUT and DELETE HTTP verbs instead of PATCH (#5527)
    • [MAINTENANCE] /profiler Cloud endpoint support (#5499)
    • [MAINTENANCE] Add type hints to Store (#5529)
    • [MAINTENANCE] Move MetricDomainTypes to core (it is used more widely now than previously). (#5530)
    • [MAINTENANCE] Remove dependency pins on pyarrow and snowflake-connector-python (#5533)
    • [MAINTENANCE] use invoke for common contrib/dev tasks (#5506)
    • [MAINTENANCE] Add snowflake-connector-python dependency lower bound. (#5538)
    • [MAINTENANCE] enforce pre-commit in ci (#5526)
    • [MAINTENANCE] Providing more robust error handling for determining domain_type of an ExpectationConfiguration object (#5542)
    • [MAINTENANCE] Remove extra indentation from store backend test (#5545)
    • [MAINTENANCE] Plot-level dropdown for DataAssistantResult display charts (#5528)
    • [MAINTENANCE] Make DataAssistantResult.batch_id_to_batch_identifier_display_name_map private (in order to optimize auto-complete for ease of use) (#5549)
    • [MAINTENANCE] Initial Dockerfile for running tests and associated README. (#5541)
    • [MAINTENANCE] Other dialect test (#5547)
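
    The #5481 entry above adds mostly support to MulticolumnMapExpectation subclasses. A short, hedged sketch using expect_multicolumn_sum_to_equal follows; it assumes a Validator has already been obtained (for example via context.get_validator() with a RuntimeBatchRequest, as in the 0.15.29 sketch further above), and the column names are illustrative.

    ```python
    # `validator` is assumed to be a great_expectations Validator over a batch
    # that contains the quarterly share columns below (illustrative names).
    result = validator.expect_multicolumn_sum_to_equal(
        column_list=["q1_share", "q2_share", "q3_share", "q4_share"],
        sum_total=100,
        mostly=0.95,  # tolerate up to 5% of rows whose columns do not sum to 100
    )
    print(result.success)
    ```
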
  • 0.15.13(Jul 7, 2022)

    • [FEATURE] Add atomic rendered_content to ExpectationValidationResult and ExpectationConfiguration (#5369)
    • [FEATURE] Add DataContext.update_datasource CRUD method (#5417)
    • [FEATURE] Refactor Splitter Testing Modules so as to Make them More General and Add Unit and Integration Tests for "split_on_whole_table" and "split_on_column_value" on SQLite and All Supported Major SQL Backends (#5430)
    • [FEATURE] Support underscore in the condition_value of a row_condition (#5393) (thanks @sp1thas); see the usage sketch after this release's notes
    • [DOCS] DOC-322 update terminology to v3 (#5326)
    • [MAINTENANCE] Change property name of TaxiSplittingTestCase to make it more general (#5419)
    • [MAINTENANCE] Ensure that BaseDataContext does not persist Datasource changes by default (#5423)
    • [MAINTENANCE] Migration of project_config_with_variables_substituted to AbstractDataContext (#5385)
    • [MAINTENANCE] Improve type hinting in GeCloudStoreBackend (#5427)
    • [MAINTENANCE] Test serialization of text, table, and bulleted list rendered_content in ExpectationValidationResult (#5438)
    • [MAINTENANCE] Refactor datasource_name out of DataContext.update_datasource (#5440)
    • [MAINTENANCE] Add checkpoint name to validation results (#5442)
    • [MAINTENANCE] Remove checkpoint from top level of schema since it is captured in meta (#5445)
    • [MAINTENANCE] Add unit and integration tests for Splitting on Divided Integer (#5449)
    • [MAINTENANCE] Update cli with new default simple checkpoint name (#5450)
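
    The #5393 entry above allows underscores in the condition_value of a row_condition. A short, hedged sketch follows; it assumes a Validator over a Pandas-backed batch has already been obtained (see the 0.15.29 sketch further above), and the column and value names are illustrative.

    ```python
    # `validator` is assumed to be a great_expectations Validator over a
    # Pandas-backed batch with `ticket_status` and `resolved_at` columns.
    result = validator.expect_column_values_to_not_be_null(
        column="resolved_at",
        # Underscores inside the quoted condition value are accepted as of 0.15.13.
        row_condition='ticket_status=="closed_won"',
        condition_parser="pandas",
    )
    print(result.success)
    ```
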
  • 0.15.12(Jun 30, 2022)

    • [FEATURE] Add Rule Statistics to DataAssistantResult for display in Jupyter notebook (#5368)
    • [FEATURE] Include detailed Rule Execution statistics in jupyter notebook "repr" style output (#5375)
    • [FEATURE] Support datetime/date-part splitters on Amazon Redshift (#5408)
    • [DOCS] Capital One DataProfiler Expectations README Update (#5365) (thanks @stevensecreti)
    • [DOCS] Add Trino guide (#5287)
    • [DOCS] DOC-339 remove redundant how-to guide (#5396)
    • [DOCS] Capital One Data Profiler README update (#5387) (thanks @taylorfturner)
    • [DOCS] Add sqlalchemy-redshift to dependencies in redshift doc (#5386)
    • [MAINTENANCE] Reduce output amount in Jupyter notebooks when displaying DataAssistantResult (#5362)
    • [MAINTENANCE] Update linter thresholds (#5367)
    • [MAINTENANCE] Move _apply_global_config_overrides() to AbstractDataContext (#5285)
    • [MAINTENANCE] WIP: stalebot configuration (#5301)
    • [MAINTENANCE] expect_column_values_to_be_equal_to_or_greater_than_profile_min (#5372) (thanks @stevensecreti)
    • [MAINTENANCE] expect_column_values_to_be_equal_to_or_less_than_profile_max (#5380) (thanks @stevensecreti)
    • [MAINTENANCE] Replace string formatting with f-string (#5225) (thanks @andyjessen)
    • [MAINTENANCE] Fix links in docs (#5340) (thanks @andyjessen)
    • [MAINTENANCE] Caching of config_variables in DataContext (#5376)
    • [MAINTENANCE] StaleBot Half DryRun (#5390)
    • [MAINTENANCE] StaleBot DryRun 2 (#5391)
    • [MAINTENANCE] file extensions applied to rel links (#5399)
    • [MAINTENANCE] Allow installing jinja2 version 3.1.0 and higher (#5382)
    • [MAINTENANCE] expect_column_values_confidence_for_data_label_to_be_less_than_or_equal_to_threshold (#5392) (thanks @stevensecreti)
    • [MAINTENANCE] Add warnings to internal linters if actual error count does not match threshold (#5401)
    • [MAINTENANCE] Ensure that changes made to env vars / config vars are recognized within subsequent calls of the same process (#5410)
    • [MAINTENANCE] Stack RuleBasedProfiler progress bars for better user experience (#5400)
    • [MAINTENANCE] Keep all Pandas Splitter Tests in a Dedicated Module (#5411)
    • [MAINTENANCE] Refactor DataContextVariables to only persist state to Store using explicit save command (#5366)
    • [MAINTENANCE] Refactor to put tests for splitting and sampling into modules for respective ExecutionEngine implementation (#5412)
  • 0.15.11(Jun 22, 2022)

    • [FEATURE] Enable NumericMetricRangeMultiBatchParameterBuilder to use evaluation dependencies (#5323)
    • [FEATURE] Improve Trino Support (#5261) (thanks @aezomz)
    • [FEATURE] Added support for AWS Athena quantiles (#5114) (thanks @kuhnen)
    • [FEATURE] Implement the "column.standard_deviation" metric for sqlite database (#5338)
    • [FEATURE] Update add_datasource to leverage the DatasourceStore (#5334)
    • [FEATURE] Provide ability for DataAssistant to return its effective underlying BaseRuleBasedProfiler configuration (#5359)
    • [BUGFIX] Fix Netlify build issue that was being caused by entry in changelog (#5322)
    • [BUGFIX] Numpy dtype.float64 formatted floating point numbers must be converted to Python float for use in SQLAlchemy Boolean clauses (#5336)
    • [BUGFIX] Fix for failing Expectation test in cloud_db_integration pipeline (#5321)
    • [DOCS] revert getting started tutorial to RBP process (#5307)
    • [DOCS] mark onboarding assistant guide as experimental and update cli command (#5308)
    • [DOCS] Fix line numbers in getting started guide (#5324)
    • [DOCS] DOC-337 automate updates to the version information displayed in the getting started tutorial. (#5348)
    • [MAINTENANCE] Fix link in suite profile renderer (#5242) (thanks @andyjessen)
    • [MAINTENANCE] Refactor of _apply_global_config_overrides() method to return config (#5286)
    • [MAINTENANCE] Remove "json_serialize" directive from ParameterBuilder computations (#5320)
    • [MAINTENANCE] Misc cleanup post 0.15.10 release (#5325)
    • [MAINTENANCE] Standardize instantiation of NumericMetricRangeMultibatchParameterBuilder throughout the codebase. (#5327)
    • [MAINTENANCE] Reuse MetricMultiBatchParameterBuilder computation results as evaluation dependencies for performance enhancement (#5329)
    • [MAINTENANCE] clean up type declarations (#5331)
    • [MAINTENANCE] Maintenance/great 761/great 1010/great 1011/alexsherstinsky/rule based profiler/data assistant/include only essential public methods in data assistant dispatcher class 2022 06 21 177 (#5351)
    • [MAINTENANCE] Update release schedule JSON (#5349)
    • [MAINTENANCE] Include only essential public methods in DataAssistantResult class (and its descendants) (#5360)