The micro-framework to create dataframes from functions.

Overview

Hamilton


Specifically, Hamilton is a framework that allows for delayed execution of functions in a Directed Acyclic Graph (DAG). This is meant to solve the problem of creating complex data pipelines. Core to the design of Hamilton is a clear mapping of function name to implementation. That is, Hamilton forces a certain paradigm for writing functions, and aims for DAG clarity, easy modifications, unit testing, and documentation.

For the backstory on how Hamilton came about, see our blog post.

Getting Started

Here's a quick getting started guide to get you up and running in less than 15 minutes.

Installation

Requirements:

  • Python 3.6 or 3.7

To get started, first install Hamilton. It is published to PyPI under the name sf-hamilton:

pip install sf-hamilton

While it is installing we encourage you to start on the next section.

Note: the content (i.e. names, function bodies) of our example code snippets is for illustrative purposes only, and doesn't reflect what we actually do internally.

Hamilton in 15 minutes

Hamilton is a new paradigm when it comes to creating dataframes. Rather than thinking about manipulating a central dataframe, you instead think about the column(s) you want to create, and what inputs are required. There is no need for you to think about maintaining this dataframe, meaning you do not need to think about any "glue" code; this is all taken care of by the Hamilton framework.

For example, rather than writing the following to manipulate a central dataframe object df:

df['col_c'] = df['col_a'] + df['col_b']

you write

import pandas as pd

def col_c(col_a: pd.Series, col_b: pd.Series) -> pd.Series:
    """Creating column c from summing column a and column b."""
    return col_a + col_b

The Hamilton framework will then be able to build a DAG from this function definition, in which col_c depends on col_a and col_b.
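The idea can be sketched in a few lines of plain Python. This is a toy illustration under our own assumptions, not Hamilton's implementation: each function name becomes a node, each parameter name is an edge to the node (or input) of the same name, and anything that is not a function must be supplied as an input. `build_graph`, `execute`, and `col_d` are hypothetical names, plain numbers stand in for `pd.Series`, and there is no cycle detection.

```python
import inspect
from typing import Any, Callable, Dict, List


def col_c(col_a: float, col_b: float) -> float:
    """Creating column c from summing column a and column b (floats stand in for pd.Series)."""
    return col_a + col_b


def col_d(col_c: float) -> float:
    """Hypothetical downstream node that depends on col_c."""
    return col_c * 2


def build_graph(functions: List[Callable]) -> Dict[str, List[str]]:
    """Map each function name to its parameter names, i.e. the nodes it depends on."""
    return {fn.__name__: list(inspect.signature(fn).parameters) for fn in functions}


def execute(functions: List[Callable], outputs: List[str], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Compute the requested outputs, resolving dependencies depth-first."""
    by_name = {fn.__name__: fn for fn in functions}
    graph = build_graph(functions)
    cache: Dict[str, Any] = dict(inputs)  # user-provided inputs are the leaves of the DAG

    def compute(name: str) -> Any:
        if name in cache:
            return cache[name]
        if name not in by_name:
            # Neither a function nor a provided input: it has to be passed in.
            raise ValueError(f"Missing required input: {name!r}")
        kwargs = {dep: compute(dep) for dep in graph[name]}
        cache[name] = by_name[name](**kwargs)
        return cache[name]

    return {name: compute(name) for name in outputs}
```

For example, `execute([col_c, col_d], ["col_c", "col_d"], {"col_a": 1, "col_b": 2})` returns `{"col_c": 3, "col_d": 6}`, and leaving out `col_b` raises a `ValueError` naming the missing input.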

So let's create a "Hello World" and start using Hamilton!

Your first hello world.

By now, you should have installed Hamilton, so let's write some code.

  1. Create a file my_functions.py and add the following functions:
import pandas as pd

def avg_3wk_spend(spend: pd.Series) -> pd.Series:
    """Rolling 3 week average spend."""
    return spend.rolling(3).mean()

def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """The cost per signup in relation to spend."""
    return spend / signups

The astute observer will notice we have not defined spend or signups as functions. That is okay; it just means they need to be provided as inputs when we actually want to create a dataframe.

  2. Create my_script.py, which is where the code that tells Hamilton what to do will live:
import importlib
import logging
import sys

import pandas as pd
from hamilton import driver

logger = logging.getLogger(__name__)
logging.basicConfig(stream=sys.stdout)
initial_columns = {  # load from actuals or wherever -- this is our initial data we use as input.
    'signups': pd.Series([1, 10, 50, 100, 200, 400]),
    'spend': pd.Series([10, 10, 20, 40, 40, 50]),
}
# we need to tell hamilton where to load function definitions from
module_name = 'my_functions'
module = importlib.import_module(module_name)
dr = driver.Driver(initial_columns, module)  # can pass in multiple modules
# we need to specify what we want in the final dataframe.
output_columns = [
    'spend',
    'signups',
    'avg_3wk_spend',
    'spend_per_signup',
]
# let's create the dataframe!
df = dr.execute(output_columns, display_graph=True)
print(df)
  3. Run my_script.py:

python my_script.py

You should see the following output:

   spend  signups  avg_3wk_spend  spend_per_signup
0     10        1            NaN            10.000
1     10       10            NaN             1.000
2     20       50      13.333333             0.400
3     40      100      23.333333             0.400
4     40      200      33.333333             0.200
5     50      400      43.333333             0.125
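As a sanity check, the expected values can be recomputed by hand. The sketch below uses plain Python lists instead of pandas; `rolling_mean` is our own helper that mimics `spend.rolling(3).mean()` with its default settings, which leaves the first two rows empty (NaN above) because the 3-row window is not yet full.

```python
from typing import List, Optional

spend = [10, 10, 20, 40, 40, 50]
signups = [1, 10, 50, 100, 200, 400]


def rolling_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing mean over `window` rows; None until the window is full (pandas shows NaN)."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


avg_3wk_spend = rolling_mean(spend, 3)  # e.g. row 2: (10 + 10 + 20) / 3
spend_per_signup = [s / n for s, n in zip(spend, signups)]  # e.g. row 5: 50 / 400
```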

Congratulations - you just created your first dataframe with Hamilton!

License

Hamilton is released under the BSD 3-Clause Clear License. If you need to get in touch about something, contact us at algorithms-opensource (at) stitchfix.com.

Contributing

We take contributions, large and small. We operate via a Code of Conduct and expect anyone contributing to do the same.

To see how you can contribute, please read our contributing guidelines and then our developer setup guide.

Prescribed Development Workflow

In general we prescribe the following:

  1. Ensure you understand Hamilton Basics.
  2. Familiarize yourself with some of the Hamilton decorators. They will help keep your code DRY.
  3. Start creating Hamilton Functions that represent your work. We suggest grouping them in modules where it makes sense.
  4. Write a simple script so that you can easily run things end to end.
  5. Join our discord community to chat/ask Qs/etc.

For the backstory on Hamilton, we invite you to watch the ~9-minute lightning talk we gave at the apply conference: video, slides.

PyCharm Tips

If you're using Hamilton, it's likely that you'll need to migrate some code. Here are some useful tricks we found to speed up that process.

Live templates

Live templates are a cool feature and allow you to type in a name which expands into some code.

For example, we wrote one to make it quick to stub out Hamilton functions: typing graphfunc would turn into:

def _(_: pd.Series) -> pd.Series:
    """_"""
    return _

The underscores are the blanks you can tab between with the cursor and fill in. See your PyCharm preferences to set this up.

Multiple Cursors

If you are doing a lot of repetitive work, consider multiple cursors. They allow you to edit multiple lines at once.

To use them, hit Option + mouse click to create multiple cursors, and Esc to revert to normal mode.

Contributors

  • Stefan Krawczyk (@skrawcz)
  • Elijah ben Izzy (@elijahbenizzy)
  • Danielle Quinn (@danfisher-sf)
  • Rachel Insoft (@rinsoft-sf)
  • Shelly Jang (@shellyjang)
  • Vincent Chu (@vslchusf)
Comments
  • Decorator to specify value and dag node inputs

    Is your feature request related to a problem? Please describe. I am looking for a decorator to specify both value and dag node inputs. parameterized allows for value inputs, parameterized_inputs allows for dag node inputs, but there are no decorators that allow you to do both.

    Describe the solution you'd like A decorator that allows for both value and dag node inputs.

    Describe alternatives you've considered I tried just decorating a function with both decorators, but that is not supported.

    Additional context

    enhancement 
    opened by wangkev 17
  • [idea] can we make hamilton run in a distributed manner?

    What's the idea?

    Hamilton runs locally on data that fits in memory on a single core.

    Can we improve speed and data scale by making hamilton run in a parallel and distributed manner?

    Why do we think it should be possible?

    Hamilton ultimately creates a DAG before executing. The idea would be to distribute various parts of this DAG.

    We'd have to figure out initial data loading, but other than that, it seems like we're solving something that other systems have already solved for us. Can we harness them? Perhaps by going from a Hamilton DAG and "compiling" it to the target system of choice?

    Ideas to explore

    enhancement product idea 
    opened by skrawcz 14
  • Add docker container(s) to help run examples

    Is your feature request related to a problem? Please describe. The friction to getting the examples up and running is installing the dependencies. A docker container with them already provided would reduce friction for people to get started with Hamilton.

    Describe the solution you'd like

    1. A docker container with different python virtual environments that have the dependencies to run the examples.
    2. The container has the hamilton repository checked out -- so it has the examples folder.
    3. Then using it would be:
    • docker pull image
    • docker start image
    • activate python virtual environment
    • run example

    Describe alternatives you've considered Not doing this.

    Additional context This was a request from a Hamilton talk.

    documentation good first issue help wanted 
    opened by skrawcz 13
  • Better error handling for errors in the execute() stage

    Is your feature request related to a problem? Please describe. When my code runs into errors in the execute() stage, the traceback just goes to the execute() function, and doesn't provide any useful feedback about the provenance of the error. For example, if my DAG feeds a series into another series and they have incompatible indexes, I get an index error but I can't tell which series caused it.

    Describe the solution you'd like Just passing back the names of the relevant series with the error would be helpful.

    Describe alternatives you've considered More involved type checking when constructing the series functions and then when building the DAG could potentially eliminate some of the errors (probably not all though).
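A minimal sketch of the requested behavior, under our own assumptions: wrap each node's call, and re-raise with the node name attached while chaining the original exception so the full traceback is kept. `NodeExecutionError` and `run_node` are hypothetical names, not Hamilton APIs.

```python
from typing import Any, Callable, Dict


class NodeExecutionError(Exception):
    """Hypothetical error type that records which node failed."""

    def __init__(self, node_name: str, original: Exception):
        super().__init__(f"Error computing node {node_name!r}: {original!r}")
        self.node_name = node_name
        self.original = original


def run_node(name: str, fn: Callable[..., Any], kwargs: Dict[str, Any]) -> Any:
    """Run one node, re-raising any failure with the node's name attached."""
    try:
        return fn(**kwargs)
    except Exception as e:
        # `from e` chains the original exception, so its traceback is not lost.
        raise NodeExecutionError(name, e) from e
```

With this shape, an index error raised inside a series function would surface as, say, `Error computing node 'spend_per_signup': ...`, with the original error still shown underneath in the traceback.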

    enhancement 
    opened by ropeladder 11
  • ValueError with typing.Union type signature

    hamilton raises a ValueError when using typing.Union, even if the type signature has the input type.

    Current behavior

    Error when using typing.Union types.

    Stack Traces

    ValueError: 1 errors encountered:
      Error: Type requirement mismatch. Expected x:typing.Union[int, pandas.core.series.Series] got 1 instead.
    

    Steps to replicate behavior

    run.py

    from hamilton.driver import Driver
    from hamilton.base import SimplePythonGraphAdapter, DictResult
    import lib
    
    initial_columns = {'x': 1}
    adapter = SimplePythonGraphAdapter(DictResult)
    dr = Driver(initial_columns, lib, adapter=adapter)
    print(dr.execute(['foo_union']))
    

    lib.py

    import typing as t
    import pandas as pd
    
    
    def foo_union(x: t.Union[int, pd.Series]) -> t.Union[int, pd.Series]:
        '''foo union'''
        return x + 1
    

    Library & System Information

    python 3.6, hamilton 1.9.0, tested on mac and linux.

    Expected behavior

    To be able to run the DAG.

    Additional context

    Everything works if there are functions that have typing.Any as type signature. For example, the above will run if we add foo_any to lib.py:

    import typing as t
    import pandas as pd
    
    
    def foo_union(x: t.Union[int, pd.Series]) -> t.Union[int, pd.Series]:
        '''foo union'''
        return x + 1
    
    
    def foo_any(x: t.Any) -> t.Any:
        '''foo any'''
        return x + 1
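For illustration only, here is one way a runtime type check could treat `typing.Union`: accept the value if it matches any member of the union. This is a hypothetical helper (`value_matches`), not Hamilton's code; it assumes Python 3.8+ for `typing.get_origin`/`typing.get_args` (the report above uses 3.6), and it only handles plain classes, `Any`, and `Union`, not other subscripted generics such as `List[int]`.

```python
import typing
from typing import Any


def value_matches(value: Any, annotation: Any) -> bool:
    """Check a runtime value against a type annotation, expanding typing.Union."""
    if annotation is Any:
        return True
    if typing.get_origin(annotation) is typing.Union:
        # Accept the value if it matches any member of the Union.
        return any(value_matches(value, member) for member in typing.get_args(annotation))
    # Plain classes only; isinstance() rejects subscripted generics like List[int].
    return isinstance(value, annotation)
```

Under this rule, the input `x = 1` in the report would satisfy `t.Union[int, pd.Series]` because it matches the `int` member.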
    
    enhancement 
    opened by wangkev 11
  • Data quality

    [Short description explaining the high-level reason for the pull request]

    Changes

    Testing

    Notes

    Checklist

    • [x] PR has an informative and human-readable title (this will be pulled into the release notes)
    • [x] Changes are limited to a single goal (no scope creep)
    • [x] Code can be automatically merged (no conflicts)
    • [x] Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
    • [x] Passes all existing automated tests
    • [x] Any change in functionality is tested
    • [x] New functions are documented (with a description, list of inputs, and expected output)
    • [x] Placeholder code is flagged / future TODOs are captured in comments
    • [x] Project documentation has been updated if adding/changing functionality.
    • [x] Reviewers requested with the Reviewers tool :arrow_right:

    Testing checklist

    Python - local testing

    • [ ] python 3.6
    • [ ] python 3.7
    opened by elijahbenizzy 11
  • [RFC] consolidate CI code into shell scripts

    Is your feature request related to a problem? Please describe.

    All of this project's CI configuration currently lives in a YAML file, https://github.com/stitchfix/hamilton/blob/main/.circleci/config.yml.

    This introduces some friction to development in the following ways:

    • adding a new CI job involves adding a new block of YAML that is mostly the same as the others (see, for example, https://github.com/stitchfix/hamilton/commit/e2ad1367488f73d9d74fb9fd4f0c337f6713063b)
    • running tests locally involves copying and pasting commands from that YAML file
    • duplication of code across jobs makes it a bit more difficult to understand what is different between them

    Describe the solution you'd like

    I'd like to propose the following refactoring of this project's CI jobs:

    • put CI code into one or more shell scripts in a directory like .ci/
      • using environment variables to handle the fact that some jobs are slightly different from others (e.g. the dask jobs don't require installing ray)
    • change .circleci/config.yml so that it runs those shell scripts
    • document in CONTRIBUTING.md how to run the tests in Docker locally, with commands that can just directly be copied and run by contributors, like this:
      •  docker run \
             --rm \
             -v $(pwd):/app \
             --workdir /app \
             --env BUILD_TYPE="dask" \
             -it circleci/python:3.7 \
              bash .ci/test.sh
        

    Describe alternatives you've considered

    Using a Makefile instead of shell scripts could also work for this purpose, but in my experience shell scripts are understood by a broader range of people and have fewer surprises around quoting, interpolation, and exit codes.

    Additional context

    A similar pattern to the one I'm proposing has been very very useful for us in LightGBM.

    Consider, for example, how many different job configurations the CI for that project's R package uses (https://github.com/microsoft/LightGBM/blob/3ad26a499614cf0af075ce4ea93b880bcc69b6bb/.github/workflows/r_package.yml) and how little repetition there is across jobs.

    If you all are interested in trying this out, I'd be happy to propose some PRs.

    Thanks for considering it!

    opened by jameslamb 9
  • Make graphviz optional

    Makes graphviz an optional dependency.

    Fixes #26.

    Additions

    • Helpful error message if graphviz isn't installed and the user tries to plot

    Removals

    • Dependency on graphviz

    Testing

    Not really sure how to test this.

    Surely there is a way to mock existence or non-existence of a package? Cursory googling did not reveal one.
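One stdlib technique that may answer this, offered as a sketch rather than what this PR ended up doing: putting `None` into `sys.modules` for a package name makes any later `import` of it raise `ImportError` (specifically `ModuleNotFoundError`), and `unittest.mock.patch.dict` restores `sys.modules` afterwards. `render_graph` and its message are hypothetical stand-ins for the plotting code path.

```python
import sys
from unittest import mock


def render_graph() -> str:
    """Hypothetical plotting helper that degrades gracefully without graphviz."""
    try:
        import graphviz  # noqa: F401
    except ImportError:
        return "graphviz is not installed; install it to enable plotting."
    return "graphviz available"


# A None entry in sys.modules makes `import graphviz` fail, whether or not the
# package is actually installed; patch.dict undoes the change on exit.
with mock.patch.dict(sys.modules, {"graphviz": None}):
    message = render_graph()
```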

    Todos

    • Test maybe? If we can figure out how

    Checklist

    I'm not sure I can do all of these things:

    • [x] PR has an informative and human-readable title
    • [x] Changes are limited to a single goal (no scope creep)
    • [x] Code can be automatically merged (no conflicts)
    • [ ] Code follows the standards laid out in the TODO link to standards
    • [x] Passes all existing automated tests
    • [ ] Any change in functionality is tested
    • [x] New functions are documented (with a description, list of inputs, and expected output)
    • [x] Placeholder code is flagged / future todos are captured in comments
    • [ ] Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
    • [ ] Reviewers requested with the Reviewers tool :arrow_right:

    Testing checklist

    Python

    • [ ] python 3.6
    • [ ] python 3.7
    opened by ivirshup 9
  • PandasDataFrameResult: Convert non-list values to single row frame

    When trying to run an intermediate node which produces a scalar in the hello_world example, Hamilton throws an error:

    WARNING:hamilton.base:It appears no Pandas index type was detected. This will likely break when trying to create a DataFrame. E.g. are you requesting all scalar values? Use a different result builder or return at least one Pandas object with an index. Ignore this warning if you're using DASK for now.
    ERROR:hamilton.driver:-------------------------------------------------------------------
    Oh no an error! Need help with Hamilton?
    Join our slack and ask for help! https://join.slack.com/t/hamilton-opensource/shared_invite/zt-1bjs72asx-wcUTgH7q7QX1igiQ5bbdcg
    -------------------------------------------------------------------
    
    Traceback (most recent call last):
      File "my_script.py", line 29, in <module>
        df = dr.execute(output_columns)
      File "/Users/ian.hoffman/src/hamilton/examples/hello_world/.venv/lib/python3.8/site-packages/hamilton/driver.py", line 142, in execute
        raise e
      File "/Users/ian.hoffman/src/hamilton/examples/hello_world/.venv/lib/python3.8/site-packages/hamilton/driver.py", line 139, in execute
        return self.adapter.build_result(**outputs)
      File "/Users/ian.hoffman/src/hamilton/examples/hello_world/.venv/lib/python3.8/site-packages/hamilton/base.py", line 171, in build_result
        raise ValueError(f"Cannot build result. Cannot handle type {value}.")
    ValueError: Cannot build result. Cannot handle type 28.333333333333332.
    

    If we can run an entire DAG, it seems like we should be able to run any sub-DAG of the DAG.

    Changes

    Updates PandasDataFrameResult.build_result() to convert scalar values into dataframes.

    How I tested this

    Updated unit tests.

    Checklist

    • [X] PR has an informative and human-readable title (this will be pulled into the release notes)
    • [X] Changes are limited to a single goal (no scope creep)
    • [X] Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
    • [X] Any change in functionality is tested
    • [X] New functions are documented (with a description, list of inputs, and expected output)
    • [X] Placeholder code is flagged / future TODOs are captured in comments
    • [X] Project documentation has been updated if adding/changing functionality.
    opened by ianhoffman 8
  • Prototype Data Quality Feature

    Is your feature request related to a problem? Please describe. When creating pipelines, data issues can silently wreak havoc; your code didn't change but the data did and now things are wonky...

    To combat that, there are projects like pandera that allow you to annotate functions with expectations, and at runtime, have those expectations checked and appropriately exposed.

    Hamilton should have some support for runtime data quality checks, so that we can support not only clean code bases but also clean data.

    Describe the solution you'd like We should prototype an approach where there is:

    1. A way to set expectations on the output of a function, i.e. what the data should look like.
    2. Use those expectations either right after function execution, or on conclusion of a Hamilton DAG, or some other way.
    3. A way to specify what should happen when an expectation is not met -- e.g. log warnings, surface warnings, or stop execution.
    4. Thinking of a way to bootstrap these expectations from a dataset -- so that users can update/change expectations easily as time goes on.

    Directionally https://pandera.readthedocs.io/en/stable/ seems like a good first approach to try, i.e. via decorators.
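To make items 1 and 3 concrete, here is a stdlib-only sketch of a decorator that checks a function's output right after it runs and either logs a warning or stops execution depending on an `importance` setting. `check_output_sketch` is a hypothetical name; the real design (pandera-based or otherwise) is what the prototype is meant to decide.

```python
import functools
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def check_output_sketch(predicate: Callable[[Any], bool], importance: str = "warn"):
    """Hypothetical decorator: validate a function's output after it runs.

    importance="warn" logs a warning; importance="fail" raises and stops execution.
    """

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if not predicate(result):
                message = f"Output of {fn.__name__} failed validation: {result!r}"
                if importance == "fail":
                    raise ValueError(message)
                logger.warning(message)
            return result

        return wrapper

    return decorator


@check_output_sketch(lambda x: x >= 0, importance="fail")
def spend_per_signup(spend: float, signups: float) -> float:
    """The cost per signup in relation to spend."""
    return spend / signups
```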

    Describe alternatives you've considered We are thinking about alternatives as the prototype is being built out.

    Additional context Some ideas on approaches:

    • https://pandera.readthedocs.io/en/stable/
    • https://github.com/awslabs/deequ
    • https://greatexpectations.io/
    • https://github.com/whylabs/whylogs
    enhancement product idea 
    opened by skrawcz 8
  • [ci] add flake8

    Is your feature request related to a problem? Please describe.

    The project does not currently have any automatic protections against some classes of issue that can be detected by flake8, including:

    • unused imports
    • unused variables
    • duplicated test names (which leads to pytest skipping tests)
    • f-strings used on strings with no templating
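To see why the duplicated-test-name case matters: a second `def` with the same name simply rebinds that name in the module, so the first function is no longer reachable, and pytest, which collects tests from the module's names, never runs it. A minimal illustration, with hypothetical function names and the `test_` prefix deliberately dropped so the snippet is not itself collected:

```python
# Two functions with the same name in one module, the pattern flake8 reports
# as F811 (as it did for tests/test_function_modifiers.py above).
def tags_invalid_value():
    """First definition: silently replaced by the one below."""
    return "first"


def tags_invalid_value():  # noqa: F811
    """Second definition: rebinds the same name."""
    return "second"


result = tags_invalid_value()  # only the second definition is reachable
```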

    Describe the solution you'd like

    I believe this project should use flake8 for linting, and should store the configuration for that tool in a setup.cfg file (https://flake8.pycqa.org/en/latest/user/configuration.html).

    Adding this tool to the project's testing setup would reduce the effort required for pull request authors and reviewers to detect such issues, and would reduce the risk of code changes with such issues making it to main.

    Describe alternatives you've considered

    N/A

    Additional context

    Here is the result of running flake8 (ignoring style-only issues) on the current latest commit on main.

    flake8 \
        --ignore=E126,E128,E203,E241,E261,E302,E303,E402,E501,W503,W504,W605 \
        .
    
    ./hamilton/function_modifiers.py:50:13: F402 import 'node' from line 11 shadowed by loop variable
    ./hamilton/function_modifiers.py:169:30: F541 f-string is missing placeholders
    ./hamilton/function_modifiers.py:874:21: F541 f-string is missing placeholders
    ./hamilton/function_modifiers.py:878:34: F541 f-string is missing placeholders
    ./hamilton/graph.py:172:53: F821 undefined name 'graphviz'
    ./hamilton/graph.py:195:92: F821 undefined name 'networkx'
    ./hamilton/graph.py:316:13: F401 'graphviz' imported but unused
    ./hamilton/graph.py:451:17: F841 local variable 'e' is assigned to but never used
    ./hamilton/node.py:3:1: F401 'typing.Collection' imported but unused
    ./hamilton/driver.py:24:5: F811 redefinition of unused 'node' from line 13
    ./hamilton/data_quality/default_validators.py:227:30: F541 f-string is missing placeholders
    ./hamilton/data_quality/default_validators.py:394:9: F401 'pandera' imported but unused
    ./hamilton/data_quality/default_validators.py:396:21: F541 f-string is missing placeholders
    ./hamilton/data_quality/pandera_validators.py:1:1: F401 'typing.Any' imported but unused
    ./hamilton/data_quality/pandera_validators.py:21:16: F541 f-string is missing placeholders
    ./tests/test_function_modifiers.py:620:1: F811 redefinition of unused 'test_tags_invalid_value' from line 607
    ./tests/resources/bad_functions.py:5:1: F401 'tests.resources.only_import_me' imported but unused
    ./tests/resources/layered_decorators.py:1:1: F401 'hamilton.function_modifiers.tag' imported but unused
    ./tests/resources/cyclic_functions.py:5:1: F401 'tests.resources.only_import_me' imported but unused
    ./tests/resources/data_quality.py:1:1: F401 'numpy as np' imported but unused
    ./examples/ray/hello_world/run_rayworkflow.py:1:1: F401 'importlib' imported but unused
    ./graph_adapter_tests/h_ray/test_h_ray_workflow.py:1:1: F401 'tempfile' imported but unused
    

    If maintainers are interested, I'd be happy to put up a pull request proposing this.

    Thanks for your time and consideration.

    opened by jameslamb 7
  • Remove 3.6 Support

    Python 3.6 reached End of Life (EOL) in December 2021.

    We should make moves to stop supporting it.

    Tasks

    • [ ] Remove 3.6 test support.
    • [ ] Remove 3.6 specific code.
    • [ ] Update setup.py to not list 3.6 and set minimum python version to 3.7.
    • [ ] Update any documentation & markdown files to remove reference to 3.6.
    • [ ] Create pull request with all the changes.
    • [ ] Determine timeline for a merge. With telemetry, we should be able to confirm whether anyone is using Hamilton with python 3.6. If not, then it should be easy to remove; if so, we'll need to understand what's stopping people from moving to 3.7+. (See @skrawcz to know whether this is the case or not).
    good first issue repo hygiene 
    opened by skrawcz 0
  • Polars example

    Polars is gaining traction, so we should have an example. This PR adds an example that matches our hello world. It requires one adjustment to the extract_columns decorator to work. Otherwise, the user currently has to create their own build result function. That seems fine for now, but it's something we could house in an sf-hamilton-polars package.

    Changes

    1. prototypes change to extract_columns to handle providing the dataframe types.
    2. adds example polars code.

    How I tested this

    1. Tested this locally.

    Notes

    Checklist

    • [ ] PR has an informative and human-readable title (this will be pulled into the release notes)
    • [ ] Changes are limited to a single goal (no scope creep)
    • [ ] Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
    • [ ] Any change in functionality is tested
    • [ ] New functions are documented (with a description, list of inputs, and expected output)
    • [ ] Placeholder code is flagged / future TODOs are captured in comments
    • [ ] Project documentation has been updated if adding/changing functionality.
    opened by skrawcz 1
  • Parameterized extract - WIP, needs more testing

    [Short description explaining the high-level reason for the pull request]

    Changes

    How I tested this

    Notes

    Checklist

    • [ ] PR has an informative and human-readable title (this will be pulled into the release notes)
    • [ ] Changes are limited to a single goal (no scope creep)
    • [ ] Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
    • [ ] Any change in functionality is tested
    • [ ] New functions are documented (with a description, list of inputs, and expected output)
    • [ ] Placeholder code is flagged / future TODOs are captured in comments
    • [ ] Project documentation has been updated if adding/changing functionality.
    opened by elijahbenizzy 0
  • Clarify behavior of decorator ordering

    We need to make clear our philosophy and resolution method for functions such as:

    @extract_fields({'out_value1': int, 'out_value2': str})
    @tag(test_key="test-value")
    @check_output(data_type=dict, importance="fail")
    @does(_dummy)
    def uber_decorated_function(in_value1: int, in_value2: str) -> dict:
        pass
    

    Right now it is not clear, nor obvious.

    Current behavior

    This is what the graph looks like:

    [Screenshot of the resulting graph, taken 2022-12-17]

    So it would be unexpected to see check_output over the output of extract_fields.

    Steps to replicate behavior

    Function code:

    def _dummy(**values) -> dict:
        return {f"out_{k.split('_')[1]}": v for k, v in values.items()}
    
    
    @extract_fields({'out_value1': int, 'out_value2': str})
    @tag(test_key="test-value")
    @check_output(data_type=dict, importance="fail")
    @does(_dummy)
    def uber_decorated_function(in_value1: int, in_value2: str) -> dict:
        pass
    

    Expected behavior

    check_output should probably operate over what's directly underneath it. Should tag similarly apply to everything, or just to what's underneath it? Should does apply to uber_decorated_function, with extract_fields being the last thing applied?
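As background for whatever resolution is chosen: Python itself applies stacked decorators bottom-up, so `@does` wraps the function first and `@extract_fields` last. The stdlib sketch below uses a hypothetical `record` decorator in place of Hamilton's to show that order; how Hamilton then resolves the nodes each decorator creates is a separate question this does not answer.

```python
from typing import Callable, List

applied: List[str] = []


def record(label: str) -> Callable:
    """Hypothetical decorator factory that logs the order decorators are applied in."""

    def decorator(fn: Callable) -> Callable:
        applied.append(label)
        return fn

    return decorator


@record("extract_fields")
@record("tag")
@record("check_output")
@record("does")
def uber_decorated_function(in_value1: int, in_value2: str) -> dict:
    pass


# applied is now ["does", "check_output", "tag", "extract_fields"]
```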

    Additional context

    Thoughts: can we create a linter that reorders decorators?

    triage 
    opened by skrawcz 0
  • hamilton --init to get started

    Is your feature request related to a problem? Please describe. New folks might want to get started in an existing repo. New DS/college students could use hamilton to get started on a simple modeling project...

    Describe the solution you'd like

    hamilton init
    # Creates a basic project structure with some functions + hamilton files
    
    hamilton init --project=hello_world 
    # Creates the hello_world example
    
    hamilton init --project=recommendations_stack
    # Creates the scaffolding for a rec-stack example
    
    hamilton init --project=web-service
    # Creates the scaffolding for a flask app
    
    hamilton init kaggle --kaggle-competition=...
    # Maybe we could create a template from a kaggle competition?
    

    Additional context I've been messing around with dbt, and they have this.

    enhancement onboarding 
    opened by elijahbenizzy 0
Releases(sf-hamilton-1.13.0)
  • sf-hamilton-1.13.0(Jan 2, 2023)

    What's Changed

    • Updates bug hunters by @skrawcz in https://github.com/stitchfix/hamilton/pull/261
    • Fixes reusing_functions example by @skrawcz in https://github.com/stitchfix/hamilton/pull/262
    • Adds telemetry by @skrawcz in https://github.com/stitchfix/hamilton/pull/255
    • Bumps version to 1.13.0 by @skrawcz in https://github.com/stitchfix/hamilton/pull/264

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.12.0...sf-hamilton-1.13.0

  • sf-hamilton-1.12.0(Dec 27, 2022)

    What's Changed

    • [ci] remove 'test_suite' in setup.py by @jameslamb in https://github.com/stitchfix/hamilton/pull/257
    • subdag modifier + tag_outputs + refactor by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/237
    • Updates hamilton version to 1.12.0 by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/258

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.11.1...sf-hamilton-1.12.0

  • sf-hamilton-1.11.1(Dec 20, 2022)

    What's Changed

    • Adds Alaa Abedrabbo to bug hunters list by @skrawcz in https://github.com/stitchfix/hamilton/pull/215
    • Makes the fastapi async example more robust by @skrawcz in https://github.com/stitchfix/hamilton/pull/211
    • remove unnecessary code in tests by @jameslamb in https://github.com/stitchfix/hamilton/pull/221
    • fix a few flake8 warnings by @jameslamb in https://github.com/stitchfix/hamilton/pull/220
    • remove unused imports by @jameslamb in https://github.com/stitchfix/hamilton/pull/218
    • [ci] enforce flake8 (fixes #161) by @jameslamb in https://github.com/stitchfix/hamilton/pull/222
    • simplify pull request template by @jameslamb in https://github.com/stitchfix/hamilton/pull/219
    • [docs] restructure 'how to contribute' in developer guide by @jameslamb in https://github.com/stitchfix/hamilton/pull/223
    • [ci] move CI logic into shell scripts (fixes #114) by @jameslamb in https://github.com/stitchfix/hamilton/pull/225
    • Fix whitespace in readme for CI by @skrawcz in https://github.com/stitchfix/hamilton/pull/232
    • DBT + Hamilton Example by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/236
    • Fixes dbt integration to be much cleaner using FAL integration by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/239
    • dbt Hamilton example update requirements and readme by @datarshreya in https://github.com/stitchfix/hamilton/pull/241
    • PandasDataFrameResult: Convert non-list values to single row frame by @ianhoffman in https://github.com/stitchfix/hamilton/pull/243
    • Alternate fix for does by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/247
    • Adjusts index type check warnings by @skrawcz in https://github.com/stitchfix/hamilton/pull/246
    • Bumps version to 1.11.1 by @skrawcz in https://github.com/stitchfix/hamilton/pull/250

    New Contributors

    • @datarshreya made their first contribution in https://github.com/stitchfix/hamilton/pull/241
    • @ianhoffman made their first contribution in https://github.com/stitchfix/hamilton/pull/243

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.11.0...sf-hamilton-1.11.1

  • sf-hamilton-1.11.0 (Oct 21, 2022)

    What's Changed

    • Black formatting by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/184
    • [ci] remove unnecessary f-strings by @jameslamb in https://github.com/stitchfix/hamilton/pull/190
    • Fixes ray workflow adapter to work with Ray 2.0 by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/189
    • Fixes for @does by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/187
    • fixes inconsistent parameterize documentation to use correct helper f… by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/192
    • Very minor spelling fix in README by @AAbedrabbo in https://github.com/stitchfix/hamilton/pull/202
    • Uses fixed instead of relative import for async example by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/204
    • Adding container image for running examples by @bovem in https://github.com/stitchfix/hamilton/pull/209
    • Adds more instructions for using and running docker by @skrawcz in https://github.com/stitchfix/hamilton/pull/210
    • Add pandas index checks by @skrawcz in https://github.com/stitchfix/hamilton/pull/200
    • Tweaks warning message from pandas index check by @skrawcz in https://github.com/stitchfix/hamilton/pull/213
    • Bump version to 1.11.0 by @skrawcz in https://github.com/stitchfix/hamilton/pull/212

    New Contributors

    • @AAbedrabbo made their first contribution in https://github.com/stitchfix/hamilton/pull/202
    • @bovem made their first contribution in https://github.com/stitchfix/hamilton/pull/209

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.10.0...sf-hamilton-1.11.0

  • sf-hamilton-1.10.0 (Aug 20, 2022)

    What's Changed

    • Fixes DAG construction slowness by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/169
    • Async implementation of driver/adapter by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/171
    • Fixes mistaken superclass call by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/177
    • Full parametrized decorator by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/163
    • Adds Nullable validator by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/176
    • Adds instructions on how to push to Anaconda by @skrawcz in https://github.com/stitchfix/hamilton/pull/157
    • Adds union support check when passing in inputs by @skrawcz in https://github.com/stitchfix/hamilton/pull/173
    • Release for 1.10.0 by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/182

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.9.0...sf-hamilton-1.10.0

  • sf-hamilton-1.9.0 (Jul 14, 2022)

    What's Changed

    • Data quality by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/115
    • Fixes whitespace that was failing circleci by @skrawcz in https://github.com/stitchfix/hamilton/pull/152
    • Bumps version to 1.9.0 by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/154

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.8.0...sf-hamilton-1.9.0

  • sf-hamilton-1.8.0 (Jul 3, 2022)

    What's Changed

    • Adds the ability to pass functions not defined in modules - fixes #134 by @skrawcz in https://github.com/stitchfix/hamilton/pull/145

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.7.1...sf-hamilton-1.8.0

  • sf-hamilton-1.7.1 (Jun 27, 2022)

    What's Changed

    • Fixes SimplePythonDataFrameGraphAdapter.check_input_type by @skrawcz in https://github.com/stitchfix/hamilton/pull/136
    • Fixes case where optional user inputs broke computation by @skrawcz in https://github.com/stitchfix/hamilton/pull/133
    • Switches documentation to point to Slack instead of Discord by @skrawcz in https://github.com/stitchfix/hamilton/pull/141

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.7.0...sf-hamilton-1.7.1

  • sf-hamilton-1.7.0 (May 2, 2022)

    What's Changed

    • Adds Ray Workflow Graph Adapter - implements #67 by @skrawcz in https://github.com/stitchfix/hamilton/pull/108
    • Implements tags by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/83
    • Exposes passing kwargs to graphviz object by @skrawcz in https://github.com/stitchfix/hamilton/pull/125

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.6.0...sf-hamilton-1.7.0

  • sf-hamilton-1.6.0 (Apr 12, 2022)

    What's Changed

    • remove unused loggers by @jameslamb in https://github.com/stitchfix/hamilton/pull/103
    • fix minor formatting and grammar issues in docs by @jameslamb in https://github.com/stitchfix/hamilton/pull/102
    • Adds parameterized_inputs by @skrawcz in https://github.com/stitchfix/hamilton/pull/104
    • Fixes parameterized_inputs typo by @skrawcz in https://github.com/stitchfix/hamilton/pull/109
    • Bumps version to 1.6.0 by @skrawcz in https://github.com/stitchfix/hamilton/pull/110

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.5.1...sf-hamilton-1.6.0

  • sf-hamilton-1.5.1 (Apr 11, 2022)

    What's Changed

    • Adds our awesome bug-finders to the contributors list by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/99
    • Adds message about discord channel for help by @skrawcz in https://github.com/stitchfix/hamilton/pull/100
    • remove duplicate functools import by @jameslamb in https://github.com/stitchfix/hamilton/pull/101
    • Fixes whitespace in contributing.md by @skrawcz in https://github.com/stitchfix/hamilton/pull/105
    • Bumps version from 1.5.0 to 1.5.1 by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/106

    This release also includes support for optionals, because it was cut non-standardly (from a local build).

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.5.0...sf-hamilton-1.5.1

  • sf-hamilton-1.5.0 (Mar 25, 2022)

    What's Changed

    • Adds READMEs to examples by @skrawcz in https://github.com/stitchfix/hamilton/pull/73
    • Adds more links to the Discord channel by @skrawcz in https://github.com/stitchfix/hamilton/pull/74
    • Adds Christopher Prohm as a Contributor by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/71
    • remove unused imports by @jameslamb in https://github.com/stitchfix/hamilton/pull/76
    • Adds James Lamb as a contributor by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/78
    • fix typos in docs and comments by @jameslamb in https://github.com/stitchfix/hamilton/pull/80
    • remove dependency on pytest-runner by @jameslamb in https://github.com/stitchfix/hamilton/pull/81
    • remove dependency on pytest-assume by @jameslamb in https://github.com/stitchfix/hamilton/pull/82
    • Here's a numpy example from a numpy tutorial on doing AQI by @skrawcz in https://github.com/stitchfix/hamilton/pull/79
    • simplify pull request template by @jameslamb in https://github.com/stitchfix/hamilton/pull/90
    • remove support for 'python setup.py test' by @jameslamb in https://github.com/stitchfix/hamilton/pull/89
    • Extract-columns double-execution bug by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/93
    • Handle TypeVar types and enables most common generics support by @skrawcz in https://github.com/stitchfix/hamilton/pull/94
    • Fixes has_cycles behavior by @skrawcz in https://github.com/stitchfix/hamilton/pull/95
    • Bumps version to 1.5.0 by @skrawcz in https://github.com/stitchfix/hamilton/pull/97

    New Contributors

    • @jameslamb made their first contribution in https://github.com/stitchfix/hamilton/pull/76

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.4.0...sf-hamilton-1.5.0

  • sf-hamilton-1.4.0 (Feb 10, 2022)

    What's Changed -- including 1.3.0 for posterity

    • Adds release methodology instructions by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/30
    • Adds documentation showing scalar creation & input by @skrawcz in https://github.com/stitchfix/hamilton/pull/31
    • Brings distributed execution and optional freedom from pandas by @skrawcz in https://github.com/stitchfix/hamilton/pull/47
    • Implements opinionated decorator lifecycle by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/28
    • Changes the way drivers handle parameters by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/56
    • Stefan/refactor visualization by @skrawcz in https://github.com/stitchfix/hamilton/pull/58
    • Renames columns to outputs in driver & ResultMixin build_result function by @skrawcz in https://github.com/stitchfix/hamilton/pull/61
    • 1.3.0 release by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/62
    • Replaces executor with adapter in places we missed by @skrawcz in https://github.com/stitchfix/hamilton/pull/63
    • Adds simple case to help motivate @extract_fields by @skrawcz in https://github.com/stitchfix/hamilton/pull/66
    • Bumps version 1.4.0 by @skrawcz in https://github.com/stitchfix/hamilton/pull/70

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.2.0...sf-hamilton-1.4.0

  • sf-hamilton-1.2.0 (Dec 14, 2021)

    What's Changed

    • Fixes setup.py to enable pushing to pypi by @skrawcz in https://github.com/stitchfix/hamilton/pull/14
    • Add release docs by @skrawcz in https://github.com/stitchfix/hamilton/pull/15
    • Adds link to blog post for hamilton by @elijahbenizzy in https://github.com/stitchfix/hamilton/pull/16
    • Adds issue templates by @skrawcz in https://github.com/stitchfix/hamilton/pull/20
    • Make graphviz optional by @ivirshup in https://github.com/stitchfix/hamilton/pull/27

    New Contributors

    • @ivirshup made their first contribution in https://github.com/stitchfix/hamilton/pull/27

    Full Changelog: https://github.com/stitchfix/hamilton/compare/sf-hamilton-1.1.1...sf-hamilton-1.2.0

  • sf-hamilton-1.1.1 (Oct 13, 2021)

    Patches

    • Touches up the package settings: stitchfix/hamilton#13
    • Open Source: stitchfix/hamilton#8
    • Parametrized inputs: stitchfix/hamilton#5

Owner: Stitch Fix Technology (Engineering, Analytics, and Data Science at Stitch Fix)