
Overview

Visions


And these visions of data types, they kept us up past the dawn.

Visions provides an extensible suite of tools to support common data analysis operations including

  • type inference on unknown data
  • casting data types
  • automated data summarization

https://github.com/dylan-profiler/visions/raw/develop/docsrc/source/_static/side-by-side.png
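
A minimal usage sketch of detection and inference (the import paths and helper names below, such as infer_type and StandardSet, follow the examples in the issues further down and may differ between versions; consult the documentation for the current API):

import pandas as pd
import visions
from visions import StandardSet              # typeset of the default types
from visions.functional import infer_type    # import path assumed; see the docs

# Membership test: does this series satisfy a given visions type?
series = pd.Series([1, 2, 3])
print(series in visions.Integer)             # expected: True

# Type inference over a DataFrame with a typeset (as in the issues below)
df = pd.DataFrame({"x": ["1.5", "2.0", "3.25"]})
print(infer_type(df, StandardSet()))         # e.g. {'x': Float}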

Documentation

Full documentation can be found here.

Installation

You can install visions via pip:

pip install visions

Alternatives and more details can be found in the documentation.

Supported frameworks

These frameworks are supported out-of-the-box in addition to native Python types:

https://github.com/dylan-profiler/visions/raw/develop/docsrc/source/_static/frameworks.png

  • Numpy
  • Pandas
  • Spark

Contributing and support

Contributions to visions are welcome. For more information, please visit the Community contributions page. The GitHub issues tracker is used for reporting bugs, feature requests and support questions.

Acknowledgements

This package is part of the dylan-profiler project. The package is a core component of pandas-profiling. More information can be found here. This work was partially supported by SIDN Fonds.

https://github.com/dylan-profiler/visions/raw/master/SIDNfonds.png

Comments
  • Numpy backend

    Numpy backend

    Summary

    This PR adds complete numpy backend support for the StandardSet of types. The type implementations are fully compatible with their pandas equivalents, with the exception of the object type.

    Caveats

    Whereas pandas provides support for Optional[int] and Optional[bool], numpy doesn't; in order to support those types by default, I was forced to make object completely disjoint from any other concrete type. A similar story plays out for datetime objects with timezones, which also default to object in numpy.
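
    For reference, this is standard pandas/numpy behaviour rather than anything visions-specific; a quick illustration:

    import numpy as np
    import pandas as pd

    # pandas offers nullable integer and boolean extension dtypes...
    pd.array([1, None], dtype="Int64")        # IntegerArray: [1, <NA>]
    pd.array([True, None], dtype="boolean")   # BooleanArray: [True, <NA>]

    # ...numpy does not: mixing ints or bools with None falls back to object
    np.array([1, None]).dtype                 # dtype('O')
    np.array([True, None]).dtype              # dtype('O')

    # timezone-aware datetimes behave the same way
    pd.array([pd.Timestamp("2021-01-01", tz="UTC")]).dtype   # datetime64[ns, UTC]
    np.array([pd.Timestamp("2021-01-01", tz="UTC")]).dtype   # dtype('O')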

    opened by ieaves 8
  • API and Usage

    API and Usage

    (This is part of the review in openjournals/joss-reviews#2145)

    Hi @sbrugman,

    I am currently going through the package and I find it a very interesting project. The type inference that already exists in built-in Python is frequently not enough, and I often find myself writing my own functions for it on a case-by-case basis. So, it is nice to see that this is being done.

    However, I do find myself having quite a bit of trouble using the package effectively. Funnily enough, this is mostly caused by the (in my opinion) confusing naming schemes and the structure of the namespace. It may require a bit of effort to solve (not to mention that it might create incompatibilities with previous versions), but I believe that it will greatly improve the user experience when fixed.

    So, the main problem in my opinion is that almost all definitions are stored in their own subpackages in the visions.core subpackage, often separated as well over an implementations and a model subpackage. This means that in order to access a definition, let's say VisionsTypeset, I have to import this from visions.core.model.typeset, while pre-defined typesets have to be imported from visions.core.implementations.typesets. In my opinion, it is incredibly confusing that these are stored in different subpackages/submodules, as they are related to the same thing, namely typesets, and I would expect to find all of these definitions in a visions.typesets subpackage. Preferably, all definitions the average user would use should be available either at root (visions) or a single level deep (visions.xxx).

    I also noticed that almost all definitions have the word visions in their name. I get the feeling that the reason for this is to avoid namespace clashes when someone uses a wildcard import (from visions import *). However, as wildcard imports are heavily discouraged in the community, this leads to the user writing the word visions at least twice for every definition used (for example, using the visions_integer type requires me to write visions.visions_integer, which could be simplified to visions.integer or even visions.int).

    Finally, I am not entirely sure if this has to do with the online documentation being outdated as mentioned in #21, but according to the example here, a visions_standard_set object has a method called detect_type_series. In v0.2.3, this object has neither a method called detect_type_series nor one called type_detect_series (the name that the stand-alone function has in visions.core.functional); instead, the method is called detect_series_type. If possible, could you check and make sure that the methods and stand-alone functions use consistent naming schemes?

    Please let me know if you have any questions.

    enhancement 
    opened by 1313e 7
  • Recommended stack overflow tag for questions

    Recommended stack overflow tag for questions

    I have a question about how to use the library. I considered opening an issue, but I see in the documentation that you recommend asking questions about how to use the package on Stack Overflow. Is there a tag that you'd suggest people use when asking questions there? I don't see anything with visions as a tag, but maybe I'm just the first person to ask a question over there.

    If you think visions would be a good tag choice it would make sense to update the stack overflow ask a question link to pre-populate the question with the tag (https://stackoverflow.com/questions/ask?tags=visions).

    Thanks!

    enhancement 
    opened by sterlinm 6
  • Automate building of documentation

    Automate building of documentation

    The documentation should be rebuilt at every merge. The diffs caused by the built documentation clutter code reviews. The steps to build the documentation are simple and can be automated.

    Suggested solution via Github Actions (e.g.): https://github.com/marketplace/actions/sphinx-build https://github.com/ammaraskar/sphinx-action-test/blob/master/.github/workflows/default.yml

    enhancement 
    opened by sbrugman 6
  • Please push an updated version to pypi to correct dependency on attrs not attr

    Please push an updated version to pypi to correct dependency on attrs not attr

    Describe the bug
    visions uses the @attr.s decorator, which comes from the attrs package, not the attr package. The master version of visions has the correct dependency, but the PyPI versions do not.

    Additional context
    When using pandas_profiling, which depends on visions, I got the following error:

    AttributeError: module 'attr' has no attribute 's'
    

    which led me to post this issue.

    bug 
    opened by proinsias 5
  • Version 0.7.5

    Version 0.7.5

    0.7.5 Includes:

    • Fixes to numpy backend for complex, object, email_address, URL, boolean
    • Support for new versions of pandas ABCIndex class (previously called ABCIndexClass)
    • Updated tests for numpy backend
    • Automated Github Actions unit tests on PR
    • Additional documentation
    opened by ieaves 4
  • fail to pass the test with 0.6.1 release

    fail to pass the test with 0.6.1 release

    Describe the bug
    The tests fail with the 0.6.1 release.

    To Reproduce
    Steps to reproduce the behavior:

    python setup.py build
    PYTHONPATH=build/lib pytest -v
    

    Expected behavior
    All tests pass.

    Additional context
    Error log:

    =================================== FAILURES ===================================
    _____________________ test_contains[file_mixed_ext x File] _____________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: file_mixed_ext, dtype: object
    type = File, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: file_mixed_ext in File; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    _______________________ test_contains[image_png x File] ________________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = File, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png in File; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    _______________________ test_contains[image_png x Image] _______________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = Image, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png in Image; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    ___________________ test_contains[image_png_missing x File] ____________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = File, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png_missing in File; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    ___________________ test_contains[image_png_missing x Image] ___________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = Image, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png_missing in Image; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    _____________ test_inference[file_mixed_ext x File expected True] ______________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: file_mixed_ext, dtype: object
    type = File, typeset = CompleteSet, difference = False
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of file_mixed_ext expected File to be True (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    _____________ test_inference[file_mixed_ext x Path expected False] _____________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: file_mixed_ext, dtype: object
    type = Path, typeset = CompleteSet, difference = True
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of file_mixed_ext expected Path to be False (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    _______________ test_inference[image_png x Image expected True] ________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = Image, typeset = CompleteSet, difference = False
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png expected Image to be True (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    _______________ test_inference[image_png x Path expected False] ________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = Path, typeset = CompleteSet, difference = True
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png expected Path to be False (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    ___________ test_inference[image_png_missing x Image expected True] ____________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = Image, typeset = CompleteSet, difference = False
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png_missing expected Image to be True (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    ___________ test_inference[image_png_missing x Path expected False] ____________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = Path, typeset = CompleteSet, difference = True
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png_missing expected Path to be False (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    =============================== warnings summary ===============================
    tests/test_root.py::test_multiple_roots
      /build/python-visions/src/visions-0.6.1/build/lib/visions/typesets/typeset.py:88: UserWarning: {Generic} were isolates in the type relation map and consequently orphaned. Please add some mapping to the orphaned nodes.
        warnings.warn(message)
    
    tests/test_summarization.py::test_complex_missing_summary
      /usr/lib/python3.8/site-packages/numpy/core/_methods.py:47: ComplexWarning: Casting complex values to real discards the imaginary part
        return umr_sum(a, axis, dtype, out, keepdims, initial, where)
    
    -- Docs: https://docs.pytest.org/en/stable/warnings.html
    =========================== short test summary info ============================
    FAILED tests/typesets/test_complete_set.py::test_contains[file_mixed_ext x File]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png x File]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png x Image]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png_missing x File]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png_missing x Image]
    FAILED tests/typesets/test_complete_set.py::test_inference[file_mixed_ext x File expected True]
    FAILED tests/typesets/test_complete_set.py::test_inference[file_mixed_ext x Path expected False]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png x Image expected True]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png x Path expected False]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png_missing x Image expected True]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png_missing x Path expected False]
    ================= 11 failed, 8954 passed, 2 warnings in 15.94s =================
    

    see also the complete build log here

    bug 
    opened by hubutui 4
  • No module named 'shapely'

    No module named 'shapely'

    (This is part of the review in openjournals/joss-reviews#2145)

    When I try to execute the example given here, I get an error stating No module named 'shapely'. I see that this is a dependency of visions, but it is only listed in requirements_test.txt. You probably have to add this requirement to requirements.txt as well.

    PS: Currently, the requirements of the package are listed both in their own separate file and in the setup.py file. To avoid confusion for yourself, it is probably better to use only one of the two. You can read in a requirements file and use it in the setup.py file by using:

    # Get the requirements list
    with open('requirements.txt', 'r') as f:
        requirements = f.read().splitlines()
    

    Keep in mind that it is possible to link different requirements files together. For example, you can link requirements.txt and requirements_dev.txt together by adding the line -r requirements.txt to the top of requirements_dev.txt. This means that installing the requirements of requirements_dev.txt will use both files. This however won't work if you parse the file in a setup.py file. In that case, you can simply read both files and append them together if necessary.
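
    For example, a minimal setup.py sketch along those lines (the package name and file layout are hypothetical, not taken from this repository):

    # setup.py -- reads requirements.txt and requirements_dev.txt directly,
    # skipping "-r" include lines and comments
    from setuptools import setup, find_packages

    def read_requirements(path):
        with open(path) as f:
            return [
                line.strip()
                for line in f
                if line.strip() and not line.startswith(("-r", "#"))
            ]

    setup(
        name="mypackage",  # hypothetical
        packages=find_packages(),
        install_requires=read_requirements("requirements.txt"),
        extras_require={
            "dev": read_requirements("requirements.txt")
            + read_requirements("requirements_dev.txt"),
        },
    )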

    opened by 1313e 4
  • Bump pyarrow from 1.0.1 to 5.0.0

    Bump pyarrow from 1.0.1 to 5.0.0

    Bumps pyarrow from 1.0.1 to 5.0.0.

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 3
  • all nulls should be inferred as generic

    all nulls should be inferred as generic

    I don't think this should be the expected behavior.

    In [41]: infer_type(pd.DataFrame({'x':['', '']}), StandardSet())
    Out[41]: {'x': DateTime}
    
    In [39]: infer_type(pd.DataFrame({'x':[None, None]}), StandardSet())
    Out[39]: {'x': Boolean}
    
    bug 
    opened by majidaldo 3
  • comma separator

    comma separator

    New functionality

    • comma separator handling for string digits
    • new utility functionality for working with missing values
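
    As a rough illustration of what the comma-separator item above means (plain pandas string handling, not the visions API):

    import pandas as pd

    s = pd.Series(["1,000", "2,500", "10"])
    # strip the thousands separator before casting to an integer dtype
    cast = s.str.replace(",", "", regex=False).astype("int64")
    print(cast.tolist())   # [1000, 2500, 10]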

    Major Proposed Changes

    • Integer should be a strict subset of Float
    opened by ieaves 3
  • Add CodeQL workflow for GitHub code scanning

    Add CodeQL workflow for GitHub code scanning

    Hi dylan-profiler/visions!

    This is a one-off automatically generated pull request from LGTM.com :robot:. You might have heard that we’ve integrated LGTM’s underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository — take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!

    FAQ


    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request — to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches — this keeps the analysis results on your repository’s Security tab up to date.
    • Once a week at a fixed time — to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesn’t seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by lgtm-com[bot] 0
  • Sktime semantic data types for time series & vision

    Sktime semantic data types for time series & vision

    I've recently been made aware of this excellent and imo much needed library by @lmmentel.

    The reason is its similarity to the datatypes module of sktime, which introduces semantic typing for time series related data types - we distinguish "mtypes" (machine representations) and "scitypes" (scientific types, what visions calls semantic type). More details here as reference.

    Few questions for visions devs:

    • time series are known to be a notoriously splintered field in terms of data representation, and even more when it comes to learning tasks (as in your ML example). Do you see visions moving in the direction of typing for ML?
    • would you have time to look into the sktime datatypes module and assess how similar this is to visions? If similar, we might be tempted to take a dependency on visions and contribute. Key features are mtype conversions, scitype inference, checks that also return metadata (e.g., number of time stamps in a series, which can be represented 4 different ways)
    enhancement 
    opened by fkiraly 7
  • src/visions/types/url.py passes non URLs

    src/visions/types/url.py passes non URLs

    src/visions/types/url.py does not correctly validate URLs.

    First, the example code (lines 14--19) from the docs does not return True:

    Python 3.9.4 (default, Apr  9 2021, 09:32:38)
    [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import visions
    >>> from urllib.parse import urlparse
    >>> urls = ['http://www.cwi.nl:80/%7Eguido/Python.html', 'https://github.com/pandas-profiling/pandas-profiling']
    >>> x = [urlparse(url) for url in urls]
    >>> x in visions.URL
    False
    >>> x
    [ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', params='', query='', fragment=''), ParseResult(scheme='https', netloc='github.com', path='/pandas-profiling/pandas-profiling', params='', query='', fragment='')]
    >>> import pkg_resources
    >>> pkg_resources.get_distribution("visions").version
    '0.7.4'
    

    Second, non URLs are passing:

    >>> urlparse('junk') in visions.URL
    True
    >>>
    

    The code should probably check something like the following for each element of x:

        try:
            result = urlparse(x)
            return all([result.scheme, result.netloc])
        except:
            return False
    

    Finally, and this is a suggested enhancement, I think the behavior would be more useful if it handled raw strings and did the parsing internally without the caller having to supply a parser:

    urls = ['http://www.cwi.nl:80/%7Eguido/Python.html', 'https://github.com/pandas-profiling/pandas-profiling']
    >>> urls in visions.URL
    True
    
    bug 
    opened by leapingllamas 3
  • How to check if a type is/is_not parent of another type ?

    How to check if a type is/is_not parent of another type ?

    Following the example of "Problem type inference":

    (figure: relation graph of the typeset from the example)

    From one dataframe, I have already made a list of types, one for each column. Here is the type_list:

    [Discrete,
     Nominal,
     Discrete,
     Nominal,
     Nominal,
     Nominal,
     Nominal,
     Nominal,
     Nominal,
     Binary,
     Discrete,
     Discrete,
     Discrete,
     Nominal,
     Binary]
    

    type(type_list[0]) gives visions.types.type.VisionsBaseTypeMeta

    Now, I want to check whether each type has Categorical or Numeric as a parent type.

    for column, t in zip(dataframe.columns, type_list):
        if is_type_parent_of_categorical(t):
            category_job(dataframe[column])

    # Binary is a child of Categorical
    is_type_parent_of_categorical(type_list[14])  # -> True

    # Discrete is a child of Numeric
    is_type_parent_of_categorical(type_list[0])   # -> False
    

    How should I implement is_type_parent_of_categorical ?

    My workaround seems to work because of string comparison:

    def is_type_parent_of_categorical(visions_type):
        type_str = str(visions_type)
        if type_str in ["Categorical", "Ordinal", "Nominal", "Binary"]:
            return True
        return False
    
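    One alternative to comparing type names as strings, sketched under the assumption that the typeset from the example exposes its type relations as a networkx DiGraph (the relation_graph attribute name below is an assumption and may differ between visions versions):

    import networkx as nx

    def has_ancestor(visions_type, ancestor, typeset):
        """True if `ancestor` lies on the relation path leading to `visions_type`."""
        graph = typeset.relation_graph   # assumed attribute; check your version
        return ancestor in nx.ancestors(graph, visions_type)

    # e.g. has_ancestor(type_list[14], Categorical, typeset)  -> expected True (Binary)
    #      has_ancestor(type_list[0], Categorical, typeset)   -> expected False (Discrete)
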
    enhancement 
    opened by ttpro1995 2
  • function: 'lowest' common type

    function: 'lowest' common type

    Sometimes going through a whole array is not needed. You have the types of the subsets of the array and you just want to get a compatible data type for all subsets.

    A common scenario when assembling horrible csvs is that the same column might be inferred as different types in different csvs, for example float in one and int in another (float <-- int). The worst case is to 'fall back' to string.
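
    A possible shape for such a helper, sketched under the same assumption as above that the typeset exposes a networkx relation graph rooted at Generic (the attribute name is assumed, not the library's confirmed API):

    import networkx as nx

    def common_type(types, typeset):
        """Most specific type that is an ancestor of (or equal to) every given type."""
        graph = typeset.relation_graph          # assumed attribute name
        types = iter(types)
        common = next(types)
        for t in types:
            common = nx.lowest_common_ancestor(graph, common, t)
        return common

    # e.g. common_type([Integer, Float], StandardSet()) would ideally give Float,
    # falling back to String/Object for genuinely incompatible chunks.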

    enhancement 
    opened by majidaldo 2
Releases (v0.7.5)
  • v0.7.5(Dec 5, 2021)

  • v0.7.4(Sep 27, 2021)

  • v0.7.2(Sep 27, 2021)

  • v0.7.1(Feb 4, 2021)

  • v0.7.0(Jan 5, 2021)

  • v0.6.4(Oct 17, 2020)

    • ENH: swifter apply for pandas backend
    • FIX: fix for issue #147
    • ENH: __version__ attribute made available
    • ENH: improved typing and CI
    • ENH: contrib types/typesets for a low-threshold contribution of types

    Source code(tar.gz)
    Source code(zip)
  • v0.6.1(Oct 11, 2020)

    • ENH: Expose state using typeset.detect and typeset.infer
    • ENH: plotting of typesets improved
    • FIX: fix and test cases for #136
    • CLN: pre-commit with black, isort, pyupgrade, flake8
    • ENH: type relations are now accessible by type (e.g. Float.relations[Integer])

    Source code(tar.gz)
    Source code(zip)
  • v0.6.0(Sep 22, 2020)

  • v0.5.1(Sep 22, 2020)

    • Introduce stateful type inference and casting
    • Expose test utils to users and fix diagnostic information
    • Integer consistency for the standard set
    • Use pd.BooleanDtype for newer versions of pandas
    • Latest black formatting
    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Aug 16, 2020)

    API breaking changes:

    • migration to single dispatch on typeset methods
    • updated API to unify detect / infer / cast against Series and DataFrames
    • improvements to boolean type
    Source code(tar.gz)
    Source code(zip)
  • v0.4.6(Jul 28, 2020)

  • v0.4.5(Jul 28, 2020)

  • v0.4.4(May 11, 2020)

  • v0.4.3(May 10, 2020)

  • v0.4.2(May 10, 2020)

    Support for Files and Images, rewritten summarization functions

    • Renamed ExistingPath to File
    • Renamed ImagePath to Image
    • Version bump to 0.4.2
    • Summaries: return series instead of dict
    • Categorical: unicode counts are now based on the original character distribution instead of the unique characters, which are used as an intermediate step for increased performance.
    • Categorical: aggregate functions are included for string length (min, max, mean, median).
    • Path: number of unique values for the path parts are returned
    • Image: make Exif and Hash calculations optional. Also return width, height and area.
    • File: in addition to the file_size, return creation, modification and access time (which were already returned).
    Source code(tar.gz)
    Source code(zip)