Type-safe YAML parser and validator.

Overview

StrictYAML

StrictYAML is a type-safe YAML parser that parses and validates a restricted subset of the YAML specification.

Priorities:

  • Beautiful API
  • Refusing to parse the ugly, hard to read and insecure features of YAML like the Norway problem.
  • Strict validation of markup and straightforward type casting.
  • Clear, readable exceptions with code snippets and line numbers.
  • Acting as a near drop-in replacement for pyyaml, ruamel.yaml or poyo.
  • Ability to read in YAML, make changes and write it out again with comments preserved.
  • Not speed, currently.

Simple example:

# All about the character
name: Ford Prefect
age: 42
possessions:
- Towel

from strictyaml import load, Map, Str, Int, Seq, YAMLError

Default parse result:

>>> load(yaml_snippet)
YAML({'name': 'Ford Prefect', 'age': '42', 'possessions': ['Towel']})

All data is string, list or OrderedDict:

>>> load(yaml_snippet).data
{'name': 'Ford Prefect', 'age': '42', 'possessions': ['Towel']}

Quickstart with schema:

from strictyaml import load, Map, Str, Int, Seq, YAMLError

schema = Map({"name": Str(), "age": Int(), "possessions": Seq(Str())})

42 is now parsed as an integer:

>>> person = load(yaml_snippet, schema)
>>> person.data
{'name': 'Ford Prefect', 'age': 42, 'possessions': ['Towel']}

A YAMLError will be raised if there are syntactic problems, violations of your schema or use of disallowed YAML features:

# All about the character
name: Ford Prefect
age: 42

For example, a schema violation:

try:
    person = load(yaml_snippet, schema)
except YAMLError as error:
    print(error)

while parsing a mapping
  in "<unicode string>", line 1, column 1:
    # All about the character
     ^ (line: 1)
required key(s) 'possessions' not found
  in "<unicode string>", line 3, column 1:
    age: '42'
    ^ (line: 3)

If parsed correctly:

from strictyaml import load, Map, Str, Int, Seq, YAMLError, as_document

schema = Map({"name": Str(), "age": Int(), "possessions": Seq(Str())})

You can modify values and write out the YAML with comments preserved:

person = load(yaml_snippet, schema)
person['age'] = 43
print(person.as_yaml())

# All about the character
name: Ford Prefect
age: 43
possessions:
- Towel

As well as look up line numbers:

>>> person = load(yaml_snippet, schema)
>>> person['possessions'][0].start_line
5

And construct YAML documents from dicts or lists:

print(as_document({"x": 1}).as_yaml())
x: 1

Install

$ pip install strictyaml

Why StrictYAML?

There are a number of formats and approaches that can achieve more or less the same purpose as StrictYAML. I've tried to make it the best one. Below is a series of documented justifications:

Using StrictYAML

How to:

Compound validators:

Scalar validators:

Restrictions:

Design justifications

There are some design decisions in StrictYAML which are controversial and/or not obvious. Those are documented here:

Star Contributors

  • @wwoods
  • @chrisburr

Contributors

  • @eulores
  • @WaltWoods
  • @ChristopherGS
  • @gvx
  • @AlexandreDecan
  • @lots0logs
  • @tobbez
  • @jaredsampson
  • @BoboTIG

Contributing

  • Before writing any code, please read the tutorial on contributing to hitchdev libraries.
  • Before writing any code, if you're proposing a new feature, please raise it on GitHub. If it's an existing feature or bug, please comment and briefly describe how you're going to implement it.
  • All code needs to come accompanied with a story that exercises it or a modification to an existing story. This is used both to test the code and build the documentation.
Comments
  • Implicit default value for optional key in mappings?

    According to the optional keys docs and the source code, there is no support for an implicit default value for an optional key. My use case: I may have to configure a very long list of key-value mappings for my command-line application. Most of the optional keys have a default value that is implicit but reasonable and unsurprising from the application's and the user's perspective. I know implicit data may be bad. Anyway... what do you think?

    - mandatory-key: ...
      optional-key:
    - mandatory-key: ...
      optional-key:
    (... potentially very long list of other mappings ...)
    
    opened by fkromer 16
  • Bug: Or on deeper structures breaks

    Hi, I found a bug when trying to "or" deeper nested structures:

    from strictyaml import load, Map, MapPattern, Str, Int
    
    yaml="""
    configuration: 
      test1:
        helptext: "some text"
      test2:
        helptext: "some other text."
        min: 0
    """
    
    schema = Map(
        {
            "configuration": MapPattern(
                Str(), 
                Map({
                    "helptext": Str(), 
                }) | Map({
                    "helptext": Str(), 
                    "min": Int(), 
                }),
                minimum_keys=1
            ),
        }
    )
    
    data = load(yaml, schema=schema)
    print(data.data)
    

    which results in

    strictyaml.exceptions.YAMLValidationError: when expecting an integer
    found arbitrary text
      in "<unicode string>", line 5, column 1:
            helptext: some other text.
        ^ (line: 5)
    

    The strange thing is that it "works" when the min validator is a Str(), so I guess the validators are applied to the wrong content?

    opened by kinkerl 14
  • Implicit Typing in StrictYAML

    Consider this YAML:

    array:
      - string 1!
      - string: 2?
    

    Which parses into this:

    {'array': ['string 1!', {'string': '2?'}]}
    

    The value determines its type. You've eliminated this behavior for primitive types:

    python: 3.5.3
    postgres: 9.3
    

    Both are strings. But there is still implicit typing between string and map.

    An array element syntax like:

    array:
      -
      key1: value1
      key2: value2
      -
      key1: value1
      key2: value2
      - some: text
    

    would fix this behavior but then it wouldn't be a subset of YAML anymore 😄

    opened by vnc5 12
  • Revert "Revert "BUG: issue #72. Now __setitem__ uses schema.""

    This is a replacement for https://github.com/crdoconnor/strictyaml/pull/75

    I merged and then reverted the original PR. This unmerged branch contains a revert of that revert.

    This code unfortunately causes story "Boolean (Bool)/Update boolean values with string and bool type" to fail (and possibly some others). @wwoods can you run the test and see what happened?

    Once all the tests pass - i.e. "hk regression" has no failures, I'll be happy to merge a new pull request.

    opened by crdoconnor 11
  • Support question: OpenAPI 3.0.x flow mapping workarounds

    Thanks for the great library! I'm hoping to use this instead of pyyaml when parsing OpenAPI 3.0.x documents, but I'm running into an issue with the Security Requirements Object of a path item.

    From that link, emphasis mine:

    Each name MUST correspond to a security scheme which is declared in the Security Schemes under the Components Object. If the security scheme is of type "oauth2" or "openIdConnect", then the value is a list of scope names required for the execution. For other security scheme types, the array MUST be empty.

    This means I've got paths like this:

    paths:
      /users/{userId}/widgets/{widgetId}/revisions:
        post:
          operationId: createWidgetRevision
          security:
            - apiToken: []
    

    Which fails, as expected, with FlowMappingDisallowed:

    $ ipython
    Python 3.6.5 (default, Apr  1 2018, 15:30:28) 
    Type 'copyright', 'credits' or 'license' for more information
    IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.
    
    In [1]: from strictyaml import load
    
    In [2]: doc="""paths:
       ...:   /users/{userId}/widgets/{widgetId}/revisions:
       ...:     post:
       ...:       operationId: createWidgetRevision
       ...:       security:
       ...:         - apiToken: []"""
    
    In [3]: load(doc)
    FlowMappingDisallowed [traceback clipped]
    

    Any recommendations? There's also a commonly-used form for an unauthenticated method with a similar issue:

    paths:
      /items/{itemId}:
        put:
          operationId: updateItem
          security:
            - {}
    
    opened by numberoverzero 11
  • BUGFIX: Support None as Optional default (#62) , add default to Optional.__repr__

    #62 mentions that None is not supported as default value in Optional. Fix this by adding a unique object and checking for identity when determining if the default value exists or not.

    Also add the default value to __repr__

    Btw: is there a timeline for when default values in Optionals will become stable? They haven't changed in the last two years.

    opened by ntova 10
  • Validation where document schema is superset of expected schema

    I was trying to find out on the project site whether specifying a schema prevents the document from having additional fields.

    In other words, if a document's schema is a superset of the expected schema, is it considered valid?

    opened by nchammas 9
  • BUG: issue #72.  Now __setitem__ uses schema.

    Before this commit, schemas could be violated when assigning to Map/Sequence members. Now, modifications to the data must fit the data's schema.

    Furthermore, if the node on which setitem is called has a compound schema, the selected validator within the compound schema may change correctly.


    This fixes #72, #76, and supersedes #75. @crdoconnor I'm pretty happy with this in that it works on all tests without any changes, and supports the new test suite. There are two strange aspects which you may want to weigh in on:

    1. The _strictparsed option, which to me had no clear semantics, is overwritten with self. This works to preserve the hierarchy and required methods, but I'm not sure if this would break anything else. At any rate, again, all the tests pass, and it seems to behave sanely.

    2. I make no claims about optimality in terms of speed of this approach. Frankly, you'd probably need to rewrite a few of the internals if that were the chief concern of this library. That said, I did try to prevent multiple validations a few places, but there are likely more ways to invoke validations of already-validated content, which is harmless but could eventually lead to a performance problem.

    Please let me know if you do/do not merge this; I'm going ahead with my professional project assuming this (or a slight variant) will be accepted and rolled into a version that I'll be able to have clients pull down in the near future.

    opened by wwoods 9
  • Indent should be consistent

    • Feature request

    How many indent spaces should I use?

    I encounter many YAML files: some indent by 4 spaces, some by 2, and some indent the - of a list item by 2 spaces while others do not. I sometimes get confused when these styles are mixed:

    example:
      indent: 2 spaces
      items:
      - one
      - two
      others:
        - three
        - four
      inner:
          indent: 4 spaces
          items:
          - five
          - six
          others:
              - seven
              - eight
    
    opened by guyskk 9
  • FEATURE: augment constructed (loaded) document with location marks

    strictyaml does schema-based validation and reports the exact text positions of erroneous input. However, an application may also perform other checks that cannot be expressed as a schema, and it would be nice to report the exact text position of the offending data to the user in those cases too. I came up with a quick-and-dirty proof of concept (tested only on Python 3.5) that uses type(name, bases, dict) to construct subclasses of the built-in types (dict, list, int, str). These subclasses carry the start_mark and end_mark of the nodes that produced the corresponding data, and I create the subclassed instances in construct_object. I can send you the prototype if you find it useful.

    opened by kshpytsya 9
  • StrictYAMLError should not inherit from YAMLError

    Consider the following: https://github.com/crdoconnor/strictyaml/blob/master/strictyaml/exceptions.py#L4

    StrictYAMLError should not inherit from YAMLError, allowing people to make a clear distinction between errors that are raised by your library and errors that are raised by the underlying one.

    I know one could simply do:

    try:
      # ...
    except YAMLError:
      pass
    except StrictYAMLError:
      pass
    

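    But (1) this implies that YAMLError is in the current scope (it is not even exposed by your library, but that's not the point), and (2) because StrictYAMLError is a subclass of YAMLError, the except YAMLError clause catches StrictYAMLError exceptions before the except StrictYAMLError clause is ever reached.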

    opened by AlexandreDecan 9
  • The type order of optional arguments affects the results

    I used version: 1.6.1. The following program is correct:

    import strictyaml as sy
    
    s = "mode: default"
    
    schema_map = {
        "mode": sy.Str(),
        sy.Optional("duration", default=None, drop_if_none=False): sy.EmptyNone()
        | sy.Float(),
    }
    
    config_yaml = sy.load(
        yaml_string=s,
        schema=sy.Map(schema_map),
    )
    

    But when I change the type order of optional duration like following:

    s = "mode: default"
    
    schema_map = {
        "mode": sy.Str(),
        sy.Optional("duration", default=None, drop_if_none=False): sy.Float()
        | sy.EmptyNone(),
    }
    
    config_yaml = sy.load(
        yaml_string=s,
        schema=sy.Map(schema_map),
    )
    

    then there would be an error, here is the error log:

    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    ../vendor/lib/python3.10/site-packages/strictyaml/parser.py:323: in load
        return generic_load(yaml_string, schema=schema, label=label)
    ../vendor/lib/python3.10/site-packages/strictyaml/parser.py:301: in generic_load
        return schema(YAMLChunk(document, label=label))
    ../vendor/lib/python3.10/site-packages/strictyaml/validators.py:17: in __call__
        self.validate(chunk)
    ../vendor/lib/python3.10/site-packages/strictyaml/compound.py:180: in validate
        new_value = value_validator(
    ../vendor/lib/python3.10/site-packages/strictyaml/validators.py:107: in __call__
        result = self._validator_a(chunk)
    ../vendor/lib/python3.10/site-packages/strictyaml/scalar.py:27: in __call__
        return YAML(chunk, validator=self)
    ../vendor/lib/python3.10/site-packages/strictyaml/representation.py:63: in __init__
        self._value = validator.validate(value)
    ../vendor/lib/python3.10/site-packages/strictyaml/scalar.py:30: in validate
        return self.validate_scalar(chunk)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    self = Float()
    chunk = <strictyaml.yamllocation.YAMLChunk object at 0x7f64f2212aa0>
    
        def validate_scalar(self, chunk):
            val = chunk.contents
            if utils.is_infinity(val) or utils.is_not_a_number(val):
                val = val.replace(".", "")
            elif not utils.is_decimal(val):
                chunk.expecting_but_found("when expecting a float")
            # Only Python 3.6+ supports underscores in numeric literals
    >       return float(val.replace("_", ""))
    E       ValueError: could not convert string to float: ''
    
    ../vendor/lib/python3.10/site-packages/strictyaml/scalar.py:230: ValueError
    

    But if I change sy.Float() to sy.Int(), the error does not occur.

    opened by fouzhe 0
  • strictyaml does not act as "a near-drop in replacement for pyyaml"

    I got started on moving to this, but now I discover your claim is not at all true - strictyaml does not offer even one function that is the same as pyyaml.


    The one facility that the two appear to share is load, except good pyyaml code is using safe_load - but in fact load is a "false friend", as I discovered when I put it in my code:

    >>> strictyaml.load('1') + 1
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'YAML' and 'int'
    

    Yes, I understand why you are doing this, so you can preserve comments, but why not use another name considering this is an entirely different software service that returns a radically different type?

    And there isn't a dump or a dumps like JSON and pretty well all other such systems have. Are we instead supposed to do as_document({"x": 1}).as_yaml()?


    How much work would it really have taken to create a load, dump/dumps so you'd be compatible with pyyaml, json, toml, and pretty well all other serialization tools in Python?

    Time is all humans have in their lives, and I just wasted 40 minutes of it because your description is the reverse of the truth.

    opened by rec 4
  • Website fails to acknowledge that the Norway problem was fixed in YAML 1.2

    Your website includes a lot of criticism of YAML’s Norway problem (where NO would parse as the boolean false), but it neglects to mention that this behavior was already removed thirteen years ago in YAML 1.2, and even includes this incorrect statement:

    The most tragic aspect of this bug, however, is that it is intended behavior according to the YAML 1.2 specification.

    It was intended according to the YAML 1.1 specification, but in YAML 1.2, the only recognized booleans are true and false.

    Now, there are of course still arguments to be made—for example, some popular libraries still only support YAML 1.1. But these arguments should be made clearly and fairly.

    opened by andersk 0
  • Failed revalidation leads to inconsistent state

    Incomplete revalidation also causes a mismatch between the ruamel and strictyaml internal states in the YAMLChunk object. This is the same outcome as in #183, but the initial cause of the inconsistency is different, so I've listed this as a separate issue.

    An example of the error:

    from strictyaml import Int, Map, Str, as_document
    
    dat = {"a": 1, "b": "x"}
    schema1 = Map({"a": Int(), "b": Int()})
    schema2 = Map({"a": Int(), "b": Str()})
    
    yml = as_document(dat, schema2)
    
    # The state is consistent here
    yml._chunk.contents
    # ordereddict([('a', '1'), ('b', 'x')])
    yml._chunk.strictparsed()
    # ordereddict([(YAML(a), YAML(1)), (YAML(b), YAML(x))])
    
    yml.revalidate(schema1)
    # Fails (as it should)
    
    # But the state is now inconsistent
    yml._chunk.contents
    # ordereddict([('a', '1'), ('b', 'x')])
    yml._chunk.strictparsed()
    # ordereddict([(YAML(b), YAML(x)), (YAML(a), YAML(1))])
    

    The mismatch in the internal state means that revalidation with a correct schema will fail:

    yml.revalidate(schema2)
    
    ...
    strictyaml.exceptions.YAMLValidationError: when expecting an integer
    found arbitrary text
      in "<unicode string>", line 2, column 1:
        b: x
        ^ (line: 2)
    
    opened by aschankler 0
  • Repeated revalidate() raises Invalid state

    I've found some weird bug: calling revalidate() for the second time raises an Invalid state exception when optional fields with default values are present in the schema, but not in the YAML document. Here's a snippet to reproduce the problem.

    from strictyaml import load, Map, Int, Optional
    
    doc = "x: 1"
    
    # This works fine
    schema = Map({ "x": Int(), Optional("y"): Int() })
    yaml = load(doc, schema)
    try:
        for i in range(1,6):
            yaml.revalidate(schema)
            print(i, end=" ")
    except Exception as e:
        print(e)
    print("")
    
    # This raises an exception on the second iteration
    schema = Map({ "x": Int(), Optional("y", default=42): Int() })
    yaml = load(doc, schema)
    try:
        for i in range(1,6):
            yaml.revalidate(schema)
            print(i, end=" ")
    except Exception as e:
        print(e)
    

    The generated output:

    1 2 3 4 5 
    1 Invalid state
    

    It seems that the type of the field does not matter: I've got the same behavior for Str and Any.

    On the other hand, the exception is not raised when the YAML object is modified, either by removing the optional value or modifying another one i.e. no exception is raised by the following loops

    schema = Map({ "x": Int(), Optional("y", default=42): Int(), Optional("z", default=42): Int() })
    yaml = load(doc, schema)
    try:
        for i in range(1,6):
            yaml['y'] = 18
            del yaml['y']
            yaml.revalidate(schema)
    
        for i in range(1,6):
            yaml['x'] = i
            yaml.revalidate(schema)
    
    except Exception as e:
        print(e)
    

    It looks to me that revalidate() uses some internal values that are reset whenever the YAML object is modified, but not otherwise.

    opened by kputyra 1
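The sentinel approach from the "Support None as Optional default" PR can be sketched in plain Python. OptionalKey and _NO_DEFAULT are hypothetical names used for illustration, not strictyaml's actual classes.

```python
# A unique object marks "no default given", so None can be a real default.
_NO_DEFAULT = object()


class OptionalKey:
    """Hypothetical stand-in for an optional mapping key with a default."""

    def __init__(self, key, default=_NO_DEFAULT):
        self.key = key
        self.default = default

    @property
    def has_default(self):
        # Identity check: None, 0, "" and False all count as real defaults.
        return self.default is not _NO_DEFAULT

    def __repr__(self):
        if self.has_default:
            return "OptionalKey({!r}, default={!r})".format(self.key, self.default)
        return "OptionalKey({!r})".format(self.key)
```

The point of the identity check is that a truthiness or == None test cannot tell "no default" apart from "default is None".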
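The proposal in "augment constructed (loaded) document with location marks" can be sketched with type(name, bases, dict), as it describes. Everything here is hypothetical. Marks are represented as (line, column) tuples, and bool is left out because it cannot be subclassed.

```python
_marked_types = {}


def marked_type(base):
    """Return a cached subclass of `base` whose instances accept attributes."""
    if base not in _marked_types:
        name = "Marked" + base.__name__.capitalize()
        _marked_types[base] = type(name, (base,), {})
    return _marked_types[base]


def with_marks(value, start_mark, end_mark):
    """Copy `value` (a dict, list, int or str) into its marked subclass."""
    marked = marked_type(type(value))(value)
    marked.start_mark = start_mark  # e.g. (line, column) of the first character
    marked.end_mark = end_mark
    return marked


age = with_marks(42, (3, 5), (3, 7))
config = with_marks({"age": age}, (1, 0), (3, 7))
```

Because the marked values are real subclasses, ordinary code keeps treating them as plain ints, dicts and so on. Error reporting can then read start_mark when it needs a position.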
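The ordering problem in "StrictYAMLError should not inherit from YAMLError" is ordinary Python exception matching: the first matching except clause wins, and a base class matches its subclasses. This sketch uses hypothetical class names in place of the real ones.

```python
class BaseYAMLError(Exception):
    """Stand-in for the underlying parser's base error."""


class StrictError(BaseYAMLError):
    """Stand-in for the library's own error, subclassing the base."""


def which_handler(exc):
    try:
        raise exc
    except BaseYAMLError:
        # StrictError is a subclass, so it is caught here first.
        return "base"
    except StrictError:  # never reached for StrictError instances
        return "strict"


def which_handler_reordered(exc):
    try:
        raise exc
    except StrictError:
        return "strict"
    except BaseYAMLError:
        return "base"
```

While the inheritance stays as it is, the usual workaround is to list the subclass first.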
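In "The type order of optional arguments affects the results", the traceback shows a bare ValueError escaping from Float. Suppose the | combinator only falls back to its second validator on validation errors, as the traceback suggests. Then the order matters exactly as reported. This miniature uses hypothetical names throughout.

```python
class ValidationError(Exception):
    """Stand-in for a schema validation error."""


def parse_float(text):
    # Like the reported bug, an empty string skips the format check
    # and reaches float(), which raises a plain ValueError.
    if text and not text.replace(".", "", 1).isdigit():
        raise ValidationError("when expecting a float")
    return float(text)


def parse_empty_none(text):
    if text != "":
        raise ValidationError("when expecting an empty value")
    return None


def either(first, second):
    """Hypothetical 'first | second': fall back only on ValidationError."""
    def validate(text):
        try:
            return first(text)
        except ValidationError:
            return second(text)
    return validate


empty_first = either(parse_empty_none, parse_float)
float_first = either(parse_float, parse_empty_none)
```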
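For the YAML 1.2 point, the spellings below are the boolean forms listed by the YAML 1.1 bool type and by the YAML 1.2 core schema. This is a plain-Python illustration, not how any particular parser implements tag resolution.

```python
import re

# YAML 1.1 bool type: includes y/n, yes/no and on/off, hence the Norway problem.
YAML_1_1_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)

# YAML 1.2 core schema: only true/false, in three casings.
YAML_1_2_CORE_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")


def is_bool(scalar, pattern):
    # fullmatch: the whole plain scalar must be one of the spellings.
    return pattern.fullmatch(scalar) is not None
```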