Welcome to Horkos

Horkos is a library for validating data at the edges of data systems.

Installation

Horkos is available on PyPI. To install it, run:

$ pip install horkos

Quick Start

>>> import horkos
>>> from horkos import types
>>> from horkos import checks
>>> schema = horkos.Schema(fields=[
...     horkos.Field(
...         'method',
...         types.String,
...         checks=[checks.Choice(['GET', 'POST', 'PUT', 'DELETE'])],
...     ),
...     horkos.Field('path', types.String),
...     horkos.Field('response_code', types.Integer),
... ])
>>> schema.process({
...     'method': 'GET',
...     'path': '/my-settings',
...     'response_code': '200',
... })
{'method': 'GET', 'path': '/my-settings', 'response_code': 200}
>>> schema.process({
...     'method': 'NOT-AN-OPTION',
...     'path': '/my-settings',
...     'response_code': 200,
... })
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kjschiroo/gitlab/kjschiroo/horkos/horkos/_schemaomatic.py", line 142, in process
    self._apply_checks(cast)
  File "/home/kjschiroo/gitlab/kjschiroo/horkos/horkos/_schemaomatic.py", line 107, in _apply_checks
    raise errors.RecordValidationError(msg)
horkos.errors.RecordValidationError: Checks failed: value of "NOT-AN-OPTION" for method did not pass choice check
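
The two calls above suggest a two-stage flow: each value is first cast to its declared type (the string '200' becomes the integer 200), and the checks then run on the cast values. The following is a minimal standard-library sketch of that idea, not Horkos's actual implementation. The names `FIELDS`, `ValidationError`, and `process_record` are hypothetical and exist only for illustration.

```python
# Illustrative stand-in only: every name here is hypothetical, not a Horkos API.
ALLOWED_METHODS = {'GET', 'POST', 'PUT', 'DELETE'}

# Field name -> (cast function, optional check returning True on success).
FIELDS = {
    'method': (str, lambda value: value in ALLOWED_METHODS),
    'path': (str, None),
    'response_code': (int, None),
}


class ValidationError(ValueError):
    """Raised when a record fails casting or a check."""


def process_record(record):
    """Cast every field to its declared type, then run its check."""
    result = {}
    for name, (cast, check) in FIELDS.items():
        try:
            value = cast(record[name])
        except (KeyError, TypeError, ValueError) as exc:
            raise ValidationError(f'could not cast field {name!r}') from exc
        if check is not None and not check(value):
            raise ValidationError(f'value {value!r} for {name} failed its check')
        result[name] = value
    return result
```

Called with the first record from the session above, this sketch returns the same dictionary with `response_code` cast to an integer. Called with the second record, it raises the hypothetical `ValidationError`.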

YAML Schemas

Horkos provides a YAML data documentation format that can be used to create a schema. This has the advantage of pairing documentation with functionality, so the schema that is documented is also the schema that is enforced.

First, declare your data schema in a YAML file. The file documents the data and declares the type and properties of each field.

# http_requests.yaml
name: http_requests
description: |
  This is the http request event data set. Every time
  the backend server gets an http request, it generates
  one of these events.
labels:
  retention: 4yr
fields:
  path:
    type: string
    description: |
      The path of the url that was hit. This will be
      everything after the host portion of the url.
  params:
    type: string
    nullable: true
    checks:
    - json
    description: |
      The parameters of the http request encoded as JSON.
      If the method is a `GET` these come from the url
      otherwise they are the JSON from the request body.
  method:
    type: string
    checks:
    - name: choice
      args:
        options:
        - DELETE
        - GET
        - HEAD
        - OPTIONS
        - PATCH
        - POST
        - PUT
    description: |
      The http method of the request. This is expected
      to be one of the standard http method strings.
  response_code:
    type: integer
    description: The http response code of the request.
  timestamp:
    type: string
    checks:
    - iso_timestamp
    description: |
      The time at which the http request was received.

The file can then be loaded into a schema that processes and validates records.

>>> import horkos
>>> schema = horkos.load_schema('http_requests.yaml')
>>> record = {
...     'path': '/my-settings',
...     'params': None,
...     'method': 'GET',
...     'response_code': 200,
...     'timestamp': '2020-10-25T02:22:16'
... }
>>> schema.process(record)
{'path': '/my-settings', 'params': None, 'method': 'GET', 'response_code': 200, 'timestamp': '2020-10-25T02:22:16'}
>>> record['timestamp'] = 'Sat Oct 24 2020 21:22:16 GMT-0500'
>>> schema.process(record)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/horkos/_schemaomatic.py", line 47, in process
    f'Check errors - {", ".join(error_set)}'
horkos.errors.RecordValidationError: Check errors - value of "Sat Oct 24 2020 21:22:16 GMT-0500" in timestamp did not pass iso_timestamp check
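
To see why the first timestamp passes and the second does not, the standard library offers a rough approximation of an ISO 8601 check. This sketch assumes the `iso_timestamp` check accepts roughly what `datetime.fromisoformat` accepts, which may not match Horkos's rules exactly. `looks_like_iso_timestamp` is a hypothetical helper, not part of Horkos.

```python
from datetime import datetime


def looks_like_iso_timestamp(value):
    """Return True if the standard library can parse value as ISO 8601."""
    try:
        datetime.fromisoformat(value)
    except (TypeError, ValueError):
        # TypeError covers non-string input such as None.
        return False
    return True


print(looks_like_iso_timestamp('2020-10-25T02:22:16'))                # True
print(looks_like_iso_timestamp('Sat Oct 24 2020 21:22:16 GMT-0500'))  # False
```

The second value is a valid date in a human-readable format, but it is not ISO 8601, so it is the kind of input a check like this would be expected to reject.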
