Welcome to Horkos
Horkos is a library for validating data at the edges of data systems.
Quick Start
>>> import horkos
>>> from horkos import types
>>> from horkos import checks
>>> schema = horkos.Schema(fields=[
...     horkos.Field(
...         'method',
...         types.String,
...         checks=[checks.Choice(['GET', 'POST', 'PUT', 'DELETE'])],
...     ),
...     horkos.Field('path', types.String),
...     horkos.Field('response_code', types.Integer),
... ])
>>> schema.process({
...     'method': 'GET',
...     'path': '/my-settings',
...     'response_code': '200',
... })
{'method': 'GET', 'path': '/my-settings', 'response_code': 200}
>>> schema.process({
...     'method': 'NOT-AN-OPTION',
...     'path': '/my-settings',
...     'response_code': 200,
... })
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kjschiroo/gitlab/kjschiroo/horkos/horkos/_schemaomatic.py", line 142, in process
    self._apply_checks(cast)
  File "/home/kjschiroo/gitlab/kjschiroo/horkos/horkos/_schemaomatic.py", line 107, in _apply_checks
    raise errors.RecordValidationError(msg)
horkos.errors.RecordValidationError: Checks failed: value of "NOT-AN-OPTION" for method did not pass choice check
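The session above shows two things happening to a record: values are cast to the declared type (the string '200' comes back as the integer 200), and then checks run against the cast values. The following is a minimal stdlib-only sketch of that cast-then-check idea. It is not how Horkos is implemented; `ValidationError`, `process`, `CASTS`, and `CHOICES` are hypothetical names made up for this illustration.

```python
# Illustrative stand-in for a cast-then-check flow.
# None of these names come from Horkos; they are assumptions for this sketch.


class ValidationError(Exception):
    """Raised when a record fails a check (hypothetical stand-in)."""


def process(record, casts, choices):
    """Cast every field, then verify constrained fields against their options."""
    # Step 1: cast each value with the callable declared for its field.
    cast = {name: casts[name](value) for name, value in record.items()}
    # Step 2: run the choice checks against the cast values.
    for name, options in choices.items():
        if cast[name] not in options:
            raise ValidationError(
                f'value of "{cast[name]}" for {name} did not pass choice check'
            )
    return cast


CASTS = {'method': str, 'path': str, 'response_code': int}
CHOICES = {'method': {'GET', 'POST', 'PUT', 'DELETE'}}

good = process(
    {'method': 'GET', 'path': '/my-settings', 'response_code': '200'},
    CASTS,
    CHOICES,
)
print(good)  # {'method': 'GET', 'path': '/my-settings', 'response_code': 200}
```

Passing a record whose method is 'NOT-AN-OPTION' to this sketch raises `ValidationError`, mirroring the failure shown in the session.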
YAML Schemas
Horkos provides a YAML data documentation format that can be used to create a schema. Because the same file both documents the data and drives its validation, each makes the other more effective.
First, declare your data schema in a YAML file. The file documents the data and also makes claims about its field types and properties.
# http_requests.yaml
name: http_requests
description: |
  This is the http request event data set, it is all about
  http requests we receive. Every time that the backend
  server gets an http request it generates one of these
  events.
labels:
  retention: 4yr
fields:
  path:
    type: string
    description: |
      The path of the url that was hit. This will be
      everything after the host portion of the url.
  params:
    type: string
    nullable: true
    checks:
      - json
    description: |
      The parameters of the http request encoded as JSON.
      If the method is a `GET` these come from the url
      otherwise they are the JSON from the request body.
  method:
    type: string
    checks:
      - name: choice
        args:
          options:
            - DELETE
            - GET
            - HEAD
            - OPTIONS
            - PATCH
            - POST
            - PUT
    description: |
      The http method of the request. This is expected
      to be one of the standard http method strings.
  response_code:
    type: integer
    description: The http response code of the request.
  timestamp:
    type: string
    checks:
      - iso_timestamp
    description: |
      The time at which the http request was received.
This file is loaded into a schema that can be used to process and validate records.
>>> import horkos
>>> schema = horkos.load_schema('http_requests.yaml')
>>> record = {
...     'path': '/my-settings',
...     'params': None,
...     'method': 'GET',
...     'response_code': 200,
...     'timestamp': '2020-10-25T02:22:16'
... }
>>> schema.process(record)
{'path': '/my-settings', 'params': None, 'method': 'GET', 'response_code': 200, 'timestamp': '2020-10-25T02:22:16'}
>>> record['timestamp'] = 'Sat Oct 24 2020 21:22:16 GMT-0500'
>>> schema.process(record)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/horkos/_schemaomatic.py", line 47, in process
    f'Check errors - {", ".join(error_set)}'
horkos.errors.RecordValidationError: Check errors - value of "Sat Oct 24 2020 21:22:16 GMT-0500" in timestamp did not pass iso_timestamp check
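To give a feel for what named checks such as `iso_timestamp` and `json` assert about a value, here is a rough stdlib-only sketch of each as a simple predicate. This is an approximation under stated assumptions, not Horkos's own code: the function names are hypothetical, and the exact set of timestamp formats Horkos accepts may differ from what `datetime.fromisoformat` accepts.

```python
import json
from datetime import datetime

# Hypothetical predicates approximating two named checks.
# They are assumptions for illustration, not part of the Horkos API.


def passes_iso_timestamp(value):
    """Return True if the stdlib can parse the value as an ISO 8601 timestamp."""
    try:
        datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return False
    return True


def passes_json(value):
    """Return True if the value is a string holding a valid JSON document."""
    try:
        json.loads(value)
    except (TypeError, ValueError):
        return False
    return True


print(passes_iso_timestamp('2020-10-25T02:22:16'))  # True
print(passes_iso_timestamp('Sat Oct 24 2020 21:22:16 GMT-0500'))  # False
print(passes_json('{"page": 2}'))  # True
print(passes_json('{page: 2}'))  # False
```

Note that these predicates reject `None`, whereas the session above accepts `None` for `params`, which is declared `nullable: true` in the schema.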