API Documentation

horkos

class horkos.Catalog(schemas: List[Union[str, dict, horkos._schemaomatic.Schema, TextIO]] = None)

A collection of schemas.

Parameters

schemas – The schemas to include in the catalog.

process(name: str, record: dict) → dict

Process a record against a named schema from the catalog.

Each field within the record is cast to the type specified in the schema, and the resulting value is validated against the checks defined within the schema. If any field cannot be cast, or any check fails, a RecordValidationError is raised.

Parameters
  • name – The name of the schema to process the record against.

  • record – The record to process against the schema. This should be a dictionary mapping field names to field values.

Returns

The processed record with values cast and validated.

property schemas

All of the schemas contained within the catalog.

update(schema: Union[str, dict, horkos._schemaomatic.Schema, TextIO])

Update the catalog with the given schema.

Parameters

schema – The schema to add to the catalog, or to update if the catalog already holds a schema with the same name.
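The name-keyed behaviour of process and update can be pictured with the toy catalog below. It is a plain-Python illustration, not the horkos implementation: ToySchema and ToyCatalog are hypothetical, a "schema" here is only a name plus a processing function, and raising KeyError for an unknown name is an assumption.

```python
from typing import Callable, Dict, List, Optional


class ToySchema:
    """Hypothetical stand-in: a schema reduced to a name and a process function."""

    def __init__(self, name: str, process: Callable[[dict], dict]):
        self.name = name
        self.process = process


class ToyCatalog:
    """Illustrative catalog that stores schemas by name (not horkos code)."""

    def __init__(self, schemas: Optional[List[ToySchema]] = None):
        self._schemas: Dict[str, ToySchema] = {}
        for schema in schemas or []:
            self.update(schema)

    @property
    def schemas(self) -> List[ToySchema]:
        return list(self._schemas.values())

    def update(self, schema: ToySchema) -> None:
        # A schema with an existing name replaces the earlier entry.
        self._schemas[schema.name] = schema

    def process(self, name: str, record: dict) -> dict:
        # Look up the schema by name (KeyError if unknown), then delegate.
        return self._schemas[name].process(record)


catalog = ToyCatalog([ToySchema("users", lambda r: {"id": int(r["id"])})])
processed = catalog.process("users", {"id": "7"})
```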

class horkos.Field(name: str, type_: horkos.types.FieldType, description: str = None, required: bool = True, nullable: bool = False, checks: List[Callable] = None, labels: dict = None, derived: bool = False)

The definition of a field within a schema.

Variables
  • name (str) – The name of the field.

  • type (types.FieldType) – The type of the field.

  • description (str) – A detailed description of the field.

  • required (bool) – Whether the field is required to be present.

  • nullable (bool) – Whether the field should accept null values.

  • checks (typing.List[typing.Callable]) – A list of callables, each accepting the single value to validate.

  • labels (dict) – A space for unstructured information regarding the field.

  • derived (bool) – Whether the field is derived rather than directly declared. This is intended to enable a schema to both validate received data and fully document stored data. An example of a derived field would be a processed_at field declaring the time at which the record was validated. Clients need to implement their own logic to generate derived values.
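Because derived values are generated by the client, a processed_at field would be filled in after validation. The helper below is a hypothetical sketch of that client-side step; its name, and the choice of a UTC ISO 8601 string, are assumptions rather than anything horkos prescribes.

```python
from datetime import datetime, timezone


def stamp_processed_at(validated: dict) -> dict:
    """Hypothetical client helper that fills a derived `processed_at` field."""
    # Copy so the validated record passed in is left unchanged.
    stamped = dict(validated)
    # Record when validation finished, as an ISO 8601 string in UTC.
    stamped["processed_at"] = datetime.now(timezone.utc).isoformat()
    return stamped
```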

class horkos.Schema(name: str = None, description: str = None, labels: dict = None, fields: List[horkos._fields.Field] = None)

A formal specification of a dataset. It defines all fields (or columns) a dataset is expected to have as well as the properties of each field.

Variables
  • name (str) – The name of the dataset defined by the schema.

  • description (str) – A detailed description of the dataset defined by the schema.

  • labels (dict) – A dictionary of key-value pairs containing organization-specific structured information about the dataset. These values have no function within horkos.

  • fields (typing.List[horkos.Field]) – A list of fields expected within the dataset.

Parameters
  • name – The name of the dataset described by the schema. This is used to provide more descriptive error messages and to provide an identity while being used as part of a Catalog.

  • description – A detailed description of the dataset defined by the schema.

  • labels – A dictionary of minimally structured key-value pairs intended for storing organization-specific information relating to the dataset.

  • fields – A list of fields (or columns) that are expected within the dataset. Each field must have its own unique name.

process(record: dict, include_derived: bool = False) → dict

Process a record against the schema. This process includes:

  1. Confirming all required fields are present.

  2. Casting all field values to their expected type.

  3. Confirming there are no null values in non-nullable fields.

  4. Confirming all non-null values pass their field’s checks.

The cast and validated record is returned.

Parameters
  • record – A dictionary mapping field names to field values.

  • include_derived – Whether to include derived fields in the validation process.

Returns

The processed record with values cast and validated.
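The four steps above can be sketched in plain Python as follows. This is not the horkos implementation: ToyField, process_record, and the local RecordValidationError are hypothetical stand-ins. The sketch also assumes that checks return a bool, that absent optional fields and fields not in the schema are left out of the result, and that all problems are collected into one error. It tests for null before casting so that None is never passed to a cast function.

```python
from typing import Any, Callable, List, Optional


class RecordValidationError(Exception):
    """Stand-in for horkos.errors.RecordValidationError (hypothetical)."""


class ToyField:
    """Hypothetical, simplified field: a name, a cast function and options."""

    def __init__(self, name: str, cast: Callable[[Any], Any],
                 required: bool = True, nullable: bool = False,
                 checks: Optional[List[Callable[[Any], bool]]] = None):
        self.name = name
        self.cast = cast
        self.required = required
        self.nullable = nullable
        self.checks = checks or []


def process_record(fields: List[ToyField], record: dict) -> dict:
    errors = []
    result = {}
    for field in fields:
        # Step 1: required fields must be present; absent optional ones are skipped.
        if field.name not in record:
            if field.required:
                errors.append(f"{field.name}: missing required field")
            continue
        value = record[field.name]
        # Step 3, done early here: nulls only allowed in nullable fields,
        # and null values are not run through the checks.
        if value is None:
            if field.nullable:
                result[field.name] = None
            else:
                errors.append(f"{field.name}: null not allowed")
            continue
        # Step 2: cast the value to the field's type.
        try:
            value = field.cast(value)
        except (TypeError, ValueError):
            errors.append(f"{field.name}: could not cast {value!r}")
            continue
        # Step 4: the non-null value must pass every check on the field.
        if not all(check(value) for check in field.checks):
            errors.append(f"{field.name}: failed a check")
            continue
        result[field.name] = value
    if errors:
        # Report every problem together rather than stopping at the first.
        raise RecordValidationError("; ".join(errors))
    return result
```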

horkos.load_schema(schema: Union[str, dict, horkos._schemaomatic.Schema, TextIO], custom_checkers: Dict[str, Callable] = None) → horkos._schemaomatic.Schema

Load a schema from a file or existing schema object.

Parameters
  • schema – The schema to load. If a string is given, it is assumed to be a path to a .json or .yaml file defining the schema; alternatively, a file handle can be passed directly. If a dictionary is given, it is assumed to be equivalent to the contents of a schema file.

  • custom_checkers – A dictionary of custom checkers, mapping the name of each check to a function that generates the check function. This can be used to extend checks beyond those defined within horkos.

Returns

A Schema object.
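The custom_checkers mapping pairs a check name with a factory, and the factory returns the actual check. The sketch below shows that shape with a hypothetical multiple_of check. It assumes that the check's arguments from the schema file reach the factory as keyword arguments and that the generated check returns a bool; the reference does not state either convention.

```python
from typing import Callable, Dict


def multiple_of(factor: int) -> Callable[[int], bool]:
    """Hypothetical checker factory: builds a check for multiples of `factor`."""
    def check(value: int) -> bool:
        # The generated check receives a single, already-cast value.
        return value % factor == 0
    return check


# Maps the check name used in a schema file to the factory that builds it.
custom_checkers: Dict[str, Callable[..., Callable]] = {"multiple_of": multiple_of}

# Roughly what a loader might do when a field declares multiple_of with factor 5.
check = custom_checkers["multiple_of"](factor=5)
```

Following the signature above, the mapping would then be passed as horkos.load_schema(path, custom_checkers=custom_checkers).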

horkos.checks

class horkos.checks.BaseCheck

Base class for a check.

property args

The arguments that were used to construct the check.

class horkos.checks.Between(lower: Union[str, int, float], upper: Union[str, int, float])

Check for confirming that an incoming value is between two values.

class horkos.checks.Choice(options: list = None)

Check for confirming that an incoming value is within a set of options.

class horkos.checks.Email

Check for confirming that an incoming value is a valid email.

class horkos.checks.IsoTimestamp

Check for confirming that an incoming string is an ISO timestamp.

class horkos.checks.JsonArrayString

Check for validating that an incoming string is JSON deserializable and deserializes into an array.

class horkos.checks.JsonObjectString

Check for validating that an incoming string is JSON deserializable and deserializes into an object.

class horkos.checks.JsonString

Check for validating that an incoming string is JSON deserializable.
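The three JSON checks differ only in what the deserialized value must be. A plain-Python sketch of that distinction follows, using the standard json module; the function names are hypothetical and returning a bool is an assumption about how horkos checks report failure.

```python
import json


def _loads(value):
    # Returns (ok, parsed) so callers can inspect the deserialized value.
    try:
        return True, json.loads(value)
    except (TypeError, ValueError):  # JSONDecodeError subclasses ValueError
        return False, None


def is_json_string(value) -> bool:
    return _loads(value)[0]


def is_json_array_string(value) -> bool:
    ok, parsed = _loads(value)
    return ok and isinstance(parsed, list)


def is_json_object_string(value) -> bool:
    ok, parsed = _loads(value)
    return ok and isinstance(parsed, dict)
```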

class horkos.checks.MaxLength(limit: int)

Check for the maximum length of an incoming string.

class horkos.checks.Maximum(limit: Union[str, int, float], inclusive: bool = True)

Check for confirming that an incoming value is less than the given limit, or equal to it when inclusive is True.

class horkos.checks.Minimum(limit: Union[str, int, float], inclusive: bool = True)

Check for confirming that an incoming value is greater than the given limit, or equal to it when inclusive is True.
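The effect of the inclusive flag on Maximum and Minimum, and the bounds of Between, can be sketched as plain comparison functions. These are hypothetical illustrations, not horkos classes. Treating both ends of Between as inclusive is an assumption the reference does not confirm; string limits compare lexicographically, which happens to order ISO-formatted dates correctly.

```python
from typing import Callable, Union

Bound = Union[str, int, float]


def maximum(limit: Bound, inclusive: bool = True) -> Callable[[Bound], bool]:
    # inclusive=True accepts value == limit; inclusive=False requires value < limit.
    if inclusive:
        return lambda value: value <= limit
    return lambda value: value < limit


def minimum(limit: Bound, inclusive: bool = True) -> Callable[[Bound], bool]:
    # inclusive=True accepts value == limit; inclusive=False requires value > limit.
    if inclusive:
        return lambda value: value >= limit
    return lambda value: value > limit


def between(lower: Bound, upper: Bound) -> Callable[[Bound], bool]:
    # Assumes both ends are inclusive.
    return lambda value: lower <= value <= upper
```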

class horkos.checks.Regex(regex: str = None, ignore_case=False)

Check for confirming that an incoming string matches a regex.

class horkos.checks.Uuid

Check for confirming that an incoming string is a valid UUID.
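Format checks such as IsoTimestamp and Uuid can be approximated with the standard library, as in the hypothetical sketch below. It is not how horkos necessarily implements them. datetime.fromisoformat only accepts the full ISO 8601 grammar from Python 3.11 onward, and uuid.UUID is lenient (it also accepts braces, a urn:uuid: prefix, and missing hyphens), so either may accept or reject strings differently from horkos.

```python
import uuid
from datetime import datetime


def is_uuid(value) -> bool:
    try:
        uuid.UUID(value)
    except (TypeError, ValueError, AttributeError):
        # Non-strings can raise TypeError/AttributeError; bad strings raise ValueError.
        return False
    return True


def is_iso_timestamp(value) -> bool:
    # Accepted forms depend on the Python version (broader from 3.11).
    try:
        datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return False
    return True
```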

horkos.errors

exception horkos.errors.CastingError

Could not cast value.

exception horkos.errors.Error

Base error.

exception horkos.errors.RecordValidationError

A record failed to validate against the schema.

exception horkos.errors.SchemaValidationError

The given data schema failed to validate.
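The reference does not state how these exceptions relate. Since Error is documented as the base error, the sketch below assumes the other three subclass it, and mirrors that hierarchy with plain Python classes so it runs without horkos. It shows why catching the base class would handle every error type; safe_process is a hypothetical helper.

```python
class Error(Exception):
    """Base error."""


class CastingError(Error):
    """Could not cast value."""


class RecordValidationError(Error):
    """A record failed to validate against the schema."""


class SchemaValidationError(Error):
    """The given data schema failed to validate."""


def safe_process(process, record):
    # Catching the (assumed) base class covers every subclass above.
    try:
        return process(record), None
    except Error as exc:
        return None, exc
```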

horkos.types

class horkos.types.Boolean

A value that is either true or false. Boolean fields cast strings that match "true" or "false" exactly, ignoring case, to True and False respectively. They also cast the integers 1 and 0 to True and False respectively.
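The Boolean casting rules above translate into the following plain-Python sketch. It is hypothetical rather than horkos code: ValueError stands in for horkos.errors.CastingError, and passing genuine bools through unchanged is an assumption.

```python
from typing import Any


def cast_boolean(value: Any) -> bool:
    if isinstance(value, bool):
        return value  # assumed: real booleans pass through
    if isinstance(value, str):
        # Exact match, ignoring case; " true " or "yes" do not qualify.
        lowered = value.lower()
        if lowered == "true":
            return True
        if lowered == "false":
            return False
    elif isinstance(value, int) and value in (0, 1):
        return bool(value)
    raise ValueError(f"cannot cast {value!r} to boolean")
```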

class horkos.types.FieldType

The base type for all field types.

classmethod cast(value: Any)

Cast the incoming value to the desired type.

class horkos.types.Float

A numerical value with decimal components. Float fields cast integers, float strings, and integer strings to float values.

class horkos.types.Integer

A whole number value. Integer fields cast integer strings to integers. They also cast float values with no decimal component to integers; float values with a decimal component fail to cast.
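The Float and Integer casting rules can be sketched as two small functions. These are hypothetical illustrations, not horkos code: ValueError stands in for CastingError, and rejecting bool (which subclasses int in Python) is an assumption the reference does not address.

```python
from typing import Any


def cast_float(value: Any) -> float:
    # bool subclasses int in Python; rejecting it here is an assumption.
    if isinstance(value, bool):
        raise ValueError(f"cannot cast {value!r} to float")
    # Integers, floats, and numeric strings such as "3" or "1.5".
    if isinstance(value, (int, float, str)):
        return float(value)  # float("abc") raises ValueError
    raise ValueError(f"cannot cast {value!r} to float")


def cast_integer(value: Any) -> int:
    if isinstance(value, bool):
        raise ValueError(f"cannot cast {value!r} to integer")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        # Integer strings only; int("4.5") raises ValueError.
        return int(value)
    if isinstance(value, float) and value.is_integer():
        # 4.0 has no decimal component, so it casts cleanly to 4.
        return int(value)
    raise ValueError(f"cannot cast {value!r} to integer")
```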

class horkos.types.String

Any string value. horkos only casts values that can be unambiguously converted to a string, such as integers and floats. Other values that have a string encoding in Python are not cast unless there is exactly one reasonable encoding. Examples include booleans (since True, TRUE, and true are all equally reasonable) and dictionaries (since {'foo': 'bar'} and {"foo": "bar"} are equally reasonable).
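A plain-Python sketch of the String rule follows; it is hypothetical, with ValueError standing in for CastingError. The bool test must come before the int test because bool subclasses int in Python, so a naive isinstance(value, int) branch would turn True into "True".

```python
from typing import Any


def cast_string(value: Any) -> str:
    if isinstance(value, str):
        return value
    # Reject booleans first: True, TRUE and true are all plausible encodings.
    if isinstance(value, bool):
        raise ValueError(f"ambiguous string encoding for {value!r}")
    # Integers and floats have a single reasonable string form.
    if isinstance(value, (int, float)):
        return str(value)
    # Dictionaries and other values are treated as ambiguous.
    raise ValueError(f"ambiguous string encoding for {value!r}")
```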

horkos.inspector

class horkos.inspector.CheckMetadata(name: str, args: dict)

The name and args associated with a check.

property args

The arguments used to construct the check (alias for field number 1).

property name

The name of the check (alias for field number 0).

horkos.inspector.get_check_metadata(check: Callable) → horkos.inspector.CheckMetadata

Get the metadata associated with a given check.
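The "alias for field number" wording suggests CheckMetadata is a named tuple, so its two fields can be read by attribute, by index, or by unpacking. The sketch below mirrors that documented shape with typing.NamedTuple; the "maximum" name and its args are hypothetical values, not output from get_check_metadata.

```python
from typing import NamedTuple


class CheckMetadata(NamedTuple):
    """Mirror of the documented shape of horkos.inspector.CheckMetadata."""
    name: str   # field number 0
    args: dict  # field number 1


# Hypothetical metadata for a Maximum-style check.
meta = CheckMetadata(name="maximum", args={"limit": 10, "inclusive": True})
name, args = meta  # tuple unpacking works like any named tuple
```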