API Documentation
horkos
-
class
horkos.Catalog(schemas: List[Union[str, dict, horkos._schemaomatic.Schema, TextIO]] = None)¶ A collection of schemas.
- Parameters
schemas – The schemas to include in the catalog.
-
process(name: str, record: dict) → dict¶ Process a record against a named schema from the catalog.
Each field within the record is cast to the type specified in the schema, and the resulting value is validated against the checks defined within the schema. If any field cannot be cast or any check fails, a RecordValidationError exception is raised.
- Parameters
name – The name of the schema to process the record against.
record – The record to process against the schema. This should be a dictionary mapping field names to field values.
- Returns
The processed record with values cast and validated.
-
property
schemas¶ All of the schemas contained within the catalog.
-
update(schema: Union[str, dict, horkos._schemaomatic.Schema, TextIO])¶ Update the catalog with the given schema.
- Parameters
schema – The schema to add to, or update in, the catalog.
-
class
horkos.Field(name: str, type_: horkos.types.FieldType, description: str = None, required: bool = True, nullable: bool = False, checks: List[Callable] = None, labels: dict = None, derived: bool = False)¶ The definition of a field within a schema.
- Variables
name (str) – The name of the field.
type (types.FieldType) – The type of the field.
description (str) – A detailed description of the field.
required (bool) – Whether the field is required to be present.
nullable (bool) – Whether the field should accept null values.
checks (typing.List[typing.Callable]) – A list of callables accepting a single value to validate.
labels (dict) – A space for unstructured information regarding the field.
derived (bool) – Whether the field is derived rather than directly declared. This is intended to enable a schema to both validate received data and fully document stored data. An example of a derived field would be a processed_at field declaring the time at which the record was validated. Clients need to implement their own logic to generate derived values.
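As a minimal sketch of the client-side logic mentioned above, the following adds a derived processed_at value to an already validated record. The helper name add_processed_at is hypothetical and not part of horkos, and the record is assumed to be a plain dictionary such as the one returned by process.

```python
from datetime import datetime, timezone


def add_processed_at(record: dict) -> dict:
    """Return a copy of the record with a derived processed_at timestamp."""
    # Hypothetical helper: horkos leaves derived values to the client.
    enriched = dict(record)
    enriched["processed_at"] = datetime.now(timezone.utc).isoformat()
    return enriched


validated = {"user_id": 7, "email": "a@example.com"}
stored = add_processed_at(validated)
```

Copying the record rather than mutating it keeps the validated input unchanged, which can make it easier to log both versions.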
-
class
horkos.Schema(name: str = None, description: str = None, labels: dict = None, fields: List[horkos._fields.Field] = None)¶ A formal specification of a dataset. It defines all fields (or columns) a dataset is expected to have as well as the properties of each field.
- Variables
name (str) – The name of the dataset defined by the schema.
description (str) – A detailed description of the dataset defined by the schema.
labels (dict) – A dictionary of key value pairs containing organization specific structured information about the dataset. These values have no function within horkos.
fields (typing.List[horkos.Field]) – A list of fields expected within the dataset.
- Parameters
name – The name of the dataset described by the schema. This is used to provide more descriptive error messages and to provide an identity while being used as part of a Catalog.
description – A detailed description of the dataset defined by the schema.
labels – A dictionary of minimally structured key value pairs intended for storing organization specific information relating to the dataset.
fields – A list of fields (or columns) that are expected within the dataset. Each field must have its own unique name.
-
process(record: dict, include_derived: bool = False) → dict¶ Process a record against the schema. This process includes:
Confirming all required fields are present.
Casting all field values to their expected type.
Confirming there are no null values in non-nullable fields.
Confirming all non-null values pass their field’s checks.
The cast and validated record is returned.
- Parameters
record – A dictionary mapping field names to field values.
include_derived – Whether to include derived fields in the validation process.
- Returns
The processed record with values cast and validated.
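The steps listed for process can be sketched in plain Python as below. This is an illustration under stated assumptions, not the horkos implementation: fields are modelled as plain dictionaries with hypothetical keys (name, cast, required, nullable, checks), checks are assumed to return a boolean, and ValueError stands in for RecordValidationError. The null test runs before the cast here only because the stand-in casts cannot handle None.

```python
def process_record(record: dict, fields: list) -> dict:
    """Illustrative only: apply the four processing steps to a record."""
    result = {}
    for field in fields:
        name = field["name"]
        # Confirm required fields are present; skip absent optional ones.
        if name not in record:
            if field.get("required", True):
                raise ValueError(f"missing required field: {name}")
            continue
        value = record[name]
        if value is None:
            # Reject null values in non-nullable fields.
            if not field.get("nullable", False):
                raise ValueError(f"null value in non-nullable field: {name}")
            result[name] = None
            continue
        # Cast the value to its expected type.
        value = field["cast"](value)
        # Run every check against the non-null value.
        for check in field.get("checks", []):
            if not check(value):
                raise ValueError(f"check failed for field: {name}")
        result[name] = value
    return result


fields = [
    {"name": "age", "cast": int, "checks": [lambda v: v >= 0]},
    {"name": "nickname", "cast": str, "required": False, "nullable": True},
]
processed = process_record({"age": "42", "nickname": None}, fields)
```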
-
horkos.load_schema(schema: Union[str, dict, horkos._schemaomatic.Schema, TextIO], custom_checkers: Dict[str, Callable] = None) → horkos._schemaomatic.Schema¶ Load a schema from a file or existing schema object.
- Parameters
schema – The schema to load. If a string is given, it is assumed to be a path to a .json or .yaml file defining the schema; alternatively, a file handle can be passed directly. If a dictionary is given, it is assumed to be equivalent to the contents of a schema file.
custom_checkers – A dictionary of custom checkers. This dictionary should map the name of the check to a function that generates the check function. This can be used to extend checks beyond those defined within horkos.
- Returns
A Schema object.
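The shape of custom_checkers, a mapping from check name to a function that generates the check function, might look like the sketch below. The check name multiple_of, its divisor argument, and the boolean return value are all assumptions made for illustration; how a schema file refers to the check and supplies its arguments is not covered here.

```python
from typing import Callable, Dict


def multiple_of(divisor: int) -> Callable[[int], bool]:
    """Generate a check confirming a value is a multiple of divisor."""
    def check(value: int) -> bool:
        return value % divisor == 0
    return check


# Name of the check -> function that generates the check function.
custom_checkers: Dict[str, Callable] = {"multiple_of": multiple_of}

# Calling the generator with its arguments yields the actual check.
is_even = custom_checkers["multiple_of"](2)
```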
horkos.checks
-
class
horkos.checks.BaseCheck¶ Base class for a check.
-
property
args¶ The arguments that were used to construct the check.
-
class
horkos.checks.Between(lower: Union[str, int, float], upper: Union[str, int, float])¶ Check for confirming that an incoming value is between two values.
-
class
horkos.checks.Choice(options: list = None)¶ Check for confirming that an incoming value is within a set of options.
-
class
horkos.checks.Email¶ Check for confirming that an incoming value is a valid email.
-
class
horkos.checks.IsoTimestamp¶ Check for confirming an incoming string is an ISO timestamp.
-
class
horkos.checks.JsonArrayString¶ Check for validating that an incoming string is JSON deserializable and deserializes into an array.
-
class
horkos.checks.JsonObjectString¶ Check for validating that an incoming string is JSON deserializable and deserializes into an object.
-
class
horkos.checks.JsonString¶ Check for validating that an incoming string is JSON deserializable.
-
class
horkos.checks.MaxLength(limit: int)¶ Check for the maximum length of an incoming string.
-
class
horkos.checks.Maximum(limit: Union[str, int, float], inclusive: bool = True)¶ Check for confirming that an incoming value is less than the given limit.
-
class
horkos.checks.Minimum(limit: Union[str, int, float], inclusive: bool = True)¶ Check for confirming that an incoming value is greater than the given limit.
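The inclusive flag on Maximum and Minimum is not described further here. A plausible reading, shown below as a sketch and not as the horkos implementation, is that inclusive=True lets a value equal to the limit pass. The factory names make_maximum and make_minimum are hypothetical.

```python
import operator
from typing import Callable


def make_maximum(limit, inclusive: bool = True) -> Callable:
    """Build a check that bounds values from above."""
    # Assumption: inclusive=True lets a value equal to the limit pass.
    compare = operator.le if inclusive else operator.lt
    return lambda value: compare(value, limit)


def make_minimum(limit, inclusive: bool = True) -> Callable:
    """Build a check that bounds values from below."""
    compare = operator.ge if inclusive else operator.gt
    return lambda value: compare(value, limit)


at_most_ten = make_maximum(10)
below_ten = make_maximum(10, inclusive=False)
at_least_zero = make_minimum(0)
```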
-
class
horkos.checks.Regex(regex: str = None, ignore_case=False)¶ Check for confirming that an incoming string matches a regex.
-
class
horkos.checks.Uuid¶ Check for confirming that an incoming string is a valid UUID.
horkos.errors
-
exception
horkos.errors.CastingError¶ Could not cast value.
-
exception
horkos.errors.Error¶ Base error.
-
exception
horkos.errors.RecordValidationError¶ A record failed to validate against the schema.
-
exception
horkos.errors.SchemaValidationError¶ The given data schema failed to validate.
horkos.types
-
class
horkos.types.Boolean¶ A value that is either true or false.
boolean fields will cast strings that case-insensitively match "true" or "false" exactly to True and False respectively. They will also cast the integers 0 and 1 to False and True respectively.
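The boolean casting rules above can be illustrated with a small standalone function. This is a sketch of the described behavior, not the code behind horkos.types.Boolean; the function name cast_boolean, the use of ValueError, and passing existing bool values through unchanged are assumptions.

```python
def cast_boolean(value):
    """Illustrative cast following the boolean rules described above."""
    if isinstance(value, bool):
        # Assumption: values that are already booleans pass through.
        return value
    if isinstance(value, str) and value.lower() in ("true", "false"):
        # Exact, case-insensitive match only; " true" or "yes" do not qualify.
        return value.lower() == "true"
    if isinstance(value, int) and value in (0, 1):
        return bool(value)
    raise ValueError(f"cannot cast {value!r} to a boolean")
```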
-
class
horkos.types.FieldType¶ The base type for all field types.
-
classmethod
cast(value: Any)¶ Cast the incoming value to the desired type.
-
class
horkos.types.Float¶ A numerical value with decimal components.
float fields will cast integers, as well as float strings and integer strings, to float values.
-
class
horkos.types.Integer¶ A whole number value.
integer fields will cast integer strings to integers. They will also cast float values to integers if there is no decimal component. Float values with decimal components will fail to cast to integers.
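A standalone sketch of the integer rules above follows. It is not the horkos.types.Integer implementation; the function name cast_integer, the use of ValueError in place of CastingError, and the rejection of booleans are assumptions.

```python
def cast_integer(value):
    """Illustrative cast following the integer rules described above."""
    if isinstance(value, bool):
        # Assumption: booleans are rejected even though bool subclasses int.
        raise ValueError(f"cannot cast {value!r} to an integer")
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        # Only floats without a decimal component are accepted.
        if value.is_integer():
            return int(value)
        raise ValueError(f"cannot cast {value!r} to an integer")
    if isinstance(value, str):
        # int() raises ValueError for strings such as "1.5" or "abc".
        return int(value)
    raise ValueError(f"cannot cast {value!r} to an integer")
```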
-
class
horkos.types.String¶ Any string value.
horkos will only cast values that can be cast unambiguously to a string, such as integers and floats. Other values that have a string encoding in Python will not be cast to a string if there isn’t exactly one reasonable string encoding. Examples of this are booleans (since True, TRUE, and true are all equally reasonable) or dictionaries (since {'foo': 'bar'} and {"foo": "bar"} are equally reasonable).
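The string rule above, cast only when there is a single reasonable encoding, can be sketched as follows. The function name cast_string and the use of ValueError are assumptions for illustration; this is not the horkos.types.String implementation.

```python
def cast_string(value):
    """Illustrative cast following the string rules described above."""
    if isinstance(value, str):
        return value
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        raise ValueError(f"ambiguous string encoding for {value!r}")
    if isinstance(value, (int, float)):
        return str(value)
    # Dictionaries, lists and other values have no single obvious encoding.
    raise ValueError(f"ambiguous string encoding for {value!r}")
```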
horkos.inspector
-
class
horkos.inspector.CheckMetadata(name: str, args: dict)¶ The name and args associated with a check.
-
property
args¶ Alias for field number 1
-
property
name¶ Alias for field number 0
-
horkos.inspector.get_check_metadata(check: Callable) → horkos.inspector.CheckMetadata¶ Get the metadata associated with a given check.