py_avro_schema
Generate Avro schemas for any Python (nested) dataclass or Pydantic model.
Usage
First, define a Python data structure like this:
>>> from typing import List
>>> import dataclasses
>>> @dataclasses.dataclass
... class Person:
...     name: str = ""
...
>>> @dataclasses.dataclass
... class Ship:
...     name: str = ""
...     crew: List[Person] = dataclasses.field(default_factory=list)
...
Pydantic models can also be used instead of dataclasses. “Normal” Python classes cannot be used to construct Avro record schemas because there is no way to determine the attributes and their types from the class definition.
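To illustrate why dataclasses work where plain classes do not, the sketch below uses only the standard library's dataclasses.fields, which is one way field names and types can be read from a class definition. It is not taken from py_avro_schema's implementation, and PlainShip is a hypothetical class added here for contrast.

```python
import dataclasses
from typing import List


@dataclasses.dataclass
class Person:
    name: str = ""


@dataclasses.dataclass
class Ship:
    name: str = ""
    crew: List[Person] = dataclasses.field(default_factory=list)


class PlainShip:
    """Hypothetical "normal" class: attributes only exist once __init__ runs."""

    def __init__(self):
        self.name = ""
        self.crew = []


# A dataclass declares its fields and their types on the class itself.
ship_fields = {f.name: f.type for f in dataclasses.fields(Ship)}
print(ship_fields)

# A plain class carries no such declaration, so there is nothing to inspect.
plain_is_dataclass = dataclasses.is_dataclass(PlainShip)
plain_annotations = getattr(PlainShip, "__annotations__", {})
print(plain_is_dataclass, plain_annotations)
```

The attributes of PlainShip are only assigned inside __init__, so neither their names nor their types are visible without creating an instance.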
Then, generate the corresponding Avro schema like this:
>>> import py_avro_schema as pas
>>> pas.generate(Ship, namespace="my_package")
b'{"type":"record","name":"Ship","fields":[{"name":"name","type":"string","default":""},...'
The options argument supports the following Option enum values:
| Option | Description |
|---|---|
| INT_32 | Use “int” schemas instead of “long” schemas |
| FLOAT_32 | Use “float” schemas instead of “double” schemas |
| MILLISECONDS | Use milliseconds instead of microseconds precision for (date)time schemas |
| DEFAULTS_MANDATORY | Mandate default values to be specified for all dataclass fields. This option may be used to enforce default values on Avro record fields to support schema evolution/resolution. |
| LOGICAL_JSON_STRING | Model Dict[str, Any] fields as string schemas instead of bytes schemas (with logical type “json”, to support JSON serialization inside Avro). |
| NO_AUTO_NAMESPACE | Do not populate namespaces automatically based on the package a Python class is defined in. |
| AUTO_NAMESPACE_MODULE | Automatically populate namespaces using full (dotted) module names instead of top-level package names. |
| JSON_INDENT_2 | Format JSON data using 2 spaces indentation |
| JSON_SORT_KEYS | Sort keys in JSON data |
| JSON_APPEND_NEWLINE | Append a newline character at the end of the JSON data |
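The three JSON_* options affect only how the schema is serialized, not its content. As a rough illustration of the same formatting effects, the sketch below applies them to a small hand-written record schema using the standard library's json module. It does not call py_avro_schema, and the library's own serializer may differ in details.

```python
import json

# Hand-written schema used purely for illustration (not library output).
schema = {
    "type": "record",
    "name": "Person",
    "fields": [{"name": "name", "type": "string", "default": ""}],
}

# Compact output, comparable to using no JSON_* options.
compact = json.dumps(schema, separators=(",", ":")).encode()

# Roughly what JSON_INDENT_2 | JSON_SORT_KEYS | JSON_APPEND_NEWLINE ask for.
pretty = (json.dumps(schema, indent=2, sort_keys=True) + "\n").encode()

print(compact.decode())
print(pretty.decode(), end="")
```

Both byte strings decode to the same schema; only whitespace and key order differ.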
- class py_avro_schema.DecimalType
Bases: object
A decimal type for type annotations including hints for precision and scale
Example
>>> import decimal
>>> from py_avro_schema import DecimalType
>>> my_decimal: DecimalType[4, 2] = decimal.Decimal("12.34")
Here, the subscript (4, 2) refers to the precision and scale of decimal numbers.
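In Avro's decimal logical type, precision is the total number of digits and scale is the number of digits after the decimal point. The following sketch, which uses only the standard library's decimal module, shows how the value 12.34 from the example corresponds to a precision of 4 and a scale of 2.

```python
import decimal

value = decimal.Decimal("12.34")
sign, digits, exponent = value.as_tuple()

# Precision: total number of significant digits stored.
precision = len(digits)
# Scale: number of digits to the right of the decimal point.
scale = -exponent

print(precision, scale)
```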
- class py_avro_schema.Option(value)
Bases: Flag
Schema generation options
- AUTO_NAMESPACE_MODULE = 131072
- DEFAULTS_MANDATORY = 16384
- FLOAT_32 = 4096
- INT_32 = 2048
- JSON_APPEND_NEWLINE = 1024
- JSON_INDENT_2 = 1
- JSON_SORT_KEYS = 32
- LOGICAL_JSON_STRING = 32768
- MILLISECONDS = 8192
- NO_AUTO_NAMESPACE = 65536
- NO_DOC = 262144
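Because Option is a Flag, each member occupies its own bit and members can be combined with the | operator. The sketch below demonstrates this with a stand-in enum.Flag that copies three of the member values listed above. It is not the library's class, only an illustration of how Flag values behave.

```python
import enum


class Option(enum.Flag):
    """Stand-in copy of a few members listed above; not the library's class."""

    INT_32 = 2048
    FLOAT_32 = 4096
    MILLISECONDS = 8192


combined = Option.INT_32 | Option.FLOAT_32

# Membership tests work on the combined value.
has_int_32 = Option.INT_32 in combined
has_millis = Option.MILLISECONDS in combined

# The combined value is the bitwise OR of the member values.
print(combined.value, has_int_32, has_millis)
```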
- py_avro_schema.generate(py_type: Type, *, namespace: Optional[str] = None, options: Option = Option.None) -> bytes
Return an Avro schema as a JSON-formatted bytestring for a given Python class or instance
This function is cached and can be called repeatedly with the same arguments without any performance penalty.
- Parameters:
py_type – The Python class to generate a schema for.
namespace – The Avro namespace to add to schemas.
options – Schema generation options. Specify multiple values like this: Option.INT_32 | Option.FLOAT_32.
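The documentation above states that generate is cached but does not say how. As a general illustration of that behaviour, the sketch below uses functools.lru_cache from the standard library on a hypothetical build_schema function. Both the function and the choice of lru_cache are assumptions made for this example, not a description of py_avro_schema's internals.

```python
import functools

call_count = 0


@functools.lru_cache(maxsize=None)
def build_schema(type_name: str, namespace: str = "") -> bytes:
    """Hypothetical stand-in for an expensive schema builder."""
    global call_count
    call_count += 1
    return f'{{"type":"record","name":"{type_name}","namespace":"{namespace}"}}'.encode()


first = build_schema("Ship", namespace="my_package")
second = build_schema("Ship", namespace="my_package")

# The second call is served from the cache: same object, no extra work.
print(call_count, first is second)
```

A cache of this kind keys on the call arguments, so it requires them to be hashable. Classes, strings, and Flag values all are.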