py_avro_schema

Generate Avro schemas for any Python (nested) dataclass or Pydantic model.

Usage

First, define a Python data structure like this:

>>> from typing import List
>>> import dataclasses

>>> @dataclasses.dataclass
... class Person:
...     name: str = ""
...
>>> @dataclasses.dataclass
... class Ship:
...     name: str = ""
...     crew: List[Person] = dataclasses.field(default_factory=list)
...

Pydantic models can also be used instead of dataclasses. “Normal” Python classes cannot be used to construct Avro record schemas because there is no way to determine the attributes and their types from the class definition.

Then, generate the corresponding Avro schema like this:

>>> import py_avro_schema as pas
>>> pas.generate(Ship, namespace="my_package")
b'{"type":"record","name":"Ship","fields":[{"name":"name","type":"string","default":""},...'
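The remark above about "normal" classes can be illustrated with the standard library alone. The following is a minimal sketch, not the library's own inspection logic: it only shows that a dataclass exposes field names and types on the class itself, while a plain class (here the hypothetical PlainPerson) does not.

```python
import dataclasses
import typing


@dataclasses.dataclass
class Person:
    name: str = ""


class PlainPerson:
    """Hypothetical plain class, used only for comparison."""

    def __init__(self, name=""):
        # The attribute only comes into existence on an instance
        self.name = name


# A dataclass exposes its field names and types on the class itself
person_fields = {f.name: f.type for f in dataclasses.fields(Person)}
print(person_fields)

# A plain class carries no comparable metadata
print(dataclasses.is_dataclass(PlainPerson))  # False
print(typing.get_type_hints(PlainPerson))     # {}
```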

The options argument of generate accepts the following Option enum values:

Option                 Description
INT_32                 Use “int” schemas instead of “long” schemas
FLOAT_32               Use “float” schemas instead of “double” schemas
MILLISECONDS           Use milliseconds instead of microseconds precision for (date)time schemas
DEFAULTS_MANDATORY     Mandate default values to be specified for all dataclass fields. This option may be used to enforce default values on Avro record fields to support schema evolution/resolution.
LOGICAL_JSON_STRING    Model Dict[str, Any] fields as string schemas instead of bytes schemas (with logical type “json”, to support JSON serialization inside Avro).
NO_AUTO_NAMESPACE      Do not populate namespaces automatically based on the package a Python class is defined in.
AUTO_NAMESPACE_MODULE  Automatically populate namespaces using full (dotted) module names instead of top-level package names.
JSON_INDENT_2          Format JSON data using 2 spaces indentation
JSON_SORT_KEYS         Sort keys in JSON data
JSON_APPEND_NEWLINE    Append a newline character at the end of the JSON data
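To give a feel for what the three JSON_* options describe, here is a rough approximation using only the standard-library json module. It is a sketch under stated assumptions: the schema dict is hand-written sample data, and the library's actual serializer may differ in details such as whitespace.

```python
import json

# A hand-written Avro record schema, used only as sample data
schema = {
    "type": "record",
    "name": "Person",
    "fields": [{"name": "name", "type": "string", "default": ""}],
}

# Compact form, comparable to the default output shown in the usage example
compact = json.dumps(schema, separators=(",", ":")).encode()

# Rough stdlib equivalent of combining JSON_INDENT_2, JSON_SORT_KEYS
# and JSON_APPEND_NEWLINE (an approximation, not the library's own code)
pretty = (json.dumps(schema, indent=2, sort_keys=True) + "\n").encode()

print(compact.decode())
print(pretty.decode(), end="")
```

Both byte strings describe the same schema; only the formatting differs.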

class py_avro_schema.DecimalType

Bases: object

A decimal type for type annotations including hints for precision and scale

Example

>>> import decimal
>>> from py_avro_schema import DecimalType
>>> my_decimal: DecimalType[4, 2] = decimal.Decimal("12.34")

Here, the subscript (4, 2) gives the precision (total number of digits) and the scale (number of digits after the decimal point) of the decimal numbers.
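As a worked illustration of precision and scale, the hypothetical helper below checks whether a decimal.Decimal value fits a given (precision, scale) pair. It is a sketch using only the standard library; the function name fits is an assumption for this example and is not part of py_avro_schema.

```python
import decimal


def fits(value: decimal.Decimal, precision: int, scale: int) -> bool:
    """Return True if value can be stored with the given precision and scale."""
    _sign, _digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        return False  # NaN or infinity
    # Digits after the decimal point must not exceed the scale
    if -exponent > scale:
        return False
    # Shifted by `scale` places, the integer must have at most `precision` digits
    unscaled = int(value.scaleb(scale))
    return len(str(abs(unscaled))) <= precision


print(fits(decimal.Decimal("12.34"), 4, 2))   # True: 4 digits, 2 after the point
print(fits(decimal.Decimal("123.45"), 4, 2))  # False: 5 digits in total
print(fits(decimal.Decimal("1.234"), 4, 2))   # False: 3 digits after the point
```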

class py_avro_schema.Option(value)

Bases: Flag

Schema generation options

AUTO_NAMESPACE_MODULE = 131072
DEFAULTS_MANDATORY = 16384
FLOAT_32 = 4096
INT_32 = 2048
JSON_APPEND_NEWLINE = 1024
JSON_INDENT_2 = 1
JSON_SORT_KEYS = 32
LOGICAL_JSON_STRING = 32768
MILLISECONDS = 8192
NO_AUTO_NAMESPACE = 65536
NO_DOC = 262144

py_avro_schema.generate(py_type: typing.Type, *, namespace: typing.Optional[str] = None, options: py_avro_schema.Option = Option(0)) → bytes

Return an Avro schema as a JSON-formatted bytestring for a given Python class or instance

This function is cached and can be called repeatedly with the same arguments without any performance penalty.
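The effect of caching can be sketched with functools.lru_cache from the standard library. How the library implements its cache is an assumption here, and build_schema is a hypothetical stand-in rather than a real py_avro_schema function.

```python
import functools


@functools.lru_cache(maxsize=None)
def build_schema(type_name: str, namespace: str = "") -> bytes:
    """Hypothetical stand-in for an expensive schema builder."""
    full_name = f"{namespace}.{type_name}" if namespace else type_name
    return f'{{"type":"record","name":"{full_name}","fields":[]}}'.encode()


first = build_schema("Ship", namespace="my_package")
second = build_schema("Ship", namespace="my_package")

# The second call is served from the cache and returns the very same object
print(first is second)                  # True
print(build_schema.cache_info().hits)   # 1
```

With this kind of cache, arguments must be hashable, and a repeated call with identical arguments skips the work entirely.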

Parameters:
  • py_type – The Python class to generate a schema for.

  • namespace – The Avro namespace to add to schemas.

  • options – Schema generation options. Combine multiple values with the | operator, like this: Option.INT_32 | Option.FLOAT_32.
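Because Option derives from Flag, values combine with the | operator and can be tested with in. The sketch below uses a hypothetical DemoOption enum that copies three of the values listed above, so that it runs without the library installed; with the library, the same operators apply to py_avro_schema.Option.

```python
import enum


class DemoOption(enum.Flag):
    """Hypothetical stand-in mirroring a few values of the Option enum."""

    INT_32 = 2048
    FLOAT_32 = 4096
    MILLISECONDS = 8192


combined = DemoOption.INT_32 | DemoOption.FLOAT_32

print(combined.value)                       # 6144
print(DemoOption.INT_32 in combined)        # True
print(DemoOption.MILLISECONDS in combined)  # False

# The empty flag (no options selected) is falsy
print(bool(DemoOption(0)))                  # False
```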