Tutorial
This tutorial shows you how to use py-avro-schema, step by step.
Define data types and structures
An example data structure could be defined like this in Python:
# File shipping/models.py
import dataclasses
@dataclasses.dataclass
class Ship:
"""A beautiful ship"""
name: str
year_launched: int
This defines a single type Ship
with 2 fields: name
(some text) and year_launched
(a number).
The type hints are essential and used by py-avro-schema to generate the Avro schema!
Generating the Avro schema
To represent this as a data type, we run the following commands (here we use an interactive Python shell):
>>> import py_avro_schema as pas
>>> import shipping.models
>>> pas.generate(shipping.models.Ship)
b'{"type":"record","name":"Ship","fields":[{"name":"name","type":"string"},{"name":"year_launched","type":"long"}],"namespace":"shipping","doc":"A beautiful ship"}'
The output is the Avro schema as a (binary) JSON string.
If we wanted to, we could format the JSON string a bit nicer:
>>> raw_json = pas.generate(Ship, options=pas.Option.JSON_INDENT_2)
>>> print(raw_json.decode())
{
"type": "record",
"name": "Ship",
"fields": [
{
"name": "name",
"type": "string"
},
{
"name": "year_launched",
"type": "long"
}
],
"namespace": "shipping",
"doc": "A beautiful ship"
}
This human-friendly representation is useful for debugging for example.
Controlling the schema namespace
Avro named types such as a Record
optionally define a “namespace” to qualify their name.
Package name
By default, py-avro-schema populates the namespace with the Python package name within which the Python type is defined.
For example, if the type Ship
is defined in module shipping.models
, the namespace will be shipping
.
A good pattern is to define (or import-as) the types into a package’s __init__.py
module such that the types are importable using the Avro schema namespace exactly.
For example:
# File shipping/__init__.py
from shipping.models import Ship
__all__ = ["Ship"]
This can be really useful for deserializing Avro data into Python objects.
Module name
Alternatively, to use the full dotted module name (shipping.models
in the above example) instead of the top-level package name use the option py_avro_schema.Option.AUTO_NAMESPACE_MODULE
.
Manual
A custom namespace can be specified like this:
>>> pas.generate(shipping.models.Ship, namespace="com.shipping.schemas")
b'{"type":"record","name":"Ship","fields":[...],"namespace":"com.shipping.schemas", ...}'
No namespace
To disable automatic namespace population altogether, use this:
>>> pas.generate(Ship, options=pas.Option.NO_AUTO_NAMESPACE)
b'{"type":"record","name":"Ship","fields":[...],"doc":"A beautiful ship"}'