Tutorial
This tutorial shows you how to use py-avro-schema, step by step.
Define data types and structures
An example data structure could be defined like this in Python:
# File shipping/models.py
import dataclasses


@dataclasses.dataclass
class Ship:
    """A beautiful ship"""

    name: str
    year_launched: int
This defines a single type, Ship, with two fields: name (some text) and year_launched (an integer).
The type hints are essential: py-avro-schema uses them to generate the Avro schema.
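To see why the hints matter, here is a minimal standard-library sketch of how a dataclass's type hints can be read and mapped to Avro field types. It is not py-avro-schema's implementation: record_fields and the PRIMITIVES mapping are hypothetical. Only the str to "string" and int to "long" entries are confirmed by the schema generated in this tutorial; the other entries are assumptions.

```python
import dataclasses
import typing


@dataclasses.dataclass
class Ship:
    """A beautiful ship"""

    name: str
    year_launched: int


# Hypothetical mapping: str -> "string" and int -> "long" match the generated
# schema in this tutorial; the remaining entries are illustrative assumptions.
PRIMITIVES = {str: "string", int: "long", float: "double", bool: "boolean", bytes: "bytes"}


def record_fields(cls: type) -> list[dict[str, str]]:
    """Build Avro field entries from a dataclass's resolved type hints."""
    hints = typing.get_type_hints(cls)
    return [
        {"name": field.name, "type": PRIMITIVES[hints[field.name]]}
        for field in dataclasses.fields(cls)
    ]


print(record_fields(Ship))
# [{'name': 'name', 'type': 'string'}, {'name': 'year_launched', 'type': 'long'}]
```

Without the annotations, the sketch would have nothing to look up, which is the same reason the hints are required for schema generation.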
Generating the Avro schema
To generate the Avro schema for this type, run the following commands (here in an interactive Python shell):
>>> import py_avro_schema as pas
>>> import shipping.models
>>> pas.generate(shipping.models.Ship)
b'{"type":"record","name":"Ship","fields":[{"name":"name","type":"string"},{"name":"year_launched","type":"long"}],"namespace":"shipping","doc":"A beautiful ship"}'
The output is the Avro schema as a JSON string, returned as bytes rather than str.
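Since the result is plain bytes containing JSON, the standard library's json module can turn it back into a Python dict for inspection. This sketch hard-codes the compact bytes from the example above instead of calling py-avro-schema, so it runs on its own.

```python
import json

# The compact schema bytes returned by pas.generate() in the example above.
raw = b'{"type":"record","name":"Ship","fields":[{"name":"name","type":"string"},{"name":"year_launched","type":"long"}],"namespace":"shipping","doc":"A beautiful ship"}'

# json.loads accepts UTF-8 encoded bytes directly, so no .decode() is needed.
schema = json.loads(raw)

# Map each field name to its Avro type.
field_types = {field["name"]: field["type"] for field in schema["fields"]}
print(schema["namespace"], field_types)
# shipping {'name': 'string', 'year_launched': 'long'}
```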
To make the JSON easier to read, pass the JSON_INDENT_2 option:
>>> raw_json = pas.generate(shipping.models.Ship, options=pas.Option.JSON_INDENT_2)
>>> print(raw_json.decode())
{
  "type": "record",
  "name": "Ship",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "year_launched",
      "type": "long"
    }
  ],
  "namespace": "shipping",
  "doc": "A beautiful ship"
}
This human-readable representation is useful, for example, when debugging.
Controlling the schema namespace
Avro named types, such as a record, can optionally define a “namespace” to qualify their name.
By default, py-avro-schema populates the namespace with the name of the top-level Python package in which the type is defined.
The recommended pattern is to define the types in, or import them into, the package’s __init__.py module, so that each type can be imported from the package named in the Avro schema namespace.
This is useful when deserializing Avro data into Python objects: the schema’s namespace and name identify where to import the matching class from.
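The recommended pattern can be sketched with the standard library: given a schema's namespace and name, import the matching class and build an instance from a decoded record. This is a minimal illustration, not part of py-avro-schema. resolve_record_class is a hypothetical helper, and a stand-in shipping module is registered in sys.modules so the sketch runs without the real package; in a real project, shipping/__init__.py would instead import Ship from shipping.models. Decoding the Avro binary data itself is out of scope; the record is assumed to be already decoded into a dict.

```python
import dataclasses
import importlib
import json
import sys
import types


@dataclasses.dataclass
class Ship:
    """A beautiful ship"""

    name: str
    year_launched: int


# Hypothetical stand-in for a real shipping/__init__.py that re-exports Ship,
# registered in sys.modules so that this sketch runs on its own.
shipping_package = types.ModuleType("shipping")
shipping_package.Ship = Ship
sys.modules["shipping"] = shipping_package


def resolve_record_class(schema_json: bytes) -> type:
    """Hypothetical helper: import the class named by a record schema's namespace and name."""
    schema = json.loads(schema_json)
    module = importlib.import_module(schema["namespace"])
    return getattr(module, schema["name"])


schema_json = b'{"type":"record","name":"Ship","fields":[{"name":"name","type":"string"},{"name":"year_launched","type":"long"}],"namespace":"shipping","doc":"A beautiful ship"}'

# A record assumed to be already decoded from Avro data into a dict.
decoded_record = {"name": "Seagull", "year_launched": 1990}

ship_class = resolve_record_class(schema_json)
ship = ship_class(**decoded_record)
print(ship)  # Ship(name='Seagull', year_launched=1990)
```

This only works because the namespace names an importable package that exposes the class, which is exactly what the __init__.py pattern guarantees.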
Disable automatic namespace population like this:
>>> pas.generate(shipping.models.Ship, options=pas.Option.NO_AUTO_NAMESPACE)
b'{"type":"record","name":"Ship","fields":[{"name":"name","type":"string"},{"name":"year_launched","type":"long"}],"doc":"A beautiful ship"}'
Alternatively, to use the full dotted module name (for example, shipping.models) instead of the top-level package name (shipping), use the option pas.Option.AUTO_NAMESPACE_MODULE.
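The three namespace behaviors described in this section can be illustrated with a small standard-library function. avro_namespace and its mode strings are hypothetical, chosen for this sketch; it assumes the namespace is derived from the dotted name of the module in which the class is defined (a class's __module__ attribute) and is not py-avro-schema's actual code.

```python
from typing import Optional


def avro_namespace(module_name: str, mode: str = "package") -> Optional[str]:
    """Hypothetical helper deriving an Avro namespace from a dotted module name.

    mode="package": top-level package name (the default behavior described above)
    mode="module":  full dotted module name (analogous to AUTO_NAMESPACE_MODULE)
    mode="none":    no namespace at all (analogous to NO_AUTO_NAMESPACE)
    """
    if mode == "package":
        return module_name.split(".")[0]
    if mode == "module":
        return module_name
    if mode == "none":
        return None
    raise ValueError(f"unknown mode: {mode!r}")


# Ship is defined in shipping/models.py, so its __module__ is "shipping.models".
for mode in ("package", "module", "none"):
    print(mode, avro_namespace("shipping.models", mode))
# package shipping
# module shipping.models
# none None
```

In a real session the module name would come from the class itself, for example avro_namespace(shipping.models.Ship.__module__).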