Supported data types
See also
Official Avro schema specification: https://avro.apache.org/docs/current/spec.html#schemas
py-avro-schema supports the following Python types:
Compound types/structures
dataclasses.dataclass()
Supports Python classes decorated with dataclasses.dataclass().
Avro schema: record
The Avro record type is a named schema.
py-avro-schema uses the Python class name as the schema name.
Dataclass fields with types supported by py-avro-schema are output as expected, including population of default values.
Example:
# File shipping/models.py
import dataclasses
from typing import Optional

@dataclasses.dataclass
class Ship:
    """A beautiful ship"""
    name: str
    year_launched: Optional[int] = None
Is output as:
{
  "type": "record",
  "name": "Ship",
  "namespace": "shipping",
  "doc": "A beautiful ship",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "year_launched",
      "type": ["null", "long"],
      "default": null
    }
  ]
}
Field default values may improve Avro schema evolution and resolution.
To validate that all dataclass fields are specified with a default value, use option py_avro_schema.Option.DEFAULTS_MANDATORY.
The Avro record schema’s doc field is populated from the Python class’s docstring.
To disable this, pass the option py_avro_schema.Option.NO_DOC.
Recursive or repeated reference to the same Python dataclass is supported. After the first time the schema is output, any subsequent references are by name only.
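To make the idea of a mandatory-defaults check concrete, the following is a minimal standard-library sketch that lists the dataclass fields lacking a default. It is not py-avro-schema's implementation, and the helper name fields_without_default is hypothetical.

```python
import dataclasses
from typing import List, Optional


@dataclasses.dataclass
class Ship:
    """A beautiful ship"""

    name: str
    year_launched: Optional[int] = None


def fields_without_default(cls: type) -> List[str]:
    """Return names of dataclass fields with neither a default nor a default factory.

    Hypothetical helper for illustration only.
    """
    return [
        f.name
        for f in dataclasses.fields(cls)
        if f.default is dataclasses.MISSING and f.default_factory is dataclasses.MISSING
    ]


# "name" has no default, so a strict defaults check would reject this class.
missing = fields_without_default(Ship)
```

A check of this kind would flag Ship because name has no default, while year_launched passes.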
pydantic.BaseModel
Supports Python classes inheriting from pydantic.BaseModel. Requires Pydantic version 2 or greater. For Pydantic 1 support, use py-avro-schema version 2.
Avro schema: record
The Avro record type is a named schema.
py-avro-schema uses the Python class name as the schema name.
Pydantic model fields with types supported by py-avro-schema are output as expected, including population of default values and descriptions.
Example:
# File shipping/models.py
import pydantic
from typing import Optional

class Ship(pydantic.BaseModel):
    """A beautiful ship"""
    name: str
    year_launched: Optional[int] = pydantic.Field(None, description="When we hit the water")
Is output as:
{
  "type": "record",
  "name": "Ship",
  "namespace": "shipping",
  "doc": "A beautiful ship",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "year_launched",
      "type": ["null", "long"],
      "default": null,
      "doc": "When we hit the water"
    }
  ]
}
Field default values may improve Avro schema evolution and resolution.
To validate that all model fields are specified with a default value, use option py_avro_schema.Option.DEFAULTS_MANDATORY.
The Avro record schema’s doc attribute is populated from the Python class’s docstring.
For individual model fields, the doc attribute is taken from the Pydantic field’s description attribute.
To disable this, pass the option py_avro_schema.Option.NO_DOC.
Recursive or repeated reference to the same Pydantic class is supported. After the first time the schema is output, any subsequent references are by name only.
Warning
When using a hierarchy of Pydantic model classes, recursive type references are supported in the final class only and not in any inherited/base class.
Plain Python classes
Supports Python classes with an __init__() method where all arguments have type hints and fully define all schema fields.
Avro schema: record
The Avro record type is a named schema.
py-avro-schema uses the Python class name as the schema name.
Example:
class Port:
    """A port you can sail to"""

    def __init__(self, name: str, country: str = "NLD"):
        self.name = name
        self.country = country.upper()
Is output as:
{
  "type": "record",
  "name": "Port",
  "namespace": "shipping",
  "doc": "A port you can sail to",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "country",
      "type": "string",
      "default": "NLD"
    }
  ]
}
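As an illustration of how the arguments of __init__() can fully define the schema fields, here is a rough standard-library sketch that reads argument names, type hints and defaults from the signature. The helper init_fields is hypothetical and is not how py-avro-schema is necessarily implemented.

```python
import inspect
import typing


class Port:
    """A port you can sail to"""

    def __init__(self, name: str, country: str = "NLD"):
        self.name = name
        self.country = country.upper()


def init_fields(cls: type) -> list:
    """Collect (name, type hint, default) for each __init__ argument except self.

    Hypothetical helper for illustration only.
    """
    hints = typing.get_type_hints(cls.__init__)
    fields = []
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        if name not in hints:
            # Every argument needs a type hint to define a schema field
            raise TypeError(f"Argument {name!r} of {cls.__name__}.__init__ has no type hint")
        fields.append((name, hints[name], param.default))
    return fields


# Arguments without a default carry the sentinel inspect.Parameter.empty
port_fields = init_fields(Port)
```

Here the argument country carries the default "NLD", matching the "default" attribute in the schema above, while name has no default.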
typing.Union and types.UnionType (X | Y)
Avro schema: JSON array of multiple Avro schemas
Union members can be any other type supported by py-avro-schema.
When defined as a class field with a default value, the union members may be re-ordered to ensure that the first member matches the type of the default value.
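The re-ordering matters because the Avro specification has historically required the default value of a union field to match the first schema in the union. The sketch below shows one way such a re-ordering could work on plain Python types. The function reorder_union is hypothetical, uses exact type matching only, and is not taken from py-avro-schema.

```python
from typing import Any, List


def reorder_union(members: List[type], default: Any) -> List[type]:
    """Move the member matching the default value's type to the front.

    The relative order of the remaining members is preserved.
    Hypothetical helper for illustration only.
    """
    default_type = type(default)
    if default_type not in members:
        raise TypeError(f"Default {default!r} does not match any union member")
    return [default_type] + [m for m in members if m is not default_type]


# Optional[int] with default None: the null member moves to the front
reordered = reorder_union([int, type(None)], None)
# The default already matches the first member: order is unchanged
unchanged = reorder_union([str, int], "x")
```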
Forward references
Avro schema: any named schema
py-avro-schema generally supports “forward” or recursive references, for example when a class attribute has the same type as the class itself.
Example:
import dataclasses

@dataclasses.dataclass
class PyType:
    field_a: "PyType"
Is output as:
{
  "type": "record",
  "name": "PyType",
  "fields": [
    {
      "name": "field_a",
      "type": "PyType"
    }
  ]
}
Warning
When using a hierarchy of Pydantic model classes, recursive type references are supported in the final class only and not in any inherited/base class.
Collections
typing.Dict[str, typing.Any]
See also
For a “normal” Avro map schema using fully typed Python dictionaries, see typing.Mapping.
Avro schema: bytes
Avro logical type: json
Arbitrary Python dictionaries can be serialized as a bytes Avro schema by first serializing the data as JSON.
py-avro-schema supports this “JSON-in-Avro” approach by adding the custom logical type json to a bytes schema.
To support JSON serialization as strings instead of bytes, use py_avro_schema.Option.LOGICAL_JSON_STRING.
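The “JSON-in-Avro” approach can be sketched with the standard library alone: the dictionary is dumped to JSON text and encoded to bytes before being written as an Avro bytes value. The helper names below are hypothetical, and UTF-8 is an assumed encoding. The actual serialization is done by whichever Avro library writes the data, not by py-avro-schema.

```python
import json
from typing import Any, Dict


def to_json_bytes(data: Dict[str, Any]) -> bytes:
    """Serialize an arbitrary dictionary as UTF-8 encoded JSON (hypothetical helper)."""
    return json.dumps(data).encode("utf-8")


def from_json_bytes(payload: bytes) -> Dict[str, Any]:
    """Reverse of to_json_bytes (hypothetical helper)."""
    return json.loads(payload.decode("utf-8"))


cargo = {"containers": 12, "hazardous": False, "tags": ["reefer", "dry"]}
payload = to_json_bytes(cargo)  # this is the value stored in the Avro bytes field
```

With the string variant of the logical type, the json.dumps() result would be stored directly and the encode step skipped.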
typing.List[typing.Dict[str, typing.Any]]
See also
For a “normal” Avro array schema using fully typed Python lists of dictionaries, see typing.Sequence.
Avro schema: bytes
Avro logical type: json
Arbitrary lists of Python dictionaries can be serialized as a bytes Avro schema by first serializing the data as JSON.
py-avro-schema supports this “JSON-in-Avro” approach by adding the custom logical type json to a bytes schema.
To support JSON serialization as strings instead of bytes, use py_avro_schema.Option.LOGICAL_JSON_STRING.
typing.Mapping
Avro schema: map
This supports other “generic type” versions of collections.abc.Mapping, including typing.Dict.
Avro map schemas support string keys only. Map values can be any other Python type supported by py-avro-schema.
For example, Dict[str, int] is output as:
{
  "type": "map",
  "values": "long"
}
typing.Sequence
Avro schema: array
This supports other “generic type” versions of collections.abc.Sequence, including typing.List.
Sequence values can be any Python type supported by py-avro-schema. For example, List[int] is output as:
{
  "type": "array",
  "items": "long"
}
Simple types
bool (and subclasses)
Avro schema: boolean
bytes (and subclasses)
Avro schema: bytes
datetime.date
Avro schema: int
Avro logical type: date
datetime.datetime
Avro schema: long
Avro logical type: timestamp-micros
To output with millisecond precision instead (logical type timestamp-millis), use py_avro_schema.Option.MILLISECONDS.
datetime.time
Avro schema: long
Avro logical type: time-micros
To output with millisecond precision instead (logical type time-millis), use py_avro_schema.Option.MILLISECONDS.
In that case, the Avro schema is int.
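The choice between int and long follows from the value ranges of the Avro logical types: date counts days since the Unix epoch, the timestamp types count from the epoch in UTC, and the time types count from midnight. The sketch below, with hypothetical helper names, computes these integers and shows why microseconds after midnight need a 64-bit long while milliseconds fit in a 32-bit int.

```python
import datetime

EPOCH_DATE = datetime.date(1970, 1, 1)
EPOCH_UTC = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
INT32_MAX = 2**31 - 1


def date_to_days(d: datetime.date) -> int:
    """date: number of days since the Unix epoch."""
    return (d - EPOCH_DATE).days


def datetime_to_micros(dt: datetime.datetime) -> int:
    """timestamp-micros: microseconds since the Unix epoch; dt must be timezone-aware."""
    return (dt - EPOCH_UTC) // datetime.timedelta(microseconds=1)


def time_to_micros(t: datetime.time) -> int:
    """time-micros: microseconds after midnight."""
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond


days = date_to_days(datetime.date(1970, 1, 11))
micros = datetime_to_micros(
    datetime.datetime(1970, 1, 1, 0, 0, 1, tzinfo=datetime.timezone.utc)
)
# The largest possible time-of-day value, in microseconds
last_micro_of_day = time_to_micros(datetime.time(23, 59, 59, 999999))
```

The largest time-of-day value in microseconds exceeds INT32_MAX, whereas the same value in milliseconds (last_micro_of_day // 1000) stays below it.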
datetime.timedelta
Avro schema: fixed
Avro logical type: duration
The Avro fixed type is a named schema.
Here, py-avro-schema uses the name datetime.timedelta.
The full generated schema looks like this:
{
  "type": "fixed",
  "name": "datetime.timedelta",
  "size": 12,
  "logicalType": "duration"
}
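The size of 12 comes from the Avro specification, which defines a duration as three little-endian unsigned 32-bit integers: months, days and milliseconds. The following sketch packs a datetime.timedelta accordingly. The helper timedelta_to_duration is hypothetical, and fixing months at zero and dropping sub-millisecond precision are assumptions made here, not documented py-avro-schema behaviour.

```python
import datetime
import struct


def timedelta_to_duration(td: datetime.timedelta) -> bytes:
    """Pack a timedelta as an Avro duration: months, days, milliseconds.

    Hypothetical helper for illustration only.
    """
    if td < datetime.timedelta(0):
        raise ValueError("An Avro duration cannot represent negative values")
    months = 0  # assumption: a timedelta has no notion of months
    # assumption: sub-millisecond precision is dropped
    millis = td.seconds * 1000 + td.microseconds // 1000
    # "<III" = three little-endian unsigned 32-bit integers, 12 bytes in total
    return struct.pack("<III", months, td.days, millis)


encoded = timedelta_to_duration(datetime.timedelta(days=2, seconds=3, milliseconds=4))
```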
enum.Enum
Avro schema: enum
The Avro enum type is a named schema.
py-avro-schema uses the Python class name as the schema name.
Avro enum symbols must be strings.
Example:
# File shipping/models.py
import enum

class ShipType(enum.Enum):
    SAILING_VESSEL = "SAILING_VESSEL"
    MOTOR_VESSEL = "MOTOR_VESSEL"
Outputs as:
{
  "type": "enum",
  "name": "ShipType",
  "namespace": "shipping",
  "symbols": ["SAILING_VESSEL", "MOTOR_VESSEL"],
  "default": "SAILING_VESSEL"
}
The default value is taken from the first defined enum symbol and is used to support writer/reader schema resolution.
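As a rough illustration of the two rules above (string symbols only, default taken from the first symbol), the sketch below derives both from an enum class. It assumes that symbols come from the member values, which the example cannot confirm because names and values coincide there. The helper enum_symbols is hypothetical.

```python
import enum
from typing import List


class ShipType(enum.Enum):
    SAILING_VESSEL = "SAILING_VESSEL"
    MOTOR_VESSEL = "MOTOR_VESSEL"


def enum_symbols(cls: type) -> List[str]:
    """Return enum member values in definition order, requiring strings.

    Assumption: symbols are taken from member values, not member names.
    """
    symbols = [member.value for member in cls]
    if not all(isinstance(symbol, str) for symbol in symbols):
        raise TypeError(f"Avro enum symbols must be strings: {symbols!r}")
    return symbols


symbols = enum_symbols(ShipType)
default = symbols[0]  # the first defined symbol serves as the default
```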
float (and subclasses)
Avro schema: double
To output as the 32-bit Avro schema float instead, use py_avro_schema.Option.FLOAT_32.
int (and subclasses)
Avro schema: long
To output as the 32-bit Avro schema int instead, use py_avro_schema.Option.INT_32.
NoneType
Avro schema: null
This schema is typically used as a “unioned” type where the default value is None.
decimal.Decimal
Avro schema: bytes
Avro logical type: decimal
The standard library’s decimal.Decimal should be annotated with additional metadata to define the decimal precision and scale.
For example, a decimal field with precision 4 and scale 2 is defined like this:
import decimal
from typing import Annotated

import py_avro_schema as pas

construction_costs: Annotated[decimal.Decimal, pas.DecimalMeta(precision=4, scale=2)]
Values can be assigned as normal, e.g. construction_costs = decimal.Decimal("12.34"). The scale attribute can be omitted, as per the Avro specification, in which case the scale defaults to zero.
The Avro schema for the above type is:
{
  "type": "bytes",
  "logicalType": "decimal",
  "precision": 4,
  "scale": 2
}
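To show what precision and scale mean for the stored data: the Avro specification encodes a decimal as the two's-complement, big-endian bytes of its unscaled integer value. The sketch below applies that rule to the example value. The helper decimal_to_avro_bytes is hypothetical, and the serialization itself is normally handled by an Avro library rather than by py-avro-schema.

```python
import decimal


def decimal_to_avro_bytes(value: decimal.Decimal, precision: int, scale: int = 0) -> bytes:
    """Encode a decimal as big-endian two's-complement bytes of its unscaled value.

    Hypothetical helper for illustration only.
    """
    scaled = value.scaleb(scale)  # shift the decimal point right by `scale` digits
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more than {scale} decimal places")
    unscaled = int(scaled)
    if len(str(abs(unscaled))) > precision:
        raise ValueError(f"{value} exceeds precision {precision}")
    length = (unscaled.bit_length() + 8) // 8  # leaves room for the sign bit
    return unscaled.to_bytes(length, byteorder="big", signed=True)


# 12.34 with scale 2 has the unscaled value 1234, which is 0x04D2
encoded = decimal_to_avro_bytes(decimal.Decimal("12.34"), precision=4, scale=2)
```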
str
Avro schema: string
str subclasses (“named strings”)
Avro schema: string
Python classes inheriting from str are converted to Avro string schemas to support serialization of any arbitrary Python types “as a string value”.
Primarily to support deserialization of Avro data, a custom property namedString is added and populated as the schema’s namespace followed by the class name.
The custom property is used here since the Avro string schema is not a “named” schema.
py-avro-schema uses the same namespace logic as with real named Avro schemas.
Example:
# File shipping/models.py
class PortName(str):
    ...
Outputs as:
{
  "type": "string",
  "namedString": "shipping.PortName"
}
typing.Literal
Avro schema: schema corresponding to the type of the literal value, e.g. string, long etc.
Mixed types, e.g. Literal["", 42], are not supported.
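The single-type restriction can be illustrated with typing.get_args(), which returns the values of a Literal. The sketch below finds the one Python type shared by all literal values and rejects mixed literals. The helper literal_value_type is hypothetical and not part of py-avro-schema.

```python
from typing import Any, Literal, get_args


def literal_value_type(literal: Any) -> type:
    """Return the single Python type shared by all values of a Literal.

    Hypothetical helper for illustration only.
    """
    value_types = {type(value) for value in get_args(literal)}
    if len(value_types) != 1:
        raise TypeError(f"Mixed-type literal is not supported: {literal}")
    return value_types.pop()


string_type = literal_value_type(Literal["SAILING_VESSEL", "MOTOR_VESSEL"])
int_type = literal_value_type(Literal[42])

# A literal mixing str and int values is rejected
try:
    literal_value_type(Literal["", 42])
    mixed_rejected = False
except TypeError:
    mixed_rejected = True
```

The resulting Python type (str or int here) would then map to the Avro schema listed for that type elsewhere on this page.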
uuid.UUID
Avro schema: string
Avro logical type: uuid