Apache AVRO - Introduction

We will start with an introduction to Apache Avro, then dive into Avro schemas, and end with the serialization process in Apache Avro.

1. Introduction to Apache Avro

"Apache Avro is a data serialization library." That's it, huh? This is roughly what you will see when you open the official page.

2. Schema definition

Avro's serialization is built around schemas. When data is written, the schema is written along with it, so the schema is always available when the data is read. Storing the schema together with the data makes the data fully self-describing.

A schema describes the structure of an Avro datum (for example, a record). Schema types fall into two groups: primitive and complex.

2.1. Primitive types

These are the basic types supported by Avro: null, boolean, int, long, float, double, bytes and string. One quick example:

{"type": "string"}

2.2. Complex types

Apache Avro supports six complex types: record, enum, array, map, union and fixed.


Record uses the type name record and supports attributes such as name, namespace, doc, aliases and fields (a JSON array of field definitions).

{
  "type": "record",
  "name": "Node",
  "aliases": ["SinglyLinkedNodes"],
  "fields": [
    {"name": "value", "type": "string"},
    {"name": "next", "type": ["null", "Node"]}
  ]
}
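To see how the recursive Node schema maps onto actual data, here is a minimal Python sketch that uses only the standard library. It is not the Avro library API: the helper `matches` is hypothetical and understands only the few types this schema uses (null, string, union and record).

```python
import json

# The Node schema, written out as a complete JSON object.
NODE_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Node",
  "aliases": ["SinglyLinkedNodes"],
  "fields": [
    {"name": "value", "type": "string"},
    {"name": "next", "type": ["null", "Node"]}
  ]
}
""")

def matches(schema, datum, named=None):
    """Hypothetical helper: check a datum against a tiny subset of Avro schemas."""
    named = {} if named is None else named
    if isinstance(schema, list):
        # Union: the datum only has to match one branch.
        return any(matches(branch, datum, named) for branch in schema)
    if isinstance(schema, str):
        if schema == "null":
            return datum is None
        if schema == "string":
            return isinstance(datum, str)
        if schema in named:
            # Reference to a named type defined earlier (here: "Node").
            return matches(named[schema], datum, named)
        return False
    if schema["type"] == "record":
        named[schema["name"]] = schema
        if not isinstance(datum, dict):
            return False
        return all(
            field["name"] in datum
            and matches(field["type"], datum[field["name"]], named)
            for field in schema["fields"]
        )
    return False

# A two-element singly linked list: "a" -> "b" -> end.
linked_list = {"value": "a", "next": {"value": "b", "next": None}}
print(matches(NODE_SCHEMA, linked_list))  # True
```

The union ["null", "Node"] on the next field is what lets the list terminate: the last node simply carries null instead of another Node.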

Enum uses the type name enum and supports the attributes name, namespace, aliases, doc and symbols (a JSON array of strings).

{
  "type": "enum",
  "name": "Move",
  "symbols": ["LEFT", "RIGHT", "UP", "DOWN"]
}
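As a small illustration of the Move enum, the Python sketch below looks up the zero-based position of a symbol in the symbols array, which is the value Avro's binary encoding writes for an enum. The function name `symbol_index` is an assumption made for this example and is not part of any Avro library.

```python
MOVE_SCHEMA = {
    "type": "enum",
    "name": "Move",
    "symbols": ["LEFT", "RIGHT", "UP", "DOWN"],
}

def symbol_index(schema, symbol):
    """Hypothetical helper: return the zero-based position of an enum symbol."""
    symbols = schema["symbols"]
    if symbol not in symbols:
        raise ValueError(f"{symbol!r} is not a symbol of enum {schema['name']}")
    return symbols.index(symbol)

print(symbol_index(MOVE_SCHEMA, "UP"))  # 2
```

A value outside the symbols list, such as "JUMP", raises a ValueError in this sketch, since it is not a valid datum for the enum.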

Array uses the type name array and supports a single attribute, items, which gives the schema of the array's elements.

{"type": "array", "items": "string"}

Map uses the type name map and supports a single attribute, values, which gives the schema of the map's values. Map keys are always strings.

{"type": "map", "values": "long"}
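The array and map schemas above can be related to ordinary Python values with a short sketch. It assumes the natural mapping of an Avro array to a Python list and an Avro map to a dict with string keys; the helper names are hypothetical, and only the two primitives used in the examples (string and long) are handled.

```python
ARRAY_SCHEMA = {"type": "array", "items": "string"}
MAP_SCHEMA = {"type": "map", "values": "long"}

def is_string(value):
    return isinstance(value, str)

def is_long(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    # An Avro long is a 64-bit signed integer.
    return (isinstance(value, int) and not isinstance(value, bool)
            and -2**63 <= value < 2**63)

# Only the two primitive types used in the example schemas.
CHECKS = {"string": is_string, "long": is_long}

def matches_array(schema, datum):
    """Every element must match the schema given by "items"."""
    check = CHECKS[schema["items"]]
    return isinstance(datum, list) and all(check(item) for item in datum)

def matches_map(schema, datum):
    """Keys must be strings; every value must match the schema given by "values"."""
    check = CHECKS[schema["values"]]
    return isinstance(datum, dict) and all(
        isinstance(key, str) and check(value) for key, value in datum.items()
    )

print(matches_array(ARRAY_SCHEMA, ["a", "b"]))         # True
print(matches_map(MAP_SCHEMA, {"hits": 3, "misses": 0}))  # True
print(matches_map(MAP_SCHEMA, {1: 3}))                 # False: key is not a string
```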

Unions are represented by a JSON array. For example, ["null", "string"] means the value can be either null or a string.


Fixed uses the type name fixed and has two required attributes: name and size (the number of bytes per value).

{"type": "fixed", "size": 16, "name": "md5"}
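The md5 schema above fits naturally with an MD5 digest, which is always 16 bytes long. The Python sketch below, using only the standard library, checks that a value has exactly the size the schema demands; `fits_fixed` is a hypothetical helper written for this example.

```python
import hashlib

FIXED_SCHEMA = {"type": "fixed", "size": 16, "name": "md5"}

def fits_fixed(schema, datum):
    """Hypothetical check: a fixed datum is raw bytes of exactly schema["size"] length."""
    return isinstance(datum, (bytes, bytearray)) and len(datum) == schema["size"]

digest = hashlib.md5(b"hello avro").digest()  # an MD5 digest is always 16 bytes

print(fits_fixed(FIXED_SCHEMA, digest))        # True
print(fits_fixed(FIXED_SCHEMA, b"too short"))  # False
```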

3. Serialization in Apache Avro

Apache Avro data is always serialized together with its schema. Avro supports two encodings: binary and JSON. You can read more about serialization in the official specification, or see our detailed post on serialization.
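To give a feel for the binary encoding, here is a minimal Python sketch of three rules from the Avro specification: int and long values are written as zig-zag variable-length integers, a string is written as its UTF-8 byte length (as a long) followed by the bytes, and a union is written as the zero-based index of the chosen branch followed by the value. The function names are assumptions made for this example, not an Avro library API, and the sketch covers neither the other types nor the container file format that stores the schema.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one so small magnitudes stay small."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF

def encode_long(n: int) -> bytes:
    """Encode an Avro int/long: zig-zag, then 7 bits per byte, low-order group first."""
    z = zigzag(n)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string(s: str) -> bytes:
    """Encode an Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

def encode_optional_string(value) -> bytes:
    """Encode a value of the union ["null", "string"]: branch index, then the value."""
    if value is None:
        return encode_long(0)  # branch 0 is null, which has no body
    return encode_long(1) + encode_string(value)

print(encode_long(-1).hex())               # 01
print(encode_long(64).hex())               # 8001
print(encode_string("foo").hex())          # 06666f6f
print(encode_optional_string("a").hex())   # 020261
print(encode_optional_string(None).hex())  # 00
```

Zig-zag coding is why small negative numbers stay compact: -1 becomes the single byte 01 instead of a long run of set bits.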

