We will start with the formats Apache Avro supports and then dive into an example with explanation.
"Apache Avro™ is a data serialization system." It supports two formats, JSON and binary. We use DatumReader<T> and DatumWriter<T> for deserialization and serialization of data, respectively.
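To make the binary format a little more concrete: per the Avro specification, int and long values are written as zig-zag variable-length integers, and a string is written as its UTF-8 byte length followed by the bytes themselves. Below is a minimal sketch of those two rules in plain Java, without the Avro library. The class and method names (AvroBinarySketch, writeLong, encodeString) are made up for illustration; in real code Avro's BinaryEncoder does this work for you.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: hand-rolled versions of two Avro binary encoding rules.
class AvroBinarySketch {

    // Zig-zag maps signed values to unsigned ones (0, -1, 1, -2 -> 0, 1, 2, 3);
    // the result is then written 7 bits at a time, least significant group first.
    static void writeLong(long value, ByteArrayOutputStream out) {
        long zigzag = (value << 1) ^ (value >> 63);
        while ((zigzag & ~0x7FL) != 0) {
            out.write((int) ((zigzag & 0x7F) | 0x80)); // high bit set: more bytes follow
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
    }

    // A string is its UTF-8 byte count (encoded as a long) followed by the bytes.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(utf8.length, out);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Length 6 becomes zig-zag 12, followed by the six UTF-8 bytes.
        System.out.println(Arrays.toString(encodeString("Gaurav")));
        // [12, 71, 97, 117, 114, 97, 118]
    }
}
```

The binary form carries no field names or quotes, which is why a reader needs the schema to make sense of the bytes.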
In the previous post, we generated Avro classes. We are extending the same example code to show how Avro supports serialization and deserialization of data.
// Point 1
Employee employee = Employee.newBuilder().setFirstName("Gaurav").setLastName("Mazra").setSex(SEX.MALE).build();
// Point 2
DatumWriter<Employee> employeeWriter = new SpecificDatumWriter<>(Employee.class);
byte[] data;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
// Point 3
Encoder jsonEncoder = EncoderFactory.get().jsonEncoder(Employee.getClassSchema(), baos);
// Point 4
employeeWriter.write(employee, jsonEncoder);
// Point 5
jsonEncoder.flush();
data = baos.toByteArray();
}
// serialized data
System.out.println(new String(data));
// Point 6
DatumReader<Employee> employeeReader = new SpecificDatumReader<>(Employee.class);
// Point 7
Decoder decoder = DecoderFactory.get().jsonDecoder(Employee.getClassSchema(), new String(data));
// Point 8
employee = employeeReader.read(null, decoder);
//data after deserialization
System.out.println(employee);
Explanation on the way :)
Point 1: Create an Employee object using the builder of the generated class.
Point 2: Create a SpecificDatumWriter<T>, which implements DatumWriter<T>. Other implementations of DatumWriter also exist, viz. GenericDatumWriter and ReflectDatumWriter.
Point 3: Create a JsonEncoder by passing the Schema and the OutputStream where we want the serialized data; in our case, it is an in-memory ByteArrayOutputStream.
Point 4: Call the write(T, Encoder) method on the DatumWriter with the object and the Encoder.
Point 5: Flush the JsonEncoder. Internally, it flushes the OutputStream passed to it.
Point 6: Create a SpecificDatumReader<T>, which implements DatumReader<T>. Other implementations of DatumReader also exist, viz. GenericDatumReader and ReflectDatumReader.
Point 7: Create a JsonDecoder by passing the Schema and the input String which will be deserialized.
Point 8: Call the read method on the DatumReader.
Serialization and deserialization in the binary format:
// Point 1
Employee employee = Employee.newBuilder().setFirstName("Gaurav").setLastName("Mazra").setSex(SEX.MALE).build();
// Point 2
DatumWriter<Employee> employeeWriter = new SpecificDatumWriter<>(Employee.class);
byte[] data;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
// Point 3
Encoder binaryEncoder = EncoderFactory.get().binaryEncoder(baos, null);
// Point 4
employeeWriter.write(employee, binaryEncoder);
// Point 5
binaryEncoder.flush();
data = baos.toByteArray();
}
// serialized data
System.out.println(data);
// Point 6
DatumReader<Employee> employeeReader = new SpecificDatumReader<>(Employee.class);
// Point 7
Decoder binaryDecoder = DecoderFactory.get().binaryDecoder(data, null);
// Point 8
employee = employeeReader.read(null, binaryDecoder);
//data after deserialization
System.out.println(employee);
The example is the same as the previous one except for Point 3 and Point 7, where we create a BinaryEncoder and a BinaryDecoder instead.
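One practical note when inspecting binary output: passing a byte[] straight to System.out.println prints the array's type and identity hash code rather than its contents. A minimal sketch, using only the JDK, of rendering the bytes as hex instead. The class and method names (ByteArrayPreview, toHex) are made up for illustration, and the sample bytes are simply the UTF-8 bytes of a name, not real Avro output.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper for looking at serialized bytes.
class ByteArrayPreview {

    // Renders each byte as two lowercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask with 0xff so negative bytes are not sign-extended.
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample bytes only (UTF-8 of a name), not real Avro output.
        byte[] sample = "Gaurav".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(sample)); // 47 61 75 72 61 76
    }
}
```

The same helper could be applied to the data array from the binary example to see what was actually written.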
This is how we can serialize and deserialize data with Apache Avro. I hope you found this article informative and useful. You can find the full example on GitHub.