We will start with the huge difference in serialized size between Apache Avro and Java serialization, and then compare the two serialization processes.
Apache Avro needed 15 to 20 times less memory to store the serialized data. I created a class with three fields (two String fields and one enum) and serialized an instance of it with both Avro and Java.
Apache Avro used 14 bytes, while Java used 231 bytes (the length of the resulting byte[]).
Let's look at both the Java and the Avro serialization process to understand why Avro uses fewer bytes.
The default serialization mechanism for an object writes the class of the object, the class signature, and the values of all non-transient and non-static fields. References to other objects (except in transient or static fields) cause those objects to be written also. Multiple references to a single object are encoded using a reference sharing mechanism so that graphs of objects can be restored to the same shape as when the original was written.
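To make that description concrete, here is a minimal sketch using only the JDK. The `Person` class, the `Color` enum, and the field values are hypothetical stand-ins for the three-field class mentioned above, so the byte counts it prints will not match the figures quoted in this post. It serializes one object, then the same object twice, to show both the class-metadata overhead and the reference sharing mechanism.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class JavaSerializationSize {
    // Hypothetical enum and class: two String fields and one enum field.
    enum Color { RED, GREEN, BLUE }

    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String firstName;
        final String lastName;
        final Color favorite;

        Person(String firstName, String lastName, Color favorite) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.favorite = favorite;
        }
    }

    // Writes every argument to a single ObjectOutputStream and returns the bytes.
    static byte[] serialize(Object... objects) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object o : objects) {
                out.writeObject(o);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Person p = new Person("Jane", "Doe", Color.GREEN);
        byte[] once = serialize(p);
        byte[] twice = serialize(p, p);
        // The payload is only a handful of characters; the rest is class metadata.
        System.out.println("one object:        " + once.length + " bytes");
        // The second write is a short back-reference, not a second full copy.
        System.out.println("same object twice: " + twice.length + " bytes");
    }
}
```

The second number should be only slightly larger than the first, because the stream records a handle to the object it has already written instead of writing it again.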
Apache Avro, by contrast, writes only the schema (as a JSON string) and the data of the object being serialized. There is no per-field overhead of writing the class of the object or the class signature, as there is in Java. The fields are also serialized in a predetermined order, so no field names need to be stored with the data.
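The sketch below illustrates why the data portion is so compact. It is a hand-rolled approximation of the binary encoding rules in the Avro specification (a string is a zig-zag varint length followed by UTF-8 bytes, an enum is the zero-based index of its symbol, and a record is simply its fields in schema order). It does not use the Avro library and does not write the schema, and the class name, enum, and values are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class AvroStyleEncodingSketch {
    // Hypothetical enum; only the symbol's position is written.
    enum Color { RED, GREEN, BLUE }

    // Zig-zag encode the value, then write it as a variable-length integer.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long n = (value << 1) ^ (value >> 63);
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            n >>>= 7;
        }
        out.write((int) n);
    }

    // A string is its UTF-8 byte length followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record is just its fields, back to back, in the agreed order.
    static byte[] encode(String firstName, String lastName, Color favorite) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, firstName);
        writeString(out, lastName);
        writeLong(out, favorite.ordinal());
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = encode("Jane", "Doe", Color.GREEN);
        System.out.println(data.length + " bytes"); // prints "10 bytes"
    }
}
```

Here the two strings cost 5 and 4 bytes and the enum costs 1 byte. No class name, field name, or type descriptor appears in the output, because the reader is expected to have the schema.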
You can find the full Java example code used to compare the two serialization processes on GitHub.
Some observations
Apache Avro can't handle circular references and throws java.lang.StackOverflowError, whereas Java's default serialization can handle them (example code for Avro and example code for Java serialization). Another observation is that Avro has no direct way of defining inheritance in the schema (classes), but Java's default serialization supports inheritance with its own constraints: every superclass must either implement the Serializable interface or have an accessible no-args constructor, all the way up the hierarchy. Otherwise deserialization fails with java.io.InvalidClassException.
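As a small illustration of the circular reference case on the Java side, the following sketch uses only the JDK. The `Node` class is hypothetical and is not taken from the linked example code. Two nodes point at each other, and the round trip restores the same cyclic shape.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CircularReferenceRoundTrip {
    // Hypothetical node type that can point back to an earlier node.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    // Serializes the graph to memory and reads it back as a fresh copy.
    static Node roundTrip(Node root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Node) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // closes the cycle: a -> b -> a

        Node copy = roundTrip(a);
        // The cycle is preserved: following two links returns to the start.
        System.out.println(copy.next.next == copy); // prints "true"
    }
}
```

This works because the stream assigns each object a handle the first time it is written and emits a back-reference when it meets the same object again.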