When using code generation, you use the Avro tool binary, external to your application (or programatically internal to your application) to generate classes that represent the data in your schema. You populate the fields in the classes and use a specific datum writer parameterized with the class to produce the serialized data.
When you do not use code generation, you use the GenericData classes along with the schema to do a series of put() operations. You then use a generic datum writer constructed with the schema to produce the serialized data.
Example Avro Schema
This is a simple schema to use in the example. I made it have an array as a record field because I had a bit of trouble with that when I was working with such a schema. Let's say that this schema is in a file called Schema.avsc.
{
"namespace": "my.pkg.path.avro",
"type" : "record",
"name" : "PeopleList",
"fields" : [
{"name" : "Version", "type" : "int"},
{"name" : "People", "type" : {
"namespace" : "my.pkg.path.avro",
"type" : "array",
"items" : {
"type" : "record",
"namespace": "my.pkg.path.avro",
"name" : "Person",
"fields" : [
{"name" : "FirstName", "type" : "string"},
{"name" : "LastName", "type" : "string"}
]
}}
}
]
}
This schema represents data consisting of a version and a list of records describing people. Each record has the person's first and last name.
Serialize Using Code Generation
To generate classes from outside of your program using the tool, you can invoke it like this:
java -cp <needed jars> org.apache.avro.tool.Main compile schema
<path to avro schema> <path to put generated classes>
For the <needed jars> it will probably be a handful of dependencies. If a needed dependency can't be found when you run the tool, you'll see an error that the specific jar cannot be found. When this happened, I added it to the classpath (the -cp option) and once I had all of them, it worked.
The generator will start at the destination path specified as the second argument and create directories as specified by the namespaces in your schema. Note that you must have namespaces otherwise an exception will be thrown (NullException) and the code generation will fail. In this example, the tool will generate classes and put them in:
<path to put generated classes>/my/pkg/path/avro/.
In this example, two classes will be generated: PeopleList and Person. All of the classes contain i.e. store the schema used to generate them.
Here is a link to the Avro 1.6.1 API to refer to when looking at the code: http://avro.apache.org/docs/1.6.1/api/java/index.html.
This code snippet shows how to use the generated classes to create the serialized data and place it in a java.nio.ByteBuffer. I create a stream, a binary encoder initialized with the stream, and a specific datum writer. Then, I create a PeopleList, add three people to it, serialize it, and store it in the ByteBuffer.
ByteArrayOutputStream out = new ByteArrayOutputStream();
Encoder e = new BinaryEncoder(out);
SpecificDatumWriter<PeopleList> w =
new SpecificDatumWriter<PeopleList>(PeopleList.class);
PeopleList all = new PeopleList();
all.People = new GenericData.Array<Person>(
3, all.getSchema().getField("People").schema());
Person person1 = new Person();
person1.FirstName = new Utf8("Cairne");
person1.LastName = new Utf8("Bloodhoof");
all.People.add(person1);
Person person2 = new Person();
person2.FirstName = new Utf8("Sylvanas");
person2.LastName = new Utf8("Windrunner");
all.People.add(person2);
Person person3 = new Person();
person3.FirstName = new Utf8("Grom");
person3.LastName = new Utf8("Hellscream");
all.People.add(person3);
all.Version = 1;
w.write(all, e);
e.flush();
ByteBuffer serialized = ByteBuffer.allocate(out.toByteArray().length);
serialized.put(out.toByteArray());
Serialize Without Using Code Generation
This code snippet shows how to populate GenericData objects, serialize them, and place the result in a java.nio.ByteBuffer. This is similar to the previous code. I create a stream, a binary encoder initialized with the stream, and a generic datum writer. Then, I create a GenericData.Record that will have the version and the list. Next, I create a GenericData.Array for the list, add three people (each being a GenericData.Record) to it, serialize it, and store it in the ByteBuffer. Each GenericData has to be initialized with a schema that describes its format. You can grab different sections of the overall schema using Schema methods. Check them out here: http://avro.apache.org/docs/current/api/java/org/apache/avro/Schema.html. Keep in mind as you're doing this, each put() gets validated against the schema, so if you mess up, an exception will be thrown. If you leave out any field i.e. you don't do a put() for it, an exception will be thrown when you try to write it.
Schema schema = Schema.parse(new File("Schema.avsc"));
ByteArrayOutputStream out = new ByteArrayOutputStream();
Encoder e = new BinaryEncoder(out);
GenericDatumWriter<GenericRecord> w = new GenericDatumWriter<GenericRecord>(schema);
GenericRecord all = new GenericData.Record(schema);
Schema peopleSchema = schema.getField("People").schema();
GenericArray<GenericRecord> people = new GenericData.Array<GenericRecord>(3, peopleSchema);
Schema personSchema = peopleSchema.getElementType();
GenericRecord person1 = new GenericData.Record(personSchema);
person1.put("FirstName", new Utf8("Cairne"));
person1.put("LastName", new Utf8("Bloodhoof"));
people.add(person1);
GenericRecord person2 = new GenericData.Record(personSchema);
person2.put("FirstName", new Utf8("Sylvanas"));
person2.put("LastName", new Utf8("Windrunner"));
people.add(person2);
GenericRecord person3 = new GenericData.Record(personSchema);
person3.put("FirstName", new Utf8("Grom"));
person3.put("LastName", new Utf8("Hellscream"));
people.add(person3);
all.put("People", people);
all.put("Version", 1);
w.write(all, e);
e.flush();
ByteBuffer serialized = ByteBuffer.allocate(out.toByteArray().length);
serialized.put(out.toByteArray());
Thoughts
I tried both methods of serializing my data and found that using the code generation seemed less error prone and also handy for when I made changes to the schema. I use Eclipse with the Maven Integration Plugin (http://maven.apache.org/eclipse-plugin.html) to build. I wrote a bash script to invoke the code generation tool and used the Maven AntRun Plugin (http://maven.apache.org/plugins/maven-antrun-plugin/) to run it as part of my build. That way I could make changes to my schema, do a project clean, and have updated classes generated easily. I think updating the serialization methods is easier using the generated classes instead of changing/adding/removing the GenericData objects and/or their put() operations.
I hope maybe some of this can help someone with their Avro serialization endeavors in Java.
This comment has been removed by the author.
ReplyDeleteTwas helpful. Thanks.
ReplyDeleteHello, from Tom White's Hadoop book I cannot get the following to work using Avro 1.6.1. When the JVM attempts creation of a new SpecificDatumWriter(Pair.class), I get a NullPointerException when Avro's SpecificData class tries to create a schema. In the creation of the schema, there is a replace method being called which causes this NullPointerException. Avro 1.6.1 deprecates DecoderFactory.defaultFactory() and changes BinaryEncoder to an Abstract class. Your help would be greatly appreciated. Thank you!
ReplyDeleteSee:
// why doesn't this work in Avro 1.6.1.
@Test
public void testGeneratedPairClass() {
Pair datum = new Pair();
datum.left = new Utf8("L");
datum.right = new Utf8("R");
ByteArrayOutputStream out = new ByteArrayOutputStream();
Encoder encoder = new EncoderFactory().binaryEncoder(out, null);
DatumWriter writer = new SpecificDatumWriter(Pair.class);
try {
writer.write(datum, encoder);
encoder.flush();
out.close();
} catch (IOException e) {
e.printStackTrace();
}
Decoder decoder = new DecoderFactory().binaryDecoder(out.toByteArray(), null);
DatumReader reader = new SpecificDatumReader(Pair.class);
Pair result;
try {
result = reader.read(null, decoder);
assertThat(result.left.toString(), is("L"));
assertThat(result.right.toString(), is("R"));
} catch (IOException e) {
e.printStackTrace();
}
}
The problem you're seeing is happening because the full name of your generated class doesn't match the name you gave in the schema AND you didn't specify a namespace in your schema. In my example schema, the name is "PeopleList". You can check your Pair class full name like this:
DeleteClass c = (Class)Pair.class;
System.out.println(c.getName());
If you're curious, here's the code (http://svn.apache.org/viewvc/avro/tags/release-1.6.1/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java?revision=1201974&view=markup) where the exception gets thrown:
179 } else if (type instanceof Class) { // class
180 Class c = (Class)type;
181 String fullName = c.getName();
182 Schema schema = names.get(fullName);
183 if (schema == null)
184 try {
185 schema = (Schema)(c.getDeclaredField("SCHEMA$").get(null));
186
187 if (!fullName.equals(getClassName(schema)))
188 // HACK: schema mismatches class. maven shade plugin? try replacing.
189 schema = Schema.parse
190 (schema.toString().replace(schema.getNamespace(),
191 c.getPackage().getName()));
192 } catch (NoSuchFieldException e) {
193 throw new AvroRuntimeException(e);
194 } catch (IllegalAccessException e) {
195 throw new AvroRuntimeException(e);
196 }
It first checks if the full class name matches the name in the schema. If it doesn't, it tries to do a string replace to replace the namespace with the package name. Your namespace is probably null.
To fix the problem, either make the name in the schema match the class full name OR add a namespace to your schema. I recommend adding a namespace to the schema in any case because I've encountered problems in the past (http://stylishtoupee.blogspot.com/2011/02/null-exception-when-generating-classes.html) when I didn't specify a namespace in my schema. My example schema shows how to specify a namespace.
Not only for java, this clarify some basic concepts of avro. Great!
ReplyDeleteThank you for sharing and illustrating use of the built-in Avro toolset.
ReplyDeleteHi,
ReplyDeleteI'm trying to convert a large XML file into avro. I have the xsd's and I'm able to convert it to avro schema and corresponding jave classes using code generation. Is there a way to convert the XML to java?
Very useful
ReplyDeleteizmir
ReplyDeleteErzurum
Diyarbakır
Tekirdağ
Ankara
6GQERT
yalova
ReplyDeleteyozgat
elazığ
van
sakarya
UNEMİN
https://titandijital.com.tr/
ReplyDeletekars parça eşya taşıma
konya parça eşya taşıma
çankırı parça eşya taşıma
yalova parça eşya taşıma
İFUW
ankara parça eşya taşıma
ReplyDeletetakipçi satın al
antalya rent a car
antalya rent a car
ankara parça eşya taşıma
K8LZİT
E0ABB
ReplyDeleteDiyarbakır Lojistik
Çanakkale Parça Eşya Taşıma
Sivas Lojistik
Burdur Parça Eşya Taşıma
Samsun Lojistik
9FC19
ReplyDeleteVan Şehir İçi Nakliyat
Burdur Evden Eve Nakliyat
Tokat Şehir İçi Nakliyat
Çerkezköy Evden Eve Nakliyat
Kırşehir Şehirler Arası Nakliyat
Kastamonu Evden Eve Nakliyat
Etlik Boya Ustası
Muş Evden Eve Nakliyat
Silivri Duşa Kabin Tamiri
E82C7
ReplyDeleteSinop Şehirler Arası Nakliyat
Mersin Lojistik
Amasya Evden Eve Nakliyat
Aksaray Şehir İçi Nakliyat
Yalova Parça Eşya Taşıma
Urfa Parça Eşya Taşıma
Burdur Parça Eşya Taşıma
Eskişehir Evden Eve Nakliyat
Karapürçek Parke Ustası
151D9
ReplyDeleteBinance Güvenilir mi
Tekirdağ Şehirler Arası Nakliyat
Balıkesir Evden Eve Nakliyat
Tekirdağ Boya Ustası
Ankara Evden Eve Nakliyat
Mersin Şehir İçi Nakliyat
Adıyaman Şehir İçi Nakliyat
Kocaeli Şehir İçi Nakliyat
Çankaya Fayans Ustası
CBE94
ReplyDeleteSiirt Kadınlarla Görüntülü Sohbet
sesli sohbet uygulamaları
Antep Kadınlarla Ücretsiz Sohbet
zonguldak sohbet odaları
canlı görüntülü sohbet
Kastamonu Telefonda Rastgele Sohbet
yozgat rastgele görüntülü sohbet uygulaması
hakkari sesli sohbet mobil
gümüşhane mobil sohbet sitesi
9980F
ReplyDeleteAksaray Canlı Görüntülü Sohbet Odaları
samsun mobil sohbet bedava
Niğde Telefonda Canlı Sohbet
Antalya Rastgele Görüntülü Sohbet Ücretsiz
Rize Canlı Sohbet Uygulamaları
afyon canlı sohbet et
ığdır ücretsiz görüntülü sohbet uygulamaları
balıkesir canlı görüntülü sohbet
mobil sohbet odaları
59866
ReplyDeletecanli goruntulu sohbet siteleri
kastamonu bedava sohbet
çankırı canlı görüntülü sohbet uygulamaları
bedava sohbet siteleri
Elazığ Ücretsiz Sohbet Odaları
ankara canlı sohbet et
muş canlı sohbet bedava
görüntülü sohbet siteleri ücretsiz
bayburt canlı görüntülü sohbet siteleri
EEC54
ReplyDeleteTwitter Trend Topic Satın Al
Shibanomi Coin Hangi Borsada
Dlive Takipçi Satın Al
Soundcloud Dinlenme Satın Al
Caw Coin Hangi Borsada
Jns Coin Hangi Borsada
Periscope Takipçi Hilesi
Kripto Para Nasıl Çıkarılır
Bitcoin Oynama
89FD1
ReplyDeleteBinance Ne Kadar Komisyon Alıyor
Bitcoin Madenciliği Siteleri
Azero Coin Hangi Borsada
Kripto Para Üretme Siteleri
Görüntülü Sohbet
Dlive Takipçi Hilesi
Binance Referans Kodu
Soundcloud Takipçi Satın Al
Facebook Beğeni Satın Al
FD332
ReplyDeleteLoop Network Coin Hangi Borsada
Coin Nasıl Oynanır
Snapchat Takipçi Hilesi
Linkedin Beğeni Hilesi
Görüntülü Sohbet
Coin Para Kazanma
Parasız Görüntülü Sohbet
Youtube Abone Hilesi
Coin Oynama