Read and Write to Avro Format File With Schema in Python

Mahaboob Basha
Jul 15, 2022

In this article we will learn how to read and write an Avro file with a schema in Python.

To learn more about Avro and Avro schemas, please follow the link below.

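The Avro file format is popular in big data and is used in many use cases. An Avro file stores its schema as JSON in the file header, while the records themselves are stored in a compact binary encoding. Because the schema travels with the data, any program with an Avro library can read and interpret the file.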

Now let's see how to read and write an Avro file in Python.

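Use the command below to install the Avro library for Python:

pip install avro

Older guides also mention pip install avro-python3. That package was the separate Python 3 port; it has since been deprecated, and current releases of avro support Python 3 directly.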

We will use the sample Avro schema below, saved as sample.avsc:

{
  "type": "record",
  "name": "Sample",
  "namespace": "avropoc.example",
  "fields": [
    {"name": "customer_id", "type": "int"},
    {"name": "customer_name", "type": "string"},
    {"name": "joining_date", "type": "int", "logicalType": "date"},
    {"name": "salary", "type": "float", "logicalType": "decimal", "precision": 10, "scale": 2},
    {"name": "address", "type": "string"},
    {"name": "updated_date", "type": "long", "logicalType": "timestamp-millis"}
  ]
}
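Before handing the schema to the Avro library, it can help to confirm that the file is valid JSON, since curly quotes picked up when copying a schema from a web page are rejected by JSON parsers. The sketch below uses only Python's standard json module. The schema is embedded as a string so the example is self-contained, and the helper name list_fields is just an illustration, not part of the Avro API.

```python
import json

# The sample schema from this article, embedded as a string for illustration.
SAMPLE_SCHEMA = """
{
  "type": "record",
  "name": "Sample",
  "namespace": "avropoc.example",
  "fields": [
    {"name": "customer_id", "type": "int"},
    {"name": "customer_name", "type": "string"},
    {"name": "joining_date", "type": "int", "logicalType": "date"},
    {"name": "salary", "type": "float", "logicalType": "decimal", "precision": 10, "scale": 2},
    {"name": "address", "type": "string"},
    {"name": "updated_date", "type": "long", "logicalType": "timestamp-millis"}
  ]
}
"""


def list_fields(schema_text):
    """Return (name, type) pairs for each field of a record schema (hypothetical helper)."""
    schema = json.loads(schema_text)  # raises json.JSONDecodeError on malformed JSON
    return [(field["name"], field["type"]) for field in schema["fields"]]


for name, avro_type in list_fields(SAMPLE_SCHEMA):
    print(f"{name}: {avro_type}")
```

If json.loads raises an error here, the Avro library will not be able to parse the schema file either, so this is a quick first check.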

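The schema file above defines the following fields and data types:

  1. customer_id: data type "int".
  2. customer_name: data type "string".
  3. joining_date: data type "int", logical type "date".
  4. salary: data type "float", logical type "decimal".
  5. address: data type "string".
  6. updated_date: data type "long", logical type "timestamp-millis".

Note that the Avro specification attaches logicalType to the type definition itself rather than to the field, and defines decimal only on top of bytes or fixed. Written at field level as in this schema, these attributes are carried as extra field metadata, so the values in this article are written and read as plain int, float, and long.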
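For reference, the Avro specification defines the date logical type as the number of days since 1 January 1970, and timestamp-millis as the number of milliseconds since the Unix epoch in UTC. The sketch below shows those conversions with the standard datetime module; the function names are hypothetical helpers, not part of the Avro library. The sample record later in this article stores joining_date as 20120131, which is a plain yyyymmdd integer rather than a day count.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_avro_days(d):
    """Days since 1970-01-01, the representation used by the 'date' logical type."""
    return (d - EPOCH_DATE).days


def avro_millis_to_datetime(millis):
    """Convert a 'timestamp-millis' value to a timezone-aware UTC datetime."""
    return EPOCH_DATETIME + timedelta(milliseconds=millis)


def datetime_to_avro_millis(dt):
    """Whole milliseconds since the Unix epoch for a timezone-aware datetime."""
    return (dt - EPOCH_DATETIME) // timedelta(milliseconds=1)


print(date_to_avro_days(date(2012, 1, 31)))    # 15370
print(avro_millis_to_datetime(1538265652000))  # 2018-09-30 00:00:52+00:00
```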

Use the code below to import the required libraries:

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

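"avro.schema" is used for parsing the Avro schema.

"avro.datafile" provides DataFileReader and DataFileWriter for reading and writing Avro files.

"avro.io" provides DatumReader and DatumWriter, which decode and encode the individual records.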

The code snippet below reads and prints the Avro schema:

# Reading and parsing the Avro schema file
# (avro-python3 spells this function avro.schema.Parse)
schema = avro.schema.parse(open("sample.avsc", "r").read())

# Printing the Avro schema
print(schema)

The code snippet below creates an empty Avro file using the schema:

# Creating an empty Avro file using the Avro schema
writer = DataFileWriter(open("sample.avro", "wb"), DatumWriter(), schema)

The code snippet below writes a record to the empty Avro file:

# Writing data to the Avro file using the Avro schema
writer.append({
    "customer_id": 123,
    "customer_name": "abc",
    "joining_date": 20120131,
    "salary": 4000.50,
    "address": "D.No 1234 , ABC , abc , ABC",
    "updated_date": 1538265652000,
})
writer.close()
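Each record passed to writer.append is a plain Python dict whose keys match the schema's field names. As a rough pre-check before writing, the sketch below compares a record against the expected field names and Python types. The names check_record and EXPECTED_FIELDS are hypothetical and used only for illustration, the type mapping is an assumption, and this is not how the Avro library itself validates data.

```python
# Expected Python type for each field in sample.avsc.
# Assumption for illustration: int/long map to int, float to float, string to str.
EXPECTED_FIELDS = {
    "customer_id": int,
    "customer_name": str,
    "joining_date": int,
    "salary": float,
    "address": str,
    "updated_date": int,
}


def check_record(record):
    """Return a list of readable problems; an empty list means the record looks fine."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    for name in record:
        if name not in EXPECTED_FIELDS:
            problems.append(f"unexpected field: {name}")
    return problems


record = {
    "customer_id": 123,
    "customer_name": "abc",
    "joining_date": 20120131,
    "salary": 4000.50,
    "address": "D.No 1234 , ABC , abc , ABC",
    "updated_date": 1538265652000,
}
print(check_record(record))                 # []
print(check_record({"customer_id": "123"}))  # lists the wrong type and the missing fields
```

Running a check like this first tends to give a clearer message than a failure raised part-way through writing the file.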

The code snippet below opens the Avro file for reading:

#Reading Avro file 
reader = DataFileReader(open("sample.avro", "rb"), DatumReader())

The code snippet below prints each record and then closes the reader:

# Printing Avro file data
for user in reader:
    print(user)
reader.close()

The complete code is below:

# Loading required libraries
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

# Reading and parsing the Avro schema file
# (avro-python3 spells this function avro.schema.Parse)
schema = avro.schema.parse(open("sample.avsc", "r").read())
# Printing the Avro schema
print(schema)

# Creating an empty Avro file using the Avro schema
writer = DataFileWriter(open("sample.avro", "wb"), DatumWriter(), schema)
# Writing data to the Avro file using the Avro schema
writer.append({"customer_id": 123, "customer_name": "abc",
               "joining_date": 20120131, "salary": 4000.50,
               "address": "D.No 1234 , ABC , abc , ABC",
               "updated_date": 1538265652000})
writer.close()

# Reading the Avro file
reader = DataFileReader(open("sample.avro", "rb"), DatumReader())
# Printing the Avro file data
for user in reader:
    print(user)
reader.close()

The code is also available on GitHub:

https://github.com/bashamsc/Avro_Read_Write_Python
