Read Avro Files
Overview
This skill helps extract JSON data from Apache Avro files and displays them in a readable format. It handles deserialization of nested byte strings and saves the output to a JSON file.
When to Use This Skill
- User mentions they have an Avro file (.avro extension)
- User wants to convert Avro to JSON
- User wants to see the contents of an Avro file
- User provides a path to an .avro file
Workflow
Copy and track progress:
Avro Conversion Progress:
- [ ] Step 1: Verify file path or ask user
- [ ] Step 2: Check dependencies (avro-python3)
- [ ] Step 3: Run conversion script
- [ ] Step 4: Verify JSON output created
- [ ] Step 5: Present results to user
Step 1: Verify File Path
If the file path is not provided by the user:
- Use the AskUserQuestion tool to get the Avro file path
- Ask: "Please provide the full path to the Avro file you want to convert"
Step 2: Check Dependencies
Install dependencies if needed:
pip install avro-python3
Only install if the avro module is not already available.
Step 3: Run Conversion Script
Run the bundled script (do not read its contents):
python scripts/read_avro.py "<avro_file_path>"
The script will:
- Display each record as formatted JSON
- Deserialize the Body field if it contains nested JSON
- Save the output to a .json file in the same directory
For script implementation details, see scripts/read_avro.py.
Step 4: Verify JSON Output Created
Check that the output file was created:
- Output file name: same as input but with .json extension
- Location: same directory as the input file
Step 5: Present Results
Show the user:
- Where the JSON output was saved
- Number of records found
- Key information from the records if relevant
Expected Output Format
The script produces:
- Console output showing each record with formatted JSON
- A .json file saved in the same directory as the input file
- Record count summary
Common Use Cases
- Event Hub captured data: Avro files from Azure Event Hub captures containing event metadata and body
- Kafka messages: Avro-serialized Kafka messages
- Data pipeline debugging: Inspecting intermediate Avro files in data processing pipelines
- Schema validation: Viewing actual data structure for schema comparison
Common Issues
FileNotFoundError:
- Verify the path exists
- Use absolute paths instead of relative paths
- Check for typos in the path
Module not found (avro):
- Run:
pip install avro-python3
- Verify installation:
python -c "import avro"
JSON decode error in Body field:
- The Body field may have unexpected encoding
- Check Event Hub configuration if applicable
- The script will show a warning but continue processing
Notes
- The script handles nested JSON in byte string format (common in Event Hub captures)
- Dates and complex types are converted to strings in the output
- Large files will show all records in console but save efficiently to JSON
- The bundled script is optimized for reliability and handles edge cases