How to parse data from XML, JSON and YAML into Python?
Introduction
A data format is a syntax that applications agree on in order to exchange data with each other. The formats most commonly used with network APIs and automation tools are XML, JSON and YAML. A standard format allows a diverse set of software, written in the same or in different programming languages, to communicate.
In this article, we will explain how to parse data from the XML, JSON and YAML formats into Python, so that it can be structured and manipulated. We will focus on structuring this data in Python dictionaries.
Requirements
- A computer with Linux and Python installed (See this article if you need to install Python on Linux).
How to convert from JSON to a Dictionary in Python?
Remembering some JSON features
- It is a way of formatting and transmitting data that is readable by both humans and machines.
- Data objects are represented as name-value (key-value) pairs.
- JSON most closely resembles the dictionary data type in Python.
- APIs often return strings in JSON.
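The last two points can be illustrated with a minimal sketch. The string below is a made-up stand-in for the body of an API response (no real API is called, and the variable names are only illustrative); "json.loads" turns it into a dictionary.

```python
import json

# Hypothetical response body, standing in for what an API might return.
api_response = '{"Hostname": "R1", "Vendor": "Cisco", "Id": "1"}'

# json.loads parses a JSON string; a JSON object becomes a Python dict.
router = json.loads(api_response)

print(type(router))
print(router["Vendor"])
```

Once parsed, the values are read by key exactly as with any other dictionary.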
Example of a text in JSON format:
{
  "Routers": [
    { "Hostname": "R1", "Vendor": "Cisco", "Id": "1" },
    { "Hostname": "R2", "Vendor": "Huawei", "Id": "2" }
  ]
}
Read JSON data from a file with Python
Let’s take the text from the previous example and save it to a file called “json_data.txt”. Below is a short script that uses Python’s built-in “json” module to read the JSON data and print it on screen.
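import json

# Read the whole file and parse its JSON text into Python objects.
with open('json_data.txt') as json_data:
    json_parsed = json.loads(json_data.read())

# Serialize back to JSON text, indented and with sorted keys, for display.
print(json.dumps(json_parsed, indent=4, sort_keys=True))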
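The output contains the same data as the file, pretty-printed with an indentation of four spaces. Because of “sort_keys=True”, the keys of each object are sorted alphabetically, so “Id” appears before “Vendor”.
{
    "Routers": [
        {
            "Hostname": "R1",
            "Id": "1",
            "Vendor": "Cisco"
        },
        {
            "Hostname": "R2",
            "Id": "2",
            "Vendor": "Huawei"
        }
    ]
}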
If we take the above code, remove the “print” line and add the following lines instead:
print(json_parsed)
print(type(json_parsed))
We will see that the data is stored in a Python dictionary: the first line prints its content and the second line prints <class 'dict'>.
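Once the data is in a dictionary, it can be navigated with ordinary Python indexing. The following is a minimal sketch, assuming the same content as “json_data.txt” but inlined as a string so that it runs on its own; the variable names are only illustrative. It also shows that malformed input raises “json.JSONDecodeError”, which is worth catching when the data comes from outside your program.

```python
import json

# Same content as json_data.txt, inlined so the sketch is self-contained.
json_text = """
{
  "Routers": [
    { "Hostname": "R1", "Vendor": "Cisco", "Id": "1" },
    { "Hostname": "R2", "Vendor": "Huawei", "Id": "2" }
  ]
}
"""

json_parsed = json.loads(json_text)

# "Routers" maps to a list, so its items are reached by index or by looping.
first_router = json_parsed["Routers"][0]
print(first_router["Hostname"])

hostnames = [router["Hostname"] for router in json_parsed["Routers"]]
print(hostnames)

# Malformed input raises json.JSONDecodeError instead of returning a dict.
try:
    json.loads("{ 'Hostname': 'R1' }")  # single quotes are not valid JSON
    decode_failed = False
except json.JSONDecodeError as error:
    decode_failed = True
    print(f"Invalid JSON: {error.msg}")
```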
How to convert from XML to a Dictionary in Python?
Remembering some XML features
- It is independent of any programming language.
- It is a way of formatting and transmitting data that is readable by both humans and machines.
- It is not as readable as JSON.
- There is no native Python object that maps directly onto its structure.
Example of a text in XML format:
<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <Routers>
    <Hostname>R1</Hostname>
    <Vendor>Cisco</Vendor>
    <Id>1</Id>
  </Routers>
  <Routers>
    <Hostname>R2</Hostname>
    <Vendor>Huawei</Vendor>
    <Id>2</Id>
  </Routers>
</root>
Read XML data from a file with Python
One of the easiest ways to parse XML data into Python is through the “xmltodict” module. To install this module we execute the following command:
$ pip install xmltodict
Let’s take the text from the previous example and save it to a file called “xml_data.txt”. Below is a short script that uses the “xmltodict” module to read the XML data and print it on screen as a Python structure, going one level deeper with each print.
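import xmltodict

# Read the whole file and parse the XML text into nested dictionaries.
with open('xml_data.txt') as xml_data:
    xml_parsed = xmltodict.parse(xml_data.read())

# Print the full structure, then go one level deeper each time.
print(xml_parsed)
print()
print(xml_parsed['root'])
print()
print(xml_parsed['root']['Routers'])
print()
print(xml_parsed['root']['Routers'][0])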
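As we can see, the outputs are ordered Python dictionaries (OrderedDict, a subclass of dict). Note that recent xmltodict releases running on Python 3.7 or later return plain dict objects instead, so your output may show regular dictionaries with the same content: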
OrderedDict([('root', OrderedDict([('Routers', [OrderedDict([('Hostname', 'R1'), ('Vendor', 'Cisco'), ('Id', '1')]), OrderedDict([('Hostname', 'R2'), ('Vendor', 'Huawei'), ('Id', '2')])])]))])

OrderedDict([('Routers', [OrderedDict([('Hostname', 'R1'), ('Vendor', 'Cisco'), ('Id', '1')]), OrderedDict([('Hostname', 'R2'), ('Vendor', 'Huawei'), ('Id', '2')])])])

[OrderedDict([('Hostname', 'R1'), ('Vendor', 'Cisco'), ('Id', '1')]), OrderedDict([('Hostname', 'R2'), ('Vendor', 'Huawei'), ('Id', '2')])]

OrderedDict([('Hostname', 'R1'), ('Vendor', 'Cisco'), ('Id', '1')])
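If installing a third-party module is not an option, a similar result can be reached with the standard library. The following is only a sketch of one possible alternative, not the method used in this article: it uses “xml.etree.ElementTree” and assumes the flat structure of the example, where every child of a “Routers” element contains only text. The XML is inlined as a string so the sketch runs on its own.

```python
import xml.etree.ElementTree as ET

# Same content as xml_data.txt, inlined so the sketch is self-contained.
xml_text = """<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <Routers>
    <Hostname>R1</Hostname>
    <Vendor>Cisco</Vendor>
    <Id>1</Id>
  </Routers>
  <Routers>
    <Hostname>R2</Hostname>
    <Vendor>Huawei</Vendor>
    <Id>2</Id>
  </Routers>
</root>"""

# fromstring returns the root element of the parsed document.
root = ET.fromstring(xml_text)

# Build one plain dict per <Routers> element: child tag -> child text.
routers = [
    {child.tag: child.text for child in router}
    for router in root.findall("Routers")
]

print(routers)
```

One design difference is worth noting: “findall” always returns a list, even when the file contains a single “Routers” element, whereas xmltodict returns a single dictionary instead of a list in that case.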
How to convert from YAML to a Dictionary in Python?
YAML is very popular in network automation because it was designed to be a human friendly data format. This was the main objective of its creators. The official website defines it as: “YAML is a human friendly data serialization standard for all programming languages”.
Remembering some YAML features
- YAML is easier to read than JSON and XML.
- Document structure is denoted by indenting with blank spaces.
- List members are introduced by a leading hyphen (-), with one member per line.
- Comments begin with a hash sign (#) and continue to the end of the line.
- Multiple documents can be included within a single stream, separated by three hyphens (---).
- Associative arrays (mappings) are represented with a colon followed by a space, in the form “key: value”.
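Several of these features can be seen together in a short sketch. It assumes the PyYAML package is installed (it provides the “yaml” module), and the YAML text and names below are made up for illustration. It uses “yaml.safe_load_all”, which yields one Python object per document in the stream.

```python
import yaml  # provided by the PyYAML package

# Illustrative stream with two documents, a comment, lists and mappings.
yaml_text = """\
---
# Comments start with a hash sign and are ignored by the parser.
Routers:
  - R1
  - R2
---
Switches:
  - S1
"""

# safe_load_all returns a generator; list() collects every document.
documents = list(yaml.safe_load_all(yaml_text))

print(len(documents))
print(documents)
```

Each document becomes a separate dictionary, and the comment does not appear anywhere in the parsed data.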
Example of a text in YAML format:
---
Routers:
- Hostname: R1
  Vendor: Cisco
  Id: '1'
- Hostname: R2
  Vendor: Huawei
  Id: '2'
Read YAML data from a file with Python
One of the easiest ways to parse YAML data into Python is through the “yaml” module, which is provided by the PyYAML package. To install it we execute the following command:
$ pip install pyyaml
Let’s take the text from the previous example and save it to a file called “yaml_data.txt”. Below is a short script that uses the “yaml” module to read the YAML data and print it on screen as a Python dictionary.
import yaml

# Parse the YAML file; yaml.load accepts an open file object directly.
with open('yaml_data.txt') as yaml_data:
    yaml_parsed = yaml.load(yaml_data, Loader=yaml.FullLoader)

print(yaml_parsed)
print(type(yaml_parsed))
The first line of the output shows the data as a Python dictionary, and the second line confirms that the data type is a dictionary.
{'Routers': [{'Hostname': 'R1', 'Vendor': 'Cisco', 'Id': '1'}, {'Hostname': 'R2', 'Vendor': 'Huawei', 'Id': '2'}]}
<class 'dict'>
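When the YAML comes from a source you do not fully control, PyYAML also offers “yaml.safe_load”, which only builds basic Python types such as dictionaries, lists, strings and numbers. The following is a minimal sketch of that alternative, with the example inlined as a string so it runs on its own. It also shows why the “Id” values in the example are quoted: without quotes, YAML would parse them as integers.

```python
import yaml  # provided by the PyYAML package

# Same content as yaml_data.txt, inlined so the sketch is self-contained.
yaml_text = """\
---
Routers:
- Hostname: R1
  Vendor: Cisco
  Id: '1'
- Hostname: R2
  Vendor: Huawei
  Id: '2'
"""

# safe_load accepts a string (or an open file) and returns Python objects.
yaml_parsed = yaml.safe_load(yaml_text)

for router in yaml_parsed["Routers"]:
    print(router["Hostname"], router["Vendor"])

# Quoted scalars stay strings; an unquoted 1 is parsed as an integer.
unquoted = yaml.safe_load("Id: 1")
print(type(yaml_parsed["Routers"][0]["Id"]), type(unquoted["Id"]))
```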
Conclusion
It’s really not difficult to parse data in JSON, XML and YAML formats with Python. By using the modules and functions shown in this article, you can structure the data in a dictionary and work with it like any other Python object.