Python XML Processing
Python's xml package lets you parse, create, and modify XML data. The main modules include xml.etree.ElementTree (for parsing and creating XML), xml.dom.minidom (for DOM manipulation), and xml.sax (for event-driven parsing).
1. Parsing XML with xml.etree.ElementTree
The xml.etree.ElementTree module is commonly used to parse XML from strings or files. ET.parse() loads XML from a file and returns an ElementTree object, while ET.fromstring() parses a string and returns the root element directly.
import xml.etree.ElementTree as ET
# Sample XML data
xml_data = """<library>
<book id="1">
<title>Python Programming</title>
<author>John Smith</author>
</book>
<book id="2">
<title>Advanced Python</title>
<author>Jane Doe</author>
</book>
</library>"""
# Parse XML string
root = ET.fromstring(xml_data)
print("Root tag:", root.tag)
Output:
Root tag: library
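For XML stored in a file rather than a string, a minimal sketch of ET.parse() could look like the following. The file name, the temporary directory, and the SAMPLE content are assumptions made only for this illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample document, written to a temporary file for the demo.
SAMPLE = "<library><book id='1'><title>Python Programming</title></book></library>"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "library.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE)

    tree = ET.parse(path)        # returns an ElementTree wrapper
    file_root = tree.getroot()   # the root Element, like fromstring() gives

print("Root tag:", file_root.tag)
print("Books:", len(file_root.findall("book")))
```

The only difference from the string-based approach is the extra getroot() call, since ET.parse() hands back the tree wrapper rather than the root element.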
Explanation: ET.fromstring() parses XML from a string and provides access to the root element.
2. Accessing Elements and Attributes
Once parsed, you can access child elements, attributes, and text content using Element objects.
# Access each book and print details
for book in root.findall("book"):
    title = book.find("title").text
    author = book.find("author").text
    book_id = book.get("id")
    print(f"Book ID: {book_id}, Title: {title}, Author: {author}")
Output:
Book ID: 1, Title: Python Programming, Author: John Smith
Book ID: 2, Title: Advanced Python, Author: Jane Doe
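When a child element or attribute may be missing, calling .text on the result of find() raises an AttributeError because find() returns None. The sketch below shows one possible way to guard against that with findtext() and default values, and to select a single book with an attribute predicate. The sample document and the "lang" attribute are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the second book has no <author> element.
doc = ET.fromstring(
    "<library>"
    "<book id='1'><title>Python Programming</title><author>John Smith</author></book>"
    "<book id='2'><title>Advanced Python</title></book>"
    "</library>"
)

# XPath-style predicate: select the book whose id attribute equals "2".
second = doc.find("book[@id='2']")

# findtext() returns the default instead of failing when the child is missing.
second_title = second.findtext("title", default="(untitled)")
second_author = second.findtext("author", default="(unknown)")

# get() also accepts a default for absent attributes.
language = second.get("lang", "n/a")

print(second_title, "|", second_author, "|", language)
```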
Explanation: findall() retrieves all direct children with a matching tag, find() returns the first match, get() retrieves attribute values, and text accesses element content.
3. Creating XML Documents
Use ElementTree to build XML documents from scratch by creating elements and setting their attributes and text content.
# Create root element
library = ET.Element("library")
# Add book elements
book1 = ET.SubElement(library, "book", id="3")
ET.SubElement(book1, "title").text = "Learning XML"
ET.SubElement(book1, "author").text = "Tom Brown"
book2 = ET.SubElement(library, "book", id="4")
ET.SubElement(book2, "title").text = "XML and Python"
ET.SubElement(book2, "author").text = "Sue Johnson"
# Wrap the root in an ElementTree (used later for writing to a file)
tree = ET.ElementTree(library)
ET.dump(library)  # Prints the XML to standard output
Output (a single line, because the tree contains no whitespace between elements):
<library><book id="3"><title>Learning XML</title><author>Tom Brown</author></book><book id="4"><title>XML and Python</title><author>Sue Johnson</author></book></library>
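The same building blocks scale to data held in ordinary Python structures. As a rough sketch, the hypothetical build_library() helper below turns a list of dictionaries into a tree; the helper name and the record layout are assumptions, not part of the ElementTree API.

```python
import xml.etree.ElementTree as ET

# Hypothetical input records; the keys are chosen for this example only.
records = [
    {"id": "3", "title": "Learning XML", "author": "Tom Brown"},
    {"id": "4", "title": "XML and Python", "author": "Sue Johnson"},
]

def build_library(rows):
    """Build a <library> element with one <book> child per record."""
    root = ET.Element("library")
    for row in rows:
        book = ET.SubElement(root, "book")
        book.set("id", row["id"])  # same effect as passing id=... to SubElement
        ET.SubElement(book, "title").text = row["title"]
        ET.SubElement(book, "author").text = row["author"]
    return root

generated = build_library(records)
xml_text = ET.tostring(generated, encoding="unicode")  # str, no XML declaration
print(xml_text)
```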
Explanation: ET.Element() creates a root element, ET.SubElement() adds child elements, and ET.dump() prints the entire XML structure to standard output.
4. Writing XML to a File
To save XML data to a file, use ElementTree.write().
# Write XML to a file
tree.write("library.xml", encoding="utf-8", xml_declaration=True)
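To check that a written file is usable, one option is to read it back. The following sketch writes a small tree to a temporary directory (an assumption made so the example leaves no files behind), inspects the first line, and re-parses the result.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Small hypothetical tree to write out.
root_out = ET.Element("library")
ET.SubElement(root_out, "book", id="3")

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "library.xml")
    ET.ElementTree(root_out).write(out_path, encoding="utf-8", xml_declaration=True)

    # The XML declaration is the first line of the file.
    with open(out_path, encoding="utf-8") as handle:
        first_line = handle.readline().strip()

    # Round trip: parse the file that was just written.
    reloaded = ET.parse(out_path).getroot()

print(first_line)
print(reloaded.find("book").get("id"))
```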
Explanation: The write() method saves XML data to a file, with optional encoding and xml_declaration arguments.
5. Pretty-Printing XML with xml.dom.minidom
The xml.dom.minidom module provides pretty-printing capabilities for XML output.
from xml.dom import minidom
# Convert ElementTree to string and pretty-print
xml_str = ET.tostring(library, encoding="unicode")
pretty_xml = minidom.parseString(xml_str).toprettyxml(indent=" ")
print("Pretty-printed XML:")
print(pretty_xml)
Output:
Pretty-printed XML:
<?xml version="1.0" ?>
<library>
 <book id="3">
  <title>Learning XML</title>
  <author>Tom Brown</author>
 </book>
 <book id="4">
  <title>XML and Python</title>
  <author>Sue Johnson</author>
 </book>
</library>
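If the declaration line is not wanted, or a different indent width is preferred, the conversion can be wrapped in a small function. The to_pretty_string() helper below is a hypothetical name used for this sketch; it relies only on ET.tostring() and minidom's toprettyxml().

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

def to_pretty_string(element, indent="  "):
    """Return an indented string for an Element, without the XML declaration."""
    raw = ET.tostring(element, encoding="unicode")
    pretty = minidom.parseString(raw).toprettyxml(indent=indent)
    # toprettyxml() puts the XML declaration on the first line; drop it here.
    return "\n".join(pretty.splitlines()[1:])

# Hypothetical tree for the demonstration.
shelf = ET.Element("library")
item = ET.SubElement(shelf, "book", id="3")
ET.SubElement(item, "title").text = "Learning XML"

print(to_pretty_string(shelf))
```

On Python 3.9 and later, ET.indent() is another option; it adds the whitespace to the tree in place instead of producing a separate string.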
Explanation: minidom.parseString().toprettyxml() adds indentation and line breaks to make XML human-readable.
6. Modifying XML Elements
Modify XML elements by changing their attributes or text content.
# Modify the title of the first book
book_to_modify = library.find("book")
book_to_modify.find("title").text = "Mastering XML"
print("Modified XML structure:")
ET.dump(library)
Output (ET.dump() prints the tree on a single line):
Modified XML structure:
<library><book id="3"><title>Mastering XML</title><author>Tom Brown</author></book><book id="4"><title>XML and Python</title><author>Sue Johnson</author></book></library>
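Attributes can be changed in much the same way as text. The sketch below, built on a hypothetical sample tree, updates an attribute with set(), adds a new one (the "status" name is an assumption for illustration), and removes an element through its parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample tree for this sketch.
catalog = ET.fromstring(
    "<library>"
    "<book id='3'><title>Learning XML</title></book>"
    "<book id='4'><title>XML and Python</title></book>"
    "</library>"
)

first = catalog.find("book")
first.set("id", "30")            # overwrite an existing attribute
first.set("status", "revised")   # add a new attribute

# remove() must be called on the direct parent of the element to delete.
obsolete = catalog.find("book[@id='4']")
catalog.remove(obsolete)

result = ET.tostring(catalog, encoding="unicode")
print(result)
```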
Explanation: Updating an element’s text attribute modifies the XML data in memory.
7. Parsing XML with xml.sax (Event-Driven)
The xml.sax module uses event-driven parsing, which suits large XML files because it processes elements incrementally instead of building the whole tree in memory.
import xml.sax
# Define a SAX handler to process each element
class BookHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        if name == "book":
            print(f"Book ID: {attrs['id']}")

    def characters(self, content):
        if content.strip():
            print("Content:", content)

    def endElement(self, name):
        if name == "book":
            print("--- End of book ---")
# Parse XML with SAX
xml.sax.parseString(xml_data, BookHandler())
Output:
Book ID: 1
Content: Python Programming
Content: John Smith
--- End of book ---
Book ID: 2
Content: Advanced Python
Content: Jane Doe
--- End of book ---
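A SAX parser is allowed to deliver the text of one element in several characters() calls, so a handler that needs complete values usually buffers them. The TitleCollector class below is a hypothetical sketch of that pattern: it gathers text between the start and end of each title element and stores the joined result.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collect the full text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not currently inside a <title>"

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # May be called more than once per text node, so append each chunk.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer).strip())
            self._buffer = None

# Hypothetical sample input for the demonstration.
sample = (
    "<library>"
    "<book id='1'><title>Python Programming</title></book>"
    "<book id='2'><title>Advanced Python</title></book>"
    "</library>"
)

collector = TitleCollector()
xml.sax.parseString(sample.encode("utf-8"), collector)
print(collector.titles)
```

Storing results on the handler instance, rather than printing inside the callbacks, keeps the parsing logic reusable.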
Explanation: The BookHandler class processes XML elements as they’re encountered, making it efficient for large files.
Summary
Python’s xml.etree.ElementTree, xml.dom.minidom, and xml.sax modules offer versatile options for parsing, creating, modifying, and saving XML data, catering to various processing needs.