[python] How to update/modify an XML file in python?

I have an XML document that I would like to update after it already contains data.

I thought about opening the XML file in "a" (append) mode. The problem is that the new data will be written after the root closing tag.

How can I delete the last line of a file, then start writing data from that point, and then close the root tag?

Of course I could read the whole file and do some string manipulations, but I don't think that's the best idea..

Thanks for your time.

This question is related to python xml io

The answer is


The quick and easy way, which you definitely should not do (see below), is to read the whole file into a list of strings using readlines(). I write this in case the quick and easy solution is what you're looking for.

Just open the file using open(), then call the readlines() method. What you'll get is a list of all the strings in the file. Now, you can easily add strings before the last element (just add to the list one element before the last). Finally, you can write these back to the file using writelines().

An example might help:

my_file = open(filename, "r")
lines_of_file = my_file.readlines()
lines_of_file.insert(-1, "This line is added one before the last line")
my_file.writelines(lines_of_file)

The reason you shouldn't be doing this is because, unless you are doing something very quick n' dirty, you should be using an XML parser. This is a library that allows you to work with XML intelligently, using concepts like DOM, trees, and nodes. This is not only the proper way to work with XML, it is also the standard way, making your code both more portable, and easier for other programmers to understand.

Tim's answer mentioned checking out xml.dom.minidom for this purpose, which I think would be a great idea.


Useful Python XML parsers:

  1. Minidom - functional but limited
  2. ElementTree - decent performance, more functionality
  3. lxml - high-performance in most cases, high functionality including real xpath support

Any of those is better than trying to update the XML file as strings of text.

What that means to you:

Open your file with an XML parser of your choice, find the node you're interested in, replace the value, serialize the file back out.


While I agree with Tim and Oben Sonne that you should use an XML library, there are ways to still manipulate it as a simple string object.

I likely would not try to use a single file pointer for what you are describing, and instead read the file into memory, edit it, then write it out.:

inFile = open('file.xml', 'r')
data = inFile.readlines()
inFile.close()
# some manipulation on `data`
outFile = open('file.xml', 'w')
outFile.writelines(data)
outFile.close()

For the modification, you could use tag.text from xml. Here is snippet:

import xml.etree.ElementTree as ET

tree = ET.parse('country_data.xml')
root = tree.getroot()

for rank in root.iter('rank'):
    new_rank = int(rank.text) + 1
    rank.text = str(new_rank)
    
tree.write('output.xml')

The rank in the code is example of tag, which depending on your XML file contents.


To make this process more robust, you could consider using the SAX parser (that way you don't have to hold the whole file in memory), read & write till the end of tree and then start appending.


Using ElementTree:

import xml.etree.ElementTree

# Open original file
et = xml.etree.ElementTree.parse('file.xml')

# Append new tag: <a x='1' y='abc'>body text</a>
new_tag = xml.etree.ElementTree.SubElement(et.getroot(), 'a')
new_tag.text = 'body text'
new_tag.attrib['x'] = '1' # must be str; cannot be an int
new_tag.attrib['y'] = 'abc'

# Write back to file
#et.write('file.xml')
et.write('file_new.xml')

note: output written to file_new.xml for you to experiment, writing back to file.xml will replace the old content.

IMPORTANT: the ElementTree library stores attributes in a dict, as such, the order in which these attributes are listed in the xml text will NOT be preserved. Instead, they will be output in alphabetical order. (also, comments are removed. I'm finding this rather annoying)

ie: the xml input text <b y='xxx' x='2'>some body</b> will be output as <b x='2' y='xxx'>some body</b>(after alphabetising the order parameters are defined)

This means when committing the original, and changed files to a revision control system (such as SVN, CSV, ClearCase, etc), a diff between the 2 files may not look pretty.


You should read the XML file using specific XML modules. That way you can edit the XML document in memory and rewrite your changed XML document into the file.

Here is a quick start: http://docs.python.org/library/xml.dom.minidom.html

There are a lot of other XML utilities, which one is best depends on the nature of your XML file and in which way you want to edit it.


As Edan Maor explained, the quick and dirty way to do it (for [utc-16] encoded .xml files), which you should not do for the resons Edam Maor explained, can done with the following python 2.7 code in case time constraints do not allow you to learn (propper) XML parses.

Assuming you want to:

  1. Delete the last line in the original xml file.
  2. Add a line
  3. substitute a line
  4. Close the root tag.

It worked in python 2.7 modifying an .xml file named "b.xml" located in folder "a", where "a" was located in the "working folder" of python. It outputs the new modified file as "c.xml" in folder "a", without yielding encoding errors (for me) in further use outside of python 2.7.

pattern = '<Author>'
subst = '    <Author>' + domain + '\\' + user_name + '</Author>'
line_index =0 #set line count to 0 before starting
file = io.open('a/b.xml', 'r', encoding='utf-16')
lines = file.readlines()
outFile = open('a/c.xml', 'w')
for line in lines[0:len(lines)]:
    line_index =line_index +1
    if line_index == len(lines):
        #1. & 2. delete last line and adding another line in its place not writing it
        outFile.writelines("Write extra line here" + '\n')
        # 4. Close root tag:
        outFile.writelines("</phonebook>") # as in:
        #http://tizag.com/xmlTutorial/xmldocument.php
    else:
        #3. Substitue a line if it finds the following substring in a line:
        pattern = '<Author>'
        subst = '    <Author>' + domain + '\\' + user_name + '</Author>'
        if pattern in line:
            line = subst
            print line
        outFile.writelines(line)#just writing/copying all the lines from the original xml except for the last.

What you really want to do is use an XML parser and append the new elements with the API provided.

Then simply overwrite the file.

The easiest to use would probably be a DOM parser like the one below:

http://docs.python.org/library/xml.dom.minidom.html


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to xml

strange error in my Animation Drawable How do I POST XML data to a webservice with Postman? PHP XML Extension: Not installed How to add a Hint in spinner in XML Generating Request/Response XML from a WSDL Manifest Merger failed with multiple errors in Android Studio How to set menu to Toolbar in Android How to add colored border on cardview? Android: ScrollView vs NestedScrollView WARNING: Exception encountered during context initialization - cancelling refresh attempt

Examples related to io

Reading file using relative path in python project How to write to a CSV line by line? Getting "java.nio.file.AccessDeniedException" when trying to write to a folder Exception: Unexpected end of ZLIB input stream How to get File Created Date and Modified Date Printing Mongo query output to a file while in the mongo shell Load data from txt with pandas Writing File to Temp Folder How to get resources directory path programmatically ValueError : I/O operation on closed file