[python] Validating with an XML schema in Python

I have an XML file and an XML schema in another file and I'd like to validate that my XML file adheres to the schema. How do I do this in Python?

I'd prefer something using the standard library, but I can install a third-party package if necessary.

This question is related to python xml validation xsd

The answer is


As for "pure python" solutions: the package index lists:

  • pyxsd, the description says it uses xml.etree.cElementTree, which is not "pure python" (but included in stdlib), but source code indicates that it falls back to xml.etree.ElementTree, so this would count as pure python. Haven't used it, but according to the docs, it does do schema validation.
  • minixsv: 'a lightweight XML schema validator written in "pure" Python'. However, the description says "currently a subset of the XML schema standard is supported", so this may not be enough.
  • XSV, which I think is used for the W3C's online xsd validator (it still seems to use the old pyxml package, which I think is no longer maintained)

lxml provides etree.DTD

from the tests on http://lxml.de/api/lxml.tests.test_dtd-pysrc.html

...
root = etree.XML(_bytes("<b/>")) 
dtd = etree.DTD(BytesIO("<!ELEMENT b EMPTY>")) 
self.assert_(dtd.validate(root)) 

The PyXB package at http://pyxb.sourceforge.net/ generates validating bindings for Python from XML schema documents. It handles almost every schema construct and supports multiple namespaces.


You can easily validate an XML file or tree against an XML Schema (XSD) with the xmlschema Python package. It's pure Python, available on PyPi and doesn't have many dependencies.

Example - validate a file:

import xmlschema
xmlschema.validate('doc.xml', 'some.xsd')

The method raises an exception if the file doesn't validate against the XSD. That exception then contains some violation details.

If you want to validate many files you only have to load the XSD once:

xsd = xmlschema.XMLSchema('some.xsd')
for filename in filenames:
    xsd.validate(filename)

If you don't need the exception you can validate like this:

if xsd.is_valid('doc.xml'):
    print('do something useful')

Alternatively, xmlschema directly works on file objects and in memory XML trees (either created with xml.etree.ElementTree or lxml). Example:

import xml.etree.ElementTree as ET
t = ET.parse('doc.xml')
result = xsd.is_valid(t)
print('Document is valid? {}'.format(result))

lxml provides etree.DTD

from the tests on http://lxml.de/api/lxml.tests.test_dtd-pysrc.html

...
root = etree.XML(_bytes("<b/>")) 
dtd = etree.DTD(BytesIO("<!ELEMENT b EMPTY>")) 
self.assert_(dtd.validate(root)) 

The PyXB package at http://pyxb.sourceforge.net/ generates validating bindings for Python from XML schema documents. It handles almost every schema construct and supports multiple namespaces.


An example of a simple validator in Python3 using the popular library lxml

Installation lxml

pip install lxml

If you get an error like "Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?", try to do this first:

# Debian/Ubuntu
apt-get install python-dev python3-dev libxml2-dev libxslt-dev

# Fedora 23+
dnf install python-devel python3-devel libxml2-devel libxslt-devel

The simplest validator

Let's create simplest validator.py

from lxml import etree

def validate(xml_path: str, xsd_path: str) -> bool:

    xmlschema_doc = etree.parse(xsd_path)
    xmlschema = etree.XMLSchema(xmlschema_doc)

    xml_doc = etree.parse(xml_path)
    result = xmlschema.validate(xml_doc)

    return result

then write and run main.py

from validator import validate

if validate("path/to/file.xml", "path/to/scheme.xsd"):
    print('Valid! :)')
else:
    print('Not valid! :(')

A little bit of OOP

In order to validate more than one file, there is no need to create an XMLSchema object every time, therefore:

validator.py

from lxml import etree

class Validator:

    def __init__(self, xsd_path: str):
        xmlschema_doc = etree.parse(xsd_path)
        self.xmlschema = etree.XMLSchema(xmlschema_doc)

    def validate(self, xml_path: str) -> bool:
        xml_doc = etree.parse(xml_path)
        result = self.xmlschema.validate(xml_doc)

        return result

Now we can validate all files in the directory as follows:

main.py

import os
from validator import Validator

validator = Validator("path/to/scheme.xsd")

# The directory with XML files
XML_DIR = "path/to/directory"

for file_name in os.listdir(XML_DIR):
    print('{}: '.format(file_name), end='')

    file_path = '{}/{}'.format(XML_DIR, file_name)

    if validator.validate(file_path):
        print('Valid! :)')
    else:
        print('Not valid! :(')

For more options read here: Validation with lxml


lxml provides etree.DTD

from the tests on http://lxml.de/api/lxml.tests.test_dtd-pysrc.html

...
root = etree.XML(_bytes("<b/>")) 
dtd = etree.DTD(BytesIO("<!ELEMENT b EMPTY>")) 
self.assert_(dtd.validate(root)) 

You can easily validate an XML file or tree against an XML Schema (XSD) with the xmlschema Python package. It's pure Python, available on PyPi and doesn't have many dependencies.

Example - validate a file:

import xmlschema
xmlschema.validate('doc.xml', 'some.xsd')

The method raises an exception if the file doesn't validate against the XSD. That exception then contains some violation details.

If you want to validate many files you only have to load the XSD once:

xsd = xmlschema.XMLSchema('some.xsd')
for filename in filenames:
    xsd.validate(filename)

If you don't need the exception you can validate like this:

if xsd.is_valid('doc.xml'):
    print('do something useful')

Alternatively, xmlschema directly works on file objects and in memory XML trees (either created with xml.etree.ElementTree or lxml). Example:

import xml.etree.ElementTree as ET
t = ET.parse('doc.xml')
result = xsd.is_valid(t)
print('Document is valid? {}'.format(result))

There are two ways(actually there are more) that you could do this.
1. using lxml
pip install lxml

from lxml import etree, objectify
from lxml.etree import XMLSyntaxError

def xml_validator(some_xml_string, xsd_file='/path/to/my_schema_file.xsd'):
    try:
        schema = etree.XMLSchema(file=xsd_file)
        parser = objectify.makeparser(schema=schema)
        objectify.fromstring(some_xml_string, parser)
        print "YEAH!, my xml file has validated"
    except XMLSyntaxError:
        #handle exception here
        print "Oh NO!, my xml file does not validate"
        pass

xml_file = open('my_xml_file.xml', 'r')
xml_string = xml_file.read()
xml_file.close()

xml_validator(xml_string, '/path/to/my_schema_file.xsd')
  1. Use xmllint from the commandline. xmllint comes installed in many linux distributions.

>> xmllint --format --pretty 1 --load-trace --debug --schema /path/to/my_schema_file.xsd /path/to/my_xml_file.xml


lxml provides etree.DTD

from the tests on http://lxml.de/api/lxml.tests.test_dtd-pysrc.html

...
root = etree.XML(_bytes("<b/>")) 
dtd = etree.DTD(BytesIO("<!ELEMENT b EMPTY>")) 
self.assert_(dtd.validate(root)) 

An example of a simple validator in Python3 using the popular library lxml

Installation lxml

pip install lxml

If you get an error like "Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?", try to do this first:

# Debian/Ubuntu
apt-get install python-dev python3-dev libxml2-dev libxslt-dev

# Fedora 23+
dnf install python-devel python3-devel libxml2-devel libxslt-devel

The simplest validator

Let's create simplest validator.py

from lxml import etree

def validate(xml_path: str, xsd_path: str) -> bool:

    xmlschema_doc = etree.parse(xsd_path)
    xmlschema = etree.XMLSchema(xmlschema_doc)

    xml_doc = etree.parse(xml_path)
    result = xmlschema.validate(xml_doc)

    return result

then write and run main.py

from validator import validate

if validate("path/to/file.xml", "path/to/scheme.xsd"):
    print('Valid! :)')
else:
    print('Not valid! :(')

A little bit of OOP

In order to validate more than one file, there is no need to create an XMLSchema object every time, therefore:

validator.py

from lxml import etree

class Validator:

    def __init__(self, xsd_path: str):
        xmlschema_doc = etree.parse(xsd_path)
        self.xmlschema = etree.XMLSchema(xmlschema_doc)

    def validate(self, xml_path: str) -> bool:
        xml_doc = etree.parse(xml_path)
        result = self.xmlschema.validate(xml_doc)

        return result

Now we can validate all files in the directory as follows:

main.py

import os
from validator import Validator

validator = Validator("path/to/scheme.xsd")

# The directory with XML files
XML_DIR = "path/to/directory"

for file_name in os.listdir(XML_DIR):
    print('{}: '.format(file_name), end='')

    file_path = '{}/{}'.format(XML_DIR, file_name)

    if validator.validate(file_path):
        print('Valid! :)')
    else:
        print('Not valid! :(')

For more options read here: Validation with lxml


As for "pure python" solutions: the package index lists:

  • pyxsd, the description says it uses xml.etree.cElementTree, which is not "pure python" (but included in stdlib), but source code indicates that it falls back to xml.etree.ElementTree, so this would count as pure python. Haven't used it, but according to the docs, it does do schema validation.
  • minixsv: 'a lightweight XML schema validator written in "pure" Python'. However, the description says "currently a subset of the XML schema standard is supported", so this may not be enough.
  • XSV, which I think is used for the W3C's online xsd validator (it still seems to use the old pyxml package, which I think is no longer maintained)

There are two ways(actually there are more) that you could do this.
1. using lxml
pip install lxml

from lxml import etree, objectify
from lxml.etree import XMLSyntaxError

def xml_validator(some_xml_string, xsd_file='/path/to/my_schema_file.xsd'):
    try:
        schema = etree.XMLSchema(file=xsd_file)
        parser = objectify.makeparser(schema=schema)
        objectify.fromstring(some_xml_string, parser)
        print "YEAH!, my xml file has validated"
    except XMLSyntaxError:
        #handle exception here
        print "Oh NO!, my xml file does not validate"
        pass

xml_file = open('my_xml_file.xml', 'r')
xml_string = xml_file.read()
xml_file.close()

xml_validator(xml_string, '/path/to/my_schema_file.xsd')
  1. Use xmllint from the commandline. xmllint comes installed in many linux distributions.

>> xmllint --format --pretty 1 --load-trace --debug --schema /path/to/my_schema_file.xsd /path/to/my_xml_file.xml


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to xml

strange error in my Animation Drawable How do I POST XML data to a webservice with Postman? PHP XML Extension: Not installed How to add a Hint in spinner in XML Generating Request/Response XML from a WSDL Manifest Merger failed with multiple errors in Android Studio How to set menu to Toolbar in Android How to add colored border on cardview? Android: ScrollView vs NestedScrollView WARNING: Exception encountered during context initialization - cancelling refresh attempt

Examples related to validation

Rails 2.3.4 Persisting Model on Validation Failure Input type number "only numeric value" validation How can I manually set an Angular form field as invalid? Laravel Password & Password_Confirmation Validation Reactjs - Form input validation Get all validation errors from Angular 2 FormGroup Min / Max Validator in Angular 2 Final How to validate white spaces/empty spaces? [Angular 2] How to Validate on Max File Size in Laravel? WebForms UnobtrusiveValidationMode requires a ScriptResourceMapping for jquery

Examples related to xsd

How to generate xsd from wsdl How to reference a local XML Schema file correctly? XML Schema Validation : Cannot find the declaration of element Using Notepad++ to validate XML against an XSD Error in spring application context schema cvc-elt.1: Cannot find the declaration of element 'MyElement' Importing xsd into wsdl SoapUI "failed to load url" error when loading WSDL How to make an element in XML schema optional? XML Schema How to Restrict Attribute by Enumeration