[c] XML Parser for C

Can you suggest some of the best XML Parser for C ?

This question is related to c xml parsing

The answer is


On Windows, it's native with Win32 api...


Two examples with expat and libxml2. The second one is, IMHO, much easier to use since it creates a tree in memory, a data structure which is easy to work with. expat, on the other hand, does not build anything (you have to do it yourself), it just allows you to call handlers at specific events during the parsing. But expat may be faster (I didn't measure).

With expat, reading a XML file and displaying the elements indented:

/* 
   A simple test program to parse XML documents with expat
   <http://expat.sourceforge.net/>. It just displays the element
   names.

   On Debian, compile with:

   gcc -Wall -o expat-test -lexpat expat-test.c  

   Inspired from <http://www.xml.com/pub/a/1999/09/expat/index.html> 
*/

#include <expat.h>
#include <stdio.h>
#include <string.h>

/* Keep track of the current level in the XML tree */
int             Depth;

#define MAXCHARS 1000000

void
start(void *data, const char *el, const char **attr)
{
    int             i;

    for (i = 0; i < Depth; i++)
        printf("  ");

    printf("%s", el);

    for (i = 0; attr[i]; i += 2) {
        printf(" %s='%s'", attr[i], attr[i + 1]);
    }

    printf("\n");
    Depth++;
}               /* End of start handler */

void
end(void *data, const char *el)
{
    Depth--;
}               /* End of end handler */

int
main(int argc, char **argv)
{

    char           *filename;
    FILE           *f;
    size_t          size;
    char           *xmltext;
    XML_Parser      parser;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s filename\n", argv[0]);
        return (1);
    }
    filename = argv[1];
    parser = XML_ParserCreate(NULL);
    if (parser == NULL) {
        fprintf(stderr, "Parser not created\n");
        return (1);
    }
    /* Tell expat to use functions start() and end() each times it encounters
     * the start or end of an element. */
    XML_SetElementHandler(parser, start, end);
    f = fopen(filename, "r");
    xmltext = malloc(MAXCHARS);
    /* Slurp the XML file in the buffer xmltext */
    size = fread(xmltext, sizeof(char), MAXCHARS, f);
    if (XML_Parse(parser, xmltext, strlen(xmltext), XML_TRUE) ==
        XML_STATUS_ERROR) {
        fprintf(stderr,
            "Cannot parse %s, file may be too large or not well-formed XML\n",
            filename);
        return (1);
    }
    fclose(f);
    XML_ParserFree(parser);
    fprintf(stdout, "Successfully parsed %i characters in file %s\n", size,
        filename);
    return (0);
}

With libxml2, a program which displays the name of the root element and the names of its children:

/*
   Simple test with libxml2 <http://xmlsoft.org>. It displays the name
   of the root element and the names of all its children (not
   descendents, just children).

   On Debian, compiles with:
   gcc -Wall -o read-xml2 $(xml2-config --cflags) $(xml2-config --libs) \
                    read-xml2.c    

*/

#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>

int
main(int argc, char **argv)
{
    xmlDoc         *document;
    xmlNode        *root, *first_child, *node;
    char           *filename;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s filename.xml\n", argv[0]);
        return 1;
    }
    filename = argv[1];

    document = xmlReadFile(filename, NULL, 0);
    root = xmlDocGetRootElement(document);
    fprintf(stdout, "Root is <%s> (%i)\n", root->name, root->type);
    first_child = root->children;
    for (node = first_child; node; node = node->next) {
        fprintf(stdout, "\t Child is <%s> (%i)\n", node->name, node->type);
    }
    fprintf(stdout, "...\n");
    return 0;
}

How about one written in pure assembler :-) Don't forget to check out the benchmarks.


Expat is pretty decent. It's hard to give good recommendations without more information though.


For C++ I suggest using CMarkup.


On Windows, it's native with Win32 api...


Expat is pretty decent. It's hard to give good recommendations without more information though.


Could you give some indication of what platforms you are writing for? This should weigh heavily on what is 'best'. You might find a super 'xml-foo' library that does not ship commonly on most systems by default .. while its great, the lack of the library might prevent (or at least) annoy users.

Mostly, I use libxml2 .. because its standard or easy to install on the platforms that I target.

As you see, 'best' is also determined by the library being available on your target platforms.


You can try ezxml -- it's a lightweight parser written entirely in C.

For C++ you can check out TinyXML++


Expat is pretty decent. It's hard to give good recommendations without more information though.


http://www.minixml.org is also pretty good. Small and just ANSI C.


You can try ezxml -- it's a lightweight parser written entirely in C.

For C++ you can check out TinyXML++


Could you give some indication of what platforms you are writing for? This should weigh heavily on what is 'best'. You might find a super 'xml-foo' library that does not ship commonly on most systems by default .. while its great, the lack of the library might prevent (or at least) annoy users.

Mostly, I use libxml2 .. because its standard or easy to install on the platforms that I target.

As you see, 'best' is also determined by the library being available on your target platforms.


How about one written in pure assembler :-) Don't forget to check out the benchmarks.


You can try ezxml -- it's a lightweight parser written entirely in C.

For C++ you can check out TinyXML++


Could you give some indication of what platforms you are writing for? This should weigh heavily on what is 'best'. You might find a super 'xml-foo' library that does not ship commonly on most systems by default .. while its great, the lack of the library might prevent (or at least) annoy users.

Mostly, I use libxml2 .. because its standard or easy to install on the platforms that I target.

As you see, 'best' is also determined by the library being available on your target platforms.


Two examples with expat and libxml2. The second one is, IMHO, much easier to use since it creates a tree in memory, a data structure which is easy to work with. expat, on the other hand, does not build anything (you have to do it yourself), it just allows you to call handlers at specific events during the parsing. But expat may be faster (I didn't measure).

With expat, reading a XML file and displaying the elements indented:

/* 
   A simple test program to parse XML documents with expat
   <http://expat.sourceforge.net/>. It just displays the element
   names.

   On Debian, compile with:

   gcc -Wall -o expat-test -lexpat expat-test.c  

   Inspired from <http://www.xml.com/pub/a/1999/09/expat/index.html> 
*/

#include <expat.h>
#include <stdio.h>
#include <string.h>

/* Keep track of the current level in the XML tree */
int             Depth;

#define MAXCHARS 1000000

void
start(void *data, const char *el, const char **attr)
{
    int             i;

    for (i = 0; i < Depth; i++)
        printf("  ");

    printf("%s", el);

    for (i = 0; attr[i]; i += 2) {
        printf(" %s='%s'", attr[i], attr[i + 1]);
    }

    printf("\n");
    Depth++;
}               /* End of start handler */

void
end(void *data, const char *el)
{
    Depth--;
}               /* End of end handler */

int
main(int argc, char **argv)
{

    char           *filename;
    FILE           *f;
    size_t          size;
    char           *xmltext;
    XML_Parser      parser;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s filename\n", argv[0]);
        return (1);
    }
    filename = argv[1];
    parser = XML_ParserCreate(NULL);
    if (parser == NULL) {
        fprintf(stderr, "Parser not created\n");
        return (1);
    }
    /* Tell expat to use functions start() and end() each times it encounters
     * the start or end of an element. */
    XML_SetElementHandler(parser, start, end);
    f = fopen(filename, "r");
    xmltext = malloc(MAXCHARS);
    /* Slurp the XML file in the buffer xmltext */
    size = fread(xmltext, sizeof(char), MAXCHARS, f);
    if (XML_Parse(parser, xmltext, strlen(xmltext), XML_TRUE) ==
        XML_STATUS_ERROR) {
        fprintf(stderr,
            "Cannot parse %s, file may be too large or not well-formed XML\n",
            filename);
        return (1);
    }
    fclose(f);
    XML_ParserFree(parser);
    fprintf(stdout, "Successfully parsed %i characters in file %s\n", size,
        filename);
    return (0);
}

With libxml2, a program which displays the name of the root element and the names of its children:

/*
   Simple test with libxml2 <http://xmlsoft.org>. It displays the name
   of the root element and the names of all its children (not
   descendents, just children).

   On Debian, compiles with:
   gcc -Wall -o read-xml2 $(xml2-config --cflags) $(xml2-config --libs) \
                    read-xml2.c    

*/

#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>

int
main(int argc, char **argv)
{
    xmlDoc         *document;
    xmlNode        *root, *first_child, *node;
    char           *filename;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s filename.xml\n", argv[0]);
        return 1;
    }
    filename = argv[1];

    document = xmlReadFile(filename, NULL, 0);
    root = xmlDocGetRootElement(document);
    fprintf(stdout, "Root is <%s> (%i)\n", root->name, root->type);
    first_child = root->children;
    for (node = first_child; node; node = node->next) {
        fprintf(stdout, "\t Child is <%s> (%i)\n", node->name, node->type);
    }
    fprintf(stdout, "...\n");
    return 0;
}

My personal preference is libxml2. It's very easy to use but I never bothered to benchmark it, as I've only used it for configuration file parsing.


On Windows, it's native with Win32 api...


Could you give some indication of what platforms you are writing for? This should weigh heavily on what is 'best'. You might find a super 'xml-foo' library that does not ship commonly on most systems by default .. while its great, the lack of the library might prevent (or at least) annoy users.

Mostly, I use libxml2 .. because its standard or easy to install on the platforms that I target.

As you see, 'best' is also determined by the library being available on your target platforms.


http://www.minixml.org is also pretty good. Small and just ANSI C.


Two examples with expat and libxml2. The second one is, IMHO, much easier to use since it creates a tree in memory, a data structure which is easy to work with. expat, on the other hand, does not build anything (you have to do it yourself), it just allows you to call handlers at specific events during the parsing. But expat may be faster (I didn't measure).

With expat, reading a XML file and displaying the elements indented:

/* 
   A simple test program to parse XML documents with expat
   <http://expat.sourceforge.net/>. It just displays the element
   names.

   On Debian, compile with:

   gcc -Wall -o expat-test -lexpat expat-test.c  

   Inspired from <http://www.xml.com/pub/a/1999/09/expat/index.html> 
*/

#include <expat.h>
#include <stdio.h>
#include <string.h>

/* Keep track of the current level in the XML tree */
int             Depth;

#define MAXCHARS 1000000

void
start(void *data, const char *el, const char **attr)
{
    int             i;

    for (i = 0; i < Depth; i++)
        printf("  ");

    printf("%s", el);

    for (i = 0; attr[i]; i += 2) {
        printf(" %s='%s'", attr[i], attr[i + 1]);
    }

    printf("\n");
    Depth++;
}               /* End of start handler */

void
end(void *data, const char *el)
{
    Depth--;
}               /* End of end handler */

int
main(int argc, char **argv)
{

    char           *filename;
    FILE           *f;
    size_t          size;
    char           *xmltext;
    XML_Parser      parser;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s filename\n", argv[0]);
        return (1);
    }
    filename = argv[1];
    parser = XML_ParserCreate(NULL);
    if (parser == NULL) {
        fprintf(stderr, "Parser not created\n");
        return (1);
    }
    /* Tell expat to use functions start() and end() each times it encounters
     * the start or end of an element. */
    XML_SetElementHandler(parser, start, end);
    f = fopen(filename, "r");
    xmltext = malloc(MAXCHARS);
    /* Slurp the XML file in the buffer xmltext */
    size = fread(xmltext, sizeof(char), MAXCHARS, f);
    if (XML_Parse(parser, xmltext, strlen(xmltext), XML_TRUE) ==
        XML_STATUS_ERROR) {
        fprintf(stderr,
            "Cannot parse %s, file may be too large or not well-formed XML\n",
            filename);
        return (1);
    }
    fclose(f);
    XML_ParserFree(parser);
    fprintf(stdout, "Successfully parsed %i characters in file %s\n", size,
        filename);
    return (0);
}

With libxml2, a program which displays the name of the root element and the names of its children:

/*
   Simple test with libxml2 <http://xmlsoft.org>. It displays the name
   of the root element and the names of all its children (not
   descendents, just children).

   On Debian, compiles with:
   gcc -Wall -o read-xml2 $(xml2-config --cflags) $(xml2-config --libs) \
                    read-xml2.c    

*/

#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>

int
main(int argc, char **argv)
{
    xmlDoc         *document;
    xmlNode        *root, *first_child, *node;
    char           *filename;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s filename.xml\n", argv[0]);
        return 1;
    }
    filename = argv[1];

    document = xmlReadFile(filename, NULL, 0);
    root = xmlDocGetRootElement(document);
    fprintf(stdout, "Root is <%s> (%i)\n", root->name, root->type);
    first_child = root->children;
    for (node = first_child; node; node = node->next) {
        fprintf(stdout, "\t Child is <%s> (%i)\n", node->name, node->type);
    }
    fprintf(stdout, "...\n");
    return 0;
}

Expat is pretty decent. It's hard to give good recommendations without more information though.


My personal preference is libxml2. It's very easy to use but I never bothered to benchmark it, as I've only used it for configuration file parsing.


How about one written in pure assembler :-) Don't forget to check out the benchmarks.


You can try ezxml -- it's a lightweight parser written entirely in C.

For C++ you can check out TinyXML++


How about one written in pure assembler :-) Don't forget to check out the benchmarks.


Examples related to c

conflicting types for 'outchar' Can't compile C program on a Mac after upgrade to Mojave Program to find largest and second largest number in array Prime numbers between 1 to 100 in C Programming Language In c, in bool, true == 1 and false == 0? How I can print to stderr in C? Visual Studio Code includePath "error: assignment to expression with array type error" when I assign a struct field (C) Compiling an application for use in highly radioactive environments How can you print multiple variables inside a string using printf?

Examples related to xml

strange error in my Animation Drawable How do I POST XML data to a webservice with Postman? PHP XML Extension: Not installed How to add a Hint in spinner in XML Generating Request/Response XML from a WSDL Manifest Merger failed with multiple errors in Android Studio How to set menu to Toolbar in Android How to add colored border on cardview? Android: ScrollView vs NestedScrollView WARNING: Exception encountered during context initialization - cancelling refresh attempt

Examples related to parsing

Got a NumberFormatException while trying to parse a text file for objects Uncaught SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) Python/Json:Expecting property name enclosed in double quotes Correctly Parsing JSON in Swift 3 How to get response as String using retrofit without using GSON or any other library in android UIButton action in table view cell "Expected BEGIN_OBJECT but was STRING at line 1 column 1" How to convert an XML file to nice pandas dataframe? How to extract multiple JSON objects from one file? How to sum digits of an integer in java?