[java] Java : Convert formatted xml file to one line string

I have a formatted XML file, and I want to convert it to one line string, how can I do that.

Sample xml:

<?xml version="1.0" encoding="UTF-8"?>
<books>
   <book>
       <title>Basic XML</title>
       <price>100</price>
       <qty>5</qty>
   </book>
   <book>
     <title>Basic Java</title>
     <price>200</price>
     <qty>15</qty>
   </book>
</books>

Expected output

<?xml version="1.0" encoding="UTF-8"?><books><book> <title>Basic XML</title><price>100</price><qty>5</qty></book><book><title>Basic Java</title><price>200</price><qty>15</qty></book></books>

Thanks in advance.

This question is related to java xml string

The answer is


Using this answer which provides the code to use Dom4j to do pretty-printing, change the line that sets the output format from: createPrettyPrint() to: createCompactFormat()

public String unPrettyPrint(final String xml){  

    if (StringUtils.isBlank(xml)) {
        throw new RuntimeException("xml was null or blank in unPrettyPrint()");
    }

    final StringWriter sw;

    try {
        final OutputFormat format = OutputFormat.createCompactFormat();
        final org.dom4j.Document document = DocumentHelper.parseText(xml);
        sw = new StringWriter();
        final XMLWriter writer = new XMLWriter(sw, format);
        writer.write(document);
    }
    catch (Exception e) {
        throw new RuntimeException("Error un-pretty printing xml:\n" + xml, e);
    }
    return sw.toString();
}

Run it through an XSLT identity transform with <xsl:output indent="no"> and <xsl:strip-space elements="*"/>

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="no" />
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

It will remove any of the non-significant whitespace and produce the expected output that you posted.


// 1. Read xml from file to StringBuilder (StringBuffer)
// 2. call s = stringBuffer.toString()
// 3. remove all "\n" and "\t": 
s.replaceAll("\n",""); 
s.replaceAll("\t","");

edited:

I made a small mistake, it is better to use StringBuilder in your case (I suppose you don't need thread-safe StringBuffer)


FileUtils.readFileToString(fileName);

link


Underscore-java library has static method U.formatXml(xmlstring). I am the maintainer of the project. Live example

import com.github.underscore.lodash.U;
import com.github.underscore.lodash.Xml;

public class MyClass {
    public static void main(String[] args) {
        System.out.println(U.formatXml("<a>\n  <b></b>\n  <b></b>\n</a>",
        Xml.XmlStringBuilder.Step.COMPACT));
    }
}

// output: <a><b></b><b></b></a>

Open and read the file.

Reader r = new BufferedReader(filename);
String ret = "";
while((String s = r.nextLine()!=null)) 
{
  ret+=s;
}
return ret;

In java 1.8 and above

BufferedReader br = new BufferedReader(new FileReader(filePath));
String content = br.lines().collect(Collectors.joining("\n"));

The above solutions work if you are compressing all white space in the XML document. Other quick options are JDOM (using Format.getCompactFormat()) and dom4j (using OutputFormat.createCompactFormat()) when outputting the XML document.

However, I had a unique requirement to preserve the white space contained within the element's text value and these solutions did not work as I needed. All I needed was to remove the 'pretty-print' formatting added to the XML document.

The solution that I came up with can be explained in the following 3-step/regex process ... for the sake of understanding the algorithm for the solution.

String regex, updatedXml;

// 1. remove all white space preceding a begin element tag:
regex = "[\\n\\s]+(\\<[^/])";
updatedXml = originalXmlStr.replaceAll( regex, "$1" );

// 2. remove all white space following an end element tag:
regex = "(\\</[a-zA-Z0-9-_\\.:]+\\>)[\\s]+";
updatedXml = updatedXml.replaceAll( regex, "$1" );

// 3. remove all white space following an empty element tag
// (<some-element xmlns:attr1="some-value".... />):
regex = "(/\\>)[\\s]+";
updatedXml = updatedXml.replaceAll( regex, "$1" );

NOTE: The pseudo-code is in Java ... the '$1' is the replacement string which is the 1st capture group.

This will simply remove the white space used when adding the 'pretty-print' format to an XML document, yet preserve all other white space when it is part of the element text value.


I guess you want to read in, ignore the white space, and write it out again. Most XML packages have an option to ignore white space. For example, the DocumentBuilderFactory has setIgnoringElementContentWhitespace for this purpose.

Similarly if you are generating the XML by marshaling an object then JAXB has JAXB_FORMATTED_OUTPUT


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to xml

strange error in my Animation Drawable How do I POST XML data to a webservice with Postman? PHP XML Extension: Not installed How to add a Hint in spinner in XML Generating Request/Response XML from a WSDL Manifest Merger failed with multiple errors in Android Studio How to set menu to Toolbar in Android How to add colored border on cardview? Android: ScrollView vs NestedScrollView WARNING: Exception encountered during context initialization - cancelling refresh attempt

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript