"Content is not allowed in prolog" when parsing perfectly valid XML on GAE

Question

I've been beating my head against this absolutely infuriating bug for the last 48 hours, so I thought I'd finally throw in the towel and try asking here before I throw my laptop out the window.

I'm trying to parse the response XML from a call I made to AWS SimpleDB. The response is coming back on the wire just fine; for example, it may look like:

    <?xml version="1.0" encoding="utf-8"?>
    <ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
        <ListDomainsResult>
            <DomainName>Audio</DomainName>
            <DomainName>Course</DomainName>
            <DomainName>DocumentContents</DomainName>
            <DomainName>LectureSet</DomainName>
            <DomainName>MetaData</DomainName>
            <DomainName>Professors</DomainName>
            <DomainName>Tag</DomainName>
        </ListDomainsResult>
        <ResponseMetadata>
            <RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId>
            <BoxUsage>0.0000071759</BoxUsage>
        </ResponseMetadata>
    </ListDomainsResponse>

I pass in this XML to a parser with

    XMLEventReader eventReader = xmlInputFactory.createXMLEventReader(response.getContent());

and call eventReader.nextEvent(); a bunch of times to get the data I want.

Here's the bizarre part -- it works great inside the local server. The response comes in, I parse it, everyone's happy. The problem is that when I deploy the code to Google App Engine, the outgoing request still works, and the response XML seems 100% identical and correct to me, but the response fails to parse with the following exception:

    com.amazonaws.http.HttpClient handleResponse: Unable to unmarshall response (ParseError at [row,col]:[1,1]
    Message: Content is not allowed in prolog.): <?xml version="1.0" encoding="utf-8"?>
    <ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><ListDomainsResult><DomainName>Audio</DomainName><DomainName>Course</DomainName><DomainName>DocumentContents</DomainName><DomainName>LectureSet</DomainName><DomainName>MetaData</DomainName><DomainName>Professors</DomainName><DomainName>Tag</DomainName></ListDomainsResult><ResponseMetadata><RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId><BoxUsage>0.0000071759</BoxUsage></ResponseMetadata></ListDomainsResponse>
    javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
    Message: Content is not allowed in prolog.
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
        at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
        at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:153)
        ... (rest of lines omitted)

I have double, triple, quadruple checked this XML for "invisible characters" or non-UTF8 encoded characters, etc. I looked at it byte-by-byte in an array for byte-order-marks or something of that nature. Nothing; it passes every validation test I could throw at it. Even stranger, it happens if I use a Saxon-based parser as well -- but ONLY on GAE; it always works fine in my local environment.

It makes it very hard to trace the code for problems when I can only run the debugger on an environment that works perfectly (I haven't found any good way to remotely debug on GAE). Nevertheless, using the primitive means I have, I've tried a million approaches, including:

- XML with and without the prolog
- With and without newlines
- With and without the "encoding=" attribute in the prolog
- Both newline styles
- With and without the chunking information present in the HTTP stream

And I've tried most of these in multiple combinations where it made sense they would interact -- nothing! I'm at my wit's end. Has anyone seen an issue like this before that can hopefully shed some light on it?

Thanks!
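One way to compare what the parser actually receives locally and on GAE is to buffer the response body and log its first few bytes in hex before parsing. The sketch below is only illustrative: the class and method names (LeadingBytes, readAll, hexPrefix) are made up here, and it assumes the body is small enough to hold in memory. A prefix of EF BB BF would be a UTF-8 byte order mark, while 3C 3F 78 6D 6C is the expected "<?xml".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class LeadingBytes {
    /** Reads the whole stream into memory so it can be inspected and then parsed. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Formats up to max leading bytes as two-digit hex separated by spaces. */
    static String hexPrefix(byte[] data, int max) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(max, data.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real response stream (hypothetical sample data).
        byte[] sample = "<?xml version=\"1.0\"?><a/>".getBytes(StandardCharsets.UTF_8);
        byte[] body = readAll(new ByteArrayInputStream(sample));
        System.out.println(hexPrefix(body, 8));
    }
}
```

The buffered bytes can then be handed to the parser through a new ByteArrayInputStream(body), so the bytes that were logged are exactly the bytes that get parsed.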

User · Answer

In my XML file, the header looked like this:

    <?xml version="1.0" encoding="utf-16"?>

In a test file, I was reading the file bytes and decoding the data as UTF-8 (not realizing the header in this file was utf-16) to create a string:

    byte[] data = Files.readAllBytes(Paths.get(path));
    String dataString = new String(data, "UTF-8");

When I tried to deserialize this string into an object, I was seeing the same error:

    javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
    Message: Content is not allowed in prolog.

When I updated the second line to

    String dataString = new String(data, "UTF-16");

I was able to deserialize the object just fine. So, as Romain had noted above, the encodings need to match.
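A related way to sidestep the mismatch is to hand the parser the raw bytes rather than a pre-decoded String, so that it can work out the encoding from the byte order mark or the declaration itself. The following is a minimal sketch under assumptions: the class EncodingMatch and its helper names are invented for illustration, and it uses the JDK's DocumentBuilder rather than whatever deserializer the answer used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EncodingMatch {
    private static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses raw bytes; the parser detects the encoding on its own. */
    static boolean parsesFromBytes(byte[] data) {
        try {
            newBuilder().parse(new ByteArrayInputStream(data));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    /** Parses text that the caller has already decoded with some charset. */
    static boolean parsesFromString(String text) {
        try {
            newBuilder().parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><root/>";
        byte[] data = xml.getBytes(StandardCharsets.UTF_16); // UTF-16 bytes with a BOM

        System.out.println("bytes:            " + parsesFromBytes(data));
        System.out.println("decoded as UTF-8: "
                + parsesFromString(new String(data, StandardCharsets.UTF_8)));
        System.out.println("decoded as UTF-16: "
                + parsesFromString(new String(data, StandardCharsets.UTF_16)));
    }
}
```

Decoding with the wrong charset corrupts the very first characters, which is why the failure is reported at row 1, column 1.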

User · Answer

This error message is always caused by invalid XML content at the beginning of the document, for example an extra small dot "." at the start. Any characters before the "<?xml" will cause the "org.xml.sax.SAXParseException: Content is not allowed in prolog" error message.

For example, a small dot "." before the "<?xml".

To fix it, just delete all those weird characters before the "<?xml".

Ref: http://www.mkyong.com/java/sax-error-content-is-not-allowed-in-prolog

User · Answer

I caught the same error message today. The solution was to change the document from UTF-8 with BOM to UTF-8 without BOM.
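When the file itself cannot be re-saved, the same effect can be approximated at read time by dropping a leading UTF-8 byte order mark (bytes EF BB BF) before the stream reaches the parser. This is a hedged sketch, not a library API: BomSkipper and skipUtf8Bom are hypothetical names, and it handles only the UTF-8 BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BomSkipper {
    /** Returns a stream positioned after a leading UTF-8 BOM, if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int count = 0;
        // read() may return fewer bytes than requested, so loop until 3 or EOF.
        while (count < 3) {
            int r = pushback.read(head, count, 3 - count);
            if (r == -1) {
                break;
            }
            count += r;
        }
        boolean hasBom = count == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && count > 0) {
            pushback.unread(head, 0, count); // not a BOM: put the bytes back
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream cleaned = skipUtf8Bom(new ByteArrayInputStream(withBom));
        int first = cleaned.read();
        System.out.println((char) first);
    }
}
```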

User · Answer

I was facing the same issue. In my case, the XML files were generated from a C# program and fed into AS400 for further processing. After some analysis, I identified that I was using UTF8 encoding while generating the XML files, whereas javac in AS400 uses "UTF8 without BOM". So I had to write extra code similar to the following:

    // create encoding with no BOM
    Encoding outputEnc = new UTF8Encoding(false);
    // open file with encoding
    TextWriter file = new StreamWriter(filePath, false, outputEnc);
    file.Write(doc.InnerXml);
    file.Flush();
    file.Close(); // save and close it

User · Answer

I had a tab character instead of spaces. Replacing the tab '\t' fixed the problem.

Cut and paste the whole doc into an editor like Notepad++ and display all characters.

User · Answer

The encoding in your XML and XSD (or DTD) are different.

    XML file header: <?xml version="1.0" encoding="utf-8"?>
    XSD file header: <?xml version="1.0" encoding="utf-16"?>

Another possible scenario that causes this is when anything comes before the XML declaration, i.e. you might have something like this in the buffer:

    helloworld<?xml version="1.0" encoding="utf-8"?>

or even a space or special character.

There are some special characters called byte order markers that could be in the buffer. Before passing the buffer to the parser, do this:

    String xml = "<?xml ...";
    xml = xml.trim().replaceFirst("^([\\W]+)<", "<");
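Worth noting: the regular expression in that snippet only strips non-word characters, so it removes a byte order mark or stray punctuation but leaves letters such as "helloworld" in place. A minimal sketch contrasting it with a blunter alternative, dropping everything before the first '<', follows. The class and method names are hypothetical, and the alternative assumes the buffer really does contain an XML document after the junk.

```java
public class LeadingJunk {
    /** The regex approach: removes leading non-word characters only. */
    static String regexTrim(String xml) {
        return xml.trim().replaceFirst("^([\\W]+)<", "<");
    }

    /** Blunter alternative: discards whatever precedes the first '<'. */
    static String dropBeforeFirstTag(String xml) {
        int start = xml.indexOf('<');
        return start > 0 ? xml.substring(start) : xml;
    }

    public static void main(String[] args) {
        String withBom = "\uFEFF<?xml version=\"1.0\"?><a/>";
        String withText = "helloworld<?xml version=\"1.0\"?><a/>";

        System.out.println(regexTrim(withBom).startsWith("<?xml"));
        System.out.println(regexTrim(withText).startsWith("<?xml"));
        System.out.println(dropBeforeFirstTag(withText).startsWith("<?xml"));
    }
}
```

Either way, silently discarding leading content hides the real problem, so it is usually worth logging what was removed.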

User · Answer

Removing the XML declaration solved it:

    <?xml version="1.0" encoding="utf-8"?>

User · Answer

I hit this issue after inspecting the XML file in Notepad++ and saving it, even though the file had the UTF-8 XML declaration at the top: <?xml version="1.0" encoding="utf-8"?>

It was fixed by saving the file in Notepad++ with "Encode in UTF-8" selected on the Encoding tab (it had been "Encode in UTF-8-BOM").

User · Answer

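In the spirit of "just delete all those weird characters before the <?xml", here's my Java code, which works well with input via a BufferedReader:

    BufferedReader test = new BufferedReader(new InputStreamReader(fisTest));
    test.mark(4);
    while (true) {
        int earlyChar = test.read();
        System.out.println(earlyChar);
        if (earlyChar == -1) {
            break; // end of input without finding '<'; avoids looping forever
        }
        if (earlyChar == 60) { // 60 is '<'
            test.reset(); // rewind so the parser sees the '<'
            break;
        } else {
            test.mark(4);
        }
    }

FWIW, the bytes I was seeing are (in decimal): 239, 187, 191. These are the UTF-8 byte order mark, EF BB BF in hex.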

User · Answer

In my instance of the problem, the solution was to replace German umlauts (ä, ö, ü) with their HTML equivalents.

User · Answer

Unexpected reason: # character in file path

Due to some internal bug, the error "Content is not allowed in prolog" also appears if the file content itself is 100% correct but you are supplying a file name like C:\Data\#22\file.xml.

This may possibly apply to other special characters, too.

How to check: if you move your file into a path without special characters and the error disappears, then it was this issue.

User · Answer

Below are possible causes of the "org.xml.sax.SAXParseException: Content is not allowed in prolog" exception:

First, check the file paths of schema.xsd and file.xml.

The encoding in your XML and XSD (or DTD) should be the same.

    XML file header: <?xml version="1.0" encoding="utf-8"?>
    XSD file header: <?xml version="1.0" encoding="utf-8"?>

The error also occurs if anything comes before the XML declaration, e.g.

    hello<?xml version="1.0" encoding="utf-16"?>

User · Answer

I zipped the XML on Mac OS and sent it to a Windows machine. The default compression changes these files, and the resulting encoding produced this message.

User · Answer

I was facing the same problem, "Content is not allowed in prolog", with my XML file.

Solution: initially the name of my root folder began with a special character. When I removed that first character, the error was resolved.

There is no need to rename the folder, though. Try it this way: instead of passing a File or URL object to the unmarshaller method, use a FileInputStream.

    File myFile = new File("........");
    Object obj = unmarshaller.unmarshal(new FileInputStream(myFile));
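The same idea can be sketched without JAXB, using the JDK's DocumentBuilder as a stand-in for the unmarshaller: open the file yourself and give the parser only the stream, so the parser never has to interpret the path. Everything named here (StreamParse, parseViaStream, the "#22" directory) is a hypothetical example, not something taken from the answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class StreamParse {
    /** Opens the file directly and passes only the byte stream to the parser. */
    static Document parseViaStream(Path path)
            throws IOException, SAXException, ParserConfigurationException {
        try (InputStream in = Files.newInputStream(path)) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical directory whose name starts with a special character.
        Path dir = Files.createTempDirectory("prolog-demo").resolve("#22");
        Files.createDirectories(dir);
        Path file = dir.resolve("file.xml");
        Files.write(file,
                "<?xml version=\"1.0\" encoding=\"utf-8\"?><root/>".getBytes(StandardCharsets.UTF_8));

        Document doc = parseViaStream(file);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

One trade-off of this approach: a parser that is given a bare stream has no base location, so relative references inside the document (for example to a DTD) may not resolve.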
