[java] UTF-8 text is garbled when form is posted as multipart/form-data

I'm uploading a file to the server. The file upload HTML form has 2 fields:

  1. File name - An HTML text box where the user can give a name in any language.
  2. File upload - An HTML 'file' input where the user can specify a file from disk to upload.

When the form is submitted, the file contents are received properly. However, when the file name (point 1 above) is read, it is garbled. ASCII characters are displayed properly, but names containing non-ASCII characters (German, French, etc.) come out wrong.

In the servlet method, the request's character encoding is set to UTF-8. I even tried a filter as described in the question "How can I make this code to submit a UTF-8 form textarea with jQuery/Ajax work?", but it doesn't seem to help. Only the file name seems to be garbled.

The MySQL table where the file name goes supports UTF-8. I gave random non-English characters & they are stored/displayed properly.

Using Fiddler, I monitored the request & all the POST data is passed correctly. I'm trying to identify how/where the data could get garbled. Any help will be greatly appreciated.

The answer is


Just use the Apache Commons FileUpload library. Add URIEncoding="UTF-8" to Tomcat's connector, and call FileItem.getString("UTF-8") instead of FileItem.getString() with no charset specified.

Hope this helps.
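
To see why the explicit charset matters, here is a minimal, library-free sketch. It assumes the browser sent the field as UTF-8 bytes and that the no-argument getString() falls back to ISO-8859-1 when the part declares no charset (which, as far as I know, is what Commons FileUpload 1.x does). The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use Commons FileUpload itself.
class FieldDecodingDemo {

    // Turns the raw bytes of a form field into text using the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The escape \u00fc is u-umlaut; it keeps this source file ASCII-only.
        byte[] raw = "M\u00fcller".getBytes(StandardCharsets.UTF_8);

        // Wrong charset: each byte of the two-byte umlaut becomes its own character.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1));

        // Right charset: the original name comes back.
        System.out.println(decode(raw, StandardCharsets.UTF_8));
    }
}
```

The first line printed is the typical two-characters-per-umlaut garbling described in the question; the second is the intended name.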


I think I'm late to the party, but if you use WildFly, you can add a default-encoding attribute in standalone.xml. Just search standalone.xml for

<servlet-container name="default"> 

and add encoding like this:

<servlet-container name="default" default-encoding="UTF-8">

I had the same problem. The only solution that worked for me was adding <property name="defaultEncoding" value="UTF-8"/> to the multipartResolver bean in the Spring configuration file.


In case someone stumbles upon this problem while working on a Grails (or pure Spring) web application, here is the post that helped me:

http://forum.spring.io/forum/spring-projects/web/2491-solved-character-encoding-and-multipart-forms

To set the default encoding to UTF-8 (instead of ISO-8859-1) for multipart requests, I added the following to resources.groovy (Spring DSL):

multipartResolver(ContentLengthAwareCommonsMultipartResolver) {
    defaultEncoding = 'UTF-8'
}

I had the same problem and it turned out that in addition to specifying the encoding in the Filter

request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");

it is necessary to add "accept-charset" to the form

<form method="post" enctype="multipart/form-data" accept-charset="UTF-8">

and run the JVM with

-Dfile.encoding=UTF-8

The HTML meta tag is not necessary if you send it in the HTTP header using response.setCharacterEncoding().
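
As far as I understand it, the -Dfile.encoding flag mentioned above only matters for code that converts between bytes and text without naming a charset. Here is a small sketch for checking what your JVM uses; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting the JVM default charset.
class DefaultCharsetDemo {

    // Name of the charset used by charset-less calls such as new String(bytes).
    static String defaultName() {
        return Charset.defaultCharset().name();
    }

    // Naming the charset makes the result independent of JVM settings.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("JVM default charset: " + defaultName());

        // 0xC3 0xA9 are the UTF-8 bytes of e-acute (U+00E9).
        byte[] raw = {(byte) 0xC3, (byte) 0xA9};
        System.out.println(decodeUtf8(raw).equals("\u00e9"));
    }
}
```

If the first line does not report UTF-8, any library code that relies on the default charset may garble the field, which would explain why the flag helped.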


The filter is key for IE. A few other things to check:

What is the page encoding and character set? Both should be UTF-8.

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

What is the character set in the meta tag?

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Does your MySQL connection string specify UTF-8? e.g.

jdbc:mysql://127.0.0.1/dbname?requireSSL=false&useUnicode=true&characterEncoding=UTF-8

You also have to make sure that your encoding filter (org.springframework.web.filter.CharacterEncodingFilter) in your web.xml is mapped before the multipart filter (org.springframework.web.multipart.support.MultipartFilter).


I am using PrimeFaces with GlassFish and SQL Server.

In my case I created a WebFilter on the back end that intercepts every request and sets its encoding to UTF-8, like this:

package br.com.teste.filter;

import java.io.IOException;

import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;

@WebFilter(servletNames={"Faces Servlet"})
public class Filter implements javax.servlet.Filter {

    @Override
    public void destroy() {
        // TODO Auto-generated method stub

    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);      
    }

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // TODO Auto-generated method stub      
    }

}

In the view (.xhtml) I needed to add the UTF-8 charset to the form's enctype attribute, as @Kevin Rahe suggested:

    <h:form id="frmt" enctype="multipart/form-data;charset=UTF-8" >
         <!-- your code here -->
    </h:form>  

The filter and the Tomcat setting for UTF-8 URIs only matter if you're passing the data via the URL's query string, as you would with an HTTP GET. If you're using a POST, with the parameters in the HTTP message body, what matters is the Content-Type of the request: it is up to the browser to declare UTF-8 there and to send the content in that encoding.

The only way to really do this is by telling the browser that you can only accept UTF-8 by setting the Accept-Charset header on every response to "UTF-8;q=1,ISO-8859-1;q=0.6". This will put UTF-8 as the best quality and the default charset, ISO-8859-1, as acceptable, but a lower quality.

When you say the file name is garbled, is it garbled in the HttpServletRequest.getParameter's return value?
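
To illustrate the Content-Type point: browsers usually send a multipart Content-Type that carries only a boundary and no charset, so the server has nothing to go on and falls back to its default. Below is a simplified sketch of pulling the charset out of a header value. The class name and the boundary string are invented for the example, and the parsing is deliberately naive (it does not handle semicolons inside quoted values).

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper; a simplified stand-in for what a container does internally.
class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type header value, if present.
    static Optional<String> charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Typical shape of a browser form upload: boundary only, no charset.
        System.out.println(charsetOf("multipart/form-data; boundary=----demoBoundary123"));
        System.out.println(charsetOf("text/html; charset=UTF-8"));
    }
}
```

When no charset is present, the encoding has to come from somewhere else, for example request.setCharacterEncoding() or an explicit charset when reading each field.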


I'm using org.apache.commons.fileupload.servlet.ServletFileUpload.ServletFileUpload(FileItemFactory) and specifying the encoding when reading each parameter value:

List<FileItem> items = new ServletFileUpload(new DiskFileItemFactory()).parseRequest(request);

for (FileItem item : items) {
    String fieldName = item.getFieldName();

    if (item.isFormField()) {
        // Decode the text field explicitly as UTF-8 instead of relying on the default.
        String fieldValue = item.getString("UTF-8"); // <-- HERE
    }
}

To avoid converting all request parameters manually to UTF-8, you can define a method annotated with @InitBinder in your controller:

@InitBinder
protected void initBinder(WebDataBinder binder) {
    binder.registerCustomEditor(String.class, new CharacterEditor(true) {
        @Override
        public void setAsText(String text) throws IllegalArgumentException {
            String properText = new String(text.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
            setValue(properText);
        }
    });
}

The above will automatically convert all request parameters to UTF-8 in the controller where it is defined.
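
The conversion inside setAsText can be tried on its own. This is a minimal sketch, assuming the container decoded UTF-8 bytes as ISO-8859-1; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical standalone version of the conversion used in setAsText above.
class Latin1ToUtf8 {

    // Recovers the original bytes from the mis-decoded text, then decodes them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of "M\u00fcller" read as ISO-8859-1 give this garbled form.
        String garbled = "M\u00c3\u00bcller";
        System.out.println(repair(garbled).equals("M\u00fcller"));

        // Text that was already decoded correctly does not survive the conversion.
        System.out.println(repair("M\u00fcller").equals("M\u00fcller"));
    }
}
```

The second check is the catch: the trick is only safe while every parameter reaching that controller really was mis-decoded. Once the encoding is fixed upstream (filter, resolver, or container setting), the binder has to go, or it will corrupt correct text.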


I got stuck with this problem and found that it was the order of the call to

request.setCharacterEncoding("UTF-8");

that was causing the problem. It has to be called before any call to request.getParameter(), so I made a special filter to use at the top of my filter chain.

https://rogerkeays.com/servletrequest-setcharactercoding-ignored
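
The ordering issue is easier to see with a toy model. The class below is not the servlet API; it is a made-up stand-in that decodes its single parameter on first access and caches the result, which is roughly how I understand containers to behave. The ISO-8859-1 default is also an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Toy model only; this is NOT the servlet API.
class ToyRequest {
    private final byte[] rawValue;                           // undecoded parameter bytes
    private Charset encoding = StandardCharsets.ISO_8859_1;  // assumed container default
    private String parsedValue;                              // cached after the first read

    ToyRequest(byte[] rawValue) {
        this.rawValue = rawValue.clone();
    }

    // Takes effect only while the parameter has not been decoded yet.
    void setCharacterEncoding(Charset charset) {
        if (parsedValue == null) {
            encoding = charset;
        }
    }

    // Decodes on first access and returns the cached text afterwards.
    String getParameter() {
        if (parsedValue == null) {
            parsedValue = new String(rawValue, encoding);
        }
        return parsedValue;
    }

    public static void main(String[] args) {
        byte[] raw = "M\u00fcller".getBytes(StandardCharsets.UTF_8);

        // Encoding set before the first read: the name is decoded correctly.
        ToyRequest early = new ToyRequest(raw);
        early.setCharacterEncoding(StandardCharsets.UTF_8);
        System.out.println(early.getParameter());

        // Encoding set after the first read: the call is silently ignored.
        ToyRequest late = new ToyRequest(raw);
        late.getParameter();
        late.setCharacterEncoding(StandardCharsets.UTF_8);
        System.out.println(late.getParameter());
    }
}
```

In a real application anything that touches the parameters early (another filter, a logging valve, a framework) plays the role of the first getParameter() call, which is why the encoding filter needs to run first.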


You do not use UTF-8 to encode text data for HTML forms. The HTML standard defines two encodings, and the relevant part of that standard is here. The "old" encoding, which handles ASCII, is application/x-www-form-urlencoded. The new one, which works properly, is multipart/form-data.

Specifically, the form declaration looks like this:

 <FORM action="http://server.com/cgi/handle"
       enctype="multipart/form-data"
       method="post">
   <P>
   What is your name? <INPUT type="text" name="submit-name"><BR>
   What files are you sending? <INPUT type="file" name="files"><BR>
   <INPUT type="submit" value="Send"> <INPUT type="reset">
 </FORM>

And I think that's all you have to worry about - the webserver should handle it. If you are writing something that directly reads the InputStream from the web client, then you will need to read RFC 2045 and RFC 2046.