Getting a File's MIME Type in Java

How do most people get the MIME type of a file in Java? So far I've tried two utilities: JMimeMagic and Mime-Util.

The first gave me memory exceptions, and the second doesn't close its streams properly. Does anyone else have a method or library that they use and that works correctly?



Because there are so many answers linking to libraries or non-portable code, I thought I'd share an alternative: simply check the magic bytes of the stream or file whose type you want to know, as I've shown here: https://stackoverflow.com/a/65667558/3225638

It uses plain Java, but requires you to define beforehand, in an enum, the types you want to detect; you only have to do that once.
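A minimal sketch of that enum-based idea, assuming Java 11+ (for InputStream.readNBytes(int)). It is not the code from the linked answer: the MagicBytes and FileType names and the five signatures are illustrative choices, and you would add your own constants.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MagicBytes {

    // Hypothetical enum: each constant pairs a MIME type with the magic bytes
    // its files start with. Only these declared types can ever be detected.
    enum FileType {
        PNG("image/png", 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A),
        JPEG("image/jpeg", 0xFF, 0xD8, 0xFF),
        GIF("image/gif", 0x47, 0x49, 0x46, 0x38),        // "GIF8"
        PDF("application/pdf", 0x25, 0x50, 0x44, 0x46),  // "%PDF"
        ZIP("application/zip", 0x50, 0x4B, 0x03, 0x04);  // "PK\3\4"

        final String mimeType;
        final int[] signature;

        FileType(String mimeType, int... signature) {
            this.mimeType = mimeType;
            this.signature = signature;
        }

        // True if data starts with this type's signature
        boolean matches(byte[] data) {
            if (data.length < signature.length) return false;
            for (int i = 0; i < signature.length; i++) {
                // Compare as unsigned 0..255, since Java bytes are signed
                if ((data[i] & 0xFF) != signature[i]) return false;
            }
            return true;
        }
    }

    // Returns the MIME type of the first matching signature, or null if none match
    static String detect(byte[] data) {
        for (FileType type : FileType.values()) {
            if (type.matches(data)) return type.mimeType;
        }
        return null;
    }

    // Stream variant: consumes at most 16 leading bytes from the stream
    static String detect(InputStream in) throws IOException {
        return detect(in.readNBytes(16));
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(pdf)); // application/pdf
    }
}
```

Keep in mind that ZIP-based formats such as DOCX, XLSX and JAR also start with PK\3\4, so this sketch would report them as application/zip.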


Apache Tika offers MIME type detection in tika-core, based on magic markers in the stream prefix. tika-core does not pull in any other dependencies, which makes it as lightweight as the currently unmaintained Mime Type Detection Utility.

Simple code example (Java 7), using the variables theInputStream and theFileName:

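import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.detect.Detector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;

static String detectMimeType(InputStream theInputStream, String theFileName) throws IOException {
    try (InputStream is = theInputStream;
            BufferedInputStream bis = new BufferedInputStream(is)) {
        AutoDetectParser parser = new AutoDetectParser();
        Detector detector = parser.getDetector();
        Metadata md = new Metadata();
        md.add(Metadata.RESOURCE_NAME_KEY, theFileName); // file name used as a hint
        return detector.detect(bis, md).toString();
    }
}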

Please note that MediaType.detect(...) cannot be used directly (TIKA-1120). More hints are provided at https://tika.apache.org/1.24/detection.html.


From roseindia:

FileNameMap fileNameMap = URLConnection.getFileNameMap();
String mimeType = fileNameMap.getContentTypeFor("alert.gif");
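The built-in table behind getFileNameMap() is fairly small. If it misses an extension you need, one option is to wrap the default FileNameMap and consult your own entries first. This is a sketch assuming Java 9+ (for Map.of); the ExtendedFileNameMap class and the webp entry are hypothetical examples, not part of the JDK.

```java
import java.net.FileNameMap;
import java.net.URLConnection;
import java.util.Locale;
import java.util.Map;

public class ExtendedFileNameMap implements FileNameMap {

    // The JDK's default table, captured when this map is created
    private final FileNameMap defaults = URLConnection.getFileNameMap();
    // Extra extension -> MIME type entries, checked before the defaults
    private final Map<String, String> extras;

    public ExtendedFileNameMap(Map<String, String> extras) {
        this.extras = extras;
    }

    @Override
    public String getContentTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0) {
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            String type = extras.get(ext);
            if (type != null) return type;
        }
        // May still return null for unknown extensions
        return defaults.getContentTypeFor(fileName);
    }

    public static void main(String[] args) {
        FileNameMap map = new ExtendedFileNameMap(Map.of("webp", "image/webp"));
        System.out.println(map.getContentTypeFor("alert.gif"));  // image/gif (JDK table)
        System.out.println(map.getContentTypeFor("photo.webp")); // image/webp (extras)
    }
}
```

If you also want URLConnection.guessContentTypeFromName(...) to see the extra entries, the map can be installed globally with URLConnection.setFileNameMap(...).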

If you are stuck with Java 5 or 6, then try this utility class from the Servoy open source product.

You only need this function:

public static String getContentType(byte[] data, String name)

It probes the first bytes of the content and returns the content type based on that content rather than on the file extension.
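The Servoy source isn't reproduced here, but the same "content first, file name second" idea can be approximated with JDK calls alone. In this sketch the ContentFirst class is hypothetical, and content detection is limited to the handful of formats URLConnection.guessContentTypeFromStream knows (GIF, PNG, JPEG, HTML/XML, WAV and a few others).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

public class ContentFirst {

    // Hypothetical helper mirroring the getContentType(byte[] data, String name)
    // signature above; it is not the Servoy implementation.
    public static String getContentType(byte[] data, String name) {
        // 1. Probe the leading bytes. ByteArrayInputStream supports mark/reset,
        //    which guessContentTypeFromStream needs (it returns null otherwise).
        try (InputStream in = new ByteArrayInputStream(data)) {
            String fromBytes = URLConnection.guessContentTypeFromStream(in);
            if (fromBytes != null) return fromBytes;
        } catch (IOException e) {
            // Not expected for an in-memory stream; fall back to the name
        }
        // 2. Fall back to the file extension (null if unknown)
        return name == null ? null : URLConnection.guessContentTypeFromName(name);
    }

    public static void main(String[] args) {
        byte[] png = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println(getContentType(png, "misnamed.txt"));               // image/png
        System.out.println(getContentType(new byte[] {1, 2, 3, 4}, "notes.txt")); // text/plain
    }
}
```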


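In Spring, with a MultipartFile file (org.springframework.web.multipart.MultipartFile):

file.getContentType();

Note that this returns the Content-Type the client declared for the uploaded part; it is not derived from the file's bytes.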


I did it with the following code, which reads the Content-Type header that the server reports for a URL:

import java.net.HttpURLConnection;
import java.net.URL;

public class MimeFileType {

    public static void main(String[] args) throws Exception {

        URL url = new URL("https://www.url.com.pdf");

        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        try {
            // Connects and returns the server's Content-Type header (may be null)
            System.out.println("Content-Type " + connection.getContentType());
        } finally {
            // Release the connection instead of leaving the response stream open
            connection.disconnect();
        }
    }
}

I tried several ways to do it, including the first ones suggested by @Joshua Fox. But some don't recognize common MIME types such as PDF, and others can't be trusted with fake files (I tried a RAR file with its extension changed to TIF). The solution I found, which @Joshua Fox also mentions in passing, is to use MimeUtil2, like this:

MimeUtil2 mimeUtil = new MimeUtil2();
mimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.MagicMimeMimeDetector");
String mimeType = MimeUtil2.getMostSpecificMimeType(mimeUtil.getMimeTypes(file)).toString();

After trying various other libraries, I settled on mime-util.

<dependency>
      <groupId>eu.medsea.mimeutil</groupId>
      <artifactId>mime-util</artifactId>
      <version>2.1.3</version>
</dependency>

File file = new File("D:/test.tif");
MimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.MagicMimeMimeDetector");
Collection<?> mimeTypes = MimeUtil.getMimeTypes(file);
System.out.println(mimeTypes);

With Apache Tika you need only three lines of code:

File file = new File("/path/to/file");
Tika tika = new Tika();
System.out.println(tika.detect(file));

If you have a Groovy console, just paste and run this code to play with it:

@Grab('org.apache.tika:tika-core:1.14')
import org.apache.tika.Tika;

def tika = new Tika()
def file = new File("/path/to/file")
println tika.detect(file)

Keep in mind that its API is rich; it can parse "anything". As of tika-core 1.14, you have:

String  detect(byte[] prefix)
String  detect(byte[] prefix, String name)
String  detect(File file)
String  detect(InputStream stream)
String  detect(InputStream stream, Metadata metadata)
String  detect(InputStream stream, String name)
String  detect(Path path)
String  detect(String name)
String  detect(URL url)

See the apidocs for more information.


If you are working with a servlet and the servlet context is available to you, you can use:

getServletContext().getMimeType( fileName );

I couldn't find anything to check for the video/mp4 MIME type, so I made my own solution. I noticed that Wikipedia was wrong and that the 00 00 00 18 66 74 79 70 69 73 6F 6D file signature is not correct: the fourth byte (18) and all bytes after the 70 (exclusive) change quite a lot among otherwise valid mp4 files.

This code is essentially a copy/paste of URLConnection.guessContentTypeFromStream code but tailored to video/mp4.

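import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URLConnection;
import java.util.Arrays;

static String guessMimeType(byte[] content) throws IOException {
    BufferedInputStream bis = new BufferedInputStream(new ByteArrayInputStream(content));
    String mimeType = URLConnection.guessContentTypeFromStream(bis);

    // Goes full barbaric and processes the bytes manually
    if (mimeType == null) {
        // These ints are, in hex, 00 00 00 18 66 74 79 70: the first 8 bytes of the
        // .mp4 file signature (magic bytes) 00 00 00 18 66 74 79 70 69 73 6F 6D
        // from https://www.wikiwand.com/en/List_of_file_signatures
        // just ctrl+f "mp4"
        int[] mp4Sig = {0, 0, 0, 24, 102, 116, 121, 112};

        // guessContentTypeFromStream has already reset the stream to the start
        bis.mark(16);
        int[] firstBytes = new int[8];
        for (int i = 0; i < 8; i++) {
            firstBytes[i] = bis.read(); // 0..255, or -1 past the end
        }
        bis.reset();

        // This byte doesn't matter for the file signature and changes;
        // mask it so it compares as 0..255, like the values read() returns
        if (content.length > 3) {
            mp4Sig[3] = content[3] & 0xFF;
        }

        if (Arrays.equals(firstBytes, mp4Sig)) {
            mimeType = "video/mp4";
        }
    }
    return mimeType;
}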

Tested successfully against 10 different .mp4 files.

EDIT: Here is a useful link (if it is still online) where you can find samples of many types. I don't own those videos, don't know who does either, but they're useful for testing the above code.
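A more general variant of the check above, assuming the file uses the ISO base media format (MP4, MOV, 3GP, HEIF and relatives): such files usually begin with an "ftyp" box, where bytes 0-3 are the box size (which is why the fourth byte changes between files), bytes 4-7 spell "ftyp", and bytes 8-11 hold the major brand such as "isom" or "mp42". The sketch returns that brand instead of hard-coding the size byte; FtypBrand and majorBrand are hypothetical names, and mapping brands to video/mp4 (as opposed to, say, video/quicktime for "qt  " or image/heic for "heic") is left to the caller.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FtypBrand {

    private static final byte[] FTYP = "ftyp".getBytes(StandardCharsets.US_ASCII);

    // Returns the 4-character major brand if the data starts with an "ftyp" box,
    // otherwise null. The box size in bytes 0..3 is deliberately ignored.
    public static String majorBrand(byte[] content) {
        if (content.length < 12) return null;
        // Bytes 4..7 are the box type
        if (!Arrays.equals(Arrays.copyOfRange(content, 4, 8), FTYP)) return null;
        // Bytes 8..11 are the major brand
        return new String(content, 8, 4, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] header = {0, 0, 0, 0x20, 'f', 't', 'y', 'p', 'i', 's', 'o', 'm'};
        System.out.println(majorBrand(header)); // isom
    }
}
```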


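import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public String getFileContentType(String fileName) {
    String fileType = "Undetermined";
    final File file = new File(fileName);
    try
    {
        // probeContentType returns null when no detector recognizes the file
        String probed = Files.probeContentType(file.toPath());
        if (probed != null)
        {
            fileType = probed;
        }
    }
    catch (IOException ioException)
    {
        System.out.println(
                "ERROR: Unable to determine file type for " + fileName
                        + " due to exception " + ioException);
    }
    return fileType;
}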

If you're an Android developer, you can use a utility class android.webkit.MimeTypeMap which maps MIME-types to file extensions and vice versa.

The following code snippet may help you:

private static String getMimeType(String fileUrl) {
    String extension = MimeTypeMap.getFileExtensionFromUrl(fileUrl);
    return MimeTypeMap.getSingleton().getMimeTypeFromExtension(extension);
}

The simplest and best option is to retrieve the content MIME type from the file location.

Use these imports:

import java.nio.file.Files;
import java.nio.file.Path;

Code (Path.of requires Java 11 or later; use Paths.get on older versions):

String type = Files.probeContentType(Path.of(imagePath));

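If you work on Linux, there is the command-line tool file --mime-type:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

static String mimetype(String file) throws IOException, InterruptedException {
    //1. run cmd (arguments passed separately, so paths with spaces work)
    Process cmd = new ProcessBuilder("file", "--mime-type", file).start();
    //2. get output of cmd, e.g. "/home/nyapp.war: application/zip"
    String output;
    try (BufferedReader r = new BufferedReader(new InputStreamReader(cmd.getInputStream()))) {
        output = r.readLine();
    }
    cmd.waitFor();
    //3. parse mimetype (the text after the last colon)
    if (output != null) { return output.substring(output.lastIndexOf(':') + 1).trim(); }
    return "";
}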

Then

mimetype("/home/nyapp.war") //  'application/zip'

mimetype("/var/www/ggg/au.mp3") //  'audio/mp3'

Unfortunately,

mimeType = file.toURL().openConnection().getContentType();

does not work, since this use of URL leaves a file locked, so that, for example, it is undeletable.

However, you have this:

mimeType= URLConnection.guessContentTypeFromName(file.getName());

and also the following, which has the advantage of going beyond the file extension by peeking at the content:

try (InputStream is = new BufferedInputStream(new FileInputStream(file))) {
    mimeType = URLConnection.guessContentTypeFromStream(is);
}

However, as suggested by the comment above, the built-in table of mime-types is quite limited, not including, for example, MSWord and PDF. So, if you want to generalize, you'll need to go beyond the built-in libraries, using, e.g., Mime-Util (which is a great library, using both file extension and content).


This is the simplest way I found of doing this:

byte[] byteArray = ...
InputStream is = new BufferedInputStream(new ByteArrayInputStream(byteArray));
String mimeType = URLConnection.guessContentTypeFromStream(is);

I was just wondering how most people fetch a mime type from a file in Java?

I've published my SimpleMagic Java package which allows content-type (mime-type) determination from files and byte arrays. It is designed to read and run the Unix file(1) command magic files that are a part of most ~Unix OS configurations.

I tried Apache Tika, but it is huge, with tons of dependencies; URLConnection doesn't use the bytes of the files; and MimetypesFileTypeMap also just looks at file names.

With SimpleMagic you can do something like:

// create a magic utility using the internal magic file
ContentInfoUtil util = new ContentInfoUtil();
// if you want to use a different config file(s), you can load them by hand:
// ContentInfoUtil util = new ContentInfoUtil("/etc/magic");
...
ContentInfo info = util.findMatch("/tmp/upload.tmp");
// or
ContentInfo info = util.findMatch(inputStream);
// or
ContentInfo info = util.findMatch(contentByteArray);

// null if no match
if (info != null) {
   String mimeType = info.getMimeType();
}

You can do it with just one line: new MimetypesFileTypeMap().getContentType(new File("filename.ext")). See the complete test code (Java 7):

import java.io.File;
import javax.activation.MimetypesFileTypeMap;
public class MimeTest {
    public static void main(String a[]){
         System.out.println(new MimetypesFileTypeMap().getContentType(
           new File("/path/filename.txt")));
    }
}

This code produces the following output: text/plain


Apache Tika.

<!-- https://mvnrepository.com/artifact/org.apache.tika/tika-parsers -->
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.24</version>
</dependency>

and two lines of code:

Tika tika=new Tika();
tika.detect(inputStream);



To chip in with my 5 cents:

TL;DR

I use MimetypesFileTypeMap, and I add any MIME type that is missing and that I specifically need to a mime.types file.

And now, the long read:

First of all, the list of MIME types is huge; see here: https://www.iana.org/assignments/media-types/media-types.xhtml

I like to use standard facilities provided by JDK first, and if that doesn't work, I'll go and look for something else.

Determine file type from file extension

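Since 1.6, Java has MimetypesFileTypeMap, as pointed out in one of the answers above, and it is the simplest way to determine the MIME type (note that javax.activation was removed from the JDK in Java 11, so newer projects have to add the JavaBeans Activation Framework as a separate dependency):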

new MimetypesFileTypeMap().getContentType( fileName );

In its vanilla implementation this does not do much (e.g. it works for .html but not for .png). It is, however, super simple to add any content type you may need:

  1. Create a file named 'mime.types' in the META-INF folder of your project
  2. Add a line for every MIME type you need that the default implementation doesn't provide (there are hundreds of MIME types, and the list grows as time goes by).

Example entries for png and js files would be:

image/png png PNG
application/javascript js

For mime.types file format, see more details here: https://docs.oracle.com/javase/7/docs/api/javax/activation/MimetypesFileTypeMap.html

Determine file type from file content

Since 1.7, Java has java.nio.file.spi.FileTypeDetector, which defines a standard API for determining a file type in an implementation-specific way.

To fetch mime type for a file, you would simply use Files and do this in your code:

Files.probeContentType(Paths.get("either file name or full path goes here"));

The API allows an implementation to determine a file's MIME type either from the file name or from the file content (magic bytes). That is why the probeContentType() method throws IOException: an implementation may actually open the file associated with the Path passed to it.

Again, vanilla implementation of this (the one that comes with JDK) leaves a lot to be desired.

In some ideal world in a galaxy far, far away, all these libraries which try to solve this file-to-mime-type problem would simply implement java.nio.file.spi.FileTypeDetector, you would drop in the preferred implementing library's jar file into your classpath and that would be it.
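For reference, implementing java.nio.file.spi.FileTypeDetector amounts to overriding probeContentType(Path); installed detectors are discovered through the ServiceLoader mechanism and tried before the system default. This is a sketch: the MagicFileTypeDetector name and its three signatures are illustrative assumptions, and readNBytes(byte[], int, int) requires Java 9+.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;

// Hypothetical detector. For Files.probeContentType() to use it, list its fully
// qualified class name in META-INF/services/java.nio.file.spi.FileTypeDetector.
public class MagicFileTypeDetector extends FileTypeDetector {

    // Checks a few well-known signatures in the first n bytes of h. Returning null
    // means "don't know", so Files.probeContentType() moves on to other detectors.
    static String detectHeader(byte[] h, int n) {
        if (n >= 4 && h[0] == '%' && h[1] == 'P' && h[2] == 'D' && h[3] == 'F') {
            return "application/pdf";
        }
        if (n >= 4 && h[0] == 'P' && h[1] == 'K' && h[2] == 3 && h[3] == 4) {
            return "application/zip";
        }
        if (n >= 3 && (h[0] & 0xFF) == 0xFF && (h[1] & 0xFF) == 0xD8 && (h[2] & 0xFF) == 0xFF) {
            return "image/jpeg";
        }
        return null;
    }

    @Override
    public String probeContentType(Path path) throws IOException {
        if (!Files.isRegularFile(path)) {
            return null; // directories, missing files, etc.
        }
        byte[] header = new byte[8];
        int n;
        try (InputStream in = Files.newInputStream(path)) {
            n = in.readNBytes(header, 0, header.length);
        }
        return detectHeader(header, n);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("probe", ".bin");
        Files.write(tmp, "%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new MagicFileTypeDetector().probeContentType(tmp)); // application/pdf
        Files.delete(tmp);
    }
}
```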

In the real world, the one where you need a TL;DR section, you should find the library with the most stars next to its name and use it. For this particular case, I don't need one (yet ;) ).


The JAF API is part of JDK 6. Look at the javax.activation package.

Most interesting classes are javax.activation.MimeType - an actual MIME type holder - and javax.activation.MimetypesFileTypeMap - class whose instance can resolve MIME type as String for a file:

String fileName = "/path/to/file";
MimetypesFileTypeMap mimeTypesMap = new MimetypesFileTypeMap();

// only by file name
String mimeType = mimeTypesMap.getContentType(fileName);

// or by actual File instance
File file = new File(fileName);
mimeType = mimeTypesMap.getContentType(file);

File file = new File(PropertiesReader.FILE_PATH);
MimetypesFileTypeMap fileTypeMap = new MimetypesFileTypeMap();
String mimeType = fileTypeMap.getContentType(file);
URLConnection uconnection = file.toURL().openConnection();
mimeType = uconnection.getContentType();

It is better to use two-layer validation for file uploads.

First, check the declared MIME type and validate it.

Second, convert the first 4 bytes of your file to hexadecimal and compare them with the magic numbers of the allowed types. Together, the two checks are much harder to fool than the declared type alone, although a file can still be crafted to start with valid magic bytes.
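A sketch of the two layers, assuming an allow-list keyed by the declared MIME type; the UploadCheck class, the MAGIC_HEX map and its three entries are hypothetical examples (Map.of needs Java 9+). JPEG is left out on purpose: its fourth byte varies (FF D8 FF E0, FF D8 FF E1, ...), so an exact 4-byte comparison would reject valid files.

```java
import java.util.Map;

public class UploadCheck {

    // Illustrative allow-list: declared MIME type -> expected first 4 bytes as hex
    private static final Map<String, String> MAGIC_HEX = Map.of(
            "image/png", "89504E47",
            "application/pdf", "25504446",
            "application/zip", "504B0304");

    // Converts up to the first 4 bytes to upper-case hex, e.g. "89504E47"
    static String firstFourBytesHex(byte[] content) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(4, content.length); i++) {
            sb.append(String.format("%02X", content[i] & 0xFF));
        }
        return sb.toString();
    }

    // Layer 1: the declared type must be on the allow-list.
    // Layer 2: the file's first 4 bytes must match that type's magic number.
    static boolean isAcceptable(String declaredMimeType, byte[] content) {
        String expected = MAGIC_HEX.get(declaredMimeType);
        return expected != null && expected.equals(firstFourBytesHex(content));
    }

    public static void main(String[] args) {
        byte[] png = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println(isAcceptable("image/png", png));       // true
        System.out.println(isAcceptable("application/pdf", png)); // false
    }
}
```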