[java] Java: splitting the filename into a base and extension

Is there a better way to get a file's basename and extension than something like this?

File f = ...
String name = f.getName();
int dot = name.lastIndexOf('.');
String base = (dot == -1) ? name : name.substring(0, dot);
String extension = (dot == -1) ? "" : name.substring(dot+1);


The answer is


I know others have mentioned String.split, but here is a variant that only yields two tokens (the base and the extension):

String[] tokens = fileName.split("\\.(?=[^\\.]+$)");

For example:

"test.cool.awesome.txt".split("\\.(?=[^\\.]+$)");

Yields:

["test.cool.awesome", "txt"]

The regular expression tells Java to split on any period that is followed by one or more non-periods, followed by the end of input. At most one period matches this definition (namely, the last period, provided something follows it).

Regexically speaking, this technique is called zero-width positive lookahead.
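To see how the lookahead split behaves on less tidy names, here is a small sketch. The class and method names (LookaheadSplitDemo, splitLastDot) are made up for illustration; only the regular expression comes from the answer above.

```java
import java.util.Arrays;

class LookaheadSplitDemo {
    // Splits on the last period only, and only if at least one character follows it.
    static String[] splitLastDot(String fileName) {
        return fileName.split("\\.(?=[^\\.]+$)");
    }

    public static void main(String[] args) {
        String[] samples = {"test.cool.awesome.txt", "README", ".bashrc", "archive."};
        for (String name : samples) {
            System.out.println(name + " -> " + Arrays.toString(splitLastDot(name)));
        }
    }
}
```

Note that a name without a usable dot ("README", "archive.") comes back as a single token, and a dot-file such as ".bashrc" comes back with an empty base, so callers should check the array length before reading tokens[1].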


BTW, if you want to split a path into its directory part and the full filename (extension included), using a path with forward slashes:

    String[] tokens = dir.split("(?<=/)(?=[^/]+$)");

For example:

    String dir = "/foo/bar/bam/boozled";
    String[] tokens = dir.split("(?<=/)(?=[^/]+$)");
    // [ "/foo/bar/bam/", "boozled" ]

The split point is the zero-width position just after the last slash, so the slash stays with the directory part.

What's wrong with your code? Wrapped in a neat utility method it's fine.

What's more important is which dot to use as the separator: the first or the last. The first is bad for file names like "setup-2.5.1.exe"; the last is bad for file names with multiple extensions like "mybundle.tar.gz".
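One way to wrap this in a utility method while handling both kinds of names is to split at the last dot unless the name ends in a known multi-part extension. This is only a sketch: the class name FileNameParts and the COMPOUND list are assumptions for illustration, and treating a leading dot as "hidden file, no extension" is a design choice, not something the question specifies.

```java
import java.util.Arrays;
import java.util.List;

class FileNameParts {
    // Hypothetical list of multi-part extensions to keep together; extend as needed.
    private static final List<String> COMPOUND =
            Arrays.asList(".tar.gz", ".tar.bz2", ".tar.xz");

    /** Returns {base, extension}; the extension has no leading dot and is "" if absent. */
    static String[] split(String name) {
        for (String suffix : COMPOUND) {
            int cut = name.length() - suffix.length();
            // cut > 0 requires at least one character of base before the suffix.
            if (cut > 0 && name.regionMatches(true, cut, suffix, 0, suffix.length())) {
                return new String[] {name.substring(0, cut), name.substring(cut + 1)};
            }
        }
        int dot = name.lastIndexOf('.');
        // Assumption: a leading dot (".bashrc") marks a hidden file, not an extension.
        if (dot <= 0) {
            return new String[] {name, ""};
        }
        return new String[] {name.substring(0, dot), name.substring(dot + 1)};
    }
}
```

With this sketch, "setup-2.5.1.exe" splits into "setup-2.5.1" and "exe", while "mybundle.tar.gz" splits into "mybundle" and "tar.gz".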


You can also use Java regular expressions directly; String.split() uses them internally. See http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
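As a rough illustration of using java.util.regex directly, the sketch below captures the base and the extension as two groups. The class name RegexSplitDemo and the particular pattern are assumptions chosen for this example, not something the answer prescribes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSplitDemo {
    // Group 1: everything before the last dot (at least one character).
    // Group 2: the non-empty, dot-free text after the last dot.
    private static final Pattern BASE_AND_EXT = Pattern.compile("(.+)\\.([^.]+)");

    /** Returns {base, extension}; the extension is "" when the pattern does not match. */
    static String[] split(String fileName) {
        Matcher m = BASE_AND_EXT.matcher(fileName);
        if (m.matches()) {
            return new String[] {m.group(1), m.group(2)};
        }
        return new String[] {fileName, ""};
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the expression on every call, which is what String.split() would otherwise do for a non-trivial regex.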


http://docs.oracle.com/javase/6/docs/api/java/io/File.html#getName()

From http://www.xinotes.org/notes/note/774/ :

Java has built-in functions to get the basename and dirname for a given file path, but the function names are not so self-apparent.

import java.io.File;

public class JavaFileDirNameBaseName {
    public static void main(String[] args) {
        File theFile = new File("../foo/bar/baz.txt");
        System.out.println("Dirname: " + theFile.getParent());
        System.out.println("Basename: " + theFile.getName());
    }
}

File extensions are a broken concept

And there exists no reliable function for it. Consider for example this filename:

archive.tar.gz

What is the extension? DOS users would have preferred the name archive.tgz. Sometimes you see stupid Windows applications that first decompress the file (yielding a .tar file), then you have to open it again to see the archive contents.

In this case, a more reasonable notion of file extension would have been .tar.gz. There are also .tar.bz2, .tar.xz, .tar.lz and .tar.lzma file "extensions" in use. But how would you decide whether to split at the last dot or the second-to-last dot?

Use mime-types instead.

The Java 7 function Files.probeContentType will likely be much more reliable at detecting file types than trusting the file extension. Pretty much all of the Unix/Linux world, as well as your web browser and smartphone, already do it this way.
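A minimal sketch of calling Files.probeContentType follows. The class name ContentTypeDemo, the describe helper, and the "unknown" fallback are assumptions for illustration. Keep in mind that the result depends on the file type detectors installed on the platform, that some of them consult the file name themselves, and that the method returns null when the type cannot be determined.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ContentTypeDemo {
    /** Returns the probed MIME type, or "unknown" when no detector recognizes the file. */
    static String describe(Path file) {
        try {
            String type = Files.probeContentType(file);
            return type != null ? type : "unknown";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        try {
            Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
            // The printed value is platform-dependent.
            System.out.println(describe(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```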


Old question but I usually use this solution:

import org.apache.commons.io.FilenameUtils;

String fileName = "/abc/defg/file.txt";

String basename = FilenameUtils.getBaseName(fileName);
String extension = FilenameUtils.getExtension(fileName);
System.out.println(basename); // file
System.out.println(extension); // txt (NOT ".txt" !)

Source: http://www.java2s.com/Code/Java/File-Input-Output/Getextensionpathandfilename.htm

A utility class such as this:

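class Filename {
  private final String fullPath;
  private final char pathSeparator, extensionSeparator;

  public Filename(String str, char sep, char ext) {
    fullPath = str;
    pathSeparator = sep;
    extensionSeparator = ext;
  }

  // Index of the extension separator within the last path segment, or -1 if there is none.
  private int dotIndex() {
    int dot = fullPath.lastIndexOf(extensionSeparator);
    return dot > fullPath.lastIndexOf(pathSeparator) ? dot : -1;
  }

  public String extension() { // "" when there is no extension
    int dot = dotIndex();
    return dot == -1 ? "" : fullPath.substring(dot + 1);
  }

  public String filename() { // gets filename without extension
    int dot = dotIndex();
    int sep = fullPath.lastIndexOf(pathSeparator);
    return fullPath.substring(sep + 1, dot == -1 ? fullPath.length() : dot);
  }

  public String path() { // "" when there is no path separator
    int sep = fullPath.lastIndexOf(pathSeparator);
    return sep == -1 ? "" : fullPath.substring(0, sep);
  }
}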

Usage:

public class FilenameDemo {
  public static void main(String[] args) {
    final String FPATH = "/home/mem/index.html";
    Filename myHomePage = new Filename(FPATH, '/', '.');
    System.out.println("Extension = " + myHomePage.extension());
    System.out.println("Filename = " + myHomePage.filename());
    System.out.println("Path = " + myHomePage.path());
  }
}

Maybe you could use String#split

To answer your comment:

A filename can contain more than one dot, but even then you can use split. Consider, for example:

String input = "boo.and.foo";

String[] result = input.split("\\."); // split() takes a regex, so the dot must be escaped; an unescaped "." matches every character

This will return an array containing:

{ "boo", "and", "foo" }

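So you will know that the last element of the array is the extension and all the others, rejoined with dots, form the base. If the name contains no dot, the array has a single element and there is no extension.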
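Putting the pieces back together could look like the sketch below. The class and method names (SplitRejoinDemo, baseAndExtension) are hypothetical, and String.join assumes Java 8 or later.

```java
import java.util.Arrays;

class SplitRejoinDemo {
    /** Returns {base, extension} by splitting on every dot and rejoining all but the last piece. */
    static String[] baseAndExtension(String input) {
        // The dot is escaped because split() takes a regex.
        // The limit of -1 keeps trailing empty strings, so "archive." yields an empty extension.
        String[] parts = input.split("\\.", -1);
        if (parts.length == 1) {
            return new String[] {input, ""}; // no dot at all
        }
        String base = String.join(".", Arrays.copyOf(parts, parts.length - 1));
        return new String[] {base, parts[parts.length - 1]};
    }
}
```

For "boo.and.foo" this gives the base "boo.and" and the extension "foo".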