[c++] Fastest way to check if a file exist using standard C++/C++11/C?

I would like to find the fastest way to check if a file exist in standard C++11, C++, or C. I have thousands of files and before doing something on them I need to check if all of them exist. What can I write instead of /* SOMETHING */ in the following function?

inline bool exist(const std::string& name)
{
    /* SOMETHING */
}

This question is related to c++ c file stream

The answer is


You may also do bool b = std::ifstream('filename').good();. Without the branch instructions(like if) it must perform faster as it needs to be called thousands of times.


all_of (begin(R), end(R), [](auto&p){ exists(p); })

where R is your sequence of path-like things, and exists() is from the future std or current boost. If you roll your own, keep it simple,

bool exists (string const& p) { return ifstream{p}; }

The branched solution isn't absolutely terrible and it won't gobble file descriptors,

bool exists (const char* p) {
    #if defined(_WIN32) || defined(_WIN64)
    return p && 0 != PathFileExists (p);
    #else
    struct stat sb;
    return p && 0 == stat (p, &sb);
    #endif
}

there is only one faster way to check if the file exists and if you have permission to read it the way is using C language wish is faster and can be used also in any version in C++

solution: in C there is a library errno.h which has an external (global) integer variable called errno which contains a number that can be used to recognize the type of error

    #include <stdio.h>
    #include <stdbool.h>
    #include <errno.h>

    bool isFileExist(char fileName[]) {
        FILE *fp = fopen(fileName, "r");
        if (fp) {
            fclose(fp);
            return true;
        }
        return errno != ENOENT;
    }

    bool isFileCanBeRead(char fileName[]) {
        FILE *fp = fopen(fileName, "r");
        if (fp) {
            fclose(fp);
            return true;
        }
        return errno != ENOENT && errno != EPERM;
    }

inline bool exist(const std::string& name)
{
    ifstream file(name);
    if(!file)            // If the file was not found, then file is 0, i.e. !file=1 or true.
        return false;    // The file was not found.
    else                 // If the file was found, then file is non-0.
        return true;     // The file was found.
}

If you need to distinguish between a file and a directory, consider the following which both use stat which the fastest standard tool as demonstrated by PherricOxide:

#include <sys/stat.h>
int FileExists(char *path)
{
    struct stat fileStat; 
    if ( stat(path, &fileStat) )
    {
        return 0;
    }
    if ( !S_ISREG(fileStat.st_mode) )
    {
        return 0;
    }
    return 1;
}

int DirExists(char *path)
{
    struct stat fileStat;
    if ( stat(path, &fileStat) )
    {
        return 0;
    }
    if ( !S_ISDIR(fileStat.st_mode) )
    {
        return 0;
    }
    return 1;
}

Although there are several ways to do this the most efficient solution to your problem would probably be to use one of the fstream's predefined method such as good(). With this method you can check whether the file you've specified exist or not.

fstream file("file_name.txt");

if (file.good()) 
{
    std::cout << "file is good." << endl;
}
else 
{
    std::cout << "file isnt good" << endl;
}

I hope you find this useful.


Remark : in C++14 and as soon as the filesystem TS will be finished and adopted, the solution will be to use:

std::experimental::filesystem::exists("helloworld.txt");

and since C++17, only:

std::filesystem::exists("helloworld.txt");

All of the other answers focus on individually checking every file, but if the files are all in one directory (folder), it might be much more efficient to just read the directory and check for the existence of every file name you want.

This might even be more efficient even if the files are spread across several directories, depends on the exact ratio of directories to files. Once you start getting closer to each target file being in its own directory, or there being lots of other files in the same directories which you don't want to check for, then I'd expect it to finally tip over into being less efficient than checking each file individually.

A good heuristic: working on a bunch of data you already have is much faster than asking the operating system for any amount of data. System call overhead is huge relative to individual machine instructions. So it is almost always going to be faster to ask the OS "give me the entire list of files in this directory" and then to dig through that list, and slower to ask the OS "give me information on this file", "okay now give me information on this other file", "now give me information on ...", and so on.

Every good C library implements its "iterate over all files in a directory" APIs in an efficient way, just like buffered I/O - internally it reads up a big list of directory entries from the OS at once, even though the APIs look like asking the OS for each entry individually.


So if I had this requirement, I would

  1. do everything possible to encourage the design and usage so that all the files were in one folder, and no other files were in that folder,
  2. put the list of file names that I need to be present into a data structure in memory that has O(1) or at least O(log(n)) lookup and delete times (like a hash map or a binary tree),
  3. list the files in that directory, and "check off" (delete) each one as I went from the "list" (hash map or binary tree) in memory.

Except depending on the exact use case, maybe instead of deleting entries from a hash map or tree, I would keep track of a "do I have this file?" boolean for each entry, and figure out a data structure that would make it O(1) to ask "do I have every file?". Maybe a binary tree but the struct for each non-leaf node also has a boolean that is a logical-and of the booleans of its leaf nodes. That scales well - after setting a boolean in a leaf node, you just walk up the tree and set each node's "have this?" boolean with the && of its child node's boolean (and you don't need to recurse down those other child nodes, since if you're doing this process consistently every time you go to set one of the leaves to true, they will be set to true if and only if all of their children are.)


Sadly, there is no standard way to do it until C++17.

C++17 got std::filesystem::directory_iterator.

Of course there is a corresponding boost::filesystem::directory_iterator which I presume will work in older versions of C++.

The closest thing to a standard C way is opendir and readdir from dirent.h. That is a standard C interface, it's just standardized in POSIX and not in the C standard itself. It comes is available out-of-the-box on Mac OS, Linux, all the BSDs, other UNIX/UNIX-like systems, and any other POSIX/SUS system. For Windows, there is a dirent.h implementation that you just have to download and drop into your include path.

However, since you're looking for the fastest way, you might want to look beyond the portable/standard stuff.

On Linux, you might be able to optimize your performance by manually specifying the buffer size with the raw system call getdents64.

On Windows, after a bit of digging, it looks like for maximum performance you want to use FindFirstFileEx with FindExInfoBasic and FIND_FIRST_EX_LARGE_FETCH when you can, which a lot of the open source libraries like the above dirent.h for Windows don't seem to do. But for code that needs to work with stuff older than the last couple Windows versions, you might as well just use the straightforward FindFirstFile without the extra flags.

Plan 9 won't be covered by any of the above, and there you'll want dirread or dirreadall (the latter if you can safely assume you have enough memory for the entire directory contents). If you want more control over the buffer size for performance use plain read or read and decode the directory entry data - they're in a documented machine-independent format and I think there's helper functions provided.

I don't know about any other operating systems.


I might edit this answer with some tests later. Others are welcome to edit in test results as well.


Another 3 options under windows:

1

inline bool exist(const std::string& name)
{
    OFSTRUCT of_struct;
    return OpenFile(name.c_str(), &of_struct, OF_EXIST) != INVALID_HANDLE_VALUE && of_struct.nErrCode == 0;
}

2

inline bool exist(const std::string& name)
{
    HANDLE hFile = CreateFile(name.c_str(), GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile != NULL && hFile != INVALID_HANDLE)
    {
         CloseFile(hFile);
         return true;
    }
    return false;
}

3

inline bool exist(const std::string& name)
{
    return GetFileAttributes(name.c_str()) != INVALID_FILE_ATTRIBUTES;
}

Same as suggested by PherricOxide but in C

#include <sys/stat.h>
int exist(const char *name)
{
  struct stat   buffer;
  return (stat (name, &buffer) == 0);
}

Without using other libraries, I like to use the following code snippet:

#ifdef _WIN32
   #include <io.h> 
   #define access    _access_s
#else
   #include <unistd.h>
#endif

bool FileExists( const std::string &Filename )
{
    return access( Filename.c_str(), 0 ) == 0;
}

This works cross-platform for Windows and POSIX-compliant systems.


For those who like boost:

 boost::filesystem::exists(fileName)

You can use std::ifstream, funcion like is_open, fail, for example as below code (the cout "open" means file exist or not):

enter image description here

enter image description here

cited from this answer


I use this piece of code, it works OK with me so far. This does not use many fancy features of C++:

bool is_file_exist(const char *fileName)
{
    std::ifstream infile(fileName);
    return infile.good();
}

It depends on where the files reside. For instance, if they are all supposed to be in the same directory, you can read all the directory entries into a hash table and then check all the names against the hash table. This might be faster on some systems than checking each file individually. The fastest way to check each file individually depends on your system ... if you're writing ANSI C, the fastest way is fopen because it's the only way (a file might exist but not be openable, but you probably really want openable if you need to "do something on it"). C++, POSIX, Windows all offer additional options.

While I'm at it, let me point out some problems with your question. You say that you want the fastest way, and that you have thousands of files, but then you ask for the code for a function to test a single file (and that function is only valid in C++, not C). This contradicts your requirements by making an assumption about the solution ... a case of the XY problem. You also say "in standard c++11(or)c++(or)c" ... which are all different, and this also is inconsistent with your requirement for speed ... the fastest solution would involve tailoring the code to the target system. The inconsistency in the question is highlighted by the fact that you accepted an answer that gives solutions that are system-dependent and are not standard C or C++.


In C++17 :

#include <experimental/filesystem>

bool is_file_exist(std::string& str) {   
    namespace fs = std::experimental::filesystem;
    fs::path p(str);
    return fs::exists(p);
}

Here is a simple example!

#include <iostream>
#include <fstream>
using namespace std;
    
void main(){
   SearchFile("test.txt");
}

bool SearchFile(const char *file)
{
   ifstream infile(file);
   if (!infile.good())
   {
    // If file is not there
    exit(1);
   }
}

Using MFC it is possible with the following

CFileStatus FileStatus;
BOOL bFileExists = CFile::GetStatus(FileName,FileStatus);

Where FileName is a string representing the file you are checking for existance


I need a fast function that can check if a file is exist or not and PherricOxide's answer is almost what I need except it does not compare the performance of boost::filesystem::exists and open functions. From the benchmark results we can easily see that :

  • Using stat function is the fastest way to check if a file is exist. Note that my results are consistent with that of PherricOxide's answer.

  • The performance of boost::filesystem::exists function is very close to that of stat function and it is also portable. I would recommend this solution if boost libraries is accessible from your code.

Benchmark results obtained with Linux kernel 4.17.0 and gcc-7.3:

2018-05-05 00:35:35
Running ./filesystem
Run on (8 X 2661 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
use_stat          815 ns        813 ns     861291
use_open         2007 ns       1919 ns     346273
use_access       1186 ns       1006 ns     683024
use_boost         831 ns        830 ns     831233

Below is my benchmark code:

#include <string.h>                                                                                                                                                                                                                                           
#include <stdlib.h>                                                                                                                                                                                                                                           
#include <sys/types.h>                                                                                                                                                                                                                                        
#include <sys/stat.h>                                                                                                                                                                                                                                         
#include <unistd.h>                                                                                                                                                                                                                                           
#include <dirent.h>                                                                                                                                                                                                                                           
#include <fcntl.h>                                                                                                                                                                                                                                            
#include <unistd.h>                                                                                                                                                                                                                                           

#include "boost/filesystem.hpp"                                                                                                                                                                                                                               

#include <benchmark/benchmark.h>                                                                                                                                                                                                                              

const std::string fname("filesystem.cpp");                                                                                                                                                                                                                    
struct stat buf;                                                                                                                                                                                                                                              

// Use stat function                                                                                                                                                                                                                                          
void use_stat(benchmark::State &state) {                                                                                                                                                                                                                      
    for (auto _ : state) {                                                                                                                                                                                                                                    
        benchmark::DoNotOptimize(stat(fname.data(), &buf));                                                                                                                                                                                                   
    }                                                                                                                                                                                                                                                         
}                                                                                                                                                                                                                                                             
BENCHMARK(use_stat);                                                                                                                                                                                                                                          

// Use open function                                                                                                                                                                                                                                          
void use_open(benchmark::State &state) {                                                                                                                                                                                                                      
    for (auto _ : state) {                                                                                                                                                                                                                                    
        int fd = open(fname.data(), O_RDONLY);                                                                                                                                                                                                                
        if (fd > -1) close(fd);                                                                                                                                                                                                                               
    }                                                                                                                                                                                                                                                         
}                                                                                                                                                                                                                                                             
BENCHMARK(use_open);                                  
// Use access function                                                                                                                                                                                                                                        
void use_access(benchmark::State &state) {                                                                                                                                                                                                                    
    for (auto _ : state) {                                                                                                                                                                                                                                    
        benchmark::DoNotOptimize(access(fname.data(), R_OK));                                                                                                                                                                                                 
    }                                                                                                                                                                                                                                                         
}                                                                                                                                                                                                                                                             
BENCHMARK(use_access);                                                                                                                                                                                                                                        

// Use boost                                                                                                                                                                                                                                                  
void use_boost(benchmark::State &state) {                                                                                                                                                                                                                     
    for (auto _ : state) {                                                                                                                                                                                                                                    
        boost::filesystem::path p(fname);                                                                                                                                                                                                                     
        benchmark::DoNotOptimize(boost::filesystem::exists(p));                                                                                                                                                                                               
    }                                                                                                                                                                                                                                                         
}                                                                                                                                                                                                                                                             
BENCHMARK(use_boost);                                                                                                                                                                                                                                         

BENCHMARK_MAIN();   

Examples related to c++

Method Call Chaining; returning a pointer vs a reference? How can I tell if an algorithm is efficient? Difference between opening a file in binary vs text How can compare-and-swap be used for a wait-free mutual exclusion for any shared data structure? Install Qt on Ubuntu #include errors detected in vscode Cannot open include file: 'stdio.h' - Visual Studio Community 2017 - C++ Error How to fix the error "Windows SDK version 8.1" was not found? Visual Studio 2017 errors on standard headers How do I check if a Key is pressed on C++

Examples related to c

conflicting types for 'outchar' Can't compile C program on a Mac after upgrade to Mojave Program to find largest and second largest number in array Prime numbers between 1 to 100 in C Programming Language In c, in bool, true == 1 and false == 0? How I can print to stderr in C? Visual Studio Code includePath "error: assignment to expression with array type error" when I assign a struct field (C) Compiling an application for use in highly radioactive environments How can you print multiple variables inside a string using printf?

Examples related to file

Gradle - Move a folder from ABC to XYZ Difference between opening a file in binary vs text Angular: How to download a file from HttpClient? Python error message io.UnsupportedOperation: not readable java.io.FileNotFoundException: class path resource cannot be opened because it does not exist Writing JSON object to a JSON file with fs.writeFileSync How to read/write files in .Net Core? How to write to a CSV line by line? Writing a dictionary to a text file? What are the pros and cons of parquet format compared to other formats?

Examples related to stream

Why does calling sumr on a stream with 50 tuples not complete How to read/write files in .Net Core? How to get the stream key for twitch.tv Download TS files from video stream How to get streaming url from online streaming radio station Save byte array to file How to get error message when ifstream open fails Download large file in python with requests ASP.Net MVC - Read File from HttpPostedFileBase without save Fastest way to check if a file exist using standard C++/C++11/C?