[linux] Quickly create a large file on a Linux system

How can I quickly create a large file on a Linux (Red Hat Linux) system?

dd will do the job, but reading from /dev/zero and writing to the drive can take a long time when you need a file several hundred GB in size for testing... If you need to do that repeatedly, the time really adds up.

I don't care about the contents of the file, I just want it to be created quickly. How can this be done?

Using a sparse file won't work for this. I need the file to be allocated disk space.

This question is related to: linux, file, filesystems

Answers:


truncate -s 10M output.file

will create a 10 M file instantaneously (M stands for 1024*1024 bytes, MB stands for 1000*1000 bytes - same with K, KB, G, GB, ...)

EDIT: as many have pointed out, this will not physically allocate space for the file on your device. With this you could actually create an arbitrarily large file, regardless of the available space on the device, as it creates a "sparse" file.

For example, notice that no HDD space is consumed by this command:

### BEFORE
$ df -h | grep lvm
/dev/mapper/lvm--raid0-lvm0
                      7.2T  6.6T  232G  97% /export/lvm-raid0

$ truncate -s 500M 500MB.file

### AFTER
$ df -h | grep lvm
/dev/mapper/lvm--raid0-lvm0
                      7.2T  6.6T  232G  97% /export/lvm-raid0

So, when doing this, you will be deferring physical allocation until the file is written to. If you're mapping this file to memory, you may not get the expected performance.

But this is still a useful command to know. For example, when benchmarking transfers using files, the specified size of the file will still get transferred.

$ rsync -aHAxvP --numeric-ids --delete --info=progress2 \
       [email protected]:/export/lvm-raid0/500MB.file \
       /export/raid1/
receiving incremental file list
500MB.file
    524,288,000 100%   41.40MB/s    0:00:12 (xfr#1, to-chk=0/1)

sent 30 bytes  received 524,352,082 bytes  38,840,897.19 bytes/sec
total size is 524,288,000  speedup is 1.00
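
Another quick way to see that the file is sparse is to compare its apparent size with the blocks actually allocated (assuming GNU coreutils; the file name matches the example above):

$ ls -lh 500MB.file    # apparent size: 500M
$ du -h 500MB.file     # disk blocks actually in use: 0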

One approach: if you can guarantee unrelated applications won't use the files in a conflicting manner, just create a pool of files of varying sizes in a specific directory, then create links to them when needed.

For example, have a pool of files called:

  • /home/bigfiles/512M-A
  • /home/bigfiles/512M-B
  • /home/bigfiles/1024M-A
  • /home/bigfiles/1024M-B

Then, if you have an application that needs a 1G file called /home/oracle/logfile, execute a "ln /home/bigfiles/1024M-A /home/oracle/logfile".

If it's on a separate filesystem, you will have to use a symbolic link.

The A/B/etc files can be used to ensure there's no conflicting use between unrelated applications.

The link operation is about as fast as you can get.
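
A rough sketch of setting up such a pool, using the paths from the example above (fallocate is assumed to be available; dd or xfs_mkfile would work as well):

$ mkdir -p /home/bigfiles
$ fallocate -l 512M /home/bigfiles/512M-A
$ fallocate -l 1G /home/bigfiles/1024M-A
$ ln /home/bigfiles/1024M-A /home/oracle/logfile    # hard link; use ln -s instead if it's on another filesystem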


This is the fastest I could do (which is not fast) with the following constraints:

  • The goal of the large file is to fill a disk, so it can't be compressible.
  • Using the ext3 filesystem (fallocate is not available).

This is the gist of it...

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int32_t buf[256];                   // 1 KiB block.
    for (int i = 0; i < 256; ++i)
        buf[i] = rand();                // Random so it is not compressible.

    FILE *file = fopen("/file/on/your/system", "wb");
    if (file == NULL)
        return 1;

    int blocksToWrite = 1024 * 1024;    // 1M blocks of 1 KiB each = 1 GiB total.
    for (int i = 0; i < blocksToWrite; ++i)
        fwrite(buf, sizeof(int32_t), 256, file);

    fclose(file);
    return 0;
}

In our case this is for an embedded Linux system, and this works well enough, but I'd prefer something faster.

FYI the command dd if=/dev/urandom of=outputfile bs=1024 count=XX was so slow as to be unusable.
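
If /dev/urandom is still what you need, using a much larger block size alone usually helps a great deal; this is only a sketch and the actual speed depends on your hardware:

$ dd if=/dev/urandom of=outputfile bs=1M count=1024    # 1 GiB of non-compressible data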


You can use "yes" command also. The syntax is fairly simple:

#yes >> myfile

Press "Ctrl + C" to stop this, else it will eat up all your space available.

To clean this file run:

#>myfile

will clean this file.
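
If you would rather not have to hit Ctrl+C at the right moment, you can cap the size by piping through head (assuming GNU head, which accepts size suffixes):

$ yes | head -c 100M > myfile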


Shameless plug: OTFFS is a file system that provides arbitrarily large files (well, almost; exabytes is the current limit) of generated content. It is Linux-only, plain C, and in early alpha.

See https://github.com/s5k6/otffs.


dd if=/dev/zero of=filename bs=1 count=1 seek=1048575

where seek is the size of the file you want in bytes minus 1 (the single byte written at the end brings the file to 1048576 bytes, i.e. 1 MiB). Note that the skipped-over region ends up as a hole, i.e. a sparse file.

You could use https://github.com/flew-software/trash-dump to create a file of any size, filled with random data.

Here's a command you can run after installing trash-dump (creates a 1 GB file):

$ trash-dump --filename="huge" --seed=1232 --noBytes=1000000000

BTW, I created it.


I don't think you're going to get much faster than dd. The bottleneck is the disk; writing hundreds of GB of data to it is going to take a long time no matter how you do it.

But here's a possibility that might work for your application. If you don't care about the contents of the file, how about creating a "virtual" file whose contents are the dynamic output of a program? Instead of open()ing the file, use popen() to open a pipe to an external program. The external program generates data whenever it's needed. Once the pipe is open, the program that opened it can fread() from it much like a regular file, although a pipe is not seekable, so fseek() and rewind() won't work on it. You'll need to use pclose() instead of fclose() when you're done with the pipe.

If your application needs the file to be a certain size, it will be up to the external program to keep track of where in the "file" it is and send an eof when the "end" has been reached.
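
A shell-level sketch of the same idea, using a named pipe (mkfifo) instead of popen(); the path is only an example, and GNU head is assumed for the -c size suffix:

$ mkfifo /tmp/virtual.file
$ head -c 1G /dev/zero > /tmp/virtual.file &    # generator: only produces data once a reader opens the pipe
$ wc -c < /tmp/virtual.file                     # consumer reads 1 GiB that never touches the disk
$ rm /tmp/virtual.file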


dd from the other answers is a good solution, but it is slow for this purpose. In Linux (and other POSIX systems), we have fallocate, which reserves the desired space without actually having to write to it. It works with most modern disk-based file systems and is very fast:

For example:

fallocate -l 10G gentoo_root.img
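
Unlike the sparse-file tricks above, the space here is genuinely reserved, which you can confirm by comparing apparent size and disk usage (assuming GNU coreutils):

$ ls -lh gentoo_root.img    # apparent size: 10G
$ du -h gentoo_root.img     # disk usage is also about 10G, so the file is not sparse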

I don't know a whole lot about Linux, but here's the C code I wrote to fake huge files on DC Share many years ago.

#include <stdio.h>
#include <stdlib.h>

int main() {
    int i;
    FILE *fp;

    fp = fopen("bigfakefile.txt", "w");
    if (fp == NULL)
        return 1;

    for (i = 0; i < (1024 * 1024); i++) {
        /* skip ahead 1 MiB, then write a single byte; the result is a
           sparse file with an apparent size of roughly 1 TiB */
        fseek(fp, (1024 * 1024), SEEK_CUR);
        fprintf(fp, "C");
    }

    fclose(fp);
    return 0;
}

Examples where seek is the size of the file you want in bytes

#kilobytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200K

#megabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200M

#gigabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200G

#terabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200T


From the dd manpage:

BLOCKS and BYTES may be followed by the following multiplicative suffixes: c=1, w=2, b=512, kB=1000, K=1024, MB=1000*1000, M=1024*1024, GB=1000*1000*1000, G=1024*1024*1024, and so on for T, P, E, Z, Y.
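
To see how much the binary and decimal suffixes differ, you can let the shell do the arithmetic (following the suffix table above):

$ echo $((200 * 1024 * 1024))    # 200M  = 209715200 bytes
$ echo $((200 * 1000 * 1000))    # 200MB = 200000000 bytes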


To make a 1 GB file:

dd if=/dev/zero of=filename bs=1G count=1
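
Note that bs=1G makes dd allocate a 1 GiB buffer in memory; if that is a concern, the same amount can be written in smaller blocks (a sketch):

$ dd if=/dev/zero of=filename bs=1M count=1024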


Linux & all filesystems

xfs_mkfile 10240m 10Gigfile

Linux & some filesystems (ext4, xfs, btrfs and ocfs2)

fallocate -l 10G 10Gigfile

OS X, Solaris, SunOS and probably other UNIXes

mkfile 10240m 10Gigfile

HP-UX

prealloc 10Gigfile 10737418240

Explanation

Try mkfile <size> myfile as an alternative to dd. With the -n option the size is noted, but disk blocks aren't allocated until data is written to them. Without the -n option, the space is zero-filled, which means writing to the disk, which means taking time.

mkfile is derived from SunOS and is not available everywhere. Most Linux systems have xfs_mkfile, which works exactly the same way, and not just on XFS file systems despite the name. It's included in xfsprogs (for Debian/Ubuntu) or similarly named packages.

Most Linux systems also have fallocate, which only works on certain file systems (such as btrfs, ext4, ocfs2, and xfs), but is the fastest, as it allocates all the file space (creates non-holey files) but does not initialize any of it.


The GPL mkfile is just a (ba)sh script wrapper around dd; BSD's mkfile just memsets a buffer with non-zero and writes it repeatedly. I would not expect the former to out-perform dd. The latter might edge out dd if=/dev/zero slightly since it omits the reads, but anything that does significantly better is probably just creating a sparse file.

Absent a system call that actually allocates space for a file without writing data (on Linux, fallocate(2) now does exactly this, as other answers note), you might get a small improvement in performance by using ftruncate(2)/truncate(1) to extend the file to the desired size, mmap the file into memory, then write non-zero data to the first bytes of every disk block (use fgetconf to find the disk block size).


This is a common question -- especially in today's environment of virtual machines. Unfortunately, the answer is not as straightforward as one might assume.

dd is the obvious first choice, but dd is essentially a copy, and that forces you to write every block of data (thus, initializing the file contents)... And that initialization is what takes up so much I/O time. (Want to make it take even longer? Use /dev/random instead of /dev/zero! Then you'll use CPU as well as I/O time!) In the end though, dd is a poor choice (though it's essentially the default used by the VM "create" GUIs). E.g.:

dd if=/dev/zero of=./gentoo_root.img bs=4k iflag=fullblock,count_bytes count=10G

truncate is another choice -- and is likely the fastest... But that is because it creates a "sparse file". Essentially, a sparse file is a section of disk that has a lot of the same data, and the underlying filesystem "cheats" by not really storing all of the data, but just "pretending" that it's all there. Thus, when you use truncate to create a 20 GB drive for your VM, the filesystem doesn't actually allocate 20 GB, but it cheats and says that there are 20 GB of zeros there, even though as little as one track on the disk may actually (really) be in use. E.g.:

 truncate -s 10G gentoo_root.img

fallocate is the final -- and best -- choice for use with VM disk allocation, because it essentially "reserves" (or "allocates") all of the space you're seeking, but it doesn't bother to write anything. So, when you use fallocate to create a 20 GB virtual drive space, you really do get a 20 GB file (not a "sparse file"), and you won't have bothered to write anything to it -- which means virtually anything could be in there -- kind of like a brand new disk! E.g.:

fallocate -l 10G gentoo_root.img
