When should I use mmap for file access?

Question

POSIX environments provide at least two ways of accessing files. There's the standard system calls open(), read(), write(), and friends, but there's also the option of using mmap() to map the file into virtual memory. When is it preferable to use one over the other? What are their individual advantages that merit including two interfaces?

User · Answer

Memory mapping can be considerably faster than traditional I/O. It lets the operating system read data from the source file lazily, as the pages of the memory-mapped file are touched. The mapping starts out as pages that fault on first access; the OS catches each fault and loads the corresponding data from the file automatically.

This is the same mechanism the kernel uses for paging, and it is usually optimized for high-speed I/O because data is read on system page boundaries and in page-sized units (usually 4 KB), a size for which most file system caches are optimized.
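
To make the fault-on-first-touch behaviour concrete, here is a minimal sketch in C. The helper name `touch_each_page` is made up for illustration, and the code assumes a regular, non-empty file; it maps the file read-only and reads one byte per page, so each read may trigger the page fault described above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and touch one byte per page.
 * Returns the number of pages touched, or -1 on error (including empty
 * files, which cannot be mapped with a zero length). */
long touch_each_page(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;

    long page = sysconf(_SC_PAGESIZE);   /* ask the system, do not assume 4 KB */
    if (page <= 0) {
        close(fd);
        return -1;
    }

    /* Setting up the mapping does not copy the file into a user buffer. */
    const unsigned char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                           /* the mapping stays valid after close() */
    if (map == MAP_FAILED)
        return -1;

    volatile unsigned char sink = 0;     /* keeps the reads from being optimized out */
    long pages = 0;
    for (size_t off = 0; off < len; off += (size_t)page) {
        sink ^= map[off];                /* first touch of a page may fault it in */
        pages++;
    }
    (void)sink;

    munmap((void *)map, len);
    return pages;
}
```

Whether a given touch really goes to disk depends on what is already in the page cache and on kernel read-ahead, so treat the page count as an upper bound on faults rather than a measurement.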

User · Answer

An advantage that isn't listed yet is the ability of mmap() to keep a read-only mapping as clean pages. If one allocates a buffer in the process's address space, then uses read() to fill the buffer from a file, the memory pages corresponding to that buffer are now dirty since they have been written to.

Dirty pages cannot be dropped from RAM by the kernel. If there is swap space, then they can be paged out to swap. But this is costly, and on some systems, such as small embedded devices with only flash memory, there is no swap at all. In that case, the buffer will be stuck in RAM until the process exits, or perhaps gives it back with madvise().

mmap() pages that have not been written to are clean. If the kernel needs RAM, it can simply drop them and use the RAM the pages were in. If the process that had the mapping accesses it again, it causes a page fault and the kernel reloads the pages from the file they came from originally, the same way they were populated in the first place.

This doesn't require more than one process using the mapped file to be an advantage.
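
A rough side-by-side of the two approaches, as a sketch only. The function names `sum_with_read` and `sum_with_mmap` are invented for this example. Both compute the same byte sum; the difference is where the data lives: the first fills a heap buffer (pages the process has written to, hence dirty), the second only reads from a PROT_READ mapping (pages that stay clean and file-backed).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open `path` read-only and report its size; returns the fd or -1.
 * Empty files are rejected because a zero-length mmap() fails. */
static int open_sized(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    *len = (size_t)st.st_size;
    return fd;
}

static unsigned long sum_bytes(const unsigned char *p, size_t n)
{
    unsigned long s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Variant A: read() into a malloc'd buffer. The buffer pages are dirty. */
int sum_with_read(const char *path, unsigned long *out)
{
    size_t len;
    int fd = open_sized(path, &len);
    if (fd < 0)
        return -1;
    unsigned char *buf = malloc(len);
    size_t got = 0;
    while (buf != NULL && got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n <= 0)
            break;                       /* error or unexpected end of file */
        got += (size_t)n;
    }
    close(fd);
    int rc = -1;
    if (buf != NULL && got == len) {
        *out = sum_bytes(buf, len);
        rc = 0;
    }
    free(buf);
    return rc;
}

/* Variant B: read-only mapping. The pages are never written, so the kernel
 * may drop them under memory pressure and re-read them from the file. */
int sum_with_mmap(const char *path, unsigned long *out)
{
    size_t len;
    int fd = open_sized(path, &len);
    if (fd < 0)
        return -1;
    const unsigned char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;
    *out = sum_bytes(map, len);
    munmap((void *)map, len);
    return 0;
}
```

The sketch does not call madvise(); handing a dirty buffer back that way is platform-specific and is left out here on purpose.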

User · Answer

In addition to other nice answers, a quote from Linux System Programming, written by Google's expert Robert Love:

Advantages of mmap()

Manipulating files via mmap() has a handful of advantages over the standard read() and write() system calls. Among them are:

- Reading from and writing to a memory-mapped file avoids the extraneous copy that occurs when using the read() or write() system calls, where the data must be copied to and from a user-space buffer.
- Aside from any potential page faults, reading from and writing to a memory-mapped file does not incur any system call or context switch overhead. It is as simple as accessing memory.
- When multiple processes map the same object into memory, the data is shared among all the processes. Read-only and shared writable mappings are shared in their entirety; private writable mappings have their not-yet-COW (copy-on-write) pages shared.
- Seeking around the mapping involves trivial pointer manipulations. There is no need for the lseek() system call.

For these reasons, mmap() is a smart choice for many applications.

Disadvantages of mmap()

There are a few points to keep in mind when using mmap():

- Memory mappings are always an integer number of pages in size. Thus, the difference between the size of the backing file and an integer number of pages is "wasted" as slack space. For small files, a significant percentage of the mapping may be wasted. For example, with 4 KB pages, a 7 byte mapping wastes 4,089 bytes.
- The memory mappings must fit into the process' address space. With a 32-bit address space, a very large number of various-sized mappings can result in fragmentation of the address space, making it hard to find large free contiguous regions. This problem, of course, is much less apparent with a 64-bit address space.
- There is overhead in creating and maintaining the memory mappings and associated data structures inside the kernel. This overhead is generally obviated by the elimination of the double copy mentioned in the previous section, particularly for larger and frequently accessed files.

For these reasons, the benefits of mmap() are most greatly realized when the mapped file is large (and thus any wasted space is a small percentage of the total mapping), or when the total size of the mapped file is evenly divisible by the page size (and thus there is no wasted space).
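
The slack-space arithmetic from the quote (7 bytes in a 4 KB page leaves 4,089 bytes unused) can be written down as a small helper. The name `mapping_slack` is my own, not something from the book or from POSIX.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes of slack at the end of the last page when a
 * file of `file_size` bytes is mapped with pages of `page_size` bytes. */
size_t mapping_slack(size_t file_size, size_t page_size)
{
    assert(page_size > 0);
    size_t rem = file_size % page_size;   /* bytes used in the last page */
    return rem == 0 ? 0 : page_size - rem;
}
```

In real code the page size would come from sysconf(_SC_PAGESIZE) rather than a hard-coded 4096, since it differs between systems.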

User · Answer

mmap is great if you have multiple processes accessing data in a read-only fashion from the same file, which is common in the kind of server systems I write. mmap allows all those processes to share the same physical memory pages, saving a lot of memory.

mmap also allows the operating system to optimize paging operations. For example, consider two programs: program A, which reads a 1 MB file into a buffer created with malloc, and program B, which mmaps the 1 MB file into memory. If the operating system has to swap part of A's memory out, it must write the contents of the buffer to swap before it can reuse the memory. In B's case, any unmodified mmap'd pages can be reused immediately because the OS knows how to restore them from the existing file they were mmap'd from. (The OS can detect which pages are unmodified by initially marking writable mmap'd pages as read-only and catching the resulting write faults, similar to a copy-on-write strategy.)

mmap is also useful for inter-process communication. You can mmap a file as read/write in the processes that need to communicate and then use synchronization primitives in the mmap'd region (this is what the MAP_HASSEMAPHORE flag is for).

One place mmap can be awkward is if you need to work with very large files on a 32-bit machine. This is because mmap has to find a contiguous block of addresses in your process's address space that is large enough to fit the entire range of the file being mapped. This can become a problem if your address space becomes fragmented, where you might have 2 GB of address space free, but no individual range of it can fit a 1 GB file mapping. In this case you may have to map the file in smaller chunks than you would like to make it fit.

Another potential awkwardness with mmap as a replacement for read/write is that you have to start your mapping at an offset that is a multiple of the page size. If you just want to get some data at offset X, you will need to fix up that offset so it's compatible with mmap.

And finally, read/write are the only way you can work with some types of files. mmap can't be used on things like pipes and ttys.
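
The offset fix-up mentioned above might look roughly like this. It is a sketch under assumptions: `copy_range_mmap` is a hypothetical helper, `fd` refers to a regular file, and the requested range must lie inside the file. The idea is to round the offset down to a page boundary, map a slightly larger window, and skip the extra bytes at the front.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes starting at file offset `off`
 * into `dst` through a page-aligned mapping. Returns 0 on success, -1 on
 * error or if the range is not entirely inside the file. */
int copy_range_mmap(int fd, off_t off, size_t len, void *dst)
{
    assert(dst != NULL && off >= 0);
    if (len == 0)
        return 0;

    struct stat st;
    if (fstat(fd, &st) != 0 || off > st.st_size ||
        (unsigned long long)len > (unsigned long long)(st.st_size - off))
        return -1;                        /* avoid touching pages past EOF */

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    off_t aligned = off - (off % (off_t)page);  /* round down to a page boundary */
    size_t delta = (size_t)(off - aligned);     /* bytes to skip inside the window */

    unsigned char *map = mmap(NULL, len + delta, PROT_READ, MAP_PRIVATE,
                              fd, aligned);
    if (map == MAP_FAILED)
        return -1;

    memcpy(dst, map + delta, len);
    munmap(map, len + delta);
    return 0;
}
```

For a one-off read of a few bytes this is more work than a plain read at that offset; the fix-up pays off when the window is kept mapped and accessed repeatedly.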

User · Answer

One area where I found mmap() to not be an advantage was when reading small files (under 16 KB). The overhead of page faulting to read the whole file was very high compared with just doing a single read() system call. This is because the kernel can sometimes satisfy a read entirely in your time slice, meaning your code doesn't switch away. With a page fault, it seemed more likely that another program would be scheduled, making the file operation have a higher latency.
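
For comparison, the plain read() path for a small file is short. A minimal sketch, with a made-up helper name `read_small_file`; for a file that fits in the buffer the loop normally finishes after one or two read() calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `cap` bytes of `path` into `buf`.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_small_file(const char *path, void *buf, size_t cap)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t got = 0;
    while (got < cap) {
        ssize_t n = read(fd, (char *)buf + got, cap - got);
        if (n < 0) {                      /* read error */
            close(fd);
            return -1;
        }
        if (n == 0)                       /* end of file */
            break;
        got += (size_t)n;
    }
    close(fd);
    return (ssize_t)got;
}
```

Where the break-even point between this and mmap() lies is something to measure on the target machine; the 16 KB figure above is one person's observation, not a rule.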

User · Answer

mmap has the advantage when you have random access on big files. Another advantage is that you access it with memory operations (memcpy, pointer arithmetic), without bothering with the buffering. Normal I/O can sometimes be quite difficult when using buffers when you have structures bigger than your buffer. The code to handle that is often difficult to get right; mmap is generally easier.

This said, there are certain traps when working with mmap. As people have already mentioned, mmap is quite costly to set up, so it is worth using only above a certain size (varying from machine to machine).

For pure sequential accesses to the file, it is also not always the better solution, though an appropriate call to madvise can mitigate the problem.

You have to be careful with the alignment restrictions of your architecture (SPARC, Itanium); with read/write I/O the buffers are often properly aligned and do not trap when dereferencing a cast pointer.

You also have to be careful that you do not access outside of the map. It can easily happen if you use string functions on your map and your file does not contain a \0 at the end. It will work most of the time when your file size is not a multiple of the page size, as the last page is filled with 0 (the mapped area is always a multiple of your page size in size).
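
Two of these traps (running off the end of the map with string functions, and dereferencing a misaligned cast pointer) can be sidestepped with small bounded helpers. This is only a sketch; `bounded_strlen` and `load_u32` are names invented here, and they work on any byte range, mapped or not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of the string at `map`, never looking past `map_len` bytes.
 * Returns `map_len` if no terminating NUL is found inside the range. */
size_t bounded_strlen(const char *map, size_t map_len)
{
    assert(map != NULL);
    const char *nul = memchr(map, '\0', map_len);
    return nul != NULL ? (size_t)(nul - map) : map_len;
}

/* Load a 32-bit value (host byte order) from byte offset `off`.
 * memcpy() avoids the misaligned access that *(uint32_t *)(p + off)
 * could perform. Returns 0 on success, -1 if out of bounds. */
int load_u32(const void *map, size_t map_len, size_t off, uint32_t *out)
{
    assert(map != NULL && out != NULL);
    if (off > map_len || map_len - off < sizeof *out)
        return -1;
    memcpy(out, (const unsigned char *)map + off, sizeof *out);
    return 0;
}
```

With a real mapping, `map_len` should be the file size from fstat(), not the page-rounded size of the mapping, so that the zero padding in the last page is never mistaken for file content.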
