[data-structures] What is copy-on-write?

I would like to know what copy-on-write is and what it is used for? The term 'copy-on-write array' is mentioned several times in the Sun JDK tutorials but I didn't understand what it meant.

This question is related to data-structures copy-on-write

The answer is


I found this good article about zval in PHP, which mentioned COW too:

Copy On Write (abbreviated as ‘COW’) is a trick designed to save memory. It is used more generally in software engineering. It means that PHP will copy the memory (or allocate new memory region) when you write to a symbol, if this one was already pointing to a zval.


It is a memory protection concept. In this compiler creates extra copy to modify data in child and this updated data not reflect in parents data.


Just to provide another example, Mercurial uses copy-on-write to make cloning local repositories a really "cheap" operation.

The principle is the same as the other examples, except that you're talking about physical files instead of objects in memory. Initially, a clone is not a duplicate but a hard link to the original. As you change files in the clone, copies are written to represent the new version.


A good example is Git, which uses a strategy to store blobs. Why does it use hashes? Partly because these are easier to perform diffs on, but also because makes it simpler to optimise a COW strategy. When you make a new commit with few files changes the vast majority of objects and trees will not change. Therefore the commit, will through various pointers made of hashes reference a bunch of object that already exist, making the storage space required to store the entire history much smaller.


"Copy on write" means more or less what it sounds like: everyone has a single shared copy of the same data until it's written, and then a copy is made. Usually, copy-on-write is used to resolve concurrency sorts of problems. In ZFS, for example, data blocks on disk are allocated copy-on-write; as long as there are no changes, you keep the original blocks; a change changed only the affected blocks. This means the minimum number of new blocks are allocated.

These changes are also usually implemented to be transactional, ie, they have the ACID properties. This eliminates some concurrency issues, because then you're guaranteed that all updates are atomic.


It's also used in Ruby 'Enterprise Edition' as a neat way of saving memory.


Copy-on-write is a technique to reduce the memory usage of resource copies using deferred copy. The resource copies are initially virtual (i.e. they share memory) and only become real (i.e. they have their own memory) on the first write operation, hence the name ‘copy-on-write’.

Here after is a Python implementation of the copy-on-write technique using the proxy design pattern. A ValueProxy object (the proxy) implements the copy-on-write technique by:

  • having an attribute bound to an immutable Value object (the subject);
  • translating copy requests to the creation of a new ValueProxy object sharing the same subject attribute as the original ValueProxy object;
  • forwarding read requests to the subject attribute;
  • translating write requests to the creation of a new immutable Value object with the new state and the rebinding of the subject attribute to this new immutable Value object.
import abc

class BaseValue(abc.ABC):
    @abc.abstractmethod
    def read(self):
        raise NotImplementedError
    @abc.abstractmethod
    def write(self, data):
        raise NotImplementedError

class Value(BaseValue):
    def __init__(self, data):
        self.data = data
    def read(self):
        return self.data
    def write(self, data):
        pass

class ValueProxy(BaseValue):
    def __init__(self, subject):
        self.subject = subject
    def read(self):
        return self.subject.read()
    def write(self, data):
        self.subject = Value(data)
    def clone(self):
        return ValueProxy(self.subject)

v1 = ValueProxy(Value('foo'))
v2 = v1.clone()  # shares the immutable Value object between the copies
assert v1.subject is v2.subject
v2.write('bar')  # creates a new immutable Value object with the new state
assert v1.subject is not v2.subject

I shall not repeat the same answer on Copy-on-Write. I think Andrew's answer and Charlie's answer have already made it very clear. I will give you an example from OS world, just to mention how widely this concept is used.

We can use fork() or vfork() to create a new process. vfork follows the concept of copy-on-write. For example, the child process created by vfork will share the data and code segment with the parent process. This speeds up the forking time. It is expected to use vfork if you are performing exec followed by vfork. So vfork will create the child process which will share data and code segment with its parent but when we call exec, it will load up the image of a new executable in the address space of the child process.