[java] Deep copy, shallow copy, clone

I need clarification on the differences between deep copy, shallow copy, and clone in Java.

This question is related to java clone

The answer is


Unfortunately, "shallow copy", "deep copy" and "clone" are all rather ill-defined terms.


In the Java context, we first need to make a distinction between "copying a value" and "copying an object".

int a = 1;
int b = a;     // copying a value
int[] s = new int[]{42};
int[] t = s;   // copying a value (the object reference for the array above)

StringBuffer sb = new StringBuffer("Hi mom");
StringBuffer sb2 = new StringBuffer(sb);   // copying an object

In short, an assignment of a reference to a variable whose type is a reference type is "copying a value" where the value is the object reference. To copy an object, something needs to use new, either explicitly or under the hood.
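A minimal sketch of that distinction, reusing the array and StringBuffer from the snippet above (the class name CopyDemo is just for illustration). A change made through a copied reference is visible through the original variable. A change made to a copied object is not.

```java
public class CopyDemo {
    public static void main(String[] args) {
        int[] s = new int[]{42};
        int[] t = s;          // copies the reference: s and t name the same array
        t[0] = 7;
        System.out.println(s[0]);        // 7: the change is visible through s

        StringBuffer sb = new StringBuffer("Hi mom");
        StringBuffer sb2 = new StringBuffer(sb); // a new object with the same contents
        sb2.append("!");
        System.out.println(sb);          // Hi mom: the original is unchanged
        System.out.println(sb == sb2);   // false: two distinct objects
    }
}
```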


Now for "shallow" versus "deep" copying of objects. Shallow copying generally means copying only one level of an object, while deep copying generally means copying more than one level. The problem is in deciding what we mean by a level. Consider this:

public class Example {
    public int foo;
    public int[] bar;
    public Example() { }
    public Example(int foo, int[] bar) { this.foo = foo; this.bar = bar; }
}

Example eg1 = new Example(1, new int[]{1, 2});
Example eg2 = ... 

The normal interpretation is that a "shallow" copy of eg1 would be a new Example object whose foo equals 1 and whose bar field refers to the same array as in the original; e.g.

Example eg2 = new Example(eg1.foo, eg1.bar);

The normal interpretation of a "deep" copy of eg1 would be a new Example object whose foo equals 1 and whose bar field refers to a copy of the original array; e.g.

Example eg2 = new Example(eg1.foo, Arrays.copyOf(eg1.bar, eg1.bar.length));
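To see the practical difference, here is a small sketch built around the same Example class. The helper names shallowCopy and deepCopy are hypothetical, introduced only for illustration. After the original's array is mutated, the shallow copy sees the change because it shares the array, while the deep copy does not.

```java
import java.util.Arrays;

public class ShallowDeepDemo {
    // Same shape as the Example class above.
    static class Example {
        public int foo;
        public int[] bar;
        public Example(int foo, int[] bar) { this.foo = foo; this.bar = bar; }
    }

    // Hypothetical helper: new Example object, but bar refers to the SAME array.
    static Example shallowCopy(Example e) {
        return new Example(e.foo, e.bar);
    }

    // Hypothetical helper: new Example object AND a new copy of the array.
    static Example deepCopy(Example e) {
        return new Example(e.foo, Arrays.copyOf(e.bar, e.bar.length));
    }

    public static void main(String[] args) {
        Example eg1 = new Example(1, new int[]{1, 2});
        Example shallow = shallowCopy(eg1);
        Example deep = deepCopy(eg1);

        eg1.bar[0] = 99;                     // mutate the original's array
        System.out.println(shallow.bar[0]);  // 99: shared array
        System.out.println(deep.bar[0]);     // 1: independent array
    }
}
```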

(People coming from a C / C++ background might say that a reference assignment produces a shallow copy. However, that's not what we normally mean by shallow copying in the Java context ...)

Two more questions / areas of uncertainty exist:

  • How deep is deep? Does it stop at two levels? Three levels? Does it mean the whole graph of connected objects?

  • What about encapsulated data types; e.g. a String? A String is actually not just one object. It is an object with some scalar fields and a reference to an internal array holding its characters (a char[] in older JDKs, a byte[] since the compact strings change in Java 9). However, that array is completely hidden by the API. So, when we talk about copying a String, does it make sense to call it a "shallow" copy or a "deep" copy? Or should we just call it a copy?
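The "how deep is deep?" question shows up even with plain arrays. In the sketch below (class and method names are hypothetical), calling clone() on an int[][] copies only the outer array, so the rows are still shared. Copying one level further means cloning every row as well. Whether either of these counts as "deep" depends on what you consider a level.

```java
public class DepthDemo {
    // One level: a new outer array, but the row arrays are shared with the original.
    static int[][] oneLevelCopy(int[][] grid) {
        return grid.clone();
    }

    // Two levels: a new outer array AND a new copy of every row.
    // Assumes no row is null.
    static int[][] twoLevelCopy(int[][] grid) {
        int[][] copy = new int[grid.length][];
        for (int i = 0; i < grid.length; i++) {
            copy[i] = grid[i].clone();
        }
        return copy;
    }

    public static void main(String[] args) {
        int[][] grid = {{1, 2}, {3, 4}};
        int[][] a = oneLevelCopy(grid);
        int[][] b = twoLevelCopy(grid);
        grid[0][0] = 99;
        System.out.println(a[0][0]); // 99: row 0 is shared
        System.out.println(b[0][0]); // 1: row 0 was copied
    }
}
```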


Finally, clone. Clone is a method that exists on all classes (and arrays) that is generally thought to produce a copy of the target object. However:

  • The specification of this method deliberately does not say whether this is a shallow or deep copy (assuming that is a meaningful distinction).

  • In fact, the specification does not even specifically state that clone produces a new object.

Here's what the javadoc says:

"Creates and returns a copy of this object. The precise meaning of "copy" may depend on the class of the object. The general intent is that, for any object x, the expression x.clone() != x will be true, and that the expression x.clone().getClass() == x.getClass() will be true, but these are not absolute requirements. While it is typically the case that x.clone().equals(x) will be true, this is not an absolute requirement."

Note that this says that at one extreme the clone might be the target object itself, and at the other extreme the clone might not even equal the original. And all of this assumes that clone is supported at all.
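A short sketch of both points, with hypothetical class names. Object.clone() throws CloneNotSupportedException unless the class implements Cloneable. And even a successful clone need not be equals() to its source: Point below does not override equals(), so it falls back to identity comparison and the copy is not "equal" to the original.

```java
public class CloneSupportDemo {
    // Does not implement Cloneable, so Object.clone() refuses to copy it.
    static class NotCloneable {
        Object tryClone() throws CloneNotSupportedException {
            return super.clone();
        }
    }

    // Implements Cloneable, so Object.clone() makes a field-by-field copy.
    static class Point implements Cloneable {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public Point clone() {
            try {
                return (Point) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: Point implements Cloneable
            }
        }
        // equals() is deliberately NOT overridden, so it compares identity.
    }

    public static void main(String[] args) {
        try {
            new NotCloneable().tryClone();
            System.out.println("cloned");
        } catch (CloneNotSupportedException e) {
            System.out.println("clone not supported");
        }

        Point p = new Point(1, 2);
        Point q = p.clone();
        System.out.println(q != p);                       // true
        System.out.println(q.getClass() == p.getClass()); // true
        System.out.println(q.equals(p));                  // false
    }
}
```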

In short, clone potentially means something different for every Java class.


Some people argue (as @supercat does in comments) that the Java clone() method is broken. But I think the correct conclusion is that the concept of clone is broken in the context of OO. AFAIK, it is impossible to develop a unified model of cloning that is consistent and usable across all object types.


The term "clone" is ambiguous (though the Java class library includes a Cloneable interface) and can refer to a deep copy or a shallow copy. Deep/shallow copies are not specifically tied to Java; they are a general concept relating to making a copy of an object, and refer to how the members of an object are also copied.

As an example, let's say you have a person class:

class Person {
    String name;
    List<String> emailAddresses;
}

How do you clone objects of this class? If you are performing a shallow copy, you might copy name and put a reference to emailAddresses in the new object. But if you modified the contents of the emailAddresses list, you would be modifying the list in both copies (since that's how object references work).

A deep copy would mean that you recursively copy every member, so you would need to create a new List for the new Person, and then copy the contents from the old to the new object.
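A minimal sketch of both options for Person. The constructor and the shallowCopy/deepCopy methods are hypothetical additions for illustration. The deep variant creates a new ArrayList but still shares the String elements, which is safe only because String is immutable.

```java
import java.util.ArrayList;
import java.util.List;

public class PersonCopyDemo {
    static class Person {
        String name;
        List<String> emailAddresses;

        Person(String name, List<String> emailAddresses) {
            this.name = name;
            this.emailAddresses = emailAddresses;
        }

        // Shallow: the new Person shares the same List object.
        Person shallowCopy() {
            return new Person(name, emailAddresses);
        }

        // Deep: the new Person gets its own List holding the same (immutable) Strings.
        Person deepCopy() {
            return new Person(name, new ArrayList<>(emailAddresses));
        }
    }

    public static void main(String[] args) {
        List<String> emails = new ArrayList<>();
        emails.add("ann@example.com");
        Person p = new Person("Ann", emails);

        Person shallow = p.shallowCopy();
        Person deep = p.deepCopy();
        p.emailAddresses.add("ann@work.example");

        System.out.println(shallow.emailAddresses.size()); // 2: sees the new address
        System.out.println(deep.emailAddresses.size());    // 1: has its own list
    }
}
```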

Although the above example is trivial, the differences between deep and shallow copies are significant and can have a major impact on an application, especially if you are trying to devise a generic clone method in advance, without knowing how someone might use it later. There are times when you need deep or shallow semantics, or some hybrid where you deep copy some members but not others.


  • Deep copy: Clone this object and every reference to every other object it has
  • Shallow copy: Clone this object and keep its references
  • Object clone() throws CloneNotSupportedException: It is not specified whether this should return a deep or shallow copy. The javadoc states only a "general intent" that o.clone() != o, and even that is not an absolute requirement

The terms "shallow copy" and "deep copy" are a bit vague; I would suggest using the terms "memberwise clone" and what I would call a "semantic clone". A "memberwise clone" of an object is a new object of the same run-time type as the original in which, for every field, the system effectively performs "newObject.field = oldObject.field". The base Object.clone() performs a memberwise clone. Memberwise cloning is generally the right starting point for cloning an object, but in most cases some "fixup work" will be required afterwards. In many cases, attempting to use an object produced via memberwise clone without first performing the necessary fixup will cause bad things to happen, including corruption of the object that was cloned and possibly of other objects as well. Some people use the term "shallow cloning" to refer to memberwise cloning, but that's not the only use of the term.
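A sketch of "memberwise clone, then fix up", using a hypothetical IntStack class. super.clone() copies the fields as-is, so the copy initially shares the items array with the original. Without the fixup line, pushing onto the copy would overwrite an element of the original's array, which is exactly the kind of corruption described above.

```java
import java.util.Arrays;

public class FixupDemo {
    // Hypothetical stack of ints backed by a growable array.
    static class IntStack implements Cloneable {
        private int[] items = new int[4];
        private int size;

        void push(int v) {
            if (size == items.length) items = Arrays.copyOf(items, size * 2);
            items[size++] = v;
        }

        int pop() { return items[--size]; }

        @Override
        public IntStack clone() {
            try {
                IntStack copy = (IntStack) super.clone(); // memberwise: copy.items == this.items
                copy.items = items.clone();               // fixup: give the copy its own array
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: IntStack implements Cloneable
            }
        }
    }

    public static void main(String[] args) {
        IntStack a = new IntStack();
        a.push(1);
        a.push(2);
        IntStack b = a.clone();
        b.pop();
        b.push(42);                 // writes into b's own array only
        System.out.println(a.pop()); // 2 (would be 42 without the fixup)
    }
}
```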

A "semantic clone" is an object which contains the same data as the original, from the point of view of the type. For example, consider a BigList<T> which contains an Array<Array<T>> and a count. A semantic-level clone of such an object would perform a memberwise clone, then replace the Array<Array<T>> with a new array, create new nested arrays, and copy all of the T's from the original arrays to the new ones. It would not attempt any sort of deep-cloning of the T's themselves. Ironically, some people refer to this kind of cloning as "shallow cloning", while others call it "deep cloning". Not exactly useful terminology.
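A possible Java rendering of that BigList example, under stated assumptions: the class, its chunk size, and its methods are all hypothetical, and since Java cannot create generic arrays the chunks are stored as Object[][]. The clone starts from super.clone(), then allocates a new outer array and new nested arrays, but leaves the T elements shared. Replacing an element in the copy does not affect the original, while mutating a shared element is visible through both.

```java
import java.util.Arrays;

public class BigListDemo {
    // Hypothetical chunked list: an array of fixed-size chunks plus a count.
    static class BigList<T> implements Cloneable {
        private static final int CHUNK = 4;
        private Object[][] chunks = new Object[0][];
        private int count;

        void add(T item) {
            if (count % CHUNK == 0) { // all existing chunks are full: append a new one
                chunks = Arrays.copyOf(chunks, chunks.length + 1);
                chunks[chunks.length - 1] = new Object[CHUNK];
            }
            chunks[count / CHUNK][count % CHUNK] = item;
            count++;
        }

        @SuppressWarnings("unchecked")
        T get(int i) { return (T) chunks[i / CHUNK][i % CHUNK]; }

        void set(int i, T item) { chunks[i / CHUNK][i % CHUNK] = item; }

        int size() { return count; }

        // Semantic clone: memberwise clone, then new outer and nested arrays.
        // The T elements themselves are NOT cloned; both lists refer to them.
        @Override
        @SuppressWarnings("unchecked")
        public BigList<T> clone() {
            try {
                BigList<T> copy = (BigList<T>) super.clone();
                copy.chunks = new Object[chunks.length][];
                for (int i = 0; i < chunks.length; i++) {
                    copy.chunks[i] = chunks[i].clone();
                }
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    public static void main(String[] args) {
        BigList<StringBuilder> a = new BigList<>();
        for (int i = 0; i < 5; i++) a.add(new StringBuilder("x" + i));
        BigList<StringBuilder> b = a.clone();

        b.set(0, new StringBuilder("replaced")); // structural change: not seen by a
        b.get(1).append("!");                    // shared element: seen by a
        System.out.println(a.get(0));            // x0
        System.out.println(a.get(1));            // x1!
    }
}
```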

While there are cases where truly deep cloning (recursively copying all mutable types) is useful, it should only be performed by types whose constituents are designed for such an architecture. In many cases, truly deep cloning is excessive, and it may interfere with situations where what's needed is in fact an object whose visible contents refer to the same objects as another (i.e. a semantic-level copy). In cases where the visible contents of an object are recursively derived from other objects, a semantic-level clone would imply a recursive deep clone, but in cases where the visible contents are just some generic type, code shouldn't blindly deep-clone everything that looks like it might possibly be deep-clone-able.