Cloning - top of Musings page

By: Nicholas Duchon


Cloning a Linked List

"clone" - this name comes from the method in the Object class in Java, the top of the class hierarchy in Java, so there is pretty much nothing to be gained from complaining about this word. We can and should understand how this method works, and how overrides of it can be used to implement a desired level of cloning data structures. What makes this concept complicated in languages that support dynamic memory (such as pretty much all modern languages) is that each data structure (class instance) can have pointers to other structures, so the interpretation of cloning gets quite complicated.

Let's start with this original linked list:

Here's what happens if the List object (list) is cloned, without doing any other cloning. 

 

Here's the cloning of the list of Nodes, but notice that the underlying objects (a, b, c, d, e) are not duplicated. I would call this a 2nd-level clone. This cloning would be done in the Node class with code something like:

 

// clone in List:
Object clone () {
Node head = head.clone();
return this;
} // end clone in List

// clone in Node
clone () {
Node n = next.clone();
return this;
} // end clone in Node

 

Here's a full clone of the entire data structure:


References to data?

This is an interesting question, and presents a deep question in how data structures work. There are two concepts:

1. The data structure contains the data. Let's call this an internal data structure.

2. The data structure contains pointers or references to the data. Let's call this a external data structure.

Both approaches have their uses and limitations. Let me explore a few issues which can be used to pick one over the other in particular applications.

The internal structure is useful in a parallel or concurrent processing scenario. Here, the various threads of execution each have their own copy of the structure and it is easy to be sure they don't step on each other. The main problem is that one thread may change the structure through insert and remove operations while the other thread is iterating - this can be VERY bad.

The external structure is useful when a set of data can be effectively sorted on a number of different keys, or when the data itself is large (say images or documents - the references are MUCH smaller than the data). The second point is that making a copy of a lot of data can put a serious strain on computing resources, such as RAM space. Those of you who have used data base programs such as MS Access, Oracle, SQLServer, MySQL or PostGreSQL, will probably have used more than one key on a table. Muliple keys directly correspond to using multiple extrinsic structures on a set of data.

One of the main problems with external structures is concurrent access to the structure. Having one thread modify the structure while another is iterating is a real headache, and is particularly hard to diagnose and debug. This can be addressed by cloning the structure, without cloning the underlying data set. But the problems in concurrent programming are much deeper than just this one concern - so BE WARNED!

Cloning - The internal approach is best handled by using clone methods. The external approach has no direct need for cloning. In fact, the external paradigm rather assumes that objects are NOT cloned.

ND.