Copy-on-Write (CoW) is an optimization strategy used in Swift to enhance performance and memory efficiency when copying value types. The essence of CoW is to share and reuse the same instance of data across different variables or instances until it is modified. But why does it matter? What exactly happens when we decide to copy data, especially considering Swift’s two main types of data models: reference types and value types?

Copying Reference Types

In Swift, when you assign an instance of a reference type (such as a class) to a variable or pass it around, you’re passing a reference to the same instance. As a result, modifying data through one reference alters the data seen through all other references. Under the hood, Swift uses Automatic Reference Counting (ARC) to manage the memory of reference types. ARC tracks the number of active references to each instance. When a reference to an instance is copied, Swift merely increases the object’s reference count instead of copying the object’s data. This behavior is called a shallow copy.

Let’s look at an example:

class Document {
    var title: String
    init(title: String) {
        self.title = title
    }
}

let originalDoc = Document(title: "Original")
let copiedDoc = originalDoc // Shallow copy
copiedDoc.title = "Copied"

print(originalDoc.title) // Outputs "Copied"

Here, modifying copiedDoc’s title also changes originalDoc’s title, because both point to the same instance. But what if changes to the copied object shouldn’t affect the original? In that case, you must explicitly implement the copying logic. This involves creating a new instance and manually copying the data from the original instance to the new one. This is known as a deep copy.

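import Foundation // NSCopying and NSZone are declared in Foundation

extension Document: NSCopying {
    // Returns a new, independent Document with the same title.
    func copy(with zone: NSZone? = nil) -> Any {
        return Document(title: self.title)
    }
}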

let originalDoc = Document(title: "Original")
let copiedDoc = originalDoc.copy() as! Document // Deep copy
copiedDoc.title = "Copied"

print(originalDoc.title) // Outputs "Original"

As you can see, with a deep copy, we explicitly duplicate the Document object, ensuring originalDoc and copiedDoc are entirely independent.

Shallow copying is much faster than deep copying because it copies only the reference to the object, not the object’s contents. This can significantly reduce the time it takes to duplicate objects, especially when dealing with complex or large ones. Shallow copying also conserves memory: since the actual object is not duplicated, no additional memory is allocated for its contents.

Despite its benefits, shallow copying also comes with certain drawbacks. Any modification made through one reference affects all other references. This can lead to unintended side effects if different parts of the codebase are not designed to expect these shared state changes. Furthermore, when you need to modify a copied object without affecting the original, shallow copying falls short.

While deep copies enable safe modifications of copied data without risking unintended side effects on the original object, these operations can be resource-intensive. The process requires allocating memory for the copy and recursively copying every nested object, which can significantly increase the application’s memory footprint. In addition, implementing deep copy functionality can be complex, especially for objects with circular references or complex relationships. Ensuring that every nested object is correctly copied while maintaining the object graph’s integrity requires careful coding and testing.
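To make the cost of nesting concrete, here is a minimal sketch of copying an object that owns other objects. The Notebook and Page classes and their shallowCopy() and deepCopy() methods are hypothetical names invented for this illustration. Note that "shallow" here means the new container still shares its nested objects with the original, and that this sketch does not attempt to handle circular references.

```swift
// Hypothetical types for illustration; they do not appear elsewhere in this article.
final class Page {
    var text: String
    init(text: String) { self.text = text }

    // Creates an independent Page with the same text.
    func deepCopy() -> Page { Page(text: text) }
}

final class Notebook {
    var pages: [Page]
    init(pages: [Page]) { self.pages = pages }

    // New Notebook, but the Page objects are shared with the original.
    func shallowCopy() -> Notebook { Notebook(pages: pages) }

    // New Notebook with a fresh copy of every nested Page.
    func deepCopy() -> Notebook { Notebook(pages: pages.map { $0.deepCopy() }) }
}

let original = Notebook(pages: [Page(text: "Draft")])
let shallow = original.shallowCopy()
let deep = original.deepCopy()

deep.pages[0].text = "Edited in deep copy"
print(original.pages[0].text) // "Draft"

shallow.pages[0].text = "Edited in shallow copy"
print(original.pages[0].text) // "Edited in shallow copy"
```

Every additional level of nesting needs its own copying logic, which is where most of the implementation and testing effort goes.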

So, is there a way to leverage both safe modifications that deep copies offer and the memory efficiency of a shallow copy? The answer is: yes, with value types and Copy-on-Write.

Copying Value Types and CoW

Value types (such as structs and enums, including standard library collections like arrays and dictionaries) are copied when they are assigned to a new variable or passed to a function. Each variable holds its own independent copy of the data.

struct Document {
    var title: String
}

let originalDoc = Document(title: "Original")
var copiedDoc = originalDoc // Creates a copy
copiedDoc.title = "Copied"

print(originalDoc.title) // Outputs "Original"

This behavior contrasts with the reference semantics of classes, and it naturally raises a question: does this mean Swift always duplicates the data? Not necessarily, thanks to Copy-on-Write (CoW).

CoW optimizes the copying of value types by sharing the data between copies and creating a distinct copy only when one of them is about to be modified. When you change the data through one of the copies, Swift first gives that copy its own separate storage in memory, so the modification does not affect the other copies.

Let’s take Swift’s Array as an example. Here’s what happens under the hood:

  • When you create an array, Swift internally uses a reference type to manage the array’s storage. This might seem counterintuitive since arrays are value types, but this is how Swift enables CoW. Initially, the storage’s reference count is 1.
  • When you copy the array, Swift does not immediately duplicate the array’s elements. Instead, it copies the reference to the internal storage, incrementing its reference count. Now, both arrays point to the same data in memory, and the reference count is 2.
  • Before modifying one of the arrays, Swift checks whether the storage is uniquely referenced, the same kind of check that the isKnownUniquelyReferenced(_:) function exposes to your own code. Since the reference count is 2, Swift copies the storage for the array being modified. This step ensures that changes to one array do not affect the other. After the copy, the modified array points to new storage in memory with a reference count of 1. The unmodified array still points to the original storage, whose reference count is back to 1.
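The steps above can be observed from the outside. The following is a minimal sketch, assuming that comparing the base address of each array’s element storage is a good enough stand-in for "same storage". The storageAddress(of:) helper is a hypothetical name introduced here, and the address is used only for comparison, never dereferenced.

```swift
// Hypothetical helper: returns the address of the array's element storage.
// The value is used purely as an identity for comparison.
func storageAddress(of array: [Int]) -> Int {
    array.withUnsafeBufferPointer { Int(bitPattern: $0.baseAddress) }
}

let first = [1, 2, 3]
var second = first // copies the reference to the storage, not the elements

let sharedBeforeWrite = storageAddress(of: first) == storageAddress(of: second)
print(sharedBeforeWrite) // true: both arrays use the same storage

second.append(4) // the storage is not uniquely referenced, so it is copied first

let sharedAfterWrite = storageAddress(of: first) == storageAddress(of: second)
print(sharedAfterWrite) // false: `second` now has its own storage
print(first)  // [1, 2, 3]
print(second) // [1, 2, 3, 4]
```

The copy happens only at the append, which is the first point where the two arrays could otherwise observe each other’s changes.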

Implementing CoW with Custom Types

While Swift’s standard library types like arrays and strings utilize CoW automatically, custom types do not inherit this behavior by default. To demonstrate this, let’s examine a simple example:

struct Person {
    var name: String
    let age: Int
}

var person1 = Person(name: "John", age: 40)
var person2 = person1
withUnsafePointer(to: &person1) { print("\($0)") } // 0x000000010275c240
withUnsafePointer(to: &person2) { print("\($0)") } // 0x000000010275c258

Running the above code snippet in a playground prints two different memory addresses for person1 and person2, even though nothing has been modified. Each variable has its own storage, and the struct’s stored properties are copied into it as soon as the assignment happens. If you want CoW behavior in your custom types, you’ll need to do some manual work. Let’s go through a practical example to see how we can achieve it.

Suppose you have an Image struct representing images in your app. Each image could potentially be copied multiple times across different parts of your app—for displaying in the UI, editing, or applying filters.

We first need to define a reference type to hold image data:

final class ImageData {
    var pixels: [UInt8]
    init(pixels: [UInt8]) {
        self.pixels = pixels
    }
}

Now, the main part, our Image struct with CoW:

struct Image {
    private var imageData: ImageData
    
    init(pixels: [UInt8]) {
        self.imageData = ImageData(pixels: pixels)
    }
    
    private mutating func copyIfNeeded() {
        if !isKnownUniquelyReferenced(&imageData) {
            imageData = ImageData(pixels: imageData.pixels)
        }
    }
    
    mutating func applyFilter() {
        copyIfNeeded()
        // Assume we modify `imageData.pixels` here
    }
}

As you can see, implementing CoW requires a mix of reference and value types. You typically wrap the actual data in a class (to get reference semantics) and then manage access to this data through a struct (to get value semantics). The whole approach relies on reference counting, together with the isKnownUniquelyReferenced(_:) function, to ensure that data is copied only when necessary.
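To see the pattern at work end to end, here is a self-contained sketch that mirrors ImageData and Image under the hypothetical names PixelStorage and Bitmap. The invert() filter, the pixels accessor, and the sharesStorage(with:) helper are assumptions added only to make the behavior observable.

```swift
// Hypothetical names: PixelStorage and Bitmap mirror ImageData and Image above.
final class PixelStorage {
    var pixels: [UInt8]
    init(pixels: [UInt8]) { self.pixels = pixels }
}

struct Bitmap {
    private var storage: PixelStorage

    init(pixels: [UInt8]) {
        self.storage = PixelStorage(pixels: pixels)
    }

    // Read access never triggers a copy.
    var pixels: [UInt8] { storage.pixels }

    // Illustrative helper: true when both values point at the same storage object.
    func sharesStorage(with other: Bitmap) -> Bool {
        storage === other.storage
    }

    // Gives this value private storage if another value still shares it.
    private mutating func copyIfNeeded() {
        if !isKnownUniquelyReferenced(&storage) {
            storage = PixelStorage(pixels: storage.pixels)
        }
    }

    // A stand-in filter that inverts every pixel value.
    mutating func invert() {
        copyIfNeeded()
        for index in storage.pixels.indices {
            storage.pixels[index] = 255 - storage.pixels[index]
        }
    }
}

let original = Bitmap(pixels: [0, 128, 255])
var edited = original
print(edited.sharesStorage(with: original)) // true: nothing has been copied yet

edited.invert()
print(edited.sharesStorage(with: original)) // false: the write forced a copy
print(original.pixels) // [0, 128, 255]
print(edited.pixels)   // [255, 127, 0]
```

Keeping copyIfNeeded() private and calling it at the top of every mutating method is what preserves value semantics; a mutating method that skips the call would silently change every value sharing the storage.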

Practical Applications

Copy-on-Write is one of the key features that lets Swift’s value types offer value semantics without paying for a full copy on every assignment. In practice, there are several scenarios where implementing CoW can be highly beneficial:

  • Applications that handle large images, like photo editing apps, often need to create copies of images for various operations (filters, transformations, etc.). With CoW, the app can avoid the immediate overhead of duplicating large image data until a modification is actually made. This is crucial for performance, especially when working with high-resolution images.
  • Apps that deal with large collections of data, such as a dataset in a data analysis tool, often create subsets of data for processing without always altering them. CoW allows the app to work with different subsets efficiently, only creating actual copies when a particular subset is modified, thus saving memory and processing time.
  • Text editors or graphic design software let users make changes and then undo or redo them. CoW enables efficient handling of document states. Each state is a copy of the document, but actual duplication of data only occurs when changes are made, making undo/redo operations more memory efficient.
  • Applications that implement caching mechanisms, where cached data is often read and occasionally updated. CoW ensures that cached data is shared across various parts of the app without the risk of unintended modifications, while still allowing efficient updates when necessary.

In each of these scenarios, CoW offers a balanced approach to handling data efficiently. It provides the benefits of shared data when unmodified, yet maintains the safety and predictability of data integrity when changes occur.

Thanks for reading! 🚀
