Liskov substitution principle

August 31, 2020

An unsung hero of the SOLID principles.

Derived classes must be substitutable for their base classes.

Liskov substitution principle

Considering each of the SOLID princpiles Liskov substitution principle (LSP) often gets least attention in both theoretical analysis and practical application. Single responsibility invites discussion with its definition of The One reason to change. As a result it has been extensively talked and blogged about. Open-closed principle is well covered by the object-oriented design patterns as many of them are essentially clever tricks for applying the principle. Similarly, there are many popular dependency injection frameworks that implement the dependency inversion. This empowers developers to more easily adhere to those two principles.

That leaves interface segregation and Liskov substitution principle. Of those two I find the former to be easier to abide by. In its definition interface segregation invites the code author to make their interfaces more concise. It's easier to follow because it relates directly to the activity that I, as a code author, am involved in; i.e. writing my interface.

In contrast, Liskov substitution asks me as author of the derived class to make sure it behaves same as the base class (written by someone else) from the perspective of the user code (written by yet another author). It's no just about me and my code, I need to consider two more perspectives. It requires a sort of developer empathy.

This isn't helped by examples that are often used to illustrate the violation of the principle. Classic example is the problem of modeling a square class as a derivative of a more generalized rectangle class. It's a valid example in theory, but is obviously contrived and as such fails to resonate. That's why I was excited when one of my real world bugs boiled down to a violation of Liskov substitution principle.

The bug

I've recently been learning Vala programming language and GTK framework. Vala is a higher-order language, with syntax similar to C# and Java, that compiles down to C. It promises to make targeting the GNOME stack simple. Vala is strongly typed and object-oriented, however this example does not rely on the language itself and can be reproduced in C code as well. GTK is a feature-packed UI toolkit with a rich hierarchy of widgets. It is in the thick of its widgets' inheritance tree that the root of my problem lies.

In my app I needed a container widget that could expand dynamically row by row as I added more child elements to it. Luckily GTK covers just such use-case with its FlowBox component. The FlowBox extends the abstract Container widget which defines the interface for adding and removing child widgets:

Container defines add and remove methods, FlowBox extends from it

This was convenient as I needed to occasionally remove child widgets from the container. Here's a simplified example of what I wanted to achieve:

public class ExampleApp : Gtk.Application {
	protected override void activate () {
		var window = new Gtk.ApplicationWindow (this);
		window.set_default_size (300, 200);

		var hello_world = new Gtk.Label ("Hello, World!");

		var box = new Gtk.FlowBox ();
		box.add (hello_world);
		// removing the child immediately to simplify the example
		box.remove (hello_world);

		window.add (box);
		window.show_all ();
	}

	public static int main (string[] args) {
		var app = new ExampleApp ();
		return app.run (args);
	}
}

Try guessing what this program does:

It opens an empty window.
It opens a window with “Hello, World!” text.
It segfaults.

Spoiler alert: it's option 3:

$ valac --pkg gtk+-3.0 app.vala && ./app
(app:9912): GLib-CRITICAL **: 23:54:22.670: g_sequence_remove: assertion 'iter != NULL' failed
(app:9912): Gtk-CRITICAL **: 23:54:22.670: gtk_widget_show_all: assertion 'GTK_IS_WIDGET (widget)' failed
(app:9912): Gtk-CRITICAL **: 23:54:22.697: gtk_widget_get_visible: assertion 'GTK_IS_WIDGET (widget)' failed
(app:9912): Gtk-CRITICAL **: 23:54:22.701: gtk_widget_get_visible: assertion 'GTK_IS_WIDGET (widget)' failed
(app:9912): Gtk-CRITICAL **: 23:54:22.701: gtk_widget_get_visible: assertion 'GTK_IS_WIDGET (widget)' failed
(app:9912): Gtk-CRITICAL **: 23:54:22.701: gtk_widget_get_visible: assertion 'GTK_IS_WIDGET (widget)' failed
(app:9912): Gtk-CRITICAL **: 23:54:22.701: gtk_widget_get_visible: assertion 'GTK_IS_WIDGET (widget)' failed
(app:9912): Gtk-CRITICAL **: 23:54:22.701: gtk_widget_is_visible: assertion 'GTK_IS_WIDGET (widget)' failed
[1]    9912 segmentation fault  ./app

Flawed box

What went wrong? The first thing I checked was documentation of add and remove methods on the Container class. The latter one was more interesting (full doc available here):

Removes widget from this. widget must be inside this...

This seemed like a reasonable constraint; I can't remove what's not there. But I just put my hello_world label into the box, how could it not be inside? On to the FlowBox docs (available in full here):

A GtkFlowBox positions child widgets in sequence according to its orientation. ... Although a GtkFlowBox must have only FlowBoxChild children, you can add any kind of widget to it via add, and a GtkFlowBoxChild widget will automatically be inserted between the box and the widget.

There it was: FlowBox wrapped my Label, and so it really wasn't hello_world inside the box, although I thought I had added it. That's why program segfaulted when I tried to remove the label.

Liskov principle violation

In this case it's the FlowBox that breaks the substitution principle. Instance of FlowBox cannot be treated as a Container by the user. This one was easy to spot as I was instantiating the box myself so I knew its exact type. The problem would have been worse had add and remove methods been invoked farther away from the instantiation code. It might have been written by another developer who doesn't even know of the FlowBox class at all.

Understanding what caused the issue was satisfying, but I wasn't ready to let go of it just yet. I wanted to understand what the fundamental issue here was. Container in this case could be considered as a generic container, and FlowBox a derived class with a narrower element type. In this case the GTK classes themselves are not parametrized, but my Java trained brain was happy to recognize a familiar pattern!

Java type system is infamous for having invariant generics, meaning List<String> is not a subtype of List<Object>, even thought String is a subtype of Object. This restriction limits developer's options when writing code but it provides greater compile time safety. (Just as a sidenote; Vala is more loose in this case, allowing more assignments at compile time and failing at runtime.) Kotlin language offers more fine-tuned control while maintaining strict compile time checks. It does this at the expanse of language complexity.

Thinking of Kotlin in this context made me suspect that I was still missing something crucial. I generally find Kotlin interesting but overly complex at times. Its handling of generics is one example of that complexity. So perhaps I was too focused on the type system and compiler in this case? And in GTK widgets hierarchy there was no generic type information to begin with.

Reframing the problem

The issue with FlowBox and Container wasn't that type system was too weak to properly define relationship between them. It's simpler than that. The Container defined a special relationship between its add and remove methods. This contract is described in the documentation. It's easily understandable by a human reader, in fact that's how I managed to find the issue. However, it's completely transparent to the type system and the compiler.

Liskov substitution principle is helpful with precisely this type of problems. When working with strongly typed languages it's easy to forget how many expectations and rules for using a piece of code there are. This is to be expected since the compiler tries to check those rules for us. In a perfect world compiled would stop us from creating invalid derived classes in the first place. However, we are not there yet, and perhaps never will be.

There are a lot of situations when contracts are not enforceable by the compiler. Kotlin's experimental contracts feature is an interesting attempt to expand the scope of the compile time checks. But even that covers only a simple form of contract that is related to a single method. It's hard to declaratively express arbitrary contracts involving multiple methods. The Container's add / remove is one example of such contract. Another popular one is a relationship between equals and hashcode methods.

With this in mind the Liskov substitution principle could be reworded as:

Derived class must not break contracts of its base class.

I find this to be an interesting wording from the perspective of an author of a derived class. Specifically:

It drops the user of the code from my consideration. I no longer have to think about how someone else might attempt to use my derived class. There's no more thinking about substitutions.
It points me directly at what I need to pay attention to: it's the contracts defined by the base class. Those contracts might be expressible in the type system, in which case I can rely on the compiler. Or they might be implicit, in which case I need to pay more attention. In this way it's more actionable that the original wording.

Admittedly, this new wording has a problem of defining a contract. In this aspect the original form is far more elegant and precise. It's easier to define what a substitution is.

Another problem is that there's no clear standard for expressing contracts outside a type system. Putting them into code comments or documentation is one way to go about it. Comments are cheap to add initially and human readers can usually understand them relatively easily. On the other hand they are hard to maintain and reading them requires context switching. There are also annotations that can be used for expressing contracts. For example Java's @Nullable and @NotNull placed on methods to describe return values and parameters. Some times the code of a base class is available so contracts might be inferred from reading the source itself.

Wrap up

Having written all of this down I was finally happy with my conclusions. Of course, none of this helped me fix my code. For that I found that if I manually wrapped my child widget into FlowBoxChild then add method will add it as-is, making the remove work.

On the GTK side of things, the next major framework version will no longer include the Container class with add and remove methods. The migration guide suggests using specialized methods on the derived classes to add child widgets.

Latest Kotlin version 1.4 does not include any improvements to the experimental contracts feature introduced in 1.3. And speaking of programing languages, I've intentionally refrained from bringing up the Go contracts proposal in this discussion. It had only dealt with generics and was abandoned in favor of reusing existing language concepts.

Lastly, I've created a git repository with the example app from above. I've also included versions in C and Python. Unsurprisingly, all three versions behave the same. You can find them on GitHub.

If you liked this post and are interested in more similar content you can subscribe to this blog's RSS feed or follow @antolius@qua.name in fediverse. Alternatively, you can tell me all the things I'm wrong about over at @antolius on Mastodon.