Inheritance and composition are two popular methods of code reuse.
Inheritance is when you reuse a class by inheriting from it and extending/modifying it.
Composition is when you reuse a class by creating and storing an instance as a variable or attribute, and interacting with it via its public interface.
Aim: to convince you that inheritance is fundamentally toxic and that composition is almost always the right choice.
Scenario: implement "OrderedDict" ("ODict"): a dict type that maintains key insertion ordering, and exposes methods like getNthInsertion(n), popfirst(), poplast(), …
How it might look:
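The original listing is not reproduced here, so what follows is only a minimal Python sketch of the two approaches. The class names InheritedODict and ComposedODict, the _keys list, and the snake_case method names are assumptions made for illustration, not the author's code. Built-in dicts already preserve insertion order in modern Python; the order is tracked by hand here to make the contrast visible.

```python
class InheritedODict(dict):
    """Inheritance: the ODict *is* a dict and extends it."""

    def __init__(self):
        super().__init__()
        self._keys = []  # insertion order, tracked by hand for illustration

    def __setitem__(self, key, value):
        if key not in self:
            self._keys.append(key)
        super().__setitem__(key, value)

    def get_nth_insertion(self, n):
        return self[self._keys[n]]

    def popfirst(self):
        key = self._keys.pop(0)
        return key, super().pop(key)


class ComposedODict:
    """Composition: the ODict *has* a dict and only uses its public interface."""

    def __init__(self):
        self._data = {}
        self._keys = []

    def __setitem__(self, key, value):
        if key not in self._data:
            self._keys.append(key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def get_nth_insertion(self, n):
        return self._data[self._keys[n]]

    def popfirst(self):
        key = self._keys.pop(0)
        return key, self._data.pop(key)
```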
By OOP theory, this is a valid use of inheritance! Satisfies "is-a" principle and Liskov substitution.
The two approaches look superficially very similar, but they differ in important ways.
These differences cause consequences that make dealing with inheritance-based code a living nightmare.
Consequence 1: Loss of encapsulation ⇒ refactoring is harder
In general, reducing the interconnectedness of systems improves maintainability and comprehensibility
Encapsulation definition 1: hiding the information internal to a system, e.g. by making the implementation details private.
Encapsulation reduces and controls the size of the surface at which the external world can connect and interact with a system, which protects against over-interconnectedness.
Encapsulation applies to structs, classes, modules, services, systems, software teams, …
In composition, ODict connects to base dict only via dict's public interface. And base dict doesn't connect to ODict.
In inheritance, ODict connects to base dict via dict's internal implementation details. And base dict can interact with ODict in unclear/unpredictable ways.
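One concrete way this entanglement shows up, sketched under the assumption of CPython's built-in dict: a subclass that overrides __setitem__ does not intercept update(), because the built-in update() does not route through the override. TrackingDict and TrackingWrapper are hypothetical names used only for this illustration.

```python
class TrackingDict(dict):
    """Inherits from dict and tries to record insertion order in __setitem__."""

    def __init__(self):
        super().__init__()
        self.order = []

    def __setitem__(self, key, value):
        if key not in self:
            self.order.append(key)
        super().__setitem__(key, value)


class TrackingWrapper:
    """Wraps a dict; every write path is defined right here."""

    def __init__(self):
        self._data = {}
        self.order = []

    def __setitem__(self, key, value):
        if key not in self._data:
            self.order.append(key)
        self._data[key] = value

    def update(self, other):
        # All writes go through our own __setitem__, by construction.
        for key, value in other.items():
            self[key] = value

    def __len__(self):
        return len(self._data)


inherited = TrackingDict()
inherited["a"] = 1
inherited.update({"b": 2})  # on CPython, "b" is stored but never recorded

composed = TrackingWrapper()
composed["a"] = 1
composed.update({"b": 2})   # "b" is stored and recorded
```

Whether the subclass works depends on which dict methods happen to call which others internally, and that is not part of dict's public interface.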
Interconnectedness is the mortal enemy of refactoring: the difficulty of refactoring is proportional to the amount of interconnectedness of a system, and inheritance leads to more interconnectedness.
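A small sketch of how an innocent-looking refactor of a base class can break a subclass. StoreV1, StoreV2 and make_counting_store are hypothetical names; StoreV2 stands for the same base class after an internal cleanup that leaves its public behaviour unchanged.

```python
class StoreV1:
    """Base class before the refactor: add_all is written in terms of add."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def add_all(self, items):
        for item in items:
            self.add(item)  # dispatches to any subclass override of add


class StoreV2:
    """Same public behaviour after an internal refactor: add_all uses extend."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def add_all(self, items):
        self.items.extend(items)  # no longer calls add


def make_counting_store(base):
    """Build the same subclass on top of either version of the base class."""

    class CountingStore(base):
        def __init__(self):
            super().__init__()
            self.count = 0

        def add(self, item):
            # Silently relies on add_all calling add in the base class.
            self.count += 1
            super().add(item)

    return CountingStore


before = make_counting_store(StoreV1)()
before.add_all([1, 2, 3])  # count is 3

after = make_counting_store(StoreV2)()
after.add_all([1, 2, 3])   # count stays 0, although all three items are stored
```

Nobody touched the subclass, and the base class still passes its own tests, yet the count is now wrong.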
"Fragile base class" problem: if you inherit from a class, then that base class becomes fragile because its descendants will depend on its implementation details
Consequence 2: Loss of abstraction ⇒ testing is hard
Abstraction = exposing only the essential characteristics of a system, so that users focus on what's important.
Abstraction is the most important concept in software design.
Encapsulation is an implementation-level concern. Abstraction is a design-level concern.
But encapsulation can be a tool to achieve abstraction, by forcing the user to depend only on a public interface that may not represent the true implementation details hidden beneath.
Example: a spreadsheet in Google Sheets isn't implemented as a 2D array under the hood. But the 2D array representation is what matters to the user.
Well-abstracted code is easy to test, because the underlying implementation can be replaced with a simplified or faked version, so that we isolate our testing to the code we wish to test.
This is an example of the design principle of inversion of control. Composition adapts easily to inversion of control; inheritance doesn't.
In practice, composition can take advantage of language/library features like monkey-patching or dependency injection.
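A minimal sketch of what dependency injection could look like for the composed ODict, assuming the underlying mapping is passed in through the constructor. The store parameter and the FakeStore test double are hypothetical names for illustration.

```python
class ComposedODict:
    def __init__(self, store=None):
        # The underlying mapping is injected; a real dict is the default.
        self._store = {} if store is None else store
        self._keys = []

    def __setitem__(self, key, value):
        if key not in self._store:
            self._keys.append(key)
        self._store[key] = value

    def popfirst(self):
        key = self._keys.pop(0)
        return key, self._store.pop(key)


class FakeStore:
    """Test double that records every call made via the mapping interface."""

    def __init__(self):
        self.calls = []
        self._data = {}

    def __contains__(self, key):
        return key in self._data

    def __setitem__(self, key, value):
        self.calls.append(("set", key))
        self._data[key] = value

    def pop(self, key):
        self.calls.append(("pop", key))
        return self._data.pop(key)


fake = FakeStore()
od = ComposedODict(store=fake)
od["a"] = 1
od["b"] = 2
first = od.popfirst()
```

The test can now assert on fake.calls to check exactly how ODict used its underlying store, without touching a real dict.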
Due to the subclass's potential dependence on implementation details of the superclass, it's hard to replace or mock the functionality of a superclass.
Even if the subclass interacted with the superclass in a clear way, most languages do not provide support for mocking/faking a superclass.
This is why <SubclassExample> tools are difficult to test! They all inherit from <SuperclassExample>, which has an enormous interface which <SubclassExample> tools depend on. If we could rewrite the <SubclassExample> interface now, we would use a composition-based approach.
Consequence 3: The yo-yo problem
Encapsulation definition 2: bundling of related data and functionality.
Encapsulation reduces cognitive load for the user/developer.
A class written with composition groups all its data and functionality in the class definition, and all interactions with underlying classes take place via their public interfaces.
A class in an inheritance hierarchy has its data and functionality spread over its entire inheritance tree.
Thus, the experience of working with the code involves yo-yoing up and down the inheritance tree, usually across different files, just to build a mental map of the logic or to make any changes.
A key aspect of clean design is small focused public interfaces. But when we overuse inheritance, our class inherits the interface of all parent classes. This obscures the public interface of the class and makes good design harder to practice.
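To make the interface point concrete, here is a rough sketch that lists the public names each kind of class exposes. The helper public_names and both class names are assumptions for illustration, and popfirst relies on built-in dicts preserving insertion order.

```python
def public_names(cls):
    """Names a user of the class can see, ignoring underscore-prefixed ones."""
    return {name for name in dir(cls) if not name.startswith("_")}


class InheritedODict(dict):
    def popfirst(self):
        key = next(iter(self))
        return key, self.pop(key)


class ComposedODict:
    def __init__(self):
        self._data = {}

    def popfirst(self):
        key = next(iter(self._data))
        return key, self._data.pop(key)


inherited_api = public_names(InheritedODict)  # popfirst plus everything dict offers
composed_api = public_names(ComposedODict)    # only what we chose to expose
```

The inherited class drags in setdefault, fromkeys, update and the rest of dict's methods whether or not they make sense for an ODict.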
You might insist that you're a good developer, that you use inheritance just enough but never overdo it, and that your inheritance trees will never be more than 2-3 levels deep. But usually you're not the only one working on a codebase, and inheritance is like a drug for developers: it's easy and enticing and it solves the problem you have right now, so devs reach for it without considering the long-term harm.
The <xxx> codebase is an example: it was built by 4 talented senior developers in a small span of time, but these developers didn't have an aversion to inheritance. Within a couple of years, the codebase had inheritance trees of hundreds of classes, including trees eight levels deep. This means you need to have eight files open at once just to understand what the class does.
Just say no to inheritance!
Consequence 4: Multiple inheritance
The ODict implemented with inheritance achieves code reuse by inheriting from one underlying class: dict.
But eventually, we might want to implement its ordering logic in terms of some other class. Say, a binary tree ("bintree").
At this point we must decide between:
We can inherit from both dict and bintree. But then we have to deal with the many serious pitfalls of multiple inheritance.
Or we can inherit from dict, and keep bintree as an attribute. But now we are inheriting and composing, and suffer the downsides of both. And how do we decide whether we inherit from dict and compose with bintree, or vice versa?
You might say that an ordered dict "is not" a binary tree, so it shouldn't inherit from bintree. But in general, as a design develops, you will eventually encounter classes that satisfy the "is-a" relationship for two different things.
Composition simply doesn't have this problem. It is more flexible than inheritance by its very nature. And we get this flexibility "for free", without any disadvantages.
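A sketch of the composed version with a second collaborator. The text does not define bintree, so ListOrder below is a hypothetical list-backed stand-in for it; any object with the same two methods could be swapped in.

```python
class ListOrder:
    """Hypothetical ordering backend, standing in for the 'bintree' in the text."""

    def __init__(self):
        self._keys = []

    def append(self, key):
        self._keys.append(key)

    def nth(self, n):
        return self._keys[n]


class ComposedODict:
    def __init__(self, order=None):
        self._data = {}  # first collaborator: stores the values
        # Second collaborator: remembers the insertion order.
        self._order = ListOrder() if order is None else order

    def __setitem__(self, key, value):
        if key not in self._data:
            self._order.append(key)
        self._data[key] = value

    def get_nth_insertion(self, n):
        return self._data[self._order.nth(n)]
```

Neither collaborator is privileged: both are plain attributes, so there is no "which one do we inherit from" question to answer.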
Consequence 5: The single-instance problem
When ODict inherits from dict, the inheritance mechanism gives it exactly one underlying dict: the ODict object itself, whose dict behaviour you reach via super() in Python.
But as a design develops, you will often find that it naturally generalises to be implemented in terms of more than one instance.
E.g. maybe you want to store the reverse dictionary mapping (value->key) as well, in which case ODict is naturally expressed in terms of two dictionary instances.
With composition, you just add another attribute! dict1, dict2. Free flexibility once again.
With inheritance, you can add dict2 as an attribute, but once again you now have a Frankenstein combination of inheritance and composition. Your code ends up interacting with one of its underlying objects via super(), and the others via attribute access.
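A sketch of the two-dictionary version under composition. BidirectionalODict, key_for and the attribute names are hypothetical, and the sketch assumes that values are hashable and unique, so that the reverse mapping is well defined.

```python
class BidirectionalODict:
    """Keeps a forward (key -> value) and a reverse (value -> key) dict."""

    def __init__(self):
        self._forward = {}
        self._reverse = {}

    def __setitem__(self, key, value):
        if key in self._forward:
            # Drop the stale reverse entry for the value being overwritten.
            self._reverse.pop(self._forward[key], None)
        self._forward[key] = value
        self._reverse[value] = key

    def __getitem__(self, key):
        return self._forward[key]

    def key_for(self, value):
        return self._reverse[value]
```

Both dicts are reached the same way, through attribute access, so the code stays uniform.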
Conclusion
Each of these consequences has a severe, long-term impact on the maintainability of the code:
Loss of encapsulation ⇒ code becomes harder to refactor
Loss of abstraction ⇒ code becomes harder to test
Yo-yo ⇒ code becomes harder to understand
Multiple inheritance ⇒ code becomes harder to extend
Single-instance problem ⇒ code doesn't generalise to >1 instance
Implementation inheritance is a fundamentally bad feature. Even when used "correctly" – i.e. obeying "is-a" and Liskov substitution etc. – it still incurs all these problems.
It follows that you should only use inheritance over composition if its advantages outweigh all of these disadvantages.
It's very rare that this is the case. It generally only happens when you're interacting with external classes that you don't control, where you might be forced to inherit from those classes to work with some interface, or where the consequence of not inheriting might be that you have to implement dozens of delegation methods.
Interface inheritance (i.e. inheriting from a pure interface class that has no data or functionality) is great! It enables polymorphism, which is a powerful tool for abstraction. Use interface inheritance to your heart's content – it does not suffer from any of the problems we've mentioned.
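A sketch of interface inheritance in Python, using the standard abc module to declare a pure interface. OrderedMapping, the two implementations and take_first are hypothetical names; the dict-backed version relies on built-in dicts preserving insertion order.

```python
from abc import ABC, abstractmethod


class OrderedMapping(ABC):
    """Pure interface: no data and no behaviour, only method signatures."""

    @abstractmethod
    def set(self, key, value): ...

    @abstractmethod
    def popfirst(self): ...


class DictBackedODict(OrderedMapping):
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def popfirst(self):
        key = next(iter(self._data))
        return key, self._data.pop(key)


class ListBackedODict(OrderedMapping):
    def __init__(self):
        self._pairs = []

    def set(self, key, value):
        for i, (existing, _) in enumerate(self._pairs):
            if existing == key:
                self._pairs[i] = (key, value)
                return
        self._pairs.append((key, value))

    def popfirst(self):
        return self._pairs.pop(0)


def take_first(mapping: OrderedMapping):
    # Depends only on the interface, not on either implementation.
    return mapping.popfirst()


results = []
for impl in (DictBackedODict(), ListBackedODict()):
    impl.set("a", 1)
    impl.set("b", 2)
    results.append(take_first(impl))
```

The two implementations share nothing except the interface, so neither can come to depend on the other's internals.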
So, next time you see a problem, and you think "I'll use inheritance" – stop. Take your hands off the keyboard. Think really really hard about whether that's what you want to do to yourself and to the people around you. Consider all alternatives. Reach for inheritance as a last resort.