Subclassing and Java Generics

Posted on July 31, 2010 by Tommy McGuire

Labels: notation, java, programming language

Every once in a while, someone asks me, "Just what is the deal with Java generics and the whole '?' thing? Why would anyone want all that 'extends' and 'super' stuff?" And, of course, every time I have forgotten everything I knew about the subject. I look it up in Java Generics, but by the time I have found it, they have gone and wandered off, muttering and griping. So, I sat down, worked my way through the problem myself, and this is what I came up with.

In all the following, take A to be a class, and B a subclass of A, and C a different subclass of A. (C is not actually needed, but it makes part of the problem clearer.) I will use the <: notation to indicate one class is a subclass of another: B <: A, C <: A. The question is, what is the relationship between, say, List<A> and List? Are they unrelated, or is one a subclass of another? If so, which?

Keeping them unrelated is somewhat unsatisfactory, for reasons I will get into below. As a result, there are two possibilities: List <: List<A> or List<A> <: List. In the former case, if I am using the terms correctly, List is covariant on its type parameter, while in the latter, List is contravariant.

The covariant case makes some intuitive sense; for List <: List<A> since B <: A. Further, if you are expecting to retrieve values from a List<A> function parameter for example, the relationship would work. When passed a List to a function expecting a List<A>, the function can pull out and manipulate A instances without problems, once again since B <: A. On the other hand, having List <: List<A> fails when adding objects to the List<A>. If a function expecting a List<A> is passed a List and adds an instance of another subclass of A to the list, C for example, the original List becomes corrupted.

The other option is for List<A> <: List, the contravariant case. Here, it is entirely possible to add new B instances in a function taking a List parameter and passed a List<A> argument. Unfortunately for the contravariant case, removing elements from a List parameter in a function passed a List<A> is not acceptable because the removed elements will be instances of A but not necessarily instances of B.

	Covariant	Contravariant
Extraction	List<B> b = new ArrayList<B>(); b.add(new B()); List<A> a = b; a.get(0); Result: B	List<A> a = new ArrayList<A>(); a.add(new C()); List<B> b = a; b.get(0); Result: C
Insertion	List<B> b = new ArrayList<B>(); List<A> a = b; a.add(new C()); b.get(0); Result: C	List<A> a = new ArrayList<A>(); List<B> b = a; b.add(new B()); a.get(0); Result: B

Covariant

Contravariant

Extraction

List<B> b = new ArrayList<B>();
b.add(new B());
  List<A> a = b;
  a.get(0);

Result: B

List<A> a = new ArrayList<A>();
a.add(new C());
  List<B> b = a;
  b.get(0);

Result: C

Insertion

List<B> b = new ArrayList<B>();
  List<A> a = b;
  a.add(new C());
b.get(0);

Result: C

List<A> a = new ArrayList<A>();
  List<B> b = a;
  b.add(new B());
a.get(0);

Result: B

The table shows skeletal versions of the four cases, indicating which ones are safe and which ones result in corrupted data. In this table, the operations on the assigned, subclassed object are indented and the slightly dodgey, not-quite-type-correct assignments are emphasized. The table also shows why it is unsatisfactory to take the safe, invariant option of keeping List<A> and List unrelated: two of the four options are both perfectly safe and very useful.

Java allows the two safe options and blocks the two unsafe ones by allowing the programmer to specify when an invariant, covariant, or contravariant relationship is acceptable and giving the compiler the information to check that the programmer has not made a mistake in that specification. The tools the programmer can use to specify the options are wildcard types and the super and extends keywords in type parameter expressions.

The wildcard type parameter by itself is just an alternative name for Object, because by itself it has no constraints: List<?> is the same as List<Object>. (This is not quite true, but close enough for the moment.) The wildcard becomes interesting when used with the super and extends constraints.

List<? extends A> indicates a covariant use of List; it allows an instance of a class which is parameterized by a subclass to be assigned to a variable of the class parameterized by a superclass. To put it another way, List<? extends A> acts as a subclass of List<A> for a specific use, with the ability to take A instances from the List. For example:

List<B> b = new ArrayList<B>();
b.add(new B());
  List<? extends A> a = b;
  a.get(0);

The result of the get() method is a valid instance of A. On the other hand, the failing covariant insertion case does not compile, since the call to add() results in a type error.

On the other hand, List<? super A> indicates a contravariant use of List, or in other words allows List<A> to act as a subclass of List<? super B> for the specific use. For example:

List<A> a = new ArrayList<A>();
  List<? super B> b = a;
  b.add(new B());
a.get(0);

The insertion succeeds and the result of the get() is a valid A, which happens to be an instance of B.

Covariant and contravariant relationships are both more important and more complex than just the containers used as examples here. For one thing, functions assigned or used as parameters are typically contravariant in their parameters and covariant in their return types. For example, a function Fc with a type A->B can be used in a place expecting a function F of type B->A. Specifically,

B Fc(A a) {
    return new B();
  }
...
A a = F(new B());

or, in a slightly more useful example,

interface Functor<R,P> { public abstract R apply(P p); }
  
static <R,P> List<R> map(Functor<R,P> f, List<? extends P> ps) {
  List<R> rs = new ArrayList<R>();
  for (P p : ps) {
    rs.add(f.apply(p));
  }
  return rs;
}
...
List<B> b = new ArrayList<B>();
...
Collection<B> b2 = map(new Functor<B,A>() {
        public B apply(A a) { return new B(); }
      }, b);

In this code, Functor<R,P> represents a function object that has a parameter of type P and returns an object of type R. The map() method takes a Functor<R,P> and a List of some subclass of P and produces a List<R>. In use, the function object is a Functor<B,A> and the list is List (where, as before, B <: A). The map() method produces a List, which is assigned to a Collection, which is a superclass of List.

For more information about the relationship between Java, subclassing, and object-oriented inheritance, see Java Generics and Collections by Maurice Naftalin and Philip Wadler, Types and Programming Languages by Benjamin C. Pierce, and the Java Generics FAQs by Angelika Langer. Programming in Scala by Martin Odersky, Lex Spoon, and Bill Venners has a good chapter on generics, covariance, and contravariance in Scala.

Speaking of Scala, the Wikipedia page on Covariance and contravariance has the following:

The Scala programming language originally supported use-site declarations of covariance and contravariance [as I understand it, Java uses use-site declarations], but it turned out to be difficult to achieve consistency of usage-site type annotations, so that type errors were not uncommon. Because declaration-site annotations provide excellent guidance on which methods should be generalized with lower bounds, Scala later switched from usage-site to declaration-site variance annotations. Though usage-site variance annotations allow a single class to have covariant as well as non-variant fragments, Scala's mixin composition makes it relatively easy to explicitly factor classes into covariant and non-variant fragments.

By the way, the Guava library parameterizes the functor argument rather than the collection. I'm not sure why. It also provides an on-going, active view of the argument list rather than a one-time transformation. Whoa.