JEP 286: Local-Variable Type Inference

Author	Brian Goetz
Owner	Dan Smith
Created	2016/03/08 15:37
Updated	2016/03/09 20:18
Type	Feature
Status	Candidate
Component	specification / language
Scope	SE
Discussion	platform dash jep dash discuss at openjdk dot java dot net
Effort	M
Duration	S
Priority	3
Reviewed by	Alex Buckley, Mark Reinhold
Endorsed by	Mark Reinhold
Issue	8151454

Summary

Enhance the Java Language to extend type inference to declarations of local variables with initializers.

Goals

We seek to improve the developer experience by reducing the ceremony associated with writing Java code, while maintaining Java's commitment to static type safety, by allowing developers to elide the often-unnecessary manifest declaration of local variable types. This feature would allow, for example, declarations such as:

var list = new ArrayList<String>();  // infers ArrayList<String>
var stream = list.stream();          // infers Stream<String>

This treatment would be restricted to local variables with initializers, indexes in the enhanced for-loop, and locals declared in a traditional for-loop; it would not be available for method formals, constructor formals, method return types, fields, catch formals, or any other kind of variable declaration.

Success Criteria

Quantitatively, we want that a substantial percentage of local variable declarations in real codebases can be converted using this feature, inferring an appropriate type.

Qualitatively, we want that the limitations of local variable type inference, and the motivations for these limitations, be accessible to a typical user. (This is, of course, impossible to achieve in general; not only will we not be able to infer reasonable types for all local variables, but some users imagine type inference to be a form of mind reading, rather than an algorithm for constraint solving, in which case no explanation will seem sensible.) But we seek to draw the lines in such a way that it can be made clear why a particular construct is over the line -- and in such a way that compiler diagnostics can effectively connect it to complexity in the user's code, rather than an arbitrary restriction in the language.

Motivation

Developers frequently complain about the degree of boilerplate coding required in Java. Manifest type declarations for locals are often perceived to be unnecessary or even in the way; given good variable naming, it is often perfectly clear what is going on.

The need to provide a manifest type for every variable also accidentally encourages developers towards overly complex expressions; with a lower-ceremony declaration syntax, there is less disincentive to break complex chained or nested expressions into simpler ones.

Nearly all other popular statically typed "curly-brace" languages, both on the JVM and off, already support some form of local-variable type inference: C++ (auto), C# (var), Scala (var/val), Go (declaration with :=). Java is nearly the only popular statically typed language that has not embraced local-variable type inference; at this point, this should no longer be a controversial feature.

The scope of type inference was significantly broadened in Java SE 8, including expanded inference for nested and chained generic method calls, and inference for lambda formals. This made it far easier to build APIs designed for call chaining, and such APIs (such as Streams) have been quite popular, showing that developers are already comfortable having intermediate types inferred. In a call chain like:

OptionalInt maxWeight = blocks.stream()
                              .filter(b -> b.getColor() == BLUE)
                              .mapToInt(Block::getWeight)
                              .max();

no one is bothered (or even notices) that the intermediate types Stream<Block> and IntStream, as well as the type of the lambda formal b, do not appear explicitly in the source code.

Local variable type inference allows a similar effect in less tightly structured APIs; many uses of local variables are essentially chains, and benefit equally from inference, such as:

var path = Path.of(fileName);
var fileStream = Files.newInputStream(path);
var bytes = fileStream.readAllBytes();
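
As a minimal sketch of the same idea, the stream call chain shown earlier can be split into separately declared steps without writing out any of the intermediate types. The Block and Color types and the BlockDemo class are hypothetical stand-ins for whatever the earlier example assumed, and the code assumes a compiler that implements this proposal.

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical demo class; Block and Color are illustrative stand-ins.
class BlockDemo {
    enum Color { RED, BLUE }

    static final class Block {
        private final Color color;
        private final int weight;

        Block(Color color, int weight) {
            this.color = color;
            this.weight = weight;
        }

        Color getColor() { return color; }
        int getWeight() { return weight; }
    }

    // Each step of the chain becomes a local; the comments give the inferred type.
    static OptionalInt maxBlueWeight(List<Block> blocks) {
        var all = blocks.stream();                               // Stream<Block>
        var blue = all.filter(b -> b.getColor() == Color.BLUE);  // Stream<Block>
        var weights = blue.mapToInt(Block::getWeight);           // IntStream
        var max = weights.max();                                 // OptionalInt
        return max;
    }
}
```

Note that max() yields an OptionalInt rather than an int, because the stream may be empty.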

Description

For local variable declarations with initializers, enhanced for-loop indexes, and index variables declared in traditional for loops, allow the reserved type name var to be accepted in place of manifest types:

var list = new ArrayList<String>(); // infers ArrayList<String>
var stream = list.stream();         // infers Stream<String>

The type is inferred based on the type of the initializer. The declaration is rejected if there is no initializer, if the initializer is the null literal, if the type of the initializer cannot be normalized to a suitable denotable type (these include intersection types and some capture types), or if the initializer is a poly expression that requires a target type (lambda, method reference, implicit array initializer).
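
The accepted and rejected forms can be sketched side by side. This is an illustration only, assuming a compiler that implements the proposal; the class and method names are hypothetical, and the rejected declarations are left as comments so that the file compiles.

```java
import java.util.List;

// Hypothetical demo class showing where 'var' is accepted.
class VarForms {
    static int totalLength(List<String> words) {
        var total = 0;                  // local with initializer: int
        for (var word : words) {        // enhanced for-loop index: String
            total += word.length();
        }
        return total;
    }

    static int countEmpty(List<String> words) {
        var empty = 0;
        for (var i = 0; i < words.size(); i++) {  // traditional for-loop index: int
            if (words.get(i).isEmpty()) {
                empty++;
            }
        }
        return empty;
    }

    // Rejected forms, per the rules above:
    //   var x;               // no initializer
    //   var n = null;        // initializer is the null literal
    //   var f = () -> { };   // lambda requires a target type
    //   var m = this::run;   // method reference requires a target type
    //   var k = { 1, 2 };    // array initializer requires a target type
}
```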

We may additionally consider val or let as a synonym for final var. (In any case, locals declared with var will continue to be eligible for effectively-final analysis.)
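
A short sketch of the effectively-final point, with hypothetical class and method names: a local declared with var that is never reassigned can be captured by a lambda exactly as if its type had been written out.

```java
import java.util.function.Supplier;

// Hypothetical demo class.
class EffectivelyFinalVar {
    static Supplier<String> greeter(String name) {
        var greeting = "Hello, " + name;  // inferred as String; never reassigned
        // Capture is permitted because 'greeting' is effectively final.
        return () -> greeting + "!";
    }
}
```

Reassigning greeting anywhere in the method would make the capture an error, just as it would for a local with a manifest type.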

The identifier var will not be made into a keyword; instead it will be a reserved type name. This means that code that uses var as a variable, method, or package name will not be affected; code that uses var as a class or interface name will be affected (but these names violate the naming conventions.)
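
The practical effect of the reserved-type-name approach can be sketched as follows, using a hypothetical class. Existing code that uses var as a method or variable name continues to compile, while a type named var would not.

```java
// Hypothetical demo class.
class VarAsIdentifier {
    // 'var' used as a method name: unaffected.
    static int var(int x) {
        return x + 1;
    }

    static int demo() {
        int var = 41;            // 'var' used as a variable name: unaffected
        var result = var(var);   // inferred type int; calls the method 'var' with the local 'var'
        return result;
    }

    // class var { }   // would be rejected: 'var' cannot be used as a type name
}
```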

Excluding locals with no initializers eliminates "action at a distance" inference errors, and only excludes a small portion of locals in typical programs.

Excluding RHS expressions whose type is not denotable would simplify the feature and reduce risk. However, excluding all non-denotable types is likely to be too strict; analysis of real codebases shows that capture types (and to a lesser degree, anonymous class types) show up with some frequency. Anonymous class types are easily normalized to a denotable type. For example, for

var runnable = new Runnable() { ... }

we normalize the type of runnable to Runnable, even though inference produces the sharper (and non-denotable) type Foo$23.

Similarly, for capture types Foo<CAP>, we can often normalize these to a wildcard type Foo<?>. These techniques dramatically reduce the number of cases where inference would otherwise fail.

Running a prototype over the JDK source code:

83.5% inferred the exact type present in the source code
4% inferred another denotable type (usually a sharper type)
0% were rejected because inferred type was non-denotable
8.5% were rejected because they had no initializer
3% were rejected because initializer was null
0.5% were rejected because a target type was required

If we exclude those that are rejected because they have no initializer or the initializer was null, we find that over 99% of local variables are inferrable, and 95% are inferred with the exact type present in the source code.

Of effectively final locals (77% of all locals):

86% inferred the exact type present in the source code
4.5% inferred another denotable type (usually a sharper type)
0% were rejected because inferred type was non-denotable
8% were rejected because they had no initializer
0.5% were rejected because initializer was null
0.5% were rejected because a target type was required

Alternatives

We could continue to require manifest declaration of local variable types.

We could support diamond on the LHS of an assignment; this would address a subset of the cases addressed by var.

The design described above incorporates several decisions about scope, syntax, and non-denotable types; alternatives for those choices which were also considered are documented here.

Scope Choices

There are several other ways we could have scoped this feature. One, which we considered, was restricting the feature to effectively final locals (val only). However, we backed off from this position because:

The majority (more than 75% in both JDK and broader corpus) of local variables with initializers were already effectively immutable anyway, meaning that any "nudge" away from mutability that this feature could have provided would have been limited;
Capturability by lambdas/inner classes already provides a significant push towards effectively final locals;
In a code block with (say) 7 effectively final locals and 2 mutable ones, the types required for the mutable ones would be visually jarring, undermining much of the benefit of the feature.

On the other hand, we could have expanded this feature to include the local equivalent of "blank" finals (i.e., not requiring an initializer, instead relying on definite assignment analysis.) We chose the restriction to "variables with initializers only" because it covers a significant fraction of the candidates while maintaining the simplicity of the feature and reducing "action at a distance" errors.

Similarly, we also could have taken all assignments into account when inferring the type, rather than just the initializer; while this would have further increased the percentage of locals that could exploit this feature, it would also increase the risk of "action at a distance" errors.

Syntax Choices

There will inevitably be a diversity of opinions on syntax. The two main degrees of freedom here are what keywords to use (var, auto, etc), and whether to have a separate new form for immutable locals (val, let). We considered the following syntactic options:

var x = expr only (like C#)
var, plus val for immutable locals (like Scala, Kotlin)
var, plus let for immutable locals (like Swift)
auto x = expr (like C++)
const x = expr (already a reserved word)
final x = expr (already a reserved word)
let x = expr
def x = expr (like Groovy)
x := expr (like Go)

Whether or not to have a second form for immutable locals (val, let) is a tradeoff of additional ceremony for additional capture of design intent. We already have effectively-immutable analysis for lambda and inner class capture, and the majority of local variables are already effectively immutable. Some people like that var and val are so similar, so that the difference recedes into the background when reading code, while others find them distractingly similar. Similarly, some like that var and let are clearly different, while others find the difference distracting. (If we are to support new forms, they should have equal syntactic weight (both val and let qualify), so that laziness is less likely to entice users to omit the additional declaration of immutability.)

The auto keyword is a viable choice, but Java developers are more likely to have experience with JavaScript, C#, or Scala than they are with C++, so we do not gain much by emulating C++ here.

Using const or final seems initially attractive because it doesn't involve new keywords. However, going in this direction effectively closes the door on ever doing inference for mutable locals. Using def has the same defect.

The Go syntax (a different kind of assignment operator) seems pretty un-Javaish.

Non-Denotable Types

We have several choices as to what to do with non-denotable types (null types, anonymous class types, capture types, intersection types). We could reject them (requiring a manifest type), accept them as inferred types, or try to "detune" them to denotable types.

Arguments for rejecting them include:

Risk reduction. There are many known corner cases with weird types such as captures and intersections in both the spec and the compiler; by allowing variables that have these types, they are more likely to be used, activate corner cases, and cause user frustration. (We are working on cleaning these up, but this is a longer-term activity.)
Expressibility-preserving. By rejecting non-denotable types, every program with var has a simple local transformation to a program without var.
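
The expressibility-preserving argument can be illustrated with a small sketch (class and method names are hypothetical). When every inferred type is denotable, the var form and the manifest form differ only in the declarations themselves.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: the two methods differ only in their local declarations.
class LocalTransformation {
    static Map<String, Integer> withVar(String text) {
        var counts = new HashMap<String, Integer>();
        for (var word : text.split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    static Map<String, Integer> withManifestTypes(String text) {
        HashMap<String, Integer> counts = new HashMap<String, Integer>();
        for (String word : text.split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }
}
```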

Arguments for accepting them include:

We already infer these types in chained calls, so it is not like our programs are free of these types anyway, or that the compiler need not deal with them.
Capture types arise in situations when you might think that a capture type is not needed (such as var x = m(), where m() returns Foo<?>); rejecting them may lead to user frustration.

While we were initially drawn to the "reject them" approach, we found that there was a significant class of cases involving capture variables that users would ultimately find to be mystifying restrictions. For example, when inferring

var c = Class.forName("com.foo.Bar")

inference produces a capture type Class<CAP>, even though the type of this expression is "obviously" Class<?>. So we chose to pursue an "uncapture" strategy where capture variables could be converted to wildcards (this strategy has applications elsewhere as well). There are many situations where capture types would otherwise "pollute" the result, for which this technique was effective.
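
A minimal sketch of what the "uncapture" strategy permits, assuming a compiler that implements it as described; the class and method names are hypothetical. Because the variable is given the wildcard type Class<?> rather than a capture type, it can later be assigned any other Class value.

```java
// Hypothetical demo class.
class UncaptureDemo {
    static String describe(String className) {
        try {
            var c = Class.forName(className);  // treated as Class<?>, not Class<CAP>
            var first = c.getName();
            c = String.class;                  // compiles because the type of c is Class<?>
            return first + " then " + c.getSimpleName();
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("unknown class: " + className, e);
        }
    }
}
```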

Similarly, we normalize anonymous class types to their (first) supertype. We make no attempts to normalize intersection or union types. The largest remaining category where we cannot infer a sensible result is when the initializer is null.

Risks and Assumptions

Risk: Because Java already does significant type inference on the RHS (lambda formals, generic method type arguments, diamond), there is a risk that attempting to use var/val on the LHS of such an expression will fail, possibly with difficult-to-read error messages.

We've mitigated this by using simplified error messages when the LHS is inferred.

Examples:

Main.java:81: error: cannot infer type for local variable x
        var x;
            ^
  (cannot use 'val' on variable without initializer)

Main.java:82: error: cannot infer type for local variable f
        var f = () -> { };
            ^
  (lambda expression needs an explicit target-type) 

Main.java:83: error: cannot infer type for local variable g
        var g = null;
            ^
  (variable initializer is 'null')

Main.java:84: error: cannot infer type for local variable c
        var c = l();
            ^
  (inferred type is non denotable)

Main.java:195: error: cannot infer type for local variable m
        var m = this::l;
            ^
  (method reference needs an explicit target-type)

Main.java:199: error: cannot infer type for local variable k
        var k = { 1 , 2 };
            ^
  (array initializer needs an explicit target-type)

Risk: Inferring non-denotable types might press on already-fragile paths in the specification and compiler.

We've mitigated this by normalizing most non-denotable types, and rejecting the remainder.

Risk: source incompatibilities (someone may have used "var" as a type name.)

Mitigated with reserved type names; names like "var" and "val" do not conform to the naming conventions for types, and therefore are unlikely to be used as types. The names "var" and "val" are commonly used as identifiers; we continue to allow this.

Risk: reduced readability, surprises when refactoring.

Like any other language feature, local variable type inference can be used to write both clear and unclear code; ultimately the responsibility for writing clear code lies with the user.