This article originally appeared on IG's blog
After more than 15 years of Java experience, I have tended to brush aside comments about Java's verbosity with one of the following arguments:
- lines of code (LOC) is a bogus metric;
- IDEs generate 90% of my Java code;
- lessons learned from Perl's notorious and incomprehensible conciseness.
LOC metrics are simply not important.
Or are they?
Some time ago we started building our first Spark jobs. The first two that we wrote were basically the same:
It so happened that we wrote one in Clojure and one in Java. When we reviewed the code, this is how the Java version looked:
At first I was surprised that there were so many classes.
Then I was surprised that I was surprised about finding so many classes. After all, the code was the perfectly idiomatic Java code that we all have come to write.
But why was I surprised in the first place? Probably because the Clojure version looked like this:
But maybe it was one huge file with hundreds of lines of code? No. Just 58 lines of code.
Perhaps the Clojure version was a completely unreadable gibberish of magic variables and parentheses all over the place? Here is the main transformation logic between the two versions:
The only difference in readability is that the Java version has a lot more parentheses.
The code review
I usually would not pay attention to Java's verbosity, but during the Java code review I found myself thinking about:
- Which class should I start with?
- Which class should I go to next?
- How do the classes fit together?
- What does the dependency graph look like?
- Who implements those interfaces? Are they necessary?
- What are the responsibilities of each class?
- Where is the data transformation?
- Is the data transformation correct?
While the Clojure code review was about:
- Is the data transformation correct?
This made me realize that the Clojure version was way simpler to understand, and that having a single file with 58 lines of code was a very important reason for it.
What about bigger projects?
I don't have any bigger project where the requirements were exactly the same as here, but it is true that our Clojure micro-services have no more than 10 files, usually 3 or 4, while the simplest of our Java micro-services has several dozen.
And from experience, we know that the time needed to understand a codebase with 4 small classes is not the same as the time needed to understand one with 50 classes.
Incidental Complexity
So given that the inherent complexity of the problem is the same, that the Clojure version is able to express the solution in 58 lines of code while the Java version requires 441 lines of code, and that the Clojure version is easier to understand, what are those extra 383 lines (87% of the codebase) of perfectly idiomatic Java code about?
The answer is that all those extra lines of code fall into the incidental complexity bucket - the complexity that we (programmers) create ourselves by not using the right tools, complexity that our business paid us to create and pays us to maintain, but never really asked for.
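To make the distinction concrete, here is a minimal, hypothetical Java sketch. It is not the code from the Spark job, and every name in it (`RecordTransformer`, `UpperCaseRecordTransformer`, `RecordTransformerFactory`) is invented for illustration. Both methods compute the same result; the first takes the interface-plus-implementation-plus-factory route, the second states the transformation directly.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example: none of these names come from the original project.
public class IncidentalComplexityDemo {

    // --- Ceremony: an interface, an implementation and a factory for one operation ---
    interface RecordTransformer {
        String transform(String record);
    }

    static final class UpperCaseRecordTransformer implements RecordTransformer {
        @Override
        public String transform(String record) {
            return record.toUpperCase();
        }
    }

    static final class RecordTransformerFactory {
        private RecordTransformerFactory() {}

        static RecordTransformer create() {
            return new UpperCaseRecordTransformer();
        }
    }

    // Same transformation as direct(), routed through the extra types above.
    static List<String> ceremonial(List<String> records) {
        RecordTransformer transformer = RecordTransformerFactory.create();
        return records.stream()
                .map(transformer::transform)
                .collect(Collectors.toList());
    }

    // --- Inherent part: the transformation itself, stated directly ---
    static List<String> direct(List<String> records) {
        return records.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> records = List.of("eur/usd", "gbp/usd");
        // Both calls print the same list of upper-cased records.
        System.out.println(ceremonial(records));
        System.out.println(direct(records));
    }
}
```

A reviewer of `direct` only has to ask whether the transformation is correct. A reviewer of `ceremonial` first has to find out who implements the interface and what the factory returns, and only then reaches that same question.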
Are lines of code important? Not as a measure of productivity, but certainly as a measure of complexity, especially if this complexity is incidental instead of inherent.
Imagine deleting 87% of all the code that you have to maintain!
LOC is not a bogus metric; it just depends on what one is measuring. If you're trying to measure code review and maintenance cost, then LOC is definitely a valid metric. More lines equals more effort.
Even between different languages?
My Java-wired-brain has said hundreds of times that it was not.
I have changed my mind.
What about you?
Do you optimize Leaf for succinctness? Does that impact readability?
Yes, I think so even when comparing across languages. Given functionally equivalent solutions in two languages, including error processing, I would say the one with the lower LOC* is of higher value.
Readability is an issue, but I find syntax redundancy is the biggest problem in readability. Less code is simply less to understand, even if it involves more complex operations.
Definitely for Leaf I'll be optimizing for succinctness, but I'm not in favour of complex syntactic structures. I also have inferred and implicit typing, which removes a lot of syntax (it can look like a dynamic language without being one).
(*We might need to be a bit careful in defining LOC though, perhaps just total functional code-size is better, to avoid packed lines somehow getting an undeserved higher score)
Very good point about LOC.
Best of luck with Leaf!
It's curious that some of your conclusions were actually explained by Clojure's creator Rich Hickey: infoq.com/presentations/Simple-Mad...
As a Vim user, I've always been wary of languages which require an IDE to be able to do even the smallest amount of development on a project. Also, I always found Java to be overly structured, with too many layers of abstraction that prevented me from fully understanding what was happening.
And the point is this: not only is Java extremely verbose, requiring an IDE to write the code for you - if you want to keep your sanity, that is - it is also very complex, hence hard to understand. So while the first problem might be ignored, the second one is a real concern.
The problem that you have is not that the code is ~440 lines long, but rather that you need 7 files/entities to complete a quite trivial task. What Clojure is telling you here is: you need a couple of simple functions tops. What justifies the added complexity of Java?
Good point about IDEs. I always thought that they were mandatory: who wouldn't want to use them?
They seemed like a positive thing: "my language has a better IDE than yours, hence my language is better".
I still think that the more and better the tools the merrier, but it is more important to need fewer tools.
Thanks a lot for your comments!
The thing about boilerplate code, which a lot (most?) of that excess is, is that you can ignore it.
But then there's all this code in your app you're ignoring. The language encourages you to ignore the code that you're working in.
There's another aspect to this: It's not just your code you're ignoring. The odds are you're building upon layer after layer of similarly boilerplate-ridden code. This is fine, until you have to figure out what's actually going on.
In Clojure, if you're calling a function, the odds are good you can figure it right out from that. Every now and again I find that there's one layer beneath the one you want—and these days I find that irritating and excessive, never mind the half-dozen, dozen or more layers you may have to dig down in the object-soup. (Digging soup layers? Mixed metaphor felony!)