begriffs

Good books for deep hacks

April 13, 2017

St. Jerome in his study

For the past few months I’ve been compiling a list of books for a deep dive into interesting technical topics. My theory is that projects based on these topics will be strong individual threads I can weave into epic hacks. This list is basically a curriculum for decades of learning about the wonders of computers.

What’s exciting about many of these books is how they draw on good ideas from history. Many cover technologies created in the 1990s and earlier, things that we’d do well to understand even while surpassing them. Much old software has had time to mature and has been tuned to be very effective. If a printed book is old but still accurate, that suggests the software it describes is well constructed.

I’ve also chosen books that cover alternative ways to do things. For instance, learning about document layout engines lets me compare them with the current DOM/CSS monoculture, and learning about other distributed version control systems lets me compare them with Git.

The books here are emphatically not about “cracking” coding interviews, or any other demonstrative brainteasing. It’s all about intrinsically interesting things. I’ve also omitted the usual suspects like SICP, TAOCP or CLRS – my choices are higher-level. They are guides for jumping into fun deep hacks.

Haskell

Let’s start here. I want a language to grow with, one with enough depth to offer years of learning. For me that language is Haskell. Depending on the hack, I’ll be using Haskell or C. Why mess with the things in between? (What’s up with everyone nowadays using a misbegotten child of the browser wars as their main language?)

Haskell compiles into fast code if you avoid some gotchas, and prevents classes of dumb bugs that nobody should have to worry about.

C

Sure, Haskell is great and its abstraction is rewarding, but you can’t beat the C language for intrinsic simplicity. The attendant tasks of manual memory management and concurrency may be complex, but there is certainly no hand-waving.

Profiling

Learn the measurements that are relevant to system performance, and how to design rigorous experiments to capture them.

Debugging

Stop guessing and flailing and instead use a systematic approach for finding bugs.

Relational Data Management

Talk about mature technology: SQL has evolved for decades as the world’s foremost declarative language. This selection of books covers SQL mastery along with a deep understanding of the problems of transactions and recovery solved by modern RDBMSs.

Networking

General Networking

These books cover the history and design of TCP/IP and the standard network layers. They talk about design choices, and new developments like IPv6.

Wireless Networking

The magic of radio… it’s a wonder of nature. From its simple spark gap origins to modern mesh networking, radio offers free lightspeed communication to all.

Delay-Tolerant Networking

Delay-tolerant networked programs are designed to work smoothly under an intermittent network connection. They often use a store-and-forward system in which nodes exchange traffic only when they are able.
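To make the store-and-forward idea concrete, here is a minimal sketch in C of a node’s outbox. Everything in it is hypothetical and chosen for illustration: the names `outbox_store` and `outbox_forward`, the fixed capacity, and the two stand-in transports `link_down` and `link_up`. A real system would persist the queue to disk and speak an actual protocol.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define OUTBOX_CAPACITY 8   /* hypothetical limit, for illustration only */
#define MSG_MAX 64          /* hypothetical maximum message size */

/* A node's outbox: messages wait here until a link is available. */
struct outbox {
    char   msgs[OUTBOX_CAPACITY][MSG_MAX];
    size_t head;    /* index of the oldest queued message */
    size_t count;   /* number of queued messages */
};

/* Store: queue a message locally. Returns false if the outbox is
 * full or the message does not fit in a slot. */
bool outbox_store(struct outbox *o, const char *msg)
{
    if (o->count == OUTBOX_CAPACITY || strlen(msg) >= MSG_MAX)
        return false;
    size_t tail = (o->head + o->count) % OUTBOX_CAPACITY;
    strcpy(o->msgs[tail], msg);
    o->count++;
    return true;
}

/* Forward: hand queued messages to `send`, oldest first, and stop at
 * the first failure (the link dropped). Undelivered messages stay
 * queued for the next attempt. Returns the number delivered. */
size_t outbox_forward(struct outbox *o, bool (*send)(const char *msg))
{
    size_t delivered = 0;
    while (o->count > 0 && send(o->msgs[o->head])) {
        o->head = (o->head + 1) % OUTBOX_CAPACITY;
        o->count--;
        delivered++;
    }
    return delivered;
}

/* Stand-in transports for trying the queue out: one that always
 * fails (no connection) and one that always succeeds. */
bool link_down(const char *msg) { (void)msg; return false; }
bool link_up(const char *msg)   { (void)msg; return true; }
```

Calling `outbox_forward` with `link_down` leaves every message queued, and a later call with `link_up` drains them in order. The point of the design is that the sender never needs to know whether the network is up at the moment it writes a message.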

The old reality of telephone modems and long distance costs made these programs tough and resilient. In today’s always-connected world of pocket surveillance devices it’s nice to have software that works offline.

- Email

Good old email, the original social network. As a successful interoperable world-wide communications standard that has lasted for decades, it should be a rich and instructive topic.

- UUCP and Usenet

These systems allow decentralized propagation of files and messages over several different types of physical connections and link layer protocols.

- Distributed Version Control

I’ve been using Git for many years and quite enjoy it, or at least am brainwashed by familiarity. It would be worthwhile to give other systems a try for comparison.

Chat / Instant Messaging

Before the proliferation of web-based companies competing to host, hoard, and mine organizations’ chats, there was IRC. Learn how to use it and how to operate a channel. Help keep an open internet alive.

For a more person-to-person chat experience with support for multimedia, there’s XMPP, a well-established open standard.

HTTP Reverse Proxy and Caching

Reverse proxies and load balancers have come up many times for me when working with web applications. I think it would pay off to learn them thoroughly.

Cryptography

Learn the building blocks of cryptography, and how/when to apply them as full cryptosystems. These books go deep but not in an overly proof-heavy way.

Privacy

Much of the geeky encryption mumbo jumbo is defenseless against the power of law. What are reasonable expectations for privacy, what is the current law, and how should we frame this issue for those unfamiliar with it?

Dates and Times

Whenever a coding task involves date or time processing I always mentally add a big bump to my cost estimation. That’s because we’re hurtling through a cosmos of spinning rocks that are simultaneously free-falling toward each other, whose very measurements of time and distance are a relativistic funhouse mirror. We make feeble calendar simplifications and smirk, “looks like somebody has a case of the Mondays,” while the infinitude of space rolls above.

Geographical Information Systems

Like measurements of time, measurements of space are complicated. However, the payoff appears to be big, with query systems like PostGIS able to plan routes and answer sophisticated spatial queries.

Unicode and Fonts

Amazingly, people have created a standard that can encode nearly all of the world’s writing systems. Learning about this should provide an interesting perspective on writing and language itself.
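As a small taste of what “encoding” means here, the sketch below turns one Unicode code point into UTF-8 bytes. UTF-8 is only one of the standard’s encoding forms, and the function name `utf8_encode` is my own, not taken from any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into `out` (room for 4 bytes).
 * Returns the number of bytes written, or 0 if `cp` is not a valid
 * scalar value (a surrogate, or beyond U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                       /* ASCII: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates are not characters */
        return 0;
    if (cp <= 0xFFFF) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* outside the Unicode range */
}
```

For example, U+20AC (the euro sign) comes out as the three bytes E2 82 AC, while plain ASCII passes through unchanged as a single byte.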

Parsing

Being able to parse languages feels like the stuff that wizards do. Those people. Thus far I’m constrained by the syntax devised by others, but creating my own would feel pretty magical.

Garbage Collection

Understanding the techniques of automatic memory management allows us to predict and tune this aspect of runtime performance of programs written in high level languages. For instance, Haskell uses a generational garbage collector with tunable parameters. Knowing the theory allows for reasoned tuning.

The Garbage Collection Handbook: The Art of Automatic Memory Management

Unix

The design of the kernel and tools. Plus, how to use OpenBSD, probably the best descendant.

Document Layout

Document layout engines are designed to specify exactly how a document should look on a fixed size page. There are a number of popular systems and comparing them should be interesting.

Application Layout

Application layout engines deal with organizing graphical user interfaces which must accommodate variable window and display sizes.

Seems like everybody’s unreflectively in love with the DOM and CSS. They even use bloatware like Electron in order to bring this beloved layout engine to the desktop. What are the alternatives?

Serial Communication

What with evil maid attacks and PoisonTap, I think it would be good to be educated about how USB really works. Plus it’s the way most devices connect.

Graphics

I would love to make impeccable graphics, choosing raster or vector appropriately, and using the best file format for the job. Really understanding how images are encoded and how to efficiently use open source editing tools would provide a lot of power for designing beautiful and usable documentation and ornamentation.

Text Editing

I’m pretty good with Vim, but my reliance on fancy plugins makes me think there may be basics yet to learn in the core program. Also the Emacs-based Org Mode looks like the textual Evernote killer.

Number Representation

How do you efficiently and accurately represent the arithmetic of the real numbers in a computer? The IEEE floating point standard has been called “one of the greatest achievements in computing,” so yeah, tell me more!
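A classic illustration of why this is hard, sketched in C and assuming the usual IEEE 754 binary64 `double`: the decimal fractions 0.1, 0.2 and 0.3 have no exact binary representation, so a naive equality test on their sum fails. The helper names and the tolerance factor of 4 are my own arbitrary choices, not a recommendation from any standard.

```c
#include <float.h>
#include <stdbool.h>

/* Does 0.1 + 0.2 compare equal to 0.3? With binary64 doubles it does
 * not, because each constant is rounded before the addition happens.
 * `volatile` keeps the compiler from folding the sum at build time. */
bool sum_is_exact(void)
{
    volatile double a = 0.1, b = 0.2;
    return a + b == 0.3;
}

/* Absolute value without libm, so the sketch builds without -lm. */
static double magnitude(double x)
{
    return x < 0 ? -x : x;
}

/* Relative comparison: the tolerance scales with the larger operand.
 * DBL_EPSILON is the gap between 1.0 and the next representable
 * double; the factor 4 is an arbitrary allowance for illustration. */
bool nearly_equal(double x, double y)
{
    double mx = magnitude(x), my = magnitude(y);
    double scale = mx > my ? mx : my;
    return magnitude(x - y) <= 4 * DBL_EPSILON * scale;
}
```

A relative tolerance like this suits values of similar magnitude; it is not a universal answer, since comparisons near zero usually need an absolute tolerance as well.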

The Human Side

Licenses and Law

Licenses capture people’s expectations for the behavior, development, and use of programs. Ultimately software exists for human beings, so this topic is very important. It’s also good to understand the implications of the terms and conditions attached to pretty much every commercial program and web site.

Estimation

I suck at estimating software development time! The reassuring thing is most people do. Think what a difference it would make to be able to formulate accurate confidence intervals for development time.

Code Review

Most of my experience with code review comes from the GitHub pull request workflow. However, I’ve heard people complain that it is too primitive. I’m curious to see other approaches and try other programs for the job.