Ask HN: What overlooked class of tools should a self-taught programmer look into

aequitas · 79 days ago

Makefiles. I always dismissed them as a C compiler thing. Something that could never be useful for Python programming. But nowadays every project I create has a Makefile to bind together all task involved on that project. From bootstrapping the dev environment, running checks/test, starting a devserver, building releases and container images. Makefiles are just such a nice place to put scripts for these common tasks compared to normal shell scripts. The functional approach and dependency resolving of Make allows you to express them with minimal boilerplate and you get tab completion for free. Sometimes I try to take more native solutions (eg. Tox, docker) but I always end up wrapping those tools in a Makefile somewhere forthe road since there are always missing links and because Make is ubiquitous on nearly every Linux and macOS it is just all you need to get a project started.

Example: https://gitlab.com/internet-cleanup-foundation/web-security-...

Running the 'make' command in this repo will setup all requirements and then run code checks and tests. Running it again will skip the setup part unless one of the dependency files has changed (or the setup env is removed).

shavenwarthog2 · 78 days ago

In 2019 Makefiles are a useful tool for automating project-level things. Too often webapps will require you to install X to install Y to run producing artifact Z. Since Make is old and baked and everywhere, specifying "make Z" is a useful project-level development process. It's not tied to a language (e.g. Tox) nor a huge runtime (Docker). Make is small enough in scope to be easy, and large enough to be capable without a lot of incantations.

The big downside of Make, alas, is Windows compatibility.

deng · 78 days ago

> The big downside of Make, alas, is Windows compatibility.

GNU Make works fine on Windows. The sources come with a vcproj to build it natively, or you get it from ezwinports. At my dayjob, we have a pretty complicated build with GNU Make for cross-compiling our application to Arm and PowerPC, and it works on Windows, even with special Guile scripts to reduce the number of shell calls which are extremely slow on Windows.

Const-me · 78 days ago

Most popular folder on Windows is "My documents", it has a space at least in some Windows versions. Make doesn't support such paths: http://savannah.gnu.org/bugs/?712

VBScript works better on Windows, IMO. Also works out of the box on all Windows versions since at least 2000 (on Win9x it was shipped with IE).

pinum · 78 days ago

>Most popular folder on Windows is "My documents"

Not really... not since XP, anyway. Unless you have a space in your username (which is a terrible idea for many other reasons), your "Documents" path is C:\Users\JohnSmith\Documents. "Program Files" is pretty much the only important path which is likely to have spaces, and your makefiles (hopefully!) don't need to touch that.

Const-me · 78 days ago

On Windows, it's not up to me to decide where users will keep my stuff, and where it will work. Users decide.

For a software to work fine on Windows, it must support spaces in files and paths. Also Unicode in files and paths.

VBScript does, GNU Make doesn't.

> which is a terrible idea for many other reasons

If you use make to setup stuff, it's very possible you'll need to access "c:\Users\All Users" which does contain space in username. Also "c:\Program Files (x86)\Common Files" which contain more than one.

WhiteOwlLion · 78 days ago

You can try the 8.3 convention,

DOCUME~1 Documents

or

<SYMLINKD> ALLUSE~1 All Users [C:\ProgramData]

deng · 78 days ago

> Make doesn't support such paths

That is entirely correct and really the most glaring downside of Make. In my opinion: If you have spaces in your dependency names, just stay away from Make as far as possible.

enesunal · 77 days ago

You wouldn't install Visual Studio 6 to XP easily; since it wasn't support spaces in "Program Files" directory :)

Const-me · 76 days ago

Long paths were introduced in Windows NT in 1993, VC6 released in 1998.

I’ve just installed Visual C++ 6.0 Professional on a WinXP VmWare machine. Took less than a minute, BTW — modern SSDs are awesome. The default installation path under program files also contain spaces, it’s "C:\Program Files\Microsoft Visual Studio\VC98"

BTW, they have a bug even on the very first welcome screen: http://const.me/tmp/vc6.png

voltagex_ · 78 days ago

You'll have WSL/WSL2 to work with, too. If not make, then CMake is now supported in Visual Studio (2017/2019) and works well.

>ezwinports

Interesting, hadn't run into this before. What's the advantage over MSYS2?

deng · 78 days ago

Eli is one the few free software veterans who exclusively works on Windows. His ports are excellent and native Windows binaries wherever possible ("native" meaning: no MSYS at all). It's not that "native" is always better, but it is good to have the choice. Especially w.r.t. GNU Make, I found the MSYS version to be very hard to reason with, since the additional MSYS path conversion makes things even more complicated than it already is...

baroffoos · 78 days ago

Does make files ever actually work that way? In my experience they always require you to install a bunch of packages for libraries which usually only tells you the package names for ubuntu so you have to hunt down what the package is called on your distro or if that version of the package is even in the repos.

kd5bjo · 78 days ago

If you write them yourself they certainly can. For small projects, you can leave everything explicit and it works great.

Can you break down your problem into a bunch of rules of the form “To produce this file, I need to run these shell commands, which read those other files over there as input”? If so, Make can take care of figuring out which steps actually need to be run.

boomlinde · 78 days ago

Most plain makefiles I've used use a tool like pkg-config to resolve library/header paths.

pixelrevision · 78 days ago

Will not be a problem for much longer :) https://www.theverge.com/2019/5/6/18534687/microsoft-windows...

shpx · 78 days ago

Hopefully Windows 11 is just a reskinned Ubuntu running all the old Windows programs through Wine.

mycall · 78 days ago

I'll be surprised if there will ever be a Windows 11.

pickdenis · 78 days ago

> The big downside of Make, alas, is Windows compatibility.

You'd have to give me a _very_ compelling reason to support developers who use Windows, when Windows lacks these essential tools. Besides, don't people who develop on Windows live in WSL?

ChristianGeek · 78 days ago

Nope. I develop in Python, Java, and Kotlin on Windows and never touch WSL. Make is available natively through Chocolatey (a Windows package installer), but I prefer Gradle.

(I also write code to run on Linux, but still prefer Gradle.)

noir_lord · 78 days ago

Slightly off topic but what would you suggest for someone who is familiar with build systems but who hasn’t used gradle?

I’m just getting into Kotlin and gradle isn’t something I’ve used before since I’m mostly web, .net til now.

OJFord · 78 days ago

Why don't you use WSL?

I can barely understand why you'd want to develop on Windows (ok, for non-Windows-only products) with it, but without it...

bunderbunder · 78 days ago

If you're already using a Vagrant or Docker-based development workflow, WSL doesn't really add much, and takes some things away. I/O performance, for example.

nickjj · 78 days ago

> If you're already using a Vagrant or Docker-based development workflow, WSL doesn't really add much, and takes some things away. I/O performance, for example.

I've been actively using WSL for over a year along with Docker and set up the Docker CLI in WSL to talk to the Docker for Windows daemon.

Performance in that scenario is no different than running the Docker CLI in PowerShell, or do you just mean I/O performance in general in WSL? In which case once you turn off Windows defender it's very usable. WSL v2 will also apparently make I/O performance 2-20x faster depending on what you're doing.

WSL adds a lot if you're using Docker IMO. Suddenly if you want, you can run tmux and terminal Vim along with ranger while your apps run nice and efficiently in Docker. Before you know it, you're spending almost all of your time on the command line but can still reach into the Windows cookie jar for gaming and other GUI apps that aren't available on Linux and can't be run in a Windows VM.

bunderbunder · 78 days ago

I find that it depends a lot on what you're doing. The real problem with WSL is I/O latency.

It's acceptable for relatively infrequent file access, but will eat you alive if you're doing anything that involves lots of random file access, or batch processing of large sets of small files, or stuff like that.

nickjj · 78 days ago

I just haven't seen that as a problem in my day to day as a developer working with Flask, Rails, Phoenix and Webpack.

That's dealing with 10k+ line projects spread across dozens of files quite often, and even transforming ~100 small JS / SCSS files through Webpack. It's all really fast even on 5 year old hardware (my source code isn't even on an SSD either).

Fast as in, Webpack CSS recompiles often take 250ms to about 1.5 second depending on how big the project is and all of the web framework code is close to instant to reload on change. Hundreds of Phoenix controller tests run in 3 seconds, etc..

myoon · 78 days ago

It isn't perfect. The IO performance is currently poor and it doesn't play well with Windows Defender (wastes a lot of CPU). Also, since your IDE would live in Windows, you can sometimes have issues with Windows and Linux both interacting with the same files.

nimnio · 69 days ago

More developers are coding in Windows than any other operating system -- almost more than Mac and Linux combined. The Hacker News filter bubble might lead us to believe otherwise.

https://insights.stackoverflow.com/survey/2019#technology-_-...

Windows 47.5%

MacOS 26.8%

Linux-based 25.6%

BSD 0.1%

87,851 responses

(The Stack Overflow survey is a poor representation of the entire development community, but it's worth something, maybe the best we have.)

astrobe_ · 78 days ago

I've compiled a thing or two with MSYS2.

voltagex_ · 78 days ago

Developers in corporate environments?

VS2019 supports Clang and cmake now.

logicallee · 78 days ago

>The big downside of Make, alas, is Windows compatibility.

Isn't the big problem that you have no idea what it's doing to your system? Also that you aren't expected to be able to undo it. You can read the makefiles, of course, but it seems simpler not to have to. (Just update the necessary packages yourself, to the latest version.)

Forgive me if this is naive of me.

theon144 · 78 days ago

>Isn't the big problem that you have no idea what it's doing to your system?

As opposed to what exactly? Any other alternative, e.g. separate shell scripts, "npm run" scripts in package.json, running a Docker image, hell even cmake or other make-like tools - does stuff you don't know about without reading the files either.

aequitas · 78 days ago

With Docker at least everything is contained in the container. Which makes isolating and resetting environments a breeze. Something I worry about often is contaminating my system's 'state'. Which always leads to broken builds or incomplete build systems because a missing dependency is not spotted on your system because it was installed by some other tool some other time.

I tend to write my Makefiles to create as much of a local dev environment as possible for every project. Using Python virtualenv/Pipenv/Poetry, Ruby vendored dirs, custom Gopath per project (using direnv), etc. Most tools support some sort of isolation/localisation, but it's often just not on by default.

jrumbut · 78 days ago

I wish more tools did this, I almost always want a local, self-contained environment for everything. The few times I don't actively want I don't see much pain in having one. A couple minutes setup time, maybe?

I have seriously considered hiring someone to audit and prune all the random little libraries and tools I've installed over the years for that one-off time I had to process a weird file format or wanted to try something from HN.

pknopf · 77 days ago

To keep my system clean, I use Darch.

https://godarch.com/

Every boot is a fresh install. Any one-off only becomes persisted unless I add it to my recipes.

https://github.com/pauldotknopf/darch-recipes

alephu5 · 78 days ago

Maybe you'd like NixOS?

jrumbut · 78 days ago

I'm fascinated by NixOS and am following it, just haven't had a lot of time to dive in yet.

It does sound like the right idea. This is hypocritical as someone who doesn't use it but I hope more people use it.

logicallee · 78 days ago

(I was heavily downvoted). What I was thinking is: as opposed to just running your built-in package manager yourself, to upgrade your system to the latest version of all the packages it might require.

wtetzner · 78 days ago

Makefiles are used for more than package management. In fact, it doesn't seem very common to use them for package management. Maybe I'm missing something?

PyroLagus · 77 days ago

I think they were talking about projects that tell you to run `make install`, which I agree is less than ideal.

bch · 78 days ago

If you write your own Makefiles you know what they do. They’re not that hard to grok and even hand-rolled Makefile use is (IMO) underrated.

dnautics · 78 days ago

I always had trouble reading makefiles because the control flow is not very linear. At least with shell files basically everything is explicit.

sk5t · 78 days ago

The dependency graph is one of the better reasons to use makefiles--think of the nonlinearity as a bonus!

aequitas · 78 days ago

I often find the non linear way of working with Make an advantage. Since it allows you to break a big piece of procedural shell code with lots of control flow (if X is installed don't install again, etc) into small self contained functional pieces with clean input and output boundaries which can be run individually. It also greatly improves code reuse as every target/recipe can be considered a function.

dnautics · 78 days ago

> Since it allows you to break a big piece of procedural shell code with lots of control flow (if X is installed don't install again, etc) into small self contained functional pieces with clean input and output boundaries which can be run individually

At that point, I use a scripting -not shell- language (which is not as implicit)

I don't program in c anymore, so my major workflow is all in one language; I write my server in the same language that do s the compilation, which is in the same language that does utilities like creating network tunnels to my lab nodes

m463 · 78 days ago

I hate makefiles.

That said, I wholeheartedly agree with the comment.

It's sad that something so central to a project and so useful and important to so many people seems like it hasn't advanced ... ever.

Developers generally do the minimum with Makefiles and get out. They are similar to 1040 forms in popularity.

I've always had a dream of a redesigned "make" system... with import statements, object oriented rules, clear targets, rules and files clearly seperated, structure and organization... sigh.

aequitas · 78 days ago

I can agree with your sentiment. I often sought alternatives for Make because some things are just missing. However I always end up with a Makefile because Make is just so basic and ubiquitous. Any big change or alternative to Make and you will loose that. That's why I try to ovoid newer Make features. So for me Make not advancing is actually a feature. As it is one of the few things I can depend on to stay the same.

chriswarbo · 75 days ago

I find Nix to be a really nice alternative to Make; although it's also quite heavyweight and "invasive" (it's not just self-contained binary like `make`), so it's more a case of "I'm already using Nix to install dependencies, why not use it for orchestration too?"

alexhutcheson · 78 days ago

Bazel is pretty nice.

dnautics · 78 days ago

Bazel is great until you have to install tensorflow from source into a container and are sitting wondering why you have to put and configure a JVM inside a container-destined-to-be-static-binary, temporarily to get a python program with no Java bindings installed.

As I'm not the most intelligent developer, I'm sure there's a better, more sophisticated way to do this but I got really frustrated and gave up.

imtringued · 77 days ago

Why not use multiple stages in your docker file?

dwrodri · 78 days ago

I'm sure bazel is a good tool when it is used properly, but as a greenhorn in the tech field, the constant version mismatching between bazel and tensorflow can become quite the pain when you have to build tensorflow from source.

ptasci67 · 78 days ago

I will +1 that. I like Bazel a lot because it really forces you into a singular way of building a project which is clean and nice.

That said, mileage may vary. It was originally built by Google so it has quirks. I find it is best suited for projects with compiled dependencies and large repos.

Otherwise, I was going to add that Gradle as a build system is very advanced and improves upon make in many ways.

sanderjd · 77 days ago

Oh whoops, I basically wrote the same comment without reading yours. In any case, here here!

meuk · 78 days ago

> It's awesome that something so popular is so well engineered that it doesn't need changes.

There, fixed it for you. In particular, I'm glad that the OO cancer hasn't spread to something as basic as a build system.

m463 · 75 days ago

What I meant in this respect is that a lot of makefile rules have a lot of commonality - it would be great to inherit a workhorse rule and tweak an option instead of having to copy/paste a rule or twiddle makefile variables (which aren't normal programming language variables)

x0hm · 77 days ago

> Doesn't know how to write OO code. > Calls OO cancer.

uijl · 78 days ago

I really like this tutorial to get into Make files: https://swcarpentry.github.io/make-novice/.

d0mine · 79 days ago

fabfile.py (Fabric) could be used as a Makefile in Python. If you don't ever need to ssh to other machines ti run your tasks, you could use pyinvoke library directly (tasks.py). https://www.fabfile.org/

It is easy to add command line arguments to the tasks, configure them using files (json, yaml), environment variables, to split the task definitions into several modules/namespaces.

bb88 · 78 days ago

Having used fabric in the past, I've always found it just as easy to use a shell script and make files.

There's always some level of bootstrapping a project (installing packages/software, compiling libraries and dependencies) where it's easier to just to write a shell script than to program python to do. E.g. How do you get fabric installed on a system?

There's also been this longevity of sorts that Make seems to have gotten right. People just keep going back to it because it's simple.

theossuary · 78 days ago

I've been moving away from using shell scripts in a tools/ directory to using Python Invoke (http://www.pyinvoke.org), which is the library underlying fabric.

I used bash scripts for years, but for a lot of reasons made the switch:

- It was always painful to create small libraries of functions used across multiple scripts in a project

- It's difficult to consistently configure them with per-user settings. I've written bash implementations of settings management, Invoke handles this for me.

- I'd still have to reach for Python whenever I needed to do anything with arrays or dicts, etc.

- Getting error handling correct can be a chore

Invoke has a lot of nice to haves to:

- Autogenerated help output

- Autocomplete out of the box

- Very easy to add tasks, just a Python function

- Easy to run shell code when needed

- Very powerful config management when needed

- Supports namespacing, task dependencies, running multiple tasks at once and deduplicating them

It's not perfect, but it's a lot better than my hand rolled scripts were.

noobermin · 78 days ago

Groxx replies hits the point. I might work with a small number of platforms, but the "super-simple" qualifier is the point. The point at which you need dictionaries (associative arrays) in your install script, not to mention settings management beyond a make include is the point at which you've outgrown make.

Groxx · 78 days ago

it's also far, far, far easier to make it work predictably on multiple platforms. and easier to understand and change later. that can get nightmarishly hard in make/bash, once you go outside the super-simple realm.

anewhnaccount2 · 78 days ago

Snakemake is a quite nice improvement on make for data munging stuff.

kabacha · 78 days ago

Wow I've never seen such bloated python project before. It has 10 dependancies with 2 additional optional dependancies and the introduction/tutorial is absurdly overspecific.

The first example they use to describe the tool is:

> Cufflinks is a tool to assemble transcripts, calculate abundance and conduct a differential expression analysis on RNA-Seq data. his example shows how to create a typical Cufflinks workflow with Snakemake. It assumes that mapped RNA-Seq data for four samples 101-104 is given as bam files.

This is epitome of non-programmer programming, colour me disappointed.

anewhnaccount2 · 78 days ago

And yet it's still less crufty than the "by hackers for hackers" GNU Automake and less over engineered than the "made by real professional programmers at a real big tech company" Luigi. Would love the hear if you have any suggestions for actual alternatives for doing this type of automation beyond what make can neatly deal with rather than just going "eww.. it has dependencies"; "eww.. it's made by bioinformaticists".

pletnes · 79 days ago

One of the worst problems with using windows (in my opinion) is that there’s no native GNU make.

deng · 78 days ago

> One of the worst problems with using windows (in my opinion) is that there’s no native GNU make.

GNU Make even comes with a vcproj file for building a native binary with Visual Studio. Worked fine for me. Building it with Guile support though is difficult, but fortunately Eli Zaretskii provides native binaries through his ezwinports, and they worked pretty much flawlessly for me. Of course you will usually need a shell to execute recipes, but Make itself runs natively. For more information, see README.W32 in the sources.

vips7L · 78 days ago

Scoop has it in their repos as well (the gow package).

shadowfox · 78 days ago

There are a number of ports of GNU Make to Windows. MSYS2 [1], for examples, provides a reasonable development environment that includes Make.

If you just want a Make, there is [2] which can be installed separately and is part of the GNUWin32 collection.

[1] https://www.msys2.org/ [2] http://gnuwin32.sourceforge.net/packages/make.htm

aequitas · 78 days ago

Isn't non-native development on Windows a solved problem nowadays with WSL(2)?

murderfs · 78 days ago

WSL is currently horrendously (unusably, IMO) slow. WSL2 promises a 20x speed up, but it was already 100x slower than native Linux at some actually-realistic workloads that happen all the time when you're developing (e.g. `git grep`), so it's probably still too slow to be tolerable.

I had the opposite problem of wanting to develop some stuff for Windows from a Linux environment, and I settled on running a linux VM and copying binaries over by scping to WSL, which works reasonably well.

pletnes · 78 days ago

A nice thing with WSL is that you get working make and rsync. But I would like make for coding on native windows. Many FOSS projects use Makefile as the parent post described.

justincormack · 78 days ago

Windows does ship nmake but it is a little different.

quickthrower2 · 78 days ago

I develop on Windows, and I like make as a lazy default you can just type in and as long as you maintain the make file, it will build the thing.

It is also a nice document of what you can build and how.

I also like it because Netlify supports it, so you can get it to run make to deploy your site when you push a commit, giving you a lot of control about your CI, while keeping it simple.

trendoid · 78 days ago

Any good resources for learning about make files that you can recommend?

aequitas · 78 days ago

I don't really know a good all in manual, most thing about Make I learned over years of using it from different sources. And I still sometimes discover new features (and new ones are still added in recent release, but I tend to avoid them to keep Makefiles compatible on older systems).

But the Make manual is pretty comprehensive as a guide and reference: https://www.gnu.org/software/make/manual/make.html

Also (as with most things) knowing what name some concept has makes it easy to search for references. For example the terminology of rules (target, prerequisite, recipe): https://www.gnu.org/software/make/manual/make.html#Rule-Synt...

Things I tend to google often because I forget and some are used more often than others are: automatic variables, implicit rules, conditionals and functions.

One trick that really helps making Make complete is making your own pseudo state files and understanding the dependency system. One of the best features of Make is its dependency resolving. You generally write rules because you want a target (a file or directory) to be created, based on prerequisites (dependencies) according to a recipe (shell scripts). Make figures out that if the prerequisites didn't change, it doesn't need to run the recipe again and it will reuse the target. Greatly saving on build time.

Because Make relies on file timestamps to do its dependency resolving magic if you don't have a file there is not much Make can do. So what you can do instead is create a pseudo target output yourself. For example: https://github.com/aequitas/macos-menubar-wireguard/blob/mas... Here a linter check is run which creates no output. So instead a hidden file .check is created as target. Whenever the sources change the target is invalidated and Make will run this recipe again updating the timestamp of .check. Also note the prerequisites behind the pipe (order-only prerequisites). These don't count toward the timestamp checking, but only need to be there. Ideal for environment dependencies, like in this case the presence of the swiftlint executable.

gilmi · 78 days ago

Matt Might's article is really good:

http://matt.might.net/articles/intro-to-make/

messe · 78 days ago

Worth noting that that's an introduction to GNU make, which, while the most common implementation, isn't the only one out there.

deng · 78 days ago

The GNU Make manual is excellent.

For learning advanced techniques: "The GNU Make Book" by John Graham-Cumming.

antipaul · 78 days ago

https://github.com/pavopax/gists/blob/master/makefile-quick-...

ashton314 · 78 days ago

This video helped me:

https://www.youtube.com/watch?v=fkEz_oVh0B4

aequitas · 78 days ago

This is a nice video. The only thing I'm missing that should be covered imho (as you will encounter it even if you don't use it) is implicit/pattern rules: https://www.gnu.org/software/make/manual/html_node/Pattern-R...

snaky · 78 days ago

"GNU Make Book" by John Graham-Cumming

https://nostarch.com/gnumake

RangerScience · 78 days ago

We're doing this, and I mostly love it. I haven't found a great way to do code re-use across projects yet, and I'm not super happy with the Make function syntax (but, maybe if it needs a function, I should turn it into a shell script that itself is called by the Make command...).

All in all tho, it's a fantastic place to write down long CLI commands (ex: launching a dev docker container with the right networking and volume configurations) that you use a lot when working on the project.

Our Jenkins pipeline also relies on the Makefiles, literally just invoking `make release`, which is also pretty awesome.

aequitas · 78 days ago

When using it in multiple projects and CI you also tend to develop some kind of Developer-API with common commands/targets. No matter what kind of project you run you always use the same target names to get started. No remembering which tool is used for this lanuage, just clone it, run `make` and you're off, `make test` to test, etc.

Make does support includes (https://www.gnu.org/software/make/manual/html_node/Include.h...) which allow for some form of code reuse across projects. But then you encounter the balance between DRY and clarity. There are always exceptions, so you try to make stuff to universal, but then its hard to grok the code. And I feel that if I start to use functions I'm using Make wrong and that kind of logic better fits in shell scripts called from the Makefile. Makefiles (the way I use them at least) should be simple to read and explain themselves. But it's often hard to balance this with the features Make provides, like implicit rules and automatic variables. And if I ever turn to generating Makefiles (other than for C projects where it kind of expected) I will probably retire.

RangerScience · 75 days ago

> common commands

Oh absolutely. It's fantastic for that. Our build pipeline actually relies on that; every project has a "release" target that is basically for the CI to use.

> Make includes

Yeah, I looked into that, and I think I had the same conclusion.

> scripts called from the Makefile

That's what I'm thinking is the way to level up this kind of system. Although then, why have `make init` instead of just `./bin/init` ?

aequitas · 75 days ago

The biggest reason I use Make is the dependency resolving.

In the `make init` example. It doesn't matter how many intermediate steps are involved `init` is the end-state I want to achieve. So in most of my Makefiles the `init` target will fan-out into requirements as wide and deep as it needs, including running apt to install missing system dependencies. But then the good part. If a dependency is already fulfilled Make won't have to run it again. Although sometimes its hard or clunky to convert some dependencies into 'files' so Make can do its dependency resolving work properly.

ddebernardy · 78 days ago

Have you ever considered using Rakefiles instead?

https://github.com/ruby/rake

aequitas · 78 days ago

Never wrote them myself but have encountered them sometimes. Had no major issues with them then I believe. However I would probably write a Makefile to manage the Ruby environment and install Rake as they don't come installed by default.

novaleaf · 78 days ago

self taught dev here too. I have never used make files, but pretty sure I'm using NodeJs in a similar role. I use it to automate all my "scripting", including deploying of my SaaS product to the cloud and running unit tests.

If it sounds interesting, check out https://www.npmjs.com/package/shelljs

PS: I do my primary development on windows, but my production environment is ubuntu. node apps "just work" on both environments. truely cross platform.

lowlevel · 77 days ago

I second this... a lot of times, broken make files are standing between you and victory, so it would be good to at least have some familiarity with them.

kitd · 78 days ago

I put Make in the same class as Vi. I hate using them but I have to learn them because they're the least of N evils, the most pragmatic way out of a hole.

escrichov · 78 days ago

I also use similar Makefiles in my projects. I use "make release" to generate the docker container.

sanderjd · 77 days ago

I love Make in concept and kind of hate it in practice. There is sooo much incidental complexity and so many warts to work around. I think it's a concept that is ripe for a new approach that thoughtfully keeps the good, ditches the bad, and maybe even adds some useful capabilities that aren't already there.

But of course I'm immediately skeptical of this idea a la https://xkcd.com/927/ (Standards). For instance, maybe this is what npm and all the rest thought they were doing. Certainly Rake in the ruby world tried to do this, and I never really liked it, so clearly they missed the mark somehow, at least for me. But then when I feel discouraged about the ability to improve on things, I think about how I felt this way when I first heard about Git. Why would you implement a new source control system when we already have subversion? Sure, svn has its frustrations and warts, but this new thing is just gonna have its own frustrations and warts and now we'll just have another frustrating warty thing and we haven't really gained anything. And this is totally true! Git is super frustrating and warty. Except that it's also way better than subversion, much faster and far more flexible. It was a revelation when I started using it. So I think back to Linus when he was thinking about creating git and think that he probably didn't have this discouraged uncertainty about improving things; he just had ideas for a better way and he went out and did it. (And yes, I know it was influenced by bitkeeper and other DVCs exist, so it's not like he invented the concept, but my point stands.)

So maybe someone could make a better Make?

majkinetor · 75 days ago

On Windows there is great Powershell module Invoke-Build.

noobermin · 78 days ago

Makefiles are so old and quaint, why not use "{flavorofthemonth}".format(flavorofthemonth=np.random.choice(frameworks)) ?

Robin_Message · 79 days ago

Read the curriculum of an undergraduate computer science course and read up on the things you haven't heard of. Some courses will even have lecture notes available.

E.g. these four pages are the university of Cambridge masters in computer science:

https://www.cl.cam.ac.uk/teaching/1819/part1a-75.html

https://www.cl.cam.ac.uk/teaching/1819/part1b-75.html

https://www.cl.cam.ac.uk/teaching/1819/part2-75.html

https://www.cl.cam.ac.uk/teaching/1819/part3.html

(Or a MOOC, but the links above are easy to browse text, syllabuses and lecture notes, not a load of videos.)

learc83 · 78 days ago

I support this 100%. I worked for years as a self-taught programmer. When I went back for my CS degree, I was shocked at how much I didn't know that I didn't know.

Numerous times I'd be sitting in a class and we'd go over a solution to some theoretical problem, and I'd realize that this solved a problem that had taken me days to discover on my own (and this solution was usually better than what I'd come up with).

If you are the kind of person who can work through everything on your own (including what may seem like the the boring parts), I highly recommend doing so.

147 · 78 days ago

Could you give an example of one of the times something theoretical helped you solve a real world problem?

I've thought about going back for my CS degree a lot but can't really justify the cost and time investment vs self teaching. But it's something that's always been in the back of my mind.

phaedrus · 77 days ago

Not the OP, but I too was a self-taught programmer as a teen who got a CS degree in my 20s. I independently came up with the idea binary search in sorted data structures. But the first time I encountered hash tables in the course of getting my CS degree my reaction was "That's impossible! You can't get O(1) efficiency!"

(Sadly though this exposure was not in the context of a theoretical course on data structures, but rather in the context of reading the docs for HashMap as my university dropped older courses and languages to jump on the bandwagon of becoming a "Java school".)

learc83 · 77 days ago

Sure. It's been a while, but the first one is that comes to mind is when we went over Floyd's Tortoise and Hare cycle detection algorithm. I realized it was a much cleaner solution to detecting cycles in a linked list than a solution I developed on my own over several days.

Another example: the automata class I took went over pushdown automata, and I immediately saw that it would solve the issues I'd been having with a finite state machine I was using to handle input for a game.

Oh and recently I needed to put different sections of a screen on different layers so that no 2 adjacent sections were on the same layer. I realized that this was basically just graph coloring, so I was able to find a solution in minutes instead of hours.

I'm sure their are people who can get through most of a CS curriculum on their own, but I'm not that disciplined. I've also never met anyone who was. It has been immensely helpful.

JoeMalt · 78 days ago

To clarify: the first three links are for each year of the (three-year) undergrad program, the fourth is for the Masters.

The Cambridge course isn't perfect, but they do a very good job of making as much teaching material as possible publicly available.

saagarjha · 78 days ago

FWIW, I've found many undergraduate computer science courses to lag behind on tooling, so take the recommendations they have with a grain of salt.

fredley · 78 days ago

The Cambridge course is much more theoretical than most others, afaik. Tooling on programming language semantics, for example, doesn't change that much.

samrat · 78 days ago

Do Cambridge courses not have labs/projects? I looked at the course materials on a few of the courses and couldn't find any. Or are they given out to students separately?

longer_arms · 78 days ago

There are hardware and software labs, which are administered on paper by PhD students. These include(d): ML (the functional programming language), FPGA/soft core development, Java tasks, breadboarding some logic, prolog and probably some different ones now (looks like some machine learning tasks?). Some of them are referenced and described on the links above. There's also a group project in year 2, a dissertation individual project in year 3, and a small holiday project between 1 and 2. Overall, a few students get through it without being able to properly program, but most basically self teach.

superlopuh · 78 days ago

There is a system of supervisions, that is a bit like doing homework and going over it in a private (1/2/3 students to one prof) lesson once every two weeks. Sometimes the questions would be standard for a course, sometimes the professors chose their own. They are not necessarily directly tied to the course as lectured.

rat9988 · 78 days ago

Thank you very much. This is very valuable.

hackerc · 78 days ago

Does an offline copy of this exist? Do you think it will go down when the term probably ends soon?

mwilliamson · 78 days ago

I would expect the material stay up: at the moment, everything back to 1998/1999 is still accessible:

https://www.cl.cam.ac.uk/teaching/material.html

samwyse · 78 days ago

Do you know about curl/wget? Each one does pretty much the same thing as the other, but you can start a religious war by suggesting that one is preferable.

Anyway, either of them will let you mirror a website so you never have to worry about it going down.

bigiain · 78 days ago

And, since every Unix command line tool inevitably gets mined and turned into a web service, you could always submit those urls to archive.org instead of or as well as curl-ing/wget-ing them.

voltagex_ · 78 days ago

https://github.com/ArchiveTeam/grab-site if you're super serious. Also archive.org will probably accept those output warcs.

jtolmar · 78 days ago

1. Profiler. There's a standard tool that tells you what part of your code is slow. Over half the time it'll find something dumb and easy to fix instead of whatever you expected.

2. SQL / relational database schemas. Persistence opens up a lot of capabilities. And databases themselves are very well-optimized; if you do any nontrivial data manipulation it's likely that whatever the query planner comes up with will be faster than your first idea of how to do it by hand.

3. Graph searches. An awful lot of problems can be solved by knowing how to turn problem into a graph search. Make sure not to fall into the trap of thinking a graph search is limited to paths through space - you can solve problems like "get through this dungeon with keys and doors" by adding duplicate nodes for the different states.

4. Sequential Bayesian Filters. Are almost as useful as graphs, but aren't in a standard CS curriculum so you'll look like a wizard. These solve the problem of "I want to know a thing and I know how it changes over time, but I only can get rough estimates for its current state." Kalman Filters are simple and give great results when applicable. Particle Filters have lower quality but are applicable to more problems and dirt simple to code.

rassibassi · 78 days ago

Support for 4! Yet, my understanding is that particle filters are superior but computational more demanding. For nonlinear problems, the extended Kalman Filter linearizes the task, whereas particle filters don't and work with many point estimates instead.

I loved this book: https://users.aalto.fi/~ssarkka/pub/cup_book_online_20131111...

and also Thomas Schoen group does great work on Sequential Monte Carlo (SMC), MCMC for sequential data :) http://user.it.uu.se/~thosc112/index.html

They are also building a probabilistic programming language for sequential data! https://github.com/lawmurray/Birch

jtolmar · 77 days ago

Regular old Kalman Filters are the best (literally perfect) when your problem fits all their requirements. They also have a lot of nice properties if you're dealing with a problem that mostly fits their requirements. But the linear-gaussian requirement is pretty steep, they don't always work.

I don't like the EKF much and prefer the UKF. The core filtering code is a little more complex but they're much easier to actually work with; you can give them arbitrary functions like a particle filter.

Particle filters have the advantage of being able to handle arbitrarily wacky distributions. But they are random and do some wacky things in edge cases. They'll behave much more poorly in low-evidence situations than other filters will. And they fall over spectacularly if you switch from low-evidence to high-evidence (there's a workaround for this but it's still counterintuitive). Finally they're just more computationally expensive than the others.

Birch sounds interesting, I'll take a look.

novaleaf · 78 days ago

strongly agree on profiler and SQL.

Horror story re SQL: in my SaaS I skipped sql and went with cloud datastore (NoSql) and regret it. basically (to simplify) you can't query your data without doing a full table scan (IE Slow).

alex_57_dieck · 77 days ago

NoSql is not no sql though..

gwbas1c · 78 days ago

Unit testing, mocking, and various other testing techniques.

Why? Any project of sufficient complexity is very hard to test. If all you're doing is code -> build -> run to debug your code, you can very easily break something that's not in your immediate attention.

The problem is that good unit testing is hard, and time consuming. It can be so time consuming, that unless you can really plan in advance how you test, you could spend more time writing test code than real application code. (This is what happens when writing professional, industrial-strength code.)

So, when a hobby project becomes sufficiently interesting enough; such that the code will be so complicated that your code -> build -> run loop won't hit most of your code, you should think about how to have automated tests. They don't have to be "pure, by the book" unit tests, but they should be an approach that can hit most of your program in an automated manner.

You don't need to do "pure" mocking either. If you're writing something that calls a webservice, you could write a mock webserver and redirect your program to it. If you're writing something that works with pipes, you could have a set of known files with known results, and always compare them.

The goal is that you should cover most of your program with code -> build -> tests; and only do code -> build -> run for experimentation.

wpietri · 78 days ago

Let me second this. And in particular, I strongly encourage every developer to try starting a new project in a test-driven fashion (by which I mean that you advance the code by writing a bit of test and making it pass, and then doing that over and over.)

There's a qualitative difference between working in a well-tested code base that's very hard to describe convincingly. A lot of my early development experience was in code bases that had little or no testing. Experiencing a well-tested code base totally changed things for me. Instead of work being a death-of-a-thousand-cuts experience, it became pleasant, steady progress.

mooreds · 78 days ago

> Experiencing a well-tested code base totally changed things for me. Instead of work being a death-of-a-thousand-cuts experience, it became pleasant, steady progress.

I had the luxury of taking a well known data process and rewriting it with integration tests (input in, matching output with a golden file). It changed my professional life. Whereas before our deployment process included a 3 day wait and manual data checking on stage, after I was able to do deploys multiple times a day with confidence.

Made a believer out of me.

JamesBarney · 78 days ago

Unit tests can give you false positives (test failed but code is correct) and false negatives (test passed but code failed).

And TDD seems to create so many tests that you get huge false positive rates. I recently jumped on a project and I made a couple of fairly small code changes (a couple of hours) which caused 100 tests to fail. I then spent the next two days going through and correcting all 100 tests none of which found an issue in my code.

wpietri · 78 days ago

If you're saying that it's possible to do testing badly, I agree, just like it's possible to write production code badly. Sometimes teams new to unit testing do it ritualistically, without really understanding the purpose. That can lead to all sorts of bad outcomes. E.g., lots of tests that look impressive and even generate good coverage numbers, but don't really test what matters. Or tests that are highly duplicative, such that changing one thing in the code requires changing a lot of things in tests.

I have definitely dealt with code bases like that, and that sucks. But I have also dealt with code bases where the tests were great, and that's an amazing experience.

To do TDD well, I think it's important to release early and often and to reflect on one's experience (e.g., with weekly team retrospectives). That way if people are doing something unhelpful, like writing very duplicative tests, pretty soon they'll become an impediment to progress. The team will learn to write the useful tests, while skipping the ones that might fit some hypothetical pattern. It also helps people learn to design for testability; often, painful tests are a sign of bad design of the production code.

andrei_says_ · 77 days ago

What are some resources for “good testing”, test boundaries, and possibly antipatterns?

(Ruby, Rails)

JamesBarney · 77 days ago

I've read a couple TDD books and this definitely seems to be a big blind spot. How to deal with the maintenance issues of unit tests.

They all seem a little fanatical in their pro unit test talk and don't discuss the downsides.

chriswarbo · 75 days ago

I find https://www.youtube.com/watch?v=EZ05e7EMOLM describes my own experiences with automated testing quite well.

tl;dr:

- Focus on "automated testing", don't get obsessed with philosophising about "the true nature of a 'unit'", or other such dogma.

- Be empirical: base your rules on what works; don't base your work on "the rules".

- The goal of testing is to expose problems in our program: "test failure" is a success, because we've found a problem (even if that problem is with the test!). Anything else is secondary (e.g. isolating the location of failures, documenting our API, etc.). Avoiding this goal defeats the point (e.g. choosing to ignore edge cases).

- Focus on functionality rather than implementation details, e.g. 'changing a user's email address' rather than 'the setEmail method of the User class'. This improves reliability and makes failures more useful/meaningful (i.e. "this feature broke" vs "this calling convention has changed").

- Mocking is a crutch: it works-around problems that can usually be avoided entirely during design; it can still be very useful when a design can't be changed (e.g. adding tests to a legacy system).

- Testing a real thing is objectively better than testing a fake thing; we should only mock if testing the real thing is unacceptable.

- If two components always exist together, pretending that they're independent is a waste of time and complexity.

- Having some poor tests is better than having no tests. Tests can be added, removed and improved over time, just like anything else.

- "Property checking" is a quick way to find edge-cases and scenarios we wouldn't have thought of.

- Fast feedback loops are important. Reducing I/O and favouring pure calculation usually speeds up testing more than reducing the number or size of tests (e.g. "unit" vs "end-to-end"). Incidentially, this is also how we avoid having to mock.

nostrebored · 78 days ago

The type of engineers who would screw up 100 unit tests independently are exactly the kind of engineers who should be forced to write tests for their code. Can you imagine the integration tests had they not been doing any testing at all?

ryanmclovin · 78 days ago

Does that indicate that the tests were not written correctly in the first place?

JamesBarney · 77 days ago

I don't think so. They probably could have been written better, they weren't written poorly, but it's really hard to write 200 unit tests for a feature that don't break when the feature is updated.

lenticular · 78 days ago

This is the gospel truth. It does take discipline though, because writing tests sucks. I like to have a policy of never committing a non-trivial function without a test. That way, I can never put it off and wind up with a huge chunk of untested code.

epiphanitus · 78 days ago

Are there any resources out there you would recommend for learning testing techniques in a Python context?

sn9 · 78 days ago

TDD with Python: https://www.obeythetestinggoat.com/

heyoni · 78 days ago

Is that useful if you’re never writing django apps?

a_c · 78 days ago

It depends on your background. Having written web app before let's you quick grasp the ideas laid in the book.

To me, the most important chapters are

- https://www.obeythetestinggoat.com/book/chapter_mocking.html - https://www.obeythetestinggoat.com/book/chapter_purist_unit_...

Having said that, the concepts are universal.

bklaasen · 77 days ago

Brian Okken's "Python Testing with pytest"[1]. More recent than Harry Percival's book.

[1] https://pragprog.com/book/bopytest/python-testing-with-pytes...

crimsonalucard · 78 days ago

You are completely wrong.

Mocking is a huge design smell. The more mocks or integration tests your projects requires to get full coverage the less modular your program is. A program that uses many mocks is a sign of very very poor design. You will find the code more complex to reason about and much harder to reuse code without necessitating a lot of glue code to make things work together. Without proper knowledge you won't even know the program is poorly designed.

I will grant you that 90% of programmers out there don't know how to design programs in a truly modular way, so most engineering projects will require extensive mocking. In fact most engineers can go through their entire career without knowing that they are making their programs more complex and less modular then it needs to be. Following certain design principles I have seen incredibly complex projects require nearly zero mocking (very very rare though).

Mocking indicates a module is dependent on something. Dependency is different from composition.

     Dependencies                                Composition


           C                                        C
 +---------------------+
 |                     |       +----------------+       +-----------------+
 |     A               |       |                |       |                 |
 |                     |       |                |       |                 |
 |        +----------+ |       |                |       |                 |
 |        |          | |    in |                |       |                 |  out
 |        |          | |    -->+       A        +------>+         B       +-->
 |        |    B     | |       |                |       |                 |
 |        |          | |       |                |       |                 |
 |        |          | |       |                |       |                 |
 |        |          | |       |                |       |                 |
 |        +----------+ |       |                |       |                 |
 +---------------------+       +----------------+       +-----------------+

What's going on here? Both examples involve the creation of module C from A and B.

left: 'A' exists as wrapper code around B and is useless on its own. To unit test A you must mock B.

right: every module is reuseable on its own. Nothing needs to be mocked during unit testing. No dependencies.

The only exception to the right example where you MUST mock is a function that does IO. IO functions cannot be unit tested period, they can only be tested with integration tests.

There's a name for the left approach. It's called Object oriented programming using inheritance or composition(the oop version of composition; not functional composition) as a design pattern. (both are bad)

There's also a name for the right approach. It's called functional programming using function composition.

I don't advocate that you strictly follow either style. Just know that when you go left you lose modularity and when you go right you gain it. All functional programming does is force your entire program to be modular down to the smallest primitive unit. Extensive mocking in your program means you went too far to the left.

tangent: Another irony around this world is that a lot of functional programmers (javascript and react developers especially) don't even know about the primary benefit of functional programming. They harp about things like "immutability" or how its more convenient to write a map reduce rather than a for loop without truly ever knowing the real benefits of the style. They're just following the latest buzzword.

tra3 · 77 days ago

Forgive me, if I'm being dense, but doesn't either of these cases depend on how the composed objects are being used?

In your functional example A is an input to B (or vice versa?), how do you propose testing one of the modules without first instantiating the other one?

crimsonalucard · 77 days ago

I'll give you two examples. One functional and the other OOP. Both programs aim to simulate driving given an input of 10 energy units to find the final output energy.

  #oop

 engine = Engine(10)
 car = Car(engine)
 car.drive() #result 8

  class Car:
    def __init__(self, engine):
     self.engine = engine

    def ignite(self):
     self.engine.energy =- 1

    def run(self):
     self.engine.energy =- 1

    def drive(self):
     self.ignite()
     self.run()
     return self.engine.energy

 class Engine:
  def __init__(self, energy):
   self.energy = energy


 # ignite not testable without engine
 # run not testable without engine
 # drive not testable without engine and a car
 # ignite, run, and drive are not modular cannot be used without engine. 
 # engine testable with any integer. 
 # Car useless without engine
 # engine useless without car




 #functional \
 def composeAnyFunctions(a,b):# returns function C from A and B. See illustration above. 
  return lambda x: a(b(x)) 


 def ignite(total_energy):
  return total_energy - 1

 def run(total_energy):
  return total_energy -1 

 drive = composeAnyFunctions(run, ignite)
 drive(10) #result 8

 # compose testable with any pair of functions
 # run testable with any integer
 # ignite testable with any integer
 # drive testable with any integer
 # all functions importable and reuseable with zero dependencies. 
 # input_energy -> ignite -> run -> output_energy

"I think the lack of reusability comes in object-oriented languages, not functional languages. Because the problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle." - Joe Armstrong

you don't necessarily need the car or engine to simulate the energy output of driving.

profalseidol · 77 days ago

I've been using static methods in java that follows the pure function way, it proved very easy to maintain even to those who inherited my code later on.

crimsonalucard · 77 days ago

That's mainly just namespacing. The only point to use an Object in object oriented programming is to unionize objects and state. To combine them together into a single primitive. This combination breaks compose-ability.

Static functions avoid state. You put them in an object in java because java has to have everything in an object. In any other language these would just be top level functions namespaced into a package or something. You are basically using java in a more functional way. Which is fine.

tra3 · 76 days ago

Thank you so much for a concrete example. I need to think about this some more. Clearly the code make sense, but in a wider context, can you have a banana without a jungle? I'm dabbling with some functional programming but I definitely have more experience with oop, so what you're saying is difficult for me to grasp, but the benefits are hard to ignore.

crimsonalucard · 76 days ago

There are downsides to FP as well. I am not advocating one over the other. But there is a concrete theoretical reason why FP is more modular, reuseable and organized than OOP code.

Smalltalk is possibly the only OOP language that lets objects be compose-able and modular. Check out Pharo if you're interested. If you learn smalltalk well enough, you could apply its principles to traditional OOP languages and gain the modularity benefits.

profalseidol · 69 days ago

> There are downsides to FP as well.

I've seen Java 8 functional stuff get unreadable.

But other than that, is there any other concrete downside?

For using Pure Functions, I don't see any downside to this. Aside from it being impossible to use for outside the program side-effects like IO to device.

elsurudo · 77 days ago

I mostly agree with what you're saying, but I will add that is is also possible to write well-designed, modular, easy-to-test (minimal mocks) OOP code. It does provide more guns to shoot yourself in the foot with, I will admit.

crimsonalucard · 77 days ago

Yes you are correct. Check out smalltalk, it fits the paradigm you describe. It was actually rated the most productive programming language in the world according to namcook. Ironically, it's Definitely one of the least popular languages as well.

richardhod · 77 days ago

Your argument appears to be, in TL;DR form: OOP and dependencies are bad and wrong, you must use Functional Programming or you will be wrong.

Isn't that a little extreme?

crimsonalucard · 77 days ago

No. You are putting words in my mouth and accusing me of being extreme. I am NOT promoting one paradigm over the other. TLDR? I hope you read my stuff. I find it rude if someone just comments with a one liner and summarizes everything I said into a catchphrase that is a perversion of the truth. I feel like a presidential candidate.

Anyway, this is what I am saying:

If you use functional programming your code will be more modular and reusable because the paradigm forces you to be that way.

If you use Object Oriented Programming your program will automatically be less reusable and less modular but more object oriented.

This is all I am saying. Your mistaken statement that I am promoting one style over the other is based off of this assumption: Modular programs are better than less modular programs. This is not True.

Something like a physics engine is a better fit for OOP then it is for functional. Although your program will be less modular as a result, OOP is still a better fit because physical objects are easily modelled with OOP objects.

Trees, graphs and algorithms involving things of that nature are a better fit for objected oriented programming then functional because many of these algorithms involve mutating nodes. Again, if you follow this style your program will become less modular overall as a result.

The ideal program is one that spans the spectrum of both OOP and functional. When it calls for it use functional or OOP depending on context. Overall for complex web applications that most startups make, in my opinion, the program should be more functional then it is OOP. A web request is basically a function that takes in a request as an input and outputs a response. The form factor of a function better fit for this, and you get high modularity as a side benefit. There is no point in simulating the request/response paradigm in a stateful Object while losing modularity in the process.

For a game. OOP is better in my opinion. Gaming entities involve constant mutation of things with state so OOP is a better fit. UI is a better fit for OOP as widgets are better represented by objects (FRP aka react&redux, imo works well but is an awkward abstraction)

There is one exception to this rule. In general Objects in object oriented programming are not compose-able. However, Smalltalk is an object oriented language where objects ARE compose-able. Smalltalk is the language that coined the term "object oriented" and although it is no longer popular as it was before it is still a very robust language and learning from it has huge benefits.

richardhod · 76 days ago

Thank you for the clarification!

kyllo · 79 days ago

Learning in-depth your various options for persisting data, is very useful since most applications have to deal with persistence in some form, and increasingly in a distributed manner. Go beyond simply skimming the surface of SQL vs. NoSQL and the marketing claims different databases make about their scalability and consistency. Learn what ACID and CAP stand for and the tradeoffs involved in different persistence strategies. Learn SQL really well. Learn how to read a query plan, which is the algorithm your SQL query gets compiled into. Learn about the tradeoffs of row-based vs column-based storage. Learn how indexes work, and what a B-tree is. Learn the MapReduce pattern. Think about the tradeoffs between sending code to run everywhere your data is stored vs. moving your data to where your code is running.

theossuary · 79 days ago

Two great resources I've been going through are

- https://dataintensive.net - Really deep dives into different types of data storage solutions, their history, and how they actually work.

- http://www.cattell.net/datastores/Datastores.pdf - Good paper that helps differentiate similar but different datastores. Really helpful when you're trying to pick a modern data solution.

barbecue_sauce · 78 days ago

Designing Data-Intensive Applications is probably the best O'Reilly (if not overall technology) book of the past decade.

btown · 78 days ago

The talk on "Turning the database inside-out" [0][1] by the author, Martin Kleppmann, is a fantastic intro to these dynamics, and it's something I'll always recommend to both experienced and inexperienced data modelers and backend developers.

It goes pedagogically through the way things are typically done in a relational database in such a clear way that word-for-word it's one of the best tutorials I've seen... but it also weaves a narrative of "how can this be done better/more scalably/more reliably/more flexibly-to-business-needs" in pointing to a streaming/event-sourcing architecture. You may or not need the latter right away, but it's a fantastic tool to have in your toolbox to be able to say "ah, this new requirement feels like it would benefit hugely from this architecture."

Especially for OP who's starting to think about the "why" of messaging queues, this could be a fantastically valuable first step.

[0] https://www.youtube.com/watch?v=fU9hR3kiOK0

[1] https://www.confluent.io/blog/turning-the-database-inside-ou...

hypertexthero · 78 days ago

Another good resource that guides one through both the philosophy (why) and technical details (how) of building a web application is Software Engineering for Internet Applications:

https://philip.greenspun.com/seia/

jimpudar · 78 days ago

Learning how to use dtrace / bpftrace [0] is very valuable if you ever need to get into serious systems profiling.

There are some really cool data structures out there you might not know about. One of my favorite basic ones that I get a lot of use out of is the trie [1] (a.k.a. prefix tree). Very useful for IP calculations.

Also look into probabilistic data structures [2], very amazing things can be done with them.

[0] https://en.wikipedia.org/wiki/DTrace

[1] https://en.wikipedia.org/wiki/Trie

[2] https://en.wikipedia.org/wiki/Category:Probabilistic_data_st...

cryptonector · 78 days ago

DTrace is life-altering.

I keep hoping that someone will build a dtrace(1) CLI that transpiles to bpftrace.

stcredzero · 78 days ago

Profiling, period

Const-me · 78 days ago

Approximate Windows equivalent of DTrace is sysinternal process monitor, freeware. Very useful sometimes.

Birch-san · 78 days ago

The Windows equivalent of DTrace is.. DTrace. [0] DTrace is about far, far more than snooping the filesystem. At best, Process Monitor is an equivalent of Brendan Gregg's DTrace utility, opensnoop. The true power of DTrace is to correlate events across subsystem boundaries. Like, graphing the top quartile of latencies from network acceses initiated via a given function in your application.

[0] https://techcommunity.microsoft.com/t5/Windows-Kernel-Intern...

sammnaser · 78 days ago

Bloom Filters are awesome.

nickjj · 78 days ago

Shell scripting for processing text. You can often get so much done with so little code and effort.

Also on a semi-related note, I think as a self taught programmer, it's easy to get stuck on things that seem cool but are just procrastination enablers (I know, I've been guilty of it for 20 years). Like, if you're about to start a new project and you want to flesh out what it's about, you really don't need to spend 5 hours researching which mind map tool to use. Just open a text document and start writing, or get a piece of paper and a pen. It won't even take that long.

I spent about 1.5 hours the other day planning a substantially sized web app. All I did was open a text file and type what came into my head. For fun I decided to record the whole process too[0]. I wish more people recorded their process for things like that because I find the journey more interesting than the destination most of the time. Like your journey of eventually finding message queues must have been quite fun and you probably learned a ton (after all, it lead you to message queues, so it was certainly time well spent).

[0]: https://nickjanetakis.com/blog/live-demo-of-planning-a-real-...

slightwinder · 78 days ago

These days it might be better to just learn python. It's cleaner and scales better to complex code. And it's ons most system modern systems available out of the box where shells are available too. Shells are still good for simple oneliners, and knoting multiple processes together, but text-processing involves so many different commands, each with their own quirks, that a consistent simple language is IMHO superiour.

nickjj · 77 days ago

For processing text, using Python doesn't really make sense in a ton of cases.

If I want to search a text file for a specific string, why wouldn't I just use `grep "hello" myfile.text`, or if I wanted to do it on a directory of text it's a minor change of `grep -R "hello" .`.

Why would I go through the trouble of opening a Python interpreter, or writing out a Python script to do the equivalent in Python?

Or if I wanted to grab the third column of a CSV, I would for sure just use the `cut` command or maybe `awk` (depending on what I'm doing).

For more complex parsing you can often pipe together a few commands and maybe convert it into a 5 line Bash script to make it a little easier to create variables, etc.. It becomes something you can whip up in 1 minute.

Then there's also more involved text parsing that doesn't require piping a bunch of commands together or shell script glue, in which case it comes back to using grep with its various flags and potentially a regexp. It's a natural fit for the problem and you can iterate on it so quickly.

chaostheory · 77 days ago

Honestly this applies to Ruby & Javascript/Typescript as well and not just Python. I really don't see the value of learning shell scripting anymore when the newer languages are just was easy to learn, terse, and you can adapt better to changing conditions when needed with libraries.

latexr · 77 days ago

I often find multi-line Python scripts with `import os` and others that could be a fraction in size (and just as clear) in bash. Even more ridiculous are the times I find a node script (published to npm, even) that is little more than a wrapper on a shell script.

Inevitably someone will read these arguments and think “those are just bad programmers”, but your point was that you “don't see the value of learning shell scripting”. The value is in not spewing absurd code like that. Shell commands are fast and efficient. There isn’t an emphasis on libraries because instead you use tools. Is `grep` not enough? Try `the silver searcher`[1] or `ripgrep`[2].

Are shell scripts the best instrument for every job? No, but no tool is.

[1]: https://github.com/ggreer/the_silver_searcher

[2]: https://github.com/BurntSushi/ripgrep

chaostheory · 76 days ago

Just because someone knows shell scripting I'm not going to consider them "bad programmers". I know it myself and I've used it extensively before I learned python & ruby.

My point is that the cost for learning and using shell scripts is just too high compared to just using a modern language that's just as terse and a lot more powerful and flexible. Context switching from one language to another isn't free either.

imo the only time shell scripting was practical was when the only major programming languages were C, C++, and Java. imo even Perl5 is more practical than shell scripting.

Also I doubt that the python program you mentioned was that much bigger than a shell script

Draiken · 78 days ago

This 1000x. I put of just getting by with shell script for some years and when I finally decided to get deeper into it, it's magical.

A good series of piped commands with tools available basically everywhere can solve problems you had no idea could be so simple to solve.

0x262d · 78 days ago

have any good resources for this?

bromonkey · 78 days ago

  man bash  
  man awk  
  man sed  
  man grep

You can also do most of this stuff with Perl one liners if that suits your fancy.

sed + awk - https://www.amazon.com/sed-awk-Dale-Dougherty/dp/1565922255

awk - https://ia802309.us.archive.org/25/items/pdfy-MgN0H1joIoDVoI...

general *nix text processing - https://www.tldp.org/LDP/abs/html/textproc.html

fancyfish · 78 days ago

Perhaps these are a good start:

awk: - GNU Awk User Guide - https://www.gnu.org/software/gawk/manual/gawk.html#Getting-S...

- Grymoire guide - http://www.grymoire.com/Unix/Awk.html

sed: - Grymoire guide - https://www.grymoire.com/Unix/Sed.html - Official docs - http://sed.sourceforge.net/#docs

markvdb · 78 days ago

The IBM developerworks articles about this are old (2000!), but still incredibly useful and well written. Start here:

https://www.ibm.com/developerworks/library/l-sed1/index.html https://developer.ibm.com/tutorials/l-awk1/

a_bonobo · 78 days ago

There's a free O'Reilly book from last year: https://www.datascienceatthecommandline.com/

awk is pure magic.

m463 · 78 days ago

Also, just get to know your shell.

big timesaver: control-r

basic but useful all over:

    for i in *.c; do cp "$i" "$i.bak"; done

etc...

hi41 · 78 days ago

What does double quotes dollar do in the command?

akx · 78 days ago

Ensures filenames with spaces in them get passed as a single argument, instead of being inadvertently expanded into several.

Quoting and expansion issues are a pain in shell languages...

MHP47 · 78 days ago

$i is the variable i declared in the for loop. Quotes just wraps it, so that it's (somewhat) safe if the file has a space in the name

majewsky · 78 days ago

What do you mean, "somewhat"? This looks safe for all I can tell.

nickjj · 78 days ago

You know how you should almost always eat your vegetables?

You should almost always quote your Bash variables too. Here's why: https://nickjanetakis.com/blog/here-is-why-you-should-quote-...

aoeusnth1 · 75 days ago

It prevents filenames with spaces from expanding into two arguments to the command.

chriswarbo · 75 days ago

I'd split your first point in two:

- Shell scripting for running commands and managing files

- Unix utilities for processing text

These happen to complement each other nicely. It's not that bash is better at manipulating text than Python, it's that Python makes it painful to invoke commands (like those Unix utilities) and pipe data between them (e.g. https://news.ycombinator.com/item?id=17733865 ).

m_fayer · 79 days ago

https://dataintensive.net/

I can't recommend this book enough. I have a CS background, and still had quite a few "I can't believe this thing has been hiding in plain sight!" moments while reading it.

redisman · 79 days ago

It's great. Incredibly dense with useful information and it just blows my mind how much knowledge Martin has about the topic. I recommend watching this talk from him to give a little glimpse of the book: https://www.youtube.com/watch?v=5ZjhNTM8XU8 This is just about a little part of one of the chapters.

mirekrusin · 78 days ago

Oh man, this is good, thank you.

copperx · 79 days ago

I'm now torn between reading this one first or the Architecture of Enterprise Applications.

jb3689 · 78 days ago

I loved Designing Data-Intensive Applications. It gives you the reasons why NoSQL databases exist and the problems they solve. Moreover it gives you reasons to select one over another. It's really excellent and one of my top two CS books

mleonard · 78 days ago

Your other top CS book out of interest?

politician · 79 days ago

If it helps, IMO "Designing Data-Intensive Applications" is a better bang-for-the-buck. Enterprise-scale applications are a world unto themselves.

copperx · 78 days ago

Edit: I meant Patterns of Enterprise Application Architecture by Fowler in my comment above. Recommended by DHH.

louthy · 78 days ago

My advice would be to skip it completely. It's just packed full of standard GoF OO dogma.

copperx · 78 days ago

Thanks. So, what you're saying it is redundant if you've read GoF?

splittingTimes · 78 days ago

But this is mainly for distributed (web) systems, right?

Are there good books for data intensive desktop apps? Like games or CAD design tools?

nerpderp82 · 79 days ago

Debuggers and property based testing. It is a select few people that can actually productively (not their own metrics) use print statements for debugging. Learning how to craft repro scenarios and adequately capturing state in a debugging session can enable junior devs to easily surpass senior devs.

Property based testing aren't quite formal methods, but I think they are a good stepping stone. And they also somewhat force your code into an algebraic/functional style which also make it amenable to refactoring, better testing and is easier to understand.

Design tools like Swagger can help one think through services w/o diving into code. Code itself is a liability and should be thought of as "spending" not creating. Code is a debt.

Refactoring and code understanding tools, if you use PyCharm (you should, it is free in all senses), learn how to navigate into your libraries. Read you libraries.

marktangotango · 78 days ago

This x1000 debuggers are seriously undervalued by many developers. It’s like a super power.

thetwentyone · 78 days ago

What are some good resources to learn about debugging patterns and tips/tricks? My preferred language, Julia, recently introduced a nice set of tools related to debugging. I feel like there's probably things that would make me more productive but I think the techniques would be more broadly applicable than a specific language.

drainyard · 78 days ago

One thing I always do these days is I step through any new code I've written the first time I run it. This usually weeds out some bugs that might take a while to find because they are easy to miss. It also ensures that you actually go through each line of code you write, doing a forced code review on yourself early on.

enkiv2 · 78 days ago

I highly recommend learning PROLOG & understanding how to write your own simple planner system. The hairiest real problems are hairy because they're best suited to a declarative style (and programs written declaratively can be made much more efficient through more clever solvers -- given naive code, a clever solver has a much bigger efficiency boost over a dumb solver than an optimizing compiler does over a non-optimizing one -- although PROLOG itself leaks too much abstraction for many of these techniques to be viable in it).

I also recommend understanding message routing systems used in file sharing, like CHORD.

If you don't have a strong background in the math behind theoretical computer science, you might benefit a lot from an understanding of the formal rules around boolean logic, symbolic logic, & state machines -- especially, rules about when certain kinds of things are equivalent (since stuff like demorgan's law are used for simplifying and optimizing expressions a lot, and rules for state machines are used to prove upper limits on resource usage).

If you don't already, learn to use awk. It's a much more powerful language than it seems, and fits extremely well into the gap between command-line prototyping in shell one-liners & porting a prototyped tool to python or perl, and so it's a huge time saver: it is faster to write many kinds of tools in a mix of shell and awk and then rewrite them in python than it is to write them in python in the first place.

bdamm · 78 days ago

I've never used Prolog in the 15 years since I learned it in college. It's an interesting take on programming, for sure, and I appreciated the mind-expanding exercise, but hasn't helped me in my career at all.

Totally agree on awk. I use it almost every day for quick little one-liners. Big time saver.

Also agree on state machines, because from there it is a short hop to understanding formal grammars and the foundation of compilers and languages, which has been immensely useful in my career.

doboyy · 78 days ago

Learning about automata theory was one of the most mind-expanding experiences I had in college. It was certainly not something I would've stumbled upon without guidance. I believe that describes most of the value I derived from a degree, being nudged in the right directions toward solutions and problems that a lot of smart people have though about for a while.

Understanding how different languages, or inputs more generally, can be transformed into meaningful outputs is pretty satisfying. It's a topic that almost seems to transcend the realm of computer science.

enkiv2 · 77 days ago

The best thing you can do for your career is learn things that don't apply to your career. After all, it's impossible to predict unknown unknowns, which makes accidentally having already learned something nobody else valued enough to pick up the most valuable skill.

agentultra · 78 days ago

Formal methods.

It took me nearly a decade of working in distributed systems to be introduced to TLA+ and other tools in this space. Until then my knowledge had been built from textbooks describing the fundamental data structures, algorithms, and protocols... but those texts take an informal approach with the mathematics involved. And since I was self-taught I was reading those texts with an eye for practical applications than for theoretical understanding. I had no idea that a tool existed that would let me specify a design or find potential flaws in systems and protocols, especially concurrent or parallel systems, with such ease.

I think type theory and category theory have also been great tools to have... but I think mathematics in general is probably the more useful tool. Being able to think abstractly about systems in a rigorous way has been the single-biggest booster for me as a practitioner.