Friday, November 12, 2010

The things make got right (and how to make it better)

make is much maligned because people mistake its terse syntax and pickiness about whitespace for signs of being an anachronism. But make's terseness is what makes make fit for purpose, and people who design 'improvements' rarely seem to understand the fundamental zen nature of make.

Here are some things make does well:

1. make's key use is in the expression of dependencies. make has a compact way, free of syntactic cruft, of expressing a dependency between a file and other files.

2. Since make is so dependent on handling lists of dependencies, it has built-in list processing functionality.

3. Second to dependency management is the need to execute shell commands. make's syntax for including dependencies in shell commands is minimal, which prevents the eye from being distracted from the commands themselves.

4. make is a macro-language not a programming language. The state of a build is determined by the dependency structure and the 'up to dateness' of files. There's no (or little) need for any other internal state.
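To illustrate point 2: GNU make's built-in functions operate directly on its space-separated lists. A small fragment (the file names are hypothetical):

SRCS  := main.c util.c parser.c notes.txt
CSRCS := $(filter %.c,$(SRCS))        # keep only the .c files
OBJS  := $(patsubst %.c,%.o,$(CSRCS)) # map each .c to a .o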

To see the ways in which make is superior to other similar, more modern systems, this post will compare GNU Make and Rake. I've chosen Rake because I believe it's illustrative of what happens when people create new make-like systems instead of just fixing the things that are broken about make.

Here's a simple Makefile showing the syntax used for updating a file (called target) from a list of dependent files by running a command called update.

target: prereq1 prereq2 prereq3 prereq4
	update $@ $^

(If you are unfamiliar with make then it's helpful to know that $@ is the name of the file to the left of the :, and $^ is the list of files to the right).
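(Two more automatic variables are worth knowing; this fragment uses hypothetical file names. $< is the first prerequisite and $? is just the prerequisites that are newer than the target:)

archive.a: member1.o member2.o member3.o
	ar r $@ $?    # update only the members that have changed
main.o: main.c defs.h
	cc -c -o $@ $<   # compile just the first prerequisite, main.c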

Here's the same thing expressed in Rake. The first thing that's obvious is that there's a lot of syntactic noise around the command and the expression of dependencies. What was clear in make now requires more digging to uncover and things like #{t.prerequisites.join(' ')} are long and unnecessarily ugly.

file 'target' => [ 'prereq1', 'prereq2', 'prereq3', 'prereq4' ] do |t|
  sh "update #{t.name} #{t.prerequisites.join(' ')}"
end

The biggest 'problem' that the Rake syntax fixes in make is that the target and prerequisite names can have spaces in them without difficulty. Because a make list is space-separated and there's no escaping mechanism for spaces it's a royal pain to work with paths with spaces in them.
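To see why, consider this hypothetical fragment. make splits prerequisite lists on whitespace, so a path containing a space silently becomes two separate prerequisites:

# Broken: make sees two prerequisites, 'My' and 'Documents/report.txt'
backup.tar: My Documents/report.txt
	tar cf $@ $^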

make's terse syntax $@ is replaced by #{t.name}, and $^ by #{t.prerequisites.join(' ')}. The great advantage of the terse syntax is that the actual command being executed can be clearly seen. When command lines are long (with many options) this makes a real difference in debuggability.

That this terseness is better can be seen in an example taken from the Rake documentation:
  
task :default => ["hello"]

SRC = FileList['*.c']
OBJ = SRC.ext('o')

rule '.o' => '.c' do |t|
  sh "cc -c -o #{t.name} #{t.source}"
end

file "hello" => OBJ do
  sh "cc -o hello #{OBJ}"
end

# File dependencies go here ...
file 'main.o' => ['main.c', 'greet.h']
file 'greet.o' => ['greet.c']

which rewritten in make syntax is:
  
SRC := $(wildcard *.c)
OBJ := $(SRC:.c=.o)

all: hello

.c.o:
	cc -c -o $@ $<

hello: $(OBJ)
	cc -o hello $(OBJ)

main.o: main.c greet.h
greet.o: greet.c
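(In modern GNU make style the old-style suffix rule above would usually be written as a pattern rule, which reads more naturally:)

# Pattern rule: build any .o from the .c of the same name
%.o: %.c
	cc -c -o $@ $<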

If you want to fix make then it's worth considering the following make problems that don't require an entirely new language:

1. Fix the 'spaces in filenames' problem. Not hard, just needs consistent escaping or quoting.

2. make has a concept of a PHONY target which is a target that isn't a file (used for things like clean and all). These are in the same namespace as file targets. This should be fixed.

3. make can't detect changes in the commands used to build targets. It would be better if make could do this. You can hack that into make but it's ugly.

4. make relies on timestamps for 'up to date' information. It would be better if make used hashes (in some situations, such as when files are extracted from a source code management system, timestamps can be unreliable). This can also be hacked into make if needed.

5. Ensure that non-recursive make is handled in an efficient manner.
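For point 3, the usual hack (a sketch, with hypothetical file names) is to make each target depend on a file that records the command's flags, rewriting that file only when the flags change so that a flag change forces a rebuild:

CFLAGS := -O2 -Wall

main.o: main.c compile.flags
	cc $(CFLAGS) -c -o $@ $<

# Rewrite compile.flags only if the flags differ from last time
compile.flags: FORCE
	@echo '$(CFLAGS)' | cmp -s - $@ || echo '$(CFLAGS)' > $@

FORCE: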

Overall I'd urge make reimplementers to do as Paul Graham has done with LISP: his Arc language is very LISP-like rather than something brand new.

And one final note: building and maintaining software build systems is inherently hard. Visualizing and getting right the graph of dependencies and handling cross-platform problems isn't easy. If you do come up with something good, please write good documentation for it.

14 Comments:

Blogger miles said...

What do you think of the various systems that detect dependencies using system call tracing, like fabricate? I see that the example fabricate script suffers from some of the same problems as your Rakefile example, but the system as a whole fixes some of the problems you describe with make.

Also, I learned a few things about makefiles from this post. Thanks!

3:06 PM  
Blogger John Graham-Cumming said...

I founded an entire company that does what fabricate does (and more): http://electric-cloud.com/

3:39 PM  
Blogger jbaltz said...

I'm curious to hear your critique of systems that have gone entirely the other direction in terms of verbosity, like ant (which, to be fair, probably should only be auto-generated. Then again, perhaps makefiles should ALSO be auto-generated too.)

4:20 PM  
Blogger John Graham-Cumming said...

What ant got right was getting rid of the shell completely and going totally cross-platform. Unfortunately, the ant developers decided to shoot themselves in the head by using XML.

To be honest ant scripts read more like machine code.

5:04 PM  
Blogger jbaltz said...

Yes, ant is made to be both written and read by computer, not by human beings.

If you're in the habit of rolling your makefiles by hand, you'll find ant painful.

But what about those of us who (foolishly or not) decide to let some development environment (Eclipse, emacs, what have you) write our makefiles for us, so that humans RARELY read them? (If you use automake/autoconf or any of the other tools meant to enhance portability, you find that things get messy quickly, no?)

Is make's real benefit just its human-readability?

5:12 PM  
Blogger Mark Tomczak said...

This is an excellent reflection on make's behavior and how to improve it. Thank you for writing this!

I worked on a project that started with a make script; we eventually found the need to move away from it and spent about one week writing our own custom Python build system. Spaces in filenames were the eventual nail in the coffin (along with, to be honest, some slightly immature personal friction with a developer community whose answer seemed to be "Real programmers don't have spaces in filenames..."). Thank you for acknowledging that this is a real problem!

But there were two other issues that we encountered, which I generally caution people to consider when choosing make as their tool:

Debugging: Build scripts are programs, and programs have bugs. We found make's debugging facilities to be lacking; you could dump all the rule expansions, but then you had to grep through that forest of post-expanded cruft. What make gained in terseness in the writing step, it more than made up for in the verbosity of its "expand all rules" output for debugging. In contrast, debugging the Python build script involved slapping pdb on the thing and introspecting the runtime.

Speed: Make has a lot of nifty features for creating wildcard rules that can automatically identify files, creating recursive Makefiles, etc. Unfortunately, these features have a cost, and this isn't necessarily obvious to a beginner "maker." We found ourselves in a situation where our make script took upwards of ten seconds simply to tell us that there were no changes to be made! For whatever reason, when we re-wrote the thing in Python, the Python script was much faster. This may have been an implementation detail; we were developing on Windows, and perhaps make hit the filesystem in a way that it didn't like. I simply don't know. But speed can become an issue with a large make, and it can sneak up on a developer.*

It's worth noting that we had no need to build cross-platform, so that entire feature of toolchains that include a make engine wasn't of interest to us. And I still have no issues recommending make for small projects. But as all of my small projects have a tendency to get large, I tend to reach for a tool other than make these days.**

* It's been a few years; if I recall correctly, a make guru suggested once that the likely culprit for our speed losses is recursive Makefiles. I can't remember clearly enough to say with certainty. But given the debugging challenge, when we found ourselves staring down the barrel of flattening our per-directory Makefile schema, we decided it'd be more fun to roll our own solution.

** I should also acknowledge the possibility that our two problems may have stemmed from a common issue; one could say that if your make script warrants debugging, you're using make wrong. This is a personal preference issue. I'm of the opinion that I have a filesystem for a reason, and if there's a directory full of .c files (and directories that contain .c files), I should not need to explicitly list them in the build rules. Some people are not bothered by this sort of thing.

6:31 PM  
Blogger rog peppe said...

did you check out mk?

the original paper is here: ftp://ftp.cyberway.com.sg/pub/funet/unix/security/docs/usenix/usenix/summer87/mk.ps.gz

it fixes at least three of the issues you mention.

8:20 PM  
OpenID s9 said...

Or you could consider OMake, which fixes every last one of the issues you mention. The home page at omake.metaprl.org is down at the moment, but the source code can be found readily.

12:02 AM  
Blogger John "Z-Bo" Zabroski said...

1. We should do away with file abstractions completely, and deal only with resources. Alt's Ph.D. thesis on compiler architecture was a step in the right direction. The idea of "PHONY" targets is amusing in this regard. Using filesystems for source code looks increasingly dated, especially given the change in how major open source projects are distributed, configured and compiled (e.g., maven ibiblio repositories for apache projects, distributed version control systems like darcs and git, online open source code sharing websites like github and gitorious).

2. Vesta, developed at Compaq by Paul McJones, Butler Lampson and other great programmers, blew away make the moment it was created.

3. If you don't want to use XML for Ant, you don't have to. Simply use Gant, which allows you to write Ant tasks in Groovy and create DSLs that you can actually debug. Makefiles don't have a debugger either, so your opinion about the XML being like machine code seems a matter of taste. The bigger issue with the use of XML is how it is used: the XML is really just configuring a semantic object graph, yet the XML elements and attributes don't show that. Neither does Make's DSL. Both lose here.

4. Tools like make are very old school and were developed before anyone had really understood what the software development lifecycle should look like; we have more experience now and better understand it. Vesta is one example of a tool that takes it into account. What is sad is that you see companies like Fujitsu building their own software configuration management solutions today, and companies like ThoughtWorks selling "Continuous Integration" servers and hyping features like "pipelines" as if they were radical concepts. In fact, what is really happening is companies like ThoughtWorks and Electric Cloud are taking advantage of uneducated consumers and lack of serious competition.

12:14 AM  
Blogger John "Z-Bo" Zabroski said...

5) The "Software Development Kit" concept is dead. Everything I have already mentioned spells the death of the SDK. I've written about this online before. If you design a system correctly, you don't need "make". You don't need "Service Packs". You don't need wars over which way of managing software configuration is better: BSD or Linux (Solaris is simply a hybrid of the two philosophies, and Windows is not worth criticizing even though individual products within Windows have good practices such as Baseline Configuration Analyzer for automagically detecting software installation problems).

The fact the SDK is dead is the most prominent reason make is dead. You don't need a traditional build process, nor do you want one.

12:18 AM  
OpenID ericm said...

Along the same lines, I did a comparison between make syntax and several other build tools, including cmake, scons and waf -- sort of a build tool Rosetta Stone. You can see it on blog.melski.net.

7:36 PM  
Blogger SteveL said...

As one of the ant developers, I'm not convinced that XML was a mistake. Here is what it gave us:
- out-of-the-box integration with XML editors. In 2000, these were still new.
- good internationalisation
- easy for other tools -especially IDEs- to parse.
- got to ride the "XML is cool" wave. Yes, I remember that time. But before everyone dismisses it, you'll be shaking your head at JSON in ten years, as you code your S-expressions.


Other strengths of the tool:
- cross-platform from the outset. You have to struggle to write a build file that doesn't work on Windows. Make is really the tuple of (Make, Unix). Now that desktop Linux is more common, you can perhaps mandate Unix, but it still reduces your contributor base.
- lots of behind-the-scenes work dealing with Java versions and compiler quirks.
- understanding of the Java dependency model. It's really hard to do this in Make, believe me.
- very test-centric. Our XML format for test results is ubiquitous (which makes it a pity the format can't be streamed to disk, due to the need to put a test summary at the top).
- no programming constructs in the tooling: loops, etc. It works on sets of files. If you have anything more complex, code something reusable.


To be fair, I am aware of its weaknesses, some of which are
-no dry run option
-still too procedural for tooling to infer what's going on.
-XML commenting sucks
-forcing people to drop to java to code something more complex is too low level. Some script language alongside the declarative build metadata would be better.


SteveL

6:35 PM  
