Sloppy systems programming
This post introduces the Software category. This is the point where my lovely friends who are less excited by software may want to go do something more exciting than read this.
I was trying to get Cricket running recently, itself an exercise in awkward software constructs, when I ran into the joy of suEXEC. This is a little setuid helper that sits between Apache and a CGI program and switches the user that the CGI will run as. It’s also an interesting little window into why dealing with software can be so infuriating.
I’m humming along, la la la, jumping through the flaming configuration hoops that one comes to expect from unix systems software. Things aren’t Just Working, of course, and I find this in suexec’s log file:
[2005-01-29 22:14:34]: cannot stat program: (grapher.cgi)
I’m having trouble coming up with a metaphor for how poor this is. stat() resolves a relative path like “grapher.cgi” against the current working directory, so its result depends on where it is called from, and when it fails it sets errno to tell you what went wrong. You’d think that suEXEC asked about “grapher.cgi” from some directory, stat() returned an error, and suEXEC’s error message simply didn’t bother to tell you the whole story (neither the directory nor the reason). That alone would be a programming offense worth a few weeks of teasing.
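For contrast, here is a minimal sketch of the report one might hope for if stat() really had failed. It is not suEXEC’s code; the helper name describe_stat_failure() and the message format are hypothetical. It saves errno before calling getcwd() (which can overwrite it) and names the call, the path, the directory a relative path was resolved against, and the reason.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of suEXEC): stat() the program and, on
 * failure, write a message saying which call failed, on what path, from
 * which working directory, and why. Returns 0 on success, otherwise the
 * errno value that stat() reported. */
static int describe_stat_failure(const char *cmd, char *msg, size_t msglen)
{
    struct stat st;
    char cwd[4096];

    assert(cmd != NULL && msg != NULL && msglen > 0);

    if (stat(cmd, &st) == 0) {
        msg[0] = '\0';
        return 0;
    }

    int saved = errno;              /* getcwd() may overwrite errno */
    if (getcwd(cwd, sizeof cwd) == NULL)
        snprintf(cwd, sizeof cwd, "(unknown directory)");

    snprintf(msg, msglen, "stat(\"%s\") failed (cwd: %s): %s",
             cmd, cwd, strerror(saved));
    return saved;
}
```

Called from a CGI directory where grapher.cgi is missing, this would produce something along the lines of `stat("grapher.cgi") failed (cwd: /some/dir): No such file or directory`; the reason text comes from strerror() and its exact wording varies by platform.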
But that’s not what’s happening at all.
/*
 * Error out if we cannot stat the program.
 */
if (((lstat(cmd, &prg_info)) != 0) || (S_ISLNK(prg_info.st_mode))) {
    log_err("cannot stat program: (%s)\n", cmd);
    exit(117);
}
It wasn’t that stat() failed; suEXEC saw that the path it had just examined with lstat() was a symbolic link. It apparently decides that this is fatal, because it knows more about the security trade-offs of your environment than you do, and that when it sees this policy violation it will fail and lie to you about why it failed.
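The distinction that matters is between stat(), which follows a symbolic link to its target, and lstat(), which describes the link itself. The sketch below (helper names hypothetical; it assumes a POSIX system with a writable /tmp) builds a scratch file plus a link to it and asks both questions of the link.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* lstat() reports on the link itself, so S_ISLNK() can be true. */
static bool is_symlink(const char *path)
{
    struct stat st;
    assert(path != NULL);
    return lstat(path, &st) == 0 && S_ISLNK(st.st_mode);
}

/* stat() follows the link, so it sees whatever the link points at. */
static bool resolves_to_regular_file(const char *path)
{
    struct stat st;
    assert(path != NULL);
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* Hypothetical demo: create a scratch dir holding a regular file and a
 * symlink to it, record what each call says about the link, clean up.
 * Returns 0 on success, -1 if the fixture could not be built. */
static int check_link_semantics(bool *lstat_sees_link, bool *stat_sees_file)
{
    char dir[] = "/tmp/suexec-demo-XXXXXX";
    char target[128], lnk[128];
    int rc = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(target, sizeof target, "%s/grapher.cgi.real", dir);
    snprintf(lnk, sizeof lnk, "%s/grapher.cgi", dir);

    FILE *f = fopen(target, "w");
    if (f != NULL) {
        fclose(f);
        if (symlink(target, lnk) == 0) {
            *lstat_sees_link = is_symlink(lnk);
            *stat_sees_file = resolves_to_regular_file(lnk);
            rc = 0;
            unlink(lnk);
        }
        unlink(target);
    }
    rmdir(dir);
    return rc;
}
```

Both flags come back true for the same path: stat() sees a perfectly good regular file, while lstat() sees a link, which is exactly the case suEXEC reports as “cannot stat program”.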
Now, I’ll be the first to admit that this in itself is a very minor detail. The rub is that this sort of misleading behaviour isn’t rare at all. I think this struck a chord with me because it made me focus on my changing thoughts about what it is that I do. There was a time when I loved having a catalogue of this kind of behaviour in my head so that I could use all kinds of software and predict the ways in which I would have to work around its behaviour. It was super-fun to be an expert in so many details.
But these days, and I won’t admit to a decade having passed, it all seems like so much wasted time. People who use this software should be focusing on solving their problems instead of spending time discovering that “cannot stat program:” can sometimes mean “I refuse to work with this file because it is a link.”
It seems like after a few decades of building these kinds of software systems we could be doing a better job of it.
Malcolm wrote:
Certainly true. As someone who gets called when this sort of thing happens, it infuriates me sometimes how frequently I have to fire up a debugger or use some sort of tracer to find out what’s really going on, when a simple descriptive error message would do the trick.
I think of this as the “theory of relative locative density of work”. For any given decision about how something should break, programmers tend to weigh the work they would have to do against the work I will have to do, and conclude that the fix isn’t worth their time. It’s harder for them to add an extra line or two of code to print a reasonable, accurate error message than it is for me to reverse engineer what happened. I used to think that this was because programmers were Gods who thought “of course, people will just look at the source and see this obvious thing”, but now I realize that it’s because programmers, as a group, are lazy, undisciplined nitwits.
M
Posted on 31-Jan-05 at 2:36 pm
Peter wrote:
I just had the exact same problem in the exact same context: trying to install Cricket. Thank you for saving me some of the frustration you went through.
-peter
Posted on 10-Feb-05 at 1:08 pm
boolean wrote:
Part of the problem is the tendency for C programmers to embed as much on a line as possible, especially for failure conditions. Something I’ve forced myself to do over the past couple of years is write a separate test and exit for each failure condition. I do this partly because of the ugliness of nesting conditionals in Smalltalk, but I’ve also found that it has the side effect of producing more detailed failures.
ok, make fun of the smalltalker now.
ps, hi zab!
Posted on 11-Feb-05 at 5:06 am
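Applied to the suEXEC check quoted in the post, that separate-test-and-exit style might look like the following minimal sketch. The function name, result codes, and message wording are hypothetical, and it returns a code instead of calling exit(117) so the caller decides how to bail out. The point is only that each condition gets its own test and its own honest message, including which system call actually ran.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical result codes, one per distinct failure. */
enum prog_check {
    PROG_OK = 0,
    PROG_LSTAT_FAILED,
    PROG_IS_SYMLINK
};

/* Each failure gets its own test and its own message, so the log says
 * which check tripped instead of folding both into one line. */
static enum prog_check check_program(const char *cmd, FILE *log)
{
    struct stat st;
    assert(cmd != NULL && log != NULL);

    if (lstat(cmd, &st) != 0) {
        /* Name the call that really ran, and report errno's reason. */
        fprintf(log, "cannot lstat program (%s): %s\n", cmd, strerror(errno));
        return PROG_LSTAT_FAILED;
    }
    if (S_ISLNK(st.st_mode)) {
        /* A policy refusal, not a stat failure: say so. */
        fprintf(log, "refusing to run program (%s): it is a symbolic link\n", cmd);
        return PROG_IS_SYMLINK;
    }
    return PROG_OK;
}
```

With this split, a symlinked grapher.cgi would log “refusing to run program (grapher.cgi): it is a symbolic link” rather than claiming that a stat failed.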
boolean wrote:
pps (and sorry for the double post): I should say that programmers in general, not just C coders, seem to like dense lines of code. The good ones will at least try to make them readable. This is not a slam, but it ties in nicely with Malcolm’s assertion that “Programmers are Lazy”.
rob
Posted on 11-Feb-05 at 6:17 am
Adam S wrote:
It apparently decides that this is fatal, because it knows more about the security trade-offs of your environment than you do, and that when it sees this policy violation it will fail and lie to you about why it failed.
Hi Zach!
First let me say, I’m in complete and total agreement with your analysis of why this is the wrong way to handle the issue of stat’ing a link. But as a security weenie, I recognize where the authors were coming from: stat’ing links is dangerous. The right fix, however, is neither to die nor to lie in your errors, but to open the file and then stat the file descriptor with fstat(). Peter Gutmann maintains code to safely open files in C, at http://www.cypherpunks.to/~peter/snapshot.zip in io/file.c, so you don’t even have to do the hard work. Now if only he’d chmod it properly. ;)
Posted on 30-Apr-05 at 9:19 am
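Adam’s open-then-fstat() suggestion can be sketched roughly as follows, assuming a POSIX.1-2008 system. This is neither Peter Gutmann’s code nor suEXEC’s; the function name and messages are hypothetical. Two choices are assumptions of this sketch: O_NOFOLLOW keeps suEXEC’s refuse-symlinks policy (POSIX specifies that open() then fails with ELOOP when the last path component is a link), and the file is opened O_RDONLY, which requires read permission on the script. Because fstat() examines the descriptor rather than the path, the metadata describes the very file that was opened, so nothing can swap the path between the check and the use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open the program without following a final
 * symlink, then fstat() the descriptor so the metadata belongs to the
 * file actually opened. Returns the open fd, or -1 after logging why. */
static int open_program_checked(const char *cmd, struct stat *st, FILE *log)
{
    assert(cmd != NULL && st != NULL && log != NULL);

    int fd = open(cmd, O_RDONLY | O_NOFOLLOW);
    if (fd < 0) {
        if (errno == ELOOP)
            fprintf(log, "refusing to run program (%s): final path component "
                         "is a symbolic link, or too many links in path\n", cmd);
        else
            fprintf(log, "cannot open program (%s): %s\n", cmd, strerror(errno));
        return -1;
    }
    if (fstat(fd, st) != 0) {
        fprintf(log, "cannot fstat program (%s): %s\n", cmd, strerror(errno));
        close(fd);
        return -1;
    }
    if (!S_ISREG(st->st_mode)) {
        fprintf(log, "refusing to run program (%s): not a regular file\n", cmd);
        close(fd);
        return -1;
    }
    return fd;  /* caller owns the descriptor */
}
```

O_NOFOLLOW only covers the final path component, and actually running the program without going back to the path would need something like fexecve() on the same descriptor, which this sketch does not attempt.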
Chris Walsh wrote:
What is more awful, IMO, is that it says stat(2) failed, when it actually called lstat(2)!
I don’t know what log_err() does, but (again, IMO) failing and not saying why (perror() or “ARRRGH! Symlink detected!”) sucks.
BTW, I suspect that it refuses to follow the symlink in order to avoid a race condition (but nonetheless the existing code is still vulnerable to one). Misleading diagnostic output is never acceptable, however. It looks like they may originally have used stat(2) and then modified the code, but not the diagnostic output. Sloppy.
Posted on 30-Apr-05 at 4:05 pm
Adam S wrote:
Bad URL above, which Peter gave me while doing some updates. The correct link is http://www.cs.auckland.ac.nz/~pgut001/cryptlib/
Posted on 01-May-05 at 7:43 am