greasemonkey (and firebug) made lca2008 happy

Today I noticed that the video of the lightning talks from LCA2008 is available. It has probably been available for a while but I only just noticed :).

I was going to have you download the video that includes all the talks and skip to a particular talk that Paul Fenwick gave on greasemonkey which also mentions firebug. But a bit of searching led me to Paul’s blog post which mentions the talk and which, in keeping with his apparent passion to make a web that doesn’t suck, includes an embedded youtube movie of his talk alone. Nicely done, sir!

The audio recording does a fair job of communicating how much the audience loved the talk. I’m not sure if the audience loved the tools or hated myspace, or what, but either way I had a great time being in the audience. I wore a pretty goofy grin for most of the talk because I was happy for my friends (and loved one!) at moco.

I meant to point the talk out to them but was distracted because the video didn’t appear soon after the talk. With luck some of them won’t have seen it yet and will find some joy in hearing a few hundred people cheering at pieces of the software they work so hard on.

A little more CRFS detail

In my previous post about CRFS metadata performance I said that I didn’t want to go into too much detail until the source is released. I still don’t want to but Evgeniy Polyakov is tempting me! He’s having a good time learning by experimenting with network file systems and posted some theories about CRFS. I’ll respond to his theories with a series of facts about the CRFS protocol and implementation because, well, I love talking about this stuff and rarely get a chance.

The userspace server I’ve implemented (“crfsd”) is btrfs specific. It works directly with the on-disk structures in a btrfs volume. You don’t specify a file system directory tree to export, you specify a block device which contains a btrfs file system. crfsd has exclusive access to the contents of that block device while it is running.

The CRFS client kernel module (“crfs.ko”) doesn’t require kernel patches. I happen to be tracking mainline but, so far, there has been nothing significant in the implementation that restricts it to modern kernels. The use of ->write_begin() will probably be the first thing that starts to restrict the kernel versions that it will support but that hasn’t happened yet.

CRFS does perform writeback caching of metadata operations. The huge performance benefit this brings justifies the complexity of implementing it, which can’t be overestimated. Designing the protocol and then implementing the kernel client such that we can keep this complexity under control is one of the most important aspects of the CRFS system as a whole.

The CRFS network protocol could be said to batch operations, it’s true, though phrasing it that way gives the wrong impression. It’s not like some kind of explicit compound RPC mechanism. Think of it more like the batching that happens when ext3 reads in a block full of inodes as it goes to read a specific inode that it is interested in. CRFS achieves similar results from a very different organization of metadata. Think of it as reading and writing groups of items from btrfs leaf blocks because that’s exactly what it is. The opportunistic priming of client caches when they perform normal metadata read requests, at insignificant additional cost, is a natural side-effect of the way CRFS represents metadata.
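To make the leaf-block analogy a little more concrete, here is a toy model in C. It is emphatically not CRFS code (the source isn’t released) and every name in it, from struct leaf to client_lookup() to the tiny leaf size, is a hypothetical stand-in. It only shows the shape of the idea: a miss on one item fetches the whole leaf that holds it, so lookups of neighbouring items are served from the client cache without another round trip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ITEMS_PER_LEAF 4   /* hypothetical; real leaves hold many more items */
#define NR_LEAVES      3

struct item { uint64_t key; uint64_t value; };
struct leaf { struct item items[ITEMS_PER_LEAF]; };

/* "Server" side: items sorted by key and packed into leaves. */
static const struct leaf server_leaves[NR_LEAVES] = {
    {{{1, 10}, {2, 20}, {3, 30}, {4, 40}}},
    {{{5, 50}, {6, 60}, {7, 70}, {8, 80}}},
    {{{9, 90}, {10, 100}, {11, 110}, {12, 120}}},
};

/* Client side: a cache of whole leaves and a count of simulated round trips. */
struct client {
    struct leaf cache[NR_LEAVES];
    int cached[NR_LEAVES];
    unsigned round_trips;
};

/* Which leaf covers this key? Trivial here because the toy keys are dense. */
static size_t leaf_index(uint64_t key)
{
    assert(key >= 1 && key <= (uint64_t)NR_LEAVES * ITEMS_PER_LEAF);
    return (size_t)((key - 1) / ITEMS_PER_LEAF);
}

/* Look up one item; a miss pulls in the entire leaf, priming the cache. */
static uint64_t client_lookup(struct client *c, uint64_t key)
{
    size_t idx = leaf_index(key);

    if (!c->cached[idx]) {
        c->cache[idx] = server_leaves[idx];   /* one simulated round trip */
        c->cached[idx] = 1;
        c->round_trips++;
    }
    for (size_t i = 0; i < ITEMS_PER_LEAF; i++)
        if (c->cache[idx].items[i].key == key)
            return c->cache[idx].items[i].value;
    return 0;   /* not reached for keys in range */
}
```

With a zeroed struct client, looking up key 5 costs one round trip and the later lookups of keys 6, 7 and 8 cost none. Nobody asked for a batch; the neighbours simply came along in the same leaf.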

And with that, I should really return to a nice holiday break.

digg comments on btrfs

A friend pointed out that a reference to btrfs appeared on digg. I wasn’t sure that it merited much attention but a colleague expressed interest in learning more about btrfs.

I should first set the stage by explaining my relation to btrfs. Chris Mason, its primary developer, is my manager at Oracle. He and I started working on btrfs quite a few months ago. I fell back into a more advisory role after I moved on to work on a related project while Chris continued working diligently on the initial btrfs implementation. While I’m not intimately familiar with the code, I’m pretty familiar with the design trade-offs that it currently makes.

I’ll address some of the honest confusion expressed in the comments to that digg post by translating them into questions that one might ask while not suffering from the effects of John Gabriel’s GIF Theory.

btrfs isn’t considered stable and isn’t supported. That scares me. Why is btrfs available before it is feature-complete and stable?

Once a file system is complete and supported it becomes very hard to work in features that weren’t originally available. Adding new features that require changes to the format of persistent data on disk becomes much, much harder. By making it available at this stage we give people the opportunity to request features that might not have occurred to us. All file systems go through this stage; we’re just exposing it to a wider group of people. One is always welcome to simply ignore btrfs until it’s supported if that’s what one desires.

I live a very busy life and couldn’t be bothered to look at the license that btrfs is released under and instead chose to imply that it wasn’t free and open. Was this not the most clever thing I’ve done recently?

Probably. btrfs is released under the GPLv2, the same license as the Linux kernel.

For whatever reason, I have a negative impression of software that is related to the word Oracle. Should I transfer that negativity to btrfs because it is also associated with the word Oracle?

Probably not. The kernel development team at Oracle that produces btrfs is made up of people who worked on the Linux kernel long before they agreed to come work on the kernel for Oracle. Never fear, we tend to work from home in distant states, countries, and continents — far from the influence of whatever magical anti-awesome sauce it is that you think Oracle puts in its developers’ food.

Oracle also developed OCFS2. Are the two projects related?

Not really, although I worked on OCFS2 for a time. The two file systems solve different problems and their development efforts have different resources at their disposal. OCFS2 is about helping multiple machines work on a shared file system without corrupting each other’s efforts. That’s incredibly difficult. btrfs is about making the best of modern file system features available to the majority of Linux installations for the simple case where there’s only one computer using it. That’s relatively less difficult.

btrfs is a new file system. I also know of another new file system, ZFS. Does btrfs make ZFS unnecessary?

I can think of no way in which a current ZFS user would be satisfied by btrfs, if for no other reason than the simple fact that btrfs is not supported anywhere and ZFS is not seriously available to Linux users. Maybe one could entertain having this conversation once btrfs is supported on Linux and Solaris and ZFS is supported on Linux.

All this talk of ZFS and btrfs reminds me that I once heard that ZFS can be slow, or something. Might that also be said of btrfs?

Yes, in as much as that can be said of each and every file system in existence. File system engineering is, at its core, a game of having to choose amongst conflicting desires. It’s often the case that implementing a feature in a particular way will benefit one usage pattern while harming some other usage pattern. btrfs and ZFS, both incorporating design elements more modern than the Reagan administration, will tend to choose to skew the trade-offs in similar directions, most of the time.

There are already lots (and lots) of file systems available for Linux. What does btrfs do that those file systems don’t?

Sometimes it can be hard for those of us who work on file systems to clearly communicate why it is that we dislike existing designs. It’s complicated stuff. There’s one property of current Linux file systems, though, that seems like it should be universally ill-received.

Almost all Linux file systems provide almost no protection against data corruption. The only protection they offer is to propagate errors from the storage system up to the application. If the storage system doesn’t realize that the data has been corrupted, perhaps because the corruption happened after the drive, these file systems can get very confused: returning bad data to applications, overwriting the wrong data on disk, crashing machines, etc.

Now, storage systems have been surprisingly reliable, it turns out. But Linux thrives on cheap commodity hardware, which is not exactly famous for being rock solid. The persistent march of hardware towards commoditization and cheaper manufacturing does not bode well for the future.

That btrfs takes strong measures to address the risk of corruption is the most exciting run-time feature for me. I want flakey hardware to result in a console message indicating data corruption, not mysterious behaviour or kernel panics that some incredibly expensive human has to diagnose.

I mean, no one would ever consider disabling checksumming in TCP. Why on earth do we allow our file systems to operate without similar protection?
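As a rough illustration of what that protection buys, here is a minimal sketch in C. It is not btrfs code, and the block size, the struct stored_block layout and the function names are all made up for the example; the checksum is a plain bitwise CRC-32, chosen only because it is short. A checksum is recorded when a block is written and verified when it is read, so corruption the storage system never noticed turns into an explicit error instead of bad data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 64   /* hypothetical; tiny so the example stays readable */

/* A block as it sits on "disk": the payload plus a checksum of the payload. */
struct stored_block {
    unsigned char data[BLOCK_SIZE];
    uint32_t csum;
};

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t block_crc32(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}

/* Store the payload and remember its checksum. */
static void block_write(struct stored_block *b, const unsigned char *data)
{
    assert(b && data);
    memcpy(b->data, data, BLOCK_SIZE);
    b->csum = block_crc32(b->data, BLOCK_SIZE);
}

/* Returns 0 and copies the payload out, or -EIO if the checksum doesn't match. */
static int block_read(const struct stored_block *b, unsigned char *out)
{
    assert(b && out);
    if (block_crc32(b->data, BLOCK_SIZE) != b->csum) {
        fprintf(stderr, "checksum mismatch: block is corrupt\n");
        return -EIO;
    }
    memcpy(out, b->data, BLOCK_SIZE);
    return 0;
}
```

Flip a single bit in data after block_write() and block_read() returns -EIO with a message on the console, rather than quietly handing the damaged bytes to the application.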

vim quickfix error format for sparse

I, like countless others, use vim’s quickfix mode to ease the pain of the compile-fix-compile cycle. vim parses the output of the build so that it can present a summary of errors and enable navigation between them.

sparse is a tool that knows how to find errors in C code that compilers like gcc don’t notice. It requires minimal annotation in the source but provides invaluable functionality, like warning when endian conversions are forgotten.
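For anyone who hasn’t seen that annotation, here is a small sketch of what it can look like, modelled loosely on the pattern the kernel uses. The macro names SPARSE_BITWISE and SPARSE_FORCE, the disk_le32 type and struct disk_header are all invented for this example. The attributes are only visible when sparse is doing the checking (it defines __CHECKER__), so gcc sees plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Only sparse defines __CHECKER__; other compilers get empty macros. */
#ifdef __CHECKER__
#define SPARSE_BITWISE __attribute__((bitwise))
#define SPARSE_FORCE   __attribute__((force))
#else
#define SPARSE_BITWISE
#define SPARSE_FORCE
#endif

/* A little-endian on-disk value that sparse treats as its own restricted type. */
typedef uint32_t SPARSE_BITWISE disk_le32;

static uint32_t swab32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* The only sanctioned ways to cross between CPU and on-disk byte order. */
static disk_le32 cpu_to_disk_le32(uint32_t x)
{
    return (SPARSE_FORCE disk_le32)(host_is_little_endian() ? x : swab32(x));
}

static uint32_t disk_le32_to_cpu(disk_le32 x)
{
    uint32_t raw = (SPARSE_FORCE uint32_t)x;
    return host_is_little_endian() ? raw : swab32(raw);
}

struct disk_header {
    disk_le32 offset;   /* stored little-endian on disk */
};

static uint32_t header_offset(const struct disk_header *h)
{
    assert(h);
    /* "return h->offset;" would compile quietly with gcc, but sparse
     * should complain about the mismatched base types. */
    return disk_le32_to_cpu(h->offset);
}
```

The forgotten conversion is exactly the kind of bug that works fine on the machine you tested on and falls over on one with the other byte order, which is why having a tool flag it is so valuable.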

Which brings us to the point of this post. sparse spits out multi-line error messages that vim doesn’t completely understand:

tests/btree-stress.c:121:55: warning: incorrect type in initializer (different base types)
tests/btree-stress.c:121:55:    expected restricted unsigned long long [usertype] b_offset
tests/btree-stress.c:121:55:    got long long [signed] [usertype] offset

vim doesn’t know that all of these error messages belong to the same error. It offers them to the user as three separate errors:

:clist
35 tests/btree-stress.c:121 col 55: warning: incorrect type in initializer (different base types)
36 tests/btree-stress.c:121 col 55: expected restricted unsigned long long [usertype] b_offset
37 tests/btree-stress.c:121 col 55: got long long [signed] [usertype] offset

This is irritating because to navigate past this error you have to know to navigate past the next three errors. This has been irritating me for, I don’t know, years now. I finally sat down and spent an hour or so poisoning my brain with vim’s arcane configuration.

set efm^=%W%f:%l:%c:\ warning:\ %m,%C%f:%l:%c:\ \ \ \ %m,%Z%f:%l:%c:\ \ \ \ %m

et voila. Now vim considers those three error messages as coming from one error:

:clist
35 tests/btree-stress.c:121 col 55 warning: incorrect type in initializer (different base types) expected restricted unsigned long long [usertype] b_offset got long long [signed] [usertype] offset

I’m sure that format won’t catch all of sparse’s errors but it’ll be easy to derive additional formats from it.

I’m also sure that I’m not the first to do this. It would be nice if the sparse guys shipped a sourceable .vimrc along with the tools.

spell checking and vim syntax highlighting

Today I sent out some kernel patches to fix some bug. Our esteemed colleague Randy “eagle eyes” Dunlap pointed out that I had some spelling error.

<rdd> zab: darn, missed the window for s/intead/instead/

How embarrassing! That got me wondering why I don’t have my editor politely raising an eyebrow at me when I misspell things. It is the 21st century, and all. So I read up on syntax highlighting and spell checking in vim. Turning it on is easy enough.

:setlocal spell spelllang=en_us

With that, vim barely hides its true intent behind its default color scheme: to burn a hole in the back of your retina.

MY EYES

Let’s choose some colors that won’t send us into epileptic fits.


:highlight clear SpellBad
:highlight SpellBad term=underline ctermfg=1 cterm=underline
:highlight clear SpellCap
:highlight SpellCap term=underline cterm=underline
:highlight clear SpellRare
:highlight SpellRare term=underline cterm=underline
:highlight clear SpellLocal
:highlight SpellLocal term=underline cterm=underline

Now misspelled words are underlined and red while other words that it thinks are questionable, for seemingly uninteresting reasons, are simply underlined.

Phew.

Now we can go about our business. z= offers alternative spellings for the word under the cursor, zg adds a word to the list of accepted words, etc.

To round it off we add our own acceptable words list.

:set spellfile=~/.vim/spellfile.{encoding}.add

So there we go! This one’s for you, Randy!

Best of both worlds

A significant number of my friends — not exactly newcomers to software! — have embraced OS X. With memories of the early days of the Mac, I was nervous. Allow me to share the story that finally pushed me over the edge.

A few months ago I spent an hour or so configuring X on my Dell X1 to support both the built-in LCD and an external LCD monitor. There were a few scary moments where I, being human, screwed up the configuration of which pipe went where and gave the laptop’s LCD some very bad refresh rates. Luckily it wasn’t damaged.

When I finally got it working the result was pretty cool! Windows flowed off of one LCD and onto the other. I had to log out and edit the config file to disable the second display, or change its resolution or anything, of course. That’s just how X rolls.

Then a week later Alice plugged that very same LCD into her MacBook Pro and it just worked. Live. We tried unplugging it with apps on the display being unplugged and they were moved back onto the remaining laptop LCD. Holy crap!

Step back for a moment and imagine that we’re talking about something other than the dizzyingly complex world of software and computers. An expert using tool A gets mediocre results after an hour of risky work and any user instantly gets superior results using tool B. Why on earth would anyone choose tool A?

That was roughly my line of thinking when Apple released the MacBook Pros with the Core 2 Duo. I broke down and ordered one. That was about two weeks ago and I’m happy to report that things have gone as well as I’d hoped they would. I haven’t had to fight the hardware support battles that are sadly present in the Linux world. I’m appreciating having apps that work nicely together. I think this is the first time that I’ve had the significant players in my contact game — phone, palm, mail app — all talking to each other.

I was down in the bay at Oracle HQ last week. I was asked to give a presentation. I threw together some slides with Keynote. I plugged the projector in and hit the “Play” button and had each slide on the projector and the presenter’s display (the current and next slides, time spent total and on the current slide) on the local laptop LCD. Even the most mouth-foamy Linux advocates had to admit that this was neato.

Don’t get me wrong. Linux is a fantastic tool for a non-trivial set of problems. Our customers certainly have some serious problems that they prefer to solve with Linux. My day job is still working with Linux. I’m damn good at it and I enjoy it. This leads me to Parallels, the current leader of the OS X PC virtualization pack. It was trivial to use an FC6 ISO on the network to install Linux in a virtual machine. I now have a window, which I can full-screen if I so desire, that offers the comfortable Linux environment. Hence the title of this post: I didn’t give up Linux in exchange for OS X. I made it so I can use the modern desktop in OS X to avoid trivial hurdles while solving hard problems for people with Linux. I’m one happy camper. A hoopy frood, if you will.

I couldn’t end this post without mentioning Quicksilver. I was referred to it about a week into my OS X experience. I have been using computers for nearly all my life and I can say this with a straight face: Quicksilver has changed the way I interact with the computer. I will not attempt, and no doubt fail, to clearly describe it in depth. Tutorials exist for that level of introduction. I will, however, share this example of the keystrokes required to bring up my dad’s (a.k.a. Doug) entry in Address Book at any time: control-space, d, o, u, return. I now take this kind of warm-knife-through-butter efficiency for granted.

Foxy

Gather ’round the fire and celebrate the Firefox 1.5 release with a video of passers-by sharing their opinions of IE and Firefox.

You can guess which comment is everyone’s favourite.

Sloppy systems programming

This post introduces the Software category here. This is the time when my lovely friends who are less excited by software could go do something more exciting than read this.

I was trying to get Cricket running recently, itself an exercise in awkward software constructs, when I ran into the joy of suEXEC. This is a little setuid helper that runs between apache and cgi to switch the user that the cgi will run as. It’s also an interesting little window into why dealing with software can be so infuriating.

I’m humming along, la la la, jumping through the flaming configuration hoops that one comes to expect from unix systems software. Things aren’t Just Working, of course, and I find this in suexec’s log file:


[2005-01-29 22:14:34]: cannot stat program: (grapher.cgi)

I’m having trouble coming up with a metaphor for how poor this is. stat() is a relative operation in that its results depend on where it is executed from and it tells you what went wrong when it fails. You’d think that suEXEC asked about “grapher.cgi” from a given location and stat() returned an error and suEXEC’s error message simply didn’t bother to tell you the entire story — a programming offense itself worth a few weeks of teasing.

But that’s not what’s happening at all.


    /*
     * Error out if we cannot stat the program.
     */
    if (((lstat(cmd, &prg_info)) != 0) || (S_ISLNK(prg_info.st_mode))) {
        log_err("cannot stat program: (%s)\n", cmd);
        exit(117);
    }

It wasn’t that stat() failed, it was that suEXEC saw that it had just performed stat() on a link. It apparently decides that this is fatal, because it knows more about the security trade-offs of your environment than you do, and when it sees this policy violation it fails and lies to you about why it failed.

Now, I’ll be the first to admit that this in itself is a very minor detail. The rub is that this sort of misleading behaviour isn’t rare at all. I think this struck a chord with me because it made me focus on my changing thoughts about what it is that I do. There was a time when I loved having a catalogue of this kind of behaviour in my head so that I could use all kinds of software and predict the ways in which I would have to work around its behaviour. It was super-fun to be an expert in so many details.

But these days, and I won’t admit to a decade having passed, it all seems like so much wasted time. People who use this software should be focusing on solving their problems instead of spending time discovering that “cannot stat program:” can sometimes mean “I refuse to work with this file because it is a link.”

It seems like after a few decades of building these kinds of software systems we could be doing a better job of it.
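For what it’s worth, the fix is not hard. Here is a sketch of what a more honest version of that check could look like. This is not a patch to suEXEC: check_program(), the enum and the message wording are all made up for illustration. The point is just that the two cases are reported separately, and that the real lstat() failure carries its errno text along with it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for lstat() under strict -std= modes */
#endif

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

enum check_result {
    CHECK_OK = 0,
    CHECK_STAT_FAILED,   /* lstat() itself failed; errno says why */
    CHECK_IS_SYMLINK     /* lstat() worked; policy refuses symbolic links */
};

/* Check cmd and write a message into msg that says what actually happened. */
static enum check_result check_program(const char *cmd, char *msg, size_t msg_len)
{
    struct stat prg_info;

    assert(cmd && msg && msg_len > 0);

    if (lstat(cmd, &prg_info) != 0) {
        snprintf(msg, msg_len, "cannot stat program (%s): %s",
                 cmd, strerror(errno));
        return CHECK_STAT_FAILED;
    }
    if (S_ISLNK(prg_info.st_mode)) {
        snprintf(msg, msg_len,
                 "refusing to run program (%s): it is a symbolic link", cmd);
        return CHECK_IS_SYMLINK;
    }
    msg[0] = '\0';
    return CHECK_OK;
}
```

A caller would log msg and exit exactly as before. The policy is unchanged; the only difference is that the person reading the log gets told which of the two things went wrong.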