LCA2008 CRFS talk went well

Well, it’s been almost three weeks since I gave a talk on CRFS at LCA 2008 and I’m just now getting around to sharing my thoughts on how it went. We’ll pretend that the delay makes the thoughts that much more.. thoughtful, but clearly that’s already not the case.

I was impressed by the quality of volunteers at LCA. On the morning of my talk they had two people in the room running the AV equipment and making sure that I got time cues. After the talk they had me put a PDF of the slides on to a USB flash drive. Within 24 hours they had both the slides and the video of the talk available for download from the conference’s programme page. People noticed, too. The next morning I awoke to find emails from people half-way around the world who had read the slides and had decent questions to ask about how CRFS works. That’s pretty great.

I am a little worried that these “linux.conf.au” links will break next year when the next incarnation of the conference builds its web site. I guess if I were clever I’d grab a copy now and serve up the talk materials locally.

I will admit to having some trouble deciding just which pieces of CRFS to try and squeeze into a short introductory talk. I tried to stick to the most fundamental basics but I’m not sure I can trust my judgment here. I have a tendency to misjudge the level of pre-existing knowledge in a given audience. I’d love to hear feedback from my colleagues who have different levels of experience with file systems.

I will also happily admit to going a little too far with LCA’s motto of being “fun, informal and seriously technical”. I really hammed it up in a few places. I felt like the audience enjoyed it but the video didn’t pick up the reasonably steady trickle of giggles from the audience so the viewer can be forgiven for thinking that I was just being a crazy person :). I think I’ll take Val’s positive characterization of the talk as “technical stand-up improv comedy” as an indication that I was doing something right.

The frighteningly keen LCA attendee may have noticed that one or two (or three) of us put “bonghits” in our talks. I blame the dangerous intersection of Dave Jones and conference subsidized bottles of wine.

As for CRFS, it continues on at full speed. There have been signs of life in the process of getting approval to release the source so maybe I’ll have something exciting to report soon. I’ve just doomed the process by typing those words, of course.

Last week I converted crfsd from being a confusing threaded process to a group of processes with explicit boundaries for sharing state. I should have called it the honorary Rusty Russell Hates Threads commit but I chickened out. Commit messages are forever!

At the moment I’m pushing to get the coherency protocol stumbling along such that the initial release can bear more resemblance to what the final CRFS system will look like. With luck I’ll make it in time.

greasemonkey (and firebug) made lca2008 happy

Today I noticed that the video of the lightning talks from LCA2008 is available. It has probably been available for a while but I only just noticed :).

I was going to have you download the video that includes all the talks and skip to a particular talk that Paul Fenwick gave on greasemonkey which also mentions firebug. But a bit of searching led me to Paul’s blog post which mentions the talk and which, in keeping with his apparent passion to make a web that doesn’t suck, includes an embedded YouTube video of his talk alone. Nicely done, sir!

The audio recording does a fair job of communicating how much the audience loved the talk. I’m not sure if the audience loved the tools or hated myspace, or what, but either way I had a great time being in the audience. I wore a pretty goofy grin for most of the talk because I was happy for my friends (and loved one!) at moco.

I meant to point the talk out to them but was distracted because the video didn’t appear soon after the talk. With luck some of them won’t have seen it yet and will find some joy in hearing a few hundred people cheering at pieces of the software they work so hard on.

Well played, Murphy

Alright, put yourself in the mindset of a server in my basement. You’re kind of sad that the guy who maintains you is a few thousand miles away. Then his lovely wife has the nerve to go to California. It’s pretty lonely down there. What do you do?

Yes, that’s right, you have a few fans fail. Then heat gathers in the top of your ancient PC case. Which causes the power supply, cleverly designed to sit in the top of the case where heat gathers, to fail. The faulty power supply pulls power from two drives in a four drive array which flips the array into degraded mode wherein it can only return errors. Which hangs the machine as ext3 gets IO errors in the journal. You’ll show them!

I’m quite lucky to live a few blocks from one of the most capable sysadmins that I had the pleasure of starting my career with. I gave him a call, we shared some Simpsons quotes (mostly Professor Frink), and managed to get things up and running again. He was able to transplant a power supply from a neighbouring test box. Thankfully the power drop didn’t damage the drives. Phew.

This was made that much funner by the fact that I hadn’t yet synced the most recent CRFS changes from that machine to a box at Oracle. The source that I’m giving a talk about on Friday here in Melbourne. Where the source is intended to be released.

So, I guess this means I get to play Christmas on Newegg with PC hardware when I get back. Yay, prezzies!

Melbourne bound!

Well, I’m heading down to Melbourne for LCA 2008 in a few hours. I’m not exactly excited by the length of the trip (SKW6084 and UAL839) but I’m definitely looking forward to attending LCA and to seeing Melbourne. It looks like a nice city. It’s a shame that I didn’t arrange to stay longer. Ah, well.

I’ll be giving a talk on CRFS while I’m down there. I did a practice run for a small audience of friendly Linux folks in Portland which was well received so I have high hopes that people at the conference will enjoy it. I know I certainly enjoy talking about this technology, but, well, I guess I would ;).

I thought I’d share a slide from the talk that I find geeky and satisfying:

silly-rename003.png

The slide is demonstrating a particularly weird behaviour of the Linux NFS client adorably called silly renaming. I like the slide because it’s using a relatively small set of system calls to illustrate how differently NFS can behave from “local” file systems. I use it during the talk to illustrate one of my primary motivations for working on CRFS: I want us to have a network file system that doesn’t penalize its users by requiring that their applications know to work around its behavioural quirks.
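The slide itself isn’t reproduced here, but a rough sketch of the kind of system call sequence that provokes silly renaming might look like the following. It is only an illustration under my own assumptions (the function name and the “victim” file name are made up, and it may not match the calls on the slide): create a file, unlink it while it is still open, then list the directory.

```python
import os
import tempfile


def names_after_unlink_while_open(directory):
    """Create a file, unlink it while still open, then list the directory.

    Returns (directory listing, data read back through the open descriptor).
    """
    path = os.path.join(directory, "victim")  # hypothetical file name
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.write(fd, b"still here\n")
        os.unlink(path)                       # remove the file's only name
        listing = sorted(os.listdir(directory))
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 64)                # the open descriptor still works
    finally:
        os.close(fd)
    return listing, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        listing, data = names_after_unlink_while_open(scratch)
        print("directory contents after unlink:", listing)
        print("data read after unlink:", data)
```

On a local file system the listing comes back empty even though the data is still readable. Run in a directory on a Linux NFS client mount, the listing would instead be expected to show a leftover name beginning with “.nfs”, because the client renames the file rather than removing it while it is still open.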

Anyway, if this stuff interests you I hope you’ll come have fun at the talk with us.

A little more CRFS detail

In my previous post about CRFS metadata performance I said that I didn’t want to go into too much detail until the source is released. I still don’t want to but Evgeniy Polyakov is tempting me! He’s having a good time learning by experimenting with network file systems and posted some theories about CRFS. I’ll respond to his theories with a series of facts about the CRFS protocol and implementation because, well, I love talking about this stuff and rarely get a chance.

The userspace server I’ve implemented (“crfsd”) is btrfs specific. It works directly with the on-disk structures in a btrfs volume. You don’t specify a file system directory tree to export, you specify a block device which contains a btrfs file system. crfsd has exclusive access to the contents of that block device while it is running.

The CRFS client kernel module (“crfs.ko”) doesn’t require kernel patches. I happen to be tracking mainline but, so far, there has been nothing significant in the implementation that restricts it to modern kernels. The use of ->write_begin() will probably be the first thing that starts to restrict the kernel versions that it will support but that hasn’t happened yet.

CRFS does perform writeback caching of metadata operations. The huge performance benefit this brings justifies the complexity of implementing it, which can’t be overestimated. Designing the protocol and then implementing the kernel client such that we can keep this complexity under control is one of the most important aspects of the CRFS system as a whole.

The CRFS network protocol could be said to batch operations, it’s true, though phrasing it that way gives the wrong impression. It’s not like some kind of explicit compound RPC mechanism. Think of it more like the batching that happens when ext3 reads in a block full of inodes as it goes to read a specific inode that it is interested in. CRFS achieves similar results from a very different organization of metadata. Think of it as reading and writing groups of items from btrfs leaf blocks because that’s exactly what it is. The opportunistic priming of client caches when they perform normal metadata read requests, at insignificant additional cost, is a natural side-effect of the way CRFS represents metadata.
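To make the “a whole leaf comes along for the ride” idea concrete, here is a toy model. It is emphatically not the CRFS protocol or the btrfs format: the class names, the fixed leaf size, and the dictionary of items are all assumptions made up for illustration. It only shows how asking for one item can prime a client cache with its neighbours at the cost of a single request.

```python
from bisect import bisect_right


class ToyServer:
    """Holds sorted (key, value) items grouped into fixed-size leaves."""

    def __init__(self, items, leaf_size=4):
        # Assumes a non-empty dict of items with sortable keys.
        ordered = sorted(items.items())
        self.leaves = [ordered[i:i + leaf_size]
                       for i in range(0, len(ordered), leaf_size)]
        self.first_keys = [leaf[0][0] for leaf in self.leaves]
        self.requests = 0  # counts simulated network round trips

    def read_leaf_containing(self, key):
        """Return every item in the leaf that would hold `key`."""
        self.requests += 1
        index = max(bisect_right(self.first_keys, key) - 1, 0)
        return self.leaves[index]


class ToyClient:
    """Caches every item of each leaf it receives, not just the one asked for."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def lookup(self, key):
        if key not in self.cache:
            for item_key, value in self.server.read_leaf_containing(key):
                self.cache[item_key] = value
        return self.cache.get(key)


if __name__ == "__main__":
    server = ToyServer({n: f"item-{n}" for n in range(8)})
    client = ToyClient(server)
    for n in range(8):
        client.lookup(n)
    print("lookups: 8, server requests:", server.requests)
```

In this toy, eight lookups over two leaves cost two requests, which is the flavour of the ext3 inode block analogy above.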

And with that, I should really return to a nice holiday break.

CRFS performance teaser

Friends and colleagues have been hearing me talk about CRFS for a while. CRFS is an acronym that stands for “coherent remote file system”. It’s a project that I’ve been working on to implement a networked file system that is, well, great. I haven’t been too public about it partially for fear of being accused of peddling vapourware but mostly because we’re still working in Oracle to get approval to release the code.

That said, the implementation is far enough along that I can make some meaningful performance measurements. I thought I’d share one which demonstrates what CRFS can do for metadata performance.

These tests were run between two machines. Each has an onboard e1000 chip connected to a cheap consumer-grade netlink gigabit switch. Each has 2 gig of memory and a single dual-core Intel processor of the Penryn generation.

Each test iteration is trivial. We make a new file system on the server, mount it on the client, untar a kernel source tree, purge the client’s data cache, and then read back the file data. Specifically, we run the following commands on the client:

tar -xf /dev/shm/linux-2.6.17.tar
echo 1 > /proc/sys/vm/drop_caches
find linux-2.6.17 -type f | xargs cat > /dev/null

We repeat this series first with the server storing the file system on a single SATA drive and then in ram (tmpfs) only. The CRFS numbers would be pretty baffling on their own so we also run the test over NFS (v3, TCP). We record measurements just like the time(1) command: real wall clock time, cpu time spent in userspace, cpu time spent in the kernel.
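For anyone wanting to collect the same three numbers in a script, a minimal sketch might look like this. It is not the harness used for these tests; the function name is made up, and it assumes a Unix-like system where os.times() reports the CPU time of waited-for child processes.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (real, user, sys) in seconds, in the spirit of time(1)."""
    before = os.times()
    start = time.monotonic()
    subprocess.run(argv, check=True)
    real = time.monotonic() - start
    after = os.times()
    # The children_* fields accumulate CPU time of child processes we waited for.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system


if __name__ == "__main__":
    real, user, system = time_command([sys.executable, "-c", "pass"])
    print(f"{real:.2f} {user:.2f} {system:.2f}")
```

A pipeline such as the find and xargs command above would need to be run through a shell (for example by passing ["sh", "-c", "..."] as argv) so that the whole pipeline is timed as one child.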

All times are in seconds.

               nfs                      crfs
         real   user    sys       real   user    sys    command

disk:   45.12   0.12  10.22      12.55   0.09   2.69    tar -xf /dev/shm/linux-2.6.17.tar
        19.21   0.05   3.54      11.04   0.05   1.17    find linux-2.6.17 -type f | xargs cat > /dev/null

 ram:   43.83   0.13   9.91       7.90   0.12   2.66    tar -xf /dev/shm/linux-2.6.17.tar
        18.64   0.08   3.61      10.68   0.05   1.00    find linux-2.6.17 -type f | xargs cat > /dev/null

The NFS numbers are roughly the same whether it’s storing on disk or in ram because we’re using the ‘async’ option. Asking NFS to actually perform each write operation on disk wouldn’t have been sporting at all.

CRFS is limited by the disk speed because its userspace server is waiting for writes to hit disk before sending a response to the client.

CRFS is able to do the same work in less time, even when writes go all the way to disk, because its network protocol goes to great lengths to reduce conversation over the network.
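To put “less time” into ratios, the wall clock (real) figures from the table can be divided directly. The numbers below are copied from the table; the “untar” and “read” labels and the function name are just shorthand invented for this sketch.

```python
# (nfs_real_seconds, crfs_real_seconds), copied from the table above.
REAL_SECONDS = {
    ("disk", "untar"): (45.12, 12.55),
    ("disk", "read"): (19.21, 11.04),
    ("ram", "untar"): (43.83, 7.90),
    ("ram", "read"): (18.64, 10.68),
}


def speedups(real_seconds):
    """Return how many times faster CRFS was than NFS for each test."""
    return {test: nfs / crfs for test, (nfs, crfs) in real_seconds.items()}


if __name__ == "__main__":
    for (storage, step), ratio in speedups(REAL_SECONDS).items():
        print(f"{storage:4} {step:5} {ratio:.1f}x")
```

The untar step, which is dominated by metadata operations, shows a much larger ratio than the read-back step, and the ratio is largest when the server stores to ram.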

I won’t waste everyone’s time with details until the code is out there and available for people to play with. My intention is to give people something to look forward to :).

The description of my upcoming CRFS talk at LCA ’08 in Melbourne provides a little more detail. Do come to the talk if you can! It should be fun.