A little more CRFS detail
In my previous post about CRFS metadata performance I said that I didn’t want to go into too much detail until the source is released. I still don’t want to, but Evgeniy Polyakov is tempting me! He’s having a good time learning by experimenting with network file systems and posted some theories about CRFS. I’ll respond to his theories with a series of facts about the CRFS protocol and implementation because, well, I love talking about this stuff and rarely get a chance.
The userspace server I’ve implemented (“crfsd”) is btrfs-specific. It works directly with the on-disk structures in a btrfs volume. You don’t specify a file system directory tree to export; you specify a block device that contains a btrfs file system. crfsd has exclusive access to the contents of that block device while it is running.
The CRFS client kernel module (“crfs.ko”) doesn’t require kernel patches. I happen to be tracking mainline but, so far, nothing significant in the implementation restricts it to modern kernels. The use of ->write_begin() will probably be the first thing that restricts which kernel versions it supports, but that hasn’t happened yet.
CRFS does perform writeback caching of metadata operations. The huge performance benefit this brings justifies the complexity of implementing it, which can’t be overestimated. Designing the protocol and then implementing the kernel client such that we can keep this complexity under control is one of the most important aspects of the CRFS system as a whole.
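To make the term concrete, here is a toy Python sketch of writeback caching for metadata updates in general. It is not CRFS code and says nothing about the real protocol: the class name, the `send_to_server` callback, and the `(objectid, type, offset)` key shape are all assumptions made for illustration. It also leaves out everything that makes the real thing hard, such as ordering, coherence between clients, and error handling.

```python
class WritebackMetadataCache:
    """Toy model: absorb metadata updates locally, send them later in one go.

    Hypothetical illustration only; not the CRFS client implementation.
    """

    def __init__(self, send_to_server):
        # send_to_server is an assumed callback taking a list of (key, value).
        self.send_to_server = send_to_server
        self.dirty = {}  # key -> newest value not yet written back

    def update(self, key, value):
        # No network round trip here: the change is only recorded as dirty.
        # A later update to the same key simply overwrites the earlier one.
        self.dirty[key] = value

    def flush(self):
        # Write back all dirty items in a single message, in key order.
        if not self.dirty:
            return 0
        batch = sorted(self.dirty.items())
        self.send_to_server(batch)
        self.dirty.clear()
        return len(batch)


if __name__ == "__main__":
    sent = []
    cache = WritebackMetadataCache(sent.append)
    cache.update((256, 1, 0), "size=0")
    cache.update((256, 1, 0), "size=4096")  # overwrites the first update
    cache.update((257, 1, 0), "size=12")
    print("messages before flush:", len(sent))
    print("items written back:", cache.flush())
    print("messages after flush:", len(sent))
```

The point of the sketch is only that repeated updates cost no round trips until writeback, and that several updates to one item collapse into a single write.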
The CRFS network protocol could be said to batch operations, it’s true, though phrasing it that way gives the wrong impression. It’s not like some kind of explicit compound RPC mechanism. Think of it more like the batching that happens when ext3 reads in a block full of inodes as it goes to read a specific inode that it is interested in. CRFS achieves similar results from a very different organization of metadata. Think of it as reading and writing groups of items from btrfs leaf blocks because that’s exactly what it is. The opportunistic priming of client caches when they perform normal metadata read requests, at insignificant additional cost, is a natural side-effect of the way CRFS represents metadata.
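The leaf-granularity idea can be sketched with a toy model in Python. This is not the CRFS wire protocol or the btrfs on-disk format: `ToyServer`, `ToyClient`, `LEAF_CAPACITY`, and the `(objectid, type, offset)` integer keys are hypothetical stand-ins. The only thing it aims to show is that a read for one item returns its whole leaf, so neighbouring items land in the client cache for free.

```python
from bisect import bisect_right

LEAF_CAPACITY = 4  # hypothetical; real leaves hold however many items fit


class ToyServer:
    """Keeps items sorted by key and grouped into fixed-size 'leaves'."""

    def __init__(self, items):
        ordered = sorted(items.items())
        self.leaves = [ordered[i:i + LEAF_CAPACITY]
                       for i in range(0, len(ordered), LEAF_CAPACITY)]
        self.first_keys = [leaf[0][0] for leaf in self.leaves]
        self.requests = 0  # counts simulated network round trips

    def read_leaf_for(self, key):
        # Return every item in the leaf whose key range covers `key`.
        self.requests += 1
        idx = max(bisect_right(self.first_keys, key) - 1, 0)
        return self.leaves[idx]


class ToyClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def lookup(self, key):
        if key not in self.cache:
            # One request primes the cache with all items from that leaf.
            for k, v in self.server.read_leaf_for(key):
                self.cache[k] = v
        return self.cache.get(key)


if __name__ == "__main__":
    # Eight made-up inode items, so the toy server builds two leaves.
    items = {(ino, 1, 0): f"inode-{ino}" for ino in range(256, 264)}
    server = ToyServer(items)
    client = ToyClient(server)
    client.lookup((256, 1, 0))   # miss: fetches the first leaf
    client.lookup((257, 1, 0))   # hit: primed by the previous read
    client.lookup((260, 1, 0))   # miss: fetches the second leaf
    print("round trips:", server.requests)
```

Three lookups cost two round trips here because the second one is served from items that arrived alongside the first. The toy does no negative caching, so a lookup for a key that does not exist goes back to the server every time.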
And with that, I should really return to a nice holiday break.