Friends and colleagues have been hearing me talk about CRFS for a while. CRFS stands for “coherent remote file system”. It’s a project that I’ve been working on to implement a networked file system that is, well, great. I haven’t been too public about it, partly for fear of being accused of peddling vapourware but mostly because we’re still working within Oracle to get approval to release the code.
That said, the implementation is far enough along that I can make some meaningful performance measurements. I thought I’d share one which demonstrates what CRFS can do for metadata performance.
These tests were run between two machines. Each has an onboard e1000 chip connected to a cheap consumer-grade netlink gigabit switch. Each has 2 GB of memory and a single dual-core Intel processor of the Penryn generation.
Each test iteration is trivial. We make a new file system on the server, mount it on the client, untar a kernel source tree, purge the client’s data cache, and then read back the file data. Specifically, we run the following commands on the client:
tar -xf /dev/shm/linux-2.6.17.tar
echo 1 > /proc/sys/vm/drop_caches
find linux-2.6.17 -type f | xargs cat > /dev/null
We repeat this series first with the server storing the file system on a single SATA drive and then in RAM (tmpfs) only. The CRFS numbers would be pretty baffling on their own, so we also run the test over NFS (v3, TCP). We record measurements just like the time(1) command: real wall clock time, CPU time spent in userspace, and CPU time spent in the kernel.
seconds (real user sys)
        nfs                   crfs                  command
disk:   45.12  0.12  10.22  : 12.55  0.09  2.69  :  tar -xf /dev/shm/linux-2.6.17.tar
        19.21  0.05   3.54  : 11.04  0.05  1.17  :  find linux-2.6.17 -type f | xargs cat > /dev/null
ram:    43.83  0.13   9.91  :  7.90  0.12  2.66  :  tar -xf /dev/shm/linux-2.6.17.tar
        18.64  0.08   3.61  : 10.68  0.05  1.00  :  find linux-2.6.17 -type f | xargs cat > /dev/null
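For anyone who wants to collect the same three figures on their own setup, here is a rough sketch of one client-side iteration. It is not the harness used for the numbers above. It assumes bash (for the `time` keyword and `TIMEFORMAT`), and the function names `measure` and `run_iteration` are made up for illustration.

```sh
#!/usr/bin/env bash
# Sketch only: assumes bash, whose `time` keyword honours TIMEFORMAT.

# measure CMD [ARGS...]: run the command, discard its output, and print
# "real user sys" in seconds, the same three figures time(1) reports.
measure() {
    local TIMEFORMAT='%2R %2U %2S'
    # The timing report goes to the shell's stderr, so capture it from the
    # enclosing group while the command's own output is thrown away.
    { time "$@" > /dev/null 2>&1; } 2>&1
}

# One client-side iteration. Run as root from inside the mounted file
# system; writing to drop_caches needs root privileges.
run_iteration() {
    measure tar -xf /dev/shm/linux-2.6.17.tar
    echo 1 > /proc/sys/vm/drop_caches    # purge the client's page cache
    measure sh -c 'find linux-2.6.17 -type f | xargs cat'
}
```

Calling `run_iteration` prints two lines, one per timed command. The read-back pipeline is wrapped in `sh -c` so that the whole pipeline is timed as one unit.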
The NFS numbers are roughly the same whether it’s storing on disk or in RAM because we’re using the ‘async’ option. Asking NFS to actually perform each write operation on disk wouldn’t have been sporting at all.
CRFS is limited by the disk speed because its userspace server is waiting for writes to hit disk before sending a response to the client.
CRFS is able to do the same work in less time, even when writes go all the way to disk, because its network protocol goes to great lengths to reduce conversation over the network.
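To put “less time” in relative terms, the wall clock columns can be turned into ratios. The helper below is a small sketch; `speedup` is a hypothetical name, and the inputs are just the real-time figures from the table.

```sh
# speedup NFS_SECONDS CRFS_SECONDS: print how many times faster CRFS was,
# as the ratio of the two wall clock times, rounded to two decimals.
speedup() {
    awk -v nfs="$1" -v crfs="$2" 'BEGIN { printf "%.2f\n", nfs / crfs }'
}

speedup 45.12 12.55   # tar, disk
speedup 19.21 11.04   # read back, disk
speedup 43.83  7.90   # tar, ram
speedup 18.64 10.68   # read back, ram
```

By that measure the untar is roughly 3.6 times faster on disk and 5.5 times faster in RAM, while the read-back is about 1.7 times faster in both cases.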
I won’t waste everyone’s time with details until the code is out there and available for people to play with. My intention is to give people something to look forward to :).
The description of my upcoming CRFS talk at LCA ’08 in Melbourne provides a little more detail. Do come to the talk if you can! It should be fun.