some data on packing directories in blocks

I was watching hg pull do its best impression of paint drying today and thought I’d sit down and run some numbers on the distribution of directory sizes on a boring Linux box. This topic invariably comes up when Linux FS people get in a room together.

Imagine that for each directory we make a rough calculation of the number of bytes that the user cares about in that directory. We’ll include the number of bytes in the file name for every directory entry, plus the file size for each regular file. For each subdirectory we’ll only account for an additional 8 bytes. The size of ‘struct stat’ can be a stand-in for inode data: we’ll count it once for the directory itself and once for each non-directory entry.

This is just a rough sketch. It underestimates the FS bookkeeping in a few places, and it ignores symlink targets and extended attributes entirely.
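To make the accounting above concrete, here is a minimal Python sketch of the per-directory estimate. It is not the dir-size-hist tool used for the numbers below. The names dir_bytes, STAT_SIZE and SUBDIR_COST are made up for illustration, and the 144-byte figure is only a placeholder for sizeof(struct stat), which varies by architecture and libc.

```python
import os

# Assumption: placeholder for sizeof(struct stat). The real value depends on
# the architecture and libc, so treat this number as illustrative only.
STAT_SIZE = 144
# Each subdirectory entry is charged a flat 8 bytes, as described in the text.
SUBDIR_COST = 8


def dir_bytes(path, stat_size=STAT_SIZE):
    """Rough count of the bytes a user 'cares about' in one directory."""
    total = stat_size  # inode stand-in for the directory itself
    with os.scandir(path) as entries:
        for entry in entries:
            # Every entry pays for the bytes of its name.
            total += len(os.fsencode(entry.name))
            if entry.is_dir(follow_symlinks=False):
                total += SUBDIR_COST
            else:
                # Non-directory entries pay for an inode stand-in...
                total += stat_size
                # ...and regular files also pay for their contents.
                if entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
    return total
```

Symlinks are charged for their name and an inode stand-in but not for their target, which matches the caveat above. Walking a whole tree with os.walk and calling dir_bytes on each directory would give one size per directory.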

The question we’re trying to answer might be obvious to FS developers in the crowd: How many directories — directory entries, inodes, file data, and all — would fit in a single FS block of a given size?

I crunched these numbers on a bare FC6 i386 root file system. The resulting table lists the number of directories (and the percentage thereof) that fit inside blocks whose sizes are increasing powers of two.

[root@kiyoko fs-stats]# ./dir-size-hist -p /
      size            nr < % <
       512            3439      46
      1024            3746      50
      2048            4065      54
      4096            4374      58
      8192            4732      63
     16384            5044      67
     32768            5423      73
     65536            5849      78
    131072            6199      83
    262144            6677      89
    524288            6916      93
   1048576            7059      95
   2097152            7310      98
   4194304            7364      99
   8388608            7395      99
  16777216            7406      99
  33554432            7413      99
  67108864            7417      99
 134217728            7418      99

So, roughly, more than half of the directories on this root file system (58%) would fit in a 4k block. That sounds like a great opportunity for obvious efficiency gains: disk utilization, seek times, read cache overhead, etc.
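For anyone who wants to build the same kind of table from their own list of per-directory sizes, here is a rough Python sketch of the cumulative bucketing. The function name cumulative_hist is hypothetical. Treating "fits" as size <= block and truncating the percentage to an integer are both assumptions, since the exact boundary and rounding rules of dir-size-hist aren't shown here.

```python
def cumulative_hist(sizes, min_block=512):
    """Return (block_size, count, percent) rows for doubling block sizes.

    count is the number of sizes that fit in a block of block_size bytes.
    Assumptions: 'fits' means size <= block_size, and percent is truncated
    to an integer.
    """
    if not sizes:
        return []
    total = len(sizes)
    largest = max(sizes)
    rows = []
    block = min_block
    while True:
        count = sum(1 for s in sizes if s <= block)
        rows.append((block, count, count * 100 // total))
        if block >= largest:
            break  # every directory fits, so stop doubling
        block *= 2
    return rows


if __name__ == "__main__":
    # Made-up sizes purely for illustration, not measured data.
    for block, count, pct in cumulative_hist([100, 600, 5000]):
        print(f"{block:>10} {count:>15} {pct:>7}")
```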
