some data on packing directories in blocks
I was watching hg pull do its best impression of paint drying today and thought I’d sit down and run some numbers on the distribution of directory sizes on a boring Linux box. This topic invariably comes up when Linux FS people get in a room together.
Imagine that for each directory we make a rough calculation of the number of bytes that the user cares about in that directory. We’ll include the number of bytes in the file name of every directory entry. We’ll include the file size of each regular file. For each subdirectory we’ll add only 8 bytes. We’ll use the size of ‘struct stat’ as a stand-in for inode data, counting it once for the directory itself and once for each non-directory entry.
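To make that accounting concrete, here is a minimal Python sketch of the per-directory estimate, plus a walk that stays on a single file system. It is not the dir-size-hist tool that produced the table, whose source isn’t shown here. The names dir_estimate, walk_estimates and STAT_SIZE are hypothetical. STAT_SIZE = 144 assumes sizeof(struct stat) on x86_64 glibc; the i386 box behind these numbers has a smaller struct, so treat it as a tunable parameter. Refusing to cross mount points (like find -xdev) is also an assumption, meant to keep mounts such as /proc out of a “root file system” count.

```python
import os
import stat

# Assumption: 144 bytes is sizeof(struct stat) on x86_64 glibc. It is only a
# stand-in for per-inode metadata; pass a different stat_size to taste.
STAT_SIZE = 144
SUBDIR_BYTES = 8  # flat charge per subdirectory entry, as in the text


def dir_estimate(path, stat_size=STAT_SIZE):
    """Hypothetical helper: rough bytes-the-user-cares-about for one directory."""
    total = stat_size  # the directory's own inode
    with os.scandir(path) as entries:
        for entry in entries:
            total += len(os.fsencode(entry.name))  # name bytes, every entry
            st = entry.stat(follow_symlinks=False)
            if stat.S_ISDIR(st.st_mode):
                total += SUBDIR_BYTES
            else:
                total += stat_size  # inode stand-in for each non-directory
                if stat.S_ISREG(st.st_mode):
                    total += st.st_size  # file data for regular files
    # Like the sketch in the text, symlink targets and xattrs are ignored.
    return total


def _on_device(path, dev):
    """True if path (not following symlinks) lives on device dev."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False  # vanished or unreadable; don't descend


def walk_estimates(root, stat_size=STAT_SIZE):
    """Hypothetical helper: yield (dirpath, estimate) for each directory under
    root, without descending into other mounted file systems."""
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, _files in os.walk(root):
        # Prune subdirectories that are mount points of other file systems.
        dirnames[:] = [d for d in dirnames
                       if _on_device(os.path.join(dirpath, d), root_dev)]
        try:
            estimate = dir_estimate(dirpath, stat_size)
        except OSError:
            continue  # permission denied, or removed mid-walk
        yield dirpath, estimate
```

Run as root over /, walk_estimates yields one estimate per directory on the root file system. Symlinks to directories are charged as non-directories, and os.walk does not follow them by default, so nothing is counted twice.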
This is just a rough sketch. It underestimates the FS bookkeeping in a few places, and it ignores symlink targets and extended attributes entirely.
The question we’re trying to answer might be obvious to FS developers in the crowd: how many directories would fit entirely (directory entries, inodes, file data, and all) in a single FS block of a given size?
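Given one estimate per directory, that question reduces to a cumulative histogram over power-of-two block sizes. Below is a minimal sketch with a hypothetical fit_histogram name. It counts a directory as fitting when its estimate is strictly smaller than the block size, which is one reading of the “<” in the table’s column headers. It also rounds percentages down. Both choices are assumptions, not details taken from dir-size-hist.

```python
from bisect import bisect_left


def fit_histogram(sizes, start=512):
    """Hypothetical helper: for block sizes start, 2*start, 4*start, ...
    return (block_size, count, percent) rows, where count is the number of
    directories whose estimate is strictly smaller than block_size."""
    ordered = sorted(sizes)
    total = len(ordered)
    rows = []
    if total == 0:
        return rows
    block = start
    while True:
        nr = bisect_left(ordered, block)  # number of estimates < block
        rows.append((block, nr, nr * 100 // total))  # percent, rounded down
        if nr == total:
            break  # every directory fits; stop doubling
        block *= 2
    return rows
```

The estimates are sorted once, so each row costs a single binary search rather than a rescan of every directory.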
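I crunched these numbers on a bare Fedora Core 6 (FC6) i386 root file system. The resulting table lists the number of directories (and the percentage thereof) that fit inside blocks of increasing powers of two. The counts are cumulative: “nr <” is the number of directories whose estimate is smaller than the block size in the “size” column, and “% <” is that number as a percentage of all directories.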
[root@kiyoko fs-stats]# ./dir-size-hist -p /
     size   nr <  % <
      512   3439   46
     1024   3746   50
     2048   4065   54
     4096   4374   58
     8192   4732   63
    16384   5044   67
    32768   5423   73
    65536   5849   78
   131072   6199   83
   262144   6677   89
   524288   6916   93
  1048576   7059   95
  2097152   7310   98
  4194304   7364   99
  8388608   7395   99
 16777216   7406   99
 33554432   7413   99
 67108864   7417   99
134217728   7418   99
So, roughly, more than half of the directories in this root file system (4374 of them, or 58%) would fit in a single 4k block, contents and all. That looks like a great opportunity for obvious efficiency gains: disk utilization, seek times, read cache overhead, and so on.