User space file system comments
Posted on July 9, 2011 by Tommy McGuire
This is a comment I made on the Gluster blog, about a post on one of Linus Torvalds' incendiary comments on user-space filesystems.
Way back in the dim mists of time, when microkernels were hot, Linus was having flame wars with AST, and the HURD actually sounded like it might someday be usable, I worked on the IBM microkernel project writing filesystem benchmarks. The idea was to build OS/2 (!), AIX, and other OS personalities on top of Mach, as multiple user-space servers.
There were a lot of things wrong with the implementation (getpid() required a trip through Mach to an OS server), but the one thing that was a continual problem was filesystem performance. At one point, reading a page off disk was CPU bound. The issue wasn’t context switches, really, which can themselves be made arbitrarily fast, but carrying pages of data along with the context switch. The original copy-on-write turned out to be a pessimization: a very well-written memcpy took a little less time to copy a page than the VM system took to set up the page share and copy-on-write. Then, if you eventually had to copy the page….
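To make the copy-on-write arithmetic concrete, here is a back-of-the-envelope sketch in Python. The function names and the microsecond figures are hypothetical, chosen only for illustration; they are not measurements from the IBM microkernel or any other system. The model assumes the setup cost is always paid, and the page fault plus the deferred copy are paid only if the page is later written.

```python
"""Toy cost model: copy-on-write versus an eager memcpy of one page.

All names and numbers here are hypothetical illustrations, not measurements.
"""


def eager_copy_cost(copy_us: float) -> float:
    """Cost of simply copying the page up front."""
    return copy_us


def cow_cost(setup_us: float, fault_us: float, copy_us: float,
             write_probability: float) -> float:
    """Expected cost of sharing the page copy-on-write.

    The setup cost is always paid. The fault and the deferred copy are
    paid only if someone later writes to the page.
    """
    if not 0.0 <= write_probability <= 1.0:
        raise ValueError("write_probability must be between 0 and 1")
    return setup_us + write_probability * (fault_us + copy_us)


def cow_wins(setup_us: float, fault_us: float, copy_us: float,
             write_probability: float) -> bool:
    """True if copy-on-write is expected to be cheaper than an eager copy."""
    return (cow_cost(setup_us, fault_us, copy_us, write_probability)
            < eager_copy_cost(copy_us))


if __name__ == "__main__":
    # Hypothetical figures: setup costs slightly more than the copy itself.
    for p in (0.0, 0.5, 1.0):
        print(p, cow_cost(1.2, 0.5, 1.0, p), eager_copy_cost(1.0))
```

In this model, once the setup cost is at or above the cost of the copy, no write probability makes copy-on-write cheaper. It only pays off when setup is well under the copy cost and writes are rare.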
I never figured out what was going on with the virtual memory system, but I’ve seen the same issue over and over. The MkLinux paper at the (1st and only, AFAIK) FSF conference is one of my favorite performance papers ever. The table showing performance comparisons looks good, as long as you don’t notice that the top third is Dhrystone numbers; MkLinux didn’t actually slow the processor down. The Scout OS, a good research uK OS, reported excellent performance, but didn’t separate user-space processes into different memory domains. The numbers reported by the security-enhanced follow-on, whose name I can’t remember but which did use multiple memory domains, were much worse. If you read between the lines of that paper comparing it to Linux, it would still have been a small multiple slower than Linux even if it had used only 2 protection domains instead of 3 for filesystem accesses.
I’m as big a fan of user-level filesystems as anyone. They provide a neat and protected way of doing things that would otherwise not be as clean. On the other hand, to say “user space advantages far outweigh kernel space advantages” as a general rule is a pretty thoroughly blinkered view, too.
Oh, and if you want the references to the MkLinux, Scout, and whatever-that-other-one-was papers, I’ll try to round them up.