Just about any basic course on operating systems that covers file systems at least mentions some kind of a log structured file system as an example of a less traditional disk layout. Log structured file systems have proved to perform very well under metadata intensive workloads and offer features unknown to ordinary file systems. On the other hand, even though Linux boasts support of a wide range of file storage systems, none of them has been log structured until this day. This document describes a brand new implementation of a log structured file system for Linux 2.6.17 (LFS).
Log structured file systems have been around for a long time. They have been proposed in 1991 by Mendel Rosenblum and John K. Ousterhout [1] who described them in the following way:
``A log-structured file system writes all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The log is the only structure on disk; it contains indexing information so that files can be read back from the log efficiently.''
The proposed scheme also included division of the underlying device into segments and garbage collecting truncated and outdated data together with an optimal garbage collecting strategy based on a cost-benefit analysis.
In 1993, Margo Seltzer, Keith Bostic, Marshall Kirk McKusick, and Carl Staelin picked up on this work and implemented a log structured file system that was fully integrated into a contemporary production UNIX operating system [2] and managed to overcome many of the shortcomings of the first prototype implementation. We have based our on-disk data structures on their work even though we have amended them to better suit the current Linux environment.
Since then there exist implementations of log structured file systems for old BSD UNIXes. Google reveals several attempts to implement a log structured file system for Linux but none of them has ever been finished. Only one of them, for example, has a working garbage collector, the rest of them are often unable to reclaim free space from outdated and deleted data. The one that has a garbage collector does not support mmap, often trashes cache for no good reason and has other serious shortcomings.
Log structured file systems were proposed because they perform very well under certain workloads. Moreover, fast crash recovery is a logical extension of such a file system and they make it possible to implement so called snapshots. By a snapshot we mean a read-only file system that represents the state of the original file system at a given moment. The original file system can then be modified but the snapshot still has the same contents. This is useful particularly for backing up consistent data while the file system itself is mounted and stored information is being changed.
Even though there is still room for improvement, we have managed to put together a working and competitive file system that:
Our file system still has a few shortcomings though. Because of the time frame allocated to us and hardware that we had at our disposal, we were only able to thoroughly test a limited number of configurations. For example, even though there is no reason why the file system should not work on an architecture different from i386 and extra care has been taken to provide for big endian architectures, no tests on other architectures were performed simply because we did not have access to the hardware. Furthermore, the number of configurations that could potentially be achieved by tuning the size of blocks and segments would require lots of months of thorough testing that we did not have. Therefore, these parameters are fixed at the moment even though internally the system is ready for a wide range of values.
Following this introductory text, the second chapter deals with setting up the file system. It covers necessary requirements, patching the kernel, compiling the source code and installing the file system onto your computer. The third chapter describes the file system from the user point of view. It covers the utilities that come along with it and explains how you can obtain information about mounted file systems. The fourth chapter describes LFS on-disk structures, the fifth one explains the internals of the kernel module that implements the file system and the sixth presents an overview of the implementation of snapshots. The seventh chapter deals with implementation of user space utilities. Finally, we have included our manual pages and the GNU licence terms and conditions that apply to this project.
There are not many typographical conventions in this document, nevertheless it is probably necessary to say that especially important parts of the text are printed in bold, technical terms are highlighted in an italic font, identifiers and other text appearing in the source code are typeset using a monospace font, just like the input and output of various utilities displayed in a terminal.
Viliam Holub 2006-12-04