Note to self: EXT4 Filesystem with MySQL on Ubuntu under VirtualBox is toxic; use ‘nobarrier’ mount option

I am running WordPress importers; an import that had required more than 30 minutes, utilising less than 2% CPU the whole time, suddenly required less than 3 minutes after I added the nobarrier option to the root mountpoint and rebooted.

barrier=<0|1(*)> barrier(*) nobarrier

This enables/disables the use of write barriers in the jbd code. barrier=0 disables, barrier=1 enables. This also requires an IO stack which can support barriers, and if jbd gets an error on a barrier write, it will disable again with a warning. Write barriers enforce proper on-disk ordering of journal commits, making volatile disk write caches safe to use, at some performance penalty. If your disks are battery-backed in one way or another, disabling barriers may safely improve performance. The mount options “barrier” and “nobarrier” can also be used to enable or disable barriers, for consistency with other ext4 mount options.
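To check which way a given mount is actually configured, the option rules quoted above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample mount line are made up for the example, and it assumes the option string looks like the fourth field of a `/proc/mounts` entry. An absent option is treated as "barriers on", following the `(*)` default marker in the quoted documentation.

```python
from typing import Optional


def barriers_enabled(options: str) -> bool:
    """Return True if a comma-separated ext4 option string leaves barriers on."""
    enabled = True  # the quoted docs mark barrier=1 as the default
    for opt in options.split(","):
        opt = opt.strip()
        if opt in ("nobarrier", "barrier=0"):
            enabled = False
        elif opt in ("barrier", "barrier=1"):
            enabled = True
    return enabled


def mount_options(mounts_text: str, mountpoint: str = "/") -> Optional[str]:
    """Find the option field for a mountpoint in /proc/mounts-style text."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = fields[3]  # a later entry for the same mountpoint wins
    return found


# Hypothetical sample line, not taken from the machine described in this post.
SAMPLE = "/dev/sda1 / ext4 rw,relatime,nobarrier,data=ordered 0 0"

opts = mount_options(SAMPLE)
print("barriers enabled:", barriers_enabled(opts))  # prints "barriers enabled: False"
```

On a real system the same functions could be fed the contents of `/proc/mounts` instead of the sample string.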

Gah. Hasn’t Linux gotten past this fuckwittage yet? #ZFS

6 thoughts on “Note to self: EXT4 Filesystem with MySQL on Ubuntu under VirtualBox is toxic; use ‘nobarrier’ mount option”

  1. Max Allan

    So are your disks battery backed or are we going to see a solitary blog post here in a few weeks about how the power went out and you couldn’t fsck your disks?

  2. Simon Waters

    As far as I can see ext4 is doing what the database requests correctly, and it seems likely ext3 (which I assume simply has barriers off in the default configs you’ve seen) doesn’t guarantee to do what is expected.

    It is so the wrong level to fix this. Presumably the issue is that the importer is committing every single record, and I presume has something daft switched on such as some sort of access logging such that the database has a non-trivial number of records to restore.

    I’m curious how ZFS is different in this regard.

    Ted Ts’o explains why ext3 doesn’t crumple too often without barriers.
    http://lwn.net/Articles/283161/

    Just when I got my head around file system semantics needed for reliable IMAP operation they locked up my correspondent for murdering his wife :(

  3. alecm Post author

    Compare:

    https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/writebarr.html

    A write barrier is a kernel mechanism used to ensure that file system metadata is correctly written and ordered on persistent storage, even when storage devices with volatile write caches lose power. File systems with write barriers enabled also ensure that data transmitted via fsync() is persistent throughout a power loss.

    Enabling write barriers incurs a substantial performance penalty for some applications.

    Specifically, applications that use fsync() heavily or create and delete many small files will likely run much slower.

    with:

    http://docs.oracle.com/cd/E19253-01/819-5461/zfsover-2/index.html

    ZFS is a transactional file system, which means that the file system state is always consistent on disk. Traditional file systems overwrite data in place, which means that if the system loses power, for example, between the time a data block is allocated and when it is linked into a directory, the file system will be left in an inconsistent state. Historically, this problem was solved through the use of the fsck command. This command was responsible for reviewing and verifying the file system state, and attempting to repair any inconsistencies during the process. This problem of inconsistent file systems caused great pain to administrators, and the fsck command was never guaranteed to fix all possible problems. More recently, file systems have introduced the concept of journaling. The journaling process records actions in a separate journal, which can then be replayed safely if a system crash occurs. This process introduces unnecessary overhead because the data needs to be written twice, often resulting in a new set of problems, such as when the journal cannot be replayed properly.

    With a transactional file system, data is managed using copy on write semantics. Data is never overwritten, and any sequence of operations is either entirely committed or entirely ignored. Thus, the file system can never be corrupted through accidental loss of power or a system crash. Although the most recently written pieces of data might be lost, the file system itself will always be consistent. In addition, synchronous data (written using the O_DSYNC flag) is always guaranteed to be written before returning, so it is never lost.

  4. Simon

    Yes, but the reason for the slow performance is that MySQL calls fsync, so that when the database returns from a commit the data is on disk: not merely that the filesystem is self-consistent, but that the data is safe. Thus it is the same as running on ZFS with sync=standard, as I read the excerpt of the ZFS docs. The only Linux benchmark I found shows comparable performance between ZFS and ext4 for MySQL, but that may be the vagaries of random benchmarks, although I expect Solaris ZFS is faster. But I think in both cases they have to write outstanding data, and sometimes metadata, and ensure it is flushed through the disk write cache before returning, and since both try to ensure they are writing to contiguous blank space it is probably unsurprising that the performance is in a similar ballpark. Or is there some subtlety I missed here? Anyway, I learnt stuff about LVM and modern filesystems even if I’m still misunderstanding something.

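The point made in the comments, that an fsync per committed record is what makes the import slow, can be illustrated with a rough Python sketch. It is not how MySQL or the WordPress importer actually writes data; it is a stand-in that appends records to a temporary file and calls `os.fsync` either after every record or once at the end. The function names and the record count are invented for the example, and the timings will vary widely with the storage stack and barrier settings.

```python
import os
import tempfile
import time


def write_records(path, records, sync_each):
    """Write records to path; fsync after each one, or only once at the end."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(rec)
            if sync_each:
                f.flush()             # push Python's buffer to the kernel
                os.fsync(f.fileno())  # ask the kernel to push it to storage
        f.flush()
        os.fsync(f.fileno())          # final sync so both modes end up durable


def timed(sync_each, n=200):
    """Return (elapsed seconds, file contents) for one run over n records."""
    records = [b"row %d\n" % i for i in range(n)]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        write_records(path, records, sync_each)
        elapsed = time.perf_counter() - start
        with open(path, "rb") as f:
            data = f.read()
    finally:
        os.remove(path)
    return elapsed, data


per_record, _ = timed(sync_each=True)
batched, _ = timed(sync_each=False)
print("fsync per record: %.4fs, single fsync: %.4fs" % (per_record, batched))
```

Both runs leave identical data on disk; only the number of flushes differs. That is the same trade-off as committing every record versus batching many records into one transaction.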
