Filesystem metadata extraction/storage with early bup versions

Posted: 2012-05-27

I’ve invested in extended attributes for tagging, am revisiting backups (having recently moved back to Linux), have found rsync --link-dest just a bit too slow and inefficient and am determined to use bup. When used in low-level tar -cf - / | bup split mode, bup treats its input as a blob and therefore preserves whatever metadata your archiver does, but it’s inefficient and slow to archive the whole tree every time. High-level metadata preservation is more efficient, feeding bup only what changed, and apparently should make it into bup 0.25, but today’s Ubuntu has 0.22a-1. That seems to leave the following choices:

  1. Use an extended attribute-preserving archiver and low-level bup (see the sketch after this list), put up with the speed hit for now and switch to high-level bup once it supports metadata preservation. My understanding is that you could switch from low-level to high-level bup without changing repositories, and that the first high-level backup would still enjoy deduplication, because the efficiency advantage of the high-level mode is simply that it feeds bup only files known to have changed, rather than all files.
  2. Extract the extended attributes to one or more files, in some format that’s easy to apply to metadata-less files freshly extracted from bup, and use high-level bup today.
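
To make option 1 concrete, the low-level round trip might look roughly like the sketch below. It’s only a sketch: --xattrs isn’t available in every tar build (which is partly why the archiver has to be chosen with care), and "rootfs" is just an arbitrary branch name.

    # Option 1 sketch: whole-tree archive piped through low-level bup.
    tar --xattrs --one-file-system -cf - / | bup split -n rootfs -v
    # ...and a restore reverses the pipe:
    bup join rootfs | tar --xattrs -xf - -C /mnt/restore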

I do my incremental backups manually, almost every night. (Yeah, if it’s not automatic, it isn’t a backup — but if you’re away from home more nights than you’re home, on flaky Internet connections where you’re competing with iPlayer and YouTube downloads, fast manual backup to a caddied USB 3 drive beats slow automatic backup over the network.) Duration is important; I want to go to bed, now, knowing the backup is done. 20 minutes is about as long as I can be bothered to wait. Just creating a full dar archive and writing to /dev/null took ~10m for /root and ~1h34m for /home.

Choosing an archive tool/format

So, what tool/format should I use? My requirements, in descending order of importance:

Ideas that didn’t survive scrutiny:

It turns out GNU cp has a --attributes-only option which, combined with --archive, carries extended attributes too, so I settled on the idea of a loopback mounted filesystem. My underlying filesystem is ext3, but the metadata image doesn’t need journalling, and SquashFS, while compact, has no attributes-only mode and so would need a separate staging copy of the tree (and enough inodes to hold it), so ext2 seems suitable.

Trying it out

My naïve first try didn’t work:

    root@jin:~# dd if=/dev/null of=/home/meta.ext2 bs=1G seek=1
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 3.0289e-05 s, 0.0 kB/s
    root@jin:~# mke2fs /home/meta.ext2
    [...]
    Block size=4096 (log=2)
    [...]
    65536 inodes, 262144 blocks
    13107 blocks (5.00%) reserved for the super user
    [...]
    root@jin:~# mkdir test-backup
    root@jin:~# mount -o loop /home/meta.ext2 test-backup/
    root@jin:~# cp --archive --attributes-only --one-file-system /. test-backup/
    [...]
    cp: cannot create directory `test-backup/./var': No space left on device

    root@jin:~# df --block-size=1K test-backup/
    Filesystem     1K-blocks  Used Available Use% Mounted on
    /dev/loop0       1032088 47144    932516   5% /root/test-backup
    root@jin:~# df --inodes test-backup/
    Filesystem     Inodes IUsed IFree IUse% Mounted on
    /dev/loop0      65536 65536     0  100% /root/test-backup

Too much space for data blocks and too little for inodes. The number of inodes needed is simply however many the source filesystem is using right now:

    root@jin:~# df --inodes /
    Filesystem                      Inodes  IUsed IFree IUse% Mounted on
    /dev/mapper/jin-root--enc_crypt 286720 206800 79920   73% /

Specifying this is easy:

    root@jin:~# mke2fs -N 206800 /home/meta.ext2

The easy approach to allocating data blocks is to wildly overestimate. Provided the underlying file is freshly and sparsely allocated for each backup, bup will de-dup the virtual zero blocks. (If bup didn’t, you could still use resize2fs.) A quick test shows that bup does throw away zeros:

    root@jin:~# umount test-backup/
    root@jin:~# dd if=/dev/null of=/home/meta.ext2 bs=1G seek=1
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 3.2749e-05 s, 0.0 kB/s
    root@jin:~# export BUP_DIR=bup-zeros-test
    root@jin:~# bup init
    Initialized empty Git repository in /root/bup-zeros-test/
    root@jin:~# bup split -n zeros -v /home/meta.ext2 
    [...]
    root@jin:~# du -hs $BUP_DIR
    7.7M    bup-zeros-test
    root@jin:~# rm -rf $BUP_DIR
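
(Had it not, the resize2fs fallback would presumably be something like the following; shrinking an image needs a forced fsck first.)

    # Fallback sketch, not needed as it turns out: shrink the unmounted image
    # to its minimum size instead of relying on zero-block deduplication.
    e2fsck -f /home/meta.ext2
    resize2fs -M /home/meta.ext2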

Probably pointless fine-tuning

Just out of curiosity though, and perhaps to get an estimate within the right order of magnitude, could we easily get the number of data blocks a bit closer to the required number? Data blocks are needed for directory contents, for symlink targets too long to be stored in the inode, and for extended attributes.

Ignoring xattrs (like the GNU find maintainers do) and other filesystem overheads for a moment, the total filesystem size in bytes would be:

block_size * num_oversize_links + directory_bytes + inode_size * num_inodes

For num_oversize_links:

    root@jin:~# find /mnt/root-enc-snap/ -type l -size +59c -printf "x" | wc -c
    14500

For directory_bytes:

    root@jin:~# find /mnt/root-enc-snap/ -type d -printf "%b * 512\n" | paste -s -d + | bc
    84413440

With 4096-byte blocks and 256-byte inodes, that comes to (first in bytes, then in blocks):

    root@jin:~# echo '4096 * 14500  +  84413440  +  256 * 206800' | bc
    196746240
    root@jin:~# echo '(4096 * 14500  +  84413440  +  256 * 206800) / 4096' | bc
    48033
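
Pulling those steps together, a throwaway sketch that derives the mke2fs parameters automatically (the snapshot path and the block/inode sizes are just the values used above):

    # Sketch: compute the inode count and a total-block estimate, then build the image.
    SRC=/mnt/root-enc-snap
    IMG=/home/meta.ext2
    BS=4096                                   # block size in bytes
    IS=256                                    # inode size in bytes
    inodes=$(df -iP "$SRC" | awk 'NR==2 {print $3}')
    links=$(find "$SRC" -type l -size +59c -printf "x" | wc -c)
    dirbytes=$(find "$SRC" -type d -printf "%b * 512\n" | paste -s -d + | bc)
    blocks=$(( (BS * links + dirbytes + IS * inodes) / BS ))
    mke2fs -N "$inodes" -b "$BS" -I "$IS" -m 0 "$IMG" "$blocks"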

You need to pick values for the block and inode sizes to make an estimate, but is it worth playing with them? I haven’t; since the filesystem image is going to be submitted to bup deduplication then discarded, it probably doesn’t matter much. Extended attributes can’t exceed one block, and while most of mine are under 100 bytes I like to keep my options open. I’m sure I also read somewhere that some ext2 implementations only support 4096-byte blocks, so perhaps that size makes your backups just a bit more portable.

Let’s try it:

    root@jin:~# mke2fs -N 206800 -b 4096 -I 256 -m 0 /home/meta.ext2 48033
    [...]
    206864 inodes, 48033 blocks
    [...]
    root@jin:~# dumpe2fs -h /home/meta.ext2 
    [...]
    Inode count:              206864
    Block count:              48033
    Reserved block count:     0
    Free blocks:              34880
    Free inodes:              206853
    [...]
    root@jin:~# mount -o loop /home/meta.ext2 test-backup/
    root@jin:~# cp --archive --attributes-only --one-file-system /. test-backup/
    root@jin:~# df --inodes test-backup/
    Filesystem     Inodes  IUsed IFree IUse% Mounted on
    /dev/loop0     206864 206711   153  100% /root/test-backup
    root@jin:~# df -k test-backup/
    Filesystem     1K-blocks   Used Available Use% Mounted on
    /dev/loop0        140328 138860      1468  99% /root/test-backup

Much better. It’s a little surprising that it’s big enough, given that I made no allowance for overheads or xattrs. But the root filesystem probably doesn’t have any xattrs (mine are in /home, on a separate mount), and the directories seem to take up less space in the backup than on the original (perhaps deleted entries don’t get reclaimed, so the original has holes that the clone doesn’t?), which compensates for the overheads and leaves a little room to spare.
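
Putting it all together, a hypothetical nightly run (a sketch, not the script I actually use) might look like this, with the block count padded by 10% as in the comparison below:

    # Rebuild the metadata image from scratch so unused blocks stay zero,
    # clone the metadata into it, then hand everything to high-level bup.
    # Assumes BUP_DIR already points at the backup repository; excludes and
    # one-file-system handling for bup index are omitted for brevity.
    rm -f /home/meta.ext2
    dd if=/dev/null of=/home/meta.ext2 bs=1G seek=1
    mke2fs -q -N 206800 -b 4096 -I 256 -m 0 /home/meta.ext2 52836
    mount -o loop /home/meta.ext2 test-backup/
    cp --archive --attributes-only --one-file-system /. test-backup/
    umount test-backup/
    bup index -u / /home/meta.ext2
    bup save -n nightly / /home/meta.ext2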

I initially chose ext2 over SquashFS because SquashFS has no attributes-only mode and would therefore be inode-hungry. Of course, once there’s a populated image with the right number of inodes and data blocks, it’s a tiny extra step to put it in a SquashFS.
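
For what it’s worth, that extra step might look something like this (untested here; whether the xattrs themselves survive depends on your squashfs-tools version):

    # Sketch: squash the populated metadata tree into a single compressed image.
    mount -o loop,ro /home/meta.ext2 test-backup/
    mksquashfs test-backup/ /home/meta.squashfs
    umount test-backup/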

Space efficiency comparison

How big are the filesystem images and how well do they deduplicate?

| Format | sparse loopback ext2 with 10% more than estimated required blocks | metastore | squashfs (with default options) |
| --- | --- | --- | --- |
| Apparent size (MiB) | 573.74 | 91.05 | 13.56 |
| On-disk size (MiB) | 524.45 | 91.05 | 13.56 |
| Gzipped size (MiB) | 13.48 | 19.23 | 13.44 |
| Size when bup-saved, uncompressed, into an empty per-format repository (MiB) | 19.07 | 24.11 | 13.64 |
| After three days (three bup-saves) as above (MiB) | 58.73 (19.62/day) | 42.60 (9.09/day) | 41.10 (13.65/day) |

My choice: loopback ext2

I’m not too fussed about squeezing every last byte out of the metadata storage. I am very fussed about using only reliable, stable code when it comes to backups, and minimising the number of extra scripts, tools and dependencies. (I make an exception for bup because without it I couldn’t afford nearly as many snapshots for the same disk space/backup time.) A loopback ext2 filesystem therefore seems right to me (even before the space efficiency comparison results are in). Perhaps one day someone (me?) will look at porting Backup Bouncer to Linux so the accuracy of these solutions can be objectively compared.

Notes and afterthoughts

Perhaps SquashFS would deduplicate better with inode table compression disabled, in keeping with the tar-before-gzip approach. I haven’t tried this.
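
If anyone does try it, I believe the relevant switch is -noI, along the lines of:

    # Untested sketch: leave the inode table uncompressed so unchanged inodes
    # might deduplicate better between nightly images.
    mksquashfs test-backup/ /home/meta-noI.squashfs -noI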

Perhaps it’s worth mentioning that cp --attributes-only doesn’t delete existing attributes, so it can only be applied accurately to a tree freshly restored from bup. I’ve not tested whether it copies ext2 attribute flags (the chattr ones) like the immutable flag. It doesn’t detect hard links.
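
For completeness, applying the stored metadata to a freshly restored tree would presumably look something like this (the paths are hypothetical):

    # /mnt/restored is a tree just restored by bup; meta.ext2 is the matching
    # metadata image from the same backup set.
    mkdir -p /mnt/meta
    mount -o loop,ro meta.ext2 /mnt/meta
    cp --archive --attributes-only /mnt/meta/. /mnt/restored/
    umount /mnt/meta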

It’s also worth pointing out that — once stable and reliable — bup’s own metadata format will probably use optimisations tailored specifically for bup’s deduplication, so I’ll prefer it when it gets there.