Migrating ZFS from Linux to FreeBSD
Published on 2020-02-11.
ZFS on Linux might get a lot of the latest features, and with a distribution like Arch Linux you have the bleeding edge, but it makes great sense to migrate everything ZFS related to FreeBSD. On FreeBSD ZFS is a first class citizen. This means that you don't have to worry about hostile kernel commits that suddenly break ZFS, or kernel modules that have to be recompiled every time the kernel is updated. Being a first class citizen also means that the entire operating system is tailored to work really well with ZFS. The installer makes it really easy to get ZFS on root, with support for all the different possible configurations, and all the relevant tools know about ZFS. Even the top command shows the memory usage of the ZFS ARC!
Introduction
The FreeBSD version of ZFS is a bit behind the ZFS version on Linux, and as such FreeBSD 12.1 doesn't have things like native ZFS encryption yet. But since FreeBSD is pulling stuff in from ZFS on Linux now, rather than Solaris, it is getting new ZFS features faster.
But the most important thing is that ZFS is treated as a first class citizen. You don't have to patch anything, and you don't have to worry about breakage, and all the tools know about ZFS.
However, migrating an existing ZFS pool from Linux to FreeBSD isn't easy. If you already have a pool running on Linux with "Linux-only" features, or newer features enabled in the pool, you need to back up the data, export the pool, and then create a new pool on FreeBSD. But unless you really need those specific features, migrating is worth all the work!
If you already are running ZFS on Linux, then you already know all the good stuff about ZFS, but then you also know about the less welcoming environment ZFS currently lives in on Linux. Having the operating system designed to work with ZFS from the ground up makes a huge difference.
You can read more about ZFS on FreeBSD in the FreeBSD manual.
In this minor tutorial I'll speak a little about some issues and address a couple of tools.
Boot environments
With FreeBSD on ZFS you get boot environments. A boot environment is a bootable instance of the operating system plus any installed third party packages. It is based upon a bootable clone of a ZFS dataset.
With FreeBSD you can manage multiple boot environments, and each boot environment can have different versions of the operating system and/or packages. This means that boot environments also allow the system to be upgraded while preserving the old system environment in a separate ZFS dataset. Should the upgrade go wrong for some reason, you can just boot into the previous boot environment.
Each boot environment consists of a root dataset and, optionally, other datasets nested under that root dataset.
When you install FreeBSD on ZFS a default boot environment is created. You can then use the bectl utility to manage boot environments.
You can even create a new boot environment based upon a snapshot of the current running environment, then mount the newly created environment from the current running system and update the system inside the new environment without touching the running system. This is very useful if you have to manage a remote system where you only have access to the machine via the console. You can make the new environment active, then have the machine boot into it, but should the boot procedure fail due to some problem with the upgrade, the machine can be set to automatically boot into the previous boot environment.
You can also copy and move a ZFS boot environment into another machine and run it there, or use a FreeBSD Jail to test the results in.
This means that you can not only do major reconfiguration of running third party applications such as mail servers, web servers, etc., but you can also mass populate large numbers of servers with one configured boot environment, and at the same time you can use it as a bare metal backup solution.
Selection of boot environments has been integrated into the FreeBSD loader, which means you can always change the boot environment at boot.
With bectl we can list all the boot environments. At the moment I only have the default one:
# bectl list
BE      Active Mountpoint Space Created
default NR     /          5.40G 2020-02-02 02:37
Under the Active column the letter N marks the boot environment that is active now, while the letter R marks the boot environment that will be booted on the next boot.
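For scripting, the two flags can be pulled out of the listing. This is only a sketch: the helper name be_flags is made up for this example, and it assumes the default column layout of bectl list, with the flags in the second column.

```shell
#!/bin/sh
# be_flags: hypothetical helper, not part of FreeBSD.
# Reads `bectl list` output on stdin, skips the header line, and
# reports which boot environment carries the N flag (active now)
# and which carries the R flag (active on the next boot).
be_flags() {
    awk 'NR > 1 {
        if ($2 ~ /N/) print "now: " $1
        if ($2 ~ /R/) print "next: " $1
    }'
}
```

It would be used as: bectl list | be_flags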
Let's create a new boot environment, mount it, and install some packages in that:
# bectl create -r testing-packages
# bectl list
BE               Active Mountpoint Space Created
default          NR     /          5.40G 2020-02-02 02:37
testing-packages -      -          8K    2020-02-12 09:26
The -r option is the recursive option. It is needed to make sure we get all the relevant datasets.
Let's mount it:
# bectl mount testing-packages
successfully mounted testing-packages at /tmp/be_mount.BCNN
# ls /tmp/be_mount.BCNN/
.cshrc    bootpool   etc      media  rescue  tmp
.profile  COPYRIGHT  home     mnt    root    usr
bin       dev        lib      net    sbin    var
boot      entropy    libexec  proc   sys     zroot
Let's install a package in it:
# pkg -r /tmp/be_mount.BCNN/ install tuxpaint
We can then activate the testing-packages boot environment for the next boot, where we will have tuxpaint installed. If tuxpaint for some reason should mess up our FreeBSD system, we can revert back to the default environment (it is Tux after all).
Let's unmount it and set the new environment as the active one:
# bectl umount testing-packages
# bectl activate testing-packages
successfully activated boot environment testing-packages
# bectl list
BE               Active Mountpoint Space Created
default          N      /          9.87M 2020-02-02 02:37
testing-packages R      -          5.46G 2020-02-12 09:26
The letter R now shows that at the next boot we will boot into the testing-packages boot environment.
Let's imagine that Tux did mess up our system. We can then reboot and select the default boot environment from the boot loader, activate the default environment, and destroy the testing-packages environment if we don't want to investigate further:
# bectl activate default
successfully activated boot environment default
# bectl list
BE               Active Mountpoint Space Created
default          NR     /          5.40G 2020-02-02 02:37
testing-packages -      -          59.3M 2020-02-12 09:26
# bectl destroy testing-packages
bectl destroy: leaving origin 'zroot/ROOT/default@2020-02-12-09:26:02-0' intact
# bectl list
BE      Active Mountpoint Space Created
default NR     /          5.40G 2020-02-02 02:37
A more relevant use case is to create a boot environment before an upgrade, then do the upgrade in the default environment, and if something goes wrong, like a driver not working as expected any longer, you can revert the whole system back.
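That upgrade pattern can be wrapped in a small shell function. This is a sketch under assumptions: the function name with_be is invented for this example, the upgrade command is whatever you pass in, and a real script would also want to check that the boot environment name is not already taken.

```shell
#!/bin/sh
# with_be: hypothetical wrapper, not part of FreeBSD.
# Usage: with_be <be-name> <command> [args...]
# Creates a recursive boot environment as a safety net, then runs the
# given command in the currently running (default) environment.
with_be() {
    be_name=$1
    shift
    # Stop here if the safety net could not be created.
    bectl create -r "$be_name" || return 1
    if "$@"; then
        echo "upgrade ok; previous state kept in $be_name"
    else
        echo "upgrade failed; run: bectl activate $be_name && reboot" >&2
        return 1
    fi
}
```

It could be called as, for example: with_be pre-upgrade pkg upgrade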
FreeBSD is set up so that your home directory and the directories /var/log/, /var/crash/, /var/audit/ and /var/mail/ are not affected by switching boot environments. That way you won't find all your log files reverted, or the files in your home directory reverted. Only the operating system and third party packages get reverted.
Missing /dev/disk/by-id/
Creating pools using the by-id label on GNU/Linux provides a huge advantage. Not only do you eliminate the possibility of disks switching device names if you need to change a disk and happen to reboot before the old disk has been replaced by a new disk, but you automatically get the serial number of the disk into the label.
$ ls -gG /dev/disk/by-id/
ata-ST31000340NS_9QJ089LF -> ../../sdd
ata-ST31000340NS_9QJ0EQ1V -> ../../sdb
ata-ST31000340NS_9QJ0F2YQ -> ../../sdc
...
This makes it easier to identify a broken disk. All you need to do is to map the serial number to the number of the slot that the disk is attached to, or you can also put the serial number on a sticker and then attach that to the front of the disk.
On FreeBSD there is no /dev/disk/by-id/ directory, but there is something almost identical called disk_ident, which is located in /dev/diskid/ when set up correctly.
If you happen to run FreeBSD with ZFS on root then (depending on your setup) the installer might have disabled diskid and used normal device names like ada1 and ada2 instead. You might also have used such device names manually when you created your pool.
So you might see something like this:
# zpool status
  pool: mypool
 state: ONLINE
  scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
errors: No known data errors
It's not a major problem because even if a device name gets switched and ada3 becomes ada2 etc., as soon as you replace the broken device with a new one, ZFS will figure things out. ZFS is really good at keeping track of the disks and it has its own internal identification system.
However, ideally we would like to see something like this:
# zpool status
  pool: mypool
 state: ONLINE
  scan: none requested
config:
        NAME                             STATE     READ WRITE CKSUM
        mypool                           ONLINE       0     0     0
          raidz1-0                       ONLINE       0     0     0
            diskid/DISK-WD-WCC7K3NRUYL1  ONLINE       0     0     0
            diskid/DISK-WD-WCC7K6CAX8AY  ONLINE       0     0     0
            diskid/DISK-Z30133A1         ONLINE       0     0     0
            diskid/DISK-W300GYTS         ONLINE       0     0     0
Some people don't like to use diskid because occasionally the serial number gets encoded if it contains spaces, and it can look really ugly then. I haven't personally run into that problem, but even then I would still prefer diskid because the serial number is still readable and I don't have to manually provide the disk with a label where I might make a typo without noticing.
People use different approaches and recommend different things and the book FreeBSD Mastery: Advanced ZFS, by Allan Jude and Michael W. Lucas, provides valuable information.
Anyway, in order to get diskid working you need to make sure it isn't disabled:
# sysctl kern.geom.label.disk_ident.enable
kern.geom.label.disk_ident.enable: 0
In this case it is disabled. Put the following into /boot/loader.conf:
kern.geom.label.disk_ident.enable="1"
Then enable glabel by adding this line to the same file:
geom_label_load="1"
Then reboot.
You will now be able to see disks in /dev/diskid/. However, you cannot see a disk in diskid if it is already in use through another GEOM name. So if the disk is already in use as adaX then it won't show up in diskid. This is called "GEOM withering".
If you already have a running ZFS pool created with adaX device names you can export the pool, reboot the machine, then have ZFS import the pool by its diskid labels with the -d option:
# zpool export mypool
# reboot
Then after the reboot:
# zpool import -d /dev/diskid/ mypool
Now you have a directory called /dev/diskid/ and it has labels with serial numbers in it:
# ls /dev/diskid/
DISK-W300GYTS           DISK-W300GYTSp1           DISK-W300GYTSp9
DISK-WD-WCC7K3NRUYL1    DISK-WD-WCC7K3NRUYL1p1    DISK-WD-WCC7K3NRUYL1p9
DISK-WD-WCC7K6CAX8AY    DISK-WD-WCC7K6CAX8AYp1    DISK-WD-WCC7K6CAX8AYp9
DISK-Z30133A1           DISK-Z30133A1p1           DISK-Z30133A1p9
Unfortunately it doesn't show how these IDs are mapped to adaX like it does with by-id on Linux, but we can get that information manually if we need it.
If we're working with an exported pool we can list the devices we can import:
# zpool import
   pool: mypool
     id: 1918994596645956952
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:
        mypool      ONLINE
          raidz1-0  ONLINE
            ada1    ONLINE
            ada2    ONLINE
            ada3    ONLINE
            ada4    ONLINE
Then we can map each to the serial number:
# geom disk list ada1 | grep ident
   ident: WD-WCC7K3NRUYL1
So ada1 is WD-WCC7K3NRUYL1.
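To get the mapping for all disks in one go, the output of geom disk list can be run through awk. This is a rough sketch: the helper name disk_idents is made up, it assumes the "Geom name:" and "ident:" lines look like the ones shown in this article, and it only takes the first word of the ident, so serial numbers that contain spaces (the encoded case mentioned earlier) will not match their diskid label.

```shell
#!/bin/sh
# disk_idents: hypothetical helper, not part of FreeBSD.
# Reads `geom disk list` output on stdin and prints one line per disk:
# the device name followed by the diskid label built from its ident.
disk_idents() {
    awk '$1 == "Geom" && $2 == "name:" { name = $3 }
         $1 == "ident:"               { print name, "diskid/DISK-" $2 }'
}
```

It would be used as: geom disk list | disk_idents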
However, we don't even need to do that. As mentioned above, ZFS keeps track of disks using its own system. So we can just ask ZFS to only look for disks in a specific path using the -d option and force an import using these identifiers instead:
# zpool import -d /dev/diskid/ mypool
# zpool status
  pool: mypool
 state: ONLINE
  scan: scrub repaired 0 in 0 days 02:01:47 with 0 errors on Thu Jan 30 23:49:42 2020
config:
        NAME                             STATE     READ WRITE CKSUM
        mypool                           ONLINE       0     0     0
          raidz1-0                       ONLINE       0     0     0
            diskid/DISK-WD-WCC7K3NRUYL1  ONLINE       0     0     0
            diskid/DISK-WD-WCC7K6CAX8AY  ONLINE       0     0     0
            diskid/DISK-Z30133A1         ONLINE       0     0     0
            diskid/DISK-W300GYTS         ONLINE       0     0     0
errors: No known data errors
This way we have changed the pool from using the adaX device names to using serial numbers instead.
Useful tools
Some of these tools are not unique to FreeBSD, but I'll address them anyway.
top
With top you can get useful information about the memory consumption of the ZFS ARC:
$ top
last pid: 933; load averages: 0.21, 0.07, 0.02 up 0+00:01:34 04:13:40
28 processes: 1 running, 27 sleeping
CPU: 0.2% user, 0.0% nice, 0.5% system, 0.1% interrupt, 99.2% idle
Mem: 149M Active, 38M Inact, 332M Wired, 7304M Free
ARC: 115M Total, 49M MFU, 64M MRU, 64K Anon, 474K Header, 1965K Other
     30M Compressed, 84M Uncompressed, 2.78:1 Ratio
Swap: 2048M Total, 2048M Free
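If you want the ARC size in a script rather than on screen, the same number is available as a sysctl. The sketch below assumes the kstat.zfs.misc.arcstats.size sysctl, which reports the ARC size in bytes. The helper name arc_mib is made up for this example and only does the unit conversion.

```shell
#!/bin/sh
# arc_mib: hypothetical helper, not part of FreeBSD.
# Reads a byte count on stdin and prints it as whole MiB (truncated).
arc_mib() {
    awk '{ printf "%d\n", $1 / 1048576 }'
}
```

It would be used as: sysctl -n kstat.zfs.misc.arcstats.size | arc_mib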
If you press C you'll get the raw CPU mode instead of the weighted CPU mode. If you'd like to see all the cores you can press P, which is like pressing 1 on Linux.
last pid: 950; load averages: 0.16, 0.11, 0.04 up 0+00:05:48 04:17:54
28 processes: 1 running, 27 sleeping
CPU 0: 0.1% user, 0.0% nice, 0.1% system, 0.1% interrupt, 99.7% idle
CPU 1: 0.1% user, 0.0% nice, 0.1% system, 0.0% interrupt, 99.8% idle
CPU 2: 0.1% user, 0.0% nice, 0.2% system, 0.0% interrupt, 99.7% idle
CPU 3: 0.0% user, 0.0% nice, 0.2% system, 0.0% interrupt, 99.8% idle
Mem: 149M Active, 39M Inact, 335M Wired, 7299M Free
ARC: 118M Total, 49M MFU, 67M MRU, 64K Anon, 485K Header, 2002K Other
     31M Compressed, 86M Uncompressed, 2.78:1 Ratio
Swap: 2048M Total, 2048M Free
  PID USERNAME  THR PRI NICE  SIZE    RES STATE   C  TIME    CPU COMMAND
  754 root        1  30    0  172M   149M select  3  0:00  0.00% smbd
  722 ntpd        2  20    0   18M    18M select  2  0:00  0.00% ntpd
  771 root        1  39    0   14M  5380K nanslp  2  0:00  0.00% smartd
  924 root        1  21    0   20M  9892K select  2  0:00  0.00% sshd
  725 root        1  20    0   11M  2296K select  0  0:00  0.00% powerd
You can also switch to IO display by pressing m, in which case you can monitor how much each process is reading from and writing to disk.
last pid: 953; load averages: 0.23, 0.14, 0.05 up 0+00:07:02 04:19:08
28 processes: 1 running, 27 sleeping
CPU 0: 0.1% user, 0.0% nice, 0.1% system, 0.1% interrupt, 99.8% idle
CPU 1: 0.1% user, 0.0% nice, 0.1% system, 0.0% interrupt, 99.8% idle
CPU 2: 0.1% user, 0.0% nice, 0.2% system, 0.0% interrupt, 99.8% idle
CPU 3: 0.0% user, 0.0% nice, 0.1% system, 0.0% interrupt, 99.8% idle
Mem: 149M Active, 39M Inact, 335M Wired, 7298M Free
ARC: 118M Total, 49M MFU, 67M MRU, 64K Anon, 485K Header, 2002K Other
     31M Compressed, 86M Uncompressed, 2.78:1 Ratio
Swap: 2048M Total, 2048M Free
  PID USERNAME  VCSW IVCSW  READ WRITE FAULT TOTAL PERCENT COMMAND
  754 root        34     1    36    26    63   125  19.69% smbd
  722 ntpd       532    10    26     1    40    67  10.55% ntpd
  771 root         2     1     0     1     0     1   0.16% smartd
  924 root        26     3    14     3     3    20   3.15% sshd
  725 root      1601     3     0     1     0     1   0.16% powerd
camcontrol
You can use camcontrol to get information about which disk is located as what device and bus:
# camcontrol devlist
<ST9120821AS 7.24>                at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD40EFRX-68N32N0 82.00A82>   at scbus1 target 0 lun 0 (ada1,pass1)
<WDC WD40EFRX-68N32N0 82.00A82>   at scbus2 target 0 lun 0 (ada2,pass2)
<ST4000DX001-1CE168 CC44>         at scbus3 target 0 lun 0 (ada3,pass3)
<ST4000DM000-1F2168 CC52>         at scbus4 target 0 lun 0 (ada4,pass4)
You can also get a lot of other useful information:
# camcontrol identify /dev/ada1
pass1: <WDC WD40EFRX-68N32N0 82.00A82> ACS-3 ATA SATA 3.x device
pass1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
protocol              ACS-3 ATA SATA 3.x
device model          WDC WD40EFRX-68N32N0
firmware revision     82.00A82
serial number         WD-WCC7K3NRUYL1
WWN                   50014ee2650a51a8
additional product id
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 4096, offset 0
LBA supported         268435455 sectors
LBA48 supported       7814037168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             5400
...
GEOM
GEOM is a modular disk transformation framework. It permits access and control to classes, such as Master Boot Records and BSD labels, through the use of providers, or the disk devices in /dev. By supporting various software RAID configurations, GEOM transparently provides access to the operating system and operating system utilities.
You can also use geom to list the serial number and other useful information. The serial number is listed as ident:
# geom disk list
Geom name: ada1
Providers:
1. Name: ada1
   Mediasize: 4000787030016 (3.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e2
   descr: WDC WD40EFRX-68N32N0
   lunid: 50014ee2650a51a8
   ident: WD-WCC7K3NRUYL1
   rotationrate: 5400
   fwsectors: 63
   fwheads: 16
You can read more about GEOM in the FreeBSD manual.
diskinfo
You can also use diskinfo to get relevant information about a disk:
# diskinfo -v ada1
ada1
        512               # sectorsize
        4000787030016     # mediasize in bytes (3.6T)
        7814037168        # mediasize in sectors
        4096              # stripesize
        0                 # stripeoffset
        7752021           # Cylinders according to firmware.
        16                # Heads according to firmware.
        63                # Sectors according to firmware.
        WDC WD40EFRX-68N32N0  # Disk descr.
        WD-WCC7K3NRUYL1   # Disk ident.
        No                # TRIM/UNMAP support
        5400              # Rotation rate in RPM
        Not_Zoned         # Zone Mode
gpart
On FreeBSD fdisk has been deprecated and replaced with gpart. If you need to see the partition table of a specific disk, you can use gpart:
# gpart show ada0
=>        40  234441568  ada0  GPT  (112G)
          40       1024     1  freebsd-boot  (512K)
        1064        984        - free -  (492K)
        2048    4194304     2  freebsd-swap  (2.0G)
     4196352  230244352     3  freebsd-zfs  (110G)
   234440704        904        - free -  (452K)
gstat
You most likely already know about iostat, which is a well-known tool that reports I/O statistics.
Another great tool is gstat, which provides GEOM I/O statistics at one second intervals by default. The -p option limits the display to physical providers. Right now my disks are not doing anything:
# gstat -p
dT: 1.011s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0| ada0
    0      0      0      0    0.0      0      0    0.0    0.0| ada1
    0      0      0      0    0.0      0      0    0.0    0.0| ada2
    0      0      0      0    0.0      0      0    0.0    0.0| ada3
    0      0      0      0    0.0      0      0    0.0    0.0| ada4
zdb
Another really useful tool is the zdb utility.
It has a ton of options, but it is important to note that zdb is not a general purpose tool and its options may change. The output of zdb reflects the on-disk structure of a ZFS pool and is inherently unstable. The precise output of most invocations is not documented, and a knowledge of ZFS internals is assumed.
We can get a lot of interesting information:
# zdb -C mypool
MOS Configuration:
    version: 5000
    name: 'mypool'
    state: 0
    txg: 32087
    pool_guid: 1918994596645956952
    hostid: 3865062308
    hostname: 'foo'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 1918994596645956952
        create_txg: 4
        children[0]:
            type: 'raidz'
            id: 0
            guid: 1652892640172304252
            nparity: 1
            metaslab_array: 70
            metaslab_shift: 37
            ashift: 12
            asize: 16003128885248
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 65
            children[0]:
                type: 'disk'
                id: 0
                guid: 14433036235128068445
                path: '/dev/diskid/DISK-WD-WCC7K3NRUYL1'
                whole_disk: 1
                DTL: 170
                create_txg: 4
                com.delphix:vdev_zap_leaf: 66
            children[1]:
                type: 'disk'
                id: 1
                guid: 15124204927999292914
                path: '/dev/diskid/DISK-WD-WCC7K6CAX8AY'
                whole_disk: 1
                DTL: 169
                create_txg: 4
                com.delphix:vdev_zap_leaf: 67
            children[2]:
                type: 'disk'
                id: 2
                guid: 13432944061349488304
                path: '/dev/diskid/DISK-Z30133A1'
                whole_disk: 1
                DTL: 168
                create_txg: 4
                com.delphix:vdev_zap_leaf: 68
            children[3]:
                type: 'disk'
                id: 3
                guid: 4262499248131338058
                path: '/dev/diskid/DISK-W300GYTS'
                whole_disk: 1
                DTL: 167
                create_txg: 4
                com.delphix:vdev_zap_leaf: 69
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
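If all you want from that configuration dump is the device paths the pool was last imported with, a small filter will do. Treat this strictly as a sketch: the helper name pool_paths is made up, and because zdb output is not a stable interface, the "path:" line format it relies on is an assumption based on the output shown in this article.

```shell
#!/bin/sh
# pool_paths: hypothetical helper, not part of FreeBSD.
# Reads `zdb -C <pool>` output on stdin and prints the value of every
# "path:" line with the surrounding single quotes removed.
pool_paths() {
    awk '$1 == "path:" { print $2 }' | tr -d "'"
}
```

It would be used as: zdb -C mypool | pool_paths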
We can also get a history of what's been done to the pool:
# zdb -h mypool
History:
2020-01-26.03:56:49 zpool create mypool raidz1 /dev/diskid/DISK-WD-WCC7K3NRUYL1 /dev/diskid/DISK-WD-WCC7K6CAX8AY /dev/diskid/DISK-Z30133A1 /dev/diskid/DISK-W300GYTS
2020-01-26.03:58:04 zfs create -o compress=lz4 mypool/pub
2020-01-30.21:48:04 zpool scrub mypool
We can also display some basic dataset information about our pool:
# zdb -d mypool
Dataset mos [META], ID 0, cr_txg 4, 7.87M, 167 objects
Dataset mypool/pub [ZPL], ID 85, cr_txg 19, 2.47T, 4339 objects
Dataset mypool [ZPL], ID 51, cr_txg 1, 128K, 8 objects
Verified large_blocks feature refcount of 0 is correct
Verified large_dnode feature refcount of 0 is correct
Verified sha512 feature refcount of 0 is correct
Verified skein feature refcount of 0 is correct
Verified device_removal feature refcount of 0 is correct
Verified indirect_refcount feature refcount of 0 is correct
Final notes
This minor write-up hasn't come anywhere near covering all the great stuff that FreeBSD has to offer when you run ZFS on it, but I hope that it has at least presented running ZFS on FreeBSD as not only a viable alternative to Linux, but also as an advantage.
Running ZFS on FreeBSD is not only less of a hassle, especially if you want to run ZFS on root, but it also provides better tooling, integration and easier administration. It also has good documentation for many of the tuneable options and an experienced and friendly community.