Saturday, October 8, 2011

Installing full-ZFS server at OVH

OVH is a French web hosting company that provides dedicated servers (and many other things). It is great because it offers a low-cost yet professional infrastructure for less than 20 euros a month. They provide Linux, FreeBSD and even Solaris: you can of course ask for your server to be installed with one of these, but the really great thing is the netboot feature, which boots your server over the network with the same OS as the one installed on it.

The FreeBSD installation is UFS-based. It is nonetheless possible to migrate it to ZFS with a little wizardry. It is best to do this with a fresh installation, but it should be possible as long as you use less than half of your hard drive. However, you have to move everything into the first physical half of the hard drive beforehand (this is easier when the server has just been installed, as you only have to keep the root partition).

The procedure is the following: you move all your data to a (small) transient partition at the end of the disk, then create a ZFS partition at the beginning of the disk and move your data there. You can then destroy the transient partition and create a physical swap partition in its stead. Indeed, although FreeBSD can use a ZFS vdev as swap, it cannot dump to it, hence the real swap partition.

Your FreeBSD is booted. Go to the OVH manager, open the "Netboot" page and select "rescue-pro". Then reboot your server and wait for a while: you should receive a mail with the root password of your netbooted server.
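Once logged into the rescue system, it is worth checking which disk device it sees; this article assumes ad0 throughout. A minimal check (the output shown is what you should get on a single-disk box like this one):

rescue-bsd# sysctl kern.disks
kern.disks: ad0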

Once connected to it, create a partition at the end of the hard drive large enough to hold all your data. We will copy the data there in order to be able to install the ZFS partition at the beginning of the disk.


rescue-bsd# fdisk /dev/ad0
******* Working on device /dev/ad0 *******
[...]

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 63, size 488397105 (238475 Meg), flag 80 (active)
beg: cyl 1/ head 0/ sector 1;
end: cyl 655/ head 0/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>


rescue-bsd# bsdlabel /dev/ad0s1
# /dev/ad0s1:
8 partitions:
#          size     offset  fstype  [fsize bsize bps/cpg]
  a:   20971520          0  4.2BSD    4096 16384    64
  b:    2097152   20971520    swap
  c:  488397105          0  unused       0     0     # "raw" part, don't edit
  d:  465328433   23068672  4.2BSD       0     0     0

rescue-bsd# mount /dev/ad0s1a /mnt/
rescue-bsd# df -k /mnt/
Filesystem   1024-blocks    Used    Avail  Capacity  Mounted on
/dev/ad0s1a     10154158  495960  8845866        5%  /mnt
rescue-bsd# umount /mnt/


We have a swap partition on /dev/ad0s1b and an empty filesystem on /dev/ad0s1d. The root partition only uses 500 MB. We are going to create a partition at the end of the disk and copy the root filesystem's contents there, so it must be large enough to hold them. But it should also be large enough to hold the swap partition you eventually want on your system, since it will be recycled as swap at the end of the procedure. In this example I want 1 GB of swap.

So let's create a 1 GB partition at the end of the disk. 1 GB is 1024*1024*1024/512 = 2097152 sectors. The disk is 488397105 sectors wide, so the partition would start at 488397105 - 2097152 = 486299953. A good practice is to align the partition start to a 4 KB boundary; here we round down to a multiple of 4096 sectors (2 MB), which more than satisfies 4 KB alignment: 486299953 % 4096 = 2353, so we will use 486299953 - 2353 = 486297600 as the first sector of the partition. Given the end of the disk is 488397105, the partition size will be 488397105 - 486297600 = 2099505 sectors.
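If you prefer not to do the arithmetic by hand, the rescue shell can do it for you (same numbers as above):

rescue-bsd# echo $((1024 * 1024 * 1024 / 512))
2097152
rescue-bsd# echo $(((488397105 - 2097152) / 4096 * 4096))
486297600
rescue-bsd# echo $((488397105 - 486297600))
2099505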


rescue-bsd# bsdlabel /dev/ad0s1 > /tmp/ad0.label
rescue-bsd# vi /tmp/ad0.label
[...]

rescue-bsd# cat /tmp/ad0.label
# /dev/ad0s1:
8 partitions:
#          size     offset  fstype  [fsize bsize bps/cpg]
  a:   20971520          0  4.2BSD    4096 16384    64
  b:    2099505  486297600  4.2BSD    4096 16384    64
  c:  488397105          0  unused       0     0     # "raw" part, don't edit

rescue-bsd# bsdlabel -R /dev/ad0s1 /tmp/ad0.label
rescue-bsd# newfs /dev/ad0s1b
/dev/ad0s1b: 1025.1MB (2099504 sectors) block size 16384, fragment size 2048
using 6 cylinder groups of 183.77MB, 11761 blks, 23552 inodes.
super-block backups (for fsck -b #) at:
160, 376512, 752864, 1129216, 1505568, 1881920

rescue-bsd# mount /dev/ad0s1b /mnt
rescue-bsd# cd /mnt
rescue-bsd# dump -0af - /dev/ad0s1a | restore -rf -
DUMP: Date of this level 0 dump: Sat Oct 8 10:07:58 2011
DUMP: Date of last level 0 dump: the epoch
DUMP: Dumping /dev/ad0s1a to standard output
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 499089 tape blocks.
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
[...]
DUMP: finished in 131 seconds, throughput 3816 KBytes/sec
DUMP: DUMP IS DONE
rescue-bsd# cd
rescue-bsd# umount /mnt


Now we can remove the first partition and create a big ZFS partition spanning from the beginning of the disk to the beginning of the second partition we have just created.


rescue-bsd# gpart show ad0s1
=>        0  488397105  ad0s1  BSD  (233G)
          0   20971520      1  freebsd-ufs  (10G)
   20971520  465326080         - free -     (222G)
  486297600    2099505      2  freebsd-ufs  (1.0G)

rescue-bsd# gpart delete -i 1 ad0s1
ad0s1a deleted
rescue-bsd# gpart show ad0s1
=>        0  488397105  ad0s1  BSD  (233G)
          0  486297600         - free -     (232G)
  486297600    2099505      2  freebsd-ufs  (1.0G)

rescue-bsd# gpart add -s 486297600 -t freebsd-zfs ad0s1
ad0s1a added
rescue-bsd# gpart show ad0s1
=>        0  488397105  ad0s1  BSD  (233G)
          0  486297600      1  freebsd-zfs  (232G)
  486297600    2099505      2  freebsd-ufs  (1.0G)


Now let's create the ZFS pool. The OVH netboot only provides a read-only root filesystem, so we have to tell zpool(8) to put the cache file in /tmp (this file will be needed to import the pool at boot time). We must also tell the ZFS layer to temporarily mount the pool under /mnt, so it won't try to mount the root of the pool as /.


rescue-bsd# kldload opensolaris
rescue-bsd# kldload zfs
rescue-bsd# zpool create -o cachefile=/tmp/zpool.cache -o altroot=/mnt zroot /dev/ad0s1a
rescue-bsd# zpool export zroot


Install the various bootcodes. The MBR bootcode should already be there anyway. The ZFS bootcode is somewhat peculiar because it actually consists of two parts that must be written to different places (note that the first dd(1) writes to /dev/ad0s1 while the second one writes to /dev/ad0s1a, the ZFS partition):

rescue-bsd# gpart bootcode -b /boot/boot0 ad0
bootcode written to ad0
rescue-bsd# dd if=/boot/zfsboot of=/dev/ad0s1 count=1 bs=512
rescue-bsd# dd if=/boot/zfsboot of=/dev/ad0s1a skip=1 seek=1024 bs=512


Then re-import the pool with the same options used during its creation and create the datasets for the base filesystem (I'm using the same layout as described on the FreeBSD wiki):

rescue-bsd# zpool import -o cachefile=/tmp/zpool.cache -o altroot=/mnt zroot
rescue-bsd# zfs set checksum=fletcher4 zroot
rescue-bsd# zfs set mountpoint=none zroot
rescue-bsd# zfs create -o mountpoint=/ zroot/rootfs
rescue-bsd# zpool set bootfs=zroot/rootfs zroot
rescue-bsd# zfs create -o compression=on -o exec=on -o setuid=off zroot/rootfs/tmp
rescue-bsd# chmod 1777 /mnt/tmp/
rescue-bsd# zfs create zroot/rootfs/usr
rescue-bsd# zfs create zroot/rootfs/usr/home
rescue-bsd# ln -s /usr/home /mnt/home
rescue-bsd# zfs create -o compression=lzjb -o setuid=off zroot/rootfs/usr/ports
rescue-bsd# zfs create -o compression=off -o exec=off -o setuid=off zroot/rootfs/usr/ports/distfiles
rescue-bsd# zfs create -o compression=off -o exec=off -o setuid=off zroot/rootfs/usr/ports/packages
rescue-bsd# zfs create -o compression=lzjb -o exec=off -o setuid=off zroot/rootfs/usr/src
rescue-bsd# zfs create zroot/rootfs/var
rescue-bsd# zfs create -o compression=lzjb -o exec=off -o setuid=off zroot/rootfs/var/crash
rescue-bsd# zfs create -o exec=off -o setuid=off zroot/rootfs/var/db
rescue-bsd# zfs create -o compression=lzjb -o exec=on -o setuid=off zroot/rootfs/var/db/pkg
rescue-bsd# zfs create -o exec=off -o setuid=off zroot/rootfs/var/empty
rescue-bsd# zfs create -o compression=lzjb -o exec=off -o setuid=off zroot/rootfs/var/log
rescue-bsd# zfs create -o compression=gzip -o exec=off -o setuid=off zroot/rootfs/var/mail
rescue-bsd# zfs create -o exec=off -o setuid=off zroot/rootfs/var/run
rescue-bsd# zfs create -o compression=lzjb -o exec=on -o setuid=off zroot/rootfs/var/tmp
rescue-bsd# chmod 1777 /mnt/var/tmp


Note that I've activated compression on some datasets as in the FreeBSD wiki, but on a low-end box with little CPU power, I advise turning it off.
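For instance, to disable it afterwards on two of the datasets created above:

rescue-bsd# zfs set compression=off zroot/rootfs/usr/ports
rescue-bsd# zfs set compression=off zroot/rootfs/usr/src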

Now let's copy our data to the ZFS partition.

rescue-bsd# mount /dev/ad0s1b /media
rescue-bsd# cd /media
rescue-bsd# find . | cpio -dump /mnt/
rescue-bsd# cd
rescue-bsd# umount /media
rescue-bsd# zfs set readonly=on zroot/rootfs/var/empty


Note that cpio(1) does not handle the file flags set by chflags(8). Your system will be able to boot, but some security seatbelts (such as the schg flag on critical system binaries) won't be back until you perform an installworld.
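If you don't want to wait for an installworld, you can restore the most important flags by hand; a minimal example (the file below is just one of them, compare with ls -lo on a stock system to get the full list):

rescue-bsd# chflags schg /mnt/libexec/ld-elf.so.1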

Let's now replace the transient UFS filesystem with the real swap partition:

rescue-bsd# gpart show ad0s1
=>        0  488397105  ad0s1  BSD  (233G)
          0  486297600      1  freebsd-zfs  (232G)
  486297600    2099505      2  freebsd-ufs  (1.0G)

rescue-bsd# gpart delete -i 2 ad0s1
ad0s1b deleted
rescue-bsd# gpart add -t freebsd-swap ad0s1
ad0s1b added
rescue-bsd# gpart show ad0s1
=>        0  488397105  ad0s1  BSD  (233G)
          0  486297600      1  freebsd-zfs   (232G)
  486297600    2099505      2  freebsd-swap  (1.0G)


Now we need to configure the system to be able to boot from ZFS:


rescue-bsd# echo 'zfs_load="YES"' > /mnt/boot/loader.conf
rescue-bsd# echo 'vfs.root.mountfrom="zfs:zroot/rootfs"' >> /mnt/boot/loader.conf
rescue-bsd# cp /tmp/zpool.cache /mnt/boot/zfs/
rescue-bsd# vi /mnt/etc/fstab
[...]
rescue-bsd# cat /mnt/etc/fstab
# Device        Mountpoint  FStype  Options  Dump  Pass#
/dev/ad0s1b     none        swap    sw       0     0
proc            /proc       procfs  rw       0     0


Et voilà! You can reboot your server (don't forget to deactivate netbooting from the OVH web interface).
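Once the machine is back on its own kernel, a quick sanity check (prompt shown as server# since we are no longer in the rescue system; the exact output depends on your setup):

server# zpool status zroot
server# zfs list -r zroot
server# swapinfo -k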