Showing posts with label FreeBSD. Show all posts

Sunday, July 27, 2014

Installing FreeBSD on a Dedibox (online.net)

Created: 2014/07/27
Updated: 2014/08/12 - higher-end Dedibox
Updated: 2014/08/14 - software RAID-1

My first server was a Dedibox. I then switched to OVH's Kimsufi (as an aside, "Kimsufi" sounds like "enough for me" in French), which at the time was more attractive (15 EUR/month instead of 20). My setup then evolved to make use of failover IPs (extra IP addresses that you can switch from one OVH server to another).

But now my Kimsufi servers are getting old and slow, or expensive, depending on which way you look at it, and I'd like to upgrade them. OVH has very attractive prices, like 5 EUR/month for the cheapest one (at least theoretically, given that I've never seen them available despite OVH constantly announcing that the last server was shipped a few hours ago). For the price I pay today overall, I could get an i5 CPU on the main server and a cheap, low-grade server for the failover...

... if only OVH still offered failover IPs on Kimsufi-grade servers. The Kimsufi offer has been spun off to really focus on home servers where, I guess, people don't need such fancy services. If you want a failover IP at OVH now, you need an enterprise-grade server, which costs at least 80 EUR/month. I guess this move was motivated by the IPv4 address crunch.

Anyway. I'm now going back to Dedibox, which still offers failover IPs. They also offer access to the server's console through a Java applet, which can be super useful; OVH used to offer this too but then removed it. However, Dedibox does not offer FreeBSD by default (which Kimsufi did at the time, and maybe still does), so you have to install it yourself. Here is how:

  • Boot an Ubuntu rescue system, log in and use sudo to become root.
  • Download a FreeBSD release or snapshot ISO; 10-STABLE can be found here: http://ftp2.fr.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/10.0/
  • Install QEMU:
    apt-get update && apt-get install qemu-kvm
    
  • Start QEMU with a VNC server, attached to the raw disk, booting from the ISO:
    qemu-system-x86_64 -no-kvm -hda /dev/sda -cdrom FreeBSD-*.iso -net nic,model=e1000 -vnc :1 -boot d
    or, if you have two disks:
    qemu-system-x86_64 -no-kvm -hda /dev/sda -hdb /dev/sdb -cdrom FreeBSD-*.iso -net nic,model=e1000 -vnc :1 -boot d
  • Connect in using VNC:
    xvncviewer ${server_ip}:1
Now you can install FreeBSD. The only catch is that, for some unknown reason, the bootloader won't be installed (correctly?), so you need to do it yourself.
I couldn't find a way to partition the server the way I want with the bsdinstaller, so I typically switch to the console (Alt-F4) right after choosing the keymap and do it with gpart. I'll describe it here for the record, so it will save me some research the next time I do it.

Here is the partitioning scheme:
  • We're in 2014, so we're using GPT boot, which requires a small partition for the bootloader.
  • I want the base system on UFS, mainly because it's less brittle to deal with remotely and you can fully use nextboot(8) (the new kernel is booted only once and then the previous one is reinstated, so you can try out kernels); all this is less necessary, though, if you have a working console, as with Dedibox.
  • A small swap partition.
  • The remaining space in ZFS.

You have a single disk or a hardware RAID controller

Here is how to do it from scratch:

###
### Setup partitions
###
# dd if=/dev/zero of=/dev/ada0 bs=64k count=128
128+0 records in
128+0 records out
8388608 bytes transferred in 0.042453 secs (197599244 bytes/sec)
# gpart show
# gpart create -s GPT ada0
ada0 created
# gpart add -t freebsd-boot -s 64k -i 1 ada0
ada0p1 added
# gpart add -t freebsd-ufs -s 20G -i 2 ada0
ada0p2 added
# gpart add -t freebsd-swap -s 2G -i 3 ada0
ada0p3 added
# gpart add -t freebsd-zfs -i 4 ada0
ada0p4 added
###
### Create the filesystems
###
# newfs -j /dev/ada0p2
[...]
# zpool create tank /dev/ada0p4
cannot mount '/tank': failed to create mountpoint (this is expected and harmless)
###
### Install the bootcode manually, as the installer fails to do it for some reason
###
# gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 ada0
bootcode written to ada0

Now go ahead and install FreeBSD.

You want software RAID-1

Note that we use geom_mirror for the UFS root and swap partitions, and ZFS mirroring for the pool, as I think it performs better.

Also, note that the current bsdinstaller does not seem to understand what a geom_mirror device is and wants us to create a partitioning scheme on it. We will therefore mount the partition on /mnt manually.

###
### Setup partitions on the first disk, ada0
###
# dd if=/dev/zero of=/dev/ada0 bs=64k count=128
128+0 records in
128+0 records out
8388608 bytes transferred in 0.042453 secs (197599244 bytes/sec)
# gpart show
# gpart create -s GPT ada0
ada0 created
# gpart add -t freebsd-boot -s 64k -i 1 ada0
ada0p1 added
# gpart add -t freebsd-ufs -s 20G -i 2 ada0
ada0p2 added
# gpart add -t freebsd-swap -s 2G -i 3 ada0
ada0p3 added
# gpart add -t freebsd-zfs -i 4 ada0
ada0p4 added
###
### Now duplicate those steps for ada1.
###
[...]
###
### Now create the mirror
###
# kldload geom_mirror
# gmirror label gm-root ada0p2 ada1p2
# gmirror label gm-swap ada0p3 ada1p3
###
### Create the filesystems
###
# newfs -j /dev/mirror/gm-root
[...]
# zpool create tank mirror /dev/ada0p4 /dev/ada1p4
cannot mount '/tank': failed to create mountpoint (this is expected and harmless)
###
### Install the bootcode manually, as the installer fails to do it for some reason
###
# gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 ada0
bootcode written to ada0
# gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 ada1
bootcode written to ada1
###
### Mount the partition to install FreeBSD
###
# mount /dev/mirror/gm-root /mnt

Now when the installer asks you about the partitioning, select "Shell" and FreeBSD will be installed to /mnt once you exit this shell.
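The two-disk transcript above, including the "[...]" step for ada1, can be condensed into a small sh script. This is only a sketch assuming the disk names ada0/ada1 and the sizes used above; the `run` helper and the `RUN` variable are hypothetical conveniences so that `RUN=echo` prints the commands instead of wiping anything. It writes the bootcode (protective MBR plus gptboot) right after partitioning each disk rather than at the end.

```sh
#!/bin/sh
# Sketch of the software RAID-1 setup above. DESTRUCTIVE: wipes both disks.
# Set RUN=echo to print the commands instead of running them (dry run).

run() {
    ${RUN:-} "$@"
}

# Wipe one disk, create the GPT layout (boot, UFS root, swap, ZFS)
# and write the protective MBR plus gptboot into the boot partition.
partition_disk() {
    disk=$1
    run dd if=/dev/zero of="/dev/$disk" bs=64k count=128 &&
    run gpart create -s GPT "$disk" &&
    run gpart add -t freebsd-boot -s 64k -i 1 "$disk" &&
    run gpart add -t freebsd-ufs -s 20G -i 2 "$disk" &&
    run gpart add -t freebsd-swap -s 2G -i 3 "$disk" &&
    run gpart add -t freebsd-zfs -i 4 "$disk" &&
    run gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 "$disk"
}

# Partition both disks, mirror root and swap with gmirror, mirror the
# rest with ZFS, then mount the new root on /mnt for the installer.
setup_raid1() {
    d0=$1
    d1=$2
    partition_disk "$d0" && partition_disk "$d1" || return 1
    # kldload exits non-zero if geom_mirror is already loaded: ignore it.
    run kldload geom_mirror || :
    run gmirror label gm-root "${d0}p2" "${d1}p2" &&
    run gmirror label gm-swap "${d0}p3" "${d1}p3" &&
    run newfs -j /dev/mirror/gm-root &&
    run zpool create tank mirror "/dev/${d0}p4" "/dev/${d1}p4" &&
    run mount /dev/mirror/gm-root /mnt
}
```

For instance, `RUN=echo setup_raid1 ada0 ada1` lists the 20 commands it would run; only drop `RUN=echo` once that output looks right. As in the transcript, `zpool create` will complain that it cannot mount /tank, which is harmless here.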

End of installation

A few notes (some for myself):
  • The default router in online.net's network is .1.
  • Add the following lines to /etc/rc.conf:
    sendmail_submit_enable=NO
    sendmail_outbound_enable=NO
    sendmail_msp_queue_enable=NO
  • YMMV, but on QEMU I have an em0 interface, whereas on the real server it can be igb0 or bce0, so I need to change my /etc/rc.conf accordingly. When I'm not sure, I typically duplicate the ifconfig_em0 line as ifconfig_igb0 and ifconfig_bce0.
  • Add a user or enable root login on sshd.
  • If you are using software RAID, add geom_mirror_load="YES" to /boot/loader.conf.
  • If your server has a PERC H200 controller, the disk won't be /dev/ada0 but /dev/da0, even on FreeBSD 10, so change /etc/fstab accordingly.
  • Add the swap to /etc/fstab.
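The interface-name trick from the notes above can be scripted. This is a minimal sketch assuming the address is configured on a single `ifconfig_em0="..."` line of rc.conf; the `dup_ifconfig` name is made up, alias lines (such as `ifconfig_em0_alias0` for failover IPs) and IPv6 lines are not copied, and the result goes to standard output instead of editing the file in place.

```sh
#!/bin/sh
# Sketch: print an rc.conf where the ifconfig_<from> line is also copied
# to each of the other interface names given, so the network comes up
# whichever driver (em, igb, bce...) the real hardware ends up using.
# Only the main ifconfig_<from>="..." line is copied (no aliases, no IPv6).
dup_ifconfig() {
    conf=$1
    from=$2
    shift 2
    cat "$conf" || return 1
    for to in "$@"; do
        # Leave interfaces that are already configured alone.
        grep -q "^ifconfig_${to}=" "$conf" && continue
        sed -n "s/^ifconfig_${from}=/ifconfig_${to}=/p" "$conf"
    done
}
```

For example, `dup_ifconfig /mnt/etc/rc.conf em0 igb0 bce0 > /tmp/rc.conf.new`, then review /tmp/rc.conf.new and copy it over /mnt/etc/rc.conf.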
Here is a small example of what /etc/fstab looks like (with a software mirror in this case; note that you cannot boot from the mirror itself, as the mirror subsystem is not available yet at boot time, so the machine has to boot from one of the disks):
/dev/mirror/gm-root     /               ufs     rw      1       1
/dev/mirror/gm-swap     none            swap    sw      0       0
Now, before trying to boot FreeBSD on the real server, try it in QEMU: this will save you some time if you missed something. First cleanly unmount your disks, then kill QEMU from the host. Now re-run it using:
qemu-system-x86_64 -no-kvm -hda /dev/sda -net nic,model=e1000 -vnc :1 -boot c
This should get you to the login prompt. Shut down cleanly. Then you can try to boot it on the real server. If it does not come up, you can still use the Java console to debug it (for example, the interface name may be wrong).

Tuesday, March 6, 2012

Portmaster options combo to upgrade FreeBSD ports

Some years ago I was using the famous Portupgrade to maintain my ports. This software is mature, very powerful and easy to use. Unfortunately its dependency on Ruby makes it really cumbersome, especially because I have many jails.

Therefore, when Doug Barton started Portmaster, which is written in shell and does more or less the same thing (well, actually less, but I can live with it), I was quite eager to use it. One thing I disliked from the beginning with Portmaster is that it cannot work on its own: it is constantly asking questions. Of course there are options to disable this, but this leads me to the second problem: they are not intuitive! (at least for me...)

After some struggle, I finally found the options I always want to use, and I'm writing them down as a reminder and in the hope of helping someone else with the same hassle:

# portmaster -dBGm BATCH=1 --no-confirm --delete-packages -a

Here are the details:

  • I'm using portconf to configure the ports' build knobs, so I don't want to run the configuration or be asked anything about it. Just use the defaults unless I said otherwise: -G -m BATCH=1;
  • Don't create a backup package, I'm not running any financial application: -B;
  • Don't ask me if the distfiles must be cleaned, just do it: -d;
  • Don't ask me if I really want to upgrade my ports, I already executed the command proving it: --no-confirm;
  • Remove packages once installed: --delete-packages;
  • Upgrade everything: -a; but you might not want to upgrade everything at once, in which case you can replace this with one or more port names.
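To avoid retyping the combo, it can live in a shell function. This is a sketch: `pm_upgrade` is a made-up name, and `PORTMASTER` is a hypothetical override (not a portmaster feature) so that `PORTMASTER=echo` shows the command line without running it. With no argument it falls back to `-a`; otherwise only the given ports are upgraded.

```sh
#!/bin/sh
# Sketch: the portmaster invocation above wrapped in a function.
# No argument: upgrade everything (-a). Otherwise: only the ports given.
# PORTMASTER is a hypothetical override, e.g. PORTMASTER=echo for a dry run.
pm_upgrade() {
    if [ $# -eq 0 ]; then
        set -- -a
    fi
    ${PORTMASTER:-portmaster} -dBGm BATCH=1 --no-confirm --delete-packages "$@"
}
```

Usage: `pm_upgrade` upgrades everything, while `pm_upgrade shells/bash` only upgrades that port (and whatever portmaster decides it depends on).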

Saturday, October 8, 2011

Installing full-ZFS server at OVH

OVH is a French web hosting company that provides dedicated servers (and many other things). It is great because they offer an infrastructure that brings you low-cost yet professional facilities for less than 20 euros a month. They provide Linux, FreeBSD and even Solaris: you can of course ask for your server to be installed with any of these, but the really great thing is the netboot feature, which boots the same OS as the one installed on your server.

The FreeBSD installation is UFS-based. It is nonetheless possible to migrate it to ZFS with a little wizardry. It is best to do this on a fresh installation, but it should be possible as long as you use less than half of your hard drive. However, you have to move everything into the first physical half of the hard drive (this is easier when the server has just been installed, as you only have to keep the root partition).

The procedure is the following: you move all your data to a (small) transient partition at the end of the disk. Then you create a ZFS partition at the beginning of the disk and move your data there again. You can then destroy the transient partition and create a physical swap partition in its stead. Indeed, although FreeBSD can use a ZFS vdev as swap, it cannot dump to it. Therefore this procedure creates a real partition for swap.

Your FreeBSD system is up and running. Go to the "Netboot" page of the OVH manager and select "rescue-pro". Then reboot your server and wait for a while: you should receive an e-mail with the root password of your netbooted server.

Once connected to it, create a partition at the end of the hard drive, large enough to hold all your data. We will copy the data there so that the ZFS partition can be created at the beginning of the disk.


rescue-bsd# fdisk /dev/ad0
******* Working on device /dev/ad0 *******
[...]

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 63, size 488397105 (238475 Meg), flag 80 (active)
beg: cyl 1/ head 0/ sector 1;
end: cyl 655/ head 0/ sector 63
The data for partition 2 is:

The data for partition 3 is:

The data for partition 4 is:


rescue-bsd# bsdlabel /dev/ad0s1
# /dev/ad0s1:
8 partitions:
# size offset fstype [fsize bsize bps/cpg]
a: 20971520 0 4.2BSD 4096 16384 64
b: 2097152 20971520 swap
c: 488397105 0 unused 0 0 # "raw" part, don't edit
d: 465328433 23068672 4.2BSD 0 0 0

rescue-bsd# mount /dev/ad0s1a /mnt/
rescue-bsd# df -k /mnt/
Filesystem 1024-blocks Used Avail Capacity Mounted on
/dev/ad0s1a 10154158 495960 8845866 5% /mnt
rescue-bsd# umount /mnt/


We have a swap partition on /dev/ad0s1b and an empty filesystem on /dev/ad0s1d. The root partition only uses 500 MB. We are going to create a partition at the end of the disk and copy the root filesystem's contents there, so this partition must be large enough for them. It should also be large enough to hold the swap partition you eventually want on your system. In this example, I want 1 GB of swap.

So let's create a 1 GB partition at the end of the disk. 1 GB is 1024*1024*1024/512 = 2097152 sectors. The slice is 488397105 sectors long, so the partition would start at 488397105 - 2097152 = 486299953. A good practice is to align the partition; here its start is rounded down to a multiple of 4096 sectors (2 MB, hence also a multiple of 4 KB): 486299953 % 4096 = 2353, so we will use 486299953 - 2353 = 486297600 as the first sector of the partition. Given that the slice ends at 488397105, the partition size will be 488397105 - 486297600 = 2099505 sectors.
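This arithmetic is easy to get wrong, so here is a small sh helper that redoes it. It is only a sketch: `tail_partition` is a made-up name, all values are in 512-byte sectors, and offsets are relative to the start of the slice, as in the bsdlabel output.

```sh
#!/bin/sh
# Sketch of the computation above: place a partition of at least
# WANT sectors at the very end of a slice of SLICE sectors, with its
# first sector rounded down to a multiple of ALIGN sectors (default 4096).
# Prints "<offset> <size>", both in 512-byte sectors, relative to the slice.
tail_partition() {
    slice=$1
    want=$2
    align=${3:-4096}
    start=$(( slice - want ))            # naive start
    start=$(( start - start % align ))   # round down to the alignment
    size=$(( slice - start ))            # extend up to the end of the slice
    echo "$start $size"
}
```

With the numbers from this server, `tail_partition 488397105 $((1024*1024*1024/512))` prints `486297600 2099505`, the offset and size used in the label below.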


rescue-bsd# bsdlabel /dev/ad0s1 > /tmp/ad0.label
rescue-bsd# vi /tmp/ad0.label
[...]

rescue-bsd# cat /tmp/ad0.label
# /dev/ad0s1:
8 partitions:
# size offset fstype [fsize bsize bps/cpg]
a: 20971520 0 4.2BSD 4096 16384 64
b: 2099505 486297600 4.2BSD 4096 16384 64
c: 488397105 0 unused 0 0 # "raw" part, don't edit

rescue-bsd# bsdlabel -R /dev/ad0s1 /tmp/ad0.label
rescue-bsd# newfs /dev/ad0s1b
/dev/ad0s1b: 1025.1MB (2099504 sectors) block size 16384, fragment size 2048
using 6 cylinder groups of 183.77MB, 11761 blks, 23552 inodes.
super-block backups (for fsck -b #) at:
160, 376512, 752864, 1129216, 1505568, 1881920

rescue-bsd# mount /dev/ad0s1b /mnt
rescue-bsd# cd /mnt
rescue-bsd# dump -0af - /dev/ad0s1a | restore -rf -
DUMP: Date of this level 0 dump: Sat Oct 8 10:07:58 2011
DUMP: Date of last level 0 dump: the epoch
DUMP: Dumping /dev/ad0s1a to standard output
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 499089 tape blocks.
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
[...]
DUMP: finished in 131 seconds, throughput 3816 KBytes/sec
DUMP: DUMP IS DONE
rescue-bsd# cd
rescue-bsd# umount /mnt


Now we can remove the first partition and create a big ZFS partition spanning from the beginning of the disk to the beginning of the second partition we have just created.


rescue-bsd# gpart show ad0s1
=> 0 488397105 ad0s1 BSD (233G)
0 20971520 1 freebsd-ufs (10G)
20971520 465326080 - free - (222G)
486297600 2099505 2 freebsd-ufs (1.0G)

rescue-bsd# gpart delete -i 1 ad0s1
ad0s1a deleted
rescue-bsd# gpart show ad0s1
=> 0 488397105 ad0s1 BSD (233G)
0 486297600 - free - (232G)
486297600 2099505 2 freebsd-ufs (1.0G)

rescue-bsd# gpart add -s 486297600 -t freebsd-zfs ad0s1
ad0s1a added
rescue-bsd# gpart show ad0s1
=> 0 488397105 ad0s1 BSD (233G)
0 486297600 1 freebsd-zfs (232G)
486297600 2099505 2 freebsd-ufs (1.0G)


Now let's create the ZFS pool. The OVH netboot only provides a read-only root filesystem, so we have to tell zpool(8) to put the cache file in /tmp (this file will be needed to import the pool at boot time). We must also tell ZFS to temporarily mount the pool under /mnt, so that it won't try to mount the root of the pool on /.


rescue-bsd# kldload opensolaris
rescue-bsd# kldload zfs
rescue-bsd# zpool create -o cachefile=/tmp/zpool.cache -o altroot=/mnt zroot /dev/ad0s1a
rescue-bsd# zpool export zroot


Install the various bootcodes. The MBR bootcode should already be there anyway. The ZFS bootcode is somewhat strange because it actually consists of two parts that must be written at different places (note that the first dd(1) writes to /dev/ad0s1 while the second one writes to /dev/ad0s1a):

rescue-bsd# gpart bootcode -b /boot/boot0 ad0
bootcode written to ad0
rescue-bsd# dd if=/boot/zfsboot of=/dev/ad0s1 count=1 bs=512
rescue-bsd# dd if=/boot/zfsboot of=/dev/ad0s1a skip=1 seek=1024 bs=512


Then re-import the pool with the same options used during its creation and create the datasets for the base filesystem (I'm using the same layout as described on the FreeBSD wiki):

rescue-bsd# zpool import -o cachefile=/tmp/zpool.cache -o altroot=/mnt zroot
rescue-bsd# zfs set checksum=fletcher4 zroot
rescue-bsd# zfs set mountpoint=none zroot
rescue-bsd# zfs create -o mountpoint=/ zroot/rootfs
rescue-bsd# zpool set bootfs=zroot/rootfs zroot
rescue-bsd# zfs create -o compression=on -o exec=on -o setuid=off zroot/rootfs/tmp
rescue-bsd# chmod 1777 /mnt/tmp/
rescue-bsd# zfs create zroot/rootfs/usr
rescue-bsd# zfs create zroot/rootfs/usr/home
rescue-bsd# ln -s /usr/home /mnt/home
rescue-bsd# zfs create -o compression=lzjb -o setuid=off zroot/rootfs/usr/ports
rescue-bsd# zfs create -o compression=off -o exec=off -o setuid=off zroot/rootfs/usr/ports/distfiles
rescue-bsd# zfs create -o compression=off -o exec=off -o setuid=off zroot/rootfs/usr/ports/packages
rescue-bsd# zfs create -o compression=lzjb -o exec=off -o setuid=off zroot/rootfs/usr/src
rescue-bsd# zfs create zroot/rootfs/var
rescue-bsd# zfs create -o compression=lzjb -o exec=off -o setuid=off zroot/rootfs/var/crash
rescue-bsd# zfs create -o exec=off -o setuid=off zroot/rootfs/var/db
rescue-bsd# zfs create -o compression=lzjb -o exec=on -o setuid=off zroot/rootfs/var/db/pkg
rescue-bsd# zfs create -o exec=off -o setuid=off zroot/rootfs/var/empty
rescue-bsd# zfs create -o compression=lzjb -o exec=off -o setuid=off zroot/rootfs/var/log
rescue-bsd# zfs create -o compression=gzip -o exec=off -o setuid=off zroot/rootfs/var/mail
rescue-bsd# zfs create -o exec=off -o setuid=off zroot/rootfs/var/run
rescue-bsd# zfs create -o compression=lzjb -o exec=on -o setuid=off zroot/rootfs/var/tmp
rescue-bsd# chmod 1777 /mnt/var/tmp


Note that I've enabled compression on some datasets, as in the FreeBSD wiki, but on a low-end box with little CPU power, I advise turning it off.
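If you do this often, the dataset list can be kept as a table and replayed. This is a sketch, not the wiki's own tooling: it simply re-encodes the zfs create commands above, assumes the pool zroot is already imported with altroot=/mnt and that zroot/rootfs exists, and uses a hypothetical `RUN` variable (`RUN=echo` for a dry run). Dropping compression, as advised above, is just a matter of editing the table.

```sh
#!/bin/sh
# Sketch: recreate the dataset layout above from a table.
# Assumes pool $POOL is imported with altroot $ALTROOT and that
# $POOL/rootfs already exists. Set RUN=echo for a dry run.
POOL=${POOL:-zroot}
ALTROOT=${ALTROOT:-/mnt}

run() {
    ${RUN:-} "$@"
}

create_datasets() {
    # Each line: dataset (relative to $POOL/rootfs) and comma-separated
    # properties, or "-" for none. Parents must come before children.
    while read -r ds props; do
        set --
        if [ "$props" != "-" ]; then
            old_ifs=$IFS
            IFS=,
            for p in $props; do
                set -- "$@" -o "$p"
            done
            IFS=$old_ifs
        fi
        run zfs create "$@" "$POOL/rootfs/$ds" || return 1
    done <<'EOF'
tmp compression=on,exec=on,setuid=off
usr -
usr/home -
usr/ports compression=lzjb,setuid=off
usr/ports/distfiles compression=off,exec=off,setuid=off
usr/ports/packages compression=off,exec=off,setuid=off
usr/src compression=lzjb,exec=off,setuid=off
var -
var/crash compression=lzjb,exec=off,setuid=off
var/db exec=off,setuid=off
var/db/pkg compression=lzjb,exec=on,setuid=off
var/empty exec=off,setuid=off
var/log compression=lzjb,exec=off,setuid=off
var/mail compression=gzip,exec=off,setuid=off
var/run exec=off,setuid=off
var/tmp compression=lzjb,exec=on,setuid=off
EOF
    # Sticky, world-writable temporary directories and the /home link.
    run chmod 1777 "$ALTROOT/tmp" "$ALTROOT/var/tmp" &&
    run ln -s /usr/home "$ALTROOT/home"
}
```

The readonly property on var/empty still has to be set after the data copy, as in the next step.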

Now let's copy our data to the ZFS partition.

rescue-bsd# mount /dev/ad0s1b /media
rescue-bsd# cd /media
rescue-bsd# find . | cpio -dump /mnt/
rescue-bsd# cd
rescue-bsd# umount /media
rescue-bsd# zfs set readonly=on zroot/rootfs/var/empty


Note that cpio(1) does not handle file flags set by chflags(8). Your system will be able to boot, but some security seatbelts won't be in place until you perform an installworld.

Let's create the swap partition in place of the transient UFS filesystem:

rescue-bsd# gpart show ad0s1
=> 0 488397105 ad0s1 BSD (233G)
0 486297600 1 freebsd-zfs (232G)
486297600 2099505 2 freebsd-ufs (1.0G)

rescue-bsd# gpart delete -i 2 ad0s1
ad0s1b deleted
rescue-bsd# gpart add -t freebsd-swap ad0s1
ad0s1b added
rescue-bsd# gpart show ad0s1
=> 0 488397105 ad0s1 BSD (233G)
0 486297600 1 freebsd-zfs (232G)
486297600 2099505 2 freebsd-swap (1.0G)


Now we need to configure the system to be able to boot from ZFS:


rescue-bsd# echo 'zfs_load="YES"' > /mnt/boot/loader.conf
rescue-bsd# echo 'vfs.root.mountfrom="zfs:zroot/rootfs"' >> /mnt/boot/loader.conf
rescue-bsd# cp /tmp/zpool.cache /mnt/boot/zfs/
rescue-bsd# vi /mnt/etc/fstab
[...]
rescue-bsd# cat /mnt/etc/fstab
# Device Mountpoint FStype Options Dump Pass#
/dev/ad0s1b none swap sw 0 0
proc /proc procfs rw 0 0


Et voilà! You can reboot your server (don't forget to disable netbooting in the OVH web interface).

Thursday, June 30, 2011

Configuring FreeBSD with dual console

This post is short as I intend to use it more as a reminder than a full-fledged article.

As an introduction for the uninitiated reader, here is an excerpt from the boot(8) manpage:


By default, a three-stage bootstrap is employed, and control is automatically passed from the boot blocks (bootstrap stages one and two) to a separate third-stage bootstrap program, loader(8). This third stage provides more sophisticated control over the booting process than it is possible to achieve in the boot blocks, which are constrained by occupying limited fixed space on a given disk or slice.


In summary: boot0 -> boot2 -> loader -> kernel

The first stage (boot0) cannot be configured, as its code has to fit in 512 bytes. It simply uses the default system console (the screen).

However the following things can be configured more or less independently:
- boot2 (stage 2);
- loader(8) (stage 3);
- the kernel;
- login(8).

Configuring boot2



boot2 is configured through /boot.config. This file contains the flags documented in boot(8), as though they were typed at the boot2 prompt. Therefore, if you want to see boot2 output on both your screen and the serial console, you have to put "-D" in it.


shell# cat /boot.config
-D


Configuring loader(8)



loader(8) is configured through /boot/loader.conf. The console variable can be set to "vidconsole", "comconsole", or "vidconsole,comconsole" to have both ("comconsole,vidconsole" works too; we will see the difference later).


shell# grep ^console /boot/loader.conf
console="vidconsole,comconsole"


Configuring the kernel



/boot/loader.conf also contains variables that will set kenv variables, which will define the kernel behaviour. See this comment in /boot/defaults/loader.conf:


##############################################################
### Kernel settings ########################################
##############################################################

# The following boot_ variables are enabled by setting them to any value.
# Their presence in the kernel environment (see kenv(1)) has the same
# effect as setting the given boot flag (see boot(8)).

#boot_askname="" # -a: Prompt the user for the name of the root device
#boot_cdrom="" # -C: Attempt to mount root file system from CD-ROM
#boot_ddb="" # -d: Instructs the kernel to start in the DDB debugger
#boot_dfltroot="" # -r: Use the statically configured root file system
#boot_gdb="" # -g: Selects gdb-remote mode for the kernel debugger
#boot_multicons="" # -D: Use multiple consoles
#boot_mute="" # -m: Mute the console
#boot_pause="" # -p: Pause after each line during device probing
#boot_serial="" # -h: Use serial console
#boot_single="" # -s: Start system in single-user mode
#boot_verbose="" # -v: Causes extra debugging information to be printed
#init_path="/sbin/init:/sbin/oinit:/sbin/init.bak:/rescue/init:/stand/sysinstall"
# Sets the list of init candidates
#init_shell="/bin/sh" # The shell binary used by init(8).
#init_script="" # Initial script to run by init(8) before chrooting.
#init_chroot="" # Directory for init(8) to chroot into.



So basically, the kernel defaults to using the screen only, but you can override this by setting the boot_multicons variable:


shell# grep ^boot_multicons /boot/loader.conf
boot_multicons="YES"


How the whole stuff works



Actually, when you configure one stage, subsequent stages use the same settings unless configured otherwise. So in the end, you just have to configure boot2.

Userland output



Contrary to the other parts, userland boot output can only be sent to one device at a time. Even with the above settings, the userland boot output will only appear on the screen.

Actually, the kernel picks the first entry of the console kenv variable to send userland output to. So if you are rarely in front of the screen and prefer to see the userland boot output on the serial console:


shell# grep ^console /boot/loader.conf
console="comconsole,vidconsole"


Configuring login(8)



Not seeing the userland boot output on one console or the other doesn't mean it is unusable. FreeBSD is configured by default to spawn a login: prompt on the screen. You can easily configure it to spawn another one on the serial console, as explained in this chapter of the Handbook:


shell# grep ttyu0 /etc/ttys
ttyu0 "/usr/libexec/getty std.9600" dialup on secure
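To apply the boot2, loader(8) and kernel settings above in one go, for instance on a freshly installed tree, something like the following sketch could be used. The function names are made up, the target root directory is an assumption ("" for the running system, /mnt for a new install), /boot.config is overwritten, variables already present in loader.conf are left untouched with a warning, and the ttyu0 line in /etc/ttys still has to be edited by hand.

```sh
#!/bin/sh
# Sketch: apply the dual-console settings above to the FreeBSD tree
# rooted at $1 ("" for the running system). /boot.config is overwritten;
# variables already set in loader.conf are kept as they are.

# Append var="value" to a loader.conf unless var is already set there.
set_loader_var() {
    file=$1
    var=$2
    value=$3
    if grep -q "^${var}=" "$file" 2>/dev/null; then
        echo "warning: $var already set in $file, left unchanged" >&2
        return 0
    fi
    printf '%s="%s"\n' "$var" "$value" >> "$file"
}

setup_dual_console() {
    root=$1
    consoles=${2:-vidconsole,comconsole}
    # boot2: -D enables the dual console.
    printf '%s\n' -D > "$root/boot.config" || return 1
    # loader(8): the first console listed also gets the userland output.
    set_loader_var "$root/boot/loader.conf" console "$consoles" &&
    # kernel: use multiple consoles (same effect as the -D boot flag).
    set_loader_var "$root/boot/loader.conf" boot_multicons YES
}
```

For instance, `setup_dual_console /mnt comconsole,vidconsole` prepares a new install whose userland boot output goes to the serial console.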



That's all.

Monday, June 23, 2008

Chicken and egg problem with Propolice in runtime linker/loader

Some background first: back in 2006, I was frustrated because FreeBSD was lagging somewhat behind other open-source operating systems in terms of integrated security features. One of them is a GCC extension originally named Propolice, or SSP for Stack Smashing Protection. As its name suggests, it protects (very efficiently) against stack-based buffer overflows. Propolice was originally developed by Hiroaki Etoh at IBM as an external patch for gcc-2.95.3 and then gcc-3.4.4, but it has since been included in mainline GCC, starting with gcc-4.1. The patch to integrate Propolice into FreeBSD has been available on my website for more than two years, but back then FreeBSD only provided gcc-3.4.4, and heavily patching contributed software is ruled out by policy, so it couldn't be committed to FreeBSD-6. I missed the FreeBSD-7 window for various reasons, and now I'm working on getting it committed to FreeBSD-8 (aka CURRENT).

How does Propolice work? The compiler identifies functions that might be vulnerable (containing a stack based buffer) and during their prologue, pushes a one-word canary between the return address stored in the stack and the local variables. In the function's epilogue, the canary is checked against its original value and if it has changed then a buffer overflow occurred and the program is aborted. The canary is initially in the BSS segment but is initialized to a random value by a function called during the program startup (namely, a constructor). Both the canary and the initializer function are provided in FreeBSD's libc.

When I sent the patch for review back in April, Antoine Brodin noticed that when buildworld is performed with -fstack-protector-all (which makes GCC protect all functions instead of only those containing a local buffer), it breaks the whole system. There were actually various problems, such as the initializer function being protected itself: during its prologue the canary was equal to zero, but by the epilogue it had been set to a random value, so obviously the saved value didn't match... This problem was resolved quickly. The nasty problem lay in the runtime loader (aka rtld-elf): once it was installed, all programs would fail with SIGSEGV.

When a dynamically-linked program is run, the kernel always transfers control to rtld behind the scenes, instead of to the actual program. The purpose is to do the runtime linking of the libraries needed by the program, which includes resolving symbols and performing relocations, before actually transferring control to it. So I recompiled rtld without SSP, but it was still crashing. I narrowed down the segfault to a call to mmap(2), which turned out to be the first call into libc, against which rtld was statically linked. One of the very first things rtld has to do is to relocate itself, mainly to be able to access global data, which are addressed through the GOT (Global Offset Table). This was the very problem. Given that all libc functions were protected with Propolice, mmap(2)'s prologue tried to push the canary, which is accessed through the «__stack_chk_guard» global symbol. This means it used a pointer from the GOT, which had not been initialized at this point.

As an additional note (and a reminder for me ;p), I came to think that the problem could also arise in the canary initializer, which lives in rtld's .init section. After some thinking, I realized that the usual .init and .fini sections are handled by rtld itself, so I think rtld's own ones are actually never run.

Obviously rtld must be compiled without SSP. As a temporary solution, libc is not allowed to be compiled with -fstack-protector-all. I think a better solution would be to create a librtld containing the symbols required by rtld, compiled without SSP.

Sharp minds will have understood that if the original patch worked without -fstack-protector-all, it was only by chance: none of the functions called while relocating rtld's GOT entries had been selected by GCC for protection.

Thursday, February 21, 2008

FreeBSD textdump(4) is awesome

FreeBSD has had a reputation for being rock solid for a long time. One of the reasons for this is that FreeBSD provides a great number of powerful debugging tools.

In particular, when your kernel panics, you have three options:

  • Live debug with ddb(4), but this is not always possible if the box has to be back up quickly.

  • Dump the memory to perform post-mortem analysis.

  • Do nothing and pray that the panic won't happen again too soon.



Memory dumps use the swap device. This is perfectly fine because once your OS has crashed, you won't do anything with the data in the swap anyway. On the next reboot, savecore(8) checks whether the swap partition contains a memory dump and copies it into a file in /var/crash.

In the beginning, only full memory dumps were possible. In this case, if you have 1 GB of RAM, you need a swap partition of at least 1 GB too, and the file in /var/crash is just as large. This worked well, but given that most users are not kernel developers, kernel dumps are usually useless unless they are transmitted to the right folks. And a 1 GB file is cumbersome.

In April 2006, Peter Wemm introduced minidumps. They are very similar to full dumps except that, from what I've understood, only the kernel memory is dumped. Typically, on my laptop with 1 GB of RAM, minidumps took about 150 MB. The problem, while lessened, was still there though.

A couple of weeks ago, Robert Watson committed a new feature called textdump(4) to FreeBSD 8.0-CURRENT. Basically, this is made possible by two new features of ddb(4):

  • It is possible to define "scripts" (no loops or conditions, only sequences of commands); certain special names correspond to events.

  • ddb(4) output can be captured in an internal buffer and dumped in place of the memory.



In this post, Robert Watson gives a lot of information about textdumps. I strongly advise you to read it. The very important thing is that most panics reported by users can be solved with a backtrace and a couple of DDB commands. This is precisely what this feature provides. Moreover, textdumps rarely exceed one megabyte, which is far more convenient than dumps or minidumps, and they can easily be sent by e-mail.

Also, users running FreeBSD as a desktop obviously run X.org. When a panic arises, it is not possible to switch back to console mode, so ddb(4) is not accessible. If you've asked your kernel to drop to ddb(4) on panic, as I did, the kernel dump is not performed automatically and you're stuck. Textdumps take this thorn out of your foot.

Now let's see how to use them. FreeBSD configures (mini)dumps for you automatically, and textdumps can be set up with a single command:

root# ddb script kdb.enter.panic="textdump set; capture on; show pcpu;trace;show locks;ps;alltrace;show alllocks;show lockedvnods; call doadump"

"kdb.enter.panic" is a script name with a special meaning: as its name suggests, it is automatically executed on panic. The first command, "textdump set", forces the next dump to be the captured ddb(4) output instead of the traditional memory dump. The second one, "capture on"... enables capturing the output of the commands. Next comes a bunch of commonly used ddb(4) commands. The final command, "call doadump", performs the actual dump. If you want to reboot automatically, you can add the "reset" command afterwards.

As far as I know, there is no configuration sugar to enable this automatically at boot time, so for now I stuck it in /etc/rc.local.
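For the record, an /etc/rc.local fragment along those lines might look like the sketch below. It assumes ddb(8) is in the PATH and simply does nothing otherwise; the `enable_textdump` function and the `DDB` override variable are illustrative, not part of FreeBSD. Append "; reset" to the script string if you want the box to reboot on its own after the dump.

```sh
#!/bin/sh
# /etc/rc.local fragment (sketch): register the textdump panic script at boot.
# DDB may be overridden, e.g. DDB=echo to print the command instead.
enable_textdump() {
    ddb_cmd=${DDB:-ddb}
    # Do nothing on systems where ddb(8) is not available.
    command -v "$ddb_cmd" >/dev/null 2>&1 || return 0
    "$ddb_cmd" script kdb.enter.panic="textdump set; capture on; show pcpu;trace;show locks;ps;alltrace;show alllocks;show lockedvnods; call doadump"
}

enable_textdump
```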