Technicals writings from JLH

Tuesday, May 6, 2014

Netboot OpenBSD on a Soekris

This post is a reminder for me on how to do this. This is the second time I spend two entire evenings to achieve this, and I seem to remember both time I succeeded using different methods. This time I decided to dump what I have in mind because in 2014, this is just silly how backward and complicated it is :). By the way, I know some people more experienced than me in OpenBSD who may read this, feel free to comment and tell me what I'm doing wrong to end up with something so convoluted :).

I'm using a Debian machine as a server. There is a direct Ethernet cable from the Soekris to the server (10.0.0.1), nothing else is on the network.

Download the OpenBSD release and extract it to /openbsd.
Install the following packages on the Debian server:

udhcpd, because it's way simpler than ISC dhcpd
atftpd, do NOT use the tftpd package, which is the implementation from NetKit, this is really broken and althrough supposed to be very simple I wasted hours on this
rarpd
bootparamd
nfs-kernel-server

Here are the relevant bits for /etc/udhcpd.conf:

start     10.0.0.100
end       10.0.0.110
interface eth2
siaddr    10.0.0.1
sname     debian
bootfile  /tftpboot/pxeboot
option    dns      8.8.8.8 8.8.4.4
option    router   10.0.0.1
option    domain   domain.local
option    subnet   255.255.255.0
option    lease    864000
option    rootpath /openbsd

Start it:
```
udhcpd -f /etc/udhcpd.conf
```

atftpd probably added a similar line to /etc/inetd.conf; keep everything as-is and change the serving path (last argument):

tftp            dgram   udp4    wait    nobody /usr/sbin/tcpd /usr/sbin/in.tftpd --tftpd-timeout 300 --retry-timeout 5 --mcast-port 1758 --mcast-addr 239.239.239.0-255 --mcast-ttl 1 --maxthread 100 --verbose=5 /openbsd

Start or reload inetd.
Now I advise to start
a
```
tcpdump -ev
```
in one terminal t
o see what's going on on the wire, and
```
tail -f /var/log/daemon.log
```
in an other. You can give a try to the Soekris: provided you have a serial cable on it (otherwise I don't even know why you're reading this), hit Ctrl-P at the boot and type:
```
boot f0
```
This won't work, but note the Soekris MAC address and the IP address which has been distributed to it.
Now fill up /etc/ethers for rarpd (that's oldschool, isn't it? :)):
```
00:00:24:14:11:80   10.0.0.102
```
Now let's configure bootparamd by first telling the Soekris name in /etc/hosts:
```
10.0.0.102   soekris
```
and then giving NFS root in /etc/bootparams.conf (!)
```
soekris root=10.0.0.1:/openbsd
```
Export /openbsd in NFS in /etc/exports:
```
/openbsd   10.0.0.0/24(rw,no_root_squash)
```
and reload the list of exported filesystems:
```
/etc/init.d/nfs-kernel-server restart
```
Now configure a bit OpenBSD before booting it:

/openbsd/etc/boot.conf (set the console to see the kernel boot and tell pxeboot where to find the kernel):
```
stty com0 19200
set tty com0
boot tftp:/tftpboot/bsd
```
/openbsd/etc/ttys (enable getty on the console and set the correct speed):
```
console "/usr/libexec/getty std.19200"  vt220   on  secure
```

Verify that everything is running: inetd, udhcpd, rarpd, all RPC daemons (rpcbind should be start before all of them), including rpc.bootparamd.
You should now be able to boot, with root having no password!

Thursday, February 21, 2013

Poor man's browser sandboxing

Last update: Sat Feb 23 15:14:13 CET 2013

Nowadays I use my browser most of the time, as you probably do as well. And like me, you are probably quite annoyed to run this big software blob and its unavoidable bugs against so many websites. Yes, browsers are the prime attack vector now.

So why not sandbox it as another user so as to insulate it from you sensitive data (ssh keys/agent, files, ...)? You can do this in about 15 minutes. This is more a quick reminder for me than a full-fledged blog post, so the explanations and commands are a little terse. Adapt them to fit your need.

$ sudo adduser -m browser
$ sudo mkdir ~browser/.ssh
$ sudo cp ~/.ssh/id_rsa.pub ~browser/.ssh/authorized_keys

# You should shut down your browsers before doing this.
$ sudo mkdir ~browser/.config
$ sudo cp -Rp ~/.config/google-chrome ~browser/.config/
$ sudo cp -Rp ~/.mozilla ~browser/
$ sudo chown -R browser ~browser

$ cat > bin/runasbrowser.sh << EOF
#!/bin/sh
exec ssh -Xf browser@localhost "$@"
EOF
$ chmod +x bin/runasbrowser.sh

# Be sure that your ~/bin belongs to your $PATH.
$ runasbrowser Xephyr :1
$ runasbrowser google-chrome

Notes:

You will probably need to put user "browser" in a group to allow him to play music :).
I've got the idea of this here; you can see that the author did much more things than me to set this up. I'm not sure why it works for me with so few steps.

Tuesday, December 11, 2012

Producers/consumers timely queue in Python

For one of my projects (Push2mob), I had to implement a multiple producers / multiple consumers timely queue, that is a priority queue where the priority is a timestamp at which the item should be delivered. After multiple more or less successful attempts I've finally come up with an implementation that was efficient and neat. I've posted it on StackOverflow to ask for some review, and got a good one.

Here is the final result. I would be glad if it could help someone.

# Copyright (c) 2012, Jeremie Le Hen 
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met: 
#
# 1. Redistributions of source code must retain the above copyright notice, this
#    list of conditions and the following disclaimer. 
# 2. Redistributions in binary form must reproduce the above copyright notice,
#    this list of conditions and the following disclaimer in the documentation
#    and/or other materials provided with the distribution. 
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import collections
import heapq
import threading
import time

class TimelyQueue:
    """
    Implements a similar but stripped down interface of Queue which
    delivers items on time only.
    """

    def __init__(self, resolution=5):
        """
        `resolution' is an optimization to avoid wasting CPU cycles when
        something is about to happen in less than X ms.
        """
        self.timerthread = threading.Thread(target=self.__timer)
        self.timerthread.daemon = True
        self.resolution = float(resolution) / 1000
        self.queue = []
        self.triggered = collections.deque()
        self.putcond = threading.Condition()
        self.getcond = threading.Condition()
        # Optimization to avoid waking the thread uselessly.
        self.putwaketime = 0
        self.terminating = False
        self.timerthread.start()

    def put(self, when, item):
        """
        `when' is a Unix time from Epoch.
        """
        with self.putcond:
            heapq.heappush(self.queue, (when, item))
            if when < self.putwaketime or self.putwaketime == 0:
                self.putcond.notify()

    def get(self, timeout=None):
        """
        Timely return the next object on the queue.
        """
        with self.getcond:
            if len(self.triggered) > 0:
                when, item = self.triggered.popleft()
                return item
                self.getcond.wait(timeout)
            try:
                when, item = self.triggered.popleft()
            except IndexError:
                return None
            return item

    def qsize(self):
        """
        Self explanatory.
        """
        with self.putcond:
            return len(self.queue)

    def terminate(self):
        """
        Request the embedded thread to terminate.
        """
        with self.putcond:
            self.terminating = True
            self.putcond.notifyAll()

    def __timer(self):
        with self.putcond:
            maxwait = None
            while True:
                curtime = time.time()
                try:
                    when, item = self.queue[0]
                    maxwait = when - curtime
                    self.putwaketime = when
                except IndexError:
                    maxwait = None
                    self.putwaketime = 0
                self.putcond.wait(maxwait)
                if self.terminating:
                    return

                curtime = time.time()
                while True:
                    # Don't dequeue now, we are not sure to use it yet.
                    try:
                        when, item = self.queue[0]
                    except IndexError:
                        break
                    if when > curtime + self.resolution:
                        break

                    self.triggered.append(heapq.heappop(self.queue))
                if len(self.triggered) > 0:
                    with self.getcond:
                        self.getcond.notify(len(self.triggered))


if __name__ == "__main__":
    q = TimelyQueue()
    N = 100000
    t0 = time.time()
    for i in range(N):
        q.put(time.time() + 2, i)
    dt = time.time() - t0
    print "put done in %.3fs (%.2f put/sec)" % (dt, N / dt)
    t0 = time.time()
    i = 0
    while i < N:
        a = q.get(3)
        if i == 0:
            dt = time.time() - t0
            print "start get after %.3fs" % dt
            t0 = time.time()
        i += 1
    dt = time.time() - t0
    print "get done in %.3fs (%.2f get/sec)" % (dt, N / dt)
    q.terminate()
    # Give change to the thread to exit properly, otherwise we may get
    # a stray interpreter exception.
    time.sleep(0.1)

Friday, June 15, 2012

Using raw devices with VirtualBox run as user

Today I needed to run a virtual machine on my laptop. It is not very powerful so I wanted to avoid as much overhead as possible: skipping the host's VFS/filesystem layer is fairly easy, you just have to give access to a raw partition the your virtualization software instead of a file. After all, both are just seen as a big array of bytes. My hard drive is under LVM, so I created a dedicated logical volume that I wanted to be writable by me given I planned to run VirtualBox as a user.

I knew I could do this with udev(8) but skimming through the documentation and fumbling its rules would have been too long for my available time, so I tried Google with no luck and finally asked on IRC, where I found that someone already did this for the same reason.

The configuration line is quite easy:

root@r2d2# cat /etc/udev/rules.d/99-my.rules
SUBSYSTEM=="block",KERNEL=="dm-*",ACTION=="add|change",ENV{DM_NAME}=="*-vbox*",GROUP="jlh"

This rules basically tells that for any add or change of a block device named "dm-*" and matching "*-vbox*", change its group to "jlh" (note that == and = are different, as in many programming languages). One interesting thing to note is that ENV{DM_NAME}=="*-vbox*" is an helper environment variable that is set by udev(8) standards rules. Those stand in /lib/udev/rules/ on Debian and udev(8) merges the content of this directory with the standard configuration directory /etc/udev/rules.d/. Rules are applied by filename order, so be careful to be the last one. Initially I used "90-my.rules" but there is a rule in "91-permissions.rules" that overrode mine. You can easily debug by running udevd --debug, although the output is quite verbose.

The next step is to create a VMDK file for VirtualBox that will point to the raw device and then attach is to your VM's storage controller. This is quite well documented in the manual (Using a raw host hard disk from a guest).

Basically:

jlh@r2d2$ VBoxManage internalcommands createrawvmdk \
    -filename VirtualBox VMs/vm1/data.vmdk \
    -rawdisk /dev/mapper/vg0-vbox_vm1
jlh@r2d2$ VBoxManage showvminfo vm1 | grep 'Storage Controller Name'
Storage Controller Name (0):            IDE Controller
jlh@r2d2$ VBoxManage storageattach vm1 --storagectl "IDE Controller" --port 0 --device 0 --type hdd --medium /home/jlh/VirtualBox\ VMs/FreeBSD/data.vmdtk

Your mileage may vary if, for example, you have a different storage controller name, different port or device. The full "showvminfo" output will tell you which slot is available. Another solution is to add another storage controller, although VirtualBox will not permit you to have multiple IDE controller. You can add a S-ATA controller which allows you to plug up to 30 devices:

jlh@r2d2$ VBoxManage storagectl vm1 --name "SATA Controller" --add sata --controller IntelAHCI --bootable on
jlh@r2d2$ VBoxManage storageattach vm1 --storagectl "SATA Controller" --port 0 --device 0 --type hdd --medium /home/jlh/VirtualBox\ VMs/boot.vmdk
jlh@r2d2$ VBoxManage storageattach vm1 --storagectl "IDE Controller" --port 1 --device 0 --type hdd --medium /home/jlh/VirtualBox\ VMs/root.vmdk

VitualBox and VboxManage are pretty well documented.

Wednesday, March 28, 2012

A Varnish threads story

Varnish is a (now) well-known HTTP caching reverse-proxy. It has been written primarily by Poul-Henning Kamp, a famous FreeBSD developer. Varnish is very BSDish: simple, versatile and powerful.

(Yet, configuring it may be pretty tough because HTTP is a complex protocol with regard to caching (RFC 2616 mentions client-side proxies but not server-side ones). Besides, applications living on top of it are often written without any caching consideration in mind. For instance by default Varnish doesn't cache response from requests containing cookies, not it caches responses with a Set-Cookie header, for obvious reasons. Unfortunately PHP applications make heavy use of the PHPSESSID cookie simply because the session_start() function, which is part of the PHP library, is very handy for developers.

Varnish uses a pool of threads to serve requests, with a configurable minimum and maximum values as well as a timeout value (with the -w command-line option). Much like what Apache does with processes when used with the MPM prefork module. Additionally, Varnish enforces a configurable delay between thread creation (parameter's name is thread_pool_add_delay, you can configure it with the -p command-line option).

For some reason, one Varnish instance on a preproduction server here was configured with silly values regarding thread limits: only one thread at minimum. Given the server was often unused, threads timed out and were removed down to one. The problem was that when a developer wanted to test the websites, there was only one thread available and the aforementioned delay between thread creation prevented from spawning them all at one. Albeit being a very powerful server, the website was felt very sluggish.

It took me some time to find out this problem. When I modified the configuration, the website was really, really fast.

Tuesday, March 6, 2012

Portmaster options combo to upgrade FreeBSD ports

Some years ago I was using the famous Portupgrade to maintain my ports. This software is mature, very powerful and easy to use. Unfortunately its dependency on Ruby makes it really cumbersome, especially because I have many jails.

Therefore when Doug Barton began Portmaster, which is written is shell and does more or less the same thing (well, actually less, but I can live with it), I was quite eager to use it. One thing I didn't like from the beginning with Postmaster was that it is not able to work alone: it is constantly asking things. Of course there are options to disable this, but this leads to me the second problem: they are not intuitive! (at least for me...)

After some struggle, I finally managed to find the options I always want to use and I'm writing them as a reminder and in the hope to help someone else in the same hassle:

# portmaster -dBGm BATCH=1 --no-confirm --delete-packages -a

Here are the details:

I'm using portconf to configure the ports' build knobs, so I don't want to run the configuration or to be asked something about it. Just use the defaults unless I told otherwise: -G -m BATCH=1;
Don't create a backup package, I'm not running any financial application: -B;
Don't ask me if the distfiles must be cleaned, just do it: -d;
Don't ask me if I really want to upgrade my ports, I already executed the command proving it: --no-confirm;
Remove packages once installed: --delete-packages;
Upgrade everything: -a, but you might not want to ugprade everything at once so you can replace this with one or more port name.

Wednesday, November 9, 2011

Apache mod_rewrite evilness (with dynamic vhosts and .htaccess rewrite rules)

At my new $job, we have a SVN repository for the websites we are maintaining. We devised a workflow to work with it: each developer has one or more branches for himself. The merge of their features is done in the trunk. Once everything seems to work, we merge into the "preproduction" branch and finally in the "production" branch.

On the developement web server, I wanted them to be able access every branch with their web browser. Initially, there was a "svn.mywebsite.com" virtual host and each branch was accessible through an URL-path within it. Unfortuntaly, for "historical" reason, the web site doesn't work correctly if set in an URL sub-directory (and we are currently writing a new version of this website, so we actually don't want to spend time fixing it). I am therefore doomed to create a virtual host for each SVN trunk/tag/branch.

Here is the relevant part of the initial configuration I wrote:


<VirtualHost *:80>
    ServerName   svn.mywebsite.com
    ServerAlias *.svn.mywebsite.com

    DocumentRoot /var/empty

    RewriteEngine on

    RewriteCond %{HTTP_HOST} ^trunk\.svn\.mywebsite\.com$
    RewriteRule $(.*)        /home/www-data/svn/project/trunk$1       [L]

    RewriteCond %{HTTP_HOST} ^([^.]+).branches\.svn\.mywebsite\.com$
    RewriteRule $(.*)        /home/www-data/svn/project/branches%1$1  [L]

    RewriteCond %{HTTP_HOST} ^([^.]+).tags\.svn\.mywebsite\.com$
    RewriteRule $(.*)        /home/www-data/svn/project/tags%1$1      [L]

</VirtualHost>

So far it's easy and it would have worked if there wasn't the following RewriteRule in the .htaccess at the root of the project:


RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

The problem with this rule is that if we request a "virtual URL" (which does not match a physical file in the hierarchy), index.php is called with the original URL in the query string, which results in an internal redirect within Apache.

I save you from the RewriteLog, but let's say you try: http://trunk.svn.mywebsite.com/virtual-url.

Initially the ${REQUEST_URI} is "/virtual-url". The vhost rewrite rules are applied, which redirect to the filesystem path: /home/www-data/svn/project/trunk/virtual-url.

Then we reach the per-directory (.htaccess) rewrite rules which, given the file doesn't exist, redirect to /home/www-data/svn/project/trunk/index.php with the following query string q=virtual-url.

Here is the first trap: an internal redirect is done within Apache, which restarts the rewrite rules evaluation from the beginning, with ${REQUEST_URI} set to /home/www-data/svn/project/trunk/index.php, while the ${HTTP_HOST} is still the same. So the directory will be prepended twice if we do not put a safeguard: basically checking that $REQUEST_URI doesn't contain /home/www-data/svn/project/.

But the true evilness is here: we cannot rewrite to a full filesystem path in a per-directory rewrite rule. The subsequent internal redirect will invariably think that this is an URL path, that is it will try to serve a page as if you had requested: http://trunk.svn.mywebsite.com/home/www-data/svn/project/trunk/index.php. Because of the safeguard above, the dynamic vhost magic will not apply, and Apache will try to reach this file from the vhost's DocumentRoot and you will get a 404.

The workaround for this is trick Apache into thinking the content of ${REQUEST_URI} is a full filesystem path if the latter looks like a filesystem path :-). Contrary to the per-directory rewrite rules, the vhost rewrite rules are able to redirect to a full filesystem path. So just match the whole content and redirect to it.


<VirtualHost *:80>
    ServerName   svn.mywebsite.com
    ServerAlias *.svn.mywebsite.com

    DocumentRoot /var/empty

    RewriteEngine on

    RewriteRule ^(/home/www-data/svn/project/.*)       $1             [L]

    RewriteCond %{HTTP_HOST} ^trunk\.svn\.mywebsite\.com$
    RewriteRule $(.*)        /home/www-data/svn/project/trunk$1       [L]

    RewriteCond %{HTTP_HOST} ^([^.]+).branches\.svn\.mywebsite\.com$
    RewriteRule $(.*)        /home/www-data/svn/project/branches%1$1  [L]

    RewriteCond %{HTTP_HOST} ^([^.]+).tags\.svn\.mywebsite\.com$
    RewriteRule $(.*)        /home/www-data/svn/project/tags%1$1      [L]

</VirtualHost>

The first rewrite rule matches a ${REQUEST_URI} containing the a filesystem path. This is not exactly a rewrite, this is just a trick to trigger the mod_rewrite evaluation of the substitution.