Tuesday, December 11, 2012

Producers/consumers timely queue in Python

For one of my projects (Push2mob), I had to implement a multiple producers / multiple consumers timely queue, that is a priority queue where the priority is a timestamp at which the item should be delivered.  After multiple more or less successful attempts I've finally come up with an implementation that was efficient and neat.  I've posted it on StackOverflow to ask for some review, and got a good one.

Here is the final result.  I would be glad if it could help someone.

# Copyright (c) 2012, Jeremie Le Hen 
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met: 
#
# 1. Redistributions of source code must retain the above copyright notice, this
#    list of conditions and the following disclaimer. 
# 2. Redistributions in binary form must reproduce the above copyright notice,
#    this list of conditions and the following disclaimer in the documentation
#    and/or other materials provided with the distribution. 
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import collections
import heapq
import threading
import time

class TimelyQueue:
    """
    Implements a similar but stripped down interface of Queue which
    delivers items on time only.
    """

    def __init__(self, resolution=5):
        """
        `resolution' is an optimization to avoid wasting CPU cycles when
        something is about to happen in less than X ms.
        """
        self.timerthread = threading.Thread(target=self.__timer)
        self.timerthread.daemon = True
        self.resolution = float(resolution) / 1000
        self.queue = []
        self.triggered = collections.deque()
        self.putcond = threading.Condition()
        self.getcond = threading.Condition()
        # Optimization to avoid waking the thread uselessly.
        self.putwaketime = 0
        self.terminating = False
        self.timerthread.start()

    def put(self, when, item):
        """
        `when' is a Unix time from Epoch.
        """
        with self.putcond:
            heapq.heappush(self.queue, (when, item))
            if when < self.putwaketime or self.putwaketime == 0:
                self.putcond.notify()

    def get(self, timeout=None):
        """
        Timely return the next object on the queue.
        """
        with self.getcond:
            if len(self.triggered) > 0:
                when, item = self.triggered.popleft()
                return item
                self.getcond.wait(timeout)
            try:
                when, item = self.triggered.popleft()
            except IndexError:
                return None
            return item

    def qsize(self):
        """
        Self explanatory.
        """
        with self.putcond:
            return len(self.queue)

    def terminate(self):
        """
        Request the embedded thread to terminate.
        """
        with self.putcond:
            self.terminating = True
            self.putcond.notifyAll()

    def __timer(self):
        with self.putcond:
            maxwait = None
            while True:
                curtime = time.time()
                try:
                    when, item = self.queue[0]
                    maxwait = when - curtime
                    self.putwaketime = when
                except IndexError:
                    maxwait = None
                    self.putwaketime = 0
                self.putcond.wait(maxwait)
                if self.terminating:
                    return

                curtime = time.time()
                while True:
                    # Don't dequeue now, we are not sure to use it yet.
                    try:
                        when, item = self.queue[0]
                    except IndexError:
                        break
                    if when > curtime + self.resolution:
                        break

                    self.triggered.append(heapq.heappop(self.queue))
                if len(self.triggered) > 0:
                    with self.getcond:
                        self.getcond.notify(len(self.triggered))


if __name__ == "__main__":
    q = TimelyQueue()
    N = 100000
    t0 = time.time()
    for i in range(N):
        q.put(time.time() + 2, i)
    dt = time.time() - t0
    print "put done in %.3fs (%.2f put/sec)" % (dt, N / dt)
    t0 = time.time()
    i = 0
    while i < N:
        a = q.get(3)
        if i == 0:
            dt = time.time() - t0
            print "start get after %.3fs" % dt
            t0 = time.time()
        i += 1
    dt = time.time() - t0
    print "get done in %.3fs (%.2f get/sec)" % (dt, N / dt)
    q.terminate()
    # Give change to the thread to exit properly, otherwise we may get
    # a stray interpreter exception.
    time.sleep(0.1)

Friday, June 15, 2012

Using raw devices with VirtualBox run as user

Today I needed to run a virtual machine on my laptop. It is not very powerful so I wanted to avoid as much overhead as possible: skipping the host's VFS/filesystem layer is fairly easy, you just have to give access to a raw partition the your virtualization software instead of a file. After all, both are just seen as a big array of bytes. My hard drive is under LVM, so I created a dedicated logical volume that I wanted to be writable by me given I planned to run VirtualBox as a user.

I knew I could do this with udev(8) but skimming through the documentation and fumbling its rules would have been too long for my available time, so I tried Google with no luck and finally asked on IRC, where I found that someone already did this for the same reason.

The configuration line is quite easy:

root@r2d2# cat /etc/udev/rules.d/99-my.rules
SUBSYSTEM=="block",KERNEL=="dm-*",ACTION=="add|change",ENV{DM_NAME}=="*-vbox*",GROUP="jlh"

This rules basically tells that for any add or change of a block device named "dm-*" and matching "*-vbox*", change its group to "jlh" (note that == and = are different, as in many programming languages). One interesting thing to note is that ENV{DM_NAME}=="*-vbox*" is an helper environment variable that is set by udev(8) standards rules. Those stand in /lib/udev/rules/ on Debian and udev(8) merges the content of this directory with the standard configuration directory /etc/udev/rules.d/. Rules are applied by filename order, so be careful to be the last one. Initially I used "90-my.rules" but there is a rule in "91-permissions.rules" that overrode mine. You can easily debug by running udevd --debug, although the output is quite verbose.

The next step is to create a VMDK file for VirtualBox that will point to the raw device and then attach is to your VM's storage controller. This is quite well documented in the manual (Using a raw host hard disk from a guest).

Basically:

jlh@r2d2$ VBoxManage internalcommands createrawvmdk \
    -filename VirtualBox VMs/vm1/data.vmdk \
    -rawdisk /dev/mapper/vg0-vbox_vm1
jlh@r2d2$ VBoxManage showvminfo vm1 | grep 'Storage Controller Name'
Storage Controller Name (0):            IDE Controller
jlh@r2d2$ VBoxManage storageattach vm1 --storagectl "IDE Controller" --port 0 --device 0 --type hdd --medium /home/jlh/VirtualBox\ VMs/FreeBSD/data.vmdtk

Your mileage may vary if, for example, you have a different storage controller name, different port or device. The full "showvminfo" output will tell you which slot is available. Another solution is to add another storage controller, although VirtualBox will not permit you to have multiple IDE controller. You can add a S-ATA controller which allows you to plug up to 30 devices:

jlh@r2d2$ VBoxManage storagectl vm1 --name "SATA Controller" --add sata --controller IntelAHCI --bootable on
jlh@r2d2$ VBoxManage storageattach vm1 --storagectl "SATA Controller" --port 0 --device 0 --type hdd --medium /home/jlh/VirtualBox\ VMs/boot.vmdk
jlh@r2d2$ VBoxManage storageattach vm1 --storagectl "IDE Controller" --port 1 --device 0 --type hdd --medium /home/jlh/VirtualBox\ VMs/root.vmdk

VitualBox and VboxManage are pretty well documented.

Wednesday, March 28, 2012

A Varnish threads story

Varnish is a (now) well-known HTTP caching reverse-proxy. It has been written primarily by Poul-Henning Kamp, a famous FreeBSD developer. Varnish is very BSDish: simple, versatile and powerful.

(Yet, configuring it may be pretty tough because HTTP is a complex protocol with regard to caching (RFC 2616 mentions client-side proxies but not server-side ones). Besides, applications living on top of it are often written without any caching consideration in mind. For instance by default Varnish doesn't cache response from requests containing cookies, not it caches responses with a Set-Cookie header, for obvious reasons. Unfortunately PHP applications make heavy use of the PHPSESSID cookie simply because the session_start() function, which is part of the PHP library, is very handy for developers.

Varnish uses a pool of threads to serve requests, with a configurable minimum and maximum values as well as a timeout value (with the -w command-line option). Much like what Apache does with processes when used with the MPM prefork module. Additionally, Varnish enforces a configurable delay between thread creation (parameter's name is thread_pool_add_delay, you can configure it with the -p command-line option).

For some reason, one Varnish instance on a preproduction server here was configured with silly values regarding thread limits: only one thread at minimum. Given the server was often unused, threads timed out and were removed down to one. The problem was that when a developer wanted to test the websites, there was only one thread available and the aforementioned delay between thread creation prevented from spawning them all at one. Albeit being a very powerful server, the website was felt very sluggish.

It took me some time to find out this problem. When I modified the configuration, the website was really, really fast.

Tuesday, March 6, 2012

Portmaster options combo to upgrade FreeBSD ports

Some years ago I was using the famous Portupgrade to maintain my ports. This software is mature, very powerful and easy to use. Unfortunately its dependency on Ruby makes it really cumbersome, especially because I have many jails.

Therefore when Doug Barton began Portmaster, which is written is shell and does more or less the same thing (well, actually less, but I can live with it), I was quite eager to use it. One thing I didn't like from the beginning with Postmaster was that it is not able to work alone: it is constantly asking things. Of course there are options to disable this, but this leads to me the second problem: they are not intuitive! (at least for me...)

After some struggle, I finally managed to find the options I always want to use and I'm writing them as a reminder and in the hope to help someone else in the same hassle:

# portmaster -dBGm BATCH=1 --no-confirm --delete-packages -a

Here are the details:

  • I'm using portconf to configure the ports' build knobs, so I don't want to run the configuration or to be asked something about it. Just use the defaults unless I told otherwise: -G -m BATCH=1;
  • Don't create a backup package, I'm not running any financial application: -B;
  • Don't ask me if the distfiles must be cleaned, just do it: -d;
  • Don't ask me if I really want to upgrade my ports, I already executed the command proving it: --no-confirm;
  • Remove packages once installed: --delete-packages;
  • Upgrade everything: -a, but you might not want to ugprade everything at once so you can replace this with one or more port name.