Showing posts with label www. Show all posts

Wednesday, March 28, 2012

A Varnish threads story

Varnish is a (now) well-known HTTP caching reverse proxy. It was written primarily by Poul-Henning Kamp, a famous FreeBSD developer. Varnish is very BSDish: simple, versatile and powerful.

Yet configuring it can be pretty tough, because HTTP is a complex protocol with regard to caching (RFC 2616 mentions client-side proxies but not server-side ones). Besides, the applications living on top of it are often written without any consideration for caching. For instance, by default Varnish doesn't cache responses to requests containing cookies, nor does it cache responses with a Set-Cookie header, for obvious reasons. Unfortunately, PHP applications make heavy use of the PHPSESSID cookie, simply because the session_start() function, which is part of the PHP library, is very handy for developers.
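As a rough illustration, the two cookie rules just mentioned can be restated as a tiny Python predicate. This is only a sketch of those two rules: the function name is made up for the example, and Varnish's real default logic takes other things into account as well.

```python
def is_cacheable_by_default(request_headers, response_headers):
    """Toy restatement of the two cookie rules from the text.

    Hypothetical helper, not Varnish's actual default logic.
    """
    # HTTP header names are case-insensitive, so compare them in lower case.
    request_names = {name.lower() for name in request_headers}
    response_names = {name.lower() for name in response_headers}
    # A Cookie on the request or a Set-Cookie on the response disables caching.
    return "cookie" not in request_names and "set-cookie" not in response_names


# A PHP page that called session_start() typically falls into the second case.
print(is_cacheable_by_default({"Host": "example.org"}, {"Content-Type": "text/html"}))
print(is_cacheable_by_default({"Cookie": "PHPSESSID=abc"}, {"Content-Type": "text/html"}))
```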

Varnish uses a pool of threads to serve requests, with configurable minimum and maximum values as well as a timeout value (set with the -w command-line option), much like what Apache does with processes when used with the prefork MPM. Additionally, Varnish enforces a configurable delay between thread creations (the parameter is named thread_pool_add_delay; you can set it with the -p command-line option).

For some reason, one Varnish instance on a preproduction server here was configured with silly values for the thread limits: a minimum of only one thread. Since the server was often idle, threads timed out and were removed until only one was left. The problem was that when a developer wanted to test the websites, there was only one thread available, and the aforementioned delay between thread creations prevented Varnish from spawning them all at once. Although the server was very powerful, the website felt very sluggish.

It took me some time to track down this problem. Once I fixed the configuration, the website was really, really fast.
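To get a feeling for the slowdown described above, here is a deliberately crude back-of-the-envelope model in Python. It assumes that exactly one thread is created per delay period and ignores everything else Varnish does (several pools, request queueing, and so on). The function name and the numbers in the example are hypothetical, not values taken from the server in question.

```python
def ramp_up_ms(current_threads, needed_threads, add_delay_ms):
    """Milliseconds needed to reach `needed_threads`, creating one thread per delay.

    Crude model: one new thread every `add_delay_ms`, nothing else considered.
    """
    missing = max(0, needed_threads - current_threads)
    return missing * add_delay_ms


# Hypothetical figures: the pool has shrunk to 1 thread, a page load needs
# 50 concurrent threads, and the creation delay is assumed to be 20 ms.
print(ramp_up_ms(1, 50, 20), "ms before the pool is large enough")
```

With a higher minimum number of threads, the missing count (and therefore the wait) simply drops to zero in this model.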

Wednesday, November 9, 2011

Apache mod_rewrite evilness (with dynamic vhosts and .htaccess rewrite rules)

At my new $job, we have an SVN repository for the websites we maintain. We devised a workflow around it: each developer has one or more branches of their own. Their features are merged into the trunk. Once everything seems to work, we merge into the "preproduction" branch and finally into the "production" branch.

On the development web server, I wanted them to be able to access every branch with their web browser. Initially, there was a "svn.mywebsite.com" virtual host and each branch was accessible through a URL path within it. Unfortunately, for "historical" reasons, the website doesn't work correctly when it lives in a URL sub-directory (and we are currently writing a new version of this website, so we don't want to spend time fixing it). I am therefore doomed to create a virtual host for each SVN trunk/tag/branch.

Here is the relevant part of the initial configuration I wrote:

<VirtualHost *:80>
ServerName svn.mywebsite.com
ServerAlias *.svn.mywebsite.com

DocumentRoot /var/empty

RewriteEngine on

# trunk.svn.mywebsite.com -> trunk checkout
RewriteCond %{HTTP_HOST} ^trunk\.svn\.mywebsite\.com$
RewriteRule ^(.*) /home/www-data/svn/project/trunk$1 [L]

# <name>.branches.svn.mywebsite.com -> branches/<name>
RewriteCond %{HTTP_HOST} ^([^.]+)\.branches\.svn\.mywebsite\.com$
RewriteRule ^(.*) /home/www-data/svn/project/branches/%1$1 [L]

# <name>.tags.svn.mywebsite.com -> tags/<name>
RewriteCond %{HTTP_HOST} ^([^.]+)\.tags\.svn\.mywebsite\.com$
RewriteRule ^(.*) /home/www-data/svn/project/tags/%1$1 [L]

</VirtualHost>


So far so easy, and it would have worked were it not for the following RewriteRule in the .htaccess file at the root of the project:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]


The problem with this rule is that if we request a "virtual URL" (which does not match a physical file in the hierarchy), index.php is called with the original URL in the query string, which results in an internal redirect within Apache.

I will spare you the RewriteLog, but let's say you request http://trunk.svn.mywebsite.com/virtual-url.


  • Initially, %{REQUEST_URI} is "/virtual-url". The vhost rewrite rules are applied and map it to the filesystem path /home/www-data/svn/project/trunk/virtual-url.


  • Then we reach the per-directory (.htaccess) rewrite rules which, given the file doesn't exist, redirect to /home/www-data/svn/project/trunk/index.php with the following query string q=virtual-url.


  • Here is the first trap: an internal redirect is done within Apache, which restarts the evaluation of the rewrite rules from the beginning, with %{REQUEST_URI} set to /home/www-data/svn/project/trunk/index.php, while %{HTTP_HOST} is still the same. So the directory will be prepended twice unless we add a safeguard: basically, checking that %{REQUEST_URI} doesn't contain /home/www-data/svn/project/.


  • But the true evilness is here: we cannot rewrite to a full filesystem path in a per-directory rewrite rule. The subsequent internal redirect will invariably treat it as a URL path, that is, it will try to serve a page as if you had requested http://trunk.svn.mywebsite.com/home/www-data/svn/project/trunk/index.php. Because of the safeguard above, the dynamic vhost magic will not apply, so Apache will look for this file under the vhost's DocumentRoot and you will get a 404.

    The workaround is to trick Apache into treating the content of %{REQUEST_URI} as a full filesystem path when it looks like one :-). Unlike the per-directory rewrite rules, the vhost rewrite rules are able to rewrite to a full filesystem path. So just match the whole content and rewrite to it.




<VirtualHost *:80>
ServerName svn.mywebsite.com
ServerAlias *.svn.mywebsite.com

DocumentRoot /var/empty

RewriteEngine on

# Already a filesystem path (after the .htaccess internal redirect): keep it as is
RewriteRule ^(/home/www-data/svn/project/.*) $1 [L]

RewriteCond %{HTTP_HOST} ^trunk\.svn\.mywebsite\.com$
RewriteRule ^(.*) /home/www-data/svn/project/trunk$1 [L]

RewriteCond %{HTTP_HOST} ^([^.]+)\.branches\.svn\.mywebsite\.com$
RewriteRule ^(.*) /home/www-data/svn/project/branches/%1$1 [L]

RewriteCond %{HTTP_HOST} ^([^.]+)\.tags\.svn\.mywebsite\.com$
RewriteRule ^(.*) /home/www-data/svn/project/tags/%1$1 [L]

</VirtualHost>


The first rewrite rule matches a %{REQUEST_URI} containing a filesystem path. This is not really a rewrite, just a trick to trigger mod_rewrite's evaluation of the substitution.
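The successive passes described above can be traced with a small toy model in Python. It is only a sketch of the behaviour as described in this post, not of mod_rewrite's real implementation: the function names are made up, the query string is left out, and the .htaccess rule is reduced to "if the file does not exist, redirect internally to index.php, given as a full path".

```python
import re

PROJECT = "/home/www-data/svn/project"


def checkout_root(host):
    """Directory of the checkout selected by the Host header."""
    if host == "trunk.svn.mywebsite.com":
        return PROJECT + "/trunk"
    m = re.match(r"^([^.]+)\.(branches|tags)\.svn\.mywebsite\.com$", host)
    if m:
        return f"{PROJECT}/{m.group(2)}/{m.group(1)}"
    raise ValueError(f"no rewrite rule matches host {host!r}")


def vhost_rules(host, uri, passthrough):
    """Vhost-level rules: turn the current URI into a filesystem path."""
    if passthrough and uri.startswith(PROJECT + "/"):
        return uri  # first rule of the final config: keep the path as it is
    return checkout_root(host) + uri


def trace(host, uri, existing, passthrough, max_passes=5):
    """Return the filesystem path computed at each rewrite pass."""
    paths = []
    for _ in range(max_passes):
        path = vhost_rules(host, uri, passthrough)
        if paths and path == paths[-1]:
            break  # no progress any more, stop the toy loop
        paths.append(path)
        if path in existing:
            break  # a real file: nothing left to rewrite
        # .htaccess rule: internal redirect to index.php, given as a full path
        uri = checkout_root(host) + "/index.php"
    return paths


existing = {PROJECT + "/trunk/index.php"}
for passthrough in (False, True):
    print(passthrough, trace("trunk.svn.mywebsite.com", "/virtual-url", existing, passthrough))
```

Without the first rule, the second pass ends on a path with the checkout directory prepended twice; with it, the second pass lands on the real index.php.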