Wednesday, November 9, 2011

Apache mod_rewrite evilness (with dynamic vhosts and .htaccess rewrite rules)

At my new $job, we have an SVN repository for the websites we maintain. We devised a workflow around it: each developer has one or more branches of their own, and features are merged into the trunk. Once everything seems to work, we merge into the "preproduction" branch and finally into the "production" branch.

On the development web server, I wanted them to be able to access every branch with their web browser. Initially, there was a "svn.mywebsite.com" virtual host, and each branch was accessible through a URL path within it. Unfortunately, for "historical" reasons, the website doesn't work correctly when served from a URL sub-directory (and we are currently writing a new version of this website, so we don't want to spend time fixing it). I am therefore doomed to create a virtual host for each SVN trunk/tag/branch.

Here is the relevant part of the initial configuration I wrote:

<VirtualHost *:80>
ServerName svn.mywebsite.com
ServerAlias *.svn.mywebsite.com

DocumentRoot /var/empty

RewriteEngine on

# trunk.svn.mywebsite.com -> .../project/trunk
RewriteCond %{HTTP_HOST} ^trunk\.svn\.mywebsite\.com$
RewriteRule ^(.*) /home/www-data/svn/project/trunk$1 [L]

# <name>.branches.svn.mywebsite.com -> .../project/branches/<name>
RewriteCond %{HTTP_HOST} ^([^.]+)\.branches\.svn\.mywebsite\.com$
RewriteRule ^(.*) /home/www-data/svn/project/branches/%1$1 [L]

# <name>.tags.svn.mywebsite.com -> .../project/tags/<name>
RewriteCond %{HTTP_HOST} ^([^.]+)\.tags\.svn\.mywebsite\.com$
RewriteRule ^(.*) /home/www-data/svn/project/tags/%1$1 [L]

</VirtualHost>


So far it's easy, and it would have worked were it not for the following rewrite rule in the .htaccess at the root of the project:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]


The problem with this rule is that if we request a "virtual URL" (which does not match a physical file in the hierarchy), index.php is called with the original URL in the query string, which results in an internal redirect within Apache.

I'll spare you the RewriteLog output, but let's say you request http://trunk.svn.mywebsite.com/virtual-url.


  • Initially, %{REQUEST_URI} is "/virtual-url". The vhost rewrite rules are applied and map the request to the filesystem path /home/www-data/svn/project/trunk/virtual-url.


  • Then we reach the per-directory (.htaccess) rewrite rules which, given that the file doesn't exist, redirect to /home/www-data/svn/project/trunk/index.php with the query string q=virtual-url.


  • Here is the first trap: Apache performs an internal redirect, which restarts the rewrite rule evaluation from the beginning, with %{REQUEST_URI} set to /home/www-data/svn/project/trunk/index.php while %{HTTP_HOST} is unchanged. The directory would therefore be prepended a second time unless we add a safeguard: basically, checking that %{REQUEST_URI} doesn't already contain /home/www-data/svn/project/.


  • But the true evilness is here: we cannot rewrite to a full filesystem path in a per-directory rewrite rule. The subsequent internal redirect will invariably treat it as a URL path; that is, it will try to serve the page as if you had requested http://trunk.svn.mywebsite.com/home/www-data/svn/project/trunk/index.php. Because of the safeguard above, the dynamic vhost magic will not apply, so Apache will look for this file under the vhost's DocumentRoot and you will get a 404.

    The workaround is to trick Apache into treating the content of %{REQUEST_URI} as a full filesystem path whenever it looks like one :-). Unlike per-directory rewrite rules, vhost rewrite rules can rewrite to a full filesystem path. So just match the whole content and rewrite it to itself.
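
For comparison, here is a minimal sketch of the safeguard mentioned in the third step, shown for the trunk rule only and assuming the same project path as in the initial configuration. The extra RewriteCond simply skips the rule when the request already starts with that path:

```apache
# Sketch only: do not apply the vhost mapping a second time
# if the path has already been rewritten to the project directory.
RewriteCond %{REQUEST_URI} !^/home/www-data/svn/project/
RewriteCond %{HTTP_HOST} ^trunk\.svn\.mywebsite\.com$
RewriteRule ^(.*) /home/www-data/svn/project/trunk$1 [L]
```

On its own, this only prevents the double prefix. The request still ends in the 404 described in the last step, which is why the configuration that follows uses a pass-through rule instead.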




<VirtualHost *:80>
ServerName svn.mywebsite.com
ServerAlias *.svn.mywebsite.com

DocumentRoot /var/empty

RewriteEngine on

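# Already a filesystem path (after the .htaccess internal redirect): pass it through as-is
RewriteRule ^(/home/www-data/svn/project/.*) $1 [L]

# trunk.svn.mywebsite.com -> .../project/trunk
RewriteCond %{HTTP_HOST} ^trunk\.svn\.mywebsite\.com$
RewriteRule ^(.*) /home/www-data/svn/project/trunk$1 [L]

# <name>.branches.svn.mywebsite.com -> .../project/branches/<name>
RewriteCond %{HTTP_HOST} ^([^.]+)\.branches\.svn\.mywebsite\.com$
RewriteRule ^(.*) /home/www-data/svn/project/branches/%1$1 [L]

# <name>.tags.svn.mywebsite.com -> .../project/tags/<name>
RewriteCond %{HTTP_HOST} ^([^.]+)\.tags\.svn\.mywebsite\.com$
RewriteRule ^(.*) /home/www-data/svn/project/tags/%1$1 [L]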

</VirtualHost>


The first rewrite rule matches a %{REQUEST_URI} that already contains a filesystem path. It is not really a rewrite; it is just a trick to make mod_rewrite evaluate the substitution in the vhost context.
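
If you want to follow these steps on your own server, rewrite logging can be switched on inside the same virtual host. This is a minimal sketch for Apache 2.2; the log file path is only an assumption, so adjust it to your setup:

```apache
# Apache 2.2 only: trace every rewrite decision for this vhost.
# The log path below is a placeholder, not taken from the real setup.
RewriteLog /var/log/apache2/rewrite.log
# 0 disables logging, 9 is the most verbose; 3 is usually enough to see each pass.
RewriteLogLevel 3
```

Note that Apache 2.4 removed these two directives; the equivalent there is a per-module log level such as "LogLevel alert rewrite:trace3". Either way, remember to turn it off afterwards, since rewrite logging slows the server down noticeably.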