Wednesday 2 April 2008

April Fool's day

As last year, I managed to make my year's forum unreadable for April Fool's day. Awful, pink theme from last year was yesterday joined by truly horrible spelling and flipped avatars. :>

Here are the results:

Here is a little HOWTO. It may be useful for something practical, or for next year. Basically, you have to use mod_ext_filter see the sed example.

Polish spelling mistakes are different to English ones. They are (mostly) caused by fact, that there are (for historical & (maybe) other slavic languages "compatibility" reasons (is legacy a proper word?)) few sounds that have the two spellings, like ź-rz and u-ó, h-ch. It makes it very easy to inject typos to polish text.

We are talking about HTML pages. You cannot break html, eg <a chref="..."> would be BAD. So we don't touch anything inside html tags, comments and entities. This makes it hard to use regexps. If I were to write this joke's filter program once again I would use flex. But I wrote a short and slow (2,2 seconds to process a page - unacceptable) automata-based python script, and then (when it turned out how bad it performs) added lighting-fast (0,01 s/page) C-code generation to it. Generating code for automata is very easy.

code is here

Code generation itself is a bit over-engineered, too. I shouldn't have cared about beautiful indentation & proper newlines. I should have used GNU indent instead.

Apache configuration:
ExtFilterDefine ortozawal mode=output intype=text/html cmd="/usr/local/bin/ortozawal"
ExtFilterDefine ungzip-filter mode=output intype=text/html cmd="/bin/gunzip -"
ExtFilterDefine gzipme-filter mode=output intype=text/html cmd="/bin/gzip"
ExtFilterDefine flip-image mode=output cmd="/usr/bin/convert - -flip -"

<Location "/forum">
SetOutputFilter ungzip-filter;ortozawal;gzipme-filter
</Location>

<Location "/forum/images">
SetOutputFilter flip-image
</Location>

gunzip-filter-gzip is a filthy trick. You can probably avoid it by proper filter configuration. If you know how, please drop me a comment.

No comments: