Less sleep, more science!

Krzysiek

Beep, beep! I'm still (more or less) alive.

University takes up a lot of my time, which is my primary excuse for not blogging
for such a long time. Projects are like gas - they fill up all available space^H^H^H^H^H time.

Interesting (for a geeky enough definition of "interesting") stuff I've done recently:

Spellchecker. It's quite a smart one: written in Python and C, it uses a ternary search tree (TST, a kind of trie) to store the dictionary. Loading the data is super fast, because the structure contains no pointers: a single read of the binary file is enough and you can use it right away.
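
To make the "no pointers" idea concrete, here is a minimal sketch of a ternary search tree kept in one flat integer array, where children are referenced by node index rather than by pointer. The record layout (five integers per node) and the `FlatTST` name are my own assumptions for illustration, not the spellchecker's actual file format.

```python
from array import array

# Assumed layout: each node is 5 consecutive ints in one flat array.
CH, LO, EQ, HI, END = range(5)
NODE = 5


class FlatTST:
    def __init__(self, data=b""):
        # "Loading" is just filling the array from raw bytes.
        self.nodes = array("i")
        self.nodes.frombytes(data)

    def _new(self, char):
        # Append a node with no children (-1) and return its index.
        self.nodes.extend([ord(char), -1, -1, -1, 0])
        return len(self.nodes) // NODE - 1

    def _child(self, node, field, char):
        # Follow the link stored in `field`, creating the child if missing.
        idx = self.nodes[node * NODE + field]
        if idx == -1:
            idx = self._new(char)
            self.nodes[node * NODE + field] = idx
        return idx

    def insert(self, word):
        if not word:
            return
        if not self.nodes:
            self._new(word[0])
        node, i = 0, 0
        while True:
            c, here = ord(word[i]), self.nodes[node * NODE + CH]
            if c < here:
                node = self._child(node, LO, word[i])
            elif c > here:
                node = self._child(node, HI, word[i])
            else:
                i += 1
                if i == len(word):
                    self.nodes[node * NODE + END] = 1
                    return
                node = self._child(node, EQ, word[i])

    def __contains__(self, word):
        if not word or not self.nodes:
            return False
        node, i = 0, 0
        while node != -1:
            base = node * NODE
            c, here = ord(word[i]), self.nodes[base + CH]
            if c < here:
                node = self.nodes[base + LO]
            elif c > here:
                node = self.nodes[base + HI]
            else:
                i += 1
                if i == len(word):
                    return self.nodes[base + END] == 1
                node = self.nodes[base + EQ]
        return False
```

Saving is `tst.nodes.tobytes()` and loading is `FlatTST(blob)`; because links are indices, nothing has to be fixed up after the read.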

Then you can use the TST to quickly check whether a word belongs to the vocabulary (it's quicker than a hashmap), or to retrieve (still blazingly fast) a list of words that lie within a given Levenshtein (edit) distance of the misspelled one.
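
For readers who haven't met edit distance before, here is a small sketch of the idea. It is deliberately naive: it computes the Levenshtein distance to every word in a plain collection, whereas the spellchecker described above walks the tree instead. The function names and the sample vocabulary are made up for illustration.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions
    needed to turn a into b (classic row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def suggestions(word, vocabulary, max_distance=1):
    """All vocabulary words no further than max_distance from word."""
    return sorted(w for w in vocabulary if levenshtein(word, w) <= max_distance)
```

With a hypothetical vocabulary of `kot`, `kod`, `koty` and `pies`, `suggestions("kop", ...)` keeps only the words one edit away.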

Then it uses the longest common subsequence algorithm to find the parts that don't match, and compares those parts using knowledge about typical spelling errors in Polish. It can correct "grzegrzułka", "zomp" and "fzhut". In summary: it's cool.
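
A rough sketch of the "find the parts that don't match" step, assuming a plain dynamic-programming LCS; `mismatched_parts` is a name I made up, and the real spellchecker may well align words differently. Everything that is not on the common subsequence gets grouped into (misspelled part, dictionary part) pairs, which could then be looked up in a table of typical errors.

```python
def mismatched_parts(wrong, right):
    """Align two words along their longest common subsequence and return
    the non-matching fragments as (from_wrong, from_right) pairs."""
    n, m = len(wrong), len(right)
    # lcs[i][j] = length of the LCS of wrong[i:] and right[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if wrong[i] == right[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    parts, a, b = [], "", ""
    i = j = 0
    while i < n or j < m:
        if i < n and j < m and wrong[i] == right[j]:
            # A matching character closes the current mismatch, if any.
            if a or b:
                parts.append((a, b))
                a = b = ""
            i += 1
            j += 1
        elif j == m or (i < n and lcs[i + 1][j] >= lcs[i][j + 1]):
            a += wrong[i]
            i += 1
        else:
            b += right[j]
            j += 1
    if a or b:
        parts.append((a, b))
    return parts
```

For "grzegrzułka" against the correct "gżegżółka" this yields the pairs ("rz", "ż") and ("rzu", "żó"): the second pair is longer because no common letter separates the second "rz" from the "u".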

I'm preparing a talk about Scala. It's a work in progress. You can see some slides here. I'll give this talk on 3rd December as part of BIWAK.

Oh, yes, BIWAK. We (BIT science club) have started a series of talks called BIWAK.

Oh, yes, science club. I've done some work on a platform game with cool physics, but there is nothing cool to show off yet.

I've published some of my .rc files

Hooray new swimming pool! Hooray hiking! Hooray birthdays and weddings. Hooray real life.

Krzysiek

Just like last year, I managed to make my year's forum unreadable for April Fools' Day. Yesterday, last year's awful pink theme was joined by truly horrible spelling and flipped avatars. :>

Here are the results:

Here is a little HOWTO. It may be useful for something practical, or for next year. Basically, you have to use mod_ext_filter; see the sed example.

Polish spelling mistakes are different from English ones. They are (mostly) caused by the fact that, for historical reasons (and maybe for "compatibility" with other Slavic languages; is "legacy" the proper word?), a few sounds have two spellings: ż-rz, u-ó and h-ch. This makes it very easy to inject typos into Polish text.

We are talking about HTML pages. You cannot break the HTML, e.g. <a chref="..."> would be BAD. So we don't touch anything inside HTML tags, comments and entities. This makes it hard to use regexps. If I were to write this joke's filter program again, I would use flex. But I wrote a short and slow (2.2 seconds to process a page, which is unacceptable) automata-based Python script, and then, when it turned out how badly it performed, added lightning-fast (0.01 s/page) C code generation to it. Generating code for automata is very easy.
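
To show the "don't touch tags, comments and entities" rule in action, here is a small hand-written scanner. It is only a sketch under my own assumptions, not the script linked below: the swap table is a deterministic, lowercase-only version of the sound pairs mentioned above, and the tag handling is naive (a ">" inside an attribute value or the contents of a script element would confuse it).

```python
import re

# Same-sounding spellings, swapped in both directions (lowercase only).
SWAPS = {"rz": "ż", "ż": "rz", "ch": "h", "h": "ch", "u": "ó", "ó": "u"}
ENTITY = re.compile(r"&#?\w+;")


def _skip_to(html, start, closer):
    """Index just past the next `closer`, or end of input if it is missing."""
    end = html.find(closer, start)
    return len(html) if end == -1 else end + len(closer)


def misspell_html(html):
    out, i = [], 0
    while i < len(html):
        entity = ENTITY.match(html, i)
        if html.startswith("<!--", i):      # comment: copy verbatim
            end = _skip_to(html, i, "-->")
        elif html[i] == "<":                # tag: copy verbatim
            end = _skip_to(html, i, ">")
        elif entity:                        # entity such as &oacute;
            end = entity.end()
        else:                               # plain text: try 2 chars, then 1
            for width in (2, 1):
                chunk = html[i:i + width]
                if chunk in SWAPS:
                    out.append(SWAPS[chunk])
                    i += len(chunk)
                    break
            else:
                out.append(html[i])
                i += 1
            continue
        out.append(html[i:end])
        i = end
    return "".join(out)
```

Each `if` branch plays the role of one automaton state ("in comment", "in tag", "in entity", "in text"); a generated C version would spell those states out explicitly.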

The code is here.

The code generation itself is a bit over-engineered, too. I shouldn't have cared about beautiful indentation and proper newlines; I should have used GNU indent instead.

Apache configuration:

ExtFilterDefine ortozawal mode=output intype=text/html cmd="/usr/local/bin/ortozawal"
ExtFilterDefine ungzip-filter mode=output intype=text/html cmd="/bin/gunzip -"
ExtFilterDefine gzipme-filter mode=output intype=text/html cmd="/bin/gzip"
ExtFilterDefine flip-image mode=output cmd="/usr/bin/convert - -flip -"

<Location "/forum">
    SetOutputFilter ungzip-filter;ortozawal;gzipme-filter
</Location>

<Location "/forum/images">
    SetOutputFilter flip-image
</Location>

The gunzip, filter, gzip sandwich is a filthy trick. You can probably avoid it with proper filter configuration. If you know how, please drop me a comment.

Krzysiek

I've just read Terence Parr's post titled "How To Read C Declarations". The quoted "Golden Rule" makes reading declarations really easy, even for a drunk ape, but it's the kind of rule I dislike. It gives you some "magical" steps to follow (here are the same rules stated more verbosely in an awful, BASIC-ish GOTO-step-N manner) with no explanation of why it works this way and not another.

Here is the missing explanation:

A declaration reflects how you use the declared name in an expression (how you get a value out of it). So in int tab[]; tab is an array (you index it) of integers. int (*tab)[]; is a pointer to an array (you dereference it, then index). int *(tab[]); is an array of pointers: you have to index it, then dereference.

How about int *tab[]? You have to know operator precedence.

It's not as hard as it looks. In our case the rule of thumb is: "Postfix binds stronger than prefix", so you read int *tab[] as an array of pointers. "Postfix binds stronger than prefix" is the reason why you look right, then left, in the "Golden Rule".

Easy, isn't it? Now you know that this post's title reads: pointer to a function returning a pointer to an array of pointers to functions returning pointers to integers. The 10 is redundant in this case. (It's no. 5 from here, I'm so lazy...).
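
If you want to check a reading mechanically, the precedence rule fits in a tiny recursive-descent reader. This is a sketch in Python with made-up names (`read_declaration`), and it only understands a one-word base type, `*`, `[]`/`[N]`, empty `()` parameter lists and grouping parentheses. Note where the work happens: at each nesting level the postfix operators are consumed first, and the prefix stars are appended only afterwards.

```python
import re

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[*()\[\];]")


def read_declaration(decl):
    tokens = TOKEN.findall(decl)
    if tokens and tokens[-1] == ";":
        tokens.pop()
    pos = 1  # tokens[0] is the base type, e.g. "int"

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def declarator():
        stars = 0
        while peek() == "*":              # prefix operators: count them...
            take("*")
            stars += 1
        if peek() == "(":                 # grouping parentheses
            take("(")
            name, words = declarator()
            take(")")
        else:
            name, words = take(), []
        while peek() in ("[", "("):       # postfix operators bind first
            if take() == "[":
                size = take() if peek() != "]" else ""
                take("]")
                words.append(f"array of {size}".strip())
            else:
                take(")")
                words.append("function returning")
        words.extend(["pointer to"] * stars)  # ...and apply them last
        return name, words

    name, words = declarator()
    return f"{name} is " + " ".join(words + [tokens[0]])
```

Feeding it `int *tab[];` gives "tab is array of pointer to int", while `int (*tab)[];` gives "tab is pointer to array of int".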

Of course you'll find that the Golden Rule is an obvious consequence of the sentences above. It's convenient to read declarations that way, but IMO it's very bad to actually think about declarations only in terms of "now jump out of the parentheses".

Saturday, 15 November 2008

random stuff

Wednesday, 2 April 2008

April Fool's day

Saturday, 9 February 2008

int *(*(*(*b)())[10])();
