Less sleep, more science!

Krzysiek

Beep, beep! I'm still (more or less) alive.

University takes up a lot of my time, which is my primary excuse for not blogging
for so long. Projects are like gas - they fill up all available space^H^H^H^H^H time.

Interesting (for a geeky enough definition of "interesting") stuff I've done recently:

Spellchecker. It's quite a smart one -- written in Python and C, it stores the dictionary in a ternary search tree (TST, a trie variant). Loading the data is super-fast: the tree contains no pointers, so loading is just one read of a binary file, and then you can use it.

You can then use the TST to quickly check whether a word belongs to the vocabulary (it's quicker than a hashmap), or to retrieve (still blazingly fast) a list of words that lie close, in Levenshtein (edit) distance, to the misspelled one.
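To make the "no pointers" idea concrete, here is a minimal sketch in Python. It is not the spellchecker's actual code: the class name `FlatTST`, the field layout and the `roundtrip` helper are all my own assumptions for illustration. Children are stored as indices into flat arrays, so the whole tree is just a few blocks of integers. It covers membership checks only, not the edit-distance search.

```python
from array import array

NIL = -1  # "no child" marker, used instead of a null pointer


class FlatTST:
    """Ternary search tree kept in parallel arrays (hypothetical layout)."""

    def __init__(self):
        self.char = array('l')  # code point stored in each node
        self.lo = array('l')    # child index for smaller characters
        self.eq = array('l')    # child index for the next character of the word
        self.hi = array('l')    # child index for larger characters
        self.end = array('b')   # 1 if a word ends at this node

    def _new_node(self, code):
        self.char.append(code)
        self.lo.append(NIL)
        self.eq.append(NIL)
        self.hi.append(NIL)
        self.end.append(0)
        return len(self.char) - 1

    def add(self, word):
        if not word:
            return
        codes = [ord(ch) for ch in word]
        if not self.char:
            self._new_node(codes[0])  # the root is always node 0
        node, i = 0, 0
        while True:
            c = codes[i]
            if c < self.char[node]:
                if self.lo[node] == NIL:
                    self.lo[node] = self._new_node(c)
                node = self.lo[node]
            elif c > self.char[node]:
                if self.hi[node] == NIL:
                    self.hi[node] = self._new_node(c)
                node = self.hi[node]
            elif i == len(codes) - 1:
                self.end[node] = 1
                return
            else:
                i += 1
                if self.eq[node] == NIL:
                    self.eq[node] = self._new_node(codes[i])
                node = self.eq[node]

    def __contains__(self, word):
        if not word or not self.char:
            return False
        node, i = 0, 0
        while node != NIL:
            c = ord(word[i])
            if c < self.char[node]:
                node = self.lo[node]
            elif c > self.char[node]:
                node = self.hi[node]
            elif i == len(word) - 1:
                return bool(self.end[node])
            else:
                i += 1
                node = self.eq[node]
        return False


def roundtrip(tst):
    """Copy a tree through raw bytes, as a stand-in for a binary file."""
    copy = FlatTST()
    for name in ('char', 'lo', 'eq', 'hi', 'end'):
        getattr(copy, name).frombytes(getattr(tst, name).tobytes())
    return copy


tst = FlatTST()
for word in ["kot", "kos", "koza", "rzeka"]:
    tst.add(word)
```

Because every field is a flat array of integers, dumping and restoring the tree needs no fix-up pass: the indices mean the same thing after loading as before.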

It then uses the longest common subsequence algorithm to find the parts that don't match, and compares those parts using knowledge about typical spelling errors in Polish. It can correct "grzegrzułka", "zomp" and "fzhut". In summary: it's cool.
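A rough sketch of that step, assuming a plain dynamic-programming LCS (the function names `lcs_table` and `mismatched_parts` are made up for this example, and the real spellchecker may well do it differently): walk both words along their longest common subsequence and collect whatever lies between the matched letters.

```python
def lcs_table(a, b):
    """table[i][j] is the LCS length of the suffixes a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def mismatched_parts(a, b):
    """Return (piece of a, piece of b) pairs lying between LCS matches."""
    table = lcs_table(a, b)
    parts = []
    i = j = 0
    start_i = start_j = 0  # where the current unmatched stretch begins
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            if i > start_i or j > start_j:
                parts.append((a[start_i:i], b[start_j:j]))
            i += 1
            j += 1
            start_i, start_j = i, j
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # skipping a[i] loses nothing from the LCS
        else:
            j += 1  # skipping b[j] loses nothing from the LCS
    if start_i < len(a) or start_j < len(b):
        parts.append((a[start_i:], b[start_j:]))
    return parts


print(mismatched_parts("grzegrzułka", "gżegżółka"))
# [('rz', 'ż'), ('rzu', 'żó')]
```

Each pair can then be checked against a table of typical confusions (rz/ż, u/ó and so on) to decide whether the dictionary word is a plausible correction.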

I'm preparing a talk about Scala. It's a work in progress. You can see some slides here. I'll give this talk on 3rd December as part of BIWAK.

Oh, yes, BIWAK. We (BIT science club) have started a series of talks called BIWAK.

Oh, yes, science club. I've done some work on a platform game with cool physics, but there is nothing cool to show off yet.

I've published some of my .rc files.

Hooray new swimming pool! Hooray hiking! Hooray birthdays and weddings. Hooray real life.

Krzysiek

My first script at vim.sf.net. You can grab it here. Its source code and unit tests (nose is great!) are in my Mercurial repo.

Have fun!

Krzysiek

Like last year, I managed to make my year's forum unreadable for April Fool's Day. Last year's awful pink theme was joined yesterday by truly horrible spelling and flipped avatars. :>

Here are the results:

Here is a little HOWTO. It may be useful for something practical, or for next year. Basically, you have to use mod_ext_filter; see the sed example.

Polish spelling mistakes are different from English ones. They are mostly caused by the fact that (for historical reasons, and maybe for "compatibility" with other Slavic languages; is "legacy" the proper word?) a few sounds have two spellings each: ż-rz, u-ó and h-ch. That makes it very easy to inject typos into Polish text.
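A small sketch of such typo injection, using the homophone pairs rz/ż, u/ó and h/ch. This is an illustration only, not the filter used on the forum: the names `HOMOPHONES`, `SWAPS` and `inject_typos` are invented here, and the sketch handles lowercase letters only.

```python
import random

# Spellings that sound the same in Polish.
HOMOPHONES = [("rz", "ż"), ("ó", "u"), ("ch", "h")]

# Map each spelling to its sound-alike partner, in both directions.
SWAPS = {}
for first, second in HOMOPHONES:
    SWAPS[first] = second
    SWAPS[second] = first

# Try two-letter spellings first, so "ch" is not read as "c" + "h".
KEYS = sorted(SWAPS, key=len, reverse=True)


def inject_typos(text, probability=1.0, rng=random):
    """Swap each homophone spelling for its partner with the given probability."""
    out = []
    i = 0
    while i < len(text):
        for key in KEYS:
            if text.startswith(key, i):
                out.append(SWAPS[key] if rng.random() < probability else key)
                i += len(key)
                break
        else:
            out.append(text[i])  # not part of any homophone pair
            i += 1
    return "".join(out)


print(inject_typos("rzeka"))  # żeka
print(inject_typos("chór"))   # hur
```

With the default probability of 1.0 every candidate is swapped; something lower keeps the text just readable enough to be annoying.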

We are talking about HTML pages, and you cannot break the HTML: e.g. <a chref="..."> would be BAD. So we don't touch anything inside HTML tags, comments or entities, which makes it hard to use regexps. If I were to write this joke's filter program again, I would use flex. Instead I wrote a short and slow automata-based Python script (2.2 seconds to process a page - unacceptable), and then, when it turned out how badly it performed, added lightning-fast (0.01 s/page) C code generation to it. Generating code for automata is very easy.
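For illustration, a hand-written state machine along these lines might look as follows. It is a simplified sketch, not the script linked from this post: `transform_text` is a hypothetical name, and it assumes that ">" never appears inside attribute values and that script and style blocks need no special handling.

```python
def transform_text(html, fn):
    """Apply fn to text runs only; copy tags, comments and entities verbatim."""
    out = []   # finished output pieces
    run = []   # characters of the text run being collected
    state = "text"
    i = 0

    def flush():
        if run:
            out.append(fn("".join(run)))
            del run[:]

    while i < len(html):
        ch = html[i]
        if state == "text":
            if html.startswith("<!--", i):
                flush()
                out.append("<!--")
                state = "comment"
                i += 4
                continue
            if ch == "<":
                flush()
                state = "tag"
                out.append(ch)
            elif ch == "&":
                flush()
                state = "entity"
                out.append(ch)
            else:
                run.append(ch)
        elif state == "comment":
            if html.startswith("-->", i):
                out.append("-->")
                state = "text"
                i += 3
                continue
            out.append(ch)
        elif state == "tag":
            out.append(ch)
            if ch == ">":
                state = "text"
        else:  # state == "entity"
            if ch == ";":
                out.append(ch)
                state = "text"
            elif ch.isalnum() or ch == "#":
                out.append(ch)
            else:
                state = "text"  # bare "&": reprocess ch as ordinary text
                continue
        i += 1
    flush()
    return "".join(out)


page = '<a href="x">huh</a> &lt; <!-- huh -->huh'
print(transform_text(page, str.upper))
# <a href="x">HUH</a> &lt; <!-- huh -->HUH
```

Any text-mangling function can be passed as `fn`; the automaton only decides which characters it is allowed to see.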

The code is here.

Code generation itself is a bit over-engineered, too. I shouldn't have cared about beautiful indentation & proper newlines. I should have used GNU indent instead.

Apache configuration:

ExtFilterDefine ortozawal mode=output intype=text/html cmd="/usr/local/bin/ortozawal"
ExtFilterDefine ungzip-filter mode=output intype=text/html cmd="/bin/gunzip -"
ExtFilterDefine gzipme-filter mode=output intype=text/html cmd="/bin/gzip"
ExtFilterDefine flip-image mode=output cmd="/usr/bin/convert - -flip -"

<Location "/forum">
    SetOutputFilter ungzip-filter;ortozawal;gzipme-filter
</Location>

<Location "/forum/images">
    SetOutputFilter flip-image
</Location>

The gunzip/filter/gzip chain is a filthy trick. You can probably avoid it with proper filter configuration. If you know how, please drop me a comment.

Krzysiek

Grab it here.

It's a toy Lisp in < 256 LOC. Nothing super-fancy. If you don't count unit tests and Guido's mm.py, it's even < 128 LOC.

It's quite pythonic. When you write eval in Lisp, it's just a big cond. That translates directly to if/elif/elif/..., which doesn't look good. I used a dict of functions instead, which looks better to my eyes. It uses as many Python features as possible: Python functions, Python lists, etc.
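To show what "dict of functions instead of a big cond" can look like, here is a tiny sketch. It is not the code from the download: the names `SPECIAL_FORMS`, `evaluate` and `GLOBAL_ENV` are assumptions for this example, and expressions are plain Python lists with strings as symbols.

```python
import operator


def eval_quote(args, env):
    return args[0]


def eval_if(args, env):
    test, then, otherwise = args
    return evaluate(then if evaluate(test, env) else otherwise, env)


def eval_define(args, env):
    name, expr = args
    env[name] = evaluate(expr, env)
    return env[name]


# One entry per special form, instead of one elif branch per special form.
SPECIAL_FORMS = {"quote": eval_quote, "if": eval_if, "define": eval_define}


def evaluate(expr, env):
    if isinstance(expr, str):        # symbol: look it up
        return env[expr]
    if not isinstance(expr, list):   # number or other literal
        return expr
    head, args = expr[0], expr[1:]
    if isinstance(head, str) and head in SPECIAL_FORMS:
        return SPECIAL_FORMS[head](args, env)
    func = evaluate(head, env)       # ordinary call: a Python function
    return func(*[evaluate(arg, env) for arg in args])


GLOBAL_ENV = {"+": operator.add, "*": operator.mul, "<": operator.lt}

print(evaluate(["+", 1, ["*", 2, 3]], GLOBAL_ENV))  # 7
```

Adding a new special form means adding one function and one dict entry; `evaluate` itself stays untouched.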

Talking about Python and lisp lists:


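class slist(object):
    """Singly linked list of car/cdr cells; the empty tail is None."""

    class iterator(object):
        def __init__(self, lst):
            self.lst = lst

        def __iter__(self):
            # An iterator should also be iterable.
            return self

        def next(self):
            if self.lst is None:
                raise StopIteration
            result = self.lst.car
            self.lst = self.lst.cdr
            return result

        __next__ = next  # Python 3 calls this method __next__

    def __iter__(self):
        return slist.iterator(self)

    def __init__(self, a, b=None):
        self.car = a
        self.cdr = b

    def __str__(self):
        return '(%s)' % ' '.join(str(elem) for elem in self)

list_123 = slist(1, slist(2, slist(3)))
a, b, c = list_123  # a == 1, b == 2, c == 3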

Yes, this is just a linked list, not a true Scheme cons, since it doesn't support improper lists. I posted it just to show how neat Python iterators are.
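As a side note, the same iteration can be written with a generator, which removes the need for a separate iterator class. This is a variation for comparison, not part of the interpreter, and the class name `SList` is made up to keep it apart from the original.

```python
class SList:
    """Linked list whose __iter__ is a generator (proper lists only)."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr

    def __iter__(self):
        node = self
        while node is not None:
            yield node.car
            node = node.cdr

    def __str__(self):
        return "(%s)" % " ".join(str(elem) for elem in self)


a, b, c = SList(1, SList(2, SList(3)))
print(SList(1, SList(2, SList(3))))  # (1 2 3)
```

Like the class above, it assumes every cdr is either another list cell or None.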

Krzysiek

It's amazing what activities one can invent just to have something to do other than studying...
I've just written an external DSL to ease writing CppUnit tests. It's not finished, but it works.

There is one thing I particularly hate about CppUnit: you have to name every test in a fixture at least twice - once when it's defined, and again when it's added to the suite. OK, it's C++: we don't have reflection to do this stuff automatically, and the preprocessor is too dumb to fix it... but it still sucks. It's repetitive and error-prone. And you can end up with tests that are never run, because you forgot to add them to the suite.

Blah. I think I'm clear.

I wanted something simple - the generated code should be very similar to the source, with the test bodies just copied into the right places, and so on. No full-blown C++ parser... no C++ parser at all. Just something a little more powerful than preprocessor macros.

And here it goes:

@includeHeaders

@beginFixture TestSum
Sum *sum;
@setUp {
   sum = new Sum();
}
@tearDown {
   delete sum;
}
@test "empty sum of nothing should be zero" {
CPPUNIT_ASSERT_EQUAL(0, sum->getResult());
}
@test "simple sum of 1+2+3" {
sum->add(1);
sum->add(2);
sum->add(3);
CPPUNIT_ASSERT_EQUAL(6, sum->getResult());
}
@endFixture

translates to


#include <cppunit/TestCase.h>
#include <cppunit/extensions/HelperMacros.h>

class TestSum: public CppUnit::TestFixture {
CPPUNIT_TEST_SUITE( TestSum );
CPPUNIT_TEST( empty_sum_of_nothing_should_be_zero );
CPPUNIT_TEST( simple_sum_of_1_2_3 );
CPPUNIT_TEST_SUITE_END();

public:
Sum *sum;
void setUp() {
   sum = new Sum();
}
void tearDown() {
   delete sum;
}
void empty_sum_of_nothing_should_be_zero() {
   CPPUNIT_ASSERT_EQUAL(0, sum->getResult());
}
void simple_sum_of_1_2_3() {
   sum->add(1);
   sum->add(2);
   sum->add(3);
   CPPUNIT_ASSERT_EQUAL(6, sum->getResult());
}
};

Simple as that.
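The part that removes the double naming can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool from the repo: the regular expressions and the names `method_name` and `suite_declaration` are my assumptions, and a description that starts with a digit would need extra handling to give a valid C++ identifier.

```python
import re

# Matches lines such as: @test "simple sum of 1+2+3" {
TEST_RE = re.compile(r'@test\s+"([^"]*)"')


def method_name(description):
    """Turn a test description into an identifier-like name."""
    return re.sub(r"[^0-9A-Za-z]+", "_", description).strip("_")


def suite_declaration(fixture, source):
    """Build the CPPUNIT_TEST_SUITE block from the @test lines in source."""
    names = [method_name(d) for d in TEST_RE.findall(source)]
    lines = ["CPPUNIT_TEST_SUITE( %s );" % fixture]
    lines += ["CPPUNIT_TEST( %s );" % name for name in names]
    lines.append("CPPUNIT_TEST_SUITE_END();")
    return "\n".join(lines)


print(method_name("simple sum of 1+2+3"))  # simple_sum_of_1_2_3
```

Since the suite block and the method definitions are both generated from the same @test lines, a test can no longer be defined but left out of the suite.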

Mercurial repo.

Time to finish this longish post before blogging about it takes me more time than actually writing the thing...
cut!
My first post on my first blog is finished. Applause!


Saturday, 15 November 2008

random stuff

Friday, 2 May 2008

organize your python imports in vim

Wednesday, 2 April 2008

April Fool's day

Saturday, 1 March 2008

small, pythonic lisp

Monday, 14 January 2008

DSL for CppUnit tests
