I recently had to implement language identification for some experiments with Clueweb-12++. I am not a Racket expert, and this code is possibly very stupid, but it was mainly a learning exercise.
I still need to implement serialization in order to obtain the space gains. Currently I write out bools and carry around extra info.
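One way that missing serialization step might look is sketched below. It assumes that the space gain comes from storing each boolean as a single bit instead of writing it out as a full value. The names `pack-bools`, `unpack-bools`, `write-bools`, and `read-bools` are hypothetical and do not come from the language-id module. The count is written as a 4-byte prefix so the reader knows how many bits in the last byte are padding.

```racket
#lang racket/base
;; Hypothetical sketch: serialize a list of booleans at one bit each.
;; Not the original module's code; names are illustrative only.

(provide pack-bools unpack-bools write-bools read-bools)

;; pack-bools : (listof boolean?) -> bytes?
;; The i-th boolean becomes bit (i mod 8) of byte (i / 8), LSB first.
(define (pack-bools bools)
  (define n (length bools))
  (define out (make-bytes (quotient (+ n 7) 8) 0))
  (for ([b (in-list bools)]
        [i (in-naturals)]
        #:when b)
    (define idx (quotient i 8))
    (bytes-set! out idx
                (bitwise-ior (bytes-ref out idx)
                             (arithmetic-shift 1 (remainder i 8)))))
  out)

;; unpack-bools : bytes? exact-nonnegative-integer? -> (listof boolean?)
;; Recovers the first n booleans; padding bits in the last byte are ignored.
(define (unpack-bools bs n)
  (for/list ([i (in-range n)])
    (bitwise-bit-set? (bytes-ref bs (quotient i 8)) (remainder i 8))))

;; write-bools : (listof boolean?) output-port? -> void?
;; Writes an unsigned 4-byte count followed by the packed bits.
(define (write-bools bools out)
  (write-bytes (integer->integer-bytes (length bools) 4 #f) out)
  (write-bytes (pack-bools bools) out)
  (void))

;; read-bools : input-port? -> (listof boolean?)
;; Reads the count, then exactly as many bytes as packing produced.
(define (read-bools in)
  (define n (integer-bytes->integer (read-bytes 4 in) #f))
  (unpack-bools (read-bytes (quotient (+ n 7) 8) in) n))
```

A round trip through an in-memory port, e.g. `(write-bools flags o)` with `o` from `open-output-bytes`, then `(read-bools (open-input-bytes (get-output-bytes o)))`, should return the original list. Ten flags then cost 4 + 2 bytes on disk, and the overhead of the length prefix shrinks as the bit array grows.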
And this is the language-id module itself.
You can download the entire source here.