Eureka, I can mongo-doop with dumbo, oops?

OK, so thanks to the "klaasy" guy working on dumbo and some uncomfortable netbook power coding on the train, I was able to keep using my two recent favorite tools, Python and MongoDB. I merged the current mongo-hadoop repo with a fork that had implemented typedbytes Mongo input and output formats (and cleaned it up a teensy bit), and voila, you can do a simple dumbo wordcount as follows:

import ast
import dumbo
class Mapper:
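The snippet is cut off in this excerpt, so here is a rough sketch of what a plain dumbo wordcount usually looks like. It is not the original Mongo-backed script: it uses simple generator functions instead of a Mapper class, and it assumes each incoming value is a line of text (with the Mongo input format the value would presumably be a document, and you would pick a field out of it first).

```python
def mapper(key, value):
    # Assumption: value is plain text; str() keeps this safe for other types.
    for word in str(value).split():
        yield word, 1


def reducer(key, values):
    # values is an iterator over all the counts emitted for this word.
    yield key, sum(values)


if __name__ == "__main__":
    try:
        import dumbo
    except ImportError:
        dumbo = None  # dumbo not installed; mapper/reducer can still be tested locally
    if dumbo is not None:
        # The reducer doubles as a combiner because summing is associative.
        dumbo.run(mapper, reducer, combiner=reducer)
```

Keeping the mapper and reducer as plain functions means you can poke at them in a normal Python shell before handing the script to dumbo.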

Ramble Rumble for Languages

I’m tired of listening to programmers ramble about which framework or programming language is better. So let me get this straight: do I want you to sacrifice speed (C++) for programming convenience and smoothness (Python)? No, I don’t. If you believe your program needs to be faster than the speed of light, then by all means code in a fast language; you can write machine code for all I care. I like being able to quickly prototype programs on my netbook while on a train, and as a result I have tied the knot with Python.

Verbose TFxIDF (Weighting) Example with Dumbo, The Beginning

Recently, I ventured into the world of information retrieval and data mining because it's cool to learn something new and it is the future of the “InterWebbs”. Over the past few weeks, I have buried my head in research papers, books, and source code, with my trusty netbook as my sidekick. One of the concepts I have picked up is the infamous TFxIDF, the super smart weighting algorithm that everyone seems to rant about.
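To make the idea concrete, here is a minimal in-memory sketch of one common TFxIDF variant: term frequency is a term's count divided by the document length, and inverse document frequency is log(N / df), where N is the number of documents and df is how many of them contain the term. The function name tf_idf and the token-list input are just illustrative choices, and other variants (smoothing, different log bases) are equally valid.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Take a list of token lists, return one {term: weight} dict per document."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # tf = count / document length, idf = log(N / df)
            term: (float(count) / total) * math.log(float(n_docs) / df[term])
            for term, count in counts.items()
        })
    return weights
```

With this variant, a term that shows up in every document gets a weight of zero, which is exactly the point: words that are everywhere tell you nothing about any one document.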

Recipe for the Semantic Web - mongodb, hadoop, nltk, scrapy, django

Recently, I have been working on my dream project (5 years and counting), which I came up with during the first few months of my freshman year back in '06. It was supposed to be the best thing to happen to the internet (in my head), but I was never able to complete it.