Verbose TFxIDF (Weighting) Example with Dumbo, The Beginning

Recently, I ventured into the world of information retrieval and data mining because it's cool to learn something new and it is the future of the “InterWebbs”. Over the past few weeks, I have buried my head in research papers, books, and source code, with my trusty netbook as my sidekick. One of the concepts I have picked up is the infamous TFxIDF, the super smart weighting algorithm that everyone seems to rant about.

Recipe for the Semantic Web - MongoDB, Hadoop, NLTK, Scrapy, Django

Recently, I have been working on my dream project (5 years and counting), which I came up with during the first few months of my freshman year back in '06. It was supposed to be the best thing to happen to the internet (in my head), but I was never able to complete it.