Hacking DailySocial's News

DailySocial.net is a tech blog founded by Rama Mamuaya. I enjoy visiting DailySocial and reading about the Indonesian tech scene, yet I’ve grown weary of filtering which news to read. So why not hack together a news classifier, I thought. Core Computing: it took 10 minutes to hack something up in Python. Why so fast, you ask? Because text processing is second nature in Python. NLTK is good, but TextBlob is great. [Read More]

WordGrapher - Build A Graph from Words and Documents

Just last night, Steven Loria updated TextBlob to v0.5.0. The module offers a relatively easy way to do Natural Language Processing in Python. NLTK is a dependency, so it is familiar turf with an easier getting-started experience. Building on this, I wrote an easy way to parse a set of words and documents and measure important keywords with the TF-IDF algorithm. A few minutes ago I uploaded the module to PyPI and tagged it as v0. [Read More]
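For readers who have not met TF-IDF before, here is a minimal pure-Python sketch of the idea. It is not WordGrapher's actual API: the function names (`tokenize`, `idf`, `tfidf`, `top_keywords`) and the sample documents are hypothetical, and it assumes the plain `log(N / df)` form of inverse document frequency with no smoothing.

```python
import math
from collections import Counter

# Punctuation stripped from the edges of each word (an assumption of this sketch).
PUNCT = ".,!?;:()\"'"


def tokenize(text):
    """Lowercase, split on whitespace, and trim surrounding punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def idf(word, corpus):
    """Inverse document frequency: log(N / df). Returns 0.0 for unseen words."""
    df = sum(1 for tokens in corpus if word in tokens)
    return math.log(len(corpus) / df) if df else 0.0


def tfidf(word, tokens, corpus):
    """Term frequency within one document multiplied by the word's IDF."""
    tf = Counter(tokens)[word] / len(tokens)
    return tf * idf(word, corpus)


def top_keywords(tokens, corpus, n=3):
    """Return the n highest-scoring words of a document (ties broken alphabetically)."""
    ranked = sorted(set(tokens), key=lambda w: (-tfidf(w, tokens, corpus), w))
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical sample documents, only for illustration.
    documents = [
        "Python makes text processing easy",
        "Python is a programming language",
        "Text classification with Naive Bayes",
    ]
    corpus = [tokenize(d) for d in documents]
    for tokens in corpus:
        print(top_keywords(tokens, corpus))
```

Words that appear in every document score zero, while words unique to one document float to the top, which is why TF-IDF works as a rough keyword measure.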

Machine Favorited Tweets - Organically Improve Followers Count

Last night, I read James Moriss’ blog post on how to gain more followers by favoriting tweets relevant to your own tweets or to other keywords you are interested in. The downside is that you still have to input the keywords yourself. So I hacked up some code to let it figure out which keywords are appropriate. This is my first foray into Python’s NLTK. WARNING: The code below is not production-ready. [Read More]

Google Adsense and Facebook Ads Annoyances

For the last few months, I have had questions every time Google Adsense serves me ads. Adsense follows you essentially everywhere you go on the Internet: almost every website with ads that I visited was serving them from Adsense. It is annoying! Not because I am being served ads, but because I am being served the same ads over and over. My first reaction, once I realized how annoying this had become, was to figure out how Adsense pulls something like this off. [Read More]

Naive Bayes Classifier - Revisited

Over the last week, I’ve been following up on a side project that applies machine learning to Urbanesia’s comprehensive data. A lot of late-night reading and fiddling with unfamiliar code were the highlights of my week. I want to elaborate on my implementations and on how several kinds of technologies affect benchmarks, particularly classification performance. The repo for the code is on GitHub here. Between the first batch of code and now, I have made lots of changes to the code and also to the data store. [Read More]