Naive Bayes Classifier in Python v1.0.4

I just finished work on a Naive Bayes Classifier in Python. I was interested in benchmarking Python's performance with large data sets, and it also gave me the chance to get to know Cython. Compiled as a C extension, it did improve performance. This project started from my own implementation in PHP here. As it turns out, the PHP version is faster than the Python one as of version 1.0.4 of this library, but the comparison is not like for like. The Python redis module available on PyPI is not compiled as a C extension, while its PHP counterpart definitely is. So I suspect the bottleneck is the Redis client. Expect more enhancements to the Redis client in future versions. ...

June 7, 2013 · Batista Harahap

Naive Bayes Classifier - Revisited

Over the last week, I’ve been following up on a side project to do machine learning with Urbanesia’s comprehensive data. A lot of late-night reading and fiddling with unfamiliar code were the highlights of the week. I wanted to elaborate on my implementations and on how different technologies affect benchmarks, particularly classification performance. The repo for the code is on GitHub here. Between the first batch of code and now, I have made lots of changes to both the code and the data store. I wasn’t sure at first which database would give the best performance. I’m testing on fairly low-spec hardware: a MacBook Air (Late 2011) with 4 GB DDR3, an SSD and an Intel Core i5 at 1.7 GHz, which is nothing compared to a real server. By the way, although relatively low spec, she’s got a name: Claire. ...

October 16, 2012 · Batista Harahap

Simple Naive Bayes Classifier for PHP

Recently Hacker News has been flooded with articles discussing, or at least mentioning, the Naive Bayes Classifier algorithm. It’s a basic algorithm that classifies a set of words into a certain category (set) based on prior learning of words and their probabilities. It sounds simple enough, but without an actual technical guidebook it’s not quite trivial, since most of the information out there is too messy for newbies like myself. ...
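
To make the idea of "prior learning of words and their probabilities" concrete, here is a minimal sketch in Python. It is not the code from the PHP library described in this post. The class name `NaiveBayes`, its `train` and `classify` methods, the sample sentences, and the use of add-one smoothing with log probabilities are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny word-count Naive Bayes classifier (hypothetical, illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> occurrences
        self.doc_counts = Counter()              # category -> training documents seen
        self.vocabulary = set()                  # every distinct word seen in training

    def train(self, words, category):
        """Record one training document (a list of words) for a category."""
        self.doc_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocabulary.update(words)

    def classify(self, words):
        """Return the most probable category, or None if nothing was trained."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Start from the log prior: how common this category was in training.
            score = math.log(docs / total_docs)
            for word in words:
                # Add-one smoothing (an assumption of this sketch) keeps unseen
                # words from zeroing out the whole product.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


classifier = NaiveBayes()
classifier.train("cheap pills buy now".split(), "spam")
classifier.train("meeting agenda for monday".split(), "ham")
print(classifier.classify("buy cheap pills".split()))
```

Summing logarithms instead of multiplying raw probabilities avoids numeric underflow on long inputs, which is a common design choice for this kind of classifier.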

February 27, 2012 · Batista Harahap