Machine-Favorited Tweets - Organically Improve Your Follower Count

Last night, I read James Moriss' blog post on how to gain more followers by favoriting tweets that are relevant to your own tweets or to other keywords you are interested in. The downside was that you still had to input the keywords yourself. So I hacked up some code that lets the program figure out which keywords are appropriate. It was also my first foray into Python's NLTK.

WARNING: The code below is not production-ready. It is only a proof of concept and should not be used in production environments without a proper understanding of what it does.

So, with the warning out of the way, let's hack some code. Here is the original code from James Morrison.

As you can see, the code provides pretty much the building blocks of what I'm trying to achieve. I made my own modifications, which produced the code below.
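As a rough illustration of the search-and-favorite loop those building blocks implement, here is a minimal sketch. It is not James Morrison's code: `search` and `favorite` are hypothetical callables standing in for whatever Twitter client calls you actually use, and each tweet is assumed to be a dict with an integer "id".

```python
from typing import Callable, Iterable, List, Set


def favorite_matching_tweets(
    keywords: Iterable[str],
    search: Callable[[str], List[dict]],
    favorite: Callable[[int], None],
    limit_per_keyword: int = 5,
) -> List[int]:
    """Search each keyword and favorite up to `limit_per_keyword` new tweets.

    `search` and `favorite` are hypothetical stand-ins for real Twitter API
    calls; each tweet is assumed to be a dict with an integer "id".
    """
    favorited: List[int] = []
    seen: Set[int] = set()
    for keyword in keywords:
        count = 0
        for tweet in search(keyword):
            if count >= limit_per_keyword:
                break
            tweet_id = tweet["id"]
            if tweet_id in seen:
                continue  # already favorited via an earlier keyword
            favorite(tweet_id)
            seen.add(tweet_id)
            favorited.append(tweet_id)
            count += 1
    return favorited
```

Injecting the two calls keeps the loop testable offline: pass a dict-backed fake for `search` and `list.append` for `favorite`, then swap in real client calls once the logic behaves.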

Since I want the code to scan my own tweets, I introduced the my_tweets() function. I also introduced two regex patterns, kept in a variable (hence twitter_namespace), to filter out URLs and @screen_name mentions.
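A minimal sketch of that filtering step, assuming tweets arrive as plain strings. The two patterns below are illustrative guesses, not the exact ones from my code, and `clean_tweet` / `clean_tweets` are hypothetical helper names.

```python
import re
from typing import Iterable, List

# Hypothetical patterns: the ones used in the post may differ.
URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")


def clean_tweet(text: str) -> str:
    """Strip URLs and @screen_name mentions, then collapse whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = MENTION_PATTERN.sub(" ", text)
    return " ".join(text.split())


def clean_tweets(tweets: Iterable[str]) -> List[str]:
    """Clean a batch of tweets, dropping any that end up empty."""
    cleaned = (clean_tweet(t) for t in tweets)
    return [t for t in cleaned if t]
```

Note that `@\w+` also bites into e-mail addresses (`user@example.com` becomes `user.com`), which is usually harmless for keyword extraction but worth knowing.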

The next step was to figure out what to do with my latest tweets. Over the years, I have grown fond of TF-IDF for filtering out keywords, both against their own document and against a larger collection of documents. Thanks to this analysis, five of Urbanesia's articles are consistently in our Top 10 by pageviews. You can read more about the topic here.
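For reference, here is one common textbook formulation of TF-IDF, where f_{t,d} is the raw count of term t in document d and N = |D| is the number of documents. The post does not say which variant is used at Urbanesia, so treat this as a sketch of the general idea rather than the exact scoring.

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}}, \qquad
\mathrm{idf}(t,D) = \log \frac{N}{\lvert \{ d \in D : t \in d \} \rvert}, \qquad
\mathrm{tfidf}(t,d,D) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t,D)
```

A term that appears in every document gets idf = log(N/N) = 0, which is what pushes ubiquitous words down the ranking while terms concentrated in a few documents rise to the top.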

Sadly, Python's NLTK does not have a full TF-IDF module (nltk.text.TextCollection offers only a basic tf_idf() method). It's not too difficult to implement, but a quick search brought me to this GitHub Gist. It was almost everything I needed, except that I don't want a keywords vs. document vs. documents comparison; I just want the important keywords, whether unigrams, bigrams, or trigrams. So the code needed some refactoring, which resulted in the code below.
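Here is a minimal, stdlib-only sketch of that refactored idea: score every unigram, bigram, and trigram across a set of tokenized tweets with TF-IDF and keep the top k. It is not the gist or my final code. Summing each n-gram's score across documents is an assumption made here to collapse per-document scores into a single keyword list, and tokenization plus stopword removal are assumed to have happened beforehand.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Ngram = Tuple[str, ...]


def ngrams_up_to(tokens: Sequence[str], max_n: int = 3) -> List[Ngram]:
    """All contiguous n-grams of length 1..max_n (unigrams to trigrams by default)."""
    grams: List[Ngram] = []
    for n in range(1, max_n + 1):
        grams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def top_keywords(docs: List[List[str]], k: int = 10, max_n: int = 3) -> List[Tuple[Ngram, float]]:
    """Rank n-grams over the whole collection by summed TF-IDF.

    Each doc is a pre-tokenized tweet. Scores are summed across documents
    (an assumption) so the result is one keyword list, not a per-doc comparison.
    """
    doc_grams = [Counter(ngrams_up_to(doc, max_n)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: how many docs contain each n-gram at least once.
    df: Counter = Counter()
    for grams in doc_grams:
        df.update(grams.keys())
    scores: Dict[Ngram, float] = {}
    for grams in doc_grams:
        total = sum(grams.values())
        for gram, count in grams.items():
            tf = count / total
            idf = math.log(n_docs / df[gram])
            scores[gram] = scores.get(gram, 0.0) + tf * idf
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]
```

For example, with docs = [["nltk", "python"], ["python", "twitter"], ["python", "bahasa"]], "python" appears in all three tweets, so its IDF is log(3/3) = 0 and it ranks below every other n-gram.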

For the code to run, you need to install NLTK's stopwords corpus. Here's how:

$ python
>>> import nltk
>>> nltk.download()

The downloader will show up either in the terminal or as a GUI if you're in a graphical environment. Since I also tweet in Bahasa Indonesia, I also need Indonesian stopwords, which I got from the Pebahasa GitHub repo. Put the gist below into ~/nltk_data/corpora/stopwords/indonesian
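Once the file is in place, a minimal sketch of using it follows. NLTK's stopword files are plain word lists with one word per line, so the helpers below (hypothetical names, stdlib only) read that format directly; the sample words in the usage are illustrative, not taken from the Pebahasa list.

```python
from pathlib import Path
from typing import Iterable, List, Set


def parse_stopwords(text: str) -> Set[str]:
    """Parse a one-word-per-line stopword list, ignoring blank lines."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def load_stopwords(path: str) -> Set[str]:
    """Load a stopword file such as ~/nltk_data/corpora/stopwords/indonesian."""
    return parse_stopwords(Path(path).expanduser().read_text(encoding="utf-8"))


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Drop tokens (case-insensitively) that appear in the stopword set."""
    return [tok for tok in tokens if tok.lower() not in stopwords]
```

With NLTK installed, `set(stopwords.words("english")) | set(stopwords.words("indonesian"))` (after `from nltk.corpus import stopwords`) builds the same combined set from the corpus directory, so English and Indonesian tweets can be filtered in one pass.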