Batista R. Harahap

CodeIgniter Session With Memcache + Anti Bots!

Last night was a thrilling change of routine. Urbanesia was crippled by the unprecedented growth of our MongoDB databases. I must admit that MongoDB is like Memcache on steroids, except this time it overdosed. MongoDB doesn't have any mechanism to limit its memory usage; the only limit we can define is the size of its individual files. Something had to be done!

The second flaw was in CodeIgniter, by design. By default, CodeIgniter uses its own session handling mechanism, which stores sessions in a cookie and, optionally, in a database. The supported databases are limited to the drivers available for CodeIgniter, so a few months ago we hacked it to use MongoDB.

This backfired because CodeIgniter, again by default, does not exclude bots from its session mechanism. Urbanesia is very attractive to bots, so most of our sessions belonged to bots, which means junk data. The garbage collector for sessions was also very primitive. We had to do something about this.

We wanted a fast, simple, yet elegant solution to the problems above. MySQL was out of the question, of course: insert/update activity would surely lock tables, and we can't afford that. So we turned to Memcache. Its most important built-in feature is the ability to cap memory usage. Once the cap is reached, the least recently used entries are evicted, which gives us a garbage collector for stale sessions without any extra code at all!

To my knowledge, there is no Memcache session handler available for CodeIgniter, so I went ahead and redid our whole MY_Session library to accommodate Memcache as our session storage engine. The first thing to do was to filter out the bots that frequently visit Urbanesia and deny them sessions; a cookie will do just fine for them.

function __detectVisit() {
    $this->CI->load->library('user_agent');
    // The user agent is lowercased, so every needle below must be lowercase too.
    $agent = strtolower($this->CI->input->user_agent());

    $bot_strings = array(
        "google", "bot", "yahoo", "spider", "archiver", "curl",
        "python", "nambu", "twitt", "perl", "sphere", "pear",
        "java", "wordpress", "radian", "crawl", "yandex", "eventbox",
        "monitor", "mechanize", "facebookexternal", "bingbot"
    );

    // A single substring hit is enough to classify the visitor as a bot.
    foreach($bot_strings as $bot) {
        if(strpos($agent, $bot) !== false) {
            return "bot";
        }
    }

    return "normal";
}
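
To try the matching logic outside CodeIgniter, here is a minimal standalone sketch. The function name classify_user_agent() and its signature are my own for illustration and are not part of the library. It lowercases both the user agent and each needle, so the casing of a list entry doesn't matter.

```php
<?php
// Hypothetical standalone helper (not part of MY_Session): classifies a
// user agent string as "bot" or "normal" by case-insensitive substring match.
function classify_user_agent($user_agent, array $bot_strings)
{
    $agent = strtolower($user_agent);

    foreach ($bot_strings as $bot) {
        // Skip empty needles and lowercase the rest before matching.
        if ($bot !== '' && strpos($agent, strtolower($bot)) !== false) {
            return 'bot';
        }
    }

    return 'normal';
}

// A shortened needle list, just for the demo.
$needles = array('google', 'bot', 'spider', 'curl', 'PEAR');

echo classify_user_agent('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', $needles), "\n";
echo classify_user_agent('Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0', $needles), "\n";
```

Note that an empty user agent comes back as "normal" here, same as in the method above. Whether that is what you want depends on your traffic.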

Yes, it's quite primitive, but it works, and it satisfied our need to filter out the most frequent bots. The next step was to build namespaces that take some of CodeIgniter's built-in session handling options into account.

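function __build_namespace($sess_id, $ip_addr = '', $user_agent = '') {
    $this->namespace .= $sess_id;
    // ip2long() returns FALSE for anything that is not a valid IPv4 address,
    // so test its result instead of comparing the address string against 0.
    $ip_long = ip2long((string) $ip_addr);
    if($this->sess_match_ip == TRUE && $ip_long !== FALSE && $ip_long !== 0)
        $this->namespace .= '#'.$ip_long;
    // Hash the user agent to keep the key short and free of spaces.
    if($this->sess_match_useragent == TRUE && $user_agent != '')
        $this->namespace .= '#'.md5($user_agent);
}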
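
To see what the resulting keys look like, here is a rough sketch of the same idea as a pure function. The name build_session_key(), the 'sess:' prefix, and the two boolean flags are assumptions for illustration; in the library these come from the object's properties.

```php
<?php
// Hypothetical pure-function variant of the namespace builder.
// $match_ip and $match_ua stand in for sess_match_ip / sess_match_useragent.
function build_session_key($prefix, $sess_id, $ip_addr = '', $user_agent = '', $match_ip = false, $match_ua = false)
{
    $key = $prefix . $sess_id;

    if ($match_ip) {
        // ip2long() gives FALSE for anything that is not a valid IPv4 address.
        $ip_long = ip2long($ip_addr);
        if ($ip_long !== false) {
            $key .= '#' . $ip_long;
        }
    }

    if ($match_ua && $user_agent !== '') {
        // md5() keeps the key short and free of whitespace.
        $key .= '#' . md5($user_agent);
    }

    return $key;
}

echo build_session_key('sess:', 'abc123'), "\n";
echo build_session_key('sess:', 'abc123', '127.0.0.1', '', true), "\n";
echo build_session_key('sess:', 'abc123', '127.0.0.1', 'Firefox', true, true), "\n";
```

Hashing the user agent matters because Memcache keys are limited to 250 bytes and may not contain whitespace, and raw user agent strings break both rules easily.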

The three parameters are all components of a standard CodeIgniter session. Since CodeIgniter gives us options like sess_match_ip and sess_match_useragent, it's important to build them into the namespace, which then acts as a filter of its own. One of the most difficult parts was deciding whether to store custom user data as JSON or as a serialized array. I decided on JSON in the end. Here's a code snippet that sets a session value in Memcache.

$this->CI->memsess->set($this->namespace, json_encode($this->userdata), $this->sess_expiration);
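
If you are weighing the same JSON versus serialize() choice, this small sketch shows the one difference that tends to bite when reading the data back: json_decode() returns a stdClass object unless you pass TRUE as the second argument. The sample $userdata array is made up for the demo.

```php
<?php
// Made-up session payload, only for illustration.
$userdata = array(
    'session_id' => 'abc123',
    'user_id'    => 42,
    'flags'      => array('admin' => false),
);

$json = json_encode($userdata);

$as_object = json_decode($json);        // stdClass: read with $as_object->user_id
$as_array  = json_decode($json, true);  // associative array: read with $as_array['user_id']

var_dump($as_object instanceof stdClass);
var_dump($as_array === $userdata);

// serialize()/unserialize() round-trips PHP arrays directly.
var_dump(unserialize(serialize($userdata)) === $userdata);
```

JSON has the advantage that non-PHP tools can read the stored sessions; serialize() preserves PHP-specific types without extra care.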

FYI, I used another library called memsess (short for Memcache Sessions, lol) to let me shard Memcache arrays. I wanted a dedicated Memcache instance used solely to store sessions. The main reason was to keep session data as tidy as possible, meaning no other data can push the session data out unless we tell it to. This makes the Memcache instance far more predictable. Most of the code was derived from CI_Session and modified to use Memcache as storage. I won't go into the full details of the library; instead, I'm going to give you the code for the sess_read() method. I'm pretty sure it's enough for you to experiment on your own.

function sess_read() {
    // Kick out bots!
    if($this->is_bot) {
        $this->sess_destroy();
        return FALSE;
    }

    $session = $this->CI->input->cookie($this->sess_cookie_name);

    if($session === FALSE) {
        return FALSE;
    }

    if ($this->sess_encrypt_cookie == TRUE) {
        $session = $this->CI->encrypt->decode($session);
    } else {
        $hash   = substr($session, strlen($session)-32);
        $session = substr($session, 0, strlen($session)-32);

        if ($hash !==  md5($session.$this->encryption_key)) {
            $this->sess_destroy();
            return FALSE;
        }
    }

    $session = $this->_unserialize($session);

    if (
        !is_array($session)
        OR ! isset($session['session_id'])
        OR ! isset($session['ip_address'])
        OR ! isset($session['user_agent'])
        OR ! isset($session['last_activity'])
    ) {
        $this->sess_destroy();
        return FALSE;
    }

    if (($session['last_activity'] + $this->sess_expiration) < $this->now) {
        $this->sess_destroy();
        return FALSE;
    }

    if ($this->sess_match_ip == TRUE AND $session['ip_address'] != $this->CI->input->ip_address()) {
        $this->sess_destroy();
        return FALSE;
    }

    if (
        $this->sess_match_useragent == TRUE
        AND trim($session['user_agent']) != trim(substr($this->CI->input->user_agent(), 0, 50))
        ) {
        $this->sess_destroy();
        return FALSE;
    }

    // Build namespace!
    $this->__reset_namespace();
    $this->__build_namespace($session['session_id'], $session['ip_address'], $session['user_agent']);

    $query = $this->CI->memsess->get($this->namespace);
    if(empty($query)) {
        $this->sess_destroy();
        return FALSE;
    }

    $row = json_decode($query);
    if(isset($row->user_data) AND $row->user_data != '') {
        $custom_data = $this->_unserialize($row->user_data);
        if(is_array($custom_data)) {
            foreach($custom_data as $key => $val) {
                $session[$key] = $val;
            }
        }
    }

    $this->userdata = $session;
    unset($session);

    return TRUE;
}
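
The unencrypted branch above relies on the cookie carrying a 32-character MD5 suffix computed from the payload plus the encryption key. Here is a minimal sketch of that round trip on its own. The helper names sign_cookie() and verify_cookie() are hypothetical, and I swapped the `!==` comparison for hash_equals() (PHP 5.6+), which compares in constant time.

```php
<?php
// Hypothetical helpers illustrating the "payload + md5(payload . key)" cookie format.
function sign_cookie($payload, $key)
{
    return $payload . md5($payload . $key);
}

// Returns the payload if the trailing hash matches, FALSE otherwise.
function verify_cookie($cookie, $key)
{
    // An MD5 hex digest is 32 characters; anything shorter has no payload.
    if (strlen($cookie) <= 32) {
        return false;
    }

    $hash    = substr($cookie, -32);
    $payload = substr($cookie, 0, -32);

    if (!hash_equals(md5($payload . $key), $hash)) {
        return false;
    }

    return $payload;
}

$cookie = sign_cookie('a:1:{s:10:"session_id";s:6:"abc123";}', 'my-secret-key');

var_dump(verify_cookie($cookie, 'my-secret-key') !== false);
var_dump(verify_cookie('x' . $cookie, 'my-secret-key'));
```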

There you go, a glimpse into session management in CodeIgniter with Memcache. This is the product of an experiment born out of necessity. I'm sure it can be done in smarter ways; the sky is the limit ;)

27 April 2011 by Batista Harahap on bots | ci | codeigniter | memcache | mongodb | session | urbanesia