Rolling Python 2.7 On CentOS 5.5

So, we have all heard the horror stories of replacing the 2.4 interpreter with a newer one that is not offered in any of the maintained repos. Clouds part, log files build, mutt inboxes overflow and yum (among many other binaries) basically die. It doesn’t sound very appealing at first glance, but maybe you need some cool modules that just aren’t compatible with 2.4x or possibly there have been some standard library additions that are absolutely necessary. Whatever the reason, replacing the 2.4x interpreter is f’n spooky and bad juju no matter how you smoke it.

But wait, does this mean that CentOS is just going to be limited to using such an old build of Python? Nope. We just need to set up a second interpreter and ensure it won’t interfere with the 2.4x version that seemingly everything else built into CentOS uses. It can take some searching and sifting to find the proper way to do this, but I found a slightly outdated method written by Matt Reiferson. I took this method and updated it for CentOS 5.5.

Some notes before we begin:

  • we will be using a fresh install of CentOS 5.5 with no packages selected at installation
  • we will be using the x86_64 build

Step 1
Let's log in to your box as the root user:

ssh root@yourbox.com
cd

Step 2
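Go ahead and install a third-party repo release package. Note that the package below is actually rpmforge-release, so this adds the RPMforge repo to yum rather than epel (extra packages for enterprise linux). Either way it gives us a running start at some normally unavailable packages. It also provides some updates to existing repo packages: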

rpm -Uhv http://apt.sw.be/redhat/el5/en/x86_64/rpmforge/RPMS//rpmforge-release-0.3.6-1.el5.rf.x86_64.rpm

Step 3
Install development tools, ssl and zlib. These development tools will allow us to build python properly as well as aid setuptools when using easy_install later in this tutorial. The ssl and zlib libraries will be found automatically when we build python and in turn become incorporated into the interpreter:

yum groupinstall 'Development Tools'
yum install openssl-devel* zlib*.x86_64

Step 4
Download all the files that we will need; sqlite, python and setuptools:

wget http://sqlite.org/sqlite-amalgamation-3.6.23.1.tar.gz
wget http://python.org/ftp/python/2.7/Python-2.7.tgz
wget http://pypi.python.org/packages/2.7/s/setuptools/setuptools-0.6c11-py2.7.egg

Step 5
Build and install SQLite:

cd
tar xfz sqlite-amalgamation-3.6.23.1.tar.gz
cd sqlite-3.6.23.1/
./configure
make
make install

Step 6
Build and install Python 2.7. Now, there are some important things to discuss here. First and foremost we have given the option --prefix=/opt/python2.7. This option installs the python binaries and the python library in /opt/python2.7 (it will make the dir for us) rather than in the default /usr/local/, where the new python could shadow the standard python interpreter and, as we stated above, be bad juju. The /opt directory in redhat based distributions is a directory that provides a home for larger, mostly custom built, binaries and applications.

Also, we made sure that the interpreter is going to make use of multiple threads by adding the --with-threads option. I believe that by default, with-threads is true, but better to be safe than sorry. Finally, the --enable-shared option builds a shared libpython, which allows python to be embedded into other apps:

cd
tar xfz Python-2.7.tgz
cd Python-2.7
./configure --prefix=/opt/python2.7 --with-threads --enable-shared
make
make install

Step 7
We now need to make sure that all the users of the system get the new interpreter when python is typed at the prompt. To do this, we will need to add a couple of aliases and an addition to the $PATH in each user's .bash_profile. This file is kept in the home directory of each user (ie: /home/usera/.bash_profile):

su - root
cd
nano .bash_profile
# add the following lines to the bottom of the file
alias python='/opt/python2.7/bin/python'
alias python2.7='/opt/python2.7/bin/python'
PATH=$PATH:/opt/python2.7/bin
# 'ctrl + o' to save the file and 'ctrl+x' to close the file
# now do the same for every other user, like this:
nano /home/usera/.bash_profile
# add the following lines to the bottom of the file
alias python='/opt/python2.7/bin/python'
alias python2.7='/opt/python2.7/bin/python'
PATH=$PATH:/opt/python2.7/bin
# 'ctrl + o' to save the file and 'ctrl+x' to close the file

Step 8
Now we need to tell the dynamic linker about the new shared libraries that we have put on the system. Let's add a config file that points to them and then reload the cache of the shared libraries (don’t actually type --> below. It is just to show that you are on a new line):

su - root
cd
cat >> /etc/ld.so.conf.d/opt-python2.7.conf
-->/opt/python2.7/lib	#hit 'enter' and then 'ctrl+d'
ldconfig

Step 9
Now let's roll up some setuptools. This will also give us our conduit to the cheese shop (aka: pypi) which I am a complete fan of, despite the naysayers. Also, we will add a symlink to the shared library inside the config directory:

cd
sh setuptools-0.6c11-py2.7.egg
cd /opt/python2.7/lib/python2.7/config
ln -s ../../libpython2.7.so .

Step 10
Ignore Bridgett Cherry.

Step 11
You are done, señor! Log out and log back in to pick up the new bash profile stuff and continue on your way!
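
Once you are logged back in, a quick sanity check can save some head scratching. The snippet below is just a sketch (the check_modules helper is my own name, not something from the steps above): it prints which interpreter is answering and whether the ssl, zlib and sqlite3 modules that we built support for can actually be imported.

```python
import importlib
import sys


def check_modules(names=("ssl", "zlib", "sqlite3")):
    """Return a dict mapping each module name to True if it imports cleanly."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Which binary is running, and which version is it?
    print("%s (%d.%d)" % (sys.executable, sys.version_info[0], sys.version_info[1]))
    for name, ok in sorted(check_modules().items()):
        print("%-8s %s" % (name, "ok" if ok else "MISSING"))
```

If everything went to plan the first line should point at /opt/python2.7/bin/python. A MISSING module usually means its -devel package was not installed before running ./configure.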

Monitor your django methods

So . . . it occurred to me that there wasn’t a real way to monitor the internals of a django app/project. Sure you can run munin (or any flavor of monitoring app) and watch apache response times, db hits, network lag and so on. All great, right? Well it turns out that if you are in the market for something “out of the box” to profile the actual methods inside your django shiz, then you are “out of luck”.

So, being inspired by some old code I found that showed promise of how to profile older django projects using hotshot and the need to know how I could “monitor django”, I came up with this 1.2 compliant profiler. Thanks to the original 0.96 dev for lighting the way:

import sys
import os
import re
import hotshot, hotshot.stats
import tempfile
import StringIO

from django.conf import settings

words_re = re.compile( r'\s+' )

group_prefix_re = [
    re.compile( "^.*/django/[^/]+" ),
    re.compile( "^(.*)/[^/]+$" ),
    re.compile( ".*" ),
]

class ProfileMiddleware(object):
    def process_request(self, request):
        if (settings.DEBUG or request.user.is_superuser) and 'lookie' in request.GET:
            self.tmpfile = tempfile.mktemp()
            self.prof = hotshot.Profile(self.tmpfile)

    def process_view(self, request, callback, callback_args, callback_kwargs):
        if (settings.DEBUG or request.user.is_superuser) and 'lookie' in request.GET:
            return self.prof.runcall(callback, request, *callback_args, **callback_kwargs)

    def get_group(self, file):
        for g in group_prefix_re:
            name = g.findall( file )
            if name:
                return name[0]

    def get_summary(self, results_dict, sum):
        list = [ (item[1], item[0]) for item in results_dict.items() ]
        list.sort( reverse = True )
        list = list[:40]

        res = "      tottime\n"
        for item in list:
            res += "%4.1f%% %7.3f %s\n" % ( 100*item[0]/sum if sum else 0, item[0], item[1] )

        return res

    def summary_for_files(self, stats_str):
        stats_str = stats_str.split("\n")[5:]

        mystats = {}
        mygroups = {}

        sum = 0

        for s in stats_str:
            fields = words_re.split(s);
            if len(fields) == 7:
                time = float(fields[2])
                sum += time
                file = fields[6].split(":")[0]

                if not file in mystats:
                    mystats[file] = 0
                mystats[file] += time

                group = self.get_group(file)
                if not group in mygroups:
                    mygroups[ group ] = 0
                mygroups[ group ] += time

        return "<pre>" + \
               " ---- By file ----\n\n" + self.get_summary(mystats,sum) + "\n" + \
               " ---- By group ---\n\n" + self.get_summary(mygroups,sum) + \
               "</pre>"

    def process_response(self, request, response):
        if (settings.DEBUG or request.user.is_superuser) and 'lookie' in request.GET:
            self.prof.close()

            out = StringIO.StringIO()
            old_stdout = sys.stdout
            sys.stdout = out

            stats = hotshot.stats.load(self.tmpfile)
            stats.sort_stats('time', 'calls')
            stats.print_stats()

            sys.stdout = old_stdout
            stats_str = out.getvalue()

            if response and response.content and stats_str:
                response.content = "<pre>" + stats_str + "</pre>"

            response.content = "\n".join(response.content.split("\n")[:40])

            response.content += self.summary_for_files(stats_str)

            os.unlink(self.tmpfile)

        return response

To use this middleware you will first have to install it inside of your django project and add it to the MIDDLEWARE_CLASSES tuple. Once this is in place all requests will travel through the middleware.

Now that every request is traveling through the newly created middleware, we want to see results right? To see a profile of any route inside your project simply add ?lookie to the end of the route's URL. Note that the middleware only kicks in when DEBUG is on or you are logged in as a superuser.

Once you do this your browser will be filled with lines of hotshot profiles on each method that was called to make your magic request . . . magical.

Notice that it even profiles the methods that make django work ;) . Hotshot is magical.

I wouldn’t recommend using this on anything live for too long as the middleware overhead might be too much for high traffic sites. You can always leave the code and just not include it inside the MIDDLEWARE_CLASSES tuple. Put it in place when things seem slow or you just want to test it out. Have fun and be safe people of the interweb.
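
If you want to see the capture trick outside of django, here is a minimal sketch of the same idea: run a callable under a profiler and grab the stats report as a string. Note the swap: it uses cProfile instead of hotshot (hotshot is gone in newer Pythons), and profile_call and work are names I made up for the example.

```python
import cProfile
import pstats

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Send the report to a string buffer instead of hijacking sys.stdout.
    out = StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("time", "calls").print_stats(10)
    return result, out.getvalue()


def work(n):
    """Toy workload standing in for a django view."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, report = profile_call(work, 100000)
    print(report)
```

Passing stream= to pstats.Stats avoids swapping sys.stdout around, which is one less thing to go wrong when several requests are in flight.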

Sanitize an abused list

It is often necessary to sanitize lists after substantial use and whilst using mixed sources. This is one of many ways you can achieve this with minimal effort. Apologies for the line return in the WP template.

    def flattenlist(L):
        import types
        WhiteTypes = ('StringType', 'UnicodeType', 'StringTypes', 'ListType', 'ObjectType', 'TupleType')
        BlackTypes = tuple( [getattr(types, x) for x in dir(types) if not x.startswith('_') and x not in WhiteTypes] )

        tmp = []
        def core(L):
            if  not hasattr(L,'__iter__'):
                return [L]
            else :
                for i in L:
                    if isinstance(i,BlackTypes):
                        tmp.append(i)
                        continue
                    if type(i) == type(str()):
                        tmp.append(i)
                    else:
                        core(i)
            return tmp
        return core(L)
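
For comparison, here is a stripped down sketch of the same flattening idea. It is not the function above: it only treats lists and tuples as containers and leaves everything else (strings included) alone, which is an assumption that may or may not suit your data.

```python
def flatten(items):
    """Recursively flatten nested lists/tuples into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            # Containers get unpacked, however deep they go.
            flat.extend(flatten(item))
        else:
            # Everything else, strings included, is kept whole.
            flat.append(item)
    return flat


if __name__ == "__main__":
    print(flatten([1, [2, (3, "ab")], [[4]]]))
```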

MongoDB ID’s and Django templates

So. Ran into a little frustration using MongoDB with Django; if you try to represent the _id field of a mongo object using something like {{ mongo_object._id }} or {{ mongo_object.id }}, you will get nothing but django barf, because django templates refuse to look up anything that starts with an underscore. To circumvent this you can simply create a template filter. Don’t be scared . . . it’s actually pretty simple. Let's get to it:

  1. first off create the following folder and files:
    mkdir -p /path/to/project/appName/templatetags
    touch /path/to/project/appName/templatetags/__init__.py
    touch /path/to/project/appName/templatetags/appName_tags.py

    It is important that you make the directory/files inside of an app folder and name the directory exactly templatetags. The file appName_tags.py can be named anything as long as it has a .py extension.

  2. now paste the following code inside of appName_tags.py:
    from django import template
    register = template.Library()
    
    @register.filter("mongo_id")
    def mongo_id(value):
        return str(value._id)
  3. all that is left is to utilize this filter. to do this, we simply load it inside of the corresponding template and voilà:
    <html>
    	<body>
    		{% load appName_tags %}
    		<p>here is your mongodb record id: {{ object|mongo_id }}</p>
    	</body>
    </html>

tip: custom filters can also take two arguments instead of the one shown above. check out this example below (referenced from the Django documentation):

def cut(value, arg):
    "Removes all values of arg from the given string"
    return value.replace(arg, '')
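
Since filters are just plain functions until they get registered, you can poke at them without django at all. A small sketch, where FakeRecord is a hypothetical stand-in for whatever object your mongo layer hands you:

```python
class FakeRecord(object):
    """Hypothetical stand-in for a MongoDB document object."""

    def __init__(self, _id):
        self._id = _id


def mongo_id(value):
    """Same body as the filter above, minus the django registration."""
    return str(value._id)


def cut(value, arg):
    """Removes all values of arg from the given string."""
    return value.replace(arg, "")


if __name__ == "__main__":
    print(mongo_id(FakeRecord("4c2209f1")))
    print(cut("a b c", " "))
```

In a template the two argument form is written as {{ some_text|cut:" " }}, with the argument after the colon.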

Python and Tokyo Dystopia

So . . . . after some back and forth with Qing, here is what is ready so far. I know it needs work, but I should be able to get some stuff ready for N-GRAM soon.

#include "Python.h"
#include <dystopia.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdint.h>

static PyObject *
put(PyObject *self,PyObject *args){
    const char *dbname;
    const char *stext;
    int kid;  /* the "i" format stores into a plain int, not a pointer */
    int ecode;
    bool result;
    TCIDB *idb;
    if (!PyArg_ParseTuple(args, "sis", &dbname, &kid, &stext))
        return NULL;
    /* create the object */
    idb = tcidbnew();
    /* open the database */
    if(!tcidbopen(idb, dbname, IDBOCREAT | IDBOWRITER)){
        ecode = tcidbecode(idb);
        fprintf(stderr, "open error: %s\n", tcidberrmsg(ecode));
    }
    result = tcidbput(idb,(int64_t)kid,stext);
    /* close the database */
    if(!tcidbclose(idb)){
        ecode = tcidbecode(idb);
        fprintf(stderr, "close error: %s\n", tcidberrmsg(ecode));
    }
    /* delete the object */
    tcidbdel(idb);
    return Py_BuildValue("b",result);
}

static PyObject *
search(PyObject *self, PyObject *args){
    const char *stext;
    const char *dbname;
    TCIDB *idb;
    int ecode, rnum, i;
    uint64_t *result;
    char *text;
    PyObject* pList;

    if (!PyArg_ParseTuple(args, "ss", &dbname, &stext))
        return NULL;

    /* create the object */
    idb = tcidbnew();

    /* open the database */
    if(!tcidbopen(idb, dbname, IDBOREADER | IDBONOLCK)){
        ecode = tcidbecode(idb);
        fprintf(stderr, "open error: %s\n", tcidberrmsg(ecode));
    }
    /* search records */
    result = tcidbsearch2(idb, stext, &rnum);
    pList = PyList_New(rnum);
    if(result){
        for(i = 0; i < rnum; i++){
            // printf("r[i]:%lld\n",result[i]);
            PyList_SetItem(pList, i, Py_BuildValue("i", (int)result[i]));
        }
        tcfree(result);
    } else {
        ecode = tcidbecode(idb);
        fprintf(stderr, "search error: %s\n", tcidberrmsg(ecode));
    }

    /* close the database */
    if(!tcidbclose(idb)){
        ecode = tcidbecode(idb);
        fprintf(stderr, "close error: %s\n", tcidberrmsg(ecode));
    }

    /* delete the object */
    tcidbdel(idb);

    /* return the list itself; Py_BuildValue("O", ...) adds an extra reference that would leak */
    return pList;
}

PyMethodDef methods[] = {
  {"search", search, METH_VARARGS},
  {"put", put, METH_VARARGS},
  {NULL, NULL},
};

void initpykhufu(){
    PyObject* m;
    m = Py_InitModule("pykhufu", methods);
}

Monitor NAS Space

This handy script can be run daily to monitor how much storage you have left. You will find this most useful when you are leasing space in a VPS environment such as slice host or the like. I added email to the script to send an alert when things go awry ;)

#!/bin/bash
#*** SET ME FIRST ***#
NASUSER="Your-User-Name"
NASPASS="Your-Password"
NASIP="nas.yourcorp.com"
NASROOT="/username"
NASMNTPOINT="/mnt/nas"
EMAILID="admin@yourcorp.com"
#*** END SET ME ***#

GETNASIP=$(host ${NASIP} | awk '{ print $4}')

# Default warning limit is set to 17GiB
LIMIT="17"

# Failsafe
[ ! -d ${NASMNTPOINT} ] && mkdir -p ${NASMNTPOINT}
mount | grep //${GETNASIP}/${NASUSER}

# if not mounted, just mount nas
[ $? -eq 0 ] && : || mount -t cifs //${NASIP}/${NASUSER} -o username=${NASUSER},password=${NASPASS} ${NASMNTPOINT}
cd ${NASMNTPOINT}

# get NAS disk space
nSPACE=$(du -hs|cut -d'G' -f1)
# Bug fix
# get around floating point by rounding off e.g 5.7G stored in $nSPACE
# as shell cannot do floating point
SPACE=$(echo $nSPACE | cut -d. -f1)

cd /
umount ${NASMNTPOINT}

# compare and send an email
if [ $SPACE -ge $LIMIT ]
then
        logger "Warning: NAS Running Out Of Disk Space [${SPACE} G]"
        mail -s 'NAS Server Disk Space' ${EMAILID} <<EOF
NAS server [ mounted at $(hostname) ] is running out of disk space!!!
Current allocation ${SPACE}G @ $(date)
EOF
else
    logger "$(basename $0) ~ NAS server ${NASIP} has sufficient disk space for backup!"
fi
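
The compare step at the end is the part most likely to bite you, so here is a rough sketch of it in python. Be aware of the differences: used_gib and needs_alert are names I made up, and os.statvfs reports usage for the whole filesystem behind a path, whereas the du call in the script walks the directory tree. The limit of 17 mirrors the LIMIT in the script.

```python
import os


def used_gib(path):
    """Whole GiB in use on the filesystem that holds path (rounded down)."""
    st = os.statvfs(path)
    used_bytes = (st.f_blocks - st.f_bfree) * st.f_frsize
    return used_bytes // (1024 ** 3)


def needs_alert(used, limit=17):
    """True once usage reaches the limit, like the -ge test in the script."""
    return used >= limit


if __name__ == "__main__":
    mount_point = "/"  # assumption: swap in your NAS mount point
    used = used_gib(mount_point)
    print("%s: %d GiB used, alert=%s" % (mount_point, used, needs_alert(used)))
```

Integer division does the same job as the cut -d. -f1 rounding in the shell version.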

Port Forwarding

To continue the security talks I decided to show you all how to create a nonpartisan port forwarder. This daemon will allow eleven constant active connections. Just make sure that your destination is able to interpret the data properly.

#!/usr/bin/env python

#
#     all modules in the stdlib
#

import socket
import sys
import thread

def main(setup, error):
    sys.stderr = file(error, 'a')
    for settings in parse(setup):
        thread.start_new_thread(server, settings)
    lock = thread.allocate_lock()
    lock.acquire()
    lock.acquire()

def parse(setup):
    settings = list()
    for line in file(setup):
        parts = line.split()
        settings.append((parts[0], int(parts[1]), int(parts[2])))
    return settings

def server(*settings):
    try:
        dock_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        dock_socket.bind(('', settings[2]))
        dock_socket.listen(5)
        while True:
            client_socket = dock_socket.accept()[0]
            server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            server_socket.connect((settings[0], settings[1]))
            thread.start_new_thread(forward, (client_socket, server_socket))
            thread.start_new_thread(forward, (server_socket, client_socket))
    finally:
        thread.start_new_thread(server, settings)

def forward(source, destination):
    string = ' '
    while string:
        string = source.recv(1024)
        if string:
            destination.sendall(string)
        else:
            source.shutdown(socket.SHUT_RD)
            destination.shutdown(socket.SHUT_WR)

if __name__ == '__main__':
    main('proxy.cfg', 'error.log')
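
The script reads proxy.cfg but never shows one. Going by parse() and server(), each line looks like it should be: destination host, destination port, local port to listen on. Here is a sketch of that with a made up sample (the addresses are documentation placeholders, and skipping blank and # lines is my own addition, the original parse() does not do that):

```python
SAMPLE_CFG = """\
# destination-host destination-port local-port
192.0.2.10 80 8080
192.0.2.11 22 2222
"""


def parse_lines(lines):
    """Turn config lines into (host, remote_port, local_port) tuples."""
    settings = []
    for line in lines:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # addition: ignore blank lines and comments
        settings.append((parts[0], int(parts[1]), int(parts[2])))
    return settings


if __name__ == "__main__":
    for host, remote_port, local_port in parse_lines(SAMPLE_CFG.splitlines()):
        print("listen on %d -> forward to %s:%d" % (local_port, host, remote_port))
```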

Keep Functions In Memory Space

So you have a ton of functions that you are running and importing them at the time of execution seems . . . . legacy? Why not throw their results into memory. Real memory. While I haven’t actually benchmarked it, the performance increase should be drastic for functions that get called over and over with the same arguments. I have commented inside the code, because my fingers are too tired to explain it again:

#!/usr/bin/env python
# Functions can be memoised "by hand" using a dictionary to hold
# the return values when they are calculated:

# Here is a simple case, using the recursive Fibonacci function
#     f(n) = f(n-1) + f(n-2)

fib_memo = {}
def fib(n):
    if n < 2: return 1
    if not fib_memo.has_key(n):
        fib_memo[n] = fib(n-1) + fib(n-2)
    return fib_memo[n]

# To encapsulate this in a class, use the Memoize class:

class Memoize:
    """Memoize(fn) - an instance which acts like fn but memoizes its arguments
       Will only work on functions with non-mutable arguments
    """
    def __init__(self, fn):
        self.fn = fn
        self.memo = {}
    def __call__(self, *args):
        if not self.memo.has_key(args):
            self.memo[args] = self.fn(*args)
        return self.memo[args]

# And here is how to use this class to memoize fib(). Note that the definition
# for fib() is now the "obvious" one, without the caching code obscuring
# the algorithm.
def fib(n):
    if n < 2: return 1
    return fib(n-1) + fib(n-2)

fib = Memoize(fib)

# For functions taking mutable arguments, use the cPickle module, as
# in class MemoizeMutable:

class MemoizeMutable:
    """MemoizeMutable(fn) - an instance which acts like fn but memoizes its arguments
       Will work on functions with mutable arguments (slower than Memoize)
    """
    def __init__(self, fn):
        self.fn = fn
        self.memo = {}
    def __call__(self, *args):
        import cPickle
        str = cPickle.dumps(args)
        if not self.memo.has_key(str):
            self.memo[str] = self.fn(*args)
        return self.memo[str]
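
Since I never benchmarked it, here is a cheap way to at least see the cache doing its job: count how many times the real function body runs. This is a sketch that reuses the Memoize idea as a decorator; the calls counter is my own addition.

```python
class Memoize(object):
    """Cache results of fn keyed by its (hashable) positional arguments."""

    def __init__(self, fn):
        self.fn = fn
        self.memo = {}

    def __call__(self, *args):
        if args not in self.memo:
            self.memo[args] = self.fn(*args)
        return self.memo[args]


calls = {"count": 0}  # how many times the undecorated body actually ran


@Memoize
def fib(n):
    calls["count"] += 1
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    print(fib(30))
    print(calls["count"])
```

With the cache in place the body runs once per distinct n, so fib(30) costs 31 calls instead of well over a million.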

touch: manipulative and useful

The touch tool is a very basic one that can also be very helpful. If you are in a POSIX compliant environment (unix/linux/osx) then you will most likely have this tool built-in. touch at the core is a tool that creates files. 0-byte files to be exact. If you execute the touch tool giving only the name of a file (non-existent) then a file will be created with said name. To explain the tool a little better we will use an Apple OS X (10.5) environment to play with:

touch /Users/alfred/Desktop/my-new-doc.docu

There are two important things to notice in the above line. One is that I gave no arguments to touch other than the path to the file. Two is that I named the file with a four character extension. Now, this is something of importance: file extensions do not matter in a POSIX environment in most every case. I will write another post on that later, but please take note on the fact that they mean nothing ;)

Now. What we did there was created a zero byte file named my-new-doc.docu on my desktop. We can list the contents of my desktop and see that the file is in fact zero bytes:

alfreds-macbook-pro:~ alfred$ ls -ltr /Users/alfred/Desktop/
total 9174280
-rw-rw-r--  1 alfred  staff  4693598208 Aug 25  2008 h-lcbtrhel5_r.iso
-rw-r--r--  1 alfred  staff     2381114 Oct 15  2008 DSC01258.JPG
-rw-r--r--@ 1 alfred  staff      617742 Feb  4 13:56 IMG_0236.jpg
-rw-r--r--@ 1 alfred  staff      418850 Feb 26 22:13 IMG_0985.JPG
-rw-r--r--@ 1 alfred  staff         585 Apr 27 10:57 stugots.rtf
-rw-r--r--@ 1 alfred  staff       23552 Apr 29 14:52 Rainbow Spreadsheet.xlt
drwxr-xr-x  9 alfred  staff         306 Apr 29 14:53 live-stuff
-rw-r--r--@ 1 alfred  staff         390 May  7 11:09 synuse.py
-rw-r--r--  1 alfred  staff          98 May 13 19:44 new.txt
-rw-r--r--  1 alfred  staff           0 May 15 11:00 my-new-doc.docu
alfreds-macbook-pro:~ alfred$

OK. Now I have shown you what touch does when a filename is given with no other arguments and the file doesn’t exist. What if the file does exist . . . . . :

alfreds-macbook-pro:~ alfred$ touch /Users/alfred/Desktop/my-new-doc.docu
alfreds-macbook-pro:~ alfred$ ls -ltr /Users/alfred/Desktop/my-new-doc.docu
-rw-r--r--  1 alfred  staff  0 May 15 11:20 /Users/alfred/Desktop/my-new-doc.docu
alfreds-macbook-pro:~ alfred$

Can you figure out what happened there? Look at line #3 above. If you are unfamiliar with the ls tool (‘list’), it is showing the modification time and that modification time is different from the first time we listed the file. What does that mean?

It means that touch simply changed the modification time of the file. It did not destroy the contents of the file. It didn’t recreate the file. It merely changed the modification time to match the exact moment the tool was executed. You might ask why this is useful and to be quite honest. . . . it isn’t really. What is useful are the optional arguments that the touch tool allows :)

The best argument and the only one that I am going to cover is -t

-t allows you to specify the modification time. That means you can say I would like the “last modified” time to be June 18th, 1938 at 12:46 pm EST. Now I think everyone will be able to tell how this can be useful. Let's see how we can use this:

touch -t 193806181246 /Users/alfred/Desktop/my-new-doc.docu

Take note of the date format that I gave the -t argument. When using -t you should format your time as follows:

Century Year Month Date Hour Minute Second*
19 38 06 18 12 46 29


*seconds are entirely optional. I didn’t even use them, but I wanted to show that you can if you needed to. If you do use them, they go after a dot, as in 193806181246.29

This is what the file looks like in Finder inside of OSX:

touch results in finder

Now I think everyone can see how effective, helpful and manipulative the touch tool can be. Whether you just need to make an empty file of any kind or need to send a file to someone that you did a long time ago . . . . . touch is the tool for you.
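
If you ever need the same trick from inside a script, python can do it with os.utime. This is only a sketch: the touch function below is my own helper (not a stdlib call), it takes the timestamp as YYYYMMDDhhmm, and I used a 2009 date in the example because dates before 1970 are not handled the same way on every platform.

```python
import os
import time


def touch(path, when=None):
    """Create path if missing; set its times to `when` (YYYYMMDDhhmm) or to now."""
    # Opening in append mode creates the file without clobbering its contents.
    with open(path, "a"):
        pass
    if when is None:
        os.utime(path, None)  # None means "right now"
    else:
        stamp = time.mktime(time.strptime(when, "%Y%m%d%H%M"))
        os.utime(path, (stamp, stamp))  # (access time, modification time)


if __name__ == "__main__":
    touch("my-new-doc.docu", "200906181246")
    print(time.ctime(os.path.getmtime("my-new-doc.docu")))
```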

zip, gZip and bZip2: all in one

Ever wanted to just be able to extract these on the fly without having to specify the compression type on the command line? Here is a simple way to do so. It has its flaws (it goes by the file extension and doesn’t actually read the compression type), but it should work most of the time:

#!/usr/bin/env python

import os
import tarfile
import zipfile

def extract_file(path, to_directory='.'):
    if path.endswith('.zip'):
        opener, mode = zipfile.ZipFile, 'r'
    elif path.endswith('.tar.gz') or path.endswith('.tgz'):
        opener, mode = tarfile.open, 'r:gz'
    elif path.endswith('.tar.bz2') or path.endswith('.tbz'):
        opener, mode = tarfile.open, 'r:bz2'
    else:
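        raise ValueError("Could not extract `%s` as no appropriate extractor is found" % path)

    # resolve the archive path before changing directory so relative paths still work
    path = os.path.abspath(path)
    cwd = os.getcwd()
    os.chdir(to_directory)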

    try:
        file = opener(path, mode)
        try: file.extractall()
        finally: file.close()
    finally:
        os.chdir(cwd)

All of the modules are in the standard library just in case you were wondering.
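
About that flaw: the standard library can sniff the archive type from the file contents, so the extension does not have to be trusted. Here is a sketch of that approach (detect_archive and extract_any are names I made up). It leans on zipfile.is_zipfile, tarfile.is_tarfile and the 'r:*' mode of tarfile.open, which picks the compression on its own.

```python
import os
import tarfile
import zipfile


def detect_archive(path):
    """Return 'zip', 'tar' or None based on the file contents, not the name."""
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"  # covers plain, gzip and bzip2 compressed tarballs
    return None


def extract_any(path, to_directory="."):
    """Extract a zip or tar archive into to_directory."""
    kind = detect_archive(path)
    if kind == "zip":
        with zipfile.ZipFile(path) as archive:
            archive.extractall(to_directory)
    elif kind == "tar":
        # 'r:*' lets tarfile work out the compression by itself.
        with tarfile.open(path, "r:*") as archive:
            archive.extractall(to_directory)
    else:
        raise ValueError("Could not extract `%s`: unknown archive type" % path)
```

As with the original, only point this at archives you trust, since extractall will happily write whatever paths the archive contains.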
