Shon’s blog ɹ ɐ ɥ ʞ ǝ ɥ s en-us Mon, 21 Dec 2015 00:00:00 +0530 <![CDATA[Postgres Array vs Join benchmark]]> Postgres Array vs Join benchmark

Here is a little experiment to measure the performance of PostgreSQL arrays. As the example problem, let us take blog posts and tags.

Join approach

This is perhaps the more common approach to modelling posts and tags. Let’s define the models; here I am using the excellent Peewee. We have three tables: Post, Tag and PostTag. The PostTag table holds one record per post-to-tag association.

class Post(BaseModel):
    title = CharField(default='example title')

class Tag(BaseModel):
    name = CharField()

class PostTag(BaseModel):
    post = ForeignKeyField(Post)
    tag = ForeignKeyField(Tag)
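To try the join side of this schema without a Postgres server, here is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in (an assumption made only for illustration; SQLite has no array type, so just the join approach carries over). The table and column names mirror what Peewee creates for the models above, and the tiny data set is made up.

```python
import sqlite3


def build_demo_db():
    """Create post/tag/posttag tables in memory with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posttag (
            id INTEGER PRIMARY KEY,
            post_id INTEGER REFERENCES post (id),
            tag_id INTEGER REFERENCES tag (id)
        );
        -- index on the column the lookup filters on
        CREATE INDEX posttag_tag_id ON posttag (tag_id);
    """)
    conn.executemany("INSERT INTO tag (id, name) VALUES (?, ?)",
                     [(i, "tag-%d" % i) for i in range(1, 11)])
    # Made-up associations: post 1 -> tags 1, 8; post 2 -> tag 8; post 3 -> tags 2, 3
    links = {1: [1, 8], 2: [8], 3: [2, 3]}
    for post_id, tag_ids in links.items():
        conn.execute("INSERT INTO post (id, title) VALUES (?, 'example title')", (post_id,))
        conn.executemany("INSERT INTO posttag (post_id, tag_id) VALUES (?, ?)",
                         [(post_id, t) for t in tag_ids])
    return conn


def count_posts_with_tag(conn, tag_id):
    """Same shape as the join benchmark query: post INNER JOIN posttag."""
    row = conn.execute(
        "SELECT COUNT(post.id) FROM post "
        "INNER JOIN posttag ON (post.id = posttag.post_id) "
        "WHERE posttag.tag_id = ?",
        (tag_id,),
    ).fetchone()
    return row[0]


conn = build_demo_db()
print(count_posts_with_tag(conn, 8))  # 2
```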


Array approach

PostgreSQL supports array columns. In this model the array field Post.tags holds a post’s tags, replacing the PostTag model; the Tag model is not even needed in this case.

class Post(BaseModel):
    title = CharField(default='example title')
    tags = ArrayField(CharField, default=list, index=True)
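The array query relies on PostgreSQL’s @> ("contains") operator, which is true when every element of the right-hand array also appears in the left-hand array, regardless of order or duplicates. A plain-Python sketch of that rule, for intuition only; it assumes flat arrays without NULLs and says nothing about how the GIN index makes the lookup fast:

```python
def array_contains(left, right):
    """Mimic PostgreSQL's `left @> right` for flat, NULL-free arrays."""
    # Order and duplicate elements do not matter, so a subset test suffices
    return set(right) <= set(left)


post_tags = ['tag-3', 'tag-8', 'tag-42']
print(array_contains(post_tags, ['tag-8']))           # True
print(array_contains(post_tags, ['tag-8', 'tag-8']))  # True
print(array_contains(post_tags, ['tag-8', 'tag-9']))  # False
```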

Complete code

import random

from tqdm import tqdm
from peewee import *
from playhouse.postgres_ext import ArrayField
from myapp import db  # db is assumed to be a playhouse.postgres_ext.PostgresqlExtDatabase

class BaseModel(Model):
    class Meta:
        database = db
        only_save_dirty = True

class Post(BaseModel):
    title = CharField(default='example title')
    # index=True on an ArrayField creates a GIN index, which @> can use
    tags = ArrayField(CharField, default=list, index=True)

class Tag(BaseModel):
    name = CharField()

class PostTag(BaseModel):
    post = ForeignKeyField(Post)
    tag = ForeignKeyField(Tag)

def setup():
    no_of_posts = 25000
    no_of_tags = 10000
    tags_per_post = 15

    # drop in dependency order, then recreate
    for t in (PostTag, Tag, Post):
        if t.table_exists():
            t.drop_table()

    for t in (Tag, Post, PostTag):
        t.create_table()

    # tag ids are assigned 1, 2, ... in insertion order, so 'tag-N' gets id N
    tags = [{'name': ('tag-%d' % i)} for i in range(1, no_of_tags)]
    Tag.insert_many(tags).execute()

    posts = [{'id': i, 'tags': [('tag-%d' % j) for j in random.sample(range(1, no_of_tags), tags_per_post)]}
             for i in range(1, no_of_posts)]
    Post.insert_many(posts).execute()

    # join approach: one PostTag row per (post, tag) pair
    for post in tqdm(posts):
        post_id = post['id']
        post_tags = [{'post': post_id, 'tag': int(tag.split('-')[1])} for tag in post['tags']]
        PostTag.insert_many(post_tags).execute()

    print('Total posts: %d\nTotal tags: %d\nTags per post: %d\n' % (no_of_posts, no_of_tags, tags_per_post))

def test_join():
    # => SELECT Count(post.id) FROM post INNER JOIN posttag ON (post.id = posttag.post_id) \
    #    WHERE (posttag.tag_id = 8);
    return Post.select().join(PostTag).where(PostTag.tag == 8).count()

def test_array():
    # => SELECT Count("id") FROM post WHERE tags @> '{tag-8}';
    return Post.select().where(Post.tags.contains('tag-8')).count()

Needless to say, selecting the tags for an article would be faster, as we are eliminating the joins. But it is interesting to see how finding the articles for a given tag performs.

And here are the numbers on my machine (Mac Air, Ubuntu 15.10, Python 2.7.9).

$ python -i bench.py
>>> setup()
Total posts: 25000
Total tags: 1000
Tags per post: 15

$ python -mtimeit -s'import bench' 'bench.test_join()'
100 loops, best of 3: 8.32 msec per loop

$ python -mtimeit -s'import bench' 'bench.test_array()'
1000 loops, best of 3: 869 usec per loop
Mon, 21 Dec 2015 00:00:00 +0530 <![CDATA[Android Apps I use]]> Android Apps I use

These are Android apps that I find super useful. Of course, the list excludes the most popular ones like Gmail and Facebook; these are somewhat lesser-known apps (or ones only now getting popular).




My Tracks








Fri, 04 Dec 2015 00:00:00 +0530 <![CDATA[Crapbali]]> Crapbali

So on Friday afternoon we decided to watch Bahubali. We are a bunch of geeks who code for a living.


Tickets were a bit expensive, but we were really curious after reading positive reviews all over (five stars and all). The expectation was to watch a desi Lord of the Rings equivalent. So the winning argument in favour of watching Bahubali was “if they have spent 250 cr making it, it should be worth spending an extra 150 Rs. to watch it”. The argument is silly, but fine, everyone really wanted to watch it. So we applied a 20% discount offer on BookMyShow and booked the tickets.

We reached in time, and it began. The waterfall scenes were mesmerizing. For the first few minutes we were convinced that Bahubali is no ordinary boy .. he has great powers. Fine. He finds the mask and decides to reach the top. Fine. Then he reaches the top of the waterfall and the love story begins.

Oh man .. from then until the interval it was crap [Crapbali]. The tattoo in flowing water, the heroine leading warriors and then convincing the tribe leader to have Bahubali lead them .. all so unconvincing, even with creative freedom. We were looking at each other, wondering what crap we were watching.

Alright, post interval Bada Bahubali appears and things get a bit more interesting. The battle scene was epic and most of us did enjoy it. But it wasn’t without flaws either. How come Rani maa could watch such a huge battlefield as if she were on some balcony watching a football or cricket match .. and many more such logical questions.

Then, as we were walking back to the office, we stopped at a cafe and the discussions started:

PD: Arey, me and Pi have a question for you
ME: Ok..
Pi: Who was the old lady who carried baby Bahubali through stream
ME: Umm.. had same question in my mind but let me think
RR: See Rani maa was not Bahubali’s real mother she was Duggubati’s mother, right?
ME: You mean Bada Bahubali..
Pi: What about Devsena

Finally we came to some conclusion.

There were more discussions and debates as we ordered some food. Two things everyone agreed on:

  • First half was utter crap
  • Second half was entertaining for sure but with many flaws

Best scene in the movie

Kattappa doing Shahrukh Khan in the scene where he runs to kill chota Bahubali, changes his mind midway as he is running, and slides


They could have done a better job with the dialogues. I mean

Mera vachan hi shaasan hai

wait .. what?

The songs are forgettable; not sure why they are even there.

Good movies stay within their genre’s acceptable limits. For example, many Govinda/Jim Carrey comedies don’t try to be serious. They often do one thing and do it well. Epic/superhero/disaster movies have great graphics but most of the time do not try to be a stupid love story.

Bahubali appeared to be an intense historical fiction filled with struggles for power, battles and politics. Sure, there is a place for tender love and human relations, provided they are convincing. You can’t turn on stupid mode for an hour-long love story and then switch to serious mode when the second half begins.

Bahubali is a big-budget movie, so there was no lack of resources to make a great movie. Still, you can’t help feeling that the epic battle scene is its only (technological) achievement; other than that, the movie just disappoints on all fronts.

And if you ask me to rate the movie... well, two stars, or maybe just one star.

Also read Wogma’s review.

Sat, 01 Aug 2015 00:00:00 +0530 <![CDATA[Namecheap to AWS Route53 DNS Migration story]]> Namecheap to AWS Route53 DNS Migration story

System administration is not my job, but sometimes I need to wear that hat to help the team. Here is how I managed the DNS migration from Namecheap to AWS Route53.

Get the Zone file

Namecheap support was kind enough to send me the zone file when I requested one.

Format the zone file

  • Remove the whitespace at the beginning of every line in the zone file
  • Add the text below as the first line of the zone file
  • Make sure all TXT record values are quoted
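These formatting steps are mechanical, so they can be scripted. A minimal sketch, assuming a plain BIND-style zone file; `$ORIGIN example.com.` is only a placeholder for the first line, so substitute whatever text your setup needs. The TXT detection is a simple token heuristic (hypothetical helpers `quote_txt` and `format_zone`), not a full zone-file parser.

```python
def quote_txt(line):
    """Wrap an unquoted TXT value in double quotes (simple heuristic)."""
    tokens = line.split()
    # the record type sits within the first four tokens: name [ttl] [class] TXT value
    for i, tok in enumerate(tokens[:4]):
        if tok.upper() == "TXT":
            value = " ".join(tokens[i + 1:])
            if value and not value.startswith('"'):
                return " ".join(tokens[:i + 1]) + ' "%s"' % value.replace('"', '\\"')
            break
    return line


def format_zone(text, first_line="$ORIGIN example.com."):
    """Apply the three formatting steps to the text of a zone file."""
    out = [first_line]  # placeholder first line; replace with your own
    for line in text.splitlines():
        # strip leading whitespace, then quote TXT values
        out.append(quote_txt(line.lstrip()))
    return "\n".join(out) + "\n"


raw = "   @ 1800 IN A 192.0.2.1\n   @ 1800 IN TXT v=spf1 ~all\n"
print(format_zone(raw))
```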

Install cli53

pip install cli53

Zone export

cli53 create
cli53 import --file


Find out your name server: log on to the AWS Route 53 console and look up the NS entries. Pick one. In my case it was

pip install \
dns_compare -z --file \
    --server -t false
dns_compare -z --file \
    --server # use your aws ns
# -- OR --- use -t to ignore ttl differences
dns_compare -z --file \
    --server -t false
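The idea behind this check can be sketched offline: given the records parsed from the zone file and the records returned by the new name server, report anything missing or different, optionally ignoring TTLs. This is a hypothetical `compare_records` helper working on already-fetched data; it is not dns_compare’s code and performs no DNS queries.

```python
def compare_records(expected, actual, ignore_ttl=False):
    """Compare two record sets (no DNS lookups).

    Each argument maps (name, rtype) -> (ttl, sorted tuple of values).
    Returns a list of human-readable mismatch descriptions.
    """
    problems = []
    for key, (ttl, values) in sorted(expected.items()):
        if key not in actual:
            problems.append("missing %s %s" % key)
            continue
        other_ttl, other_values = actual[key]
        if values != other_values:
            problems.append("values differ for %s %s" % key)
        elif not ignore_ttl and ttl != other_ttl:
            problems.append("ttl differs for %s %s: %d != %d" % (key + (ttl, other_ttl)))
    return problems


zone = {("example.com.", "A"): (1800, ("192.0.2.1",)),
        ("example.com.", "MX"): (1800, ("10 mail.example.com.",))}
route53 = {("example.com.", "A"): (300, ("192.0.2.1",))}
print(compare_records(zone, route53))
print(compare_records(zone, route53, ignore_ttl=True))
```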



Be aware that the steps below will cause some downtime (a number of hours).

  • Go to namecheap’s “Transfer DNS to Webhost” page for your domain. Add new name servers. Save.
watch dig -t ANY @
Sun, 21 Dec 2014 00:00:00 +0530 <![CDATA[UI Testing and BDD]]> UI Testing and BDD

Recently I had an opportunity to automate UI testing of the web application we have developed, and I found this excellent UI testing framework, Splinter. Since it is written in Python, I was instantly comfortable with it.

# From Splinter's website
from splinter import Browser

with Browser() as browser:
    url = ""
    browser.visit(url)
    browser.fill('q', 'splinter python acceptance testing')
    # find the search button by its name and click it
    button = browser.find_by_name('btnG')
    button.click()
    if browser.is_text_present(''):
        print("Yes, the official website was found!")
    else:
        print("No, it wasn't found... We need to improve")

Ah, sweet. It supports multiple web drivers, including remote ones, so it’s possible to integrate it with Sauce Labs and test on web browsers that are not on your dev box.

Also, having access to a live browser session from Python makes it even more pleasant.

I happened to read about BDD (behavior-driven development) and soon stumbled upon behave. It’s a good idea to use it for testing, as it provides a nice separation between test cases and their implementation. It uses the Gherkin language to describe test scenarios.

Something like the feature below, which even a non-technical person on your team can write:

Feature: SEO Test

    Scenario: Search Google for Splinter
        When I visit ""
        And I fill in "q" with "Splinter Python"
        And I press "btnG"
        Then I should see "splinter.cobrateam" within 5 seconds

    Scenario: Search Google for Shekhar's Blog
        # would fail
        When I visit ""
        And I fill in "q" with "Shekhar Tiwatne"
        And I press "btnG"
        Then I should see "" within 5 seconds

Environment Setup

To make things easier

pip install --upgrade splinter behave behaving
mkdir -p features/steps
touch features/steps/
mv features

Create features/steps/ with the code below. It essentially imports the step implementations.

from behave import step
from behaving.web.steps import *
from behaving.personas.steps import *


We are ready. Simply run behave and watch a Firefox window execute all the tests for you. You should see one test pass and one fail, as intended.


What next

Read how to implement steps (the grammar) here.
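At its core, a step implementation maps a line of Gherkin text to a Python function and passes the quoted parts in as arguments. A toy, stdlib-only registry shows the idea; `toy_step` and `run_toy_step` are hypothetical and far simpler than behave’s own matchers and context object.

```python
import re

TOY_STEPS = []  # (compiled regex, function) pairs


def toy_step(pattern):
    """Register a step; "{name}" placeholders capture quoted text.

    The rest of the pattern is used as a regex as-is, so keep it free of
    regex metacharacters.
    """
    regex = re.sub(r'\{(\w+)\}', r'(?P<\1>[^"]*)', pattern)

    def register(func):
        TOY_STEPS.append((re.compile('^' + regex + '$'), func))
        return func
    return register


def run_toy_step(text, context):
    # drop the Gherkin keyword, then call the first matching step
    body = re.sub(r'^(Given|When|Then|And|But)\s+', '', text.strip())
    for regex, func in TOY_STEPS:
        match = regex.match(body)
        if match:
            return func(context, **match.groupdict())
    raise LookupError('no step matches: %r' % text)


@toy_step('I fill in "{field}" with "{value}"')
def fill_in(context, field, value):
    context[field] = value


@toy_step('I press "{button}"')
def press(context, button):
    context.setdefault('pressed', []).append(button)


context = {}
run_toy_step('When I fill in "q" with "Splinter Python"', context)
run_toy_step('And I press "btnG"', context)
print(context)  # {'q': 'Splinter Python', 'pressed': ['btnG']}
```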

Thu, 19 Jun 2014 00:00:00 +0530 <![CDATA[Visapur trek | Route]]> Visapur trek | Route

Visapur is one of the twin forts, Lohgad and Visapur, near Lonavala. Visapur is less popular than Lohgad but no less beautiful. However, the route to Visapur often confuses even regular trekkers. While we were planning the Lohgad trek last month, I couldn’t find any route on the Internet, so I tracked the route myself.

I used My Tracks android application by Google.

About the route: as described on this blog, there are four ways to Visapur. We took the ‘Patan gaon’ option.

From Pune we boarded the 6.30am Mumbai local and reached Malvali at around 7.30am. Our friends coming from Mumbai had to go to Lonavala and then come back to Malvali. From Malvali station we crossed the railway tracks and started walking towards Patan gaon. After the station there is a flyover from which you can see the Pune-Mumbai express highway.

After a short walk Visapur is visible, and then in another hour and a half we were at the top.

Note that the return route is different. That is because while returning we took a different route and lost the path, but managed with a local villager’s help. I wouldn’t suggest returning that way.

Here is the route

Wed, 10 Jul 2013 00:00:00 +0530 <![CDATA[JSON-RPC]]> JSON-RPC

The JSON-RPC protocol has got much less attention than it deserves. It is so elegant and simple. Our experience of working with JSON-RPC was pleasant.

For the uninitiated, JSON-RPC is a lightweight remote procedure call protocol, similar to XML-RPC. I find it incredibly useful for building easy-to-maintain applications.

We used JSON-RPC effectively in our project Cowoop to make the application easy to debug.

It is often seen that, unless it is an open source application, very little attention is paid to maintainability during the design phase. With many not-so-clearly separated layers, the application becomes increasingly difficult to debug. This makes bug fixing a painful, joyless process for those working on it, and in most cases they are not the architects who designed the application: the architect has either moved on to design some other project or is working on the next release.

Let’s jump directly to example code. Here is my Python function:

>>> def add(a, b):
...     return a + b
...
>>> add(1, 2)
3

The project exposes the add function above using JSON-RPC. We use Flask + jsonrpc2 to serve JSON-RPC over HTTP.
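On the wire, a JSON-RPC 2.0 call to add is just a small JSON object, and the reply echoes the request id. A minimal stdlib sketch of the server side of that exchange, assuming single (non-batch) requests; it is not how jsonrpc2 is implemented, and it does not special-case notifications (requests without an id), which under the specification get no response.

```python
import json


def add(a, b):
    return a + b


METHODS = {"add": add}  # procedures exposed over JSON-RPC, by name


def _error(req_id, code, message):
    return json.dumps({"jsonrpc": "2.0", "error": {"code": code, "message": message}, "id": req_id})


def handle(raw):
    """Handle one JSON-RPC 2.0 request string; return the response string."""
    try:
        req = json.loads(raw)
    except ValueError:
        return _error(None, -32700, "Parse error")
    if not isinstance(req, dict) or req.get("jsonrpc") != "2.0":
        return _error(None, -32600, "Invalid Request")
    req_id = req.get("id")
    func = METHODS.get(req.get("method"))
    if func is None:
        return _error(req_id, -32601, "Method not found")
    params = req.get("params", [])
    try:
        # positional params arrive as a JSON array, named ones as an object
        result = func(**params) if isinstance(params, dict) else func(*params)
    except TypeError:
        # crude: a wrong argument count surfaces as TypeError
        return _error(req_id, -32602, "Invalid params")
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": req_id})


print(handle('{"jsonrpc": "2.0", "method": "add", "params": [1, 2], "id": 1}'))
# {"jsonrpc": "2.0", "result": 3, "id": 1}
```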

Let us see how the jQuery JSON-RPC plugin calls this API.




The jsonrpc function you see in the screenshot above is part of our JS client library. It is really just a few-line wrapper on top of the jQuery JSON-RPC plugin’s jsonRPC.request() function.

Do you think JSONRPC2 is fairly successful in helping create a maintainable application?



There is no word in the JSONRPC2 specification about authentication yet (not a complaint), but I think it is necessary for the further success of JSON-RPC. It’s possible to use HTTP auth, but not many would prefer it, so I see people implementing two types of solutions.

Authentication | Using cookies

The session id is kept in an auth cookie and is sent and validated with every HTTP request.

Authentication | Using special parameters in rpc call

The session id is passed as a special parameter in every RPC call. For example, the add function above may be invoked like this:

add(1, 2, _session='somesessionid')
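With the special-parameter approach, the check can live in one place on the server. A minimal sketch, assuming an in-memory session store (`SESSIONS`) and a hypothetical `require_session` decorator; neither comes from jsonrpc2 or Cowoop.

```python
import functools

SESSIONS = {"somesessionid": "alice"}  # hypothetical store: session id -> user


class AuthError(Exception):
    pass


def require_session(func):
    """Pop the _session keyword, validate it, then call the real procedure."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        session_id = kwargs.pop("_session", None)
        if session_id not in SESSIONS:
            raise AuthError("invalid or missing session")
        return func(*args, **kwargs)
    return wrapper


@require_session
def add(a, b):
    return a + b


print(add(1, 2, _session='somesessionid'))  # 3
```

The procedure itself never sees `_session`, so its signature stays the same as the unauthenticated version.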

Authentication | Ideal

Sun, 13 Jan 2013 00:00:00 +0530 <![CDATA[Choosing MVC(MVVM) library for Cowspa]]> Choosing MVC(MVVM) library for Cowspa

In the initial phase of this open source project, Cowspa, developed by Cowoop, we had chosen knockout.js as an important client-side component. However, during implementation the team bowed to delivery pressure and never got a chance to learn, explore and use Knockout or any other MVC library. Certainly a mistake! Ultimately we had to pay the price of maintaining a really complex, bug-prone, pure jQuery-based codebase. At this stage I decided to rectify this and started looking at different client-side templating and MVC libraries: essentially, a library that binds templates to data and auto-updates them as the data changes. I was stunned by the choice of libraries and frameworks available; there are just too many of them. The ones I liked are

Ember is perhaps the best one, even according to this post. Angular is very capable and sheer magic. And there are many more, like the excellent backbone.js. We finally narrowed it down to two options: Knockout and Agility. Agility I liked most, mainly because of its clear syntax, which looked like it would produce very maintainable code, so it was really tempting to go for it. But for us at this point, Knockout scored better, mainly on project maturity (age): Agility is at 0.1.2 at the moment and Knockout at 2.0.0, so Knockout enjoys a much bigger community.

So while we go ahead with Knockout for now, I will certainly keep an eye on Agility’s development and would love to find an excuse to use it.
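The core idea behind these libraries, data that notifies whatever is bound to it when it changes, fits in a few lines of Python. This toy `Observable` is only an illustration of the pattern; it is not Knockout’s or Agility’s API.

```python
class Observable:
    """A value that calls its subscribers whenever it changes."""

    def __init__(self, value):
        self._value = value
        self._subscribers = []

    def get(self):
        return self._value

    def set(self, value):
        if value != self._value:  # notify only on real changes
            self._value = value
            for callback in self._subscribers:
                callback(value)

    def subscribe(self, callback):
        self._subscribers.append(callback)
        callback(self._value)  # render once with the current value


# A "view" bound to the data: re-rendered automatically on every change
rendered = []
member_name = Observable("Alice")
member_name.subscribe(lambda name: rendered.append("<span>%s</span>" % name))
member_name.set("Bob")
print(rendered)  # ['<span>Alice</span>', '<span>Bob</span>']
```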

Wed, 04 Apr 2012 00:00:00 +0530 <![CDATA[Configuring your ubuntu for faster internet access]]> Configuring your ubuntu for faster internet access

While a lot has already been written about this, here is my quick how-to:

$ sudo bash
# apt-get install dnsmasq squid
# echo "listen-address=" >> /etc/dnsmasq.conf
# echo "no-dhcp-interface=" >> /etc/dnsmasq.conf
# vi /etc/dhcp3/dhclient.conf
# # ^ uncomment line #prepend domain-name-servers;
# vi /etc/resolv.conf  # Add nameserver
# /etc/init.d/dnsmasq restart
# vi /etc/squid/squid.conf

http_port 3128
visible_hostname localhost

acl all src

cache_effective_user proxy
cache_effective_group proxy

http_access allow all
icp_access allow all

positive_dns_ttl 1 month
negative_dns_ttl 1 minute
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on

cache_dir ufs /cache 400 16 256
cache_store_log none

# mkdir /cache # I have this dir on a reiserfs partition
# chown proxy.proxy /cache

# /etc/init.d/squid restart

Configure your browser to use the Squid proxy on port 3128 (the http_port set above). Also read the detailed dnsmasq setup article.

Fri, 04 Feb 2011 00:00:00 +0530 <![CDATA[xfce and ubuntuone]]> xfce and ubuntuone

I do like Ubuntu Netbook Remix’s UI. However, with 10.04 it has become so unstable for me:

  • After login, when the system prompts for the keyring secret, the UNR environment crashes and drops to GNOME. I have to log in again if I need the UNR environment.
  • After I removed a few packages, it could not even start the GNOME panel, causing great inconvenience. I guess this is due to Evolution’s integration with the latest Ubuntu. The mail client I like and use is Thunderbird; I can’t switch to Evolution.
  • Initially, after the 10.04 release, it was damn slow to respond, so I had to apply some workarounds to get it to an acceptable speed.

Considering all that, I decided to switch to Xfce. It just worked like a charm. But I also use (and like :) ) the UbuntuOne service for my backups. UbuntuOne is not integrated with Xfce, and you can’t do everything from UbuntuOne’s CLI.

For more details you might want to check Ubuntu One wiki .

Fri, 26 Nov 2010 00:00:00 +0530 <![CDATA[Reliance Netconnect Broadband+ on Linux]]> Reliance Netconnect Broadband+ on Linux
  • It works. Make sure while purchasing you inform them that you use Linux
  • It’s fast and reliable in Pashan, Pune area
  • Below config worked for me on Ubuntu 9.10 AND 10.04
  • There are Linux drivers on the CD, but I could not get them working on Ubuntu 9.10.
  • For activation, I had to use Windows :(
Fri, 26 Nov 2010 00:00:00 +0530 <![CDATA[Redis patterns | search]]> Redis patterns | search


You want to implement search against user objects stored in Redis, using Python: something like querying for all user ids whose username begins with “an”.


Here we have user objects stored as hashes with “user:obj:” as the key prefix.

For example

We need some extra data structures to support our search, i.e. searching user objects whose username begins with a given phrase. So a search for jo should match John, Joe and so on. We will use a sorted set of all usernames and assign every element a score. The score is a float and helps us find the matching words.

Some scores, for example:

So, for the four strings above, if we find the strings whose score is >= 0.097 and < 0.098, we get all the strings that begin with ‘a’.
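The score trick can be checked without a Redis server. This sketch keeps (score, username) pairs in a sorted Python list and uses bisect to emulate a range-by-score query over the half-open range [score(prefix), score(next prefix)), where the next prefix bumps the prefix’s last character (e.g. "sh" becomes "si"). It assumes ASCII usernames and a non-empty prefix, so every character fits in exactly three digits; since floats keep only about 15 significant digits, only short prefixes are reliable.

```python
import bisect


def get_score(s):
    # each character becomes its 3-digit code point: 'ab' -> 0.097098
    return float('0.' + ''.join('%03d' % ord(c) for c in s))


def next_prefix(prefix):
    # sorts after every string starting with prefix: 'sh' -> 'si'
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def build_index(usernames):
    return sorted((get_score(u), u) for u in usernames)


def search(index, prefix):
    # first entry with score >= lower bound, first entry with score >= upper bound
    lo = bisect.bisect_left(index, (get_score(prefix), ''))
    hi = bisect.bisect_left(index, (get_score(next_prefix(prefix)), ''))
    return [name for _, name in index[lo:hi]]


index = build_index(['abc', 'ab', 'a', 'shekhar', 'shon', 'sh', 'b'])
print(search(index, 'a'))   # ['a', 'ab', 'abc']
print(search(index, 'sh'))  # ['sh', 'shekhar', 'shon']
```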


# Search usernames that begin with a given phrase
# usernames: sorted set of usernames, each scored by get_score(username)
# user:obj:<id>: hash { id: int, username: string }
import redis

usernames_zset = "usernames"

def my_ord(c):
    # every (ASCII) character becomes exactly three digits
    return "%03d" % ord(c)

def get_score(s):
    return '0.' + ''.join(map(my_ord, s))

def get_next_score(s):
    # score of the first string after all strings starting with s:
    # bump the last character, e.g. 'sh' -> 'si'
    return get_score(s[:-1] + chr(ord(s[-1]) + 1))

def add_user(conn, username, score):
    # The user object
    uid = conn.incr('user:idgen')
    conn.hset('user:obj:%d' % uid, 'id', uid)
    conn.hset('user:obj:%d' % uid, 'username', username)
    # data structure needed for the search
    # (redis-py >= 3.0 takes a {member: score} mapping)
    conn.zadd(usernames_zset, {username: score})

def add_test_data(conn):
    # scores are stored as doubles (~15-17 significant digits), so only about
    # the first five characters affect a score; the two long names below
    # end up with the same score
    test_data = ('abc', 'ab', 'a', 'shekhar', 'shon', 'sh',
                 'zxcvbnmasdfghjklqwertyuiop0', 'zxcvbnmasdfghjklqwertyuiop00')

    for username in test_data:
        add_user(conn, username, get_score(username))

conn = redis.Redis()
add_test_data(conn)

# conn.zrange(usernames_zset, 0, -1) # Whole set
a_score = get_score('a')
b_score = get_next_score('a')

print('Find all users starting with "a" -> +inf')
print(conn.zrangebyscore(usernames_zset, a_score, '+inf'))
print('Find all users starting with "a"')
# '(' makes the upper bound exclusive
print(conn.zrangebyscore(usernames_zset, a_score, '(' + b_score))
print('Find all users starting with "a" limit 2')
print(conn.zrangebyscore(usernames_zset, a_score, '(' + b_score, 0, 2))


This is to demonstrate a simple Redis pattern and how to use it from Python.

See Also

There are already some good writeups on related topics.

Fri, 26 Nov 2010 00:00:00 +0530 <![CDATA[All izz well?]]> All izz well?

Surely one of the most over-hyped films. What was good? Aamir, the music and the first-half laughs. Post interval it becomes a predictable, boring, idiot Bollywood movie.

The Munnabhai films were certainly better. I would say watch it on TV, or at least don’t pay three times the usual price for tickets like we did. Four and a half stars from the critics, hmmmm..

Fri, 26 Nov 2010 00:00:00 +0530 <![CDATA[Unicode]]> Unicode

Pulling your hair out over some i18n bug, or you fixed it but can’t explain what was wrong? This is a little help in getting a fair idea about Unicode, codecs, encoding, decoding, etc.

Quick tips:

  1. It does not make sense to have a string without knowing what encoding it uses.
  2. UTF-8 is a way of storing a string of Unicode code points as bytes.
  3. Encoding: transforming a Unicode object into a sequence of bytes.
  4. Decoding: recreating the Unicode object from the sequence of bytes. There are many different methods for how this transformation can be done (these methods are also called encodings).
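The four tips in Python 3 terms, where str holds code points and bytes holds encoded data. Decoding the same bytes with the wrong codec shows why tip 1 matters:

```python
text = 'café'                  # 4 code points
data = text.encode('utf-8')    # encoding: code points -> bytes
print(len(text), len(data))    # 4 5  ('é' takes two bytes in UTF-8)
print(data)                    # b'caf\xc3\xa9'
print(data.decode('utf-8'))    # decoding with the right codec: café
print(data.decode('latin-1'))  # wrong codec, classic mojibake: cafÃ©
```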


Must Read



Sat, 28 Mar 2009 00:00:00 +0530 <![CDATA[Youtube flash videos to DivX (on Linux)]]> Youtube flash videos to DivX (on Linux)

This is how I convert Flash videos. I usually use the Firefox VideoHelper add-on to download YouTube videos. To play them on my Philips DVP5986K DVD player from a USB drive, I need to convert them to DivX.

mencoder /home/shon/Desktop/file-864260998.flv -ovc lavc -oac mp3lame -ffourcc DX50 -o out.avi
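To convert a whole folder of downloads, the same mencoder invocation can be wrapped in a short Python script. A sketch, assuming mencoder is on PATH and that writing `<name>.avi` next to each `.flv` is acceptable; `mencoder_cmd` and `convert_folder` are hypothetical helpers and pass only the flags from the command above.

```python
import subprocess
from pathlib import Path


def mencoder_cmd(src):
    """Build the mencoder argument list for one .flv file (same flags as above)."""
    src = Path(src)
    out = src.with_suffix('.avi')
    return ['mencoder', str(src), '-ovc', 'lavc', '-oac', 'mp3lame',
            '-ffourcc', 'DX50', '-o', str(out)]


def convert_folder(folder):
    for flv in sorted(Path(folder).glob('*.flv')):
        # check=True stops on the first failed conversion
        subprocess.run(mencoder_cmd(flv), check=True)
```

For example, `convert_folder('/home/shon/Desktop')` converts every .flv file on the desktop.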

Sun, 15 Mar 2009 00:00:00 +0530 <![CDATA[Tata Indicom USB Modem on Linux]]> Tata Indicom USB Modem on Linux

cat /etc/wvdial.conf:

[Dialer Defaults]
Init1 = ATZ
Init2 = ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0
Modem Type = USB Modem
Baud = 460800
New PPPD = yes
Modem = /dev/ttyACM0
ISDN = 0
Stupid mode = 1
Phone = #777
Password = internet
Username = internet

Don’t understand the above? Um, OK, but I am too lazy to explain.

Sat, 14 Mar 2009 00:00:00 +0530 <![CDATA[LinkedIn backlash]]> LinkedIn backlash

LinkedIn is one of the few sites that has certainly impressed me with its clever design. I would rate it very highly for professional networking. It has one very popular feature: “recommendations”. I am not against recommending others or being recommended, as I have done both in the past. But I see people who think that the more people they have in their list (no matter how well they know each other professionally) and the more recommendations they have received (mostly by requesting others), the better their prospects will be. Umm, oh, I wonder why they are so madly after this. I receive quite a few such requests. Some morning you check your email and find that someone who was a colleague at your company two years ago has sent you a mail with the subject “can you endorse me?”. And the email says something like this:

Dear ,

I’m sending this to ask you for a brief recommendation of my work that I can include in my LinkedIn profile. If you have any questions, let me know.

Thanks in advance for helping me out. -

Now this guy could be someone whose skills I don’t know that well. But I can’t deny the request, so in a day or two I look at some recommendations written for my other LinkedIn friends, copy some material and send what was requested. He happily accepts and sends me a nice thank-you email. I see people who worked in completely unrelated departments, and probably have no ability to judge others’ work, go around praising colleagues out of good relationships. Maybe that belongs in Orkut testimonials or somewhere similar. Do these people with tens and hundreds of contacts in their network and so many recommendations have no work other than hopping jobs and sending such requests?

Next time I interview a guy with many endorsements, I will probably be more cautious about hiring him.

Sat, 14 Mar 2009 00:00:00 +0530 <![CDATA[My open source projects]]> My open source projects
  • Cowoop: Open source application to manage coworking business.
  • httpagentparser: Python HTTP Agent Parser.
  • SPHC: Simple Pythonic HTML Creator.
  • Syncer: An event daemon based on Pyro.
  • Stockie: A personal portfolio manager for an investor
Tue, 17 Feb 2009 00:00:00 +0530 <![CDATA[Qemu networking setup]]> Mon, 16 Feb 2009 00:00:00 +0530 <![CDATA[Using DOT language to produce Flowchart]]> Using DOT language to produce Flowchart

Better than struggling with graphical tools.
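Since DOT is plain text, it can also be generated from data once a chart grows. A small sketch with a hypothetical `to_dot` helper that takes nodes and edges shaped like the hand-written example that follows; the output can be saved to a file and rendered with the same dot command.

```python
def _q(text):
    # escape double quotes inside DOT string attributes
    return text.replace('"', '\\"')


def to_dot(name, nodes, edges):
    """Render nodes {id: (label, shape)} and edges [(src, dst, label or None)] as DOT."""
    lines = ['digraph %s {' % name]
    for node_id, (label, shape) in nodes.items():
        lines.append('    %s [label="%s", shape="%s"]' % (node_id, _q(label), shape))
    for src, dst, label in edges:
        attrs = ' [label="%s"]' % _q(label) if label else ''
        lines.append('    %s -> %s%s' % (src, dst, attrs))
    lines.append('}')
    return '\n'.join(lines)


print(to_dot('FlowChart',
             {'greet': ('Hello, techie', 'oval'),
              'which_os': ('What OS do you use?', 'diamond'),
              'like_me': ('Great, me too!', 'oval')},
             [('greet', 'which_os', None),
              ('which_os', 'like_me', 'I use Linux')]))
```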

$ cat
digraph FlowChart {

 node [
         fontname = "Bitstream Vera Sans"
         fontsize = 8
         shape = "record"
 ]

 edge [
         fontname = "Bitstream Vera Sans"
         fontsize = 8
         fontcolor = "Red"
 ]

// all blocks
greet [label="Hello, techie", shape="oval"]
which_os [label="What OS do you use?" shape="diamond"]
like_me [label="Great, me too!", shape="oval"]
which_browser [label="You must be using firefox", shape="diamond"]
ff [label="Cool", shape="oval"]
bye [label="Bye", shape="oval"]

// relations
greet -> which_os
which_os -> like_me [label="I use Linux"]
which_os -> which_browser [label="I use Windows"]
which_browser -> ff [label="Right"]
which_browser -> bye [label="what firefox?"]
}

Here is the result.

$ dot -Tpng -o test.png && eog test.png
../../../_images/002.png ]]>
Tue, 09 Dec 2008 00:00:00 +0530 <![CDATA[Getting older, getting better and better!]]> Getting older, getting better and better!

Python programming is a joy. I was stuck on Python 2.3 at work for a long time and never really got the chance to explore later versions. Now that I have the opportunity, while re-architecting the product, I have started exploring them. I am more than excited looking at deque, groupby, defaultdict and much more... On top of that, excellent Python software like Twisted, SQLAlchemy and TurboGears makes it even cooler.

It’s a bit of a pity that the language is still somewhat less recognized than others, or that there are more hyped languages around.
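For anyone in the same boat, a quick taste of the three features mentioned above (collections.deque and itertools.groupby arrived in Python 2.4, collections.defaultdict in 2.5); written for Python 3:

```python
from collections import deque, defaultdict
from itertools import groupby

# deque: O(1) appends and pops at both ends, with an optional max length
recent = deque(maxlen=3)
for page in ['home', 'blog', 'about', 'contact']:
    recent.append(page)
print(list(recent))  # ['blog', 'about', 'contact']

# groupby: groups *consecutive* items with the same key, so sort first
words = sorted(['apple', 'bob', 'avocado', 'banana'], key=lambda w: w[0])
groups = {k: list(g) for k, g in groupby(words, key=lambda w: w[0])}
print(groups)  # {'a': ['apple', 'avocado'], 'b': ['bob', 'banana']}

# defaultdict: missing keys get a default value from the factory
counts = defaultdict(int)
for ch in 'hello':
    counts[ch] += 1
print(dict(counts))  # {'h': 1, 'e': 1, 'l': 2, 'o': 1}
```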

Thu, 17 Apr 2008 00:00:00 +0530 <![CDATA[Contract verification in Python]]> Contract verification in Python
import zope.interface
import zope.interface.verify

class ITest(zope.interface.Interface):
    def foo(arg1): pass
    def bar(): pass

class Test(object):
    def foo(self): pass

class Test2(object):
    def foo(self, arg1): pass

class Test3(object):
    def foo(self, arg1): pass
    def bar(self): pass

for cls in (Test, Test2, Test3):
    try:
        # tentative=True: check the methods only, not an implements() declaration
        if zope.interface.verify.verifyClass(ITest, cls, tentative=True):
            print("OK: %s correctly implements %s" % (cls.__name__, ITest.__name__))
    except Exception as err:
        print("Error detected with %s's implementation: %s" % (cls.__name__, err))
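Part of what verifyClass checks can be approximated with the standard library’s inspect module. A rough, hypothetical `verify_methods` helper, which only checks that each required method exists and accepts the required number of positional arguments; zope.interface does considerably more.

```python
import inspect


def verify_methods(cls, required):
    """Check cls against {method name: number of args excluding self}.

    Returns a list of problems; an empty list means the class looks compatible.
    """
    problems = []
    for name, nargs in sorted(required.items()):
        method = getattr(cls, name, None)
        if not callable(method):
            problems.append('%s: missing method %s' % (cls.__name__, name))
            continue
        try:
            # a dummy self plus nargs dummy positional arguments
            inspect.signature(method).bind(None, *([None] * nargs))
        except TypeError:
            problems.append('%s: %s does not accept %d argument(s)' % (cls.__name__, name, nargs))
    return problems


class Test(object):
    def foo(self): pass


class Test2(object):
    def foo(self, arg1): pass


class Test3(object):
    def foo(self, arg1): pass
    def bar(self): pass


ITEST = {'foo': 1, 'bar': 0}  # hypothetical stand-in for ITest
for cls in (Test, Test2, Test3):
    print(cls.__name__, verify_methods(cls, ITEST) or 'OK')
```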
Thu, 17 Apr 2008 00:00:00 +0530