Small metaprogram using Python

If you have used a lot of software, you have probably come across programs that show a Tips dialog with a checkbox saying "Show tip on startup." If it annoys you, you uncheck it, and the dialog won't appear the next time.

Guess how that is programmed. There must be some kind of properties file or database that stores a boolean (or some other piece of data) for this setting. Every time the application loads, it checks that variable, needlessly wasting cycles. The application doesn't really remember that you said you don't want to see any tips; it's like an unintelligent being (which is what it is; after all, it's just a set of instructions). Metaprograms can be made to look intelligent (from the developer's point of view).
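For illustration, the conventional approach looks something like this (a minimal sketch; the file name settings.txt and the show_tips key are made up):

import os

SETTINGS_FILE = "settings.txt"  # hypothetical properties file

def show_tips_enabled():
    # Default to showing tips until the user has opted out.
    if not os.path.exists(SETTINGS_FILE):
        return True
    f = open(SETTINGS_FILE)
    value = f.read().strip()
    f.close()
    return value != "show_tips=false"

# This check runs on every single startup, forever.
if show_tips_enabled():
    print "Tip of the day: ..."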

This program is a very basic example. It prints "Hello" and "Good Morning", and it asks whether you want to be greeted with "Hello" every time. If you say no, it removes the related code from itself. IT CHANGES ITSELF! (Does that sound like The Matrix? I, Robot? Bicentennial Man?) It changes itself like the frogs in the Jurassic Park novel.

CODE:

def deleteHello():
    # Rewrite this script's own source file, dropping every line that
    # ends with the "deleted#" marker comment.
    f = open("sixthsense.py","r")
    output = []
    for line in f:
        if not line.endswith("deleted#\n"):
            output.append(line)
    f.close()
    f = open("sixthsense.py","w")
    f.writelines(output)
    f.close()

print "Hello" #line that might be deleted#
raw_input()   #line that might be deleted#
print "Good Morning"
raw_input()
choice = raw_input("Should I greet you with Hello always? (y/N)")  #line that might be deleted#
if choice != "y":  #line that might be deleted#
    deleteHello()   #line that might be deleted#

Remember to copy the file before running it, since, as you now know, it can change itself. Note that the script rewrites "sixthsense.py" by name, so it must be saved as sixthsense.py.

PS:

One interesting thing happened: I made a foolish mistake. Before settling on this condition,

 if not line.endswith("deleted#\n"):

I actually used this line:

 if not "#line" in line:

That was really stupid, because the condition line itself satisfies the condition and gets removed. :P
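You can see the problem in the interpreter (the string here stands in for that line of source):

>>> line = '        if not "#line" in line:\n'
>>> "#line" in line
True
>>> line.endswith("deleted#\n")
False

The buggy condition matches its own source line, so the script deletes it; the endswith check only matches lines tagged with the full marker comment.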


A web crawler and a Twitter crawler using Python

I still haven't started studying Python properly, but I've been doing fun tasks with it and learning along the way. I have made a web crawler that, given a URL, crawls through the child links recursively in depth-first order.

Web Crawler Code:


import urllib,string,re,sys

DEPTH = 0      # current recursion depth
MAXDEPTH = 0   # depth limit taken from the command line
SITENO = 0     # unique identifier handed out to each crawled site
LINKS = []     # every URL seen so far, to avoid visiting twice

if len(sys.argv) < 3:
    print "Usage: python crawl.py <url> <depth>"
    sys.exit()
else:
    MAXDEPTH = int(sys.argv[2])

def findLinks(url,parent,score):
    global DEPTH, SITENO, LINKS, MAXDEPTH
    DEPTH += 1
    if DEPTH > MAXDEPTH:
        DEPTH -= 1
        return
    SITENO += 1
    print "DEPTH: %d\tSiteNo: %d\tSite: %s\tParent: %d\tScore: %d" % (DEPTH,SITENO,url,parent,score)
    LINKS.append(url)
    try:
        body = urllib.urlopen(url).read()
    except IOError:
        # Dead or unreachable link: skip it instead of crashing.
        DEPTH -= 1
        return
    try:
        # Drop everything before <body> so links in the page header are ignored.
        start = string.index(body,"<body")
        body = body[start:]
    except ValueError:
        pass
    links = re.findall('''href=["']([^"']+)["']''', body, re.I)
    links = [link for link in links
             if link.startswith("http://")
             and not (link.endswith(".xml") or link.endswith(".css") or link.endswith(".js"))
             and link not in LINKS]
    if not links:
        DEPTH -= 1
        return
    # Each child's score is this page's score times its number of children.
    score *= len(links)
    parent = SITENO
    for link in links:
        findLinks(link,parent,score)
    DEPTH -= 1

findLinks(sys.argv[1],0,1)


Usage:

Run "python crawl.py http://en.wikipedia.org/ 4"

Or start with a depth of 1 or 2 and work upwards for a better understanding of how it works. Don't give it a big depth unless your processor has enough juice; I couldn't run this on my PC, so I ran it on a high-speed server where I have shell access. The script displays a set of fields. The depth of the root node is 1. SiteNo is a unique identifier for each site. A site's Parent field refers to its parent's SiteNo. Score (just for fun) starts at 1 for the root node, and the score of a child node is the score of its parent multiplied by the number of links on the parent page (that is, by the child's sibling count, itself included).
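To make the Score rule concrete, here is a toy sketch (the tree below is made up; it is not real crawl output):

# A child's score = parent's score * number of children (sibling count).
def show_scores(node, score=1, depth=1):
    print "DEPTH: %d\tSite: %s\tScore: %d" % (depth, node["url"], score)
    children = node.get("children", [])
    for child in children:
        show_scores(child, score * len(children), depth + 1)

# Hypothetical tree for illustration only.
tree = {"url": "root",
        "children": [{"url": "a", "children": [{"url": "a1"}, {"url": "a2"}]},
                     {"url": "b"}]}
show_scores(tree)

The root prints with score 1, its two children with 1 * 2 = 2, and a's two children with 2 * 2 = 4.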

[Image: Crawl]

Now, combining this with my previous Twitter script, I wrote a Twitter crawler that crawls through friend nodes in a similar networked-tree fashion. Too bad there's a limit of only 150 requests per hour per client on Twitter.

Twitter Crawler Code:

import urllib,sys
try:
    import json
except ImportError:
    import simplejson as json

DEPTH = 0      # current recursion depth
MAXDEPTH = 0   # depth limit taken from the command line
IDNO = 0       # unique identifier handed out to each user
NAMES = []     # every screen name seen so far, to avoid visiting twice

if len(sys.argv) < 3:
    print "Usage: python twittertree.py <username> <depth>"
    sys.exit()
else:
    MAXDEPTH = int(sys.argv[2])

def findFriends(name,parentidno):
    global DEPTH, MAXDEPTH, IDNO, NAMES
    DEPTH += 1
    if DEPTH > MAXDEPTH:
        DEPTH -= 1
        return
    IDNO += 1
    print "DEPTH: %d\tID_No: %d\tUser_ID: %s\tParent_ID_No: %d" % (DEPTH,IDNO,name,parentidno)
    NAMES.append(name)
    try:
        jsondata = json.loads(urllib.urlopen('https://api.twitter.com/1/friends/ids.json?screen_name='+name).read())
    except IOError:
        DEPTH -= 1
        return
    friendsidlist = []
    try:
        # users/lookup accepts at most 100 ids per request, so cap the list.
        friendsidlist = ",".join(map(str,jsondata['ids'][:100]))
    except KeyError:
        print "Rate Limit exceeded.. Only 150 requests per hour allowed.."
        sys.exit()
    jsondata2 = json.loads(urllib.urlopen('https://api.twitter.com/1/users/lookup.json?user_id='+friendsidlist).read())
    friendsnamelist = []
    try:
        friendsnamelist = [each['screen_name'] for each in jsondata2 if each['screen_name'] not in NAMES]
    except TypeError:
        pass
    if not friendsnamelist:
        DEPTH -= 1
        return
    parentidno = IDNO
    for eachfriendname in friendsnamelist:
        findFriends(eachfriendname,parentidno)
    DEPTH -= 1

findFriends(sys.argv[1],0)


Run "python twittertree.py vigneshwaranr 2"
