Friday, May 6, 2011

Defining dynamic functions to a string

I have a small Python script that I use every day. It reads a file and, for each line, applies different string functions such as strip(), replace(), etc. I'm constantly editing the script and commenting lines in and out to change which functions run, because the functions I need depend on the file I'm dealing with. For example, for one file I need to apply line.replace(' ','') and line.strip() to each line.

What's the best way to make all of these part of my script, so that I can assign a number to each function and just say "apply functions 1 and 4" to each line?

From stackoverflow
  • It is possible to map string operations to numbers:

    >>> import string
    >>> ops = {1:string.split, 2:string.replace}
    >>> my = "a,b,c"
    >>> ops[1](",", my)
    [',']
    >>> ops[1](my, ",")
    ['a', 'b', 'c']
    >>> ops[2](my, ",", "-")
    'a-b-c'
    >>>
    

    But maybe string descriptions of the operations will be more readable.

    >>> ops2={"split":string.split, "replace":string.replace}
    >>> ops2["split"](my, ",")
    ['a', 'b', 'c']
    >>>
    

    Note: Instead of using the string module, you can use the str type for the same effect.

    >>> ops={1:str.split, 2:str.replace}
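
As a minimal sketch of the "apply function 1 and 4" idea from the question, the numbered dict can be combined with a small helper. The names OPS and apply_ops and the particular numbering are made up for illustration, and the sketch assumes every operation takes one string and returns one string.

```python
# Hypothetical numbering; choose whatever operations you use most often.
OPS = {
    1: str.strip,
    2: str.lower,
    3: str.upper,
    4: lambda s: s.replace(" ", ""),  # wraps a method that needs extra arguments
}

def apply_ops(line, numbers):
    """Apply the operations selected by number, in the order given."""
    for n in numbers:
        line = OPS[n](line)
    return line

result = apply_ops("  a b c \n", [1, 4])
print(result)  # abc
```

Methods that need extra arguments, such as replace, are wrapped in a lambda so that every entry can be called the same way.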
    
  • If you insist on numbers, you can't do much better than a dict (as gimel suggests) or list of functions (with indices zero and up). With names, though, you don't necessarily need an auxiliary data structure (such as gimel's suggested dict), since you can simply use getattr to retrieve the method to call from the object itself or its type. E.g.:

    def all_lines(somefile, methods):
      """Apply a sequence of methods to all lines of some file and yield the results.
      Args:
        somefile: an open file or other iterable yielding lines
        methods: a string that's a whitespace-separated sequence of method names.
            (note that the methods must be callable without arguments beyond the
             str to which they're being applied)
      """
      tobecalled = [getattr(str, name) for name in methods.split()]
      for line in somefile:
        for tocall in tobecalled: line = tocall(line)
        yield line
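
A quick way to try the generator above is to pass it a plain list of strings instead of an open file. This is only a usage sketch; the sample lines are invented, and an unknown method name would raise AttributeError from getattr.

```python
def all_lines(somefile, methods):
    """Apply whitespace-separated str method names to every line."""
    tobecalled = [getattr(str, name) for name in methods.split()]
    for line in somefile:
        for tocall in tobecalled:
            line = tocall(line)
        yield line

# Any iterable of lines works, so a list stands in for a file here.
sample = ["  Hello World \n", "\tsecond LINE\n"]
result = list(all_lines(sample, "strip lower"))
print(result)  # ['hello world', 'second line']
```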
    
  • First of all, many functions in the string module, including strip and replace, are deprecated. The following answer uses string methods instead. (Instead of string.strip(" Hello "), I use the equivalent method call, " Hello ".strip().)

    Here's some code that will simplify the job for you. The following code assumes that every method you call on your string returns another string.

    class O(object):
        c = str.capitalize
        r = str.replace
        s = str.strip
    
    def process_line(line, *ops):
        i = iter(ops)
        while True:
            try:
                op = i.next()
                args = i.next()
            except StopIteration:
                break
            line = op(line, *args)
        return line
    

    The O class exists so that your highly abbreviated method names don't pollute your namespace. When you want to add more string methods, you add them to O in the same format as those given.

    The process_line function is where all the interesting things happen. First, here is a description of the argument format:

    • The first argument is the string to be processed.
    • The remaining arguments must be given in pairs.
      • The first argument of the pair is a string method. Use the shortened method names here.
      • The second argument of the pair is a list representing the arguments to that particular string method.

    The process_line function returns the string that emerges after all these operations have been performed.

    Here is some example code showing how you would use the above code in your own scripts. I've separated the arguments of process_line across multiple lines to show the grouping of the arguments. Of course, if you're just hacking away and using this code in day-to-day scripts, you can compress all the arguments onto one line; this actually makes it a little easier to read.

    f = open("parrot_sketch.txt")
    for line in f:
        p = process_line(
            line,
            O.r, ["He's resting...", "This is an ex-parrot!"],
            O.c, [],
            O.s, []
        )
        print p
    

    Of course, if you very specifically wanted to use numerals, you could name your functions O.f1, O.f2, O.f3… but I'm assuming that wasn't the spirit of your question.
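
The code in this answer is written for Python 2 (i.next() and the print statement). For Python 3, a roughly equivalent sketch could pair up the arguments with slicing and zip instead of a manual iterator. The sample string is invented, and like the original loop this version silently ignores a trailing method that has no argument list.

```python
class O(object):
    c = str.capitalize
    r = str.replace
    s = str.strip

def process_line(line, *ops):
    # ops is a flat sequence: method, args, method, args, ...
    # Even positions hold the methods, odd positions their argument lists.
    for op, args in zip(ops[0::2], ops[1::2]):
        line = op(line, *args)
    return line

result = process_line(
    "  he's resting...  ",
    O.r, ["he's resting...", "this is an ex-parrot!"],
    O.s, [],
    O.c, [],
)
print(result)  # This is an ex-parrot!
```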

  • To map names (or numbers) to different string operations, I'd do something like

    OPERATIONS = dict(
        strip = str.strip,
        lower = str.lower,
        removespaces = lambda s: s.replace(' ', ''),
        maketitle = lambda s: s.title().center(80, '-'),
        # etc
    )
    
    def process(myfile, ops):
        for line in myfile:
            for op in ops:
                line = OPERATIONS[op](line)
            yield line
    

    which you use like this

    for line in process(afile, ['strip', 'removespaces']):
        ...
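
To see the whole thing run without a real file, here is a small self-contained sketch that uses io.StringIO as a stand-in for afile. The sample text is invented and the OPERATIONS table is cut down to three entries for brevity.

```python
import io

OPERATIONS = dict(
    strip=str.strip,
    lower=str.lower,
    removespaces=lambda s: s.replace(' ', ''),
)

def process(myfile, ops):
    """Yield each line after applying the named operations in order."""
    for line in myfile:
        for op in ops:
            line = OPERATIONS[op](line)
        yield line

# io.StringIO behaves like an open text file for iteration purposes.
afile = io.StringIO(u"  Hello World \n A B C\n")
result = list(process(afile, ['strip', 'removespaces', 'lower']))
print(result)  # ['helloworld', 'abc']
```

An unknown operation name raises KeyError, which is usually what you want when a name is mistyped.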
    
