Code Answer: Penalties of a script constantly looping in the background

I know this topic has been discussed in the past, but I am a tiny bit paranoid about resource usage.

I am looking into writing a daemon that queues jobs for archiving files into zip files, for a web app I am working on. It would behave something like this:

import time

while True:
    while morejobs():      # morejobs() and zipfile() are the app's own functions
        zipfile()
    time.sleep(15)         # time.sleep() takes seconds

What sort of resources would be consumed by a process constantly looping away in the background (provided there is nothing to zip)? Is there anything I should be aware of or careful about?

Edit:

It looks like most of the answers are concerned about the duration of the sleep. I blindly set it (in the code example) to sleep for 15 milliseconds at a time. I actually intended it to be 15 seconds, and I have updated the code to reflect that.

Edit Again:

What would be the lowest reasonable time for the script to sleep? Is 5 seconds too low? I have no idea what the load of this app will be or how often new jobs will be added to the queue.

From Stack Overflow

This has the potential to hammer your CPU, even when there is nothing to process.

Edit: Actually sleep() takes an argument as a number of seconds, not milliseconds so I don't think the CPU is going to be a problem. Still, perhaps you could use a cron job to schedule something like this.

joshhunt : This is supposed to be redistributable and to be set up with a minimum of fuss. I think I am already adding a bit too much complexity without having cron jobs all over the place.
Instead of sleeping for 15 seconds, it might be better to have a callback which restarts your job when new files arrive.
- Process available files
- Check for new files every 60 seconds or whatever interval you choose
- When a new file arrives, process it and any others which may have arrived since the last interval
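A minimal polling sketch of that approach, assuming jobs show up as files in a single directory (the directory and the name watch_directory are hypothetical). The callback fires only when the listing has changed, and each call covers everything that arrived since the previous check.

```python
import os
import time

def watch_directory(directory, on_new_files, interval=60.0, max_checks=None):
    """Poll `directory` and call on_new_files(names) only when new files appear.

    max_checks exists only so the loop can be stopped (None = run forever)."""
    seen = set()
    checks = 0
    while max_checks is None or checks < max_checks:
        if checks:                      # first pass runs at once: process what is already there
            time.sleep(interval)
        current = set(os.listdir(directory))
        new_files = sorted(current - seen)
        if new_files:
            on_new_files(new_files)     # everything that arrived since the last check
        seen = current
        checks += 1
```

The callback receives a whole batch, so a burst of uploads between two checks is handled in one go rather than one wake-up per file.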
S.Lott : sleep(15) sleeps for 15 seconds, not 15 milliseconds. http://docs.python.org/library/time.html#time.sleep

Chris Ballance : @S.Lott Thanks for the clarification; in my primarily .NET world, Sleep() takes its argument in milliseconds.
Why not just use a cron job to run a script every minute or so? At least you are not depending on your own loop to be continuously running in the background.
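A sketch of what the cron variant could look like, assuming the same morejobs()/zipfile() helpers as the question: the script drains the queue once and exits, and cron supplies the repetition. The non-blocking fcntl.flock() guard (POSIX only) is an extra precaution not mentioned in the answer; it keeps a slow run from overlapping with the next one cron starts. The lock-file path and the crontab line in the comment are hypothetical.

```python
import fcntl

# Hypothetical crontab entry running this once a minute:
#   * * * * * /usr/bin/python3 /path/to/zip_jobs.py

def run_from_cron(lock_path, morejobs, zipfile):
    """Drain the job queue once and return; cron provides the repetition.

    Returns the number of jobs handled, or None if a previous run still
    holds the lock (that run is left to finish the queue)."""
    with open(lock_path, "w") as lock_file:
        try:
            # Exclusive, non-blocking: fails immediately if another run holds it.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        handled = 0
        while morejobs():
            zipfile()
            handled += 1
        return handled  # closing the file releases the lock
```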

Ber : A cron job causes a lot more CPU load than a loop sleeping for the same duration, since the program has to be loaded into memory and a new process created for every iteration. With the loop, it just runs one more cycle and goes back to sleep.

Peter D : I think the simplicity of cron, standard and tried-and-true, is worth that overhead compared to a custom script running constantly on my machine. To each their own, I guess.
Besides the cost of hammering your CPU, there is the cost of each morejobs() call. You can mitigate this by using a higher value for sleep(), or you can use some sort of mailbox that receives requests and then fires the zipfile() operation.

It is normal for some operations to have a background thread that periodically checks for something. In that case, the best approach is to use sensible values for sleep().
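A minimal in-process version of the mailbox idea, using the standard library's queue.Queue: the worker thread blocks in get() until a request is posted, so there is no sleep interval to tune. start_zip_worker and zip_one are hypothetical names, and this assumes the code posting jobs lives in the same process; across processes, the mailbox would have to be an IPC channel such as a pipe or socket.

```python
import queue
import threading

def start_zip_worker(zip_one):
    """Start a worker that waits on a mailbox and runs zip_one(job) per request.

    Put a job on the returned queue to have it processed; put None to stop."""
    mailbox = queue.Queue()

    def worker():
        while True:
            job = mailbox.get()   # blocks without polling until something is posted
            if job is None:       # sentinel value: shut the worker down
                break
            zip_one(job)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return mailbox, thread
```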
If it takes (and these figures are just examples) 20 seconds for a file to arrive and 5 seconds for you to process it, what is the harm in your process waiting, on average, another 7.5 seconds (half of the 15-second sleep) before it even detects that the file is there?
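The 7.5-second figure follows from simple arithmetic, assuming files arrive at uniformly random moments relative to the polling schedule and that the morejobs() check itself is instantaneous: a file waits anywhere from 0 to the full sleep interval, so the average wait is half the interval. The helper below (a hypothetical name) averages that wait over evenly spaced arrival times.

```python
def mean_detection_delay(interval, samples=1000):
    """Average time from a file's arrival until the polling loop wakes up,
    assuming arrivals are spread evenly over one sleep interval."""
    total = 0.0
    for i in range(samples):
        arrival = (i + 0.5) * interval / samples  # arrival offset within the sleep period
        total += interval - arrival               # wait until the next morejobs() check
    return total / samples

print(mean_detection_delay(15))  # approximately 7.5 (half the interval)
```

With the example figures, the average end-to-end time is about 7.5 + 5 = 12.5 seconds, and the worst case (a file arriving just after a check) is 15 + 5 = 20 seconds.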

A sleeping process should have as close to zero impact on the CPU as it is possible to get.

So no, I would not be concerned about this aspect at all.

The one thing you should be concerned about is how to restart the process automatically if it fails. I would run a cron job every 5 minutes (your choice of actual frequency) to kill off the old copy (politely, and only if it's running) and then start a new one. That way, there'll only be a 5-minute downtime at most if something goes wrong.

I say politely because the old one may be in the middle of processing files and you should not interrupt that unless it's recoverable.
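A sketch of the "polite" part, assuming the cron job stops the daemon with a plain kill (which sends SIGTERM): the handler only records the request, and the loop checks it between jobs, so a zip that has started always finishes. StopFlag and run_until_stopped are hypothetical names. The sleep is split into one-second slices because, since Python 3.5 (PEP 475), time.sleep() is retried with the remaining time after a signal handler returns, so a single 15-second sleep would delay shutdown.

```python
import signal
import time

class StopFlag:
    """Records a SIGTERM instead of letting it kill the process mid-job."""

    def __init__(self):
        self.requested = False
        signal.signal(signal.SIGTERM, self.request)  # must be called from the main thread

    def request(self, signum, frame):
        self.requested = True


def run_until_stopped(flag, morejobs, zipfile, interval=15):
    """The question's loop, made to exit cleanly once a stop is requested."""
    while not flag.requested:
        while not flag.requested and morejobs():
            zipfile()  # a job that has started always runs to completion
        # Sleep in one-second slices so a stop request is noticed within a second.
        for _ in range(max(1, int(interval))):
            if flag.requested:
                break
            time.sleep(1)
```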
Sleep involves no overhead. The Linux OS uses a very simple signal to wake a sleeping process.

What you're showing is the "busy-waiting" design pattern.

To eliminate overhead, you want to be woken ONLY when there's work to do.

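Ways to do this:
1. Wait on read.
2. Wait on a select function call. See http://docs.python.org/library/select.html
3. Wait for a lock to be released. See http://docs.python.org/library/posixfile.html. (The posixfile module was removed in Python 3; fcntl.flock() and fcntl.lockf() are the current equivalents.)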
Of these, waiting on a read is perhaps easiest. Reading from a pipe or a socket is what you want to do.
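A minimal blocking-read sketch using a named pipe (POSIX only, via os.mkfifo). The function name and pipe path are hypothetical; the web app would enqueue a job by opening the same path for writing and writing one line per job, e.g. `with open(path, "w") as f: f.write("archive-42\n")`. While nothing is written, the reader sits blocked in the kernel and uses no CPU.

```python
import os

def wait_for_jobs(path, handle_job, max_sessions=None):
    """Block on a named pipe; handle_job(line) runs once per line written to it.

    max_sessions exists only so the loop can be stopped (None = run forever)."""
    if not os.path.exists(path):
        os.mkfifo(path)                 # POSIX only
    sessions = 0
    while max_sessions is None or sessions < max_sessions:
        # open() blocks until some writer opens the pipe; iterating blocks
        # until a full line arrives, with no polling in between.
        with open(path) as fifo:
            for line in fifo:
                job = line.strip()
                if job:
                    handle_job(job)
        sessions += 1                   # EOF: all writers closed; reopen and wait again
```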

I'm guessing that you have a "multiple-writers-single-reader" design pattern. In this case, there are two candidate solutions.
1. Multiple requests per socket. This is the FTP-like solution where you write a simple server that listens for connections on one socket and opens a dedicated connection for each client. Then you use select to determine which client is sending a file.
2. Single request per socket. This is the HTTP-like solution where you receive requests in some socket and the request is a big flood of data. When the request is all finished, the socket is closed so another client can get it.
In both cases, you're not sleeping; you're waiting for I/O to complete.
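A sketch combining select() with the HTTP-like one-request-per-connection scheme, assuming each client connects, sends its whole request, and closes. serve_requests and its max_requests stop condition are hypothetical (the latter only exists so the loop can end); a real daemon would also need error handling and a bound on request size.

```python
import select
import socket

def serve_requests(listener, handle_request, max_requests=None):
    """Single reader, many writers: select() wakes us only when a socket is ready.

    Each client sends one request and closes; the close marks the request complete."""
    buffers = {}                                    # client socket -> bytes received so far
    handled = 0
    while max_requests is None or handled < max_requests:
        # No timeout: the process is blocked until a socket becomes readable.
        readable, _, _ = select.select([listener, *buffers], [], [])
        for sock in readable:
            if sock is listener:
                conn, _ = listener.accept()         # a new writer has connected
                buffers[conn] = bytearray()
                continue
            chunk = sock.recv(4096)
            if chunk:
                buffers[sock] += chunk
            else:                                   # client closed: request is complete
                handle_request(bytes(buffers.pop(sock)))
                sock.close()
                handled += 1
    for sock in buffers:
        sock.close()
```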
"A thousand reasoned opinions are worth one measurement".

Just try it.
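One way to take that measurement, sketched with only the standard library: time.process_time() counts the CPU time the process actually consumed, so comparing it with elapsed wall-clock time over an idle loop shows what sleeping costs. The function name, duration, and interval are arbitrary choices; the sleep stands in for the case where morejobs() finds nothing.

```python
import time

def measure_idle_cpu(duration=2.0, interval=0.5):
    """Run an 'empty queue' loop for `duration` wall-clock seconds and
    return (cpu_seconds_used, wall_seconds_elapsed)."""
    cpu_start = time.process_time()
    wall_start = time.monotonic()
    while time.monotonic() - wall_start < duration:
        time.sleep(interval)          # nothing to zip: go straight back to sleep
    return time.process_time() - cpu_start, time.monotonic() - wall_start

cpu, wall = measure_idle_cpu()
print(f"CPU time {cpu:.4f}s over {wall:.1f}s of wall-clock time")
```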

As an alternative, you can lower the priority of your process. (I'm only familiar with the Windows method.)

On Windows:

def setpriority(pid=None, priority=1):
    """ Set the priority of a Windows process. Priority is a value between 0-5 where
        2 is normal priority. Default sets the priority of the current
        python process but can take any valid process ID. """

    import win32api, win32process, win32con   # from the pywin32 package

    # Index 0 = idle ... 5 = realtime; the default of 1 is "below normal".
    priorityclasses = [win32process.IDLE_PRIORITY_CLASS,
                       win32process.BELOW_NORMAL_PRIORITY_CLASS,
                       win32process.NORMAL_PRIORITY_CLASS,
                       win32process.ABOVE_NORMAL_PRIORITY_CLASS,
                       win32process.HIGH_PRIORITY_CLASS,
                       win32process.REALTIME_PRIORITY_CLASS]
    if pid is None:
        pid = win32api.GetCurrentProcessId()
    handle = win32api.OpenProcess(win32con.PROCESS_ALL_ACCESS, True, pid)
    try:
        win32process.SetPriorityClass(handle, priorityclasses[priority])
    finally:
        win32api.CloseHandle(handle)   # release the process handle

from: http://code.activestate.com/recipes/496767/
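On Linux and other Unix systems, a rough counterpart is the process's nice value, which Python exposes as os.setpriority()/os.getpriority() (Python 3.3+, Unix only). The function below is a hypothetical sketch mirroring the recipe's pid parameter; note that an unprivileged process can raise its niceness but usually cannot lower it again.

```python
import os

def set_niceness(pid=0, niceness=10):
    """Set a process's nice value (higher means lower priority; 19 is the
    usual maximum) and return the value now in effect. pid=0 is this process."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)
```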


Sunday, May 1, 2011
