Sandro Tosi: Having fun fixing a bug (with ParallelPython)

The Lenny general freeze is approaching, so I'm walking through my packages' bugs, and I found this one reported against fdupes.

I have 4 GB of memory, so how do I reproduce an out-of-memory (OOM) condition? Let's start with a rather large set of files, but I need a script to generate it. With 4 cores and a fast Raptor hard disk, I wanted something that would stress them while I learned something new: Python + ParallelPython was the answer!

I wrote a small script that takes 3 parameters: the destination directory, the number of directories to create, and the number of files (with random contents) to create inside each of them. I ran it with 1000 dirs and 1000 files, so it generated a set of 1M files to work on.
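For the curious, here is a minimal sketch of how such a three-parameter command line could be parsed. The argument names (dest, dirs, files) and the example path are assumptions for illustration, not taken from the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse the three parameters described above (argument names are assumed)."""
    parser = argparse.ArgumentParser(description="Generate a tree of random files")
    parser.add_argument("dest", help="destination directory")
    parser.add_argument("dirs", type=int, help="number of directories to create")
    parser.add_argument("files", type=int, help="number of files per directory")
    return parser.parse_args(argv)


# Hypothetical invocation mirroring the 1000 x 1000 run; the path is made up.
args = parse_args(["/tmp/fdupes-test", "1000", "1000"])
total = args.dirs * args.files
print(f"{args.dirs} dirs x {args.files} files = {total} files")
```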

But even with so many files, fdupes ran fine... At that point I realized that ulimit could be my friend: its "-v" option, which caps the virtual memory available to a process, would have saved me some time, but without the fun of writing the script :)
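The same idea as "ulimit -v" can be sketched from Python with the standard resource module, assuming a Linux system. The helper name run_with_memory_cap and the memory-hungry child process are stand-ins for illustration; to try it on the real thing you would pass an fdupes command line instead.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(cmd, max_bytes):
    """Run `cmd` with its address space capped at roughly `max_bytes`."""
    def cap():
        # Runs in the child just before exec. RLIMIT_AS is expressed in bytes,
        # whereas the shell's `ulimit -v` takes KiB. Only the soft limit is
        # lowered here, and never above an existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        limit = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

    return subprocess.run(cmd, preexec_fn=cap, capture_output=True, text=True)


# Stand-in for fdupes: a child that tries to allocate 2 GiB under a 512 MiB cap.
hog = [sys.executable, "-c", "x = bytearray(2 * 1024**3)"]
result = run_with_memory_cap(hog, 512 * 1024**2)
print("exit status:", result.returncode)
```

With the cap in place the allocation fails quickly instead of needing a million files to exhaust real memory.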

If you don't know pp, here is a little explanation of the main part of the script (I forgot to comment the script well enough):

job_server = pp.Server() creates the pp job server: with no parameters, it allocates one execution slot per core/CPU on your system (you can force more or fewer)
jobs.append(job_server.submit(gen_dir_set, (dirname+'/'+str(i),fileno), (), ('random',))) appends to the jobs list the function returned by job_server.submit: submit takes the function to execute, the tuple of its parameters, a tuple of the functions called by the first one (empty here) and a tuple of the modules the called function needs (here random).
for job in jobs: job() calls each item in jobs: they are functions that return the result of the corresponding job (here the results are discarded, but you could aggregate them to obtain the final result)
job_server.print_stats() prints some stats about the job executions (I love stats :D )
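The submit-then-collect pattern described above can be tried without installing pp. The sketch below swaps in ThreadPoolExecutor from the standard library, so it uses threads rather than pp's job server and is not the original script. The body of gen_dir_set, the 64-byte file size and the run_jobs helper are all assumptions for illustration.

```python
import os
import random
import tempfile
from concurrent.futures import ThreadPoolExecutor


def gen_dir_set(dirname, fileno):
    """Worker: create `dirname` with `fileno` small random files; return the count."""
    os.makedirs(dirname, exist_ok=True)
    for i in range(fileno):
        with open(os.path.join(dirname, str(i)), "wb") as fh:
            # 64 random bytes per file (size chosen arbitrarily for the sketch).
            fh.write(random.getrandbits(8 * 64).to_bytes(64, "big"))
    return fileno


def run_jobs(dest, dirno, fileno, workers=None):
    """Submit one job per directory, then collect and sum the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Counterpart of job_server.submit(...): returns a handle immediately.
        jobs = [pool.submit(gen_dir_set, os.path.join(dest, str(i)), fileno)
                for i in range(dirno)]
        # Counterpart of job(): result() blocks until that job has finished.
        return sum(job.result() for job in jobs)


# Small demonstration in a temporary directory: 5 dirs with 10 files each.
demo_root = tempfile.mkdtemp()
demo_total = run_jobs(demo_root, 5, 10)
print(demo_total, "files created")
```

Here the per-job results are summed instead of discarded, which is one simple way to aggregate them.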

ParallelPython is more powerful than this: for example, it lets you send your jobs to job nodes, machines dedicated to executing tasks (while you keep your own machine as the administration one), and more! Give it a try ;)

Sandro Tosi

2008-07-15
