2008-11-13

Parse Python files for "import" statements

My goal is to collect all the import statements in a Python code tree to identify which modules are needed to run that code. So let's start from the lowest level: parse one file for its imports.

One simple solution would be to grep (or use re.compile) for something like "import (.*)|from (.*) import .*", but what about import statements like:
from  import (
ABC,
DEF,
GHI)
or multiple imports on one line?
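To make the limitation concrete, here is a minimal sketch that applies the pattern above line by line. The helper name naive_imports and the sample source are made up for illustration.

```python
import re

# The naive pattern quoted above, applied one line at a time.
NAIVE = re.compile(r"import (.*)|from (.*) import .*")

# Hypothetical sample: a parenthesised multi-line import and a multi-module import.
SAMPLE = "from os.path import (\n    join,\n    dirname)\nimport sys, re\n"

def naive_imports(source):
    """Return whatever the naive pattern captures on each matching line."""
    found = []
    for line in source.splitlines():
        match = NAIVE.search(line)
        if match:
            # group 1 is filled by "import x", group 2 by "from x import y"
            found.append(match.group(1) or match.group(2))
    return found

print(naive_imports(SAMPLE))
```

The continuation lines of the parenthesised import never match, and "import sys, re" comes back as the single string "sys, re", so more hand-written parsing would be needed on top of the regex.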

So, if I would end up rewriting the Python interpreter's parser anyway, why not use that exact parser?

I started with the parser module, but its output is not easy to work with :) even once you have worked out the structure (e.g. import statements are at nodes with id = 282, with the imported module at id = 287, and so on). It's really powerful, but too low level for this task.

But there is another interesting module: compiler (it's deprecated in 2.6 and removed in 3.0, but I'm working with 2.5, so it's fine). Using parseFile and some objects from compiler.ast, I wrote a simple script that parses a file given as input and prints out the modules imported with "import mod" or with "from mod import xxx".
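The compiler module is not available in current Python versions, so this is not the script described above. It is a rough sketch of the same idea using the standard ast module (Python 2.6+ and 3.x) as a stand-in. The function names imported_modules and imported_modules_in_file are made up for illustration.

```python
import ast

def imported_modules(source, filename="<string>"):
    """Return the sorted module names found in import statements of source."""
    tree = ast.parse(source, filename)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a, b.c" carries one alias per imported module
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"; level counts leading dots
            modules.add("." * node.level + (node.module or ""))
    return sorted(modules)

def imported_modules_in_file(path):
    """Same as imported_modules, but reads the source from a file path."""
    with open(path, "rb") as handle:
        return imported_modules(handle.read(), path)
```

Because the real parser does the work, a parenthesised multi-line import or several modules on one line need no special handling. The file is only parsed, never executed.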

Suggestions are always welcome :)

UPDATE: please note that conditional imports are included in the list, while calls to __import__ are ignored.
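A small sketch of that behaviour, again assuming the standard ast module as a stand-in for compiler. The sample source and the name statement_imports are hypothetical.

```python
import ast

# Hypothetical sample with conditional imports and one __import__ call.
SAMPLE = '''
try:
    import json
except ImportError:
    import simplejson as json

if True:
    from os import path

plugin = __import__("xml.dom")
'''

def statement_imports(source):
    """Collect modules named in import statements, wherever they are nested."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return sorted(names)

print(statement_imports(SAMPLE))
```

The imports inside try/except and if blocks are reported, since they are ordinary import statements in the tree. The __import__ call is a plain function call, so "xml.dom" does not show up.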

5 comments:

Unknown said...

What about conditional imports? I'm a python noob and i've seen those once or twice.

Do you just include those in your list out of general principle?

Sandro Tosi said...

Hi Sam,
thanks for the comment (I forgot to mention this and __import__); the conditional imports are listed, yes, because if the module is available it would be used. I'm talking with my packager hat on, so I need to know which modules to add to the Debian package's dependency list.

Luca Bruno said...

What about doing it at run-time?

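orig_import = __import__

def import_hook(*args):
    # log each import request, then delegate to the real __import__
    print args
    return orig_import(*args)

# install the hook so every later import goes through it
__builtins__.__import__ = import_hook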
...
run the program, ANY import will be output'd
...

Sandro Tosi said...

Luca, thanks for the tip, it might be useful in another context, but here I'm doing a static syntax check, so I don't run any program, and I don't want to change every piece of code in a tree I'm analyzing.

Luca Bruno said...

Oh, understood, though you only need to change the first file that runs the program. The change affects the whole interpreter instance.