Gregory Szorc's Digital Home

Python Bindings Updates in Clang 3.1

May 14, 2012 at 12:05 AM | categories: Python, Clang, compilers

Clang 3.1 is scheduled to be released any hour now. And, I'm proud to say that I've contributed to it! Specifically, I've contributed improvements to the Python bindings, which are an interface to libclang, the C interface to Clang.

Since 3.1 is being released today, I wanted to share some of the new features in this release. An exhaustive list of newly supported APIs is available in the release notes.

Diagnostic Metadata

Diagnostics are how Clang represents warnings and errors during compilation. The Python bindings now allow you to get at more metadata. Of particular interest is Diagnostic.option. This property allows you to see the compiler flag that triggered the diagnostic. Or, you could query Diagnostic.disable_option for the compiler flag that would silence this diagnostic.

These might be useful if you are analyzing diagnostics produced by the compiler. For example, you could parse source code using the Python bindings and collect aggregate information on all the diagnostics encountered.

Here is an example:

from clang.cindex import Index

index = Index.create()
tu = index.parse('hello.c')

for diag in tu.diagnostics:
    print(diag.severity)
    print(diag.location)
    print(diag.spelling)
    print(diag.option)

Or, if you are using the Python bindings from trunk:

from clang.cindex import TranslationUnit

tu = TranslationUnit.from_source('hello.c')
...

Sadly, the patch that enabled this simpler usage did not make the 3.1 branch.
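
To make the "collect aggregate information" idea a bit more concrete, here is a minimal sketch that tallies diagnostics by the flag that triggered them. The helper name count_by_option and the '(no flag)' label are made up for illustration, and FakeDiagnostic is only a stand-in so the snippet runs without libclang installed; in real use you would pass tu.diagnostics instead of the sample list.

```python
from collections import Counter, namedtuple


def count_by_option(diagnostics):
    """Tally diagnostics by the compiler flag that triggered them.

    Accepts any iterable of objects with an ``option`` attribute, such
    as ``tu.diagnostics``. Diagnostics with no associated flag are
    grouped under the (hypothetical) label '(no flag)'.
    """
    counts = Counter()
    for diag in diagnostics:
        counts[diag.option or '(no flag)'] += 1
    return counts


# Stand-in for clang.cindex.Diagnostic so the sketch runs without libclang.
FakeDiagnostic = namedtuple('FakeDiagnostic', ['spelling', 'option'])

sample = [
    FakeDiagnostic("unused variable 'x'", '-Wunused-variable'),
    FakeDiagnostic("unused variable 'y'", '-Wunused-variable'),
    FakeDiagnostic("expected ';' after expression", ''),
]

# Print the flags from most to least frequent.
for option, count in count_by_option(sample).most_common():
    print('%s: %d' % (option, count))
```

Because the helper only reads the option attribute, the same function should work on the real Diagnostic objects without modification.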

Finding Entities from Source Location

Two new APIs, SourceLocation.from_position and Cursor.from_location, allow you to easily extract a cursor in the AST from any arbitrary point in a file.

Say you want to find the element in the AST that occupies column 6 of line #10 in the file foo.c:

from clang.cindex import Cursor
from clang.cindex import File
from clang.cindex import Index
from clang.cindex import SourceLocation

index = Index.create()
tu = index.parse('foo.c')

f = File.from_name(tu, 'foo.c')

location = SourceLocation.from_position(tu, f, 10, 6)
cursor = Cursor.from_location(tu, location)

Of course, you could do this by iterating over cursors in the AST until one with the desired source range is found. But, that would involve more API calls.

I would like to say that these APIs feel clunky to me. There is a lot of redundancy in there. In my opinion, there should just be a TranslationUnit.get_cursor(file='foo.c', line=10, column=6) that does the right thing. Maybe that will make it into a future release. Maybe it won't. After all, the Python bindings are really a thin wrapper around the C API and an argument can be made that there should be minimal extra logic and complexity in the Python bindings. Time will tell.
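
Until something like that exists, the three calls can be wrapped in a small helper of your own. The sketch below is not part of the bindings: the function name get_cursor and its cindex parameter are hypothetical. The parameter defaults to the real clang.cindex module and is there only so the helper can be exercised with a stand-in when libclang is not installed.

```python
def get_cursor(tu, filename, line, column, cindex=None):
    """Return the cursor at (line, column) of ``filename`` within ``tu``.

    ``cindex`` is a hypothetical hook: it defaults to the real
    clang.cindex module, but any object exposing File, SourceLocation
    and Cursor with the same class methods can be substituted.
    """
    if cindex is None:
        # Imported lazily so that defining the helper does not require libclang.
        import clang.cindex as cindex

    # Same three steps as the example above, folded into one call.
    f = cindex.File.from_name(tu, filename)
    location = cindex.SourceLocation.from_position(tu, f, line, column)
    return cindex.Cursor.from_location(tu, location)
```

With a parsed translation unit in hand, the lookup from the example becomes cursor = get_cursor(tu, 'foo.c', 10, 6).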

Type Metadata

It is now possible to access more metadata on Type instances. For example, you can:

See what the elements of an array are using Type.get_array_element_type
See how many elements are in a static array using Type.get_array_element_count
Determine if a function is variadic using Type.is_function_variadic
Inspect the Types of function arguments using Type.argument_types

In this example, I will show how to iterate over all the functions declared in a file and to inspect their arguments.

from clang.cindex import CursorKind
from clang.cindex import Index
from clang.cindex import TypeKind

index = Index.create()
tu = index.parse('hello.c')

for cursor in tu.cursor.get_children():
    # Ignore AST elements not from the main source file (e.g.
    # from included files).
    if not cursor.location.file or cursor.location.file.name != 'hello.c':
        continue

    # Ignore AST elements not a function declaration.
    if cursor.kind != CursorKind.FUNCTION_DECL:
        continue

    # Obtain the return Type for this function.
    result_type = cursor.type.get_result()

    print('Function: %s' % cursor.spelling)
    print('  Return type: %s' % result_type.kind.spelling)
    print('  Arguments:')

    # The function was declared without a prototype (e.g. `int f()` in C),
    # so no argument types are available.
    if cursor.type.kind == TypeKind.FUNCTIONNOPROTO:
        print('    None')
        continue

    for arg_type in cursor.type.argument_types():
        print('    %s' % arg_type.kind.spelling)

This example is overly simplified. A more robust solution would also inspect the Type instances to see if they are constants, check for pointers, check for variadic functions, etc.
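
As a rough sketch of what "more robust" might look like, the two helpers below fold const qualifiers, pointers, and variadic functions into a readable description. The names describe_type and describe_function are made up for this post. They rely only on Type.kind, Type.is_const_qualified, Type.get_pointee, Type.get_result, Type.argument_types and Type.is_function_variadic, and they compare kinds by TypeKind.name so the snippet does not need to import anything.

```python
def describe_type(t):
    """Return a readable description of a clang.cindex Type.

    Pointers are followed recursively, so `const char *` comes out as
    'pointer to const CHAR_S'.
    """
    prefix = 'const ' if t.is_const_qualified() else ''
    if t.kind.name == 'POINTER':
        return '%spointer to %s' % (prefix, describe_type(t.get_pointee()))
    return prefix + t.kind.name


def describe_function(func_type):
    """Return a one-line summary of a function Type."""
    result = describe_type(func_type.get_result())

    # Without a prototype (e.g. `int f()` in C) there are no argument
    # types to inspect, so do not touch argument_types() at all.
    if func_type.kind.name != 'FUNCTIONPROTO':
        return '(unspecified) -> %s' % result

    args = [describe_type(a) for a in func_type.argument_types()]
    if func_type.is_function_variadic():
        args.append('...')
    return '(%s) -> %s' % (', '.join(args), result)
```

Inside the loop above, print(describe_function(cursor.type)) could stand in for the per-argument printing. Arrays, typedefs, and records are still not handled here.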

An example application of these APIs is to build a tool which automatically generates ctypes or similar FFI bindings. Many of these tools today use custom parsers. Why invent a custom (and likely complex) parser when you can call out into Clang and have it do all the heavy lifting for you?
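
To give a flavor of what such a tool involves, here is a minimal sketch of the type-mapping step for ctypes. It is deliberately partial: the table CTYPES_BY_KIND and the function ctypes_signature are hypothetical names, only a handful of scalar kinds are covered, and the input is assumed to be TypeKind names (as given by TypeKind.name) gathered with a loop like the one above.

```python
import ctypes

# Partial, illustrative mapping from TypeKind names to ctypes types.
# None is how ctypes spells a void return type.
CTYPES_BY_KIND = {
    'VOID': None,
    'CHAR_S': ctypes.c_char,
    'SCHAR': ctypes.c_byte,
    'UCHAR': ctypes.c_ubyte,
    'SHORT': ctypes.c_short,
    'USHORT': ctypes.c_ushort,
    'INT': ctypes.c_int,
    'UINT': ctypes.c_uint,
    'LONG': ctypes.c_long,
    'ULONG': ctypes.c_ulong,
    'LONGLONG': ctypes.c_longlong,
    'ULONGLONG': ctypes.c_ulonglong,
    'FLOAT': ctypes.c_float,
    'DOUBLE': ctypes.c_double,
}


def ctypes_signature(result_kind, arg_kinds):
    """Return (restype, argtypes) for a function described by kind names.

    Raises NotImplementedError for kinds missing from the table
    (pointers, records, and so on), which a real generator would need
    to handle.
    """
    try:
        restype = CTYPES_BY_KIND[result_kind]
        argtypes = [CTYPES_BY_KIND[kind] for kind in arg_kinds]
    except KeyError as e:
        raise NotImplementedError('no ctypes mapping for TypeKind %s' % e.args[0])
    return restype, argtypes


# Signature for something like `double scale(double value, int factor)`.
restype, argtypes = ctypes_signature('DOUBLE', ['DOUBLE', 'INT'])
```

The resulting pair can be assigned to the restype and argtypes attributes of a function loaded through ctypes.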

Future Features

As I write this, there are already a handful of Python binding features checked into Clang's SVN trunk that were made after the 3.1 branch was cut. And, I'm actively working on integrating many more.

Still to come to the Python bindings are:

Better memory management support (currently, not every object keeps references to the objects it depends on, so the garbage collector can dispose of an object that should stay alive just because it has gone out of scope in Python).
Support for token API (lexer output)
More complete coverage of Cursor and Type APIs
More friendly APIs
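
The memory management item is easier to see with a toy model. The sketch below does not use the bindings at all: Owner and Child are hypothetical stand-ins for an object that owns native resources and an object that is only valid while that owner exists. It shows one common remedy, which is to have the dependent object hold a strong reference to its owner. Whether the bindings end up solving the problem this way is an assumption on my part.

```python
import gc
import weakref


class Owner(object):
    """Stand-in for an object that owns native resources."""


class Child(object):
    """Stand-in for an object that is only valid while its owner is alive."""

    def __init__(self, owner, keep_owner_alive):
        # Holding a strong reference ties the owner's lifetime to the child's.
        self._owner = owner if keep_owner_alive else None


def owner_survives(keep_owner_alive):
    """Report whether the owner outlives the last direct reference to it."""
    owner = Owner()
    probe = weakref.ref(owner)  # Lets us observe collection without preventing it.
    child = Child(owner, keep_owner_alive)

    del owner      # The caller's name for the owner goes out of scope.
    gc.collect()

    alive = probe() is not None
    del child
    return alive


print(owner_survives(keep_owner_alive=False))
print(owner_survives(keep_owner_alive=True))
```

Without the back-reference the owner is collected while the child is still in use; with it, the owner stays alive for as long as the child does.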

I have a personal goal for the Python bindings to cover 100% of the functionality in libclang. My work towards that goal is captured in my python features branch on GitHub. I periodically clean up a patch, submit it for review, apply feedback, and commit. That branch is highly volatile and I do rebase. You have been warned.

Furthermore, I would like to add additional functionality to libclang [and expose it to Python]. For example, I would love for libclang to support code generation (i.e. compiling), not just parsing. This would enable all kinds of nifty scenarios (like channeling your build system's compiler calls through a proxy which siphons off metadata such as diagnostics).

Credits and Getting Involved

I'm not alone in my effort to improve Clang's Python bindings. Anders Waldenborg has landed a number of patches to add functionality and tests. He has also been actively reviewing patches and implementing official LLVM Python bindings! On the reviewing front, Manuel Klimek has been invaluable. I've lost track of how many bugs he's caught and good suggestions he's made. Tobias Grosser and Chandler Carruth have also reviewed their fair share of patches and handled community contributions.

If you are interested in contributing to the Python bindings, we could use your help! You can find me in #llvm as IndyGreg. If I'm not around, the LLVM community is generally pretty helpful, so I'm sure you'll get an answer. If you prefer email, send it to the cfe-dev list.

If you have any questions, leave them in the comments or ask using one of the methods above.