Gregory Szorc's Digital Home

Aggregating Version Control Info at Mozilla

January 21, 2014 at 10:50 AM | categories: Git, Mercurial, Mozilla, Python

Over the winter break, I set out on an ambitious project to create a service to help developers and others manage the flury of patches going into Firefox. While the project is far from complete, I'm ready to unleash the first part of the project upon the world.

If you point your browsers to moztree.gregoryszorc.com, you'll hopefully see some documentation about what I've built. Source code is available and free, of course. Patches very welcome.

Essentially, I built a centralized indexing service for version control repositories with Mozilla's extra metadata thrown in. I tell it what repositories to mirror, and it clones everything, fetches data such as the pushlog and Git SHA-1 mappings, and stores everything in a central database. It then exposes this aggregated data through world-readable web services.

Currently, I have the service indexing the popular project branches for Firefox (central, aurora, beta, release, esr, b2g, inbound, fx-team, try, etc). You can view the full list via the web service. As a bonus, I'm also serving these repositories via hg.gregoryszorc.com. My server appears to be significantly faster than hg.mozilla.org. If you want to use it for your daily needs, go for it. I make no SLA guarantees, however.

I'm also using this service as an opportunity to experiment with alternate forms of Mercurial hosting. I have mirrors of mozilla-central and the try repository with generaldelta and lz4 compression enabled. I may blog about what those are eventually. The teaser is that they can make Mercurial perform a lot faster under some conditions. I'm also using ZFS under the hood to manage repositories. Each repository is a ZFS filesystem. This means I can create repository copies on the server (user repositories anyone?) at a nearly free cost. Contrast this to the traditional method of full clones, which take lots of time, memory, CPU, and storage.

Anyway, some things you can do with the existing web service:

Obtain metadata about Mercurial changesets. Example.
Look up metadata about Git commits. Example.
Obtain a SPORE descriptor describing the web service endpoints. This allows you to auto-generate clients from descriptors. Yay!

Obviously, that's not a lot. But adding new endpoints is relatively straightforward. See the source. It's literally as easy as defining a URL mapping and writing a database query.

The performance is also not the best. I just haven't put in the effort to tune things yet. All of the querying hits the database, not Mercurial. So, making things faster should merely be a matter of database and hosting optimization. Patches welcome!

Some ideas that I haven't had time to implement yet:

Return changests in a specific repository
Return recently pushed changesets
Return pushes for a given user
Return commits for a given author
Return commits referencing a given bug
Obtain TBPL URLs for pushes with changeset
Integrate bugzilla metadata

Once those are in place, I foresee this service powering a number of dashboards. Patches welcome.

Again, this service is only the tip of what's possible. There's a lot that could be built on this service. I have ideas. Others have ideas.

The project includes a Vagrant file and Puppet manifests for provisioning the server. It's a one-liner to get a development environment up and running. It should be really easy to contribute to this project. Patches welcome.

Why do Projects Support old Python Releases

January 08, 2014 at 05:00 PM | categories: Python, Mozilla

I see a number of open source projects supporting old versions of Python. Mercurial supports 2.4, for example. I have to ask: why do projects continue to support old Python releases?

Consider:

Python 2.4 was last released on December 19, 2008 and there will be no more releases of Python 2.4.
Python 2.5 was last released on May 26, 2011 and there will be no more releases of Python 2.5.
Python 2.6 was last released on October 29, 2013 and there will be no more releases of Python 2.6.
Everything before Python 2.7 is end-of-lifed
Python 2.7 continues to see periodic releases, but mostly for bug fixes.
Practically all of the work on CPython is happening in the 3.3 and 3.4 branches. Other implementations continue to support 2.7.
Python 2.7 has been available since July 2010
Python 2.7 provides some very compelling language features over earlier releases that developers want to use
It's much easier to write dual compatible 2/3 Python when 2.7 is the only 2.x release considered.
Python 2.7 can be installed in userland relatively easily (see projects like pyenv).

Given these facts, I'm not sure why projects insist on supporting old and end-of-lifed Python releases.

I think maintainers of Python projects should seriously consider dropping support for Python 2.6 and below. Are there really that many people on systems that don't have Python 2.7 easily available? Why are we Python developers inflicting so much pain on ourselves to support antiquated Python releases?

As a data point, I successfully transitioned Firefox's build system from requiring Python 2.5+ to 2.7.3+ and it was relatively pain free. Sure, a few people complained. But as far as I know, not very many new developers are coming in and complaining about the requirement. If we can do it with a few thousand developers, I'm guessing your project can as well.

Update 2014-01-09 16:05:00 PST: This post is being discussed on Slashdot. A lot of the comments talk about Python 3. Python 3 is its own set of considerations. The intended focus of this post is strictly about dropping support for Python 2.6 and below. Python 3 is related in that porting Python 2.x to Python 3 is much easier the higher the Python 2.x version. This especially holds true when you want to write Python that works simultaneously in both 2.x and 3.x.

Python Package Providing Clients for Mozilla Services

January 06, 2014 at 10:45 AM | categories: Python, Mozilla

I have a number of Python projects and tools that interact with various Mozilla services. I had authored clients for all these services as standalone Python modules so they could be reused across projects.

I have consolidated all these Python modules into a unified source control repository and have made the project available on PyPI. You can install it by running:

$ pip install mozautomation

Currently included in the Python package are:

A client for treestatus.mozilla.org
Module for extracting cookies from Firefox profiles (useful for programmatically getting Bugzilla auth credentials).
A client for reading and interpretting the JSON dumps of automation jobs
An interface to a SQLite database to manage associations between Mercurial changesets, bugs, and pushes.
Rudimentary parsing of commit messages to extract bugs and reviewers.
A client to obtain information about Firefox releases via the releases API
A module defining common Firefox source repositories, aliases, logical groups (e.g. twigs and integration trees), and APIs for fetching pushlog data.
A client for the self serve API

Documentation and testing is currently sparse. Things aren't up to my regular high quality standard. But something is better than nothing.

If you are interested in contributing, drop me a line or send pull requests my way!

Python Bindings Updates in Clang 3.1

May 14, 2012 at 12:05 AM | categories: Python, Clang, compilers

Clang 3.1 is scheduled to be released any hour now. And, I'm proud to say that I've contributed to it! Specifically, I've contributed improvements to the Python bindings, which are an interface to libclang, the C interface to Clang.

Since 3.1 is being released today, I wanted to share some of the new features in this release. An exhaustive list of newly supported APIs is available in the release notes.

Diagnostic Metadata

Diagnostics are how Clang represents warnings and errors during compilation. The Python bindings now allow you to get at more metadata. Of particilar interest is Diagnostic.option. This property allows you to see the compiler flag that triggered the diagnostic. Or, you could query Diagnostic.disable_option for the compiler flag that would silence this diagnostic.

These might be useful if you are analyzing diagnostics produced by the compiler. For example, you could parse source code using the Python bindings and collect aggregate information on all the diagnostics encountered.

Here is an example:

from clang.cindex import Index

index = Index.create()
tu = index.parse('hello.c')

for diag in tu.diagnostics:
    print diag.severity
    print diag.location
    print diag.spelling
    print diag.option

Or, if you are using the Python bindings from trunk:

from clang.cindex import TranslationUnit

tu = TranslationUnit.from_source('hello.c')
...

Sadly, the patch that enabled this simpler usage did not make the 3.1 branch.

Finding Entities from Source Location

Two new APIs, SourceLocation.from_position and Cursor.from_location, allow you to easily extract a cursor in the AST from any arbitrary point in a file.

Say you want to find the element in the AST that occupies column 6 of line #10 in the file foo.c:

from clang.cindex import Cursor
from clang.cindex import Index
from clang.cindex import SourceLocation

index = Index.create()
tu = index.parse('foo.c')

f = File.from_name(tu, 'foo.c')

location = SourceLocation.from_position(tu, f, 10, 6)
cursor = Cursor.from_location(tu, location)

Of course, you could do this by iterating over cursors in the AST until one with the desired source range is found. But, that would involve more API calls.

I would like to say that these APIs feel klunky to me. There is lots of redundancy in there. In my opinion, there should just be a TranslationUnit.get_cursor(file='foo.c', line=10, column=6) that does the right thing. Maybe that will make it into a future release. Maybe it won't. After all, the Python bindings are really a thin wrapper around the C API and an argument can be made that there should be minimal extra logic and complexity in the Python bindings. Time will tell.

Type Metadata

It is now possible to access more metadata on Type instances. For example, you can:

See what the elements of an array are using Type.get_array_element_type
See how many elements are in a static array using Type.get_array_element_count
Determine if a function is variadic using Type.is_function_variadic
Inspect the Types of function arguments using Type.argument_types

In this example, I will show how to iterate over all the functions declared in a file and to inspect their arguments.

from clang.cindex import CursorKind
from clang.cindex import Index
from clang.cindex import TypeKind

index = Index.create()
tu = index.parse('hello.c')

for cursor in tu.cursor.get_children():
    # Ignore AST elements not from the main source file (e.g.
    # from included files).
    if not cursor.location.file or cursor.location.file.name != 'hello.c':
        continue

    # Ignore AST elements not a function declaration.
    if cursor.kind != CursorKind.FUNCTION_DECL:
        continue

    # Obtain the return Type for this function.
    result_type = cursor.type.get_result()

    print 'Function: %s' % cursor.spelling
    print '  Return type: %s' % result_type.kind.spelling
    print '  Arguments:'

    # Function has no arguments.
    if cursor.type.kind == TypeKind.FUNCTIONNOPROTO:
        print '    None'
        continue

    for arg_type in cursor.argument_types():
        print '    %s' % arg_type.kind.spelling

This example is overly simplified. A more robust solution would also inspect the Type instances to see if they are constants, check for pointers, check for variadic functions, etc.

An example application of these APIs is to build a tool which automatically generated ctypes or similar FFI bindings. Many of these tools today use custom parsers. Why invent a custom (and likely complex) parser when you can call out into Clang and have it to all the heavy lifting for you?

Future Features

As I write this, there are already a handful of Python binding features checked into Clang's SVN trunk that were made after the 3.1 branch was cut. And, I'm actively working at integrating many more.

Still to come to the Python bindings are:

Better memory management support (currently, not all references are kept everywhere, so it is possible for a GC to collect and dispose of objects that should be alive, even though they are not in scope).
Support for token API (lexer output)
More complete coverage of Cursor and Type APIs
More friendly APIs

I have a personal goal for the Python bindings to cover 100% of the functionality in libclang. My work towards that goal is captured in my python features branch on GitHub. I periodically clean up a patch, submit it for review, apply feedback, and commit. That branch is highly volatile and I do rebase. You have been warned.

Furthermore, I would like to add additional functionality to libclang [and expose it to Python]. For example, I would love for libclang to support code generation (i.e. compiling), not just parsing. This would enable all kinds of nifty scenarios (like channeling your build system's compiler calls through a proxy which siphons off metadata such as diagnostics).

Credits and Getting Involved

I'm not alone in my effort to improve Clang's Python bindings. Anders Waldenborg has landed a number of patches to add functionality and tests. He has also been actively reviewing patches and implementing official LLVM Python bindings! On the reviewing front, Manuel Klimek has been invaluable. I've lost track of how many bugs he's caught and good suggestions he's made. Tobias Grosser and Chandler Carruth have also reviewed their fair share of patches and handled community contributions.

If you are interested in contributing to the Python bindings, we could use your help! You can find me in #llvm as IndyGreg. If I'm not around, the LLVM community is generally pretty helpful, so I'm sure you'll get an answer. If you prefer email, send it to the cfe-dev list.

If you have any questions, leave them in the comments or ask using one of the methods above.

« Previous Page