Firefox Source Documentation Versus MDN
December 30, 2014 at 12:00 PM | categories: Mozilla

The Firefox source tree has had in-tree documentation powered by Sphinx for a while now. However, its canonical home has been a hard-to-find URL on ci.mozilla.org. I finally scratched an itch and wrote patches to make the docs easier to build. So, starting today, the docs are available on Read the Docs at https://gecko.readthedocs.org/en/latest/!
While I was scratching itches, I decided to play around with another documentation-related task: automatic API documentation. I have a limited proof-of-concept for automatically generating XPIDL interface documentation. Essentially, we use the in-tree XPIDL parser to parse .idl files and turn the resulting Python object representation into reStructuredText, which Sphinx parses and renders into pretty HTML for us. The concept can be applied to any source input, such as WebIDL and JavaScript code. I chose XPIDL because a parser is readily available and I know Joshua Cranmer has expressed interest in automatic XPIDL documentation generation. (As an aside, JavaScript tooling that supports the flavor of JavaScript used internally by Firefox is very limited. We need to prioritize removing Mozilla extensions to JavaScript if we ever want to start using the awesome tooling that exists in the wild.)
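The XPIDL-to-reStructuredText step described above can be sketched roughly as follows. This is not the proof-of-concept code and it does not use the in-tree XPIDL parser's API: the dataclasses (Interface, Attribute, Method, Param) and the interface_to_rst function are hypothetical stand-ins for whatever object model a parser hands back, and nsIExample is a made-up interface. The output sticks to plain reStructuredText (section titles, definition lists and field lists), so Sphinx can render it without custom directives.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical stand-ins for the objects an IDL parser might return.
# These are NOT the in-tree xpidl parser's classes.
@dataclass
class Param:
    name: str
    type: str
    doc: str = ""
    direction: str = "in"  # XPIDL params are in, out, or inout


@dataclass
class Attribute:
    name: str
    type: str
    readonly: bool = False
    doc: str = ""


@dataclass
class Method:
    name: str
    return_type: str
    params: list[Param] = field(default_factory=list)
    doc: str = ""


@dataclass
class Interface:
    name: str
    doc: str = ""
    attributes: list[Attribute] = field(default_factory=list)
    methods: list[Method] = field(default_factory=list)


def heading(text: str, char: str) -> list[str]:
    """Return a reStructuredText section title plus a trailing blank line."""
    return [text, char * len(text), ""]


def interface_to_rst(iface: Interface) -> str:
    """Render one interface as reStructuredText for Sphinx to consume."""
    lines = heading(iface.name, "=")
    if iface.doc:
        lines += [iface.doc, ""]
    if iface.attributes:
        lines += heading("Attributes", "-")
        for attr in iface.attributes:
            ro = "readonly " if attr.readonly else ""
            # Definition list: the term line, then an indented definition.
            lines.append(f"``{ro}attribute {attr.type} {attr.name}``")
            lines += [f"   {attr.doc or '(undocumented)'}", ""]
    if iface.methods:
        lines += heading("Methods", "-")
        for m in iface.methods:
            args = ", ".join(f"{p.direction} {p.type} {p.name}" for p in m.params)
            lines.append(f"``{m.return_type} {m.name}({args})``")
            lines += [f"   {m.doc or '(undocumented)'}", ""]
            # A field list documents each parameter inside the definition.
            for p in m.params:
                lines.append(f"   :param {p.name}: {p.doc or '(undocumented)'}")
            if m.params:
                lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    example = Interface(
        name="nsIExample",  # made-up interface for illustration
        doc="An example interface.",
        attributes=[Attribute("path", "AString", readonly=True, doc="Full path.")],
        methods=[
            Method("copyTo", "void",
                   [Param("newName", "AString", "Name of the copy.")],
                   doc="Copy the file."),
        ],
    )
    print(interface_to_rst(example))
```

Note how an empty doc string turns into a visible "(undocumented)" marker: the same hook is where a real generator could start emitting warnings instead.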
As I was implementing this proof-of-concept, I was looking at XPIDL interface documentation on MDN to see how things are presented today. After perusing MDN for a bit and comparing its content against what I was able to derive from the source, something became extremely clear: MDN has significantly more content than the canonical source code. Obviously the .idl files document the interfaces, their attributes, their methods, and all the types and names in between: that's the very definition of an IDL. But what was generally missing from the source code were comments. What does this method do? What is each argument used for? Things like example usage are almost non-existent in the source code. MDN, by contrast, typically has no shortage of all these things.
As I was grasping the reality that MDN has a lot of out-of-tree supplemental content, I started asking myself: what's the point of automatic API docs? Is manual document curation on MDN good enough? This question has sort of been tearing me apart. Let me try to explain.
MDN is an amazing site. You can tell a lot of love has gone into making the experience and much of its content excellent. However, the content around the technical implementation and internals of Gecko/Firefox generally sucks. There are some exceptions to the rule, but I find things like internal API documentation to be lackluster on average. It is rare for me to find documentation that is up-to-date and useful. It is common to find documentation that is partial and incomplete. It is very common to find things like JSMs not documented at all. I think this is a problem: I'd argue the lack of good documentation raises the barrier to contributing. Furthermore, writing and maintaining excellent low-level documentation is too much effort.
I currently question the value of API and low-level documentation existing on MDN. Specifically, I think things like JSM API docs (like Sqlite.jsm) and XPIDL interface documentation (like nsIFile) don't belong on MDN, at least not in wiki form. Instead, I believe that documentation like this should live in and be derived from source code. Now, if the MDN site wants to expose this as read-only content, or if MDN wants to enable the content to be annotated in a wiki-like manner (the way MSDN and the PHP documentation allow user comments), that's perfectly fine by me. Here's why.
First, if I must write separate-from-source-code API documentation on MDN (or any other platform for that matter), I must now perform extra work or forgo either the source code or external documentation. In other words, if I write in-line documentation in the source code, I must spend extra effort to essentially copy large parts of it to MDN. And I must continue to spend extra effort to keep updates in sync. If I don't want to spend that extra effort (I'm as lazy as you), I have to choose between documenting the source code or documenting MDN. If I choose the source code, people either have to read the source to read the docs (because we don't generate documentation from source today) or someone else has to duplicate the docs (overall more work). If I choose to document on MDN, then people reading the source code (probably because they want to change it) are deprived of additional context that would make that process easier. This is a lose-lose scenario and a general waste of expensive people-time.
Second, I prefer having API documentation derived from source code because I feel it results in more accurate documentation with a higher likelihood of remaining accurate and in sync with reality. Think about it: when was the last time you reviewed changes to a JSM and searched MDN for content that needed updating? I'm sure there are some pockets of people who do this right. But I've written dozens of JavaScript patches for Firefox and I'm pretty sure I've been asked to update external documentation less than 5% of the time. Inline source documentation, however, is another matter entirely. Because the documentation is often proximal to the code that changed, I frequently (a) go ahead and make the documentation changes, because everything is right there and it's low overhead to change them as I adjust the source, or (b) am asked to update in-line docs when a reviewer sees I forgot to. Generally speaking, things tend to stay in sync and fewer bugs form when everything is proximally located. By fragmenting documentation between source code and external services like MDN, we increase the likelihood that things fall out of sync. This results in misleading information and raises the barriers to contribution and change. In other words, developer inefficiency.
Third, having API documentation derived from source code opens up numerous possibilities to further aid developer productivity and improve the usefulness of documentation. For example:
- We can parse @param references out of documentation and issue warnings/errors when documentation doesn't match the AST.
- We can issue warnings when code isn't documented.
- We can include in-line examples and execute and verify these as part of builds/tests.
- We can more easily cross-reference APIs because everything is documented consistently. We can also issue warnings when cross-references no longer work.
- We can generate files that let editors and IDEs display in-line API docs or complete function names as you type, allowing people to code faster.
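The first two ideas in the list, flagging @param mismatches and undocumented code, can be sketched in a few dozen lines. Firefox's JavaScript and XPIDL would each need their own parser, so this sketch assumes Python source instead, purely because Python's standard library ships an AST module (ast). The JSDoc-style @param tags inside Python docstrings and the check_docs helper are illustrative assumptions, not an existing Mozilla tool.

```python
from __future__ import annotations

import ast
import re

# Matches JSDoc-style "@param name" or "@param {type} name" tags.
PARAM_TAG = re.compile(r"@param\s+(?:\{[^}]*\}\s+)?(\w+)")


def check_docs(source: str) -> list[str]:
    """Warn about undocumented functions and @param/signature mismatches."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        where = f"line {node.lineno}: {node.name}()"
        doc = ast.get_docstring(node)
        if doc is None:
            warnings.append(f"{where} is not documented")
            continue
        # Parameter names from the AST (*args/**kwargs ignored for brevity).
        args = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        actual = [a.arg for a in args if a.arg not in ("self", "cls")]
        documented = PARAM_TAG.findall(doc)
        for name in documented:
            if name not in actual:
                warnings.append(f"{where} documents unknown parameter '{name}'")
        for name in actual:
            if name not in documented:
                warnings.append(f"{where} does not document parameter '{name}'")
    return warnings


SAMPLE = '''
def open_db(path, shared):
    """Open a database.

    @param path Filesystem path of the database.
    @param readonly Whether to open the database read-only.
    """

def close_db(conn):
    pass
'''

if __name__ == "__main__":
    for warning in check_docs(SAMPLE):
        print(warning)
```

Run against SAMPLE, this reports that open_db documents a readonly parameter its signature doesn't have, that shared is undocumented, and that close_db has no docstring at all. Because the check compares the docs against the AST rather than against text, renaming an argument without updating its docs is caught mechanically.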
While we don't generally do these things today, they are all within the realm of possibility. Sphinx supports doing many of these things. Stop reading and run mach build-docs right now and look at the warnings from malformed documentation. I don't know about you, but I love when my tools tell me when I'm creating a burden for others.
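The third idea in the list, executing in-line examples, already exists in Python's standard doctest module, and Sphinx ships a sphinx.ext.doctest extension built on the same idea. A minimal sketch, assuming a hypothetical normalize_path helper whose docstring examples double as tests:

```python
import doctest


def normalize_path(path: str) -> str:
    """Collapse repeated slashes and drop any trailing slash.

    >>> normalize_path("a//b/")
    'a/b'
    >>> normalize_path("/")
    '/'
    """
    parts = [p for p in path.split("/") if p]
    prefix = "/" if path.startswith("/") else ""
    return prefix + "/".join(parts)


def run_examples(obj) -> doctest.TestResults:
    """Execute every >>> example in obj's docstring and tally the results."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(obj, globs={obj.__name__: obj}):
        runner.run(test)  # failing examples are reported on stdout
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    results = run_examples(normalize_path)
    print(f"{results.failed} failed out of {results.attempted} examples")
```

Wiring a runner like run_examples into a build or test job means a documented example that drifts out of sync with the code fails loudly instead of silently misleading readers.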
There really is so much more we could be doing with source-derived documentation. And I argue managing it would take less overall work and would result in higher quality documentation.
But the world of source-derived documentation isn't all roses. MDN has a very important advantage: it's a wiki. Just log in, edit in a WYSIWYG editor, and save. It's so easy. The moment we move to source-derived documentation, we introduce the massive Firefox source repository, the Firefox code review process, bugs/Bugzilla, version control overhead (although versioning documentation is another plus for source-derived documentation), landing changes, extra cost to Mozilla for building and running those checkins (even if they contain docs-only changes, sadly), and the time and cognitive burden associated with each one. That's a lot of extra work compared to clicking a few buttons on MDN! Moving documentation editing out of MDN and into the Firefox patch submission world would be a step in the wrong direction in terms of fostering contributions. Should someone really have to go through all that just to correct a typo? I have no doubt we'd lose contributors if we switched the change contribution process. And considering our lackluster track record of writing inline documentation in source, I don't feel great about losing any person who contributes documentation, no matter how small the contribution.
And this is my dilemma: the existing source-or-MDN solution is sub-par for pretty much everything except ease of contribution on MDN, yet deploying nice tools (like Sphinx) to address the suckitude would make contributing more difficult. Both alternatives suck.
I intend to continue this train of thought in a subsequent post. Stay tuned.