Importance of Hosting Your Version Control Server
November 13, 2013 at 09:25 AM | categories: Git, Mercurial, Mozilla
The subject of where to host version control repositories comes up a lot at Mozilla. It takes many forms:
- We should move the Firefox repository to GitHub
- I should be allowed to commit to GitHub
- I want the canonical repository to be hosted by Bitbucket
When Firefox development is concerned, Release Engineering puts its foot down and insists the canonical repository be hosted by Mozilla, under a Mozilla hostname. When that's not possible, they set up a mirror on Mozilla infrastructure.
I think a recent issue with the Jenkins project demonstrates why hosting your own version control server is important. The gist is someone force pushed to a bunch of repos hosted on GitHub. They needed to involve GitHub support to recover from the issue. While it appears they largely recovered (and GitHub support deserves kudos - I don't want to take away from their excellence), this problem would have been avoided or the response time significantly decreased if the Jenkins people had direct control over the Git server: they either could have installed a custom hook that would have prevented the pushes or had access to the reflog so they could have easily seen the last pushed revision and force pushed back to it. GitHub doesn't have a mechanism for defining pre-* hooks, doesn't allow defining custom hooks (a security and performance issue for them), and doesn't expose the reflog data.
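As a rough sketch of the kind of server-side hook mentioned above, the following Python pre-receive hook refuses any update that is not a fast-forward. It is only an illustration, not what Jenkins or Mozilla runs. It assumes a self-hosted Git server where you can install scripts under hooks/, and the helper names (is_null, rejected_refs) are made up for this example. The only Git command it relies on is git merge-base --is-ancestor.

```python
"""Sketch of a Git pre-receive hook that refuses force pushes."""
import subprocess
import sys


def is_null(oid):
    """Git sends an all-zero ID when a ref is being created or deleted."""
    return set(oid) == {"0"}


def is_ancestor(old, new):
    """True if `old` is reachable from `new`, i.e. the push is a fast-forward."""
    result = subprocess.run(["git", "merge-base", "--is-ancestor", old, new])
    return result.returncode == 0


def rejected_refs(lines, ancestor_check=is_ancestor):
    """Return the refs whose update would discard existing history.

    Each input line has the form "<old-id> <new-id> <ref-name>", which is
    what Git feeds to a pre-receive hook on stdin.
    """
    rejected = []
    for line in lines:
        if not line.strip():
            continue
        old, new, ref = line.split()
        if is_null(old):
            continue  # a brand new branch or tag cannot lose history
        if is_null(new) or not ancestor_check(old, new):
            rejected.append(ref)  # deletion or non-fast-forward update
    return rejected


def main():
    bad = rejected_refs(sys.stdin)
    for ref in bad:
        print(f"rejected non-fast-forward update to {ref}", file=sys.stderr)
    return 1 if bad else 0  # a non-zero exit makes Git refuse the whole push
```

To use this as a real hook, the file would need to end with a call to sys.exit(main()) and be installed as an executable hooks/pre-receive in the server-side repository.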
Until repository hosting services expose full repository data (such as reflogs) and allow you to define custom hooks, accidents like these will happen and the recovery time will be longer than if you hosted the repo yourself.
It's possible repository hosting services like GitHub and Bitbucket will expose these features or provide a means to quickly recover. If so, kudos to them. But larger, more advanced projects will likely employ custom hooks and considering custom hooks are a massive security and performance issue for any hosted service provider, I'm not going to hold my breath this particular feature is rolled out any time soon. This is unfortunate, as it makes projects seemingly choose between low risk/low convenience and GitHub's vibrant developer community.
Mercurial 2.8 released
November 08, 2013 at 02:30 PM | categories: Mercurial, Mozilla
Mercurial 2.8 has been released.
The changes aren't as sexy as previous releases. But there are a handful of bug fixes that seem useful to pull in. People may also find the new shelve extension useful.
I encourage Mozillians to keep their Mercurial up to date. I once went around the San Francisco office and stood behind people as they upgraded to a modern Mercurial. For the next few weeks I was hearing a lot of "OMG Mercurial is so much better now." Don't handicap yourself by running an older, buggy Mercurial.
If you don't yet feel comfortable running 2.8, 2.7 should be safe.
Using Mercurial to query Mozilla metadata
November 08, 2013 at 09:42 AM | categories: Mercurial, Mozilla
I have updated my Mercurial extension tailored for Gecko/Firefox development with features that support rich querying of Mozilla/Gecko-development specific metadata!
The extension now comes with a bag full of revision set selectors and template keywords. You can use them to query and format mozilla-central metadata from the repository.
Revision set selectors
You can now select changesets referencing a specific bug number:
hg log -r 'bug(931383)'
Or that were reviewed by a specific person:
hg log -r 'reviewer(gps)'
Or were reviewed or not reviewed:
hg log -r 'reviewed()'
hg log -r 'not reviewed()'
You can now select changesets that are present in a specific tree:
hg log -r 'tree(central)'
I've also introduced support to query changesets you influenced:
hg log -r 'me()'
(This finds changesets you authored or reviewed.)
You can select changesets that initially landed on a specific tree:
hg log -r 'firstpushtree(central)'
You can select changesets marked as DONTBUILD:
hg log -r 'dontbuild()'
You can select changesets that don't reference a bug:
hg log -r 'nobug()'
You can select changesets that were push heads for a tree:
hg log -r 'pushhead(central)'
(This would form the basis of a push-aware bisection tool - an excellent idea for a future feature in this extension.)
You can combine these revset selector functions with other revset selectors to do some pretty powerful things.
To select all changesets on inbound but not central:
hg log -r 'tree(inbound) - tree(central)'
To find all your contributions on beta but not release:
hg log -r 'me() & (tree(beta) - tree(release))'
To find all changesets referencing a specific bug that have landed in Aurora:
hg log -r 'bug(931383) and tree(aurora)'
To find all changesets marked DONTBUILD that landed directly on central:
hg log -r 'dontbuild() and firstpushtree(central)'
To find all non-merge changesets that don't reference a bug:
hg log -r 'not merge() and nobug()'
Neato!
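Conceptually, each selector yields a set of changesets and the operators combine those sets. The sketch below mimics that with plain Python sets. The revision numbers are invented for illustration and nothing here calls Mercurial.

```python
# Hypothetical revision numbers standing in for what each selector returns.
tree_inbound = {100, 101, 102, 103, 104}
tree_central = {100, 101, 102}
me = {101, 104}

# Like 'tree(inbound) - tree(central)': on inbound but not on central.
only_inbound = tree_inbound - tree_central

# Like 'me() & (tree(inbound) - tree(central))': my changesets not yet merged.
mine_pending = me & only_inbound

print(sorted(only_inbound))  # [103, 104]
print(sorted(mine_pending))  # [104]
```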
Template keywords
You can also now print some Mozilla information when using templates.
To print the main bug of a changeset, use:
{bug}
To retrieve all referenced bugs:
{bugs} {join(bugs, ', ')}
To print the reviewers:
{reviewer} {join(reviewers, ', ')}
To print the first version a changeset appeared in a specific channel:
{firstrelease} {firstbeta} {firstaurora} {firstnightly}
To print the estimated first Aurora and Nightly date for a changeset, use:
{auroradate} {nightlydate}
(Getting the exact first Aurora and Nightly dates requires consulting 3rd party services, which we don't currently do. I'd like to eventually integrate these into the extension. For now, it just estimates dates from the pushlog data.)
You can also print who first pushed a changeset and which tree it was pushed to:
{firstpushuser} {firstpushtree}
You can also print the TBPL URL with the results of the first push:
{firstpushtbpl}
Here is an example that prints channel versions and dates for each changeset:
hg log --template '{rev} Nightly: {firstnightly} {nightlydate}; Aurora {firstaurora} {auroradate}; Beta: {firstbeta}; Release: {firstrelease}\n'
Putting it all together
Of course, you can combine selectors and templates to create some mighty powerful queries.
To look at your impact on Mozilla, do something like:
hg log --template '{rev} Bug {bug}; Release {firstrelease}\n' -r 'me()'
You can easily formulate a status report for your activity in the past week:
hg log --template '{firstline(desc)}\n' -r 'firstpushdate(-7) and me()'
You can also query Mercurial to see where changesets have been landing in the past 30 days:
hg log --template '{firstpushtree}\n' -r 'firstpushdate(-30)' | sort | uniq -c
You can see who has been reviewing lots of patches lately:
hg log --template '{join(reviewers, "\n")}\n' -r 'firstpushdate(-30)' | sort | uniq -c | sort -n
(smaug currently has the top score, edging out my 116 reviews with 137.)
If you want to reuse templates (instead of having to type them on the command line), you can save them as style files. Search the Internets to learn how to use them. You can even change your default style so the default output from hg log contains everything you'd ever want to know about a changeset!
Keeping it running
Many of the queries rely on data derived from multiple repositories and pushlog data that is external to the repository.
To get best results, you'll need to be running a monolithic/unified Mercurial repository. You can either assemble one locally with this extension by periodically pulling from the separate repos:
hg pull releases
hg pull integration
Or, you can pull from my personal unified repo.
You will also need to ensure the pushlog data is current. If you pull directly from the official repos, this will happen automatically. To be sure, run:
hg pushlogsync
Finally, you can force a repopulation of cached bug data by running:
hg buginfo --reset
Over time, I want all this to automagically work. Stay tuned.
Comments and future improvements
I implemented this feature to save myself from having to go trawling through Bugzilla and repository history to answer questions and to obtain metrics. I can now answer many questions via simple Mercurial one-liners.
Custom revision set selectors and template keywords are a pretty nifty feature of Mercurial. They demonstrate how you can extend Mercurial to be aware of more than just tracking commits and files. As I've said before and will continue to say, the extensibility of Mercurial is really its killer feature, especially for organizations with well-defined processes (like Mozilla). The kind of extensibility I achieved with this extension with custom queries and formatting functions is just not possible with Git (at least not with the reference C implementation that the overwhelming majority of Git users use).
There are numerous improvements that can be made to the extension. Obviously more revision set selectors and template keywords can be added. The parsing routine to extract bugs and reviewers isn't the most robust in the world. I copied some existing Mozilla code. It does well at detecting string patterns but doesn't cope well with extracting lists.
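To give a feel for what such a parsing routine involves, here is a minimal sketch that pulls bug numbers and reviewer lists out of a commit message. It is not the code the extension uses. It assumes the common Mozilla commit message convention of "Bug NNNNNN - description; r=reviewer1,reviewer2", and the function names and sample message are hypothetical.

```python
import re

# "Bug 931383", "bug 931383" and so on.
BUG_RE = re.compile(r"\bbug\s+(\d+)", re.IGNORECASE)
# "r=gps", "sr=gps" or a comma separated list such as "r=gps,smaug".
REVIEW_RE = re.compile(r"\b(?:r|sr)=([\w.,\-]+)")


def parse_bugs(message):
    """Return every bug number referenced in the message, in order."""
    return [int(number) for number in BUG_RE.findall(message)]


def parse_reviewers(message):
    """Return reviewer names in order of appearance, without duplicates."""
    reviewers = []
    for group in REVIEW_RE.findall(message):
        for name in group.split(","):
            name = name.strip(".-")  # drop punctuation trailing the list
            if name and name not in reviewers:
                reviewers.append(name)
    return reviewers


message = "Bug 931383 - Add revset selectors; r=gps,smaug"  # made-up example
print(parse_bugs(message))       # [931383]
print(parse_reviewers(message))  # ['gps', 'smaug']
```

Splitting each r= group on commas is what lets a single annotation yield several reviewers, which is the list handling the paragraph above says is weak.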
I'd also love to better integrate Mercurial with automation results so you can do things like expose a greenpush() selector and do things like hg up -r 'last(tree(inbound)) and greenpush()' (which of course could be exposed as lastgreen(inbound)). Wouldn't that be cool! (This would be possible if we had better APIs for querying individual push results.) It would also be possible to have the Mercurial server expose this data as repository data so clients pull it automatically. That would prevent clients from all needing to query the same 3rd party services. Just a crazy thought.
Speed can be an issue. Calculating the release information ({firstnightly} etc) is currently slower than I'd like. This is mostly due to me using inefficient algorithms and not caching things where I should. Speed issues should be fixed in due time.
Please let me know if you run into any problems or have suggestions for improvements. If you want to implement your own revision set selectors or template keywords, it's easier than you think! I will happily accept patches. Keep in mind that Mercurial can integrate with 3rd party services. So if you want to supplement repository data with data from a HTTP+JSON web service, that's very doable. The sky is the limit.
MacBook Pro Firefox Build Times Comparison
November 05, 2013 at 10:00 AM | categories: Mozilla, build system
Many developers use MacBook Pros for day-to-day Firefox development. So, I thought it would be worthwhile to perform a comparison of Firefox build times for various models of MacBook Pros.
Test setup
The numbers in this post are obtained from 3 generations of MacBook Pros:
- A 2011 Sandy Bridge 4 core x 2.3 GHz with 8 GB RAM and an aftermarket SSD.
- A 2012 Ivy Bridge retina with 4 core x 2.6 GHz, 16 GB RAM, and a factory SSD (or possibly flash storage).
- A 2013 Haswell retina with 4 core x 2.6 GHz, 16 GB RAM, and flash storage.
All machines were running OS X 10.9 Mavericks and were using the Xcode 5.0.1 toolchain (Xcode 5 clang: Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)) to build.
The power settings prevented machine sleep and machines were plugged into A/C power during measuring. I did not use the machines while obtaining measurements.
The 2012 and 2013 machines were very vanilla OS installs. However, the 2011 machine was my primary work computer and may have had a few background services running and may have been slower due to normal wear and tear. The 2012 machine was a loaner machine from IT and has an unknown history.
All data was obtained from mozilla-central revision d4a27d8eda28.
The mozconfig used contained:
export MOZ_PSEUDO_DERECURSE=1
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-firefox.noindex
Please note that the objdir name ends with .noindex to prevent Finder from indexing build files.
I performed all tests multiple times and used the fastest time. I used the time command to obtain measurements of wall, user, and system time.
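The same methodology can be approximated in Python. This is a sketch rather than the procedure used for the numbers in this post (those came from the shell's time command). The helper names time_command and best_of are hypothetical, and the child CPU counters from os.times() are only populated on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (wall, user, system) seconds for the child."""
    before = os.times()
    start = time.monotonic()
    subprocess.run(argv, check=True)
    wall = time.monotonic() - start
    after = os.times()
    # children_* accumulates CPU time of child processes that have exited.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return wall, user, system


def best_of(argv, runs=3):
    """Repeat the measurement and keep the fastest run.

    Tuples compare element by element, so min() picks the lowest wall time.
    """
    return min(time_command(argv) for _ in range(runs))


# Trivial stand-in command; a real measurement would time the build instead.
wall, user, system = best_of([sys.executable, "-c", "pass"])
print(f"wall={wall:.3f}s user={user:.3f}s system={system:.3f}s")
```

Swapping the stand-in command for something like ["./mach", "build"] would time an actual build, with the usual caveat that a clobber is needed between runs for clobber measurements.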
Results
Configure Times
The results of mach configure are as follows (all times in seconds):
Machine | Wall time | User time | System time |
---|---|---|---|
2011 | 29.748 | 17.921 | 11.644 |
2012 | 26.765 | 15.942 | 10.501 |
2013 | 21.581 | 12.597 | 8.595 |
Clobber build no ccache
mach build was performed after running mach configure. ccache was not enabled.
Machine | Wall time | User time | System time | Total CPU time |
---|---|---|---|---|
2011 | 22:29 (1349) | 145:35 (8735) | 12:03 (723) | 157:38 (9458) |
2012 | 15:00 (900) | 94:18 (5658) | 8:14 (494) | 102:32 (6152) |
2013 | 11:13 (673) | 69:55 (4195) | 6:04 (364) | 75:59 (4559) |
Clobber build with empty ccache
mach build was performed after running mach configure. ccache was enabled. The ccache cache was cleared before running mach configure.
Machine | Wall time | User time | System time | Total CPU time |
---|---|---|---|---|
2011 | 25:57 (1557) | 161:30 (9690) | 18:21 (1101) | 179:51 (10791) |
2012 | 16:58 (1018) | 104:50 (6290) | 12:32 (752) | 117:22 (7042) |
2013 | 12:59 (779) | 79:51 (4791) | 9:24 (564) | 89:15 (5355) |
Clobber build with populated ccache
mach build was performed after running mach configure. ccache was enabled and the ccache was populated with the results of a prior build. In theory, all compiler invocations should be serviced by ccache entries.
This measure is a very crude way to measure how fast clobber builds would be if compiler invocations were nearly instantaneous.
Machine | Wall time | User time | System time |
---|---|---|---|
2011 | 3:59 (239) | 8:04 (484) | 3:21 (201) |
2012 | 3:11 (191) | 6:45 (405) | 2:53 (173) |
2013 | 2:31 (151) | 5:22 (322) | 2:12 (132) |
No-op builds
mach build was performed on a tree that was already built.
Machine | Wall time | User time | System time |
---|---|---|---|
2011 | 1:58 (118) | 2:25 (145) | 0:41 (41) |
2012 | 1:42 (102) | 2:02 (122) | 0:37 (37) |
2013 | 1:20 (80) | 1:39 (99) | 0:28 (28) |
binaries no-op
mach build binaries was performed on a fully built tree. This results in nothing being executed. It's a way to test the overhead of the binaries make target.
Machine | Wall time | User time | System time |
---|---|---|---|
2011 | 4.21 | 4.38 | 0.92 |
2012 | 3.17 | 3.37 | 0.71 |
2013 | 2.67 | 2.75 | 0.56 |
binaries touch single .cpp
mach build binaries was performed on a fully built tree after touching the file netwerk/dns/nsHostResolver.cpp. ccache was enabled but cleared before running this test. This test simulates the common C++ developer workflow of changing a C++ file and recompiling.
Machine | Wall time | User time | System time |
---|---|---|---|
2011 | 12.89 | 13.88 | 1.96 |
2012 | 10.82 | 11.63 | 1.78 |
2013 | 8.57 | 9.29 | 1.23 |
Tier times
The times of each build system tier were measured on the 2013 Haswell MacBook Pro. These timings were obtained out of curiosity to help isolate the impact of different parts of the build. ccache was not enabled for these tests.
Action | Wall time | User time | System time | Total CPU time |
---|---|---|---|---|
export clobber | 15.75 | 66.11 | 11.33 | 77.44 |
compile clobber | 9:01 (541) | 64:58 (3898) | 5:08 (308) | 70:06 (4206) |
libs clobber | 1:34 (94) | 2:15 (135) | 0:39 (39) | 2:54 (174) |
tools clobber | 9.33 | 13.41 | 2.48 | 15.89 |
export no-op | 3.01 | 9.72 | 3.47 | 13.19 |
compile no-op | 3.18 | 18.02 | 2.64 | 20.66 |
libs no-op | 58.2 | 46.9 | 13.4 | 60.3 |
tools no-op | 8.82 | 12.68 | 1.72 | 14.40 |
Observations and conclusions
The data speaks for itself: the 2013 Haswell MacBook Pro is significantly faster than its predecessors. It clocks in at 2x faster than the benchmarked 2011 Sandy Bridge model (keep in mind the 300 MHz base clock difference) and is ~34% faster than the 2012 Ivy Bridge (at similar clock speed). Personally, I was surprised by this. I was expecting speed improvements over Ivy Bridge, but not 34%.
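These ratios come straight from the wall times of the clobber build without ccache. A quick check in Python, using the seconds from that table:

```python
# Wall-clock seconds for a clobber build without ccache (from the table above).
wall = {"2011": 1349, "2012": 900, "2013": 673}


def speedup(old, new):
    """How many times faster machine `new` is than machine `old`."""
    return wall[old] / wall[new]


print(f"2013 vs 2011: {speedup('2011', '2013'):.2f}x")                   # 2.00x
print(f"2013 vs 2012: {(speedup('2012', '2013') - 1) * 100:.0f}% faster")  # 34% faster
```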
It should go without saying: if you have the opportunity to upgrade to a new, Haswell-based machine: do it. If possible, purchase the upgrade to a 2.6 GHz CPU, as it contains ~13% more MHz than the base 2.3 GHz model: this will make a measurable difference in build times.
It's worth noting the increased efficiency of Haswell over its predecessors. The total CPU time required to build decreased from ~158 minutes to ~103 minutes to 76 minutes! That 76 minute number is worth highlighting because it means if we get 100% CPU saturation during builds, we'll be able to build the tree in under 10 wall time minutes!
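The "under 10 minutes" figure follows from dividing total CPU time by the number of CPUs the scheduler can keep busy. The sketch below assumes 8 logical CPUs (4 cores with Hyper-Threading), which is an assumption on top of the "4 core" figures listed in the test setup, and treats every logical CPU as fully productive, which is optimistic.

```python
# Total CPU seconds for a clobber build without ccache (from the table above).
cpu_seconds = {"2011": 9458, "2012": 6152, "2013": 4559}

LOGICAL_CPUS = 8  # assumption: 4 physical cores with Hyper-Threading


def ideal_wall_minutes(cpu, cpus=LOGICAL_CPUS):
    """Lower bound on wall time if every CPU stayed 100% busy."""
    return cpu / cpus / 60


for machine, cpu in cpu_seconds.items():
    print(f"{machine}: {ideal_wall_minutes(cpu):.1f} minutes at full saturation")
```

For the 2013 machine this gives roughly 9.5 minutes, compared with the measured 11:13.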
I hadn't performed crude benchmarks of high-level build system actions since the MOZ_PSEUDO_DERECURSE work landed and I wanted to use the opportunity of this hardware comparison to grab some numbers.
The overhead of ccache continues to surprise me. On the 2013 machine, enabling ccache increased the wall time of a clobber build by 1:46 and added 13:16 of CPU time. This is an increase of 16% and 17%, respectively.
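The overhead figures can be reproduced from the 2013 rows of the two clobber tables:

```python
# 2013 machine, seconds, from the clobber build tables above.
no_ccache = {"wall": 673, "cpu": 4559}
empty_ccache = {"wall": 779, "cpu": 5355}


def overhead(metric):
    """Return (extra seconds, percentage increase) from enabling ccache."""
    extra = empty_ccache[metric] - no_ccache[metric]
    return extra, extra / no_ccache[metric] * 100


for metric in ("wall", "cpu"):
    extra, percent = overhead(metric)
    minutes, seconds = divmod(extra, 60)
    print(f"{metric}: +{minutes}:{seconds:02d} ({percent:.0f}%)")
```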
It's worth highlighting just how much time is spent compiling C/C++. In our artificial tier measuring results, our clobber build time was ~660 wall time seconds (11 minutes) and used ~4473s CPU time (74:33). Of this, 9:01 wall time and 70:06 CPU time was spent compiling C/C++. This represents ~82% wall time and ~94% CPU time! Please note this does not include linking. Anything we can do to decrease the CPU time used by the compiler will make the build faster.
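The percentages can be recomputed by summing the four clobber rows of the tier table:

```python
# (wall seconds, total CPU seconds) per tier, clobber rows of the tier table.
tiers = {
    "export": (15.75, 77.44),
    "compile": (541, 4206),
    "libs": (94, 174),
    "tools": (9.33, 15.89),
}

total_wall = sum(wall for wall, _ in tiers.values())
total_cpu = sum(cpu for _, cpu in tiers.values())
compile_wall, compile_cpu = tiers["compile"]

print(f"total: {total_wall:.0f}s wall, {total_cpu:.0f}s CPU")
print(f"compile share: {compile_wall / total_wall:.0%} wall, "
      f"{compile_cpu / total_cpu:.0%} CPU")
```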
I also found it interesting to note variances in obtained times. Even on my brand new 2013 Haswell MacBook Pro where I know there aren't many background processes running, wall times could vary significantly. I think I isolated it to CPU bursting and heat issues. If I wait a few minutes between CPU intensive tests, results are pretty consistent. But if I perform CPU intensive tests back-to-back, the run times often vary. The only other thing coming into play could be page caching or filesystem indexing. I accounted for the latter by disabling Finder indexing of the object directory. And, I'd like to think that flash storage is fast enough to remove I/O latency from the equation. Who knows. At the end of the day, laptops aren't servers and OS X is a consumer OS, so I don't expect ultra consistency.
Finally, I want to restate just how fast Haswell is. If you have the opportunity to upgrade, do it.
Distributed Compiling and Firefox
October 31, 2013 at 11:35 AM | categories: Mozilla, build system
If you had infinite CPU cores available and the Firefox build system could use them all for concurrent compilation, Firefox clobber build times would likely be 3-5 minutes instead of ~15 minutes on modern machines. This is a massive win. It therefore should come as no surprise that distributed compiling is very interesting to us.
Up until recently, the benefits of distributed compiling in the Firefox build system couldn't be fully realized. This was because the build system was performing recursive make traversal and make only knew about a tiny subset of the tree's total C++ files at one time. For example, when visiting /layout/base it only knew about 35 of the close to 6000 files that get compiled as part of building Firefox. This meant there was a hard ceiling to the max concurrency the build system could achieve. This ceiling was often higher than the number of cores in an individual machine, so it wasn't a huge issue for single machine builds. But it did significantly limit the benefits of distributed compiling. This all changed recently.
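A toy model makes the ceiling concrete. It is a deliberate simplification, not a description of how make schedules work: it assumes every file takes the same time to compile, that directories are visited strictly one at a time in the recursive case, and that the tree is about 6000 files split into directories of 35 files each (only the 35 and 6000 figures come from the text above; the uniform split is invented).

```python
import math

# Hypothetical layout: 171 directories of 35 files plus one of 15 (6000 total).
dir_sizes = [35] * 171 + [15]


def waves_recursive(sizes, jobs):
    """Compile 'rounds' when make only sees one directory at a time."""
    return sum(math.ceil(files / jobs) for files in sizes)


def waves_flat(sizes, jobs):
    """Compile 'rounds' when make sees every file in the tree at once."""
    return math.ceil(sum(sizes) / jobs)


for jobs in (8, 128):
    print(f"-j{jobs}: recursive={waves_recursive(dir_sizes, jobs)} "
          f"flat={waves_flat(dir_sizes, jobs)}")
```

Under these assumptions -j8 goes from 857 rounds to 750, a modest gain, while -j128 goes from 172 rounds to 47. That matches the point above: the per-directory ceiling barely matters on a single machine but wastes most of a distributed pool.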
As of a few weeks ago, the build system no longer encounters a low ceiling preventing distributed compilation from reaping massive benefits. If you build with make -j128, make will spawn 128 compiler processes when processing the compile tier (which is where most compilation occurs). If your compiler is set to a distributed compiler, you will win.
So, what should you do about it?
I encourage people to set up distributed compilation networks to reap the benefits of distributed compilation. Here are some tools you should know about and some things to keep in mind.
distcc is the tried and proven tool for performing distributed compilation. It's heavily used and gets the job done. It even works on Windows and can perform remote preprocessing, which is a huge win for our tree, where preprocessing can be computationally expensive because of excessive includes. But, it has a few significant drawbacks. Read the next paragraph.
I'm personally more excited about icecream. It has some very compelling advantages over distcc. It has a scheduler that can intelligently distribute load between machines. It uses network broadcast to discover the scheduler. So, you just start the client daemon and if there is a scheduler on the local network, it's all set up. Icecream transfers the compiler toolchain between nodes so you are guaranteed to have consistent output. (With distcc, output may not be consistent if the nodes aren't homogeneous since distcc relies on the system-local toolchain. If different versions are installed on different nodes, you are out of luck.) Icecream also supports cross-compiling. In theory, you can have Linux machines building for OS X, 32-bit machines building for 64-bit, etc. This is all very difficult (if not impossible) to do with distcc. Unfortunately, icecream doesn't work on Windows and doesn't appear to support server-side preprocessing. Although, I imagine both could be made to work if someone put in the effort.
Distributed compilation is very network intensive. I haven't measured, but I suspect Wi-Fi bandwidth and latency constraints might make it prohibitive there. It certainly won't be good for Wi-Fi saturation! If you are in a Mozilla office, please do not attempt to perform distributed compilation over Wi-Fi! For the same reasons, distributed compilation will likely not benefit you if you are attempting to compile on network-distant nodes.
I have set up an icecream server in the Mozilla San Francisco office. If you install the icecream client daemon (iceccd) on your machine, it should just work. I'm not sure what broadcast nets are configured as, but I've successfully had machines on the 7th floor discover it automatically. I guarantee no SLA for this server. Ping me privately if you have difficulty connecting.
I've started very preliminary talks with Mozilla IT about setting up dedicated compiler farms in Mozilla offices. I'm not saying this is coming any time soon. I feel this will have a major impact on developer productivity and I wanted to get the ball rolling months in advance so nobody can claim this is a fire drill.
For distributed compilation to work well, the build system really needs to be aware of distributed compilation. For example, to yield the benefits of distributed compilation with make, you need to pass -j64 or some other large value for concurrency. However, this value would be universal for every task in the build. There are still thousands of processes that must run locally. Using -j64 on these local tasks could cause memory exhaustion, I/O saturation, excessive context switching, etc. But if you decrease the concurrency ceiling, you lose the benefits of distributed compilation! The build system thus needs to be taught when distributed compilation is available and what tasks can be made concurrent so it can intelligently adjust the -j concurrency limit at run-time. This is why we have a higher-level build wrapper tool: mach build. (This is another reason why people should be building through mach instead of invoking make directly.)
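One way to picture the adjustment is a helper that chooses a concurrency limit per class of task. This is a hypothetical sketch, not how mach or the build system currently behaves: the function name, its parameters, and the idea of knowing the number of remote slots up front are all assumptions for illustration.

```python
import os


def jobs_for(task_is_distributable, local_cpus=None, remote_slots=0):
    """Pick a -j value for one class of build task.

    Tasks that can be shipped to other machines (compiler invocations) get
    the local CPUs plus the remote slots. Everything else is capped at the
    local CPU count so it does not exhaust memory or saturate I/O.
    """
    if local_cpus is None:
        local_cpus = os.cpu_count() or 1
    if task_is_distributable and remote_slots > 0:
        return local_cpus + remote_slots
    return local_cpus


# Example: an 8 CPU laptop with 56 compile slots available on the network.
print(jobs_for(True, local_cpus=8, remote_slots=56))   # 64
print(jobs_for(False, local_cpus=8, remote_slots=56))  # 8
```

A wrapper with this kind of knowledge could run the compile tier at the high limit and everything else at the local one.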
No matter what technical solution we employ, I would like the build system to automatically discover and use distributed compilation if it is available. If we need to hardcode Mozilla IP addresses or hostnames into the build system, I'm fine with that. I just don't want developers not achieving much-faster build times because they are ignorant. If you are in a physical location with distributed compilation support, you should get that automatically: fast builds should not be hard.
We can and should investigate distributed compilation as part of release automation. Icecream should mitigate the concerns about build reproducibility since the toolchain is transferred at build time.
I have had success getting Icecream to work with Linux builds. However, OS X is problematic. Specifically, Icecream is unable to create the build environment for distribution (likely modern OS X/Xcode compatibility issue). Details are in bug 927952.
Build peers have a lot on our plate this quarter and making distributed compilation work well is not in our official goals. I would love, love, love if someone could step up and be a hero to make distributed compilation work better with the build system. If you are interested, pop into #build on irc.mozilla.org.
In summary, there are massive developer productivity wins waiting to be realized through distributed compiling. There is nobody tasked to work on this officially. Although, I'd love it if there were. If you find yourself setting up ad-hoc networks in offices, I'd really like to see some kind of discovery in mach. If not, there will be people left behind and that really stinks for those individuals. If you do any work around distributed compiling, please have it tracked under bug 485559.