Any C++ developers in the audience?
C hackers?
Currently we rely too much on heroic efforts and sheer brute force by developers. Throwing more people at the code doesn't help.
The general impression is that initially the developer is fully in control of a program, but as the program grows it develops a life of its own and the developer becomes more and more helpless.
Code is always growing while our ability to understand the codebase is shrinking.
There is no cure for this, but this talk will show how we can forestall the inevitable doom.
Little ability to ensure APIs are used correctly
Hard to ensure optimizations are not broken
Closer Cooperation is the Way Out
Open source tools are in a position to cross-pollinate. Yet in reality there is relatively little [vertical in relation to the diagram] cooperation spanning projects.
I'm not sure how other open source projects work, but generally mature projects are treated as black boxes, which is eerily similar to non-open software.
Mozilla's work is a step towards a future with more cooperation.
Static Analysis?
Ability to treat code as data for non-compilation purposes
Useful for:
Finding bugs in code
Generating bindings
Visualizing the codebase
Getting rid of dead code
Should be an essential part of software development
Static analysis tools that we may be familiar with:
Coverity, and Sparse, the Linux kernel's static analysis tool.
A static analysis framework is to code roughly what the DOM is to webpages. Just imagine having to customize webpages by using regexps and string insertions here and there.
So once you have a tree-like representation of the source, you can do the things listed on the slide.
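As a rough illustration of "code as data", here is a minimal sketch of one use from the slide, getting rid of dead code. It assumes the source has already been reduced to a toy call graph; the CallGraph type and findDeadFunctions are hypothetical names made up for this example and are not part of GCC or any Mozilla tool.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model: each function name maps to the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Return every function in the graph that cannot be reached from `entry`.
std::set<std::string> findDeadFunctions(const CallGraph& graph,
                                        const std::string& entry) {
    std::set<std::string> reachable;
    std::vector<std::string> worklist{entry};
    while (!worklist.empty()) {
        std::string fn = worklist.back();
        worklist.pop_back();
        if (!reachable.insert(fn).second) continue;  // already visited
        auto it = graph.find(fn);
        if (it == graph.end()) continue;  // callee with no known body
        for (const auto& callee : it->second) worklist.push_back(callee);
    }
    std::set<std::string> dead;
    for (const auto& node : graph) {
        if (reachable.count(node.first) == 0) dead.insert(node.first);
    }
    return dead;
}
```

A real analysis would build the graph from the compiler's own representation rather than by hand, which is the point of the slides that follow.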
Why GCC?
De facto standard C++ compiler
Trivial integration with build-system
Incomplete alternatives:
LLVM Clang
Elsa
GCC isn't really a choice. It's more of a matter of why would one NOT use GCC for static analysis? It is both THE C++ compiler on open source platforms AND THE ONLY C++ compiler that works.
When I started working on analyzing C++, there was a lot of folklore about how abysmal the GCC intermediate forms were.
Clang has the potential to become a formidable GCC competitor, but at the moment its C++ frontend isn't complete, so it's not in the running.
I started out with Elsa, a from-scratch C++ parser that is well suited for refactoring code, but not so well for analysis. After an initial failed attempt with Elsa, I moved on to GCC and never looked back.
The other problem is that any non-GCC C++ frontend will end up in a C++ arms race with G++ as G++ introduces new features.
Unfortunately when I started, GCC did not support any way of being extended with third-party functionality.
GCC 4.5: Here Come the Plugins
License change allowing plugins
Plugin API combined the best of the 3rd party plugin patches
The Hydras, LLVM, milepost, etc
Release is due any day now. Currently Mozilla relies on 4.3 for production, we'll be moving to 4.5 asap.
Other big 4.5 features: LTO
* Trivia: the license change was by far the biggest stumbling block, once RMS was convinced,
* Luckily for me, the API was largely based on my plugin branch. In general the GCC developers have been extremely receptive to our needs.
GIMPLE is awesome because it basically allows one to treat C++ as C with a few extra features. It's a great simplified AST for static analysis.
GCC attributes are fantastic. Messing with grammars to figure out an annotation scheme isn't trivial (as can be seen with C++0x). So far, GCC attributes have let us annotate everything we wanted to.
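To give a feel for the attribute syntax, here is a small sketch using warn_unused_result, a stock GCC function attribute. The function parseDigit is a hypothetical example; the annotations Mozilla's analyses actually consume are not shown in these notes, so this only illustrates how an attribute attaches to a declaration without touching the grammar.

```cpp
// warn_unused_result is a standard GCC attribute: the compiler warns when a
// caller discards the return value. parseDigit is a made-up example function.
__attribute__((warn_unused_result))
bool parseDigit(char c, int* out) {
    if (c < '0' || c > '9') return false;  // not a digit, *out untouched
    *out = c - '0';
    return true;
}
```

The same mechanism can carry project-specific markers that a plugin looks for, which is why no new keywords or parser changes are needed.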
The Hydras
GCC Plugins for code analysis
Analyses expressed as JavaScript scripts
Concise analyses
Errors in analyses do not crash the compiler
Easy to create complex datastructures needed for more sophisticated analyses
Why are grep/sed the only widespread analysis/refactoring tools?
Humans don't scale well to millions of LOC
Need to increase developer leverage through automation
Underlying tools should be open source, just like our own codebase
Mozilla is Big and Fast Moving
More than 1.7 million lines of C++
More than 1 million lines of JS
Constantly being optimized for better performance and gaining new features
Can't stop programming to do refactoring. The competitive landscape means we are always looking into any potential wins.
We have tried switching Mozilla to garbage collection, a brand-new JS engine, etc.
Optimizations are very risky in a mature codebase; safeguarding them with static analysis makes them feasible.
Before I get into how Mozilla writes analyses, here is a pretty demo.
Search for nsJARInputStream. Show clicking on parent, members, how to jump to implementation
search
Mozilla Analyses
final.js: Java-like "final" keyword for C++
flow.js: Ensure code in a function flows through a particular label
must-override.js: Force derived classes to override certain methods
override.js: Ensure methods exist in base class
outparams.js: Ensure outparameters and return error codes are in sync
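To make the last item concrete, here is a sketch of the kind of C++ an outparams-style check is concerned with. It assumes the convention is "the outparameter is written exactly when the function returns success", which is my reading of the one-line description above and not a statement of what outparams.js precisely enforces. Result, lookupPort and lookupPortBuggy are hypothetical names.

```cpp
#include <string>

// Hypothetical error-code type standing in for a real project's result codes.
enum Result { OK = 0, FAILED = 1 };

// Follows the assumed convention: *out is written only on the success path.
Result lookupPort(const std::string& scheme, int* out) {
    if (scheme == "http") { *out = 80; return OK; }
    if (scheme == "https") { *out = 443; return OK; }
    return FAILED;  // *out deliberately left untouched
}

// Breaks the assumed convention: some paths report success without writing
// *out. This is the kind of mismatch a static check could flag.
Result lookupPortBuggy(const std::string& scheme, int* out) {
    if (scheme == "http") { *out = 80; }
    return OK;  // any other scheme reaches here with *out never written
}
```

Both functions compile cleanly, which is why a check like this has to live in an analysis pass that can follow every path through the function.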