A Few Billion Lines of Code Later: CACM article describing static analysis in the real world

The people who created Coverity wrote a nice article a few years back, published in the Communications of the ACM, called A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World. The article describes the challenges of developing a commercial static-analysis tool. The whole article is worth reading. Here are the key points:

  • To analyze the code, you need to find it first. This is highly non-trivial because companies have a wide variety of build tools. Coverity solved this by intercepting system calls encountered during the build process.
  • Getting the code to parse is also difficult. Before the static analysis can even run, we need an AST. But parsing poses problems, because different compilers (and different versions of the same compiler) accept slightly different variants of a language. And if the compiler accepts some construct, developers treat the compiler as the source of truth, even when the construct is malformed according to the language standard.
  • After parsing comes actual static analysis. They chose to make their tool unsound, to avoid a large number of false positives. I also believe that unsoundness is often unavoidable in real-world static-analysis tools, and coauthored a paper about this in 2015.
  • They try very hard to keep the rate of false positives low, below 20%. They find that programmers don't want to wade through many false-positive warnings to search for the real bugs. This is my experience as well from when I worked on Closure Compiler. At Google, it is typical to require a false-positive rate below 10%.
  • Explainability of the errors is very important. If a developer doesn't understand why an error happens, they will flag it as a false positive, even if it is a real bug. So, Coverity avoids highly sophisticated analyses, because they produce errors that are hard to explain.
Overall, the article is a good reminder of the differences between building static analyses in industry and academia. In academia, one typically handles a small, well-defined subset of a language, whereas in industry one has to handle a wide variety of dialects, including code that is invalid according to the standard.

Also, in academia one typically starts from soundness and employs sophisticated analysis techniques to try to get a precise analysis, which may or may not produce a small number of false positives. Academic analyses often don't scale to programs beyond a few thousand lines of code.

In industry, soundness is usually quickly abandoned. For an analysis to be useful, it must be scalable (handle million-line code bases and ideally finish within a few minutes) and it must produce few false positives. Sophisticated analysis techniques are often avoided.
