I recently fixed a timeout issue in SWAN, the static analysis framework for Swift that our lab works on. The culprit was a fixpoint-style algorithm, where the same work items are processed over and over again until no new information is generated. For the particular issue that I fixed, I spent a lot of time chasing the hypothesis that some work items were being unnecessarily revisited over and over again. This turned out not to be the case. Below I describe some of the reasons why I had the hypothesis in the first place, and how I got out of the loop to find the actual issue.
Let’s start off with the issue we were having in SWAN. We were trying to analyze some open source projects, and for many of them, we were timing out during call-graph construction. A call-graph is a directed graph where the nodes are methods in a program and there is an edge from a method A to a method B if A calls B. For languages with only static dispatch (e.g. C without function pointers), this graph is very easy to construct: to check whether there should be an edge from A to B, we can just check whether B shows up as a callee in A’s body. But in languages with dynamic dispatch or function pointers, to check whether A calls B, we must check whether some callee in A’s body can resolve to B dynamically.
SWAN uses an RTA-style call-graph construction algorithm, where we keep track of the classes and protocols that have been instantiated by the program and, for dynamic dispatches, only draw edges to methods of those classes and protocols. The fixpoint part of the algorithm shows up when, while analyzing a method, we find an instantiation of a new class (or protocol). If such a class is found, we must revisit all visited methods and draw edges to methods of this new class. Further, visiting this new class’s methods can lead to the discovery of more classes and protocols, which can lead to more revisiting of already visited methods of the program.
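To make the fixpoint concrete, here is a minimal sketch of an RTA-style construction, written in Python for brevity. Everything here is hypothetical and simplified, not SWAN's actual code: `callees`, `instantiates`, and `resolve` stand in for whatever the real implementation queries.

```python
def rta_call_graph(entry_points, callees, instantiates, resolve):
    # Hypothetical, simplified API:
    #   callees(m)       -> call sites in m, ("static", target) or
    #                       ("dynamic", signature)
    #   instantiates(m)  -> types instantiated in m's body
    #   resolve(sig, ty) -> method implementing `sig` on type `ty`, or None
    edges = set()          # call-graph edges (caller, callee)
    instantiated = set()   # types discovered so far
    reachable = set(entry_points)

    changed = True
    while changed:         # iterate until no new information is generated
        changed = False
        for m in sorted(reachable):
            # A newly discovered type invalidates earlier dynamic-dispatch
            # decisions, which is why already-visited methods get revisited.
            new_types = instantiates(m) - instantiated
            if new_types:
                instantiated |= new_types
                changed = True
            for kind, ref in callees(m):
                if kind == "static":
                    targets = [ref]
                else:  # dynamic dispatch: only instantiated types matter
                    targets = [t for t in (resolve(ref, ty)
                                           for ty in sorted(instantiated)) if t]
                for t in targets:
                    if (m, t) not in edges:
                        edges.add((m, t))
                        changed = True
                    if t not in reachable:
                        reachable.add(t)
                        changed = True
    return edges

# Toy program: main instantiates A and calls "run" dynamically;
# A.run instantiates B and also calls "run" dynamically.
CALLEES = {"main": [("dynamic", "run")],
           "A.run": [("dynamic", "run")],
           "B.run": []}
INSTANTIATES = {"main": {"A"}, "A.run": {"B"}, "B.run": set()}
IMPLS = {("run", "A"): "A.run", ("run", "B"): "B.run"}

edges = rta_call_graph(["main"],
                       lambda m: CALLEES[m],
                       lambda m: INSTANTIATES[m],
                       lambda sig, ty: IMPLS.get((sig, ty)))
```

Note how `main` only gets an edge to `B.run` on a later pass, after visiting `A.run` reveals that `B` is instantiated; that deferred edge is exactly the revisiting described above.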
Now, to the problem at hand: timeouts! I picked a project that SWAN was timing out on and ran SWAN on it under a debugger. During call-graph construction I paused SWAN at various points to see which methods it was visiting at that moment. For the most part, whenever I paused, SWAN was visiting the same group of methods. I had some logs showing that I was pausing during different visits to those methods, so the algorithm was not simply stuck on the same method for a very long time. This is what led me to believe that we were probably revisiting the same methods over and over again.
At this point, I considered two main possibilities: either the code that checks whether new information has been generated was buggy, or the code that generates new information was generating faulty information. I manually inspected the code to find the places where the checks were happening. This took some time, but there seemed to be nothing wrong with the checks. Throughout the next few steps, I would return to the checks just to be sure, and at one point I even convinced myself that the checks were faulty, but I turned out to be wrong.
Checking for faulty new information was more difficult. There were lots of places where new information was potentially being generated. I tried to log the information as it was being generated, but that quickly produced unreadable logs. Besides, what counted as faulty information was not clearly defined, so I could not write logging code that fired only when faulty information was generated. I would run SWAN, and it would keep running without producing anything useful in the logs.
Next, I started logging runtimes for different parts of the program. The first step of the call-graph algorithm takes in a list of entry points to a Swift program, and it traverses the blocks of methods reachable from those entry points while keeping a list of instantiated types (classes and protocols). The entry points are processed more or less independently, so I started timing the processing of each entry point. SWAN considers any uncalled function to be an entry point, and there were around 12,000 entry points in the project I was looking at. The logs showed that most entry points took almost no time at all to process, but then SWAN would get stuck on one entry point until I killed it.
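The per-entry-point timing could look something like the following sketch (again Python, and again hypothetical: `time_entry_points` and its `process` callback are illustrative names, not SWAN's API).

```python
import time

def time_entry_points(entry_points, process, threshold_s=1.0):
    # Process each entry point independently, record how long it took,
    # and collect the slow ones as candidates for closer inspection.
    timings = {}
    slow = []
    for ep in entry_points:
        start = time.perf_counter()
        process(ep)
        elapsed = time.perf_counter() - start
        timings[ep] = elapsed
        if elapsed > threshold_s:
            slow.append(ep)
    return timings, slow

# Trivial usage: two entry points with a no-op processor.
timings, slow = time_entry_points(["f", "g"], lambda ep: None)
```

With thousands of near-instant entry points, a simple threshold like this is enough to surface the handful worth investigating.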
Luckily, among the entry points that SWAN did complete before I killed it, I found one entry point that took about 5 minutes (~300 seconds). I hardcoded SWAN to process only this single entry point, and then started logging the methods that were being visited. I could still see that some methods were being revisited several times. I was still working under the hypothesis that these methods were being visited more often than they should be, so I started looking at why they were being revisited. At first I tried to manually inspect the Swift code, but that turned out to be pretty hairy, and I gave up on that approach pretty quickly. Then I looked at the algorithm's own reasons for revisiting instead.
As I had checked earlier, the methods were being revisited because new instantiated types were being discovered. Fortunately, the instantiated types were stored as a set of strings, making the set very easy to hardcode. When I hardcoded this set of strings into the algorithm, each method was visited only once, but the performance of the algorithm improved by only 30 to 50 percent. At this point, I finally gave up on my hypothesis that methods were being unnecessarily revisited.
For every module of the program it is analyzing, SWAN creates what it calls a dynamic dispatch graph (DDG). A DDG keeps track of possible resolutions of dynamic references, and the call-graph algorithm performs reachability queries on this graph. In some of my previous tests, when I was randomly pausing SWAN at different points, the program occasionally paused in the middle of a DDG query. I added timers to log whenever a DDG query took more than 2 seconds; for the entry point I was testing, there were around 5 queries over that threshold, each taking about 5 seconds. Those times could not really explain the 5 minutes it took to construct the call-graph.
At this point, I had run out of all other options, so I ran SWAN with a profiler. I had resisted using a profiler so far because I just did not have one set up: I had initially installed the Community Edition of IntelliJ IDEA. I looked at the different views offered by IntelliJ. The flame graph and the call tree ended up not being that useful. The view that was useful was the method list, which shows the number of samples in which a method and its callees show up. It showed that DDG queries appeared in 95 percent of samples taken during call-graph construction. Those under-2-second queries were adding up!
Looking at the code for DDG queries, I found that, to decide reachability from one node to another, it would compute the shortest path between those nodes. This felt a little odd, since I would have used a plain BFS or DFS, but it wasn’t the end of the world! However, when I looked at the profiling data again, it showed that almost all of the query time was being spent computing shortest paths.
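For a yes/no reachability question, a plain BFS that stops as soon as it sees the target already does the job; computing an actual shortest path is strictly more work. A minimal Python sketch (not SWAN's code) of the BFS alternative:

```python
from collections import deque

def reachable(graph, src, dst):
    # Breadth-first search that returns as soon as dst is found.
    # `graph` maps each node to a list of its successors.
    if src == dst:
        return True
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ == dst:
                return True
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return False

# Tiny example graph: a -> b -> c
g = {"a": ["b"], "b": ["c"], "c": []}
```

Note that a negative answer still forces the search to exhaust everything reachable from `src`, which is the weakness the caching attempt below runs into.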
As a first attempt at a fix, I tried to cache the result whenever we found that a node was reachable from another. While this improved the performance a little bit, it was not the massive gain I was looking for. The cache could answer very quickly when a node was reachable from another, but it could not answer very quickly when a node was not reachable from another.
My final fix was to precompute the transitive closure of the DDG after it was generated. This way, both reachability and unreachability could be decided very quickly by just checking whether an edge exists in the closure. Running the code on the 5-minute entry point gave me a runtime of less than a second, so the performance was improved by over 3000 times.
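The precomputation can be sketched as one traversal per node, again in illustrative Python rather than SWAN's actual implementation:

```python
def transitive_closure(graph):
    # Precompute, for every node, the set of nodes reachable from it.
    # Afterwards both reachability and unreachability are a single
    # set-membership check. One DFS per node keeps the sketch simple;
    # the trade-off is memory, since the closure can hold many edges.
    closure = {}
    for start in graph:
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for succ in graph.get(node, ()):
                if succ not in seen:
                    seen.add(succ)
                    stack.append(succ)
        closure[start] = seen
    return closure

# Same tiny chain as before: a -> b -> c
g = {"a": ["b"], "b": ["c"], "c": []}
cl = transitive_closure(g)
# cl["a"] contains both "b" and "c"; a query is now just `x in cl[y]`.
```

The one-time cost is paid at DDG construction, after which every query, positive or negative, is constant-time set membership.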
So what does “adversarial input” have to do with any of this? When the code was written, DDG queries were expected to be very rare. Calling a library function for shortest path was the easiest way to decide reachability — it only took around three lines of code. But the input we were giving SWAN required lots of DDG queries. The algorithm was working as intended; it was just never intended to handle this kind of input.
While we got rid of one kind of adversarial input, we have introduced others! Computing the transitive closure of a graph can create lots and lots of edges if there are long chains in the graph. This can create some memory pressure. We haven’t run into this yet, but if we do, it will have to be fixed in some way.
- David F. Bacon and Peter F. Sweeney. 1996. Fast static analysis of C++ virtual function calls. In Proceedings of the 11th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA ’96). Association for Computing Machinery, New York, NY, USA, 324–341. https://doi.org/10.1145/236337.236371