Quals

The qualifying exam is one of the major milestones in the Ph.D. The goal of the exam is to judge whether a student is qualified to undertake the Ph.D. I’ve heard it can be different for different degrees, so I’ll speak from the CS perspective. At one point, it was an actual exam that occurred over several days, testing students on a variety of topics. Luckily, it has since changed. Now, most schools have students do a project. A student is tasked with coming up with a novel solution to some problem, attempting to execute said solution, and then presenting the results. Here is a rough timeline of how it went down for me.

Finding the Problem (3-4 weeks)

Around November of 2019, my advisor and I began discussing quals (a little late in the game, but not unheard of). Coming into this, I had, and I can’t emphasize this enough, no clue as to what I would do. At this point, I had been working on a project dealing with incrementally verifying programs. We were able to prove it worked on constructed examples, but not on real-world examples. So, I wasn’t feeling the best about myself as a researcher at this point. My advisor suggested I look at machine learning, specifically graph neural networks (GNNs). The quals project is supposed to be something tangentially related to your research. So I do verification stuff, but not ML. At the time, I knew what the term “Graph Neural Networks” meant, but that was about it.

Armed with that knowledge, I spent time thinking about how I could shoehorn a GNN into a verification problem. Verification is a very formal process, with proofs and guarantees. Unless you are an ML researcher, ML is magic. You can very rarely use “magic” as your explanation for why something worked, at least in a formal verification proof. So, you can see my dilemma. I spent the next few weeks putting off thinking about quals and kept trying to make my research work.

Eventually, a fellow researcher of mine, who will remain nameless, told me about a verification tool which uses ML as a heuristic. Heuristics are used to make decisions when there is no clear correct answer. For example, a good heuristic for deciding where to take my mother to dinner would be, “If there is a Mexican restaurant within a 10 mile radius, go there. Otherwise, go to whichever restaurant is closest.” So, I read about this tool. It used ML to select which verifier it should run from a group of verifiers. I decided, “Maybe we can use graph neural networks,” while still knowing extremely little about GNNs. Eventually, I got my advisor on board and I set off.
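To make the “ML as a heuristic” idea concrete, here is a toy sketch of that kind of selection in Python. Everything in it (the names, the stand-in model) is a hypothetical illustration of mine, not the actual tool:

def select_verifier(program_features, model, verifiers):
    # The model scores every verifier on this program; run the top scorer.
    scores = model(program_features)
    best = max(range(len(verifiers)), key=lambda i: scores[i])
    return verifiers[best]

# Hypothetical stand-ins: three generic verifiers and a fake "model".
verifiers = ["verifier_a", "verifier_b", "verifier_c"]
fake_model = lambda feats: [sum(feats), 0.5, 1.0]
print(select_verifier([0.2, 0.7], fake_model, verifiers))  # -> verifier_c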

Proposing (3 weeks)

The next step was forming a committee and proposing what I would do for quals. For my committee, I selected two advisors from my lab and a professor from a different lab who knew a lot about machine learning and graph neural networks. To prepare for the proposal, I read several papers about different approaches to the problem I was tackling. I also brushed up on my knowledge of different verification techniques. Last but not least, I needed to read about GNNs. I started with a survey paper, but it went way over my head. My hope was that I could use a GNN library and focus on the other portions of the project. There is a lovely framework for using GNNs called PyG (then called pytorch_geometric). Unfortunately, I could not get it installed at the time (it has since become much easier). Because of this, I knew I would have to write my own implementation, which meant I would have to learn how GNNs actually work.

If memory serves, I proposed in early February. I spent about 2-3 weeks preparing my proposal document and my presentation. The document was about 4 pages long. It explained what the problem was, my proposed solution, and how long the work would take me. The presentation covered roughly the same ground. Overall, it went really well. What I’ve learned is that if you spend a lot of time learning about what you’ll be presenting, it becomes a lot easier to present (who knew). I gave myself a (flexible) timeline of 12 weeks to complete the project.

Step 1: Making Graph-Builder (6 weeks)

So, as with all things ML, I needed data. Luckily, there was already a large dataset in the form of the SV-Comp benchmarks. SV-Comp is a competition where verifiers compete to show how great their tool is on a bunch of programs. However, I needed to turn those programs into graphs (cause graph neural networks). I gave myself 3-4 weeks to do this. It started off pretty smoothly. There are many types of graphs which can be used to represent programs. My plan was to make these graphs and smush them together into one “Franken-graph” (a term which was shot down).
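For a rough idea of what that smushing looks like, here is a minimal sketch using networkx. The graphs and labels are toy stand-ins of mine; graph-builder works on real program representations (abstract syntax trees, control-flow graphs, and the like):

import networkx as nx

# Two toy "views" of the same program: the nodes are shared program
# points, and each edge is labeled with the view it came from.
ast = nx.DiGraph([(0, 1), (0, 2)])
nx.set_edge_attributes(ast, "ast", "kind")

cfg = nx.DiGraph([(1, 2), (2, 1)])
nx.set_edge_attributes(cfg, "cfg", "kind")

# Smush: union of the nodes, union of the labeled edges.
combined = nx.compose(ast, cfg)
print(combined.edges(data=True))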

I made the first graph type with relative ease in a week. The second one took a bit longer, maybe a week and a half. Not a big issue, clearly the last one will be easier and I’ll still be on track, right? Right? Of course not. It was the most annoying of all of them, but it got done, and I was only a week or so behind schedule. The result of this process is a pretty nifty tool called graph-builder.

Step 2: Building the Graphs (2 weeks)

This was the fun part where I made up a lot of time. Luckily, my tool worked pretty great. There were some bugs that needed to be squashed, but ultimately, it was a pretty painless process. Now that I had my data, I could do the ML, which meant making my own GNNs (gulp).

Step 3: Building a GNN (4 weeks)

This was by far the most nerve-wracking part of quals, but without a doubt where I learned the most. At this point, I had read enough about GNNs and watched this video enough times that I felt confident I understood how to write the needed code. I had taken one deep learning course by this time, so I was familiar enough with the framework (PyTorch) I would be using. Writing the code for the GNN was pretty easy actually. I had all of my data in the format I needed, and the actual process behind GNNs is pretty simple. In pseudocode:

For each node i in the graph:
    x = 0
    For each node j connected to i:
        x = f(x, j)
    i = g(i, x)

Do this a set number of times, and bada bing, you’ve got GNNs, baby. The function f is some function which coalesces all the neighboring nodes in such a way that the order they are presented in won’t matter, like a sum. The function g then combines a node’s current value with that aggregated result. For the quals case I used a gated recurrent unit, or GRU, as g.
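To make that concrete, here is a minimal PyTorch sketch of one round of that loop, with a sum playing the role of f and a GRU playing the role of g. The sizes and edges are a toy example of my own, not my quals code:

import torch
import torch.nn as nn

num_nodes, dim = 4, 8
h = torch.randn(num_nodes, dim)            # one feature vector per node
edges = [(0, 1), (1, 2), (2, 0), (3, 1)]   # (source, target) pairs
gru = nn.GRUCell(dim, dim)                 # plays the role of g

# One round of message passing. Summing means neighbor order
# doesn't matter, which is exactly what f needs.
messages = torch.zeros(num_nodes, dim)
for src, dst in edges:
    messages[dst] = messages[dst] + h[src]  # x = f(x, j)
h = gru(messages, h)                        # i = g(i, x)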

So, I got it working, however, it was slow as molasses. Training networks took hours per epoch. If I wanted to do 50 epochs, it would take weeks. So, I spent some time refactoring the code, and it got a little faster, but it got to the point where there wasn’t any fat left to trim and I was still looking at about 1 hour per epoch. I tried parallelizing it, but this didn’t help much and led to a litany of other problems. Eventually, I was talking to a co-worker and he offhandedly made a comment about how “library functions will almost always be faster than anything you do because they are pre-compiled in C”. For the layman in the audience, in terms of programming languages, C is Usain Bolt.

So I began scouring the PyTorch documentation for ways to replace my code. And eventually, I found it: index_add(). This function did exactly what I needed. It added the right values to the right place. Previously, I was using loops and a helper function and blah, blah, blah. All of these things slow stuff down. This function took me from hours per epoch to twenty-some minutes. Now that I had an efficient enough GNN, I was ready to run my tests.
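For the curious, here is a toy before-and-after of that swap, with made-up sizes. Both versions sum messages into their destination nodes; the second does it in one vectorized call:

import torch

num_nodes, dim = 4, 8
h = torch.randn(num_nodes, dim)
src = torch.tensor([0, 1, 2, 3])  # edge sources
dst = torch.tensor([1, 2, 0, 1])  # edge destinations

# Loop version: one slow Python iteration per edge.
slow = torch.zeros(num_nodes, dim)
for s, d in zip(src.tolist(), dst.tolist()):
    slow[d] += h[s]

# index_add_ version: one vectorized call, same result.
fast = torch.zeros(num_nodes, dim).index_add_(0, dst, h[src])
assert torch.allclose(slow, fast)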

Step 4: Making Sense of the Results (2 weeks)

So, now I have my results. What this means is I’ve trained 10 networks for each configuration I’m looking at so I can average the results. My configurations are different combinations of the graphs, different numbers of GNNs, and two different kinds of GNNs. So there is a lot of data. I wrangle it all into an Excel sheet and there’s good news and bad news. Good news: some configurations are better than the current state-of-the-art techniques. Bad news: the best performing ones use no GNNs. What does this tell us? The way we made the graphs is great, but the GNN technique ain’t that helpful. So it’s kind of a mixed bag.
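I did the averaging in Excel, but for anyone scripting it, the same wrangling would look something like this in pandas (the configurations and numbers here are made up):

import pandas as pd

# Hypothetical results table: one row per trained network.
df = pd.DataFrame({
    "configuration": ["A", "A", "B", "B"],
    "score":         [0.71, 0.69, 0.83, 0.81],
})
summary = df.groupby("configuration")["score"].agg(["mean", "std"])
print(summary.sort_values("mean", ascending=False))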

I would like to point out that this whole process did not go that smoothly:

  • Several times in the process of training networks, I discovered my graphs had issues, so I had to fix them and regenerate them.
  • I discovered there was a rather large bug in my implementation which made it seem like all configurations were roughly the same.

But this is why we test things and have large servers to rerun all of our tests :sweat_smile:. Around this time, I scheduled my defense with my committee: June 22nd at 2pm.

Step 5: Writing a (Mock) Paper (2 weeks)

So now I’m nearing the end. All I need to do is write this all up. Luckily, I had plenty to work with from the proposal. This process was not too bad. The most difficult part for me was writing up the background section. My confidence in my knowledge of verification techniques was pretty good, but I was still afraid of making a mistake. Besides this, I threw together some tables, made a nice image with the help of a lab mate, and tried to explain everything.

One important thing I was told quite frequently during quals by lab mates was that it doesn’t matter if your results are good or bad. What matters is that you can explain why they’re good or bad. Luckily, I had a mixed bag instead of all bad. My explanation for why it was good came from my hypothesis of why it would be good. My explanation for why it wasn’t as good took some analysis, but so does most good research. Very rarely does the answer jump out at you.

Step 6: The Defense (2 hours)

Some quick backstory. My lease for my apartment at the time ended June 22nd. My new lease started the 15th. My friends helped me move in that Friday. So I had to both finish moving and prepare for what was the most important presentation of my academic career (so far). In the future, I would suggest not doing this if you can avoid it, especially when your lease start date is negotiable.

Honestly, it was probably a good thing. I woke up that morning, did a final sweep of the apartment, and turned in the keys. By the time I was done it was 11, so I re-read some papers, went over my slides, and I was ready to go. I felt pretty comfortable. For the last 3 months, this was more or less all I had done. I started giving the presentation (over Zoom) and people chimed in with questions. This was great (for me). It broke me out of a robotic rhythm and made things feel more organic. By the time I was done, what was supposed to be a 20 minute presentation was probably 40 minutes, mostly due to questions.

At this point, they opened the floor to questions, but people had basically gotten them all out of their systems. So, the committee began their questioning. The first member of my committee asked very holistic questions about the purpose and uses of the technique I was proposing. The second asked more specific questions about the machine learning involved. The third member asked tough questions; there was one I believe I straight up couldn’t answer, but for most of them I could confidently show the merits of my technique and that I had a strong knowledge of what I’d done.

Throughout this whole process, I felt pretty zen. I can be a rather anxious person at times, but this stuff was ingrained in my brain. I had prepared well, but didn’t fixate on the “what ifs”. Some questions I knew were coming; with others, finding the answer felt like a conversation. All in all, it was a tough, but rewarding process.