---
layout: post
title: Browser Fuzzing
tags: [hacking]
---

Well it fucking happened. I stopped writing to this blog for a while. Who saw that coming? Anyway I’m making a comeback. The delay in posts was caused by 🥁 - me being in the fucking hospital. Some highlights: perforated intestine, lost 40 lbs (I look like a goddamn male model but not as pretty), and a near death experience or two. No big fucking deal.

Anyway that’s over with and I’m mostly cool now so let’s talk about browser fuzzing. Modern browser fuzzing seems to mainly be made up of two things: DOM fuzzing (older) and JavaScript fuzzing, which most of the time amounts to JIT engine fuzzing. Each major browser ships its own JavaScript engine with its own JIT, so a good fuzzer needs to support a number of these engines to get good performance, coverage, and results. OK cool. Let’s talk about some specific examples. This blog post is going to serve as an actually readable guide on fuzzing a JIT engine. I’m going to admit something - I don’t usually do this. I’ve played around with some browser fuzzing but never really got into it properly; other programs were always more interesting to me. 1-clicks are fine I guess, but I prefer when I can just hit something and get immediate results without user interaction. But anyway, it’s important, the bugs are valuable (500k-2mil depending on platform), so it’s worth talking about. It’s also just a cool and unique target with a lot of low-level stuff to understand.

Anyway we’re gonna focus on Fuzzilli, the JavaScript fuzzing engine by Samuel Groß, which is the de facto standard for js fuzzing shit. I start by reading this: https://saelo.github.io/papers/thesis.pdf, Saelo’s (aka Samuel’s) thesis on Fuzzilli. This is usually a good place to start but not a good place to end, as a thesis will cover the basics and features of a fuzzer but not all the later development. It’s informative, the same way the original AFL paper is, but it’s not comprehensive: the AFL paper won’t tell you how AFL turned into AFL++ either. So ima skim it.

As it turns out the paper is pretty dense and actually worth a complete readthrough. Here’s some basics of how the system works though. First of all, it talks about semantic and syntactical correctness - in the case of fuzzing something like JavaScript this is not all that easy. Syntactical correctness is easy enough, but semantic correctness (a program that semi-makes sense) is a bit of a harder thing. Fuzzilli implements a custom intermediate language (FuzzIL) that sits at roughly the same level as engine bytecode (javascript -> bytecode -> code -> execution being the typical flow of a program). Mutations happen on FuzzIL, not on JavaScript source, and the result is lifted back to JavaScript before it gets run. This allows the fuzzer to do some cleverness. For example syntactic correctness can be achieved simply by ensuring that FuzzIL programs can be converted to js. How it handles semantic correctness is a bit of black magic and just solid implementation of FuzzIL. From the paper:

To satisfy the third requirement, we first note that due to the relatively minor changes performed by each mutation, the chances of it turning a semantically valid program into an invalid one are relatively small. Further, all mutations are required to obey to a set of basic semantic correctness rules of the IL, enforcing such things as the definition of a variable before its usage. Finally, by avoiding the inclusion of semantically invalid programs, it is possible to keep the overall percentage of emitted invalid programs at an acceptable level.
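To make that a little less abstract, here’s a toy sketch in Python. To be clear, this is NOT FuzzIL and none of these names (`Instr`, `lift`, `is_valid`, `mutate_input`) come from Fuzzilli; I made them up. It just shows the general shape of the idea: a tiny IL that always lifts to syntactically valid JavaScript, plus one small mutation that respects the define-before-use rule from the quote.

```python
import random
from dataclasses import dataclass


# Hypothetical toy IL, loosely inspired by the idea behind FuzzIL.
# None of these names are taken from Fuzzilli itself.
@dataclass(frozen=True)
class Instr:
    op: str             # "LoadInt" or "BinaryOp"
    out: int            # index of the variable this instruction defines
    inputs: tuple = ()  # indices of the variables this instruction reads
    imm: object = None  # immediate: an int for LoadInt, an operator for BinaryOp


def lift(program):
    """Turn an IL program into JavaScript source, one statement per instruction."""
    lines = []
    for ins in program:
        if ins.op == "LoadInt":
            lines.append(f"const v{ins.out} = {ins.imm};")
        elif ins.op == "BinaryOp":
            a, b = ins.inputs
            lines.append(f"const v{ins.out} = v{a} {ins.imm} v{b};")
        else:
            raise ValueError(f"unknown op {ins.op}")
    return "\n".join(lines)


def is_valid(program):
    """Define-before-use check: every input must be defined by an earlier instruction."""
    defined = set()
    for ins in program:
        if any(v not in defined for v in ins.inputs):
            return False
        defined.add(ins.out)
    return True


def mutate_input(program, rng):
    """Swap one input of a random instruction for some other variable that is
    already defined at that point, so a valid program stays valid."""
    candidates = [i for i, ins in enumerate(program) if ins.inputs]
    if not candidates:
        return list(program)
    idx = rng.choice(candidates)
    ins = program[idx]
    defined_before = [p.out for p in program[:idx]]
    if not defined_before:
        return list(program)
    new_inputs = list(ins.inputs)
    new_inputs[rng.randrange(len(new_inputs))] = rng.choice(defined_before)
    mutated = list(program)
    mutated[idx] = Instr(ins.op, ins.out, tuple(new_inputs), ins.imm)
    return mutated


program = [
    Instr("LoadInt", 0, imm=1),
    Instr("LoadInt", 1, imm=2),
    Instr("BinaryOp", 2, inputs=(0, 1), imm="+"),
]
print(lift(program))
print(lift(mutate_input(program, random.Random(0))))
```

The point of the design is that the mutator never touches text. It only ever picks from variables that already exist above the instruction it’s changing, so whatever comes out still lifts to JavaScript that parses. The real thing has far more instruction types and mutators than this, obviously.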

So we have a way to mutate JavaScript programs without sacrificing semantic or syntactic correctness. This is great. Now think of a fuzzer like AFL, where somewhat random mutations are applied to an input and the fuzzer tracks which edges of the target’s control flow graph each input hits. Inputs that reach new edges are kept and “focused on” by the algorithm. A natural question arises - why don’t I just fuzz with AFL++ or something instead of dealing with all this Fuzzilli bullshit? Well that’s somewhat astute, until you realize that you’re not going to achieve syntactic correctness in a lot of cases, and you’re certainly not going to build complex blocks of JavaScript out of example programs by flipping bytes. Fuzzilli, by mutating its IL instead of raw text, does a good job of actually editing, rearranging, and adding to programs in ways that may or may not be interesting (usually not, remember it’s a fuzzer so it’s going to depend on brute force), but ARE more often than not syntactically and semantically correct.
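If the AFL-style feedback loop is fuzzy (ha), here’s a minimal sketch of it in Python. Everything here is a stand-in I invented for illustration: `toy_target` fakes an instrumented program by returning the set of "edges" an input hit, and `mutate` is a dumb one-byte overwrite. Real fuzzers get coverage from compiler instrumentation and have much smarter mutators and scheduling.

```python
import random


def toy_target(data: bytes) -> set:
    """Hypothetical stand-in for an instrumented target: returns the 'edges' hit.
    A real fuzzer reads these from a coverage bitmap, not from Python."""
    edges = {"entry"}
    if len(data) > 0 and data[0] == ord("J"):
        edges.add("J")
        if len(data) > 1 and data[1] == ord("I"):
            edges.add("JI")
            if len(data) > 2 and data[2] == ord("T"):
                edges.add("JIT")
    return edges


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte with a random value."""
    if not data:
        return data
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, iterations: int, rng: random.Random):
    """Keep any mutated input that reaches an edge we have not seen before."""
    corpus = [seed]
    seen = set(toy_target(seed))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        edges = toy_target(candidate)
        if not edges <= seen:  # candidate hit at least one new edge
            seen |= edges
            corpus.append(candidate)
    return corpus, seen


corpus, seen = fuzz(b"AAA", 200_000, random.Random(0))
print(len(corpus), sorted(seen))
```

This works fine when the target eats a three-byte magic value. Point the same byte-flipping mutator at JavaScript source and nearly everything it produces dies in the parser, which is the whole reason Fuzzilli keeps the coverage feedback but mutates at the IL level instead.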