Introduction
Hello there! Today I will give you an introduction to fuzzing and show how you can use it as a developer.
What is fuzzing?
Fuzzing is a technique to find bugs, crashes and vulnerabilities by stressing a process that takes an input. This input can be of any kind: a process taking commands, a USB port, a compiler, a form, etc…
To do that, we send many random or pseudo-random inputs to the process and check whether it crashes. A crash often means that something went wrong with memory, which may constitute a vulnerability or an entry point for an attacker. In any case, a crash is a denial of service: an attacker could use it to prevent users from accessing your service.
What for?
Fuzzing has two main goals:
- On the developer side, it can be used to find corner cases and patch the code.
- On the hacker side, it may reveal an exploitable crash.
Note that if hackers are using fuzzing, you should be too, so that you find the vulnerabilities in your program before they do.
Great, what can I fuzz?
Going by the short definition above, you can fuzz any program or device as long as you have two things:
- an entry point for data you control
- a way of knowing whether the device or process crashed because of your inputs.
Once you have that, you must find a way to continuously send inputs while checking how the target handles them. Do not worry, most of the time you will find programs to help you.
In this article
The purpose of this article is to introduce fuzzing and show how you can quickly set up a fuzzing environment and get results!
I will use an old project: back in December 2020, my group had to develop a shell named 42sh. I picked this project for convenience, as I already have a lot of test cases for it.
How to fuzz a program?
You only need three things to fuzz a program!
- The program
- A fuzzer
- Test cases
Why do we need test cases?
Even if a fuzzer can seem magic, it needs a source to mutate. You give it working examples to show it what kind of input your program expects.
Here, I am fuzzing a shell, so I will give working shell scripts to my fuzzer. In this article, we will be exploring how you can use your functional tests to fuzz your program.
Radamsa
Radamsa is a powerful fuzzer that is quick to set up. Its source code is available on its GitLab page.
Radamsa is really easy to use:
$ echo "Test 1" | radamsa
Test 53435465
As you can see, Radamsa mutates the input and tries to find corner cases. In this example, it replaced the number with a much larger one.
Radamsa has many other ways to mutate an input, like duplicating lines:
$ echo "Test 1" | radamsa
Test 1
Test 1
Test 1
Test 1
Test 1
Mutating a char:
$ echo "test" | radamsa
t?st
Duplicating blocks:
$ echo "<p>Hello</p>" | radamsa
<p><p><p><p><p><p>Hello</p></p></p></p></p></p>
But more importantly, it might do a combination of the above examples.
Every run of Radamsa gives a different output, because it picks a number from /dev/urandom to select the mutation strategy.
However, if you ever need the same output several times, you can specify a seed:
$ echo "Test 1" | radamsa --seed 42
Ts/v9t 1
$ echo "Test 1" | radamsa --seed 42
Ts/v9t 1
How could Radamsa have helped in my project?
During the development of 42sh, we failed to notice a heap buffer overflow. Our 42sh did not support nested functions, which means that the following script would crash:
function fun()
{
    function fun2()
    { echo toto; }
}
fun
$ ./42sh crash.sh
toto
[1] 18011 segmentation fault (core dumped) ./42sh crash.sh
This is due to a lack of testing on this part of the project. So I wondered if Radamsa could have helped.
In our functional tests we had this script:
test()
{
    echo world
}
test_two()
{
    echo hello
}
test_two
test
This is a test that our shell successfully executes. Let’s give it to Radamsa as input.
Setting up Radamsa
Radamsa needs a bit of setup. It is only a mutator: it does not take a program and fuzz it automatically. Fortunately, we can quickly set up a fuzzing environment.
We will send Radamsa's output into 42sh until it crashes. Here is the simplest script, which you can find on Radamsa's GitLab page:
while true
do
    radamsa script.sh > fuzzed.sh
    ./42sh fuzzed.sh > /dev/null 2> /dev/null
    test $? -gt 127 && break
done
We first save the output of Radamsa in fuzzed.sh, so once the program has crashed, the faulty input is available in that file.
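The test $? -gt 127 check in the loop relies on a shell convention: when a process is killed by signal N, the shell reports an exit status of 128 + N, so a segmentation fault (SIGSEGV, signal 11 on Linux) shows up as 139. The sketch below wraps that check in a small helper. The name died_from_signal is hypothetical and only serves the illustration.

```shell
# Hypothetical helper (not part of Radamsa or 42sh): runs the given command
# and succeeds only if its exit status is above 127, which is how the shell
# reports a command terminated by a signal (128 + signal number).
died_from_signal() {
    "$@" > /dev/null 2>&1
    [ "$?" -gt 127 ]
}

# A child shell that sends SIGSEGV to itself counts as a crash...
if died_from_signal sh -c 'kill -s SEGV $$'; then
    echo "crash detected"
fi

# ...while an ordinary failure (exit status 1) does not.
if ! died_from_signal sh -c 'exit 1'; then
    echo "no crash"
fi
```

Keep in mind that a script calling exit with a value above 127 produces the same kind of status, so a hit is worth replaying by hand before calling it a crash.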
WARNING
You do not want to fuzz shells in your development environment.
You should really do it in a Docker container, for example.
As we saw, Radamsa can mutate characters, so you cannot predict which commands will be executed.
In my case, I fuzzed with echo toto and it sometimes created a redirection symbol >, leading to hundreds of files on my VM after a few minutes of fuzzing.
Time to fuzz
Okay, now it is time to launch the script with the test given above. Will Radamsa find the crash with nested functions?
Turns out it did … in 2 seconds.
Time to look at fuzzed.sh, the output of Radamsa that led to a crash!
test()
{
est()
{
echo world
}
test_two()
{
echo hello
}
}
test_two
test
It looks messy, but remember that a fuzzer has no concept of coding style; that is not its purpose. Just by adding a few tabs, you can clearly see what the script does:
test()
{
    est()
    {
        echo world
    }
    test_two()
    {
        echo hello
    }
}
test_two
test
It declares two functions inside the function test. Now it is the developer's job to find the minimal input that makes the program crash.
In my case, the following script is enough to crash 42sh:
test()
{
    test_two()
    {
        echo hello
    }
}
test
This fuzzing script was really quick to set up and Radamsa found a crash in seconds, which shows that fuzzing-assisted tests can be really useful!
Test framework
Writing this article convinced me to use fuzzing more in my projects.
In the example above, I knew where the crash was, so I selected the input that could lead Radamsa to the bug. To find other bugs, I need to process all the shell scripts in my test suite.
How many runs of Radamsa per test?
To answer this question, I need to know how many tries Radamsa needed to find the crash above. I added a little counter. Most of the time, 700 tries were enough to find the crash, so that will be my reference.
In my case we had hundreds of tests, so the fuzzing might take a few minutes.
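The counter itself is not shown in the article. As a rough sketch of how such a measurement could look, the hypothetical helper below (tries_until_crash is a name made up for this example) feeds mutations of one seed file to a command on its standard input and prints how many tries it took to get a crash.

```shell
# Hypothetical helper: mutates a seed file with radamsa and feeds the result
# to a command on stdin until the command is killed by a signal, then prints
# the number of tries. Fails if no crash shows up within max_tries.
# Usage: tries_until_crash <seed_file> <max_tries> <command> [args...]
tries_until_crash() {
    seed_file=$1
    max_tries=$2
    shift 2
    fuzzed=$(mktemp)
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        tries=$((tries + 1))
        radamsa "$seed_file" > "$fuzzed"
        "$@" < "$fuzzed" > /dev/null 2>&1
        if [ "$?" -gt 127 ]; then
            echo "$tries"
            rm -f "$fuzzed"
            return 0
        fi
    done
    rm -f "$fuzzed"
    return 1
}
```

Calling tries_until_crash script.sh 10000 ./42sh a few times and looking at the printed numbers is one way to settle on a limit such as the 700 used here.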
Putting a limit
Time to put a limit on the number of runs: if your code is solid, Radamsa may take forever to find a bug. A test framework should give clear feedback, so we will print a red message if a crash is found and a green one otherwise.
RED='\033[0;31m'
NC='\033[0m'
GREEN='\033[0;32m'
TRIES=700
found=0
for i in $(seq 1 $TRIES);
do
    radamsa script.sh > fuzzed.sh
    # Redirecting segmentation fault message
    { cat fuzzed.sh | ./42sh > /dev/null 2> /dev/null; } 2> /dev/null
    # A status above 127 means that 42sh was killed by a signal
    if test $? -gt 127; then
        echo -e "${RED}Crash found${NC}"
        found=1
        break;
    fi
done
if [ $found -eq 0 ];then
    echo -e "${GREEN}Nothing found${NC}"
fi
Executing for each input script
Now we want to execute that script on each test case in a directory.
To get all test cases recursively, we use find <directory> in a for loop.
for file in $(find fuzz_cases -type f); do
    found=0
    for i in $(seq 1 $TRIES);
    do
        radamsa "$file" > fuzzed.sh
        ...
    done
    if [ $found -eq 0 ];then
        echo -e "${GREEN}Nothing for ${file}${NC}"
    fi
done
Features!
If we really want to be able to use this little test framework, we will need nicer features.
First, depending on what you are fuzzing, the program might sometimes fall into infinite recursion. This would block our fuzzer, so we will add a short timeout.
...
{ timeout 1 ./42sh fuzzed.sh > /dev/null 2> /dev/null; } 2> /dev/null
...
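With GNU coreutils, timeout exits with status 124 when the time limit is reached (with its default signal), which stays below 128. The existing test $? -gt 127 check therefore does not mistake a timeout for a crash. The sketch below spells out the three cases. The helper name classify_status is hypothetical.

```shell
# Hypothetical helper: maps an exit status to a short label.
classify_status() {
    status=$1
    if [ "$status" -gt 127 ]; then
        echo "crash"      # 128 + N: the process was killed by signal N
    elif [ "$status" -eq 124 ]; then
        echo "timeout"    # GNU timeout: the time limit was reached
    else
        echo "ok"         # normal exit, even with a non-zero status
    fi
}

# Example: sleep is stopped by timeout after one second.
timeout 1 sleep 5
classify_status "$?"
```

One limitation of this sketch: a program that exits with status 124 on its own would also be labelled as a timeout.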
Now our main problem is that the faulty scripts are lost, as fuzzed.sh is overwritten on every run. So we will save them to another folder.
if test $? -gt 127; then
    echo -e "${RED}Crash found for ${file}${NC}"
    # Keep a copy of the faulty input before the next run overwrites it
    mkdir -p crashes
    cp fuzzed.sh "crashes/$(basename "$file").fuzz"
    found=1
    break;
fi
Now, time to make this script generic by turning ./42sh and the crashes and fuzz_cases folders into program arguments.
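The generic script is not listed in the article, so here is a minimal sketch of what it could look like, assembled from the pieces above. The function name fuzz_dir and the order of its arguments are assumptions made for this example, and the colors are left out to keep it short.

```shell
# Sketch of a generic version (hypothetical name and argument order).
# Usage: fuzz_dir <program> <cases_dir> <crashes_dir> [tries]
fuzz_dir() {
    program=$1
    cases_dir=$2
    crashes_dir=$3
    tries=${4:-700}
    fuzzed=$(mktemp)
    mkdir -p "$crashes_dir"

    # Reading the file list line by line copes with spaces in file names
    find "$cases_dir" -type f | while IFS= read -r file; do
        found=0
        i=0
        while [ "$i" -lt "$tries" ]; do
            i=$((i + 1))
            radamsa "$file" > "$fuzzed"
            # stdin comes from /dev/null so that the fuzzed script cannot
            # swallow the file list produced by find
            { timeout 1 "$program" "$fuzzed" < /dev/null > /dev/null 2>&1; } 2> /dev/null
            if [ "$?" -gt 127 ]; then
                echo "Crash found for $file"
                cp "$fuzzed" "$crashes_dir/$(basename "$file").fuzz"
                found=1
                break
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "Nothing for $file"
        fi
    done
    rm -f "$fuzzed"
}
```

With this in place, the run described in this article would be started with fuzz_dir ./42sh fuzz_cases crashes 700.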
Results
I ran this tool with 137 test cases. It took 22 minutes, which I think is reasonable considering that nothing runs in parallel. It found more than 20 crashes, which means that it works pretty well! On the other hand, I was a bit concerned to find so many bugs in a project that got a very good mark, but looking at the crashes, I saw that most of them are caused by the same vulnerability.
Go Further
There are a lot of improvements you can make depending on your project. For example, multithreading could be a good way to speed up the process. That would mean rewriting everything in another language, but I am sure it is worth the time spent. Another thing that could be implemented is crash triage. As we saw in the Results section, we get many crashes that are related, whereas a good fuzzer only keeps crashes that are different, that is, crashes that happen at different places. A simple way to do that is to compile your project with ASAN and check in which file and on which line the program crashed.
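As a rough illustration of that triage idea, the sketch below extracts the location of the top stack frame from an ASAN report and uses it as a signature to decide whether a crash is new. Both helper names are hypothetical, and the parsing assumes symbolized frames of the form "#0 0x... in function file:line:column", which requires a build with debug information.

```shell
# Hypothetical helper: prints the location (last field) of the first "#0"
# stack frame found in an AddressSanitizer report read from stdin.
crash_signature() {
    awk '$1 == "#0" { print $NF; exit }'
}

# Hypothetical helper: succeeds only the first time a signature is seen and
# records it in the file given as first argument.
# Usage: is_new_crash <seen_file> <signature>
is_new_crash() {
    seen_file=$1
    signature=$2
    touch "$seen_file"
    if grep -qxF -- "$signature" "$seen_file"; then
        return 1
    fi
    echo "$signature" >> "$seen_file"
}
```

ASAN writes its report to standard error, so a fuzzing loop could save it with 2> report.txt, compute the signature with crash_signature < report.txt, and copy the faulty script only when is_new_crash succeeds. Note that, as far as I know, an ASAN build on Linux exits with status 1 by default instead of dying from a signal, so the -gt 127 check may need ASAN_OPTIONS=abort_on_error=1 or a check on the report itself.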
Radamsa for security audit?
Radamsa is very useful because it is very quick to set up.
But keep in mind that it is blind: it knows nothing about the program it is fuzzing. As we saw, it does not even know that your program crashed, and you have to write a fuzzing script yourself.
Running Radamsa and getting no crash is not enough to consider your application secure. There are advanced tools that can help you fuzz deeper.
I am still sure that Radamsa is helpful in the development phase of a project. But it is not sufficient to audit the security of your program. There are plenty of tools you have yet to discover.
Code coverage guided fuzzing: American Fuzzy Lop (AFL)
What if your fuzzer could be aware of how the input affected the binary it is fuzzing?
That’s exactly what code coverage guided fuzzing is for!
As its name suggests, this kind of fuzzer uses code coverage techniques to find out which branches the mutated input ran through.
AFL is a code coverage guided fuzzer, originally written by Michał Zalewski and now hosted by Google. AFL observes how each mutated input affects the execution of the binary. If the input reaches blocks of code never reached before, AFL saves it as a base for further mutations. The goal of AFL is to reach as much code as it can. Unfortunately, you cannot use AFL for short runs like we did with Radamsa: AFL is meant to run for long periods of time if you want it to find vulnerabilities.
AFL will also do crash triage! As explained earlier, this means that it will count all the inputs that trigger the same crash as a single vulnerability.
Am I secure now?
If you let a well-configured AFL run for hours or days on your program and it finds no crashes, you are getting close to a secure service. Now, if your application is critical (for example, it runs with privileges and is used by a lot of users), you need to go deeper.
AFL must have been run on every common Linux binary by now. Still, a bug in sudo went unnoticed for ten years. sudo is probably one of the most critical binaries on Linux and hundreds of people must have run AFL on it, yet this bug stayed unknown for years. In fact, as this series of videos explains, fuzzing sudo for this bug involves a lot of difficulties (probably the reason why the vulnerability was found by code review).
The example of sudo shows that it is difficult to fuzz one hundred percent of your code. The vulnerable function might be deep in your binary, and AFL may have trouble finding the path to it.
Let’s imagine the following C code:
void safe_function(int a, int b, int c, void* data)
{
    if (a != 35434 || b != 234532 || c != 7894567)
    {
        return;
    }
    vuln_function(data);
}
This example illustrates how hard it can be for a fuzzer to reach a vulnerable function. AFL will have a hard time guessing those numbers, and it gets worse if several functions with conditions like this one stand between the input and the vulnerable code.
What you can do to fuzz vuln_function (as a developer or a security researcher) is to create an entry point just for fuzzing this part of the code.
You could do something like this:
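#include <unistd.h>

// Defined elsewhere in the project
void vuln_function(void *data);

// Special main created only to fuzz vuln_function
int main(void)
{
    char buffer[256] = { 0 };
    // Reading AFL input from stdin, keeping a trailing null byte
    if (read(0, buffer, sizeof(buffer) - 1) < 0)
        return 1;
    vuln_function(buffer);
    return 0;
}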
In my 42sh example, I could have specifically targeted the lexer, the parser or the execution part of the binary using this technique.
Conclusion
Thinking about all the test cases in a project can be hard, particularly if you are in a rush. But we have seen that we can set up a simple fuzzing environment in a few minutes, and that the fuzzer can sometimes find a crash in seconds! I really think it is worth the time spent setting it up. Do not be afraid to discover bugs: the earlier you find them, the easier they are to fix. I encourage you to add the faulty scripts found by fuzzing to your test suite to prevent regressions in the future.
When should I fuzz my project?
We have set up a quick fuzzing environment with Radamsa. I discourage you from running this kind of program every time you commit: you will get fed up really quickly and stop using it. You might want to run it only before a merge into your stable branch. You can also run a fuzzing session on a subset of your tests, depending on the feature you are working on.
If you have a server at home or a VPS, you can let it run with a high limit to find crashes. On a CI, you can add a fuzzing job with a low limit: if you push often, it adds up to a lot of tests in the end.
Finally, once your application is production-ready, do a security audit by running advanced fuzzers like AFL.
Sources and interesting links
Series of videos on how to find the sudo vulnerability with fuzzing