Alex Neben b665258d9d SERVER-88970 Added yaml formatting to server repo
GitOrigin-RevId: 35db3811d8f749edd5b79ba910adcbc1ceb54cc4
2024-04-06 05:23:20 +00:00

---
title: LibFuzzer
---
LibFuzzer is a tool for performing coverage-guided fuzzing of C/C++
code. LibFuzzer will try to trigger AUBSAN (AddressSanitizer plus
UndefinedBehaviorSanitizer) failures in a function you provide by
repeatedly calling it with a carefully crafted byte array as input. Each
input is assigned a "score": byte arrays which exercise new or more
regions of code score better. LibFuzzer merges and mutates high-scoring
inputs in order to gradually cover more and more possible behavior.
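
To make the scoring idea concrete, here is a minimal sketch of a target where coverage guidance pays off. `matchedDepth` and the magic prefix `FUZZ` are hypothetical and exist only for illustration; `LLVMFuzzerTestOneInput` is the entry point described under "How to use LibFuzzer". Guessing all four bytes at once is unlikely, but each byte that gets one branch deeper reaches new code, so LibFuzzer keeps that input and mutates it further.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical function under test: reports how many leading bytes of the
// input match the magic prefix "FUZZ". Each nested branch is a new region of
// code, so an input that matches one more byte counts as new coverage.
int matchedDepth(const uint8_t* data, size_t size) {
    int depth = 0;
    if (size > 0 && data[0] == 'F') {
        depth = 1;
        if (size > 1 && data[1] == 'U') {
            depth = 2;
            if (size > 2 && data[2] == 'Z') {
                depth = 3;
                if (size > 3 && data[3] == 'Z') {
                    depth = 4;
                }
            }
        }
    }
    return depth;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* Data, size_t Size) {
    if (matchedDepth(Data, Size) == 4) {
        std::abort();  // Stand-in for a real bug hiding behind the checks.
    }
    return 0;
}
```
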
# When to use LibFuzzer
LibFuzzer is great for testing functions which accept an opaque blob of
untrusted user-provided data.
# How to use LibFuzzer
LibFuzzer implements `main` itself and expects to be linked with an
object file which provides the function under test. You achieve this by
writing a cpp file which implements:
```cpp
#include <cstddef>
#include <cstdint>
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
    // Your code here
    return 0;  // Inputs that do not crash should return 0.
}
```
`LLVMFuzzerTestOneInput` will be called repeatedly, with
fuzzer-generated bytes in `Data`. `Size` will always truthfully tell
your implementation how many bytes are in `Data`. If your function
crashes or induces an AUBSAN fault, LibFuzzer will consider that to be a
finding worth reporting.
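
As a minimal sketch of such a function, the example below adapts the raw bytes to a `std::string`, hands them to a function under test, and crashes when the result breaks an expected property. `decodeLengthPrefixed` and its wire format are hypothetical and stand in for whatever real function you want to fuzz.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical function under test: the first byte is a payload length,
// followed by at least that many payload bytes. Returns false for malformed
// input and leaves *payload untouched.
bool decodeLengthPrefixed(const std::string& buf, std::string* payload) {
    if (buf.empty()) {
        return false;
    }
    const size_t len = static_cast<uint8_t>(buf[0]);
    if (buf.size() - 1 < len) {
        return false;
    }
    *payload = buf.substr(1, len);
    return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* Data, size_t Size) {
    // Adapt the raw bytes to the type the function under test expects.
    const std::string input(Data, Data + Size);
    std::string payload;
    // Rejecting malformed input is fine. A decoded payload of the wrong
    // length is a bug, so crash loudly to turn it into a finding.
    if (decodeLengthPrefixed(input, &payload) && payload.size() != Data[0]) {
        std::abort();
    }
    return 0;
}
```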

Keep in mind that your function will often "just" be adapting `Data` to
whatever format our internal C++ functions require. However, you have a
lot of freedom in exactly what you choose to do. Just make sure your
function crashes or fails an invariant when something interesting
happens! As just a few ideas:

- You might choose to call multiple implementations of a single
  operation, and validate that they produce the same output when
  presented the same input.
- You could tease out individual bytes from `Data` and provide them as
  different arguments to the function under test.

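
Both ideas can be combined in one target. The sketch below is illustrative only: `countByteFast` and `countByteReference` are hypothetical implementations of the same operation, the first input byte is peeled off as one argument, and the remaining bytes become the other. Any disagreement between the two implementations aborts, which LibFuzzer reports as a finding.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical hand-written implementation under test.
size_t countByteFast(const uint8_t* haystack, size_t size, uint8_t needle) {
    size_t n = 0;
    for (size_t i = 0; i < size; ++i) {
        n += (haystack[i] == needle);
    }
    return n;
}

// Hypothetical reference implementation built on the standard library.
size_t countByteReference(const uint8_t* haystack, size_t size, uint8_t needle) {
    return static_cast<size_t>(std::count(haystack, haystack + size, needle));
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* Data, size_t Size) {
    if (Size < 1) {
        return 0;  // Not enough bytes to build both arguments.
    }
    const uint8_t needle = Data[0];      // The first byte becomes one argument,
    const uint8_t* haystack = Data + 1;  // and the rest becomes the other.
    const size_t haystackSize = Size - 1;
    if (countByteFast(haystack, haystackSize, needle) !=
        countByteReference(haystack, haystackSize, needle)) {
        std::abort();  // The two implementations disagree.
    }
    return 0;
}
```
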
Finally, your cpp file will need a SCons target. There is a method which
defines fuzzer targets, much like how we define unittests. For example:
```python
env.CppLibfuzzerTest(
    target='op_msg_fuzzer',
    source=[
        'op_msg_fuzzer.cpp',
    ],
    LIBDEPS=[
        '$BUILD_DIR/mongo/base',
        'op_msg_fuzzer_fixture',
    ],
)
```
# Running LibFuzzer
Your test's object file and **all** of its dependencies must be compiled
with the "fuzzer" sanitizer, plus a set of sanitizers, like AUBSAN,
which can produce interesting runtime errors. Evergreen has a build
variant, whose name includes the string "FUZZER", which compiles and
runs all of the fuzzer tests.

The fuzzers can be built locally for development and debugging. Check
our Evergreen configuration for the current SCons arguments.

LibFuzzer binaries accept a path to a directory containing their
"corpus". A corpus is a set of example inputs known to produce
interesting outputs. LibFuzzer will start producing interesting results
more quickly if it starts off with a set of inputs which it can begin
mutating. When it's done, it will write any new inputs it discovered
into its corpus. Re-using a corpus across executions is a good way to
make LibFuzzer return more results in less time. Our Evergreen tasks
will try to acquire and re-use a corpus from an earlier commit, if they
can.
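
When no earlier corpus exists, you can seed one by hand, since a corpus directory holds one input per file. The helper below is a minimal sketch of that, assuming C++17 for `std::filesystem`; `writeSeedCorpus` and the `seed_N` file names are hypothetical, and LibFuzzer does not require any particular naming.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes each seed to its own file under corpusDir and
// returns how many files were written successfully.
size_t writeSeedCorpus(const std::filesystem::path& corpusDir,
                       const std::vector<std::string>& seeds) {
    std::filesystem::create_directories(corpusDir);
    size_t written = 0;
    for (size_t i = 0; i < seeds.size(); ++i) {
        // Binary mode keeps the seed bytes exactly as given.
        std::ofstream out(corpusDir / ("seed_" + std::to_string(i)),
                          std::ios::binary | std::ios::trunc);
        out.write(seeds[i].data(), static_cast<std::streamsize>(seeds[i].size()));
        out.close();
        if (!out.fail()) {
            ++written;
        }
    }
    return written;
}
```

The resulting directory can then be passed to the fuzzer binary as its corpus path. Small, valid examples of the format under test tend to make the most useful seeds.
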
# References
- [LibFuzzer's official
documentation](https://llvm.org/docs/LibFuzzer.html)