Fuzzing Django Applications With the Atheris Fuzzing Engine

Published in The Startup · 7 min read · Dec 17, 2020

Atheris hispida, a venomous viper species. Photo by Bree Mc, soulsurvivor08 at flickr.com, CC BY 2.0, via Wikimedia Commons

Fuzzing, or fuzz testing, is a technique used to find flaws in software by providing unexpected input. It has been successful in uncovering numerous serious security issues. Additionally, many other bugs could have been spotted earlier if fuzz testing had been conducted.

Google’s Atheris fuzzing engine allows Python programs to be tested with libFuzzer, a widely used library for coverage-guided fuzz testing. Atheris is simple to install and easy to use, making it accessible to developers.

In this article, I will demonstrate how to test a web application based on the Django framework and showcase the results that can be achieved with minimal effort. By leveraging fuzz testing, you can identify potential vulnerabilities within your web application and enhance its security.

(Last updated in June, 2023.)

TL;DR

Atheris doesn’t find all Django views automatically. Prepare a good test corpus to help it find routes in your application, and a dictionary with relevant phrases or strings to use when fuzzing.
The smaller the part of the framework you harness, the more of its expectations you fail to meet. You may not want to test the URL resolver or the middleware, but views may still depend on them. Get ready for surprising results.
The more work you put into preparation, the better results you get. Take time to implement shortcuts. Mock up external dependencies. If your views resolve domain names, connect to external databases, or make any other network requests, you won’t get far in a reasonable time.
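To illustrate the last point, here is a minimal sketch, using only the standard library, of how DNS lookups could be stubbed out before fuzzing starts. The helper name `no_dns` and the fixed loopback answer are assumptions made for illustration; your application may need different stubs, for example for HTTP clients or external databases.

```python
import socket
from unittest import mock


def fake_getaddrinfo(host, port, *args, **kwargs):
    # Answer every lookup with the loopback address; no DNS query is sent.
    return [(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "",
             ("127.0.0.1", port or 0))]


def no_dns():
    # Hypothetical helper: returns a patcher that replaces socket.getaddrinfo.
    return mock.patch("socket.getaddrinfo", side_effect=fake_getaddrinfo)


# Use it as a context manager, or call no_dns().start() once
# before atheris.Fuzz() so that it stays active for the whole run.
with no_dns():
    answer = socket.getaddrinfo("example.com", 80)
```

Whether a stub should return a canned answer or raise an error depends on which code path of the view you want the fuzzer to spend its time on.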

Preparations

Follow the official README:

google/atheris (github.com): Atheris is a coverage-guided Python fuzzing engine. It supports fuzzing of Python code, but also native extensions…

Take care to install the latest version of Clang. In my case switching from Clang 11 to 12 increased the testing speed (“exec/s”) twofold.

Approach 1: Fuzzing complete page URLs

In this approach we create a django.test.Client and ask Atheris to visit random pages of our project:

#!/usr/bin/env python3
import os, django

# I assume this variable is already set (e.g. in your shell)
# os.environ.setdefault("DJANGO_SETTINGS_MODULE", "proj.settings")
django.setup()

from django.test import Client
client = Client()

def TestOneInput(data):  # this function will be called with random bytes
  url = '/' + data.decode('latin2') # convert random bytes to string
  # these chars result in ValueError: Invalid IPv6 URL
  url = url.replace('[', '%5B').replace(']', '%5D')
  response = client.get(url)
  if response.status_code not in [200, 301, 302, 404]:
    # suspicious! log the HTTP status and the request path
    print(response.status_code, url)
    raise RuntimeError("Badness!")
  elif response.status_code not in [302, 404]:
    # not suspicious, but let's see what pages it finds
    print(response.status_code, url)

# initialize atheris and start fuzzing
import atheris
import sys
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()

First, we initialize the Django framework and create a test client. Then we use it to obtain a response for the given path. To simulate accessing your website as a logged-in user, call the “client.force_login” method described in the Client documentation before sending requests (the script above does not do this).
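A note on the bracket escaping in the script: as far as I can tell, the error originates in Python's urllib.parse, which treats square brackets in the host part of a URL as IPv6 delimiters. When the random data itself begins with a slash, the path starts with `//` and what follows is parsed as a host. The sketch below reproduces this with the standard library alone; `build_url` and `parses_cleanly` are hypothetical helper names, and `build_url` simply mirrors the two lines from the script.

```python
from urllib.parse import urlsplit


def build_url(data: bytes) -> str:
    # Same steps as in the fuzz target: decode, prefix with '/', escape brackets.
    url = "/" + data.decode("latin2")
    return url.replace("[", "%5B").replace("]", "%5D")


def parses_cleanly(url: str) -> bool:
    # True if urllib.parse accepts the URL, False if it raises ValueError.
    try:
        urlsplit(url)
        return True
    except ValueError:
        return False


raw = "/" + b"/[".decode("latin2")       # "//[": unbalanced bracket in the host part
print(parses_cleanly(raw))               # the unescaped input
print(parses_cleanly(build_url(b"/[")))  # the same input after escaping
```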

The last step is to check the HTTP status code in the response.

If the status code is unexpected, then we immediately stop executing the script to investigate the case further. We want as many such cases as possible.
Otherwise, if Atheris found a properly working page, we log a message.

What status codes are expected or not? It depends on your application. I decided to ignore responses with HTTP redirects (statuses 301, 302) as well as the “Page not found” response (status 404).
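The decision described above can be isolated in a small function, which makes it easier to adapt the lists to your application. This is only a sketch: the function name `triage` and its three labels are hypothetical, while the status codes are the ones used in the script (note that the script, and therefore this sketch, still logs 301 responses).

```python
EXPECTED = {200, 301, 302, 404}   # anything else is treated as a finding
QUIET = {302, 404}                # expected, and too common to be worth logging


def triage(status_code: int) -> str:
    """Return 'crash', 'log' or 'ignore' for an HTTP status code."""
    if status_code not in EXPECTED:
        return "crash"    # the fuzz target should raise an exception
    if status_code not in QUIET:
        return "log"      # a working page was found (200 or 301)
    return "ignore"
```

In the fuzz target, a "crash" result would translate to raising an exception and a "log" result to printing the status and the URL.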

Let’s start our script:

$ ALLOWED_HOSTS='["testserver"]' ./fuzz1.py -max_len=150
INFO: Configured for Python tracing with opcodes.
INFO: Seed: 3353455860
INFO: Loaded 2 modules   (1024 inline 8-bit counters): 512 [0x558092250230, 0x558092250430), 512 [0x558092234510, 0x558092234710), 
INFO: Loaded 2 PC tables (1024 PCs): 512 [0x5580924b5c70,0x5580924b7c70), 512 [0x5580924b8490,0x5580924ba490), 
INFO: A corpus is not provided, starting from an empty corpus
#2 INITED cov: 20202 ft: 20202 corp: 1/1b exec/s: 0 rss: 153Mb
 NEW_FUNC[1/42]: 0x5580924b7e69
 NEW_FUNC[2/42]: 0x5580924b7e81
#3 NEW    cov: 22119 ft: 23051 corp: 2/5b lim: 4 exec/s: 0 rss: 154Mb L: 4/4 MS: 1 CrossOver-
#4 NEW    cov: 22123 ft: 23055 corp: 3/9b lim: 4 exec/s: 0 rss: 154Mb L: 4/4 MS: 1 CopyPart-
#9 REDUCE cov: 22123 ft: 23055 corp: 3/8b lim: 4 exec/s: 0 rss: 154Mb L: 3/4 MS: 5 CopyPart-ChangeBit-ShuffleBytes-CrossOver-CrossOver-
 NEW_FUNC[1/1]: 0x5580927a98fd
#11 NEW    cov: 22137 ft: 23079 corp: 4/12b lim: 4 exec/s: 11 rss: 154Mb L: 4/4 MS: 2 ChangeByte-ChangeByte-
#12 NEW    cov: 22137 ft: 23099 corp: 5/16b lim: 4 exec/s: 12 rss: 154Mb L: 4/4 MS: 1 ChangeBinInt-
#18 NEW    cov: 22137 ft: 23178 corp: 6/20b lim: 4 exec/s: 18 rss: 154Mb L: 4/4 MS: 1 CopyPart-
#35 NEW    cov: 22137 ft: 23186 corp: 7/24b lim: 4 exec/s: 35 rss: 154Mb L: 4/4 MS: 2 ChangeBit-ChangeByte-
#41 NEW    cov: 22137 ft: 23199 corp: 8/28b lim: 4 exec/s: 41 rss: 154Mb L: 4/4 MS: 1 CrossOver-
 NEW_FUNC[1/1]: 0x5580927a9901
#50 NEW    cov: 22214 ft: 23276 corp: 9/31b lim: 4 exec/s: 25 rss: 154Mb L: 3/4 MS: 4 ShuffleBytes-ChangeByte-EraseBytes-ChangeBinInt-
#64 pulse  cov: 22214 ft: 23276 corp: 9/31b lim: 4 exec/s: 32 rss: 154Mb
#82 REDUCE cov: 22214 ft: 23276 corp: 9/30b lim: 4 exec/s: 27 rss: 154Mb L: 3/4 MS: 2 EraseBytes-ChangeByte-

What we see is standard libFuzzer output. If you are new to libFuzzer, the official tutorial explains it in detail. I used “-max_len=150” to limit the length of the random value; be sure to take a look at the other libFuzzer options.

(Edit: An update has been released and the output now contains a proper function name instead of its unreadable memory address.)
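One libFuzzer option worth trying is “-dict=FILE”, which supplies the fuzzer with tokens to splice into its inputs. The sketch below writes such a file in the quoted-string syntax that libFuzzer shares with AFL. The token list and the helper names are placeholders of mine, so substitute strings that matter in your application, such as URL fragments or parameter names.

```python
def dict_entry(token: bytes) -> str:
    """Format one token as a quoted line in the AFL/libFuzzer dictionary syntax."""
    out = []
    for byte in token:
        char = chr(byte)
        if char in ('"', "\\"):
            out.append("\\" + char)          # quotes and backslashes are escaped
        elif 0x20 <= byte < 0x7F:
            out.append(char)                 # printable ASCII is written as-is
        else:
            out.append("\\x%02X" % byte)     # everything else as \xNN
    return '"' + "".join(out) + '"'


def write_dictionary(path, tokens):
    # One entry per line.
    with open(path, "w", encoding="ascii") as handle:
        for token in tokens:
            handle.write(dict_entry(token) + "\n")


# Placeholder tokens; replace them with strings from your own URLs.
TOKENS = [b"admin/", b"api/", b"?page=", b"products/"]
```

After calling `write_dictionary("urls.dict", TOKENS)` (the file name is arbitrary), the dictionary can be passed on the command line next to the other options, for example `-dict=urls.dict`.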

As the script runs, the word “NEW” gets rarer and rarer in its output, which means that it’s getting hard to find new code to test. We could help the fuzzer by providing a directory of interesting inputs (corpus). In our case, those inputs would be names of existing pages. LibFuzzer will use that directory to store interesting inputs it finds itself, so it won’t start from scratch next time:

$ mkdir CORPUS1
$ ALLOWED_HOSTS='["testserver"]' ./fuzz1.py -max_len=150 CORPUS1
INFO: Configured for Python tracing with opcodes.
INFO: Seed: 17267508
(...)
INFO:        0 files found in CORPUS1
INFO: A corpus is not provided, starting from an empty corpus
#2 INITED cov: 20202 ft: 20202 corp: 1/1b exec/s: 0 rss: 153Mb
#3 NEW    cov: 20206 ft: 20263 corp: 2/2b lim: 4 exec/s: 0 rss: 153Mb L: 1/1 MS: 1 ShuffleBytes-
(...)
#286 NEW    cov: 22718 ft: 24298 corp: 28/72b lim: 4 exec/s: 31 rss: 155Mb L: 2/4 MS: 1 InsertByte-
^C KeyboardInterrupt: stopping.
$ ALLOWED_HOSTS='["testserver"]' ./fuzz1.py -max_len=150 CORPUS1
INFO: Configured for Python tracing with opcodes.
INFO: Seed: 96071814
(...)
INFO:       26 files found in CORPUS1
INFO: seed corpus: files: 26 min: 1b max: 4b total: 70b rss: 152Mb
#27 INITED cov: 22718 ft: 24286 corp: 19/50b exec/s: 27 rss: 154Mb
#31 NEW    cov: 22718 ft: 24303 corp: 20/54b lim: 4 exec/s: 31 rss: 154Mb L: 4/4 MS: 4 EraseBytes-CrossOver-ChangeBit-CrossOver-
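To give the fuzzer a head start, the corpus directory can be filled with the paths of pages you already know. Below is a minimal sketch, assuming the fuzz target prepends the leading slash itself, as the script above does. The listed paths are hypothetical and `seed_corpus` is an illustrative helper name.

```python
import hashlib
import os


def seed_corpus(directory, paths):
    """Write each path to its own file; libFuzzer reads every file as one input."""
    os.makedirs(directory, exist_ok=True)
    for path in paths:
        # The fuzz target adds the leading '/', so it is stripped here.
        data = path.lstrip("/").encode("latin2")
        # File names are arbitrary; a content hash avoids collisions.
        name = hashlib.sha1(data).hexdigest()
        with open(os.path.join(directory, name), "wb") as handle:
            handle.write(data)


# Hypothetical page names; list the routes of your own project instead.
KNOWN_PATHS = ["/", "/products/", "/products/1/", "/accounts/login/"]
```

Calling `seed_corpus("CORPUS1", KNOWN_PATHS)` before the first run means the fuzzer starts from real routes instead of an empty corpus.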

We are not limited to GET requests. Form handling can be tested with the “client.post(path, data, content_type, …)” function. I recommend reading the function’s documentation, as the interpretation of “data” in Django depends on the “content_type” argument.
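As a sketch of what a form-fuzzing target could look like, the helper below splits the random bytes into two form fields and posts them. The URL “/contact/” and the field names are invented for illustration, and the `post` argument is expected to be `client.post` from the first script; passing it in as a parameter is a choice made here so that the helper can be tried without a Django project.

```python
def fuzz_contact_form(post, data: bytes) -> int:
    """Post two fuzzed fields to a (hypothetical) form view and return the status."""
    text = data.decode("latin2")
    # Everything before the first newline becomes one field, the rest the other.
    name, _, message = text.partition("\n")
    # With a dict and the default content type, the Django test client
    # encodes the data as a multipart form submission.
    response = post("/contact/", {"name": name, "message": message})
    return response.status_code
```

Inside TestOneInput this would be called as `fuzz_contact_form(client.post, data)`, followed by the same status code check as for GET requests.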

Approach 2: Fuzzing Django views

It takes a lot of time for the fuzzer to discover all views in your application, so let’s direct our test at specific ones. We can use the RequestFactory and call the view directly.

#!/usr/bin/env python3
import os, django
django.setup()

from django.test.client import RequestFactory
rf = RequestFactory()

from myapp.views import ProductListView as view_under_test

def TestOneInput(data):
  url = '/' + data.decode('latin2')  # convert bytes to string
  # surprisingly these chars result in ValueError: Invalid IPv6 URL
  url = url.replace(']', '%5D').replace('[', '%5B')
  request = rf.get(url)
  response = view_under_test.as_view()(request)
  if response.status_code not in [200]:
    print(response.status_code, url)
    raise RuntimeError("Badness!")
  elif response.status_code not in [404]:
    print(response.status_code, url)

import atheris
import sys
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()

Bypassing the resolver and middleware results in a big speedup. For my application, a crash is reported in the very first test case:

$ ALLOWED_HOSTS='["testserver"]' ./fuzz2.py -max_len=150 -only_ascii=1 CORPUS2
INFO: Configured for Python tracing with opcodes.
INFO: Seed: 1307295576
INFO: Loaded 2 modules   (1024 inline 8-bit counters): 512 [0x558c20687db0, 0x558c20687fb0), 512 [0x558c1faebf90, 0x558c1faec190), 
INFO: Loaded 2 PC tables (1024 PCs): 512 [0x558c2077e880,0x558c20780880), 512 [0x558c207810a0,0x558c207830a0), 
INFO:      159 files found in CORPUS2
=== Uncaught Python exception: ===
AttributeError: 'Request' object has no attribute 'LANGUAGE_CODE'
Traceback (most recent call last):
  File "./fuzz2.py", line 31, in TestOneInput
    response = view_under_test.as_view()(request)
  (...)
  File "/usr/local/lib/python3.8/site-packages/rest_framework/request.py", line 414, in __getattr__
    return self.__getattribute__(attr)
==1861== ERROR: libFuzzer: fuzz target exited
SUMMARY: libFuzzer: fuzz target exited
MS: 0 ; base unit: 0000000000000000000000000000000000000000
artifact_prefix='./'; Test unit written to ./crash-da39a3ee5e6b4b0d3255bfef95601890afd80709
Base64:

However, this is not a bug. The view under test needs the LANGUAGE_CODE value, which is normally provided by the middleware. We can provide it by setting a constant value as an attribute after creating the request:

request = rf.get(url)
setattr(request, 'LANGUAGE_CODE', 'pl')

But we can also fuzz the value together with the page address. To do that we need to switch to FuzzedDataProvider and ask it for two Unicode strings:

def TestOneInput(data):
  fdp = atheris.FuzzedDataProvider(data)
  url = '/' + fdp.ConsumeUnicode(40)
  url = url.replace(']', '%5D').replace('[', '%5B')
  request = rf.get(url)
  setattr(request, 'LANGUAGE_CODE', fdp.ConsumeUnicode(2))
  (...)

FuzzedDataProvider provides a range of methods that return values of different types which may be more suitable for your use case, so make sure to take a look at the documentation. For example, if we wanted to constrain the value of a variable to a list of predefined values we could use PickValueInList:

language_code = fdp.PickValueInList(['cz', 'sk', 'pl'])

Final thoughts

“This is way too slow.” Optimize. Check the settings and disable the middleware classes you don’t use. Switch to SQLite. Look through articles about benchmarking and improving Python performance (for example, on the brilliant PythonSpeed blog).

“Reviewing the code will take less time than optimizing for fuzzing.” That’s possible, but preparing automated tests is a one-time investment. If you integrate fuzz testing into your CI pipeline, you will benefit from testing your application for regressions after every change.

“It’s amazing how easy it is to set up.” I’m glad you liked it!

Please let me know in the comments if it worked for your project or if you have ideas for improving web application fuzzing techniques. Or just say ‘Hello!’ :-)