Automating Unit Tests in Python with Hypothesis
Unit testing is key to producing quality code. Here’s how to automate it.
Unit testing is key to developing quality code. There is a host of libraries and services that you can use to test your Python code. However, "traditional" unit testing is time-intensive and unlikely to cover the full spectrum of cases that your code is supposed to handle. In this post, I'll show you how to use property-based testing with Hypothesis to automate testing of your Python code, and I'll discuss some of the advantages of using a property-based testing framework.
Property-based automated testing
Unit testing involves testing individual components of your code. A typical unit test takes input data, runs it through a piece of code, and checks the result against some pre-defined expected outcome.
Hypothesis does something different. It is a property-based (or "generative") testing framework. A property-based test defines general expectations about your code instead of specific examples. For example, if you have some code that calculates the total VAT on a number of transactions, a traditional test would define a bunch of hypothetical transactions with their corresponding VAT amounts ($100 transaction → $xx.xx tax) and check against those. However, if you know that VAT is, say, 20 percent, a property-based test would instead verify that the total VAT is always 20 percent of the total transaction amount.
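To make the contrast concrete, here is a minimal sketch of both styles. The function name `calculate_total_vat`, the flat 20 percent rate, and the value bounds are assumptions for illustration, not code from the original article.

```python
import math

from hypothesis import given
import hypothesis.strategies as st

VAT_RATE = 0.20  # assumed flat VAT rate for this illustration

def calculate_total_vat(amounts):
    """Hypothetical code under test: total VAT over a list of transactions."""
    return sum(amount * VAT_RATE for amount in amounts)

# Example-based test: one hand-picked input with a pre-computed outcome.
def test_vat_single_example():
    assert math.isclose(calculate_total_vat([100.0]), 20.0)

# Property-based test: Hypothesis generates the transactions, and we only
# assert the general property "total VAT is 20 percent of the total amount".
@given(st.lists(st.floats(min_value=0.0, max_value=1_000_000.0)))
def test_vat_property(amounts):
    assert math.isclose(
        calculate_total_vat(amounts), VAT_RATE * sum(amounts), abs_tol=1e-6
    )
```

The example-based test checks one case; the property-based test checks the same invariant against whatever inputs Hypothesis throws at it.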
Hypothesis is built on these principles. It generates arbitrary input data according to some specification and subsequently puts that data to the test. What’s more, when Hypothesis finds an example that causes an assert failure, it’ll try to simplify the example and find the smallest failing case — a process called “shrinking”. Hypothesis will essentially try to “break” your code. Your tests will therefore cover a much larger chunk of your domain space with the same amount of code. And, you’re bound to find edge cases that you hadn’t even thought of.
Getting started with Hypothesis
Let’s see how Hypothesis works in practice. Hypothesis has three key components: the code that you’re testing, the strategies that define your test data, and a function that tests your code using the strategies.
Let’s assume we have a simple (and nonsensical) piece of Python code that converts a float value to an integer:
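The article's original snippet isn't reproduced here; a plausible sketch of such a function (the name `convert_to_integer` is referenced later in the post, but the body below is an assumption) might look like:

```python
def convert_to_integer(value: float) -> int:
    """Convert a float to an integer by rounding to the nearest whole number."""
    return int(round(value))
```
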
This code has a clear property: the outcome should always be an integer type.
Strategies
To test this code, we’ll first define a “strategy”. A strategy defines the data that Hypothesis generates for testing, and how examples are “simplified”. In our code, we only define the parameters of the data; the simplification (or: “shrinking”) is internal to Hypothesis.
We’ll start with a strategy that generates a float value between 0.0 and 10.0 (inclusive). We define this in a separate file called data_strategies.py. Using a dataclass for this may seem like overdoing it, but it is useful when you’re working with more complex code that takes a bunch of different parameters.
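A sketch of what `data_strategies.py` might contain follows. The class and field names are assumptions; the bounds (0.0 to 10.0, inclusive) come from the text.

```python
# data_strategies.py
from dataclasses import dataclass

import hypothesis.strategies as st

@dataclass
class GeneratedData:
    """Bundles the Hypothesis strategies that define our test data."""
    # A float between 0.0 and 10.0 (both inclusive).
    float_value: st.SearchStrategy = st.floats(min_value=0.0, max_value=10.0)
```

As more parameters are added to the code under test, more strategy fields can simply be appended to the dataclass.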
A lot of time can go into defining the strategies, and in fact, it should. The whole point of property-based testing with Hypothesis is that you define the parameters from which data is generated, so that you subsequently allow your automated testing to do its magic. The more time you spend on this, the better your testing is bound to be (think: “high investment; high reward”).
Bringing your code and your strategies together: Running tests with Hypothesis
After we’ve defined our strategy, we add a small piece of code to pass the Hypothesis-generated examples to our function and assert something about the required outcome (the “property”) of the code that we want to test. The code below draws a float value from the generated_data dataclass object that we defined in the data_strategies.py file above, passes that value through our convert_to_integer function, and finally asserts that the expected property holds.
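A sketch of that test module is shown below. In practice `convert_to_integer` and `GeneratedData` would be imported from their own files; they are repeated inline here (as assumed implementations) so the example is self-contained.

```python
# test_my_function.py
from dataclasses import dataclass

import hypothesis.strategies as st
from hypothesis import given

def convert_to_integer(value: float) -> int:
    return int(round(value))

@dataclass
class GeneratedData:
    float_value: st.SearchStrategy = st.floats(min_value=0.0, max_value=10.0)

@given(st.data())
def test_convert_to_integer(data):
    # Draw a float value from the generated_data dataclass object ...
    generated_data = GeneratedData()
    value = data.draw(generated_data.float_value)
    # ... pass it through our function, and assert the expected property.
    assert isinstance(convert_to_integer(value), int)
```
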
Configuring Hypothesis: Useful settings
Before we run the test module that we developed above, let’s review some of the configurations that we can use to tailor Hypothesis to our use case. Hypothesis comes with a bunch of settings. These settings can be passed to your test function using the settings() decorator, or by registering the settings in a profile, and passing the profile using the decorator (see example code below). Some useful settings include:
- max_examples: Controls how many passing examples are required before testing is concluded. This is useful if you have internal guidelines for the volume of testing required for a new piece of code to pass review. As a general rule of thumb: the more complex your code, the more examples you’ll want to run (Hypothesis’ authors note that they managed to find new bugs after several million examples while testing SymPy);
- deadline: Specifies how long an individual example is allowed to take. You’ll want to increase this if you have very complex code where one example may take more than the default time to run;
- suppress_health_check: Allows you to specify which “health checks” to ignore. Useful when you’re working with large sets of data (HealthCheck.data_too_large) or data that takes a long time to generate (HealthCheck.too_slow).
Let’s use these settings in our testing module. With these simple lines of code, we can now go ahead and throw thousands of examples at our function to verify that it works as expected. You can run the tests from the terminal (python -m pytest test_my_function.py), or if you use an IDE like PyCharm, by specifying the appropriate pytest configuration for your code.
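A sketch of the testing module with a registered settings profile follows. The profile name and the specific values are assumptions; the code under test is the assumed `convert_to_integer` from earlier.

```python
# test_my_function.py
import hypothesis.strategies as st
from hypothesis import HealthCheck, given, settings

# Register a profile and activate it for every test in this module.
settings.register_profile(
    "my_profile",
    max_examples=1000,                            # require 1000 passing examples
    deadline=None,                                # disable per-example time limit
    suppress_health_check=[HealthCheck.too_slow], # skip the slow-data health check
)
settings.load_profile("my_profile")

def convert_to_integer(value: float) -> int:
    return int(round(value))

@given(value=st.floats(min_value=0.0, max_value=10.0))
def test_convert_to_integer(value):
    assert isinstance(convert_to_integer(value), int)
```

Alternatively, a settings object can be applied to an individual test with the @settings(...) decorator instead of registering a module-wide profile.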
Upping your game: Using composite strategies
So far, the examples I’ve used are simple. Hypothesis can handle much more complex test cases using composite strategies, which, as the name suggests, allow you to combine strategies to generate testing examples. So, let’s up our game and use Hypothesis in a more complex setting.
Assume you have developed a piece of code that calculates the percentile value of an array. There are plenty of Python libraries out there that will do this for you, but let’s say you’re particularly passionate about the percentile and simply want an implementation of your own. In this example, our aim is to benchmark our solution against an existing implementation. With this in mind, we can define two simple properties of this code that we can test:
- The order in which the array of values for which we calculate the percentile is supplied should not matter for its outcome;
- The function’s output needs to correspond with the value calculated with another, generally accepted library (note that defining this as a “property” is slightly atypical — more on this in the final section of this article).
Let’s start with our percentile function. Here, we implement a version that uses midpoint interpolation. The function accepts an array of integer or float values (arr), sorts it, identifies the floor and ceiling value based on the specified percentile (perc), and finally takes the midpoint of the two values.
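A sketch of such an implementation is shown below. The function name is inferred from the test name (test_calc_percentile) mentioned later; the body follows the description above but is not the article's original code.

```python
import math

def calc_percentile(arr, perc):
    """Return the perc-th percentile of arr using midpoint interpolation."""
    values = sorted(arr)
    # Fractional position of the percentile within the sorted array.
    idx = perc / 100 * (len(values) - 1)
    floor_value = values[math.floor(idx)]
    ceil_value = values[math.ceil(idx)]
    # Midpoint interpolation: average of the two surrounding values.
    return (floor_value + ceil_value) / 2
```
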
Now, let’s move on to our strategy, that is, a definition of the data that we generate for testing. Here, we define a function called generate_scenario that produces an array of float values of a random length (n), based on a randomly selected distribution (dist). We also generate the percentile value that we want to compute (perc). We return the values and the percentile as a dictionary, so we can easily access the values that we need for testing. Note the use of the @st.composite decorator, which converts “a function that returns one example into a function that returns a strategy that produces such examples”.
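A sketch of that composite strategy follows. The exact bounds and the set of distributions are assumptions; the structure (length n, distribution dist, percentile perc, returned as a dictionary) follows the description above.

```python
import hypothesis.strategies as st

@st.composite
def generate_scenario(draw):
    # Randomly chosen array length and value distribution.
    n = draw(st.integers(min_value=1, max_value=100))
    dist = draw(st.sampled_from(["narrow", "wide"]))
    if dist == "narrow":
        elements = st.floats(min_value=0.0, max_value=1.0)
    else:
        elements = st.floats(min_value=-1e6, max_value=1e6)
    # An array of n float values drawn from the selected distribution.
    values = draw(st.lists(elements, min_size=n, max_size=n))
    # The percentile that we want to compute for this scenario.
    perc = draw(st.integers(min_value=0, max_value=100))
    return {"values": values, "perc": perc}
```
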
Finally, we can use our composite strategy in the testing module. In this module, we specify our strategy settings (SETTINGS) and define a function that runs our own code. The test_calc_percentile function tests that inverting the order of the array does not affect our percentile function’s output, and compares the result against a NumPy implementation. With the profile that we’ve set up here, we run 10,000 examples, which takes approximately 30 seconds to complete on my five-year-old laptop.
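A sketch of the testing module is shown below. calc_percentile and generate_scenario are the assumed implementations sketched in this article, repeated here so the module is self-contained; the SETTINGS values mirror the text.

```python
# test_calc_percentile.py
import math

import numpy as np
import hypothesis.strategies as st
from hypothesis import given, settings

SETTINGS = settings(max_examples=10_000, deadline=None)

def calc_percentile(arr, perc):
    values = sorted(arr)
    idx = perc / 100 * (len(values) - 1)
    return (values[math.floor(idx)] + values[math.ceil(idx)]) / 2

@st.composite
def generate_scenario(draw):
    n = draw(st.integers(min_value=1, max_value=100))
    values = draw(
        st.lists(st.floats(min_value=-1e6, max_value=1e6), min_size=n, max_size=n)
    )
    perc = draw(st.integers(min_value=0, max_value=100))
    return {"values": values, "perc": perc}

@SETTINGS
@given(scenario=generate_scenario())
def test_calc_percentile(scenario):
    values, perc = scenario["values"], scenario["perc"]
    result = calc_percentile(values, perc)
    # Property 1: the order of the input array should not affect the outcome.
    assert calc_percentile(values[::-1], perc) == result
    # Property 2: the result matches NumPy's midpoint implementation
    # (np.percentile with method="midpoint" requires NumPy >= 1.22).
    expected = np.percentile(values, perc, method="midpoint")
    assert math.isclose(result, expected, rel_tol=1e-9, abs_tol=1e-9)
```
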
Before you begin your testing: A note on when you should and shouldn’t use Hypothesis
Before you start developing your own unit tests, there are a couple of things you need to be aware of. Property-based testing works great in specific contexts. It is a powerful framework when the properties of a piece of code are well and easily defined, that is, “part x of my code needs to produce an outcome that has property y”. When defining the conditions of the “assert” statement is just as complex as the code that you’re trying to test, you’ll end up re-implementing your existing code.
Having two parallel implementations may be appropriate in some instances, for example when you’ve created a novel piece of code or re-implemented something with a novel framework that you need to benchmark against existing implementations, as shown in the example above. In most cases, however, it is not: you’ll waste time and resources, and end up having to maintain two pieces of code that do the exact same thing.
Even when automating your testing, it is advisable to steer a middle course between “traditional” unit tests and property-based testing. You’ll want to avoid making your tests too much of a “black box” and make sure you cover the obvious cases, especially at early stages of development. Creating at least a couple of well understood scenarios is always recommended. And, combining your own “manual” test cases with automated testing will go a long way to improving your test coverage, and can drastically improve your code.
Thanks for reading! What Python tools do you like to use to improve code quality? Please leave your suggestions in the comments!