Enabling safe refactoring with pytest and coverage
This article starts like bash approval testing, but the solution takes another path. Let's rewind the story ! (or jump to putting the code under tests to skip the plot)
The plot
Chris the product owner : Hey Alex. The business wants to add a new questions category to Trivia Game. Our developer left the project. Can you do it ?
Alex the dev : Hey Chris. Sure. You send me the code ?
Chris : Here it is trivia.py.
Alex : I take a look at it.
Here we are, with some legacy code. It's valuable. The business runs on it. And it has no tests. The author of this code isn't here anymore to explain his choices. We are left alone with the task.
The plan is to proceed in two steps :
- Get a general idea of what's going on
- Put the code under tests to clean it up
What's going on
We execute the code.
python trivia.py
It prints a series of messages in the console:
Chet was added
They are player number 1
Pat was added
They are player number 2
Sue was added
They are player number 3
Chet is the current player
They have rolled a 4
Chet's new location is 4
The category is Pop
Pop Question 0
Answer was corrent!!!!
Chet now has 1 Gold Coins.
Pat is the current player
...
Answer was corrent!!!!
Chet now has 6 Gold Coins.
These messages are the observable behavior of the application.
It appears to be a board game. Players are added to a game. They roll dice, answer questions, and earn coins. The game ends when a player gets 6 coins.
The code is messy and error-prone.
Let's refactor it. This will :
- Improve the clarity of the code
- Make it easier to add the new feature
But before making any change, we want to secure it.
Putting the code under test
Extract the suite of if __name__ == '__main__' in a play() method.
Then, create a test that captures the observable behavior of the game.
At the project's root:
pip install pytest
touch test_trivia.py
In test_trivia.py:
import sys
from io import StringIO
from trivia import play
def test_trivia():
output = StringIO()
sys.stdout = output
play()
assert output.getvalue() == APPROVED
APPROVED = None
The test will fail.
Run pytest -k test_trivia -vv and assign the value of output to APPROVED :
APPROVED = """\
Chet was added
They are player number 1
Pat was added
They are player number 2
Sue was added
They are player number 3
Chet is the current player
They have rolled a 5
Chet's new location is 5
The category is Science
Science Question 0
Answer was corrent!!!!
Chet now has 1 Gold Coins.
Pat is the current player
...
Answer was corrent!!!!
Chet now has 6 Gold Coins.
"""
-vv increase verbosity twice and prevent pytest from truncating the output.
Run pytest -k test_trivia -vv again.
The beginning of the strings are equal but they differ on the first roll:
Chet was added
They are player number 1
Pat was added
They are player number 2
Sue was added
They are player number 3
Chet is the current player
- They have rolled a 5
? ^
+ They have rolled a 3
? ^
- Chet's new location is 5
? ^
+ Chet's new location is 3
? ^
It looks like there is randomness.
To solve this problem, we can :
- fix randomness by calling
random.seed(10)in the arrange phase of our test - run the test again —
pytest -k test_trivia -vv - capture the output for the seed and paste it into
APPROVED. - run the test again —
pytest -k test_trivia
Green !
Now :
- What about the quality of our test ?
- Do we cover all the production code ?
- Do we cover every behavior?
Let's take a look at the coverage.
Finding holes with test coverage
Test coverage reports gives the code executed by our test suite.
Along with pytest, we can get it by running:
pip install coverage
coverage run -m pytest
coverage html
open htmlcov/index.html

Only two lines aren't covered :
- the
is_playablemethod (but it's dead code) - the suite of
if __name__ == '__main__':(but there isn't any logic here, the bloc just callsplay)
It looks like everything is covered.
Keep in mind that coverage is a negative metric. If the code is not covered, you know for sure that the code isn't protected from regression. But coverage doesn't guarantee you're safe. To be sure an instruction is protected, mutate it. The test should now fail. If it's not the case, your test can be improved.
As we have enough confidence in our test, let's format the code with black.
pip install black
black .
It's way better ✨ 😎 ✨.
Let's have a look to our coverage again
coverage run -m pytest && coverage html
open htmlcov/trivia_py.html

There are 10 missing lines now. 😱
We missed holes due to the formatting.
The clause
if self.places[self.current_player] == 8: return 'Pop'
became
if self.places[self.current_player] == 8:
return 'Pop'
with the formatting.
The header is covered, but the suite is never executed.
This information was missed in the first report.
Format your code to standards. It avoids wrong measures with
coverage.
Filling holes with more test cases
Our test suite is viable, but isn't 100% secure.
I would tend to find it acceptable for day-to-day situations.
It offers a good compromise between level of protection and setup effort.
But we can improve it.
The usual solution is to add missing test scenarios.
Here, the missing scenarios are games with other seeds.
A way to cover them is to :
- duplicates trivia code
- test it over a wide range of seeds
import random
from io import StringIO
import sys
def test_trivia():
for i in range(1000):
seed = i * 100
random.seed(seed)
output = StringIO()
sys.stdout = output
play() # <-- the code we will refactor
random.seed(seed)
reference = StringIO()
sys.stdout = reference
play_reference() # <-- a copy of the original behavior
assert output.getvalue() == reference.getvalue()
If duplicating the code isn't convenient, we can store the approved outputs:
import random
from io import StringIO
import sys
def test_trivia():
for i in range(1000):
seed = i * 100
random.seed(seed)
output = StringIO()
sys.stdout = output
play()
approved_path = f"data/test_trivia-approved-{seed}.txt"
# Generate the approved versions
# with open(approved_path, "w") as f:
# f.write(output.getvalue())
with open(approved_path, "r") as f:
expected = f.read()
assert output.getvalue() == expected
Let's check the coverage:
pytest --cov=. --cov-report html
open htmlcov/trivia_py.html

Dead code appart, all the paths are covered.
We can play a few mutations to increase confidence in the test.
Now we can refactor safely.
Conclusion
We explored a safe way to put trivia under test.
But this approach still has a weakness : it plays with side effects.
It works with randomness and logs. But this approach won't work with external dependencies like HTTP calls and database communications.
Also :
- it can cause problems if you want to run tests in parallel
- it's all or nothing (you cannot decide which log to catch and which not)
A more generic approach can be adopted by injecting dependencies. Let's see in the next post to see how we can achieve this with "Subclass and Override", and "Move to delegate" techniques.
Postscript - I'm not recommending to use logs to perform approval tests. I juste note that it can be used to do so. By the way, in most situations, approval tests are meant to be ephemeral. It's like when you protect the room before painting. Once the job is done, you remove the protections.
Leave a reply