Enabling safe refactoring with pytest and coverage

Wed 26 February 2025

This article starts like bash approval testing, but the solution takes another path. Let's rewind the story ! (or jump to putting the code under tests to skip the plot)

The plot

Chris the product owner : Hey Alex. The business wants to add a new questions category to Trivia Game. Our developer left the project. Can you do it ?

Alex the dev : Hey Chris. Sure. You send me the code ?

Chris : Here it is trivia.py.

Alex : I take a look at it.

Here we are, with some legacy code. It's valuable. The business runs on it. And it has no tests. The author of this code isn't here anymore to explain his choices. We are left alone with the task.

The plan is to proceed in two steps :

Get a general idea of what's going on
Put the code under tests to clean it up

What's going on

We execute the code.

python trivia.py

It prints a series of messages in the console:

Chet was added
They are player number 1
Pat was added
They are player number 2
Sue was added
They are player number 3
Chet is the current player
They have rolled a 4
Chet's new location is 4
The category is Pop
Pop Question 0
Answer was corrent!!!!
Chet now has 1 Gold Coins.
Pat is the current player
...
Answer was corrent!!!!
Chet now has 6 Gold Coins.

These messages are the observable behavior of the application.

It appears to be a board game. Players are added to a game. They roll dice, answer questions, and earn coins. The game ends when a player gets 6 coins.

The code is messy and error-prone.

Let's refactor it. This will :

Improve the clarity of the code
Make it easier to add the new feature

But before making any change, we want to secure it.

Putting the code under test

Extract the suite of if __name__ == '__main__' in a play() method.

Then, create a test that captures the observable behavior of the game.

At the project's root:

pip install pytest
touch test_trivia.py

In test_trivia.py:

import sys
from io import StringIO

from trivia import play


def test_trivia():
    output = StringIO()
    sys.stdout = output

    play()

    assert output.getvalue() == APPROVED


APPROVED = None

The test will fail.

Run pytest -k test_trivia -vv and assign the value of output to APPROVED :

APPROVED = """\
Chet was added
They are player number 1
Pat was added
They are player number 2
Sue was added
They are player number 3
Chet is the current player
They have rolled a 5
Chet's new location is 5
The category is Science
Science Question 0
Answer was corrent!!!!
Chet now has 1 Gold Coins.
Pat is the current player
...
Answer was corrent!!!!
Chet now has 6 Gold Coins.
"""

-vv increase verbosity twice and prevent pytest from truncating the output.

Run pytest -k test_trivia -vv again.

The beginning of the strings are equal but they differ on the first roll:

 Chet was added
    They are player number 1
    Pat was added
    They are player number 2
    Sue was added
    They are player number 3
    Chet is the current player
  - They have rolled a 5
  ?                    ^
  + They have rolled a 3
  ?                    ^
  - Chet's new location is 5
  ?                        ^
  + Chet's new location is 3
  ?                        ^

It looks like there is randomness.

To solve this problem, we can :

fix randomness by calling random.seed(10) in the arrange phase of our test
run the test again — pytest -k test_trivia -vv
capture the output for the seed and paste it into APPROVED.
run the test again — pytest -k test_trivia

Green !

Now :

What about the quality of our test ?
Do we cover all the production code ?
Do we cover every behavior?

Let's take a look at the coverage.

Finding holes with test coverage

Test coverage reports gives the code executed by our test suite.

Along with pytest, we can get it by running:

pip install coverage
coverage run -m pytest
coverage html
open htmlcov/index.html

Only two lines aren't covered :

the is_playable method (but it's dead code)
the suite of if __name__ == '__main__': (but there isn't any logic here, the bloc just calls play)

It looks like everything is covered.

Keep in mind that coverage is a negative metric. If the code is not covered, you know for sure that the code isn't protected from regression. But coverage doesn't guarantee you're safe. To be sure an instruction is protected, mutate it. The test should now fail. If it's not the case, your test can be improved.

As we have enough confidence in our test, let's format the code with black.

pip install black
black .

It's way better ✨ 😎 ✨.

Let's have a look to our coverage again

coverage run -m pytest && coverage html
open htmlcov/trivia_py.html

There are 10 missing lines now. 😱

We missed holes due to the formatting.

The clause

if self.places[self.current_player] == 8: return 'Pop'

became

if self.places[self.current_player] == 8: 
    return 'Pop'

with the formatting.

The header is covered, but the suite is never executed.

This information was missed in the first report.

Format your code to standards. It avoids wrong measures with coverage.

Filling holes with more test cases

Our test suite is viable, but isn't 100% secure.

I would tend to find it acceptable for day-to-day situations.

It offers a good compromise between level of protection and setup effort.

But we can improve it.

The usual solution is to add missing test scenarios.

Here, the missing scenarios are games with other seeds.

A way to cover them is to :

duplicates trivia code
test it over a wide range of seeds

import random
from io import StringIO
import sys

def test_trivia():
    for i in range(1000):
        seed = i * 100

        random.seed(seed)
        output = StringIO()
        sys.stdout = output
        play()  # <-- the code we will refactor

        random.seed(seed)
        reference = StringIO()
        sys.stdout = reference
        play_reference()  # <-- a copy of the original behavior

        assert output.getvalue() == reference.getvalue()

If duplicating the code isn't convenient, we can store the approved outputs:

import random
from io import StringIO
import sys

def test_trivia():
    for i in range(1000):
        seed = i * 100
        random.seed(seed)
        output = StringIO()
        sys.stdout = output
        play()

        approved_path = f"data/test_trivia-approved-{seed}.txt"
        # Generate the approved versions
        # with open(approved_path, "w") as f:
        #     f.write(output.getvalue())
        with open(approved_path, "r") as f:
            expected = f.read()

        assert output.getvalue() == expected

Let's check the coverage:

pytest --cov=. --cov-report html
open htmlcov/trivia_py.html

Dead code appart, all the paths are covered.

We can play a few mutations to increase confidence in the test.

Now we can refactor safely.

Conclusion

We explored a safe way to put trivia under test.

But this approach still has a weakness : it plays with side effects.

It works with randomness and logs. But this approach won't work with external dependencies like HTTP calls and database communications.

Also :

it can cause problems if you want to run tests in parallel
it's all or nothing (you cannot decide which log to catch and which not)

A more generic approach can be adopted by injecting dependencies. Let's see in the next post to see how we can achieve this with "Subclass and Override", and "Move to delegate" techniques.

Postscript - I'm not recommending to use logs to perform approval tests. I juste note that it can be used to do so. By the way, in most situations, approval tests are meant to be ephemeral. It's like when you protect the room before painting. Once the job is done, you remove the protections.