Bash approval testing (don't do it)

Thu 20 February 2025

The plot

Chris the product owner: Hey Alex. The business wants to add a new question category to Trivia Game. Our developer left the project. Can you do it?

Alex the dev: Hey Chris. Sure. Can you send me the code?

Chris: Yes, here it is: trivia.py.

Alex: OK, I'll take a look at it.

Here we are, with some legacy code. It's valuable. The business runs on it. And it has no tests. The author of this code isn't here anymore to explain his choices. We are left alone with the task.

The plan is to proceed in two steps:

Get a general idea of what's going on
Put the code under tests to clean it up

What's going on

We execute the code.

python trivia.py

It prints a series of messages in the console:

Chet was added
They are player number 1
Pat was added
They are player number 2
Sue was added
They are player number 3
Chet is the current player
They have rolled a 4
Chet's new location is 4
The category is Pop
Pop Question 0
Answer was corrent!!!!
Chet now has 1 Gold Coins.
Pat is the current player
...
Answer was corrent!!!!
Chet now has 6 Gold Coins.

These messages are the observable behavior of the application.

It appears to be a board game. Players are added to the game. They roll dice, answer questions, and earn coins. The game ends when a player gets 6 coins.

The code is messy and error-prone.

Let's refactor it. This will:

Improve the clarity of the code
Make it easier to add the new feature

But before making any change, we want to secure it.

Putting the code under test

Refactoring restructures software without modifying its observable behavior. At least in theory: in practice, you can introduce regressions along the way.

The best way to prevent regressions is to put your code under tests.

A fast way to put legacy code under test is to capture its output and check that it stays the same after each refactoring. In this case, the output isn't returned by a function, but written to standard output.

Let's catch this output in a file that will serve as a reference:

python trivia.py > trivia.approved.txt
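
The shell redirection does the capture for us. For comparison, here is a minimal Python sketch of the same idea done in-process, using contextlib.redirect_stdout from the standard library. Note that play_demo_round is a hypothetical stand-in for the game, not a function from trivia.py.

```python
import io
from contextlib import redirect_stdout


def play_demo_round():
    """Hypothetical stand-in for the game: it only prints, it returns nothing."""
    print("Chet was added")
    print("They are player number 1")


def capture_output(func):
    """Run func and return everything it printed to standard output."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func()
    return buffer.getvalue()


# The captured text plays the role of trivia.approved.txt.
received = capture_output(play_demo_round)
```

Either way, the principle is the same: what the program prints becomes data that we can store and compare.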

Now, let's make another run and compare the two outputs:

python trivia.py > trivia.received.txt 
diff trivia.received.txt trivia.approved.txt | less

# 8,9c8,9
# < They have rolled a 5
# < Chet's new location is 5
# ---
# > They have rolled a 1
# > Chet's new location is 1
# 12,32d11
# ...

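In the diff, lines starting with < come from trivia.received.txt and lines starting with > come from trivia.approved.txt. So in the received game, Chet rolled a 5 on his first turn, while in the approved game he rolled a 1.

Randomness is at play here.

For the sake of our test, we can make the game deterministic by seeding the random number generator. In trivia.py: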

from random import seed  # <-- add this import

if __name__ == '__main__':
    not_a_winner = False
    seed(0)  # <-- add this instruction

    game = Game()
    ...
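
To see why seeding helps, here is a small sketch of the effect, independent of trivia.py. The roll_dice helper and the six-sided die are assumptions made for the illustration; the point is only that the same seed replays the same sequence of pseudo-random numbers.

```python
import random


def roll_dice(seed_value, count):
    """Seed the global generator, then draw `count` rolls of a six-sided die."""
    random.seed(seed_value)
    return [random.randint(1, 6) for _ in range(count)]


# Two "games" started from the same seed produce identical rolls.
first_run = roll_dice(0, 5)
second_run = roll_dice(0, 5)
same_rolls = first_run == second_run  # True: the sequence is replayed
```

Without the call to random.seed, the generator is seeded from a source that changes between runs, which is why our two captures differed.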

Let's try again:

python trivia.py > trivia.approved.txt
python trivia.py > trivia.received.txt 
diff trivia.received.txt trivia.approved.txt

The outputs are now the same!

To put our safety net under stress, we can introduce errors in the code and check that they get detected. For instance:

Introduce a typo in the logs
Comment out some lines

For example, we can comment out the first line of game.roll:

    def roll(self, roll):
        # print("%s is the current player" % self.players[self.current_player])
        print("They have rolled a %s" % roll)
        ...

The output of the edited code differs from our reference:

python trivia.py > trivia.received.txt 
diff trivia.received.txt trivia.approved.txt
# 47a48,49
# > Answer was correct!!!!
# > Sue now has 1 Gold Coins.
# 67a70,71
# > Answer was correct!!!!
# > Sue now has 2 Gold Coins.

The error was detected.
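
The mechanism can be sketched in Python with difflib from the standard library. The three lines below are a hand-written sample in the style of the game's output, not an actual capture, and the received list simulates a run where the first print has been commented out.

```python
import difflib

# Hand-written sample standing in for the approved output.
approved = [
    "Chet is the current player",
    "They have rolled a 4",
    "Chet's new location is 4",
]

# Simulated output once the first print statement is commented out.
received = approved[1:]

# ndiff marks lines found only in its first argument with "- "
# and lines found only in its second argument with "+ ".
changes = [
    line
    for line in difflib.ndiff(received, approved)
    if line.startswith(("- ", "+ "))
]
# changes == ["+ Chet is the current player"]
```

Any line that disappears from or appears in the output ends up in the list of changes, so an empty list means the behavior we observe is unchanged.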

Let's encapsulate this logic in a test.sh script:

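#!/bin/sh

# Run the game and compare its output with the approved reference.
python trivia.py > trivia.received.txt
if diff -q trivia.received.txt trivia.approved.txt; then
    echo "✅ : no changes detected"
else
    echo "❌ : something has been broken"
    exit 1  # non-zero status so that callers can detect the failure
fi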

Now we can check that we haven't broken anything with a single command.
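
For readers who prefer to stay in Python, the same check could look roughly like this. It is a sketch under assumptions: run_and_capture and verify are hypothetical helper names, and the one-line programs stand in for python trivia.py so that the example runs on its own.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_capture(command):
    """Run a command and return what it wrote to standard output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def verify(command, approved_file):
    """Return True when the command's output matches the approved file."""
    return run_and_capture(command) == Path(approved_file).read_text()


# Hypothetical stand-ins for `python trivia.py`, before and after a bad edit.
demo_command = [sys.executable, "-c", "print('Chet was added')"]
broken_command = [sys.executable, "-c", "print('Chet was removed')"]

with tempfile.TemporaryDirectory() as folder:
    approved_file = Path(folder) / "demo.approved.txt"
    approved_file.write_text(run_and_capture(demo_command))  # approve a first run
    unchanged = verify(demo_command, approved_file)          # same output
    broken = verify(broken_command, approved_file)           # output differs
```

Here unchanged is True and broken is False, which mirrors the two branches of the shell script.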

Conclusion

We explored a fast way to put our code under tests, but this approach has two main problems:

We don't know which part of the code is covered (or not)
To build our test, we changed the behavior of the production code. Remember seed(0)? Now every game plays out the same way. We cannot ship this code. This approach can only be a temporary solution.

In the next article, we will see how we can improve these two aspects!