Introduction to Mutation Testing

Mutation testing (in French: tests de mutation) is a term I recently discovered. It describes a process that detects gaps in unit tests and goes beyond code coverage. In this article, I present this approach, which consists of testing the tests by deliberately mistreating the code.

Unit tests

Taking the usefulness of unit testing as a given, this subject becomes interesting as soon as you are developing a tested project, however high its code coverage.
Unit tests highlight possible regressions caused by a modification of the code. In theory, if the tests pass, everything is working correctly in the application. Code coverage is the first, and often the only, measure of trust we use. The closer this metric gets to 100%, the more it reassures us that no regression will slip through the cracks. Unfortunately, this assurance remains theoretical.
Although tests are essential to validating a quality application, it is difficult to demonstrate, or even to assess, their relevance.

Code coverage and case coverage

100% code coverage does not mean that 100% of the code is validated, only that 100% of the code is executed when the tests run, nothing more.
Code coverage (line, statement, branch, etc.) measures only which code was executed by the tests, with no guarantee that defects are detected. All it can do is identify code that is still untested.
A test without an assertion is the obvious example: although executed, the code is not actually tested. Fortunately, this scenario remains rare. The most common case is code that is only partially tested by a suite that nevertheless runs all of its branches.
In such cases, code coverage is not an indicator of protection. Here is a simple example:

function isAdult(user) {
    return user.age >= 18;
}

Suppose we want to check the age of a user. We write the code above to make sure the user is an adult.
To test this code, we can try 12 and 38 as inputs. These two cases are enough to cover the code 100%.
The result would be the same if we forgot to treat 18 as the age of majority because of this typo in our code:

function isAdult(user) {
    return user.age > 18;
}

…or if we only tested the value 12, or even worse, if we omitted the assertion in our test.
Mutation testing can actually detect whether each statement is meaningfully tested. It serves as the standard against which other types of coverage are measured.

Other issues in the code

Let's also assume that we don't want unnecessary code in our application. Each untested part is a potential source of bugs, or of additional complexity if it is not essential.
Here is why mutation testing is a great way to check the relevance of such code:
if (someVariable !== null && someVariable.hasValue()) {}
Do we need to check for “null”? Was the condition added out of habit? It could mean that we are unsure about the variable “someVariable”, which would warrant further analysis. Without being aware of that doubt, we cannot dig any deeper. Mutation testing helps us here too.

Mutation Tests: What are they?

To detect flaws in our unit tests, there is a solution: mutation testing.

This technique gives us more confidence in our tests. Mutation testing is a fairly simple concept. Its principle is to mistreat the source code by altering it, then verify that the associated tests fail accordingly. Faults (called mutations) are automatically seeded into our code, and then the tests are run. If the tests fail, the mutation is killed. If the tests pass, the mutation survived. In that case, the tests do not match the complexity of the code and leave one or more of its aspects untested. The quality of our tests can then be measured by the percentage of mutations killed.
In other words, we run the unit tests on automatically modified versions of the code. When application code changes, it should produce different results and cause unit tests to fail. If no unit test fails in this situation, it may indicate a weakness in the test suite.
Here are the steps to achieve this:

  • Run the usual test suite to verify that all tests pass.
  • Modify some part of the tested code, then run the test suite again.
  • Check that at least one test fails, as expected, after modifying (mutating) the tested code.

Repeat the last two steps for as long as possible mutations remain.
Let's take a concrete example: think of a mutant as an additional copy of a class with only one change from the original code. This can be a change of logical operator in an if clause, as shown below:
if( a || b ) {…} => if( a && b ) {…}
The detection and rejection of such a modification by the existing tests is referred to as killing a mutant. With a perfect test suite in place, no mutant would survive. But creating and testing all possible mutants is resource-intensive, which is why this approach cannot be carried out manually in real scenarios.
Fortunately, there are tools that create mutants on the fly and automatically run all the tests for each one. Mutants are created from a set of mutation operators designed to reveal typical programming errors. The mutation operator used to modify the code above is called conditional operator replacement (COR).

Practical details

This technique therefore consists of two parts: the generation of mutants, then their elimination.
Mutant generation is the step of generating mutant classes from source classes. To start, we need the business code for which we want to assess the relevance of our tests. We then take a pool of possible mutations, a mutation being a modification of the source code, such as replacing one operator with another.
Here are some examples:

  • + becomes -
  • * becomes /
  • >= becomes ==
  • true becomes false
  • deleting an instruction
  • etc.

We can replace an arithmetic expression e with |e| (ABS), change one relational operator to another (ROR), change one arithmetic operator to another (AOR), change one boolean operator to another (COR), modify a boolean or arithmetic expression by adding − or ¬ (UOI), replace a variable name with another one, replace a variable name with a constant of the same type, replace a constant with another constant of the same type…
The actual generation consists of going through all the instructions of the code and determining, for each one, whether mutations are applicable. If so, each mutation gives rise to a new mutant.
For the following statement:

if (a > 8) { x = y+1 }

We can consider the following mutants:

if (a < 8) { x = y+1 }
if (a >= 8) { x = y+1 }
if (a > 8) { x = y-1 }
if (a > 8) { x = y }
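A naive way to picture the generation step is shown below. It is only a sketch: the pool and the generateMutants function are hypothetical, and mutations are applied as text replacements on the first occurrence, whereas real tools typically work on a parsed syntax tree.

```javascript
// Hypothetical pool of mutations, each a plain text replacement.
const pool = [
    { from: '>', to: '<' },
    { from: '>', to: '>=' },
    { from: '+', to: '-' },
    { from: '+1', to: '' },
];

// Produces one mutant per applicable mutation (first occurrence only).
function generateMutants(statement, mutations) {
    return mutations
        .filter(({ from }) => statement.includes(from))
        .map(({ from, to }) => statement.replace(from, to));
}

const mutants = generateMutants('if (a > 8) { x = y+1 }', pool);
console.log(mutants);
// [
//   'if (a < 8) { x = y+1 }',
//   'if (a >= 8) { x = y+1 }',
//   'if (a > 8) { x = y-1 }',
//   'if (a > 8) { x = y }'
// ]
```

Each entry of the pool that applies to the statement yields exactly one mutant, which is why the number of mutants grows with both the size of the code and the size of the pool.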

This process can quickly become resource-intensive. When the code to be mutated contains a large number of instructions and the "pool" of possible mutations is large, the number of generated mutants grows very quickly.
Once the mutant generation process is complete, the mutants are stored until the next step: elimination!
In the second part of the process, we have a lot of generated mutants that we don't want to slip through the tests; the goal is to eliminate as many as possible. To do this, our weapon is the improvement of the unit tests.

Review of the results

For a given mutant, there are two possible outcomes: either the tests all stay green, or at least one of them turns red.

Usually we want the tests to be green. But in this context, we are looking for red. Indeed, as we saw earlier, each mutant is supposed to make at least one of the unit tests fail. If at least one test fails, this shows that the tests are able to detect the code modification and therefore prevent possible bugs. On the other hand, if all the tests remain green, the mutant survives: it remained invisible to our tests.
A surviving mutant is therefore the sign of a missing test!
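The percentage of killed mutants can be computed directly from the outcome of each mutant. The sketch below uses made-up results, and the names results, mutationScore and survivors are hypothetical.

```javascript
// Hypothetical results of a mutation run: one entry per mutant.
const results = [
    { mutant: 'if (a < 8) { x = y+1 }', killed: true },
    { mutant: 'if (a >= 8) { x = y+1 }', killed: false },
    { mutant: 'if (a > 8) { x = y-1 }', killed: true },
    { mutant: 'if (a > 8) { x = y }', killed: true },
];

// Percentage of mutants killed by the test suite.
function mutationScore(results) {
    if (results.length === 0) return 0;
    const killed = results.filter((r) => r.killed).length;
    return (killed / results.length) * 100;
}

// Survivors are the mutants worth reviewing: each one hints at a missing test.
const survivors = results.filter((r) => !r.killed).map((r) => r.mutant);

console.log(mutationScore(results)); // 75
console.log(survivors);              // [ 'if (a >= 8) { x = y+1 }' ]
```

In this made-up run, the surviving mutant suggests that no test exercises the boundary value a = 8.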

Limitations

The complete analysis of our code can be tedious. As we have seen, the number of mutants can increase very quickly.
In a first phase, we might for example generate 6000 mutants. During the second phase, the vast majority of them (around 97% here) will be eliminated, the percentage varying with the prior quality of your tests. That still leaves 150 to 200 mutants.
Analyzing each of them manually is time-consuming. Moreover, our unit tests are not solely responsible for their survival. An "equivalent mutant" may appear: a mutant that modifies the syntax of the source code without changing its semantics. No unit test can detect this type of mutant.

while(...) {
    index++;
    if (index == 10)
        break;
}

For example, mutating "==" into ">=" produces an equivalent mutant: as long as index starts below 10, the loop has exactly the same exit condition.
Pre-analysis of code coverage, on-the-fly creation of mutants, and running all the necessary tests consume a lot of time. For example, on a code base with 350 tests, the execution time is multiplied by four compared to a usual run.
Given these numbers, and for practical reasons, mutation tests cannot be run as frequently as unit tests. It is therefore important to find a workflow that offers the best compromise in terms of efficiency. For large software systems, this could mean limiting mutation testing to nightly runs.
Before adopting mutation testing, you must already have a mature quality process. Tests must be at the center of development, to avoid results that are too voluminous to analyze. However, if code coverage has reached its limits, this may be a good approach to experiment with. Unfortunately, the current tools do not yet seem mature enough for industrial use.

Mutation testing and JavaScript

Mutation tests are much better known and more widely used in the Java and PHP worlds. However, since 2016 it has been possible to perform mutation testing in JavaScript thanks to Stryker Mutator. There is also Grunt-mutation-testing, most of whose source code is being migrated to Stryker.
Here is the link to the project's GitHub page: http://stryker-mutator.github.io

Conclusion

This article was a quick introduction to mutation testing. We looked at mutants, saw the direct relationship between the rate of killed mutants and the quality of an existing test suite, and observed how it relates to code coverage.
Since code coverage is not a very reliable metric, mutation tests are a quick and easy way to measure the reliability of unit tests. We will favor mutation tests where the stakes are real: the business code.
All in all, mutation testing seems like a nice addition to a set of quality assurance tools based on automated tests. This practice is quite recent in JavaScript and still little known. It will be interesting to read the opinions and feedback of advanced users.
Aurélie Ambal, (@Souvir) JS Craftswoman @JS-Republic