
Bug hunting with mutation testing

by admin
Pyramid of tests

As a software engineer, you can contribute to this process, achieve high quality, and make a difference by writing tests, especially unit tests.

However, if you want to improve the quality of your test suite, you first need to evaluate it. One easy way to do this is to measure code coverage. There are five commonly cited types of code coverage:

Statement coverage

Decision coverage

Branch coverage

Toggle coverage

FSM (Finite State Machine) coverage

When you have a metric, you can set a goal. For example, at Sipios we believe that at least 80% of the branches must be covered, otherwise you cannot merge your code. But be careful when relying on this threshold: low code coverage means insufficient testing, but high coverage does not guarantee high-quality tests.

The simplest example to illustrate this is that you can execute your entire code base during testing and still make no assertions. When it comes to branch coverage, the reason is that less complex branches tend to be easier to cover.
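To make the point about assertions concrete, here is a minimal sketch in plain Java (no test framework). The class and method names are hypothetical and chosen only for illustration. Both "tests" execute every branch of the method, so a coverage tool would report them identically, but only the second one can ever fail.

```java
// Hypothetical example: the class and method names are illustrative only.
final class CoverageWithoutAssertions {

    // Production code with two branches.
    static int absoluteValue(int x) {
        if (x < 0) {
            return -x;
        }
        return x;
    }

    // Executes both branches, so a coverage tool would report the method as
    // fully covered, yet nothing is verified: the results are discarded.
    static void testWithoutAssertion() {
        absoluteValue(-3);
        absoluteValue(3);
    }

    // Covers exactly the same code, but fails if the behaviour is wrong.
    static void testWithAssertion() {
        if (absoluteValue(-3) != 3 || absoluteValue(3) != 3) {
            throw new AssertionError("absoluteValue returned an unexpected result");
        }
    }

    public static void main(String[] args) {
        testWithoutAssertion();
        testWithAssertion();
        System.out.println("both tests are green");
    }
}
```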

One solution to improve the quality of your tests is called mutation testing.

What is mutation testing?

Mutation testing was originally proposed by Richard Lipton in 1971. According to Wikipedia, it is based on two hypotheses:

The first is the competent programmer hypothesis. It states that most software faults introduced by experienced programmers are due to small syntactic errors.

The second is the coupling effect. It states that simple faults can cascade or couple to form other, emergent faults.

Mutation testing is a two-step process: first you generate mutants, and then you try to kill them with your tests.

Mutant Generation

The first step is to generate other versions of your code. If you are familiar with the genetic algorithms (GA) used in optimization and search problems, you can view it as the initialization step that generates the population.

This step requires only your code and a set of mutation operators. Each operator is applied, one at a time, to every statement of the source code it can apply to. The result of applying one mutation operator to a program is called a mutant. The following mutation operators are commonly used:

  • Deletion, duplication, or insertion of statements.

  • Replacing boolean subexpressions with true and false.

  • Replacing some arithmetic operations with others, e.g. + with *, - with /.

  • Replacing some boolean relations with others, e.g. > with >=, == with <=.

  • Replacing variables with others from the same scope (variable types must be compatible).

  • Removing the body of a method (implemented in Pitest, which we will discuss later).

For example, if you apply only the operator that replaces * with / to the following method:

public int multiply(int a, int b) {
    return a * b;
}

You get the following wonderful division method:

public int multiply(int a, int b) {
    return a / b;
}
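The other operators work the same way. The sketch below applies two of them (replacing a boolean relation and replacing a boolean subexpression with true) to a hypothetical age check. The predicate and its mutants are invented for illustration and are written by hand here, whereas a mutation testing tool would generate them automatically.

```java
import java.util.function.IntPredicate;

// Hypothetical example: the age check and its mutants are illustrative only.
final class RelationalMutants {

    // Original condition.
    static final IntPredicate ORIGINAL = age -> age >= 18;

    // Mutant 1: boolean relation replaced (>= becomes >).
    static final IntPredicate BOUNDARY_MUTANT = age -> age > 18;

    // Mutant 2: boolean subexpression replaced with true.
    static final IntPredicate ALWAYS_TRUE_MUTANT = age -> true;

    public static void main(String[] args) {
        // Only the boundary value 18 distinguishes the original from mutant 1.
        System.out.println(ORIGINAL.test(18));           // true
        System.out.println(BOUNDARY_MUTANT.test(18));    // false

        // Any value below 18 distinguishes the original from mutant 2.
        System.out.println(ORIGINAL.test(10));           // false
        System.out.println(ALWAYS_TRUE_MUTANT.test(10)); // true
    }
}
```

A test suite that never checks the value 18 would let the first mutant through, which is exactly the kind of gap mutation testing is meant to reveal.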

Mutants generated by applying two or more operators are called higher-order mutants (HOM). We will not discuss HOM testing in this article, but there is interesting research on how they can be generated effectively.

Kill them all

Killing a mutant is a simple process. You only need to run your tests against the mutant. If at least one of the tests is red, you have killed the mutant. Otherwise, if all your tests are green, the mutant survives.

After you have run the tests against all the mutants, you can calculate the mutation score: the percentage of mutants killed by your tests. The higher this score, the more effective your test suite is.

To illustrate this, let's imagine that we have tested our multiplication method with the following test:

@Test
public void multiplyInts() {
    assertEquals(7, multiplicationService.multiply(7, 1));
}

According to code coverage, the multiply method is 100% covered, but the division mutant survives because 7 / 1 also equals 7. In this case we get a 0% mutation score. Fortunately, we can add a test that multiplies 2 and 3, which gives us a 100% mutation score.
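The reasoning above can be replayed with a small hand-rolled harness. This is only a sketch of the idea and not how Pitest works internally: the single division mutant and the two tests are modelled as lambdas, and all names are hypothetical.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

// Hand-rolled sketch with hypothetical names, not Pitest's implementation.
final class MutationScoreSketch {

    // The mutant obtained by replacing * with / in multiply.
    static final IntBinaryOperator DIVIDE_MUTANT = (a, b) -> a / b;

    // A test returns true when it is green for the given implementation.
    static final Predicate<IntBinaryOperator> SEVEN_TIMES_ONE =
            impl -> impl.applyAsInt(7, 1) == 7;
    static final Predicate<IntBinaryOperator> TWO_TIMES_THREE =
            impl -> impl.applyAsInt(2, 3) == 6;

    // A mutant is killed when at least one test is red.
    static boolean isKilled(IntBinaryOperator mutant,
                            List<Predicate<IntBinaryOperator>> tests) {
        return tests.stream().anyMatch(test -> !test.test(mutant));
    }

    // Mutation score: percentage of mutants killed by the test suite.
    static double mutationScore(List<IntBinaryOperator> mutants,
                                List<Predicate<IntBinaryOperator>> tests) {
        long killed = mutants.stream().filter(m -> isKilled(m, tests)).count();
        return 100.0 * killed / mutants.size();
    }

    public static void main(String[] args) {
        List<IntBinaryOperator> mutants = List.of(DIVIDE_MUTANT);

        // 7 / 1 == 7, so the only test stays green and the mutant survives.
        System.out.println(mutationScore(mutants, List.of(SEVEN_TIMES_ONE))); // 0.0

        // 2 / 3 == 0 in integer arithmetic, so the new test turns red.
        System.out.println(
                mutationScore(mutants, List.of(SEVEN_TIMES_ONE, TWO_TIMES_THREE))); // 100.0
    }
}
```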

Now that we know the basics, let's see how it works in practice.

How do I perform mutation testing?

In this section you will learn what Pitest is and how to use it in a Java project built with Maven. We will also look at alternatives.

What is Pitest?

According to the site pitest.org:

PIT is a state of the art mutation testing system, providing gold standard test coverage for Java and the jvm. It's fast, scalable and integrates with modern test and build tooling.

How to use it?

Installation with Maven is simple and is described in the Maven quickstart. Other quickstarts for Gradle, Ant, or the command line are also available on the Pitest site.

In fact, you only need to add the plugin to the build/plugins section of your pom.xml:

<plugin>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-maven</artifactId>
    <version>LATEST</version>
</plugin>

There are a large number of configuration options described on the Quick Start page that you can use to customize your analysis. For example, you can specify target classes and target tests this way:

<configuration>
    <targetClasses>
        <param>fr.service.MultiplicationService</param>
    </targetClasses>
    <targetTests>
        <param>fr.service.MultiplicationServiceUnitTest</param>
    </targetTests>
</configuration>

Then you can generate a full HTML report by running the mutationCoverage goal with the command:

mvn org.pitest:pitest-maven:mutationCoverage

Be careful: Pitest requires a green test suite before it runs the mutation analysis, so you may need to run your tests first to make sure they all pass.

The reports generated by PIT have an easy-to-read format that combines line coverage and mutation coverage information. They can be found in the target/pit-reports/YYYYMMDDHHMI folder.

In our example, we get 100% line coverage but only a 50% mutation score.


As anticipated in the previous section, we can greatly improve the quality of the test suite by adding a test that kills the surviving mutant.

After adding this new test, we can set a minimum mutation score threshold by adding the option -DmutationThreshold as follows:

mvn org.pitest:pitest-maven:mutationCoverage -DmutationThreshold=85

Other tools

If you don't use Java, I recommend you check out the list of the 21 best open source mutation testing projects.

My journey with mutation testing

Let's talk about my experience with mutation testing. There is no proof that writing tests just to kill mutants improves the quality of your tests. That's why my goal was instead to find a mutant that could be a real bug.

Avoid common pitfalls

Creating too many mutants

Generating mutants for each statement of your code using multiple mutation operators will generate a whole army of mutants. You will then need to run tests on each mutant. This is a computationally intensive process, and if you're not careful, you'll have to wait a long time to finish your analysis.

I'm working on a project with over 15 microservices, and I added the Pitest configuration to the parent pom.xml. I started without targeting any class or package, because unit tests can be located differently in different subprojects. This generated over 5,000 mutants per microservice.

Targeting useless mutants

Some mutants are uninteresting, especially those generated from DTOs (Data Transfer Objects). Pitest can generate mutations in methods provided by an annotation, e.g. @Data from the Lombok library. Such mutants are best avoided because in most cases you will not override the methods provided by the annotation.

Including integration tests

Integration tests take longer than unit tests. By default, Pitest uses a timeout of 4 s to avoid getting stuck in an infinite loop. If your integration tests are slow, each of them may run into that 4 s timeout, multiplied by the number of mutants generated. In other words, the analysis may take several days. Please don't try this.
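A rough back-of-the-envelope estimate shows where "several days" comes from. The sketch below takes the lower bounds quoted earlier (5,000 mutants per microservice, 15 microservices) and assumes, pessimistically, that every mutant runs into the 4 s timeout exactly once. That worst-case assumption is mine, and real runs will differ.

```java
import java.time.Duration;

// Back-of-the-envelope sketch; the "every mutant times out once" rule is an assumption.
final class WorstCaseTimeoutEstimate {

    // Worst case: each mutant waits for the full timeout once.
    static Duration worstCase(long mutants, Duration timeoutPerMutant) {
        return timeoutPerMutant.multipliedBy(mutants);
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(4);           // timeout quoted in the article
        Duration perService = worstCase(5_000, timeout);    // mutants per microservice
        Duration allServices = perService.multipliedBy(15); // microservices in the project

        System.out.println(perService.toMinutes() + " minutes per service"); // 333 minutes per service
        System.out.println(allServices.toHours() + " hours for 15 services"); // 83 hours for 15 services
    }
}
```

Under these assumptions a single service already needs about five and a half hours, and the whole project roughly three and a half days.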

Even if the analysis completes, you still need to be able to read the report.

You can try to generate a report on the whole code base using all the tests. This would take too long to use in an automated process (it took me 23 seconds on a service with 15 unit tests), and you would have too many surviving mutants. Imagine a 90% mutation score on my 5,000 mutants: that would still leave me with 500 mutants to analyze. I think it's easier to start small and then work out a way to analyze the report. Running mutation tests and analyzing the report are both time consuming.

Use mutation testing wisely

First, you need to find the piece of code you are interested in. In my case it was an API service owned by my team. That is, we are responsible for that part of the code.

I chose a service because that's where the logic of the API is supposed to be implemented. The service I chose was a good candidate because it has over 1,000 lines of code and, according to git blame, at least 18 contributors. The most interesting part of this service was that some pieces of code were written over a year ago, while some lines were written just 2 weeks ago.

Last but not least, this service has 96% line coverage and 93% branch coverage for all tests.

This is a typical kind of service where someone is about to make changes and my team will have to review them. So let’s try to make some changes first and see if that can break the code.

Let’s break everything.

Analysis of estimates

The first thing I saw after generating the report was that unit tests gave us only 50% line coverage and a 34% mutation score.

I was disappointed with the 50% coverage because it made the code look poorly tested, but this is easily explained. Most often we add methods to services in order to create new routes for a controller. In that case, many developers tend to write integration tests first. Because integration tests use the service and its methods, you get a high code coverage rate. Once the 80% code coverage standard is reached, you no longer think about writing unit tests because the metrics say you did a good job.

I don't know how to interpret a 34% mutation score. It doesn't seem that bad compared to the code coverage. Also, I checked that integration tests can kill some of the mutants. So in fact, fewer than 66% of the mutants survive the full set of tests.

Survivor Analysis

We killed 39 mutants out of 114, which leaves 75 survivors to analyze. This means that if I committed one of these surviving mutants, my unit tests on this service would not notice it. As I explained, other tests can still kill these mutants, but checking each one against all the tests would take a long time. We need a better method.

The first thing you can do is focus on a few mutation operators. In my service, I focused on the mutation operators whose mutants survived the most:

  • 26 statement deletions

  • 22 null return values

  • 19 conditional negations
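The figures in this section can be cross-checked with a little arithmetic. The sketch below only recomputes numbers already given in the text (114 mutants, 39 killed, and the three operator counts). The class and method names are hypothetical.

```java
// Recomputes the figures quoted in the text; names are hypothetical.
final class SurvivorArithmetic {

    // Share of `part` in `total`, as a percentage.
    static double percentage(int part, int total) {
        return 100.0 * part / total;
    }

    public static void main(String[] args) {
        int total = 114;
        int killed = 39;
        int survivors = total - killed;
        int topThree = 26 + 22 + 19; // the three operators listed above

        System.out.printf("survivors: %d%n", survivors);                                     // survivors: 75
        System.out.printf("mutation score: %.0f%%%n", percentage(killed, total));            // mutation score: 34%
        System.out.printf("top three operators: %d%n", topThree);                            // top three operators: 67
        System.out.printf("share of survivors: %.0f%%%n", percentage(topThree, survivors)); // share of survivors: 89%
    }
}
```

Focusing on these three operators therefore covers the large majority of the survivors.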

I think statement deletion is the easiest operator to analyze. It is enough to delete the line and see whether the code still makes sense. Most often these mutants occur on the setter methods of an object. Sometimes they occur on API calls, but those are always caught by integration tests.

Null return values are also easy to analyze and can do a lot of damage. Interestingly, a null return value can also be obtained by removing a setter call. Can you imagine creating a higher-order mutant in real life just by removing a setter call? I think this might be an interesting topic to test when you're doing defensive programming.
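Here is a minimal sketch of how a deleted setter call turns into a null return value, and how a defensive caller can cope with it. The Customer class and the helper methods are hypothetical and are not taken from the service discussed in this article.

```java
// Hypothetical classes for illustration only.
final class SetterRemovalSketch {

    static final class Customer {
        private String email;

        void setEmail(String email) {
            this.email = email;
        }

        String getEmail() {
            return email;
        }
    }

    // Original code: the setter is called.
    static Customer buildCustomer(String email) {
        Customer customer = new Customer();
        customer.setEmail(email);
        return customer;
    }

    // Mutant: the setter call has been deleted, so getEmail() now returns null.
    // The email parameter is deliberately left unused.
    static Customer buildCustomerMutant(String email) {
        Customer customer = new Customer();
        return customer;
    }

    // Defensive caller: handles a missing email instead of throwing.
    static String emailDomain(Customer customer) {
        String email = customer.getEmail();
        if (email == null || !email.contains("@")) {
            return "unknown";
        }
        return email.substring(email.indexOf('@') + 1);
    }

    public static void main(String[] args) {
        System.out.println(emailDomain(buildCustomer("jane@example.com")));       // example.com
        System.out.println(emailDomain(buildCustomerMutant("jane@example.com"))); // unknown
    }
}
```

A unit test that asserts on the domain for a valid customer would kill this mutant, while a test that only checks that no exception is thrown would let it survive.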

What have I learned?

I can do it!

After analyzing only a few mutation operators, I managed to create my first bug using mutation testing. It took me 30 minutes and only the conditional negation operator.

This result is alarming because a refactoring could introduce an error like this without any unit test noticing. In practice, we have a lengthy process that allows us to catch almost all of them before the bug hits production. First, you should test your feature locally before creating a merge request. This is where I would catch the bug. Then we do a code review, where my colleagues would spot the bug. Then the feature is tested by the PO (Product Owner) in the development environment and by QA (Quality Assurance) in preproduction.

However, in a lean environment, we all know that the earlier a mistake is caught, the better. We don't want to waste time. That's why I think mutation testing is a great tool for developers to understand the usefulness of a test and to make sure that a feature will continue to deliver high quality even after someone changes something in it.

Doing it right the first time

Mutation testing has helped me understand that our process leads to high quality. We have a lengthy process that prevents developers from shipping errors to production. I highly recommend that you create your own process and be uncompromising about it. I think mutation testing should be used to help you improve your process, or applied if you work in safety- or security-related areas.

I also realized that we can improve testing by helping developers be more precise when writing a test. The distinction between integration and unit tests should be deliberate, and both types of tests should be used. I assume TDD (Test Driven Development) would help achieve this by focusing on unit tests first, rather than on integration tests. This would increase the mutation score.

Mutation testing helped me understand that a test is not necessarily relevant. That's why I will use it the next time I refactor. Unfortunately, it lacks tools to help automate the analysis phase. Stay tuned for developments in this technique and for what higher-order mutants can bring to the table.
