Revolutionising Java Development: Exploring AI’s Role in Enhancing Regression and End-to-End Testing alongside BDD

An Overview and Analysis of Emerging Tools and Techniques

Estafet consultants occasionally produce short, practical tech notes designed to help the broader software development community and colleagues. 

If you would like to have a more detailed discussion about any of these areas and/or how Estafet can help your organisation and teams with best practices in improving your SDLC, you are very welcome to contact us.


In the landscape of modern software development, especially in environments where Java is a primary language, the acceleration of development cycles without sacrificing code quality is a common goal. This is particularly relevant in contexts ranging from startups pioneering innovative applications to companies developing complex financial software. Central to achieving this balance is the adoption of efficient testing strategies.

What happens if you miss a few scenarios while writing unit tests for a new feature? What if that feature involves complex behaviour? What if this feature introduces a failure months after it was developed, and someone then has to fix it and write the proper unit tests?

What if later in the SDLC an end-to-end tester comes back to you with issues regarding a feature you developed months ago? Will you still remember all the details? How much time will you need to switch focus and context from your current work?

In this document, we will discuss ways to increase your productivity, reduce development time, and write high-quality code. We approach this optimisation by looking at some of the best current tools that facilitate test automation: unit regression test generation tools such as Diffblue, EvoSuite, and Randoop; BDD testing using Cucumber; and productivity aids such as GitHub Copilot and ChatGPT.

Unit regression testing

Unit regression testing is an innovative approach to unit testing. Most developers do not enjoy writing traditional unit tests and often rush them, sometimes overlooking use cases relevant to their new implementation. Large codebases produced in haste to maintain a competitive edge can even lack tests for large chunks of functionality. Another common issue is the late stage at which new features are E2E tested for regressions: what happens when the testing team discovers an issue after the feature has been deployed, perhaps a month after initial development? Can’t we do something about this earlier in the SDLC?


A key player in this domain is Diffblue, an AI-powered tool that automates the creation of unit tests for Java code. This tool significantly reduces the time developers spend writing and maintaining tests, enabling them to focus more on feature development and less on test creation. For instance, a team working on a new transaction processing system can leverage Diffblue to quickly generate tests for new payment modules, ensuring stability and freeing up time to address other critical development tasks.

With Diffblue we can skip writing unit tests by hand in some cases, and in others we may not even need end-to-end testing, since the generated tests mock the surrounding system (databases, external endpoint calls, etc.). By nature, these tests take a snapshot of the working system; once you implement a new feature, you can re-run them to verify that there is no regression. This both reduces risk and optimises delivery time.

An example scenario

We will use the Pet Clinic project provided by Diffblue itself. The project is intentionally missing unit tests for some of its functionality, which makes it a good way to demonstrate the power of Diffblue.

Clone the Pet Clinic project
  1. Open a command line and navigate to where you want to clone this project.
  2. Run the following command:
  3. Open the PetClinic project in IntelliJ.
Install Diffblue Cover Plugin for IntelliJ
  1. In the IntelliJ IDE, open the Plugins menu – either File > Settings > Plugins (Windows/Linux) or IntelliJ IDEA > Preferences > Plugins (macOS).
  2. Select the Marketplace tab, search for Diffblue, and click Install. Your plugin will now be downloaded and installed.
  3. When prompted, click Restart IDE to complete the installation.
Apply a licence

You can apply for a trial licence from Diffblue.

Open the project

Open the project in IntelliJ and let it load.

Confirm that the plugin is up and running by right-clicking the module name and checking that the two new options from the plugin, Write Tests and Write Skeleton Tests, appear in the menu.

Explore the project

Have a look at the sample project and its functionality, and take note of the already implemented tests and what might be missing.

Generate tests

Right-click on the module as in the previous step and select Write Tests.

When generation is done, you should see a results window.

What happened?

With the push of a button, we managed to get:

  • A project health check (security and compatibility checks).
  • A detailed analysis of the code, with suggestions on refactoring untestable code.
  • Most importantly, a full set of human-readable unit tests that we can use after we develop new functionality.

From the generated report we can easily explore what was generated, which methods were covered, what was skipped and, importantly, why.

Each generated test follows a properly structured, human-readable format, and the generation is fully customisable.

These new tests can now be re-run after adding a new feature to ensure stability.

The new tests mock parts of your system so the code path can be fully tested from start to finish.
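The snapshot idea can be illustrated without any tooling. The following is a minimal hand-written sketch, not Diffblue Cover output: `FeeCalculator` and `FeeCalculatorRegressionCheck` are hypothetical names, and the expected values are simply what the current implementation returns, pinned so that a later change in behaviour is flagged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical production class, used only for illustration.
class FeeCalculator {
    // Current behaviour: 1% fee with a minimum of 50 (amounts in pence).
    static long feeFor(long amountInPence) {
        if (amountInPence < 0) {
            throw new IllegalArgumentException("amount must not be negative");
        }
        return Math.max(50, amountInPence / 100);
    }
}

// Hand-written stand-in for a generated regression suite: it pins the
// observed behaviour of the current code rather than a written specification.
class FeeCalculatorRegressionCheck {
    // Snapshot of input -> output pairs recorded from the working system.
    static Map<Long, Long> snapshot() {
        Map<Long, Long> expected = new LinkedHashMap<>();
        expected.put(0L, 50L);
        expected.put(4_999L, 50L);
        expected.put(10_000L, 100L);
        expected.put(123_456L, 1_234L);
        return expected;
    }

    // Returns the number of inputs whose result no longer matches the snapshot.
    static int countRegressions() {
        int regressions = 0;
        for (Map.Entry<Long, Long> entry : snapshot().entrySet()) {
            long actual = FeeCalculator.feeFor(entry.getKey());
            if (actual != entry.getValue()) {
                regressions++;
            }
        }
        return regressions;
    }

    public static void main(String[] args) {
        System.out.println("Regressions found: " + countRegressions());
    }
}
```

If a later feature changes the minimum fee or the rounding, `countRegressions()` becomes non-zero. The developer then decides whether the change was intended (regenerate the snapshot) or is a genuine regression (fix the code).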

Pros and cons

Pros:
  1. Automated Test Creation: Diffblue automates the creation of unit tests, significantly speeding up the testing process.
  2. Large code bases: Very useful with bulk creation, saving enormous amounts of time.
  3. Enhanced Coverage: It helps achieve comprehensive code coverage, ensuring that more aspects of the application are tested.
  4. Time Efficiency: Reduces the time developers spend on writing and maintaining tests, allowing them to focus on feature development.
  5. Consistency: Ensures a consistent testing approach across the entire codebase.
  6. Early Bug Detection: Automated testing can identify potential issues early in the development cycle, reducing downstream costs.
  7. CI/CD: Diffblue can be integrated with the build pipelines.
Cons:
  1. Learning Curve: There is a learning curve in understanding and using the tool effectively, although it appears to be relatively short. Developers with strong prior knowledge of Java unit testing and coverage will find it easier to start; those who do not understand the generated code will first have to familiarise themselves with unit testing patterns.
  2. Over-reliance on Automation: Sole reliance on automated testing may miss edge cases that require human insight, so a proper review of the generated test cases against the use cases is essential. It also demands common sense from developers: critical features and components should, as a minimum, be covered with additional component tests so that stakeholders know those scenarios have been manually confirmed to work.
  3. Integration Challenges: Integrating automated testing tools into existing workflows might require adjustments in the development process. For example:
    1. When integrating into the CI/CD pipeline one should validate over a period of time (for example a month of monitoring) that the integration makes sense, is effective, and is working. Developers should develop the habit of keeping an eye on the logs and not completely rely on generative magic.
    2. Developers have to shift from developing traditional unit tests to re-running the unit tests from the previous snapshot, analysing the result, and regenerating new ones.
  4. Resource Intensive: Automated testing tools can be resource-intensive, especially for large and complex codebases.
  5. Price: Diffblue’s corporate packages start from $56,000 per year, so there are financial considerations to adopting this tool.

We will briefly consider a couple of open-source alternatives to Diffblue. Both have merit, but they currently offer significantly fewer features and less support than Diffblue, and they use very different test generation paradigms and technology.


EvoSuite is another tool for automatically generating unit tests for Java applications. It is open source and freely available, and it can produce high-quality test cases that maximise test coverage. Compared with Diffblue Cover, it is slightly more complex to set up, and it does not use AI. The generated tests can be used as a regression snapshot of the working codebase, just as with Diffblue.

Pros and cons

Pros:
  1. Open Source: EvoSuite is open-source, making it freely available for use and modification.
  2. Integration with Development Tools: EvoSuite can be integrated with popular IDEs and build tools like Eclipse, IntelliJ, and Maven.
Cons:
  1. Complex Configuration: The wide range of configuration options can be overwhelming and might require a significant learning curve to use effectively.
  2. Community Support: Being less commercially used than Diffblue, it might have a smaller community and less commercial support.
  3. Lack of Context Awareness: The tests generated may lack understanding of the specific business logic or context of the application.


Randoop is an automated unit test generation tool for Java projects that employs feedback-directed random test generation. By randomly creating sequences of method and constructor calls and using the results of these calls to guide further test generation, Randoop efficiently produces high-coverage test suites. It specialises in detecting errors and generating regression tests, outputting them as JUnit tests. This makes Randoop particularly useful for enhancing testing practices in continuous integration environments and for catching regressions and errors in evolving codebases.
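The feedback-directed idea can be sketched in a few lines. This is a deliberately simplified illustration and not Randoop’s implementation: the class name `FeedbackDirectedSketch`, the three operations, and the choice of `java.util.ArrayDeque` as the class under test are all assumptions made for the example. The loop extends call sequences that are already known to run cleanly and discards extensions that throw.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Random;

// Simplified illustration of feedback-directed random generation.
// This is NOT Randoop's implementation; it only mirrors the core loop.
class FeedbackDirectedSketch {
    // The operations the generator may call on the class under test.
    enum Op { PUSH, POP, SIZE }

    // Replays a call sequence on a fresh Deque; returns false if any call throws.
    static boolean runsCleanly(List<Op> sequence) {
        Deque<Integer> stack = new ArrayDeque<>();
        try {
            for (Op op : sequence) {
                switch (op) {
                    case PUSH: stack.push(1); break;
                    case POP:  stack.pop();   break; // throws NoSuchElementException when empty
                    case SIZE: stack.size();  break;
                }
            }
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    // Grows sequences one random call at a time, keeping only those that run cleanly.
    static List<List<Op>> generate(long seed, int attempts) {
        Random random = new Random(seed);
        List<List<Op>> pool = new ArrayList<>();
        pool.add(new ArrayList<Op>()); // start from the empty sequence
        Op[] ops = Op.values();
        for (int i = 0; i < attempts; i++) {
            // Feedback step: extend a sequence that is already known to be valid.
            List<Op> candidate = new ArrayList<>(pool.get(random.nextInt(pool.size())));
            candidate.add(ops[random.nextInt(ops.length)]);
            if (runsCleanly(candidate) && !pool.contains(candidate)) {
                pool.add(candidate); // reusable as a building block and as a regression test
            }
        }
        return pool;
    }

    public static void main(String[] args) {
        List<List<Op>> kept = generate(42L, 200);
        System.out.println("Kept " + kept.size() + " valid call sequences");
    }
}
```

Because invalid prefixes such as a `POP` on an empty stack never enter the pool, later sequences do not waste effort building on them. This also shows why the output needs review: every kept sequence is valid, but many are uninteresting.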

Pros and cons

Pros:
  1. Ease of Use: Relatively straightforward to set up and use, especially beneficial for projects without extensive existing test suites.
  2. Open Source: Randoop is open-source, making it freely available for use and modification.
Cons:
  1. Random Nature: The random nature of test generation might lead to the creation of irrelevant or less useful tests, requiring manual review and pruning.
  2. Community Support: Being less commercially used than Diffblue, it might have a smaller community and less commercial support.
  3. Lack of Context Awareness: The tests generated may lack understanding of the specific business logic or context of the application.

Behaviour-Driven Development (BDD)

Behaviour-driven development (BDD) represents a significant shift in the software development paradigm, emphasising collaboration and clarity in defining software requirements. This approach is particularly beneficial in Java development environments, where clarity and precision are paramount. BDD bridges the gap between technical and non-technical stakeholders by using natural language to define application behaviours. Cucumber, a popular BDD tool, facilitates this by allowing the creation of test cases in plain language.

Understanding BDD with Cucumber

Cucumber uses a language called Gherkin to write tests, which are essentially human-readable descriptions of software behaviours. Gherkin is designed to be understandable by anyone, regardless of their technical expertise. This feature is particularly beneficial in environments where developers, testers, business analysts, and even clients need to collaborate closely.

An example scenario

Consider a Java application for an online banking system. A typical feature might involve transferring funds between accounts. Here’s how a Cucumber test case might look:

Drafting initial BDD style requirements

This example demonstrates how BDD with Cucumber creates a clear, concise, and understandable description of the desired feature. Anyone, whether a developer, a QA engineer, or a business stakeholder, can understand what the test is supposed to achieve. This clarity ensures that everyone involved has a shared understanding of the feature’s requirements and behaviours.

The feature structure above can easily serve as the description of an implementation ticket for the developer, which everyone can refer back to at any point in the SDLC.

Implementing the scenario steps

With this example, we can see how the transparent, readable steps from the feature file are translated into actual tests. Note how state is maintained between the steps to achieve a functional testing style. The benefit is that if the requirements change and the feature file definitions change with them, the tests will break and have to be updated. In some cases only individual steps need to change, which is a major time and effort saver if the tests are written properly.
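To make the step-to-code mapping concrete, here is a minimal sketch in plain Java. The scenario wording and the `TransferSteps` class are hypothetical, chosen only for illustration. In a real Cucumber project the three step methods would carry `@Given`, `@When`, and `@Then` annotations from `io.cucumber.java.en` and Cucumber would invoke them; here `main` calls them directly so the example runs without any dependencies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scenario this sketch mirrors:
//   Given account "A" has a balance of 100
//   And account "B" has a balance of 20
//   When I transfer 30 from "A" to "B"
//   Then account "A" should have a balance of 70
//   And account "B" should have a balance of 50
class TransferSteps {
    // State shared between steps, as in a Cucumber step-definition class.
    private final Map<String, Integer> balances = new HashMap<>();
    private boolean lastTransferSucceeded;

    // Would be annotated @Given("account {string} has a balance of {int}")
    void accountHasBalance(String name, int balance) {
        balances.put(name, balance);
    }

    // Would be annotated @When("I transfer {int} from {string} to {string}")
    void transfer(int amount, String from, String to) {
        int available = balances.get(from);
        lastTransferSucceeded = amount > 0 && amount <= available;
        if (lastTransferSucceeded) {
            balances.put(from, available - amount);
            balances.put(to, balances.get(to) + amount);
        }
    }

    // Would be annotated @Then("account {string} should have a balance of {int}")
    void accountShouldHaveBalance(String name, int expected) {
        int actual = balances.get(name);
        if (actual != expected) {
            throw new AssertionError(name + ": expected " + expected + " but was " + actual);
        }
    }

    boolean transferSucceeded() {
        return lastTransferSucceeded;
    }

    public static void main(String[] args) {
        TransferSteps steps = new TransferSteps();
        steps.accountHasBalance("A", 100);
        steps.accountHasBalance("B", 20);
        steps.transfer(30, "A", "B");
        steps.accountShouldHaveBalance("A", 70);
        steps.accountShouldHaveBalance("B", 50);
        System.out.println("Scenario passed");
    }
}
```

The `balances` map is the state carried from the Given steps through to the Then steps. If the wording of one scenario line changes, only the matching method needs to be revisited.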

Cucumber provides an excellent short tutorial on how to include this in your development cycle: the 10 Minute Tutorial in the Cucumber documentation.

Pros and cons


Pros:

  1. For Developers: BDD with Cucumber provides a clear guide to what needs to be coded. It ensures that development aligns precisely with the specified requirements, reducing the likelihood of misunderstandings and rework.
  2. For Testers: Test cases written in Gherkin serve as a direct basis for testing scripts. They can be automatically executed, facilitating continuous testing and integration processes.
  3. For Business Stakeholders: The natural language format allows non-technical stakeholders to participate actively in the development process. They can review and verify that the software meets the intended business requirements.
  4. For Project Managers: BDD promotes a more streamlined and efficient workflow, enhancing collaboration and reducing the time and resources needed for explaining technical aspects to non-technical team members.


Cons:

  1. Initial Setup Time: Setting up BDD with Cucumber can be time-consuming initially.
  2. Maintenance of Test Suites: The test suites need to be maintained and updated as the application evolves.
  3. Writing Skills: Requires good writing skills to create effective Gherkin scenarios.
  4. Possible Overhead: In some cases, the process can add overhead to the development cycle, particularly if not well-integrated.
  5. Misinterpretation Risk: There is a risk of misinterpretation if the scenarios are not written clearly and concisely.

By adopting BDD with tools like Cucumber, Java development projects can achieve a higher level of efficiency and clarity, resulting in software that more accurately meets user needs and business goals. This approach aligns all stakeholders on a common ground, fostering better communication, quicker development cycles, and more reliable outcomes.

AI assistance with GitHub Copilot

Understanding GitHub Copilot

GitHub Copilot, powered by advanced AI algorithms, is transforming the way developers write and test code in Java. It serves as an on-the-fly coding assistant, providing real-time suggestions for code snippets, tests, and even entire functions. This tool is especially useful when tackling intricate coding problems or writing tests that cover a wide range of scenarios.

For example, consider a Java development team working on an API for a complex financial service. GitHub Copilot can assist by suggesting optimal ways to structure API calls, handle exceptions, and even recommend the most effective methods for writing comprehensive test cases. This kind of AI assistance is invaluable in speeding up the development process and ensuring high-quality code.

An example scenario


Let’s reuse the code from the “Clone the Pet Clinic project” step.

Start IntelliJ and open the Pet Clinic project.

Open File > Settings (Windows/Linux) or IntelliJ IDEA > Preferences (macOS), then go to Plugins.

Search for GitHub Copilot and click Install.

You might be prompted to restart the IDE and log in with GitHub.

You will have to set up a trial or a subscription to use GitHub Copilot.


Once the plugin is installed and you are logged in, you receive code completion suggestions as you type.

This type of assistance requires you to “know what you are doing” and can be a minor efficiency boost.

You can also generate code from prompts written directly in the editor.

Pros and cons


Pros:

  1. Enhanced Coding Speed: Provides instant code suggestions, speeding up the development process.
  2. Improved Test Coverage: Offers ideas for test cases, covering a broader range of scenarios.
  3. Learning and Development: It helps developers learn new coding techniques and best practices.
  4. Efficiency in Complex Tasks: Particularly useful in complex coding scenarios, providing innovative solutions.


Cons:

  1. Dependence on AI: Over-reliance on AI suggestions might reduce developers’ problem-solving skills.
  2. Quality Assurance: AI-generated code requires thorough review and testing to ensure it meets quality standards.
  3. Security Concerns: Developers need to be vigilant about the security implications of AI-suggested code.
  4. Contextual Limitations: Copilot’s suggestions may not always align perfectly with the specific context or requirements of the project.

AI assistance with ChatGPT

ChatGPT, a variant of the GPT (Generative Pre-trained Transformer) model developed by OpenAI, offers another unique form of AI assistance in the realm of software development and testing.

Understanding ChatGPT

ChatGPT is a language model capable of understanding and generating human-like text based on the input it receives. It can interpret queries, provide explanations, generate code snippets, and offer debugging assistance. In our case, we can use it for code generation.

Accessing ChatGPT

ChatGPT can be accessed through various platforms, including web interfaces or integrated tools, depending on the developer’s preference and workflow.

We will be using it mostly through the web app.


Estafet have developed a collection of highly practical guides to effectively utilising ChatGPT at various stages in your SDLC, including one for testers. Refer to our best practices guide here.

Code Snippets Generation

ChatGPT can generate code snippets in various programming languages, including Java, based on the requirements outlined by the developer. We must provide as much information as possible in a single prompt to serve as a context. For example, let’s say we have a controller method and we want some help with unit test generation for it. We can ask ChatGPT to assist:

We have the following Controller method:

Let’s ask ChatGPT to write some tests for it:

We now have some unit tests generated for us by the language model:

Including setup preparations:

And an actual test:

However, a developer must always use common sense when reaching out for code generation tools such as ChatGPT. Always inspect the code, and never use it as-is.

ChatGPT can be useful for inspiration and for well-known tasks, for example code that parses a JSON document into an object structure.
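Whatever tool produces it, a controller test of the kind discussed in this section usually has two parts: setup that replaces the controller’s collaborators with test doubles, and the test itself. The following is a hand-written sketch of that shape, not ChatGPT output. `CustomerDirectory`, `CustomerSearchController`, and the view names are hypothetical, and a hand-rolled stub is used in place of a mocking library so that the example runs with the JDK alone.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical collaborator that the controller depends on.
interface CustomerDirectory {
    List<String> findByLastName(String lastName);
}

// Hypothetical controller-style class that returns a view name.
class CustomerSearchController {
    private final CustomerDirectory directory;

    CustomerSearchController(CustomerDirectory directory) {
        this.directory = directory;
    }

    String search(String lastName) {
        List<String> matches = directory.findByLastName(lastName);
        if (matches.isEmpty()) {
            return "customers/notFound";
        }
        return matches.size() == 1 ? "customers/details" : "customers/list";
    }
}

class CustomerSearchControllerCheck {
    // Setup: a stub directory that always returns the given matches.
    static CustomerSearchController controllerReturning(List<String> matches) {
        return new CustomerSearchController(lastName -> matches);
    }

    // Test: one check per branch of search().
    static void searchSelectsViewByResultCount() {
        check("customers/notFound",
                controllerReturning(Collections.<String>emptyList()).search("Nobody"));
        check("customers/details",
                controllerReturning(Arrays.asList("Davis")).search("Davis"));
        check("customers/list",
                controllerReturning(Arrays.asList("Davis", "Davies")).search("Dav"));
    }

    static void check(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    public static void main(String[] args) {
        searchSelectsViewByResultCount();
        System.out.println("All checks passed");
    }
}
```

When reviewing generated tests, it is worth checking that each branch of the method is exercised like this, and that the assertions compare against values you have reasoned about rather than values the tool guessed.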

Algorithmic Suggestions

It can also suggest algorithms or data structures best suited for a particular scenario, aiding in more efficient code development.

Error Analysis

ChatGPT can assist in debugging by analysing error messages and suggesting potential causes and solutions.

Test Case Development

The model can help in generating test cases, providing an additional layer of testing that complements traditional methods. However, with ChatGPT there is a lot more manual work involved in providing context to the tool, and sometimes multiple queries are needed to achieve a single goal. Prompt engineering is required.

Pros and cons


Pros:

  1. Instant Assistance: Provides immediate responses to queries, which can be beneficial for time-sensitive development tasks.
  2. Wide Range of Knowledge: ChatGPT has been trained on a diverse set of data, enabling it to cover a wide range of topics and scenarios.
  3. Code Examples: Capable of generating code snippets and examples that can aid in understanding concepts or solving problems.
  4. Debugging Aid: Offers potential solutions and explanations for bugs or errors in the code.
  5. Learning and Mentoring: Serves as an educational tool for less experienced developers or those learning new programming languages or technologies.
  6. Enhanced Documentation: It can help in creating and maintaining project documentation, improving its overall quality and readability.


Cons:

  1. Contextual Limitations: While ChatGPT is proficient in many areas, it may not always provide the most contextually appropriate solutions.
  2. Quality Assurance: Suggestions from ChatGPT should be carefully reviewed to ensure they meet the project’s standards and are free from errors.
  3. Over-reliance Risk: Developers might become overly reliant on ChatGPT, which could potentially hinder the development of problem-solving skills.
  4. Interpretation Required: Responses may require interpretation and adaptation to fit the specific needs of the project.
  5. Up-to-date Information: As with any AI model, there’s a risk of outdated information, especially if the model hasn’t been recently updated with the latest data.


Let’s compare the strengths and weaknesses of each approach. GitHub Copilot and ChatGPT share some characteristics in the table since their behaviour is somewhat similar.

| Unit regression testing, comparison criteria | Diffblue | Randoop | EvoSuite |
|---|---|---|---|
| Pricing | Requires a commercial licence | Free and open-source | Free and open-source |
| Best used in | The larger the project, the greater the benefit | | |
| Usage complexity | Easy to set up, especially IDE integrations | Medium to high relative complexity | Complex to set up, restricted to an older version of Java |

| Comparison criteria | Diffblue Cover | Cucumber BDD | GitHub Copilot | ChatGPT |
|---|---|---|---|---|
| Pricing (up-front cost only; the real cost, and savings, would have to take into account working time saved) | High, team licence starts from $56,000 | Free | Starting from $10 per month per person | Free for GPT 3.5 or, currently, $20 for GPT 4.0. Enterprise licence costs aren’t publicly available |
| Project scale | Medium to enterprise scale | High value in all scenarios | Small to medium projects | Small to medium projects |
| Usage complexity (for the user) | Medium | Medium | Low | A small amount of prompt engineering required |
| Ease of integration | Hard. Integration with the CI pipelines can be a challenge | Easy. Plugs easily into an existing Spring application | n/a | n/a |
| Best used for | Regression testing, possibly E2E | Component testing, and component behaviour definitions during the discovery phases | Instant code completion; very useful when writing a trivial task | General-purpose AI; can be used in all scenarios to quickly (and for free) get a helping hand with code generation, testing plans, and suggestions |


We have covered a number of tools and approaches. Several harness the evolving capabilities of AI in varying ways. Others leverage standard toolsets in BDD. Ultimately, a hybrid approach taking into account where you are in your development cycle (e.g. do you have a lot of legacy code, or are you just starting out), the specific challenges of your codebase, and your available resources and timescales, will best serve most contexts. Drawing on our 20+ years of offering quality delivery and solutions to both the enterprise and startups, including creating innovative and highly efficient test frameworks, Estafet can provide you with tailored advice and solutions to overcome your specific challenges. 

We conclude with a brief overview of the tools and approaches discussed for reference:

Diffblue Cover

A game changer in unit regression testing. The larger the project, the greater the benefit: it scales extremely well with project size, although the price is very high compared to the others. It is easy to get started with, but it can be a challenge to realise all of its benefits, for example when integrating it with your CI pipeline.


Randoop

Free and open-source, Randoop integrates well with existing Java projects and CI pipelines, though its random test generation might require manual filtering for optimal relevance. It is ideal for medium to large projects but can sometimes struggle with very large codebases because it is resource intensive.


EvoSuite

While EvoSuite has a steep learning curve due to its complex configuration, the payoff in terms of test coverage and quality can be substantial, particularly for large and complex projects. Unfortunately, unlike Diffblue and Randoop, it is not well maintained and cannot be used in a modern setup (for example, with the latest versions of IntelliJ and Java).

Cucumber BDD

Allows all of the stakeholders to agree on a component’s behaviour before development even starts. Free to use in your Spring projects immediately, incredibly easy to get into. This component testing approach can be useful in all project sizes and is easily integrated into the common CI pipelines. The pros of using Cucumber BDD far outweigh the cons!

GitHub Copilot

Allows developers to receive code suggestions directly in the code editor while typing. It can take some time to get used to and to provide proper prompts. A small performance boost can be expected, having immediate feedback while coding. With the relatively low price, it is worth it. However, it can negatively impact productivity in large and complex projects since the suggestions will lack the complex context that the developer already has.


ChatGPT

Offers “general purpose” assistance to developers, but it takes time to write proper queries (prompts). As with GitHub Copilot, in large and complex projects it can have a negative impact because the tool requires a large amount of context. Given that the tool is free to use, including it in your development cycle is a must.

By Antonio Lyubchev, Consultant at Estafet
