Week 4: EvoSuite vs LLMs and LLM-Based Test Generation

In this week’s lab you will:

  1. Compare EvoSuite vs LLMs for unit test generation
  2. Use LLMs (e.g. ChatGPT) to generate unit tests and run them
  3. Set up an LLM API call for automated test generation (text/chat completion)
  4. Hear an Assignment 2 explanation (aims, tasks, submission)

Prerequisites

  1. Python environment — e.g. Python 3.8+, PyCharm (optional), and required packages (e.g. openai)
  2. LLM access — e.g. ChatGPT, or another model with code generation
  3. JUnit in your Java project — to compile and run generated tests

Outline (1 hour)

| Part | Activity | Time (guide) |
| --- | --- | --- |
| 1 | EvoSuite vs LLMs comparison | ~15 min |
| 2 | AI for test generation (hands-on) | ~20 min |
| 3 | API setup and text completion | ~15 min |
| 4 | A2 explanation | ~10 min |

Activity 1: Comparing EvoSuite and Large Language Models (~15 min)#

Task 1.1: Understanding the differences#

Traditional automated test-generation tools such as EvoSuite rely on program analysis techniques (e.g., search-based, constraint-based, or random algorithms) to generate tests automatically. More recently, Large Language Models (LLMs) such as ChatGPT have offered a new approach: they treat source code as text and generate tests using programming knowledge learned from large corpora. Both approaches can produce JUnit tests, but they differ significantly in inputs, workflow, and trade-offs.


Task 1.2: Compare the two approaches#

| Aspect | EvoSuite (program analysis tool) | LLMs (e.g., ChatGPT) |
| --- | --- | --- |
| Input | Compiled bytecode + classpath | Source code + natural language prompt |
| Underlying technique | Program analysis algorithms (search-based, constraint-based, random testing) | Natural language and code understanding learned from large datasets |
| Environment setup | Requires environment configuration (JDK, EvoSuite jar, runtime libraries) and dependency setup | Minimal setup (e.g., API or web interface) |
| Workflow complexity | Multiple steps: compile code → configure dependencies → run EvoSuite → generate tests | Simple: provide code and prompt → receive generated tests |
| Output style | Coverage-oriented tests with generic variable names (e.g., calculator0, int0) | Often more readable and descriptive tests |
| Strengths | Systematic exploration, strong coverage guarantees | Fast, flexible, and easier to use |
| Limitations | Complex setup and configuration | May hallucinate or generate incomplete tests |

Think about the following questions:

  • Why do traditional tools like EvoSuite require environment configuration and dependency setup?
  • Why can LLMs generate tests with just source code and a prompt?
  • When would you prefer EvoSuite over an LLM, and when the opposite?
  • Could we combine EvoSuite and LLMs to get the best of both approaches?

Traditional tools rely on program analysis and a complex infrastructure, while LLMs simplify the workflow by reasoning about source code much as a human would. This motivates the key question behind Assignment 2:

Can Large Language Models generate effective unit tests without relying on traditional program analysis?


Activity 2: LLM-Based Test Generation (~20 min)#

Task 2.1: Generate tests using a simple prompt#

  1. Choose a small Java class. For example:
/**
 * TheArray class provides a method to sort an array of integers in ascending order
 * using the bubble sort algorithm.
 */
public class TheArray {

    /**
     * Sorts the given array of integers in ascending order.
     *
     * @param array the array of integers to be sorted
     */
    public void sortArray(int[] array) {
        int n = array.length;

        for (int i = 0; i < n - 1; i++) {
            for (int j = 0; j < n - i - 1; j++) {
                if (array[j] > array[j + 1]) {
                    int temp = array[j];
                    array[j] = array[j + 1];
                    array[j + 1] = temp;
                }
            }
        }
    }
}
  2. Copy the code into an LLM (e.g. ChatGPT).
  3. Prompt example: “Generate JUnit 4 unit tests for this Java class. Include normal and edge cases. Use descriptive test names and JUnit 4 assertions.”
  4. Copy the generated test code into your project (e.g. src/test/java/).
  5. Fix imports/package if needed, then run the tests (e.g. mvn test or IDE).
  6. Note: Do all tests pass? Any wrong assertions or API misuse?

Specify JUnit 4 (or 5) and assertion style in the prompt to reduce mismatches with your project.

Task 2.2: Improve the prompt (if time)#

Try one of: (a) Ask for tests for a single method. (b) Add “Return only code, no explanation.” (c) Ask for a test that expects an exception (e.g. divide by zero). Compare the new output with the first attempt.

What prompt changes made the output more usable (e.g. fewer fixes needed)?
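If you later automate these experiments (as in Activity 3), the prompt variants above can be composed programmatically. A minimal sketch, where `build_prompt` and the exact constraint wording are illustrative choices, not part of the lab's required API:

```python
# Sketch: compose constrained prompt variants for LLM test generation.
# BASE and the constraint strings below are examples; adjust to your project.

BASE = "Generate JUnit 4 unit tests for this Java class."

def build_prompt(source_code, method=None, code_only=False, expect_exception=None):
    """Return a test-generation prompt with optional extra constraints."""
    parts = [source_code, "", BASE]
    if method:
        parts.append("Only write tests for the method '%s'." % method)
    if code_only:
        parts.append("Return only code, no explanation.")
    if expect_exception:
        parts.append("Include a test that expects %s." % expect_exception)
    return "\n".join(parts)

prompt = build_prompt("public class TheArray { /* class under test */ }",
                      method="sortArray", code_only=True)
print(prompt)
```

Varying one constraint at a time makes it easy to see which change improved the output.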


Activity 3: API setup and text completion (~15 min)#

Task 3.1: Environment and first call#

  1. Create a new Python file (e.g. llm_demo.py).
  2. Set your API key as an environment variable (never hard-code it), e.g. in your shell:
    export OPENAI_API_KEY="your-key-here"
  3. Use the OpenAI client to call the chat completions (or text completion) API with a short prompt, e.g. “In one sentence, what is a unit test?”
  4. Print the model’s reply.

Example structure (adapt to your endpoint and model name):

from openai import OpenAI
import os

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))  # read the key from the environment; never commit a real key

source_code = '''/**
 * TheArray class provides a method to sort an array of integers in ascending order
 * using the bubble sort algorithm.
 */
public class TheArray {

    /**
     * Sorts the given array of integers in ascending order.
     *
     * @param array the array of integers to be sorted
     */
    public void sortArray(int[] array) {
        int n = array.length;

        for (int i = 0; i < n - 1; i++) {
            for (int j = 0; j < n - i - 1; j++) {
                if (array[j] > array[j + 1]) {
                    int temp = array[j];
                    array[j] = array[j + 1];
                    array[j + 1] = temp;
                }
            }
        }
    }
}
'''

prompt_content = source_code + '\n\n' + 'Generate JUnit 4 unit tests for this Java class. Include normal and edge cases. Use descriptive test names and JUnit 4 assertions.'
response = client.chat.completions.create(
    model="gpt-4o-mini",  # or the model your course uses
    messages=[{"role": "user", "content": prompt_content}],
)
print(response.choices[0].message.content)

What would you need to change to send a system message (e.g. “You are a Java testing expert”) plus a user message?
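One possible shape, reusing the client from the script above (the system text is just an example; only the messages list changes, not the call itself):

```python
# Sketch: a "system" message sets the model's role, and the "user" message
# carries the actual request. Order matters: system first, then user.
messages = [
    {"role": "system", "content": "You are a Java testing expert."},
    {"role": "user", "content": "Generate JUnit 4 unit tests for the class below."},
]
# response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print([m["role"] for m in messages])
```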

Task 3.2: Text completion with a simple test-generation prompt#

Use the API to generate JUnit tests for a given method or class. Provide the source code in the prompt. You can

  • Ask the model to generate JUnit tests, including both normal cases and edge cases.
  • To make the output easier to use programmatically, instruct the model to return only the test code (without explanations).

Responses may be wrapped in Markdown code fences (```java ... ```). Strip the fences and extract the code before saving or compiling.
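A minimal sketch of stripping the fences (the regex assumes at most one fenced block per reply; adapt if the model returns several):

```python
import re

def extract_code(reply):
    """Return the contents of the first ``` fenced block in reply,
    or the whole reply (stripped) if no fences are present."""
    match = re.search(r"```(?:\w+)?\n(.*?)```", reply, re.DOTALL)
    return match.group(1).strip() if match else reply.strip()

reply = ("Here are the tests:\n"
         "```java\n"
         "import org.junit.Test;\n\n"
         "public class TheArrayTest { }\n"
         "```\n"
         "Let me know if you need more cases!")
print(extract_code(reply))
```

The extracted string can then be written to a file such as src/test/java/TheArrayTest.java and compiled as usual.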


Activity 4: A2 explanation (~10 min)#

Your tutor will explain Assignment 2: learning goals, main tasks (e.g. code knowledge extraction, prompt design, test generation), and submission requirements. Note deadlines and where to find the full spec.


Learning objectives#

By the end of this tutorial you should be able to:

  • Apply A1 clarifications and understand A2 goals and requirements
  • Explain what code instrumentation is and how coverage tools use it (with a simple manual example)
  • Compare EvoSuite and LLMs (inputs, outputs, strengths, limitations)
  • Generate unit tests with an LLM, paste into a project, and run them
  • (If completed) Write a constrained prompt and see its effect on output

