Assignment 1: Automated Unit Test Generation for Java Code [25 MARKS]

  • Due date: Thursday 12th March 2026, 23:55 (Week 3)
  • Assignment Weighting: 25% of total grade
  • Expected Workload: 20-40 hours
  • Hurdle: No (this assignment is not a hurdle)
  • Type: Individual
  • Submission: Submit through Canvas — see Submission Instructions below.
  • Policies: For late submission, plagiarism, and other policies, see the policies page.

Note: This assignment is not a hurdle. It introduces the classic task of automated unit test generation, an essential concept in software quality assurance. The goal is to generate high-coverage test cases for different programs using automated tools.

Make sure to carefully read the instructions for each task.


Introduction#

Quote from a research paper about automated unit test generation:

“Unit testing validates whether a functionally-discrete program unit (e.g., a method) under test behaves correctly. As the primary stage in the software development procedure, unit testing plays an essential role in detecting and diagnosing bugs in a nascent stage and prevents their further propagation in the development cycle. Therefore, writing high-quality unit tests is crucial for ensuring software quality.”

— Zhiqiang Yuan et al., “Evaluating and Improving ChatGPT for Unit Test Generation” [5]

For a method under test (often called the focal method), the corresponding unit test consists of a test prefix and a test oracle. The test prefix is typically a series of method invocations or assignment statements that drive the focal method into a testable state; the test oracle then serves as the specification that checks whether the focal method’s actual behavior matches the expected one.
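As a minimal illustration of these two parts (the `Counter` class below is a toy example of our own, not taken from Defects4J or any assignment project):

```java
// Toy focal class (hypothetical; not part of the assignment projects).
class Counter {
    private int value = 0;

    // Focal method under test: increments the counter and returns the new value.
    int increment() {
        return ++value;
    }
}

public class CounterTest {
    public static void main(String[] args) {
        // Test prefix: construct the object and drive the focal method
        // into a testable state through a series of invocations.
        Counter counter = new Counter();
        counter.increment();
        int observed = counter.increment();

        // Test oracle: check that the observed behavior matches the expected one.
        if (observed != 2) {
            throw new AssertionError("expected 2 but got " + observed);
        }
        System.out.println("test passed");
    }
}
```

In a JUnit test, the oracle would typically be an `assertEquals` call rather than the manual check shown here.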

However, manually writing and maintaining unit tests is a labor-intensive and time-consuming task, especially for large and complex projects. To address this challenge, automated unit test generation tools such as EvoSuite [2] have been developed. EvoSuite automatically generates unit test cases for Java programs, maximizing code coverage and minimizing manual effort.

In this assignment, we will explore EvoSuite and analyze the quality and coverage of the tests it generates. For a deeper understanding of the underlying concepts, students are encouraged to refer to the papers listed in the references or explore related research through Google Scholar.


Task 1: Using EvoSuite for projects in Defects4J [5 MARKS]#

In this task, you will generate unit tests using EvoSuite for several projects from the well-known dataset Defects4J. Defects4J is a collection of real-world projects with reproducible bugs, designed to advance software engineering research. Each project includes developer-written test cases located in the src/test/java folder, which can serve as a baseline for comparison. Please note that all steps in Task 1 should be done inside the Docker container (not on your host machine).

Task 1.1. Downloading and Setting Up the Docker Environment [1/5 MARKS]#

  1. Install Docker Desktop (Community Edition) and run docker --version to confirm that it is installed.
  2. Download the Docker image.
  3. Load the Docker image and run a container from it. Once inside the container, run ls, java -version, and mvn -version to confirm that this step succeeded.
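The steps above might look like the following on the command line; the image file name and tag are placeholders, so substitute the ones provided for the course:

```shell
# 1. Confirm Docker is installed.
docker --version

# 2./3. Load the provided image and start an interactive container
#       (file name and tag below are placeholders).
docker load -i comp4130_assignment1.tar
docker run -it comp4130_assignment1:latest /bin/bash

# Inside the container, confirm the environment is ready.
ls
java -version
mvn -version
```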

Task 1.2. Installing Defects4J and Downloading Projects [2/5 MARKS]#

Important Note: When using Defects4J, ensure that your Java version is set to Java 11!

Inside the Docker container, you can run update-alternatives --config java to select the Java version.

Follow the steps below to set up Defects4J and download the required projects in the docker environment:

  1. Install Defects4J by following the instructions provided in the README on the Defects4J GitHub page. [1 MARK]

  2. Use the defects4j checkout command to download two projects from Defects4J [1 MARK]:

    • Collections (buggy version 28) - Save as collection_28_buggy.
    • JxPath (buggy version 22) - Save as jxpath_22_buggy.

    The general command format is:

    defects4j checkout -p project_id -v version_id -w work_dir
    

    For example:

    defects4j checkout -p Lang -v 1b -w ../JavaProjects/lang_1_buggy
    

    Here:

    • -p Lang: Specifies the project ID (in this case, Lang). You can find a full list of project information here.
    • -v 1b: Specifies the version ID, where 1b refers to a buggy version of the project.
    • -w ../JavaProjects/lang_1_buggy: Specifies the directory to save the project.

    You can find more information on the checkout command and project details in the Defects4J documentation.
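Assuming the Defects4J project IDs for these two projects are Collections and JxPath (run defects4j pids to confirm the exact IDs on your installation), the two checkouts would look like:

```shell
# Check out Commons Collections, buggy version 28.
defects4j checkout -p Collections -v 28b -w ../JavaProjects/collection_28_buggy

# Check out Commons JxPath, buggy version 22.
defects4j checkout -p JxPath -v 22b -w ../JavaProjects/jxpath_22_buggy
```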

Task 1.3. Using EvoSuite to Generate Unit Tests [2/5 MARKS]#

Change your Java version to Java 8.

Use EvoSuite to generate test cases for the two projects you downloaded in Task 1.2. Follow the instructions provided in the EvoSuite documentation for generating tests. By default, the generated test code will be saved in a folder named evosuite-tests within each project’s directory.
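A typical invocation, assuming the EvoSuite jar named in the appendix directory structure and a compiled project, is sketched below; the target class name is purely illustrative, and the compiled-classes directory can differ per project (defects4j export -p dir.bin.classes prints it):

```shell
# Compile the checked-out project first so the class files exist.
cd ../JavaProjects/collection_28_buggy
defects4j compile

# Generate tests for one class (class name is illustrative only).
java -jar /path/to/evosuite-1.0.6.jar \
  -class org.apache.commons.collections4.CollectionUtils \
  -projectCP target/classes
```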


Task 2: Analyzing the Generated Unit Tests [20 MARKS]#

This task involves evaluating the unit tests generated by EvoSuite and comparing them to the developer-written test cases.

Task 2.1. Copy the assignment folder from the Docker image to your local machine [1/20 MARKS]#

Copy the folder /COMP4130_Assignment_1 from inside the Docker container to your local machine. You will use this for the coverage analysis steps below (e.g. in IntelliJ).
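One way to do this is with docker cp, assuming your container’s ID as shown by docker ps:

```shell
# Find the running container's ID.
docker ps

# Copy the assignment folder out of the container (replace <container_id>).
docker cp <container_id>:/COMP4130_Assignment_1 ./COMP4130_Assignment_1
```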

Task 2.2. Calculating Metrics for Test Cases [4/20 MARKS]#

To evaluate the quality and effectiveness of test cases, calculate the following metrics:

  • Line Coverage: The percentage of executable lines of code in the focal methods that are covered by the test case.
  • Method Coverage: The percentage of focal methods invoked by the test case.

For both projects (Collections and JxPath), follow the steps below to compute these metrics.

  1. Calculate Coverage for Developer-Written Tests [2 MARKS]
    • Open IntelliJ IDEA and load the project.
    • Open pom.xml and add the necessary JUnit dependencies.
    • Run the test files located in src/test/java and measure line and method coverage.
    • Note: If any test file in src/test/java contains syntax errors that prevent execution, delete the file before running the tests.
  2. Compute Coverage for EvoSuite-Generated Tests [2 MARKS]
    • Open IntelliJ IDEA and load the project.
    • Delete all developer-written test files in src/test/java, then copy all EvoSuite-generated test files from the evosuite-tests directory into src/test/java.
    • Open pom.xml and add the necessary EvoSuite dependencies to ensure proper test execution.
    • Run the test cases in src/test/java and measure line and method coverage. Note: If any EvoSuite-generated test files contain syntax errors that prevent execution, remove the faulty files before running the tests.
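As a sketch of what the dependency additions in the two steps above might look like in pom.xml (the version numbers are assumptions inferred from the jar names in the appendix directory structure; adjust them to match your setup):

```xml
<!-- JUnit for running the tests (version is an assumption). -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.12</version>
  <scope>test</scope>
</dependency>
<!-- EvoSuite runtime required by the generated tests
     (matches evosuite-standalone-runtime-1.0.6.jar from the appendix). -->
<dependency>
  <groupId>org.evosuite</groupId>
  <artifactId>evosuite-standalone-runtime</artifactId>
  <version>1.0.6</version>
  <scope>test</scope>
</dependency>
```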

Task 2.3. Analyzing the Results [15/20 MARKS]#

Analyze and compare the results as follows:

  1. Report the total number of tests generated by EvoSuite, the runtime required for test generation, and the coverage metrics (line, method) for each project. [3 MARKS]

  2. Compare the line and method coverage of the EvoSuite-generated tests to the developer-written tests. Discuss why coverage achieved by EvoSuite may be higher or lower than that of developer-written tests. [2 MARKS]

  3. Identify one class from these two projects for which EvoSuite achieved higher coverage than the developer-written tests. If no such class exists, select the class with the best performance from EvoSuite. Examine the corresponding tests in detail:
    • What is EvoSuite testing that the developer-written tests did not? [2 MARKS]
    • Why is EvoSuite more likely to generate such tests? [2 MARKS]
    • Evaluate the quality of these tests in terms of readability and their usefulness for debugging. For example, if a test fails, would it help you pinpoint the bug? [2 MARKS]
  4. Identify one class from these two projects for which EvoSuite achieved lower coverage than the developer-written tests. If no such class exists, select the class with the worst performance from EvoSuite. Reflect on the reasons for EvoSuite’s inability to generate sufficient coverage for this class. Consider:
    • How might EvoSuite’s underlying mechanisms (e.g., random test generation [4], search algorithms [1, 3]) have contributed to this result? [2 MARKS]
    • Are there any specific challenges in the code that could have hindered EvoSuite (e.g., dependencies, complex logic, or external constraints)? [2 MARKS]

Submission Instructions#

  1. Deadline: The lab report is due on Thursday, 12th March 2026, at 23:55.

  2. Report Content: Submit a well-documented lab report that includes all experimental results and responses to the given questions. Additionally, compress the entire project directory into a single “.zip” file.

  3. File Naming Convention: Name your report as:
    Lab_1_u0000000.pdf
    

    Replace ‘u0000000’ with your university ID.

  4. Appendix Directory Structure: Ensure that your project follows the directory structure below:
    firstname_lastname_u0000000
    |-- Lab_1_u0000000.pdf
    |-- defects4j
    |   |-- ...
    |-- evosuite-1.0.6.jar
    |-- evosuite-standalone-runtime-1.0.6.jar
    |-- JavaProjects
    |   |-- collection_28_buggy
    |   |   |-- ...
    |   |-- jxpath_22_buggy
    |   |   |-- ...
    
  5. Grading: This assignment is worth 25 marks and accounts for 25% of the total course assessment.

References#

[1] Arianna Blasi et al. “Call Me Maybe: Using NLP to Automatically Generate Unit Test Cases Respecting Temporal Constraints”. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (2022).

[2] Gordon Fraser and Andrea Arcuri. “EvoSuite: automatic test suite generation for object-oriented software”. In: ESEC/FSE ‘11. 2011.

[3] Mark Harman and Phil McMinn. “A Theoretical and Empirical Study of Search-Based Testing: Local, Global, and Hybrid Search”. In: IEEE Transactions on Software Engineering 36 (2010), pp. 226–247.

[4] Carlos Pacheco et al. “Feedback-Directed Random Test Generation”. In: 29th International Conference on Software Engineering (ICSE’07) (2007), pp. 75–84.

[5] Zhiqiang Yuan et al. “Evaluating and Improving ChatGPT for Unit Test Generation”. In: Proc. ACM Softw. Eng. 1 (2024), pp. 1703–1726.
