Week 7: Automating LLM Calls Tutorial

In this week’s tutorial you will:

  1. Write Python scripts to use the OpenAI API (basic setup)
  2. Use the text completion feature to generate content
  3. Integrate extracted code knowledge into prompts and automatically call the API to generate tests
  4. Format and save results

Tutorials are one hour. Work through the core activities first; extension topics (embedding, fine-tuning, etc.) are for self-study using the references.

Prerequisites

  1. Python 3.8+ — for running scripts
  2. OpenAI API key — from OpenAI platform (or the provided API keys). Store it in an environment variable (e.g. OPENAI_API_KEY) and never commit it to version control.
  3. openai Python package — install with pip install openai

References: OpenAI API documentation, Ultimate guide to OpenAI Python library

Outline (1 hour)

Part  Activity                                                    Time (guide)
1     API setup and text completion                               ~20 min
2     Integrate code knowledge into a prompt and generate tests   ~25 min
3     Format and save results                                     ~15 min
4     Extension: Embedding and fine-tuning (reference only)       self-study

Activity 1: API setup and text completion (~20 min)

Task 1.1: Environment and first call

  1. Create a new Python file (e.g. llm_client.py).
  2. Set your API key (e.g. from environment):
    api_key = os.getenv("OPENAI_API_KEY")
  3. Use the OpenAI client to call the chat completions (or text completion) API with a short prompt, e.g. “In one sentence, what is a unit test?”
  4. Print the model’s reply.

Example structure (adapt to your endpoint and model name):

from openai import OpenAI
import os

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4o-mini",  # or the model your course uses
    messages=[{"role": "user", "content": "In one sentence, what is a unit test?"}],
)
print(response.choices[0].message.content)

What would you need to change to send a system message (e.g. “You are a Java testing expert”) plus a user message?
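One possible answer, sketched here: prepend a system-role message to the messages list (the client setup and model name stay the same as in the example above):

```python
# A messages list with a system message ahead of the user message.
# Pass this list as the `messages` argument to
# client.chat.completions.create(...).
messages = [
    {"role": "system", "content": "You are a Java testing expert."},
    {"role": "user", "content": "In one sentence, what is a unit test?"},
]
```

The system message sets persistent behaviour for the conversation, while the user message carries the actual request.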

Task 1.2: Text completion with a simple test-generation prompt

Call the API with a prompt that asks for a single JUnit test method for a given method signature, e.g.
“Generate one JUnit 4 test method for: public int add(int a, int b). Return only the test code.”
Inspect the response: is it valid Java? Does it need trimming (e.g. markdown code fences)?

Responses may be wrapped in markdown code fences (an opening ```java line and a closing ``` line). Strip the fences and extract the code before saving or compiling.
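A minimal fence-stripping helper might look like the sketch below; the exact fence format varies by model, so treat the pattern as a starting point rather than a complete solution:

```python
import re

def strip_code_fences(text: str) -> str:
    """Remove a surrounding ```java ... ``` (or plain ``` ... ```) fence,
    if present; otherwise return the text unchanged."""
    text = text.strip()
    match = re.match(r"^```[a-zA-Z]*\n(.*?)\n?```$", text, re.DOTALL)
    if match:
        return match.group(1).strip()
    return text
```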


Activity 2: Integrate code knowledge and generate tests (~25 min)

Here you use extracted code knowledge (e.g. from SootUp or your A2 pipeline) inside the prompt, then call the API to generate tests.

Task 2.1: Build a prompt that includes code context

Assume you have a string variable code_context containing, for example:

  • Focal class and method signature
  • Relevant field types and method names (e.g. from a simple code analysis or stub)

Write a template that combines:

  1. A short system or user instruction: “You are a Java testing expert. Generate JUnit 4 test methods.”
  2. The code context (class name, method signature, and any other knowledge).
  3. A clear user request: “Generate exactly one JUnit test method for the focal method. Return only the test code, no explanation.”

Example:

def build_test_generation_prompt(class_name: str, method_signature: str, extra_context: str = "") -> str:
    instruction = "You are a Java testing expert. Generate JUnit 4 test methods."
    context = f"Class: {class_name}\nFocal method: {method_signature}\n{extra_context}"
    request = "Generate exactly one JUnit test method for the focal method. Return only the test code."
    return f"{instruction}\n\n{context}\n\n{request}"

Call the API with this prompt and capture the returned text.

Task 2.2: Automate one shot per focal method

Using your template, write a small loop (or single call) that, for one focal method (e.g. from a list or from your A2 output):

  1. Builds the prompt with that method’s code knowledge.
  2. Calls the OpenAI API.
  3. Extracts the raw response (and strips markdown if present).
  4. Stores the result in a list or dict (e.g. method_id -> generated_code).

Respect rate limits and cost: use a small model and make only a few calls during the tutorial.
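Steps 1-4 above can be sketched as one function. Here `call_model` is a stand-in for your API wrapper (e.g. a thin function that sends the prompt via `client.chat.completions.create` and returns the reply text); injecting it keeps the loop testable without network calls. The method-ID format and prompt wording are illustrative:

```python
def build_prompt(class_name: str, method_signature: str) -> str:
    """Combine instruction, code context, and request (same shape as
    build_test_generation_prompt above)."""
    return (
        "You are a Java testing expert. Generate JUnit 4 test methods.\n\n"
        f"Class: {class_name}\nFocal method: {method_signature}\n\n"
        "Generate exactly one JUnit test method for the focal method. "
        "Return only the test code."
    )

def strip_fences(text: str) -> str:
    """Drop surrounding ``` fences if the model wrapped its reply."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].strip() == "```":
        lines = lines[:-1]
    return "\n".join(lines).strip()

def generate_tests(focal_methods, call_model):
    """focal_methods: iterable of (method_id, class_name, signature) tuples.
    call_model: callable prompt -> reply text.
    Returns a dict of method_id -> generated test code."""
    results = {}
    for method_id, class_name, signature in focal_methods:
        prompt = build_prompt(class_name, signature)      # 1. build prompt
        raw = call_model(prompt)                          # 2. call the API
        results[method_id] = strip_fences(raw)            # 3. extract/strip
    return results                                        # 4. store results
```

During the tutorial you can pass a real API wrapper as `call_model`; in a test you can pass a stub that returns canned Java code.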


Activity 3: Format and save results (~15 min)

Task 3.1: Normalise and format generated code

  1. From the raw API response, remove markdown code fences (the opening ```java line and the closing ``` line) and leading/trailing whitespace.
  2. Optionally run a formatter (e.g. a Java formatter script or “pretty-print” step) so saved tests look consistent.
  3. Decide a simple convention: e.g. one file per focal method, or one file per class with multiple test methods.

Task 3.2: Save to files

Write the processed test code to disk:

  • Use a clear naming scheme, e.g. Test_ClassName_methodName.java or ClassName_methodName_test.java.
  • Save under a dedicated folder (e.g. generated_tests/) so you can run them or inspect them later.

You now have a minimal pipeline: code context → prompt → API call → extract → format → save. You can extend this with more code knowledge (e.g. branch coverage goals) or multiple focal methods.


Extension (self-study)

  • Embedding feature: Use the API to obtain embeddings for method names or code snippets; useful for retrieval or clustering. See the OpenAI embeddings guide (or equivalent).
  • Fine-tuning: For custom behaviour on your codebase, see the OpenAI fine-tuning guide. Fine-tuning is not expected to be done during the one-hour tutorial.
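Embeddings come back from the API as plain lists of floats; for retrieval or clustering you typically compare them with cosine similarity. A stdlib-only sketch (the toy vectors in the test are placeholders, not real API output):

```python
import math
from typing import List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two embedding vectors:
    1.0 means identical direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

In a retrieval setting you would embed each method name or snippet once, then rank candidates by similarity to a query embedding.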
