The Problem: Unbounded Token Generation

CodeGrind’s AI assistant, powered by Azure OpenAI, was generating AI-guided hints and code snippets without explicit token limits. While the model itself has safety guardrails, the absence of a max_completion_tokens parameter meant:

  • Unpredictable response lengths that could exceed practical use cases
  • Higher API costs due to longer-than-necessary completions
  • Inconsistent UX where some responses were verbose while others were concise

This wasn’t a catastrophic failure—the system worked—but it was inefficient and left cost and quality unmanaged.

The Fix: Explicit Token Ceilings

In PR #706, we implemented a straightforward but impactful change:

// Before: No explicit token limit
const response = await openAIClient.chat.completions.create({
  model: "gpt-4o",
  messages: userMessages,
  temperature: 0.7
});

// After: Explicit max_completion_tokens parameter
const response = await openAIClient.chat.completions.create({
  model: "gpt-4o",
  messages: userMessages,
  temperature: 0.7,
  max_completion_tokens: 1024  // Explicit ceiling
});

Why This Matters

  1. Cost Predictability: Each API call now has a known upper bound, making billing forecasting reliable.
  2. Faster Responses: Shorter maximum lengths mean the model stops generating sooner, reducing latency.
  3. Better UX: Concise, focused hints align with CodeGrind’s goal of guiding users without doing the work for them.
  4. Azure Compliance: Using max_completion_tokens (not the deprecated max_tokens) ensures compatibility with Azure OpenAI’s latest API versions.

Testing the Change

PR #706 included a verification test:

// Verify the AI service correctly passes max_completion_tokens
test('ai service uses max_completion_tokens', () => {
  const spy = jest.spyOn(openAIClient.chat.completions, 'create');
  // ... call AI service ...
  expect(spy).toHaveBeenCalledWith(
    expect.objectContaining({
      max_completion_tokens: expect.any(Number)
    })
  );
});

This ensures future refactors don’t accidentally revert the fix.

Broader Context: AI Snippet Generation Overhaul

This fix was part of a larger initiative (Issue #705, “Fix AI snippet generation”) to tighten CodeGrind’s AI workflows. The underlying goal: make AI assistance helpful but bounded—guiding learners toward solutions rather than handing them out.

Practical Takeaway

If you’re integrating LLMs into your platform, always set explicit token limits. It’s a small change with outsized benefits:

  • ✅ Predictable costs and latency
  • ✅ Controlled response quality
  • ✅ Easier to test and debug
  • ✅ Better user experience

Learn more about CodeGrind’s AI architecture on the CodeGrind website, and explore the full codebase on GitHub.


What’s next for CodeGrind’s AI? Check out Issue #712—we’re evaluating GPT-5.1 as the base model for even smarter hint generation.