Building a Self-Healing LLM JSON Processor with Zod and Ollama (Deepseek 8b)
Updated October 2025: added coverage of Ollama's native JSON mode, a model comparison guide, and performance benchmarks to help you choose the right model for structured output.
In this post, I want to show you how to build a reliable system for getting structured data from language models. We’ll focus on making sure the JSON responses are valid and fixing them when they’re not. The best part? We’ll do this using Ollama and the DeepSeek-R1 model - a reasoning model that can “think through” problems before answering, which is particularly useful for complex analysis tasks like content moderation.
DeepSeek-R1 is different from standard language models because it outputs its chain-of-thought reasoning process (visible in <think> tags in the examples below). While this makes it excellent for complex decision-making, you might prefer faster, non-reasoning models like Qwen or Llama for simple structured output tasks. We’ll cover when to use each approach later in this guide.
Some benefits of this approach are:
- Language models can give us data in a structured format we can use in our applications
- We can automatically fix errors when the model gives us bad JSON
- The system is type-safe, which helps catch bugs early
- We can retry failed requests with helpful feedback
- Everything runs locally - perfect for development and debugging
If you haven’t worked with Zod before, it’s a TypeScript-first schema validation library that helps us define and enforce data shapes. Think of it like a strict security guard that checks if data matches exactly what we expect. We’ll use it to make sure our AI responses are in the correct format, and it will help us catch and fix any mismatches.
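If you haven't seen Zod in action, here is a minimal sketch of that "security guard" behaviour (the schema and field names are made up purely for illustration): parse returns the typed data when it matches, and throws a descriptive ZodError when it doesn't.

```typescript
import { z } from 'zod';

// A toy schema, just to show the idea
const CommentSchema = z.object({
  author: z.string(),
  upvotes: z.number().min(0),
});

CommentSchema.parse({ author: 'Ada', upvotes: 3 });       // returns the typed object
CommentSchema.parse({ author: 'Ada', upvotes: 'three' }); // throws a ZodError describing the mismatch
```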
To follow along, you’ll need:
- Node.js installed
- Basic TypeScript knowledge (we’ll use types to make our code more reliable)
- Ollama installed (see my Ollama setup post for a guide)
- The deepseek-r1:8b model pulled in Ollama (ollama pull deepseek-r1:8b)
1. The Problem We’re Solving
Let’s say we’re building a content moderation system. We want to analyze user comments and decide if they’re safe to post. The language model needs to tell us several things:
- Is the content toxic?
- How likely is it to be spam?
- What category of content is it?
- Should we approve it, reject it, or review it?
Language models are great at understanding and analyzing text, but they can be inconsistent with their output format. We need to make sure we get data in a structure our application can use.
Here’s the format we want:
```typescript
interface ModerationResult {
  toxic: boolean;
  spamLikelihood: number; // between 0 and 1
  contentCategory: string;
  recommendedAction: 'approve' | 'reject' | 'review';
  confidence: number;
  explanation: string;
}
```

But language models, including local ones like DeepSeek, don't always give us perfect JSON. Here are some common problems:

```
// Missing quotes around strings
{
  toxic: false,
  spamLikelihood: 0.2,
  contentCategory: blog_comment, // Should be "blog_comment"
  recommendedAction: "approve",
  confidence: 0.95,
  explanation: "Looks like a regular comment"
}
```

```
// Wrong data types
{
  "toxic": "false",               // Should be a boolean, not a string
  "spamLikelihood": "low",        // Should be a number
  "recommendedAction": "APPROVE", // Wrong format
  "confidence": 95,               // Should be between 0 and 1
  "explanation": "Looks like a regular comment"
}
```

2. Building Our Solution
Let’s build this step by step. First, we’ll define what valid data looks like using Zod:
```typescript
import { z } from 'zod';

const ModerationSchema = z.object({
  toxic: z.boolean(),
  spamLikelihood: z.number()
    .min(0)
    .max(1)
    .describe('How likely is this spam (0-1)'),
  contentCategory: z.string(),
  recommendedAction: z.enum(['approve', 'reject', 'review']),
  confidence: z.number()
    .min(0)
    .max(1),
  explanation: z.string()
    .min(1)
    .max(500)
});

type ModerationResult = z.infer<typeof ModerationSchema>;
```
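As a quick sanity check (not part of the processor we'll build below), here is roughly what happens if we run the "wrong data types" response from section 1 through this schema with safeParse: every mismatched or missing field is reported with its path and message.

```typescript
// The "wrong data types" response from section 1 fails validation
const badResponse = {
  toxic: 'false',               // string instead of boolean
  spamLikelihood: 'low',        // string instead of number
  recommendedAction: 'APPROVE', // not one of the allowed enum values
  confidence: 95,               // outside the 0-1 range
  explanation: 'Looks like a regular comment',
};

const check = ModerationSchema.safeParse(badResponse);
if (!check.success) {
  // Each issue names the offending field, e.g. toxic, spamLikelihood, contentCategory (missing)
  for (const issue of check.error.issues) {
    console.log(issue.path.join('.'), '-', issue.message);
  }
}
```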
2.1 Choosing the Right Model
Before we continue with the implementation, let's discuss model selection. Not all models are equal for structured output tasks:
For Complex Analysis with Reasoning (Like Content Moderation):
- deepseek-r1:8b or deepseek-r1:14b
- Best when you need the model to "think through" complex decisions
- Shows chain-of-thought reasoning in <think> tags
- Excellent for nuanced analysis and edge cases
- Trade-off: slower (3-8 seconds) and more verbose output
- Use when: decision-making requires context and judgment
For Fast, Reliable Structured Output:
- qwen2.5:7b - Excellent at JSON, faster than reasoning models (1-3 seconds)
- llama3.2:3b - Very fast, good for simple schemas (sub-second responses)
- mistral:7b - Reliable for structured data with good instruction following
When to Use Each:
- Use reasoning models (DeepSeek-R1) when the task requires judgment, context understanding, or complex decision-making
- Use standard models (Qwen, Llama) when you just need reliable JSON output from straightforward transformations
For this content moderation example, we’ll stick with DeepSeek-R1 because we want the model to actually reason about whether content is toxic or spam - not just pattern match.
2.2 The Processor Class
Now let’s create our processor class that will:
- Talk to Ollama
- Get JSON from its responses
- Check if the JSON is valid
- Try again if something goes wrong
Here’s the complete implementation:
```typescript
import { z } from 'zod';

/**
 * OllamaModerationProcessor handles content moderation using local LLM inference.
 * It processes text content and returns structured moderation decisions with built-in
 * error recovery and validation.
 */
class OllamaModerationProcessor {
  private maxRetries: number;
  private retryDelay: number;

  /**
   * Initialize the processor with retry settings
   * @param options Configuration for retry behavior
   */
  constructor(options: { maxRetries?: number; retryDelay?: number } = {}) {
    this.maxRetries = options.maxRetries ?? 3;
    this.retryDelay = options.retryDelay ?? 1000;
  }

  /**
   * Makes HTTP requests to the Ollama API
   * @param prompt The text prompt to send to the model
   * @returns Raw response string from the model
   */
  private async callOllama(prompt: string): Promise<string> {
    try {
      const response = await fetch('http://localhost:11434/api/generate', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          model: 'deepseek-r1:8b', // or deepseek-r1:14b if you've pulled the larger model
          prompt: prompt,
          stream: false,
          format: 'json' // Forces JSON output - reduces malformed responses
        })
      });

      if (!response.ok) {
        throw new Error(`Ollama API error: ${response.statusText}`);
      }

      const data = await response.json();
      return data.response;
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      throw new Error(`Failed to call Ollama: ${message}`);
    }
  }

  /**
   * Extracts valid JSON from the model's response text
   * Uses regex to find JSON objects even if surrounded by additional text
   * @param text Raw response from the model
   * @returns Parsed JSON object
   */
  private extractJSON(text: string): any {
    // Regular expression to find a JSON object, handling nested structures
    const match = text.match(/{(?:[^{}]|{(?:[^{}]|{[^{}]*})*})*}/);
    if (!match) {
      throw new Error('No JSON object found in response');
    }

    try {
      return JSON.parse(match[0]);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      throw new Error(`Failed to parse JSON: ${message}`);
    }
  }

  /**
   * Utility function to pause execution
   * Used between retry attempts to avoid overwhelming the API
   */
  private async delay(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  /**
   * Main method to moderate content
   * Includes retry logic and error handling
   * @param content The text content to moderate
   * @returns Validated moderation result
   */
  async moderateContent(content: string): Promise<ModerationResult> {
    // Initial prompt template with example
    let currentPrompt = `Analyze this content:
${content}

Give me a JSON object with:
- toxic: true/false for toxic content
- spamLikelihood: number from 0-1
- contentCategory: what kind of content this is
- recommendedAction: "approve", "reject", or "review"
- confidence: number from 0-1
- explanation: why you made this decision

Here's an example of the exact format I need:
{
  "toxic": false,
  "spamLikelihood": 0.1,
  "contentCategory": "blog_comment",
  "recommendedAction": "approve",
  "confidence": 0.95,
  "explanation": "This appears to be a legitimate comment discussing the topic"
}

Only give me valid JSON that matches this format.`;

    // Retry loop with an increasing delay between attempts
    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        const response = await this.callOllama(currentPrompt);
        const jsonResponse = this.extractJSON(response);
        const validatedResponse = ModerationSchema.parse(jsonResponse);
        return validatedResponse;
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        if (attempt === this.maxRetries) {
          throw new Error(`Failed after ${this.maxRetries} tries: ${message}`);
        }

        // Update prompt with error information for next attempt
        currentPrompt = `Your last response had an error: ${message}
Please fix it and give me valid JSON.

Original request:
${currentPrompt}`;

        await this.delay(this.retryDelay * attempt);
      }
    }

    throw new Error("Something went wrong");
  }
}
```

3. Real-World Examples
Let’s see how this works in practice. First, we’ll create our processor:
```typescript
const processor = new OllamaModerationProcessor();
```

Performance Expectations
Before we look at examples, here’s what to expect when running locally on a MacBook Pro (M1/M2) or similar hardware:
DeepSeek-R1:8b:
- First run (cold start): 15-30 seconds
- Subsequent runs: 3-8 seconds
- Memory: ~8GB RAM
- Best for: Complex analysis requiring reasoning
DeepSeek-R1:14b (more capable but slower):
- First run: 20-40 seconds
- Subsequent runs: 5-12 seconds
- Memory: ~12GB RAM
Alternative Models (if you don’t need reasoning):
- Qwen2.5:7b: 1-3 seconds per request, ~6GB RAM
- Llama3.2:3b: Sub-second responses, ~4GB RAM
For production use cases requiring <1s response times, consider API-based models (OpenAI, Anthropic) or dedicated GPU servers.
Example 1: Normal Comment
```typescript
const result1 = await processor.moderateContent(
  "Great article! I learned a lot about Docker containers."
);
console.log('Example 1 result:', result1);
```

The output for sample 1:
{ "toxic": false, "spamLikelihood": 0.1, "contentCategory": "blog_comment", "recommendedAction": "approve", "confidence": 0.95, "explanation": "This is a genuine and positive comment contributing constructively to the discussion."}The validation passes and the JSON is generated by Ollama. Deepseek 8b used locally. No need to ask the model to fix it.
Example 2: Spam Comment
```typescript
const result2 = await processor.moderateContent(
  "BUY NOW!!! Cheap watches r0lex at amazingdeals123.biz"
);
console.log('Example 2 result:', result2);
```

For example 2, the JSON output is also generated correctly on the first attempt:
{ "toxic": false, "spamLikelihood": 0.9, "contentCategory": "promotion", "recommendedAction": "review", "confidence": 0.85, "explanation": "The content is promotional and appears to be a commercial advertisement. It uses urgency and uppercase letters typical of spam, but it's not inherently toxic."}Example 3: Error and Retry
Sometimes the model gives us invalid JSON. It's rare with good prompting, so to force a failure for this demo I removed the response fields and the sample JSON from the prompt; the validation error is then fed back to the model as part of the retry prompt. Here's how our system handles it:
```text
📝 Starting content moderation...
Content to moderate: Great article! I learned a lot about Docker containers.

🔄 Attempt 1 of 3

🚀 Calling Ollama API...
Prompt length: 130 characters
Making API request to http://localhost:11434/api/generate
✅ API request successful
Response length: 1768 characters

Raw response from Ollama:
<think>
Alright, so the user provided an analysis of their experience reading an article on Docker containers and wants only valid JSON in a specific format.

First, I need to understand what exactly they're asking for. They mentioned "valid JSON" matching a particular structure. Looking at their example response, it's structured with keys like "article_rating", "content_summary", etc.

I should extract the main points from their content: learning about Docker containers, finding the article helpful and well-written, recommending it to others, being satisfied with the information, and wanting to apply what was learned.

Next, I'll map these points into the JSON structure they provided. Making sure each key corresponds correctly and the values are accurate based on their analysis.

I should also ensure that the JSON syntax is correct, with proper use of quotes and commas, avoiding any trailing commas or syntax errors.

Finally, present this JSON response clearly, so it's easy for them to integrate into whatever system they're using.
</think>

{
  "article_rating": "5/5",
  "content_summary": {
    "topic": "Docker Containers",
    "key_points": [
      "Learned about the core concepts of Docker containers.",
      "Understood their usage in application development and deployment.",
      "Appreciated the efficiency and scalability benefits."
    ],
    "overall_impression": "Extremely informative and well-written article. Highly recommend to anyone interested in Docker technology."
  },
  "personal_reaction": {
    "engagement_level": "Highly engaged",
    "satisfaction": "Very satisfied with the information provided.",
    "action_plan": "Plan to implement Docker containers in future projects based on learned knowledge."
  }
}

🔍 Attempting to extract JSON from response...
Input text length: 1768 characters
Found JSON pattern, attempting to parse...
✅ Successfully parsed JSON

Extracted JSON: {
  "article_rating": "5/5",
  "content_summary": {
    "topic": "Docker Containers",
    "key_points": [
      "Learned about the core concepts of Docker containers.",
      "Understood their usage in application development and deployment.",
      "Appreciated the efficiency and scalability benefits."
    ],
    "overall_impression": "Extremely informative and well-written article. Highly recommend to anyone interested in Docker technology."
  },
  "personal_reaction": {
    "engagement_level": "Highly engaged",
    "satisfaction": "Very satisfied with the information provided.",
    "action_plan": "Plan to implement Docker containers in future projects based on learned knowledge."
  }
}

✨ Validating response against schema...

❌ Attempt 1 failed: [
  { "code": "invalid_type", "expected": "boolean", "received": "undefined", "path": ["toxic"], "message": "Required" },
  { "code": "invalid_type", "expected": "number", "received": "undefined", "path": ["spamLikelihood"], "message": "Required" },
  { "code": "invalid_type", "expected": "string", "received": "undefined", "path": ["contentCategory"], "message": "Required" },
  { "expected": "'approve' | 'reject' | 'review'", "received": "undefined", "code": "invalid_type", "path": ["recommendedAction"], "message": "Required" },
  { "code": "invalid_type", "expected": "number", "received": "undefined", "path": ["confidence"], "message": "Required" },
  { "code": "invalid_type", "expected": "string", "received": "undefined", "path": ["explanation"], "message": "Required" }
]

📝 Updating prompt with error information...
⏳ Delaying for 1000ms before retry...

🔄 Attempt 2 of 3

🚀 Calling Ollama API...
Prompt length: 1210 characters
Making API request to http://localhost:11434/api/generate
✅ API request successful
Response length: 1323 characters

Raw response from Ollama:
<think>
Okay, so the user sent me some data, but the initial response had errors because of invalid types. My job is to figure out what's wrong and fix it.

Looking at the error messages, I see that several properties are missing or undefined. The first error says "toxic" is undefined; it expects a boolean. Then there's "spamLikelihood" expecting a number, "contentCategory" needing a string, "recommendedAction" should be one of 'approve', 'reject', or 'review', and both "confidence" and "explanation" are missing.

The user provided a content: "Great article! I learned a lot about Docker containers." So, I need to analyze this. Is it toxic? Probably not. Spam likelihood is low, maybe 0. Maybe content category is something like 'article' or 'educational'.

Recommended action should be 'approve' since the content seems positive. Confidence is how sure I am, so around 0.95. Explanation would briefly say why.

Putting it all together into the correct JSON structure. Let me make sure each field matches what's expected.
</think>

Here is the valid JSON response:

{
  "toxic": false,
  "spamLikelihood": 0,
  "contentCategory": "article",
  "recommendedAction": "approve",
  "confidence": 0.95,
  "explanation": "The content is a positive review of Docker containers, which is relevant and non-spam."
}

🔍 Attempting to extract JSON from response...
Input text length: 1323 characters
Found JSON pattern, attempting to parse...
✅ Successfully parsed JSON

Extracted JSON: {
  "toxic": false,
  "spamLikelihood": 0,
  "contentCategory": "article",
  "recommendedAction": "approve",
  "confidence": 0.95,
  "explanation": "The content is a positive review of Docker containers, which is relevant and non-spam."
}

✨ Validating response against schema...
✅ Validation successful
✅ Example 1 result: {
  toxic: false,
  spamLikelihood: 0,
  contentCategory: 'article',
  recommendedAction: 'approve',
  confidence: 0.95,
  explanation: 'The content is a positive review of Docker containers, which is relevant and non-spam.'
}
```

A Note on Troubleshooting
If you run into issues, first ensure you're running the latest version of Ollama (ollama --version), as older versions might not support newer models like DeepSeek-R1. The most common problems are usually related to setup: make sure Ollama is running with ollama serve, and that you've successfully pulled the model with ollama pull deepseek-r1:8b. When the model first loads, responses might take 15-30 seconds, but subsequent calls are much faster. For JSON parsing issues, our code's error handling and retry mechanism should handle most edge cases automatically, but you might need to adjust the retry count or delay if you're getting inconsistent results.
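If you want the processor to fail fast with a clearer message, one option is a small pre-flight check against Ollama's /api/tags endpoint, which lists the models you've pulled locally. This checkOllama helper isn't part of the processor above; it's just a rough sketch you could call on startup:

```typescript
// Rough pre-flight check: is Ollama running, and is the model pulled?
async function checkOllama(model = 'deepseek-r1:8b'): Promise<void> {
  let response: Response;
  try {
    response = await fetch('http://localhost:11434/api/tags');
  } catch {
    throw new Error('Ollama does not appear to be running. Start it with "ollama serve".');
  }

  const data = await response.json();
  const names: string[] = (data.models ?? []).map((m: { name: string }) => m.name);

  if (!names.some(name => name.startsWith(model))) {
    throw new Error(`Model "${model}" not found locally. Run "ollama pull ${model}" first.`);
  }
}
```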
Advanced Tips for Better Results
While our base implementation works well for most cases, here are some advanced patterns you might consider:
- Response Correction: You can add pre-validation cleanup steps to handle common issues like string booleans ("true" vs true) or percentages in the wrong format (95 vs 0.95). This makes your system more resilient to minor model output variations (see the sketch after this list).
- Smart Retries: Instead of just retrying with the same prompt, you can analyze what parts of the response were valid and specifically ask the model to fix the invalid parts. For example, if only the 'confidence' field is wrong, you can focus the retry prompt on just fixing that field.
- Context-Aware Rules: Consider adjusting your validation rules based on the input content. For instance, you might want stricter spam checking for content containing URLs, or you might accept lower confidence scores for very short inputs.
- Error Pattern Learning: Keep track of the most common validation errors you see. This can help you improve your base prompt or add specific pre-validation fixes for recurring issues.
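To make the first tip concrete, here's a rough sketch of a pre-validation cleanup pass. The normalizeModeration helper is hypothetical (it's not part of the processor above); you would call it on the extracted JSON just before ModerationSchema.parse:

```typescript
// Illustrative pre-validation cleanup for common model output quirks
function normalizeModeration(raw: any): any {
  const cleaned = { ...raw };

  // "true"/"false" strings -> real booleans
  if (cleaned.toxic === 'true') cleaned.toxic = true;
  if (cleaned.toxic === 'false') cleaned.toxic = false;

  // Percentages like 95 -> fractions like 0.95
  for (const key of ['spamLikelihood', 'confidence']) {
    if (typeof cleaned[key] === 'number' && cleaned[key] > 1 && cleaned[key] <= 100) {
      cleaned[key] = cleaned[key] / 100;
    }
  }

  // Uppercase actions like "APPROVE" -> "approve"
  if (typeof cleaned.recommendedAction === 'string') {
    cleaned.recommendedAction = cleaned.recommendedAction.toLowerCase();
  }

  return cleaned;
}

// In the retry loop:
// const validatedResponse = ModerationSchema.parse(normalizeModeration(jsonResponse));
```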
The beauty of using Zod for validation is that you can start simple and gradually add these improvements as you learn what kinds of errors your specific use case encounters most often.
Ollama’s JSON Mode
One of the most important improvements since this pattern was first developed is Ollama’s native JSON mode. By adding format: "json" to your API request (as shown in our code above), Ollama will enforce JSON output at the model level. This significantly reduces malformed responses.
However, this doesn’t replace Zod validation. Ollama’s JSON mode ensures syntactically valid JSON, but it doesn’t guarantee:
- Correct data types (could still return strings instead of booleans)
- Required fields are present
- Values are within acceptable ranges
- Business logic constraints are met
Think of it as a two-layer defense:
- Ollama JSON mode - Prevents syntax errors (missing brackets, invalid JSON structure)
- Zod validation - Ensures semantic correctness (right types, valid values, business rules)
Together, they make your system highly reliable with minimal retries needed.
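Stripped of the retry machinery, the two layers together look roughly like this (a sketch that reuses ModerationSchema from earlier; the prompt string is just a placeholder):

```typescript
// Layer 1: Ollama's JSON mode guarantees syntactically valid JSON
const res = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'deepseek-r1:8b',
    prompt: 'Moderate this comment and reply in the JSON format described earlier: "Great post!"',
    stream: false,
    format: 'json',
  }),
});
const { response: raw } = await res.json();

// Layer 2: Zod guarantees semantic correctness (types, ranges, required fields)
const parsed = ModerationSchema.safeParse(JSON.parse(raw));
if (!parsed.success) {
  console.error(parsed.error.issues); // e.g. spamLikelihood came back as a string
}
```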
Learning From Errors
One of the most powerful aspects of this system is its ability to learn from failures. When the model provides invalid JSON, our retry mechanism doesn’t just try again blindly - it includes specific feedback about what went wrong. This creates a feedback loop where each retry attempt becomes more focused and effective.
For example, if the model consistently formats boolean values as strings (like “true” instead of true), we can update our prompt to explicitly warn against this pattern. The key is to collect and analyze these validation errors over time, helping us refine our prompts and improve the system’s reliability.
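One lightweight way to collect those errors is to tally Zod issues by field path and error code across runs. The sketch below is illustrative rather than something the processor above does out of the box:

```typescript
import { z } from 'zod';

// Count how often each field fails validation so recurring problems stand out
const errorCounts = new Map<string, number>();

function recordValidationError(error: z.ZodError): void {
  for (const issue of error.issues) {
    const key = `${issue.path.join('.')}: ${issue.code}`;
    errorCounts.set(key, (errorCounts.get(key) ?? 0) + 1);
  }
}

// After a batch of runs, the top entries suggest what to fix in the prompt,
// e.g. "toxic: invalid_type" appearing often means booleans keep coming back as strings
const mostCommon = [...errorCounts.entries()].sort((a, b) => b[1] - a[1]);
console.log(mostCommon.slice(0, 5));
```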
Choosing Your Approach
Now that you understand the full pattern, here’s a decision tree for when to use different approaches:
This Pattern (Self-Healing + Zod + Local Models)
Use when:
- Complex schemas with business rules and validation
- Need type safety and compile-time guarantees
- Acceptable latency: 2-8 seconds (reasoning models) or 1-3 seconds (standard models)
- Want to run completely locally for development/testing
- Privacy is important (data stays local)
Example use cases: Content moderation, data extraction with validation, form processing
Ollama JSON Mode Alone (No Zod)
Use when:
- Simple schemas that don’t change
- Trust the model output
- Need faster development iteration
- Don’t need TypeScript type safety
Trade-off: No compile-time type checking, manual error handling
API-Based Structured Output (OpenAI, Anthropic)
Use when:
- Need sub-second response times
- Production systems with high availability requirements
- Complex schemas where you need guaranteed conformance
- Can afford API costs (~$0.001-0.01 per request)
Example: OpenAI’s structured outputs, Anthropic’s JSON mode with schema
Tool/Function Calling
Use when:
- Model needs to choose between multiple possible actions
- Building agent-like systems
- Need the model to decide what to do, not just format data
Available in: Many Ollama models, OpenAI, Anthropic, etc.
For most applications, starting with this local pattern (Ollama + Zod + self-healing) is ideal. You can develop and test everything locally, then migrate to API-based solutions only if you need the performance in production.
Building Trust in LLM Outputs
What we’ve built here goes beyond just getting JSON from a language model - it’s a pattern for making LLM outputs reliable enough for production use. By combining Zod’s strict validation with Ollama’s local inference and our retry mechanism, we’ve created a system that can recover from failures and learn from its mistakes.
Running this locally with DeepSeek-R1 makes it perfect for development and testing. You get quick iteration cycles, complete privacy, and no API costs. While you could adapt this code to work with API-based models like Claude or GPT-4, having everything run locally makes development much more efficient.
The principles we’ve covered - strict validation, intelligent retries, and error feedback - can be applied to any scenario where you need structured data from LLMs. Whether you’re building a content moderation system, a data extraction pipeline, or any other LLM-powered tool, this pattern helps bridge the gap between the creative capabilities of language models and the strict requirements of production systems.
Ready to Build with LLMs?
The concepts in this post are just the start. My free 11-page cheat sheet gives you copy-paste prompts and patterns to get reliable, structured output from any model.
Related Articles
- LLM Prompting Techniques for Developers
- Extending LLM Capabilities with Custom Tools: Beyond the Knowledge Cutoff
- Intro to Ollama: Your Personal AI Model Tool
- RAG Systems Deep Dive Part 3: Advanced Features and Performance Optimization