Building a Self-Healing LLM JSON Processor with Zod and Ollama (Deepseek 8b)
Updated October 2025: added coverage of Ollama's native JSON mode, a model comparison guide, and performance benchmarks to help you choose the right model for structured output.
In this post, I want to show you how to build a reliable system for getting structured data from language models. We’ll focus on making sure the JSON responses are valid and fixing them when they’re not. The best part? We’ll do this using Ollama and the DeepSeek-R1 model - a reasoning model that can “think through” problems before answering, which is particularly useful for complex analysis tasks like content moderation.
DeepSeek-R1 is different from standard language models because it outputs its chain-of-thought reasoning process (visible in <think> tags in the examples below). While this makes it excellent for complex decision-making, you might prefer faster, non-reasoning models like Qwen or Llama for simple structured output tasks. We’ll cover when to use each approach later in this guide.
Some benefits of this approach are:
- Language models can give us data in a structured format we can use in our applications
- We can automatically fix errors when the model gives us bad JSON
- The system is type-safe, which helps catch bugs early
- We can retry failed requests with helpful feedback
- Everything runs locally - perfect for development and debugging
If you haven’t worked with Zod before, it’s a TypeScript-first schema validation library that helps us define and enforce data shapes. Think of it like a strict security guard that checks if data matches exactly what we expect. We’ll use it to make sure our AI responses are in the correct format, and it will help us catch and fix any mismatches.
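If you haven't seen Zod in action, here is a minimal sketch of that "security guard" behaviour (the schema and field names are made up purely for illustration): parse returns the typed data when it matches, and throws a descriptive ZodError when it doesn't.

```typescript
import { z } from 'zod';

// A toy schema, just to show the idea
const CommentSchema = z.object({
  author: z.string(),
  upvotes: z.number().min(0),
});

CommentSchema.parse({ author: 'Ada', upvotes: 3 });       // returns the typed object
CommentSchema.parse({ author: 'Ada', upvotes: 'three' }); // throws a ZodError describing the mismatch
```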
To follow along, you’ll need:
- Node.js installed
- Basic TypeScript knowledge (we’ll use types to make our code more reliable)
- Ollama installed (see my Ollama setup post for a guide)
- The deepseek-r1:8b model pulled in Ollama (ollama pull deepseek-r1:8b)
1. The Problem We’re Solving
Let’s say we’re building a content moderation system. We want to analyze user comments and decide if they’re safe to post. The language model needs to tell us several things:
- Is the content toxic?
- How likely is it to be spam?
- What category of content is it?
- Should we approve it, reject it, or review it?
Language models are great at understanding and analyzing text, but they can be inconsistent with their output format. We need to make sure we get data in a structure our application can use.
Here’s the format we want:
```typescript
interface ModerationResult {
  toxic: boolean;
  spamLikelihood: number; // between 0 and 1
  contentCategory: string;
  recommendedAction: 'approve' | 'reject' | 'review';
  confidence: number;
  explanation: string;
}
```

But language models, including local ones like DeepSeek, don't always give us perfect JSON. Here are some common problems:

```
// Missing quotes around strings
{
  toxic: false,
  spamLikelihood: 0.2,
  contentCategory: blog_comment, // Should be "blog_comment"
  recommendedAction: "approve",
  confidence: 0.95,
  explanation: "Looks like a regular comment"
}
```

```
// Wrong data types
{
  "toxic": "false",               // Should be a boolean, not a string
  "spamLikelihood": "low",        // Should be a number
  "recommendedAction": "APPROVE", // Wrong format
  "confidence": 95,               // Should be between 0 and 1
  "explanation": "Looks like a regular comment"
}
```

2. Building Our Solution
Let’s build this step by step. First, we’ll define what valid data looks like using Zod:
```typescript
import { z } from 'zod';

const ModerationSchema = z.object({
  toxic: z.boolean(),
  spamLikelihood: z.number()
    .min(0)
    .max(1)
    .describe('How likely is this spam (0-1)'),
  contentCategory: z.string(),
  recommendedAction: z.enum(['approve', 'reject', 'review']),
  confidence: z.number()
    .min(0)
    .max(1),
  explanation: z.string()
    .min(1)
    .max(500)
});

type ModerationResult = z.infer<typeof ModerationSchema>;
```
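As a quick sanity check (not part of the processor we'll build below), here is roughly what happens if we run the "wrong data types" response from section 1 through this schema with safeParse: every mismatched or missing field is reported with its path and message.

```typescript
// The "wrong data types" response from section 1 fails validation
const badResponse = {
  toxic: 'false',               // string instead of boolean
  spamLikelihood: 'low',        // string instead of number
  recommendedAction: 'APPROVE', // not one of the allowed enum values
  confidence: 95,               // outside the 0-1 range
  explanation: 'Looks like a regular comment',
};

const check = ModerationSchema.safeParse(badResponse);
if (!check.success) {
  // Each issue names the offending field, e.g. toxic, spamLikelihood, contentCategory (missing)
  for (const issue of check.error.issues) {
    console.log(issue.path.join('.'), '-', issue.message);
  }
}
```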
2.1 Choosing the Right Model
Before we continue with the implementation, let's discuss model selection. Not all models are equal for structured output tasks:
For Complex Analysis with Reasoning (Like Content Moderation):
- deepseek-r1:8b or deepseek-r1:14b
- Best when you need the model to "think through" complex decisions
- Shows chain-of-thought reasoning in <think> tags
- Excellent for nuanced analysis and edge cases
- Trade-off: slower (3-8 seconds) and more verbose output
- Use when: decision-making requires context and judgment
For Fast, Reliable Structured Output:
- qwen2.5:7b - Excellent at JSON, faster than reasoning models (1-3 seconds)
- llama3.2:3b - Very fast, good for simple schemas (sub-second responses)
- mistral:7b - Reliable for structured data with good instruction following
When to Use Each:
- Use reasoning models (DeepSeek-R1) when the task requires judgment, context understanding, or complex decision-making
- Use standard models (Qwen, Llama) when you just need reliable JSON output from straightforward transformations
For this content moderation example, we’ll stick with DeepSeek-R1 because we want the model to actually reason about whether content is toxic or spam - not just pattern match.
2.2 The Processor Class
Now let’s create our processor class that will:
- Talk to Ollama
- Get JSON from its responses
- Check if the JSON is valid
- Try again if something goes wrong
Here’s the complete implementation:
```typescript
import { z } from 'zod';

/**
 * OllamaModerationProcessor handles content moderation using local LLM inference.
 * It processes text content and returns structured moderation decisions with built-in
 * error recovery and validation.
 */
class OllamaModerationProcessor {
  private maxRetries: number;
  private retryDelay: number;

  /**
   * Initialize the processor with retry settings
   * @param options Configuration for retry behavior
   */
  constructor(options: { maxRetries?: number; retryDelay?: number } = {}) {
    this.maxRetries = options.maxRetries ?? 3;
    this.retryDelay = options.retryDelay ?? 1000;
  }

  /**
   * Makes HTTP requests to the Ollama API
   * @param prompt The text prompt to send to the model
   * @returns Raw response string from the model
   */
  private async callOllama(prompt: string): Promise<string> {
    try {
      const response = await fetch('http://localhost:11434/api/generate', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          model: 'deepseek-r1:8b', // or deepseek-r1:14b if you've pulled the larger model
          prompt: prompt,
          stream: false,
          format: 'json' // Forces JSON output - reduces malformed responses
        })
      });

      if (!response.ok) {
        throw new Error(`Ollama API error: ${response.statusText}`);
      }

      const data = await response.json();
      return data.response;
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      throw new Error(`Failed to call Ollama: ${message}`);
    }
  }

  /**
   * Extracts valid JSON from the model's response text
   * Uses regex to find JSON objects even if surrounded by additional text
   * @param text Raw response from the model
   * @returns Parsed JSON object
   */
  private extractJSON(text: string): any {
    // Regular expression to find a JSON object, handling nested structures
    const match = text.match(/{(?:[^{}]|{(?:[^{}]|{[^{}]*})*})*}/);
    if (!match) {
      throw new Error('No JSON object found in response');
    }

    try {
      return JSON.parse(match[0]);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      throw new Error(`Failed to parse JSON: ${message}`);
    }
  }

  /**
   * Utility function to pause execution
   * Used between retry attempts to avoid overwhelming the API
   */
  private async delay(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  /**
   * Main method to moderate content
   * Includes retry logic and error handling
   * @param content The text content to moderate
   * @returns Validated moderation result
   */
  async moderateContent(content: string): Promise<ModerationResult> {
    // Initial prompt template with example
    let currentPrompt = `Analyze this content:
${content}

Give me a JSON object with:
- toxic: true/false for toxic content
- spamLikelihood: number from 0-1
- contentCategory: what kind of content this is
- recommendedAction: "approve", "reject", or "review"
- confidence: number from 0-1
- explanation: why you made this decision

Here's an example of the exact format I need:
{
  "toxic": false,
  "spamLikelihood": 0.1,
  "contentCategory": "blog_comment",
  "recommendedAction": "approve",
  "confidence": 0.95,
  "explanation": "This appears to be a legitimate comment discussing the topic"
}

Only give me valid JSON that matches this format.`;

    // Retry loop with an increasing delay between attempts
    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        const response = await this.callOllama(currentPrompt);
        const jsonResponse = this.extractJSON(response);
        const validatedResponse = ModerationSchema.parse(jsonResponse);
        return validatedResponse;
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        if (attempt === this.maxRetries) {
          throw new Error(`Failed after ${this.maxRetries} tries: ${message}`);
        }

        // Update prompt with error information for next attempt
        currentPrompt = `Your last response had an error: ${message}
Please fix it and give me valid JSON.

Original request:
${currentPrompt}`;

        await this.delay(this.retryDelay * attempt);
      }
    }

    throw new Error("Something went wrong");
  }
}
```

3. Real-World Examples
Let’s see how this works in practice. First, we’ll create our processor:
```typescript
const processor = new OllamaModerationProcessor();
```

Performance Expectations
Before we look at examples, here’s what to expect when running locally on a MacBook Pro (M1/M2) or similar hardware:
DeepSeek-R1:8b:
- First run (cold start): 15-30 seconds
- Subsequent runs: 3-8 seconds
- Memory: ~8GB RAM
- Best for: Complex analysis requiring reasoning
DeepSeek-R1:14b (more capable but slower):
- First run: 20-40 seconds
- Subsequent runs: 5-12 seconds
- Memory: ~12GB RAM
Alternative Models (if you don’t need reasoning):
- Qwen2.5:7b: 1-3 seconds per request, ~6GB RAM
- Llama3.2:3b: Sub-second responses, ~4GB RAM
For production use cases requiring <1s response times, consider API-based models (OpenAI, Anthropic) or dedicated GPU servers.
Example 1: Normal Comment
```typescript
const result1 = await processor.moderateContent(
  "Great article! I learned a lot about Docker containers."
);
console.log('Example 1 result:', result1);
```

The output for sample 1:
{ "toxic": false, "spamLikelihood": 0.1, "contentCategory": "blog_comment", "recommendedAction": "approve", "confidence": 0.95, "explanation": "This is a genuine and positive comment contributing constructively to the discussion."}The validation passes and the JSON is generated by Ollama. Deepseek 8b used locally. No need to ask the model to fix it.
Example 2: Spam Comment
```typescript
const result2 = await processor.moderateContent(
  "BUY NOW!!! Cheap watches r0lex at amazingdeals123.biz"
);
console.log('Example 2 result:', result2);
```

For example 2, the JSON output is also generated correctly on the first attempt:
{ "toxic": false, "spamLikelihood": 0.9, "contentCategory": "promotion", "recommendedAction": "review", "confidence": 0.85, "explanation": "The content is promotional and appears to be a commercial advertisement. It uses urgency and uppercase letters typical of spam, but it's not inherently toxic."}Example 3: Error and Retry
Sometimes the model gives us invalid JSON. It's rare with good prompting, so to force a failure for this demo I removed the response fields and the sample JSON from the prompt; the validation error is then fed back to the model as part of the retry prompt. Here's how our system handles it:
```text
📝 Starting content moderation...
Content to moderate: Great article! I learned a lot about Docker containers.

🔄 Attempt 1 of 3

🚀 Calling Ollama API...
Prompt length: 130 characters
Making API request to http://localhost:11434/api/generate
✅ API request successful
Response length: 1768 characters

Raw response from Ollama:
<think>
Alright, so the user provided an analysis of their experience reading an article on Docker containers and wants only valid JSON in a specific format.

First, I need to understand what exactly they're asking for. They mentioned "valid JSON" matching a particular structure. Looking at their example response, it's structured with keys like "article_rating", "content_summary", etc.

I should extract the main points from their content: learning about Docker containers, finding the article helpful and well-written, recommending it to others, being satisfied with the information, and wanting to apply what was learned.

Next, I'll map these points into the JSON structure they provided. Making sure each key corresponds correctly and the values are accurate based on their analysis.

I should also ensure that the JSON syntax is correct, with proper use of quotes and commas, avoiding any trailing commas or syntax errors.

Finally, present this JSON response clearly, so it's easy for them to integrate into whatever system they're using.
</think>

{
  "article_rating": "5/5",
  "content_summary": {
    "topic": "Docker Containers",
    "key_points": [
      "Learned about the core concepts of Docker containers.",
      "Understood their usage in application development and deployment.",
      "Appreciated the efficiency and scalability benefits."
    ],
    "overall_impression": "Extremely informative and well-written article. Highly recommend to anyone interested in Docker technology."
  },
  "personal_reaction": {
    "engagement_level": "Highly engaged",
    "satisfaction": "Very satisfied with the information provided.",
    "action_plan": "Plan to implement Docker containers in future projects based on learned knowledge."
  }
}

🔍 Attempting to extract JSON from response...
Input text length: 1768 characters
Found JSON pattern, attempting to parse...
✅ Successfully parsed JSON

Extracted JSON: {
  "article_rating": "5/5",
  "content_summary": {
    "topic": "Docker Containers",
    "key_points": [
      "Learned about the core concepts of Docker containers.",
      "Understood their usage in application development and deployment.",
      "Appreciated the efficiency and scalability benefits."
    ],
    "overall_impression": "Extremely informative and well-written article. Highly recommend to anyone interested in Docker technology."
  },
  "personal_reaction": {
    "engagement_level": "Highly engaged",
    "satisfaction": "Very satisfied with the information provided.",
    "action_plan": "Plan to implement Docker containers in future projects based on learned knowledge."
  }
}

✨ Validating response against schema...

❌ Attempt 1 failed: [
  { "code": "invalid_type", "expected": "boolean", "received": "undefined", "path": ["toxic"], "message": "Required" },
  { "code": "invalid_type", "expected": "number", "received": "undefined", "path": ["spamLikelihood"], "message": "Required" },
  { "code": "invalid_type", "expected": "string", "received": "undefined", "path": ["contentCategory"], "message": "Required" },
  { "expected": "'approve' | 'reject' | 'review'", "received": "undefined", "code": "invalid_type", "path": ["recommendedAction"], "message": "Required" },
  { "code": "invalid_type", "expected": "number", "received": "undefined", "path": ["confidence"], "message": "Required" },
  { "code": "invalid_type", "expected": "string", "received": "undefined", "path": ["explanation"], "message": "Required" }
]

📝 Updating prompt with error information...
⏳ Delaying for 1000ms before retry...

🔄 Attempt 2 of 3

🚀 Calling Ollama API...
Prompt length: 1210 characters
Making API request to http://localhost:11434/api/generate
✅ API request successful
Response length: 1323 characters

Raw response from Ollama:
<think>
Okay, so the user sent me some data, but the initial response had errors because of invalid types. My job is to figure out what's wrong and fix it.

Looking at the error messages, I see that several properties are missing or undefined. The first error says "toxic" is undefined; it expects a boolean. Then there's "spamLikelihood" expecting a number, "contentCategory" needing a string, "recommendedAction" should be one of 'approve', 'reject', or 'review', and both "confidence" and "explanation" are missing.

The user provided a content: "Great article! I learned a lot about Docker containers." So, I need to analyze this. Is it toxic? Probably not. Spam likelihood is low, maybe 0. Maybe content category is something like 'article' or 'educational'.

Recommended action should be 'approve' since the content seems positive. Confidence is how sure I am, so around 0.95. Explanation would briefly say why.

Putting it all together into the correct JSON structure. Let me make sure each field matches what's expected.
</think>

Here is the valid JSON response:

{
  "toxic": false,
  "spamLikelihood": 0,
  "contentCategory": "article",
  "recommendedAction": "approve",
  "confidence": 0.95,
  "explanation": "The content is a positive review of Docker containers, which is relevant and non-spam."
}

🔍 Attempting to extract JSON from response...
Input text length: 1323 characters
Found JSON pattern, attempting to parse...
✅ Successfully parsed JSON

Extracted JSON: {
  "toxic": false,
  "spamLikelihood": 0,
  "contentCategory": "article",
  "recommendedAction": "approve",
  "confidence": 0.95,
  "explanation": "The content is a positive review of Docker containers, which is relevant and non-spam."
}

✨ Validating response against schema...
✅ Validation successful
✅ Example 1 result: {
  toxic: false,
  spamLikelihood: 0,
  contentCategory: 'article',
  recommendedAction: 'approve',
  confidence: 0.95,
  explanation: 'The content is a positive review of Docker containers, which is relevant and non-spam.'
}
```

A Note on Troubleshooting
If you run into issues, first ensure you're running the latest version of Ollama (ollama --version), as older versions might not support newer models like DeepSeek-R1. The most common problems are usually related to setup: make sure Ollama is running with ollama serve, and that you've successfully pulled the model with ollama pull deepseek-r1:8b. When the model first loads, responses might take 15-30 seconds, but subsequent calls are much faster. For JSON parsing issues, our code's error handling and retry mechanism should handle most edge cases automatically, but you might need to adjust the retry count or delay if you're getting inconsistent results.
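If you want the processor to fail fast with a clearer message, one option is a small pre-flight check against Ollama's /api/tags endpoint, which lists the models you've pulled locally. This checkOllama helper isn't part of the processor above; it's just a rough sketch you could call on startup:

```typescript
// Rough pre-flight check: is Ollama running, and is the model pulled?
async function checkOllama(model = 'deepseek-r1:8b'): Promise<void> {
  let response: Response;
  try {
    response = await fetch('http://localhost:11434/api/tags');
  } catch {
    throw new Error('Ollama does not appear to be running. Start it with "ollama serve".');
  }

  const data = await response.json();
  const names: string[] = (data.models ?? []).map((m: { name: string }) => m.name);

  if (!names.some(name => name.startsWith(model))) {
    throw new Error(`Model "${model}" not found locally. Run "ollama pull ${model}" first.`);
  }
}
```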
Advanced Tips for Better Results
While our base implementation works well for most cases, here are some advanced patterns you might consider:
- Response Correction: You can add pre-validation cleanup steps to handle common issues like string booleans ("true" vs true) or percentages in the wrong format (95 vs 0.95). This makes your system more resilient to minor model output variations (see the sketch after this list).
- Smart Retries: Instead of just retrying with the same prompt, you can analyze what parts of the response were valid and specifically ask the model to fix the invalid parts. For example, if only the 'confidence' field is wrong, you can focus the retry prompt on just fixing that field.
- Context-Aware Rules: Consider adjusting your validation rules based on the input content. For instance, you might want stricter spam checking for content containing URLs, or you might accept lower confidence scores for very short inputs.
- Error Pattern Learning: Keep track of the most common validation errors you see. This can help you improve your base prompt or add specific pre-validation fixes for recurring issues.
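To make the first tip concrete, here's a rough sketch of a pre-validation cleanup pass. The normalizeModeration helper is hypothetical (it's not part of the processor above); you would call it on the extracted JSON just before ModerationSchema.parse:

```typescript
// Illustrative pre-validation cleanup for common model output quirks
function normalizeModeration(raw: any): any {
  const cleaned = { ...raw };

  // "true"/"false" strings -> real booleans
  if (cleaned.toxic === 'true') cleaned.toxic = true;
  if (cleaned.toxic === 'false') cleaned.toxic = false;

  // Percentages like 95 -> fractions like 0.95
  for (const key of ['spamLikelihood', 'confidence']) {
    if (typeof cleaned[key] === 'number' && cleaned[key] > 1 && cleaned[key] <= 100) {
      cleaned[key] = cleaned[key] / 100;
    }
  }

  // Uppercase actions like "APPROVE" -> "approve"
  if (typeof cleaned.recommendedAction === 'string') {
    cleaned.recommendedAction = cleaned.recommendedAction.toLowerCase();
  }

  return cleaned;
}

// In the retry loop:
// const validatedResponse = ModerationSchema.parse(normalizeModeration(jsonResponse));
```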
The beauty of using Zod for validation is that you can start simple and gradually add these improvements as you learn what kinds of errors your specific use case encounters most often.
Ollama’s JSON Mode
One of the most important improvements since this pattern was first developed is Ollama’s native JSON mode. By adding format: "json" to your API request (as shown in our code above), Ollama will enforce JSON output at the model level. This significantly reduces malformed responses.
However, this doesn’t replace Zod validation. Ollama’s JSON mode ensures syntactically valid JSON, but it doesn’t guarantee:
- Correct data types (could still return strings instead of booleans)
- Required fields are present
- Values are within acceptable ranges
- Business logic constraints are met
Think of it as a two-layer defense:
- Ollama JSON mode - Prevents syntax errors (missing brackets, invalid JSON structure)
- Zod validation - Ensures semantic correctness (right types, valid values, business rules)
Together, they make your system highly reliable with minimal retries needed.
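Stripped of the retry machinery, the two layers together look roughly like this (a sketch that reuses ModerationSchema from earlier; the prompt string is just a placeholder):

```typescript
// Layer 1: Ollama's JSON mode guarantees syntactically valid JSON
const res = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'deepseek-r1:8b',
    prompt: 'Moderate this comment and reply in the JSON format described earlier: "Great post!"',
    stream: false,
    format: 'json',
  }),
});
const { response: raw } = await res.json();

// Layer 2: Zod guarantees semantic correctness (types, ranges, required fields)
const parsed = ModerationSchema.safeParse(JSON.parse(raw));
if (!parsed.success) {
  console.error(parsed.error.issues); // e.g. spamLikelihood came back as a string
}
```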
Learning From Errors
One of the most powerful aspects of this system is its ability to learn from failures. When the model provides invalid JSON, our retry mechanism doesn’t just try again blindly - it includes specific feedback about what went wrong. This creates a feedback loop where each retry attempt becomes more focused and effective.
For example, if the model consistently formats boolean values as strings (like “true” instead of true), we can update our prompt to explicitly warn against this pattern. The key is to collect and analyze these validation errors over time, helping us refine our prompts and improve the system’s reliability.
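One lightweight way to collect those errors is to tally Zod issues by field path and error code across runs. The sketch below is illustrative rather than something the processor above does out of the box:

```typescript
import { z } from 'zod';

// Count how often each field fails validation so recurring problems stand out
const errorCounts = new Map<string, number>();

function recordValidationError(error: z.ZodError): void {
  for (const issue of error.issues) {
    const key = `${issue.path.join('.')}: ${issue.code}`;
    errorCounts.set(key, (errorCounts.get(key) ?? 0) + 1);
  }
}

// After a batch of runs, the top entries suggest what to fix in the prompt,
// e.g. "toxic: invalid_type" appearing often means booleans keep coming back as strings
const mostCommon = [...errorCounts.entries()].sort((a, b) => b[1] - a[1]);
console.log(mostCommon.slice(0, 5));
```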
Choosing Your Approach
Now that you understand the full pattern, here’s a decision tree for when to use different approaches:
This Pattern (Self-Healing + Zod + Local Models)
Use when:
- Complex schemas with business rules and validation
- Need type safety and compile-time guarantees
- Acceptable latency: 2-8 seconds (reasoning models) or 1-3 seconds (standard models)
- Want to run completely locally for development/testing
- Privacy is important (data stays local)
Example use cases: Content moderation, data extraction with validation, form processing
Ollama JSON Mode Alone (No Zod)
Use when:
- Simple schemas that don’t change
- Trust the model output
- Need faster development iteration
- Don’t need TypeScript type safety
Trade-off: No compile-time type checking, manual error handling
API-Based Structured Output (OpenAI, Anthropic)
Use when:
- Need sub-second response times
- Production systems with high availability requirements
- Complex schemas where you need guaranteed conformance
- Can afford API costs (~$0.001-0.01 per request)
Example: OpenAI’s structured outputs, Anthropic’s JSON mode with schema
Tool/Function Calling
Use when:
- Model needs to choose between multiple possible actions
- Building agent-like systems
- Need the model to decide what to do, not just format data
Available in: Many Ollama models, OpenAI, Anthropic, etc.
For most applications, starting with this local pattern (Ollama + Zod + self-healing) is ideal. You can develop and test everything locally, then migrate to API-based solutions only if you need the performance in production.
Building Trust in LLM Outputs
What we’ve built here goes beyond just getting JSON from a language model - it’s a pattern for making LLM outputs reliable enough for production use. By combining Zod’s strict validation with Ollama’s local inference and our retry mechanism, we’ve created a system that can recover from failures and learn from its mistakes.
Running this locally with DeepSeek-R1 makes it perfect for development and testing. You get quick iteration cycles, complete privacy, and no API costs. While you could adapt this code to work with API-based models like Claude or GPT-4, having everything run locally makes development much more efficient.
The principles we’ve covered - strict validation, intelligent retries, and error feedback - can be applied to any scenario where you need structured data from LLMs. Whether you’re building a content moderation system, a data extraction pipeline, or any other LLM-powered tool, this pattern helps bridge the gap between the creative capabilities of language models and the strict requirements of production systems.
Ready to Build with LLMs?
The concepts in this post are just the start. My free 11-page cheat sheet gives you copy-paste prompts and patterns to get reliable, structured output from any model.
Related Articles
- LLM Prompting Techniques for Developers
- Extending LLM Capabilities with Custom Tools: Beyond the Knowledge Cutoff
- Intro to Ollama: Your Personal AI Model Tool
- RAG Systems Deep Dive Part 3: Advanced Features and Performance Optimization