
Fine-Tuned SLM

Domain-Specific Mistral 7B, Trained on Customer Data

A Mistral 7B Instruct model fine-tuned with LoRA on 53 Bounteous policy PDFs (575 pages, 2,276 Q&A pairs), quantized to GGUF Q5_K_M. Runs locally via llama.cpp on Apple Silicon. Data never leaves the environment — a core differentiator for regulated industries.
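Local inference over the quantized model can be sketched with llama-cpp-python. This is a minimal illustration, not the deployed app: the GGUF filename is hypothetical, and the Mistral Instruct prompt template shown is the standard one for this model family.

```python
def build_prompt(question: str, system: str = "") -> str:
    """Format a question in the Mistral Instruct chat template."""
    sys_part = f"{system}\n\n" if system else ""
    return f"<s>[INST] {sys_part}{question} [/INST]"


def run_local(question: str) -> str:
    """Answer a question fully on-device via llama.cpp (no network calls)."""
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path="mistral-7b-policy-q5_k_m.gguf",  # hypothetical filename
        n_ctx=4096,
        n_gpu_layers=-1,  # offload all layers to Metal on Apple Silicon
    )
    out = llm(build_prompt(question), max_tokens=256)
    return out["choices"][0]["text"].strip()
```

Because the model file and all computation stay on the local machine, nothing about the query or the policy corpus leaves the environment.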

Video walkthrough coming soon (2-3 minute overview).


Key Capabilities

  • Fine-tuned Mistral 7B Instruct v0.3 with LoRA (rank 16, 4-bit quantization)
  • Training data: 53 PDFs, 575 pages, 2,276 Q&A pairs extracted and synthesized
  • GGUF quantization (Q5_K_M) for efficient local inference via llama.cpp
  • Runs on Apple Silicon (M4 Max) — no cloud GPU required
  • Complete pipeline: PDF extraction, Q&A synthesis, training, quantization, deployment
  • Replicable for any customer domain: supply their documents, produce their SLM
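Two steps of the pipeline above can be sketched in a few lines: packaging synthesized Q&A pairs into instruction-tuning records, and declaring the LoRA adapter. Assumptions are flagged inline; the rank of 16 comes from the list above, but the other hyperparameters (alpha, dropout, target modules) are illustrative, not confirmed values.

```python
import json


def to_training_record(question: str, answer: str) -> dict:
    """Wrap one Q&A pair in the Mistral Instruct template for supervised fine-tuning."""
    return {"text": f"<s>[INST] {question} [/INST] {answer}</s>"}


def write_jsonl(pairs, path: str) -> None:
    """Dump (question, answer) pairs as a JSON-lines training file."""
    with open(path, "w") as f:
        for q, a in pairs:
            f.write(json.dumps(to_training_record(q, a)) + "\n")


def make_lora_config():
    """Build the LoRA adapter config (requires `pip install peft`)."""
    from peft import LoraConfig

    return LoraConfig(
        r=16,              # rank 16, as stated above
        lora_alpha=32,     # assumed; a common choice is 2x the rank
        lora_dropout=0.05, # assumed
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
        task_type="CAUSAL_LM",
    )
```

The same two functions are all that changes when replicating the pipeline for a new customer domain: swap in their extracted Q&A pairs and retrain.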

By the Numbers

  • Zero data leakage
  • On-prem inference
  • End-to-end pipeline

Tech Stack

Mistral 7B · LoRA / PEFT · llama.cpp · GGUF Q5_K_M · Gradio + FastAPI · Sentence Transformers