Fine-Tuned SLM
Domain-Specific Mistral 7B, Trained on Customer Data
A Mistral 7B Instruct model fine-tuned with LoRA on 53 Bounteous policy PDFs (575 pages, 2,276 Q&A pairs), quantized to GGUF Q5_K_M. Runs locally via llama.cpp on Apple Silicon. Data never leaves the environment — a core differentiator for regulated industries.
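The 2,276 Q&A pairs are formatted into Mistral's instruct chat template before LoRA training. A minimal sketch of that formatting step is below; the field names and example pair are illustrative, not the project's actual schema.

```python
import json

def format_example(question: str, answer: str) -> dict:
    """Wrap one Q&A pair in the Mistral [INST] ... [/INST] instruct template."""
    return {"text": f"<s>[INST] {question} [/INST] {answer}</s>"}

# Hypothetical extracted pair, for illustration only.
pairs = [
    ("What is the remote-work policy?",
     "Employees may work remotely up to three days per week."),
]

# One JSON record per line (JSONL), a common input format for fine-tuning.
jsonl = "\n".join(json.dumps(format_example(q, a)) for q, a in pairs)
print(jsonl)
```

Each record is a single string, so the same file can feed most supervised fine-tuning trainers without further preprocessing.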
Video walkthrough coming soon (2-3 minute overview)

Live Demo: http://localhost:7860/gradio
Key Capabilities
- Fine-tuned Mistral 7B Instruct v0.3 with LoRA (rank 16, 4-bit quantization)
- Training data: 53 PDFs, 575 pages, 2,276 Q&A pairs extracted and synthesized
- GGUF quantization (Q5_K_M) for efficient local inference via llama.cpp
- Runs on Apple Silicon (M4 Max) — no cloud GPU required
- Complete pipeline: PDF extraction, Q&A synthesis, training, quantization, deployment
- Replicable for any customer domain: supply their documents, produce their SLM
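The conversion and quantization stages of the pipeline can be sketched with llama.cpp's standard tooling. This assumes the LoRA adapter has already been merged into the base model (e.g. via PEFT's `merge_and_unload`); all paths and file names below are placeholders.

```shell
# 1. Convert the merged Hugging Face model to GGUF at fp16.
python convert_hf_to_gguf.py ./merged-mistral-7b \
    --outfile mistral-7b-policy-f16.gguf --outtype f16

# 2. Quantize to Q5_K_M for efficient local inference.
./llama-quantize mistral-7b-policy-f16.gguf \
    mistral-7b-policy-Q5_K_M.gguf Q5_K_M

# 3. Smoke-test on Apple Silicon (llama.cpp uses Metal by default).
./llama-cli -m mistral-7b-policy-Q5_K_M.gguf \
    -p "[INST] What is the remote-work policy? [/INST]"
```

Q5_K_M trades a small quality loss for roughly a 2.5x size reduction versus fp16, which is what makes 7B-class inference practical on a laptop.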
By the Numbers
- Zero data leakage
- On-prem inference
- End-to-end pipeline
Tech Stack
Mistral 7B · LoRA / PEFT · llama.cpp · GGUF Q5_K_M · Gradio + FastAPI · Sentence Transformers