Fine-Tuned SLM
Domain-Specific Mistral 7B, Trained on Customer Data
A Mistral 7B Instruct model fine-tuned with LoRA on 53 Bounteous policy PDFs (575 pages, 2,276 Q&A pairs), quantized to GGUF Q5_K_M. Runs locally via llama.cpp on Apple Silicon. Data never leaves the environment — a core differentiator for regulated industries.
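The 2,276 Q&A pairs are formatted into Mistral's instruct chat template before LoRA training. A minimal sketch of that formatting step is below; the field names and example pair are illustrative, not the project's actual schema.

```python
import json

def format_example(question: str, answer: str) -> dict:
    """Wrap one Q&A pair in the Mistral [INST] ... [/INST] instruct template."""
    return {"text": f"<s>[INST] {question} [/INST] {answer}</s>"}

# Hypothetical extracted pair, for illustration only.
pairs = [
    ("What is the remote-work policy?",
     "Employees may work remotely up to three days per week."),
]

# One JSON record per line (JSONL), a common input format for fine-tuning.
jsonl = "\n".join(json.dumps(format_example(q, a)) for q, a in pairs)
print(jsonl)
```

Each record is a single string, so the same file can feed most supervised fine-tuning trainers without further preprocessing.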
Video walkthrough coming soon (2-3 minute overview)

Live Demo: http://localhost:7860/gradio
Key Capabilities
- Fine-tuned Mistral 7B Instruct v0.3 with LoRA (rank 16, 4-bit quantization)
- Training data: 53 PDFs, 575 pages, 2,276 Q&A pairs extracted and synthesized
- GGUF quantization (Q5_K_M) for efficient local inference via llama.cpp
- Runs on Apple Silicon (M4 Max) — no cloud GPU required
- Complete pipeline: PDF extraction, Q&A synthesis, training, quantization, deployment
- Replicable for any customer domain: supply their documents, produce their SLM
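The conversion and quantization stages of the pipeline can be sketched with llama.cpp's standard tooling. This assumes the LoRA adapter has already been merged into the base model (e.g. via PEFT's `merge_and_unload`); all paths and file names below are placeholders.

```shell
# 1. Convert the merged Hugging Face model to GGUF at fp16.
python convert_hf_to_gguf.py ./merged-mistral-7b \
    --outfile mistral-7b-policy-f16.gguf --outtype f16

# 2. Quantize to Q5_K_M for efficient local inference.
./llama-quantize mistral-7b-policy-f16.gguf \
    mistral-7b-policy-Q5_K_M.gguf Q5_K_M

# 3. Smoke-test on Apple Silicon (llama.cpp uses Metal by default).
./llama-cli -m mistral-7b-policy-Q5_K_M.gguf \
    -p "[INST] What is the remote-work policy? [/INST]"
```

Q5_K_M trades a small quality loss for roughly a 2.5x size reduction versus fp16, which is what makes 7B-class inference practical on a laptop.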
By the Numbers
- Zero data leakage
- On-prem inference
- End-to-end pipeline
Tech Stack
Mistral 7B · LoRA / PEFT · llama.cpp · GGUF Q5_K_M · Gradio + FastAPI · Sentence Transformers