Why Run Locally?
Privacy
Your transcriptions never leave your machine. Ideal for sensitive notes.
Offline
Works without an internet connection once models are downloaded.
Zero Cost
No API bills or subscriptions. Use it as much as your hardware allows.
Getting Started
To keep the app size manageable, Pipit doesn’t bundle its own LLM engine. Instead, it connects to local “inference servers” using an OpenAI-compatible API. The two most popular ways to run local models on macOS are Ollama and LM Studio.
Option 1: Ollama (Recommended)
Ollama is a command-line tool that makes running models extremely simple, and it’s one of the most efficient ways to run local AI on a Mac.
1. Installation
Download it from ollama.com or install it via Homebrew:
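If you go the Homebrew route, the install is a single command (the formula is named ollama):

```bash
brew install ollama
```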
2. Start the Server
Ollama runs as a background process. You can start it from your Applications folder or via the terminal:
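From the terminal, ollama serve runs the server in the foreground; if you launch the Ollama app from Applications instead, the server starts in the background automatically:

```bash
ollama serve
```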
3. Download a Model
Open a terminal and “pull” the model you want to use. We suggest starting with Llama 3.2:
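The default llama3.2 tag currently resolves to the lightweight 3B variant, which is a good starting point:

```bash
ollama pull llama3.2
```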
4. Configure Pipit
- Open Pipit Settings → AI Processing.
- Select Custom Endpoint as the provider.
- Use the following settings:
| Setting | Value |
|---|---|
| Endpoint URL | http://localhost:11434/v1 |
| API Key | (Leave blank) |
| Model Name | llama3.2 |
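Before switching over to Pipit, you can sanity-check the endpoint from the terminal. A minimal request sketch, assuming you pulled llama3.2 above; it should come back as an OpenAI-style JSON chat completion:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Reply with one word: ready"}]
      }'
```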
Option 2: LM Studio
LM Studio provides a beautiful graphical interface. Use this if you prefer finding and managing models through a UI rather than the terminal.
1. Installation
Download the macOS version from lmstudio.ai.
2. Download a Model
- Open LM Studio and search for Llama 3.2 or Qwen 2.5.
- Click Download on a version that fits your RAM (look for “Recommended”).
3. Start the Local Server
- Click the Developer (double chevron) icon in the sidebar.
- Select your downloaded model from the dropdown at the top.
- Click Start Server.
4. Configure Pipit
| Setting | Value |
|---|---|
| Endpoint URL | http://localhost:1234/v1 |
| API Key | (Leave blank) |
| Model Name | Use the exact name shown in LM Studio |
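The Model Name must match the identifier LM Studio reports. If you’re unsure what that is, you can ask the local server directly (assuming the default port of 1234):

```bash
curl http://localhost:1234/v1/models
```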
Recommended Models
For the best experience in Pipit (balancing speed and accuracy), we recommend:
| Model | Size | RAM Required | Performance |
|---|---|---|---|
| Llama 3.2 (3B) | ~2GB | 8GB+ | Fast, great for general cleanup |
| Qwen 2.5 (3B) | ~2GB | 8GB+ | Excellent at following formatting |
| Llama 3.1 (8B) | ~5GB | 16GB+ | More “intelligent” but slower |
| DeepSeek R1 (7B) | ~5GB | 16GB+ | Exceptional for technical content |
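If you’re using Ollama, these map roughly to the tags below. Tag names change over time, so treat them as examples and check the Ollama library for the current ones:

```bash
ollama pull llama3.2:3b      # Llama 3.2 (3B)
ollama pull qwen2.5:3b       # Qwen 2.5 (3B)
ollama pull llama3.1:8b      # Llama 3.1 (8B)
ollama pull deepseek-r1:7b   # DeepSeek R1 (7B distill)
```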
Troubleshooting Local Models
Connection Refused
If Pipit says it can’t connect:
- Is the server running? Run curl http://localhost:11434/v1/models (for Ollama) or check the “Start Server” button in LM Studio; see the quick checks below.
- Check the port: Ensure the port in Pipit matches the server (11434 for Ollama, 1234 for LM Studio).
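Both servers expose an OpenAI-style models endpoint you can query from the terminal; if a command fails with “connection refused”, that server isn’t running (ports assumed to be the defaults):

```bash
curl http://localhost:11434/v1/models   # Ollama
curl http://localhost:1234/v1/models    # LM Studio
```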
Model Not Found
- Spelling: The model name must be exact. In Ollama, run ollama list to see the exact names.
- Is it loaded? In LM Studio, you must explicitly load the model into memory before starting the server.
Slow Processing
- Reduce Model Size: If you have 8GB of RAM, avoid 8B+ models. Stick to “3B” or smaller.
- GPU Offloading: In LM Studio, ensure GPU offloading is enabled on Apple Silicon (M1/M2/M3/M4) so the model runs on the GPU via Metal rather than the CPU.
- Background Apps: Close memory-heavy apps like Chrome or Photoshop if you experience lag.
