Running AI models locally is the ultimate way to use Pipit. It ensures your data never leaves your machine, provides offline capabilities, and eliminates per-token costs.

Why Run Locally?

Privacy

Your transcriptions never leave your machine. Ideal for sensitive notes.

Offline

Works without an internet connection once models are downloaded.

Zero Cost

No API bills or subscriptions. Use it as much as your hardware allows.

Getting Started

Pipit doesn’t bundle its own LLM engine, which keeps the app size manageable. Instead, it connects to local “Inference Servers” using an OpenAI-compatible API. The two most popular ways to run local models on macOS are Ollama and LM Studio.

Option 1: Ollama

Ollama is a command-line tool that makes running models extremely simple, and it’s the most resource-efficient way to run local AI on a Mac.

1. Installation

Download from ollama.com or install via Homebrew:
brew install ollama
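If you want to confirm the install worked, you can check the version from the terminal; any version number printed means Ollama is on your PATH:
ollama --version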

2. Start the Server

Ollama runs as a background process. You can start it from your Applications folder or from the terminal:
ollama serve
Pro Tip: For automatic startup when your Mac boots, run: brew services start ollama
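To confirm the server is actually listening, you can query its OpenAI-compatible models endpoint (11434 is Ollama’s default port). A small JSON list of models means you’re ready to continue:
curl http://localhost:11434/v1/models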

3. Download a Model

Open a terminal and “pull” the model you want to use. We suggest starting with Llama 3.2:
ollama pull llama3.2
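As a quick sanity check before wiring anything into Pipit, you can run the model once directly from the terminal (the prompt here is just an example):
ollama run llama3.2 "Clean up this sentence: i think local ai is neat"
The first run takes longer because the model has to be loaded into memory; subsequent requests are much faster.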

4. Configure Pipit

  1. Open Pipit Settings → AI Processing.
  2. Select Custom Endpoint as the provider.
  3. Use the following settings:
Setting       | Value
Endpoint URL  | http://localhost:11434/v1
API Key       | (Leave blank)
Model Name    | llama3.2
Don’t forget the /v1 at the end of the URL! This is required for OpenAI compatibility.
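If you’d like to verify these settings outside of Pipit first, the request below is a minimal sketch of an OpenAI-style call using the same endpoint and model name configured above (the prompt is only an example):
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'
A JSON response containing a "choices" array means the /v1 endpoint is working and Pipit should connect without issues.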

Option 2: LM Studio

LM Studio provides a beautiful graphical interface. Use this if you prefer finding and managing models through a UI rather than the terminal.

1. Installation

Download the macOS version from lmstudio.ai.

2. Download a Model

  1. Open LM Studio and search for Llama 3.2 or Qwen 2.5.
  2. Click Download on a version that fits your RAM (look for “Recommended”).

3. Start the Local Server

  1. Click the Developer (double chevron) icon in the sidebar.
  2. Select your downloaded model from the dropdown at the top.
  3. Click Start Server.
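To double-check that the server started on the default port (1234), you can list the models it exposes; this assumes you kept LM Studio’s default server settings:
curl http://localhost:1234/v1/models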

4. Configure Pipit

Setting       | Value
Endpoint URL  | http://localhost:1234/v1
API Key       | (Leave blank)
Model Name    | Use the exact name shown in LM Studio
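As with Ollama, you can sanity-check the endpoint before opening Pipit. This is a sketch of a minimal OpenAI-style request; replace the placeholder model value with the exact name LM Studio shows for your download:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-name-here",
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'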

Recommended Models

For the best experience in Pipit (balancing speed and accuracy), we recommend:
Model            | Size | RAM Required | Performance
Llama 3.2 (3B)   | ~2GB | 8GB+  | Fast, great for general cleanup
Qwen 2.5 (3B)    | ~2GB | 8GB+  | Excellent at following formatting
Llama 3.1 (8B)   | ~5GB | 16GB+ | More “intelligent” but slower
DeepSeek R1 (7B) | ~5GB | 16GB+ | Exceptional for technical content
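If you’re using Ollama, the pulls below roughly correspond to the table above; the exact tag names can change, so check ollama.com/library if a pull fails:
ollama pull llama3.2        # Llama 3.2 (3B)
ollama pull qwen2.5:3b      # Qwen 2.5 (3B)
ollama pull llama3.1:8b     # Llama 3.1 (8B)
ollama pull deepseek-r1:7b  # DeepSeek R1 (7B)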

Troubleshooting Local Models

Connection Refused

If Pipit says it can’t connect:
  1. Is the server running? Run curl http://localhost:11434/v1/models (for Ollama) or check the “Start Server” button in LM Studio.
  2. Check the Port: Ensure the port in Pipit matches the server (11434 for Ollama, 1234 for LM Studio).
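On macOS you can also check directly whether anything is listening on the expected port; if the command prints nothing, the server isn’t running (substitute 1234 for LM Studio):
lsof -nP -iTCP:11434 -sTCP:LISTEN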

Model Not Found

  1. Spelling: The model name must be exact. In Ollama, run ollama list to see the exact names.
  2. Is it loaded? In LM Studio, you must explicitly load the model into memory before starting the server.
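For Ollama, it helps to compare what you typed in Pipit against what the server itself reports. Both commands below show which models are installed and available to the API:
ollama list
curl http://localhost:11434/v1/models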

Slow Processing

  1. Reduce Model Size: If you have 8GB of RAM, avoid 8B+ models. Stick to “3B” or smaller.
  2. GPU Offloading: In LM Studio, make sure GPU offloading is enabled on Apple Silicon (M1/M2/M3/M4) so inference runs on the Mac’s GPU via Metal.
  3. Background Apps: Close memory-heavy apps like Chrome or Photoshop if you experience lag.
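With Ollama, you can also confirm whether a loaded model is actually running on the GPU; the PROCESSOR column should show GPU rather than CPU on Apple Silicon:
ollama ps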

Timeout Errors

Pipit allows up to 15 seconds for custom/local AI processing. If your local model is very slow, it might time out. Try a smaller model, or make sure your Mac is plugged into power.
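One workaround is to warm the model up before dictating, so the one-time load doesn’t eat into Pipit’s 15-second window. For Ollama, a throwaway request like the one below loads the model into memory, and it stays loaded for a few minutes by default:
ollama run llama3.2 "ok"
In LM Studio, loading the model in the Developer tab before starting the server has the same effect.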