Every time you type something into a cloud AI tool, that data travels to a server you don't control.
It gets processed under laws that may not protect your privacy the way Canadian regulations require - and the companies running those servers aren't always transparent about what happens next.
If you've been searching for a guide on how to run LLM locally, you're thinking about this the right way. Running your own AI models means your prompts never leave your machine.
Quick definition: An LLM (large language model) is the AI technology behind tools like ChatGPT. Running one "locally" means it runs directly on your computer - not on a remote server in another country.
This guide covers two tools - Ollama and LM Studio - from installation through first use. It also covers the hardware basics and the security risks most setup tutorials skip entirely.
Getting your local AI stack running is one side of the equation. Understanding the threats around AI tools is the other. Our fully online Cybersecurity Fundamentals (AI Threats) course gives Canadian professionals a practical foundation in AI-specific risks - flexible, self-paced, with same-day certification available.
Quick Comparison: Ollama vs LM Studio
Not sure which tool to use? Here's the short version before you read any further.
|
Feature |
Ollama |
LM Studio |
|
Interface |
Command line (terminal) |
Visual desktop app - no coding needed |
|
Best For |
Developers, API integration |
Beginners, quick setup |
|
Platforms |
Windows, Mac, Linux |
Windows, Mac, Linux |
|
Model Browser |
No - type commands to download |
Yes - built-in browser, click to download |
|
OpenAI API |
Yes |
Yes (Developer Mode) |
|
Skill Level |
Intermediate |
Beginner-friendly |
|
Setup Time |
~5 minutes |
~5 minutes |
Both tools use the same underlying engine and support the same models. The only difference is the experience: terminal vs. app.
Why Move Your AI Stack Offline?
The Cost Problem With Cloud AI
Cloud AI subscriptions add up fast. One seat of ChatGPT Plus or Claude Pro costs around USD $20–25 per month. For a 20-person team, that's $400–500 every month - before you factor in rate limits that interrupt your work at peak hours.
Local AI has no subscription fee and no rate limits. Your hardware is the ceiling, and you own it.
Canadian Privacy Law - The Short Version
Canada has strict rules about how personal data gets processed - including through AI tools.
The short version: if your team is using a US-hosted AI tool to handle employee records, client information, or confidential documents, that data may be accessible to US authorities under American law. It's a compliance risk that catches many Canadian organizations off guard.
Running AI locally keeps data on your machine. It never crosses a border.
Want the specifics? Canada's federal privacy law (PIPEDA) holds organizations accountable for how personal data is processed, including by third-party AI services. Quebec's Law 25 - fully in force since September 2023 - adds stricter documentation requirements for any data processing activity. According to the IBM 2024 Cost of a Data Breach Report, the average Canadian breach now costs USD $5.13 million, above the global average. Local AI removes that particular exposure entirely.

Hardware Blueprint: What You Need to Run LLM Locally
You don't need enterprise-grade hardware. A modern consumer laptop or desktop with 16GB of RAM and a mid-range GPU can run capable 7B-parameter models at a conversational pace.
The VRAM Rule: Matching Models to Your Machine
The simplest estimation: multiply the model's parameter count by 0.5 to get approximate VRAM requirements at Q4 quantization. A 7B-parameter model needs roughly 4–5GB of GPU memory; a 13B model needs 7–9GB.

Understanding Quantization: Why Q4_K_M Is the Sweet Spot
Quantization compresses a model's weights by reducing their numerical precision. A full-precision (FP16) model is accurate but large. A quantized model trades a small quality margin for a substantial reduction in file size and memory footprint.
The standard format for local inference is GGUF, maintained by the llama.cpp project. Within the GGUF ecosystem, Q4_K_M has become the community default. The "K" refers to K-quants - a technique that applies higher precision selectively to the model layers most sensitive to accuracy loss. The "_M" indicates a medium tradeoff between compression and output quality. For most users, Q4_K_M delivers near-FP16 quality at roughly 30–35% of the original file size.
If storage isn't a constraint and output quality matters - for code generation or detailed document analysis - Q8_0 is the next tier up. Avoid Q2_K for serious work; quality degradation becomes noticeable in longer outputs. Once you understand quantization, you have everything you need to know how to run LLM locally on almost any modern machine.
Method 1: Setting Up Ollama (The Developer-First Choice)
The first way to run LLM locally is through Ollama - an open-source framework with a workflow that will feel familiar to anyone who has used Docker. It has crossed 100,000 GitHub stars and supports over 200 model families through its public library. For developers who want API integration, a clean CLI, or programmatic control over model management, it's the natural starting point.

Installation: Windows, Mac, and Linux
Download the native installer from ollama.com. On Linux, a single curl command handles the full setup:
curl -fsSL https://ollama.com/install.sh | sh
macOS and Windows users get a native application installer. After installation, confirm it's running:
ollama --version
Command Line Essentials: Pull, Run, List
Llama 3 8B is a strong starting model - capable, widely supported, and manageable in size. To download and launch it:
ollama pull llama3
ollama run llama3
ollama pull downloads the model weights to your local machine. ollama run opens an interactive terminal session. To view all downloaded models:
ollama list
To end a session from the terminal, type /bye. To run a one-shot prompt non-interactively:
ollama run llama3 "Summarize the key points of Canadian PIPEDA compliance"
Activating Ollama's OpenAI-Compatible API Layer
Ollama exposes a local REST API at http://localhost:11434 that mirrors the OpenAI API structure. Any tool built against OpenAI's API - LangChain, Open-WebUI, Cursor, Continue.dev - can be redirected to your local Ollama instance with a single endpoint change and no API key required.
This makes Ollama a practical drop-in backend for existing developer workflows, with no application code rewriting.
Method 2: Setting Up LM Studio
The second approach to run LLM locally is LM Studio - a visual desktop app that handles everything through a graphical interface.
No terminal. No commands. Click, download, and chat.

Finding and Downloading Models
LM Studio's Discover tab connects directly to Hugging Face - the largest public repository of AI models. Search for a model name, and you'll see all available file sizes and compression options listed clearly.
The interface flags models that won't fit your available VRAM before you download. Stick to models from recognized publishers: Meta, Google, Mistral AI, or Microsoft.
The GPU Offload Slider
Once a model loads, you'll see a slider that controls how many layers run on your GPU vs. your CPU.
More GPU layers means faster responses. If the model won't load, reduce the slider until it does. Start at maximum and work backwards only if needed.
Chatting With Your Own Documents
LM Studio lets you drop PDFs directly into the chat. The model reads your document and responds based on its contents - entirely on your machine, with nothing sent to any server.
For Canadian teams analyzing internal contracts, HR policies, or compliance documents, this is the standout use case for local AI.
Quick definition: RAG (Retrieval-Augmented Generation) means the AI answers questions based on documents you provide - not just its general training. Think of it as asking the AI "based only on this file, what does it say about X?"
The Security Paradox: Are Local AI Models Actually Secure?
Running LLM locally shifts your risk profile - it significantly reduces cloud exposure, but it introduces different threats that most setup guides quietly omit. Understanding these isn't optional if you're deploying local AI in an organizational context.
Supply Chain Risks: Malicious GGUF Model Weights
Model weight files are binary data. Like any binary downloaded from the internet, they can be tampered with. In 2024, JFrog Security researchers identified models hosted on Hugging Face capable of spawning reverse shells on the host machine at load time - disguised as legitimate GGUF files. Only download models from the official release pages of organizations like Meta, Google, Mistral AI, or Microsoft, and cross-reference SHA-256 checksums against publisher release notes where available.
This type of attack - weaponizing AI model distribution as an infection vector - is one of several AI-specific threat patterns examined in our fully online Cybersecurity Fundamentals (AI Threats) course. It's a practical program built for Canadian professionals navigating AI adoption, covering how attackers target AI infrastructure and what defenses apply at both the individual and organizational level.
Prompt Injection and Code Execution on Host Machines
If your local model is connected to tools - file system access, shell commands, web browsing, email - prompt injection becomes a real attack surface. A malicious document processed by a tool-enabled agent could instruct the model to execute arbitrary commands on the host machine. The mitigation is architectural: limit tool permissions to the minimum necessary scope, and confine agent file access to specific designated directories rather than granting filesystem-wide access.
Prompt injection is one of the most underestimated entry points in AI-powered workflows. Our Cybersecurity Fundamentals (AI Threats) course covers the full attack surface - including prompt injection, model manipulation, and agentic exploitation - in a format that requires no prior security background. Fully online, self-paced, and applicable from day one.
Localhost Binding: Keeping Ports Off Your LAN
By default, Ollama binds to 127.0.0.1 - accessible only from the local machine. In 2024, Wiz Research disclosed CVE-2024-37032 ("Probllama"), a path traversal vulnerability in Ollama prior to v0.1.34 that allowed remote code execution when the server was exposed to a network interface. The vulnerability was patched, but the principle holds: never bind your local AI API to 0.0.0.0 without a reverse proxy (Nginx or Caddy) and an authentication layer in front of it.

Conclusion: Performance, Privacy, and a Defensible Setup
Knowing how to run LLM locally is no longer a niche skill for the technically adventurous - it's a practical infrastructure decision with real compliance and cost arguments behind it. The decision on how to run LLM locally comes down to two variables: your comfort with the command line, and how tightly you need to integrate with existing developer tooling. Knowing how to run AI models locally is one thing; making those decisions defensibly within a Canadian regulatory context is another.
If you're comfortable with a terminal and want tight developer tooling integration, Ollama is the right starting point. If you want a visual interface with minimal setup friction, LM Studio gets you running in under ten minutes. Either path gives you a private, cost-controlled way to run LLM locally - start with a 7B Q4_K_M model, source it from a verified publisher, bind your API to localhost, and put a reverse proxy in front of any multi-user deployment.
If this guide has raised questions about how AI threats fit into your organization's broader security posture, our fully online Cybersecurity Fundamentals (AI Threats) course is the structured next step. Built for Canadian professionals, it covers AI-specific threat categories, practical defensive frameworks, and real-world scenarios - all at your own pace, with certification you can apply immediately.
Your data is yours. Your infrastructure is defensible. Keep it that way.
Leave a Comment