How to Set Up and Run Local AI Models Using Ollama and LM Studio

Learn how to run LLM locally with Ollama and LM Studio. Step-by-step setup for Windows, Mac & Linux — private, fast, and cloud-free AI on your own hardware.

Nuhin Jakaria

June 2026
•
11 mins read

How to Set Up and Run Local AI Models Using Ollama and LM Studio

Every time you type something into a cloud AI tool, that data travels to a server you don't control.

It gets processed under laws that may not protect your privacy the way Canadian regulations require - and the companies running those servers aren't always transparent about what happens next.

If you've been searching for a guide on how to run LLM locally, you're thinking about this the right way. Running your own AI models means your prompts never leave your machine.

Quick definition: An LLM (large language model) is the AI technology behind tools like ChatGPT. Running one "locally" means it runs directly on your computer - not on a remote server in another country.

This guide covers two tools - Ollama and LM Studio - from installation through first use. It also covers the hardware basics and the security risks most setup tutorials skip entirely.

Getting your local AI stack running is one side of the equation. Understanding the threats around AI tools is the other. Our fully online Cybersecurity Fundamentals (AI Threats) course gives Canadian professionals a practical foundation in AI-specific risks - flexible, self-paced, with same-day certification available.

Quick Comparison: Ollama vs LM Studio

Not sure which tool to use? Here's the short version before you read any further.

Feature	Ollama	LM Studio
Interface	Command line (terminal)	Visual desktop app - no coding needed
Best For	Developers, API integration	Beginners, quick setup
Platforms	Windows, Mac, Linux	Windows, Mac, Linux
Model Browser	No - type commands to download	Yes - built-in browser, click to download
OpenAI API	Yes	Yes (Developer Mode)
Skill Level	Intermediate	Beginner-friendly
Setup Time	~5 minutes	~5 minutes

Both tools use the same underlying engine and support the same models. The only difference is the experience: terminal vs. app.

Why Move Your AI Stack Offline?

The Cost Problem With Cloud AI

Cloud AI subscriptions add up fast. One seat of ChatGPT Plus or Claude Pro costs around USD $20–25 per month. For a 20-person team, that's $400–500 every month - before you factor in rate limits that interrupt your work at peak hours.

Local AI has no subscription fee and no rate limits. Your hardware is the ceiling, and you own it.

Canadian Privacy Law - The Short Version

Canada has strict rules about how personal data gets processed - including through AI tools.

The short version: if your team is using a US-hosted AI tool to handle employee records, client information, or confidential documents, that data may be accessible to US authorities under American law. It's a compliance risk that catches many Canadian organizations off guard.

Running AI locally keeps data on your machine. It never crosses a border.

Want the specifics? Canada's federal privacy law (PIPEDA) holds organizations accountable for how personal data is processed, including by third-party AI services. Quebec's Law 25 - fully in force since September 2023 - adds stricter documentation requirements for any data processing activity. According to the IBM 2024 Cost of a Data Breach Report, the average Canadian breach now costs USD $5.13 million, above the global average. Local AI removes that particular exposure entirely.

Flowchart comparing data path for cloud AI versus local AI, showing how local models keep data on-device and avoid cross-border data transfer risks

Hardware Blueprint: What You Need to Run LLM Locally

You don't need enterprise-grade hardware. A modern consumer laptop or desktop with 16GB of RAM and a mid-range GPU can run capable 7B-parameter models at a conversational pace.

The VRAM Rule: Matching Models to Your Machine

The simplest estimation: multiply the model's parameter count by 0.5 to get approximate VRAM requirements at Q4 quantization. A 7B-parameter model needs roughly 4–5GB of GPU memory; a 13B model needs 7–9GB.

Colour-coded hardware requirements table for running local LLMs, showing VRAM and RAM needed for 7B, 13B, 34B, and 70B parameter models at Q4_K_M quantization

If your GPU falls short, both Ollama and LM Studio support partial GPU offloading - model layers are split between GPU and CPU. Inference slows down but remains usable for non-latency-sensitive tasks.

Understanding Quantization: Why Q4_K_M Is the Sweet Spot

Quantization compresses a model's weights by reducing their numerical precision. A full-precision (FP16) model is accurate but large. A quantized model trades a small quality margin for a substantial reduction in file size and memory footprint.

The standard format for local inference is GGUF, maintained by the llama.cpp project. Within the GGUF ecosystem, Q4_K_M has become the community default. The "K" refers to K-quants - a technique that applies higher precision selectively to the model layers most sensitive to accuracy loss. The "_M" indicates a medium tradeoff between compression and output quality. For most users, Q4_K_M delivers near-FP16 quality at roughly 30–35% of the original file size.

If storage isn't a constraint and output quality matters - for code generation or detailed document analysis - Q8_0 is the next tier up. Avoid Q2_K for serious work; quality degradation becomes noticeable in longer outputs. Once you understand quantization, you have everything you need to know how to run LLM locally on almost any modern machine.

Method 1: Setting Up Ollama (The Developer-First Choice)

The first way to run LLM locally is through Ollama - an open-source framework with a workflow that will feel familiar to anyone who has used Docker. It has crossed 100,000 GitHub stars and supports over 200 model families through its public library. For developers who want API integration, a clean CLI, or programmatic control over model management, it's the natural starting point.

Terminal window showing Ollama pulling the Llama 3 model and launching an interactive local AI session on a macOS machine

Installation: Windows, Mac, and Linux

Download the native installer from ollama.com. On Linux, a single curl command handles the full setup:

curl -fsSL https://ollama.com/install.sh | sh

macOS and Windows users get a native application installer. After installation, confirm it's running:

ollama --version

Command Line Essentials: Pull, Run, List

Llama 3 8B is a strong starting model - capable, widely supported, and manageable in size. To download and launch it:

ollama pull llama3

ollama run llama3

ollama pull downloads the model weights to your local machine. ollama run opens an interactive terminal session. To view all downloaded models:

ollama list

To end a session from the terminal, type /bye. To run a one-shot prompt non-interactively:

ollama run llama3 "Summarize the key points of Canadian PIPEDA compliance"

Activating Ollama's OpenAI-Compatible API Layer

Ollama exposes a local REST API at http://localhost:11434 that mirrors the OpenAI API structure. Any tool built against OpenAI's API - LangChain, Open-WebUI, Cursor, Continue.dev - can be redirected to your local Ollama instance with a single endpoint change and no API key required.

This makes Ollama a practical drop-in backend for existing developer workflows, with no application code rewriting.

Method 2: Setting Up LM Studio

The second approach to run LLM locally is LM Studio - a visual desktop app that handles everything through a graphical interface.

No terminal. No commands. Click, download, and chat.

LM Studio desktop application model browser showing model options with file sizes and VRAM compatibility indicators

Finding and Downloading Models

LM Studio's Discover tab connects directly to Hugging Face - the largest public repository of AI models. Search for a model name, and you'll see all available file sizes and compression options listed clearly.

The interface flags models that won't fit your available VRAM before you download. Stick to models from recognized publishers: Meta, Google, Mistral AI, or Microsoft.

The GPU Offload Slider

Once a model loads, you'll see a slider that controls how many layers run on your GPU vs. your CPU.

More GPU layers means faster responses. If the model won't load, reduce the slider until it does. Start at maximum and work backwards only if needed.

Chatting With Your Own Documents

LM Studio lets you drop PDFs directly into the chat. The model reads your document and responds based on its contents - entirely on your machine, with nothing sent to any server.

For Canadian teams analyzing internal contracts, HR policies, or compliance documents, this is the standout use case for local AI.

Quick definition: RAG (Retrieval-Augmented Generation) means the AI answers questions based on documents you provide - not just its general training. Think of it as asking the AI "based only on this file, what does it say about X?"

The Security Paradox: Are Local AI Models Actually Secure?

Running LLM locally shifts your risk profile - it significantly reduces cloud exposure, but it introduces different threats that most setup guides quietly omit. Understanding these isn't optional if you're deploying local AI in an organizational context.

Supply Chain Risks: Malicious GGUF Model Weights

Model weight files are binary data. Like any binary downloaded from the internet, they can be tampered with. In 2024, JFrog Security researchers identified models hosted on Hugging Face capable of spawning reverse shells on the host machine at load time - disguised as legitimate GGUF files. Only download models from the official release pages of organizations like Meta, Google, Mistral AI, or Microsoft, and cross-reference SHA-256 checksums against publisher release notes where available.

This type of attack - weaponizing AI model distribution as an infection vector - is one of several AI-specific threat patterns examined in our fully online Cybersecurity Fundamentals (AI Threats) course. It's a practical program built for Canadian professionals navigating AI adoption, covering how attackers target AI infrastructure and what defenses apply at both the individual and organizational level.

Prompt Injection and Code Execution on Host Machines

If your local model is connected to tools - file system access, shell commands, web browsing, email - prompt injection becomes a real attack surface. A malicious document processed by a tool-enabled agent could instruct the model to execute arbitrary commands on the host machine. The mitigation is architectural: limit tool permissions to the minimum necessary scope, and confine agent file access to specific designated directories rather than granting filesystem-wide access.

Prompt injection is one of the most underestimated entry points in AI-powered workflows. Our Cybersecurity Fundamentals (AI Threats) course covers the full attack surface - including prompt injection, model manipulation, and agentic exploitation - in a format that requires no prior security background. Fully online, self-paced, and applicable from day one.

Localhost Binding: Keeping Ports Off Your LAN

By default, Ollama binds to 127.0.0.1 - accessible only from the local machine. In 2024, Wiz Research disclosed CVE-2024-37032 ("Probllama"), a path traversal vulnerability in Ollama prior to v0.1.34 that allowed remote code execution when the server was exposed to a network interface. The vulnerability was patched, but the principle holds: never bind your local AI API to 0.0.0.0 without a reverse proxy (Nginx or Caddy) and an authentication layer in front of it.

Network diagram comparing safe localhost binding at 127.0.0.1 versus risky 0.0.0.0 binding that exposes local AI server to LAN and internet, with recommended reverse proxy and authentication configuration

Conclusion: Performance, Privacy, and a Defensible Setup

Knowing how to run LLM locally is no longer a niche skill for the technically adventurous - it's a practical infrastructure decision with real compliance and cost arguments behind it. The decision on how to run LLM locally comes down to two variables: your comfort with the command line, and how tightly you need to integrate with existing developer tooling. Knowing how to run AI models locally is one thing; making those decisions defensibly within a Canadian regulatory context is another.

If you're comfortable with a terminal and want tight developer tooling integration, Ollama is the right starting point. If you want a visual interface with minimal setup friction, LM Studio gets you running in under ten minutes. Either path gives you a private, cost-controlled way to run LLM locally - start with a 7B Q4_K_M model, source it from a verified publisher, bind your API to localhost, and put a reverse proxy in front of any multi-user deployment.

If this guide has raised questions about how AI threats fit into your organization's broader security posture, our fully online Cybersecurity Fundamentals (AI Threats) course is the structured next step. Built for Canadian professionals, it covers AI-specific threat categories, practical defensive frameworks, and real-world scenarios - all at your own pace, with certification you can apply immediately.

Your data is yours. Your infrastructure is defensible. Keep it that way.

Item added to your cart

How to Set Up and Run Local AI Models Using Ollama and LM Studio

Nuhin Jakaria

Quick Comparison: Ollama vs LM Studio

Why Move Your AI Stack Offline?

The Cost Problem With Cloud AI

Canadian Privacy Law - The Short Version

Hardware Blueprint: What You Need to Run LLM Locally

The VRAM Rule: Matching Models to Your Machine

Understanding Quantization: Why Q4_K_M Is the Sweet Spot

Method 1: Setting Up Ollama (The Developer-First Choice)

Installation: Windows, Mac, and Linux

Command Line Essentials: Pull, Run, List

Activating Ollama's OpenAI-Compatible API Layer

Method 2: Setting Up LM Studio

Finding and Downloading Models

The GPU Offload Slider

Chatting With Your Own Documents

The Security Paradox: Are Local AI Models Actually Secure?

Supply Chain Risks: Malicious GGUF Model Weights

Prompt Injection and Code Execution on Host Machines

Localhost Binding: Keeping Ports Off Your LAN

Conclusion: Performance, Privacy, and a Defensible Setup

Leave a Comment

Country/region

How to Set Up and Run Local AI Models Using Ollama and LM Studio

Nuhin Jakaria

Quick Comparison: Ollama vs LM Studio

Why Move Your AI Stack Offline?

The Cost Problem With Cloud AI

Canadian Privacy Law - The Short Version

Hardware Blueprint: What You Need to Run LLM Locally

The VRAM Rule: Matching Models to Your Machine

Understanding Quantization: Why Q4_K_M Is the Sweet Spot

Method 1: Setting Up Ollama (The Developer-First Choice)

Installation: Windows, Mac, and Linux

Command Line Essentials: Pull, Run, List

Activating Ollama's OpenAI-Compatible API Layer

Method 2: Setting Up LM Studio

Finding and Downloading Models

The GPU Offload Slider

Chatting With Your Own Documents

The Security Paradox: Are Local AI Models Actually Secure?

Supply Chain Risks: Malicious GGUF Model Weights

Prompt Injection and Code Execution on Host Machines

Localhost Binding: Keeping Ports Off Your LAN

Conclusion: Performance, Privacy, and a Defensible Setup

Leave a Comment