A Galaxy-class starship doesn't phone home to Starfleet Command every time it needs to make a decision. It has its own computer core — autonomous, private, and always available, even in the depths of uncharted space.
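Running a local LLM with Ollama gives you the same advantages: the model runs on your own hardware, your prompts stay on your machine, and it keeps answering when the network link goes down.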
Installing Ollama is like fitting a new isolinear chip array into the main computer. One command, and the core comes online:
# macOS / Linux — the replicator handles everything
curl -fsSL https://ollama.com/install.sh | sh
# Or download from https://ollama.com for all platforms
Verify the installation — run a Level 1 diagnostic:
ollama --version
# ollama version is 0.x.x
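If you'd rather run that diagnostic from the bridge console, here is a minimal sketch in Python. The helper names `parse_version` and `installed_ollama_version` are made up for this example, and the regular expression assumes the CLI prints a dotted version number somewhere in its output, as in the line above.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_version(cli_output: str) -> Optional[str]:
    """Pull the first dotted version number (e.g. 0.5.1) out of CLI output."""
    match = re.search(r"\d+\.\d+\.\d+", cli_output)
    return match.group(0) if match else None


def installed_ollama_version() -> Optional[str]:
    """Return the installed Ollama version, or None if the binary is not on PATH."""
    if shutil.which("ollama") is None:
        return None
    result = subprocess.run(
        ["ollama", "--version"], capture_output=True, text=True, check=False
    )
    # Some builds print warnings on stderr, so search both streams.
    return parse_version(result.stdout + result.stderr)
```

Calling `installed_ollama_version()` returns `None` when the core isn't fitted yet, which makes it easy to fail early in a setup script.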
The computer core needs its neural pathways. Pull a model — we'll use llama3.2:3b, compact enough for most workstations yet surprisingly capable:
# Download the model weights (~2GB)
ollama pull llama3.2:3b
This downloads the model to your local storage. Like loading a new database into the ship's memory banks — it only happens once.
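To confirm the weights actually landed in the memory banks, you can ask the Ollama server which models it holds. The sketch below uses only the standard library and assumes the server is listening on its default address, `http://localhost:11434`, and that the `/api/tags` endpoint returns a JSON object with a `models` list. The function names are illustrative.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"  # assumed default address of the local server


def model_names(tags_payload: dict) -> List[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> List[str]:
    """Return the names of downloaded models, or an empty list if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except OSError:  # covers connection refused, timeouts, and urllib's URLError
        return []
```

If `"llama3.2:3b"` shows up in `list_local_models()`, the pull succeeded and won't need repeating.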
Let's make sure the computer core responds:
# Interactive session — like talking to the ship's computer
ollama run llama3.2:3b
>>> What is the current stardate?
If you get a coherent response, the computer core is operational. Press Ctrl+D to exit.
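The same check can be scripted instead of typed. This is a rough sketch against Ollama's REST endpoint `/api/generate`, again assuming the default address `http://localhost:11434`; `build_generate_request` and `ask` are names invented for this example.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str,
    model: str = "llama3.2:3b",
    base_url: str = "http://localhost:11434",  # assumed default address
) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=timeout) as resp:
        return json.load(resp)["response"]
```

With the server running, `print(ask("What is the current stardate?"))` should print the model's reply. Setting `"stream": False` asks for one complete JSON response rather than a stream of chunks.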
Now we connect our bridge console (Python) to the ship's computer. We'll use langchain as the universal interface:
pip install langchain langchain-ollama
from langchain.chat_models import init_chat_model
# Initialize the ship's computer — local, private, always ready
ship_computer = init_chat_model('ollama:llama3.2:3b', temperature=0)
# Issue a command
response = ship_computer.invoke('Computer, status report on warp core.')
print(response.content)
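The temperature=0 setting tells the computer to pick the most likely token at every step, like switching from creative mode to precise diagnostic mode. Output becomes close to repeatable, though identical results are not strictly guaranteed across hardware or model versions. For consistent answers, keep it at 0.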
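To see why temperature has this effect, here is a textbook illustration of temperature-scaled sampling. It is not Ollama's internal code: the function name and the example scores are made up, and treating temperature 0 as "always take the top-scoring token" is the usual convention rather than something the source spells out.

```python
import math
from typing import List


def token_probabilities(logits: List[float], temperature: float) -> List[float]:
    """Turn raw token scores into sampling probabilities at a given temperature."""
    if temperature == 0:
        # Greedy decoding: all probability mass goes to the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]
```

Calling it with the same scores and temperatures of 0, 1.0, and 5.0 shows the pattern: at 0 one token takes everything, and as the temperature rises the probabilities flatten out, so less likely tokens get picked more often.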
Here's where it gets powerful. The ship's computer doesn't just answer questions — it delegates to ship systems. When you ask "Computer, scan the nearby planet", it doesn't generate a made-up scan. It routes the command to the actual sensor array (an MCP tool).
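The flow works like this: you give an order, the model decides whether a ship system is needed and emits a structured tool call, the MCP server executes that call, and the result goes back to the model to phrase the final answer.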
# Example: the ship's computer deciding to use a tool
from langchain.chat_models import init_chat_model
ship_computer = init_chat_model('ollama:llama3.2:3b', temperature=0)
# When connected to MCP tools, the computer can:
# - Read sensor data (Resources)
# - Fire phasers / execute actions (Tools)
# - Run pre-programmed maneuvers (Prompts)
#
# The LLM is the BRAIN. MCP servers are the BODY.
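The brain-and-body split can be sketched without any model at all. Everything below is hypothetical: `stub_model` stands in for the LLM, `scan_planet` stands in for an MCP tool, and the dictionary shapes are invented for illustration rather than taken from the MCP specification.

```python
from typing import Any, Callable, Dict


def scan_planet(name: str) -> str:
    """Hypothetical ship system; a real one would live behind an MCP server."""
    return f"Sensor sweep of {name}: class M, breathable atmosphere"


# The BODY: a registry mapping tool names to callables.
TOOLS: Dict[str, Callable[..., str]] = {"scan_planet": scan_planet}


def stub_model(user_message: str) -> Dict[str, Any]:
    """Stand-in for the BRAIN: returns either a tool call or a direct answer."""
    if "scan" in user_message.lower():
        return {"tool": "scan_planet", "arguments": {"name": "Vulcan"}}
    return {"answer": "No ship systems needed for that request."}


def handle(user_message: str) -> str:
    """Route one request: ask the model, run the tool it names, report the result."""
    decision = stub_model(user_message)
    if "tool" not in decision:
        return decision["answer"]
    tool = TOOLS[decision["tool"]]
    result = tool(**decision["arguments"])
    # A real client would hand the result back to the model for the final wording.
    return f"Computer: {result}"
```

The point of the split is that the model only ever names a tool and its arguments. The code that touches the sensors is ordinary, testable Python that you control.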
Before connecting the computer to live ship systems, test your MCP servers with the MCP Inspector — the engineering diagnostic console:
# Launch the inspector; opens a web UI for testing (the mcp command ships with: pip install "mcp[cli]")
mcp dev your_server.py
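The inspector lets you list the tools, resources, and prompts your server exposes, call each one with test arguments, and read the raw responses.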
Think of it as running a full systems diagnostic from Main Engineering before taking the ship to warp.
The computer core is online. In the next log entry, we'll initialize the warp core — build a full MCP server with FastMCP, configure transports, set up logging, and prepare for deployment. The away team is about to beam down.
Engage.