Ollama: Self-Hosted Large Language Models Replacing OpenAI and Claude APIs Training Course

Ollama is an open-source tool for running large language models locally on consumer and enterprise hardware. It abstracts model quantization, GPU allocation, and API serving into a single command-line interface, enabling organizations to self-host LLMs like Llama, Mistral, and Qwen without sending prompts or data to OpenAI, Anthropic, or Google.

This instructor-led, live training (online or onsite) is aimed at intermediate AI engineers and platform operators who wish to use Ollama to replace cloud LLM APIs with self-hosted, sovereign language model inference.

By the end of this training, participants will be able to:

Install Ollama on Linux, macOS, and Windows with GPU support.
Pull, quantize, and serve models from the Ollama registry and HuggingFace.
Build custom Modelfiles with system prompts and parameter tuning.
Integrate local LLMs with applications via the OpenAI-compatible API.
Optimize inference performance for CPU-only and multi-GPU setups.

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request a customized training for this course, please contact us to arrange.

This course is available as onsite live training in Kenya or online live training.

Thank you for sending your enquiry! One of our team members will contact you shortly.

Thank you for sending your booking! One of our team members will contact you shortly.

Course Outline

AI Sovereignty and LLM Local Deployment

Risks of cloud LLMs: data retention, training on inputs, foreign jurisdiction.
Ollama architecture: model server, registry, and OpenAI-compatible API.
Comparison with vLLM, llama.cpp, and Text Generation Inference.
Model licensing: Llama, Mistral, Qwen, and Gemma terms.

Installation and Hardware Setup

Installing Ollama on Linux with CUDA and ROCm support.
CPU-only fallback and AVX/AVX2 optimization.
Docker deployment and persistent volume mapping.
Multi-GPU setup and VRAM allocation strategies.

Model Management

Pulling models from the Ollama registry: ollama pull llama3.
Importing GGUF models from HuggingFace and TheBloke.
Quantization levels: Q4_K_M, Q5_K_M, Q8_0 tradeoffs.
Model switching and concurrent model loading limits.

Custom Modelfiles

Writing Modelfile syntax: FROM, PARAMETER, SYSTEM, TEMPLATE.
Temperature, top_p, and repeat_penalty tuning.
System prompt engineering for role-specific behavior.
Creating and publishing custom models to local registry.

API Integration

OpenAI-compatible /v1/chat/completions endpoint.
Streaming responses and JSON mode.
Integrating with LangChain, LlamaIndex, and custom apps.
Authentication and rate limiting with reverse proxy.

Performance Optimization

Context window sizing and KV cache management.
Batch inference and parallel request handling.
CPU thread allocation and NUMA awareness.
Monitoring GPU utilization and memory pressure.

Security and Compliance

Network isolation for model serving endpoints.
Input filtering and output moderation pipelines.
Audit logging of prompts and completions.
Model provenance and hash verification.

Requirements

Intermediate Linux and container administration.
Understanding of machine learning and transformer models at high level.
Familiarity with REST APIs and JSON.

Audience

AI engineers and developers replacing cloud LLM APIs.
Organizations with data sensitivity preventing cloud model usage.
Government and defense teams requiring air-gapped language models.

14 Hours

Need help picking the right course?
southafrica@nobleprog.co.za or +27 (0)10 005 5793

Ollama: Self-Hosted Large Language Models Replacing OpenAI and Claude APIs Training Course

Course Outline

Requirements

Related Categories

This site in other countries/regions

Europe

Asia Pacific

North America

South America

Africa / Middle East

Other sites

Ollama: Self-Hosted Large Language Models Replacing OpenAI and Claude APIs Training Course

Course Outline

Requirements

Related Courses

Advanced Ollama Model Debugging & Evaluation

Building Private AI Workflows with Ollama

Deploying and Optimizing LLMs with Ollama

EXO: End-to-End Local AI Cluster Deployment

EXO for DevOps: Building Private AI Infrastructure

EXO Security and Governance: Offline Model Management

Fine-Tuning and Customizing AI Models on Ollama

Secure Local Agentic AI: On-Prem Ollama Development for Regulated Industries

Multimodal Applications with Ollama

Getting Started with Ollama: Running Local AI Models

Ollama & Data Privacy: Secure Deployment Patterns

Ollama Applications in Finance

Ollama Applications in Healthcare

Ollama for Responsible AI and Governance

Sovereign AI for Regulated Organizations: Controlling Data, Models and Inference Environments

Related Categories

Ollama

AI Sovereignty

This site in other countries/regions

Europe

Asia Pacific

North America

South America

Africa / Middle East

Other sites