Get in Touch

Course Outline

AI Sovereignty and Local LLM Deployment

  • Challenges of cloud LLMs: data retention policies, usage for training, and foreign jurisdiction issues.
  • Ollama architecture: model server, registry, and OpenAI-compatible API integration.
  • Comparing Ollama with vLLM, llama.cpp, and Text Generation Inference.
  • Understanding model licensing for Llama, Mistral, Qwen, and Gemma.

Installation and Hardware Configuration

  • Installing Ollama on Linux with CUDA and ROCm support.
  • CPU-only fallback options and AVX/AVX2 optimization techniques.
  • Deploying via Docker and configuring persistent volume mappings.
  • Setting up multi-GPU environments and managing VRAM allocation.

Model Management

  • Downloading models from the Ollama registry, e.g., 'ollama pull llama3'.
  • Importing GGUF models from HuggingFace and TheBloke.
  • Understanding quantization levels: Q4_K_M, Q5_K_M, and Q8_0 trade-offs.
  • Switching between models and understanding concurrent loading limits.

Custom Modelfiles

  • Writing Modelfile syntax using FROM, PARAMETER, SYSTEM, and TEMPLATE directives.
  • Tuning parameters such as temperature, top_p, and repeat_penalty.
  • Engineering system prompts for specific behavioral outcomes.
  • Creating and publishing custom models to your local registry.

API Integration

  • Utilizing the OpenAI-compatible /v1/chat/completions endpoint.
  • Handling streaming responses and enabling JSON mode.
  • Integrating local LLMs with LangChain, LlamaIndex, and custom applications.
  • Implementing authentication and rate limiting using a reverse proxy.

Performance Optimization

  • Managing context window sizing and KV cache efficiency.
  • Handling batch inference and parallel requests.
  • Allocating CPU threads and understanding NUMA (Non-Uniform Memory Access) awareness.
  • Monitoring GPU utilization and memory pressure in real-time.

Security and Compliance

  • Ensuring network isolation for model serving endpoints.
  • Implementing input filtering and output moderation pipelines.
  • Maintaining audit logs for prompts and completions.
  • Verifying model provenance and hash integrity.

Requirements

  • Intermediate knowledge of Linux and container administration.
  • High-level understanding of machine learning concepts and transformer models.
  • Familiarity with REST APIs and JSON data formats.

Target Audience

  • AI engineers and developers seeking alternatives to cloud LLM APIs.
  • Organizations handling sensitive data that cannot be stored in the cloud.
  • Government and defense teams requiring air-gapped, secure language models.
 14 Hours

Related Categories