Thank you for sending your enquiry! One of our team members will contact you shortly.
Thank you for sending your booking! One of our team members will contact you shortly.
Course Outline
AI Sovereignty and Local LLM Deployment
- Challenges of cloud LLMs: data retention policies, usage for training, and foreign jurisdiction issues.
- Ollama architecture: model server, registry, and OpenAI-compatible API integration.
- Comparing Ollama with vLLM, llama.cpp, and Text Generation Inference.
- Understanding model licensing for Llama, Mistral, Qwen, and Gemma.
Installation and Hardware Configuration
- Installing Ollama on Linux with CUDA and ROCm support.
- CPU-only fallback options and AVX/AVX2 optimization techniques.
- Deploying via Docker and configuring persistent volume mappings.
- Setting up multi-GPU environments and managing VRAM allocation.
Model Management
- Downloading models from the Ollama registry, e.g., 'ollama pull llama3'.
- Importing GGUF models from HuggingFace and TheBloke.
- Understanding quantization levels: Q4_K_M, Q5_K_M, and Q8_0 trade-offs.
- Switching between models and understanding concurrent loading limits.
Custom Modelfiles
- Writing Modelfile syntax using FROM, PARAMETER, SYSTEM, and TEMPLATE directives.
- Tuning parameters such as temperature, top_p, and repeat_penalty.
- Engineering system prompts for specific behavioral outcomes.
- Creating and publishing custom models to your local registry.
API Integration
- Utilizing the OpenAI-compatible /v1/chat/completions endpoint.
- Handling streaming responses and enabling JSON mode.
- Integrating local LLMs with LangChain, LlamaIndex, and custom applications.
- Implementing authentication and rate limiting using a reverse proxy.
Performance Optimization
- Managing context window sizing and KV cache efficiency.
- Handling batch inference and parallel requests.
- Allocating CPU threads and understanding NUMA (Non-Uniform Memory Access) awareness.
- Monitoring GPU utilization and memory pressure in real-time.
Security and Compliance
- Ensuring network isolation for model serving endpoints.
- Implementing input filtering and output moderation pipelines.
- Maintaining audit logs for prompts and completions.
- Verifying model provenance and hash integrity.
Requirements
- Intermediate knowledge of Linux and container administration.
- High-level understanding of machine learning concepts and transformer models.
- Familiarity with REST APIs and JSON data formats.
Target Audience
- AI engineers and developers seeking alternatives to cloud LLM APIs.
- Organizations handling sensitive data that cannot be stored in the cloud.
- Government and defense teams requiring air-gapped, secure language models.
14 Hours