NVIDIA GPU Setup for LXC Containers
This guide covers setting up NVIDIA GPU passthrough for botserver running in LXC containers, enabling hardware acceleration for local LLM inference.
Prerequisites
- NVIDIA GPU (RTX 3060 or better with 12GB+ VRAM recommended)
- NVIDIA drivers installed on the host system
- LXD/LXC installed
- CUDA-capable GPU
LXD Configuration (Interactive Setup)
When initializing LXD, use these settings:
sudo lxd init
Answer the prompts as follows:
- Would you like to use LXD clustering? → no
- Do you want to configure a new storage pool? → no (will create /generalbots later)
- Would you like to connect to a MAAS server? → no
- Would you like to create a new local network bridge? → yes
- What should the new bridge be called? → lxdbr0
- What IPv4 address should be used? → auto
- What IPv6 address should be used? → auto
- Would you like the LXD server to be available over the network? → no
- Would you like stale cached images to be updated automatically? → no
- Would you like a YAML "lxd init" preseed to be printed? → no
Storage Configuration
- Storage backend name: → default
- Storage backend driver: → zfs
- Create a new ZFS pool? → yes
NVIDIA GPU Configuration
On the Host System
Create a GPU profile and attach it to your container:
# Create GPU profile
lxc profile create gpu
# Add GPU device to profile
lxc profile device add gpu gpu gpu gputype=physical
# Apply GPU profile to your container
lxc profile add gb-system gpu
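Re-running `lxc profile create gpu` fails once the profile exists, so a provisioning script may want an idempotent wrapper around the three commands above. The following is a minimal sketch, not part of botserver: `ensure_gpu_profile` is a hypothetical helper name, and the attachment check assumes that `lxc config show` lists profiles as `- name` lines in its YAML output.

```shell
# ensure_gpu_profile CONTAINER [PROFILE]
# Creates the profile and its gpu device only when missing, then attaches
# the profile to the container. Intended to be safe to re-run.
# (Hypothetical helper; runs in a subshell so its variables stay local.)
ensure_gpu_profile() (
    container=$1
    profile=${2:-gpu}

    # "lxc profile show" exits non-zero when the profile does not exist.
    if ! lxc profile show "$profile" >/dev/null 2>&1; then
        lxc profile create "$profile" || exit 1
    fi

    # Add the physical GPU device only if no device named "gpu" is present.
    if ! lxc profile device list "$profile" | grep -qx 'gpu'; then
        lxc profile device add "$profile" gpu gpu gputype=physical || exit 1
    fi

    # Assumption: attached profiles appear as "- <name>" lines.
    if ! lxc config show "$container" | grep -qx -- "- $profile"; then
        lxc profile add "$container" "$profile"
    fi
)
```

On the host this would be called as `ensure_gpu_profile gb-system`.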
Inside the Container
Configure NVIDIA driver version pinning and install drivers:
- Pin NVIDIA driver versions to ensure stability:
cat > /etc/apt/preferences.d/nvidia-drivers << 'EOF'
Package: *nvidia*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: cuda-drivers*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libcuda*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libxnvctrl*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libnv*
Pin: version 560.35.05-1
Pin-Priority: 1001
EOF
- Install NVIDIA drivers and CUDA toolkit:
# Update package lists
apt update
# Install NVIDIA driver and nvidia-smi
apt install -y nvidia-driver nvidia-smi
# Add CUDA repository
wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
# Install CUDA toolkit
apt-get update
apt-get -y install cuda-toolkit-12-8
apt-get install -y cuda-drivers
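The pin file in the first step repeats the same version in five stanzas, which is easy to get out of sync when the driver is upgraded. As a sketch, the stanzas could be generated from a single version argument. `nvidia_pin_stanzas` is a hypothetical helper name; the package patterns are the ones used in this guide.

```shell
# nvidia_pin_stanzas VERSION
# Prints apt preference stanzas that pin the NVIDIA package patterns from
# this guide to VERSION. apt expects stanzas to be separated by blank lines.
# (Hypothetical helper; runs in a subshell so its variables stay local.)
nvidia_pin_stanzas() (
    version=$1
    first=1
    # Patterns are quoted so the shell does not expand them as globs.
    for pattern in '*nvidia*' 'cuda-drivers*' 'libcuda*' 'libxnvctrl*' 'libnv*'; do
        [ "$first" -eq 1 ] || printf '\n'
        first=0
        printf 'Package: %s\nPin: version %s\nPin-Priority: 1001\n' "$pattern" "$version"
    done
)
```

Inside the container, `nvidia_pin_stanzas 560.35.05-1 > /etc/apt/preferences.d/nvidia-drivers` writes the file, and `apt-cache policy nvidia-driver` shows whether the pin is being applied.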
Verify GPU Access
After installation, verify GPU is accessible:
# Check GPU is visible
nvidia-smi
# Should show your GPU with driver version 560.35.05
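For scripted checks, the version comparison can be automated rather than read off the `nvidia-smi` table. The sketch below assumes the pinned value is a Debian package version such as `560.35.05-1`, while `nvidia-smi` reports the upstream version `560.35.05`; `driver_matches` is a hypothetical helper name.

```shell
# driver_matches EXPECTED ACTUAL
# Succeeds when ACTUAL equals EXPECTED after stripping any "-N" package
# revision from EXPECTED (560.35.05-1 becomes 560.35.05).
# (Hypothetical helper; runs in a subshell so its variables stay local.)
driver_matches() (
    expected=${1%%-*}
    actual=$2
    if [ "$expected" = "$actual" ]; then
        echo "OK: driver $actual"
    else
        echo "MISMATCH: expected $expected, got ${actual:-nothing}" >&2
        exit 1
    fi
)
```

A possible call inside the container is `driver_matches 560.35.05-1 "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"`.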
Configure botserver for GPU
Update your bot's config.csv to use GPU acceleration:
name,value
llm-server-gpu-layers,35
The number of layers depends on your GPU memory:
- RTX 3060 (12GB): 20-35 layers
- RTX 3070 (8GB): 15-25 layers
- RTX 4070 (12GB): 30-40 layers
- RTX 4090 (24GB): 50-99 layers
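Finding the right layer count usually takes a few attempts, so it can help to change the setting from the command line instead of editing the file by hand. This is a minimal sketch that assumes config.csv is a plain two-column `name,value` file as shown above and that it already exists; `set_config` is a hypothetical helper, not a botserver command.

```shell
# set_config FILE KEY VALUE
# Replaces the row whose first column is KEY, or appends "KEY,VALUE" if no
# such row exists. All other rows are copied through unchanged.
# (Hypothetical helper; runs in a subshell so its variables stay local.)
set_config() (
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || exit 1
    awk -F, -v k="$key" -v v="$value" '
        BEGIN   { OFS = "," }
        $1 == k { print k, v; found = 1; next }
                { print }
        END     { if (!found) print k, v }
    ' "$file" > "$tmp" || { rm -f "$tmp"; exit 1; }
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
)
```

For example, `set_config config.csv llm-server-gpu-layers 20` lowers the layer count for a card with less free VRAM.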
Troubleshooting
GPU Not Detected
If nvidia-smi doesn't show the GPU:
- Check host GPU drivers:
# On host
nvidia-smi
lxc config device list gb-system
- Verify GPU passthrough:
# Inside container
ls -la /dev/nvidia*
- Check kernel modules:
lsmod | grep nvidia
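The passthrough check can also be scripted so it reports exactly which device node is absent. The sketch below assumes a single-GPU container where `nvidia0`, `nvidiactl`, and `nvidia-uvm` are the nodes of interest; other setups may expose a different set. `missing_nvidia_nodes` is a hypothetical helper name.

```shell
# missing_nvidia_nodes [DIR]
# Prints each expected NVIDIA device node that is absent from DIR
# (default /dev) and exits non-zero if any is missing.
# (Hypothetical helper; runs in a subshell so its variables stay local.)
missing_nvidia_nodes() (
    dir=${1:-/dev}
    status=0
    for node in nvidia0 nvidiactl nvidia-uvm; do
        if [ ! -e "$dir/$node" ]; then
            echo "missing: $dir/$node"
            status=1
        fi
    done
    exit "$status"
)
```

Running `missing_nvidia_nodes` inside the container prints nothing when all three nodes are present.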
Driver Version Mismatch
If you encounter driver version conflicts:
- Ensure host and container use the same driver version
- Remove the version pinning file and install matching drivers:
rm /etc/apt/preferences.d/nvidia-drivers
apt update
apt install nvidia-driver-560
CUDA Library Issues
If CUDA libraries aren't found:
# Add CUDA to library path
echo '/usr/local/cuda/lib64' >> /etc/ld.so.conf.d/cuda.conf
ldconfig
# Add to PATH
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
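The `>>` redirections above append a duplicate line every time they are re-run. A small guard, sketched here under the hypothetical name `append_once`, writes a line only when the file does not already contain it.

```shell
# append_once LINE FILE
# Appends LINE to FILE unless an identical line is already present.
# Creates FILE if it does not exist.
# (Hypothetical helper; runs in a subshell so its variables stay local.)
append_once() (
    line=$1 file=$2
    # -x matches whole lines only, -F treats LINE as a fixed string.
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        exit 0
    fi
    printf '%s\n' "$line" >> "$file"
)
```

With it, the library path step becomes `append_once '/usr/local/cuda/lib64' /etc/ld.so.conf.d/cuda.conf` followed by `ldconfig`.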
Custom llama.cpp Compilation
If you need custom CPU/GPU optimizations or specific hardware support, compile llama.cpp from source:
Prerequisites
sudo apt update
sudo apt install build-essential cmake git
Compilation Steps
# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Create build directory
mkdir build
cd build
# Configure with CUDA support
# (current llama.cpp uses GGML_CUDA; older checkouts used LLAMA_CUDA)
cmake .. -DGGML_CUDA=ON -DLLAMA_CURL=OFF
# Compile using all available cores
make -j$(nproc)
Compilation Options
For different hardware configurations:
# CPU-only build (no GPU)
cmake .. -DLLAMA_CURL=OFF
# CUDA with a specific compute capability (75 = compute capability 7.5)
cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=75
# ROCm for AMD GPUs
cmake .. -DGGML_HIP=ON
# Metal for Apple Silicon
cmake .. -DGGML_METAL=ON
# AVX2 optimizations for modern CPUs
cmake .. -DGGML_AVX2=ON
# F16C for half-precision support
cmake .. -DGGML_F16C=ON
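When targeting a specific compute capability, the value can be read from the installed GPU instead of looked up by hand. Recent NVIDIA drivers report it through `nvidia-smi --query-gpu=compute_cap` in dotted form (for example `8.6`), while CMake's standard `CMAKE_CUDA_ARCHITECTURES` variable expects `86`. The conversion is sketched below; `cuda_arch_from_cap` is a hypothetical helper name.

```shell
# cuda_arch_from_cap CAP
# Converts a dotted compute capability ("8.6") into the digits-only form
# used by CMAKE_CUDA_ARCHITECTURES ("86"). Stray spaces are dropped too.
cuda_arch_from_cap() {
    printf '%s\n' "$1" | tr -d '. '
}
```

A possible use is `cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)`, then appending `-DCMAKE_CUDA_ARCHITECTURES="$(cuda_arch_from_cap "$cap")"` to the CUDA configure command.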
After Compilation
# Copy compiled binary to botserver
cp bin/llama-server /path/to/botserver-stack/bin/llm/
# Then add this line to config.csv to use the custom build:
llm-server-path,/path/to/botserver-stack/bin/llm/
Benefits of Custom Compilation
- Hardware-specific optimizations for your exact CPU/GPU
- Custom CUDA compute capabilities for newer GPUs
- AVX/AVX2/AVX512 instructions for faster CPU inference
- Reduced binary size by excluding unused features
- Support for experimental features not in releases
Performance Optimization
Memory Settings
For optimal LLM performance with GPU:
name,value
llm-server-gpu-layers,35
llm-server-mlock,true
llm-server-no-mmap,false
llm-server-ctx-size,4096
Multiple GPUs
For systems with multiple GPUs, specify which GPU to use:
# Add each GPU to the profile as a separate device, selected by ID
lxc profile device add gpu gpu0 gpu gputype=physical id=0
lxc profile device add gpu gpu1 gpu gputype=physical id=1
Benefits of GPU Acceleration
With GPU acceleration enabled:
- 5-10x faster inference compared to CPU
- Higher context sizes possible (8K-32K tokens)
- Real-time responses even with large models
- Lower CPU usage for other tasks
- Support for larger models (13B, 30B parameters)
Next Steps
- Installation Guide - Complete botserver setup
- Quick Start - Create your first bot
- Configuration Reference - All GPU-related parameters