Documentation

Technical Resources

Full technical documentation, deployment guides, and configuration references are provided during evaluation.

Architecture: system design, memory tiers, policy engine
Deployment: installation, configuration, production setup
API & CLI: commands, metrics, telemetry endpoints

Hardware Platforms

NVIDIA CUDA: RTX series, A-series, H100, H200
AMD ROCm: MI-series accelerators
Intel Gaudi / Xeon: AI accelerators, CPUs
Google TPU: v4 and v5 pods
Qualcomm AI: edge accelerators
Apple Silicon: M-series processors
CPU Servers: x86 and ARM architectures
Custom Hardware: additional platforms on request
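The CPU Servers row distinguishes x86 and ARM hosts. As an illustrative, standard-library-only sketch (the function and mapping are hypothetical, not part of any shipped tooling), that distinction can be probed from Python:

```python
import platform

# Map raw machine strings to the coarse architecture families named above.
# The mapping is illustrative, not exhaustive.
ARCH_FAMILIES = {
    "x86_64": "x86",
    "amd64": "x86",
    "i386": "x86",
    "aarch64": "ARM",
    "arm64": "ARM",
}

def host_arch_family(machine=None):
    """Return the architecture family for a machine string (default: this host)."""
    raw = (machine or platform.machine()).lower()
    return ARCH_FAMILIES.get(raw, "unknown")

print(host_arch_family("x86_64"))   # x86
print(host_arch_family("arm64"))    # ARM
```

Passing an explicit string makes the helper testable off-host; calling it with no argument reports the machine it runs on.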

Inference Backends

PyTorch (Supported): native inference
vLLM (Supported): PagedAttention optimization
llama.cpp (Supported): GGUF models, CPU and GPU
TensorRT-LLM (Roadmap): NVIDIA optimization
Triton (Roadmap): NVIDIA inference server
Ollama (Roadmap): developer tooling
HuggingFace Transformers (Roadmap): direct library integration
SGLang (Roadmap): structured generation
MLC LLM (Roadmap): universal deployment
ExLlamaV2 (Roadmap): GPTQ inference
Custom Backends (On Request): additional engines and integrations
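The status table above can be kept as data when deployment scripts need to gate on backend availability. The dict and helper below are a minimal sketch mirroring the table, not an API of the product:

```python
# Backend support matrix, transcribed from the table above.
BACKENDS = {
    "PyTorch": "Supported",
    "vLLM": "Supported",
    "llama.cpp": "Supported",
    "TensorRT-LLM": "Roadmap",
    "Triton": "Roadmap",
    "Ollama": "Roadmap",
    "HuggingFace Transformers": "Roadmap",
    "SGLang": "Roadmap",
    "MLC LLM": "Roadmap",
    "ExLlamaV2": "Roadmap",
}

def supported_backends(matrix=BACKENDS):
    """Return the backends currently marked Supported, sorted for stable output."""
    return sorted(name for name, status in matrix.items() if status == "Supported")

print(supported_backends())  # ['PyTorch', 'llama.cpp', 'vLLM']
```

A deployment check can then fail fast, e.g. `assert "vLLM" in supported_backends()` before launching a vLLM-based service.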