Frequently Asked Questions
Common questions about S88 Runtime and Hub.
Sector88 is an AI inference platform for constrained hardware. We provide the operating layer between AI models and the systems they run on, handling the hard problems that arise when deploying large models on devices that weren't designed for them.
The platform has two components. S88 Runtime is the inference engine - it orchestrates memory across GPU VRAM, system RAM, and storage so models run reliably without crashing, even when they exceed the hardware's native capacity. It supports multiple inference backends (vLLM, llama.cpp), multiple hardware vendors (NVIDIA, AMD, Intel, CPU-only), and runs fully offline with zero external dependencies.
S88 Hub is the management and control plane. Real-time monitoring, performance analytics, fleet management across multiple nodes, and production telemetry with Prometheus integration. Together they form a complete system for deploying, managing, and scaling AI inference in air-gapped, edge, and sovereign environments.
S88 Runtime is the inference engine. It sits between the AI model and the hardware, orchestrating memory across VRAM, RAM, and SSD so models that would normally crash or require larger hardware run reliably on constrained systems. It coordinates multiple inference backends (vLLM, llama.cpp) and supports NVIDIA, AMD, Intel, and CPU-only hardware. It ships as a Docker container with an OpenAI-compatible API endpoint and has zero external dependencies - fully offline operation.
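Because the endpoint is OpenAI-compatible, existing client libraries work unchanged. A minimal sketch using the OpenAI Python SDK - the host, port, and model name are placeholders for illustration, not S88 defaults:

    # Point any OpenAI-compatible client at the local S88 Runtime endpoint.
    # The base_url and model name below are assumptions for illustration.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",   # placeholder local endpoint
        api_key="not-needed-offline",          # offline deployments typically ignore this
    )

    response = client.chat.completions.create(
        model="llama-3-8b-instruct",           # placeholder model name
        messages=[{"role": "user", "content": "Summarize the maintenance log."}],
    )
    print(response.choices[0].message.content)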
S88 Hub is the operational control plane. Real-time monitoring of VRAM, RAM, power, and thermal metrics. Comprehensive performance analysis with detailed reports and raw data exports. Fleet management for multi-node deployments. Web dashboard for visualization and control. Built-in Prometheus integration.
Sector88 works with existing inference engines, not against them. While vLLM, Triton, Ollama, and llama.cpp optimize for throughput, Sector88 focuses on the operational layer: safe defaults, telemetry, and reliability controls for constrained/regulated environments. Roadmap items include deeper memory orchestration and policy-driven optimization.
Sector88 offers tiered access based on your deployment needs, from single-node setups to enterprise fleet management with air-gapped support.
Contact us for current pricing and deployment options tailored to your environment.
A node is any machine running S88 Runtime - server, workstation, edge device, or VM. One license key per node.
Deploying to 10+ nodes? Contact us for volume licensing.
We partner with integrators, OEMs, hardware vendors, and enterprise resellers. Pilot programs include technical evaluation on your infrastructure, baseline performance analysis, deployment support, and access to S88 for testing. Contact partnerships@sector88.co to discuss opportunities.
OOM (Out Of Memory) crashes happen when GPU VRAM is exhausted during inference, killing the process mid-request. Service goes down, requests fail, deployments become unreliable.
S88 prevents this through intelligent memory orchestration and graceful degradation - the system stays operational under resource pressure instead of crashing.
S88 is hardware-agnostic. NVIDIA CUDA, AMD ROCm, Intel Gaudi/Xeon, Apple Silicon, and CPU-only servers. If it can run an inference engine, S88 runs on it.
Yes. S88 is production-ready and available now. Request access and we will get you set up.
Yes. S88 never terminates on OOM. Back-pressure queuing, context clipping, and intelligent eviction keep the service running under memory pressure.
The system stays responsive — zero downtime from memory constraints.
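As a generic illustration of back-pressure queuing (a sketch of the general technique, not S88 internals), an admission gate bounds in-flight requests so excess load waits in a queue instead of over-committing GPU memory:

    # Generic back-pressure sketch, illustrative only: a semaphore caps in-flight
    # requests; excess requests queue instead of exhausting memory and crashing.
    import asyncio

    MAX_IN_FLIGHT = 4  # assumed budget, e.g. derived from available VRAM

    gate = asyncio.Semaphore(MAX_IN_FLIGHT)

    async def run_inference(prompt: str) -> str:
        await asyncio.sleep(0.1)               # stand-in for the real inference call
        return f"ok: {prompt}"

    async def handle_request(prompt: str) -> str:
        async with gate:                       # waits here under load (back-pressure)
            return await run_inference(prompt)

    async def main():
        results = await asyncio.gather(*(handle_request(f"req {i}") for i in range(16)))
        print(f"{len(results)} requests served, never more than {MAX_IN_FLIGHT} at once")

    asyncio.run(main())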
Yes. S88 is purpose-built for air-gapped and sovereign deployments. Zero external dependencies at runtime.
Models load from local filesystem. Telemetry stays local via Prometheus. All data remains on-premises. Designed for SCIF, classified, and offline environments.
Any LLM supported by vLLM or llama.cpp. This includes Llama, Mistral, Qwen, DeepSeek, and thousands of HuggingFace models. Both HuggingFace format and GGUF quantized models are supported.
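For offline nodes, model files can be staged on a connected machine and copied across. A sketch using huggingface_hub to pre-fetch a quantized GGUF - the repository, filename, and target directory are examples, not S88 requirements:

    # Pre-stage a quantized GGUF on a connected machine, then copy it to the
    # offline node's local model directory. Repo, file, and path are examples.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",   # example repository
        filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",    # example quantization
        local_dir="/opt/models",                            # example local model dir
    )
    print("Model staged at:", local_path)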
Community: Full S88 Runtime access with CPU inference and models up to 7B parameters.
Pro: GPU acceleration, unlimited model sizes, Hub dashboard, and automated benchmark reports.
Enterprise: Dedicated deployment support, infrastructure assessment, and custom SLAs.
Under a minute. Pull the Docker image, run it, and you have an inference API. Full production deployment with custom configuration typically takes hours, not weeks.
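Once the container is running, a quick readiness check confirms the API is serving - the endpoint below is an assumed local default, not a documented S88 value:

    # Readiness check against the (assumed) local OpenAI-compatible endpoint.
    import requests

    resp = requests.get("http://localhost:8000/v1/models", timeout=5)
    resp.raise_for_status()
    print("Models available:", [m["id"] for m in resp.json().get("data", [])])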
S88 is built for regulated environments. Zero prompt logging by default. Audit-ready telemetry without content exposure. Configurable retention policies. Complete offline operation.
Designed for HIPAA, PCI-DSS, FedRAMP, and defense compliance requirements.
S88 scales from single GPUs to distributed clusters. Multi-GPU tensor parallelism and multi-node deployments are supported. Fleet management capabilities available through S88 Hub.
Built-in Prometheus metrics, structured event logs, and performance tracking - tokens/sec, latency, time to first token (TTFT), VRAM, RAM, GPU temperature, power, and utilization.
S88 Hub provides real-time dashboards, SLO tracking, and professional scorecard generation with PDF and CSV export.
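The raw Prometheus metrics can also be inspected outside the Hub with any exposition-format parser. A sketch - the metrics URL and the metric name patterns are assumptions about the deployment, not documented values:

    # Fetch and parse the Prometheus exposition-format metrics directly.
    # The endpoint URL and the name filters below are assumptions.
    import requests
    from prometheus_client.parser import text_string_to_metric_families

    text = requests.get("http://localhost:8000/metrics", timeout=5).text

    for family in text_string_to_metric_families(text):
        for sample in family.samples:
            # Surface throughput- and memory-related samples for a quick check.
            if any(key in sample.name for key in ("tokens", "vram", "gpu")):
                print(sample.name, sample.labels, sample.value)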
Yes. Request access and we will work with you to evaluate S88 on your hardware and use case. No commitment required.
Still have questions?
Contact Us