Multi-GPU

Distribute independent annealing runs across 2, 4, or 8 GPUs for ensemble quality and tighter optimality gaps on the same wall-clock budget.

How Multi-GPU works

Each GPU runs a fully independent set of annealing chains with different random seeds and potentially different cooling schedules. After all GPUs complete, results are aggregated and the global best solution is returned. This is embarrassingly parallel — no inter-GPU communication during solving, only at the start (Q matrix broadcast) and end (result collection).
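The pattern above can be emulated in plain Python. This is a toy sketch, not the NEROX implementation: `anneal_once` is an assumed single-chain Metropolis annealer with a linear cooling schedule, and the "GPUs" are just seed groups iterated sequentially; the point is the structure of broadcast, independent runs, and a final argmin.

```python
import numpy as np

def anneal_once(Q, n_sweeps, seed):
    """One independent annealing chain minimizing x^T Q x over binary x."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    x = rng.integers(0, 2, n)
    energy = x @ Q @ x
    for sweep in range(n_sweeps):
        T = 1.0 * (1 - sweep / n_sweeps) + 1e-3  # linear cooling (assumed schedule)
        i = rng.integers(n)
        x_new = x.copy()
        x_new[i] ^= 1                            # flip one bit
        e_new = x_new @ Q @ x_new
        # Metropolis acceptance: always take improvements, sometimes uphill moves
        if e_new < energy or rng.random() < np.exp((energy - e_new) / T):
            x, energy = x_new, e_new
    return energy, x

def multi_gpu_solve(Q, n_gpus=4, runs_per_gpu=8, n_sweeps=200):
    """Emulates the multi-GPU pattern: every (gpu, run) pair gets its own
    seed; there is no communication until the final min over all results."""
    results = [anneal_once(Q, n_sweeps, seed=gpu * runs_per_gpu + r)
               for gpu in range(n_gpus) for r in range(runs_per_gpu)]
    return min(results, key=lambda er: er[0])

# Random symmetric Q as a stand-in problem instance
rng = np.random.default_rng(0)
Q = rng.normal(size=(16, 16))
Q = (Q + Q.T) / 2
best_energy, best_x = multi_gpu_solve(Q)
```

In a real deployment each seed group would run on its own device concurrently; the aggregation step is unchanged.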

Because the chains are independent, solution quality improves approximately as the best-of-N order statistic: with N GPUs, you sample N× more independent restarts and the best-found solution improves, especially when the energy landscape has many deep local minima.
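The best-of-N effect can be checked with a quick simulation. The exponential gap distribution below is an assumed toy model of restart quality, not measured NEROX data; it simply illustrates that the minimum over N independent restarts shrinks as N grows.

```python
import random

random.seed(1)

def restart_gap():
    # Assumed toy model: each restart lands in a local minimum whose
    # optimality gap is exponentially distributed with mean 0.5%.
    return random.expovariate(1 / 0.5)

def best_of(n, trials=2000):
    """Average best-found gap when taking the min over n independent restarts."""
    return sum(min(restart_gap() for _ in range(n)) for _ in range(trials)) / trials

gaps = {n: best_of(n) for n in (1, 2, 4, 8)}
```

For an exponential model the expected best-of-N gap is mean/N, so doubling the restart budget roughly halves the average gap; real landscapes taper off more slowly, as the benchmark table below shows.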

Usage

```python
import nerox

client = nerox.Client()

job = client.optimize.qubo(
    Q=Q_matrix,
    solver="multi-gpu",
    n_gpus=4,              # 1 | 2 | 4 | 8
    n_runs=512,            # runs per GPU (512 × 4 = 2048 total)
    n_sweeps=20000,
)

result = job.wait()
print(f"Solution quality: {result.objective}")
print(f"GPUs used: {result.n_gpus}")
print(f"GPU-seconds billed: {result.gpu_seconds:.1f}")
```

Cost model

Multi-GPU jobs are billed by total GPU-seconds: a 4-GPU job that runs for 30 seconds bills 120 GPU-seconds. This is identical in cost to running a single-GPU job for 120 seconds — but Multi-GPU achieves better solutions in less wall-clock time for quality-sensitive workloads.
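The billing arithmetic can be verified directly. `gpu_seconds_billed` is a hypothetical helper written for illustration, not part of the nerox SDK:

```python
def gpu_seconds_billed(n_gpus: int, wall_seconds: float) -> float:
    """Billing model from the docs: total GPU-seconds = GPUs x wall time."""
    return n_gpus * wall_seconds

# A 4-GPU job running 30 s costs the same as a 1-GPU job running 120 s
four_gpu_cost = gpu_seconds_billed(4, 30)    # 120 GPU-seconds
one_gpu_cost = gpu_seconds_billed(1, 120)    # 120 GPU-seconds
```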

When Multi-GPU outperforms single-GPU

| GPUs   | Avg. gap (n=1000) | Wall time |
|--------|-------------------|-----------|
| 1 GPU  | 0.42%             | 60 s      |
| 2 GPUs | 0.31%             | 60 s      |
| 4 GPUs | 0.22%             | 60 s      |
| 8 GPUs | 0.15%             | 60 s      |

Multi-GPU is available on Pro plan (up to 2 GPUs) and Business plan (up to 8 GPUs). Contact sales for dedicated GPU clusters on Enterprise.