Rate limits by plan
Rate limit errors return HTTP 429 with a Retry-After header (delay in seconds) and an X-RateLimit-Reset Unix timestamp.
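When a 429 arrives, prefer the server-provided Retry-After over any client-side guess. A minimal sketch of turning those headers into a sleep duration (the helper name and the 1-second fallback are illustrative, not part of the SDK):

```python
import time

def sleep_seconds_for_429(headers):
    """Pick a wait time from rate-limit response headers."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        # Server-specified delay in seconds.
        return float(retry_after)
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        # Unix timestamp when the window resets; never sleep a negative amount.
        return max(0.0, float(reset) - time.time())
    return 1.0  # conservative fallback when neither header is present
```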
Exponential backoff with jitter
Never retry rate-limited requests with fixed delays — all clients will collide on the same second. Use exponential backoff with full jitter.
```python
import time
import random

import nerox
from nerox.exceptions import RateLimitError, JobFailedError

def submit_with_retry(client, problem_type, payload, max_retries=5):
    for attempt in range(max_retries):
        try:
            job = client.optimize.submit(problem_type=problem_type, **payload)
            return job.wait(timeout=600)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Full jitter: sleep between 0 and the exponential cap
            base = min(2 ** attempt, 60)  # cap at 60s
            sleep_s = random.uniform(0, base)
            print(f"Rate limited. Retry {attempt+1}/{max_retries} in {sleep_s:.1f}s")
            time.sleep(sleep_s)
        except JobFailedError:
            raise  # don't retry job failures — fix the input
```

Async concurrency control
When submitting many jobs in parallel, cap concurrency to your plan limit to avoid wasted 429 round-trips.
```python
import asyncio

import nerox

async def process_batch(matrices, max_concurrent=5):
    async with nerox.AsyncClient() as client:
        semaphore = asyncio.Semaphore(max_concurrent)

        async def solve_one(matrix):
            async with semaphore:
                job = await client.optimize.tsp(distance_matrix=matrix, solver="gpu")
                return await job.result()

        return await asyncio.gather(*[solve_one(m) for m in matrices])
```

API key management
Never hardcode keys
Load API keys from environment variables or a secrets manager. Never commit keys to source control — rotate immediately if exposed.
```shell
# .env (never commit this file)
NEROX_API_KEY=replace-with-nerox-api-key

# Application startup
export NEROX_API_KEY=$(aws secretsmanager get-secret-value \
  --secret-id nerox-api-key --query SecretString --output text)
```
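At startup the application can then read the key from the environment and fail fast if it is missing, rather than surfacing a confusing 401 mid-run. A minimal sketch (load_api_key is an illustrative helper, not an SDK function):

```python
import os

def load_api_key(env_var="NEROX_API_KEY"):
    """Read the API key from the environment, failing fast if it is absent."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the app")
    return key
```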
Use scoped keys for different services
Create separate API keys per service or environment from the dashboard. Restrict each key to minimum required permissions. This limits blast radius if a key is compromised.
Monitoring and alerting
Track GPU-second usage
```python
import nerox

client = nerox.Client()
usage = client.usage.get()
print(f"Used: {usage.used_gpu_seconds:.0f}s of {usage.included_gpu_seconds:.0f}s")
print(f"Overage: ${usage.overage_usd:.2f}")
print(f"Resets: {usage.period_end}")
```

Prometheus metrics
The NEROX SDK emits Prometheus-compatible metrics when NEROX_METRICS=1 is set. Track nerox_job_duration_seconds, nerox_job_errors_total, and nerox_rate_limit_hits_total. Alert when error rate exceeds 1% or p99 latency exceeds your SLA.
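As a sketch, Prometheus alert rules for the 1% error-rate and p99 latency thresholds might look like the following, assuming nerox_job_duration_seconds is exported as a histogram and using 300 seconds as a stand-in for your SLA:

```yaml
groups:
  - name: nerox
    rules:
      - alert: NeroxHighErrorRate
        expr: |
          rate(nerox_job_errors_total[5m])
            / rate(nerox_job_duration_seconds_count[5m]) > 0.01
        for: 10m
      - alert: NeroxHighP99Latency
        expr: |
          histogram_quantile(0.99,
            rate(nerox_job_duration_seconds_bucket[5m])) > 300
        for: 10m
```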
Cost control
GPU-seconds are billed from job start to job completion, not queue time. To control costs: set a timeout on all job.wait() calls, cancel jobs that exceed your quality threshold early via job.cancel(), and use the streaming API to cancel as soon as you reach a good-enough solution rather than waiting for full convergence.
```python
import nerox

client = nerox.Client()
job = client.optimize.tsp(distance_matrix=matrix, solver="gpu")

target_gap = 0.005    # accept within 0.5% of optimal
known_optimal = 7542  # best known tour length for this instance

for event in job.stream():
    current_gap = (event.best_energy - known_optimal) / known_optimal
    if current_gap <= target_gap:
        job.cancel()
        print(f"Target reached: gap={current_gap*100:.3f}%")
        break

result = job.result()
```