Production Guide

Rate limits, retry patterns, observability, and cost control for production API usage.

Rate limits by plan

| Plan       | Jobs / month | Requests / min | Max concurrent |
|------------|--------------|----------------|----------------|
| Starter    | 20           | 10             | 1              |
| Pro        | 2,000        | 60             | 5              |
| Business   | Unlimited    | 300            | 20             |
| Enterprise | Unlimited    | Custom         | Custom         |

Rate limit errors return HTTP 429 with a Retry-After header (delay in seconds) and an X-RateLimit-Reset header (Unix timestamp of the next window).
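When a 429 does arrive, prefer the server's own hint over a guessed delay. A minimal sketch of picking a wait time from those headers (the `headers` dict and the `wait_hint` helper are illustrative stand-ins for your HTTP client's response object, not part of the SDK):

```python
import time

def wait_hint(headers, now=None):
    """Pick a sleep duration from 429 response headers.

    Prefers Retry-After (a delay in seconds); falls back to
    X-RateLimit-Reset (an absolute Unix timestamp).
    """
    now = time.time() if now is None else now
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    if "X-RateLimit-Reset" in headers:
        # Clamp at zero in case the window has already reset
        return max(0.0, float(headers["X-RateLimit-Reset"]) - now)
    return None  # no hint; fall back to exponential backoff
```

If neither header is present, fall back to the backoff strategy below.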

Exponential backoff with jitter

Never retry rate-limited requests with fixed delays — all clients will collide on the same second. Use exponential backoff with full jitter.

python
import time
import random
import nerox
from nerox.exceptions import RateLimitError, JobFailedError

def submit_with_retry(client, problem_type, payload, max_retries=5):
    for attempt in range(max_retries):
        try:
            job = client.optimize.submit(problem_type=problem_type, **payload)
            return job.wait(timeout=600)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Full jitter: sleep between 0 and cap
            base = min(2 ** attempt, 60)       # cap at 60s
            sleep_s = random.uniform(0, base)
            print(f"Rate limited. Retry {attempt+1}/{max_retries} in {sleep_s:.1f}s")
            time.sleep(sleep_s)
        except JobFailedError:
            raise   # don't retry job failures — fix the input
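The jitter window grows exponentially per attempt and caps at 60 s. The schedule alone, factored out for clarity (these helper names are illustrative, not part of the SDK):

```python
import random

def jitter_caps(max_retries=5, cap=60):
    # Upper bound of the full-jitter window per attempt: min(2**n, cap)
    return [min(2 ** attempt, cap) for attempt in range(max_retries)]

def jitter_delay(attempt, cap=60):
    # Actual sleep is uniform in [0, bound); expected wait is bound / 2
    return random.uniform(0, min(2 ** attempt, cap))
```

With full jitter, N colliding clients spread their retries across the whole window instead of re-synchronizing on the same second.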

Async concurrency control

When submitting many jobs in parallel, cap concurrency to your plan limit to avoid wasted 429 round-trips.

python
import asyncio
import nerox

async def process_batch(matrices, max_concurrent=5):
    async with nerox.AsyncClient() as client:
        semaphore = asyncio.Semaphore(max_concurrent)

        async def solve_one(matrix):
            async with semaphore:
                job = await client.optimize.tsp(distance_matrix=matrix, solver="gpu")
                return await job.result()

        return await asyncio.gather(*[solve_one(m) for m in matrices])
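The semaphore pattern itself is independent of the SDK. A self-contained sketch with stubbed work (`asyncio.sleep` standing in for the solver call) that records peak concurrency, to show the cap holds:

```python
import asyncio

async def run_capped(n_tasks=20, max_concurrent=5):
    semaphore = asyncio.Semaphore(max_concurrent)
    active = 0
    peak = 0

    async def one(i):
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the solver call
            active -= 1
            return i

    results = await asyncio.gather(*[one(i) for i in range(n_tasks)])
    return results, peak

results, peak = asyncio.run(run_capped())
```

The `active` counter is only incremented while the semaphore is held, so `peak` can never exceed `max_concurrent`.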

API key management

Never hardcode keys

Load API keys from environment variables or a secrets manager. Never commit keys to source control — rotate immediately if exposed.

bash
# .env (never commit this file)
NEROX_API_KEY=replace-with-nerox-api-key

# Application startup
export NEROX_API_KEY=$(aws secretsmanager get-secret-value --secret-id nerox-api-key --query SecretString --output text)
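Failing fast on a missing key at startup beats a confusing auth error mid-run. A minimal check, assuming the SDK reads NEROX_API_KEY from the environment (the `require_api_key` helper is ours, not part of the SDK):

```python
import os

def require_api_key(var="NEROX_API_KEY"):
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; load it from your secrets manager")
    return key
```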

Use scoped keys for different services

Create separate API keys per service or environment from the dashboard. Restrict each key to minimum required permissions. This limits blast radius if a key is compromised.

Monitoring and alerting

Track GPU-second usage

python
import nerox

client = nerox.Client()
usage = client.usage.get()

print(f"Used: {usage.used_gpu_seconds:.0f}s of {usage.included_gpu_seconds:.0f}s")
print(f"Overage: ${usage.overage_usd:.2f}")
print(f"Resets: {usage.period_end}")

Prometheus metrics

The NEROX SDK emits Prometheus-compatible metrics when NEROX_METRICS=1 is set. Track nerox_job_duration_seconds, nerox_job_errors_total, and nerox_rate_limit_hits_total. Alert when error rate exceeds 1% or p99 latency exceeds your SLA.
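The 1% error-rate rule can be sanity-checked in code before wiring it into your alerting system. A sketch from two counters (the function is illustrative; in Prometheus terms, the numerator corresponds to nerox_job_errors_total and the denominator to total jobs over the same window):

```python
def should_alert(job_errors_total, jobs_total, max_error_rate=0.01):
    # No traffic, no signal: don't page on an idle service
    if jobs_total == 0:
        return False
    return job_errors_total / jobs_total > max_error_rate
```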

Cost control

GPU-seconds are billed from job start to job completion, not queue time. To control costs: set a timeout on every job.wait() call, cancel jobs early via job.cancel() once they meet your quality threshold, and use the streaming API to stop as soon as you reach a good-enough solution rather than waiting for full convergence.

python
import nerox

client = nerox.Client()
job = client.optimize.tsp(distance_matrix=matrix, solver="gpu")

target_gap = 0.005   # accept within 0.5% of optimal
known_optimal = 7542

for event in job.stream():
    current_gap = (event.best_energy - known_optimal) / known_optimal
    if current_gap <= target_gap:
        job.cancel()
        print(f"Target reached: gap={current_gap*100:.3f}%")
        break

result = job.result()   # best solution found before cancellation