Nemotron 3 Ultra 550B

Nemotron-3-Ultra-550B-A55B is NVIDIA's frontier open-weight sparse MoE model: 550B total parameters with ~55B active per token, a Mamba-2 hybrid backbone, and a native Multi-Token Prediction head. The Blackbox Inference Engine serves it at 420.2 tok/s (c=1), billed as the fastest inference in the industry, with end-to-end encrypted inference.

Context: 1M · Max output: 65.5K

Quickstart

import { streamText } from 'ai';
import { createOpenAICompatible } from '@ai-sdk/openai-compatible';

const blackbox = createOpenAICompatible({
  name: 'blackbox',
  apiKey: process.env.BLACKBOX_API_KEY!,
  baseURL: 'https://api.blackbox.ai/v1',
});

const result = streamText({
  model: blackbox('blackboxai/nvidia/nemotron-3-ultra-550b-a55b'),
  prompt: 'Why is the sky blue?',
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

Specs & pricing

Input $/M: $0.37
Output $/M: $1.08
Context window: 1M
Max output: 65.5K
Supported parameters: tool-use, reasoning, json-mode, streaming
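
As a rough illustration of how the per-million-token prices above turn into a per-request cost, here is a minimal sketch. The helper names (`estimateCostUsd`, `clampMaxOutput`) are hypothetical and not part of any Blackbox SDK, and the exact output cap of 65,536 tokens is an assumption based on the rounded "65.5K" figure.

```typescript
// Prices copied from the spec list above, in USD per one million tokens.
const INPUT_USD_PER_M = 0.37;
const OUTPUT_USD_PER_M = 1.08;

// Assumption: "65.5K" is read as 65,536 tokens; the page only gives the rounded value.
const MAX_OUTPUT_TOKENS = 65_536;

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical helper: estimated cost of one request in USD.
function estimateCostUsd(usage: Usage): number {
  if (usage.inputTokens < 0 || usage.outputTokens < 0) {
    throw new RangeError('token counts must be non-negative');
  }
  return (
    (usage.inputTokens * INPUT_USD_PER_M +
      usage.outputTokens * OUTPUT_USD_PER_M) /
    1_000_000
  );
}

// Hypothetical helper: keep a requested output budget within [1, MAX_OUTPUT_TOKENS].
function clampMaxOutput(requested: number): number {
  return Math.min(Math.max(1, Math.floor(requested)), MAX_OUTPUT_TOKENS);
}

// Example: a 200K-token prompt with a 4K-token answer.
const cost = estimateCostUsd({ inputTokens: 200_000, outputTokens: 4_000 });
console.log(cost.toFixed(4)); // prints "0.0783"
```

Input dominates the bill for long-context requests: in the example, the 200K-token prompt accounts for about $0.074 of the roughly $0.078 total.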

Related models

Model	Type	Context	Input $/M
NVIDIA: Nemotron 3 Ultra	Text	262K	$0.37
Nemotron 3 Nano 30B	Text	262.1K	$0.05
Nemotron Nano 12B VL	Text	128K	$0.20
Blackbox Pro	Text	400K	$1.75
Claude Opus 4.7	Text	1M	$5