NVIDIA: Nemotron 3 Ultra

NVIDIA Nemotron 3 Ultra is a 550B-parameter (55B active) open reasoning model built for long-running autonomous agents that handle orchestration and complex tasks across coding, deep research, and enterprise workflows. Its hybrid Mamba-Transformer MoE architecture combines three features: Latent MoE, which calls 4 experts at the inference cost of 1; Multi-Token Prediction, which reduces generation time on long sequences; and Token Budget support, which targets the best accuracy with the fewest reasoning tokens. The model supports a 1M-token context window and is fully open under the NVIDIA Open Model License, with open weights, training data, and recipes.

Reasoning · Tools · Cache

Context 1M · Max out 16.4K

Quickstart

import { streamText } from 'ai';
import { createOpenAICompatible } from '@ai-sdk/openai-compatible';

// Configure an OpenAI-compatible provider; the API key is read from the environment.
const blackbox = createOpenAICompatible({
  name: 'blackbox',
  apiKey: process.env.BLACKBOX_API_KEY!,
  baseURL: 'https://api.blackbox.ai/v1',
});

// Start a streaming text generation with Nemotron 3 Ultra.
const result = streamText({
  model: blackbox('blackboxai/nvidia/nemotron-3-ultra'),
  prompt: 'Why is the sky blue?',
});

// Write each text chunk to stdout as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
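
For readers who want to see roughly what the quickstart sends, the sketch below builds an equivalent raw request. It assumes the endpoint follows the OpenAI chat-completions convention (a POST to /chat/completions with a Bearer token), which the openai-compatible provider above implies but this page does not spell out. buildChatRequest and ChatRequest are hypothetical names introduced for illustration.

```typescript
// Values taken from the quickstart above.
const BASE_URL = 'https://api.blackbox.ai/v1';
const MODEL_ID = 'blackboxai/nvidia/nemotron-3-ultra';

interface ChatRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the request without sending it.
// Assumption: OpenAI-style path, auth header, and body fields.
function buildChatRequest(apiKey: string, prompt: string): ChatRequest {
  return {
    url: `${BASE_URL}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: MODEL_ID,
        messages: [{ role: 'user', content: prompt }],
        stream: true, // ask for a streamed response, as streamText does
      }),
    },
  };
}
```

The returned url and init can be passed directly to fetch(url, init). Keeping construction separate from sending makes it easy to inspect the request before spending tokens.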

Specs & pricing

Input $/M: 0.60
Output $/M: 2.40
Context window: 1M
Max output: 16.4K
Supported parameters: tool-use, reasoning, streaming
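
The listed rates can be turned into a rough per-request cost estimate. The sketch below is only an illustration: estimateCostUSD is a hypothetical helper, and it assumes reasoning tokens are billed at the output rate and that no cache discount applies, neither of which this page confirms.

```typescript
// Listed prices in USD per million tokens.
const INPUT_USD_PER_M = 0.6;
const OUTPUT_USD_PER_M = 2.4;

// Hypothetical helper: estimated cost in USD for one request.
// Assumption: reasoning tokens count as output tokens; no cache discount.
function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError('token counts must be non-negative');
  }
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_M
  );
}
```

For example, a request with 200,000 input tokens and 4,000 output tokens comes to 0.12 + 0.0096, or roughly $0.13.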

Related models

Model	Type	Context	Input $/M
Nemotron 3 Nano 30B	Text	262.1K	0.05
Nemotron Nano 12B VL	Text	128K	0.2
NVIDIA Nemotron 3 Super 120B A12B	Text	256K	0.15
Nemotron 3 Ultra	Text	1M	0.6
Nvidia Nemotron Nano 9B V2	Text	131.1K	0.06