Recapping RTX 50 Series Deep Dives and CES Editor's Day Info
Shortened version skipping a lot of info. If you want the full picture then wait for independent reviews and the Blackwell Consumer Whitepaper. If you want all the info or things explained better then I recommend reading the TechPowerUp deep dive.
Sources used:
FE Cooler Design
- 5090 + 5080 = Double flow through + 3D VC + split PCB + Liquid metal TIM
- 5070 TI = no FE card
- 5070 = Dual axial flow through + VC + single PCB + TIM undisclosed
- Two slot double flow through design = superior and linear W to dB(A) performance vs prior designs
Specs Overview
[Specs list] | RTX 5090 | RTX 5080 | RTX 5070 TI | RTX 5070 |
---|---|---|---|---|
Shader TFLOPS | 105 | 56 | 44 | 31 |
RT TFLOPS | 318 | 171 | 133 | 94 |
AI TOPS (FP4) | 3352 | 1800 | 1406 | 988 |
FP32 ALUs | 21760 | 10752 | 8960 | 6144 |
SMs | 170 | 84 | 70 | 48 |
Base MHz | 2010 | 2300 | 2300 | 2160 |
Boost MHz | 2410 | 2620 | 2450 | 2510 |
Mem BW (GB/s) | 1792 | 960 | 896 | 672 |
VRAM (GBs) | 32 | 16 | 16 | 12 |
Bus Width (bit) | 512 | 256 | 256 | 192 |
Mem (Gbps) | 28 | 30 | 28 | 28 |
TDP (W) | 575 | 360 | 300 | 250 |
PCIe + DisplayPort | gen5 + 2.1b | gen5 + 2.1b | gen5 + 2.1b | gen5 + 2.1b |
Process node | 4N | 4N | 4N | 4N |
Die size (mm2) | 750 | 378 | 378 | 263 |
Transistors (B) | 92.2 | 45.6 | 45.6 | 31 |
Density (Mtr/mm2) | 122.9 | 120.6 | 120.6 | 117.9 |
MSRP | 1999 | 999 | 749 | 549 |
BS Segmentation | 4K 240hz PT | 2x RTX 4080 | 2x RTX 4070 TI | 2x RTX 4070 |
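Several rows in the table follow from others, so they can be sanity checked with a bit of arithmetic. The sketch below is a rough check, not NVIDIA's own method: it assumes 128 FP32 ALUs per SM, one fused multiply-add (2 FLOPs) per ALU per clock at boost, and bandwidth = bus width x per-pin data rate / 8. The `Card` class and its property names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """Hypothetical container for the table inputs used in the checks."""
    name: str
    sms: int
    boost_mhz: int
    bus_width_bits: int
    mem_gbps: int

    @property
    def fp32_alus(self) -> int:
        # Assumption: 128 FP32 ALUs per SM.
        return self.sms * 128

    @property
    def shader_tflops(self) -> float:
        # Assumption: one FMA (2 FLOPs) per ALU per clock at boost clock.
        return 2 * self.fp32_alus * self.boost_mhz * 1e6 / 1e12

    @property
    def mem_bw_gbs(self) -> float:
        # Bits per second over the whole bus, divided by 8 to get bytes.
        return self.bus_width_bits * self.mem_gbps / 8


CARDS = [
    Card("RTX 5090", 170, 2410, 512, 28),
    Card("RTX 5080", 84, 2620, 256, 30),
    Card("RTX 5070 TI", 70, 2450, 256, 28),
    Card("RTX 5070", 48, 2510, 192, 28),
]

for card in CARDS:
    print(
        f"{card.name}: {card.fp32_alus} ALUs, "
        f"{card.shader_tflops:.1f} shader TFLOPS, "
        f"{card.mem_bw_gbs:.0f} GB/s"
    )
```

Under these assumptions the ALU counts and rounded shader TFLOPS line up with the table, and the bandwidth formula gives 1792 GB/s for the 5090 (512 bit x 28 Gbps / 8).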
Blackwell Architecture
- Tensor cores = FP4 support = reduces bandwidth, compute and memory requirements
- RT cores = 2x ray triangle intersections (8x vs Ada's 4x), Linear swept spheres (RTX hair), Triangle Cluster Intersection Engine + Triangle Cluster Decompression Engine + 0.75x RT memory footprint
- - Linear swept spheres = hair ray tracing with 3x less data to compute, lower VRAM cost, and faster using swept spheres instead of triangles for ray intersections.
- - Triangle cluster engines = RTX Mega Geometry HW acceleration
- CUDA cores = 2x INT32 BW and compute. 128 INT32/FP32 per SM.
- GDDR7 = PAM3 signaling + superior efficiency (pJ/bit)
- Tensor + CUDA cores can share neural workloads + neural and shader code intermixing using DirectX Cooperative Vectors API
- AI Management Processor = task scheduling and smart queue prioritization (smoothness and responsiveness) for AI workloads and neural rendering
- SER efficiency improved by 2X, benefits neural shaders, work graphs and path tracing
- Display Engine = DisplayPort 2.1 up to UHBR20 (20 Gbps per lane) + high speed HW flip metering
- Media Engine = up to 3 x 9th gen encoder and 2 x 6th gen decoder = AV1 UHQ, 2x H.264 decode, MV-HEVC (3D and VR), 4:2:2 encode/decode (professional) + 5% better AV1 and HEVC encoding quality
- FP4 + enhanced flip metering + AI-management processor for MFG
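To put the FP4 memory claim in numbers, here is a minimal sketch of raw weight storage at different precisions. The 12 billion parameter model is a hypothetical example, and the estimate ignores quantization scale factors, activations and any other runtime overhead, so real footprints will be somewhat larger.

```python
# Bits used to store one weight in each format.
BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "FP4": 4}


def weight_footprint_gb(params_billions: float, bits_per_weight: int) -> float:
    """Raw weight storage in GB (1 GB = 1e9 bytes).

    Ignores scale factors, activations and framework overhead.
    """
    return params_billions * bits_per_weight / 8


PARAMS_BILLIONS = 12.0  # hypothetical model size, for illustration only

for fmt, bits in BITS_PER_WEIGHT.items():
    print(f"{fmt}: {weight_footprint_gb(PARAMS_BILLIONS, bits):.1f} GB")
```

Halving the bits per weight halves both the storage and the bytes that have to be moved per inference pass, which is where the bandwidth and memory savings come from.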
Efficiency Functionality
- Voltage optimized GDDR7 with ultra low power states
- Accelerated frequency switching = 1000x faster clock responsiveness (µs level switching speed) enables higher SM efficiency through rapid clock adjustments in dynamic workloads + drops frequency quickly when idle
- 300 MHz higher active state frequency vs Ada.
- Low latency sleep + deeper power states
- Advanced Power gating:
- - clock gating = clock tree can be disabled = turn off memory
- - power gating = turn off SMs and logic when idle
- - rail gating = 15x lower rail gate to core time + Separated memory and core voltage rails
MFG + DLSS transformer
- New FG model in Warhammer 40,000 = +40% FG speed, -30% VRAM (-400 MB), +10% FPS + accurate frame pacing
- DLSS Transformers (RR, Upscaler + DLAA) = superior detail, clarity (reduced ghosting) and temporal stability
- Reflex 2 = Frame warp + inpainting = 50% lower PC latency vs Reflex or 75% lower vs native
- Superior MFG speed (FP4+AMP) and frame consistency (flip metering) = 5-10x less frame variability and superior latency vs prior FG with MFG enabled
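The two Reflex 2 percentages above also pin down what original Reflex does relative to native. A small sketch of that arithmetic, using a hypothetical 60 ms native latency purely as an example input (the function name is made up):

```python
def implied_latencies(
    native_ms: float,
    reflex2_cut_vs_reflex: float = 0.50,
    reflex2_cut_vs_native: float = 0.75,
) -> dict[str, float]:
    """Derive Reflex and Reflex 2 latency from the two quoted reductions."""
    # Reflex 2 is quoted as 75% lower than native.
    reflex2_ms = native_ms * (1 - reflex2_cut_vs_native)
    # Reflex 2 is also quoted as 50% lower than Reflex, so invert that.
    reflex_ms = reflex2_ms / (1 - reflex2_cut_vs_reflex)
    return {"native": native_ms, "reflex": reflex_ms, "reflex2": reflex2_ms}


# Hypothetical 60 ms native PC latency.
print(implied_latencies(60.0))
```

In other words, if both quoted numbers hold, original Reflex roughly halves native latency and Reflex 2 halves it again.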
Neural Rendering
- Available with RTX Kit (RTX skin and hair + more) + coming to RTX Remix
- Neural Textures = 7x lossless compression efficiency vs BCx
- Neural Materials = less VRAM (47 -> 16MB) + offline render multi-layer approximation
- Neural Faces = supercomputer training for each character + more accurate face
- Neural Volumes (future)
- Neural Radiance Fields (NeRFs, future)
- Neural Radiance Cache = approximate infinite light bouncing (offline rendering) using a radiance cache and path tracer = ~10% higher FPS and superior image quality. Runs on-device model training while gaming.
- RTX Mega Geometry = ray trace against full detail Nanite (UE5 implementation) models and animated models, no low poly meshes = superior path tracing accuracy.
AI Democratization
- Everyone is a developer using Graph UI, Chat UI and Model Tuning
- Use vendor tools like CrewAI, ComfyUI, Flowise AI etc.
- NVIDIA NIM = Optimized prepackaged microservices for generative AI = Language, regional language, vision language, RAG, Speech, Animation, computer vision, image...
- AI Blueprint for RTX = accelerated development
- Support by many hardware vendors like ASUS, MSI, HP...
AI in Games
- Goal = Game worlds will become a Matrix of autonomous AI
- NVIDIA ACE = Autonomous Game characters using user interaction, perception, cognition and memory, action and animation and rendering + dedicated models for everything
- Audio2Face = automated facial animations
- AI Body Motion = automated character rigging and body animation
- Examples: PUBG Ally (Autonomous companions), inZoi Smart Zoi (Autonomous systems), Mir5 (Autonomous enemies).
- Project G-Assist = diagnose and monitor performance, optimize settings, customize peripheral lighting and fan noise + control system with voice or text commands. Powering MSI Center, MSI Afterburner, Omen Gaming Hub and Streamlabs
Other
A lot about professional workflows, generative AI and why Blackwell is great for that + telling reviewers how to do their job when reviewing Blackwell.