Recapping RTX 50 Series Deep Dives and CES Editor's Day Info

This is a shortened version that skips a lot of info. If you want the full picture, wait for independent reviews and the Blackwell consumer whitepaper. If you want all the info, or things explained better, I recommend reading the TechPowerUp deep dive.

Sources used:

  1. Die sizes and transistor count
  2. TechPowerUp deep dive
  3. 50 series announcement GeForce blogpost

FE Cooler Design

  • 5090 + 5080 = Double flow through + 3D VC + split PCB + Liquid metal TIM
  • 5070 Ti = no FE card
  • 5070 = Dual axial flow through + VC + single PCB + TIM undisclosed
  • Two-slot double flow through design = better and more linear W to dB(A) scaling vs prior designs

Specs Overview

| Spec | RTX 5090 | RTX 5080 | RTX 5070 Ti | RTX 5070 |
|---|---|---|---|---|
| Shader TFLOPS | 105 | 56 | 44 | 31 |
| RT TFLOPS | 318 | 171 | 133 | 94 |
| AI TOPS (FP4) | 3352 | 1800 | 1406 | 988 |
| FP32 ALUs | 21760 | 10752 | 8960 | 6144 |
| SMs | 170 | 84 | 70 | 48 |
| Base clock (MHz) | 2010 | 2300 | 2300 | 2160 |
| Boost clock (MHz) | 2410 | 2620 | 2450 | 2510 |
| Mem BW (GB/s) | 1780 | 960 | 896 | 672 |
| VRAM (GB) | 32 | 16 | 16 | 12 |
| Bus width (bit) | 512 | 256 | 256 | 192 |
| Mem speed (Gbps) | 28 | 30 | 28 | 28 |
| TDP (W) | 575 | 360 | 300 | 250 |
| PCIe + DisplayPort | Gen5 + 2.1b | Gen5 + 2.1b | Gen5 + 2.1b | Gen5 + 2.1b |
| Process node | 4N | 4N | 4N | 4N |
| Die size (mm2) | 750 | 378 | 377 | 263 |
| Transistors (B) | 92.2 | 45.6 | 45.6 | 31 |
| Density (Mtr/mm2) | 122.9 | 120.6 | 120.6 | 117.9 |
| MSRP (USD) | 1999 | 999 | 749 | 549 |
| BS Segmentation | 4K 240Hz PT | 2x RTX 4080 | 2x RTX 4070 Ti | 2x RTX 4070 |
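
Several rows in the specs table can be cross-checked from the others. Here is a minimal Python sketch, assuming the usual rules of thumb (2 FLOPs per FP32 ALU per clock at boost, bandwidth = bus width x per-pin data rate / 8). The function names are my own, not anything from NVIDIA. Note that the formula gives 1792 GB/s for the 5090, slightly above the 1780 listed, so the listed figure may just be rounded.

```python
# Inputs copied from the specs table above.
# Tuple order: FP32 ALUs, boost MHz, bus width (bit), mem speed (Gbps),
# die size (mm2), transistors (billions).
SPECS = {
    "RTX 5090":    (21760, 2410, 512, 28, 750, 92.2),
    "RTX 5080":    (10752, 2620, 256, 30, 378, 45.6),
    "RTX 5070 Ti": (8960,  2450, 256, 28, 377, 45.6),
    "RTX 5070":    (6144,  2510, 192, 28, 263, 31.0),
}

def shader_tflops(alus, boost_mhz):
    # Assumes one fused multiply-add (2 FLOPs) per ALU per clock.
    return 2 * alus * boost_mhz * 1e6 / 1e12

def mem_bandwidth_gbs(bus_bits, gbps_per_pin):
    # Each bus bit carries gbps_per_pin gigabits per second; divide by 8 for bytes.
    return bus_bits * gbps_per_pin / 8

def density_mtr_per_mm2(transistors_b, die_mm2):
    # Billions of transistors -> millions, divided by die area.
    return transistors_b * 1000 / die_mm2

for name, (alus, boost, bus, gbps, die, trans) in SPECS.items():
    print(
        f"{name}: {shader_tflops(alus, boost):.1f} TFLOPS, "
        f"{mem_bandwidth_gbs(bus, gbps):.0f} GB/s, "
        f"{density_mtr_per_mm2(trans, die):.1f} Mtr/mm2"
    )
```

The same inputs also give 128 FP32 ALUs per SM for every card (for example 21760 / 170).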

Blackwell Architecture

  1. Tensor cores = FP4 support = reduces bandwidth, compute and memory requirements
  2. RT cores = 2x ray-triangle intersection rate (8x vs Ada's 4x), Linear Swept Spheres (RTX Hair), Triangle Cluster Intersection Engine + Triangle Cluster Decompression Engine + 0.75x RT memory footprint
     - Linear swept spheres = hair ray tracing with 3x less data to compute, lower VRAM cost, and faster ray intersections by using swept spheres instead of triangles
     - Triangle cluster engines = RTX Mega Geometry HW acceleration
  3. CUDA cores = 2x INT32 BW and compute. 128 INT32/FP32 ALUs per SM.
  4. GDDR7 = PAM3 signaling + superior efficiency (pJ/bit)
  5. Tensor + CUDA cores can share neural workloads + neural and shader code intermixing using the DirectX Cooperative Vectors API
  6. AI Management Processor = task scheduling and smart queue prioritization (smoothness and responsiveness) for AI workloads and neural rendering
  7. SER efficiency improved by 2x, benefits neural shaders, work graphs and path tracing
  8. Display Engine = DisplayPort 2.1 up to UHBR20 (20 Gbps per lane) + high speed HW flip metering
  9. Media Engine = up to 3 x 9th gen encoders and 2 x 6th gen decoders = AV1 UHQ, 2x H.264 decode, MV-HEVC (3D and VR), 4:2:2 encode/decode (professional) + 5% better AV1 and HEVC encoding quality
  10. FP4 + enhanced flip metering + AI Management Processor for MFG
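
To make the FP4 point concrete, here is a rough Python sketch of why 4-bit floats cut memory and bandwidth. It assumes the common E2M1 layout (1 sign, 2 exponent, 1 mantissa bit) and one shared scale per block. The sources above do not say which encoding or scaling scheme Blackwell uses, so treat the grid, `quantize_block_fp4` and `weight_bytes` as illustrative names only.

```python
# Positive magnitudes representable in E2M1 (assumed layout, see lead-in).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block_fp4(values):
    """Snap a block of floats to the E2M1 grid using one shared scale.

    Hypothetical scheme: the scale maps the largest magnitude in the block
    onto 6.0, the largest E2M1 value.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / E2M1_GRID[-1] if max_abs > 0 else 1.0
    quantized = []
    for v in values:
        # Pick the nearest representable magnitude, then restore the sign.
        magnitude = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        quantized.append(magnitude * scale if v >= 0 else -magnitude * scale)
    return quantized, scale

def weight_bytes(num_params, bits_per_param):
    # Raw storage for the weights only, ignoring the per-block scales.
    return num_params * bits_per_param / 8

values, scale = quantize_block_fp4([0.9, -0.31, 0.05, 0.47])
print(values, scale)
print(weight_bytes(1_000_000_000, 16) / weight_bytes(1_000_000_000, 4))
```

Ignoring scale overhead, the same weights take a quarter of the FP16 footprint, which is where the lower memory and bandwidth requirements come from. The price is the very coarse value grid.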

Efficiency Functionality

  1. Voltage optimized GDDR7 with ultra low power states
  2. Accelerated frequency switching = 1000x faster clock responsiveness (µs level switching speed), enables higher SM efficiency through rapid clock adjustments in dynamic workloads + quickly drops frequency when idle
  3. 300 MHz higher active state frequency vs Ada
  4. Low latency sleep + deeper power states
  5. Advanced power gating:
     - clock gating = clock tree can be disabled = turn off memory
     - power gating = turn off SMs and logic when idle
     - rail gating = 15x lower rail gate to core time + separated memory and core voltage rails

MFG + DLSS transformer

  1. New FG model in Warhammer 40,000 = +40% FG speed, -30% VRAM, +10% FPS = -400MB + accurate frame pacing
  2. DLSS Transformers (RR, Upscaler + DLAA) = superior detail, clarity (reduced ghosting) and temporal stability
  3. Reflex 2 = Frame Warp + inpainting = 50% lower PC latency vs Reflex or 75% lower vs native
  4. Superior MFG speed (FP4 + AMP) and frame consistency (flip metering) = 5-10x less frame variability and superior latency vs prior FG with MFG enabled
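
The percentages above imply a few numbers that are not stated outright. A quick Python sanity check, assuming the Reflex 2 figures describe the same scenario and that "-30% VRAM" and "-400MB" refer to the same measurement (neither is confirmed by the sources). The function names are made up for this example.

```python
def implied_reflex1_reduction(reflex2_vs_native, reflex2_vs_reflex1):
    """Latency reduction of the original Reflex vs native implied by the two Reflex 2 claims."""
    reflex2_latency = 1 - reflex2_vs_native                        # fraction of native latency left
    reflex1_latency = reflex2_latency / (1 - reflex2_vs_reflex1)   # undo the Reflex 2 gain
    return 1 - reflex1_latency

def implied_baseline_mb(saved_mb, saved_fraction):
    """VRAM used by the old FG model if saved_mb equals saved_fraction of it."""
    return saved_mb / saved_fraction

print(implied_reflex1_reduction(0.75, 0.50))
print(round(implied_baseline_mb(400, 0.30)))
```

So the Reflex 2 claims only line up if the original Reflex already halves latency vs native, and the old FG model would have used roughly 1.3 GB of VRAM in that test.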

Neural Rendering

  1. Available with RTX Kit (RTX skin and hair + more) + coming to RTX Remix
  2. Neural Textures = up to 7x better compression vs BCx at the same visual quality
  3. Neural Materials = less VRAM (47 -> 16MB) + offline render multi-layer approximation
  4. Neural Faces = supercomputer training for each character + more accurate face
  5. Neural Volumes (future)
  6. Neural Radiance Fields (NeRFs, future)
  7. Neural Radiance Cache = approximates infinite light bounces (as in offline rendering) using a radiance cache and path tracer = ~10% higher FPS and superior image quality. The model trains on-device while gaming.
  8. RTX Mega Geometry = ray trace against full detail Nanite (UE5 implementation) models and animated models, no low poly meshes = superior path tracing accuracy.

AI Democratization

  1. Everyone is a developer using Graph UI, Chat UI and Model Tuning
  2. Use vendor tools like CrewAI, ComfyUI, Flowise AI etc.
  3. NVIDIA NIM = Optimized prepackaged microservices for generative AI = Language, regional language, vision language, RAG, Speech, Animation, computer vision, image...
  4. AI Blueprint for RTX = accelerated development
  5. Support by many hardware vendors like ASUS, MSI, HP...

AI in Games

  1. Goal = Game worlds will become a Matrix of autonomous AI
  2. NVIDIA ACE = Autonomous Game characters using user interaction, perception, cognition and memory, action and animation and rendering + dedicated models for everything
  3. Audio2Face = automated facial animations
  4. AI Body Motion = automated character rigging and body animation
  5. Examples: PUBG Ally (autonomous companions), inZOI Smart Zoi (autonomous systems), MIR5 (autonomous enemies).
  6. Project G-Assist = diagnose and monitor performance, optimize settings, customize peripheral lighting and fan noise + control the system with voice or text commands. Powering MSI Center, MSI Afterburner, OMEN Gaming Hub and Streamlabs.

Other

The rest is a lot about professional workflows, generative AI and why Blackwell is great for them, plus telling reviewers how to do their job when reviewing Blackwell.