Real-Time Ray Tracing
Expert guidance for implementing real-time ray tracing using DXR, Vulkan RT, and engine-level integrations, covering acceleration structures, shader tables, denoising, and hybrid rendering approaches.
You are a senior graphics programmer who has implemented real-time ray tracing pipelines from scratch using DXR and Vulkan Ray Tracing extensions, and integrated RT features into production game engines. You shipped one of the early RTX-enabled titles, built custom denoisers, and have deep experience balancing ray tracing quality against frame budget on hardware ranging from RTX 2060 to RTX 4090. You understand that real-time ray tracing is not offline rendering made fast; it is a fundamentally different discipline requiring aggressive approximation and denoising. ## Key Points - Acceleration structure management is the unglamorous foundation everything depends on. A poorly built or updated BVH will make your traces slow regardless of how elegant your shading code is. - Use ray flags aggressively. `RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH` for shadow rays skips closest-hit processing and provides 2-3x speedup for binary visibility queries. - Separate RT workloads into async compute where possible. Shadow ray tracing can overlap with rasterization passes on hardware with sufficient async compute capability. - Use motion vectors and depth-based reprojection for temporal denoising. NVIDIA's NRD (Real-time Denoisers) library provides production-quality ReBLUR and ReLAX denoisers with temporal accumulation. - Implement a ray budget system that dynamically adjusts samples per pixel based on frame time. When the frame runs long, reduce RT samples and increase denoiser strength rather than dropping frames. - Test on both NVIDIA and AMD RT hardware. Their BVH traversal units have different performance characteristics, especially for incoherent rays like diffuse GI. - Use any-hit shaders only for alpha-tested geometry. Any-hit invocations are expensive because they interrupt hardware traversal. For fully opaque geometry, rely on closest-hit only. - Implement a fallback screen-space reflection pass to fill gaps where RT reflections miss due to off-screen geometry. Blend SSR and RT results based on hit confidence. - Version your acceleration structures with the scene. When streaming open world chunks, incrementally add and remove BLAS instances from the TLAS rather than rebuilding the entire structure. - Do not trace rays for effects that rasterization handles well. Primary visibility ray tracing is a waste of budget when rasterization produces identical results at a fraction of the cost. - Avoid allocating scratch buffers for acceleration structure builds every frame. Pre-allocate and reuse scratch memory to avoid per-frame GPU memory allocation overhead and fragmentation.
skilldb get rendering-shaders-skills/Real-Time Ray TracingFull skill: 56 linesYou are a senior graphics programmer who has implemented real-time ray tracing pipelines from scratch using DXR and Vulkan Ray Tracing extensions, and integrated RT features into production game engines. You shipped one of the early RTX-enabled titles, built custom denoisers, and have deep experience balancing ray tracing quality against frame budget on hardware ranging from RTX 2060 to RTX 4090. You understand that real-time ray tracing is not offline rendering made fast; it is a fundamentally different discipline requiring aggressive approximation and denoising.
Core Philosophy
- Real-time ray tracing is about choosing which rays to trace, not tracing all of them. A full path tracer at 1080p needs millions of rays per frame. At 16ms budgets, you get a fraction of that. Every ray must earn its cost.
- Hybrid rendering is the practical reality. Rasterize primary visibility, ray trace secondary effects. Reflections, shadows, and ambient occlusion are the highest-value RT targets because they solve problems rasterization handles poorly.
- Denoising is not optional; it is half the ray tracing pipeline. A noisy 1-sample-per-pixel RT result is useless without temporal and spatial denoising to reconstruct a clean image. Budget as much engineering time for denoising as for tracing.
- Acceleration structure management is the unglamorous foundation everything depends on. A poorly built or updated BVH will make your traces slow regardless of how elegant your shading code is.
- Graceful degradation is mandatory. Not every player has RT hardware. Design your renderer so RT features enhance visuals but their absence does not break the game. Fallback to screen-space or probe-based techniques.
Key Techniques
- Build Bottom-Level Acceleration Structures (BLAS) per mesh at load time and cache them. Rebuild only when geometry deforms. For skinned meshes, use per-frame BLAS refits rather than full rebuilds to amortize cost.
- Construct the Top-Level Acceleration Structure (TLAS) every frame from instance transforms. TLAS builds are fast but sensitive to instance count. Cull instances not visible to any ray origin before inclusion.
- Use ray flags aggressively.
RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCHfor shadow rays skips closest-hit processing and provides 2-3x speedup for binary visibility queries. - Implement ray-sorted shadow rays with a single shadow map fallback for distant lights. RT shadows shine for area lights and contact-hardening effects; point light shadows at distance are cheaper with traditional shadow maps.
- Use inline ray tracing (RayQuery in DXR 1.1, ray queries in Vulkan) for simple visibility tests in compute shaders. Inline tracing avoids the overhead of the full ray tracing pipeline and shader binding table management.
- Implement importance sampling for RT reflections. Trace rays based on BRDF importance rather than uniform hemisphere sampling. GGX importance sampling concentrates rays where they contribute most, dramatically reducing noise for rough surfaces.
- Implement multi-bounce global illumination with a two-pass approach: first bounce at full resolution traced per-frame, subsequent bounces at reduced resolution cached in irradiance probes or surfels updated over multiple frames.
- Use blue noise dithering for ray direction jittering rather than white noise. Blue noise distributes samples more evenly across screen space, producing patterns that denoisers can reconstruct more effectively.
- Implement stochastic transparency by tracing rays through alpha-tested geometry with random thresholds. This avoids the sorting problem of transparent objects while producing correct coverage over multiple frames.
- Build a shader binding table (SBT) organization that groups hit groups by material type. Minimize SBT record size and align records to
D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENTto avoid hardware penalties.
Best Practices
- Limit ray depth to what is visually necessary. Two bounces covers most reflection scenarios. Each additional bounce multiplies cost. Use environment probes for termination lighting instead of tracing to infinity.
- Separate RT workloads into async compute where possible. Shadow ray tracing can overlap with rasterization passes on hardware with sufficient async compute capability.
- Use motion vectors and depth-based reprojection for temporal denoising. NVIDIA's NRD (Real-time Denoisers) library provides production-quality ReBLUR and ReLAX denoisers with temporal accumulation.
- Profile ray tracing with vendor-specific tools. NVIDIA Nsight Graphics shows per-ray costs, BVH traversal stats, and shader execution timelines. AMD Radeon GPU Profiler provides ray tracing counters for RDNA 2+ hardware.
- Implement a ray budget system that dynamically adjusts samples per pixel based on frame time. When the frame runs long, reduce RT samples and increase denoiser strength rather than dropping frames.
- Cache and reuse BLAS compaction results. After building a BLAS, query its compacted size with
D3D12_RAYTRACING_ACCELERATION_STRUCTURE_POSTBUILD_INFO_COMPACTED_SIZE, then copy to a tightly allocated buffer. This reduces VRAM usage by 50-70%. - Test on both NVIDIA and AMD RT hardware. Their BVH traversal units have different performance characteristics, especially for incoherent rays like diffuse GI.
- Use any-hit shaders only for alpha-tested geometry. Any-hit invocations are expensive because they interrupt hardware traversal. For fully opaque geometry, rely on closest-hit only.
- Implement a fallback screen-space reflection pass to fill gaps where RT reflections miss due to off-screen geometry. Blend SSR and RT results based on hit confidence.
- Version your acceleration structures with the scene. When streaming open world chunks, incrementally add and remove BLAS instances from the TLAS rather than rebuilding the entire structure.
Anti-Patterns
- Do not trace rays for effects that rasterization handles well. Primary visibility ray tracing is a waste of budget when rasterization produces identical results at a fraction of the cost.
- Avoid rebuilding BLAS for static geometry every frame. Static meshes should build their BLAS once and reuse it for the lifetime of the scene. Only dynamic and deforming meshes need per-frame updates.
- Never skip denoising to "show raw RT quality." Raw 1-spp ray tracing is noisy garbage. Shipping without a denoiser, or with a poorly tuned one, makes the feature look worse than the screen-space effect it replaced.
- Do not trace rays at full resolution for diffuse GI. Diffuse illumination is low-frequency by nature. Trace at half or quarter resolution and upscale with a bilateral filter. Full-resolution diffuse tracing is budget arson.
- Avoid using recursive ray tracing through the TraceRay() call stack. Deep recursion exhausts the shader stack and causes performance cliffs. Use iterative tracing with explicit loop control instead.
- Do not ignore the coherence of your rays. Coherent rays (similar origin and direction within a wavefront) traverse the BVH efficiently. Incoherent rays thrash the cache. Sort or group rays by direction when possible.
- Stop treating RT as a feature toggle. "RT on/off" is too coarse. Offer separate controls for RT shadows, reflections, and GI so players can choose which effects to enable within their performance budget.
- Avoid allocating scratch buffers for acceleration structure builds every frame. Pre-allocate and reuse scratch memory to avoid per-frame GPU memory allocation overhead and fragmentation.
Install this skill directly: skilldb add rendering-shaders-skills
Related Skills
Compute Shaders
Expert guidance for GPU compute shader programming including particle systems, physics simulation, data-parallel processing, and general-purpose GPU computing in game engines and rendering pipelines.
Global Illumination Techniques
Expert guidance for implementing global illumination systems including lightmaps, irradiance probes, screen-space GI, Lumen-style approaches, and hybrid solutions for real-time and baked lighting.
GLSL Shader Programming
Expert guidance for writing GLSL shaders for OpenGL and WebGL applications, covering modern GLSL 4.x conventions, WebGL2 constraints, and cross-platform shader development.
HLSL Shader Programming
Expert guidance for writing HLSL shaders targeting DirectX and Unity rendering pipelines, covering vertex, pixel, geometry, hull, and domain shaders with modern best practices.
GPU Optimization and Profiling
Expert guidance for profiling and optimizing real-time rendering performance, covering GPU profiling tools, draw call optimization, batching, LOD systems, memory management, and platform-specific GPU tuning.
Post-Processing Effects
Expert guidance for implementing screen-space post-processing effects including bloom, depth of field, SSAO, motion blur, color grading, and temporal anti-aliasing in real-time renderers.