Introduction
This post will focus on ordering of the commands in the RenderContexts. I briefly touched on this subject in the last post, and if you’ve implemented a rendering engine before you’re probably not new to this problem. Basically we need a way to make sure our RenderJobPackages (draw calls) end up on the screen in the correct order, both from a visual point of view and from a performance point of view. Some concrete examples:
- Make sure g-buffers and shadow maps are rendered before any lighting happens.
- Make sure opaque geometry is rendered front to back to reduce overdraw.
- Make sure transparent geometry is rendered back to front for alpha blending to generate correct results.
- Make sure the sky dome is rendered after all opaque geometry but before any transparent geometry.
- All of the above but also strive to reduce state switches as much as possible.
- All of the above but depending on GPU architecture maybe shift some work around to better utilize the hardware.
There are many ways of tackling this problem, and it’s not uncommon for engines to use multiple sorting systems and spend quite a lot of frame time getting this right.
Personally I’m a big fan of explicit ordering with a single stable sort. What I mean by explicit ordering is that every command that gets recorded to a RenderContext already has the knowledge of when it will be executed relative to other commands. For us this knowledge is in the form of a 64-bit sort_key. In the case where we get two commands with the exact same sort_key, we rely on the sort being stable to not introduce any kind of temporal instabilities in the final output.
The reasons I like this approach are many:
- It’s trivial to implement compared to various bucketing schemes and sorting of those buckets.
- We only need to visit renderable objects once per view (when calling their render() function), no additional pre-visits for sorting are needed.
- The sort is typically fast, and cost is isolated and easy to profile.
- Parallel rendering works out of the box, we can just take all the Command arrays of all the RenderContexts and merge them before sorting.
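The merge-then-sort idea from the last bullet can be sketched in a few lines. This is only an illustration under stated assumptions: the `Command` struct and `merge_and_sort` function below are hypothetical stand-ins (the real Command type carries more than a key and a payload), and `std::stable_sort` is used here simply because it is the most direct way to show the stability requirement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a recorded command; not the engine's actual type.
struct Command {
    std::uint64_t sort_key; // absolute position of the command within the frame
    int payload;            // placeholder for the real command data
};

// Concatenate the command arrays of all contexts, then stable-sort by key.
// Commands with equal keys keep their original (context, recording) order.
std::vector<Command> merge_and_sort(const std::vector<std::vector<Command>>& contexts) {
    std::size_t total = 0;
    for (const auto& commands : contexts)
        total += commands.size();

    std::vector<Command> merged;
    merged.reserve(total);
    for (const auto& commands : contexts)
        merged.insert(merged.end(), commands.begin(), commands.end());
    assert(merged.size() == total);

    std::stable_sort(merged.begin(), merged.end(),
                     [](const Command& a, const Command& b) { return a.sort_key < b.sort_key; });
    return merged;
}
```

Because the comparison looks only at `sort_key`, two commands with identical keys come out in the order they were merged, which is what keeps the output from flickering between frames.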
To make this work each command needs to know its absolute sort_key. Let’s break down the sort_key we use when working with our data-driven rendering pipe in Stingray. (Note: if the user doesn’t care about playing nicely together with our system for data-driven rendering it is fine to completely ignore the bit allocation patterns described below and roll their own.)
sort_key breakdown
Most significant bit on the left, here are our bit ranges:
MSB [ 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ] LSB
      ^ ^       ^  ^                                   ^^                 ^
      | |       |  |                                   ||                 |- 3 bits - Shader System (Pass Immediate)
      | |       |  |                                   ||- 16 bits - Depth
      | |       |  |                                   |- 1 bit - Instance bit
      | |       |  |- 32 bits - User defined
      | |       |- 3 bits - Shader System (Pass Deferred)
      | |- 7 bits - Layer System
      |- 2 bits - Unused
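The bit ranges above can be written down as shift constants, which also makes it easy to check that the widths add up to 64. This is a sketch only: the constant names and the `make_sort_key` helper are made up for illustration and are not Stingray identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Field widths taken from the diagram, listed MSB to LSB. Names are illustrative.
constexpr unsigned UNUSED_BITS = 2, LAYER_BITS = 7, PASS_DEFERRED_BITS = 3, USER_BITS = 32,
                   INSTANCE_BITS = 1, DEPTH_BITS = 16, PASS_IMMEDIATE_BITS = 3;

// The shift of a field is the sum of the widths of all fields to its right.
constexpr unsigned PASS_IMMEDIATE_SHIFT = 0;
constexpr unsigned DEPTH_SHIFT = PASS_IMMEDIATE_SHIFT + PASS_IMMEDIATE_BITS;  // 3
constexpr unsigned INSTANCE_SHIFT = DEPTH_SHIFT + DEPTH_BITS;                 // 19
constexpr unsigned USER_SHIFT = INSTANCE_SHIFT + INSTANCE_BITS;               // 20
constexpr unsigned PASS_DEFERRED_SHIFT = USER_SHIFT + USER_BITS;              // 52
constexpr unsigned LAYER_SHIFT = PASS_DEFERRED_SHIFT + PASS_DEFERRED_BITS;    // 55

static_assert(LAYER_SHIFT + LAYER_BITS + UNUSED_BITS == 64, "fields must fill 64 bits");

// Hypothetical helper that packs every field into one key.
inline std::uint64_t make_sort_key(unsigned layer, unsigned pass_deferred, std::uint32_t user,
                                   bool instanced, std::uint16_t depth, unsigned pass_immediate) {
    assert(layer < (1u << LAYER_BITS));
    assert(pass_deferred < (1u << PASS_DEFERRED_BITS));
    assert(pass_immediate < (1u << PASS_IMMEDIATE_BITS));
    return (static_cast<std::uint64_t>(layer) << LAYER_SHIFT) |
           (static_cast<std::uint64_t>(pass_deferred) << PASS_DEFERRED_SHIFT) |
           (static_cast<std::uint64_t>(user) << USER_SHIFT) |
           (static_cast<std::uint64_t>(instanced) << INSTANCE_SHIFT) |
           (static_cast<std::uint64_t>(depth) << DEPTH_SHIFT) |
           (static_cast<std::uint64_t>(pass_immediate) << PASS_IMMEDIATE_SHIFT);
}
```

Since the layer range is the most significant one in use, comparing two keys as plain integers orders by layer first, then deferred pass, then user bits, and so on down to the immediate pass bits.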
2 bits - Unused
Nothing to see here, moving on… (Not really sure why these 2 bits are unused, I guess they weren’t at some point but for the moment they are always zero) :)
7 bits - Layer System
This 7-bit range is managed by the “Layer system”. The Layer system is responsible for controlling the overall scheduling of a frame and is set up in the render_config file. It’s a central part of the data-driven rendering architecture in Stingray. It allows you to configure what layers to expose to the shader system and in which order these layers should be drawn. We will look closer at the implementation of the layer system in a later post, but in the interest of clarifying how it interoperates with the sort_key here’s a small example:
default = [
    // sort_key = [ 00000000 10000000 00000000 00000000 00000000 00000000 00000000 00000000 ]
    { name="gbuffer" render_targets=["gbuffer0", "gbuffer1", "gbuffer2", "gbuffer3"]
      depth_stencil_target="depth_stencil_buffer" sort="FRONT_BACK" profiling_scope="gbuffer" }
    // sort_key = [ 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ]
    { name="decals" render_targets=["gbuffer0" "gbuffer1"] depth_stencil_target="depth_stencil_buffer"
      profiling_scope="decal" sort="EXPLICIT" }
    // sort_key = [ 00000001 10000000 00000000 00000000 00000000 00000000 00000000 00000000 ]
    { resource_generator="lighting" profiling_scope="lighting" }
    // sort_key = [ 00000010 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ]
    { name="emissive" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer"
      sort="FRONT_BACK" profiling_scope="emissive" }
]
Above we have three layers exposed to the shader system and one kick of a resource_generator called lighting (more about resource_generators in a later post). The layers are rendered in the order they are declared. This is handled by letting each new layer increment the 7-bit range belonging to the Layer System by 1 (as can be seen in the sort_key comments above).
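The sort_key comments in the example follow directly from the bit layout: the Layer System range starts at bit 55, so the n-th declared entry gets the value n shifted left by 55. A minimal sketch of that assignment, assuming a hypothetical `assign_layer_keys` function and `LayerKey` struct (the actual layer system implementation is left for a later post):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

constexpr unsigned LAYER_SHIFT = 55; // 3 + 16 + 1 + 32 + 3 bits sit below the layer range
constexpr unsigned LAYER_BITS = 7;

// Hypothetical record pairing a declared entry with its layer sort_key.
struct LayerKey {
    std::string name;
    std::uint64_t sort_key;
};

// Give every declared entry (layers and resource generator kicks alike)
// the next value in the 7-bit layer range, starting at 1.
std::vector<LayerKey> assign_layer_keys(const std::vector<std::string>& declared) {
    assert(declared.size() < (1u << LAYER_BITS)); // values 1..127 fit in 7 bits
    std::vector<LayerKey> keys;
    std::uint64_t value = 0;
    for (const auto& name : declared) {
        ++value;
        keys.push_back({name, value << LAYER_SHIFT});
    }
    return keys;
}
```

Feeding it the four entries from the example, `{"gbuffer", "decals", "lighting", "emissive"}`, yields the keys `0x0080000000000000`, `0x0100000000000000`, `0x0180000000000000` and `0x0200000000000000`, which are the same bit patterns as in the comments.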
The shader author dictates into which layer(s) the shader renders. When a RenderJobPackage is recorded to the RenderContext (as described in the last post) the correct layer sort_keys are looked up from the layer system and the result is bitwise ORed together with the sort_key value piped as argument to RenderContext::render().
3 bits - Shader System (Pass Deferred)
The next 3 bits are controlled by the Shader System. These three bits encode the shader pass index within a layer. When I say shader in this context I refer to our ShaderTemplate::Context, which is basically a wrapper around multiple linked shaders rendering into one or many layers. (Nathan Reed recently blogged about “The Many Meanings of “Shader””; in his analogy our ShaderTemplate is the same as an “Effect”.)
Since we can have a multi-pass shader rendering into the same layer we need to encode the pass index into the sort_key; that is what this 3-bit range is used for.
32 bits - User defined
We then have 32 user-defined bits. These bits are primarily used by our “Resource Generator” system (I will be covering this system in the post about render_config & data-driven rendering later), but the user is free to use them any way they like and still maintain compatibility with the data-driven rendering system.
1 bit - Instance bit
This single bit also comes from the Shader System and is set if the shader implements support for “Instance Merging”. I will be covering this in a bit more detail in my next post about the RenderDevice, but essentially this bit allows us to scan through all commands and find ranges of commands that can potentially be merged into fewer draw calls.
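To give a rough feel for what such a scan could look like, here is one plausible sketch. It is a guess, not the RenderDevice implementation: it assumes that candidates are runs of consecutive sorted commands that have the instance bit set and agree on every bit from the instance bit upwards (layer, deferred pass, user bits). The function name and that grouping rule are both assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr unsigned INSTANCE_SHIFT = 19; // 3 immediate-pass bits + 16 depth bits sit below
constexpr std::uint64_t INSTANCE_BIT = std::uint64_t{1} << INSTANCE_SHIFT;
constexpr std::uint64_t BATCH_MASK = ~(INSTANCE_BIT - 1); // instance bit and everything above

// Returns half-open [begin, end) index ranges of two or more consecutive keys
// that have the instance bit set and share all bits covered by BATCH_MASK.
std::vector<std::pair<std::size_t, std::size_t>>
find_merge_candidates(const std::vector<std::uint64_t>& sorted_keys) {
    assert(std::is_sorted(sorted_keys.begin(), sorted_keys.end()));
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t i = 0;
    while (i < sorted_keys.size()) {
        std::size_t j = i + 1;
        if (sorted_keys[i] & INSTANCE_BIT) {
            const std::uint64_t group = sorted_keys[i] & BATCH_MASK;
            while (j < sorted_keys.size() && (sorted_keys[j] & BATCH_MASK) == group)
                ++j;
            if (j - i >= 2)
                ranges.emplace_back(i, j);
        }
        i = j;
    }
    return ranges;
}
```

Whether the commands in a range can really be merged would still depend on information outside the key (mesh, material, and so on), which is why the text above says "potentially".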
16 bits - Depth
One of the arguments piped to RenderContext::render() is an unsigned normalized depth value (0.0-1.0). This value gets quantized into these 16 bits and is what drives the front-to-back vs back-to-front sorting of RenderJobPackages. If the sorting criterion for the layer (see layer example above) is set to back-to-front we simply flip the bits in this range.
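A small sketch of the quantize-and-flip step, under a few assumptions: the rounding (add 0.5, then truncate) is my choice and may not match the engine, and the `SortMode` enum and `depth_bits` function are illustrative names. The shift of 3 comes from the three immediate-pass bits that sit below the depth range.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned DEPTH_SHIFT = 3;          // 3 immediate-pass bits sit below the depth range
constexpr std::uint64_t DEPTH_MAX = 0xFFFF;  // largest value of the 16-bit range

enum class SortMode { FrontBack, BackFront };

// Quantize a normalized depth in [0, 1] to 16 bits and place it in the key.
// For back-to-front layers the 16 bits are inverted, so far objects sort first.
inline std::uint64_t depth_bits(float depth, SortMode mode) {
    assert(depth >= 0.0f && depth <= 1.0f);
    std::uint64_t q = static_cast<std::uint64_t>(depth * 65535.0f + 0.5f); // assumed rounding
    if (mode == SortMode::BackFront)
        q = ~q & DEPTH_MAX; // same as DEPTH_MAX - q
    return q << DEPTH_SHIFT;
}
```

Flipping the bits of a 16-bit value `q` gives `65535 - q`, so the ascending sort that produces front-to-back order for one layer produces back-to-front order for another without a second sorting pass.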
3 bits - Shader System (Pass Immediate)
A shader can be configured to run in “Immediate Mode” instead of “Deferred Mode” (default). This forces passes in a multi-pass shader to run immediately after each other and is achieved by moving the pass index bits into the least significant bits of the sort_key. The concept is probably easiest to explain with an artificial example and some pseudo code:
Take a simple scene with a few instances of the same mesh, each mesh recording one RenderJobPackage to one or many RenderContexts, with all RenderJobPackages being rendered with the same multi-pass shader.
In “Deferred Mode” (i.e. pass indices encoded in the “Shader System (Pass Deferred)” range) you would get something like this:
foreach (pass in multi-pass-shader)
    foreach (render-job in render-job-packages)
        render (render-job)
    end
end
If the shader is configured to run in “Immediate Mode” you would instead get something like this:
foreach (render-job in render-job-packages)
    foreach (pass in multi-pass-shader)
        render (render-job)
    end
end
As you can probably imagine, the latter results in more shader / state switches but can sometimes be necessary to guarantee correctly rendered results. A typical example is when using multi-pass shaders that do alpha blending.
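The two loop orders fall out of the sort purely from where the pass index is placed in the key. The sketch below builds keys from only a quantized depth and a pass index (all other ranges left at zero) and sorts them. The `Draw` struct, `pass_key` and `sorted_draws` are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr unsigned PASS_IMMEDIATE_SHIFT = 0;
constexpr unsigned DEPTH_SHIFT = 3;
constexpr unsigned PASS_DEFERRED_SHIFT = 52;

enum class PassMode { Deferred, Immediate };

// Hypothetical record so the resulting order can be inspected.
struct Draw {
    std::uint64_t sort_key;
    unsigned job;
    unsigned pass;
};

// The pass index goes either above the depth bits (deferred) or below them (immediate).
inline std::uint64_t pass_key(std::uint16_t depth, unsigned pass, PassMode mode) {
    assert(pass < 8); // the pass index must fit in 3 bits
    const unsigned shift = mode == PassMode::Deferred ? PASS_DEFERRED_SHIFT : PASS_IMMEDIATE_SHIFT;
    return (static_cast<std::uint64_t>(depth) << DEPTH_SHIFT) |
           (static_cast<std::uint64_t>(pass) << shift);
}

// Record every (job, pass) combination, then sort by key like the renderer would.
std::vector<Draw> sorted_draws(const std::vector<std::uint16_t>& job_depths,
                               unsigned pass_count, PassMode mode) {
    std::vector<Draw> draws;
    for (unsigned job = 0; job < job_depths.size(); ++job)
        for (unsigned pass = 0; pass < pass_count; ++pass)
            draws.push_back({pass_key(job_depths[job], pass, mode), job, pass});
    std::stable_sort(draws.begin(), draws.end(),
                     [](const Draw& a, const Draw& b) { return a.sort_key < b.sort_key; });
    return draws;
}
```

With two jobs at depths 10 and 20 and two passes, deferred mode sorts to (job 0, pass 0), (job 1, pass 0), (job 0, pass 1), (job 1, pass 1), while immediate mode sorts to (job 0, pass 0), (job 0, pass 1), (job 1, pass 0), (job 1, pass 1). Note that in this simplified sketch two jobs with identical depth and identical higher bits would still end up grouped by pass in immediate mode, because the pass bits are then the only bits that differ.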
Wrap up
The actual sort is implemented using a standard stable radix sort and happens immediately after the user has called RenderDevice::dispatch(), handing over n RenderContexts to the RenderDevice for translation into graphics API calls.
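For readers who haven't written one, a textbook least-significant-digit radix sort over 64-bit keys looks roughly like the sketch below. It is not the engine's code: the 8-bit digit size, the `Command` stand-in and the function name are all assumptions. Each pass is a counting sort on one byte, and counting sort preserves the order of equal digits, which is what makes the whole sort stable.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a recorded command.
struct Command {
    std::uint64_t sort_key;
    int payload;
};

// Stable LSD radix sort: eight counting-sort passes, one per key byte,
// starting from the least significant byte.
void radix_sort(std::vector<Command>& commands) {
    std::vector<Command> scratch(commands.size());
    for (unsigned shift = 0; shift < 64; shift += 8) {
        // offsets[b + 1] first counts the keys whose current byte equals b.
        std::array<std::size_t, 257> offsets{};
        for (const Command& c : commands)
            ++offsets[((c.sort_key >> shift) & 0xFF) + 1];
        // Prefix sum: offsets[b] becomes the first output index of bucket b.
        for (std::size_t i = 1; i < offsets.size(); ++i)
            offsets[i] += offsets[i - 1];
        // Scatter in input order, so equal bytes keep their relative order.
        for (const Command& c : commands)
            scratch[offsets[(c.sort_key >> shift) & 0xFF]++] = c;
        commands.swap(scratch);
    }
    assert(std::is_sorted(commands.begin(), commands.end(),
                          [](const Command& a, const Command& b) { return a.sort_key < b.sort_key; }));
}
```

The cost is linear in the number of commands with a fixed number of passes, which fits the earlier point about the sort being fast and easy to profile in isolation.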
Next post will cover this and give an overview of what a typical rendering back-end (RenderDevice) looks like in Stingray. Stay tuned.