Friday, February 17, 2017

Stingray Renderer Walkthrough #5: RenderDevice

Overview

The RenderDevice is essentially our abstraction layer for platform specific rendering APIs. It is implemented as an abstract base class that various rendering back-ends (D3D11, D3D12, OGL, Metal, GNM, etc.) implement.

The RenderDevice has a bunch of helper functions for initializing/shutting down the graphics APIs, creating/destroying swap chains, etc. These are all fairly straightforward so I won’t cover them in this post. Instead I will focus on the two dispatch functions consuming RenderResourceContexts and RenderContexts:


class RenderDevice {
public: 
    virtual void dispatch(uint32_t n_contexts, RenderResourceContext **rrc, 
        uint32_t gpu_affinity_mask = RenderContext::GPU_DEFAULT) = 0;

    virtual void dispatch(uint32_t n_contexts, RenderContext **rc, 
        uint32_t gpu_affinity_mask = RenderContext::GPU_DEFAULT) = 0;
};

Resource Management

As covered in the post about RenderResourceContexts, they provide a free-threaded interface for allocating and deallocating GPU resources. However, it is not until the user has called RenderDevice::dispatch(), handing over the RenderResourceContexts, that their representation gets created on the RenderDevice side.

All implementations of a RenderDevice have some form of resource management that deals with creating, updating and destroying the graphics API specific representations of resources. Typically we track the state of all the various types of resources in a single struct. Here’s a stripped down example from the DX12 RenderDevice implementation, called D3D12ResourceContext:


struct D3D12VertexBuffer
{
    D3D12_VERTEX_BUFFER_VIEW view;
    uint32_t allocation_index;
    int32_t size;
};

struct D3D12IndexBuffer
{
    D3D12_INDEX_BUFFER_VIEW view;
    uint32_t allocation_index;
    int32_t size;
};

struct D3D12ResourceContext 
{
    Array<D3D12VertexBuffer> vertex_buffers;
    Array<uint32_t> unused_vertex_buffers;

    Array<D3D12IndexBuffer> index_buffers;
    Array<uint32_t> unused_index_buffers;

    // .. lots of other resources

    Array<uint32_t> resource_lut;
};

As you might remember, the linking between the engine representation and the RenderDevice representation is done using the RenderResource::render_resource_handle. It encodes both the type of the resource and a handle. The resource_lut is an indirection to go from the engine handle to a local index for a specific type (e.g. vertex_buffers or index_buffers in the sample above). We also track freed indices for each type (e.g. unused_vertex_buffers) to simplify recycling of slots.
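To make the indirection concrete, here is a minimal sketch of a lookup table combined with a free list. It is not the Stingray code: it uses std::vector instead of the engine's Array, handles only one resource type, and assumes the type bits have already been stripped from the handle. The names alloc_vertex_buffer, free_vertex_buffer, lookup and INVALID_INDEX are made up for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch; names and layout are assumptions, not engine code.
constexpr uint32_t INVALID_INDEX = 0xffffffffu;

struct VertexBuffer { uint32_t allocation_index = 0; int32_t size = 0; };

struct ResourceContext {
    std::vector<VertexBuffer> vertex_buffers;
    std::vector<uint32_t> unused_vertex_buffers; // freed slots, ready for reuse
    std::vector<uint32_t> resource_lut;          // engine handle -> local index

    // Allocate a slot for `handle`, recycling a freed index when one exists.
    uint32_t alloc_vertex_buffer(uint32_t handle, int32_t size) {
        uint32_t index;
        if (!unused_vertex_buffers.empty()) {
            index = unused_vertex_buffers.back();
            unused_vertex_buffers.pop_back();
        } else {
            index = static_cast<uint32_t>(vertex_buffers.size());
            vertex_buffers.emplace_back();
        }
        vertex_buffers[index].size = size;
        if (handle >= resource_lut.size())
            resource_lut.resize(handle + 1, INVALID_INDEX);
        resource_lut[handle] = index;
        return index;
    }

    // Release the slot owned by `handle` so a later allocation can reuse it.
    void free_vertex_buffer(uint32_t handle) {
        unused_vertex_buffers.push_back(resource_lut[handle]);
        resource_lut[handle] = INVALID_INDEX;
    }

    // Translate an engine handle into the device-side representation.
    VertexBuffer &lookup(uint32_t handle) {
        return vertex_buffers[resource_lut[handle]];
    }
};
```

Because freed indices are reused, the per-type arrays stay dense even when resources are created and destroyed frequently.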

The implementation of the dispatch function is fairly straightforward. We simply iterate over all the RenderResourceContexts and, for each context, iterate over its commands and either allocate or deallocate resources in the D3D12ResourceContext. It is important to note that this is a synchronous operation: nothing else is peeking or poking at the D3D12ResourceContext while the dispatch of RenderResourceContexts is happening, which makes our life a lot easier.

Unfortunately that isn’t the case when we dispatch RenderContexts, as in that case we want to go wide (i.e. fork the workload and process it using multiple worker threads) when translating the commands into API calls. While we don’t allow allocating and deallocating resources from the RenderContexts, we do allow updating them, which mutates the state of the RenderDevice representations (e.g. a D3D12VertexBuffer).

At the moment our solution for this isn’t very nice: basically we don’t allow asynchronous updates for anything other than DYNAMIC buffers. UPDATABLE buffers are always updated serially before we kick the worker threads, no matter what their sort_key is. All worker threads access resources through their own copy of something we call a ResourceAccessor, which is responsible for tracking the worker thread’s state of dynamic buffers (among other things). In the future I think we probably should generalize this and treat UPDATABLE buffers in a similar way.

(Note: this limitation doesn’t mean you can’t update an UPDATABLE buffer more than once per frame, it simply means you cannot update it more than once per dispatch).

Shaders

Resources in the D3D12ResourceContext are typically buffers. One exception that stands out is the RenderDevice representation of a “shader”. A “shader” on the RenderDevice side maps to a ShaderTemplate::Context on the engine side, or what I guess we could call a multi-pass shader. Here’s some pseudo code:


struct ShaderPass
{
    struct ShaderProgram
    {
        Array<uint8_t> bytecode;
        struct ConstantBufferBindInfo;
        struct ResourceBindInfo;
        struct SamplerBindInfo;
    };
    ShaderProgram vertex_shader;
    ShaderProgram domain_shader;
    ShaderProgram hull_shader;
    ShaderProgram geometry_shader;
    ShaderProgram pixel_shader;
    ShaderProgram compute_shader;

    struct RenderStates;
};

struct Shader
{
    Vector<ShaderPass> passes;
    enum SortMode { IMMEDIATE, DEFERRED };
    uint32_t sort_mode;
};

The pseudo code above is essentially the RenderDevice representation of a shader that we serialize to disk during data compilation. From that we can create all the necessary graphics API specific objects expressing an executable shader together with its various state blocks (Rasterizer, Depth Stencil, Blend, etc.).

As discussed in the last post the sort_key encodes the shader pass index. Using Shader::sort_mode, we know which bit range to extract from the sort_key as pass index, which we then use to look up the ShaderPass from Shader::passes. A ShaderPass contains one ShaderProgram per active shader stage and each ShaderProgram contains the byte code for the shader to compile as well as “bind info” for various resources that the shader wants as input.
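As a rough illustration of the pass lookup, the sketch below extracts a pass index from a 64-bit sort_key depending on the sort mode. The bit positions and widths are invented for the example (the actual layout is described in the previous post and is not reproduced here), and pass_bit_range and pass_index are hypothetical helper names.

```cpp
#include <cstdint>

enum SortMode : uint32_t { IMMEDIATE, DEFERRED };

struct BitRange { uint32_t shift; uint32_t bits; };

// Hypothetical bit ranges, chosen only for illustration.
inline BitRange pass_bit_range(SortMode mode) {
    return mode == IMMEDIATE ? BitRange{0, 3} : BitRange{29, 3};
}

// Extract the shader pass index from the bit range selected by the sort mode.
inline uint32_t pass_index(uint64_t sort_key, SortMode mode) {
    const BitRange r = pass_bit_range(mode);
    const uint64_t mask = (uint64_t(1) << r.bits) - 1;
    return static_cast<uint32_t>((sort_key >> r.shift) & mask);
}
```

The returned index would then be used to pick the ShaderPass out of Shader::passes.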

We will look at this in a bit more detail in the post about “Shaders & Materials”, for now I just wanted to familiarize you with the concept.

Render Context translation

Let’s move on and look at the dispatch for translating RenderContexts into graphics API calls:

class RenderDevice {
public: 
    virtual void dispatch(uint32_t n_contexts, RenderContext **rc, 
        uint32_t gpu_affinity_mask = RenderContext::GPU_DEFAULT) = 0;
};

The first thing all RenderDevice implementations do when receiving a bunch of RenderContexts is to merge and sort their Commands. All implementations share the same code for doing this:

void prepare_command_list(RenderContext::Commands &output, unsigned n_contexts, RenderContext **contexts);

This function basically just takes the RenderContext::Commands from all RenderContexts and merges them into a new array, runs a stable radix sort, and returns the sorted commands in output. To avoid memory allocations the RenderDevice implementation owns the memory of the output buffer.
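For readers who want to see the shape of such a function, here is a simplified sketch. It assumes a Command with a 64-bit sort_key, takes plain command arrays rather than RenderContexts, and adds an explicit scratch buffer parameter standing in for memory owned by the RenderDevice. The sort is a byte-wise LSD radix sort, which is stable because every counting pass preserves the relative order of equal digits. None of this is the actual Stingray implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for RenderContext::Command (assumption).
struct Command { uint64_t sort_key; void *head; uint32_t command_flags; };
using Commands = std::vector<Command>;

// Stable LSD radix sort on sort_key, one byte per pass (8 passes).
void radix_sort(Commands &cmds, Commands &scratch) {
    scratch.resize(cmds.size());
    for (unsigned shift = 0; shift < 64; shift += 8) {
        std::array<size_t, 257> offsets{};
        // Histogram of the current byte, shifted by one for the prefix sum.
        for (const Command &c : cmds)
            ++offsets[((c.sort_key >> shift) & 0xff) + 1];
        for (size_t i = 1; i < offsets.size(); ++i)
            offsets[i] += offsets[i - 1];
        // Scatter in input order, which keeps equal keys in their original order.
        for (const Command &c : cmds)
            scratch[offsets[(c.sort_key >> shift) & 0xff]++] = c;
        cmds.swap(scratch); // even number of passes, so the result ends up in cmds
    }
}

// Merge the commands of all contexts into `output` and sort them by sort_key.
void prepare_command_list(Commands &output, Commands &scratch,
                          unsigned n_contexts, const Commands *const *contexts) {
    output.clear();
    for (unsigned i = 0; i < n_contexts; ++i)
        output.insert(output.end(), contexts[i]->begin(), contexts[i]->end());
    radix_sort(output, scratch);
}
```

Stability matters here: commands that share a sort_key keep the order in which they were recorded and merged.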

Now we have all the commands nicely sorted based on their sort_key. Next step is to do the actual translation of the data referenced by the commands into graphics API calls. I will explain this process with the assumption that we are running on a graphics API that allows us to build graphics API command lists in parallel (e.g. DX12, GNM, Vulkan, Metal), as that feels most relevant in 2017.

Before we start figuring out our per thread workloads for going wide, we have one more thing to do: “instance merging”.

Instance Merging

I’ve mentioned the idea behind instance merging before [1,2]. Basically we want to reduce the number of RenderJobPackages (i.e. draw calls) by identifying packages that are similar enough to be merged. In Stingray “similar enough” means that they must have identical inputs to the input assembler as well as identical resources bound to all shader stages; the only thing that is allowed to differ is constant buffer variables. (Note: by today’s standards this can be considered a bit old school. New graphics APIs and hardware allow us to tackle this problem more aggressively using “bindless” concepts.)

The way it works is by filtering out ranges of RenderContext::Commands where the “instance bit” of the sort_key is set and all bits above the instance bit are identical. Then for each of those ranges we fork and go wide to analyze the actual RenderJobPackage data to see if the instance_hash and the shader are the same, and if so we know it’s safe to merge them.
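The range filtering step can be sketched as a single linear scan over the sorted keys. The position of the instance bit below is an arbitrary choice for the example, and find_instance_ranges is a hypothetical helper operating on bare keys rather than on real commands.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical position of the instance bit within the sort_key.
constexpr unsigned INSTANCE_BIT = 20;
constexpr uint64_t INSTANCE_MASK = uint64_t(1) << INSTANCE_BIT;
constexpr uint64_t UPPER_MASK = ~((INSTANCE_MASK << 1) - 1); // bits above the instance bit

// Returns half-open [begin, end) ranges of merge candidates in sorted keys.
std::vector<std::pair<size_t, size_t>>
find_instance_ranges(const std::vector<uint64_t> &sorted_keys) {
    std::vector<std::pair<size_t, size_t>> ranges;
    size_t i = 0;
    while (i < sorted_keys.size()) {
        if (!(sorted_keys[i] & INSTANCE_MASK)) { ++i; continue; }
        const uint64_t upper = sorted_keys[i] & UPPER_MASK;
        size_t end = i + 1;
        // Extend the range while the instance bit is set and the upper bits match.
        while (end < sorted_keys.size() &&
               (sorted_keys[end] & INSTANCE_MASK) &&
               (sorted_keys[end] & UPPER_MASK) == upper)
            ++end;
        if (end - i > 1) // a single command has nothing to merge with
            ranges.emplace_back(i, end);
        i = end;
    }
    return ranges;
}
```

Each returned range is only a candidate; the instance_hash and shader comparison described above still decides whether the packages are actually merged.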

The actual merge is done by extracting the instance specific constants (these are tagged by the shader author) from the constant buffers and propagating them into a dynamic RawBuffer that gets bound as input to the vertex shader.

Depending on how the scene is constructed, instance merging can significantly reduce the number of draw calls needed to render the final scene. The instance merger itself is not graphics API specific and is isolated in its own system; it just happens to be the responsibility of the RenderDevice to call it. The interface looks like this:

namespace instance_merger {

struct ProcessMergedCommandsResult
{
    uint32_t n_instances;
    uint32_t instanced_batches;
    uint32_t instance_buffer_size;
};

ProcessMergedCommandsResult process_merged_commands(Merger &instance_merger, 
    RenderContext::Commands &merged_commands);

}

Pass in a reference to the sorted RenderContext::Commands in merged_commands and after the instance merger is done running you hopefully have fewer commands in the array. :)

You could argue that merging, sorting and instance merging should all happen before we enter the world of the RenderDevice. I wouldn’t argue against that.

Prepare workloads

Last step before we can start translating our commands into state / draw / dispatch calls is to split the workload into reasonable chunks and prepare the execution contexts for our worker threads.

Typically we just divide the number of RenderContext::Commands we have to process by the number of worker threads we have available. We don’t look at the types of the commands we will be processing or try to load balance based on them. The reasoning behind this is that we anticipate that draw calls will always represent the bulk of the commands and the rest of the commands can be considered unavoidable “noise”. We do, however, make sure that we don’t process fewer than x commands per worker thread, where x can differ a bit depending on platform but is usually ~128.

For each execution context we create a ResourceAccessor (described above) and make sure we have the correct state set up in terms of bound render targets and similar. To do this we are stuck with having to do a synchronous serial sweep over all the commands to find bigger state changing commands (such as RenderContext::set_render_target).

This is where the Command::command_flags bit-flag comes into play. Instead of having to jump around in memory to figure out what type of command the Command::head points to, we put some hinting about the type in the Command::command_flags, for example whether it is a “state command”. This way the serial sweep doesn’t become very costly even when dealing with a large number of commands. During this sweep we also deal with updating of UPDATABLE resources, and on newer graphics APIs we track fences (discussed in the post about Render Contexts).

The last thing we do is to set up the execution contexts with newly created graphics API specific representations of command lists (e.g. ID3D12GraphicsCommandList in DX12).
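A minimal sketch of such a split, assuming commands are addressed by index and that the minimum chunk size is passed in as a parameter. The names Range and split_workload are made up for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Range { uint32_t begin, end; }; // half-open [begin, end)

// Split n_commands into at most n_workers contiguous ranges while trying to
// keep at least min_per_worker commands in each range.
std::vector<Range> split_workload(uint32_t n_commands, uint32_t n_workers,
                                  uint32_t min_per_worker = 128) {
    std::vector<Range> ranges;
    if (n_commands == 0 || n_workers == 0)
        return ranges;
    min_per_worker = std::max<uint32_t>(1, min_per_worker);
    // Never create more chunks than the minimum chunk size allows.
    const uint32_t max_chunks = std::max<uint32_t>(1, n_commands / min_per_worker);
    const uint32_t n_chunks = std::min(n_workers, max_chunks);
    const uint32_t base = n_commands / n_chunks;
    const uint32_t remainder = n_commands % n_chunks;
    uint32_t begin = 0;
    for (uint32_t i = 0; i < n_chunks; ++i) {
        // Spread the remainder over the first chunks so sizes differ by at most one.
        const uint32_t count = base + (i < remainder ? 1 : 0);
        ranges.push_back({begin, begin + count});
        begin += count;
    }
    return ranges;
}
```

With 1000 commands and 8 workers this produces 7 ranges rather than 8, since 8 ranges would drop below 128 commands each.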
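The sweep can be sketched as follows. The STATE_COMMAND flag value, the simplified Command struct and the initial_state_per_chunk helper are assumptions for illustration; the point is only that the loop reads command_flags and never dereferences head.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for RenderContext::Command (assumption).
struct Command { uint64_t sort_key; const void *head; uint32_t command_flags; };

enum : uint32_t { STATE_COMMAND = 1u << 0 }; // hypothetical flag bit

// For each chunk start (ascending), return the index of the last state command
// that precedes it, or -1 if there is none. Only command_flags is inspected.
std::vector<int64_t> initial_state_per_chunk(const std::vector<Command> &cmds,
                                             const std::vector<uint32_t> &chunk_begins) {
    std::vector<int64_t> result(chunk_begins.size(), -1);
    int64_t last_state = -1;
    size_t chunk = 0;
    for (size_t i = 0; i < cmds.size() && chunk < chunk_begins.size(); ++i) {
        // A chunk starting here inherits the state set before command i.
        while (chunk < chunk_begins.size() && chunk_begins[chunk] == i)
            result[chunk++] = last_state;
        if (cmds[i].command_flags & STATE_COMMAND)
            last_state = static_cast<int64_t>(i);
    }
    return result;
}
```

Each worker's execution context could then be seeded with the state from the command at the returned index before it starts translating its own range.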

Translation

Once we get to this point, doing the actual translation is fairly straightforward. Within each worker thread we simply loop over its dedicated range of commands, fetch each command’s data from Command::head and generate any number of API specific commands necessary based on the type of command.

For a RenderJobPackage representing a draw call it involves:

  • Look up the correct shader pass and, unless already bound, bind all active shader stages
  • Look up the state blocks (Rasterizer, Depth stencil, Blending, etc.) from the shader and bind them unless already bound
  • Look up and bind the resources for each shader stage using the RenderResource::render_resource_handle translated through the D3D12ResourceAccessor
  • Setup the input assembler by looping over the RenderResource::render_resource_handles pointed to by the RenderJobPackage::resource_offset and translated through the D3D12ResourceAccessor
  • Bind and potentially update constant buffers
  • Issue the draw call

The execution contexts also hold most-recently-used caches to avoid unnecessary binds of resources/shaders/states etc.

Note: In DX12 we also track where resource barriers are needed during this stage. After all worker threads are done we might also end up having to inject further resource barriers between the command lists generated by the worker threads. We have ideas on how to improve on this by doing at least parts of this tracking when building the RenderContexts but haven’t gotten around to looking into it yet.
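A most-recently-used cache of this kind can be as simple as remembering the last handle bound to each slot. The sketch below is a hypothetical, single-threaded example (one instance per execution context); BindCache, MAX_SLOTS and UNBOUND are made-up names and the slot count is arbitrary.

```cpp
#include <array>
#include <cstdint>

constexpr uint32_t MAX_SLOTS = 16;         // hypothetical number of bind slots
constexpr uint32_t UNBOUND = 0xffffffffu;  // marks a slot with nothing bound

struct BindCache {
    std::array<uint32_t, MAX_SLOTS> bound;
    uint32_t redundant_binds = 0; // how many API calls the cache filtered out

    BindCache() { bound.fill(UNBOUND); }

    // Returns true when the caller must issue the actual API bind for this slot.
    bool needs_bind(uint32_t slot, uint32_t handle) {
        if (bound[slot] == handle) {
            ++redundant_binds;
            return false;
        }
        bound[slot] = handle;
        return true;
    }
};
```

Since each worker thread owns its execution context, a cache like this needs no synchronization.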

Execute

When the translation is done we pass the resulting command lists to the correct queues for execution.

Note: In DX12 this is a bit more complicated as we have to interleave signaling / waiting on fences between command list execution (ExecuteCommandLists).

Next up

I’ve deliberately not dived into too much detail in this post to make it a bit easier to digest. I think I’ve managed to cover the overall design of a RenderDevice though, enough to make it easier for people diving into the code for the first time.

With this post we’ve reached the half-way point of this series and have covered the “low-level” aspects of the Stingray rendering architecture. As of the next post we will start looking at more high-level stuff, starting with the RenderInterface, which is the main interface for other threads to talk with the renderer.

64 comments:

  1. Very interesting series, thanks a lot!

  3. Your Shader::SortMode-enum has a typo in it.
    It should be IMMEDIATE, not IMMADIATE.
