bitsquid: development blog

Physical Cameras in Stingray (2017-09-28)
<p>This is a quick post to share some of the progress <a href="https://twitter.com/olivier_dionne">Olivier Dionne</a> and I made lately with Physical Cameras in Stingray. Our goal of implementing a solid physically based pipeline has always been split into three phases. First we <a href="http://bitsquid.blogspot.ca/2017/07/validating-materials-and-lights-in.html">validated our standard material</a>. We then added physical lights. And now we are wrapping it up with a physical camera.</p>
<p>We define a physical camera as an entity controlled by the same parameters a real world camera would use. These parameters are split into two groups which correspond to the two main parts of a camera. The camera <em>body</em> is defined by its sensor size, ISO sensitivity, and a range of available shutter speeds. The camera <em>lens</em> is defined by its focal length, focus range, and range of aperture diameters. Setting all of these parameters should expose the incoming light the same way a real world camera would.</p>
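<p>As a rough illustration (the field names below are made up for this post, not the actual component properties), the two groups of parameters could be described with data along these lines:</p>
<pre><code>-- Hypothetical data only: representative values for the two parameter groups
-- described above. Field names and units are illustrative.
local camera_body = {
    sensor_height = 24.0,        -- sensor height in mm (full-frame)
    iso = 100,                   -- sensor sensitivity
    shutter_time = 1.0 / 125.0   -- seconds, picked from the available range
}
local camera_lens = {
    focal_length = 35.0,         -- mm
    focus = 2.0,                 -- focus distance in meters, within the focus range
    aperture = 2.8               -- f-stop, within the available aperture range
}
</code></pre>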
<h3><a href="#stingray-representation" aria-hidden="true" class="anchor" id="user-content-stingray-representation"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Stingray Representation</h3>
<p>Just like our physical light, our camera is expressed as an entity with a bunch of components. The two main components are the Camera Body and the Camera Lens. We then have a transform component and a camera component which together represent the view projection matrix of the camera. After that we have a list of shading environment components which we deem relevant to be controlled by a physical camera (all post effects relevant to a camera). The state of these shading environment components is controlled through a script component called the "Physical Camera Properties Mapper" (more on this later). Here is a glimpse of what the Physical Camera entity may look like (wip):</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNFmtzaMUWBTgrFLfjNJhRW8UtCQVvo_0CQs3fEAINJHT3274ms864Myu3ffgx9_5sn86IMEyIofw4jc6drGnBvfgUGQ4rTvXl8DHupzkyEJoiLJtWj0EPC1hv7nda4kNns75GWi_XO7St/s1600/res11.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNFmtzaMUWBTgrFLfjNJhRW8UtCQVvo_0CQs3fEAINJHT3274ms864Myu3ffgx9_5sn86IMEyIofw4jc6drGnBvfgUGQ4rTvXl8DHupzkyEJoiLJtWj0EPC1hv7nda4kNns75GWi_XO7St/s640/res11.jpg" width="640" height="392" data-original-width="773" data-original-height="473" /></a></div>
<p>So while there are a lot of components that belong to a physical camera, the user is expected to interact mainly with the body and the lens components.</p>
<h3><a href="#post-effects" aria-hidden="true" class="anchor" id="user-content-post-effects"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Post Effects</h3>
<p>A lot of our post effects are dedicated to simulating some sort of camera/lens artifact (DOF, motion blur, film grain, vignetting, bloom, chromatic aberration, etc). One thing we wanted was the ability for physical cameras to override the post processes defined in our global shading environments. We also wanted to let users easily opt out of the physically based mapping that occurs between a camera and its corresponding post-effect. For example a physical camera will generate an accurate circle of confusion for the depth of field effect, but a user might be frustrated by the limitations imposed by a physically correct DOF effect. In this case the user can opt out by simply deleting the "Depth Of Field" component from the camera entity.</p>
<p>It's nice to see how the expressiveness of the Stingray entity system is shaping up and how it enables us to build these complex entities without the need to change much of the engine.</p>
<h3><a href="#properties-mapper" aria-hidden="true" class="anchor" id="user-content-properties-mapper"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Properties Mapper</h3>
<p>All of the mapping occurs in the properties mapper component which I mentioned earlier. This is simply a Lua script that gets executed whenever any of the entity properties are edited.</p>
<p>The most important property we wanted to map was the exposure value. We wanted the f-stop, shutter speed, and ISO values to map to an exposure value which would simulate how a real camera sensor reacts to incoming light. Lucky for us this topic is very well covered by Sebastien Lagarde and Charles de Rousiers in their awesome awesome awesome <a href="https://seblagarde.files.wordpress.com/2015/07/course_notes_moving_frostbite_to_pbr_v32.pdf">Moving Frostbite to Physically Based Rendering</a> document. The mapping basically boils down to:</p>
<pre><code>local function compute_ev(aperture, shutter_time, iso)
    -- EV100 from the camera settings (aperture is the f-number).
    local ev_100 = log2((aperture * aperture * 100) / (shutter_time * iso))
    -- Maximum luminance the sensor can capture at this EV100.
    local max_luminance = 1.2 * math.pow(2, ev_100)
    -- Exposure multiplier applied to the incoming luminance.
    return (1 / max_luminance)
end
</code></pre>
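<p>Continuing the snippet above, a quick sanity check (illustrative values, not engine code) with a "sunny 16" style setting:</p>
<pre><code>-- f/16, 1/100s, ISO 100 gives EV100 = log2(16*16*100 / (0.01*100)) ~= 14.6,
-- so the returned exposure multiplier is roughly 1 / (1.2 * 2^14.6) ~= 3e-5.
local exposure = compute_ev(16.0, 1.0 / 100.0, 100.0)
</code></pre>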
<p>The second property we were really interested in mapping was the field of view of the camera. Usually the horizontal FOV is calculated as <em>2 x atan(h/2f)</em> where <em>h</em> is the camera sensor's width and <em>f</em> is the current focal length of the lens. This by itself gives a good approximation of the FOV of a lens, but as was pointed out by the <a href="https://youtu.be/FQMbxzTUuSg?t=50m12s">MGS5 & Fox Engine presentation</a>, the focus distance of the lens should also be considered when calculating the FOV from the camera properties.</p>
<p>Intuitively we thought that the change in the FOV was caused by a change in the effective focal length of the lens. Adjusting the focus usually shifts a group of lenses up and down the optical axis of a camera lens. Our best guess was that this shift would increase or decrease the effective focal length of the camera lens. Using this idea we were able to simulate the effect that changing the focus point has on the FOV of a camera:</p>
<iframe width="700" height="354" src="https://www.youtube.com/embed/KDwUi-vYYMQ" frameborder="0" allowfullscreen></iframe>
<pre><code>local function compute_fov(focal_length, film_back_height, focus)
    -- Remap the focus distance into [0, 1] over the focus range used here (0.38 to 5.0).
    local normalized_focus = (focus - 0.38)/(5.0 - 0.38)
    -- Offset added to the focal length as the focus moves towards its far end.
    local focal_length_offset = lerp(0.0, 1.0, normalized_focus)
    -- FOV = 2 * atan(h / 2f) using the adjusted focal length.
    return 2.0 * math.atan(film_back_height/(2.0 * (focal_length + focal_length_offset)))
end
</code></pre>
<p>While this gave us plausible results in some cases, it did not map accurately to a real world camera for certain lens settings. For example we can choose a focal length offset that gives a good FOV mapping for a zoom lens set to 24mm but incorrect FOV results when it's set to 70mm (see <a href="https://www.youtube.com/watch?v=KDwUi-vYYMQ&feature=youtu.be">video</a> above). This area of lens optics is one we would like to explore more in the future.</p>
<p>In the future we will map more camera properties to their corresponding post-effects. More on this in a follow-up post.</p>
<h3><a href="#validating-results" aria-hidden="true" class="anchor" id="user-content-validating-results"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Validating Results</h3>
<p>To validate our mappings we designed a small, controlled environment room that we re-created in Stingray. This idea was inspired by the "Conference Room" setup that was presented by Hideo Kojima in the <a href="https://youtu.be/FQMbxzTUuSg?t=20m22s">MGS5 & Fox Engine presentation</a>. We used our simplified environment room to compare our rendered results with real world photographs.</p>
<p>Controlled Environment:
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuCurcamsq2Yqo1TUhCUntCChvNorP9JKtuUYb9Oa2ucu5EJR4_HB3REw_jWs197BVcUU5W1aaLKpkxmuv1qhrSMR9M2oTziZSJlbAKC4EdthdRrke530C2vRC2O7PkYKHG2RoPNXwNhbd/s1600/res4.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuCurcamsq2Yqo1TUhCUntCChvNorP9JKtuUYb9Oa2ucu5EJR4_HB3REw_jWs197BVcUU5W1aaLKpkxmuv1qhrSMR9M2oTziZSJlbAKC4EdthdRrke530C2vRC2O7PkYKHG2RoPNXwNhbd/s640/res4.jpg" width="640" height="372" data-original-width="1099" data-original-height="638" /></a></div>
<p>Stingray Equivalent:
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQ4cV2Y0W0az3X2gMXcfxRPBa_SUSYSx37GENF5Rsfsv8LOxWLgKKU8qCv0qWtRjlwRzdegkUMI8GNt4grfMdsJWPOov_1XXUq8aBGRSMIkDn3tTpj6yWWum1jaUZu0zGo8p_wpqJ1Rcpz/s1600/res3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQ4cV2Y0W0az3X2gMXcfxRPBa_SUSYSx37GENF5Rsfsv8LOxWLgKKU8qCv0qWtRjlwRzdegkUMI8GNt4grfMdsJWPOov_1XXUq8aBGRSMIkDn3tTpj6yWWum1jaUZu0zGo8p_wpqJ1Rcpz/s640/res3.jpg" width="640" height="372" data-original-width="1099" data-original-height="638" /></a></div>
<p>Since there is no convenient way to adjust the white balancing in Stingray, we decided to white balance our camera data and use a pure white light in our Stingray scene. We also decided to compare the photos and renders in linear space in the hope of minimizing potential sources of error.</p>
<p>White balancing our photographs:
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfO0d-x94Z4IjaeUKZwyg53lEnQ7xF8JNlM5L54RRTmK94EF-H6XBJlBCkwBfEyfsY2B4uGlwTUDQC-EGgrgE7nq6VrkYImEgDFfqmk9_bnUXHiwOiaQHNclyTvmRR0n5SB8xqZfTeNOXR/s1600/res6.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfO0d-x94Z4IjaeUKZwyg53lEnQ7xF8JNlM5L54RRTmK94EF-H6XBJlBCkwBfEyfsY2B4uGlwTUDQC-EGgrgE7nq6VrkYImEgDFfqmk9_bnUXHiwOiaQHNclyTvmRR0n5SB8xqZfTeNOXR/s640/res6.gif" width="640" height="362" data-original-width="1336" data-original-height="756" /></a></div>
<p>Our very first comparisons were disappointing:
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiEcL2FwtrVk1tEy1L0gl4owICUeJz1kQh-7zOasQrizVQZADMIX5QR2kNakjay-EHr-DulitXyU7eQt3GSkPGnLK8c3ApB-SBfgd9uqZNB0ykLhTOanvzkR9JZy5DrSGidw5QIL2ppAFC-/s1600/res10.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiEcL2FwtrVk1tEy1L0gl4owICUeJz1kQh-7zOasQrizVQZADMIX5QR2kNakjay-EHr-DulitXyU7eQt3GSkPGnLK8c3ApB-SBfgd9uqZNB0ykLhTOanvzkR9JZy5DrSGidw5QIL2ppAFC-/s640/res10.jpg" width="640" height="186" data-original-width="1600" data-original-height="465" /></a></div>
<p>We tracked down the difference in brightness to a problem with how we expressed our light intensity. We discovered that we had made the mistake of using the specified lumen value of our lights as their luminous intensity. The total luminous flux is expressed in lumens, but the luminous intensity (what the material shader is interested in) is actually the luminous flux per solid angle. So while we let users enter the "intensity" of lights in lumens, we need to map this value to luminous intensity. The mapping is done as <em>lumens/2π(1-cos(½α))</em> where <em>α</em> is the apex angle of the light. Lots of details can be found <a href="https://www.compuphase.com/electronics/candela_lumen.htm">here</a>. This works well for point lights and spot lights. In the future our directional lights will be assumed to be the sun or moon and will be expressed in lux, perhaps with a corresponding disk size.</p>
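<p>As a sketch of that mapping (assuming the light emits uniformly inside its cone; the function name is mine, not the engine's):</p>
<pre><code>-- Convert a user-facing lumen value to luminous intensity (candela) for a
-- light with apex angle `apex_angle` in radians. The solid angle of the cone
-- is 2*pi*(1 - cos(apex_angle/2)); a point light (apex angle of 2*pi) covers
-- the full sphere, 4*pi steradians.
local function lumens_to_candela(lumens, apex_angle)
    local solid_angle = 2.0 * math.pi * (1.0 - math.cos(apex_angle * 0.5))
    return lumens / solid_angle
end
</code></pre>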
<p>With this fix in place we started getting more encouraging results:
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihU9SjPwlGPKVhz5Hu-FFKrXLDS8h9YEBzzcgwkliCThx81PNAc7IKJS1GkyOdPVu_yhd8jnS5RYTUAA1TAq2fW1xEXdZGRBdh28JRKQvIa9Ge62AglwpmNZHTsIQOxYOJdzTez_pp3GfN/s1600/res1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihU9SjPwlGPKVhz5Hu-FFKrXLDS8h9YEBzzcgwkliCThx81PNAc7IKJS1GkyOdPVu_yhd8jnS5RYTUAA1TAq2fW1xEXdZGRBdh28JRKQvIa9Ge62AglwpmNZHTsIQOxYOJdzTez_pp3GfN/s640/res1.jpg" width="640" height="186" data-original-width="1600" data-original-height="465" /></a></div>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTDnywJ6LQEOD18u_SlPOf2NSIRYReaGRWlxeWgoKK2dPjTJ3NZW00zlnnrzPG0WBZ4BfkDojE4VEUjMs2Vrxl3zIUf662qV65V1XdCul_OQVZBnHzBGOA0Y53bxPVG4EHs9QRdxbybrJM/s1600/res2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTDnywJ6LQEOD18u_SlPOf2NSIRYReaGRWlxeWgoKK2dPjTJ3NZW00zlnnrzPG0WBZ4BfkDojE4VEUjMs2Vrxl3zIUf662qV65V1XdCul_OQVZBnHzBGOA0Y53bxPVG4EHs9QRdxbybrJM/s640/res2.jpg" width="640" height="186" data-original-width="1600" data-original-height="465" /></a></div>
<p>There is a lot left to do but this feels like a very good start to our physically based cameras. This is my last post on the Stingray blog but you can follow <a href="https://twitter.com/olivier_dionne">Olivier</a> on Twitter if you want to stay up to date with the advances made by the Stingray rendering team. Cheers!</p>

Notes On Screen Space HIZ Tracing (2017-08-14)
<p>Note: The <a href="https://github.com/greje656/Questions/blob/master/hiz.md">Markdown version</a> of this document is available and might have better formatting on phones/tablets.</p>
<p>The following is a small gathering of notes and findings that we made throughout the implementation of hiz tracing in screen space for ssr in Stingray. I recently heard a few claims regarding hiz tracing which motivated me to share some notes on the topic. Note that I also wrote about how we reproject reflections in a <a href="http://bitsquid.blogspot.ca/2017/06/reprojecting-reflections_22.html">previous entry</a> which might be of interest. Also note that I've included all the code at the bottom of the blog.</p>
<p>The original implementation of our hiz tracing method was basically a straight port of the "Hi-Z Screen-Space Tracing" described in <a href="https://www.crcpress.com/GPU-Pro-5-Advanced-Rendering-Techniques/Engel/p/book/9781482208634">GPU-Pro 5</a> by <a href="https://twitter.com/yasinuludag">Yasin Uludag</a>. The very first results we got looked something like this:</p>
<p>Original scene:
<a href="https://github.com/greje656/Questions/blob/master/images/ssr1.jpg" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr1.jpg" alt="" style="max-width:100%;"></a></p>
<p>Traced ssr using hiz tracing:
<a href="https://github.com/greje656/Questions/blob/master/images/ssr2.jpg" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr2.jpg" alt="" style="max-width:100%;"></a></p>
<h2><a id="user-content-artifacts" class="anchor" href="#artifacts" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Artifacts</h2>
<p>The weird horizontal stripes were reported when ssr was enabled in the Stingray editor. They only revealed themselves at certain resolutions (they would appear and disappear as the viewport got resized). I started writing some tracing visualization views to help me track each hiz trace event:</p>
<p><a href="https://github.com/greje656/Questions/blob/master/images/ssr-gif7.gif" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr-gif7.gif" alt="" style="max-width:100%;"></a></p>
<p>Using these kinds of debug views I was able to see that for some resolutions, the starting position of a ray traced at half-res happened to lie exactly on the edge of a hiz cell. Since tracing the hiz structure relies on intersecting the current position of a ray with the boundary of the cell it lies in, we need to do a ray/plane intersection. As the numerator of (planes - pos.xy)/dir.xy got closer and closer to zero, the solutions for the intersection started to lose precision until they completely fell apart.</p>
<p>To tackle this problem we snap the origin of each traced rays to the center of a hiz cell:</p>
<pre><code>// Snap the ray origin away from the hiz cell boundary so the ray/plane
// intersection stays well-conditioned.
float2 cell_count_at_start = cell_count(HIZ_START_LEVEL);
float2 aligned_uv = floor(input.uv * cell_count_at_start)/cell_count_at_start + 0.25/cell_count_at_start;
</code></pre>
<p>Rays traced with and without snapping the starting position to the hiz cell center:
<a href="https://github.com/greje656/Questions/blob/master/images/ssr-gif6.gif" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr-gif6.gif" alt="" style="max-width:100%;"></a></p>
<p>This looked better. However it didn't address all of the tracing artifacts we were seeing. The results were still plagued with lots of small pixels whose traced rays failed. When investigating these failing cases I noticed that rays would sometimes get stuck for no apparent reason in a cell along the way. It also occurred more frequently when rays travelled along the screen space axes (±1,0) or (0,±1). After drawing a bunch of ray diagrams on paper I realized that the cell intersection method proposed in GPU-Pro had a failing case! To ensure hiz cells are always crossed, the article offsets the intersection planes of a cell by a small amount. This is to ensure that the intersection point crosses the boundaries of the cell it's intersecting so that the trace continues to make progress.</p>
<p>While this works in most cases there is one scenario which results in a ray that will not cross over into the next hiz cell (see diagram below). When this happens the ray wastes the rest of its allocated trace iterations intersecting the same cell without ever crossing it. To address this we changed the proposed method slightly. Instead of offsetting the bounding planes, we choose the appropriate offset to add depending on which plane was intersected (horizontal or vertical). This ensures that we will always cross a cell when tracing:</p>
<pre><code>// Original GPU-Pro 5 approach: nudge the intersection planes by a small offset.
float2 cell_size = 1.0 / cell_count;
float2 planes = cell_id/cell_count + cell_size * cross_step + cross_offset;
float2 solutions = (planes - pos.xy)/dir.xy;
float3 intersection_pos = pos + dir * min(solutions.x, solutions.y);
return intersection_pos;
</code></pre>
<pre><code>// Modified approach: intersect the un-offset planes, then push the ray across
// whichever boundary (horizontal or vertical) was actually hit.
float2 cell_size = 1.0 / cell_count;
float2 planes = cell_id/cell_count + cell_size * cross_step;
float2 solutions = (planes - pos.xy)/dir.xy;
float3 intersection_pos = pos + dir * min(solutions.x, solutions.y);
intersection_pos.xy += (solutions.x < solutions.y) ? float2(cross_offset.x, 0.0) : float2(0.0, cross_offset.y);
return intersection_pos;
</code></pre>
<p><a href="https://github.com/greje656/Questions/blob/master/images/ssr19.jpg" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr19.jpg" alt="" style="max-width:100%;"></a></p>
<p>Incorrect VS correct cell crossing:
<a href="https://github.com/greje656/Questions/blob/master/images/ssr-gif9.gif" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr-gif9.gif" alt="" style="max-width:100%;"></a></p>
<p>Final result:
<a href="https://github.com/greje656/Questions/blob/master/images/ssr6.jpg" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr6.jpg" alt="" style="max-width:100%;"></a></p>
<h2><a id="user-content-ray-marching-towards-the-camera" class="anchor" href="#ray-marching-towards-the-camera" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Ray Marching Towards the Camera</h2>
<p>At the end of the GPU-Pro chapter there is a small mention that ray marching towards the camera with hiz tracing would require storing both the minimum and maximum depth value in the hiz structure (requiring the format to be bumped to R32G32F). However if you visualize the trace of a ray leaving the surface and travelling towards the camera (i.e. away from the depth buffer plane), you can simply account for that case and augment the algorithm described in GPU-Pro to navigate up and down the hierarchy until the ray finds the first hit with a hiz cell:</p>
<p><a href="https://github.com/greje656/Questions/blob/master/images/ssr-cam1.jpg" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr-cam1.jpg" alt="" style="max-width:100%;"></a></p>
<pre><code>if(v.z > 0) {
    // Ray is marching away from the camera (the case covered in GPU-Pro 5).
    float min_minus_ray = min_z - ray.z;
    tmp_ray = min_minus_ray > 0 ? ray + v_z*min_minus_ray : tmp_ray;
    float2 new_cell_id = cell(tmp_ray.xy, current_cell_count);
    if(crossed_cell_boundary(old_cell_id, new_cell_id)) {
        tmp_ray = intersect_cell_boundary(ray, v, old_cell_id, current_cell_count, cross_step, cross_offset);
        level = min(HIZ_MAX_LEVEL, level + 2.0f);
    }
} else if(ray.z < min_z) {
    // Ray is marching towards the camera and is still on the near side of the
    // cell's minimum depth plane: no hit in this cell, so cross into the next one.
    tmp_ray = intersect_cell_boundary(ray, v, old_cell_id, current_cell_count, cross_step, cross_offset);
    level = min(HIZ_MAX_LEVEL, level + 2.0f);
}
</code></pre>
<p>This has proven to be fairly solid and enabled us to trace a wider range of the screen space:</p>
<iframe width="700" height="394" src="https://www.youtube.com/embed/BjoMu-yI3k8" frameborder="0" allowfullscreen></iframe>
<h2><a id="user-content-ray-marching-behind-surfaces" class="anchor" href="#ray-marching-behind-surfaces" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Ray Marching Behind Surfaces</h2>
<p>Another alteration that can be made to the hiz tracing algorithm is to add support for rays to travel behind surfaces. Of course to do this you must give the hiz cells a thickness. So instead of tracing against extruded hiz cells you trace against "floating" hiz cells.</p>
<p><a href="https://github.com/greje656/Questions/blob/master/images/ssr23.jpg" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr23.jpg" alt="" style="max-width:100%;"></a></p>
<p>With that in mind we can tighten the tracing algorithm so that it cannot end the trace unless it finds a collision with one of these floating cells:</p>
<pre><code>if(level == HIZ_START_LEVEL && min_minus_ray > depth_threshold) {
    tmp_ray = intersect_cell_boundary(ray, v, old_cell_id, current_cell_count, cross_step, cross_offset);
    level = HIZ_START_LEVEL + 1;
}
</code></pre>
<p>Tracing behind surfaces disabled VS enabled:
<a href="https://github.com/greje656/Questions/blob/master/images/ssr13.jpg" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr13.jpg" alt="" style="max-width:100%;"></a>
<a href="https://github.com/greje656/Questions/blob/master/images/ssr17.jpg" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr17.jpg" alt="" style="max-width:100%;"></a></p>
<p>Unfortunately this often means that traced rays travelling behind a surface degenerate into a linear search, and the cost can skyrocket for these pixels:</p>
<p>Number of iterations to complete the trace (black=0, red=64):
<a href="https://github.com/greje656/Questions/blob/master/images/ssr14.jpg" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr14.jpg" alt="" style="max-width:100%;"></a></p>
<h2><a id="user-content-the-problem-of-tracing-a-discrete-depth-buffer" class="anchor" href="#the-problem-of-tracing-a-discrete-depth-buffer" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The Problem of Tracing a Discrete Depth Buffer</h2>
<p>For me the most difficult artifact to understand and deal with when implementing ssr is (by far) the implications of tracing a discrete depth buffer. Unless you can fully commit to the idea of tracing objects with infinite thickness, you will need to use some kind of depth threshold to mask a reflection if its intersection with the geometry is not valid. If you do use a depth threshold then you can (will?) end up getting artifacts like these:</p>
<iframe width="700" height="394" src="https://www.youtube.com/embed/ZftaDG2q3D0" frameborder="0" allowfullscreen></iframe>
<p>The problem, <em>as far as I understand it</em>, is that rays can oscillate between passing and failing the depth threshold test. It is essentially an amplified aliasing problem caused by the finite resolution of the depth buffer:
<a href="https://github.com/greje656/Questions/blob/master/images/ssr24.jpg" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr24.jpg" alt="" style="max-width:100%;"></a></p>
<p>I have experimented with adapting the depth threshold based on different properties of the intersection point (direction of the reflected ray, angle of incidence at the intersection, surface inclination at the intersection) but I have never been able to find a silver bullet (or anything that resembles a bullet to be honest). Perhaps a good approach could be to interpolate the depth value of neighboring cells <em>if</em> the neighbors belong to the same geometry? I think that <a href="https://twitter.com/ikarosav">Mikkel Svendsen</a> proposed a solution to this problem while presenting <a href="https://youtu.be/RdN06E6Xn9E?t=40m27s">Low Complexity, High Fidelity: The Rendering of "INSIDE"</a> but I have yet to wrap my head around the proposed solution and try it.</p>
<h2><a id="user-content-all-or-nothing" class="anchor" href="#all-or-nothing" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>All or Nothing</h2>
<p>Finally it's worth pointing out that hiz tracing is a very "all or nothing" way to find an intersection point. Neighboring rays that exhaust their maximum number of allowed iterations without finding an intersection can end up at very different screen space positions, which can cause a noticeable discontinuity in the ssr buffer:</p>
<p><a href="https://github.com/greje656/Questions/blob/master/images/ssr26.jpg" target="_blank"><img src="https://github.com/greje656/Questions/raw/master/images/ssr26.jpg" alt="" style="max-width:100%;"></a></p>
<p>This is something that can be very distracting and is made much worse when dealing with a jittered depth buffer combined with taa. This side-effect should be considered carefully when choosing a tracing solution for ssr.</p>
<h2><a id="user-content-code" class="anchor" href="#code" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Code</h2>
<pre><code>float2 cell(float2 ray, float2 cell_count, uint camera) {
    return floor(ray.xy * cell_count);
}

float2 cell_count(float level) {
    return input_texture2_size / (level == 0.0 ? 1.0 : exp2(level));
}

float3 intersect_cell_boundary(float3 pos, float3 dir, float2 cell_id, float2 cell_count, float2 cross_step, float2 cross_offset, uint camera) {
    float2 cell_size = 1.0 / cell_count;
    float2 planes = cell_id/cell_count + cell_size * cross_step;

    float2 solutions = (planes - pos.xy)/dir.xy;
    float3 intersection_pos = pos + dir * min(solutions.x, solutions.y);

    intersection_pos.xy += (solutions.x < solutions.y) ? float2(cross_offset.x, 0.0) : float2(0.0, cross_offset.y);

    return intersection_pos;
}

bool crossed_cell_boundary(float2 cell_id_one, float2 cell_id_two) {
    return (int)cell_id_one.x != (int)cell_id_two.x || (int)cell_id_one.y != (int)cell_id_two.y;
}

float minimum_depth_plane(float2 ray, float level, float2 cell_count, uint camera) {
    return input_texture2.Load(int3(vr_stereo_to_mono(ray.xy, camera) * cell_count, level)).r;
}

float3 hi_z_trace(float3 p, float3 v, in uint camera, out uint iterations) {
    float level = HIZ_START_LEVEL;
    float3 v_z = v/v.z;
    float2 hi_z_size = cell_count(level);
    float3 ray = p;

    float2 cross_step = float2(v.x >= 0.0 ? 1.0 : -1.0, v.y >= 0.0 ? 1.0 : -1.0);
    float2 cross_offset = cross_step * 0.00001;
    cross_step = saturate(cross_step);

    float2 ray_cell = cell(ray.xy, hi_z_size.xy, camera);
    ray = intersect_cell_boundary(ray, v, ray_cell, hi_z_size, cross_step, cross_offset, camera);

    iterations = 0;
    while(level >= HIZ_STOP_LEVEL && iterations < MAX_ITERATIONS) {
        // get the cell number of the current ray
        float2 current_cell_count = cell_count(level);
        float2 old_cell_id = cell(ray.xy, current_cell_count, camera);

        // get the minimum depth plane in which the current ray resides
        float min_z = minimum_depth_plane(ray.xy, level, current_cell_count, camera);

        // intersect only if ray depth is below the minimum depth plane
        float3 tmp_ray = ray;
        if(v.z > 0) {
            float min_minus_ray = min_z - ray.z;
            tmp_ray = min_minus_ray > 0 ? ray + v_z*min_minus_ray : tmp_ray;
            float2 new_cell_id = cell(tmp_ray.xy, current_cell_count, camera);
            if(crossed_cell_boundary(old_cell_id, new_cell_id)) {
                tmp_ray = intersect_cell_boundary(ray, v, old_cell_id, current_cell_count, cross_step, cross_offset, camera);
                level = min(HIZ_MAX_LEVEL, level + 2.0f);
            } else {
                if(level == 1 && abs(min_minus_ray) > 0.0001) {
                    tmp_ray = intersect_cell_boundary(ray, v, old_cell_id, current_cell_count, cross_step, cross_offset, camera);
                    level = 2;
                }
            }
        } else if(ray.z < min_z) {
            tmp_ray = intersect_cell_boundary(ray, v, old_cell_id, current_cell_count, cross_step, cross_offset, camera);
            level = min(HIZ_MAX_LEVEL, level + 2.0f);
        }

        ray.xyz = tmp_ray.xyz;
        --level;
        ++iterations;
    }
    return ray;
}
</code></pre>

Validating materials and lights in Stingray (2017-07-16)
<div dir="ltr" style="text-align: left;" trbidi="on">
<article class="markdown-body entry-content" itemprop="text">
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMDbNZB06JZKHgh7-3Ar_i7nwPljqluSXz5mkyJ_8mDGHwFjV5fDY_kw2h3ZS8HjVapDRWBEMx-j-aEgsOCXQ0XtAWsA6IRtnIzdOtaDaKv8nfUlPm33hqWD1-oPyxicWyWQk2RAcnJS6a/s1600/comp-01.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMDbNZB06JZKHgh7-3Ar_i7nwPljqluSXz5mkyJ_8mDGHwFjV5fDY_kw2h3ZS8HjVapDRWBEMx-j-aEgsOCXQ0XtAWsA6IRtnIzdOtaDaKv8nfUlPm33hqWD1-oPyxicWyWQk2RAcnJS6a/s640/comp-01.gif" width="640" height="311" data-original-width="1182" data-original-height="574" /></a></div>
<p>Stingray 1.9 is just around the corner and with it will come our new physical lights. I wanted to write a little bit about the validation process that we went through to increase our confidence in the behaviour of our materials and lights.</p>
<p>Early on we were quite set on building a small controlled "light room", similar to what the <a href="https://youtu.be/FQMbxzTUuSg?t=19m25s">Fox Engine team presented at GDC</a>, as a validation process. But while this seemed like a fantastic way to confirm the entire pipeline is giving plausible results, it felt like identifying the source of discontinuities when comparing photographs vs renders might involve a lot of guesswork. So we decided to delay the validation through a controlled light room and started thinking about comparing our results with a high quality offline renderer. Since <a href="https://www.solidangle.com/">SolidAngle</a> joined Autodesk last year and we had access to an <a href="https://www.solidangle.com/arnold/">Arnold</a> license server, it seemed like a good candidate. Note that the Arnold SDK is extremely easy to use and can be <a href="https://www.solidangle.com/arnold/download">downloaded</a> for free. If you don't have a license you still have access to all the features and the only limitation is that the rendered frames are watermarked.</p>
<p>We started writing a Stingray plugin that supported simple scene reflection into Arnold. We also implemented a custom Arnold Output Driver which allowed us to forward Arnold's linear data directly into the Stingray viewport, where it would be gamma corrected and tonemapped by Stingray (minimizing as many potential sources of error as possible).</p>
<h3>Material parameters mapping</h3>
<p>The trickiest part of the process was to find an Arnold material which we could validate against. When we started this work we used Arnold 4.3 and realized early that Arnold's <a href="https://support.solidangle.com/display/AFMUG/Standard">Standard shader</a> didn't map very well to the Metallic/Roughness model. We had more luck using the <a href="http://www.anderslanglands.com/alshaders/alSurface.html">alSurface shader</a> with the following mapping:</p>
<pre><code>// "alSurface"
// =====================================================================================
AiNodeSetRGB(surface_shader, "diffuseColor", color.x, color.y, color.z);
AiNodeSetInt(surface_shader, "specular1FresnelMode", 0);
AiNodeSetInt(surface_shader, "specular1Distribution", 1);
AiNodeSetFlt(surface_shader, "specular1Strength", 1.0f - metallic);
AiNodeSetRGB(surface_shader, "specular1Color", white.x, white.y, white.z);
AiNodeSetFlt(surface_shader, "specular1Roughness", roughness);
AiNodeSetFlt(surface_shader, "specular1Ior", 1.5f); // ior = (n-1)^2/(n+1)^2 for 0.04
AiNodeSetRGB(surface_shader, "specular1Reflectivity", white.x, white.y, white.z);
AiNodeSetRGB(surface_shader, "specular1EdgeTint", white.x, white.y, white.z);
AiNodeSetInt(surface_shader, "specular2FresnelMode", 1);
AiNodeSetInt(surface_shader, "specular2Distribution", 1);
AiNodeSetFlt(surface_shader, "specular2Strength", metallic);
AiNodeSetRGB(surface_shader, "specular2Color", color.x, color.y, color.z);
AiNodeSetFlt(surface_shader, "specular2Roughness", roughness);
AiNodeSetRGB(surface_shader, "specular2Reflectivity", white.x, white.y, white.z);
AiNodeSetRGB(surface_shader, "specular2EdgeTint", white.x, white.y, white.z);
</code></pre>
<p>Stingray VS Arnold: roughness = 0, metallicness = [0, 1]
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxk0awKUL9YaQFDLSMjcMIzaiMx98ZTDXKUSzK4fJ3orLXsCRdG3xrWf_zzHhVH1kMf3euHB7G4uhEAkHs-z9rTkNW2Zfa2P16CnPSiNYvmM6ZZxN5CE8yoB_qQIQNes_bVTtngUIuQ4SE/s1600/res1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxk0awKUL9YaQFDLSMjcMIzaiMx98ZTDXKUSzK4fJ3orLXsCRdG3xrWf_zzHhVH1kMf3euHB7G4uhEAkHs-z9rTkNW2Zfa2P16CnPSiNYvmM6ZZxN5CE8yoB_qQIQNes_bVTtngUIuQ4SE/s640/res1.jpg" width="640" height="179" data-original-width="1399" data-original-height="391" /></a></div>
<p>Stingray VS Arnold: metallicness = 1, roughness = [0, 1]
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEVHnzRviknia95JQjKD3ZQ9CRJR1pQ7iJbe-SH6CZ2Hw4e53eioKH7azMIaH1vyP-rok3Ks-Zqsf3nPD0wvpgqsV6qPMIj6FUnXZzbYwgmaZvGPL3qMr4LFLcfa02wlCuKieZ9GJrkA2p/s1600/res3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEVHnzRviknia95JQjKD3ZQ9CRJR1pQ7iJbe-SH6CZ2Hw4e53eioKH7azMIaH1vyP-rok3Ks-Zqsf3nPD0wvpgqsV6qPMIj6FUnXZzbYwgmaZvGPL3qMr4LFLcfa02wlCuKieZ9GJrkA2p/s640/res3.jpg" width="640" height="179" data-original-width="1399" data-original-height="391" /></a></div>
<p>Halfway through the validation process Arnold 5.0 got released and with it came the new <a href="https://support.solidangle.com/display/A5AFMUG/Standard+Surface">Standard Surface shader</a> which is based on a Metalness/Roughness workflow. This allowed for a much simpler mapping:</p>
<pre><code>// "aiStandardSurface"
// =====================================================================================
AiNodeSetFlt(standard_shader, "base", 1.f);
AiNodeSetRGB(standard_shader, "base_color", color.x, color.y, color.z);
AiNodeSetFlt(standard_shader, "diffuse_roughness", 0.f); // Use Lambert for diffuse
AiNodeSetFlt(standard_shader, "specular", 1.f);
AiNodeSetFlt(standard_shader, "specular_IOR", 1.5f); // ior = (n-1)^2/(n+1)^2 for 0.04
AiNodeSetRGB(standard_shader, "specular_color", 1, 1, 1);
AiNodeSetFlt(standard_shader, "specular_roughness", roughness);
AiNodeSetFlt(standard_shader, "metalness", metallic);
</code></pre>
<h3>Investigating material differences</h3>
<p>The first thing we noticed was an excess in reflection intensity for reflections with large incident angles. Arnold supports <a href="https://support.solidangle.com/display/A5AFMUG/Introduction+to+Light+Path+Expressions">Light Path Expressions</a> which made it very easy to compare and identify the term causing the differences. In this particular case we quickly identified that we had an energy conservation problem. Specifically, the contribution from the Fresnel reflections was not removed from the diffuse contribution:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhub5bnbhbnwViOsCT-6mI0iY_ag3DvMvAUFmSu0Mv8uibyNxlNVBY9tX4qFpsa5ds0ObeCvODwbnvkEgbQfjz6L-6P3cIdiMOHgOD5NusWLIYCI6j9w-nbiMakkPiMEK2vbKJMRE0XqwU6/s1600/fix1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhub5bnbhbnwViOsCT-6mI0iY_ag3DvMvAUFmSu0Mv8uibyNxlNVBY9tX4qFpsa5ds0ObeCvODwbnvkEgbQfjz6L-6P3cIdiMOHgOD5NusWLIYCI6j9w-nbiMakkPiMEK2vbKJMRE0XqwU6/s640/fix1.jpg" width="640" height="274" data-original-width="1399" data-original-height="599" /></a></div>
<p>Scenes with a lot of smooth reflective surfaces demonstrate the impact of this issue noticeably:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEis3QYZA0GI0JeZ6CDn_5S2-f0acFQ2Y6oCNkdyzavikIEZLCZ3qJlKnKgSNWW4b7HA5xmyBw8m_8PNsIw1B3Em8HBY1IW9PyGLFWPmeGF6peupuL_nvf6jCFaAzqBLvG-qG2HymhFGTQJY/s1600/fixa.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEis3QYZA0GI0JeZ6CDn_5S2-f0acFQ2Y6oCNkdyzavikIEZLCZ3qJlKnKgSNWW4b7HA5xmyBw8m_8PNsIw1B3Em8HBY1IW9PyGLFWPmeGF6peupuL_nvf6jCFaAzqBLvG-qG2HymhFGTQJY/s640/fixa.gif" width="640" height="256" data-original-width="1424" data-original-height="569" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVx94A6ljXPnBm60rnKwGA2cM8zBJGoOeBcFi86n1HBEtW2ndQwIr5hhb7exRkQZ5gsE5WUl02PCT4wcbKmUNMfLflOVq2zuF04u4BrivOyVAa3nkg51vS4ptcUWjBFO9Ir5RQh9QgwI8b/s1600/fixb.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVx94A6ljXPnBm60rnKwGA2cM8zBJGoOeBcFi86n1HBEtW2ndQwIr5hhb7exRkQZ5gsE5WUl02PCT4wcbKmUNMfLflOVq2zuF04u4BrivOyVAa3nkg51vS4ptcUWjBFO9Ir5RQh9QgwI8b/s640/fixb.gif" width="640" height="256" data-original-width="1424" data-original-height="569" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3OUpgMmb6ztH2l0hbjD0lDk3HQ14x_dKJj_Cw0lx5O7MMa2c9sBDgDdP9AVuqbeEzokMac1hn-N4x9AEW0CRQwxMODSIFlCRLKkiFXZT1j3iRdMy4Spb04Zt0_5salT8tJA5h1BUyoT4B/s1600/fixc.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3OUpgMmb6ztH2l0hbjD0lDk3HQ14x_dKJj_Cw0lx5O7MMa2c9sBDgDdP9AVuqbeEzokMac1hn-N4x9AEW0CRQwxMODSIFlCRLKkiFXZT1j3iRdMy4Spb04Zt0_5salT8tJA5h1BUyoT4B/s640/fixc.gif" width="640" height="256" data-original-width="1424" data-original-height="569" /></a></div>
<p>Another source of differences and confusion came from the tint of the Fresnel term for metallic surfaces. Different shaders I investigated had different behaviors. Some tinted the Fresnel term with the base color while others didn't:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjR-YtfQGCg3qRCGGhofvDc20HgIr0S0CLON1cwEFvrDAZ2XAf3OBYo22K67wo5JrmndNutAauQhdHu-XPpBodItKSDG3SerNUp-GHryH7_dpBcBsWjk8Z_RIdAJ0Hi49YaZI9jYnYU8Ci2/s1600/metal3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjR-YtfQGCg3qRCGGhofvDc20HgIr0S0CLON1cwEFvrDAZ2XAf3OBYo22K67wo5JrmndNutAauQhdHu-XPpBodItKSDG3SerNUp-GHryH7_dpBcBsWjk8Z_RIdAJ0Hi49YaZI9jYnYU8Ci2/s640/metal3.jpg" width="640" height="320" data-original-width="1600" data-original-height="800" /></a></div>
<p>It wasn't clear to me how Fresnel's law of reflection applied to metals. I asked on Twitter what people's thoughts were on this and got this simple and elegant <a href="https://twitter.com/BrookeHodgman/status/884532159331028992">claim</a> made by Brooke Hodgman: <em>"Metalic reflections are coloured because their Fresnel is wavelength varying, but Fresnel still goes to 1 at 90deg for every wavelength"</em>. This convinced me instantly that the correct thing to do was indeed to use an un-tinted Fresnel contribution regardless of the metallicness of the material. I later found this <a href="https://en.wikipedia.org/wiki/Reflectance">graph</a> which also confirmed it:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhA-TdFvg3pchwGuwZrohGYH9zoMz8b1lRMYYnyqmy23QKAXhl02zlXvA9W09DZcZvVHnO87pONbj76RoNy0aNFmhPRfIXQ7eHsOjmeFNzfVV-gNTn46cT84sUeHaOiJbYHQ6vG6jXj9V9q/s1600/reflectance.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhA-TdFvg3pchwGuwZrohGYH9zoMz8b1lRMYYnyqmy23QKAXhl02zlXvA9W09DZcZvVHnO87pONbj76RoNy0aNFmhPRfIXQ7eHsOjmeFNzfVV-gNTn46cT84sUeHaOiJbYHQ6vG6jXj9V9q/s640/reflectance.jpg" width="640" height="451" data-original-width="1100" data-original-height="775" /></a></div>
<p>For the Fresnel term we use a pre-filtered Fresnel offset stored in a 2D LUT (as proposed by Brian Karis in <a href="http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_slides.pdf">Real Shading in Unreal Engine 4</a>). While results can diverge slightly from Arnold's Standard Surface shader (see "the effect of metalness" from Zap Andersson's <a href="https://www.dropbox.com/s/jt8dk65u14n2mi5/Physical%20Material%20-%20Whitepaper%20-%201.01.pdf?dl=0">Physical Material Whitepaper</a>), in most cases we get an edge tint that is pretty close:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6wDknyfbpj_9r_iQ3PxTJTaI9aqlZPEPe0pEfQQkspuMqtq6f58nJyFPB4tgkfNaFZ271DzMbPGL-S7Q_ppyhrIYShe3fanLZmNKL2lIHipbKFUSrSeDq3GV7N3ngA1ENbHemXhAqhyphenhyphenVk/s1600/metal4.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6wDknyfbpj_9r_iQ3PxTJTaI9aqlZPEPe0pEfQQkspuMqtq6f58nJyFPB4tgkfNaFZ271DzMbPGL-S7Q_ppyhrIYShe3fanLZmNKL2lIHipbKFUSrSeDq3GV7N3ngA1ENbHemXhAqhyphenhyphenVk/s640/metal4.jpg" width="640" height="320" data-original-width="866" data-original-height="433" /></a></div>
</p>
<h3>Investigating light differences</h3>
<p>With the brdf validated we could start looking into validating our physical lights. Stingray currently supports point, spot, and directional lights (with more to come). The main problem we discovered with our lights is that the attenuation function we use is a bit awkward. Specifically, we attenuate by I/(d+1)^2 as opposed to I/d^2 (where 'I' is the intensity of the light source and 'd' is the distance to the light source from the shaded point). The main reason behind this decision is to manage the overflow that could occur in the light accumulation buffer. Adding the +1 effectively clamps the maximum intensity of the light to the intensity set for that light itself, i.e. as 'd' approaches zero the attenuated intensity approaches the intensity set for that light (as opposed to infinity). Unfortunately this decision also means we can't get physically <a href="https://www.desmos.com/calculator/jydb51epow">correct light falloffs</a> in a scene:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj12PrBd-vvCx_NRfbbCIJBo7FD_yzMM1bu_2_G8A7qNcnqs6g5Ei6JrWrNuctSCcldb97m6gNGkD48eYzL6sEXpCDSLex6xT_f8KwftpjpbvxIhLPkNVkhyphenhyphenSw0PTX5SJsgobGQeeHvxH9g/s1600/graph.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj12PrBd-vvCx_NRfbbCIJBo7FD_yzMM1bu_2_G8A7qNcnqs6g5Ei6JrWrNuctSCcldb97m6gNGkD48eYzL6sEXpCDSLex6xT_f8KwftpjpbvxIhLPkNVkhyphenhyphenSw0PTX5SJsgobGQeeHvxH9g/s640/graph.gif" width="640" height="440" data-original-width="1472" data-original-height="1013" /></a></div>
<p>Even if we scale the intensity of the light to match the intensity for a certain distance (say 1m) we still have a different falloff curve than the physically correct attenuation. It's not too bad in a game context, but in the architectural world this is a bigger issue:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhikoI7lTCWrPGdrnNYtykXIV-S5sezWlL_kH7P9pa0pOBn-MPjM7CZVpGBKa9V3TL_AXzDUdQwjMNEqX2P_mj6EbbNy0iHwQXRQ-lu1iaFn5cgf6mExorDeKl9CtQjTEXiAEoXFztXaAug/s1600/fix-int5.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhikoI7lTCWrPGdrnNYtykXIV-S5sezWlL_kH7P9pa0pOBn-MPjM7CZVpGBKa9V3TL_AXzDUdQwjMNEqX2P_mj6EbbNy0iHwQXRQ-lu1iaFn5cgf6mExorDeKl9CtQjTEXiAEoXFztXaAug/s640/fix-int5.gif" width="640" height="267" data-original-width="1600" data-original-height="667" /></a></div>
<p>This issue will be fixed in Stingray 1.10. Using I/(d+e)^2 (where 'e' is 1/max_value), along with an EV shift up and down while writing to and reading from the accumulation buffer as described by <a href="http://www.reedbeta.com/blog/artist-friendly-hdr-with-exposure-values/">Nathan Reed</a>, is a good step forward.</p>
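<p>To make the difference concrete, here is a sketch of the attenuation variants discussed above (plain Lua for illustration, not the engine's shader code):</p>
<pre><code>-- 'i' is the light intensity, 'd' the distance from the shaded point to the light.
local function attenuation_physical(i, d)
    return i / (d * d)                  -- correct inverse-square falloff
end
local function attenuation_stingray_1_9(i, d)
    return i / ((d + 1.0) * (d + 1.0))  -- clamped: approaches i as d -> 0
end
-- Stingray 1.10 direction: a much smaller bias 'e' (1/max_value of the
-- accumulation buffer), combined with an EV shift when writing/reading the buffer.
local function attenuation_stingray_1_10(i, d, e)
    return i / ((d + e) * (d + e))
end
</code></pre>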
<p>Finally, we were also able to validate that our ies profile parser/shader and our color temperatures behaved as expected:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGS-20uTgAQ8Fm8i8OfiACoPx8rZQyHvpzjx9ldgkULo018MmSzF6Du9jQKTvqTnValqzyHl07Mjd-heqP17eaMJbLzOTW4173N2fgrAac25-6QNAzZ1_f0yUrqaKA8n1E_jX8AxScgnyy/s1600/ies2.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGS-20uTgAQ8Fm8i8OfiACoPx8rZQyHvpzjx9ldgkULo018MmSzF6Du9jQKTvqTnValqzyHl07Mjd-heqP17eaMJbLzOTW4173N2fgrAac25-6QNAzZ1_f0yUrqaKA8n1E_jX8AxScgnyy/s640/ies2.gif" width="640" height="265" data-original-width="1500" data-original-height="620" /></a></div>
<p/>
<h3>Results and final thoughts</h3>
<p/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMDbNZB06JZKHgh7-3Ar_i7nwPljqluSXz5mkyJ_8mDGHwFjV5fDY_kw2h3ZS8HjVapDRWBEMx-j-aEgsOCXQ0XtAWsA6IRtnIzdOtaDaKv8nfUlPm33hqWD1-oPyxicWyWQk2RAcnJS6a/s1600/comp-01.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMDbNZB06JZKHgh7-3Ar_i7nwPljqluSXz5mkyJ_8mDGHwFjV5fDY_kw2h3ZS8HjVapDRWBEMx-j-aEgsOCXQ0XtAWsA6IRtnIzdOtaDaKv8nfUlPm33hqWD1-oPyxicWyWQk2RAcnJS6a/s640/comp-01.gif" width="640" height="311" data-original-width="1182" data-original-height="574" /></a></div>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-jFAPWgpvqsjUJ7g0cekha1sQ-UHh6e014X4deUuplLkMOocLvSCsXIMWehqmFTyR79hKXuu7GgaS_EZzScxWvrvgfVtEmuGZqqgrzu40uaV7CHi69qTViY3zwYhWBkxKK_467j6j2ZG5/s1600/comp-03.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-jFAPWgpvqsjUJ7g0cekha1sQ-UHh6e014X4deUuplLkMOocLvSCsXIMWehqmFTyR79hKXuu7GgaS_EZzScxWvrvgfVtEmuGZqqgrzu40uaV7CHi69qTViY3zwYhWBkxKK_467j6j2ZG5/s640/comp-03.gif" width="640" height="311" data-original-width="1184" data-original-height="575" /></a></div>
<p>Integrating a high quality offline renderer like Arnold has proven invaluable in the process of validating our lights in Stingray. A similar validation process could be applicable to many other aspects of our rendering pipeline (antialiasing, refractive materials, fur, hair, post-effects, volumetrics, etc.)</p>
<p>I also think that it can be a very powerful tool for content creators to build intuition on the impact of indirect lighting in a particular scene. For example in a simple level like this, adding a diffuse plane dramatically changes the lighting on the Buddha statue:</p>
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgs-p4_wWd0X-ghcIcCTYrBEMCatirZqRRAOsE0S6kswZT57yE4lZfrVojjTPqAKDLRDKykNlulWm2V4JxjXp-Vt2NugPQsU88XoEGWxR-1Fl4A5NKKkzYUEh7YXF3czTPCvLb4wZudyHCz/s1600/diffuse.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgs-p4_wWd0X-ghcIcCTYrBEMCatirZqRRAOsE0S6kswZT57yE4lZfrVojjTPqAKDLRDKykNlulWm2V4JxjXp-Vt2NugPQsU88XoEGWxR-1Fl4A5NKKkzYUEh7YXF3czTPCvLb4wZudyHCz/s640/diffuse.gif" width="640" height="424" data-original-width="843" data-original-height="558" /></a></div>
<p>The next step is now to compare our results with photographs gathered from a controlled environment. To be continued...</p>
</article>
<br /></div>
Physically Based Lens Flare (2017-07-03)
<div dir="ltr" style="text-align: left;" trbidi="on">
While playing Horizon Zero Dawn I was inspired by the lens flares it featured and decided to look into implementing some basic ones in Stingray. There were four types of flare I was particularly interested in.
<br/>
<ol>
<li>Anisomorphic flare</li>
<li>Aperture diffraction (Starbursts)</li>
<li>Camera ghosts due to Sun or Moon (High Quality - What this post will cover)</li>
<li>Camera ghosts due to all other light sources (Low Quality - Screen Space Effect)</li>
</ol>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidjDZ0J8kkhP1NCBGy_4QbBTGS0LJsOBBOqO7_qAplg43mb4cyEqRF5PpcuB1i_rD0IPwkXMuPyAxeTIZFkjL75CLcwtRrIkW513s5hXsD8nzdJn6ct8psJ9BpheeesRxbZIfnxEeDTC-B/s1600/intro.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidjDZ0J8kkhP1NCBGy_4QbBTGS0LJsOBBOqO7_qAplg43mb4cyEqRF5PpcuB1i_rD0IPwkXMuPyAxeTIZFkjL75CLcwtRrIkW513s5hXsD8nzdJn6ct8psJ9BpheeesRxbZIfnxEeDTC-B/s640/intro.jpg" width="640" height="265" data-original-width="1600" data-original-height="662" /></a>Image credits:
<a href="http://www.imdb.com/title/tt0095016/">Die Hard</a> (1), <a href="https://blog.lopau.com/uv-lens-and-night-photography/">Just Another Dang Blog</a> (2), <a href="https://www.pexels.com/photo/sunrise-sunset-lens-flare-6889/">PEXELS</a> (3), <a href="http://www.imdb.com/title/tt0234215/">The Matrix Reloaded</a> (4)
</div>
<br/>
Once finished I'll do a follow-up blog post on the Camera Lens Flare plugin, but for now I want to share the implementation details of the high-quality ghosts, which are an implementation of <a href="http://resources.mpi-inf.mpg.de/lensflareRendering">"Physically-Based Real-Time Lens Flare Rendering"</a>.
<br/><br/>
<h3>Code and Results</h3>
All the code used to generate the images and videos of this article can be found on <a href="https://github.com/greje656/PhysicallyBasedLensFlare">github.com/greje656/PhysicallyBasedLensFlare</a>.
<br/>
<br/><iframe width="685" height="343" src="https://www.youtube.com/embed/uMFu2EmPQw8" frameborder="0" allowfullscreen></iframe><br/>
<br/>
<h3>Ghosts</h3>
The basic idea of the "Physically-Based Lens Flare" paper is to ray trace "bundles" into a lens system which will end up on a sensor to form a ghost. A ghost here refers to the de-focused light that reaches the sensor of a camera due to the light reflecting off the lenses. Since a camera lens is not typically made of a single optical lens but of many lenses, there can be many ghosts that form on its sensor. If we only consider the ghosts that are formed from two bounces, that's a total of <a href="https://www.desmos.com/calculator/rsrjo1mhy1">nCr(n,2)</a> possible ghost <a href="https://en.wikipedia.org/wiki/Combination">combinations</a> (where n is the number of lens components in a camera lens).
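<br/><br/>
As a rough sketch (a hypothetical helper, not the plugin's actual code), enumerating the two-bounce ghosts just means picking every pair of reflective lens interfaces:
<br/>
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">// Hypothetical sketch: every two-bounce ghost is a pair of interfaces (i, j) with j < i,
// giving nCr(n, 2) combinations for n reflective lens interfaces.
struct GhostList { int bounce1[1024]; int bounce2[1024]; int count; };

void enumerate_two_bounce_ghosts(int n, GhostList &ghosts)
{
    ghosts.count = 0;
    for (int i = 1; i < n; ++i)
        for (int j = 0; j < i; ++j) {
            ghosts.bounce1[ghosts.count] = i; // light reflects backwards off interface i...
            ghosts.bounce2[ghosts.count] = j; // ...then forwards off interface j, then on to the sensor
            ghosts.count++;
        }
}
</code></span></pre>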
<br/><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhb5A9w-NjQcrKeb4sMtuO3-0wo6UWv817Ge4X1997eCsv_y6Wev5AimXNL0z2Zo9LMvFVjjBc9sr_5en_8fpe0mOXayB8ObhgSloS3AqiWHm03H7C7EHFsM78-o0k59VNrkft43DFANoHO/s1600/ghost04.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhb5A9w-NjQcrKeb4sMtuO3-0wo6UWv817Ge4X1997eCsv_y6Wev5AimXNL0z2Zo9LMvFVjjBc9sr_5en_8fpe0mOXayB8ObhgSloS3AqiWHm03H7C7EHFsM78-o0k59VNrkft43DFANoHO/s640/ghost04.jpg" width="640" height="210" data-original-width="1600" data-original-height="524" /></a></div>
<br/>
<h3>Lens Interface Description</h3>
Ok, let's get into it. To trace rays in an optical system we obviously need to build an optical system first. This part can be tedious. Not only do you have to find the "Lens Prescription" you are looking for, you also need to manually parse it. For example, parsing the Nikon 28-75mm patent data might look something like this:
<br/><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjHtrDwICd2WOj_UrAIXCKX3qQqZy3KE45xdZehOKsopJfc78RfYUf1iqK4Md-gAFBf2w-h5NL0ttNcHInS2ZzLktImI3CJHPzIlIgOYFVvLxUKzURTa74gJjBfMaHiR0Vjbg0M-VSsaAQ/s1600/lens-description.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjHtrDwICd2WOj_UrAIXCKX3qQqZy3KE45xdZehOKsopJfc78RfYUf1iqK4Md-gAFBf2w-h5NL0ttNcHInS2ZzLktImI3CJHPzIlIgOYFVvLxUKzURTa74gJjBfMaHiR0Vjbg0M-VSsaAQ/s640/lens-description.jpg" width="640" height="562" data-original-width="1426" data-original-height="1252" /></a></div>
<br/>
There is no standard way of describing such systems. You may find all the information you need from a lens patent, but often (especially for older lenses) you end up staring at an old document that seems to be missing important information required for the algorithm. For example, the Russian lens MIR-1 apparently produces beautiful lens flare, but the only lens description I could find for it was this:
<br/><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfDF1gelXKdAfPaEGBHlUfSMtor_XwbMaX29wQc2ViSbMHo4xRmSAHckD-6-ODxNAxt89iRuAcYjalDYW8dEdaj90R_1MuEsfJ_NVds6J3hB-ig49-IB2o_Z3K3sy6P1w58fLtVXZKlfGV/s1600/mir-1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfDF1gelXKdAfPaEGBHlUfSMtor_XwbMaX29wQc2ViSbMHo4xRmSAHckD-6-ODxNAxt89iRuAcYjalDYW8dEdaj90R_1MuEsfJ_NVds6J3hB-ig49-IB2o_Z3K3sy6P1w58fLtVXZKlfGV/s640/mir-1.jpg" width="640" height="284" data-original-width="1040" data-original-height="462" /></a>MIP.1B manual<a href="http://allphotolenses.com/public/files/pdfs/ce6dd287abeae4f6a6716e27f0f82e41.pdf"></a></div>
<h3>Ray Tracing</h3>
Once you have parsed your lens description into something your trace algorithm can consume, you can then start to ray trace. The idea is to initialize a tessellated patch at the camera's light entry point and trace through each of the points in the direction of the incoming light. There are a couple of subtleties to note regarding the tracing algorithm.
<br/><br/>
First, when a ray misses a lens component the raytracing routine isn't necessarily stopped. Instead, if the ray can continue along a meaningful path, the trace continues until it reaches the sensor. Only if the ray misses the sphere formed by the radius of the lens do we break the raytracing routine. The idea behind this is to get as many traced points as possible to reach the sensor so that the interpolated data can remain as continuous as possible. Each ray tracks the maximum relative distance it had to a lens component while tracing through the interface. This relative distance will be used in the pixel shader later to determine if a ray left the interface.
<br/><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDSyTV6zGB7jq8d9qNoLp-7mkvaHzfKjiArcL_0FFTf_cUCmkjt-8ZEqeR_yf5H5hkZXCI3QSVao17XBYJ47TAhug8SSVJl69fsB_t8JMrqhO6HgoZosS3f2dI_-gdqwp9uLJUEcxB2Y9z/s1600/trace-05.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDSyTV6zGB7jq8d9qNoLp-7mkvaHzfKjiArcL_0FFTf_cUCmkjt-8ZEqeR_yf5H5hkZXCI3QSVao17XBYJ47TAhug8SSVJl69fsB_t8JMrqhO6HgoZosS3f2dI_-gdqwp9uLJUEcxB2Y9z/s640/trace-05.jpg" width="640" height="216" data-original-width="1600" data-original-height="539" /></a> Relative distance visualized as green/orange gradient (black means ray missed lens component completely)</div>
<br/>
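Putting these notes together, a simplified sketch of the per-ray trace might look like this (the interface layout and the intersection/refraction helpers are hypothetical here, not the plugin's actual code):
<br/>
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">// Hypothetical sketch: trace one ray of a bundle through the lens interfaces for the
// ghost defined by the two reflection indices (bounce1 > bounce2).
// 'interfaces' and NUM_INTERFACES describe the parsed lens prescription (assumed globals).
struct TraceResult {
    float3 sensor_pos;            // where the ray ends up on the sensor
    float max_relative_distance;  // max |hit height| / interface height seen along the path
};

TraceResult trace_ghost_ray(float3 pos, float3 dir, int bounce1, int bounce2)
{
    TraceResult result;
    result.max_relative_distance = 0.f;

    int i = 0;            // current interface
    int step = 1;         // +1 = towards the sensor, -1 = back towards the entrance
    int bounces_done = 0;

    while (i >= 0 && i < NUM_INTERFACES) {
        Intersection hit = intersect_interface(pos, dir, interfaces[i]);

        // Only a complete miss of the lens sphere kills the ray; otherwise keep going
        // and just remember how far outside the component it got.
        if (!hit.valid)
            break;

        float relative_distance = length(hit.pos.xy) / interfaces[i].height;
        result.max_relative_distance = max(result.max_relative_distance, relative_distance);

        bool reflect_here = (bounces_done == 0 && i == bounce1) ||
                            (bounces_done == 1 && i == bounce2);
        if (reflect_here) {
            dir = reflect(dir, hit.normal);
            step = -step;
            ++bounces_done;
        } else {
            dir = refract_or_continue(dir, hit.normal, interfaces[i]);
        }

        pos = hit.pos;
        i += step;
    }

    result.sensor_pos = pos;
    return result;
}
</code></span></pre>
<br/>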
Secondly, a ray bundle carries a fixed amount of energy so it is important to consider the distortion of the bundle area that occurs while tracing. In the paper, the author states:
<center><blockquote><i>"At each vertex, we store the average value of its surrounding neighbours. The regular grid of rays, combined with the transform feedback (or the stream-out) mechanism of modern graphics hardware, makes this lookup of neighbouring quad values very easy"</i></blockquote></center>
I don't understand how the transform feedback, along with the available adjacency information of the geometry shader, could be enough to provide the information of the four surrounding quads of a vertex (if you know please leave a comment). Luckily we now have compute and UAVs which turn this problem into a fairly trivial one. Currently I only calculate an approximation of the surrounding areas by assuming the neighbouring quads are roughly parallelograms: I estimate their bases and heights as the average lengths of their top/bottom and left/right segments (a rough sketch of this estimate follows after the image). The results are seen as caustics forming on the sensor, where some bundles converge into tighter area patches while others dilate:
<br/><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjaY-_KBSsn_RtPk6F_gazFGk7DzDb-zd3VMNI7G6IbfNHcS4Kfyoi1UNzizTepZPaooP7b278v7XK5mQZmfo2Sjt2Ap8pL_NtMvulttZF_700YyYPmK68X4DhQoQfjUBjq9k_U3bVTNDCV/s1600/lens-area.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjaY-_KBSsn_RtPk6F_gazFGk7DzDb-zd3VMNI7G6IbfNHcS4Kfyoi1UNzizTepZPaooP7b278v7XK5mQZmfo2Sjt2Ap8pL_NtMvulttZF_700YyYPmK68X4DhQoQfjUBjq9k_U3bVTNDCV/s640/lens-area.jpg" width="640" height="213" data-original-width="1422" data-original-height="474" /></a></div>
<br/>
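A hypothetical compute sketch of that per-vertex area estimate (grid size, buffer names and layout are made up here) could look like this:
<br/>
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">// Sketch: approximate the area of the quads surrounding each traced vertex by treating
// them as parallelograms. PATCH_SIZE is the assumed side of the traced grid (e.g. 32).
RWStructuredBuffer<float4> vertices : register(u0); // xy = traced position of each patch vertex
RWStructuredBuffer<float>  areas    : register(u1); // output: approximate area around each vertex

[numthreads(8, 8, 1)]
void estimate_beam_area(uint3 id : SV_DispatchThreadID)
{
    // clamp to the interior so edge vertices reuse their inner neighbours
    int x = clamp((int)id.x, 1, PATCH_SIZE - 2);
    int y = clamp((int)id.y, 1, PATCH_SIZE - 2);

    float2 c = vertices[y * PATCH_SIZE + x].xy;
    float2 l = vertices[y * PATCH_SIZE + (x - 1)].xy;
    float2 r = vertices[y * PATCH_SIZE + (x + 1)].xy;
    float2 u = vertices[(y - 1) * PATCH_SIZE + x].xy;
    float2 d = vertices[(y + 1) * PATCH_SIZE + x].xy;

    // base ~ average horizontal segment length, height ~ average vertical segment length
    float base_len   = 0.5f * (length(c - l) + length(r - c));
    float height_len = 0.5f * (length(c - u) + length(d - c));

    areas[id.y * PATCH_SIZE + id.x] = base_len * height_len;
}
</code></span></pre>
<br/>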
This works fairly well but is <a href="https://github.com/greje656/PhysicallyBasedLensFlare/blob/master/Lens/lens.hlsl#L198">expensive</a>, something I intend to improve in the future.
<br/><br/>
Now that we have a traced patch we need to make some sense out of it. The patch "as is" can look intimidating at first. Due to early exits of some rays the final vertices can sometimes look like something went terribly wrong. Here is a particularly distorted ghost:
<br/><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgt5lkhJV66v8UQ4-DVAsvm02g59gld_mnZTKKi4TMam3qD3W20kRv06yzWPCXi8PXItt1aisqoMtuOcrJacrb4a93Uanp-7-9VNcdAnM1kqJ09VwOYeqeSfU4IksbkOXgg3jJ8FCeiWroq/s1600/discard03.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgt5lkhJV66v8UQ4-DVAsvm02g59gld_mnZTKKi4TMam3qD3W20kRv06yzWPCXi8PXItt1aisqoMtuOcrJacrb4a93Uanp-7-9VNcdAnM1kqJ09VwOYeqeSfU4IksbkOXgg3jJ8FCeiWroq/s640/discard03.jpg" width="640" height="320" data-original-width="1600" data-original-height="800" /></a></div>
<br/>
The first thing to do is discard pixels that exited the lens system:
<br/>
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">float intensity1 = max_relative_distance < 1.0f;
float intensity = intensity1;
if(intensity == 0.f) discard;
</code></span></pre>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRvqKfg69FTZ-7tlkLMqTjOSzmaRlpYCs8HC-pnCASt5bep6YMCTTu5jYM923WQY7avts2pHNgb4nRF1I00Q0asTdAsuz2cC4mAfroSXgWDHV5bIZUo5MtyVmiAbHxxGGyyCh-HFj2vtp_/s1600/discard04.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRvqKfg69FTZ-7tlkLMqTjOSzmaRlpYCs8HC-pnCASt5bep6YMCTTu5jYM923WQY7avts2pHNgb4nRF1I00Q0asTdAsuz2cC4mAfroSXgWDHV5bIZUo5MtyVmiAbHxxGGyyCh-HFj2vtp_/s640/discard04.jpg" width="640" height="320" data-original-width="1600" data-original-height="800" /></a></div>
<br/>
Then we can discard the rays that didn't have any energy as they entered to begin with (say outside the sun disk):
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">float lens_distance = length(entry_coordinates.xy);
float sun_disk = 1 - saturate((lens_distance - 1.f + fade)/fade);
sun_disk = smoothstep(0, 1, sun_disk);
//...
float intensity2 = sun_disk;
float intensity = intensity1 * intensity2;
if(intensity == 0.f) discard;
</code></span></pre>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmiZcFscDFsYT2Bw0go5qmP-MjavtZPvghw_CI5INErWAh3BTM8-jY6yGYY1ouz8hAAmReO2cBoHqIj5XGkaLS-sS7TYhrH9HSKaNpEFmOJag9Gli5x0Dks1qnlkc1Ent8944L2jw6vYIS/s1600/discard05.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmiZcFscDFsYT2Bw0go5qmP-MjavtZPvghw_CI5INErWAh3BTM8-jY6yGYY1ouz8hAAmReO2cBoHqIj5XGkaLS-sS7TYhrH9HSKaNpEFmOJag9Gli5x0Dks1qnlkc1Ent8944L2jw6vYIS/s640/discard05.jpg" width="640" height="320" data-original-width="1600" data-original-height="800" /></a></div>
<br/>
Then we can discard the rays that were blocked by the aperture:
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">//...
float intensity3 = aperture_sample;
float intensity = intensity1 * intensity2 * intensity3;
if(intensity == 0.f) discard;
</code></span></pre>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiC26KoIsXCXjCa6k48Tg1fTHXPYHuNEXFir3tL1DDI1DxBY0zMD_lAGwSAlqKDqyEu85wAGS8WMQTSgv1UyELyZAHzkofB_dyXz5G-2TAprhsktuGY5F01IFoHzar_9y6t7wwHDZ_hR4WJ/s1600/discard06.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiC26KoIsXCXjCa6k48Tg1fTHXPYHuNEXFir3tL1DDI1DxBY0zMD_lAGwSAlqKDqyEu85wAGS8WMQTSgv1UyELyZAHzkofB_dyXz5G-2TAprhsktuGY5F01IFoHzar_9y6t7wwHDZ_hR4WJ/s640/discard06.jpg" width="640" height="320" data-original-width="1600" data-original-height="800" /></a></div>
<br/>
Finally we adjust the radiance of the beams based on their final areas:
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">//...
float intensity4 = (original_area/(new_area + eps)) * energy;
float intensity = intensity1 * intensity2 * intensity3 * intensity4;
if(intensity == 0.f) discard;
</code></span></pre>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNa0uHxkWeaecrmcB5YlvAC7MGlWnEwTB3B7yU5wmI-YRxJRcgyrSnpUQION42SG8ZCsCLSuolKsZiSIbwY62PnOwHCkNsLuekIkHhL5tVjCqXhw27C5Vsu6FdAaRdAngkEswe_gmOVuAV/s1600/discard07.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNa0uHxkWeaecrmcB5YlvAC7MGlWnEwTB3B7yU5wmI-YRxJRcgyrSnpUQION42SG8ZCsCLSuolKsZiSIbwY62PnOwHCkNsLuekIkHhL5tVjCqXhw27C5Vsu6FdAaRdAngkEswe_gmOVuAV/s640/discard07.jpg" width="640" height="320" data-original-width="1600" data-original-height="800" /></a></div>
<br/>
The final value is the rgb reflectance value of the ghost modulated by the incoming light color:
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">float3 color = intensity * reflectance.xyz * TempToColor(INCOMING_LIGHT_TEMP);</code></span></pre>
<h3>Aperture</h3>
<br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8rKKzCb07w4MyoIue4WqS-5Z2EWu_QEgJGR6NQ61DZoQ7pv_CmCupvPFRz7ozt5tK-htcqSNoPCxOonhrHjL_AqxF1GGOrygZ3Xo8j83YRl-VnOd74fFRZgKDp6cCPEE4Ki4q9mni_qLd/s1600/apertures4.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8rKKzCb07w4MyoIue4WqS-5Z2EWu_QEgJGR6NQ61DZoQ7pv_CmCupvPFRz7ozt5tK-htcqSNoPCxOonhrHjL_AqxF1GGOrygZ3Xo8j83YRl-VnOd74fFRZgKDp6cCPEE4Ki4q9mni_qLd/s640/apertures4.jpg" width="640" height="164" data-original-width="1600" data-original-height="411" /></a>Image credits: <a href="http://6iee.com/755819.html">6iee</a></div>
<br/>
The aperture shape is built procedurally. As suggested by <a href="https://placeholderart.wordpress.com/2015/01/19/implementation-notes-physically-based-lens-flares/">Padraic Hennessy's blog</a> I use a signed distance field confined by "n" segments and threshold it against some distance value. I also experimented with approximating the light diffraction that occurs at the edge of the aperture blades using a <a href="https://www.desmos.com/calculator/munv7q2ez3">simple function</a>:
<br/><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYh-8Dg8w_uHnJrhyphenhyphen_5B6FeiRCcL_l8ooKweIjh6kUKVek-rTCrvZfZw5IKOpG2DrY2RC22Uu7OsDrNWvTApgxNZQE1t-jMyWXdO63H0B-hJAiVQV76NokanmlyHrsJNE7S3MUlaYE7NVv/s1600/apertures1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYh-8Dg8w_uHnJrhyphenhyphen_5B6FeiRCcL_l8ooKweIjh6kUKVek-rTCrvZfZw5IKOpG2DrY2RC22Uu7OsDrNWvTApgxNZQE1t-jMyWXdO63H0B-hJAiVQV76NokanmlyHrsJNE7S3MUlaYE7NVv/s640/apertures1.jpg" width="640" height="213" data-original-width="1533" data-original-height="511" /></a></div>
<br/>
Finally, I offset the signed distance field with a repeating sine function, which can give curved aperture blades (a rough sketch of the full mask follows after the image):
<br/><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjU-0YNsysJPv38_WWOEx2DnYhyphenhyphenlFYuHFOnkFJhQtzpUGZUtIU9QWMdTpBmAUy62RWl8M_sQPVoSskxYo-ltv1Oj1ifxGexM2xwUfZiuZ3QX-1Y0Pufb8_TYpCANDtZtlB6HbrXB4ujDT_l/s1600/apertures2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjU-0YNsysJPv38_WWOEx2DnYhyphenhyphenlFYuHFOnkFJhQtzpUGZUtIU9QWMdTpBmAUy62RWl8M_sQPVoSskxYo-ltv1Oj1ifxGexM2xwUfZiuZ3QX-1Y0Pufb8_TYpCANDtZtlB6HbrXB4ujDT_l/s640/apertures2.jpg" width="640" height="213" data-original-width="1533" data-original-height="511" /></a></div>
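<br/>
As a rough sketch of the idea (with made-up parameter names and a crude stand-in for the diffraction term, not the plugin's actual code), the whole mask could be built something like this:
<br/>
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">// Hypothetical sketch of a procedural n-blade aperture mask built from a signed distance field.
float aperture_mask(float2 uv, int num_blades, float radius, float curve_amount, float smoothness)
{
    const float PI = 3.14159265f;
    float sdf = -1e6;
    for (int i = 0; i < num_blades; ++i) {
        float angle = (i + 0.5f) * (2.f * PI / num_blades);
        float2 blade_normal = float2(cos(angle), sin(angle));

        // signed distance to the half-plane formed by this blade edge
        float d = dot(uv, blade_normal) - radius;

        // offset the edge with a repeating sine along the blade to fake curved blades
        float along = dot(uv, float2(-blade_normal.y, blade_normal.x));
        d += curve_amount * sin(along * PI / radius);

        sdf = max(sdf, d); // intersection of all blade half-planes = polygonal aperture
    }
    // threshold the field; the soft edge is only a very crude stand-in for blade-edge diffraction
    return 1.f - smoothstep(-smoothness, smoothness, sdf);
}
</code></span></pre>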
<br/>
<h3>Starburst</h3>
The starburst phenomenon is due to the diffraction of light that passes through the small aperture hole, a phenomenon known as the "single slit diffraction of light". The author got really convincing results simulating this using the Fraunhofer approximation. The challenge with this approach is that it requires bringing the aperture texture into Fourier space, which is not trivial. In previous projects I used CUDA's math library to perform the FFT of a signal, but since the goal is to bring this into Stingray I didn't want to have such a dependency. Luckily I found this little gem posted by <a href="https://software.intel.com/en-us/articles/fast-fourier-transform-for-image-processing-in-directx-11">Joseph S. from Intel</a>. He provides a clean and elegant compute implementation of the butterfly passes method which brings a signal to and from Fourier space. Using it I can feed in the aperture shape and extract the Fourier Power Spectrum:
<br/><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtT1kR7Gbb0aSrNtXGdYRh5XzmOXE_rOJgSeFtN-yMlky-1H0noMCYyeAoj_Cmm5oqIZTQ2Z0DVmarh2T8yyeMddNE_FPsS8eujo7zX1ogsliwV0rPDSRMB6eNdLpGZk0G4te9PrcB7GPO/s1600/starburst04.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtT1kR7Gbb0aSrNtXGdYRh5XzmOXE_rOJgSeFtN-yMlky-1H0noMCYyeAoj_Cmm5oqIZTQ2Z0DVmarh2T8yyeMddNE_FPsS8eujo7zX1ogsliwV0rPDSRMB6eNdLpGZk0G4te9PrcB7GPO/s640/starburst04.jpg" width="640" height="320" data-original-width="1020" data-original-height="510" /></a></div>
<br/>
This spectrum needs to be filtered further in order to look like a starburst. This is where the Fraunhofer approximation comes in. The idea is to basically reconstruct the diffraction of white light by summing up the diffraction of multiple wavelengths. The key observation is that the same Fourier signal can be used for all wavelengths. The only thing needed is to scale the sampling coordinates of the Fourier power spectrum:
<br/><br/>
<div class="separator" style="clear: both; text-align: center;">(x0,y0) = (u,v)·λ·z0 for λ = 350nm/435nm/525nm/700nm<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyUAixiq8H__9iB2r31me3eZ6Lid92lcMC0-KXXW6myZYUe871bu9EVAvTLR_cPe-5ZiVsjwdWk2Cryk9PzygPrvFPYUlMieaG2ybJlmJeBlsKsrdm_Gwaalo1DL31icf0jm0niGMWzNUy/s1600/starburst05.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyUAixiq8H__9iB2r31me3eZ6Lid92lcMC0-KXXW6myZYUe871bu9EVAvTLR_cPe-5ZiVsjwdWk2Cryk9PzygPrvFPYUlMieaG2ybJlmJeBlsKsrdm_Gwaalo1DL31icf0jm0niGMWzNUy/s640/starburst05.jpg" width="640" height="214" data-original-width="1600" data-original-height="534" /></a></div>
<br/>
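In shader terms, the scaling and the sum over wavelengths might look roughly like this (a hedged sketch; the texture, sampler and the wavelength_to_rgb helper are assumptions, not the plugin's actual code):
<br/>
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">// Sketch: sample the single FFT power spectrum at wavelength-scaled coordinates and
// sum the contributions to approximate the diffraction of white light.
static const float wavelengths[4] = { 350e-9f, 435e-9f, 525e-9f, 700e-9f };

float3 starburst(float2 uv, Texture2D power_spectrum, SamplerState linear_sampler, float z0)
{
    float3 result = 0;
    for (int i = 0; i < 4; ++i) {
        float lambda = wavelengths[i];
        // invert (x0,y0) = (u,v)·λ·z0: for screen position (x0,y0), sample the spectrum at (u,v) = (x0,y0)/(λ·z0)
        float2 spectrum_uv = (uv - 0.5f) / (lambda * z0) + 0.5f;
        float  power = power_spectrum.SampleLevel(linear_sampler, spectrum_uv, 0).r;
        result += power * wavelength_to_rgb(lambda); // hypothetical helper mapping a wavelength to a colour
    }
    return result / 4.f;
}
</code></span></pre>
<br/>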
Summing up the wavelengths gives the starburst image. To get more interesting results I apply an extra filtering step. I use a spiral pattern mixed with a small rotation to get rid of any leftover radial ringing artifacts (judging by the author's starburst results I suspect this is a step they are also doing):
<br/><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhz3nfrpFzM61ZRyOS0JJMUR5wXLcwE1889S3Ql3QjU8DQrxa1tWjFLSCASLN1RgmNSrxXQF0U5zDz34EFDU3jfH6wzyvghwizoLBdIp-Riy7CdZ-GnJa6MVoHkp6-WEAIr7KOEq8vBvzDF/s1600/starburst06.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhz3nfrpFzM61ZRyOS0JJMUR5wXLcwE1889S3Ql3QjU8DQrxa1tWjFLSCASLN1RgmNSrxXQF0U5zDz34EFDU3jfH6wzyvghwizoLBdIp-Riy7CdZ-GnJa6MVoHkp6-WEAIr7KOEq8vBvzDF/s640/starburst06.jpg" width="640" height="320" data-original-width="1600" data-original-height="800" /></a></div>
<br/>
<h3>Anti Reflection Coating</h3>
While some appreciate the artistic aspect of lens flare, lens manufacturers work hard to minimize it by coating lenses with anti-reflection coatings. The coating applied to each lens is usually designed to minimize the reflection of a specific wavelength. Coatings are defined by their thickness and index of refraction. Given the wavelength to minimize reflections for, and the IORs of the two media involved in the reflection (say n0 and n2), the ideal IOR (n1) and thickness (d) of the coating are defined as n1 = sqrt(n0·n2) and d = λ/(4·n1). This is known as a quarter wavelength anti-reflection coating. I've found <a href="http://www.pveducation.org/pvcdrom/anti-reflection-coatings">this site</a> very helpful to understand this phenomenon.
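<br/><br/>
In code, computing the ideal quarter-wave coating for a given boundary is just those two formulas (a small sketch, with hypothetical struct and parameter names):
<br/>
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">// Sketch: ideal quarter-wave anti-reflection coating for the boundary between media n0 and n2.
struct Coating { float ior; float thickness; };

Coating quarter_wave_coating(float n0, float n2, float lambda)
{
    Coating c;
    c.ior = sqrt(n0 * n2);                // n1 = sqrt(n0 * n2)
    c.thickness = lambda / (4.f * c.ior); // physical thickness of a quarter-wavelength optical layer
    return c;
}
</code></span></pre>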
<br/><br/>
In the current implementation each lens coating specifies a wavelength the coating should be optimized for, and the ideal thickness and IOR are used by default. I added a controllable offset to thicken the AR coating layer in order to conveniently reduce its anti-reflection properties:
<br/><br/>
<div class="separator" style="clear: both; text-align: center;">No AR Coating:<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiq0g2klLXjLjH4lN1oaMEqxs1fbJIyk08fHW6IsuzKJ_pdA6AjDfFCSn-4mjVoYGutcDKcbjeO75q4IMe48_o3D4ioQmvAGk9t5bK8Md5d6KWfwxv5ClXWSHXRl5A23Z9x_3rt1gNFlV8o/s1600/arc01.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiq0g2klLXjLjH4lN1oaMEqxs1fbJIyk08fHW6IsuzKJ_pdA6AjDfFCSn-4mjVoYGutcDKcbjeO75q4IMe48_o3D4ioQmvAGk9t5bK8Md5d6KWfwxv5ClXWSHXRl5A23Z9x_3rt1gNFlV8o/s640/arc01.jpg" width="640" height="216" data-original-width="1600" data-original-height="539" /></a></div>
<br/>
<div class="separator" style="clear: both; text-align: center;">Ideal AR Coating:<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZMYjcHSmVdhyphenhyphen-1N8Y1umhwUAd9fTz1xrZLORwz_iIht2iRq7RF4dwjwDxUj7Gmjjk1NdMA0RbRhAB04hETjgLHpleuxFCbkxGomzrngrpLFaSlokbaFrghu3mSWtAW42EvJD-KnL7Olq9/s1600/arc02.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZMYjcHSmVdhyphenhyphen-1N8Y1umhwUAd9fTz1xrZLORwz_iIht2iRq7RF4dwjwDxUj7Gmjjk1NdMA0RbRhAB04hETjgLHpleuxFCbkxGomzrngrpLFaSlokbaFrghu3mSWtAW42EvJD-KnL7Olq9/s640/arc02.jpg" width="640" height="216" data-original-width="1600" data-original-height="539" /></a></div>
<br/>
<div class="separator" style="clear: both; text-align: center;">AR Coating with offsetted thickness:<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiU3Y2z9qZhuj-WLsI2PVjSQ7CMKTWasC-V0uBWuOQdY0k2mVkoUDFl4IK9UZ7vXeMTi1PH0qcGKGAVuft1OHvA-r9WDWkcJMcifjlJEfF3CFcBN9DcFucpY9NIT7fCkWSSu7nxnlMhB42/s1600/arc03.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiU3Y2z9qZhuj-WLsI2PVjSQ7CMKTWasC-V0uBWuOQdY0k2mVkoUDFl4IK9UZ7vXeMTi1PH0qcGKGAVuft1OHvA-r9WDWkcJMcifjlJEfF3CFcBN9DcFucpY9NIT7fCkWSSu7nxnlMhB42/s640/arc03.jpg" width="640" height="216" data-original-width="1600" data-original-height="539" /></a></div>
<br/>
<h3>Optimisations</h3>
Currently the full cost of the effect for a Nikon 28-75mm lens is 12ms (3ms to ray march 352x32x32 points and 9ms to draw the 352 patches). The performance degrades as the sun disk is made bigger since it results in more and more overshading during the rasterisation of each ghost. With a simpler lens interface like the 1955 Angenieux the cost decreases significantly. In the current implementation every possible "two bounce ghost" is traced and drawn. For a lens system like the Nikon 28-75mm, which has 27 lens components, that's n!/(r!(n-r)!) = 352 ghosts. It's easy to see that this number can <a href="https://www.desmos.com/calculator/rsrjo1mhy1">increase dramatically</a> with the number of components.
<br/><br/>
An obvious optimization would be to skip ghosts that have intensities so low that their contributions are imperceptible. Using Compute/DrawIndirect it would be fairly simple to first run a coarse pass and use it to cull non-contributing ghosts. This would reduce the compute and rasterization pressure on the GPU dramatically, something I intend to do in the future.
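<br/><br/>
Such a culling pass could look roughly like the sketch below (hypothetical buffers and threshold; the coarse intensity estimate itself is left out):
<br/>
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">// Sketch: keep only the ghosts whose coarse intensity estimate is above a threshold.
// The resulting list of visible ghost indices could then drive indirect draw submission.
StructuredBuffer<float>      coarse_intensity : register(t0); // one estimate per ghost
AppendStructuredBuffer<uint> visible_ghosts   : register(u0); // indices of ghosts worth drawing

[numthreads(64, 1, 1)]
void cull_ghosts(uint3 id : SV_DispatchThreadID)
{
    const float intensity_threshold = 1e-4f; // tweakable cutoff
    if (id.x < NUM_GHOSTS && coarse_intensity[id.x] > intensity_threshold)
        visible_ghosts.Append(id.x);
}
</code></span></pre>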
<br/><br/>
<h3>Conclusion</h3>
I'm not sure if this approach was ever used in a game. It would probably be hard to justify its heavy cost. I feel this would have a better use case in the context of pre-visualization, where a director might be interested in having early feedback on how a certain lens might behave in a shot.
<br/><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8Y4usYUd5q-jEUu2akLLyb9hQ6yQy6bZCMLdpAgZqJnjzQPZi8UTM7OYHVRSZfHqT7lQuJQXSW8YeurLFUzSQJ0lrkIIdwH80iZgOA7YrLX2a67wVnfIUMQdQ3N1kfZj2W-pcVL270kuP/s1600/example03.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8Y4usYUd5q-jEUu2akLLyb9hQ6yQy6bZCMLdpAgZqJnjzQPZi8UTM7OYHVRSZfHqT7lQuJQXSW8YeurLFUzSQJ0lrkIIdwH80iZgOA7YrLX2a67wVnfIUMQdQ3N1kfZj2W-pcVL270kuP/s640/example03.jpg" width="640" height="427" data-original-width="1600" data-original-height="1067" /></a>Image credits: <a href="http://wallup.net/car-sunset-lexus/">Wallup</a></div>
<br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6b9C9_AjGUTyxPIppSErjavkAGjNa6QXhvrIgBwoMOhXzoo8XlyDMmVT21gcHni4o67mQSRmqpF6DZWolklH5NatrcGUvl4bkZWAF2OByvpTvFcIxgWFT1g2HAYCnWE6AsFBIMe6YCILU/s1600/example02.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6b9C9_AjGUTyxPIppSErjavkAGjNa6QXhvrIgBwoMOhXzoo8XlyDMmVT21gcHni4o67mQSRmqpF6DZWolklH5NatrcGUvl4bkZWAF2OByvpTvFcIxgWFT1g2HAYCnWE6AsFBIMe6YCILU/s640/example02.jpg" width="640" height="427" data-original-width="1600" data-original-height="1067" /></a>Image credits: <a href="http://www.wallpapers-web.com/sunset-field-wallpapers/5485554.html/">Wallpapers Web</a></div>
<br/>
Finally, be aware that the author has filed a patent for the algorithm described in his paper, which may put limits on how you may use parts of what is described in my post. Please contact the <a href="http://resources.mpi-inf.mpg.de/lensflareRendering/">paper's author</a> for more information on what restrictions might be in place.
</div>
Jphttp://www.blogger.com/profile/09637484103636420407noreply@blogger.com311tag:blogger.com,1999:blog-1994130783874175266.post-29845867847897517772017-06-22T19:33:00.003+02:002017-06-27T15:30:41.570+02:00Reprojecting Reflections<div dir="ltr" style="text-align: left;" trbidi="on">
Screen space reflections are such a pain. When combined with TAA they are even harder to manage. Raytracing against a jittered depth/normal g-buffer can easily cause reflection rays to have widely different intersection points from frame to frame. When using neighborhood clamping, it can become difficult to handle the flickering caused by too much clipping, especially for surfaces that have normal maps with high frequency patterns in them.
<br/><br/>
On top of this, reflections are very hard to reproject. Since they are view dependent, simply fetching the motion vector at the current pixel tends to make the reprojection "smudge" under camera motion. Here's a small video grab that I did while playing Uncharted 4 (notice how the reflections trail under camera motion):
<br/>
<br/><iframe width="685" height="374" src="https://www.youtube.com/embed/wBO8GX-R4R4" frameborder="0" allowfullscreen></iframe><br/>
<br/>
Last year I spent some time trying to understand this problem a little bit more. I first drew a ray diagram describing how a reflection could be reprojected in theory. Consider the goal of reprojecting the reflection that occurs at incidence point v0 (see diagram below); to reproject the reflection which occurred at that point you would need to:
<br/>
<ol>
<li>Retrieve the surface motion vector (ms) corresponding to the reflection incidence point (v0)</li>
<li>Reproject the incidence point using (ms)</li>
<li>Using the depth buffer history, reconstruct the reflection incidence point (v1)</li>
<li>Retrieve the motion vector (mr) corresponding to the reflected point (p0)</li>
<li>Reproject the reflection point using (mr)</li>
<li>Using the depth buffer history, reconstruct the previous reflection point (p1)</li>
<li>Using the previous view matrix transform, reconstruct the previous surface normal of the incidence point (n1)</li>
<li>Project the camera position (deye) and the reconstructed reflection point (dp1) onto the previous plane (defined by surface normal = n1, and surface point = v1)</li>
<li>Solve for the position of the previous reflection point (r) knowing (deye) and (dp1)</li>
<li>Finally, using the previous view-projection matrix, evaluate (r) in the previous reflection buffer</li>
</ol>
<br/><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFWztBN_03f5FOufnU6XXXnsbbZ76DQ7tA581iQQeN3ViAuAuQMjGbZQhslcSEp0CSOeTDaRiM3Q0kfw9iKYGmIeSp0FJ_Qi58F3uVTLJE1LQ41fmaZlIqCBtdj_-jyxLoCW630C8aNlmc/s1600/diagram.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFWztBN_03f5FOufnU6XXXnsbbZ76DQ7tA581iQQeN3ViAuAuQMjGbZQhslcSEp0CSOeTDaRiM3Q0kfw9iKYGmIeSp0FJ_Qi58F3uVTLJE1LQ41fmaZlIqCBtdj_-jyxLoCW630C8aNlmc/s640/diagram.jpg" width="640" height="349" data-original-width="1200" data-original-height="655" /></a></div>
<br/>
By adding to Stingray a history depth buffer and using the previous view-projection matrix I was able to confirm this approach could successfully reproject reflections.
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">float3 proj_point_in_plane(float3 p, float3 v0, float3 n, out float d) {
    d = dot(n, p - v0);
    return p - (n * d);
}

float3 find_reflection_incident_point(float3 p0, float3 p1, float3 v0, float3 n) {
    float d0 = 0;
    float d1 = 0;
    float3 proj_p0 = proj_point_in_plane(p0, v0, n, d0);
    float3 proj_p1 = proj_point_in_plane(p1, v0, n, d1);

    if(d1 < d0)
        return (proj_p0 - proj_p1) * d1/(d0+d1) + proj_p1;
    else
        return (proj_p1 - proj_p0) * d0/(d0+d1) + proj_p0;
}

float2 find_previous_reflection_position(
    float3 ss_pos, float3 ss_ray,
    float2 surface_motion_vector, float2 reflection_motion_vector,
    float3 world_normal) {

    float3 ss_p0 = 0;
    ss_p0.xy = ss_pos.xy - surface_motion_vector;
    ss_p0.z = TEX2D(input_texture5, ss_p0.xy).r;

    float3 ss_p1 = 0;
    ss_p1.xy = ss_ray.xy - reflection_motion_vector;
    ss_p1.z = TEX2D(input_texture5, ss_p1.xy).r;

    float3 view_n = normalize(world_to_prev_view(world_normal, 0));
    float3 view_p0 = float3(0,0,0);
    float3 view_v0 = ss_to_view(ss_p0, 1);
    float3 view_p1 = ss_to_view(ss_p1, 1);

    float3 view_intersection =
        find_reflection_incident_point(view_p0, view_p1, view_v0, view_n);

    float3 ss_intersection = view_to_ss(view_intersection, 1);
    return ss_intersection.xy;
}
</code></span></pre>
<br />
You can see in these videos that most of the reprojection distortion in the reflections is addressed:
<br/>
<br/><iframe width="685" height="374" src="https://www.youtube.com/embed/D7eFSL_Q6j8" frameborder="0" allowfullscreen></iframe><br/>
<br/><iframe width="685" height="374" src="https://www.youtube.com/embed/bvGtX0pMEeI" frameborder="0" allowfullscreen></iframe><br/>
<br/>
Ghosting was definitely minimized under camera motion. The video below compares the two reprojection methods side by side.
<br/>
<br/>
LEFT: Simple Reprojection, RIGHT: Correct Reprojection
<br/>(note that I disabled neighborhood clamping in this video to visualize the reprojection better)
<iframe width="685" height="374" src="https://www.youtube.com/embed/XvELB4NnLIk" frameborder="0" allowfullscreen></iframe><br/>
<br/>
So instead I tried a different approach. The new idea was to pick a few reprojection vectors that are likely to be meaningful in the context of a reflection. Originally I looked into:
<ul>
<li>Motion vector at ray incidence</li>
<li>Motion vector at ray intersection</li>
<li>Parallax corrected motion vector at ray incidence</li>
<li>Parallax corrected motion vector at ray intersection</li>
</ul>
<br/>
The idea of doing parallax correction on motion vectors for reflections came from the <a href="https://www.ea.com/frostbite/news/stochastic-screen-space-reflections/">Stochastic Screen-Space Reflections</a> talk presented by Tomasz Stachowiak at Siggraph 2015. Here's how it's currently implemented, although I'm not 100% sure it's as correct as it could be (there's a PARALLAX_FACTOR define which I needed to tweak manually to get optimal results; perhaps there's a better way of doing this):
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">float2 parallax_velocity = velocity * saturate(1.0 - total_ray_length * PARALLAX_FACTOR);
</code></span></pre>
Once all those interesting vectors are retrieved, the one with the smallest magnitude is declared as "the most likely successful reprojection vector". This simple idea alone has improved the reprojection of the SSR buffer quite significantly (note that when casting multiple rays per pixel, averaging the sum of all successful reprojection vectors still gave us a better reprojection than what we had previously).
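<br/>
In shader terms the selection is as simple as it sounds; a minimal sketch (with a made-up candidate array, not the actual Stingray code):
<br/>
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">// Sketch: among the candidate reprojection vectors, pick the one with the smallest magnitude
// and use it to fetch the reflection history buffer.
float2 pick_reprojection_vector(float2 candidates[4])
{
    float2 best = candidates[0];
    float  best_len = dot(best, best);
    for (int i = 1; i < 4; ++i) {
        float len = dot(candidates[i], candidates[i]);
        if (len < best_len) {
            best = candidates[i];
            best_len = len;
        }
    }
    return best;
}
</code></span></pre>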
<br/>
<br/><iframe width="685" height="374" src="https://www.youtube.com/embed/xpkWhUjxWjU" frameborder="0" allowfullscreen></iframe><br/>
<br/>
Screen space reflections are one of the most difficult screen space effects I've had to deal with. They are plagued with artifacts which can often be difficult to explain or understand. In the last couple of years I've seen people propose really creative ways to minimize some of the artifacts that are inherent to SSR. I hope this continues!
<br/>
</div>
Jphttp://www.blogger.com/profile/09637484103636420407noreply@blogger.com201tag:blogger.com,1999:blog-1994130783874175266.post-28921511689477758432017-05-16T23:56:00.001+02:002017-05-17T00:17:16.331+02:00Rebuilding the Entity Index<h2>
Background</h2>
If you are not familiar with the Stingray Entity system you can find good resources to catch up here:<br />
<ul>
<li><a href="https://www.youtube.com/watch?v=PmEeW9hjqrM&">Stingray Engine Code Walkthrough #18 Entities</a></li>
<li><a href="http://bitsquid.blogspot.ch/2014/08/building-data-oriented-entity-system.html">Autodesk Stingray Blog: Building a Data-Oriented Entity System (part 1)</a></li>
<li><a href="http://bitsquid.blogspot.ch/2014/09/building-data-oriented-entity-system.html">Autodesk Stingray Blog: Building a Data-Oriented Entity System (Part 2: Components)</a></li>
<li><a href="http://bitsquid.blogspot.ch/2014/10/building-data-oriented-entity-system.html">Autodesk Stingray Blog: Building a Data-Oriented Entity System (Part 3: The Transform Component)</a></li>
<li><a href="http://bitsquid.blogspot.ch/2014/10/building-data-oriented-entity-system_10.html">Autodesk Stingray Blog: Building a Data-Oriented Entity System (Part 4: Entity Resources)</a></li>
</ul>
The Entity system is a very central part of the future of Stingray, and as we integrate it with more parts new requirements pop up. One of those is the ability to interact with Entity Components via the visual scripting language in Stingray - Flow. We want to provide a generic interface to Entities in Flow without adding weight to the fundamental Entity system.<br />
<br />
To accomplish this we added a “Property” system that Flow and other parts of the Stingray Engine can use, which is optional for each Component to implement in addition to its own specialized API. The Property System enables an API to read and write entity component properties using the name of the component, the property name and the property value. The Property System needs to be able to find a specific Component Instance by name for an Entity, but the Entity System does not directly track an Entity / Component Instance relationship. It does not even track the Entity / Component Manager relationship.<br />
<br />
So what we did was to add the Entity Index, a registry where we add all Component Instances created for an Entity as it is constructed from an Entity Resource. To make it usable we also added the rule that each Component in an Entity Resource should have a unique name within the resource so the user can identify it by name when using the Flow system.<br />
<br />
In order for the Flow system to work we need to be able to find a specific component instance by name for an Entity so we can get and set properties of that instance. This is the job of the Entity Index. In the Entity Index you register an Entity's components by name so you can look them up later.<br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="Property_System_and_Entity_Index_21"></a>Property System and Entity Index</h2>
When creating an Entity we use the name of the component instance together with the component type name, i.e. the Component Manager, and create an <i>Entity Index</i> that maps the name to the component instance and the <i>Component Manager</i>. In the Stingray Entity system an Entity cannot have two component instances with the same name.<br />
<br />
<h3>
<a href="https://www.blogger.com/null" id="Example_27"></a>Example:</h3>
<h4>
<a href="https://www.blogger.com/null" id="Entity_29"></a> </h4>
<h4>
Entity</h4>
<ul>
<li>Transform - Transform Component</li>
<li>Fog - Render Data Component</li>
<li>Vignette - Render Data Component</li>
</ul>
<br />
For this Entity we would instantiate one Transform Component Instance and two Render Data Component Instances. We get back an InstanceId for each Component Instance which can be used to identify which of Fog or Vignette we are talking about even though they are created from the same Entity using the same Component Manager.<br />
<br />
We also register this in the Entity Index as:<br />
<br />
<table border="1">
<thead>
<tr>
<th>Key</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Entity</td>
<td>Array<Components></td>
</tr>
</tbody>
</table>
<br />
The Array<Components> contains one or more entries which each contain the following:<br />
<br />
<table border="1">
<thead>
<tr>
<th>Components</th>
</tr>
</thead>
<tbody>
<tr>
<td>Component Manager</td>
</tr>
<tr>
<td>InstanceId</td>
</tr>
<tr>
<td>Name</td>
</tr>
</tbody>
</table>
<br />
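In C++-ish pseudocode (a sketch of the structure only; the container and id types here are hypothetical, not the engine's actual ones), this legacy index is essentially a per-entity array of entries:<br />
<br />
<pre><code class="language-C++">// Sketch of the Entity Index described above: every entity gets its own array of entries,
// and there is no sharing between entities.
struct ComponentEntry {
    ComponentManager *manager;  // which manager owns the instance
    InstanceId        id;       // instance id assigned by that manager
    IdString32        name;     // hashed component name, e.g. hash("Fog")
};

struct EntityIndex {
    HashMap<Entity, Array<ComponentEntry>> entries;

    void register_component(Entity e, IdString32 name, ComponentManager *manager, InstanceId id) {
        entries[e].push_back({manager, id, name});
    }

    // Find the manager and instance id for a named component of an entity.
    const ComponentEntry *lookup(Entity e, IdString32 name) {
        Array<ComponentEntry> &list = entries[e];
        for (unsigned i = 0; i < list.size(); ++i)
            if (list[i].name == name)
                return &list[i];
        return nullptr;
    }
};
</code></pre>
<br />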
Let's add a few entities and components to the Entity Index:<br />
<br />
<h4>
<a href="https://www.blogger.com/null" id="entity_1id_53"></a>entity_1.id</h4>
<table border="1">
<thead>
<tr>
<th>Name</th>
<th>Component Manager</th>
<th>InstanceId</th>
</tr>
</thead>
<tbody>
<tr>
<td>hash(“Transform”)</td>
<td>&transform_manager</td>
<td>13</td>
</tr>
<tr>
<td>hash(“Fog”)</td>
<td>&render_data_manager_1</td>
<td>4</td>
</tr>
<tr>
<td>hash(“Vignette”)</td>
<td>&render_data_manager_1</td>
<td>5</td>
</tr>
</tbody>
</table>
<h4>
<a href="https://www.blogger.com/null" id="entity_2id_60"></a> </h4>
<h4>
entity_2.id</h4>
<table border="1">
<thead>
<tr>
<th>Name</th>
<th>Component Manager</th>
<th>InstanceId</th>
</tr>
</thead>
<tbody>
<tr>
<td>hash(“Transform”)</td>
<td>&transform_manager</td>
<td>14</td>
</tr>
<tr>
<td>hash(“Fog”)</td>
<td>&render_data_manager_1</td>
<td>6</td>
</tr>
<tr>
<td>hash(“Vignette”)</td>
<td>&render_data_manager_1</td>
<td>7</td>
</tr>
</tbody>
</table>
<h4>
<a href="https://www.blogger.com/null" id="entity_3id_67"></a> </h4>
<h4>
entity_3.id</h4>
<table border="1">
<thead>
<tr>
<th>Name</th>
<th>Component Manager</th>
<th>InstanceId</th>
</tr>
</thead>
<tbody>
<tr>
<td>hash(“Transform”)</td>
<td>&transform_manager</td>
<td>2</td>
</tr>
<tr>
<td>hash(“Fog”)</td>
<td>&render_data_manager_2</td>
<td>4</td>
</tr>
<tr>
<td>hash(“Vignette”)</td>
<td>&render_data_manager_2</td>
<td>5</td>
</tr>
</tbody>
</table>
<br />
This allows Flow to set and get properties using the Entity and the Component Name. Using the Entity and Component Name we can look up which Component Manager has the component instance and which InstanceId it has assigned to it so we can get the Instance and operate on the data.<br />
<br />
The problem with this implementation is that it becomes very large - we need a registry with one key-array pair for each Entity, where the array contains one entry for each Component Instance of the Entity. That is not very efficient as the number of entities grows. There is no reuse at all in the Entity Index - and there can't be - each entry in the index is unique with no overlap.<br />
<br />
Here are some measurements using a synthetic test that creates entities, adds and looks up components on them, and deletes entities. It deletes parts of the entities as it runs and does garbage collection. The number of entities given in the tables is the total number created during the test, not the number of simultaneous entities, which varies over time. The entities use 75 different component compositions, ranging from a single component up to eleven components. The test is single threaded with no locking besides some in the memory subsystem, which makes the times match up well with CPU usage.<br />
<br />
<table border="1">
<thead>
<tr>
<th style="text-align: right;">Entity Count</th>
<th style="text-align: right;">Test run time (s)</th>
<th style="text-align: right;">Memory used (Mb)</th>
<th style="text-align: right;">Time/Entity (us)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right;">10k</td>
<td style="text-align: right;">0.01</td>
<td style="text-align: right;">5.79</td>
<td style="text-align: right;">0.977</td>
</tr>
<tr>
<td style="text-align: right;">20k</td>
<td style="text-align: right;">0.01</td>
<td style="text-align: right;">5.79</td>
<td style="text-align: right;">0.488</td>
</tr>
<tr>
<td style="text-align: right;">40k</td>
<td style="text-align: right;">0.03</td>
<td style="text-align: right;">11.88</td>
<td style="text-align: right;">0.732</td>
</tr>
<tr>
<td style="text-align: right;">80k</td>
<td style="text-align: right;">0.06</td>
<td style="text-align: right;">11.88</td>
<td style="text-align: right;">0.732</td>
</tr>
<tr>
<td style="text-align: right;">160k</td>
<td style="text-align: right;">0.13</td>
<td style="text-align: right;">25.69</td>
<td style="text-align: right;">0.793</td>
</tr>
<tr>
<td style="text-align: right;">320k</td>
<td style="text-align: right;">0.32</td>
<td style="text-align: right;">31.04</td>
<td style="text-align: right;">0.977</td>
</tr>
<tr>
<td style="text-align: right;">640k</td>
<td style="text-align: right;">1.08</td>
<td style="text-align: right;">55.90</td>
<td style="text-align: right;">1.648</td>
</tr>
<tr>
<td style="text-align: right;">1.28m</td>
<td style="text-align: right;">2.58</td>
<td style="text-align: right;">65.82</td>
<td style="text-align: right;">1.922</td>
</tr>
<tr>
<td style="text-align: right;">2.56m</td>
<td style="text-align: right;">6.35</td>
<td style="text-align: right;">65.55</td>
<td style="text-align: right;">2.366</td>
</tr>
<tr>
<td style="text-align: right;">5.12m</td>
<td style="text-align: right;">13.42</td>
<td style="text-align: right;">120.55</td>
<td style="text-align: right;">2.500</td>
</tr>
<tr>
<td style="text-align: right;">10.24m</td>
<td style="text-align: right;">25.69</td>
<td style="text-align: right;">130.55</td>
<td style="text-align: right;">2.393</td>
</tr>
</tbody>
</table>
<br />
As you can see, we take longer and longer and use more and more memory as we double the number of entities, and at the larger counts the time and memory increase pretty dramatically.<br />
<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnJmM6XpUFp_Fcx6yqMywOE_1DNN8WO4StJ-HRJ5BWk1z85gkxNRM883xlB6Y1miDaRhD1B_T-e74Rh2oL6eOD_091bX-VSIQX-qjhd6gHY7KBi_9D5b5LOHCvM2yfl5oB75TAh6Uq007l/s1600/legacy-time-graph.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnJmM6XpUFp_Fcx6yqMywOE_1DNN8WO4StJ-HRJ5BWk1z85gkxNRM883xlB6Y1miDaRhD1B_T-e74Rh2oL6eOD_091bX-VSIQX-qjhd6gHY7KBi_9D5b5LOHCvM2yfl5oB75TAh6Uq007l/s1600/legacy-time-graph.png" /></a><br />
<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6xVjh9LRLz4n4wJ3fh2YJCbR45ViiaJBMel96CXZl75BqBhv-4wyR39qdD0QH0Y5WCPp41Ck9MsF1lv_6DlUcy3Mi_LvflIyeL7qbZRg4zVc19btq5JICtTK-BahqK-QPYqIoswtZ3HDE/s1600/legacy-memory-graph.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6xVjh9LRLz4n4wJ3fh2YJCbR45ViiaJBMel96CXZl75BqBhv-4wyR39qdD0QH0Y5WCPp41Ck9MsF1lv_6DlUcy3Mi_LvflIyeL7qbZRg4zVc19btq5JICtTK-BahqK-QPYqIoswtZ3HDE/s1600/legacy-memory-graph.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Since we plan to use the entity system extensively we need an index that is more efficient with memory and scales more linearly in CPU usage.<br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="Shifting_control_of_the_InstanceId_101"></a>Shifting control of the InstanceId</h2>
The InstanceId is defined to be unique to the Entity instance for a specific Component Manager - it does not have to be unique for all components in a Component Manager, nor does it have to be unique across different Component Managers.<br />
<br />
The create and lookup functions for an Component Instance looks like this:<br />
<br />
<pre><span style="font-family: "courier new" , "courier" , monospace;"><code class="language-C++">InstanceWithId instance_with_id = transform_manager.create(entity);
InstanceId my_transform_id = instance_with_id.id;
.....
Instance instance = transform_manager.lookup(entity, my_transform_id);</code></span></pre>
<br />
The interface is somewhat confusing since the create function returns both the component instance id and the instance. This is done so you don't have to do a lookup of the instance directly after create. As you can see we have no knowledge of what the resulting InstanceId will be, so we can't make any assumptions in the Entity Index, forcing us to have unique entries for each Component Instance of every Entity.<br />
<br />
But we already set up the rule that in the Entity Resource, each Component should have a unique name for the Property System to work - this is a new requirement that was added at a later stage than when designing the initial Entity system. Now that it is there we can make use of this to simplify the Entity Index.<br />
<br />
Instead of letting each Component Manager decide the InstanceId we let the caller to the create function decide the InstanceId. We can decide that the InstanceId should be the 32-bit hash of the Component Name from the Entity Resource. Doing this will restrict the possible optimization that a component manager could do if it had control of the InstanceId, but so far we have had no real use case for it and the benefits of changing this are greater than the loss of a possible optimization that we <i>might</i> do sometime in the future.<br />
<br />
So we change the API like this:<br />
<br />
<pre><code class="language-C++">Instance instance = transform_manager.create(entity, hash(<span class="hljs-string">"Transform"</span>));
.....
Instance instance = transform_manager.lookup(entity, hash(<span class="hljs-string">"Transform"</span>)); </code></pre>
<br />
Nice, clean and symmetrical. Note though that the InstanceId is entirely up to the caller to control; it does not have to be a hash of a string. It must be unique for an Entity within a specific component manager. For it to work with the Entity Index and the Property System, the InstanceId needs to be unique across all Component Instances in all Component Managers for each Entity instance. This is enforced when an Entity is created from a resource but not when constructing Component Instances by hand in code. If you want a component added outside the resource construction to work with the Property System, care needs to be taken so its name does not collide with the names of other component instances for the Entity.<br />
<br />
Let's add the entities and components again using the new rule set; the Entity Index now looks like this:<br />
<br />
<h4>
<a href="https://www.blogger.com/null" id="entity_1id_137"></a>entity_1.id</h4>
<h4>
</h4>
<table border="1">
<thead>
<tr>
<th>Name</th>
<th>Component Manager</th>
<th>InstanceId</th>
</tr>
</thead>
<tbody>
<tr>
<td>hash(“Transform”)</td>
<td>&transform_manager</td>
<td>hash(“Transform”)</td>
</tr>
<tr>
<td>hash(“Fog”)</td>
<td>&render_data_manager_1</td>
<td>hash(“Fog”)</td>
</tr>
<tr>
<td>hash(“Vignette”)</td>
<td>&render_data_manager_1</td>
<td>hash(“Vignette”)</td>
</tr>
</tbody>
</table>
<h4>
<a href="https://www.blogger.com/null" id="entity_2id_144"></a> </h4>
<h4>
entity_2.id </h4>
<table border="1">
<thead>
<tr>
<th>Name</th>
<th>Component Manager</th>
<th>InstanceId</th>
</tr>
</thead>
<tbody>
<tr>
<td>hash(“Transform”)</td>
<td>&transform_manager</td>
<td>hash(“Transform”)</td>
</tr>
<tr>
<td>hash(“Fog”)</td>
<td>&render_data_manager_1</td>
<td>hash(“Fog”)</td>
</tr>
<tr>
<td>hash(“Vignette”)</td>
<td>&render_data_manager_1</td>
<td>hash(“Vignette”)</td>
</tr>
</tbody>
</table>
<h4>
<a href="https://www.blogger.com/null" id="entity_3id_151"></a> </h4>
<h4>
entity_3.id </h4>
<table border="1">
<thead>
<tr>
<th>Name</th>
<th>Component Manager</th>
<th>InstanceId</th>
</tr>
</thead>
<tbody>
<tr>
<td>hash(“Transform”)</td>
<td>&transform_manager</td>
<td>hash(“Transform”)</td>
</tr>
<tr>
<td>hash(“Fog”)</td>
<td>&render_data_manager_2</td>
<td>hash(“Fog”)</td>
</tr>
<tr>
<td>hash(“Vignette”)</td>
<td>&render_data_manager_2</td>
<td>hash(“Vignette”)</td>
</tr>
</tbody>
</table>
<br />
As we can now see, the InstanceId column contains redundant data - we only need to store the Component Manager pointer. We use the Entity and the hash of the component name to find the Component Manager, which can then be used to look up the Instance.<br />
<br />
<h4>
<a href="https://www.blogger.com/null" id="entity_1id_160"></a>entity_1.id</h4>
<h4>
</h4>
<table border="1">
<thead>
<tr>
<th>Name</th>
<th>Component Manager</th>
</tr>
</thead>
<tbody>
<tr>
<td>hash(“Transform”)</td>
<td>&transform_manager</td>
</tr>
<tr>
<td>hash(“Fog”)</td>
<td>&render_data_manager_1</td>
</tr>
<tr>
<td>hash(“Vignette”)</td>
<td>&render_data_manager_1</td>
</tr>
</tbody>
</table>
<h4>
<a href="https://www.blogger.com/null" id="entity_2id_167"></a> </h4>
<h4>
entity_2.id </h4>
<table border="1">
<thead>
<tr>
<th>Name</th>
<th>Component Manager</th>
</tr>
</thead>
<tbody>
<tr>
<td>hash(“Transform”)</td>
<td>&transform_manager</td>
</tr>
<tr>
<td>hash(“Fog”)</td>
<td>&render_data_manager_1</td>
</tr>
<tr>
<td>hash(“Vignette”)</td>
<td>&render_data_manager_1</td>
</tr>
</tbody>
</table>
<h4>
<a href="https://www.blogger.com/null" id="entity_3id_174"></a> </h4>
<h4>
entity_3.id </h4>
<table border="1">
<thead>
<tr>
<th>Name</th>
<th>Component Manager</th>
</tr>
</thead>
<tbody>
<tr>
<td>hash(“Transform”)</td>
<td>&transform_manager</td>
</tr>
<tr>
<td>hash(“Fog”)</td>
<td>&render_data_manager_2</td>
</tr>
<tr>
<td>hash(“Vignette”)</td>
<td>&render_data_manager_2</td>
</tr>
</tbody>
</table>
<br />
<br />
We now also see that the lookup arrays for entity_1 and entity_2 are identical, so two keys could point to the same value.<br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="Options_for_implementation_183"></a>Options for implementation</h2>
We could opt for an index that has a map from entity_id to a list or map of entries for lookup:<br />
<br />
<pre><code>entity_1.id = [ hash("Transform"), &transform_manager ], [ hash("Fog"), &render_data_manager_1 ], [ hash("Vignette"), &render_data_manager_1 ]
entity_2.id = [ hash("Transform"), &transform_manager ], [ hash("Fog"), &render_data_manager_1 ], [ hash("Vignette"), &render_data_manager_1 ]
entity_3.id = [ hash("Transform"), &transform_manager ], [ hash("Fog"), &render_data_manager_2 ], [ hash("Vignette"), &render_data_manager_2 ]
</code></pre>
<br />
We should probably not store the same entry lookup list multiple times if it can be reused by multiple entity instances, as this wastes space. But at any time a new component instance can be added to or removed from an entity, and its entry list would then change - that would mean administrating memory for the lookup lists and detecting when two entities start to diverge so we can make a new extended copy of the entry list for the changed entity. We should probably also remove lookup lists that are no longer used, as keeping them around would waste memory.<br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="Entity_and_component_creation_193"></a>Entity and component creation</h2>
The call sequence for creating entities from resources (or even programmatically) looks something like this:<br />
<br />
<pre><code>Entity e = create();
Instance transform = transform_manager.create(e, hash("Transform"));
Instance fog = render_data_manager_1.create(e, hash("Fog"));
Instance vignette = render_data_manager_1.create(e, hash("Vignette"));
</code></pre>
<br />
In this scenario we could potentially build an entity lookup list for the entity which contains lookups for the transform, fog and vignette instances:<br />
<br />
<pre><code>entity_index.register(e, [ hash("Transform"), &transform_manager ], [ hash("Fog"), &render_data_manager_1 ], [ hash("Vignette"), &render_data_manager_1 ]);
</code></pre>
<br />
But as stated previously - component instances can be added and removed at any point in time making the lookup table change during the lifetime of the Entity. We need to be able to extend it at will, so it should look something like this:<br />
<br />
<pre><code>Entity e = create();
Instance transform = transform_manager.create(e, hash("Transform"));
entity_index.register(e, [ hash("Transform"), &transform_manager ]);
Instance fog = render_data_manager_1.create(e, hash("Fog"));
entity_index.register(e, [ hash("Fog"), &render_data_manager_1 ]);
Instance vignette = render_data_manager_1.create(e, hash("Vignette"));
entity_index.register(e, [ hash("Vignette"), &render_data_manager_1 ]);
</code></pre>
<br />
Now we just extend the lookup list of the entity as we add new components. This means that two entities that started out life as having identical lookup lists after being spawned from a resource might diverge over time so the Entity Index needs to handle that.<br />
<br />
Component Instances can also be destroyed, so we should handle that as well. Even if we do not remove component instances things will still work - if we keep a lookup to an Instance that has been removed we would just fail the lookup in the corresponding Component Manager. It would lead to wasted memory though, something we need to be aware of going forward.<br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="Building_a_Prototype_chain_223"></a>Building a Prototype chain</h2>
Looking at how we build up the Component instances for an Entity it goes something like this: first add the Transform, then add Fog and finally Vignette. This looks sort of like an inheritance chain…<br />
Let's call a lookup list that contains a specific set of entry values a <i>Prototype</i>.<br />
<br />
An entity starts with an empty lookup list that contains nothing, []; this is the base Prototype, let's call it P0.<br />
<ul>
<li>Add the “Transform” component and your prototype is now P0 + [&transform_manager, “Transform”], let's call that prototype P1.</li>
<li>Add the “Fog” component, now the prototype is P1 + [&render_data_manager_1, “Fog”] - call it P2.</li>
<li>Add the “Vignette” component, now the prototype is P2 + [&render_data_manager_1, “Vignette”] - call it P3.</li>
</ul>
Your entity is now using the prototype P3, and from that you can find all the lookup entries you need.<br />
The prototype registry will contain:<br />
<br />
<pre><code>P0 = []
P1 = [] + [&transform_manager, "Transform"]
P2 = [] + [&transform_manager, "Transform"] + [&render_data_manager_1, "Fog"]
P3 = [] + [&transform_manager, "Transform"] + [&render_data_manager_1, "Fog"] + [&render_data_manager_1, "Vignette"]
</code></pre>
<br />
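A small sketch of what that registration could look like (hypothetical types, reusing the ComponentEntry structure from the sketch above; not the engine's actual code):<br />
<br />
<pre><code class="language-C++">// Sketch: a prototype is a specific list of component entries. Registering a component moves
// the entity to the prototype that equals its current prototype plus the new entry, creating
// that prototype only if it does not already exist.
struct PrototypeRegistry {
    Array<Array<ComponentEntry>> prototypes;       // prototypes[0] == P0 == []
    HashMap<Entity, int>         entity_prototype; // which prototype each entity uses (new entities start at 0)

    void register_component(Entity e, const ComponentEntry &entry) {
        Array<ComponentEntry> extended = prototypes[entity_prototype[e]];
        extended.push_back(entry);
        for (unsigned i = 0; i < prototypes.size(); ++i)
            if (prototypes[i] == extended) {        // assumed element-wise comparison
                entity_prototype[e] = (int)i;
                return;
            }
        prototypes.push_back(extended);
        entity_prototype[e] = (int)prototypes.size() - 1;
    }
};
</code></pre>
<br />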
If you create another entity which uses the same Components with the same names you will end up with the same prototype:<br />
<br />
Create entity_2, it will have the empty prototype P0.<br />
<ul>
<li>Add the “Transform” component and your prototype is now P1.</li>
<li>Add the “Fog” component, now the prototype is P2.</li>
<li>Add the “Vignette” component, now the prototype is P3.</li>
</ul>
<br />
We end up with the same prototype P3 as the other entity - as long as we add the components in the same order we end up with the same prototype. For entities created from resources this will be true for all entities created from the same entity resource. For components that are added programmatically it will only work if the code adds components in the same order, but even if they do not <i>always</i> do this we will still have a very large overlap for most of the entities.<br />
<br />
Let's look at the third example where we do not have an exact match, entity_3:<br />
<br />
Create entity_3, it will have the empty prototype P0.<br />
<ul>
<li>Add the “Transform” component and your prototype is now P0 + [&transform_manager, “Transform”] = P1.</li>
<li>Add the “Fog” component - this render data component manager is not the same one used by entity_1 and entity_2, so we get P1 + [&render_data_manager_2, “Fog”]; this does not match P2 so we make a new prototype P4 instead.</li>
<li>Add the “Vignette” component, now the prototype is P4 + [&render_data_manager_2, “Vignette”] -> P5.</li>
</ul>
The prototype registry will contain:<br />
<br />
<pre><code>P0 = []
P1 = [] + [&transform_manager, "Transform"]
P2 = [] + [&transform_manager, "Transform"] + [&render_data_manager_1, "Fog"]
P3 = [] + [&transform_manager, "Transform"] + [&render_data_manager_1, "Fog"] + [&render_data_manager_1, "Vignette"]
P4 = [] + [&transform_manager, "Transform"] + [&render_data_manager_2, "Fog"]
P5 = [] + [&transform_manager, "Transform"] + [&render_data_manager_2, "Fog"] + [&render_data_manager_2, "Vignette"]
</code></pre>
<h2>
<a href="https://www.blogger.com/null" id="Storage_of_the_prototype_271"></a>Storage of the prototype</h2>
One option is to store all the component lookup entries for each prototype - this makes it easy to get all the component instance look-ups in one go, at the expense of memory due to data duplication. Each entity stores which prototype it uses.<br />
<ul>
<li>entity_1 -> P3</li>
<li>entity_2 -> P3</li>
<li>entity_3 -> P5</li>
</ul>
The prototype registry now contains:<br />
<br />
<pre><code>P0 = []
P1 = [] + [&transform_manager, "Transform"]
P2 = [] + [&transform_manager, "Transform"] + [&render_data_manager_1, "Fog"]
P3 = [] + [&transform_manager, "Transform"] + [&render_data_manager_1, "Fog"] + [&render_data_manager_1, "Vignette"]
P4 = [] + [&transform_manager, "Transform"] + [&render_data_manager_2, "Fog"]
P5 = [] + [&transform_manager, "Transform"] + [&render_data_manager_2, "Fog"] + [&render_data_manager_2, "Vignette"]
</code></pre>
<br />
Some of the entries (P2 and P4) could technically be removed since they are not actively used - we would then need to re-create them if new entries with the same structure are added later.<br />
A different option is to actually <i>use</i> the intermediate entries by referencing them, like so:<br />
<br />
<pre><code>P0 = []
P1 = P0 + [&transform_manager, "Transform"]
P2 = P1 + [&render_data_manager_1, "Fog"]
P3 = P2 + [&render_data_manager_1, "Vignette"]
P4 = P1 + [&render_data_manager_2, "Fog"]
P5 = P4 + [&render_data_manager_2, "Vignette"]
</code></pre>
<br />
This is less wasteful but requires walking up the chain to find all the components for an entity. On the other hand we can make this very efficient storage-wise by having a lookup table like this:<br />
Map from Prototype to {base_prototype, component_manager, component_name}. The prototype data is small and has no dynamic size so it can be stored very efficiently.<br />
<br />
All prototypes are added to the same prototype map, and since the HashMap gives us O(1) lookup cost, traversing the chain will only cost us the potential cache misses of each lookup. Since the hashmap is likely to be pretty compact (via prototype reuse) this hopefully should not be a huge issue. If it turns out to be, a different storage approach might be needed, trading memory use for lookup speed.<br />
<br />
Since the amount of data we store for each Prototype would be very small - roughly 16 bytes - we can be a bit more relaxed with unused prototypes - we do not need to remove them as aggressively as we would if each prototype contained a complete lookup table for all components.<br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="Building_the_Prototype_index_307"></a>Building the Prototype index</h2>
So how do we “name” the prototypes effectively for fast lookup? Well, the first lookup would be Entity -> Prototype and then from Prototype -> Prototype definition.<br />
A simple approach would be hashing - use the content of the Prototype as the hash data to get a unique identifier.<br />
<br />
The first base prototype has an empty definition so we let that be zero.<br />
To calculate a prototype, mix the prototype you are basing it off of with the hash of the prototype data - in our case we hash the Component Manager pointer and the Component Name, and mix that with the base prototype.<br />
<br />
<pre><code class="language-C++">Prototype prototype = mix(base_prototype, mix(hash(&component_manager), hash(component_name)))
</code></pre>
<br />
The entry is stored with the prototype as key and the value as [base_prototype, &component_manager, component_name].<br />
<br />
When a new Component is added to an entity we add/find the new prototype and update the Entity -> Prototype map to point to it.<br />
<br />
So, we end up with a structure like this:<br />
<pre><code class="language-C++"><span class="hljs-keyword">struct</span> PrototypeDescription {
Prototype base_prototype;
ComponentMananger *component_manager;
IdString32 component_name;
}
Map<Entity, Prototype> entity_prototype_lookup;
Map<Prototype, PrototypeDescription> prototypes;
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">register_component</span><span class="hljs-params">(Entity, ComponentManager, component_name)</span>
</span>{
Prototype p = entity_prototype_lookup[Entity];
Prototype new_p = mix(p, mix(hash(ComponentManager), hash(component_name)));
<span class="hljs-keyword">if</span> (!prototypes.has(new_p))
prototypes.insert(new_p, {p, &ComponentManager, component_name});
enity_index[Entity] = new_p;
}
<span class="hljs-function">ComponentMananger *<span class="hljs-title">find_component_manager</span><span class="hljs-params">(Entity, component_name)</span>
</span>{
Prototype p = entity_index[Entity];
<span class="hljs-keyword">while</span> (p != <span class="hljs-number">0</span>)
{
PrototypeDescription description = prototypes[p];
<span class="hljs-keyword">if</span> (description.component_name == component_name)
<span class="hljs-keyword">return</span> description.component_manager;
p = description.base_prototype;
}
<span class="hljs-keyword">return</span> <span class="hljs-literal">nullptr</span>;
}
</code></pre>
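<br />
Hypothetical usage of the two functions above, following the entity_1 example from earlier - the comments show which prototype the entity ends up pointing at:<br />
<br />
<pre><code class="language-C++">Entity e = create();
transform_manager.create(e, hash("Transform"));
register_component(e, &transform_manager, hash("Transform"));     // e -> P1
render_data_manager_1.create(e, hash("Fog"));
register_component(e, &render_data_manager_1, hash("Fog"));       // e -> P2

ComponentManager *fog_manager = find_component_manager(e, hash("Fog"));           // &render_data_manager_1
ComponentManager *vignette_manager = find_component_manager(e, hash("Vignette")); // nullptr, never registered
</code></pre>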
<br />
This could lead to a lot of hashing and look-ups, but we can change the API to register new components for multiple Entities in one go, which leads to dramatically fewer hash computations and look-ups - we already do that kind of optimization when creating entities from resources so it would be a natural fit (see the sketch below). Also, we can easily cache the base prototype index to avoid more of the hash look-ups in <i>find_component_manager</i>.<br />
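<br />
Here is a rough sketch of what that batched registration could look like - the function name, parameter types and Map interface are assumptions layered on top of the pseudocode above, not actual engine code. The component manager and name are hashed once, and the resulting prototype is cached per unique base prototype:<br />
<br />
<pre><code class="language-C++">void register_components(const Entity *entities, unsigned count,
                         ComponentManager *component_manager, IdString32 component_name)
{
    uint64_t component_hash = mix(hash(component_manager), hash(component_name));
    Map<Prototype, Prototype> batch_cache;    // base prototype -> new prototype for this batch
    for (unsigned i = 0; i != count; ++i) {
        Prototype base = entity_prototype_lookup[entities[i]];
        if (!batch_cache.has(base)) {
            Prototype new_p = mix(base, component_hash);
            if (!prototypes.has(new_p))
                prototypes.insert(new_p, {base, component_manager, component_name});
            batch_cache.insert(base, new_p);
        }
        entity_prototype_lookup[entities[i]] = batch_cache[base];
    }
}
</code></pre>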
<br />
<h2>
<a href="https://www.blogger.com/null" id="Measuring_the_results_362"></a>Measuring the results</h2>
Let's run the synthetic test again and see how our new entity index matches up against the old one.<br />
<br />
<table border="1">
<thead>
<tr>
<th style="text-align: right;">Entity Count</th>
<th style="text-align: right;">Test run time (s)</th>
<th style="text-align: right;">Memory used (Mb)</th>
<th style="text-align: right;">Time/Entity (us)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right;">10k</td>
<td style="text-align: right;">0.01</td>
<td style="text-align: right;">0.26</td>
<td style="text-align: right;">0.977</td>
</tr>
<tr>
<td style="text-align: right;">20k</td>
<td style="text-align: right;">0.01</td>
<td style="text-align: right;">0.51</td>
<td style="text-align: right;">0.488</td>
</tr>
<tr>
<td style="text-align: right;">40k</td>
<td style="text-align: right;">0.03</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">0.832</td>
</tr>
<tr>
<td style="text-align: right;">80k</td>
<td style="text-align: right;">0.06</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">0.610</td>
</tr>
<tr>
<td style="text-align: right;">160k</td>
<td style="text-align: right;">0.11</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">0.671</td>
</tr>
<tr>
<td style="text-align: right;">320k</td>
<td style="text-align: right;">0.23</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">0.702</td>
</tr>
<tr>
<td style="text-align: right;">640k</td>
<td style="text-align: right;">0.46</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">0.702</td>
</tr>
<tr>
<td style="text-align: right;">1.28m</td>
<td style="text-align: right;">0.94</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">0.700</td>
</tr>
<tr>
<td style="text-align: right;">2.56m</td>
<td style="text-align: right;">1.88</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">0.700</td>
</tr>
<tr>
<td style="text-align: right;">5.12m</td>
<td style="text-align: right;">3.78</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">0.704</td>
</tr>
<tr>
<td style="text-align: right;">10.24m</td>
<td style="text-align: right;">7.57</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">0.705</td>
</tr>
</tbody>
</table>
<br />
The run time now scales very close to linearly and is overall faster than the old implementation. Most notable is the win when using a lot of entities. Memory usage has gone down as well and the time/entity is also scaling more gracefully.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgc4Si61491ZHUiDV0qPejXUH7pxzPJz7Lc-mgJ9NF1dPojgPUSgCY2JKuGnemsmxZj2fEciCmoJRFHrRwdq4_nK17FVNqcVdlMUKeNuRTk34yT1b7QDdQZGOlWw46uAWAsTC3X1bJx-Qnb/s1600/new-time-graph.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgc4Si61491ZHUiDV0qPejXUH7pxzPJz7Lc-mgJ9NF1dPojgPUSgCY2JKuGnemsmxZj2fEciCmoJRFHrRwdq4_nK17FVNqcVdlMUKeNuRTk34yT1b7QDdQZGOlWw46uAWAsTC3X1bJx-Qnb/s1600/new-time-graph.png" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhj-aOJaCph4q73_eyyt0FmKVC4USXtGsv69pzeN-J525AOtr2M5r-vW70eBokzMEoKWHZIFIwBuODN3XVKHvMf4_CIWobwEuVcdOhBtvgYBwFBykk6nW8fq1HEAWsqyzpfglyA7B5s3WE7/s1600/new-memory-graph.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhj-aOJaCph4q73_eyyt0FmKVC4USXtGsv69pzeN-J525AOtr2M5r-vW70eBokzMEoKWHZIFIwBuODN3XVKHvMf4_CIWobwEuVcdOhBtvgYBwFBykk6nW8fq1HEAWsqyzpfglyA7B5s3WE7/s1600/new-memory-graph.png" /></a></div>
<br />
Memory usage looks a little strange but there is an easy explanation - the mapping from entity to prototype uses almost all of that memory (via a hashmap) and the actual prototypes take less than 30 KB. Note that the old index uses the same amount of memory for the Entity to Prototype mapping.<br />
<br />
Let's compare the old and new implementations side by side:<br />
<br />
<br />
<table border="1">
<thead>
<tr>
<th style="text-align: right;">Entity Count</th>
<th style="text-align: right;">Time New (s)</th>
<th style="text-align: right;">Time Legacy (s)</th>
<th style="text-align: right;">Memory New (Mb)</th>
<th style="text-align: right;">Memory Legacy (Mb)</th>
<th style="text-align: right;">Time/Entity New (us)</th>
<th style="text-align: right;">Time/Entity Legacy (us)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right;">10k</td>
<td style="text-align: right;">0.01</td>
<td style="text-align: right;">0.01</td>
<td style="text-align: right;">0.26</td>
<td style="text-align: right;">5.79</td>
<td style="text-align: right;">0.977</td>
<td style="text-align: right;">0.977</td>
</tr>
<tr>
<td style="text-align: right;">20k</td>
<td style="text-align: right;">0.01</td>
<td style="text-align: right;">0.01</td>
<td style="text-align: right;">0.51</td>
<td style="text-align: right;">5.79</td>
<td style="text-align: right;">0.488</td>
<td style="text-align: right;">0.488</td>
</tr>
<tr>
<td style="text-align: right;">40k</td>
<td style="text-align: right;">0.03</td>
<td style="text-align: right;">0.03</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">11.88</td>
<td style="text-align: right;">0.832</td>
<td style="text-align: right;">0.732</td>
</tr>
<tr>
<td style="text-align: right;">80k</td>
<td style="text-align: right;">0.05</td>
<td style="text-align: right;">0.06</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">11.88</td>
<td style="text-align: right;">0.610</td>
<td style="text-align: right;">0.732</td>
</tr>
<tr>
<td style="text-align: right;">160k</td>
<td style="text-align: right;">0.11</td>
<td style="text-align: right;">0.13</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">25.69</td>
<td style="text-align: right;">0.671</td>
<td style="text-align: right;">0.793</td>
</tr>
<tr>
<td style="text-align: right;">320k</td>
<td style="text-align: right;">0.23</td>
<td style="text-align: right;">0.32</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">31.04</td>
<td style="text-align: right;">0.702</td>
<td style="text-align: right;">0.977</td>
</tr>
<tr>
<td style="text-align: right;">640k</td>
<td style="text-align: right;">0.46</td>
<td style="text-align: right;">1.08</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">55.90</td>
<td style="text-align: right;">0.702</td>
<td style="text-align: right;">1.648</td>
</tr>
<tr>
<td style="text-align: right;">1.28m</td>
<td style="text-align: right;">0.94</td>
<td style="text-align: right;">2.58</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">65.82</td>
<td style="text-align: right;">0.700</td>
<td style="text-align: right;">1.922</td>
</tr>
<tr>
<td style="text-align: right;">2.56m</td>
<td style="text-align: right;">1.88</td>
<td style="text-align: right;">6.53</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">65.55</td>
<td style="text-align: right;">0.700</td>
<td style="text-align: right;">2.366</td>
</tr>
<tr>
<td style="text-align: right;">5.12m</td>
<td style="text-align: right;">3.78</td>
<td style="text-align: right;">13.42</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">120.55</td>
<td style="text-align: right;">0.704</td>
<td style="text-align: right;">2.500</td>
</tr>
<tr>
<td style="text-align: right;">10.24m</td>
<td style="text-align: right;">7.57</td>
<td style="text-align: right;">25.69</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">130.55</td>
<td style="text-align: right;">0.705</td>
<td style="text-align: right;">2.393</td>
</tr>
</tbody>
</table>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWohZK9z_oB7jdBLWh0Lh12e-2Yv0cfDR_5-d6mxgGWjxRqRCghxpU2xH3l6d0MU8OGRNTjc9z7kX9W6YQzxMLAoN0CzWxagPcD1FbGow2yK1buVgaljtyKwaRMHlLzLBlQNaCQCWKKSvO/s1600/compare-time-graph.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWohZK9z_oB7jdBLWh0Lh12e-2Yv0cfDR_5-d6mxgGWjxRqRCghxpU2xH3l6d0MU8OGRNTjc9z7kX9W6YQzxMLAoN0CzWxagPcD1FbGow2yK1buVgaljtyKwaRMHlLzLBlQNaCQCWKKSvO/s1600/compare-time-graph.png" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip8DGp_U831G1-2vqqq4HOwUqLw1FO9pN7G-fNlrWyQolYakM6KCBIpSrFNr_GLnLBizSigP0Poncm2WTqObV5F_0d_GuyXrbhL4QyAWFU5hLTEaBr2Of5rUpr69rgjbDjwQ5gd558eC6v/s1600/compare-memory-graph.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip8DGp_U831G1-2vqqq4HOwUqLw1FO9pN7G-fNlrWyQolYakM6KCBIpSrFNr_GLnLBizSigP0Poncm2WTqObV5F_0d_GuyXrbhL4QyAWFU5hLTEaBr2Of5rUpr69rgjbDjwQ5gd558eC6v/s1600/compare-memory-graph.png" /></a></div>
<br />
Looks like a pretty good win.<br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="Final_words_408"></a>Final words</h2>
By taking into account the new requirements as the Entity system evolved we were able to create a much more space efficient and more performant Entity Index.<br />
<br />
The implementation chosen here has focused on reducing the amount of data we use in the Entity Index at the cost of lookup complexity. I think this is the right trade-off, especially since it performs better as well. Since the interface for the Entity Index is fairly simple and does not dictate how we store the data, we could change the implementation to optimize for lookup speed if need be.Dan Engelbrechthttp://www.blogger.com/profile/12177635194073845370noreply@blogger.com450tag:blogger.com,1999:blog-1994130783874175266.post-82968706069779853712017-03-14T10:28:00.000+01:002017-03-14T10:28:05.521+01:00Stingray Renderer Walkthrough #8: stingray-renderer & mini-renderer
<h1><a id="Introduction_0"></a>Introduction</h1>
<p>In the last <a href="http://bitsquid.blogspot.com/2017/03/stingray-renderer-walkthrough-7-data.html">post</a> we looked at our systems for doing data-driven rendering in Stingray. Today I will go through the two default rendering pipes we ship as templates with Stingray. Both are entirely described in data using two <code>render_config</code> files and a bunch of <code>shader_source</code> files.</p>
<p>We call them the <strong>“stingray renderer”</strong> and the <strong>“mini renderer”</strong></p>
<h1><a id="Stingray_Renderer_6"></a>Stingray Renderer</h1>
<p>The “stingray renderer” is the default rendering pipe and is used in almost all template and sample projects. It’s a fairly standard “high-end” real-time rendering pipe and supports the regular buzzword features.</p>
<p>The <code>render_config</code> file is approx 1500 lines of <em>sjson</em>. While 1500 might sound a bit massive, it's important to remember that this rendering pipe is highly configurable - pretty much all features can be dynamically switched on/off. It also runs on a broad variety of different platforms (mobile -> consoles -> high-end PC), supports a bunch of different debug visualization modes, and features four different stereo rendering paths in addition to the default mono path.</p>
<p>If you are interested in taking a closer look at the actual implementation you can download stingray and you’ll find it under <code>core/stingray_renderer/renderer.render_config</code>.</p>
<p>Going through the entire file and all the implementation details would require multiple blog posts, instead I will try to do a high-level break down of the default <a href="http://bitsquid.blogspot.se/2017/03/stingray-renderer-walkthrough-7-data.html"><code>layer_configuration</code></a> and talk a bit about the feature set. Before we begin, please keep in mind that this rendering pipe is designed to handle lots of different content and run on lots of different platforms. A game project would typically use it as a base and then extend, optimize and simplify it based on the project specific knowledge of the content and target platforms.</p>
<p>Here’s a somewhat simplified dump of the contents of the <code>layer_configs/default</code> array found in <code>core/stingray_renderer/renderer.render_config</code> in Stingray v1.8:</p>
<pre><code>// run any render_config_extensions that have requested to insert work at the insertion point named "first"
{ extension_insertion_point = "first" }
// kick resource generator for rendering all shadow maps
{ resource_generator="shadow_mapping" profiling_scope="shadow mapping" }
// kick resource generator for assigning light sources to clustered shading structure
{ resource_generator="clustered_shading" profiling_scope="clustered shading" }
// special layer, only responsible for clearing hdr0, gbuffer2 and the depth_stencil_buffer
{ render_targets=["hdr0", "gbuffer2"] depth_stencil_target="depth_stencil_buffer"
clear_flags=["SURFACE", "DEPTH", "STENCIL"] profiling_scope="clears" }
// if vr is supported kick a resource generator laying down a stencil mask to reject pixels outside of the lens shape
{ type="static_branch" platforms=["win"] render_settings={ vr_supported=true }
pass = [
{ resource_generator="vr_mask" profiling_scope="vr_mask" }
]
}
// g-buffer layer, bulk of all materials renders into this
{ name="gbuffer" render_targets=["gbuffer0", "gbuffer1", "gbuffer2", "gbuffer3"]
depth_stencil_target="depth_stencil_buffer" sort="FRONT_BACK" profiling_scope="gbuffer" }
{ extension_insertion_point = "gbuffer" }
// linearize depth into a R32F surface
{ resource_generator="stabilize_and_linearize_depth" profiling_scope="linearize_depth" }
// layer for blending decals into the gbuffer0 and gbuffer1
{ name="decals" render_targets=["gbuffer0" "gbuffer1"] depth_stencil_target="depth_stencil_buffer"
profiling_scope="decal" sort="EXPLICIT" }
{ extension_insertion_point = "decals" }
// generate and merge motion vectors for non written pixels with motion vectors in gbuffer
{ type="static_branch" platforms=["win", "xb1", "ps4", "web", "linux"]
pass = [
{ resource_generator="generate_motion_vectors" profiling_scope="motion vectors" }
]
}
// render localized reflection probes into hdr1
{ name="reflections" render_targets=["hdr1"] depth_stencil_target="depth_stencil_buffer"
sort="FRONT_BACK" profiling_scope="reflections probes" }
{ extension_insertion_point = "reflections" }
// kick resource generator for screen space reflections
{ type="static_branch" platforms=["win", "xb1", "ps4"]
pass = [
{ resource_generator="ssr_reflections" profiling_scope="ssr" }
]
}
// kick resource generator for main scene lighting
{ resource_generator="lighting" profiling_scope="lighting" }
{ extension_insertion_point = "lighting" }
// layer for emissive materials
{ name="emissive" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer"
sort="FRONT_BACK" profiling_scope="emissive" }
// kick debug visualization
{ type="static_branch" render_caps={ development=true }
pass=[
{ resource_generator="debug_visualization" profiling_scope="debug_visualization" }
]
}
// kick resource generator for laying down fog
{ resource_generator="fog" profiling_scope="fog" }
// layer for skydome rendering
{ name="skydome" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer"
sort="BACK_FRONT" profiling_scope="skydome" }
{ extension_insertion_point = "skydome" }
// layer for transparent materials
{ name="hdr_transparent" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer"
sort="BACK_FRONT" profiling_scope="hdr_transparent" }
{ extension_insertion_point = "hdr_transparent" }
// kick resource generator for reading back any requested render targets / buffers to the CPU
{ resource_generator="stream_capture_buffers" profiling_scope="stream_capture" }
// kick resource generator for capturing reflection probes
{ type="static_branch" platform=["win"] render_caps={ development=true }
pass = [
{ resource_generator="cubemap_capture" }
]
}
// layer for rendering object selections from the editor
{ type="static_branch" platforms=["win", "ps4", "xb1"]
pass = [
{ type = "static_branch" render_settings={ selection_enabled=true }
pass = [
{ name="selection" render_targets=["gbuffer0" "ldr1_dev_r"]
depth_stencil_target="depth_stencil_buffer_selection" sort="BACK_FRONT"
clear_flags=["SURFACE" "DEPTH"] profiling_scope="selection"}
]
}
]
}
// kick resource generators for AA resolve and post processing
{ resource_generator="post_processing" profiling_scope="post_processing" }
{ extension_insertion_point = "post_processing" }
// layer for rendering LDR materials, primarily used for rendering HUD and debug rendering
{ name="transparent" render_targets=["output_target"] depth_stencil_target="stable_depth_stencil_buffer_alias"
sort="BACK_FRONT" profiling_scope="transparent" }
// kick resource generator for rendering shadow map debug overlay
{ type="static_branch" render_caps={ development=true }
pass = [
{ resource_generator="debug_shadows" profiling_scope="debug_shadows" }
]
}
// kick resource generator for compositing left/right eye
{ type="static_branch" platforms=["win"] render_settings={ vr_supported=true }
pass = [
{ resource_generator="vr_present" profiling_scope="present" }
]
}
{ extension_insertion_point = "last" }
</code></pre>
<p>So what we have above is a fairly standard breakdown of a rendered frame - if you have worked with real-time rendering before there shouldn't be many surprises in there. Something that is kind of cool about having the frame flow in this representation, paired with the hot-reloading functionality of <code>render_configs</code>, is that it really encourages experimentation: move things around, comment stuff out, inject new resource generators, etc.</p>
<p>Let’s go through the frame in a bit more detail:</p>
<h2><a id="Extension_insertion_points_155"></a>Extension insertion points</h2>
<p>First of all there are a bunch of <code>extension_insertion_point</code> at various locations during the frame, these are used by <a href="http://bitsquid.blogspot.se/2016/08/render-config-extensions.html"><code>render_config_extensions</code></a> to be able to schedule work into an existing <code>render_config</code>. You could argue that an extensions system to the <code>render_configs</code> is a bit superfluous, and for an in-house game engine targeting a specific industry that might very well be the case. But for us the extension system allows building features a bit more modular, it also encourages sharing of various rendering features across teams.</p>
<h2><a id="Shadows_159"></a>Shadows</h2>
<pre><code>// kick resource generator for rendering all shadow maps
{ resource_generator="shadow_mapping" profiling_scope="shadow mapping" }
</code></pre>
<p>We start off by rendering shadow maps. As we want to handle shadow receiving on alpha blended geometry there’s no simple way to reuse our shadow maps by interleaving the rendering of them into the lighting code. Instead we simply gather all shadow casting lights, try to prioritize them based on screen coverage, intensity, etc. and then render all shadows into two shadow maps.</p>
<p>One shadow map is dedicated to handle a single directional light which uses a cascaded shadow map approach, rendering each cascade into a region of a larger shadow map atlas. The other shadow map is an atlas for all local light sources, such as spot and point lights (interpreted as 6 spot lights).</p>
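<p>As a rough sketch of what that prioritization could look like - the types, names and scoring heuristic below are assumptions for illustration, not the actual Stingray code - each shadow casting light gets a score from its approximate screen coverage and intensity, and the highest scoring lights get shadow map atlas space first:</p>
<pre><code class="language-C++">#include <algorithm>
#include <vector>

// Minimal stand-in types - the real engine types are of course more involved.
struct Light { float distance_to_camera; float radius; float intensity; };
struct ShadowCaster { const Light *light; float score; };

// Score ~ projected size of the light's bounding sphere, weighted by intensity.
void prioritize_shadow_casters(std::vector<ShadowCaster> &casters)
{
    for (ShadowCaster &c : casters) {
        float coverage = c.light->radius / std::max(c.light->distance_to_camera, 0.001f);
        c.score = coverage * coverage * c.light->intensity;
    }
    // Highest score first; the shadow map atlas is then filled greedily until it runs out of space.
    std::sort(casters.begin(), casters.end(),
              [](const ShadowCaster &a, const ShadowCaster &b) { return a.score > b.score; });
}
</code></pre>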
<h2><a id="Clustered_shading_170"></a>Clustered shading</h2>
<pre><code>// kick resource generator for assigning light sources to clustered shading structure
{ resource_generator="clustered_shading" profiling_scope="clustered shading" }
</code></pre>
<p>We separate local light sources into two kinds: “simple” and “custom”. Simple lights are either spot lights or point lights that don’t have a custom material graph assigned. Simple light sources, which tend to be the bulk of all visible light sources in a frame, get inserted into a <a href="http://www.humus.name/Articles/PracticalClusteredShading.pdf">clustered shading acceleration structure</a>.</p>
<p>While simple lights will affect both opaque and transparent materials, custom lights will only affect opaque geometry as they run a more traditional deferred shading path. We will touch on the lighting a bit more soon.</p>
<h2><a id="Clearing__VR_mask_181"></a>Clearing & VR mask</h2>
<pre><code>// special layer, only responsible for clearing hdr0, gbuffer2 and the depth_stencil_buffer
{ render_targets=["hdr0", "gbuffer2"] depth_stencil_target="depth_stencil_buffer"
clear_flags=["SURFACE", "DEPTH", "STENCIL"] profiling_scope="clears" }
// if vr is supported kick a resource generator laying down a stencil mask to reject pixels outside of the lens shape
{ type="static_branch" platforms=["win"] render_settings={ vr_supported=true }
pass = [
{ resource_generator="vr_mask" profiling_scope="vr_mask" }
]
}
</code></pre>
<p>Here we use the layer system to record a bind and a clear for a few render targets into a <a href="http://bitsquid.blogspot.se/2017/02/stingray-renderer-walkthrough-3-render.html"><code>RenderContext</code></a> generated by the <a href="http://bitsquid.blogspot.se/2017/03/stingray-renderer-walkthrough-7-data.html"><code>LayerManager</code></a>.</p>
<p>Then, depending on if the <code>vr_supported</code> render setting is true or not we kick a resource generator that marks in the stencil buffer any pixels falling outside of the lens region. This resource generator only does something if the renderer is running in stereo mode. Also note that the branch above is a <code>static_branch</code> so if <code>vr_supported</code> is set to false the execution of the <code>vr_mask</code> resource generator will get eliminated completely during boot up of the renderer.</p>
<h2><a id="Gbuffer_200"></a>G-buffer</h2>
<pre><code>// g-buffer layer, bulk of all materials renders into this
{ name="gbuffer" render_targets=["gbuffer0", "gbuffer1", "gbuffer2", "gbuffer3"]
depth_stencil_target="depth_stencil_buffer" sort="FRONT_BACK" profiling_scope="gbuffer" }
{ extension_insertion_point = "gbuffer" }
// linearize depth into a R32F surface
{ resource_generator="stabilize_and_linearize_depth" profiling_scope="linearize_depth" }
// layer for blending decals into the gbuffer0 and gbuffer1
{ name="decals" render_targets=["gbuffer0" "gbuffer1"] depth_stencil_target="depth_stencil_buffer"
profiling_scope="decal" sort="EXPLICIT" }
{ extension_insertion_point = "decals" }
// generate and merge motion vectors for non written pixels with motion vectors in gbuffer
{ type="static_branch" platforms=["win", "xb1", "ps4", "web", "linux"]
pass = [
{ resource_generator="generate_motion_vectors" profiling_scope="motion vectors" }
]
}
</code></pre>
<p>Next we lay down the gbuffer. We are using a fairly fat “floating” gbuffer representation. By floating I mean that we interpret the gbuffer channels differently depending on material. I won’t go into details of the gbuffer layout in this post but everything builds upon a standard metallic PBR material model, same as most modern engines run today. We also stash high precision motion vectors to be able to do accurate reprojection for TAA, RGBM encoded irradiance from light maps (if present, else irradiance is looked up from an IBL probe), high precision normals, AO, etc. Things quickly add up - in the default configuration on PC we are looking at 192 bpp for the color targets (i.e. not counting depth/stencil). The gbuffer layout could use some love, I think we should be able to shrink it somewhat without losing any features.</p>
<p>We then kick a resource generator called <code>stabilize_and_linearize_depth</code>, which does two things:</p>
<ol>
<li>It linearizes the depth buffer and stores the result in an R32F target using a <code>fullscreen_pass</code>.</li>
<li>It does a hacky TAA resolve pass for depth in an attempt to remove some intersection flickering for materials rendered after TAA resolve. We call the output of this pass <code>stable_depth</code> and use it when rendering editor selections, gizmos, debug lines, etc. We also use this buffer during post processing for any effects that depend on depth (e.g. depth of field) as those run after AA resolve.</li>
</ol>
<p>After that we have another more minimalistic gbuffer layer for splatting deferred decals.</p>
<p>Last but not least we kick another resource generator that calculates per pixel velocity for any pixels that haven’t been rendered to during the gbuffer pass (i.e. the skydome).</p>
<h2><a id="Reflections__Lighting_236"></a>Reflections & Lighting</h2>
<pre><code>// render localized reflection probes into hdr1
{ name="reflections" render_targets=["hdr1"] depth_stencil_target="depth_stencil_buffer"
sort="FRONT_BACK" profiling_scope="reflections probes" }
{ extension_insertion_point = "reflections" }
// kick resource generator for screen space reflections
{ type="static_branch" platforms=["win", "xb1", "ps4"]
pass = [
{ resource_generator="ssr_reflections" profiling_scope="ssr" }
]
}
// kick resource generator for main scene lighting
{ resource_generator="lighting" profiling_scope="lighting" }
{ extension_insertion_point = "lighting" }
</code></pre>
<p>At this point we are fully done with the gbuffer population and are ready to do some lighting. We start by laying down the indirect specular / reflections into a separate buffer. We use a rather standard three-step fallback scheme for our reflections: screen-space reflections, falling back to localized parallax corrected pre-convoluted radiance cubemaps, falling back to a global pre-convoluted radiance cubemap.</p>
<p>The <code>reflections</code> layer is the target layer for all cubemap based reflections. We are naively rendering the cubemap reflections by treating each reflection probe as a light source with a custom material. These lights get picked up by a resource generator performing traditional deferred shading - i.e. it renders proxy volumes for each light. One thing that some people struggle to wrap their heads around is that the resource generator responsible for running the deferred shading modifier isn’t kicked until a few lines down (in the <code>lighting</code> resource generator). If you’ve paid attention in my previous posts this shouldn’t come as a surprise to you, as what we describe here is the <em>GPU</em> scheduling of a frame, nothing else.</p>
<p>When the reflection probes are laid down we move on and run a resource generator for doing Screen-Space Reflections. As SSR typically runs in half-res we store the result in a separate render target.</p>
<p>We then finally kick the <code>lighting</code> resource generator, which is responsible for the following:</p>
<ol>
<li>Build a screen space mask for sun shadows, this is done by running multiple <code>fullscreen_passes</code>. The <code>fullscreen_passes</code> transform the pixels into cascaded shadow map space and perform PCF. Stencil culling makes sure the shader only runs for pixels within a certain cascade.</li>
<li>SSAO with a bunch of different quality settings.</li>
<li>A fullscreen pass we refer to as the “global lighting” pass. This is the pass that does most of the heavy lifting when it comes to the lighting. It handles mixing SSR with probe reflections, mixing of SSAO with material AO, lighting from all simple lights looked up from the clustered shading structure as well as calculates sun lighting masked with the result from sun shadow mask (step 1).</li>
<li>Run a traditional deferred shading modifier for all light sources that have a material graph assigned. If the shader doesn’t target a specific layer the light’s proxy volume will be rendered at this point, otherwise it will be scheduled to render into whatever layer the shader has specified.</li>
</ol>
<p>At this point we have a fully lit HDR output for all of our opaque materials.</p>
<h2><a id="Various_stuff_273"></a>Various stuff</h2>
<pre><code>// layer for emissive materials
{ name="emissive" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer"
sort="FRONT_BACK" profiling_scope="emissive" }
// kick debug visualization
{ type="static_branch" render_caps={ development=true }
pass=[
{ resource_generator="debug_visualization" profiling_scope="debug_visualization" }
]
}
// kick resource generator for laying down fog
{ resource_generator="fog" profiling_scope="fog" }
// layer for skydome rendering
{ name="skydome" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer"
sort="BACK_FRONT" profiling_scope="skydome" }
{ extension_insertion_point = "skydome" }
// layer for transparent materials
{ name="hdr_transparent" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer"
sort="BACK_FRONT" profiling_scope="hdr_transparent" }
{ extension_insertion_point = "hdr_transparent" }
// kick resource generator for reading back any requested render targets / buffers to the CPU
{ resource_generator="stream_capture_buffers" profiling_scope="stream_capture" }
// kick resource generator for capturing reflection probes
{ type="static_branch" platform=["win"] render_caps={ development=true }
pass = [
{ resource_generator="cubemap_capture" }
]
}
// layer for rendering object selections from the editor
{ type="static_branch" platforms=["win", "ps4", "xb1"]
pass = [
{ type = "static_branch" render_settings={ selection_enabled=true }
pass = [
{ name="selection" render_targets=["gbuffer0" "ldr1_dev_r"]
depth_stencil_target="depth_stencil_buffer_selection" sort="BACK_FRONT"
clear_flags=["SURFACE" "DEPTH"] profiling_scope="selection"}
]
}
]
}
</code></pre>
<p>Next follows a bunch of layers for doing various stuff, most of this is straightforward:</p>
<ul>
<li><code>emissive</code> - Layer for adding any emissive material influences to the light accumulation target (<code>hdr0</code>)</li>
<li><code>debug_visualization</code> - Kick off a resource generator for doing debug rendering. When debug rendering is enabled, the post processing pipe is disabled so we can render straight to the output target / back buffer here. Note: This doesn’t need to be scheduled exactly here, it could be moved later down the pipe.</li>
<li><code>fog</code> - Kick off a resource generator for blending fog into the accumulation target.</li>
<li><code>skydome</code> - Layer for rendering anything skydome related.</li>
<li><code>hdr_transparent</code> - Layer for rendering transparent materials, traditional forward shading using the clustered shading acceleration structure for lighting. VFX with blending usually also goes into this layer.</li>
<li><code>stream_capture_buffers</code> - Arbitrary location for capturing various render targets and dumping them into system memory.</li>
<li><code>cubemap_capture</code> - Capturing point for reflection cubemap probes.</li>
<li><code>selection</code> - Layer for rendering selection outlines.</li>
</ul>
<p>So basically a bunch of miscellaneous stuff that needs to happen before we enter post processing…</p>
<h2><a id="Post_Processing_337"></a>Post Processing</h2>
<pre><code>// kick resource generators for AA resolve and post processing
{ resource_generator="post_processing" profiling_scope="post_processing" }
{ extension_insertion_point = "post_processing" }
</code></pre>
<p>Up until this point we’ve been in linear color space accumulating lighting into a 4xf16 render target (<code>hdr0</code>). Now it’s time to take that buffer and push it through the post processing resource generator.</p>
<p>The post processing pipe in the Stingray Renderer does:</p>
<ol>
<li>Temporal AA resolve</li>
<li>Depth of Field</li>
<li>Motion Blur</li>
<li>Lens Effects (chromatic aberration, distortion)</li>
<li>Bloom</li>
<li>Auto exposure</li>
<li>Scene Combine (exposure, tone map, sRGB, LUT color grading)</li>
<li>Debug rendering</li>
</ol>
<p>All steps of the post processing pipe can dynamically be enabled/disabled (not entirely true, we will always have to run some variation of step 7 as we need to output our result to the back buffer).</p>
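<p>To make step 7 a bit more concrete, here is a minimal sketch of the kind of math the scene combine step performs. This is CPU-side C++ purely for illustration - the actual pass is a fullscreen shader, and the exact exposure and tonemapping operators are not spelled out here, so a simple Reinhard curve stands in for the tonemapper:</p>
<pre><code class="language-C++">#include <cmath>

struct float3 { float x, y, z; };

// Linear -> sRGB transfer function for a single channel.
static float linear_to_srgb(float c)
{
    return c <= 0.0031308f ? 12.92f * c : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}

// Exposure -> tonemap -> sRGB. LUT color grading would be applied around this step,
// depending on in which space the LUT was authored.
static float3 scene_combine(float3 hdr, float exposure)
{
    float3 e = { hdr.x * exposure, hdr.y * exposure, hdr.z * exposure };
    float3 t = { e.x / (1.0f + e.x), e.y / (1.0f + e.y), e.z / (1.0f + e.z) };    // Reinhard stand-in
    return { linear_to_srgb(t.x), linear_to_srgb(t.y), linear_to_srgb(t.z) };
}
</code></pre>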
<h2><a id="Final_touches_361"></a>Final touches</h2>
<pre><code>// layer for rendering LDR materials, primarily used for rendering HUD and debug rendering
{ name="transparent" render_targets=["output_target"] depth_stencil_target="stable_depth_stencil_buffer_alias"
sort="BACK_FRONT" profiling_scope="transparent" }
// kick resource generator for rendering shadow map debug overlay
{ type="static_branch" render_caps={ development=true }
pass = [
{ resource_generator="debug_shadows" profiling_scope="debug_shadows" }
]
}
// kick resource generator for compositing left/right eye
{ type="static_branch" platforms=["win"] render_settings={ vr_supported=true }
pass = [
{ resource_generator="vr_present" profiling_scope="present" }
]
}
</code></pre>
<p>Before we present we allow rendering of unlit geometry in LDR (mainly used for HUDs and debug rendering), potentially do some more debug rendering and if we’re in VR mode we kick a resource generator that handles left/right eye combining (if needed).</p>
<p>That’s it - a very high-level breakdown of a rendered frame when running Stingray with the default “Stingray Renderer” <code>render_config</code> file.</p>
<h1><a id="Mini_Renderer_387"></a>Mini Renderer</h1>
<p>We also have a second rendering pipe that we ship with Stingray called the “Mini Renderer” - <em>mini</em> as in <em>minimalistic</em>. It is not as broadly used as the Stingray Renderer so I won’t walk you through it, just wanted to mention it’s there and say a few words about it.</p>
<p>The main design goal behind the mini renderer was to build a rendering pipe with as little overhead from advanced lighting effects and post processing as possible. It’s primarily used for doing mobile VR rendering. High-resolution, high-performance rendering on mobile devices is hard! You pretty much need to avoid all kinds of fullscreen effects to hit target frame rate. Therefore the mini renderer has a very limited feature set:</p>
<ul>
<li>It’s a forward renderer. While it’s capable of doing per pixel lighting through clustered shading it rarely gets used, instead most applications tend to bake their lighting completely or run with only a single directional light source.</li>
<li>No post processing.</li>
<li>While all lighting is done in linear color space we don’t store anything in HDR, instead we expose, tonemap and output sRGB directly into an LDR target (usually directly to the back buffer).</li>
</ul>
<p>The <code>mini_renderer.render_config</code> file is ~400 lines, i.e. less than 1/3 of the stingray renderer. It is still in a somewhat experimental state but is the fastest way to get up and running doing mobile VR. I also feel that it makes sense for us to ship an example of a more lightweight rendering pipe; it is simpler to follow than the <code>render_config</code> for the full stingray renderer, and it makes it easy to grasp the benefits of data-driven rendering compared to a more static hard-coded rendering pipe (especially if you don’t have source access to the full engine as then the hard-coded rendering pipe would likely be a complete black box for the user).</p>
<h1><a id="Wrap_up_399"></a>Wrap up</h1>
<p>I realize that some of you might have hoped for a more complete walkthrough of the various lighting and post processing techniques we use in the Stingray renderer. Unfortunately that would have become a very long post and also it feels a bit out of context as my goal with this blog series has been to focus on the architecture of the stingray rendering pipe rather than specific rendering techniques. Most of the techniques we use can probably be considered “industry standard” within real-time rendering nowadays. If you are interested in learning more there are lots of excellent information available, to name a few:</p>
<ul>
<li>Sébastien Lagarde & Charles de Rousiers amazing course notes from their Siggraph 2014 presentation: “Moving Frostbite to PBR”: <a href="http://www.frostbite.com/2014/11/moving-frostbite-to-pbr/">http://www.frostbite.com/2014/11/moving-frostbite-to-pbr/</a></li>
<li>Morgan McGuire’s excellent Siggraph 2016 presentation: “Peering Through a Glass, Darkly<br>
at the Future of Real-Time Transparency”: <a href="http://graphics.cs.williams.edu/papers/TransparencySIGGRAPH16/">http://graphics.cs.williams.edu/papers/TransparencySIGGRAPH16/</a></li>
<li>Everything from Natalya Tatarchuk’s Siggraph courses: “Advances in Real-Time Rendering in 3D Graphics and Games”: <a href="http://advances.realtimerendering.com/">http://advances.realtimerendering.com/</a></li>
<li>Everything from Stephen Hill’s and Stephen McAuley’s Siggraph courses: “Physically Based Shading in Theory and Practice”: <a href="http://blog.selfshadow.com/publications/s2016-shading-course/">http://blog.selfshadow.com/publications/s2016-shading-course/</a></li>
</ul>
<p>In the next and final post of this series we will take a look at the shader and material system we have in Stingray.</p>
Tobiashttp://www.blogger.com/profile/16240529312060411542noreply@blogger.com102tag:blogger.com,1999:blog-1994130783874175266.post-72435316126029533602017-03-09T16:21:00.001+01:002017-03-09T16:21:47.264+01:00Stingray Renderer Walkthrough #7: Data-driven rendering
<h1><a id="Introduction_0"></a>Introduction</h1>
<p>With all the low-level stuff in place it’s time to take a look at how we drive rendering in Stingray, i.e. how a final frame comes together. I’ve covered this in various presentations over the years but will try to go through everything again to give a more complete picture of how things fit together.</p>
<p>Stingray features what we call a data-driven rendering pipe, basically what we mean by that is that all shaders, GPU resource creation and manipulation, as well as the entire flow of a rendered frame is defined in data. In our case the data is a set of different <em>json</em> files.</p>
<p>These <em>json</em>-files are hot-reloadable on all platforms, providing a nice workflow with fast iteration times when experimenting with various rendering techniques. It also makes it easy for a project to optimize the renderer for its specific needs (in terms of platforms, features, etc.) and/or to push it in other directions to better suit the art direction of the project.</p>
<p>There are four different types of <em>json</em>-files driving the Stingray renderer:</p>
<ul>
<li><code>.render_config</code> - the heart of a rendering pipe.</li>
<li><code>.render_config_extension</code> - extensions to an existing <code>.render_config</code> file.</li>
<li><code>.shader_source</code> - shader source and meta data for compiling statically declared shaders.</li>
<li><code>.shader_node</code> - shader source and meta data used by the graph based shader system.</li>
</ul>
<p>Today we will be looking at the <code>render_config</code>, both from a user’s perspective as well as how it works on the engine side.</p>
<h1><a id="Meet_the_render_config_17"></a>Meet the <em><code>render_config</code></em></h1>
<p>The <code>render_config</code> is a <a href="http://bitsquid.blogspot.se/2009/10/simplified-json-notation.html"><em>sjson</em></a> file describing everything from which render settings to expose to the user to the flow of an entire rendered frame. It can be broken down into four parts: <em>render settings</em>, <em>resource sets</em>, <em>layer configurations</em> and <em>resource generators</em>. All of which are fairly simple and minimalistic systems on the engine side.</p>
<h1><a id="Render_Settings__Misc_21"></a>Render Settings & Misc</h1>
<p>Render settings is a simple key:value map exposed globally to the entire rendering pipe as well as an interface for the end user to peek and poke at. Here’s an example of how it might look in the <code>render_config</code> file:</p>
<pre><code>render_settings = {
sun_shadows = true
sun_shadow_map_size = [ 2048, 2048 ]
sun_shadow_map_filter_quality = "high"
local_lights_shadow_atlas_size = [ 2048, 2048 ]
local_lights_shadow_map_filter_quality = "high"
particles_local_lighting = true
particles_receive_shadows = true
debug_rendering = false
gbuffer_albedo_visualization = false
gbuffer_normal_visualization = false
gbuffer_roughness_visualization = false
gbuffer_specular_visualization = false
gbuffer_metallic_visualization = false
bloom_visualization = false
ssr_visualization = false
}
</code></pre>
<p>As you will see we have branching logics for most systems in the <code>render_config</code> which allows the renderer to take different paths depending on the state of properties in the <code>render_settings</code>. There is also a block called <code>render_caps</code> which is very similar to the <code>render_settings</code> block except that it is read only and contains knowledge of the capabilities of the hardware (GPU) running the engine.</p>
<p>On the engine side there’s not that much to cover about the <code>render_settings</code> and <code>render_caps</code>, keys are always strings getting murmur hashed to 32 bits and the value can be a <code>bool</code>, <code>float</code>, array of <code>floats</code> or another hashed <code>string</code>.</p>
<p>When booting the renderer we populate the <code>render_settings</code> by first reading them from the <code>render_config</code> file, then looking in the project specific <code>settings.ini</code> file for potential overrides or additions, and lastly allowing certain properties to be overridden again from the user’s configuration file (if loaded).</p>
<p>The <code>render_caps</code> block usually gets populated when the <code>RenderDevice</code> is booted and we’re in a state where we can enumerate all device capabilities. This makes the keys and values of the <code>render_caps</code> block somewhat of a black box with different contents depending on platform; typically there aren’t that many of them though.</p>
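<p>A minimal sketch of what such a key:value store could look like on the engine side - the type and function names below are assumptions for illustration, only the hashed keys, the value variants and the override order are taken from the description above:</p>
<pre><code class="language-C++">#include <cstdint>
#include <unordered_map>
#include <vector>

// A value is a bool, a float, an array of floats or another 32-bit hashed string.
struct RenderSettingValue {
    enum Type { BOOL, FLOAT, FLOAT_ARRAY, HASH } type = BOOL;
    bool b = false;
    float f = 0.f;
    std::vector<float> floats;
    uint32_t hash = 0;
};

// Keys are murmur-hashed strings, represented here as plain uint32_t.
using RenderSettings = std::unordered_map<uint32_t, RenderSettingValue>;

// Later sources simply overwrite earlier ones, giving the override order:
// render_config -> project settings.ini -> user configuration file.
void apply_overrides(RenderSettings &settings, const RenderSettings &overrides)
{
    for (const auto &kv : overrides)
        settings[kv.first] = kv.second;
}
</code></pre>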
<p>So that covers the <code>render_settings</code> and <code>render_caps</code> blocks, we will look at how they are actually used for branching in later sections of this post.</p>
<p>There are also a few other miscellaneous blocks in the <code>render_config</code>, most important being:</p>
<ul>
<li><code>shader_pass_flags</code> - Array of strings building up a bit flag that can be used to dynamically turn on/off various shader passes.</li>
<li><code>shader_libraries</code> - Array of which <code>shader_source</code> files to load when booting the renderer. The <code>shader_source</code> files are libraries of pre-compiled shaders mainly used by the resource generators.</li>
</ul>
<h1><a id="Resource_Sets_63"></a>Resource Sets</h1>
<p>We have the concept of a <code>RenderResourceSet</code> on the engine side, it simply maps a hashed string to a GPU resource. <code>RenderResourceSets</code> can be locally allocated during rendering, creating a form of scoping mechanism. The resources are either allocated by the engine and inserted into a <code>RenderResourceSet</code> or allocated through the <code>global_resources</code> block in a <code>render_config</code> file.</p>
<p>The <a href="http://bitsquid.blogspot.se/2017/02/stingray-renderer-walkthrough-6.html"><code>RenderInterface</code></a> owns a global <code>RenderResourceSet</code> populated by the <code>global_resources</code> array from the <code>render_config</code> used to boot the renderer.</p>
<p>Here’s an example of a <code>global_resources</code> array:</p>
<pre><code>global_resources = [
{ type="static_branch" platforms=["ios", "android", "web", "linux"]
pass = [
{ name="output_target" type="render_target" depends_on="back_buffer"
format="R8G8B8A8" }
]
fail = [
{ name="output_target" type="alias" aliased_resource="back_buffer" }
]
}
{ name="depth_stencil_buffer" type="render_target" depends_on="output_target"
w_scale=1 h_scale=1 format="DEPTH_STENCIL" }
{ name="gbuffer0" type="render_target" depends_on="output_target"
w_scale=1 h_scale=1 format="R8G8B8A8" }
{ name="gbuffer1" type="render_target" depends_on="output_target"
w_scale=1 h_scale=1 format="R8G8B8A8" }
{ name="gbuffer2" type="render_target" depends_on="output_target"
w_scale=1 h_scale=1 format="R16G16B16A16F" }
{ type="static_branch" render_settings={ sun_shadows = true }
pass = [
{ name="sun_shadow_map" type="render_target" size_from_render_setting="sun_shadow_map_size"
format="DEPTH_STENCIL" }
]
}
{ name="hdr0" type="render_target" depends_on="output_target" w_scale=1 h_scale=1
format="R16G16B16A16F" }
]
</code></pre>
<p>So while the above example mainly shows how to create what we call <code>DependentRenderTargets</code> (i.e. render targets that inherit their properties from another render target and then allow overriding properties locally), it can also create other buffers of various kinds.</p>
<p>We’ve also introduced the concept of a <code>static_branch</code>, there are two types of branching in the <code>render_config</code> file: <code>static_branch</code> and <code>dynamic_branch</code>. In the <code>global_resource</code> block only static branching is allowed as it only runs once, during set up of the renderer. (<em>Note:</em> The branch syntax is far from nice and we nowadays have come up with a much cleaner syntax that we use in the shader system, unfortunately it hasn’t made its way back to the <code>render_config</code> yet.)</p>
<p>So basically what this example boils down to is the creation of a set of render targets. The <code>output_target</code> is a bit special though: on PC and consoles we simply set up an alias for an already created render target - the back buffer - while on gl based platforms we create a new separate render target. (This is because we render the scene up-side-down on gl-platforms to get consistent UV coordinate systems between all platforms.)</p>
<p>The other special case from the example above is the <code>sun_shadow_map</code> which grabs the resolution from a <code>render_setting</code> called <code>sun_shadow_map_size</code>. This is done because we want to expose the ability to tweak the shadow map resolution to the user.</p>
<p>When rendering a frame we typically pipe the global <code>RenderResourceSet</code> owned by the <code>RenderInterface</code> down to the various rendering systems. Any resource declared in the <code>RenderResourceSet</code> is accessible from the shader system by name. Each rendering system can at any point decide to create its own local version of a <code>RenderResourceSet</code> making it possible to scope shader resource access.</p>
<p>Worth pointing out is that the resources declared in the <code>global_resource</code> block of the <code>render_config</code> used when booting the engine are all allocated in the set up phase of the renderer and not released until the renderer is closed.</p>
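<p>To make the scoping idea a bit more concrete, here is a minimal sketch of what a scoped resource set lookup could look like - the class layout and method names are assumptions for illustration, not the actual engine interface. A local set resolves hashed resource names itself and falls back to its parent (ultimately the global set) when it has no local entry:</p>
<pre><code class="language-C++">#include <cstdint>
#include <unordered_map>

struct RenderResource;    // opaque handle to a GPU resource (render target, buffer, ...)

class RenderResourceSet {
public:
    explicit RenderResourceSet(const RenderResourceSet *parent = nullptr) : _parent(parent) {}

    // Register (or shadow) a resource under a hashed name in this scope.
    void set(uint32_t name_hash, RenderResource *resource) { _resources[name_hash] = resource; }

    // Resolve a hashed name, falling back to the parent scope when not found locally.
    RenderResource *lookup(uint32_t name_hash) const {
        auto it = _resources.find(name_hash);
        if (it != _resources.end())
            return it->second;
        return _parent ? _parent->lookup(name_hash) : nullptr;
    }

private:
    const RenderResourceSet *_parent;
    std::unordered_map<uint32_t, RenderResource *> _resources;
};
</code></pre>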
<h1><a id="Layer_Configurations_116"></a>Layer Configurations</h1>
<p>A <code>render_config</code> can have multiple <code>layer_configurations</code>. A Layer Configuration is essentially a description of the flow of a rendered frame; it is responsible for triggering rendering sub-systems and scheduling the GPU work for a frame. Here’s a simple example of a deferred rendering pipe:</p>
<pre><code>
layer_configs = {
simple_deferred = [
{ name="gbuffer" render_targets=["gbuffer0", "gbuffer1", "gbuffer2"]
depth_stencil_target="depth_stencil_buffer" sort="FRONT_BACK" profiling_scope="gbuffer" }
{ resource_generator="lighting" profiling_scope="lighting" }
{ name="emissive" render_targets=["hdr0"]
depth_stencil_target="depth_stencil_buffer" sort="FRONT_BACK" profiling_scope="emissive" }
{ name="skydome" render_targets=["hdr0"]
depth_stencil_target="depth_stencil_buffer" sort="BACK_FRONT" profiling_scope="skydome" }
{ name="hdr_transparent" render_targets=["hdr0"]
depth_stencil_target="depth_stencil_buffer" sort="BACK_FRONT" profiling_scope="hdr_transparent" }
{ resource_generator="post_processing" profiling_scope="post_processing" }
{ name="ldr_transparent" render_targets=["output_target"]
depth_stencil_target="depth_stencil_buffer" sort="BACK_FRONT" profiling_scope="transparent" }
]
}
</code></pre>
<p>Each line in the <code>simple_deferred</code> array either specifies a named <em>layer</em> that the shader system can reference to direct rendering into (i.e. a renderable object, like a mesh, has shaders assigned and the shaders know into which <em>layer</em> they want to render - e.g. <code>gbuffer</code>), or it triggers a <code>resource_generator</code>.</p>
<p>The order of execution is top->down and the way the GPU scheduling works is that each line increments a bit in the “Layer System” bit range covered in the post about <a href="http://bitsquid.blogspot.se/2017/02/stingray-renderer-walkthrough-4-sorting.html">sorting</a>.</p>
<p>On the engine side the layer configurations are managed by a system called the <code>LayerManager</code>, owned by the <a href="http://bitsquid.blogspot.se/2017/02/stingray-renderer-walkthrough-6.html"><code>RenderInterface</code></a>. It is a tiny system that basically just maps the named <code>layer_config</code> to an array of “Layers”:</p>
<pre><code>struct Layer {
uint64_t sort_key;
IdString32 name;
render_sorting::DepthSort depth_sort;
IdString32 render_targets[MAX_RENDER_TARGETS];
IdString32 depth_stencil_target;
IdString32 resource_generator;
uint32_t clear_flags;
#if defined(DEVELOPMENT)
const char *profiling_scope;
#endif
};
</code></pre>
<ul>
<li><code>sort_key</code> - As mentioned above and in the post about how we do <a href="http://bitsquid.blogspot.se/2017/02/stingray-renderer-walkthrough-4-sorting.html">sorting</a>, each layer gets a <code>sort_key</code> assigned from the “Layer System” bit range. By looking up the layer’s <code>sort_key</code> and using that when recording <code>Commands</code> to <code>RenderContexts</code> we get a simple way to reason about overall ordering of a rendered frame.</li>
<li><code>name</code> - the shader system can use this name to look up the layer’s <code>sort_key</code> to group draw calls into layers.</li>
<li><code>depth_sort</code> - describes how to encode the depth range bits of the sort key when recording a <code>RenderJobPackage</code> to a <code>RenderContext</code>. <code>depth_sort</code> is an enum that indicates if sorting should be done front-to-back or back-to-front.</li>
<li><code>render_targets</code> - array of named render target resources to bind for this layer</li>
<li><code>depth_stencil_target</code> - named render target resource to bind for this layer</li>
<li><code>resource_generator</code> - name of a resource generator to trigger at this point in the frame, if any (see the Resource Generators section below)</li>
<li><code>clear_flags</code> - bit flag hinting if color, depth or stencil should be cleared for this layer</li>
<li><code>profiling_scope</code> - used to record markers on the <code>RenderContext</code> that later can be queried for GPU timings and statistics.</li>
</ul>
<p>When rendering a <code>World</code> (see: <a href="http://bitsquid.blogspot.se/2017/02/stingray-renderer-walkthrough-6.html">RenderInterface</a>) the user passes a viewport to the <code>render_world</code> function, and the viewport knows which <code>layer_config</code> to use. We look up the array of <code>Layers</code> from the <code>LayerManager</code> and record a <a href="http://bitsquid.blogspot.se/2017/02/stingray-renderer-walkthrough-3-render.html"><code>RenderContext</code></a> with state commands for binding and clearing render targets, using the <code>sort_keys</code> from the <code>Layer</code>. We do this dynamically each time the user calls <code>render_world</code>, but in theory we could cache the <code>RenderContext</code> between <code>render_world</code> calls.</p>
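<p>As a rough illustration of that flow, here is a simplified sketch of recording the per-layer state commands, using the <code>Layer</code> struct above. The <code>RenderContext</code> functions are hypothetical simplifications (in reality the named render targets are first resolved through the <code>RenderResourceSet</code>), but the important part is that each state command is tagged with the layer’s <code>sort_key</code> so it is scheduled at the layer’s position in the frame:</p>
<pre><code>// Sketch: record bind/clear state commands for each layer using the layer's
// sort_key. RenderContext::set_render_targets() / clear() are assumed helper
// functions for the purpose of this example.
void record_layer_state(RenderContext &rc, const Layer *layers, uint32_t n_layers)
{
    for (uint32_t i = 0; i != n_layers; ++i) {
        const Layer &layer = layers[i];

        // Tagged with the layer's sort_key, these commands end up at the
        // layer's slot in the sorted command stream.
        rc.set_render_targets(layer.sort_key, layer.render_targets,
            layer.depth_stencil_target);

        if (layer.clear_flags)
            rc.clear(layer.sort_key, layer.clear_flags);
    }
}
</code></pre>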
<p>The name <code>Layer</code> is a bit misleading, as a layer can also be responsible for making sure that a <code>ResourceGenerator</code> runs. In practice a <code>Layer</code> is either a target for the shader system to render into or the execution point of a <code>ResourceGenerator</code>; it can in theory be both, but we never use it that way.</p>
<h1><a id="Resource_Generators_185"></a>Resource Generators</h1>
<p>The Resource Generator system is a minimalistic framework for manipulating GPU resources and triggering various rendering sub-systems. Similar to a layer configuration, a resource generator is described as an array of “modifiers”. Modifiers get executed in the order they are declared. Here’s an example:</p>
<pre><code>auto_exposure = {
modifiers = [
{ type="dynamic_branch" render_settings={ auto_exposure_enabled=true } profiling_scope="auto_exposure"
pass = [
{ type="fullscreen_pass" shader="quantize_luma" inputs=["hdr0"]
outputs=["quantized_luma"] profiling_scope="quantize_luma" }
{ type="compute_kernel" shader="compute_histogram" thread_count=[40 1 1] inputs=["quantized_luma"]
uavs=["histogram"] profiling_scope="compute_histogram" }
{ type="compute_kernel" shader="adapt_exposure" thread_count=[1 1 1] inputs=["quantized_luma"]
uavs=["current_exposure" "current_exposure_pos" "target_exposure_pos"] profiling_scope="adapt_exposure" }
]
}
]
}
</code></pre>
<p>The first modifier in the above example is a <code>dynamic_branch</code>. In contrast to a <code>static_branch</code>, which gets evaluated during loading of the <code>render_config</code>, a <code>dynamic_branch</code> is evaluated each time the resource generator runs, making it possible to take different paths through the rendering pipeline based on settings and other game context that might change over time. Dynamic branching is also supported in the <code>layer_config</code> block.</p>
<p>If the branch is taken (i.e if <code>auto_exposure_enabled</code> is true) the modifiers in the <code>pass</code> array will run.</p>
<p>The first modifier in the <code>pass</code> array is of the type <code>fullscreen_pass</code> and is by far the most commonly used modifier type. It simply renders a single triangle covering the entire viewport using the named <code>shader</code>. Any resource listed in the <code>inputs</code> array is exposed to the shader, and any resources listed in the <code>outputs</code> array are bound as render targets.</p>
<p>The second and third modifiers are of the type <code>compute_kernel</code> and dispatch a compute shader. The <code>inputs</code> array works the same way as for the <code>fullscreen_pass</code>, and <code>uavs</code> lists resources to bind as UAVs.</p>
<p>This is obviously a very basic example, but the idea is the same for more complex resource generators. By chaining a bunch of modifiers together you can create interesting rendering effects entirely in data.</p>
<p>Stingray ships with a toolbox of various modifiers, and the user can also extend it with their own modifiers if needed. Here’s a list of some of the other modifiers we ship with:</p>
<ul>
<li><code>cascaded_shadow_mapping</code> - Renders a cascaded shadow map from a directional light.</li>
<li><code>atlased_shadow_mapping</code> - Renders a shadow map atlas from a set of spot and omni lights.</li>
<li><code>generate_mips</code> - Renders a mip chain for a resource by interleaving a resource generator that samples from sub-resource <em>n-1</em> while rendering into sub-resource <em>n</em>.</li>
<li><code>clustered_shading</code> - Assign a set of light sources to a clustered shading structure (on CPU at the moment).</li>
<li><code>deferred_shading</code> - Renders proxy volumes for a set of light sources with specified shaders (i.e. traditional deferred shading).</li>
<li><code>stream_capture</code> - Reads back the specified resource to CPU (usually multi-buffered to avoid stalls).</li>
<li><code>fence</code> - Synchronization of graphics and compute queues.</li>
<li><code>copy_resource</code> - Copies a resource from one GPU to another.</li>
</ul>
<p>In Stingray we encourage building all lighting and post processing using resource generators. So far this has proved very successful for us as it gives great per-project flexibility. To make sharing of various rendering effects easier we also have a system called <a href="http://bitsquid.blogspot.se/2016/08/render-config-extensions.html"><code>render_config_extension</code></a> that we rolled out last year, which is essentially a plugin system for the <code>render_config</code> files.</p>
<p>I won’t go into much detail about how the resource generator system works on the engine side; it’s fairly simple though. There’s a <code>ResourceGeneratorManager</code> that knows about all the generators, and each time the user calls <code>render_world</code> we ask the manager to execute all generators referenced in the <code>layer_config</code>, using the layer’s sort key. We don’t restrain modifiers in any way; they can be implemented to do whatever and have full access to the engine. E.g. they are free to create their own <code>ResourceContexts</code>, spawn worker threads, etc. When the modifiers for all generators are done executing we are handed all the <code>RenderContexts</code> they’ve created and can dispatch them together with the contexts from the regular scene rendering. To get the scheduling between modifiers in a resource generator correct we use the 32-bit “user defined” range in the <a href="http://bitsquid.blogspot.se/2017/02/stingray-renderer-walkthrough-4-sorting.html">sort key</a>.</p>
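<p>As a very rough, single-threaded sketch with a hypothetical <code>Modifier</code> interface, executing one resource generator could look something like this, with each modifier getting its own slot in the “user defined” range of the sort key so modifiers run in declaration order relative to the layer’s position in the frame (the shift amount below is an assumption for illustration):</p>
<pre><code>#include <cstdint>
#include <vector>

class RenderContext; // engine type described in the Render Contexts post

// Hypothetical modifier interface; real modifiers have full access to the
// engine and can spawn worker threads, create their own contexts, etc.
struct Modifier {
    virtual ~Modifier() {}
    virtual void execute(RenderContext &rc, uint64_t sort_key) = 0;
};

struct ResourceGeneratorSketch {
    std::vector<Modifier *> modifiers; // executed in declaration order
};

void execute_generator(ResourceGeneratorSketch &generator, RenderContext &rc,
    uint64_t layer_sort_key)
{
    for (size_t i = 0; i != generator.modifiers.size(); ++i) {
        // Encode the modifier's declaration index into the "user defined"
        // range of the sort key, on top of the layer's own bits.
        uint64_t sort_key = layer_sort_key | ((uint64_t)(i + 1) << 20);
        generator.modifiers[i]->execute(rc, sort_key);
    }
}
</code></pre>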
<h1><a id="Future_improvements_234"></a>Future improvements</h1>
<p>Before we wrap up I’d like to cover some ideas for future improvements.</p>
<p>The Stingray engine has had a data-driven renderer from day one, so the system has been around for quite some time by now. And while the <code>render_config</code> has served us well so far, there are a few things we’ve discovered that could use some attention moving forward.</p>
<h2><a id="Scalability_240"></a>Scalability</h2>
<p>The complexity of the default rendering pipe continues to increase as the demand for new rendering features targeting different industries (games, design visualization, film, etc.) increases. While our data-driven approach addresses the feature-set scalability needs decently well, there is also an increasing demand for feature parity across lots of different hardware. This tends to result in lots of branching in the <code>render_config</code>, making it a bit hard to follow.</p>
<p>In addition to that, we are also starting to see the need to manage multiple paths through the rendering pipe on the same platform; this is especially true when dealing with stereo rendering. On PC we currently have 5 different paths through the default rendering pipe:</p>
<ul>
<li>Mono - Traditional mono rendering.</li>
<li>Stereo - Old school stereo rendering, one <code>render_world</code> call per eye. Almost identical to the mono path, but there is still some stereo-specific work for assembling the final image that needs to happen.</li>
<li>Instanced Stereo - Using “hardware instancing” to do stereo propagation to the left/right eye. Single scene traversal pass, culling using an uber-frustum. A bunch of shader patch-up work and some branching in the <code>render_config</code>.</li>
<li>Nvidia Single Pass Stereo (SPS) - Somewhat similar to instanced stereo but using Nvidia-specific hardware for doing multicasting to the left/right eye.</li>
<li>Nvidia VRSLI - DX11 path for rendering left/right eye on separate GPUs.</li>
</ul>
<p>We expect the number of paths through the rendering pipe to continue to increase also for mono rendering; we’ve already seen that when experimenting with explicit multi-GPU setups under DX12. Things quickly become hairy when you aren’t running on a known platform. Also, depending on hardware it’s likely that you want to schedule the rendered frame differently - i.e. it’s not as simple as saying: here are our 4 different paths and we select one based on whether the user has 1-4 GPUs in their system, as that breaks down as soon as the GPUs in the system aren’t identical.</p>
<p>In the future I think we might want to move to an even higher level of abstraction of the rendering pipe that makes it easier to reason about different paths through it. Something that decouples the strict flow through the rendering pipe and instead only reasons about various “jobs” that need to be executed by the GPUs and what their dependencies are. The engine could then dynamically re-schedule the frame load depending on hardware automatically… at least in theory. In practice I think it’s more likely that we would end up with a few different “frame scheduling configurations” and then select one of them based on benchmarking / hardware setup.</p>
<h2><a id="Memory_256"></a>Memory</h2>
<p>As mentioned earlier, our system for dealing with GPU resources is very static: resources declared in the <code>global_resource</code> set are allocated as the renderer boots up and are not released until the renderer is closed. On last-gen consoles we had support for aliasing the memory of resources of different types, but we removed that when deprecating those platforms. With the rise of DX12/Vulkan and the move to 4K rendering this static resource system is in need of an overhaul. While we can (and do) try to recycle temporary render targets and buffers throughout a frame, it is easy to break some code path without noticing.</p>
<p>We’ve been toying with similar ideas to the “Transient Resource System” described in Yuriy O’Donnell’s excellent GDC2017 presentation: <a href="http://www.frostbite.com/2017/03/framegraph-extensible-rendering-architecture-in-frostbite/">FrameGraph: Extensible Rendering Architecture in Frostbite</a> but have so far not got around to test it out in practice.</p>
<h2><a id="DX12_improvements_262"></a>DX12 improvements</h2>
<p>Today our system implicitly deals with binding of input resources to shader stages. We expose pretty much everything to the shader system by name and if a shader stage binds a resource for reading we don’t know about it until we create the <a href="http://bitsquid.blogspot.se/2017/02/stingray-renderer-walkthrough-3-render.html"><code>RenderJobPackage</code></a>. This puts us in a somewhat bad situation when it comes to dealing with resource transitions as we end up having to do some rather complicated tracking to inject resource barriers at the right places during the <em>dispatch</em> stage of the <code>RenderContexts</code> (See: <a href="http://bitsquid.blogspot.se/2017/02/stingray-renderer-walkthrough-5.html"><code>RenderDevice</code></a>).</p>
<p>We could instead enforce declaration of all writable GPU resources when they get bound as input to a layer or resource generator. As we already have explicit knowledge of when a GPU resource gets written to by a layer or resource generator, adding the explicit knowledge of when we read from one would complete the circle and we would have all the needed information to setup barriers without complicated tracking.</p>
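<p>As a small thought experiment, with explicit read/write declarations the tracking could collapse into a simple state comparison per declared resource, along these lines (the states and types below are made up for illustration):</p>
<pre><code>#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: every layer / resource generator declares which
// writable resources it reads and writes. Barriers are then derived by
// comparing the wanted state against the last known state of each resource.
enum ResourceState { STATE_WRITE, STATE_READ };

struct Barrier { uint32_t resource; ResourceState before, after; };

void request_state(uint32_t resource, ResourceState wanted,
    std::unordered_map<uint32_t, ResourceState> &current_states,
    std::vector<Barrier> &out_barriers)
{
    auto it = current_states.find(resource);
    ResourceState before = (it != current_states.end()) ? it->second : STATE_WRITE;
    if (before != wanted)
        out_barriers.push_back({ resource, before, wanted });
    current_states[resource] = wanted;
}
</code></pre>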
<h1><a id="Wrap_up_268"></a>Wrap up</h1>
<p>Last week at GDC 2017 there were a few presentations (and a lot of discussions) around the concepts of having more high-level representations of a rendered frame and what benefits that brings. If you haven’t already I highly encourage you to check out both Yuriy O’Donnell’s presentation <a href="http://www.frostbite.com/2017/03/framegraph-extensible-rendering-architecture-in-frostbite/">“FrameGraph: Extensible Rendering Architecture in Frostbite”</a> and Aras Pranckevičius’s presentation: <a href="http://aras-p.info/texts/files/2017_GDC_UnityScriptableRenderPipeline.pdf">“Scriptable Render Pipeline”</a>.</p>
<p>In the next post I will briefly cover the feature set of the two <code>render_configs</code> that we ship as template rendering pipes with Stingray.</p>
</body></html>Tobiashttp://www.blogger.com/profile/16240529312060411542noreply@blogger.com116tag:blogger.com,1999:blog-1994130783874175266.post-16502856681632337142017-02-22T15:50:00.000+01:002017-02-22T15:50:33.213+01:00Stingray Renderer Walkthrough #6: RenderInterface<!DOCTYPE html><html><head><meta charset="utf-8"><title>Stingray Renderer Walkthrough #6: RenderInterface</title><style></style></head><body id="preview">
<p>Today we will be looking at the <code>RenderInterface</code>. I’ve struggled a bit with deciding if it is worth covering this piece of the code or not, as most of the stuff described will likely feel kind of obvious. In the end I still decided to keep it to give a more complete picture of how everything fits together. Feel free to skim through it or sit tight and wait for the coming two posts that will dive into the data-driven aspects of the Stingray renderer.</p>
<h1><a id="The_glue_layer_2"></a>The glue layer</h1>
<p>The <code>RenderInterface</code> is responsible for tying together a bunch of rendering sub-systems: some that we have covered in earlier posts (e.g. the <a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-5.html"><code>RenderDevice</code></a>) and a bunch of other, more high-level systems that form the foundation of our data-driven rendering architecture.</p>
<p>The <code>RenderInterface</code> has a number of responsibilities, including:</p>
<ul>
<li>
<p>Tracking of windows and swap chains.</p>
<p>While windows are managed by the simulation thread, swap chains are managed by the render thread. The <code>RenderInterface</code> is responsible for creating the swap chains and keeping track of the mapping between a window and a swap chain. It is also responsible for signaling resizing and other state information from the window to the renderer.</p>
</li>
<li>
<p>Managing of <code>RenderWorlds</code>.</p>
<p>As mentioned in the <a href="http://bitsquid.blogspot.se/2017/02/stingray-renderer-walkthrough-1-overview.html">Overview</a> post, the renderer has its own representation of game <code>Worlds</code> called <code>RenderWorlds</code>. The <code>RenderInterface</code> is responsible for creating, updating and destroying the <code>RenderWorlds</code>.</p>
</li>
<li>
<p>Owner of the four main building blocks of our data-driven rendering architecture: <code>LayerManager</code>, <code>ResourceGeneratorManager</code>, <code>RenderResourceSet</code>, <code>RenderSettings</code></p>
<p>Will be covered in the next post (I’ve talked about them in various presentations before [<a href="https://www.dropbox.com/s/rehpgc9qkzo831k/flexible-rendering-multiple-platforms.pdf?dl=0">1</a>] [<a href="https://www.dropbox.com/s/rjcjiricpc89362/benefits-of-a-data-driven-renderer.pdf?dl=0">2</a>]).</p>
</li>
<li>
<p>Owner of the shader manager.</p>
<p>Centralized repository for all available/loaded shaders. Controls scheduling for loading, unloading and hot-reloading of shaders.</p>
</li>
<li>
<p>Owner of the render resource streamer.</p>
<p>While all resource loading is asynchronous in Stingray (See [<a href="https://www.youtube.com/watch?v=nIxuGy6Jh-0&index=12&list=UU5XCn51L8rqL3XgZfOQN6qQ">3</a>]), the resource streamer I’m referring to in this context is responsible for dynamically loading in/out mip-levels of textures based on their screen coverage. Since this streaming system piggybacks on the view frustum culling system, it is owned and updated by the <code>RenderInterface</code>.</p>
</li>
</ul>
<h1><a id="The_interface_28"></a>The interface</h1>
<p>In addition to being the glue layer, the <code>RenderInterface</code> is also the interface to communicate with the renderer from other threads (simulation, resource streaming, etc.). The renderer operates under its own “controller thread” (as covered in the <a href="http://bitsquid.blogspot.se/2017/02/stingray-renderer-walkthrough-1-overview.html">Overview</a> post), and exposes two different types of functions: blocking and non-blocking.</p>
<h2><a id="Blocking_functions_32"></a>Blocking functions</h2>
<p>Blocking functions will enforce a flush of all outstanding rendering work (i.e. synchronize the calling thread with the rendering thread), allowing the caller to operate directly on the state of the renderer. This is mainly a convenience path when doing bigger state changes / reconfiguring the entire renderer, and should typically not be used during game simulation as it might cause stuttering in the frame rate.</p>
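<p>Internally, a blocking call can be thought of as a posted message followed by a wait on a fence (the <code>create_fence()</code> / <code>wait_for_fence()</code> functions are described further down). A minimal sketch, with <code>post_message()</code> and <code>RegisterWorldMsg</code> as hypothetical stand-ins for the actual ring buffer plumbing:</p>
<pre><code>// Sketch: a blocking RenderInterface operation. The message is posted like any
// non-blocking operation; the fence that follows it guarantees that the render
// thread has consumed everything up to this point before we return.
void register_world_blocking(RenderInterface &ri, World &world)
{
    ri.post_message(RegisterWorldMsg(&world)); // hypothetical message type

    uint32_t fence = ri.create_fence();
    ri.wait_for_fence(fence); // returns once the render thread has caught up
}
</code></pre>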
<p>Typical operations that are blocking:</p>
<ul>
<li>
<p>Opening and closing of the <code>RenderDevice</code>.</p>
<p>Sets up / shuts down the graphics API by calling the appropriate functions on the <code>RenderDevice</code>.</p>
</li>
<li>
<p>Creation and destruction of the swap chains.</p>
<p>Creating and destroying swap chains associated to a <code>Window</code>. Done by forwarding the calls to the <code>RenderDevice</code>.</p>
</li>
<li>
<p>Loading of the <code>render_config</code> / configuring the data-driven rendering pipe.</p>
<p>The <code>render_config</code> is a configuration file describing how the renderer should work for a specific project. It describes the entire flow of a rendered frame, and without it the renderer won’t know what to do. It is the <code>RenderInterface</code>’s responsibility to make sure that all the different sub-systems (<code>LayerManager</code>, <code>ResourceGeneratorManager</code>, <code>RenderResourceSet</code>, <code>RenderSettings</code>) are set up correctly from the loaded <code>render_config</code>. More on this topic in the next post.</p>
</li>
<li>
<p>Loading, unloading and reloading of shaders.</p>
<p>The shader system doesn’t have a thread-safe interface and is only meant to be accessed from the rendering thread. Therefore any loading, unloading and reloading of shaders needs to synchronize with the rendering thread.</p>
</li>
<li>
<p>Registering and unregistering of <code>Worlds</code></p>
<p>Creates or destroys a corresponding <code>RenderWorld</code> and sets up mapping information to go from <code>World*</code> to <code>RenderWorld*</code>.</p>
</li>
</ul>
<h2><a id="Nonblocking_functions_58"></a>Non-blocking functions</h2>
<p>Non-blocking functions communicate by posting messages to a ring-buffer that the rendering thread consumes. Since the renderer has its own representation of a “World” there is not much communication over this ring-buffer; in a normal frame we usually don’t have more than 10-20 messages posted.</p>
<p>Typical operations that are non-blocking:</p>
<ul>
<li>
<p>Rendering of a <code>World</code>.</p>
<pre><code>void render_world(World &world, const Camera &camera, const Viewport &viewport,
const ShadingEnvironment &shading_env, uint32_t swap_chain);
</code></pre>
<p>Main interface for rendering of a world viewed from a certain <code>Camera</code> into a certain <code>Viewport</code>. The <code>ShadingEnvironment</code> is basically just a set of shader constants and resources defined in data (usually containing a description of the lighting environment, post effects and similar). <code>swap_chain</code> is a handle referencing the window that will present the final result.</p>
<p>When the user calls this function a <code>RenderWorldMsg</code> will be created and posted to the ring buffer, holding handles to the rendering representations of the world, camera, viewport and shading environment. When the message is consumed by the rendering thread it will enter the first of the three stages described in the <a href="http://bitsquid.blogspot.se/2017/02/stingray-renderer-walkthrough-1-overview.html">Overview</a> post - <em>Culling</em>.</p>
</li>
<li>
<p>Reflection of state from a <code>World</code> to the <code>RenderWorld</code>.</p>
<p>Reflects the “state delta” (from the last frame) for all objects on the simulation thread over to the render thread. For more details see [<a href="http://bitsquid.blogspot.com/2016/10/the-implementation-of-frustum-culling.html">4</a>].</p>
</li>
<li>
<p>Synchronization.</p>
<pre><code>uint32_t create_fence();
void wait_for_fence(uint32_t fence);
</code></pre>
<p>Synchronization methods for making sure the renderer is finished processing up to a certain point. Used to handle blocking calls and to make sure the simulation doesn’t run more than one frame ahead of the renderer.</p>
</li>
<li>
<p>Presenting a swap chain.</p>
<pre><code>void present_frame(uint32_t swap_chain = 0);
</code></pre>
<p>When the user is done with all rendering for a frame (i.e. has no more <code>render_world</code> calls to do), the application will present the result by looping over all swap chains touched (i.e. referenced in a previous call to <code>render_world</code>) and posting one or many <code>PresentFrameMsg</code> messages to the renderer.</p>
</li>
<li>
<p>Providing statistics from the <code>RenderDevice</code>.</p>
<p>As mentioned in the <a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-3-render.html"><code>RenderContext</code></a> post, we gather various statistics and (if possible) GPU timings in the <code>RenderDevice</code>. Exactly what is gathered depends on the implementation of the <code>RenderDevice</code>. The <code>RenderInterface</code> is responsible for providing a non-blocking interface for retrieving the statistics. Note: the statistics returned will be 2 frames old, as we update them after the rendering thread is done processing a frame (GPU timings are even older). This typically doesn’t matter though, as they usually don’t fluctuate much from one frame to another.</p>
</li>
<li>
<p>Executing user callbacks.</p>
<pre><code>typedef void (*Callback)(void *user_data);
void run_callback(Callback callback, void *user, uint32_t user_data_size);
</code></pre>
<p>Generic callback mechanism to easily inject code to be executed by the rendering thread.</p>
</li>
<li>
<p>Creation, dispatching and releasing of <code>RenderContexts</code> and <code>RenderResourceContexts</code>.</p>
<p>While most systems tends to create, dispatch and release <a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-3-render.html"><code>RenderContexts</code></a> and <a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-2.html"><code>RenderResourceContexts</code></a> from the rendering thread there can be use cases for doing it from another thread (e.g. the resource thread creates <code>RenderResourceContexts</code>). The <code>RenderInterface</code> provides the necessary functions for doing so in a thread-safe way without having to block the rendering thread.</p>
</li>
</ul>
<h1><a id="Wrap_up_112"></a>Wrap up</h1>
<p>The <code>RenderInterface</code> in itself doesn’t get more interesting than that. Something needs to be responsible for coupling of various rendering systems and manage the interface for communicating with the controlling thread of the renderer - the <code>RenderInterface</code> is that something.</p>
<p>In the next post we will walk through the various components building the foundation of the data-driven rendering architecture and go through some examples of how to configure them to do something fun from the <code>render_config</code> file.</p>
<p>Stay tuned.</p>
</body></html>Tobiashttp://www.blogger.com/profile/16240529312060411542noreply@blogger.com48tag:blogger.com,1999:blog-1994130783874175266.post-83146536258093499962017-02-17T13:55:00.000+01:002017-02-17T13:55:03.976+01:00Stingray Renderer Walkthrough #5: RenderDevice<!DOCTYPE html><html><head><meta charset="utf-8"><title>Stingray Renderer Walkthrough #5: RenderDevice</title><style></style></head><body id="preview">
<h1><a id="Overview_0"></a>Overview</h1>
<p>The <code>RenderDevice</code> is essentially our abstraction layer for platform specific rendering APIs. It is implemented as an abstract base class that various rendering back-ends (D3D11, D3D12, OGL, Metal, GNM, etc.) implement.</p>
<p>The <code>RenderDevice</code> has a bunch of helper functions for initializing/shutting down the graphics APIs, creating/destroying swap chains, etc. All of these are fairly straightforward, so I won’t cover them in this post; instead I will focus on the two <code>dispatch</code> functions consuming <code>RenderResourceContexts</code> and <code>RenderContexts</code>:</p>
<pre><code>
class RenderDevice {
public:
virtual void dispatch(uint32_t n_contexts, RenderResourceContext **rrc,
uint32_t gpu_affinity_mask = RenderContext::GPU_DEFAULT) = 0;
virtual void dispatch(uint32_t n_contexts, RenderContext **rc,
uint32_t gpu_affinity_mask = RenderContext::GPU_DEFAULT) = 0;
};
</code></pre>
<h1><a id="Resource_Management_19"></a>Resource Management</h1>
<p>As covered in the post about <a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-2.html"><code>RenderResourceContexts</code></a>, they provide a free-threaded interface for allocating and deallocating GPU resources. However, it is not until the user has called <code>RenderDevice::dispatch()</code>, handing over the <code>RenderResourceContexts</code>, that their representations get created on the <code>RenderDevice</code> side.</p>
<p>All implementations of a <code>RenderDevice</code> have some form of resource management that deals with creating, updating and destroying of the graphics API specific representations of resources. Typically we track the state of all various types of resources in a single struct, here’s a stripped down example from the DX12 <code>RenderDevice</code> implementation called <code>D3D12ResourceContext</code>:</p>
<pre><code>
struct D3D12VertexBuffer
{
D3D12_VERTEX_BUFFER_VIEW view;
uint32_t allocation_index;
int32_t size;
};
struct D3D12IndexBuffer
{
D3D12_INDEX_BUFFER_VIEW view;
uint32_t allocation_index;
int32_t size;
};
struct D3D12ResourceContext
{
Array<D3D12VertexBuffer> vertex_buffers;
Array<uint32_t> unused_vertex_buffers;
Array<D3D12IndexBuffer> index_buffers;
Array<uint32_t> unused_index_buffers;
// .. lots of other resources
Array<uint32_t> resource_lut;
};
</code></pre>
<p>As you might <a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-2.html">remember</a>, the linking between the engine representation and the <code>RenderDevice</code> representation is done using the <code>RenderResource::render_resource_handle</code>. It encodes both the type of the resource as well as a handle. The <code>resource_lut</code> is an indirection to go from the engine handle to a local index for a specific type (e.g. <code>vertex_buffers</code> or <code>index_buffers</code> in the sample above). We also track freed indices for each type (e.g. <code>unused_vertex_buffers</code>) to simplify recycling of slots.</p>
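<p>As an illustration of that indirection, here is a sketch of how a <code>render_resource_handle</code> could be translated into a <code>D3D12VertexBuffer</code> from the struct above. The exact bit layout of the handle (type in the top bits, index in the remaining bits) is an assumption made up for this example:</p>
<pre><code>#include <cstdint>

// Hypothetical handle layout: resource type in the top 8 bits, handle index
// in the lower 24 bits. resource_lut maps the handle index to a slot in the
// type-specific array (vertex_buffers in this case).
inline uint32_t resource_type(uint32_t render_resource_handle) {
    return render_resource_handle >> 24;
}
inline uint32_t resource_index(uint32_t render_resource_handle) {
    return render_resource_handle & 0x00ffffffu;
}

const D3D12VertexBuffer &lookup_vertex_buffer(const D3D12ResourceContext &context,
    uint32_t render_resource_handle)
{
    // assumes resource_type(render_resource_handle) identifies a vertex buffer
    uint32_t local_index = context.resource_lut[resource_index(render_resource_handle)];
    return context.vertex_buffers[local_index];
}
</code></pre>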
<p>The implementation of the dispatch function is fairly straightforward. We simply iterate over all the <code>RenderResourceContexts</code> and, for each context, iterate over its commands and either allocate or deallocate resources in the <code>D3D12ResourceContext</code>. It is important to note that this is a synchronous operation: nothing else is peeking or poking at the <code>D3D12ResourceContext</code> while the dispatch of <code>RenderResourceContexts</code> is happening, which makes our life a lot easier.</p>
<p>Unfortunately that isn’t the case when we dispatch <code>RenderContexts</code>, as in that case we want to go wide (i.e. forking the workload and processing it using multiple worker threads) when translating the commands into API calls. While we don’t allow allocating and deallocating new resources from the <code>RenderContexts</code>, we do allow updating them, which mutates the state of the <code>RenderDevice</code> representations (e.g. a <code>D3D12VertexBuffer</code>).</p>
<p>At the moment our solution for this isn’t very nice: basically we don’t allow asynchronous updates for anything other than <code>DYNAMIC</code> buffers. <code>UPDATABLE</code> buffers are always updated serially before we kick the worker threads, no matter what their <code>sort_key</code> is. All worker threads access resources through their own copy of something we call a <code>ResourceAccessor</code>, which is responsible for tracking the worker thread’s state of dynamic buffers (among other things). In the future I think we should probably generalize this and treat <code>UPDATABLE</code> buffers in a similar way.</p>
<p>(Note: this limitation doesn’t mean you can’t update an <code>UPDATABLE</code> buffer more than once per frame, it simply means you cannot update it more than once per <code>dispatch</code>).</p>
<h2><a id="Shaders_67"></a>Shaders</h2>
<p>Resources in the <code>D3D12ResourceContext</code> are typically buffers. One exception that stands out is the <code>RenderDevice</code> representation of a “shader”. A “shader” on the <code>RenderDevice</code> side maps to a <code>ShaderTemplate::Context</code> on the engine side, or what I guess we could call a multi-pass shader. Here’s some pseudo code:</p>
<pre><code>
struct ShaderPass
{
struct ShaderProgram
{
Array<uint8_t> bytecode;
struct ConstantBufferBindInfo;
struct ResourceBindInfo;
struct SamplerBindInfo;
};
ShaderProgram vertex_shader;
ShaderProgram domain_shader;
ShaderProgram hull_shader;
ShaderProgram geometry_shader;
ShaderProgram pixel_shader;
ShaderProgram compute_shader;
struct RenderStates;
};
struct Shader
{
Vector<ShaderPass> passes;
enum SortMode { IMMEDIATE, DEFERRED };
uint32_t sort_mode;
};
</code></pre>
<p>The pseudo code above is essentially the <code>RenderDevice</code> representation of a shader that we serialize to disk during data compilation. From that we can create all the necessary graphics API specific objects expressing an executable shader together with its various state blocks (Rasterizer, Depth Stencil, Blend, etc.).</p>
<p>As discussed in the last <a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-4-sorting.html">post</a> the <code>sort_key</code> encodes the shader pass index. Using <code>Shader::sort_mode</code>, we know which bit range to extract from the <code>sort_key</code> as pass index, which we then use to look up the <code>ShaderPass</code> from <code>Shader::passes</code>. A <code>ShaderPass</code> contains one <code>ShaderProgram</code> per active shader stage and each <code>ShaderProgram</code> contains the byte code for the shader to compile as well as “bind info” for various resources that the shader wants as input.</p>
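<p>To make that concrete, extracting the pass index from the <code>sort_key</code> could look something like the sketch below. The bit positions follow the <code>sort_key</code> breakdown described in the sorting post and are stated here purely for illustration:</p>
<pre><code>#include <cstdint>

// Sketch: which bits of the sort_key hold the pass index depends on the
// shader's sort mode. The positions (3 LSBs for "Pass Immediate", bits 52-54
// for "Pass Deferred") are assumptions derived from the sort_key breakdown.
enum SortModeSketch { SORT_IMMEDIATE, SORT_DEFERRED };

uint32_t extract_pass_index(uint64_t sort_key, uint32_t sort_mode)
{
    if (sort_mode == SORT_IMMEDIATE)
        return (uint32_t)(sort_key & 0x7);
    return (uint32_t)((sort_key >> 52) & 0x7);
}
</code></pre>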
<p>We will look at this in a bit more detail in the post about “Shaders & Materials”, for now I just wanted to familiarize you with the concept.</p>
<h1><a id="Render_Context_translation_107"></a>Render Context translation</h1>
<p>Let’s move on and look at the dispatch for translating <code>RenderContexts</code> into graphics API calls:</p>
<pre><code>class RenderDevice {
public:
virtual void dispatch(uint32_t n_contexts, RenderContext **rc,
uint32_t gpu_affinity_mask = RenderContext::GPU_DEFAULT) = 0;
};
</code></pre>
<p>The first thing all <code>RenderDevice</code> implementations do when receiving a bunch of <code>RenderContexts</code> is to merge and sort their <code>Commands</code>. All implementations share the same code for doing this:</p>
<pre><code>void prepare_command_list(RenderContext::Commands &output, unsigned n_contexts, RenderContext **contexts);
</code></pre>
<p>This function basically just takes the <code>RenderContext::Commands</code> from all <code>RenderContexts</code> and merges them into a new array, runs a stable radix sort, and returns the sorted commands in <code>output</code>. To avoid memory allocations the <code>RenderDevice</code> implementation owns the memory of the output buffer.</p>
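<p>A minimal sketch of what this boils down to is shown below. For brevity it uses <code>std::vector</code> and <code>std::stable_sort</code> where the real implementation uses the engine’s own containers and a stable radix sort, and <code>contexts[i]->commands()</code> is an assumed accessor for a context’s <code>Command</code> array:</p>
<pre><code>#include <algorithm>
#include <vector>

// Sketch: merge the Command arrays of all RenderContexts into one buffer
// owned by the RenderDevice, then stable-sort on sort_key. A stable sort
// keeps the recording order for commands with identical keys.
void prepare_command_list_sketch(std::vector<Command> &output,
    unsigned n_contexts, RenderContext **contexts)
{
    output.clear();
    for (unsigned i = 0; i != n_contexts; ++i) {
        const std::vector<Command> &commands = contexts[i]->commands(); // assumed accessor
        output.insert(output.end(), commands.begin(), commands.end());
    }

    std::stable_sort(output.begin(), output.end(),
        [](const Command &a, const Command &b) { return a.sort_key < b.sort_key; });
}
</code></pre>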
<p>Now we have all the commands nicely sorted based on their <code>sort_key</code>. Next step is to do the actual translation of the data referenced by the commands into graphics API calls. I will explain this process with the assumption that we are running on a graphics API that allows us to build graphics API command lists in parallel (e.g. DX12, GNM, Vulkan, Metal), as that feels most relevant in 2017.</p>
<p>Before we start figuring out our per-thread workloads for going wide, we have one more thing to do: “instance merging”.</p>
<h2><a id="Instance_Merging_131"></a>Instance Merging</h2>
<p>I’ve mentioned the idea behind instance merging before [<a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-3-render.html">1</a>,<a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-4-sorting.html">2</a>]: basically we want to try to reduce the number of <code>RenderJobPackages</code> (i.e. draw calls) by identifying packages that are similar enough to be merged. In Stingray “similar enough” basically means that they must have identical inputs to the input assembler as well as identical resources bound to all shader stages; the only thing that is allowed to differ are constant buffer variables. (Note: by today’s standards this can be considered a bit old school, as new graphics APIs and hardware allow tackling this problem more aggressively using “bindless” concepts.)</p>
<p>The way it works is by filtering out ranges of <code>RenderContext::Commands</code> where the “instance bit” of the <code>sort_key</code> is set and all bits above the instance bit are identical. Then for each of those ranges we fork and go wide to analyze the actual <code>RenderJobPackage</code> data to see if the <code>instance_hash</code> and the shader are the same, and if so we know it’s safe to merge them.</p>
<p>The actual merge is done by extracting the instance-specific constants (these are tagged by the shader author) from the constant buffers and propagating them into a dynamic <code>RawBuffer</code> that gets bound as input to the vertex shader.</p>
<p>Depending on how the scene is constructed, instance merging can significantly reduce the number of draw calls needed to render the final scene. The instance merger in itself is not graphics API specific and is isolated in its own system; it just happens to be the responsibility of the <code>RenderDevice</code> to call it. The interface looks like this:</p>
<pre><code>namespace instance_merger {
struct ProcessMergedCommandsResult
{
uint32_t n_instances;
uint32_t instanced_batches;
uint32_t instance_buffer_size;
};
ProcessMergedCommandsResult process_merged_commands(Merger &instance_merger,
RenderContext::Commands &merged_commands);
}
</code></pre>
<p>Pass in a reference to the sorted <code>RenderContext::Commands</code> in <code>merged_commands</code> and after the instance merger is done running you hopefully have fewer commands in the array. :)</p>
<p>You could argue that merging, sorting and instance merging should all happen before we enter the world of the <code>RenderDevice</code>. I wouldn’t argue against that.</p>
<h2><a id="Prepare_workloads_161"></a>Prepare workloads</h2>
<p>The last step before we can start translating our commands into state / draw / dispatch calls is to split the workload into reasonable chunks and prepare the execution contexts for our worker threads.</p>
<p>Typically we just divide the number of <code>RenderContext::Commands</code> we have to process by the number of worker threads we have available. We don’t care about the type of the different commands we will be processing or try to load balance differently. The reasoning behind this is that we anticipate that draw calls will always represent the bulk of the commands and that the rest of the commands can be considered unavoidable “noise”. We do, however, make sure that we don’t do less than <em>x</em> number of commands per worker thread, where <em>x</em> can differ a bit depending on the platform but is usually ~128.</p>
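<p>The split itself is trivial; something along these lines, with the minimum batch size as the only tuning parameter:</p>
<pre><code>#include <algorithm>
#include <cstdint>

// Sketch: number of commands each worker thread gets to translate. We round
// up so all commands are covered and clamp to a minimum batch size (~128),
// which for small frames simply means fewer workers end up with work.
uint32_t commands_per_worker(uint32_t n_commands, uint32_t n_workers,
    uint32_t min_commands_per_worker = 128)
{
    uint32_t per_worker = (n_commands + n_workers - 1) / n_workers;
    return std::max(per_worker, min_commands_per_worker);
}
</code></pre>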
<p>For each execution context we create a <code>ResourceAccessor</code> (described above) as well as make sure we have the correct state set up in terms of bound render targets and similar. To do this we are stuck with having to do a synchronous, serial sweep over all the commands to find bigger state-changing commands (such as <code>RenderContext::set_render_target</code>).</p>
<p>This is where the <code>Command::command_flags</code> bit-flag comes into play, instead of having to jump around in memory to figure out what type of command the <code>Command::head</code> points to, we put some hinting about the type in the <code>Command::command_flags</code>, like for example if it is a “state command”. This way the serial sweep doesn’t become very costly even when dealing with large number of commands. During this sweep we also deal with updating of <code>UPDATABLE</code> resources, and on newer graphics APIs we track fences (discussed in the post about <a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-3-render.html">Render Contexts</a>).</p>
<p>The last thing we do is to set up the execution contexts with the graphics API specific representations of command lists (e.g. <code>ID3D12GraphicsCommandList</code> in DX12).</p>
<h2><a id="Translation_173"></a>Translation</h2>
<p>Once we get to this point, doing the actual translation is fairly straightforward. Within each worker thread we simply loop over its dedicated range of commands, fetch the data from <code>Command::head</code> and generate any number of API-specific commands necessary based on the type of command.</p>
<p>For a <code>RenderJobPackage</code> representing a draw call it involves:</p>
<ul>
<li>Look up the correct shader pass and, unless already bound, bind all active shader stages</li>
<li>Look up the state blocks (Rasterizer, Depth stencil, Blending, etc.) from the shader and bind them unless already bound</li>
<li>Look up and bind the resources for each shader stage using the <code>RenderResource::render_resource_handle</code> translated through the <code>D3D12ResourceAccessor</code></li>
<li>Setup the input assembler by looping over the <code>RenderResource::render_resource_handles</code> pointed to by the <code>RenderJobPackage::resource_offset</code> and translated through the <code>D3D12ResourceAccessor</code></li>
<li>Bind and potentially update constant buffers</li>
<li>Issue the draw call</li>
</ul>
<p>The execution contexts also hold most-recently-used caches to avoid unnecessary binds of resources/shaders/states etc.</p>
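<p>Conceptually those caches are nothing more than remembering what was last bound and early-outing on redundant binds, along the lines of this sketch (names are hypothetical):</p>
<pre><code>// Sketch of redundant-bind filtering in an execution context: remember the
// last bound object per slot and skip the graphics API call if it hasn't
// changed since the previous draw call.
template <typename T, typename BindFn>
void bind_if_changed(const T *wanted, const T *&last_bound, BindFn bind)
{
    if (wanted == last_bound)
        return;            // already bound - skip the API call
    bind(wanted);
    last_bound = wanted;
}
</code></pre>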
<p>Note: In DX12 we also track where resource barriers are needed during this stage. After all worker threads are done we might also end up having to inject further resource barriers between the command lists generated by the worker threads. We have ideas on how to improve on this by doing at least parts of the tracking when building the <code>RenderContexts</code>, but we haven’t gotten around to looking into it yet.</p>
<h2><a id="Execute_190"></a>Execute</h2>
<p>When the translation is done we pass the resulting command lists to the correct queues for execution.</p>
<p>Note: In DX12 this is a bit more complicated as we have to interleave signaling / waiting on fences between command list execution (<code>ExecuteCommandList</code>).</p>
<h1><a id="Next_up_196"></a>Next up</h1>
<p>I’ve deliberately not dived into too much detail in this post to make it a bit easier to digest. I think I’ve managed to cover the overall design of a <code>RenderDevice</code> though, enough to make it easier for people diving into the code for the first time.</p>
<p>With this post we’ve reached the half-way point of this series and covered the “low-level” aspects of the Stingray rendering architecture. Starting with the next post we will look at more high-level stuff, beginning with the <code>RenderInterface</code>, which is the main interface for other threads to talk to the renderer.</p>
</body></html>Tobiashttp://www.blogger.com/profile/16240529312060411542noreply@blogger.com60tag:blogger.com,1999:blog-1994130783874175266.post-54951780518470957862017-02-14T14:56:00.002+01:002017-02-14T14:56:40.441+01:00Stingray Renderer Walkthrough #4: Sorting<!DOCTYPE html><html><head><meta charset="utf-8"><title>Stingray Renderer Walkthrough #4: Sorting</title><style></style></head><body id="preview">
<h1><a id="Introduction_0"></a>Introduction</h1>
<p>This post will focus on ordering of the commands in the <code>RenderContexts</code>. I briefly touched on this subject in the last <a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-3-render.html">post</a> and if you’ve implemented a rendering engine before you’re probably not new to this problem. Basically we need a way to make sure our <code>RenderJobPackages</code> (draw calls) end up on the screen in the correct order, both from a visual point of view as well as from a performance point of view. Some concrete examples,</p>
<ol>
<li>Make sure g-buffers and shadow maps are rendered before any lighting happens.</li>
<li>Make sure opaque geometry is rendered front to back to reduce overdraw.</li>
<li>Make sure transparent geometry is rendered back to front for alpha blending to generate correct results.</li>
<li>Make sure the sky dome is rendered after all opaque geometry but before any transparent geometry.</li>
<li>All of the above but also strive to reduce state switches as much as possible.</li>
<li>All of the above but depending on GPU architecture maybe shift some work around to better utilize the hardware.</li>
</ol>
<p>There are many ways of tackling this problem, and it’s not uncommon that engines use multiple sorting systems and spend quite a lot of frame time getting this right.</p>
<p>Personally I’m a big fan of explicit ordering with a single stable sort. What I mean by explicit ordering is that every command that gets recorded to a <code>RenderContext</code> already has the knowledge of when it will be executed relative to other commands. For us this knowledge is in the form of a 64-bit <code>sort_key</code>; in the case where we get two commands with the exact same <code>sort_key</code> we rely on the sort being stable so as not to introduce any kind of temporal instability in the final output.</p>
<p>The reasons I like this approach are many,</p>
<ol>
<li>It’s trivial to implement compared to various bucketing schemes and sorting of those buckets.</li>
<li>We only need to visit renderable objects once per view (when calling their <code>render()</code> function), no additional pre-visits for sorting are needed.</li>
<li>The sort is typically fast, and cost is isolated and easy to profile.</li>
<li>Parallel rendering works out of the box, we can just take all the <code>Command</code> arrays of all the <code>RenderContexts</code> and merge them before sorting.</li>
</ol>
<p>To make this work each command needs to know its absolute <code>sort_key</code>. Let’s break down the <code>sort_key</code> we use when working with our data-driven rendering pipe in Stingray. (Note: if the user doesn’t care about playing nicely with our system for data-driven rendering, it is fine to completely ignore the bit allocation patterns described below and roll their own.)</p>
<h1><a id="sort_key_breakdown_24"></a><code>sort_key</code> breakdown</h1>
<p>Most significant bit on the left, here are our bit ranges:</p>
<pre><code>MSB [ 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ] LSB
^ ^ ^ ^ ^^ ^
| | | | || |- 3 bits - Shader System (Pass Immediate)
| | | | ||- 16 bits - Depth
| | | | |- 1 bit - Instance bit
| | | |- 32 bits - User defined
| | |- 3 bits - Shader System (Pass Deferred)
| - 7 bits - Layer System
|- 2 bits - Unused
</code></pre>
<p><strong><code>2 bits - Unused</code></strong></p>
<p>Nothing to see here, moving on… (Not really sure why these 2 bits are unused, I guess they weren’t at some point but for the moment they are always zero) :)</p>
<p><strong><code>7 bits - Layer System</code></strong></p>
<p>This 7-bits range is managed by the “Layer system”. The Layer system is responsible for controlling the overall scheduling of a frame and is set up in the <code>render_config</code> file. It’s a central part of the data-driven rendering architecture in Stingray. It allows you to configure what layers to expose to the shader system and in which order these layers should be drawn. We will look closer at the implementation of the layer system in a later post but in the interest of clarifying how it interops with the <code>sort_key</code> here’s a small example:</p>
<pre><code>
default = [
// sort_key = [ 00000000 10000000 00000000 00000000 00000000 00000000 00000000 00000000 ]
{ name="gbuffer" render_targets=["gbuffer0", "gbuffer1", "gbuffer2", "gbuffer3"]
depth_stencil_target="depth_stencil_buffer" sort="FRONT_BACK" profiling_scope="gbuffer" }
// sort_key = [ 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ]
{ name="decals" render_targets=["gbuffer0" "gbuffer1"] depth_stencil_target="depth_stencil_buffer"
profiling_scope="decal" sort="EXPLICIT" }
// sort_key = [ 00000001 10000000 00000000 00000000 00000000 00000000 00000000 00000000 ]
{ resource_generator="lighting" profiling_scope="lighting" }
// sort_key = [ 00000010 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ] LSB
{ name="emissive" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer"
sort="FRONT_BACK" profiling_scope="emissive" }
]
</code></pre>
<p>Above we have three layers exposed to the shader system and one kick of a <code>resource_generator</code> called <code>lighting</code> (more about <code>resource_generators</code> in a later post). The layers are rendered in the order they are declared; this is handled by letting each new layer increment the 7-bit range belonging to the Layer System by 1 (as can be seen in the <code>sort_key</code> comments above).</p>
<p>The shader author dictates into which layer(s) it wants to render. When a <code>RenderJobPackage</code> is recorded to the <code>RenderContext</code> (as described in the last <a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-3-render.html">post</a>) the correct layer <code>sort_keys</code> are looked up from the layer system and the result is bitwise ORed together with the <code>sort_key</code> value piped as argument to <code>RenderContext::render()</code>.</p>
<p><strong><code>3 bits - Shader System (Pass Deferred)</code></strong></p>
<p>The next 3 bits are controlled by the Shader System. These three bits encode the shader pass index <em>within</em> a layer. When I say shader in this context I refer to our <code>ShaderTemplate::Context</code> which is basically a wrapper around multiple linked shaders rendering into one or many layers. (Nathan Reed recently blogged about <a href="http://reedbeta.com/blog/many-meanings-of-shader/">“The Many Meanings of “Shader””</a>, in his analogy our <code>ShaderTemplate</code> is the same as an “Effect”)</p>
<p>Since we can have a multi-pass shader rendering into the same layer we need to encode the pass index into the <code>sort_key</code>, that is what this 3 bit range is used for.</p>
<p><strong><code>32 bits - User defined</code></strong></p>
<p>We then have 32 user-defined bits. These bits are primarily used by our “Resource Generator” system (I will be covering this system in the post about <code>render_config</code> &amp; data-driven rendering later), but the user is free to use them any way they like and still maintain compatibility with the data-driven rendering system.</p>
<p><strong><code>1 bit - Instance bit</code></strong></p>
<p>This single bit also comes from the Shader System and is set if the shader implements support for “Instance Merging”. I will be covering this in a bit more detail in my next post about the <code>RenderDevice</code>, but essentially this bit allows us to scan through all commands and find ranges of commands that can potentially be merged into fewer draw calls.</p>
<p><strong><code>16 bits - Depth</code></strong></p>
<p>One of the arguments piped to <code>RenderContext::render()</code> is an unsigned normalized depth value (0.0-1.0). This value gets quantized into these 16 bits and is what drives the front-to-back vs back-to-front sorting of <code>RenderJobPackages</code>. If the sorting criterion for the layer (see the layer example above) is set to back-to-front we simply flip the bits in this range.</p>
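<p>Putting the ranges described so far together, composing the <code>sort_key</code> for a draw call could look roughly like the sketch below. The exact shift amounts are derived from the bit breakdown above and should be read as illustrative rather than authoritative:</p>
<pre><code>#include <cstdint>

// Sketch: composing a draw call sort_key from the layer bits, the deferred
// pass index, the instance bit and the quantized depth. Shift amounts follow
// the bit breakdown above (layer at bits 55-61, deferred pass at 52-54,
// instance at 19, depth at 3-18) and are for illustration only.
uint64_t make_sort_key(uint64_t layer_sort_key, uint32_t pass_index,
    float normalized_depth, bool back_to_front, bool instanced)
{
    uint32_t depth = (uint32_t)(normalized_depth * 65535.0f + 0.5f) & 0xffff;
    if (back_to_front)
        depth = ~depth & 0xffff;    // flip the range for back-to-front sorting

    uint64_t key = layer_sort_key;                 // layer bits already set
    key |= (uint64_t)(pass_index & 0x7) << 52;     // Shader System (Pass Deferred)
    key |= (uint64_t)(instanced ? 1 : 0) << 19;    // Instance bit
    key |= (uint64_t)depth << 3;                   // 16-bit quantized depth
    return key;
}
</code></pre>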
<p><strong><code>3 bits - Shader System (Pass Immediate)</code></strong></p>
<p>A shader can be configured to run in “Immediate Mode” instead of “Deferred Mode” (default). This forces passes in a multi-pass shader to run immediately after each other and is achieved by moving the pass index bits into the least significant bits of the <code>sort_key</code>. The concept is probably easiest to explain with an artificial example and some pseudo code:</p>
<p>Take a simple scene with a few instances of the same mesh, each mesh recording one <code>RenderJobPackage</code> to one or many <code>RenderContexts</code>, and all <code>RenderJobPackages</code> being rendered with the same multi-pass shader.</p>
<p>In “Deferred Mode” (i.e pass indices encoded in the “Shader System (Pass Deferred)” range) you would get something like this:</p>
<pre><code>foreach (pass in multi-pass-shader)
foreach (render-job in render-job-packages)
render (render-job)
end
end
</code></pre>
<p>If the shader is configured to run in “Immediate Mode” you would instead get something like this:</p>
<pre><code>foreach (render-job in render-job-packages)
foreach (pass in multi-pass-shader)
render (render-job)
end
end
</code></pre>
<p>As you can probably imagine, the latter results in more shader / state switches but can sometimes be necessary to guarantee correctly rendered results. A typical example is when using multi-pass shaders that do alpha blending.</p>
<h1><a id="Wrap_up_122"></a>Wrap up</h1>
<p>The actual sort is implemented using a standard stable radix sort and happens immediately after the user has called <code>RenderDevice::dispatch()</code> handing over <em>n</em>-number of <code>RenderContexts</code> to the <code>RenderDevice</code> for translation into graphics API calls.</p>
<p>Next post will cover this and give an overview of what a typical rendering back-end (<code>RenderDevice</code>) looks like in Stingray. Stay tuned.</p>
</body></html>Tobiashttp://www.blogger.com/profile/16240529312060411542noreply@blogger.com449tag:blogger.com,1999:blog-1994130783874175266.post-29384100176024563872017-02-10T15:11:00.000+01:002017-02-10T15:11:10.216+01:00Stingray Renderer Walkthrough #3: Render Contexts<!DOCTYPE html><html><head><meta charset="utf-8"><title>Stingray Renderer Walkthrough #3: Render Contexts</title><style></style></head><body id="preview">
<h1><a id="Render_Contexts_Overview_0"></a>Render Contexts Overview</h1>
<p>In the last post we covered how to create and destroy various GPU resources. In this post we will go through the system we have for recording a stream of rendering commands/packages that later gets consumed by the render backend (<code>RenderDevice</code>) where they are translated into actual graphics API calls. We call this interface <code>RenderContext</code> and similar to <code>RenderResourceContext</code> we can have multiple <code>RenderContexts</code> in flight at the same time to achieve data parallelism.</p>
<p>Let’s back up and reiterate a bit what was said in the <a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-1-overview.html">Overview</a> post. Typically in a frame we take the result of the view frustum culling, split it up into a number of chunks, allocate one <code>RenderContext</code> per chunk and then kick one worker thread per chunk. Each worker thread then sequentially iterates over its range of renderable objects and calls their <code>render()</code> function. The <code>render()</code> function takes the chunk’s <code>RenderContext</code> as one of its arguments and is responsible for populating it with commands. When all worker threads are done the resulting <code>RenderContexts</code> get “dispatched” to the <code>RenderDevice</code>.</p>
<p>So essentially the <code>RenderContext</code> is the output data structure for the second stage <code>Render</code> as discussed in the <a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-1-overview.html">Overview</a> post.</p>
<p>The <code>RenderContext</code> is very similar to the <code>RenderResourceContext</code> in the sense that it’s a fairly simple helper class for populating a command buffer. There is one significant difference though: the <code>RenderContext</code> also has a mechanism for reasoning about the ordering of the commands in the buffer before they get translated into graphics API calls by the <code>RenderDevice</code>.</p>
<h1><a id="Ordering__Buffers_10"></a>Ordering & Buffers</h1>
<p>We need a way to reorder commands in one or many <code>RenderContexts</code> to make sure triangles end up on the screen in the right order, or, more generally speaking, to schedule our GPU work.</p>
<p>There are many ways of dealing with this, but my favorite approach is to just associate one or many commands with a 64-bit sort key and, when all commands have been recorded, simply sort them on this key before translating them into actual graphics API calls. The approach we are using in Stingray is heavily inspired by Christer Ericson’s blog post <a href="http://realtimecollisiondetection.net/blog/?p=86">“Order your graphics draw calls around!”</a>. I will be covering our sorting system in more detail in my next post; for now the only thing important to grasp is that while the <code>RenderContext</code> records commands it does so by populating two buffers. One is a simple array of a POD struct called <code>Command</code>:</p>
<pre><code>struct Command
{
uint64_t sort_key;
void *head;
uint32_t command_flags;
};
</code></pre>
<ul>
<li><code>sort_key</code> - 64 bit sort key used for reordering commands before being consumed by the <code>RenderDevice</code>, more on this later.</li>
<li><code>head</code> - Pointer to the actual data for this command.</li>
<li><code>command_flags</code> - A bit flag encoding some hinting about what kind of command <code>head</code> is actually pointing to. This is simply an optimization to reduce pointer chasing in the <code>RenderDevice</code>, it will be covered in more detail in a later post.</li>
</ul>
<h2><a id="Render_Package_Stream_29"></a>Render Package Stream</h2>
<p>The other buffer is what we call a <code>RenderPackageStream</code> and is what holds the actual command data. The <code>RenderPackageStream</code> class is essentially just a few helper functions to put arbitrary length commands into memory. The memory backing system for <code>RenderPackageStreams</code> is somewhat more complex than a simple array though; this is because we need a way to keep its memory footprint under control. For efficiency, we want to recycle the memory instead of reallocating it every frame, but depending on workload we are likely to get some <code>RenderContexts</code> becoming much larger than others. This creates a problem when using simple arrays to store the commands as the workload will shift slightly over time, causing all arrays to grow to fit the worst case scenario, resulting in lots of wasted memory.</p>
<p>To combat this we allocate and return fixed size blocks of memory from a pool. As we know the size of each command before writing them to the buffer we can make sure that a command doesn’t end up spanning multiple blocks; if we detect that we are about to run out of memory in the active block we simply allocate a new block and move on. If we detect that a single command will span multiple blocks we make sure to allocate them sequentially in memory. We return a block to the pool when we are certain that the consumer of the data (in this case the <code>RenderDevice</code>) is done with it. (This memory allocation approach is well described in Christian Gyrling’s excellent GDC 2015 presentation <a href="http://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine">Parallelizing the Naughty Dog Engine Using Fibers</a>)</p>
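<p>To make the idea a bit more concrete, here is a minimal sketch of what the allocation path of such a block-based stream could look like. This is purely illustrative; the names (<code>BlockPool</code>, <code>acquire_block()</code> and so on) are made up and not the actual Stingray API, and handling of commands larger than a single block is left out.</p>
<pre><code>#include <cstdint>

// Hypothetical fixed size memory block handed out by a recycling pool. Blocks are
// returned to the pool once the consumer (the RenderDevice) is done with them.
struct Block { uint8_t *begin; uint8_t *end; };
struct BlockPool { Block acquire_block(); };

class PackageStreamSketch
{
public:
    explicit PackageStreamSketch(BlockPool &pool)
        : _pool(pool), _active(pool.acquire_block()), _cursor(_active.begin) {}

    // Returns memory for a command of `size` bytes. A command never straddles a
    // block boundary; if it doesn't fit we simply move on to a fresh block.
    // (Commands larger than a whole block would need sequential blocks, omitted here.)
    void *allocate(uint32_t size)
    {
        if (_cursor + size > _active.end) {
            _active = _pool.acquire_block();
            _cursor = _active.begin;
        }
        void *p = _cursor;
        _cursor += size;
        return p;
    }

private:
    BlockPool &_pool;
    Block _active;
    uint8_t *_cursor;
};
</code></pre>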
<p>You might be wondering why we put the <code>sort_key</code> in a separate array instead of putting it directly into the header data of the packages written to the <code>RenderPackageStream</code>. There are a number of reasons for that:</p>
<ol>
<li>
<p>The actual package data can become fairly large even for regular draw calls. Since we want to make the packages self contained we have to put all data needed to translate the command into a graphics API call inside the package. This includes handles to all resources, constant buffer reflections and similar. I don’t know of any way to efficiently sort an array with elements of varying sizes.</p>
</li>
<li>
<p>Since we allocate the memory in blocks, as described above, we would need to introduce some form of “jump label” and insert that into the buffer to know how and when to jump into the next memory block. This would further complicate the sorting and traversal of the buffers.</p>
</li>
<li>
<p>It allows us to recycle the actual package data from one draw call to another when rendering multi-pass shaders, as we can simply inject multiple <code>Command</code>s pointing to the same package data. (Which shader pass to use when translating the package into graphic API calls can later be extracted from the <code>sort_key</code>.)</p>
</li>
<li>
<p>We can reduce pointer chasing by encoding hints in the <code>Command</code> about the contents of the package data. This is what we do in <code>command_flags</code> mentioned earlier.</p>
</li>
</ol>
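<p>With the sort keys living in their own tightly packed <code>Command</code> array, the scheduling step before translation becomes trivial; conceptually it is nothing more than a sort on the 64 bit key. A minimal sketch of the idea (not the actual dispatch code):</p>
<pre><code>#include <algorithm>
#include <cstdint>

struct Command
{
    uint64_t sort_key;
    void *head;
    uint32_t command_flags;
};

// Merge the command arrays from all RenderContexts, sort them on the 64 bit key and
// then walk the result in order, translating each package into graphics API calls.
void sort_and_dispatch(Command *commands, uint32_t n_commands /*, RenderDevice &device */)
{
    std::stable_sort(commands, commands + n_commands,
        [](const Command &a, const Command &b) { return a.sort_key < b.sort_key; });

    // for (uint32_t i = 0; i != n_commands; ++i)
    //     device.translate(commands[i]);   // hypothetical backend entry point
}
</code></pre>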
<h1><a id="Render_Context_interface_45"></a>Render Context interface</h1>
<p>With the low-level concepts of the <code>RenderContext</code> covered let’s move on and look at how it is used from a user’s perspective.</p>
<p>If we break down the API there are essentially three different types of commands that populate a <code>RenderContext</code>:</p>
<ol>
<li><strong>State commands</strong> - Commands affecting the state of the rendering pipeline (e.g. render target bindings, viewports, scissoring, etc.) + some miscellaneous commands.</li>
<li><strong>Rendering commands</strong> - Commands used to trigger draw calls and compute work on the GPU.</li>
<li><strong>Resource update commands</strong> - Commands for updating GPU resources.</li>
</ol>
<h2><a id="1_State_Commands_55"></a>1. State Commands</h2>
<p>“State commands” are a series of commands getting executed in sequence for a specific <code>sort_key</code>. The interface for starting/stopping the recording looks like this:</p>
<pre><code>class RenderContext
{
    void begin_state_command(uint64_t sort_key, uint32_t gpu_affinity_mask = GPU_DEFAULT);
    void end_state_command();
};
</code></pre>
<ul>
<li><code>sort_key</code> - the 64 bit sort key.</li>
<li><code>gpu_affinity_mask</code> - I will cover this towards the end of this post, but for now just think of it as a bit mask for addressing one or many GPUs.</li>
</ul>
<p>Here’s a small example showing what the recording of a few state commands might look like:</p>
<pre><code>rc.begin_state_command(sort_key);
for (uint32_t i=0; i!=MAX_RENDER_TARGETS; ++i)
    rc.set_render_target(i, nullptr);
rc.set_depth_stencil_target(depth_shadow_map);
rc.clear(RenderContext::CLEAR_DEPTH);
rc.set_viewports(1, &viewport);
rc.set_scissor_rects(1, &scissor_rect);
rc.end_state_command();
</code></pre>
<p>While state commands are primarily used for bigger graphics pipeline state changes (e.g. changing render targets), they are also used for some miscellaneous things like clearing of bound render targets, pushing/popping timer markers, and some other stuff. There is no obvious reasoning for grouping these things together under the name “state commands”, it’s just something that has happened over time. Keep that in mind as we go through the list of commands below.</p>
<h3><a id="Common_commands_86"></a>Common commands</h3>
<ul>
<li>
<p><code>set_render_target(uint32_t slot, RenderTarget *target, const SurfaceInfo& surface_info);</code></p>
<ul>
<li><code>slot</code> - Which index of the “Multiple Render Target” (MRT) chain to bind</li>
<li><code>target</code> - What <code>RenderTarget</code> to bind</li>
<li><code>surface_info</code> - <code>SurfaceInfo</code> is a struct describing which surface of the <code>RenderTarget</code> to bind.</li>
</ul>
<pre><code>struct SurfaceInfo {
    uint32_t array_index; // 0 in all cases except if binding a texture array
    uint32_t slice;       // 0 for 2D textures, 0-5 for cube maps, 0-n for volume textures
    uint32_t mip_level;   // 0-n depending on wanted mip level
};
</code></pre>
</li>
<li>
<p><code>set_depth_stencil_target(RenderTarget *target, const SurfaceInfo& surface_info);</code> - Same as above but for depth stencil.</p>
</li>
<li>
<p><code>clear(RenderContext::ClearFlags flags);</code> - Clears currently bound render targets.</p>
<ul>
<li><code>flags</code> - enum bit flag describing what parts of the bound render targets to clear.</li>
</ul>
<pre><code>enum ClearFlags {
    CLEAR_SURFACE = 0x1,
    CLEAR_DEPTH = 0x2,
    CLEAR_STENCIL = 0x4
};
</code></pre>
</li>
<li>
<p><code>set_viewports(uint32_t n_viewports, const Viewport *viewports);</code></p>
<ul>
<li><code>n_viewports</code> - Number of viewports to bind.</li>
<li><code>viewports</code> - Pointer to first <code>Viewport</code> to bind. <code>Viewport</code> is a struct describing the dimensions of the viewport:</li>
</ul>
<pre><code>struct Viewport {
    float x, y, width, height;
    float min_depth, max_depth;
};
</code></pre>
<p>Note that <code>x</code>, <code>y</code>, <code>width</code> and <code>height</code> are in unsigned normalized [0-1] coordinates to decouple render target resolution from the viewport.</p>
</li>
<li>
<p><code>set_scissor_rects(uint32_t n_scissor_rects, const ScissorRect *scissor_rects);</code></p>
<ul>
<li><code>n_scissor_rects</code> - Number of scissor rectangles to bind</li>
<li><code>scissor_rects</code> - Pointer to the first <code>ScissorRect</code> to bind.</li>
</ul>
<pre><code>struct ScissorRect {
    float x, y, width, height;
};
</code></pre>
<p>Note that <code>x</code>, <code>y</code>, <code>width</code> and <code>height</code> are in unsigned normalized [0-1] coordinates to decouple render target resolution from the scissor rectangle.</p>
</li>
</ul>
<h3><a id="A_bit_more_exotic_commands_132"></a>A bit more exotic commands</h3>
<ul>
<li><code>set_stream_out_target(uint32_t slot, RenderResource *resource, uint32_t offset);</code>
<ul>
<li><code>slot</code> - Which index of the stream out buffers to bind</li>
<li><code>resource</code> - Which <code>RenderResource</code> to bind to that slot (has to point to a <code>VertexStream</code>)</li>
<li><code>offset</code> - A byte offset describing where to begin writing in the buffer pointed to by <code>resource</code>.</li>
</ul>
</li>
<li><code>set_instance_multiplier(uint32_t multiplier);</code><br>
Allows the user to scale the number of instances to render for each <code>render()</code> call (described below). This is a convenience function to make it easier to implement things like Instanced Stereo Rendering.</li>
</ul>
<h3><a id="Markers_140"></a>Markers</h3>
<ul>
<li><code>push_marker(const char *name)</code><br>
Starts a new marker scope named <code>name</code>. Marker scopes are used both for gathering <code>RenderDevice</code> statistics (number of draw calls, state switches and similar) and for creating GPU timing events. The user is free to nest markers if they want to better group statistics (a small usage example follows this list). More on this in a later post.</li>
<li><code>pop_marker(const char *name)</code><br>
Stops an existing marker scope named <code>name</code>.</li>
</ul>
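<p>Usage is straightforward. Here is a small made-up example of how a marker scope could be recorded around a pass; exactly which state commands you group inside the scope is of course up to the caller:</p>
<pre><code>rc.begin_state_command(sort_key);
rc.push_marker("lighting");
// ... render target / viewport setup for the lighting pass ...
rc.end_state_command();

// ... record the lighting draw calls ...

rc.begin_state_command(later_sort_key);
rc.pop_marker("lighting");
rc.end_state_command();
</code></pre>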
<h2><a id="2_Rendering_146"></a>2. Rendering</h2>
<p>With most state commands covered let’s move on and look at how to record commands for triggering draw calls and compute work to a <code>RenderContext</code>.</p>
<p>For that we have a single function called <code>render()</code>:</p>
<pre><code>class RenderContext
{
    RenderJobPackage *render(const RenderJobPackage *job,
        const ShaderTemplate::Context &shader_context, uint64_t interleave_sort_key = 0,
        uint64_t shader_pass_branch_key = 0, float job_sort_depth = 0.f,
        uint32_t gpu_affinity_mask = GPU_DEFAULT);
};
</code></pre>
<p><strong><code>job</code></strong></p>
<p>The first argument piped to <code>render()</code> is a pointer to a <code>RenderJobPackage</code>, and as you can see the function also returns a pointer to a <code>RenderJobPackage</code>. What is going on here is that the <code>RenderJobPackage</code> piped as argument to <code>render()</code> gets copied to the <code>RenderPackageStream</code>, the copy gets patched up a bit and then a pointer to the modified copy is returned to allow the caller to do further tweaks to it. Ok, this probably needs some further explanation…</p>
<p>The <code>RenderJobPackage</code> is basically a header followed by an arbitrary length of data that together contains everything needed to make it possible for the <code>RenderDevice</code> to later translate it into either a draw call or a compute shader dispatch. In practice this means that after the <code>RenderJobPackage</code> header we also pack <code>RenderResource::render_resource_handle</code> for all resources to bind to all different shader stages as well as full representations of all non-global shader constant buffers.</p>
<p>Since we are building multiple <code>RenderContexts</code> in parallel and might be visiting the same renderable object (mesh, particle system, etc) simultaneously from multiple worker threads, we cannot mutate any state of the renderable when calling its <code>render()</code> function.</p>
<p>Typically all renderable objects have static prototypes of all <code>RenderJobPackages</code> they need to be drawn correctly (e.g. a mesh with three materials might have three <code>RenderJobPackages</code> - one per material). Naturally though, the renderable objects don’t know anything about the context in which they will be drawn (e.g. from what camera or in what kind of lighting environment) up until the point where their <code>render()</code> function gets called and the information is provided. At that point their static <code>RenderJobPackage</code> prototypes somehow need to be patched up with this information (which typically is in the form of shader constants and/or resources).</p>
<p>One way to handle that would be to create a copy of the prototype <code>RenderJobPackage</code> on the stack, patch up the stack copy and then pipe that as argument to <code>RenderContext::render()</code>. That is a fully valid approach and would work just fine, but since <code>RenderContext::render()</code> needs to create a copy of the <code>RenderJobPackage</code> anyway it is more efficient to patch up that copy directly instead. This is the reason for <code>RenderContext::render()</code> returning a pointer to the <code>RenderJobPackage</code> on the <code>RenderPackageStream</code>.</p>
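<p>In (simplified, made-up) code, the <code>render()</code> function of a renderable object might therefore look something like this. <code>MeshRenderableSketch</code>, <code>RenderView</code> and <code>patch_view_constants()</code> are hypothetical names introduced just for this illustration:</p>
<pre><code>void MeshRenderableSketch::render(RenderContext &rc, const RenderView &view) const
{
    for (uint32_t i = 0; i != _n_materials; ++i) {
        // Copy the static prototype onto the context's RenderPackageStream...
        RenderJobPackage *copy = rc.render(_job_prototypes[i], *_shader_contexts[i]);

        // ...then patch the returned copy with per-view data (hypothetical helper).
        patch_view_constants(copy, view.view_projection());
    }
}
</code></pre>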
<p>Before diving into the <code>RenderJobPackage</code> struct let’s go through the other arguments of <code>RenderContext::render()</code>:</p>
<p><strong><code>shader_context</code></strong></p>
<p>We will go through this in more detail in the post about our shader system but essentially we have an engine representation called <code>ShaderTemplate</code>, each <code>ShaderTemplate</code> has a number of <code>Contexts</code>.</p>
<p>A <code>Context</code> is basically a description of any rendering passes that needs to run for the <code>RenderJobPackage</code> to be drawn correctly when rendered in a certain “context”. E.g. a simple shader might declare two contexts: <em>“default”</em> and <em>“shadow”</em>. The <em>“default”</em> context would be used for regular rendering from a player camera, while the <em>“shadow”</em> context would be used when rendering into a shadow map.</p>
<p>What I call a “rendering pass” in this scenario is basically all shader stages (vertex, pixel, etc) together with any state blocks (rasterizer, depth stencil, blend, etc) needed to issue a draw call / dispatch a compute shader in the <code>RenderDevice</code>.</p>
<p><strong><code>interleave_sort_key</code></strong></p>
<p><code>RenderContext::render()</code> automatically figures out what sort keys / <code>Commands</code> it needs to create on its command array. Simple shaders usually only render into one layer in a single pass. In those scenarios <code>RenderContext::render()</code> will create a single <code>Command</code> on the command array. When using a more complex shader that renders into multiple layers and/or needs to render in multiple passes; more than one <code>Command</code> will be created, each command referencing the same <code>RenderJobPackage</code> in its <code>Command::head</code> pointer.</p>
<p>This can feel a bit abstract and is hard to explain without giving you the full picture of how the shader system works together with the data-driven rendering system, which in turn dictates the bit allocation patterns of the sort keys. For now it’s enough to understand that the shader system somehow knows what <code>Commands</code> to create on the command array.</p>
<p>The shader author can also decide to bypass the data-driven rendering system and put the scheduling responsibility entirely in the hands of the caller of <code>RenderContext::render()</code>, in this case the sort key of all <code>Commands</code> created will simply become 0. This is where the <code>interleave_sort_key</code> comes into play, this variable will be bitwise ORed with the sort key before being stored in the <code>Command</code>.</p>
<p><strong><code>shader_pass_branch_key</code></strong></p>
<p>The shader system has a feature for allowing users to dynamically turn on/off certain rendering passes. Again this becomes somewhat abstract without providing the full picture but basically this system works by letting the shader author flag certain passes with a “tag”. A tag is simply a string that gets mapped to a bit within a 64 bit bit-mask. By bitwise ORing together multiple of these tags and piping the result in <code>shader_pass_branch_key</code> the user can control what passes to activate/deactivate when rendering the <code>RenderJobPackage</code>.</p>
<p><strong><code>job_sort_depth</code></strong></p>
<p>A normalized [0-1] floating point value used for controlling depth sorting between <code>RenderJobPackages</code>. As you will see in the next post this value simply gets mapped into a bit range of the sort key, removing the need for doing any kind of special trickery to manage things like back-to-front / front-to-back sorting of <code>RenderJobPackages</code>.</p>
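<p>Conceptually this is just a quantization of the depth into a fixed bit range of the key. A tiny sketch of the idea (the number of bits and where they sit in the key is covered in the next post; <code>depth_bit_offset</code> below is made up):</p>
<pre><code>#include <cstdint>

// Quantize a [0-1] depth value into `n_bits` bits so it can be ORed into a sort key.
// Inverting the quantized value (or not) gives back-to-front vs front-to-back ordering.
uint64_t quantize_depth(float depth01, uint32_t n_bits)
{
    if (depth01 < 0.f) depth01 = 0.f;
    if (depth01 > 1.f) depth01 = 1.f;
    const uint64_t max_value = (1ull << n_bits) - 1ull;
    return (uint64_t)(depth01 * (float)max_value);
}

// Example: sort_key |= quantize_depth(job_sort_depth, 16) << depth_bit_offset;
</code></pre>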
<p><strong><code>gpu_affinity_mask</code></strong></p>
<p>Same as the <code>gpu_affinity_mask</code> parameter piped to <code>begin_state_command()</code>.</p>
<h3><a id="RenderJobPackage_204"></a><code>RenderJobPackage</code></h3>
<p>Let’s take a look at the actual <code>RenderJobPackage</code> struct:</p>
<pre><code>struct RenderJobPackage
{
    BatchInfo batch_info;
#if defined(COMPUTE_SUPPORTED)
    ComputeInfo compute_info;
#endif
    uint32_t size;                        // size of entire package including extra data
    uint32_t n_resources;                 // number of resources assigned to job.
    uint32_t resource_offset;             // offset from start of RenderJobPackage to first RenderResource.
    uint32_t shader_resource_data_offset; // offset to shader resource data
    RenderResource::Handle shader;        // shader used to execute job
    uint64_t instance_hash;               // unique hash used for instance merging
#if defined(DEVELOPMENT)
    ResourceID resource_tag;              // debug tag associating job to a resource on disc
    IdString32 object_tag;                // debug tag associating job to an object
    IdString32 batch_tag;                 // debug tag associating job to a sub-batch of an object
#endif
};
</code></pre>
<p><strong><code>batch_info</code> & <code>compute_info</code></strong></p>
<p>First two members are two nestled POD structs mainly containing the parameters needed for doing any kind of drawing or dispatching of compute work in the <code>RenderDevice</code>:</p>
<pre><code>struct BatchInfo
{
    enum PrimitiveType {
        TRIANGLE_LIST,
        LINE_LIST
        // ...
    };
    enum FrontFace {
        COUNTER_CLOCK_WISE = 0,
        CLOCK_WISE = 1
    };
    PrimitiveType primitive_type;
    uint32_t vertex_offset; // Offset to first vertex to read from vertex buffer.
    uint32_t primitives;    // Number of primitives to draw
    uint32_t index_offset;  // Offset to the first index to read from the index buffer
    uint32_t vertices;      // Number of vertices in batch (used if batch isn't indexed)
    uint32_t instances;     // Number of instances of this batch to draw
    FrontFace front_face;   // Defines which triangle winding order to use
};
</code></pre>
<p>Most of these are self-explanatory; I think the only thing worth pointing out is the <code>front_face</code> enum. This is here to dynamically handle flipping of the primitive winding order when dealing with objects that are negatively scaled on an odd number of axes. For typical game content it’s rare to see content creators using mesh mirroring when modeling; for other industries, however, it is a normal workflow.</p>
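<p>A common way to detect the mirrored case is to look at the sign of the determinant of the world matrix’ upper 3x3 part; a negative determinant means an odd number of axes are negatively scaled and the winding should be flipped. A small sketch of that idea (not the actual Stingray code):</p>
<pre><code>// Returns true if the transform mirrors geometry (odd number of negatively scaled axes),
// in which case BatchInfo::front_face should be flipped.
bool is_mirrored(const float m[3][3])
{
    const float det =
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
        m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
        m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    return det < 0.f;
}
</code></pre>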
<pre><code>struct ComputeInfo
{
    uint32_t thread_count[3];
    bool async;
};
</code></pre>
<p>So while <code>BatchInfo</code> mostly holds the parameters needed to render something, <code>ComputeInfo</code> holds the parameters needed to dispatch a compute shader. The three element array <code>thread_count</code> contains the thread group counts for x, y and z. If <code>async</code> is true, the graphics API’s “compute queue” will be used instead of the “graphics queue”.</p>
<p><strong><code>resource_offset</code></strong></p>
<p>Byte offset from start of <code>RenderJobPackage</code> to an array of <code>n_resources</code> with <code>RenderResource::Handle</code>. Resources found in this array can be of the type <code>VertexStream</code>, <code>IndexStream</code> or <code>VertexDeclaration</code>. Based on their type and order in the array they get bound to the input assembler stage in the <code>RenderDevice</code>.</p>
<p><strong><code>shader_resource_data_offset</code></strong></p>
<p>Byte offset from start of <code>RenderJobPackage</code> to a data block holding handles to all <code>RenderResources</code> as well as all constant buffer data needed by all the shader stages. The layout of this data blob will be covered in the post about the shader system.</p>
<p><strong><code>instance_hash</code></strong></p>
<p>We have a system for doing what we call “instance merging”, this system figures out if two <code>RenderJobPackages</code> only differ on certain shader constants and if so merges them into the same draw call. The shader author is responsible but not required to implement support for this feature. If the shader supports “instance merging” the system will use the <code>instance_hash</code> to figure out if two <code>RenderJobPackages</code> can be merged or not. Typically the <code>instance_hash</code> is simply a hash of all <code>RenderResource::Handle</code> that the shader takes as input.</p>
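<p>As an illustration, such a hash could be built by simply mixing all bound resource handles together. The helper below is a made-up sketch (FNV-1a style), not the actual Stingray hashing code:</p>
<pre><code>#include <cstdint>

// FNV-1a style mix of the resource handles a draw call binds. Two RenderJobPackages
// that hash to the same value bind the same buffers/textures and are merge candidates.
uint64_t instance_hash(const uint32_t *resource_handles, uint32_t n_handles)
{
    uint64_t h = 14695981039346656037ull;
    for (uint32_t i = 0; i != n_handles; ++i) {
        h ^= resource_handles[i];
        h *= 1099511628211ull;
    }
    return h;
}
</code></pre>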
<p><strong><code>resource_tag</code> & <code>object_tag</code> & <code>batch_tag</code></strong></p>
<p>Three levels of debug information to make it easier to backtrack errors/warnings inside the <code>RenderDevice</code> to the offending content.</p>
<h2><a id="3_Resource_updates_291"></a>3. Resource updates</h2>
<p>The last type of commands are for dynamically updating various <code>RenderResources</code> (Vertex/Index/Raw buffers, Textures, etc).</p>
<p>The interface for updating a buffer with new data looks like this:</p>
<pre><code>class RenderContext
{
    void *map_write(RenderResource *resource, render_sorting::SortKey sort_key,
        const ShaderTemplate::Context *shader_context = 0,
        shader_pass_branching::Flags shader_pass_branch_key = 0,
        uint32_t gpu_affinity_mask = GPU_DEFAULT);
};
</code></pre>
<p><strong><code>resource</code></strong></p>
<p>This function basically returns a pointer to the first byte of the buffer that will replace the contents of the <code>resource</code>. <code>map_write()</code> figures out the size of the buffer by casting the <code>resource</code> to the correct type (using the type information encoded in the <code>RenderResource::render_resource_handle</code>). It then allocates memory for the buffer and a small header on the <code>RenderPackageStream</code> and returns a pointer to the buffer.</p>
<p><strong><code>sort_key</code> & <code>shader_context</code> & <code>shader_pass_branch_key</code></strong></p>
<p>In some rare situations you might need to update the same buffer with different data multiple times within a frame. A typical example could be the vertex buffer of a particle system implementing some kind of level-of-detail system causing the buffers to change depending on e.g. camera position. To support that the user can provide a bunch of extra parameters to make sure the contents of the GPU representation of the buffer is updated right before the graphics API draw calls are triggered for the different rendering passes. This works in a similar way to how <code>RenderContext::render()</code> can create multiple <code>Commands</code> on the command array referencing the same data.</p>
<p>Unless you need to update the buffer multiple times within the frame it is safe to just set all of the above mentioned parameters to 0, making it very simple to update a buffer:</p>
<pre><code>void *buf = rc.map_write(resource, 0);
// .. fill bits in buffer ..
</code></pre>
<p>Note: To shorten the length of this post I’ve left out a few other flavors of updating resources, but <code>map_write</code> is the most important one to grasp.</p>
<h1><a id="GPU_Queues_Fences__Explicit_MGPU_programming_325"></a>GPU Queues, Fences & Explicit MGPU programming</h1>
<p>Before wrapping up I’d like to touch on a few recent additions to the Stingray renderer, namely how we’ve exposed control for dealing with different GPU Queues, how to synchronize between them and how to control, communicate and synchronize between multiple GPUs.</p>
<p>New graphics APIs such as DX12 and Vulkan expose three different types of command queues: <em>Graphics</em>, <em>Compute</em> and <em>Copy</em>. There’s plenty of information on the web about this so I won’t cover it here; the only thing important to understand is that these queues can execute asynchronously on the GPU, hence we need to have a way to synchronize between them.</p>
<p>To handle that we have exposed a simple fence API that looks like this:</p>
<pre><code>class RenderContext
{
    struct FenceMessage
    {
        enum Operation { SIGNAL, WAIT };
        Operation operation;
        IdString32 fence_name;
    };

    void signal_fence(IdString32 fence_name, render_sorting::SortKey sort_key,
        uint32_t queue = GRAPHICS_QUEUE, uint32_t gpu_affinity_mask = GPU_DEFAULT);
    void wait_fence(IdString32 fence_name, render_sorting::SortKey sort_key,
        uint32_t queue = GRAPHICS_QUEUE, uint32_t gpu_affinity_mask = GPU_DEFAULT);
};
</code></pre>
<p>Here’s a pseudo code snippet showing how to synchronize between the graphics queue and the compute queue:</p>
<pre><code>uint64_t sort_key = 0;
// record a draw call
rc.render(graphics_job, graphics_shader, sort_key++);
// record an asynchronous compute job
// (ComputeInfo::async bool in async_compute_job is set to true to target the graphics APIs compute queue)
rc.render(async_compute_job, compute_shader, sort_key++);
// now lets assume the graphics queue wants to use the result of the async_compute_job,
// for that we need to make sure that the compute shader is done running
rc.wait_fence(IdString32("compute_done"), sort_key++, GRAPHICS_QUEUE);
rc.signal_fence(IdString32("compute_done"), sort_key++, COMPUTE_QUEUE);
rc.render(graphics_job_using_result_from_compute, graphics_shader2, sort_key++);
</code></pre>
<p>As you might have noticed all methods for populating a <code>RenderContext</code> described in this post also takes an extra parameter called <code>gpu_affinity_mask</code>. This is a bit-mask used for directing commands to one or many GPUs. The idea is simple, when we boot up the renderer we enumerate all GPUs present in the system and decide which one to use as our default GPU (<code>GPU_DEFAULT</code>) and assign that to bit 1. We also let the user decide if there are other GPUs present in the system that should be available to Stingray and if so assign them bit 2, 3, 4, and so on. By doing so we can explicitly direct control of all commands put on the <code>RenderContext</code> to one or many GPUs in a simple way.</p>
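<p>In other words, <code>gpu_affinity_mask</code> is just an ordinary bit mask. A hypothetical usage example (the <code>GPU_0</code>/<code>GPU_1</code> constants are made up for illustration, with <code>GPU_0</code> corresponding to <code>GPU_DEFAULT</code>):</p>
<pre><code>enum { GPU_0 = 1 << 0, GPU_1 = 1 << 1 };   // one bit per enumerated GPU

// Record a clear that only runs on the second GPU...
rc.begin_state_command(sort_key, GPU_1);
rc.clear(RenderContext::CLEAR_SURFACE);
rc.end_state_command();

// ...and a draw call that runs on both.
rc.render(job, shader_context, sort_key, 0, 0.f, GPU_0 | GPU_1);
</code></pre>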
<p>As you can see, that is also true for the fence API described above. On top of that, there’s also a need for a copy interface for copying resources between GPUs:</p>
<pre><code>class RenderContext
{
    void copy(RenderResource *dst_resource, RenderResource *src_resource,
        render_sorting::SortKey sort_key, Box *src_box = 0, uint32_t dst_offsets[3] = 0,
        uint32_t queue = GRAPHICS_QUEUE, uint32_t gpu_affinity_mask = GPU_DEFAULT,
        uint32_t gpu_source = GPU_DEFAULT, uint32_t gpu_destination = GPU_DEFAULT);
};
</code></pre>
<p>Even though this work isn’t fully completed I still wanted to share the high-level idea of what we are working towards for exposing explicit MGPU control to the Stingray renderer. We are actively working on this right now and with some luck I might be able to revisit this with more concrete examples when getting to the post about the <em>render_config</em> & data-driven rendering.</p>
<h1><a id="Next_up_386"></a>Next up</h1>
<p>With that I think I’ve covered the most important aspects of the <code>RenderContext</code>. Next post will dive a bit deeper into bit allocation ranges of the sort keys and the system for sorting in general, hopefully that post will become a bit shorter.</p>
</body></html>Tobiashttp://www.blogger.com/profile/16240529312060411542noreply@blogger.com55tag:blogger.com,1999:blog-1994130783874175266.post-76513632512585302452017-02-01T21:07:00.002+01:002017-02-01T21:23:58.784+01:00Stingray Renderer Walkthrough #2: Resources & Resource Contexts<!DOCTYPE html><html><head><meta charset="utf-8"><title>Stingray Renderer Walkthrough #2: Resources & Resource Contexts</title><style></style></head><body id="preview">
<h1><a id="Render_Resources_0"></a>Render Resources</h1>
<p>Before any rendering can happen we need a way to reason about GPU resources. Since we want all graphics API specific code to stay isolated we need some kind of abstraction on the engine side; for that we have an interface called <code>RenderDevice</code>. All calls to graphics APIs like D3D, OGL, GNM, Metal, etc. stay behind this interface. We will be covering the <code>RenderDevice</code> in a later post so for now just know that it is there.</p>
<p>We want to have a graphics API agnostic representation for a bunch of different types of resources and we need to link these representations to their counterparts on the <code>RenderDevice</code> side. This linking is handled through a POD-struct called <code>RenderResource</code>:</p>
<pre><code>struct RenderResource
{
    enum {
        TEXTURE, RENDER_TARGET, DEPENDENT_RENDER_TARGET, BACK_BUFFER_WRAPPER,
        CONSTANT_BUFFER, VERTEX_STREAM, INDEX_STREAM, RAW_BUFFER,
        BATCH_INFO, VERTEX_DECLARATION, SHADER,
        NOT_INITIALIZED = 0xFFFFFFFF
    };
    uint32_t render_resource_handle;
};
</code></pre>
<p>Any engine resource that also needs a representation on the RenderDevice side inherits from this struct. It contains a single member <code>render_resource_handle</code> which is used to lookup the correct graphics API specific representation in the RenderDevice.</p>
<p>The most significant 8 bits of <code>render_resource_handle</code> hold the type enum; the lower 24 bits are simply an index into an array for that specific resource type inside the RenderDevice.</p>
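<p>A small sketch of what that packing boils down to (the helper names are made up, but the bit layout is the one described above):</p>
<pre><code>#include <cstdint>

inline uint32_t make_render_resource_handle(uint32_t type, uint32_t index)
{
    return (type << 24) | (index & 0x00FFFFFFu);
}

inline uint32_t resource_type(uint32_t render_resource_handle)
{
    return render_resource_handle >> 24;
}

inline uint32_t resource_index(uint32_t render_resource_handle)
{
    return render_resource_handle & 0x00FFFFFFu;
}
</code></pre>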
<h1><a id="Various_Render_Resources_24"></a>Various Render Resources</h1>
<p>Let’s take a look at the different render resources that can be found in Stingray:</p>
<ul>
<li><code>Texture</code> - A regular texture, this object wraps all various types of different texture layouts such as 2D, Cube, 3D.</li>
<li><code>RenderTarget</code> - Basically the same as <code>Texture</code> but writable from the GPU.</li>
<li><code>DependentRenderTarget</code> - Similar to <code>RenderTarget</code> but with logic for inheriting properties from another <code>RenderTarget</code>. This is used for creating render targets that need to be reallocated when the output window (swap chain) is resized.</li>
<li><code>BackBufferWrapper</code> - Special type of <code>RenderTarget</code> created inside the <code>RenderDevice</code> as part of the swap chain creation. Almost all render targets are explicitly created by the user, this is the only exception as the back buffer associated with the swap chain is typically created together with the swap chain.</li>
<li><code>ShaderConstantBuffer</code> - Shader constant buffers designed for explicit update and sharing between multiple shaders, mainly used for “view-global” state.</li>
<li><code>VertexStream</code> - A regular Vertex Buffer.</li>
<li><code>VertexDeclaration</code> - Describes the contents of one or many <code>VertexStreams</code>.</li>
<li><code>IndexStream</code> - A regular Index Buffer.</li>
<li><code>RawBuffer</code> - A linear memory buffer, can be set up for GPU writing through a UAV (Unordered Access View).</li>
<li><code>Shader</code> - For now just think of this as something containing everything needed to build a full pipeline state object (PSO). Basically a wrapper over a number of shaders, render states, sampler states etc. I will cover the shader system in a later post.</li>
</ul>
<p>Most of the above resources have a few things in common:</p>
<ul>
<li>They describe a buffer either populated by the CPU or by the GPU</li>
<li>CPU populated buffers have a validity field describing their update frequency:
<ul>
<li><code>STATIC</code> - The buffer is immutable and won’t change after creation, typically most buffers coming from DCC assets are <code>STATIC</code>.</li>
<li><code>UPDATABLE</code> - The buffer can be updated but changes less than once per frame, e.g: UI elements, post processing geometry and similar.</li>
<li><code>DYNAMIC</code> - The buffer frequently changes, at least once per frame but potentially many times in a single frame e.g: particle systems.</li>
</ul>
</li>
<li>They have enough data for creating a graphics API specific representation inside the RenderDevice, i.e. they know about strides, sizes, view requirements (e.g. should a UAV be created or not), etc.</li>
</ul>
<h1><a id="Render_Resource_Context_48"></a>Render Resource Context</h1>
<p>With the <code>RenderResource</code> concept sorted, we’ll go through the interface for creating and destroying the <code>RenderDevice</code> representation of the resources. That interface is called <code>RenderResourceContext</code> (RRC).</p>
<p>We want resource creation to be thread safe and while the <code>RenderResourceContext</code> in itself isn’t, we can achieve free threading by allowing the user to create any number of RRCs they want, and as long as they don’t touch the same RRC from multiple threads everything will be fine.</p>
<p>Similar to many other rendering systems in Stingray the RRC is basically just a small helper class wrapping an abstract “command buffer”. On this command buffer we put what we call “packages” describing everything that is needed for creating/destroying <code>RenderResource</code> objects. These packages have variable length depending on what kind of object they represent. In addition to that the RRC can also hold platform specific allocators that allow allocating/deallocating GPU mapped memory directly, avoiding any additional memory shuffling in the <code>RenderDevice</code>. This kind of mechanism allows for streaming e.g. textures and other immutable buffers directly into GPU memory on platforms that provide that kind of low-level control.</p>
<p>Typically the only two functions the user needs to care about are:</p>
<pre><code>class RenderResourceContext
{
public:
    void alloc(RenderResource *resource);
    void dealloc(RenderResource *resource);
};
</code></pre>
<p>When the user is done allocating/deallocating resources they hand over the RRC either directly to the <code>RenderDevice</code> or to the <code>RenderInterface</code>.</p>
<pre><code>class RenderDevice
{
public:
    virtual void dispatch(uint32_t n_contexts, RenderResourceContext **rrc,
        uint32_t gpu_affinity_mask = RenderContext::GPU_DEFAULT) = 0;
};
</code></pre>
<p>Handing it over directly to the <code>RenderDevice</code> requires the caller to be on the controller thread for rendering as <code>RenderDevice::dispatch()</code> isn’t thread safe. If the caller is on any other thread (like e.g. one of the worker threads or the resource streaming thread) <code>RenderInterface::dispatch()</code> should be used instead. We will cover the <code>RenderInterface</code> in a later post so for now just think of it as a way of piping data into the renderer from an arbitrary thread.</p>
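<p>Putting it together, typical usage from e.g. a resource streaming thread could look roughly like this. <code>make_texture()</code> stands in for whatever engine code fills out the texture description, and I’m assuming <code>RenderInterface::dispatch()</code> mirrors the <code>RenderDevice</code> signature shown above:</p>
<pre><code>RenderResourceContext rrc;

Texture *texture = make_texture(/* ... */);   // hypothetical: fills out a Texture render resource
rrc.alloc(texture);                           // records a creation package, no graphics API calls yet

// Not on the render controller thread, so hand the context over via the RenderInterface.
RenderResourceContext *contexts[] = { &rrc };
render_interface.dispatch(1, contexts);
</code></pre>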
<h1><a id="Wrap_up_80"></a>Wrap up</h1>
<p>The main reason for having the <code>RenderResourceContext</code> concept instead of exposing <code>allocate()/deallocate()</code> functions directly in the <code>RenderDevice/RenderInterface</code> interfaces is efficiency. We have a need for allocating and deallocating lots of resources, sometimes in parallel from multiple threads. Decoupling the interface for doing so makes it easy to schedule when in the frame the actual RenderDevice representations get created; it also makes the code easier to maintain as we don’t have to worry about thread-safety of the <code>RenderResourceContext</code>.</p>
<p>In the next post we will discuss the <code>RenderJobs</code> and <code>RenderContexts</code> which are the two main building blocks for creating and scheduling draw calls and state changes.</p>
<p>Stay tuned.</p>
</body></html>Tobiashttp://www.blogger.com/profile/16240529312060411542noreply@blogger.com140tag:blogger.com,1999:blog-1994130783874175266.post-25942089989920372242017-02-01T21:07:00.001+01:002017-02-01T21:23:41.809+01:00Stingray Renderer Walkthrough #1: Overview<!DOCTYPE html><html><head><meta charset="utf-8"><title>Stingray Renderer Walkthrough #1: Overview</title><style></style></head><body id="preview">
<h1><a id="Introduction_0"></a>Introduction</h1>
<p>When we started writing Bitsquid back in mid 2009 all platforms we intended to run on were already multi-core architectures. This and the fact that we had some prior experience trying to get our last engine to run efficiently on the PS3 answered the question of how <em>not</em> to architect an efficient renderer that scales to many cores. We knew we needed more than functional parallelism; we wanted data-parallelism.</p>
<p>To solve that we divide the CPU view of a rendered frame into three stages:</p>
<ol>
<li><code>Culling</code> - Filter out visible renderable objects with respect to a camera from a potentially huge set of different types of objects (meshes, particle systems, lights, etc).</li>
<li><code>Render</code> - Iterate over the filtered result from <code>Culling</code> and “record” an intermediate representation of draw calls/state switches to a command buffer.</li>
<li><code>Dispatch</code> - Take the result from <code>Render</code> and translate that into actual render API calls (D3D, OGL, Metal, GNM, etc).</li>
</ol>
<p>As you can see each stage pipes its result into the next. Rendering is typically very simple in that sense; we tend to have a one way flow of our data: user input or time affects state, state propagates into changes of the renderable objects (transforms, shader constants, etc.), we figure out what needs to be rendered, iterate over that and finally generate render API calls. Rinse & repeat.</p>
<p>If we ignore the problem of ordering the final API calls in the rendering backend it’s fairly easy to see how we can achieve data parallelism in this scenario. Just fork at each stage, splitting the workload into <em>n</em> chunks (where <em>n</em> is however many worker threads you can throw at it). When all workers are done for a stage, take the result and pipe it into the next stage.</p>
<p>In essence this is how all rendering in Stingray works. Obviously I’ve glossed over some rather important and challenging details, but as you will see they are not too hard to solve if you have good control over your data flows and are picky about when mutation of the data happens.</p>
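<p>As a rough sketch of the fork/join pattern for the <code>Render</code> stage (the job system API, ownership of the contexts and the final dispatch call are all made up for the example):</p>
<pre><code>// Illustrative fork/join for the Render stage: one chunk of visible objects per worker,
// one RenderContext per chunk. JobSystem (kick/wait_all) is a hypothetical job API.
void render_stage(JobSystem &jobs, RenderDevice &device,
    RenderableObject **visible, uint32_t n_visible, uint32_t n_workers)
{
    Array<RenderContext*> contexts(n_workers);
    const uint32_t chunk_size = (n_visible + n_workers - 1) / n_workers;

    for (uint32_t w = 0; w != n_workers; ++w) {
        contexts[w] = new RenderContext();   // recycling/ownership omitted in this sketch
        RenderContext *rc = contexts[w];
        const uint32_t begin = w * chunk_size;
        const uint32_t end = begin + chunk_size < n_visible ? begin + chunk_size : n_visible;
        jobs.kick([rc, visible, begin, end]() {
            for (uint32_t i = begin; i != end; ++i)
                visible[i]->render(*rc /*, view, etc. */);   // render() is const, no mutation
        });
    }

    jobs.wait_all();

    // Hand the recorded contexts over to the backend (the Dispatch stage).
    for (uint32_t w = 0; w != n_workers; ++w)
        device.dispatch(*contexts[w]);   // hypothetical; the real interface is covered in later posts
}
</code></pre>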
<h1><a id="Design_Philosophies__Concepts_16"></a>Design Philosophies & Concepts</h1>
<p>The rendering code in Stingray tends to be heavily influenced by Data Oriented Programming principles. When designing new systems our biggest efforts usually goes into structuring our data efficiently and thinking about its flow through the systems, more so than writing the actual code that transforms the data from one form to another.</p>
<p>To achieve data-parallelism throughout the rendering code the first thing to realize is that we have to be very picky about when mutation of the renderable objects happens. Multiple worker threads will run over our objects and it’s not unlikely that more than one thread visits the same object at the same time; hence we must not mutate the state of our objects in their render functions. Therefore all of our <code>render()</code> functions are <code>const</code>.</p>
<p>To further guard ourselves from the outer world (i.e. gameplay, physics, etc.) the renderer operates in complete isolation from the game logic. It has its own representation of the data it needs, and only the data relevant for rendering. While the gameplay logic usually wants to reason about high-level concepts such as game entities (which basically group a number of meshes, particle systems, lights, etc. together), we on the rendering side don’t really care about that. We are much more interested in just having an array of all renderable objects in a game world, in a memory layout that makes it efficient to access.</p>
<p>Another nice thing with decoupling the representation of the renderable objects from the game objects is that it allows us to run simulation in parallel with rendering (functional parallelism). So while simulation is updating frame <em>n</em> the renderer is processing frame <em>n-1</em>. Some of you might argue that overlaying rendering on top of simulation doesn’t give any performance improvements if the work in all systems is nicely parallelized. In reality though this isn’t really the case. We still have systems that don’t go wide, or have certain sections where they need to do synchronous processing (last generation graphics APIs, e.g. DX11 and OpenGL, are good examples). This creates bubbles in the frame, slowing us down.</p>
<p>By overlaying simulation and rendering we get a form of bubble filling among the worker threads which in most cases gives a big enough speed improvement to justify the added complexity that comes from this architecture. More specifically:</p>
<ol>
<li>Double buffering of state - since the simulation might mutate the state of an object for frame <code>n</code> at the same time as the renderer is processing frame <code>n-1</code>, any mutable state needs to be double buffered (see the small sketch after this list).</li>
<li>Lifetime tracking of immutable data - while immutable/read-only state such as static vertex and index buffers is safe to read by both simulation and renderer, we still need to be careful not to pull the rug out from under the renderer’s feet by freeing anything still in use by the renderer.</li>
</ol>
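<p>The first point, double buffering, boils down to something like this minimal sketch: keep two copies of any state the simulation can write while the renderer reads, indexed by frame parity:</p>
<pre><code>// Minimal illustration of double buffered mutable state (not the actual Stingray code).
// The simulation writes the slot for frame `n` while the renderer still reads the slot
// written for frame `n-1`.
struct DoubleBufferedTransform
{
    float4x4 world[2];

    void write(uint32_t simulation_frame, const float4x4 &m) { world[simulation_frame & 1] = m; }
    const float4x4 &read(uint32_t render_frame) const { return world[render_frame & 1]; }
};
</code></pre>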
<p>Here’s a conceptual graph showing the benefits of overlaying simulation and rendering:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVODNgfSOk6I1R9QaZMMQub_p6641HgBSSw3DUFl88xdsy4vF5cGtRWiRlIcy3myTm_lN15bi84NRqP6IqpVk0XaME3fsbCiyrul6LhCy7s-59Vr7aO8Qcm5ib748j6OVUXi8JEb74EYU/s1600/frame-overlaying.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVODNgfSOk6I1R9QaZMMQub_p6641HgBSSw3DUFl88xdsy4vF5cGtRWiRlIcy3myTm_lN15bi84NRqP6IqpVk0XaME3fsbCiyrul6LhCy7s-59Vr7aO8Qcm5ib748j6OVUXi8JEb74EYU/s1600/frame-overlaying.png" width="640" height="111" /></a></div>
<p>So basically what we have here is two “controller threads”: <code>simulation</code> and <code>render</code>, both offloading work to the worker threads. In the case that a controller thread is blocked waiting for some work to finish, it will assist the worker threads, striving to never sit idle. One thing to note is that to prevent frames from stacking up, we never allow the simulation thread to run more than one frame ahead of the render thread.</p>
<p>As a comparison here’s the same workload with simulation and rendering running in sequence.</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjevYgDYrKO7FAn_lQf8_9Hq_hbxzlvgJ3FrPhhx0erBzIHhgPtGcrdZsEdvz1AQtSkuqXFykk1GqNIyhXge7jMd7uLuXjagaGbIaQFstvp4PLU7BCWYkRO6fg9Afz-u2V3K5vb0q3SSwA/s1600/sequential-simulation-rendering.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjevYgDYrKO7FAn_lQf8_9Hq_hbxzlvgJ3FrPhhx0erBzIHhgPtGcrdZsEdvz1AQtSkuqXFykk1GqNIyhXge7jMd7uLuXjagaGbIaQFstvp4PLU7BCWYkRO6fg9Afz-u2V3K5vb0q3SSwA/s1600/sequential-simulation-rendering.png" width="640" height="111" /></a></div>
<p>As you can see we get significantly more idle time (bubbles) on the worker threads due to certain parts of both the simulation and rendering not being able to go wide.</p>
<h1><a id="Next_up_43"></a>Next up</h1>
<p>I think this pretty much covers the high level view of the core rendering architecture in Stingray. Now let’s go into some more detail.</p>
<p>Since Andreas Asplund recently covered both how we handle propagation of state from simulation to the renderer (we call this “State reflection” in Stingray): <a href="http://bitsquid.blogspot.se/2016/09/state-reflection.html">http://bitsquid.blogspot.se/2016/09/state-reflection.html</a> as well as how our view frustum culling system(s) works: <a href="http://bitsquid.blogspot.se/2016/10/the-implementation-of-frustum-culling.html">http://bitsquid.blogspot.se/2016/10/the-implementation-of-frustum-culling.html</a> I won’t be covering that in this series.</p>
<p>Instead I will jump straight into how creating and destroying GPU resources works, and from there go through all the building blocks needed to implement the second stage <code>Render</code> mentioned above.</p>
</body></html>Tobiashttp://www.blogger.com/profile/16240529312060411542noreply@blogger.com50tag:blogger.com,1999:blog-1994130783874175266.post-3867669073677532912017-02-01T21:07:00.000+01:002017-03-14T10:29:37.820+01:00Stingray Renderer Walkthrough<!DOCTYPE html><html><head><meta charset="utf-8"><title>Stingray Renderer Walkthrough</title><style></style></head><body id="preview">
<h1><a id="Welcome_0"></a>Welcome</h1>
<p>To simplify knowledge transferring inside the Autodesk development teams and in an attempt to improve my writing skills I’ve decided to do a walkthrough of the Stingray rendering architecture. The idea is to do this as a series of blog posts over the coming weeks starting from the low-level aspects of the renderer chewing my way up to more high-level concepts as I go.</p>
<p>I’ve covered some of these topics before in various presentations over the years but those have been more focused on how our data driven aspects of the renderer works and less on the core architecture behind it. This is an attempt to do a more complete walk-through of the entire rendering architecture.</p>
<p>When I started thinking about this it felt like an almost impossible undertaking considering how much slower I am at expressing myself in text than in code, but after spending a couple of days going through the entire stingray code base doing some spring cleaning it felt a bit more manageable so I’ve now decided to give it a try.</p>
<p>(Note: this has nothing at all to do with me feeling the pressure from Niklas Frykholm who’s currently doing a complete walk-through of the entire Stingray engine code base (well everything except rendering) as a series of youtube videos [<a href="https://www.youtube.com/playlist?list=PLUxuJBZBzEdxzVpoBQY9agA8JUgNkeYSV">1</a>]. Not at all… I feel no pressure, no guilt, nothing… I promise… Thanks Niklas for pushing me!)</p>
<h1><a id="Outline_10"></a>Outline</h1>
<p>Below is some kind of outline of what I intend to cover and in what order, I might swap things around as I go if I discover it makes more sense. This post will work as an index and I will link to the posts as they come online.</p>
<ol>
<li><a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-1-overview.html">Overview</a></li>
<li><a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-2.html">Resources & Resource Contexts</a></li>
<li><a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-3-render.html">Render Contexts</a></li>
<li><a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-4-sorting.html">Sorting</a></li>
<li><a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-5.html">RenderDevice</a></li>
<li><a href="http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough-6.html">RenderInterface</a></li>
<li><a href="http://bitsquid.blogspot.com/2017/03/stingray-renderer-walkthrough-7-data.html">Data-driven rendering</a></li>
<li><a href="http://bitsquid.blogspot.com/2017/03/stingray-renderer-walkthrough-8.html">Stingray-renderer & Mini-renderer</a></li>
<li>Shaders & Materials</li>
</ol>
</body></html>Tobiashttp://www.blogger.com/profile/16240529312060411542noreply@blogger.com380tag:blogger.com,1999:blog-1994130783874175266.post-3535974906423371812016-10-04T21:40:00.000+02:002016-10-04T21:40:17.513+02:00The Implementation of Frustum Culling in Stingray<h1>Overview</h1>
<p>Frustum culling can be an expensive operation. Stingray accelerates it by making heavy use of SIMD and distributing the workload over several threads. The basic workflow is:</p>
<ul>
<li>Kick jobs to do frustum vs sphere culling
<ul>
<li>For each frustum plane, test plane vs sphere</li>
</ul>
</li>
<li>Wait for sphere culling to finish</li>
<li>For objects that pass sphere test, kick jobs to do frustum vs object-oriented bounding box (OOBB) culling
<ul>
<li>For each frustum plane, test plane vs OOBB</li>
</ul>
</li>
<li>Wait for OOBB culling to finish</li>
</ul>
<p>Frustum vs sphere tests are significantly faster than frustum vs OOBB. By rejecting objects that fail sphere culling first, we have fewer objects to process in the more expensive OOBB pass.</p>
<p>Why go over all objects brute force instead of using some sort of spatial partition data structure? We like to keep things simple and with the current setup we have yet to encounter a case where we've been bound by the culling. Brute force sphere culling followed by OOBB culling is fast enough for all cases we've encountered so far. That might of course change in the future, but we'll take care of that when it's an actual problem.</p>
<p>The brute force culling is pretty fast, because:</p>
<ol>
<li>The sphere and the OOBB culling use SIMD and only load the minimum amount of needed data.</li>
<li>The workload is distributed over several threads.</li>
</ol>
<p>In this post, we will first look at the single threaded SIMD code and then how the culling is distributed over multiple threads.</p>
<p>I'll use a lot of code to show how it's all done. It's mostly actual code from the engine, but it has been cleaned up to a certain extent. Some stuff has been renamed and/or removed to make it easier to understand what's going on.</p>
<h1>Data structures used</h1>
<p>If you go back to my previous post about state reflection, <a href="http://bitsquid.blogspot.com/2016/09/state-reflection.html" title="State reflection">http://bitsquid.blogspot.ca/2016/09/state-reflection.html</a> you can read that each object on the main thread is associated with a render thread representation via a <code>render_handle</code>. The <code>render_handle</code> is used to get the <code>object_index</code> which is the index of an object in the <code>_objects</code> array. </p>
<p>Take a look at the following code for a refresher:</p>
<pre><code>void RenderWorld::create_object(WorldRenderInterface::ObjectManagementPackage *omp)
{
    // Acquire an `object_index`.
    uint32_t object_index = _objects.size();
    // Same recycling mechanism as seen for render handles.
    if (_free_object_indices.any()) {
        object_index = _free_object_indices.back();
        _free_object_indices.pop_back();
    } else {
        _objects.resize(object_index + 1);
        _object_types.resize(object_index + 1);
    }

    void *render_object = omp->user_data;
    if (omp->type == RenderMeshObject::TYPE) {
        // Cast the `render_object` to a `MeshObject`.
        RenderMeshObject *rmo = (RenderMeshObject*)render_object;
        // If needed, do more stuff with `rmo`.
    }

    // Store the `render_object` and `type`.
    _objects[object_index] = render_object;
    _object_types[object_index] = omp->type;

    if (omp->render_handle >= _object_lut.size())
        _object_lut.resize(omp->render_handle + 1);

    // The `render_handle` is used to map from the main thread representation
    // to the `object_index` of the render thread representation.
    _object_lut[omp->render_handle] = object_index;
}
</code></pre>
<p>The <code>_objects</code> array stores objects of all kinds of different types. It is defined as:</p>
<pre><code>Array<void*> _objects;
</code></pre>
<p>The types of the objects are stored in a corresponding <code>_object_types</code> array, defined as:</p>
<pre><code>Array<uint32_t> _object_types;
</code></pre>
<p>From <code>_object_types</code>, we know the actual type of the objects and we can use that to cast the <code>void *</code> into the proper type (mesh, terrain, gui, particle_system, etc).</p>
<p>The culling happens in the <code>// If needed, do more stuff with rmo</code> section above. It looks like this:</p>
<pre><code>void *render_object = omp->user_data;
if (omp->type == RenderMeshObject::TYPE) {
    // Cast the `render_object` to a `MeshObject`.
    RenderMeshObject *rmo = (RenderMeshObject*)render_object;
    // If needed, do more stuff with `rmo`.
    if (!(rmo->flags() & renderable::CULLING_DISABLED)) {
        culling::Object o;

        // Extract necessary information to do culling.
        // The index of the object.
        o.id = object_index;
        // The type of the object.
        o.type = rmo->type;
        // Get the minimum and maximum corner positions of a bounding box in object space.
        o.min = float4(rmo->bounding_volume().min, 1.f);
        o.max = float4(rmo->bounding_volume().max, 1.f);
        // World transform matrix.
        o.m = float4x4(rmo->world());

        // Depending on the value of `flags` add the culling representation to different culling sets.
        if (rmo->flags() & renderable::VIEWPORT_VISIBLE)
            _cullable_objects.add(o, rmo->node());
        if (rmo->flags() & renderable::SHADOW_CASTER)
            _cullable_shadow_casters.add(o, rmo->node());
        if (rmo->flags() & renderable::OCCLUDER)
            _occluders.add(o, rmo->node());
    }
}
</code></pre>
<p>For culling, <code>MeshObject</code>s and other cullable types are represented by <code>culling::Object</code>s that are used to populate the culling data structures. As can be seen in the code, the culling sets are <code>_cullable_objects</code>, <code>_cullable_shadow_casters</code> and <code>_occluders</code>, and they are all represented by an <code>ObjectSet</code>:</p>
<pre><code>struct ObjectSet
{
    // Minimum bounding box corner position.
    Array<float> min_x;
    Array<float> min_y;
    Array<float> min_z;
    // Maximum bounding box corner position.
    Array<float> max_x;
    Array<float> max_y;
    Array<float> max_z;
    // Object->world matrix.
    Array<float> world_xx;
    Array<float> world_xy;
    Array<float> world_xz;
    Array<float> world_xw;
    Array<float> world_yx;
    Array<float> world_yy;
    Array<float> world_yz;
    Array<float> world_yw;
    Array<float> world_zx;
    Array<float> world_zy;
    Array<float> world_zz;
    Array<float> world_zw;
    Array<float> world_tx;
    Array<float> world_ty;
    Array<float> world_tz;
    Array<float> world_tw;
    // World space center position of bounding sphere.
    Array<float> ws_pos_x;
    Array<float> ws_pos_y;
    Array<float> ws_pos_z;
    // Radius of bounding sphere.
    Array<float> radius;
    // Flag to indicate if an object is culled or not.
    Array<uint32_t> visibility_flag;
    // The type and id of an object.
    Array<uint32_t> type;
    Array<uint32_t> id;
    uint32_t n_objects;
};
</code></pre>
<p>When an object is added to, e.g. <code>_cullable_objects</code> the <code>culling::Object</code> data is added to the <code>ObjectSet</code>. The <code>ObjectSet</code> flattens the data into a structure-of-arrays representation. The arrays are padded to the SIMD lane count to make sure there's valid data to read.</p>
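<p>As an illustration, adding a <code>culling::Object</code> to a set amounts to appending one element to each array, including deriving the world space bounding sphere from the object space bounding box and the world matrix. The sketch below is simplified (a conservative sphere from the transformed box, using hypothetical <code>float3</code>, <code>transform()</code> and <code>max_scale()</code> helpers and a hypothetical <code>push_back()</code> on <code>Array</code>) and not the exact Stingray code:</p>
<pre><code>void add_to_object_set(ObjectSet &set, const culling::Object &o)
{
    // Object space bounds, stored as structure-of-arrays.
    set.min_x.push_back(o.min.x); set.min_y.push_back(o.min.y); set.min_z.push_back(o.min.z);
    set.max_x.push_back(o.max.x); set.max_y.push_back(o.max.y); set.max_z.push_back(o.max.z);
    // ... push the 16 world matrix elements into world_xx .. world_tw in the same way ...

    // Conservative world space bounding sphere: transformed box center + scaled half extents.
    const float3 center_os = 0.5f * (float3(o.min.x, o.min.y, o.min.z) + float3(o.max.x, o.max.y, o.max.z));
    const float3 center_ws = transform(o.m, center_os);          // hypothetical point transform
    const float3 half = 0.5f * (float3(o.max.x, o.max.y, o.max.z) - float3(o.min.x, o.min.y, o.min.z));
    const float radius = length(half) * max_scale(o.m);          // max_scale(): hypothetical helper

    set.ws_pos_x.push_back(center_ws.x);
    set.ws_pos_y.push_back(center_ws.y);
    set.ws_pos_z.push_back(center_ws.z);
    set.radius.push_back(radius);

    set.visibility_flag.push_back(0);
    set.type.push_back(o.type);
    set.id.push_back(o.id);
    ++set.n_objects;

    // Note: the arrays also need to stay padded to the SIMD lane count, omitted here.
}
</code></pre>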
<h1>Frustum-sphere culling</h1>
<p>The world space positions and sphere radii of objects are represented by the following members of the <code>ObjectSet</code>:</p>
<pre><code>Array<float> ws_pos_x;
Array<float> ws_pos_y;
Array<float> ws_pos_z;
Array<float> radius;
</code></pre>
<p>This is all we need to do frustum-sphere culling.</p>
<p>The frustum-sphere culling needs the planes of the frustum defined in world space. Information on how to find that can be found in: <a href="http://gamedevs.org/uploads/fast-extraction-viewing-frustum-planes-from-world-view-projection-matrix.pdf">http://gamedevs.org/uploads/fast-extraction-viewing-frustum-planes-from-world-view-projection-matrix.pdf</a>.</p>
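<p>For completeness, here is roughly what that extraction looks like. Row/column conventions, the clip space depth range and the sign of <code>d</code> depend on your math library, so treat this as a sketch rather than drop-in code (it assumes column vectors and a GL-style clip space):</p>
<pre><code>#include <cmath>

struct Plane { float nx, ny, nz, d; };   // plane stored so that "inside" means n.x*x + n.y*y + n.z*z >= d

// m[row][col] is the combined world -> clip (view-projection) matrix.
// Left/right/bottom/top/near/far planes come from sums and differences of rows 0-2 with row 3.
void extract_frustum_planes(const float m[4][4], Plane planes[6])
{
    const int rows[6]  = { 0, 0, 1, 1, 2, 2 };
    const int signs[6] = { 1, -1, 1, -1, 1, -1 };   // +row => left/bottom/near, -row => right/top/far

    for (int p = 0; p != 6; ++p) {
        const int r = rows[p];
        const float s = (float)signs[p];
        planes[p].nx = m[3][0] + s * m[r][0];
        planes[p].ny = m[3][1] + s * m[r][1];
        planes[p].nz = m[3][2] + s * m[r][2];
        planes[p].d  = -(m[3][3] + s * m[r][3]);

        // Normalize so that `d` and sphere radii are in the same units.
        const float len = sqrtf(planes[p].nx * planes[p].nx +
                                planes[p].ny * planes[p].ny +
                                planes[p].nz * planes[p].nz);
        planes[p].nx /= len; planes[p].ny /= len; planes[p].nz /= len; planes[p].d /= len;
    }
}
</code></pre>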
<p>The frustum-sphere intersection code tests one plane against several spheres using SIMD instructions. The <code>ObjectSet</code> data is already laid out in a SIMD friendly way. To test one plane against several spheres, the plane's data is splatted out in the following way: </p>
<pre><code>// `float4` is our cross platform abstraction of SSE, NEON etc.
struct SIMDPlane
{
    float4 normal_x; // the normal's x value replicated 4 times.
    float4 normal_y; // the normal's y value replicated 4 times.
    float4 normal_z; // etc.
    float4 d;
};
</code></pre>
<p>The single threaded code needed to do frustum-sphere culling is:</p>
<pre><code>void simd_sphere_culling(const SIMDPlane planes[6], culling::ObjectSet &object_set)
{
const auto all_true = bool4_all_true();
const uint32_t n_objects = object_set.n_objects;
uint32_t *visibility_flag = object_set.visibility_flag.begin();
// Test each plane of the frustum against each sphere.
for (uint32_t i = 0; i < n_objects; i += 4)
{
const auto ws_pos_x = float4_load_aligned(&object_set->ws_pos_x[i]);
const auto ws_pos_y = float4_load_aligned(&object_set->ws_pos_y[i]);
const auto ws_pos_z = float4_load_aligned(&object_set->ws_pos_z[i]);
const auto radius = float4_load_aligned(&object_set->radius[i]);
auto inside = all_true;
for (unsigned p = 0; p < 6; ++p) {
auto &n_x = planes[p].normal_x;
auto &n_y = planes[p].normal_y;
auto &n_z = planes[p].normal_z;
auto n_dot_pos = dot_product(ws_pos_x, ws_pos_y, ws_pos_z, n_x, n_y, n_z);
auto plane_test_point = n_dot_pos + radius;
auto plane_test = plane_test_point >= planes[p].d;
inside = vector_and(plane_test, inside);
}
// Store 0 for spheres that ended up completely outside any of the frustum
// planes. Store 0xffffffff for spheres that are visible.
store_aligned(inside, &visibility_flag[i]);
}
}
</code></pre>
<p>After the <code>simd_sphere_culling</code> call, the <code>visibility_flag</code> array contains <code>0</code> for all objects that failed the test and <code>0xffffffff</code> for all objects that passed. We chain this together with the OOBB culling by doing a compaction pass over the <code>visibility_flag</code> array and populating an <code>indirection</code> array:</p>
<pre><code>{
// Splat out the planes to be able to do plane-sphere test with SIMD.
const auto &frustum = camera.frustum();
const SIMDPlane planes[6] = {
float4_splat(frustum.planes[0].n.x),
float4_splat(frustum.planes[0].n.y),
float4_splat(frustum.planes[0].n.z),
float4_splat(frustum.planes[0].d),
float4_splat(frustum.planes[1].n.x),
float4_splat(frustum.planes[1].n.y),
float4_splat(frustum.planes[1].n.z),
float4_splat(frustum.planes[1].d),
float4_splat(frustum.planes[2].n.x),
float4_splat(frustum.planes[2].n.y),
float4_splat(frustum.planes[2].n.z),
float4_splat(frustum.planes[2].d),
float4_splat(frustum.planes[3].n.x),
float4_splat(frustum.planes[3].n.y),
float4_splat(frustum.planes[3].n.z),
float4_splat(frustum.planes[3].d),
float4_splat(frustum.planes[4].n.x),
float4_splat(frustum.planes[4].n.y),
float4_splat(frustum.planes[4].n.z),
float4_splat(frustum.planes[4].d),
float4_splat(frustum.planes[5].n.x),
float4_splat(frustum.planes[5].n.y),
float4_splat(frustum.planes[5].n.z),
float4_splat(frustum.planes[5].d),
};
// Do frustum-sphere culling.
simd_sphere_culling(planes, object_set);
// Make sure to align the size to the simd lane count.
const uint32_t n_aligned_objects = align_to_simd_lane_count(object_set.n_objects);
// Store the indices of the objects that passed the frustum-sphere culling in the `indirection` array.
Array<uint32_t> indirection(n_aligned_objects);
const uint32_t n_visible = remove_not_visible(object_set, object_set.n_objects, indirection.begin());
}
</code></pre>
<p>Where <code>remove_not_visible</code> is:</p>
<pre><code>uint32_t remove_not_visible(const ObjectSet &object_set, uint32_t count, uint32_t *output_indirection)
{
const uint32_t *visibility_flag = object_set.visibility_flag.begin();
uint32_t n_visible = 0U;
for (uint32_t i = 0; i < count; ++i) {
if (visibility_flag[i]) {
output_indirection[n_visible] = i;
++n_visible;
}
}
const uint32_t n_aligned_visible = align_to_simd_lane_count(n_visible);
const uint32_t last_visible = n_visible ? output_indirection[n_visible - 1] : 0;
// Pad out to the simd alignment.
for (unsigned i = n_visible; i < n_aligned_visible; ++i)
output_indirection[i] = last_visible;
return n_visible;
}
</code></pre>
<p><code>n_visible</code> together with <code>indirection</code> provides the input for doing the frustum-OOBB culling on the objects that survived the frustum-sphere culling.</p>
<h1>Frustum-OOBB culling</h1>
<p>The frustum-OOBB culling takes ideas from Fabian Giesen's <a href="https://fgiesen.wordpress.com/2010/10/17/view-frustum-culling/">https://fgiesen.wordpress.com/2010/10/17/view-frustum-culling/</a> and Arseny Kapoulkine's <a href="http://zeuxcg.org/2009/01/31/view-frustum-culling-optimization-introduction/">http://zeuxcg.org/2009/01/31/view-frustum-culling-optimization-introduction/</a>.</p>
<p>More specifically we use <code>Method 2: Transform box vertices to clip space, test against clip-space planes</code> that both Fabian and Arseny write about. But we also go with <code>Method 2b: Saving arithmetic ops</code> that Fabian mentions. I won't delve into how the culling actually works; to understand that, please read their posts.</p>
<p>The code is SIMDified to process several OOBBs at the same time. The same corner of four OOBBs is tested against one frustum plane in a single SIMD operation.</p>
<p>To be able to write the SIMD code in a more intuitive form a few data structures and functions are used:</p>
<pre><code>struct SIMDVector
{
float4 x; // stores x0, x1, x2, x3
float4 y; // stores y0, y1, y2, y3
float4 z; // etc.
float4 w;
};
</code></pre>
<p>A <code>SIMDVector</code> stores <code>x</code>, <code>y</code>, <code>z</code> & <code>w</code> for four objects. To store a matrix for four objects a <code>SIMDMatrix</code> is used:</p>
<pre><code>struct SIMDMatrix
{
SIMDVector x;
SIMDVector y;
SIMDVector z;
SIMDVector w;
};
</code></pre>
<p>A <code>SIMDMatrix</code>-<code>SIMDVector</code> multiplication can then be written as a regular matrix-vector multiplication:</p>
<pre><code>SIMDVector simd_multiply(const SIMDVector &v, const SIMDMatrix &m)
{
float4 x = v.x * m.x.x; x = v.y * m.y.x + x; x = v.z * m.z.x + x; x = v.w * m.w.x + x;
float4 y = v.x * m.x.y; y = v.y * m.y.y + y; y = v.z * m.z.y + y; y = v.w * m.w.y + y;
float4 z = v.x * m.x.z; z = v.y * m.y.z + z; z = v.z * m.z.z + z; z = v.w * m.w.z + z;
float4 w = v.x * m.x.w; w = v.y * m.y.w + w; w = v.z * m.z.w + w; w = v.w * m.w.w + w;
SIMDVector res = { x, y, z, w };
return res;
}
</code></pre>
<p>A <code>SIMDMatrix</code>-<code>SIMDMatrix</code> multiplication is:</p>
<pre><code>SIMDMatrix simd_multiply(const SIMDMatrix &lhs, const SIMDMatrix &rhs)
{
SIMDVector x = simd_multiply(lhs.x, rhs);
SIMDVector y = simd_multiply(lhs.y, rhs);
SIMDVector z = simd_multiply(lhs.z, rhs);
SIMDVector w = simd_multiply(lhs.w, rhs);
SIMDMatrix res = { x, y, z, w };
return res;
}
</code></pre>
<p>The code needed to do the actual frustum-OOBB culling is:</p>
<pre><code>void simd_oobb_culling(const SIMDMatrix &view_proj, culling::ObjectSet &object_set, uint32_t n_objects, const uint32_t *indirection)
{
// Get pointers to the necessary members of the object set.
const float *min_x = object_set.min_x.begin();
const float *min_y = object_set.min_y.begin();
const float *min_z = object_set.min_z.begin();
const float *max_x = object_set.max_x.begin();
const float *max_y = object_set.max_y.begin();
const float *max_z = object_set.max_z.begin();
const float *world_xx = object_set.world_xx.begin();
const float *world_xy = object_set.world_xy.begin();
const float *world_xz = object_set.world_xz.begin();
const float *world_xw = object_set.world_xw.begin();
const float *world_yx = object_set.world_yx.begin();
const float *world_yy = object_set.world_yy.begin();
const float *world_yz = object_set.world_yz.begin();
const float *world_yw = object_set.world_yw.begin();
const float *world_zx = object_set.world_zx.begin();
const float *world_zy = object_set.world_zy.begin();
const float *world_zz = object_set.world_zz.begin();
const float *world_zw = object_set.world_zw.begin();
const float *world_tx = object_set.world_tx.begin();
const float *world_ty = object_set.world_ty.begin();
const float *world_tz = object_set.world_tz.begin();
const float *world_tw = object_set.world_tw.begin();
uint32_t *visibility_flag = object_set.visibility_flag.begin();
for (uint32_t i = 0; i < n_objects; i += 4) {
SIMDMatrix world;
// Load the world transform matrix for four objects via the indirection table.
const uint32_t i0 = indirection[i];
const uint32_t i1 = indirection[i + 1];
const uint32_t i2 = indirection[i + 2];
const uint32_t i3 = indirection[i + 3];
world.x.x = float4(world_xx[i0], world_xx[i1], world_xx[i2], world_xx[i3]);
world.x.y = float4(world_xy[i0], world_xy[i1], world_xy[i2], world_xy[i3]);
world.x.z = float4(world_xz[i0], world_xz[i1], world_xz[i2], world_xz[i3]);
world.x.w = float4(world_xw[i0], world_xw[i1], world_xw[i2], world_xw[i3]);
world.y.x = float4(world_yx[i0], world_yx[i1], world_yx[i2], world_yx[i3]);
world.y.y = float4(world_yy[i0], world_yy[i1], world_yy[i2], world_yy[i3]);
world.y.z = float4(world_yz[i0], world_yz[i1], world_yz[i2], world_yz[i3]);
world.y.w = float4(world_yw[i0], world_yw[i1], world_yw[i2], world_yw[i3]);
world.z.x = float4(world_zx[i0], world_zx[i1], world_zx[i2], world_zx[i3]);
world.z.y = float4(world_zy[i0], world_zy[i1], world_zy[i2], world_zy[i3]);
world.z.z = float4(world_zz[i0], world_zz[i1], world_zz[i2], world_zz[i3]);
world.z.w = float4(world_zw[i0], world_zw[i1], world_zw[i2], world_zw[i3]);
world.w.x = float4(world_tx[i0], world_tx[i1], world_tx[i2], world_tx[i3]);
world.w.y = float4(world_ty[i0], world_ty[i1], world_ty[i2], world_ty[i3]);
world.w.z = float4(world_tz[i0], world_tz[i1], world_tz[i2], world_tz[i3]);
world.w.w = float4(world_tw[i0], world_tw[i1], world_tw[i2], world_tw[i3]);
// Create the matrix to go from object->world->view->clip space.
const auto clip = simd_multiply(world, view_proj);
SIMDVector min_pos;
SIMDVector max_pos;
// Load the mininum and maximum corner positions of the bounding box in object space.
min_pos.x = float4(min_x[i0], min_x[i1], min_x[i2], min_x[i3]);
min_pos.y = float4(min_y[i0], min_y[i1], min_y[i2], min_y[i3]);
min_pos.z = float4(min_z[i0], min_z[i1], min_z[i2], min_z[i3]);
min_pos.w = float4_splat(1.0f);
max_pos.x = float4(max_x[i0], max_x[i1], max_x[i2], max_x[i3]);
max_pos.y = float4(max_y[i0], max_y[i1], max_y[i2], max_y[i3]);
max_pos.z = float4(max_z[i0], max_z[i1], max_z[i2], max_z[i3]);
max_pos.w = float4_splat(1.0f);
SIMDVector clip_pos[8];
// Transform each bounding box corner from object to clip space by sharing calculations.
simd_min_max_transform(clip, min_pos, max_pos, clip_pos);
const auto zero = float4_zero();
const auto all_true = bool4_all_true();
// Initialize test conditions.
auto all_x_less = all_true;
auto all_x_greater = all_true;
auto all_y_less = all_true;
auto all_y_greater = all_true;
auto all_z_less = all_true;
auto any_z_less = bool4_all_false();
auto all_z_greater = all_true;
// Test the corners of the OOBBs against the clip-space planes. An object is culled
// only if all of its corners are outside the same plane.
for (unsigned cs = 0; cs < 8; ++cs) {
const auto neg_cs_w = negate(clip_pos[cs].w);
auto x_le = clip_pos[cs].x <= neg_cs_w;
auto x_ge = clip_pos[cs].x >= clip_pos[cs].w;
all_x_less = vector_and(x_le, all_x_less);
all_x_greater = vector_and(x_ge, all_x_greater);
auto y_le = clip_pos[cs].y <= neg_cs_w;
auto y_ge = clip_pos[cs].y >= clip_pos[cs].w;
all_y_less = vector_and(y_le, all_y_less);
all_y_greater = vector_and(y_ge, all_y_greater);
auto z_le = clip_pos[cs].z <= zero;
auto z_ge = clip_pos[cs].z >= clip_pos[cs].w;
all_z_less = vector_and(z_le, all_z_less);
all_z_greater = vector_and(z_ge, all_z_greater);
any_z_less = vector_or(z_le, any_z_less);
}
const auto any_x_outside = vector_or(all_x_less, all_x_greater);
const auto any_y_outside = vector_or(all_y_less, all_y_greater);
const auto any_z_outside = vector_or(all_z_less, all_z_greater);
auto outside = vector_or(any_x_outside, any_y_outside);
outside = vector_or(outside, any_z_outside);
auto inside = vector_xor(outside, all_true);
// Store the result in the `visibility_flag` array in a compacted way.
store_aligned(inside, &visibility_flag[i]);
}
}
</code></pre>
<p>The function <code>simd_min_max_transform</code> used above transforms each OOBB corner from object space to clip space while sharing some of the calculations. For completeness, the function is:</p>
<pre><code>void simd_min_max_transform(const SIMDMatrix &m, const SIMDVector &min, const SIMDVector &max, SIMDVector result[])
{
auto m_xx_x = m.x.x * min.x; m_xx_x = m_xx_x + m.w.x;
auto m_xy_x = m.x.y * min.x; m_xy_x = m_xy_x + m.w.y;
auto m_xz_x = m.x.z * min.x; m_xz_x = m_xz_x + m.w.z;
auto m_xw_x = m.x.w * min.x; m_xw_x = m_xw_x + m.w.w;
auto m_xx_X = m.x.x * max.x; m_xx_X = m_xx_X + m.w.x;
auto m_xy_X = m.x.y * max.x; m_xy_X = m_xy_X + m.w.y;
auto m_xz_X = m.x.z * max.x; m_xz_X = m_xz_X + m.w.z;
auto m_xw_X = m.x.w * max.x; m_xw_X = m_xw_X + m.w.w;
auto m_yx_y = m.y.x * min.y;
auto m_yy_y = m.y.y * min.y;
auto m_yz_y = m.y.z * min.y;
auto m_yw_y = m.y.w * min.y;
auto m_yx_Y = m.y.x * max.y;
auto m_yy_Y = m.y.y * max.y;
auto m_yz_Y = m.y.z * max.y;
auto m_yw_Y = m.y.w * max.y;
auto m_zx_z = m.z.x * min.z;
auto m_zy_z = m.z.y * min.z;
auto m_zz_z = m.z.z * min.z;
auto m_zw_z = m.z.w * min.z;
auto m_zx_Z = m.z.x * max.z;
auto m_zy_Z = m.z.y * max.z;
auto m_zz_Z = m.z.z * max.z;
auto m_zw_Z = m.z.w * max.z;
{
auto xyz_x = m_xx_x + m_yx_y; xyz_x = xyz_x + m_zx_z;
auto xyz_y = m_xy_x + m_yy_y; xyz_y = xyz_y + m_zy_z;
auto xyz_z = m_xz_x + m_yz_y; xyz_z = xyz_z + m_zz_z;
auto xyz_w = m_xw_x + m_yw_y; xyz_w = xyz_w + m_zw_z;
result[0].x = xyz_x;
result[0].y = xyz_y;
result[0].z = xyz_z;
result[0].w = xyz_w;
}
{
auto Xyz_x = m_xx_X + m_yx_y; Xyz_x = Xyz_x + m_zx_z;
auto Xyz_y = m_xy_X + m_yy_y; Xyz_y = Xyz_y + m_zy_z;
auto Xyz_z = m_xz_X + m_yz_y; Xyz_z = Xyz_z + m_zz_z;
auto Xyz_w = m_xw_X + m_yw_y; Xyz_w = Xyz_w + m_zw_z;
result[1].x = Xyz_x;
result[1].y = Xyz_y;
result[1].z = Xyz_z;
result[1].w = Xyz_w;
}
{
auto xYz_x = m_xx_x + m_yx_Y; xYz_x = xYz_x + m_zx_z;
auto xYz_y = m_xy_x + m_yy_Y; xYz_y = xYz_y + m_zy_z;
auto xYz_z = m_xz_x + m_yz_Y; xYz_z = xYz_z + m_zz_z;
auto xYz_w = m_xw_x + m_yw_Y; xYz_w = xYz_w + m_zw_z;
result[2].x = xYz_x;
result[2].y = xYz_y;
result[2].z = xYz_z;
result[2].w = xYz_w;
}
{
auto XYz_x = m_xx_X + m_yx_Y; XYz_x = XYz_x + m_zx_z;
auto XYz_y = m_xy_X + m_yy_Y; XYz_y = XYz_y + m_zy_z;
auto XYz_z = m_xz_X + m_yz_Y; XYz_z = XYz_z + m_zz_z;
auto XYz_w = m_xw_X + m_yw_Y; XYz_w = XYz_w + m_zw_z;
result[3].x = XYz_x;
result[3].y = XYz_y;
result[3].z = XYz_z;
result[3].w = XYz_w;
}
{
auto xyZ_x = m_xx_x + m_yx_y; xyZ_x = xyZ_x + m_zx_Z;
auto xyZ_y = m_xy_x + m_yy_y; xyZ_y = xyZ_y + m_zy_Z;
auto xyZ_z = m_xz_x + m_yz_y; xyZ_z = xyZ_z + m_zz_Z;
auto xyZ_w = m_xw_x + m_yw_y; xyZ_w = xyZ_w + m_zw_Z;
result[4].x = xyZ_x;
result[4].y = xyZ_y;
result[4].z = xyZ_z;
result[4].w = xyZ_w;
}
{
auto XyZ_x = m_xx_X + m_yx_y; XyZ_x = XyZ_x + m_zx_Z;
auto XyZ_y = m_xy_X + m_yy_y; XyZ_y = XyZ_y + m_zy_Z;
auto XyZ_z = m_xz_X + m_yz_y; XyZ_z = XyZ_z + m_zz_Z;
auto XyZ_w = m_xw_X + m_yw_y; XyZ_w = XyZ_w + m_zw_Z;
result[5].x = XyZ_x;
result[5].y = XyZ_y;
result[5].z = XyZ_z;
result[5].w = XyZ_w;
}
{
auto xYZ_x = m_xx_x + m_yx_Y; xYZ_x = xYZ_x + m_zx_Z;
auto xYZ_y = m_xy_x + m_yy_Y; xYZ_y = xYZ_y + m_zy_Z;
auto xYZ_z = m_xz_x + m_yz_Y; xYZ_z = xYZ_z + m_zz_Z;
auto xYZ_w = m_xw_x + m_yw_Y; xYZ_w = xYZ_w + m_zw_Z;
result[6].x = xYZ_x;
result[6].y = xYZ_y;
result[6].z = xYZ_z;
result[6].w = xYZ_w;
}
{
auto XYZ_x = m_xx_X + m_yx_Y; XYZ_x = XYZ_x + m_zx_Z;
auto XYZ_y = m_xy_X + m_yy_Y; XYZ_y = XYZ_y + m_zy_Z;
auto XYZ_z = m_xz_X + m_yz_Y; XYZ_z = XYZ_z + m_zz_Z;
auto XYZ_w = m_xw_X + m_yw_Y; XYZ_w = XYZ_w + m_zw_Z;
result[7].x = XYZ_x;
result[7].y = XYZ_y;
result[7].z = XYZ_z;
result[7].w = XYZ_w;
}
}
</code></pre>
<p>To get a compact indirection array of all the objects that passed the frustum-OOBB culling, the <code>remove_not_visible</code> function needs to be slightly modified:</p>
<pre><code>uint32_t remove_not_visible(const ObjectSet &object_set, uint32_t count, uint32_t *output_indirection, const uint32_t *input_indirection/*new argument*/)
{
const uint32_t *visibility_flag = object_set.visibility_flag.begin();
uint32_t n_visible = 0U;
for (uint32_t i = 0; i < count; ++i) {
// Each element of `input_indirection` maps a compacted index to an actual object index.
// If it's not null, do a lookup to get the actual object index, else use `i` directly.
const uint32_t index = input_indirection ? input_indirection[i] : i;
// `visibility_flag` is already compacted, so it is indexed with `i` directly.
if (visibility_flag[i]) {
output_indirection[n_visible] = index;
++n_visible;
}
}
const uint32_t n_aligned_visible = align_to_simd_lane_count(n_visible);
const uint32_t last_visible = n_visible ? output_indirection[n_visible - 1] : 0;
// Pad out to the simd alignment.
for (unsigned i = n_visible; i < n_aligned_visible; ++i)
output_indirection[i] = last_visible;
return n_visible;
}
</code></pre>
<p>Bringing the frustum-sphere and frustum-OOBB code together we get:</p>
<pre><code>{
// Splat out the planes to be able to do plane-sphere test with SIMD.
const auto &frustum = camera.frustum();
const SIMDPlane planes[6] = {
float4_splat(frustum.planes[0].n.x),
float4_splat(frustum.planes[0].n.y),
float4_splat(frustum.planes[0].n.z),
float4_splat(frustum.planes[0].d),
float4_splat(frustum.planes[1].n.x),
float4_splat(frustum.planes[1].n.y),
float4_splat(frustum.planes[1].n.z),
float4_splat(frustum.planes[1].d),
float4_splat(frustum.planes[2].n.x),
float4_splat(frustum.planes[2].n.y),
float4_splat(frustum.planes[2].n.z),
float4_splat(frustum.planes[2].d),
float4_splat(frustum.planes[3].n.x),
float4_splat(frustum.planes[3].n.y),
float4_splat(frustum.planes[3].n.z),
float4_splat(frustum.planes[3].d),
float4_splat(frustum.planes[4].n.x),
float4_splat(frustum.planes[4].n.y),
float4_splat(frustum.planes[4].n.z),
float4_splat(frustum.planes[4].d),
float4_splat(frustum.planes[5].n.x),
float4_splat(frustum.planes[5].n.y),
float4_splat(frustum.planes[5].n.z),
float4_splat(frustum.planes[5].d),
};
// Do frustum-sphere culling.
simd_sphere_culling(planes, object_set);
// Make sure to align the size to the simd lane count.
const uint32_t n_aligned_objects = align_to_simd_lane_count(object_set.n_objects);
// Store the indices of the objects that passed the frustum-sphere culling in the `indirection` array.
Array<uint32_t> indirection(n_aligned_objects);
const uint32_t n_visible = remove_not_visible(object_set, object_set.n_objects, indirection.begin(), nullptr);
const auto &view_proj = camera.view() * camera.proj();
// Construct the SIMDMatrix `simd_view_proj`.
const SIMDMatrix simd_view_proj = {
float4_splat(view_proj.v[xx]),
float4_splat(view_proj.v[xy]),
float4_splat(view_proj.v[xz]),
float4_splat(view_proj.v[xw]),
float4_splat(view_proj.v[yx]),
float4_splat(view_proj.v[yy]),
float4_splat(view_proj.v[yz]),
float4_splat(view_proj.v[yw]),
float4_splat(view_proj.v[zx]),
float4_splat(view_proj.v[zy]),
float4_splat(view_proj.v[zz]),
float4_splat(view_proj.v[zw]),
float4_splat(view_proj.v[tx]),
float4_splat(view_proj.v[ty]),
float4_splat(view_proj.v[tz]),
float4_splat(view_proj.v[tw]),
};
// Cull objects via frustum-oobb tests.
simd_oobb_culling(simd_view_proj, object_set, n_visible, indirection.begin());
// Build up the indirection array that represents the objects that survived the frustum-oobb culling.
const uint32_t n_oobb_visible = remove_not_visible(object_set, n_visible, indirection.begin(), indirection.begin());
}
</code></pre>
<p>The final call to <code>remove_not_visible</code> populates the <code>indirection</code> array with the objects that passed both the frustum-sphere and the frustum-OOBB culling. <code>indirection</code> together with <code>n_oobb_visible</code> is all that is needed to know what objects should be rendered.</p>
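<p>To illustrate how that result might be consumed, a renderer could walk the <code>indirection</code> array and use the <code>type</code> and <code>id</code> arrays of the <code>ObjectSet</code> to find the actual objects to draw. This is only a sketch; <code>gather_for_rendering</code> is a hypothetical function, not part of the engine:</p>
<pre><code>// Sketch: hand all objects that survived both culling passes over to the renderer.
for (uint32_t i = 0; i < n_oobb_visible; ++i) {
    const uint32_t object_index = indirection[i];
    const uint32_t type = object_set.type[object_index];
    const uint32_t id = object_set.id[object_index];
    // `gather_for_rendering` is hypothetical -- whatever collects draw calls for this view.
    gather_for_rendering(type, id);
}
</code></pre>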
<h1>Distributing the work over several threads</h1>
<p>In Stingray, work is distributed by submitting jobs to a pool of worker threads -- conveniently called the <code>ThreadPool</code>. Submitted jobs are put in a thread-safe work queue from which the worker threads pop jobs to work on. A task is defined as:</p>
<pre><code>typedef void (*TaskCallback)(void *user_data);
struct TaskDefinition
{
TaskCallback callback;
void *user_data;
};
</code></pre>
<p>For the purpose of this article, the interesting methods of the <code>ThreadPool</code> are:</p>
<pre><code>class ThreadPool
{
// Adds `count` tasks to the work queue.
void add_tasks(const TaskDefinition *tasks, uint32_t count);
// Tries to pop one task from the queue and do that work. Returns true if any work was done.
bool do_work();
// Will call `do_work` while `signal` == value.
void wait_atomic(std::atomic<uint32_t> *signal, uint32_t value);
};
</code></pre>
<p>The <code>ThreadPool</code> doesn't dictate how to synchronize when a job is fully processed, but usually a <code>std::atomic<uint32_t> signal</code> is used for that purpose. The value is <code>0</code> while the job is being processed and set to <code>1</code> when it's done. <code>wait_atomic()</code> is a convenience method that can be used to wait for such values:</p>
<pre><code>void ThreadPool::wait_atomic(std::atomic<uint32_t> *signal, uint32_t value)
{
while (signal->load(std::memory_order_acquire) == value) {
if (!do_work())
YieldProcessor();
}
}
</code></pre>
<p><code>do_work</code>:</p>
<pre><code>bool ThreadPool::do_work()
{
TaskDefinition task;
if (pop_task(task)) {
task.callback(task.user_data);
return true;
}
return false;
}
</code></pre>
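<p><code>pop_task</code> is not shown in this post and its implementation details don't matter here. To make the contract that <code>do_work</code> relies on explicit, here is a minimal mutex-based sketch; the actual engine queue may well look different (e.g. lock-free), and the <code>_queue_mutex</code> and <code>_task_queue</code> members are assumptions for this example:</p>
<pre><code>// Minimal sketch of a thread-safe task queue, not the actual engine code.
// Assumes the ThreadPool has `std::mutex _queue_mutex` and
// `std::deque<TaskDefinition> _task_queue` members.
bool ThreadPool::pop_task(TaskDefinition &out_task)
{
    std::lock_guard<std::mutex> lock(_queue_mutex);
    if (_task_queue.empty())
        return false;
    out_task = _task_queue.front();
    _task_queue.pop_front();
    return true;
}
</code></pre>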
<p>Multi-threading the culling only requires a few changes to the code. For the <code>simd_sphere_culling()</code> method we need to add <code>offset</code> and <code>count</code> parameters to specify the range of objects we are processing:</p>
<pre><code>void simd_sphere_culling(const SIMDPlane planes[6], culling::ObjectSet &object_set, uint32_t offset, uint32_t count)
{
const auto all_true = bool4_all_true();
const uint32_t n_objects = offset + count;
uint32_t *visibility_flag = object_set.visibility_flag.begin();
// Test each plane of the frustum against each sphere.
for (uint32_t i = offset; i < n_objects; i += 4)
{
const auto ws_pos_x = float4_load_aligned(&object_set.ws_pos_x[i]);
const auto ws_pos_y = float4_load_aligned(&object_set.ws_pos_y[i]);
const auto ws_pos_z = float4_load_aligned(&object_set.ws_pos_z[i]);
const auto radius = float4_load_aligned(&object_set.radius[i]);
auto inside = all_true;
for (unsigned p = 0; p < 6; ++p) {
auto &n_x = planes[p].normal_x;
auto &n_y = planes[p].normal_y;
auto &n_z = planes[p].normal_z;
auto n_dot_pos = dot_product(ws_pos_x, ws_pos_y, ws_pos_z, n_x, n_y, n_z);
auto plane_test_point = n_dot_pos + radius;
auto plane_test = plane_test_point >= planes[p].d;
inside = vector_and(plane_test, inside);
}
// Store 0 for spheres that ended up completely outside any of the frustum
// planes. Store 0xffffffff for spheres that are visible.
store_aligned(inside, &visibility_flag[i]);
}
}
</code></pre>
<p>Bringing the previous code snippet together with multi-threaded culling:</p>
<pre><code>{
// Calculate the number of work items, where each work item processes `work_size` elements.
const uint32_t work_size = 512;
// `div_ceil(a, b)` calculates `(a + b - 1) / b`.
const uint32_t n_work_items = math::div_ceil(n_objects, work_size);
Array<CullingWorkItem> culling_work_items(n_work_items);
Array<TaskDefinition> tasks(n_work_items);
// Splat out the planes to be able to do plane-sphere test with SIMD.
const auto &frustum = camera.frustum();
const SIMDPlane planes[6] = {
same code as previously shown...
};
// Make sure to align the size to the simd lane count.
const uint32_t n_aligned_objects = align_to_simd_lane_count(object_set.n_objects);
for (unsigned i = 0; i < n_work_items; ++i) {
// The `offset` and `count` for the work item.
const uint32_t offset = math::min(work_size * i, n_objects);
const uint32_t count = math::min(work_size, n_objects - offset);
auto &culling_item = culling_work_items[i];
memcpy(culling_item.planes, planes, sizeof(planes));
culling_item.object_set = &object_set;
culling_item.offset = offset;
culling_item.count = count;
culling_item.signal = 0;
auto &task = tasks[i];
task.callback = simd_sphere_culling_task;
task.user_data = &culling_item;
}
// Add the tasks to the `ThreadPool`.
thread_pool.add_tasks(tasks.begin(), n_work_items);
// Wait for each `item` and if it's not done, help out with the culling work.
for (auto &item : culling_work_items)
thread_pool.wait_atomic(&item.signal, 0);
}
</code></pre>
<p><code>CullingWorkItem</code> and <code>simd_sphere_culling_task</code> are defined as:</p>
<pre><code>struct CullingWorkItem
{
SIMDPlane planes[6];
const culling::ObjectSet *object_set;
uint32_t offset;
uint32_t count;
std::atomic<uint32_t> signal;
};
void simd_sphere_culling_task(void *user_data)
{
auto culling_item = (CullingWorkItem*)(user_data);
// Call the frustum-sphere culling function.
simd_sphere_culling(culling_item->planes, *culling_item->object_set, culling_item->offset, culling_item->count);
// Signal that the work is done.
culling_item->signal.store(1, std::memory_order_release);
}
</code></pre>
<p>The same pattern is used to multi-thread the frustum-OOBB culling. That is "left as an exercise for the reader" ;)</p>
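<p>For reference, such a task could mirror <code>simd_sphere_culling_task</code> almost line for line. The sketch below is an assumption of what it might look like; the work item struct, its fields and the extra <code>offset</code>/<code>count</code> parameters on <code>simd_oobb_culling</code> are illustrative, not the actual engine code:</p>
<pre><code>// Hypothetical work item for the frustum-OOBB pass, mirroring CullingWorkItem.
struct OOBBCullingWorkItem
{
    SIMDMatrix view_proj;
    culling::ObjectSet *object_set;
    const uint32_t *indirection;
    uint32_t offset;
    uint32_t count;
    std::atomic<uint32_t> signal;
};

void simd_oobb_culling_task(void *user_data)
{
    auto item = (OOBBCullingWorkItem*)user_data;
    // Assumes `simd_oobb_culling` has been given `offset`/`count` parameters in the same
    // way as `simd_sphere_culling`, so each work item writes its own range of
    // `visibility_flag` entries.
    simd_oobb_culling(item->view_proj, *item->object_set, item->offset, item->count, item->indirection);
    // Signal that the work is done.
    item->signal.store(1, std::memory_order_release);
}
</code></pre>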
<h1>Conclusion</h1>
<p>This type of culling is done for all of the objects that can be rendered, i.e. meshes, particle systems, terrain, etc. We also use it to cull light sources. It is used both when rendering the main scene and for rendering shadows.</p>
<p>I've left out a few details of our solution. One thing we also do is something called <em>contribution culling</em>. In the frustum-OOBB culling step, the extents of the OOBB corners are projected to the near plane and from that the screen space extents are derived. If the object is smaller than a certain threshold in any axis, the object is considered culled. Special care needs to be taken if any of the corners intersects or is behind the near plane, so we don't have to deal with "external line segments" caused by the projection. If you don't know what that is, see: <a href="http://www.gamasutra.com/view/news/168577/Indepth_Software_rasterizer_and_triangle_clipping.php">http://www.gamasutra.com/view/news/168577/Indepth_Software_rasterizer_and_triangle_clipping.php</a>. In our case, contribution culling is disabled by expanding the extents to span the entire screen when any corner intersects or is behind the near plane.</p>
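<p>To make the idea more concrete, here is a rough scalar sketch of a contribution test for a single object, based on the clip-space corners that the OOBB pass already computes. The function, its threshold parameter and the <code>Vector4</code> type are made up for this example; the real implementation is SIMDified and folded into the OOBB test:</p>
<pre><code>// Rough scalar sketch of contribution culling for one object, not the engine code.
// `clip_corners` are the 8 clip-space corners of the OOBB. Requires <cfloat>.
bool contribution_culled(const Vector4 clip_corners[8], float screen_w, float screen_h, float min_pixels)
{
    float min_x = FLT_MAX, max_x = -FLT_MAX;
    float min_y = FLT_MAX, max_y = -FLT_MAX;
    for (unsigned i = 0; i < 8; ++i) {
        // If any corner intersects or is behind the near plane, disable contribution
        // culling by treating the object as covering the entire screen.
        if (clip_corners[i].w <= 0.0f || clip_corners[i].z <= 0.0f)
            return false;
        // Project to normalized device coordinates.
        const float x = clip_corners[i].x / clip_corners[i].w;
        const float y = clip_corners[i].y / clip_corners[i].w;
        min_x = x < min_x ? x : min_x; max_x = x > max_x ? x : max_x;
        min_y = y < min_y ? y : min_y; max_y = y > max_y ? y : max_y;
    }
    // Screen space extents in pixels (NDC spans [-1, 1] in both axes).
    const float extent_x = (max_x - min_x) * 0.5f * screen_w;
    const float extent_y = (max_y - min_y) * 0.5f * screen_h;
    // Considered culled if smaller than the threshold in any axis.
    return extent_x < min_pixels || extent_y < min_pixels;
}
</code></pre>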
<p>For our cascaded shadow maps, the extents are also used to detect if an object is fully enclosed by a cascade. If that is the case, then that object is culled from the later cascades. Let me illustrate with some ASCII:</p>
<pre><code>+-----------+-----------+
| | |
| /\ | |
| /--\ | |
+-----------+-----------+
| | |
| | |
| | |
+-----------+-----------+
</code></pre>
<p>The squares are the different cascades. The top left square is the first cascade, the top right is the second cascade, bottom left the third and the bottom right is the fourth cascade. In this case the weird triangle-shaped object is fully enclosed by the first cascade. What that means is that the object doesn't need to be rendered to any of the later cascades, since the shadow contribution from that object will be fully taken care of by the first cascade.</p>Andreas Asplundhttp://www.blogger.com/profile/07360893562204949248noreply@blogger.com487tag:blogger.com,1999:blog-1994130783874175266.post-51234853644765878742016-09-07T15:38:00.000+02:002016-09-23T19:57:29.325+02:00State reflection<h1>Overview</h1>
<p>The Stingray engine has two controller threads -- the main thread and the render thread. These two threads build up work for our job system, which is distributed on the remaining threads. The main thread and the render thread are pipelined, so that while the main thread runs the simulation/update for frame <em>N</em>, the render thread is processing the rendering work for the previous frame (<em>N-1</em>). This post will dive into the details of how state is propagated from the main thread to the render thread.</p>
<p>I will use code snippets to explain how the state reflection works. It's mostly actual code from the engine but it has been cleaned up to a certain extent. Some stuff has been renamed and/or removed to make it easier to understand what's going on.</p>
<h1>The main loop</h1>
<p>Here is a slimmed down version of the update loop which is part of the main thread:</p>
<pre><code>while (!quit())
{
// Calls out to the mandatory user-supplied `update` Lua function. Lua is used
// as a scripting language to manipulate objects. From Lua, worlds, objects, etc.
// can be created, manipulated and destroyed. All these changes are recorded
// on a `StateStream` that is a part of each world.
_game->update();
// Flush state changes recorded on the `StateStream` for each world to
// the rendering world representation.
unsigned n_worlds = _worlds.size();
for (uint32_t i = 0; i < n_worlds; ++i) {
auto &world = *_worlds[i];
_render_interface->update_world(world);
}
// Begin a new render frame.
_render_interface->begin_frame();
// Calls out to the user-supplied `render` Lua function. It's up to the script
// to call render() on the worlds it wants rendered. The script controls what
// camera and viewport are used when rendering a world.
_game->render();
// Present the frame.
_render_interface->present_frame();
// End frame.
_render_interface->end_frame(_delta_time);
// Never let the main thread run more than 1 frame ahead of the render thread.
_render_interface->wait_for_fence(_frame_fence);
// Create a new fence for the next frame.
_frame_fence = _render_interface->create_fence();
}
</code></pre>
<p>First thing to point out is the <code>_render_interface</code>. This is not a class full of virtual functions that some other class can inherit from and override as the name might suggest. The word "interface" is used in the sense that it's used to communicate from one thread to another. So in this context the <code>_render_interface</code> is used to post messages from the main thread to the render thread.</p>
<p>As mentioned in the first comment in the code snippet above, Lua is used as our scripting language, and from Lua things such as worlds and objects can be created, destroyed and manipulated.</p>
<p>State is very rarely shared between the main thread and the render thread; instead, each thread has its own representation, and when state is changed on the main thread that state is reflected over to the render thread. E.g., the <code>MeshObject</code>, which is the representation of a mesh with vertex buffers, materials, textures, shaders, skinning data, etc. to be rendered, is the main thread representation, and <code>RenderMeshObject</code> is the corresponding render thread representation. All objects that have a representation on both the main and render thread are set up to work the same way:</p>
<pre><code>class MeshObject : public RenderStateObject
{
};
class RenderMeshObject : public RenderObject
{
};
</code></pre>
<p>The corresponding render thread class is prefixed with <code>Render</code>. We use this naming convention for all objects that have both a main and a render thread representation.</p>
<p>The main thread objects inherit from <code>RenderStateObject</code> and the render thread objects inherit from <code>RenderObject</code>. These structs are defined as:</p>
<pre><code>struct RenderStateObject
{
uint32_t render_handle;
StateReflection *state_reflection;
};
struct RenderObject
{
uint32_t type;
};
</code></pre>
<p>The <code>render_handle</code> is an ID that identifies the corresponding object on the render thread. <code>state_reflection</code> is a stream of data that is used to propagate state changes from the main thread to the render thread. <code>type</code> is an enum used to identify the type of render objects.</p>
<h1>Object creation</h1>
<p>In Stingray a <em>world</em> is a container of renderable objects, physical objects, sounds, etc. On the main thread, it is represented by the <code>World</code> class, and on the render thread by a <code>RenderWorld</code>.</p>
<p>When a <code>MeshObject</code> is created in a world on the main thread, there's an explicit call to <code>WorldRenderInterface::create()</code> to create the corresponding render thread representation:</p>
<pre><code>MeshObject *mesh_object = MAKE_NEW(_allocator, MeshObject);
_world_render_interface.create(mesh_object);
</code></pre>
<p>The purpose of the call to <code>WorldRenderInterface::create</code> is to explicitly create the render thread representation, acquire a <code>render_handle</code> and post that to the render thread:</p>
<pre><code>void WorldRenderInterface::create(MeshObject *mesh_object)
{
// Get a unique render handle.
mesh_object->render_handle = new_render_handle();
// Set the state_reflection pointer, more about this later.
mesh_object->state_reflection = &_state_reflection;
// Create the render thread representation.
RenderMeshObject *render_mesh_object = MAKE_NEW(_allocator, RenderMeshObject);
// Pass the data to the render thread
create_object(mesh_object->render_handle, RenderMeshObject::TYPE, render_mesh_object);
}
</code></pre>
<p>The <code>new_render_handle</code> function speaks for itself.</p>
<pre><code>uint32_t WorldRenderInterface::new_render_handle()
{
if (_free_render_handles.any()) {
uint32_t handle = _free_render_handles.back();
_free_render_handles.pop_back();
return handle;
} else
return _render_handle++;
}
</code></pre>
<p>There is a recycling mechanism for the render handles, and a similar pattern recurs in several places in the engine. The <code>release_render_handle</code> function together with the <code>new_render_handle</code> function should give the complete picture of how it works.</p>
<pre><code>void WorldRenderInterface::release_render_handle(uint32_t handle)
{
_free_render_handles.push_back(handle);
}
</code></pre>
<p>There is one <code>WorldRenderInterface</code> per world which contains the <code>_state_reflection</code> that is used by the world and all of its objects to communicate with the render thread. The <code>StateReflection</code> in its simplest form is defined as:</p>
<pre><code>struct StateReflection
{
StateStream *state_stream;
};
</code></pre>
<p>The <code>create_object</code> function needs a bit more explanation though:</p>
<pre><code>void WorldRenderInterface::create_object(uint32_t render_handle, RenderObject::Type type, void *user_data)
{
// Allocate a message on the `state_stream`.
ObjectManagementPackage *omp;
alloc_message(_state_reflection.state_stream, WorldRenderInterface::CREATE, &omp);
omp->object_type = RenderWorld::TYPE;
omp->render_handle = render_handle;
omp->type = type;
omp->user_data = user_data;
}
</code></pre>
<p>What happens here is that <code>alloc_message</code> will allocate enough bytes to make room for a <code>MessageHeader</code> together with the size of <code>ObjectManagementPackage</code> in a buffer owned by the <code>StateStream</code>. The <code>StateStream</code> is defined as:</p>
<pre><code>struct StateStream
{
void *buffer;
uint32_t capacity;
uint32_t size;
};
</code></pre>
<p><code>capacity</code> is the size of the memory pointed to by <code>buffer</code>, <code>size</code> is the current amount of bytes allocated from <code>buffer</code>.</p>
<p>The <code>MessageHeader</code> is defined as:</p>
<pre><code>struct MessageHeader
{
uint32_t type;
uint32_t size;
uint32_t data_offset;
};
</code></pre>
<p>The <code>alloc_message</code> function places the <code>MessageHeader</code> first, followed by the <code>data</code>. Some ASCII to the rescue:</p>
<pre><code>+-------------------------------------------------------------------+
| MessageHeader | data |
+-------------------------------------------------------------------+
<- data_offset ->
<- size ->
</code></pre>
<p>The <code>size</code> and <code>data_offset</code> mentioned in the ASCII are two of the members of <code>MessageHeader</code>; they are assigned during the <code>alloc_message</code> call:</p>
<pre><code>template<class T>
void alloc_message(StateStream *state_stream, uint32_t type, T **data)
{
uint32_t data_size = sizeof(T);
uint32_t message_size = sizeof(MessageHeader) + data_size;
// Allocate message and fill in the header.
void *buffer = allocate(state_stream, message_size, alignof(MessageHeader));
auto header = (MessageHeader*)buffer;
header->type = type;
header->size = message_size;
header->data_offset = sizeof(MessageHeader);
*data = (T*)memory_utilities::pointer_add(buffer, header->data_offset);
}
</code></pre>
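<p>The <code>allocate</code> function used by <code>alloc_message</code> above is essentially a bump allocator on top of the <code>StateStream</code> buffer. A minimal sketch of it, where the growth policy and the <code>grow</code> helper are assumptions rather than the engine's actual implementation, could look like:</p>
<pre><code>// Minimal sketch of bump allocation from the StateStream buffer.
void *allocate(StateStream *state_stream, uint32_t size, uint32_t align)
{
    // Align the current write position up to the requested alignment.
    const uint32_t offset = (state_stream->size + align - 1) & ~(align - 1);
    // Grow the buffer if needed (`grow` is a hypothetical helper).
    if (offset + size > state_stream->capacity)
        grow(state_stream, offset + size);
    state_stream->size = offset + size;
    return (char *)state_stream->buffer + offset;
}
</code></pre>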
<p>The <code>buffer</code> member of the <code>StateStream</code> will contain several consecutive chunks of message headers and data blocks.</p>
<pre><code>+-----------------------------------------------------------------------+
| Header | data | Header | data | Header | data | Header | data | etc |
+-----------------------------------------------------------------------+
</code></pre>
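<p>On the consuming side, walking this buffer is just a matter of stepping from header to header. The <code>get_message</code> helper used by the render thread later in this post could be implemented roughly like the sketch below; it assumes a read cursor (here an explicit <code>read_offset</code>) that is not part of the simplified <code>StateStream</code> definition above:</p>
<pre><code>// Minimal sketch of message extraction, not the actual engine code.
// `read_offset` tracks how far into `state_stream->buffer` we have consumed.
bool get_message(const StateStream *state_stream, uint32_t &read_offset, MessageHeader **header, void **data)
{
    if (read_offset >= state_stream->size)
        return false;
    *header = (MessageHeader *)((char *)state_stream->buffer + read_offset);
    *data = (char *)*header + (*header)->data_offset;
    // Step to the next chunk, aligned the same way `alloc_message` aligned it.
    read_offset += (*header)->size;
    read_offset = (read_offset + alignof(MessageHeader) - 1) & ~uint32_t(alignof(MessageHeader) - 1);
    return true;
}
</code></pre>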
<p>This is the necessary code on the main thread to create an object and populate the <code>StateStream</code>, which will later on be consumed by the render thread. A very similar pattern is used when changing the state of an object on the main thread, e.g.:</p>
<pre><code>void MeshObject::set_flags(renderable::Flags flags)
{
_flags = flags;
// Allocate a message on the `state_stream`.
SetVisibilityPackage *svp;
alloc_message(state_reflection->state_stream, MeshObject::SET_VISIBILITY, &svp);
// Fill in message information.
svp->object_type = RenderMeshObject::TYPE;
// The render handle that got assigned in `WorldRenderInterface::create`
// to be able to associate the main thread object with its render thread
// representation.
svp->render_handle = render_handle;
// The new flags value.
svp->flags = _flags;
}
</code></pre>
<h1>Getting the recorded state to the render thread</h1>
<p>Let's take a step back and explain what happens in the main update loop during the following code excerpt:</p>
<pre><code>// Flush state changes recorded on the `StateStream` for each world to
// the rendering world representation.
unsigned n_worlds = _worlds.size();
for (uint32_t i = 0; i < n_worlds; ++i) {
auto &world = *_worlds[i];
_render_interface->update_world(world);
}
</code></pre>
<p>When Lua is done creating, destroying and manipulating objects during <code>update()</code>, each world's <code>StateStream</code>, which contains all the recorded changes, is ready to be sent over to the render thread for consumption. The call to <code>RenderInterface::update_world()</code> does just that. It roughly looks like:</p>
<pre><code>void RenderInterface::update_world(World &world)
{
UpdateWorldMsg uw;
// Get the render thread representation of the `world`.
uw.render_world = render_world_representation(world);
// The world's current `state_stream` that contains all changes made
// on the main thread.
uw.state_stream = world._world_render_interface.state_stream;
// Create and assign a new `state_stream` to the world's `_world_render_interface`
// that will be used for the next frame.
world._world_render_interface.state_stream = new_state_stream();
// Post a message to the render thread to update the world.
post_message(UPDATE_WORLD, &uw);
}
</code></pre>
<p>This function will create a new message and post it to the render thread. The world being flushed and its <code>StateStream</code> are stored in the message and a new <code>StateStream</code> is created that will be used for the next frame. This new <code>StateStream</code> is set on the <code>WorldRenderInterface</code> of the <code>World</code>, and since all objects being created got a pointer to the same <code>WorldRenderInterface</code> they will use the newly created <code>StateStream</code> when storing state changes for the next frame.</p>
<h1>Render thread</h1>
<p>The render thread is spinning in a message loop:</p>
<pre><code>void RenderInterface::render_thread_entry()
{
while (!_quit) {
// If there's no message -- put the thread to sleep until there's
// a new message to consume.
RenderMessage *message = get_message();
void *data = data(message);
switch (message->type) {
case UPDATE_WORLD:
internal_update_world((UpdateWorldMsg*)(data));
break;
// ... And a lot more case statements to handle different messages. There
// are other threads than the main thread that also communicate with the
// render thread. E.g., the resource loading happens on its own thread
// and will post messages to the render thread.
}
}
}
</code></pre>
<p>The <code>internal_update_world()</code> function is defined as:</p>
<pre><code>void RenderInterface::internal_update_world(UpdateWorldMsg *uw)
{
// Call update on the `render_world` with the `state_stream` as argument.
uw->render_world->update(uw->state_stream);
// Release and recycle the `state_stream`.
release_state_stream(uw->state_stream);
}
</code></pre>
<p>It calls <code>update()</code> on the <code>RenderWorld</code> with the <code>StateStream</code> and when that is done the <code>StateStream</code> is released to a pool.</p>
<pre><code>void RenderWorld::update(StateStream *state_stream)
{
MessageHeader *message_header;
StatePackageHeader *package_header;
// Consume a message and get the `message_header` and `package_header`.
while (get_message(state_stream, &message_header, (void**)&package_header)) {
switch (package_header->object_type) {
case RenderWorld::TYPE:
{
auto omp = (WorldRenderInterface::ObjectManagementPackage*)package_header;
// The call to `WorldRenderInterface::create` created this message.
if (message_header->type == WorldRenderInterface::CREATE)
create_object(omp);
break;
}
case RenderMeshObject::TYPE:
{
if (message_header->type == MeshObject::SET_VISIBILITY) {
auto svp = (MeshObject::SetVisibilityPackage*)package_header;
// The `render_handle` is used to do a lookup in `_object_lut` to
// get the `object_index`.
uint32_t object_index = _object_lut[package_header->render_handle];
// Get the `render_object`.
void *render_object = _objects[object_index];
// Cast it since the type is already given from the `object_type`
// in the `package_header`.
auto rmo = (RenderMeshObject*)render_object;
// Call update on the `RenderMeshObject`.
rmo->update(message_header->type, svp);
}
break;
}
// ... And a lot more case statements to handle different kinds of messages.
}
}
}
</code></pre>
<p>The above is mostly infrastructure to extract messages from the <code>StateStream</code>. It can be a bit involved since a lot of stuff is written out explicitly but the basic idea is hopefully simple and easy to understand.</p>
<p>On to the <code>create_object</code> call done when <code>(message_header->type == WorldRenderInterface::CREATE)</code> is satisfied:</p>
<pre><code>void RenderWorld::create_object(WorldRenderInterface::ObjectManagementPackage *omp)
{
// Acquire an `object_index`.
uint32_t object_index = _objects.size();
// Same recycling mechanism as seen for render handles.
if (_free_object_indices.any()) {
object_index = _free_object_indices.back();
_free_object_indices.pop_back();
} else {
_objects.resize(object_index + 1);
_object_types.resize(object_index + 1);
}
void *render_object = omp->user_data;
if (omp->type == RenderMeshObject::TYPE) {
// Cast the `render_object` to a `MeshObject`.
RenderMeshObject *rmo = (RenderMeshObject*)render_object;
// If needed, do more stuff with `rmo`.
}
// Store the `render_object` and `type`.
_objects[object_index] = render_object;
_object_types[object_index] = omp->type;
if (omp->render_handle >= _object_lut.size())
_object_lut.resize(omp->render_handle + 1);
// The `render_handle` is used to map to the `object_index`.
_object_lut[omp->render_handle] = object_index;
}
</code></pre>
<p>So the takeaway from the code above lies in the general usage of the <code>render_handle</code> and the <code>object_index</code>. The <code>render_handle</code> of an object is used to do a lookup in <code>_object_lut</code> to get the <code>object_index</code>, which in turn gives the object and its <code>type</code>. Let's look at an example, the same <code>RenderWorld::update</code> code presented earlier but this time the focus is when the message is <code>MeshObject::SET_VISIBILITY</code>:</p>
<pre><code>void RenderWorld::update(StateStream *state_stream)
{
StateStream::MessageHeader *message_header;
StatePackageHeader *package_header;
while (get_message(state_stream, &message_header, (void**)&package_header)) {
switch (package_header->object_type) {
case RenderMeshObject::TYPE:
{
if (message_header->type == MeshObject::SET_VISIBILITY) {
auto svp = (MeshObject::SetVisibilityPackage*)package_header;
// The `render_handle` is used to do a lookup in `_object_lut` to
// get the `object_index`.
uint32_t object_index = _object_lut[package_header->render_handle];
// Get the `render_object` from the `object_index`.
void *render_object = _objects[object_index];
// Cast it since the type is already given from the `object_type`
// in the `package_header`.
auto rmo = (RenderMeshObject*)render_object;
// Call update on the `RenderMeshObject`.
rmo->update(message_header->type, svp);
}
break;
}
}
}
}
</code></pre>
<p>The state reflection pattern shown in this post is a fundamental part of the engine. Similar patterns appear in other places as well and having a good understanding of this pattern makes it much easier to understand the internals of the engine.</p>Andreas Asplundhttp://www.blogger.com/profile/07360893562204949248noreply@blogger.com66tag:blogger.com,1999:blog-1994130783874175266.post-82473440980044024842016-09-06T13:23:00.000+02:002016-09-06T13:23:13.820+02:00A New Localization System for Stingray<p>The current Stingray localization system is based around the concept of
<em>properties</em>. A property is any period separated part of the file name
before the extension. Consider the following three files:</p>
<ul>
<li><code>trees/larch_03.unit</code></li>
<li><code>trees/larch_03.fr.unit</code></li>
<li><code>trees/larch_03.ps4.unit</code></li>
</ul>
<p>These three files all have the same type (<code>.unit</code>), and the same name
(<code>trees/larch_03</code>), but their properties differ. The first one has
no properties set. The second one has the property <code>.fr</code> and the last
one has the property <code>.ps4</code>. (Note that resources can have more than
one property.)</p>
<p>Properties are resolved in slightly different ways, depending on
the kind of property. <em>Platform properties</em> are resolved at compile
time, so if you compile for PS4, you will get the PS4 version of
the resource (or the default version if there is no <code>.ps4</code> specific
version).</p>
<p>Other properties are resolved at <em>resource load time</em>. When you load
a bunch of resources, which property variant is loaded
depends on a global <em>property preference order</em> set
from the script. A property preference order of <code>['.fr', '.es']</code> means
that resources with the property <code>.fr</code> are be preferred, then resources
with the property <code>.es</code> (if no <code>.fr</code> resource is available), and finally
a resource without any properties at all.</p>
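<p>Conceptually, resolving which variant to load is just a linear scan over the preference order. The following sketch only illustrates the idea; the types and the function are made up for this example and are not the engine's resource lookup code:</p>
<pre><code>// Illustration only: pick which property variant of a resource to load, given a
// global preference order such as {".fr", ".es"} and the set of variants that
// actually exist for the resource. Requires <string>, <vector> and <set>.
std::string resolve_property_variant(const std::vector<std::string> &preference_order,
    const std::set<std::string> &available_properties)
{
    for (const std::string &p : preference_order) {
        if (available_properties.count(p))
            return p;
    }
    // Fall back to the resource without any properties.
    return "";
}
</code></pre>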
<p>This single mechanism is used for localizing strings, sounds, textures,
etc. Strings, for example, are stored in <code>.strings</code> files, which are
essentially just key-value stores:</p>
<pre><code>file = "File"
open = "Open"
...
</code></pre>
<p>To create a French localized version of this <code>menu.strings</code> resource, you
just create a <code>menu.fr.strings</code> resource and fill it with:</p>
<pre><code>file = "Fichier"
open = "Ouvert"
...
</code></pre>
<p>This basic localization system has served us well for many years, but
it has some drawbacks that are starting to become more pronounced:</p>
<ul>
<li><p>It doesn't allow file names with periods in them. Since we always
interpret periods as properties, periods can't be a part of the
regular file name. This isn't a huge problem when users name their own
files, but as we are increasing the interoperability between Stingray
and other software packages we more and more run into software that has,
let's say <em>peculiar</em>, ways of naming its files. Renaming things by hand
is cumbersome and can also break things when files cross-reference
each other.</p></li>
<li><p>Switching language requires reloading the resource packages. This seems
overly complicated. We have more memory these days than when we started
building Stingray. In many cases,
especially for strings, it makes more sense to keep them in memory all
the time, so we can switch between them easily.</p></li>
<li><p>Just switching on platform isn't enough. Mobile devices range
from very low-end to at least mid-end. Rather than having <code>.ios</code> and <code>.android</code>
properties, we might want <code>.low-quality</code> and <code>.high-quality</code> and
select which one to use based on the actual capabilities of the
hardware.</p></li>
<li>
<p>Making editors work well with the property system has been surprisingly
complicated. For example, when the editor runs on Windows,
what should it show if there is a <code>.win32</code> specialization of a
resource -- the default version or the <code>.win32</code> one? How would you
edit a <code>.ps4</code> resource when those are normally stripped out of the
Windows runtime?</p>
<p>We used to have this wonky thing where you could sort of
cross-compile the resources and say "I want to run on Windows,
but <em>as if</em> I was running on PS4." But to be honest, that system
never really worked that well, and in the new editor we have
gotten rid of it.</p>
</li>
</ul>
<p>Interestingly, out of all these problems, it is the first one -- the
most stupid one -- that is the main impetus for change.</p>
<h2>
<a id="user-content-the-new-system" class="anchor" href="#the-new-system" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>The New System</h2>
<p>The new system has several parts. First, we decided that for systems
that deal with localization a lot, such as strings and sounds it makes
sense to have the system actually be aware of localization. That way,
we can provide the best possible experience.</p>
<p>So the <code>.strings</code> format has changed to:</p>
<pre><code>file = {en = "File", fr = "Fichier", ...}
open = {en = "Open", fr = "Ouvert", ...}
...
</code></pre>
<p>All the languages are stored in the same file and to switch language
you just call <code>Localizer.set_language("fr")</code>. We keep all the different
languages in memory at all times. Even for a game with ridiculous
amounts of text this still doesn't use much memory and it means we
can hot-swap languages instantly.</p>
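<p>Under the hood, this can be as simple as a two-level lookup that is fully resident in memory. The sketch below only illustrates the idea; the names and structure are made up and this is not the actual engine implementation:</p>
<pre><code>// Illustration of keeping every language in memory and switching instantly.
// Requires <string> and <unordered_map>.
struct StringTable
{
    // key -> (language -> localized string), e.g. "open" -> {"en": "Open", "fr": "Ouvert"}.
    std::unordered_map<std::string, std::unordered_map<std::string, std::string>> strings;
    std::string language = "en";

    void set_language(const std::string &lang) { language = lang; }

    const std::string &lookup(const std::string &key) const
    {
        const auto &variants = strings.at(key);
        const auto it = variants.find(language);
        // Fall back to English if the key has no translation for the current language.
        return it != variants.end() ? it->second : variants.at("en");
    }
};
</code></pre>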
<p>This is a nice approach, but it doesn't work for all resources. We
don't want to add this deep kind of integration to resources that are
normally not localized, such as <code>.unit</code> and <code>.texture</code>. Still, there
sometimes is a need to localize such resources. For example, a <code>.texture</code>
might have text in it that needs to be localized. We may need a low-poly
version of a <code>.unit</code> for a less capable platform. Or a less gory version
of an animation for countries with stricter age ratings.</p>
<p>To make things easier for the editor we decided to ditch the property
system altogether, and instead go for a substitution strategy. There
are no special magical parts of a resource's path -- it is just a name
and a type. But if you want to, you can say to the engine that all
instances of a certain resource should be replaced with another resource:</p>
<pre><code>trees/larch_03.unit → trees/larch_03_ps4.unit
</code></pre>
<p>Note here that there is nothing special or magical about the
<code>trees/larch_03_ps4.unit</code>. There is no problem with displaying it on Windows.
You just edit it in the editor, like any other unit. However, when you play
the game -- any time a <code>trees/larch_03.unit</code> is requested by the engine, a
<code>trees/larch_03_ps4.unit</code> is substituted. So if you have authored a level
full of <code>larch_03</code> units, when the override above is in place, you will instead
see <code>larch_03_ps4</code> units.</p>
<p>There are many ways for this scheme to go wrong. The gameplay script might
expect to find a certain node <code>branch_43</code> in the unit -- a node that exists in
<code>larch_03.unit</code>, but not in <code>larch_03_ps4.unit</code> and this may lead to unexpected
behavior. The same problem existed in the old property system. We don't try to
do anything special about this, because it is impossible. In the end, it is only
the gameplay script that can know what it means for two things to be <em>similar
enough</em> to be used interchangeably. Anyone working with localized resources just
has to be careful not to break things.</p>
<p>Overrides can be specified from the Lua script:</p>
<div class="highlight highlight-source-lua"><pre>Application.<span class="pl-c1">set_resource_override</span>(<span class="pl-s"><span class="pl-pds">"</span>unit<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>trees/larch_03<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>trees/larch_03_ps4<span class="pl-pds">"</span></span>);</pre></div>
<p>Note that this is a much more powerful system than the old property system.
Any resource can be set to override any other -- we are not restricted to work
within the strict naming scheme required by the property system. Also, the
override is dynamic and can be determined at runtime. So it can be based on dynamic
properties, such as measured CPU or GPU performance -- or a user setting
for the amount of gore they are comfortable with.</p>
<p>It can even be used for completely different things than localization or platform
specific resources -- such as replacing the units in a level for a night-time
or psychedelic version of the same level. And I'm sure our users will find many
other ways of (ab)using this mechanism.</p>
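<p>On the engine side, a dynamic override like this boils down to a lookup table that is consulted every time a resource is requested. The sketch below is only an illustration of the idea; the names and structure are assumptions, not Stingray's actual resource manager:</p>
<pre><code>// Illustration of a dynamic resource override table. Requires <map> and <string>.
struct ResourceOverrides
{
    // (type, name) -> overriding resource name.
    std::map<std::pair<std::string, std::string>, std::string> overrides;

    void set(const std::string &type, const std::string &name, const std::string &override_name)
    {
        overrides[{type, name}] = override_name;
    }

    // Called whenever a resource is requested; returns the name that should actually be used.
    const std::string &resolve(const std::string &type, const std::string &name) const
    {
        const auto it = overrides.find({type, name});
        return it != overrides.end() ? it->second : name;
    }
};
</code></pre>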
<p>But this dynamic system is not quite enough to do everything we want to do.</p>
<p>First, since the override is dynamic and only happens at runtime, our packaging
system can't be aware of it. Normally, our packaging system figures out all
resource dependencies automatically. So when you say that you want a package with
the <code>forest</code> level, the packaging system will automatically pull in the
<code>larch_03</code> unit that is used in that level, any textures used by that unit, etc.
But since the packaging system can't know that at runtime you will replace
<code>larch_03</code> with <code>larch_03_ps4</code>, it doesn't know that <code>larch_03_ps4</code> and its
dependencies should go into the package as well.</p>
<p>You could add <code>larch_03_ps4</code> to the package manually, since <em>you</em> know it will be
used. That might work if you only have one or two overrides. However,
even with a fairly small number of overrides, micromanaging packages in this way
becomes incredibly tedious and error-prone.</p>
<p>Second, we don't want to burden the packages with resources that will never
be used. If we are making a game for digital distribution on iOS or Android
we don't want to include large PS4-only resources in that game.</p>
<p>So we need a static override mechanism that is known by the package manager
to make sure it includes and excludes the right resources. The simplest thing
would be a big file that just listed all the overrides. For example, to
override <code>larch_03</code> on PS4 we would write something like:</p>
<pre><code>resource_overrides = [
{
type = "unit"
name = "trees/larch_03"
override = "trees/larch_03_ps4"
platforms = ["ps4"]
}
]
</code></pre>
<p>This would work, but could again get pretty tedious if there are a lot of overrides.
It would be nice to have something that was a bit more automatic.</p>
<p>Since our users are already used to using name suffixes such as <code>.fr</code> and <code>.ps4</code>
for localization, we decided to build on the same mechanism -- creating overrides
automatically based on suffix rules:</p>
<pre><code>resource_overrides = [
{suffix = "_ps4", platforms = ["ps4"]}
]
</code></pre>
<p>This rule says that when we are compiling for the platform PS4, if we find a resource
that has the same name as another resource, but with the added suffix <code>_ps4</code>, that
resource will automatically be registered as an override for that resource:</p>
<pre><code>trees/larch_03.unit → trees/larch_03_ps4.unit
leaves/larch_leaves.texture → leaves/larch_leaves_ps4.texture
</code></pre>
<p>In addition to platform settings, the system also generalizes to support other flags:</p>
<pre><code>resource_overrides = [
{suffix = "_fr", flags = ["fr"]}
{suffix = "_4k", flags = ["4K"]}
{suffix = "_noblood", flags = ["noblood", "PG-13"]}
]
</code></pre>
<p>This defines the <code>_fr</code> suffix for French localization. A 4K suffix <code>_4k</code> for high-quality
versions of resources suitable for 4K monitors. And a <code>_noblood</code> suffix that selects
resources without blood and gore.</p>
<p>The flags can be set at compile time with:</p>
<pre><code>--compile --resource-flag-true 4K
</code></pre>
<p>This means that we are compiling a <code>4K</code> version of the game, so when bundling <em>only</em> the
4K resources will be included and the other versions will be stripped out. Just as if
we were compiling for a specific platform.</p>
<p>But we can also choose to resolve the flags at runtime:</p>
<pre><code>--compile --resource-flag-runtime noblood
</code></pre>
<p>With this setting, both the regular resource and the <code>_noblood</code> resource will be included
in the package and loaded into memory. And we can hot swap between them with:</p>
<div class="highlight highlight-source-lua"><pre>Application.<span class="pl-c1">set_resource_flag</span>(<span class="pl-s"><span class="pl-pds">"</span>noblood<span class="pl-pds">"</span></span>, <span class="pl-c1">true</span>)</pre></div>
<p>I have not decided yet whether, in addition to these two alternatives, we should also
have an option that resolves at <em>package load time</em>. I.e., both variants of the resource
would be included on disk, but only one of them would be loaded into memory, and if you
wanted to switch resources you would have to unload the package and load it back into
memory again.</p>
<p>I can see some use cases for this, but on the other hand adding more options complicates
the system and I like to keep things as simple as possible.</p>
<p>A nice thing about this suffix mapping is that it can be configured to be backwards
compatible with the old property system:</p>
<pre><code>resource_overrides = [
{suffix = ".fr", flags = ["fr"]}
{suffix = ".ps4", platforms = ["ps4"]}
{suffix = ".xb1", platforms = ["xb1"]}
]
</code></pre>
<p>Whenever we change something in Stingray we try to make it more flexible and data-driven,
while at the same time ensuring that the most common cases are still easy to work with.
This rewrite of the localization system is a good example:</p>
<ul>
<li><p>It fixes the problem with periods in file names. Periods are now only an issue if you have
made an explicit suffix mapping that matches them.</p></li>
<li><p>We can switch language (or any other resource setting) at runtime.</p></li>
<li><p>The new system is more flexible -- it doesn't just handle localization and platform
specific resources, we can set up whatever resource categories we want.
And we can even dynamically override individual resources.</p></li>
<li><p>The editor no longer needs to do anything special to deal with the concept of "properties".
Resources that are used to override other resources can be edited in the editor just
like any other resource.</p></li>
<li><p>And the system can easily be configured to be backwards compatible with the old
localization system.</p></li>
</ul>
<p>I still feel slightly queasy about using name matching to drive parts of this system.
Name matching is a practice that can go horribly wrong. But in this case, since the
name matching is completely user controlled I think it makes a good compromise between
purity and usability.</p>
Niklashttp://www.blogger.com/profile/10055379994557504977noreply@blogger.com31tag:blogger.com,1999:blog-1994130783874175266.post-34371133218910644202016-08-16T12:26:00.000+02:002016-08-16T12:26:57.493+02:00Render Config Extensions
<p>The rendering pipe in Stingray is completely data-driven, meaning that everything from which GPU buffers (render targets etc.) are needed to compose the final rendered frame, to the actual flow of the frame, is described in the <code>render_config</code> file - a human-readable JSON file. I have covered this in various presentations [1,2] over the years, so I won’t go into more detail about it in this blog post. Instead, I’d like to focus on a new feature that we are rolling out in Stingray v1.5 - Render Config Extensions.</p>
<p>As Stingray is growing to cater to more industries than game development, we see lots of feature requests that don’t necessarily fit in with our ideas of what should go into the default rendering pipe that we ship with Stingray.
This has made it apparent that we need a way of doing deep integrations of new rendering features without having to duplicate the entire <code>render_config</code> file.</p>
<p>This is where the <code>render_config_extension</code> files come into play. A <code>render_config_extension</code> is very similar to the main <code>render_config</code>, except that instead of having to describe the entire rendering pipe, it appends and inserts different JSON blocks into the main <code>render_config</code>.</p>
<p>When the engine starts, the boot <code>ini</code>-file specifies which <code>render_config</code> to use, as well as an array of <code>render_config_extensions</code> to load when setting up the renderer.</p>
<pre><code>render_config = "core/stingray_renderer/renderer"
render_config_extensions = ["clouds-resources/clouds", "prism/prism"]
</code></pre>
<p>The array describes the initialization order of the extensions, which makes it possible for the project author to control how the different extensions stack on top of each other. It also makes it possible to build extensions that depend on other extensions.</p>
<p>A <code>render_config_extension</code> consists of two root blocks: <em>append</em> and <em>insert_at</em>:</p>
<h2><a id="append_18"></a><em>append</em></h2>
<p>The <em>append</em> block is used for everything that is order independent and allows you to append data to the following root blocks of the main <code>render_config</code>:</p>
<ul>
<li><em>shader_libraries</em> – lists additional shader_libraries to load</li>
<li><em>render_settings</em> – add more render_settings (quality settings, debug flags, etc.)</li>
<li><em>shader_pass_flags</em> – add more shader_pass_flags (used by shader system to dynamically turn on/off passes)</li>
<li><em>global_resources</em> – additional global GPU resources to allocate on boot</li>
<li><em>resource_generators</em> – expose new resource_generators</li>
<li><em>viewports</em> – expose new viewport templates</li>
<li><em>lookup_tables</em> – append to the list of resource_generators to execute when booting the renderer (mainly used for generating lookup tables)</li>
</ul>
<p>One thing to note about extending these blocks is that we currently do not do any kind of name collision checking, so using a prefix to mimic a namespace for your extension is probably a good idea.</p>
<pre><code>// example append block from JPs volumetric clouds plugin
append = {
    render_settings = {
        clouds_enabled = true
        clouds_raw_data_visualization = false
        clouds_weather_data_visualization = false
    }
    shader_libraries = [
        "clouds-resources/clouds"
    ]
    global_resources = [
        // Clouds modelling resources:
        { name="clouds_result_texture1" type="render_target" image_type="image_3d" width=256 height=256 layers=256 format="R8G8B8A8" }
        { name="clouds_result_texture2" type="render_target" image_type="image_3d" width=64 height=64 layers=64 format="R8G8B8A8" }
        { name="clouds_result_texture3" type="render_target" image_type="image_2d" width=128 height=128 format="R8G8B8A8" }
        { name="clouds_weather_texture" type="render_target" image_type="image_2d" width=256 height=256 format="R8G8B8A8" }
    ]
}
</code></pre>
<h2><a id="insert_at_55"></a><em>insert_at</em></h2>
<p>The <em>insert_at</em> block allows you to insert layers and modifiers into already existing <em>layer_configurations</em> and <em>resource_generators</em>, either belonging to the main <code>render_config</code> file or to a <code>render_config_extension</code> listed earlier in the <code>render_config_extensions</code> array of the engine boot <code>ini</code>-file.</p>
<pre><code>// example insert_at block from JPs volumetric clouds plugin
insert_at = {
    post_processing_development = {
        modifiers = [
            { type="dynamic_branch" render_settings={ clouds_weather_data_visualization=true }
                pass = [
                    { type="fullscreen_pass" shader="debug_weather" input=["clouds_weather_texture"] output=["output_target"] }
                ]
            }
        ]
    }
    skydome = {
        layers = [
            { resource_generator="clouds_modifier" profiling_scope="clouds" }
        ]
    }
}
</code></pre>
<p>The object names under the <em>insert_at</em> block refer to <code>extension_insertion_points</code> listed in the main <code>render_config</code> file or in one of the previously loaded <code>render_config_extension</code> files. We’ve chosen not to allow extensions to inject anywhere they like (using line numbers or similar craziness); instead we expose a bunch of extension “hooks” at various places in the main <code>render_config</code> file. By doing this we hope to have a somewhat better chance of not breaking existing extensions as we continue to develop and potentially do bigger refactorings of the default <code>render_config</code> file.</p>
<h2><a id="Future_work_82"></a>Future work</h2>
<p>This extension mechanism is somewhat of an experiment and we might need to rethink parts of it in a later version of Stingray. We’ve briefly discussed a potential need for dealing with versioning, i.e. allowing extensions to explicitly list which versions of Stingray they are compatible with (and maybe also allowing extensions to have deviating implementations depending on the version). Some kind of enforced namespacing and more aggressive validation to avoid name collisions have also been debated.</p>
<p>In the end we decided to ignore these potential problems for now and instead push for getting a first version out in 1.5 to unblock plugin developers and internal teams wanting to do efficient “deep” integrations of various rendering features. Hopefully we won’t regret this decision too much later on. ;)</p>
<h2><a id="References_88"></a>References</h2>
<ul>
<li>[1] Flexible Rendering for Multiple Platforms (Tobias Persson, GDC 2012)</li>
<li>[2] Benefits of data-driven renderer (Tobias Persson, GDC 2011)</li>
</ul>
Tobiashttp://www.blogger.com/profile/16240529312060411542noreply@blogger.com249tag:blogger.com,1999:blog-1994130783874175266.post-88092063594548827192016-07-31T13:19:00.000+02:002016-07-31T13:19:03.301+02:00Volumetric Clouds<p>
There has been a lot of progress made recently with volumetric clouds in games. The folks from <a href="http://reset-game.net/">Reset</a> have posted a great <a href="http://reset-game.net/?p=284">article</a> regarding their custom dynamic clouds solution, Egor Yusov published <a href="http://gpupro.blogspot.ca/2015/01/gpu-pro-6-real-time-rendering-of.html">Real-time Rendering of Physics-Based Clouds using Precomputed Scattering</a> in GPU Pro 6, last year Andrew Schneider presented <a href="http://advances.realtimerendering.com/s2015/index.html">Real-time Volumetric Cloudscapes of Horizon: Zero Dawn</a>, and just last week Sébastien Hillaire presented <a href="http://s2016.siggraph.org/courses/sessions/physically-based-shading-theory-and-practice">Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite</a>. Inspired by all this latest progress we decided to implement a Stingray plugin to get a feel for the challenge that is real time clouds rendering.
</p>
<iframe align="middle" src="https://player.vimeo.com/video/176699598" width="685" height="275" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
<p>
Note: This article isn't an introduction to volumetric cloud rendering but more of a small log of the development process of the plugin. Also, you can try it out for yourself or look at the code by downloading the <a href="https://github.com/greje656/clouds">Stingray plugin</a>. Feel free to contribute!
</p>
<p></p>
<h3>Modeling</h3>
<p>
The modeling of our clouds is heavily inspired by the <a href="http://patapom.com/topics/Revision2013/Revision%202013%20-%20Real-time%20Volumetric%20Rendering%20Course%20Notes.pdf">Real-time Volumetric Rendering Course Notes</a> and <a href="http://advances.realtimerendering.com/s2015/index.html">Real-time Volumetric Cloudscapes of Horizon: Zero Dawn</a>. It uses a set of 3d and 2d noises that are modulated by a coverage and altitude term to generate the 3d volume to be rendered.
</p>
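<p>
To make the recipe a bit more concrete, here is a rough JavaScript sketch of the kind of density function described in the course notes. The helper names, remap ranges and altitude gradient are illustrative assumptions, not the plugin's actual code.
</p>
<pre><code class="lang-js">// Classic remap helper used throughout the cloud-modeling literature.
function remap(v, oldMin, oldMax, newMin, newMax) {
    return newMin + ((v - oldMin) / (oldMax - oldMin)) * (newMax - newMin);
}

function clamp01(v) { return Math.min(Math.max(v, 0.0), 1.0); }

// Simple altitude term: density fades in near the bottom of the cloud layer
// and out near the top (h is the normalized height inside the layer, 0..1).
function altitudeGradient(h) {
    return clamp01(remap(h, 0.0, 0.1, 0.0, 1.0)) * clamp01(remap(h, 0.8, 1.0, 1.0, 0.0));
}

// Combine the noises with the coverage and altitude terms into a density value.
function cloudDensity(baseNoise, detailNoise, coverage, h) {
    // Carve the low-frequency base shape with the coverage term...
    let d = remap(baseNoise * altitudeGradient(h), 1.0 - coverage, 1.0, 0.0, 1.0);
    // ...then erode the edges with the high-frequency detail noise.
    d = remap(d, detailNoise * 0.2, 1.0, 0.0, 1.0);
    return Math.max(d, 0.0);
}
</code></pre>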
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_OzJPrmNdTyvc2X3ubnb76HOALukJ4GoLl9QlzVW602dzbFuYAisD3lgOIRrxicQ4cXxrjG3EBQij4cUyK4_V98hNrCt2hDjMCRggU7gJeXIJC8YtIKCBj9uqyZQ6QYRE4DJCs3W9GL7R/s1600/noises.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_OzJPrmNdTyvc2X3ubnb76HOALukJ4GoLl9QlzVW602dzbFuYAisD3lgOIRrxicQ4cXxrjG3EBQij4cUyK4_V98hNrCt2hDjMCRggU7gJeXIJC8YtIKCBj9uqyZQ6QYRE4DJCs3W9GL7R/s640/noises.png" width="640" height="478" /></a></div>
<p>
I was really impressed by the shapes that can be created from such simple building blocks. While you can definitely see cases where some tiling occurs, it’s not as bad as you would imagine. Once the textures are generated, the tough part is to find the right spaces and scales at which they should be sampled in the atmosphere. It’s difficult to strike a good balance between tiling artifacts and getting enough high-frequency detail in the clouds. On top of that, cache hit rates are greatly affected by the sampling scale used, so that is another factor to consider.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzWr-gQZkGvGlQckJh8wtf9p9zbq-i3u6dnANKolQC1OJ7ZBaW9kqxLK9NPslwUIJWKPS6MvUk3e7ry_PbNbond8fVLryvZXyKyZrEtn4HkQTrkS10ZkK0vG3yanLiZC3Hf-GlVjJEaKlf/s1600/scales.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzWr-gQZkGvGlQckJh8wtf9p9zbq-i3u6dnANKolQC1OJ7ZBaW9kqxLK9NPslwUIJWKPS6MvUk3e7ry_PbNbond8fVLryvZXyKyZrEtn4HkQTrkS10ZkK0vG3yanLiZC3Hf-GlVjJEaKlf/s640/scales.png" width="640" height="640" /></a></div>
<p>
Finding good sampling scales for all of these textures and choosing how much the extrusion texture should affect the low-frequency clouds is very time consuming. With some time you eventually build intuition for what will look good in most scenarios, but it’s definitely a difficult part of the process.
</p>
<p>
We also generate some curl noise, which is used to perturb and animate the clouds slightly. I've found that adding noise to the sampling position also reduces the linear filtering artifacts that can arise when ray marching these low-resolution 3d textures.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggVH_GCXdKlBvT2U3j5NTpAKeePXxdKtPsHpVJotzlK5DxX-X9f43v-FOhHCTKMv_0V4wD7ETUHFiCTjzUXEMeARpCNhul9mGY-Vkic-GYv5Q2hJKTxlbTEIubcnpMd67FDqu_7tqph71c/s1600/curl.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggVH_GCXdKlBvT2U3j5NTpAKeePXxdKtPsHpVJotzlK5DxX-X9f43v-FOhHCTKMv_0V4wD7ETUHFiCTjzUXEMeARpCNhul9mGY-Vkic-GYv5Q2hJKTxlbTEIubcnpMd67FDqu_7tqph71c/s640/curl.png" width="640" height="190" /></a></div>
<p>
One thing that often bothered me is the oddly shaped cumulus clouds that can arise from tiled 3d noise. Those cases are particularly noticeable for distant clouds. Adding extra cloud coverage for lower-altitude sampling positions minimizes this artifact.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGKTCz529KGrgp9xyXZX1oZ9ibCUcdsgTm8djH-s8WN_-az8wNzJb5heLVbJfe4jp8XMk-5jkqKBRzqxE9RKadXMP3nhE0bI4FLh5zSi1uv6-fcuxrrq4M6KPhseGaYfcZnnYb-P6hTbyD/s1600/extra-cov.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGKTCz529KGrgp9xyXZX1oZ9ibCUcdsgTm8djH-s8WN_-az8wNzJb5heLVbJfe4jp8XMk-5jkqKBRzqxE9RKadXMP3nhE0bI4FLh5zSi1uv6-fcuxrrq4M6KPhseGaYfcZnnYb-P6hTbyD/s640/extra-cov.png" width="640" height="190" /></a></div>
<p>
Raymarching the volume at full resolution is too expensive even for high end graphics cards. So as suggested by <a href="http://advances.realtimerendering.com/s2015/index.html">Real-time Volumetric Cloudscapes of Horizon: Zero Dawn</a> we reconstruct a full frame over 16 frames. I've found that to retain enough high frequency details of the clouds, we need a fairly high number of samples. We are currently using 256 steps when raymarching. We offset the starting position of the ray by a 4x4 Bayer matrix pattern to reduce banding artifacts that might appear due to undersampling. Mikkel Gjoel shared some great tips for banding reduction while presenting <a href="http://www.gdcvault.com/play/1023002/Low-Complexity-High-Fidelity-INSIDE">The Rendering Of Inside</a> and encouraged the use of blue noise to remove banding patterns. While this gives better results there is a nice advantage of using a 4x4 pattern here: since we are rendering interleaved pixels it means that when rendering one frame we are rendering all pixels with the same Bayer offset. This yields a significant improvement in cache coherency compared to using a random noise offset per pixel. We also use an animated offset which allows us to gather a few extra samples through time. We use a 1d Halton sequence of 8 values and instead of using 100% of the 16ᵗʰ frame we use something like 75% to absorb the Halton samples.
</p>
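<p>
For illustration, the offsetting boils down to something like the following JavaScript sketch; the function name and how the offset is applied to the ray are assumptions.
</p>
<pre><code class="lang-js">// 4x4 Bayer (ordered dither) matrix, normalized to [0, 1).
const BAYER_4X4 = [
     0,  8,  2, 10,
    12,  4, 14,  6,
     3, 11,  1,  9,
    15,  7, 13,  5
].map(function (v) { return v / 16.0; });

// Offset the ray start by a fraction of one step. Because the full frame is
// reconstructed from interleaved pixels, every pixel rendered in a given frame
// shares the same Bayer offset, which keeps the texture fetches coherent
// compared to a per-pixel random offset.
function rayStartOffset(pixelX, pixelY, stepSize) {
    return BAYER_4X4[(pixelY % 4) * 4 + (pixelX % 4)] * stepSize;
}
</code></pre>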
<p>
To re-project the cloud volume we try to find a good approximation of the cloud's world position. While raymarching we track a weighted sum of the absorption position and generate a motion vector from it.
</p>
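<p>
The bookkeeping looks roughly like this in JavaScript; how each step is weighted is an assumption here, and the plugin may weight the samples differently.
</p>
<pre><code class="lang-js">// Opacity-weighted average of the sample positions along the ray.
// "positions" are world-space sample points, "weights" are how much each
// step contributed to the final opacity of the pixel.
function absorptionPosition(positions, weights, rayEnd) {
    let sum = [0, 0, 0];
    let totalWeight = 0;
    for (let i = 0; i < positions.length; i++) {
        sum[0] += positions[i][0] * weights[i];
        sum[1] += positions[i][1] * weights[i];
        sum[2] += positions[i][2] * weights[i];
        totalWeight += weights[i];
    }
    if (totalWeight === 0.0)
        return rayEnd; // nothing was absorbed, fall back to the end of the ray
    return [sum[0] / totalWeight, sum[1] / totalWeight, sum[2] / totalWeight];
}
// The returned position is reprojected with the previous frame's view-projection
// matrix to build the motion vector used when accumulating frames.
</code></pre>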
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkqTf8Fru90f-mHp2Qekew9tu9-BJikSN2eNXzBz4pG7nWZp6gbQJRhzD5c0GJWTkBNePxDhwZdlVeUD5mIiwgQ-KwXUpSEsOIFoe2kSpSPWHw4LMwjAGV_J9A6Tf6Tr0rAF_Cmq3QjIAp/s1600/world-pos.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkqTf8Fru90f-mHp2Qekew9tu9-BJikSN2eNXzBz4pG7nWZp6gbQJRhzD5c0GJWTkBNePxDhwZdlVeUD5mIiwgQ-KwXUpSEsOIFoe2kSpSPWHw4LMwjAGV_J9A6Tf6Tr0rAF_Cmq3QjIAp/s640/world-pos.png" width="640" height="316" /></a></div>
<p>
This allows us to reproject clouds with <em>some</em> degree of accuracy. Since we only build one full-resolution frame every 16 frames, it’s important to track the samples as precisely as possible. This is especially true when the clouds are animated. Finding the right number of temporal samples to integrate over time is a compromise between getting a smoother signal for trackable pixels and having a noisier signal for invalidated pixels.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgD8IZDmrjh0cvtjmii3Srtb3ds-v2pcer0a4XjAayhGXnQHOM4wHdeb47Rc1RTZoDhIVKkv8UWhf2UviknsARX97Q5FOFM4HYCY4EBvPmaQGRxGvrsX02I54w4HzKWR-eFK5GcuxgomFrj/s1600/reprojection.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgD8IZDmrjh0cvtjmii3Srtb3ds-v2pcer0a4XjAayhGXnQHOM4wHdeb47Rc1RTZoDhIVKkv8UWhf2UviknsARX97Q5FOFM4HYCY4EBvPmaQGRxGvrsX02I54w4HzKWR-eFK5GcuxgomFrj/s1600/reprojection.gif" /></a></div>
<p></p>
<h3>Lighting</h3>
<p>
To light the volume we use the "Beer-Powder" term described by <a href="http://advances.realtimerendering.com/s2015/index.html">Real-time Volumetric Cloudscapes of Horizon: Zero Dawn</a>. It's a nice model since it simulates some of the out-scattering that occurs at the edges of the clouds. We discovered early on that it was going to be difficult to find terms that looked good for both close and distant clouds. So (for now anyways) a lot of the scattering and extinction coefficients are view dependent. This proved to be a useful way of building intuition for how each term affects the lighting of the clouds.
</p>
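<p>
For reference, the "Beer-Powder" term from that presentation boils down to something like the sketch below. The constants, and how the plugin combines it with the phase function and the view-dependent coefficients, are assumptions.
</p>
<pre><code class="lang-js">// Beer-Lambert extinction darkens thick clouds, while the "powder" term
// darkens the thin edges facing the light, approximating lost out-scattering.
function beerPowder(opticalDepth) {
    const beer = Math.exp(-opticalDepth);
    const powder = 1.0 - Math.exp(-2.0 * opticalDepth);
    return 2.0 * beer * powder;
}
</code></pre>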
<p>
We also added the ambient term described by the <a href="http://patapom.com/topics/Revision2013/Revision%202013%20-%20Real-time%20Volumetric%20Rendering%20Course%20Notes.pdf">Real-time Volumetric Rendering Course Notes</a> which is very useful to add detail where all light is absorbed by the volume.
</p>
<p>
The ambient function described takes three parameters: sampling altitude, bottom color and top color. Instead of using constant values, we calculate these values by sampling the atmosphere at a few key locations. This means our ambient term is dynamic and will reflect the current state of the atmosphere. We use two pairs of samples perpendicular to the sun vector and average them to get the bottom and top ambient colors respectively.
</p>
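<p>
One possible reading of that setup, sketched in JavaScript -- the atmosphere sampler signature, the choice of perpendicular directions and the altitudes are all assumptions made for illustration.
</p>
<pre><code class="lang-js">function cross(a, b) { return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]; }
function normalize(v) { const l = Math.hypot(v[0], v[1], v[2]); return [v[0]/l, v[1]/l, v[2]/l]; }
function average(a, b) { return [(a[0]+b[0])*0.5, (a[1]+b[1])*0.5, (a[2]+b[2])*0.5]; }

// sampleAtmosphere(direction, altitude) is assumed to return the sky color
// [r, g, b] seen along the given direction at the given altitude.
function ambientColors(sampleAtmosphere, sunDir) {
    // Two directions perpendicular to the sun vector (and to each other).
    const helper = Math.abs(sunDir[2]) < 0.99 ? [0, 0, 1] : [1, 0, 0];
    const side1 = normalize(cross(sunDir, helper));
    const side2 = normalize(cross(sunDir, side1));
    const bottomAltitude = 1500.0; // illustrative values, in meters
    const topAltitude = 4000.0;
    return {
        bottom: average(sampleAtmosphere(side1, bottomAltitude), sampleAtmosphere(side2, bottomAltitude)),
        top: average(sampleAtmosphere(side1, topAltitude), sampleAtmosphere(side2, topAltitude))
    };
}
</code></pre>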
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikBvaBJTzn2wXCsklo3xe6d2kWCMPm_kU3aCNPdRRJ5BwDbtfQFMeN441D3pi-Tg69JQr7xX5LEoMjfoYRves3VcA3TiJQntY_6p__Lzv3ubBEorlAqZUr7YuEI8GIpWbTgAeBG_WsRFtX/s1600/ambient.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikBvaBJTzn2wXCsklo3xe6d2kWCMPm_kU3aCNPdRRJ5BwDbtfQFMeN441D3pi-Tg69JQr7xX5LEoMjfoYRves3VcA3TiJQntY_6p__Lzv3ubBEorlAqZUr7YuEI8GIpWbTgAeBG_WsRFtX/s640/ambient.png" width="640" height="190" /></a></div>
<p>
Since we already calculated an approximate absorption position for the reprojection, we use this position to change the absorption color based on the absorption altitude.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEix5wZJWp7eHZAVk4-iBsMhHZy2OuvRrfbB9IEckJdOLbhbG7pwxkriwRHT6LAOiW0tYpHwAJB22c718M9os80JdZyPVoLQwW-GwLH99FI2OIkYNnggivN8lfm_smkaFyyRgUV8IEmcJ73E/s1600/dark-bottom.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEix5wZJWp7eHZAVk4-iBsMhHZy2OuvRrfbB9IEckJdOLbhbG7pwxkriwRHT6LAOiW0tYpHwAJB22c718M9os80JdZyPVoLQwW-GwLH99FI2OIkYNnggivN8lfm_smkaFyyRgUV8IEmcJ73E/s640/dark-bottom.png" width="640" height="190" /></a></div>
<p>
Finally, we can reduce the alpha term by a constant amount to skew the absorption color towards the overlaid atmospheric color. By default this is disabled, but it can be interesting for creating some very hazy skyscapes. If this hack is used, it's important to protect the scattering highlight colors somewhat.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_2lkQX1nJRt4-nnfVEskWSev4UBe22Kw9_gsPzJx9ygIXZDzVZulQlbcqZqTvJltlVpNxcdZ8XSxKXv31xYxh5is3Cv-s53NJ7QpM5zEqKEAmZmDbMl5vZkrcShozSmBcGsyB2F_TpNox/s1600/blend.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_2lkQX1nJRt4-nnfVEskWSev4UBe22Kw9_gsPzJx9ygIXZDzVZulQlbcqZqTvJltlVpNxcdZ8XSxKXv31xYxh5is3Cv-s53NJ7QpM5zEqKEAmZmDbMl5vZkrcShozSmBcGsyB2F_TpNox/s640/blend.png" width="640" height="190" /></a></div>
<p></p>
<h3>Animation</h3>
<p>
The animation of the clouds consists of a 2d wind vector, a vertical draft amount and a weather system.
</p>
<p>
We dynamically calculate a 512x512 weather map which consists of 5 octaves of animated Perlin noise. We remap the noise value differently for each RGB component. This weather map is then sampled during the raymarch to update the coverage, cloud type and wetness terms of the current cloud sample. Right now we resample this weather term for each ray step, but a possible optimization would be to sample the weather data at the start and end positions of the ray and interpolate these values at each step. All of the weather terms come in sunny/stormy pairs so that we can lerp them based on a probability-of-rain percentage. This allows the weather system to have storms coming in and out.
</p>
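<p>
The sunny/stormy blending amounts to something like this minimal sketch; the term names are illustrative.
</p>
<pre><code class="lang-js">function lerp(a, b, t) { return a + (b - a) * t; }

// Every weather term exists as a sunny/stormy pair. Blending the pairs by the
// current probability of rain lets storms fade in and out over time.
function blendWeatherTerms(sunny, stormy, rainProbability) {
    const blended = {};
    for (const key in sunny) {
        blended[key] = lerp(sunny[key], stormy[key], rainProbability);
    }
    return blended;
}

// Example: blendWeatherTerms({coverage: 0.4, cloudType: 0.2}, {coverage: 0.9, cloudType: 0.7}, 0.5)
</code></pre>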
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5dbianUxVc52QPBOleeV_qlXXBsjLGzjrORJSDkIvMCone47SDwMiRzhLCk2Vr2v8595T3bciMUxkydJqZt5YZgkR-amUBFUhn49oofN7C77K76xORC07fb5iRxlBgcUsMXpv8pf4ufNH/s1600/storm.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5dbianUxVc52QPBOleeV_qlXXBsjLGzjrORJSDkIvMCone47SDwMiRzhLCk2Vr2v8595T3bciMUxkydJqZt5YZgkR-amUBFUhn49oofN7C77K76xORC07fb5iRxlBgcUsMXpv8pf4ufNH/s640/storm.png" width="640" height="316" /></a></div>
<p>
The wetness term is used to update a structure of terms which defines how the clouds look based on how much humidity they carry. This is a very expensive lerp which happens at every ray-march step and should be reduced to the bare minimum (the raymarch is instruction bound, so each removed lerp is a big win optimization-wise). But for the current exploratory phase it’s proving useful to be able to tweak a lot of these terms individually.
</p>
<h3>Future work</h3>
<p>
I think that as hardware gets more powerful, realtime cloudscape solutions will be used more and more. There is a ton of work left to do in this area. It is absolutely fascinating, challenging and beautiful. I am personally interested in improving the sense of scale the rendered clouds can have. To do so, I feel that the key is to reveal more and more of the high-frequency details that shape the clouds. I think smaller cloud features are key to putting the larger cloud features around them in perspective. But extracting higher-frequency details usually comes at the cost of increasing the sampling rate.
</p>
<p>
We also need to think of how to handle shadows and reflections. We've done some quick tests by updating a 512x512 opacity shadow map which seemed to work ok. Since it is not a view frustum dependent term we can absorb the cost of updating the map over a much longer period of time than 16 frames. Also, we could generate this map by taking fewer samples in a coarser representation of the clouds. The same approach would work for generating a global specular cubemap.
</p>
<p>
I hope we continue to see more awesome presentations at GDC and Siggraph in the coming years regarding this topic!
</p>
<h3>Links</h3>
<ul>
<li><a href="http://s2016.siggraph.org/courses/sessions/physically-based-shading-theory-and-practice">Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite</a></li>
<li><a href="http://gpupro.blogspot.ca/2015/01/gpu-pro-6-real-time-rendering-of.html">Real-time Rendering of Physics-Based Clouds using Precomputed Scattering</a></li>
<li><a href="http://advances.realtimerendering.com/s2015/index.html">Real-time Volumetric Cloudscapes of Horizon: Zero Dawn</a></li>
<li><a href="http://patapom.com/topics/Revision2013/Revision%202013%20-%20Real-time%20Volumetric%20Rendering%20Course%20Notes.pdf">Real-time Volumetric Rendering Course Notes</a></li>
<li><a href="http://www.markmark.net/clouds/index.html">Real-time Cloud Rendering</a></li>
<li><a href="http://reset-game.net/?p=284">In Praxis: Atmosphere</a></li>
<li><a href="http://nenes.eas.gatech.edu/Cloud/Clouds.pdf">Common Cloud Names, Shapes, and Altitudes</a></li>
<li><a href="https://www.shadertoy.com/view/XslGRr">"Clouds" by iq</a></li>
</ul>
Jphttp://www.blogger.com/profile/09637484103636420407noreply@blogger.com783tag:blogger.com,1999:blog-1994130783874175266.post-84866906932304541042016-04-01T22:46:00.003+02:002016-05-10T00:57:14.182+02:00The Poolroom<h1>
The Poolroom
</h1>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiH-dxZmt8Dx_1uzCbncG3UHQQY_akoF5MGs2SHJKH0LDRkp3o1ZyRv1RaAG-zr2NuyKDgaPG4cZWl1M3KgAEL-BT4IopRmYAhrjXyRFqhy6yqqZqnrOwPW4NIV4UTVzxlFxilmuDP2j-qn/s1600/Poolroom_final.PNG" imageanchor="1">
<img border="0" width: 100%; height: auto; max-width: 100%; align="middle" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiH-dxZmt8Dx_1uzCbncG3UHQQY_akoF5MGs2SHJKH0LDRkp3o1ZyRv1RaAG-zr2NuyKDgaPG4cZWl1M3KgAEL-BT4IopRmYAhrjXyRFqhy6yqqZqnrOwPW4NIV4UTVzxlFxilmuDP2j-qn/s640/Poolroom_final.PNG" />
</a>
<p style="clear: both;">
<i>Figure 1 : Poolroom Pool Table</i>
</p>
<p style="clear: both;">
The poolroom was my first attempt at creating a truly rich
environmental experience with Stingray.
Most architectural visualization scenes you see are
antiseptically clean and uncomfortably modern. I wanted to break away from that.
I wanted an environment I would feel at home with, not one that a movie star
would buy for sheer resale value to another movie star. I also wanted the
challenge of working with natural and texturally rich materials. Not white on
white, as is generally the case.
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVlX4rmN3q4Wg3mg9rb44sT_5vapMxUWDof18kplIT83FHyKd4hNmj2AlPReQDQikt-O8ejjV8zbdEAFjVODkgxUSaUUlWwYjsqB7EaFhCR0_yiagIDyJ-lgoZvLEC-bVwY2y1OL6bVanz/s1600/Poolroom_final4.PNG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVlX4rmN3q4Wg3mg9rb44sT_5vapMxUWDof18kplIT83FHyKd4hNmj2AlPReQDQikt-O8ejjV8zbdEAFjVODkgxUSaUUlWwYjsqB7EaFhCR0_yiagIDyJ-lgoZvLEC-bVwY2y1OL6bVanz/s640/Poolroom_final4.PNG" width: 100%; height: auto; max-width: 100%; align="middle" /></a>
<p style="clear: both;">
<i>Figure 2 : Poolroom Clock</i>
</p>
<p style="clear: both;">
To this end, I started looking for cozy but luxurious spaces
on Google and eventually came across a nice reference photo I could work with. Warm,
rich woods, lots of games, a bar, and well... those all speak to me. For better
or worse, I felt this room was one I would personally feel comfortable in. So I
took on the challenge of re-creating that environment in 3D inside Stingray.
</p>
<h2 style="text-align: left;">
The challenges
</h2>
<hr />
<p style="clear: both;">
The poolroom gave me some major challenges. Some I knew
would be trouble from the start, but some I didn’t realize until I started
rendering lightmaps. Most of my difficulties came down to handling materials
properly.
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8Q7tACbHbXD_ENWqiMze0NJNK6hQ_WIokgPm5Yg2iS4nJ5P1Qodl7MCtYUvwxfjyVCEdVRI_uU1rUNHy6HoSvMUlEnDq8GnTFpjCgLgfY4BeOhy3AXXNgGHARXot6VivQeaxQtiOllHf7/s1600/Poolroom_final3.PNG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8Q7tACbHbXD_ENWqiMze0NJNK6hQ_WIokgPm5Yg2iS4nJ5P1Qodl7MCtYUvwxfjyVCEdVRI_uU1rUNHy6HoSvMUlEnDq8GnTFpjCgLgfY4BeOhy3AXXNgGHARXot6VivQeaxQtiOllHf7/s640/Poolroom_final3.PNG" width: 100%; height: auto; max-width: 100%; align="middle" /></a>
<p style="clear: both;">
<i>Figure 3 : Poolroom Bar</i>
</p>
<h3>Coming to grips with physically based shaders</h3>
<p style="clear: both;">
In addition to being my first complete Arch-Viz scene in
Stingray, this was also my first real stab at using physically based shading
(PBS). Although physically based shading is similar in many regards to traditional
texturing, it has its own set of tricks and gotchas. I actually had to re-do
the scene's materials more than once as I learned the proper way to do things.
</p>
<p style="clear: both;">
For example, my scene was
predominantly dark woods. With dark woods, you really have to be sure you get
the albedo material in the correct luminosity range or you end up with
difficulties when you light the scene. In my first attempts, I found my light
being just eaten up by the darkness of the wood’s color map. I kept cranking up
the light intensities, but this would flood the scene and lead to harsh and
broken light bakes.
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQTR8fl7j4EasoOtB9hAVa9DTTXGvXWu97W5-y6ALH4c5agfGyH-kn11o51jOUuqdMff8c-delxYZ2t8mujxT3jTIA0g0PvEC1YyRn0Q4N1cmPdW90p2EVRORZRFRm0WRbsXPTHRoKr620/s1600/Poolroom_final2.PNG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQTR8fl7j4EasoOtB9hAVa9DTTXGvXWu97W5-y6ALH4c5agfGyH-kn11o51jOUuqdMff8c-delxYZ2t8mujxT3jTIA0g0PvEC1YyRn0Q4N1cmPdW90p2EVRORZRFRm0WRbsXPTHRoKr620/s640/Poolroom_final2.PNG" width: 100%; height: auto; max-width: 100%; align="middle" /></a>
<p style="clear: both;">
<i>Figure 4 : Arcade Game</i>
</p>
<p style="clear: both;">
Eventually, once I understood the effect of the color map’s
luminosity and got the values in line, I started getting great results with
normalized light intensities. My lighting began responding favorably with deep,
rich lightmap bakes. When you get the physical properties of the materials
right, Stingray’s light baker is both fast and very good. But I can’t stress
enough: with PBS, you must ensure that your luminosity values are accurate.
</p>
<h3>Reference photo was HDR</h3>
<p style="clear: both;">
When I was building out the scene and trying to mimic the
reference photo’s lighting, I realized that the original image was made using
some high-dynamic range techniques. I couldn’t seem to get the same level of
exposure and visual detail in the shadowed areas of my scene.
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqi3-BcJAtyfTPgNcx_exeFdTAyAGC042b-BRiqVAAFub6fThc2O8sQSOyvhlcmyH50dlZyL7aU7werqj76X5Ik3c0AcwEh4GD3tapRWV8LZz8NJ3AyzZZqQotisrnfR-psyZrRfdwFKzV/s1600/NoAmbients.PNG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqi3-BcJAtyfTPgNcx_exeFdTAyAGC042b-BRiqVAAFub6fThc2O8sQSOyvhlcmyH50dlZyL7aU7werqj76X5Ik3c0AcwEh4GD3tapRWV8LZz8NJ3AyzZZqQotisrnfR-psyZrRfdwFKzV/s640/NoAmbients.PNG" width: 100%; height: auto; max-width: 100%; align="middle" /></a>
<p style="clear: both;">
<i>Figure 5 : Before Ambient Fills</i>
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2Z9GNIOCjEz5WysA4hrGt-hG1sa0mUR99U8UZ2n71R0nd7g9Ir-5ieXrhgRlwfTMgqwNbdYVJFs-tId8qdH6PwkIA0DL5SKiWWQORpT3wfT_Pni9Zsg4qkT1Ad1nXFl0yZyHjn8gxaz6M/s640/Ambient.PNG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;">
<img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2Z9GNIOCjEz5WysA4hrGt-hG1sa0mUR99U8UZ2n71R0nd7g9Ir-5ieXrhgRlwfTMgqwNbdYVJFs-tId8qdH6PwkIA0DL5SKiWWQORpT3wfT_Pni9Zsg4qkT1Ad1nXFl0yZyHjn8gxaz6M/s640/Ambient.PNG" width: 100%; height: auto; max-width: 100%; align="middle" /></a>
<p style="clear: both;">
<i>Figure 6 : After Ambient Fills</i>
</p>
<p style="clear: both;">
Because of this, I had to do
some pretty fun trickery with my scene lighting. In the end, I got it by placing
some subtle, non-shadow casting lights in key areas to bring up the brightness
a little in those areas.
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_zG8u9aZjX2lskFbaAX3DtFfpqGI1RRnu77E2vo8yG_dNfMlliq8ECmq-J0upGd38vjPNpQX2j15Lcj3O0tyixUvCBoOmCxAhRJsJj7usoDnIJMLCtcIqssrfHYPB6EsyHzxaj3WGQ8q7/s1600/livedin.PNG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_zG8u9aZjX2lskFbaAX3DtFfpqGI1RRnu77E2vo8yG_dNfMlliq8ECmq-J0upGd38vjPNpQX2j15Lcj3O0tyixUvCBoOmCxAhRJsJj7usoDnIJMLCtcIqssrfHYPB6EsyHzxaj3WGQ8q7/s640/livedin.PNG" width: 100%; height: auto; max-width: 100%; align="middle" /></a>
<p style="clear: both;">
<i>Figure 7 : Soft Controlled Lighting</i>
</p>
<p style="clear: both;">
All in all, the scene took a lot of lighting work to get
just right. I have to say that I was very happy with how closely I was able to
match the lighting, given that the original photo was HDR.
</p>
<h3>Lived-in but not dirty</h3>
<p style="clear: both;">
The last big challenge was also related to materials. I had
to find that fine balance of a room that is clean and tidy but also obviously
lived-in. So often I find Arch-Viz work feels unnaturally smooth and clean, which
can destroy the believability of the space. I really wanted my scene to
break through the uncanny valley and feel real.
</p>
<p style="clear: both;">
I handled this mostly by creating some very simple grunge
maps, and applying them to the roughness maps using a simple custom shader.
This was easy to build in Stingray’s node-based shader graph:
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgGrus-5iPnzdvGzUBhuS10gSf-MJrORhh9E95gbdutm3vUohzo0bj-EQrLVnJGC8ocH7gaF7IT-SEdqZ1oFOwbVMYhHFvhX9mghuT5U4t6sUxC5hRSY7-K4hjLCRRIG4SERqun227vb_9/s1600/simplerma_shader_withgrunge.PNG" imageanchor="1"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgGrus-5iPnzdvGzUBhuS10gSf-MJrORhh9E95gbdutm3vUohzo0bj-EQrLVnJGC8ocH7gaF7IT-SEdqZ1oFOwbVMYhHFvhX9mghuT5U4t6sUxC5hRSY7-K4hjLCRRIG4SERqun227vb_9/s640/simplerma_shader_withgrunge.PNG" width: 100%; height: auto; max-width: 100%; align="middle" /></a>
<p style="clear: both;">
<i>Figure 8 : Simple RMA style shader with tiling and grunge map with adjustment.</i>
</p>
<p style="clear: both;">
I have this shader set up so I can control the tiling of the
color map, normals and other textures. The grunge map, on the other hand, is
sampled using UV coordinates from the lightmap channel. This helps to hide the
tiling over large areas like the walls, because the grunge value that gets multiplied
into the roughness is always different each time the other textures repeat.
</p>
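<p style="clear: both;">
In code form, the graph boils down to something like the small sketch below (a hedged JavaScript transcription of the idea; the adjustment curve is an assumption):
</p>
<pre><code class="lang-js">// The base textures repeat with a small tiling factor, but the grunge value is
// fetched with the lightmap UVs, which never repeat across a surface, so the
// roughness is broken up differently on every tile.
function finalRoughness(baseRoughness, grungeSample, grungeStrength) {
    const grunge = 1.0 - grungeStrength * (1.0 - grungeSample); // fades toward 1 as strength goes to 0
    return Math.min(Math.max(baseRoughness * grunge, 0.0), 1.0);
}
</code></pre>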
<p style="clear: both;">
Balancing the grunge properly was the biggest challenge
here, but in the end, some still shots even get me doing a double-take. When
that happens, I know I’m doing well. I also posted progress along the way on my
Facebook page — when I had friends saying, “whoa, when can I come visit?” I
knew I was nailing it.
</p>
<h2>
3D modeling</h2>
<hr />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJmGfqNCc_ZT6GWSlPL7WwOYZR4DYhNIFYX1pFMpcqkVsskKK3wJCJnVD1XEc5vCNaG8_uhzwCA2KOPs9r_NEy_7r3H_ckrgdzKAkZ8-vxY99LeH4bgJvpVz0S8s3zrEL_O-ZdYIm-r61b/s1600/recordplayer.PNG" imageanchor="1">
<img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJmGfqNCc_ZT6GWSlPL7WwOYZR4DYhNIFYX1pFMpcqkVsskKK3wJCJnVD1XEc5vCNaG8_uhzwCA2KOPs9r_NEy_7r3H_ckrgdzKAkZ8-vxY99LeH4bgJvpVz0S8s3zrEL_O-ZdYIm-r61b/s640/recordplayer.PNG" width: 100%; height: auto; max-width: 100%; align="middle" /></a>
<p style="clear: both;">
<i>Figure 9 : Record Player Model in Maya LT</i>
</p>
<p style="clear: both;">
I don’t have much that’s
special to say about the 3D modeling process. I simply modeled all my assets
the same way anyone would. Attention to detail is really the trick, and making
sure that I created hand-made lightmap UVs for every object was critical to
ensure the best light baking. Otherwise it was just simple modeling.
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhex6Qyx49dke07Tt4-3a_8g2MHl6bJ6mwKppG3iNJyaRuh7k_qh0X1D1-Vkx1Z-1r0pV90w9kjcoR8MCu8KO96Vd7k0h0trGiNjVTnBZ4XUH3wEDd_P_in_88b0op_vXHX-TgLx2kM9G9j/s1600/Capture.PNG" imageanchor="1">
<img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhex6Qyx49dke07Tt4-3a_8g2MHl6bJ6mwKppG3iNJyaRuh7k_qh0X1D1-Vkx1Z-1r0pV90w9kjcoR8MCu8KO96Vd7k0h0trGiNjVTnBZ4XUH3wEDd_P_in_88b0op_vXHX-TgLx2kM9G9j/s640/Capture.PNG" width: 100%; height: auto; max-width: 100%; align="middle" /></a>
<p style="clear: both;">
<i>Figure 10 : Poolroom Model in MayaLT</i>
</p>
<p style="clear: both;">
One thing to note, however, is that I only used 3D tools
that came with the Stingray package, except for Substance Designer and a little
Photoshop. I did the entire scene’s modeling in MayaLT. Sometimes people think
cheap is not good, but I believe this proves otherwise. MayaLT is incredible. I
am super happy with the results and speed at which you can work with it. Best
of all, it’s part of the package, so no additional costs.
</p>
<h2>Material design</h2>
<hr />
<p style="clear: both;">
Laying out the materials in
the scene was pretty straightforward for the most part. At one point, I
experimented with using more species of wood, but the different parts of the room
started to feel disconnected. I started removing materials from my list, and
eventually when I ended up with only a small handful the room came together as
you see it.
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBFy_bRAXUNLmAfOlWLfdb7_rE8x05tsXA6p1gOZZrrQSlC9v3jFsfIiffnXeUaHoADBuaOYMhY6hR01cRXGE9ybNTBV1-nLoR-Uy4cWJSHMgPzLrSkbsPm1sHctFceVg9o85yAZZtucyQ/s1600/recordplayer3.PNG" imageanchor="1"><img border="0" height="350" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBFy_bRAXUNLmAfOlWLfdb7_rE8x05tsXA6p1gOZZrrQSlC9v3jFsfIiffnXeUaHoADBuaOYMhY6hR01cRXGE9ybNTBV1-nLoR-Uy4cWJSHMgPzLrSkbsPm1sHctFceVg9o85yAZZtucyQ/s640/recordplayer3.PNG" width="640" align="middle" /></a>
<p style="clear: both;">
<i>Figure 11 : Record Player Material Design in Substance</i>
</p>
<p style="clear: both;">
I guess something else I should mention is performance
shaders. Stingray comes with a great, flexible standard shader, but I wanted to
eke out every little bit of performance I could on this scene while keeping the
quality very high. Without much trouble, I created a library of my own purpose-built
shaders (like the one mentioned earlier). I used these for various tasks. Simple
colors, RMA (roughness-metallic-ambient occlusion), RMA-tiling shaders and a
few others came together really quickly. From this handful of shaders, I was
able to increase performance while simplifying my design process. I find it
comforting how Stingray deals with shaders… it is just very easy to iterate and
save a version. Much better usability than other systems I have tried.
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguoRnTgzjXLi_vbZaHuRhE9ayOhQr7V6B_-C9E5gajHZAj1ElA_Cvdu57zCGk6Dj5YVSaNnbSjHWDOazrfYH3l-f6DCBSpzBW5TiSwMz6s6CsWGGAEV6-y2x6XPsMO0hzx0vvAfShWnJQc/s1600/shadergraphs.PNG" imageanchor="1">
<img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguoRnTgzjXLi_vbZaHuRhE9ayOhQr7V6B_-C9E5gajHZAj1ElA_Cvdu57zCGk6Dj5YVSaNnbSjHWDOazrfYH3l-f6DCBSpzBW5TiSwMz6s6CsWGGAEV6-y2x6XPsMO0hzx0vvAfShWnJQc/s640/shadergraphs.PNG" width: 100%; height: auto; max-width: 100%; align="middle" /></a>
<p style="clear: both;">
<i>Figure 12 : Shader Library</i>
</p>
<h2>Fun stuff</h2>
<hr />
<p style="clear: both;">
Well, most game dev is hard
work; the fun is at the end when you finally get to relax and see your efforts
pay off. But there were definitely some really fun parts of making the
poolroom.
</p>
<p style="clear: both;">
One was the clock. It’s a
small, almost easter-egg kind of thing, but I programmed the clock fully. Meaning,
its hands move, the pendulum swings, and it also rings the hour. So if you are
exploring the poolroom and it happens to be when the hour changes in your
system clock, the clock in the game rings the hour for you. So two o’clock
rings two times, four o’clock rings four
times, etc. The half-hour always strikes once. I modeled the clock after one
that my father gave me, so I put some extra love into it. It is basically
exactly the clock that hangs in my living room.
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhap8ZqWwUS1Vv-aLM1ak_KOSGvFweKfaTrtgfmhkTEwWtq-GbhIPBMjgsMZp791eQZvn3gqjC4gxN1G6ALEVYe0Nj0HF-BWL26xsQ1yaJvorngGilDYv1uFk_QJlSSLCSLTxxwHWn9ulq/s1600/clock.PNG" imageanchor="1">
<img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhap8ZqWwUS1Vv-aLM1ak_KOSGvFweKfaTrtgfmhkTEwWtq-GbhIPBMjgsMZp791eQZvn3gqjC4gxN1G6ALEVYe0Nj0HF-BWL26xsQ1yaJvorngGilDYv1uFk_QJlSSLCSLTxxwHWn9ulq/s640/clock.PNG" width: 100%; height: auto; max-width: 100%; align="middle" /></a>
<p style="clear: both;">
<i>Figure 13 : Clock Model in MayaLT</i>
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYk__PdRpwiHqzKI_yeTyBZUkaO5XZfASj95dqAh4zry7ijwKjgLB-2CTY8QQI0t2pW81wBlgXQtdx3KtZ_kMhO1psNPxjvXQGL4lRrRc38150Kb2pJtbU0ZIIOq0BR2MDQ4vk5eW8llVc/s1600/clock2.PNG" imageanchor="1">
<img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYk__PdRpwiHqzKI_yeTyBZUkaO5XZfASj95dqAh4zry7ijwKjgLB-2CTY8QQI0t2pW81wBlgXQtdx3KtZ_kMhO1psNPxjvXQGL4lRrRc38150Kb2pJtbU0ZIIOq0BR2MDQ4vk5eW8llVc/s640/clock2.PNG" width: 100%; height: auto; max-width: 100%; align="middle" /></a>
<p style="clear: both;">
<i>Figure 14 : Clock Model in Stingray</i>
</p>
<p style="clear: both;">
I also gave the record player
some extra attention, because my good friend Mathew Harwood was kind enough to
do all the audio for the project. I felt the music really set the scene, and he
even worked on it over my Twitch stream so we could get feedback from some
people who were watching. So yeah, press <b>+</b>
or <b>-</b> in the game to start and stop
the record player, complete with animated tone arm. Nothing super crazy, just a
nice little touch.
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjoqFJGvWVUzZatcr3xJ1Kq2vLLZxUI6IqE68ukYgT4ojIctJjllc7Jex0v6OmZBxgFzMzthFRgoSs3qBqDm-Uvcg_6rzb5mVUYL69GBxaCRpHlNb_bjHG2P4VDFCwRlCRV3X1owwa_SCZR/s1600/recordplayer3.PNG" imageanchor="1">
<img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjoqFJGvWVUzZatcr3xJ1Kq2vLLZxUI6IqE68ukYgT4ojIctJjllc7Jex0v6OmZBxgFzMzthFRgoSs3qBqDm-Uvcg_6rzb5mVUYL69GBxaCRpHlNb_bjHG2P4VDFCwRlCRV3X1owwa_SCZR/s640/recordplayer3.PNG" width: 100%; height: auto; max-width: 100%; align="middle" /></a>
<p style="clear: both;">
<i>Figure 15 : Record Player in Stingray</i>
</p>
<h2>Community effort</h2>
<hr />
<p style="clear: both;">
One thing I found really neat about this project was that I
streamed the entire creation process on my Twitch channel. I have never
streamed much before this project, but it made the process much more fun. I had
people to talk with, and often my viewers were helpful to me in suggesting
ideas and noticing things I had not noticed. It was very collaborative and a
great learning exercise for me and for my viewers. We got to learn from each
other, which is the dream!
</p>
<p style="clear: both;">
For example, the record player likely would not have been
done to the level it was if one of my viewers had not pushed me to make a really
detailed player. Because of this push, it ended up being a focus of the level,
and even has some animation and basic controls a user can interact with.
</p>
<p style="clear: both;">
Stop by my Twitch channel sometime at <a href="http://twitch.tv/paulkind3d" target="_blank">twitch.tv/paulkind3d </a>and
say hi, I’d love to meet you.
</p>Anonymoushttp://www.blogger.com/profile/02745555817482590720noreply@blogger.com134tag:blogger.com,1999:blog-1994130783874175266.post-77335192477528183332016-01-31T22:02:00.000+01:002016-08-09T21:27:49.948+02:00Hot Reloadable JavaScript, Batman!<p>JavaScript is my new favorite prototyping language. Not because the language itself is fantastic. I mean, it's not too bad. It actually has a lot of similarity to Lua, but it's hidden under a heavy layer of <a href="https://www.destroyallsoftware.com/talks/wat">WAT!?</a>, like:</p>
<ul>
<li>Browser incompatibilities!?</li>
<li>Semi-colons are optional, but you "should" put them there anyway!?</li>
<li>Propagation of <code>null</code>, <code>undefined</code> and <code>NaN</code> until they cause an error very far from where they originated!?</li>
<li>Weird type conversions!? <code>"0" == false</code>!?</li>
<li>Every function is also an object constructor!? <code>x = new add(5,7)</code>!?</li>
<li>Every function is also a method!?</li>
<li>You must check everything with <code>hasOwnProperty()</code> when iterating over objects!?</li>
</ul>
<p>But since Lua is a work of genius and beauty, being a half-assed version of Lua is still pretty good. You could do worse, as languages go.</p>
<p>And JavaScript is actually getting better. Browser compatibility is improving; automatic updates are a big factor in this. And if your goal is just to prototype and play, as opposed to building robust web applications, you can just pick your favorite browser, go with that and not worry about compatibility. The ES6 standard also adds a lot of nice little improvements, like <code>let</code>, <code>const</code>, <code>class</code>, lexically scoped <code>this</code> (for arrow functions), etc.</p>
<p>But more than the language, the nice thing about JavaScript is that it comes with a lot of the things you need to do interesting stuff -- a user interface, 2D and 3D drawing, a debugger, a console REPL, etc. And it's ubiquitous -- everybody has a web browser. If you do something interesting and want to show it to someone else, it is as easy as sending a link.</p>
<p>OK, so it doesn't have file system access (unless you run it through <a href="https://nodejs.org/en/">node.js</a>), but who cares? What's so fun about reading and writing files anyway? The 60's called, they want their programming textbooks back!</p>
<p>I mean in JavaScript I can quickly whip up a little demo scene, add some UI controls and then share it with a friend. That's more exciting. I'm sure someone will tell me that I can do that in Ruby too. I'm sure I could, if I found the right gems to install, picked what UI library I wanted to use and learned how to use that, found some suitable bundling tools that could package it up in an executable, preferably cross-platform. But I would probably run into some annoying and confusing error along the way and just give up.</p>
<p>With increasing age I have less and less patience for the <em>sysadmin</em> part of programming. Installing libraries. Making sure that the versions work together. Converting a <code>configure.sh</code> script to something that works with our build system. Solving <code>PATH</code> conflicts between multiple installed <code>cygwin</code> and <code>mingw</code> based toolchains. Learning the intricacies of some weird framework that will be gone in 18 months anyway. There is enough of that stuff that I <em>have to</em> deal with, just to do my job. I don't need any more. When I can avoid it, I do.</p>
<p>One thing I've noticed since I started to prototype in JavaScript is that since drawing and UI work is so simple to do, I've started to use programming for things that I previously would have done in other ways. For example, I no longer do graphs like this in a drawing program:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6DwqK7xM6S6weZ50ubB1GR_NCg9V2vR-1-BsqDbNWrvxQHh3VAggnw98bxOFmAPqBjrQnfPOlHilykFnkXDcIaU_XvUCclGEO7hjl94hVeGlnyjaVhd-Zl6IcMO8DP8Xs5AoCwoHKyq0/s1600/javascript-hot-reloading-1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6DwqK7xM6S6weZ50ubB1GR_NCg9V2vR-1-BsqDbNWrvxQHh3VAggnw98bxOFmAPqBjrQnfPOlHilykFnkXDcIaU_XvUCclGEO7hjl94hVeGlnyjaVhd-Zl6IcMO8DP8Xs5AoCwoHKyq0/s1600/javascript-hot-reloading-1.png" width="600" /></a></div>
<p>Instead I write a little piece of JavaScript code that draws the graph on an HTML canvas (code here: <a href="https://jsbin.com/xurego/edit?js,output">pipeline.js</a>).</p>
<p>JavaScript canvas drawing can replace not only traditional drawing programs, but also Visio (for process diagrams), Excel (graphs and charts), Photoshop and <a href="http://graphviz.org">Graphviz</a>. And it can do more advanced forms of visualization and styling that are not possible in any of these programs.</p>
<p>For simple graphs, you could ask if this really saves any time in the long run, as compared to using a regular drawing program. My answer is: I don't know and I don't care. I think it is more important to do something interesting and fun with time than to save it. And for me, using drawing programs stopped being fun some time around when <a href="https://en.wikipedia.org/wiki/AppleWorks">ClarisWorks</a> was discontinued. If you ask me, so called "productivity software" has just become less and less productive since then. These days, I can't open a Word document without feeling my pulse racing. You can't even print the damned things without clicking through a security warning. Software PTSD. Programmers, we should be ashamed of ourselves. Thank god for <a href="https://daringfireball.net/projects/markdown/">Markdown</a>.</p>
<p>Another thing I've stopped using is slide show software. That was never any fun either. Keynote was at least tolerable, which is more than you can say about Powerpoint. Now I just use <a href="http://remarkjs.com/#1">Remark.js</a> instead and write my slides directly in HTML. I'm much happier and I've lost 10 pounds! Thank you, JavaScript!</p>
<p>But I think for my next slide deck, I'll write it directly in JavaScript instead of using Remark. That's more fun! Frameworks? I don't need no stinking frameworks! Then I can also finally solve the issue of auto-adapting between 16:9 and 4:3 so I don't have to letterbox my entire presentation when someone wants me to run it on a 1995 projector. Seriously, people!</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcdv5fdzrR3g1B0RGC5Xemy43yJA2nLSL34xQOv40Ijfnh91xhTVivwFckGHmapgXkbVbtVJr9UgWu7A7IthYsLaAi-UD3ORE9CyrHXj4OkvF-Zdey5jZfLDgAVuSDNWzf45V6_itWmMM/s1600/svga-connector.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcdv5fdzrR3g1B0RGC5Xemy43yJA2nLSL34xQOv40Ijfnh91xhTVivwFckGHmapgXkbVbtVJr9UgWu7A7IthYsLaAi-UD3ORE9CyrHXj4OkvF-Zdey5jZfLDgAVuSDNWzf45V6_itWmMM/s1600/svga-connector.png" /></a></div>
<p><em>This is not the connector you are looking for!</em></p>
<p>And I can put HTML 5 videos directly in my presentation, so I don't have to shut down my slide deck to open a video in a separate program. Have you noticed that this is something that almost every speaker does at big conferences? Because apparently they haven't succeeded in getting their million dollar <em>presentation</em> software to reliably <em>present</em> a video file! Software! Everything is broken!</p>
<p>Anyhoo... to get back off topic, one thing that surprised me a bit about JavaScript is that there doesn't seem to be a lot of interest in hot-reloading workflows. Online there is <a href="https://jsbin.com/?html,output">JSBin</a>, which is great, but not really practical for writing bigger things. If you start googling for something you can use offline, with your own favorite text editor, you don't find that much. This is a bit surprising, since JavaScript is a dynamic language -- hot reloading should be a hot topic.</p>
<p>There are some node modules that can do this, like <a href="https://www.npmjs.com/package/budo">budo</a>. But I'd like something that is small and hackable, that works instantly and doesn't require installing a bunch of frameworks. By now, you know how I feel about that.</p>
<p>After some experimentation I found that adding a script node dynamically to the DOM will cause the script to be evaluated. What is a bit surprising is that you can remove the script node immediately afterwards and everything will still work. The code will still run and update the JavaScript environment. Again, since this is only for my personal use I've not tested it on Internet Explorer 3.0, only on the browsers I play with on a daily basis, Safari and <a href="https://www.google.com/chrome/browser/canary.html">Chrome Canary</a>.</p>
<p>What this means is that we can write a <code>require</code> function for JavaScript like this:</p>
<pre><code class="lang-js">function require(s)
{
    var script = document.createElement("script");
    script.src = s + "?" + performance.now();
    script.type = "text/javascript";

    var head = document.getElementsByTagName("head")[0];
    head.appendChild(script);
    head.removeChild(script);
}
</code></pre>
<p>We can use this to load script files, which is kind of nice. It means we don't need a lot of <code>&lt;script&gt;</code> tags in the HTML file. We can just put one there for our main script, <code>index.js</code>, and then require in the other scripts we need from there.</p>
<p>Also note the deft use of <code>+ "?" + performance.now()</code> to prevent the browser from caching the script files. That becomes important when we want to reload them.</p>
<p>Since, for dynamic languages, reloading a script is the same thing as running it, we can get automatic reloads by just calling <code>require</code> on our own script from a timer:</p>
<pre><code class="lang-javascript">function reload()
{
require("index.js");
render();
}
if (!window.has_reload) {
window.has_reload = true;
window.setInterval(reload, 250);
}
</code></pre>
<p>This reloads the script every 250 ms.</p>
<p>I use the <code>has_reload</code> flag on the window to ensure that I set the reload timer only the first time the file is run. Otherwise we would create more and more reload timers with every reload, which in turn would cause even more reloads. If I had enough power in my laptop, the resulting chain reaction would vaporize the universe in under three minutes. Sadly, since I don't, all that will happen is that my fans will spin up a bit. Damnit, I need more power!</p>
<p>After each <code>reload()</code> I call my <code>render()</code> function to recreate the DOM, redraw the canvas, etc. with the new code. That function might look something like this:</p>
<pre><code class="lang-js">function render()
{
var body = document.getElementsByTagName("body")[0];
while (body.hasChildNodes()) {
body.removeChild(body.lastChild);
}
var canvas = document.createElement("canvas");
canvas.width = 650;
canvas.height = 530;
var ctx = canvas.getContext("2d");
drawGraph(ctx);
body.appendChild(canvas);
}
</code></pre>
<p>Note that I start by removing all the DOM elements under <code><body></code>. Otherwise each reload would create more and more content. That's still linear growth, so it is better than the exponential chain reaction you can get from the reload timer. But linear growth of the DOM is still pretty bad.</p>
<p>You might think that reloading all the scripts and redrawing the DOM every 250 ms would create a horrible flickering display. But so far, for my little play projects, everything works smoothly in both Safari and Chrome. Glad to see that they are double buffering properly.</p>
<p>If you do run into problems with flickering you could try using the <a href="http://tonyfreed.com/blog/what_is_virtual_dom">Virtual DOM</a> method that is so popular with JavaScript UI frameworks these days. But try it without that first and see if you really need it, because ugh frameworks, amirite?</p>
<p>Obviously it would be better to reload only when the files actually change and not every 250 ms. But to do that you would need to do something like adding a file system watcher connected to a web socket that could send a message when a reload was needed. Things would start to get complicated, and I like it simple. So far, this works well enough for my purposes.</p>
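<p>For the curious, here is a rough sketch of what that could look like. This assumes Node.js and the third-party <code>ws</code> package on the server side; the port number and the watched file name are arbitrary:</p>
<pre><code class="lang-js">// watch.js -- run with: node watch.js
// Pushes a message to connected browsers whenever index.js changes.
var fs = require("fs");
var WebSocket = require("ws");

var server = new WebSocket.Server({port: 35729});
fs.watch("index.js", function () {
    // fs.watch can fire more than once per save, but for a hot-reload
    // sketch the extra reloads are harmless.
    server.clients.forEach(function (client) {
        client.send("reload");
    });
});

// In the browser, the timer is replaced by a socket listener:
// var socket = new WebSocket("ws://localhost:35729");
// socket.onmessage = function () { require("index.js"); render(); };
</code></pre>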
<p>As a middle ground you could have a small bootstrap script for doing the reload:</p>
<pre><code class="lang-js">window.version = 23;
if (window.version != window.last_version) {
window.last_version = window.version;
reload();
}
</code></pre>
<p>You would reload this small bootstrap script every 250 ms. But it would only trigger a reload of the other scripts and a re-render when you change the version number. This avoids the reload spamming, but it also removes the immediate feedback loop -- change something and see the effect immediately -- which I think is <a href="https://vimeo.com/36579366">really important</a>.</p>
<p>As always with script reloads, you must be a bit careful with how you write your scripts to ensure they work nicely with the reload feature. For example, if you write:</p>
<pre><code class="lang-js">class Rect
{
...
};
</code></pre>
<p>This works well in Safari, but Chrome Canary complains on the second reload that you are redefining a class. You can get around that by instead writing:</p>
<pre><code class="lang-js">var Rect = class {
</code></pre>
<p>Now Chrome doesn't complain anymore, because obviously you are allowed to change the content of a variable.</p>
<p>To preserve state across reloads, I just put all the state in a global variable on the window:</p>
<pre><code class="lang-js">window.state = window.state || {}
</code></pre>
<p>The first time this is run, we get an empty state object, but on future reloads we keep the old state. The <code>render()</code> function uses the state to determine what to draw. For example, for a slide deck I would put the current slide number in the <code>state</code>, so that we stay on the same page after a reload.</p>
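<p>For the slide deck case, that might look something like this -- <code>drawSlide</code> is a made-up placeholder for whatever the real drawing code is:</p>
<pre><code class="lang-js">window.state = window.state || {};
window.state.slide = window.state.slide || 0;

// Reassigning the handler on every reload is fine -- it just overwrites
// the previous one.
document.onkeydown = function (e) {
    if (e.key === "ArrowRight")
        window.state.slide++;
    if (e.key === "ArrowLeft" && window.state.slide > 0)
        window.state.slide--;
    render();
};

// Inside render(), the current page comes from the state:
// drawSlide(ctx, window.state.slide);
</code></pre>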
<p>Here is a GIF of the hot reloading in action. Note that the browser view changes as soon as I save the file in Atom:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9fTETns2xTUHwU6IfRSzEDayC0yh6anZA-d0lCDz1qG-VfEPZ-RV7qwb7Vni7MAofz9hHkDwgR7xkiAh9w9ph-EX6e6j-AAxfipGs7YEB7nw7T7FiH6841t5SwbUtoP7s3Jv8MBLoPd8/s1600/hot-reload.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9fTETns2xTUHwU6IfRSzEDayC0yh6anZA-d0lCDz1qG-VfEPZ-RV7qwb7Vni7MAofz9hHkDwgR7xkiAh9w9ph-EX6e6j-AAxfipGs7YEB7nw7T7FiH6841t5SwbUtoP7s3Jv8MBLoPd8/s1600/hot-reload.gif" width="600" /></a></div>
<p>(No psychoactive substances were consumed during the production of this blog post. Except caffeine. Maybe I should stop drinking coffee?)</p>
Niklashttp://www.blogger.com/profile/10055379994557504977noreply@blogger.com31tag:blogger.com,1999:blog-1994130783874175266.post-85011142584535208252016-01-29T23:00:00.002+01:002016-05-23T18:26:17.984+02:00Stingray Support -- Hello, I Am Someone Who Can Help<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizVPes33314ez59IPOPG95TxxZKn64gmLrIsIDv1cL6JlFkUJmRwjwIdvbEg8J-NwXkTulJ55m8PzzLRM1nE_9Dj5HHmuYQ9a-aJjZBe_yme0JR6iUwWi52ZqmCmRgd_NKipM4LNXhaog/s1600/canhelp.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="135" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizVPes33314ez59IPOPG95TxxZKn64gmLrIsIDv1cL6JlFkUJmRwjwIdvbEg8J-NwXkTulJ55m8PzzLRM1nE_9Dj5HHmuYQ9a-aJjZBe_yme0JR6iUwWi52ZqmCmRgd_NKipM4LNXhaog/s200/canhelp.png" width="200" /></a></div>
<h2>Hello, I am someone who can help.</h2>
<p>Here at the Autodesk Games team, we pride ourselves on supporting users of the Stingray game engine in the best ways possible -- so to start, let's cover where you can find information!</p>
<h3>General Information Here!</h3>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<strong><u><span style="font-family: , serif; font-size: 10.5pt;">Games Solutions Learning Channel on YouTube:</span></u></strong><span style="font-family: , serif; font-size: 10.5pt;"><o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<span style="font-family: , serif; font-size: 10.5pt;">This is a series of videos about Stingray by the Autodesk Learning
Team. They'll be updating the playlist with new videos over time. They're
pretty responsive to community requests on the videos, so feel free to log in
and comment if there's something specific you'd like to see.<o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<span style="font-family: , serif; font-size: 10.5pt;">Check out the<span class="apple-converted-space"> </span></span><a href="https://www.youtube.com/user/autodeskgameshowtos/playlists?view=50&sort=dd&shelf_id=7" target="_self"><span style="color: #1858a8; font-family: "frutigernextw04-regular" , "serif"; font-size: 10.5pt; text-decoration: none;">playlist on YouTube</span></a><span style="font-family: , serif; font-size: 10.5pt;">.<o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<br /></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<strong><u><span style="font-family: , serif; font-size: 10.5pt;">Autodesk Stingray Quick Start Series, with Josh from Digital
Tutors:</span></u></strong><span style="font-family: , serif; font-size: 10.5pt;"><o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<span style="font-family: , serif; font-size: 10.5pt;">We enlisted the help from Digital Tutors to set up a video series
that runs through the major sections of Stingray so you can get up and running
quickly.<o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<span style="font-family: , serif; font-size: 10.5pt;">Check out the<span class="apple-converted-space"> </span></span><a href="https://www.youtube.com/playlist?list=PL_6ApchKwjN9mPXtRhL5za_KINjqbWaGV" target="_self"><span style="color: #1858a8; font-family: "frutigernextw04-regular" , "serif"; font-size: 10.5pt; text-decoration: none;">playlist on YouTube</span></a><span style="font-family: , serif; font-size: 10.5pt;">.<o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<br /></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<strong><u><span style="font-family: , serif; font-size: 10.5pt;">Autodesk Make Games learning site:</span></u></strong><span style="font-family: , serif; font-size: 10.5pt;"><o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<span style="font-family: , serif; font-size: 10.5pt;">This is a site that we've made for people who are brand new to
making games. If you've never made a game before, or never touched complex 3D
tools or a game engine, this is a good place to start. We run you through
Concept Art and Design phases, 3D content creation, and then using a game
engine. We've also made a bunch of assets available to help brand new game
makers get started.<o:p></o:p></span></div>
<div style="background: white; margin-bottom: .0001pt; margin: 0in; mso-line-height-alt: 11.25pt;">
<a href="http://www.autodesk.com/MakeGames" target="_self"><span style="color: #1858a8; font-family: "frutigernextw04-regular" , "serif"; font-size: 10.5pt; text-decoration: none;">www.autodesk.com/MakeGames</span></a><span style="font-family: , serif; font-size: 10.5pt;"><o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<br /></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<strong><u><span style="font-family: , serif; font-size: 10.5pt;">Creative Market:</span></u></strong><span style="font-family: , serif; font-size: 10.5pt;"><o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<span style="font-family: , serif; font-size: 10.5pt;">The Creative Market is a storefront where game makers can buy or
sell 3D content. We've got a page set up just for Stingray, and it includes
some free assets to help new game makers get started.<o:p></o:p></span></div>
<div style="background: white; margin-bottom: .0001pt; margin: 0in; mso-line-height-alt: 11.25pt;">
<a href="https://creativemarket.com/apps/stingray" target="_self"><span style="color: #1858a8; font-family: "frutigernextw04-regular" , "serif"; font-size: 10.5pt; text-decoration: none;">https://creativemarket.com/apps/stingray</span></a><span style="font-family: , serif; font-size: 10.5pt;"><o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<br /></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<strong><u><span style="font-family: , serif; font-size: 10.5pt;">Stingray Online Help</span></u></strong><span style="font-family: , serif; font-size: 10.5pt;"><o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<span style="font-family: , serif; font-size: 10.5pt;">Here you'll find more getting started movies, how-to topics, and
references for the scripting and visual programming interfaces. We're working
hard to get you all the info you need, and we're really excited to hear your
feedback.<o:p></o:p></span></div>
<div style="background: white; margin-bottom: .0001pt; margin: 0in; mso-line-height-alt: 11.25pt;">
<a href="http://help.autodesk.com/view/Stingray/ENU/" target="_self"><span style="color: #1858a8; font-family: "frutigernextw04-regular" , "serif"; font-size: 10.5pt; text-decoration: none;">http://help.autodesk.com/view/Stingray/ENU/</span></a><span style="font-family: , serif; font-size: 10.5pt;"><o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<br /></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<strong><u><span style="font-family: , serif; font-size: 10.5pt;">Forum Support Tutorial Channel on YouTube:</span></u></strong><span style="font-family: , serif; font-size: 10.5pt;"><o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<span style="font-family: , serif; font-size: 10.5pt;">This is a series of videos that answers recurring forums
questions by the Autodesk Support Team. They'll be updating the
playlist with new videos over time. They're pretty responsive to community
requests on the videos, so feel free to log in and comment if there's something
specific you'd like to see.<o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<span style="font-family: , serif; font-size: 10.5pt;">Check out the<span class="apple-converted-space"> </span></span><a href="https://www.youtube.com/channel/UC0fIe6XV1PjilADTei9JMOA" target="_self"><span style="color: #1858a8; font-family: "frutigernextw04-regular" , "serif"; font-size: 10.5pt; text-decoration: none;">playlist on YouTube</span></a><span style="font-family: , serif; font-size: 10.5pt;">.<o:p></o:p></span></div>
<div style="background: white; line-height: 11.25pt; margin-bottom: .0001pt; margin: 0in;">
<br /></div>
<div class="MsoNormal">
You should also visit the Stingray Public Forums <a href="http://forums.autodesk.com/t5/stingray/bd-p/800">here</a>, as there is a
growing wealth of information and knowledge to search from.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtwRlTVFVIHyfsFkkWQhADOGDRdJO1jEyqllbmxVhP7l_jl7pp9p0-_p9Bm5GXdE_CAlj0ysrm7WzqEheWRPJlXYFd2Y_JdkdT1xUeH9ZaeQQCw7SUUmzjOu-xMX-db0TL4eoKDA4_F4I/s1600/help-me-help-you.gif" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="184" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtwRlTVFVIHyfsFkkWQhADOGDRdJO1jEyqllbmxVhP7l_jl7pp9p0-_p9Bm5GXdE_CAlj0ysrm7WzqEheWRPJlXYFd2Y_JdkdT1xUeH9ZaeQQCw7SUUmzjOu-xMX-db0TL4eoKDA4_F4I/s320/help-me-help-you.gif" width="320" /></a></div>
<h3>Let's Get Started</h3>
<div class="MsoNormal">
Let’s get started. Hi, I’m Dan, nice to meet you. I am super
happy to help you with any of your Stingray problems, issues, needs or general
questions! However, I’m going to need to ask you to HELP ME, HELP YOU!!<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<o:p><br /></o:p></div>
<div class="MsoNormal">
<o:p><br /></o:p></div>
<div class="MsoNormal">
<o:p><br /></o:p></div>
<div class="MsoNormal">
<o:p><br /></o:p></div>
<div class="MsoNormal">
It’s not always apparent when a user asks for help just
exactly what that user is asking for. That being the case, here is some useful
information on how to ask for help and what to provide us so that we can help
you better and more quickly!</div>
<div class="MsoNormal">
</div>
<ul>
<li>Make sure you are very clear on what your specific problem is and describe it as best you can.
<ul><li>Include pictures or screen shots you may have.</li></ul></li>
<li>Tell us how you came to have this problem.
<ul><li>Give us detailed reproduction steps on how to arrive at the issue you are seeing.</li></ul></li>
<li>Attach your log files!
<ul><li>They can be found here: C:\Users\"USERNAME"\AppData\Local\Autodesk\Stingray\Logs</li></ul></li>
<li>Attach any file that is a specific problem (zip it so it attaches to the forum post).</li>
<li>Make sure to let us know your system specifications.</li>
<li>Make sure to let us know what Stingray engine version you are using.</li>
</ul>
<br />
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1OT4UJI2Dc-EY1bHIZQHBH8CTg4awe6HPklrUOQUUKahnJTe0BX586cCSogSZZUkRn5Ge9eGJvIzeadkDx2ARHX_mklIfvPbqfZLfRzGI9R7eUDI8hdsxnbMKPvm1fzoHQQW78Avmvts/s1600/translate-large.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1OT4UJI2Dc-EY1bHIZQHBH8CTg4awe6HPklrUOQUUKahnJTe0BX586cCSogSZZUkRn5Ge9eGJvIzeadkDx2ARHX_mklIfvPbqfZLfRzGI9R7eUDI8hdsxnbMKPvm1fzoHQQW78Avmvts/s200/translate-large.jpg" width="200" /></a></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1.0in; mso-add-space: auto; mso-list: l0 level2 lfo1; text-indent: -.25in;">
<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1.0in; mso-add-space: auto; mso-list: l0 level2 lfo1; text-indent: -.25in;">
<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1.0in; mso-add-space: auto; mso-list: l0 level2 lfo1; text-indent: -.25in;">
<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
On another note … traduire, traduzir, 翻<span style="font-family: "mingliu"; mso-bidi-font-family: MingLiU;">译</span>, Übersetzen,
þýða, переведите, <span style="font-family: "raavi" , "sans-serif";">ਅਨੁਵਾਦ</span>,
, and ... translate! We use English as our main support language, however,
these days – translate.google.com is really, really good! If English is not
your first language, please feel free to write your questions and issues in
your native language and we will translate it and get back to you. I often find
that it is easier to understand from a translation and this helps us get you
help just that much more quickly!<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h3>In Conclusion</h3>
<p>So just to recap, make sure you are ready when you come to ask us a question! Have your issue sorted out, know how to reproduce it, know what engine version you are running and your system specs, and attach your log files. This will help us help you just that much faster, and we can get you on your way to making super awesome content in the Stingray game engine. Thanks!</p>
<div class="MsoNormal" style="background: white;">
<b><span style="color: #1f497d; font-family: "candara" , "sans-serif"; font-size: 10.0pt;">Dan Matlack</span></b><span style="color: #500050; font-family: "candara" , "sans-serif"; font-size: 10.0pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white;">
<i><span style="font-family: "candara" , "sans-serif"; font-size: 10.0pt;">Product Support Specialist –
Games Solutions<o:p></o:p></span></i></div>
<div class="MsoNormal" style="background: white;">
<b><span style="color: #7f7f7f; font-family: "candara" , "sans-serif"; font-size: 10.0pt;">Autodesk, Inc.<o:p></o:p></span></b></div>
<div class="MsoNormal" style="margin-left: 1.45pt; text-autospace: ideograph-other;">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<br />
<div class="MsoNormal">
<br /></div>
Anonymoushttp://www.blogger.com/profile/07591556737927082834noreply@blogger.com11tag:blogger.com,1999:blog-1994130783874175266.post-36605457689018987952016-01-20T11:48:00.001+01:002016-08-09T21:23:48.192+02:00Introducing the Stingray Package Manager (spm)<p>The <em>Stingray Package Manager</em>, or <code>spm</code>, is a small Ruby program that is responsible for downloading specific versions of the external artifacts (libraries, sample projects, etc) that are needed to build the Stingray engine and tools. It's a small but important piece of what makes <em>one-button builds</em> possible.</p>
<p>By <em>one-button builds</em> I mean that it should be possible to build Stingray with a single console command and no human intervention. It should work for any version in the code history. It should build all tools, plugins and utilities that are part of the project (as well as meaningful subsets of those for faster builds). In addition, it should work for all target platforms, build configurations (<em>debug</em>, <em>development</em>, <em>release</em>) and options (enabling/disabling Steam, Oculus, AVX, etc).</p>
<p>Before you have experienced <em>one-button builds</em> it's easy to think: So what? What's the big deal? I can download a few libraries manually, change some environment variables when needed, open a few Visual Studio projects and build them. Sure, it is a little bit of work every now and then, but not too bad.</p>
<p>In fact, there are big advantages to having a one-button build system in place:</p>
<ul>
<li><p>New developers and anyone else interested in the code can dive right in and don't have to spend days trying to figure out how to compile the damned thing.</p>
</li><li><p>Build farms don't need as much babysitting (of course build farms always need <em>some</em> babysitting).</p>
</li><li><p>All developers build the engine in the same way, the results are repeatable and you don't get bugs from people building against the wrong libraries.</p>
</li><li><p>There is a known way to build any previous version of the engine, so you can fix bugs in old releases, do bisect tests to locate bad commits, etc.</p>
</li></ul>
<p>But more than these specific things, having one-button builds also gives you <em>one less thing to worry about</em>. As programmers we are always trying to fit too much stuff into our brains. We should just come to terms with the fact that as a species, we're really not smart enough to be doing this. That is why I think that simplicity is the most important virtue of programming. Any way we can find to reduce cognitive load and context switching will allow us to focus more on the problem at hand.</p>
<p>In addition to <code>spm</code> there are two other important parts of our one-button build system:</p>
<ul>
<li><p>The <code>cmake</code> configuration files for building the various targets.</p>
</li><li><p>A front-end ruby script (<code>make.rb</code>) that parses command-line parameters specifying which configuration to build and makes the appropriate calls to <code>spm</code> and <code>cmake</code>.</p>
</li></ul>
<p>But let's get back to <code>spm</code>. As I said at the top, the responsibility of <code>spm</code> is to download and install external artifacts that are needed by the build process. There are some things that are important:</p>
<ul>
<li><p>Exact versions of these artifacts should be specified so that building a specific version of the source (git hash) will always use the same exact artifacts and yield a predictable result.</p>
</li><li><p>Since some of these libraries are big, hundreds of megabytes, even when zipped (computers are a sadness), it is important not to download more than absolutely necessary for making the current build.</p>
</li><li><p>For the same reason we also need control over how we cache older versions of the artifacts. We don't want to evict them immediately, because then we have to download hundreds of megabytes every time we switch branch. But we don't want to keep all old versions either, because then we would pretty soon run out of space on small disks.</p>
</li></ul>
<p>The last two points are the reason why something like <code>git-lfs</code> doesn't solve this problem out of the box and some more sophisticated package management is needed.</p>
<p><code>spm</code> takes inspiration from popular package managers like <code>npm</code> and <code>gem</code> and offers a similar set of subcommands: <code>spm install</code> to install a package, <code>spm uninstall</code> to uninstall, etc. At its heart, what <code>spm</code> does is a pretty simple operation:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRoaFPOmsd0rXrRPWRdoTepzxUiBqQtiYaKantO0hQ0GTQ5mAaC_dWrZC-Z3NqQaHfM9tQeh4C32YFb18PfBlJmBUs4E4yWjnvanp-sHYwntt4ZJvyx4trYeq06xt767ifP8e_jhR7FQs/s1600/spm-1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRoaFPOmsd0rXrRPWRdoTepzxUiBqQtiYaKantO0hQ0GTQ5mAaC_dWrZC-Z3NqQaHfM9tQeh4C32YFb18PfBlJmBUs4E4yWjnvanp-sHYwntt4ZJvyx4trYeq06xt767ifP8e_jhR7FQs/s1600/spm-1.png" /></a></div>
<p>Upon request, <code>spm</code> downloads a specific artifact version (identified by a hash) from an artifact repository. We support multiple artifact repositories, such as S3, git and Artifactory. The artifact is unzipped and stored in a local library folder where it can be accessed by the build scripts. As specific artifact versions are activated and deactivated we move them in and out of the local artifact cache.</p>
<p>We don't use unique folder names for artifact versions. So the folder name of an artifact (e.g., <code>luajit-2.1.0-windows</code>) doesn't tell us the exact version (<code>y0dqqY640edvzOKu.QEE4Fjcwxc8FmlM</code>). <code>spm</code> keeps track of that in internal data structures.</p>
<p>There are advantages and disadvantages to this approach:</p>
<ul>
<li>We don't have to change the build scripts when we do minor fixes to a library, only the version hash used by <code>spm</code>.</li><li>We avoid ugly looking hashes in the folder names and don't have to invent our own version numbering scheme, in addition to the official one.</li><li>We can't see at a glance which specific library versions are installed without asking <code>spm</code>.</li><li>We can't have two versions of the same library installed simultaneously, since their names could collide, so we can't run parallel builds that use different library versions.</li><li>If library version names were unique we wouldn't even need the cache folder, we could just keep all the versions in the library folder.</li></ul>
<p>I'm not 100 % sure we have made the right choice; it might be better to enforce unique names. But it is not a huge deal, so unless there is a big impetus for change we will stay on the current track.</p>
<p><code>spm</code> knows which versions of the artifacts to install by reading configuration files that are checked in as part of the source code. These configuration files are simple JSON files with entries like this:</p>
<pre><code class="lang-json">cmake = {
groups = ["cmake", "common"]
platforms = ["win64", "win32", "osx", "ios", "android", "ps4", "xb1", "webgl"]
lib = "cmake-3.4.0-r1"
version = "CZRgSJOqdzqVXey1IXLcswEuUkDtmwvd"
source = {
type = "s3"
bucket = "..."
access-key-id = "..."
secret-access-key = "..."
}
}
</code></pre>
<p>This specifies the name of the package (<code>cmake</code>), the folder name to use for the install (<code>cmake-3.4.0-r1</code>), the version hash and how to retrieve it from the artifact repository (these source parameters can be re-used between different libraries).</p>
<p>To update a library, you simply upload the new version to the repository, modify the version hash and check in the updated configuration file.</p>
<p>The <code>platforms</code> parameter specifies which platforms this library is used on and <code>groups</code> is used to group packages together in meaningful ways that make <code>spm</code> easier to use. For example, there is an <code>engine</code> group that contains all the packages needed to build the engine runtime and a corresponding <code>editor</code> group for building the editor.</p>
<p>So if you want to install all libraries needed to build the engine on Xbox One, you would do:</p>
<pre><code>spm install-group -p xb1 engine
</code></pre><p>This will install only the libraries needed to build the engine for Xbox One and nothing else. For each library, <code>spm</code> will:</p>
<ul>
<li>If the library is already installed -- do nothing.</li><li>If the library is in the cache -- move it to the library folder.</li><li>Otherwise -- download it from the repository.</li></ul>
<p>Downloads are done in parallel, for maximum speed, with a nice command-line based progress report:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_xVove9CPDBoDa-R20GCIOHoIB5acUBQs1c-bDWc_OqhBGEYHCaR59CF0aFkcsK36-SxeK3an1XLUQBK8FCFyW8FAp1yakKGpYj_QZZDdR-MwHTeRg9twU8ilTWvN5hfy2_Hpicurwew/s1600/spm-2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_xVove9CPDBoDa-R20GCIOHoIB5acUBQs1c-bDWc_OqhBGEYHCaR59CF0aFkcsK36-SxeK3an1XLUQBK8FCFyW8FAp1yakKGpYj_QZZDdR-MwHTeRg9twU8ilTWvN5hfy2_Hpicurwew/s1600/spm-2.png" /></a></div>
<p>The cache is a simple MRU cache that can be pruned either by time (throw away anything I haven't used in a month) or by size (trim the cache down to 10 GB, keeping only the most recently used stuff).</p>
<p>Of course, you usually never even have to worry about calling <code>spm</code> directly, because <code>make.rb</code> will automatically call it for you with the right arguments, based on the build parameters you have specified to <code>make.rb</code>. It all happens behind the scenes.</p>
<p>Even the <code>cmake</code> binary itself is installed by the <code>spm</code> system, so the build is able to bootstrap itself to some extent. Unfortunately, the bootstrap process is not 100 % complete -- there are still some things that you need to do manually before you can start using the one-button builds:</p>
<ul>
<li>Install Ruby (for running <code>spm.rb</code> and <code>make.rb</code>).</li><li>Specify the location of your library folder (with an <code>SR_LIB_DIR</code> environment variable).</li><li>Install a suitable version of Visual Studio and/or XCode.</li><li>Install the platform specific SDKs and toolchains for the platforms you want to target.</li></ul>
<p>I would like to get rid of all of this and have a zero-configuration bootstrap procedure. You sync the repository, give one command and bam -- you have everything you need.</p>
<p>But some of these things are a bit tricky. Without Ruby we need something else for the initial step that at least is capable of downloading and installing Ruby. We can't put restricted software in public repositories and it might be considered hostile to automatically run installers on the users' behalf. Also, some platform SDKs need to be installed globally and don't offer any way of switching quickly between different SDK versions, thwarting any attempt to support quick branch switching.</p>
<p>But we will continue to whittle away at these issues, taking the simplifications where we can find them.</p>
Niklashttp://www.blogger.com/profile/10055379994557504977noreply@blogger.com9tag:blogger.com,1999:blog-1994130783874175266.post-78638310678585008842015-12-18T20:36:00.000+01:002015-12-18T20:36:16.620+01:00Data Driven Rendering in Stingray<br />
We're all familiar with the benefits that a data driven architecture brings to gameplay: code is decoupled from data, enabling live linking and rapid iteration. Placing new objects in the editor or modifying the speed of a character has an immediate effect on a live game instance. This really speeds up the development process as you fine-tune scripts, gameplay and other content.<br />
<br />
What about graphics programming? It turns out that the same architecture and associated benefits apply to Stingray’s renderer.<br />
<br />
Just by modifying configuration files (albeit somewhat complex configuration files) we can implement new shader programs, post-processing effects and even different cascading shadow map implementations. All in real time, on a live game instance. Which is a big win for graphics programmers: try out new ideas, fine tune shaders all with real-time feedback. No more of that long edit/compile/run/debug cycle. And this applies to the entire rendering pipeline: everything from the object space to world space transforms to shadow casting and the final rendering pass is all exposed as config file data, not as C++ code as with traditional architectures.<br />
<br />
I gave a presentation on this topic a while back, which has now found its way to our YouTube channel:<br />
<br />
<a href="https://www.youtube.com/channel/UC0fIe6XV1PjilADTei9JMOA">https://www.youtube.com/channel/UC0fIe6XV1PjilADTei9JMOA</a><br />
<br />
By the way, there’s a lot of other great Stingray content up there so please check it out! The renderer presentation can be found under “Stingray Render Config Tutorial.”<br />
<br />
The details as well as a PowerPoint can be found there. The code changes to add a trivial greyscale post-processing effect involve:<br />
<br />
<b>settings.ini: </b><br />
<br />
The render_config variable points to the renderer.render_config file (covered next), and settings.ini also provides a section to override default settings found in that file.<br />
<br />
<b>core/stingray_renderer/renderer.render_config:</b><br />
<br />
This points to our shader libraries: text files containing the actual shader programs. A section called global_resources allocates graphics buffers, such as scratch buffers for the cascading shadow maps and G-buffers for deferred rendering, along with the main framebuffer. And most of the actual rendering is invoked in the resource_generators section. Again, more details are in the YouTube video, though a surprising amount can be learned just by grepping through the various config files and playing with the settings. Which is easy to do since it's all data driven!<br />
<br />
<b>core/stingray_renderer/shader_libraries/development.shader_source:</b><br />
<br />
One of several shader libraries. While shader code can be entered as text here, Stingray also provides a graphical node-based shader editor. And we support ShaderFX materials from Max or Maya. It’s often easier (and more portable) to implement shaders graphically.<br />
<br />
But whatever method you choose to implement shaders in, the key point is that Stingray's entire rendering pipeline is fully accessible through configuration files. With our data driven architecture, making complex rendering changes, while still non-trivial, is a whole lot faster and easier (and portable!) than working with platform-specific C++ code.<br />
<div>
<br /></div>
Ben Moweryhttp://www.blogger.com/profile/12579246590295341299noreply@blogger.com69