This allows apps to destroy the window and renderer in either order, but
makes sure that the renderer can properly clean up its resources while OpenGL
contexts and libraries are still loaded, etc.
If the window is destroyed first, the renderer is (mostly) destroyed, but its
pointer remains valid. Attempts to use the renderer will return an error,
but it can still be explicitly destroyed, at which point the struct is freed.
If the renderer is destroyed first, everything works as before, and a new
renderer can still be created on the existing window.
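For example (illustrative usage, not code from this change):
```
/* Either destruction order is now safe: */
SDL_DestroyWindow(window);      /* renderer resources are torn down here */
SDL_RenderClear(renderer);      /* returns an error, but does not crash */
SDL_DestroyRenderer(renderer);  /* finally frees the renderer struct */
```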
Fixes #9540.
Previously, each backend would allocate and free the renderer struct. Now
the higher level does it, so the backends only manage their private resources.
This removes some boilerplate and avoids some potential accidents.
Previously there were two different paths for renderer creation, and the HDR metadata initialization was missing when creating a software renderer for a surface. Now all cases are handled in a single path, so whether you create a software renderer by name, create one for a surface, or fall back to the software renderer, you get the correct initialization.
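Illustrative of the creation paths that now converge (signatures abbreviated; the exact SDL3 prototypes varied during development):
```
SDL_Renderer *a = SDL_CreateRenderer(window, "software");  /* by name */
SDL_Renderer *b = SDL_CreateSoftwareRenderer(surface);     /* for a surface */
SDL_Renderer *c = SDL_CreateRenderer(window, NULL);        /* may fall back */
```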
Fixes https://github.com/libsdl-org/SDL/issues/9221
* Make sure to always write pointSize in VS (fixes validation error in testsprite)
* Fix validation error from acquiring swapchain semaphore more than once
* Fix validation error from using incorrect framebuffer size in testautomation
Now passes testautomation with validation.
* Vulkan Renderer - implement YCbCr using VK_KHR_sampler_ycbcr_conversion. This simplifies the shader code and will also allow support for additional formats that we don't yet support (such as SDL_PIXELFORMAT_P010 for ffmpeg). The renderer now queries for VK_KHR_sampler_ycbcr_conversion and its dependent extensions and enables them if present. YUV/NV12 texture support is reimplemented using this extension (these formats are no longer supported if the extension is not present). For each YUV/NV12 texture, a VkSamplerYcbcrConversion object is created from the SDL_Colorspace parameters. This is passed to the VkImageView, and an additional sampler is created for the YCbCr conversion. Instead of using 1-3 textures, the shaders now all use 1 texture with a combined image sampler (required by the extension). Further, when using YCbCr, a separate descriptor set layout is baked with the YCbCr sampler as an immutable sampler. The image-copy code now copies to the individual image planes. The swizzling between formats is handled by the VkSamplerYcbcrConversion object.
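A rough sketch of the per-texture setup (not SDL's exact code; the BT.709 limited-range mapping below is an assumed example of deriving the fields from SDL_Colorspace):
```
VkSamplerYcbcrConversionCreateInfo ycbcrInfo = { 0 };
ycbcrInfo.sType = VK_STRUCTURE_TYPE_SAMPLER_YCBCR_CONVERSION_CREATE_INFO;
ycbcrInfo.format = VK_FORMAT_G8_B8R8_2PLANE_420_UNORM;  /* NV12 */
ycbcrInfo.ycbcrModel = VK_SAMPLER_YCBCR_MODEL_CONVERSION_YCBCR_709;
ycbcrInfo.ycbcrRange = VK_SAMPLER_YCBCR_RANGE_ITU_NARROW;
ycbcrInfo.xChromaOffset = VK_CHROMA_LOCATION_MIDPOINT;
ycbcrInfo.yChromaOffset = VK_CHROMA_LOCATION_MIDPOINT;
ycbcrInfo.chromaFilter = VK_FILTER_LINEAR;

VkSamplerYcbcrConversion conversion;
vkCreateSamplerYcbcrConversionKHR(device, &ycbcrInfo, NULL, &conversion);

/* The same conversion object is chained into both the VkImageView and the
 * (immutable) sampler via VkSamplerYcbcrConversionInfo. */
VkSamplerYcbcrConversionInfo chain = { 0 };
chain.sType = VK_STRUCTURE_TYPE_SAMPLER_YCBCR_CONVERSION_INFO;
chain.conversion = conversion;

VkImageViewCreateInfo viewInfo = { 0 };
viewInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
viewInfo.pNext = &chain;
/* ... image, viewType, format = ycbcrInfo.format, subresourceRange ... */
```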
The VULKAN_InvalidateCachedState() function seems to be meant to
invalidate any _cached_ state, i.e. global state of the API which may
have been modified outside the renderer.
However, at the moment, the Vulkan renderer also resets a number of
internal variables which track the buffers, offsets, etc., in use. As a
result, the renderer can get into an inconsistent state and/or lose
data.
For example, if VULKAN_InvalidateCachedState() is called in between two
calls to VULKAN_UpdateVertexBuffer(), the data from the first call will
be overwritten by that from the second, as the number of the next vertex
buffer to use will be reset to 0. This can result in rendering errors,
as the same vertex data is used incorrectly for several calls.
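As a sketch of the distinction (all variable names here invented for illustration):
```
static void VULKAN_InvalidateCachedState(SDL_Renderer *renderer)
{
    VULKAN_RenderData *data = (VULKAN_RenderData *)renderer->driverdata;

    /* Cached API state: safe to invalidate, it is lazily re-bound on the
     * next draw call. */
    data->currentPipelineState = NULL;
    data->cliprectDirty = SDL_TRUE;

    /* Internal bookkeeping such as data->currentVertexBuffer must NOT be
     * reset here, or vertex data queued since the last submit gets
     * overwritten by the next VULKAN_UpdateVertexBuffer() call. */
}
```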
By no longer resetting this 'internal' state here, those glitches
disappear. However, I haven't tested this with any applications which
mix the Vulkan renderer with their own Vulkan code (do any such
applications exist?), so this may be insufficient in cases where a full
flush of the renderer state -- and possibly a wait on the appropriate
fence -- is required.
Signed-off-by: David Gow <david@ingeniumdigital.com>
When selecting a memory type, there are some property flags we need
(e.g., VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, without which we cannot
vkMapMemory), and others we'd simply prefer (e.g.,
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, which may have a performance
impact, but otherwise shouldn't be required).
By specifying these separately, we can fall back to a memory type which
doesn't have everything we want, but which should still work, rather
than giving up.
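A sketch of the two-tier lookup (helper name and structure invented for illustration):
```
static int FindMemoryType(VkPhysicalDevice gpu, uint32_t typeBits,
                          VkMemoryPropertyFlags required,
                          VkMemoryPropertyFlags preferred)
{
    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(gpu, &props);

    /* First pass asks for required + preferred; second pass drops the
     * preferred bits and settles for anything that satisfies 'required'. */
    VkMemoryPropertyFlags want[2] = { required | preferred, required };
    for (int pass = 0; pass < 2; ++pass) {
        for (uint32_t i = 0; i < props.memoryTypeCount; ++i) {
            if ((typeBits & (1u << i)) &&
                (props.memoryTypes[i].propertyFlags & want[pass]) == want[pass]) {
                return (int)i;
            }
        }
    }
    return -1;  /* no usable memory type at all */
}
```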
Signed-off-by: David Gow <david@ingeniumdigital.com>
VULKAN_FindMemoryTypeIndex() tries first to get a perfectly matching
memory type, then falls back to selecting any memory type which overlaps
with the requested flags.
However, some of the flags requested are actually required, so if -- for
example -- we request a memory type with
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, but get one without it, all future
calls to vkMapMemory() will fail with:
```
vkMapMemory(): Mapping memory without VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT set. Memory has type 0 which has properties VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT. The Vulkan spec states:
memory must have been created with a memory type that reports VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT.
```
(This occurs, for instance, on the totally non-conformant hasvk driver
for Intel Haswell integrated GPUs, which otherwise works fine.)
Instead, make sure that any memory type found has a superset of the
requested flags, so it'll always be appropriate.
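The difference, in essence:
```
/* Old fallback: an overlap test, which accepts a type that has only
 * *some* of the requested bits. */
if (memoryProperties.memoryTypes[i].propertyFlags & flags) { /* ... */ }

/* Fixed fallback: a superset test, so every requested bit must be set. */
if ((memoryProperties.memoryTypes[i].propertyFlags & flags) == flags) { /* ... */ }
```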
Of course, this makes it _less_ likely for a memory type to be found, so
it does make #9130 worse in some ways. See the next patch for details.
Signed-off-by: David Gow <david@ingeniumdigital.com>
When enabling the Vulkan validation layers, the 'validationLayerName'
variable technically went out of scope before vkCreateInstance() was
called. While most compilers won't clean up stack variables after random
'if' statements, some will, particularly when optimisation or memory
sanitizers are enabled.
This can lead to vkCreateInstance() segfaulting when
SDL_HINT_RENDER_VULKAN_DEBUG is enabled.
Instead, make the validationLayerName visible throughout the entire
VULKAN_CreateDeviceResources() function.
While we're at it, extract the validation layer name out into a
preprocessor #define, so that we are definitely using the same name in
VULKAN_ValidationLayersFound().
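A simplified sketch of the bug pattern (not the exact SDL code):
```
if (createDebug) {
    const char *validationLayerName[] = { "VK_LAYER_KHRONOS_validation" };
    instanceCreateInfo.enabledLayerCount = 1;
    instanceCreateInfo.ppEnabledLayerNames = validationLayerName;
}   /* the array's stack storage may be reclaimed past this brace... */
vkCreateInstance(&instanceCreateInfo, NULL, &instance);  /* ...yet read here */
```
Hoisting the array to function scope (and sharing the name via a #define) removes the dangling pointer.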
Signed-off-by: David Gow <david@ingeniumdigital.com>
This pull request adds an implementation of a Vulkan Render backend to SDL. I have so far tested this primarily on Windows, but also smoke tested on Linux and macOS (MoltenVK). I have not tried it yet on Android, but it should be usable there as well (sans any bugs I missed). This began as a port of the SDL Direct3D12 Renderer, which is the closest thing to Vulkan as existed in the SDL codebase. The shaders are more or less identical (with the only differences being in descriptor bindings vs root descriptors). The shaders are built using the HLSL frontend of glslang.
Everything in the code is pure Vulkan 1.0 (no extensions), with the exception of HDR support which requires the Vulkan instance extension `VK_EXT_swapchain_colorspace`. The code could have been simplified considerably if I used dynamic rendering, push descriptors, extended dynamic state, and other modern Vulkan-isms, but I felt it was more important to make the code as vanilla Vulkan as possible so that it would run on any Vulkan implementation.
The main differences with the Direct3D12 renderer are:
* Having to manage renderpasses for performing clears. Some optimization likely remains for more efficient use of TBDR hardware, where there may still be unnecessary load/stores, but it does attempt to do clears using renderpasses.
* Constant buffer data couldn't be directly updated in the command buffer since I didn't want to rely on push descriptors, so there is a persistently mapped buffer with an increasing offset per swapchain image where CB data gets written (see the sketch after this list).
* Many more resources are dependent on the swapchain resizing, e.g. due to Vulkan requiring the VkFramebuffer to reference the VkImageView of the swapchain, so there is a bit more code around handling that than was necessary in D3D12.
* For NV12/NV21 textures, rather than there being plane data in the texture itself, the UV data is placed in a separate `VkImage`/`VkImageView`.
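A hypothetical sketch of that constant-buffer arena (all names illustrative): one persistently mapped VkBuffer per swapchain image, bump-allocated per draw.
```
VkDeviceSize offset = frame->cbNextOffset;
SDL_memcpy((Uint8 *)frame->cbMappedPtr + offset, cbData, cbSize);

/* This draw's descriptor points at (frame->cbBuffer, offset, cbSize);
 * the next write starts at the aligned end of this one. */
VkDeviceSize align = limits.minUniformBufferOffsetAlignment;
frame->cbNextOffset = (offset + cbSize + align - 1) & ~(align - 1);
```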
I've verified that `testcolorspace` works with both sRGB and HDR linear. I've verified that `testoverlay` works with the various YUV/NV12/NV21 formats. I've tested `testsprite`. I've checked that window resizing and swapchain out-of-date handling when minimizing are working. I've run through `testautomation` with the render tests. I've also run several of the tests with Vulkan validation and synchronization validation. Surely I will have missed some things, but I think it's in a good state to be merged and built out from here.
This better reflects how HDR content is actually used, e.g. most content is in the SDR range, with specular highlights and bright details beyond the SDR range, in the HDR headroom.
This more closely matches how HDR is handled on Apple platforms, as EDR.
This also greatly simplifies application code which no longer has to think about color scaling. SDR content is rendered at the appropriate brightness automatically, and HDR content is scaled to the correct range for the display HDR headroom.
Renamed the following property define names to have a type suffix to
match other property names.
SDL_PROP_TEXTURE_OPENGL_TEXTURE_TARGET (number)
SDL_PROP_TEXTURE_OPENGLES2_TEXTURE_TARGET (number)
SDL_PROP_WINDOW_CREATE_WAYLAND_SCALE_TO_DISPLAY (boolean)
SDL_PROP_WINDOW_RENDERER (pointer)
SDL_PROP_WINDOW_TEXTUREDATA (pointer)
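For example, reading one of the renamed properties (illustrative; names as in the SDL3 headers):
```
SDL_PropertiesID props = SDL_GetTextureProperties(texture);
Sint64 target = SDL_GetNumberProperty(props,
                    SDL_PROP_TEXTURE_OPENGL_TEXTURE_TARGET_NUMBER, 0);
```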
Eventually we can re-add a fast path for that data down to the individual renderers. Setting color scale would still require converting to float, and most hardware accelerated renderers prefer to consume colors as float, so this requires some thought and performance testing.
Fixes https://github.com/libsdl-org/SDL/issues/9009
On some systems creating the entire set of available pipeline states is very time consuming. We'll only use a few of them in any given program, so we'll just create them on demand.
Fixes https://github.com/libsdl-org/SDL/issues/7634
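A sketch of the lookup-or-create pattern this implies (all names hypothetical):
```
static VkPipeline GetPipeline(RenderData *data, PipelineKey key)
{
    for (int i = 0; i < data->pipelineCount; ++i) {
        if (PipelineKeyEquals(&data->pipelines[i].key, &key)) {
            return data->pipelines[i].pipeline;  /* cache hit */
        }
    }
    /* Cache miss: build this state now, on first use, instead of building
     * every combination up front at renderer creation. */
    VkPipeline pipeline = CreatePipelineForKey(data, &key);
    AppendPipeline(data, &key, pipeline);
    return pipeline;
}
```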
The renderer will always use the sRGB colorspace for drawing, and will default to the sRGB output colorspace. If you want blending in linear space and HDR support, you can select the scRGB output colorspace, which is supported by the direct3d11 and direct3d12 renderers.
You can't do blending directly in PQ space, which means you have to create a scene render target in linear space and use shaders to convert PQ texture data to linear, etc. All of this is out of scope for the SDL 2D renderer at the moment.
Testing: Modified testgeometry to clear the background to 0.5 and then changed the triangle color to 0.5, and verified that they were the same color when using the D3D11 renderer.
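Requesting the scRGB output colorspace looks roughly like this (property and enum names as in the current SDL3 headers, which may differ from the snapshot this change targeted):
```
SDL_PropertiesID props = SDL_CreateProperties();
SDL_SetPointerProperty(props, SDL_PROP_RENDERER_CREATE_WINDOW_POINTER, window);
SDL_SetNumberProperty(props, SDL_PROP_RENDERER_CREATE_OUTPUT_COLORSPACE_NUMBER,
                      SDL_COLORSPACE_SRGB_LINEAR);  /* aka scRGB */
SDL_Renderer *renderer = SDL_CreateRendererWithProperties(props);
SDL_DestroyProperties(props);
```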
This allows color operations to happen in linear space between sRGB input and sRGB output. This is currently supported on the direct3d11, direct3d12 and opengl renderers.
This is a good resource on blending in linear space vs sRGB space:
https://blog.johnnovak.net/2016/09/21/what-every-coder-should-know-about-gamma/
Also added testcolorspace to verify colorspace changes
The drawing uses the origin of the viewport as the coordinate origin, so we only need to clip against the size of the viewport.
Also added a unit test to catch this case in the future
SDL window size, state, and position functions have been considered immediate, with their effects assumed to have taken effect upon successful return. However, several windowing systems handle these requests asynchronously, resulting in the functions blocking until the changes have taken effect, potentially for long periods of time. Additionally, some windowing systems treat these as requests and can deny or fulfill a request differently than the application expects, such as not allowing a window to be positioned or sized beyond desktop borders, prohibiting fullscreen, and so on.
With these changes, applications can make requests of the window manager that do not block, with the understanding that an associated event will be sent if the request is fulfilled. Currently, size, position, maximize, minimize, and fullscreen calls are handled as asynchronous requests, with events being returned if the request is honored. If the application requires that the change take effect immediately, it can call the new SDL_SyncWindow function, which will attempt to block until the request is fulfilled, or some arbitrary timeout period elapses, the duration of which depends not only on the windowing system, but on the operation requested as well (e.g. a 100ms timeout is fine for most X11 events, but maximizing a window can take considerably longer for some reason). There is also a new hint 'SDL_VIDEO_SYNC_ALL_WINDOW_OPS' that will mimic the old behavior by synchronizing after every window operation with, again, the understanding that using this may result in the associated calls blocking for a relatively long period.
The deferred model also results in the window size and position getters not reporting false coordinates anymore, as they only forward what the window manager reports vs allowing applications to set arbitrary values, and fullscreen enter/leave events that were initiated via the window manager update the window state appropriately, where they didn't before.
Care was taken to ensure that order of operations is maintained, and that requests are not ignored or dropped. This does require some implicit internal synchronization in the various backends if many requests are made in a short period, as some state and behavior depends on other bits of state that need to be known at that particular point in time, but this isn't something that typical applications will hit, unless they are sending a lot of window state in a short time as the tests do.
The automated tests developed to test the previous behavior also resulted in previously undefined behavior being defined and normalized across platforms, particularly when it comes to the sizing and positioning of windows when they are in a fixed-size state, such as maximized or fullscreen. Size and position requests made when the window is not in a movable or resizable state will be deferred until it can be applied, so no requests are lost. These changes fix another long-standing issue with renderers recreating maximized windows, where the original non-maximized size was lost, resulting in the window being restored to the wrong size. All automated video tests pass across all platforms.
Overall, the "make a request/get an event" model better reflects how most windowing systems work, and some backends avoid spending significant time blocking while waiting for operations to complete.
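An illustrative request/sync flow (return-value conventions per the SDL3 headers of the time):
```
SDL_SetWindowPosition(window, 100, 100);  /* asynchronous request */
SDL_SetWindowSize(window, 1280, 720);     /* another request; order is kept */
if (SDL_SyncWindow(window) != 0) {        /* optional: block until applied */
    SDL_Log("requests still pending or denied");
}
/* Without SDL_SyncWindow, watch for SDL_EVENT_WINDOW_MOVED and
 * SDL_EVENT_WINDOW_RESIZED to learn what the window manager granted. */
```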
At the earliest place, immediately after driverdata is set.
(Doing it in SDL_render.c, after creation, would be too late, because there are renderers that already use/change those values in their CreateRenderer() function.)
This means the allocator's caller doesn't need to use SDL_OutOfMemory directly
if the allocation fails.
This applies to the usual allocators: SDL_malloc, SDL_calloc, SDL_realloc
(all of these regardless of if the app supplied a custom allocator or we're
using system malloc() or an internal copy of dlmalloc under the hood),
SDL_aligned_alloc, SDL_small_alloc, SDL_strdup, SDL_asprintf, SDL_wcsdup...
probably others. If it returns something you can pass to SDL_free, it should
work.
The caller might still need to use SDL_OutOfMemory if something other than
SDL allocated the memory: operator new in C++ code, Objective-C's alloc
message, Win32 GlobalAlloc, etc.
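Illustrative of the new contract:
```
char *copy = SDL_strdup(name);
if (!copy) {
    return NULL;  /* no SDL_OutOfMemory() call needed; the error is set */
}
```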
Fixes #8642.
This uses the same `SDL_VerbNoun` format as the rest of SDL3, and also
makes a stronger effort to invalidate cached state in the backend, so
cooperation improves with apps that are using low-level rendering APIs
directly.
Fixes #367.
Otherwise, a massive line might generate gigabytes worth of points to render,
which the backend would simply throw away anyhow.
Fixes #8113.
(cherry picked from commit 4339647d90)
The following objects now have properties that can be user modified:
* SDL_AudioStream
* SDL_Gamepad
* SDL_Joystick
* SDL_RWops
* SDL_Renderer
* SDL_Sensor
* SDL_Surface
* SDL_Texture
* SDL_Window
Also switched the D3D11 and D3D12 renderers to use real NV12 textures for NV12 data.
The combination of these two changes allows us to implement 0-copy video decode and playback for D3D11 in testffmpeg without any access to the renderer internals.
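For example, attaching app data to one of these objects (setter name as in the current SDL3 headers):
```
SDL_PropertiesID props = SDL_GetRendererProperties(renderer);
SDL_SetPointerProperty(props, "myapp.frame_stats", stats);
```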
Battle for Wesnoth apparently relies on being able to disable rendering
of UI elements by setting the clip rectangle to be empty.
Resolves: https://github.com/libsdl-org/SDL/issues/6896
Fixes: 00f05dcf "render: only enable clipping when the rectangle is valid"
Signed-off-by: Simon McVittie <smcv@collabora.com>
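Illustrative (SDL3 naming):
```
SDL_Rect empty = { 0, 0, 0, 0 };
SDL_SetRenderClipRect(renderer, &empty);  /* everything is clipped away */
SDL_SetRenderClipRect(renderer, NULL);    /* clipping disabled again */
```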
On some systems, like a MacBook Pro with an Intel CPU and an AMD card, asking for the default device will always return the AMD GPU.
This is not an issue in 99% of cases, where the renderer context is there to provide the maximum performance level, as for games.
However, for a video application using the GPU for 1 quad and 1 texture, using the discrete GPU leads to significant power consumption (4 to 8 W), heat increase, and fan noise.
With this patch, I successfully amended ffplay to use only the integrated GPU (i.e. the Intel one), instead of the discrete GPU (i.e. the AMD one).
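Presumably via the Metal low-power hint (assuming this change is what introduced it):
```
SDL_SetHint(SDL_HINT_RENDER_METAL_PREFER_LOW_POWER_DEVICE, "1");
```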
Also, for some reason ID3D11DeviceContext_OMGetRenderTargets() was failing in the second read pixels call in the "testautomation --filter render_testViewport" test.
We already know the target view, so just use that.