AMD is not alone in working on Vulkan/SPIR-V support for machine learning / AI software: NVIDIA has also been working on improvements for Vulkan-powered machine learning software. The outlook for using Vulkan within machine learning software is quite positive, with Vulkan even able to offer similar performance to NVIDIA’s prized CUDA.
At the Vulkanised 2025 conference last month in Cambridge (UK), Jeff Bolz of NVIDIA presented on Vulkan machine learning and the work NVIDIA has been engaged in, particularly around the VK_NV_cooperative_matrix2 extension. VK_NV_cooperative_matrix2 adds extra features on top of the cooperative matrix types introduced by VK_KHR_cooperative_matrix, allowing acceleration of more than just basic GEMM kernels.
The presentation included a look at the Vulkan code path with Llama.cpp, including the benefits of the VK_NV_cooperative_matrix2 extension that is part of the NVIDIA 575 driver series. The Llama.cpp benchmarks on NVIDIA hardware with the Vulkan back-end are becoming very competitive with CUDA, particularly when making use of the Cooperative Matrix 2 support.
Very nice results, and in some cases the Vulkan performance even exceeds the NVIDIA CUDA back-end with Llama.cpp! This Vulkan path works on all NVIDIA GeForce RTX GPUs, but make sure you are running the Vulkan beta driver or an NVIDIA 575 driver release when available to enjoy VK_NV_cooperative_matrix2.
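For those wanting to confirm that their installed driver actually exposes the extension, here is a minimal Python sketch. It assumes the `vulkaninfo` utility from the Vulkan SDK / vulkan-tools package is on the PATH and simply looks for the extension names as whole tokens in its text output; the helper names `find_extensions` and `query_driver` are made up for this illustration and are not part of any NVIDIA or Khronos tooling.

```python
import re
import shutil
import subprocess

# Extension names taken from the article.
REQUIRED = ("VK_KHR_cooperative_matrix", "VK_NV_cooperative_matrix2")


def find_extensions(report, names=REQUIRED):
    """Return {name: bool}, True when the name occurs as a whole token in the text.

    The look-around guards stop "VK_NV_cooperative_matrix" from being
    mistaken for "VK_NV_cooperative_matrix2" (and vice versa).
    """
    found = {}
    for name in names:
        pattern = rf"(?<![A-Za-z0-9_]){re.escape(name)}(?![A-Za-z0-9_])"
        found[name] = re.search(pattern, report) is not None
    return found


def query_driver():
    """Return the text printed by `vulkaninfo`, or "" if the tool is missing.

    Assumption: vulkaninfo is installed; no particular output layout is relied on.
    """
    exe = shutil.which("vulkaninfo")
    if exe is None:
        return ""
    result = subprocess.run([exe], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    for name, present in find_extensions(query_driver()).items():
        print(f"{name}: {'exposed' if present else 'not found'}")
```

Keep in mind this is only a text-matching heuristic: it reports "not found" both when the driver lacks the extension and when `vulkaninfo` is not installed, and on multi-GPU systems it does not tell you which device exposes the extension.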
Those interested can find the NVIDIA Vulkan AI presentation embedded above as well as the PDF slide deck for all the details on this great Vulkanised 2025 session by Jeff Bolz.