Prepend install lib dir to PATH on Windows so cuDNN sub-DLLs resolve #1465

Open
TroyHernandez wants to merge 1 commit into mlverse:main from cornball-ai:fix/windows-cudnn-dll-search-path

@TroyHernandez

Addresses the Windows cuDNN DLL load failure tracked in #1287 (the "Could not locate cudnn_graph64_9.dll" reports there).

Problem

On Windows CUDA installs, cuDNN-backed ops fail at runtime even though every cuDNN 9 DLL is present in <install>/torch/lib:

library(torch)
x <- torch_randn(1, 3, 16, 16, device = "cuda")
w <- torch_randn(4, 3, 3, 3, device = "cuda")
nnf_conv2d(x, w)
#> Could not locate cudnn_graph64_9.dll. Please make sure it is in your library path!
#> Invalid handle. Cannot load symbol cudnnCreate

cuda_is_available() is TRUE and non-cuDNN CUDA ops (e.g. torch_mm) work. This lines up with the diagnosis in #1287 that the loader isn't finding torch/lib.

Cause

cuDNN 9 is split across several DLLs. lanternLoadLibrary loads lantern.dll with LoadLibraryEx(..., LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR | LOAD_LIBRARY_SEARCH_DEFAULT_DIRS), which resolves the static dependency chain (libtorch, cudnn64_9.dll) from the install lib dir. But cudnn64_9.dll then lazily loads cudnn_graph64_9.dll by name, and that load does not find the install lib dir unless it is on PATH.
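To check both halves of this diagnosis on an affected machine (the DLLs are present, but their directory is not on PATH), a small diagnostic along these lines may help. It is only a sketch: `check_cudnn_dir()` is a hypothetical helper that is not part of torch or of this PR, and the caller has to supply the install lib dir as `lib_dir`. The comparison is an exact string match, so a PATH entry that differs only in case or slash direction would be reported as missing.

```r
# Hypothetical diagnostic, not part of torch or of this PR.
# `lib_dir` is the directory expected to hold the cuDNN DLLs.
check_cudnn_dir <- function(lib_dir,
                            path = Sys.getenv("PATH"),
                            sep = .Platform$path.sep) {
  # cuDNN DLLs found in the directory (cudnn64_9.dll, cudnn_graph64_9.dll, ...)
  dlls <- list.files(lib_dir, pattern = "^cudnn.*\\.dll$", ignore.case = TRUE)

  # Exact-match test for whether the directory is already a PATH entry
  entries <- strsplit(path, sep, fixed = TRUE)[[1]]

  list(cudnn_dlls = dlls, on_path = lib_dir %in% entries)
}
```

On a machine hitting this bug, one would expect `cudnn_dlls` to list the cuDNN 9 DLLs while `on_path` is `FALSE`.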

Fix

Prepend the install lib dir to PATH at load on Windows, so cuDNN's by-name sub-DLL loads resolve. This mirrors the existing load_cudatoolkit_libs() handling, which already does the same Sys.setenv(PATH = ...) for the separate cudatoolkit-package case. The prepend is idempotent, so repeated library()/reload calls don't grow PATH.
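For readers who want the shape of an idempotent prepend, here is a minimal sketch. It is not the code from this PR: `prepend_path_entry()` and the directories in the example are made up for illustration, and treating "already present anywhere on PATH" as a no-op is an assumption about what idempotent means here.

```r
# Hypothetical helper, not the code from this PR: returns `path` with `dir`
# prepended, or `path` unchanged if `dir` is already one of its entries.
prepend_path_entry <- function(dir,
                               path = Sys.getenv("PATH"),
                               sep = .Platform$path.sep) {
  entries <- strsplit(path, sep, fixed = TRUE)[[1]]
  if (dir %in% entries) {
    return(path)  # already present: leave PATH as it is
  }
  paste(c(dir, entries), collapse = sep)
}

# Example with made-up directories and an explicit Windows-style separator.
p0 <- "C:\\Windows;C:\\Rtools"
p1 <- prepend_path_entry("C:\\torch\\lib", p0, sep = ";")
p2 <- prepend_path_entry("C:\\torch\\lib", p1, sep = ";")
identical(p1, p2)  # TRUE: the second call does not grow the string
```

In a load hook the result would then be applied with `Sys.setenv(PATH = ...)`, as the description above says the cudatoolkit case already does. Keeping the string manipulation in a pure function makes the no-growth property easy to test without touching the real environment.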

Verification

Prepending torch/lib to PATH resolves the failure on torch 0.17.0 / libtorch 2.8.0+cu126, Windows 10, R 4.6.0 and R-devel: nnf_conv2d and a full torch_scaled_dot_product_attention forward then run and return correct results on GPU.

Refs #1287

cuDNN 9 is split across several DLLs. cuDNN's lazy load of its sub-DLLs
(e.g. cudnn_graph64_9.dll) does not find the install lib dir unless it is on
PATH, so cuDNN-backed CUDA ops fail with 'Could not locate cudnn_graph64_9.dll'
even though the DLL sits in torch/lib. Prepend the dir (once) at load,
mirroring the existing load_cudatoolkit_libs() PATH handling.