CEP 46 - The __cuda_arch virtual package
| Title | The __cuda_arch virtual package |
| Status | Accepted |
| Author(s) | Daniel Ching <dching@nvidia.com> |
| Created | Mar 13, 2026 |
| Updated | May 20, 2026 |
| Discussion | https://github.com/conda/ceps/pull/157 |
| Implementation | https://github.com/conda-incubator/nvidia-virtual-packages |
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 when, and only when, they appear in all capitals, as shown here.
Abstract
This CEP standardizes the __cuda_arch virtual package, which exposes the minimum CUDA compute
capability of detected NVIDIA devices. This extends the virtual package framework defined in
CEP 30.
Motivation
The __cuda virtual package (defined in CEP 30) exposes the maximum CUDA version supported by the
installed GPU driver. However, the driver version is a distinct property from the compute capability
of the GPU hardware itself. Compute capability is a hardware property that determines which
instructions and features a GPU supports, and it does not change with driver updates.
The __cuda_arch virtual package addresses use cases that __cuda cannot:
- Declaring minimum compute capability requirements: Packages that require specific GPU
instructions (e.g. tensor cores, FP8 support) can declare a minimum
__cuda_archversion as a dependency, ensuring they are only installed on compatible hardware. - Distributing per-architecture variants: A package may be built and distributed as multiple
variants targeting different SASS (Streaming ASSembler) or PTX (Parallel Thread Execution)
targets. The solver can select the correct variant for the detected GPU hardware using the
__cuda_archvirtual package.
Specification
Implementing the __cuda_arch virtual package is RECOMMENDED. If a conda-compatible client
chooses to implement the __cuda_arch virtual package, it MUST follow these specifications:
The __cuda_arch virtual package MUST be absent when the __cuda virtual package is
absent.
When present, the version value MUST be set to the lowest compute capability of all CUDA
devices detected on the system, formatted as {major}.{minor}; subarchitecture letters
(e.g. a, f) are excluded. The build string MUST be 0.
The __cuda_arch virtual package MUST be present when a CUDA device is detected EXCEPT when
CONDA_OVERRIDE_CUDA_ARCH is set as described below.
For systems without CUDA devices (e.g. a driver is installed but no devices are present),
the virtual package MUST be absent EXCEPT when CONDA_OVERRIDE_CUDA_ARCH is set as
described below.
If the CONDA_OVERRIDE_CUDA_ARCH environment variable is set to a non-empty value that can
be parsed as a compute capability string, the __cuda_arch virtual package MUST be exposed
with that version with the build string set to 0 EXCEPT when the __cuda virtual package
is absent as described above.
If the CONDA_OVERRIDE_CUDA_ARCH environment variable is set to the empty string, the
__cuda_arch virtual package MUST be absent.
Rationale
There is no mechanism by which a conda package may express multiple versions simultaneously.
Therefore, it is not possible for a single virtual package to express multiple unique
compute capabilities in a multi-device system. Therefore, the __cuda_arch virtual package
is set to the minimum compute capability among all detected devices rather than the
maximum. A package that declares a minimum __cuda_arch requirement must run correctly on
every GPU in the system. Using the minimum ensures the solver only selects packages that are
compatible with the least-capable device present. Adding PTX (Parallel Thread Execution)
code to a CUDA binary enables forward compatibility with new architectures; there is no
mechanism to provide arbitrary backward compatibility for a CUDA binary. If the maximum
compute capability were used instead, a package could be installed that runs on the newest
GPU but fails on another older GPU in the same system.
Alternatives
Providing multiple virtual packages (one for each major compute capability) or one for the minimum and maximum compute capability on the system would provide more information to the solver, but would be more difficult to work when defining constraints in conda recipes.
Rejected Ideas
Setting the build string to the product name of the device whose compute capability is being reported was rejected. The build string is now always 0 in order to avoid discussions about how the device product name should be represented as a string. Additionally, we don't want people to use the device name as a constraint; they cannot do this if the build string does not contain that information.
Appending 'a', 'f' to the version to represent arch- and family- specific features. It is
improper to include 'a', 'f' in the version because these indicate extended instruction sets
not the device architecture itself. __cuda_arch answers the question: "What sm is
available?" Therefore 'a' , 'f' are redundant because any device which is 100 supports 100a,
100f instructions.
Backwards Compatibility
Adding the __cuda_arch virtual package is backwards compatible. It does not effect
preexisting packages in the ecosystem. Like __archspec, using this virtual package in a
conda recipe is opt-in. However, once a package depends on __cuda_arch it is not
installable by clients who do not have the __cuda_arch virtual package implemented because
the absence of __cuda_arch is equivalent to declaring that there are no CUDA-capable
devices installed on the system.
Notes on usage
When building packages that target specific GPU compute capabilities, package authors SHOULD
always include PTX (Parallel Thread Execution) code for the highest targeted compute
capability. PTX is forward-compatible, meaning it can be JIT-compiled for newer GPU
generations that were not available/targeted at build time. Package authors SHOULD NOT leave
compatibility gaps between the lowest and highest targeted compute capabilities.
Family-specific instruction sets such as those for compute capability 9.0a or 12.0f are
not forward-compatible and MUST NOT be used as the sole binary target.
References
- nvidia-virtual-packages implementation
- CUDA compute capability documentation
- CUDA Driver API:
cuDeviceGetAttribute() - CUDA Driver API:
cuDriverGetVersion() - Virtual packages framework (CEP 30)
Copyright
All CEPs are explicitly CC0 1.0 Universal.