Namespace Hybridizer.Runtime.CUDAImports
Classes
AbstractNativeMarshaler
Base class for NativeMarshaler
AllocatableTypeAttribute
Types that can be newed on the device using hybridizer::allocate<T>()
blockDim
Static class to guide work distribution on device for the Thread part : Dimension (maps on vector unit index for the vectorized flavors)
blockDimX64
Static class to guide work distribution on device for the Thread part : Dimension (maps on vector unit index for the vectorized flavors)
blockIdx
Static class to guide work distribution on device for the Block part : Index
blockIdxX64
Static class to guide work distribution on device for the Block part : Index
BuiltinMethodAttribute
LLVM as input only. With this attribute, no code will be generated; the provided .NET method will be used instead in the generated assembly.
ConstAttribute
LLVM as input only. Marks a parameter as const.
cublas
cuBLAS mapping
cuda
CUDA runtime API wrapper
CUDADynamicParallelismAttribute
EntryPoint can be called from device function, spawning dynamic parallelism
CUDAIntrinsics
CUDA intrinsics
CudaMarshaler
Marshaler to CUDA device memory
Usage example:
\begin{lstlisting}[style=customcs]
[DllImport("{cuda DLL name}.dll",
    EntryPoint = "{EntryPointName}_ExternCWrapper_CUDA",
    CallingConvention = CallingConvention.Cdecl)]
private static extern int methodName(
    int gridDimX, int gridDimY,
    int blockDimX, int blockDimY, int blockDimZ,
    int shared,
    [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(CudaMarshaler))]
    TypeToBemarshaled param);
\end{lstlisting}
CudaMarshalerTimer
internal
cufft
cuFFT wrapper. Complete documentation here.
curand
cuRAND mapping. Full documentation here.
DontVectorizeAttribute
Do not vectorize
DoubleResidentArray
A resident array of double precision real numbers
DoubleZeroCopyResidentArray
A resident array of double precision real numbers, allocated using zero-copy. Complete documentation here.
EntryPointAttribute
Entry point method called from host and executed on device
FieldTools
FloatResidentArray
A resident Array of float 32 elements
FloatZeroCopyResidentArray
A resident array of float 32 elements, allocated using zero-copy. Zero-copy documentation here.
gridDim
Static class to guide work distribution on device for the Block part : Dimension
gridDimX64
Static class to guide work distribution on device for the Block part : Dimension
HeapifyLocalsAttribute
HybMath
HybridArithmeticFunctionAttribute
Arithmetic function: no memory operations and no branches. Allows aggressive optimizations.
HybridCompletionDescriptionAttribute
HybridConstantAttribute
Defines a constant value
HybridHybOpFunctionAttribute
internal
HybridizerExtension
HybridizerExtension.CUDA
HybridizerExtension.CUDA.BlockDim
HybridizerExtension.CUDA.GridDim
HybridizerExtension.KEPLER
HybridizerExtension.KEPLER.BlockDim
HybridizerExtension.KEPLER.GridDim
HybridizerForceIgnoreAttribute
HybridizerIgnoreAttribute
Do not hybridize, typically because the C# code contains unsupported constructs
HybridizerNativeFieldProxyAttribute
Use to wrap a constant array
HybridizerTemplatizeArgumentAttribute
HybridJavaImplementationAttribute
HybridLambdaIdentifierAttribute
HVL only. Lambdas must have a unique identifier to be used in HVL.
HybridMappedJavaTypeAttribute
HybridNakedFunctionAttribute
Naked function: no memory operation. Allows optimizations.
HybridRegisterTemplateAttribute
type to specialize template concept
HybridTemplateConceptAttribute
Register class as a template concept
HybridVectorizerAttribute
force vectorization of method parameter
HybRunner
This class allows calling hybridized methods without explicitly declaring native methods. Usage: int res = HybRunner.Cuda(target).Distrib(gridDimX, blockDimX).methodName(args);
HybRunnerDefaultSatelliteNameAttribute
Assembly attribute providing name of generated satellite library
ICustomMarshalledSize
size of marshalled structure
IntPtrExtension
IntResidentArray
A resident array of int32 elements
IntrinsicAttribute
internal - base type for all intrinsic attributes (see IntrinsicFunctionAttribute, IntrinsicTypeAttribute)
IntrinsicConstantAttribute
Compile time constant
IntrinsicFunctionAttribute
Functions marked as intrinsic -- user shall provide an implementation
IntrinsicIncludeAttribute
Force include of some native header
IntrinsicIncludeCUDAAttribute
intrinsic include -- CUDA specific (see IntrinsicIncludeAttribute)
IntrinsicIncludeOMPAttribute
intrinsic include -- OMP specific (see IntrinsicIncludeAttribute)
IntrinsicIncludePhiAttribute
intrinsic include -- AVX specific (see IntrinsicIncludeAttribute)
IntrinsicPrimitiveAttribute
IntrinsicTypeAttribute
Types marked as intrinsic -- user shall provide an implementation
JavaRandom
Simple random class (host)
JavaRuntime
helper functions
KernelAttribute
A function running on the device
KernelInteropTools
KnownReturnTypeAttribute
LaunchBoundsAttribute
Launch bounds provided to global function. Hints to compiler to optimize register pressure. Complete documentation here.
LinuxKernelInteropTools
LLVMVectorIntrinsics
MainMemoryMarshaler
Marshaler to main memory - to be used for OMP and AVX flavors
Usage example:
\begin{lstlisting}[style=customcs]
[DllImport("{DLL name}.dll",
    EntryPoint = "{EntryPointName}",
    CallingConvention = CallingConvention.Cdecl)]
private static extern int methodName(
    [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(MainMemoryMarshaler))]
    TypeToBemarshaled param);
\end{lstlisting}
NativeDlls
wrapper class listing generated native dlls
NativeImportHeaderAttribute
NativeImportSymbolAttribute
LLVM as input only. Native symbol should use the provided .NET method.
nvblas
factory class
OptixShaderAttribute
Kernel is an optix shader - special treatment required https://docs.nvidia.com/gameworks/content/gameworkslibrary/optix/optix_v4_0.htm
Parallel2D
Similar to Parallel class in System.Threading.Tasks.Parallel
ReadOnlyAttribute
ResidentArrayGeneric<T>
ResidentArrayHostAttribute
ReturnTypeInferenceAttribute
hint for return type vectorization
ReturnTypeInferrenceAttribute
obsolete, use ReturnTypeInferenceAttribute instead
SafeDictionary<TKey, TValue>
SerialVectorizeAttribute
Fallback when vectorization fails. Method signature is vectorized, but implementation is serial.
SharedMemoryAttribute
internal
SingleStaticAssignmentAttribute
Variables of this type can only be assigned once (at their declaration). The developer is responsible for ensuring this is actually the case.
StackAllocBehaviorAttribute
specifies stack allocation behavior
ThreadedCudaMarshaler
Cuda marshaler, threaded
threadIdx
Static class to guide work distribution on device for the Thread part : Index (maps on vector unit index for the vectorized flavors)
threadIdxX64
Static class to guide work distribution on device for the Thread part : Index (maps on vector unit index for the vectorized flavors)
TypeIdAttribute
Use this attribute to customize the typeId of one type
VectorUnit
Win32KernelInteropTools
WriteOnlyAttribute
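The threadIdx, blockIdx, blockDim and gridDim classes listed above share their names with the CUDA built-in variables. As a rough illustration of how such values are commonly combined into a grid-stride loop, the plain C# sketch below simulates the index arithmetic on the host. It does not call the Hybridizer runtime: GridStrideSketch, IndicesFor and every parameter name are hypothetical, and only the x dimension is modeled.

```csharp
using System;
using System.Collections.Generic;

// Host-side simulation only: these names are hypothetical stand-ins,
// not members of Hybridizer.Runtime.CUDAImports.
public static class GridStrideSketch
{
    // Returns the element indices that one simulated thread would process
    // in a grid-stride loop over n elements (x dimension only).
    public static List<int> IndicesFor(int blockIdxX, int threadIdxX,
                                       int gridDimX, int blockDimX, int n)
    {
        var indices = new List<int>();
        int start = threadIdxX + blockIdxX * blockDimX; // global thread index
        int stride = blockDimX * gridDimX;              // total number of simulated threads
        for (int i = start; i < n; i += stride)
        {
            indices.Add(i);
        }
        return indices;
    }
}
```

With 2 blocks of 4 threads and n = 10, GridStrideSketch.IndicesFor(0, 1, 2, 4, 10) visits indices 1 and 9, while GridStrideSketch.IndicesFor(1, 1, 2, 4, 10) visits index 5 only.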
Structs
AlignedAllocation
Aligned memory allocation helper
alignedindex
An index, aligned to 32 -- also representing the next 32 indices
alignedstorage_double
A DoubleResidentArray with an underlying pointer aligned to 32. Allows memory load/store optimization.
alignedstorage_double_zerocopy
An alignedstorage_double using zero-copy
alignedstorage_float
A FloatResidentArray with underlying memory aligned to 32
alignedstorage_int
An IntResidentArray with memory aligned to 32
AtomicExpr
bool2
2 booleans
bool4
4 booleans
bool8
8 booleans
char2
two signed bytes
char4
four signed bytes
char8
8 signed bytes
coalesced_group
A group representing the current set of converged threads in a warp. The size of the group is not guaranteed and it may return a group of only one thread (itself). This group exposes warp-synchronous builtins.
Coalesced<T>
internal
cooperative_groups
global functions from cooperative_groups.h header
cublasHandle_t
Opaque pointer holding the cuBLAS library context. Complete documentation is available in the NVIDIA documentation.
cuComplex
complex single-precision
cuda.cudaDeviceProp_100
cudaArray_t
CUDA Array
cudaChannelFormatDesc
CUDA Channel format descriptor
cudaDeviceProp
CUDA device properties. Complete documentation here.
cudaEvent_t
CUDA event types
cudaExtent
CUDA extent
cudaFuncAttributes
CUDA function attributes
cudaIpcEventHandle_t
CUDA IPC event handle
cudaIpcMemHandle_t
CUDA IPC memory handle
cudaMemcpy3DParms
CUDA 3D memory copying parameters
cudaMemcpy3DPeerParms
CUDA 3D cross-device memory copying parameters
cudaMipmappedArray_const_t
CUDA mipmapped array (as source argument)
cudaMipmappedArray_t
CUDA mipmapped array
cudaPitchedPtr
CUDA Pitched memory pointer
cudaPos
CUDA 3D position
cudaResourceDesc
CUDA resource descriptor
cudaResourceViewDesc
CUDA resource view descriptor
CudaRuntimeProperties
cudaStream_t
CUDA stream
cudaStreamCallback_t
Type of stream callback functions.
cudaSurfaceObject_t
An opaque value that represents a CUDA Surface object
cudaTextureDesc
CUDA texture descriptor
cudaTextureObject_t
An opaque value that represents a CUDA texture object
cuDoubleComplex
complex double-precision
cufftHandle
Inner structure to carry handle.
curand.curandGenerator_t
dim3
dimension structure.
double2
2 64 bits floating point elements, packed
double4
4 64 bits floating point elements
FieldTools.FieldDeclaration
float2
2 32 bits float, packed
float3
3 32 bits floating point elements, packed
float4
4 32 bits floats
float8
8 32 bits floats
grid_group
half
Half (16 bits) precision floating point type
half2
half8
HandleDelegate
Handle for delegate serialized as structs
HybridizerProperties
Hybridizer properties at runtime
int2
2 32 bits integers
int4
4 integers, packed
int8
8 32 bits integers
long2
2 64 bits integers
NativeArrayIndexer<T>
SharedMemoryAllocator<T>
structure to allocate shared memory in device code
short2
2 16 bits integers
short4
4 16 bits integers
short8
8 16 bits integers
size_t
size_t type has different bit-size storage depending on architecture.
StackArray<T>
An array on stack
thread_block
Every GPU kernel is executed by a grid of thread blocks, and threads within each block are guaranteed to reside on the same streaming multiprocessor. A thread_block represents a thread block whose dimensions are not known until runtime.
thread_block_tile_1
thread_block_tile_16
thread_block_tile_2
thread_block_tile_32
thread_block_tile_4
thread_block_tile_8
thread_group
A handle to a group of threads. The handle is only accessible to members of the group it represents.
uchar4
four unsigned bytes
VectorizerMask
INTERNAL TYPE
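The alignedindex entry above describes an index aligned to 32 that also stands for the next 32 indices. The arithmetic behind such an alignment can be sketched in plain C# as follows. This is only an illustration under the assumption that "aligned to 32" means "a multiple of 32 elements"; AlignedIndexSketch and its members are hypothetical and are not part of the Hybridizer runtime.

```csharp
using System;

// Hypothetical helper illustrating 32-wide index alignment;
// not the implementation of Hybridizer's alignedindex struct.
public static class AlignedIndexSketch
{
    // Assumed alignment width, in elements.
    public const int Width = 32;

    // Rounds an index down to the nearest multiple of Width.
    public static int AlignDown(int index)
    {
        return index & ~(Width - 1);
    }

    // True when the index is already a multiple of Width.
    public static bool IsAligned(int index)
    {
        return (index & (Width - 1)) == 0;
    }
}
```

AlignedIndexSketch.AlignDown(70) gives 64, so under this assumption the aligned index 64 would stand for indices 64 through 95.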
Interfaces
cuda.ICuda
interface wrapping all cuda versions
cuda.ICudaMarshalling
ICuda simplified for marshalling only
curand.ICurand
ICustomMarshalled
custom marshaler
IHybCustomMarshaler
custom marshaler
IKernelInteropTools
INVBLAS
Complete documentation on NVidia documentation
IResidentArray
Resident array -- user must manually control memory location
IResidentData
Resident data interface
Enums
ConstantLocation
constant memory location (see HybridConstantAttribute)
cublasAtomicsMode_t
Indicates whether cuBLAS routines which have an alternate implementation using atomics can be used. Complete documentation is available in the NVIDIA documentation.
cublasDiagType_t
Indicates whether the main diagonal of the dense matrix is unity and consequently should not be touched or modified by the function. Complete documentation is available in the NVIDIA documentation.
cublasFillMode_t
The type indicates which part (lower or upper) of the dense matrix was filled and consequently should be used by the function. Complete documentation is available in the NVIDIA documentation.
cublasGemmAlgo_t
An enumerant to specify the algorithm for matrix-matrix multiplication. Complete documentation is available in the NVIDIA documentation.
cublasMath_t
The cublasMath_t enumerated type is used in cublasSetMathMode to choose whether or not to use Tensor Core operations in the library by setting the math mode to either CUBLAS_TENSOR_OP_MATH or CUBLAS_DEFAULT_MATH. Complete documentation is available in the NVIDIA documentation.
cublasOperation_t
Indicates which operation needs to be performed with the dense matrix. Complete documentation is available in the NVIDIA documentation.
cublasPointerMode_t
Indicates whether the scalar values are passed by reference on the host or device. Complete documentation is available in the NVIDIA documentation.
cublasSideMode_t
Indicates whether the dense matrix is on the left or right side in the matrix equation solved by a particular function. Complete documentation is available in the NVIDIA documentation.
cublasStatus_t
The type is used for function status returns. Complete documentation is available in the NVIDIA documentation.
cuda.VERBOSITY
cudaChannelFormatKind
Channel format kind
cudaDataType_t
The cudaDataType_t type is an enumerant to specify the data precision. It is used when the data reference does not carry the type itself (e.g. void *). Complete documentation is available in the NVIDIA documentation.
cudaDeviceAttr
CUDA device attributes
cudaDeviceP2PAttr
CUDA device P2P attributes
cudaError_t
cuda error codes https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gf599e5b8b829ce7db0f5216928f6ecb6 https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038
cudaEventFlags
cuda event flags
cudaFuncCache
CUDA function cache configurations
cudaGetDevicePointerFlags
get device pointer flags
cudaGraphicsRegisterFlags
CUDA Graphics register flags
cudaHostAllocFlags
host allocation flags
cudaLimit
CUDA Limits
cudaMallocArrayFlags
array allocation flags
cudaMemAttach
cuda memory attach
cudaMemcpyKind
Defines the way in which copy is done
cudaMemmoryAdvise
CUDA Memory Advise values
cudaResourceType
CUDA resource types
cudaResourceViewFormat
CUDA texture resource view formats
cudaSharedMemConfig
CUDA shared memory configuration
cudaTextureAddressMode
CUDA texture address modes
cudaTextureFilterMode
CUDA texture filter modes
cudaTextureReadMode
CUDA texture read modes
cufftCompatibility
cufftResult
cufftType
curand.curandOrdering_t
curand.curandRngType_t
curand.curandStatus_t
curand.VERSION
deviceFlags
CUDA device flags
FieldTools.FieldDeclaration.FieldDeclarationType
FieldTools.FieldTypeEnum
GL_TEXTURE_MODE
texture modes for opengl
HybridizerFlavor
Supported flavors
libraryPropertyType_t
The libraryPropertyType_t is used as a parameter to specify which property is requested when using the routine cublasGetProperty. Complete documentation is available in the NVIDIA documentation.
ResidentArrayStatus
Memory status of resident array (see IResidentArray)
StackAllocBehaviorEnum
Stack allocation behavior
VectorizerIntrinsicReturn
vectorization hint for return type