Namespace Hybridizer.Runtime.CUDAImports

[DllImport("{cuda DLL name}.dll", EntryPoint = "{EntryPointName}_ExternCWrapper_CUDA", CallingConvention = CallingConvention.Cdecl)]
private static extern int methodName(
    int gridDimX, int gridDimY,
    int blockDimX, int blockDimY, int blockDimZ,
    int shared,
    [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(CudaMarshaler))] TypeToBemarshaled param);

\end{lstlisting}

CudaMarshalerTimer

internal

cufft

cufft wrapper. Complete documentation here

curand

curand mapping. Full documentation here

DontVectorizeAttribute

do not vectorize

DoubleResidentArray

A resident array of double precision real numbers

DoubleZeroCopyResidentArray

A resident array of double precision real numbers, allocated using zero-copy. Complete documentation here

EntryPointAttribute

Entry point method called from host and executed on device

FieldTools

FloatResidentArray

A resident array of float32 elements

FloatZeroCopyResidentArray

A resident array of float32 elements, allocated using zero-copy. Zero-copy documentation here

gridDim

Static class to guide work distribution on device for the Block part : Dimension

gridDimX64

Static class to guide work distribution on device for the Block part : Dimension

HeapifyLocalsAttribute

HybMath

HybridArithmeticFunctionAttribute

Arithmetic function: no memory operations and no branches. Allows aggressive optimizations.

HybridCompletionDescriptionAttribute

HybridConstantAttribute

Defines a constant value

HybridHybOpFunctionAttribute

internal

HybridizerExtension

HybridizerExtension.CUDA

HybridizerExtension.CUDA.BlockDim

HybridizerExtension.CUDA.GridDim

HybridizerExtension.KEPLER

HybridizerExtension.KEPLER.BlockDim

HybridizerExtension.KEPLER.GridDim

HybridizerForceIgnoreAttribute

HybridizerIgnoreAttribute

Do not hybridize. Likely because the C# code contains unsupported constructs.

HybridizerNativeFieldProxyAttribute

Use to wrap a constant array:

\begin{lstlisting}[style=customcs]

public static int[] aa = { 1, 2 };
[HybridizerNativeFieldProxy("aa")]
public static int* a;

\end{lstlisting}

HybridizerTemplatizeArgumentAttribute

HybridJavaImplementationAttribute

HybridLambdaIdentifierAttribute

HVL only. Lambdas must have a unique identifier to be used in HVL.

HybridMappedJavaTypeAttribute

HybridNakedFunctionAttribute

Naked function: no memory operation. Allows optimizations.

HybridRegisterTemplateAttribute

type to specialize template concept

HybridTemplateConceptAttribute

HybridVectorizerAttribute

force vectorization of method parameter

HybRunner

This class allows calling hybridized methods without explicitly declaring native methods. Usage: int res = HybRunner.Cuda(target).Distrib(gridDimX, blockDimX).methodName(args);

HybRunnerDefaultSatelliteNameAttribute

Assembly attribute providing name of generated satellite library

ICustomMarshalledSize

size of marshalled structure

IntPtrExtension

IntResidentArray

A resident array of int32 elements

IntrinsicAttribute

Internal: base type for all intrinsic attributes. See IntrinsicFunctionAttribute, IntrinsicTypeAttribute.

IntrinsicConstantAttribute

Compile time constant

IntrinsicFunctionAttribute

Functions marked as intrinsic -- user shall provide an implementation

IntrinsicIncludeAttribute

Force include of some native header

IntrinsicIncludeCUDAAttribute

Intrinsic include, CUDA specific. See IntrinsicIncludeAttribute.

IntrinsicIncludeOMPAttribute

Intrinsic include, OMP specific. See IntrinsicIncludeAttribute.

IntrinsicIncludePhiAttribute

Intrinsic include, AVX specific. See IntrinsicIncludeAttribute.

IntrinsicPrimitiveAttribute

IntrinsicTypeAttribute

Types marked as intrinsic -- user shall provide an implementation

JavaRandom

Simple random class (host)

JavaRuntime

helper functions

KernelAttribute

A function running on the device

KernelInteropTools

KnownReturnTypeAttribute

LaunchBoundsAttribute

Launch bounds provided to a global function. Hints to the compiler to optimize register pressure. Complete documentation here

LinuxKernelInteropTools

LLVMVectorIntrinsics

MainMemoryMarshaler

Marshaler to main memory - to be used for OMP and AVX flavors

Usage example:

\begin{lstlisting}[style=customcs]

[DllImport("{DLL name}.dll", EntryPoint = "{EntryPointName}", CallingConvention = CallingConvention.Cdecl)]
private static extern int methodName(
    [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(MainMemoryMarshaler))] TypeToBemarshaled param);

\end{lstlisting}
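The `MarshalTypeRef` argument in the declaration above must name a type that implements .NET's `System.Runtime.InteropServices.ICustomMarshaler`. As a rough illustration of what such a type looks like, here is a minimal sketch for an `int[]` parameter. `IntArrayMarshalerSketch` is a hypothetical class written for this example; it is not the `MainMemoryMarshaler` implementation, which this reference does not show.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical marshaler: copies an int[] into unmanaged memory for a native call.
public sealed class IntArrayMarshalerSketch : ICustomMarshaler
{
    private static readonly IntArrayMarshalerSketch Instance = new IntArrayMarshalerSketch();

    // The runtime locates this static factory by name and signature.
    public static ICustomMarshaler GetInstance(string cookie)
    {
        return Instance;
    }

    // Managed -> native: allocate unmanaged memory and copy the elements into it.
    public IntPtr MarshalManagedToNative(object managedObj)
    {
        if (managedObj == null) return IntPtr.Zero;
        int[] data = (int[])managedObj;
        IntPtr native = Marshal.AllocHGlobal(sizeof(int) * data.Length);
        Marshal.Copy(data, 0, native, data.Length);
        return native;
    }

    // Native -> managed is not supported here: a bare pointer carries no length.
    public object MarshalNativeToManaged(IntPtr nativeData)
    {
        throw new NotSupportedException("Sketch only marshals managed to native.");
    }

    // Release the memory allocated in MarshalManagedToNative.
    public void CleanUpNativeData(IntPtr nativeData)
    {
        if (nativeData != IntPtr.Zero) Marshal.FreeHGlobal(nativeData);
    }

    // Nothing to release on the managed side.
    public void CleanUpManagedData(object managedObj) { }

    // -1 signals that the native size is not fixed.
    public int GetNativeDataSize()
    {
        return -1;
    }
}
```

A type like this would be referenced as `MarshalTypeRef = typeof(IntArrayMarshalerSketch)` in a `DllImport` signature; the runtime then calls `MarshalManagedToNative` before the native call and `CleanUpNativeData` after it.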

NativeDlls

wrapper class listing generated native dlls

NativeImportHeaderAttribute

NativeImportSymbolAttribute

LLVM as input only. Native symbol should use the provided .NET method.

nvblas

factory class

OptixShaderAttribute

Kernel is an OptiX shader; special treatment required. See https://docs.nvidia.com/gameworks/content/gameworkslibrary/optix/optix_v4_0.htm

Parallel2D

Similar to Parallel class in System.Threading.Tasks.Parallel

ReadOnlyAttribute

ResidentArrayGeneric<T>

ResidentArrayHostAttribute

ReturnTypeInferenceAttribute

hint for return type vectorization

ReturnTypeInferrenceAttribute

Obsolete. See ReturnTypeInferenceAttribute.

SafeDictionary<TKey, TValue>

SerialVectorizeAttribute

Fallback when vectorization fails. Method signature is vectorized, but implementation is serial.

SharedMemoryAttribute

internal

SingleStaticAssignmentAttribute

Variables of this type can only be assigned once (at their declaration). The developer is responsible for ensuring this is actually the case.

StackAllocBehaviorAttribute

specifies stack allocation behavior

ThreadedCudaMarshaler

Cuda marshaler, threaded

threadIdx

Static class to guide work distribution on device for the Thread part : Index (maps on vector unit index for the vectorized flavors)

threadIdxX64

Static class to guide work distribution on device for the Thread part : Index (maps on vector unit index for the vectorized flavors)
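To make the Block/Thread split concrete, the following host-only sketch emulates a one-dimensional launch in plain C#. It does not use Hybridizer at all: `GridStrideSketch` and its parameters are hypothetical stand-ins for the values that `gridDim`, `threadIdx` and their block-level counterparts would expose on device, and the grid-stride loop is assumed here as a common CUDA convention rather than something this reference prescribes.

```csharp
using System;

public static class GridStrideSketch
{
    // Emulates, on the host, gridDimX blocks of blockDimX threads walking n elements.
    // Returns how many times each element was visited.
    public static int[] CountVisits(int n, int gridDimX, int blockDimX)
    {
        int[] visits = new int[n];
        for (int blockIdxX = 0; blockIdxX < gridDimX; ++blockIdxX)
        {
            for (int threadIdxX = 0; threadIdxX < blockDimX; ++threadIdxX)
            {
                // First element handled by this (block, thread) pair.
                int start = threadIdxX + blockIdxX * blockDimX;
                // Total number of threads in the emulated launch.
                int stride = blockDimX * gridDimX;
                for (int i = start; i < n; i += stride)
                {
                    visits[i]++;
                }
            }
        }
        return visits;
    }
}
```

Each element ends up visited exactly once for any positive grid and block size, which is the property that makes this distribution safe for element-wise work.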

TypeIdAttribute

Use this attribute to customize the typeId of one type

VectorUnit

Win32KernelInteropTools

WriteOnlyAttribute

Structs

AlignedAllocation

Aligned memory allocation helper

alignedindex

An index, aligned to 32, also representing the next 32 indices, e.g. 0, 1, 2, ... 31 or 64, 65, 66, ... 95. Allows memory load/store optimization.
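The idea of an index aligned to 32 that stands for a run of 32 consecutive indices can be sketched in plain C#. The helper below is hypothetical (`AlignedIndexSketch` is not part of the library) and only illustrates the arithmetic, assuming non-negative indices.

```csharp
using System;

public static class AlignedIndexSketch
{
    public const int Width = 32;

    // Rounds an index down to the nearest multiple of 32.
    public static int AlignDown(int index)
    {
        return index & ~(Width - 1);
    }

    // Expands an aligned base index into the 32 indices it represents.
    public static int[] Lanes(int alignedBase)
    {
        if (alignedBase % Width != 0)
            throw new ArgumentException("Base index must be a multiple of 32.");
        int[] lanes = new int[Width];
        for (int k = 0; k < Width; ++k)
        {
            lanes[k] = alignedBase + k;
        }
        return lanes;
    }
}
```

For instance, `AlignDown(70)` gives 64, and `Lanes(64)` covers 64 through 95, matching the second run in the description above.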

alignedstorage_double

A DoubleResidentArray with an underlying pointer aligned to 32. Allows memory load/store optimization.

alignedstorage_double_zerocopy

An alignedstorage_double using zero-copy

alignedstorage_float

A FloatResidentArray with underlying memory aligned to 32

alignedstorage_int

An IntResidentArray with memory aligned to 32

AtomicExpr

bool2

2 booleans

bool4

4 booleans

bool8

8 booleans

char2

two signed bytes

char4

four signed bytes

char8

8 signed bytes

coalesced_group

A group representing the current set of converged threads in a warp. The size of the group is not guaranteed and it may return a group of only one thread (itself). This group exposes warp-synchronous builtins.

Coalesced<T>

internal

cooperative_groups

global functions from cooperative_groups.h header

cublasHandle_t

Opaque pointer holding the cuBLAS library context. Complete documentation is available in the Nvidia documentation.

cuComplex

complex single-precision

cuda.cudaDeviceProp_100

cudaArray_t

CUDA Array

cudaChannelFormatDesc

CUDA Channel format descriptor

cudaDeviceProp

CUDA device properties. Complete documentation here

cudaEvent_t

CUDA event types

cudaExtent

CUDA extent

cudaFuncAttributes

CUDA function attributes

cudaIpcEventHandle_t

CUDA IPC event handle

cudaIpcMemHandle_t

CUDA IPC memory handle

cudaMemcpy3DParms

CUDA 3D memory copying parameters

cudaMemcpy3DPeerParms

CUDA 3D cross-device memory copying parameters

cudaMipmappedArray_const_t

CUDA mipmapped array (as source argument)

cudaMipmappedArray_t

CUDA mipmapped array

cudaPitchedPtr

CUDA Pitched memory pointer

cudaPos

CUDA 3D position

cudaResourceDesc

CUDA resource descriptor

cudaResourceViewDesc

CUDA resource view descriptor

CudaRuntimeProperties

cudaStream_t

CUDA stream

cudaStreamCallback_t

Type of stream callback functions.

cudaSurfaceObject_t

An opaque value that represents a CUDA Surface object

cudaTextureDesc

CUDA texture descriptor

cudaTextureObject_t

An opaque value that represents a CUDA texture object

cuDoubleComplex

complex double-precision

cufftHandle

Inner structure to carry handle.

curand.curandGenerator_t

dim3

dimension structure.

double2

2 64-bit floating point elements, packed

double4

4 64-bit floating point elements

FieldTools.FieldDeclaration

float2

2 32-bit floats, packed

float3

3 32-bit floating point elements, packed

float4

4 32-bit floats

float8

8 32-bit floats
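As an illustration of what a packed layout means for these vector types, here is a small plain C# sketch. `Float4Sketch` is a hypothetical struct defined only for this example; it is not the library's `float4`, whose exact declaration this reference does not show.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a vector of four 32-bit floats laid out back to back.
[StructLayout(LayoutKind.Sequential)]
public struct Float4Sketch
{
    public float x;
    public float y;
    public float z;
    public float w;
}

public static class PackedLayoutSketch
{
    // Unmanaged size of the struct in bytes.
    public static int SizeInBytes()
    {
        return Marshal.SizeOf(typeof(Float4Sketch));
    }

    // Byte offset of a named field within the unmanaged layout.
    public static int OffsetOf(string fieldName)
    {
        return Marshal.OffsetOf(typeof(Float4Sketch), fieldName).ToInt32();
    }
}
```

With four 4-byte fields and no padding, `SizeInBytes()` returns 16 and `OffsetOf("w")` returns 12, so an array of such structs can be copied to native memory as one contiguous block.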

grid_group

half

Half (16 bits) precision floating point type

half2

half8

HandleDelegate

Handle for delegate serialized as structs

HybridizerProperties

Hybridizer properties at runtime

int2

2 32-bit integers

int4

4 integers, packed

int8

8 32-bit integers

long2

2 64-bit integers

NativeArrayIndexer<T>

SharedMemoryAllocator<T>

structure to allocate shared memory in device code

short2

2 16-bit integers

short4

4 16-bit integers

short8

8 16-bit integers

size_t

The size_t type has a different storage size (in bits) depending on the architecture.
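On the .NET side, the architecture dependence can be observed through `IntPtr.Size`. The sketch below assumes that the native `size_t` has the same width as a native pointer, which holds on the common 32-bit and 64-bit platforms but is an assumption, not something this reference states; `SizeTSketch` is a hypothetical helper.

```csharp
using System;

public static class SizeTSketch
{
    // Width in bits of a native pointer in the current process
    // (assumed here to match the width of size_t).
    public static int PointerBits()
    {
        return IntPtr.Size * 8;
    }
}
```

`PointerBits()` yields 32 in a 32-bit process and 64 in a 64-bit process.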

StackArray<T>

An array on stack

thread_block

Every GPU kernel is executed by a grid of thread blocks, and threads within each block are guaranteed to reside on the same streaming multiprocessor. A thread_block represents a thread block whose dimensions are not known until runtime.

thread_block_tile_1

thread_block_tile_16

thread_block_tile_2

thread_block_tile_32

thread_block_tile_4

thread_block_tile_8

thread_group

A handle to a group of threads. The handle is only accessible to members of the group it represents.

uchar4

four unsigned bytes

VectorizerMask

INTERNAL TYPE

Interfaces

cuda.ICuda

interface wrapping all cuda versions

cuda.ICudaMarshalling

ICuda simplified for marshalling only

curand.ICurand

ICustomMarshalled

custom marshaler

IHybCustomMarshaler

custom marshaler

IKernelInteropTools

INVBLAS

Complete documentation is available in the Nvidia documentation.

IResidentArray

Resident array -- user must manually control memory location

IResidentData

Resident data interface

Enums

ConstantLocation

Constant memory location. See HybridConstantAttribute.

cublasAtomicsMode_t

Indicates whether cuBLAS routines which have an alternate implementation using atomics can be used. Complete documentation is available in the Nvidia documentation.

cublasDiagType_t

Indicates whether the main diagonal of the dense matrix is unity and consequently should not be touched or modified by the function. Complete documentation is available in the Nvidia documentation.

cublasFillMode_t

The type indicates which part (lower or upper) of the dense matrix was filled and consequently should be used by the function. Complete documentation is available in the Nvidia documentation.

cublasGemmAlgo_t

An enumerant to specify the algorithm for matrix-matrix multiplication. Complete documentation is available in the Nvidia documentation.

cublasMath_t

The cublasMath_t enumerated type is used in cublasSetMathMode to choose whether or not to use Tensor Core operations in the library by setting the math mode to either CUBLAS_TENSOR_OP_MATH or CUBLAS_DEFAULT_MATH. Complete documentation is available in the Nvidia documentation.

cuda error codes https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gf599e5b8b829ce7db0f5216928f6ecb6 https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038