Hands-On GPU-Accelerated Computer Vision with OpenCV and CUDA

Memory-related properties

Memory on the GPU has a hierarchical architecture: L1 cache, L2 cache, global memory, texture memory, and shared memory. The cudaDeviceProp structure provides many properties that help in identifying the memory available on the device. memoryClockRate and memoryBusWidth give the peak memory clock rate (in kilohertz) and the memory bus width (in bits), respectively. Memory speed matters because it affects the overall speed of your program. totalGlobalMem gives the size of the global memory available on the device, in bytes. totalConstMem gives the total constant memory available on the device. sharedMemPerBlock gives the maximum amount of shared memory that a single block can use. The number of 32-bit registers available per block is given by regsPerBlock. The size of the L2 cache is given by the l2CacheSize property. The following code snippet shows how to use memory-related properties in a CUDA program:

// device_Property is assumed to be a cudaDeviceProp already filled in by cudaGetDeviceProperties().
printf(" Total amount of global memory: %.0f MBytes (%llu bytes)\n",
       (float)device_Property.totalGlobalMem / 1048576.0f, (unsigned long long)device_Property.totalGlobalMem);
// memoryClockRate is reported in kHz; scale it to MHz.
printf(" Memory Clock rate: %.0f MHz\n", device_Property.memoryClockRate * 1e-3f);
printf(" Memory Bus Width: %d-bit\n", device_Property.memoryBusWidth);
// Print the L2 cache size only when it is nonzero.
if (device_Property.l2CacheSize)
{
    printf(" L2 Cache Size: %d bytes\n", device_Property.l2CacheSize);
}
// totalConstMem and sharedMemPerBlock are of type size_t, so they are printed with %zu.
printf(" Total amount of constant memory: %zu bytes\n", device_Property.totalConstMem);
printf(" Total amount of shared memory per block: %zu bytes\n", device_Property.sharedMemPerBlock);
printf(" Total number of registers available per block: %d\n", device_Property.regsPerBlock);
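
Since memory speed affects overall program speed, it can help to combine memoryClockRate and memoryBusWidth into a single estimate of theoretical peak memory bandwidth. The following is a minimal host-side sketch, not part of the original listing. The function name peakMemoryBandwidthGBps is hypothetical, and the factor of 2 assumes double-data-rate memory, which may not hold for every device.

```cpp
// Sketch: estimate theoretical peak memory bandwidth from two values that
// cudaDeviceProp reports. The function name and the DDR factor are assumptions
// made for illustration; they do not come from the original text.

// memoryClockKHz : value of memoryClockRate (kilohertz)
// busWidthBits   : value of memoryBusWidth (bits)
// Returns gigabytes per second (1 GB = 1e9 bytes), assuming the memory
// transfers data twice per clock cycle (double data rate).
double peakMemoryBandwidthGBps(int memoryClockKHz, int busWidthBits)
{
    const double clockHz = memoryClockKHz * 1000.0;      // kHz -> Hz
    const double bytesPerTransfer = busWidthBits / 8.0;  // bits -> bytes
    const double transfersPerClock = 2.0;                // assumed DDR
    return clockHz * transfersPerClock * bytesPerTransfer / 1e9;
}
```

For a queried device, the call would be peakMemoryBandwidthGBps(device_Property.memoryClockRate, device_Property.memoryBusWidth). As a worked example, a memory clock of 877,000 kHz on a 4096-bit bus gives about 898 GB/s. Measured bandwidth in real kernels is normally lower than this theoretical figure.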