CUDA Tools and Toolkit version 4.0 were installed (release candidate 2; better to start with the cutting-edge software, given that it is almost a production release), along with Nvidia's GPU SDK examples.
Linux
After installing all the required packages (libgl1-mesa-dev, libgl1-mesa-dri, libglu1-mesa-dev, freeglut3-dev, libxmu-dev, libxi-dev, etc.) and installing and reinstalling the Nvidia drivers and toolkits several times, we reached the first compilation. These are, of course, ready-made examples that ship inside the SDK. We build them and run a few tests, such as deviceQuery and bandwidthTest:
cd ~/workspace/NVIDIA_GPU_Computing_SDK
make
cd C/bin/linux/release
./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "GeForce GTX 570"
CUDA Driver Version / Runtime Version 4.0 / 4.0
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 1279 MBytes (1341325312 bytes)
(15) Multiprocessors x (32) CUDA Cores/MP: 480 CUDA Cores
GPU Clock Speed: 1.57 GHz
Memory Clock rate: 2100.00 Mhz
Memory Bus Width: 320-bit
L2 Cache Size: 655360 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 4.0, NumDevs = 1, Device = GeForce GTX 570
[./deviceQuery] test results...PASSED
./bandwidthTest
./bandwidthTest Starting...
Running on...
Device 0: GeForce GTX 570
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2961.9
Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2753.8
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 130016.2
[./bandwidthTest] test results...PASSED
Win7
On Windows 7, the installation took far less time (under two hours in total) than on Linux (about 12 hours of struggle).
deviceQuery reports ALMOST the same results. The differences are in
Run time limit on kernels: Yes
Device supports Unified Addressing (UVA): No
which we will investigate later.
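Spotting these differences by eye in a long deviceQuery listing is error-prone. A minimal sketch of how the two outputs could be compared automatically is shown here. The helper names parse_fields and differing_fields are hypothetical, and the sample text holds only three fields: the two that differ come from the outputs in this post, while the Win7 "Warp size" value is assumed to match Linux, since the rest of the output is reported as the same.

```python
def parse_fields(text):
    """Turn 'key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def differing_fields(a, b):
    """Return {key: (value_a, value_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Excerpts of the deviceQuery output; the Win7 warp size is an assumption.
linux_out = """\
Warp size: 32
Run time limit on kernels: No
Device supports Unified Addressing (UVA): Yes
"""
win7_out = """\
Warp size: 32
Run time limit on kernels: Yes
Device supports Unified Addressing (UVA): No
"""

diff = differing_fields(parse_fields(linux_out), parse_fields(win7_out))
for key, (lin, win) in diff.items():
    print(f"{key}: Linux={lin}, Win7={win}")
```

In practice the two strings would be replaced by the full output saved from each system; lines without a colon (such as the driver version line) are simply ignored by this sketch.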
bandwidthTest, on the other hand, reports:
Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2521.2
Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2551.9
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 130377.2
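which shows clearly lower host-device transfer rates than on Linux (roughly 15% lower host to device and 7% lower device to host), while device-to-device bandwidth is essentially unchanged.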
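To put numbers on the gap, the relative change between the two runs can be computed directly from the figures above. This is only a quick sketch: the dictionary keys and the relative_change helper are names made up for illustration, and a single quick-mode run per system is too little data to say anything about run-to-run variance.

```python
# Bandwidth figures (MB/s) copied from the two bandwidthTest runs in this post.
linux = {
    "host_to_device": 2961.9,
    "device_to_host": 2753.8,
    "device_to_device": 130016.2,
}
win7 = {
    "host_to_device": 2521.2,
    "device_to_host": 2551.9,
    "device_to_device": 130377.2,
}


def relative_change(reference, measured):
    """Percentage change of `measured` with respect to `reference`."""
    return (measured - reference) / reference * 100.0


# Negative values mean Win7 was slower than Linux for that transfer type.
changes = {name: relative_change(linux[name], win7[name]) for name in linux}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}% on Win7 relative to Linux")
```

With these inputs the host-to-device rate comes out about 14.9% lower on Win7, device-to-host about 7.3% lower, and device-to-device about 0.3% higher, so the gap is confined to transfers that cross the PCI bus.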