Cache miss in kernel
Should I consider the caches of a single core?
The input data is two 3D matrices, each containing 16x256x16 elements.
When the core accesses the data, it does so slowly.
So I guess I am causing a lot of cache misses.
Where can I find information about the size of the L1 and L2 caches of a graphics card?
I'm using NVIDIA's GeForce 9400 GT: http://www.geforce.com/hardware/desk...specifications
The spec does not contain this information.
The GeForce 9400 GT is a compute capability 1.0 device (see here: https://developer.nvidia.com/cuda-gpus)
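Look at the CUDA programming guide, Appendix G.3, for an explanation of the Compute Capability 1.x architecture and how it accesses memory (a global memory request for a warp is split into two requests, one per half-warp). As far as I know, compute capability 1.x devices do not cache global memory at all (only constant and texture memory are cached), so the slowdown is more likely caused by uncoalesced accesses than by cache misses. http://docs.nvidia.com/cuda/cuda-c-p...capability-1-x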
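
To make the half-warp rule more concrete, here is a rough host-side C++ model (plain C++, not device code) of how many memory transactions one half-warp needs on a compute capability 1.0/1.1 device. It is only a sketch under several assumptions: the elements are 4-byte values such as float, the array is stored with x as the fastest-varying dimension, the base pointer is 64-byte aligned, and all 16 threads of the half-warp are active. The names linear_index, transactions_for_half_warp, half_warp_along_x and half_warp_along_z are made up for this illustration and are not part of any CUDA API.

```cpp
#include <cstddef>
#include <vector>

// Dimensions taken from the question: 16 x 256 x 16 elements per matrix.
constexpr int NX = 16, NY = 256, NZ = 16;
constexpr std::size_t WORD_BYTES = 4;      // assumption: 4-byte elements (e.g. float)
constexpr std::size_t SEGMENT_BYTES = 64;  // segment size for 4-byte words
constexpr int HALF_WARP = 16;

// Assumed layout: x varies fastest, then y, then z.
inline std::size_t linear_index(int x, int y, int z) {
    return (static_cast<std::size_t>(z) * NY + y) * NX + x;
}

// Simplified compute capability 1.0/1.1 rule for 4-byte words:
// one 64-byte transaction if thread k reads word k of an aligned segment,
// otherwise 16 separate transactions (one per thread).
int transactions_for_half_warp(const std::vector<std::size_t>& idx) {
    if (idx.size() != static_cast<std::size_t>(HALF_WARP)) return HALF_WARP;
    if ((idx[0] * WORD_BYTES) % SEGMENT_BYTES != 0) return HALF_WARP;
    for (int k = 0; k < HALF_WARP; ++k) {
        if (idx[k] != idx[0] + static_cast<std::size_t>(k)) return HALF_WARP;
    }
    return 1;
}

// Thread k of the half-warp reads element (k, y, z): consecutive addresses.
std::vector<std::size_t> half_warp_along_x(int y, int z) {
    std::vector<std::size_t> idx;
    for (int k = 0; k < HALF_WARP; ++k) idx.push_back(linear_index(k, y, z));
    return idx;
}

// Thread k of the half-warp reads element (x, y, k): stride of NX * NY elements.
std::vector<std::size_t> half_warp_along_z(int x, int y) {
    std::vector<std::size_t> idx;
    for (int k = 0; k < HALF_WARP; ++k) idx.push_back(linear_index(x, y, k));
    return idx;
}
```

Under this model, mapping threadIdx.x to the x dimension gives one transaction per half-warp, while mapping it to the z dimension gives 16. If the kernel indexes the matrices in the second way, rearranging the thread-to-element mapping (or the data layout) is probably the first thing to try.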