Should I take the caches of a single core into account?
The input data is two 3D matrices, each containing 16x256x16 elements.
When the core accesses the data, it does so slowly.
So I guess I am causing a lot of cache misses.
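To put a rough number on the working set, here is a minimal sketch of the footprint calculation. It assumes each element is a 4-byte float; the element type is not stated above, so `BYTES_PER_ELEMENT` is a hypothetical value, and `footprint_bytes` is just an illustrative helper.

```python
# Rough working-set estimate for the two input matrices.
# ASSUMPTION: 4-byte elements (e.g. float32); the actual element type is not stated.
DIMS = (16, 256, 16)       # shape of each 3D matrix
NUM_MATRICES = 2
BYTES_PER_ELEMENT = 4      # hypothetical element size


def footprint_bytes(dims, count, elem_size):
    """Total bytes occupied by `count` dense arrays of shape `dims`."""
    elements = 1
    for d in dims:
        elements *= d
    return elements * count * elem_size


total = footprint_bytes(DIMS, NUM_MATRICES, BYTES_PER_ELEMENT)
print(f"{total} bytes = {total // 1024} KiB")
```

Under that assumption the two inputs together take 512 KiB, so whether they fit depends on the cache sizes asked about here.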
Where can I find information about the L1 and L2 cache sizes of a graphics card?
I'm using NVIDIA's GeForce 9400 GT: http://www.geforce.com/hardware/desk...specifications
The spec does not contain this information.