How can I reduce my GPU kernel's load?



chizhan
10-25-2012, 12:37 PM
Hi. I need to reduce how much of the GPU my kernel uses. While the OpenCL kernel is running, the OS stops responding (even to Ctrl+Alt+Del) and I have to restart the computer. I'd like the usage to be not 100% but, let's say, 90%. When I run a well-known benchmark, I can see that its GPU usage stays below 100%.

I'm sorry, but I'm new to OpenCL. Where in the code can I set this limit?

notzed
10-25-2012, 02:27 PM
You just have to break your code up so it runs as more, but shorter, kernels. At this time, even when the hardware supports concurrent scheduling, the load balancing isn't very good.
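A minimal sketch of that advice, written in Python for brevity and making no actual OpenCL calls: the hypothetical helper `split_range` cuts one large 1-D index range into (offset, size) pairs, and each pair would become its own kernel launch. In the C API these map to the `global_work_offset` and `global_work_size` arguments of `clEnqueueNDRangeKernel` (non-NULL offsets need OpenCL 1.1 or later), and because `get_global_id()` already includes the offset, the kernel body itself does not need to change. The chunk size of 262,144 items is an arbitrary assumption; in practice it would be tuned until a single launch is short enough.

```python
def split_range(total, max_chunk):
    """Split the index range [0, total) into (offset, size) pairs.

    Each pair holds at most max_chunk items and corresponds to what one
    kernel launch would receive as its global work offset and size.
    """
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    chunks = []
    for offset in range(0, total, max_chunk):
        # The last chunk may be smaller than max_chunk.
        chunks.append((offset, min(max_chunk, total - offset)))
    return chunks


if __name__ == "__main__":
    for i, (offset, size) in enumerate(split_range(1_000_000, 262_144)):
        # In a C host program, this is where one clEnqueueNDRangeKernel call
        # with global_work_offset = &offset and global_work_size = &size
        # would go, instead of a single launch over all 1,000,000 items.
        print(f"launch {i}: offset={offset} size={size}")
```

Here 1,000,000 items become four launches, the last one covering the remaining 213,568 items.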

Bugs can also crash the computer.

chizhan
10-25-2012, 07:21 PM
notzed, thank you! The code is a matrix multiplication. When the matrices are not too big, up to about 1000x1000, the GPU finishes computing them before anything crashes. With larger ones, the problems begin.

I think there should be a way to avoid grabbing all of the GPU's resources. For example, in ordinary CPU programming (not OpenCL), I start a thread and assign it THREAD_PRIORITY_LOWEST.

Maybe there is some directive to not use all of the compute units (CL_DEVICE_MAX_COMPUTE_UNITS)?

notzed
10-25-2012, 09:13 PM
OpenCL 1.2 has some APIs to partition the device. See section 4.3 of the OpenCL specification.

chizhan
10-25-2012, 11:43 PM
Thanks, but it doesn't support GPUs (CPU only):
http://devgurus.amd.com/thread/159523

chippies
10-28-2012, 04:02 AM
Since you are doing matrix multiplication, the kernel can be broken down into smaller parts. You could take the first hundred rows of your left matrix and multiply them by the first hundred columns of your right matrix, giving the first 100x100 block of your output matrix.
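That block decomposition can be sketched on the CPU as follows (Python, no OpenCL calls; `matmul_block`, `tiled_matmul` and the default block size of 100 are illustrative assumptions). Each call to `matmul_block` stands for one short kernel launch that fills a single output tile, so the tiles are independent and nothing has to be accumulated across launches.

```python
def matmul_block(a, b, c, row0, col0, rows, cols):
    """Compute one rows x cols tile of c = a * b starting at (row0, col0).

    Stands in for one short kernel launch: it reads only rows
    row0..row0+rows-1 of a and columns col0..col0+cols-1 of b.
    """
    inner = len(b)
    for i in range(row0, row0 + rows):
        for j in range(col0, col0 + cols):
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc


def tiled_matmul(a, b, block=100):
    """Multiply a (n x m) by b (m x p) as a grid of block x block output tiles."""
    n, p = len(a), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for row0 in range(0, n, block):
        for col0 in range(0, p, block):
            # Edge tiles may be smaller than block x block.
            rows = min(block, n - row0)
            cols = min(block, p - col0)
            # In OpenCL this would be one 2-D clEnqueueNDRangeKernel call with
            # global_work_offset {row0, col0} and global_work_size {rows, cols},
            # assuming the kernel uses dimension 0 for rows.
            matmul_block(a, b, c, row0, col0, rows, cols)
    return c
```

For example, `tiled_matmul(a, b, block=2)` on two 3x3 matrices performs four launches (tiles of 2x2, 2x1, 1x2 and 1x1) and returns the same product as one full multiplication. Each tile still loops over the full inner dimension, so the time per launch grows with the width of the left matrix; for very wide matrices the block size may need to shrink as well.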

Currently, there aren't any other methods supported by all vendors.