openCL on clusters



niksighania
04-11-2011, 03:11 AM
Hi everybody,
I have been trying to run an OpenCL program on a cluster of CPUs. The system I configured (the head node) runs 64-bit SUSE. The installation succeeded and the program compiles without errors, but running it always fails with a "floating point exception" error.
I have no clue what the problem is. Please suggest where it might be. Any help would be very much appreciated.
Thanks

david.garcia
04-11-2011, 04:43 AM
What implementation of OpenCL are you using? Is it AMD's SDK? When do you get that "floating point exception" error? Can you use a debugger to find out which function is raising that exception? It sounds like a division by zero or similar.
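
Despite its name, the "Floating point exception" message on Linux is just how the SIGFPE signal is reported, and on x86 it is most often raised by integer division (or modulo) by zero rather than by floating-point arithmetic. Under the default IEEE 754 settings, dividing a double by 0.0 quietly produces infinity instead of a signal. A minimal C sketch, assuming a default (non-trapping) floating-point environment; `fp_divide` and `is_infinite` are hypothetical helper names, not part of any SDK:

```c
#include <float.h>
#include <stdbool.h>

/* With default IEEE 754 settings, a finite nonzero double divided by 0.0
   yields +/-infinity; no SIGFPE is delivered unless FP traps are enabled. */
double fp_divide(double num, double den)
{
    return num / den;
}

/* True if x is +infinity or -infinity (checked without linking libm). */
bool is_infinite(double x)
{
    return x > DBL_MAX || x < -DBL_MAX;
}
```

So when gdb reports SIGFPE, the faulting instruction is usually an integer divide somewhere in the reported function, not a floating-point operation.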

niksighania
04-13-2011, 02:18 AM
I ran the GNU debugger (gdb) and got this result:

niksinghania@c5pc00:~/str_fac_parallel> gdb a.out
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...
Using host libthread_db library "/lib64/libthread_db.so.1".
(gdb)

What does it mean? Did I run the debugger correctly?
Yes, it is AMD's SDK. The error occurs at runtime when running any OpenCL program; there are no errors at compile time.

david.garcia
04-13-2011, 04:44 AM
It's a very good idea to become familiar with C and the development tools available on your system before trying to use OpenCL. Many consider learning GDB a must if you will be developing software under Linux.

niksighania
04-18-2011, 02:44 AM
Thanks for the quick response. I will definitely look more into the GNU debugger, but right now I need a solution quickly, and it would be very helpful if you could give me some direction as to what might be going wrong.

I tried to compile the sample code of the AMD Stream SDK. The samples compiled perfectly, but when I go to the bin directory and try to run any of the executables, I again get an error message saying "Floating Point Exception".

I should note that I am running these programs on my institute's cluster, where I definitely do not have administrator privileges; I am just an ordinary user. Do you think that might be the problem?

Thanks in advance.

niksighania
04-19-2011, 01:36 AM
niksinghania@c5pc00:~> gdb ./a.out
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...
Using host libthread_db library "/lib64/libthread_db.so.1".
(gdb) run
Starting program: /misc/home/niksinghania/a.out

Program received signal SIGFPE, Arithmetic exception.
0x00002b646c04a68f in do_lookup_x () from /lib64/ld-linux-x86-64.so.2
(gdb)

This is the output from the debugger. What does "Arithmetic exception" mean here? The same program runs successfully on my laptop.

Gauge
04-21-2011, 07:50 AM
Things that might cause an FPE

Division by zero: any time you divide, EVER, even if you know for a fact the denominator will not be zero, put an if statement in to check it.
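The "check before you divide" advice can be sketched in C as a small helper. This is an illustration, not code from the thread or from the AMD SDK; `checked_div` is a hypothetical name. Besides a zero divisor, it also rejects `INT_MIN / -1`, whose result does not fit in an `int` and which also raises SIGFPE on x86:

```c
#include <limits.h>
#include <stdbool.h>

/* Divide num by den only when the operation is well defined.
   Returns false (leaving *out untouched) when den is zero or when
   INT_MIN / -1 would overflow; both cases can raise SIGFPE on x86. */
bool checked_div(int num, int den, int *out)
{
    if (den == 0)
        return false;              /* division by zero */
    if (num == INT_MIN && den == -1)
        return false;              /* quotient would exceed INT_MAX */
    *out = num / den;              /* C99: truncates toward zero */
    return true;
}
```

Returning a status flag instead of a sentinel value keeps every possible quotient representable and forces the caller to decide what a failed division should mean.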

Overflow can also cause this, for example reading a value into a variable that is too small to hold it. If you get a floating point error, there is definitely a problem in the code, not in OpenCL. To my knowledge OpenCL will not throw a SIGFPE at your OS; it will just crash outright. So if you do get a SIGFPE, it is most likely in your C/C++ code. But I'm just a student, so I could be wrong.
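Narrowing a value into a smaller type does not raise SIGFPE by itself on common platforms, but the truncated result can later become a zero divisor. With GCC, for example, converting an out-of-range value to a signed type reduces it modulo 2^N, so `(int)4294967296LL` becomes 0. A minimal C sketch of a range-checked narrowing, assuming a `long long` source and an `int` destination; `checked_narrow` is a hypothetical helper:

```c
#include <limits.h>
#include <stdbool.h>

/* Store a long long into an int only if it fits; otherwise report failure
   instead of letting the conversion silently truncate the value. */
bool checked_narrow(long long value, int *out)
{
    if (value < INT_MIN || value > INT_MAX)
        return false;              /* would not fit in an int */
    *out = (int)value;             /* safe: value is in range */
    return true;
}
```

The same pattern applies to other destination types by swapping in the matching limits (for example `SHRT_MIN`/`SHRT_MAX` for `short`).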