
Traversing a Tree using the root pointer



preston
05-29-2010, 09:31 AM
hi

I would like to create a kernel that looks like this:

__kernel void Traverse(__global const TreeRoot *root)
{
// perform some operations...
}

Now, I need to copy the whole tree (obviously) to the device. But how
can I get back a pointer to device memory when creating buffers,
so that I can assemble the tree on the device? I am sure this can be
done in CUDA, but so far I have the feeling that it's not possible
with OpenCL.

thanks

david.garcia
05-31-2010, 09:50 AM
In OpenCL you don't want to store pointers to device memory inside a buffer; you'll run into portability problems very easily. Store byte offsets relative to the buffer's base address instead.
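
A rough sketch of the offset idea, in plain host C. All names here (FlatNode, flat_at, flat_append, flat_sum) are made up for illustration and are not part of OpenCL. Nodes live in one contiguous block and refer to each other by byte offset from the start of that block, so the block can be copied anywhere, including into a cl_mem buffer, without any fix-ups. The traversal uses an explicit stack because OpenCL C does not allow recursion; a kernel version would additionally need __global qualifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical node layout: children are byte offsets from the start of the
   block, never pointers. The root sits at offset 0, so 0 doubles as "no child". */
typedef struct {
    uint32_t left;   /* byte offset of the left child, 0 = none  */
    uint32_t right;  /* byte offset of the right child, 0 = none */
    int32_t  value;
} FlatNode;

/* Turn a byte offset back into a node pointer (valid only for this copy of the block). */
FlatNode *flat_at(void *base, uint32_t off)
{
    return (FlatNode *)((unsigned char *)base + off);
}

/* Append a childless node at the end of the used region and return its byte offset. */
uint32_t flat_append(void *base, size_t cap_bytes, size_t *used, int32_t value)
{
    assert(*used + sizeof(FlatNode) <= cap_bytes);
    uint32_t off = (uint32_t)*used;
    FlatNode *n = flat_at(base, off);
    n->left = 0;
    n->right = 0;
    n->value = value;
    *used += sizeof(FlatNode);
    return off;
}

/* Sum every value reachable from the root at offset 0, using a small fixed-size stack. */
int32_t flat_sum(void *base)
{
    uint32_t stack[32];
    int top = 0;
    int32_t sum = 0;
    stack[top++] = 0;
    while (top > 0) {
        FlatNode *n = flat_at(base, stack[--top]);
        sum += n->value;
        if (n->left)  { assert(top < 32); stack[top++] = n->left;  }
        if (n->right) { assert(top < 32); stack[top++] = n->right; }
    }
    return sum;
}
```

Because nothing in the block depends on where it lives, a byte-for-byte copy (which is what a buffer write amounts to) can be traversed with the same code.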

ibbles
05-31-2010, 10:55 PM
> In OpenCL you don't want to store pointers to device memory inside a buffer; you'll run into portability problems very easily. Store byte offsets relative to the buffer's base address instead.

Storing pointers sounds like a rather harmless thing to do. What kind of portability problems are you referring to?

preston
06-01-2010, 12:46 AM
Portability problems? With pointers? Could this be due to potentially different pointer sizes
on some architectures (not being true 64-bit)?

In my opinion this is quite essential. Since OpenCL is a C-like language, we need to have
this! Why? Simply because it's not always nice or intuitive to convert every structure that
we need inside a CL program into a compact block of memory and then use offsets to access it.

So, is it possible? I haven't yet found any command to get a pointer from a CL buffer.
Perhaps I am blind and someone already knows the way... ;)

greetings

preston
06-01-2010, 12:53 AM
Also, another reason to have this is that it would then be possible to pass just
a few structs as CL kernel arguments that carry pointers to device memory.

Imagine, for example, that I have a simple class in C++ that uses N different
memory allocations on the heap.

class foo
{
public:
    void *memory1;
    void *memory2;
    ...
    void *memoryN;
};

Now, I want to copy that foo class into device memory using an analogous CL struct,
and pass that as a kernel argument. How can I do that? So far, without CL buffer
pointers being available, I would need to pass all of the memory1, 2, ..., N pointers
explicitly as kernel arguments, which is a very tedious job.
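
One possible way around the N separate arguments, sketched here in host C under the assumption that all N allocations can be copied into a single buffer: put a small header at the front that records where each part starts. FooHeader, FOO_PARTS, align16 and foo_pack are hypothetical names invented for this sketch, and the 16-byte alignment is just a conservative guess. The resulting blob could then be uploaded as one buffer and passed as one kernel argument, with the kernel reading the header to find each part.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { FOO_PARTS = 3 };  /* stands in for "N" in this sketch */

/* Hypothetical header stored at the very start of the single blob. */
typedef struct {
    uint32_t offset[FOO_PARTS];  /* byte offset of each part from the blob start */
    uint32_t size[FOO_PARTS];    /* byte size of each part */
} FooHeader;

/* Round n up to the next multiple of 16 so every part starts aligned. */
size_t align16(size_t n)
{
    return (n + 15u) & ~(size_t)15u;
}

/* Copy all parts into one freshly allocated blob: header first, then each part.
   Returns NULL on allocation failure; the caller frees the result. */
void *foo_pack(const void *parts[], const size_t sizes[], size_t *total)
{
    assert(parts && sizes && total);
    FooHeader h;
    size_t cursor = align16(sizeof h);
    for (int i = 0; i < FOO_PARTS; ++i) {
        h.offset[i] = (uint32_t)cursor;
        h.size[i] = (uint32_t)sizes[i];
        cursor = align16(cursor + sizes[i]);
    }
    unsigned char *blob = calloc(1, cursor);
    if (!blob)
        return NULL;
    memcpy(blob, &h, sizeof h);
    for (int i = 0; i < FOO_PARTS; ++i)
        memcpy(blob + h.offset[i], parts[i], sizes[i]);
    *total = cursor;
    return blob;
}
```

The trade-off is an extra copy on the host, in exchange for a layout that contains no addresses at all.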

david.garcia
06-01-2010, 05:40 AM
> Storing pointers sounds like a rather harmless thing to do. What kind of portability problems are you referring to?

Perhaps I misunderstood the OP, but it seemed to me like he wanted to use the host to store pointers inside a buffer object. Obviously those pointers will be in the host address space and won't work when you run a kernel on the device. Many devices have an address space that is separate from the host address space, making the pointers that he stored in the buffer equivalent to garbage.

I'll put it in a different way: try implementing this on a GPU device and let us know whether it works.

As I said, the only portable solution I can think of is storing offsets relative to the base address of the buffer object. I understand it's inconvenient, but it's not rocket science either. You can create some macros to make it easier.
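
For what it's worth, such macros might look roughly like this. The names (GLOBAL_MEM, OFF_NULL, OFF_TO_PTR, PTR_TO_OFF, ListNode, list_length) are invented for this sketch. GLOBAL_MEM expands to __global when compiled as OpenCL C and to nothing on the host, on the assumption that the macros are shared between host and kernel source; the includes and the assert are host-only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* __OPENCL_VERSION__ is predefined when compiling OpenCL C; on the host the
   address-space qualifier simply disappears. */
#ifdef __OPENCL_VERSION__
#define GLOBAL_MEM __global
#else
#define GLOBAL_MEM
#endif

#define OFF_NULL 0xFFFFFFFFu  /* sentinel meaning "points at nothing" */

/* Byte offset -> typed pointer into the buffer that starts at base. */
#define OFF_TO_PTR(type, base, off) \
    ((GLOBAL_MEM type *)((GLOBAL_MEM unsigned char *)(base) + (off)))

/* Pointer into the buffer -> byte offset from base. */
#define PTR_TO_OFF(base, ptr) \
    ((uint32_t)((GLOBAL_MEM unsigned char *)(ptr) - (GLOBAL_MEM unsigned char *)(base)))

/* Hypothetical linked node that stores an offset where a pointer would normally go. */
typedef struct {
    uint32_t next;     /* byte offset of the next node, OFF_NULL at the end */
    float    payload;
} ListNode;

/* Count the nodes reachable from head by following offsets. */
size_t list_length(GLOBAL_MEM void *base, uint32_t head)
{
    size_t n = 0;
    uint32_t off = head;
    while (off != OFF_NULL) {
        assert(off % sizeof(ListNode) == 0);  /* sketch-level sanity check */
        off = OFF_TO_PTR(ListNode, base, off)->next;
        ++n;
    }
    return n;
}
```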

ibbles
06-01-2010, 12:05 PM
I agree that the OP is a bit unclear on the details behind a TreeRoot*.

What I would like to do is something like the following:

struct SparseMatrix {
    cl_float *data;
    cl_uint *indices;
};

cl_mem data = clCreateBuffer(...);
cl_mem indices = clCreateBuffer(...);

SparseMatrix matrix;
matrix.data = clDevicePointer(data);
matrix.indices = clDevicePointer(indices);
clSetKernelArg(..., sizeof(SparseMatrix), (void*)&matrix);

preston
06-01-2010, 12:18 PM
Oh my, I was looking exactly for that (clDevicePointer) :oops:

Muchas Gracias :)

ibbles
06-01-2010, 11:20 PM
I should probably clarify that clDevicePointer does not currently exist and was used here only to demonstrate the kind of operation we are discussing.

preston
06-02-2010, 12:19 AM
Nooooooooooooooooooo :(

oseiskar
07-20-2010, 04:19 AM
You can implement a "clDevicePointer" as an OpenCL kernel:

__kernel void getPtr( __global void *ptr, __global void * __global *out )
{
    *out = ptr;
}

which can be invoked as follows:



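...

/* One pointer-sized slot for the kernel to write the address into.
   Note that sizeof(void*) is the host's pointer size, which may differ from the device's. */
cl_mem auxBuf = clCreateBuffer( context, CL_MEM_READ_WRITE, sizeof(void*), NULL, NULL );
void *gpuPtr = NULL;

clSetKernelArg( getterKernel, 0, sizeof(cl_mem), &myBuf );
clSetKernelArg( getterKernel, 1, sizeof(cl_mem), &auxBuf );
clEnqueueTask( commandQueue, getterKernel, 0, NULL, NULL );
/* Blocking read: gpuPtr has been filled in by the time this call returns. */
clEnqueueReadBuffer( commandQueue, auxBuf, CL_TRUE, 0, sizeof(void*), &gpuPtr, 0, NULL, NULL );

clReleaseMemObject(auxBuf);

...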
Now "gpuPtr" should contain the address of the beginning of "myBuf" in GPU memory space. Do the necessary modifications if you're not using a 32-bit matching-endianness architecture...
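
As a sketch of what those modifications could involve: read the slot back into a plain byte array instead of a void*, then decode it using the device's reported address width and byte order. Both can be queried with clGetDeviceInfo (CL_DEVICE_ADDRESS_BITS and CL_DEVICE_ENDIAN_LITTLE). The function name decode_device_address is made up for this example. This only deals with pointer size and endianness; it does not make the trick itself any more portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a device address that was read back as raw bytes.
   address_bits:  the value reported for CL_DEVICE_ADDRESS_BITS (32 or 64)
   little_endian: non-zero if CL_DEVICE_ENDIAN_LITTLE is CL_TRUE
   The result is widened to 64 bits so it fits regardless of the host pointer size. */
uint64_t decode_device_address(const unsigned char *raw,
                               unsigned address_bits,
                               int little_endian)
{
    assert(address_bits == 32 || address_bits == 64);
    size_t n = address_bits / 8;
    uint64_t value = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Walk the bytes from most significant to least significant. */
        size_t idx = little_endian ? (n - 1 - i) : i;
        value = (value << 8) | raw[idx];
    }
    return value;
}
```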

david.garcia
07-20-2010, 05:38 PM
> Now "gpuPtr" should contain the address of the beginning of "myBuf" in GPU memory space. Do the necessary modifications if you're not using a 32-bit matching-endianness architecture...

There's no guarantee that this sort of code will be portable. Chances are it will break in some implementations.

thorne
02-07-2011, 09:57 AM
Getting back a pointer to device memory when creating buffers, so that you can assemble the tree on the device,
will consume a lot of memory and may thus slow the system down. This is because CUDA is not supported by most operating systems. So I would rather advise looking for some other software to solve this problem; the recently launched Symbian OS may also help with it.

andrew.brownsword
02-11-2011, 11:06 AM
You really don't want to use pointers. Not only are they potentially different sizes between devices (and host), but they are also potentially different sizes between address spaces (global, local, constant, private). And, even worse, buffers move.

I recommend using indices. These can be byte indices, or they can be indices into arrays of structs (or other types). You can control their size (8, 16, 32 or 64 bit). You can range check them easily. They can be used to index into parallel arrays (e.g. if you decide to split structs into parallel arrays of sub-elements for optimization reasons). They remain valid if you transfer your data between address spaces (e.g. if you copy from global to local for performance reasons). They are valid across all devices and the host. They can be saved to disk and reloaded, or sent across networks. They are easily serialized and deserialized. And they remain valid no matter where the system copies your buffer.

You can still use pointers in your code, if need be... converting from an index to a pointer or back again is simple. Just try to avoid storing pointers in your data structures.
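
To make the index approach concrete, here is a small sketch in host C; the names (NO_CHILD, bst_contains, key_ptr, key_index) are invented for illustration. A binary search tree is stored as three parallel arrays, with 16-bit indices where child pointers would normally go. Every index is range checked before use, and the last two helpers show the index-to-pointer conversion in both directions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NO_CHILD = 0xFFFF };  /* sentinel index meaning "no such node" */

/* Search a tree stored as parallel arrays left[], right[] and key[], each with
   count entries. Returns 1 if wanted is found, 0 if not, and -1 if an index
   falls outside the arrays (corrupt data). */
int bst_contains(const uint16_t *left, const uint16_t *right, const int32_t *key,
                 uint16_t count, uint16_t root, int32_t wanted)
{
    uint16_t i = root;
    /* A valid tree needs at most count steps, so the bound also guards against cycles. */
    for (uint16_t steps = 0; steps < count && i != NO_CHILD; ++steps) {
        if (i >= count)
            return -1;                      /* range check */
        if (key[i] == wanted)
            return 1;
        i = (wanted < key[i]) ? left[i] : right[i];
    }
    return 0;
}

/* Index -> pointer, for code that wants a pointer temporarily. */
const int32_t *key_ptr(const int32_t *key, uint16_t i)
{
    return &key[i];
}

/* Pointer -> index, for storing the reference back into a data structure. */
uint16_t key_index(const int32_t *key, const int32_t *p)
{
    assert(p >= key);
    return (uint16_t)(p - key);
}
```

The same three arrays could be uploaded as three buffers or packed into one; the indices stay valid either way.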