Type: Posts; User: bharath.s.jois

Page 1 of 2 1 2

  1. Re: Regarding async_work_group_copy(global to local)

    Regarding barriers, does the cost of a barrier depend on the workgroup size? That is, do fewer threads per workgroup mean less time spent in the barrier?

    -- Bharath
  2. Re: Regarding async_work_group_copy(global to local)

    So that would mean we need enough work on the shared data to amortize the prefetch+sync time. Still, I see that the CUDA implementation does better here. But I guess it depends on the...
  3. Re: Regarding async_work_group_copy(global to local)

    Although I didn't get a response from the Nvidia forums, I figured out that there is a way to get some information while inside the kernel using a profiler trigger, but I don't think I will be able...
  4. Re: Regarding async_work_group_copy(global to local)

    Although it's a good idea to put val and wgt into the __constant address space, I am not sure if it helps with performance. I have come across an Nvidia article (or post) which says that the...
  5. Re: Regarding async_work_group_copy(global to local)

    To add to what I have said, I see that the number of branches and of divergent branches has increased in the shared memory implementation. Do async_work_group_copy or wait_group_events contribute to the...
  6. Re: Regarding async_work_group_copy(global to local)

    Well, I am back with a few more questions.

    Previously, all the threads in a workgroup used a single value from the val[i] and wgt[i].

    Currently, I am working on a variant of the Knapsack...
  7. Re: Arrays to the kernel

    Replies: 3; Views: 1,275

    The constant memory here... Is it in any way faster than the global memory? Or does it depend on the GPU/vendor?

    From the specification [6.5.3 __constant (or constant)], the constant memory is said...
  8. Arrays to the kernel

    Replies: 3; Views: 1,275

    I just went through section 5.7.2, Setting Kernel Arguments, of the OpenCL specification, and I am guessing that I cannot pass an array to the kernel without first putting it in global memory....
  9. Re: Regarding async_work_group_copy(global to local)

    Bit late on this. Held up debugging one similar implementation.



    This helped a bit. :) But I will have to move back to the shared memory usage when the number of elements required by one thread...
  10. Re: Regarding async_work_group_copy(global to local)

    Actually, I am solving a knapsack problem.

    We have N items with values V(0)..V(N-1) and weights W(0)..W(N-1), and a bag of capacity C. I am currently using a dynamic programming technique and the...
  11. Re: Regarding async_work_group_copy(global to local)

    I quite get the point regarding how the contents are brought from global to local memory by separate threads. But I would still like to stick to the point that when every thread depends on the complete...
  12. Re: Regarding async_work_group_copy(global to local)

    How about this case?

    - Each thread needs, let's say, 1000 elements to complete its work
    - Number of threads in 1 work group = 1024

    Even in this case, the 1st thread or the first warp would have...
  13. Re: Regarding async_work_group_copy(global to local)

    So there is no point in having all the threads execute the async copy unless they fetch different data, is there? What about cases where the number of elements to be fetched is at most the size of a...
  14. Re: Regarding async_work_group_copy(global to local)

    You got my question right, but I don't think I understand the explanation. If the 1st warp that was scheduled already got the required data to the local memory, why would the later ones be required...
  15. Re: Regarding async_work_group_copy(global to local)

    I get the point. But when several threads try to access the global memory, wouldn't there be contention that further increases the time taken to complete the copy?

    Also,

    Assuming the number of threads...
  16. Re: Regarding async_work_group_copy(global to local)

    I am not sure if my understanding of the local memory and async copy is correct. If I may ask a few questions...

    I would like to know why the requirement of "same arguments" comes in.

    Eg:...
  17. Regarding async_work_group_copy(global to local)

    Hi folks,

    I have a kernel where every thread uses a particular element (of a data structure) from the global memory.
    In other words, all the threads executing the kernel use the data at the same address in the global...
  18. Re: CL_OUT_OF_RESOURCES on clEnqueueReadBuffer

    I thought I would have another thorough look at the code before drawing conclusions about the behaviour. It was caused by an invalid memory access by a few threads.

    I would consider this thread solved. :)
    ...
  19. Re: Problem with clEnqueueReadBuffer

    Replies: 10; Views: 3,675

    Apologies. The previous reply was supposed to be on my own thread. I must get some sleep before I do anything else.

    /Bharath
  20. Re: Problem with clEnqueueReadBuffer

    Replies: 10; Views: 3,675

    I thought I would have another thorough look at the code before drawing conclusions about the behaviour. It was caused by an invalid memory access by a few threads.

    I would consider this thread solved. :)
    ...
  21. CL_OUT_OF_RESOURCES on clEnqueueReadBuffer

    Hi folks,

    These are my first few steps with OpenCL, and I am facing the problem below.

    The Kernel signature looks like



    __kernel void knapsack(__global value_type *val,
    ...
  22. CL_OUT_OF_RESOURCES on clEnqueueReadBuffer

    Hi folks,

    These are my first few steps with OpenCL, and I am facing the problem below.

    The Kernel signature looks like



    __kernel void knapsack(__global value_type *val,
    ...
Results 1 to 22 of 26
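
For reference, the knapsack setup described in result 10 (N items with values V(0)..V(N-1), weights W(0)..W(N-1) and a bag of capacity C) can be sketched sequentially on the host. This is only a minimal plain C sketch of the standard 0/1 knapsack dynamic programme, not the poster's OpenCL kernel, which is not shown in these snippets. The function name knapsack_ref and the use of int for values and weights are assumptions made for illustration.

```c
#include <stdlib.h>

/* Hypothetical host-side reference (not from the original threads).
 * Returns the best total value that fits in capacity cap, or -1 if the
 * scratch array cannot be allocated.
 * Assumes cap >= 0 and every wgt[i] >= 0. */
int knapsack_ref(const int *val, const int *wgt, int n, int cap)
{
    /* best[c] = best value achievable with capacity c using the items seen so far */
    int *best = calloc((size_t)cap + 1, sizeof *best);
    if (best == NULL)
        return -1;

    for (int i = 0; i < n; ++i) {
        /* Walk capacities downwards so each item is used at most once. */
        for (int c = cap; c >= wgt[i]; --c) {
            int with_item = best[c - wgt[i]] + val[i];
            if (with_item > best[c])
                best[c] = with_item;
        }
    }

    int result = best[cap];
    free(best);
    return result;
}
```

A result like this could serve as a check against a GPU implementation's output for small inputs. Note that within one item i, every capacity c reads the same val[i] and wgt[i], which matches the access pattern the posts above describe.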