
Thread: Some startup tips

  1. #1
    Junior Member
    Join Date
    Oct 2010
    Posts
    3

    Some startup tips

    Hi,

    I was thinking of trying out OpenCL for running calculations on a Mandelbrot program and I have a few questions.

    As far as I know you want to have as many workers as possible (well not as many as you can maybe but quite a few anyway). Mandelbrot is calculated by performing a number of iterations for each pixel on the screen.
    Is it a good idea to have one worker for each pixel (this would limit the window to 512x512 on my computer it seems since local_work_size == global_work_size == 512) or what do you think?

    If I have one worker per pixel how do I get the index of the pixel? I've tried get_global_id(0)*512 + get_local_id(0) but that didn't seem to work at all.

    Otherwise I could just calculate each row in one worker, but the problem is what to do if I have more than 512 rows. How is this best solved?

    Regards
    Nicklas

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Some startup tips

    As far as I know you want to have as many workers as possible (well not as many as you can maybe but quite a few anyway).
    Yes, you are right.

    Is it a good idea to have one worker for each pixel (this would limit the window to 512x512 on my computer it seems since local_work_size == global_work_size == 512) or what do you think?
    I think that one work-item per pixel is a great place to start (*). I don't think you will have to limit yourself to 512x512 since there's no need for the local work size to be a particular number. Correct me if I'm wrong, but in naive Mandelbrot computations each pixel is independent of the rest. If that is the case, then the local size does not matter and you can make the picture as large as you want.

    If I have one worker per pixel how do I get the index of the pixel? I've tried get_global_id(0)*512 + get_local_id(0) but that didn't seem to work at all.
    What you want is: x = get_global_id(0); y = get_global_id(1);
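
    A minimal sketch of the one-work-item-per-pixel mapping, written as plain C so it can be compiled on the host. The function names, the viewed region of the plane, and the row-major output layout (out[y * width + x]) are assumptions for illustration, not something stated in this thread. In an actual kernel, x and y would come from get_global_id(0) and get_global_id(1), and the two loops in mandel_image would be replaced by a 2D global work size of width x height.

```c
#include <assert.h>
#include <stddef.h>

/* Escape-time iteration count for the point c = (cr, ci). */
int mandel_iterations(float cr, float ci, int max_iter)
{
    float zr = 0.0f, zi = 0.0f;
    int i = 0;
    while (i < max_iter && zr * zr + zi * zi <= 4.0f) {
        float t = zr * zr - zi * zi + cr;   /* real part of z*z + c */
        zi = 2.0f * zr * zi + ci;           /* imaginary part of z*z + c */
        zr = t;
        ++i;
    }
    return i;
}

/* Work done for one pixel. In a kernel, x and y would be
 * get_global_id(0) and get_global_id(1). */
void mandel_pixel(int x, int y, int width, int height, int max_iter, int *out)
{
    assert(width > 0 && height > 0);
    /* Assumed view: real axis [-2, 1), imaginary axis [-1.5, 1.5). */
    float cr = -2.0f + 3.0f * (float)x / (float)width;
    float ci = -1.5f + 3.0f * (float)y / (float)height;
    /* Row-major linear index of pixel (x, y). */
    out[(size_t)y * (size_t)width + (size_t)x] = mandel_iterations(cr, ci, max_iter);
}

/* Host-side stand-in for a 2D NDRange of width x height work-items. */
void mandel_image(int width, int height, int max_iter, int *out)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            mandel_pixel(x, y, width, height, max_iter, out);
}
```

    Because each pixel only reads its own coordinates and writes its own output slot, the order in which the pixels are processed does not matter, which is why the local size is not important here.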

    Otherwise I could just calculate each row in one worker, but the problem is what to do if I have more than 512 rows. How is this best solved?
    Computing one row in each work-item would produce too few work-items for the GPU to perform well.


    (*) The only downside of that approach is that some pixels in the Mandelbrot set take much longer to compute than some others and they will become the bottleneck of the algorithm. However, one work-item per pixel is definitely the right place to start; don't worry about performance too much at this stage.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Junior Member
    Join Date
    Oct 2010
    Posts
    3

    Re: Some startup tips

    Quote Originally Posted by david.garcia
    I think that one work-item per pixel is a great place to start (*). I don't think you will have to limit yourself to 512x512 since there's no need for the local work size to be a particular number. Correct me if I'm wrong, but in naive Mandelbrot computations each pixel is independent of the rest. If that is the case, then the local size does not matter and you can make the picture as large as you want.
    Oh, I thought that the global size was the number of "threads" and the local size was the number of items per thread, and that you couldn't have more "threads" than cores, which is 512 in my case. Maybe this is incorrect?

    Quote Originally Posted by david.garcia
    What you want is: x = get_global_id(0); y = get_global_id(1);
    Ahh, thanks!

  4. #4
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Some startup tips

    Oh, I thought that the global size was the number of "threads" and the local size was the number of items per thread, and that you couldn't have more "threads" than cores, which is 512 in my case. Maybe this is incorrect?
    The standard intentionally avoids the term "thread" since it means very different things to different people. The global size represents the total number of work-items you want to spawn. You can think of each work-item as a scalar processor.

    A small collection of work-items forms a work-group. Work-items within the same work-group can communicate through local memory and synchronize using execution barriers. Local memory and some other shared resources are the reason why you can't have very large work groups.
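
    To make the relationship between the IDs concrete, here is a small sketch in plain C. The helper names are hypothetical. It assumes a zero global offset, in which case get_global_id(d) equals get_group_id(d) * get_local_size(d) + get_local_id(d); this is why get_global_id(0)*512 + get_local_id(0) does not give a pixel index. It also assumes OpenCL 1.x behaviour, where an explicitly specified local size has to divide the global size evenly, so the global size is usually rounded up and the kernel skips the out-of-range work-items.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of local that is >= n, for use as a global work size. */
size_t round_up_global(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}

/* Global ID rebuilt from work-group ID and local ID (zero global offset). */
size_t global_id_from(size_t group_id, size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}

/* Guard a kernel would apply when the global size was rounded up:
 * returns 1 if this work-item maps to a real pixel column. */
int in_range(size_t global_id, size_t width)
{
    return global_id < width;
}
```

    For a 1000-pixel-wide image and a local size of 64, round_up_global(1000, 64) gives 1024, and the 24 extra work-items simply return early. Passing NULL as the local size to clEnqueueNDRangeKernel lets the implementation choose one, which avoids the rounding altogether.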

  5. #5
    Junior Member
    Join Date
    Oct 2010
    Posts
    3

    Re: Some startup tips

    Quote Originally Posted by david.garcia
    Oh, I thought that the global size was the number of "threads" and the local size was the number of items per thread, and that you couldn't have more "threads" than cores, which is 512 in my case. Maybe this is incorrect?
    The standard intentionally avoids the term "thread" since it means very different things to different people. The global size represents the total number of work-items you want to spawn. You can think of each work-item as a scalar processor.

    A small collection of work-items forms a work-group. Work-items within the same work-group can communicate through local memory and synchronize using execution barriers. Local memory and some other shared resources are the reason why you can't have very large work groups.
    Thanks for a very detailed answer
    I'll continue to mess around with it and see how it goes.

    BTW, what's the best/easiest way to represent floating-point numbers with a large number of decimals in OpenCL?

  6. #6
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Some startup tips

    BTW, what's the best/easiest way to represent floating-point numbers with a large number of decimals in OpenCL?
    If your device supports double-precision floats, that's an easy way that may have enough range/precision for your needs. Query the device extension string for cl_khr_fp64 and start your kernels with this:
    Code :
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable

    If doubles are not good enough, there are other, more troublesome ways to go about it.
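
    For the extension query mentioned above, a rough host-side sketch in plain C. The helper name has_extension is hypothetical. It assumes you have already fetched the space-separated extension string with clGetDeviceInfo and CL_DEVICE_EXTENSIONS, and it matches whole tokens only, so a longer extension name that merely contains the text is not mistaken for cl_khr_fp64.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if name appears as a whole space-separated token in ext_list,
 * 0 otherwise. ext_list is the string reported for CL_DEVICE_EXTENSIONS. */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        ++p;  /* keep searching after a partial match */
    }
    return 0;
}
```

    If has_extension(extensions, "cl_khr_fp64") returns 0, the kernel has to stay with float or use one of the more involved approaches.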
