
Thoughts on optimizing my software rasterizer



tweakoz
02-05-2010, 04:27 PM
I am currently writing a software rasterizer/renderer that uses OpenCL as the engine for the fragment shading stage.
I eventually plan to move as much as is practical to OpenCL.
In this list's opinion, given the current limitations of OpenCL with threads, and the host<->GPU communication overhead,
what would be the best practical strategy for optimizing my scenario?

I know that modifying command queues is not thread safe (I tried it;>).
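
For anyone reading along, one common way to live with a command queue that is not thread safe is to let a single thread own it and have the pool threads hand it work through a small locked job queue. The following is only a minimal sketch of that idea in plain C++; `DeviceOwner` and `shade_tiles_demo` are hypothetical names, not part of my renderer, and the jobs here just bump a counter where the real code would issue the OpenCL enqueue calls for one tile.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: exactly one thread (worker_) touches the device queue.
// Pool threads never call the device directly; they only submit closures.
class DeviceOwner {
public:
    DeviceOwner() : worker_([this] { run(); }) {}

    ~DeviceOwner() {
        {
            std::lock_guard<std::mutex> lk(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // run() drains any remaining jobs before returning
    }

    // Safe to call from any thread.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lk(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutdown requested and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // assumption: real code would enqueue kernels for one tile here
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool done_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};

// Demo: several pool threads submit "shade this tile" jobs; returns jobs executed.
int shade_tiles_demo(int num_threads, int tiles_per_thread) {
    assert(num_threads > 0 && tiles_per_thread > 0);
    int shaded = 0;  // written only by the owner thread
    {
        DeviceOwner owner;
        std::vector<std::thread> pool;
        for (int t = 0; t < num_threads; ++t) {
            pool.emplace_back([&owner, &shaded, tiles_per_thread] {
                for (int i = 0; i < tiles_per_thread; ++i) {
                    owner.submit([&shaded] { ++shaded; });
                }
            });
        }
        for (auto& th : pool) th.join();
    }  // ~DeviceOwner runs the remaining jobs and joins the owner thread
    return shaded;
}
```

This only moves the serialization off the pool threads so they can keep scan converting while the owner thread feeds the device; it does not make the shading itself any faster.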

Right now the thread hierarchy looks like this:

(CPUthread0:TransformGeometry) .. (CPUthread63:TransformGeometry) (using a thread pool)
\/
thread safe (but not locked) screen-space per material per screen tile post transform buckets
\/
(CPUthread0:RasterizePt1) .. (CPUthread15:RasterizePt1) (using the same thread pool)
\/ (SCAN CONVERT TRIANGLES INTO PRE-SHADED FRAGMENTS)
thread safe (but not locked) tile-space per material per screen tile preshaded-fragment buffers
\/
(Locked OpenCL Device: Fragment Shading) // CPU THREADS SERIALIZED HERE (Most time spent per frame is also here)
\/ (SHADE FRAGMENTS)
thread safe (but not locked) tile-space per material per screen tile postshaded-fragment A-Buffers
\/
(CPUthread0:RasterizePt3) .. (CPUthread15:RasterizePt3) (using the same thread pool, actually the same workqueue job as RasterizePt1 )
\/ (ZSort, A-Buffer Composite and AntiAlias Resolve TileBuffer to FrameBuffer)
DONE
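
To make "thread safe (but not locked)" a bit more concrete, here is a minimal sketch of one way such a bucket can be built: preallocated storage plus an atomic write cursor, so each writer claims its own slot without taking a lock. This is an illustration under assumptions rather than the actual bucket code; `Fragment`, `FragmentBucket` and `fill_bucket_demo` are hypothetical names, and the bucket is only read after all writers have finished.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical pre-shaded fragment record.
struct Fragment {
    int x;
    int y;
    float z;
};

// Fixed-capacity bucket: writers reserve a slot with fetch_add, then fill it.
// No two writers ever get the same slot, so no lock is needed for appends.
class FragmentBucket {
public:
    explicit FragmentBucket(std::size_t capacity) : items_(capacity), count_(0) {}

    // Returns false when the bucket is full; the fragment is dropped in that case.
    bool push(const Fragment& f) {
        std::size_t slot = count_.fetch_add(1, std::memory_order_relaxed);
        if (slot >= items_.size()) return false;
        items_[slot] = f;
        return true;
    }

    // Number of stored fragments. Only meaningful once all writers are done.
    std::size_t size() const {
        std::size_t n = count_.load(std::memory_order_relaxed);
        return n < items_.size() ? n : items_.size();
    }

private:
    std::vector<Fragment> items_;
    std::atomic<std::size_t> count_;
};

// Demo: several threads append to one tile bucket; returns the stored count.
std::size_t fill_bucket_demo(std::size_t capacity, int num_threads, int per_thread) {
    assert(num_threads > 0 && per_thread > 0);
    FragmentBucket bucket(capacity);
    std::vector<std::thread> pool;
    for (int t = 0; t < num_threads; ++t) {
        pool.emplace_back([&bucket, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                bucket.push(Fragment{i, t, 0.5f});
            }
        });
    }
    for (auto& th : pool) th.join();  // join makes all writes visible to the reader
    return bucket.size();
}
```

The order of fragments inside a bucket is not deterministic with this scheme, which is acceptable when a later stage sorts by depth anyway.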

If it matters, I am not currently concerned with all hardware platforms, just mine. I will be at some point, but I am not there yet...
I am using a dual Xeon E5520 and a GeForce GTX 260 Core 216.

You can see some performance tables at
http://www.tweakoz.com/michael/wordpress/?page_id=464

Thanks,

mtm