Reduction within one work item
How would more experienced devs approach this problem?
I want to calculate a financial math problem called "Ichimoku" on the GPU.
The actual problem can be boiled down to:
- you have a price series array, let's say an array of 10,000 doubles at indices 0 to 9,999
Calculating Ichimoku basically involves the following task 2-3 times with different widths, plus a few minor challenges. All major calculations are independent of the previous / next one, so the outer loop is perfectly parallel. The inner loop is a min/max reduction over the _X_ previous values:
perfectly parallel outer loop:
- do the inner loop (kernel) for each array value, independently of the previous / next value
(int) argument _X_ = 26
calculating the result of array index _I_ for width _X_:
- find the low of index _I_ to index (_I_ - _X_) = _LOW_
- find the high of index _I_ to index (_I_ - _X_) = _HIGH_
- result for _I_ = (_LOW_ + _HIGH_) / 2.0
so for _X_ = 26 and array_index = 100:
- find the low of array[100] to array[100-26-1] (inclusive)
- find the high of array[100] to array[100-26-1] (inclusive)
- global result = (low + high) / 2.0
- of course, only calculate for index values greater than the _X_ argument
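The calculation described above can be sketched as a plain C reference, which is handy for checking any GPU version against. This is only a sketch under assumptions: the window is taken as the inclusive range (_I_ - _X_) to _I_ from the definition (the worked example writes the lower bound with an extra -1, so the exact bounds may need adjusting), and the names `window_midpoint` and `midpoint_series` are made up for illustration.

```c
#include <stddef.h>

/* Midpoint of the lowest and highest price in the inclusive window
 * prices[i - x] .. prices[i]. Assumption: the caller ensures i >= x. */
double window_midpoint(const double *prices, size_t i, size_t x)
{
    double low = prices[i - x];
    double high = prices[i - x];
    for (size_t k = i - x + 1; k <= i; ++k) {
        if (prices[k] < low)  low = prices[k];
        if (prices[k] > high) high = prices[k];
    }
    return (low + high) / 2.0;
}

/* Outer loop: every index is independent of its neighbours, so on a GPU
 * each iteration would map to one work item. Only indices greater than x
 * are written; out[0..x] is left untouched. */
void midpoint_series(const double *prices, double *out, size_t n, size_t x)
{
    for (size_t i = x + 1; i < n; ++i)
        out[i] = window_midpoint(prices, i, x);
}
```

Calling `midpoint_series(prices, out, 10000, 26)` would fill `out[27]` through `out[9999]`.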
I could simply write a kernel that is launched with one work item per array element and computes the high/low sequentially inside the kernel. I would gain over a traditional CPU implementation because I can run that kernel for every array value in parallel, but the inner loop, which is the main workload, would still be sequential.
How could I do a min/max reduction within the kernel? Launch array_size * _X_ work items and keep track of which work items are supposed to do a min/max local memory reduction at a certain stage and nothing at later stages?
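To make the question concrete, here is a serial C emulation of the kind of tree reduction I mean, not an actual kernel. It assumes one work group per output index, a hypothetical group size `GROUP_SIZE` that is a power of two larger than _X_, and the inclusive window (_I_ - _X_) to _I_. The arrays `lo` and `hi` stand in for local memory, each iteration over `lid` stands in for one work item, and a barrier would be needed wherever the comments say so.

```c
#include <stddef.h>

#define GROUP_SIZE 32  /* hypothetical group size: power of two, must be > x */

/* Emulates a work-group min/max tree reduction for output index i.
 * Assumptions: i >= x and x < GROUP_SIZE. */
double window_midpoint_tree(const double *prices, size_t i, size_t x)
{
    double lo[GROUP_SIZE];  /* stands in for local memory */
    double hi[GROUP_SIZE];

    /* Load phase: lane lid reads one window element. Spare lanes re-read
     * prices[i], which is already in the window and so cannot change the
     * min or the max. */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
        size_t src = (lid <= x) ? (i - x + lid) : i;
        lo[lid] = prices[src];
        hi[lid] = prices[src];
    }
    /* A barrier would go here on the GPU. */

    /* Reduction phase: the number of active lanes halves at every step. */
    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid) {
            if (lo[lid + stride] < lo[lid]) lo[lid] = lo[lid + stride];
            if (hi[lid + stride] > hi[lid]) hi[lid] = hi[lid + stride];
        }
        /* A barrier would go here on the GPU. */
    }
    return (lo[0] + hi[0]) / 2.0;
}
```

With a group size of 32 this takes five halving steps per output, and lanes with `lid >= stride` simply do nothing at that step, which is the "nothing at later stages" part of the question.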
Help is very much appreciated.