I would like to open this topic because these manufacturers' stream implementations are very different. I would like to know exactly what the differences are, and why one philosophy may be better than the other for a given computation. I believe that the true power of OpenCL may lie in the purposeful and careful separation of computation, literally playing to the strengths of both cards.
If anyone would be kind enough to describe the differences and give a few examples of why a certain computation is better, faster, or more efficient on a certain card because of its architecture, I think it would go a long way toward showing the maximum potential that OpenCL has.
I look forward to a very technical and in-depth discussion that future coders can use to push the limits of this wonderful new language.
Thank you for your time!