Is there a way to calculate a max across a very large number of threads? For instance, I have a global variable that I want to store the max in, and 19 billion threads to calculate the max over. Obviously, storing the entire array of numbers and reducing it afterwards is impossible. It would seem that the atomic_max function isn't defined for floats, so if I do this:

Code :
float tmp = max(overall_max, local_max);
if(tmp == local_max)
    atomic_xchg(&overall_max, tmp);

it seems like there would be some chance that threads would interfere with each other, because the comparison and the exchange are two separate steps. I am also trying to save another statistic that goes with the max, called maxt, but this suffers from the same problem:

Code :
float tmp = max(overall_max, local_max);
if(tmp == local_max) {
    atomic_xchg(&overall_max, tmp);
    atomic_xchg(&overallt, maxt);
}

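This is filled with race conditions. For instance, what if two threads simultaneously find that they are the new max, and the higher one beats the lower one to the atomic_xchg? The lower one would then overwrite it, and overall_max and overallt could even end up coming from different threads.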

Is there any way to do this?
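
One direction that might work, as a sketch rather than a tested solution: replace the separate compare and atomic_xchg with a compare-and-swap loop on the float's bit pattern, so that a thread only writes if the value it compared against is still the stored one. The code below is plain C11 so it can be run on the host. The names float_bits, bits_float, atomic_max_float, pack and atomic_max_pair are hypothetical helpers, not OpenCL built-ins. In a kernel the same loop would presumably use atomic_cmpxchg on a uint together with as_uint/as_float. The 64-bit variant for the max/maxt pair assumes the device supports 64-bit atomics (the cl_khr_int64_base_atomics extension), which is not guaranteed.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back (as_uint/as_float in OpenCL). */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical helper: atomically raise the stored float to at least val. */
static void atomic_max_float(_Atomic uint32_t *addr, float val)
{
    uint32_t old = atomic_load(addr);
    /* Only try to write while our value is still larger than the stored one. */
    while (bits_float(old) < val) {
        /* Succeeds only if *addr still equals old; otherwise old is
           refreshed with the current contents and the test is repeated. */
        if (atomic_compare_exchange_weak(addr, &old, float_bits(val)))
            break;
    }
}

/* Pack the max (high 32 bits) and its companion statistic (low 32 bits)
   into one 64-bit word so both are replaced in a single atomic step. */
static uint64_t pack(float value, float t)
{
    return ((uint64_t)float_bits(value) << 32) | float_bits(t);
}

/* Hypothetical helper: update max and maxt together, never separately. */
static void atomic_max_pair(_Atomic uint64_t *addr, float value, float t)
{
    uint64_t old = atomic_load(addr);
    while (bits_float((uint32_t)(old >> 32)) < value) {
        if (atomic_compare_exchange_weak(addr, &old, pack(value, t)))
            break;
    }
}
```

With 19 billion threads hitting one address this would probably be slow, so it may make sense to first reduce to a per-work-group max in local memory and only call the atomic once per group. Note that a NaN input is never stored, because the `<` comparison is false for NaN.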