I know that if the threads within one warp take different branches or run different numbers of loop iterations, the warp executes every branch that any thread takes, and the maximum number of loop iterations, for all of its threads.
However, I am confused about what "executing" a useless operation means for one thread (A) when it is imposed by another thread (B) that actually needs it. If the operation is an addition, does thread A also add two numbers? If it is a memory read, does thread A also read from somewhere in global memory?
If such an operation is just a dummy for thread A, how much does it cost in overall performance?
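To put a rough number on the waste I am asking about, here is a minimal host-side C++ sketch (plain CPU code, not a CUDA kernel). It assumes the simplified model from my first paragraph: the warp issues every taken branch and the maximum loop trip count, and each issued instruction occupies all 32 lane slots whether or not a lane needs it. The function names and the "lane utilization" metric are my own illustration, not anything from the CUDA API, and the sketch says nothing about what the idle lanes actually do in hardware.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>

constexpr int kWarpSize = 32;  // warp width on current NVIDIA GPUs

// Loop divergence: lane i wants trips[i] iterations. ASSUMPTION: the warp
// issues max(trips) iterations, so the useful fraction of lane slots is
// sum(trips) / (max(trips) * kWarpSize).
double loop_lane_utilization(const std::array<int, kWarpSize>& trips) {
    const int max_trips = *std::max_element(trips.begin(), trips.end());
    assert(*std::min_element(trips.begin(), trips.end()) >= 0);
    if (max_trips == 0) return 1.0;  // nothing issued, nothing wasted
    const long useful = std::accumulate(trips.begin(), trips.end(), 0L);
    return static_cast<double>(useful) /
           (static_cast<double>(max_trips) * kWarpSize);
}

// Branch divergence: lanes_on_a lanes take path A (len_a instructions), the
// remaining lanes take path B (len_b instructions). ASSUMPTION: a path is
// issued only if at least one lane takes it, and the paths run one after
// the other.
double branch_lane_utilization(int lanes_on_a, int len_a, int len_b) {
    assert(lanes_on_a >= 0 && lanes_on_a <= kWarpSize);
    assert(len_a >= 0 && len_b >= 0);
    const int lanes_on_b = kWarpSize - lanes_on_a;
    const long issued = (lanes_on_a > 0 ? len_a : 0) +
                        (lanes_on_b > 0 ? len_b : 0);
    if (issued == 0) return 1.0;
    const long useful = static_cast<long>(lanes_on_a) * len_a +
                        static_cast<long>(lanes_on_b) * len_b;
    return static_cast<double>(useful) /
           (static_cast<double>(issued) * kWarpSize);
}
```

Under this model, a warp in which one thread loops 100 times and the other 31 loop once has a utilization of 131 / 3200, or about 4%, and `branch_lane_utilization(16, 10, 10)` gives 0.5 for an even split between two equally long paths. What I cannot tell is whether the idle lane slots cost anything beyond the issue slot itself, for example extra memory traffic.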