Hi, I found there is almost no speed difference between -O3 and -O0 for my OpenCL code. Is this normal? Thanks!