I am currently implementing OpenCL directly in C++, meaning that I am not using the SDKs. I am having a problem with clEnqueueWriteBuffer and clEnqueueReadBuffer that I cannot debug, and it seems to depend on the amount of data that I am sending to the buffers.

Here is my code:
err = clEnqueueWriteBuffer(queue, input, CL_TRUE, 0, sizeof(data)*dataSize, data, 0, NULL, NULL);
err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &dataSize, NULL, 0, NULL, NULL);
err = clEnqueueCopyBuffer(queue, output, input, 0, 0, sizeof(data)*dataSize, 0, NULL, NULL);
err = clEnqueueReadBuffer(queue, output, CL_TRUE, 0, sizeof(data)*dataSize, dataOut, 0, NULL, NULL );
As dataSize gets bigger (into the 3000 range), this code fails, and I cannot step into NVCuda.dll to figure out why. Does anyone have any suggestions as to how to fix this code, or as to how I can go about running algorithms on up to a million data sets?
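
One thing that may be worth checking is the byte count passed to the buffer calls. The sketch below assumes that data is a pointer to float (the post does not show its declaration, so this is a guess). In that case sizeof(data) is the size of the pointer itself, not of one element. If data is instead a fixed-size array, sizeof(data) is the size of the whole array, and multiplying it by dataSize overshoots even further. bufferBytes and bytesAsPosted are hypothetical helper names used only for illustration; they are not part of OpenCL.

```cpp
#include <cstddef>

// Hypothetical helper, not part of the OpenCL API: the number of bytes
// occupied by `count` elements of the type that `ptr` points to.
template <typename T>
std::size_t bufferBytes(const T* ptr, std::size_t count) {
    (void)ptr;                  // only the pointee type T is used
    return sizeof(T) * count;
}

// What the posted expression evaluates to, assuming `data` is a float*:
// sizeof(data) is the size of the pointer, not the size of one float.
inline std::size_t bytesAsPosted(const float* data, std::size_t count) {
    return sizeof(data) * count;
}
```

Under that assumption, bufferBytes(data, dataSize) would be the value to pass as the size argument of clEnqueueWriteBuffer, clEnqueueCopyBuffer and clEnqueueReadBuffer, and both input and output would need to have been created with at least that many bytes. Comparing err against CL_SUCCESS after each call should also show which call is the first to fail.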