This is somewhat of an implementation issue, but I hope it is general enough to get some insights here.

Consider two input vectors of N double (64-bit) elements and one output vector, each backed by a buffer object that is then set as a kernel argument:

cl_int s;  // receives the error code of each buffer creation
::size_t ds = sizeof(cl_double) * N;  // size of one buffer in bytes
cl::Buffer inA(someContext, CL_MEM_READ_ONLY, ds, NULL, &s);
cl::Buffer inB(someContext, CL_MEM_READ_ONLY, ds, NULL, &s);
cl::Buffer out(someContext, CL_MEM_WRITE_ONLY, ds, NULL, &s);

Consider also the AMD APP SDK 2.6 and the Intel OpenCL SDK 1.5 (two platforms) on the same machine, with a Core i7 2670QM as the device and 4 GB of RAM.

If I print the CL_DEVICE_MAX_MEM_ALLOC_SIZE information for the CPU device on each platform, then for AMD it's roughly 1000 MB, while for Intel it's only 536 MB.
The first surprise here is that they deviate (OK, a platform issue); the second is that, to my naive eyes, these figures are fairly low and remind me of the 512 MB RAM age.
The third surprise is that for N = 23 million both platforms can allocate the memory (although the three buffers together need 552 MB, which is more than Intel's 536 MB), while for N = 25 million both platforms fail to allocate the memory (although the 600 MB total is clearly below the roughly 1000 MB limit for AMD).
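
To make the size arithmetic explicit, here is a minimal sketch. The helper names `bufferBytes` and `totalBytes` are my own and not part of OpenCL, and I am assuming that MB means 10^6 bytes and that a cl_double occupies 8 bytes:

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; they just reproduce the
// size arithmetic from the question and do not call into OpenCL.
constexpr std::uint64_t kBytesPerDouble = 8;  // assumed sizeof(cl_double)
constexpr std::uint64_t kBufferCount = 3;     // inA, inB and out

// Size in bytes of one buffer holding n doubles.
constexpr std::uint64_t bufferBytes(std::uint64_t n) {
    return n * kBytesPerDouble;
}

// Combined size in bytes of all three buffers.
constexpr std::uint64_t totalBytes(std::uint64_t n) {
    return kBufferCount * bufferBytes(n);
}

// N = 23 million: 184 MB per buffer, 552 MB in total.
static_assert(bufferBytes(23000000) == 184000000, "per-buffer size, N = 23M");
static_assert(totalBytes(23000000) == 552000000, "total size, N = 23M");

// N = 25 million: 200 MB per buffer, 600 MB in total.
static_assert(bufferBytes(25000000) == 200000000, "per-buffer size, N = 25M");
static_assert(totalBytes(25000000) == 600000000, "total size, N = 25M");
```

So the 552 MB and 600 MB figures are totals over all three buffers; a single buffer is 184 MB or 200 MB respectively.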

Can anyone give me insight into why N = 23 million works on both platforms and N = 25 million on neither? Also, why are the theoretical and practical limits so low? The CPU must get its memory from system RAM, and I don't really understand why, on a 4 GB machine, the OpenCL part is restricted to less than 600 MB. This is serious insofar as the application I am developing will need more than 600 MB for data.

Is there anything that can be done about it?

Many thanks for your input!