Beginning OpenCL and "Access violation"



ilektrik
03-24-2011, 12:01 PM
Hello,

I want to get started with OpenCL, but I ran into a problem right at the beginning: I can't run the code from the "OpenCL getting started" tutorial that ships with NVIDIA GPU Computing SDK 3.2.

I've installed "Notebook Developer Drivers for WinVista and Win7 (260.99)" and CUDA Toolkit 3.2 from this site: http://developer.nvidia.com/object/cuda_3_2_downloads.html

I'm using Windows 7 32-bit on a laptop with an Nvidia Quadro 140M.

I've set up Visual Studio 2008/2010 according to this tutorial:
http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%200

OK this is the code:

//*******************************************************************
// Demo OpenCL application to compute a simple vector addition
// computation between 2 arrays on the GPU
// ******************************************************************
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

// OpenCL source code
const char* OpenCLSource[] = {
"__kernel void VectorAdd(__global int* c, __global int* a,__global int* b)",
"{",
" // Index of the elements to add \n",
" unsigned int n = get_global_id(0);",
" // Sum the nth element of vectors a and b and store in c \n",
" c[n] = a[n] + b[n];",
"}"
};

// Some interesting data for the vectors
int InitialData1[20] = {37,50,54,50,56,0,43,43,74,71,32,36,16,43,56,100,50,25,15,17};
int InitialData2[20] = {35,51,54,58,55,32,36,69,27,39,35,40,16,44,55,14,58,75,18,15};

// Number of elements in the vectors to be added
#define SIZE 2048

// Main function
// *********************************************************************
int main(int argc, char **argv)
{
// Two integer source vectors in Host memory
int HostVector1[SIZE], HostVector2[SIZE];

// Initialize with some interesting repeating data
for(int c = 0; c < SIZE; c++)
{
HostVector1[c] = InitialData1[c%20];
HostVector2[c] = InitialData2[c%20];
}

// Create a context to run OpenCL on our CUDA-enabled NVIDIA GPU
cl_context GPUContext = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU,
NULL, NULL, NULL);

// Get the list of GPU devices associated with this context
size_t ParmDataBytes;
clGetContextInfo(GPUContext, CL_CONTEXT_DEVICES, 0, NULL, &ParmDataBytes);
cl_device_id* GPUDevices = (cl_device_id*)malloc(ParmDataBytes);
clGetContextInfo(GPUContext, CL_CONTEXT_DEVICES, ParmDataBytes, GPUDevices, NULL);
// Create a command-queue on the first GPU device
cl_command_queue GPUCommandQueue = clCreateCommandQueue(GPUContext,
GPUDevices[0], 0, NULL);

// Allocate GPU memory for source vectors AND initialize from CPU memory
cl_mem GPUVector1 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY |
CL_MEM_COPY_HOST_PTR, sizeof(int) * SIZE, HostVector1, NULL);
cl_mem GPUVector2 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY |
CL_MEM_COPY_HOST_PTR, sizeof(int) * SIZE, HostVector2, NULL);

// Allocate output memory on GPU
cl_mem GPUOutputVector = clCreateBuffer(GPUContext, CL_MEM_WRITE_ONLY,
sizeof(int) * SIZE, NULL, NULL);

// Create OpenCL program with source code
cl_program OpenCLProgram = clCreateProgramWithSource(GPUContext, 7,
OpenCLSource, NULL, NULL);

// Build the program (OpenCL JIT compilation)
clBuildProgram(OpenCLProgram, 0, NULL, NULL, NULL, NULL);

// Create a handle to the compiled OpenCL function (Kernel)
cl_kernel OpenCLVectorAdd = clCreateKernel(OpenCLProgram, "VectorAdd", NULL);

// In the next step we associate the GPU memory with the Kernel arguments
clSetKernelArg(OpenCLVectorAdd, 0, sizeof(cl_mem),(void*)&GPUOutputVector);
clSetKernelArg(OpenCLVectorAdd, 1, sizeof(cl_mem), (void*)&GPUVector1);
clSetKernelArg(OpenCLVectorAdd, 2, sizeof(cl_mem), (void*)&GPUVector2);

// Launch the Kernel on the GPU
size_t WorkSize[1] = {SIZE}; // one dimensional Range
clEnqueueNDRangeKernel(GPUCommandQueue, OpenCLVectorAdd, 1, NULL,
WorkSize, NULL, 0, NULL, NULL);

// Copy the output in GPU memory back to CPU memory
int HostOutputVector[SIZE];
clEnqueueReadBuffer(GPUCommandQueue, GPUOutputVector, CL_TRUE, 0,
SIZE * sizeof(int), HostOutputVector, 0, NULL, NULL);

// Cleanup
free(GPUDevices);
clReleaseKernel(OpenCLVectorAdd);
clReleaseProgram(OpenCLProgram);
clReleaseCommandQueue(GPUCommandQueue);
clReleaseContext(GPUContext);
clReleaseMemObject(GPUVector1);
clReleaseMemObject(GPUVector2);
clReleaseMemObject(GPUOutputVector);

// Print out the results
for (int Rows = 0; Rows < (SIZE/20); Rows++, printf("\n")){
for(int c = 0; c <20; c++){
printf("%c",(char)HostOutputVector[Rows * 20 + c]);
}
}
return 0;
}

And this is the error:

Unhandled exception at 0x00b7d7c4 in openCLexample.exe: 0xC0000005: Access violation reading location 0x00000000.

There is also a problem when building this code in the "Release" configuration. There I get:
1>.\opencl.cpp(7) : fatal error C1083: Cannot open include file: 'CL/cl.h': No such file or directory

Can you give me some hints on how to get this code running? Maybe something is still missing in the VS configuration?

I can only add that some sample applications, like QueryDevice or Matrix Transpose, run fine (with a "PASSED" notification).
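
For reference, the output the tutorial is meant to print can be worked out without a GPU. Each element of InitialData1 plus the matching element of InitialData2 is an ASCII code, so every 20-element row of the result spells the same message. The sketch below is a CPU-only version of the kernel's `c[n] = a[n] + b[n]`; the function names `vector_add_reference` and `expected_row` are hypothetical and not part of the SDK sample.

```c
#include <stddef.h>

/* Same 20-element tables as in the tutorial listing. */
static const int InitialData1[20] = {37,50,54,50,56,0,43,43,74,71,32,36,16,43,56,100,50,25,15,17};
static const int InitialData2[20] = {35,51,54,58,55,32,36,69,27,39,35,40,16,44,55,14,58,75,18,15};

/* CPU reference for the VectorAdd kernel: c[i] = a[i] + b[i]. */
void vector_add_reference(int *c, const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Fills out with the first expected row of output as a NUL-terminated string. */
void expected_row(char out[21])
{
    int sums[20];
    vector_add_reference(sums, InitialData1, InitialData2, 20);
    for (size_t i = 0; i < 20; i++)
        out[i] = (char)sums[i];   /* same cast the tutorial uses when printing */
    out[20] = '\0';
}
```

Calling `expected_row(row)` and printing `row` gives "Hello OpenCL World! " (with a trailing space). If the program prints anything else, the output buffer was most likely never written by the GPU.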

HolyGeneralK
03-25-2011, 12:11 PM
First off, I'd use the Visual Studio debugger to step through your code and find out where things are going wonky. For some reason, my suspicion is in the releasing of the OpenCL objects (I've noticed the nVidia drivers can be somewhat weak here). Make sure you've built debug, set a breakpoint at the top of main() and step through it.

As for your release issue, you need to repeat the steps in the setup link you gave us for the "Release" version of the code. You probably set them up only on the debug. When you go into the project properties window, use the dropdown box at the top to switch to the release type and repeat everything in the second step for it.

Let us know what you find

ilektrik
03-26-2011, 01:50 PM
Hi,

Thank you for the reply and the hints. I've repeated the VS configuration for Release mode and the application started, but I expected different output (see below). There were also four "beeps" from the speaker.


[roughly 90 lines of unreadable characters: the output buffer printed as text, mostly unprintable bytes with a few stray fragments of ASCII]

In Debug mode the application still doesn't work.

I've set a breakpoint before this call:

// Create a context to run OpenCL on our CUDA-enabled NVIDIA GPU
cl_context GPUContext = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU,
NULL, NULL, NULL);

I think something is wrong with GPUContext: after the call above executes, its value is

GPUContext = 0x00000000
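
A NULL context means clCreateContextFromType failed. Because the last argument (errcode_ret) was NULL, the reason was thrown away. Passing the address of a cl_int there returns a numeric status code, and a small lookup can turn that code into a readable name. This is only a sketch: the helper name `cl_error_name` is hypothetical, it covers just a handful of codes, and the literal values are the ones I believe the Khronos CL/cl.h header defines. In a real project, include <CL/cl.h> and use the macros instead of the numbers.

```c
/* Hypothetical helper: map an OpenCL status code to its macro name.
   Literal values are assumed to match CL/cl.h; only a few codes are listed. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -2:  return "CL_DEVICE_NOT_AVAILABLE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -31: return "CL_INVALID_DEVICE_TYPE";
    case -32: return "CL_INVALID_PLATFORM";
    default:  return "unknown OpenCL error";
    }
}
```

With this in place, something like `printf("%s\n", cl_error_name(errcode));` right after the failing call shows which condition was hit, instead of a crash several calls later when the NULL context is used.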

ilektrik
04-04-2011, 09:38 AM
I managed to repair the code (I hope I've done it right):


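// Note: assert() needs #include <assert.h> at the top of the file
cl_int errcode;
cl_uint num_platforms;
cl_platform_id platform_id;
// Ask for the first available platform
errcode = clGetPlatformIDs(1, &platform_id, &num_platforms);
assert(errcode == CL_SUCCESS);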

// prepare cl_context_properties
cl_context_properties props[3];
props[0] = (cl_context_properties)CL_CONTEXT_PLATFORM; // indicates that next element is platform
props[1] = (cl_context_properties)platform_id; // platform is of type cl_platform_id
props[2] = (cl_context_properties)0; // last element must be 0

// Create a context to run OpenCL on our CUDA-enabled NVIDIA GPU
cl_context GPUContext = clCreateContextFromType(props, CL_DEVICE_TYPE_GPU,
NULL, NULL, &errcode);

assert(errcode==CL_SUCCESS);

Sources:
http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clCreateContextFromType.html
http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clGetPlatformIDs.html