Hello all,

The greatest difficulties I encounter are those with kernel writing. I am learning OpenCL for my job and as an exercise I am attempting to parallelize the explicit method for a 1D heat equation. The serial C implementation is actually very simple:

Code :
	int i,j;		// Discrete Mesh Indices
	double s;		// Numerical stability factor
	double x;		// Spatial dimension variable x			
	double StepSize_x;	// Step size for x;			
	int NumMeshPoints_x;	// Number of mesh points for x		
	double t;		// Time dimension variable t
	double StepSize_t;	// Step size for t 			
	int NumMeshPoints_t;	// Max steps for t			
 
	// INITIALIZATION
	NumMeshPoints_x = 9;
	StepSize_t = 0.0025;
	NumMeshPoints_t = 10;
	StepSize_x = 1.0/(NumMeshPoints_x+1); 	
	s = StepSize_t/pow(StepSize_x, 2 );	// Satisfy s < 0.25 for stability
	double w[NumMeshPoints_x+1];		// Level Down Array
	double v[NumMeshPoints_x+1];		// Level Array
 
   // POPULATE THE FIRST LAYER GIVEN THE BOUNDARY CONDITION
	for( i = 0; i < NumMeshPoints_x+1; i++ )
	{
		x = i*StepSize_x;
		w[i] = 100;
	}
 
   // COMPUTE TEMPERATURES USING MESH GRID
	t = 0;
	for( j = 1; j <= NumMeshPoints_t; j++ )
	{
      // BOUNDARY CONDITIONS AT END POINTS
		v[0] = 10;
		v[NumMeshPoints_x+1] = 10;
 
		for( i = 1; i < NumMeshPoints_x; i++ )
		{
         // POPULATE THE CURRENT LAYER
			v[i] = s*w[i-1] + (1-2*s)*w[i] + s*w[i+1];
		}
 
		t = j*StepSize_t;	
		for( i = 0; i < NumMeshPoints_x+1; i++ )
		{
         // MAKE CURRENT LAYER SEED FOR THE NEXT COMPUTATION
			w[i] = v[i];
		}
	}

So I am trying to make the meat of this program, the double for loop, into my kernel. So as it stands, each layer of the array v[] is calculated at a given time step t, so that stepping in the t direction expands the grid, layer by layer. However, to calculate a layer requires the previous layer calculated. So I had two thoughts about this and I wasn't sure how to really go about this:

1. Write a kernel that handles the two for loops, prototyped as such:
Code :
	__kernel void heat( 	__global double *prevLayer,
				__global double *currLayer,
				int NumMeshPoints_x,
				int NumMeshPoints_y );

but I just don't understand what would happen here. So I understand that I can access the arrays through the get_global_id(0) OpenCL command but can have each thread compute one point in the layer, then wrap that in the for loop to time step it? Otherwise my second though was to

2. Have the kernel only handle a single layer, then enqueue it in a for loop for the time steps.

I am having difficulties even realizing a problem space. In MPI there is the probelm that threads would have to communicate continuously to compute the end points of their respective segments of the layer. I don't know if this is a problem in OpenCL. OpenCL programmers, I need your help! Please any advice as I am lost!