wonwoolee

04-12-2010, 06:36 PM

Hello.

I wrote a simple kernel that multiplies a 3x3 matrix by vectors of the form (x, y, 1).

The kernel works well if I set the matrix to simple values, like

[1.0 2.0 3.0]

[4.0 5.0 6.0]

[7.0 8.0 9.0]

My kernel also gives correct results when I test it on the CPU.

However, when I set the matrix values as below, I get wrong results on the GPU.

[0.000000 0.109586 1068.300049]

[41760.031250 0.438342 2670.750000]

[83520.062500 0.767098 4273.200195]

For example,

For a vector: (15, 0, 1)

GPU: 1068.300049, 629071.250000, 1257074.125000

True value: 1068.300049, 629071.250000, 1257074.250000

Diff: 0.000000, 0.000000, -0.125000

For: (124, 0, 1)

GPU: 1068.300049, 5180914.500000, 10360761.000000

True value: 1068.300049, 5180915.000000, 10360761.000000

Diff: 0.000000, -0.500000, 0.000000 --> Large errors.

The errors are inconsistent, and I cannot predict which inputs produce them.

Does anybody know why this happens? Please give me a clue.

My kernel is below.

struct my_vec4 {
    float x;
    float y;
    float z;
    float w;
};

typedef struct my_vec4 MyVec4;

//----------------------------------------------------

__kernel void compute_ep_lines(
    __global MyVec4 *g_dst,
    __constant float *c_fmat,   // 3x3 matrix, stored row by row
    int N)
{
    // use the global ids as the input vector (x, y, 1)
    int x = get_global_id(0);
    int y = get_global_id(1);
    int index = y * N + x;

    // one matrix row per output component
    float e1 = c_fmat[0] * (float)x + c_fmat[1] * (float)y + c_fmat[2];
    float e2 = c_fmat[3] * (float)x + c_fmat[4] * (float)y + c_fmat[5];
    float e3 = c_fmat[6] * (float)x + c_fmat[7] * (float)y + c_fmat[8];

    // assign result to global mem
    g_dst[index].x = e1;
    g_dst[index].y = e2;
    g_dst[index].z = e3;
}

