I have seen that Accel-Sim has added a config for the RTX 3070, but I was looking for an RTX 3090 config. Would it be possible to add one? If not, could anyone tell me how to change the relevant parameters in the RTX 3070 config so that it models an RTX 3090?
FYI, I am not a specialist in GPU architecture; I only understand the basics. I am currently running a kernel that adds two vectors of 1000 elements in parallel. With the RTX 3070 config, the simulator reports ~5500 cycles, whereas the real hardware (an RTX 3090) gives me ~3400 cycles. That is roughly a 38% error ((5500 − 3400) / 5500), while the Accel-Sim paper claims a MAE of about 15%.
I also went through the official GPGPU-sim documentation and changed some parameters here and there, but the simulated cycle count always stays the same.
So either I have misunderstood the paper's claim, my config is wrong, or I am thinking about the whole thing the wrong way (or everything could be right). I would be very grateful if anyone could show me what I am doing wrong.
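For reference, this is roughly how I was planning to scale the RTX 3070 config file. The parameter names are standard GPGPU-sim options, but the values are only my reading of the published RTX 3090 specs (82 SMs, ~1695 MHz boost clock, 384-bit GDDR6X, 6 MB L2), so please correct me if this is the wrong approach:

# 82 SMs instead of the 3070's 46 (assuming the Ampere config keeps one SM per cluster)
-gpgpu_n_clusters 82
-gpgpu_n_cores_per_cluster 1
# <core>:<interconnect>:<L2>:<DRAM> clocks -- the 3090 boost clock is ~1695 MHz;
# I am not sure how the DRAM entry should encode the 19.5 Gbps GDDR6X rate, so I left it as in the 3070 file
-gpgpu_clock_domains 1695.0:1695.0:1695.0:<same DRAM value as the 3070 config>
# 384-bit bus instead of 256-bit, so I assume -gpgpu_n_mem scales by 1.5x,
# and the L2 grows from 4 MB to 6 MB (presumably via -gpgpu_cache:dl2)

Is changing these few parameters enough, or does the 3090 need a properly tuned config like the ones shipped with Accel-Sim?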
Here is the code I am using:
#include <stdio.h>

#define N 1000

__global__ void vector_add_cuda(float *out, float *a, float *b, int n) {
    int id = blockDim.x * blockIdx.x + threadIdx.x;
    if (id < n) {
        out[id] = a[id] + b[id];
    }
}

int main() {
    // Number of bytes to allocate for N floats
    size_t bytes = N * sizeof(float);

    float *h_a, *h_b, *h_out, *d_a, *d_b, *d_out;

    // Allocate pinned host memory
    cudaMallocHost(&h_a, bytes);
    cudaMallocHost(&h_b, bytes);
    cudaMallocHost(&h_out, bytes);

    // Initialize input arrays
    for (int i = 0; i < N; i++) {
        h_a[i] = 1.0f;
        h_b[i] = 2.0f;
    }

    // Allocate device memory
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_out, bytes);

    // Transfer data from host to device memory
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int THREADS = 256;
    int BLOCKS = (N + THREADS - 1) / THREADS;   // 4 blocks of 256 threads for N = 1000

    // Launch the kernel
    vector_add_cuda<<<BLOCKS, THREADS>>>(d_out, d_a, d_b, N);

    // Transfer computed data from device to host memory
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    // Check/validate results against the expected value computed on the host
    for (int i = 0; i < N; i++) {
        if (i < 10) {
            printf("h_a[%d] = %.5f, h_out[%d] = %.5f\n", i, h_a[i], i, h_out[i]);
        }
        if (h_out[i] != h_a[i] + h_b[i]) {
            printf("Error at %d: h_out[%d] = %.5f, expected %.5f\n", i, i, h_out[i], h_a[i] + h_b[i]);
            break;
        }
    }

    // Cleanup after kernel execution
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    cudaFreeHost(h_a);
    cudaFreeHost(h_b);
    cudaFreeHost(h_out);

    return 0;
}
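For completeness, this is roughly how I think the hardware-side number should be obtained: time only the kernel with CUDA events and convert the elapsed time into core cycles. The 1695 MHz boost clock is an assumption (the actual clock during the run may differ), and if this is not how the paper does the hardware comparison, that alone might explain part of the gap:

// Minimal sketch: time only the kernel launch with CUDA events
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);
vector_add_cuda<<<BLOCKS, THREADS>>>(d_out, d_a, d_b, N);
cudaEventRecord(stop);
cudaEventSynchronize(stop);

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);   // elapsed kernel time in milliseconds
double cycles = ms * 1e-3 * 1695e6;       // cycles = time * core clock (assumed 1695 MHz boost)
printf("kernel time = %f ms, ~%.0f core cycles\n", ms, cycles);

cudaEventDestroy(start);
cudaEventDestroy(stop);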