
Need config for RTX 3090 #270

Open
pal-stdr opened this issue Nov 30, 2022 · 0 comments
pal-stdr commented Nov 30, 2022

I have seen that Accel-Sim has added a config for the RTX 3070, but I was looking for a config for the RTX 3090. Is it possible to add this config? If not, could anyone please tell me how to change the required parameters in the RTX 3070 config to make it compatible with the RTX 3090?

FYI, I am not a specialist in GPU architecture; I just understand the basics. Currently I am running code that adds 2 vectors in parallel over 1000 array/vector elements. Using the RTX 3070 config, the simulator gives me ~5500 cycles, whereas the real HW (RTX 3090) gives me ~3400 cycles. That is roughly a 38% error relative to the simulated cycle count. The Accel-Sim paper claims a MAE of ~15%, whereas I am getting ~38%.

I also went through the official GPGPU-Sim documentation and changed some parameters here and there, but the simulated cycle count always remains unchanged.

So either I misunderstood the paper's claim, or my config is wrong, or I am thinking about the whole thing in the wrong way. Or everything could be right. I would be very grateful if anyone could show me what I am doing wrong.
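In case it helps the discussion, here is a first-pass sketch of the gpgpusim.config parameters I believe would need to change when going from the RTX 3070 to the RTX 3090, based on NVIDIA's published specs. I am not sure this is complete or correct; the parameter names are assumed from the shipped SM86 config, and the clock values are approximate:

```
# Sketch only, not a validated RTX 3090 config.
# RTX 3090 has 82 SMs (RTX 3070: 46)
-gpgpu_n_clusters 82
-gpgpu_n_cores_per_cluster 1

# 384-bit GDDR6X bus = 12 x 32-bit channels (RTX 3070: 256-bit)
-gpgpu_n_mem 12

# core : interconnect : L2 : DRAM clocks in MHz
# 1695 MHz boost clock; DRAM value approximated from 19.5 Gbps GDDR6X
-gpgpu_clock_domains 1695.0:1695.0:1695.0:1219.0

# The L2 size also differs (3070: 4 MB, 3090: 6 MB), which would mean
# adjusting the -gpgpu_cache:dl2 line as well.
```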

I am sharing the code that I am using:

#include <stdio.h>
#include <cuda_runtime.h>

#define N 1000


__global__ void vector_add_cuda(float *out, float *a, float *b, int n) {

    int id = blockDim.x * blockIdx.x + threadIdx.x;

    // Guard against threads past the end of the array
    // (use the kernel parameter n, not the host-side macro N)
    if (id < n)
    {
        out[id] = a[id] + b[id];
    }
}

int main(){

    // Number of bytes to allocate for N floats
    size_t bytes = N * sizeof(float);

    float *h_a, *h_b, *h_out, *d_a, *d_b, *d_out;

    // Allocate pinned host memory
    cudaMallocHost(&h_a, bytes);
    cudaMallocHost(&h_b, bytes);
    cudaMallocHost(&h_out, bytes);

    // Initialize arrays
    for(int i = 0; i < N; i++){
        h_a[i] = 1.0f;
        h_b[i] = 2.0f;
    }

    // Allocate device memory for d_a, d_b, d_out
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_out, bytes);


    // Transfer data from host to device memory
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);


    int THREADS = 256;
    int BLOCKS = (N + THREADS - 1) / THREADS;  // round up so every element is covered


    // Launch the kernel
    vector_add_cuda <<< BLOCKS, THREADS >>> (d_out, d_a, d_b, N);


    // Transfer computed data from device to host memory
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);


    // Check/validate results: every element should be 1.0f + 2.0f = 3.0f
    for(int i = 0; i < N; i++)
    {
        if (i < 10)
        {
            printf("h_a[%d] = %.5f, h_out[%d] = %.5f\n", i, h_a[i], i, h_out[i]);
        }

        if (h_out[i] != 3.0f)
        {
            printf(" Error at %d, h_out[%d] = %.5f, expected 3.00000\n", i, i, h_out[i]);
            break;
        }

    }

    // Cleanup: device memory with cudaFree, pinned host memory with cudaFreeHost
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);

    cudaFreeHost(h_a);
    cudaFreeHost(h_b);
    cudaFreeHost(h_out);

    return 0;
}