TensorRT and CUDA Self-Study Notes 004: Kernel Functions and Two Simple Kernel Function Demos
A kernel function is a function that can be executed concurrently by many threads on the GPU.
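Function qualifiers in CUDA programs

Qualifier  | Executes on     | Callable from                                                                                   | Notes
__global__ | device (GPU)    | host, and device code on GPUs of compute capability 3.x or later (via dynamic parallelism)        | must have a void return type
__device__ | device (GPU)    | device only                                                                                     | can only execute on the device
__host__   | host (CPU)      | host only                                                                                       | can only execute on the host

__host__ can be omitted; in other words, __host__ is the default qualifier.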
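To make the three qualifiers concrete, here is a minimal sketch that uses each of them once. The function names (`square_on_device`, `add_one`, `qualifier_demo`, `host_only`) are hypothetical and chosen only for illustration. The combination `__host__ __device__` is not covered in the notes above; it is a standard CUDA feature that compiles one function for both sides.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __device__: runs on the GPU and can only be called from device code.
__device__ int square_on_device(int x) { return x * x; }

// __host__ __device__ (an addition to the notes): compiled for both CPU and GPU,
// so host code and device code can both call it.
__host__ __device__ int add_one(int x) { return x + 1; }

// __global__: a kernel. Launched from the host, runs on the GPU, must return void.
__global__ void qualifier_demo() {
    int tid = static_cast<int>(threadIdx.x);
    int v = add_one(square_on_device(tid));
    printf("thread %d computed %d\n", tid, v);
}

// No qualifier is the same as __host__: an ordinary CPU function.
int host_only(int x) { return add_one(x) * 2; }

int main() {
    printf("host_only(3) = %d\n", host_only(3));  // (3 + 1) * 2 = 8
    qualifier_demo<<<1, 4>>>();                   // 1 block of 4 threads
    cudaDeviceSynchronize();                      // wait for the kernel; flushes device printf
    return 0;
}
```

Calling `square_on_device` from `main` would be rejected by nvcc, because a `__device__` function has no host version.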
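Characteristics of kernel functions

- They can only access GPU memory.
- They must return void.
- They cannot take a variable number of arguments.
- They cannot use static variables.
- They cannot use function pointers.
- They are asynchronous: while a kernel is executing, the program on the CPU can keep running in parallel with it.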
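The asynchronous behaviour is the characteristic most likely to surprise in practice, so a small sketch may help. It assumes a hypothetical kernel `busy_kernel` and a hypothetical helper `first_failure`; the runtime calls (`cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorString`) are standard CUDA Runtime API functions. The launch returns to the CPU right away, and the host only waits when it calls `cudaDeviceSynchronize()`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void busy_kernel() {
    printf("GPU: kernel running in thread %d\n", static_cast<int>(threadIdx.x));
}

// Hypothetical helper: return the first error of the two, or cudaSuccess if both are fine.
cudaError_t first_failure(cudaError_t a, cudaError_t b) {
    return a != cudaSuccess ? a : b;
}

int main() {
    busy_kernel<<<1, 2>>>();  // the launch is only queued; control returns immediately
    printf("CPU: continuing while the kernel may still be running\n");

    cudaError_t launch_err = cudaGetLastError();     // catches e.g. an invalid launch configuration
    cudaError_t sync_err = cudaDeviceSynchronize();  // blocks until the kernel has finished
    cudaError_t err = first_failure(launch_err, sync_err);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CPU: kernel finished\n");
    return 0;
}
```

Without a synchronizing call before the program exits, the process can end before the kernel's `printf` output is ever flushed to the terminal.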
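demo1: a kernel function that runs on the GPU and prints a message
Code
#include <stdio.h>

__global__ void HellofromGPU() {
    printf("Hello from GPU!\n");
}

int main() {
    printf("Hello from CPU!\n");
    HellofromGPU<<<1, 6>>>();  // <<<grid, block>>>: 1 block of 6 threads
    cudaDeviceReset();         // release GPU resources
    return 0;
}

Note: the file extension should be .cu
nvcc ./main.cu -o main.exe
Then run the executable compiled with nvcc.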
Output
luke@ubuntu:~/workspace/TensorRT_course/02_kenel_demo$ ./main.exe
Hello from CPU!
Hello from GPU!
Hello from GPU!
Hello from GPU!
Hello from GPU!
Hello from GPU!
Hello from GPU!

demo2: a kernel function that runs on the GPU and prints each thread's index and thread ID
Code
#include <stdio.h>
#include <iostream>

__global__ void hellowfromGPU() {
    // The last argument is the thread's linear ID within its own block.
    printf("Hello from block(%d,%d,%d) thread(%d,%d,%d)\t thread ID is %d \n",
           blockIdx.x, blockIdx.y, blockIdx.z,
           threadIdx.x, threadIdx.y, threadIdx.z,
           threadIdx.x + threadIdx.y * blockDim.x + threadIdx.z * blockDim.x * blockDim.y);
}

int main() {
    std::cout << "Hello from CPU!" << std::endl;
    // A grid contains blocks.
    dim3 grid1(2, 2, 1);   // the grid named grid1 contains 2x2x1 = 4 blocks
    dim3 block1(2, 2, 2);  // the block named block1 contains 2x2x2 = 8 threads
    // printf("Launching kernel with gridDim:%d %d %d blockDim:%d %d %d ", gridDim.x, gridDim.y, gridDim.z, blockDim.x, blockDim.y, blockDim.z);
    // Testing showed that gridDim and blockDim cannot be accessed directly on the CPU side.
    std::cout << "Launching kernel..." << std::endl;
    hellowfromGPU<<<grid1, block1>>>();  // launches 4x8 = 32 threads running in parallel
    // std::cout << "Work done!" << std::endl;
    cudaDeviceReset();
    std::cout << "Work done!" << std::endl;
    return 0;
}

Note: the file extension should be .cu
nvcc ./main.cu -o main.exe
Then run the executable compiled with nvcc.
Output
luke@ubuntu:~/workspace/TensorRT_course/03_cuda_idx$ ./main.exe
Hello from CPU!
Launching kernel...
Hello from block(1,1,0) thread(0,0,0) thread ID is 0
Hello from block(1,1,0) thread(1,0,0) thread ID is 1
Hello from block(1,1,0) thread(0,1,0) thread ID is 2
Hello from block(1,1,0) thread(1,1,0) thread ID is 3
Hello from block(1,1,0) thread(0,0,1) thread ID is 4
Hello from block(1,1,0) thread(1,0,1) thread ID is 5
Hello from block(1,1,0) thread(0,1,1) thread ID is 6
Hello from block(1,1,0) thread(1,1,1) thread ID is 7
Hello from block(1,0,0) thread(0,0,0) thread ID is 0
Hello from block(1,0,0) thread(1,0,0) thread ID is 1
Hello from block(1,0,0) thread(0,1,0) thread ID is 2
Hello from block(1,0,0) thread(1,1,0) thread ID is 3
Hello from block(1,0,0) thread(0,0,1) thread ID is 4
Hello from block(1,0,0) thread(1,0,1) thread ID is 5
Hello from block(1,0,0) thread(0,1,1) thread ID is 6
Hello from block(1,0,0) thread(1,1,1) thread ID is 7
Hello from block(0,1,0) thread(0,0,0) thread ID is 0
Hello from block(0,1,0) thread(1,0,0) thread ID is 1
Hello from block(0,1,0) thread(0,1,0) thread ID is 2
Hello from block(0,1,0) thread(1,1,0) thread ID is 3
Hello from block(0,1,0) thread(0,0,1) thread ID is 4
Hello from block(0,1,0) thread(1,0,1) thread ID is 5
Hello from block(0,1,0) thread(0,1,1) thread ID is 6
Hello from block(0,1,0) thread(1,1,1) thread ID is 7
Hello from block(0,0,0) thread(0,0,0) thread ID is 0
Hello from block(0,0,0) thread(1,0,0) thread ID is 1
Hello from block(0,0,0) thread(0,1,0) thread ID is 2
Hello from block(0,0,0) thread(1,1,0) thread ID is 3
Hello from block(0,0,0) thread(0,0,1) thread ID is 4
Hello from block(0,0,0) thread(1,0,1) thread ID is 5
Hello from block(0,0,0) thread(0,1,1) thread ID is 6
Hello from block(0,0,0) thread(1,1,1) thread ID is 7
Work done!
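In the output above the thread ID restarts at 0 in every block, because the formula only uses `threadIdx` and `blockDim`. A possible extension, sketched below under the assumption that a globally unique ID is wanted, flattens the block index in the same way and combines the two. The names `flatten` and `print_global_id` are hypothetical. The sketch also shows one way to print the launch configuration from the host: read the fields of the `dim3` variables instead of the device-only built-ins `gridDim` and `blockDim`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Flatten a 3D index (x varies fastest, then y, then z) inside a box of extents dim_x * dim_y * ...
__host__ __device__ unsigned int flatten(unsigned int x, unsigned int y, unsigned int z,
                                         unsigned int dim_x, unsigned int dim_y) {
    return x + y * dim_x + z * dim_x * dim_y;
}

__global__ void print_global_id() {
    // ID of this thread inside its block (same formula as in demo2).
    unsigned int local_id = flatten(threadIdx.x, threadIdx.y, threadIdx.z, blockDim.x, blockDim.y);
    // ID of this block inside the grid.
    unsigned int block_id = flatten(blockIdx.x, blockIdx.y, blockIdx.z, gridDim.x, gridDim.y);
    unsigned int threads_per_block = blockDim.x * blockDim.y * blockDim.z;
    // Unique across the whole grid: earlier blocks occupy the lower ID ranges.
    unsigned int global_id = block_id * threads_per_block + local_id;
    printf("block %u local %u -> global %u\n", block_id, local_id, global_id);
}

int main() {
    dim3 grid1(2, 2, 1);   // 4 blocks
    dim3 block1(2, 2, 2);  // 8 threads per block
    // On the host, the launch configuration lives in the dim3 variables themselves.
    printf("Launching kernel with grid (%u,%u,%u) block (%u,%u,%u)\n",
           grid1.x, grid1.y, grid1.z, block1.x, block1.y, block1.z);
    print_global_id<<<grid1, block1>>>();
    cudaDeviceSynchronize();  // wait for the kernel and flush its printf output
    return 0;
}
```

With this configuration, thread (1,1,1) of block (1,1,0) has local ID 7 and block ID 3, so its global ID is 3 * 8 + 7 = 31. The order in which the 32 lines are printed is not guaranteed.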