Samples for CUDA developers that demonstrate features of the CUDA Toolkit
Updated Apr 10, 2024 - C
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Deep learning in Rust, with shape checked tensors and neural networks
CUDA C++ Core Libraries
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
CUDA Kernel Benchmarking Library
Safe Rust wrapper around the CUDA toolkit
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
Kernel Tuner
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010.
Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPUs/GPUs, NVIDIA, and AMD hardware without writing any additional C kernel code. Write your function in .NET and Amplifier will take care of running it on your favorite hardware.
Some CUDA design patterns and a bit of template magic for CUDA
Spiking Neural Networks in C++ with strong GPU acceleration through CUDA
CUDA kernel author's tools
Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research
A tool for examining GPU scheduling behavior.
High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.
(REOS) Radar and Electro-Optical Simulation Framework written in C++.
Astrophysics program simulating the evolution of star systems, based on the fast multipole method on adaptive octrees
Implementation of the conjugate gradient method using C and NVIDIA CUDA