NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT. (C++, updated May 24, 2024)
A WebGL accelerated JavaScript library for training and deploying ML models.
WebGL-accelerated ML // linear algebra // automatic differentiation for JavaScript.
A highly efficient implementation of Gaussian Processes in PyTorch
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models
Deep learning toolkit-enabled VLSI placement
Sionna: An Open-Source Library for Next-Generation Physical Layer Research
BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
Run, compile, and execute JavaScript for scientific computing and data visualization entirely in your browser: an open-source scientific computing environment for JavaScript with GPU-accelerated matrix operations, TeX support, data visualization, and symbolic computation.
A high performance anime upscaler
Stretching GPU performance for GEMMs and tensor contractions.
Fast Neural Machine Translation in C++ - development repository
A series of GPU optimization topics covering in detail how to optimize CUDA kernels, including several basic kernels: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is at or near the theoretical limit.
NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
CUDA C++ Core Libraries
TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
GPU-accelerated Deep Learning on Windows 10 native
Deep learning in Rust, with shape checked tensors and neural networks
A hardware-accelerated GPU terminal emulator focused on running on desktops and in browsers.