Pulse · pytorch/pytorch · GitHub

April 23, 2024 – April 30, 2024

Overview

166 Active pull requests

254 Active issues

1 Release published by 1 person

v2.3.0 PyTorch 2.3: User-Defined Triton Kernels in torch.compile, Tensor Parallelism in Distributed
published Apr 24, 2024

2 Pull requests merged by 2 people

[EZ] Get rid of utf-8 quotes
#124932 merged Apr 25, 2024
Specify the exact table we upload metrics to
#124321 merged Apr 23, 2024

164 Pull requests opened by 105 people

Add basic sanity checks for graph ops to cache key
#124745 opened Apr 23, 2024
Fix issue 112919
#124746 opened Apr 23, 2024
Update descriptor fields to resolve fft precision issue
#124756 opened Apr 23, 2024
Allow device tensors that use numpy for serialization to use weights_only unpickler
#124763 opened Apr 23, 2024
[FSDP] Errored on wrapping `ModuleList`/`ModuleDict`
#124764 opened Apr 23, 2024
[DCP] Adds storage metadata, and passes it during the save path
#124772 opened Apr 23, 2024
Add Sanity Testing to Pytorch Profiler
#124773 opened Apr 23, 2024
[sparse] Fix type-dispatch errors
#124777 opened Apr 23, 2024
Attach stack traces to plain callables when using aot_export_joint_simple
#124792 opened Apr 23, 2024
Meta kernel for _pack_padded_sequence
#124794 opened Apr 23, 2024
[rfc]: vendor in open-telemetry
#124800 opened Apr 23, 2024
[NT] Support NestedTensor in is_concrete_int
#124803 opened Apr 23, 2024
Include support for the scatter gather cuda kernels to allow for comp…
#124809 opened Apr 24, 2024
Upgrade nightly wheels to rocm6.1
#124811 opened Apr 24, 2024
[DONT MERGE][dynamo] Turn on inlining of nn modules
#124815 opened Apr 24, 2024
Prevent rendezvous shutdown on worker restarts
#124819 opened Apr 24, 2024
[inductor] add uint8 SDPA pattern
#124832 opened Apr 24, 2024
[privateuse1] _refs.masked_fill support privateuse1 when value.device.type is cpu
#124835 opened Apr 24, 2024
update xla pin
#124839 opened Apr 24, 2024
[RFC] Switch from black to ruff fmt
#124845 opened Apr 24, 2024
[not for land] check CI with symint set_() fixes
#124867 opened Apr 24, 2024
[dtensor] implement shard dim change with alltoall
#124872 opened Apr 24, 2024
[pipelining] Add util and debug facilities
#124875 opened Apr 24, 2024
[dtensor] delete the old unused mesh_alltoall
#124879 opened Apr 24, 2024
[dynamo][eval_frame] Create a dynamic wrapper fn to avoid cache collisions
#124881 opened Apr 24, 2024
Added FixedQParam per-tensor observer for weight tensors
#124883 opened Apr 24, 2024
Add Efficient Attention support on ROCM
#124885 opened Apr 24, 2024
[easy] remove some unnecessary windows skips for mmap tests
#124891 opened Apr 24, 2024
[export] handle weight sharing in FQN mapping + unflattening
#124892 opened Apr 24, 2024
test_cuda.py test_grad_scaling_autocast_fused_optimizers migration to OptimizerInfo
#124893 opened Apr 24, 2024
[MPS] Remove in place views (causes too many crashes)
#124895 opened Apr 25, 2024
Implemented isin_Tensor_Tensor_out for MPS backend
#124896 opened Apr 25, 2024
[optim]fix ut and sgd kernel
#124904 opened Apr 25, 2024
[optim] add fused_adagrad support for CPU device
#124905 opened Apr 25, 2024
[onnx.export] Avoid linear look up in env for exist_in_env
#124909 opened Apr 25, 2024
[onnx.export] Cache SetGraphInputTypeReliable
#124912 opened Apr 25, 2024
test
#124916 opened Apr 25, 2024
Modify device check in capturable optimizer to support more devices
#124919 opened Apr 25, 2024
remove empty partition
#124920 opened Apr 25, 2024
Support aten operations with out tensor
#124926 opened Apr 25, 2024
[Inductor max autotune] Make autotune_select_algorithm more robust
#124928 opened Apr 25, 2024
[Inductor cutlass backend] Remove epilogue nodes from Kernel call
#124929 opened Apr 25, 2024
[Inductor cutlass backend] Fix cutlass_utils.get_max_alignment() for strided layouts.
#124930 opened Apr 25, 2024
[DCP] Provides default AsyncStager
#124939 opened Apr 25, 2024
[DCP] Move async logic into filesystem for better encapsulation
#124944 opened Apr 25, 2024
[export] disable_forced_specializations
#124949 opened Apr 25, 2024
[pipelining] Add stage backward function
#124958 opened Apr 25, 2024
Wrap the test func with try/except to always call destroy_process_group
#124961 opened Apr 25, 2024
[dtensor] Make distribute_tensor support distributing DTensors
#124962 opened Apr 25, 2024
[wip] add comm params in et
#124963 opened Apr 25, 2024
PT2 Inductor ComboKernels
#124969 opened Apr 25, 2024
Delete erroneous print
#124972 opened Apr 25, 2024
[dynamo] Add ID_MATCH guards on inlined functions to force compilation on monkeypatching
#124975 opened Apr 25, 2024
Fix bfloat16 serialization for ONNXProgram.save
#124977 opened Apr 25, 2024
[dynamo] support torchbind object input
#124978 opened Apr 25, 2024
WIP: [Inductor] log fusion failure due to loop orders
#124986 opened Apr 26, 2024
[Distributed] [7/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d
#124987 opened Apr 26, 2024
add meta for segment_reduce_backward
#124988 opened Apr 26, 2024
Print flexibility for tensor output
#124991 opened Apr 26, 2024
Remove Inductor IRs for legacy functional collectives
#124992 opened Apr 26, 2024
Enable UFMT on `test/test_datapipe.py`
#124994 opened Apr 26, 2024
[BE] Remove JNI from libtorch builds
#124995 opened Apr 26, 2024
[ROCm] Enable int_mm_error tests for rocm 6.0+
#124999 opened Apr 26, 2024
[Torch] Add more mm kernel choices
#125000 opened Apr 26, 2024
Convert `ForeachFuncInfo` to `dataclass`
#125001 opened Apr 26, 2024
Fix process group initialize twice error in distributed launcher test
#125006 opened Apr 26, 2024
Adding Compare in torch.utils.benchmark documentation
#125009 opened Apr 26, 2024
[DataLoader] Select available CUDA or 3rd devices automatically to pin memory
#125016 opened Apr 26, 2024
[AMD] [Draft] New inductor gemm configs
#125017 opened Apr 26, 2024
[BE] Remove static files from libtorch builds
#125027 opened Apr 26, 2024
[BE]: Update ruff to v0.4.2
#125031 opened Apr 26, 2024
[WIP] Introduce bind_unbacked HOP
#125034 opened Apr 26, 2024
[Profiler] Add TSC Clock Callback to CUPTI
#125036 opened Apr 26, 2024
[Torch][Timer] Skip expired timer logging for empty expired timers
#125039 opened Apr 26, 2024
[unwind] replace LONG_LONG_MAX by the portable LLONG_MAX
#125043 opened Apr 26, 2024
Remove caffe2 image and video
#125045 opened Apr 26, 2024
[WIP] Added a function to calculate the deterministic version of MaxPool3D …
#125048 opened Apr 26, 2024
[TD] Enable td on cpu windows
#125049 opened Apr 26, 2024
[Draft] Warning message when user calls unscriptable component
#125053 opened Apr 26, 2024
[aoti] Add pt2 package save/load (python)
#125054 opened Apr 26, 2024
[BE] Make macos build libtorch with wheel
#125060 opened Apr 26, 2024
[dynamo, 3.12] xfail refleaking tests due to buggy getattr_static
#125062 opened Apr 26, 2024
Skip ONNX optimization when it fails
#125063 opened Apr 26, 2024
Force specialization of bool in scaled_dot_product_attention
#125067 opened Apr 26, 2024
[ROCm] Implement forward AD for miopen_batch_norm
#125069 opened Apr 26, 2024
Updating optims and combining torch functions
#125071 opened Apr 26, 2024
Forcing specialization of bool from symbool
#125072 opened Apr 26, 2024
[DTensor] allow numel 1 tensor operand to be implicitly replicate DTensor
#125073 opened Apr 26, 2024
Adding state_dicts test to adam_test
#125076 opened Apr 26, 2024
forward fix preferred blas backend and windows CI
#125080 opened Apr 26, 2024
add uuid in cudaDeviceProperties
#125083 opened Apr 26, 2024
[dnl] add NCCL/PT debug log for S413673
#125085 opened Apr 27, 2024
[WIP] Implement matrix_exp Batching Rule
#125086 opened Apr 27, 2024
[inductor][easy] add buffer layout to SchedulerNode.debug_str
#125088 opened Apr 27, 2024
[inductor] add triton code to SchedulerNode.debug_str
#125089 opened Apr 27, 2024
Remove caffe2 db
#125092 opened Apr 27, 2024
[Test][Distributed] Make more tests multi-threaded.
#125095 opened Apr 27, 2024
[pruning]add dropout to list of supported activation functions
#125101 opened Apr 27, 2024
Enable clang-tidy coverage on torch/csrc/distributed/c10d/*
#125102 opened Apr 27, 2024
[WIP] make torch.amp.autocast more generic
#125103 opened Apr 27, 2024
Add extra cuda_to_hip_mappings.py
#125108 opened Apr 27, 2024
Allow linalg.lstsq to use svd to compute the result for rank deficient matrices.
#125110 opened Apr 28, 2024
Enable UFMT on test_indexing&test_view_ops
#125112 opened Apr 28, 2024
Use BFloat16 in distributed quantization when supported by NCCL
#125113 opened Apr 28, 2024
Move autocast op list to autocast_mode.h to make sure other backends can reuse it.
#125114 opened Apr 28, 2024
Add propagate_real_tensors mode for unbacked
#125115 opened Apr 28, 2024
Enable UFMT on `test_decomp.py`, `test_expanded_weights.py` and some files
#125117 opened Apr 28, 2024
Refactor and Fix Some Problems on Autocast
#125118 opened Apr 28, 2024
[inductor] Check if n is the input tensor of conv_pointwise
#125119 opened Apr 28, 2024
[Storage_ipc] Provides IPC extensions for 3rd devices.
#125122 opened Apr 28, 2024
Updated test_graph_optims and test_graph_scaling_fused_optimizers to use new OptimizerInfo infrastructure
#125127 opened Apr 28, 2024
Fix exception handling in torch._dynamo.utils.same and add corresponding test
#125132 opened Apr 29, 2024
Fix bug in graph partitioner and update graph signature after partitioning.
#125133 opened Apr 29, 2024
[PT2][Optimus] Read the patterns from the config instead of hard-code passes
#125136 opened Apr 29, 2024
Add templated attention BLOCK_M & BLOCK_N default size for different head_dim
#125139 opened Apr 29, 2024
Remove Caffe2 python
#125143 opened Apr 29, 2024
Fix AttributeError when doing mock patch for FileTimerServerTest.test_expired_timers
#125144 opened Apr 29, 2024
add avx512 specialization for vec_shuffle_down
#125147 opened Apr 29, 2024
save the reciprocal of weights for welford_reduce
#125148 opened Apr 29, 2024
[inductor][cpp] move some common cpp utils to cpp_utils.py
#125152 opened Apr 29, 2024
Merge the pyi files into py files of optimizer
#125153 opened Apr 29, 2024
Test benchmark suite with data dependent options on for dynamic shapes
#125156 opened Apr 29, 2024
WIP save to cache
#125157 opened Apr 29, 2024
[inductor] autotune benchmark support for cpu
#125159 opened Apr 29, 2024
Renable running before
#125160 opened Apr 29, 2024
[MPS] And naive int8 and int4 Linear
#125163 opened Apr 29, 2024
temp
#125164 opened Apr 29, 2024
ignore verify placeholders
#125165 opened Apr 29, 2024
fix more invalid inputs, remove fuse
#125166 opened Apr 29, 2024
Disable running before and update error message to be less verbose
#125167 opened Apr 29, 2024
Enable running before
#125168 opened Apr 29, 2024
Enable fuse
#125169 opened Apr 29, 2024
[ATen][CUDA][AMP] Fix dtype mismatch in linalg_vector_norm
#125175 opened Apr 29, 2024
TEST: add some random logging
#125176 opened Apr 29, 2024
Add a space on APPEND to CUDA flags
#125178 opened Apr 29, 2024
Refactored _remove_auto_functionalization_from_graph_helper
#125180 opened Apr 29, 2024
Add `write_record_metadata` to PyTorchFileWriter
#125184 opened Apr 29, 2024
[export] Don't create a new fake mode if dynamo tracing
#125185 opened Apr 29, 2024
Use torch._check for safety assert in _reshape_view_helper
#125187 opened Apr 29, 2024
Don't short circuit if shape is same
#125188 opened Apr 29, 2024
[export] Fix for unflattening modules with duplicate tensors
#125192 opened Apr 29, 2024
[ez][CI] Move test_modules and test_schema_check off CI_SERIAL_LIST
#125193 opened Apr 29, 2024
Fix bug in get_update_constraint
#125194 opened Apr 29, 2024
Fix mem size mismatch from split/chunk in const folding
#125199 opened Apr 29, 2024
[compiled autograd] compile fwd in inference mode
#125201 opened Apr 29, 2024
[DONT MERGE][FOR CI][dynamo] Turn on guard_nn_modules
#125202 opened Apr 29, 2024
[dynamo] support inactive context managers across graph breaks
#125203 opened Apr 30, 2024
FP8 rowwise scaling
#125204 opened Apr 30, 2024
Fix PT2E Dynamic Quant regression
#125207 opened Apr 30, 2024
[quant][pt2e] Fix conv-bn weight + bias per channel QAT
#125208 opened Apr 30, 2024
Export `torch.jit.interface` from `torch.jit` package
#125209 opened Apr 30, 2024
[Don't Merge] dump inductor build command.
#125210 opened Apr 30, 2024
Enable AOTI shim v2 build and add into libtorch
#125211 opened Apr 30, 2024
Intel GPU: specify the tolerance for torchbench models
#125213 opened Apr 30, 2024
fix loading optimizer options from archive
#125215 opened Apr 30, 2024
Require nnz==0 in sparse meta tensors
#125221 opened Apr 30, 2024
[autocast] using new autocast api with device name.
#125225 opened Apr 30, 2024
fix: typo
#125226 opened Apr 30, 2024
Fix logic to find sbgemm in BLAS library
#125227 opened Apr 30, 2024
Fix random_mps_impl to accept non-contiguous tensors
#125231 opened Apr 30, 2024
Change to fix cuda python on corp based cluster (#117789)
#125232 opened Apr 30, 2024
[AOTI] Update C-shim codegen to handle rocm
#125233 opened Apr 30, 2024
[ncclx] Rename NCCL-EXP to NCCLX
#125238 opened Apr 30, 2024
[Inductor] Properly package target info for triton.compile
#125241 opened Apr 30, 2024

116 Issues closed by 43 people

Cudnn 9 is out!
#119400 closed Apr 30, 2024
Add option to `torch.load(mmap=True)` to do `MAP_SHARED` rather than `MAP_PRIVATE`
#124528 closed Apr 30, 2024
DISABLED test_comprehensive_nn_functional_conv1d_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#123874 closed Apr 30, 2024
DISABLED test_super_resolution_cuda (__main__.TestModels)
#105332 closed Apr 30, 2024
DISABLED test_torchvision_models_efficientnet_v2_m (__main__.TestVisionTracing)
#124152 closed Apr 30, 2024
device_mesh / fsdp issue with _get_device_handle
#124327 closed Apr 30, 2024
Unable to use `torch.compile()` with triton's `TensorWrapper` in custom triton kernel
#124601 closed Apr 30, 2024
torch._dynamo.exc.Unsupported: comparison AutogradFunctionVariable() <built-in function is_not> ConstantVariable(NoneType)
#125140 closed Apr 30, 2024
ERROR: No matching distribution found for torchvision==0.6.0+cu121
#124587 closed Apr 30, 2024
[inductor][cpu]LoweringException: AttributeError: 'NoneType' object has no attribute 'get_origin_node' in 2024-04-20 nightly release
#124844 closed Apr 30, 2024
DISABLED test_torchvision_models_alexnet (__main__.TestVisionTracing)
#123908 closed Apr 30, 2024
problematic math backend for F.scaled_dot_product_attention in ROCm 6.0 when testing using vllm for generate
#119389 closed Apr 30, 2024
DISABLED test_open_device_registration (__main__.TestCppExtensionOpenRgistration)
#100152 closed Apr 29, 2024
How can i train my model with torchrun on multiple GPUs but without DDP?
#125012 closed Apr 29, 2024
[typing] `nn.Parameter` return type identified as `Tensor` by `pyright`
#125105 closed Apr 29, 2024
DISABLED test_max_autotune_remote_caching_dynamic_False (__main__.TestMaxAutotune)
#121166 closed Apr 29, 2024
DISABLED test_expand_cuda (__main__.TestUnbackedSymintsCUDA)
#124074 closed Apr 29, 2024
DISABLED test_binary_op_list_error_cases__foreach_add_cuda_bool (__main__.TestForeachCUDA)
#122900 closed Apr 29, 2024
DISABLED test_max_autotune_remote_caching_dynamic_True (__main__.TestMaxAutotune)
#121194 closed Apr 29, 2024
Dataloader codeowner
#124473 closed Apr 29, 2024
torch.export.export doesn't capture input argument names from the module's forward(...) function.
#122842 closed Apr 29, 2024
CI with >8G CUDA memory
#18856 closed Apr 29, 2024
No continuous integration coverage for Python 2 CUDA
#21467 closed Apr 29, 2024
Auto-applying labels if one-or-more Labels are applied?
#117051 closed Apr 29, 2024
[ignore this] Testing
#125189 closed Apr 29, 2024
Warning can not be disabled ?
#123626 closed Apr 29, 2024
Bug when indexing 2D tensors using an MPS device
#125100 closed Apr 29, 2024
Accessing the `device` attribute of bias terms of `TransformerEncoderLayer` initialized with `bias = False` causes Attribute error
#125015 closed Apr 29, 2024
[inductor] Get wrong results when supporting module buffer mutation
#124583 closed Apr 29, 2024
[dynamo] "step unsupported" graph break will make dynamo can't completely trace code after break
#125138 closed Apr 29, 2024
use bfloat16 on nvidia V100 GPU
#124996 closed Apr 29, 2024
doc link failure of `torch.compile`
#125123 closed Apr 29, 2024
DISABLED test_triton_kernel_extern_kernel_arg_non_abi_compatible_cuda (__main__.AOTInductorTestNonABICompatibleCuda)
#118545 closed Apr 29, 2024
DISABLED test_triton_kernel_reinterpret_view_mem_leak_non_abi_compatible_cuda (__main__.AOTInductorTestNonABICompatibleCuda)
#118652 closed Apr 29, 2024
DISABLED test_triton_kernel_multi_output_arg_non_abi_compatible_cuda (__main__.AOTInductorTestNonABICompatibleCuda)
#118632 closed Apr 29, 2024
DISABLED test_triton_kernel_reinterpret_view_mem_leak_abi_compatible_cuda (__main__.AOTInductorTestABICompatibleCuda)
#118640 closed Apr 29, 2024
DISABLED test_triton_kernel_extern_kernel_arg_abi_compatible_cuda (__main__.AOTInductorTestABICompatibleCuda)
#118544 closed Apr 29, 2024
DISABLED test_torchvision_models_regnet_y_8gf (__main__.TestVisionTracing)
#123977 closed Apr 29, 2024
DISABLED test_torchvision_models_regnet_y_16gf (__main__.TestVisionTracing)
#123976 closed Apr 29, 2024
GuardOnDataDependent error with differentiable output that has data dependent size
#124766 closed Apr 28, 2024
Can't fine-tune a MeloTTS model on my dataset in Google Colab
#125128 closed Apr 28, 2024
`torch.autocast` produces confusing error message when passing `torch.device`
#124738 closed Apr 28, 2024
Running speechbrain demo on aarch64 with pytorch 2.1.2 is much slower than pytorch 1.10.0
#123143 closed Apr 28, 2024
[inductor][cpu]basic_gnn_gcn AMP static/dynamic shape default/cpp wrapper single thread performance regression
#123502 closed Apr 28, 2024
TOR901 lint is too aggressive
#125050 closed Apr 27, 2024
[numpy] Add torch.newdim/torch.newaxis
#65307 closed Apr 27, 2024
Compiled model raises error "attn_bias is not correctly aligned" in pytorch 2.2
#121943 closed Apr 27, 2024
Unexpected instruction types specified for 'sub' on TIMM seresnext26d_32x4d model
#118589 closed Apr 27, 2024
UNSTABLE rocm / linux-focal-rocm6.0-py3.8 / test (default)
#119908 closed Apr 27, 2024
[MPS] `torch.nextafter` incorrect handling of negative inputs
#124985 closed Apr 27, 2024
[INDUCTOR] [CPU] [GPT-FAST-MOE] large perf regression with coordinate_descent_tuning disabled
#124697 closed Apr 27, 2024
DISABLED test_comprehensive_fft_fft_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#122715 closed Apr 26, 2024
[ONNX] Memory leak
#86518 closed Apr 26, 2024
torch dynamo's ORT backend uses "ort" dispatch key instead of "maia"
#124966 closed Apr 26, 2024
DISABLED test_all_to_all_single_inductor (__main__.TestFunctionalAutograd)
#123933 closed Apr 26, 2024
DISABLED test_load_tensor_cuda (__main__.TestContentStoreCUDA)
#123849 closed Apr 26, 2024
DISABLED test_buffer_mutation_3_non_abi_compatible_cuda (__main__.AOTInductorTestNonABICompatibleCuda)
#123321 closed Apr 26, 2024
DISABLED test_equivalent_backed_unbacked_cuda (__main__.TestUnbackedSymintsCUDA)
#123947 closed Apr 26, 2024
"torch._dynamo.exc.Unsupported: torch.* op returned non-Tensor bool call_method is_complex" error
#122692 closed Apr 26, 2024
[v.2.3.0] Release Tracker
#121760 closed Apr 26, 2024
Add a GitHub actions workflow for Macos
#63466 closed Apr 26, 2024
a weird bug in torch.compile
#124817 closed Apr 26, 2024
h
#124993 closed Apr 26, 2024
DISABLED test_torchvision_models_efficientnet_b1 (__main__.TestVisionTracing)
#123889 closed Apr 26, 2024
DISABLED test_broadcast_tensors_cuda (__main__.TestUnbackedSymintsCUDA)
#123862 closed Apr 26, 2024
DISABLED test_autotuning_cuda (__main__.TestUnbackedSymintsCUDA)
#123729 closed Apr 26, 2024
DISABLED test_conv_unary_fusion_nnc (__main__.TestMkldnnFusion)
#123905 closed Apr 26, 2024
DISABLED test_output_misaligned_non_abi_compatible_cuda (__main__.AOTInductorTestNonABICompatibleCuda)
#123818 closed Apr 26, 2024
DISABLED test_basic_cuda (__main__.TestContentStoreCUDA)
#100209 closed Apr 26, 2024
DISABLED test_torchvision_models_maxvit_t (__main__.TestVisionTracing)
#123918 closed Apr 26, 2024
DISABLED test_comprehensive_ones_cuda_int64 (__main__.TestInductorOpInfoCUDA)
#123837 closed Apr 26, 2024
DISABLED test_torchvision_models_efficientnet_b3 (__main__.TestVisionTracing)
#123891 closed Apr 26, 2024
DISABLED test_torchvision_models_efficientnet_b7 (__main__.TestVisionTracing)
#123890 closed Apr 26, 2024
DISABLED test_torchvision_models_densenet169 (__main__.TestVisionTracing)
#123907 closed Apr 26, 2024
DISABLED test_batched_mm_bfloat16_bs_10_cuda_bfloat16 (__main__.TestDecompCUDA)
#123728 closed Apr 26, 2024
[2.3 dynamic shapes] backend='inductor' raised: LoweringException: AssertionError: indices must be int64, byte or bool. Got [torch.float32]
#124006 closed Apr 26, 2024
Support benchmark fusion for TemplateKernel
#108716 closed Apr 26, 2024
DISABLED test_comprehensive_amax_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#124640 closed Apr 26, 2024
[onnx] support more combinations of args/kwargs as model inputs for pytorch-onnx converter
#81478 closed Apr 26, 2024
VecISA.__bool__ is very expensive (nearly a second) on startup
#100378 closed Apr 25, 2024
on MPS, torch.embedding, Linear and others raise: RuntimeError: Placeholder storage has not been allocated on MPS device!
#123995 closed Apr 25, 2024
Torch nightly wheels no longer include `torchgen` YAML files
#124941 closed Apr 25, 2024
Export serializes empty list as list of bools
#123480 closed Apr 25, 2024
[Dynamo] Unsupported: missing: DELETE_SUBSCR
#123317 closed Apr 25, 2024
XML test-reports get overwritten in case of retry
#123882 closed Apr 25, 2024
How to catch NCCL collective timeout in Python
#124887 closed Apr 25, 2024
Maybe consider vendoring opentelemetry by hand, rather than submodule
#124612 closed Apr 25, 2024
UNSTABLE rocm
#124951 closed Apr 25, 2024
Dataloader crashes with FileNotFoundError randomly when num_workers>0 on Ubuntu 22.04
#124903 closed Apr 25, 2024
torch.compile : RuntimeError: "foreach_tensor_copy" not implemented for 'Int'
#124170 closed Apr 25, 2024
pytorch Windows MKL cmake file don't support static link mkl.
#124869 closed Apr 25, 2024
Backwards with cat with data-dependent sizes doesn't work
#124652 closed Apr 25, 2024
Support CUDA 12.4
#104417 closed Apr 25, 2024
`Enum` used as a key of the input raises guards error
#111603 closed Apr 25, 2024
Find a bug from beta-released "scaled_dot_product_attention"
#124464 closed Apr 25, 2024
Interpolate nearest
#121390 closed Apr 25, 2024
[dynamo] Tracing triton kernel unexpectedly
#122768 closed Apr 25, 2024
Substitutions result in unbacked SymInts showing up before their definition sites
#123854 closed Apr 25, 2024
SDPA + torch.compile: (*bias): last dimension must be contiguous
#124289 closed Apr 24, 2024
Correctly handle `F.interpolate` upsample with amp
#121072 closed Apr 24, 2024
[ONNX] Discuss improvements to Diagnostic public API
#103713 closed Apr 24, 2024
topk, bmm, max ops - Multiple dispatch failed for 'torch.ops.aten.size'; all __torch_dispatch__ handlers returned NotImplemented:
#122772 closed Apr 24, 2024
Not loading optimizer state separately from checkpoint causes errors with FQNs
#124546 closed Apr 24, 2024
Switch `libopenblas` and `libopenblas-dev` to `libopenblas64` and `libopenblas64-dev`
#123534 closed Apr 24, 2024
Checkpointed function does not preserve `requires_grad` if output is a dataclass.
#124725 closed Apr 24, 2024
Release 2.3 manual validations
#123736 closed Apr 24, 2024
Validate cherry-picks for release 2.3
#123734 closed Apr 24, 2024
`aminmax` will trigger INTERNAL ASSERT if input is empty on cuda
#85439 closed Apr 24, 2024
`test_triton_scaled_dot_product_attention_block_size_16_cuda_bfloat16` is broken on A100
#124333 closed Apr 24, 2024
[PREEMPTIVE] Migration for ARC runners - Possible disruption of jobs or increased queue times
#124831 closed Apr 24, 2024
[Profiler] Maybe append a kernel to an unrelated event.
#124388 closed Apr 24, 2024
schedular
#124367 closed Apr 24, 2024
AOT Inductor / Support dict[string][tensor] as the argument of the C++ AOTIModelContainerRunnerCpu::run functions (same for cuda)
#121785 closed Apr 23, 2024
Clean way to distinguish python subclass NT vs. C++ NT
#110543 closed Apr 23, 2024
Verbose log: [__aot_joint_graph] could not reconstruct view by re-applying a ViewMeta sequence.
#124499 closed Apr 23, 2024
In `_scaled_mm`, `scale_result` not changing result tensor at all
#119135 closed Apr 23, 2024

138 Issues opened by 87 people

DISABLED test_bmm_multithreaded (__main__.TestTorch)
#125240 opened Apr 30, 2024
Improved strategy for dealing with deterministically flaky tests which are order sensitive
#125239 opened Apr 30, 2024
torch.no_grad() is not working for dynamo inductor backend
#125236 opened Apr 30, 2024
[Inductor] [Distributed] DDP torch.compile model hangs on exit (python 3.8/3.9)
#125235 opened Apr 30, 2024
torch.Library can easily cause segfault on loading/unloading
#125234 opened Apr 30, 2024
ROCm: `fatal error: aotriton/flash.h: No such file or directory` when building with `USE_ROCM=1`
#125230 opened Apr 30, 2024
DISABLED test_variable_traverse (__main__.TestAutogradWithCompiledAutograd)
#125229 opened Apr 30, 2024
DISABLED test_issue106555 (__main__.TestCompiledAutograd)
#125228 opened Apr 30, 2024
Strange behavior of randint using device=cuda
#125224 opened Apr 30, 2024
torch.uniform_() is single-threaded on CPU
#125223 opened Apr 30, 2024
DISABLED test_var_mean_differentiable (__main__.TestAutogradWithCompiledAutograd)
#125220 opened Apr 30, 2024
DISABLED test_inplace_grad_update (__main__.TestCompiledAutograd)
#125219 opened Apr 30, 2024
DISABLED test_perfect_match_on_sequence_and_bool_attributes (__main__.TestFxToOnnx)
#125218 opened Apr 30, 2024
MaxPool2D memory leakage on device MPS
#125217 opened Apr 30, 2024
torch.inference_mode documentation not available
#125216 opened Apr 30, 2024
[NT] Implementing Multi-Head Attention with NestedTensors
#125214 opened Apr 30, 2024
DISABLED test_type_conversions (__main__.TestAutogradWithCompiledAutograd)
#125206 opened Apr 30, 2024
DISABLED test_unused_output (__main__.TestAutogradWithCompiledAutograd)
#125205 opened Apr 30, 2024
[CUDA][Complex] `test_reference_numerics_large_jiterator_unary_cuda_complex64` broken after updating to `numpy >= 1.25.0`
#125198 opened Apr 29, 2024
Pytorch running on macosx-14.xlarge reports MPS is available when it is not.
#125197 opened Apr 29, 2024
DISABLED test_too_many_grads (__main__.TestAutogradWithCompiledAutograd)
#125195 opened Apr 29, 2024
multithreaded autograd backward doesn't respect autocast dtype context manager
#125186 opened Apr 29, 2024
torch/_refs/__init__.py is not autoreloadable in bento / jupyter notebook
#125183 opened Apr 29, 2024
QR Decomposition for Sparse Matrix
#125182 opened Apr 29, 2024
Mitigate pypi issue with space (short term)
#125179 opened Apr 29, 2024
Can't load on rank 0 only with `set_optimizer_state_dict`
#125177 opened Apr 29, 2024
[CUDA][AMP] Size-1 (scalar) norms are broken on CUDA + AMP following #122143
#125174 opened Apr 29, 2024
Flight Recorder Sequence IDs are insufficient
#125173 opened Apr 29, 2024
datetime.now() is not supported by Dynamo
#125171 opened Apr 29, 2024
Calling `get_model_state_dict/set_model_state_dict` requires forward pass for `_lazy_init`
#125170 opened Apr 29, 2024
`scale` parsed as `float` in ONNX `scaled_dot_product_attention` implementation
#125158 opened Apr 29, 2024
DISABLED test_comprehensive_special_bessel_y1_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#125151 opened Apr 29, 2024
DISABLED test_slice_expanded_v (__main__.TestAutogradWithCompiledAutograd)
#125149 opened Apr 29, 2024
Race condition in FileTimerServerTest.test_expired_timers
#125146 opened Apr 29, 2024
The "step unsupported" graph break will make dynamo can't completely trace code after break
#125141 opened Apr 29, 2024
Tensor.abs() gives incorrect results on Complex64 when using MPS
#125135 opened Apr 29, 2024
binaries/dump_operator_names.cc missing iostream include
#125134 opened Apr 29, 2024
DISABLED test_sharded_grad (__main__.TestAutogradWithCompiledAutograd)
#125130 opened Apr 29, 2024
FP6 dtype!
#125129 opened Apr 28, 2024
torch.is_signed on new uint dtypes raises Unknown ScalarType
#125124 opened Apr 28, 2024
Does torch.nn.Linear check the weights shape before assignment?
#125116 opened Apr 28, 2024
2.3.0 on Windows missing dependency?
#125109 opened Apr 27, 2024
Further explanation for `batch_isend_irecv`
#125099 opened Apr 27, 2024
RuntimeError: MPS: Unsupported Border padding mode
#125098 opened Apr 27, 2024
PyTorch Distributed Load Updates or Returns `state_dict`
#125096 opened Apr 27, 2024
Broken Docker Image on dockerhub
#125094 opened Apr 27, 2024
Triton installation not found.
#125093 opened Apr 27, 2024
[Distributed] P2P Operations on NCCL do not respect tag
#125079 opened Apr 26, 2024
`torch.compile` fails with `jacfwd` when multiplying/dividing float and tensor
#125078 opened Apr 26, 2024
[Inductor] Generate triton block pointers for discontiguous strided tensors
#125077 opened Apr 26, 2024
Inductor can not fuse cat with a pointwise
#125075 opened Apr 26, 2024
DISABLED test_setting_default_saved_variable_hooks_twice_should_use_inner (__main__.TestAutogradWithCompiledAutograd)
#125074 opened Apr 26, 2024
2.2.0+ regresses SDPA performance on Windows
#125070 opened Apr 26, 2024
ValueError: weight_norm of 'weight' not found in ParametrizedConvTranspose1d
#125064 opened Apr 26, 2024
404 on torch.inference_mode doc page
#125059 opened Apr 26, 2024
DISABLED test_sdpa_backwards_cuda_bfloat16 (__main__.TestNestedTensorSubclassCUDA)
#125058 opened Apr 26, 2024
DISABLED test_foreach_matches_forloop_RAdam_cpu_float64 (__main__.TestOptimRenewedCPU)
#125057 opened Apr 26, 2024
DISABLED test_select_expanded_v (__main__.TestAutogradWithCompiledAutograd)
#125056 opened Apr 26, 2024
DISABLED test_dynamic_shapes (__main__.TestCompiledAutograd)
#125055 opened Apr 26, 2024
MPS backend thinks that small floats are less than zero
#125051 opened Apr 26, 2024
Support auto_functionalized for None returns
#125044 opened Apr 26, 2024
[Feature Request] Support `dtype` arg in `torch._foreach_norm`
#125040 opened Apr 26, 2024
DISABLED test_binary_op_list_error_cases__foreach_clamp_max_cuda_float64 (__main__.TestForeachCUDA)
#125035 opened Apr 26, 2024
Tensorboard's SummaryWriter.add_graph() doesn't work with packedSequence
#125033 opened Apr 26, 2024
Allow operators with no returns to be "out=" operations
#125030 opened Apr 26, 2024
OpInfo testing is inadequate for `nextafter`
#125028 opened Apr 26, 2024
CUDA memory summary in FSDP all_gather causes excessive noise.
#125025 opened Apr 26, 2024
DISABLED test_custom_fn_output_metadata (__main__.TestCompiledAutograd)
#125024 opened Apr 26, 2024
DISABLED test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float32 (__main__.TestForeachCUDA)
#125022 opened Apr 26, 2024
DISABLED test_saved_variable_packing_unpacking_saved_original_with_default_hooks (__main__.TestAutogradWithCompiledAutograd)
#125023 opened Apr 26, 2024
DISABLED test_allocation_id_uniqueness (__main__.TestTorchTidyProfiler)
#125021 opened Apr 26, 2024
DISABLED test_unbacked_cat_backwards_cuda (__main__.TestInductorDynamicCUDA)
#125019 opened Apr 26, 2024
DISABLED test_save_with_without_initializer_dont_include_initializer_no_fake_mode_no_exported_program (__main__.TestFxToOnnx)
#125020 opened Apr 26, 2024
libtorch C++ windows: The specified module could not be found. mkl_vml_def.1.dll
#125013 opened Apr 26, 2024
Tensor can not be accessed!
#125010 opened Apr 26, 2024
Support exporting the compute graph JiT compiled by inductor?
#125007 opened Apr 26, 2024
RuntimeError: Unsupported qscheme: per_channel_affine. When using quantise_fx(
#125004 opened Apr 26, 2024
bool as scalar in jit ir API not working as expected.
#125003 opened Apr 26, 2024
When I set the operator weights all to 1, the operator weights change as soon as an input is made to the operator
#124998 opened Apr 26, 2024
[DTensor] Sharding strategy not implemented error should be thrown earlier
#124990 opened Apr 26, 2024
[DTensor] Keep track of data-dependent ops and skip _propagate_tensor_meta to go thru fake tensor
#124989 opened Apr 26, 2024
[Releng] Triton version for minor releases starting with 2.4
#124974 opened Apr 25, 2024
Export llama v3 to ONNX
#124973 opened Apr 25, 2024
Give the possibility to get back normal `Tensor`s as `MaskedTensor` gradients
#124964 opened Apr 25, 2024
`ascii` error during recompilation after cache limit is reached (likely due to `einsum`)
#124960 opened Apr 25, 2024
torch.onnx.export generates an incorrect bias shape for certain conv transpose models
#124956 opened Apr 25, 2024
PRs with unbalanced quotes can not be merged
#124953 opened Apr 25, 2024
[distributed] First NCCL barrier does not respect timeout
#124950 opened Apr 25, 2024
torch.compile fails on hugging face Mistral7b
#124946 opened Apr 25, 2024
Support torch.distributed.checkpoint.state_dict.get_model_state_dict/set_model_state_dict when torch dist is not initialized
#124942 opened Apr 25, 2024
Conflict between bias=False and why_not_sparsity_fast_path in Transformer Module
#124937 opened Apr 25, 2024
Tensor's storage changes computation outcome for CPU tensors.
#124934 opened Apr 25, 2024
[RFC] Support reinplaceble ops for custom ops in Inductor
#124933 opened Apr 25, 2024
Missing description of Transformer argument "memory_mask" shape in 3D (including the batch dimension) case
#124931 opened Apr 25, 2024
Time taken to data loading increased in newer builds (ARM)
#124922 opened Apr 25, 2024
Custom Operator Design for torch.compile: Must Output Tensors Always Be Returned?
#124918 opened Apr 25, 2024
Exporting torch slice_scatter to onnx Identity
#124915 opened Apr 25, 2024
[inductor][cpu]shufflenet_v2_x1_0 QAT performance regression in 2024-04-20 nightly release
#124913 opened Apr 25, 2024
DataLoader's pin_memory is default to CUDA if parameter pin_memory_device is not set
#124908 opened Apr 25, 2024
2.3.0 not backward compatible with torchdata
#124907 opened Apr 25, 2024
Many cases in distributed/elastic/multiprocessing/redirects_test.py fails when use pytest
#124906 opened Apr 25, 2024
Add support for IPC features for PrivateUse devices
#124902 opened Apr 25, 2024
`RuntimeError: invalid dtype for bias` when use compile + autocast
#124901 opened Apr 25, 2024
[dynamo] Crash when context manager object crosses a graph break
#124900 opened Apr 25, 2024
Certain .pyi files are not encoded as UTF-8 in Windows
#124897 opened Apr 25, 2024
Windows build step did not fail despite error
#124886 opened Apr 24, 2024
ONNX dynamic sized model export with torch.onnx.dynamo_export fails when torch.nn.functional.interpolate is used
#124884 opened Apr 24, 2024
Nested wrapper subclasses with torch.compile is broken
#124878 opened Apr 24, 2024
SDPA memory efficient kernel returns NaNs when the query and key are different lengths
#124877 opened Apr 24, 2024
tensor.dtype.to_complex() crashes kernel after ~100 calls in ipython kernel
#124868 opened Apr 24, 2024
support side effects in HOPs?
#124866 opened Apr 24, 2024
Deprecate unsupported types in operator registration
#124863 opened Apr 24, 2024
torch._dynamo.assume_constant_result does not work outside nn.Module
#124858 opened Apr 24, 2024
Use lintrunner-adapters for our adapters when possible
#124857 opened Apr 24, 2024
ShapeEnv canonicalization is still over-aggressive
#124855 opened Apr 24, 2024
[inductor] unexpected cuda:0 device usage when compiling and running a model on cuda:1
#124854 opened Apr 24, 2024
aten::nonzero calls taking a huge amount of time when using MPS backend vs CPU
#124850 opened Apr 24, 2024
Forwardpropagation of high-dimensional tensors through the nn.Linear module becomes multiple times slower since pytorch 2.1.0
#124838 opened Apr 24, 2024
[inductor][cpu]RuntimeError: no channels last format strides exist in 1 dimensions in 2024-04-21 nightly release
#124837 opened Apr 24, 2024
nn.functional.ELU output differs on MPS vs CPU if input is noncontiguous
#124834 opened Apr 24, 2024
MPS RNG state fails to progress immediately after fork_rng
#124833 opened Apr 24, 2024
Add privateuse1 check on capturable optimizer
#124830 opened Apr 24, 2024
Compiled autograd doesn't support reading attribute from autograd context if the attribute is created in eager forward
#124827 opened Apr 24, 2024
CI is downloading model definitions and model weights from external sources
#124825 opened Apr 24, 2024
torch.compile error: Unsupported reduction type from torch.float32 to torch.int64
#124821 opened Apr 24, 2024
[RFC] Mix and Match CUDA Allocators using Private Pools
#124807 opened Apr 24, 2024
Dynamo Export Support for Qwen/Qwen-7B-Chat: Mutating module attribute _ntk_alpha_cached_list during export
#124796 opened Apr 23, 2024
Dynamo Export Support for Google/Gemma-2B: Mutating module attribute inv_freq during export
#124793 opened Apr 23, 2024
Caffe2 usage of cuDNN RNNv6 API blocks upgrade to cuDNN v9+
#124790 opened Apr 23, 2024
torch.nn.checkpoint.checkpoint ignores default device in backward() call
#124788 opened Apr 23, 2024
Scatter_add limitation when accumulate beyond 2^24 under float32 precision
#124783 opened Apr 23, 2024
Dense-sparse broadcasted multiplication fails in the backward pass
#124778 opened Apr 23, 2024
Release artifacts for rc releases
#124759 opened Apr 23, 2024
DISABLED test_comprehensive_randn_like_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#124758 opened Apr 23, 2024
DISABLED test_saved_variable_packing_unpacking_did_not_save_original_with_hooks (__main__.TestAutogradWithCompiledAutograd)
#124757 opened Apr 23, 2024
Disable PYTORCH_TEST_WITH_DYNAMO=1 tests over the AOTAutograd tests
#124750 opened Apr 23, 2024
Improve Dynamo (and other) flaky tests
#124749 opened Apr 23, 2024
TestAOTAutograd.test_mem_leak_from_save_for_bw fails locally but not on CI when run with dynamo
#124747 opened Apr 23, 2024

367 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[dynamo] Automatically convert loop bodies to function calls
#113538 commented on Apr 26, 2024 • 30 new comments
[Meta Tensor] fix meta inplace set storage
#123880 commented on Apr 29, 2024 • 25 new comments
[vision hash update] update the pinned vision hash
#123227 commented on Apr 30, 2024 • 24 new comments
[NT] Make NestedTensor register as having symbolic sizes/strides
#124687 commented on Apr 27, 2024 • 22 new comments
[executorch hash update] update the pinned executorch hash
#123043 commented on Apr 30, 2024 • 21 new comments
[traced-graph][sparse] propagate sparsity metadata into traced graph
#117907 commented on Apr 30, 2024 • 20 new comments
Setup initial testing harness and cache key generation for AOTAutograd Cache
#124642 commented on Apr 30, 2024 • 19 new comments
UserWarning: Plan failed with a cudnnException
#121834 commented on Apr 30, 2024 • 19 new comments
ARC dynamic rollout
#124721 commented on Apr 30, 2024 • 18 new comments
TorchInductor CPU Performance Dashboard
#93531 commented on Apr 30, 2024 • 16 new comments
Add CUDA 12.4 workflows
#121684 commented on Apr 30, 2024 • 16 new comments
Refresh OpOverloadPacket if a new OpOverload gets added
#124654 commented on Apr 26, 2024 • 16 new comments
Add helper function to decompose and inline nn module
#123683 commented on Apr 29, 2024 • 15 new comments
fix Invalid call to aoti_torch_tensor_copy_ #123039
#124037 commented on Apr 30, 2024 • 14 new comments
Add a cache mechanism to accelerate torch.compile-for-eager
#116368 commented on Apr 29, 2024 • 12 new comments
optim.apply_optimizer_in_backward does not account for gradient accumulation
#124523 commented on Apr 30, 2024 • 11 new comments
[RFC] Per-Parameter-Sharding FSDP
#114299 commented on Apr 29, 2024 • 11 new comments
General MPS op coverage tracking issue
#77764 commented on Apr 30, 2024 • 10 new comments
Fakify script object inputs and attributes for non-strict export
#124239 commented on Apr 30, 2024 • 10 new comments
[dynamo] Refactor into torch/_inductor/runtime/compile_tasks.py
#124681 commented on Apr 30, 2024 • 9 new comments
[inductor] Remove usage of device_interface from _inductor.runtime
#124592 commented on Apr 30, 2024 • 8 new comments
[custom_op] use new python custom ops API on prims ops
#124665 commented on Apr 30, 2024 • 8 new comments
Verify types in custom op schemas
#124520 commented on Apr 26, 2024 • 8 new comments
[DeviceMesh] Fix hash and eq not match
#123572 commented on Apr 27, 2024 • 7 new comments
ARC runners timeout during Docker updates
#124727 commented on Apr 29, 2024 • 7 new comments
set_default_device/torch.device has performance impact for non-factory functions
#92701 commented on Apr 24, 2024 • 7 new comments
[DCP] Introduce async staging extension points
#122965 commented on Apr 25, 2024 • 7 new comments
[ROCm] TunableOp improvements
#124362 commented on Apr 30, 2024 • 7 new comments
Fixes two build problems on ROCM 6.1 + Ubuntu 22.04
#118216 commented on Apr 23, 2024 • 6 new comments
inductor: Add Conv3d support
#124361 commented on Apr 29, 2024 • 6 new comments
Allow tensor subclasses and add `torch.serialization.mark_safe_globals` that allows users to allowlist classes for `weights_only` load
#124331 commented on Apr 30, 2024 • 6 new comments
Fix typo under torch/_inductor directory
#119658 commented on Apr 30, 2024 • 6 new comments
[RFC] Autoload Device Extension
#122468 commented on Apr 26, 2024 • 6 new comments
[Memory Snapshot] Add recordAnnotations to capture record_function annotations
#124179 commented on Apr 26, 2024 • 6 new comments
Add a variable for some testcases.
#124708 commented on Apr 29, 2024 • 6 new comments
[DCP] Adds strict option to DefaultPlanner
#123869 commented on Apr 26, 2024 • 6 new comments
[c10d] abort a communicator at most once
#124436 commented on Apr 29, 2024 • 5 new comments
[guards][cpp-guards] Optimize NN module getattr guards
#124522 commented on Apr 30, 2024 • 5 new comments
Better core binding in torch.backends.xeon.run_cpu when launched from torchrun with --nproc-per-node
#123711 commented on Apr 30, 2024 • 4 new comments
Investigate torch.compile Windows support.
#122094 commented on Apr 30, 2024 • 4 new comments
Add LR as tensor tests
#123750 commented on Apr 26, 2024 • 4 new comments
Initial implementation of Inductor FX Graph Remote Cache
#124669 commented on Apr 23, 2024 • 4 new comments
Pytorch Conda nightly build failures due to timeout
#124667 commented on Apr 29, 2024 • 4 new comments
torch.Tensor.remainder raises a floating point exception when divisor is -1
#124644 commented on Apr 29, 2024 • 4 new comments
RuntimeError: derivative for aten::_scaled_dot_product_flash_attention_backward is not implemented
#116350 commented on Apr 29, 2024 • 4 new comments
Enable dynamo-traced optimizer peak memory tests
#124543 commented on Apr 25, 2024 • 4 new comments
[dynamo] Support numpy.dtype
#124481 commented on Apr 24, 2024 • 4 new comments
Fix the specific setting in documentation to match with elsewhere
#124463 commented on Apr 27, 2024 • 4 new comments
Fix for addcdiv contiguous problem
#124442 commented on Apr 25, 2024 • 4 new comments
[Quant][PT2E] enable qlinear post op fusion for dynamic quant & qat
#122667 commented on Apr 30, 2024 • 3 new comments
torch.compiled model output gets overwritten despite tensor.detach()
#104435 commented on Apr 24, 2024 • 3 new comments
Remove caffe2/
#122527 commented on Apr 30, 2024 • 3 new comments
[PT2D] Make the speedup benchmark works with DDP + CompiledAutograd
#121315 commented on Apr 24, 2024 • 3 new comments
[ONNX] STFT ExportProgram error
#113504 commented on Apr 26, 2024 • 3 new comments
DISABLED test_split_with_sizes_aot_autograd_cleans_up_traceback_meta (__main__.AotAutogradFallbackTests)
#122767 commented on Apr 26, 2024 • 3 new comments
DISABLED test_split_with_sizes_aot_autograd_cleans_up_traceback_meta_dynamic_shapes (__main__.DynamicShapesAotAutogradFallbackTests)
#122766 commented on Apr 26, 2024 • 3 new comments
Implement Copy-on-write (COW) tensors
#109833 commented on Apr 26, 2024 • 3 new comments
aten::_linalg_solve_ex.result' is not currently implemented for the MPS
#98222 commented on Apr 27, 2024 • 3 new comments
ARM libtorch links to libomp but libomp is no longer bundled
#124732 commented on Apr 29, 2024 • 3 new comments
DISABLED test_comprehensive_special_bessel_y1_cuda_int64 (__main__.TestInductorOpInfoCUDA)
#123919 commented on Apr 30, 2024 • 3 new comments
DISABLED test_free_activation_memory (__main__.TestCompiledAutograd)
#123949 commented on Apr 30, 2024 • 3 new comments
s390x: remove workaround for sleef issue
#124730 commented on Apr 25, 2024 • 3 new comments
Fast standalone symbolize for unwinding
#123966 commented on Apr 25, 2024 • 3 new comments
[ROCm][CI] upgrade CI to ROCm 6.1
#124300 commented on Apr 29, 2024 • 3 new comments
[draft] cuda 124 arm wheel test
#124112 commented on Apr 26, 2024 • 3 new comments
[DO NOT MERGE] Test new ROCm CI nodes
#124424 commented on Apr 30, 2024 • 3 new comments
[dynamo] Unexpected SymBool appearing in "is_causal" inside scaled_dot_product_attention()
#124707 commented on Apr 24, 2024 • 3 new comments
[torch.compile][FlopCounter] AssertionError: Global is not OptimizedModule._orig_mod
#124196 commented on Apr 23, 2024 • 3 new comments
[xla hash update] update the pinned xla hash
#124599 commented on Apr 29, 2024 • 3 new comments
DISABLED test_mark_non_differentiable (__main__.TestAutogradWithCompiledAutograd)
#124470 commented on Apr 30, 2024 • 2 new comments
Add aten._unsafe_masked_index
#116491 commented on Apr 29, 2024 • 2 new comments
DISABLED test_buffer_mutation_3_abi_compatible_cuda (__main__.AOTInductorTestABICompatibleCuda)
#123251 commented on Apr 28, 2024 • 2 new comments
torch.load with weights_only=True to support pickle protocol 3/4/5
#118166 commented on Apr 25, 2024 • 2 new comments
Revisit security implications of #31875
#111806 commented on Apr 27, 2024 • 2 new comments
[Environment Variable][1/N] Use thread-safe env variable API in c10
#119449 commented on Apr 24, 2024 • 2 new comments
sdp::SDPBackend::flash_attention support PrivateUse1
#124368 commented on Apr 30, 2024 • 2 new comments
DISABLED test_isolated_node (__main__.TestAutogradWithCompiledAutograd)
#124460 commented on Apr 26, 2024 • 2 new comments
[inductor] switch assume_aligned_inputs to False
#124336 commented on Apr 29, 2024 • 2 new comments
Torch compile does not work on python 3.12
#120233 commented on Apr 26, 2024 • 2 new comments
Investigate Strictness of torch.compile `is_big_gpu`
#109489 commented on Apr 26, 2024 • 2 new comments
Should be able to query a schema for HOPs
#119592 commented on Apr 26, 2024 • 2 new comments
CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx
#124262 commented on Apr 26, 2024 • 2 new comments
Optionally use hipblaslt
#120551 commented on Apr 26, 2024 • 2 new comments
[Profiler] NCCL collectives text garbled and times reported in ns
#124102 commented on Apr 24, 2024 • 2 new comments
Ensure only builtins functions are wrapped in new frame for torch.compile
#124720 commented on Apr 30, 2024 • 2 new comments
Avoid using thrust:: directly, use THRUST_NS_QUALIFIER:: instead
#72582 commented on Apr 30, 2024 • 2 new comments
RX 6800 GPU reset when using ROCm in Stable Diffusion with Torch backend (not sure if relevant)
#120775 commented on Apr 30, 2024 • 2 new comments
GroupNorm & InstanceNorm does not handle channels_last correctly
#111824 commented on Apr 29, 2024 • 2 new comments
Update CUDA out of memory message with private pool info
#124673 commented on Apr 24, 2024 • 2 new comments
Fails to compile with nvidia-cuda-toolkit-12.4.0
#122169 commented on Apr 30, 2024 • 2 new comments
DISABLED test_multi_backward (__main__.TestAutogradWithCompiledAutograd)
#124491 commented on Apr 30, 2024 • 2 new comments
Unbacked SymInts: Should backwards graph with unbacked SymInts be recompiled with hints
#124686 commented on Apr 23, 2024 • 2 new comments
ROCm & Windows Support
#106608 commented on Apr 29, 2024 • 2 new comments
torch.compile does not work since 2.2.1 on MacOS for some models
#124497 commented on Apr 23, 2024 • 2 new comments
TypeError: unhashable type: non-singleton SymInt in AOTAutograd merge_view_inputs
#114366 commented on Apr 29, 2024 • 2 new comments
Correct error message for aten::_local_scalar_dense on meta tensor
#124554 commented on Apr 24, 2024 • 2 new comments
benchmark.Compare raises: TypeError: object of type 'NoneType' has no len()
#63971 commented on Apr 29, 2024 • 2 new comments
[RFC] PyTorch DistributedTensor
#88838 commented on Apr 29, 2024 • 2 new comments
[triton hash update] update the pinned triton hash
#115529 commented on Apr 29, 2024 • 2 new comments
Fix common_utils's retry decorator, add run_tests call to test_hub
#116067 commented on Apr 30, 2024 • 2 new comments
DataLoader num_workers > 0 causes CPU memory from parent process to be replicated in all worker processes
#13246 commented on Apr 28, 2024 • 2 new comments
[2/N] Non-Tensor: Scalar Support: Add scalar to the cache for eager-through-torch.compile
#124070 commented on Apr 28, 2024 • 2 new comments
input.is_sparse() INTERNAL ASSERT FAILED
#120989 commented on Apr 24, 2024 • 2 new comments
No factory functions for strided quantized tensors
#74540 commented on Apr 25, 2024 • 2 new comments
scatter_reduce method do not support complex number multiplication on CUDA
#121965 commented on Apr 24, 2024 • 2 new comments
Placing LSTM model on bfloat16 on GPU causes error
#88136 commented on Apr 25, 2024 • 2 new comments
Add 2nd shard to ROCm trunk workflow for core distributed UTs
#121716 commented on Apr 25, 2024 • 2 new comments
`torch.distributed` hangs when using `torch.distributed.barrier` before any other communication primitives.
#124714 commented on Apr 25, 2024 • 2 new comments
[Inductor][Quant] Change the QConv output scale name
#124246 commented on Apr 30, 2024 • 2 new comments
upstream `apex.normalization.FusedRMSNorm`
#72643 commented on Apr 25, 2024 • 2 new comments
Grad strides do not match bucket view strides.
#47163 commented on Apr 25, 2024 • 2 new comments
Improving format of communication metadata in PyTorch Execution Trace.
#124674 commented on Apr 25, 2024 • 2 new comments
Grad strides do not match bucket view strides
#83909 commented on Apr 25, 2024 • 2 new comments
Batching rule for `aten::_scaled_dot_product_efficient_attention`
#102457 commented on Apr 25, 2024 • 2 new comments
[Performance] Potential Performance optimization for SDPA
#100270 commented on Apr 26, 2024 • 2 new comments
Fixed an undefined combination of inputs for torch.fmod.
#120624 commented on Apr 27, 2024 • 2 new comments
make tensor data const correct
#97856 commented on Apr 26, 2024 • 2 new comments
profiler.export_stacks doesn't return stack trace unless experimental_config is provided
#100253 commented on Apr 30, 2024 • 1 new comment
Clang tidy torch csrc16
#120573 commented on Apr 25, 2024 • 1 new comment
Skip fx passes for split-cat with Node dims
#124629 commented on Apr 30, 2024 • 1 new comment
Tensor.nonzero fails on GPU for tensors containing more than INT_MAX elements
#51871 commented on Apr 30, 2024 • 1 new comment
Massive initial memory overhead GPU
#12873 commented on Apr 30, 2024 • 1 new comment
Request for deterministic support for reflection_pad2d_backward_cuda
#98925 commented on Apr 30, 2024 • 1 new comment
Prefer construction via DLPack to costly element-by-element copy
#120615 commented on Apr 28, 2024 • 1 new comment
Upgrade submodule oneDNN to v3.4
#122472 commented on Apr 30, 2024 • 1 new comment
RuntimeError: reflection_pad2d_backward_cuda does not have a deterministic implementation
#123843 commented on Apr 30, 2024 • 1 new comment
Fix a check message in pickler
#120701 commented on Apr 27, 2024 • 1 new comment
While loop autograd
#124573 commented on Apr 26, 2024 • 1 new comment
functionalize storage resizing, minimal ppFSDP traceable forward
#122434 commented on Apr 25, 2024 • 1 new comment
[dtensor] from_local broadcast use functional collective
#120457 commented on Apr 24, 2024 • 1 new comment
Remove dtype check on meta device
#120634 commented on Apr 26, 2024 • 1 new comment
Connection closed by peer when using dist.isend in gloo backend
#75512 commented on Apr 30, 2024 • 1 new comment
[feature request] Rank-Revealing QR - Adding dgeqp3 support to torch.qr
#10454 commented on Apr 30, 2024 • 1 new comment
Some fixups
#124658 commented on Apr 24, 2024 • 1 new comment
torch native functions cannot be used with inspect.signature
#28233 commented on Apr 30, 2024 • 1 new comment
Make CI less noisy
#124664 commented on Apr 29, 2024 • 1 new comment
Cannot re-initialize CUDA in forked subprocess
#40403 commented on Apr 30, 2024 • 1 new comment
[Quant][Inductor] Enable lowering of qlinear-binary(-unary) fusion for X86Inductor
#122593 commented on Apr 29, 2024 • 1 new comment
Fix DDP no_sync when find_unused_parameters is True
#124193 commented on Apr 26, 2024 • 1 new comment
DISABLED test_index (__main__.TestPythonBuiltinOP)
#119160 commented on Apr 30, 2024 • 1 new comment
torch.utils.cpp_extension.load recompiling every time
#124454 commented on Apr 30, 2024 • 1 new comment
Fix absolute links in pytorch repository and allow it to be proxied
#101798 commented on Apr 30, 2024 • 1 new comment
[dynamo] Function => FunctionCtx for placeholder obj
#120577 commented on Apr 29, 2024 • 1 new comment
[1/N] Non-Tensor: Scalar Support: Enable aot compile to support aten operations with scalar input like alpha
#124177 commented on Apr 28, 2024 • 1 new comment
[tensor] Replace raw loops with std::reduce for size calc.
#120580 commented on Apr 25, 2024 • 1 new comment
Understand the oneDNN graph fusion with torch script
#124458 commented on Apr 30, 2024 • 1 new comment
[inductor] add cpp builder code.
#124045 commented on Apr 30, 2024 • 1 new comment
Prevent cuda:0 context initialization when working on another cuda device
#124722 commented on Apr 24, 2024 • 1 new comment
Set simdlen based on ATEN_CPU_CAPABILITY
#123514 commented on Apr 30, 2024 • 1 new comment
[WIP] support map impl in pre-dispatch IR
#120159 commented on Apr 30, 2024 • 1 new comment
[executorch] Add support for method variant functions in ExecuTorch codegen
#120840 commented on Apr 30, 2024 • 1 new comment
Fix dynamo issue "Failed running call_function <built-in method sparse_coo_tensor of type object at 0xDEADBEEF"
#118192 commented on Apr 28, 2024 • 1 new comment
ProcessGroupWrapper support custom backend
#124447 commented on Apr 28, 2024 • 1 new comment
[dynamo] fix compiling Dataclass construction with default_factory
#120827 commented on Apr 29, 2024 • 1 new comment
[MPS] Add support for max_unpool2d
#118665 commented on Apr 26, 2024 • 1 new comment
[ROCm] hipSPARSELt Integration
#124320 commented on Apr 30, 2024 • 1 new comment
[ROCm] amdsmi library integration
#119182 commented on Apr 24, 2024 • 1 new comment
[typing] Rename argument of `nn.Sequential.forward` from `input` to `__input`
#119209 commented on Apr 28, 2024 • 1 new comment
Fix missing parameter check in at::batch_norm
#119361 commented on Apr 29, 2024 • 1 new comment
Fix stream type to generic in comms default hooks
#120069 commented on Apr 25, 2024 • 1 new comment
[Distributed] Add P2P versions of *object_list operations
#124379 commented on Apr 26, 2024 • 1 new comment
[Inductor][AMD] Enable pipeliner for Gemm
#120637 commented on Apr 26, 2024 • 1 new comment
[Don't merge] Refactor device bound check for xpu code
#120768 commented on Apr 29, 2024 • 1 new comment
[fbcode] Upstream parallel fast cat on cpu in OSS cat op
#120753 commented on Apr 28, 2024 • 1 new comment
Update expecttest in conda env
#120711 commented on Apr 27, 2024 • 1 new comment
DISABLED test_mm_batching (__main__.TestScript)
#119747 commented on Apr 30, 2024 • 1 new comment
Add default values to PyTorchMemEffAttention::AttentionKernel::Params members
#112215 commented on Apr 30, 2024 • 1 new comment
[sym_shapes][perf] Optimize bound_sympy avoiding sympy equals
#124211 commented on Apr 23, 2024 • 1 new comment
Fix `as_strided` functionalization for lazy backend.
#120435 commented on Apr 28, 2024 • 1 new comment
[dynamo] Handle np.iinfo/finfo/dtype as input
#124482 commented on Apr 24, 2024 • 1 new comment
Add dist hooks support for custom device
#114730 commented on Apr 27, 2024 • 1 new comment
Add back non standard shapes test samples for SDPA in common_methods_…
#115464 commented on Apr 26, 2024 • 1 new comment
[FAILURE] quantized test
#120941 commented on Apr 30, 2024 • 1 new comment
Enable test_embedding_bag_device_* with PYTORCH_TEST_WITH_DYNAMO
#120884 commented on Apr 29, 2024 • 1 new comment
Add _to_copy op for jagged NT
#115749 commented on Apr 29, 2024 • 1 new comment
[dynamo] fix silent incorrectness caused by variable tracker caching
#120861 commented on Apr 29, 2024 • 1 new comment
Some update
#124450 commented on Apr 29, 2024 • 1 new comment
[not4land] Batch norm consolidation disable xfails/skips
#120844 commented on Apr 29, 2024 • 1 new comment
Add `torch._dynamo.is_fullgraph_compiling` to allow different codepath depending on fullgraph tracing
#120400 commented on Apr 26, 2024 • 1 new comment
[WIP] inductor use rand4x
#117125 commented on Apr 30, 2024 • 1 new comment
Hacks to work around the fact that ScriptMethod does not have code/signature
#124449 commented on Apr 29, 2024 • 1 new comment
[inductor][cpu]GPT2ForSequenceClassification AMP static/dynamic shape default/cpp wrapper single thread accuracy crash
#123503 commented on Apr 24, 2024 • 1 new comment
Package manager install on Nvidia Grace Hopper does not make cuda available
#123835 commented on Apr 24, 2024 • 1 new comment
HOP dispatch isn't faithful
#124484 commented on Apr 24, 2024 • 1 new comment
Placeholder tensor is empty!
#123171 commented on Apr 24, 2024 • 1 new comment
Dynamo Export: Support for PixelShuffle
#124338 commented on Apr 24, 2024 • 1 new comment
arm64-v8a not compiling due to libpytorch_jni.so
#51020 commented on Apr 24, 2024 • 1 new comment
torch.compiler.disable doesn't disable nested functions (also doesn't work as a context manager)
#123771 commented on Apr 24, 2024 • 1 new comment
c10::CUDAError
#67978 commented on Apr 25, 2024 • 1 new comment
Batch size is hardcoded using torch.jit.trace with LSTMCell
#59530 commented on Apr 25, 2024 • 1 new comment
NCCL error of PyTorch 2.1.0 when using multiple gpus
#113245 commented on Apr 25, 2024 • 1 new comment
Doesn't work when register hook to torch.nn.MultiheadAttention.out_proj
#78109 commented on Apr 25, 2024 • 1 new comment
jit.freeze throws RuntimeError: stack_out && stack_out->size() == 1 INTERNAL ASSERT FAILED at "../torch/csrc/jit/passes/frozen_conv_folding.cpp":281
#80861 commented on Apr 25, 2024 • 1 new comment
calling nn.utils.parametrize inside torch.compile leads to error
#115744 commented on Apr 25, 2024 • 1 new comment
torch._dynamo.exc.Unsupported: call_function args: UserDefinedObjectVariable(EasyDict)
#120219 commented on Apr 25, 2024 • 1 new comment
Update test_cuda.py and test_torch.py optim tests to use OptimizerInfo and optim_db
#123451 commented on Apr 25, 2024 • 1 new comment
PyTorch 2.0.0 encountered CUDA error: an illegal memory access was encountered
#99372 commented on Apr 25, 2024 • 1 new comment
Transformer Engine Checkpointing Broken on Torch 2.3
#122946 commented on Apr 25, 2024 • 1 new comment
[inductor][cpu]pyhpc_turbulent_kinetic_energy AMP multithread static/dynamic shape default/cpp wrapper performance regression
#123801 commented on Apr 26, 2024 • 1 new comment
torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::unflatten' to ONNX opset version 12 is not supported.
#124716 commented on Apr 26, 2024 • 1 new comment
masked_fill supports PrivateUse1, when value.device.type is cpu
#124693 commented on Apr 26, 2024 • 1 new comment
Discrepancy between CPU->GPU and GPU->CPU data transfer speeds
#52718 commented on Apr 26, 2024 • 1 new comment
Advanced indexing with uint8 tensor versus int64 tensor is inconsistent
#20149 commented on Apr 26, 2024 • 1 new comment
Fused Linear and Cross-Entropy Loss `torch.nn.functional.linear_cross_entropy`
#124480 commented on Apr 26, 2024 • 1 new comment
DISABLED test_inplace_on_view_weak_grad_fn (__main__.TestAutogradWithCompiledAutograd)
#124453 commented on Apr 26, 2024 • 1 new comment
Unexpected modification to CPU affinity of Dataloader workers
#101850 commented on Apr 26, 2024 • 1 new comment
fusion in fx graph mode did not take care of direct attribute access
#68892 commented on Apr 26, 2024 • 1 new comment
DTensor + compile error's during backward when output is non-contiguous
#118219 commented on Apr 23, 2024 • 1 new comment
Significant performance degradation with multiprocessing in PyTorch 2.x compared to 1.13.1
#122626 commented on Apr 23, 2024 • 1 new comment
DISABLED test_aot_export_module_joint (__main__.TestAOTExport)
#124166 commented on Apr 23, 2024 • 1 new comment
DISABLED test_source_multithreaded_complex_work_in_main_thread_True (__main__.TestProfiler)
#119536 commented on Apr 23, 2024 • 1 new comment
torch_dispatch has unfaithful behavior w.r.t. wrapped numbers
#124731 commented on Apr 23, 2024 • 1 new comment
Importing polars before torch causes a segfault
#124656 commented on Apr 23, 2024 • 1 new comment
[torch.compile] torch._dynamo.exc.TorchRuntimeError: Failed running call_function <method 'numpy' of 'torch._C.TensorBase' objects>(*(FakeTensor(..., size=(32, 3, 64, 64)),), **{})
#124247 commented on Apr 23, 2024 • 1 new comment
`test_scatter_bf16_cuda` fails on V100
#118581 commented on Apr 23, 2024 • 1 new comment
DISABLED test_sparse_tensors (__main__.TestTorchTidyProfiler)
#124253 commented on Apr 23, 2024 • 1 new comment
Find a common home for decompositions, perhaps outside of the obliquely named _refs directory
#124427 commented on Apr 23, 2024 • 1 new comment
DISABLED test_save_on_cpu_and_checkpoint (__main__.TestAutogradWithCompiledAutograd)
#124706 commented on Apr 23, 2024 • 1 new comment
DISABLED test_forward_mode_AD_linalg_lu_cuda_float64 (__main__.TestFwdGradientsCUDA)
#86774 commented on Apr 23, 2024 • 1 new comment
DISABLED test_saved_variable_packing_unpacking_did_not_save_original_with_default_hooks (__main__.TestAutogradWithCompiledAutograd)
#124733 commented on Apr 23, 2024 • 1 new comment
DISABLED test_saved_tensor_hooks_custom_function_intermediates (__main__.TestAutogradWithCompiledAutograd)
#124723 commented on Apr 23, 2024 • 1 new comment
DISABLED test_inplace (__main__.TestAutogradWithCompiledAutograd)
#124446 commented on Apr 23, 2024 • 1 new comment
Add BufferDict container
#37386 commented on Apr 23, 2024 • 1 new comment
DISABLED test_source_multithreaded_multiple_preexisting_work_in_main_thread_True (__main__.TestProfiler)
#119576 commented on Apr 23, 2024 • 1 new comment
DISABLED test_source_multithreaded_open_in_scope_work_in_main_thread_True (__main__.TestProfiler)
#119668 commented on Apr 23, 2024 • 1 new comment
aot_export_joint_simple on plain callable (not graph module) doesn't attach stack traces
#102205 commented on Apr 23, 2024 • 1 new comment
MPS memory leak in training
#121113 commented on Apr 24, 2024 • 1 new comment
[nightly][jit] bad constant exponent (e+38.f) in default_program fused_mul_div_add
#107503 commented on Apr 24, 2024 • 1 new comment
Change `GradScaler` to respect an existing `grad_scale` value.
#123428 commented on Apr 24, 2024 • 1 new comment
CudaHostAlloc takes a lot of time during training
#124456 commented on Apr 24, 2024 • 1 new comment
RuntimeError: derivative for aten::_scaled_dot_product_efficient_attention_backward is not implemented
#117974 commented on Apr 24, 2024 • 1 new comment
ImportError `undefined symbol: iJIT_NotifyEvent` encountered when MKL 2024.1 is installed.
#123097 commented on Apr 24, 2024 • 1 new comment
[functorch] transforms like jacrev, jacfwd, grad, etc don't work with BatchNorm
#85533 commented on Apr 24, 2024 • 1 new comment
Export swallows exception
#111075 commented on Apr 27, 2024 • 1 new comment
Multi Scale Deformable Attention Support
#112827 commented on Apr 27, 2024 • 1 new comment
FSDP crashes when submodule calls method that isn't `forward()`
#109385 commented on Apr 28, 2024 • 1 new comment
[inductor][cpu] FP32/AMP models multiple/single thread static/dynamic shape default/CPP wrapper accuracy crash in 2024-04-14 nightly release
#124286 commented on Apr 28, 2024 • 1 new comment
Custom ROCm hip and C++ extensions (replicated from pytorch/tutorials)
#119429 commented on Apr 29, 2024 • 1 new comment
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on Apr 29, 2024 • 1 new comment
ONNX export is unnecessarily slow (O(N^2))
#121422 commented on Apr 28, 2024 • 1 new comment
[RFC] PyTorch next wheel build platform: manylinux-2.28
#123649 commented on Apr 28, 2024 • 1 new comment
vec_test_all_types_xxx with dtype c10::complex<float> and c10::complex<double> has failures on division
#104516 commented on Apr 29, 2024 • 1 new comment
ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory
#104259 commented on Apr 29, 2024 • 1 new comment
RecursionError when running torch.jit.script inside JitTestCase
#76881 commented on Apr 29, 2024 • 1 new comment
Add a requirements.txt for windows pip packages
#103354 commented on Apr 29, 2024 • 1 new comment
PyTorch Memory Management in GPU-to-CPU Transfers issue
#124487 commented on Apr 29, 2024 • 1 new comment
Compile doesn't guard on user NN module attribute
#124717 commented on Apr 29, 2024 • 1 new comment
Error: IndexError: map::at When using torch.distributed.all_reduce(tensor)
#116393 commented on Apr 29, 2024 • 1 new comment
IndexError: map::at with MPI CUDA collectives
#114040 commented on Apr 29, 2024 • 1 new comment
[feature request] np.packbits / np.unpackbits, general BitTensors (maybe can be just tensors with dtype torch.bits8 or have a new dtype torch.bits introduced) and bit packed tensors utilities for saving memory / accesses, support for BitTensors wherever BoolTensors are used
#32867 commented on Apr 29, 2024 • 1 new comment
DISABLED test_leaf_assignment (__main__.TestAutogradWithCompiledAutograd)
#124405 commented on Apr 29, 2024 • 1 new comment
[FX] Ability to wrap functions in other modules for symbolic tracing
#53534 commented on Apr 29, 2024 • 1 new comment
switch more test cases to use MultithreadTestCase
#108744 commented on Apr 27, 2024 • 1 new comment
Support using SymBool in arithmetics
#110738 commented on Apr 27, 2024 • 1 new comment
Loading traced pytorch model to C++
#124009 commented on Apr 27, 2024 • 1 new comment
[RFC] Dynamo Single Step Graph
#117394 commented on Apr 29, 2024 • 1 new comment
large model, low memory: need `torch.load` that loads one submodule at a time
#75242 commented on Apr 29, 2024 • 1 new comment
torch.onnx: operator 'aten::unflatten' to ONNX is not supported.
#121301 commented on Apr 26, 2024 • 1 new comment
Improve behaviour of `torch.linalg.lstsq` on CUDA GPU for rank-deficient matrices
#117122 commented on Apr 27, 2024 • 1 new comment
Libtorch crashes docker when included in header file
#124197 commented on Apr 27, 2024 • 1 new comment
Segfault in TCPStore and FileStore compare_set()
#123983 commented on Apr 29, 2024 • 1 new comment
[Meta Tensor] Inplace set storage of meta tensor will alter the storage's nbytes if meta tensor's nbytes is smaller
#123879 commented on Apr 29, 2024 • 1 new comment
DISABLED test_mark_non_differentiable_none (__main__.TestAutogradWithCompiledAutograd)
#124475 commented on Apr 30, 2024 • 1 new comment
Registering function that takes `const SymInt&` to op that accepts `SymInt` leads to cryptic error
#124645 commented on Apr 23, 2024 • 0 new comments
DISABLED test_default_partitioner_output_tensor_shape_tensor (__main__.TestPartitioning)
#124355 commented on Apr 29, 2024 • 0 new comments
DISABLED test_contiguous (__main__.TestPartitioning)
#124323 commented on Apr 29, 2024 • 0 new comments
DISABLED test_default_partitioner_getitem (__main__.TestPartitioning)
#124278 commented on Apr 29, 2024 • 0 new comments
DISABLED test_aot_export_simplified_basic (__main__.TestAOTExport)
#124254 commented on Apr 29, 2024 • 0 new comments
DISABLED test_aot_export_multiple_outputs_require_grad_banned (__main__.TestAOTExport)
#124221 commented on Apr 29, 2024 • 0 new comments
Conv with permutation on MPS will lead to negative MSE loss
#124621 commented on Apr 24, 2024 • 0 new comments
mps bug: failed assertion `[MPSNDArrayDescriptor sliceDimension:withSubrange:] error: subRange.start (6) is not less than length of dimension[0] (6)'
#96153 commented on Apr 24, 2024 • 0 new comments
[inductor][cpu]DebertaV2ForQuestionAnswering AMP static/dynamic shape multiple thread default wrapper regression
#122390 commented on Apr 24, 2024 • 0 new comments
torch.quantile on MPS doesn't sort values when dim is not None
#101878 commented on Apr 24, 2024 • 0 new comments
Foreach tests should xfail on all dtypes that are not supported
#124726 commented on Apr 23, 2024 • 0 new comments
[DONOTREVIEW][DTensor][Test] DTensor 2D sharding
#124339 commented on Apr 23, 2024 • 0 new comments
[inductor][cpu]adv_inception_v3, gluon_inception_v3 and inception_v3 AMP performance regression
#122393 commented on Apr 24, 2024 • 0 new comments
squash of flight_5 vs flightbase
#124229 commented on Apr 25, 2024 • 0 new comments
torch._export can't export resnet50 model
#124595 commented on Apr 23, 2024 • 0 new comments
flight51 squashed vs flightbase
#124236 commented on Apr 25, 2024 • 0 new comments
Automated submodule update: kineto
#106149 commented on Apr 30, 2024 • 0 new comments
fix a typo in the householder_product docs
#124279 commented on Apr 30, 2024 • 0 new comments
[Tracker] torch.sparse semi-structured 2.3 beta release
#115662 commented on Apr 24, 2024 • 0 new comments
[feature request] Caching allocator diagnostics and memory allocation tracing/visualization
#1529 commented on Apr 24, 2024 • 0 new comments
Dynamo-based ONNX Export: Failed to produce a graph during tracing as no tensor operations were found.
#123973 commented on Apr 23, 2024 • 0 new comments
DISABLED test_refcounts (__main__.TestTorchTidyProfiler)
#124220 commented on Apr 29, 2024 • 0 new comments
`torch.func.functional_call` doesn't work with compiled models
#97909 commented on Apr 23, 2024 • 0 new comments
[TESTING] Don't clamp upper to 2
#124631 commented on Apr 23, 2024 • 0 new comments
DISABLED test_binary_op_list_error_cases__foreach_add_cuda_int16 (__main__.TestForeachCUDA)
#124636 commented on Apr 23, 2024 • 0 new comments
squash of flight_5.3 vs flightbase
#124672 commented on Apr 25, 2024 • 0 new comments
DISABLED test_tensor_subclasses (__main__.TestScript)
#119949 commented on Apr 23, 2024 • 0 new comments
Output Discrepancy between PyTorch Model and Converted ONNX Model
#124711 commented on Apr 23, 2024 • 0 new comments
torch._dynamo.allow_in_graph seems to silently no-op on staticmethods
#124735 commented on Apr 23, 2024 • 0 new comments
DISABLED test_resnet18_backward_trace_cpu (__main__.TestPythonKeyCPU)
#124641 commented on Apr 29, 2024 • 0 new comments
Updated test_cuda.py optim tests to use OptimizerInfo
#124563 commented on Apr 28, 2024 • 0 new comments
[WIP][Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 3)
#124702 commented on Apr 27, 2024 • 0 new comments
inductor creates unnecessary buffers
#124653 commented on Apr 23, 2024 • 0 new comments
RFC: Turn on no-undefined
#124545 commented on Apr 30, 2024 • 0 new comments
DISABLED test_aot_module_simplified_preserves_stack_trace (__main__.TestAOTModuleSimplified)
#124609 commented on Apr 29, 2024 • 0 new comments
Stable Diffusion Model Error: torch._dynamo.exc.InternalTorchDynamoError: raw
#124477 commented on Apr 23, 2024 • 0 new comments
[supermodules] Remove all supermodule labels
#124521 commented on Apr 25, 2024 • 0 new comments
DISABLED test_aot_module_simplified_fake_tensor_gm_raises (__main__.TestAOTModuleSimplified)
#124590 commented on Apr 29, 2024 • 0 new comments
[minimizer] Add exclusion function to minimizer base
#124504 commented on Apr 30, 2024 • 0 new comments
[dynamo] Allow inlining of hooks for the top module
#124501 commented on Apr 30, 2024 • 0 new comments
[dynamo] Support ndarray.dtype attribute access
#124490 commented on Apr 24, 2024 • 0 new comments
fix torch.compile with triton kernels under inference_mode
#124489 commented on Apr 26, 2024 • 0 new comments
DISABLED test_aot_module_simplified_dynamic (__main__.TestAOTModuleSimplified)
#124510 commented on Apr 29, 2024 • 0 new comments
DISABLED test_aot_module_simplified (__main__.TestAOTModuleSimplified)
#124476 commented on Apr 29, 2024 • 0 new comments
DISABLED test_aot_dispatch_incorrect_backward (__main__.TestAOTDispatch)
#124459 commented on Apr 29, 2024 • 0 new comments
[WIP] [Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 2)
#124147 commented on Apr 27, 2024 • 0 new comments
[codemod][lowrisk] Remove extra semi colon from caffe2/c10/core/SymNodeImpl.h
#123055 commented on Apr 27, 2024 • 0 new comments
[FSDP2] Eager-Mode Execution Tracker
#120003 commented on Apr 29, 2024 • 0 new comments
[onnx.export] Avoid linear loop over symbol_dim_map
#123029 commented on Apr 25, 2024 • 0 new comments
FlexAttention isn't using decompositions
#124643 commented on Apr 24, 2024 • 0 new comments
Add Gaudi support to benchmarks/dynamo/* benchmark.
#122960 commented on Apr 24, 2024 • 0 new comments
[WIP][Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 1)
#122866 commented on Apr 27, 2024 • 0 new comments
Make c10::Error empty backtrace as an optional argument
#122611 commented on Apr 26, 2024 • 0 new comments
[typing] Rename argument of `nn.Sequential.forward` from `input` to `__input`
#119208 commented on Apr 28, 2024 • 0 new comments
Avoid always building stack trace strings in c10::Error
#122086 commented on Apr 26, 2024 • 0 new comments
[WIP] Arm64 Enablement
#117274 commented on Apr 27, 2024 • 0 new comments
[FSDP] Use generic device handle instead of cuda
#121620 commented on Apr 24, 2024 • 0 new comments
[Inductor Cutlass backend] DO NOT REVIEW - to be split up
#121492 commented on Apr 25, 2024 • 0 new comments
custom ops should have needs_fixed_stride_order by default
#124647 commented on Apr 25, 2024 • 0 new comments
Conflict between ``torch.func`` transformations and ``torch.jit.trace``
#98724 commented on Apr 25, 2024 • 0 new comments
Add `ciflow/inductor` for test only changes
#118206 commented on Apr 23, 2024 • 0 new comments
lintrunner should fail on badly formatted docstrings
#102227 commented on Apr 25, 2024 • 0 new comments
Bfloat16 tensor .numpy() support
#90574 commented on Apr 25, 2024 • 0 new comments
[ONNX] stft export fails with dynamo_export
#113067 commented on Apr 25, 2024 • 0 new comments
No module named 'caffe2' when using `add_scalar` with string
#119195 commented on Apr 25, 2024 • 0 new comments
[dynamo] Validate check_fn
#118448 commented on Apr 25, 2024 • 0 new comments
CUDAGraph Tree TORCH_CHECK failed when NCCL operator exists.
#124391 commented on Apr 26, 2024 • 0 new comments
[PT2] Return int32 indices in max_pool2d_with_indices
#103785 commented on Apr 26, 2024 • 0 new comments
torch.normal ignores default_device
#122886 commented on Apr 26, 2024 • 0 new comments
DISABLED test_non_contiguous_tensors_nn_ConvTranspose1d_cuda_complex32 (__main__.TestModuleCUDA)
#81732 commented on Apr 26, 2024 • 0 new comments
Switch batch norm stack to consolidated ops
#119496 commented on Apr 30, 2024 • 0 new comments
S390x binaries
#120398 commented on Apr 27, 2024 • 0 new comments
[FSDP] Removed clamp to `NO_SHARD` for world size 1
#120334 commented on Apr 24, 2024 • 0 new comments
Fix implicit fallthroughs where it is simple to do so in caffe2/
#119700 commented on Apr 27, 2024 • 0 new comments
[FSDP] Add device in pin_memory argument
#119878 commented on Apr 24, 2024 • 0 new comments
[CI] CPU Inductor codepath for AVX2/Default is not tested in CI
#123224 commented on Apr 24, 2024 • 0 new comments
[inductor] Enable fx graph caching by default
#124091 commented on Apr 30, 2024 • 0 new comments
cpu performance for int4mm kernels
#122813 commented on Apr 24, 2024 • 0 new comments
[WIP][inductor] refine loop split logic
#124060 commented on Apr 30, 2024 • 0 new comments
Inconsistent results when training a model containing SyncBatchNorm with multiple GPUs
#124680 commented on Apr 24, 2024 • 0 new comments
[Inductor] [Quant] Enable lowering of quant per tensor and refactor quant pattern
#124041 commented on Apr 30, 2024 • 0 new comments
Add scaled_dot_product_attention "scale" argument to nn.MultiHeadAttention
#124718 commented on Apr 24, 2024 • 0 new comments
Migrating from setup.py install/develop to leverage pip standards
#124027 commented on Apr 28, 2024 • 0 new comments
[inductor][cpp] GEMM template
#124021 commented on Apr 30, 2024 • 0 new comments
Questions about parameter initialization, especially with torch.bfloat16 precision
#124719 commented on Apr 24, 2024 • 0 new comments
Fix constant propagation pass
#114471 commented on Apr 26, 2024 • 0 new comments
[discussion] Route pointwise Conv1d/Conv2d to matmul? (also in eager)
#116506 commented on Apr 24, 2024 • 0 new comments
Initial LR Scheduler composability tests
#123753 commented on Apr 25, 2024 • 0 new comments
Fix user warning for tensor LR
#123752 commented on Apr 25, 2024 • 0 new comments
Swap warning counter to flag in LRScheduler
#123751 commented on Apr 25, 2024 • 0 new comments
Add decomposition for slice_scatter
#123744 commented on Apr 26, 2024 • 0 new comments
Dynamo x autograd.Function: graph breaks on all the staticmethods on autograd.Function
#118397 commented on Apr 24, 2024 • 0 new comments
[debug] a debug PR to test perf regression due to triton
#123694 commented on Apr 30, 2024 • 0 new comments
Dynamo x autograd.Function: graph breaks on freevars in forward
#118394 commented on Apr 24, 2024 • 0 new comments
Use _unsafe_masked_index in masked_scatter decomposition
#123667 commented on Apr 26, 2024 • 0 new comments
Improve decomposition for constant_pad_nd
#123661 commented on Apr 26, 2024 • 0 new comments
Dynamo x autograd.Function: silently ignores all of the ctx.methods
#118396 commented on Apr 24, 2024 • 0 new comments
Automated submodule update: FBGEMM
#115316 commented on Apr 30, 2024 • 0 new comments
Rename TorchDynamo -> Dynamo in the dynamo tutorial doc
#123431 commented on Apr 27, 2024 • 0 new comments
Reenable dim for python 3.12
#123384 commented on Apr 24, 2024 • 0 new comments
[Dynamic Shapes] Fix error handling for indirectly fully constrained dynamic dimensions
#123293 commented on Apr 30, 2024 • 0 new comments
[CI] Node-20 update
#122115 commented on Apr 24, 2024 • 0 new comments
Add mode to MemoryDep to track atomic accumulates
#123223 commented on Apr 26, 2024 • 0 new comments
Decompositions for upsample linear backward
#123222 commented on Apr 26, 2024 • 0 new comments