cuda : refactor into multiple files #6269

slaren · 2024-03-24T03:09:07Z

The main goal is to make it easier to work on the CUDA backend and improve compilation times.

slaren · 2024-03-24T14:49:23Z

If you have any suggestions to improve the organization of the files, the naming scheme, or anything else, please let me know; I am sure it could be improved.

Other than that, the only thing left to do is fix the HIP and Makefile builds.

ggml-ci

airMeng · 2024-03-25T00:01:38Z

I thought the earlier guidance was to keep all functions in one file (see #3965 (comment)). Has the design changed? Could you share more details?

slaren · 2024-03-25T00:22:22Z

There was a brief discussion about this in #5434. The compilation time of the CUDA backend was increasing to the point that it was complicating working on it. Personally, I also prefer working on multiple smaller files rather than one large file, but I don't know if everybody agrees about that.

airMeng · 2024-03-25T01:27:16Z

> There was a brief discussion about this in #5434. The compilation time of the CUDA backend was increasing to the point that it was complicating working on it. Personally, I also prefer to work on multiple smaller files than in one large file, but I don't know if everybody agrees about that.

Will you summarize it in a document, for example "guidelines for adding new backends"? I see that some backends, such as #6035, are still WIP, and following the new design might be easier for them.

ggerganov

I like how the files are organized. We should also take the opportunity to deprecate the LLAMA_CUBLAS option in favor of the more meaningful LLAMA_CUDA. It is not super important, though, and is probably better left for a follow-up PR.

@airMeng Starting with single-file implementations remains the preferred option. We can probably add a dummy backend implementation to serve as a starting point for new backend development; it might be easier to keep up to date than a document.

anhnami · 2024-03-25T14:23:45Z

I apologize for nagging, but LLAMA_CUDA_F16 is broken again:

ggml-cuda/dmmv.cu(768): error: identifier "to_fp16_cuda_t" is undefined
          const to_fp16_cuda_t to_fp16_cuda = ggml_get_to_fp16_cuda(src1->type);
                ^

ggml-cuda/dmmv.cu(768): error: identifier "ggml_get_to_fp16_cuda" is undefined
          const to_fp16_cuda_t to_fp16_cuda = ggml_get_to_fp16_cuda(src1->type);
                                              ^

2 errors detected in the compilation of "ggml-cuda/dmmv.cu".

slaren · 2024-03-25T14:31:59Z

Should be fixed in #6298

ikawrakow · 2024-03-25T18:00:07Z

Considering the extremely long ggml-cuda.cu compilation times, I was pleasantly surprised by this refactoring. That lasted until I had to adapt PR #6302 to the refactoring: touching vecdotq.cuh, which one needs to do all the time when developing new quants or optimizing dot product kernels, leads to the very same extremely long compilation as before (this time of mmq.cu).

slaren · 2024-03-25T18:03:07Z

The build time is dominated by mmvq.cu, and to a lesser degree by mmq.cu, so as you say, this will not help much if you are changing these files. It should also be possible to split each quant type into its own file to improve this.
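
As a rough illustration of the per-type split suggested above, here is a minimal sketch in plain C++ (not CUDA) using explicit template instantiation. Everything in it is hypothetical: `q4_tag`, `q8_tag`, `vec_dot`, and the file names in the comments are made up for the example and are not taken from ggml. The idea is that a shared header declares the instantiations as `extern`, and each type gets its own source file containing a single explicit instantiation definition, so the per-type code is compiled once and the files can be built in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for quant types; these are NOT the real ggml types.
struct q4_tag { static constexpr int scale = 4; };
struct q8_tag { static constexpr int scale = 8; };

// ---- would live in a shared header (hypothetical "vec_dot.hpp") ----
// Generic dot product, scaled by a per-type constant.
template <typename Q>
int vec_dot(const std::int8_t * x, const std::int8_t * y, std::size_t n) {
    assert(x != nullptr && y != nullptr);
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += int(x[i]) * int(y[i]);
    }
    return sum * Q::scale;
}

// Explicit instantiation declarations: translation units that include the
// header do not instantiate these themselves and link against the
// definitions below instead.
extern template int vec_dot<q4_tag>(const std::int8_t *, const std::int8_t *, std::size_t);
extern template int vec_dot<q8_tag>(const std::int8_t *, const std::int8_t *, std::size_t);

// ---- would live in its own file (hypothetical "vec_dot_q4.cpp") ----
template int vec_dot<q4_tag>(const std::int8_t *, const std::int8_t *, std::size_t);

// ---- would live in its own file (hypothetical "vec_dot_q8.cpp") ----
template int vec_dot<q8_tag>(const std::int8_t *, const std::int8_t *, std::size_t);
```

Note that a change to the shared header would still rebuild every per-type file; the gain in this sketch is only that the instantiations are compiled once each and can be compiled in parallel. Whether the same layout carries over cleanly to the CUDA kernels is an assumption.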

cuda : refactor into multiple files

0f304d9

ggerganov added the "high priority" label Mar 24, 2024

slaren added 3 commits March 24, 2024 16:32

update Makefile

2cdb44d

update cmake for HIP

290f81a

update Makefile for HIP

475824e

ggml-ci

slaren marked this pull request as ready for review March 24, 2024 15:38

fix HIP build

9694154

ggerganov approved these changes Mar 25, 2024

slaren merged commit ae1f211 into master Mar 25, 2024
58 checks passed

slaren deleted the sl/cuda-refactor-files branch March 25, 2024 12:50

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024

cuda : refactor into multiple files (ggerganov#6269)

ddb4741

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024

cuda : refactor into multiple files (ggerganov#6269)

026d5cd

tybalex pushed a commit to tybalex/function.cpp that referenced this pull request Apr 17, 2024

cuda : refactor into multiple files (ggerganov#6269)

d600b40
